On 11.04.2014 09:52, Lauri Kasanen wrote:
On Thu, 10 Apr 2014 21:30:03 +0200 Christian König <deathsimple@vodafone.de> wrote:
Quick thought from someone entirely unfamiliar with the hardware: perhaps you can get the performance benefit without the size increase by moving the else portion into a non-inline function? I'm guessing that most accesses happen in the "if" branch.
The function call overhead is about equal to the branching overhead, so splitting it would only give about half of that benefit. It's called from many places, and there are a lot of calls per second.
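For readers following along, the split being discussed (an inline fast path with the "else" portion moved into a non-inline function) might look roughly like the sketch below. This is a standalone illustration only, not the radeon code: `struct fake_dev`, `fake_rreg`, `fake_rreg_indirect` and the simulated register array are all hypothetical names made up for the example, and the locking and index/data register accesses a real driver would need are only hinted at in a comment.

```c
#include <stdint.h>

/* Hypothetical stand-in for the device; NOT the real struct radeon_device. */
struct fake_dev {
    uint32_t *regs;        /* simulated register file, indexed by reg / 4 */
    uint32_t direct_size;  /* byte offsets below this count as directly mapped */
    unsigned slow_calls;   /* counts how often the out-of-line path runs */
};

#if defined(__GNUC__)
#define FAKE_NOINLINE __attribute__((noinline))
#else
#define FAKE_NOINLINE
#endif

/*
 * Slow path, deliberately kept out of line so it does not bloat every
 * call site. A real driver would take a lock here, write the register
 * offset to an index register and read the value back from a data
 * register; this sketch just reads the simulated array.
 */
static FAKE_NOINLINE uint32_t fake_rreg_indirect(struct fake_dev *dev,
                                                 uint32_t reg)
{
    dev->slow_calls++;
    return dev->regs[reg / 4];
}

/*
 * Fast path, small enough to inline: one compare plus a direct read.
 * Only registers outside the directly mapped range pay for a call.
 */
static inline uint32_t fake_rreg(struct fake_dev *dev, uint32_t reg)
{
    if (reg < dev->direct_size)
        return dev->regs[reg / 4];
    return fake_rreg_indirect(dev, reg);
}
```

With this layout each call site carries only the compare and the direct read. As noted above, the branch itself remains, so the split removes the call overhead for the common case but not the branching overhead.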
Actually direct register access shouldn't be necessary so often. Apart from page flips, write/read pointer updates and irq processing there shouldn't be so many of them. Could you clarify a bit more what issue you are seeing here?
Too much cpu usage for such a simple function. 2% makes it #2 in top-10 radeon.ko functions, right after evergreen_cs_parse. For reference, #3 (radeon_cs_packet_parse) is only 0.5%, one fourth of this function's usage.
I think you misunderstood me here. I do believe your numbers that it makes a noticeable difference.
But I did a couple of perf tests recently on SI and CIK while hacking on VM support, and IIRC r100_mm_rreg didn't show up in the top 10 on those systems.
So what puzzles me is who the heck is calling r100_mm_rreg so often that it makes a noticeable difference on evergreen/NI?
Christian.
As proved by the perf increase, it's called often enough that getting rid of the function call overhead (and compiling the if out at compile time) helps measurably.
- Lauri