On Wed, Jul 23, 2014 at 9:21 AM, Michel Dänzer <michel@daenzer.net> wrote:
On 23.07.2014 15:42, Christian König wrote:
On 23.07.2014 05:54, Michel Dänzer wrote:
On 21.07.2014 17:07, Christian König wrote:
On 19.07.2014 03:15, Michel Dänzer wrote:
On 19.07.2014 00:47, Christian König wrote:
On 18.07.2014 05:07, Michel Dänzer wrote:
>>> [PATCH 5/5] drm/radeon: Use VRAM for indirect buffers on >= SI
>> I'm still not very keen on this change since I still don't understand
>> the reason why it's faster than with GTT. Definitely needs more testing
>> on a wider range of systems.
> Sure. If anyone wants to give this patch a spin and see if they can
> measure any performance difference, good or bad, that would be
> interesting.
>
>> Maybe limit it to APUs for now?
> But IIRC, CPU writes to VRAM vs. write-combined GTT are actually an even
> bigger win with dedicated GPUs than with the Kaveri built-in GPU on my
> system. I suspect it may depend on the bandwidth available for PCIe vs.
> system memory though.

I've made a few tests today with the kernel part of the patches running Xonotic on Ultra in 1920 x 1080.
Without any patches I get ~47.0fps on average with my dedicated HD7870.
Adding only "drm/radeon: Use write-combined CPU mappings of rings and IBs on >= SI" and that goes down to ~45.3fps.
Adding on top of that "drm/radeon: Use VRAM for indirect buffers on >= SI" and the frame rate goes down to ~27.74fps.
Hmm, looks like I'll need to do more benchmarking of 3D workloads as well.
I haven't been able to consistently[0] measure any significant difference between all placements of the rings and IBs with Xonotic or Reaction Quake with my Bonaire. I'd expect Xonotic to be shader / GPU memory bandwidth bound rather than CS bound anyway, so a ~40% hit from that kernel patch alone is very surprising. Are you sure it wasn't just the same kind of variation as described below?
Yes, I've measured that multiple times and the results were quite consistent.
But I didn't measure it on a Bonaire, where the bottleneck probably isn't the CPU load. I measured it on a fast Pitcairn
Ahem, my Bonaire is cranking out ~90fps of Xonotic Ultra at 1920x1080. :) (And AFAIK there are even faster Bonaire variants)
and there Xonotic was clearly affected by the patches.
Okay, I hadn't realized we're not doing any command stream checking as of CIK, that probably explains the difference.
I think CIK is doing CS checking for VCE, but not for graphics. SI is doing CS checking for everything.
Marek