Re: [PATCH 0/5] radeon: Write-combined CPU mappings of BOs in GTT

23 Jul 2014


On 23.07.2014 05:54, Michel Dänzer wrote:
...
On 21.07.2014 17:07, Christian König wrote:
...
On 19.07.2014 03:15, Michel Dänzer wrote:
...
On 19.07.2014 00:47, Christian König wrote:
...
On 18.07.2014 05:07, Michel Dänzer wrote:
...
...
> [PATCH 5/5] drm/radeon: Use VRAM for indirect buffers on >= SI
I'm still not very keen on this change since I still don't understand
the reason why it's faster than with GTT. Definitely needs more testing
on a wider range of systems.
Sure. If anyone wants to give this patch a spin and see if they can
measure any performance difference, good or bad, that would be
interesting.
...
Maybe limit it to APUs for now?
But IIRC, CPU writes to VRAM vs. write-combined GTT are actually an even
bigger win with dedicated GPUs than with the Kaveri built-in GPU on my
system. I suspect it may depend on the bandwidth available for PCIe vs.
system memory though.
I've made a few tests today with the kernel part of the patches, running
Xonotic on Ultra at 1920 x 1080.
Without any patches I get around 47.0 fps on average with my dedicated
HD7870.
Adding only "drm/radeon: Use write-combined CPU mappings of rings and
IBs on >= SI" and that goes down to ~45.3 fps.
Adding on top of that "drm/radeon: Use VRAM for indirect buffers on >=
SI" and the frame rate goes down to ~27.74 fps.
Hmm, looks like I'll need to do more benchmarking of 3D workloads as
well.
I haven't been able to consistently[0] measure any significant
difference between all placements of the rings and IBs with Xonotic or
Reaction Quake with my Bonaire. I'd expect Xonotic to be shader / GPU
memory bandwidth bound rather than CS bound anyway, so a ~40% hit from
that kernel patch alone is very surprising. Are you sure it wasn't just
the same kind of variation as described below?
Yes, I've measured that multiple times and the results were quite
consistent.
But I didn't measure it on a Bonaire, where the bottleneck probably
isn't the CPU load. I measured it on a fast Pitcairn and there Xonotic
was clearly affected by the patches.
...
[0] There were slightly different results sometimes, but next time I
tried the same setup again, it was back to the same as always. So it
seemed to depend more on the particular system boot / test run / moon
phase / ... than the kernel patches themselves.
...
...
Alex, given those numbers, it's probably best if you remove the "Use
write-combined CPU mappings of rings and IBs on >= SI" change from your
tree as well for now.
I wouldn't go as far as reverting the patch. It just needs a bit more
fine-tuning, and that can happen in the 3.17-rc cycle.
There's no need to revert it, just drop it from the tree. I'd still
prefer that for now.
...
My tests clearly show that we still can use USWC for the ring buffer on
SI and probably earlier chips as well.
Yeah, that might be the safest approach for now.
How about using USWC for the rings on all chips since R600 and for the 
IB only on CIK? As far as I can see that should do the trick quite well.
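
As a rough sketch of that policy only: the enum and function names below
are hypothetical and are not the actual radeon driver code, and the
family list is heavily simplified (the generations between R600 and SI
are left out).

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified chip family list in release order.
 * The real driver distinguishes many more families than this.
 */
enum chip_family {
	FAMILY_PRE_R600,
	FAMILY_R600,
	FAMILY_SI,
	FAMILY_CIK,
};

/* The two kinds of buffers discussed in this thread. */
enum buffer_kind {
	BUF_RING,
	BUF_IB,
};

/*
 * Return true if the CPU mapping of the buffer should be USWC
 * (uncached, write-combined) under the proposal:
 *   - rings: every family from R600 on
 *   - IBs:   CIK only
 */
bool want_uswc(enum chip_family family, enum buffer_kind kind)
{
	switch (kind) {
	case BUF_RING:
		return family >= FAMILY_R600;
	case BUF_IB:
		return family == FAMILY_CIK;
	}
	return false;
}
```

With this, want_uswc(FAMILY_SI, BUF_RING) is true while
want_uswc(FAMILY_SI, BUF_IB) is false, so a Pitcairn like the one
measured above would keep cached IB mappings.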
Christian.
