[ Adding Dan Williams and dri-devel ]
On 14/10/16 03:28 AM, Shawn Starr wrote:
[...]
Thanks for bisecting this Shawn.
It would be nice to get some more specific pointers on what amdgpu (or maybe ttm, since that calls vm_insert_mixed in ttm_bo_vm_fault) might be doing wrong.
On Fri, Oct 14, 2016 at 3:33 AM, Michel Dänzer michel@daenzer.net wrote:
BTW, people have reported that rendering stalls every time TTM tries to move a buffer, even if the move is only a few MB.
See FPS and num_bytes_moved here: https://i.imgur.com/kNj2vqF.png
There are 5 big stalls. 4 of them are due to the mm commit.
Marek
On 17 October 2016 at 04:41, Marek Olšák maraeo@gmail.com wrote:
    /*
     * We'd like to use VM_PFNMAP on shared mappings, where
     * (vma->vm_flags & VM_SHARED) != 0, for performance reasons,
     * but for some reason VM_PFNMAP + x86 PAT + write-combine is very
     * bad for performance. Until that has been sorted out, use
     * VM_MIXEDMAP on all mappings. See freedesktop.org bug #75719
     */
    vma->vm_flags |= VM_MIXEDMAP;
We have that comment in the ttm code, which to me implies that mixed is doing the right thing now, but that is slow, as the interface we should be using.
Dave.
On Sun, Oct 16, 2016 at 1:53 PM, Dave Airlie airlied@gmail.com wrote:
Aren't there only 2 possibilities for this regression?
1/ a memtype entry was never made so track_pfn_insert() returns an uncached mapping
2/ a conflicting memtype entry exists and undefined behavior due to mixed mapping types is avoided with the change.
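To make the two cases concrete, here is a toy userspace model in C of how a memtype lookup could drive what track_pfn_insert() hands back. It is only a sketch under assumptions, not the kernel code: toy_reserve(), toy_lookup(), toy_track_insert(), the table size and the enum names are all invented for illustration. The behaviour it models is that a physical address with no reserved entry falls back to an uncached type, whatever the caller asked for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache-mode names for this toy model only. */
enum cache_mode { MODE_WB, MODE_WC, MODE_UC_MINUS, MODE_UC, MODE_WT };

/* One reserved physical range [start, end) and its memory type. */
struct memtype_entry {
    uint64_t start;
    uint64_t end;
    enum cache_mode type;
};

#define MAX_ENTRIES 16
static struct memtype_entry table[MAX_ENTRIES];
static size_t nr_entries;

/*
 * Record a memory type for a range. Returns 0 on success, -1 if the
 * table is full or the range overlaps an entry of a different type
 * (the "conflicting memtype" situation of case 2).
 */
static int toy_reserve(uint64_t start, uint64_t end, enum cache_mode type)
{
    assert(start < end);
    if (nr_entries == MAX_ENTRIES)
        return -1;
    for (size_t i = 0; i < nr_entries; i++) {
        int overlaps = start < table[i].end && table[i].start < end;
        if (overlaps && table[i].type != type)
            return -1;
    }
    table[nr_entries].start = start;
    table[nr_entries].end = end;
    table[nr_entries].type = type;
    nr_entries++;
    return 0;
}

/* Look up the tracked type; an untracked address is treated as uncached. */
static enum cache_mode toy_lookup(uint64_t paddr)
{
    for (size_t i = 0; i < nr_entries; i++) {
        if (paddr >= table[i].start && paddr < table[i].end)
            return table[i].type;
    }
    return MODE_UC_MINUS; /* case 1: no entry was ever made */
}

/*
 * Model of the insert path: the tracked type replaces whatever the
 * caller requested, so the request only "wins" if it was reserved.
 */
static enum cache_mode toy_track_insert(enum cache_mode requested, uint64_t paddr)
{
    (void)requested;
    return toy_lookup(paddr);
}
```

With an empty table, toy_track_insert(MODE_WC, addr) returns MODE_UC_MINUS, which is case 1. After a toy_reserve() of a MODE_WC range covering addr, the same call returns MODE_WC, and a later toy_reserve() of an overlapping range with a different type is refused, which is the conflict described in case 2.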
On 18 October 2016 at 07:25, Dan Williams dan.j.williams@intel.com wrote:
3/ The CPU usage through this path goes up, and slows things down, though I suspect it's more an uncached mapping showing up when we don't expect it.
Dave.
On 18 October 2016 at 08:01, Dave Airlie airlied@gmail.com wrote:
It's looking like number 1: there is no mapping, so now we get uncached where we used to get write through.
difference in page prot 7f7bbc0e0000, pfn 20000000000e71e4, 8000000000000037, 800000000000002f
0x2f is the vma pg prot which has PWT set in it, 0x37 is the returned pgprot which lacks that bit.
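For anyone who wants to check the two values by hand, a small C sketch that decodes the caching bits of an x86 page protection value follows. The bit positions are the architectural PTE flags; the pat_name() table is an assumption about how Linux programs the PAT when PAT is enabled, and the helper names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural x86 PTE flag bits relevant to caching. */
#define PTE_PWT (1ULL << 3) /* page-level write-through */
#define PTE_PCD (1ULL << 4) /* page-level cache disable */
#define PTE_PAT (1ULL << 7) /* PAT bit for 4K pages */

/* The PAT slot is selected by PAT:PCD:PWT read as a 3-bit number. */
static unsigned pat_index(uint64_t prot)
{
    return ((prot & PTE_PAT) ? 4u : 0u) |
           ((prot & PTE_PCD) ? 2u : 0u) |
           ((prot & PTE_PWT) ? 1u : 0u);
}

/*
 * ASSUMPTION: slot-to-type layout as set up by Linux with PAT enabled.
 * Other configurations (PAT disabled, other OSes) differ.
 */
static const char *pat_name(unsigned idx)
{
    static const char *const names[8] = {
        "WB", "WC", "UC-", "UC", "WB", "WP", "UC-", "WT"
    };
    assert(idx < 8);
    return names[idx];
}

/* Print the caching bits and the resulting PAT slot for one value. */
static void describe_pgprot(const char *label, uint64_t prot)
{
    unsigned idx = pat_index(prot);

    printf("%s = 0x%016llx: PWT=%d PCD=%d PAT=%d, slot %u (%s)\n",
           label, (unsigned long long)prot,
           (prot & PTE_PWT) != 0, (prot & PTE_PCD) != 0,
           (prot & PTE_PAT) != 0, idx, pat_name(idx));
}
```

Calling describe_pgprot() on 0x800000000000002f and 0x8000000000000037 shows that the two values differ only in bits 3 and 4: the vma value has PWT set and PCD clear (slot 1), while the returned value has PCD set and PWT clear (slot 2).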
Not sure where to go from here, suggestions?
Dave.
On Mon, Oct 17, 2016 at 8:48 PM, Dave Airlie airlied@gmail.com wrote: [..]
If the driver established an ioremap_wt() across the range, or just called reserve_memtype() directly, that should restore WT mappings.
Although Daniel's suggestion to use the i915 mapping helpers sounds like it avoids problem 3/ as well.
On 18 October 2016 at 23:53, Dan Williams dan.j.williams@intel.com wrote:
Well we shouldn't be doing that many VRAM mappings on the CPU so I doubt we'll hit the overheads here that often.
Ideally we'd always use DMA to move stuff in/out of VRAM, but there are some places where we still do WC VRAM writes for uploads.
So I've sent the patches; any major opinions on them? We can't just ioremap_wc the whole BAR, as on 32-bit that just messes things up, and it's unnecessary anyway.
Dave.
On Wed, Oct 19, 2016 at 8:42 AM, Dave Airlie airlied@gmail.com wrote:
WC VRAM for uploads is better than WC GART IMO.
Marek
On 19/10/16 07:33 PM, Marek Olšák wrote:
It's not a simple choice I'm afraid. While writing directly to WC VRAM can be faster than writing to WC GART and then DMA'ing to VRAM, doing so increases pressure on the first 256MB of VRAM. That's why I disabled direct VRAM writes for streaming uploads again in https://cgit.freedesktop.org/mesa/mesa/commit/?id=7b4276d7acf2e0f77044cb50ca... . It's possible that something has changed since then though, feel free to play with enabling it again.
On Thu, Oct 20, 2016 at 3:11 AM, Michel Dänzer michel@daenzer.net wrote:
amdgpu should handle any memory pressure gracefully. radeon is not so robust though.
Marek
On Tue, Oct 18, 2016 at 08:01:01AM +1000, Dave Airlie wrote:
Sounds reasonable, at least we (=i915 folks) know that pte caching type tracking is ridiculously expensive. In 4.9 we have our own pte walker and upfront (at driver load) caching type checking to avoid all that. It's in i915_mm.c, but probably should be moved into core kernel code (next to the io_mapping stuff, which we reused as the tracking structure).
-Daniel
dri-devel@lists.freedesktop.org