On Fri, Oct 14, 2016 at 3:33 AM, Michel Dänzer michel@daenzer.net wrote:
[ Adding Dan Williams and dri-devel ]
On 14/10/16 03:28 AM, Shawn Starr wrote:
Hello AMD folks,
I have discovered a problem in Linus master that affects AMDGPU. Nobody would notice this in drm-next-4.9-wip, since it's not in this repo.
[...]
87744ab3832b83ba71b931f86f9cfdb000d07da5 is the first bad commit
commit 87744ab3832b83ba71b931f86f9cfdb000d07da5
Author: Dan Williams dan.j.williams@intel.com
Date: Fri Oct 7 17:00:18 2016 -0700

    mm: fix cache mode tracking in vm_insert_mixed()

    vm_insert_mixed() unlike vm_insert_pfn_prot() and vmf_insert_pfn_pmd(),
    fails to check the pgprot_t it uses for the mapping against the one
    recorded in the memtype tracking tree. Add the missing call to
    track_pfn_insert() to preclude cases where incompatible aliased mappings
    are established for a given physical address range.

    Link: http://lkml.kernel.org/r/147328717909.35069.14256589123570653697.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams dan.j.williams@intel.com
    Cc: David Airlie airlied@linux.ie
    Cc: Matthew Wilcox mawilcox@microsoft.com
    Cc: Ross Zwisler ross.zwisler@linux.intel.com
    Signed-off-by: Andrew Morton akpm@linux-foundation.org
    Signed-off-by: Linus Torvalds torvalds@linux-foundation.org

:040000 040000 7517c0019fe49c1830b5a1d81f1dc099c5aab98a fd497a604a2af5995db2b8ed1e9c640bede6adf3 M mm
Reverting this commit stops the graphics stalls.
Thanks for bisecting this, Shawn.
A friend of mine mentions,
"looks like a graphics thingy you depend on is requesting a mapping with a not-allowed cache mode, and now you are (rightfully) getting errors?"
It would be nice to get some more specific pointers to what amdgpu (or maybe ttm, since that calls vm_insert_mixed in ttm_bo_vm_fault) might be doing wrong.
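
To make the kind of check described in the commit message concrete, here is a minimal userspace sketch, not kernel code. It models a table of physical ranges with a recorded cache mode and compares a requested mode against it. Everything in it is hypothetical and only for illustration: the names cache_mode, memtype_entry, tracked and effective_mode are invented, the flat table merely stands in for the memtype tracking tree, and letting the recorded mode win over the requested one is an assumption of the sketch rather than a statement about what track_pfn_insert() does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache modes for this sketch (not the kernel's definitions). */
enum cache_mode { CM_WB, CM_WC, CM_UC_MINUS };

/* One tracked physical range and the cache mode recorded for it. */
struct memtype_entry {
    unsigned long start_pfn; /* inclusive */
    unsigned long end_pfn;   /* exclusive */
    enum cache_mode mode;
};

/* A flat table stands in for the memtype tracking tree (assumption). */
static const struct memtype_entry tracked[] = {
    { 0x1000UL, 0x2000UL, CM_WC },
    { 0x8000UL, 0x8100UL, CM_UC_MINUS },
};

/*
 * Return the cache mode a mapping of `pfn` ends up with in this model.
 * For a tracked PFN the recorded mode wins (assumption of this sketch),
 * and *conflict is set to 1 if it differs from the requested mode.
 * An untracked PFN keeps the requested mode and never conflicts.
 */
static enum cache_mode effective_mode(unsigned long pfn,
                                      enum cache_mode requested,
                                      int *conflict)
{
    size_t i;

    assert(conflict != NULL);
    *conflict = 0;

    for (i = 0; i < sizeof(tracked) / sizeof(tracked[0]); i++) {
        if (pfn >= tracked[i].start_pfn && pfn < tracked[i].end_pfn) {
            *conflict = (tracked[i].mode != requested);
            return tracked[i].mode;
        }
    }
    return requested;
}
```

In this model, a caller asking for CM_WB on PFN 0x1800 would get CM_WC back with the conflict flag set, which is the sort of mismatch between requested and recorded cache mode that the quoted remark above speculates about. Note also that the lookup runs once per inserted page, so its cost is paid on every call.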
BTW, people have reported that rendering stalls every time TTM tries to move a buffer, even if the move is only a few MB.
See FPS and num_bytes_moved here: https://i.imgur.com/kNj2vqF.png
There are 5 big stalls. 4 of them are due to the mm commit.
Marek