Hi, Christian,
On 4/8/20 1:53 PM, Thomas Hellström (VMware) wrote:
From: "Thomas Hellstrom (VMware)" <thomas_os@shipmail.org>
With amdgpu and CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y, there are errors like:
BUG: non-zero pgtables_bytes on freeing mm
and:
BUG: Bad rss-counter state
with TTM transparent huge-pages.
Until we've figured out what other TTM drivers do differently compared to vmwgfx, disable the huge_fault() callback, eliminating transhuge page-table entries.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org>
Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Without being able to test and track this down on amdgpu, there's little more than this I can do at the moment. Hopefully I'll be able to test on nouveau/ttm after getting back from vacation to see if I can reproduce it.
It looks like some part of the kernel mistakes a huge page-table entry for a page directory, and that would be a path that is not hit with vmwgfx.
/Thomas