https://bugs.freedesktop.org/show_bug.cgi?id=107065
--- Comment #11 from Christian König <ckoenig.leichtzumerken@gmail.com> ---
(In reply to Andrey Grodzovsky from comment #10)
> Created attachment 140418
> drm/amdgpu: Verify root PD is mapped into kernel address space.
>
> dwagner, please try this patch. It fixes the issue for me and I observed no suspend/resume issues.
>
> Christian, please take a look at the patch. The problem was that in amdgpu_vm_update_directories the parent BO didn't have a kernel mapping, so later, inside amdgpu_vm_cpu_set_ptes, pe += (unsigned long)amdgpu_bo_kptr(bo); would leave pe equal to 0000000000002000, since amdgpu_bo_kptr would return NULL for the parent. The parent was the root PD.
>
> This was still working at 67b8d5c (Linus Torvalds, 7 weeks ago, "Linux 4.17-rc5", tag: v4.17-rc5), but I wasn't able to pinpoint exactly which change broke it. I am not sure my fix is the right one, so please advise.
No idea when that broke either; CPU-based updates are not something we usually test.
Anyway, it's a good catch, but I would rather add that to amdgpu_vm_bo_base_init() (with the appropriate checks).
That would also allow us to remove the duplicated code from amdgpu_vm_alloc_levels().
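
For illustration only, the failure mode described in the quoted comment can be sketched in plain user-space C. This is not the amdgpu code: struct fake_bo, fake_bo_kptr(), pte_addr_unchecked() and pte_addr_checked() are hypothetical stand-ins, and the 0x2000 offset is assumed from the address reported above. The sketch shows how adding a NULL kernel pointer to an offset silently yields the bare offset, and how a check on the mapping would turn that into an explicit error.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer object; NOT the real struct amdgpu_bo. */
struct fake_bo {
    void *kptr; /* CPU (kernel) address of the BO, NULL when it is not mapped */
};

/* Plays the role of amdgpu_bo_kptr(): returns the CPU mapping, or NULL. */
static void *fake_bo_kptr(const struct fake_bo *bo)
{
    return bo->kptr;
}

/*
 * Unchecked variant, mirroring "pe += (unsigned long)amdgpu_bo_kptr(bo)".
 * With an unmapped BO the result is just the offset, i.e. a bogus address.
 */
static uintptr_t pte_addr_unchecked(const struct fake_bo *bo, uintptr_t pe)
{
    return pe + (uintptr_t)fake_bo_kptr(bo);
}

/*
 * Checked variant: refuses to compute an address when the BO has no
 * CPU mapping. Returns 0 on success and -1 if the BO is unmapped.
 */
static int pte_addr_checked(const struct fake_bo *bo, uintptr_t pe,
                            uintptr_t *out)
{
    void *base = fake_bo_kptr(bo);

    if (base == NULL)
        return -1;

    *out = pe + (uintptr_t)base;
    return 0;
}
```

With an unmapped BO and an offset of 0x2000, pte_addr_unchecked() returns 0x2000 itself, which matches the shape of the bad address seen in the report, while pte_addr_checked() reports the missing mapping instead. Where such a check or mapping step should live in the driver is exactly the question discussed above.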