https://bugs.freedesktop.org/show_bug.cgi?id=107065
--- Comment #21 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #20)
(In reply to Andrey Grodzovsky from comment #19)
I was able to reproduce this instantly, without even using the page table CPU update mode. It looks like a regression, since S3 was working fine for a long time. Were you able to find a regression point for this?
Not for the exact symptom described in this report. But for an older S3 resume issue that was partially resolved (https://bugs.freedesktop.org/show_bug.cgi?id=103277), I once found that the regression was caused by the "drm/amd/display: Match actual state during S3 resume" commit.
Unfortunately, the many changes that followed no longer allow bisecting that symptom to one specific commit. But given that it still occurs when I use the option "drm.edid_firmware=edid/LG_EG9609_edid.bin", I think there is still a bug in the order in which things are done during re-initialization on S3 resume, and setting a fixed EDID seems to expose it as a crash.
I found the offending patch: "drm: Stop updating plane->crtc/fb/old_fb on atomic drivers". I'm not sure yet what's going on there, and I'm not sure reverting it will fix your issue with the amdgpu_vm_cpu_set_ptes page fault after S3, since I haven't observed that here. It is still worth trying to revert it on your side to see what happens.