https://bugs.freedesktop.org/show_bug.cgi?id=102322
--- Comment #40 from Andrey Grodzovsky andrey.grodzovsky@amd.com --- Created attachment 141112 --> https://bugs.freedesktop.org/attachment.cgi?id=141112&action=edit .config
I uploaded my .config file - maybe something in your Kconfig flags makes this happen - you can try and rebuild latest kernel from Alex's repository using my .config and see if you don't experience this anymore. https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
Other than that, since you system hard hangs so you can't do any postmortem dumps, you can at least provide output from events tracing though trace_pipe to catch live logs on the fly. Maybe we can infer something from there...
So again - Load the system and before starting reproduce run the following trace command -
sudo trace-cmd start -e dma_fence -e gpu_scheduler -e amdgpu -v -e "amdgpu:amdgpu_mm_rreg" -e "amdgpu:amdgpu_mm_wreg" -e "amdgpu:amdgpu_iv"
then cd /sys/kernel/debug/tracing && cat trace_pipe
When the problem happens just copy all the output from the terminal to a log file. Make sure your terminal app has largest possible buffer to catch ALL the output.