https://bugs.freedesktop.org/show_bug.cgi?id=102322
--- Comment #48 from dwagner jb5sgc1n.nya@20mm.eu --- (In reply to Andrey Grodzovsky from comment #47)
Created attachment 141174 add_debug_info.patch
I am attaching a basic debug patch, please try to apply it.
Done.
It should give a bit more info in dmesg when a VM fault happens.
Hmm - I could not see any additional output resulting from it.
Reproduce again as before with the cmd-trace and, once the fault happens, if possible try to quickly run
sudo umr -O halt_waves -wa
and only if you still have a running system after that, run sudo umr -O verbose -R gfx[.]
The driver should be loaded with amdgpu.vm_fault_stop=2 set from grub
Did that - I will attach the script "gpu_debug3.sh" and its output. This time, dmesg and trace output are in the same file; if you want to look only at the dmesg part, "grep '^\[' gpu_debug_3.txt" will get it.
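For splitting the combined file, here is a minimal sketch. It assumes that every kernel log line starts with a "[" timestamp bracket and that no trace line does; the function names dmesg_part and trace_part are made up for illustration.

```shell
# Print only the dmesg lines (those starting with a "[" timestamp bracket).
# The bracket is escaped so grep matches it literally instead of
# treating it as the start of a bracket expression.
dmesg_part() {
    grep '^\[' "$1"
}

# Print everything else, i.e. the trace output.
trace_part() {
    grep -v '^\[' "$1"
}
```

Usage would be "dmesg_part gpu_debug_3.txt" or "trace_part gpu_debug_3.txt".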
I reproduced the bug 4 times. On 2 occasions no error was emitted before the crash; the other 2 times both umr commands could still run. Since the error message looked the same, I'll attach the shorter file, from the run where the crash occurred more quickly.
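The two-step capture can be wrapped so that the second command only runs if the first one came back. This is a sketch rather than the script that was attached: the function name and the UMR override variable are hypothetical, and it assumes umr returns a non-zero exit status on failure, which has not been verified here.

```shell
# umr invocation as quoted in this comment; UMR can be overridden for a dry run
# (hypothetical convenience variable, not part of umr itself).
UMR="${UMR:-sudo umr}"

capture_after_fault() {
    # Step 1: halt the waves and dump wave state as soon as the fault shows up.
    $UMR -O halt_waves -wa || return 1
    # Step 2: only reached if step 1 returned, i.e. the system is still running.
    # The ring name is quoted so the shell does not expand [.] as a glob.
    $UMR -O verbose -R 'gfx[.]'
}
```

Running it with UMR=echo prints the two command lines without touching the GPU.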
Also check if adding amdgpu.vm_debug=1 makes the issue reproduce more quickly
I used that setting, but it did not seem to make a difference to how quickly the crash occurred - still "some seconds to some minutes".
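To rule out a typo in the grub entry, one way to confirm that both options actually reached the kernel is to look at /proc/cmdline. The helper below is a sketch with a hypothetical name; it assumes the options were passed on the kernel command line rather than through a modprobe configuration file.

```shell
# Succeeds if the exact option (e.g. amdgpu.vm_debug=1) is on the command line.
# The optional second argument exists only so the check can be pointed at
# a test file instead of /proc/cmdline.
has_boot_option() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For example: "has_boot_option amdgpu.vm_fault_stop=2 && has_boot_option amdgpu.vm_debug=1 && echo ok".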