https://bugs.freedesktop.org/show_bug.cgi?id=102322
--- Comment #61 from dwagner jb5sgc1n.nya@20mm.eu ---
Please use amdgpu.vm_update_mode=3 to get back to VM_FAULTs issue.
The "good" news is that reproduction of the crashes with 3-fps-video-replay is very quick when using amdgpu.vm_update_mode=3.
But the bad news is that I have not been able to get useful error output when using vm_update_mode=3.
At first I also set amdgpu.vm_debug=1, and with that, in 10 crashes not a single line of error output was emitted to either the ssh channel or the system journal.
I then tried with amdgpu.vm_debug=0. With that, a few error lines do get logged, but nothing really useful; see also the attached example:
[ 912.447139] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=12818, emitted seq=12819
[ 912.447145] [drm] GPU recovery disabled.
These are the only lines indicating the error. Not even the echo "crash detected!" that comes after the pipeline 'dmesg -w | tee /dev/tty | grep -m 1 -e "amdgpu.*GPU" -e "amdgpu.*ERROR"' gets emitted, much less the umr commands that would, in theory, follow it.
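
For reference, here is a minimal sketch of how the detection step could be restructured so that the "crash detected!" message is printed by the same process that reads the log, as soon as a matching line arrives. This is only an illustration under assumptions: watch_for_crash is a hypothetical helper name, the glob patterns are meant to approximate the two grep expressions above, and the umr commands are left as a placeholder because they are not listed in this comment.

```shell
#!/bin/sh
# Hypothetical helper (not part of any existing tool): reads kernel log
# lines on stdin, echoes each one, and reports as soon as a line looks
# like an amdgpu error. Returns 0 on a match, 1 if stdin ends without one.
watch_for_crash() {
    while IFS= read -r line; do
        # Pass every line through so it stays visible on the terminal.
        printf '%s\n' "$line"
        case $line in
            # Approximates: grep -e "amdgpu.*GPU" -e "amdgpu.*ERROR"
            *amdgpu*GPU*|*amdgpu*ERROR*)
                echo "crash detected!"
                # Placeholder: the umr commands would be run here.
                return 0
                ;;
        esac
    done
    return 1
}

# Intended use on the test machine (follows the live kernel log):
#   dmesg -w | watch_for_crash
```

The reason for this shape: a shell waits for every command in a pipeline to finish before running whatever is sequenced after it. If the echo is placed after the whole 'dmesg -w | tee | grep -m 1' pipeline, it only runs once dmesg -w itself exits, which needs further kernel messages to arrive after grep has quit. That might be one reason the echo never shows up, though it could equally be that the kernel is already dead at that point.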
What could I do to keep the kernel from dying so quickly when using amdgpu.vm_update_mode=3?