https://bugs.freedesktop.org/show_bug.cgi?id=102322
--- Comment #37 from dwagner jb5sgc1n.nya@20mm.eu ---
In the related bug report (https://bugs.freedesktop.org/show_bug.cgi?id=107152) I noticed that this bug can be triggered very reliably and quickly by playing a video with a deliberately lowered frame rate: "mpv --no-correct-pts --fps=3 --ao=null some_arbitrary_video.webm"
This led me to suspect that this bug is caused by the dynamic power management, which often ramps performance up and down when a video is played at such a low frame rate.
And indeed, I found this confirmed by many experiments: If I use a script like
#!/bin/bash
cd /sys/class/drm/card0/device
echo manual >power_dpm_force_performance_level
# low
echo 0 >pp_dpm_mclk
echo 0 >pp_dpm_sclk
# medium
#echo 1 >pp_dpm_mclk
#echo 1 >pp_dpm_sclk
# high
#echo 1 >pp_dpm_mclk
#echo 6 >pp_dpm_sclk
to enforce any single fixed performance level, then the crashes no longer occur, not even with the "low frame rate video test".
So it seems that the transition from one "dpm" performance level to another, with a certain probability, causes these crashes. And the more often the transitions occur, the sooner one will experience them.
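To check how often such transitions actually happen during a test run, something like the following could be used. It is only a rough sketch: the function names are made up for illustration, and it assumes that pp_dpm_sclk and pp_dpm_mclk list one level per line with the active one marked by a trailing "*" (e.g. "1: 608Mhz *").

```shell
#!/bin/bash
# Hypothetical helper: print the index of the currently active DPM level.
# Assumes one level per line, with the active level marked by "*", e.g.
#   0: 300Mhz
#   1: 608Mhz *
active_level() {
    # $1: path to a pp_dpm_sclk or pp_dpm_mclk style file
    awk -F: '/\*/ { print $1; exit }' "$1"
}

# Hypothetical helper: poll a level file and count changes of the active level.
# $1: file to poll, $2: number of samples, $3: delay between samples (seconds)
count_transitions() {
    local file=$1 samples=$2 delay=$3
    local prev cur changes=0 i
    prev=$(active_level "$file")
    for ((i = 1; i < samples; i++)); do
        sleep "$delay"
        cur=$(active_level "$file")
        if [[ "$cur" != "$prev" ]]; then
            changes=$((changes + 1))
            prev=$cur
        fi
    done
    echo "$changes"
}
```

For example, "count_transitions /sys/class/drm/card0/device/pp_dpm_sclk 100 0.1" would sample the shader clock level for roughly ten seconds while the video plays. Polling can miss short-lived level changes, so the count is a lower bound.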
(BTW: For some unknown reason, invoking "xrandr" or enabling a monitor after sleep causes the above settings to get lost, so one has to invoke the above script again.)
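Re-running the script by hand can be avoided with a small check, sketched below. This rests on an assumption that has not been verified here: that "lost" settings show up as power_dpm_force_performance_level no longer reading "manual". The function names are hypothetical, and writing to the real sysfs files requires root.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) if the device directory no longer
# reports the "manual" performance level, i.e. the settings were lost.
# ASSUMPTION: a lost setting is visible in power_dpm_force_performance_level.
needs_reapply() {
    local dir=${1:-/sys/class/drm/card0/device}
    [[ "$(cat "$dir/power_dpm_force_performance_level" 2>/dev/null)" != "manual" ]]
}

# Hypothetical helper: pin one fixed level, same writes as the script above.
# $1: device directory, $2: mclk level index, $3: sclk level index
apply_fixed_level() {
    local dir=$1 mclk=$2 sclk=$3
    echo manual >"$dir/power_dpm_force_performance_level"
    echo "$mclk" >"$dir/pp_dpm_mclk"
    echo "$sclk" >"$dir/pp_dpm_sclk"
}
```

A loop such as "while sleep 5; do needs_reapply && apply_fixed_level /sys/class/drm/card0/device 0 0; done" would then restore the "low" setting within a few seconds of it being reset.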