https://bugs.freedesktop.org/show_bug.cgi?id=101528
--- Comment #2 from Alexander Tsoy alexander@tsoy.me ---
I've added some debug printing to smu7_hwmgr.c, and here is what I get before the GPU enters that state:
[ 778.701843] AMDGPU: vblank_time_us: 630, switch_limit_us: 450
[ 778.707608] AMDGPU: vblank_time_us: 630, switch_limit_us: 450
[ 778.713379] AMDGPU: disable_mclk_switching: 0, disable_mclk_switching_for_frame_lock: 0, info.display_count: 1, smu7_vblank_too_short: 0, mode_info.refresh_rate: 60
[ 778.748777] AMDGPU: vblank_time_us: 0, switch_limit_us: 450
[ 778.754361] AMDGPU: vblank_time_us: 0, switch_limit_us: 450
[ 778.759951] AMDGPU: disable_mclk_switching: 1, disable_mclk_switching_for_frame_lock: 0, info.display_count: 1, smu7_vblank_too_short: 1, mode_info.refresh_rate: 0
For some reason, when refresh_rate is 0, vblank_time_us is also 0. Shouldn't the latter be 0xffffffff instead? With vblank_time_us at 0, the check against switch_limit_us (450) reports the vblank as too short and mclk switching gets disabled. So I guess the following commit is the culprit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
and the following patch should fix (or work around?) this issue:
- if (vblank_time_us < switch_limit_us)
+ if (vblank_time_us && (vblank_time_us < switch_limit_us))
After applying it:
[ 409.588673] AMDGPU: vblank_time_us: 630, switch_limit_us: 450
[ 409.594427] AMDGPU: vblank_time_us: 630, switch_limit_us: 450
[ 409.600182] AMDGPU: disable_mclk_switching: 0, disable_mclk_switching_for_frame_lock: 0, info.display_count: 1, smu7_vblank_too_short: 0, mode_info.refresh_rate: 60
[ 409.639750] AMDGPU: vblank_time_us: 0, switch_limit_us: 450
[ 409.645321] AMDGPU: vblank_time_us: 0, switch_limit_us: 450
[ 409.650917] AMDGPU: disable_mclk_switching: 0, disable_mclk_switching_for_frame_lock: 0, info.display_count: 1, smu7_vblank_too_short: 0, mode_info.refresh_rate: 0
$ cat /sys/class/drm/card0/device/pp_dpm_mclk
0: 150Mhz *
1: 300Mhz
2: 700Mhz
3: 1450Mhz
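
To make the effect of the one-line change concrete, here is a minimal standalone sketch of the two conditions. It is only a model of the comparison quoted in the patch: the function names vblank_too_short_old and vblank_too_short_new are made up for illustration and do not exist in smu7_hwmgr.c, and the surrounding driver logic is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the check discussed in this comment.
 * These helpers are NOT taken from smu7_hwmgr.c; they only mirror
 * the two "if" conditions shown in the patch.
 */

/* Original condition: a vblank time of 0 compares below the limit,
 * so it is reported as "too short". */
static bool vblank_too_short_old(uint32_t vblank_time_us,
                                 uint32_t switch_limit_us)
{
	return vblank_time_us < switch_limit_us;
}

/* Patched condition: a vblank time of 0 is skipped, so only a
 * non-zero value below the limit is reported as "too short". */
static bool vblank_too_short_new(uint32_t vblank_time_us,
                                 uint32_t switch_limit_us)
{
	return vblank_time_us && (vblank_time_us < switch_limit_us);
}
```

With the values from the logs (switch_limit_us = 450), both versions return false for vblank_time_us = 630. For vblank_time_us = 0 the old condition returns true, matching smu7_vblank_too_short: 1 in the first log, while the patched one returns false, matching smu7_vblank_too_short: 0 in the second. Whether 0 should be treated as "unknown" here or the value should be 0xffffffff in the first place is still the open question.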