https://bugzilla.kernel.org/show_bug.cgi?id=206017
Bug ID: 206017
Summary: Kernel 5.4.x unusable with GUI due to [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Product: Drivers
Version: 2.5
Kernel Version: 5.4.x
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: high
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: udovdh@xs4all.nl
Regression: No
AMD Ryzen 5 3400G with Radeon Vega Graphics on Gigabyte X570 AORUS PRO, running Fedora 31, kernel.org kernels, git mesa, etc.
After booting into 5.4, the GUI soon freezes, e.g. right when I start Firefox. On 5.3.18 it takes days to crash amdgpu. Soft recovery does not help. 5.3.18 is EOL, so the 5.4 issues need priority:
(..)
[ 12.884828] pps pps0: new PPS source serial0
[ 12.884832] pps pps0: source "/dev/ttyS0" added
[ 12.898511] it87: it87 driver version <not provided>
[ 12.898635] it87: Found IT8792E/IT8795E chip at 0xa60, revision 3
[ 12.898675] it87: Beeping is supported
[ 14.244524] igb 0000:04:00.0: changing MTU from 1500 to 7200
[ 17.328845] igb 0000:04:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 17.331973] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 18.564142] io scheduler mq-deadline registered
[ 22.352636] igb 0000:04:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 31.464130] fuse: init (API version 7.31)
[ 75.198868] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[ 80.318799] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
--- Comment #1 from udo (udovdh@xs4all.nl) --- See https://gitlab.freedesktop.org/drm/amd/issues/934 for more details.
Alex Deucher (alexdeucher@gmail.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alexdeucher@gmail.com
--- Comment #2 from Alex Deucher (alexdeucher@gmail.com) --- There is no need to file another bug report. Let's keep this all in one place.
--- Comment #3 from udo (udovdh@xs4all.nl) --- amdgpu.noretry=0 appears to help on 5.4.6.
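To confirm that the workaround is actually in effect on a running system, the current value can be read back. This is a minimal sketch, assuming the amdgpu module exports its parameters readably under `/sys/module/amdgpu/parameters` (the usual sysfs location for module parameters, but not something stated in this report); `read_module_param` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional

# Assumed sysfs location; it only exists while amdgpu is loaded and the
# parameter is exported with read permission.
PARAM_DIR = Path("/sys/module/amdgpu/parameters")


def read_module_param(name: str, param_dir: Path = PARAM_DIR) -> Optional[str]:
    """Return the current value of a module parameter, or None if unreadable."""
    try:
        return (param_dir / name).read_text().strip()
    except OSError:
        # Missing file, module not loaded, or no permission.
        return None


if __name__ == "__main__":
    value = read_module_param("noretry")
    print("amdgpu.noretry =", value if value is not None else "not available")
```

If the value is not what was passed on the kernel command line, the option probably did not reach the module (for example because of a typo in the bootloader configuration).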
--- Comment #4 from udo (udovdh@xs4all.nl) --- But 5.4.x is not really stable; it crashes easily within a day where 5.3.18 can stay up for a few days.
--- Comment #5 from udo (udovdh@xs4all.nl) --- Firefox is still the trigger. When I do not use it the system remains usable. When I use Firefox the system crashes hard within a few hours.
udo (udovdh@xs4all.nl) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Kernel 5.4.x unusable with  |Kernel 5.4.x unusable with
                   |GUI due to                  |GUI due to hard crashes
                   |[drm:amdgpu_dm_atomic_commi |
                   |t_tail [amdgpu]] *ERROR*    |
                   |Waiting for fences timed    |
                   |out!                        |
udo (udovdh@xs4all.nl) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Kernel 5.4.x unusable with  |Kernel 5.4.x unusable with
                   |GUI due to hard crashes     |GUI due to crashes (some
                   |                            |hard)
--- Comment #6 from udo (udovdh@xs4all.nl) --- 5.4.8 also suffers from the hard hang; Firefox is involved, playing YouTube and such.
--- Comment #7 from udo (udovdh@xs4all.nl) --- And it happened again, without youtube playing but while browsing. 5.3.18 takes a lot longer to crash/hang or whatever.
--- Comment #8 from udo (udovdh@xs4all.nl) --- Does the screen corruption I see now and then have something to do with this issue?
--- Comment #9 from udo (udovdh@xs4all.nl) --- 5.4.8 runs less than 12 hours until hard crash when used.
--- Comment #10 from udo (udovdh@xs4all.nl) --- More like 6 hours or less.
udo (udovdh@xs4all.nl) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|high                        |blocking
--- Comment #11 from udo (udovdh@xs4all.nl) --- I.e.: it is stable and working OK with e.g. mkv playing. Then we start Firefox and boom: the system freezes.
--- Comment #12 from udo (udovdh@xs4all.nl) --- 5.4.9 also has this issue. Runs ok with firefox not being used, as far as I can test and detect. With firefox the system locks hard after a while.
Paul (paul.e.hill2@outlook.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |paul.e.hill2@outlook.com
--- Comment #13 from Paul (paul.e.hill2@outlook.com) --- Hello!
I am experiencing the same issue on 5.4.10 (Fedora 31, KDE Spin). I'm going to attempt the 'amdgpu.noretry=0' fix later today.
I made the below bug report with Fedora: https://ask.fedoraproject.org/t/fedora-kde-amdgpu-issue/5026
Summarized: gpu: Radeon Vega 10
Issue: I discovered a lot of these entries within journalctl and dmesg after gui freezes:
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Thank you!
--- Comment #14 from Paul (paul.e.hill2@outlook.com) --- (In reply to Paul from comment #13)
> Hello!
> I am experiencing the same issue on 5.4.10 (Fedora 31, KDE Spin). I'm going to attempt the 'amdgpu.noretry=0' fix later today.
> I made the below bug report with Fedora: https://ask.fedoraproject.org/t/fedora-kde-amdgpu-issue/5026
> Summarized: gpu: Radeon Vega 10
> Issue: I discovered a lot of these entries within journalctl and dmesg after gui freezes:
> kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
> kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
> Thank you!
Just wanted to report in that the 'amdgpu.noretry=0' workaround resolved my issues. Thanks!
--- Comment #15 from Alex Deucher (alexdeucher@gmail.com) --- Should be fixed in: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... which should also be landing in various stable kernels as well.
--- Comment #16 from udo (udovdh@xs4all.nl) --- amdgpu.noretry=0 works as workaround so the commit should fix things well. Thanks for the commit!
Still looking for the right component for https://bugzilla.kernel.org/show_bug.cgi?id=206191 :-/
--- Comment #17 from udo (udovdh@xs4all.nl) --- Kernel 5.6.x works very well. Git mesa might help too.
udo (udovdh@xs4all.nl) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |CODE_FIX
priit@ww.ee (priit@ww.ee) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |priit@ww.ee
--- Comment #18 from priit@ww.ee (priit@ww.ee) --- And it's back after several months.
5.6.16-1-MANJARO mesa 20.0.7-3
amdgpu.ppfeaturemask=0xfffd7fff amdgpu.noretry=0 amdgpu.lockup_timeout=0 amdgpu.gpu_recovery=1 amdgpu.audio=0 amdgpu.deep_color=1 amd_iommu=on iommu=pt
--- Comment #19 from udo (udovdh@xs4all.nl) --- Appears to work OK for me:
AMD Ryzen 5 3400G with Radeon Vega Graphics on Gigabyte X570 AORUS PRO, Fedora 31, git mesa, kernel.org 5.6.x, etc
amdgpu.gttsize=8192 amdgpu.lockup_timeout=1000 amdgpu.gpu_recovery=1 amdgpu.noretry=0 amdgpu.ppfeaturemask=0xfffd3fff
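Since comments #18 and #19 use different option sets with different outcomes, it may help to see exactly where they differ. The following is a rough sketch that parses the two option strings as posted and lists the amdgpu parameters that are not identical; `amdgpu_params` and `diff_params` are hypothetical helper names, and the diff says nothing about which difference (if any) matters for the hang.

```python
from typing import Dict, Optional, Tuple


def amdgpu_params(cmdline: str) -> Dict[str, str]:
    """Extract amdgpu.<name>=<value> options from a kernel command line."""
    params = {}
    for token in cmdline.split():
        if token.startswith("amdgpu.") and "=" in token:
            key, value = token[len("amdgpu."):].split("=", 1)
            params[key] = value
    return params


def diff_params(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each differing parameter to its (a, b) values; None means unset."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Option strings copied verbatim from comments #18 and #19.
COMMENT_18 = (
    "amdgpu.ppfeaturemask=0xfffd7fff amdgpu.noretry=0 amdgpu.lockup_timeout=0 "
    "amdgpu.gpu_recovery=1 amdgpu.audio=0 amdgpu.deep_color=1 amd_iommu=on iommu=pt"
)
COMMENT_19 = (
    "amdgpu.gttsize=8192 amdgpu.lockup_timeout=1000 amdgpu.gpu_recovery=1 "
    "amdgpu.noretry=0 amdgpu.ppfeaturemask=0xfffd3fff"
)

if __name__ == "__main__":
    changes = diff_params(amdgpu_params(COMMENT_18), amdgpu_params(COMMENT_19))
    for name, (v18, v19) in changes.items():
        print(f"{name}: comment #18 = {v18}, comment #19 = {v19}")
```

Both setups share `noretry=0` and `gpu_recovery=1`; they differ in `lockup_timeout`, `ppfeaturemask`, and in the options only one side sets. Note too that the hardware and distribution are not the same in the two reports.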
--- Comment #20 from priit@ww.ee (priit@ww.ee) --- kernel 5.8.0-2-MANJARO; Vega 64 GPU; mesa 20.1.5; xf86-video-amdgpu 19.1.0
aug 20 12:58:47 Zen kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
aug 20 12:58:47 Zen kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
aug 20 12:58:52 Zen kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
aug 20 12:58:52 Zen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=158674961, emitted seq=158674963
aug 20 12:58:52 Zen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 933 thread Xorg:cs0 pid 941
aug 20 12:58:53 Zen kernel: amdgpu: [powerplay] Failed to send message: 0x63, ret value: 0xffffffff
aug 20 12:58:53 Zen kernel: amdgpu: [powerplay] Failed to send message: 0x26, ret value: 0xffffffff
aug 20 12:58:53 Zen kernel: amdgpu: [powerplay] Failed to send message: 0x61, ret value: 0xffffffff
aug 20 12:58:53 Zen kernel: amdgpu: [powerplay] Failed message: 0x37, input parameter: 0x0, error code: 0xffffffff
aug 20 12:58:53 Zen kernel: amdgpu: [powerplay] Failed to send message: 0x63, ret value: 0xffffffff
aug 20 12:58:53 Zen kernel: amdgpu: [powerplay] Failed to send message: 0x26, ret value: 0xffffffff
aug 20 12:58:53 Zen kernel: amdgpu: [powerplay] Failed to send message: 0x61, ret value: 0xffffffff
aug 20 12:58:53 Zen kernel: amdgpu: [powerplay] Failed message: 0x37, input parameter: 0x0, error code: 0xffffffff
aug 20 12:58:56 Zen systemd-coredump[109412]: Process 933 (Xorg) of user 0 dumped core.
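For anyone comparing logs across kernels, here is a small sketch that counts the two error signatures seen throughout this report and pulls out the sequence numbers from the ring timeout line. The regular expressions are written against the messages quoted in this thread only and may need adjusting for other kernel versions; the gap between the emitted and signaled sequence numbers is reported as-is, and reading it as "fences emitted but not yet signaled" is an assumption on my part.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Patterns derived from the log lines quoted in this report.
SIGNATURES = {
    "fence_timeout": re.compile(
        r"amdgpu_dm_atomic_commit_tail.*Waiting for fences timed out"
    ),
    "ring_timeout": re.compile(r"amdgpu_job_timedout.*ring \w+ timeout"),
}
SEQ = re.compile(r"signaled seq=(\d+), emitted seq=(\d+)")


def summarize(lines: Iterable[str]) -> Tuple[Counter, List[int]]:
    """Count known error signatures and collect emitted-minus-signaled gaps."""
    counts: Counter = Counter()
    gaps: List[int] = []
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
        match = SEQ.search(line)
        if match:
            signaled, emitted = (int(g) for g in match.groups())
            gaps.append(emitted - signaled)
    return counts, gaps


# A few lines copied from comment #20.
SAMPLE = """\
aug 20 12:58:47 Zen kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
aug 20 12:58:47 Zen kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
aug 20 12:58:52 Zen kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
aug 20 12:58:52 Zen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=158674961, emitted seq=158674963
aug 20 12:58:52 Zen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 933 thread Xorg:cs0 pid 941
""".splitlines()

if __name__ == "__main__":
    counts, gaps = summarize(SAMPLE)
    print(dict(counts), gaps)
```

The same function can be fed saved `dmesg` or `journalctl -k` output, for example `summarize(open("dmesg.txt"))`, where `dmesg.txt` is a placeholder file name.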