https://bugzilla.kernel.org/show_bug.cgi?id=209713
Bug ID: 209713
Summary: amdgpu drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_link_encoder.c:483 dcn10_get_dig_frontend+0x9e/0xc0 [amdgpu] when resuming from S3 state
Product: Drivers
Version: 2.5
Kernel Version: 5.8.13-arch1-1
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: low
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: samy@lahfa.xyz
Regression: No
Created attachment 293025
  --> https://bugzilla.kernel.org/attachment.cgi?id=293025&action=edit
parts of dmesg where the call trace happens during the resume from S3 sleep state.
I suspect this bug is a regression, since I have not seen this call trace on kernels older than 5.8.12-arch1-1, but I have yet to confirm this.

The call trace may also occur only in a specific setup: my computer has a USB-C dock plugged in, and the call trace appeared when the computer was suspended and then resumed with the dock attached.

It is a Lenovo ThinkPad T495, model 20NKS28F00, with an AMD Ryzen 7 3700U and Radeon RX Vega 10 graphics.

In further comments I will confirm whether the call trace happens only when the USB-C dock is plugged in, and whether it also happens on kernels older than 5.8.12-arch1-1.

The computer does resume successfully, with only a momentary screen glitch, so this is not a very severe bug.
Lahfa Samy (samy@lahfa.xyz) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |UNREPRODUCIBLE
--- Comment #1 from Lahfa Samy (samy@lahfa.xyz) ---
I cannot reproduce this call trace on the new kernel 5.9.1, so I assume this issue was silently fixed. I'll reopen the bug if the call trace shows up again someday.
--- Comment #2 from Michel Dänzer (michel@daenzer.net) ---
I'm still hitting this (when fbcon is initialized) with the DRM code queued for 5.10.
Lahfa Samy (samy@lahfa.xyz) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|UNREPRODUCIBLE              |---
Klaus Mueller (kmueller@justmail.de) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kmueller@justmail.de
--- Comment #3 from Klaus Mueller (kmueller@justmail.de) ---
I'm hitting this problem, too, after resume from s2ram.

- Linux 5.10.1
- CPU: AMD Ryzen 7 3750H with Radeon Vega Mobile Gfx
- Xorg 1.20
- Mesa 20.2

See attached file dcn10_get_dig_frontend.log
--- Comment #4 from Klaus Mueller (kmueller@justmail.de) ---
Created attachment 294343
  --> https://bugzilla.kernel.org/attachment.cgi?id=294343&action=edit
Trace linux 5.10.1

See the previous comment.
--- Comment #5 from Klaus Mueller (kmueller@justmail.de) ---
Seems to be fixed for me since the last firmware update for the Picasso driver:

- xf86-video-amdgpu-19.1.0-lp152.67.5.x86_64
- kernel-firmware-20201218-lp152.36.1.noarch
Oliver Reeh (oliver@diereehs.de) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |oliver@diereehs.de
--- Comment #6 from Oliver Reeh (oliver@diereehs.de) ---
The same happens on kernel 5.10.7 with kernel-firmware-20210109_d528862.

- CPU: AMD Ryzen 5 3500U with Radeon Vega Mobile Gfx (family: 0x17, model: 0x18, stepping: 0x1)
- Xorg 1.20.10
- Mesa 20.3.2
--- Comment #7 from Klaus Mueller (kmueller@justmail.de) ---
Yeah, it came back yesterday evening, after it had disappeared for about one week.
--- Comment #8 from Oliver Reeh (oliver@diereehs.de) ---
It's fixed in kernel 5.10.9 with Mesa 20.3.3.
--- Comment #9 from Klaus Mueller (kmueller@justmail.de) ---
No, I see the same behavior as before with 5.10.9 and Mesa 20.3.3.
--- Comment #10 from Klaus Mueller (kmueller@justmail.de) ---
Oops, it's the other crash now:

2021-01-23T18:45:31.955962+01:00 localhost kernel: [23110.401847] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:62:crtc-0] flip_done timed out
2021-01-23T18:45:31.955989+01:00 localhost kernel: [23110.401869] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:62:crtc-0] flip_done timed out
2021-01-23T18:45:42.709289+01:00 localhost kernel: [23121.153848] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:52:plane-3] flip_done timed out
2021-01-23T18:45:42.709318+01:00 localhost kernel: [23121.153944] ------------[ cut here ]------------
2021-01-23T18:45:42.709320+01:00 localhost kernel: [23121.154112] WARNING: CPU: 4 PID: 2627 at ../drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7272 amdgpu_dm_atomic_commit_tail+0x22b1/0x2360 [amdgpu]

Let's wait ...
--- Comment #11 from Oliver Reeh (oliver@diereehs.de) ---
The problem is back with kernel 5.10.10.
[ 89.664494] WARNING: CPU: 6 PID: 4323 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_link_encoder.c:483 dcn10_get_dig_frontend+0x94/0xc0 [amdgpu]
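For anyone comparing traces across kernel versions (the offset is +0x9e in the original report and +0x94 here), below is a minimal sketch that pulls the source location and symbol offset out of a WARNING line like the one above. The helper name `parse_warning` and the regular expression are my own assumptions for illustration; this is not part of any kernel tooling, and it only covers the line layout seen in this thread.

```python
import posixpath
import re

# Assumed layout, taken from the traces quoted in this bug:
# "WARNING: CPU: <n> PID: <n> at <file>:<line> <symbol>+0x<off>/0x<size> [<module>]"
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_warning(text):
    """Return a dict describing the first WARNING line in text, or None."""
    match = WARN_RE.search(text)
    if match is None:
        return None
    info = match.groupdict()
    # Collapse the "amdgpu/../display" detour so paths compare cleanly.
    info["file"] = posixpath.normpath(info["file"])
    for key in ("cpu", "pid", "line"):
        info[key] = int(info[key])
    for key in ("offset", "size"):
        info[key] = int(info[key], 16)
    return info


if __name__ == "__main__":
    sample = (
        "[ 89.664494] WARNING: CPU: 6 PID: 4323 at "
        "drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_link_encoder.c:483 "
        "dcn10_get_dig_frontend+0x94/0xc0 [amdgpu]"
    )
    print(parse_warning(sample))
```

Feeding it the output of `dmesg` before and after a kernel update should make it easier to tell whether two reports hit the same warning site or a different one.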
Frank Kruger (fkrueger@mailbox.org) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fkrueger@mailbox.org
--- Comment #12 from Frank Kruger (fkrueger@mailbox.org) ---
I am seeing the aforementioned warning at boot for kernel >= 5.10.10, with kernel-firmware-amdgpu-20210119 (AMD Ryzen 7 PRO 4750U). Kernel 5.10.9 does not have it.
--- Comment #13 from Frank Kruger (fkrueger@mailbox.org) ---
The only change regarding "DCN" from 5.10.9 to 5.10.10 is:

commit 99ea120383b19feb1737c787dc1c8b35ce630fc5
Author: Alex Deucher <alexander.deucher@amd.com>
Date: Mon Jan 4 11:24:20 2021 -0500
drm/amdgpu/display: drop DCN support for aarch64
commit c241ed2f0ea549c18cff62a3708b43846b84dae3 upstream.
From Ard:
"Simply disabling -mgeneral-regs-only left and right is risky, given that the standard AArch64 ABI permits the use of FP/SIMD registers anywhere, and GCC is known to use SIMD registers for spilling, and may invent other uses of the FP/SIMD register file that have nothing to do with the floating point code in question. Note that putting kernel_neon_begin() and kernel_neon_end() around the code that does use FP is not sufficient here, the problem is in all the other code that may be emitted with references to SIMD registers in it.
So the only way to do this properly is to put all floating point code in a separate compilation unit, and only compile that unit with -mgeneral-regs-only."
Disable support until the code can be properly refactored to support this properly on aarch64.
Acked-by: Will Deacon <will@kernel.org>
Reported-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[ardb: backport to v5.10 by reverting c38d444e44badc55 instead]
Acked-by: Alex Deucher <alexander.deucher@amd.com> # v5.10 backport
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Any idea?
--- Comment #14 from Michel Dänzer (michel@daenzer.net) ---
(In reply to Frank Kruger from comment #12)
> I am seeing the aforementioned warning at boot for kernel >= 5.10.10, with
> kernel-firmware-amdgpu-20210119 (AMD Ryzen 7 PRO 4750U). Kernel 5.10.9 does
> not have it.

This was originally reported for older kernels, and per comment 2, I was hitting it with the DRM code merged for 5.10 before 5.10-rc1. You probably just didn't hit it with 5.10.9 by luck.
Erik Quaeghebeur (kernelbugs@equaeghe.nospammail.net) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kernelbugs@equaeghe.nospammail.net
--- Comment #15 from Erik Quaeghebeur (kernelbugs@equaeghe.nospammail.net) ---
Created attachment 295319
  --> https://bugzilla.kernel.org/attachment.cgi?id=295319&action=edit
Trace for 5.10.16 at boot

(In reply to Frank Kruger from comment #12)
> I am seeing the aforementioned warning at boot for kernel >= 5.10.10, with
> kernel-firmware-amdgpu-20210119 (AMD Ryzen 7 PRO 4750U). […]

I can confirm this for 5.10.16.
--- Comment #16 from Oliver Reeh (oliver@diereehs.de) ---
Looks like this is fixed in 5.11.0 and 5.11.1.
--- Comment #17 from Lahfa Samy (samy@lahfa.xyz) ---
(In reply to Oliver Reeh from comment #16)
> Looks like this is fixed in 5.11.0 and 5.11.1.

I'm still getting this issue reliably under kernel 5.11.1 when resuming from a suspended state, so I can confirm that it is still not solved in 5.11.1.
--- Comment #18 from Lahfa Samy (samy@lahfa.xyz) ---
Created attachment 295425
  --> https://bugzilla.kernel.org/attachment.cgi?id=295425&action=edit
Trace after resume from S3 state on 5.11.1
--- Comment #19 from Klaus Mueller (kmueller@justmail.de) ---
I haven't seen the problem since 2021-02-14, running Linux 5.10.16 with this patch applied:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree...

I hope this really fixed the problem for me.
--- Comment #20 from Oliver Reeh (oliver@diereehs.de) ---
Still no problem here with the 5.11.x kernels.