https://bugzilla.kernel.org/show_bug.cgi?id=206519
Bug ID: 206519
Summary: [amdgpu] kernel NULL pointer dereference on shutdown
Product: Drivers
Version: 2.5
Kernel Version: 5.5.1.arch1-1, 5.5.3-arch1-1
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: shlomo@fastmail.com
Regression: No
Created attachment 287353
  --> https://bugzilla.kernel.org/attachment.cgi?id=287353&action=edit
shutdown screen photo
When I try to power off my machine, it shows the usual shutdown messages and the screens turn off, but the machine is still powered on. The virtual console shows a kernel NULL pointer dereference at address 0.
I run Arch Linux.
The bug occurs even if I never run X. I can turn on the machine and immediately try to shut it down, and the same bug still occurs.
This bug has occurred ever since I upgraded from linux 5.4.15.arch1-1 to 5.5.1.arch1-1. I now run linux 5.5.3.arch1-1 and the bug still exists.
My graphics card is a Gigabyte Radeon RX VEGA 56 GAMING OC 8G, connected to six monitors.
A photo of the screen at shutdown is attached. I think these are the relevant lines for this bug:
BUG: kernel NULL pointer dereference, address: 0
[...]
RIP: 0010:queue_work_on+0x17/0x40
Code: fd ff ff 44 89 e0 5d 41 5c c3 [...]
Call Trace:
 handle_hpd_rx_irq+0x26e/0x320 [amdgpu]
 ? _raw_spin_unlock_irq+0x1d/0x30
 dm_irq_work_func+0x49/0x60 [amdgpu]
 process_one_work+0x1e1/0x3d0
 [...]
--- Comment #1 from Shlomo (shlomo@fastmail.com) ---
Created attachment 287355
  --> https://bugzilla.kernel.org/attachment.cgi?id=287355&action=edit
dmesg after boot
Alex Deucher (alexdeucher@gmail.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alexdeucher@gmail.com
--- Comment #2 from Alex Deucher (alexdeucher@gmail.com) ---
Can you bisect?
--- Comment #3 from Shlomo (shlomo@fastmail.com) ---
The bug first occurs in Arch Linux 5.5.arch1-1, which set CONFIG_DRM_AMD_DC_HDCP=y [1].
Arch Linux 5.4.15.arch1-1 is good. Arch Linux 5.4.15.arch1-1 with CONFIG_DRM_AMD_DC_HDCP=y set (and no other changes) is bad.
Arch Linux 5.5.arch1-1 (and later) is bad. (CONFIG_DRM_AMD_DC_HDCP=y is set)
Testing the most recent Arch Linux kernel shows the same: Arch Linux 5.5.3.arch1 is bad. Arch Linux 5.5.3.arch1 with CONFIG_DRM_AMD_DC_HDCP unset is good.
This means that the bug was triggered by a change to the kernel config rather than by a change to the kernel source, so I don't know whether it counts as a regression.
[1] https://git.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/linu...
--- Comment #4 from Shlomo (shlomo@fastmail.com) ---
I bisected the bug.
The first bad commit is 96a3b32e67236f547cc8acd69d5a3cef125b2295 (drm/amd/display: only enable HDCP for DCN+) with ea268870d6f548d0661e896e9746673210c1fa79 (drm/amd/display: Add hdcp to Kconfig) cherry-picked on top of it.
(The previous commit da3fd7ac0bcf372cc57117bdfcd725cca7ef975a with ea268870d6f548d0661e896e9746673210c1fa79 cherry-picked on top of it is good.)
The call trace for this bug is the same as I posted above.
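
[Editor's sketch] The trace faults inside queue_work_on, and the first bad commit restricts HDCP to DCN+ hardware. One plausible reading, not confirmed anywhere in this thread, is that a work queue is only set up on DCN+ while an interrupt path still tries to queue work on it unconditionally. The user-space C model below illustrates that failure mode and a NULL guard. Every name in it (toy_workqueue, toy_display, toy_display_init, toy_queue_hdcp_work) is hypothetical and is not taken from the amdgpu driver or from the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel work queue. */
struct toy_workqueue {
    int pending; /* number of queued work items */
};

/* Hypothetical stand-in for per-device display state. */
struct toy_display {
    bool is_dcn;                   /* true for DCN-class hardware */
    struct toy_workqueue *hdcp_wq; /* NULL when HDCP was never set up */
};

/* Backing storage for the single work queue used by this model. */
static struct toy_workqueue hdcp_wq_storage;

/* Assumption: HDCP state is created only on DCN hardware. */
static void toy_display_init(struct toy_display *d, bool is_dcn)
{
    assert(d != NULL);
    d->is_dcn = is_dcn;
    d->hdcp_wq = is_dcn ? &hdcp_wq_storage : NULL;
}

/*
 * Queue one work item. Without the NULL check, a non-DCN device
 * would dereference address 0 here, which matches the shape of
 * the reported oops. Returns true if the work was queued.
 */
static bool toy_queue_hdcp_work(struct toy_display *d)
{
    if (d->hdcp_wq == NULL)
        return false; /* HDCP not initialized: skip the work */
    d->hdcp_wq->pending++;
    return true;
}
```

In this model, calling toy_queue_hdcp_work on a device initialized with is_dcn set to false returns false instead of crashing. Whether the actual fix guards the call site or changes the initialization is not stated in the thread.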
Shlomo (shlomo@fastmail.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[amdgpu] kernel NULL        |[amdgpu] kernel NULL
                   |pointer dereference on      |pointer dereference on
                   |shutdown                    |shutdown when
                   |                            |CONFIG_DRM_AMD_DC_HDCP=y
--- Comment #5 from Alex Deucher (alexdeucher@gmail.com) ---
Created attachment 287487
  --> https://bugzilla.kernel.org/attachment.cgi?id=287487&action=edit
possible fix
I think this patch should fix it.
--- Comment #6 from Shlomo (shlomo@fastmail.com) ---
Yes, this fixes the bug.
I applied your patch over linux v5.5, but I first had to modify it so it would apply:
-    drm_connector_attach_content_protection_property(&aconnector->base, true);
+    drm_connector_attach_content_protection_property(&aconnector->base, false);
Shlomo (shlomo@fastmail.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |CODE_FIX
--- Comment #7 from Shlomo (shlomo@fastmail.com) ---
Confirmed fixed on Arch Linux 5.6.2-arch1-2. Thanks.