On Thu, Aug 5, 2021 at 2:14 PM Linus Torvalds <torvalds@linux-foundation.org> wrote:
This might possibly have been fixed already by the previous drm pull, but I wanted to report it anyway, just in case.
It happened after an uptime of over a week, so it might not be trivial to reproduce.
It's a NULL pointer dereference in dc_stream_retain() with the code being
lock xadd %eax,0x390(%rdi) <-- trapping instruction
and that's just the
kref_get(&stream->refcount);
with a NULL 'stream' argument.
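To illustrate why a NULL 'stream' traps at that instruction: with %rdi == 0, the operand 0x390(%rdi) resolves to address 0x390, which is presumably the offset of the refcount inside the stream structure. Below is a minimal user-space sketch of that arithmetic plus a defensive retain helper. The names `demo_stream` and `demo_stream_retain` and the padding layout are hypothetical stand-ins, not the amdgpu definitions, and a C11 atomic stands in for the kernel's kref.

```c
#include <assert.h>
#include <stddef.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the driver's stream object. The padding exists
 * only so that the refcount lands at offset 0x390, as in the oops. */
struct demo_stream {
    char pad[0x390];
    atomic_int refcount;   /* stands in for the kernel's struct kref */
};

/* With stream == NULL, &stream->refcount would be address 0x390. */
static_assert(offsetof(struct demo_stream, refcount) == 0x390,
              "refcount is expected at offset 0x390");

/* Defensive retain: refuse a NULL stream instead of faulting.
 * Returns 0 on success, -1 if stream is NULL. */
int demo_stream_retain(struct demo_stream *stream)
{
    if (stream == NULL)
        return -1;
    /* Atomic increment, analogous to the lock xadd in the oops. */
    atomic_fetch_add(&stream->refcount, 1);
    return 0;
}
```

A guard like this would only hide the symptom; why a NULL stream ended up in the state being copied is the question for the display folks.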
Call Trace:
  dc_resource_state_copy_construct+0x13f/0x190 [amdgpu]
  amdgpu_dm_atomic_commit_tail+0xd5/0x1540 [amdgpu]
  commit_tail+0x97/0x180 [drm_kms_helper]
  process_one_work+0x1df/0x3a0
The oops is followed by a stream of
[drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:55:crtc-1] hw_done or flip_done timed out
and the machine was not usable afterwards.
lspci says this is a
49:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df] (rev e7) (prog-if 00 [VGA controller])
Full oops in the attachment, but I think the above is all the really salient details.
Thanks for the report. Adding some display folks to take a look.
Alex