Hi Linus,
Regular drm fixes pull, seems about the right size, lots of small fixes across the board, mostly amdgpu, but msm and i915 are in there along with panel and ttm. There is an rc3 backmerge due to some patches ending up in the gap between last and this week.
Dave.
drm-fixes-2021-07-30: drm fixes for 5.14-rc4
amdgpu:
- Fix resource leak in an error path
- Avoid stack contents exposure in error path
- pmops check fix for S0ix vs S3
- DCN 2.1 display fixes
- DCN 2.0 display fix
- Backlight control fix for laptops with HDR panels
- Maintainers updates

i915:
- Fix vbt port mask
- Fix around reading the right DSC disable fuse in display_ver 10
- Split display version 9 and 10 in intel_setup_outputs

msm:
- iommu fault display fix
- misc dp compliance fixes
- dpu reg sizing fix

panel:
- Fix bpc for ytc700tlag_05_201c

ttm:
- debugfs init fixes

The following changes since commit ff1176468d368232b684f75e82563369208bc371:
Linux 5.14-rc3 (2021-07-25 15:35:14 -0700)
are available in the Git repository at:
git://anongit.freedesktop.org/drm/drm tags/drm-fixes-2021-07-30
for you to fetch changes up to d28e2568ac26fff351c846bf74ba6ca5dded733e:
Merge tag 'amd-drm-fixes-5.14-2021-07-28' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes (2021-07-29 17:20:29 +1000)
----------------------------------------------------------------
drm fixes for 5.14-rc4

amdgpu:
- Fix resource leak in an error path
- Avoid stack contents exposure in error path
- pmops check fix for S0ix vs S3
- DCN 2.1 display fixes
- DCN 2.0 display fix
- Backlight control fix for laptops with HDR panels
- Maintainers updates

i915:
- Fix vbt port mask
- Fix around reading the right DSC disable fuse in display_ver 10
- Split display version 9 and 10 in intel_setup_outputs

msm:
- iommu fault display fix
- misc dp compliance fixes
- dpu reg sizing fix

panel:
- Fix bpc for ytc700tlag_05_201c

ttm:
- debugfs init fixes
----------------------------------------------------------------
Alex Deucher (1):
      drm/amdgpu/display: only enable aux backlight control for OLED panels

Bjorn Andersson (1):
      drm/msm/dp: Initialize the INTF_CONFIG register

Dale Zhao (1):
      drm/amd/display: ensure dentist display clock update finished in DCN20

Dave Airlie (4):
      Merge tag 'drm-msm-fixes-2021-07-27' of https://gitlab.freedesktop.org/drm/msm into drm-fixes
      Merge tag 'drm-misc-fixes-2021-07-28' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
      Merge tag 'drm-intel-fixes-2021-07-28' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
      Merge tag 'amd-drm-fixes-5.14-2021-07-28' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

Jagan Teki (1):
      drm/panel: panel-simple: Fix proper bpc for ytc700tlag_05_201c

Jason Ekstrand (1):
      drm/ttm: Initialize debugfs from ttm_global_init()

Jiri Kosina (2):
      drm/amdgpu: Fix resource leak on probe error path
      drm/amdgpu: Avoid printing of stack contents on firmware load error

Kuogee Hsieh (2):
      drm/msm/dp: use dp_ctrl_off_link_stream during PHY compliance test run
      drm/msm/dp: signal audio plugged change at dp_pm_resume

Lucas De Marchi (2):
      drm/i915: fix not reading DSC disable fuse in GLK
      drm/i915/display: split DISPLAY_VER 9 and 10 in intel_setup_outputs()

Pratik Vishwakarma (1):
      drm/amdgpu: Check pmops for desired suspend state

Rob Clark (1):
      drm/msm: Fix display fault handling

Robert Foss (1):
      drm/msm/dpu: Fix sm8250_mdp register length

Rodrigo Vivi (1):
      drm/i915/bios: Fix ports mask

Sean Paul (1):
      drm/msm/dp: Initialize dp->aux->drm_dev before registration

Simon Ser (1):
      maintainers: add bugs and chat URLs for amdgpu

Thomas Zimmermann (1):
      Merge drm/drm-fixes into drm-misc-fixes

Victor Lu (2):
      drm/amd/display: Guard DST_Y_PREFETCH register overflow in DCN21
      drm/amd/display: Add missing DCN21 IP parameter
 MAINTAINERS                                              |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c                 |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c               |  8 ++------
 drivers/gpu/drm/amd/amdgpu/psp_v12_0.c                   |  7 +++----
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c        |  4 ++--
 .../gpu/drm/amd/display/dc/clk_mgr/dcn20/dcn20_clk_mgr.c |  2 +-
 drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c    |  1 +
 .../drm/amd/display/dc/dml/dcn21/display_mode_vba_21.c   |  3 +++
 drivers/gpu/drm/i915/display/intel_bios.c                |  3 ++-
 drivers/gpu/drm/i915/display/intel_display.c             |  8 +++++++-
 drivers/gpu/drm/i915/intel_device_info.c                 |  9 +++++----
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c           |  2 +-
 drivers/gpu/drm/msm/dp/dp_catalog.c                      |  1 +
 drivers/gpu/drm/msm/dp/dp_ctrl.c                         |  2 +-
 drivers/gpu/drm/msm/dp/dp_display.c                      |  5 +++++
 drivers/gpu/drm/msm/msm_iommu.c                          | 11 ++++++++++-
 drivers/gpu/drm/panel/panel-simple.c                     |  2 +-
 drivers/gpu/drm/ttm/ttm_device.c                         | 12 ++++++++++++
 drivers/gpu/drm/ttm/ttm_module.c                         | 16 ----------------
 19 files changed, 61 insertions(+), 40 deletions(-)
The pull request you sent on Fri, 30 Jul 2021 11:11:27 +1000:
git://anongit.freedesktop.org/drm/drm tags/drm-fixes-2021-07-30
has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/764a5bc89b12b82c18ce7ca5d7c1b10dd748a440
Thank you!
This might possibly have been fixed already by the previous drm pull, but I wanted to report it anyway, just in case.
It happened after an uptime of over a week, so it might not be trivial to reproduce.
It's a NULL pointer dereference in dc_stream_retain() with the code being
lock xadd %eax,0x390(%rdi) <-- trapping instruction
and that's just the
kref_get(&stream->refcount);
with a NULL 'stream' argument.
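To illustrate why a NULL 'stream' faults at exactly that displacement, here is a minimal user-space sketch. Everything in it is an assumption made for illustration: struct stream_sketch is a hypothetical stand-in for the real stream structure, padded only so that its refcount member lands at offset 0x390, and stream_retain_checked() shows one possible defensive pattern, not the actual amdgpu fix. The refcount increment here is a plain increment, whereas the kernel's kref_get() is atomic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's kref (assumption, not the real type). */
struct kref_sketch {
    int count;
};

/* Hypothetical stand-in for the stream structure. The padding exists only so
 * that 'refcount' sits at offset 0x390, matching the trapping instruction. */
struct stream_sketch {
    unsigned char other_fields[0x390];
    struct kref_sketch refcount;
};

static_assert(offsetof(struct stream_sketch, refcount) == 0x390,
              "sketch layout must place refcount at 0x390");

/* Address the CPU touches for &stream->refcount given the stream base
 * (the value in %rdi). With a NULL base this is simply the member offset. */
uintptr_t refcount_fault_address(uintptr_t stream_base)
{
    return stream_base + offsetof(struct stream_sketch, refcount);
}

/* Defensive retain: refuse a NULL stream instead of dereferencing it.
 * Returns 0 on success, -1 if there was nothing to retain. */
int stream_retain_checked(struct stream_sketch *stream)
{
    if (stream == NULL)
        return -1;
    stream->refcount.count++; /* not atomic; the real kref_get() is */
    return 0;
}
```

With a zero base, refcount_fault_address() yields 0x390, which is the address a `lock xadd %eax,0x390(%rdi)` would access when %rdi is NULL.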
Call Trace:
 dc_resource_state_copy_construct+0x13f/0x190 [amdgpu]
 amdgpu_dm_atomic_commit_tail+0xd5/0x1540 [amdgpu]
 commit_tail+0x97/0x180 [drm_kms_helper]
 process_one_work+0x1df/0x3a0
the oops is followed by a stream of
[drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:55:crtc-1] hw_done or flip_done timed out
and the machine was not usable afterwards.
lspci says this is a
49:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df] (rev e7) (prog-if 00 [VGA controller])
Full oops in the attachment, but I think the above is all the really salient details.
Linus
On Thu, Aug 5, 2021 at 2:14 PM Linus Torvalds <torvalds@linux-foundation.org> wrote:
> This might possibly have been fixed already by the previous drm pull, but I wanted to report it anyway, just in case.
> It happened after an uptime of over a week, so it might not be trivial to reproduce.
> It's a NULL pointer dereference in dc_stream_retain() with the code being
> lock xadd %eax,0x390(%rdi) <-- trapping instruction
> and that's just the
> kref_get(&stream->refcount);
> with a NULL 'stream' argument.
> Call Trace:
>  dc_resource_state_copy_construct+0x13f/0x190 [amdgpu]
>  amdgpu_dm_atomic_commit_tail+0xd5/0x1540 [amdgpu]
>  commit_tail+0x97/0x180 [drm_kms_helper]
>  process_one_work+0x1df/0x3a0
> the oops is followed by a stream of
> [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:55:crtc-1] hw_done or flip_done timed out
> and the machine was not usable afterwards.
> lspci says this is a
> 49:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df] (rev e7) (prog-if 00 [VGA controller])
> Full oops in the attachment, but I think the above is all the really salient details.
Thanks for the report. Adding some display folks to take a look.
Alex
On Thu, Aug 5, 2021 at 8:14 PM Linus Torvalds <torvalds@linux-foundation.org> wrote:
> This might possibly have been fixed already by the previous drm pull, but I wanted to report it anyway, just in case.
> It happened after an uptime of over a week, so it might not be trivial to reproduce.
> It's a NULL pointer dereference in dc_stream_retain() with the code being
> lock xadd %eax,0x390(%rdi) <-- trapping instruction
> and that's just the
> kref_get(&stream->refcount);
> with a NULL 'stream' argument.
> Call Trace:
>  dc_resource_state_copy_construct+0x13f/0x190 [amdgpu]
>  amdgpu_dm_atomic_commit_tail+0xd5/0x1540 [amdgpu]
>  commit_tail+0x97/0x180 [drm_kms_helper]
>  process_one_work+0x1df/0x3a0
> the oops is followed by a stream of
> [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:55:crtc-1] hw_done or flip_done timed out
> and the machine was not usable afterwards.
Hm, that part is a bit disappointing, because the atomic modeset commit helpers are designed to recover from this (assuming we didn't fry the hw). But amdgpu does these waits in amdgpu_dm_atomic_check(), which is decidedly not great (you're not supposed to block on hw or a previous commit in atomic_check, ever, because it can be called by userspace in TEST_ONLY mode to figure out whether a desired config would work), and then returns that error to userspace, which is worse.
I guess that's another area where the integration between what atomic modeset expects and what the DC backend provides is suboptimal. I think we managed to fuse the data structures together fairly ok, but the check/commit flow and semantics are a bit of a struggle.
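As a rough illustration of the check/commit split described above, here is a small self-contained sketch. All names in it (hw_sketch, config_sketch, sketch_atomic_check(), sketch_commit_tail(), sketch_atomic_ioctl(), the size limit and the error value) are hypothetical and only model the idea; none of this is the DRM helper API or amdgpu code. The point it tries to show is that the check step only validates the requested configuration, so a TEST_ONLY request never waits on hardware, while any waiting is confined to the commit step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_EINVAL  22    /* hypothetical error value */
#define SKETCH_MAX_DIM 4096  /* hypothetical hardware limit */

/* Hypothetical hardware model: a pending flip plus a counter of waits. */
struct hw_sketch {
    bool flip_pending;
    int wait_calls;
    int width, height;
};

/* Hypothetical requested configuration. */
struct config_sketch {
    int width, height;
};

/* Check step: pure validation of the request. It never touches or waits
 * on the hardware, so it is safe to call for a TEST_ONLY request. */
int sketch_atomic_check(const struct config_sketch *cfg)
{
    if (cfg->width <= 0 || cfg->height <= 0)
        return -SKETCH_EINVAL;
    if (cfg->width > SKETCH_MAX_DIM || cfg->height > SKETCH_MAX_DIM)
        return -SKETCH_EINVAL;
    return 0;
}

/* Commit step: the only place that waits for a previous flip. */
void sketch_commit_tail(struct hw_sketch *hw, const struct config_sketch *cfg)
{
    assert(hw != NULL && cfg != NULL);
    if (hw->flip_pending) {
        hw->wait_calls++;          /* stands in for a blocking wait */
        hw->flip_pending = false;
    }
    hw->width = cfg->width;
    hw->height = cfg->height;
}

/* Entry point: always check; commit only when not in test-only mode. */
int sketch_atomic_ioctl(struct hw_sketch *hw, const struct config_sketch *cfg,
                        bool test_only)
{
    int ret = sketch_atomic_check(cfg);

    if (ret != 0 || test_only)
        return ret;
    sketch_commit_tail(hw, cfg);
    return 0;
}
```

In this model a test-only call with a flip still pending returns the validation result without incrementing wait_calls; only a real commit performs the wait.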
Anyway this was just an aside, I guess given the bug the driver wouldn't have recovered anyway.
-Daniel
> lspci says this is a
> 49:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df] (rev e7) (prog-if 00 [VGA controller])
> Full oops in the attachment, but I think the above is all the really salient details.
> Linus