Hi,
I have noticed that on an imx53-qsb, it is no longer possible to reboot the system as it fails like this:
Requesting system reboot [ 23.819116] cfg80211: failed to load regulatory.db [ 23.827569] imx-sdma 63fb0000.sdma: external firmware not found, using ROM firmware [ 23.956838] ci_hdrc ci_hdrc.0: remove, state 1 [ 23.968029] usb usb1: USB disconnect, device number 1 [ 23.976033] usb 1-1: USB disconnect, device number 2 [ 24.234253] ci_hdrc ci_hdrc.0: USB bus 1 deregistered [ 24.268964] 8<--- cut here --- [ 24.274602] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 24.283434] pgd = (ptrval) [ 24.286387] [00000000] *pgd=ca212831 [ 24.290788] Internal error: Oops: 17 [#1] SMP ARM [ 24.295609] Modules linked in: [ 24.298777] CPU: 0 PID: 197 Comm: init Not tainted 5.11.0-rc2-next-20210111 #333 [ 24.306276] Hardware name: Freescale i.MX53 (Device Tree Support) [ 24.312442] PC is at msm_atomic_commit_tail+0x54/0xb9c [ 24.317743] LR is at commit_tail+0xa4/0x1b0 [ 24.322032] pc : [<c0769b78>] lr : [<c06db0b4>] psr: 60000013 [ 24.328374] sp : c28d1d50 ip : c23a3000 fp : 00000000 [ 24.333670] r10: c2816780 r9 : c12d71c0 r8 : c17fb018 [ 24.338967] r7 : c23a3000 r6 : c2816780 r5 : 00000000 r4 : 00000000 [ 24.345572] r3 : c24c2c00 r2 : c23a3000 r1 : c0769b24 r0 : 00000000 [ 24.352177] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 24.359407] Control: 10c5387d Table: 72858019 DAC: 00000051 [ 24.365220] Process init (pid: 197, stack limit = 0x(ptrval)) [ 24.371052] Stack: (0xc28d1d50 to 0xc28d2000) [ 24.375508] 1d40: ffffffff 00000000 9682f000 00000005 [ 24.383794] 1d60: 3031e53d 00000000 00000dc0 c0f816d8 c23a3000 c23a3000 00000000 c17fb018 [ 24.392079] 1d80: c12d71c0 c2816780 00000000 c06db0b4 a5f7faba 00000005 00000000 c2816780 [ 24.400363] 1da0: 00000000 c23a3000 00000000 c17fb018 c12d71c0 c24c20a0 00000000 c06dbed0 [ 24.408647] 1dc0: 00000000 00000000 c2816780 c23a349c c2816780 c28d1dfc c23a34a4 c06db604 [ 24.416932] 1de0: c23a3000 00000000 c1609388 c12ba9dc c17fb018 c06db704 c2965e80 c2965e80 [ 24.425214] 1e00: 00000008 00000001 00000000 00000000 c175f454 00000000 c175f458 c1c669cc [ 24.433498] 1e20: 00000000 c12bebb8 00000000 00000001 00000008 00000000 c23a32ec c23a32ec [ 24.441783] 1e40: 00000000 433f193b c24c2014 c24c2014 c24c2010 c17674c8 c1e68bec c07c76e8 [ 24.450067] 1e60: 00000000 c16158d8 c1609388 fee1dead 00000000 c28d0000 00000058 c0153730 [ 24.458350] 1e80: 01234567 c01539d4 fffffffe 00000000 00000000 00000000 00000000 00000000 [ 24.466633] 1ea0: 00000000 00000000 00000000 00000000 00000000 ffffffff 00000000 c1609388 [ 24.474917] 1ec0: c29663c8 00000000 00000000 c0e17954 ffffe000 ffffffff 00000001 ffffffff [ 24.483200] 1ee0: c1609388 c1609388 c16093d4 433f193b 00000000 c1581584 ffffe000 1ea51000 [ 24.491485] 1f00: 00000001 00000080 c1609388 c1609794 c29663c8 00000000 00000000 c0e17954 [ 24.499769] 1f20: 00000000 ffffffff 00000000 c1609388 c1609388 c16093d4 00000000 c1609388 [ 24.508054] 1f40: c29663c8 c0183f24 c2965e80 c1609388 00000001 c1609794 c2995090 c018ce7c [ 24.516337] 1f60: 00000001 c2995080 c0136e80 c010012c 00000000 00000001 c158b21c c0e22334 [ 24.524622] 1f80: c158b21c c010019c c1609794 433f193b 00000000 beefefd4 00000001 00000058 [ 24.532907] 1fa0: c0100264 c0100080 00000000 beefefd4 fee1dead 28121969 01234567 00000000 [ 24.541191] 1fc0: 00000000 beefefd4 00000001 00000058 00000000 00000000 b6f1ef74 00000000 [ 24.549476] 1fe0: 000d7298 beefed40 00091a48 b6e8894c 60000010 fee1dead 00000000 00000000 [ 24.557742] [<c0769b78>] (msm_atomic_commit_tail) from [<c06db0b4>] (commit_tail+0xa4/0x1b0) [ 24.566349] [<c06db0b4>] (commit_tail) from [<c06dbed0>] (drm_atomic_helper_commit+0x154/0x188) [ 24.575193] [<c06dbed0>] (drm_atomic_helper_commit) from [<c06db604>] (drm_atomic_helper_disable_all+0x154/0x1c0) [ 24.585599] [<c06db604>] (drm_atomic_helper_disable_all) from [<c06db704>] (drm_atomic_helper_shutdown+0x94/0x12c) [ 24.596094] [<c06db704>] (drm_atomic_helper_shutdown) from [<c07c76e8>] (device_shutdown+0x118/0x250) [ 24.605475] [<c07c76e8>] (device_shutdown) from [<c0153730>] (kernel_restart+0xc/0x68) [ 24.613574] [<c0153730>] (kernel_restart) from [<c01539d4>] (__do_sys_reboot+0x144/0x200) [ 24.621915] [<c01539d4>] (__do_sys_reboot) from [<c0100080>] (ret_fast_syscall+0x0/0x2c) [ 24.630160] Exception stack(0xc28d1fa8 to 0xc28d1ff0) [ 24.635315] 1fa0: 00000000 beefefd4 fee1dead 28121969 01234567 00000000 [ 24.643600] 1fc0: 00000000 beefefd4 00000001 00000058 00000000 00000000 b6f1ef74 00000000 [ 24.651867] 1fe0: 000d7298 beefed40 00091a48 b6e8894c [ 24.657025] Code: 1592208c 1185521c e1530000 1afffff8 (e5942000) [ 24.663681] ---[ end trace 9a1e129deec83f42 ]--- [ 25.670432] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000 [ 25.678331] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000 ]---
It happens on 5.4 as well as 5.11-rc2.
Any ideas?
Thanks,
Fabio Estevam
Adding some more folks in case anyone has any suggestions to fix this reboot hang.
Thanks
On Tue, Jan 12, 2021 at 5:07 PM Fabio Estevam festevam@gmail.com wrote:
Hi,
I have noticed that on an imx53-qsb, it is no longer possible to reboot the system as it fails like this:
Requesting system reboot [ 23.819116] cfg80211: failed to load regulatory.db [ 23.827569] imx-sdma 63fb0000.sdma: external firmware not found, using ROM firmware [ 23.956838] ci_hdrc ci_hdrc.0: remove, state 1 [ 23.968029] usb usb1: USB disconnect, device number 1 [ 23.976033] usb 1-1: USB disconnect, device number 2 [ 24.234253] ci_hdrc ci_hdrc.0: USB bus 1 deregistered [ 24.268964] 8<--- cut here --- [ 24.274602] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 24.283434] pgd = (ptrval) [ 24.286387] [00000000] *pgd=ca212831 [ 24.290788] Internal error: Oops: 17 [#1] SMP ARM [ 24.295609] Modules linked in: [ 24.298777] CPU: 0 PID: 197 Comm: init Not tainted 5.11.0-rc2-next-20210111 #333 [ 24.306276] Hardware name: Freescale i.MX53 (Device Tree Support) [ 24.312442] PC is at msm_atomic_commit_tail+0x54/0xb9c [ 24.317743] LR is at commit_tail+0xa4/0x1b0 [ 24.322032] pc : [<c0769b78>] lr : [<c06db0b4>] psr: 60000013 [ 24.328374] sp : c28d1d50 ip : c23a3000 fp : 00000000 [ 24.333670] r10: c2816780 r9 : c12d71c0 r8 : c17fb018 [ 24.338967] r7 : c23a3000 r6 : c2816780 r5 : 00000000 r4 : 00000000 [ 24.345572] r3 : c24c2c00 r2 : c23a3000 r1 : c0769b24 r0 : 00000000 [ 24.352177] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 24.359407] Control: 10c5387d Table: 72858019 DAC: 00000051 [ 24.365220] Process init (pid: 197, stack limit = 0x(ptrval)) [ 24.371052] Stack: (0xc28d1d50 to 0xc28d2000) [ 24.375508] 1d40: ffffffff 00000000 9682f000 00000005 [ 24.383794] 1d60: 3031e53d 00000000 00000dc0 c0f816d8 c23a3000 c23a3000 00000000 c17fb018 [ 24.392079] 1d80: c12d71c0 c2816780 00000000 c06db0b4 a5f7faba 00000005 00000000 c2816780 [ 24.400363] 1da0: 00000000 c23a3000 00000000 c17fb018 c12d71c0 c24c20a0 00000000 c06dbed0 [ 24.408647] 1dc0: 00000000 00000000 c2816780 c23a349c c2816780 c28d1dfc c23a34a4 c06db604 [ 24.416932] 1de0: c23a3000 00000000 c1609388 c12ba9dc c17fb018 c06db704 c2965e80 c2965e80 [ 24.425214] 1e00: 00000008 00000001 00000000 00000000 c175f454 00000000 c175f458 c1c669cc [ 24.433498] 1e20: 00000000 c12bebb8 00000000 00000001 00000008 00000000 c23a32ec c23a32ec [ 24.441783] 1e40: 00000000 433f193b c24c2014 c24c2014 c24c2010 c17674c8 c1e68bec c07c76e8 [ 24.450067] 1e60: 00000000 c16158d8 c1609388 fee1dead 00000000 c28d0000 00000058 c0153730 [ 24.458350] 1e80: 01234567 c01539d4 fffffffe 00000000 00000000 00000000 00000000 00000000 [ 24.466633] 1ea0: 00000000 00000000 00000000 00000000 00000000 ffffffff 00000000 c1609388 [ 24.474917] 1ec0: c29663c8 00000000 00000000 c0e17954 ffffe000 ffffffff 00000001 ffffffff [ 24.483200] 1ee0: c1609388 c1609388 c16093d4 433f193b 00000000 c1581584 ffffe000 1ea51000 [ 24.491485] 1f00: 00000001 00000080 c1609388 c1609794 c29663c8 00000000 00000000 c0e17954 [ 24.499769] 1f20: 00000000 ffffffff 00000000 c1609388 c1609388 c16093d4 00000000 c1609388 [ 24.508054] 1f40: c29663c8 c0183f24 c2965e80 c1609388 00000001 c1609794 c2995090 c018ce7c [ 24.516337] 1f60: 00000001 c2995080 c0136e80 c010012c 00000000 00000001 c158b21c c0e22334 [ 24.524622] 1f80: c158b21c c010019c c1609794 433f193b 00000000 beefefd4 00000001 00000058 [ 24.532907] 1fa0: c0100264 c0100080 00000000 beefefd4 fee1dead 28121969 01234567 00000000 [ 24.541191] 1fc0: 00000000 beefefd4 00000001 00000058 00000000 00000000 b6f1ef74 00000000 [ 24.549476] 1fe0: 000d7298 beefed40 00091a48 b6e8894c 60000010 fee1dead 00000000 00000000 [ 24.557742] [<c0769b78>] (msm_atomic_commit_tail) from [<c06db0b4>] (commit_tail+0xa4/0x1b0) [ 24.566349] [<c06db0b4>] (commit_tail) from [<c06dbed0>] (drm_atomic_helper_commit+0x154/0x188) [ 24.575193] [<c06dbed0>] (drm_atomic_helper_commit) from [<c06db604>] (drm_atomic_helper_disable_all+0x154/0x1c0) [ 24.585599] [<c06db604>] (drm_atomic_helper_disable_all) from [<c06db704>] (drm_atomic_helper_shutdown+0x94/0x12c) [ 24.596094] [<c06db704>] (drm_atomic_helper_shutdown) from [<c07c76e8>] (device_shutdown+0x118/0x250) [ 24.605475] [<c07c76e8>] (device_shutdown) from [<c0153730>] (kernel_restart+0xc/0x68) [ 24.613574] [<c0153730>] (kernel_restart) from [<c01539d4>] (__do_sys_reboot+0x144/0x200) [ 24.621915] [<c01539d4>] (__do_sys_reboot) from [<c0100080>] (ret_fast_syscall+0x0/0x2c) [ 24.630160] Exception stack(0xc28d1fa8 to 0xc28d1ff0) [ 24.635315] 1fa0: 00000000 beefefd4 fee1dead 28121969 01234567 00000000 [ 24.643600] 1fc0: 00000000 beefefd4 00000001 00000058 00000000 00000000 b6f1ef74 00000000 [ 24.651867] 1fe0: 000d7298 beefed40 00091a48 b6e8894c [ 24.657025] Code: 1592208c 1185521c e1530000 1afffff8 (e5942000) [ 24.663681] ---[ end trace 9a1e129deec83f42 ]--- [ 25.670432] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000 [ 25.678331] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000 ]---
It happens on 5.4 as well as 5.11-rc2.
Any ideas?
Thanks,
Fabio Estevam
On Mon, Jan 18, 2021 at 6:44 PM Fabio Estevam festevam@gmail.com wrote:
Adding some more folks in case anyone has any suggestions to fix this reboot hang.
Not sure if this is a valid fix, but the change below makes reboot works correctly.
kmscube still works.
--- a/drivers/gpu/drm/msm/msm_atomic.c +++ b/drivers/gpu/drm/msm/msm_atomic.c @@ -207,8 +207,12 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state) struct msm_kms *kms = priv->kms; struct drm_crtc *async_crtc = NULL; unsigned crtc_mask = get_crtc_mask(state); - bool async = kms->funcs->vsync_time && - can_do_async(state, &async_crtc); + bool async; + + if (!kms) + return; + + async = kms->funcs->vsync_time && can_do_async(state, &async_crtc);
trace_msm_atomic_commit_tail_start(async, crtc_mask);
Any comments?
Thanks
On Mon, Jan 18, 2021 at 11:00 PM Fabio Estevam festevam@gmail.com wrote:
On Mon, Jan 18, 2021 at 6:44 PM Fabio Estevam festevam@gmail.com wrote:
Adding some more folks in case anyone has any suggestions to fix this reboot hang.
Not sure if this is a valid fix, but the change below makes reboot works correctly.
kmscube still works.
--- a/drivers/gpu/drm/msm/msm_atomic.c +++ b/drivers/gpu/drm/msm/msm_atomic.c @@ -207,8 +207,12 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state) struct msm_kms *kms = priv->kms; struct drm_crtc *async_crtc = NULL; unsigned crtc_mask = get_crtc_mask(state);
bool async = kms->funcs->vsync_time &&
can_do_async(state, &async_crtc);
bool async;
if (!kms)
return;
That looks a bit like a hack papering over the real issue.
From your report it sounds like earlier kernels worked, did you
attempt bisecting? Also for regressions put regressions into the subject, it's the magic work that gets much more attention. -Daniel
async = kms->funcs->vsync_time && can_do_async(state, &async_crtc); trace_msm_atomic_commit_tail_start(async, crtc_mask);
Any comments?
Thanks _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Mon, Jan 18, 2021 at 11:00 PM Daniel Vetter daniel@ffwll.ch wrote:
On Mon, Jan 18, 2021 at 11:00 PM Fabio Estevam festevam@gmail.com wrote:
On Mon, Jan 18, 2021 at 6:44 PM Fabio Estevam festevam@gmail.com wrote:
Adding some more folks in case anyone has any suggestions to fix this reboot hang.
Not sure if this is a valid fix, but the change below makes reboot works correctly.
kmscube still works.
--- a/drivers/gpu/drm/msm/msm_atomic.c +++ b/drivers/gpu/drm/msm/msm_atomic.c @@ -207,8 +207,12 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state) struct msm_kms *kms = priv->kms; struct drm_crtc *async_crtc = NULL; unsigned crtc_mask = get_crtc_mask(state);
bool async = kms->funcs->vsync_time &&
can_do_async(state, &async_crtc);
bool async;
if (!kms)
return;
That looks a bit like a hack papering over the real issue.
From your report it sounds like earlier kernels worked, did you attempt bisecting? Also for regressions put regressions into the subject, it's the magic work that gets much more attention.
the root issue is how are we doing KMS stuff on imx (where drm/msm is only used for gpu).. which I think is this commit:
---------- commit 9d5cbf5fe46e350715389d89d0c350d83289a102 Author: Krishna Manikandan mkrishn@codeaurora.org AuthorDate: Mon Jun 1 16:33:22 2020 +0530 Commit: Rob Clark robdclark@chromium.org CommitDate: Tue Aug 18 08:09:01 2020 -0700
drm/msm: add shutdown support for display platform_driver
Define shutdown callback for display drm driver, so as to disable all the CRTCS when shutdown notification is received by the driver.
This change will turn off the timing engine so that no display transactions are requested while mmu translations are getting disabled during reboot sequence.
Signed-off-by: Krishna Manikandan mkrishn@codeaurora.org
Changes in v2: - Remove NULL check from msm_pdev_shutdown (Stephen Boyd) - Change commit text to reflect when this issue was uncovered (Sai Prakash Ranjan)
Signed-off-by: Rob Clark robdclark@chromium.org ----------
I suppose we should do the drm_atomic_helper_shutdown() conditionally? Or the helper should bail if there is no kms?
BR, -R
-Daniel
async = kms->funcs->vsync_time && can_do_async(state, &async_crtc); trace_msm_atomic_commit_tail_start(async, crtc_mask);
Any comments?
Thanks _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
forgot to CC Krishna
On Tue, Jan 19, 2021 at 8:34 AM Rob Clark robdclark@gmail.com wrote:
On Mon, Jan 18, 2021 at 11:00 PM Daniel Vetter daniel@ffwll.ch wrote:
On Mon, Jan 18, 2021 at 11:00 PM Fabio Estevam festevam@gmail.com wrote:
On Mon, Jan 18, 2021 at 6:44 PM Fabio Estevam festevam@gmail.com wrote:
Adding some more folks in case anyone has any suggestions to fix this reboot hang.
Not sure if this is a valid fix, but the change below makes reboot works correctly.
kmscube still works.
--- a/drivers/gpu/drm/msm/msm_atomic.c +++ b/drivers/gpu/drm/msm/msm_atomic.c @@ -207,8 +207,12 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state) struct msm_kms *kms = priv->kms; struct drm_crtc *async_crtc = NULL; unsigned crtc_mask = get_crtc_mask(state);
bool async = kms->funcs->vsync_time &&
can_do_async(state, &async_crtc);
bool async;
if (!kms)
return;
That looks a bit like a hack papering over the real issue.
From your report it sounds like earlier kernels worked, did you attempt bisecting? Also for regressions put regressions into the subject, it's the magic work that gets much more attention.
the root issue is how are we doing KMS stuff on imx (where drm/msm is only used for gpu).. which I think is this commit:
commit 9d5cbf5fe46e350715389d89d0c350d83289a102 Author: Krishna Manikandan mkrishn@codeaurora.org AuthorDate: Mon Jun 1 16:33:22 2020 +0530 Commit: Rob Clark robdclark@chromium.org CommitDate: Tue Aug 18 08:09:01 2020 -0700
drm/msm: add shutdown support for display platform_driver Define shutdown callback for display drm driver, so as to disable all the CRTCS when shutdown notification is received by the driver. This change will turn off the timing engine so that no display transactions are requested while mmu translations are getting disabled during reboot sequence. Signed-off-by: Krishna Manikandan <mkrishn@codeaurora.org> Changes in v2: - Remove NULL check from msm_pdev_shutdown (Stephen Boyd) - Change commit text to reflect when this issue was uncovered (Sai Prakash Ranjan) Signed-off-by: Rob Clark <robdclark@chromium.org>
I suppose we should do the drm_atomic_helper_shutdown() conditionally? Or the helper should bail if there is no kms?
BR, -R
-Daniel
async = kms->funcs->vsync_time && can_do_async(state, &async_crtc); trace_msm_atomic_commit_tail_start(async, crtc_mask);
Any comments?
Thanks _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Hi Rob,
On Tue, Jan 19, 2021 at 1:40 PM Rob Clark robdclark@gmail.com wrote:
I suppose we should do the drm_atomic_helper_shutdown() conditionally?
This suggestion works, thanks. I will submit a patch shortly.
Thanks
On Tue, Jan 19, 2021 at 6:36 PM Fabio Estevam festevam@gmail.com wrote:
Hi Rob,
On Tue, Jan 19, 2021 at 1:40 PM Rob Clark robdclark@gmail.com wrote:
I suppose we should do the drm_atomic_helper_shutdown() conditionally?
This suggestion works, thanks. I will submit a patch shortly.
I think the cleanest way to do this is 2 patches: - add checks for DRIVER_MODESET to these helpers (there's a function to check these flags) - filter out DRIVER_MODESET flag in drm_device.driver_features (this is used to substract features at load time so that the drm_device struct can stay const)
There's probably other drivers that'd need the same treatment (but they don't use these helpers yet). -Daniel
dri-devel@lists.freedesktop.org