if GPU components have failed to bind, shutdown callback would fail with the following backtrace. Add safeguard check to stop that oops from happening and allow the board to reboot.
[ 66.617046] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 66.626066] Mem abort info: [ 66.628939] ESR = 0x96000006 [ 66.632088] EC = 0x25: DABT (current EL), IL = 32 bits [ 66.637542] SET = 0, FnV = 0 [ 66.640688] EA = 0, S1PTW = 0 [ 66.643924] Data abort info: [ 66.646889] ISV = 0, ISS = 0x00000006 [ 66.650832] CM = 0, WnR = 0 [ 66.653890] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000107f81000 [ 66.660505] [0000000000000000] pgd=0000000100bb2003, p4d=0000000100bb2003, pud=0000000100897003, pmd=0000000000000000 [ 66.671398] Internal error: Oops: 96000006 [#1] PREEMPT SMP [ 66.677115] Modules linked in: [ 66.680261] CPU: 6 PID: 352 Comm: reboot Not tainted 5.11.0-rc2-00309-g79e3faa756b2 #38 [ 66.688473] Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT) [ 66.695347] pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--) [ 66.701507] pc : msm_atomic_commit_tail+0x78/0x4e0 [ 66.706437] lr : commit_tail+0xa4/0x184 [ 66.710381] sp : ffff8000108f3af0 [ 66.713791] x29: ffff8000108f3af0 x28: ffff418c44337000 [ 66.719242] x27: 0000000000000000 x26: ffff418c40a24490 [ 66.724693] x25: ffffd3a842a4f1a0 x24: 0000000000000008 [ 66.730146] x23: ffffd3a84313f030 x22: ffff418c444ce000 [ 66.735598] x21: ffff418c408a4980 x20: 0000000000000000 [ 66.741049] x19: 0000000000000000 x18: ffff800010710fbc [ 66.746500] x17: 000000000000000c x16: 0000000000000001 [ 66.751954] x15: 0000000000010008 x14: 0000000000000068 [ 66.757405] x13: 0000000000000001 x12: 0000000000000000 [ 66.762855] x11: 0000000000000001 x10: 00000000000009b0 [ 66.768306] x9 : ffffd3a843192000 x8 : ffff418c44337000 [ 66.773757] x7 : 0000000000000000 x6 : 00000000a401b34e [ 66.779210] x5 : 00ffffffffffffff x4 : 0000000000000000 [ 66.784660] x3 : 0000000000000000 x2 : ffff418c444ce000 [ 66.790111] x1 : ffffd3a841dce530 x0 : ffff418c444cf000 [ 66.795563] Call trace: [ 66.798075] msm_atomic_commit_tail+0x78/0x4e0 [ 66.802633] commit_tail+0xa4/0x184 [ 66.806217] drm_atomic_helper_commit+0x160/0x390 [ 66.811051] drm_atomic_commit+0x4c/0x60 [ 66.815082] drm_atomic_helper_disable_all+0x1f4/0x210 [ 66.820355] drm_atomic_helper_shutdown+0x80/0x130 [ 66.825276] msm_pdev_shutdown+0x14/0x20 [ 66.829303] platform_shutdown+0x28/0x40 [ 66.833330] device_shutdown+0x158/0x330 [ 66.837357] kernel_restart+0x40/0xa0 [ 66.841122] __do_sys_reboot+0x228/0x250 [ 66.845148] __arm64_sys_reboot+0x28/0x34 [ 66.849264] el0_svc_common.constprop.0+0x74/0x190 [ 66.854187] do_el0_svc+0x24/0x90 [ 66.857595] el0_svc+0x14/0x20 [ 66.860739] el0_sync_handler+0x1a4/0x1b0 [ 66.864858] el0_sync+0x174/0x180 [ 66.868269] Code: 1ac020a0 2a000273 eb02007f 54ffff01 (f9400285) [ 66.874525] ---[ end trace 20dedb2a3229fec8 ]---
Fixes: 9d5cbf5fe46e ("drm/msm: add shutdown support for display platform_driver") Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org --- drivers/gpu/drm/msm/msm_atomic.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c index 6a326761dc4a..2fd0cf6421ad 100644 --- a/drivers/gpu/drm/msm/msm_atomic.c +++ b/drivers/gpu/drm/msm/msm_atomic.c @@ -207,7 +207,12 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state) struct msm_kms *kms = priv->kms; struct drm_crtc *async_crtc = NULL; unsigned crtc_mask = get_crtc_mask(state); - bool async = kms->funcs->vsync_time && + bool async; + + if (!kms) + return; + + async = kms->funcs->vsync_time && can_do_async(state, &async_crtc);
trace_msm_atomic_commit_tail_start(async, crtc_mask);
On 2021-03-01 13:41, Dmitry Baryshkov wrote:
if GPU components have failed to bind, shutdown callback would fail with the following backtrace. Add safeguard check to stop that oops from happening and allow the board to reboot.
[ 66.617046] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 66.626066] Mem abort info: [ 66.628939] ESR = 0x96000006 [ 66.632088] EC = 0x25: DABT (current EL), IL = 32 bits [ 66.637542] SET = 0, FnV = 0 [ 66.640688] EA = 0, S1PTW = 0 [ 66.643924] Data abort info: [ 66.646889] ISV = 0, ISS = 0x00000006 [ 66.650832] CM = 0, WnR = 0 [ 66.653890] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000107f81000 [ 66.660505] [0000000000000000] pgd=0000000100bb2003, p4d=0000000100bb2003, pud=0000000100897003, pmd=0000000000000000 [ 66.671398] Internal error: Oops: 96000006 [#1] PREEMPT SMP [ 66.677115] Modules linked in: [ 66.680261] CPU: 6 PID: 352 Comm: reboot Not tainted 5.11.0-rc2-00309-g79e3faa756b2 #38 [ 66.688473] Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT) [ 66.695347] pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--) [ 66.701507] pc : msm_atomic_commit_tail+0x78/0x4e0 [ 66.706437] lr : commit_tail+0xa4/0x184 [ 66.710381] sp : ffff8000108f3af0 [ 66.713791] x29: ffff8000108f3af0 x28: ffff418c44337000 [ 66.719242] x27: 0000000000000000 x26: ffff418c40a24490 [ 66.724693] x25: ffffd3a842a4f1a0 x24: 0000000000000008 [ 66.730146] x23: ffffd3a84313f030 x22: ffff418c444ce000 [ 66.735598] x21: ffff418c408a4980 x20: 0000000000000000 [ 66.741049] x19: 0000000000000000 x18: ffff800010710fbc [ 66.746500] x17: 000000000000000c x16: 0000000000000001 [ 66.751954] x15: 0000000000010008 x14: 0000000000000068 [ 66.757405] x13: 0000000000000001 x12: 0000000000000000 [ 66.762855] x11: 0000000000000001 x10: 00000000000009b0 [ 66.768306] x9 : ffffd3a843192000 x8 : ffff418c44337000 [ 66.773757] x7 : 0000000000000000 x6 : 00000000a401b34e [ 66.779210] x5 : 00ffffffffffffff x4 : 0000000000000000 [ 66.784660] x3 : 0000000000000000 x2 : ffff418c444ce000 [ 66.790111] x1 : ffffd3a841dce530 x0 : ffff418c444cf000 [ 66.795563] Call trace: [ 66.798075] msm_atomic_commit_tail+0x78/0x4e0 [ 66.802633] commit_tail+0xa4/0x184 [ 66.806217] drm_atomic_helper_commit+0x160/0x390 [ 66.811051] drm_atomic_commit+0x4c/0x60 [ 66.815082] drm_atomic_helper_disable_all+0x1f4/0x210 [ 66.820355] drm_atomic_helper_shutdown+0x80/0x130 [ 66.825276] msm_pdev_shutdown+0x14/0x20 [ 66.829303] platform_shutdown+0x28/0x40 [ 66.833330] device_shutdown+0x158/0x330 [ 66.837357] kernel_restart+0x40/0xa0 [ 66.841122] __do_sys_reboot+0x228/0x250 [ 66.845148] __arm64_sys_reboot+0x28/0x34 [ 66.849264] el0_svc_common.constprop.0+0x74/0x190 [ 66.854187] do_el0_svc+0x24/0x90 [ 66.857595] el0_svc+0x14/0x20 [ 66.860739] el0_sync_handler+0x1a4/0x1b0 [ 66.864858] el0_sync+0x174/0x180 [ 66.868269] Code: 1ac020a0 2a000273 eb02007f 54ffff01 (f9400285) [ 66.874525] ---[ end trace 20dedb2a3229fec8 ]---
Fixes: 9d5cbf5fe46e ("drm/msm: add shutdown support for display platform_driver") Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org
Reviewed-by: Abhinav Kumar abhinavk@codeaurora.org
drivers/gpu/drm/msm/msm_atomic.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c index 6a326761dc4a..2fd0cf6421ad 100644 --- a/drivers/gpu/drm/msm/msm_atomic.c +++ b/drivers/gpu/drm/msm/msm_atomic.c @@ -207,7 +207,12 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state) struct msm_kms *kms = priv->kms; struct drm_crtc *async_crtc = NULL; unsigned crtc_mask = get_crtc_mask(state);
- bool async = kms->funcs->vsync_time &&
bool async;
if (!kms)
return;
async = kms->funcs->vsync_time && can_do_async(state, &async_crtc);
trace_msm_atomic_commit_tail_start(async, crtc_mask);
On Mon, Mar 1, 2021 at 1:41 PM Dmitry Baryshkov dmitry.baryshkov@linaro.org wrote:
if GPU components have failed to bind, shutdown callback would fail with the following backtrace. Add safeguard check to stop that oops from happening and allow the board to reboot.
[ 66.617046] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 66.626066] Mem abort info: [ 66.628939] ESR = 0x96000006 [ 66.632088] EC = 0x25: DABT (current EL), IL = 32 bits [ 66.637542] SET = 0, FnV = 0 [ 66.640688] EA = 0, S1PTW = 0 [ 66.643924] Data abort info: [ 66.646889] ISV = 0, ISS = 0x00000006 [ 66.650832] CM = 0, WnR = 0 [ 66.653890] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000107f81000 [ 66.660505] [0000000000000000] pgd=0000000100bb2003, p4d=0000000100bb2003, pud=0000000100897003, pmd=0000000000000000 [ 66.671398] Internal error: Oops: 96000006 [#1] PREEMPT SMP [ 66.677115] Modules linked in: [ 66.680261] CPU: 6 PID: 352 Comm: reboot Not tainted 5.11.0-rc2-00309-g79e3faa756b2 #38 [ 66.688473] Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT) [ 66.695347] pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--) [ 66.701507] pc : msm_atomic_commit_tail+0x78/0x4e0 [ 66.706437] lr : commit_tail+0xa4/0x184 [ 66.710381] sp : ffff8000108f3af0 [ 66.713791] x29: ffff8000108f3af0 x28: ffff418c44337000 [ 66.719242] x27: 0000000000000000 x26: ffff418c40a24490 [ 66.724693] x25: ffffd3a842a4f1a0 x24: 0000000000000008 [ 66.730146] x23: ffffd3a84313f030 x22: ffff418c444ce000 [ 66.735598] x21: ffff418c408a4980 x20: 0000000000000000 [ 66.741049] x19: 0000000000000000 x18: ffff800010710fbc [ 66.746500] x17: 000000000000000c x16: 0000000000000001 [ 66.751954] x15: 0000000000010008 x14: 0000000000000068 [ 66.757405] x13: 0000000000000001 x12: 0000000000000000 [ 66.762855] x11: 0000000000000001 x10: 00000000000009b0 [ 66.768306] x9 : ffffd3a843192000 x8 : ffff418c44337000 [ 66.773757] x7 : 0000000000000000 x6 : 00000000a401b34e [ 66.779210] x5 : 00ffffffffffffff x4 : 0000000000000000 [ 66.784660] x3 : 0000000000000000 x2 : ffff418c444ce000 [ 66.790111] x1 : ffffd3a841dce530 x0 : ffff418c444cf000 [ 66.795563] Call trace: [ 66.798075] msm_atomic_commit_tail+0x78/0x4e0 [ 66.802633] commit_tail+0xa4/0x184 [ 66.806217] drm_atomic_helper_commit+0x160/0x390 [ 66.811051] drm_atomic_commit+0x4c/0x60 [ 66.815082] drm_atomic_helper_disable_all+0x1f4/0x210 [ 66.820355] drm_atomic_helper_shutdown+0x80/0x130 [ 66.825276] msm_pdev_shutdown+0x14/0x20 [ 66.829303] platform_shutdown+0x28/0x40 [ 66.833330] device_shutdown+0x158/0x330 [ 66.837357] kernel_restart+0x40/0xa0 [ 66.841122] __do_sys_reboot+0x228/0x250 [ 66.845148] __arm64_sys_reboot+0x28/0x34 [ 66.849264] el0_svc_common.constprop.0+0x74/0x190 [ 66.854187] do_el0_svc+0x24/0x90 [ 66.857595] el0_svc+0x14/0x20 [ 66.860739] el0_sync_handler+0x1a4/0x1b0 [ 66.864858] el0_sync+0x174/0x180 [ 66.868269] Code: 1ac020a0 2a000273 eb02007f 54ffff01 (f9400285) [ 66.874525] ---[ end trace 20dedb2a3229fec8 ]---
Fixes: 9d5cbf5fe46e ("drm/msm: add shutdown support for display platform_driver") Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org
drivers/gpu/drm/msm/msm_atomic.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c index 6a326761dc4a..2fd0cf6421ad 100644 --- a/drivers/gpu/drm/msm/msm_atomic.c +++ b/drivers/gpu/drm/msm/msm_atomic.c @@ -207,7 +207,12 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state) struct msm_kms *kms = priv->kms; struct drm_crtc *async_crtc = NULL; unsigned crtc_mask = get_crtc_mask(state);
bool async = kms->funcs->vsync_time &&
bool async;
if (!kms)
return;
I think we could instead just check for null priv->kms in msm_pdev_shutdown() and not call drm_atomic_helper_shutdown()?
BR, -R
async = kms->funcs->vsync_time && can_do_async(state, &async_crtc); trace_msm_atomic_commit_tail_start(async, crtc_mask);
-- 2.30.1
On 17/03/2021 19:25, Rob Clark wrote:
On Mon, Mar 1, 2021 at 1:41 PM Dmitry Baryshkov dmitry.baryshkov@linaro.org wrote:
if GPU components have failed to bind, shutdown callback would fail with the following backtrace. Add safeguard check to stop that oops from happening and allow the board to reboot.
[skipped]
diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c index 6a326761dc4a..2fd0cf6421ad 100644 --- a/drivers/gpu/drm/msm/msm_atomic.c +++ b/drivers/gpu/drm/msm/msm_atomic.c @@ -207,7 +207,12 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state) struct msm_kms *kms = priv->kms; struct drm_crtc *async_crtc = NULL; unsigned crtc_mask = get_crtc_mask(state);
bool async = kms->funcs->vsync_time &&
bool async;
if (!kms)
return;
I think we could instead just check for null priv->kms in msm_pdev_shutdown() and not call drm_atomic_helper_shutdown()?
Good idea. Sending v2.
BR, -R
async = kms->funcs->vsync_time && can_do_async(state, &async_crtc); trace_msm_atomic_commit_tail_start(async, crtc_mask);
-- 2.30.1
Hi Dmitry,
On Mon, Mar 1, 2021 at 6:41 PM Dmitry Baryshkov dmitry.baryshkov@linaro.org wrote:
diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c index 6a326761dc4a..2fd0cf6421ad 100644 --- a/drivers/gpu/drm/msm/msm_atomic.c +++ b/drivers/gpu/drm/msm/msm_atomic.c @@ -207,7 +207,12 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state) struct msm_kms *kms = priv->kms; struct drm_crtc *async_crtc = NULL; unsigned crtc_mask = get_crtc_mask(state);
bool async = kms->funcs->vsync_time &&
bool async;
if (!kms)
return;
async = kms->funcs->vsync_time && can_do_async(state, &async_crtc);
I also see the same issue on a i.MX53: https://lists.freedesktop.org/archives/freedreno/2021-January/009369.html
Then I got a different suggestion from Rob. Please check:
https://www.spinics.net/lists/dri-devel/msg286648.html and https://www.spinics.net/lists/dri-devel/msg286649.html
Does this series fix the issue in your platform too?
On Fri, Mar 19, 2021 at 5:09 AM Fabio Estevam festevam@gmail.com wrote:
Hi Dmitry,
On Mon, Mar 1, 2021 at 6:41 PM Dmitry Baryshkov dmitry.baryshkov@linaro.org wrote:
diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c index 6a326761dc4a..2fd0cf6421ad 100644 --- a/drivers/gpu/drm/msm/msm_atomic.c +++ b/drivers/gpu/drm/msm/msm_atomic.c @@ -207,7 +207,12 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state) struct msm_kms *kms = priv->kms; struct drm_crtc *async_crtc = NULL; unsigned crtc_mask = get_crtc_mask(state);
bool async = kms->funcs->vsync_time &&
bool async;
if (!kms)
return;
async = kms->funcs->vsync_time && can_do_async(state, &async_crtc);
I also see the same issue on a i.MX53: https://lists.freedesktop.org/archives/freedreno/2021-January/009369.html
Then I got a different suggestion from Rob. Please check:
https://www.spinics.net/lists/dri-devel/msg286648.html and https://www.spinics.net/lists/dri-devel/msg286649.html
Does this series fix the issue in your platform too?
I think that might not help if something fails to probe due to (for example) a missing dependency, so !priv->kms is probably a better check to cover both cases. But the 2nd patch makes a good point, that the suspend/resume path probably needs the same treatment
BR, -R
Hi Rob,
On Fri, Mar 19, 2021 at 11:44 AM Rob Clark robdclark@gmail.com wrote:
I think that might not help if something fails to probe due to (for example) a missing dependency, so !priv->kms is probably a better check to cover both cases. But the 2nd patch makes a good point, that the suspend/resume path probably needs the same treatment
Thanks for the feedback. I will follow the same approach for fixing the suspend/resume path then.
Let me test it and then I will re-submit Dmitry's patch and the one for suspend/resume as part of a patch series.
Thanks
On Fri, Mar 19, 2021 at 8:13 AM Fabio Estevam festevam@gmail.com wrote:
Hi Rob,
On Fri, Mar 19, 2021 at 11:44 AM Rob Clark robdclark@gmail.com wrote:
I think that might not help if something fails to probe due to (for example) a missing dependency, so !priv->kms is probably a better check to cover both cases. But the 2nd patch makes a good point, that the suspend/resume path probably needs the same treatment
Thanks for the feedback. I will follow the same approach for fixing the suspend/resume path then.
Let me test it and then I will re-submit Dmitry's patch and the one for suspend/resume as part of a patch series.
Thanks,
BR, -R
On Fri, Mar 19, 2021 at 12:13 PM Fabio Estevam festevam@gmail.com wrote:
Thanks for the feedback. I will follow the same approach for fixing the suspend/resume path then.
Let me test it and then I will re-submit Dmitry's patch and the one for suspend/resume as part of a patch series.
This approach works here for the suspend/resume path too.
I have just submitted the series, thanks.
On Fri, 19 Mar 2021 at 19:25, Fabio Estevam festevam@gmail.com wrote:
On Fri, Mar 19, 2021 at 12:13 PM Fabio Estevam festevam@gmail.com wrote:
Thanks for the feedback. I will follow the same approach for fixing the suspend/resume path then.
Let me test it and then I will re-submit Dmitry's patch and the one for suspend/resume as part of a patch series.
This approach works here for the suspend/resume path too.
I have just submitted the series, thanks.
Thank you!
dri-devel@lists.freedesktop.org