On Mon, 2 May 2022 at 04:38, Abhinav Kumar <quic_abhinavk@quicinc.com> wrote:
Looks like our new CI has given all the answers we need :) which is a great win for the CI in my opinion.
Take a look at this report : https://gitlab.freedesktop.org/drm/msm/-/jobs/22015361
This issue seems to be because this change https://github.com/torvalds/linux/commit/169466d4e59ca204683998b7f45673ebf0e... is missing in our tree.
Without this change, what happens is that we are not hitting the return 0 because we check for ENODEV.
    /*
     * External bridges are mandatory for eDP interfaces: one has to
     * provide at least an eDP panel (which gets wrapped into panel-bridge).
     *
     * For DisplayPort interfaces external bridges are optional, so
     * silently ignore an error if one is not present (-ENODEV).
     */
    rc = dp_parser_find_next_bridge(dp_priv->parser);
    if (!dp->is_edp && rc == -ENODEV)
        return 0;
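To make the check above concrete, here is a minimal userspace sketch of how it filters the lookup result. The helper name filter_bridge_lookup() is hypothetical and not part of the driver. EPROBE_DEFER is defined by hand because it is a kernel-internal errno that userspace headers do not export. Only a DisplayPort lookup that returns exactly -ENODEV is turned into success; any other error is propagated to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal errno value; not available from userspace <errno.h>. */
#define EPROBE_DEFER 517

/*
 * Hypothetical model of the quoted check: given the result of the bridge
 * lookup, return what the caller would return.
 */
int filter_bridge_lookup(bool is_edp, int rc)
{
	/* DisplayPort: a missing external bridge is silently ignored. */
	if (!is_edp && rc == -ENODEV)
		return 0;

	/* Everything else, including -EPROBE_DEFER, reaches the caller. */
	return rc;
}
```

With this model, filter_bridge_lookup(false, -ENODEV) yields 0, while filter_bridge_lookup(false, -EPROBE_DEFER) yields -EPROBE_DEFER, so the caller takes its error path.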
So, I think we should do both:
- Since we are running CI on the tree, backport this change so that
this error path isn't hit?
- Add this protection as well, because this shows that we can indeed hit
this path in -EPROBE_DEFER cases, causing this crash.
I have been waiting for v2 for the last week or so. It should include a fixed Fixes tag and an updated description (which should note that this happens in the error path, etc) as requested by Stephen.
Thanks
Abhinav
On 4/27/2022 3:53 AM, Dmitry Baryshkov wrote:
On 27/04/2022 00:50, Stephen Boyd wrote:
Quoting Vinod Polimera (2022-04-25 23:02:11)
Avoid clearing irqs and dereferencing hw_intr when hw_intr is null.
Presumably this is only the case when the display driver doesn't fully probe and something probe defers? Can you clarify how this situation happens?
BUG: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Call trace:
 dpu_core_irq_uninstall+0x50/0xb0
 dpu_irq_uninstall+0x18/0x24
 msm_drm_uninit+0xd8/0x16c
 msm_drm_bind+0x580/0x5fc
 try_to_bring_up_master+0x168/0x1c0
 __component_add+0xb4/0x178
 component_add+0x1c/0x28
 dp_display_probe+0x38c/0x400
 platform_probe+0xb0/0xd0
 really_probe+0xcc/0x2c8
 __driver_probe_device+0xbc/0xe8
 driver_probe_device+0x48/0xf0
 __device_attach_driver+0xa0/0xc8
 bus_for_each_drv+0x8c/0xd8
 __device_attach+0xc4/0x150
 device_initial_probe+0x1c/0x28
Fixes: a73033619ea ("drm/msm/dpu: squash dpu_core_irq into dpu_hw_interrupts")
The fixes tag looks odd. In dpu_core_irq_uninstall() at that commit it is dealing with 'irq_obj' which isn't a pointer. After commit f25f656608e3 ("drm/msm/dpu: merge struct dpu_irq into struct dpu_hw_intr") dpu_core_irq_uninstall() starts using 'hw_intr' which is allocated on the heap. If we backported this patch to a place that had a73033619ea without f25f656608e3 it wouldn't make any sense.
I'd agree here. The following tag would be correct:
Fixes: f25f656608e3 ("drm/msm/dpu: merge struct dpu_irq into struct dpu_hw_intr")
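The distinction behind the Fixes tag discussion can be sketched in plain C. This is only an illustration under assumed names: struct irq_state, struct kms_embedded and struct kms_pointer are hypothetical stand-ins, not the driver's real types. When the interrupt state is embedded by value it always exists, so a NULL check has nothing to test. Once it becomes a heap pointer, it can legitimately be NULL before allocation and the check becomes meaningful.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the interrupt bookkeeping. */
struct irq_state {
	int total_irqs;
};

/* Models the layout where the IRQ state is a member, not a pointer. */
struct kms_embedded {
	struct irq_state irq_obj;	/* always present with the parent */
};

/* Models the layout where the IRQ state is allocated separately. */
struct kms_pointer {
	struct irq_state *hw_intr;	/* NULL until allocation succeeds */
};

int embedded_irq_count(const struct kms_embedded *kms)
{
	/* No NULL check is possible or needed here. */
	return kms->irq_obj.total_irqs;
}

int pointer_irq_count(const struct kms_pointer *kms)
{
	/* The pointer may not have been set up yet. */
	if (!kms->hw_intr)
		return 0;

	return kms->hw_intr->total_irqs;
}
```

A guard like the one in pointer_irq_count() would not even compile against the embedded layout, which is why the fix only makes sense on top of the commit that introduced the pointer.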
Signed-off-by: Vinod Polimera <quic_vpolimer@quicinc.com>
 drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
index c515b7c..ab28577 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
@@ -599,6 +599,9 @@ void dpu_core_irq_uninstall(struct dpu_kms *dpu_kms)
 {
 	int i;

+	if (!dpu_kms->hw_intr)
+		return;
+
 	pm_runtime_get_sync(&dpu_kms->pdev->dev);
 	for (i = 0; i < dpu_kms->hw_intr->total_irqs; i++)
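For readers who want to see the failure mode outside the kernel, the following is a rough userspace model of the error path in the call trace, not the driver code itself. All names (struct model_kms, model_irq_uninstall(), model_bind()) are hypothetical, the runtime PM call is left out, and EPROBE_DEFER is defined by hand because userspace headers do not provide it. The model assumes bind can fail before the interrupt state is allocated and that the common cleanup path still calls the uninstall routine.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Kernel-internal errno value; not available from userspace <errno.h>. */
#define EPROBE_DEFER 517

struct model_hw_intr {
	int total_irqs;
	int cleared;		/* counts IRQs cleared during uninstall */
};

struct model_kms {
	struct model_hw_intr *hw_intr;
};

void model_irq_uninstall(struct model_kms *kms)
{
	int i;

	/* Guard from the patch: nothing to clear if setup never ran. */
	if (!kms->hw_intr)
		return;

	for (i = 0; i < kms->hw_intr->total_irqs; i++)
		kms->hw_intr->cleared++;
}

/* When 'defer' is non-zero, fail before hw_intr is allocated. */
int model_bind(struct model_kms *kms, int defer)
{
	int rc;

	kms->hw_intr = NULL;

	if (defer) {
		rc = -EPROBE_DEFER;
		goto err_uninit;
	}

	kms->hw_intr = calloc(1, sizeof(*kms->hw_intr));
	if (!kms->hw_intr) {
		rc = -ENOMEM;
		goto err_uninit;
	}
	kms->hw_intr->total_irqs = 3;
	return 0;

err_uninit:
	/* Common cleanup runs even though hw_intr may still be NULL. */
	model_irq_uninstall(kms);
	return rc;
}
```

Calling model_bind(&kms, 1) returns -EPROBE_DEFER without touching a NULL pointer. Removing the guard in model_irq_uninstall() would make the same call dereference NULL, which mirrors the reported crash.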