There is kernel crashed due to unable to handle kernel NULL pointer dereference of dp_panel->connector while running DP link layer compliance test case 4.2.2.6 (EDID Corruption Detection). This patch fixes this problem by populating connector of dp_panel.
[drm:dp_panel_read_sink_caps] *ERROR* panel edid read failed Unable to handle kernel NULL pointer dereference at virtual address 00000000000006e1 Mem abort info: ESR = 0x96000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 user pgtable: 4k pages, 39-bit VAs, pgdp=0000000115f25000 [00000000000006e1] pgd=00000001174fe003, p4d=00000001174fe003, pud=00000001174fe003, pmd=0000000000000000 Internal error: Oops: 96000006 [#1] PREEMPT SMP {...]
Changes in V2: -- populate panel connector at msm_dp_modeset_init() instead of at dp_panel_read_sink_caps()
Fixes: 7948fe12d47 ("drm/msm/dp: return correct edid checksum after corrupted edid checksum read") Signee-off-by: Kuogee Hsieh quic_khsieh@quicinc.com --- drivers/gpu/drm/msm/dp/dp_display.c | 25 +++++++++++++++---------- 1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c index 3449d3f..c282bbf 100644 --- a/drivers/gpu/drm/msm/dp/dp_display.c +++ b/drivers/gpu/drm/msm/dp/dp_display.c @@ -1495,36 +1495,41 @@ void msm_dp_debugfs_init(struct msm_dp *dp_display, struct drm_minor *minor) } }
-int msm_dp_modeset_init(struct msm_dp *dp_display, struct drm_device *dev, +int msm_dp_modeset_init(struct msm_dp *dp, struct drm_device *dev, struct drm_encoder *encoder) { struct msm_drm_private *priv; + struct dp_display_private *dp_display; int ret;
- if (WARN_ON(!encoder) || WARN_ON(!dp_display) || WARN_ON(!dev)) + if (WARN_ON(!encoder) || WARN_ON(!dp) || WARN_ON(!dev)) return -EINVAL;
priv = dev->dev_private; - dp_display->drm_dev = dev; + dp->drm_dev = dev; + + dp_display = container_of(dp, struct dp_display_private, dp_display);
- ret = dp_display_request_irq(dp_display); + ret = dp_display_request_irq(dp); if (ret) { DRM_ERROR("request_irq failed, ret=%d\n", ret); return ret; }
- dp_display->encoder = encoder; + dp->encoder = encoder;
- dp_display->connector = dp_drm_connector_init(dp_display); - if (IS_ERR(dp_display->connector)) { - ret = PTR_ERR(dp_display->connector); + dp->connector = dp_drm_connector_init(dp); + if (IS_ERR(dp->connector)) { + ret = PTR_ERR(dp->connector); DRM_DEV_ERROR(dev->dev, "failed to create dp connector: %d\n", ret); - dp_display->connector = NULL; + dp->connector = NULL; return ret; }
- priv->connectors[priv->num_connectors++] = dp_display->connector; + dp_display->panel->connector = dp->connector; + + priv->connectors[priv->num_connectors++] = dp->connector; return 0; }
Quoting Kuogee Hsieh (2021-12-29 11:17:02)
Can you explain how we get into that situation? Like
"We never assign struct dp_panel::connector, instead the connector is stored in struct msm_dp::connector. When we run compliance testing test case 4.2.2.6 dp_panel_handle_sink_request() won't have a valid edid set in struct dp_panel::edid so we'll try to use the connectors real_edid_checksum and hit a NULL pointer deref error because the connector pointer is never assigned."
This sort of stuff isn't really useful because it takes quite a few lines to say "We hit a NULL pointer deref" which was already stated. I'd rather have a clear description of what goes wrong and how setting the pointer in msm_dp_modeset_init() fixes it.
This is the one line that matters I think? Can we reach the connector for the dp device without going through the panel in dp_panel_handle_sink_request()? That would reduce the number of struct elements if possible.
Can we not rename all the local variables in this patch and do it later or never? Reading this patch takes a long time because we have to make sure nothing has actually changed with the rename of 'dp_display' to 'dp'.
dri-devel@lists.freedesktop.org