Hi all,
I'm sending these patches to try to improve the current situation for a particular corner case (DRM driver unbinding).
I could reproduce a specific race condition during the unbinding of the mediatek-drm driver that caused an invalid memory address. The race condition is triggered by a userspace event (gnome-shell requesting a DRM GET_CONNECTOR ioctl) while the encoders and drivers are in the process of being disabled.
While I tried to mitigate this by making a small change in the parade-ps8640 driver (for the bridge I'm testing on) and by making a couple of functions in drm_bridge.c more robust, this is only a symptom of a larger problem that might not be getting enough attention, understandably, because this is an unusual corner case.
The scenario looks like this:
<userspace>: unbind mediatek-drm --------------------+ | | <kernel> | | | ... | | ... mtk_dsi_unbind | | | `- drm_encoder_cleanup v | | gnome-shell ... `- drm_bridge_detach *<------ ioctl (GET_CONNECTOR) | <kernel> | ... | | ps8640_bridge_get_edid | `drm_bridge_chain_post_disable
which causes drm_bridge_chain_post_disable() to walk the bridge chain after the bridge has already been detached and removed from the list. I guess a more radical and subsystem-wide solution would be to not allow or to block certain ioctl calls once the driver has started to unbind, but I'd like to hear your opinion on this.
This was tested on an Acer Chromebook R13 (Elm, MT8173) running Debian Sid, the command that triggers the race condition is
echo mediatek-drm.12.auto > /sys/bus/platform/drivers/mediatek-drm/unbind
Cheers, Ricardo
Ricardo Cañuelo (2): drm/bridge: parade-ps8640: avoid race condition on driver unbinding drm/bridge: Add extra checks in pre_enable and post_enable
drivers/gpu/drm/bridge/parade-ps8640.c | 6 +++--- drivers/gpu/drm/drm_bridge.c | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-)
-- 2.25.1
When unbinding a DRM master driver there's a race condition that sometimes results in an invalid vm access when userspace (gnome-shell) issues a DRM_IOCTL_MODE_GETCONNECTOR ioctl right after a bridge has been removed from an encoder's bridge chain.
This means that once a bridge has been disabled and gone through ps8640_post_disable(), if ps8640_bridge_get_edid() runs afterwards as a result of that ioctl call it will call drm_bridge_chain_pre_enable() and drm_bridge_chain_post_disable() again in a bridge that has been already detached.
Setting `ps_bridge->pre_enabled = false` at a later stage of the bringdown path and calling drm_bridge_chain_pre_enable() inside ps8640_bridge_get_edid() only if needed avoid this, although a proper subsystem-wide fix would be the proper solution, since the same type of race conditions might happen somewhere else.
Tested on an Acer Chromebook R13 (Elm, MT8173) with Debian Sid.
Signed-off-by: Ricardo Cañuelo ricardo.canuelo@collabora.com --- drivers/gpu/drm/bridge/parade-ps8640.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/bridge/parade-ps8640.c b/drivers/gpu/drm/bridge/parade-ps8640.c index 3f17337ee389..a927787a89bf 100644 --- a/drivers/gpu/drm/bridge/parade-ps8640.c +++ b/drivers/gpu/drm/bridge/parade-ps8640.c @@ -434,8 +434,6 @@ static void ps8640_post_disable(struct drm_bridge *bridge) { struct ps8640 *ps_bridge = bridge_to_ps8640(bridge);
- ps_bridge->pre_enabled = false; - ps8640_bridge_vdo_control(ps_bridge, DISABLE); pm_runtime_put_sync_suspend(&ps_bridge->page[PAGE0_DP_CNTL]->dev); } @@ -487,6 +485,7 @@ static void ps8640_bridge_detach(struct drm_bridge *bridge) drm_dp_aux_unregister(&ps_bridge->aux); if (ps_bridge->link) device_link_del(ps_bridge->link); + ps_bridge->pre_enabled = false; }
static struct edid *ps8640_bridge_get_edid(struct drm_bridge *bridge, @@ -508,7 +507,8 @@ static struct edid *ps8640_bridge_get_edid(struct drm_bridge *bridge, * EDID, for this chip, we need to do a full poweron, otherwise it will * fail. */ - drm_bridge_chain_pre_enable(bridge); + if (poweroff) + drm_bridge_chain_pre_enable(bridge);
edid = drm_get_edid(connector, ps_bridge->page[PAGE0_DP_CNTL]->adapter);
Il 11/03/22 10:34, Ricardo Cañuelo ha scritto:
When unbinding a DRM master driver there's a race condition that sometimes results in an invalid vm access when userspace (gnome-shell) issues a DRM_IOCTL_MODE_GETCONNECTOR ioctl right after a bridge has been removed from an encoder's bridge chain.
This means that once a bridge has been disabled and gone through ps8640_post_disable(), if ps8640_bridge_get_edid() runs afterwards as a result of that ioctl call it will call drm_bridge_chain_pre_enable() and drm_bridge_chain_post_disable() again in a bridge that has been already detached.
Setting `ps_bridge->pre_enabled = false` at a later stage of the bringdown path and calling drm_bridge_chain_pre_enable() inside ps8640_bridge_get_edid() only if needed avoid this, although a proper subsystem-wide fix would be the proper solution, since the same type of race conditions might happen somewhere else.
Tested on an Acer Chromebook R13 (Elm, MT8173) with Debian Sid.
Signed-off-by: Ricardo Cañuelo ricardo.canuelo@collabora.com
Reviewed-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com
drivers/gpu/drm/bridge/parade-ps8640.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
Depending on the bridge code, certain userspace events during a driver teardown (such as a DRM ioctl call) might cause a race condition where the drm_bridge_chain_pre_enable() and drm_bridge_chain_post_enable() functions could be called for a bridge that has just been detached and removed from the bridge chain of an encoder.
This change makes these functions a bit more robust by bailing out if the bridge has already been detached.
Tested on an Acer Chromebook R13 (Elm, MT8173) with Debian Sid.
Signed-off-by: Ricardo Cañuelo ricardo.canuelo@collabora.com --- drivers/gpu/drm/drm_bridge.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_bridge.c b/drivers/gpu/drm/drm_bridge.c index c96847fc0ebc..e074aa456dd1 100644 --- a/drivers/gpu/drm/drm_bridge.c +++ b/drivers/gpu/drm/drm_bridge.c @@ -529,7 +529,7 @@ void drm_bridge_chain_post_disable(struct drm_bridge *bridge) { struct drm_encoder *encoder;
- if (!bridge) + if (!bridge || !bridge->dev) return;
encoder = bridge->encoder; @@ -585,7 +585,7 @@ void drm_bridge_chain_pre_enable(struct drm_bridge *bridge) struct drm_encoder *encoder; struct drm_bridge *iter;
- if (!bridge) + if (!bridge || !bridge->dev) return;
encoder = bridge->encoder;
Il 11/03/22 10:34, Ricardo Cañuelo ha scritto:
Depending on the bridge code, certain userspace events during a driver teardown (such as a DRM ioctl call) might cause a race condition where the drm_bridge_chain_pre_enable() and drm_bridge_chain_post_enable() functions could be called for a bridge that has just been detached and removed from the bridge chain of an encoder.
This change makes these functions a bit more robust by bailing out if the bridge has already been detached.
Tested on an Acer Chromebook R13 (Elm, MT8173) with Debian Sid.
Signed-off-by: Ricardo Cañuelo ricardo.canuelo@collabora.com
On various chromebooks with different MediaTek SoC models, and on Qualcomm SC7180 Trogdor,
Tested-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com
dri-devel@lists.freedesktop.org