On Mon, Oct 21, 2019 at 10:36:01PM -0400, Lyude Paul wrote:
This is a complicated one. Essentially, there's currently a problem in the MST core that hasn't really caused any issues that we're aware of (emphasis on "that we're aware of"): locking.
When we go through and probe the link addresses and path resources in a topology, we hold no locks when updating ports with said information. The members I'm referring to in particular are:
- ldps
- ddps
- mcs
- pdt
- dpcd_rev
- num_sdp_streams
- num_sdp_stream_sinks
- available_pbn
- input
- connector
Now that we're handling UP requests asynchronously and will be using some of the struct members mentioned above in atomic modesetting in the future for features such as PBN validation, this is going to become a lot more important. As well, the next few commits that prepare us for and introduce suspend/resume reprobing will also need clear locking in order to prevent additional racing hilarities that we never could have hit in the past.
So, let's solve this issue by using &mgr->base.lock, the modesetting lock which currently only protects &mgr->base.state. This works perfectly because it allows us to avoid blocking connection_mutex unnecessarily, and we can grab this in connector detection paths since it's a ww mutex. We start by having drm_dp_mst_handle_up_req() hold this when updating ports.
For drm_dp_mst_handle_link_address_port() things are a bit more complicated. As I've learned the hard way, we can grab &mgr->base.lock for everything except for port->connector. See, our normal driver probing paths end up generating this rather obvious lockdep chain:
&drm->mode_config.mutex -> crtc_ww_class_mutex/crtc_ww_class_acquire -> &connector->mutex
However, sysfs grabs &drm->mode_config.mutex in order to protect itself from connector state changing under it. Because this entails grabbing kn->count, i.e. the lock that the kernel provides for protecting sysfs contexts, we end up grabbing kn->count followed by &drm->mode_config.mutex. This ends up creating an extremely rude chain:
&kn->count -> &drm->mode_config.mutex -> crtc_ww_class_mutex/crtc_ww_class_acquire -> &connector->mutex
I mean, look at that thing! It's just evil!!! This gross thing ends up making any calls to drm_connector_register()/drm_connector_unregister() impossible when holding any kind of modesetting lock. This is annoying because ideally, we always want to ensure that drm_dp_mst_port->connector never changes when doing an atomic commit or check that would affect the atomic topology state. That way it can reliably and easily be used from future DRM DP MST helpers to assist with tasks such as scanning through the current VCPI allocations and adding connectors which need to have their allocations updated in response to a bandwidth change or the like.
Being able to hold &mgr->base.lock throughout the entire link probe process would have been _great_, since we could prevent userspace from ever seeing any states in-between individual port changes and as a result likely end up with a much faster probe and more consistent results from said probes. But without some rework of how we handle connector probing in sysfs it's not at all currently possible. In the future, maybe we can try using the sysfs locks to protect updates to connector probing state and fix this mess.
So for now, to protect everything other than port->connector under &mgr->base.lock and ensure that we still have the guarantee that atomic check/commit contexts will never see port->connector change, we use a silly trick. See: port->connector only needs to change in order to ensure that input ports (see the MST spec) never have a ghost connector associated with them. But, there's nothing stopping us from simply throwing the entire port out and creating a new one in order to maintain that requirement while still keeping port->connector consistent across the lifetime of the port in atomic check/commit contexts. For all intents and purposes this works fine, as we validate ports in any contexts we care about before using them and as such will end up reporting the connector as disconnected until its port's destruction finalizes. So, we just do that in cases where we detect port->input has transitioned from false->true. We don't need to worry about the other direction, since a port without a connector isn't visible to userspace and as such doesn't need to be protected by &mgr->base.lock until we finish registering a connector for it.
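The replace-instead-of-clear trick described above can be modelled in plain userspace C. This is only a sketch under stated assumptions: `model_port`, `model_connector`, `model_add_port()` and `model_update_port()` are hypothetical stand-ins invented for illustration, not the kernel structures or helpers, and the `in_topology` flag stands in for whatever port validation the real code performs.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct model_connector { int id; };

struct model_port {
	int port_num;
	bool input;
	struct model_connector *connector; /* set once, never cleared */
	bool in_topology;                  /* false once the port is unlinked */
};

/* Allocate a fresh port that is linked into the (modelled) topology. */
static struct model_port *model_add_port(int port_num)
{
	struct model_port *port = calloc(1, sizeof(*port));

	if (!port)
		return NULL;
	port->port_num = port_num;
	port->in_topology = true;
	return port;
}

/*
 * Apply a link address reply to a topology slot. If an output port that
 * already has a connector turns into an input port, unlink the old port and
 * put a fresh one in the slot instead of clearing old->connector.
 */
static struct model_port *model_update_port(struct model_port **slot,
					    bool reply_is_input)
{
	struct model_port *old = *slot;

	if (reply_is_input && !old->input && old->connector) {
		struct model_port *new_port = model_add_port(old->port_num);

		if (!new_port)
			return NULL;
		new_port->input = true;
		old->in_topology = false; /* validation of the old port now fails */
		*slot = new_port;
		return new_port;
	}

	old->input = reply_is_input;
	return old;
}
```

Anyone still holding the old port keeps seeing the same connector pointer for the port's whole lifetime; they only notice that the port no longer validates, which is what lets the connector be reported as disconnected.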
For updating members of drm_dp_mst_port other than port->connector, we simply grab &mgr->base.lock in drm_dp_mst_link_probe_work() for already registered ports, update said members and drop the lock before potentially registering a connector and probing the link address of its children.
Finally, we modify drm_dp_mst_detect_port() to take a modesetting lock acquisition context in order to acquire &mgr->base.lock under &connection_mutex and convert all its users over to using the .detect_ctx probe hooks.
With that, we finally have well-defined locking.
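To illustrate why the detect path has to carry an acquisition context, here is a small userspace model. Everything in it is hypothetical (`detect_model_*` names, the `contended` flag, the status values); it only mimics the contract that a lock taken with a context may fail with -EDEADLK and that a .detect_ctx-style hook hands that error back to its caller instead of swallowing it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; the kernel types are more involved. */
struct detect_model_ctx { bool contended; };
struct detect_model_lock { bool held; };

enum detect_model_status {
	DETECT_MODEL_CONNECTED = 1,
	DETECT_MODEL_DISCONNECTED = 2,
};

/* Mimics the contract described above: with a ctx, locking may fail. */
static int detect_model_lock_acquire(struct detect_model_lock *lock,
				     struct detect_model_ctx *ctx)
{
	if (ctx && ctx->contended)
		return -EDEADLK; /* caller is expected to back off and retry */
	lock->held = true;
	return 0;
}

/* Shaped like a helper that takes the caller's acquisition context. */
static int detect_model_port(struct detect_model_lock *mgr_lock,
			     struct detect_model_ctx *ctx, bool ddps)
{
	int ret = detect_model_lock_acquire(mgr_lock, ctx);

	if (ret)
		return ret;
	/* The lock stays held; in this model the owner of ctx drops it. */
	return ddps ? DETECT_MODEL_CONNECTED : DETECT_MODEL_DISCONNECTED;
}

/* A .detect_ctx-style hook forwards ctx and passes errors upward. */
static int detect_model_connector_detect_ctx(struct detect_model_lock *mgr_lock,
					     struct detect_model_ctx *ctx,
					     bool ddps)
{
	return detect_model_port(mgr_lock, ctx, ddps);
}
```

The point of the int return type is that a negative errno and a connection status can travel through the same path, which a hook returning only a status enum cannot do.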
Changes since v4:
- Get rid of port->mutex, stop using connection_mutex and just use our own modesetting lock - mgr->base.lock. Also, add a probe_lock that comes before this patch.
- Just throw out ports that get changed from an output to an input, and replace them with new ports. This lets us ensure that modesetting contexts never see port->connector go from having a connector to being NULL.
- Write an extremely detailed explanation of what problems this is trying to fix, since there's a _lot_ of context here and I honestly forgot some of it myself a couple times.
- Don't grab mgr->lock when reading port->mstb in drm_dp_mst_handle_link_address_port(). It's not needed.
Cc: Juston Li <juston.li@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Harry Wentland <hwentlan@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Sean Paul <sean@poorly.run>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Overall makes sense to me. Thanks for the comprehensive commit message and comments, they definitely help :)
Just one nit below,
Reviewed-by: Sean Paul sean@poorly.run
 .../display/amdgpu_dm/amdgpu_dm_mst_types.c   |  28 +--
 drivers/gpu/drm/drm_dp_mst_topology.c         | 230 ++++++++++++------
 drivers/gpu/drm/i915/display/intel_dp_mst.c   |  28 ++-
 drivers/gpu/drm/nouveau/dispnv50/disp.c       |  32 +--
 drivers/gpu/drm/radeon/radeon_dp_mst.c        |  24 +-
 include/drm/drm_dp_mst_helper.h               |  38 ++-
 6 files changed, 240 insertions(+), 140 deletions(-)
/snip
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 11d842f0bff5..7bf4db91ff90 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
/snip
@@ -1912,35 +1984,40 @@ drm_dp_mst_handle_link_address_port(struct drm_dp_mst_branch *mstb,
 {
 	struct drm_dp_mst_topology_mgr *mgr = mstb->mgr;
 	struct drm_dp_mst_port *port;
-	bool created = false;
-	int old_ddps = 0;
+	int old_ddps = 0, ret;
+	u8 new_pdt = DP_PEER_DEVICE_NONE;
+	bool created = false, send_link_addr = false;

 	port = drm_dp_get_port(mstb, port_msg->port_number);
 	if (!port) {
-		port = kzalloc(sizeof(*port), GFP_KERNEL);
+		port = drm_dp_mst_add_port(dev, mgr, mstb,
+					   port_msg->port_number);
 		if (!port)
 			return;
-		kref_init(&port->topology_kref);
-		kref_init(&port->malloc_kref);
-		port->parent = mstb;
-		port->port_num = port_msg->port_number;
-		port->mgr = mgr;
-		port->aux.name = "DPMST";
-		port->aux.dev = dev->dev;
-		port->aux.is_remote = true;
-
-		/*
-		 * Make sure the memory allocation for our parent branch stays
-		 * around until our own memory allocation is released
-		 */
-		drm_dp_mst_get_mstb_malloc(mstb);
-
 		created = true;
+	} else if (port_msg->input_port && !port->input && port->connector) {
+		/* Destroying the connector is impossible in this context, so
+		 * replace the port with a new one
+		 */
+		drm_dp_mst_topology_unlink_port(mgr, port);
+		drm_dp_mst_topology_put_port(port);
+		port = drm_dp_mst_add_port(dev, mgr, mstb,
+					   port_msg->port_number);
+		if (!port)
+			return;
+		created = true;
 	} else {
+		/* Locking is only needed when the port has a connector
+		 * exposed to userspace
+		 */
+		drm_modeset_lock(&mgr->base.lock, NULL);
Random musing: It's kind of unfortunate that we don't have a void variant of drm_modeset_lock for when there's no acquire_ctx, since we end up with a mix of drm_modeset_lock calls with and without return checking.
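For what it's worth, the kind of wrapper I have in mind would look roughly like the userspace sketch below. None of these names exist anywhere: `sketch_modeset_lock()` is a stand-in that, by assumption, can only fail when an acquire context is passed, and `sketch_modeset_lock_noctx()` is the hypothetical void variant that hides the unchecked return value in one place.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins, not the kernel types. */
struct sketch_acquire_ctx { int unused; };
struct sketch_lock { int held; };

/*
 * Stand-in for the locking call. Assumption for this sketch: without an
 * acquire context it behaves like a plain mutex lock and always returns 0.
 */
static int sketch_modeset_lock(struct sketch_lock *lock,
			       struct sketch_acquire_ctx *ctx)
{
	(void)ctx;
	lock->held = 1;
	return 0;
}

/* Hypothetical void variant for callers that have no acquire context. */
static void sketch_modeset_lock_noctx(struct sketch_lock *lock)
{
	int ret = sketch_modeset_lock(lock, NULL);

	/* Should be unreachable under the assumption above; make noise if not. */
	if (ret)
		fprintf(stderr, "unexpected lock failure: %d\n", ret);
}
```

Callers without a context would then read as a plain statement, and any call that still returns int would be one whose result actually needs checking.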
/snip
@@ -3441,22 +3516,31 @@ EXPORT_SYMBOL(drm_dp_mst_hpd_irq);
 /**
  * drm_dp_mst_detect_port() - get connection status for an MST port
  * @connector: DRM connector for this port
+ * @ctx: The acquisition context to use for grabbing locks
  * @mgr: manager for this port
- * @port: unverified pointer to a port
+ * @port: pointer to a port
  *
- * This returns the current connection state for a port. It validates the
- * port pointer still exists so the caller doesn't require a reference
+ * This returns the current connection state for a port.
"On error, this returns -errno"
/snip
-- 2.21.0