While trying to fix a problem with suspend/resume that I introduced in the atomic VCPI helpers for MST, I also ran into some issues with i915 varying from "not that bad" to "oh wow that's very bad". Here are the fixes for those issues.
This series was originally just one patch, "drm/i915: Don't send MST hotplugs during resume"
Lyude Paul (3): drm/i915: Block fbdev HPD processing during suspend drm/i915: Don't send MST hotplugs during resume drm/i915: Don't send hotplug in intel_dp_check_mst_status()
drivers/gpu/drm/i915/intel_dp.c | 13 ++++++------ drivers/gpu/drm/i915/intel_drv.h | 10 +++++++++ drivers/gpu/drm/i915/intel_fbdev.c | 33 +++++++++++++++++++++++++++++- 3 files changed, 49 insertions(+), 7 deletions(-)
When resuming, we check whether or not any previously connected MST topologies are still present and if so, attempt to resume them. If this fails, we disable said MST topologies and fire off a hotplug event so that userspace knows to reprobe.
However, sending a hotplug event involves calling drm_fb_helper_hotplug_event(), which in turn results in fbcon doing a connector reprobe in the caller's thread - something we can't do at the point in which i915 calls drm_dp_mst_topology_mgr_resume() since hotplugging hasn't been fully initialized yet.
This currently causes some rather subtle but fatal issues. For example, on my T480s the laptop dock connected to it usually disappears during a suspend cycle, and comes back up a short while after the system has been resumed. This guarantees pretty much every suspend and resume cycle, drm_dp_mst_topology_mgr_set_mst(mgr, false); will be caused and in turn, a connector hotplug will occur. Now it's Rute Goldberg time: when the connector hotplug occurs, i915 reprobes /all/ of the connectors, including eDP. However, eDP probing requires that we power on the panel VDD which in turn, grabs a wakeref to the appropriate power domain on the GPU (on my T480s, this is the PORT_DDI_A_IO domain). This is where things start breaking, since this all happens before intel_power_domains_enable() is called we end up leaking the wakeref that was acquired and never releasing it later. Come next suspend/resume cycle, this causes us to fail to shut down the GPU properly, which causes it not to resume properly and die a horrible complicated death.
(as a note: this only happens when there's both an eDP panel and MST topology connected which is removed mid-suspend. One or the other seems to always be OK).
We could try to fix the VDD wakeref leak, but this doesn't seem like it's worth it at all since we aren't able to handle hotplug detection while resuming anyway. So, let's go with a more robust solution inspired by nouveau: block fbdev from handling hotplug events until we resume fbdev. This allows us to still send sysfs hotplug events to be handled later by user space while we're resuming, while also preventing us from actually processing any hotplug events we receive until it's safe.
This fixes the wakeref leak observed on the T480s and as such, also fixes suspend/resume with MST topologies connected on this machine.
Changes since v2: * Don't call drm_fb_helper_hotplug_event() under lock, do it after lock (Chris Wilson) * Don't call drm_fb_helper_hotplug_event() in intel_fbdev_output_poll_changed() under lock (Chris Wilson) * Always set ifbdev->hpd_waiting (Chris Wilson)
Signed-off-by: Lyude Paul lyude@redhat.com Fixes: 0e32b39ceed6 ("drm/i915: add DP 1.2 MST support (v0.7)") Cc: Todd Previte tprevite@gmail.com Cc: Dave Airlie airlied@redhat.com Cc: Jani Nikula jani.nikula@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com Cc: Imre Deak imre.deak@intel.com Cc: intel-gfx@lists.freedesktop.org Cc: stable@vger.kernel.org # v3.17+ --- drivers/gpu/drm/i915/intel_drv.h | 10 +++++++++ drivers/gpu/drm/i915/intel_fbdev.c | 33 +++++++++++++++++++++++++++++- 2 files changed, 42 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h index 90ba5436370e..9ec3d00fbd19 100644 --- a/drivers/gpu/drm/i915/intel_drv.h +++ b/drivers/gpu/drm/i915/intel_drv.h @@ -213,6 +213,16 @@ struct intel_fbdev { unsigned long vma_flags; async_cookie_t cookie; int preferred_bpp; + + /* Whether or not fbdev hpd processing is temporarily suspended */ + bool hpd_suspended : 1; + /* Set when a hotplug was received while HPD processing was + * suspended + */ + bool hpd_waiting : 1; + + /* Protects hpd_suspended */ + struct mutex hpd_lock; };
struct intel_encoder { diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c index 8cf3efe88f02..376ffe842e26 100644 --- a/drivers/gpu/drm/i915/intel_fbdev.c +++ b/drivers/gpu/drm/i915/intel_fbdev.c @@ -681,6 +681,7 @@ int intel_fbdev_init(struct drm_device *dev) if (ifbdev == NULL) return -ENOMEM;
+ mutex_init(&ifbdev->hpd_lock); drm_fb_helper_prepare(dev, &ifbdev->helper, &intel_fb_helper_funcs);
if (!intel_fbdev_init_bios(dev, ifbdev)) @@ -754,6 +755,26 @@ void intel_fbdev_fini(struct drm_i915_private *dev_priv) intel_fbdev_destroy(ifbdev); }
+/* Suspends/resumes fbdev processing of incoming HPD events. When resuming HPD + * processing, fbdev will perform a full connector reprobe if a hotplug event + * was received while HPD was suspended. + */ +static void intel_fbdev_hpd_set_suspend(struct intel_fbdev *ifbdev, int state) +{ + bool send_hpd = false; + + mutex_lock(&ifbdev->hpd_lock); + ifbdev->hpd_suspended = state == FBINFO_STATE_SUSPENDED; + send_hpd = !ifbdev->hpd_suspended && ifbdev->hpd_waiting; + ifbdev->hpd_waiting = false; + mutex_unlock(&ifbdev->hpd_lock); + + if (send_hpd) { + DRM_DEBUG_KMS("Handling delayed fbcon HPD event\n"); + drm_fb_helper_hotplug_event(&ifbdev->helper); + } +} + void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous) { struct drm_i915_private *dev_priv = to_i915(dev); @@ -775,6 +796,7 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous */ if (state != FBINFO_STATE_RUNNING) flush_work(&dev_priv->fbdev_suspend_work); + console_lock(); } else { /* @@ -802,17 +824,26 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
drm_fb_helper_set_suspend(&ifbdev->helper, state); console_unlock(); + + intel_fbdev_hpd_set_suspend(ifbdev, state); }
void intel_fbdev_output_poll_changed(struct drm_device *dev) { struct intel_fbdev *ifbdev = to_i915(dev)->fbdev; + bool send_hpd;
if (!ifbdev) return;
intel_fbdev_sync(ifbdev); - if (ifbdev->vma || ifbdev->helper.deferred_setup) + + mutex_lock(&ifbdev->hpd_lock); + send_hpd = !ifbdev->hpd_suspended; + ifbdev->hpd_waiting = true; + mutex_unlock(&ifbdev->hpd_lock); + + if (send_hpd && (ifbdev->vma || ifbdev->helper.deferred_setup)) drm_fb_helper_hotplug_event(&ifbdev->helper); }
Quoting Lyude Paul (2019-01-29 19:09:59)
When resuming, we check whether or not any previously connected MST topologies are still present and if so, attempt to resume them. If this fails, we disable said MST topologies and fire off a hotplug event so that userspace knows to reprobe.
However, sending a hotplug event involves calling drm_fb_helper_hotplug_event(), which in turn results in fbcon doing a connector reprobe in the caller's thread - something we can't do at the point in which i915 calls drm_dp_mst_topology_mgr_resume() since hotplugging hasn't been fully initialized yet.
This currently causes some rather subtle but fatal issues. For example, on my T480s the laptop dock connected to it usually disappears during a suspend cycle, and comes back up a short while after the system has been resumed. This guarantees pretty much every suspend and resume cycle, drm_dp_mst_topology_mgr_set_mst(mgr, false); will be caused and in turn, a connector hotplug will occur. Now it's Rute Goldberg time: when the connector hotplug occurs, i915 reprobes /all/ of the connectors, including eDP. However, eDP probing requires that we power on the panel VDD which in turn, grabs a wakeref to the appropriate power domain on the GPU (on my T480s, this is the PORT_DDI_A_IO domain). This is where things start breaking, since this all happens before intel_power_domains_enable() is called we end up leaking the wakeref that was acquired and never releasing it later. Come next suspend/resume cycle, this causes us to fail to shut down the GPU properly, which causes it not to resume properly and die a horrible complicated death.
(as a note: this only happens when there's both an eDP panel and MST topology connected which is removed mid-suspend. One or the other seems to always be OK).
We could try to fix the VDD wakeref leak, but this doesn't seem like it's worth it at all since we aren't able to handle hotplug detection while resuming anyway. So, let's go with a more robust solution inspired by nouveau: block fbdev from handling hotplug events until we resume fbdev. This allows us to still send sysfs hotplug events to be handled later by user space while we're resuming, while also preventing us from actually processing any hotplug events we receive until it's safe.
This fixes the wakeref leak observed on the T480s and as such, also fixes suspend/resume with MST topologies connected on this machine.
Changes since v2:
- Don't call drm_fb_helper_hotplug_event() under lock, do it after lock (Chris Wilson)
- Don't call drm_fb_helper_hotplug_event() in intel_fbdev_output_poll_changed() under lock (Chris Wilson)
- Always set ifbdev->hpd_waiting (Chris Wilson)
Signed-off-by: Lyude Paul lyude@redhat.com Fixes: 0e32b39ceed6 ("drm/i915: add DP 1.2 MST support (v0.7)") Cc: Todd Previte tprevite@gmail.com Cc: Dave Airlie airlied@redhat.com Cc: Jani Nikula jani.nikula@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com Cc: Imre Deak imre.deak@intel.com Cc: intel-gfx@lists.freedesktop.org Cc: stable@vger.kernel.org # v3.17+
I suspect the locking is overkill, but certainly easier to reason than trying to remember all the ins and outs of fbdev with its dubious async initialisation.
Reviewed-by: Chris Wilson chris@chris-wilson.co.uk -Chris
We have a bad habit of calling drm_fb_helper_hotplug_event() far more then we actually need to. MST appears to be one of these cases, where we call drm_fb_helper_hotplug_event() if we fail to resume a connected MST topology in intel_dp_mst_resume(). We don't actually need to do this at all though since hotplug events are already sent from drm_dp_connector_destroy_work() every time connectors are unregistered from userspace's PoV. Additionally, extra calls to drm_fb_helper_hotplug_event() also just mean more of a chance of doing a connector probe somewhere we shouldn't.
So, don't send any hotplug events during resume if the MST topology fails to come up. Just rely on the DP MST helpers to send them for us.
Signed-off-by: Lyude Paul lyude@redhat.com Cc: Imre Deak imre.deak@intel.com Cc: Daniel Vetter daniel@ffwll.ch --- drivers/gpu/drm/i915/intel_dp.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c index 681e88405ada..c2399acf177b 100644 --- a/drivers/gpu/drm/i915/intel_dp.c +++ b/drivers/gpu/drm/i915/intel_dp.c @@ -7096,7 +7096,10 @@ void intel_dp_mst_resume(struct drm_i915_private *dev_priv) continue;
ret = drm_dp_mst_topology_mgr_resume(&intel_dp->mst_mgr); - if (ret) - intel_dp_check_mst_status(intel_dp); + if (ret) { + intel_dp->is_mst = false; + drm_dp_mst_topology_mgr_set_mst(&intel_dp->mst_mgr, + false); + } } }
On Tue, Jan 29, 2019 at 02:10:00PM -0500, Lyude Paul wrote:
We have a bad habit of calling drm_fb_helper_hotplug_event() far more then we actually need to. MST appears to be one of these cases, where we call drm_fb_helper_hotplug_event() if we fail to resume a connected MST topology in intel_dp_mst_resume(). We don't actually need to do this at all though since hotplug events are already sent from drm_dp_connector_destroy_work() every time connectors are unregistered from userspace's PoV. Additionally, extra calls to drm_fb_helper_hotplug_event() also just mean more of a chance of doing a connector probe somewhere we shouldn't.
So, don't send any hotplug events during resume if the MST topology fails to come up. Just rely on the DP MST helpers to send them for us.
Signed-off-by: Lyude Paul lyude@redhat.com Cc: Imre Deak imre.deak@intel.com Cc: Daniel Vetter daniel@ffwll.ch
drivers/gpu/drm/i915/intel_dp.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c index 681e88405ada..c2399acf177b 100644 --- a/drivers/gpu/drm/i915/intel_dp.c +++ b/drivers/gpu/drm/i915/intel_dp.c @@ -7096,7 +7096,10 @@ void intel_dp_mst_resume(struct drm_i915_private *dev_priv) continue;
ret = drm_dp_mst_topology_mgr_resume(&intel_dp->mst_mgr);
if (ret)
intel_dp_check_mst_status(intel_dp);
if (ret) {
intel_dp->is_mst = false;
drm_dp_mst_topology_mgr_set_mst(&intel_dp->mst_mgr,
false);
drm_dp_mst_topology_mgr_set_mst() -> drm_dp_mst_topology_put_mstb() -> drm_dp_destroy_mst_branch_device() -> drm_dp_mst_topology_put_port() -> drm_dp_destroy_port() -> drm_dp_destroy_connector_work() -> drm_kms_helper_hotplug_event()
That of course requires that no one is hanging on to the kref(s). The lifetime of the references isn't really clear to me, but I'll take your word that it works.
Reviewed-by: Ville Syrjälä ville.syrjala@linux.intel.com
}}
}
2.20.1
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
This hotplug also isn't needed: drm_dp_mst_topology_mgr_set_mst() already sends a hotplug on its own from drm_dp_destroy_connector_work() after destroying connectors in the MST topology.
Signed-off-by: Lyude Paul lyude@redhat.com Cc: Imre Deak imre.deak@intel.com Cc: Daniel Vetter daniel@ffwll.ch --- drivers/gpu/drm/i915/intel_dp.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c index c2399acf177b..f9113c0cdfcd 100644 --- a/drivers/gpu/drm/i915/intel_dp.c +++ b/drivers/gpu/drm/i915/intel_dp.c @@ -4608,12 +4608,10 @@ intel_dp_check_mst_status(struct intel_dp *intel_dp)
return ret; } else { - struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp); DRM_DEBUG_KMS("failed to get ESI - device may have failed\n"); intel_dp->is_mst = false; - drm_dp_mst_topology_mgr_set_mst(&intel_dp->mst_mgr, intel_dp->is_mst); - /* send a hotplug event */ - drm_kms_helper_hotplug_event(intel_dig_port->base.base.dev); + drm_dp_mst_topology_mgr_set_mst(&intel_dp->mst_mgr, + intel_dp->is_mst); } } return -EINVAL;
On Tue, Jan 29, 2019 at 02:10:01PM -0500, Lyude Paul wrote:
This hotplug also isn't needed: drm_dp_mst_topology_mgr_set_mst() already sends a hotplug on its own from drm_dp_destroy_connector_work() after destroying connectors in the MST topology.
Signed-off-by: Lyude Paul lyude@redhat.com Cc: Imre Deak imre.deak@intel.com Cc: Daniel Vetter daniel@ffwll.ch
Reviewed-by: Ville Syrjälä ville.syrjala@linux.intel.com
drivers/gpu/drm/i915/intel_dp.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c index c2399acf177b..f9113c0cdfcd 100644 --- a/drivers/gpu/drm/i915/intel_dp.c +++ b/drivers/gpu/drm/i915/intel_dp.c @@ -4608,12 +4608,10 @@ intel_dp_check_mst_status(struct intel_dp *intel_dp)
return ret; } else {
struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp); DRM_DEBUG_KMS("failed to get ESI - device may have failed\n"); intel_dp->is_mst = false;
drm_dp_mst_topology_mgr_set_mst(&intel_dp->mst_mgr, intel_dp->is_mst);
/* send a hotplug event */
drm_kms_helper_hotplug_event(intel_dig_port->base.base.dev);
drm_dp_mst_topology_mgr_set_mst(&intel_dp->mst_mgr,
} } return -EINVAL;intel_dp->is_mst);
-- 2.20.1
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
dri-devel@lists.freedesktop.org