Hotplug detection is a mode setting operation and must hold the struct_mutex or risk colliding with other mode setting operations.
In particular, the display port hotplug function attempts to re-train the link if the monitor is supposed to be running when plugged back in. If that happens while mode setting is underway, the link will get scrambled, leaving it in an inconsistent state.
Signed-off-by: Keith Packard keithp@keithp.com --- drivers/gpu/drm/i915/i915_irq.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 3b03f85..5fe8f28 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -306,12 +306,15 @@ static void i915_hotplug_work_func(struct work_struct *work) struct drm_mode_config *mode_config = &dev->mode_config; struct intel_encoder *encoder;
+ mutex_lock(&dev_priv->dev->struct_mutex); DRM_DEBUG_KMS("running encoder hotplug functions\n");
list_for_each_entry(encoder, &mode_config->encoder_list, base.head) if (encoder->hot_plug) encoder->hot_plug(encoder);
+ mutex_unlock(&dev_priv->dev->struct_mutex); + /* Just fire off a uevent and let userspace tell us what to do */ drm_helper_hpd_irq_event(dev); }
On Mon, 25 Jul 2011 10:10:29 -0700 Keith Packard keithp@keithp.com wrote:
yay, sounds like this will fix Andrew's problem and probably lots of other random DP related failures.
Looks like the ->detect function is similarly protected at the call site (though one level up in ->fill_modes), so it should be safe. Looks like all the call sites in the link_status function are safe too.
Reviewed-by: Jesse Barnes jbarnes@virtuousgeek.org
Let's get this one upstream asap. Should probably be cc'd to stable@kernel.org as well.
Will test tonight.
It looks like there is a lot of hotplug activity when 'xset dpms force off' gets run. (That's not a typo. I do mean "off," not "on."
See attached trace. perf rocks, even over ssh :)
--Andy
On Mon, Jul 25, 2011 at 1:37 PM, Jesse Barnes jbarnes@virtuousgeek.org wrote:
On Mon, 25 Jul 2011 13:40:58 -0400, Andrew Lutomirski luto@mit.edu wrote:
Will test tonight.
Thanks.
It looks like there is a lot of hotplug activity when 'xset dpms force off' gets run. (That's not a typo. I do mean "off," not "on."
Yup, that's what I've seen as well -- do a mode set to turn stuff off and you get spammed with hotplug events.
See attached trace. perf rocks, even over ssh :)
Wow. You're nested about three deep in the mode setting code due to overlapping hotplug events. What could possibly go wrong?
Makes me optimistic that a bit of locking will help a lot here.
On Mon, Jul 25, 2011 at 1:37 PM, Jesse Barnes jbarnes@virtuousgeek.org wrote:
Doesn't help :(
When I do 'xset dpms force off', one of two things happens. Either the display comes back all by itself or it never comes back until I power cycle it.
If the display is generating a hotplug event after the dpms code drops the link, then with "drm/i915/dp: remove DPMS mode tracking from DP" applied the driver will try to bring the display back up. Maybe my display can't handle coming back up that quickly after being told to go to sleep, or maybe there's another bug.
Is the original patch supposed to bring the display up if the user unplugs it and re-plugs it? If so, why? And shouldn't a dpms off command at least stick until a hotplug event reports that the display isn't there?
--Andy
On Mon, 25 Jul 2011 22:52:15 -0400, Andrew Lutomirski luto@mit.edu wrote:
Doesn't help :(
Oh, it helps, just not this issue :-)
Oh. Right. Actual DPMS, as in power savings. So, I think Jesse and I were both wrong -- we *do* need to track DPMS in the DP driver so that we can skip link retraining when DPMS is off. The rest of the connection is disabled, so doing the link retraining may well confuse the DP system badly. Turning the monitor off and back on will generate suitable hot plug messages and presumably clean things up once the rest of the stack is running again.
Indeed you are correct -- we don't gate the retraining on whether the monitor is supposed to be running or not.
Is the original patch supposed to bring the display up if the user unplugs it and re-plugs it? If so, why?
Yes. In the absence of an external agent listening to the hotplug events (which is hard to arrange these days, unless you're running an X environment that doesn't watch for monitor hot-plug), the kernel will see the hot-plug event, decide that because the monitor is supposed to be running (has a CRTC assigned), it will go ahead and try to retrain the DP link.
The DPMS off should stick until DPMS is set back on -- hotplug shouldn't have any effect on DPMS.
Here's a patch which reverts the DPMS tracking, and then fixes the bug that it had -- you wouldn't get retraining on hotplug after the driver had been initialized because nothing in the mode setting path would set the dpms_mode to DRM_MODE_DPMS_ON, so the hotplug code would bail every time. With that fixed, this patch should work for you *and* for others.
Care to give it a try?
From 59b920597999381fab70c485c161dd50590e561a Mon Sep 17 00:00:00 2001 From: Keith Packard keithp@keithp.com Date: Mon, 25 Jul 2011 22:37:51 -0700 Subject: [PATCH] Revert and fix "drm/i915/dp: remove DPMS mode tracking from DP"
This reverts commit 885a50147f00a8a80108904bf58a18af357717f3.
We actually *do* need to track DPMS state so that on hotplug, we don't retrain the link until DPMS is disabled.
However, that code had a small bug -- it wouldn't set the dpms_mode at mode set time, and so link retraining would not actually occur on monitor hotplug until the monitor had gone through a DPMS off/DPMS on cycle. Setting dpms_mode to DRM_MODE_DPMS_ON in intel_dp_commit fixes that.
Signed-off-by: Keith Packard keithp@keithp.com --- drivers/gpu/drm/i915/intel_dp.c | 8 ++++++++ 1 files changed, 8 insertions(+), 0 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c index 9f134d2..4493641 100644 --- a/drivers/gpu/drm/i915/intel_dp.c +++ b/drivers/gpu/drm/i915/intel_dp.c @@ -50,6 +50,7 @@ struct intel_dp { bool has_audio; int force_audio; uint32_t color_range; + int dpms_mode; uint8_t link_bw; uint8_t lane_count; uint8_t dpcd[8]; @@ -1011,6 +1012,8 @@ static void intel_dp_commit(struct drm_encoder *encoder)
if (is_edp(intel_dp)) ironlake_edp_backlight_on(dev); + + intel_dp->dpms_mode = DRM_MODE_DPMS_ON; }
static void @@ -1045,6 +1048,7 @@ intel_dp_dpms(struct drm_encoder *encoder, int mode) if (is_edp(intel_dp)) ironlake_edp_backlight_on(dev); } + intel_dp->dpms_mode = mode; }
/* @@ -1591,6 +1595,9 @@ intel_dp_get_dpcd(struct intel_dp *intel_dp) static void intel_dp_check_link_status(struct intel_dp *intel_dp) { + if (intel_dp->dpms_mode != DRM_MODE_DPMS_ON) + return; + if (!intel_dp->base.base.crtc) return;
@@ -1939,6 +1946,7 @@ intel_dp_init(struct drm_device *dev, int output_reg) return;
intel_dp->output_reg = output_reg; + intel_dp->dpms_mode = -1;
intel_connector = kzalloc(sizeof(struct intel_connector), GFP_KERNEL); if (!intel_connector) {
On Tue, Jul 26, 2011 at 1:54 AM, Keith Packard keithp@keithp.com wrote:
Should that be some actual mode constant instead of -1?
In any case, this patch, applied manually on top of the struct_mutex fix, seems to work. xset dpms force off turns the display off briefly (presumably X or GNOME helpfully turns it back on), but if I let the display turn off on its own, it comes back.
There's still an obnoxious bug, though: I use audio over DP, and when the display turns off, the attached speaker crackles loudly for a second or two. I don't think it's a hardware problem with the monitor, because if I just pull the plug it doesn't make any noise. Should the driver do something to drop the audio link before turning off the DP link as a whole?
--Andy
Two things I've noticed: - Why not dev->mode_config.mutex? I was under the impression that mode_config.mutex protects most of the modesetting state and dev->struct_mutex protects things related to the gpu execution cores (i.e. all things gem), with struct_mutex nested within mode_config.mutex. It's hazy at the edges and likely broken in tons of corner cases, but still ... This has also the benefit that it won't stall execbuf. - And a nitpick: Why the dev_priv->dev indirection, when dev is already lying around?
-Daniel
On Tue, 26 Jul 2011 09:24:39 +0200, Daniel Vetter daniel@ffwll.ch wrote:
Two things I've noticed:
You're right, of course. I noticed that just after posting that version and updated it; the updated version is on my drm-intel-fixes branch already (having been reviewed by Jesse).
All nicely cleaned up by using &mode_config->mutex instead :-)
Thanks for looking it over!
On Tue, 26 Jul 2011 08:23:13 -0700 Keith Packard keithp@keithp.com wrote:
I'd like to amend my reviewed by and say the lock shouldn't be held around the call to the drm helper function. It queues some work that also takes the mode config lock, which will break. So you can drop it before that... Previously I had only checked our internal driver callbacks but missed that the lock was held across the helper too.
On Tue, 26 Jul 2011 12:12:25 -0700, Jesse Barnes jbarnes@virtuousgeek.org wrote:
So the work may get executed immediately rather than being run later at some point?
On Wed, 27 Jul 2011 02:21:24 -0700 Keith Packard keithp@keithp.com wrote:
It sure looks that way... but I don't remember any rule about work queue items having inter dependencies like this.
On Wed, 27 Jul 2011 09:03:31 -0700, Jesse Barnes jbarnes@virtuousgeek.org wrote:
Sounds fine; I've stuck a fixup that moves the mutex_unlock:
if (encoder->hot_plug) encoder->hot_plug(encoder);
+ mutex_unlock(&mode_config->mutex); + /* Just fire off a uevent and let userspace tell us what to do */ drm_helper_hpd_irq_event(dev); - - mutex_unlock(&mode_config->mutex); }
On Thu, Jul 28, 2011 at 03:50:00PM -0700, Keith Packard wrote:
I've checked the workqueue code and haven't found it to run a work immediately, it's always queued. Further this problem is very easy to diagnose: Even without lockdep the scheduler will notice the stuck task after about 120s and the backtrace should make matters extremely clear. On the other hand if somebody adds some nice state clobbering in the drm helper, we have a very hard bug to track down.
Generally modesetting isn't perf critical, so I vote for more locking, just in case. -Daniel
dri-devel@lists.freedesktop.org