Although in real life we could never fail to grab the newly created mutex, ww_mutex fault injection has no way to know this. As a result, kernels built with CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y might fail to acquire the new crtc lock, which results in bad things when the locks are dropped.
See: https://bugzilla.kernel.org/show_bug.cgi?id=83341
Signed-off-by: Rob Clark <robdclark@gmail.com>
---
 drivers/gpu/drm/drm_crtc.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
index 7d7c1fd..8bb11fa 100644
--- a/drivers/gpu/drm/drm_crtc.c
+++ b/drivers/gpu/drm/drm_crtc.c
@@ -682,7 +682,15 @@ int drm_crtc_init_with_planes(struct drm_device *dev, struct drm_crtc *crtc,
 	drm_modeset_lock_all(dev);
 	drm_modeset_lock_init(&crtc->mutex);
 	/* dropped by _unlock_all(): */
-	drm_modeset_lock(&crtc->mutex, config->acquire_ctx);
+	/* NOTE: use trylock here for the benefit of ww_mutex fault
+	 * injection. We cannot actually fail to grab this lock (as
+	 * it has only just been created), but fault injection does
+	 * not know this, which can result in the this lock failing,
+	 * and hilarity when we later try to drop the locks. See:
+	 * https://bugzilla.kernel.org/show_bug.cgi?id=83341
+	 */
+	ret = ww_mutex_trylock(&crtc->mutex.mutex);
+	WARN_ON(ret);
 
 	ret = drm_mode_object_get(dev, &crtc->base, DRM_MODE_OBJECT_CRTC);
 	if (ret)
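
The failure mode described in the commit message can be modelled in plain userspace C. The sketch below is not kernel code: `toy_lock`, `toy_lock_ctx()`, `toy_trylock()`, `toy_unlock()` and the `inject_fault` flag are hypothetical stand-ins for a ww_mutex, a context-based acquire that fault injection may fail with -EDEADLK, a trylock that is not subject to injection, and an unlock that reports misuse. It assumes the mutex_trylock() return convention (1 on success, 0 on contention).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a ww_mutex; not a kernel type. */
struct toy_lock {
	bool held;
};

/* Set to true to imitate CONFIG_DEBUG_WW_MUTEX_SLOWPATH fault injection. */
static bool inject_fault;

static void toy_lock_init(struct toy_lock *l)
{
	l->held = false;
}

/*
 * Context-based acquire: may report -EDEADLK even on an uncontended lock
 * when fault injection is active, so the caller has to check the result.
 */
static int toy_lock_ctx(struct toy_lock *l)
{
	if (inject_fault)
		return -EDEADLK;
	assert(!l->held);	/* single-threaded model: never contended */
	l->held = true;
	return 0;
}

/* Trylock: 1 on success, 0 if already held; never subject to injection. */
static int toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return 0;
	l->held = true;
	return 1;
}

/* Returns 0 on success, -EPERM if the lock was not held. */
static int toy_unlock(struct toy_lock *l)
{
	if (!l->held)
		return -EPERM;
	l->held = false;
	return 0;
}

/* Init path that drops the result of the context-based acquire. */
static int init_ignoring_result(struct toy_lock *l)
{
	toy_lock_init(l);
	(void)toy_lock_ctx(l);		/* result ignored */
	return toy_unlock(l);		/* fails if injection hit */
}

/* Init path that uses trylock on the freshly created lock. */
static int init_with_trylock(struct toy_lock *l)
{
	toy_lock_init(l);
	if (!toy_trylock(l))
		return -EBUSY;		/* cannot happen for a brand-new lock */
	return toy_unlock(l);
}
```

With `inject_fault` set, `init_ignoring_result()` returns -EPERM because the unlock finds the lock not held, while `init_with_trylock()` still returns 0. Note the return convention in this model: since the trylock returns 1 on success, a failure check has to test for a zero result.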
On Fri, Sep 05, 2014 at 07:59:45AM -0400, Rob Clark wrote:
> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
> index 7d7c1fd..8bb11fa 100644
> --- a/drivers/gpu/drm/drm_crtc.c
> +++ b/drivers/gpu/drm/drm_crtc.c
> @@ -682,7 +682,15 @@ int drm_crtc_init_with_planes(struct drm_device *dev, struct drm_crtc *crtc,
>  	drm_modeset_lock_all(dev);
>  	drm_modeset_lock_init(&crtc->mutex);
>  	/* dropped by _unlock_all(): */
> -	drm_modeset_lock(&crtc->mutex, config->acquire_ctx);
> +	/* NOTE: use trylock here for the benefit of ww_mutex fault
> +	 * injection. We cannot actually fail to grab this lock (as
> +	 * it has only just been created), but fault injection does
> +	 * not know this, which can result in the this lock failing,
> +	 * and hilarity when we later try to drop the locks. See:
> +	 * https://bugzilla.kernel.org/show_bug.cgi?id=83341
> +	 */
> +	ret = ww_mutex_trylock(&crtc->mutex.mutex);
> +	WARN_ON(ret);
Hm, I thought that in our quick discussion on IRC we agreed that the locking here in the init path is useless anyway and best dropped? Not just the crtc locking, but the entire modeset_lock_all.
-Daniel
>  	ret = drm_mode_object_get(dev, &crtc->base, DRM_MODE_OBJECT_CRTC);
>  	if (ret)
> --
> 1.9.3
On Fri, Sep 5, 2014 at 8:25 AM, Daniel Vetter daniel@ffwll.ch wrote:
> Hm, I thought that in our quick discussion on IRC we agreed that the locking here in the init path is useless anyway and best dropped? Not just the crtc locking, but the entire modeset_lock_all.
Well, 0day appears to disagree with you. I still think we should go the trylock route for 3.17, as it is the more conservative patch.
I'm not against getting rid of that locking (which is in fact overkill) once the other fallout is fixed up. But that seems more like a merge-window thing, so it is probably best to wait for 3.18.
BR,
-R
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On 09/07/2014 05:02 PM, Rob Clark wrote:
> Well, 0day appears to disagree with you. I still think we should go the trylock route for 3.17, as it is the more conservative patch.
> I'm not against getting rid of that locking (which is in fact overkill) once the other fallout is fixed up. But that seems more like a merge-window thing, so it is probably best to wait for 3.18.
> BR,
> -R
FWIW, Reviewed-by: Thomas Hellstrom thellstrom@vmware.com
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
At driver init no one can access modeset objects and we're single-threaded. So locking is just cargo-culting here. Worse, with the new ww mutexes and ww mutex slowpath debugging the mutex_lock might actually fail, and we don't have the full-blown ww recovery dance.
Which then leads to fireworks when we try to unlock the not-locked crtc lock.
An audit of all the functions called from here shows that none of them contain locking checks, so there's also no reason to keep the locking around just for consistency of caller contexts. Besides that I have the rule (at least in i915) that such places where we take locks just to simplify locking checks and not for correctness always require a comment.
This regression was introduced in
commit 51fd371bbaf94018a1223b4e2cf20b9880fd92d4
Author: Rob Clark <robdclark@gmail.com>
Date:   Tue Nov 19 12:10:12 2013 -0500

    drm: convert crtc and connection_mutex to ww_mutex (v5)

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=83341
Cc: Rob Clark <robdclark@gmail.com>
Cc: thellstrom@vmware.com
Cc: maarten.lankhorst@canonical.com
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/drm_crtc.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
index 7d7c1fd15443..269d2990c180 100644
--- a/drivers/gpu/drm/drm_crtc.c
+++ b/drivers/gpu/drm/drm_crtc.c
@@ -679,11 +679,6 @@ int drm_crtc_init_with_planes(struct drm_device *dev, struct drm_crtc *crtc,
 	crtc->funcs = funcs;
 	crtc->invert_dimensions = false;
 
-	drm_modeset_lock_all(dev);
-	drm_modeset_lock_init(&crtc->mutex);
-	/* dropped by _unlock_all(): */
-	drm_modeset_lock(&crtc->mutex, config->acquire_ctx);
-
 	ret = drm_mode_object_get(dev, &crtc->base, DRM_MODE_OBJECT_CRTC);
 	if (ret)
 		goto out;
@@ -701,7 +696,6 @@ int drm_crtc_init_with_planes(struct drm_device *dev, struct drm_crtc *crtc,
 		cursor->possible_crtcs = 1 << drm_crtc_index(crtc);
 
  out:
-	drm_modeset_unlock_all(dev);
 
 	return ret;
 }
At driver init no one can access modeset objects and we're single-threaded. So locking is just cargo-culting here. Worse, with the new ww mutexes and ww mutex slowpath debugging the mutex_lock might actually fail, and we don't have the full-blown ww recovery dance.
Which then leads to fireworks when we try to unlock the not-locked crtc lock.
An audit of all the functions called from here shows that none of them contain locking checks, so there's also no reason to keep the locking around just for consistency of caller contexts. Besides that I have the rule (at least in i915) that such places where we take locks just to simplify locking checks and not for correctness always require a comment.
This regression was introduced in
commit 51fd371bbaf94018a1223b4e2cf20b9880fd92d4
Author: Rob Clark <robdclark@gmail.com>
Date:   Tue Nov 19 12:10:12 2013 -0500

    drm: convert crtc and connection_mutex to ww_mutex (v5)

v2: Don't drop the lock_init call, spotted by the 0day builder.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=83341
Cc: Rob Clark <robdclark@gmail.com>
Cc: thellstrom@vmware.com
Cc: maarten.lankhorst@canonical.com
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/drm_crtc.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
index e2f4b8c21440..b1271a8d8ce7 100644
--- a/drivers/gpu/drm/drm_crtc.c
+++ b/drivers/gpu/drm/drm_crtc.c
@@ -679,11 +679,7 @@ int drm_crtc_init_with_planes(struct drm_device *dev, struct drm_crtc *crtc,
 	crtc->funcs = funcs;
 	crtc->invert_dimensions = false;
 
-	drm_modeset_lock_all(dev);
 	drm_modeset_lock_init(&crtc->mutex);
-	/* dropped by _unlock_all(): */
-	drm_modeset_lock(&crtc->mutex, config->acquire_ctx);
-
 	ret = drm_mode_object_get(dev, &crtc->base, DRM_MODE_OBJECT_CRTC);
 	if (ret)
 		goto out;
@@ -701,7 +697,6 @@ int drm_crtc_init_with_planes(struct drm_device *dev, struct drm_crtc *crtc,
 		cursor->possible_crtcs = 1 << drm_crtc_index(crtc);
 
  out:
-	drm_modeset_unlock_all(dev);
 
 	return ret;
 }
On 09/08/2014 09:03 AM, Daniel Vetter wrote:
> At driver init no one can access modeset objects and we're single-threaded. So locking is just cargo-culting here. Worse, with the new ww mutexes and ww mutex slowpath debugging the mutex_lock might actually fail, and we don't have the full-blown ww recovery dance.
>
> Which then leads to fireworks when we try to unlock the not-locked crtc lock.
>
> An audit of all the functions called from here shows that none of them contain locking checks, so there's also no reason to keep the locking around just for consistency of caller contexts. Besides that I have the rule (at least in i915) that such places where we take locks just to simplify locking checks and not for correctness always require a comment.
I'm not really opposed to any of the patches. It's clear that trylock will work, and it's also clear that locking is not strictly needed, at least not of a lock that has not been published yet.
However, I tend to go for the "lock even if it's unnecessary" version for a couple of reasons:
a) If that turns out to be impossible or very hard, then something is probably wrong with the design.
b) It's good to think of locks where possible as "protecting data" rather than serializing something. With that in mind, and if we in the future were to have tools to automatically check that relevant locks are held while accessing lock-protected stuff, we're in trouble.
c) Even if there aren't any functions now that check for relevant locks held, there might be in the future.
d) People will probably spend time wondering why locking is done elsewhere but not here.
So at least considering d) and b) I'd like to see documentation the other way around: If we avoid taking locks around data accesses that are supposed to be protected by the lock for whatever reason, the reason should be documented.
Thanks, Thomas
On Mon, Sep 08, 2014 at 02:57:23PM +0200, Thomas Hellstrom wrote:
> I'm not really opposed to any of the patches. It's clear that trylock will work, and it's also clear that locking is not strictly needed, at least not of a lock that has not been published yet.
>
> However, I tend to go for the "lock even if it's unnecessary" version for a couple of reasons:
>
> a) If that turns out to be impossible or very hard, then something is probably wrong with the design.
> b) It's good to think of locks where possible as "protecting data" rather than serializing something. With that in mind, and if we in the future were to have tools to automatically check that relevant locks are held while accessing lock-protected stuff, we're in trouble.
> c) Even if there aren't any functions now that check for relevant locks held, there might be in the future.
> d) People will probably spend time wondering why locking is done elsewhere but not here.
>
> So at least considering d) and b) I'd like to see documentation the other way around: If we avoid taking locks around data accesses that are supposed to be protected by the lock for whatever reason, the reason should be documented.
Well, my argument nowadays is that you should never abuse locking to enforce ordering constraints. And an object that's just getting initialized but isn't published anywhere really has no reason to have locking for data consistency, so holding a lock there is really only justified if it makes the "protecting data" self-checks easier. E.g. we have a bunch of checks about manipulating the connector mode list all over, so init code must hold the relevant locks.
Also, ime locking sprawl (which happens especially with big locks like dev->struct_mutex) is really bad for long-term maintainability, so by default I want as little locking as possible.
I guess a comment can make sense if there's no locking to explain the odd case. But if everything is sanely designed then imo pure init functions really should have comments about locks they take and not if they take no locks. Since with a sane design the no-locks case really should be the default. And if that makes people question the locking scheme and learn something about "locking for ordering" vs. "locking to protect data" that's a feature ;-) Note that this is specifically about init/teardown (suspend/resume is similar), which is all about ordering and otherwise rather single-threaded (at least if you don't botch up the synchronization with async workers on teardown).
So at least in i915 I'll keep on rejecting patches that add random cargo-culted locking to init/shutdown paths.
-Daniel
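
The point about an object that is being initialized but not yet published can be illustrated with a small userspace sketch. It uses pthreads rather than kernel locking, and every name in it (`toy_crtc`, `toy_registry`, `toy_crtc_init()`, `toy_registry_publish()`) is hypothetical. It is also a more conservative variant than the patch under discussion: the patch drops locking from the init function altogether on the grounds that driver init is single-threaded, whereas this sketch still takes a lock for the one step that touches shared data.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical object; 'mutex' protects 'enabled' once it is published. */
struct toy_crtc {
	pthread_mutex_t mutex;
	int enabled;
	struct toy_crtc *next;
};

/* Hypothetical shared registry; 'lock' protects 'head' and 'count'. */
struct toy_registry {
	pthread_mutex_t lock;
	struct toy_crtc *head;
	int count;
};

static void toy_registry_init(struct toy_registry *reg)
{
	pthread_mutex_init(&reg->lock, NULL);
	reg->head = NULL;
	reg->count = 0;
}

/* No locking here: nobody else can reach the object yet. */
static void toy_crtc_init(struct toy_crtc *crtc)
{
	pthread_mutex_init(&crtc->mutex, NULL);
	crtc->enabled = 0;
	crtc->next = NULL;
}

/* Publishing modifies shared data, so this is where the lock is taken. */
static void toy_registry_publish(struct toy_registry *reg,
				 struct toy_crtc *crtc)
{
	pthread_mutex_lock(&reg->lock);
	crtc->next = reg->head;
	reg->head = crtc;
	reg->count++;
	pthread_mutex_unlock(&reg->lock);
}
```

The lock in `toy_registry_publish()` is there to protect the registry's data, not to order the initialization, which matches the "locking to protect data" versus "locking for ordering" distinction drawn above.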