The following patch series fixes the mutex corruption problem due to bit-copying of drm_crtc structure that happens when drm_crtc_helper_set_config function takes the error-recovery path. The patch series is the alternative solution for the patch that was proposed and NACK-ed two weeks ago [1].
The actual fix is implemented in patch #6; preceding 5 patches are necessary prerequisites.
A couple of my own remarks:
1) Someone (including myself) may be tempted to eliminate the bit-copy for encoder and connector structures as well. I would, however, prefer to leave that improvement for a different patch series. The primary reason is that this patch series addresses an acute (and serious) problem (mutex corruption), while doing the equivalent rework for connector and encoder structure would be only for the sake of improving the code quality.
2) The problem exists in old (stable@...) kernels and my original intention (in the patch that was NACK-ed [1]) was to make the fix simple enough to be eligible for stable@.... This patch series is now probably more complex than what stable@... may be willing to accept. So the question in my mind is what we should do with the old kernels.
3) This patch series does not include the fix for incidental finding described in earlier discussions about this problem [2] because it's not the part of the fix that this patch series is targeting, so I'd leave it for later.
regards,
Ilija
[1] http://lists.freedesktop.org/archives/dri-devel/2013-October/047479.html [2] http://lists.freedesktop.org/archives/dri-devel/2013-October/047557.html
Signed-off-by: Ilija Hadzic ihadzic@research.bell-labs.com --- drivers/gpu/drm/drm_crtc_helper.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_crtc_helper.c b/drivers/gpu/drm/drm_crtc_helper.c index dbcd687..d0ac595 100644 --- a/drivers/gpu/drm/drm_crtc_helper.c +++ b/drivers/gpu/drm/drm_crtc_helper.c @@ -814,8 +814,7 @@ int drm_crtc_helper_set_config(struct drm_mode_set *set) set->crtc->y = set->y;
old_fb = set->crtc->fb; - if (set->crtc->fb != set->fb) - set->crtc->fb = set->fb; + set->crtc->fb = set->fb; ret = crtc_funcs->mode_set_base(set->crtc, set->x, set->y, old_fb); if (ret != 0) {
Old framebuffer is stored in save_set.fb and it is the same value that is later stored in old_fb. This makes old_fb redundant so we can replace it with save_set.fb.
Signed-off-by: Ilija Hadzic ihadzic@research.bell-labs.com --- drivers/gpu/drm/drm_crtc_helper.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/drm_crtc_helper.c b/drivers/gpu/drm/drm_crtc_helper.c index d0ac595..b06a6c4 100644 --- a/drivers/gpu/drm/drm_crtc_helper.c +++ b/drivers/gpu/drm/drm_crtc_helper.c @@ -595,7 +595,6 @@ int drm_crtc_helper_set_config(struct drm_mode_set *set) struct drm_device *dev; struct drm_crtc *save_crtcs, *new_crtc, *crtc; struct drm_encoder *save_encoders, *new_encoder, *encoder; - struct drm_framebuffer *old_fb = NULL; bool mode_changed = false; /* if true do a full mode set */ bool fb_changed = false; /* if true and !mode_changed just do a flip */ struct drm_connector *save_connectors, *connector; @@ -790,14 +789,13 @@ int drm_crtc_helper_set_config(struct drm_mode_set *set) DRM_DEBUG_KMS("attempting to set mode from" " userspace\n"); drm_mode_debug_printmodeline(set->mode); - old_fb = set->crtc->fb; set->crtc->fb = set->fb; if (!drm_crtc_helper_set_mode(set->crtc, set->mode, set->x, set->y, - old_fb)) { + save_set.fb)) { DRM_ERROR("failed to set mode on [CRTC:%d]\n", set->crtc->base.id); - set->crtc->fb = old_fb; + set->crtc->fb = save_set.fb; ret = -EINVAL; goto fail; } @@ -812,13 +810,11 @@ int drm_crtc_helper_set_config(struct drm_mode_set *set) } else if (fb_changed) { set->crtc->x = set->x; set->crtc->y = set->y; - - old_fb = set->crtc->fb; set->crtc->fb = set->fb; ret = crtc_funcs->mode_set_base(set->crtc, - set->x, set->y, old_fb); + set->x, set->y, save_set.fb); if (ret != 0) { - set->crtc->fb = old_fb; + set->crtc->fb = save_set.fb; goto fail; } }
Signed-off-by: Ilija Hadzic ihadzic@research.bell-labs.com --- drivers/gpu/drm/drm_crtc_helper.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/drm_crtc_helper.c b/drivers/gpu/drm/drm_crtc_helper.c index b06a6c4..65f3658 100644 --- a/drivers/gpu/drm/drm_crtc_helper.c +++ b/drivers/gpu/drm/drm_crtc_helper.c @@ -814,6 +814,8 @@ int drm_crtc_helper_set_config(struct drm_mode_set *set) ret = crtc_funcs->mode_set_base(set->crtc, set->x, set->y, save_set.fb); if (ret != 0) { + set->crtc->x = save_set.x; + set->crtc->y = save_set.y; set->crtc->fb = save_set.fb; goto fail; }
There is no need to save or restore hwmode field, because by the time this function sets this field, it cannot fail any more. However, we should save old enabled field because if the function fails, we want to return with unchanged CRTC.
Signed-off-by: Ilija Hadzic ihadzic@research.bell-labs.com --- drivers/gpu/drm/drm_crtc_helper.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/drm_crtc_helper.c b/drivers/gpu/drm/drm_crtc_helper.c index 65f3658..9e536b1 100644 --- a/drivers/gpu/drm/drm_crtc_helper.c +++ b/drivers/gpu/drm/drm_crtc_helper.c @@ -405,22 +405,25 @@ bool drm_crtc_helper_set_mode(struct drm_crtc *crtc, struct drm_framebuffer *old_fb) { struct drm_device *dev = crtc->dev; - struct drm_display_mode *adjusted_mode, saved_mode, saved_hwmode; + struct drm_display_mode *adjusted_mode, saved_mode; struct drm_crtc_helper_funcs *crtc_funcs = crtc->helper_private; struct drm_encoder_helper_funcs *encoder_funcs; int saved_x, saved_y; + bool saved_enabled; struct drm_encoder *encoder; bool ret = true;
+ saved_enabled = crtc->enabled; crtc->enabled = drm_helper_crtc_in_use(crtc); if (!crtc->enabled) return true;
adjusted_mode = drm_mode_duplicate(dev, mode); - if (!adjusted_mode) + if (!adjusted_mode) { + crtc->enabled = saved_enabled; return false; + }
- saved_hwmode = crtc->hwmode; saved_mode = crtc->mode; saved_x = crtc->x; saved_y = crtc->y; @@ -539,7 +542,7 @@ bool drm_crtc_helper_set_mode(struct drm_crtc *crtc, done: drm_mode_destroy(dev, adjusted_mode); if (!ret) { - crtc->hwmode = saved_hwmode; + crtc->enabled = saved_enabled; crtc->mode = saved_mode; crtc->x = saved_x; crtc->y = saved_y;
There is no need to set crtc->enabled field in drm_crtc_helper_set_config. This is already done (and properly restored in case of failure) in drm_crtc_helper_set_mode that is called by drm_crtc_helper_set_config. Doing it at only one place makes restoration in case of failure easier.
Signed-off-by: Ilija Hadzic ihadzic@research.bell-labs.com --- drivers/gpu/drm/drm_crtc_helper.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_crtc_helper.c b/drivers/gpu/drm/drm_crtc_helper.c index 9e536b1..a1322af 100644 --- a/drivers/gpu/drm/drm_crtc_helper.c +++ b/drivers/gpu/drm/drm_crtc_helper.c @@ -787,8 +787,7 @@ int drm_crtc_helper_set_config(struct drm_mode_set *set) mode_changed = true;
if (mode_changed) { - set->crtc->enabled = drm_helper_crtc_in_use(set->crtc); - if (set->crtc->enabled) { + if (drm_helper_crtc_in_use(set->crtc)) { DRM_DEBUG_KMS("attempting to set mode from" " userspace\n"); drm_mode_debug_printmodeline(set->mode);
Bit-copying restoration of CRTC structure in failure-recovery path of drm_crtc_helper_set_config function evokes a subtle and rare, but very dangerous, corruption of CRTC mutex structure.
Namely, if drm_crtc_helper_set_config takes the path under 'fail:' label *and* some other process has attempted to grab the crtc mutex (and got blocked), restoring the CRTC structure by bit-copying it will overwrite the CRTC mutex state and the waiters list pointer within the mutex structure. Consequently the blocked process will never be scheduled.
This patch fixes the issue by eliminating the bit-copy restoration. The elimination is possible because previous patches have cleaned up the resoration path so that only the fields touched by the drm_crtc_helper_set_config function are saved and restored if necessary.
Signed-off-by: Ilija Hadzic ihadzic@research.bell-labs.com --- drivers/gpu/drm/drm_crtc_helper.c | 33 ++++++++------------------------- 1 file changed, 8 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/drm_crtc_helper.c b/drivers/gpu/drm/drm_crtc_helper.c index a1322af..d171032 100644 --- a/drivers/gpu/drm/drm_crtc_helper.c +++ b/drivers/gpu/drm/drm_crtc_helper.c @@ -596,7 +596,7 @@ drm_crtc_helper_disable(struct drm_crtc *crtc) int drm_crtc_helper_set_config(struct drm_mode_set *set) { struct drm_device *dev; - struct drm_crtc *save_crtcs, *new_crtc, *crtc; + struct drm_crtc *new_crtc; struct drm_encoder *save_encoders, *new_encoder, *encoder; bool mode_changed = false; /* if true do a full mode set */ bool fb_changed = false; /* if true and !mode_changed just do a flip */ @@ -633,38 +633,28 @@ int drm_crtc_helper_set_config(struct drm_mode_set *set)
dev = set->crtc->dev;
- /* Allocate space for the backup of all (non-pointer) crtc, encoder and - * connector data. */ - save_crtcs = kzalloc(dev->mode_config.num_crtc * - sizeof(struct drm_crtc), GFP_KERNEL); - if (!save_crtcs) - return -ENOMEM; - + /* + * Allocate space for the backup of all (non-pointer) encoder and + * connector data. + */ save_encoders = kzalloc(dev->mode_config.num_encoder * sizeof(struct drm_encoder), GFP_KERNEL); - if (!save_encoders) { - kfree(save_crtcs); + if (!save_encoders) return -ENOMEM; - }
save_connectors = kzalloc(dev->mode_config.num_connector * sizeof(struct drm_connector), GFP_KERNEL); if (!save_connectors) { - kfree(save_crtcs); kfree(save_encoders); return -ENOMEM; }
- /* Copy data. Note that driver private data is not affected. + /* + * Copy data. Note that driver private data is not affected. * Should anything bad happen only the expected state is * restored, not the drivers personal bookkeeping. */ count = 0; - list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) { - save_crtcs[count++] = *crtc; - } - - count = 0; list_for_each_entry(encoder, &dev->mode_config.encoder_list, head) { save_encoders[count++] = *encoder; } @@ -825,17 +815,11 @@ int drm_crtc_helper_set_config(struct drm_mode_set *set)
kfree(save_connectors); kfree(save_encoders); - kfree(save_crtcs); return 0;
fail: /* Restore all previous data. */ count = 0; - list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) { - *crtc = save_crtcs[count++]; - } - - count = 0; list_for_each_entry(encoder, &dev->mode_config.encoder_list, head) { *encoder = save_encoders[count++]; } @@ -853,7 +837,6 @@ fail:
kfree(save_connectors); kfree(save_encoders); - kfree(save_crtcs); return ret; } EXPORT_SYMBOL(drm_crtc_helper_set_config);
On Tue, Oct 29, 2013 at 11:09:40AM -0400, Ilija Hadzic wrote:
The following patch series fixes the mutex corruption problem due to bit-copying of drm_crtc structure that happens when drm_crtc_helper_set_config function takes the error-recovery path. The patch series is the alternative solution for the patch that was proposed and NACK-ed two weeks ago [1].
On the series:
Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
Aside: The horribly ad-hoc calling convetions with some of the (x, y, fb, mode) parameters being set before calling a lower-level functions, some being restored, some of them being the old backup value and some of them being expected to be updated by the called function really gives me the creeps.
But fixing that is something for a _really_ slow week (month even ...).
The actual fix is implemented in patch #6; preceding 5 patches are necessary prerequisites.
Hm, I don't really see why patches 1,2&4 are required. If we reorder them to the end of the series as follow-up cleanups then we'd only need to backport 3 patches. Which is imo reasonable. -Daniel
A couple of my own remarks:
- Someone (including myself) may be tempted to eliminate the
bit-copy for encoder and connector structures as well. I would, however, prefer to leave that improvement for a different patch series. The primary reason is that this patch series addresses an acute (and serious) problem (mutex corruption), while doing the equivalent rework for connector and encoder structure would be only for the sake of improving the code quality.
- The problem exists in old (stable@...) kernels and my original intention
(in the patch that was NACK-ed [1]) was to make the fix simple enough to be eligible for stable@.... This patch series is now probably more complex than what stable@... may be willing to accept. So the question in my mind is what we should do with the old kernels.
- This patch series does not include the fix for incidental finding
described in earlier discussions about this problem [2] because it's not the part of the fix that this patch series is targeting, so I'd leave it for later.
regards,
Ilija
[1] http://lists.freedesktop.org/archives/dri-devel/2013-October/047479.html [2] http://lists.freedesktop.org/archives/dri-devel/2013-October/047557.html
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
On Tue, 29 Oct 2013, Daniel Vetter wrote:
Aside: The horribly ad-hoc calling convetions with some of the (x, y, fb, mode) parameters being set before calling a lower-level functions, some being restored, some of them being the old backup value and some of them being expected to be updated by the called function really gives me the creeps.
But fixing that is something for a _really_ slow week (month even ...).
I agree, I don't like it either. It's too tangled-up, but fixing it will probably be a major rework.
The actual fix is implemented in patch #6; preceding 5 patches are necessary prerequisites.
Hm, I don't really see why patches 1,2&4 are required. If we reorder them to the end of the series as follow-up cleanups then we'd only need to backport 3 patches. Which is imo reasonable.
1 and 2 could indeed be left out, but the end-result will look really ugly (e.g., x and y restoration will come from save_set.x and save_set.y, while frame buffer restoration will come from old_fb (and IMO, it's always better to first cleanup the code that one is about to touch and then make functional changes).
Patch 4, however, is required because of saving and restoration of 'enabled' flag, but it could be split in two: the required part that restores the enabled flag that restores only the the 'enabled' flag and the cleanup part that eliminates unecessary restoration of hwmode field.
-- Ilija
On Tue, Oct 29, 2013 at 8:22 PM, Ilija Hadzic ihadzic@research.bell-labs.com wrote:
The actual fix is implemented in patch #6; preceding 5 patches are necessary prerequisites.
Hm, I don't really see why patches 1,2&4 are required. If we reorder them to the end of the series as follow-up cleanups then we'd only need to backport 3 patches. Which is imo reasonable.
1 and 2 could indeed be left out, but the end-result will look really ugly (e.g., x and y restoration will come from save_set.x and save_set.y, while frame buffer restoration will come from old_fb (and IMO, it's always better to first cleanup the code that one is about to touch and then make functional changes).
Patch 4, however, is required because of saving and restoration of 'enabled' flag, but it could be split in two: the required part that restores the enabled flag that restores only the the 'enabled' flag and the cleanup part that eliminates unecessary restoration of hwmode field
Oh right, I've forgotten that between the review and writing the mail ;-) I guess we could try to bend the stable rules a bit and just submit all 6. It's a regression fix after all, and at least personally I prefer the most minimal backports to avoid diverging between upstream and stable kernel branches.
But I guess that's Dave's call to make. -Daniel
On Tue, 29 Oct 2013, Daniel Vetter wrote:
Oh right, I've forgotten that between the review and writing the mail ;-) I guess we could try to bend the stable rules a bit and just submit all 6. It's a regression fix after all, and at least personally I prefer the most minimal backports to avoid diverging between upstream and stable kernel branches.
But I guess that's Dave's call to make.
How about taking the patch series into drm-next and marking patch #6 only with CC: stable@ and specifying hashes of the prerequisite 5 patches for cherry-pick?
I think that is fully compliant with stable rules (at least that's my understanding of line 41 of Documentation/stable_kernel_rules.txt) and would not be bending anything.
I presume that if this is acceptable, Dave would have to add stable@ tags for cherry-picking because hash values may change in his tree if drm-next changes relative to my base (which is state of drm-next as of yesterday PM).
-- Ilija
dri-devel@lists.freedesktop.org