This is more of RFC than a proper patch.
I was looking at https://bugzilla.kernel.org/show_bug.cgi?id=15673 and I wrote a script to show where we call drm_gem_object_unreference() without holding struct_mutex and it showed these two places.
There call tree to here is: i915_load_modeset_init() => intel_modeset_init() => intel_setup_overlay()
"reg_bo" is a local variable so we could probably leave that as drm_gem_object_unreference() although I changed it in this patch.
The i915_gem_object_pin() function calls i915_gem_object_bind_to_gtt() where we start unreferencing stuff if we're low on space. So possibly that lock is needed. The call tree is:
i915_gem_object_pin() => i915_gem_object_bind_to_gtt() => i915_gem_evict_something() => i915_gem_retire_requests() => i915_gem_retire_request() => drm_gem_object_unreference()
On the other hand, this is all in the module init so it seems like locking shouldn't be necessary, and certainly I can't see how it would cause the suspend bug.
I don't have this hardware btw, so I can't actually test this patch.
Signed-off-by: Dan Carpenter error27@gmail.com
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c index 6d524a1..a103582 100644 --- a/drivers/gpu/drm/i915/intel_overlay.c +++ b/drivers/gpu/drm/i915/intel_overlay.c @@ -1347,7 +1347,9 @@ void intel_setup_overlay(struct drm_device *dev) overlay->reg_bo = to_intel_bo(reg_bo);
if (OVERLAY_NONPHYSICAL(dev)) { + mutex_lock(&dev->struct_mutex); ret = i915_gem_object_pin(reg_bo, PAGE_SIZE); + mutex_unlock(&dev->struct_mutex); if (ret) { DRM_ERROR("failed to pin overlay register bo\n"); goto out_free_bo; @@ -1385,7 +1387,7 @@ void intel_setup_overlay(struct drm_device *dev) return;
out_free_bo: - drm_gem_object_unreference(reg_bo); + drm_gem_object_unreference_unlocked(reg_bo); out_free: kfree(overlay); return;
I changed the drm_gem_object_unreference() to drm_gem_object_unreference_unlocked().
"reg_bo" is a local variable so there are no synchronization issues, but the problem is that if struct_mutex isn't held, it triggers a BUG_ON() in drm_gem_object_free().
I also took the "struct_mutex" when calling i915_gem_object_pin(). The i915_gem_object_pin() function calls i915_gem_object_bind_to_gtt() which unreferences stuff if we are low on gtt space. The truth is that if we run out of gtt space on this code path, we're sunk already regardless of the locking. But we should take the lock anyway for consistency and to make the static checkers happy.
That call tree is: i915_gem_object_pin() => i915_gem_object_bind_to_gtt() => i915_gem_evict_something() => i915_gem_retire_requests() => i915_gem_retire_request() => drm_gem_object_unreference()
intel_setup_overlay() is always called with "struct_mutex" unlocked. There is only one call tree and it is:
i915_load_modeset_init() => intel_modeset_init() => intel_setup_overlay()
Signed-off-by: Dan Carpenter error27@gmail.com --- V2: The code is the same, but the change log is updated. No one has complained about hitting the BUG_ON() and I can't tested it myself, so this patch is probably linux-next material.
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c index 6d524a1..a103582 100644 --- a/drivers/gpu/drm/i915/intel_overlay.c +++ b/drivers/gpu/drm/i915/intel_overlay.c @@ -1347,7 +1347,9 @@ void intel_setup_overlay(struct drm_device *dev) overlay->reg_bo = to_intel_bo(reg_bo);
if (OVERLAY_NONPHYSICAL(dev)) { + mutex_lock(&dev->struct_mutex); ret = i915_gem_object_pin(reg_bo, PAGE_SIZE); + mutex_unlock(&dev->struct_mutex); if (ret) { DRM_ERROR("failed to pin overlay register bo\n"); goto out_free_bo; @@ -1385,7 +1387,7 @@ void intel_setup_overlay(struct drm_device *dev) return;
out_free_bo: - drm_gem_object_unreference(reg_bo); + drm_gem_object_unreference_unlocked(reg_bo); out_free: kfree(overlay); return;
This patch never made it into 2.6.35 but it's still valid. Can you pick it up for 2.6.36? It still applies cleanly to 2.6.35-rc3 with fuzz (8 lines offset).
regards, dan carpenter
On Mon, May 03, 2010 at 09:58:32AM +0200, Dan Carpenter wrote:
I changed the drm_gem_object_unreference() to drm_gem_object_unreference_unlocked().
"reg_bo" is a local variable so there are no synchronization issues, but the problem is that if struct_mutex isn't held, it triggers a BUG_ON() in drm_gem_object_free().
I also took the "struct_mutex" when calling i915_gem_object_pin(). The i915_gem_object_pin() function calls i915_gem_object_bind_to_gtt() which unreferences stuff if we are low on gtt space. The truth is that if we run out of gtt space on this code path, we're sunk already regardless of the locking. But we should take the lock anyway for consistency and to make the static checkers happy.
That call tree is: i915_gem_object_pin() => i915_gem_object_bind_to_gtt() => i915_gem_evict_something() => i915_gem_retire_requests() => i915_gem_retire_request() => drm_gem_object_unreference()
intel_setup_overlay() is always called with "struct_mutex" unlocked. There is only one call tree and it is:
i915_load_modeset_init() => intel_modeset_init() => intel_setup_overlay()
Signed-off-by: Dan Carpenter error27@gmail.com
V2: The code is the same, but the change log is updated. No one has complained about hitting the BUG_ON() and I can't tested it myself, so this patch is probably linux-next material.
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c index 6d524a1..a103582 100644 --- a/drivers/gpu/drm/i915/intel_overlay.c +++ b/drivers/gpu/drm/i915/intel_overlay.c @@ -1347,7 +1347,9 @@ void intel_setup_overlay(struct drm_device *dev) overlay->reg_bo = to_intel_bo(reg_bo);
if (OVERLAY_NONPHYSICAL(dev)) {
ret = i915_gem_object_pin(reg_bo, PAGE_SIZE);mutex_lock(&dev->struct_mutex);
if (ret) { DRM_ERROR("failed to pin overlay register bo\n"); goto out_free_bo;mutex_unlock(&dev->struct_mutex);
@@ -1385,7 +1387,7 @@ void intel_setup_overlay(struct drm_device *dev) return;
out_free_bo:
- drm_gem_object_unreference(reg_bo);
- drm_gem_object_unreference_unlocked(reg_bo);
out_free: kfree(overlay); return;
dri-devel@lists.freedesktop.org