Got this while playing with virgl; it happens when the gem_object_open driver callback fails.
Now this callback probably shouldn't be failing that often, but when it does, deadlocking seems like the wrong response.
Dave.
happens if the driver fails:

[  677.932957] =============================================
[  677.932957] [ INFO: possible recursive locking detected ]
[  677.932957] 4.3.0-rc5-virtio-gpu+ #11 Not tainted
[  677.932957] ---------------------------------------------
[  677.932957] Xorg/2661 is trying to acquire lock:
[  677.932957]  (&prime_fpriv->lock){+.+.+.}, at: [<ffffffffa00151b0>] drm_gem_remove_prime_handles.isra.7+0x20/0x50 [drm]
[  677.932957] but task is already holding lock:
[  677.932957]  (&prime_fpriv->lock){+.+.+.}, at: [<ffffffffa002d68b>] drm_gem_prime_fd_to_handle+0x4b/0x240 [drm]
[  677.932957] other info that might help us debug this:
[  677.932957]  Possible unsafe locking scenario:
[  677.932957]        CPU0
[  677.932957]        ----
[  677.932957]   lock(&prime_fpriv->lock);
[  677.932957]   lock(&prime_fpriv->lock);
[  677.932957]  *** DEADLOCK ***
[  677.932957]  May be due to missing lock nesting notation
[  677.932957] 1 lock held by Xorg/2661:
[  677.932957]  #0:  (&prime_fpriv->lock){+.+.+.}, at: [<ffffffffa002d68b>] drm_gem_prime_fd_to_handle+0x4b/0x240 [drm]
[  677.932957] stack backtrace:
[  677.932957] CPU: 1 PID: 2661 Comm: Xorg Not tainted 4.3.0-rc5-virtio-gpu+ #11
[  677.932957] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  677.932957]  ffffffff82619740 ffff88007b22fb20 ffffffff813159f9 ffffffff82619740
[  677.932957]  ffff88007b22fbd8 ffffffff810d373d 0000000000000000 ffff88007b22fb58
[  677.932957]  ffffffff00000000 ffff88004cfd8700 00000000008e8474 0000000000048af0
[  677.932957] Call Trace:
[  677.932957]  [<ffffffff813159f9>] dump_stack+0x4b/0x72
[  677.932957]  [<ffffffff810d373d>] __lock_acquire+0x193d/0x1a60
[  677.932957]  [<ffffffff810d411d>] lock_acquire+0x6d/0x90
[  677.932957]  [<ffffffffa00151b0>] ? drm_gem_remove_prime_handles.isra.7+0x20/0x50 [drm]
[  677.932957]  [<ffffffff816d30e4>] mutex_lock_nested+0x64/0x3a0
[  677.932957]  [<ffffffffa00151b0>] ? drm_gem_remove_prime_handles.isra.7+0x20/0x50 [drm]
[  677.932957]  [<ffffffffa00151b0>] drm_gem_remove_prime_handles.isra.7+0x20/0x50 [drm]
[  677.932957]  [<ffffffffa0015947>] drm_gem_handle_delete+0xd7/0x110 [drm]
[  677.932957]  [<ffffffffa0015baf>] drm_gem_handle_create_tail+0xff/0x160 [drm]
[  677.932957]  [<ffffffffa002d731>] drm_gem_prime_fd_to_handle+0xf1/0x240 [drm]
[  677.932957]  [<ffffffffa002dd28>] drm_prime_fd_to_handle_ioctl+0x28/0x40 [drm]
[  677.932957]  [<ffffffffa0016594>] drm_ioctl+0x124/0x4f0 [drm]
[  677.932957]  [<ffffffffa002dd00>] ? drm_prime_handle_to_fd_ioctl+0x60/0x60 [drm]
[  677.932957]  [<ffffffff812b40e7>] ? ioctl_has_perm+0xa7/0xc0
[  677.932957]  [<ffffffff811c30aa>] do_vfs_ioctl+0x2da/0x530
[  677.932957]  [<ffffffff812b4159>] ? selinux_file_ioctl+0x59/0xf0
[  677.932957]  [<ffffffff812a68ae>] ? security_file_ioctl+0x3e/0x60
[  677.932957]  [<ffffffff811c3374>] SyS_ioctl+0x74/0x80
[  677.932957]  [<ffffffff816d6632>] entry_SYSCALL_64_fastpath+0x12/0x76
[  840.028068] INFO: task Xorg:2661 blocked for more than 120 seconds.
If we need to take the error path and drop the references to the objects and handle we created earlier, there is a possibility that we then free the object and then attempt to free any associated PRIME resources under the prime mutex. If we are holding the prime mutex when we attempt to free the handle/object, we risk a recursive deadlock.
[  677.932957] =============================================
[  677.932957] [ INFO: possible recursive locking detected ]
[  677.932957] 4.3.0-rc5-virtio-gpu+ #11 Not tainted
[  677.932957] ---------------------------------------------
[  677.932957] Xorg/2661 is trying to acquire lock:
[  677.932957]  (&prime_fpriv->lock){+.+.+.}, at: [<ffffffffa00151b0>] drm_gem_remove_prime_handles.isra.7+0x20/0x50 [drm]
[  677.932957] but task is already holding lock:
[  677.932957]  (&prime_fpriv->lock){+.+.+.}, at: [<ffffffffa002d68b>] drm_gem_prime_fd_to_handle+0x4b/0x240 [drm]
Reported-by: Dave Airlie <airlied@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@gmail.com>
---
 drivers/gpu/drm/drm_prime.c | 47 ++++++++++++++++++---------------------------
 1 file changed, 19 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 27aa7183b20b..1bdd69e1ef43 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -564,26 +564,26 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
 			       uint32_t *handle)
 {
 	struct dma_buf *dma_buf;
-	struct drm_gem_object *obj;
+	struct drm_gem_object *obj = NULL;
 	int ret;
 
+	*handle = 0;
+
 	dma_buf = dma_buf_get(prime_fd);
 	if (IS_ERR(dma_buf))
 		return PTR_ERR(dma_buf);
 
 	mutex_lock(&file_priv->prime.lock);
 
-	ret = drm_prime_lookup_buf_handle(&file_priv->prime,
-			dma_buf, handle);
+	ret = drm_prime_lookup_buf_handle(&file_priv->prime, dma_buf, handle);
 	if (ret == 0)
-		goto out_put;
+		goto out;
 
 	/* never seen this one, need to import */
-	mutex_lock(&dev->object_name_lock);
 	obj = dev->driver->gem_prime_import(dev, dma_buf);
 	if (IS_ERR(obj)) {
 		ret = PTR_ERR(obj);
-		goto out_unlock;
+		goto out;
 	}
 
 	if (obj->dma_buf) {
@@ -593,33 +593,24 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
 		get_dma_buf(dma_buf);
 	}
 
-	/* drm_gem_handle_create_tail unlocks dev->object_name_lock. */
-	ret = drm_gem_handle_create_tail(file_priv, obj, handle);
-	drm_gem_object_unreference_unlocked(obj);
+	ret = drm_gem_handle_create(file_priv, obj, handle);
 	if (ret)
-		goto out_put;
+		goto out;
 
-	ret = drm_prime_add_buf_handle(&file_priv->prime,
-			dma_buf, *handle);
-	if (ret)
-		goto fail;
+	ret = drm_prime_add_buf_handle(&file_priv->prime, dma_buf, *handle);
 
+out:
 	mutex_unlock(&file_priv->prime.lock);
-
-	dma_buf_put(dma_buf);
-
-	return 0;
-
-fail:
-	/* hmm, if driver attached, we are relying on the free-object path
-	 * to detach.. which seems ok..
-	 */
-	drm_gem_handle_delete(file_priv, *handle);
-out_unlock:
-	mutex_unlock(&dev->object_name_lock);
-out_put:
+	if (ret) {
+		/* hmm, if driver attached, we are relying on the
+		 * free-object path to detach.. which seems ok..
+		 */
+		if (*handle)
+			drm_gem_handle_delete(file_priv, *handle);
+	}
+	if (!IS_ERR_OR_NULL(obj))
+		drm_gem_object_unreference_unlocked(obj);
 	dma_buf_put(dma_buf);
-	mutex_unlock(&file_priv->prime.lock);
 	return ret;
 }
 EXPORT_SYMBOL(drm_gem_prime_fd_to_handle);
On Wed, Oct 14, 2015 at 10:52:00AM +0100, Chris Wilson wrote:
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index 27aa7183b20b..1bdd69e1ef43 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -564,26 +564,26 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
>  			       uint32_t *handle)
>
> 	/* never seen this one, need to import */
> -	mutex_lock(&dev->object_name_lock);
I found the use of object_name_lock here confusing. I couldn't see what it was meant to protect (it is supposed to just guard access to the global flink name assigned to the object, and I'm not sure what value it adds to drm_gem_handle_create_tail() either). In any case, the handle here is protected against concurrent creation/destruction by prime.lock, so it looked safe enough to move the object_name_lock back into drm_gem_handle_create_tail().
-Chris
On 14 October 2015 at 19:52, Chris Wilson chris@chris-wilson.co.uk wrote:
> If we need to take the error path and drop the references to the objects
> and handle we created earlier, there is a possibility that we then free
> the object and then attempt to free any associated PRIME resources under
> the prime mutex. If we are holding the prime mutex when we attempt to
> free the handle/object, we risk a recursive deadlock.
This doesn't fix the case where drm_gem_handle_create_tail internally fails and calls drm_gem_handle_delete, which then takes the lock again.
Dave.