running 3.4.0-rc3 + Christian's reset patch series.
The locks are definitely taken in different orders between vm_bo_add and cs ioctl.
Dave.
======================================================
[ INFO: possible circular locking dependency detected ]
3.4.0-rc3+ #33 Not tainted
-------------------------------------------------------
shader_runner/3090 is trying to acquire lock:
 (&vm->mutex){+.+...}, at: [<ffffffffa00c513f>] radeon_cs_ioctl+0x438/0x5c1 [radeon]

but task is already holding lock:
 (&rdev->cs_mutex){+.+.+.}, at: [<ffffffffa00c4d3a>] radeon_cs_ioctl+0x33/0x5c1 [radeon]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&rdev->cs_mutex){+.+.+.}:
       [<ffffffff810757f5>] lock_acquire+0xf0/0x116
       [<ffffffff81427881>] mutex_lock_nested+0x6a/0x2bb
       [<ffffffffa00b5f4d>] radeon_vm_bo_add+0x118/0x1f5 [radeon]
       [<ffffffffa00b6479>] radeon_vm_init+0x6b/0x70 [radeon]
       [<ffffffffa00a3bfc>] radeon_driver_open_kms+0x68/0x9a [radeon]
       [<ffffffffa0019698>] drm_open+0x201/0x587 [drm]
       [<ffffffffa0019b0a>] drm_stub_open+0xec/0x14a [drm]
       [<ffffffff8110f788>] chrdev_open+0x11c/0x145
       [<ffffffff8110a23a>] __dentry_open+0x17e/0x29b
       [<ffffffff8110b138>] nameidata_to_filp+0x5b/0x62
       [<ffffffff811188d0>] do_last+0x75d/0x771
       [<ffffffff81118ab3>] path_openat+0xcb/0x380
       [<ffffffff81118e51>] do_filp_open+0x33/0x81
       [<ffffffff8110b23f>] do_sys_open+0x100/0x192
       [<ffffffff8110b2ed>] sys_open+0x1c/0x1e
       [<ffffffff81430722>] system_call_fastpath+0x16/0x1b

-> #0 (&vm->mutex){+.+...}:
       [<ffffffff81074c99>] __lock_acquire+0xfcd/0x1664
       [<ffffffff810757f5>] lock_acquire+0xf0/0x116
       [<ffffffff81427881>] mutex_lock_nested+0x6a/0x2bb
       [<ffffffffa00c513f>] radeon_cs_ioctl+0x438/0x5c1 [radeon]
       [<ffffffffa00187a9>] drm_ioctl+0x2d8/0x3a4 [drm]
       [<ffffffff8111afd6>] do_vfs_ioctl+0x469/0x4aa
       [<ffffffff8111b068>] sys_ioctl+0x51/0x75
       [<ffffffff81430722>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&rdev->cs_mutex);
                               lock(&vm->mutex);
                               lock(&rdev->cs_mutex);
  lock(&vm->mutex);

 *** DEADLOCK ***

1 lock held by shader_runner/3090:
 #0:  (&rdev->cs_mutex){+.+.+.}, at: [<ffffffffa00c4d3a>] radeon_cs_ioctl+0x33/0x5c1 [radeon]

stack backtrace:
Pid: 3090, comm: shader_runner Not tainted 3.4.0-rc3+ #33
Call Trace:
 [<ffffffff81420ac7>] print_circular_bug+0x28a/0x29b
 [<ffffffff81074c99>] __lock_acquire+0xfcd/0x1664
 [<ffffffff810757f5>] lock_acquire+0xf0/0x116
 [<ffffffffa00c513f>] ? radeon_cs_ioctl+0x438/0x5c1 [radeon]
 [<ffffffff810db991>] ? might_fault+0x57/0xa7
 [<ffffffff81427881>] mutex_lock_nested+0x6a/0x2bb
 [<ffffffffa00c513f>] ? radeon_cs_ioctl+0x438/0x5c1 [radeon]
 [<ffffffffa00f4196>] ? evergreen_ib_parse+0x1b2/0x204 [radeon]
 [<ffffffffa00c513f>] radeon_cs_ioctl+0x438/0x5c1 [radeon]
 [<ffffffffa00187a9>] drm_ioctl+0x2d8/0x3a4 [drm]
 [<ffffffffa00c4d07>] ? radeon_cs_finish_pages+0xa3/0xa3 [radeon]
 [<ffffffff811ee4c4>] ? avc_has_perm_flags+0xd7/0x160
 [<ffffffff811ee413>] ? avc_has_perm_flags+0x26/0x160
 [<ffffffff8104bf6a>] ? up_read+0x1b/0x32
 [<ffffffff8111afd6>] do_vfs_ioctl+0x469/0x4aa
 [<ffffffff8111b068>] sys_ioctl+0x51/0x75
 [<ffffffff8104f955>] ? __wake_up+0x1d/0x48
 [<ffffffff81430722>] system_call_fastpath+0x16/0x1b
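For reference, the inverted ordering lockdep is complaining about boils down to something like the following minimal sketch. The lock and function names follow the trace above, but the bodies are simplified stand-ins, not the actual radeon code:

/*
 * Minimal sketch of the two orderings in the report above -- an
 * illustration inferred from the trace, not the radeon implementation.
 */
#include <linux/mutex.h>

struct radeon_device { struct mutex cs_mutex; };
struct radeon_vm     { struct mutex mutex; };

/* open() path: radeon_driver_open_kms() -> radeon_vm_init() -> radeon_vm_bo_add() */
static void open_path(struct radeon_device *rdev, struct radeon_vm *vm)
{
	mutex_lock(&vm->mutex);		/* per-VM lock first ...       */
	mutex_lock(&rdev->cs_mutex);	/* ... then the global CS lock */
	/* add the BO to the VM */
	mutex_unlock(&rdev->cs_mutex);
	mutex_unlock(&vm->mutex);
}

/* CS path: radeon_cs_ioctl() */
static void cs_path(struct radeon_device *rdev, struct radeon_vm *vm)
{
	mutex_lock(&rdev->cs_mutex);	/* global CS lock first ...    */
	mutex_lock(&vm->mutex);		/* ... then the per-VM lock    */
	/* parse and submit the command stream */
	mutex_unlock(&vm->mutex);
	mutex_unlock(&rdev->cs_mutex);
}

If the two paths run concurrently, each can end up holding one mutex while waiting for the other, which is the AB-BA pattern shown in the "Possible unsafe locking scenario" section of the report.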
Interesting, I'm pretty sure that I haven't touched the locking order of the cs_mutex vs. vm_mutex.
Maybe it is just some kind of side effect; I'm going to look into it anyway.
Christian.
On 21.04.2012 13:39, Dave Airlie wrote:
> running 3.4.0-rc3 + Christian's reset patch series.
> The locks are definitely taken in different orders between vm_bo_add and cs ioctl.
2012/4/21 Christian König deathsimple@vodafone.de:
> Interesting, I'm pretty sure that I haven't touched the locking order of the cs_mutex vs. vm_mutex.
It's the init path: it takes the locks in a different order than the CS path.
Cheers, Jerome
On 21.04.2012 16:08, Jerome Glisse wrote:
> It's the init path: it takes the locks in a different order than the CS path.
Well, could you explain to me why the vm code takes the cs mutex in the first place?
It clearly has its own mutex, and it doesn't look like it deals with any CS-related data anyway.
Christian.
2012/4/21 Christian König deathsimple@vodafone.de:
> Well, could you explain to me why the vm code takes the cs mutex in the first place?
> It clearly has its own mutex, and it doesn't look like it deals with any CS-related data anyway.
Lock simplification is on my todo list. The issue is that the vm manager is protected by the cs_mutex. The vm.mutex is specific to each vm; it doesn't protect the global vm management. I didn't want to introduce a new global vm mutex, and since vm activity is mostly triggered on behalf of CS I decided to use the cs mutex.
That's why the non-CS paths of the vm code need to take the cs mutex.
Cheers, Jerome
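The split Jerome describes can be pictured with a rough sketch; the fields below are simplified illustrations, not the exact radeon structures:

/*
 * Rough sketch of the locking split described above; simplified
 * fields, not the exact radeon structures.
 */
#include <linux/list.h>
#include <linux/mutex.h>

struct radeon_vm_manager {
	/* Global VM state (page-table allocator, list of active VMs).
	 * In this scheme it is covered by rdev->cs_mutex, so any path
	 * touching it -- CS or not -- has to take the cs mutex. */
	struct list_head	lru_vm;
};

struct radeon_vm {
	struct mutex		mutex;	/* protects only this VM's own mappings */
	struct list_head	va;	/* BOs mapped into this VM */
};

So vm.mutex only serializes work on a single VM, while anything that touches the shared manager state still funnels through cs_mutex, including the non-CS paths.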
2012/4/21 Jerome Glisse j.glisse@gmail.com:
> Lock simplification is on my todo list. The issue is that the vm manager is protected by the cs_mutex. The vm.mutex is specific to each vm; it doesn't protect the global vm management. I didn't want to introduce a new global vm mutex, and since vm activity is mostly triggered on behalf of CS I decided to use the cs mutex.
> That's why the non-CS paths of the vm code need to take the cs mutex.
So if one app is adding a bo and another is doing CS, isn't deadlock a real possibility?
I expect the VM code needs to take the CS mutex earlier then.
Dave.
On 21.04.2012 17:57, Dave Airlie wrote:
> So if one app is adding a bo and another is doing CS, isn't deadlock a real possibility?
Yeah, I think so.
> I expect the VM code needs to take the CS mutex earlier then.
I would strongly suggest giving the vm code its own global mutex and removing the per-vm mutex, because the latter is pretty superfluous if the cs_mutex is also taken most of the time.
The attached patch is against drm-fixes and does exactly that.
Christian.
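The direction described here (one driver-global VM lock instead of the per-VM mutex) could look roughly like the sketch below. This is only an illustration of the idea with made-up names; it is not the attached patch:

/*
 * Illustration only -- not the attached patch.  A single global mutex
 * in the VM manager replaces the per-VM mutex, so every path that used
 * to take vm->mutex takes this one lock instead.
 */
#include <linux/list.h>
#include <linux/mutex.h>

struct radeon_vm_manager {
	struct mutex		lock;	/* hypothetical single global VM lock */
	struct list_head	lru_vm;
};

struct radeon_vm {
	struct list_head	va;	/* no per-VM mutex any more */
};

static void vm_bo_add_sketch(struct radeon_vm_manager *vmm, struct radeon_vm *vm)
{
	mutex_lock(&vmm->lock);
	/* add the BO mapping to vm; the global manager state is covered too */
	mutex_unlock(&vmm->lock);
}

The trade-off, as Jerome points out below, is that two unrelated processes would then contend on that single lock even when they work on different VMs.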
2012/4/21 Christian König deathsimple@vodafone.de:
>> So if one app is adding a bo and another is doing CS, isn't deadlock a real possibility?
> Yeah, I think so.
No, it's not. Look at the code.
>> I expect the VM code needs to take the CS mutex earlier then.
No, it does not. The idea is that when adding a bo we only need to take the cs mutex if we need to resize the vm (and even that can be worked around).
So we will need to take the cs mutex in very few cases (suspend, increasing the vm size).
> I would strongly suggest giving the vm code its own global mutex and removing the per-vm mutex, because the latter is pretty superfluous if the cs_mutex is also taken most of the time.
> The attached patch is against drm-fixes and does exactly that.
NAK. With your change there will be lock contention if one app is in CS and another tries to create a bo. Currently there is almost never any contention. Once I have ironed out the DP->VGA issue I will work on something to remove the cs mutex from the vm path (i.e. remove it from the bo creation/deletion path).
Cheers, Jerome
On 21.04.2012 19:30, Jerome Glisse wrote:
> NAK. With your change there will be lock contention if one app is in CS and another tries to create a bo. Currently there is almost never any contention. Once I have ironed out the DP->VGA issue I will work on something to remove the cs mutex from the vm path (i.e. remove it from the bo creation/deletion path).
Ok, it sounds like I don't understand the code deeply enough to fix this, so I'm just going to wait for your fix.
By the way: if you are talking about the NUTMEG DP->VGA problem, I have two systems with that sitting directly beside me. So if you have any patches just leave me a note and I can try them.
Christian.