I have this lockdep warning on wireless-testing tree based on 3.7-rc1 (no other patches except wireless bits).
=============================================
Restarting tasks ... done.
[ INFO: possible recursive locking detected ]
3.7.0-rc1-wl+ #2 Not tainted
---------------------------------------------
Xorg/2269 is trying to acquire lock:
 (&cli->mutex){+.+.+.}, at: [<ffffffffa012a27f>] nouveau_bo_move_m2mf+0x5f/0x170 [nouveau]

but task is already holding lock:
 (&cli->mutex){+.+.+.}, at: [<ffffffffa012f3c4>] nouveau_abi16_get+0x34/0x100 [nouveau]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&cli->mutex);
  lock(&cli->mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

1 lock held by Xorg/2269:
 #0: (&cli->mutex){+.+.+.}, at: [<ffffffffa012f3c4>] nouveau_abi16_get+0x34/0x100 [nouveau]

stack backtrace:
Pid: 2269, comm: Xorg Not tainted 3.7.0-rc1-wl+ #2
Call Trace:
 [<ffffffff810bbc24>] print_deadlock_bug+0xf4/0x100
 [<ffffffff810bdba9>] validate_chain+0x549/0x7e0
 [<ffffffff810be1a7>] __lock_acquire+0x367/0x580
 [<ffffffffa012a27f>] ? nouveau_bo_move_m2mf+0x5f/0x170 [nouveau]
 [<ffffffff810be464>] lock_acquire+0xa4/0x120
 [<ffffffffa012a27f>] ? nouveau_bo_move_m2mf+0x5f/0x170 [nouveau]
 [<ffffffff8156c860>] ? _raw_spin_unlock_irqrestore+0x40/0x80
 [<ffffffff81569217>] __mutex_lock_common+0x47/0x3f0
 [<ffffffffa012a27f>] ? nouveau_bo_move_m2mf+0x5f/0x170 [nouveau]
 [<ffffffffa011dd61>] ? nv84_graph_tlb_flush+0x291/0x2b0 [nouveau]
 [<ffffffffa00b4be6>] ? _nouveau_gpuobj_wr32+0x26/0x30 [nouveau]
 [<ffffffffa012a27f>] ? nouveau_bo_move_m2mf+0x5f/0x170 [nouveau]
 [<ffffffff815696e7>] mutex_lock_nested+0x37/0x50
 [<ffffffffa012a27f>] nouveau_bo_move_m2mf+0x5f/0x170 [nouveau]
 [<ffffffffa012a783>] nouveau_bo_move+0xe3/0x330 [nouveau]
 [<ffffffffa009619d>] ttm_bo_handle_move_mem+0x2bd/0x670 [ttm]
 [<ffffffffa0098a1e>] ttm_bo_move_buffer+0x12e/0x150 [ttm]
 [<ffffffffa0098ad9>] ttm_bo_validate+0x99/0x130 [ttm]
 [<ffffffffa012add3>] nouveau_bo_validate+0x23/0x30 [nouveau]
 [<ffffffffa012cd8e>] validate_list+0xae/0x2c0 [nouveau]
 [<ffffffffa012dec2>] nouveau_gem_pushbuf_validate+0xa2/0x1e0 [nouveau]
 [<ffffffffa012e22c>] nouveau_gem_ioctl_pushbuf+0x22c/0x8a0 [nouveau]
 [<ffffffffa002c465>] drm_ioctl+0x355/0x570 [drm]
 [<ffffffff8119349a>] ? do_sync_read+0xaa/0xf0
 [<ffffffffa012e000>] ? nouveau_gem_pushbuf_validate+0x1e0/0x1e0 [nouveau]
 [<ffffffff811a579c>] do_vfs_ioctl+0x8c/0x350
 [<ffffffff81575745>] ? sysret_check+0x22/0x5d
 [<ffffffff811a5b01>] sys_ioctl+0xa1/0xb0
 [<ffffffff81291eee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81575719>] system_call_fastpath+0x16/0x1b
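
A note on the "missing lock nesting notation" hint in the report above: lockdep tracks lock classes rather than individual lock instances, so two different mutexes of the same class, taken at the same subclass, are reported the same way as one mutex taken twice. The following is only a toy userspace model of that check, not kernel code. All names in it (toy_lock, toy_task, toy_acquire) are made up for illustration, and real lockdep handles much more (read locks, nest_lock, dependency chains).

```c
#include <stddef.h>

#define TOY_MAX_HELD 8

enum toy_result { TOY_OK, TOY_RECURSIVE, TOY_FULL };

/* Hypothetical lock: every instance initialised at the same site shares class_id. */
struct toy_lock {
    int class_id;
};

/* One entry of a task's held-lock stack: the class plus the subclass it was taken at. */
struct toy_held {
    int class_id;
    int subclass;
};

struct toy_task {
    struct toy_held held[TOY_MAX_HELD];
    size_t depth;
};

/*
 * Record an acquisition. It is flagged as "possible recursive locking" when
 * the task already holds a lock with the same class AND the same subclass,
 * regardless of whether it is the same lock instance.
 */
static enum toy_result toy_acquire(struct toy_task *t,
                                   const struct toy_lock *l, int subclass)
{
    for (size_t i = 0; i < t->depth; i++) {
        if (t->held[i].class_id == l->class_id &&
            t->held[i].subclass == subclass)
            return TOY_RECURSIVE;
    }
    if (t->depth == TOY_MAX_HELD)
        return TOY_FULL;
    t->held[t->depth++] = (struct toy_held){ l->class_id, subclass };
    return TOY_OK;
}
```

In this model, taking a second lock of the same class at subclass 0 is flagged, while taking it at subclass 1 is accepted. That is roughly what the kernel's mutex_lock_nested(lock, SINGLE_DEPTH_NESTING) annotation expresses. Whether the two cli->mutex acquisitions here are the same instance, and whether such an annotation would be correct for nouveau, is not established by this report.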
On 10/16/2012 02:43 PM, Stanislaw Gruszka wrote:
I have this lockdep warning on wireless-testing tree based on 3.7-rc1 (no other patches except wireless bits).
=============================================
Restarting tasks ... done.
[ INFO: possible recursive locking detected ]
3.7.0-rc1-wl+ #2 Not tainted
Xorg/2269 is trying to acquire lock:
 (&cli->mutex){+.+.+.}, at: [<ffffffffa012a27f>] nouveau_bo_move_m2mf+0x5f/0x170 [nouveau]
but task is already holding lock:
 (&cli->mutex){+.+.+.}, at: [<ffffffffa012f3c4>] nouveau_abi16_get+0x34/0x100 [nouveau]
I have observed the same bug, so I built and tested the v3.7-rc2 tag with lockdep enabled. It has the same problem, and it results in a failure to resume after suspend. See below.
Gr. AvS
[ 76.272795] PM: suspend of devices complete after 2149.188 msecs
[ 76.273110] PM: suspend devices took 2.152 seconds
[ 76.273354] suspend debug: Waiting for 5 seconds.
[ 81.233082] ehci_hcd 0000:00:1a.0: setting latency timer to 64
[ 81.233369] ehci_hcd 0000:00:1d.0: setting latency timer to 64
[ 81.233422] pci 0000:00:1e.0: setting latency timer to 64
[ 81.248934] e1000e 0000:00:19.0: wake-up capability disabled by ACPI
[ 81.249398] e1000e 0000:00:19.0: irq 41 for MSI/MSI-X
[ 81.249903] ahci 0000:00:1f.2: setting latency timer to 64
[ 81.249982] snd_hda_intel 0000:00:1b.0: irq 43 for MSI/MSI-X
[ 81.250515] nouveau [ DRM] re-enabling device...
[ 81.250548] nouveau [ DRM] resuming client object trees...
[ 81.250557] nouveau [ VBIOS][0000:01:00.0] running init tables
[ 81.701998] nouveau [ DRM] resuming display...
[ 81.803923] firewire_core 0000:04:00.4: rediscovered device fw0
[ 81.823913] dell_wmi: Received unknown WMI event (0x11)
[ 81.824521] serial 00:08: activated
[ 82.135333] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 82.187115] ata6: SATA link down (SStatus 0 SControl 300)
[ 82.232290] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 82.284002] ata5: SATA link down (SStatus 0 SControl 300)
[ 82.330629] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
[ 82.408079] ata2.00: configured for UDMA/133
[ 84.073571] ata1.00: failed to get Identify Device Data, Emask 0x1
[ 84.127965] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
[ 84.202292] ata1.00: failed to get Identify Device Data, Emask 0x1
[ 84.254039] ata1.00: configured for UDMA/133
[ 84.303718] sd 0:0:0:0: [sda] Starting disk
[ 84.360186] PM: resume of devices complete after 3132.774 msecs
[ 84.410322] PM: resume devices took 3.180 seconds
[ 84.449642] PM: Finishing wakeup.
[ 84.505964]
[ 84.506716] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx
[ 84.477326] Restarting tasks ... done.
[ 84.575294] video LNXVIDEO:00: Restoring backlight state
[ 84.623825] =============================================
[ 84.623825] [ INFO: possible recursive locking detected ]
[ 84.623826] 3.7.0-rc2-testing-lockdep #1 Not tainted
[ 84.623827] ---------------------------------------------
[ 84.623827] Xorg/1369 is trying to acquire lock:
[ 84.623828] (&cli->mutex){+.+.+.}, at: [<f8974ca8>] nouveau_bo_move_m2mf.isra.13+0x38/0x120 [nouveau]
[ 84.623856]
[ 84.623856] but task is already holding lock:
[ 84.623856] (&cli->mutex){+.+.+.}, at: [<f8979346>] nouveau_abi16_get+0x26/0x110 [nouveau]
[ 84.623871]
[ 84.623871] other info that might help us debug this:
[ 84.623872] Possible unsafe locking scenario:
[ 84.623872]
[ 84.623872] CPU0
[ 84.623872] ----
[ 84.623873] lock(&cli->mutex);
[ 84.623874] lock(&cli->mutex);
[ 84.623874]
[ 84.623874] *** DEADLOCK ***
[ 84.623874]
[ 84.623874] May be due to missing lock nesting notation
[ 84.623874]
[ 84.623875] 1 lock held by Xorg/1369:
[ 84.623889] #0: (&cli->mutex){+.+.+.}, at: [<f8979346>] nouveau_abi16_get+0x26/0x110 [nouveau]
[ 84.623890]
On 10/24/2012 01:14 PM, Arend van Spriel wrote:
Digging into the trace:
nouveau_gem_ioctl_pushbuf() calls nouveau_abi16_get(), which grabs the mutex. I assume this is meant to protect the chan variable passed to nouveau_gem_pushbuf_validate(), which does a bit more than validate, as it ends up in nouveau_bo_move_m2mf(), which uses drm->chan. However, it deadlocks before that.
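
The shape of that call chain can be reproduced in userspace. The sketch below is an illustration only, under the assumption that both acquisitions hit the same mutex instance, which the trace alone does not prove. It uses a POSIX error-checking mutex so that the second acquisition by the same thread fails with EDEADLK instead of hanging. The names struct client, client_init(), submit_pushbuf() and move_buffer() are hypothetical stand-ins and are not nouveau functions.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>   /* EDEADLK, the value callers should expect on a relock */
#include <pthread.h>

/* Hypothetical stand-in for the client object; only the mutex matters here. */
struct client {
    pthread_mutex_t mutex;
};

/* Create an error-checking mutex: a relock by the owning thread returns EDEADLK. */
static int client_init(struct client *cli)
{
    pthread_mutexattr_t attr;
    int ret = pthread_mutexattr_init(&attr);

    if (ret)
        return ret;
    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (!ret)
        ret = pthread_mutex_init(&cli->mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return ret;
}

/* Inner step, in the role of the buffer move: takes the client mutex itself. */
static int move_buffer(struct client *cli)
{
    int ret = pthread_mutex_lock(&cli->mutex);

    if (ret)
        return ret; /* EDEADLK if this thread already holds the mutex */
    /* ... the actual copy would happen here ... */
    pthread_mutex_unlock(&cli->mutex);
    return 0;
}

/* Outer step, in the role of the ioctl: holds the mutex across the inner call. */
static int submit_pushbuf(struct client *cli)
{
    int ret = pthread_mutex_lock(&cli->mutex);

    if (ret)
        return ret;
    ret = move_buffer(cli); /* second acquisition on the same thread */
    pthread_mutex_unlock(&cli->mutex);
    return ret;
}
```

After client_init() succeeds, submit_pushbuf() returns EDEADLK, while calling move_buffer() on its own returns 0. With a default (non error-checking) mutex, the inner lock would typically block forever, which is the scenario lockdep warns about.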
Gr. AvS
On 10/24/2012 02:45 PM, Arend van Spriel wrote:
Maybe this helps. The two locations where the lock is grabbed are from the same commit (see below).
Gr. AvS
commit ebb945a94bba2ce8dff7b0942ff2b3f2a52a0a69
Author: Ben Skeggs <bskeggs@redhat.com>
Date:   Fri Jul 20 08:17:34 2012 +1000

    drm/nouveau: port all engines to new engine module format

    This is a HUGE commit, but it's not nearly as bad as it looks - any
    problems can be isolated to a particular chipset and engine combination.
    It was simply too difficult to port each one at a time, the compat
    layers are *already* ridiculous.

    Most of the changes here are simply to the glue, the process for each of
    the engine modules was to start with a standard skeleton and copy+paste
    the old code into the appropriate places, fixing up variable names etc
    as needed.

    v2: Marcin Slusarz <marcin.slusarz@gmail.com>
    - fix find/replace bug in license header

    v3: Ben Skeggs <bskeggs@redhat.com>
    - bump indirect pushbuf size to 8KiB, 4KiB barely enough for userspace
      and left no space for kernel's requirements during GEM pushbuf
      submission.
    - fix duplicate assignments noticed by clang

    v4: Marcin Slusarz <marcin.slusarz@gmail.com>
    - add sparse annotations to nv04_fifo_pause/nv04_fifo_start
    - use ioread32_native/iowrite32_native for fifo control registers

    v5: Ben Skeggs <bskeggs@redhat.com>
    - rebase on v3.6-rc4, modified to keep copy engine fix intact
    - nv10/fence: unmap fence bo before destroying
    - fixed fermi regression when using nvidia gr fuc
    - fixed typo in supported dma_mask checking

    Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
On 10/24/2012 02:45 PM, Arend van Spriel wrote:
I reverted the two drm merges:
ceb736c Merge branch 'drm-nouveau-fixes' of git://anongit.freedesktop.org/git/no
612a9aa Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Not surprisingly, that solved the deadlock (tested with pm_test). Unfortunately, suspend/resume still does not work. The system goes to sleep just fine, but when trying to resume, the BIOS kicks in and the system boots instead of waking up.
Gr. AvS
dri-devel@lists.freedesktop.org