From recent additional locking in nouveau, it looks like we see
recursive lock acquisition in 3.8-rc6:
nouveau [ DEVICE][0000:01:00.0] BOOT0 : 0x0e7150a2 nouveau [ DEVICE][0000:01:00.0] Chipset: GK107 (NVE7) nouveau [ DEVICE][0000:01:00.0] Family : NVE0 nouveau [ VBIOS][0000:01:00.0] checking PRAMIN for image... nouveau [ VBIOS][0000:01:00.0] ... appears to be valid nouveau [ VBIOS][0000:01:00.0] using image from PRAMIN nouveau [ VBIOS][0000:01:00.0] BIT signature found nouveau [ VBIOS][0000:01:00.0] version 80.07.26.04.01 nouveau [ PFB][0000:01:00.0] RAM type: GDDR5 nouveau [ PFB][0000:01:00.0] RAM size: 1024 MiB nouveau [ PFB][0000:01:00.0] ZCOMP: 0 tags init: gdm main process (960) killed by TERM signal vga_switcheroo: enabled [TTM] Zone kernel: Available graphics memory: 4038258 kiB [TTM] Zone dma32: Available graphics memory: 2097152 kiB [TTM] Initializing pool allocator [TTM] Initializing DMA pool allocator nouveau [ DRM] VRAM: 1024 MiB nouveau [ DRM] GART: 512 MiB nouveau [ DRM] BIT BIOS found nouveau [ DRM] Bios version 80.07.26.04 nouveau [ DRM] TMDS table version 2.0 nouveau [ DRM] DCB version 4.0 nouveau [ DRM] DCB outp 00: 048101b6 0f230010 nouveau [ DRM] DCB outp 01: 018212d6 0f220020 nouveau [ DRM] DCB outp 02: 01021212 00020020 nouveau [ DRM] DCB outp 03: 088324c6 0f220010 nouveau [ DRM] DCB outp 04: 08032402 00020010 nouveau [ DRM] DCB outp 05: 02843862 00020010 nouveau [ DRM] DCB conn 00: 00020047 nouveau [ DRM] DCB conn 01: 02208146 nouveau [ DRM] DCB conn 02: 01104246 nouveau [ DRM] DCB conn 03: 00410361
============================================= [ INFO: possible recursive locking detected ] 3.8.0-rc6-ninja+ #1 Not tainted --------------------------------------------- modprobe/585 is trying to acquire lock: (&subdev->mutex){+.+.+.}, at: [<ffffffffa016c323>] nouveau_instobj_create_+0x43/0x90 [nouveau]
but task is already holding lock: (&subdev->mutex){+.+.+.}, at: [<ffffffffa017672d>] nv50_disp_data_ctor+0x5d/0xd0 [nouveau]
other info that might help us debug this: Possible unsafe locking scenario:
CPU0 ---- lock(&subdev->mutex); lock(&subdev->mutex);
*** DEADLOCK ***
May be due to missing lock nesting notation
4 locks held by modprobe/585: #0: (&__lockdep_no_validate__){......}, at: [<ffffffff813075f3>] __driver_attach+0x53/0xb0 #1: (&__lockdep_no_validate__){......}, at: [<ffffffff81307601>] __driver_attach+0x61/0xb0 #2: (drm_global_mutex){+.+.+.}, at: [<ffffffff812ee59c>] drm_get_pci_dev+0xbc/0x2b0 #3: (&subdev->mutex){+.+.+.}, at: [<ffffffffa017672d>] nv50_disp_data_ctor+0x5d/0xd0 [nouveau]
stack backtrace: Pid: 585, comm: modprobe Not tainted 3.8.0-rc6-expert+ #1 Call Trace: [<ffffffff8108fde2>] validate_chain.isra.33+0xd72/0x10d0 [<ffffffff8105fa08>] ? __kernel_text_address+0x58/0x80 [<ffffffff8100575d>] ? print_context_stack+0x5d/0xd0 [<ffffffff81090bc1>] __lock_acquire+0x3a1/0xb60 [<ffffffff8108d504>] ? __lock_is_held+0x54/0x80 [<ffffffff8109184a>] lock_acquire+0x5a/0x70 [<ffffffffa016c323>] ? nouveau_instobj_create_+0x43/0x90 [nouveau] [<ffffffff81558739>] mutex_lock_nested+0x69/0x340 [<ffffffffa016c323>] ? nouveau_instobj_create_+0x43/0x90 [nouveau] [<ffffffffa0152370>] ? nouveau_object_create_+0x60/0xa0 [nouveau] [<ffffffffa016c323>] nouveau_instobj_create_+0x43/0x90 [nouveau] [<ffffffffa016cf8c>] nv50_instobj_ctor+0x4c/0xf0 [nouveau] [<ffffffffa0152163>] nouveau_object_ctor+0x33/0xc0 [nouveau] [<ffffffffa016cd51>] nv50_instmem_alloc+0x21/0x30 [nouveau] [<ffffffffa0150917>] nouveau_gpuobj_create_+0x247/0x2f0 [nouveau] [<ffffffff8155b35a>] ? _raw_spin_unlock_irqrestore+0x3a/0x70 [<ffffffff810921fd>] ? trace_hardirqs_on_caller+0x10d/0x1a0 [<ffffffffa014f4bc>] nouveau_engctx_create_+0x25c/0x2a0 [nouveau] [<ffffffffa0176791>] nv50_disp_data_ctor+0xc1/0xd0 [nouveau] [<ffffffffa0153722>] ? nouveau_subdev_reset+0x52/0x60 [nouveau] [<ffffffffa0152163>] nouveau_object_ctor+0x33/0xc0 [nouveau] [<ffffffffa0152a42>] nouveau_object_new+0x112/0x240 [nouveau] [<ffffffffa01f4b1d>] nv50_display_create+0x18d/0x860 [nouveau] [<ffffffff8105cb5d>] ? __cancel_work_timer+0x6d/0xc0 [<ffffffffa01db8eb>] nouveau_display_create+0x3cb/0x670 [nouveau] [<ffffffffa01cb1bf>] nouveau_drm_load+0x26f/0x590 [nouveau] [<ffffffff81304c99>] ? device_register+0x19/0x20 [<ffffffff812efe91>] ? drm_sysfs_device_add+0x81/0xb0 [<ffffffff812ee65e>] drm_get_pci_dev+0x17e/0x2b0 [<ffffffff81245e56>] ? __pci_set_master+0x26/0x80 [<ffffffffa01cab2a>] nouveau_drm_probe+0x25a/0x2a0 [nouveau] [<ffffffff8124a386>] local_pci_probe+0x46/0x80 [<ffffffff8124ac11>] pci_device_probe+0x101/0x110 [<ffffffff813073d6>] driver_probe_device+0x76/0x240 [<ffffffff81307643>] __driver_attach+0xa3/0xb0 [<ffffffff813075a0>] ? driver_probe_device+0x240/0x240 [<ffffffff8130564d>] bus_for_each_dev+0x4d/0x90 [<ffffffff81306f39>] driver_attach+0x19/0x20 [<ffffffff81306af0>] bus_add_driver+0x1a0/0x270 [<ffffffffa023d000>] ? 0xffffffffa023cfff [<ffffffff81307cd2>] driver_register+0x72/0x170 [<ffffffffa023d000>] ? 0xffffffffa023cfff [<ffffffff8124ad0f>] __pci_register_driver+0x5f/0x70 [<ffffffff812ee8a5>] drm_pci_init+0x115/0x130 [<ffffffffa023d000>] ? 0xffffffffa023cfff [<ffffffffa023d000>] ? 0xffffffffa023cfff [<ffffffffa023d04d>] nouveau_drm_init+0x4d/0x1000 [nouveau] [<ffffffff810002da>] do_one_initcall+0x11a/0x170 [<ffffffff8109d044>] load_module+0xe84/0x1470 [<ffffffff81098c30>] ? in_lock_functions+0x20/0x20 [<ffffffff8122c22e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff8109d6e7>] sys_init_module+0xb7/0xe0 [<ffffffff8155c156>] system_call_fastpath+0x1a/0x1f [drm] Supports vblank timestamp caching Rev 1 (10.10.2010). [drm] No driver support for vblank timestamp query. nouveau W[ DRM] voltage table 0x50 unknown nouveau [ DRM] 4 available performance level(s) nouveau [ DRM] 1: core 209MHz shader 419MHz memory 405MHz voltage 520mV nouveau [ DRM] 2: core 390MHz shader 780MHz memory 1080MHz voltage 610mV nouveau [ DRM] 3: core 1000MHz shader 2000MHz memory 1080MHz voltage 630mV nouveau [ DRM] 4: core 1254MHz shader 2508MHz memory 1080MHz voltage 630mV nouveau [ DRM] c: nouveau [ DRM] MM: using COPY for buffer copies nouveau 0000:01:00.0: No connectors reported connected with modes [drm] Cannot find any crtc or sizes - going 1024x768 nouveau [ DRM] allocated 1024x768 fb: 0x80000, bo ffff88025b966800 nouveau 0000:01:00.0: fb1: nouveaufb frame buffer device [drm] Initialized nouveau 1.1.0 20120801 for 0000:01:00.0 on minor 1 snd_hda_intel 0000:01:00.1: enabling device (0000 -> 0002) hda-intel 0000:01:00.1: Handle VGA-switcheroo audio client snd_hda_intel 0000:01:00.1: irq 49 for MSI/MSI-X input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input11 input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input12 input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input13 hda-intel 0000:01:00.1: Disabling via VGA-switcheroo VGA switcheroo: switched nouveau off nouveau [ DRM] suspending fbcon... nouveau [ DRM] suspending display... nouveau [ DRM] unpinning framebuffer(s)... nouveau [ DRM] evicting buffers... nouveau [ DRM] suspending client object trees... nouveau E[ I2C][0000:01:00.0] AUXCH(3): begin idle timeout 0xffffffff nouveau E[ I2C][0000:01:00.0] AUXCH(2): begin idle timeout 0xffffffff nouveau E[ I2C][0000:01:00.0] AUXCH(1): begin idle timeout 0xffffffff applesmc: light sensor data length set to 10 nouveau E[ I2C][0000:01:00.0] AUXCH(1): begin idle timeout 0xffffffff nouveau E[ I2C][0000:01:00.0] AUXCH(3): begin idle timeout 0xffffffff nouveau E[ I2C][0000:01:00.0] AUXCH(2): begin idle timeout 0xffffffff
1) Lockdep thinks all nouveau subdevs belong to the same class and can be locked in arbitrary order, which is not true (at least in general case). Tell it to distinguish subdevs by (o)class type. 2) DRM client can be locked under user client lock - tell lockdep to put DRM client lock in a separate class.
Reported-by: Arend van Spriel arend@broadcom.com Reported-by: Peter Hurley peter@hurleysoftware.com Reported-by: Maarten Lankhorst maarten.lankhorst@canonical.com Reported-by: Daniel J Blueman daniel@quora.org Signed-off-by: Marcin Slusarz marcin.slusarz@gmail.com Cc: stable@vger.kernel.org [3.7, but needs s/const ofuncs/ofuncs/ to build] --- Lightly tested, only on NV4B and NVC1. --- drivers/gpu/drm/nouveau/core/core/subdev.c | 2 +- drivers/gpu/drm/nouveau/core/include/core/object.h | 7 +++++-- drivers/gpu/drm/nouveau/nouveau_drm.c | 3 +++ 3 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/core/core/subdev.c b/drivers/gpu/drm/nouveau/core/core/subdev.c index f74c30a..48f0637 100644 --- a/drivers/gpu/drm/nouveau/core/core/subdev.c +++ b/drivers/gpu/drm/nouveau/core/core/subdev.c @@ -99,7 +99,7 @@ nouveau_subdev_create_(struct nouveau_object *parent, if (ret) return ret;
- mutex_init(&subdev->mutex); + __mutex_init(&subdev->mutex, subname, &oclass->lock_class_key); subdev->name = subname;
if (parent) { diff --git a/drivers/gpu/drm/nouveau/core/include/core/object.h b/drivers/gpu/drm/nouveau/core/include/core/object.h index 6a90267..62e68ba 100644 --- a/drivers/gpu/drm/nouveau/core/include/core/object.h +++ b/drivers/gpu/drm/nouveau/core/include/core/object.h @@ -50,10 +50,13 @@ int nouveau_object_fini(struct nouveau_object *, bool suspend);
extern struct nouveau_ofuncs nouveau_object_ofuncs;
+/* Don't allocate dynamically, because lockdep needs lock_class_keys to be in + * ".data". */ struct nouveau_oclass { u32 handle; - struct nouveau_ofuncs *ofuncs; - struct nouveau_omthds *omthds; + struct nouveau_ofuncs * const ofuncs; + struct nouveau_omthds * const omthds; + struct lock_class_key lock_class_key; };
#define nv_oclass(o) nv_object(o)->oclass diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c index ef1ad21..bc00587 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drm.c +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c @@ -245,6 +245,8 @@ static int nouveau_drm_probe(struct pci_dev *pdev, return 0; }
+static struct lock_class_key drm_client_lock_class_key; + static int nouveau_drm_load(struct drm_device *dev, unsigned long flags) { @@ -256,6 +258,7 @@ nouveau_drm_load(struct drm_device *dev, unsigned long flags) ret = nouveau_cli_create(pdev, "DRM", sizeof(*drm), (void**)&drm); if (ret) return ret; + lockdep_set_class(&drm->client.mutex, &drm_client_lock_class_key);
dev->dev_private = drm; drm->dev = dev;
Op 04-02-13 22:30, Marcin Slusarz schreef:
Apart from this specific case, is there any other reason why we require being able to nest 2 subdev locks?
Add a NVOBJ_FLAG_CREATE_EXCLUSIVE flag to nouveau_engctx_create and return -EBUSY if there is any existing object. Problem solved, and lockdep is still generic.
Can you copy paste a typical lockdep splat for this? Also this should be a separate patch. :-)
Hi Maarten
On Mon, 2013-02-04 at 22:59 +0100, Maarten Lankhorst wrote:
PS - Deep call chain. Has anyone looked into max stack depth yet?
[ 5.995881] ============================================= [ 5.995886] [ INFO: possible recursive locking detected ] [ 5.995892] 3.8.0-next-20130125+ttypatch-xeon+lockdep #20130125+ttypatch Not tainted [ 5.995898] --------------------------------------------- [ 5.995904] modprobe/275 is trying to acquire lock: [ 5.995909] (&subdev->mutex){+.+.+.}, at: [<ffffffffa00d10b8>] nouveau_instobj_create_+0x48/0x90 [nouveau] [ 5.995955] [ 5.995955] but task is already holding lock: [ 5.995961] (&subdev->mutex){+.+.+.}, at: [<ffffffffa00da3b5>] nv50_disp_data_ctor+0x65/0xd0 [nouveau] [ 5.995989] [ 5.995989] other info that might help us debug this: [ 5.995995] Possible unsafe locking scenario: [ 5.995995] [ 5.996001] CPU0 [ 5.996004] ---- [ 5.996005] lock(&subdev->mutex); [ 5.996005] lock(&subdev->mutex); [ 5.996005] [ 5.996005] *** DEADLOCK *** [ 5.996005] [ 5.996005] May be due to missing lock nesting notation [ 5.996005] [ 5.996005] 4 locks held by modprobe/275: [ 5.996005] #0: (&__lockdep_no_validate__){......}, at: [<ffffffff8146fa5b>] __driver_attach+0x5b/0xb0 [ 5.996005] #1: (&__lockdep_no_validate__){......}, at: [<ffffffff8146fa69>] __driver_attach+0x69/0xb0 [ 5.996005] #2: (drm_global_mutex){+.+.+.}, at: [<ffffffffa0028d86>] drm_get_pci_dev+0xc6/0x2d0 [drm] [ 5.996005] #3: (&subdev->mutex){+.+.+.}, at: [<ffffffffa00da3b5>] nv50_disp_data_ctor+0x65/0xd0 [nouveau] [ 5.996005] [ 5.996005] stack backtrace: [ 5.996005] Pid: 275, comm: modprobe Not tainted 3.8.0-next-20130125+ttypatch-xeon+lockdep #20130125+ttypatch [ 5.996005] Call Trace: [ 5.996005] [<ffffffff810bd437>] __lock_acquire+0x687/0x1a70 [ 5.996005] [<ffffffff810bf70b>] ? mark_held_locks+0x9b/0x100 [ 5.996005] [<ffffffff810bf87d>] ? trace_hardirqs_on_caller+0x10d/0x1a0 [ 5.996005] [<ffffffff810bf91d>] ? trace_hardirqs_on+0xd/0x10 [ 5.996005] [<ffffffff8170a87a>] ? _raw_write_unlock_irqrestore+0x4a/0x90 [ 5.996005] [<ffffffff810bedc8>] lock_acquire+0x98/0x150 [ 5.996005] [<ffffffffa00d10b8>] ? nouveau_instobj_create_+0x48/0x90 [nouveau] [ 5.996005] [<ffffffffa00d10b8>] ? nouveau_instobj_create_+0x48/0x90 [nouveau] [ 5.996005] [<ffffffff81707f60>] mutex_lock_nested+0x50/0x390 [ 5.996005] [<ffffffffa00d10b8>] ? nouveau_instobj_create_+0x48/0x90 [nouveau] [ 5.996005] [<ffffffffa00b6376>] ? nouveau_object_ref+0x46/0xd0 [nouveau] [ 5.996005] [<ffffffffa00b64b5>] ? nouveau_object_create_+0x65/0xa0 [nouveau] [ 5.996005] [<ffffffffa00d10b8>] nouveau_instobj_create_+0x48/0x90 [nouveau] [ 5.996005] [<ffffffffa00d1be1>] nv50_instobj_ctor+0x51/0xf0 [nouveau] [ 5.996005] [<ffffffffa00b62a8>] nouveau_object_ctor+0x38/0xc0 [nouveau] [ 5.996005] [<ffffffffa00d1b36>] nv50_instmem_alloc+0x26/0x30 [nouveau] [ 5.996005] [<ffffffffa00b49a7>] nouveau_gpuobj_create_+0x247/0x2f0 [nouveau] [ 5.996005] [<ffffffff8170afed>] ? _raw_spin_unlock_irqrestore+0x6d/0x90 [ 5.996005] [<ffffffff810bf87d>] ? trace_hardirqs_on_caller+0x10d/0x1a0 [ 5.996005] [<ffffffffa00b34dc>] nouveau_engctx_create_+0x26c/0x2b0 [nouveau] [ 5.996005] [<ffffffffa00da411>] nv50_disp_data_ctor+0xc1/0xd0 [nouveau] [ 5.996005] [<ffffffffa00b62a8>] nouveau_object_ctor+0x38/0xc0 [nouveau] [ 5.996005] [<ffffffffa00b6b82>] nouveau_object_new+0x112/0x240 [nouveau] [ 5.996005] [<ffffffffa0155fe5>] nv50_display_create+0x1a5/0x890 [nouveau] [ 5.996005] [<ffffffff8107837b>] ? __cancel_work_timer+0x8b/0xe0 [ 5.996005] [<ffffffffa013c80b>] nouveau_display_create+0x3cb/0x670 [nouveau] [ 5.996005] [<ffffffffa012ba0f>] nouveau_drm_load+0x26f/0x590 [nouveau] [ 5.996005] [<ffffffff8146caee>] ? device_register+0x1e/0x30 [ 5.996005] [<ffffffffa002a626>] ? drm_sysfs_device_add+0x86/0xb0 [drm] [ 5.996005] [<ffffffffa0028e46>] drm_get_pci_dev+0x186/0x2d0 [drm] [ 5.996005] [<ffffffffa012bf9a>] nouveau_drm_probe+0x26a/0x2c0 [nouveau] [ 5.996005] [<ffffffff8170afca>] ? _raw_spin_unlock_irqrestore+0x4a/0x90 [ 5.996005] [<ffffffff81393a8b>] local_pci_probe+0x4b/0x80 [ 5.996005] [<ffffffff81393da1>] pci_device_probe+0x111/0x120 [ 5.996005] [<ffffffff8146f6eb>] driver_probe_device+0x8b/0x3a0 [ 5.996005] [<ffffffff8146faab>] __driver_attach+0xab/0xb0 [ 5.996005] [<ffffffff8146fa00>] ? driver_probe_device+0x3a0/0x3a0 [ 5.996005] [<ffffffff8146d635>] bus_for_each_dev+0x55/0x90 [ 5.996005] [<ffffffff8146f12e>] driver_attach+0x1e/0x20 [ 5.996005] [<ffffffff8146ebe1>] bus_add_driver+0x121/0x2b0 [ 5.996005] [<ffffffffa01ad000>] ? 0xffffffffa01acfff [ 5.996005] [<ffffffff814701b7>] driver_register+0x77/0x170 [ 5.996005] [<ffffffffa01ad000>] ? 0xffffffffa01acfff [ 5.996005] [<ffffffff81392b14>] __pci_register_driver+0x64/0x70 [ 5.996005] [<ffffffffa00290aa>] drm_pci_init+0x11a/0x130 [drm] [ 5.996005] [<ffffffffa01ad000>] ? 0xffffffffa01acfff [ 5.996005] [<ffffffffa01ad000>] ? 0xffffffffa01acfff [ 5.996005] [<ffffffffa01ad04d>] nouveau_drm_init+0x4d/0x1000 [nouveau] [ 5.996005] [<ffffffff8100215a>] do_one_initcall+0x12a/0x180 [ 5.996005] [<ffffffff810cc6a7>] load_module+0x12f7/0x1be0 [ 5.996005] [<ffffffff8137ce20>] ? ddebug_proc_open+0xd0/0xd0 [ 5.996005] [<ffffffff8136993e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 5.996005] [<ffffffff810cd067>] sys_init_module+0xd7/0x120 [ 5.996005] [<ffffffff81712f59>] system_call_fastpath+0x16/0x1b
On Mon, Feb 04, 2013 at 10:59:28PM +0100, Maarten Lankhorst wrote:
I think I tend to prefer Marcin's fix for this actually. The subdev's are completely separate classes of objects and as interaction between them increases (PM will be very much like this), we may very well require holding multiple subdev mutexes at once.
Ben.
Hey,
Op 05-02-13 21:52, Ben Skeggs schreef:
Depends, I think for this specific example I think my cleanup is better.
For the generic case you could use nested mutexes, which will give you a different lockdep class when you need it. It's probably better to have those cases where you do need to nest locking annotated:
mutex_lock_nested(&mutex, SINGLE_DEPTH_NESTING);
See also Documentation/lockdep-design.txt
~Maarten
dri-devel@lists.freedesktop.org