Hi,
I have a Lenovo T410s (Ironlake/Arrandale graphics) that I use docked and connected to an external DP monitor (the laptop lid is closed, so the only active display is the DP monitor). With 3.12-rc4 I reproducibly hit a deadlock when I undock the laptop. It seems to be a deadlock on dev->mode_config.mutex between intel_crtc_wait_for_pending_flips (the mutex is held because of drm_modeset_lock_all in intel_lid_notify) and drm_mode_getconnector. See below for the kernel logging (there is no logging for more than 2 minutes before the hung task messages). It seems that intel_crtc_wait_for_pending_flips is getting stuck, and then everything else piles up behind it:
[ 241.009121] INFO: task kworker/0:1:70 blocked for more than 120 seconds.
[ 241.009131] Tainted: G W 3.12.0-999-generic #201310070425
[ 241.009133] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 241.009137] kworker/0:1 D 0000000000000000 0 70 2 0x00000000
[ 241.009152] Workqueue: kacpi_notify acpi_os_execute_deferred
[ 241.009156] ffff88022b2c9ae8 0000000000000046 ffff88022b2c9ab8 000000008e17de71
[ 241.009162] ffff88022b2c9fd8 ffff88022b2c9fd8 ffff88022b2c9fd8 00000000000144c0
[ 241.009167] ffffffff81c144a0 ffff88022b290000 0000000000000286 ffff88022ae70000
[ 241.009173] Call Trace:
[ 241.009186] [<ffffffff81753619>] schedule+0x29/0x70
[ 241.009247] [<ffffffffa011eb9d>] intel_crtc_wait_for_pending_flips+0x8d/0x120 [i915]
[ 241.009257] [<ffffffff8108cc30>] ? add_wait_queue+0x60/0x60
[ 241.009289] [<ffffffffa0127baf>] ironlake_crtc_disable+0x7f/0x2a0 [i915]
[ 241.009318] [<ffffffffa012a0f6>] intel_crtc_update_dpms+0x76/0xb0 [i915]
[ 241.009347] [<ffffffffa012d8e5>] intel_sanitize_crtc+0xd5/0x370 [i915]
[ 241.009377] [<ffffffffa012e1ed>] intel_modeset_setup_hw_state+0x17d/0x380 [i915]
[ 241.009407] [<ffffffffa01307a1>] intel_lid_notify+0xc1/0x100 [i915]
[ 241.009412] [<ffffffff817596ad>] notifier_call_chain+0x4d/0x70
[ 241.009419] [<ffffffff81091ea8>] __blocking_notifier_call_chain+0x58/0x80
[ 241.009424] [<ffffffff81091ee6>] blocking_notifier_call_chain+0x16/0x20
[ 241.009431] [<ffffffff8142b15b>] acpi_lid_send_state+0x86/0xaf
[ 241.009436] [<ffffffff8142b1e1>] acpi_button_notify+0x3b/0xa2
[ 241.009442] [<ffffffff814032f1>] acpi_device_notify+0x19/0x1b
[ 241.009448] [<ffffffff814131cd>] acpi_ev_notify_dispatch+0x41/0x5c
[ 241.009453] [<ffffffff813ff59e>] acpi_os_execute_deferred+0x25/0x32
[ 241.009458] [<ffffffff81083ccf>] process_one_work+0x17f/0x4d0
[ 241.009463] [<ffffffff81084f0b>] worker_thread+0x11b/0x3d0
[ 241.009468] [<ffffffff81084df0>] ? manage_workers.isra.20+0x1b0/0x1b0
[ 241.009473] [<ffffffff8108c0d0>] kthread+0xc0/0xd0
[ 241.009479] [<ffffffff8108c010>] ? flush_kthread_worker+0xb0/0xb0
[ 241.009484] [<ffffffff8175dfbc>] ret_from_fork+0x7c/0xb0
[ 241.009489] [<ffffffff8108c010>] ? flush_kthread_worker+0xb0/0xb0
[ 241.009525] INFO: task Xorg:1436 blocked for more than 120 seconds.
[ 241.009528] Tainted: G W 3.12.0-999-generic #201310070425
[ 241.009531] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 241.009533] Xorg D 0000000000000000 0 1436 1410 0x00400004
[ 241.009538] ffff88022ca1bc48 0000000000000082 ffff88022ca1bc18 ffffffff811a31a1
[ 241.009543] ffff88022ca1bfd8 ffff88022ca1bfd8 ffff88022ca1bfd8 00000000000144c0
[ 241.009549] ffff880230198000 ffff8800b06546e0 ffff88022ca1bc28 ffff8800363b8330
[ 241.009554] Call Trace:
[ 241.009562] [<ffffffff811a31a1>] ? kmem_cache_free+0x121/0x180
[ 241.009568] [<ffffffff81753619>] schedule+0x29/0x70
[ 241.009574] [<ffffffff8175394e>] schedule_preempt_disabled+0xe/0x10
[ 241.009580] [<ffffffff817518c4>] __mutex_lock_slowpath+0x114/0x1b0
[ 241.009585] [<ffffffff81751983>] mutex_lock+0x23/0x40
[ 241.009613] [<ffffffffa004e582>] drm_mode_getconnector+0xb2/0x430 [drm]
[ 241.009633] [<ffffffffa003ef1a>] drm_ioctl+0x4fa/0x620 [drm]
[ 241.009656] [<ffffffffa004e4d0>] ? drm_mode_getcrtc+0xe0/0xe0 [drm]
[ 241.009663] [<ffffffff811cedda>] do_vfs_ioctl+0x7a/0x2e0
[ 241.009668] [<ffffffff811bcc01>] ? vfs_read+0x111/0x180
[ 241.009673] [<ffffffff811cf0d1>] SyS_ioctl+0x91/0xb0
[ 241.009678] [<ffffffff811bce40>] ? SyS_read+0x70/0xa0
[ 241.009683] [<ffffffff8175e06d>] system_call_fastpath+0x1a/0x1f
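A minimal user-space sketch of the pile-up described above, assuming POSIX threads on Linux. Everything in it (mode_config_mutex, flip_lock, flip_done, flip_pending, simulate_undock) is a hypothetical stand-in, not i915 code. The caller takes a mutex (playing the role of drm_modeset_lock_all() in intel_lid_notify) and then waits for a flip completion that never arrives. It waits on a separate lock, the way a kernel wait queue would, so the mutex stays held. A second thread (playing drm_mode_getconnector) tries to take the same mutex. Unlike the kernel, both waits are bounded here so the program terminates and reports ETIMEDOUT instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for dev->mode_config.mutex. */
static pthread_mutex_t mode_config_mutex = PTHREAD_MUTEX_INITIALIZER;
/* Separate lock/condition for flip completion, like a kernel wait queue:
 * waiting on it does NOT release mode_config_mutex. */
static pthread_mutex_t flip_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t flip_done = PTHREAD_COND_INITIALIZER;
/* Never cleared, and flip_done is never signalled, modelling the report
 * that intel_crtc_wait_for_pending_flips() never returns. */
static bool flip_pending = true;

/* Absolute CLOCK_REALTIME deadline `ms` milliseconds from now. */
static struct timespec deadline_ms(long ms)
{
    struct timespec ts;

    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/* Stand-in for drm_mode_getconnector(): tries to take the same mutex. */
static void *getconnector_ioctl(void *arg)
{
    int *rc = arg;
    struct timespec ts = deadline_ms(100);

    *rc = pthread_mutex_timedlock(&mode_config_mutex, &ts);
    if (*rc == 0)
        pthread_mutex_unlock(&mode_config_mutex);
    return NULL;
}

/* Stand-in for intel_lid_notify(): holds the mutex across the flip wait.
 * Returns 0 on success, -1 if the second thread could not be started. */
int simulate_undock(int *flip_rc, int *ioctl_rc)
{
    pthread_t ioctl_thread;
    struct timespec ts;
    int rc = 0;

    assert(flip_rc != NULL && ioctl_rc != NULL);
    pthread_mutex_lock(&mode_config_mutex);          /* drm_modeset_lock_all() */
    if (pthread_create(&ioctl_thread, NULL, getconnector_ioctl, ioctl_rc) != 0) {
        pthread_mutex_unlock(&mode_config_mutex);
        return -1;
    }

    /* intel_crtc_wait_for_pending_flips(): unbounded in the kernel,
     * bounded to 300 ms here so the sketch terminates. */
    ts = deadline_ms(300);
    pthread_mutex_lock(&flip_lock);
    while (flip_pending && rc == 0)
        rc = pthread_cond_timedwait(&flip_done, &flip_lock, &ts);
    pthread_mutex_unlock(&flip_lock);
    *flip_rc = rc;

    /* Join while still holding the mutex: the "ioctl" piles up behind us
     * and can only time out. */
    pthread_join(ioctl_thread, NULL);
    pthread_mutex_unlock(&mode_config_mutex);
    return 0;
}
```

Calling simulate_undock(&flip_rc, &ioctl_rc) returns 0 with both flip_rc and ioctl_rc set to ETIMEDOUT: the flip never completes, and the second thread never gets the mutex because the first holds it for the entire wait.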
Detailed lspci info for my laptop:
00:00.0 Host bridge [0600]: Intel Corporation Core Processor DRAM Controller [8086:0044] (rev 02)
00:02.0 VGA compatible controller [0300]: Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 02)
00:16.0 Communication controller [0780]: Intel Corporation 5 Series/3400 Series Chipset HECI Controller [8086:3b64] (rev 06)
00:19.0 Ethernet controller [0200]: Intel Corporation 82577LM Gigabit Network Connection [8086:10ea] (rev 06)
00:1a.0 USB controller [0c03]: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller [8086:3b3c] (rev 06)
00:1b.0 Audio device [0403]: Intel Corporation 5 Series/3400 Series Chipset High Definition Audio [8086:3b57] (rev 06)
00:1c.0 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 [8086:3b42] (rev 06)
00:1c.1 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 2 [8086:3b44] (rev 06)
00:1c.3 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 4 [8086:3b48] (rev 06)
00:1d.0 USB controller [0c03]: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller [8086:3b34] (rev 06)
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev a6)
00:1f.0 ISA bridge [0601]: Intel Corporation 5 Series/3400 Series Chipset LPC Interface Controller [8086:3b0f] (rev 06)
00:1f.2 SATA controller [0106]: Intel Corporation 5 Series/3400 Series Chipset 6 port SATA AHCI Controller [8086:3b2f] (rev 06)
00:1f.3 SMBus [0c05]: Intel Corporation 5 Series/3400 Series Chipset SMBus Controller [8086:3b30] (rev 06)
00:1f.6 Signal processing controller [1180]: Intel Corporation 5 Series/3400 Series Chipset Thermal Subsystem [8086:3b32] (rev 06)
03:00.0 Network controller [0280]: Intel Corporation Centrino Wireless-N 1000 [Condor Peak] [8086:0084]
ff:00.0 Host bridge [0600]: Intel Corporation Core Processor QuickPath Architecture Generic Non-core Registers [8086:2c62] (rev 02)
ff:00.1 Host bridge [0600]: Intel Corporation Core Processor QuickPath Architecture System Address Decoder [8086:2d01] (rev 02)
ff:02.0 Host bridge [0600]: Intel Corporation Core Processor QPI Link 0 [8086:2d10] (rev 02)
ff:02.1 Host bridge [0600]: Intel Corporation Core Processor QPI Physical 0 [8086:2d11] (rev 02)
ff:02.2 Host bridge [0600]: Intel Corporation Core Processor Reserved [8086:2d12] (rev 02)
ff:02.3 Host bridge [0600]: Intel Corporation Core Processor Reserved [8086:2d13] (rev 02)
Please let me know if there is any way I can help debug this further.
Thanks! Roland
On Mon, Oct 07, 2013 at 03:41:05PM -0700, Roland Dreier wrote:
Please let me know if there is any way I can help debug this further.
Can you please boot with drm.debug=0xe, reproduce the issue and then attach the dmesg? The additional debug spam should shed some light on how we managed to get into this peculiar situation ...
Please make sure that the dmesg includes everything since boot-up (so that we know all the details about your gfx hw). If it scrolls off before you reproduce the deadlock, please attach two dmesgs.
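A small sketch of which debug categories drm.debug=0xe selects, assuming the DRM_UT_* bit layout of the 3.12-era DRM headers (CORE=0x01, DRIVER=0x02, KMS=0x04, PRIME=0x08); later kernels define additional bits. decode_drm_debug is a hypothetical helper for illustration, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed drm.debug category bits (DRM_UT_* in 3.12-era drmP.h). */
static const struct {
    unsigned int bit;
    const char *name;
} drm_debug_bits[] = {
    { 0x01, "CORE" },
    { 0x02, "DRIVER" },
    { 0x04, "KMS" },
    { 0x08, "PRIME" },
};

/* Hypothetical helper: writes the enabled category names, separated by
 * '|', into buf. Returns the number of categories enabled, or -1 if buf
 * is too small to hold the result. */
int decode_drm_debug(unsigned int mask, char *buf, size_t len)
{
    size_t used = 0;
    int count = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(drm_debug_bits) / sizeof(drm_debug_bits[0]); i++) {
        if (!(mask & drm_debug_bits[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? "|" : "", drm_debug_bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            return -1;    /* output would have been truncated */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Under these assumptions, decode_drm_debug(0xe, buf, sizeof buf) returns 3 and leaves "DRIVER|KMS|PRIME" in buf: the mask enables driver and modesetting (KMS) messages, which is what the hotplug and modeset path needs, without the very chatty core category.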
Thanks, Daniel
On Mon, Oct 7, 2013 at 11:10 PM, Daniel Vetter daniel@ffwll.ch wrote:
Can you please boot with drm.debug=0xe, reproduce the issue and then attach the dmesg. The additional debug spam should shed some light on how we managed to get into this peculiar situation ...
Sure, here it is.
Thanks, Roland
On Tue, Oct 8, 2013 at 6:08 PM, Roland Dreier roland@kernel.org wrote:
Sure, here it is.
As suspected, it's a DP screen, and we disconnect right before a modeset sequence. The unexpected thing here is that the modeset sanitizer kicks in due to a lid event and wreaks utter havoc. Fundamentally, we shouldn't hang the entire box when the DP cable is unplugged; that's a bug in the DP code, and fixing it is quite some work. But we should also be a bit less enthusiastic about wreaking havoc in the lid notifier, and I think that can be fixed quickly.
I'll reply with a patch. -Daniel
dri-devel@lists.freedesktop.org