[PATCH 1/2] [RESEND] console: use might_sleep in console_lock

Daniel Vetter

17 Sep 2012

11:03 p.m.

Use might_sleep() instead of BUG_ON(in_interrupt()), since the latter doesn't check for all the newfangled stuff like preempt.

Note that this is valid since the console_sem is essentially used like a real mutex with only two twists:
- we allow trylock from hardirq context
- across suspend/resume we lock the logical console_lock, but drop the semaphore protecting the locking state.

Now that doesn't guarantee that no one is playing tricks in single-thread atomic contexts at suspend/resume/boot time, but
- I couldn't find anything suspicious with some grepping,
- might_sleep shouldn't die,
- and I think the upside of catching more potential issues is worth the risk of getting a might_sleep backtrace that would have been safe (and then dealing with that fallout).
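To illustrate why a might_sleep()-style check covers more ground than BUG_ON(in_interrupt()), here is a small user-space sketch. It is not kernel code: struct sim_ctx and the sim_* helpers are hypothetical names invented for this example, and the real kernel derives the equivalent information from preempt_count() and the interrupt flags rather than from separate fields. Note also that the real might_sleep() only warns when the relevant debug option is enabled.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of an execution context. */
struct sim_ctx {
	unsigned int hardirq_depth;  /* nested hard interrupt handlers */
	unsigned int softirq_depth;  /* softirq (bottom half) processing */
	unsigned int preempt_depth;  /* preempt-disable nesting, e.g. under a spinlock */
	bool irqs_disabled;          /* local interrupts switched off */
};

/* Models the old check: only interrupt context is considered. */
static bool sim_in_interrupt(const struct sim_ctx *c)
{
	return c->hardirq_depth > 0 || c->softirq_depth > 0;
}

/* Models a might_sleep()-style check: any atomic context is flagged. */
static bool sim_sleep_is_illegal(const struct sim_ctx *c)
{
	return sim_in_interrupt(c) || c->preempt_depth > 0 || c->irqs_disabled;
}
```

In this model a caller that holds a spinlock (preempt_depth of 1, no interrupt active) passes the old check but is flagged by the new one, which is the class of bug the patch aims to catch.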

Cc: Dave Airlie airlied@gmail.com
Cc: Thomas Gleixner tglx@linutronix.de
Cc: Alan Cox alan@lxorguk.ukuu.org.uk
Cc: Peter Zijlstra a.p.zijlstra@chello.nl
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
---
 kernel/printk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 66a2ea3..ed9af6a 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1909,7 +1909,8 @@ static int __cpuinit console_cpu_notify(struct notifier_block *self,
  */
 void console_lock(void)
 {
-	BUG_ON(in_interrupt());
+	might_sleep();
+
 	down(&console_sem);
 	if (console_suspended)
 		return;

-- 1.7.10.4

Daniel Vetter

17 Sep

11:03 p.m.

New subject: [PATCH 2/2] [RESEND] console: implement lockdep support for console_lock

Dave Airlie recently discovered a locking bug in the fbcon layer, where a del_timer_sync (for the blinking cursor) deadlocks with the timer itself, since both (want to) hold the console_lock:

https://lkml.org/lkml/2012/8/21/36

Unfortunately the console_lock isn't a plain mutex and hence has no lockdep support. This resulted in a few days wasted tracking down this bug (complicated by the fact that printk doesn't show anything when the console is locked) instead of noticing the bug much earlier with the lockdep splat.

Hence I've figured I need to fix that for the next deadlock involving console_lock - and with kms/drm growing ever more complex locking that'll eventually happen.

Now the console_lock has rather funky semantics, so after a quick irc discussion with Thomas Gleixner and Dave Airlie I've quickly ditched the original idea of switching to a real mutex (since it won't work) and instead opted to annotate the console_lock with lockdep information manually.

There are a few special cases:
- The console_lock state is protected by the console_sem, and usually grabbed/dropped at _lock/_unlock time. But the suspend/resume code drops the semaphore without dropping the console_lock (see suspend_console/resume_console). But since the same thread that did the suspend will do the resume, we don't need to fix up anything.

- In the printk code there's a special trylock, only used to kick off the logbuffer printk'ing in console_unlock. But all that happens while lockdep is disabled (since printk does a few other evil tricks). So no issue there, either.

- The console_lock can also be acquired from irq context (but only with a trylock). lockdep already handles that.

This all leaves us with annotating the normal console_lock, _unlock and _trylock functions.
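The suspend/resume special case can be sketched with a single-threaded user-space model. This is an illustration under assumptions, not the kernel implementation: the model_* functions, sem_down/sem_up and the lockdep_held counter are hypothetical stand-ins, with lockdep_held incremented and decremented roughly where the patch places mutex_acquire() and mutex_release(). The early returns while suspended follow the structure described in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model; names are local stand-ins. */
static int  sem_count = 1;      /* models console_sem as a binary semaphore */
static bool console_suspended;
static bool console_locked;
static int  lockdep_held;       /* models the lockdep annotation depth */

/* A real down() would block; the model just asserts it would not have to. */
static void sem_down(void) { assert(sem_count > 0); sem_count--; }
static void sem_up(void)   { sem_count++; }

static void model_console_lock(void)
{
	sem_down();
	if (console_suspended)
		return;                 /* suspended: only the semaphore is taken */
	console_locked = true;
	lockdep_held++;             /* position of mutex_acquire() in the patch */
}

static void model_console_unlock(void)
{
	if (console_suspended) {
		sem_up();               /* suspended: only the semaphore is released */
		return;
	}
	console_locked = false;
	lockdep_held--;             /* position of mutex_release() in the patch */
	sem_up();
}

static void model_suspend_console(void)
{
	model_console_lock();
	console_suspended = true;
	sem_up();                   /* drop the semaphore, keep the logical lock */
}

static void model_resume_console(void)
{
	sem_down();
	console_suspended = false;
	model_console_unlock();     /* releases the logical lock taken at suspend */
}
```

Calling model_suspend_console(), then a lock/unlock pair from another caller, then model_resume_console() leaves lockdep_held at 1 throughout the suspended window and back at 0 afterwards, which matches the claim that nothing needs fixing up as long as the same thread suspends and resumes.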

And yes, it works - simply unloading a drm kms driver resulted in lockdep complaining about the deadlock in fbcon_deinit:

======================================================
[ INFO: possible circular locking dependency detected ]
3.6.0-rc2+ #552 Not tainted
-------------------------------------------------------
kms-reload/3577 is trying to acquire lock:
 ((&info->queue)){+.+...}, at: [<ffffffff81058c70>] wait_on_work+0x0/0xa7

but task is already holding lock:
 (console_lock){+.+.+.}, at: [<ffffffff81264686>] bind_con_driver+0x38/0x263

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (console_lock){+.+.+.}:
       [<ffffffff81087440>] lock_acquire+0x95/0x105
       [<ffffffff81040190>] console_lock+0x59/0x5b
       [<ffffffff81209cb6>] fb_flashcursor+0x2e/0x12c
       [<ffffffff81057c3e>] process_one_work+0x1d9/0x3b4
       [<ffffffff810584a2>] worker_thread+0x1a7/0x24b
       [<ffffffff8105ca29>] kthread+0x7f/0x87
       [<ffffffff813b1204>] kernel_thread_helper+0x4/0x10

-> #0 ((&info->queue)){+.+...}:
       [<ffffffff81086cb3>] __lock_acquire+0x999/0xcf6
       [<ffffffff81087440>] lock_acquire+0x95/0x105
       [<ffffffff81058cab>] wait_on_work+0x3b/0xa7
       [<ffffffff81058dd6>] __cancel_work_timer+0xbf/0x102
       [<ffffffff81058e33>] cancel_work_sync+0xb/0xd
       [<ffffffff8120a3b3>] fbcon_deinit+0x11c/0x1dc
       [<ffffffff81264793>] bind_con_driver+0x145/0x263
       [<ffffffff81264a45>] unbind_con_driver+0x14f/0x195
       [<ffffffff8126540c>] store_bind+0x1ad/0x1c1
       [<ffffffff8127cbb7>] dev_attr_store+0x13/0x1f
       [<ffffffff8116d884>] sysfs_write_file+0xe9/0x121
       [<ffffffff811145b2>] vfs_write+0x9b/0xfd
       [<ffffffff811147b7>] sys_write+0x3e/0x6b
       [<ffffffff813b0039>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(console_lock);
                               lock((&info->queue));
                               lock(console_lock);
  lock((&info->queue));

 *** DEADLOCK ***
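The kind of check behind such a report can be sketched as a toy lock-order graph. This is a deliberately simplified illustration, not lockdep's actual implementation: record_dependency(), reachable(), the dep matrix and the two class constants are hypothetical names for this example. The idea is to record "B was acquired while A was held" as an edge A -> B, and to refuse any new edge that would close a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8

/* Hypothetical lock classes named after the two locks in the report. */
enum { CONSOLE_LOCK, INFO_QUEUE };

/* dep[a][b] is true once "b was acquired while a was held" has been seen. */
static bool dep[MAX_CLASSES][MAX_CLASSES];

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record that 'acquired' is taken while 'held' is held.
 * Returns false, recording nothing, if 'held' is already reachable from
 * 'acquired', because the new edge would then close a cycle.
 */
static bool record_dependency(int held, int acquired)
{
	bool seen[MAX_CLASSES];

	assert(held >= 0 && held < MAX_CLASSES);
	assert(acquired >= 0 && acquired < MAX_CLASSES);
	memset(seen, 0, sizeof(seen));
	if (reachable(acquired, held, seen))
		return false;
	dep[held][acquired] = true;
	return true;
}
```

Mapped onto the report above: the cursor work item takes console_lock while it runs, so record_dependency(INFO_QUEUE, CONSOLE_LOCK) is accepted; fbcon_deinit then waits for that work item while holding console_lock, and record_dependency(CONSOLE_LOCK, INFO_QUEUE) is rejected as circular.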

diff --git a/kernel/printk.c b/kernel/printk.c
index ed9af6a..ab2ab24 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -87,6 +87,12 @@ static DEFINE_SEMAPHORE(console_sem);
 struct console *console_drivers;
 EXPORT_SYMBOL_GPL(console_drivers);
 
+#ifdef CONFIG_LOCKDEP
+struct lockdep_map console_lock_dep_map = {
+	.name = "console_lock"
+};
+#endif
+
 /*
  * This is used for debugging the mess that is the VT code by
  * keeping track if we have the console semaphore held. It's
@@ -1916,6 +1922,7 @@ void console_lock(void)
 		return;
 	console_locked = 1;
 	console_may_schedule = 1;
+	mutex_acquire(&console_lock_dep_map, 0, 0, _RET_IP_);
 }
 EXPORT_SYMBOL(console_lock);
 
@@ -1937,6 +1944,7 @@ int console_trylock(void)
 	}
 	console_locked = 1;
 	console_may_schedule = 0;
+	mutex_acquire(&console_lock_dep_map, 0, 1, _RET_IP_);
 	return 1;
 }
 EXPORT_SYMBOL(console_trylock);
@@ -2097,6 +2105,7 @@ skip:
 		local_irq_restore(flags);
 	}
 	console_locked = 0;
+	mutex_release(&console_lock_dep_map, 1, _RET_IP_);
 
 	/* Release the exclusive_console once it is used */
 	if (unlikely(exclusive_console))

-- 1.7.10.4

Jani Nikula

18 Sep

7:33 a.m.

New subject: [PATCH 2/2] [RESEND] console: implement lockdep support for console_lock

On Tue, 18 Sep 2012, Daniel Vetter daniel.vetter@ffwll.ch wrote:

...

+#ifdef CONFIG_LOCKDEP
+struct lockdep_map console_lock_dep_map = {
+	.name = "console_lock"
+};
+#endif

static?

BR, Jani.

Daniel Vetter

19 Sep

7:30 a.m.

New subject: [PATCH 2/2] [RESEND] console: implement lockdep support for console_lock

On Tue, Sep 18, 2012 at 10:33:28AM +0300, Jani Nikula wrote:

...

On Tue, 18 Sep 2012, Daniel Vetter daniel.vetter@ffwll.ch wrote:

...
+#ifdef CONFIG_LOCKDEP
+struct lockdep_map console_lock_dep_map = {
+	.name = "console_lock"
+};
+#endif

static?

Yeah, static. I'm travelling atm, so will take a while to resend ...
-Daniel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

Daniel Vetter

22 Sep

5:52 p.m.

New subject: [PATCH] console: implement lockdep support for console_lock

Dave Airlie recently discovered a locking bug in the fbcon layer, where a del_timer_sync (for the blinking cursor) deadlocks with the timer itself, since both (want to) hold the console_lock:

https://lkml.org/lkml/2012/8/21/36

Hence I've figured I need to fix that for the next deadlock involving console_lock - and with kms/drm growing ever more complex locking that'll eventually happen.

- The console_lock can also be acquired from irq context (but only with a trylock). lockdep already handles that.

This all leaves us with annotating the normal console_lock, _unlock and _trylock functions.

And yes, it works - simply unloading a drm kms driver resulted in lockdep complaining about the deadlock in fbcon_deinit:

but task is already holding lock:
 (console_lock){+.+.+.}, at: [<ffffffff81264686>] bind_con_driver+0x38/0x263

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(console_lock);
                               lock((&info->queue));
                               lock(console_lock);
  lock((&info->queue));

 *** DEADLOCK ***

v2: Mark the lockdep_map static, noticed by Jani Nikula.

diff --git a/kernel/printk.c b/kernel/printk.c
index ed9af6a..e5c6dba 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -87,6 +87,12 @@ static DEFINE_SEMAPHORE(console_sem);
 struct console *console_drivers;
 EXPORT_SYMBOL_GPL(console_drivers);
 
+#ifdef CONFIG_LOCKDEP
+static struct lockdep_map console_lock_dep_map = {
+	.name = "console_lock"
+};
+#endif
+
 /*
  * This is used for debugging the mess that is the VT code by
  * keeping track if we have the console semaphore held. It's
@@ -1916,6 +1922,7 @@ void console_lock(void)
 		return;
 	console_locked = 1;
 	console_may_schedule = 1;
+	mutex_acquire(&console_lock_dep_map, 0, 0, _RET_IP_);
 }
 EXPORT_SYMBOL(console_lock);
 
@@ -1937,6 +1944,7 @@ int console_trylock(void)
 	}
 	console_locked = 1;
 	console_may_schedule = 0;
+	mutex_acquire(&console_lock_dep_map, 0, 1, _RET_IP_);
 	return 1;
 }
 EXPORT_SYMBOL(console_trylock);
@@ -2097,6 +2105,7 @@ skip:
 		local_irq_restore(flags);
 	}
 	console_locked = 0;
+	mutex_release(&console_lock_dep_map, 1, _RET_IP_);
 
 	/* Release the exclusive_console once it is used */
 	if (unlikely(exclusive_console))

-- 1.7.10.4

Greg KH

8:06 p.m.

New subject: [Intel-gfx] [PATCH] console: implement lockdep support for console_lock

On Sat, Sep 22, 2012 at 07:52:11PM +0200, Daniel Vetter wrote:

...

So I'm guessing I should take this through the tty tree, right? Any objections to that for 3.7?

thanks,

greg k-h

Daniel Vetter

24 Sep

11:36 a.m.

New subject: [Intel-gfx] [PATCH] console: implement lockdep support for console_lock

On Sat, Sep 22, 2012 at 01:06:29PM -0700, Greg KH wrote:

...

On Sat, Sep 22, 2012 at 07:52:11PM +0200, Daniel Vetter wrote:

...
So I'm guessing I should take this through the tty tree, right? Any objections to that for 3.7?

I didn't know who would be the relevant maintainer, so just spammed a few people. Would be awesome if you could merge these patches for 3.7, and at least Alan Cox seems to like them:

http://marc.info/?l=linux-fbdev&m=134564125601147&w=1

Thanks, Daniel

...

thanks,

greg k-h


Daniel Vetter

2 Oct

12:56 p.m.

New subject: [Intel-gfx] [PATCH] console: implement lockdep support for console_lock

On Sat, Sep 22, 2012 at 10:06 PM, Greg KH gregkh@linuxfoundation.org wrote:

...

On Sat, Sep 22, 2012 at 07:52:11PM +0200, Daniel Vetter wrote:

...
So I'm guessing I should take this through the tty tree, right? Any objections to that for 3.7?

I've noticed that the tty tree went in already :( Any chance you could still slip this in for 3.7? I'd _really_ like to have this stuff in for debugging console_lock madness in drm drivers - we've already had our fair share of those ...

Thanks, Daniel


Greg KH

1:28 p.m.

New subject: [Intel-gfx] [PATCH] console: implement lockdep support for console_lock

On Tue, Oct 02, 2012 at 02:56:48PM +0200, Daniel Vetter wrote:

...

On Sat, Sep 22, 2012 at 10:06 PM, Greg KH gregkh@linuxfoundation.org wrote:

...
On Sat, Sep 22, 2012 at 07:52:11PM +0200, Daniel Vetter wrote:

...
So I'm guessing I should take this through the tty tree, right? Any objections to that for 3.7?
I've noticed that the tty tree went in already :( Any chance you could still slip this in for 3.7? I'd _really_ like to have this stuff in for debugging console_lock madness in drm drivers - we've already had our fair share of those ...

No, as it hasn't been in linux-next already, I can't send it in for 3.7, sorry, you know that. I'll be glad to queue it up for 3.8 if you want me to.

thanks,

greg k-h

Daniel Vetter

1:31 p.m.

New subject: [Intel-gfx] [PATCH] console: implement lockdep support for console_lock

On Tue, Oct 2, 2012 at 3:28 PM, Greg KH gregkh@linuxfoundation.org wrote:

...

No, as it hasn't been in linux-next already, I can't send it in for 3.7, sorry, you know that. I'll be glad to queue it up for 3.8 if you want me to.

Hey, was worth a shot ;-) Yeah, if you can pick it up for 3.8, that would be nice, since the patches have been floating for a while by now ...

Thanks, Daniel


Peter Zijlstra

24 Sep

12:17 p.m.

New subject: [PATCH 2/2] [RESEND] console: implement lockdep support for console_lock

On Tue, 2012-09-18 at 01:03 +0200, Daniel Vetter wrote:

...

In the printk code there's a special trylock, only used to kick off the logbuffer printk'ing in console_unlock. But all that happens while lockdep is disable (since printk does a few other evil tricks). So no issue there, either.

Not particularly evil, just plain broken.

See this series:

http://marc.info/?l=linux-kernel&m=132446644123326&w=2

In particular see patches 1-4. The problem with lockdep is that lockdep uses printk and printk's recursion stuff is utterly broken.

Console drivers being on crack simply doesn't help :-) Note that I would never recommend drm/kms to anybody who really cares about their console output.

Peter Zijlstra

12:24 p.m.

New subject: [PATCH 2/2] [RESEND] console: implement lockdep support for console_lock

On Mon, 2012-09-24 at 14:17 +0200, Peter Zijlstra wrote:

...

Also, don't cross-post with a subscribe only list.

Daniel Vetter

12:54 p.m.

New subject: [PATCH 2/2] [RESEND] console: implement lockdep support for console_lock

On Mon, Sep 24, 2012 at 2:24 PM, Peter Zijlstra peterz@infradead.org wrote:

...

On Mon, 2012-09-24 at 14:17 +0200, Peter Zijlstra wrote:

...

I've read through the patches and I'm hoping you don't volunteer me to pick these up ... ;-) But there doesn't seem to be anything that would get worse through this lockdep annotation patch, right?

...

Also, don't cross-post with a subscribe only list.

Sorry, forgot about this, I'll try to fix this - we really need an intel-gfx that's open to people not subscribed to it.

Thanks, Daniel


Peter Zijlstra

1:10 p.m.

New subject: [PATCH 2/2] [RESEND] console: implement lockdep support for console_lock

On Mon, 2012-09-24 at 14:54 +0200, Daniel Vetter wrote:

...

I've read through the patches and I'm hoping you don't volunteer me to pick these up ... ;-)

Worth a try, right? :-)

...

But there doesn't seem to be anything that would get worse through this lockdep annotation patch, right?

No indeed, your patch looks fine, I just wanted to comment on the printk() thing.

dri-devel@lists.freedesktop.org
