On Thu 2019-05-02 16:16:43, Daniel Vetter wrote:
console_trylock, called from within printk, can be called from pretty much anywhere, including try_to_wake_up. Note that this isn't common; usually the box is in pretty bad shape at that point already. But it really doesn't help when lockdep then jumps in and spams the logs, potentially obscuring the real backtrace we're interested in. One case I've seen (slightly simplified backtrace):
Call Trace:
 <IRQ>
 console_trylock+0xe/0x60
 vprintk_emit+0xf1/0x320
 printk+0x4d/0x69
 __warn_printk+0x46/0x90
 native_smp_send_reschedule+0x2f/0x40
 check_preempt_curr+0x81/0xa0
 ttwu_do_wakeup+0x14/0x220
 try_to_wake_up+0x218/0x5f0
 pollwake+0x6f/0x90
 credit_entropy_bits+0x204/0x310
 add_interrupt_randomness+0x18f/0x210
 handle_irq+0x67/0x160
 do_IRQ+0x5e/0x130
 common_interrupt+0xf/0xf
 </IRQ>
This alone isn't a problem, but the spinlock in the semaphore is also still held while waking up waiters (up() -> __up() -> try_to_wake_up() callchain), which then closes the runqueue vs. semaphore.lock loop and upsets lockdep, which issues a circular locking splat to dmesg. Worse, it upsets developers, since we don't want to spam dmesg with clutter when the machine is dying already.
Fix this by creating a __down_trylock which only trylocks the semaphore.lock. This isn't correct in full generality, but good enough for console_lock:
- there's only ever one console_lock holder, we won't fail spuriously because someone is doing a down() or up() while there's still room (unlike other semaphores with count > 1).
- console_unlock() has one massive retry loop, which will catch anyone who races the trylock against the up(). This makes sure that no printk lines will get lost. Making the trylock more racy therefore has no further impact.
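For readers without the patch at hand, the proposed trylock can be sketched as a userspace model. This is not the kernel code: a pthread mutex stands in for semaphore.lock, and every name with the sem_model prefix is hypothetical. It only shows the idea of trylocking the internal lock and giving up when it is contended, using the same return convention as down_trylock() (0 on success, 1 on failure).

```c
#include <assert.h>
#include <pthread.h>

/* Userspace stand-in for the semaphore; the kernel uses a raw spinlock. */
struct sem_model {
    pthread_mutex_t lock;   /* models semaphore.lock */
    unsigned int count;     /* 1 means the console lock is free */
};

/* Returns 0 if the semaphore was taken, 1 otherwise (down_trylock() style). */
static int sem_model_down_trylock(struct sem_model *sem)
{
    int ret = 1;

    /* Never wait for the internal lock: contention counts as failure. */
    if (pthread_mutex_trylock(&sem->lock) != 0)
        return 1;

    if (sem->count > 0) {
        sem->count--;
        ret = 0;
    }
    pthread_mutex_unlock(&sem->lock);
    return ret;
}

/* Walks through the three outcomes: success, busy semaphore, busy lock. */
static void sem_model_selfcheck(void)
{
    struct sem_model sem = { PTHREAD_MUTEX_INITIALIZER, 1 };

    assert(sem_model_down_trylock(&sem) == 0);  /* free: acquired */
    assert(sem_model_down_trylock(&sem) == 1);  /* already held */

    sem.count = 1;
    pthread_mutex_lock(&sem.lock);
    /* Spurious failure: count is 1 but the internal lock is busy. */
    assert(sem_model_down_trylock(&sem) == 1);
    pthread_mutex_unlock(&sem.lock);

    assert(sem_model_down_trylock(&sem) == 0);
}
```

The third case is the "not correct in full generality" part: the attempt fails even though the semaphore itself is free, which the description argues is tolerable for console_lock because of the retry loop in console_unlock().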
To be honest, I do not see how this could solve the problem.
The circular dependency is still there. If the new __down_trylock() succeeds then console_unlock() will get called in the same context and it will still need to call up() -> try_to_wake_up().
Note that there are many other console_lock() callers that might happen in parallel and might appear in the wait queue.
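The objection can be illustrated with a small userspace model of the release path. It is a sketch under assumptions, not kernel code: a pthread mutex stands in for semaphore.lock, a counter stands in for the wait list, and all unlock_model names are hypothetical. It shows that as soon as a waiter exists, the wake-up happens while the internal lock is held, regardless of how the semaphore was acquired.

```c
#include <pthread.h>
#include <stdbool.h>

struct unlock_model {
    pthread_mutex_t lock;       /* models semaphore.lock */
    unsigned int count;
    unsigned int waiters;       /* models a non-empty wait list */
    bool woke_with_lock_held;   /* set when wake-up ran under the lock */
};

/* Stand-in for try_to_wake_up(); in the kernel this takes scheduler locks. */
static void unlock_model_wake(struct unlock_model *sem)
{
    /* The trylock fails exactly when the caller still holds sem->lock. */
    if (pthread_mutex_trylock(&sem->lock) != 0)
        sem->woke_with_lock_held = true;
    else
        pthread_mutex_unlock(&sem->lock);
}

/* Mirrors the shape of up(): bump the count or wake a waiter under the lock. */
static void unlock_model_up(struct unlock_model *sem)
{
    pthread_mutex_lock(&sem->lock);
    if (sem->waiters == 0) {
        sem->count++;
    } else {
        sem->waiters--;
        unlock_model_wake(sem);
    }
    pthread_mutex_unlock(&sem->lock);
}
```

With waiters set to 1, a call to unlock_model_up() sets woke_with_lock_held, which is the semaphore.lock -> wake-up ordering that remains even when the acquire side only trylocks.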
Best Regards,
Petr