On Wed, Mar 02, 2022 at 04:36:38AM +0000, Hyeonggon Yoo wrote:
On Mon, Feb 28, 2022 at 06:56:39PM +0900, Byungchul Park wrote:
I didn't want to bother you, so I was planning to send the next spin after making more progress. However, PATCH v2 reports too many false positives because Dept tracked bit_wait_table[] the wrong way - I apologize for that. So I decided to send PATCH v3 first, before going further, for those who want to run Dept now.
There might still be some false positives but not overwhelming.
Hello Byungchul, I'm running DEPT v3 on my system and I see report below.
Looking at the kmemleak code and comments, I think kmemleak tried to avoid a lockdep recursive-locking warning, but it was detected by DEPT?
===================================================
DEPT: Circular dependency has been detected.
5.17.0-rc1+ #1 Tainted: G W
---------------------------------------------------
summary
---------------------------------------------------
*** AA DEADLOCK ***
context A
    [S] __raw_spin_lock_irqsave(&object->lock:0)
    [W] _raw_spin_lock_nested(&object->lock:0)
    [E] spin_unlock(&object->lock:0)
[S]: start of the event context
[W]: the wait blocked
[E]: the event not reachable
---------------------------------------------------
context A's detail
---------------------------------------------------
context A
    [S] __raw_spin_lock_irqsave(&object->lock:0)
    [W] _raw_spin_lock_nested(&object->lock:0)
    [E] spin_unlock(&object->lock:0)
[S] __raw_spin_lock_irqsave(&object->lock:0):
[<ffffffc00810302c>] scan_gray_list+0x84/0x13c
stacktrace:
      dept_ecxt_enter+0x88/0xf4
      _raw_spin_lock_irqsave+0xf0/0x1c4
      scan_gray_list+0x84/0x13c
      kmemleak_scan+0x2d8/0x54c
      kmemleak_scan_thread+0xac/0xd4
      kthread+0xd4/0xe4
      ret_from_fork+0x10/0x20
[W]'s stack trace is missing. But I guess this is the same issue as the one you reported following this one. We can discuss it on the other report's thread.
Thanks,
Byungchul
[E] spin_unlock(&object->lock:0):
[<ffffffc008102ee0>] scan_block+0x60/0x128
---------------------------------------------------
information that might be helpful
---------------------------------------------------
CPU: 1 PID: 38 Comm: kmemleak Tainted: G W 5.17.0-rc1+ #1
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace.part.0+0x9c/0xc4
 show_stack+0x14/0x28
 dump_stack_lvl+0x9c/0xcc
 dump_stack+0x14/0x2c
 print_circle+0x2d4/0x438
 cb_check_dl+0x44/0x70
 bfs+0x60/0x168
 add_dep+0x88/0x11c
 add_wait+0x2d0/0x2dc
 __dept_wait+0x8c/0xa4
 dept_wait+0x6c/0x88
 _raw_spin_lock_nested+0xa8/0x1b0
 scan_block+0xb4/0x128
 scan_gray_list+0xc4/0x13c
 kmemleak_scan+0x2d8/0x54c
 kmemleak_scan_thread+0xac/0xd4
 kthread+0xd4/0xe4
 ret_from_fork+0x10/0x20
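For reference, the locking pattern the call trace points at looks roughly like the minimal sketch below (not the actual mm/kmemleak.c code; names are simplified). Two different objects share the same lock class, so the nested acquisition is safe at runtime and is annotated for lockdep with SINGLE_DEPTH_NESTING, but a tracker keyed on the lock class alone still records object->lock -> object->lock:

#include <linux/spinlock.h>

/* Stand-in for struct kmemleak_object; the real one has more fields. */
struct kl_object {
	raw_spinlock_t lock;
};

static void scan_child(struct kl_object *child)
{
	/* Second lock of the same class: SINGLE_DEPTH_NESTING quiets lockdep. */
	raw_spin_lock_nested(&child->lock, SINGLE_DEPTH_NESTING);
	/* ... scan the child object's memory block ... */
	raw_spin_unlock(&child->lock);
}

static void scan_parent(struct kl_object *parent, struct kl_object *child)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&parent->lock, flags);	/* first object->lock */
	scan_child(child);	/* same lock class taken again, on another object */
	raw_spin_unlock_irqrestore(&parent->lock, flags);
}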
Hi Linus and folks,
I've been developing a tool for detecting deadlock possibilities by tracking wait/event rather than lock(?) acquisition order, to try to cover all synchronization mechanisms. It's based on the v5.17-rc1 tag.
https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1
[...]
Benefit:
0. Works with all lock primitives.
1. Works with wait_for_completion()/complete(). (See the sketch after this list.)
2. Works with 'wait' on PG_locked.
3. Works with 'wait' on PG_writeback.
4. Works with swait/wakeup.
5. Works with waitqueue.
6. Multiple reports are allowed.
7. Deduplication control on multiple reports.
8. Withstand false positives thanks to 6.
9. Easy to tag any wait/event.
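As an example of 1. above: below is a minimal, hypothetical sketch (all names made up) of a completion deadlock that tracking lock acquisition order alone cannot see, because half of the cycle is a wait/event pair rather than a lock:

#include <linux/completion.h>
#include <linux/mutex.h>

static DEFINE_MUTEX(a_mutex);
static DECLARE_COMPLETION(b_done);

static void context_a(void)
{
	mutex_lock(&a_mutex);
	wait_for_completion(&b_done);	/* [W] waits for context B's event */
	mutex_unlock(&a_mutex);
}

static void context_b(void)
{
	mutex_lock(&a_mutex);	/* blocked forever: context A holds a_mutex */
	mutex_unlock(&a_mutex);
	complete(&b_done);	/* [E] the event A waits for; never reached */
}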
Future work:
- To make it more stable.
- To separate Dept from Lockdep.
- To improve performance in terms of time and space.
- To use Dept as a dependency engine for Lockdep.
- To add any missing tags of wait/event in the kernel.
- To deduplicate stack trace.
How to interpret reports:
- The E (event) in each context cannot be triggered because of the W (wait) that cannot be woken up.
- The stack trace that helps find the problematic code is located in each context's detail.
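For illustration, here is a contrived ABBA example (a sketch, not taken from the kernel) mapped onto that notation; in each context, the unlock is the event that the other context's wait needs:

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(lock_a);
static DEFINE_SPINLOCK(lock_b);

static void context_a(void)
{
	spin_lock(&lock_a);	/* [S] start of A's event context */
	spin_lock(&lock_b);	/* [W] blocks until B's [E] */
	spin_unlock(&lock_b);
	spin_unlock(&lock_a);	/* [E] the event B's [W] is waiting for */
}

static void context_b(void)
{
	spin_lock(&lock_b);	/* [S] start of B's event context */
	spin_lock(&lock_a);	/* [W] blocks until A's [E] */
	spin_unlock(&lock_a);
	spin_unlock(&lock_b);	/* [E] the event A's [W] is waiting for */
}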
Thanks,
Byungchul
Changes from v2:
- Disable Dept on bit_wait_table[] in sched/wait_bit.c, which was reporting a lot of false positives; that was my fault. Wait/event for bit_wait_table[] should have been tagged at a higher layer to work better, which is future work. (feedback from Jan Kara)
- Disable Dept on crypto_larval's completion to prevent a false positive.
Changes from v1:
- Fix coding style and typo. (feedback from Steven)
- Distinguish each work context from another in workqueue.
- Skip checking lock acquisition with nest_lock, which is about correct lock usage that should be checked by Lockdep.
Changes from RFC:
- Add a wait tag at __schedule() rather than at prepare_to_wait(); see the sketch after this list. (feedback from Linus and Matthew)
- Use try version at lockdep_acquire_cpus_lock() annotation.
- Distinguish each syscall context from another.
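For context on the first item above, the canonical wait loop looks like the sketch below (a generic example, not code from the patch). prepare_to_wait() may be followed by a condition check that breaks out without sleeping at all, so presumably only reaching __schedule() shows that a real wait took place:

#include <linux/wait.h>
#include <linux/sched.h>

static DECLARE_WAIT_QUEUE_HEAD(wq);
static bool condition;

static void wait_for_condition(void)
{
	DEFINE_WAIT(wait);

	for (;;) {
		prepare_to_wait(&wq, &wait, TASK_UNINTERRUPTIBLE);
		if (condition)
			break;	/* never slept: no real wait happened */
		schedule();	/* the actual wait happens in __schedule() */
	}
	finish_wait(&wq, &wait);
}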
[ ... ]
--
Thank you, You are awesome!
Hyeonggon :-)