On Fri, Mar 18, 2022 at 04:51:29PM +0900, Byungchul Park wrote:
On Wed, Mar 16, 2022 at 09:30:02AM +0000, Hyeonggon Yoo wrote:
On Wed, Mar 16, 2022 at 01:32:13PM +0900, Byungchul Park wrote:
On Sat, Mar 12, 2022 at 01:53:26AM +0000, Hyeonggon Yoo wrote:
On Fri, Mar 04, 2022 at 04:06:19PM +0900, Byungchul Park wrote:
Hi Linus and folks,
I've been developing a tool for detecting deadlock possibilities by tracking wait/event rather than lock(?) acquisition order to try to cover all synchronization mechanisms. It is based on the v5.17-rc1 tag.
https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1
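For readers new to the idea, here is a toy model of wait/event dependency tracking. It is only a sketch and not DEPT's code: it assumes a dependency "event class -> wait class" is recorded whenever a wait happens in the context that must later trigger the event, and that a cycle in the resulting graph indicates a possible deadlock. The names `ToyDepGraph`, `find_path` and `add_dependency` are made up for this illustration.

```python
from collections import defaultdict, deque


class ToyDepGraph:
    """Toy wait/event dependency graph (illustrative only, not DEPT's code)."""

    def __init__(self):
        # edges[a] holds every class b such that "event a cannot fire
        # until a wait on b has finished".
        self.edges = defaultdict(set)

    def find_path(self, src, dst):
        """Breadth-first search; return a path from src to dst, or None."""
        parents = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                path = []
                while node is not None:
                    path.append(node)
                    node = parents[node]
                return path[::-1]
            for nxt in self.edges[node]:
                if nxt not in parents:
                    parents[nxt] = node
                    queue.append(nxt)
        return None

    def add_dependency(self, event_cls, wait_cls):
        """Record event_cls -> wait_cls; return the cycle it closes, if any."""
        # The new edge closes a cycle if event_cls is already reachable
        # from wait_cls (a self-dependency is the shortest such cycle).
        path = self.find_path(wait_cls, event_cls)
        self.edges[event_cls].add(wait_cls)
        if path is None:
            return None
        return [event_cls] + path


graph = ToyDepGraph()
print(graph.add_dependency("lock_a", "completion_b"))  # None: no cycle yet
print(graph.add_dependency("completion_b", "lock_a"))  # cycle through both classes
print(graph.add_dependency("class_x", "class_x"))      # self-dependency, cf. "AA"
```

Because the graph is keyed by class rather than by lock type, the same check applies to any primitive that can be described as "wait for X" and "X happened".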
A small piece of feedback unrelated to this thread: I'm not sure "Need to expand the ring buffer" is something that warrants a WARN(). Is the stack trace useful for anything?
Hello Byungchul. These are two DEPT warnings from my system.
Hi Hyeonggon,
Could you run scripts/decode_stacktrace.sh and share the result instead of the raw format below if the reports still appear with PATCH v5? It'd be appreciated (:
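For context: in the raw format each frame reads `symbol+0xOFFSET/0xSIZE`, optionally preceded by an address in `[<...>]`, and scripts/decode_stacktrace.sh resolves such frames to file:line pairs using a vmlinux with debug info. The sketch below does not replace that script; it only splits raw frames into their parts so the two formats are easier to compare. `FRAME_RE` and `parse_raw_frames` are hypothetical names.

```python
import re

# symbol+0xOFFSET/0xSIZE, optionally preceded by "[<address>] ".
FRAME_RE = re.compile(
    r"(?:\[<[0-9a-f]+>\]\s+)?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_raw_frames(text):
    """Return (symbol, offset, size) for every raw frame found in text."""
    return [
        (m.group(1), int(m.group(2), 16), int(m.group(3), 16))
        for m in FRAME_RE.finditer(text)
    ]


sample = "[<ffffffc00802204c>] kernel_clone+0x25c/0x2b8 stacktrace: dept_wait+0x74/0x88"
for symbol, offset, size in parse_raw_frames(sample):
    # offset is the position inside the function, size is the function length.
    print(f"{symbol}: offset {offset:#x} of {size:#x} bytes")
```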
Hi Byungchul.
On dept1.18_on_v5.17-rc7, the kernel_clone() warning is gone. There is one warning remaining on my system:
It is triggered when running the kunit-try-catch-test test case.
Hi Hyeonggon,
I can reproduce it, thanks to you. I will let you know once all the work is done.
Hi Hyeonggon,
All the work on this issue is done. I've just updated the same branch.
https://github.com/lgebyungchulpark/linux-dept/commits/dept1.18_on_v5.17-rc7
This is just for your information.
Thanks, Byungchul
Thanks, Byungchul
===================================================
DEPT: Circular dependency has been detected.
5.17.0-rc7+ #4 Not tainted
summary
*** AA DEADLOCK ***

context A
    [S] (unknown)(&try_completion:0)
    [W] wait_for_completion_timeout(&try_completion:0)
    [E] complete(&try_completion:0)

[S]: start of the event context
[W]: the wait blocked
[E]: the event not reachable

context A's detail

context A
    [S] (unknown)(&try_completion:0)
    [W] wait_for_completion_timeout(&try_completion:0)
    [E] complete(&try_completion:0)

[S] (unknown)(&try_completion:0):
(N/A)

[W] wait_for_completion_timeout(&try_completion:0):
kunit_try_catch_run (lib/kunit/try-catch.c:78 (discriminator 1))
stacktrace:
    dept_wait (kernel/dependency/dept.c:2149)
    wait_for_completion_timeout (kernel/sched/completion.c:119 (discriminator 4) kernel/sched/completion.c:165 (discriminator 4))
    kunit_try_catch_run (lib/kunit/try-catch.c:78 (discriminator 1))
    kunit_test_try_catch_successful_try_no_catch (lib/kunit/kunit-test.c:43)
    kunit_try_run_case (lib/kunit/test.c:333 lib/kunit/test.c:374)
    kunit_generic_run_threadfn_adapter (lib/kunit/try-catch.c:30)
    kthread (kernel/kthread.c:379)
    ret_from_fork (arch/arm64/kernel/entry.S:757)

[E] complete(&try_completion:0):
kthread_complete_and_exit (kernel/kthread.c:327)
stacktrace:
    dept_event (kernel/dependency/dept.c:2376 (discriminator 2))
    complete (kernel/sched/completion.c:33 (discriminator 4))
    kthread_complete_and_exit (kernel/kthread.c:327)
    kunit_try_catch_throw (lib/kunit/try-catch.c:18)
    kthread (kernel/kthread.c:379)
    ret_from_fork (arch/arm64/kernel/entry.S:757)

information that might be helpful

Hardware name: linux,dummy-virt (DT)
Call trace:
    dump_backtrace.part.0 (arch/arm64/kernel/stacktrace.c:186)
    show_stack (arch/arm64/kernel/stacktrace.c:193)
    dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4))
    dump_stack (lib/dump_stack.c:114)
    print_circle (./arch/arm64/include/asm/atomic_ll_sc.h:112 ./arch/arm64/include/asm/atomic.h:30 ./include/linux/atomic/atomic-arch-fallback.h:511 ./include/linux/atomic/atomic-instrumented.h:258 kernel/dependency/dept.c:140 kernel/dependency/dept.c:748)
    cb_check_dl (kernel/dependency/dept.c:1083 kernel/dependency/dept.c:1064)
    bfs (kernel/dependency/dept.c:833)
    add_dep (kernel/dependency/dept.c:1409)
    do_event (kernel/dependency/dept.c:175 kernel/dependency/dept.c:1644)
    dept_event (kernel/dependency/dept.c:2376 (discriminator 2))
    complete (kernel/sched/completion.c:33 (discriminator 4))
    kthread_complete_and_exit (kernel/kthread.c:327)
    kunit_try_catch_throw (lib/kunit/try-catch.c:18)
    kthread (kernel/kthread.c:379)
    ret_from_fork (arch/arm64/kernel/entry.S:757)
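As a reading aid for the report, the sketch below pulls the [S]/[W]/[E] entries out of a summary and checks whether the wait and the event name the same class, which is what the report shows under "AA DEADLOCK". It is a helper written for this note, not part of DEPT; `ENTRY_RE`, `parse_context` and `is_aa` are hypothetical names, and treating the trailing number as a subclass index is an assumption.

```python
import re

# Matches e.g. "[W] wait_for_completion_timeout(&try_completion:0)".
# Legend lines such as "[W]: the wait blocked" do not match.
ENTRY_RE = re.compile(r"\[([SWE])\]\s+(\(unknown\)|\w+)\(([^():]+):(\d+)\)")


def parse_context(summary):
    """Map each tag (S, W, E) to (function, class name, trailing number)."""
    entries = {}
    for tag, func, cls, sub in ENTRY_RE.findall(summary):
        # Keep the first occurrence so a repeated "detail" block is ignored.
        entries.setdefault(tag, (func, cls, int(sub)))
    return entries


def is_aa(entries):
    """True when the wait and the event refer to the same class and number."""
    return entries["W"][1:] == entries["E"][1:]


summary = (
    "context A [S] (unknown)(&try_completion:0) "
    "[W] wait_for_completion_timeout(&try_completion:0) "
    "[E] complete(&try_completion:0)"
)
ctx = parse_context(summary)
print(ctx["W"][0], "waits on", ctx["W"][1])
print(ctx["E"][0], "signals", ctx["E"][1])
print("same class for wait and event:", is_aa(ctx))
```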
-- Thank you, You are awesome! Hyeonggon :-)
https://lkml.org/lkml/2022/3/15/1277 (or https://github.com/lgebyungchulpark/linux-dept/commits/dept1.18_on_v5.17-rc7)
Thank you very much!
-- Byungchul
Both cases look similar.
In what cases does DEPT report (unknown)? I'm not sure we can properly debug this.
===================================================
DEPT: Circular dependency has been detected.
5.17.0-rc1+ #3 Tainted: G W
summary
*** AA DEADLOCK ***

context A
    [S] (unknown)(&vfork:0)
    [W] wait_for_completion_killable(&vfork:0)
    [E] complete(&vfork:0)

[S]: start of the event context
[W]: the wait blocked
[E]: the event not reachable

context A's detail

context A
    [S] (unknown)(&vfork:0)
    [W] wait_for_completion_killable(&vfork:0)
    [E] complete(&vfork:0)

[S] (unknown)(&vfork:0):
(N/A)

[W] wait_for_completion_killable(&vfork:0):
[<ffffffc00802204c>] kernel_clone+0x25c/0x2b8
stacktrace:
    dept_wait+0x74/0x88
    wait_for_completion_killable+0x60/0xa0
    kernel_clone+0x25c/0x2b8
    __do_sys_clone+0x5c/0x74
    __arm64_sys_clone+0x18/0x20
    invoke_syscall.constprop.0+0x78/0xc4
    do_el0_svc+0x98/0xd0
    el0_svc+0x44/0xe4
    el0t_64_sync_handler+0xb0/0x12c
    el0t_64_sync+0x158/0x15c

[E] complete(&vfork:0):
[<ffffffc00801f49c>] mm_release+0x7c/0x90
stacktrace:
    dept_event+0xe0/0x100
    complete+0x48/0x98
    mm_release+0x7c/0x90
    exit_mm_release+0xc/0x14
    do_exit+0x1b4/0x81c
    do_group_exit+0x30/0x9c
    __wake_up_parent+0x0/0x24
    invoke_syscall.constprop.0+0x78/0xc4
    do_el0_svc+0x98/0xd0
    el0_svc+0x44/0xe4
    el0t_64_sync_handler+0xb0/0x12c
    el0t_64_sync+0x158/0x15c

information that might be helpful

CPU: 6 PID: 229 Comm: start-stop-daem Tainted: G W 5.17.0-rc1+ #3
Hardware name: linux,dummy-virt (DT)
Call trace:
    dump_backtrace.part.0+0x9c/0xc4
    show_stack+0x14/0x28
    dump_stack_lvl+0x9c/0xcc
    dump_stack+0x14/0x2c
    print_circle+0x2d4/0x438
    cb_check_dl+0x44/0x70
    bfs+0x60/0x168
    add_dep+0x88/0x11c
    do_event.constprop.0+0x19c/0x2c0
    dept_event+0xe0/0x100
    complete+0x48/0x98
    mm_release+0x7c/0x90
    exit_mm_release+0xc/0x14
    do_exit+0x1b4/0x81c
    do_group_exit+0x30/0x9c
    __wake_up_parent+0x0/0x24
    invoke_syscall.constprop.0+0x78/0xc4
    do_el0_svc+0x98/0xd0
    el0_svc+0x44/0xe4
    el0t_64_sync_handler+0xb0/0x12c
    el0t_64_sync+0x158/0x15c
===================================================
DEPT: Circular dependency has been detected.
5.17.0-rc1+ #3 Tainted: G W
summary
*** AA DEADLOCK ***

context A
    [S] (unknown)(&try_completion:0)
    [W] wait_for_completion_timeout(&try_completion:0)
    [E] complete(&try_completion:0)

[S]: start of the event context
[W]: the wait blocked
[E]: the event not reachable

context A's detail

context A
    [S] (unknown)(&try_completion:0)
    [W] wait_for_completion_timeout(&try_completion:0)
    [E] complete(&try_completion:0)

[S] (unknown)(&try_completion:0):
(N/A)

[W] wait_for_completion_timeout(&try_completion:0):
[<ffffffc008166bf4>] kunit_try_catch_run+0xb4/0x160
stacktrace:
    dept_wait+0x74/0x88
    wait_for_completion_timeout+0x64/0xa0
    kunit_try_catch_run+0xb4/0x160
    kunit_test_try_catch_successful_try_no_catch+0x3c/0x98
    kunit_try_run_case+0x9c/0xa0
    kunit_generic_run_threadfn_adapter+0x1c/0x28
    kthread+0xd4/0xe4
    ret_from_fork+0x10/0x20

[E] complete(&try_completion:0):
[<ffffffc00803dce4>] kthread_complete_and_exit+0x18/0x20
stacktrace:
    dept_event+0xe0/0x100
    complete+0x48/0x98
    kthread_complete_and_exit+0x18/0x20
    kunit_try_catch_throw+0x0/0x1c
    kthread+0xd4/0xe4
    ret_from_fork+0x10/0x20

information that might be helpful

CPU: 15 PID: 132 Comm: kunit_try_catch Tainted: G W 5.17.0-rc1+ #3
Hardware name: linux,dummy-virt (DT)
Call trace:
    dump_backtrace.part.0+0x9c/0xc4
    show_stack+0x14/0x28
    dump_stack_lvl+0x9c/0xcc
    dump_stack+0x14/0x2c
    print_circle+0x2d4/0x438
    cb_check_dl+0x44/0x70
    bfs+0x60/0x168
    add_dep+0x88/0x11c
    do_event.constprop.0+0x19c/0x2c0
    dept_event+0xe0/0x100
    complete+0x48/0x98
    kthread_complete_and_exit+0x18/0x20
    kunit_try_catch_throw+0x0/0x1c
    kthread+0xd4/0xe4
    ret_from_fork+0x10/0x20
Benefit:
0. Works with all lock primitives.
1. Works with wait_for_completion()/complete().
2. Works with 'wait' on PG_locked.
3. Works with 'wait' on PG_writeback.
4. Works with swait/wakeup.
5. Works with waitqueue.
6. Multiple reports are allowed.
7. Deduplication control on multiple reports.
8. Withstands false positives thanks to 6.
9. Easy to tag any wait/event.
Future work:
[...]
-- 1.9.1
-- Thank you, You are awesome! Hyeonggon :-)