On Wed, Feb 16, 2022 at 01:17:03PM +0900, Byungchul Park wrote:
[ 7.013330] ===================================================
[ 7.013331] DEPT: Circular dependency has been detected.
[ 7.013332] 5.17.0-rc1-00014-gcf3441bb2012 #2 Tainted: G W
[ 7.013333] ---------------------------------------------------
[ 7.013334] summary
[ 7.013334] ---------------------------------------------------
[ 7.013335] *** DEADLOCK ***
[ 7.013335]
[ 7.013335] context A
[ 7.013336]     [S] (unknown)(&(&ei->socket.wq.wait)->dmap:0)
[ 7.013337]     [W] __mutex_lock_common(&u->iolock:0)
[ 7.013338]     [E] event(&(&ei->socket.wq.wait)->dmap:0)
[ 7.013340]
[ 7.013340] context B
[ 7.013341]     [S] __raw_spin_lock(&u->lock:0)
[ 7.013342]     [W] wait(&(&ei->socket.wq.wait)->dmap:0)
[ 7.013343]     [E] spin_unlock(&u->lock:0)
This seems unlikely to be real. We're surely not actually waiting while holding a spinlock; existing debug checks would catch it.
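To be concrete about what "existing debug checks" means (a purely hypothetical sketch, not code from af_unix.c): really sleeping with the spinlock held would look like the fragment below, and with CONFIG_DEBUG_ATOMIC_SLEEP enabled the kernel would splat on it immediately ("scheduling while atomic" / "sleeping function called from invalid context"), so such a bug would not go unnoticed.

	spin_lock(&u->lock);
	/* Actually sleeping here, with the lock held and the preempt
	 * count elevated, is exactly what the debug checks catch.
	 */
	schedule_timeout_interruptible(HZ);
	spin_unlock(&u->lock);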
[ 7.013407] ---------------------------------------------------
[ 7.013407] context B's detail
[ 7.013408] ---------------------------------------------------
[ 7.013408] context B
[ 7.013409]     [S] __raw_spin_lock(&u->lock:0)
[ 7.013410]     [W] wait(&(&ei->socket.wq.wait)->dmap:0)
[ 7.013411]     [E] spin_unlock(&u->lock:0)
[ 7.013412]
[ 7.013412] [S] __raw_spin_lock(&u->lock:0):
[ 7.013413] [<ffffffff81aa451f>] unix_stream_read_generic+0x6bf/0xb60
[ 7.013416] stacktrace:
[ 7.013416]     _raw_spin_lock+0x6e/0x90
[ 7.013418]     unix_stream_read_generic+0x6bf/0xb60
It would be helpful if you'd run this through scripts/decode_stacktrace.sh so we could see line numbers instead of hex offsets (which aren't much use without the binary kernel).
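For example (with the report saved to a file; "dept.txt" below is just a placeholder name), something like:

	./scripts/decode_stacktrace.sh vmlinux < dept.txt

where vmlinux is the unstripped image the running kernel was built from.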
[ 7.013420]     unix_stream_recvmsg+0x40/0x50
[ 7.013422]     sock_read_iter+0x85/0xd0
[ 7.013424]     new_sync_read+0x162/0x180
[ 7.013426]     vfs_read+0xf3/0x190
[ 7.013428]     ksys_read+0xa6/0xc0
[ 7.013429]     do_syscall_64+0x3a/0x90
[ 7.013431]     entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 7.013433]
[ 7.013434] [W] wait(&(&ei->socket.wq.wait)->dmap:0):
[ 7.013434] [<ffffffff810bb017>] prepare_to_wait+0x47/0xd0
... this may be the source of confusion. Just because we prepare to wait doesn't mean we end up actually waiting. For example, look at unix_wait_for_peer():
	prepare_to_wait_exclusive(&u->peer_wait, &wait, TASK_INTERRUPTIBLE);

	sched = !sock_flag(other, SOCK_DEAD) &&
		!(other->sk_shutdown & RCV_SHUTDOWN) &&
		unix_recvq_full(other);

	unix_state_unlock(other);

	if (sched)
		timeo = schedule_timeout(timeo);

	finish_wait(&u->peer_wait, &wait);
We *prepare* to wait, *then* drop the lock, then actually schedule.
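Stripped down to its shape (a generic sketch, with made-up names like "lock", "wq_head" and "condition" purely for illustration), the pattern is:

	DEFINE_WAIT(wait);
	bool need_to_sleep;

	spin_lock(&lock);
	prepare_to_wait(&wq_head, &wait, TASK_INTERRUPTIBLE);	/* no sleep yet */
	need_to_sleep = !condition;		/* decided while still locked */
	spin_unlock(&lock);			/* lock dropped first */

	if (need_to_sleep)
		schedule();			/* only here do we really wait */

	finish_wait(&wq_head, &wait);

i.e. the point where the "wait" is recorded (prepare_to_wait()) is not the point where the task actually sleeps.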
On Wed, Feb 16, 2022 at 04:33:41AM +0000, Matthew Wilcox wrote:
It would be helpful if you'd run this through scripts/decode_stacktrace.sh so we could see line numbers instead of hex offsets (which aren't much use without the binary kernel).

(Sorry for the late reply; it was due to an issue with my email client.)

That was a big help. Thank you very much.
[ 7.013434] [W] wait(&(&ei->socket.wq.wait)->dmap:0):
[ 7.013434] [<ffffffff810bb017>] prepare_to_wait+0x47/0xd0
... this may be the source of confusion. Just because we prepare to wait doesn't mean we end up actually waiting. For example, look at unix_wait_for_peer():
	prepare_to_wait_exclusive(&u->peer_wait, &wait, TASK_INTERRUPTIBLE);

	sched = !sock_flag(other, SOCK_DEAD) &&
		!(other->sk_shutdown & RCV_SHUTDOWN) &&
		unix_recvq_full(other);

	unix_state_unlock(other);

	if (sched)
		timeo = schedule_timeout(timeo);

	finish_wait(&u->peer_wait, &wait);
We *prepare* to wait, *then* drop the lock, then actually schedule.
That's a big help, too. I checked some call sites that use this pattern, but in the ones I looked at it's almost always "prepare == wait" :-(. Thanks a lot!
Thanks,
Byungchul