Hello,
I'm getting some fence refcounting related panics with the current Linus's master branch:
It happens immediately whenever I start Xorg or sway.
Does anyone have any ideas where to start looking? It works fine with v5.15.
(sorry for the interleaved log, it's coming from multiple CPUs at once I guess)
kind regards, o.
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 4 PID: 560 at lib/refcount.c:28 refcount_warn_saturate+0xec/0x140
Modules linked in:
CPU: 4 PID: 560 Comm: sway Not tainted 5.15.0-13547-g5169ae41ace0 #24
Hardware name: Pine64 PinePhonePro (DT)
pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : refcount_warn_saturate+0xec/0x140
lr : refcount_warn_saturate+0xec/0x140
sp : ffff8000127b3be0
x29: ffff8000127b3be0 x28: ffff8000127b3d50 x27: ffff00001927e700
x26: 0000000000000000 x25: 0000000000000001 x24: 0000000000000004
x23: ffff00001e31da80 x22: ffff000005497580 x21: ffff00001e31da90
x20: ffff00001e31da80 x19: ffff00001e31da90 x18: 0000000000000003
x17: 0000000000000000 x16: 0000000000000000 x15: ffff8000127b3b68
x14: ffffffffffffffff x13: 2e656572662d7265 x12: 7466612d65737520
x11: 3b776f6c66726564 x10: ffff800011d7e8a0 x9 : ffff800010178a1c
x8 : 00000000ffffefff x7 : ffff800011dd68a0 x6 : 0000000000000001
x5 : ffff0000f778e788 x4 : 0000000000000000 x3 : 0000000000000027
x2 : 0000000000000023 x1 : ffff0000f778e790 x0 : 0000000000000026
Call trace:
refcount_warn_saturate+0xec/0x140
drm_syncobj_replace_fence+0x16c/0x17c
panfrost_ioctl_submit+0x364/0x440
drm_ioctl_kernel+0x9c/0x154
drm_ioctl+0x1f0/0x410
__arm64_sys_ioctl+0xb4/0xdc
invoke_syscall+0x4c/0x110
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x2c/0x90
el0_svc+0x14/0x50
el0t_64_sync_handler+0x9c/0x120
el0t_64_sync+0x158/0x15c
---[ end trace 51cdc14807ba9222 ]---
------------[ cut here ]------------
Unable to handle kernel write to read-only memory at virtual address ffff800010820b10
refcount_t: saturated; leaking memory.
Mem abort info:
WARNING: CPU: 1 PID: 223 at lib/refcount.c:22 refcount_warn_saturate+0x6c/0x140
ESR = 0x9600004e
Modules linked in:
EC = 0x25: DABT (current EL), IL = 32 bits
CPU: 1 PID: 223 Comm: pan_js Tainted: G W 5.15.0-13547-g5169ae41ace0 #24 SET = 0, FnV = 0 Hardware name: Pine64 PinePhonePro (DT) EA = 0, S1PTW = 0 pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) FSC = 0x0e: level 2 permission fault pc : refcount_warn_saturate+0x6c/0x140 Data abort info: lr : refcount_warn_saturate+0x6c/0x140 ISV = 0, ISS = 0x0000004e sp : ffff800012a2bd90 CM = 0, WnR = 1 x29: ffff800012a2bd90 swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000019ba000 x28: 0000000000000000 [ffff800010820b10] pgd=10000000f7fff003 x27: 0000000000000000 , p4d=10000000f7fff003
, pud=10000000f7ffe003 x26: 0000000000000000 , pmd=0040000000a00781 x25: ffff800011906000
x24: ffff000013ee7a20 Internal error: Oops: 9600004e [#1] SMP
Modules linked in: x23: ffff8000108211e0
x22: ffff800011906000 CPU: 2 PID: 222 Comm: pan_js Tainted: G W 5.15.0-13547-g5169ae41ace0 #24 x21: ffff0000251ef000 Hardware name: Pine64 PinePhonePro (DT)
pstate: 000000c5 (nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) x20: ffff000005497580 pc : dma_fence_add_callback+0xc8/0x120 x19: ffff00001532b4c0 lr : dma_fence_add_callback+0x78/0x120 x18: 0000000000000000 sp : ffff800012a23d60
x29: ffff800012a23d60 x17: 000000040044ffff x28: 0000000000000000 x16: 00000032b5503510 x27: 0000000000000000 x15: 0000000000000000
x26: 0000000000000000 x14: ffff00000550c380 x25: ffff800011906000 x13: 2e79726f6d656d20 x24: 0000000000000000 x12: 676e696b61656c20
x23: 0000000000000000 x11: 3b64657461727574 x22: ffff8000108211e0 x10: 6173203a745f746e x21: ffff0000054975d0 x9 : ffff80001022e51c
x20: ffff00001532b468 x8 : 0000000000000001 x19: ffff000005497580 x7 : 0000000000000e08 x18: 0000000000000000 x6 : 0000000000000001
x17: 000000040044ffff x5 : 0000000000000000 x16: 00000032b5503510 x4 : ffff0000f773a788 x15: 0000000000000000 x3 : ffff0000f77466f0
x14: ffff00000550d100 x2 : ffff0000f773a788 x13: ffff8000e5e4e000 x1 : ffff8000e5e32000 x12: 0000000034d4d91d x0 : 0000000000000026
x11: 0000000000000000 Call trace: x10: 0000000000000002 refcount_warn_saturate+0x6c/0x140 x9 : ffff800010899578 drm_sched_entity_pop_job+0x418/0x490
drm_sched_main+0xb0/0x41c x8 : ffff0000148dcd60 kthread+0x14c/0x160 x7 : 0000000000000000 ret_from_fork+0x10/0x20 x6 : 00000000010a4760 ---[ end trace 51cdc14807ba9223 ]---
x5 : ffff000013ee79f8 x4 : 0000000000000001 x3 : ffff0000054975b0 x2 : 0000000000000000 x1 : ffff800010820b10 x0 : ffff000005497590 Call trace: dma_fence_add_callback+0xc8/0x120 drm_sched_entity_pop_job+0xa4/0x490 drm_sched_main+0xb0/0x41c kthread+0x14c/0x160 ret_from_fork+0x10/0x20 Code: 91004260 f9400e61 f9000e74 a9000680 (f9000034) ---[ end trace 51cdc14807ba9224 ]--- Kernel panic - not syncing: Oops: Fatal exception SMP: stopping secondary CPUs SMP: failed to stop secondary CPUs 1-2 Kernel Offset: disabled CPU features: 0x2,00004042,40000806 Memory Limit: none ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
Hi Ondrej,
On Mon, 15 Nov 2021 at 07:35, Ondřej Jirman megi@xff.cz wrote:
I'm getting some fence refcounting related panics with the current Linus's master branch:
It happens immediately whenever I start Xorg or sway.
Does anyone have any ideas where to start looking? It works fine with v5.15.
(sorry for the interleaved log, it's coming from multiple CPUs at once I guess)
Thanks a lot for the report - are you able to bisect this please?
Cheers, Daniel
You need
commit 13e9e30cafea10dff6bc8d63a38a61249e83fd65 Author: Christian König christian.koenig@amd.com Date: Mon Oct 18 21:27:55 2021 +0200
drm/scheduler: fix drm_sched_job_add_implicit_dependencies
which Christian pushed to drm-misc-next instead of drm-misc-fixes. I already asked Christian in some other thread to cherry-pick it over. -Daniel
-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
On Mon, Nov 15, 2021 at 04:05:02PM +0100, Daniel Vetter wrote:
You need
commit 13e9e30cafea10dff6bc8d63a38a61249e83fd65 Author: Christian König christian.koenig@amd.com Date: Mon Oct 18 21:27:55 2021 +0200
drm/scheduler: fix drm_sched_job_add_implicit_dependencies
Thank you, that fixed the panic. :)
kind regards, Ondrej
On Mon, Nov 15, 2021 at 05:04:36PM +0100, megi xff wrote:
On Mon, Nov 15, 2021 at 04:05:02PM +0100, Daniel Vetter wrote:
You need
commit 13e9e30cafea10dff6bc8d63a38a61249e83fd65 Author: Christian König christian.koenig@amd.com Date: Mon Oct 18 21:27:55 2021 +0200
drm/scheduler: fix drm_sched_job_add_implicit_dependencies
Thank you, that fixed the panic. :)
I spoke too soon. Panic is gone, but I still see (immediately after starting Xorg):
[ 13.290795] ------------[ cut here ]------------
[ 13.291103] refcount_t: addition on 0; use-after-free.
[ 13.291495] WARNING: CPU: 5 PID: 548 at lib/refcount.c:25 refcount_warn_saturate+0x98/0x140
[ 13.292124] Modules linked in:
[ 13.292285] CPU: 5 PID: 548 Comm: Xorg Not tainted 5.16.0-rc1-00414-g21a254904a26 #29
[ 13.292857] Hardware name: Pine64 PinePhonePro (DT)
[ 13.293172] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 13.293669] pc : refcount_warn_saturate+0x98/0x140
[ 13.293977] lr : refcount_warn_saturate+0x98/0x140
[ 13.294285] sp : ffff8000129a3b50
[ 13.294464] x29: ffff8000129a3b50 x28: ffff8000129a3d50 x27: ffff000017ec4b00
[ 13.294979] x26: 0000000000000001 x25: 0000000000000001 x24: ffff0000127cca48
[ 13.295494] x23: ffff000017d19b00 x22: 000000000000000a x21: 0000000000000001
[ 13.296006] x20: ffff000017e15500 x19: ffff000012980580 x18: 0000000000000003
[ 13.296520] x17: 0000000000000000 x16: 0000000000000000 x15: ffff8000129a3b58
[ 13.297033] x14: ffffffffffffffff x13: 2e656572662d7265 x12: 7466612d65737520
[ 13.297546] x11: 3b30206e6f206e6f x10: ffff800011d6e8a0 x9 : ffff80001022f37c
[ 13.298059] x8 : 00000000ffffefff x7 : ffff800011dc68a0 x6 : 0000000000000001
[ 13.298573] x5 : 0000000000000000 x4 : ffff0000f77a9788 x3 : ffff0000f77b56f0
[ 13.299085] x2 : ffff0000f77a9788 x1 : ffff8000e5eb1000 x0 : 000000000000002a
[ 13.299600] Call trace:
[ 13.299704] refcount_warn_saturate+0x98/0x140
[ 13.299981] drm_sched_job_add_implicit_dependencies+0x90/0xdc
[ 13.300385] panfrost_job_push+0xd0/0x1d4
[ 13.300628] panfrost_ioctl_submit+0x34c/0x440
[ 13.300906] drm_ioctl_kernel+0x9c/0x154
[ 13.301142] drm_ioctl+0x1f0/0x410
[ 13.301330] __arm64_sys_ioctl+0xb4/0xdc
[ 13.301566] invoke_syscall+0x4c/0x110
[ 13.301787] el0_svc_common.constprop.0+0x48/0xf0
[ 13.302090] do_el0_svc+0x2c/0x90
[ 13.302271] el0_svc+0x14/0x50
[ 13.302431] el0t_64_sync_handler+0x9c/0x120
[ 13.302693] el0t_64_sync+0x158/0x15c
[ 13.302904] ---[ end trace 8c211e57f89714c8 ]---
[ 13.303211] ------------[ cut here ]------------
[ 13.303504] refcount_t: underflow; use-after-free.
[ 13.303820] WARNING: CPU: 5 PID: 548 at lib/refcount.c:28 refcount_warn_saturate+0xec/0x140
[ 13.304439] Modules linked in:
[ 13.304596] CPU: 5 PID: 548 Comm: Xorg Tainted: G W 5.16.0-rc1-00414-g21a254904a26 #29
[ 13.305286] Hardware name: Pine64 PinePhonePro (DT)
[ 13.305600] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 13.306095] pc : refcount_warn_saturate+0xec/0x140
[ 13.306402] lr : refcount_warn_saturate+0xec/0x140
[ 13.306710] sp : ffff8000129a3b70
[ 13.306887] x29: ffff8000129a3b70 x28: ffff8000129a3d50 x27: ffff000017ec4b00
[ 13.307401] x26: 0000000000000001 x25: 0000000000000001 x24: 0000000000000000
[ 13.307914] x23: 00000000ffffffff x22: ffff0000129807c0 x21: ffff000012980580
[ 13.308428] x20: ffff000017c54d00 x19: 0000000000000000 x18: 0000000000000003
[ 13.308942] x17: 0000000000000000 x16: 0000000000000000 x15: ffff8000129a3b58
[ 13.309454] x14: ffffffffffffffff x13: 2e656572662d7265 x12: 7466612d65737520
[ 13.309967] x11: 3b776f6c66726564 x10: ffff800011d6e8a0 x9 : ffff80001017893c
[ 13.310480] x8 : 00000000ffffefff x7 : ffff800011dc68a0 x6 : 0000000000000001
[ 13.310993] x5 : ffff0000f77a9788 x4 : 0000000000000000 x3 : 0000000000000027
[ 13.311506] x2 : 0000000000000023 x1 : ffff0000f77a9790 x0 : 0000000000000026
[ 13.312020] Call trace:
[ 13.312123] refcount_warn_saturate+0xec/0x140
[ 13.312401] dma_resv_add_excl_fence+0x1a8/0x1bc
[ 13.312700] panfrost_job_push+0x174/0x1d4
[ 13.312949] panfrost_ioctl_submit+0x34c/0x440
[ 13.313229] drm_ioctl_kernel+0x9c/0x154
[ 13.313464] drm_ioctl+0x1f0/0x410
[ 13.313651] __arm64_sys_ioctl+0xb4/0xdc
[ 13.313884] invoke_syscall+0x4c/0x110
[ 13.314103] el0_svc_common.constprop.0+0x48/0xf0
[ 13.314405] do_el0_svc+0x2c/0x90
[ 13.314586] el0_svc+0x14/0x50
[ 13.314745] el0t_64_sync_handler+0x9c/0x120
[ 13.315007] el0t_64_sync+0x158/0x15c
[ 13.315217] ---[ end trace 8c211e57f89714c9 ]---
In dmesg. So this looks like some independent issue.
kind regards, o.
On Mon, Nov 15, 2021 at 8:16 AM Ondřej Jirman megi@xff.cz wrote:
On Mon, Nov 15, 2021 at 05:04:36PM +0100, megi xff wrote:
On Mon, Nov 15, 2021 at 04:05:02PM +0100, Daniel Vetter wrote:
You need
commit 13e9e30cafea10dff6bc8d63a38a61249e83fd65 Author: Christian König christian.koenig@amd.com Date: Mon Oct 18 21:27:55 2021 +0200
drm/scheduler: fix drm_sched_job_add_implicit_dependencies
Thank you, that fixed the panic. :)
I spoke too soon. Panic is gone, but I still see (immediately after starting Xorg):
[...]
In dmesg. So this looks like some independent issue.
I'm seeing something similar with drm/msm, which is, I think, due to the introduction and location of call to drm_sched_job_arm().. I'm still trying to untangle where it should go, but I think undoing 357285a2d1c0 ("drm/msm: Improve drm/sched point of no return rules") would fix it
BR, -R
On Mon, Nov 15, 2021 at 2:43 PM Rob Clark robdclark@gmail.com wrote:
On Mon, Nov 15, 2021 at 8:16 AM Ondřej Jirman megi@xff.cz wrote:
On Mon, Nov 15, 2021 at 05:04:36PM +0100, megi xff wrote:
On Mon, Nov 15, 2021 at 04:05:02PM +0100, Daniel Vetter wrote:
You need
commit 13e9e30cafea10dff6bc8d63a38a61249e83fd65 Author: Christian König christian.koenig@amd.com Date: Mon Oct 18 21:27:55 2021 +0200
drm/scheduler: fix drm_sched_job_add_implicit_dependencies
Thank you, that fixed the panic. :)
I spoke too soon. Panic is gone, but I still see (immediately after starting Xorg):
[...]
In dmesg. So this looks like some independent issue.
I'm seeing something similar with drm/msm, which is, I think, due to the introduction and location of call to drm_sched_job_arm().. I'm still trying to untangle where it should go, but I think undoing 357285a2d1c0 ("drm/msm: Improve drm/sched point of no return rules") would fix it
ok, disregard that above.. what actually seems to have fixed it for me is:
------------
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 94fe51b3caa2..f91fb31ab7a7 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -704,12 +704,13 @@ int drm_sched_job_add_implicit_dependencies(struct drm_sched_job *job,
        int ret;
 
        dma_resv_for_each_fence(&cursor, obj->resv, write, fence) {
-               ret = drm_sched_job_add_dependency(job, fence);
-               if (ret)
-                       return ret;
-
                /* Make sure to grab an additional ref on the added fence */
                dma_fence_get(fence);
+               ret = drm_sched_job_add_dependency(job, fence);
+               if (ret) {
+                       dma_fence_put(fence);
+                       return ret;
+               }
        }
        return 0;
 }
------------
The problem looks to be that drm_sched_job_add_dependency() was dropping the last ref before the dma_fence_get().
Not sure if I should send a patch or if this can be squashed into the existing fix?
BR, -R
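The ordering problem described in the message above can be illustrated with a small userspace sketch. Everything here is a toy stand-in, not kernel code: `toy_fence`, `toy_job`, `toy_job_add_dependency()` and the other `toy_*` names are hypothetical. The sketch assumes the helper consumes the caller's reference on success (dropping it when the job already tracks a fence from the same context) and leaves it with the caller on failure; the thread does not show the real function's error path, so that part is an assumption chosen so the patch's error handling balances.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct dma_fence: a context id plus a plain counter. */
struct toy_fence {
    int context;
    int refcount;
};

#define TOY_MAX_DEPS 4

/* Toy stand-in for the scheduler job's dependency table. */
struct toy_job {
    struct toy_fence *deps[TOY_MAX_DEPS];
    size_t count;
};

/* Counts the kind of misuse that refcount_t warns about. */
static int toy_violations;

static void toy_fence_get(struct toy_fence *f)
{
    if (f->refcount == 0)
        toy_violations++;      /* "addition on 0; use-after-free" */
    f->refcount++;
}

static void toy_fence_put(struct toy_fence *f)
{
    if (f->refcount == 0) {
        toy_violations++;      /* "underflow; use-after-free" */
        return;
    }
    f->refcount--;
}

/*
 * Hypothetical helper. On success it consumes the caller's reference:
 * the job either keeps it or drops it because a fence from the same
 * context is already tracked. On failure the reference stays with the
 * caller (an assumption of this sketch).
 */
static int toy_job_add_dependency(struct toy_job *job, struct toy_fence *f)
{
    assert(job != NULL && f != NULL);
    for (size_t i = 0; i < job->count; i++) {
        if (job->deps[i]->context == f->context) {
            toy_fence_put(f);  /* duplicate: drop the incoming reference */
            return 0;
        }
    }
    if (job->count == TOY_MAX_DEPS)
        return -1;             /* table full: nothing consumed */
    job->deps[job->count++] = f;
    return 0;
}

/* Old ordering: hand the fence over first, take the extra reference later. */
static int toy_add_buggy(struct toy_job *job, struct toy_fence *f)
{
    int ret = toy_job_add_dependency(job, f);

    if (ret)
        return ret;
    toy_fence_get(f);
    return 0;
}

/* Ordering from the patch: take the reference first, undo it on failure. */
static int toy_add_fixed(struct toy_job *job, struct toy_fence *f)
{
    int ret;

    toy_fence_get(f);
    ret = toy_job_add_dependency(job, f);
    if (ret)
        toy_fence_put(f);
    return ret;
}

/* Returns how many refcount violations one submission triggers. */
int toy_run_scenario(bool fixed)
{
    struct toy_fence tracked = { .context = 7, .refcount = 1 }; /* ref owned by the job */
    struct toy_fence shared  = { .context = 7, .refcount = 1 }; /* ref owned by the resv */
    struct toy_job job = { .deps = { &tracked }, .count = 1 };

    toy_violations = 0;
    if (fixed)
        toy_add_fixed(&job, &shared);
    else
        toy_add_buggy(&job, &shared);
    toy_fence_put(&shared);    /* the reservation object drops its reference */
    return toy_violations;
}
```

In this model `toy_run_scenario(false)` reports one violation, because the borrowed reference is dropped to zero before the extra one is taken, while `toy_run_scenario(true)` reports none. The real kernel objects are freed when the count reaches zero, which is why the same ordering shows up there as a use-after-free warning rather than a simple counter mismatch.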
On 16.11.21 at 00:04, Rob Clark wrote:
On Mon, Nov 15, 2021 at 2:43 PM Rob Clark robdclark@gmail.com wrote:
On Mon, Nov 15, 2021 at 8:16 AM Ondřej Jirman megi@xff.cz wrote:
On Mon, Nov 15, 2021 at 05:04:36PM +0100, megi xff wrote:
On Mon, Nov 15, 2021 at 04:05:02PM +0100, Daniel Vetter wrote:
You need
commit 13e9e30cafea10dff6bc8d63a38a61249e83fd65 Author: Christian König christian.koenig@amd.com Date: Mon Oct 18 21:27:55 2021 +0200
drm/scheduler: fix drm_sched_job_add_implicit_dependencies
Thank you, that fixed the panic. :)
I spoke too soon. Panic is gone, but I still see (immediately after starting Xorg):
[...]
In dmesg. So this looks like some independent issue.
I'm seeing something similar with drm/msm, which is, I think, due to the introduction and location of call to drm_sched_job_arm().. I'm still trying to untangle where it should go, but I think undoing 357285a2d1c0 ("drm/msm: Improve drm/sched point of no return rules") would fix it
ok, disregard that above.. what actually seems to have fixed it for me is:
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 94fe51b3caa2..f91fb31ab7a7 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -704,12 +704,13 @@ int drm_sched_job_add_implicit_dependencies(struct drm_sched_job *job,
        int ret;
 
        dma_resv_for_each_fence(&cursor, obj->resv, write, fence) {
-               ret = drm_sched_job_add_dependency(job, fence);
-               if (ret)
-                       return ret;
-
                /* Make sure to grab an additional ref on the added fence */
                dma_fence_get(fence);
+               ret = drm_sched_job_add_dependency(job, fence);
+               if (ret) {
+                       dma_fence_put(fence);
+                       return ret;
+               }
        }
        return 0;
 }
The problem looks to be that drm_sched_job_add_dependency() was dropping the last ref before the dma_fence_get().
Not sure if I should send a patch or if this can be squashed into the existing fix?
Good catch. A separate patch would probably be the best option.
Regards, Christian.
BR, -R
On 15.11.21 at 16:05, Daniel Vetter wrote:
You need
commit 13e9e30cafea10dff6bc8d63a38a61249e83fd65 Author: Christian König christian.koenig@amd.com Date: Mon Oct 18 21:27:55 2021 +0200
drm/scheduler: fix drm_sched_job_add_implicit_dependencies
which Christian pushed to drm-misc-next instead of drm-misc-fixes. I already asked Christian in some other thread to cherry-pick it over.
Sounds like you haven't seen my answer to that request.
I can't cherry-pick the patch to drm-misc-fixes because the patch which broke things hasn't shown up in that branch yet, which causes a conflict.
Regards, Christian.
On Mon, Nov 15, 2021 at 9:23 PM Christian König christian.koenig@amd.com wrote:
On 15.11.21 at 16:05, Daniel Vetter wrote:
You need
commit 13e9e30cafea10dff6bc8d63a38a61249e83fd65 Author: Christian König christian.koenig@amd.com Date: Mon Oct 18 21:27:55 2021 +0200
drm/scheduler: fix drm_sched_job_add_implicit_dependencies
which Christian pushed to drm-misc-next instead of drm-misc-fixes. I already asked Christian in some other thread to cherry-pick it over.
Sounds like you haven't seen my answer to that request.
I can't cherry-pick the patch to drm-misc-fixes because the patch which broke things hasn't shown up in that branch yet, which causes a conflict.
Yeah I asked Maxime to roll forward to -rc1 right after sending out this mail so you can do that. Which you could have done too :-) -Daniel
On 16.11.21 at 08:37, Daniel Vetter wrote:
On Mon, Nov 15, 2021 at 9:23 PM Christian König christian.koenig@amd.com wrote:
On 15.11.21 at 16:05, Daniel Vetter wrote:
You need
commit 13e9e30cafea10dff6bc8d63a38a61249e83fd65 Author: Christian König christian.koenig@amd.com Date: Mon Oct 18 21:27:55 2021 +0200
drm/scheduler: fix drm_sched_job_add_implicit_dependencies
which Christian pushed to drm-misc-next instead of drm-misc-fixes. I already asked Christian in some other thread to cherry-pick it over.
Sounds like you haven't seen my answer to that request.
I can't cherry-pick the patch to drm-misc-fixes because the patch which broke things hasn't shown up in that branch yet, which causes a conflict.
Yeah I asked Maxime to roll forward to -rc1 right after sending out this mail so you can do that.
I pinged him again just a second ago because a "dim update-branches" still doesn't show the patches from -rc1 this morning.
Which you could have done too :-)
Huh? I can push merges from upstream into drm-misc-fixes? ^^
Christian.
On Tue, Nov 16, 2021 at 8:39 AM Christian König christian.koenig@amd.com wrote:
On 16.11.21 at 08:37, Daniel Vetter wrote:
On Mon, Nov 15, 2021 at 9:23 PM Christian König christian.koenig@amd.com wrote:
On 15.11.21 at 16:05, Daniel Vetter wrote:
You need
commit 13e9e30cafea10dff6bc8d63a38a61249e83fd65 Author: Christian König christian.koenig@amd.com Date: Mon Oct 18 21:27:55 2021 +0200
drm/scheduler: fix drm_sched_job_add_implicit_dependencies
which Christian pushed to drm-misc-next instead of drm-misc-fixes. I already asked Christian in some other thread to cherry-pick it over.
Sounds like you haven't seen my answer to that request.
I can't cherry-pick the patch to drm-misc-fixes because the patch which broke things hasn't shown up in that branch yet, which causes a conflict.
Yeah I asked Maxime to roll forward to -rc1 right after sending out this mail so you can do that.
I pinged him again just a second ago because a "dim update-branches" still doesn't show the patches from -rc1 this morning.
Hm yeah I should have checked first that Maxime indeed did it :-/
Which you could have done too :-)
Huh? I can push merges from upstream into drm-misc-fixes? ^^
Ah no, just asking to make it happen. -Daniel
dri-devel@lists.freedesktop.org