Re: rcu_barrier() no longer allowed within mmap_sem?

30 Mar 2020

      On Mon, Mar 30, 2020 at 03:00:35PM +0200, Daniel Vetter wrote:
...
Hi all, for all = rcu, cpuhotplug and perf maintainers
We've hit an interesting new lockdep splat in our drm/i915 CI:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17096/shard-tglb7/igt@kms...
Summarizing away the driver parts we have
< gpu locks which are held within mm->mmap_sem in various gpu fault handlers >
-> #4 (&mm->mmap_sem#2){++++}:
<4> [604.892615] __might_fault+0x63/0x90
<4> [604.892617] _copy_to_user+0x1e/0x80
<4> [604.892619] perf_read+0x200/0x2b0
<4> [604.892621] vfs_read+0x96/0x160
<4> [604.892622] ksys_read+0x9f/0xe0
<4> [604.892623] do_syscall_64+0x4f/0x220
<4> [604.892624] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [604.892625]
-> #3 (&cpuctx_mutex){+.+.}:
<4> [604.892626] __mutex_lock+0x9a/0x9c0
<4> [604.892627] perf_event_init_cpu+0xa4/0x140
<4> [604.892629] perf_event_init+0x19d/0x1cd
<4> [604.892630] start_kernel+0x362/0x4e4
<4> [604.892631] secondary_startup_64+0xa4/0xb0
<4> [604.892631]
-> #2 (pmus_lock){+.+.}:
<4> [604.892633] __mutex_lock+0x9a/0x9c0
<4> [604.892633] perf_event_init_cpu+0x6b/0x140
<4> [604.892635] cpuhp_invoke_callback+0x9b/0x9d0
<4> [604.892636] _cpu_up+0xa2/0x140
<4> [604.892637] do_cpu_up+0x61/0xa0
<4> [604.892639] smp_init+0x57/0x96
<4> [604.892639] kernel_init_freeable+0x87/0x1dc
<4> [604.892640] kernel_init+0x5/0x100
<4> [604.892642] ret_from_fork+0x24/0x50
<4> [604.892642]
-> #1 (cpu_hotplug_lock.rw_sem){++++}:
<4> [604.892643] cpus_read_lock+0x34/0xd0
<4> [604.892644] rcu_barrier+0xaa/0x190
<4> [604.892645] kernel_init+0x21/0x100
<4> [604.892647] ret_from_fork+0x24/0x50
<4> [604.892647]
-> #0 (rcu_state.barrier_mutex){+.+.}:
...
The last backtrace boils down to i915 driver code which holds the same
locks we are holding within mm->mmap_sem, and then ends up calling
rcu_barrier(). From what I can see i915 is just the messenger here,
any driver with this pattern of a lock held within mmap_sem which also
has a path of calling rcu_barrier while holding that lock should be
hitting this splat.
Two questions:

- This suggests that calling rcu_barrier() isn't ok anymore while
  holding mmap_sem, or anything that has a dependency upon mmap_sem. I
  guess that's not the idea, please confirm.

- Assuming this dependency is indeed not intended, where should the
  loop be broken? It goes through perf, cpuhotplug and rcu subsystems,
  and I don't have a clue about any of those.
I wonder what is new here; the 1-4 chain there has been true for a long
time, see also the comment at perf_event_ctx_lock_nested().
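
For readers following along, the chain in the splat can be written down as a
small "held while acquiring" graph. What follows is a minimal userspace sketch
under stated assumptions, not lockdep's actual implementation: the enum names,
struct lock_graph, splat_graph() and would_deadlock() are all hypothetical, and
GPU_LOCK stands in for the elided "gpu locks" line at the top of the splat.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Lock classes named in the splat; GPU_LOCK is a placeholder for the
 * elided i915 locks that nest inside mmap_sem. */
enum lock_class {
    BARRIER_MUTEX,      /* #0 rcu_state.barrier_mutex */
    CPU_HOTPLUG_LOCK,   /* #1 cpu_hotplug_lock.rw_sem */
    PMUS_LOCK,          /* #2 pmus_lock */
    CPUCTX_MUTEX,       /* #3 cpuctx_mutex */
    MMAP_SEM,           /* #4 mm->mmap_sem */
    GPU_LOCK,
    NR_LOCKS
};

/* edge[a][b] means: b has been acquired while a was held. */
struct lock_graph {
    bool edge[NR_LOCKS][NR_LOCKS];
};

/* Depth-first search: is 'to' reachable from 'from'? */
static bool reaches(const struct lock_graph *g, int from, int to, bool *seen)
{
    if (from == to)
        return true;
    if (seen[from])
        return false;
    seen[from] = true;
    for (int next = 0; next < NR_LOCKS; next++)
        if (g->edge[from][next] && reaches(g, next, to, seen))
            return true;
    return false;
}

/* Acquiring 'wanted' while holding 'held' closes a cycle if 'held' is
 * already reachable from 'wanted' through recorded dependencies. */
bool would_deadlock(const struct lock_graph *g, int held, int wanted)
{
    bool seen[NR_LOCKS] = { false };

    assert(held >= 0 && held < NR_LOCKS);
    assert(wanted >= 0 && wanted < NR_LOCKS);
    return reaches(g, wanted, held, seen);
}

/* Record the dependencies as read from the splat above. */
void splat_graph(struct lock_graph *g)
{
    memset(g, 0, sizeof(*g));
    g->edge[BARRIER_MUTEX][CPU_HOTPLUG_LOCK] = true; /* rcu_barrier() */
    g->edge[CPU_HOTPLUG_LOCK][PMUS_LOCK] = true;     /* cpuhp callback */
    g->edge[PMUS_LOCK][CPUCTX_MUTEX] = true;         /* perf_event_init_cpu() */
    g->edge[CPUCTX_MUTEX][MMAP_SEM] = true;          /* perf_read() copy_to_user */
    g->edge[MMAP_SEM][GPU_LOCK] = true;              /* gpu fault handlers */
}
```

With this graph, asking for BARRIER_MUTEX while holding GPU_LOCK is reported as
a cycle; clearing the CPUCTX_MUTEX -> MMAP_SEM edge leaves no path back, so the
same request no longer is.
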
That said; it _might_ be possible to break 3->4, that is, all the
copy_{to,from}_user() usage in perf can be lifted out from under the
various locks by re-arranging code, but I have a nagging feeling there
was more to it than that. Of course, while I did document the locking
rules, I seem to have forgotten to comment on exactly why these rules
are as they are.. oh well.
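
As a rough illustration of what "lifting the copy out from under the locks"
means, here is a userspace analogy only, not perf code: struct demo_event,
demo_copy_out() and demo_read() are hypothetical names, a pthread mutex stands
in for the perf context mutex, and memcpy() stands in for copy_to_user(). The
idea is to snapshot the values under the lock and do the possibly-faulting copy
after dropping it, so the copy never nests inside the mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an event; not the kernel's struct perf_event. */
struct demo_event {
    pthread_mutex_t lock;        /* plays the role of the context mutex */
    unsigned long long count;
    unsigned long long enabled;
};

/* Stand-in for copy_to_user(): in the kernel this can fault and take
 * mmap_sem. Like copy_to_user(), returns the number of bytes NOT copied. */
static size_t demo_copy_out(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    return 0;
}

/* Returns the number of bytes written, or -1 on error. */
long demo_read(struct demo_event *ev, void *ubuf, size_t len)
{
    unsigned long long snap[2];

    assert(ev != NULL && ubuf != NULL);
    if (len < sizeof(snap))
        return -1;

    /* Take a private snapshot while holding the lock... */
    pthread_mutex_lock(&ev->lock);
    snap[0] = ev->count;
    snap[1] = ev->enabled;
    pthread_mutex_unlock(&ev->lock);

    /* ...and only touch the destination buffer once the lock is dropped. */
    if (demo_copy_out(ubuf, snap, sizeof(snap)) != 0)
        return -1;
    return (long)sizeof(snap);
}
```

Whether the real perf paths can be rearranged this way is exactly the open
question above; the sketch only shows the shape of the change.
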
