Marcin Slusarz marcin.slusarz@gmail.com writes:
On Sun, Jul 11, 2010 at 11:02:12AM +1000, Ben Skeggs wrote:
On Sun, 2010-07-11 at 01:24 +0200, Marcin Slusarz wrote:
Hi
Patch "drm/nouveau: use drm_mm in preference to custom code doing the same thing" in nouveau tree introduced new deadlock possibility, for which lockdep complains loudly:
(...)
Hey,
Thanks for the report, I'll look at this more during the week.
Deadlock scenario looks like this: CPU1 CPU2 nouveau code calls some drm_mm.c function which takes mm->unused_lock
nouveau_channel_free disables irqs and takes dev_priv->context_switch_lock calls nv50_graph_destroy_context which (... backtrace above) calls drm_mm_put_block which tries to take mm->unused_lock (spins)
nouveau interrupt raises
nouveau_irq_handler tries to take dev_priv->context_switch_lock (spins)
deadlock
It's important to note that the drm_mm referenced eventually by nv50_graph_destroy_context is per-channel on the card, so for the deadlock to happen it'd have to be multiple threads from a single process, one thread creating/destroying and object on the channel while another was trying to destroy the channel.
Yeah, and that situation is impossible ATM because those functions are called with the BKL held.
Possible solutions:
- reverting "drm/nouveau: use drm_mm in preference to custom code doing the same thing"
- disabling interrupts before calling drm_mm functions - unmaintainable and still deadlockable in multicard setups (nouveau and eg radeon)
Agreed it's unmaintainable, but, as mentioned above, the relevant locks can't be touched by radeon.
- making mm->unused_lock HARDIRQ-safe (patch below) - simple but with slight overhead
I'll look more during the week, there's other solutions to be explored.
So, did you find other solution?
Some random ideas:
- Make context_switch_lock HARDIRQ-unsafe. To avoid racing with the IRQ handler we'd have to disable interrupt dispatch on the card before taking context_switch_lock (i.e. at channel creation and destruction time), and the interrupt control registers would have to be protected with a IRQ safe spinlock.
- Split the current destroy_context() hooks in two halves, the first one would be in charge of the PFIFO/PGRAPH-poking tasks (e.g. disable_context()), and the second one would take care of releasing the allocated resources (and it wouldn't need locking).
Marcin
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel