On Sun, Jul 11, 2010 at 11:02:12AM +1000, Ben Skeggs wrote:
On Sun, 2010-07-11 at 01:24 +0200, Marcin Slusarz wrote:
Hi
Patch "drm/nouveau: use drm_mm in preference to custom code doing the same thing" in nouveau tree introduced new deadlock possibility, for which lockdep complains loudly:
(...)
Hey,
Thanks for the report, I'll look at this more during the week.
Deadlock scenario looks like this: CPU1 CPU2 nouveau code calls some drm_mm.c function which takes mm->unused_lock
nouveau_channel_free disables irqs and takes dev_priv->context_switch_lock calls nv50_graph_destroy_context which (... backtrace above) calls drm_mm_put_block which tries to take mm->unused_lock (spins)
nouveau interrupt raises
nouveau_irq_handler tries to take dev_priv->context_switch_lock (spins)
deadlock
It's important to note that the drm_mm referenced eventually by nv50_graph_destroy_context is per-channel on the card, so for the deadlock to happen it'd have to be multiple threads from a single process, one thread creating/destroying and object on the channel while another was trying to destroy the channel.
Possible solutions:
- reverting "drm/nouveau: use drm_mm in preference to custom code doing the same thing"
- disabling interrupts before calling drm_mm functions - unmaintainable and still deadlockable in multicard setups (nouveau and eg radeon)
Agreed it's unmaintainable, but, as mentioned above, the relevant locks can't be touched by radeon.
- making mm->unused_lock HARDIRQ-safe (patch below) - simple but with slight overhead
I'll look more during the week, there's other solutions to be explored.
So, did you find other solution?
Marcin