On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
Fix a race condition involving 4 threads and 2 ww_mutexes as indicated in the following example. Acquire context stamps are ordered like the thread numbers, i.e. thread #1 should back off when it encounters a mutex locked by thread #0 etc.
Thread #0    Thread #1    Thread #2    Thread #3
                                       lock(ww)
                                       success
             lock(ww')
             success
                          lock(ww)
             lock(ww)        .
                .            .         unlock(ww) part 1
lock(ww)        .            .            .
success         .            .
   .            .            .         unlock(ww) part 2
                .         back off
lock(ww')       .
   .            .
(stuck)      (stuck)
Here, unlock(ww) part 1 is the part that sets lock->base.count to 1 (without being protected by lock->base.wait_lock), meaning that thread #0 can acquire ww in the fast path or, much more likely, the medium path in mutex_optimistic_spin. Since lock->base.count == 0, thread #0 then won't wake up any of the waiters in ww_mutex_set_context_fastpath.
Then, unlock(ww) part 2 wakes up _only_ the first waiter of ww. This is thread #2, since waiters are added at the tail. Thread #2 wakes up and backs off since it sees ww owned by a context with a lower stamp.
Meanwhile, thread #1 is never woken up, and so it won't back off its lock on ww'. So thread #0 gets stuck waiting for ww' to be released.
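For reference, here is a condensed sketch of the pre-patch unlock slowpath (paraphrased from kernel/locking/mutex.c at this series' baseline; debug and optimistic-spin details are elided, so this is not the literal kernel code). The comments mark the two parts referred to above:

  static inline void
  __mutex_unlock_common_slowpath(struct mutex *lock, int nested)
  {
          unsigned long flags;
          WAKE_Q(wake_q);

          /*
           * Part 1: release the lock word. This happens outside
           * lock->wait_lock, so a fast-path or spinning acquirer can
           * take the mutex before any waiter is woken.
           */
          if (__mutex_slowpath_needs_to_unlock())
                  atomic_set(&lock->count, 1);

          spin_lock_mutex(&lock->wait_lock, flags);
          mutex_release(&lock->dep_map, nested, _RET_IP_);
          debug_mutex_unlock(lock);

          /* Part 2: wake up only the first waiter on the list. */
          if (!list_empty(&lock->wait_list)) {
                  struct mutex_waiter *waiter =
                          list_entry(lock->wait_list.next,
                                     struct mutex_waiter, list);

                  debug_mutex_wake_waiter(lock, waiter);
                  wake_q_add(&wake_q, waiter->task);
          }

          spin_unlock_mutex(&lock->wait_lock, flags);
          wake_up_q(&wake_q);
  }

If that single woken waiter backs off instead of taking the lock, nobody ever wakes the remaining waiters.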
This patch fixes the deadlock by waking up all waiters in the slow path of ww_mutex_unlock.
We have an internal test case for amdgpu which continuously submits command streams from tens of threads, where all command streams reference hundreds of GPU buffer objects with a lot of overlap in the buffer lists between command streams. This test reliably caused a deadlock, and while I haven't completely confirmed that it is exactly the scenario outlined above, this patch does fix the test case.
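For context, the ww_mutex deadlock-avoidance scheme relies on every blocked acquirer eventually observing -EDEADLK so it can drop its other locks. A rough sketch of the usual acquire/backoff loop, following Documentation/locking/ww-mutex-design.txt (the buffer-array shape and the names bo_ww_class/lock_bos are invented for illustration; the API calls are the standard ones):

  #include <linux/ww_mutex.h>

  static DEFINE_WW_CLASS(bo_ww_class);

  /* Lock all buffer objects of one command stream, in list order. */
  static int lock_bos(struct ww_mutex *bo[], int count)
  {
          struct ww_acquire_ctx ctx;
          struct ww_mutex *contended = NULL;
          int i, ret;

          ww_acquire_init(&ctx, &bo_ww_class);
  retry:
          for (i = 0; i < count; i++) {
                  if (bo[i] == contended) {
                          /* Already taken via ww_mutex_lock_slow() below. */
                          contended = NULL;
                          continue;
                  }
                  ret = ww_mutex_lock(bo[i], &ctx);
                  if (ret) {
                          struct ww_mutex *busy = bo[i];

                          /* Drop everything we hold so far. */
                          while (i--)
                                  ww_mutex_unlock(bo[i]);
                          if (contended)
                                  ww_mutex_unlock(contended);
                          contended = NULL;
                          if (ret == -EDEADLK) {
                                  /*
                                   * Back off: sleep on the contended lock,
                                   * then start over. A blocked waiter only
                                   * gets here after it has been woken up.
                                   */
                                  ww_mutex_lock_slow(busy, &ctx);
                                  contended = busy;
                                  goto retry;
                          }
                          ww_acquire_fini(&ctx);
                          return ret;
                  }
          }
          ww_acquire_done(&ctx);
          return 0;       /* caller unlocks and calls ww_acquire_fini() */
  }

Thread #1 in the example above is parked inside ww_mutex_lock(); without a wakeup it can never return -EDEADLK, so it never reaches the unwind that would release ww'.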
v2:
- use wake_q_add
- add additional explanations
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: dri-devel@lists.freedesktop.org
Cc: stable@vger.kernel.org
Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Yeah, when the owning ctx changes we need to wake up all waiters, to make sure we catch all (new) deadlock scenarios. And I tried poking at your example, and I think it's solid and can't be minimized any further. I don't have much clue on mutex.c code itself, but the changes seem reasonable. With that caveat:
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cheers, Daniel
 kernel/locking/mutex.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index a70b90d..7fbf9b4 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -409,6 +409,9 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 __visible __used noinline
 void __sched __mutex_unlock_slowpath(atomic_t *lock_count);
 
+static __used noinline
+void __sched __mutex_unlock_slowpath_wakeall(atomic_t *lock_count);
+
 /**
  * mutex_unlock - release the mutex
  * @lock: the mutex to be released
@@ -473,7 +476,14 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
 	 */
 	mutex_clear_owner(&lock->base);
 #endif
-	__mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
+	/*
+	 * A previously _not_ waiting task may acquire the lock via the fast
+	 * path during our unlock. In that case, already waiting tasks may have
+	 * to back off to avoid a deadlock. Wake up all waiters so that they
+	 * can check their acquire context stamp against the new owner.
+	 */
+	__mutex_fastpath_unlock(&lock->base.count,
+				__mutex_unlock_slowpath_wakeall);
 }
 EXPORT_SYMBOL(ww_mutex_unlock);
 
@@ -716,7 +726,7 @@ EXPORT_SYMBOL_GPL(__ww_mutex_lock_interruptible);
  * Release the lock, slowpath:
  */
 static inline void
-__mutex_unlock_common_slowpath(struct mutex *lock, int nested)
+__mutex_unlock_common_slowpath(struct mutex *lock, int nested, int wake_all)
 {
 	unsigned long flags;
 	WAKE_Q(wake_q);
@@ -740,7 +750,14 @@ __mutex_unlock_common_slowpath(struct mutex *lock, int nested)
 	mutex_release(&lock->dep_map, nested, _RET_IP_);
 	debug_mutex_unlock(lock);
 
-	if (!list_empty(&lock->wait_list)) {
+	if (wake_all) {
+		struct mutex_waiter *waiter;
+
+		list_for_each_entry(waiter, &lock->wait_list, list) {
+			debug_mutex_wake_waiter(lock, waiter);
+			wake_q_add(&wake_q, waiter->task);
+		}
+	} else if (!list_empty(&lock->wait_list)) {
 		/* get the first entry from the wait-list: */
 		struct mutex_waiter *waiter =
 			   list_entry(lock->wait_list.next,
@@ -762,7 +779,15 @@ __mutex_unlock_slowpath(atomic_t *lock_count)
 {
 	struct mutex *lock = container_of(lock_count, struct mutex, count);
 
-	__mutex_unlock_common_slowpath(lock, 1);
+	__mutex_unlock_common_slowpath(lock, 1, 0);
+}
+
+static void
+__mutex_unlock_slowpath_wakeall(atomic_t *lock_count)
+{
+	struct mutex *lock = container_of(lock_count, struct mutex, count);
+
+	__mutex_unlock_common_slowpath(lock, 1, 1);
 }
 
 #ifndef CONFIG_DEBUG_LOCK_ALLOC
-- 
2.7.4