On Thu, May 29, 2014 at 06:47:49AM +0900, Tetsuo Handa wrote:
Konrad Rzeszutek Wilk wrote:
On Sat, May 24, 2014 at 11:22:09PM +0900, Tetsuo Handa wrote:
Hello.
I tried to test whether it is OK (from the point of view of reentrancy) to use mutex_lock() or mutex_lock_killable() inside shrinker functions when those functions do memory allocation, because drivers/gpu/drm/ttm/ttm_page_alloc_dma.c does memory allocation with a mutex held inside ttm_dma_pool_shrink_scan().
If I compile the test module shown below, which mimics an extreme case of what ttm_dma_pool_shrink_scan() does,
And ttm_pool_shrink_scan.
I don't know why, but ttm_pool_shrink_scan() does not take the mutex.
and load the test module and do
# echo 3 > /proc/sys/vm/drop_caches
the system stalls with 0% CPU usage because of a mutex deadlock (with a prior lockdep warning).
Is this because wrong gfp flags are passed to kmalloc()? Is this because the test module's shrinker functions return wrong values? Is this because doing memory allocation with a mutex held inside shrinker functions is forbidden? Can anybody tell me what is wrong with my test module?
What is in sc->gfp_mask? What if you use GFP_ATOMIC?
I didn't check it, but I'm sure that at least the __GFP_WAIT bit is set. Thus, GFP_ATOMIC or GFP_NOWAIT will solve this problem.
Regarding the lockdep warning below:
Regards.
[ 48.077353]
[ 48.077999] =================================
[ 48.080023] [ INFO: inconsistent lock state ]
[ 48.080023] 3.15.0-rc6-00190-g1ee1cea #203 Tainted: G OE
[ 48.080023] ---------------------------------
[ 48.080023] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[ 48.086745] kswapd0/784 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 48.086745] (lock#2){+.+.?.}, at: [<e0861022>] shrink_test_count+0x12/0x60 [test]
[ 48.086745] {RECLAIM_FS-ON-W} state was registered at:
You have the scenario you described below, that is:
shrink_test_scan
  -> mutex_lock_killable()
  -> kmalloc()
     -> shrink_test_count
        -> mutex_lock_killable()
And 'mutex_lock_killable' is (in at least this context) the same as 'mutex_lock'. In other words, your second 'mutex_lock' is going to spin - which is a deadlock.
Perhaps a way of not getting into this scenario is:
1) Try to take the mutex (i.e. a trylock, which won't block if it can't get it).
2) Use GFP_ATOMIC in the shrinker so that we never end up calling ourselves in case of memory pressure?
Yes, I think so as well.
This patch changes "mutex_lock();" to "if (mutex_lock_killable()) return ...;" so that any thread can promptly give up. (By the way, as far as I tested, changing it to "if (!mutex_trylock()) return ...;" likely shortens the duration of the stall. Maybe we don't need to wait for the mutex if someone is already calling these functions.)
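(For illustration, a minimal sketch of that pattern; "pool_mutex" and "pool_shrink_count" are generic placeholder names, not the actual ttm_page_alloc_dma.c identifiers.)

static DEFINE_MUTEX(pool_mutex);

static unsigned long
pool_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
{
        unsigned long count = 0;

        /* Bail out instead of sleeping unconditionally; a thread that has
         * received a fatal signal can now give up promptly. */
        if (mutex_lock_killable(&pool_mutex))
                return 0;
        /* ... walk the pools and add their page counts to 'count' ... */
        mutex_unlock(&pool_mutex);
        return count;
}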
While discussing the XFS problem, a question occurred to me. Is it OK (from the point of view of reentrancy) to use mutex_lock() or mutex_lock_killable() inside a shrinker's entry point functions? Is the scenario shown below possible?
(1) kswapd is doing memory reclaim which does not need to hold the mutex.
(2) Someone in GFP_KERNEL context (not kswapd) calls ttm_dma_pool_shrink_count() and then calls ttm_dma_pool_shrink_scan() from the direct reclaim path.
(3) Inside ttm_dma_pool_shrink_scan(), a GFP_KERNEL allocation is issued while the mutex is held by that someone.
(4) The GFP_KERNEL allocation cannot be completed immediately due to memory pressure.
(5) kswapd calls ttm_dma_pool_shrink_count(), which needs to hold the mutex.
(6) Inside ttm_dma_pool_shrink_count(), kswapd is blocked waiting for the mutex held by that someone, and that someone is waiting for the GFP_KERNEL allocation to complete, but the GFP_KERNEL allocation cannot complete until the mutex held by that someone is released?
Ewww. Perhaps if we used GFP_ATOMIC for the array allocation we do in ttm_dma_page_pool_free() and ttm_page_pool_free()?
That would avoid problem (4).
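(As a rough sketch of that idea; the function and variable names below are illustrative, not the exact ttm_page_alloc*.c source:)

/* Sketch: free up to 'npages_to_free' pooled pages without re-entering reclaim. */
static unsigned long pool_free_some(unsigned int npages_to_free)
{
        struct page **pages_to_free;
        unsigned long freed = 0;

        /* GFP_ATOMIC drops __GFP_WAIT, so this allocation cannot recurse into
         * reclaim and re-enter the shrinker while the pool lock is held. */
        pages_to_free = kmalloc(npages_to_free * sizeof(struct page *),
                                GFP_ATOMIC);
        if (!pages_to_free)
                return 0;       /* nothing freed this time; harmless */

        /* ... collect up to npages_to_free pooled pages into the array,
         *     release them, and count them in 'freed' ... */

        kfree(pages_to_free);
        return freed;
}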
Right. Which approach ("use GFP_ATOMIC or GFP_NOWAIT" / "use !mutex_trylock()") do you prefer? I'll create an RHBZ entry for the RHEL7 kernel, as the non-count/scan version has the same problem.
I am not sure why you need an RHBZ - the patches that go upstream don't need one.
I think the combination of mutex_trylock and GFP_ATOMIC would suffice.
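(A minimal sketch of that combination in a scan callback; "pool_mutex" below is the same hypothetical lock as in the earlier sketch, not the actual TTM identifier:)

static DEFINE_MUTEX(pool_mutex);

static unsigned long
pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
{
        unsigned long freed = 0;
        struct page **array;

        /* 1) Never sleep on the pool lock from reclaim context. */
        if (!mutex_trylock(&pool_mutex))
                return SHRINK_STOP;

        /* 2) Never re-enter reclaim from inside the shrinker. */
        array = kmalloc(sc->nr_to_scan * sizeof(*array), GFP_ATOMIC);
        if (!array) {
                mutex_unlock(&pool_mutex);
                return SHRINK_STOP;
        }

        /* ... gather up to sc->nr_to_scan pooled pages into 'array', release
         *     them, and count them in 'freed' ... */

        kfree(array);
        mutex_unlock(&pool_mutex);
        return freed;
}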
Thank you!
---------- test.c start ----------
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/mm.h>

/*
 * Old-style .shrink callback: take a local mutex, then allocate with
 * sc->gfp_mask until allocation fails, mimicking a shrinker that does
 * memory allocation while holding its pool mutex.
 */
static int shrink_test(struct shrinker *shrinker, struct shrink_control *sc)
{
        static DEFINE_MUTEX(lock);
        LIST_HEAD(list);
        int i = 0;

        if (mutex_lock_killable(&lock)) {
                printk(KERN_WARNING "Process %u (%s) gave up waiting for mutex\n",
                       current->pid, current->comm);
                return 0;
        }
        /* Allocate as many pages as possible while holding the mutex. */
        while (1) {
                struct list_head *l = kmalloc(PAGE_SIZE, sc->gfp_mask);

                if (!l)
                        break;
                list_add_tail(l, &list);
                i++;
        }
        printk(KERN_WARNING "Process %u (%s) allocated %u pages\n",
               current->pid, current->comm, i);
        /* Free everything again before dropping the mutex. */
        while (i--) {
                struct list_head *l = list.next;

                list_del(l);
                kfree(l);
        }
        mutex_unlock(&lock);
        return 0;
}

static struct shrinker recursive_shrinker = {
        .shrink = shrink_test,
        .seeks = DEFAULT_SEEKS,
};

static int __init recursive_shrinker_init(void)
{
        register_shrinker(&recursive_shrinker);
        return 0;
}

module_init(recursive_shrinker_init);
MODULE_LICENSE("GPL");
---------- test.c end ----------
[ 1263.179725]
[ 1263.180756] =================================
[ 1263.182322] [ INFO: inconsistent lock state ]
[ 1263.183920] 3.10.0-121.el7.x86_64.debug #1 Tainted: GF O--------------
[ 1263.186162] ---------------------------------
[ 1263.187742] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[ 1263.189788] kswapd0/105 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 1263.191523] (lock#3){+.+.?.}, at: [<ffffffffa0563040>] shrink_test+0x40/0x140 [test]
[ 1263.194053] {RECLAIM_FS-ON-W} state was registered at:
[ 1263.195848] [<ffffffff810ea759>] mark_held_locks+0xb9/0x140
[ 1263.197758] [<ffffffff810ecb6a>] lockdep_trace_alloc+0x7a/0xe0
[ 1263.199718] [<ffffffff811db9d3>] kmem_cache_alloc_trace+0x33/0x340
[ 1263.201809] [<ffffffffa0563061>] shrink_test+0x61/0x140 [test]
[ 1263.203662] [<ffffffff81194a99>] shrink_slab+0xb9/0x4d0
[ 1263.205378] [<ffffffff81265403>] drop_caches_sysctl_handler+0xc3/0x120
[ 1263.207352] [<ffffffff8127dab4>] proc_sys_call_handler+0xe4/0x110
[ 1263.209238] [<ffffffff8127daf4>] proc_sys_write+0x14/0x20
[ 1263.210972] [<ffffffff811fd1a0>] vfs_write+0xc0/0x1f0
[ 1263.212658] [<ffffffff811fdc1b>] SyS_write+0x5b/0xb0
[ 1263.214301] [<ffffffff816bd899>] system_call_fastpath+0x16/0x1b
[ 1263.216172] irq event stamp: 37
[ 1263.217406] hardirqs last enabled at (37): [<ffffffff816b2f9c>] _raw_spin_unlock_irq+0x2c/0x50
[ 1263.219753] hardirqs last disabled at (36): [<ffffffff816b2dff>] _raw_spin_lock_irq+0x1f/0x90
[ 1263.222052] softirqs last enabled at (0): [<ffffffff8106aa25>] copy_process.part.22+0x665/0x1750
[ 1263.224414] softirqs last disabled at (0): [< (null)>] (null)
[ 1263.226492]
[ 1263.226492] other info that might help us debug this:
[ 1263.228920] Possible unsafe locking scenario:
[ 1263.228920]
[ 1263.231192] CPU0
[ 1263.232223] ----
[ 1263.233280] lock(lock#3);
[ 1263.234435] <Interrupt>
[ 1263.235489] lock(lock#3);
[ 1263.236708]
[ 1263.236708] *** DEADLOCK ***
[ 1263.236708]
[ 1263.239358] 1 lock held by kswapd0/105:
[ 1263.240593] #0: (shrinker_rwsem){++++.+}, at: [<ffffffff81194a1c>] shrink_slab+0x3c/0x4d0
[ 1263.242894]
[ 1263.242894] stack backtrace:
[ 1263.244792] CPU: 1 PID: 105 Comm: kswapd0 Tainted: GF O-------------- 3.10.0-121.el7.x86_64.debug #1
[ 1263.247230] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 1263.249747] ffff880036708000 000000004c6ef89a ffff8800367039c8 ffffffff816a981c
[ 1263.251849] ffff880036703a18 ffffffff816a3ac5 0000000000000000 ffff880000000001
[ 1263.253956] ffffffff00000001 000000000000000a ffff880036708000 ffffffff810e88a0
[ 1263.256314] Call Trace:
[ 1263.257365] [<ffffffff816a981c>] dump_stack+0x19/0x1b
[ 1263.258921] [<ffffffff816a3ac5>] print_usage_bug+0x1f7/0x208
[ 1263.260591] [<ffffffff810e88a0>] ? check_usage_backwards+0x1b0/0x1b0
[ 1263.262379] [<ffffffff810ea61d>] mark_lock+0x21d/0x2a0
[ 1263.263898] [<ffffffff810eb30a>] __lock_acquire+0x52a/0xb60
[ 1263.265562] [<ffffffff810232c9>] ? sched_clock+0x9/0x10
[ 1263.267148] [<ffffffff810b7c75>] ? sched_clock_cpu+0xb5/0x100
[ 1263.268802] [<ffffffff810ec132>] lock_acquire+0xa2/0x1f0
[ 1263.270378] [<ffffffffa0563040>] ? shrink_test+0x40/0x140 [test]
[ 1263.272072] [<ffffffff816ae859>] mutex_lock_killable_nested+0x99/0x5d0
[ 1263.273900] [<ffffffffa0563040>] ? shrink_test+0x40/0x140 [test]
[ 1263.275610] [<ffffffffa0563040>] ? shrink_test+0x40/0x140 [test]
[ 1263.277305] [<ffffffffa0563040>] shrink_test+0x40/0x140 [test]
[ 1263.278970] [<ffffffff81194a99>] shrink_slab+0xb9/0x4d0
[ 1263.280501] [<ffffffff811991b9>] balance_pgdat+0x4e9/0x620
[ 1263.282135] [<ffffffff811994a3>] kswapd+0x1b3/0x640
[ 1263.283604] [<ffffffff8109f3c0>] ? wake_up_bit+0x30/0x30
[ 1263.285166] [<ffffffff811992f0>] ? balance_pgdat+0x620/0x620
[ 1263.286798] [<ffffffff8109e0cd>] kthread+0xed/0x100
[ 1263.288286] [<ffffffff8109dfe0>] ? insert_kthread_work+0x80/0x80
[ 1263.289973] [<ffffffff816bd7ec>] ret_from_fork+0x7c/0xb0
[ 1263.291535] [<ffffffff8109dfe0>] ? insert_kthread_work+0x80/0x80