So, I was looking at the below lockdep splat, and discussing it a bit w/ sboyd on IRC, and came to a slightly disturbing realization..
The interaction between prepare_lock and debugfs bits is a little bit worrying. In particular, it is probably not a good idea to assume that anyone who needs to grab prepare_lock does not already hold mmap_sem. Not holding mmap_sem or locks that interact w/ mmap_sem is going to be pretty hard to avoid, at least for gpu drivers that are using iommus that are using CCF ;-)
BR, -R
----------
======================================================
[ INFO: possible circular locking dependency detected ]
3.17.0-rc1-00050-g07a489b #802 Tainted: G        W
-------------------------------------------------------
Xorg.bin/5413 is trying to acquire lock:
 (prepare_lock){+.+.+.}, at: [<c0781280>] clk_prepare_lock+0x88/0xfc

but task is already holding lock:
 (qcom_iommu_lock){+.+...}, at: [<c079f664>] qcom_iommu_unmap+0x1c/0x1f0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (qcom_iommu_lock){+.+...}:
   [<c079f860>] qcom_iommu_map+0x28/0x450
   [<c079eb50>] iommu_map+0xc8/0x12c
   [<c056c1fc>] msm_iommu_map+0xb4/0x130
   [<c05697bc>] msm_gem_get_iova_locked+0x9c/0xe8
   [<c0569854>] msm_gem_get_iova+0x4c/0x64
   [<c0562208>] mdp4_kms_init+0x4c4/0x6c0
   [<c056881c>] msm_load+0x2ac/0x34c
   [<c0545724>] drm_dev_register+0xac/0x108
   [<c0547510>] drm_platform_init+0x50/0xf0
   [<c0578a60>] try_to_bring_up_master.part.3+0xc8/0x108
   [<c0578b48>] component_master_add_with_match+0xa8/0x104
   [<c0568294>] msm_pdev_probe+0x64/0x70
   [<c057e704>] platform_drv_probe+0x2c/0x60
   [<c057cff8>] driver_probe_device+0x108/0x234
   [<c057b65c>] bus_for_each_drv+0x64/0x98
   [<c057cec0>] device_attach+0x78/0x8c
   [<c057c590>] bus_probe_device+0x88/0xac
   [<c057c9b8>] deferred_probe_work_func+0x68/0x9c
   [<c0259db4>] process_one_work+0x1a0/0x40c
   [<c025a710>] worker_thread+0x44/0x4d8
   [<c025ec54>] kthread+0xd8/0xec
   [<c020e9a8>] ret_from_fork+0x14/0x2c

-> #3 (&dev->struct_mutex){+.+.+.}:
   [<c0541188>] drm_gem_mmap+0x38/0xd0
   [<c05695b8>] msm_gem_mmap+0xc/0x5c
   [<c02f0b6c>] mmap_region+0x35c/0x6c8
   [<c02f11ec>] do_mmap_pgoff+0x314/0x398
   [<c02de1e0>] vm_mmap_pgoff+0x84/0xb4
   [<c02ef83c>] SyS_mmap_pgoff+0x94/0xbc
   [<c020e8e0>] ret_fast_syscall+0x0/0x48

-> #2 (&mm->mmap_sem){++++++}:
   [<c0321138>] filldir64+0x68/0x180
   [<c0333fe0>] dcache_readdir+0x188/0x22c
   [<c0320ed0>] iterate_dir+0x9c/0x11c
   [<c03213b0>] SyS_getdents64+0x78/0xe8
   [<c020e8e0>] ret_fast_syscall+0x0/0x48

-> #1 (&sb->s_type->i_mutex_key#3){+.+.+.}:
   [<c03fc544>] __create_file+0x58/0x1dc
   [<c03fc70c>] debugfs_create_dir+0x1c/0x24
   [<c0781c7c>] clk_debug_create_subtree+0x20/0x170
   [<c0be2af8>] clk_debug_init+0xec/0x14c
   [<c0208c70>] do_one_initcall+0x8c/0x1c8
   [<c0b9cce4>] kernel_init_freeable+0x13c/0x1dc
   [<c0877bc4>] kernel_init+0x8/0xe8
   [<c020e9a8>] ret_from_fork+0x14/0x2c

-> #0 (prepare_lock){+.+.+.}:
   [<c087c408>] mutex_lock_nested+0x70/0x3e8
   [<c0781280>] clk_prepare_lock+0x88/0xfc
   [<c0782c50>] clk_prepare+0xc/0x24
   [<c079f474>] __enable_clocks.isra.4+0x18/0xa4
   [<c079f614>] __flush_iotlb_va+0xe0/0x114
   [<c079f6f4>] qcom_iommu_unmap+0xac/0x1f0
   [<c079ea3c>] iommu_unmap+0x9c/0xe8
   [<c056c2fc>] msm_iommu_unmap+0x64/0x84
   [<c0569da4>] msm_gem_free_object+0x11c/0x338
   [<c05413ec>] drm_gem_object_handle_unreference_unlocked+0xfc/0x130
   [<c0541604>] drm_gem_object_release_handle+0x50/0x68
   [<c0447a98>] idr_for_each+0xa8/0xdc
   [<c0541c10>] drm_gem_release+0x1c/0x28
   [<c0540b3c>] drm_release+0x370/0x428
   [<c031105c>] __fput+0x98/0x1e8
   [<c025d73c>] task_work_run+0xb0/0xfc
   [<c02477ec>] do_exit+0x2ec/0x948
   [<c0247ec0>] do_group_exit+0x4c/0xb8
   [<c025180c>] get_signal+0x28c/0x6ac
   [<c0211204>] do_signal+0xc4/0x3e4
   [<c02116cc>] do_work_pending+0xb4/0xc4
   [<c020e938>] work_pending+0xc/0x20

other info that might help us debug this:

Chain exists of:
  prepare_lock --> &dev->struct_mutex --> qcom_iommu_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(qcom_iommu_lock);
                               lock(&dev->struct_mutex);
                               lock(qcom_iommu_lock);
  lock(prepare_lock);

 *** DEADLOCK ***

3 locks held by Xorg.bin/5413:
 #0:  (drm_global_mutex){+.+.+.}, at: [<c0540800>] drm_release+0x34/0x428
 #1:  (&dev->struct_mutex){+.+.+.}, at: [<c05413bc>] drm_gem_object_handle_unreference_unlocked+0xcc/0x130
 #2:  (qcom_iommu_lock){+.+...}, at: [<c079f664>] qcom_iommu_unmap+0x1c/0x1f0

stack backtrace:
CPU: 1 PID: 5413 Comm: Xorg.bin Tainted: G        W      3.17.0-rc1-00050-g07a489b #802
[<c0216290>] (unwind_backtrace) from [<c0211d8c>] (show_stack+0x10/0x14)
[<c0211d8c>] (show_stack) from [<c087a078>] (dump_stack+0x98/0xb8)
[<c087a078>] (dump_stack) from [<c027f024>] (print_circular_bug+0x218/0x340)
[<c027f024>] (print_circular_bug) from [<c0283e08>] (__lock_acquire+0x1d24/0x20b8)
[<c0283e08>] (__lock_acquire) from [<c0284774>] (lock_acquire+0x9c/0xbc)
[<c0284774>] (lock_acquire) from [<c087c408>] (mutex_lock_nested+0x70/0x3e8)
[<c087c408>] (mutex_lock_nested) from [<c0781280>] (clk_prepare_lock+0x88/0xfc)
[<c0781280>] (clk_prepare_lock) from [<c0782c50>] (clk_prepare+0xc/0x24)
[<c0782c50>] (clk_prepare) from [<c079f474>] (__enable_clocks.isra.4+0x18/0xa4)
[<c079f474>] (__enable_clocks.isra.4) from [<c079f614>] (__flush_iotlb_va+0xe0/0x114)
[<c079f614>] (__flush_iotlb_va) from [<c079f6f4>] (qcom_iommu_unmap+0xac/0x1f0)
[<c079f6f4>] (qcom_iommu_unmap) from [<c079ea3c>] (iommu_unmap+0x9c/0xe8)
[<c079ea3c>] (iommu_unmap) from [<c056c2fc>] (msm_iommu_unmap+0x64/0x84)
[<c056c2fc>] (msm_iommu_unmap) from [<c0569da4>] (msm_gem_free_object+0x11c/0x338)
[<c0569da4>] (msm_gem_free_object) from [<c05413ec>] (drm_gem_object_handle_unreference_unlocked+0xfc/0x130)
[<c05413ec>] (drm_gem_object_handle_unreference_unlocked) from [<c0541604>] (drm_gem_object_release_handle+0x50/0x68)
[<c0541604>] (drm_gem_object_release_handle) from [<c0447a98>] (idr_for_each+0xa8/0xdc)
[<c0447a98>] (idr_for_each) from [<c0541c10>] (drm_gem_release+0x1c/0x28)
[<c0541c10>] (drm_gem_release) from [<c0540b3c>] (drm_release+0x370/0x428)
[<c0540b3c>] (drm_release) from [<c031105c>] (__fput+0x98/0x1e8)
[<c031105c>] (__fput) from [<c025d73c>] (task_work_run+0xb0/0xfc)
[<c025d73c>] (task_work_run) from [<c02477ec>] (do_exit+0x2ec/0x948)
[<c02477ec>] (do_exit) from [<c0247ec0>] (do_group_exit+0x4c/0xb8)
[<c0247ec0>] (do_group_exit) from [<c025180c>] (get_signal+0x28c/0x6ac)
[<c025180c>] (get_signal) from [<c0211204>] (do_signal+0xc4/0x3e4)
[<c0211204>] (do_signal) from [<c02116cc>] (do_work_pending+0xb4/0xc4)
[<c02116cc>] (do_work_pending) from [<c020e938>] (work_pending+0xc/0x20)
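Boiled down, the two paths that invert the lock order look roughly like this (an illustrative sketch only -- the lock and function names are taken from the splat above, the bodies are not real kernel code):

/* This task (Xorg.bin): GEM teardown takes the iommu driver's lock and
 * then ends up in clk_prepare(), which takes the global prepare_lock. */
static void gem_free_path(struct clk *clk)
{
        mutex_lock(&qcom_iommu_lock);   /* qcom_iommu_unmap() */
        clk_prepare(clk);               /* __flush_iotlb_va() -> __enable_clocks() */
        mutex_unlock(&qcom_iommu_lock);
}

/* Recorded earlier: prepare_lock was chained to qcom_iommu_lock in the
 * opposite direction, via debugfs file creation (i_mutex, chain #1),
 * readdir (mmap_sem, #2), drm_gem_mmap (struct_mutex, #3) and
 * msm_iommu_map (#4). Two tasks running the two directions concurrently
 * can deadlock even though no single task takes the locks in both orders. */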
On 09/04/14 17:46, Rob Clark wrote:
So, I was looking at the below lockdep splat, and discussing it a bit w/ sboyd on IRC, and came to a slightly disturbing realization..
The interaction between prepare_lock and debugfs bits is a little bit worrying. In particular, it is probably not a good idea to assume that anyone who needs to grab prepare_lock does not already hold mmap_sem. Not holding mmap_sem or locks that interact w/ mmap_sem is going to be pretty hard to avoid, at least for gpu drivers that are using iommus that are using CCF ;-)
I'm thinking one way to fix this is to replace the tree traversal for debugfs registration with a list iteration of all registered clocks. That way we don't hold the prepare mutex across debugfs directory/file creation. This should break the chain.
Now that debugfs isn't a hierarchy, this becomes a lot easier: we just need to keep a linked list of all the clocks that are registered. I already have that patch for my ww_mutex series, but I didn't convert debugfs to use it. Two patches to follow.
In the near future we're going to move the prepare lock to be a per-clock ww_mutex. __clk_lookup() is called very deep in the set-rate path and we would like to avoid having to take all the locks in the clock tree to search for a clock (basically defeating the purpose of introducing per-clock locks). Introduce a new list that contains all clocks registered in the system and walk this list until the clock is found.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
Yeah this commit text could be updated and/or this could be squashed into the next patch.
 drivers/clk/clk.c           | 52 +++++++++++++++++----------------------------
 include/linux/clk-private.h |  1 +
 2 files changed, 21 insertions(+), 32 deletions(-)
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index b76fa69b44cb..cf5df744cb21 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -33,8 +33,10 @@ static struct task_struct *enable_owner;
 static int prepare_refcnt;
 static int enable_refcnt;

+static DEFINE_MUTEX(clk_lookup_lock);
 static HLIST_HEAD(clk_root_list);
 static HLIST_HEAD(clk_orphan_list);
+static HLIST_HEAD(clk_lookup_list);
 static LIST_HEAD(clk_notifier_list);

 /***           locking             ***/
@@ -670,46 +672,23 @@ out:
 }
 EXPORT_SYMBOL_GPL(__clk_is_enabled);

-static struct clk *__clk_lookup_subtree(const char *name, struct clk *clk)
-{
-        struct clk *child;
-        struct clk *ret;
-
-        if (!strcmp(clk->name, name))
-                return clk;
-
-        hlist_for_each_entry(child, &clk->children, child_node) {
-                ret = __clk_lookup_subtree(name, child);
-                if (ret)
-                        return ret;
-        }
-
-        return NULL;
-}
-
 struct clk *__clk_lookup(const char *name)
 {
-        struct clk *root_clk;
-        struct clk *ret;
+        struct clk *clk;

         if (!name)
                 return NULL;

-        /* search the 'proper' clk tree first */
-        hlist_for_each_entry(root_clk, &clk_root_list, child_node) {
-                ret = __clk_lookup_subtree(name, root_clk);
-                if (ret)
-                        return ret;
+        mutex_lock(&clk_lookup_lock);
+        hlist_for_each_entry(clk, &clk_lookup_list, lookup_node) {
+                if (!strcmp(clk->name, name))
+                        goto found;
         }
+        clk = NULL;
+found:
+        mutex_unlock(&clk_lookup_lock);

-        /* if not found, then search the orphan tree */
-        hlist_for_each_entry(root_clk, &clk_orphan_list, child_node) {
-                ret = __clk_lookup_subtree(name, root_clk);
-                if (ret)
-                        return ret;
-        }
-
-        return NULL;
+        return clk;
 }

 /*
@@ -1823,6 +1802,11 @@ int __clk_init(struct device *dev, struct clk *clk)

         clk->parent = __clk_init_parent(clk);

+        /* Insert into clock lookup list */
+        mutex_lock(&clk_lookup_lock);
+        hlist_add_head(&clk->lookup_node, &clk_lookup_list);
+        mutex_unlock(&clk_lookup_lock);
+
         /*
          * Populate clk->parent if parent has already been __clk_init'd. If
          * parent has not yet been __clk_init'd then place clk in the orphan
@@ -2117,6 +2101,10 @@ void clk_unregister(struct clk *clk)

         hlist_del_init(&clk->child_node);

+        mutex_lock(&clk_lookup_lock);
+        hlist_del_init(&clk->lookup_node);
+        mutex_unlock(&clk_lookup_lock);
+
         if (clk->prepare_count)
                 pr_warn("%s: unregistering prepared clock: %s\n",
                                         __func__, clk->name);
diff --git a/include/linux/clk-private.h b/include/linux/clk-private.h
index efbf70b9fd84..3cd98a930006 100644
--- a/include/linux/clk-private.h
+++ b/include/linux/clk-private.h
@@ -48,6 +48,7 @@ struct clk {
         unsigned long           accuracy;
         struct hlist_head       children;
         struct hlist_node       child_node;
+        struct hlist_node       lookup_node;
         unsigned int            notifier_count;
 #ifdef CONFIG_DEBUG_FS
         struct dentry           *dentry;
Rob Clark reports a lockdep splat that involves the prepare_lock chained with the mmap semaphore.
======================================================
[ INFO: possible circular locking dependency detected ]
3.17.0-rc1-00050-g07a489b #802 Tainted: G        W
-------------------------------------------------------
Xorg.bin/5413 is trying to acquire lock:
 (prepare_lock){+.+.+.}, at: [<c0781280>] clk_prepare_lock+0x88/0xfc

but task is already holding lock:
 (qcom_iommu_lock){+.+...}, at: [<c079f664>] qcom_iommu_unmap+0x1c/0x1f0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #4 (qcom_iommu_lock){+.+...}:
   [<c079f860>] qcom_iommu_map+0x28/0x450
   [<c079eb50>] iommu_map+0xc8/0x12c
   [<c056c1fc>] msm_iommu_map+0xb4/0x130
   [<c05697bc>] msm_gem_get_iova_locked+0x9c/0xe8
   [<c0569854>] msm_gem_get_iova+0x4c/0x64
   [<c0562208>] mdp4_kms_init+0x4c4/0x6c0
   [<c056881c>] msm_load+0x2ac/0x34c
   [<c0545724>] drm_dev_register+0xac/0x108
   [<c0547510>] drm_platform_init+0x50/0xf0
   [<c0578a60>] try_to_bring_up_master.part.3+0xc8/0x108
   [<c0578b48>] component_master_add_with_match+0xa8/0x104
   [<c0568294>] msm_pdev_probe+0x64/0x70
   [<c057e704>] platform_drv_probe+0x2c/0x60
   [<c057cff8>] driver_probe_device+0x108/0x234
   [<c057b65c>] bus_for_each_drv+0x64/0x98
   [<c057cec0>] device_attach+0x78/0x8c
   [<c057c590>] bus_probe_device+0x88/0xac
   [<c057c9b8>] deferred_probe_work_func+0x68/0x9c
   [<c0259db4>] process_one_work+0x1a0/0x40c
   [<c025a710>] worker_thread+0x44/0x4d8
   [<c025ec54>] kthread+0xd8/0xec
   [<c020e9a8>] ret_from_fork+0x14/0x2c

-> #3 (&dev->struct_mutex){+.+.+.}:
   [<c0541188>] drm_gem_mmap+0x38/0xd0
   [<c05695b8>] msm_gem_mmap+0xc/0x5c
   [<c02f0b6c>] mmap_region+0x35c/0x6c8
   [<c02f11ec>] do_mmap_pgoff+0x314/0x398
   [<c02de1e0>] vm_mmap_pgoff+0x84/0xb4
   [<c02ef83c>] SyS_mmap_pgoff+0x94/0xbc
   [<c020e8e0>] ret_fast_syscall+0x0/0x48

-> #2 (&mm->mmap_sem){++++++}:
   [<c0321138>] filldir64+0x68/0x180
   [<c0333fe0>] dcache_readdir+0x188/0x22c
   [<c0320ed0>] iterate_dir+0x9c/0x11c
   [<c03213b0>] SyS_getdents64+0x78/0xe8
   [<c020e8e0>] ret_fast_syscall+0x0/0x48

-> #1 (&sb->s_type->i_mutex_key#3){+.+.+.}:
   [<c03fc544>] __create_file+0x58/0x1dc
   [<c03fc70c>] debugfs_create_dir+0x1c/0x24
   [<c0781c7c>] clk_debug_create_subtree+0x20/0x170
   [<c0be2af8>] clk_debug_init+0xec/0x14c
   [<c0208c70>] do_one_initcall+0x8c/0x1c8
   [<c0b9cce4>] kernel_init_freeable+0x13c/0x1dc
   [<c0877bc4>] kernel_init+0x8/0xe8
   [<c020e9a8>] ret_from_fork+0x14/0x2c

-> #0 (prepare_lock){+.+.+.}:
   [<c087c408>] mutex_lock_nested+0x70/0x3e8
   [<c0781280>] clk_prepare_lock+0x88/0xfc
   [<c0782c50>] clk_prepare+0xc/0x24
   [<c079f474>] __enable_clocks.isra.4+0x18/0xa4
   [<c079f614>] __flush_iotlb_va+0xe0/0x114
   [<c079f6f4>] qcom_iommu_unmap+0xac/0x1f0
   [<c079ea3c>] iommu_unmap+0x9c/0xe8
   [<c056c2fc>] msm_iommu_unmap+0x64/0x84
   [<c0569da4>] msm_gem_free_object+0x11c/0x338
   [<c05413ec>] drm_gem_object_handle_unreference_unlocked+0xfc/0x130
   [<c0541604>] drm_gem_object_release_handle+0x50/0x68
   [<c0447a98>] idr_for_each+0xa8/0xdc
   [<c0541c10>] drm_gem_release+0x1c/0x28
   [<c0540b3c>] drm_release+0x370/0x428
   [<c031105c>] __fput+0x98/0x1e8
   [<c025d73c>] task_work_run+0xb0/0xfc
   [<c02477ec>] do_exit+0x2ec/0x948
   [<c0247ec0>] do_group_exit+0x4c/0xb8
   [<c025180c>] get_signal+0x28c/0x6ac
   [<c0211204>] do_signal+0xc4/0x3e4
   [<c02116cc>] do_work_pending+0xb4/0xc4
   [<c020e938>] work_pending+0xc/0x20
other info that might help us debug this:
Chain exists of:
  prepare_lock --> &dev->struct_mutex --> qcom_iommu_lock
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(qcom_iommu_lock);
                               lock(&dev->struct_mutex);
                               lock(qcom_iommu_lock);
  lock(prepare_lock);
*** DEADLOCK ***
3 locks held by Xorg.bin/5413:
 #0:  (drm_global_mutex){+.+.+.}, at: [<c0540800>] drm_release+0x34/0x428
 #1:  (&dev->struct_mutex){+.+.+.}, at: [<c05413bc>] drm_gem_object_handle_unreference_unlocked+0xcc/0x130
 #2:  (qcom_iommu_lock){+.+...}, at: [<c079f664>] qcom_iommu_unmap+0x1c/0x1f0
stack backtrace:
CPU: 1 PID: 5413 Comm: Xorg.bin Tainted: G        W      3.17.0-rc1-00050-g07a489b #802
[<c0216290>] (unwind_backtrace) from [<c0211d8c>] (show_stack+0x10/0x14)
[<c0211d8c>] (show_stack) from [<c087a078>] (dump_stack+0x98/0xb8)
[<c087a078>] (dump_stack) from [<c027f024>] (print_circular_bug+0x218/0x340)
[<c027f024>] (print_circular_bug) from [<c0283e08>] (__lock_acquire+0x1d24/0x20b8)
[<c0283e08>] (__lock_acquire) from [<c0284774>] (lock_acquire+0x9c/0xbc)
[<c0284774>] (lock_acquire) from [<c087c408>] (mutex_lock_nested+0x70/0x3e8)
[<c087c408>] (mutex_lock_nested) from [<c0781280>] (clk_prepare_lock+0x88/0xfc)
[<c0781280>] (clk_prepare_lock) from [<c0782c50>] (clk_prepare+0xc/0x24)
[<c0782c50>] (clk_prepare) from [<c079f474>] (__enable_clocks.isra.4+0x18/0xa4)
[<c079f474>] (__enable_clocks.isra.4) from [<c079f614>] (__flush_iotlb_va+0xe0/0x114)
[<c079f614>] (__flush_iotlb_va) from [<c079f6f4>] (qcom_iommu_unmap+0xac/0x1f0)
[<c079f6f4>] (qcom_iommu_unmap) from [<c079ea3c>] (iommu_unmap+0x9c/0xe8)
[<c079ea3c>] (iommu_unmap) from [<c056c2fc>] (msm_iommu_unmap+0x64/0x84)
[<c056c2fc>] (msm_iommu_unmap) from [<c0569da4>] (msm_gem_free_object+0x11c/0x338)
[<c0569da4>] (msm_gem_free_object) from [<c05413ec>] (drm_gem_object_handle_unreference_unlocked+0xfc/0x130)
[<c05413ec>] (drm_gem_object_handle_unreference_unlocked) from [<c0541604>] (drm_gem_object_release_handle+0x50/0x68)
[<c0541604>] (drm_gem_object_release_handle) from [<c0447a98>] (idr_for_each+0xa8/0xdc)
[<c0447a98>] (idr_for_each) from [<c0541c10>] (drm_gem_release+0x1c/0x28)
[<c0541c10>] (drm_gem_release) from [<c0540b3c>] (drm_release+0x370/0x428)
[<c0540b3c>] (drm_release) from [<c031105c>] (__fput+0x98/0x1e8)
[<c031105c>] (__fput) from [<c025d73c>] (task_work_run+0xb0/0xfc)
[<c025d73c>] (task_work_run) from [<c02477ec>] (do_exit+0x2ec/0x948)
[<c02477ec>] (do_exit) from [<c0247ec0>] (do_group_exit+0x4c/0xb8)
[<c0247ec0>] (do_group_exit) from [<c025180c>] (get_signal+0x28c/0x6ac)
[<c025180c>] (get_signal) from [<c0211204>] (do_signal+0xc4/0x3e4)
[<c0211204>] (do_signal) from [<c02116cc>] (do_work_pending+0xb4/0xc4)
[<c02116cc>] (do_work_pending) from [<c020e938>] (work_pending+0xc/0x20)
We can break this chain if we don't hold the prepare_lock while creating debugfs directories. We only hold the prepare_lock right now because we're traversing the clock tree recursively and we don't want the hierarchy to change during the traversal. Replacing this traversal with a simple linked list walk allows us to only grab a list lock instead of the prepare_lock, thus breaking the lock chain.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
 drivers/clk/clk.c | 38 +++++++------------------------------
 1 file changed, 7 insertions(+), 31 deletions(-)
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index cf5df744cb21..ffec3814915a 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -302,26 +302,12 @@ out:
         return ret;
 }

-/* caller must hold prepare_lock */
 static int clk_debug_create_subtree(struct clk *clk, struct dentry *pdentry)
 {
-        struct clk *child;
-        int ret = -EINVAL;;
-
         if (!clk || !pdentry)
-                goto out;
-
-        ret = clk_debug_create_one(clk, pdentry);
-
-        if (ret)
-                goto out;
-
-        hlist_for_each_entry(child, &clk->children, child_node)
-                clk_debug_create_subtree(child, pdentry);
+                return -EINVAL;

-        ret = 0;
-out:
-        return ret;
+        return clk_debug_create_one(clk, pdentry);
 }

 /**
@@ -337,15 +323,10 @@ out:
  */
 static int clk_debug_register(struct clk *clk)
 {
-        int ret = 0;
-
         if (!inited)
-                goto out;
-
-        ret = clk_debug_create_subtree(clk, rootdir);
+                return 0;

-out:
-        return ret;
+        return clk_debug_create_subtree(clk, rootdir);
 }

 /**
@@ -417,17 +398,12 @@ static int __init clk_debug_init(void)
         if (!d)
                 return -ENOMEM;

-        clk_prepare_lock();
-
-        hlist_for_each_entry(clk, &clk_root_list, child_node)
-                clk_debug_create_subtree(clk, rootdir);
-
-        hlist_for_each_entry(clk, &clk_orphan_list, child_node)
+        mutex_lock(&clk_lookup_lock);
+        hlist_for_each_entry(clk, &clk_lookup_list, lookup_node)
                 clk_debug_create_subtree(clk, rootdir);

         inited = 1;
-
-        clk_prepare_unlock();
+        mutex_unlock(&clk_lookup_lock);

         return 0;
 }
On 09/04, Stephen Boyd wrote:
In the near future we're going to move the prepare lock to be a per-clock ww_mutex. __clk_lookup() is called very deep in the set-rate path and we would like to avoid having to take all the locks in the clock tree to search for a clock (basically defeating the purpose of introducing per-clock locks). Introduce a new list that contains all clocks registered in the system and walk this list until the clock is found.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Actually this won't work. We can't grab the list lock while the prepare lock is held. So we need to do the debugfs stuff with a different lock and do it outside of the prepare lock.
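To spell out the ordering problem with v1 (an illustrative sketch, not code from either patch):

/*
 * v1 reuses clk_lookup_lock for both __clk_lookup() and the debugfs setup.
 * But __clk_lookup() runs deep in the set-rate path, i.e. with
 * prepare_lock already held:
 *
 *     clk_set_rate()
 *       clk_prepare_lock();                  // prepare_lock
 *       ...
 *         __clk_lookup(name);
 *           mutex_lock(&clk_lookup_lock);    // prepare_lock -> clk_lookup_lock
 *
 * while clk_debug_init() takes clk_lookup_lock and then creates debugfs
 * entries (i_mutex), so clk_lookup_lock -> i_mutex -> ... -> prepare_lock
 * closes the same cycle again. Hence v2 below does the debugfs
 * bookkeeping under a dedicated clk_debug_lock, outside of prepare_lock.
 */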
Rob Clark reports a lockdep splat that involves the prepare_lock chained with the mmap semaphore.
======================================================
[ INFO: possible circular locking dependency detected ]
3.17.0-rc1-00050-g07a489b #802 Tainted: G        W
-------------------------------------------------------
Xorg.bin/5413 is trying to acquire lock:
 (prepare_lock){+.+.+.}, at: [<c0781280>] clk_prepare_lock+0x88/0xfc

but task is already holding lock:
 (qcom_iommu_lock){+.+...}, at: [<c079f664>] qcom_iommu_unmap+0x1c/0x1f0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #4 (qcom_iommu_lock){+.+...}:
   [<c079f860>] qcom_iommu_map+0x28/0x450
   [<c079eb50>] iommu_map+0xc8/0x12c
   [<c056c1fc>] msm_iommu_map+0xb4/0x130
   [<c05697bc>] msm_gem_get_iova_locked+0x9c/0xe8
   [<c0569854>] msm_gem_get_iova+0x4c/0x64
   [<c0562208>] mdp4_kms_init+0x4c4/0x6c0
   [<c056881c>] msm_load+0x2ac/0x34c
   [<c0545724>] drm_dev_register+0xac/0x108
   [<c0547510>] drm_platform_init+0x50/0xf0
   [<c0578a60>] try_to_bring_up_master.part.3+0xc8/0x108
   [<c0578b48>] component_master_add_with_match+0xa8/0x104
   [<c0568294>] msm_pdev_probe+0x64/0x70
   [<c057e704>] platform_drv_probe+0x2c/0x60
   [<c057cff8>] driver_probe_device+0x108/0x234
   [<c057b65c>] bus_for_each_drv+0x64/0x98
   [<c057cec0>] device_attach+0x78/0x8c
   [<c057c590>] bus_probe_device+0x88/0xac
   [<c057c9b8>] deferred_probe_work_func+0x68/0x9c
   [<c0259db4>] process_one_work+0x1a0/0x40c
   [<c025a710>] worker_thread+0x44/0x4d8
   [<c025ec54>] kthread+0xd8/0xec
   [<c020e9a8>] ret_from_fork+0x14/0x2c

-> #3 (&dev->struct_mutex){+.+.+.}:
   [<c0541188>] drm_gem_mmap+0x38/0xd0
   [<c05695b8>] msm_gem_mmap+0xc/0x5c
   [<c02f0b6c>] mmap_region+0x35c/0x6c8
   [<c02f11ec>] do_mmap_pgoff+0x314/0x398
   [<c02de1e0>] vm_mmap_pgoff+0x84/0xb4
   [<c02ef83c>] SyS_mmap_pgoff+0x94/0xbc
   [<c020e8e0>] ret_fast_syscall+0x0/0x48

-> #2 (&mm->mmap_sem){++++++}:
   [<c0321138>] filldir64+0x68/0x180
   [<c0333fe0>] dcache_readdir+0x188/0x22c
   [<c0320ed0>] iterate_dir+0x9c/0x11c
   [<c03213b0>] SyS_getdents64+0x78/0xe8
   [<c020e8e0>] ret_fast_syscall+0x0/0x48

-> #1 (&sb->s_type->i_mutex_key#3){+.+.+.}:
   [<c03fc544>] __create_file+0x58/0x1dc
   [<c03fc70c>] debugfs_create_dir+0x1c/0x24
   [<c0781c7c>] clk_debug_create_subtree+0x20/0x170
   [<c0be2af8>] clk_debug_init+0xec/0x14c
   [<c0208c70>] do_one_initcall+0x8c/0x1c8
   [<c0b9cce4>] kernel_init_freeable+0x13c/0x1dc
   [<c0877bc4>] kernel_init+0x8/0xe8
   [<c020e9a8>] ret_from_fork+0x14/0x2c

-> #0 (prepare_lock){+.+.+.}:
   [<c087c408>] mutex_lock_nested+0x70/0x3e8
   [<c0781280>] clk_prepare_lock+0x88/0xfc
   [<c0782c50>] clk_prepare+0xc/0x24
   [<c079f474>] __enable_clocks.isra.4+0x18/0xa4
   [<c079f614>] __flush_iotlb_va+0xe0/0x114
   [<c079f6f4>] qcom_iommu_unmap+0xac/0x1f0
   [<c079ea3c>] iommu_unmap+0x9c/0xe8
   [<c056c2fc>] msm_iommu_unmap+0x64/0x84
   [<c0569da4>] msm_gem_free_object+0x11c/0x338
   [<c05413ec>] drm_gem_object_handle_unreference_unlocked+0xfc/0x130
   [<c0541604>] drm_gem_object_release_handle+0x50/0x68
   [<c0447a98>] idr_for_each+0xa8/0xdc
   [<c0541c10>] drm_gem_release+0x1c/0x28
   [<c0540b3c>] drm_release+0x370/0x428
   [<c031105c>] __fput+0x98/0x1e8
   [<c025d73c>] task_work_run+0xb0/0xfc
   [<c02477ec>] do_exit+0x2ec/0x948
   [<c0247ec0>] do_group_exit+0x4c/0xb8
   [<c025180c>] get_signal+0x28c/0x6ac
   [<c0211204>] do_signal+0xc4/0x3e4
   [<c02116cc>] do_work_pending+0xb4/0xc4
   [<c020e938>] work_pending+0xc/0x20
other info that might help us debug this:
Chain exists of:
  prepare_lock --> &dev->struct_mutex --> qcom_iommu_lock
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(qcom_iommu_lock);
                               lock(&dev->struct_mutex);
                               lock(qcom_iommu_lock);
  lock(prepare_lock);
*** DEADLOCK ***
3 locks held by Xorg.bin/5413:
 #0:  (drm_global_mutex){+.+.+.}, at: [<c0540800>] drm_release+0x34/0x428
 #1:  (&dev->struct_mutex){+.+.+.}, at: [<c05413bc>] drm_gem_object_handle_unreference_unlocked+0xcc/0x130
 #2:  (qcom_iommu_lock){+.+...}, at: [<c079f664>] qcom_iommu_unmap+0x1c/0x1f0
stack backtrace:
CPU: 1 PID: 5413 Comm: Xorg.bin Tainted: G        W      3.17.0-rc1-00050-g07a489b #802
[<c0216290>] (unwind_backtrace) from [<c0211d8c>] (show_stack+0x10/0x14)
[<c0211d8c>] (show_stack) from [<c087a078>] (dump_stack+0x98/0xb8)
[<c087a078>] (dump_stack) from [<c027f024>] (print_circular_bug+0x218/0x340)
[<c027f024>] (print_circular_bug) from [<c0283e08>] (__lock_acquire+0x1d24/0x20b8)
[<c0283e08>] (__lock_acquire) from [<c0284774>] (lock_acquire+0x9c/0xbc)
[<c0284774>] (lock_acquire) from [<c087c408>] (mutex_lock_nested+0x70/0x3e8)
[<c087c408>] (mutex_lock_nested) from [<c0781280>] (clk_prepare_lock+0x88/0xfc)
[<c0781280>] (clk_prepare_lock) from [<c0782c50>] (clk_prepare+0xc/0x24)
[<c0782c50>] (clk_prepare) from [<c079f474>] (__enable_clocks.isra.4+0x18/0xa4)
[<c079f474>] (__enable_clocks.isra.4) from [<c079f614>] (__flush_iotlb_va+0xe0/0x114)
[<c079f614>] (__flush_iotlb_va) from [<c079f6f4>] (qcom_iommu_unmap+0xac/0x1f0)
[<c079f6f4>] (qcom_iommu_unmap) from [<c079ea3c>] (iommu_unmap+0x9c/0xe8)
[<c079ea3c>] (iommu_unmap) from [<c056c2fc>] (msm_iommu_unmap+0x64/0x84)
[<c056c2fc>] (msm_iommu_unmap) from [<c0569da4>] (msm_gem_free_object+0x11c/0x338)
[<c0569da4>] (msm_gem_free_object) from [<c05413ec>] (drm_gem_object_handle_unreference_unlocked+0xfc/0x130)
[<c05413ec>] (drm_gem_object_handle_unreference_unlocked) from [<c0541604>] (drm_gem_object_release_handle+0x50/0x68)
[<c0541604>] (drm_gem_object_release_handle) from [<c0447a98>] (idr_for_each+0xa8/0xdc)
[<c0447a98>] (idr_for_each) from [<c0541c10>] (drm_gem_release+0x1c/0x28)
[<c0541c10>] (drm_gem_release) from [<c0540b3c>] (drm_release+0x370/0x428)
[<c0540b3c>] (drm_release) from [<c031105c>] (__fput+0x98/0x1e8)
[<c031105c>] (__fput) from [<c025d73c>] (task_work_run+0xb0/0xfc)
[<c025d73c>] (task_work_run) from [<c02477ec>] (do_exit+0x2ec/0x948)
[<c02477ec>] (do_exit) from [<c0247ec0>] (do_group_exit+0x4c/0xb8)
[<c0247ec0>] (do_group_exit) from [<c025180c>] (get_signal+0x28c/0x6ac)
[<c025180c>] (get_signal) from [<c0211204>] (do_signal+0xc4/0x3e4)
[<c0211204>] (do_signal) from [<c02116cc>] (do_work_pending+0xb4/0xc4)
[<c02116cc>] (do_work_pending) from [<c020e938>] (work_pending+0xc/0x20)
We can break this chain if we don't hold the prepare_lock while creating debugfs directories. We only hold the prepare_lock right now because we're traversing the clock tree recursively and we don't want the hierarchy to change during the traversal. Replacing this traversal with a simple linked list walk allows us to only grab a list lock instead of the prepare_lock, thus breaking the lock chain.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---
I don't understand why we need to hold the prepare lock around the kref_put(), so I changed the flow so that we don't do this when we unregister a clock.
Changes since v1:
 * Squashed two patches into one
 * Focused entirely on debugfs now
 drivers/clk/clk.c           | 66 +++++++++++++++------------------------------
 include/linux/clk-private.h |  1 +
 2 files changed, 23 insertions(+), 44 deletions(-)
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index b76fa69b44cb..3c04d0d69b96 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -100,6 +100,8 @@ static void clk_enable_unlock(unsigned long flags)

 static struct dentry *rootdir;
 static int inited = 0;
+static DEFINE_MUTEX(clk_debug_lock);
+static HLIST_HEAD(clk_debug_list);

 static struct hlist_head *all_lists[] = {
         &clk_root_list,
@@ -300,28 +302,6 @@ out:
         return ret;
 }

-/* caller must hold prepare_lock */
-static int clk_debug_create_subtree(struct clk *clk, struct dentry *pdentry)
-{
-        struct clk *child;
-        int ret = -EINVAL;;
-
-        if (!clk || !pdentry)
-                goto out;
-
-        ret = clk_debug_create_one(clk, pdentry);
-
-        if (ret)
-                goto out;
-
-        hlist_for_each_entry(child, &clk->children, child_node)
-                clk_debug_create_subtree(child, pdentry);
-
-        ret = 0;
-out:
-        return ret;
-}
-
 /**
  * clk_debug_register - add a clk node to the debugfs clk tree
  * @clk: the clk being added to the debugfs clk tree
@@ -329,20 +309,21 @@ out:
  * Dynamically adds a clk to the debugfs clk tree if debugfs has been
  * initialized. Otherwise it bails out early since the debugfs clk tree
  * will be created lazily by clk_debug_init as part of a late_initcall.
- *
- * Caller must hold prepare_lock. Only clk_init calls this function (so
- * far) so this is taken care.
  */
 static int clk_debug_register(struct clk *clk)
 {
         int ret = 0;

+        mutex_lock(&clk_debug_lock);
+        hlist_add_head(&clk->debug_node, &clk_debug_list);
+
         if (!inited)
-                goto out;
+                goto unlock;

-        ret = clk_debug_create_subtree(clk, rootdir);
+        ret = clk_debug_create_one(clk, rootdir);
+unlock:
+        mutex_unlock(&clk_debug_lock);

-out:
         return ret;
 }

@@ -353,11 +334,13 @@ out:
  * Dynamically removes a clk and all it's children clk nodes from the
  * debugfs clk tree if clk->dentry points to debugfs created by
  * clk_debug_register in __clk_init.
- *
- * Caller must hold prepare_lock.
  */
 static void clk_debug_unregister(struct clk *clk)
 {
+        mutex_lock(&clk_debug_lock);
+        hlist_del_init(&clk->debug_node);
+        mutex_unlock(&clk_debug_lock);
+
         debugfs_remove_recursive(clk->dentry);
 }

@@ -415,17 +398,12 @@ static int __init clk_debug_init(void)
         if (!d)
                 return -ENOMEM;

-        clk_prepare_lock();
-
-        hlist_for_each_entry(clk, &clk_root_list, child_node)
-                clk_debug_create_subtree(clk, rootdir);
-
-        hlist_for_each_entry(clk, &clk_orphan_list, child_node)
-                clk_debug_create_subtree(clk, rootdir);
+        mutex_lock(&clk_debug_lock);
+        hlist_for_each_entry(clk, &clk_debug_list, debug_node)
+                clk_debug_create_one(clk, rootdir);

         inited = 1;
-
-        clk_prepare_unlock();
+        mutex_unlock(&clk_debug_lock);

         return 0;
 }

@@ -2094,7 +2072,8 @@ void clk_unregister(struct clk *clk)

         if (clk->ops == &clk_nodrv_ops) {
                 pr_err("%s: unregistered clock: %s\n", __func__, clk->name);
-                goto out;
+                clk_prepare_unlock();
+                return;
         }
         /*
          * Assign empty clock ops for consumers that might still hold
@@ -2113,17 +2092,16 @@ void clk_unregister(struct clk *clk)
                         clk_set_parent(child, NULL);
         }

-        clk_debug_unregister(clk);
-
         hlist_del_init(&clk->child_node);

         if (clk->prepare_count)
                 pr_warn("%s: unregistering prepared clock: %s\n",
                                         __func__, clk->name);
+        clk_prepare_unlock();
+
+        clk_debug_unregister(clk);

         kref_put(&clk->ref, __clk_release);
-out:
-        clk_prepare_unlock();
 }
 EXPORT_SYMBOL_GPL(clk_unregister);

diff --git a/include/linux/clk-private.h b/include/linux/clk-private.h
index efbf70b9fd84..4ed34105c371 100644
--- a/include/linux/clk-private.h
+++ b/include/linux/clk-private.h
@@ -48,6 +48,7 @@ struct clk {
         unsigned long           accuracy;
         struct hlist_head       children;
         struct hlist_node       child_node;
+        struct hlist_node       debug_node;
         unsigned int            notifier_count;
 #ifdef CONFIG_DEBUG_FS
         struct dentry           *dentry;
On 09/04, Stephen Boyd wrote:
I don't understand why we need to hold the prepare lock around the kref_put(), so I changed the flow so that we don't do this when we unregister a clock.
Ok we hold the prepare mutex to make sure get and put are serialized. Good. Here's the interdiff to move the debugfs unregistration before the prepare lock and detect double unregisters without holding the prepare lock.
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 3c04d0d69b96..8ca28189e4e9 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -338,10 +338,14 @@ unlock:
 static void clk_debug_unregister(struct clk *clk)
 {
         mutex_lock(&clk_debug_lock);
-        hlist_del_init(&clk->debug_node);
-        mutex_unlock(&clk_debug_lock);
+        if (!clk->dentry)
+                goto out;

+        hlist_del_init(&clk->debug_node);
         debugfs_remove_recursive(clk->dentry);
+        clk->dentry = NULL;
+out:
+        mutex_unlock(&clk_debug_lock);
 }

 struct dentry *clk_debugfs_add_file(struct clk *clk, char *name, umode_t mode,
@@ -2065,14 +2069,15 @@ void clk_unregister(struct clk *clk)
 {
         unsigned long flags;

-        if (!clk || WARN_ON_ONCE(IS_ERR(clk)))
-                return;
+        if (!clk || WARN_ON_ONCE(IS_ERR(clk)))
+                return;
+
+        clk_debug_unregister(clk);

         clk_prepare_lock();

         if (clk->ops == &clk_nodrv_ops) {
                 pr_err("%s: unregistered clock: %s\n", __func__, clk->name);
-                clk_prepare_unlock();
                 return;
         }
         /*
@@ -2097,11 +2102,9 @@ void clk_unregister(struct clk *clk)
         if (clk->prepare_count)
                 pr_warn("%s: unregistering prepared clock: %s\n",
                                         __func__, clk->name);
-        clk_prepare_unlock();
-
-        clk_debug_unregister(clk);
-
         kref_put(&clk->ref, __clk_release);
+
+        clk_prepare_unlock();
 }
 EXPORT_SYMBOL_GPL(clk_unregister);
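With those hunks applied, clk_debug_unregister() ends up reading as follows (reconstructed here for readability from the interdiff above):

static void clk_debug_unregister(struct clk *clk)
{
        mutex_lock(&clk_debug_lock);
        if (!clk->dentry)
                goto out;       /* never registered, or already unregistered */

        hlist_del_init(&clk->debug_node);
        debugfs_remove_recursive(clk->dentry);
        clk->dentry = NULL;     /* makes a second unregister a no-op */
out:
        mutex_unlock(&clk_debug_lock);
}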
Quoting Stephen Boyd (2014-09-05 17:00:00)
On 09/04, Stephen Boyd wrote:
I don't understand why we need to hold the prepare lock around the kref_put(), so I changed the flow so that we don't do this when we unregister a clock.
Ok we hold the prepare mutex to make sure get and put are serialized. Good. Here's the interdiff to move the debugfs unregistration before the prepare lock and detect double unregisters without holding the prepare lock.
Looks good to me. I've rolled this into the original above and applied it to clk-next.
Regards, Mike
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 3c04d0d69b96..8ca28189e4e9 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -338,10 +338,14 @@ unlock:
 static void clk_debug_unregister(struct clk *clk)
 {
         mutex_lock(&clk_debug_lock);
-        hlist_del_init(&clk->debug_node);
-        mutex_unlock(&clk_debug_lock);
+        if (!clk->dentry)
+                goto out;

+        hlist_del_init(&clk->debug_node);
         debugfs_remove_recursive(clk->dentry);
+        clk->dentry = NULL;
+out:
+        mutex_unlock(&clk_debug_lock);
 }

 struct dentry *clk_debugfs_add_file(struct clk *clk, char *name, umode_t mode,
@@ -2065,14 +2069,15 @@ void clk_unregister(struct clk *clk)
 {
         unsigned long flags;

-        if (!clk || WARN_ON_ONCE(IS_ERR(clk)))
-                return;
+        if (!clk || WARN_ON_ONCE(IS_ERR(clk)))
+                return;
+
+        clk_debug_unregister(clk);

         clk_prepare_lock();

         if (clk->ops == &clk_nodrv_ops) {
                 pr_err("%s: unregistered clock: %s\n", __func__, clk->name);
-                clk_prepare_unlock();
                 return;
         }
         /*
@@ -2097,11 +2102,9 @@ void clk_unregister(struct clk *clk)
         if (clk->prepare_count)
                 pr_warn("%s: unregistering prepared clock: %s\n",
                                         __func__, clk->name);
-        clk_prepare_unlock();
-
-        clk_debug_unregister(clk);
-
         kref_put(&clk->ref, __clk_release);
+
+        clk_prepare_unlock();
 }
 EXPORT_SYMBOL_GPL(clk_unregister);
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation