-----Original Message----- From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel Vetter Sent: Monday, January 5, 2015 16:40 To: Cheng, Yao Cc: Daniel Vetter; Thierry Reding; intel-gfx@lists.freedesktop.org; dri- devel@lists.freedesktop.org; Kelley, Sean V; Chehab, John; emil.l.velikov@gmail.com; Jiang, Fei; Beckett, Robert; Barbalho, Rafael Subject: Re: [RFC PATCH v3 4/4] tests/drv_module_reload: add ipvr support
On Sun, Dec 21, 2014 at 02:40:24PM +0000, Cheng, Yao wrote:
-----Original Message----- From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] Sent: Thursday, December 18, 2014 19:21 To: Thierry Reding Cc: Cheng, Yao; intel-gfx@lists.freedesktop.org; dri- devel@lists.freedesktop.org; Kelley, Sean V; Chehab, John; emil.l.velikov@gmail.com; Jiang, Fei Subject: Re: [RFC PATCH v3 4/4] tests/drv_module_reload: add ipvr support
On Thu, Dec 18, 2014 at 11:04 AM, Thierry Reding thierry.reding@gmail.com wrote:
I double checked the symptom and found it was a deadlock on
drm_global_mutex.
When i915_driver_load() registers the platform device while ipvr module
is in the system, ipvr's probe() function tries to lock drm_global_mutex which was already held by i915.
I think either of the following 2 actions need to be moved to a bottom half
e.g. a work queue:
platform_device_add () call in i915_ved.c (called during
i915_driver_load())
drm_dev_register() call during ipvr's probe() Which one
makes more sense? pls kindly advise (I personally prefer the former
one.).
Yes, that's somewhat ugly, but I don't see a way around that. I'd also think that moving platform_device_add() to a workqueue would be the best option here.
Or we simply kill drm_global_mutex for platform drivers that don't use the -
probe hook. It should work when they have a correct order betwen
drm_dev_alloc and _register and all the code in between. So just ditch the -
load callback in teh ipvr driver and rework the load sequence as suggested
somewhere else and this is fixed already. No need for bottom halfs I
think.
Daniel, sorry I didn't quite understand "platform drivers that don't use the probe hook". For initialization, the ipvr platform driver's probe() is called in following 2 possible paths:
- ipvr installed before i915. In this case, ipvr's probe() is called
inside i915_driver_load() and falls into the drm_global_mutex dead lock. 2. i915 installed before ipvr. In this case, ipvr's probe() is called without drm_global_mutex held by i915 and no dead lock issue. If we kill drm_global_mutex, will path 2 run into issue? And in your suggestion, how to rework the load sequence? Do you mean calling ipvr's load() callback directly during platform driver probe()?
Hm right it's not that simple really. What we need in more detail is:
Move the mutex_lock(&drm_global_mutex) out of drm_dev_register into all the callers. If a driver has a ->load() callback it most likely is racy with the usual load ordering issues.
Rework ipvr to no longer have a ->load callback. Insteaed use the following sequence (in the platform ->probe callback):
drm_dev_alloc(); ipvr_load(); drm_dev_register();
With that ordering we don't need the additional guarantees that drm_global_mutex provides and we can avoid to take that lock around drm_dev_registrer() call in the ipvr code.
Thanks for the detailed explanation, Daniel! That sounds to be a small refactor on drm core, and need change many drm drivers: nouveau, tegra, udl. Should it be a standalone RFC patch?
This should resolve the deadlock I hope.
-Daniel
Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch