On vlv, if ipvr is installed, it needs to be manually unloaded before i915; otherwise the user might run into a use-after-free issue.
v2: added this patch per Daniel's comment
v3: no change
Signed-off-by: Yao Cheng yao.cheng@intel.com
---
 tests/drv_module_reload | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/tests/drv_module_reload b/tests/drv_module_reload
index 5cbff89..82c67bd 100755
--- a/tests/drv_module_reload
+++ b/tests/drv_module_reload
@@ -24,6 +24,14 @@ rmmod snd_hda_intel &> /dev/null
 
 #ignore errors in ips - gen5 only
 rmmod intel_ips &> /dev/null
+
+# vlv only for now:
+# due to platform device model limitation, need unload ipvr manually
+if lsmod | grep ipvr &> /dev/null ; then
+	echo Need manually unload ipvr.ko.
+	rmmod ipvr
+fi
+
 rmmod i915
 #ignore errors in intel-gtt, often built-in
 rmmod intel-gtt &> /dev/null
@@ -31,6 +39,11 @@ rmmod intel-gtt &> /dev/null
 rmmod drm_kms_helper &> /dev/null
 rmmod drm &> /dev/null
 
+if lsmod | grep ipvr &> /dev/null ; then
+	echo WARNING: ipvr.ko still loaded!
+	exit 1
+fi
+
 if lsmod | grep i915 &> /dev/null ; then
 	echo WARNING: i915.ko still loaded!
 	exit 1
@@ -41,6 +54,9 @@ fi
 modprobe i915
 echo 1 > /sys/class/vtconsole/vtcon1/bind
 
+# for vlv, load VED driver
+modprobe ipvr
+
 modprobe snd_hda_intel
 
 # try to run something
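A side note on the "lsmod | grep ipvr" checks in the patch: grep matches the pattern anywhere in the lsmod output, so it also hits other module names that merely contain the string and entries in the "Used by" column. Below is a minimal sketch of an exact-name check, assuming lsmod's usual layout of one header line followed by rows whose first column is the module name. The helper name module_loaded is hypothetical and not part of the patch.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: read lsmod-style output on
# stdin and succeed only if the first column of some row equals $1 exactly.
# NR > 1 skips the "Module  Size  Used by" header line.
module_loaded() {
    awk -v mod="$1" 'NR > 1 && $1 == mod { found = 1 } END { exit !found }'
}

# Possible use in the reload script (commented out here because it needs
# root and real hardware):
#   if lsmod | module_loaded ipvr; then
#       rmmod ipvr
#   fi
```

Reading from stdin rather than calling lsmod inside the function keeps the check testable against canned output.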
On Sat, Nov 22, 2014 at 03:10:01AM +0800, Yao Cheng wrote:
> On vlv, if ipvr is installed, it needs to be manually unloaded before i915; otherwise the user might run into a use-after-free issue.
Huh? That doesn't sound right. What exactly is it that's going wrong? You should never have to do this. If you do you're almost certainly doing something wrong in the kernel module.
Thierry
On Fri, Nov 21, 2014 at 09:27:04PM +0100, Thierry Reding wrote:
> Huh? That doesn't sound right. What exactly is it that's going wrong? You should never have to do this. If you do you're almost certainly doing something wrong in the kernel module.
It's the hilarity called platform devices. Removing them is somewhat racy, so doing that upfront makes the entire thing a bit safer. The use-after-free is on the module text (the code itself), since grabbing a module refcount for the platform device doesn't work (it would pin the module forever).
-Daniel
On Fri, Nov 21, 2014 at 09:36:33PM +0100, Daniel Vetter wrote:
> It's the hilarity called platform devices. Removing them is somewhat racy, so doing that upfront makes the entire thing a bit safer. The use after free is on the text, since grabbing a module refcount for the platform device doesn't work (it would pin the module forever).
I don't understand what the issue is here. I've used platform devices quite extensively on ARM and I've never encountered a situation where they were insufficient (or racy for that matter).
If I understand correctly what this commit tries to achieve, then it unloads one module before another module that it depends on so that the dependency can be removed subsequently without causing a crash. That sounds really brittle to me. How are you going to document this for users so that they don't accidentally go and unload the i915 module and crash their system?
Thierry
On Mon, Nov 24, 2014 at 10:55:46AM +0100, Thierry Reding wrote:
> I don't understand what the issue is here. I've used platform devices quite extensively on ARM and I've never encountered a situation where they were insufficient (or racy for that matter).
>
> If I understand correctly what this commit tries to achieve, then it unloads one module before another module that it depends on so that the dependency can be removed subsequently without causing a crash. That sounds really brittle to me. How are you going to document this for users so that they don't accidentally go and unload the i915 module and crash their system?
Module unloading taints your kernel and isn't an end-user supported feature. That simple ;-)
Also afaik the problem is that you actually can't unload i915 until you've unloaded the subordinate driver, since i915 registering the platform driver prevents unload. Or at least that was my understanding, I didn't test this myself. I just asked whether the unload script still works and apparently it breaks.
I guess what's different with ARM is that DT creates all the platform devices, and not modules themselves? -Daniel
-----Original Message-----
From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel Vetter
Sent: Monday, November 24, 2014 21:15
To: Thierry Reding
Cc: Daniel Vetter; Cheng, Yao; intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org; daniel.vetter@ffwll.ch; Kelley, Sean V; Chehab, John; emil.l.velikov@gmail.com; Jiang, Fei
Subject: Re: [RFC PATCH v3 4/4] tests/drv_module_reload: add ipvr support
> Module unloading taints your kernel and isn't an end-user supported feature. That simple ;-)
>
> Also afaik the problem is that you actually can't unload i915 until you've unloaded the subordinate driver, since i915 registering the platform driver prevents unload. Or at least that was my understanding, I didn't test this myself. I just asked whether the unload script still works and apparently it breaks.
>
> I guess what's different with ARM is that DT creates all the platform devices, and not modules themselves?
> -Daniel
Thierry/Daniel, the actual symptom is that after "rmmod i915", even though drm_drv_release() is also called on the child device "ipvr", I still see the module in the system (checked with "lsmod"). This causes an issue when I modprobe i915 and ipvr again later. I don't understand why this happens, but I believe what Daniel said: "grabbing a module refcount for the platform device doesn't work (it would pin the module forever)".
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Mon, Dec 01, 2014 at 03:06:08AM +0000, Cheng, Yao wrote:
> Thierry/Daniel, the actual symptom is that after "rmmod i915", even though drm_drv_release() is also called on the child device "ipvr", I still see the module in the system (checked with "lsmod").
Which module? ipvr or i915?
> This causes an issue when I modprobe i915 and ipvr again later.
What issue are you seeing? If your driver can't deal with a situation where it's probed again after being removed then you have a bug.
> I don't understand why this happens, but I believe what Daniel said: "grabbing a module refcount for the platform device doesn't work (it would pin the module forever)"
What I'd expect to happen is this:
	# modprobe i915
		i915 registers a platform device
	# modprobe ipvr
		driver core probes ipvr device
	# modprobe -r i915
		i915 removes the platform device (ipvr's ->remove() is called)
I guess if you don't do anything else, then indeed the ipvr module will stay around, but the above should work idempotently, that is you should be able to repeat it an unlimited number of times and nothing should break.
In fact you should be able to run the following in any permutation without causing a crash:
	# modprobe i915
	# modprobe ipvr
	# modprobe -r ipvr
	# modprobe -r i915
If any permutation results in a crash you have a bug.
Thierry
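The "any permutation" check above can be enumerated mechanically. The following is a dry-run sketch only: it prints the 24 orderings of the four commands from the message above and executes none of them. The function names cmd and permutations are made up for illustration; actually running each ordering would need root on a scratch machine and some way of checking the kernel log afterwards, which is left out here.

```shell
#!/bin/sh
# Map an index 1..4 to one of the four commands listed in the thread.
cmd() {
    case "$1" in
        1) echo "modprobe i915" ;;
        2) echo "modprobe ipvr" ;;
        3) echo "modprobe -r ipvr" ;;
        4) echo "modprobe -r i915" ;;
    esac
}

# Print every ordering of the four commands, one ordering per line.
# Dry run: the commands are only printed, never executed.
permutations() {
    for a in 1 2 3 4; do
        for b in 1 2 3 4; do
            if [ "$b" = "$a" ]; then continue; fi
            for c in 1 2 3 4; do
                if [ "$c" = "$a" ]; then continue; fi
                if [ "$c" = "$b" ]; then continue; fi
                # The indices sum to 10, so the fourth one is what is left.
                d=$((10 - a - b - c))
                echo "$(cmd "$a"); $(cmd "$b"); $(cmd "$c"); $(cmd "$d")"
            done
        done
    done
}

permutations
```

Printing instead of executing keeps the sketch safe to run anywhere; each output line is a candidate sequence to try by hand.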
-----Original Message-----
From: Thierry Reding [mailto:thierry.reding@gmail.com]
Sent: Wednesday, December 17, 2014 16:13
To: Cheng, Yao
Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org; daniel.vetter@ffwll.ch; Kelley, Sean V; Chehab, John; emil.l.velikov@gmail.com; Jiang, Fei
Subject: Re: [RFC PATCH v3 4/4] tests/drv_module_reload: add ipvr support
Thanks Thierry for the suggestion, please see my inline comments.
> > Thierry/Daniel, the actual symptom is that after "rmmod i915", even though drm_drv_release() is also called on the child device "ipvr", I still see the module in the system (checked with "lsmod").
>
> Which module? ipvr or i915?
The ipvr module still shows up in "lsmod" after rmmod i915.
> > This causes an issue when I modprobe i915 and ipvr again later.
>
> What issue are you seeing? If your driver can't deal with a situation where it's probed again after being removed then you have a bug.
I double-checked the symptom and found it was a deadlock on drm_global_mutex. When i915_driver_load() registers the platform device while the ipvr module is in the system, ipvr's probe() function tries to lock drm_global_mutex, which is already held by i915. I think one of the following two actions needs to be moved to a bottom half, e.g. a work queue:
 - the platform_device_add() call in i915_ved.c (called during i915_driver_load())
 - the drm_dev_register() call during ipvr's probe()
Which one makes more sense? Please advise (I personally prefer the former).
> > I don't understand why this happens, but I believe what Daniel said: "grabbing a module refcount for the platform device doesn't work (it would pin the module forever)"
>
> What I'd expect to happen is this:
>
> 	# modprobe i915
> 		i915 registers a platform device
> 	# modprobe ipvr
> 		driver core probes ipvr device
> 	# modprobe -r i915
> 		i915 removes the platform device (ipvr's ->remove() is called)
>
> I guess if you don't do anything else, then indeed the ipvr module will stay around, but the above should work idempotently, that is you should be able to repeat it an unlimited number of times and nothing should break.
>
> In fact you should be able to run the following in any permutation without causing a crash:
>
> 	# modprobe i915
> 	# modprobe ipvr
> 	# modprobe -r ipvr
> 	# modprobe -r i915
>
> If any permutation results in a crash you have a bug.
I assume all the permutations will work after fixing the deadlock.
> Thierry
On Thu, Dec 18, 2014 at 05:44:37AM +0000, Cheng, Yao wrote:
> I double-checked the symptom and found it was a deadlock on drm_global_mutex. When i915_driver_load() registers the platform device while the ipvr module is in the system, ipvr's probe() function tries to lock drm_global_mutex, which is already held by i915. I think one of the following two actions needs to be moved to a bottom half, e.g. a work queue:
>  - the platform_device_add() call in i915_ved.c (called during i915_driver_load())
>  - the drm_dev_register() call during ipvr's probe()
> Which one makes more sense? Please advise (I personally prefer the former).
Yes, that's somewhat ugly, but I don't see a way around that. I'd also think that moving platform_device_add() to a workqueue would be the best option here.
Thierry
On Thu, Dec 18, 2014 at 11:04 AM, Thierry Reding thierry.reding@gmail.com wrote:
> Yes, that's somewhat ugly, but I don't see a way around that. I'd also think that moving platform_device_add() to a workqueue would be the best option here.
Or we simply kill drm_global_mutex for platform drivers that don't use the ->probe hook. It should work when they have a correct order between drm_dev_alloc and _register and all the code in between. So just ditch the ->load callback in the ipvr driver and rework the load sequence as suggested somewhere else and this is fixed already. No need for bottom halves I think.
-Daniel
-----Original Message-----
From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch]
Sent: Thursday, December 18, 2014 19:21
To: Thierry Reding
Cc: Cheng, Yao; intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Kelley, Sean V; Chehab, John; emil.l.velikov@gmail.com; Jiang, Fei
Subject: Re: [RFC PATCH v3 4/4] tests/drv_module_reload: add ipvr support
> Or we simply kill drm_global_mutex for platform drivers that don't use the ->probe hook. It should work when they have a correct order between drm_dev_alloc and _register and all the code in between. So just ditch the ->load callback in the ipvr driver and rework the load sequence as suggested somewhere else and this is fixed already. No need for bottom halves I think.
Daniel, sorry, I didn't quite understand "platform drivers that don't use the probe hook". For initialization, the ipvr platform driver's probe() is called in one of two possible paths:
1. ipvr installed before i915. In this case, ipvr's probe() is called inside i915_driver_load() and runs into the drm_global_mutex deadlock.
2. i915 installed before ipvr. In this case, ipvr's probe() is called without drm_global_mutex held by i915, so there is no deadlock.
If we kill drm_global_mutex, will path 2 run into issues? And in your suggestion, how should the load sequence be reworked? Do you mean calling ipvr's load() callback directly during the platform driver's probe()?
On Sun, Dec 21, 2014 at 02:40:24PM +0000, Cheng, Yao wrote:
> Daniel, sorry, I didn't quite understand "platform drivers that don't use the probe hook". For initialization, the ipvr platform driver's probe() is called in one of two possible paths:
> 1. ipvr installed before i915. In this case, ipvr's probe() is called inside i915_driver_load() and runs into the drm_global_mutex deadlock.
> 2. i915 installed before ipvr. In this case, ipvr's probe() is called without drm_global_mutex held by i915, so there is no deadlock.
> If we kill drm_global_mutex, will path 2 run into issues? And in your suggestion, how should the load sequence be reworked? Do you mean calling ipvr's load() callback directly during the platform driver's probe()?
Hm right, it's not that simple really. What we need in more detail is:

- Move the mutex_lock(&drm_global_mutex) out of drm_dev_register() into all the callers. If a driver has a ->load() callback it most likely is racy with the usual load ordering issues.

- Rework ipvr to no longer have a ->load callback. Instead use the following sequence (in the platform ->probe callback):

	drm_dev_alloc();
	ipvr_load();
	drm_dev_register();

  With that ordering we don't need the additional guarantees that drm_global_mutex provides, and we can avoid taking that lock around the drm_dev_register() call in the ipvr code.
This should resolve the deadlock I hope. -Daniel
-----Original Message-----
From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel Vetter
Sent: Monday, January 5, 2015 16:40
To: Cheng, Yao
Cc: Daniel Vetter; Thierry Reding; intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Kelley, Sean V; Chehab, John; emil.l.velikov@gmail.com; Jiang, Fei; Beckett, Robert; Barbalho, Rafael
Subject: Re: [RFC PATCH v3 4/4] tests/drv_module_reload: add ipvr support
> Hm right, it's not that simple really. What we need in more detail is:
>
> - Move the mutex_lock(&drm_global_mutex) out of drm_dev_register() into all the callers. If a driver has a ->load() callback it most likely is racy with the usual load ordering issues.
>
> - Rework ipvr to no longer have a ->load callback. Instead use the following sequence (in the platform ->probe callback):
>
> 	drm_dev_alloc();
> 	ipvr_load();
> 	drm_dev_register();
>
>   With that ordering we don't need the additional guarantees that drm_global_mutex provides, and we can avoid taking that lock around the drm_dev_register() call in the ipvr code.
Thanks for the detailed explanation, Daniel! That sounds like a small refactor of the drm core that also needs changes in several drm drivers: nouveau, tegra, udl. Should it be a standalone RFC patch?
> This should resolve the deadlock I hope.
> -Daniel
On Tue, Jan 06, 2015 at 02:14:27PM +0000, Cheng, Yao wrote:
> Thanks for the detailed explanation, Daniel! That sounds like a small refactor of the drm core that also needs changes in several drm drivers: nouveau, tegra, udl. Should it be a standalone RFC patch?
I think the locking shuffling should be doable in just one patch, but definitely needs to be split out. -Daniel
On Mon, Nov 24, 2014 at 02:14:48PM +0100, Daniel Vetter wrote:
> Module unloading taints your kernel and isn't an end-user supported feature. That simple ;-)
>
> Also afaik the problem is that you actually can't unload i915 until you've unloaded the subordinate driver, since i915 registering the platform driver prevents unload.
That doesn't sound at all like use-after-free, so if that's really the only problem then the commit description should be more accurate.
> Or at least that was my understanding, I didn't test this myself. I just asked whether the unload script still works and apparently it breaks.
>
> I guess what's different with ARM is that DT creates all the platform devices, and not modules themselves?
No, I don't think that has anything to do with it. I'm pretty sure I've seen this work reliably with something like MFD where one module can create a number of platform devices, and remove them again, just as well.
Thierry