https://bugzilla.kernel.org/show_bug.cgi?id=84431
Bug ID: 84431
Summary: Kernel crash when unloading radeon module for switcheroo card
Product: Drivers
Version: 2.5
Kernel Version: all
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: high
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: pali.rohar@gmail.com
Regression: No
Created attachment 149991 --> https://bugzilla.kernel.org/attachment.cgi?id=149991&action=edit Fix crash after rmmod radeon on PX systems.
Calling rmmod radeon on a PX system causes a kernel crash. The reason is the function vga_switcheroo_init_domain_pm_ops(), which sets dev->pm_domain of the PCI device. After the radeon module is unloaded, the pointer dev->pm_domain still points to a vga_switcheroo function, which tries to call a radeon function (which no longer exists in memory after rmmod radeon). I bet that nouveau has the same problem.
I'm attaching a simple patch which sets dev->pm_domain of the PCI device back to NULL when removing the radeon device, so vga_switcheroo will not be called.
But I think the proper way to fix this bug, which is in vga_switcheroo, would be to add a function like "vga_switcheroo_exit_domain_pm_ops()" which sets pm_domain back to its original value (which in my case is NULL).
With my patch on a PX system I can call rmmod radeon, modprobe radeon, rmmod radeon, ... many times without any crash.
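The save-and-restore idea described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct device_model`, `struct pm_domain_model` and the `model_*` helpers are hypothetical stand-ins invented for illustration, not the kernel's structures or the vga_switcheroo API, and the code is not taken from any of the attached patches.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model only: hypothetical stand-ins, NOT the kernel structures. */
struct pm_domain_model {
    int (*runtime_suspend)(void *dev);   /* callback provided by the driver */
};

struct device_model {
    struct pm_domain_model *pm_domain;   /* NULL when no domain is attached */
};

/* Stands in for a callback whose code lives in the driver module. */
static int driver_runtime_suspend(void *dev)
{
    (void)dev;
    return 0;
}

/*
 * Hypothetical init helper: attach the domain to the device and return the
 * previous pm_domain value so the caller can restore it later.
 */
static struct pm_domain_model *
model_init_domain_pm_ops(struct device_model *dev, struct pm_domain_model *domain)
{
    struct pm_domain_model *old = dev->pm_domain;

    domain->runtime_suspend = driver_runtime_suspend;
    dev->pm_domain = domain;
    return old;
}

/*
 * Hypothetical exit helper: restore the saved pm_domain value. Without this
 * step the device keeps pointing at callbacks of an unloaded driver.
 */
static void
model_exit_domain_pm_ops(struct device_model *dev, struct pm_domain_model *old)
{
    dev->pm_domain = old;
}

/* Returns 1 if the device still references the given domain, else 0. */
static int
model_has_dangling_domain(const struct device_model *dev,
                          const struct pm_domain_model *domain)
{
    return dev->pm_domain == domain;
}
```

In this model, a driver's remove path would call `model_exit_domain_pm_ops()` with the value returned at init time before its code goes away, so a later power-management call never reaches a stale callback.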
Pali Rohár pali.rohar@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #149991|0                           |1
           is patch|                            |
 Attachment #149991|application/octet-stream    |text/plain
          mime type|                            |
Alex Deucher alexdeucher@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alexdeucher@gmail.com
--- Comment #1 from Alex Deucher alexdeucher@gmail.com --- Care to generate a git patch and sign-off on it?
--- Comment #2 from Pali Rohár pali.rohar@gmail.com --- I can, but I do not know if this is the proper way to fix it. I still think that the root of the bug is in the function vga_switcheroo_init_domain_pm_ops(), which overwrites dev->pm_domain but does not restore it when the driver/device unregisters.
--- Comment #3 from Alex Deucher alexdeucher@gmail.com --- Created attachment 150001 --> https://bugzilla.kernel.org/attachment.cgi?id=150001&action=edit patch 1/3
How about this patch set?
--- Comment #4 from Alex Deucher alexdeucher@gmail.com --- Created attachment 150011 --> https://bugzilla.kernel.org/attachment.cgi?id=150011&action=edit patch 2/3
--- Comment #5 from Alex Deucher alexdeucher@gmail.com --- Created attachment 150021 --> https://bugzilla.kernel.org/attachment.cgi?id=150021&action=edit patch 3/3
--- Comment #6 from Pali Rohár pali.rohar@gmail.com --- I tested 1/3 and 2/3 on a 3.13 kernel. As expected (because the patches do the same thing), the result is the same as with my patch: no kernel crash anymore. You can add my Signed-off-by.
I do not have an nvidia optimus card, so I cannot test the last patch.
Anyway, vga_switcheroo.c also exports the function vga_switcheroo_init_domain_pm_optimus_hdmi_audio(), which changes dev->pm_domain too. But I do not see any driver which uses it.
--- Comment #7 from Pali Rohár pali.rohar@gmail.com --- The function vga_switcheroo_init_domain_pm_optimus_hdmi_audio() is used in sound/pci/hda/hda_intel.c. So that driver has the same problem and causes a kernel panic on driver unload.
Joaquín Aramendía samsagax@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |samsagax@gmail.com
--- Comment #8 from Joaquín Aramendía samsagax@gmail.com --- Alex, that patchset indeed got rid of this bug, but for some reason it introduced another one: https://bugzilla.kernel.org/show_bug.cgi?id=86011
Commit 97d30fa3524ff60b43d450012abe8f961d280478 from the stable kernel tree breaks nouveau power management through vga-switcheroo.
Peter Wu peter@lekensteyn.nl changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter@lekensteyn.nl
--- Comment #9 from Peter Wu peter@lekensteyn.nl --- (In reply to Pali Rohár from comment #7)
> Function vga_switcheroo_init_domain_pm_optimus_hdmi_audio() is used in sound/pci/hda/hda_intel.c. So that driver has same problem and cause kernel panic on driver unload.
A patch for this issue is queued at http://mailman.alsa-project.org/pipermail/alsa-devel/2016-July/110125.html
Joaquín, how does 97d30fa35 break nouveau vga-switcheroo? If you load nouveau with runpm=0, then you can write OFF to debugfs' vga_switcheroo. However runpm=1 (or -1 for Optimus systems) is recommended.
I think that the original bug is fixed, so this can be marked as resolved?
--- Comment #10 from Joaquín Aramendía samsagax@gmail.com ---
> Joaquín, how does 97d30fa35 break nouveau vga-switcheroo? If you load nouveau with runpm=0, then you can write OFF to debugfs' vga_switcheroo. However runpm=1 (or -1 for Optimus systems) is recommended.
I just tested removing the nouveau module with Ubuntu 16.04 on mainline kernel v4.6.5 and it worked correctly. I also modprobed it after that and it worked correctly. This bug should be marked as resolved.