https://bugzilla.kernel.org/show_bug.cgi?id=89661
Bug ID: 89661
Summary: Kernel panic when trying use amdkfd driver on Kaveri
Product: Drivers
Version: 2.5
Kernel Version: 3.18.0 + drm-next branch
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: linux@bernd-steinhauser.de
Regression: No
I tried to use kernel 3.18.0 with the drm-next branch merged in from git://people.freedesktop.org/~airlied/linux
which includes the HSA driver amdkfd. CONFIG_HSA_AMD=y is set. When I try to boot this kernel, I get a kernel panic, as shown in the uploaded picture.
The CPU is an A10-7800 (Kaveri); the motherboard is an ASRock FM2A88X Extreme6+.
--- Comment #1 from Bernd Steinhauser linux@bernd-steinhauser.de --- Created attachment 160441 --> https://bugzilla.kernel.org/attachment.cgi?id=160441&action=edit Picture of the kernel panic output
--- Comment #2 from Michel Dänzer michel@daenzer.net --- Does it also happen with CONFIG_HSA_AMD=m?
Oded Gabbay oded.gabbay@amd.com changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
CC                 |                            |oded.gabbay@amd.com
--- Comment #3 from Oded Gabbay oded.gabbay@amd.com --- Created attachment 160721 --> https://bugzilla.kernel.org/attachment.cgi?id=160721&action=edit Print errors in case of NULL pointers and don't dereference them
--- Comment #4 from Oded Gabbay oded.gabbay@amd.com --- Hi, Please try the attached patch.
--- Comment #5 from Bernd Steinhauser linux@bernd-steinhauser.de --- (In reply to Michel Dänzer from comment #2)
> Does it also happen with CONFIG_HSA_AMD=m?
I only tried CONFIG_HSA_AMD=y, not the module build, but the panic happens so early that I'm confident it does not matter.
Will try the patch, thanks.
--- Comment #6 from Bernd Steinhauser linux@bernd-steinhauser.de --- Tried the patch, exactly the same result.
--- Comment #7 from Oded Gabbay oded.gabbay@amd.com --- Created attachment 160751 --> https://bugzilla.kernel.org/attachment.cgi?id=160751&action=edit More checks on pointers being used
--- Comment #8 from Oded Gabbay oded.gabbay@amd.com --- Hi, Three things, please:
1. Please try the attached patch. It tries to verify more pointers before using them.
2. You said CONFIG_HSA_AMD=y. What is the value of CONFIG_DRM_RADEON? If it's "m", could you change it to "y"?
3. I would still like to ask whether you could test with the following config: CONFIG_DRM_RADEON="m" CONFIG_HSA_AMD="m"
Thanks
Oded
--- Comment #9 from Oded Gabbay oded.gabbay@amd.com --- One more thing: I'm trying to understand the exact tree you are using, so that we look at the same code. Did you just take drm-next, or did you manually merge between trees? If you did a manual merge, could you instead try plain drm-next? It's already based on 3.18.0.
--- Comment #10 from Oded Gabbay oded.gabbay@amd.com --- Hi, I managed to reproduce the bug on my setup. It happens because you compiled all the modules into the kernel. I need to address that, but for now, if you compile them as "m", everything should work.
--- Comment #11 from Bernd Steinhauser linux@bernd-steinhauser.de --- Hm, ok. So should I still try the steps above? Using drm_radeon as a module would require me to do some testing with that setup first.
(In reply to Oded Gabbay from comment #8)
> - You said CONFIG_HSA_AMD=y. What's the value of CONFIG_DRM_RADEON ? If its
> "m", could you change it to "y" ?
I'm using a static initrd (just a basic system that contains no kernel modules), so all drivers necessary to start the system (including drm_radeon) are compiled in.
Regarding the tree: I took plain 3.18 (b2776b) and then merged the drm-next branch from the repo mentioned above. iirc, it was a fast forward.
--- Comment #12 from Oded Gabbay oded.gabbay@amd.com --- As I said, there is definitely a bug when compiling both radeon and amdkfd into the kernel. I'm working on fixing it, but that could take a few days. In the meantime, the only way to make it work without touching the code is to compile either both drivers as modules or just radeon as a module.
No need for further experiments.
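
As a rough illustration of the kind of ordering problem described above, here is a small user-space sketch in C. It is not the radeon or amdkfd code: `kfd_init`, `gfx_init`, `run_initcalls` and `struct kfd_state` are hypothetical names invented for this example. It only assumes the general rule that built-in init functions of the same level run in a fixed order, so a driver that depends on another driver's state can run before that state exists.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for state the compute driver sets up in its init. */
struct kfd_state {
    int ready;
};

static struct kfd_state kfd_storage;
static struct kfd_state *kfd; /* stays NULL until kfd_init() has run */

/* Hypothetical stand-in for the compute driver's init function. */
static int kfd_init(void)
{
    kfd_storage.ready = 1;
    kfd = &kfd_storage;
    return 0;
}

/* Hypothetical stand-in for the graphics driver's init, which needs kfd. */
static int gfx_init(void)
{
    if (kfd == NULL) {
        /* Report the problem instead of dereferencing a NULL pointer. */
        fprintf(stderr, "gfx_init: kfd not initialized yet\n");
        return -1;
    }
    return kfd->ready ? 0 : -1;
}

typedef int (*initcall_t)(void);

/* Run init functions strictly in array order; return how many failed. */
static int run_initcalls(const initcall_t *calls, size_t n)
{
    int failures = 0;

    for (size_t i = 0; i < n; i++) {
        if (calls[i]() != 0)
            failures++;
    }
    return failures;
}

/* Forget all state so that a different order can be tried. */
static void reset_state(void)
{
    kfd = NULL;
    kfd_storage.ready = 0;
}
```

In this sketch, calling `run_initcalls` with the order `{ gfx_init, kfd_init }` reports one failure, while `{ kfd_init, gfx_init }` reports none. Building the dependent driver as a module sidesteps the problem in a similar way, because a module is loaded only after the built-in init functions have already run.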
Oded Gabbay oded.gabbay@amd.com changed:
What                           |Removed                     |Added
----------------------------------------------------------------------------
Attachment #160721 is obsolete |0                           |1
Attachment #160751 is obsolete |0                           |1
--- Comment #13 from Oded Gabbay oded.gabbay@amd.com --- Created attachment 160951 --> https://bugzilla.kernel.org/attachment.cgi?id=160951&action=edit workaround for the module order problem
--- Comment #14 from Oded Gabbay oded.gabbay@amd.com --- I attached a new patch which should solve the problem for you when compiling all the drivers into the kernel image. It is a hacky workaround, not the final solution, but I hope it will let you continue with your setup.
Oded Gabbay oded.gabbay@amd.com changed:
What                           |Removed                     |Added
----------------------------------------------------------------------------
Attachment #160951 is obsolete |0                           |1
--- Comment #15 from Oded Gabbay oded.gabbay@amd.com --- Created attachment 160961 --> https://bugzilla.kernel.org/attachment.cgi?id=160961&action=edit hacky workaround for module order problem
--- Comment #16 from Bernd Steinhauser linux@bernd-steinhauser.de --- Thanks, I'll give it a try.
--- Comment #17 from Bernd Steinhauser linux@bernd-steinhauser.de --- Ok, it does now boot and seems to work.
Bernd Steinhauser linux@bernd-steinhauser.de changed:
What               |Removed                     |Added
----------------------------------------------------------------------------
Status             |NEW                         |RESOLVED
Resolution         |---                         |CODE_FIX
--- Comment #18 from Bernd Steinhauser linux@bernd-steinhauser.de --- At some point (I didn't take a closer look at when), this was fixed, and it now works as expected without workarounds. (Tested: 4.5.1)