https://bugzilla.kernel.org/show_bug.cgi?id=199653
Bug ID: 199653
Summary: [AMDGPU][DC] BUG: unable to handle kernel NULL pointer dereference (trace decoded)
Product: Drivers
Version: 2.5
Kernel Version: drm-next-4.18-wip (agd5f, AMDGPU)
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: high
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: marcus.husar@gmail.com
Regression: No
Created attachment 275827 --> https://bugzilla.kernel.org/attachment.cgi?id=275827&action=edit Call trace of crash decoded with decode_stacktrace.sh
kernel: BUG: unable to handle kernel NULL pointer dereference at 00000000000002e4
This happens multiple times a day on my machine. It leads to a complete system freeze. Yesterday I was lucky and got a stack trace.
It mostly happens while browsing the web with Firefox (WebRender enabled, XWayland, Gnome-Shell) when the cursor moves or rotates, but it can happen anywhere and at any time.
The kernel I am running is from branch drm-next-4.18-wip@92fb374 of Alex Deucher, AMD (agd5f). See: git://people.freedesktop.org/~agd5f/linux.
My machine:
* Hardware name: Acer Swift SF315-41/Becks_RR, BIOS V1.04 01/09/2018
* Ryzen Mobile 2500U
* Firmware: VCN: 1.73 (latest available version)
* My kernel is tainted because the i2c designware driver emits a warning during boot. This should be unrelated to AMDGPU (see attachment i2c_designware_trace.txt).
Please ask if anything else is needed.
--- Comment #1 from Marcus Husar (marcus.husar@gmail.com) --- Created attachment 275829 --> https://bugzilla.kernel.org/attachment.cgi?id=275829&action=edit Original call trace of crash
--- Comment #2 from Marcus Husar (marcus.husar@gmail.com) --- Created attachment 275831 --> https://bugzilla.kernel.org/attachment.cgi?id=275831&action=edit Journal log with amdgpu.dc_log=1 drm.debug=6
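The journal above was captured with drm.debug=6. That parameter is a bitmask of DRM debug categories, so 6 (0x2 | 0x4) selects two of them. The helper below is a hypothetical sketch that decodes such a mask, assuming the DRM_UT_* bit values found in the kernel's DRM headers of this period (CORE=0x01 through LEASE=0x80); newer kernels add further categories that it does not know about.

```python
# Assumed DRM_UT_* debug category bits of the drm.debug module parameter,
# as defined in the kernel's DRM headers around the 4.18 era.
DRM_DEBUG_CATEGORIES = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
}


def decode_drm_debug(mask: int) -> list:
    """Return the category names enabled by a drm.debug mask, lowest bit first."""
    return [name for bit, name in sorted(DRM_DEBUG_CATEGORIES.items()) if mask & bit]


if __name__ == "__main__":
    # drm.debug=6 as used for the attached journal log.
    print("drm.debug=6 enables:", decode_drm_debug(6))
```

With the value from this report, the sketch reports the DRIVER and KMS categories.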
--- Comment #3 from Marcus Husar (marcus.husar@gmail.com) --- Created attachment 275833 --> https://bugzilla.kernel.org/attachment.cgi?id=275833&action=edit Call trace of i2c designware warning (taints kernel)
James Le Cuirot (chewi@gentoo.org) changed:
What |Removed |Added
----------------------------------
CC   |        |chewi@gentoo.org
--- Comment #4 from James Le Cuirot (chewi@gentoo.org) --- I have almost exactly the same hardware, but the Ryzen 7 (2700U) version. I also get multiple freezes a day. Many thanks for the additional info; I never managed to get any. I did try kdump, but it never triggered, even when I forced a kernel panic. I'm now running OpenSUSE 15.0 with kernels from the "stable" repository (currently 4.16.12-1.g39c7522) and recent Mesa 18.2 prerelease builds.
--- Comment #5 from James Le Cuirot (chewi@gentoo.org) --- After seeing amdgpu.vm_update_mode=3 mentioned in bug #199749, I tried it but it didn't help.
Andrey Grodzovsky (andrey.grodzovsky@amd.com) changed:
What |Removed |Added
------------------------------------------
CC   |        |andrey.grodzovsky@amd.com
--- Comment #6 from Andrey Grodzovsky (andrey.grodzovsky@amd.com) --- Those two are unrelated bugs.
--- Comment #7 from James Le Cuirot (chewi@gentoo.org) --- I had high hopes for 4.18-rc1, but alas it froze after a few hours. :( I know having the latest firmware is important, so I have that too. These OpenSUSE packages (some unofficial) are installed:
* kernel-default-4.18.rc1-1.1.gfa9e020
* kernel-firmware-20180606-35.1
* libdrm2-2.4.99~git20180511-lp150.1.1
* Mesa-18.2.0~git20180619-lp150.16.1
--- Comment #8 from James Le Cuirot (chewi@gentoo.org) --- OP also filed a freedesktop.org bug report with more information.
https://bugs.freedesktop.org/104817
Still the same with 4.18-rc4. :(
--- Comment #9 from Marcus Husar (marcus.husar@gmail.com) ---
From the freedesktop.org bug:
It seems to me that this is in fact a CPU-related problem. Since July 25 I haven't had any problems; my system is pretty stable. What helped was adding idle=nomwait to my GRUB command line. This fixed the problems for me.
Please try to add idle=nomwait to your GRUB command line. I think this bug can be closed.
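On a typical GRUB setup the parameter is appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and the configuration is then regenerated (for example with grub2-mkconfig on openSUSE or update-grub on Debian-based systems; the exact command and output path depend on the distribution). After rebooting, /proc/cmdline shows the command line the running kernel was actually booted with. The following is a minimal sketch of such a check; parse_cmdline and has_idle_nomwait are hypothetical helpers, and the parsing is simplified (later duplicates overwrite earlier ones, quoting handled via shlex).

```python
import os
import shlex


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None.

    Simplification: a later occurrence of a parameter overwrites an earlier one.
    """
    params = {}
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def has_idle_nomwait(cmdline: str) -> bool:
    """Return True if the idle=nomwait workaround is on the command line."""
    return parse_cmdline(cmdline).get("idle") == "nomwait"


if __name__ == "__main__":
    path = "/proc/cmdline"  # Linux-specific view of the booted command line
    if os.path.exists(path):
        with open(path) as f:
            current = f.read().strip()
        print("idle=nomwait active:", has_idle_nomwait(current))
    else:
        print("No /proc/cmdline found; not running on Linux?")
```

Reading /proc/cmdline rather than /etc/default/grub confirms that the regenerated boot configuration was actually used.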
--- Comment #10 from James Le Cuirot (chewi@gentoo.org) --- I added idle=nomwait recently, and that has fixed it for me too. I thought I had already tried this, though I'm not sure; perhaps there were two issues and the other one has since been fixed.
Marcus Husar (marcus.husar@gmail.com) changed:
What       |Removed |Added
-----------------------------------
Status     |NEW     |RESOLVED
Resolution |---     |INVALID
--- Comment #11 from Marcus Husar (marcus.husar@gmail.com) --- See comments #9 and #10. The kernel parameter idle=nomwait fixed this bug for me. It seems to be a CPU-related problem.
dri-devel@lists.freedesktop.org