https://bugs.freedesktop.org/show_bug.cgi?id=104817
Bug ID: 104817
Summary: [Raven][GALLIUM_DDEBUG] system crashes/freezes randomly every few minutes/hours
Product: Mesa
Version: git
Hardware: All
OS: Linux (All)
Status: NEW
Severity: critical
Priority: medium
Component: Drivers/Gallium/radeonsi
Assignee: dri-devel@lists.freedesktop.org
Reporter: marcus.husar@gmail.com
QA Contact: dri-devel@lists.freedesktop.org
Created attachment 137000 (https://bugs.freedesktop.org/attachment.cgi?id=137000&action=edit): GALLIUM_DDEBUG: folder ddebug_dumps with multiple dumps
OpenGL renderer string: AMD RAVEN (DRM 3.23.0 / 4.16.0-2.fc27.x86_64, LLVM 6.0.0)
My system is an Acer SF315-41 (Ryzen Mobile 5 2500U) with Fedora 27, Kernel 4.16-drm-next (based on 4.15-rc8), LLVM 6.0.0-rc1, Mesa 18.0.0-rc2.
I can reproduce these crashes from kernel-4.15-rcX/mesa-17.3/llvm5 to kernel-4.16-drm-next/mesa-18-rc2/llvm6-rc1 and in between. They mostly appear while watching videos (firefox/totem), switching tabs in firefox, resizing windows (gnome-shell) or gaming.
With the kernel parameter amdgpu.lockup_timeout=2000 and the Mesa environment variable GALLIUM_DDEBUG=2000, I was able to gather lots of dumps within a few minutes (see attachment). As the dumps show, the GPU lockup sometimes results in a CPU lockup (a kernel Bluetooth deadlock) as a consequence of gnome-shell freezing completely. I can also reproduce the amdgpu crashes with a USB mouse and with Bluetooth disabled.
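Note that the two settings live in different places: amdgpu.lockup_timeout is a kernel boot parameter, while GALLIUM_DDEBUG is read from the environment by Mesa's Gallium ddebug wrapper, so it must be set for the process being debugged. The sketch below is a minimal illustration, assuming Python 3 and that only a hang timeout in milliseconds is passed; the helper names ddebug_env and run_with_ddebug are hypothetical and not part of Mesa.

```python
import os
import subprocess
from typing import List, Mapping, Optional


def ddebug_env(timeout_ms: int, base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of `base` (default: os.environ) with GALLIUM_DDEBUG set.

    The original mapping is never modified; only the copy gets the variable.
    """
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")
    env = dict(os.environ if base is None else base)
    # Mesa's ddebug wrapper reads this variable at driver load time.
    env["GALLIUM_DDEBUG"] = str(timeout_ms)
    return env


def run_with_ddebug(argv: List[str], timeout_ms: int = 2000) -> int:
    """Run `argv` with GALLIUM_DDEBUG set and return its exit status."""
    return subprocess.run(argv, env=ddebug_env(timeout_ms)).returncode
```

For example, run_with_ddebug(["totem", "video.webm"]) (a hypothetical file name) would start the player under ddebug; in this report the resulting dumps ended up in a ddebug_dumps folder. Copying the environment instead of editing os.environ keeps the debug setting from leaking into unrelated child processes.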
Only occasionally do I find kernel errors in the log files after a crash. I’ll attach the few I found in the last two weeks.
--- Comment #1 from Marcus Husar marcus.husar@gmail.com --- Created attachment 137001 (https://bugs.freedesktop.org/attachment.cgi?id=137001&action=edit): kernel: [drm:amdgpu_job_timedout [amdgpu]]
--- Comment #2 from Marcus Husar marcus.husar@gmail.com --- Created attachment 137002 (https://bugs.freedesktop.org/attachment.cgi?id=137002&action=edit): kernel: amdgpu [gfxhub] VMC page fault (1)
--- Comment #3 from Marcus Husar marcus.husar@gmail.com --- Created attachment 137003 (https://bugs.freedesktop.org/attachment.cgi?id=137003&action=edit): kernel: amdgpu [gfxhub] VMC page fault (2)
--- Comment #4 from Bráulio Barros de Oliveira brauliobo@gmail.com --- Same here with an AMD 2500U on an HP Envy x360. Details at:
- https://bugzilla.redhat.com/show_bug.cgi?id=1562530
- https://lists.freedesktop.org/archives/amd-gfx/2018-March/020580.html
--- Comment #5 from Justin Mitzel katoflip@protonmail.com --- I am also having this problem: Ryzen 2500U on kernel 4.16-drm-next. Many hangs that require a reboot to recover.
--- Comment #6 from Justin Mitzel katoflip@protonmail.com --- That said, it also seems very likely that this is a kernel driver issue.
James Le Cuirot chewi@gentoo.org changed:
What | Removed | Added
----------------------------------------------------------------------------
CC   |         | chewi@gentoo.org
--- Comment #7 from James Le Cuirot chewi@gentoo.org --- OP also filed a kernel bug about this, but that report lacks the crucial information about how he was able to debug it! Glad I found this one.
https://bugzilla.kernel.org/show_bug.cgi?id=199653
--- Comment #8 from Marcus Husar marcus.husar@gmail.com --- It seems to me that this is in fact a CPU-related problem. Since July 25 I haven’t had any problems, and my system is pretty stable. What fixed it for me was adding idle=nomwait to my GRUB command line.
Please try adding idle=nomwait to your GRUB command line. I think this bug can be closed.
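After adding the parameter and rebooting, you can confirm that the running kernel actually received it by reading /proc/cmdline. The following is a minimal sketch, assuming Python 3; parse_cmdline, has_nomwait and read_cmdline are hypothetical helpers, and the parser is a simplification (shell-style quoting, last duplicate wins) rather than the kernel's own parsing logic.

```python
import shlex
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None.

    A later occurrence of a parameter overwrites an earlier one, which is a
    simplification of how individual kernel options treat duplicates.
    """
    params: Dict[str, Optional[str]] = {}
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def has_nomwait(cmdline: str) -> bool:
    """True if the command line passes idle=nomwait."""
    return parse_cmdline(cmdline).get("idle") == "nomwait"


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Return the booted kernel's command line (Linux only)."""
    with open(path) as f:
        return f.read()
```

On an affected machine, has_nomwait(read_cmdline()) should return True once the workaround is active, and parse_cmdline(read_cmdline()).get("amdgpu.lockup_timeout") shows whether the debugging timeout from the original report is still set.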
--- Comment #9 from James Le Cuirot chewi@gentoo.org --- I added idle=nomwait recently and that has fixed it for me too. I thought I had already tried this, though I’m not sure; perhaps there were two issues and the other has since been fixed.
Marcus Husar marcus.husar@gmail.com changed:
What       | Removed | Added
----------------------------------------------------------------------------
Status     | NEW     | RESOLVED
Resolution | ---     | WORKSFORME
--- Comment #10 from Marcus Husar marcus.husar@gmail.com --- See comment #8. The kernel parameter idle=nomwait fixed this bug for me. It seems to be a CPU-related problem.