https://bugzilla.kernel.org/show_bug.cgi?id=203879
Bug ID: 203879
Summary: hard freeze on high single threaded load when Xorg is active (AMD Ryzen 7 2700X CPU, AMD Radeon RX 580 GPU)
Product: Drivers
Version: 2.5
Kernel Version: 4.19.37-3 (Debian 4.19.0-5-amd64) and others (including mainline versions)
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri@kernel-bugs.osdl.org
Reporter: claude@mathr.co.uk
Regression: No
Created attachment 283233
  --> https://bugzilla.kernel.org/attachment.cgi?id=283233&action=edit
dmesg from 4.19.0-5-amd64 with amdgpu.dc=1 (no freeze yet)
I am developing a CPU-based program to render fractals, which I usually run with "nice -n 20". The main calculations are multi-threaded, using 16 threads on AMD Ryzen 7 2700X Eight-Core Processor. However, final image PNG saving is single-threaded. During the single-threaded workload only (as observed by htop and program status prints), it can happen that the system freezes hard (no ssh, stuck mouse pointer, no NumLock LED toggle, no magic SysRq, only physical power button for hard power-off works).
This freeze only happens when Xorg is running on the active virtual terminal: I tried to see if some kernel log messages would be displayed before freeze by switching to a console with Ctrl-Alt-F1 after launching my program, but with the terminal active it doesn't seem to freeze.
The freeze does not always occur, but it usually happens before a dozen images are saved (the process alternates between a fully multi-threaded workload and a single-threaded workload, repeated for each image). This can take a few hours.
With the virtual terminal active instead of Xorg, I have rendered 100+ images in a row without any issues. Of course, I can't use other X applications at the same time, so this is an annoying workaround.
I mostly run the regular Debian Buster kernel but I have had this freeze occur with other self-compiled kernels of various versions (newer than the Debian kernel, without Debian patches). I also had the freeze with both amdgpu.dc=1 (default) and amdgpu.dc=0 options.
$ uname -a
Linux eiskaffee 4.19.0-5-amd64 #1 SMP Debian 4.19.37-3 (2019-05-15) x86_64 GNU/Linux

$ apt-cache policy linux-image-4.19.0-5-amd64
linux-image-4.19.0-5-amd64:
  Installed: 4.19.37-3
  Candidate: 4.19.37-3
  Version table:
 *** 4.19.37-3 990
        990 http://ftp.uk.debian.org/debian buster/main amd64 Packages
        500 http://ftp.uk.debian.org/debian unstable/main amd64 Packages
        100 /var/lib/dpkg/status
--- Comment #1 from Claude Heiland-Allen (claude@mathr.co.uk) ---
My conjecture that inactive Xorg prevents freeze is false: got a system freeze with virtual terminal active, Xorg running on inactive VT. No kernel messages were printed :( Now running a test without Xorg running at all.
--- Comment #2 from Michel Dänzer (michel@daenzer.net) ---
That sounds like a general CPU related stability issue, not directly related to the amdgpu driver.
Alex Deucher (alexdeucher@gmail.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alexdeucher@gmail.com
--- Comment #3 from Alex Deucher (alexdeucher@gmail.com) ---
Does appending idle=nomwait to the kernel command line in grub help?
--- Comment #4 from Claude Heiland-Allen (claude@mathr.co.uk) ---
Created attachment 283313
  --> https://bugzilla.kernel.org/attachment.cgi?id=283313&action=edit
dmesg after boot with idle=nomwait (before freeze which occurred some hours later)
I got one freeze so far, after about an hour on my workload with idle=nomwait. I am trying a second time just to verify that it doesn't help.
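Before concluding that idle=nomwait has no effect, it may be worth confirming that the option actually reached the running kernel. The following is a minimal sketch, not something posted in this report: the helper name has_kernel_param is hypothetical, and it only does an exact whole-word match against the contents of /proc/cmdline. On Debian-style GRUB setups the option is usually added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by running update-grub and rebooting.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): succeeds if the
# kernel command line string in $2 contains the exact parameter in $1.
has_kernel_param() {
    # Pad with spaces so that only whole, space-separated words match.
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Check the command line of the currently running kernel, if readable.
if [ -r /proc/cmdline ]; then
    if has_kernel_param idle=nomwait "$(cat /proc/cmdline)"; then
        echo "idle=nomwait is active"
    else
        echo "idle=nomwait is NOT active"
    fi
fi
```

If the second message is printed after a reboot, the GRUB configuration was probably not regenerated, and the earlier test run would not say anything about idle=nomwait.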
Claude Heiland-Allen (claude@mathr.co.uk) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|hard freeze on high single  |hard freeze on high single
                   |threaded load when Xorg is  |threaded load (AMD Ryzen 7
                   |active (AMD Ryzen 7 2700X   |2700X CPU)
                   |CPU, AMD Radeon RX 580 GPU) |
--- Comment #5 from Claude Heiland-Allen (claude@mathr.co.uk) ---
(In reply to Michel Dänzer from comment #2)
> That sounds like a general CPU related stability issue, not directly related to the amdgpu driver.

The later tests make me agree. I have changed the title of the report, but I am not sure which Product/Component would be more appropriate.
Adding more system monitoring seems to prevent the condition, perhaps due to the added CPU load:
watch -n 0.1 sensors
watch -n 0.1 "cat /proc/cpuinfo | grep MHz"
It freezes during PNG saving of a large image; presumably this involves lots of sequential RAM access. I have XMP enabled in my motherboard BIOS settings, IIRC. Perhaps I should try disabling it?
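Since the freeze leaves no kernel messages behind, one option is to record the monitoring output to disk instead of only watching it. The sketch below is an assumption on my part, not something tried in this report: the function names cpu_mhz and log_sample and the log path are hypothetical, and the sampling interval is deliberately much lower than the 0.1 s used with watch above, because the added load from fast polling appeared to mask the problem.

```shell
#!/bin/sh
# Hypothetical helper: read cpuinfo-formatted text on stdin and print the
# "cpu MHz" values of all cores on a single space-separated line.
cpu_mhz() {
    awk -F': *' '/^cpu MHz/ { printf "%s%s", sep, $2; sep = " " } END { print "" }'
}

# Hypothetical helper: append one timestamped sample to the log file in $1.
log_sample() {
    printf '%s %s\n' "$(date +%s)" "$(cpu_mhz < /proc/cpuinfo)" >> "$1"
    sync   # flush to disk so the last sample survives a hard power-off
}
```

A loop such as `while sleep 1; do log_sample "$HOME/freeze-mhz.log"; done` (path chosen only for illustration) would then leave the last recorded core clocks on disk after a freeze. Whether this slower polling still avoids triggering the freeze is not known.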
Sam Bazley (sambazley@protonmail.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sambazley@protonmail.com
--- Comment #6 from Sam Bazley (sambazley@protonmail.com) ---
I think this bug is also affecting me on Arch with the 5.2 kernel, since my computer is completely freezing when compiling with -j`nproc`. I've bisected, and found 004b3938e6374f39d43cc32bd4953f2fe8b8905b to be the first bad commit.
--- Comment #7 from Sam Bazley (sambazley@protonmail.com) ---
(I've got a 2700X and a Vega 64, if that helps)
--- Comment #8 from Sam Bazley (sambazley@fastmail.com) ---
Created attachment 283739
  --> https://bugzilla.kernel.org/attachment.cgi?id=283739&action=edit
dmesg after crash
Retrieved the dmesg log with ssh after the crash.
--- Comment #9 from Sam Bazley (sambazley@fastmail.com) ---
I've realised that I am actually being affected by this bug: https://bugzilla.kernel.org/show_bug.cgi?id=204181
Please disregard my previous comments.