https://bugs.freedesktop.org/show_bug.cgi?id=104639
Bug ID: 104639
Summary: kernel 4.15-rc8 reboots randomly with RX 560
Product: DRI
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: blocker
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: peteralm1980@gmail.com
Created attachment 136726 --> https://bugs.freedesktop.org/attachment.cgi?id=136726&action=edit dmesg
My workstation, running Arch Linux with an RX 560 GPU, randomly reboots after a few minutes of use with the new amdgpu version in kernel 4.15. It seems to happen mostly during full-screen redraws (such as maximizing a window). It happens regardless of desktop environment (I've tested GNOME on both Wayland and Xorg, as well as KDE). If it's left at the GDM login screen, it doesn't seem to reboot.
Kernel versions 4.14 and older seem to be rock solid.
I've tried the following workarounds, but none of them seems to make any difference:
* Setting amdgpu.dc to both 1 and 0
* Disabling DPM by setting amdgpu.dpm to 0
* Upgrading Mesa from 17.3.2 to the latest git master
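When trying boot-parameter workarounds like these, it can help to confirm which ones the running kernel actually received; /proc/cmdline shows the command line the kernel was booted with. Below is a minimal sketch of a hypothetical helper, `module_params`, that groups `module.param=value` tokens by module. It is not part of any kernel tool, and it assumes simple whitespace-separated tokens (no quoted values containing spaces).

```python
from __future__ import annotations


def module_params(cmdline: str) -> dict[str, dict[str, str]]:
    """Group 'module.param=value' tokens from a kernel command line by module.

    Hypothetical helper: assumes tokens are separated by whitespace and
    does not handle quoted values that contain spaces.
    """
    params: dict[str, dict[str, str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")  # split only on the first '='
        module, dot, name = key.partition(".")
        if not dot or not name:
            continue  # not a module parameter, e.g. root=... or quiet
        # A bare 'module.param' with no '=' is recorded with an empty value.
        params.setdefault(module, {})[name] = value if sep else ""
    return params
```

For example, `module_params(open("/proc/cmdline").read()).get("amdgpu", {})` would list the amdgpu options passed at boot, such as `{"dc": "0", "dpm": "0"}` if both workarounds above were set.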
--- Comment #1 from Michel Dänzer michel@daenzer.net --- Can you try bisecting? Make sure to test each commit for plenty of time before marking it as good.
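Bisecting is a binary search over the commits between a known-good kernel (here 4.14) and a known-bad one (4.15-rc8); `git bisect` drives this in practice. The sketch below shows only the search itself, assuming a linear history and a hypothetical `is_bad` callback that stands in for building, booting, and testing one commit.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in a linear history.

    Assumes len(commits) >= 2, commits[0] is known good, commits[-1] is
    known bad, and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # hypothetical: build, boot, and test this commit
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

Because the reboot here is intermittent, a bad commit that simply did not crash during a short test gets wrongly marked good. In the sketch, that moves `lo` past the real culprit, and the search ends on a later commit, which is why each commit needs a long test before being marked good.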
--- Comment #2 from Peter Alm peteralm1980@gmail.com --- Hi Michel,
After bisecting, the offending commit seems to be:
commit 648bc3574716400acc06f99915815f80d9563783
Author: Christian König christian.koenig@amd.com
Date: Thu Jul 6 09:59:43 2017 +0200
drm/ttm: add transparent huge page support for DMA allocations v2
Try to allocate huge pages when it makes sense.
v2: fix comment and use ifdef
Signed-off-by: Christian König christian.koenig@amd.com
Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
After reverting that commit, 4.15-rc8 seems to work fine.
--- Comment #3 from Christian König ckoenig.leichtzumerken@gmail.com --- Does your kernel also have the following patch?
commit f4c809914a7c3e4a59cf543da6c2a15d0f75ee38
Author: Christian König christian.koenig@amd.com
Date: Mon Oct 9 14:34:13 2017 +0200
drm/ttm: don't use compound pages for now
We need to figure out first how to correctly map them into the CPU page tables.
bug: https://bugs.freedesktop.org/show_bug.cgi?id=103138
Signed-off-by: Christian König christian.koenig@amd.com
Acked-by: Alex Deucher alexander.deucher@amd.com
Signed-off-by: Alex Deucher alexander.deucher@amd.com
--- Comment #4 from Christian König ckoenig.leichtzumerken@gmail.com --- Sorry, I just saw that you wrote you are using 4.15-rc8, which should already include that patch.
No idea what's going wrong here. Could you by any chance add a serial or network console and grab the last log messages before the reboot?
--- Comment #5 from Peter Alm peteralm1980@gmail.com --- Christian,
These are the last messages from a network console. All of them are from before the crash:
[ 90.254437] [drm] {3840x2160, 4000x2222@533250Khz}
[ 91.026831] [drm] U24E590: [Block 0]
[ 91.028144] [drm] U24E590: [Block 1]
[ 91.029479] [drm] dc_link_detect: manufacturer_id = 2D4C, product_id = CD3, serial_number = 304D5844, manufacture_week = 50, manufacture_year = 26, display_name = U24E590, speaker_flag = 1, audio_mode_count = 1
[ 91.032187] [drm] dc_link_detect: mode number = 0, format_code = 1, channel_count = 1, sample_rate = 7, sample_size = 7
[ 91.033565] [drm] {3840x2160, 4000x2222@533250Khz}
[ 91.049360] [drm] {3840x2160, 4400x2250@594000Khz}
[ 102.531453] input: Surface Mouse as /devices/virtual/misc/uhid/0005:045E:0919.0006/input/input21
[ 102.531670] hid-generic 0005:045E:0919.0006: input,hidraw5: BLUETOOTH HID v1.10 Mouse [Surface Mouse] on 00:1A:7D:DA:71:15
[ 102.537616] mousedev: PS/2 mouse device common for all mice
[ 106.699544] fuse init (API version 7.26)
[ 106.974151] usb 1-3.3: 1:1: cannot get freq at ep 0x81
[ 107.942126] rfkill: input handler disabled
[ 108.240100] ISO 9660 Extensions: RRIP_1991A
I don't have a serial dongle here, so I can't use a serial console. At one point I ran the console on the integrated Intel GPU (I usually have it disabled in the BIOS) while running Xorg on the AMD GPU, but no messages appeared there either.
If you come up with any ideas or patches I'm happy to try them out.
--- Comment #6 from fin4478@hotmail.com --- Do not use mainline kernels and point-release Mesa. As you can see, they behave randomly because the drivers are only partially implemented. Use the kernel linked below and development Mesa from git. Debian testing/sid with Xfce is easier and more stable than Arch Linux with the never-ready, buggy KDE and GNOME desktops. The bionic version of the Oibaf Mesa git PPA is compatible with Debian testing/sid.
https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-4.17-wip
I have no problems with my RX560 using the latest code.
Martin Peres martin.peres@free.fr changed:
What      |Removed |Added
----------------------------------------------------------------------------
Status    |NEW     |RESOLVED
Resolution|---     |MOVED
--- Comment #7 from Martin Peres martin.peres@free.fr --- -- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/298.