[Bug 107154] [drm] GPU recovery disabled. - dri-devel - freedesktop.org experimental mailing list

8 Jul 2018


      https://bugs.freedesktop.org/show_bug.cgi?id=107154
Bug ID: 107154
           Summary: [drm] GPU recovery disabled.
           Product: DRI
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: freedesktop.org@nentwig.biz
Hi!
This is a surprisingly long-standing problem with an RX 460: it has been
present since 4.15, all the way up to 4.18 AMD staging DRM next [1].
After resuming from sleep (echo -n mem > /sys/power/state) amdgpu is dead
(always, reliably).
Here's what dmesg has to say about it:
[Sun Jul  8 11:01:17 2018] PM: suspend exit
[Sun Jul  8 11:01:19 2018] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[Sun Jul  8 11:01:19 2018] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on GFX ring (-110).
[Sun Jul  8 11:01:19 2018] [drm:process_one_work] *ERROR* ib ring test failed (-110).
[Sun Jul  8 11:01:28 2018] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=864, last emitted seq=868
[Sun Jul  8 11:01:28 2018] [drm] GPU recovery disabled.
...
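For anyone comparing logs across kernels, here is a minimal sketch that pulls the sequence numbers out of the "ring ... timeout" line and reports how many submissions were still outstanding when the job timed out. The function name parse_ring_timeout and the regular expression are my own and hypothetical, not part of amdgpu. They assume the message format shown above.

```python
import re

# Hypothetical helper: matches the amdgpu_job_timedout message as it appears
# in the dmesg output above. \s+ tolerates lines wrapped by the mail client.
TIMEOUT_RE = re.compile(
    r"ring\s+(\S+)\s+timeout,\s+last\s+signaled\s+seq=(\d+),"
    r"\s+last\s+emitted\s+seq=(\d+)"
)


def parse_ring_timeout(text):
    """Return ring name, both sequence numbers and their difference, or None."""
    match = TIMEOUT_RE.search(text)
    if match is None:
        return None
    signaled = int(match.group(2))
    emitted = int(match.group(3))
    return {
        "ring": match.group(1),
        "signaled": signaled,
        "emitted": emitted,
        # Submissions emitted to the ring but not yet signaled as complete.
        "pending": emitted - signaled,
    }


if __name__ == "__main__":
    line = ("[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, "
            "last signaled seq=864, last emitted seq=868")
    print(parse_ring_timeout(line))
```

For the line above this gives ring "gfx" with 868 - 864 = 4 submissions still pending.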
From earlier versions:
[   42.802559] PM: suspend exit
[   42.824332] amdgpu 0000:41:00.0: GPU fault detected: 147 0x0bd84802
[   42.824338] amdgpu 0000:41:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0034F97B
[   42.824341] amdgpu 0000:41:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C048002
[   42.824345] amdgpu 0000:41:00.0: VM fault (0x02, vmid 6) at page 3471739, read from 'TC0' (0x54433000) (72)
[   52.956306] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=1287, last emitted seq=1289
[   52.956316] [drm] IP block:gfx_v8_0 is hung!
[   52.956362] [drm] GPU recovery disabled.
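To show how the raw register values relate to the decoded "VM fault" line, here is a rough sketch that unpacks VM_CONTEXT1_PROTECTION_FAULT_STATUS and the memory client word. The bit positions are an assumption on my part (protections in bits 0-7, client id in bits 12-20, read/write flag in bit 24, vmid in bits 25-28). I chose them because they reproduce the values the driver printed above, not because I checked them against the register headers. Both function names are hypothetical.

```python
def decode_fault_status(status):
    """Split the fault status word into fields (bit layout is an assumption)."""
    return {
        "protections": status & 0xFF,          # assumed bits 0-7
        "client_id": (status >> 12) & 0x1FF,   # assumed bits 12-20
        "is_write": bool((status >> 24) & 1),  # assumed bit 24, 0 means read
        "vmid": (status >> 25) & 0xF,          # assumed bits 25-28
    }


def decode_mc_client(mc_client):
    """Interpret the 32-bit client word as up to four ASCII characters."""
    raw = mc_client.to_bytes(4, "big")
    return raw.rstrip(b"\x00").decode("ascii")


if __name__ == "__main__":
    fields = decode_fault_status(0x0C048002)
    print(fields)
    print(decode_mc_client(0x54433000))
    # The page number in the log is the FAULT_ADDR value printed in decimal.
    print(0x0034F97B)
```

With the values from the log this yields protections 0x02, vmid 6, client id 72, a read access and the client name TC0, and 0x0034F97B is 3471739 in decimal, all matching the decoded line.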
I've also seen fault 146 but other than that it mostly looks the same. 4.14-lts
(with dc=0) works fine.
RX 460, Zenith Extreme, 1950x.
[1] Arch Linux AUR; the versioning is a bit confusing, and it may actually
already be the 4.19 branch. The latest commit is
3838e387fd1eb17bfcf6ff7d443d931adb5cb41b