[Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!

21 Aug 2018


      https://bugs.freedesktop.org/show_bug.cgi?id=102322
--- Comment #57 from Andrey Grodzovsky andrey.grodzovsky@amd.com ---
(In reply to dwagner from comment #56)
...
(In reply to Andrey Grodzovsky from comment #55)
...
...
In the attached file above, "xz-compressed output of gpu_debug3.sh", there is
umr output at the time of the crash (238 seconds after the reboot):

...
          mpv/vo-897   [005] ....   235.191542: dma_fence_wait_start: driver=drm_sched timeline=gfx context=162 seqno=87
          mpv/vo-897   [005] d...   235.191548: dma_fence_enable_signal: driver=drm_sched timeline=gfx context=162 seqno=87
     kworker/0:2-92    [000] ....   238.275988: dma_fence_signaled: driver=amdgpu timeline=sdma1 context=11 seqno=210
     kworker/0:2-92    [000] ....   238.276004: dma_fence_signaled: driver=amdgpu timeline=sdma1 context=11 seqno=211
[  238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=32624, emitted seq=32626
[  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
[  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
crash detected!
executing umr -O halt_waves -wa
No active waves!
Did you use the amdgpu.vm_fault_stop=2 parameter? If a fault happened, that
should have frozen the GPU's compute units, and hence the above command would
produce a lot of wave info.
Yes I did, as can be seen from the kernel command line at the very beginning
of the file I attached:
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-linux_amd
root=UUID=b5d56e15-18f3-4783-af84-bbff3bbff3ef rw
cryptdevice=/dev/nvme0n1p2:root:allow-discards libata.force=1.5 video=DP-1:d
video=DVI-D-1:d video=HDMI-A-1:1024x768 amdgpu.dc=1 amdgpu.vm_update_mode=0
amdgpu.dpm=-1 amdgpu.ppfeaturemask=0xffffffff amdgpu.vm_fault_stop=2
amdgpu.vm_debug=1
Could the "amdgpu 0000:0a:00.0: GPU reset begin!" message indicate a
procedure that discards whatever has been in those "waves" before? If yes,
could amdgpu.gpu_recovery=0 prevent that from happening?
Yes, missed that one. No resets.
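
For anyone repeating this capture, a quick way to confirm which amdgpu
parameters were actually passed at boot is to parse the kernel command line.
The following is only a minimal sketch: the helper names amdgpu_params and
debug_setup_report are hypothetical, and the sample string is the command line
quoted earlier in this comment.

```python
def amdgpu_params(cmdline: str) -> dict:
    """Collect amdgpu.* parameters from a kernel command line string."""
    params = {}
    for token in cmdline.split():
        if token.startswith("amdgpu.") and "=" in token:
            key, value = token[len("amdgpu."):].split("=", 1)
            params[key] = value
    return params


def debug_setup_report(cmdline: str) -> dict:
    """Report whether the two settings discussed in this thread are present."""
    params = amdgpu_params(cmdline)
    return {
        "vm_fault_stop=2": params.get("vm_fault_stop") == "2",
        "gpu_recovery=0": params.get("gpu_recovery") == "0",
    }


# Command line as quoted in this comment (gpu_recovery is not set there).
CMDLINE = (
    "BOOT_IMAGE=/vmlinuz-linux_amd "
    "root=UUID=b5d56e15-18f3-4783-af84-bbff3bbff3ef rw "
    "cryptdevice=/dev/nvme0n1p2:root:allow-discards libata.force=1.5 "
    "video=DP-1:d video=DVI-D-1:d video=HDMI-A-1:1024x768 amdgpu.dc=1 "
    "amdgpu.vm_update_mode=0 amdgpu.dpm=-1 amdgpu.ppfeaturemask=0xffffffff "
    "amdgpu.vm_fault_stop=2 amdgpu.vm_debug=1"
)

if __name__ == "__main__":
    for setting, present in debug_setup_report(CMDLINE).items():
        print(f"{setting}: {'set' if present else 'missing'}")
```

On a running system the same check can be fed the contents of /proc/cmdline
instead of the sample string.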
