https://bugs.freedesktop.org/show_bug.cgi?id=110457
Bug ID: 110457
Summary: System resumes failed and hits [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout on Acer Squirtle_SR laptop
Product: DRI
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: major
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: jian-hong@endlessm.com
Created attachment 144006
  --> https://bugs.freedesktop.org/attachment.cgi?id=144006&action=edit
dmesg with amdgpu.dc=1 drm.debug=7 in boot command
We have an Acer Squirtle_SR laptop equipped with AMD A9-9420e RADEON R5, 5 COMPUTE CORES 2C+3G and [AMD/ATI] Topaz XT [Radeon R7 M260/M265 / M340/M360 / M440/M445] [1002:6900]. We test it with Linux kernel 5.1.0-rc5+.
The kernel includes the patch [1] mentioned in comment 110360#c9 [2].
The screen stays black after the system resumes from suspend.
The following error keeps appearing in dmesg:
[ 177.401716] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=290, emitted seq=294
[ 177.401848] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 569 thread Xorg:cs0 pid 571
[ 177.401855] [drm] IP block:gfx_v8_0 is hung!
[ 177.401932] [drm] GPU recovery disabled.
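The gap between the signaled and emitted sequence numbers in the timeout line indicates how many fences were still outstanding on the ring when the job timed out. A minimal Python sketch for pulling those numbers out of a dmesg line follows. The regular expression and the function name `parse_ring_timeout` are illustrative assumptions derived only from the log format quoted in this report; they are not part of any amdgpu tooling.

```python
import re

# Pattern assumed from the "ring gfx timeout" lines quoted in this report.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)

def parse_ring_timeout(line):
    """Return (ring, signaled, emitted, pending), or None if the line does not match."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    signaled = int(m.group("signaled"))
    emitted = int(m.group("emitted"))
    # pending = fences emitted to the ring but not yet signaled
    return m.group("ring"), signaled, emitted, emitted - signaled

line = ("[ 177.401716] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
        "ring gfx timeout, signaled seq=290, emitted seq=294")
print(parse_ring_timeout(line))  # ('gfx', 290, 294, 4)
```

Run over a whole dmesg dump, the same function could be used to check whether the pending count stays constant between reports, which would suggest the ring is stuck on the same submission.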
01:00.0 Display controller [0380]: Advanced Micro Devices, Inc. [AMD/ATI] Topaz XT [Radeon R7 M260/M265 / M340/M360 / M440/M445] [1002:6900] (rev c3)
        Subsystem: Acer Incorporated [ALI] Topaz XT [Radeon R7 M260/M265 / M340/M360 / M440/M445] [1025:1217]
        Flags: bus master, fast devsel, latency 0, IRQ 40
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=2M]
        I/O ports at 3000 [size=256]
        Memory at d1400000 (32-bit, non-prefetchable) [size=256K]
        Expansion ROM at d1440000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [270] #19
        Capabilities: [2b0] Address Translation Service (ATS)
        Capabilities: [2c0] Page Request Interface (PRI)
        Capabilities: [2d0] Process Address Space ID (PASID)
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu
[1] https://patchwork.kernel.org/patch/10889269/
[2] https://bugzilla.freedesktop.org/show_bug.cgi?id=110360#c9
--- Comment #1 from jian-hong@endlessm.com ---
Created attachment 144007
  --> https://bugs.freedesktop.org/attachment.cgi?id=144007&action=edit
dmesg with amdgpu.dc=1 drm.debug=7 amdgpu.runpm=0 in boot command
Also tried with amdgpu.runpm=0 in the boot command. However, it still hits the same error.
[ 78.078762] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=290, emitted seq=294
[ 78.078897] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 572 thread Xorg:cs0 pid 588
[ 78.078908] [drm] IP block:gfx_v8_0 is hung!
[ 78.079079] [drm] GPU recovery disabled.
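When comparing runs with different boot parameters, it can help to confirm which options the kernel actually received. The Python sketch below splits a kernel command line into key/value pairs. The helper name `parse_cmdline` and the example string (including `quiet`) are assumptions made for illustration; on a running system the real string would normally be read from /proc/cmdline.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict; flags without '=' map to None.

    Simplification: quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params

# Hypothetical example string, not taken from the attached logs.
example = "quiet amdgpu.dc=1 drm.debug=7 amdgpu.runpm=0"
params = parse_cmdline(example)
for name in ("amdgpu.dc", "drm.debug", "amdgpu.runpm"):
    print(name, "=", params.get(name, "<not set>"))
```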
--- Comment #2 from jian-hong@endlessm.com ---
Created attachment 144008
  --> https://bugs.freedesktop.org/attachment.cgi?id=144008&action=edit
lspci -nnv on Acer Squirtle_SR
--- Comment #3 from jian-hong@endlessm.com ---
Created attachment 144030
  --> https://bugs.freedesktop.org/attachment.cgi?id=144030&action=edit
dmesg with amdgpu.dc=1 drm.debug=7 in boot command on Acer TravelMate B114-21
We have another laptop, an Acer TravelMate B114-21, which hits the same issue. It is equipped with AMD A4-9120C RADEON R4, 5 COMPUTE CORES 2C+3G.
[ 60.011965] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=206, emitted seq=208
[ 60.012215] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 1388 thread gnome-shel:cs0 pid 1409
[ 60.012226] [drm] IP block:gfx_v8_0 is hung!
[ 60.012320] [drm] GPU recovery disabled.
00:01.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Stoney [Radeon R2/R3/R4/R5 Graphics] [1002:98e4] (rev eb) (prog-if 00 [VGA controller])
        Subsystem: Acer Incorporated [ALI] Stoney [Radeon R2/R3/R4/R5 Graphics] [1025:132a]
        Flags: bus master, fast devsel, latency 0, IRQ 36
        Memory at e8000000 (64-bit, prefetchable) [size=128M]
        Memory at f0000000 (64-bit, prefetchable) [size=8M]
        I/O ports at f000 [size=256]
        Memory at fea00000 (32-bit, non-prefetchable) [size=256K]
        Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [270] #19
        Capabilities: [2b0] Address Translation Service (ATS)
        Capabilities: [2c0] Page Request Interface (PRI)
        Capabilities: [2d0] Process Address Space ID (PASID)
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu
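The bracketed `[vendor:device]` pairs printed by `lspci -nn` are what identify the two affected GPUs (1002:6900 and 1002:98e4). A small Python sketch for extracting them from lspci text is given here; the function name `pci_ids` and the regular expression are assumptions for illustration, written against the output format quoted in this report.

```python
import re

# Four hex digits, a colon, four hex digits inside square brackets,
# e.g. [1002:98e4]. Class codes such as [0300] have no colon and are skipped.
PCI_ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")

def pci_ids(text):
    """Return a list of (vendor, device) hex-string pairs found in text."""
    return PCI_ID_RE.findall(text)

header = ("00:01.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. "
          "[AMD/ATI] Stoney [Radeon R2/R3/R4/R5 Graphics] [1002:98e4] (rev eb)")
subsystem = ("Subsystem: Acer Incorporated [ALI] Stoney "
             "[Radeon R2/R3/R4/R5 Graphics] [1025:132a]")
print(pci_ids(header))     # [('1002', '98e4')]
print(pci_ids(subsystem))  # [('1025', '132a')]
```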
Also tried with amdgpu.runpm=0 in the boot command, but the issue can still be reproduced.
--- Comment #4 from jian-hong@endlessm.com ---
Created attachment 144031
  --> https://bugs.freedesktop.org/attachment.cgi?id=144031&action=edit
lspci -nnv on Acer TravelMate B114-21
jian-hong@endlessm.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|medium                      |high
           Severity|major                       |critical
--- Comment #5 from jian-hong@endlessm.com ---
Created attachment 144042
  --> https://bugs.freedesktop.org/attachment.cgi?id=144042&action=edit
journal log on Acer TravelMate B114-21
Got more information after waiting longer for the resume to complete on the Acer TravelMate B114-21.
Apr 19 15:06:38 endless kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2841, emitted seq=2845
Apr 19 15:06:38 endless kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 695 thread Xorg:cs0 pid 698
Apr 19 15:06:38 endless kernel: [drm] IP block:gfx_v8_0 is hung!
Apr 19 15:06:38 endless kernel: [drm] GPU recovery disabled.
Apr 19 15:06:40 endless kernel: INFO: task Xorg:695 blocked for more than 604 seconds.
Apr 19 15:06:40 endless kernel: Tainted: G W 5.1.0-rc5+ #1
Apr 19 15:06:40 endless kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 15:06:40 endless kernel: Xorg D 0 695 683 0x00400004
Apr 19 15:06:40 endless kernel: Call Trace:
Apr 19 15:06:40 endless kernel: __schedule+0x2d4/0x840
Apr 19 15:06:40 endless kernel: schedule+0x2c/0x70
Apr 19 15:06:40 endless kernel: schedule_timeout+0x258/0x360
Apr 19 15:06:40 endless kernel: ? amdgpu_atom_execute_table_locked+0x136/0x210 [amdgpu]
Apr 19 15:06:40 endless kernel: dma_fence_default_wait+0x20a/0x280
Apr 19 15:06:40 endless kernel: ? dma_fence_release+0xa0/0xa0
Apr 19 15:06:40 endless kernel: dma_fence_wait_timeout+0xe7/0x110
Apr 19 15:06:40 endless kernel: amdgpu_fence_wait_empty+0x61/0xc0 [amdgpu]
Apr 19 15:06:40 endless kernel: amdgpu_pm_compute_clocks+0x70/0x590 [amdgpu]
Apr 19 15:06:40 endless kernel: dm_pp_apply_display_requirements+0x19a/0x1b0 [amdgpu]
Apr 19 15:06:40 endless kernel: dce11_pplib_apply_display_requirements+0x1f4/0x210 [amdgpu]
Apr 19 15:06:40 endless kernel: dce11_update_clocks+0xa0/0x100 [amdgpu]
Apr 19 15:06:40 endless kernel: dce110_prepare_bandwidth+0x3e/0x50 [amdgpu]
Apr 19 15:06:40 endless kernel: dc_commit_state+0x22d/0x5a0 [amdgpu]
Apr 19 15:06:40 endless kernel: ? drm_calc_timestamping_constants+0x106/0x150 [drm]
Apr 19 15:06:40 endless kernel: amdgpu_dm_atomic_commit_tail+0x1fb/0x1930 [amdgpu]
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x40/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x34/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x40/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x34/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x40/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x34/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x40/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x34/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x34/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x40/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x34/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x40/0x70
Apr 19 15:06:40 endless kernel: ? __switch_to_xtra+0x3b8/0x5b0
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x34/0x70
Apr 19 15:06:40 endless kernel: ? ttm_bo_mem_compat+0x28/0x60 [ttm]
Apr 19 15:06:40 endless kernel: ? ttm_bo_validate+0x3d/0x130 [ttm]
Apr 19 15:06:40 endless kernel: ? __switch_to+0x48b/0x4f0
Apr 19 15:06:40 endless kernel: ? __switch_to_asm+0x34/0x70
Apr 19 15:06:40 endless kernel: ? __schedule+0x2dc/0x840
Apr 19 15:06:40 endless kernel: ? amdgpu_bo_pin_restricted+0x1a2/0x270 [amdgpu]
Apr 19 15:06:40 endless kernel: ? _cond_resched+0x19/0x30
Apr 19 15:06:40 endless kernel: ? wait_for_completion_timeout+0x38/0x140
Apr 19 15:06:40 endless kernel: ? _cond_resched+0x19/0x30
Apr 19 15:06:40 endless kernel: ? wait_for_completion_interruptible+0x35/0x1a0
Apr 19 15:06:40 endless kernel: commit_tail+0x42/0x70 [drm_kms_helper]
Apr 19 15:06:40 endless kernel: ? commit_tail+0x42/0x70 [drm_kms_helper]
Apr 19 15:06:40 endless kernel: drm_atomic_helper_commit+0x113/0x120 [drm_kms_helper]
Apr 19 15:06:40 endless kernel: amdgpu_dm_atomic_commit+0x9b/0xe0 [amdgpu]
Apr 19 15:06:40 endless kernel: drm_atomic_commit+0x4a/0x50 [drm]
Apr 19 15:06:40 endless kernel: drm_atomic_helper_set_config+0x87/0x90 [drm_kms_helper]
Apr 19 15:06:40 endless kernel: drm_mode_setcrtc+0x1bb/0x740 [drm]
Apr 19 15:06:40 endless kernel: ? drm_is_current_master+0x1f/0x40 [drm]
Apr 19 15:06:40 endless kernel: ? drm_mode_getcrtc+0x1a0/0x1a0 [drm]
Apr 19 15:06:40 endless kernel: drm_ioctl_kernel+0xb0/0x100 [drm]
Apr 19 15:06:40 endless kernel: drm_ioctl+0x233/0x410 [drm]
Apr 19 15:06:40 endless kernel: ? drm_mode_getcrtc+0x1a0/0x1a0 [drm]
Apr 19 15:06:40 endless kernel: amdgpu_drm_ioctl+0x4f/0x80 [amdgpu]
Apr 19 15:06:40 endless kernel: do_vfs_ioctl+0xa9/0x640
Apr 19 15:06:40 endless kernel: ? tomoyo_file_ioctl+0x19/0x20
Apr 19 15:06:40 endless kernel: ksys_ioctl+0x67/0x90
Apr 19 15:06:40 endless kernel: __x64_sys_ioctl+0x1a/0x20
Apr 19 15:06:40 endless kernel: do_syscall_64+0x5a/0x110
Apr 19 15:06:40 endless kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 19 15:06:40 endless kernel: RIP: 0033:0x7f36f7126777
Apr 19 15:06:40 endless kernel: Code: Bad RIP value.
Apr 19 15:06:40 endless kernel: RSP: 002b:00007ffeb62a80d8 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
Apr 19 15:06:40 endless kernel: RAX: ffffffffffffffda RBX: 00007ffeb62a8110 RCX: 00007f36f7126777
Apr 19 15:06:40 endless kernel: RDX: 00007ffeb62a8110 RSI: 00000000c06864a2 RDI: 000000000000000d
Apr 19 15:06:40 endless kernel: RBP: 00007ffeb62a8110 R08: 0000000000000000 R09: 00005652f3eb9510
Apr 19 15:06:40 endless kernel: R10: 00007ffeb62a81d0 R11: 0000000000003246 R12: 00000000c06864a2
Apr 19 15:06:40 endless kernel: R13: 000000000000000d R14: 0000000000000000 R15: 00005652f3eb9510
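In a kernel call trace, entries prefixed with "?" are addresses found on the stack that the unwinder could not verify, so the unprefixed entries give the more trustworthy call chain. Filtering them out of the trace above leaves a path from the mode-setting ioctl down through amdgpu_pm_compute_clocks and amdgpu_fence_wait_empty into the fence wait, which is consistent with Xorg blocking on a fence of the hung gfx ring. The Python sketch below performs that filtering; the name `reliable_frames` and the regular expression are assumptions written against the journal format quoted in this comment, and only a few lines of the trace are repeated as sample input.

```python
import re

# Matches "symbol+0xoff/0xsize" optionally followed by "[module]", with an
# optional leading "? " that marks an unverified stack entry.
FRAME_RE = re.compile(
    r"kernel:\s+(?P<guess>\?\s+)?(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)

def reliable_frames(lines):
    """Return (symbol, module) for each frame that is not prefixed with '?'."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m and not m.group("guess"):
            frames.append((m.group("sym"), m.group("mod")))
    return frames

sample = [
    "Apr 19 15:06:40 endless kernel: Call Trace:",
    "Apr 19 15:06:40 endless kernel: schedule_timeout+0x258/0x360",
    "Apr 19 15:06:40 endless kernel: ? amdgpu_atom_execute_table_locked+0x136/0x210 [amdgpu]",
    "Apr 19 15:06:40 endless kernel: dma_fence_default_wait+0x20a/0x280",
    "Apr 19 15:06:40 endless kernel: amdgpu_fence_wait_empty+0x61/0xc0 [amdgpu]",
    "Apr 19 15:06:40 endless kernel: amdgpu_pm_compute_clocks+0x70/0x590 [amdgpu]",
]
for sym, mod in reliable_frames(sample):
    print(sym, f"[{mod}]" if mod else "")
```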
jian-hong@endlessm.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|System resumes failed and   |System resumes failed and
                   |hits                        |hits
                   |[drm:amdgpu_job_timedout    |[drm:amdgpu_job_timedout
                   |[amdgpu]] *ERROR* ring gfx  |[amdgpu]] *ERROR* ring gfx
                   |timeout on Acer Squirtle_SR |timeout on Acer Aspire
                   |laptop                      |A315-21G
--- Comment #6 from Yury Zhuravlev stalkerg@gmail.com ---
Vega56
Ryzen 2700x
Kernel 5.0.3
Mesa latest master git
libdrm latest master git
llvm 8
I have the same problem when I use DXVK with the free version of Assassin's Creed.
[ 3137.670744] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=191619, emitted seq=191621
[ 3137.670765] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process ACU.exe pid 8085 thread ACU.exe:cs0 pid 8118
[ 3137.670767] amdgpu 0000:1f:00.0: GPU reset begin!
[ 3147.900752] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:47:crtc-0] hw_done or flip_done timed out
--- Comment #7 from Cameron Banfield freedesktop@cameron.bz ---
I am having very similar issues and see similar errors in logs. The most recent error was:
kernel: amdgpu 0000:06:00.0: [gfxhub] no-retry page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 1301 thread Xorg:cs0 pid 1362)
kernel: amdgpu 0000:06:00.0: in page starting at address 0x0000800108a18000 from 27
kernel: amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
The laptop is then unusable and requires a hard reboot.
Linux Mint 19.1
Kernel 5.1.0
AMD Ryzen PRO 2700U with Vega 10 graphics
Trying to load Cities: Skylines is a guaranteed crash.
--- Comment #8 from Matt Coffin mcoffin13@gmail.com ---
This is probably related to bug 102322, yes?
--- Comment #9 from redrield@gmail.com ---
Created attachment 144900
  --> https://bugs.freedesktop.org/attachment.cgi?id=144900&action=edit
Thinkpad E585 log file with amdgpu errors
I'm running into an issue that I think is related to this. Attached a journal file containing the traces from the last boot where it occurred. For some reason, it doesn't happen every time I try to resume from suspend, but when it does I have no choice but to hard reboot. This is a Thinkpad E585, uname -a "Linux thonkpad 5.2.3-arch1-1-ARCH #1 SMP PREEMPT Fri Jul 26 08:13:47 UTC 2019 x86_64 GNU/Linux"
--- Comment #10 from Eugene Bright eugene@bright.gdn ---
The patch is on its way
https://bugs.freedesktop.org/show_bug.cgi?id=110258#c12
--- Comment #11 from jian-hong@endlessm.com ---
(In reply to Eugene Bright from comment #10)
> The patch is on it's way
> https://bugs.freedesktop.org/show_bug.cgi?id=110258#c12
I applied the patch on top of Linux stable 5.2.8, and it fixed this issue. Thank you so much!
Alex Deucher alexdeucher@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |DUPLICATE
--- Comment #12 from Alex Deucher alexdeucher@gmail.com ---
*** This bug has been marked as a duplicate of bug 110258 ***
--- Comment #13 from darkshvein darkshvein@gmail.com ---
Hello. Please explain: why did everything work fine with an FX-8320 CPU, but after upgrading to a Ryzen 5 1600 I see these OS freezes and this bug?
Is the PCIe generation a possible cause? Planned obsolescence? Or a coincidence with an amdgpu driver update?
Part of my log:
[49266.138534] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=5660155, emitted seq=5660157
[49266.138578] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Civ6Sub pid 1778 thread Civ6Sub:cs0 pid 1781
[49266.138580] [drm] GPU recovery disabled.
[49275.866518] INFO: task Xorg:sh1:1789 blocked for more than 122 seconds.
[49275.866521] Tainted: G R O 5.2.10 #2
radeon 7970.
mesa utils(8.4.0-1)
linux 5.2.10
amdgpu Version: 18.1.99+git20190207-1