https://bugs.freedesktop.org/show_bug.cgi?id=105317
Bug ID: 105317
Summary: The GPU Vega 56 was hang while try to pass #GraphicsFuzz shader15 test
Product: DRI
Version: XOrg git
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: mikhail.v.gavrilov@gmail.com
Preparing:
1. Install Fedora 27:
   https://download.fedoraproject.org/pub/fedora/linux/releases/27/Workstation/...
2. Install the latest Mesa and LLVM:
   https://copr.fedorainfracloud.org/coprs/che/mesa/
3. Build and install the staging kernel with the latest amdgpu driver:
   $ git clone git://people.freedesktop.org/~agd5f/linux --branch amd-staging-drm-next
   $ cd linux
   $ make clean && make bzImage && make modules
   # make modules_install && make install
Reproducing the issue:
1. Launch any browser (I checked Firefox and Opera).
2. Open http://www.graphicsfuzz.com/benchmark/android-v1.html
3. Press Go.
4. Wait until the shader15 test is reached.

Symptoms:
1. The system stops responding.
2. All the load-indicator LEDs on the video card light up.
3. The fan on the video card starts to make a lot of noise.
The following lines appear in dmesg:
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=71473, last emitted seq=71475
[drm] No hardware hang detected. Did some blocks stall?

If the Opera browser was used beforehand, the following lines are added as well:
[drm:gfx_v9_0_priv_reg_irq [amdgpu]] *ERROR* Illegal register access in command stream
[drm] No hardware hang detected. Did some blocks stall?
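The difference between the last signaled and last emitted sequence numbers in the timeout message shows how many submissions were still outstanding on the ring when the timeout fired. Below is a minimal Python sketch for pulling those numbers out of a saved dmesg log. The function name `parse_ring_timeouts` is made up for illustration, and the regular expression assumes the message format quoted in this report; other kernel versions may word the message differently.

```python
import re

# Matches the amdgpu timeout message as quoted in this report.
# The pattern is an assumption based on that one format.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"last signaled seq=\s*(?P<signaled>\d+), "
    r"last emitted seq=\s*(?P<emitted>\d+)"
)


def parse_ring_timeouts(log_text):
    """Return (ring, signaled, emitted, outstanding) for each timeout line."""
    results = []
    for match in TIMEOUT_RE.finditer(log_text):
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        results.append(
            (match.group("ring"), signaled, emitted, emitted - signaled)
        )
    return results


sample = (
    "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, "
    "last signaled seq=71473, last emitted seq=71475\n"
    "[drm] No hardware hang detected. Did some blocks stall?\n"
)

for ring, signaled, emitted, outstanding in parse_ring_timeouts(sample):
    print(f"ring={ring} signaled={signaled} emitted={emitted} "
          f"outstanding={outstanding}")
```

For the lines above this reports two outstanding submissions on the gfx ring.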
--- Comment #1 from Emil Velikov emil.l.velikov@gmail.com ---
Mikhail, one suggestion to consider for the future:

Do mention version numbers (or the SHA if using a git checkout) for the different components: Mesa, LLVM, kernel.
Michel Dänzer michel@daenzer.net changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Product|DRI                         |Mesa
         QA Contact|                            |dri-devel@lists.freedesktop.org
            Version|XOrg git                    |git
          Component|DRM/AMDgpu                  |Drivers/Gallium/radeonsi
--- Comment #2 from mikhail.v.gavrilov@gmail.com ---
(In reply to Emil Velikov from comment #1)
> Mikhail one suggestion to consider for the future:
>
> Do mention version numbers (or sha if using a git checkout), for the different components mesa, llvm, kernel.
kernel: 4.16.0-rc1-git63e5921e856b
mesa: 18.1.0-0.4.git56dc9f9
llvm: 7.0.0-0.1.r326462
--- Comment #3 from mikhail.v.gavrilov@gmail.com ---
[ 463.172901] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=26958, last emitted seq=26960
[ 463.172985] [drm] No hardware hang detected. Did some blocks stall?
[ 473.357738] sysrq: SysRq : Show Blocked State
[ 473.357758] task PC stack pid father
[ 473.357955] amdgpu_cs:0 D13176 2340 2283 0x00000000
[ 473.357969] Call Trace:
[ 473.357988] ? __schedule+0x2ed/0xba0
[ 473.358005] ? dma_fence_default_wait+0x14f/0x370
[ 473.358013] schedule+0x2f/0x90
[ 473.358021] schedule_timeout+0x23d/0x540
[ 473.358030] ? find_held_lock+0x34/0xa0
[ 473.358044] ? mark_held_locks+0x56/0x80
[ 473.358053] ? _raw_spin_unlock_irqrestore+0x32/0x60
[ 473.358065] ? dma_fence_default_wait+0x14f/0x370
[ 473.358072] dma_fence_default_wait+0x23b/0x370
[ 473.358081] ? dma_fence_release+0x170/0x170
[ 473.358094] dma_fence_wait_timeout+0x4f/0x270
[ 473.358176] amdgpu_ctx_wait_prev_fence+0x4c/0x80 [amdgpu]
[ 473.358237] amdgpu_cs_ioctl+0x99/0x1d60 [amdgpu]
[ 473.358357] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
[ 473.358383] drm_ioctl_kernel+0x5b/0xb0 [drm]
[ 473.358409] drm_ioctl+0x2d5/0x370 [drm]
[ 473.358466] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
[ 473.358479] ? __pm_runtime_resume+0x54/0x90
[ 473.358493] ? trace_hardirqs_on_caller+0xed/0x180
[ 473.358551] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 473.358566] do_vfs_ioctl+0xa5/0x6e0
[ 473.358589] SyS_ioctl+0x74/0x80
[ 473.358603] do_syscall_64+0x79/0x220
[ 473.358612] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 473.358678] RIP: 0033:0x7fa95fa5c0f7
[ 473.358683] RSP: 002b:00007fa957459998 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 473.358692] RAX: ffffffffffffffda RBX: 00007fa957459a80 RCX: 00007fa95fa5c0f7
[ 473.358697] RDX: 00007fa957459a00 RSI: 00000000c0186444 RDI: 000000000000000b
[ 473.358701] RBP: 00007fa957459a00 R08: 00007fa957459ab0 R09: 00007fa9574599e0
[ 473.358706] R10: 00007fa957459ab0 R11: 0000000000000246 R12: 00000000c0186444
[ 473.358710] R13: 000000000000000b R14: 0000000002876fe8 R15: 0000000000000002
[ 473.358836] tracker-store D12456 2792 2166 0x00000000
[ 473.358848] Call Trace:
[ 473.358862] ? __schedule+0x2ed/0xba0
[ 473.358882] schedule+0x2f/0x90
[ 473.358889] io_schedule+0x12/0x40
[ 473.358898] generic_file_read_iter+0x39e/0xdb0
[ 473.358922] ? page_cache_tree_insert+0x130/0x130
[ 473.359001] xfs_file_buffered_aio_read+0x65/0x1a0 [xfs]
[ 473.359066] xfs_file_read_iter+0x64/0xc0 [xfs]
[ 473.359077] __vfs_read+0x102/0x170
[ 473.359100] vfs_read+0x9e/0x150
[ 473.359111] SyS_pread64+0x93/0xb0
[ 473.359119] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 473.359132] do_syscall_64+0x79/0x220
[ 473.359142] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 473.359148] RIP: 0033:0x7f7bb7448873
[ 473.359152] RSP: 002b:00007ffc37fd1220 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
[ 473.359161] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f7bb7448873
[ 473.359166] RDX: 0000000000001000 RSI: 0000556e21670258 RDI: 0000000000000008
[ 473.359170] RBP: 0000000000001000 R08: 0000556e21670258 R09: 000000000fef0fff
[ 473.359175] R10: 0000000002761000 R11: 0000000000000293 R12: 0000000000000000
[ 473.359179] R13: 0000556e21670258 R14: 0000000002761000 R15: 0000556e214e9d80
[ 473.359343] kworker/u16:0 D12152 4711 2 0x80000000
[ 473.359370] Workqueue: events_unbound commit_work [drm_kms_helper]
[ 473.359379] Call Trace:
[ 473.359394] ? __schedule+0x2ed/0xba0
[ 473.359410] ? dma_fence_default_wait+0x14f/0x370
[ 473.359418] schedule+0x2f/0x90
[ 473.359425] schedule_timeout+0x23d/0x540
[ 473.359433] ? find_held_lock+0x34/0xa0
[ 473.359448] ? mark_held_locks+0x56/0x80
[ 473.359456] ? _raw_spin_unlock_irqrestore+0x32/0x60
[ 473.359469] ? dma_fence_default_wait+0x14f/0x370
[ 473.359476] dma_fence_default_wait+0x23b/0x370
[ 473.359484] ? dma_fence_release+0x170/0x170
[ 473.359498] dma_fence_wait_timeout+0x4f/0x270
[ 473.359509] reservation_object_wait_timeout_rcu+0x193/0x4d0
[ 473.359607] amdgpu_dm_do_flip+0x112/0x350 [amdgpu]
[ 473.359761] amdgpu_dm_atomic_commit_tail+0xb66/0xdc0 [amdgpu]
[ 473.359777] ? wait_for_completion_timeout+0x76/0x1b0
[ 473.359826] commit_tail+0x3d/0x70 [drm_kms_helper]
[ 473.359841] process_one_work+0x266/0x6b0
[ 473.359876] worker_thread+0x3a/0x390
[ 473.359883] ? process_one_work+0x6b0/0x6b0
[ 473.359886] kthread+0x121/0x140
[ 473.359890] ? kthread_create_worker_on_cpu+0x70/0x70
[ 473.359896] ret_from_fork+0x3a/0x50
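The blocked-state dump shows which tasks are stuck and where. A small Python sketch that groups the reliable stack frames (those without a leading "?") by task makes it easier to see which tasks are waiting on a DMA fence. The helper name `blocked_tasks` and both regular expressions are assumptions derived only from the log format in this comment; they are not a general-purpose kernel log parser.

```python
import re

# Task header as printed in the dump above, e.g.
# "[ 473.357955] amdgpu_cs:0 D13176 2340 2283 0x00000000"
# (name, state letter D fused with free stack, pid, parent pid, flags).
TASK_RE = re.compile(
    r"^\[\s*[\d.]+\]\s+(?P<name>\S+)\s+D\s*\d+\s+(?P<pid>\d+)\s+\d+\s+0x[0-9a-f]+$"
)
# Reliable stack frame, e.g. "[ 473.358013] schedule+0x2f/0x90".
# Frames prefixed with "?" do not match and are skipped on purpose.
FRAME_RE = re.compile(
    r"^\[\s*[\d.]+\]\s+(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def blocked_tasks(log_text):
    """Map (task name, pid) to the list of reliable symbols in its trace."""
    tasks = {}
    current = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        header = TASK_RE.match(line)
        if header:
            current = (header.group("name"), int(header.group("pid")))
            tasks[current] = []
            continue
        frame = FRAME_RE.match(line)
        if current is not None and frame:
            tasks[current].append(frame.group("symbol"))
    return tasks


sample = """\
[ 473.357955] amdgpu_cs:0 D13176 2340 2283 0x00000000
[ 473.357969] Call Trace:
[ 473.357988] ? __schedule+0x2ed/0xba0
[ 473.358013] schedule+0x2f/0x90
[ 473.358072] dma_fence_default_wait+0x23b/0x370
[ 473.358836] tracker-store D12456 2792 2166 0x00000000
[ 473.358848] Call Trace:
[ 473.358882] schedule+0x2f/0x90
[ 473.358889] io_schedule+0x12/0x40
"""

for (name, pid), frames in blocked_tasks(sample).items():
    on_fence = any(symbol.startswith("dma_fence_") for symbol in frames)
    print(name, pid, "waits on a DMA fence" if on_fence else "blocked elsewhere")
```

In the full dump, amdgpu_cs:0 and kworker/u16:0 both sit in dma_fence_default_wait, while tracker-store is blocked on file I/O.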
--- Comment #4 from Timothy Arceri t_arceri@yahoo.com.au ---
Created attachment 138471
  --> https://bugs.freedesktop.org/attachment.cgi?id=138471&action=edit
Shader runner link test
I've distilled one problem in the attached shader runner test. Seems we have another unrolling bug somewhere in the GLSL IR unrolling pass.
We end up with the following:
FRAG
DCL OUT[0], COLOR
DCL TEMP[0..3], LOCAL
IMM[0] UINT32 {0, 4294967295, 0, 0}
IMM[1] INT32 {0, 1, 0, 0}
IMM[2] FLT32 { 1.0000, 0.0000, 0.0000, 0.0000}
  0: MOV TEMP[0].x, IMM[0].xxxx
  1: MOV TEMP[1].x, IMM[1].xxxx
  2: BGNLOOP
  3:   USEQ TEMP[2].x, TEMP[1].xxxx, IMM[1].yyyy
  4:   UIF TEMP[2].xxxx
  5:     BRK
  6:   ENDIF
  7:   MOV TEMP[3], IMM[2].xxxx
  8:   MOV TEMP[0].x, IMM[0].yyyy
  9:   BRK
 10:   UADD TEMP[1].x, TEMP[1].xxxx, IMM[1].yyyy
 11: ENDLOOP
 12: MOV OUT[0], IMM[2].xxxx
 13: END
Terminator found in the middle of a basic block!
label %endif6
LLVM ERROR: Broken function found, compilation aborted!
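One plausible reading of the dump, offered as an interpretation rather than something the comment states outright: instruction 9 is an unconditional BRK, yet instruction 10 (UADD) follows it in the same loop body, so the backend is asked to emit code after a branch, which matches LLVM's "Terminator found in the middle of a basic block" complaint. The toy Python check below flags such instructions in a TGSI text dump. It is not Mesa code; the function name `find_code_after_break` is hypothetical, and the check only handles the flat case (it does not track blocks nested after the BRK).

```python
def find_code_after_break(tgsi_lines):
    """Return (index, opcode) pairs for instructions that follow an
    unconditional BRK before the enclosing block is closed."""
    block_closers = {"ENDIF", "ELSE", "ENDLOOP"}
    unreachable = []
    after_break = False
    for line in tgsi_lines:
        index_text, _, rest = line.partition(":")
        if not index_text.strip().isdigit():
            continue  # declarations and immediates have no instruction index
        parts = rest.split()
        if not parts:
            continue
        opcode = parts[0]
        if opcode in block_closers:
            after_break = False  # leaving the block that contained the BRK
        elif after_break:
            unreachable.append((int(index_text), opcode))
        if opcode == "BRK":
            after_break = True
    return unreachable


tgsi = """\
FRAG
DCL OUT[0], COLOR
  2: BGNLOOP
  3:   USEQ TEMP[2].x, TEMP[1].xxxx, IMM[1].yyyy
  4:   UIF TEMP[2].xxxx
  5:     BRK
  6:   ENDIF
  7:   MOV TEMP[3], IMM[2].xxxx
  8:   MOV TEMP[0].x, IMM[0].yyyy
  9:   BRK
 10:   UADD TEMP[1].x, TEMP[1].xxxx, IMM[1].yyyy
 11: ENDLOOP
 12: MOV OUT[0], IMM[2].xxxx
 13: END
""".splitlines()

for index, opcode in find_code_after_break(tgsi):
    print(f"instruction {index} ({opcode}) follows an unconditional BRK")
```

On the dump above this flags only instruction 10; the BRK at instruction 5 is fine because the ENDIF closes its block right after it.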
Timothy Arceri t_arceri@yahoo.com.au changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |freedesktop@linux-geek.org

--- Comment #5 from Timothy Arceri t_arceri@yahoo.com.au ---
*** Bug 104683 has been marked as a duplicate of this bug. ***
--- Comment #6 from Timothy Arceri t_arceri@yahoo.com.au ---
Piglit test:
https://patchwork.freedesktop.org/patch/214341/
Mesa fix:
https://patchwork.freedesktop.org/patch/214346/
Note that the WebGL test still froze in my testing, but I think Firefox was still using my system Mesa libs for some reason. The Mesa patch fixes the hang in the piglit test.
--- Comment #7 from Bráulio Barros de Oliveira brauliobo@gmail.com ---
This is likely a duplicate of https://bugs.freedesktop.org/show_bug.cgi?id=104817
--- Comment #8 from Juan A. Suarez jasuarez@igalia.com ---
This already landed in Mesa. Can we close this as fixed?
--- Comment #9 from mikhail.v.gavrilov@gmail.com ---
I don't think so, because if it happens again for another reason, the GPU will hang again. I would be happy if, in that case, GPU reset code were present in the driver.
Mauro Gaspari ilvipero@gmx.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ilvipero@gmx.com

--- Comment #10 from Mauro Gaspari ilvipero@gmx.com ---
I am also affected by this bug. I filed a bug against openSUSE Tumbleweed, and it was closed earlier this year. However, the issue resurfaced with the latest Mesa updates, so I reopened it. This is the link: https://bugzilla.opensuse.org/show_bug.cgi?id=1090456
System Info:
OS: OpenSUSE tumbleweed x86_64 updated (2018 08 27)
Kernel: 4.18.0-1-default
Desktop Environment: KDE Plasma (x11)
OpenGL version string: 3.1 Mesa 18.1.6
GPU: AMD Radeon RX Vega 64 8GB
Relevant log lines I found during freeze:
2018-08-09T23:16:53.103775+08:00 MGDT-Tumbleweed kernel: [ 6305.852703] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=1745163, last emitted seq=1745165
2018-08-09T23:16:53.103795+08:00 MGDT-Tumbleweed kernel: [ 6305.852704] [drm] No hardware hang detected. Did some blocks stall?

Dmesg lines related to amdgpu:
[ 3.130759] [drm] amdgpu kernel modesetting enabled.
[ 3.135770] fb: switching to amdgpudrmfb from EFI VGA
[ 3.136106] amdgpu 0000:03:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
[ 3.136171] amdgpu 0000:03:00.0: VRAM: 8176M 0x000000F400000000 - 0x000000F5FEFFFFFF (8176M used)
[ 3.136173] amdgpu 0000:03:00.0: GTT: 512M 0x000000F600000000 - 0x000000F61FFFFFFF
[ 3.136494] [drm] amdgpu: 8176M of VRAM memory ready
[ 3.136495] [drm] amdgpu: 8176M of GTT memory ready.
[ 4.114469] fbcon: amdgpudrmfb (fb0) is primary device
[ 4.141179] amdgpu 0000:03:00.0: fb0: amdgpudrmfb frame buffer device
[ 4.164072] amdgpu 0000:03:00.0: ring 0(gfx) uses VM inv eng 4 on hub 0
[ 4.164074] amdgpu 0000:03:00.0: ring 1(comp_1.0.0) uses VM inv eng 5 on hub 0
[ 4.164075] amdgpu 0000:03:00.0: ring 2(comp_1.1.0) uses VM inv eng 6 on hub 0
[ 4.164075] amdgpu 0000:03:00.0: ring 3(comp_1.2.0) uses VM inv eng 7 on hub 0
[ 4.164076] amdgpu 0000:03:00.0: ring 4(comp_1.3.0) uses VM inv eng 8 on hub 0
[ 4.164077] amdgpu 0000:03:00.0: ring 5(comp_1.0.1) uses VM inv eng 9 on hub 0
[ 4.164078] amdgpu 0000:03:00.0: ring 6(comp_1.1.1) uses VM inv eng 10 on hub 0
[ 4.164079] amdgpu 0000:03:00.0: ring 7(comp_1.2.1) uses VM inv eng 11 on hub 0
[ 4.164079] amdgpu 0000:03:00.0: ring 8(comp_1.3.1) uses VM inv eng 12 on hub 0
[ 4.164080] amdgpu 0000:03:00.0: ring 9(kiq_2.1.0) uses VM inv eng 13 on hub 0
[ 4.164081] amdgpu 0000:03:00.0: ring 10(sdma0) uses VM inv eng 4 on hub 1
[ 4.164082] amdgpu 0000:03:00.0: ring 11(sdma1) uses VM inv eng 5 on hub 1
[ 4.164083] amdgpu 0000:03:00.0: ring 12(uvd) uses VM inv eng 6 on hub 1
[ 4.164084] amdgpu 0000:03:00.0: ring 13(uvd_enc0) uses VM inv eng 7 on hub 1
[ 4.164085] amdgpu 0000:03:00.0: ring 14(uvd_enc1) uses VM inv eng 8 on hub 1
[ 4.164085] amdgpu 0000:03:00.0: ring 15(vce0) uses VM inv eng 9 on hub 1
[ 4.164086] amdgpu 0000:03:00.0: ring 16(vce1) uses VM inv eng 10 on hub 1
[ 4.164087] amdgpu 0000:03:00.0: ring 17(vce2) uses VM inv eng 11 on hub 1
[ 4.164553] [drm] Initialized amdgpu 3.25.0 20150101 for 0000:03:00.0 on minor 0
As a side note, the freeze does not happen on my Kubuntu system, which has the same hardware and runs the same games.

OS: Kubuntu 18.04 x86_64 updated (2018 08 27)
Kernel: 4.15.0-33-generic
Desktop Environment: KDE Plasma (x11)
OpenGL version string: 3.0 Mesa 18.0.5
GPU: AMD Radeon RX Vega 64 8GB
GitLab Migration User gitlab-migration@fdo.invalid changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |MOVED

--- Comment #11 from GitLab Migration User gitlab-migration@fdo.invalid ---
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1307.