https://bugs.freedesktop.org/show_bug.cgi?id=104001
Bug ID: 104001
Summary: GPU driver hung when start steam client while playback video on Youtube (it occurs on latest staging kernel)
Product: DRI
Version: XOrg git
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: mikhail.v.gavrilov@gmail.com
Created attachment 135839 --> https://bugs.freedesktop.org/attachment.cgi?id=135839&action=edit dmesg
* Fedora 27 - https://download.fedoraproject.org/pub/fedora/linux/releases/27/Workstation/...
* staging kernel - git://people.freedesktop.org/~agd5f/linux branch amd-staging-drm-next
* mesa 17.4 and llvm 6.0 - https://copr.fedorainfracloud.org/coprs/che/mesa/
To reproduce the issue:
1) Start video playback on YouTube in a browser (Firefox or Opera, it doesn't matter)
2) Launch the Steam client
After a few seconds the GPU driver will hang...
Demonstration: https://youtu.be/2LuWI47oCFg
If we wait more than two minutes after that, we get the following backtrace:
[ 492.840627] INFO: task kworker/u16:5:147 blocked for more than 120 seconds.
[ 492.840641] Not tainted 4.14.0-rc3-amd-vega+ #5
[ 492.840644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 492.840648] kworker/u16:5 D11392 147 2 0x80000000
[ 492.840662] Workqueue: events_unbound commit_work [drm_kms_helper]
[ 492.840666] Call Trace:
[ 492.840674] __schedule+0x2dc/0xbb0
[ 492.840681] schedule+0x33/0x90
[ 492.840694] schedule_timeout+0x288/0x5c0
[ 492.840701] ? mark_held_locks+0x57/0x80
[ 492.840704] ? _raw_spin_unlock_irqrestore+0x36/0x60
[ 492.840713] dma_fence_default_wait+0x22a/0x380
[ 492.840716] ? dma_fence_default_wait+0x22a/0x380
[ 492.840720] ? dma_fence_release+0x170/0x170
[ 492.840725] dma_fence_wait_timeout+0x4f/0x270
[ 492.840729] reservation_object_wait_timeout_rcu+0x18d/0x510
[ 492.840768] amdgpu_dm_do_flip+0x12b/0x390 [amdgpu]
[ 492.840801] amdgpu_dm_atomic_commit_tail+0xbe1/0xe80 [amdgpu]
[ 492.840815] commit_tail+0x3f/0x70 [drm_kms_helper]
[ 492.840820] commit_work+0x12/0x20 [drm_kms_helper]
[ 492.840824] process_one_work+0x26b/0x6c0
[ 492.840832] worker_thread+0x35/0x3b0
[ 492.840837] kthread+0x171/0x190
[ 492.840840] ? process_one_work+0x6c0/0x6c0
[ 492.840843] ? kthread_create_on_node+0x70/0x70
[ 492.840847] ? kthread_create_on_node+0x70/0x70
[ 492.840850] ret_from_fork+0x2a/0x40
[ 492.840906] INFO: task amdgpu_cs:0:2013 blocked for more than 120 seconds.
[ 492.840909] Not tainted 4.14.0-rc3-amd-vega+ #5
[ 492.840912] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 492.840915] amdgpu_cs:0 D13312 2013 1981 0x00000000
[ 492.840921] Call Trace:
[ 492.840926] __schedule+0x2dc/0xbb0
[ 492.840931] ? save_stack_trace+0x1b/0x20
[ 492.840936] schedule+0x33/0x90
[ 492.840939] schedule_timeout+0x288/0x5c0
[ 492.840944] ? mark_held_locks+0x57/0x80
[ 492.840947] ? _raw_spin_unlock_irqrestore+0x36/0x60
[ 492.840951] ? trace_hardirqs_on_caller+0xf4/0x190
[ 492.840957] dma_fence_default_wait+0x22a/0x380
[ 492.840960] ? dma_fence_default_wait+0x22a/0x380
[ 492.840964] ? dma_fence_release+0x170/0x170
[ 492.840969] dma_fence_wait_timeout+0x4f/0x270
[ 492.840987] amdgpu_ctx_wait_prev_fence+0x4a/0x80 [amdgpu]
[ 492.841003] amdgpu_cs_ioctl+0xaf/0x1eb0 [amdgpu]
[ 492.841038] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
[ 492.841051] drm_ioctl_kernel+0x5d/0xb0 [drm]
[ 492.841060] drm_ioctl+0x31b/0x3d0 [drm]
[ 492.841074] ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
[ 492.841081] ? trace_hardirqs_on_caller+0xf4/0x190
[ 492.841085] ? trace_hardirqs_on+0xd/0x10
[ 492.841101] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
[ 492.841107] do_vfs_ioctl+0xa6/0x6c0
[ 492.841115] SyS_ioctl+0x79/0x90
[ 492.841120] entry_SYSCALL_64_fastpath+0x1f/0xbe
[ 492.841123] RIP: 0033:0x7f7eb9078dc7
[ 492.841125] RSP: 002b:00007f7eb0c479b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 492.841130] RAX: ffffffffffffffda RBX: 00000000025f5ab8 RCX: 00007f7eb9078dc7
[ 492.841132] RDX: 00007f7eb0c47a20 RSI: 00000000c0186444 RDI: 000000000000000b
[ 492.841134] RBP: 000000000266f860 R08: 00007f7eb0c47ad0 R09: 00007f7eb0c47a00
[ 492.841136] R10: 00007f7eb0c47ad0 R11: 0000000000000246 R12: 0000000000000007
[ 492.841138] R13: 0000000000000001 R14: 00000000025f5ab8 R15: 0000000000000000
[ 492.841171] INFO: task kworker/3:3:2263 blocked for more than 120 seconds.
[ 492.841174] Not tainted 4.14.0-rc3-amd-vega+ #5
[ 492.841177] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 492.841180] kworker/3:3 D13448 2263 2 0x80000000
[ 492.841190] Workqueue: events ttm_bo_delayed_workqueue [ttm]
[ 492.841194] Call Trace:
[ 492.841199] __schedule+0x2dc/0xbb0
[ 492.841206] schedule+0x33/0x90
[ 492.841209] schedule_preempt_disabled+0x15/0x20
[ 492.841212] __ww_mutex_lock.constprop.9+0xa6f/0x10a0
[ 492.841216] ? __lock_is_held+0x59/0xa0
[ 492.841220] ? ttm_bo_delayed_delete+0x108/0x1b0 [ttm]
[ 492.841228] ww_mutex_lock+0x5e/0x70
[ 492.841231] ? ww_mutex_lock+0x5e/0x70
[ 492.841235] ttm_bo_delayed_delete+0x108/0x1b0 [ttm]
[ 492.841243] ttm_bo_delayed_workqueue+0x1b/0x40 [ttm]
[ 492.841246] process_one_work+0x26b/0x6c0
[ 492.841253] worker_thread+0x35/0x3b0
[ 492.841259] kthread+0x171/0x190
[ 492.841262] ? process_one_work+0x6c0/0x6c0
[ 492.841264] ? kthread_create_on_node+0x70/0x70
[ 492.841269] ret_from_fork+0x2a/0x40
[ 492.841323] Showing all locks held in the system:
[ 492.841332] 1 lock held by khungtaskd/62:
[ 492.841336] #0: (tasklist_lock){.+.+}, at: [<ffffffffba1116ed>] debug_show_all_locks+0x3d/0x1a0
[ 492.841353] 3 locks held by kworker/u16:5/147:
[ 492.841355] #0: ("events_unbound"){+.+.}, at: [<ffffffffba0ceb41>] process_one_work+0x1e1/0x6c0
[ 492.841365] #1: ((&state->commit_work)){+.+.}, at: [<ffffffffba0ceb41>] process_one_work+0x1e1/0x6c0
[ 492.841376] #2: (reservation_ww_class_mutex){+.+.}, at: [<ffffffffc03bbbea>] amdgpu_dm_do_flip+0xea/0x390 [amdgpu]
[ 492.841451] 1 lock held by gnome-shell/1981:
[ 492.841453] #0: (reservation_ww_class_mutex){+.+.}, at: [<ffffffffc0273626>] ttm_bo_vm_fault+0x66/0x5d0 [ttm]
[ 492.841468] 1 lock held by amdgpu_cs:0/2013:
[ 492.841470] #0: (&ctx->lock){+.+.}, at: [<ffffffffc02e4d4e>] amdgpu_cs_ioctl+0x59e/0x1eb0 [amdgpu]
[ 492.841513] 3 locks held by kworker/3:3/2263:
[ 492.841515] #0: ("events"){+.+.}, at: [<ffffffffba0ceb41>] process_one_work+0x1e1/0x6c0
[ 492.841526] #1: ((&(&bdev->wq)->work)){+.+.}, at: [<ffffffffba0ceb41>] process_one_work+0x1e1/0x6c0
[ 492.841536] #2: (reservation_ww_class_mutex){+.+.}, at: [<ffffffffc026f948>] ttm_bo_delayed_delete+0x108/0x1b0 [ttm]
[ 492.841623] 1 lock held by steamerrorrepor/4598:
[ 492.841625] #0: (drm_global_mutex){+.+.}, at: [<ffffffffc01f86ab>] drm_release+0x3b/0x3b0 [drm]
[ 492.841644] =============================================
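For readers triaging a long dmesg like the one above, here is a minimal Python sketch that lists the tasks flagged by the kernel's hung-task detector. The function name `find_hung_tasks` and the `sample` string are illustrative only; the pattern is taken from the "INFO: task ... blocked for more than ... seconds" lines in this report and assumes that task names contain no spaces.

```python
import re

# Pattern modelled on the "INFO: task ..." lines in the log above.
# Task names may themselves contain colons (e.g. "kworker/u16:5"), so the
# name group is greedy and backtracks to the last ":<digits>" for the PID.
# Assumption: task names contain no whitespace.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(dmesg_text):
    """Return (task_name, pid, seconds) tuples for every hung-task report."""
    return [
        (m.group("name"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(dmesg_text)
    ]


# The three report lines from the backtrace in this bug.
sample = """\
[ 492.840627] INFO: task kworker/u16:5:147 blocked for more than 120 seconds.
[ 492.840906] INFO: task amdgpu_cs:0:2013 blocked for more than 120 seconds.
[ 492.841171] INFO: task kworker/3:3:2263 blocked for more than 120 seconds.
"""

for name, pid, secs in find_hung_tasks(sample):
    print(f"{name} (pid {pid}) blocked for more than {secs}s")
```

The PIDs it returns can then be matched against the "locks held by" section of the same log.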
--- Comment #1 from Michel Dänzer michel@daenzer.net --- Does "it occurs on latest staging kernel" mean it doesn't happen with an earlier staging kernel or with another kernel version? If so, can you provide more details about what kernels it doesn't happen with?
--- Comment #2 from mikhail.v.gavrilov@gmail.com --- Earlier kernels don't support the Vega GPU, so I can't recheck with an earlier kernel, which works fine with the iGPU on the same machine.
--- Comment #3 from mikhail.v.gavrilov@gmail.com --- Created attachment 135984 --> https://bugs.freedesktop.org/attachment.cgi?id=135984&action=edit dmesg
--- Comment #4 from mikhail.v.gavrilov@gmail.com --- Created attachment 136012 --> https://bugs.freedesktop.org/attachment.cgi?id=136012&action=edit dmesg with 4.15.0-rc2 amd-staging-drm-next
--- Comment #5 from mikhail.v.gavrilov@gmail.com --- Created attachment 136036 --> https://bugs.freedesktop.org/attachment.cgi?id=136036&action=edit dmesg with 4.15.0-rc2 amd-staging-drm-next
--- Comment #6 from mikhail.v.gavrilov@gmail.com --- Created attachment 136200 --> https://bugs.freedesktop.org/attachment.cgi?id=136200&action=edit dmesg with 4.15.0-rc2 amd-staging-drm-next
--- Comment #7 from mikhail.v.gavrilov@gmail.com --- With the latest build, this message appears in dmesg when the hang occurs again:
[ 341.475043] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=110200, last emitted seq=110202
[ 341.475059] [drm] No hardware hang detected. Did some blocks stall?
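The timeout line carries two sequence numbers, and a small parser makes them easier to compare across the attached logs. The sketch below is illustrative Python: `parse_ring_timeout` is a made-up helper name, and reading the difference between the two numbers as the count of fences that were emitted but never signaled is an interpretation on my part, not something stated in this report.

```python
import re

# Pattern modelled on the amdgpu_job_timedout message quoted above.
RING_TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"last signaled seq=(?P<signaled>\d+), "
    r"last emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line):
    """Extract ring name and fence sequence numbers, or None if no match."""
    m = RING_TIMEOUT_RE.search(line)
    if m is None:
        return None
    signaled = int(m.group("signaled"))
    emitted = int(m.group("emitted"))
    return {
        "ring": m.group("ring"),
        "signaled": signaled,
        "emitted": emitted,
        # Assumed meaning: fences submitted to the ring but not yet completed.
        "pending": emitted - signaled,
    }


line = (
    "[ 341.475043] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, "
    "last signaled seq=110200, last emitted seq=110202"
)
info = parse_ring_timeout(line)
print(info["ring"], info["pending"])
```

For the line above, the two sequence numbers differ by 2.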
--- Comment #8 from mikhail.v.gavrilov@gmail.com --- Created attachment 136346 --> https://bugs.freedesktop.org/attachment.cgi?id=136346&action=edit dmesg with 4.15.0-rc2 amd-staging-drm-next
mikhail.v.gavrilov@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|GPU driver hung when start  |[drm:amdgpu_job_timedout
                   |steam client while playback |[amdgpu]] *ERROR* ring gfx
                   |video on Youtube (it occurs |timeout, last signaled
                   |on latest staging kernel)   |seq=6582, last emitted
                   |                            |seq=6584
--- Comment #9 from mikhail.v.gavrilov@gmail.com --- Created attachment 136517 --> https://bugs.freedesktop.org/attachment.cgi?id=136517&action=edit dmesg with 4.15.0-rc2 amd-staging-drm-next
Christian König ckoenig.leichtzumerken@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[drm:amdgpu_job_timedout    |GPU driver hung when start
                   |[amdgpu]] *ERROR* ring gfx  |steam client while playback
                   |timeout, last signaled      |video on Youtube (it occurs
                   |seq=6582, last emitted      |on latest staging kernel)
                   |seq=6584                    |
--- Comment #10 from Christian König ckoenig.leichtzumerken@gmail.com --- Yeah, I enabled more error messages on amd-staging-drm-next.
But please don't change the bug subject to something less descriptive.
--- Comment #11 from mikhail.v.gavrilov@gmail.com --- (In reply to Christian König from comment #10)
> Yeah, I enabled more error messages on amd-staging-drm-next.
But is it still not enough to understand the root cause?
Can you also rebase amd-staging-drm-next onto RC5 with the KPTI patch enabled? I do not want to sit on a vulnerable kernel. The default kernel shipped in Fedora is already patched but does not have AMD Vega support.
--- Comment #12 from mikhail.v.gavrilov@gmail.com --- https://youtu.be/MZ2O4XqxBZM
--- Comment #13 from mikhail.v.gavrilov@gmail.com --- Created attachment 136579 --> https://bugs.freedesktop.org/attachment.cgi?id=136579&action=edit dmesg with 4.15.0-rc2 amd-staging-drm-next
--- Comment #14 from Dieter Nützel Dieter@nuetzel-hh.de --- (In reply to mikhail.v.gavrilov from comment #11)
> (In reply to Christian König from comment #10)
> > Yeah, I enabled more error messages on amd-staging-drm-next.
> But is it still not enough to understand the root cause?
> Can you also rebase amd-staging-drm-next onto RC5 with the KPTI patch enabled? I do not want to sit on a vulnerable kernel. The default kernel shipped in Fedora is already patched but does not have AMD Vega support.
Christian and Alex,
when you do this (rebase onto an RCx with the KPTI patch enabled), please rebase onto at least RC7 (which does NOT penalize AMD chips), even though I'm on an Intel Xeon currently... ;-)
--- Comment #15 from mikhail.v.gavrilov@gmail.com --- Created attachment 136599 --> https://bugs.freedesktop.org/attachment.cgi?id=136599&action=edit dmesg with 4.15.0-rc2 amd-staging-drm-next with SysRq : Show State
--- Comment #16 from mikhail.v.gavrilov@gmail.com --- Created attachment 136809 --> https://bugs.freedesktop.org/attachment.cgi?id=136809&action=edit dmesg with 4.15.0-rc4 amd-staging-drm-next with SysRq : Show State
--- Comment #17 from mikhail.v.gavrilov@gmail.com --- Created attachment 136836 --> https://bugs.freedesktop.org/attachment.cgi?id=136836&action=edit dmesg with 4.15.0-rc4 amd-staging-drm-next e6555e61902c with SysRq : Show State
--- Comment #18 from Christian König ckoenig.leichtzumerken@gmail.com --- Please stop attaching more and more dmesg with unrelated information to the bug report. The initial one is perfectly sufficient.
--- Comment #19 from mikhail.v.gavrilov@gmail.com --- I am sorry for the misunderstanding. Every time I see new commits in the branch, I hope that this issue may be fixed, and every time I rebuild the kernel for testing. And every time after that I reproduce this annoying bug. I still hope that somebody is working on it and improving the logging to understand the root cause of this hang. So every time I attach a new dmesg log.
--- Comment #20 from mikhail.v.gavrilov@gmail.com --- Is anybody investigating this bug? It is not necessary to watch a video for the computer to hang. It hangs just after starting the Steam client, or during a game if the computer has already been running for some time. I'm already tired of pressing the reset button, because "init 6" is not able to restart the computer after such a hang. Today I have already pressed the reset button more than 30 times. But no one cares about it :(
--- Comment #21 from mikhail.v.gavrilov@gmail.com --- Created attachment 137680 --> https://bugs.freedesktop.org/attachment.cgi?id=137680&action=edit dmesg with 4.16.0-rc1 amd-staging-drm-next
--- Comment #22 from mikhail.v.gavrilov@gmail.com --- Sadly, still present in 4.16-rc1.
--- Comment #23 from mikhail.v.gavrilov@gmail.com --- Found you another crash case:
The @GraphicsFuzz demo found 1 issue (14/15 tests passed) on my desktop device, affecting my @AMD GPU driver Give it a try: www.graphicsfuzz.com/#demo #GraphicsFuzz
Computer always hangs on shader15
--- Comment #24 from mikhail.v.gavrilov@gmail.com --- Created attachment 137710 --> https://bugs.freedesktop.org/attachment.cgi?id=137710&action=edit photo of test when computer is hang
--- Comment #25 from Michel Dänzer michel@daenzer.net --- (In reply to mikhail.v.gavrilov from comment #23)
> Found you another crash case:
That's unlikely to be the exact same cause as that of the Steam hang this report is about, so it needs to be tracked separately.
--- Comment #26 from mikhail.v.gavrilov@gmail.com --- (In reply to Michel Dänzer from comment #25)
> That's unlikely to be the exact same cause as that of the Steam hang this report is about, so it needs to be tracked separately.
Ok, https://bugs.freedesktop.org/show_bug.cgi?id=105317
Dennis Schridde devurandom@gmx.net changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |devurandom@gmx.net
--- Comment #27 from Dennis Schridde devurandom@gmx.net --- I run into this issue regularly with an AMD Ryzen 5 2400G (primary display, connected via DP to the monitor) and an AMD Radeon RX 560 (not connected to a monitor, secondary display according to mainboard firmware configuration).
After using my computer for some time, the graphics suddenly freezes and I see lines like the following in dmesg (after logging in via SSH):
[Fri Mar 2 21:05:33 2018] amdgpu: [powerplay] pp_dpm_get_temperature was not implemented.
[Fri Mar 2 21:06:03 2018] INFO: task X:898 blocked for more than 120 seconds.
[Fri Mar 2 21:06:03 2018] Tainted: G W 4.15.7-gentoo-r1 #2
[Fri Mar 2 21:06:03 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Mar 2 21:06:03 2018] X D 0 898 881 0x00000004
[Fri Mar 2 21:06:03 2018] Call Trace:
[Fri Mar 2 21:06:03 2018] ? __schedule+0x2a7/0x8b0
[Fri Mar 2 21:06:03 2018] schedule+0x28/0x80
[Fri Mar 2 21:06:03 2018] schedule_preempt_disabled+0xa/0x10
[Fri Mar 2 21:06:03 2018] __ww_mutex_lock.isra.3+0x224/0x690
[Fri Mar 2 21:06:03 2018] ? drm_modeset_backoff+0x3e/0xb0 [drm]
[Fri Mar 2 21:06:03 2018] drm_modeset_backoff+0x3e/0xb0 [drm]
[Fri Mar 2 21:06:03 2018] drm_mode_gamma_set_ioctl+0xb4/0x200 [drm]
[Fri Mar 2 21:06:03 2018] ? drm_mode_crtc_set_gamma_size+0xa0/0xa0 [drm]
[Fri Mar 2 21:06:03 2018] drm_ioctl_kernel+0x5b/0xb0 [drm]
[Fri Mar 2 21:06:03 2018] drm_ioctl+0x2d5/0x370 [drm]
[Fri Mar 2 21:06:03 2018] ? drm_mode_crtc_set_gamma_size+0xa0/0xa0 [drm]
[Fri Mar 2 21:06:03 2018] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[Fri Mar 2 21:06:03 2018] do_vfs_ioctl+0xa4/0x670
[Fri Mar 2 21:06:03 2018] ? __sys_recvmsg+0x64/0xa0
[Fri Mar 2 21:06:03 2018] ? __sys_recvmsg+0x95/0xa0
[Fri Mar 2 21:06:03 2018] SyS_ioctl+0x74/0x80
[Fri Mar 2 21:06:03 2018] do_syscall_64+0x6e/0x120
[Fri Mar 2 21:06:03 2018] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[Fri Mar 2 21:06:03 2018] RIP: 0033:0x7fd8924c0467
[Fri Mar 2 21:06:03 2018] RSP: 002b:00007ffcb17d7b08 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Fri Mar 2 21:06:03 2018] RAX: ffffffffffffffda RBX: 0000560ddb4480e0 RCX: 00007fd8924c0467
[Fri Mar 2 21:06:03 2018] RDX: 00007ffcb17d7b40 RSI: 00000000c02064a5 RDI: 0000000000000016
[Fri Mar 2 21:06:03 2018] RBP: 00007ffcb17d7b40 R08: 0000560ddb4487a0 R09: 0000560ddb4489a0
[Fri Mar 2 21:06:03 2018] R10: 0000000000000001 R11: 0000000000000246 R12: 00000000c02064a5
[Fri Mar 2 21:06:03 2018] R13: 0000000000000016 R14: 0000560ddb448bb0 R15: 0000560ddb4485a0
[Fri Mar 2 21:06:03 2018] INFO: task kworker/u32:2:32344 blocked for more than 120 seconds.
[Fri Mar 2 21:06:03 2018] Tainted: G W 4.15.7-gentoo-r1 #2
[Fri Mar 2 21:06:03 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Mar 2 21:06:03 2018] kworker/u32:2 D 0 32344 2 0x80000000
[Fri Mar 2 21:06:03 2018] Workqueue: events_unbound commit_work [drm_kms_helper]
[Fri Mar 2 21:06:03 2018] Call Trace:
[Fri Mar 2 21:06:03 2018] ? __schedule+0x2a7/0x8b0
[Fri Mar 2 21:06:03 2018] schedule+0x28/0x80
[Fri Mar 2 21:06:03 2018] schedule_timeout+0x1e7/0x370
[Fri Mar 2 21:06:03 2018] ? generic_reg_get+0x21/0x30 [amdgpu]
[Fri Mar 2 21:06:03 2018] dma_fence_default_wait+0x1f0/0x280
[Fri Mar 2 21:06:03 2018] ? dma_fence_release+0x90/0x90
[Fri Mar 2 21:06:03 2018] dma_fence_wait_timeout+0x39/0xf0
[Fri Mar 2 21:06:03 2018] reservation_object_wait_timeout_rcu+0x17b/0x370
[Fri Mar 2 21:06:03 2018] amdgpu_dm_do_flip+0x11f/0x360 [amdgpu]
[Fri Mar 2 21:06:03 2018] amdgpu_dm_atomic_commit_tail+0x8a1/0x9a0 [amdgpu]
[Fri Mar 2 21:06:03 2018] ? _cond_resched+0x15/0x40
[Fri Mar 2 21:06:03 2018] ? wait_for_completion_timeout+0x35/0x180
[Fri Mar 2 21:06:03 2018] commit_tail+0x3d/0x70 [drm_kms_helper]
[Fri Mar 2 21:06:03 2018] process_one_work+0x1da/0x3d0
[Fri Mar 2 21:06:03 2018] worker_thread+0x2b/0x3f0
[Fri Mar 2 21:06:03 2018] ? process_one_work+0x3d0/0x3d0
[Fri Mar 2 21:06:03 2018] kthread+0x113/0x130
[Fri Mar 2 21:06:03 2018] ? kthread_create_worker_on_cpu+0x70/0x70
[Fri Mar 2 21:06:03 2018] ? SyS_exit_group+0x10/0x10
[Fri Mar 2 21:06:03 2018] ret_from_fork+0x22/0x40
[Fri Mar 2 21:06:33 2018] i2c /dev entries driver
Everything apart from the graphics appears to continue to run fine, except any application (e.g. started on the command line) that tries to talk to the X server: They will hang. Most applications that hang can be killed with SIGKILL, except the X server and a few others, which will be zombies forever.
--- Comment #28 from Dennis Schridde devurandom@gmx.net --- Linux kernel is at 4.15.7-gentoo-r1, LLVM at 5.0.1, Mesa at 18.0.0_rc4.
--- Comment #29 from mikhail.v.gavrilov@gmail.com ---
[ 69.089101] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=897, last emitted seq=899
[ 69.089176] [drm] No hardware hang detected. Did some blocks stall?
[ 85.813890] sysrq: SysRq : Show Blocked State
[ 85.813982] task PC stack pid father
[ 85.814019] kworker/u16:4 D14104 146 2 0x80000000
[ 85.814055] Workqueue: events_unbound commit_work [drm_kms_helper]
[ 85.814058] Call Trace:
[ 85.814064] ? __schedule+0x2ed/0xba0
[ 85.814070] ? dma_fence_default_wait+0x14f/0x370
[ 85.814073] schedule+0x2f/0x90
[ 85.814076] schedule_timeout+0x23d/0x540
[ 85.814079] ? find_held_lock+0x34/0xa0
[ 85.814084] ? mark_held_locks+0x56/0x80
[ 85.814087] ? _raw_spin_unlock_irqrestore+0x32/0x60
[ 85.814091] ? dma_fence_default_wait+0x14f/0x370
[ 85.814094] dma_fence_default_wait+0x23b/0x370
[ 85.814097] ? dma_fence_release+0x170/0x170
[ 85.814101] dma_fence_wait_timeout+0x4f/0x270
[ 85.814105] reservation_object_wait_timeout_rcu+0x193/0x4d0
[ 85.814148] amdgpu_dm_do_flip+0x112/0x350 [amdgpu]
[ 85.814188] amdgpu_dm_atomic_commit_tail+0xb66/0xdc0 [amdgpu]
[ 85.814194] ? wait_for_completion_timeout+0x76/0x1b0
[ 85.814206] commit_tail+0x3d/0x70 [drm_kms_helper]
[ 85.814211] process_one_work+0x266/0x6b0
[ 85.814218] worker_thread+0x3a/0x390
[ 85.814222] ? process_one_work+0x6b0/0x6b0
[ 85.814225] kthread+0x121/0x140
[ 85.814228] ? kthread_create_worker_on_cpu+0x70/0x70
[ 85.814231] ret_from_fork+0x3a/0x50
[ 85.814391] tracker-store D12184 2786 2167 0x00000000
[ 85.814395] Call Trace:
[ 85.814400] ? __schedule+0x2ed/0xba0
[ 85.814406] schedule+0x2f/0x90
[ 85.814409] io_schedule+0x12/0x40
[ 85.814413] generic_file_read_iter+0x39e/0xdb0
[ 85.814420] ? page_cache_tree_insert+0x130/0x130
[ 85.814474] xfs_file_buffered_aio_read+0x65/0x1a0 [xfs]
[ 85.814498] xfs_file_read_iter+0x64/0xc0 [xfs]
[ 85.814504] __vfs_read+0x102/0x170
[ 85.814511] vfs_read+0x9e/0x150
[ 85.814515] SyS_pread64+0x93/0xb0
[ 85.814518] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 85.814523] do_syscall_64+0x79/0x220
[ 85.814526] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 85.814528] RIP: 0033:0x7ff185a6a873
[ 85.814530] RSP: 002b:00007ffe0e646780 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
[ 85.814533] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007ff185a6a873
[ 85.814535] RDX: 0000000000001000 RSI: 00005613cc6fbe48 RDI: 0000000000000008
[ 85.814536] RBP: 0000000000001000 R08: 00005613cc6fbe48 R09: 000000000ff80fff
[ 85.814538] R10: 000000001982a000 R11: 0000000000000293 R12: 0000000000000000
[ 85.814539] R13: 00005613cc6fbe48 R14: 000000001982a000 R15: 00005613cc446580
[ 85.814601] gldriverquery D12856 4120 4072 0xa0020002
[ 85.814606] Call Trace:
[ 85.814611] ? __schedule+0x2ed/0xba0
[ 85.814617] schedule+0x2f/0x90
[ 85.814621] drm_sched_entity_fini+0xbe/0x2b0 [gpu_sched]
[ 85.814626] ? finish_wait+0x80/0x80
[ 85.814649] amdgpu_ctx_fini+0xbf/0x100 [amdgpu]
[ 85.814672] amdgpu_ctx_mgr_fini+0x7c/0xc0 [amdgpu]
[ 85.814692] amdgpu_driver_postclose_kms+0x57/0x220 [amdgpu]
[ 85.814708] drm_release+0x2a0/0x3c0 [drm]
[ 85.814714] __fput+0xe9/0x200
[ 85.814719] task_work_run+0x87/0xb0
[ 85.814723] do_exit+0x345/0xd70
[ 85.814727] ? up_read+0x1c/0x40
[ 85.814730] ? __do_page_fault+0x2af/0x530
[ 85.814735] do_group_exit+0x47/0xc0
[ 85.814738] SyS_exit_group+0x10/0x10
[ 85.814740] do_fast_syscall_32+0xbf/0x376
[ 85.814744] entry_SYSENTER_compat+0x84/0x96
--- Comment #30 from Marek Olšák maraeo@gmail.com --- This might fix it: https://cgit.freedesktop.org/mesa/mesa/commit/?id=d15fb766aa3c98ffbe16d050b2...
--- Comment #31 from mikhail.v.gavrilov@gmail.com --- (In reply to Marek Olšák from comment #30)
> This might fix it:
> https://cgit.freedesktop.org/mesa/mesa/commit/?id=d15fb766aa3c98ffbe16d050b2af4804e4b12c57
Which Mesa version is this patch for?
My si_pipe.c (Mesa 18.0.1) looks different: https://imgur.com/a/dc3RoHi
--- Comment #32 from Marek Olšák maraeo@gmail.com --- The patch is already backported in the 18.0 branch: https://cgit.freedesktop.org/mesa/mesa/log/?h=18.0
--- Comment #33 from mikhail.v.gavrilov@gmail.com --- (In reply to Marek Olšák from comment #32)
> The patch is already backported in the 18.0 branch: https://cgit.freedesktop.org/mesa/mesa/log/?h=18.0
How can I be sure that the patch is already applied in my Mesa?
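One possible way to check, sketched in Python under two assumptions that this thread does not confirm: that you have a git checkout of the 18.0 branch, and that the backport was made with `git cherry-pick -x`, which records "(cherry picked from commit <sha>)" in the commit message. The helper `find_backports` is hypothetical; it scans the plain output of `git log 18.0` for commits that cite the upstream hash. The commit hashes on the `commit` lines of the sample log are made-up placeholders, not real Mesa commits.

```python
import re

# Upstream commit hash quoted earlier in this thread.
UPSTREAM = "d15fb766aa3c98ffbe16d050b2af4804e4b12c57"


def find_backports(git_log_text, upstream_sha):
    """Return hashes of commits whose message mentions upstream_sha.

    Expects the default `git log` format: a "commit <sha>" header at
    column 0, followed by an indented commit message.
    """
    found = []
    current = None
    for line in git_log_text.splitlines():
        header = re.match(r"commit ([0-9a-f]{7,40})\b", line)
        if header:
            current = header.group(1)
        elif current and upstream_sha in line and current not in found:
            found.append(current)
    return found


# Placeholder log text for illustration only.
SAMPLE_LOG = f"""\
commit 1111111111111111111111111111111111111111
Author: Example Author <author@example.org>

    example: some backported fix

    (cherry picked from commit {UPSTREAM})

commit 2222222222222222222222222222222222222222
Author: Example Author <author@example.org>

    example: unrelated change
"""

print(find_backports(SAMPLE_LOG, UPSTREAM))
```

An empty result would not prove that the fix is missing, since a backport applied without `-x` carries no such reference; comparing the patched lines in si_pipe.c by hand remains the fallback.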
--- Comment #34 from mikhail.v.gavrilov@gmail.com --- Created attachment 139363 --> https://bugs.freedesktop.org/attachment.cgi?id=139363&action=edit my si_pipe.c file
--- Comment #35 from mikhail.v.gavrilov@gmail.com --- Looks like my si_pipe.c is already patched. But my GPU still hangs every time I try to get past one particular place in the game Rise of the Tomb Raider.
--- Comment #36 from mikhail.v.gavrilov@gmail.com --- Created attachment 139364 --> https://bugs.freedesktop.org/attachment.cgi?id=139364&action=edit Here GPU VEGA always hungs
Marek Olšák maraeo@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
--- Comment #37 from Marek Olšák maraeo@gmail.com --- Rise of Tomb Raider is a Vulkan game. You can file a RADV bug for it. I'm closing this since you are not reporting any issues with Youtube.
--- Comment #38 from mikhail.v.gavrilov@gmail.com --- (In reply to Marek Olšák from comment #37)
> Rise of Tomb Raider is a Vulkan game. You can file a RADV bug for it. I'm closing this since you are not reporting any issues with Youtube.
My bug report about the game Rise of Tomb Raider was unfortunately closed without an explanation of which patches are needed to fix the hang: https://bugs.freedesktop.org/show_bug.cgi?id=106196
And the symptoms of the problem are the same as in this bug report:
- The system stops responding.
- All the LEDs on the video card showing power consumption start to glow.
- The turbine on the video card starts to make a lot of noise.
A long time ago the Intel driver used to hang too: https://bugs.freedesktop.org/show_bug.cgi?id=54226 but the Intel developers added a GPU reset for such situations. Why not add a GPU reset for AMD GPUs as well?
--- Comment #39 from Marek Olšák maraeo@gmail.com --- We are working on the GPU reset, we just don't have any ETA. The GPU reset is something you can't rely on to save you. In most cases, a successful GPU reset needs a complete restart of X or Wayland, so you'll lose the whole desktop and all running desktop applications. You are likely to get into an infinite GPU-hang+GPU-reset loop if the driver doesn't kill all apps. With the current Linux desktop architecture that isn't aware of GPU resets, a GPU reset is mostly unusable.
The implementation of the GPU reset is secondary to making sure that GPU hangs don't occur. Thus, bugs about GPU hangs are only about fixing GPU hangs. Rise Of The Tomb Raider is a Vulkan game, so any hangs within the game are RADV bugs. Filing a bug against DRM/AMDgpu for a GPU hang within that game is less effective than filing a bug against RADV.
--- Comment #40 from mikhail.v.gavrilov@gmail.com --- Ok, now video playback results in a hangup without a steam client if vaapi is used: https://bugs.freedesktop.org/show_bug.cgi?id=106430