New subject: [Bug 104001] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=6582, last emitted seq=6584

30 Nov 2017


https://bugs.freedesktop.org/show_bug.cgi?id=104001

            Bug ID: 104001
           Summary: GPU driver hung when start steam client while playback
                    video on Youtube (it occurs on latest staging kernel)
           Product: DRI
           Version: XOrg git
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: mikhail.v.gavrilov@gmail.com
Created attachment 135839
  --> https://bugs.freedesktop.org/attachment.cgi?id=135839&action=edit
dmesg
* Fedora 27 - https://download.fedoraproject.org/pub/fedora/linux/releases/27/Workstation/...
* staging kernel - git://people.freedesktop.org/~agd5f/linux branch amd-staging-drm-next
* mesa 17.4 and llvm 6.0 - https://copr.fedorainfracloud.org/coprs/che/mesa/
Steps to reproduce:
1) Start video playback on YouTube in a browser (Firefox or Opera, it
does not matter)
2) Launch the Steam client
After a few seconds the GPU driver hangs.
Demonstration: https://youtu.be/2LuWI47oCFg
If we wait more than two minutes after the hang, we get the following backtrace:
[  492.840627] INFO: task kworker/u16:5:147 blocked for more than 120 seconds.
[  492.840641]       Not tainted 4.14.0-rc3-amd-vega+ #5
[  492.840644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  492.840648] kworker/u16:5   D11392   147      2 0x80000000
[  492.840662] Workqueue: events_unbound commit_work [drm_kms_helper]
[  492.840666] Call Trace:
[  492.840674]  __schedule+0x2dc/0xbb0
[  492.840681]  schedule+0x33/0x90
[  492.840694]  schedule_timeout+0x288/0x5c0
[  492.840701]  ? mark_held_locks+0x57/0x80
[  492.840704]  ? _raw_spin_unlock_irqrestore+0x36/0x60
[  492.840713]  dma_fence_default_wait+0x22a/0x380
[  492.840716]  ? dma_fence_default_wait+0x22a/0x380
[  492.840720]  ? dma_fence_release+0x170/0x170
[  492.840725]  dma_fence_wait_timeout+0x4f/0x270
[  492.840729]  reservation_object_wait_timeout_rcu+0x18d/0x510
[  492.840768]  amdgpu_dm_do_flip+0x12b/0x390 [amdgpu]
[  492.840801]  amdgpu_dm_atomic_commit_tail+0xbe1/0xe80 [amdgpu]
[  492.840815]  commit_tail+0x3f/0x70 [drm_kms_helper]
[  492.840820]  commit_work+0x12/0x20 [drm_kms_helper]
[  492.840824]  process_one_work+0x26b/0x6c0
[  492.840832]  worker_thread+0x35/0x3b0
[  492.840837]  kthread+0x171/0x190
[  492.840840]  ? process_one_work+0x6c0/0x6c0
[  492.840843]  ? kthread_create_on_node+0x70/0x70
[  492.840847]  ? kthread_create_on_node+0x70/0x70
[  492.840850]  ret_from_fork+0x2a/0x40
[  492.840906] INFO: task amdgpu_cs:0:2013 blocked for more than 120 seconds.
[  492.840909]       Not tainted 4.14.0-rc3-amd-vega+ #5
[  492.840912] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  492.840915] amdgpu_cs:0     D13312  2013   1981 0x00000000
[  492.840921] Call Trace:
[  492.840926]  __schedule+0x2dc/0xbb0
[  492.840931]  ? save_stack_trace+0x1b/0x20
[  492.840936]  schedule+0x33/0x90
[  492.840939]  schedule_timeout+0x288/0x5c0
[  492.840944]  ? mark_held_locks+0x57/0x80
[  492.840947]  ? _raw_spin_unlock_irqrestore+0x36/0x60
[  492.840951]  ? trace_hardirqs_on_caller+0xf4/0x190
[  492.840957]  dma_fence_default_wait+0x22a/0x380
[  492.840960]  ? dma_fence_default_wait+0x22a/0x380
[  492.840964]  ? dma_fence_release+0x170/0x170
[  492.840969]  dma_fence_wait_timeout+0x4f/0x270
[  492.840987]  amdgpu_ctx_wait_prev_fence+0x4a/0x80 [amdgpu]
[  492.841003]  amdgpu_cs_ioctl+0xaf/0x1eb0 [amdgpu]
[  492.841038]  ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
[  492.841051]  drm_ioctl_kernel+0x5d/0xb0 [drm]
[  492.841060]  drm_ioctl+0x31b/0x3d0 [drm]
[  492.841074]  ? amdgpu_cs_find_mapping+0x110/0x110 [amdgpu]
[  492.841081]  ? trace_hardirqs_on_caller+0xf4/0x190
[  492.841085]  ? trace_hardirqs_on+0xd/0x10
[  492.841101]  amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
[  492.841107]  do_vfs_ioctl+0xa6/0x6c0
[  492.841115]  SyS_ioctl+0x79/0x90
[  492.841120]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[  492.841123] RIP: 0033:0x7f7eb9078dc7
[  492.841125] RSP: 002b:00007f7eb0c479b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  492.841130] RAX: ffffffffffffffda RBX: 00000000025f5ab8 RCX: 00007f7eb9078dc7
[  492.841132] RDX: 00007f7eb0c47a20 RSI: 00000000c0186444 RDI: 000000000000000b
[  492.841134] RBP: 000000000266f860 R08: 00007f7eb0c47ad0 R09: 00007f7eb0c47a00
[  492.841136] R10: 00007f7eb0c47ad0 R11: 0000000000000246 R12: 0000000000000007
[  492.841138] R13: 0000000000000001 R14: 00000000025f5ab8 R15: 0000000000000000
[  492.841171] INFO: task kworker/3:3:2263 blocked for more than 120 seconds.
[  492.841174]       Not tainted 4.14.0-rc3-amd-vega+ #5
[  492.841177] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  492.841180] kworker/3:3     D13448  2263      2 0x80000000
[  492.841190] Workqueue: events ttm_bo_delayed_workqueue [ttm]
[  492.841194] Call Trace:
[  492.841199]  __schedule+0x2dc/0xbb0
[  492.841206]  schedule+0x33/0x90
[  492.841209]  schedule_preempt_disabled+0x15/0x20
[  492.841212]  __ww_mutex_lock.constprop.9+0xa6f/0x10a0
[  492.841216]  ? __lock_is_held+0x59/0xa0
[  492.841220]  ? ttm_bo_delayed_delete+0x108/0x1b0 [ttm]
[  492.841228]  ww_mutex_lock+0x5e/0x70
[  492.841231]  ? ww_mutex_lock+0x5e/0x70
[  492.841235]  ttm_bo_delayed_delete+0x108/0x1b0 [ttm]
[  492.841243]  ttm_bo_delayed_workqueue+0x1b/0x40 [ttm]
[  492.841246]  process_one_work+0x26b/0x6c0
[  492.841253]  worker_thread+0x35/0x3b0
[  492.841259]  kthread+0x171/0x190
[  492.841262]  ? process_one_work+0x6c0/0x6c0
[  492.841264]  ? kthread_create_on_node+0x70/0x70
[  492.841269]  ret_from_fork+0x2a/0x40
[  492.841323] 
               Showing all locks held in the system:
[  492.841332] 1 lock held by khungtaskd/62:
[  492.841336]  #0:  (tasklist_lock){.+.+}, at: [<ffffffffba1116ed>] debug_show_all_locks+0x3d/0x1a0
[  492.841353] 3 locks held by kworker/u16:5/147:
[  492.841355]  #0:  ("events_unbound"){+.+.}, at: [<ffffffffba0ceb41>] process_one_work+0x1e1/0x6c0
[  492.841365]  #1:  ((&state->commit_work)){+.+.}, at: [<ffffffffba0ceb41>] process_one_work+0x1e1/0x6c0
[  492.841376]  #2:  (reservation_ww_class_mutex){+.+.}, at: [<ffffffffc03bbbea>] amdgpu_dm_do_flip+0xea/0x390 [amdgpu]
[  492.841451] 1 lock held by gnome-shell/1981:
[  492.841453]  #0:  (reservation_ww_class_mutex){+.+.}, at: [<ffffffffc0273626>] ttm_bo_vm_fault+0x66/0x5d0 [ttm]
[  492.841468] 1 lock held by amdgpu_cs:0/2013:
[  492.841470]  #0:  (&ctx->lock){+.+.}, at: [<ffffffffc02e4d4e>] amdgpu_cs_ioctl+0x59e/0x1eb0 [amdgpu]
[  492.841513] 3 locks held by kworker/3:3/2263:
[  492.841515]  #0:  ("events"){+.+.}, at: [<ffffffffba0ceb41>] process_one_work+0x1e1/0x6c0
[  492.841526]  #1:  ((&(&bdev->wq)->work)){+.+.}, at: [<ffffffffba0ceb41>] process_one_work+0x1e1/0x6c0
[  492.841536]  #2:  (reservation_ww_class_mutex){+.+.}, at: [<ffffffffc026f948>] ttm_bo_delayed_delete+0x108/0x1b0 [ttm]
[  492.841623] 1 lock held by steamerrorrepor/4598:
[  492.841625]  #0:  (drm_global_mutex){+.+.}, at: [<ffffffffc01f86ab>] drm_release+0x3b/0x3b0 [drm]
[  492.841644] =============================================
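
For triage, the blocked tasks and the lock holders in the dump above can be
pulled out mechanically. The following is only a rough Python sketch and is
not part of the original report: the function name summarize and the three
regular expressions are assumptions about the dmesg layout shown here, and
SAMPLE is a handful of lines copied from the log above. It tolerates the
wrapped lock lines because it only looks at the part before "at:".

```python
import re
from collections import defaultdict

# "[  492.840627] INFO: task kworker/u16:5:147 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# "[  492.841353] 3 locks held by kworker/u16:5/147:"
HOLDER_RE = re.compile(r"(?P<count>\d+) locks? held by (?P<comm>.+)/(?P<pid>\d+):")
# "[  492.841376]  #2:  (reservation_ww_class_mutex){+.+.}, at:"
LOCK_RE = re.compile(r"#\d+:\s+\((?P<name>.+)\)\{")


def summarize(dmesg_text):
    """Return (hung_tasks, holders_by_lock_class) for a hung-task dmesg dump.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    hung = []                    # (comm, pid) of tasks reported as blocked
    holders = defaultdict(list)  # lock class name -> [(comm, pid), ...]
    current = None               # task whose lock list is being read
    for line in dmesg_text.splitlines():
        m = HUNG_RE.search(line)
        if m:
            hung.append((m["comm"], int(m["pid"])))
            continue
        m = HOLDER_RE.search(line)
        if m:
            current = (m["comm"], int(m["pid"]))
            continue
        m = LOCK_RE.search(line)
        if m and current is not None:
            holders[m["name"]].append(current)
    return hung, dict(holders)


# A few lines copied from the log above (lock lines kept in wrapped form).
SAMPLE = """\
[  492.840627] INFO: task kworker/u16:5:147 blocked for more than 120 seconds.
[  492.840906] INFO: task amdgpu_cs:0:2013 blocked for more than 120 seconds.
[  492.841171] INFO: task kworker/3:3:2263 blocked for more than 120 seconds.
[  492.841353] 3 locks held by kworker/u16:5/147:
[  492.841376]  #2:  (reservation_ww_class_mutex){+.+.}, at:
[  492.841451] 1 lock held by gnome-shell/1981:
[  492.841453]  #0:  (reservation_ww_class_mutex){+.+.}, at:
[  492.841513] 3 locks held by kworker/3:3/2263:
[  492.841536]  #2:  (reservation_ww_class_mutex){+.+.}, at:
"""

if __name__ == "__main__":
    hung, holders = summarize(SAMPLE)
    print("blocked tasks:", hung)
    for name, tasks in holders.items():
        print(name, "->", tasks)
```

Note that the names in braces are lock classes, so several tasks listed under
reservation_ww_class_mutex are not necessarily holding the same buffer
object's reservation lock.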