https://bugs.freedesktop.org/show_bug.cgi?id=96445
Bug ID: 96445
Summary: [amdgpu][tonga] display freezes soon after X start
Product: DRI
Version: DRI git
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: csaba.halasz@gmail.com
Starting X and doing ordinary desktop work leads to a display freeze within minutes. Most of the time nothing shows up in syslog, but while bisecting I saw this message at least once, which might be unrelated:
[drm:process_one_work] *ERROR* ring sdma1 timeout, last signaled seq=144370, last emitted seq=144370
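For anyone checking their own logs for the same message, here is a minimal Python sketch that pulls the ring name and the two sequence numbers out of a line like the one above. The function name `parse_ring_timeout` and the regular expression are hypothetical and derived only from this single quoted line; other kernel versions may word the message differently.

```python
import re

# Pattern modelled on the one timeout line quoted in this report (an assumption,
# not an official format description).
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"last signaled seq=(?P<signaled>\d+), "
    r"last emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line):
    """Return (ring, signaled, emitted), or None if the line does not match."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    return m.group("ring"), int(m.group("signaled")), int(m.group("emitted"))


line = ("[drm:process_one_work] *ERROR* ring sdma1 timeout, "
        "last signaled seq=144370, last emitted seq=144370")
ring, signaled, emitted = parse_ring_timeout(line)

# Difference between what was emitted and what was signaled on this ring.
pending = emitted - signaled
```

For the line quoted here both sequence numbers are identical, so `pending` is 0.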
Initially I got an RCU stall message:
INFO: rcu_sched detected stalls on CPUs/tasks:
 3-...: (13593 GPs behind) idle=561/140000000000000/0 softirq=0/0 fqs=14996
 5-...: (13593 GPs behind) idle=2e7/140000000000000/0 softirq=0/0 fqs=14996
 (detected by 0, t=15002 jiffies, g=13293, c=13292, q=0)
Task dump for CPU 3:
sdma1 R running task 0 499 2 0x00080008
 ffffffff810f83fb ffff8803e02de580 ffffffffa0333d01 ffff8800b9d1c110
 ffff8800b89a3c00 ffff8800b89a3cd8 0000000000000000 ffff8800b9d1c110
 ffffffffa036812b ffff8800b9fd58b8 ffffffff813c66bf ffff8800b9fd58b8
Call Trace:
 [<ffffffff810f83fb>] ? kmem_cache_free+0xab/0xc0
 [<ffffffffa0333d01>] ? amdgpu_sync_get_fence+0x51/0xc0 [amdgpu]
 [<ffffffffa036812b>] ? amdgpu_job_dependency+0x2b/0xb0 [amdgpu]
 [<ffffffff813c66bf>] ? _raw_spin_lock_irqsave+0x1f/0x30
 [<ffffffffa0367643>] ? amd_sched_main+0x1a3/0x3f0 [amdgpu]
 [<ffffffff81077310>] ? add_wait_queue+0x60/0x60
 [<ffffffffa03674a0>] ? amd_sched_process_job+0x70/0x70 [amdgpu]
 [<ffffffff8105fbbc>] ? kthread+0xbc/0xe0
 [<ffffffff813c6b02>] ? ret_from_fork+0x22/0x40
 [<ffffffff8105fb00>] ? kthread_stop+0x70/0x70
Task dump for CPU 5:
kworker/5:1 R running task 0 143 2 0x00080008
Workqueue: events amd_sched_job_finish [amdgpu]
 ffff88043ed57900 0000000000000000 ffffffff81059ae2 0000000000000018
 ffff88042b4e0000 ffff88042b982d00 ffff88043ed53420 ffff88042b982d00
 ffff88042b982d00 ffff88043ed53400 ffff8800ba673d80 ffff8800ba673db0
Call Trace:
 [<ffffffff81059ae2>] ? process_one_work+0x132/0x350
 [<ffffffff8105b3fe>] ? worker_thread+0x11e/0x430
 [<ffffffff8105b2e0>] ? create_worker+0x180/0x180
 [<ffffffff8105fbbc>] ? kthread+0xbc/0xe0
 [<ffffffff813c6b02>] ? ret_from_fork+0x22/0x40
 [<ffffffff8105fb00>] ? kthread_stop+0x70/0x70
While bisecting I probably did not wait long enough for the RCU stall message to show up (the stall timeout is apparently configured for 1 minute). According to git bisect:
8df07daf3952b7606e2d17076198ec3fb38ab1f1 is the first bad commit
commit 8df07daf3952b7606e2d17076198ec3fb38ab1f1
Date: Thu May 19 09:54:15 2016 +0200

    drm/amdgpu: fix and cleanup job destruction
kernel agd5f/drm-next-4.8-wip
mesa git 65c2abf6fdd51b0a80a72caa0c52cf3f4578e743
llvm git ef1f2996c17c9b1480201239002b58851810e8fc
xf86-video-amdgpu git 60ced5026ebc34d9f32c7618430b6a7ef7c8eb4b
Xorg 1.18.0
mplayer svn r37870
gigabyte 380 (tonga)
--- Comment #1 from Christian König deathsimple@vodafone.de ---
Yeah, we stumbled over that problem internally as well and are already working on it.
--- Comment #2 from Nicolai Hähnle nhaehnle@gmail.com ---
Created attachment 124493
  --> https://bugs.freedesktop.org/attachment.cgi?id=124493&action=edit
avoid schedule() during spinlock
Hi Csaba! The attached patch doesn't fix the problem for me, but it seems correct and at least changes the symptoms. Maybe it helps on your system?
Christian König deathsimple@vodafone.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #124493|0                           |1
        is obsolete|                            |
--- Comment #3 from Christian König deathsimple@vodafone.de ---
Created attachment 124495
  --> https://bugs.freedesktop.org/attachment.cgi?id=124495&action=edit
Possible fix
Complete fix for the issue, thanks to Nicolai for pointing me in the right direction.
--- Comment #4 from Nicolai Hähnle nhaehnle@gmail.com ---
This patch does the trick. I've run my stress test for about an hour, so it's safe to say that it's fixed - feel free to add my Tested-by.
Christian König deathsimple@vodafone.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED