Tejun/Jerome (and radeon devs):
I'd like to bring a suspend/resume radeon bug full circle (see http://thread.gmane.org/gmane.linux.kernel/1209587 for the complete thread, and Tejun's excellent summary below).
The problem was triggered by new input serio driver code (commit 8ee294cd9def000 found through bisection). Don't ask me why, but that set it off.
In a nutshell, X would intermittently lock up across suspend/resume cycles.
The issue persisted at least through 3.1.5. I skipped 3.2.x altogether and went straight to 3.3.4, where the problem seems to be gone.
What changed in the radeon codebase since 3.1.x that might have fixed this, either intentionally or as a side effect? Maybe 721604a15b934f or 9fc04b503df9a3?
Thanks.
~ Andy
----- Forwarded message from Tejun Heo tj@kernel.org -----
Date: Fri, 4 Nov 2011 09:14:31 -0700
From: Tejun Heo <tj@kernel.org>
Subject: Re: [REGRESSION]: hibernate/sleep regression w/ bisection
To: Andrew Watts <akwatts@ymail.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>, linux-kernel@vger.kernel.org, linux-pm@lists.linux-foundation.org, David Airlie <airlied@linux.ie>, dri-devel@lists.freedesktop.org
(cc'ing David Airlie and dri-devel)
Hello, the original thread can be read from
http://thread.gmane.org/gmane.linux.kernel/1209587
Full sysrq-t output at
http://article.gmane.org/gmane.linux.kernel/1211256
So, the problem is that after a seemingly unrelated update to the input serio driver (converting it to use a workqueue), X sporadically locks up across suspend/resume cycles.
I went through the full sysrq-t output but couldn't spot anything suspicious elsewhere. No worker is stuck and nobody is waiting for a flush to finish.
Stack trace for X follows.
X               S f499b944  5800  1652   1651 0x00400080
 f499b9a8 00003086 00000000 f499b944 c100d4a4 00000000 00000000 f499b958
 00000000 f499b9a8 f5173140 d7857c56 00000057 f5173140 d8b69880 00000057
 00000001 00000000 f499b9b4 c104dd89 000f4240 00000000 00000000 f499ba68
Call Trace:
 [<c1291301>] ttm_bo_wait_unreserved+0x5f/0x106
 [<c129145f>] ttm_bo_reserve_locked+0xb7/0xe1
 [<c1292c27>] ttm_bo_reserve+0x26/0x95
 [<c12c3c97>] radeon_crtc_do_set_base+0xbd/0x6d2
 [<c12c42e7>] radeon_crtc_set_base+0x1b/0x1d
 [<c12c430d>] radeon_crtc_mode_set+0x24/0xdd7
 [<c1279c57>] drm_crtc_helper_set_mode+0x32c/0x48b
 [<c1279e2f>] drm_helper_resume_force_mode+0x79/0x23e
 [<c12ace10>] radeon_gpu_reset+0x84/0x98
 [<c12c0838>] radeon_fence_wait+0x2d1/0x311
 [<c12c0e37>] radeon_sync_obj_wait+0xc/0xe
 [<c12908be>] ttm_bo_wait+0xa1/0x108
 [<c12d6e7b>] radeon_gem_wait_idle_ioctl+0x76/0xc4
 [<c127e62e>] drm_ioctl+0x1c2/0x42c
 [<c10e288e>] do_vfs_ioctl+0x79/0x54b
 [<c10e2dcb>] sys_ioctl+0x6b/0x70
 [<c1593813>] sysenter_do_call+0x12/0x22
Do you guys have any idea what's going on? X seems to be waiting for bo->reserved to drop to zero. Is it possible that something in there forgets to kick a work item after resume, causing the wait to stall?
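To make the suspected failure mode concrete, here is a userspace analogy in C, assuming a POSIX threads environment. It is not TTM code: `struct model_bo`, `model_bo_unreserve()` and `model_bo_wait_unreserved()` are hypothetical names, and the timeout exists only so that the stall shows up as ETIMEDOUT instead of a hang. A waiter sleeps until `reserved` is cleared; if the path that should clear it and wake the waiters never runs, the waiter never makes progress.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for a buffer object's reservation.
 * Illustrative only; this is not the kernel's TTM implementation. */
struct model_bo {
    pthread_mutex_t lock;
    pthread_cond_t  event;     /* plays the role of a wait queue */
    bool            reserved;  /* plays the role of bo->reserved */
};

void model_bo_init(struct model_bo *bo, bool reserved)
{
    assert(bo != NULL);
    pthread_mutex_init(&bo->lock, NULL);
    pthread_cond_init(&bo->event, NULL);
    bo->reserved = reserved;
}

/* The holder must clear the flag AND wake the waiters. If this is never
 * called (e.g. the deferred work that would call it is never queued),
 * every waiter in model_bo_wait_unreserved() stays asleep. */
void model_bo_unreserve(struct model_bo *bo)
{
    pthread_mutex_lock(&bo->lock);
    bo->reserved = false;
    pthread_cond_broadcast(&bo->event);
    pthread_mutex_unlock(&bo->lock);
}

/* Wait for reserved to become false, giving up after timeout_ms.
 * Returns 0 on success, ETIMEDOUT if the object is still reserved. */
int model_bo_wait_unreserved(struct model_bo *bo, long timeout_ms)
{
    struct timespec deadline;
    /* Condition variables use CLOCK_REALTIME unless configured otherwise. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&bo->lock);
    int rc = 0;
    /* Re-check the predicate after every wakeup; spurious wakeups are allowed.
     * Stop on any non-zero return (normally ETIMEDOUT). */
    while (bo->reserved && rc == 0)
        rc = pthread_cond_timedwait(&bo->event, &bo->lock, &deadline);
    int result = bo->reserved ? ETIMEDOUT : 0;
    pthread_mutex_unlock(&bo->lock);
    return result;
}
```

Calling `model_bo_wait_unreserved()` on an object that nobody unreserves returns ETIMEDOUT; once `model_bo_unreserve()` has run, the same call returns 0.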
Andrew, can you please kill the X server after the hang and see whether that brings the system back? I think sshd should still work; if not, you can write a script that kills the X server 30 seconds after resume (and kill that script if resume succeeds).
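A minimal sketch of the watchdog suggested above, written in C and assuming the X server's PID is already known and passed in by the caller (for example, looked up with pidof before suspending). `kill_after_delay()` is a hypothetical helper, and sending SIGTERM rather than SIGKILL is a choice made here, not something from the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>   /* for callers that reap the target, e.g. in tests */
#include <unistd.h>

/* Hypothetical helper: wait `seconds` after being started (e.g. right after
 * resume), then deliver `sig` to `pid`. Returns 0 if the signal was sent,
 * -1 with errno set otherwise (ESRCH if the process is already gone). */
int kill_after_delay(pid_t pid, unsigned int seconds, int sig)
{
    /* pid <= 0 would signal a process group or every process; refuse. */
    if (pid <= 0) {
        errno = EINVAL;
        return -1;
    }
    assert(sig > 0); /* a real signal, not the sig-0 existence probe */

    /* sleep() may return early if interrupted; keep sleeping the remainder. */
    unsigned int left = seconds;
    while (left > 0)
        left = sleep(left);

    return kill(pid, sig);
}
```

Started right after resume, e.g. as `kill_after_delay(x_pid, 30, SIGTERM)`, it fires only if nobody intervenes; if resume succeeds, killing the watchdog process itself keeps it from ever sending the signal.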
Thank you.