On 08/28/2013 03:15 AM, Ben Skeggs wrote:
On Wed, Aug 28, 2013 at 6:12 AM, Peter Hurley peter@hurleysoftware.com wrote:
This series was originally motivated by a deadlock, introduced in commit 1d7c71a3e2f77336df536855b0efd2dc5bdeb41b 'drm/nouveau/disp: port vblank handling to event interface', due to inverted lock order between nouveau_drm_vblank_enable() and nouveau_drm_vblank_handler() (the complete lockdep report is included in the patch 4/5 changelog).
Hey Peter,
Thanks for the patch series. I've only taken a cursory glance through the patches thus far, as they're definitely too intrusive for -fixes at this point. I will take a proper look through the series within the next week or so, I have some work pending which may possibly make some of these changes unnecessary, though from what I can tell, there's other good bits here we'll want.
In a previous mail on the locking issue, you mentioned the drm core doing the "wrong thing" here too, did you have any further thoughts on correcting that issue too?
I'm working on the premise that drm_handle_vblank() can be lockless; that would minimize the api disruption. I still have a fair bit of analysis to do, but some early observations:
1) The vbl_time_lock spinlock is unnecessary -- drm_handle_vblank() could be protected with vbl_lock, since everywhere else vbl_time_lock is held, vbl_lock is already held.
2) The _vblank_count[crtc] doesn't need to be atomic. All the code paths that update _vblank_count[] are already spinlocked. Even if the code weren't spinlocked, the atomic_inc/_adds aren't necessary; only the memory barriers and read/write order of the vblank count/timestamp tuple is relevant.
3) Careful enabling of drm_handle_vblank() is sufficient to solve the concurrency between drm_handle_vblank() and drm_vblank_get() without needing a spinlock for exclusion. drm_handle_vblank() would need to account for the possibility of having missed a vblank interrupt between the reading of the hw vblank counter and the enabling drm_handle_vblank().
4) I'm still thinking about how/whether to exclude drm_handle_vblank() and vblank_disable_and_save(). One thought is to replace the timeout timer with delayed work instead so that vblank_disable_and_save() could wait for drm_handle_vblank() to finish after disabling it. [I'd also need to verify that the drm drivers other than intel which use drm_vblank_off() do so from non-atomic context.]
I realize that I'm mostly referring to the hw counter/timestamp flavor of vblank handling; that's primarily because it requires a more rigorous handling and has race conditions that don't apply to the software-only counter.
Regards, Peter Hurley