On Thu, Oct 27, 2011 at 10:19:39PM -0500, Ilija Hadzic wrote:
On Thu, 27 Oct 2011, Daniel Vetter wrote:
The only really hairy thing going on is racing modeset ioctls against vblank waiters. But modeset is protected by the dev->mode_config.mutex and hence already races against wait_vblank with the current code, so I'm slightly inclined to ignore this for the moment. Would be great if you coudl check that in-depth, too.
Will leave this for some other patch at some other time; the critical path is to fix the hang/crawl that I explained in my previous note.
Agreed, one thing at a time. It's complicated enough as is.
So I think the right thing to do is
- Kill dev->last_vblank_wait (in a prep patch).
Agreed. Also drm_vblank_info function can go away
- Imo we also have a few too many atomic ops going on, e.g.
dev->vblank_refcount looks like it's atomic only because or the procfs file, so maybe kill that procfs file entirely.
I am not sure about that. dev->vblank_refcount looks critical to me: it is incremented through drm_vblank_get which is called from several different contexts: page-flip functions in several drivers (which can run in the context of multiple user processes), **_pm_** stuff in radeon driver (which looks like it runs in the context of kernel worker thread). Mutexes used at all these different places are not quite consistent. If anything, as the result of a later audit, some other mutexes may be rendered unecessary, but definitely, I would keep vblank_refcount atomic.
I've re-read the code a bit and in _get it's protected by dev->vbl_lock, but not so in _put (because we issue a timer call to actually switch it off). I think we could just shove it under the protection of dev->vbl_lock completely. Another fuzzzy area is the relation between dev->vbl_lock and dev->vblank_time_lock. The latter looks a bit rendundant.
- Audit the vblank locking, maybe resulting in a patch with updated
comments, if there's anything misleading or tricky going on. And it's always good when the locking of structe members is explained in the header file ...
I'll add comments that I feel make sense for this set of patches (if anything). Full audit, should come later at a slower pace. There are quite a few mutexes and locks many of which are overlapping in function and some are spurious. It will take more than just myself to untangle this know.
Yeah, agreed. Let's first drop the mutex around this and untangle the spinlock/atomic mess in a second step. This is too hairy.
- Drop the mutex locking because it's not needed.
Agreed. Will drop.
While locking at the code I also noticed that wait_vblank calls drm_vblank_get, but not always drm_vblank_put. The code is correct, the missing vblank_put is hidden in drm_queue_vblank_event. Can you also write the patch to move that around to not trip up reviewers the next time around?
Actually, there is something fishy here. Look at the 'else' branch under the 'if ((seq - vblwait->request.sequence) <= (1 << 23))' at the end of the drm_queue_vblank_event. It doesn't have the 'drm_vblank_put'. It looks wrong to me, but I need to first convince myself that there is not some other obscure place where drm_vblank_put is accounted for. If that branch is really missing a drm_vblank_put, then it will be easy pull the drm_vblank_put out. Otherwise, it will be another knot to untangle.
I've re-read the code and agree with your follow-up analysis.