Message: 5
Date: Fri, 28 Oct 2011 11:30:38 +0200
From: Daniel Vetter <daniel@ffwll.ch>
Subject: Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
To: Ilija Hadzic <ihadzic@research.bell-labs.com>
Cc: dri-devel@lists.freedesktop.org
Message-ID: <20111028093038.GB2919@phenom.ffwll.local>
Content-Type: text/plain; charset=us-ascii
> - Imo we also have a few too many atomic ops going on, e.g. dev->vblank_refcount looks like it's atomic only because of the procfs file, so maybe kill that procfs file entirely.
I am not sure about that. dev->vblank_refcount looks critical to me: it is incremented through drm_vblank_get, which is called from several different contexts: page-flip functions in several drivers (which can run in the context of multiple user processes) and the pm stuff in the radeon driver (which looks like it runs in the context of a kernel worker thread). The mutexes used at all these different places are not quite consistent. If anything, a later audit may render some other mutexes unnecessary, but I would definitely keep vblank_refcount atomic.
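A minimal sketch of the pattern in question (made-up names, not the actual drm_irq.c code): the refcount wants to stay atomic because _get is called from several process contexts, while the matching _put often runs from irq context, where sleeping locks are off limits.

#include <linux/atomic.h>
#include <linux/spinlock.h>

struct vbl_sketch {
	atomic_t refcount;	/* users that currently need vblank irqs */
	spinlock_t vbl_lock;	/* guards the irq on/off decision */
};

static void sketch_vblank_get(struct vbl_sketch *v)
{
	unsigned long flags;

	spin_lock_irqsave(&v->vbl_lock, flags);
	if (atomic_add_return(1, &v->refcount) == 1) {
		/* first user: enable the hw vblank irq here */
	}
	spin_unlock_irqrestore(&v->vbl_lock, flags);
}

/* Callable from irq context: just the atomic op, no sleeping lock. */
static void sketch_vblank_put(struct vbl_sketch *v)
{
	atomic_dec(&v->refcount);
	/* the actual irq disable is deferred via a timer, see further down */
}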
Hi,
be careful with vblank_refcount. I think it probably should stay atomic. drm_vblank_put() is often used in the interrupt handlers of the kms drivers, where you don't want to wait unnecessarily on a lock, but be as quick as possible.
> I've re-read the code a bit and in _get it's protected by dev->vbl_lock, but not so in _put (because we issue a timer call to actually switch it off). I think we could just shove it under the protection of dev->vbl_lock completely. Another fuzzy area is the relation between dev->vbl_lock and dev->vblank_time_lock. The latter looks a bit redundant.
The vblank_time_lock serves a different purpose from vbl_lock. It is used in the vblank irq handler drm_handle_vblank() and in the vblank irq on/off functions to protect the vblank timestamping and counting against races between the irq handler and the on/off code, and some funny races between the cpu reinitializing vblank counts and timestamps in the non-irq path and the gpu firing vblank interrupts and/or updating its hardware vblank counter at the "wrong" moment. vblank_time_lock basically blocks/delays gpu vblank irq processing during such problematic periods. Timing there is critical for reliable vblank timestamping, kms page flipping and artifact-free dynamic gpu power management. It should be taken as seldom as possible and held as briefly as possible. I initially tried to get away with only vbl_lock, but there were uses of vbl_lock that looked as if using it in the interrupt handler as well could cause long delays in irq processing.
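To illustrate the race being closed (a sketch with assumed field and function names, not the real drm code): the on/off path rewrites the software count and timestamp from the hardware counter, and vblank_time_lock keeps the irq handler from sampling that state half-updated.

#include <linux/spinlock.h>
#include <linux/ktime.h>

static DEFINE_SPINLOCK(vblank_time_lock);
static u32 sw_vblank_count;
static ktime_t sw_vblank_time;

/* Called from the driver's vblank interrupt handler. */
static void sketch_handle_vblank(void)
{
	unsigned long flags;

	spin_lock_irqsave(&vblank_time_lock, flags);
	sw_vblank_count++;
	sw_vblank_time = ktime_get();	/* timestamp paired with the count */
	spin_unlock_irqrestore(&vblank_time_lock, flags);
}

/* Called when vblank irqs are switched back on after being off. */
static void sketch_vblank_on(u32 hw_counter_now)
{
	unsigned long flags;

	/* block irq-side processing while both fields are rewritten */
	spin_lock_irqsave(&vblank_time_lock, flags);
	sw_vblank_count = hw_counter_now;
	sw_vblank_time = ktime_get();
	spin_unlock_irqrestore(&vblank_time_lock, flags);
}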
thanks, -mario
********************************************************************* Mario Kleiner Max Planck Institute for Biological Cybernetics Spemannstr. 38 72076 Tuebingen Germany
e-mail: mario.kleiner@tuebingen.mpg.de office: +49 (0)7071/601-1623 fax: +49 (0)7071/601-616 www: http://www.kyb.tuebingen.mpg.de/~kleinerm ********************************************************************* "For a successful technology, reality must take precedence over public relations, for Nature cannot be fooled." (Richard Feynman)
On Fri, Oct 28, 2011 at 08:15:11PM +0200, Mario Kleiner wrote:
> be careful with vblank_refcount. I think it probably should stay atomic. drm_vblank_put() is often used in the interrupt handlers of the kms drivers, where you don't want to wait unnecessarily on a lock, but be as quick as possible.
I don't think that's a real issue - if we have lock contention anywhere with the current code, we already have a problem - and it looks like we have ;-) And the spinlocks are all already switching off irqs locally, so we should already strive for the shortest possible lock hold times.
[snip]
> The vblank_time_lock serves a different purpose from vbl_lock. It is used in the vblank irq handler drm_handle_vblank() and in the vblank irq on/off functions to protect the vblank timestamping and counting against races between the irq handler and the on/off code, and some funny races between the cpu reinitializing vblank counts and timestamps in the non-irq path and the gpu firing vblank interrupts and/or updating its hardware vblank counter at the "wrong" moment. vblank_time_lock basically blocks/delays gpu vblank irq processing during such problematic periods. Timing there is critical for reliable vblank timestamping, kms page flipping and artifact-free dynamic gpu power management. It should be taken as seldom as possible and held as briefly as possible. I initially tried to get away with only vbl_lock, but there were uses of vbl_lock that looked as if using it in the interrupt handler as well could cause long delays in irq processing.
I've gone through the code, and in almost all cases where vbl_lock is held for a bit more than just a few loads and stores, we immediately take vblank_time_lock and hold it for almost as long. Which is why I think this is redundant. The only exception is the irq uninstall code, which isn't really a fast path anyway.
Also, all these paths with longer lock-hold times are in the initial vblank_get/final vblank_put. And because we delay the vblank disabling by a full second, we should never hit that in steady state.
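The deferral works roughly like this (a sketch with assumed names; the real code lives in drm_irq.c, and the thread disagrees below about whether the delay is 1 or 5 seconds): the last put only re-arms a disable timer, so in steady state the expensive enable/disable path is never taken.

#include <linux/atomic.h>
#include <linux/timer.h>
#include <linux/jiffies.h>

static atomic_t vblank_refcount;
static struct timer_list disable_timer;	/* armed via timer_setup() at init */

static void sketch_vblank_put(void)
{
	if (atomic_dec_and_test(&vblank_refcount))
		/* last user: don't disable yet, a new get may follow soon */
		mod_timer(&disable_timer, jiffies + msecs_to_jiffies(1000));
}

static void sketch_disable_fn(struct timer_list *t)
{
	/* only disable if nobody took a new reference in the meantime */
	if (atomic_read(&vblank_refcount) == 0) {
		/* turn the hw vblank irq off under vbl_lock here */
	}
}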
But I think you're one of the most timing-sensitive users of this vblank stuff, so when I get around to cleaning this up I'll certainly put you on cc so you can test for any adverse effects. Hopefully ripping out unnecessary atomic ops makes stuff faster, not slower. -Daniel
On Oct 28, 2011, at 9:15 PM, Daniel Vetter wrote:
> On Fri, Oct 28, 2011 at 08:15:11PM +0200, Mario Kleiner wrote:
>> be careful with vblank_refcount. I think it probably should stay atomic. drm_vblank_put() is often used in the interrupt handlers of the kms drivers, where you don't want to wait unnecessarily on a lock, but be as quick as possible.
> I don't think that's a real issue - if we have lock contention anywhere with the current code, we already have a problem - and it looks like we have ;-) And the spinlocks are all already switching off irqs locally, so we should already strive for the shortest possible lock hold times.
> [snip]
>> The vblank_time_lock serves a different purpose from vbl_lock. It is used in the vblank irq handler drm_handle_vblank() and in the vblank irq on/off functions to protect the vblank timestamping and counting against races between the irq handler and the on/off code, and some funny races between the cpu reinitializing vblank counts and timestamps in the non-irq path and the gpu firing vblank interrupts and/or updating its hardware vblank counter at the "wrong" moment. vblank_time_lock basically blocks/delays gpu vblank irq processing during such problematic periods. Timing there is critical for reliable vblank timestamping, kms page flipping and artifact-free dynamic gpu power management. It should be taken as seldom as possible and held as briefly as possible. I initially tried to get away with only vbl_lock, but there were uses of vbl_lock that looked as if using it in the interrupt handler as well could cause long delays in irq processing.
> I've gone through the code, and in almost all cases where vbl_lock is held for a bit more than just a few loads and stores, we immediately take vblank_time_lock and hold it for almost as long. Which is why I think this is redundant. The only exception is the irq uninstall code, which isn't really a fast path anyway.
Ok, if that is the case in the current code, fair enough. We would have to stress the importance of *very low* vbl_lock hold times in the documentation of the code if we want to get rid of vblank_time_lock. The dedicated lock for that purpose is also meant as a marker that you should try twice as hard to be quick while holding it as with other spinlocks. A few dozen usecs of extra delay there could make the difference between met and missed pageflip deadlines, and between well-working and badly-working dynamic gpu reclocking, e.g. on radeon, where this stuff is triggered by vblank irqs and is quite timing sensitive; see the sketch below.
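In code, that discipline looks something like this (a hypothetical helper, reusing the made-up names from the sketch further up): compute anything potentially slow outside the lock, so the critical section is reduced to a couple of stores.

/* compute outside, publish inside: keeps the hold time to a few stores */
static void sketch_publish_timestamp(void)
{
	unsigned long flags;
	ktime_t now = ktime_get();	/* potentially slow: keep it outside */

	spin_lock_irqsave(&vblank_time_lock, flags);
	sw_vblank_time = now;
	sw_vblank_count++;
	spin_unlock_irqrestore(&vblank_time_lock, flags);
}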
> Also, all these paths with longer lock-hold times are in the initial vblank_get/final vblank_put. And because we delay the vblank disabling by a full second, we should never hit that in steady state.
5 seconds by default, I think. We wanted to lower that to something like < 100 msecs at some point, but there's still some work to do to remove a potential race with the gpu in the on/off path which could cause lost/wrong vblank counts, and the nouveau kms driver doesn't implement a hardware vblank counter at all atm, which makes small values unsafe on nouveau. With higher on/off frequencies, the impact of too-long lock hold times would become worse.
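For reference, the off-delay is tunable along these lines (the drm core exposes such a knob as a module parameter; the exact name, default and permissions here are from memory, so treat them as an assumption):

#include <linux/module.h>

/* delay before an unused vblank irq is switched off, in msecs */
static unsigned int vblank_offdelay = 5000;	/* 5 seconds by default */
module_param(vblank_offdelay, uint, 0600);
MODULE_PARM_DESC(vblank_offdelay, "Delay until vblank irq auto-disable [msecs]");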
> But I think you're one of the most timing-sensitive users of this vblank stuff, so when I get around to cleaning this up I'll certainly put you on cc so you can test for any adverse effects. Hopefully ripping out unnecessary atomic ops makes stuff faster, not slower.
Yes, the current timing/timestamp precision and some of the related dri2 features of the free graphics stack are a very strong selling point for luring quite a few timing-sensitive users of my userspace toolkit away from other proprietary solutions.
Ok, cc'ing me sounds like a plan.
Thanks, -mario
> -Daniel
> Daniel Vetter Mail: daniel@ffwll.ch Mobile: +41 (0)79 365 57 48