When userspace queries the current vblank for the CRTC, we can reply with the cached value (using atomic reads to serialise with the vblank interrupt as necessary) without having to touch registers. In the instant disable case, this saves us from enabling/disabling the vblank around every query, greatly reducing the number of registers read and written.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Imre Deak imre.deak@intel.com Cc: Daniel Vetter daniel.vetter@intel.com Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: Dave Airlie airlied@redhat.com --- drivers/gpu/drm/drm_irq.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index c8a34476570a..6c4570082b65 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1585,7 +1585,18 @@ int drm_wait_vblank(struct drm_device *dev, void *data, if (crtc >= dev->num_crtcs) return -EINVAL;
- vblank = &dev->vblank[crtc]; + /* Fast-path the query for the current value (without an event) + * to avoid having to enable/disable the vblank interrupts. + */ + if ((vblwait->request.type & (_DRM_VBLANK_TYPES_MASK | _DRM_VBLANK_FLAGS_MASK)) == _DRM_VBLANK_RELATIVE && + vblwait->request.sequence == 0) { + struct timeval now; + + vblwait->reply.sequence = drm_vblank_count_and_time(dev, crtc, &now); + vblwait->reply.tval_sec = now.tv_sec; + vblwait->reply.tval_usec = now.tv_usec; + return 0; + }
ret = drm_vblank_get(dev, crtc); if (ret) { @@ -1619,6 +1630,8 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
DRM_DEBUG("waiting on vblank count %d, crtc %d\n", vblwait->request.sequence, crtc); + + vblank = &dev->vblank[crtc]; vblank->last_wait = vblwait->request.sequence; DRM_WAIT_ON(ret, vblank->queue, 3 * HZ, (((drm_vblank_count(dev, crtc) -
On 18.03.2015 00:44, Chris Wilson wrote:
When userspace queries the current vblank for the CRTC, we can reply with the cached value (using atomic reads to serialise with the vblank interrupt as necessary) without having to touch registers. In the instant disable case, this saves us from enabling/disabling the vblank around every query, greatly reducing the number of registers read and written.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Imre Deak imre.deak@intel.com Cc: Daniel Vetter daniel.vetter@intel.com Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: Dave Airlie airlied@redhat.com
drivers/gpu/drm/drm_irq.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index c8a34476570a..6c4570082b65 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1585,7 +1585,18 @@ int drm_wait_vblank(struct drm_device *dev, void *data, if (crtc >= dev->num_crtcs) return -EINVAL;
- vblank = &dev->vblank[crtc];
- /* Fast-path the query for the current value (without an event)
* to avoid having to enable/disable the vblank interrupts.
*/
- if ((vblwait->request.type & (_DRM_VBLANK_TYPES_MASK | _DRM_VBLANK_FLAGS_MASK)) == _DRM_VBLANK_RELATIVE &&
vblwait->request.sequence == 0) {
struct timeval now;
vblwait->reply.sequence = drm_vblank_count_and_time(dev, crtc, &now);
vblwait->reply.tval_sec = now.tv_sec;
vblwait->reply.tval_usec = now.tv_usec;
return 0;
- }
drm_vblank_count_and_time() doesn't return the correct sequence number while the vblank interrupt is disabled, does it? It returns the sequence number from the last time vblank_disable_and_save() was called (when the vblank interrupt was disabled). That's why drm_vblank_get() is needed here.
On 18.03.2015 11:53, Michel Dänzer wrote:
On 18.03.2015 00:44, Chris Wilson wrote:
When userspace queries the current vblank for the CRTC, we can reply with the cached value (using atomic reads to serialise with the vblank interrupt as necessary) without having to touch registers. In the instant disable case, this saves us from enabling/disabling the vblank around every query, greatly reducing the number of registers read and written.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Imre Deak imre.deak@intel.com Cc: Daniel Vetter daniel.vetter@intel.com Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: Dave Airlie airlied@redhat.com
drivers/gpu/drm/drm_irq.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index c8a34476570a..6c4570082b65 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1585,7 +1585,18 @@ int drm_wait_vblank(struct drm_device *dev, void *data, if (crtc >= dev->num_crtcs) return -EINVAL;
- vblank = &dev->vblank[crtc];
- /* Fast-path the query for the current value (without an event)
* to avoid having to enable/disable the vblank interrupts.
*/
- if ((vblwait->request.type & (_DRM_VBLANK_TYPES_MASK | _DRM_VBLANK_FLAGS_MASK)) == _DRM_VBLANK_RELATIVE &&
vblwait->request.sequence == 0) {
struct timeval now;
vblwait->reply.sequence = drm_vblank_count_and_time(dev, crtc, &now);
vblwait->reply.tval_sec = now.tv_sec;
vblwait->reply.tval_usec = now.tv_usec;
return 0;
- }
drm_vblank_count_and_time() doesn't return the correct sequence number while the vblank interrupt is disabled, does it? It returns the sequence number from the last time vblank_disable_and_save() was called (when the vblank interrupt was disabled). That's why drm_vblank_get() is needed here.
It might be possible to avoid drm_vblank_get() and use drm_update_vblank_count() before (or in) drm_vblank_count_and_time() instead when the vblank interrupt is disabled, but then you'd first need to make it safe to call drm_update_vblank_count() several times while the vblank interrupt is disabled, by making it update vblank->last at least.
On Wed, Mar 18, 2015 at 11:53:16AM +0900, Michel Dänzer wrote:
drm_vblank_count_and_time() doesn't return the correct sequence number while the vblank interrupt is disabled, does it? It returns the sequence number from the last time vblank_disable_and_save() was called (when the vblank interrupt was disabled). That's why drm_vblank_get() is needed here.
Ville enlightened me as well. I thought the value was cooked so that time did not pass whilst the IRQ was disabled. Hopefully, I can impress upon the Intel folks, at least, that enabling/disabling the interrupts just to read the current hw counter is interesting to say the least and sits at the top of the profiles when benchmarking Present. -Chris
On 03/18/2015 10:30 AM, Chris Wilson wrote:
On Wed, Mar 18, 2015 at 11:53:16AM +0900, Michel Dänzer wrote:
drm_vblank_count_and_time() doesn't return the correct sequence number while the vblank interrupt is disabled, does it? It returns the sequence number from the last time vblank_disable_and_save() was called (when the vblank interrupt was disabled). That's why drm_vblank_get() is needed here.
Ville enlightened me as well. I thought the value was cooked so that time did not pass whilst the IRQ was disabled. Hopefully, I can impress upon the Intel folks, at least, that enabling/disabling the interrupts just to read the current hw counter is interesting to say the least and sits at the top of the profiles when benchmarking Present. -Chris
drm_wait_vblank() not only gets the counter but also the corresponding vblank timestamp. Counters are recalculated in vblank_disable_and_save() for irq off, then in the vblank irq on path, and every refresh in drm_handle_vblank at vblank irq time.
The timestamps can be recalculated at any time iff the driver supports high precision timestamping, which currently intel kms, radeon kms, and nouveau kms do. But for other parts, like most SoC's, afaik you only get a valid timestamp by sampling system time in the vblank irq handler, so there you'd have a problem.
There are also some races around the enable/disable path which require a lot of care and exact knowledge of when each hardware fires its vblanks, updates its hardware counters etc. to get rid of them. Ville did that - successfully as far as my tests go - for Intel kms, but other drivers would be less forgiving.
Our current method is to:
a) Only disable vblank irqs after a default idle period of 5 seconds, so we don't get races frequent/likely enough to cause problems for clients. And we save the overhead for all the vblank irq on/off.
b) On drivers which have high precision timestamping and have been carefully checked to be race free (== intel kms only atm.) we have instant disable, so things like blinking cursors don't keep vblank irq on forever.
If b) causes so much overhead, maybe we could change the "instant disable" into a "disable after a very short time", e.g., lowering the timeout from 5000 msecs to 2-3 video refresh durations ~ 50 msecs? That would still disable vblank irqs for power saving if the desktop is really idle, but avoid on/off storms for the various drm_wait_vblank's that happen when preparing a swap.
-mario
On Wed, Mar 18, 2015 at 03:52:56PM +0100, Mario Kleiner wrote:
On 03/18/2015 10:30 AM, Chris Wilson wrote:
On Wed, Mar 18, 2015 at 11:53:16AM +0900, Michel Dänzer wrote:
drm_vblank_count_and_time() doesn't return the correct sequence number while the vblank interrupt is disabled, does it? It returns the sequence number from the last time vblank_disable_and_save() was called (when the vblank interrupt was disabled). That's why drm_vblank_get() is needed here.
Ville enlightened me as well. I thought the value was cooked so that time did not pass whilst the IRQ was disabled. Hopefully, I can impress upon the Intel folks, at least, that enabling/disabling the interrupts just to read the current hw counter is interesting to say the least and sits at the top of the profiles when benchmarking Present. -Chris
drm_wait_vblank() not only gets the counter but also the corresponding vblank timestamp. Counters are recalculated in vblank_disable_and_save() for irq off, then in the vblank irq on path, and every refresh in drm_handle_vblank at vblank irq time.
The timestamps can be recalculated at any time iff the driver supports high precision timestamping, which currently intel kms, radeon kms, and nouveau kms do. But for other parts, like most SoC's, afaik you only get a valid timestamp by sampling system time in the vblank irq handler, so there you'd have a problem.
There are also some races around the enable/disable path which require a lot of care and exact knowledge of when each hardware fires its vblanks, updates its hardware counters etc. to get rid of them. Ville did that - successfully as far as my tests go - for Intel kms, but other drivers would be less forgiving.
Our current method is to:
a) Only disable vblank irqs after a default idle period of 5 seconds, so we don't get races frequent/likely enough to cause problems for clients. And we save the overhead for all the vblank irq on/off.
b) On drivers which have high precision timestamping and have been carefully checked to be race free (== intel kms only atm.) we have instant disable, so things like blinking cursors don't keep vblank irq on forever.
If b) causes so much overhead, maybe we could change the "instant disable" into a "disable after a very short time", e.g., lowering the timeout from 5000 msecs to 2-3 video refresh durations ~ 50 msecs? That would still disable vblank irqs for power saving if the desktop is really idle, but avoid on/off storms for the various drm_wait_vblank's that happen when preparing a swap.
Yeah I think we could add code which only gets run for drivers which support instant disable (i915 doesn't do that on gen2 because the hw is lacking). There we should be able to update the vblank counter/timestamp correctly without enabling interrupts temporarily. Ofc we need to make sure we have enough nasty igt testcase to ensure there's not going to be jumps and missed frame numbers in that case. -Daniel
On Thu, Mar 19, 2015 at 03:33:11PM +0100, Daniel Vetter wrote:
On Wed, Mar 18, 2015 at 03:52:56PM +0100, Mario Kleiner wrote:
On 03/18/2015 10:30 AM, Chris Wilson wrote:
On Wed, Mar 18, 2015 at 11:53:16AM +0900, Michel Dänzer wrote:
drm_vblank_count_and_time() doesn't return the correct sequence number while the vblank interrupt is disabled, does it? It returns the sequence number from the last time vblank_disable_and_save() was called (when the vblank interrupt was disabled). That's why drm_vblank_get() is needed here.
Ville enlightened me as well. I thought the value was cooked so that time did not pass whilst the IRQ was disabled. Hopefully, I can impress upon the Intel folks, at least, that enabling/disabling the interrupts just to read the current hw counter is interesting to say the least and sits at the top of the profiles when benchmarking Present. -Chris
drm_wait_vblank() not only gets the counter but also the corresponding vblank timestamp. Counters are recalculated in vblank_disable_and_save() for irq off, then in the vblank irq on path, and every refresh in drm_handle_vblank at vblank irq time.
The timestamps can be recalculated at any time iff the driver supports high precision timestamping, which currently intel kms, radeon kms, and nouveau kms do. But for other parts, like most SoC's, afaik you only get a valid timestamp by sampling system time in the vblank irq handler, so there you'd have a problem.
There are also some races around the enable/disable path which require a lot of care and exact knowledge of when each hardware fires its vblanks, updates its hardware counters etc. to get rid of them. Ville did that - successfully as far as my tests go - for Intel kms, but other drivers would be less forgiving.
Our current method is to:
a) Only disable vblank irqs after a default idle period of 5 seconds, so we don't get races frequent/likely enough to cause problems for clients. And we save the overhead for all the vblank irq on/off.
b) On drivers which have high precision timestamping and have been carefully checked to be race free (== intel kms only atm.) we have instant disable, so things like blinking cursors don't keep vblank irq on forever.
If b) causes so much overhead, maybe we could change the "instant disable" into a "disable after a very short time", e.g., lowering the timeout from 5000 msecs to 2-3 video refresh durations ~ 50 msecs? That would still disable vblank irqs for power saving if the desktop is really idle, but avoid on/off storms for the various drm_wait_vblank's that happen when preparing a swap.
Yeah I think we could add code which only gets run for drivers which support instant disable (i915 doesn't do that on gen2 because the hw is lacking). There we should be able to update the vblank counter/timestamp correctly without enabling interrupts temporarily. Ofc we need to make sure we have enough nasty igt testcase to ensure there's not going to be jumps and missed frame numbers in that case.
Is enabling the interrupts the expensive part, or is it the actual double timestamp read + scanout pos read? Or is it due to the several spinlocks we have in this code?
Also why is userspace reading the vblank counter in the first place? Due to the crazy OML_whatever stuff perhaps? In the simple swap interval case you shouldn't really need to read it. And if we actually made the page flip/atomic ioctl take a target vblank count and let the kernel deal with it we wouldn't need to call the vblank ioctl at all.
As far as gen2 goes, I have some code somewhere that improved things a bit. Essentially I just looked at the monotonic timestamps to figure out if we missed any vblanks while interrupts were off. IIRC my current code only added one to the cooked count in that case, but should be easy to make it come up with a more realistic delta.
On Thu, Mar 19, 2015 at 05:04:19PM +0200, Ville Syrjälä wrote:
Is enabling the interrupts the expensive part, or is it the actual double timestamp read + scanout pos read? Or is it due to the several spinlocks we have in this code?
Chiefly it was the read during disable, then the spinlocked irq enabling.
Also why is userspace reading the vblank counter in the first place? Due to the crazy OML_whatever stuff perhaps? In the simple swap interval case you shouldn't really need to read it. And if we actually made the page flip/atomic ioctl take a target vblank count and let the kernel deal with it we wouldn't need to call the vblank ioctl at all.
More or less due to OML, yes. Client asks for random MSC, often 0, we compute the next MSC for it. This happens 10,0000+ a second when benchmarking and drmWaitVblank is the most expensive step in that chain for the server...
Pretty much an igt that compared the speed of just querying the hw counter vs querying with a regular vblank interrupt would be ideal for measuring the impact here. -Chris
On Thu, Mar 19, 2015 at 03:13:15PM +0000, Chris Wilson wrote:
Pretty much an igt that compared the speed of just querying the hw counter vs querying with a regular vblank interrupt would be ideal for measuring the impact here.
ickle@crystalwell:/usr/src/intel-gpu-tools$ sudo ./tests/kms_vblank IGT-Version: 1.10-gc15fb9e (x86_64) (Linux: 4.0.0-rc4+ x86_64) Time to query current counter (idle): 2.217µs Subtest query-idle: SUCCESS (1.006s) Time to query current counter (busy): 0.244µs Subtest query-busy: SUCCESS (1.201s
- 28.04% kms_vblank [kernel.kallsyms] [k] gen6_read32 - gen6_read32 + 65.27% gm45_get_vblank_counter + 17.50% ironlake_disable_display_irq + 16.53% ironlake_enable_display_irq + 21.57% kms_vblank [kernel.kallsyms] [k] hsw_unclaimed_reg_detect.isra + 10.69% kms_vblank [kernel.kallsyms] [k] _raw_spin_lock_irqsave + 9.91% kms_vblank [kernel.kallsyms] [k] __intel_get_crtc_scanline + 6.66% kms_vblank [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore + 1.34% kms_vblank [kernel.kallsyms] [k] i915_get_crtc_scanoutpos + 1.29% kms_vblank [kernel.kallsyms] [k] drm_wait_vblank + 1.27% kms_vblank [kernel.kallsyms] [k] hsw_unclaimed_reg_debug.isra. + 1.17% kms_vblank [kernel.kallsyms] [k] copy_user_enhanced_fast_strin + 1.12% kms_vblank [kernel.kallsyms] [k] drm_ioctl + 1.05% kms_vblank [kernel.kallsyms] [k] vblank_disable_and_save
On 03/19/2015 04:04 PM, Ville Syrjälä wrote:
On Thu, Mar 19, 2015 at 03:33:11PM +0100, Daniel Vetter wrote:
On Wed, Mar 18, 2015 at 03:52:56PM +0100, Mario Kleiner wrote:
On 03/18/2015 10:30 AM, Chris Wilson wrote:
On Wed, Mar 18, 2015 at 11:53:16AM +0900, Michel Dänzer wrote:
drm_vblank_count_and_time() doesn't return the correct sequence number while the vblank interrupt is disabled, does it? It returns the sequence number from the last time vblank_disable_and_save() was called (when the vblank interrupt was disabled). That's why drm_vblank_get() is needed here.
Ville enlightened me as well. I thought the value was cooked so that time did not pass whilst the IRQ was disabled. Hopefully, I can impress upon the Intel folks, at least, that enabling/disabling the interrupts just to read the current hw counter is interesting to say the least and sits at the top of the profiles when benchmarking Present. -Chris
drm_wait_vblank() not only gets the counter but also the corresponding vblank timestamp. Counters are recalculated in vblank_disable_and_save() for irq off, then in the vblank irq on path, and every refresh in drm_handle_vblank at vblank irq time.
The timestamps can be recalculated at any time iff the driver supports high precision timestamping, which currently intel kms, radeon kms, and nouveau kms do. But for other parts, like most SoC's, afaik you only get a valid timestamp by sampling system time in the vblank irq handler, so there you'd have a problem.
There are also some races around the enable/disable path which require a lot of care and exact knowledge of when each hardware fires its vblanks, updates its hardware counters etc. to get rid of them. Ville did that - successfully as far as my tests go - for Intel kms, but other drivers would be less forgiving.
Our current method is to:
a) Only disable vblank irqs after a default idle period of 5 seconds, so we don't get races frequent/likely enough to cause problems for clients. And we save the overhead for all the vblank irq on/off.
b) On drivers which have high precision timestamping and have been carefully checked to be race free (== intel kms only atm.) we have instant disable, so things like blinking cursors don't keep vblank irq on forever.
If b) causes so much overhead, maybe we could change the "instant disable" into a "disable after a very short time", e.g., lowering the timeout from 5000 msecs to 2-3 video refresh durations ~ 50 msecs? That would still disable vblank irqs for power saving if the desktop is really idle, but avoid on/off storms for the various drm_wait_vblank's that happen when preparing a swap.
Yeah I think we could add code which only gets run for drivers which support instant disable (i915 doesn't do that on gen2 because the hw is lacking). There we should be able to update the vblank counter/timestamp correctly without enabling interrupts temporarily. Ofc we need to make sure we have enough nasty igt testcase to ensure there's not going to be jumps and missed frame numbers in that case.
I'd rather go for the very simple "fast disable with short timeout" method. That would only be a tiny almost one-liner patch that reuses the existing timer for the default slow case, and we'd know already that it will work reliably on "instant off" capable drivers - no extra tests required. Those drm_vblank_get/put calls usually come in short bursts which should be covered by a timeout of maybe 1 to max. 3 refresh durations.
When we query the hw timestamps, we always have a little bit of unavoidable noise, even if it's often only +/- 1 usec on modern hw, so clients querying the timestamp for the same vblank would get slightly different results on repeated queries. On hw which only allows scanline granularity for queries, we can get variability up to 1 scanline duration. If the caller does things like delta calculations on those results (dT = currentts - lastts) it can get confusing results like time going backwards by a few microseconds. That's why the current code caches the last vblank ts, to save overhead and to make sure that repeated queries of the same vblank give identical results.
Is enabling the interrupts the expensive part, or is it the actual double timestamp read + scanout pos read? Or is it due to the several spinlocks we have in this code?
The timestamp/scanout pos read itself is not that expensive iirc, usually 1-3 usecs depending on hw, from some testing i did a year ago. The machinery for irq on/off + all the reinitializing of vblank counts and matching timestamps etc. is probably not that cheap.
Also why is userspace reading the vblank counter in the first place? Due to the crazy OML_whatever stuff perhaps? In the simple swap interval case you shouldn't really need to read it. And if we actually made the page flip/atomic ioctl take a target vblank count and let the kernel deal with it we wouldn't need to call the vblank ioctl at all.
I object to the "crazy", extensions have feelings too ;-). Support for OML_sync_control is one of the most lovely features Linux has atm. as a big advantage over other OS's for research, vr and medical applications requiring precise visual timing. But yes, glXGetSyncValuesOML(), glXGetVideoSyncSGI() for clients. Mine and similar applications use it to synchronize code execution, or things like audio and other i/o to vblank, for correctness checks, and for remapping user provided target times into target vblank counts. X11 compositors use it, Weston uses it for scheduling and for synchronizing its repaint cycle to vblank. The ddx use it a lot under dri2 and dri3/present.
Not that it couldn't be done better, e.g., having the kernel directly deal with a target vblank count, or even with a target system time after which a flip should happen, could be useful additions. But then you'd have extra complexity for dealing with things like client windows going away or getting (un)redirected after requesting a page flip far into the future, and you'd still have to deal with windowed X11 clients which can't use page flipping directly for their swaps.
-mario
As far as gen2 goes, I have some code somewhere that improved things a bit. Essentially I just looked at the monotonic timestamps to figure out if we missed any vblanks while interrupts were off. IIRC my current code only added one to the cooked count in that case, but should be easy to make it come up with a more realistic delta.
On vblank instant-off systems, we can get into a situation where the cost of enabling and disabling the vblank IRQ around a drmWaitVblank query dominates. However, we know that if the user wants the current vblank counter, they are also very likely to immediately queue a vblank wait and so we can keep the interrupt around and only turn it off if we have no further vblank requests in the interrupt interval.
After vblank event delivery there is a shadow of one vblank where the interrupt is kept alive for the user to query and queue another vblank event. Similarly, if the user is using blocking drmWaitVblanks, the interrupt will be disabled on the IRQ following the wait completion. However, if the user is simply querying the current vblank counter and timestamp, the interrupt will be disabled after every IRQ and the user will enabled it again on the first query following the IRQ.
Testcase: igt/kms_vblank Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Michel Dänzer michel@daenzer.net Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: Dave Airlie airlied@redhat.com, Cc: Mario Kleiner mario.kleiner.de@gmail.com --- drivers/gpu/drm/drm_irq.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index c8a34476570a..6f5dc18779e2 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1091,9 +1091,9 @@ void drm_vblank_put(struct drm_device *dev, int crtc) if (atomic_dec_and_test(&vblank->refcount)) { if (drm_vblank_offdelay == 0) return; - else if (dev->vblank_disable_immediate || drm_vblank_offdelay < 0) + else if (drm_vblank_offdelay < 0) vblank_disable_fn((unsigned long)vblank); - else + else if (!dev->vblank_disable_immediate) mod_timer(&vblank->disable_timer, jiffies + ((drm_vblank_offdelay * HZ)/1000)); } @@ -1697,6 +1697,17 @@ bool drm_handle_vblank(struct drm_device *dev, int crtc)
spin_lock_irqsave(&dev->event_lock, irqflags);
+ if (dev->vblank_disable_immediate && !atomic_read(&vblank->refcount)) { + unsigned long vbl_lock_irqflags; + + spin_lock_irqsave(&dev->vbl_lock, vbl_lock_irqflags); + if (atomic_read(&vblank->refcount) == 0 && vblank->enabled) { + DRM_DEBUG("disabling vblank on crtc %d\n", crtc); + vblank_disable_and_save(dev, crtc); + } + spin_unlock_irqrestore(&dev->vbl_lock, vbl_lock_irqflags); + } + /* Need timestamp lock to prevent concurrent execution with * vblank enable/disable, as this would cause inconsistent * or corrupted timestamps and vblank counts.
On 02.04.2015 20:34, Chris Wilson wrote:
On vblank instant-off systems, we can get into a situation where the cost of enabling and disabling the vblank IRQ around a drmWaitVblank query dominates. However, we know that if the user wants the current vblank counter, they are also very likely to immediately queue a vblank wait and so we can keep the interrupt around and only turn it off if we have no further vblank requests in the interrupt interval.
After vblank event delivery there is a shadow of one vblank where the interrupt is kept alive for the user to query and queue another vblank event. Similarly, if the user is using blocking drmWaitVblanks, the interrupt will be disabled on the IRQ following the wait completion. However, if the user is simply querying the current vblank counter and timestamp, the interrupt will be disabled after every IRQ and the user will enabled it again on the first query following the IRQ.
As I mentioned before, it might not be too hard to make querying the current counter work without enabling the interrupt. But this looks like a step in the right direction.
Acked-by: Michel Dänzer michel.daenzer@amd.com
On Fri, Apr 03, 2015 at 11:20:20AM +0900, Michel Dänzer wrote:
On 02.04.2015 20:34, Chris Wilson wrote:
On vblank instant-off systems, we can get into a situation where the cost of enabling and disabling the vblank IRQ around a drmWaitVblank query dominates. However, we know that if the user wants the current vblank counter, they are also very likely to immediately queue a vblank wait and so we can keep the interrupt around and only turn it off if we have no further vblank requests in the interrupt interval.
After vblank event delivery there is a shadow of one vblank where the interrupt is kept alive for the user to query and queue another vblank event. Similarly, if the user is using blocking drmWaitVblanks, the interrupt will be disabled on the IRQ following the wait completion. However, if the user is simply querying the current vblank counter and timestamp, the interrupt will be disabled after every IRQ and the user will enabled it again on the first query following the IRQ.
As I mentioned before, it might not be too hard to make querying the current counter work without enabling the interrupt. But this looks like a step in the right direction.
I honestly chickened out in case I broke something!
Hindsight says both are useful as currently with instant-off we will disable the vblank interrupt inside the IRQ handler delivering the event, whereas we can save quite a bit of pain by waiting for the next IRQ before doing the disable (culmulatively saving a lot more CPU cycles over the course of swap chain than the extra IRQ will cost). -Chris
Avoid adding to the waitqueue and reprobing the current vblank if the caller is only querying the current vblank sequence and timestamp and so we would return immediately.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Michel Dänzer michel@daenzer.net Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: Dave Airlie airlied@redhat.com, Cc: Mario Kleiner mario.kleiner.de@gmail.com --- drivers/gpu/drm/drm_irq.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index 6f5dc18779e2..ba80b51b4b00 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1617,14 +1617,16 @@ int drm_wait_vblank(struct drm_device *dev, void *data, vblwait->request.sequence = seq + 1; }
- DRM_DEBUG("waiting on vblank count %d, crtc %d\n", - vblwait->request.sequence, crtc); - vblank->last_wait = vblwait->request.sequence; - DRM_WAIT_ON(ret, vblank->queue, 3 * HZ, - (((drm_vblank_count(dev, crtc) - - vblwait->request.sequence) <= (1 << 23)) || - !vblank->enabled || - !dev->irq_enabled)); + if (vblwait->request.sequence != seq) { + DRM_DEBUG("waiting on vblank count %d, crtc %d\n", + vblwait->request.sequence, crtc); + vblank->last_wait = vblwait->request.sequence; + DRM_WAIT_ON(ret, vblank->queue, 3 * HZ, + (((drm_vblank_count(dev, crtc) - + vblwait->request.sequence) <= (1 << 23)) || + !vblank->enabled || + !dev->irq_enabled)); + }
if (ret != -EINTR) { struct timeval now;
Bypass all the spinlocks and return the last timestamp and counter from the last vblank if the driver delcares that it is accurate (and stable across on/off), and the vblank is currently enabled.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Michel Dänzer michel@daenzer.net Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: Dave Airlie airlied@redhat.com, Cc: Mario Kleiner mario.kleiner.de@gmail.com --- drivers/gpu/drm/drm_irq.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index ba80b51b4b00..be9c210bb22e 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1538,6 +1538,17 @@ err_put: return ret; }
+static bool drm_wait_vblank_is_query(union drm_wait_vblank *vblwait) +{ + if (vblwait->request.sequence) + return false; + + return _DRM_VBLANK_RELATIVE == + (vblwait->request.type & (_DRM_VBLANK_TYPES_MASK | + _DRM_VBLANK_EVENT | + _DRM_VBLANK_NEXTONMISS)); +} + /* * Wait for VBLANK. * @@ -1587,6 +1598,21 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
vblank = &dev->vblank[crtc];
+ /* If the counter is currently enabled and accurate, short-circuit queries + * to return the cached timestamp of the last vblank. + */ + if (dev->vblank_disable_immediate && + drm_wait_vblank_is_query(vblwait) && + vblank->enabled) { + struct timeval now; + + vblwait->reply.sequence = + drm_vblank_count_and_time(dev, crtc, &now); + vblwait->reply.tval_sec = now.tv_sec; + vblwait->reply.tval_usec = now.tv_usec; + return 0; + } + ret = drm_vblank_get(dev, crtc); if (ret) { DRM_DEBUG("failed to acquire vblank counter, %d\n", ret);
On 04/05/2015 05:40 PM, Chris Wilson wrote:
Bypass all the spinlocks and return the last timestamp and counter from the last vblank if the driver delcares that it is accurate (and stable across on/off), and the vblank is currently enabled.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Michel Dänzer michel@daenzer.net Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: Dave Airlie airlied@redhat.com, Cc: Mario Kleiner mario.kleiner.de@gmail.com
drivers/gpu/drm/drm_irq.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index ba80b51b4b00..be9c210bb22e 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1538,6 +1538,17 @@ err_put: return ret; }
+static bool drm_wait_vblank_is_query(union drm_wait_vblank *vblwait) +{
- if (vblwait->request.sequence)
return false;
- return _DRM_VBLANK_RELATIVE ==
(vblwait->request.type & (_DRM_VBLANK_TYPES_MASK |
_DRM_VBLANK_EVENT |
_DRM_VBLANK_NEXTONMISS));
+}
- /*
- Wait for VBLANK.
@@ -1587,6 +1598,21 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
vblank = &dev->vblank[crtc];
- /* If the counter is currently enabled and accurate, short-circuit queries
* to return the cached timestamp of the last vblank.
*/
Maybe somehow stress in the comment that this location in drm_wait_vblank is really the only place where it is ok'ish to call drm_vblank_count_and_time() without wrapping it into a drm_vblank_get/put(), so nobody thinks this approach is ok anywhere else.
- if (dev->vblank_disable_immediate &&
drm_wait_vblank_is_query(vblwait) &&
vblank->enabled) {
You should also check for (drm_vblank_offdelay != 0) whenever checking for dev->vblank_disable_immediate. This is so one can override all the vblank_disable_immediate related logic via the drm vblankoffdelay module parameter, both for debugging and as a safety switch for desparate users in case some driver+gpu combo screws up wrt. immediate disable and that makes it into distro kernels.
The other thing i'm not sure is if it wouldn't be a good idea to have some kind of write memory barrier in vblank_disable_and_save() after setting vblank->enabled = false; and some read memory barrier here before your check for vblank->enabled? I don't have a feeling for how much time can pass between one core executing the disable and the other core receiving the news that vblank->enabled is no longer true if those bits run on different cores?
I've run your patches through my standard tests on x86_64 and they don't seem to introduce errors or more skipped frames. Normally it would be so wrong to do this without drm_vblank_get/put(), but i think here potential errors introduced wouldn't be worse than what a userspace client would see due to preemption or other execution delays at the wrong moment, so it's probably ok. But i don't know if lack of memory barriers etc. could introduce large delays and trouble on other architectures?
struct timeval now;
vblwait->reply.sequence =
drm_vblank_count_and_time(dev, crtc, &now);
vblwait->reply.tval_sec = now.tv_sec;
vblwait->reply.tval_usec = now.tv_usec;
Have some DRM_DEBUG here, so one can follow the client doing the instant query through this path.
return 0;
- }
- ret = drm_vblank_get(dev, crtc); if (ret) { DRM_DEBUG("failed to acquire vblank counter, %d\n", ret);
With the above addressed i'd give you a Reviewed-and-tested-by, but it would be good if somebody else could look over it as well.
-mario
On Tue, Apr 14, 2015 at 08:43:12PM +0200, Mario Kleiner wrote:
On 04/05/2015 05:40 PM, Chris Wilson wrote:
Bypass all the spinlocks and return the last timestamp and counter from the last vblank if the driver delcares that it is accurate (and stable across on/off), and the vblank is currently enabled.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Michel Dänzer michel@daenzer.net Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: Dave Airlie airlied@redhat.com, Cc: Mario Kleiner mario.kleiner.de@gmail.com
drivers/gpu/drm/drm_irq.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index ba80b51b4b00..be9c210bb22e 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1538,6 +1538,17 @@ err_put: return ret; }
+static bool drm_wait_vblank_is_query(union drm_wait_vblank *vblwait) +{
- if (vblwait->request.sequence)
return false;
- return _DRM_VBLANK_RELATIVE ==
(vblwait->request.type & (_DRM_VBLANK_TYPES_MASK |
_DRM_VBLANK_EVENT |
_DRM_VBLANK_NEXTONMISS));
+}
/*
- Wait for VBLANK.
@@ -1587,6 +1598,21 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
vblank = &dev->vblank[crtc];
- /* If the counter is currently enabled and accurate, short-circuit queries
* to return the cached timestamp of the last vblank.
*/
Maybe somehow stress in the comment that this location in drm_wait_vblank is really the only place where it is ok'ish to call drm_vblank_count_and_time() without wrapping it into a drm_vblank_get/put(), so nobody thinks this approach is ok anywhere else.
- if (dev->vblank_disable_immediate &&
drm_wait_vblank_is_query(vblwait) &&
vblank->enabled) {
You should also check for (drm_vblank_offdelay != 0) whenever checking for dev->vblank_disable_immediate. This is so one can override all the vblank_disable_immediate related logic via the drm vblankoffdelay module parameter, both for debugging and as a safety switch for desparate users in case some driver+gpu combo screws up wrt. immediate disable and that makes it into distro kernels.
The other thing i'm not sure is if it wouldn't be a good idea to have some kind of write memory barrier in vblank_disable_and_save() after setting vblank->enabled = false; and some read memory barrier here before your check for vblank->enabled? I don't have a feeling for how much time can pass between one core executing the disable and the other core receiving the news that vblank->enabled is no longer true if those bits run on different cores?
I've run your patches through my standard tests on x86_64 and they don't seem to introduce errors or more skipped frames. Normally it would be so wrong to do this without drm_vblank_get/put(), but i think here potential errors introduced wouldn't be worse than what a userspace client would see due to preemption or other execution delays at the wrong moment, so it's probably ok. But i don't know if lack of memory barriers etc. could introduce large delays and trouble on other architectures?
Barriers don't reduce that latency but only enforce ordering. And you always need two of them, one on the sending side of some piece of data and the other on the receiving side. From that pov drm_vblank_count_and_time is broken since it doesn't fully braket the timestamp read against the counter update (you'd need a barrier both before and after), and the barrier on the write side is missing. And then it's also too heavy, as long as we only have 1 updater we don't need atomics for the counter.
I think I'll review this properly and then write a patch. -Daniel
On 06.04.2015 00:40, Chris Wilson wrote:
Avoid adding to the waitqueue and reprobing the current vblank if the caller is only querying the current vblank sequence and timestamp and so we would return immediately.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Michel Dänzer michel@daenzer.net Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: Dave Airlie airlied@redhat.com, Cc: Mario Kleiner mario.kleiner.de@gmail.com
drivers/gpu/drm/drm_irq.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index 6f5dc18779e2..ba80b51b4b00 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1617,14 +1617,16 @@ int drm_wait_vblank(struct drm_device *dev, void *data, vblwait->request.sequence = seq + 1; }
- DRM_DEBUG("waiting on vblank count %d, crtc %d\n",
vblwait->request.sequence, crtc);
- vblank->last_wait = vblwait->request.sequence;
- DRM_WAIT_ON(ret, vblank->queue, 3 * HZ,
(((drm_vblank_count(dev, crtc) -
vblwait->request.sequence) <= (1 << 23)) ||
!vblank->enabled ||
!dev->irq_enabled));
- if (vblwait->request.sequence != seq) {
DRM_DEBUG("waiting on vblank count %d, crtc %d\n",
vblwait->request.sequence, crtc);
vblank->last_wait = vblwait->request.sequence;
BTW, it looks like the last_wait field is unused and could be removed.
DRM_WAIT_ON(ret, vblank->queue, 3 * HZ,
(((drm_vblank_count(dev, crtc) -
vblwait->request.sequence) <= (1 << 23)) ||
!vblank->enabled ||
!dev->irq_enabled));
}
if (ret != -EINTR) { struct timeval now;
Also, the two patches should have different and more specific shortlogs.
But apart from that, this patch is
Reviewed-by: Michel Dänzer michel.daenzer@amd.com
and the second patch is
Acked-by: Michel Dänzer michel.daenzer@amd.com
On Tue, Apr 14, 2015 at 06:42:03PM +0900, Michel Dänzer wrote:
Also, the two patches should have different and more specific shortlogs.
Second patch:
drm: Query vblank counters directly for known accurate state
?
On 04/14/2015 04:21 PM, Chris Wilson wrote:
On Tue, Apr 14, 2015 at 06:42:03PM +0900, Michel Dänzer wrote:
Also, the two patches should have different and more specific shortlogs.
Second patch:
drm: Query vblank counters directly for known accurate state
?
When i applied your patches, both patches showed a shortlog of "drm: Shortcircuit vblank queries", i think that's what Michel means.
-mario
On Tue, Apr 14, 2015 at 08:17:21PM +0200, Mario Kleiner wrote:
On 04/14/2015 04:21 PM, Chris Wilson wrote:
On Tue, Apr 14, 2015 at 06:42:03PM +0900, Michel Dänzer wrote:
Also, the two patches should have different and more specific shortlogs.
Second patch:
drm: Query vblank counters directly for known accurate state
?
When i applied your patches, both patches showed a shortlog of "drm: Shortcircuit vblank queries", i think that's what Michel means.
I was making a suggestion for retitling. :) -Chris
On 14.04.2015 23:21, Chris Wilson wrote:
On Tue, Apr 14, 2015 at 06:42:03PM +0900, Michel Dänzer wrote:
Also, the two patches should have different and more specific shortlogs.
Second patch:
drm: Query vblank counters directly for known accurate state
?
Sounds good to me.
On 04/05/2015 05:40 PM, Chris Wilson wrote:
Avoid adding to the waitqueue and reprobing the current vblank if the caller is only querying the current vblank sequence and timestamp and so we would return immediately.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Michel Dänzer michel@daenzer.net Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: Dave Airlie airlied@redhat.com, Cc: Mario Kleiner mario.kleiner.de@gmail.com
drivers/gpu/drm/drm_irq.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index 6f5dc18779e2..ba80b51b4b00 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1617,14 +1617,16 @@ int drm_wait_vblank(struct drm_device *dev, void *data, vblwait->request.sequence = seq + 1; }
- DRM_DEBUG("waiting on vblank count %d, crtc %d\n",
vblwait->request.sequence, crtc);
- vblank->last_wait = vblwait->request.sequence;
- DRM_WAIT_ON(ret, vblank->queue, 3 * HZ,
(((drm_vblank_count(dev, crtc) -
vblwait->request.sequence) <= (1 << 23)) ||
!vblank->enabled ||
!dev->irq_enabled));
if (vblwait->request.sequence != seq) {
DRM_DEBUG("waiting on vblank count %d, crtc %d\n",
vblwait->request.sequence, crtc);
vblank->last_wait = vblwait->request.sequence;
DRM_WAIT_ON(ret, vblank->queue, 3 * HZ,
(((drm_vblank_count(dev, crtc) -
vblwait->request.sequence) <= (1 << 23)) ||
!vblank->enabled ||
!dev->irq_enabled));
}
if (ret != -EINTR) { struct timeval now;
It would be good to have some DRM_DEBUG output for the skip-the-wait case as well, so one can still follow from dmesg output when a client does a drmWaitVblank call even if it is only a query.
Other than that, this one [1/2] is
Reviewed-and-tested-by: Mario Kleiner mario.kleiner.de@gmail.com
-mario
On Tue, Apr 14, 2015 at 08:22:20PM +0200, Mario Kleiner wrote:
On 04/05/2015 05:40 PM, Chris Wilson wrote:
Avoid adding to the waitqueue and reprobing the current vblank if the caller is only querying the current vblank sequence and timestamp and so we would return immediately.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Michel Dänzer michel@daenzer.net Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: Dave Airlie airlied@redhat.com, Cc: Mario Kleiner mario.kleiner.de@gmail.com
drivers/gpu/drm/drm_irq.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index 6f5dc18779e2..ba80b51b4b00 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1617,14 +1617,16 @@ int drm_wait_vblank(struct drm_device *dev, void *data, vblwait->request.sequence = seq + 1; }
- DRM_DEBUG("waiting on vblank count %d, crtc %d\n",
vblwait->request.sequence, crtc);
- vblank->last_wait = vblwait->request.sequence;
- DRM_WAIT_ON(ret, vblank->queue, 3 * HZ,
(((drm_vblank_count(dev, crtc) -
vblwait->request.sequence) <= (1 << 23)) ||
!vblank->enabled ||
!dev->irq_enabled));
if (vblwait->request.sequence != seq) {
DRM_DEBUG("waiting on vblank count %d, crtc %d\n",
vblwait->request.sequence, crtc);
vblank->last_wait = vblwait->request.sequence;
DRM_WAIT_ON(ret, vblank->queue, 3 * HZ,
(((drm_vblank_count(dev, crtc) -
vblwait->request.sequence) <= (1 << 23)) ||
!vblank->enabled ||
!dev->irq_enabled));
}
if (ret != -EINTR) { struct timeval now;
It would be good to have some DRM_DEBUG output for the skip-the-wait case as well, so one can still follow from dmesg output when a client does a drmWaitVblank call even if it is only a query.
We still have DRM_DEBUG("returning %d to client"), as well as the drmIoctl:DRM_DEBUG(ioctl->name), is that not sufficient? -Chris
On 04/14/2015 08:36 PM, Chris Wilson wrote:
On Tue, Apr 14, 2015 at 08:22:20PM +0200, Mario Kleiner wrote:
On 04/05/2015 05:40 PM, Chris Wilson wrote:
Avoid adding to the waitqueue and reprobing the current vblank if the caller is only querying the current vblank sequence and timestamp and so we would return immediately.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Michel Dänzer michel@daenzer.net Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: Dave Airlie airlied@redhat.com, Cc: Mario Kleiner mario.kleiner.de@gmail.com
drivers/gpu/drm/drm_irq.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index 6f5dc18779e2..ba80b51b4b00 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1617,14 +1617,16 @@ int drm_wait_vblank(struct drm_device *dev, void *data, vblwait->request.sequence = seq + 1; }
- DRM_DEBUG("waiting on vblank count %d, crtc %d\n",
vblwait->request.sequence, crtc);
- vblank->last_wait = vblwait->request.sequence;
- DRM_WAIT_ON(ret, vblank->queue, 3 * HZ,
(((drm_vblank_count(dev, crtc) -
vblwait->request.sequence) <= (1 << 23)) ||
!vblank->enabled ||
!dev->irq_enabled));
if (vblwait->request.sequence != seq) {
DRM_DEBUG("waiting on vblank count %d, crtc %d\n",
vblwait->request.sequence, crtc);
vblank->last_wait = vblwait->request.sequence;
DRM_WAIT_ON(ret, vblank->queue, 3 * HZ,
(((drm_vblank_count(dev, crtc) -
vblwait->request.sequence) <= (1 << 23)) ||
!vblank->enabled ||
!dev->irq_enabled));
}
if (ret != -EINTR) { struct timeval now;
It would be good to have some DRM_DEBUG output for the skip-the-wait case as well, so one can still follow from dmesg output when a client does a drmWaitVblank call even if it is only a query.
We still have DRM_DEBUG("returning %d to client"), as well as the drmIoctl:DRM_DEBUG(ioctl->name), is that not sufficient? -Chris
Oh right, that's good enough. Maybe add "on crtc %d" to that DRM_DEBUG, to make it unambiguous for which crtc some count is returned?
-mario
On 04/02/2015 01:34 PM, Chris Wilson wrote:
On vblank instant-off systems, we can get into a situation where the cost of enabling and disabling the vblank IRQ around a drmWaitVblank query dominates. However, we know that if the user wants the current vblank counter, they are also very likely to immediately queue a vblank wait and so we can keep the interrupt around and only turn it off if we have no further vblank requests in the interrupt interval.
After vblank event delivery there is a shadow of one vblank where the interrupt is kept alive for the user to query and queue another vblank event. Similarly, if the user is using blocking drmWaitVblanks, the interrupt will be disabled on the IRQ following the wait completion. However, if the user is simply querying the current vblank counter and timestamp, the interrupt will be disabled after every IRQ and the user will enabled it again on the first query following the IRQ.
Testcase: igt/kms_vblank Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Michel Dänzer michel@daenzer.net Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: Dave Airlie airlied@redhat.com, Cc: Mario Kleiner mario.kleiner.de@gmail.com
drivers/gpu/drm/drm_irq.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index c8a34476570a..6f5dc18779e2 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1091,9 +1091,9 @@ void drm_vblank_put(struct drm_device *dev, int crtc) if (atomic_dec_and_test(&vblank->refcount)) { if (drm_vblank_offdelay == 0) return;
else if (dev->vblank_disable_immediate || drm_vblank_offdelay < 0)
else if (drm_vblank_offdelay < 0) vblank_disable_fn((unsigned long)vblank);
else
}else if (!dev->vblank_disable_immediate) mod_timer(&vblank->disable_timer, jiffies + ((drm_vblank_offdelay * HZ)/1000));
@@ -1697,6 +1697,17 @@ bool drm_handle_vblank(struct drm_device *dev, int crtc)
spin_lock_irqsave(&dev->event_lock, irqflags);
You could move the code before the spin_lock_irqsave(&dev->event_lock, irqflags); i think it doesn't need that lock?
- if (dev->vblank_disable_immediate && !atomic_read(&vblank->refcount)) {
Also check for (drm_vblank_offdelay > 0) to make sure we have a way out of instant disable here, and the same meaning of of drm_vblank_offdelay like we have in the current implementation.
This hunk ...
unsigned long vbl_lock_irqflags;
spin_lock_irqsave(&dev->vbl_lock, vbl_lock_irqflags);
if (atomic_read(&vblank->refcount) == 0 && vblank->enabled) {
DRM_DEBUG("disabling vblank on crtc %d\n", crtc);
vblank_disable_and_save(dev, crtc);
}
spin_unlock_irqrestore(&dev->vbl_lock, vbl_lock_irqflags);
... is the same as a call to vblank_disable_fn((unsigned long) vblank); Maybe replace by that call?
You could also return here already, as the code below will just take a lock, realize vblanks are now disabled and then release the locks and exit.
- }
- /* Need timestamp lock to prevent concurrent execution with
- vblank enable/disable, as this would cause inconsistent
- or corrupted timestamps and vblank counts.
I think the logic itself is fine and at least basic testing of the patch on a Intel HD Ironlake didn't show problems, so with the above taken into account it would have my slightly uneasy reviewed-by.
One thing that worries me a little bit about the disable inside vblank irq are the potential races between the disable code and the display engine which could cause really bad off-by-one errors for clients on a imperfect driver. These races can only happen if vblank enable or disable happens close to or inside the vblank. This approach lets the instant disable happen exactly inside vblank when there is the highest chance of triggering that condition.
This doesn't seem to be a problem for intel kms, but other drivers don't have instant disable yet, so we don't know how well we could do it there. Additionally things like dynamic power management tend to operate inside vblank, sometimes with "funny" side effects to other stuff, e.g., dpm on AMD, as i remember from some long debug session with Michel and Alex last summer where dpm played a role. Therefore it seems more safe to me to avoid actions inside vblank that could be done outside. E.g., instead of doing the disable inside the vblank irq one could maybe just schedule an exact timer to do the disable a few milliseconds later in the middle of active scanout to avoid these potential issues?
-mario
On 04/15/2015 03:03 AM, Mario Kleiner wrote:
On 04/02/2015 01:34 PM, Chris Wilson wrote:
On vblank instant-off systems, we can get into a situation where the cost of enabling and disabling the vblank IRQ around a drmWaitVblank query dominates. However, we know that if the user wants the current vblank counter, they are also very likely to immediately queue a vblank wait and so we can keep the interrupt around and only turn it off if we have no further vblank requests in the interrupt interval.
After vblank event delivery there is a shadow of one vblank where the interrupt is kept alive for the user to query and queue another vblank event. Similarly, if the user is using blocking drmWaitVblanks, the interrupt will be disabled on the IRQ following the wait completion. However, if the user is simply querying the current vblank counter and timestamp, the interrupt will be disabled after every IRQ and the user will enabled it again on the first query following the IRQ.
Testcase: igt/kms_vblank Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Michel Dänzer michel@daenzer.net Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: Dave Airlie airlied@redhat.com, Cc: Mario Kleiner mario.kleiner.de@gmail.com
drivers/gpu/drm/drm_irq.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index c8a34476570a..6f5dc18779e2 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1091,9 +1091,9 @@ void drm_vblank_put(struct drm_device *dev, int crtc) if (atomic_dec_and_test(&vblank->refcount)) { if (drm_vblank_offdelay == 0) return;
else if (dev->vblank_disable_immediate || drm_vblank_offdelay
< 0)
else if (drm_vblank_offdelay < 0) vblank_disable_fn((unsigned long)vblank);
else
else if (!dev->vblank_disable_immediate) mod_timer(&vblank->disable_timer, jiffies + ((drm_vblank_offdelay * HZ)/1000)); }
@@ -1697,6 +1697,17 @@ bool drm_handle_vblank(struct drm_device *dev, int crtc)
spin_lock_irqsave(&dev->event_lock, irqflags);
You could move the code before the spin_lock_irqsave(&dev->event_lock, irqflags); i think it doesn't need that lock?
- if (dev->vblank_disable_immediate &&
!atomic_read(&vblank->refcount)) {
Also check for (drm_vblank_offdelay > 0) to make sure we have a way out of instant disable here, and the same meaning of of drm_vblank_offdelay like we have in the current implementation.
This hunk ...
unsigned long vbl_lock_irqflags;
spin_lock_irqsave(&dev->vbl_lock, vbl_lock_irqflags);
if (atomic_read(&vblank->refcount) == 0 && vblank->enabled) {
DRM_DEBUG("disabling vblank on crtc %d\n", crtc);
vblank_disable_and_save(dev, crtc);
}
spin_unlock_irqrestore(&dev->vbl_lock, vbl_lock_irqflags);
... is the same as a call to vblank_disable_fn((unsigned long) vblank); Maybe replace by that call?
You could also return here already, as the code below will just take a lock, realize vblanks are now disabled and then release the locks and exit.
- }
/* Need timestamp lock to prevent concurrent execution with * vblank enable/disable, as this would cause inconsistent * or corrupted timestamps and vblank counts.
I think the logic itself is fine and at least basic testing of the patch on a Intel HD Ironlake didn't show problems, so with the above taken into account it would have my slightly uneasy reviewed-by.
One thing that worries me a little bit about the disable inside vblank irq are the potential races between the disable code and the display engine which could cause really bad off-by-one errors for clients on a imperfect driver. These races can only happen if vblank enable or disable happens close to or inside the vblank. This approach lets the instant disable happen exactly inside vblank when there is the highest chance of triggering that condition.
This doesn't seem to be a problem for intel kms, but other drivers don't have instant disable yet, so we don't know how well we could do it there. Additionally things like dynamic power management tend to operate inside vblank, sometimes with "funny" side effects to other stuff, e.g., dpm on AMD, as i remember from some long debug session with Michel and Alex last summer where dpm played a role. Therefore it seems more safe to me to avoid actions inside vblank that could be done outside. E.g., instead of doing the disable inside the vblank irq one could maybe just schedule an exact timer to do the disable a few milliseconds later in the middle of active scanout to avoid these potential issues?
-mario
After testing this, one more thing that would make sense is to move the disable block at the end of drm_handle_vblank() instead of at the top.
Turns out that if high precision timestaming is disabled or doesn't work for some reason (as can be simulated by echo 0 > /sys/module/drm/parameters/timestamp_precision_usec), then with your delayed disable code at its current place, the vblank counter won't increment anymore at all for instant queries, ie. with your other "instant query" patches. Clients which repeatedly query the counter and wait for it to progress will simply hang, spinning in an endless query loop. There's that comment in vblank_disable_and_save:
"* Skip this step if there isn't any high precision timestamp * available. In that case we can't account for this and just * hope for the best. */
With the disable happening after leading edge of vblank (== hw counter increment already happened) but before the vblank counter/timestamp handling in drm_handle_vblank, that step is needed to keep the counter progressing, so skipping it is bad.
Now without high precision timestamping support, a kms driver must not set dev->vblank_disable_immediate = true, as this would cause problems for clients, so this shouldn't matter, but it would be good to still make this robust against a future kms driver which might have unreliable high precision timestamping, e.g., high precision timestamping that intermittently doesn't work.
For reference, my tweak of your patch looks now like this for the hunk at the bottom of drm_handle_vblank():
.... spin_unlock(&dev->vblank_time_lock);
wake_up(&vblank->queue); drm_handle_vblank_events(dev, crtc);
+ if (dev->vblank_disable_immediate && drm_vblank_offdelay > 0 && + !atomic_read(&vblank->refcount)) { + unsigned long vbl_lock_irqflags; + + spin_lock_irqsave(&dev->vbl_lock, vbl_lock_irqflags); + if (atomic_read(&vblank->refcount) == 0 && vblank->enabled) { + DRM_DEBUG("disabling vblank on crtc %d\n", crtc); + vblank_disable_and_save(dev, crtc); + } + spin_unlock_irqrestore(&dev->vbl_lock, vbl_lock_irqflags); + } + spin_unlock_irqrestore(&dev->event_lock, irqflags);
return true;
This could be further simplified to
...... spin_unlock(&dev->vblank_time_lock);
wake_up(&vblank->queue); drm_handle_vblank_events(dev, crtc); + + if (dev->vblank_disable_immediate && drm_vblank_offdelay > 0 && + !atomic_read(&vblank->refcount)) { + vblank_disable_fn(vblank); + } + spin_unlock_irqrestore(&dev->event_lock, irqflags);
return true;
The tweaked version worked fine in all my tests. -mario
On vblank instant-off systems, we can get into a situation where the cost of enabling and disabling the vblank IRQ around a drmWaitVblank query dominates. However, we know that if the user wants the current vblank counter, they are also very likely to immediately queue a vblank wait and so we can keep the interrupt around and only turn it off if we have no further vblank requests in the interrupt interval.
After vblank event delivery there is a shadow of one vblank where the interrupt is kept alive for the user to query and queue another vblank event. Similarly, if the user is using blocking drmWaitVblanks, the interrupt will be disabled on the IRQ following the wait completion. However, if the user is simply querying the current vblank counter and timestamp, the interrupt will be disabled after every IRQ and the user will enabled it again on the first query following the IRQ.
v2: Mario Kleiner - After testing this, one more thing that would make sense is to move the disable block at the end of drm_handle_vblank() instead of at the top.
Turns out that if high precision timestaming is disabled or doesn't work for some reason (as can be simulated by echo 0 > /sys/module/drm/parameters/timestamp_precision_usec), then with your delayed disable code at its current place, the vblank counter won't increment anymore at all for instant queries, ie. with your other "instant query" patches. Clients which repeatedly query the counter and wait for it to progress will simply hang, spinning in an endless query loop. There's that comment in vblank_disable_and_save:
"* Skip this step if there isn't any high precision timestamp * available. In that case we can't account for this and just * hope for the best. */
With the disable happening after leading edge of vblank (== hw counter increment already happened) but before the vblank counter/timestamp handling in drm_handle_vblank, that step is needed to keep the counter progressing, so skipping it is bad.
Now without high precision timestamping support, a kms driver must not set dev->vblank_disable_immediate = true, as this would cause problems for clients, so this shouldn't matter, but it would be good to still make this robust against a future kms driver which might have unreliable high precision timestamping, e.g., high precision timestamping that intermittently doesn't work.
Testcase: igt/kms_vblank Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Michel Dänzer michel@daenzer.net Cc: Laurent Pinchart laurent.pinchart@ideasonboard.com Cc: Dave Airlie airlied@redhat.com, Cc: Mario Kleiner mario.kleiner.de@gmail.com --- drivers/gpu/drm/drm_irq.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index c8a34476570a..d27b91f8a357 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1091,9 +1091,9 @@ void drm_vblank_put(struct drm_device *dev, int crtc) if (atomic_dec_and_test(&vblank->refcount)) { if (drm_vblank_offdelay == 0) return; - else if (dev->vblank_disable_immediate || drm_vblank_offdelay < 0) + else if (drm_vblank_offdelay < 0) vblank_disable_fn((unsigned long)vblank); - else + else if (!dev->vblank_disable_immediate) mod_timer(&vblank->disable_timer, jiffies + ((drm_vblank_offdelay * HZ)/1000)); } @@ -1751,6 +1751,16 @@ bool drm_handle_vblank(struct drm_device *dev, int crtc) wake_up(&vblank->queue); drm_handle_vblank_events(dev, crtc);
+ /* With instant-off, we defer disabling the interrupt until after + * we finish processing the following vblank. The disable has to + * be last (after drm_handle_vblank_events) so that the timestamp + * is always accurate. + */ + if (dev->vblank_disable_immediate & + drm_vblank_offdelay > 0 && + !atomic_read(&vblank->refcount)) + vblank_disable_fn(vblank); + spin_unlock_irqrestore(&dev->event_lock, irqflags);
return true;
dri-devel@lists.freedesktop.org