The drm core currently waits 5 seconds from userspace dropping a request for vblanks to vblanks actually being disabled. This appears to be a workaround for broken hardware, but results in a mostly idle desktop generating a huge number of wakeups that are entirely unnecessary but which consume measurable amounts of power. This patchset makes the vblank timeout per-device rather than global, making it easy for drivers to override the behaviour without compromising other graphics hardware in the system. It then removes the delay on Intel hardware. I've tested this successfully on Sandybridge without any evidence of spurious or missing irqs, but I don't know how well it works on older hardware. Feedback not only welcome, but positively desired.
drm_vblank_offdelay is currently a system global, despite the optimal value being hardware-specific. Move it to the drm_device structure.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
---
 drivers/gpu/drm/drm_irq.c  |    6 +++---
 drivers/gpu/drm/drm_stub.c |    8 +++++---
 include/drm/drmP.h         |    2 +-
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index cb3794a..8bcb6a4 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -928,7 +928,7 @@ EXPORT_SYMBOL(drm_vblank_get);
  * @crtc: which counter to give up
  *
  * Release ownership of a given vblank counter, turning off interrupts
- * if possible. Disable interrupts after drm_vblank_offdelay milliseconds.
+ * if possible. Disable interrupts after vblank_offdelay milliseconds.
  */
 void drm_vblank_put(struct drm_device *dev, int crtc)
 {
@@ -936,9 +936,9 @@ void drm_vblank_put(struct drm_device *dev, int crtc)
 	/* Last user schedules interrupt disable */
 	if (atomic_dec_and_test(&dev->vblank_refcount[crtc]) &&
-	    (drm_vblank_offdelay > 0))
+	    (dev->vblank_offdelay > 0))
 		mod_timer(&dev->vblank_disable_timer,
-			  jiffies + ((drm_vblank_offdelay * DRM_HZ)/1000));
+			  jiffies + ((dev->vblank_offdelay * DRM_HZ)/1000));
 }
 EXPORT_SYMBOL(drm_vblank_put);
diff --git a/drivers/gpu/drm/drm_stub.c b/drivers/gpu/drm/drm_stub.c
index 6d7b083..189a077 100644
--- a/drivers/gpu/drm/drm_stub.c
+++ b/drivers/gpu/drm/drm_stub.c
@@ -40,8 +40,8 @@
 unsigned int drm_debug = 0;	/* 1 to enable debug output */
 EXPORT_SYMBOL(drm_debug);
 
-unsigned int drm_vblank_offdelay = 5000;	/* Default to 5000 msecs. */
-EXPORT_SYMBOL(drm_vblank_offdelay);
+unsigned int drm_default_vblank_offdelay = 5000;	/* Default to 5000 msecs. */
+EXPORT_SYMBOL(drm_default_vblank_offdelay);
 
 unsigned int drm_timestamp_precision = 20;	/* Default to 20 usecs. */
 EXPORT_SYMBOL(drm_timestamp_precision);
@@ -54,7 +54,7 @@
 MODULE_PARM_DESC(vblankoffdelay, "Delay until vblank irq auto-disable [msecs]");
 MODULE_PARM_DESC(timestamp_precision_usec, "Max. error on timestamps [usecs]");
 
 module_param_named(debug, drm_debug, int, 0600);
-module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
+module_param_named(vblankoffdelay, drm_default_vblank_offdelay, int, 0600);
 module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
 
 struct idr drm_minors_idr;
@@ -290,6 +290,8 @@ int drm_fill_in_dev(struct drm_device *dev,
 
 	dev->driver = driver;
 
+	dev->vblank_offdelay = drm_default_vblank_offdelay;
+
 	if (dev->driver->bus->agp_init) {
 		retcode = dev->driver->bus->agp_init(dev);
 		if (retcode)
diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index cf39949..81e6bbb 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -1168,6 +1168,7 @@ struct drm_device {
 	struct idr object_name_idr;
 	/*@} */
 	int switch_power_state;
+	int vblank_offdelay;
 };
 
 #define DRM_SWITCH_POWER_ON 0
@@ -1463,7 +1464,6 @@ extern void drm_put_dev(struct drm_device *dev);
 extern int drm_put_minor(struct drm_minor **minor);
 extern unsigned int drm_debug;
 
-extern unsigned int drm_vblank_offdelay;
 extern unsigned int drm_timestamp_precision;
 
 extern struct class *drm_class;
Right now if vblank_offdelay is 0, vblanks won't be disabled after the last user. Fix that case up.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
---
 drivers/gpu/drm/drm_irq.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 8bcb6a4..94f9579 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -935,8 +935,7 @@ void drm_vblank_put(struct drm_device *dev, int crtc)
 	BUG_ON(atomic_read(&dev->vblank_refcount[crtc]) == 0);
 
 	/* Last user schedules interrupt disable */
-	if (atomic_dec_and_test(&dev->vblank_refcount[crtc]) &&
-	    (dev->vblank_offdelay > 0))
+	if (atomic_dec_and_test(&dev->vblank_refcount[crtc]))
 		mod_timer(&dev->vblank_disable_timer,
 			  jiffies + ((dev->vblank_offdelay * DRM_HZ)/1000));
 }
Sandybridge, at least, seems to manage without any vblank offdelay. Dropping this reduces the number of wakeups on an otherwise idle system dramatically.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
---
 drivers/gpu/drm/i915/i915_dma.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index a9533c5..46e7172 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1917,6 +1917,9 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
 		goto free_priv;
 	}
 
+	/* vblank is reliable */
+	dev->vblank_offdelay = 0;
+
 	/* overlay on gen2 is broken and can't address above 1G */
 	if (IS_GEN2(dev))
 		dma_set_coherent_mask(&dev->pdev->dev, DMA_BIT_MASK(30));
On Wed, 2011-11-16 at 09:20 -0500, Matthew Garrett wrote:
The drm core currently waits 5 seconds from userspace dropping a request for vblanks to vblanks actually being disabled. This appears to be a workaround for broken hardware, but results in a mostly idle desktop generating a huge number of wakeups that are entirely unnecessary but which consume measurable amounts of power. This patchset makes the vblank timeout per-device rather than global, making it easy for drivers to override the behaviour without compromising other graphics hardware in the system. It then removes the delay on Intel hardware. I've tested this successfully on Sandybridge without any evidence of spurious or missing irqs, but I don't know how well it works on older hardware. Feedback not only welcome, but positively desired.
Looks good to me. I'll test this on some other intel kit I've got handy. For the series:
Reviewed-by: Adam Jackson <ajax@redhat.com>
- ajax
On Wed, 2011-11-16 at 09:20 -0500, Matthew Garrett wrote:
The drm core currently waits 5 seconds from userspace dropping a request for vblanks to vblanks actually being disabled. This appears to be a workaround for broken hardware, but results in a mostly idle desktop generating a huge number of wakeups that are entirely unnecessary but which consume measurable amounts of power. This patchset makes the vblank timeout per-device rather than global, making it easy for drivers to override the behaviour without compromising other graphics hardware in the system. It then removes the delay on Intel hardware. I've tested this successfully on Sandybridge without any evidence of spurious or missing irqs, but I don't know how well it works on older hardware. Feedback not only welcome, but positively desired.
Have you discussed this with Mario Kleiner (CC'd)?
I thought the main reason for the delay wasn't broken hardware but to avoid constantly ping-ponging the vblank IRQ between on and off with apps which regularly need the vblank counter value, as that could make the counter unreliable. Maybe I'm misremembering though.
Even if I'm not, lowering the delay shouldn't be a problem, so long as it's long enough that at least apps which need the vblank counter every or every other frame don't cause the IRQ to ping-pong. But that shouldn't depend on the hardware.
On Wed, Nov 16, 2011 at 06:03:15PM +0100, Michel Dänzer wrote:
I thought the main reason for the delay wasn't broken hardware but to avoid constantly ping-ponging the vblank IRQ between on and off with apps which regularly need the vblank counter value, as that could make the counter unreliable. Maybe I'm misremembering though.
If turning it on and off results in the counter value being wrong then surely that's a hardware problem? I've tested that turning it on and off between every IRQ still gives valid counter values on sandybridge.
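[For reference, this sort of check can be run from userspace against libdrm's drmWaitVBlank(); a minimal sketch only: the device path, loop length and absence of error handling are all illustrative:]

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <xf86drm.h>

/* Wait for one vblank at a time. With a zero offdelay every iteration
 * disables the vblank irq and the next wait re-enables it, so an
 * off-by-one in the counter handover shows up as a gap in 'sequence'. */
int main(void)
{
	int fd = open("/dev/dri/card0", O_RDWR);
	unsigned int last = 0;
	drmVBlank vbl;
	int i;

	for (i = 0; i < 600; i++) {
		memset(&vbl, 0, sizeof(vbl));
		vbl.request.type = DRM_VBLANK_RELATIVE;
		vbl.request.sequence = 1;
		if (drmWaitVBlank(fd, &vbl))
			break;
		if (last && vbl.reply.sequence != last + 1)
			printf("gap: %u -> %u\n", last, vbl.reply.sequence);
		last = vbl.reply.sequence;
	}
	return 0;
}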
2011/11/16 Matthew Garrett mjg59@srcf.ucam.org:
On Wed, Nov 16, 2011 at 06:03:15PM +0100, Michel Dänzer wrote:
I thought the main reason for the delay wasn't broken hardware but to avoid constantly ping-ponging the vblank IRQ between on and off with apps which regularly need the vblank counter value, as that could make the counter unreliable. Maybe I'm misremembering though.
If turning it on and off results in the counter value being wrong then surely that's a hardware problem? I've tested that turning it on and off between every IRQ still gives valid counter values on sandybridge.
This, and the thread that contains it, might be interesting:
http://permalink.gmane.org/gmane.comp.video.dri.devel/53201
In summary: there was some resistance to doing this in January, but I couldn't find any legitimate reason.
--Andy
On Nov 16, 2011, at 6:11 PM, Matthew Garrett wrote:
On Wed, Nov 16, 2011 at 06:03:15PM +0100, Michel Dänzer wrote:
I thought the main reason for the delay wasn't broken hardware but to avoid constantly ping-ponging the vblank IRQ between on and off with apps which regularly need the vblank counter value, as that could make the counter unreliable. Maybe I'm misremembering though.
If turning it on and off results in the counter value being wrong then surely that's a hardware problem? I've tested that turning it on and off between every IRQ still gives valid counter values on sandybridge.
-- Matthew Garrett | mjg59@srcf.ucam.org
On Wed, Nov 16, 2011 at 06:03:15PM +0100, Michel Dänzer wrote:
Even if I'm not, lowering the delay shouldn't be a problem, so long as it's long enough that at least apps which need the vblank counter every or every other frame don't cause the IRQ to ping-pong. But that shouldn't depend on the hardware.
Hi, and thanks Michel for cc'ing me,
It's not broken hardware, but fast ping-ponging it on and off can make the vblank counter and vblank timestamps unreliable for apps that need high timing precision, especially for the ones that use the OML_sync_control extensions for precise swap scheduling. My target application is vision science / neuroscience, where (sub-)milliseconds often matter for visual stimulation.
I think making the vblank off delay driver specific via these patches is a good idea. Lowering the timeout to something like a few refresh cycles, maybe somewhere between 50 msecs and 100 msecs would be also fine by me. I still would like to keep some drm config option to disable or override the vblank off delay by users.
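[One possible shape for such an override, sketched on top of the per-device field from this series; the parameter name and the <0 sentinel are illustrative only, not part of any posted patch:]

/* Per-driver override: a module parameter that, when >= 0, replaces
 * the drm default installed by drm_fill_in_dev(). */
static int i915_vblank_offdelay __read_mostly = -1;
module_param_named(vblankoffdelay, i915_vblank_offdelay, int, 0600);
MODULE_PARM_DESC(vblankoffdelay,
		 "Vblank irq auto-disable delay in msecs (<0 = drm default)");

/* ... and then in i915_driver_load(): */
	if (i915_vblank_offdelay >= 0)
		dev->vblank_offdelay = i915_vblank_offdelay;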
The intel and radeon kms drivers implement everything that's needed to make it mostly work, except for a small race between the cpu and gpu in the vblank_disable_and_save() function <http://lxr.free-electrons.com/source/drivers/gpu/drm/drm_irq.c#L101> and drm_update_vblank_count(). It can cause an off-by-one error when reinitializing the drm vblank counter from the gpu's hardware counter if the enable/disable function is called at the wrong moment while the gpu's scanout is inside the vblank interval, see comments in the code. I have some sketchy idea for a patch that could detect when the race happens and retry hw counter queries to fix this. Without that patch, there's some chance between 0% and 4% of being off-by-one.
On current nouveau kms, disabling vblank irqs guarantees you wrong vblank counts and wrong vblank timestamps after turning them on again, because the kms driver doesn't implement the hook for hardware vblank counter query correctly. The drm vblank counter misses all counts during the vblank irq off period. Other timing related hooks are missing as well. I have a couple of patches queued up and some more to come for the ddx and kms driver to mostly fix this. NVidia gpu's only have hardware vblank counters for NV-50 and later, fixing this for earlier gpu's would require some emulation of a hw vblank counter inside the kms driver.
Apps that rely on the vblank counter being totally reliable over long periods of time currently would be in a bad situation with a lowered vblank off delay, but that's probably generally not a good assumption. Toolkits like mine, which are more paranoid, currently can work fine as long as the off delay is at least a few video refresh cycles. I do the following for scheduling a reliably timed swap:
1. Query current vblank counter current_msc and vblank timestamp current_ust.
2. Calculate a target vblank count target_msc, based on current_msc, current_ust and some target time from usercode.
3. Schedule bufferswap for target_msc.
As long as the vblank off delay is long enough so that vblanks don't get turned off between 1. and 3., everything is fine, otherwise bad things will happen. Keeping a way to override the default off delay would be good to allow poor scientists to work around potentially broken drivers or gpu's in the future. @Matthew: I'm appealing here to your ex-Drosophila biologist heritage ;-)
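[In GLX terms the three steps map onto the OML_sync_control entry points roughly as follows; a sketch only: the entry points must be resolved via glXGetProcAddress(), error handling is omitted, and UST is assumed to be in microseconds:]

#include <GL/glx.h>
#include <GL/glxext.h>

static PFNGLXGETSYNCVALUESOMLPROC pGetSyncValues;
static PFNGLXSWAPBUFFERSMSCOMLPROC pSwapBuffersMsc;

static void resolve_oml(void)
{
	pGetSyncValues = (PFNGLXGETSYNCVALUESOMLPROC)
		glXGetProcAddress((const GLubyte *)"glXGetSyncValuesOML");
	pSwapBuffersMsc = (PFNGLXSWAPBUFFERSMSCOMLPROC)
		glXGetProcAddress((const GLubyte *)"glXSwapBuffersMscOML");
}

/* Schedule a swap for wall-clock time target_ust (same clock/units as
 * UST); refresh_us is the refresh duration, assumed known. */
static int64_t schedule_timed_swap(Display *dpy, GLXDrawable draw,
				   int64_t target_ust, int64_t refresh_us)
{
	int64_t ust, msc, sbc, target_msc;

	/* 1. Baseline: current vblank count and its timestamp. */
	pGetSyncValues(dpy, draw, &ust, &msc, &sbc);

	/* 2. Convert the time delta into a count delta (rounding up). */
	target_msc = msc + (target_ust - ust + refresh_us - 1) / refresh_us;

	/* 3. Queue the swap for that count (divisor/remainder unused). */
	return pSwapBuffersMsc(dpy, draw, target_msc, 0, 0);
}

[If the drm counter skips between steps 1 and 3, target_msc no longer names the intended frame; that is exactly the failure mode described above.]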
thanks, -mario
********************************************************************* Mario Kleiner Max Planck Institute for Biological Cybernetics Spemannstr. 38 72076 Tuebingen Germany
e-mail: mario.kleiner@tuebingen.mpg.de office: +49 (0)7071/601-1623 fax: +49 (0)7071/601-616 www: http://www.kyb.tuebingen.mpg.de/~kleinerm ********************************************************************* "For a successful technology, reality must take precedence over public relations, for Nature cannot be fooled." (Richard Feynman)
On Wed, Nov 16, 2011 at 07:27:51PM +0100, Mario Kleiner wrote:
It's not broken hardware, but fast ping-ponging it on and off can make the vblank counter and vblank timestamps unreliable for apps that need high timing precision, especially for the ones that use the OML_sync_control extensions for precise swap scheduling. My target application is vision science / neuroscience, where (sub-)milliseconds often matter for visual stimulation.
I'll admit that I'm struggling to understand the issue here. If the vblank counter is incremented at the time of vblank (which isn't the case for radeon, it seems, but as far as I can tell is the case for Intel) then how does ping-ponging the IRQ matter? vblank_disable_and_save() appears to handle this case.
I think making the vblank off delay driver specific via these patches is a good idea. Lowering the timeout to something like a few refresh cycles, maybe somewhere between 50 msecs and 100 msecs would be also fine by me. I still would like to keep some drm config option to disable or override the vblank off delay by users.
Does the timeout serve any purpose other than letting software effectively prevent vblanks from being disabled?
The intel and radeon kms drivers implement everything that's needed to make it mostly work, except for a small race between the cpu and gpu in the vblank_disable_and_save() function http://lxr.free-electrons.com/source/drivers/gpu/drm/drm_irq.c#L101 and drm_update_vblank_count(). It can cause an off-by-one error when reinitializing the drm vblank counter from the gpu's hardware counter if the enable/disable function is called at the wrong moment while the gpu's scanout is inside the vblank interval, see comments in the code. I have some sketchy idea for a patch that could detect when the race happens and retry hw counter queries to fix this. Without that patch, there's some chance between 0% and 4% of being off-by-one.
For Radeon, I'd have thought you could handle this by scheduling an irq for the beginning of scanout (avivo has a register for that) and delaying the vblank disable until you hit it?
On current nouveau kms, disabling vblank irqs guarantees you wrong vblank counts and wrong vblank timestamps after turning them on again, because the kms driver doesn't implement the hook for hardware vblank counter query correctly. The drm vblank counter misses all counts during the vblank irq off period. Other timing related hooks are missing as well. I have a couple of patches queued up and some more to come for the ddx and kms driver to mostly fix this. NVidia gpu's only have hardware vblank counters for NV-50 and later, fixing this for earlier gpu's would require some emulation of a hw vblank counter inside the kms driver.
I've no problem with all of this work being case by case.
Apps that rely on the vblank counter being totally reliable over long periods of time currently would be in a bad situation with a lowered vblank off delay, but that's probably generally not a good assumption. Toolkits like mine, which are more paranoid, currently can work fine as long as the off delay is at least a few video refresh cycles. I do the following for scheduling a reliably timed swap:
1. Query current vblank counter current_msc and vblank timestamp current_ust.
2. Calculate a target vblank count target_msc, based on current_msc, current_ust and some target time from usercode.
3. Schedule bufferswap for target_msc.
As long as the vblank off delay is long enough so that vblanks don't get turned off between 1. and 3., everything is fine, otherwise bad things will happen. Keeping a way to override the default off delay would be good to allow poor scientists to work around potentially broken drivers or gpu's in the future. @Matthew: I'm appealing here to your ex-Drosophila biologist heritage ;-)
If vblanks are disabled and then re-enabled between 1 and 3, what's the negative outcome?
On Nov 16, 2011, at 7:48 PM, Matthew Garrett wrote:
On Wed, Nov 16, 2011 at 07:27:51PM +0100, Mario Kleiner wrote:
It's not broken hardware, but fast ping-ponging it on and off can make the vblank counter and vblank timestamps unreliable for apps that need high timing precision, especially for the ones that use the OML_sync_control extensions for precise swap scheduling. My target application is vision science / neuroscience, where (sub-)milliseconds often matter for visual stimulation.
I'll admit that I'm struggling to understand the issue here. If the vblank counter is incremented at the time of vblank (which isn't the case for radeon, it seems, but as far as I can tell is the case for Intel) then how does ping-ponging the IRQ matter? vblank_disable_and_save() appears to handle this case.
The drm software vblank counter which is used for scheduling swaps etc. gets incremented in the gpu's vblank irq handler. The gpu's hardware counter gets incremented somewhere inside the vblank interval. The increments don't happen at the same point in time. The race is between the vblank on/off code, which gets scheduled either by the timeout timer for the "off" case, or by usercode for the "on" case and the gpu's hardware vblank counter.
The existing code uses a lock (vblank_time_lock) to prevent some races between itself and the vblank irq handler, but it can't "lock" the gpu, so if the enable/disable code executes while the gpu is scanning out the vblank interval, it is undefined if the final vblank count will be correct or off by one. Vblank duration is somewhere up to 4.5% of refresh interval duration, so there's your up to 4% chance of getting it wrong.
If one could reliably avoid enabling/disabling in the problematic time period, the problem would go away.
I think making the vblank off delay driver specific via these patches is a good idea. Lowering the timeout to something like a few refresh cycles, maybe somewhere between 50 msecs and 100 msecs would be also fine by me. I still would like to keep some drm config option to disable or override the vblank off delay by users.
Does the timeout serve any purpose other than letting software effectively prevent vblanks from being disabled?
With perfect drivers and gpu's in a perfect world, no. In reality there's the race i described above, and nouveau and all other drivers except intel and radeon. The vblank irq also drives timestamping of vblanks, one update per vblank. The timestamps are cached if a client needs them in between updates. Turning off vblank irq invalidates the timestamps. radeon and intel can recreate the timestamp anytime as needed, but nouveau lacks this atm., so timestamps remain invalid for a whole video refresh cycle after vblank irq on. We have patches for nouveau kms almost ready, so only the race mentioned above would remain.
The intel and radeon kms drivers implement everything that's needed to make it mostly work, except for a small race between the cpu and gpu in the vblank_disable_and_save() function http://lxr.free-electrons.com/source/drivers/gpu/drm/drm_irq.c#L101 and drm_update_vblank_count(). It can cause an off-by-one error when reinitializing the drm vblank counter from the gpu's hardware counter if the enable/disable function is called at the wrong moment while the gpu's scanout is inside the vblank interval, see comments in the code. I have some sketchy idea for a patch that could detect when the race happens and retry hw counter queries to fix this. Without that patch, there's some chance between 0% and 4% of being off-by-one.
For Radeon, I'd have thought you could handle this by scheduling an irq for the beginning of scanout (avivo has a register for that) and delaying the vblank disable until you hit it?
For Radeon there is such an irq, but iirc we had some discussions on this a while ago, also with Alex Deucher, and some irq's weren't considered very reliable, or were already used for other stuff. The idea i had goes like this:
Use the crtc scanout position queries together with the vblank counter queries inside some calibration loop, maybe executed after each modeset, to find out the scanline range in which the hardware vblank counter increments -- basically a forbidden range of scanline positions where the race would happen. Then at each vblank off/on, query scanout position before and after the hw vblank counter query. If according to the scanout positions the vblank counter query happened within the forbidden time window, retry the query. With a well working calibration that should add no delay in most cases and a delay to the on/off code of a few dozen microseconds (=duration of a few scanlines) worst case.
With that and the pending nouveau patches in place i think we wouldn't need the vblank off delay anymore on such drivers.
On current nouveau kms, disabling vblank irqs guarantees you wrong vblank counts and wrong vblank timestamps after turning them on again, because the kms driver doesn't implement the hook for hardware vblank counter query correctly. The drm vblank counter misses all counts during the vblank irq off period. Other timing related hooks are missing as well. I have a couple of patches queued up and some more to come for the ddx and kms driver to mostly fix this. NVidia gpu's only have hardware vblank counters for NV-50 and later, fixing this for earlier gpu's would require some emulation of a hw vblank counter inside the kms driver.
I've no problem with all of this work being case by case.
Me neither. I just say if you'd go to the extreme and disable vblank irq's immediately with zero delay, without reliably fixing the problem i mentioned, you'd get those off by one counts, which would be very bad for apps that rely on precise timing. Doing so would basically make the whole oml_sync_control implementation mostly useless.
Apps that rely on the vblank counter being totally reliable over long periods of time currently would be in a bad situation with a lowered vblank off delay, but that's probably generally not a good assumption. Toolkits like mine, which are more paranoid, currently can work fine as long as the off delay is at least a few video refresh cycles. I do the following for scheduling a reliably timed swap:
1. Query current vblank counter current_msc and vblank timestamp current_ust.
2. Calculate a target vblank count target_msc, based on current_msc, current_ust and some target time from usercode.
3. Schedule bufferswap for target_msc.
As long as the vblank off delay is long enough so that vblanks don't get turned off between 1. and 3., everything is fine, otherwise bad things will happen. Keeping a way to override the default off delay would be good to allow poor scientists to work around potentially broken drivers or gpu's in the future. @Matthew: I'm appealing here to your ex-Drosophila biologist heritage ;-)
If vblanks are disabled and then re-enabled between 1 and 3, what's the negative outcome?
1. glXGetSyncValuesOML() gives you the current vblank count and a timestamp of when exactly scanout of the corresponding video frame started. These values are the baseline for any vblank based swap scheduling or any other vblank synchronized actions, e.g., via glXWaitForMscOML().
2. App calculates stuff based on 1. but gets preempted or experiences scheduling delay on return from 1. - or the x-server gets preempted or scheduled away. Whatever, time passes...
3. glXSwapBuffersMscOML() or glXWaitForMscOML() is used to schedule something to occur at a target vblank count, based on the results of 1.
As long as vblanks don't get disabled between 1 and 3, everything is consistent. Scheduling delays/preemption of more than a few (dozen) milliseconds worst case are very rare, so a lowered vblank off delay of even one or two refresh cycles would solve the problem. I assume that most toolkits with needs for timing precision would follow a three step strategy like this one.
If vblank irq's get disabled immediately after step 1. and reenabled at step 3., and this triggers an off-by-one error due to a race, then the calculated target vblank count from steps 1/2 is invalid and useless for step 3, and many vision scientists who just started to make friends with Linux lately will make very sad faces. Due to the nature of many of the stimulation paradigms used, there is also a high chance that many of the enables and disables would happen inside or very close to the problematic vblank interval, when the off-by-one error is most likely.
I agree with your patches, i just don't want vblank irq's to be disabled immediately with a zero timeout before remaining races are fixed, that would sure break things at least for scientific applications. And i'd like to keep some optional parameter that allows desperate users to disable the vblank off mechanism in case they encounter a bug.
-mario
********************************************************************* Mario Kleiner Max Planck Institute for Biological Cybernetics Spemannstr. 38 72076 Tuebingen Germany
e-mail: mario.kleiner@tuebingen.mpg.de office: +49 (0)7071/601-1623 fax: +49 (0)7071/601-616 www: http://www.kyb.tuebingen.mpg.de/~kleinerm ********************************************************************* "For a successful technology, reality must take precedence over public relations, for Nature cannot be fooled." (Richard Feynman)
On Thu, Nov 17, 2011 at 01:26:37AM +0100, Mario Kleiner wrote:
On Nov 16, 2011, at 7:48 PM, Matthew Garrett wrote:
I'll admit that I'm struggling to understand the issue here. If the vblank counter is incremented at the time of vblank (which isn't the case for radeon, it seems, but as far as I can tell is the case for Intel) then how does ping-ponging the IRQ matter? vblank_disable_and_save() appears to handle this case.
The drm software vblank counter which is used for scheduling swaps etc. gets incremented in the gpu's vblank irq handler. The gpu's hardware counter gets incremented somewhere inside the vblank interval. The increments don't happen at the same point in time. The race is between the vblank on/off code, which gets scheduled either by the timeout timer for the "off" case, or by usercode for the "on" case and the gpu's hardware vblank counter.
Yes, that makes sense given the current design.
The existing code uses a lock (vblank_time_lock) to prevent some races between itself and the vblank irq handler, but it can't "lock" the gpu, so if the enable/disable code executes while the gpu is scanning out the vblank interval, it is undefined if the final vblank count will be correct or off by one. Vblank duration is somewhere up to 4.5% of refresh interval duration, so there's your up to 4% chance of getting it wrong.
Well presumably by "undefined" we really mean "hardware-specific" - for any given well-functioning hardware I'd expect the answer to be well-defined. Like I said, with Radeon we have the option of triggering an interrupt at the point where the hardware counter is actually incremented. If that then shuts down the vblank interrupt then we should have a well-defined answer.
Does the timeout serve any purpose other than letting software effectively prevent vblanks from being disabled?
With perfect drivers and gpu's in a perfect world, no. In reality there's the race i described above, and nouveau and all other drivers except intel and radeon. The vblank irq also drives timestamping of vblanks, one update per vblank. The timestamps are cached if a client needs them in between updates. Turning off vblank irq invalidates the timestamps. radeon and intel can recreate the timestamp anytime as needed, but nouveau lacks this atm., so timestamps remain invalid for a whole video refresh cycle after vblank irq on. We have patches for nouveau kms almost ready, so only the race mentioned above would remain.
Sure. We'd certainly need to improve things for other GPUs before enabling the same functionality there.
For Radeon, I'd have thought you could handle this by scheduling an irq for the beginning of scanout (avivo has a register for that) and delaying the vblank disable until you hit it?
For Radeon there is such an irq, but iirc we had some discussions on this a while ago, also with Alex Deucher, and some irq's weren't considered very reliable, or were already used for other stuff. The idea i had goes like this:
Use the crtc scanout position queries together with the vblank counter queries inside some calibration loop, maybe executed after each modeset, to find out the scanline range in which the hardware vblank counter increments -- basically a forbidden range of scanline positions where the race would happen. Then at each vblank off/on, query scanout position before and after the hw vblank counter query. If according to the scanout positions the vblank counter query happened within the forbidden time window, retry the query. With a well working calibration that should add no delay in most cases and a delay to the on/off code of a few dozen microseconds (=duration of a few scanlines) worst case.
Assuming we're sleeping rather than busy-looping, that's certainly ok. My previous experiments with radeon indicated that the scanout irq was certainly not entirely reliable - on the other hand, I was trying to use it for completing memory reclocking within the vblank interval. It was typically still within a few scanlines, so a sanity check there wouldn't pose too much of a problem.
I've no problem with all of this work being case by case.
Me neither. I just say if you'd go to the extreme and disable vblank irq's immediately with zero delay, without reliably fixing the problem i mentioned, you'd get those off by one counts, which would be very bad for apps that rely on precise timing. Doing so would basically make the whole oml_sync_control implementation mostly useless.
Right. My testing of sandybridge suggests that there wasn't a problem here - even with the ping-ponging I was reliably getting 60 interrupts in 60 seconds, with the counter incrementing by 1 each time. I certainly wouldn't push to enable it elsewhere without making sure that the results are reliable.
If vblanks are disabled and then re-enabled between 1 and 3, what's the negative outcome?
1. glXGetSyncValuesOML() gives you the current vblank count and a timestamp of when exactly scanout of the corresponding video frame started. These values are the baseline for any vblank based swap scheduling or any other vblank synchronized actions, e.g., via glXWaitForMscOML().

2. App calculates stuff based on 1. but gets preempted or experiences scheduling delay on return from 1. - or the x-server gets preempted or scheduled away. Whatever, time passes...

3. glXSwapBuffersMscOML() or glXWaitForMscOML() is used to schedule something to occur at a target vblank count, based on the results of 1.
As long as vblanks don't get disabled between 1 and 3, everything is consistent. Scheduling delays/preemption of more than a few (dozen) milliseconds worst case are very rare, so a lowered vblank off delay of even one or two refresh cycles would solve the problem. I assume that most toolkits with needs for timing precision would follow a three step strategy like this one.
If vblank irq's get disabled immediately after step 1. and reenabled at step 3., and this triggers an off-by-one error due to a race, then the calculated target vblank count from steps 1/2 is invalid and useless for step 3, and many vision scientists who just started to make friends with Linux lately will make very sad faces. Due to the nature of many of the stimulation paradigms used, there is also a high chance that many of the enables and disables would happen inside or very close to the problematic vblank interval, when the off-by-one error is most likely.
Ok, so as long as we believe that we're reliably reading the hardware counter and not getting off by one, we should be ok? I just want to be clear that this is dependent on hardware behaviour rather than being absolutely inherent :)
I agree with your patches, i just don't want vblank irq's to be disabled immediately with a zero timeout before remaining races are fixed, that would sure break things at least for scientific applications. And i'd like to keep some optional parameter that allows desperate users to disable the vblank off mechanism in case they encounter a bug.
Yeah, no problem. I understand that completely.
On Wed, Nov 16, 2011 at 6:19 PM, Matthew Garrett mjg59@srcf.ucam.org wrote:
On Thu, Nov 17, 2011 at 01:26:37AM +0100, Mario Kleiner wrote:
On Nov 16, 2011, at 7:48 PM, Matthew Garrett wrote:
For Radeon, I'd have thought you could handle this by scheduling an irq for the beginning of scanout (avivo has a register for that) and delaying the vblank disable until you hit it?
For Radeon there is such an irq, but iirc we had some discussions on this a while ago, also with Alex Deucher, and some irq's weren't considered very reliable, or were already used for other stuff. The idea i had goes like this:
Use the crtc scanout position queries together with the vblank counter queries inside some calibration loop, maybe executed after each modeset, to find out the scanline range in which the hardware vblank counter increments -- basically a forbidden range of scanline positions where the race would happen. Then at each vblank off/on, query scanout position before and after the hw vblank counter query. If according to the scanout positions the vblank counter query happened within the forbidden time window, retry the query. With a well working calibration that should add no delay in most cases and a delay to the on/off code of a few dozen microseconds (=duration of a few scanlines) worst case.
Assuming we're sleeping rather than busy-looping, that's certainly ok. My previous experiments with radeon indicated that the scanout irq was certainly not entirely reliable - on the other hand, I was trying to use it for completing memory reclocking within the vblank interval. It was typically still within a few scanlines, so a sanity check there wouldn't pose too much of a problem.
I think there's a simpler fix: just keep the hardware and software counts in sync -- if everything is working correctly (even with all these crazy races), the difference should be constant. Patch coming momentarily.
--Andy
There are two possible races when disabling vblanks. If the IRQ fired but the hardware didn't update its counter yet, then we store too low a hardware counter. (Sensible hardware never does this. Apparently not all hardware is sensible.) If, on the other hand, the counter updated but the IRQ didn't fire yet, we store too high a counter.
We handled the former case with a heuristic based on timestamps and we did not handle the latter case. By saving a little more state, we can handle both cases exactly: all we need to do is watch for changes in the difference between the hardware and software vblank counts.
Signed-off-by: Andy Lutomirski <luto@mit.edu>
---
Rather than tweaking more things to reduce the chance of hitting a race while keeping the vblank disable timeout as low as possible, why not just fix the race?
This compiles but is not very well tested, because I don't know what tests to run. I haven't been able to provoke either race on my SNB laptop.
 drivers/gpu/drm/drm_irq.c |   92 ++++++++++++++++++++++++++++----------------
 include/drm/drmP.h        |    2 +-
 2 files changed, 59 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 3830e9e..1674a33 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -56,6 +56,12 @@
  */
 #define DRM_REDUNDANT_VBLIRQ_THRESH_NS 1000000
 
+/* Saved vblank count data, used only in this file. */
+struct drm_vbl_counts {
+	u32 hwcount;	/* hw count at last state save or load */
+	u32 drmcount;	/* drm count at last state save or load */
+};
+
 /**
  * Get interrupt from bus id.
  *
@@ -101,7 +107,8 @@ static void clear_vblank_timestamps(struct drm_device *dev, int crtc)
 static void vblank_disable_and_save(struct drm_device *dev, int crtc)
 {
 	unsigned long irqflags;
-	u32 vblcount;
+	u32 drmcount, hwcount;
+	u32 drm_counts_seen, hw_counts_seen, offset;
 	s64 diff_ns;
 	int vblrc;
 	struct timeval tvblank;
@@ -121,44 +128,53 @@ static void vblank_disable_and_save(struct drm_device *dev, int crtc)
 	/* No further vblank irq's will be processed after
 	 * this point. Get current hardware vblank count and
 	 * vblank timestamp, repeat until they are consistent.
-	 *
-	 * FIXME: There is still a race condition here and in
-	 * drm_update_vblank_count() which can cause off-by-one
-	 * reinitialization of software vblank counter. If gpu
-	 * vblank counter doesn't increment exactly at the leading
-	 * edge of a vblank interval, then we can lose 1 count if
-	 * we happen to execute between start of vblank and the
-	 * delayed gpu counter increment.
 	 */
 	do {
-		dev->last_vblank[crtc] = dev->driver->get_vblank_counter(dev, crtc);
+		hwcount = dev->driver->get_vblank_counter(dev, crtc);
 		vblrc = drm_get_last_vbltimestamp(dev, crtc, &tvblank, 0);
-	} while (dev->last_vblank[crtc] != dev->driver->get_vblank_counter(dev, crtc));
+	} while (hwcount != dev->driver->get_vblank_counter(dev, crtc));
 
 	/* Compute time difference to stored timestamp of last vblank
 	 * as updated by last invocation of drm_handle_vblank() in vblank irq.
 	 */
-	vblcount = atomic_read(&dev->_vblank_count[crtc]);
+	drmcount = atomic_read(&dev->_vblank_count[crtc]);
 	diff_ns = timeval_to_ns(&tvblank) -
-		  timeval_to_ns(&vblanktimestamp(dev, crtc, vblcount));
+		  timeval_to_ns(&vblanktimestamp(dev, crtc, drmcount));
 
-	/* If there is at least 1 msec difference between the last stored
-	 * timestamp and tvblank, then we are currently executing our
-	 * disable inside a new vblank interval, the tvblank timestamp
-	 * corresponds to this new vblank interval and the irq handler
-	 * for this vblank didn't run yet and won't run due to our disable.
-	 * Therefore we need to do the job of drm_handle_vblank() and
-	 * increment the vblank counter by one to account for this vblank.
+	/* We could be off by one in either direction. If a vblank just
+	 * happened but the IRQ hasn't been handled yet, then drmcount is
+	 * too low by one. On the other hand, if the GPU fires its vblank
+	 * interrupts *before* updating its counter, then hwcount could
+	 * be too low by one. (If both happen, they cancel out.)
 	 *
-	 * Skip this step if there isn't any high precision timestamp
-	 * available. In that case we can't account for this and just
-	 * hope for the best.
+	 * Fortunately, we have enough information to figure out what
+	 * happened. Assuming the hardware counter works right, the
+	 * difference between drmcount and vblcount should be a constant
+	 * (modulo max_vblank_count). We have both saved values from last
+	 * time we turned the interrupt on.
 	 */
-	if ((vblrc > 0) && (abs64(diff_ns) > 1000000)) {
-		atomic_inc(&dev->_vblank_count[crtc]);
-		smp_mb__after_atomic_inc();
+	drm_counts_seen = drmcount - dev->last_vblank[crtc].drmcount;
+	hw_counts_seen = hwcount - dev->last_vblank[crtc].hwcount;
+
+	offset = (drm_counts_seen - hw_counts_seen) % dev->max_vblank_count;
+	if (offset == 1) {
+		hwcount++;
+		DRM_DEBUG("last_vblank[%d]: hwcount++\n", crtc);
+	} else if (offset == dev->max_vblank_count - 1) {
+		hwcount--;
+		DRM_DEBUG("last_vblank[%d]: hwcount--\n", crtc);
+	} else if (offset != 0) {
+		DRM_DEBUG("last_vblank[%d]: drm_counts_seen = %u, hw_counts_seen = %u, max = %u is unexpected",
+			  crtc, (unsigned)drm_counts_seen,
+			  (unsigned)hw_counts_seen,
+			  (unsigned)dev->max_vblank_count);
+
+		/* Trying to fix it might just make it worse. */
 	}
 
+	dev->last_vblank[crtc].hwcount = hwcount;
+	dev->last_vblank[crtc].drmcount = drmcount; /* not really necessary */
+
 	/* Invalidate all timestamps while vblank irq's are off. */
 	clear_vblank_timestamps(dev, crtc);
 
@@ -238,7 +254,8 @@ int drm_vblank_init(struct drm_device *dev, int num_crtcs)
 	if (!dev->vblank_enabled)
 		goto err;
 
-	dev->last_vblank = kcalloc(num_crtcs, sizeof(u32), GFP_KERNEL);
+	dev->last_vblank = kcalloc(num_crtcs, sizeof(struct drm_vbl_counts),
+				   GFP_KERNEL);
 	if (!dev->last_vblank)
 		goto err;
 
@@ -268,6 +285,9 @@ int drm_vblank_init(struct drm_device *dev, int num_crtcs)
 		init_waitqueue_head(&dev->vbl_queue[i]);
 		atomic_set(&dev->_vblank_count[i], 0);
 		atomic_set(&dev->vblank_refcount[i], 0);
+		dev->last_vblank[i].drmcount = 0;
+		dev->last_vblank[i].hwcount =
+			dev->driver->get_vblank_counter(dev, i);
 	}
 
 	dev->vblank_disable_allowed = 0;
@@ -410,7 +430,9 @@ int drm_irq_uninstall(struct drm_device *dev)
 	for (i = 0; i < dev->num_crtcs; i++) {
 		DRM_WAKEUP(&dev->vbl_queue[i]);
 		dev->vblank_enabled[i] = 0;
-		dev->last_vblank[i] = dev->driver->get_vblank_counter(dev, i);
+		dev->last_vblank[i].hwcount =
+			dev->driver->get_vblank_counter(dev, i);
+		dev->last_vblank[i].drmcount = drm_vblank_count(dev, i);
 	}
 	spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
 
@@ -841,12 +863,12 @@ static void drm_update_vblank_count(struct drm_device *dev, int crtc)
 	} while (cur_vblank != dev->driver->get_vblank_counter(dev, crtc));
 
 	/* Deal with counter wrap */
-	diff = cur_vblank - dev->last_vblank[crtc];
-	if (cur_vblank < dev->last_vblank[crtc]) {
+	diff = cur_vblank - dev->last_vblank[crtc].hwcount;
+	if (cur_vblank < dev->last_vblank[crtc].hwcount) {
 		diff += dev->max_vblank_count;
 
-		DRM_DEBUG("last_vblank[%d]=0x%x, cur_vblank=0x%x => diff=0x%x\n",
-			  crtc, dev->last_vblank[crtc], cur_vblank, diff);
+		DRM_DEBUG("last_vblank[%d].hwcount=0x%x, cur_vblank=0x%x => diff=0x%x\n",
+			  crtc, dev->last_vblank[crtc].hwcount, cur_vblank, diff);
 	}
 
 	DRM_DEBUG("enabling vblank interrupts on crtc %d, missed %d\n",
@@ -861,8 +883,10 @@ static void drm_update_vblank_count(struct drm_device *dev, int crtc)
 		vblanktimestamp(dev, crtc, tslot) = t_vblank;
 	}
 
+	dev->last_vblank[crtc].hwcount = cur_vblank;
 	smp_mb__before_atomic_inc();
-	atomic_add(diff, &dev->_vblank_count[crtc]);
+	dev->last_vblank[crtc].drmcount =
+		atomic_add_return(diff, &dev->_vblank_count[crtc]);
 	smp_mb__after_atomic_inc();
 }
 
diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index 9b7c2bb..4950c0f 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -1108,7 +1108,7 @@ struct drm_device {
 	spinlock_t vblank_time_lock;    /**< Protects vblank count and time updates during vblank enable/disable */
 	spinlock_t vbl_lock;
 	atomic_t *vblank_refcount;      /* number of users of vblank interruptsper crtc */
-	u32 *last_vblank;               /* protected by dev->vbl_lock, used */
+	struct drm_vbl_counts *last_vblank; /* protected by dev->vbl_lock, used */
 					/* for wraparound handling */
 	int *vblank_enabled;            /* so we don't call enable more than once per disable */
[ Dropping intel-gfx list from CC, as it automatically rejects posts from non-subscribers ]
On Wed, 2011-11-16 at 21:39 -0800, Andy Lutomirski wrote:
There are two possible races when disabling vblanks. If the IRQ fired but the hardware didn't update its counter yet, then we store too low a hardware counter. (Sensible hardware never does this. Apparently not all hardware is sensible.)
The thing is, 'the' IRQ and 'the' hardware counter aren't necessarily about the same thing. We want an IRQ which triggers at the beginning of vertical blank, but the Radeon hardware counters increment at the beginning of scanout, i.e. at the end of vertical blank. Does that make the hardware 'broken' or 'not sensible'?
If, on the other hand, the counter updated but the IRQ didn't fire yet, we store too high a counter.
We handled the former case with a heuristic based on timestamps and we did not handle the latter case. By saving a little more state, we can handle both cases exactly: all we need to do is watch for changes in the difference between the hardware and software vblank counts.
I'm afraid that can't work:
Some (AFAIR also Intel) hardware resets the counter to 0 when the CRTC is disabled / enabled (e.g. for DPMS, or a modeset). That's why we ended up only counting interrupts while the IRQ is enabled, and only using the hardware counter to fill in while the IRQ is disabled. The hardware counter cannot be assumed to have any defined behaviour between enabling and disabling the IRQ.
To compensate for this, the drivers call drm_vblank_pre_modeset (which enables the IRQ, which also updates the virtual counter from the hardware counter) before disabling the CRTC and drm_vblank_post_modeset (which disables the IRQ, which also records the hardware counter) after enabling the CRTC.
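[In driver terms the bracketing looks roughly like this; the crtc helpers are placeholders, only the two drm_vblank_*_modeset() calls are real:]

/* Simplified modeset path: bracket the period where the hardware
 * counter is unreliable, so the drm core carries the virtual counter
 * across the reset. */
static void example_set_mode(struct drm_device *dev, int crtc)
{
	/* Enables the vblank irq and folds the current hw count into
	 * the virtual counter while the hw counter is still valid. */
	drm_vblank_pre_modeset(dev, crtc);

	my_crtc_disable(dev, crtc);		/* placeholder: hw counter may reset */
	my_crtc_program_mode(dev, crtc);	/* placeholder */
	my_crtc_enable(dev, crtc);		/* placeholder */

	/* Disables the irq again and records the (possibly reset) hw
	 * counter as the new baseline. */
	drm_vblank_post_modeset(dev, crtc);
}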
This compiles but is not very well tested, because I don't know what tests to run.
Not sure there are any good tests yet. Mario, would it be possible to extract something exercising the various corner cases from your toolkit?
2011/11/17 Michel Dänzer michel@daenzer.net:
[ Dropping intel-gfx list from CC, as it automatically rejects posts from non-subscribers ]
On Wed, 2011-11-16 at 21:39 -0800, Andy Lutomirski wrote:
There are two possible races when disabling vblanks. If the IRQ fired but the hardware didn't update its counter yet, then we store too low a hardware counter. (Sensible hardware never does this. Apparently not all hardware is sensible.)
The thing is, 'the' IRQ and 'the' hardware counter aren't necessarily about the same thing. We want an IRQ which triggers at the beginning of vertical blank, but the Radeon hardware counters increment at the beginning of scanout, i.e. at the end of vertical blank. Does that make the hardware 'broken' or 'not sensible'?
Perhaps not. I think that using that counter as the return value of get_vblank_counter is a bit unfortunate in that case, though. Anyway, I'm not particularly attached to the comment as written.
Am I correct in guessing that Radeon fires its vblank irq at the beginning of vblank instead of at the end? And doesn't this mean that Radeon has a separate issue when vblank_get happens during the vblank interval? A trick similar to drm_calc_vbltimestamp_from_scanoutpos ought to be able to fix that.
If, on the other hand, the counter updated but the IRQ didn't fire yet, we store too high a counter.
We handled the former case with a heuristic based on timestamps and we did not handle the latter case. By saving a little more state, we can handle both cases exactly: all we need to do is watch for changes in the difference between the hardware and software vblank counts.
I'm afraid that can't work:
Some (AFAIR also Intel) hardware resets the counter to 0 when the CRTC is disabled / enabled (e.g. for DPMS, or a modeset). That's why we ended up only counting interrupts while the IRQ is enabled, and only using the hardware counter to fill in while the IRQ is disabled. The hardware counter cannot be assumed to have any defined behaviour between enabling and disabling the IRQ.
To compensate for this, the drivers call drm_vblank_pre_modeset (which enables the IRQ, which also updates the virtual counter from the hardware counter) before disabling the CRTC and drm_vblank_post_modeset (which disables the IRQ, which also records the hardware counter) after enabling the CRTC.
Shouldn't this be fixable, though, by adding appropriate logic in drm_vblank_pre_modeset and drm_vblank_post_modeset?
Would this be a lot simpler if drm had a function (or call to the driver) to compute both the vblank count and the last vblank timestamp? That way the irq handler, drm_update_vblank_count, vblank_disable_and_save, and pre/post modeset could keep everything exactly in sync. IIRC i915 has this capability in hardware, and if Radeon indeed updates the frame count exactly at the end of the vblank interval, it could compute this exactly as well.
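[A rough sketch of what such a combined hook could look like; hypothetical, nothing of the sort exists in drm today:]

/* Driver returns a consistent (count, timestamp) pair, normalized to
 * "counter increments at leading edge of vblank". */
struct drm_vblank_state {
	u32 count;		/* vblank count at leading edge */
	struct timeval time;	/* timestamp of that leading edge */
};

/* Would be called from the irq handler, drm_update_vblank_count(),
 * vblank_disable_and_save() and the pre/post-modeset paths, keeping
 * all of them exactly in sync. */
int (*get_vblank_state)(struct drm_device *dev, int crtc,
			struct drm_vblank_state *state);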
--Andy
Jesse, cc'ing you in case you have thoughts about this for the intel side of things?
On Nov 17, 2011, at 4:39 PM, Andrew Lutomirski wrote:
2011/11/17 Michel Dänzer michel@daenzer.net:
[ Dropping intel-gfx list from CC, as it automatically rejects posts from non-subscribers ]
On Wed, 2011-11-16 at 21:39 -0800, Andy Lutomirski wrote:
There are two possible races when disabling vblanks. If the IRQ fired but the hardware didn't update its counter yet, then we store too low a hardware counter. (Sensible hardware never does this. Apparently not all hardware is sensible.)
The thing is, 'the' IRQ and 'the' hardware counter aren't necessarily about the same thing. We want an IRQ which triggers at the beginning of vertical blank, but the Radeon hardware counters increment at the beginning of scanout, i.e. at the end of vertical blank. Does that make the hardware 'broken' or 'not sensible'?
Perhaps not. I think that using that counter as the return value of get_vblank_counter is a bit unfortunate in that case, though. Anyway, I'm not particularly attached to the comment as written.
Am I correct in guessing that Radeon fires its vblank irq at the beginning of vblank instead of at the end? And doesn't this mean that Radeon has a separate issue when vblank_get happens during the vblank interval? A trick similar to drm_calc_vbltimestamp_from_scanoutpos ought to be able to fix that.
The Radeons i tested (R500, R600) fire the vblank irq even a few scanlines before the start of vblank. I think most gpu's fire very close to the beginning of vblank, because one wants to use the vblank irq to trigger some work that needs to be done within the vblank interval. Some Radeons sometimes fire a spurious vblank irq at the moment vblank irq's get enabled, although that could also be some tiny bug in the kms driver, maybe missing some acknowledge of a pending irq somewhere in the vblank off or on function. That's the reason for some of the fun inside drm_calc_vbltimestamp_from_scanoutpos() and drm_handle_vblank(), to associate a vblank irq and the vblank timestamps with the correct vblank interval, even if the irq handler executes a bit outside the vblank interval.
If, on the other hand, the counter updated but the IRQ didn't fire yet, we store too high a counter.
We handled the former case with a heuristic based on timestamps and we did not handle the latter case. By saving a little more state, we can handle both cases exactly: all we need to do is watch for changes in the difference between the hardware and software vblank counts.
I'm afraid that can't work:
Some (AFAIR also Intel) hardware resets the counter to 0 when the CRTC is disabled / enabled (e.g. for DPMS, or a modeset). That's why we ended up only counting interrupts while the IRQ is enabled, and only using the hardware counter to fill in while the IRQ is disabled. The hardware counter cannot be assumed to have any defined behaviour between enabling and disabling the IRQ.
To compensate for this, the drivers call drm_vblank_pre_modeset (which enables the IRQ, which also updates the virtual counter from the hardware counter) before disabling the CRTC and drm_vblank_post_modeset (which disables the IRQ, which also records the hardware counter) after enabling the CRTC.
Shouldn't this be fixable, though, by adding appropriate logic in drm_vblank_pre_modeset and drm_vblank_post_modeset?
Would this be a lot simpler if drm had a function (or call to the driver) to compute both the vblank count and the last vblank timestamp? That way the irq handler, drm_update_vblank_count, vblank_disable_and_save, and pre/post modeset could keep everything exactly in sync. IIRC i915 has this capability in hardware, and if Radeon indeed updates the frame count exactly at the end of the vblank interval, it could compute this exactly as well.
I think Andrew's idea is very close to what i proposed in Matthew's RFC e-mail thread, if we use the scanout position as a "clock", except we could get rid of the calibration loop, the part i like least about my proposal, if we know for sure when each gpu increments its hardware counter.
At least intel, radeon and nouveau (hopefully soonish - patches almost ready for submission) expose the dev->driver->get_scanout_position() hook, so we know the scanline at time of vblank counter query. If we knew at which scanline each gpu increments its counter we could simply compare against scanline at counter query time and decide if we need to add 1 to the value or not. We could make the hardware counter value look as if it is from a hardware counter that always increments at leading edge of vblank, which i think would be consistent with what the irq driven vblank increment does.
Pseudocode (with bonus goto included!):
// To be called from vblank irq on/off routines instead of
// driver->get_vblank_count(crtc):
uint32 get_effective_hw_vblank_count(crtc)
{
retry:
	uint32 scanline_pre = driver->get_scanout_position(crtc);
	uint32 hw_count = driver->get_vblank_count(crtc);
	uint32 scanline_post = driver->get_scanout_position(crtc);

	if ((scanline_pre outside vblank) && (scanline_post outside vblank)) {
		// Inside active scanout -> all counters stable -> no problem:
		return hw_count;
	}

	// Inside vblank -> careful!
	if (scanline_where_gpu_increments_its_counter intersects
	    interval [scanline_pre; scanline_post]) {
		// Exact time of get_vblank_count() query "uncertain"
		// wrt. hw counter increment.
		cpu_relax(); // or something similar...
		goto retry;
	}

	// Inside vblank. Query happened before or after hw increment?
	if (scanline_post < scanline_where_gpu_increments_its_counter) {
		// Query happened before hw counter increment for sure. Fake
		// that the increment has already happened:
		hw_count++;
	}

	return hw_count;
}
Something like this? The worst case busy waiting time would be maybe 2 scanline durations (< 50 microseconds) in the very rare case where we hit exactly the scanline where the hw counter increments. That amount of busy waiting sounds acceptable to me, given that we'd still keep a vblank off delay of a few milliseconds to avoid many redundant on/off transitions during a typical OpenGL bufferswap and that this never gets directly called from interrupt context.
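[For scale: a 1080p mode has on the order of 1125 total scanlines per frame, so at 60 Hz one scanline lasts about 16.7 ms / 1125, roughly 15 microseconds; two scanlines come to about 30 microseconds, consistent with the < 50 microseconds bound above.]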
We could put this into the drm, then the drivers would need to tell the drm the scanline where increment happens, or we could replace the driver->get_vblank_count() hook in each driver with some variant of this? Drivers with such a thing could lower the vblank off delay to a few msecs, other drivers would leave it at a high value.
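[For concreteness, a compilable rendering of the pseudocode above, assuming simplified forms of the driver hooks and two hypothetical per-crtc fields (the first scanline of vblank and the scanline at which the hw counter increments); scanline wraparound across the frame boundary is glossed over:]

static u32 drm_effective_hw_vblank_count(struct drm_device *dev, int crtc)
{
	u32 vbl_start = dev->vblank_start[crtc];	/* hypothetical field */
	u32 inc_line = dev->hw_count_inc_line[crtc];	/* hypothetical field */
	u32 pre, post, hw_count;

	for (;;) {
		pre = dev->driver->get_scanout_position(dev, crtc);
		hw_count = dev->driver->get_vblank_counter(dev, crtc);
		post = dev->driver->get_scanout_position(dev, crtc);

		/* Inside active scanout: every counter is stable. */
		if (pre < vbl_start && post < vbl_start)
			return hw_count;

		/* The query may have straddled the hw increment: retry. */
		if (pre <= inc_line && inc_line <= post) {
			cpu_relax();
			continue;
		}

		/* Inside vblank but clear of the increment point. If the
		 * query ran before the hw increment, fake it. */
		if (post < inc_line)
			hw_count++;

		return hw_count;
	}
}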
Thoughts? -mario
On Thu, Nov 17, 2011 at 2:54 PM, Mario Kleiner mario.kleiner@tuebingen.mpg.de wrote:
Jesse, cc'ing you in case you have thoughts about this for the intel side of things?
On Nov 17, 2011, at 4:39 PM, Andrew Lutomirski wrote:
2011/11/17 Michel Dänzer michel@daenzer.net:
[ Dropping intel-gfx list from CC, as it automatically rejects posts from non-subscribers ]
On Wed, 2011-11-16 at 21:39 -0800, Andy Lutomirski wrote:
There are two possible races when disabling vblanks. If the IRQ fired but the hardware didn't update its counter yet, then we store too low a hardware counter. (Sensible hardware never does this. Apparently not all hardware is sensible.)
The thing is, 'the' IRQ and 'the' hardware counter aren't necessarily about the same thing. We want an IRQ which triggers at the beginning of vertical blank, but the Radeon hardware counters increment at the beginning of scanout, i.e. at the end of vertical blank. Does that make the hardware 'broken' or 'not sensible'?
Perhaps not. I think that using that counter as the return value of get_vblank_counter is a bit unfortunate in that case, though. Anyway, I'm not particularly attached to the comment as written.
Am I correct in guessing that Radeon fires its vblank irq at the beginning of vblank instead of at the end? And doesn't this mean that Radeon has a separate issue when vblank_get happens during the vblank interval? A trick similar to drm_calc_vbltimestamp_from_scanoutpos ought to be able to fix that.
The Radeons I tested (R500, R600) fire the vblank irq even a few scanlines before the start of vblank. I think most gpus fire very close to the beginning of vblank, because one wants to use the vblank irq to trigger some work that needs to be done within the vblank interval. Some Radeons sometimes fire a spurious vblank irq at the moment vblank irqs get enabled, although that could also be some tiny bug in the kms driver, maybe a missing acknowledge of a pending irq somewhere in the vblank off or on function. That's the reason for some of the fun inside drm_calc_vbltimestamp_from_scanoutpos() and drm_handle_vblank(): to associate a vblank irq and the vblank timestamps with the correct vblank interval, even if the irq handler executes a bit outside the vblank interval.
If, on the other hand, the counter updated but the IRQ didn't fire yet, we store too high a counter.
We handled the former case with a heuristic based on timestamps and we did not handle the latter case. By saving a little more state, we can handle both cases exactly: all we need to do is watch for changes in the difference between the hardware and software vblank counts.
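As a rough sketch of the bookkeeping being described here (the saved_diff field is invented for illustration; this is not the actual patch):

	/* At irq disable time: latch the current hw/sw counter difference. */
	static void save_hw_sw_diff(struct drm_device *dev, int crtc)
	{
		u32 hw = dev->driver->get_vblank_count(dev, crtc);

		dev->saved_diff[crtc] = hw - drm_vblank_count(dev, crtc);
	}

	/* At irq enable time: any change in that difference is exactly the
	 * number of vblanks the software counter missed while irqs were off. */
	static void correct_sw_count(struct drm_device *dev, int crtc)
	{
		u32 hw = dev->driver->get_vblank_count(dev, crtc);
		u32 diff = hw - drm_vblank_count(dev, crtc);

		atomic_add(diff - dev->saved_diff[crtc], &dev->_vblank_count[crtc]);
	}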
I'm afraid that can't work:
Some (AFAIR also Intel) hardware resets the counter to 0 when the CRTC is disabled / enabled (e.g. for DPMS, or a modeset). That's why we ended up only counting interrupts while the IRQ is enabled, and only using the hardware counter to fill in while the IRQ is disabled. The hardware counter cannot be assumed to have any defined behaviour between enabling and disabling the IRQ.
To compensate for this, the drivers call drm_vblank_pre_modeset (which enables the IRQ, which also updates the virtual counter from the hardware counter) before disabling the CRTC and drm_vblank_post_modeset (which disables the IRQ, which also records the hardware counter) after enabling the CRTC.
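For illustration, that calling convention as it appears in a kms driver's dpms hook; drm_vblank_pre_modeset()/drm_vblank_post_modeset() are the real drm functions, while the example_ wrapper and pipe lookup are made up:

	static void example_crtc_dpms(struct drm_crtc *crtc, int mode)
	{
		struct drm_device *dev = crtc->dev;
		int pipe = example_crtc_pipe(crtc); /* hypothetical helper */

		switch (mode) {
		case DRM_MODE_DPMS_ON:
			/* ... enable the crtc ... */
			/* The hardware counter is counting again: record it
			 * and drop the irq reference taken in pre_modeset. */
			drm_vblank_post_modeset(dev, pipe);
			break;
		case DRM_MODE_DPMS_OFF:
			/* Latch the counters while the hardware counter is
			 * still valid, before it resets with the crtc. */
			drm_vblank_pre_modeset(dev, pipe);
			/* ... disable the crtc ... */
			break;
		}
	}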
Shouldn't this be fixable, though, by adding appropriate logic in drm_vblank_pre_modeset and drm_vblank_post_modeset?
Would this be a lot simpler if drm had a function (or call to the driver) to compute both the vblank count and the last vblank timestamp? That way the irq handler, drm_update_vblank_count, vblank_disable_and_save, and pre/post modeset could keep everything exactly in sync. IIRC i915 has this capability in hardware, and if Radeon indeed updates the frame count exactly at the end of the vblank interval, it could compute this exactly as well.
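Something like this hypothetical combined hook, perhaps (not an existing drm interface, just the shape of the idea):

	struct drm_vblank_sample {
		u32 count;		/* hw count, normalized to vblank start */
		struct timeval tvbl;	/* timestamp of that vblank's start */
	};

	/* Would sit in struct drm_driver next to get_vblank_count(); one
	 * call returns a consistent pair, so the irq handler, the
	 * disable-and-save path and pre/post modeset can never mix samples
	 * taken at different times: */
	int (*get_vblank_sample)(struct drm_device *dev, int crtc,
				 struct drm_vblank_sample *sample);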
I think Andrew's idea is very close to what I proposed in Matthew's RFC e-mail thread, if we use the scanout position as a "clock", except that we could get rid of the calibration loop (the part I like least about my proposal) if we know for sure when each gpu increments its hardware counter.
At least intel, radeon and nouveau (hopefully soonish - patches almost ready for submission) expose the dev->driver->get_scanout_position() hook, so we know the scanline at the time of the vblank counter query. If we knew at which scanline each gpu increments its counter, we could simply compare against the scanline at counter query time and decide whether we need to add 1 to the value or not. We could make the hardware counter value look as if it came from a hardware counter that always increments at the leading edge of vblank, which I think would be consistent with what the irq-driven vblank increment does.
Pseudocode (with bonus goto included!):
// To be called from the vblank irq on/off routines instead of
// driver->get_vblank_count(crtc):
uint32 get_effective_hw_vblank_count(crtc) {
retry:
	uint32 scanline_pre = driver->get_scanout_position(crtc);
	uint32 hw_count = driver->get_vblank_count(crtc);
	uint32 scanline_post = driver->get_scanout_position(crtc);

	if ((scanline_pre outside vblank) && (scanline_post outside vblank)) {
		// Inside active scanout -> all counters stable -> no problem:
		return hw_count;
	}

	// Inside vblank -> careful!
	if (scanline_where_gpu_increments_its_counter intersects
	    interval [scanline_pre; scanline_post]) {
		// Exact time of the get_vblank_count() query is "uncertain"
		// wrt. the hw counter increment.
		cpu_relax(); // or something similar...
		goto retry;
	}

	// Inside vblank. Did the query happen before or after the hw increment?
	if (scanline_post < scanline_where_gpu_increments_its_counter) {
		// The query happened before the hw counter increment for sure.
		// Pretend the increment has already happened:
		hw_count++;
	}

	return hw_count;
}
Something like this? The worst case busy waiting time would be maybe 2 scanline durations (< 50 microseconds) in the very rare case where we hit exactly the scanline where the hw counter increments. That amount of busy waiting sounds acceptable to me, given that we'd still keep a vblank off delay of a few milliseconds to avoid many redundant on/off transitions during a typical OpenGL bufferswap, and that this never gets called directly from interrupt context.
We could put this into the drm core; the drivers would then need to tell the drm the scanline at which the increment happens. Or we could replace the driver->get_vblank_count() hook in each driver with some variant of this? Drivers with such a thing could lower the vblank off delay to a few msecs; other drivers would leave it at a high value.
I have some partly-written code to query the vblank count and scanout position together (it'll be more efficient that way, I think, as well as simpler). I'll see how it works tonight on i915.
For added fun, can we get notified when the system is about to go idle? A decent heuristic for when to turn off vblanks might be if the system goes idle with refcount == 0 or if a vblank happens that no one cares about.
--Andy
On Nov 17, 2011, at 3:19 AM, Matthew Garrett wrote:
On Thu, Nov 17, 2011 at 01:26:37AM +0100, Mario Kleiner wrote:
On Nov 16, 2011, at 7:48 PM, Matthew Garrett wrote:
I'll admit that I'm struggling to understand the issue here. If the vblank counter is incremented at the time of vblank (which isn't the case for radeon, it seems, but as far as I can tell is the case for Intel) then how does ping-ponging the IRQ matter? vblank_disable_and_save() appears to handle this case.
The drm software vblank counter, which is used for scheduling swaps etc., gets incremented in the gpu's vblank irq handler. The gpu's hardware counter gets incremented somewhere inside the vblank interval. The increments don't happen at the same point in time. The race is between the vblank on/off code - which gets scheduled either by the timeout timer for the "off" case, or by usercode for the "on" case - and the gpu's hardware vblank counter.
Yes, that makes sense given the current design.
The existing code uses a lock (vblank_time_lock) to prevent some races between itself and the vblank irq handler, but it can't "lock" the gpu, so if the enable/disable code executes while the gpu is scanning out the vblank interval, it is undefined if the final vblank count will be correct or off by one. Vblank duration is somewhere up to 4.5% of refresh interval duration, so there's your up to 4% chance of getting it wrong.
Well presumably by "undefined" we really mean "hardware-specific" - for any given well-functioning hardware I'd expect the answer to be well-defined. Like I said, with Radeon we have the option of triggering an interrupt at the point where the hardware counter is actually incremented. If that then shuts down the vblank interrupt then we should have a well-defined answer.
Yes, "hardware-specific". It is only "undefined" from the point of view of the drm core in the sense that the code in the core is currently agnostic to what the underlying gpu/driver does.
...
Use the crtc scanout position queries together with the vblank counter queries inside some calibration loop, maybe executed after each modeset, to find out the scanline range in which the hardware vblank counter increments -- basically a forbidden range of scanline positions where the race would happen. Then at each vblank off/on, query the scanout position before and after the hw vblank counter query. If, according to the scanout positions, the vblank counter query happened within the forbidden window, retry the query. With a well-working calibration that should add no delay in most cases, and worst case a delay of a few dozen microseconds (= the duration of a few scanlines) to the on/off code.
Assuming we're sleeping rather than busy-looping, that's certainly ok. My previous experiments with radeon indicated that the scanout irq was certainly not entirely reliable - on the other hand, I was trying to use it for completing memory reclocking within the vblank interval. It was typically still within a few scanlines, so a sanity check there wouldn't pose too much of a problem.
Sleeping in the timer-triggered off path would be ok, but in the on path we need to be relatively fast, so I think some kind of cpu-friendly busy waiting (cpu_relax() or something like that, if I understand its purpose?) would probably be necessary.
I've no problem with all of this work being case by case.
Me neither. I'm just saying that if you went to the extreme and disabled vblank irqs immediately, with zero delay, without reliably fixing the problem I mentioned, you'd get those off-by-one counts, which would be very bad for apps that rely on precise timing. Doing so would basically make the whole oml_sync_control implementation mostly useless.
Right. My testing of sandybridge suggests that there wasn't a problem here - even with the ping-ponging I was reliably getting 60 interrupts in 60 seconds, with the counter incrementing by 1 each time. I certainly wouldn't push to enable it elsewhere without making sure that the results are reliable.
How hard did you test it? Given that the off-by-one is a race condition, there is a good chance you missed the bad cases due to lucky timing. I think one way to test the ping-pong case might be a tight loop with calls to glXGetSyncValuesOML(), or at a lower level a tight loop around the drmWaitVblank() ioctl, which is what glXGetSyncValuesOML() does.
That would be a sequence of drm_vblank_get() -> irqs on, reinitialize vblank counter and timestamps -> fetch values -> drm_vblank_put() -> irqs off, etc. If you run this with a couple of thousand calls per second there would be some coverage of the vblank period. Either that, or some funkiness because we wouldn't exercise the vblank irq anymore for vblank counting and might miss something there for some reason...
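A rough sketch of such a test in userspace (error handling omitted; the device node and iteration count are arbitrary assumptions):

	#include <fcntl.h>
	#include <stdio.h>
	#include <xf86drm.h>

	int main(void)
	{
		int fd = open("/dev/dri/card0", O_RDWR);
		unsigned int prev = 0;
		unsigned int i;

		for (i = 0; i < 1000000; i++) {
			drmVBlank vbl;

			vbl.request.type = DRM_VBLANK_RELATIVE;
			vbl.request.sequence = 0;	/* query only, don't wait */
			drmWaitVBlank(fd, &vbl);

			/* Each call bumps and drops the vblank refcount, so
			 * enable/disable transitions land all over the scanout
			 * cycle. The count must never move backwards: */
			if (vbl.reply.sequence < prev)
				printf("counter went backwards: %u -> %u\n",
				       prev, vbl.reply.sequence);
			prev = vbl.reply.sequence;
		}
		return 0;
	}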
If vblanks are disabled and then re-enabled between 1 and 3, what's the negative outcome?
1. glXGetSyncValuesOML() gives you the current vblank count and a timestamp of when exactly scanout of the corresponding video frame started. These values are the baseline for any vblank-based swap scheduling or any other vblank-synchronized actions, e.g., via glXWaitForMscOML().

2. The app calculates stuff based on 1., but gets preempted or experiences scheduling delay on return from 1. - or the x-server gets preempted or scheduled away. Whatever, time passes...

3. glXSwapBuffersMscOML() or glXWaitForMscOML() is used to schedule something to occur at a target vblank count, based on the results of steps 1 and 2.
As long as vblanks don't get disabled between 1 and 3, everything is consistent. Scheduling delays/preemption of more than a few (dozen) milliseconds worst case are very rare, so a lowered vblank off delay of even one or two refresh cycles would solve the problem. I assume that most toolkits with needs for timing precision would follow a three-step strategy like this one.
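For reference, the three steps above in code, using GLX_OML_sync_control (in real code the entry points would be resolved via glXGetProcAddress(); direct calls are used here for brevity):

	#include <GL/glx.h>
	#include <stdint.h>

	static void swap_n_frames_ahead(Display *dpy, GLXDrawable draw,
					int64_t n)
	{
		int64_t ust, msc, sbc, target_msc;

		/* 1. Baseline: current vblank count and the timestamp of the
		 *    corresponding scanout start. */
		glXGetSyncValuesOML(dpy, draw, &ust, &msc, &sbc);

		/* 2. Compute the target count. Arbitrary time may pass here:
		 *    preemption, scheduling delay, a busy x-server... */
		target_msc = msc + n;

		/* 3. Schedule the swap for that absolute vblank count. This
		 *    is only correct if the count stayed consistent across
		 *    any vblank irq off/on transition in between. */
		glXSwapBuffersMscOML(dpy, draw, target_msc, 0, 0);
	}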
If vblank irqs get disabled immediately after step 1 and re-enabled at step 3, and this triggers an off-by-one error due to a race, then the calculated target vblank count from steps 1/2 is invalid and useless for step 3, and many vision scientists who have only recently made friends with Linux will make very sad faces. Due to the nature of many of the stimulation paradigms used, there is also a high chance that many of the enables and disables would happen inside or very close to the problematic vblank interval, where the off-by-one error is most likely.
Ok, so as long as we believe that we're reliably reading the hardware counter and not getting off by one, we should be ok? I just want to be clear that this is dependent on hardware behaviour rather than being absolutely inherent :)
Yes. The way we find the final/new software/hardware vblank count at irq off/on time must be consistent with what would have happened if the counters had been incremented by "just another vblank irq".
But apart from this, I would assume that even if we can remove all races, it would probably still make sense to always have some small vblank off delay, even if it is only half a video refresh cycle? During execution of a bufferswap, or a three-step procedure like my toolkit does, or a desktop composition pass (at least the way compiz does it, didn't check other compositors), we will usually have multiple drm_vblank_get() -> drm_vblank_put() invocations in quick succession, triggered by the ddx inside the x-server and the drm code during swap preparation and swap completion. Without a delay of at least a few milliseconds, the code would turn the irqs on and off multiple times within a few milliseconds, each time involving locking, transactions with the gpu, some memory barriers, etc. As long as the desktop is busy with some OpenGL animations, vblank irqs will necessarily fire for each refresh cycle regardless of what the timeout is. And for a small timeout of a few milliseconds, when the desktop finally goes idle, you'd probably end up with at most one extra vblank irq, but save a little bit of cpu time for multiple on/off transitions for each frame during the busy period.
Out of pure interest, how much power does vblank off actually save, compared to waking up the cpu every 16 msecs?
I agree with your patches, I just don't want vblank irqs to be disabled immediately with a zero timeout before the remaining races are fixed; that would surely break things at least for scientific applications. And I'd like to keep some optional parameter that allows desperate users to disable the vblank off mechanism in case they encounter a bug.
Yeah, no problem. I understand that completely.
Great, thanks. -mario
-- Matthew Garrett | mjg59@srcf.ucam.org
On Thu, Nov 17, 2011 at 09:36:23PM +0100, Mario Kleiner wrote:
On Nov 17, 2011, at 3:19 AM, Matthew Garrett wrote:
Assuming we're sleeping rather than busy-looping, that's certainly ok. My previous experiments with radeon indicated that the scanout irq was certainly not entirely reliable - on the other hand, I was trying to use it for completing memory reclocking within the vblank interval. It was typically still within a few scanlines, so a sanity check there wouldn't pose too much of a problem.
Sleeping in the timer-triggered off path would be ok, but in the on path we need to be relatively fast, so I think some kind of cpu-friendly busy waiting (cpu_relax() or something like that, if I understand its purpose?) would probably be necessary.
The aim is to reduce the wakeup count and spend more time in deep C states. Busy waiting defeats that.
Right. My testing of sandybridge suggests that there wasn't a problem here - even with the ping-ponging I was reliably getting 60 interrupts in 60 seconds, with the counter incrementing by 1 each time. I certainly wouldn't push to enable it elsewhere without making sure that the results are reliable.
How hard did you test it? Given that the off-by-one is a race condition, there is a good chance you missed the bad cases due to lucky timing. I think one way to test the ping-pong case might be a tight loop with calls to glXGetSyncValuesOML(), or at a lower level a tight loop around the drmWaitVblank() ioctl, which is what glXGetSyncValuesOML() does.
Enable vblank, wait for vblank, immediately disable vblank, read counter, wait 5msec, repeat. Verify that I get 60 vblanks per second and that the counter incremented by 60. Tested with various delayoff values, which should have had the effect of making it likely that the disable would fall at various points throughout the scanout. It's not definitive, so I'm happy to do other tests.
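That procedure, sketched (this assumes the vblankoffdelay module parameter has been lowered so the disable timer can fire within the 5 msec pause; the device node is an assumption):

	#include <fcntl.h>
	#include <unistd.h>
	#include <xf86drm.h>

	int main(void)
	{
		int fd = open("/dev/dri/card0", O_RDWR);
		unsigned int i;

		for (i = 0; i < 3600; i++) {	/* ~60 seconds at 60 Hz */
			drmVBlank vbl;

			vbl.request.type = DRM_VBLANK_RELATIVE;
			vbl.request.sequence = 1;	/* enable irq, wait one vblank */
			drmWaitVBlank(fd, &vbl);
			usleep(5000);	/* let the off timer disable the irq */
		}
		return 0;
	}

Counting the iterations per second and checking that vbl.reply.sequence advances by exactly one per loop reproduces the "60 interrupts in 60 seconds, counter +1 each time" check.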
Ok, so as long as we believe that we're reliably reading the hardware counter and not getting off by one, we should be ok? I just want to be clear that this is dependent on hardware behaviour rather than being absolutely inherent :)
Yes. The way we find the final/new software/hardware vblank count at irq off/on time must be consistent with what would have happened if the counters had been incremented by "just another vblank irq".
Ok.
But apart from this, I would assume that even if we can remove all races, it would probably still make sense to always have some small vblank off delay, even if it is only half a video refresh cycle? During execution of a bufferswap, or a three-step procedure like my toolkit does, or a desktop composition pass (at least the way compiz does it, didn't check other compositors), we will usually have multiple drm_vblank_get() -> drm_vblank_put() invocations in quick succession, triggered by the ddx inside the x-server and the drm code during swap preparation and swap completion. Without a delay of at least a few milliseconds, the code would turn the irqs on and off multiple times within a few milliseconds, each time involving locking, transactions with the gpu, some memory barriers, etc. As long as the desktop is busy with some OpenGL animations, vblank irqs will necessarily fire for each refresh cycle regardless of what the timeout is. And for a small timeout of a few milliseconds, when the desktop finally goes idle, you'd probably end up with at most one extra vblank irq, but save a little bit of cpu time for multiple on/off transitions for each frame during the busy period.
As long as the delay is shorter than a frame then yes, it ought to be fine.
Out of pure interest, how much power does vblank off actually save, compared to waking up the cpu every 16 msecs?
It depends on the state of the rest of the system. If you're under load, basically nothing. If you're otherwise idle, around a Watt or so.
Anyone have any further thoughts on this?