Hi Dave,
this is v2 of the patch set for improving/restoring accuracy and robustness of vblank timestamping and for fixing incompatibilities with the PREEMPT_RT patches.
Could you please merge this for the next kernel? Would be good to have the old accuracy restored as soon as possible. Thanks.
v2: Added the reviewed-by's of Ville and Alex, thanks for the review! Fixed multi-line code formatting as suggested by Ville.
Successfully tested on Intel and AMD Radeon hardware.
thanks, -mario
Preemption handling will get pushed into the kms drivers in followup patches, to make timestamping more robust and PREEMPT_RT friendly.
Signed-off-by: Mario Kleiner mario.kleiner.de@gmail.com Reviewed-by: Ville Syrjälä ville.syrjala@linux.intel.com Reviewed-by: Alex Deucher alexander.deucher@amd.com --- drivers/gpu/drm/drm_irq.c | 7 ------- 1 file changed, 7 deletions(-)
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index f9af048..33ee515 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -586,11 +586,6 @@ int drm_calc_vbltimestamp_from_scanoutpos(struct drm_device *dev, int crtc, * code gets preempted or delayed for some reason. */ for (i = 0; i < DRM_TIMESTAMP_MAXRETRIES; i++) { - /* Disable preemption to make it very likely to - * succeed in the first iteration even on PREEMPT_RT kernel. - */ - preempt_disable(); - /* Get system timestamp before query. */ stime = ktime_get();
@@ -602,8 +597,6 @@ int drm_calc_vbltimestamp_from_scanoutpos(struct drm_device *dev, int crtc, if (!drm_timestamp_monotonic) mono_time_offset = ktime_get_monotonic_offset();
- preempt_enable(); - /* Return as no-op if scanout query unsupported or failed. */ if (!(vbl_status & DRM_SCANOUTPOS_VALID)) { DRM_DEBUG("crtc %d : scanoutpos query failed [%d].\n",
A change in locking of some kms drivers (currently intel-kms) make the old approach too inaccurate and also incompatible with the PREEMPT_RT realtime kernel patchset.
The driver->get_scanout_position() method of intel-kms now needs to aquire a spinlock, which clashes badly with the former preempt_disable() calls in the drm, and it also introduces larger delays and timing uncertainty on a contended lock than acceptable.
This patch changes the prototype of driver->get_scanout_position() to require/allow kms drivers to perform the ktime_get() system time queries which go along with actual scanout position readout in a way that provides maximum precision and to return those timestamps to the drm. kms drivers implementations of get_scanout_position() are asked to implement timestamping and scanoutpos readout in a way that is as precise as possible and compatible with preempt_disable() on a PREMPT_RT kernel. A driver should follow this pattern in get_scanout_position() for precision and compatibility:
spin_lock...(...); preempt_disable_rt(); // On a PREEMPT_RT kernel, otherwise omit. if (stime) *stime = ktime_get(); ... Minimum amount of MMIO register reads to get scanout position ... ... no taking of locks allowed here! ... if (etime) *etime = ktime_get(); preempt_enable_rt(); // On PREEMPT_RT kernel, otherwise omit. spin_unlock...(...);
v2: Fix formatting of new multi-line code comments.
Signed-off-by: Mario Kleiner mario.kleiner.de@gmail.com Reviewed-by: Ville Syrjälä ville.syrjala@linux.intel.com Reviewed-by: Alex Deucher alexander.deucher@amd.com --- drivers/gpu/drm/drm_irq.c | 20 ++++++++++++-------- include/drm/drmP.h | 10 ++++++++-- 2 files changed, 20 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index 33ee515..d80d952 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -219,7 +219,7 @@ int drm_vblank_init(struct drm_device *dev, int num_crtcs) for (i = 0; i < num_crtcs; i++) init_waitqueue_head(&dev->vblank[i].queue);
- DRM_INFO("Supports vblank timestamp caching Rev 1 (10.10.2010).\n"); + DRM_INFO("Supports vblank timestamp caching Rev 2 (21.10.2013).\n");
/* Driver specific high-precision vblank timestamping supported? */ if (dev->driver->get_vblank_timestamp) @@ -586,14 +586,17 @@ int drm_calc_vbltimestamp_from_scanoutpos(struct drm_device *dev, int crtc, * code gets preempted or delayed for some reason. */ for (i = 0; i < DRM_TIMESTAMP_MAXRETRIES; i++) { - /* Get system timestamp before query. */ - stime = ktime_get(); - - /* Get vertical and horizontal scanout pos. vpos, hpos. */ - vbl_status = dev->driver->get_scanout_position(dev, crtc, &vpos, &hpos); + /* + * Get vertical and horizontal scanout position vpos, hpos, + * and bounding timestamps stime, etime, pre/post query. + */ + vbl_status = dev->driver->get_scanout_position(dev, crtc, &vpos, + &hpos, &stime, &etime);
- /* Get system timestamp after query. */ - etime = ktime_get(); + /* + * Get correction for CLOCK_MONOTONIC -> CLOCK_REALTIME if + * CLOCK_REALTIME is requested. + */ if (!drm_timestamp_monotonic) mono_time_offset = ktime_get_monotonic_offset();
@@ -604,6 +607,7 @@ int drm_calc_vbltimestamp_from_scanoutpos(struct drm_device *dev, int crtc, return -EIO; }
+ /* Compute uncertainty in timestamp of scanout position query. */ duration_ns = ktime_to_ns(etime) - ktime_to_ns(stime);
/* Accept result with < max_error nsecs timing uncertainty. */ diff --git a/include/drm/drmP.h b/include/drm/drmP.h index 2b954ad..48d15f0 100644 --- a/include/drm/drmP.h +++ b/include/drm/drmP.h @@ -835,12 +835,17 @@ struct drm_driver { /** * Called by vblank timestamping code. * - * Return the current display scanout position from a crtc. + * Return the current display scanout position from a crtc, and an + * optional accurate ktime_get timestamp of when position was measured. * * \param dev DRM device. * \param crtc Id of the crtc to query. * \param *vpos Target location for current vertical scanout position. * \param *hpos Target location for current horizontal scanout position. + * \param *stime Target location for timestamp taken immediately before + * scanout position query. Can be NULL to skip timestamp. + * \param *etime Target location for timestamp taken immediately after + * scanout position query. Can be NULL to skip timestamp. * * Returns vpos as a positive number while in active scanout area. * Returns vpos as a negative number inside vblank, counting the number @@ -857,7 +862,8 @@ struct drm_driver { * */ int (*get_scanout_position) (struct drm_device *dev, int crtc, - int *vpos, int *hpos); + int *vpos, int *hpos, ktime_t *stime, + ktime_t *etime);
/** * Called by \c drm_get_last_vbltimestamp. Should return a precise
Move the ktime_get() clock readouts and potential preempt_disable() calls from drm core into kms driver to make it compatible with the api changes in the drm core.
This should not introduce any change in functionality or behaviour in radeon-kms, just a reshuffling of code.
Signed-off-by: Mario Kleiner mario.kleiner.de@gmail.com Reviewed-by: Ville Syrjälä ville.syrjala@linux.intel.com Reviewed-by: Alex Deucher alexander.deucher@amd.com --- drivers/gpu/drm/radeon/radeon_display.c | 24 +++++++++++++++++++++--- drivers/gpu/drm/radeon/radeon_drv.c | 3 ++- drivers/gpu/drm/radeon/radeon_mode.h | 3 ++- drivers/gpu/drm/radeon/radeon_pm.c | 2 +- 4 files changed, 26 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index 0d1aa05..ccd8751 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -306,7 +306,7 @@ void radeon_crtc_handle_flip(struct radeon_device *rdev, int crtc_id) */ if (update_pending && (DRM_SCANOUTPOS_VALID & radeon_get_crtc_scanoutpos(rdev->ddev, crtc_id, - &vpos, &hpos)) && + &vpos, &hpos, NULL, NULL)) && ((vpos >= (99 * rdev->mode_info.crtcs[crtc_id]->base.hwmode.crtc_vdisplay)/100) || (vpos < 0 && !ASIC_IS_AVIVO(rdev)))) { /* crtc didn't flip in this target vblank interval, @@ -1539,12 +1539,17 @@ bool radeon_crtc_scaling_mode_fixup(struct drm_crtc *crtc, }
/* - * Retrieve current video scanout position of crtc on a given gpu. + * Retrieve current video scanout position of crtc on a given gpu, and + * an optional accurate timestamp of when query happened. * * \param dev Device to query. * \param crtc Crtc to query. * \param *vpos Location where vertical scanout position should be stored. * \param *hpos Location where horizontal scanout position should go. + * \param *stime Target location for timestamp taken immediately before + * scanout position query. Can be NULL to skip timestamp. + * \param *etime Target location for timestamp taken immediately after + * scanout position query. Can be NULL to skip timestamp. * * Returns vpos as a positive number while in active scanout area. * Returns vpos as a negative number inside vblank, counting the number @@ -1560,7 +1565,8 @@ bool radeon_crtc_scaling_mode_fixup(struct drm_crtc *crtc, * unknown small number of scanlines wrt. real scanout position. * */ -int radeon_get_crtc_scanoutpos(struct drm_device *dev, int crtc, int *vpos, int *hpos) +int radeon_get_crtc_scanoutpos(struct drm_device *dev, int crtc, int *vpos, int *hpos, + ktime_t *stime, ktime_t *etime) { u32 stat_crtc = 0, vbl = 0, position = 0; int vbl_start, vbl_end, vtotal, ret = 0; @@ -1568,6 +1574,12 @@ int radeon_get_crtc_scanoutpos(struct drm_device *dev, int crtc, int *vpos, int
struct radeon_device *rdev = dev->dev_private;
+ /* preempt_disable_rt() should go right here in PREEMPT_RT patchset. */ + + /* Get optional system timestamp before query. */ + if (stime) + *stime = ktime_get(); + if (ASIC_IS_DCE4(rdev)) { if (crtc == 0) { vbl = RREG32(EVERGREEN_CRTC_V_BLANK_START_END + @@ -1650,6 +1662,12 @@ int radeon_get_crtc_scanoutpos(struct drm_device *dev, int crtc, int *vpos, int } }
+ /* Get optional system timestamp after query. */ + if (etime) + *etime = ktime_get(); + + /* preempt_enable_rt() should go right here in PREEMPT_RT patchset. */ + /* Decode into vertical and horizontal scanout position. */ *vpos = position & 0x1fff; *hpos = (position >> 16) & 0x1fff; diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c index 22f6858..101e7c0 100644 --- a/drivers/gpu/drm/radeon/radeon_drv.c +++ b/drivers/gpu/drm/radeon/radeon_drv.c @@ -106,7 +106,8 @@ int radeon_gem_object_open(struct drm_gem_object *obj, void radeon_gem_object_close(struct drm_gem_object *obj, struct drm_file *file_priv); extern int radeon_get_crtc_scanoutpos(struct drm_device *dev, int crtc, - int *vpos, int *hpos); + int *vpos, int *hpos, ktime_t *stime, + ktime_t *etime); extern const struct drm_ioctl_desc radeon_ioctls_kms[]; extern int radeon_max_kms_ioctl; int radeon_mmap(struct file *filp, struct vm_area_struct *vma); diff --git a/drivers/gpu/drm/radeon/radeon_mode.h b/drivers/gpu/drm/radeon/radeon_mode.h index ef63d3f..3bfa910 100644 --- a/drivers/gpu/drm/radeon/radeon_mode.h +++ b/drivers/gpu/drm/radeon/radeon_mode.h @@ -758,7 +758,8 @@ extern int radeon_crtc_cursor_move(struct drm_crtc *crtc, int x, int y);
extern int radeon_get_crtc_scanoutpos(struct drm_device *dev, int crtc, - int *vpos, int *hpos); + int *vpos, int *hpos, ktime_t *stime, + ktime_t *etime);
extern bool radeon_combios_check_hardcoded_edid(struct radeon_device *rdev); extern struct edid * diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c index ac07ad1..1bbeac9 100644 --- a/drivers/gpu/drm/radeon/radeon_pm.c +++ b/drivers/gpu/drm/radeon/radeon_pm.c @@ -1465,7 +1465,7 @@ static bool radeon_pm_in_vbl(struct radeon_device *rdev) */ for (crtc = 0; (crtc < rdev->num_crtc) && in_vbl; crtc++) { if (rdev->pm.active_crtcs & (1 << crtc)) { - vbl_status = radeon_get_crtc_scanoutpos(rdev->ddev, crtc, &vpos, &hpos); + vbl_status = radeon_get_crtc_scanoutpos(rdev->ddev, crtc, &vpos, &hpos, NULL, NULL); if ((vbl_status & DRM_SCANOUTPOS_VALID) && !(vbl_status & DRM_SCANOUTPOS_INVBL)) in_vbl = false;
Move the ktime_get() clock readouts and potential preempt_disable() calls from drm core into kms driver to make it compatible with the api changes in the drm core.
The intel-kms driver needs to take the uncore.lock inside i915_get_crtc_scanoutpos() and intel_pipe_in_vblank(). This is incompatible with the preempt_disable() on a PREEMPT_RT patched kernel, as regular spin locks must not be taken within a preempt_disable'd section. Lock contention on the uncore.lock also introduced too much uncertainty in vblank timestamps.
Push the ktime_get() timestamping for scanoutpos queries and potential preempt_disable_rt() into i915_get_crtc_scanoutpos(), so these problems can be avoided:
1. First lock the uncore.lock (might sleep on a PREEMPT_RT kernel). 2. preempt_disable_rt() (will be added by the rt-linux folks). 3. ktime_get() a timestamp before scanout pos query. 4. Do all mmio reads as fast as possible without grabbing any new locks! 5. ktime_get() a post-query timestamp. 6. preempt_enable_rt() 7. Unlock the uncore.lock.
This reduces timestamp uncertainty on a low-end HP Atom Mini netbook with Intel GMA-950 nicely:
Before: 3-8 usecs with spikes > 20 usecs, triggering query retries. After : Typically 1 usec (98% of all samples), occassionally 2 usecs (2% of all samples), with maximum of 3 usecs (a handful).
v2: Fix formatting of new multi-line code comments.
Signed-off-by: Mario Kleiner mario.kleiner.de@gmail.com Reviewed-by: Ville Syrjälä ville.syrjala@linux.intel.com Reviewed-by: Alex Deucher alexander.deucher@amd.com --- drivers/gpu/drm/i915/i915_irq.c | 54 +++++++++++++++++++++++++++++++-------- 1 file changed, 43 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 156a1a4..7cafe64 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -599,35 +599,40 @@ static u32 gm45_get_vblank_counter(struct drm_device *dev, int pipe) return I915_READ(reg); }
-static bool intel_pipe_in_vblank(struct drm_device *dev, enum pipe pipe) +/* raw reads, only for fast reads of display block, no need for forcewake etc. */ +#define __raw_i915_read32(dev_priv__, reg__) readl((dev_priv__)->regs + (reg__)) +#define __raw_i915_read16(dev_priv__, reg__) readw((dev_priv__)->regs + (reg__)) + +static bool intel_pipe_in_vblank_locked(struct drm_device *dev, enum pipe pipe) { struct drm_i915_private *dev_priv = dev->dev_private; uint32_t status; + int reg;
if (IS_VALLEYVIEW(dev)) { status = pipe == PIPE_A ? I915_DISPLAY_PIPE_A_VBLANK_INTERRUPT : I915_DISPLAY_PIPE_B_VBLANK_INTERRUPT;
- return I915_READ(VLV_ISR) & status; + reg = VLV_ISR; } else if (IS_GEN2(dev)) { status = pipe == PIPE_A ? I915_DISPLAY_PIPE_A_VBLANK_INTERRUPT : I915_DISPLAY_PIPE_B_VBLANK_INTERRUPT;
- return I915_READ16(ISR) & status; + reg = ISR; } else if (INTEL_INFO(dev)->gen < 5) { status = pipe == PIPE_A ? I915_DISPLAY_PIPE_A_VBLANK_INTERRUPT : I915_DISPLAY_PIPE_B_VBLANK_INTERRUPT;
- return I915_READ(ISR) & status; + reg = ISR; } else if (INTEL_INFO(dev)->gen < 7) { status = pipe == PIPE_A ? DE_PIPEA_VBLANK : DE_PIPEB_VBLANK;
- return I915_READ(DEISR) & status; + reg = DEISR; } else { switch (pipe) { default: @@ -642,12 +647,17 @@ static bool intel_pipe_in_vblank(struct drm_device *dev, enum pipe pipe) break; }
- return I915_READ(DEISR) & status; + reg = DEISR; } + + if (IS_GEN2(dev)) + return __raw_i915_read16(dev_priv, reg) & status; + else + return __raw_i915_read32(dev_priv, reg) & status; }
static int i915_get_crtc_scanoutpos(struct drm_device *dev, int pipe, - int *vpos, int *hpos) + int *vpos, int *hpos, ktime_t *stime, ktime_t *etime) { struct drm_i915_private *dev_priv = dev->dev_private; struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pipe]; @@ -657,6 +667,7 @@ static int i915_get_crtc_scanoutpos(struct drm_device *dev, int pipe, int vbl_start, vbl_end, htotal, vtotal; bool in_vbl = true; int ret = 0; + unsigned long irqflags;
if (!intel_crtc->active) { DRM_DEBUG_DRIVER("trying to get scanoutpos for disabled " @@ -671,14 +682,27 @@ static int i915_get_crtc_scanoutpos(struct drm_device *dev, int pipe,
ret |= DRM_SCANOUTPOS_VALID | DRM_SCANOUTPOS_ACCURATE;
+ /* + * Lock uncore.lock, as we will do multiple timing critical raw + * register reads, potentially with preemption disabled, so the + * following code must not block on uncore.lock. + */ + spin_lock_irqsave(&dev_priv->uncore.lock, irqflags); + + /* preempt_disable_rt() should go right here in PREEMPT_RT patchset. */ + + /* Get optional system timestamp before query. */ + if (stime) + *stime = ktime_get(); + if (IS_GEN2(dev) || IS_G4X(dev) || INTEL_INFO(dev)->gen >= 5) { /* No obvious pixelcount register. Only query vertical * scanout position from Display scan line register. */ if (IS_GEN2(dev)) - position = I915_READ(PIPEDSL(pipe)) & DSL_LINEMASK_GEN2; + position = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) & DSL_LINEMASK_GEN2; else - position = I915_READ(PIPEDSL(pipe)) & DSL_LINEMASK_GEN3; + position = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) & DSL_LINEMASK_GEN3;
/* * The scanline counter increments at the leading edge @@ -687,7 +711,7 @@ static int i915_get_crtc_scanoutpos(struct drm_device *dev, int pipe, * to get a more accurate picture whether we're in vblank * or not. */ - in_vbl = intel_pipe_in_vblank(dev, pipe); + in_vbl = intel_pipe_in_vblank_locked(dev, pipe); if ((in_vbl && position == vbl_start - 1) || (!in_vbl && position == vbl_end - 1)) position = (position + 1) % vtotal; @@ -696,7 +720,7 @@ static int i915_get_crtc_scanoutpos(struct drm_device *dev, int pipe, * We can split this into vertical and horizontal * scanout position. */ - position = (I915_READ(PIPEFRAMEPIXEL(pipe)) & PIPE_PIXEL_MASK) >> PIPE_PIXEL_SHIFT; + position = (__raw_i915_read32(dev_priv, PIPEFRAMEPIXEL(pipe)) & PIPE_PIXEL_MASK) >> PIPE_PIXEL_SHIFT;
/* convert to pixel counts */ vbl_start *= htotal; @@ -704,6 +728,14 @@ static int i915_get_crtc_scanoutpos(struct drm_device *dev, int pipe, vtotal *= htotal; }
+ /* Get optional system timestamp after query. */ + if (etime) + *etime = ktime_get(); + + /* preempt_enable_rt() should go right here in PREEMPT_RT patchset. */ + + spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags); + in_vbl = position >= vbl_start && position < vbl_end;
/*
dri-devel@lists.freedesktop.org