Without async flip support in the kernel, fullscreen apps where game resolution is equal to the screen resolution, must perform an extra blit per frame prior to flipping.
Asynchronous page flips will also boost the FPS of Mesa benchmarks.
v2: -Few patches have been squashed and patches have been shuffled as per the reviews on the previous version.
v3: -Few patches have been squashed and patches have been shuffled as per the reviews on the previous version.
v4: -Made changes to fix the sequence and time stamp issue as per the comments received on the previous version. -Timestamps are calculated using the flip done time stamp and current timestamp. Here I915_MODE_FLAG_GET_SCANLINE_FROM_TIMESTAMP flag is used for timestamp calculations. -Event is sent from the interrupt handler immediately using this updated timestamps and sequence. -Added more state checks as async flip should only allow change in plane surface address and nothing else should be allowed to change. -Added a separate plane hook for async flip. -Need to find a way to reject fbc enabling if it comes as part of this flip as bspec states that changes to FBC are not allowed.
v5: -Fixed the Checkpatch and sparse warnings.
Karthik B S (5): drm/i915: Add enable/disable flip done and flip done handler drm/i915: Add support for async flips in I915 drm/i915: Add checks specific to async flips drm/i915: Do not call drm_crtc_arm_vblank_event in async flips drm/i915: Enable async flips in i915
drivers/gpu/drm/i915/display/intel_display.c | 123 +++++++++++++++++++ drivers/gpu/drm/i915/display/intel_sprite.c | 33 ++++- drivers/gpu/drm/i915/i915_irq.c | 83 +++++++++++-- drivers/gpu/drm/i915/i915_irq.h | 2 + drivers/gpu/drm/i915/i915_reg.h | 5 +- 5 files changed, 237 insertions(+), 9 deletions(-)
Add enable/disable flip done functions and the flip done handler function which handles the flip done interrupt.
Enable the flip done interrupt in IER.
Enable flip done function is called before writing the surface address register as the write to this register triggers the flip done interrupt
Flip done handler is used to send the page flip event as soon as the surface address is written as per the requirement of async flips. The interrupt is disabled after the event is sent.
v2: -Change function name from icl_* to skl_* (Paulo) -Move flip handler to this patch (Paulo) -Remove vblank_put() (Paulo) -Enable flip done interrupt for gen9+ only (Paulo) -Enable flip done interrupt in power_well_post_enable hook (Paulo) -Removed the event check in flip done handler to handle async flips without pageflip events.
v3: -Move skl_disable_flip_done out of interrupt handler (Paulo) -Make the pending vblank event NULL in the beginning of flip_done_handler to remove sporadic WARN_ON that is seen.
v4: -Calculate timestamps using flip done time stamp and current timestamp for async flips (Ville)
v5: -Fix the sparse warning by making the function 'g4x_get_flip_counter' static.(Reported-by: kernel test robot lkp@intel.com) -Fix the typo in commit message.
Signed-off-by: Karthik B S karthik.b.s@intel.com Signed-off-by: Vandita Kulkarni vandita.kulkarni@intel.com --- drivers/gpu/drm/i915/display/intel_display.c | 10 +++ drivers/gpu/drm/i915/i915_irq.c | 83 ++++++++++++++++++-- drivers/gpu/drm/i915/i915_irq.h | 2 + drivers/gpu/drm/i915/i915_reg.h | 4 +- 4 files changed, 91 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index db2a5a1a9b35..b8ff032195d9 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -15562,6 +15562,13 @@ static void intel_atomic_commit_tail(struct intel_atomic_state *state)
intel_dbuf_pre_plane_update(state);
+ for_each_new_intel_crtc_in_state(state, crtc, new_crtc_state, i) { + if (new_crtc_state->uapi.async_flip) { + skl_enable_flip_done(&crtc->base); + break; + } + } + /* Now enable the clocks, plane, pipe, and connectors that we set up. */ dev_priv->display.commit_modeset_enables(state);
@@ -15583,6 +15590,9 @@ static void intel_atomic_commit_tail(struct intel_atomic_state *state) drm_atomic_helper_wait_for_flip_done(dev, &state->base);
for_each_new_intel_crtc_in_state(state, crtc, new_crtc_state, i) { + if (new_crtc_state->uapi.async_flip) + skl_disable_flip_done(&crtc->base); + if (new_crtc_state->hw.active && !needs_modeset(new_crtc_state) && !new_crtc_state->preload_luts && diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 1fa67700d8f4..95953b393941 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -697,14 +697,24 @@ u32 i915_get_vblank_counter(struct drm_crtc *crtc) return (((high1 << 8) | low) + (pixel >= vbl_start)) & 0xffffff; }
+static u32 g4x_get_flip_counter(struct drm_crtc *crtc) +{ + struct drm_i915_private *dev_priv = to_i915(crtc->dev); + enum pipe pipe = to_intel_crtc(crtc)->pipe; + + return I915_READ(PIPE_FLIPCOUNT_G4X(pipe)); +} + u32 g4x_get_vblank_counter(struct drm_crtc *crtc) { struct drm_i915_private *dev_priv = to_i915(crtc->dev); enum pipe pipe = to_intel_crtc(crtc)->pipe;
+ if (crtc->state->async_flip) + return g4x_get_flip_counter(crtc); + return I915_READ(PIPE_FRMCOUNT_G4X(pipe)); } - /* * On certain encoders on certain platforms, pipe * scanline register will not work to get the scanline, @@ -737,17 +747,24 @@ static u32 __intel_get_crtc_scanline_from_timestamp(struct intel_crtc *crtc) * pipe frame time stamp. The time stamp value * is sampled at every start of vertical blank. */ - scan_prev_time = intel_de_read_fw(dev_priv, - PIPE_FRMTMSTMP(crtc->pipe)); - + if (!crtc->config->uapi.async_flip) + scan_prev_time = intel_de_read_fw(dev_priv, + PIPE_FRMTMSTMP(crtc->pipe)); + else + scan_prev_time = intel_de_read_fw(dev_priv, + PIPE_FLIPTMSTMP(crtc->pipe)); /* * The TIMESTAMP_CTR register has the current * time stamp value. */ scan_curr_time = intel_de_read_fw(dev_priv, IVB_TIMESTAMP_CTR);
- scan_post_time = intel_de_read_fw(dev_priv, - PIPE_FRMTMSTMP(crtc->pipe)); + if (!crtc->config->uapi.async_flip) + scan_post_time = intel_de_read_fw(dev_priv, + PIPE_FRMTMSTMP(crtc->pipe)); + else + scan_post_time = intel_de_read_fw(dev_priv, + PIPE_FLIPTMSTMP(crtc->pipe)); } while (scan_post_time != scan_prev_time);
scanline = div_u64(mul_u32_u32(scan_curr_time - scan_prev_time, @@ -937,7 +954,6 @@ static bool i915_get_crtc_scanoutpos(struct drm_crtc *_crtc, *vpos = position / htotal; *hpos = position - (*vpos * htotal); } - return true; }
@@ -1295,6 +1311,24 @@ display_pipe_crc_irq_handler(struct drm_i915_private *dev_priv, u32 crc4) {} #endif
+static void flip_done_handler(struct drm_i915_private *dev_priv, + unsigned int pipe) +{ + struct intel_crtc *crtc = intel_get_crtc_for_pipe(dev_priv, pipe); + struct drm_crtc_state *crtc_state = crtc->base.state; + struct drm_pending_vblank_event *e = crtc_state->event; + struct drm_device *dev = &dev_priv->drm; + unsigned long irqflags; + + crtc_state->event = NULL; + + drm_crtc_accurate_vblank_count(&crtc->base); + spin_lock_irqsave(&dev->event_lock, irqflags); + + drm_crtc_send_vblank_event(&crtc->base, e); + + spin_unlock_irqrestore(&dev->event_lock, irqflags); +}
static void hsw_pipe_crc_irq_handler(struct drm_i915_private *dev_priv, enum pipe pipe) @@ -2389,6 +2423,9 @@ gen8_de_irq_handler(struct drm_i915_private *dev_priv, u32 master_ctl) if (iir & GEN8_PIPE_VBLANK) intel_handle_vblank(dev_priv, pipe);
+ if (iir & GEN9_PIPE_PLANE1_FLIP_DONE) + flip_done_handler(dev_priv, pipe); + if (iir & GEN8_PIPE_CDCLK_CRC_DONE) hsw_pipe_crc_irq_handler(dev_priv, pipe);
@@ -2710,6 +2747,19 @@ int bdw_enable_vblank(struct drm_crtc *crtc) return 0; }
+void skl_enable_flip_done(struct drm_crtc *crtc) +{ + struct drm_i915_private *dev_priv = to_i915(crtc->dev); + enum pipe pipe = to_intel_crtc(crtc)->pipe; + unsigned long irqflags; + + spin_lock_irqsave(&dev_priv->irq_lock, irqflags); + + bdw_enable_pipe_irq(dev_priv, pipe, GEN9_PIPE_PLANE1_FLIP_DONE); + + spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags); +} + /* Called from drm generic code, passed 'crtc' which * we use as a pipe index */ @@ -2770,6 +2820,19 @@ void bdw_disable_vblank(struct drm_crtc *crtc) spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags); }
+void skl_disable_flip_done(struct drm_crtc *crtc) +{ + struct drm_i915_private *dev_priv = to_i915(crtc->dev); + enum pipe pipe = to_intel_crtc(crtc)->pipe; + unsigned long irqflags; + + spin_lock_irqsave(&dev_priv->irq_lock, irqflags); + + bdw_disable_pipe_irq(dev_priv, pipe, GEN9_PIPE_PLANE1_FLIP_DONE); + + spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags); +} + static void ibx_irq_reset(struct drm_i915_private *dev_priv) { struct intel_uncore *uncore = &dev_priv->uncore; @@ -2980,6 +3043,9 @@ void gen8_irq_power_well_post_enable(struct drm_i915_private *dev_priv, u32 extra_ier = GEN8_PIPE_VBLANK | GEN8_PIPE_FIFO_UNDERRUN; enum pipe pipe;
+ if (INTEL_GEN(dev_priv) >= 9) + extra_ier |= GEN9_PIPE_PLANE1_FLIP_DONE; + spin_lock_irq(&dev_priv->irq_lock);
if (!intel_irqs_enabled(dev_priv)) { @@ -3458,6 +3524,9 @@ static void gen8_de_irq_postinstall(struct drm_i915_private *dev_priv) de_pipe_enables = de_pipe_masked | GEN8_PIPE_VBLANK | GEN8_PIPE_FIFO_UNDERRUN;
+ if (INTEL_GEN(dev_priv) >= 9) + de_pipe_enables |= GEN9_PIPE_PLANE1_FLIP_DONE; + de_port_enables = de_port_masked; if (IS_GEN9_LP(dev_priv)) de_port_enables |= BXT_DE_PORT_HOTPLUG_MASK; diff --git a/drivers/gpu/drm/i915/i915_irq.h b/drivers/gpu/drm/i915/i915_irq.h index 25f25cd95818..2f10c8135116 100644 --- a/drivers/gpu/drm/i915/i915_irq.h +++ b/drivers/gpu/drm/i915/i915_irq.h @@ -112,11 +112,13 @@ int i915gm_enable_vblank(struct drm_crtc *crtc); int i965_enable_vblank(struct drm_crtc *crtc); int ilk_enable_vblank(struct drm_crtc *crtc); int bdw_enable_vblank(struct drm_crtc *crtc); +void skl_enable_flip_done(struct drm_crtc *crtc); void i8xx_disable_vblank(struct drm_crtc *crtc); void i915gm_disable_vblank(struct drm_crtc *crtc); void i965_disable_vblank(struct drm_crtc *crtc); void ilk_disable_vblank(struct drm_crtc *crtc); void bdw_disable_vblank(struct drm_crtc *crtc); +void skl_disable_flip_done(struct drm_crtc *crtc);
void gen2_irq_reset(struct intel_uncore *uncore); void gen3_irq_reset(struct intel_uncore *uncore, i915_reg_t imr, diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index a0d31f3bf634..8cee06314d5d 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -11144,9 +11144,11 @@ enum skl_power_gate { #define GEN9_TIMESTAMP_OVERRIDE_US_COUNTER_DENOMINATOR_MASK (0xf << 12)
#define _PIPE_FRMTMSTMP_A 0x70048 +#define _PIPE_FLIPTMSTMP_A 0x7004C #define PIPE_FRMTMSTMP(pipe) \ _MMIO_PIPE2(pipe, _PIPE_FRMTMSTMP_A) - +#define PIPE_FLIPTMSTMP(pipe) \ + _MMIO_PIPE2(pipe, _PIPE_FLIPTMSTMP_A) /* BXT MIPI clock controls */ #define BXT_MAX_VAR_OUTPUT_KHZ 39500
Em seg, 2020-07-20 às 17:01 +0530, Karthik B S escreveu:
Add enable/disable flip done functions and the flip done handler function which handles the flip done interrupt.
Enable the flip done interrupt in IER.
Enable flip done function is called before writing the surface address register as the write to this register triggers the flip done interrupt
Flip done handler is used to send the page flip event as soon as the surface address is written as per the requirement of async flips. The interrupt is disabled after the event is sent.
v2: -Change function name from icl_* to skl_* (Paulo) -Move flip handler to this patch (Paulo) -Remove vblank_put() (Paulo) -Enable flip done interrupt for gen9+ only (Paulo) -Enable flip done interrupt in power_well_post_enable hook (Paulo) -Removed the event check in flip done handler to handle async flips without pageflip events.
v3: -Move skl_disable_flip_done out of interrupt handler (Paulo) -Make the pending vblank event NULL in the beginning of flip_done_handler to remove sporadic WARN_ON that is seen.
v4: -Calculate timestamps using flip done time stamp and current timestamp for async flips (Ville)
v5: -Fix the sparse warning by making the function 'g4x_get_flip_counter' static.(Reported-by: kernel test robot lkp@intel.com) -Fix the typo in commit message.
Signed-off-by: Karthik B S karthik.b.s@intel.com Signed-off-by: Vandita Kulkarni vandita.kulkarni@intel.com
drivers/gpu/drm/i915/display/intel_display.c | 10 +++ drivers/gpu/drm/i915/i915_irq.c | 83 ++++++++++++++++++-- drivers/gpu/drm/i915/i915_irq.h | 2 + drivers/gpu/drm/i915/i915_reg.h | 4 +- 4 files changed, 91 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index db2a5a1a9b35..b8ff032195d9 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -15562,6 +15562,13 @@ static void intel_atomic_commit_tail(struct intel_atomic_state *state)
intel_dbuf_pre_plane_update(state);
- for_each_new_intel_crtc_in_state(state, crtc, new_crtc_state, i) {
if (new_crtc_state->uapi.async_flip) {
skl_enable_flip_done(&crtc->base);
break;
Do we really want the break here? What if more than one CRTC wants an async flip?
Perhaps you could extend IGT to try this.
}
- }
- /* Now enable the clocks, plane, pipe, and connectors that we set up. */ dev_priv->display.commit_modeset_enables(state);
@@ -15583,6 +15590,9 @@ static void intel_atomic_commit_tail(struct intel_atomic_state *state) drm_atomic_helper_wait_for_flip_done(dev, &state->base);
for_each_new_intel_crtc_in_state(state, crtc, new_crtc_state, i) {
if (new_crtc_state->uapi.async_flip)
skl_disable_flip_done(&crtc->base);
Here we don't break in the first found, so at least there's an inconsistency.
- if (new_crtc_state->hw.active && !needs_modeset(new_crtc_state) && !new_crtc_state->preload_luts &&
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 1fa67700d8f4..95953b393941 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -697,14 +697,24 @@ u32 i915_get_vblank_counter(struct drm_crtc *crtc) return (((high1 << 8) | low) + (pixel >= vbl_start)) & 0xffffff; }
+static u32 g4x_get_flip_counter(struct drm_crtc *crtc) +{
- struct drm_i915_private *dev_priv = to_i915(crtc->dev);
- enum pipe pipe = to_intel_crtc(crtc)->pipe;
- return I915_READ(PIPE_FLIPCOUNT_G4X(pipe));
+}
u32 g4x_get_vblank_counter(struct drm_crtc *crtc) { struct drm_i915_private *dev_priv = to_i915(crtc->dev); enum pipe pipe = to_intel_crtc(crtc)->pipe;
- if (crtc->state->async_flip)
return g4x_get_flip_counter(crtc);
- return I915_READ(PIPE_FRMCOUNT_G4X(pipe));
I don't understand the intention behind this, can you please clarify? This goes back to my reply of the cover letter. It seems that here we're going to alternate between two different counters in our vblank count. So if user space alternates between sometimes using async flips and sometimes using normal flip it's going to get some very weird deltas, isn't it? At least this is what I remember from when I played with these registers: FLIPCOUNT drifts away from FRMCOUNT when we start using async flips.
IMHO we really need our IGT to exercise this possibility.
}
Don't remove this blank line, please.
/*
- On certain encoders on certain platforms, pipe
- scanline register will not work to get the scanline,
@@ -737,17 +747,24 @@ static u32 __intel_get_crtc_scanline_from_timestamp(struct intel_crtc *crtc) * pipe frame time stamp. The time stamp value * is sampled at every start of vertical blank. */
scan_prev_time = intel_de_read_fw(dev_priv,
PIPE_FRMTMSTMP(crtc->pipe));
if (!crtc->config->uapi.async_flip)
scan_prev_time = intel_de_read_fw(dev_priv,
PIPE_FRMTMSTMP(crtc->pipe));
else
scan_prev_time = intel_de_read_fw(dev_priv,
/*PIPE_FLIPTMSTMP(crtc->pipe));
*/ scan_curr_time = intel_de_read_fw(dev_priv, IVB_TIMESTAMP_CTR);
- The TIMESTAMP_CTR register has the current
- time stamp value.
scan_post_time = intel_de_read_fw(dev_priv,
PIPE_FRMTMSTMP(crtc->pipe));
if (!crtc->config->uapi.async_flip)
scan_post_time = intel_de_read_fw(dev_priv,
PIPE_FRMTMSTMP(crtc->pipe));
else
scan_post_time = intel_de_read_fw(dev_priv,
PIPE_FLIPTMSTMP(crtc->pipe));
} while (scan_post_time != scan_prev_time);
scanline = div_u64(mul_u32_u32(scan_curr_time - scan_prev_time,
@@ -937,7 +954,6 @@ static bool i915_get_crtc_scanoutpos(struct drm_crtc *_crtc, *vpos = position / htotal; *hpos = position - (*vpos * htotal); }
Please don't remove random blank lines.
return true; }
@@ -1295,6 +1311,24 @@ display_pipe_crc_irq_handler(struct drm_i915_private *dev_priv, u32 crc4) {} #endif
+static void flip_done_handler(struct drm_i915_private *dev_priv,
unsigned int pipe)
+{
- struct intel_crtc *crtc = intel_get_crtc_for_pipe(dev_priv, pipe);
- struct drm_crtc_state *crtc_state = crtc->base.state;
- struct drm_pending_vblank_event *e = crtc_state->event;
- struct drm_device *dev = &dev_priv->drm;
- unsigned long irqflags;
- crtc_state->event = NULL;
- drm_crtc_accurate_vblank_count(&crtc->base);
- spin_lock_irqsave(&dev->event_lock, irqflags);
- drm_crtc_send_vblank_event(&crtc->base, e);
Can you please explain why we need this pair of functions instead of relying on intel_handle_vblank() like the handler for the 'real' vblank interrupt? I'm not saying this is wrong, I'm just trying to understand the code in order to review it properly.
- spin_unlock_irqrestore(&dev->event_lock, irqflags);
+}
static void hsw_pipe_crc_irq_handler(struct drm_i915_private *dev_priv, enum pipe pipe) @@ -2389,6 +2423,9 @@ gen8_de_irq_handler(struct drm_i915_private *dev_priv, u32 master_ctl) if (iir & GEN8_PIPE_VBLANK) intel_handle_vblank(dev_priv, pipe);
if (iir & GEN9_PIPE_PLANE1_FLIP_DONE)
flip_done_handler(dev_priv, pipe);
- if (iir & GEN8_PIPE_CDCLK_CRC_DONE) hsw_pipe_crc_irq_handler(dev_priv, pipe);
@@ -2710,6 +2747,19 @@ int bdw_enable_vblank(struct drm_crtc *crtc) return 0; }
+void skl_enable_flip_done(struct drm_crtc *crtc) +{
- struct drm_i915_private *dev_priv = to_i915(crtc->dev);
- enum pipe pipe = to_intel_crtc(crtc)->pipe;
- unsigned long irqflags;
- spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
- bdw_enable_pipe_irq(dev_priv, pipe, GEN9_PIPE_PLANE1_FLIP_DONE);
- spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+}
/* Called from drm generic code, passed 'crtc' which
- we use as a pipe index
*/ @@ -2770,6 +2820,19 @@ void bdw_disable_vblank(struct drm_crtc *crtc) spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags); }
+void skl_disable_flip_done(struct drm_crtc *crtc) +{
- struct drm_i915_private *dev_priv = to_i915(crtc->dev);
- enum pipe pipe = to_intel_crtc(crtc)->pipe;
- unsigned long irqflags;
- spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
- bdw_disable_pipe_irq(dev_priv, pipe, GEN9_PIPE_PLANE1_FLIP_DONE);
- spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+}
static void ibx_irq_reset(struct drm_i915_private *dev_priv) { struct intel_uncore *uncore = &dev_priv->uncore; @@ -2980,6 +3043,9 @@ void gen8_irq_power_well_post_enable(struct drm_i915_private *dev_priv, u32 extra_ier = GEN8_PIPE_VBLANK | GEN8_PIPE_FIFO_UNDERRUN; enum pipe pipe;
if (INTEL_GEN(dev_priv) >= 9)
extra_ier |= GEN9_PIPE_PLANE1_FLIP_DONE;
spin_lock_irq(&dev_priv->irq_lock);
if (!intel_irqs_enabled(dev_priv)) {
@@ -3458,6 +3524,9 @@ static void gen8_de_irq_postinstall(struct drm_i915_private *dev_priv) de_pipe_enables = de_pipe_masked | GEN8_PIPE_VBLANK | GEN8_PIPE_FIFO_UNDERRUN;
- if (INTEL_GEN(dev_priv) >= 9)
de_pipe_enables |= GEN9_PIPE_PLANE1_FLIP_DONE;
- de_port_enables = de_port_masked; if (IS_GEN9_LP(dev_priv)) de_port_enables |= BXT_DE_PORT_HOTPLUG_MASK;
diff --git a/drivers/gpu/drm/i915/i915_irq.h b/drivers/gpu/drm/i915/i915_irq.h index 25f25cd95818..2f10c8135116 100644 --- a/drivers/gpu/drm/i915/i915_irq.h +++ b/drivers/gpu/drm/i915/i915_irq.h @@ -112,11 +112,13 @@ int i915gm_enable_vblank(struct drm_crtc *crtc); int i965_enable_vblank(struct drm_crtc *crtc); int ilk_enable_vblank(struct drm_crtc *crtc); int bdw_enable_vblank(struct drm_crtc *crtc); +void skl_enable_flip_done(struct drm_crtc *crtc); void i8xx_disable_vblank(struct drm_crtc *crtc); void i915gm_disable_vblank(struct drm_crtc *crtc); void i965_disable_vblank(struct drm_crtc *crtc); void ilk_disable_vblank(struct drm_crtc *crtc); void bdw_disable_vblank(struct drm_crtc *crtc); +void skl_disable_flip_done(struct drm_crtc *crtc);
void gen2_irq_reset(struct intel_uncore *uncore); void gen3_irq_reset(struct intel_uncore *uncore, i915_reg_t imr, diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index a0d31f3bf634..8cee06314d5d 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -11144,9 +11144,11 @@ enum skl_power_gate { #define GEN9_TIMESTAMP_OVERRIDE_US_COUNTER_DENOMINATOR_MASK (0xf << 12)
#define _PIPE_FRMTMSTMP_A 0x70048 +#define _PIPE_FLIPTMSTMP_A 0x7004C #define PIPE_FRMTMSTMP(pipe) \ _MMIO_PIPE2(pipe, _PIPE_FRMTMSTMP_A)
+#define PIPE_FLIPTMSTMP(pipe) \
_MMIO_PIPE2(pipe, _PIPE_FLIPTMSTMP_A)
/* BXT MIPI clock controls */ #define BXT_MAX_VAR_OUTPUT_KHZ 39500
On 2020-07-25 1:26 a.m., Paulo Zanoni wrote:
Em seg, 2020-07-20 às 17:01 +0530, Karthik B S escreveu:
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 1fa67700d8f4..95953b393941 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -697,14 +697,24 @@ u32 i915_get_vblank_counter(struct drm_crtc *crtc) return (((high1 << 8) | low) + (pixel >= vbl_start)) & 0xffffff; }
+static u32 g4x_get_flip_counter(struct drm_crtc *crtc) +{
- struct drm_i915_private *dev_priv = to_i915(crtc->dev);
- enum pipe pipe = to_intel_crtc(crtc)->pipe;
- return I915_READ(PIPE_FLIPCOUNT_G4X(pipe));
+}
u32 g4x_get_vblank_counter(struct drm_crtc *crtc) { struct drm_i915_private *dev_priv = to_i915(crtc->dev); enum pipe pipe = to_intel_crtc(crtc)->pipe;
- if (crtc->state->async_flip)
return g4x_get_flip_counter(crtc);
- return I915_READ(PIPE_FRMCOUNT_G4X(pipe));
I don't understand the intention behind this, can you please clarify? This goes back to my reply of the cover letter. It seems that here we're going to alternate between two different counters in our vblank count. So if user space alternates between sometimes using async flips and sometimes using normal flip it's going to get some very weird deltas, isn't it? At least this is what I remember from when I played with these registers: FLIPCOUNT drifts away from FRMCOUNT when we start using async flips.
This definitely looks wrong. The counter value returned by the get_vblank_counter hook is supposed to increment when a vertical blank period occurs; page flips are not supposed to affect this in any way.
On Mon, Jul 27, 2020 at 2:27 PM Michel Dänzer michel@daenzer.net wrote:
On 2020-07-25 1:26 a.m., Paulo Zanoni wrote:
Em seg, 2020-07-20 às 17:01 +0530, Karthik B S escreveu:
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 1fa67700d8f4..95953b393941 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -697,14 +697,24 @@ u32 i915_get_vblank_counter(struct drm_crtc *crtc) return (((high1 << 8) | low) + (pixel >= vbl_start)) & 0xffffff; }
+static u32 g4x_get_flip_counter(struct drm_crtc *crtc) +{
- struct drm_i915_private *dev_priv = to_i915(crtc->dev);
- enum pipe pipe = to_intel_crtc(crtc)->pipe;
- return I915_READ(PIPE_FLIPCOUNT_G4X(pipe));
+}
u32 g4x_get_vblank_counter(struct drm_crtc *crtc) { struct drm_i915_private *dev_priv = to_i915(crtc->dev); enum pipe pipe = to_intel_crtc(crtc)->pipe;
- if (crtc->state->async_flip)
return g4x_get_flip_counter(crtc);
- return I915_READ(PIPE_FRMCOUNT_G4X(pipe));
I don't understand the intention behind this, can you please clarify? This goes back to my reply of the cover letter. It seems that here we're going to alternate between two different counters in our vblank count. So if user space alternates between sometimes using async flips and sometimes using normal flip it's going to get some very weird deltas, isn't it? At least this is what I remember from when I played with these registers: FLIPCOUNT drifts away from FRMCOUNT when we start using async flips.
This definitely looks wrong. The counter value returned by the get_vblank_counter hook is supposed to increment when a vertical blank period occurs; page flips are not supposed to affect this in any way.
Also you just flat out can't access crtc->state from interrupt context. Anything you need in there needs to be protected by the right irq-type spin_lock, updates correctly synchronized against both the interrupt handler and atomic updates, and data copied over, not pointers. Otherwise just crash&burn. -Daniel
On 7/28/2020 3:04 AM, Daniel Vetter wrote:
On Mon, Jul 27, 2020 at 2:27 PM Michel Dänzer michel@daenzer.net wrote:
On 2020-07-25 1:26 a.m., Paulo Zanoni wrote:
Em seg, 2020-07-20 às 17:01 +0530, Karthik B S escreveu:
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 1fa67700d8f4..95953b393941 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -697,14 +697,24 @@ u32 i915_get_vblank_counter(struct drm_crtc *crtc) return (((high1 << 8) | low) + (pixel >= vbl_start)) & 0xffffff; }
+static u32 g4x_get_flip_counter(struct drm_crtc *crtc) +{
- struct drm_i915_private *dev_priv = to_i915(crtc->dev);
- enum pipe pipe = to_intel_crtc(crtc)->pipe;
- return I915_READ(PIPE_FLIPCOUNT_G4X(pipe));
+}
u32 g4x_get_vblank_counter(struct drm_crtc *crtc) { struct drm_i915_private *dev_priv = to_i915(crtc->dev); enum pipe pipe = to_intel_crtc(crtc)->pipe;
if (crtc->state->async_flip)
return g4x_get_flip_counter(crtc);
return I915_READ(PIPE_FRMCOUNT_G4X(pipe));
I don't understand the intention behind this, can you please clarify? This goes back to my reply of the cover letter. It seems that here we're going to alternate between two different counters in our vblank count. So if user space alternates between sometimes using async flips and sometimes using normal flip it's going to get some very weird deltas, isn't it? At least this is what I remember from when I played with these registers: FLIPCOUNT drifts away from FRMCOUNT when we start using async flips.
This definitely looks wrong. The counter value returned by the get_vblank_counter hook is supposed to increment when a vertical blank period occurs; page flips are not supposed to affect this in any way.
Also you just flat out can't access crtc->state from interrupt context. Anything you need in there needs to be protected by the right irq-type spin_lock, updates correctly synchronized against both the interrupt handler and atomic updates, and data copied over, not pointers. Otherwise just crash&burn.
Thanks for the review. I will be removing this change in the next revision based on the feedback received, but I will keep this in mind whenever I'll have to access something from the interrupt context.
Thanks, Karthik.B.S
-Daniel
On 7/27/2020 5:57 PM, Michel Dänzer wrote:
On 2020-07-25 1:26 a.m., Paulo Zanoni wrote:
Em seg, 2020-07-20 às 17:01 +0530, Karthik B S escreveu:
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 1fa67700d8f4..95953b393941 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -697,14 +697,24 @@ u32 i915_get_vblank_counter(struct drm_crtc *crtc) return (((high1 << 8) | low) + (pixel >= vbl_start)) & 0xffffff; }
+static u32 g4x_get_flip_counter(struct drm_crtc *crtc) +{
- struct drm_i915_private *dev_priv = to_i915(crtc->dev);
- enum pipe pipe = to_intel_crtc(crtc)->pipe;
- return I915_READ(PIPE_FLIPCOUNT_G4X(pipe));
+}
u32 g4x_get_vblank_counter(struct drm_crtc *crtc) { struct drm_i915_private *dev_priv = to_i915(crtc->dev); enum pipe pipe = to_intel_crtc(crtc)->pipe;
if (crtc->state->async_flip)
return g4x_get_flip_counter(crtc);
return I915_READ(PIPE_FRMCOUNT_G4X(pipe));
I don't understand the intention behind this, can you please clarify? This goes back to my reply of the cover letter. It seems that here we're going to alternate between two different counters in our vblank count. So if user space alternates between sometimes using async flips and sometimes using normal flip it's going to get some very weird deltas, isn't it? At least this is what I remember from when I played with these registers: FLIPCOUNT drifts away from FRMCOUNT when we start using async flips.
This definitely looks wrong. The counter value returned by the get_vblank_counter hook is supposed to increment when a vertical blank period occurs; page flips are not supposed to affect this in any way.
Thanks for the review. As per the feedback received, I will be removing this and will revert back to the original implementation in the next revision.
Thanks, Karthik.B.S
On 7/25/2020 4:56 AM, Paulo Zanoni wrote:
Em seg, 2020-07-20 às 17:01 +0530, Karthik B S escreveu:
Add enable/disable flip done functions and the flip done handler function which handles the flip done interrupt.
Enable the flip done interrupt in IER.
Enable flip done function is called before writing the surface address register as the write to this register triggers the flip done interrupt
Flip done handler is used to send the page flip event as soon as the surface address is written as per the requirement of async flips. The interrupt is disabled after the event is sent.
v2: -Change function name from icl_* to skl_* (Paulo) -Move flip handler to this patch (Paulo) -Remove vblank_put() (Paulo) -Enable flip done interrupt for gen9+ only (Paulo) -Enable flip done interrupt in power_well_post_enable hook (Paulo) -Removed the event check in flip done handler to handle async flips without pageflip events.
v3: -Move skl_disable_flip_done out of interrupt handler (Paulo) -Make the pending vblank event NULL in the beginning of flip_done_handler to remove sporadic WARN_ON that is seen.
v4: -Calculate timestamps using flip done time stamp and current timestamp for async flips (Ville)
v5: -Fix the sparse warning by making the function 'g4x_get_flip_counter' static.(Reported-by: kernel test robot lkp@intel.com) -Fix the typo in commit message.
Signed-off-by: Karthik B S karthik.b.s@intel.com Signed-off-by: Vandita Kulkarni vandita.kulkarni@intel.com
drivers/gpu/drm/i915/display/intel_display.c | 10 +++ drivers/gpu/drm/i915/i915_irq.c | 83 ++++++++++++++++++-- drivers/gpu/drm/i915/i915_irq.h | 2 + drivers/gpu/drm/i915/i915_reg.h | 4 +- 4 files changed, 91 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index db2a5a1a9b35..b8ff032195d9 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -15562,6 +15562,13 @@ static void intel_atomic_commit_tail(struct intel_atomic_state *state)
intel_dbuf_pre_plane_update(state);
- for_each_new_intel_crtc_in_state(state, crtc, new_crtc_state, i) {
if (new_crtc_state->uapi.async_flip) {
skl_enable_flip_done(&crtc->base);
break;
Do we really want the break here? What if more than one CRTC wants an async flip?
Thanks for the review. This will fail for multiple CRTC case, I will remove this break.
Perhaps you could extend IGT to try this.
Currently we cannot add this scenario of having 2 crtc's in the same commit, as we're using the page flip ioctl. But I did try by hacking via the atomic path and 2 display with async is working fine.
}
- }
- /* Now enable the clocks, plane, pipe, and connectors that we set up. */ dev_priv->display.commit_modeset_enables(state);
@@ -15583,6 +15590,9 @@ static void intel_atomic_commit_tail(struct intel_atomic_state *state) drm_atomic_helper_wait_for_flip_done(dev, &state->base);
for_each_new_intel_crtc_in_state(state, crtc, new_crtc_state, i) {
if (new_crtc_state->uapi.async_flip)
skl_disable_flip_done(&crtc->base);
Here we don't break in the first found, so at least there's an inconsistency.
I will remove the break in the earlier loop.
- if (new_crtc_state->hw.active && !needs_modeset(new_crtc_state) && !new_crtc_state->preload_luts &&
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 1fa67700d8f4..95953b393941 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -697,14 +697,24 @@ u32 i915_get_vblank_counter(struct drm_crtc *crtc) return (((high1 << 8) | low) + (pixel >= vbl_start)) & 0xffffff; }
+static u32 g4x_get_flip_counter(struct drm_crtc *crtc) +{
- struct drm_i915_private *dev_priv = to_i915(crtc->dev);
- enum pipe pipe = to_intel_crtc(crtc)->pipe;
- return I915_READ(PIPE_FLIPCOUNT_G4X(pipe));
+}
u32 g4x_get_vblank_counter(struct drm_crtc *crtc) { struct drm_i915_private *dev_priv = to_i915(crtc->dev); enum pipe pipe = to_intel_crtc(crtc)->pipe;
if (crtc->state->async_flip)
return g4x_get_flip_counter(crtc);
return I915_READ(PIPE_FRMCOUNT_G4X(pipe));
I don't understand the intention behind this, can you please clarify? This goes back to my reply of the cover letter. It seems that here we're going to alternate between two different counters in our vblank count. So if user space alternates between sometimes using async flips and sometimes using normal flip it's going to get some very weird deltas, isn't it? At least this is what I remember from when I played with these registers: FLIPCOUNT drifts away from FRMCOUNT when we start using async flips.
IMHO we really need our IGT to exercise this possibility.
As per the feedback received, I will be removing this in the next revision and will revert back to the original implementation. And in the IGT, will be checking the time stamp during flip done from the user space itself.
}
Don't remove this blank line, please.
Will fix this.
/*
- On certain encoders on certain platforms, pipe
- scanline register will not work to get the scanline,
@@ -737,17 +747,24 @@ static u32 __intel_get_crtc_scanline_from_timestamp(struct intel_crtc *crtc) * pipe frame time stamp. The time stamp value * is sampled at every start of vertical blank. */
scan_prev_time = intel_de_read_fw(dev_priv,
PIPE_FRMTMSTMP(crtc->pipe));
if (!crtc->config->uapi.async_flip)
scan_prev_time = intel_de_read_fw(dev_priv,
PIPE_FRMTMSTMP(crtc->pipe));
else
scan_prev_time = intel_de_read_fw(dev_priv,
/*PIPE_FLIPTMSTMP(crtc->pipe));
*/ scan_curr_time = intel_de_read_fw(dev_priv, IVB_TIMESTAMP_CTR);
- The TIMESTAMP_CTR register has the current
- time stamp value.
scan_post_time = intel_de_read_fw(dev_priv,
PIPE_FRMTMSTMP(crtc->pipe));
if (!crtc->config->uapi.async_flip)
scan_post_time = intel_de_read_fw(dev_priv,
PIPE_FRMTMSTMP(crtc->pipe));
else
scan_post_time = intel_de_read_fw(dev_priv,
PIPE_FLIPTMSTMP(crtc->pipe));
} while (scan_post_time != scan_prev_time);
scanline = div_u64(mul_u32_u32(scan_curr_time - scan_prev_time,
@@ -937,7 +954,6 @@ static bool i915_get_crtc_scanoutpos(struct drm_crtc *_crtc, *vpos = position / htotal; *hpos = position - (*vpos * htotal); }
Please don't remove random blank lines.
Will fix this.
return true; }
@@ -1295,6 +1311,24 @@ display_pipe_crc_irq_handler(struct drm_i915_private *dev_priv, u32 crc4) {} #endif
+static void flip_done_handler(struct drm_i915_private *dev_priv,
unsigned int pipe)
+{
- struct intel_crtc *crtc = intel_get_crtc_for_pipe(dev_priv, pipe);
- struct drm_crtc_state *crtc_state = crtc->base.state;
- struct drm_pending_vblank_event *e = crtc_state->event;
- struct drm_device *dev = &dev_priv->drm;
- unsigned long irqflags;
- crtc_state->event = NULL;
- drm_crtc_accurate_vblank_count(&crtc->base);
- spin_lock_irqsave(&dev->event_lock, irqflags);
- drm_crtc_send_vblank_event(&crtc->base, e);
Can you please explain why we need this pair of functions instead of relying on intel_handle_vblank() like the handler for the 'real' vblank interrupt? I'm not saying this is wrong, I'm just trying to understand the code in order to review it properly.
intel_handle_vblank() would require the condition of drm_vblank_passed to be met, in order to send out the events. Since in async case this would not be true, I'm using the 'drm_crtc_send_vblank_event' function to send the vblank event. Also, will remove the 'drm_crtc_accurate_vblank_count' function in the next revision as there will be no changes to the time stamping code now. Thanks, Karthik.B.S
- spin_unlock_irqrestore(&dev->event_lock, irqflags);
+}
static void hsw_pipe_crc_irq_handler(struct drm_i915_private *dev_priv, enum pipe pipe) @@ -2389,6 +2423,9 @@ gen8_de_irq_handler(struct drm_i915_private *dev_priv, u32 master_ctl) if (iir & GEN8_PIPE_VBLANK) intel_handle_vblank(dev_priv, pipe);
if (iir & GEN9_PIPE_PLANE1_FLIP_DONE)
flip_done_handler(dev_priv, pipe);
- if (iir & GEN8_PIPE_CDCLK_CRC_DONE) hsw_pipe_crc_irq_handler(dev_priv, pipe);
@@ -2710,6 +2747,19 @@ int bdw_enable_vblank(struct drm_crtc *crtc) return 0; }
+void skl_enable_flip_done(struct drm_crtc *crtc) +{
- struct drm_i915_private *dev_priv = to_i915(crtc->dev);
- enum pipe pipe = to_intel_crtc(crtc)->pipe;
- unsigned long irqflags;
- spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
- bdw_enable_pipe_irq(dev_priv, pipe, GEN9_PIPE_PLANE1_FLIP_DONE);
- spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+}
- /* Called from drm generic code, passed 'crtc' which
*/
- we use as a pipe index
@@ -2770,6 +2820,19 @@ void bdw_disable_vblank(struct drm_crtc *crtc) spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags); }
+void skl_disable_flip_done(struct drm_crtc *crtc) +{
- struct drm_i915_private *dev_priv = to_i915(crtc->dev);
- enum pipe pipe = to_intel_crtc(crtc)->pipe;
- unsigned long irqflags;
- spin_lock_irqsave(&dev_priv->irq_lock, irqflags);
- bdw_disable_pipe_irq(dev_priv, pipe, GEN9_PIPE_PLANE1_FLIP_DONE);
- spin_unlock_irqrestore(&dev_priv->irq_lock, irqflags);
+}
- static void ibx_irq_reset(struct drm_i915_private *dev_priv) { struct intel_uncore *uncore = &dev_priv->uncore;
@@ -2980,6 +3043,9 @@ void gen8_irq_power_well_post_enable(struct drm_i915_private *dev_priv, u32 extra_ier = GEN8_PIPE_VBLANK | GEN8_PIPE_FIFO_UNDERRUN; enum pipe pipe;
if (INTEL_GEN(dev_priv) >= 9)
extra_ier |= GEN9_PIPE_PLANE1_FLIP_DONE;
spin_lock_irq(&dev_priv->irq_lock);
if (!intel_irqs_enabled(dev_priv)) {
@@ -3458,6 +3524,9 @@ static void gen8_de_irq_postinstall(struct drm_i915_private *dev_priv) de_pipe_enables = de_pipe_masked | GEN8_PIPE_VBLANK | GEN8_PIPE_FIFO_UNDERRUN;
- if (INTEL_GEN(dev_priv) >= 9)
de_pipe_enables |= GEN9_PIPE_PLANE1_FLIP_DONE;
- de_port_enables = de_port_masked; if (IS_GEN9_LP(dev_priv)) de_port_enables |= BXT_DE_PORT_HOTPLUG_MASK;
diff --git a/drivers/gpu/drm/i915/i915_irq.h b/drivers/gpu/drm/i915/i915_irq.h index 25f25cd95818..2f10c8135116 100644 --- a/drivers/gpu/drm/i915/i915_irq.h +++ b/drivers/gpu/drm/i915/i915_irq.h @@ -112,11 +112,13 @@ int i915gm_enable_vblank(struct drm_crtc *crtc); int i965_enable_vblank(struct drm_crtc *crtc); int ilk_enable_vblank(struct drm_crtc *crtc); int bdw_enable_vblank(struct drm_crtc *crtc); +void skl_enable_flip_done(struct drm_crtc *crtc); void i8xx_disable_vblank(struct drm_crtc *crtc); void i915gm_disable_vblank(struct drm_crtc *crtc); void i965_disable_vblank(struct drm_crtc *crtc); void ilk_disable_vblank(struct drm_crtc *crtc); void bdw_disable_vblank(struct drm_crtc *crtc); +void skl_disable_flip_done(struct drm_crtc *crtc);
void gen2_irq_reset(struct intel_uncore *uncore); void gen3_irq_reset(struct intel_uncore *uncore, i915_reg_t imr, diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index a0d31f3bf634..8cee06314d5d 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -11144,9 +11144,11 @@ enum skl_power_gate { #define GEN9_TIMESTAMP_OVERRIDE_US_COUNTER_DENOMINATOR_MASK (0xf << 12)
#define _PIPE_FRMTMSTMP_A 0x70048 +#define _PIPE_FLIPTMSTMP_A 0x7004C #define PIPE_FRMTMSTMP(pipe) \ _MMIO_PIPE2(pipe, _PIPE_FRMTMSTMP_A)
+#define PIPE_FLIPTMSTMP(pipe) \
/* BXT MIPI clock controls */ #define BXT_MAX_VAR_OUTPUT_KHZ 39500_MMIO_PIPE2(pipe, _PIPE_FLIPTMSTMP_A)
Set the Async Address Update Enable bit in plane ctl when async flip is requested.
v2: -Move the Async flip enablement to individual patch (Paulo)
v3: -Rebased.
v4: -Add separate plane hook for async flip case (Ville)
v5: -Rebased.
Signed-off-by: Karthik B S karthik.b.s@intel.com Signed-off-by: Vandita Kulkarni vandita.kulkarni@intel.com --- drivers/gpu/drm/i915/display/intel_display.c | 6 +++++ drivers/gpu/drm/i915/display/intel_sprite.c | 25 ++++++++++++++++++++ drivers/gpu/drm/i915/i915_reg.h | 1 + 3 files changed, 32 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index b8ff032195d9..4773f39e5924 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -4766,6 +4766,12 @@ u32 skl_plane_ctl(const struct intel_crtc_state *crtc_state, const struct drm_intel_sprite_colorkey *key = &plane_state->ckey; u32 plane_ctl;
+ /* During Async flip, no other updates are allowed */ + if (crtc_state->uapi.async_flip) { + plane_ctl |= PLANE_CTL_ASYNC_FLIP; + return plane_ctl; + } + plane_ctl = PLANE_CTL_ENABLE;
if (INTEL_GEN(dev_priv) < 10 && !IS_GEMINILAKE(dev_priv)) { diff --git a/drivers/gpu/drm/i915/display/intel_sprite.c b/drivers/gpu/drm/i915/display/intel_sprite.c index c26ca029fc0a..3747482e8fa3 100644 --- a/drivers/gpu/drm/i915/display/intel_sprite.c +++ b/drivers/gpu/drm/i915/display/intel_sprite.c @@ -603,6 +603,24 @@ icl_program_input_csc(struct intel_plane *plane, PLANE_INPUT_CSC_POSTOFF(pipe, plane_id, 2), 0x0); }
+static void +skl_program_async_surface_address(struct drm_i915_private *dev_priv, + const struct intel_plane_state *plane_state, + enum pipe pipe, enum plane_id plane_id, + u32 surf_addr) +{ + unsigned long irqflags; + u32 plane_ctl = plane_state->ctl; + + spin_lock_irqsave(&dev_priv->uncore.lock, irqflags); + + intel_de_write_fw(dev_priv, PLANE_CTL(pipe, plane_id), plane_ctl); + intel_de_write_fw(dev_priv, PLANE_SURF(pipe, plane_id), + intel_plane_ggtt_offset(plane_state) + surf_addr); + + spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags); +} + static void skl_program_plane(struct intel_plane *plane, const struct intel_crtc_state *crtc_state, @@ -631,6 +649,13 @@ skl_program_plane(struct intel_plane *plane, u32 keymsk, keymax; u32 plane_ctl = plane_state->ctl;
+ /* During Async flip, no other updates are allowed */ + if (crtc_state->uapi.async_flip) { + skl_program_async_surface_address(dev_priv, plane_state, + pipe, plane_id, surf_addr); + return; + } + plane_ctl |= skl_plane_ctl_crtc(crtc_state);
if (INTEL_GEN(dev_priv) >= 10 || IS_GEMINILAKE(dev_priv)) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 8cee06314d5d..19aad4199874 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -6935,6 +6935,7 @@ enum { #define PLANE_CTL_TILED_X (1 << 10) #define PLANE_CTL_TILED_Y (4 << 10) #define PLANE_CTL_TILED_YF (5 << 10) +#define PLANE_CTL_ASYNC_FLIP (1 << 9) #define PLANE_CTL_FLIP_HORIZONTAL (1 << 8) #define PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE (1 << 4) /* TGL+ */ #define PLANE_CTL_ALPHA_MASK (0x3 << 4) /* Pre-GLK */
Em seg, 2020-07-20 às 17:01 +0530, Karthik B S escreveu:
Set the Async Address Update Enable bit in plane ctl when async flip is requested.
v2: -Move the Async flip enablement to individual patch (Paulo)
v3: -Rebased.
v4: -Add separate plane hook for async flip case (Ville)
v5: -Rebased.
Signed-off-by: Karthik B S karthik.b.s@intel.com Signed-off-by: Vandita Kulkarni vandita.kulkarni@intel.com
drivers/gpu/drm/i915/display/intel_display.c | 6 +++++ drivers/gpu/drm/i915/display/intel_sprite.c | 25 ++++++++++++++++++++ drivers/gpu/drm/i915/i915_reg.h | 1 + 3 files changed, 32 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index b8ff032195d9..4773f39e5924 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -4766,6 +4766,12 @@ u32 skl_plane_ctl(const struct intel_crtc_state *crtc_state, const struct drm_intel_sprite_colorkey *key = &plane_state->ckey; u32 plane_ctl;
- /* During Async flip, no other updates are allowed */
My understanding is that this function is fully setting the right bits based on the chosen config (instead of doing read-modify-write), and the checks for "other updates" were done before. So the logic implemented here of early returning doesn't make sense.
- if (crtc_state->uapi.async_flip) {
plane_ctl |= PLANE_CTL_ASYNC_FLIP;
I wonder why gcc does not complain we're ORing with an unitialized value.
return plane_ctl;
- }
- plane_ctl = PLANE_CTL_ENABLE;
It seems to be the return above means we'll never even try to enable the plane, we're only relying on the fact that plane_ctl is not zero initialize so maybe bit 31 is already set.
if (INTEL_GEN(dev_priv) < 10 && !IS_GEMINILAKE(dev_priv)) { diff --git a/drivers/gpu/drm/i915/display/intel_sprite.c b/drivers/gpu/drm/i915/display/intel_sprite.c index c26ca029fc0a..3747482e8fa3 100644 --- a/drivers/gpu/drm/i915/display/intel_sprite.c +++ b/drivers/gpu/drm/i915/display/intel_sprite.c @@ -603,6 +603,24 @@ icl_program_input_csc(struct intel_plane *plane, PLANE_INPUT_CSC_POSTOFF(pipe, plane_id, 2), 0x0); }
+static void +skl_program_async_surface_address(struct drm_i915_private *dev_priv,
const struct intel_plane_state *plane_state,
enum pipe pipe, enum plane_id plane_id,
u32 surf_addr)
+{
- unsigned long irqflags;
- u32 plane_ctl = plane_state->ctl;
- spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
- intel_de_write_fw(dev_priv, PLANE_CTL(pipe, plane_id), plane_ctl);
- intel_de_write_fw(dev_priv, PLANE_SURF(pipe, plane_id),
intel_plane_ggtt_offset(plane_state) + surf_addr);
- spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
+}
static void skl_program_plane(struct intel_plane *plane, const struct intel_crtc_state *crtc_state, @@ -631,6 +649,13 @@ skl_program_plane(struct intel_plane *plane, u32 keymsk, keymax; u32 plane_ctl = plane_state->ctl;
- /* During Async flip, no other updates are allowed */
- if (crtc_state->uapi.async_flip) {
skl_program_async_surface_address(dev_priv, plane_state,
pipe, plane_id, surf_addr);
return;
- }
I'd vote for us to keep the "don't rewrite registers that shouldn't change" part on its own commit, since it's just an optimization. It could even go at the end of the series. But perhaps this is simple enough and not needed.
plane_ctl |= skl_plane_ctl_crtc(crtc_state);
if (INTEL_GEN(dev_priv) >= 10 || IS_GEMINILAKE(dev_priv))
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 8cee06314d5d..19aad4199874 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -6935,6 +6935,7 @@ enum { #define PLANE_CTL_TILED_X (1 << 10) #define PLANE_CTL_TILED_Y (4 << 10) #define PLANE_CTL_TILED_YF (5 << 10) +#define PLANE_CTL_ASYNC_FLIP (1 << 9) #define PLANE_CTL_FLIP_HORIZONTAL (1 << 8) #define PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE (1 << 4) /* TGL+ */ #define PLANE_CTL_ALPHA_MASK (0x3 << 4) /* Pre-GLK */
On 7/25/2020 4:56 AM, Paulo Zanoni wrote:
Em seg, 2020-07-20 às 17:01 +0530, Karthik B S escreveu:
Set the Async Address Update Enable bit in plane ctl when async flip is requested.
v2: -Move the Async flip enablement to individual patch (Paulo)
v3: -Rebased.
v4: -Add separate plane hook for async flip case (Ville)
v5: -Rebased.
Signed-off-by: Karthik B S karthik.b.s@intel.com Signed-off-by: Vandita Kulkarni vandita.kulkarni@intel.com
drivers/gpu/drm/i915/display/intel_display.c | 6 +++++ drivers/gpu/drm/i915/display/intel_sprite.c | 25 ++++++++++++++++++++ drivers/gpu/drm/i915/i915_reg.h | 1 + 3 files changed, 32 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index b8ff032195d9..4773f39e5924 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -4766,6 +4766,12 @@ u32 skl_plane_ctl(const struct intel_crtc_state *crtc_state, const struct drm_intel_sprite_colorkey *key = &plane_state->ckey; u32 plane_ctl;
- /* During Async flip, no other updates are allowed */
My understanding is that this function is fully setting the right bits based on the chosen config (instead of doing read-modify-write), and the checks for "other updates" were done before. So the logic implemented here of early returning doesn't make sense.
Thanks for the review. Yes the check for other updates are done before.
So I could either do read-modify-write and return early, or, keep the existing code flow as is, since the are checks already present.
I will keep the existing flow and remove the early return in the next revision.
- if (crtc_state->uapi.async_flip) {
plane_ctl |= PLANE_CTL_ASYNC_FLIP;
I wonder why gcc does not complain we're ORing with an unitialized value.
Will initialize the plane_ctl variable to zero.
return plane_ctl;
- }
- plane_ctl = PLANE_CTL_ENABLE;
It seems to be the return above means we'll never even try to enable the plane, we're only relying on the fact that plane_ctl is not zero initialize so maybe bit 31 is already set.
Since we only allow async flips on planes that are already enabled, I assumed this would not be needed. Also, other than bit 9 (async address update enable), this register is double buffered and cannot be updated asynchronously.
if (INTEL_GEN(dev_priv) < 10 && !IS_GEMINILAKE(dev_priv)) { diff --git a/drivers/gpu/drm/i915/display/intel_sprite.c b/drivers/gpu/drm/i915/display/intel_sprite.c index c26ca029fc0a..3747482e8fa3 100644 --- a/drivers/gpu/drm/i915/display/intel_sprite.c +++ b/drivers/gpu/drm/i915/display/intel_sprite.c @@ -603,6 +603,24 @@ icl_program_input_csc(struct intel_plane *plane, PLANE_INPUT_CSC_POSTOFF(pipe, plane_id, 2), 0x0); }
+static void +skl_program_async_surface_address(struct drm_i915_private *dev_priv,
const struct intel_plane_state *plane_state,
enum pipe pipe, enum plane_id plane_id,
u32 surf_addr)
+{
- unsigned long irqflags;
- u32 plane_ctl = plane_state->ctl;
- spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
- intel_de_write_fw(dev_priv, PLANE_CTL(pipe, plane_id), plane_ctl);
- intel_de_write_fw(dev_priv, PLANE_SURF(pipe, plane_id),
intel_plane_ggtt_offset(plane_state) + surf_addr);
- spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
+}
- static void skl_program_plane(struct intel_plane *plane, const struct intel_crtc_state *crtc_state,
@@ -631,6 +649,13 @@ skl_program_plane(struct intel_plane *plane, u32 keymsk, keymax; u32 plane_ctl = plane_state->ctl;
- /* During Async flip, no other updates are allowed */
- if (crtc_state->uapi.async_flip) {
skl_program_async_surface_address(dev_priv, plane_state,
pipe, plane_id, surf_addr);
return;
- }
I'd vote for us to keep the "don't rewrite registers that shouldn't change" part on its own commit, since it's just an optimization. It could even go at the end of the series. But perhaps this is simple enough and not needed.
Will move this change to the end of the series.
Thanks, Karthik.B.S
plane_ctl |= skl_plane_ctl_crtc(crtc_state);
if (INTEL_GEN(dev_priv) >= 10 || IS_GEMINILAKE(dev_priv))
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 8cee06314d5d..19aad4199874 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -6935,6 +6935,7 @@ enum { #define PLANE_CTL_TILED_X (1 << 10) #define PLANE_CTL_TILED_Y (4 << 10) #define PLANE_CTL_TILED_YF (5 << 10) +#define PLANE_CTL_ASYNC_FLIP (1 << 9) #define PLANE_CTL_FLIP_HORIZONTAL (1 << 8) #define PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE (1 << 4) /* TGL+ */ #define PLANE_CTL_ALPHA_MASK (0x3 << 4) /* Pre-GLK */
Support added only for async flips on primary plane. If flip is requested on any other plane, reject it.
Make sure there is no change in fbc, offset and framebuffer modifiers when async flip is requested.
If any of these are modified, reject async flip.
v2: -Replace DRM_ERROR (Paulo) -Add check for changes in OFFSET, FBC, RC(Paulo)
v3: -Removed TODO as benchmarking tests have been run now.
v4: -Added more state checks for async flip (Ville) -Moved intel_atomic_check_async to the end of intel_atomic_check as the plane checks needs to pass before this. (Ville) -Removed crtc_state->enable_fbc check. (Ville) -Set the I915_MODE_FLAG_GET_SCANLINE_FROM_TIMESTAMP flag for async flip case as scanline counter is not reliable here.
v5: -Fix typo and other check patch errors seen in CI in 'intel_atomic_check_async' function.
Signed-off-by: Karthik B S karthik.b.s@intel.com Signed-off-by: Vandita Kulkarni vandita.kulkarni@intel.com --- drivers/gpu/drm/i915/display/intel_display.c | 104 +++++++++++++++++++ 1 file changed, 104 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index 4773f39e5924..562e3173ef83 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -14835,6 +14835,102 @@ static bool intel_cpu_transcoders_need_modeset(struct intel_atomic_state *state, return false; }
+static int intel_atomic_check_async(struct intel_atomic_state *state) +{ + struct intel_crtc_state *old_crtc_state, *new_crtc_state; + struct intel_plane_state *new_plane_state, *old_plane_state; + struct intel_crtc *crtc; + struct intel_plane *intel_plane; + int i, n_planes = 0; + + for_each_oldnew_intel_crtc_in_state(state, crtc, old_crtc_state, + new_crtc_state, i) { + if (needs_modeset(new_crtc_state)) { + DRM_DEBUG_KMS("Modeset Required. Async flip not supported\n"); + return -EINVAL; + } + + if (!new_crtc_state->uapi.active) { + DRM_DEBUG_KMS("CRTC inactive\n"); + return -EINVAL; + } + if (old_crtc_state->active_planes != new_crtc_state->active_planes) { + DRM_DEBUG_KMS("Active planes cannot be changed during async flip\n"); + return -EINVAL; + } + } + + for_each_oldnew_intel_plane_in_state(state, intel_plane, old_plane_state, + new_plane_state, i) { + /*TODO: Async flip is only supported through the page flip IOCTL + * as of now. So support currently added for primary plane only. + * Support for other planes should be added when async flip is + * enabled in the atomic IOCTL path. + */ + if (intel_plane->id != PLANE_PRIMARY) + return -EINVAL; + + if (old_plane_state->color_plane[0].x != + new_plane_state->color_plane[0].x || + old_plane_state->color_plane[0].y != + new_plane_state->color_plane[0].y) { + DRM_DEBUG_KMS("Offsets cannot be changed in async flip\n"); + return -EINVAL; + } + + if (old_plane_state->uapi.fb->modifier != + new_plane_state->uapi.fb->modifier) { + DRM_DEBUG_KMS("Framebuffer modifiers cannot be changed in async flip\n"); + return -EINVAL; + } + + if (old_plane_state->uapi.fb->format != + new_plane_state->uapi.fb->format) { + DRM_DEBUG_KMS("Framebuffer format cannot be changed in async flip\n"); + return -EINVAL; + } + + if (intel_wm_need_update(old_plane_state, new_plane_state)) { + DRM_DEBUG_KMS("WM update not allowed in async flip\n"); + return -EINVAL; + } + + if (old_plane_state->uapi.alpha != new_plane_state->uapi.alpha) { + DRM_DEBUG_KMS("Alpha value cannot be changed in async flip\n"); + return -EINVAL; + } + + if (old_plane_state->uapi.pixel_blend_mode != + new_plane_state->uapi.pixel_blend_mode) { + DRM_DEBUG_KMS("Pixel blend mode cannot be changed in async flip\n"); + return -EINVAL; + } + + if (old_plane_state->uapi.color_encoding != new_plane_state->uapi.color_encoding) { + DRM_DEBUG_KMS("Color encoding cannot be changed in async flip\n"); + return -EINVAL; + } + + if (old_plane_state->uapi.color_range != new_plane_state->uapi.color_range) { + DRM_DEBUG_KMS("Color range cannot be changed in async flip\n"); + return -EINVAL; + } + + n_planes++; + } + + if (n_planes != 1) + return -EINVAL; + + /*Scan line registers cannot be trusted for async flip */ + for_each_new_intel_crtc_in_state(state, crtc, new_crtc_state, i) { + if (new_crtc_state->uapi.async_flip) + crtc->mode_flags |= I915_MODE_FLAG_GET_SCANLINE_FROM_TIMESTAMP; + } + + return 0; +} + /** * intel_atomic_check - validate state object * @dev: drm device @@ -15014,6 +15110,14 @@ static int intel_atomic_check(struct drm_device *dev, "[modeset]" : "[fastset]"); }
+ for_each_new_intel_crtc_in_state(state, crtc, new_crtc_state, i) { + if (new_crtc_state->uapi.async_flip) { + ret = intel_atomic_check_async(state); + + if (ret) + goto fail; + } + } return 0;
fail:
Since the flip done event will be sent in the flip_done_handler, no need to add the event to the list and delay it for later.
v2: -Moved the async check above vblank_get as it was causing issues for PSR.
v3: -No need to wait for vblank to pass, as this wait was causing a 16ms delay once every few flips.
v4: -Rebased.
v5: -Rebased.
Signed-off-by: Karthik B S karthik.b.s@intel.com Signed-off-by: Vandita Kulkarni vandita.kulkarni@intel.com --- drivers/gpu/drm/i915/display/intel_sprite.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_sprite.c b/drivers/gpu/drm/i915/display/intel_sprite.c index 3747482e8fa3..1c03546a4d2a 100644 --- a/drivers/gpu/drm/i915/display/intel_sprite.c +++ b/drivers/gpu/drm/i915/display/intel_sprite.c @@ -93,6 +93,9 @@ void intel_pipe_update_start(const struct intel_crtc_state *new_crtc_state) DEFINE_WAIT(wait); u32 psr_status;
+ if (new_crtc_state->uapi.async_flip) + goto irq_disable; + vblank_start = adjusted_mode->crtc_vblank_start; if (adjusted_mode->flags & DRM_MODE_FLAG_INTERLACE) vblank_start = DIV_ROUND_UP(vblank_start, 2); @@ -206,7 +209,7 @@ void intel_pipe_update_end(struct intel_crtc_state *new_crtc_state) * Would be slightly nice to just grab the vblank count and arm the * event outside of the critical section - the spinlock might spin for a * while ... */ - if (new_crtc_state->uapi.event) { + if (new_crtc_state->uapi.event && !new_crtc_state->uapi.async_flip) { drm_WARN_ON(&dev_priv->drm, drm_crtc_vblank_get(&crtc->base) != 0);
@@ -220,6 +223,9 @@ void intel_pipe_update_end(struct intel_crtc_state *new_crtc_state)
local_irq_enable();
+ if (new_crtc_state->uapi.async_flip) + return; + if (intel_vgpu_active(dev_priv)) return;
Enable asynchronous flips in i915 for gen9+ platforms.
v2: -Async flip enablement should be a stand alone patch (Paulo)
v3: -Move the patch to the end of the serires (Paulo)
v4: -Rebased.
v5: -Rebased.
Signed-off-by: Karthik B S karthik.b.s@intel.com Signed-off-by: Vandita Kulkarni vandita.kulkarni@intel.com --- drivers/gpu/drm/i915/display/intel_display.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index 562e3173ef83..931b0fe6ee34 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -17897,6 +17897,9 @@ static void intel_mode_config_init(struct drm_i915_private *i915)
mode_config->funcs = &intel_mode_funcs;
+ if (INTEL_GEN(i915) >= 9) + mode_config->async_page_flip = true; + /* * Maximum framebuffer dimensions, chosen to match * the maximum render engine surface size on gen4+.
Em seg, 2020-07-20 às 17:01 +0530, Karthik B S escreveu:
Without async flip support in the kernel, fullscreen apps where game resolution is equal to the screen resolution, must perform an extra blit per frame prior to flipping.
Asynchronous page flips will also boost the FPS of Mesa benchmarks.
We had a discussion in patch 1 of v3 regarding the semantics of asynchronous flips from the point of view of the user space: how we handle our vblank counters, how/when we increment the sequence events and how we handle timestamps, how/when we deliver vblank events. Since apparently AMD has already enabled this feature, our job would be to implement their current behavior so KMS clients can continue to work regardless of the driver.
From reading this series it's not super clear to me what exactly is the
behavior that we're trying to follow. Can you please document somewhere what are these rules and expectations? This way, people writing user space code (or people improving the other drivers) will have an easier time. In addition to text documentation, I believe all our assumptions and rules should be coded in IGT: we want to be confident a driver implements async page flips correctly when we can verify it passes the IGT.
Also, in the other patches I raise some additional questions regarding mixing async with non-async vblanks: IMHO this should also be documented as text and as IGT.
v2: -Few patches have been squashed and patches have been shuffled as per the reviews on the previous version.
v3: -Few patches have been squashed and patches have been shuffled as per the reviews on the previous version.
v4: -Made changes to fix the sequence and time stamp issue as per the comments received on the previous version. -Timestamps are calculated using the flip done time stamp and current timestamp. Here I915_MODE_FLAG_GET_SCANLINE_FROM_TIMESTAMP flag is used for timestamp calculations. -Event is sent from the interrupt handler immediately using this updated timestamps and sequence. -Added more state checks as async flip should only allow change in plane surface address and nothing else should be allowed to change. -Added a separate plane hook for async flip. -Need to find a way to reject fbc enabling if it comes as part of this flip as bspec states that changes to FBC are not allowed.
v5: -Fixed the Checkpatch and sparse warnings.
Karthik B S (5): drm/i915: Add enable/disable flip done and flip done handler drm/i915: Add support for async flips in I915 drm/i915: Add checks specific to async flips drm/i915: Do not call drm_crtc_arm_vblank_event in async flips drm/i915: Enable async flips in i915
drivers/gpu/drm/i915/display/intel_display.c | 123 +++++++++++++++++++ drivers/gpu/drm/i915/display/intel_sprite.c | 33 ++++- drivers/gpu/drm/i915/i915_irq.c | 83 +++++++++++-- drivers/gpu/drm/i915/i915_irq.h | 2 + drivers/gpu/drm/i915/i915_reg.h | 5 +- 5 files changed, 237 insertions(+), 9 deletions(-)
-----Original Message----- From: Zanoni, Paulo R paulo.r.zanoni@intel.com Sent: Saturday, July 25, 2020 4:56 AM To: B S, Karthik karthik.b.s@intel.com; intel-gfx@lists.freedesktop.org Cc: ville.syrjala@linux.intel.com; Vetter, Daniel daniel.vetter@intel.com; Kulkarni, Vandita vandita.kulkarni@intel.com; nicholas.kazlauskas@amd.com; harry.wentland@amd.com; Shankar, Uma uma.shankar@intel.com; dri-devel@lists.freedesktop.org Subject: Re: [PATCH v5 0/5] Asynchronous flip implementation for i915
Em seg, 2020-07-20 às 17:01 +0530, Karthik B S escreveu:
Without async flip support in the kernel, fullscreen apps where game resolution is equal to the screen resolution, must perform an extra blit per frame prior to flipping.
Asynchronous page flips will also boost the FPS of Mesa benchmarks.
We had a discussion in patch 1 of v3 regarding the semantics of asynchronous flips from the point of view of the user space: how we handle our vblank counters, how/when we increment the sequence events and how we handle timestamps, how/when we deliver vblank events. Since apparently AMD has already enabled this feature, our job would be to implement their current behavior so KMS clients can continue to work regardless of the driver.
Thanks for your comments Paulo.
On V3 patch1, yes there were comments with this regard. But seems like we did not coclude on few of the things. There were comments from Ville on how we could implement the timestamping for async flips and that is part of this version. Also we heard from Nicholas in their driver the time stamp is not mapping to the scan out as it happens immediately.
On async flips, there needs to be some clarity/guideline on the behaviour and event expectation from the driver by user space. Here are few assumptions that we have, 1. Our understanding is that the user space doesn’t expect the timestamp for async flips (but still expects vblank timestamp) , or doesn’t do anything with that, same is the assumption wrt the flip sequence, please correct us if we are wrong. 2. In the sequence the user space still expects the counter that marks vblanks. 3. The user space can use different event types like DRM_EVENT_VBLANK or DRM_EVENT_FLIP_COMPLETE for getting the corresponding event. And their designs are still aligned to this even in case of async.
If there are any more expectations from the user space wrt to the event that is being sent from the driver in case of async flip, please let us know.
If the user space doesn’t care much about the flip sequence then, we can just not do anything like returning the flip counter like this version is doing and just stick to returning of the frame counter value(which marks vblanks).
Based on these, we can tune the current implementation which right now sends the flip time stamp in case of async flips.
Thanks, Vandita
From reading this series it's not super clear to me what exactly is the behavior that we're trying to follow. Can you please document somewhere what are these rules and expectations? This way, people writing user space code (or people improving the other drivers) will have an easier time. In addition to text documentation, I believe all our assumptions and rules should be coded in IGT: we want to be confident a driver implements async page flips correctly when we can verify it passes the IGT.
Also, in the other patches I raise some additional questions regarding mixing async with non-async vblanks: IMHO this should also be documented as text and as IGT.
v2: -Few patches have been squashed and patches have been shuffled as per the reviews on the previous version.
v3: -Few patches have been squashed and patches have been shuffled as per the reviews on the previous version.
v4: -Made changes to fix the sequence and time stamp issue as per the comments received on the previous version. -Timestamps are calculated using the flip done time stamp and current timestamp. Here
I915_MODE_FLAG_GET_SCANLINE_FROM_TIMESTAMP flag is used
for timestamp calculations. -Event is sent from the interrupt handler immediately using this updated timestamps and sequence. -Added more state checks as async flip should only allow change in plane surface address and nothing else should be allowed to change. -Added a separate plane hook for async flip. -Need to find a way to reject fbc enabling if it comes as part of this flip as bspec states that changes to FBC are not allowed.
v5: -Fixed the Checkpatch and sparse warnings.
Karthik B S (5): drm/i915: Add enable/disable flip done and flip done handler drm/i915: Add support for async flips in I915 drm/i915: Add checks specific to async flips drm/i915: Do not call drm_crtc_arm_vblank_event in async flips drm/i915: Enable async flips in i915
drivers/gpu/drm/i915/display/intel_display.c | 123 +++++++++++++++++++ drivers/gpu/drm/i915/display/intel_sprite.c | 33
++++-
drivers/gpu/drm/i915/i915_irq.c | 83 +++++++++++-- drivers/gpu/drm/i915/i915_irq.h | 2 + drivers/gpu/drm/i915/i915_reg.h | 5 +- 5 files changed, 237 insertions(+), 9 deletions(-)
On 2020-07-29 9:23 a.m., Kulkarni, Vandita wrote:
On async flips, there needs to be some clarity/guideline on the behaviour and event expectation from the driver by user space. Here are few assumptions that we have,
- Our understanding is that the user space doesn’t expect the timestamp for async flips (but still expects vblank timestamp) , or
doesn’t do anything with that, same is the assumption wrt the flip sequence, please correct us if we are wrong. 2. In the sequence the user space still expects the counter that marks vblanks. 3. The user space can use different event types like DRM_EVENT_VBLANK or DRM_EVENT_FLIP_COMPLETE for getting the corresponding event. And their designs are still aligned to this even in case of async.
If there are any more expectations from the user space wrt to the event that is being sent from the driver in case of async flip, please let us know.
If the user space doesn’t care much about the flip sequence then, we can just not do anything like returning the flip counter like this version is doing and just stick to returning of the frame counter value(which marks vblanks).
There's no such thing as a "flip sequence" in the KMS API. There's only the per-CRTC vblank counter. Each flip completion event needs to contain the value of that counter when the hardware completed the flip, regardless of whether it was an async flip or not.
As for the timestamp in the event, I'm not sure what the expectations are for async flips, but I suspect it may not really matter. E.g. the timestamp calculated to correspond to the end of the previous vertical blank period might be fine.
-----Original Message----- From: Michel Dänzer michel@daenzer.net Sent: Wednesday, July 29, 2020 1:04 PM To: Kulkarni, Vandita vandita.kulkarni@intel.com; Zanoni, Paulo R paulo.r.zanoni@intel.com; Vetter, Daniel daniel.vetter@intel.com; B S, Karthik karthik.b.s@intel.com; intel-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org; Shankar, Uma uma.shankar@intel.com; nicholas.kazlauskas@amd.com Subject: Re: [PATCH v5 0/5] Asynchronous flip implementation for i915
On 2020-07-29 9:23 a.m., Kulkarni, Vandita wrote:
On async flips, there needs to be some clarity/guideline on the behaviour and event expectation from the driver by user space. Here are few assumptions that we have, 1. Our understanding is that the user space doesn’t expect the timestamp for async flips (but still expects vblank timestamp) , or doesn’t do anything with that, same is the
assumption wrt the flip sequence, please correct us if we are wrong.
- In the sequence the user space still expects the counter that marks
vblanks.
- The user space can use different event types like DRM_EVENT_VBLANK
or DRM_EVENT_FLIP_COMPLETE for getting the corresponding event. And
their designs are still aligned to this even in case of async.
If there are any more expectations from the user space wrt to the event that is being sent from the driver in case of async flip, please let us
know.
If the user space doesn’t care much about the flip sequence then, we can just not do anything like returning the flip counter like this version is
doing and just stick to returning of the frame counter value(which marks vblanks).
There's no such thing as a "flip sequence" in the KMS API. There's only the per-CRTC vblank counter. Each flip completion event needs to contain the value of that counter when the hardware completed the flip, regardless of whether it was an async flip or not.
As for the timestamp in the event, I'm not sure what the expectations are for async flips, but I suspect it may not really matter. E.g. the timestamp calculated to correspond to the end of the previous vertical blank period might be fine.
Thanks Michel, Paulo, Daniel, Nicholas, Ville for your inputs. After all the discussions, looks like the async flip time stamp is not of much use to the userspace and the async flip sequence; hence we will stick to the approach of sending vblank time stamp itself and have a test case in the igt to cover the async flips cases in a slightly different way. And update the documentation.
Thanks, Vandita
-- Earthling Michel Dänzer | https://redhat.com Libre software enthusiast | Mesa and X developer
On 8/4/2020 11:19 AM, Kulkarni, Vandita wrote:
-----Original Message----- From: Michel Dänzer michel@daenzer.net Sent: Wednesday, July 29, 2020 1:04 PM To: Kulkarni, Vandita vandita.kulkarni@intel.com; Zanoni, Paulo R paulo.r.zanoni@intel.com; Vetter, Daniel daniel.vetter@intel.com; B S, Karthik karthik.b.s@intel.com; intel-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org; Shankar, Uma uma.shankar@intel.com; nicholas.kazlauskas@amd.com Subject: Re: [PATCH v5 0/5] Asynchronous flip implementation for i915
On 2020-07-29 9:23 a.m., Kulkarni, Vandita wrote:
On async flips, there needs to be some clarity/guideline on the behaviour and event expectation from the driver by user space. Here are few assumptions that we have, 1. Our understanding is that the user space doesn’t expect the timestamp for async flips (but still expects vblank timestamp) , or doesn’t do anything with that, same is the
assumption wrt the flip sequence, please correct us if we are wrong.
- In the sequence the user space still expects the counter that marks
vblanks.
- The user space can use different event types like DRM_EVENT_VBLANK
or DRM_EVENT_FLIP_COMPLETE for getting the corresponding event. And
their designs are still aligned to this even in case of async.
If there are any more expectations from the user space wrt to the event that is being sent from the driver in case of async flip, please let us
know.
If the user space doesn’t care much about the flip sequence then, we can just not do anything like returning the flip counter like this version is
doing and just stick to returning of the frame counter value(which marks vblanks).
There's no such thing as a "flip sequence" in the KMS API. There's only the per-CRTC vblank counter. Each flip completion event needs to contain the value of that counter when the hardware completed the flip, regardless of whether it was an async flip or not.
As for the timestamp in the event, I'm not sure what the expectations are for async flips, but I suspect it may not really matter. E.g. the timestamp calculated to correspond to the end of the previous vertical blank period might be fine.
Thanks Michel, Paulo, Daniel, Nicholas, Ville for your inputs. After all the discussions, looks like the async flip time stamp is not of much use to the userspace and the async flip sequence; hence we will stick to the approach of sending vblank time stamp itself and have a test case in the igt to cover the async flips cases in a slightly different way. And update the documentation.
Thanks a lot for all the inputs.
I will make changes in IGT to calculate the time stamps from userspace itself, as we have now concluded that the kernel will be returning vbl timestamps even in the case of async flips.
Also, as suggested by Daniel, I will be adding one more subtest to verify that the async flip time stamp lies in between the timestamps of the previous and the next vblank.
Thanks, Karthik.B.S
Thanks, Vandita
-- Earthling Michel Dänzer | https://redhat.com Libre software enthusiast | Mesa and X developer
dri-devel@lists.freedesktop.org