Hi Chris, for some reason I didn't receive the review email, so I copied your comments from patchwork and faked this email.
static void execlists_dequeue(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; @@ -1538,6 +1566,16 @@ static void execlists_dequeue(struct intel_engine_cs *engine) }
if (__i915_request_submit(rq)) {
/* hsdes: 1809175790 */
if ((GRAPHICS_VER(engine->i915) == 12) &&
rq->vd_ve_aux_inv &&
(engine->class == VIDEO_DECODE_CLASS ||
engine->class == VIDEO_ENHANCEMENT_CLASS)) {
We do not need the extra checks here; we just do as we are told. We only tell ourselves to apply the fixup when required.
Without checking GRAPHICS_VER, I'm seeing a lot of regressions on older platforms in the CI result. This workaround was only implemented for gen12 (gen12_emit_flush_xcs). Without checking engine->class, I'm seeing boot issues due to GEM_BUG_ON() in aux_inv_reg(). Any rq will go through this code regardless of engine class and gen version, so the checking seems to be necessary.
*rq->vd_ve_aux_inv = i915_mmio_reg_offset
Likewise, vd_ve is overspecific, aux_inv_fixup or aux_inv_wa (or wa_aux_iv, fixup_aux_inv).
I wanted to be specific because the workaround was only implemented for vd/ve engines. But I'm ok with your proposal.
(aux_inv_reg(engine));
rq->vd_ve_aux_inv = NULL;
Move this to i915_request initialisation so that we only set aux_inv when required, which probably explains the extra defence.
The pointer is currently initialized with 0x5a5a. I set it to NULL in gen12_emit_flush_xcs, otherwise the rq will enter that if-statement with an invalid pointer. I'm not familiar with the code, there seems to be multiple functions allocating the structure. I agree it's better to set it to NULL at initialization, but need some guidance on where is the best place to do so.
rq->execution_mask = engine->mask;
} if (!merge) { *port++ = i915_request_get(last); last = NULL;
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 28b1f9db5487..69de32e5e15d 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -350,6 +350,8 @@ struct i915_request { struct list_head link; unsigned long delay; } mock;)
u32 *vd_ve_aux_inv;
Not at the end of the struct; that's where we put things in the dungeon. The selftest struct should be last; I do hope no one has been putting things at random places in the struct without considering the layout and semantics. From the flow, this is akin to batch, capture_list; before emitted_jiffies would be a good spot.
Got it, will change. I thought adding at the end would be safer, thanks for the explanation.
-Chris