The Xe_HP architecture introduces compute engines as a new engine class. These compute command streamers (CCS) are similar to the render engine, except that they're intended for GPGPU usage and lack support for the 3D pipeline.
The definition of I915_ENGINE_CLASS_COMPUTE is new ABI; see below for a link to a UMD (compute) merge request that utilizes the new ABI.
This series adds some of the basic enablement for the CCS engines, but does not yet add them to the engine list for the relevant platforms (XeHP SDV and DG2); that will be handled in future series.
UMD (compute): https://github.com/intel/compute-runtime/pull/451
Daniele Ceraolo Spurio (1): drm/i915/xehp: compute engine pipe_control
John Harrison (1): drm/i915/xehp: Extend uninterruptible OpenCL workloads to CCS
Matt Roper (6): drm/i915/xehp: Define compute class and engine drm/i915/xehp: CCS shares the render reset domain drm/i915/xehp: Add Compute CS IRQ handlers drm/i915/xehp: CCS should use RCS setup functions drm/i915/xehp: Define context scheduling attributes in lrc descriptor drm/i915/xehp: Enable ccs/dual-ctx in RCU_MODE
.../drm/i915/gem/selftests/i915_gem_context.c | 8 ++-- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 31 ++++++++++----- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 39 ++++++++++++++++++- drivers/gpu/drm/i915/gt/intel_engine_types.h | 11 +++++- drivers/gpu/drm/i915/gt/intel_engine_user.c | 5 ++- .../drm/i915/gt/intel_execlists_submission.c | 34 +++++++++++++++- drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 15 +++++++ drivers/gpu/drm/i915/gt/intel_gt_irq.c | 15 ++++++- drivers/gpu/drm/i915/gt/intel_lrc.c | 4 +- drivers/gpu/drm/i915/gt/intel_lrc.h | 10 +++++ drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++ drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 13 ++++--- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 28 ++++++++++++- drivers/gpu/drm/i915/i915_drv.h | 2 + drivers/gpu/drm/i915/i915_perf.c | 4 +- drivers/gpu/drm/i915/i915_reg.h | 20 +++++++++- include/uapi/drm/i915_drm.h | 1 + 17 files changed, 215 insertions(+), 29 deletions(-)
Introduce a Compute Command Streamer (CCS), which has access to the media and GPGPU pipelines (but not the 3D pipeline).
To begin with, define the compute class/engine common functions, based on the existing render ones.
Bspec: 46167, 45544 Original-patch-by: Michel Thierry Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Vinay Belgaumkar vinay.belgaumkar@intel.com Cc: Szymon Morek szymon.morek@intel.com UMD (compute): https://github.com/intel/compute-runtime/pull/451 Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Aravind Iddamsetty aravind.iddamsetty@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 28 ++++++++++++++++++++ drivers/gpu/drm/i915/gt/intel_engine_types.h | 9 ++++++- drivers/gpu/drm/i915/gt/intel_engine_user.c | 5 +++- drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 13 +++++---- drivers/gpu/drm/i915/i915_reg.h | 8 ++++++ include/uapi/drm/i915_drm.h | 1 + 6 files changed, 57 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 332efea696a5..69944bd8c19d 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -153,6 +153,34 @@ static const struct engine_info intel_engines[] = { { .graphics_ver = 12, .base = XEHP_VEBOX4_RING_BASE } }, }, + [CCS0] = { + .class = COMPUTE_CLASS, + .instance = 0, + .mmio_bases = { + { .graphics_ver = 12, .base = GEN12_COMPUTE0_RING_BASE } + } + }, + [CCS1] = { + .class = COMPUTE_CLASS, + .instance = 1, + .mmio_bases = { + { .graphics_ver = 12, .base = GEN12_COMPUTE1_RING_BASE } + } + }, + [CCS2] = { + .class = COMPUTE_CLASS, + .instance = 2, + .mmio_bases = { + { .graphics_ver = 12, .base = GEN12_COMPUTE2_RING_BASE } + } + }, + [CCS3] = { + .class = COMPUTE_CLASS, + .instance = 3, + .mmio_bases = { + { .graphics_ver = 12, .base = GEN12_COMPUTE3_RING_BASE } + } + }, };
/** diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index bfbfe53c23dd..dcb9d8b2362a 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -33,7 +33,8 @@ #define VIDEO_ENHANCEMENT_CLASS 2 #define COPY_ENGINE_CLASS 3 #define OTHER_CLASS 4 -#define MAX_ENGINE_CLASS 4 +#define COMPUTE_CLASS 5 +#define MAX_ENGINE_CLASS 5 #define MAX_ENGINE_INSTANCE 7
#define I915_MAX_SLICES 3 @@ -95,6 +96,7 @@ struct i915_ctx_workarounds {
#define I915_MAX_VCS 8 #define I915_MAX_VECS 4 +#define I915_MAX_CCS 4
/* * Engine IDs definitions. @@ -117,6 +119,11 @@ enum intel_engine_id { VECS2, VECS3, #define _VECS(n) (VECS0 + (n)) + CCS0, + CCS1, + CCS2, + CCS3, +#define _CCS(n) (CCS0 + (n)) I915_NUM_ENGINES #define INVALID_ENGINE ((enum intel_engine_id)-1) }; diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c index 8f8bea08e734..d981621a7c30 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c @@ -47,6 +47,7 @@ static const u8 uabi_classes[] = { [COPY_ENGINE_CLASS] = I915_ENGINE_CLASS_COPY, [VIDEO_DECODE_CLASS] = I915_ENGINE_CLASS_VIDEO, [VIDEO_ENHANCEMENT_CLASS] = I915_ENGINE_CLASS_VIDEO_ENHANCE, + [COMPUTE_CLASS] = I915_ENGINE_CLASS_COMPUTE, };
static int engine_cmp(void *priv, const struct list_head *A, @@ -139,6 +140,7 @@ const char *intel_engine_class_repr(u8 class) [COPY_ENGINE_CLASS] = "bcs", [VIDEO_DECODE_CLASS] = "vcs", [VIDEO_ENHANCEMENT_CLASS] = "vecs", + [COMPUTE_CLASS] = "ccs", };
if (class >= ARRAY_SIZE(uabi_names) || !uabi_names[class]) @@ -162,6 +164,7 @@ static int legacy_ring_idx(const struct legacy_ring *ring) [COPY_ENGINE_CLASS] = { BCS0, 1 }, [VIDEO_DECODE_CLASS] = { VCS0, I915_MAX_VCS }, [VIDEO_ENHANCEMENT_CLASS] = { VECS0, I915_MAX_VECS }, + [COMPUTE_CLASS] = { CCS0, I915_MAX_CCS }, };
if (GEM_DEBUG_WARN_ON(ring->class >= ARRAY_SIZE(map))) @@ -190,7 +193,7 @@ static void add_legacy_ring(struct legacy_ring *ring, void intel_engines_driver_register(struct drm_i915_private *i915) { struct legacy_ring ring = {}; - u8 uabi_instances[4] = {}; + u8 uabi_instances[5] = {}; struct list_head *it, *next; struct rb_node **p, *prev; LIST_HEAD(engines); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h index fa4be13c8854..3f9007e4e895 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h @@ -45,8 +45,8 @@ #define GUC_VIDEO_CLASS 1 #define GUC_VIDEOENHANCE_CLASS 2 #define GUC_BLITTER_CLASS 3 -#define GUC_RESERVED_CLASS 4 -#define GUC_LAST_ENGINE_CLASS GUC_RESERVED_CLASS +#define GUC_COMPUTE_CLASS 4 +#define GUC_LAST_ENGINE_CLASS GUC_COMPUTE_CLASS #define GUC_MAX_ENGINE_CLASSES 16 #define GUC_MAX_INSTANCES_PER_CLASS 32
@@ -154,17 +154,20 @@ static inline u8 engine_class_to_guc_class(u8 class) BUILD_BUG_ON(GUC_BLITTER_CLASS != COPY_ENGINE_CLASS); BUILD_BUG_ON(GUC_VIDEO_CLASS != VIDEO_DECODE_CLASS); BUILD_BUG_ON(GUC_VIDEOENHANCE_CLASS != VIDEO_ENHANCEMENT_CLASS); + BUILD_BUG_ON(GUC_COMPUTE_CLASS != (COMPUTE_CLASS - 1)); GEM_BUG_ON(class > MAX_ENGINE_CLASS || class == OTHER_CLASS);
- return class; + /* the GuC arrays don't include OTHER_CLASS */ + return class < OTHER_CLASS ? class : class - 1; }
static inline u8 guc_class_to_engine_class(u8 guc_class) { + BUILD_BUG_ON(GUC_COMPUTE_CLASS != OTHER_CLASS); + BUILD_BUG_ON(GUC_LAST_ENGINE_CLASS != (MAX_ENGINE_CLASS - 1)); GEM_BUG_ON(guc_class > GUC_LAST_ENGINE_CLASS); - GEM_BUG_ON(guc_class == GUC_RESERVED_CLASS);
- return guc_class; + return guc_class < GUC_COMPUTE_CLASS ? guc_class : guc_class + 1; }
/* Work item for submitting workloads into work queue of GuC. */ diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index c2853cc005ee..33d6aa0b07c1 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -2528,6 +2528,10 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define GEN11_VEBOX2_RING_BASE 0x1d8000 #define XEHP_VEBOX3_RING_BASE 0x1e8000 #define XEHP_VEBOX4_RING_BASE 0x1f8000 +#define GEN12_COMPUTE0_RING_BASE 0x1a000 +#define GEN12_COMPUTE1_RING_BASE 0x1c000 +#define GEN12_COMPUTE2_RING_BASE 0x1e000 +#define GEN12_COMPUTE3_RING_BASE 0x26000 #define BLT_RING_BASE 0x22000 #define RING_TAIL(base) _MMIO((base) + 0x30) #define RING_HEAD(base) _MMIO((base) + 0x34) @@ -8100,6 +8104,10 @@ enum { #define GEN11_KCR (19) #define GEN11_GTPM (16) #define GEN11_BCS (15) +#define GEN12_CCS3 (7) +#define GEN12_CCS2 (6) +#define GEN12_CCS1 (5) +#define GEN12_CCS0 (4) #define GEN11_RCS0 (0)
#define GEN11_GT_INTR_DW1 _MMIO(0x19001c) diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index bde5860b3686..9540f33523d8 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -166,6 +166,7 @@ enum drm_i915_gem_engine_class { I915_ENGINE_CLASS_COPY = 1, I915_ENGINE_CLASS_VIDEO = 2, I915_ENGINE_CLASS_VIDEO_ENHANCE = 3, + I915_ENGINE_CLASS_COMPUTE = 4,
/* should be kept compact */
On 07/09/2021 18:19, Matt Roper wrote:
Introduce a Compute Command Streamer (CCS), which has access to the media and GPGPU pipelines (but not the 3D pipeline).
To begin with, define the compute class/engine common functions, based on the existing render ones.
Bspec: 46167, 45544 Original-patch-by: Michel Thierry Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Vinay Belgaumkar vinay.belgaumkar@intel.com Cc: Szymon Morek szymon.morek@intel.com UMD (compute): https://github.com/intel/compute-runtime/pull/451 Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Aravind Iddamsetty aravind.iddamsetty@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com
drivers/gpu/drm/i915/gt/intel_engine_cs.c | 28 ++++++++++++++++++++ drivers/gpu/drm/i915/gt/intel_engine_types.h | 9 ++++++- drivers/gpu/drm/i915/gt/intel_engine_user.c | 5 +++- drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 13 +++++---- drivers/gpu/drm/i915/i915_reg.h | 8 ++++++ include/uapi/drm/i915_drm.h | 1 + 6 files changed, 57 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 332efea696a5..69944bd8c19d 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -153,6 +153,34 @@ static const struct engine_info intel_engines[] = { { .graphics_ver = 12, .base = XEHP_VEBOX4_RING_BASE } }, },
[CCS0] = {
.class = COMPUTE_CLASS,
.instance = 0,
.mmio_bases = {
{ .graphics_ver = 12, .base = GEN12_COMPUTE0_RING_BASE }
}
},
[CCS1] = {
.class = COMPUTE_CLASS,
.instance = 1,
.mmio_bases = {
{ .graphics_ver = 12, .base = GEN12_COMPUTE1_RING_BASE }
}
},
[CCS2] = {
.class = COMPUTE_CLASS,
.instance = 2,
.mmio_bases = {
{ .graphics_ver = 12, .base = GEN12_COMPUTE2_RING_BASE }
}
},
[CCS3] = {
.class = COMPUTE_CLASS,
.instance = 3,
.mmio_bases = {
{ .graphics_ver = 12, .base = GEN12_COMPUTE3_RING_BASE }
}
}, };
/**
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index bfbfe53c23dd..dcb9d8b2362a 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -33,7 +33,8 @@ #define VIDEO_ENHANCEMENT_CLASS 2 #define COPY_ENGINE_CLASS 3 #define OTHER_CLASS 4 -#define MAX_ENGINE_CLASS 4 +#define COMPUTE_CLASS 5 +#define MAX_ENGINE_CLASS 5 #define MAX_ENGINE_INSTANCE 7
#define I915_MAX_SLICES 3 @@ -95,6 +96,7 @@ struct i915_ctx_workarounds {
#define I915_MAX_VCS 8 #define I915_MAX_VECS 4 +#define I915_MAX_CCS 4
/*
- Engine IDs definitions.
@@ -117,6 +119,11 @@ enum intel_engine_id { VECS2, VECS3, #define _VECS(n) (VECS0 + (n))
- CCS0,
- CCS1,
- CCS2,
- CCS3,
+#define _CCS(n) (CCS0 + (n)) I915_NUM_ENGINES #define INVALID_ENGINE ((enum intel_engine_id)-1) }; diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c index 8f8bea08e734..d981621a7c30 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c @@ -47,6 +47,7 @@ static const u8 uabi_classes[] = { [COPY_ENGINE_CLASS] = I915_ENGINE_CLASS_COPY, [VIDEO_DECODE_CLASS] = I915_ENGINE_CLASS_VIDEO, [VIDEO_ENHANCEMENT_CLASS] = I915_ENGINE_CLASS_VIDEO_ENHANCE,
[COMPUTE_CLASS] = I915_ENGINE_CLASS_COMPUTE, };
static int engine_cmp(void *priv, const struct list_head *A,
@@ -139,6 +140,7 @@ const char *intel_engine_class_repr(u8 class) [COPY_ENGINE_CLASS] = "bcs", [VIDEO_DECODE_CLASS] = "vcs", [VIDEO_ENHANCEMENT_CLASS] = "vecs",
[COMPUTE_CLASS] = "ccs",
};
if (class >= ARRAY_SIZE(uabi_names) || !uabi_names[class])
@@ -162,6 +164,7 @@ static int legacy_ring_idx(const struct legacy_ring *ring) [COPY_ENGINE_CLASS] = { BCS0, 1 }, [VIDEO_DECODE_CLASS] = { VCS0, I915_MAX_VCS }, [VIDEO_ENHANCEMENT_CLASS] = { VECS0, I915_MAX_VECS },
[COMPUTE_CLASS] = { CCS0, I915_MAX_CCS },
};
if (GEM_DEBUG_WARN_ON(ring->class >= ARRAY_SIZE(map)))
@@ -190,7 +193,7 @@ static void add_legacy_ring(struct legacy_ring *ring, void intel_engines_driver_register(struct drm_i915_private *i915) { struct legacy_ring ring = {};
- u8 uabi_instances[4] = {};
- u8 uabi_instances[5] = {}; struct list_head *it, *next; struct rb_node **p, *prev; LIST_HEAD(engines);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h index fa4be13c8854..3f9007e4e895 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h @@ -45,8 +45,8 @@ #define GUC_VIDEO_CLASS 1 #define GUC_VIDEOENHANCE_CLASS 2 #define GUC_BLITTER_CLASS 3 -#define GUC_RESERVED_CLASS 4 -#define GUC_LAST_ENGINE_CLASS GUC_RESERVED_CLASS +#define GUC_COMPUTE_CLASS 4 +#define GUC_LAST_ENGINE_CLASS GUC_COMPUTE_CLASS #define GUC_MAX_ENGINE_CLASSES 16 #define GUC_MAX_INSTANCES_PER_CLASS 32
@@ -154,17 +154,20 @@ static inline u8 engine_class_to_guc_class(u8 class) BUILD_BUG_ON(GUC_BLITTER_CLASS != COPY_ENGINE_CLASS); BUILD_BUG_ON(GUC_VIDEO_CLASS != VIDEO_DECODE_CLASS); BUILD_BUG_ON(GUC_VIDEOENHANCE_CLASS != VIDEO_ENHANCEMENT_CLASS);
- BUILD_BUG_ON(GUC_COMPUTE_CLASS != (COMPUTE_CLASS - 1)); GEM_BUG_ON(class > MAX_ENGINE_CLASS || class == OTHER_CLASS);
- return class;
/* the GuC arrays don't include OTHER_CLASS */
return class < OTHER_CLASS ? class : class - 1; }
static inline u8 guc_class_to_engine_class(u8 guc_class) {
BUILD_BUG_ON(GUC_COMPUTE_CLASS != OTHER_CLASS);
In engine_class_to_guc_class we have:
BUILD_BUG_ON(GUC_COMPUTE_CLASS != (COMPUTE_CLASS - 1));
Maybe there's a reason why two "flavours" are used, like to convey something semantically? (I know the value is the same, it's just a bit confusing.)
Overall I would perhaps consider tables for both transformations. Not because of runtime concerns since neither seem on a hot path, but to avoid a "table" (of sorts) expressed with BUILD_BUG_ONs, followed by transform functions. Table would simply express both in one block.
- BUILD_BUG_ON(GUC_LAST_ENGINE_CLASS != (MAX_ENGINE_CLASS - 1)); GEM_BUG_ON(guc_class > GUC_LAST_ENGINE_CLASS);
GEM_BUG_ON(guc_class == GUC_RESERVED_CLASS);
return guc_class;
return guc_class < GUC_COMPUTE_CLASS ? guc_class : guc_class + 1; }
/* Work item for submitting workloads into work queue of GuC. */
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index c2853cc005ee..33d6aa0b07c1 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -2528,6 +2528,10 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define GEN11_VEBOX2_RING_BASE 0x1d8000 #define XEHP_VEBOX3_RING_BASE 0x1e8000 #define XEHP_VEBOX4_RING_BASE 0x1f8000 +#define GEN12_COMPUTE0_RING_BASE 0x1a000 +#define GEN12_COMPUTE1_RING_BASE 0x1c000 +#define GEN12_COMPUTE2_RING_BASE 0x1e000 +#define GEN12_COMPUTE3_RING_BASE 0x26000 #define BLT_RING_BASE 0x22000 #define RING_TAIL(base) _MMIO((base) + 0x30) #define RING_HEAD(base) _MMIO((base) + 0x34) @@ -8100,6 +8104,10 @@ enum { #define GEN11_KCR (19) #define GEN11_GTPM (16) #define GEN11_BCS (15) +#define GEN12_CCS3 (7) +#define GEN12_CCS2 (6) +#define GEN12_CCS1 (5) +#define GEN12_CCS0 (4) #define GEN11_RCS0 (0)
#define GEN11_GT_INTR_DW1 _MMIO(0x19001c) diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index bde5860b3686..9540f33523d8 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -166,6 +166,7 @@ enum drm_i915_gem_engine_class { I915_ENGINE_CLASS_COPY = 1, I915_ENGINE_CLASS_VIDEO = 2, I915_ENGINE_CLASS_VIDEO_ENHANCE = 3,
I915_ENGINE_CLASS_COMPUTE = 4,
/* should be kept compact */
Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Regards,
Tvrtko
On Tue, Sep 07, 2021 at 10:19:09AM -0700, Matt Roper wrote:
Introduce a Compute Command Streamer (CCS), which has access to the media and GPGPU pipelines (but not the 3D pipeline).
To begin with, define the compute class/engine common functions, based on the existing render ones.
Bspec: 46167, 45544 Original-patch-by: Michel Thierry Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Vinay Belgaumkar vinay.belgaumkar@intel.com Cc: Szymon Morek szymon.morek@intel.com UMD (compute): https://github.com/intel/compute-runtime/pull/451 Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Aravind Iddamsetty aravind.iddamsetty@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com
drivers/gpu/drm/i915/gt/intel_engine_cs.c | 28 ++++++++++++++++++++ drivers/gpu/drm/i915/gt/intel_engine_types.h | 9 ++++++- drivers/gpu/drm/i915/gt/intel_engine_user.c | 5 +++- drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 13 +++++---- drivers/gpu/drm/i915/i915_reg.h | 8 ++++++ include/uapi/drm/i915_drm.h | 1 + 6 files changed, 57 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 332efea696a5..69944bd8c19d 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -153,6 +153,34 @@ static const struct engine_info intel_engines[] = { { .graphics_ver = 12, .base = XEHP_VEBOX4_RING_BASE } }, },
- [CCS0] = {
.class = COMPUTE_CLASS,
.instance = 0,
.mmio_bases = {
{ .graphics_ver = 12, .base = GEN12_COMPUTE0_RING_BASE }
}
- },
- [CCS1] = {
.class = COMPUTE_CLASS,
.instance = 1,
.mmio_bases = {
{ .graphics_ver = 12, .base = GEN12_COMPUTE1_RING_BASE }
}
- },
- [CCS2] = {
.class = COMPUTE_CLASS,
.instance = 2,
.mmio_bases = {
{ .graphics_ver = 12, .base = GEN12_COMPUTE2_RING_BASE }
}
- },
- [CCS3] = {
.class = COMPUTE_CLASS,
.instance = 3,
.mmio_bases = {
{ .graphics_ver = 12, .base = GEN12_COMPUTE3_RING_BASE }
}
- },
};
/** diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index bfbfe53c23dd..dcb9d8b2362a 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -33,7 +33,8 @@ #define VIDEO_ENHANCEMENT_CLASS 2 #define COPY_ENGINE_CLASS 3 #define OTHER_CLASS 4 -#define MAX_ENGINE_CLASS 4 +#define COMPUTE_CLASS 5 +#define MAX_ENGINE_CLASS 5 #define MAX_ENGINE_INSTANCE 7
#define I915_MAX_SLICES 3 @@ -95,6 +96,7 @@ struct i915_ctx_workarounds {
#define I915_MAX_VCS 8 #define I915_MAX_VECS 4 +#define I915_MAX_CCS 4
/*
- Engine IDs definitions.
@@ -117,6 +119,11 @@ enum intel_engine_id { VECS2, VECS3, #define _VECS(n) (VECS0 + (n))
- CCS0,
- CCS1,
- CCS2,
- CCS3,
+#define _CCS(n) (CCS0 + (n)) I915_NUM_ENGINES #define INVALID_ENGINE ((enum intel_engine_id)-1) }; diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c index 8f8bea08e734..d981621a7c30 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c @@ -47,6 +47,7 @@ static const u8 uabi_classes[] = { [COPY_ENGINE_CLASS] = I915_ENGINE_CLASS_COPY, [VIDEO_DECODE_CLASS] = I915_ENGINE_CLASS_VIDEO, [VIDEO_ENHANCEMENT_CLASS] = I915_ENGINE_CLASS_VIDEO_ENHANCE,
- [COMPUTE_CLASS] = I915_ENGINE_CLASS_COMPUTE,
};
static int engine_cmp(void *priv, const struct list_head *A, @@ -139,6 +140,7 @@ const char *intel_engine_class_repr(u8 class) [COPY_ENGINE_CLASS] = "bcs", [VIDEO_DECODE_CLASS] = "vcs", [VIDEO_ENHANCEMENT_CLASS] = "vecs",
[COMPUTE_CLASS] = "ccs",
};
if (class >= ARRAY_SIZE(uabi_names) || !uabi_names[class])
@@ -162,6 +164,7 @@ static int legacy_ring_idx(const struct legacy_ring *ring) [COPY_ENGINE_CLASS] = { BCS0, 1 }, [VIDEO_DECODE_CLASS] = { VCS0, I915_MAX_VCS }, [VIDEO_ENHANCEMENT_CLASS] = { VECS0, I915_MAX_VECS },
[COMPUTE_CLASS] = { CCS0, I915_MAX_CCS },
};
if (GEM_DEBUG_WARN_ON(ring->class >= ARRAY_SIZE(map)))
@@ -190,7 +193,7 @@ static void add_legacy_ring(struct legacy_ring *ring, void intel_engines_driver_register(struct drm_i915_private *i915) { struct legacy_ring ring = {};
- u8 uabi_instances[4] = {};
- u8 uabi_instances[5] = {}; struct list_head *it, *next; struct rb_node **p, *prev; LIST_HEAD(engines);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h index fa4be13c8854..3f9007e4e895 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h @@ -45,8 +45,8 @@ #define GUC_VIDEO_CLASS 1 #define GUC_VIDEOENHANCE_CLASS 2 #define GUC_BLITTER_CLASS 3 -#define GUC_RESERVED_CLASS 4 -#define GUC_LAST_ENGINE_CLASS GUC_RESERVED_CLASS +#define GUC_COMPUTE_CLASS 4 +#define GUC_LAST_ENGINE_CLASS GUC_COMPUTE_CLASS #define GUC_MAX_ENGINE_CLASSES 16 #define GUC_MAX_INSTANCES_PER_CLASS 32
@@ -154,17 +154,20 @@ static inline u8 engine_class_to_guc_class(u8 class) BUILD_BUG_ON(GUC_BLITTER_CLASS != COPY_ENGINE_CLASS); BUILD_BUG_ON(GUC_VIDEO_CLASS != VIDEO_DECODE_CLASS); BUILD_BUG_ON(GUC_VIDEOENHANCE_CLASS != VIDEO_ENHANCEMENT_CLASS);
- BUILD_BUG_ON(GUC_COMPUTE_CLASS != (COMPUTE_CLASS - 1)); GEM_BUG_ON(class > MAX_ENGINE_CLASS || class == OTHER_CLASS);
- return class;
- /* the GuC arrays don't include OTHER_CLASS */
- return class < OTHER_CLASS ? class : class - 1;
}
static inline u8 guc_class_to_engine_class(u8 guc_class) {
- BUILD_BUG_ON(GUC_COMPUTE_CLASS != OTHER_CLASS);
- BUILD_BUG_ON(GUC_LAST_ENGINE_CLASS != (MAX_ENGINE_CLASS - 1)); GEM_BUG_ON(guc_class > GUC_LAST_ENGINE_CLASS);
GEM_BUG_ON(guc_class == GUC_RESERVED_CLASS);
return guc_class;
- return guc_class < GUC_COMPUTE_CLASS ? guc_class : guc_class + 1;
}
/* Work item for submitting workloads into work queue of GuC. */ diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index c2853cc005ee..33d6aa0b07c1 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -2528,6 +2528,10 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define GEN11_VEBOX2_RING_BASE 0x1d8000 #define XEHP_VEBOX3_RING_BASE 0x1e8000 #define XEHP_VEBOX4_RING_BASE 0x1f8000 +#define GEN12_COMPUTE0_RING_BASE 0x1a000 +#define GEN12_COMPUTE1_RING_BASE 0x1c000 +#define GEN12_COMPUTE2_RING_BASE 0x1e000 +#define GEN12_COMPUTE3_RING_BASE 0x26000 #define BLT_RING_BASE 0x22000 #define RING_TAIL(base) _MMIO((base) + 0x30) #define RING_HEAD(base) _MMIO((base) + 0x34) @@ -8100,6 +8104,10 @@ enum { #define GEN11_KCR (19) #define GEN11_GTPM (16) #define GEN11_BCS (15) +#define GEN12_CCS3 (7) +#define GEN12_CCS2 (6) +#define GEN12_CCS1 (5) +#define GEN12_CCS0 (4) #define GEN11_RCS0 (0)
#define GEN11_GT_INTR_DW1 _MMIO(0x19001c) diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index bde5860b3686..9540f33523d8 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -166,6 +166,7 @@ enum drm_i915_gem_engine_class {
Please add kerneldoc for any updated/new uapi. -Daniel
I915_ENGINE_CLASS_COPY = 1, I915_ENGINE_CLASS_VIDEO = 2, I915_ENGINE_CLASS_VIDEO_ENHANCE = 3,
I915_ENGINE_CLASS_COMPUTE = 4,
/* should be kept compact */
-- 2.25.4
The reset domain is shared between render and all compute engines, so resetting one will affect the others.
Note: Before performing a reset on an RCS or CCS engine, the GuC will attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid impacting other clients (since some shared modules will be reset). If other engines are executing non-preemptable workloads, the impact is unavoidable and some work may be lost.
Bspec: 52549 Original-patch-by: Michel Thierry Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Vinay Belgaumkar vinay.belgaumkar@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Aravind Iddamsetty aravind.iddamsetty@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com --- drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index 91200c43951f..30598c1d070c 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt, [VECS1] = GEN11_GRDOM_VECS2, [VECS2] = GEN11_GRDOM_VECS3, [VECS3] = GEN11_GRDOM_VECS4, + [CCS0] = GEN11_GRDOM_RENDER, + [CCS1] = GEN11_GRDOM_RENDER, + [CCS2] = GEN11_GRDOM_RENDER, + [CCS3] = GEN11_GRDOM_RENDER, }; struct intel_engine_cs *engine; intel_engine_mask_t tmp;
On 07/09/2021 18:19, Matt Roper wrote:
The reset domain is shared between render and all compute engines, so resetting one will affect the others.
Note: Before performing a reset on an RCS or CCS engine, the GuC will attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid impacting other clients (since some shared modules will be reset). If other engines are executing non-preemptable workloads, the impact is unavoidable and some work may be lost.
Since here it talks about engine reset, should this patch add warning if same is attempted by i915 on a GuC platform - to document it is not implemented/supported? Or perhaps later in the series, or future series works better.
Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Regards,
Tvrtko
Bspec: 52549 Original-patch-by: Michel Thierry Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Vinay Belgaumkar vinay.belgaumkar@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Aravind Iddamsetty aravind.iddamsetty@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com
drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index 91200c43951f..30598c1d070c 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt, [VECS1] = GEN11_GRDOM_VECS2, [VECS2] = GEN11_GRDOM_VECS3, [VECS3] = GEN11_GRDOM_VECS4,
[CCS0] = GEN11_GRDOM_RENDER,
[CCS1] = GEN11_GRDOM_RENDER,
[CCS2] = GEN11_GRDOM_RENDER,
}; struct intel_engine_cs *engine; intel_engine_mask_t tmp;[CCS3] = GEN11_GRDOM_RENDER,
On Wed, Sep 08, 2021 at 11:07:07AM +0100, Tvrtko Ursulin wrote:
On 07/09/2021 18:19, Matt Roper wrote:
The reset domain is shared between render and all compute engines, so resetting one will affect the others.
Note: Before performing a reset on an RCS or CCS engine, the GuC will attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid impacting other clients (since some shared modules will be reset). If other engines are executing non-preemptable workloads, the impact is unavoidable and some work may be lost.
Since here it talks about engine reset, should this patch add warning if same is attempted by i915 on a GuC platform - to document it is not
Did you mean "on a *non* GuC platform" here? We aren't going to have compute engine support on any platforms where GuC submission isn't the default operating model, so the only way to get compute engines + execlist submission is to force an override via module parameters (e.g., enable_guc=0). Doing so will taint the kernel, so I think the current consensus from offline discussion is that the user has already put themselves into a configuration where it's easier than usual to shoot themselves in the foot; it's not too much different than the kind of trouble a user could get themselves into if they loaded the driver with hangcheck disabled or something.
Matt
implemented/supported? Or perhaps later in the series, or future series works better.
Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Regards,
Tvrtko
Bspec: 52549 Original-patch-by: Michel Thierry Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Vinay Belgaumkar vinay.belgaumkar@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Aravind Iddamsetty aravind.iddamsetty@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com
drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index 91200c43951f..30598c1d070c 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt, [VECS1] = GEN11_GRDOM_VECS2, [VECS2] = GEN11_GRDOM_VECS3, [VECS3] = GEN11_GRDOM_VECS4,
[CCS0] = GEN11_GRDOM_RENDER,
[CCS1] = GEN11_GRDOM_RENDER,
[CCS2] = GEN11_GRDOM_RENDER,
}; struct intel_engine_cs *engine; intel_engine_mask_t tmp;[CCS3] = GEN11_GRDOM_RENDER,
On 08/09/2021 21:23, Matt Roper wrote:
On Wed, Sep 08, 2021 at 11:07:07AM +0100, Tvrtko Ursulin wrote:
On 07/09/2021 18:19, Matt Roper wrote:
The reset domain is shared between render and all compute engines, so resetting one will affect the others.
Note: Before performing a reset on an RCS or CCS engine, the GuC will attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid impacting other clients (since some shared modules will be reset). If other engines are executing non-preemptable workloads, the impact is unavoidable and some work may be lost.
Since here it talks about engine reset, should this patch add warning if same is attempted by i915 on a GuC platform - to document it is not
Did you mean "on a *non* GuC platform" here? We aren't going to have compute engine support on any platforms where GuC submission isn't the default operating model, so the only way to get compute engines + execlist submission is to force an override via module parameters (e.g., enable_guc=0). Doing so will taint the kernel, so I think the current consensus from offline discussion is that the user has already put themselves into a configuration where it's easier than usual to shoot themselves in the foot; it's not too much different than the kind of trouble a user could get themselves into if they loaded the driver with hangcheck disabled or something.
Yes I meant non GuC. :)
Okay..ish, although I think an explicit warn would still be better. Because it is one thing to taint and another to actively allow something which we know cannot work.
Unless we could hide the CCS engine until GuC gets loaded, which would make i915.enable_guc=0 safe. Hm.. should be doable actually to skip intel_engine_add_user in the engine init phase and do the CCS ones after GuC has been loaded. Would that make sense?
Regards,
Tvrtko
implemented/supported? Or perhaps later in the series, or future series works better.
Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Regards,
Tvrtko
Bspec: 52549 Original-patch-by: Michel Thierry Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Vinay Belgaumkar vinay.belgaumkar@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Aravind Iddamsetty aravind.iddamsetty@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com
drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index 91200c43951f..30598c1d070c 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt, [VECS1] = GEN11_GRDOM_VECS2, [VECS2] = GEN11_GRDOM_VECS3, [VECS3] = GEN11_GRDOM_VECS4,
[CCS0] = GEN11_GRDOM_RENDER,
[CCS1] = GEN11_GRDOM_RENDER,
[CCS2] = GEN11_GRDOM_RENDER,
}; struct intel_engine_cs *engine; intel_engine_mask_t tmp;[CCS3] = GEN11_GRDOM_RENDER,
On Tue, Sep 07, 2021 at 10:19:10AM -0700, Matt Roper wrote:
The reset domain is shared between render and all compute engines, so resetting one will affect the others.
Note: Before performing a reset on an RCS or CCS engine, the GuC will attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid impacting other clients (since some shared modules will be reset). If other engines are executing non-preemptable workloads, the impact is unavoidable and some work may be lost.
Bspec: 52549 Original-patch-by: Michel Thierry Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Vinay Belgaumkar vinay.belgaumkar@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Aravind Iddamsetty aravind.iddamsetty@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com
Do we have igts validating this all properly?
Specifically that the reset stats are incremented correctly for guilty respectively victimized contexts.
This is necessary if it doesn't exist yet.
Also you need a patch set here that fixes up the igts which have wrong assumptions about context isolation. -Daniel
drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index 91200c43951f..30598c1d070c 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt, [VECS1] = GEN11_GRDOM_VECS2, [VECS2] = GEN11_GRDOM_VECS3, [VECS3] = GEN11_GRDOM_VECS4,
[CCS0] = GEN11_GRDOM_RENDER,
[CCS1] = GEN11_GRDOM_RENDER,
[CCS2] = GEN11_GRDOM_RENDER,
}; struct intel_engine_cs *engine; intel_engine_mask_t tmp;[CCS3] = GEN11_GRDOM_RENDER,
-- 2.25.4
Add execlists and GuC interrupts for compute CS into existing IRQ handlers.
All compute command streamers belong to the same compute class, so the only change needed to enable their interrupts is to program their GT engine interrupt mask registers.
CCS0 shares the register with CCS1, while CCS2 and CCS3 are in a new one.
BSpec: 50844, 54029, 54030, 53223, 53224. Original-patch-by: Michel Thierry Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Vinay Belgaumkar vinay.belgaumkar@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Aravind Iddamsetty aravind.iddamsetty@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com --- drivers/gpu/drm/i915/gt/intel_gt_irq.c | 15 ++++++++++++++- drivers/gpu/drm/i915/i915_drv.h | 2 ++ drivers/gpu/drm/i915/i915_reg.h | 3 +++ 3 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c index b2de83be4d97..612281d47513 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c @@ -96,7 +96,7 @@ gen11_gt_identity_handler(struct intel_gt *gt, const u32 identity) if (unlikely(!intr)) return;
- if (class <= COPY_ENGINE_CLASS) + if (class <= COPY_ENGINE_CLASS || class == COMPUTE_CLASS) return gen11_engine_irq_handler(gt, class, instance, intr);
if (class == OTHER_CLASS) @@ -178,6 +178,8 @@ void gen11_gt_irq_reset(struct intel_gt *gt) /* Disable RCS, BCS, VCS and VECS class engines. */ intel_uncore_write(uncore, GEN11_RENDER_COPY_INTR_ENABLE, 0); intel_uncore_write(uncore, GEN11_VCS_VECS_INTR_ENABLE, 0); + if (CCS_MASK(gt)) + intel_uncore_write(uncore, GEN12_CCS_RSVD_INTR_ENABLE, 0);
/* Restore masks irqs on RCS, BCS, VCS and VECS engines. */ intel_uncore_write(uncore, GEN11_RCS0_RSVD_INTR_MASK, ~0); @@ -191,6 +193,10 @@ void gen11_gt_irq_reset(struct intel_gt *gt) intel_uncore_write(uncore, GEN11_VECS0_VECS1_INTR_MASK, ~0); if (HAS_ENGINE(gt, VECS2) || HAS_ENGINE(gt, VECS3)) intel_uncore_write(uncore, GEN12_VECS2_VECS3_INTR_MASK, ~0); + if (HAS_ENGINE(gt, CCS0) || HAS_ENGINE(gt, CCS1)) + intel_uncore_write(uncore, GEN12_CCS0_CCS1_INTR_MASK, ~0); + if (HAS_ENGINE(gt, CCS2) || HAS_ENGINE(gt, CCS3)) + intel_uncore_write(uncore, GEN12_CCS2_CCS3_INTR_MASK, ~0);
intel_uncore_write(uncore, GEN11_GPM_WGBOXPERF_INTR_ENABLE, 0); intel_uncore_write(uncore, GEN11_GPM_WGBOXPERF_INTR_MASK, ~0); @@ -218,6 +224,8 @@ void gen11_gt_irq_postinstall(struct intel_gt *gt) /* Enable RCS, BCS, VCS and VECS class interrupts. */ intel_uncore_write(uncore, GEN11_RENDER_COPY_INTR_ENABLE, dmask); intel_uncore_write(uncore, GEN11_VCS_VECS_INTR_ENABLE, dmask); + if (CCS_MASK(gt)) + intel_uncore_write(uncore, GEN12_CCS_RSVD_INTR_ENABLE, smask);
/* Unmask irqs on RCS, BCS, VCS and VECS engines. */ intel_uncore_write(uncore, GEN11_RCS0_RSVD_INTR_MASK, ~smask); @@ -231,6 +239,11 @@ void gen11_gt_irq_postinstall(struct intel_gt *gt) intel_uncore_write(uncore, GEN11_VECS0_VECS1_INTR_MASK, ~dmask); if (HAS_ENGINE(gt, VECS2) || HAS_ENGINE(gt, VECS3)) intel_uncore_write(uncore, GEN12_VECS2_VECS3_INTR_MASK, ~dmask); + if (HAS_ENGINE(gt, CCS0) || HAS_ENGINE(gt, CCS1)) + intel_uncore_write(uncore, GEN12_CCS0_CCS1_INTR_MASK, ~dmask); + if (HAS_ENGINE(gt, CCS2) || HAS_ENGINE(gt, CCS3)) + intel_uncore_write(uncore, GEN12_CCS2_CCS3_INTR_MASK, ~dmask); + /* * RPS interrupts will get enabled/disabled on demand when RPS itself * is enabled/disabled. diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 1fd3040b6771..5b6eee5d8ade 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1573,6 +1573,8 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, ENGINE_INSTANCES_MASK(gt, VCS0, I915_MAX_VCS) #define VEBOX_MASK(gt) \ ENGINE_INSTANCES_MASK(gt, VECS0, I915_MAX_VECS) +#define CCS_MASK(gt) \ + ENGINE_INSTANCES_MASK(gt, CCS0, I915_MAX_CCS)
/* * The Gen7 cmdparser copies the scanned buffer to the ggtt for execution diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 33d6aa0b07c1..31e9c2cc4c0c 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -8139,6 +8139,7 @@ enum { #define GEN11_GPM_WGBOXPERF_INTR_ENABLE _MMIO(0x19003c) #define GEN11_CRYPTO_RSVD_INTR_ENABLE _MMIO(0x190040) #define GEN11_GUNIT_CSME_INTR_ENABLE _MMIO(0x190044) +#define GEN12_CCS_RSVD_INTR_ENABLE _MMIO(0x190048)
#define GEN11_RCS0_RSVD_INTR_MASK _MMIO(0x190090) #define GEN11_BCS_RSVD_INTR_MASK _MMIO(0x1900a0) @@ -8152,6 +8153,8 @@ enum { #define GEN11_GPM_WGBOXPERF_INTR_MASK _MMIO(0x1900ec) #define GEN11_CRYPTO_RSVD_INTR_MASK _MMIO(0x1900f0) #define GEN11_GUNIT_CSME_INTR_MASK _MMIO(0x1900f4) +#define GEN12_CCS0_CCS1_INTR_MASK _MMIO(0x190100) +#define GEN12_CCS2_CCS3_INTR_MASK _MMIO(0x190104)
#define ENGINE1_MASK REG_GENMASK(31, 16) #define ENGINE0_MASK REG_GENMASK(15, 0)
On 07/09/2021 18:19, Matt Roper wrote:
Add execlists and GuC interrupts for compute CS into existing IRQ handlers.
All compute command streamers belong to the same compute class, so the only change needed to enable their interrupts is to program their GT engine interrupt mask registers.
CCS0 shares the register with CCS1, while CCS2 and CCS3 are in a new one.
BSpec: 50844, 54029, 54030, 53223, 53224. Original-patch-by: Michel Thierry Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Vinay Belgaumkar vinay.belgaumkar@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Aravind Iddamsetty aravind.iddamsetty@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com
drivers/gpu/drm/i915/gt/intel_gt_irq.c | 15 ++++++++++++++- drivers/gpu/drm/i915/i915_drv.h | 2 ++ drivers/gpu/drm/i915/i915_reg.h | 3 +++ 3 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c index b2de83be4d97..612281d47513 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c @@ -96,7 +96,7 @@ gen11_gt_identity_handler(struct intel_gt *gt, const u32 identity) if (unlikely(!intr)) return;
- if (class <= COPY_ENGINE_CLASS)
if (class <= COPY_ENGINE_CLASS || class == COMPUTE_CLASS) return gen11_engine_irq_handler(gt, class, instance, intr);
if (class == OTHER_CLASS)
@@ -178,6 +178,8 @@ void gen11_gt_irq_reset(struct intel_gt *gt) /* Disable RCS, BCS, VCS and VECS class engines. */ intel_uncore_write(uncore, GEN11_RENDER_COPY_INTR_ENABLE, 0); intel_uncore_write(uncore, GEN11_VCS_VECS_INTR_ENABLE, 0);
if (CCS_MASK(gt))
intel_uncore_write(uncore, GEN12_CCS_RSVD_INTR_ENABLE, 0);
/* Restore masks irqs on RCS, BCS, VCS and VECS engines. */ intel_uncore_write(uncore, GEN11_RCS0_RSVD_INTR_MASK, ~0);
@@ -191,6 +193,10 @@ void gen11_gt_irq_reset(struct intel_gt *gt) intel_uncore_write(uncore, GEN11_VECS0_VECS1_INTR_MASK, ~0); if (HAS_ENGINE(gt, VECS2) || HAS_ENGINE(gt, VECS3)) intel_uncore_write(uncore, GEN12_VECS2_VECS3_INTR_MASK, ~0);
if (HAS_ENGINE(gt, CCS0) || HAS_ENGINE(gt, CCS1))
intel_uncore_write(uncore, GEN12_CCS0_CCS1_INTR_MASK, ~0);
if (HAS_ENGINE(gt, CCS2) || HAS_ENGINE(gt, CCS3))
intel_uncore_write(uncore, GEN12_CCS2_CCS3_INTR_MASK, ~0);
intel_uncore_write(uncore, GEN11_GPM_WGBOXPERF_INTR_ENABLE, 0); intel_uncore_write(uncore, GEN11_GPM_WGBOXPERF_INTR_MASK, ~0);
@@ -218,6 +224,8 @@ void gen11_gt_irq_postinstall(struct intel_gt *gt) /* Enable RCS, BCS, VCS and VECS class interrupts. */ intel_uncore_write(uncore, GEN11_RENDER_COPY_INTR_ENABLE, dmask); intel_uncore_write(uncore, GEN11_VCS_VECS_INTR_ENABLE, dmask);
if (CCS_MASK(gt))
intel_uncore_write(uncore, GEN12_CCS_RSVD_INTR_ENABLE, smask);
/* Unmask irqs on RCS, BCS, VCS and VECS engines. */ intel_uncore_write(uncore, GEN11_RCS0_RSVD_INTR_MASK, ~smask);
@@ -231,6 +239,11 @@ void gen11_gt_irq_postinstall(struct intel_gt *gt) intel_uncore_write(uncore, GEN11_VECS0_VECS1_INTR_MASK, ~dmask); if (HAS_ENGINE(gt, VECS2) || HAS_ENGINE(gt, VECS3)) intel_uncore_write(uncore, GEN12_VECS2_VECS3_INTR_MASK, ~dmask);
- if (HAS_ENGINE(gt, CCS0) || HAS_ENGINE(gt, CCS1))
intel_uncore_write(uncore, GEN12_CCS0_CCS1_INTR_MASK, ~dmask);
- if (HAS_ENGINE(gt, CCS2) || HAS_ENGINE(gt, CCS3))
intel_uncore_write(uncore, GEN12_CCS2_CCS3_INTR_MASK, ~dmask);
- /*
- RPS interrupts will get enabled/disabled on demand when RPS itself
- is enabled/disabled.
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 1fd3040b6771..5b6eee5d8ade 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1573,6 +1573,8 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, ENGINE_INSTANCES_MASK(gt, VCS0, I915_MAX_VCS) #define VEBOX_MASK(gt) \ ENGINE_INSTANCES_MASK(gt, VECS0, I915_MAX_VECS) +#define CCS_MASK(gt) \
ENGINE_INSTANCES_MASK(gt, CCS0, I915_MAX_CCS)
/*
- The Gen7 cmdparser copies the scanned buffer to the ggtt for execution
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 33d6aa0b07c1..31e9c2cc4c0c 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -8139,6 +8139,7 @@ enum { #define GEN11_GPM_WGBOXPERF_INTR_ENABLE _MMIO(0x19003c) #define GEN11_CRYPTO_RSVD_INTR_ENABLE _MMIO(0x190040) #define GEN11_GUNIT_CSME_INTR_ENABLE _MMIO(0x190044) +#define GEN12_CCS_RSVD_INTR_ENABLE _MMIO(0x190048)
#define GEN11_RCS0_RSVD_INTR_MASK _MMIO(0x190090) #define GEN11_BCS_RSVD_INTR_MASK _MMIO(0x1900a0) @@ -8152,6 +8153,8 @@ enum { #define GEN11_GPM_WGBOXPERF_INTR_MASK _MMIO(0x1900ec) #define GEN11_CRYPTO_RSVD_INTR_MASK _MMIO(0x1900f0) #define GEN11_GUNIT_CSME_INTR_MASK _MMIO(0x1900f4) +#define GEN12_CCS0_CCS1_INTR_MASK _MMIO(0x190100) +#define GEN12_CCS2_CCS3_INTR_MASK _MMIO(0x190104)
#define ENGINE1_MASK REG_GENMASK(31, 16) #define ENGINE0_MASK REG_GENMASK(15, 0)
Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Regards,
Tvrtko
The compute engine handles the same commands the render engine can (except 3D pipeline), so it makes sense that CCS is more similar to RCS than non-render engines.
The CCS context state (lrc) is also similar to the render one, so reuse it. Note that the compute engine has its own CTX_R_PWR_CLK_STATE register.
In order to avoid having multiple RCS && CCS checks, add the following engine flag: - I915_ENGINE_HAS_RCS_REG_STATE - use the render (larger) reg state ctx.
BSpec: 46260 Original-patch-by: Michel Thierry Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Aravind Iddamsetty aravind.iddamsetty@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com --- drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 8 +++++--- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 6 ++++++ drivers/gpu/drm/i915/gt/intel_engine_types.h | 1 + drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 2 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 4 ++-- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 +- drivers/gpu/drm/i915/i915_perf.c | 4 ++-- drivers/gpu/drm/i915/i915_reg.h | 2 +- 8 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c index b32f7fed2d9c..fbe10783628b 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c @@ -883,7 +883,9 @@ static int igt_shared_ctx_exec(void *arg) return err; }
-static int rpcs_query_batch(struct drm_i915_gem_object *rpcs, struct i915_vma *vma) +static int rpcs_query_batch(struct drm_i915_gem_object *rpcs, + struct i915_vma *vma, + struct intel_engine_cs *engine) { u32 *cmd;
@@ -894,7 +896,7 @@ static int rpcs_query_batch(struct drm_i915_gem_object *rpcs, struct i915_vma *v return PTR_ERR(cmd);
*cmd++ = MI_STORE_REGISTER_MEM_GEN8; - *cmd++ = i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE); + *cmd++ = i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE(engine->mmio_base)); *cmd++ = lower_32_bits(vma->node.start); *cmd++ = upper_32_bits(vma->node.start); *cmd = MI_BATCH_BUFFER_END; @@ -955,7 +957,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj, if (err) goto err_vma;
- err = rpcs_query_batch(rpcs, vma); + err = rpcs_query_batch(rpcs, vma, ce->engine); if (err) goto err_batch;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 69944bd8c19d..b346b946602d 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -205,6 +205,8 @@ u32 intel_engine_context_size(struct intel_gt *gt, u8 class) BUILD_BUG_ON(I915_GTT_PAGE_SIZE != PAGE_SIZE);
switch (class) { + case COMPUTE_CLASS: + fallthrough; case RENDER_CLASS: switch (GRAPHICS_VER(gt->i915)) { default: @@ -379,6 +381,10 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id) if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS) engine->props.preempt_timeout_ms = 0;
+ /* features common between engines sharing EUs */ + if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS) + engine->flags |= I915_ENGINE_HAS_RCS_REG_STATE; + engine->defaults = engine->props; /* never to change again */
engine->context_size = intel_engine_context_size(gt, engine->class); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index dcb9d8b2362a..30a0c69c36c8 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -454,6 +454,7 @@ struct intel_engine_cs { #define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6) #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7) #define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8) +#define I915_ENGINE_HAS_RCS_REG_STATE BIT(9) unsigned int flags;
/* diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index de5f9c86b9a4..4c600c46414d 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3406,7 +3406,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine) logical_ring_default_vfuncs(engine); logical_ring_default_irqs(engine);
- if (engine->class == RENDER_CLASS) + if (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE) rcs_submission_override(engine);
lrc_init_wa_ctx(engine); diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 6ba8daea2f56..6490dce0a73f 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -619,7 +619,7 @@ static const u8 *reg_offsets(const struct intel_engine_cs *engine) GEM_BUG_ON(GRAPHICS_VER(engine->i915) >= 12 && !intel_engine_has_relative_mmio(engine));
- if (engine->class == RENDER_CLASS) { + if (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE) { if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 55)) return dg2_rcs_offsets; else if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50)) @@ -1572,7 +1572,7 @@ void lrc_init_wa_ctx(struct intel_engine_cs *engine) unsigned int i; int err;
- if (engine->class != RENDER_CLASS) + if (!(engine->flags & I915_ENGINE_HAS_RCS_REG_STATE)) return;
switch (GRAPHICS_VER(engine->i915)) { diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 87d8dc8f51b9..2f5bf7aa7e3b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2517,7 +2517,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine) guc_default_irqs(engine); guc_init_breadcrumbs(engine);
- if (engine->class == RENDER_CLASS) + if (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE) rcs_submission_override(engine);
lrc_init_wa_ctx(engine); diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 2f01b8c0284c..5e12a9726c43 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -2418,7 +2418,7 @@ gen12_configure_all_contexts(struct i915_perf_stream *stream, { struct flex regs[] = { { - GEN8_R_PWR_CLK_STATE, + GEN8_R_PWR_CLK_STATE(RENDER_RING_BASE), CTX_R_PWR_CLK_STATE, }, }; @@ -2438,7 +2438,7 @@ lrc_configure_all_contexts(struct i915_perf_stream *stream, #define ctx_flexeuN(N) (ctx_flexeu0 + 2 * (N) + 1) struct flex regs[] = { { - GEN8_R_PWR_CLK_STATE, + GEN8_R_PWR_CLK_STATE(RENDER_RING_BASE), CTX_R_PWR_CLK_STATE, }, { diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 31e9c2cc4c0c..0bb185ce9529 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -441,7 +441,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define GEN8_RING_PDP_UDW(base, n) _MMIO((base) + 0x270 + (n) * 8 + 4) #define GEN8_RING_PDP_LDW(base, n) _MMIO((base) + 0x270 + (n) * 8)
-#define GEN8_R_PWR_CLK_STATE _MMIO(0x20C8) +#define GEN8_R_PWR_CLK_STATE(base) _MMIO((base)+0xc8) #define GEN8_RPCS_ENABLE (1 << 31) #define GEN8_RPCS_S_CNT_ENABLE (1 << 18) #define GEN8_RPCS_S_CNT_SHIFT 15
On 07/09/2021 18:19, Matt Roper wrote:
The compute engine handles the same commands the render engine can (except 3D pipeline), so it makes sense that CCS is more similar to RCS than non-render engines.
The CCS context state (lrc) is also similar to the render one, so reuse it. Note that the compute engine has its own CTX_R_PWR_CLK_STATE register.
In order to avoid having multiple RCS && CCS checks, add the following engine flag:
- I915_ENGINE_HAS_RCS_REG_STATE - use the render (larger) reg state ctx.
BSpec: 46260 Original-patch-by: Michel Thierry Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Aravind Iddamsetty aravind.iddamsetty@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com
drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 8 +++++--- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 6 ++++++ drivers/gpu/drm/i915/gt/intel_engine_types.h | 1 + drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 2 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 4 ++-- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 +- drivers/gpu/drm/i915/i915_perf.c | 4 ++-- drivers/gpu/drm/i915/i915_reg.h | 2 +- 8 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c index b32f7fed2d9c..fbe10783628b 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c @@ -883,7 +883,9 @@ static int igt_shared_ctx_exec(void *arg) return err; }
-static int rpcs_query_batch(struct drm_i915_gem_object *rpcs, struct i915_vma *vma) +static int rpcs_query_batch(struct drm_i915_gem_object *rpcs,
struct i915_vma *vma,
{ u32 *cmd;struct intel_engine_cs *engine)
@@ -894,7 +896,7 @@ static int rpcs_query_batch(struct drm_i915_gem_object *rpcs, struct i915_vma *v return PTR_ERR(cmd);
*cmd++ = MI_STORE_REGISTER_MEM_GEN8;
- *cmd++ = i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE);
- *cmd++ = i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE(engine->mmio_base)); *cmd++ = lower_32_bits(vma->node.start); *cmd++ = upper_32_bits(vma->node.start); *cmd = MI_BATCH_BUFFER_END;
@@ -955,7 +957,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj, if (err) goto err_vma;
- err = rpcs_query_batch(rpcs, vma);
- err = rpcs_query_batch(rpcs, vma, ce->engine); if (err) goto err_batch;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 69944bd8c19d..b346b946602d 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -205,6 +205,8 @@ u32 intel_engine_context_size(struct intel_gt *gt, u8 class) BUILD_BUG_ON(I915_GTT_PAGE_SIZE != PAGE_SIZE);
switch (class) {
- case COMPUTE_CLASS:
case RENDER_CLASS: switch (GRAPHICS_VER(gt->i915)) { default:fallthrough;
@@ -379,6 +381,10 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id) if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS) engine->props.preempt_timeout_ms = 0;
/* features common between engines sharing EUs */
if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS)
engine->flags |= I915_ENGINE_HAS_RCS_REG_STATE;
engine->defaults = engine->props; /* never to change again */
engine->context_size = intel_engine_context_size(gt, engine->class);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index dcb9d8b2362a..30a0c69c36c8 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -454,6 +454,7 @@ struct intel_engine_cs { #define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6) #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7) #define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8) +#define I915_ENGINE_HAS_RCS_REG_STATE BIT(9) unsigned int flags;
/* diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index de5f9c86b9a4..4c600c46414d 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3406,7 +3406,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine) logical_ring_default_vfuncs(engine); logical_ring_default_irqs(engine);
- if (engine->class == RENDER_CLASS)
- if (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE) rcs_submission_override(engine);
Hm, what do pipe control flushes which relate to 3d pipeline end up doing on CCS engines?
Regards,
Tvrtko
lrc_init_wa_ctx(engine); diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 6ba8daea2f56..6490dce0a73f 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -619,7 +619,7 @@ static const u8 *reg_offsets(const struct intel_engine_cs *engine) GEM_BUG_ON(GRAPHICS_VER(engine->i915) >= 12 && !intel_engine_has_relative_mmio(engine));
- if (engine->class == RENDER_CLASS) {
- if (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE) { if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 55)) return dg2_rcs_offsets; else if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50))
@@ -1572,7 +1572,7 @@ void lrc_init_wa_ctx(struct intel_engine_cs *engine) unsigned int i; int err;
- if (engine->class != RENDER_CLASS)
if (!(engine->flags & I915_ENGINE_HAS_RCS_REG_STATE)) return;
switch (GRAPHICS_VER(engine->i915)) {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 87d8dc8f51b9..2f5bf7aa7e3b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2517,7 +2517,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine) guc_default_irqs(engine); guc_init_breadcrumbs(engine);
- if (engine->class == RENDER_CLASS)
if (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE) rcs_submission_override(engine);
lrc_init_wa_ctx(engine);
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 2f01b8c0284c..5e12a9726c43 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -2418,7 +2418,7 @@ gen12_configure_all_contexts(struct i915_perf_stream *stream, { struct flex regs[] = { {
GEN8_R_PWR_CLK_STATE,
}, };GEN8_R_PWR_CLK_STATE(RENDER_RING_BASE), CTX_R_PWR_CLK_STATE,
@@ -2438,7 +2438,7 @@ lrc_configure_all_contexts(struct i915_perf_stream *stream, #define ctx_flexeuN(N) (ctx_flexeu0 + 2 * (N) + 1) struct flex regs[] = { {
GEN8_R_PWR_CLK_STATE,
}, {GEN8_R_PWR_CLK_STATE(RENDER_RING_BASE), CTX_R_PWR_CLK_STATE,
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 31e9c2cc4c0c..0bb185ce9529 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -441,7 +441,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define GEN8_RING_PDP_UDW(base, n) _MMIO((base) + 0x270 + (n) * 8 + 4) #define GEN8_RING_PDP_LDW(base, n) _MMIO((base) + 0x270 + (n) * 8)
-#define GEN8_R_PWR_CLK_STATE _MMIO(0x20C8) +#define GEN8_R_PWR_CLK_STATE(base) _MMIO((base)+0xc8) #define GEN8_RPCS_ENABLE (1 << 31) #define GEN8_RPCS_S_CNT_ENABLE (1 << 18) #define GEN8_RPCS_S_CNT_SHIFT 15
On 08/09/2021 11:13, Tvrtko Ursulin wrote:
On 07/09/2021 18:19, Matt Roper wrote:
The compute engine handles the same commands the render engine can (except 3D pipeline), so it makes sense that CCS is more similar to RCS than non-render engines.
The CCS context state (lrc) is also similar to the render one, so reuse it. Note that the compute engine has its own CTX_R_PWR_CLK_STATE register.
In order to avoid having multiple RCS && CCS checks, add the following engine flag: - I915_ENGINE_HAS_RCS_REG_STATE - use the render (larger) reg state ctx.
BSpec: 46260 Original-patch-by: Michel Thierry Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Aravind Iddamsetty aravind.iddamsetty@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com
drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 8 +++++--- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 6 ++++++ drivers/gpu/drm/i915/gt/intel_engine_types.h | 1 + drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 2 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 4 ++-- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 +- drivers/gpu/drm/i915/i915_perf.c | 4 ++-- drivers/gpu/drm/i915/i915_reg.h | 2 +- 8 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c index b32f7fed2d9c..fbe10783628b 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c @@ -883,7 +883,9 @@ static int igt_shared_ctx_exec(void *arg) return err; } -static int rpcs_query_batch(struct drm_i915_gem_object *rpcs, struct i915_vma *vma) +static int rpcs_query_batch(struct drm_i915_gem_object *rpcs, + struct i915_vma *vma, + struct intel_engine_cs *engine) { u32 *cmd; @@ -894,7 +896,7 @@ static int rpcs_query_batch(struct drm_i915_gem_object *rpcs, struct i915_vma *v return PTR_ERR(cmd); *cmd++ = MI_STORE_REGISTER_MEM_GEN8; - *cmd++ = i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE); + *cmd++ = i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE(engine->mmio_base)); *cmd++ = lower_32_bits(vma->node.start); *cmd++ = upper_32_bits(vma->node.start); *cmd = MI_BATCH_BUFFER_END; @@ -955,7 +957,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj, if (err) goto err_vma; - err = rpcs_query_batch(rpcs, vma); + err = rpcs_query_batch(rpcs, vma, ce->engine); if (err) goto err_batch; diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 69944bd8c19d..b346b946602d 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -205,6 +205,8 @@ u32 intel_engine_context_size(struct intel_gt *gt, u8 class) BUILD_BUG_ON(I915_GTT_PAGE_SIZE != PAGE_SIZE); switch (class) { + case COMPUTE_CLASS: + fallthrough; case RENDER_CLASS: switch (GRAPHICS_VER(gt->i915)) { default: @@ -379,6 +381,10 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id) if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS) engine->props.preempt_timeout_ms = 0; + /* features common between engines sharing EUs */ + if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS) + engine->flags |= I915_ENGINE_HAS_RCS_REG_STATE;
engine->defaults = engine->props; /* never to change again */ engine->context_size = intel_engine_context_size(gt, engine->class); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index dcb9d8b2362a..30a0c69c36c8 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -454,6 +454,7 @@ struct intel_engine_cs { #define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6) #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7) #define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8) +#define I915_ENGINE_HAS_RCS_REG_STATE BIT(9) unsigned int flags; /* diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index de5f9c86b9a4..4c600c46414d 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3406,7 +3406,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine) logical_ring_default_vfuncs(engine); logical_ring_default_irqs(engine); - if (engine->class == RENDER_CLASS) + if (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE) rcs_submission_override(engine);
Hm, what do pipe control flushes which relate to 3d pipeline end up doing on CCS engines?
Right, answer found in the following patch.
Ideally the two would swap places in the series so by the time vfunc are assigned to the engines they actually handle them correctly. It's a minor point since it's all disabled until the very end of the series so either way:
Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Regards,
Tvrtko
Regards,
Tvrtko
lrc_init_wa_ctx(engine); diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 6ba8daea2f56..6490dce0a73f 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -619,7 +619,7 @@ static const u8 *reg_offsets(const struct intel_engine_cs *engine) GEM_BUG_ON(GRAPHICS_VER(engine->i915) >= 12 && !intel_engine_has_relative_mmio(engine)); - if (engine->class == RENDER_CLASS) { + if (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE) { if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 55)) return dg2_rcs_offsets; else if (GRAPHICS_VER_FULL(engine->i915) >= IP_VER(12, 50)) @@ -1572,7 +1572,7 @@ void lrc_init_wa_ctx(struct intel_engine_cs *engine) unsigned int i; int err; - if (engine->class != RENDER_CLASS) + if (!(engine->flags & I915_ENGINE_HAS_RCS_REG_STATE)) return; switch (GRAPHICS_VER(engine->i915)) { diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 87d8dc8f51b9..2f5bf7aa7e3b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2517,7 +2517,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine) guc_default_irqs(engine); guc_init_breadcrumbs(engine); - if (engine->class == RENDER_CLASS) + if (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE) rcs_submission_override(engine); lrc_init_wa_ctx(engine); diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 2f01b8c0284c..5e12a9726c43 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -2418,7 +2418,7 @@ gen12_configure_all_contexts(struct i915_perf_stream *stream, { struct flex regs[] = { { - GEN8_R_PWR_CLK_STATE, + GEN8_R_PWR_CLK_STATE(RENDER_RING_BASE), CTX_R_PWR_CLK_STATE, }, }; @@ -2438,7 +2438,7 @@ lrc_configure_all_contexts(struct i915_perf_stream *stream, #define ctx_flexeuN(N) (ctx_flexeu0 + 2 * (N) + 1) struct flex regs[] = { { - GEN8_R_PWR_CLK_STATE, + GEN8_R_PWR_CLK_STATE(RENDER_RING_BASE), CTX_R_PWR_CLK_STATE, }, { diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 31e9c2cc4c0c..0bb185ce9529 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -441,7 +441,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define GEN8_RING_PDP_UDW(base, n) _MMIO((base) + 0x270 + (n) * 8
#define GEN8_RING_PDP_LDW(base, n) _MMIO((base) + 0x270 + (n) * 8) -#define GEN8_R_PWR_CLK_STATE _MMIO(0x20C8) +#define GEN8_R_PWR_CLK_STATE(base) _MMIO((base)+0xc8) #define GEN8_RPCS_ENABLE (1 << 31) #define GEN8_RPCS_S_CNT_ENABLE (1 << 18) #define GEN8_RPCS_S_CNT_SHIFT 15
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
CCS re-uses the RCS functions for breadcrumb and flush emission. However, CCS pipe_control has additional programming restrictions:
- Command Streamer Stall Enable must be always set - Post Sync Operations must not be set to Write PS Depth Count - 3D-related bits must not be set
Bspec: 47112 Cc: Vinay Belgaumkar vinay.belgaumkar@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Aravind Iddamsetty aravind.iddamsetty@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 31 ++++++++++++++------ drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 15 ++++++++++ 2 files changed, 37 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c index 461844dffd7e..bf79e5ee3e45 100644 --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c @@ -200,6 +200,8 @@ static u32 *gen12_emit_aux_table_inv(const i915_reg_t inv_reg, u32 *cs)
int gen12_emit_flush_rcs(struct i915_request *rq, u32 mode) { + struct intel_engine_cs *engine = rq->engine; + if (mode & EMIT_FLUSH) { u32 flags = 0; u32 *cs; @@ -218,6 +220,9 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 mode)
flags |= PIPE_CONTROL_CS_STALL;
+ if (engine->class == COMPUTE_CLASS) + flags &= ~PIPE_CONTROL_RENDER_ONLY_FLAGS; + cs = intel_ring_begin(rq, 6); if (IS_ERR(cs)) return PTR_ERR(cs); @@ -245,6 +250,9 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 mode)
flags |= PIPE_CONTROL_CS_STALL;
+ if (engine->class == COMPUTE_CLASS) + flags &= ~PIPE_CONTROL_RENDER_ONLY_FLAGS; + cs = intel_ring_begin(rq, 8 + 4); if (IS_ERR(cs)) return PTR_ERR(cs); @@ -617,19 +625,24 @@ u32 *gen12_emit_fini_breadcrumb_xcs(struct i915_request *rq, u32 *cs)
u32 *gen12_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs) { + u32 flags = (PIPE_CONTROL_CS_STALL | + PIPE_CONTROL_TILE_CACHE_FLUSH | + PIPE_CONTROL_FLUSH_L3 | + PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | + PIPE_CONTROL_DEPTH_CACHE_FLUSH | + /* Wa_1409600907:tgl,rkl,dg1,adl-p */ + PIPE_CONTROL_DEPTH_STALL | + PIPE_CONTROL_DC_FLUSH_ENABLE | + PIPE_CONTROL_FLUSH_ENABLE); + + if (rq->engine->class == COMPUTE_CLASS) + flags &= ~PIPE_CONTROL_RENDER_ONLY_FLAGS; + cs = gen12_emit_ggtt_write_rcs(cs, rq->fence.seqno, hwsp_offset(rq), PIPE_CONTROL0_HDC_PIPELINE_FLUSH, - PIPE_CONTROL_CS_STALL | - PIPE_CONTROL_TILE_CACHE_FLUSH | - PIPE_CONTROL_FLUSH_L3 | - PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | - PIPE_CONTROL_DEPTH_CACHE_FLUSH | - /* Wa_1409600907:tgl */ - PIPE_CONTROL_DEPTH_STALL | - PIPE_CONTROL_DC_FLUSH_ENABLE | - PIPE_CONTROL_FLUSH_ENABLE); + flags);
return gen12_emit_fini_breadcrumb_tail(rq, cs); } diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h index 1c3af0fc0456..a3c1047a2c71 100644 --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h @@ -223,11 +223,14 @@ #define PIPE_CONTROL_COMMAND_CACHE_INVALIDATE (1<<29) /* gen11+ */ #define PIPE_CONTROL_TILE_CACHE_FLUSH (1<<28) /* gen11+ */ #define PIPE_CONTROL_FLUSH_L3 (1<<27) +#define PIPE_CONTROL_AMFS_FLUSH (1<<25) /* gen12+ */ #define PIPE_CONTROL_GLOBAL_GTT_IVB (1<<24) /* gen7+ */ #define PIPE_CONTROL_MMIO_WRITE (1<<23) #define PIPE_CONTROL_STORE_DATA_INDEX (1<<21) #define PIPE_CONTROL_CS_STALL (1<<20) +#define PIPE_CONTROL_GLOBAL_SNAPSHOT_RESET (1<<19) #define PIPE_CONTROL_TLB_INVALIDATE (1<<18) +#define PIPE_CONTROL_PSD_SYNC (1<<17) /* gen11+ */ #define PIPE_CONTROL_MEDIA_STATE_CLEAR (1<<16) #define PIPE_CONTROL_WRITE_TIMESTAMP (3<<14) #define PIPE_CONTROL_QW_WRITE (1<<14) @@ -249,6 +252,18 @@ #define PIPE_CONTROL_DEPTH_CACHE_FLUSH (1<<0) #define PIPE_CONTROL_GLOBAL_GTT (1<<2) /* in addr dword */
+/* 3D-related flags can't be set on compute engine */ +#define PIPE_CONTROL_RENDER_ONLY_FLAGS (\ + PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | \ + PIPE_CONTROL_DEPTH_CACHE_FLUSH | \ + PIPE_CONTROL_TILE_CACHE_FLUSH | \ + PIPE_CONTROL_DEPTH_STALL | \ + PIPE_CONTROL_STALL_AT_SCOREBOARD | \ + PIPE_CONTROL_PSD_SYNC | \ + PIPE_CONTROL_AMFS_FLUSH | \ + PIPE_CONTROL_VF_CACHE_INVALIDATE | \ + PIPE_CONTROL_GLOBAL_SNAPSHOT_RESET) + #define MI_MATH(x) MI_INSTR(0x1a, (x) - 1) #define MI_MATH_INSTR(opcode, op1, op2) ((opcode) << 20 | (op1) << 10 | (op2)) /* Opcodes for MI_MATH_INSTR */
In Dual Context mode the EUs are shared between render and compute command streamers. The hardware provides a field in the lrc descriptor to indicate the prioritization of the thread dispatch associated to the corresponding context.
The context priority is set to 'low' at creation time and relies on the existing context priority to set it to low/normal/high.
HSDES: 1604462009 Bspec: 46145, 46260 Original-patch-by: Michel Thierry Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Signed-off-by: Aravind Iddamsetty aravind.iddamsetty@intel.com Signed-off-by: Prasad Nallani prasad.nallani@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 4 +++- drivers/gpu/drm/i915/gt/intel_engine_types.h | 1 + drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 6 +++++- drivers/gpu/drm/i915/gt/intel_lrc.h | 10 ++++++++++ drivers/gpu/drm/i915/i915_reg.h | 4 ++++ 5 files changed, 23 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index b346b946602d..2f719f0ecac3 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -382,8 +382,10 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id) engine->props.preempt_timeout_ms = 0;
/* features common between engines sharing EUs */ - if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS) + if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS) { engine->flags |= I915_ENGINE_HAS_RCS_REG_STATE; + engine->flags |= I915_ENGINE_HAS_EU_PRIORITY; + }
engine->defaults = engine->props; /* never to change again */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 30a0c69c36c8..00bf0296b28a 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -455,6 +455,7 @@ struct intel_engine_cs { #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7) #define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8) #define I915_ENGINE_HAS_RCS_REG_STATE BIT(9) +#define I915_ENGINE_HAS_EU_PRIORITY BIT(10) unsigned int flags;
/* diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 4c600c46414d..2b36ec7f3a04 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -662,9 +662,13 @@ static inline void execlists_schedule_out(struct i915_request *rq) static u64 execlists_update_context(struct i915_request *rq) { struct intel_context *ce = rq->context; - u64 desc = ce->lrc.desc; + u64 desc; u32 tail, prev;
+ desc = ce->lrc.desc; + if (rq->engine->flags & I915_ENGINE_HAS_EU_PRIORITY) + desc |= lrc_desc_priority(rq_prio(rq)); + /* * WaIdleLiteRestore:bdw,skl * diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h index 7f697845c4cf..d3f2096b3d51 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.h +++ b/drivers/gpu/drm/i915/gt/intel_lrc.h @@ -79,4 +79,14 @@ static inline u32 lrc_get_runtime(const struct intel_context *ce) return READ_ONCE(ce->lrc_reg_state[CTX_TIMESTAMP]); }
+static inline u32 lrc_desc_priority(int prio) +{ + if (prio > I915_PRIORITY_NORMAL) + return GEN12_CTX_PRIORITY_HIGH; + else if (prio < I915_PRIORITY_NORMAL) + return GEN12_CTX_PRIORITY_LOW; + else + return GEN12_CTX_PRIORITY_NORMAL; +} + #endif /* __INTEL_LRC_H__ */ diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 0bb185ce9529..5b68c02c35af 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -4212,6 +4212,10 @@ enum { #define GEN8_CTX_L3LLC_COHERENT (1 << 5) #define GEN8_CTX_PRIVILEGE (1 << 8) #define GEN8_CTX_ADDRESSING_MODE_SHIFT 3 +#define GEN12_CTX_PRIORITY_MASK REG_GENMASK(10, 9) +#define GEN12_CTX_PRIORITY_HIGH REG_FIELD_PREP(GEN12_CTX_PRIORITY_MASK, 2) +#define GEN12_CTX_PRIORITY_NORMAL REG_FIELD_PREP(GEN12_CTX_PRIORITY_MASK, 1) +#define GEN12_CTX_PRIORITY_LOW REG_FIELD_PREP(GEN12_CTX_PRIORITY_MASK, 0)
#define GEN8_CTX_ID_SHIFT 32 #define GEN8_CTX_ID_WIDTH 21
On 07/09/2021 18:19, Matt Roper wrote:
In Dual Context mode the EUs are shared between render and compute command streamers. The hardware provides a field in the lrc descriptor to indicate the prioritization of the thread dispatch associated to the corresponding context.
The context priority is set to 'low' at creation time and relies on the existing context priority to set it to low/normal/high.
HSDES: 1604462009 Bspec: 46145, 46260 Original-patch-by: Michel Thierry Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Signed-off-by: Aravind Iddamsetty aravind.iddamsetty@intel.com Signed-off-by: Prasad Nallani prasad.nallani@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com
drivers/gpu/drm/i915/gt/intel_engine_cs.c | 4 +++- drivers/gpu/drm/i915/gt/intel_engine_types.h | 1 + drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 6 +++++- drivers/gpu/drm/i915/gt/intel_lrc.h | 10 ++++++++++ drivers/gpu/drm/i915/i915_reg.h | 4 ++++ 5 files changed, 23 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index b346b946602d..2f719f0ecac3 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -382,8 +382,10 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id) engine->props.preempt_timeout_ms = 0;
/* features common between engines sharing EUs */
- if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS)
if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS) { engine->flags |= I915_ENGINE_HAS_RCS_REG_STATE;
engine->flags |= I915_ENGINE_HAS_EU_PRIORITY;
}
engine->defaults = engine->props; /* never to change again */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 30a0c69c36c8..00bf0296b28a 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -455,6 +455,7 @@ struct intel_engine_cs { #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7) #define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8) #define I915_ENGINE_HAS_RCS_REG_STATE BIT(9) +#define I915_ENGINE_HAS_EU_PRIORITY BIT(10) unsigned int flags;
/* diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 4c600c46414d..2b36ec7f3a04 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -662,9 +662,13 @@ static inline void execlists_schedule_out(struct i915_request *rq) static u64 execlists_update_context(struct i915_request *rq) { struct intel_context *ce = rq->context;
- u64 desc = ce->lrc.desc;
u64 desc; u32 tail, prev;
desc = ce->lrc.desc;
if (rq->engine->flags & I915_ENGINE_HAS_EU_PRIORITY)
desc |= lrc_desc_priority(rq_prio(rq));
/*
- WaIdleLiteRestore:bdw,skl
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h index 7f697845c4cf..d3f2096b3d51 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.h +++ b/drivers/gpu/drm/i915/gt/intel_lrc.h @@ -79,4 +79,14 @@ static inline u32 lrc_get_runtime(const struct intel_context *ce) return READ_ONCE(ce->lrc_reg_state[CTX_TIMESTAMP]); }
+static inline u32 lrc_desc_priority(int prio) +{
- if (prio > I915_PRIORITY_NORMAL)
return GEN12_CTX_PRIORITY_HIGH;
- else if (prio < I915_PRIORITY_NORMAL)
return GEN12_CTX_PRIORITY_LOW;
- else
return GEN12_CTX_PRIORITY_NORMAL;
+}
- #endif /* __INTEL_LRC_H__ */
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 0bb185ce9529..5b68c02c35af 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -4212,6 +4212,10 @@ enum { #define GEN8_CTX_L3LLC_COHERENT (1 << 5) #define GEN8_CTX_PRIVILEGE (1 << 8) #define GEN8_CTX_ADDRESSING_MODE_SHIFT 3 +#define GEN12_CTX_PRIORITY_MASK REG_GENMASK(10, 9) +#define GEN12_CTX_PRIORITY_HIGH REG_FIELD_PREP(GEN12_CTX_PRIORITY_MASK, 2) +#define GEN12_CTX_PRIORITY_NORMAL REG_FIELD_PREP(GEN12_CTX_PRIORITY_MASK, 1) +#define GEN12_CTX_PRIORITY_LOW REG_FIELD_PREP(GEN12_CTX_PRIORITY_MASK, 0)
#define GEN8_CTX_ID_SHIFT 32 #define GEN8_CTX_ID_WIDTH 21
Haven't checked bspec to check the bitfield but the mechanics look good.
Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Regards,
Tvrtko
We have to specify in the Render Control Unit Mode register when CCS is enabled.
Bspec: 46034 Original-patch-by: Michel Thierry Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Vinay Belgaumkar vinay.belgaumkar@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Aravind Iddamsetty aravind.iddamsetty@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com --- .../drm/i915/gt/intel_execlists_submission.c | 26 +++++++++++++++++++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 26 +++++++++++++++++++ drivers/gpu/drm/i915/i915_reg.h | 3 +++ 3 files changed, 55 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 2b36ec7f3a04..046f7da67ba6 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -2874,6 +2874,29 @@ static int execlists_resume(struct intel_engine_cs *engine) return 0; }
+static int gen12_rcs_resume(struct intel_engine_cs *engine) +{ + int ret; + + ret = execlists_resume(engine); + if (ret) + return ret; + + /* + * Multi Context programming. + * just need to program this register once no matter how many CCS + * engines there are. Since some of the CCS engines might be fused off, + * we can't do this as part of the init of a specific CCS and we do + * it during RCS init instead. RCS and all CCS engines are reset + * together, so post-reset re-init is covered as well. + */ + if (CCS_MASK(engine->gt)) + intel_uncore_write(engine->uncore, GEN12_RCU_MODE, + _MASKED_BIT_ENABLE(GEN12_RCU_MODE_CCS_ENABLE)); + + return 0; +} + static void execlists_reset_prepare(struct intel_engine_cs *engine) { ENGINE_TRACE(engine, "depth<-%d\n", @@ -3394,6 +3417,9 @@ static void rcs_submission_override(struct intel_engine_cs *engine) engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_rcs; break; } + + if (engine->class == RENDER_CLASS) + engine->resume = gen12_rcs_resume; }
int intel_execlists_submission_setup(struct intel_engine_cs *engine) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 2f5bf7aa7e3b..db956255d076 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2350,6 +2350,29 @@ static bool guc_sched_engine_disabled(struct i915_sched_engine *sched_engine) return !sched_engine->tasklet.callback; }
+static int gen12_rcs_resume(struct intel_engine_cs *engine) +{ + int ret; + + ret = guc_resume(engine); + if (ret) + return ret; + + /* + * Multi Context programming. + * just need to program this register once no matter how many CCS + * engines there are. Since some of the CCS engines might be fused off, + * we can't do this as part of the init of a specific CCS and we do + * it during RCS init instead. RCS and all CCS engines are reset + * together, so post-reset re-init is covered as well. + */ + if (CCS_MASK(engine->gt)) + intel_uncore_write(engine->uncore, GEN12_RCU_MODE, + _MASKED_BIT_ENABLE(GEN12_RCU_MODE_CCS_ENABLE)); + + return 0; +} + static void guc_set_default_submission(struct intel_engine_cs *engine) { engine->submit_request = guc_submit_request; @@ -2464,6 +2487,9 @@ static void rcs_submission_override(struct intel_engine_cs *engine) engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_rcs; break; } + + if (engine->class == RENDER_CLASS) + engine->resume = gen12_rcs_resume; }
static inline void guc_default_irqs(struct intel_engine_cs *engine) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 5b68c02c35af..57f9456f8c61 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -498,6 +498,9 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define ECOBITS_PPGTT_CACHE64B (3 << 8) #define ECOBITS_PPGTT_CACHE4B (0 << 8)
+#define GEN12_RCU_MODE _MMIO(0x14800) +#define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0) + #define GAB_CTL _MMIO(0x24000) #define GAB_CTL_CONT_AFTER_PAGEFAULT (1 << 8)
On 07/09/2021 18:19, Matt Roper wrote:
We have to specify in the Render Control Unit Mode register when CCS is enabled.
Bspec: 46034 Original-patch-by: Michel Thierry Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com Cc: Vinay Belgaumkar vinay.belgaumkar@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Aravind Iddamsetty aravind.iddamsetty@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com
.../drm/i915/gt/intel_execlists_submission.c | 26 +++++++++++++++++++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 26 +++++++++++++++++++ drivers/gpu/drm/i915/i915_reg.h | 3 +++ 3 files changed, 55 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 2b36ec7f3a04..046f7da67ba6 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -2874,6 +2874,29 @@ static int execlists_resume(struct intel_engine_cs *engine) return 0; }
+static int gen12_rcs_resume(struct intel_engine_cs *engine) +{
- int ret;
- ret = execlists_resume(engine);
- if (ret)
return ret;
- /*
* Multi Context programming.
* just need to program this register once no matter how many CCS
Just
* engines there are. Since some of the CCS engines might be fused off,
* we can't do this as part of the init of a specific CCS and we do
* it during RCS init instead. RCS and all CCS engines are reset
I don't really understand the "can't" part - clearly it would be doable if a specific vfunc was assigned to one ccs only, the one which is present of course. Not saying that would be nicer since I think it has it's own downside.
Perhaps nicest solution is to add an engine flag saying "enables rcu" and then execlists and guc resume check that and do stuff?
No strong opinion yet, just discussing.
* together, so post-reset re-init is covered as well.
*/
- if (CCS_MASK(engine->gt))
intel_uncore_write(engine->uncore, GEN12_RCU_MODE,
_MASKED_BIT_ENABLE(GEN12_RCU_MODE_CCS_ENABLE));
- return 0;
+}
- static void execlists_reset_prepare(struct intel_engine_cs *engine) { ENGINE_TRACE(engine, "depth<-%d\n",
@@ -3394,6 +3417,9 @@ static void rcs_submission_override(struct intel_engine_cs *engine) engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_rcs; break; }
if (engine->class == RENDER_CLASS)
engine->resume = gen12_rcs_resume;
}
int intel_execlists_submission_setup(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 2f5bf7aa7e3b..db956255d076 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2350,6 +2350,29 @@ static bool guc_sched_engine_disabled(struct i915_sched_engine *sched_engine) return !sched_engine->tasklet.callback; }
+static int gen12_rcs_resume(struct intel_engine_cs *engine) +{
- int ret;
- ret = guc_resume(engine);
- if (ret)
return ret;
- /*
* Multi Context programming.
* just need to program this register once no matter how many CCS
* engines there are. Since some of the CCS engines might be fused off,
* we can't do this as part of the init of a specific CCS and we do
* it during RCS init instead. RCS and all CCS engines are reset
* together, so post-reset re-init is covered as well.
*/
- if (CCS_MASK(engine->gt))
intel_uncore_write(engine->uncore, GEN12_RCU_MODE,
_MASKED_BIT_ENABLE(GEN12_RCU_MODE_CCS_ENABLE));
Duplicating the write from gen12_rcs_resume looks passable but when with the whole comment then hmm.. How about a helper is added which both would call? Like intel_engine_enable_rcu_mode() or something?
Regards,
Tvrtko
- return 0;
+}
- static void guc_set_default_submission(struct intel_engine_cs *engine) { engine->submit_request = guc_submit_request;
@@ -2464,6 +2487,9 @@ static void rcs_submission_override(struct intel_engine_cs *engine) engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_rcs; break; }
if (engine->class == RENDER_CLASS)
engine->resume = gen12_rcs_resume;
}
static inline void guc_default_irqs(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 5b68c02c35af..57f9456f8c61 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -498,6 +498,9 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define ECOBITS_PPGTT_CACHE64B (3 << 8) #define ECOBITS_PPGTT_CACHE4B (0 << 8)
+#define GEN12_RCU_MODE _MMIO(0x14800) +#define GEN12_RCU_MODE_CCS_ENABLE REG_BIT(0)
- #define GAB_CTL _MMIO(0x24000) #define GAB_CTL_CONT_AFTER_PAGEFAULT (1 << 8)
From: John Harrison John.C.Harrison@Intel.com
Now that OpenCL workloads can run on the compute engine, we need to set preempt_timeout_ms = 0 on the CCS engines too.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 2f719f0ecac3..7e6ac0ae1f07 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -377,16 +377,17 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id) engine->props.timeslice_duration_ms = CONFIG_DRM_I915_TIMESLICE_DURATION;
- /* Override to uninterruptible for OpenCL workloads. */ - if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS) - engine->props.preempt_timeout_ms = 0; - /* features common between engines sharing EUs */ if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS) { engine->flags |= I915_ENGINE_HAS_RCS_REG_STATE; engine->flags |= I915_ENGINE_HAS_EU_PRIORITY; }
+ /* Override to uninterruptible for OpenCL workloads. */ + if (GRAPHICS_VER(i915) == 12 && + engine->flags & I915_ENGINE_HAS_RCS_REG_STATE) + engine->props.preempt_timeout_ms = 0; + engine->defaults = engine->props; /* never to change again */
engine->context_size = intel_engine_context_size(gt, engine->class);
On 07/09/2021 18:19, Matt Roper wrote:
From: John Harrison John.C.Harrison@Intel.com
Now that OpenCL workloads can run on the compute engine, we need to set preempt_timeout_ms = 0 on the CCS engines too.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com
drivers/gpu/drm/i915/gt/intel_engine_cs.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 2f719f0ecac3..7e6ac0ae1f07 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -377,16 +377,17 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id) engine->props.timeslice_duration_ms = CONFIG_DRM_I915_TIMESLICE_DURATION;
- /* Override to uninterruptible for OpenCL workloads. */
- if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS)
engine->props.preempt_timeout_ms = 0;
- /* features common between engines sharing EUs */ if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS) { engine->flags |= I915_ENGINE_HAS_RCS_REG_STATE; engine->flags |= I915_ENGINE_HAS_EU_PRIORITY; }
/* Override to uninterruptible for OpenCL workloads. */
if (GRAPHICS_VER(i915) == 12 &&
engine->flags & I915_ENGINE_HAS_RCS_REG_STATE)
engine->props.preempt_timeout_ms = 0;
engine->defaults = engine->props; /* never to change again */
engine->context_size = intel_engine_context_size(gt, engine->class);
Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Regards,
Tvrtko
dri-devel@lists.freedesktop.org