This series introduces the enabling patches for new memory compression feature Flat CCS and 64k page support for i915 local memory, along with documentation on the uAPI impact. Included the details of the feature and the implications on the uAPI below. Which is also added into Documentation/gpu/rfc/i915_dg2.rst
DG2 64K page size support: =========================
On discrete platforms, starting from DG2, we have to contend with GTT page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE objects. Specifically the hardware only supports 64K or larger GTT page sizes for such memory. The kernel will already ensure that all I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page sizes underneath.
Note that the returned size here will always reflect any required rounding up done by the kernel, i.e 4K will now become 64K on devices such as DG2.
Special DG2 GTT address alignment requirement:
The GTT alignment will also need to be at least 2M for such objects.
Note that due to how the hardware implements 64K GTT page support, we have some further complications:
1) The entire PDE (which covers a 2MB virtual address range), must contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same PDE is forbidden by the hardware.
2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM objects.
To keep things simple for userland, we mandate that any GTT mappings must be aligned to and rounded up to 2MB. As this only wastes virtual address space and avoids userland having to copy any needlessly complicated PDE sharing scheme (coloring) and only affects DG2, this is deemed to be a good compromise.
Flat CCS support for lmem ========================= On Xe-HP and later devices, we use dedicated compression control state (CCS) stored in local memory for each surface, to support the 3D and media compression formats.
The memory required for the CCS of the entire local memory is 1/256 of the local memory size. So before the kernel boot, the required memory is reserved for the CCS data and a secure register will be programmed with the CCS base address.
Flat CCS data needs to be cleared when a lmem object is allocated. And CCS data can be copied in and out of CCS region through XY_CTRL_SURF_COPY_BLT. CPU can’t access the CCS data directly.
When we exaust the lmem, if the object’s placements support smem, then we can directly decompress the compressed lmem object into smem and start using it from smem itself.
But when we need to swapout the compressed lmem object into a smem region though objects’ placement doesn’t support smem, then we copy the lmem content as it is into smem region along with ccs data (using XY_CTRL_SURF_COPY_BLT). When the object is referred, lmem content will be swaped in along with restoration of the CCS data (using XY_CTRL_SURF_COPY_BLT) at corresponding location.
Flat-CCS Modifiers for different compression formats ==================================================== I915_FORMAT_MOD_4_TILED_DG2_RC_CCS - used to indicate the buffers of Flat CCS render compression formats. Though the general layout is same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS, new hashing/compression algorithm is used. Render compression uses 128 byte compression blocks
I915_FORMAT_MOD_4_TILED_DG2_MC_CCS -used to indicate the buffers of Flat CCS media compression formats. Though the general layout is same as I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS, new hashing/compression algorithm is used. Media compression uses 256 byte compression blocks.
I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC - used to indicate the buffers of Flat CCS clear color render compression formats. Unified compression format for clear color render compression. The genral layout is a tiled layout using 4Kb tiles i.e Tile4 layout. Fast clear color value expected by HW is located in fb at offset 0 of plane#1
v2: Fixed some formatting issues and platform naming issues Added some more documentation on Flat-CCS
v3: Plane programming is handled for flat-ccs and clear color Tile4 and flat ccs modifier patches are rebased on table based modifier reference method Three patches are squashed Y tile is pruned for DG2. flat_ccs_cc plane format info is added Added mesa, compute and media ppl for required uAPI ack.
v4: Rebasing of the patches
v5: KDoc is enhanced for cc modifier. [Nanley & Lionel] inbuild macro usage for functional fix [Bob] Addressed review comments from Matt Platform coverage fix for modifiers [Imre]
Abdiel Janulgue (1): drm/i915/lmem: Enable lmem for platforms with Flat CCS
Anshuman Gupta (1): drm/i915/dg2: Flat CCS Support
Ayaz A Siddiqui (1): drm/i915/gt: Clear compress metadata for Xe_HP platforms
CQ Tang (1): drm/i915/xehpsdv: Add has_flat_ccs to device info
Matt Roper (1): drm/i915/dg2: Add DG2 unified compression
Matthew Auld (6): drm/i915: enforce min GTT alignment for discrete cards drm/i915: support 64K GTT pages for discrete cards drm/i915/gtt: allow overriding the pt alignment drm/i915/gtt: add xehpsdv_ppgtt_insert_entry drm/i915/migrate: add acceleration support for DG2 drm/i915/uapi: document behaviour for DG2 64K support
Mika Kahola (1): uapi/drm/dg2: Introduce format modifier for DG2 clear color
Ramalingam C (4): drm/i915: add needs_compact_pt flag Doc/gpu/rfc/i915: i915 DG2 64k pagesize uAPI drm/i915/Flat-CCS: Document on Flat-CCS memory compression Doc/gpu/rfc/i915: i915 DG2 flat-CCS uAPI
Robert Beckett (1): drm/i915: add gtt misalignment test
Stanislav Lisovskiy (2): drm/i915: Introduce new Tile 4 format drm/i915/dg2: Tile 4 plane format support
Documentation/gpu/rfc/i915_dg2.rst | 32 ++ Documentation/gpu/rfc/index.rst | 3 + drivers/gpu/drm/i915/display/intel_display.c | 5 +- drivers/gpu/drm/i915/display/intel_fb.c | 68 +++- drivers/gpu/drm/i915/display/intel_fb.h | 1 + drivers/gpu/drm/i915/display/intel_fbc.c | 1 + .../drm/i915/display/intel_plane_initial.c | 1 + .../drm/i915/display/skl_universal_plane.c | 70 +++- .../gpu/drm/i915/gem/selftests/huge_pages.c | 60 ++++ .../i915/gem/selftests/i915_gem_client_blt.c | 21 +- drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 158 +++++++- drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 14 + drivers/gpu/drm/i915/gt/intel_gt.c | 19 + drivers/gpu/drm/i915/gt/intel_gt.h | 1 + drivers/gpu/drm/i915/gt/intel_gtt.c | 12 + drivers/gpu/drm/i915/gt/intel_gtt.h | 31 +- drivers/gpu/drm/i915/gt/intel_migrate.c | 336 ++++++++++++++++-- drivers/gpu/drm/i915/gt/intel_ppgtt.c | 17 +- drivers/gpu/drm/i915/gt/intel_region_lmem.c | 24 +- drivers/gpu/drm/i915/i915_drv.h | 18 +- drivers/gpu/drm/i915/i915_pci.c | 4 + drivers/gpu/drm/i915/i915_reg.h | 4 + drivers/gpu/drm/i915/i915_vma.c | 9 + drivers/gpu/drm/i915/intel_device_info.h | 3 + drivers/gpu/drm/i915/intel_pm.c | 1 + drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 224 ++++++++++-- include/uapi/drm/drm_fourcc.h | 43 +++ include/uapi/drm/i915_drm.h | 44 ++- 28 files changed, 1102 insertions(+), 122 deletions(-) create mode 100644 Documentation/gpu/rfc/i915_dg2.rst
Add a new platform flag, needs_compact_pt, to mark the requirement of compact pt layout support for the ppGTT when using 64K GTT pages.
With this flag has_64k_pages will only indicate requirement of 64K GTT page sizes or larger for device local memory access.
v6: * minor doc formatting
Suggested-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Robert Beckett bob.beckett@collabora.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/i915_drv.h | 11 ++++++++--- drivers/gpu/drm/i915/i915_pci.c | 2 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + 3 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 00e7594b59c9..4afdfa5fd3b3 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1512,12 +1512,17 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
/* * Set this flag, when platform requires 64K GTT page sizes or larger for - * device local memory access. Also this flag implies that we require or - * at least support the compact PT layout for the ppGTT when using the 64K - * GTT pages. + * device local memory access. */ #define HAS_64K_PAGES(dev_priv) (INTEL_INFO(dev_priv)->has_64k_pages)
+/* + * Set this flag when platform doesn't allow both 64k pages and 4k pages in + * the same PT. this flag means we need to support compact PT layout for the + * ppGTT when using the 64K GTT pages. + */ +#define NEEDS_COMPACT_PT(dev_priv) (INTEL_INFO(dev_priv)->needs_compact_pt) + #define HAS_IPC(dev_priv) (INTEL_INFO(dev_priv)->display.has_ipc)
#define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i)) diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 2df2db0a5d70..ce6ae6a3cbdf 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1028,6 +1028,7 @@ static const struct intel_device_info xehpsdv_info = { PLATFORM(INTEL_XEHPSDV), .display = { }, .has_64k_pages = 1, + .needs_compact_pt = 1, .platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VECS1) | BIT(VECS2) | BIT(VECS3) | @@ -1046,6 +1047,7 @@ static const struct intel_device_info dg2_info = { PLATFORM(INTEL_DG2), .has_guc_deprivilege = 1, .has_64k_pages = 1, + .needs_compact_pt = 1, .platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VECS1) | diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h index abf1e103c558..d8da40d01dca 100644 --- a/drivers/gpu/drm/i915/intel_device_info.h +++ b/drivers/gpu/drm/i915/intel_device_info.h @@ -130,6 +130,7 @@ enum intel_ppgtt_type { /* Keep has_* in alphabetical order */ \ func(has_64bit_reloc); \ func(has_64k_pages); \ + func(needs_compact_pt); \ func(gpu_reset_clobbers_display); \ func(has_reset_engine); \ func(has_global_mocs); \
From: Matthew Auld matthew.auld@intel.com
For local-memory objects we need to align the GTT addresses to 64K, both for the ppgtt and ggtt.
We need to support vm->min_alignment > 4K, depending on the vm itself and the type of object we are inserting. With this in mind update the GTT selftests to take this into account.
For compact-pt we further align and pad lmem object GTT addresses to 2MB to ensure PDEs contain consistent page sizes as required by the HW.
v3: * use needs_compact_pt flag to discriminate between 64K and 64K with compact-pt * add i915_vm_obj_min_alignment * use i915_vm_obj_min_alignment to round up vma reservation if compact-pt instead of hard coding v5: * fix i915_vm_obj_min_alignment for internal objects which have no memory region v6: * tiled_blits_create correctly pick largest required alignment
Signed-off-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Robert Beckett bob.beckett@collabora.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com --- .../i915/gem/selftests/i915_gem_client_blt.c | 21 ++-- drivers/gpu/drm/i915/gt/intel_gtt.c | 12 +++ drivers/gpu/drm/i915/gt/intel_gtt.h | 18 ++++ drivers/gpu/drm/i915/i915_vma.c | 9 ++ drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++------- 5 files changed, 115 insertions(+), 41 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c index c8ff8bf0986d..3675d12a7d9a 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c @@ -39,6 +39,7 @@ struct tiled_blits { struct blit_buffer scratch; struct i915_vma *batch; u64 hole; + u64 align; u32 width; u32 height; }; @@ -410,14 +411,19 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng) goto err_free; }
- hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4); + t->align = i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL); + t->align = max(t->align, + i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM)); + + hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align); hole_size *= 2; /* room to maneuver */ - hole_size += 2 * I915_GTT_MIN_ALIGNMENT; + hole_size += 2 * t->align; /* padding on either side */
mutex_lock(&t->ce->vm->mutex); memset(&hole, 0, sizeof(hole)); err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole, - hole_size, 0, I915_COLOR_UNEVICTABLE, + hole_size, t->align, + I915_COLOR_UNEVICTABLE, 0, U64_MAX, DRM_MM_INSERT_BEST); if (!err) @@ -428,7 +434,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng) goto err_put; }
- t->hole = hole.start + I915_GTT_MIN_ALIGNMENT; + t->hole = hole.start + t->align; pr_info("Using hole at %llx\n", t->hole);
err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng); @@ -455,7 +461,7 @@ static void tiled_blits_destroy(struct tiled_blits *t) static int tiled_blits_prepare(struct tiled_blits *t, struct rnd_state *prng) { - u64 offset = PAGE_ALIGN(t->width * t->height * 4); + u64 offset = round_up(t->width * t->height * 4, t->align); u32 *map; int err; int i; @@ -486,8 +492,7 @@ static int tiled_blits_prepare(struct tiled_blits *t,
static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng) { - u64 offset = - round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT); + u64 offset = round_up(t->width * t->height * 4, 2 * t->align); int err;
/* We want to check position invariant tiling across GTT eviction */ @@ -500,7 +505,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
/* Reposition so that we overlap the old addresses, and slightly off */ err = tiled_blit(t, - &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT, + &t->buffers[2], t->hole + t->align, &t->buffers[1], t->hole + 3 * offset / 2); if (err) return err; diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c index 46be4197b93f..df23ebdfc994 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.c +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c @@ -223,6 +223,18 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
GEM_BUG_ON(!vm->total); drm_mm_init(&vm->mm, 0, vm->total); + + memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT, + ARRAY_SIZE(vm->min_alignment)); + + if (HAS_64K_PAGES(vm->i915) && NEEDS_COMPACT_PT(vm->i915)) { + vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M; + vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_2M; + } else if (HAS_64K_PAGES(vm->i915)) { + vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_64K; + vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_64K; + } + vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
INIT_LIST_HEAD(&vm->bound_list); diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index 8073438b67c8..ba9f040f8606 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -29,6 +29,8 @@ #include "i915_selftest.h" #include "i915_vma_resource.h" #include "i915_vma_types.h" +#include "i915_params.h" +#include "intel_memory_region.h"
#define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
@@ -223,6 +225,7 @@ struct i915_address_space { struct device *dma; u64 total; /* size addr space maps (ex. 2GB for ggtt) */ u64 reserved; /* size addr space reserved */ + u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
unsigned int bind_async_flags;
@@ -384,6 +387,21 @@ i915_vm_has_scratch_64K(struct i915_address_space *vm) return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K); }
+static inline u64 i915_vm_min_alignment(struct i915_address_space *vm, + enum intel_memory_type type) +{ + return vm->min_alignment[type]; +} + +static inline u64 i915_vm_obj_min_alignment(struct i915_address_space *vm, + struct drm_i915_gem_object *obj) +{ + struct intel_memory_region *mr = READ_ONCE(obj->mm.region); + enum intel_memory_type type = mr ? mr->type : INTEL_MEMORY_SYSTEM; + + return i915_vm_min_alignment(vm, type); +} + static inline bool i915_vm_has_cache_coloring(struct i915_address_space *vm) { diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 845cd88f8313..3558b16a929c 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -757,6 +757,14 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, end = min_t(u64, end, (1ULL << 32) - I915_GTT_PAGE_SIZE); GEM_BUG_ON(!IS_ALIGNED(end, I915_GTT_PAGE_SIZE));
+ alignment = max(alignment, i915_vm_obj_min_alignment(vma->vm, vma->obj)); + /* + * for compact-pt we round up the reservation to prevent + * any smaller pages being used within the same PDE + */ + if (NEEDS_COMPACT_PT(vma->vm->i915)) + size = round_up(size, alignment); + /* If binding the object/GGTT view requires more space than the entire * aperture has, reject it early before evicting everything in a vain * attempt to find space. @@ -769,6 +777,7 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, }
color = 0; + if (i915_vm_has_cache_coloring(vma->vm)) color = vma->obj->cache_level;
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index fba1c8be1649..b80788a2b7f9 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -238,6 +238,8 @@ static int lowlevel_hole(struct i915_address_space *vm, u64 hole_start, u64 hole_end, unsigned long end_time) { + const unsigned int min_alignment = + i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM); I915_RND_STATE(seed_prng); struct i915_vma_resource *mock_vma_res; unsigned int size; @@ -251,9 +253,10 @@ static int lowlevel_hole(struct i915_address_space *vm, I915_RND_SUBSTATE(prng, seed_prng); struct drm_i915_gem_object *obj; unsigned int *order, count, n; - u64 hole_size; + u64 hole_size, aligned_size;
- hole_size = (hole_end - hole_start) >> size; + aligned_size = max_t(u32, ilog2(min_alignment), size); + hole_size = (hole_end - hole_start) >> aligned_size; if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32)) hole_size = KMALLOC_MAX_SIZE / sizeof(u32); count = hole_size >> 1; @@ -274,8 +277,8 @@ static int lowlevel_hole(struct i915_address_space *vm, } GEM_BUG_ON(!order);
- GEM_BUG_ON(count * BIT_ULL(size) > vm->total); - GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end); + GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total); + GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > hole_end);
/* Ignore allocation failures (i.e. don't report them as * a test failure) as we are purposefully allocating very @@ -298,10 +301,10 @@ static int lowlevel_hole(struct i915_address_space *vm, }
for (n = 0; n < count; n++) { - u64 addr = hole_start + order[n] * BIT_ULL(size); + u64 addr = hole_start + order[n] * BIT_ULL(aligned_size); intel_wakeref_t wakeref;
- GEM_BUG_ON(addr + BIT_ULL(size) > vm->total); + GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
if (igt_timeout(end_time, "%s timed out before %d/%d\n", @@ -344,7 +347,7 @@ static int lowlevel_hole(struct i915_address_space *vm, }
mock_vma_res->bi.pages = obj->mm.pages; - mock_vma_res->node_size = BIT_ULL(size); + mock_vma_res->node_size = BIT_ULL(aligned_size); mock_vma_res->start = addr;
with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref) @@ -355,7 +358,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
i915_random_reorder(order, count, &prng); for (n = 0; n < count; n++) { - u64 addr = hole_start + order[n] * BIT_ULL(size); + u64 addr = hole_start + order[n] * BIT_ULL(aligned_size); intel_wakeref_t wakeref;
GEM_BUG_ON(addr + BIT_ULL(size) > vm->total); @@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space *vm, { const u64 hole_size = hole_end - hole_start; struct drm_i915_gem_object *obj; + const unsigned int min_alignment = + i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM); const unsigned long max_pages = - min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT); + min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> ilog2(min_alignment)); const unsigned long max_step = max(int_sqrt(max_pages), 2UL); unsigned long npages, prime, flags; struct i915_vma *vma; @@ -441,14 +446,17 @@ static int fill_hole(struct i915_address_space *vm,
offset = p->offset; list_for_each_entry(obj, &objects, st_link) { + u64 aligned_size = round_up(obj->base.size, + min_alignment); + vma = i915_vma_instance(obj, vm, NULL); if (IS_ERR(vma)) continue;
if (p->step < 0) { - if (offset < hole_start + obj->base.size) + if (offset < hole_start + aligned_size) break; - offset -= obj->base.size; + offset -= aligned_size; }
err = i915_vma_pin(vma, 0, 0, offset | flags); @@ -470,22 +478,25 @@ static int fill_hole(struct i915_address_space *vm, i915_vma_unpin(vma);
if (p->step > 0) { - if (offset + obj->base.size > hole_end) + if (offset + aligned_size > hole_end) break; - offset += obj->base.size; + offset += aligned_size; } }
offset = p->offset; list_for_each_entry(obj, &objects, st_link) { + u64 aligned_size = round_up(obj->base.size, + min_alignment); + vma = i915_vma_instance(obj, vm, NULL); if (IS_ERR(vma)) continue;
if (p->step < 0) { - if (offset < hole_start + obj->base.size) + if (offset < hole_start + aligned_size) break; - offset -= obj->base.size; + offset -= aligned_size; }
if (!drm_mm_node_allocated(&vma->node) || @@ -506,22 +517,25 @@ static int fill_hole(struct i915_address_space *vm, }
if (p->step > 0) { - if (offset + obj->base.size > hole_end) + if (offset + aligned_size > hole_end) break; - offset += obj->base.size; + offset += aligned_size; } }
offset = p->offset; list_for_each_entry_reverse(obj, &objects, st_link) { + u64 aligned_size = round_up(obj->base.size, + min_alignment); + vma = i915_vma_instance(obj, vm, NULL); if (IS_ERR(vma)) continue;
if (p->step < 0) { - if (offset < hole_start + obj->base.size) + if (offset < hole_start + aligned_size) break; - offset -= obj->base.size; + offset -= aligned_size; }
err = i915_vma_pin(vma, 0, 0, offset | flags); @@ -543,22 +557,25 @@ static int fill_hole(struct i915_address_space *vm, i915_vma_unpin(vma);
if (p->step > 0) { - if (offset + obj->base.size > hole_end) + if (offset + aligned_size > hole_end) break; - offset += obj->base.size; + offset += aligned_size; } }
offset = p->offset; list_for_each_entry_reverse(obj, &objects, st_link) { + u64 aligned_size = round_up(obj->base.size, + min_alignment); + vma = i915_vma_instance(obj, vm, NULL); if (IS_ERR(vma)) continue;
if (p->step < 0) { - if (offset < hole_start + obj->base.size) + if (offset < hole_start + aligned_size) break; - offset -= obj->base.size; + offset -= aligned_size; }
if (!drm_mm_node_allocated(&vma->node) || @@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space *vm, }
if (p->step > 0) { - if (offset + obj->base.size > hole_end) + if (offset + aligned_size > hole_end) break; - offset += obj->base.size; + offset += aligned_size; } } } @@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space *vm, const u64 hole_size = hole_end - hole_start; const unsigned long max_pages = min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT); + unsigned long min_alignment; unsigned long flags; u64 size;
@@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space *vm, if (i915_is_ggtt(vm)) flags |= PIN_GLOBAL;
+ min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM); + for_each_prime_number_from(size, 1, max_pages) { struct drm_i915_gem_object *obj; struct i915_vma *vma; @@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space *vm,
for (addr = hole_start; addr + obj->base.size < hole_end; - addr += obj->base.size) { + addr += round_up(obj->base.size, min_alignment)) { err = i915_vma_pin(vma, 0, 0, addr | flags); if (err) { pr_err("%s bind failed at %llx + %llx [hole %llx- %llx] with err=%d\n", @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm, { struct drm_i915_gem_object *obj; struct i915_vma *vma; + unsigned int min_alignment; unsigned long flags; unsigned int pot; int err = 0; @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm, if (i915_is_ggtt(vm)) flags |= PIN_GLOBAL;
+ min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM); + obj = i915_gem_object_create_internal(vm->i915, 2 * I915_GTT_PAGE_SIZE); if (IS_ERR(obj)) return PTR_ERR(obj); @@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space *vm,
/* Insert a pair of pages across every pot boundary within the hole */ for (pot = fls64(hole_end - 1) - 1; - pot > ilog2(2 * I915_GTT_PAGE_SIZE); + pot > ilog2(2 * min_alignment); pot--) { u64 step = BIT_ULL(pot); u64 addr;
- for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE; - addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE; + for (addr = round_up(hole_start + min_alignment, step) - min_alignment; + addr <= round_down(hole_end - (2 * min_alignment), step) - min_alignment; addr += step) { err = i915_vma_pin(vma, 0, 0, addr | flags); if (err) { @@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space *vm, unsigned long end_time) { I915_RND_STATE(prng); + unsigned int min_alignment; unsigned int size; unsigned long flags;
@@ -768,15 +792,18 @@ static int drunk_hole(struct i915_address_space *vm, if (i915_is_ggtt(vm)) flags |= PIN_GLOBAL;
+ min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM); + /* Keep creating larger objects until one cannot fit into the hole */ for (size = 12; (hole_end - hole_start) >> size; size++) { struct drm_i915_gem_object *obj; unsigned int *order, count, n; struct i915_vma *vma; - u64 hole_size; + u64 hole_size, aligned_size; int err = -ENODEV;
- hole_size = (hole_end - hole_start) >> size; + aligned_size = max_t(u32, ilog2(min_alignment), size); + hole_size = (hole_end - hole_start) >> aligned_size; if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32)) hole_size = KMALLOC_MAX_SIZE / sizeof(u32); count = hole_size >> 1; @@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space *vm, GEM_BUG_ON(vma->size != BIT_ULL(size));
for (n = 0; n < count; n++) { - u64 addr = hole_start + order[n] * BIT_ULL(size); + u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
err = i915_vma_pin(vma, 0, 0, addr | flags); if (err) { @@ -868,11 +895,14 @@ static int __shrink_hole(struct i915_address_space *vm, { struct drm_i915_gem_object *obj; unsigned long flags = PIN_OFFSET_FIXED | PIN_USER; + unsigned int min_alignment; unsigned int order = 12; LIST_HEAD(objects); int err = 0; u64 addr;
+ min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM); + /* Keep creating larger objects until one cannot fit into the hole */ for (addr = hole_start; addr < hole_end; ) { struct i915_vma *vma; @@ -913,7 +943,7 @@ static int __shrink_hole(struct i915_address_space *vm, }
i915_vma_unpin(vma); - addr += size; + addr += round_up(size, min_alignment);
/* * Since we are injecting allocation faults at random intervals,
From: Matthew Auld matthew.auld@intel.com
discrete cards optimise 64K GTT pages for local-memory, since everything should be allocated at 64K granularity. We say goodbye to sparse entries, and instead get a compact 256B page-table for 64K pages, which should be more cache friendly. 4K pages for local-memory are no longer supported by the HW.
v4: don't return uninitialized err in igt_ppgtt_compact Reported-by: kernel test robot lkp@intel.com
Signed-off-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Stuart Summers stuart.summers@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Robert Beckett bob.beckett@collabora.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com --- .../gpu/drm/i915/gem/selftests/huge_pages.c | 60 ++++++++++ drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 108 +++++++++++++++++- drivers/gpu/drm/i915/gt/intel_gtt.h | 3 + drivers/gpu/drm/i915/gt/intel_ppgtt.c | 1 + 4 files changed, 169 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c index f36191ebf964..a7d9bdb85d70 100644 --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c @@ -1478,6 +1478,65 @@ static int igt_ppgtt_sanity_check(void *arg) return err; }
+static int igt_ppgtt_compact(void *arg) +{ + struct drm_i915_private *i915 = arg; + struct drm_i915_gem_object *obj; + int err; + + /* + * Simple test to catch issues with compact 64K pages -- since the pt is + * compacted to 256B that gives us 32 entries per pt, however since the + * backing page for the pt is 4K, any extra entries we might incorrectly + * write out should be ignored by the HW. If ever hit such a case this + * test should catch it since some of our writes would land in scratch. + */ + + if (!HAS_64K_PAGES(i915)) { + pr_info("device lacks compact 64K page support, skipping\n"); + return 0; + } + + if (!HAS_LMEM(i915)) { + pr_info("device lacks LMEM support, skipping\n"); + return 0; + } + + /* We want the range to cover multiple page-table boundaries. */ + obj = i915_gem_object_create_lmem(i915, SZ_4M, 0); + if (IS_ERR(obj)) + return PTR_ERR(obj); + + err = i915_gem_object_pin_pages_unlocked(obj); + if (err) + goto out_put; + + if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) { + pr_info("LMEM compact unable to allocate huge-page(s)\n"); + goto out_unpin; + } + + /* + * Disable 2M GTT pages by forcing the page-size to 64K for the GTT + * insertion. + */ + obj->mm.page_sizes.sg = I915_GTT_PAGE_SIZE_64K; + + err = igt_write_huge(i915, obj); + if (err) + pr_err("LMEM compact write-huge failed\n"); + +out_unpin: + i915_gem_object_unpin_pages(obj); +out_put: + i915_gem_object_put(obj); + + if (err == -ENOMEM) + err = 0; + + return err; +} + static int igt_tmpfs_fallback(void *arg) { struct drm_i915_private *i915 = arg; @@ -1735,6 +1794,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *i915) SUBTEST(igt_tmpfs_fallback), SUBTEST(igt_ppgtt_smoke_huge), SUBTEST(igt_ppgtt_sanity_check), + SUBTEST(igt_ppgtt_compact), };
if (!HAS_PPGTT(i915)) { diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index c43e724afa9f..62471730266c 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -233,6 +233,8 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm, start, end, lvl); } else { unsigned int count; + unsigned int pte = gen8_pd_index(start, 0); + unsigned int num_ptes; u64 *vaddr;
count = gen8_pt_count(start, end); @@ -242,10 +244,18 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm, atomic_read(&pt->used)); GEM_BUG_ON(!count || count >= atomic_read(&pt->used));
+ num_ptes = count; + if (pt->is_compact) { + GEM_BUG_ON(num_ptes % 16); + GEM_BUG_ON(pte % 16); + num_ptes /= 16; + pte /= 16; + } + vaddr = px_vaddr(pt); - memset64(vaddr + gen8_pd_index(start, 0), + memset64(vaddr + pte, vm->scratch[0]->encode, - count); + num_ptes);
atomic_sub(count, &pt->used); start += count; @@ -453,6 +463,95 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt, return idx; }
+static void +xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm, + struct i915_vma_resource *vma_res, + struct sgt_dma *iter, + enum i915_cache_level cache_level, + u32 flags) +{ + const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags); + unsigned int rem = sg_dma_len(iter->sg); + u64 start = vma_res->start; + + GEM_BUG_ON(!i915_vm_is_4lvl(vm)); + + do { + struct i915_page_directory * const pdp = + gen8_pdp_for_page_address(vm, start); + struct i915_page_directory * const pd = + i915_pd_entry(pdp, __gen8_pte_index(start, 2)); + struct i915_page_table *pt = + i915_pt_entry(pd, __gen8_pte_index(start, 1)); + gen8_pte_t encode = pte_encode; + unsigned int page_size; + gen8_pte_t *vaddr; + u16 index, max; + + max = I915_PDES; + + if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_2M && + IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) && + rem >= I915_GTT_PAGE_SIZE_2M && + !__gen8_pte_index(start, 0)) { + index = __gen8_pte_index(start, 1); + encode |= GEN8_PDE_PS_2M; + page_size = I915_GTT_PAGE_SIZE_2M; + + vaddr = px_vaddr(pd); + } else { + if (encode & GEN12_PPGTT_PTE_LM) { + GEM_BUG_ON(__gen8_pte_index(start, 0) % 16); + GEM_BUG_ON(rem < I915_GTT_PAGE_SIZE_64K); + GEM_BUG_ON(!IS_ALIGNED(iter->dma, + I915_GTT_PAGE_SIZE_64K)); + + index = __gen8_pte_index(start, 0) / 16; + page_size = I915_GTT_PAGE_SIZE_64K; + + max /= 16; + + vaddr = px_vaddr(pd); + vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K; + + pt->is_compact = true; + } else { + GEM_BUG_ON(pt->is_compact); + index = __gen8_pte_index(start, 0); + page_size = I915_GTT_PAGE_SIZE; + } + + vaddr = px_vaddr(pt); + } + + do { + GEM_BUG_ON(rem < page_size); + vaddr[index++] = encode | iter->dma; + + start += page_size; + iter->dma += page_size; + rem -= page_size; + if (iter->dma >= iter->max) { + iter->sg = __sg_next(iter->sg); + if (!iter->sg) + break; + + rem = sg_dma_len(iter->sg); + if (!rem) + break; + + iter->dma = sg_dma_address(iter->sg); + iter->max = iter->dma + rem; + + if (unlikely(!IS_ALIGNED(iter->dma, page_size))) + break; + } + } while (rem >= page_size && index < max); + + vma_res->page_sizes_gtt |= page_size; + } while (iter->sg && sg_dma_len(iter->sg)); +} + static void gen8_ppgtt_insert_huge(struct i915_address_space *vm, struct i915_vma_resource *vma_res, struct sgt_dma *iter, @@ -586,7 +685,10 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm, struct sgt_dma iter = sgt_dma(vma_res);
if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) { - gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags); + if (HAS_64K_PAGES(vm->i915)) + xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags); + else + gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags); } else { u64 idx = vma_res->start >> GEN8_PTE_SHIFT;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index ba9f040f8606..e6ce0be6d484 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -92,6 +92,8 @@ typedef u64 gen8_pte_t;
#define GEN12_GGTT_PTE_LM BIT_ULL(1)
+#define GEN12_PDE_64K BIT(6) + /* * Cacheability Control is a 4-bit value. The low three bits are stored in bits * 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE. @@ -160,6 +162,7 @@ struct i915_page_table { atomic_t used; struct i915_page_table *stash; }; + bool is_compact; };
struct i915_page_directory { diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c index 48e6e2f87700..043652dc6892 100644 --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c @@ -26,6 +26,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm) return ERR_PTR(-ENOMEM); }
+ pt->is_compact = false; atomic_set(&pt->used, 0); return pt; }
From: Robert Beckett bob.beckett@collabora.com
add test to check handling of misaligned offsets and sizes
v4: * remove spurious blank lines * explicitly cast intel_region_id to intel_memory_type in misaligned_pin Reported-by: kernel test robot lkp@intel.com v6: * use NEEDS_COMPACT_PT instead of hard coding for DG2
Signed-off-by: Robert Beckett bob.beckett@collabora.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com --- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 128 ++++++++++++++++++ 1 file changed, 128 insertions(+)
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index b80788a2b7f9..c23b1e5cc436 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -22,10 +22,12 @@ * */
+#include "gt/intel_gtt.h" #include <linux/list_sort.h> #include <linux/prime_numbers.h>
#include "gem/i915_gem_context.h" +#include "gem/i915_gem_region.h" #include "gem/selftests/mock_context.h" #include "gt/intel_context.h" #include "gt/intel_gpu_commands.h" @@ -1067,6 +1069,120 @@ static int shrink_boom(struct i915_address_space *vm, return err; }
+static int misaligned_case(struct i915_address_space *vm, struct intel_memory_region *mr, + u64 addr, u64 size, unsigned long flags) +{ + struct drm_i915_gem_object *obj; + struct i915_vma *vma; + int err = 0; + u64 expected_vma_size, expected_node_size; + + obj = i915_gem_object_create_region(mr, size, 0, 0); + if (IS_ERR(obj)) + return PTR_ERR(obj); + + vma = i915_vma_instance(obj, vm, NULL); + if (IS_ERR(vma)) { + err = PTR_ERR(vma); + goto err_put; + } + + err = i915_vma_pin(vma, 0, 0, addr | flags); + if (err) + goto err_put; + i915_vma_unpin(vma); + + if (!drm_mm_node_allocated(&vma->node)) { + err = -EINVAL; + goto err_put; + } + + if (i915_vma_misplaced(vma, 0, 0, addr | flags)) { + err = -EINVAL; + goto err_put; + } + + expected_vma_size = round_up(size, 1 << (ffs(vma->resource->page_sizes_gtt) - 1)); + expected_node_size = expected_vma_size; + + if (NEEDS_COMPACT_PT(vm->i915) && i915_gem_object_is_lmem(obj)) { + /* compact-pt should expand lmem node to 2MB */ + expected_vma_size = round_up(size, I915_GTT_PAGE_SIZE_64K); + expected_node_size = round_up(size, I915_GTT_PAGE_SIZE_2M); + } + + if (vma->size != expected_vma_size || vma->node.size != expected_node_size) { + err = i915_vma_unbind(vma); + err = -EBADSLT; + goto err_put; + } + + err = i915_vma_unbind(vma); + if (err) + goto err_put; + + GEM_BUG_ON(drm_mm_node_allocated(&vma->node)); + +err_put: + i915_gem_object_put(obj); + cleanup_freed_objects(vm->i915); + return err; +} + +static int misaligned_pin(struct i915_address_space *vm, + u64 hole_start, u64 hole_end, + unsigned long end_time) +{ + struct intel_memory_region *mr; + enum intel_region_id id; + unsigned long flags = PIN_OFFSET_FIXED | PIN_USER; + int err = 0; + u64 hole_size = hole_end - hole_start; + + if (i915_is_ggtt(vm)) + flags |= PIN_GLOBAL; + + for_each_memory_region(mr, vm->i915, id) { + u64 min_alignment = i915_vm_min_alignment(vm, (enum intel_memory_type)id); + u64 size = min_alignment; + u64 addr = round_up(hole_start + (hole_size / 2), min_alignment); + + /* we can't test < 4k alignment due to flags being encoded in lower bits */ + if (min_alignment != I915_GTT_PAGE_SIZE_4K) { + err = misaligned_case(vm, mr, addr + (min_alignment / 2), size, flags); + /* misaligned should error with -EINVAL*/ + if (!err) + err = -EBADSLT; + if (err != -EINVAL) + return err; + } + + /* test for vma->size expansion to min page size */ + err = misaligned_case(vm, mr, addr, PAGE_SIZE, flags); + if (min_alignment > hole_size) { + if (!err) + err = -EBADSLT; + else if (err == -ENOSPC) + err = 0; + } + if (err) + return err; + + /* test for intermediate size not expanding vma->size for large alignments */ + err = misaligned_case(vm, mr, addr, size / 2, flags); + if (min_alignment > hole_size) { + if (!err) + err = -EBADSLT; + else if (err == -ENOSPC) + err = 0; + } + if (err) + return err; + } + + return 0; +} + static int exercise_ppgtt(struct drm_i915_private *dev_priv, int (*func)(struct i915_address_space *vm, u64 hole_start, u64 hole_end, @@ -1136,6 +1252,11 @@ static int igt_ppgtt_shrink_boom(void *arg) return exercise_ppgtt(arg, shrink_boom); }
+static int igt_ppgtt_misaligned_pin(void *arg) +{ + return exercise_ppgtt(arg, misaligned_pin); +} + static int sort_holes(void *priv, const struct list_head *A, const struct list_head *B) { @@ -1208,6 +1329,11 @@ static int igt_ggtt_lowlevel(void *arg) return exercise_ggtt(arg, lowlevel_hole); }
+static int igt_ggtt_misaligned_pin(void *arg) +{ + return exercise_ggtt(arg, misaligned_pin); +} + static int igt_ggtt_page(void *arg) { const unsigned int count = PAGE_SIZE/sizeof(u32); @@ -2180,12 +2306,14 @@ int i915_gem_gtt_live_selftests(struct drm_i915_private *i915) SUBTEST(igt_ppgtt_fill), SUBTEST(igt_ppgtt_shrink), SUBTEST(igt_ppgtt_shrink_boom), + SUBTEST(igt_ppgtt_misaligned_pin), SUBTEST(igt_ggtt_lowlevel), SUBTEST(igt_ggtt_drunk), SUBTEST(igt_ggtt_walk), SUBTEST(igt_ggtt_pot), SUBTEST(igt_ggtt_fill), SUBTEST(igt_ggtt_page), + SUBTEST(igt_ggtt_misaligned_pin), SUBTEST(igt_cs_tlb), };
From: Matthew Auld matthew.auld@intel.com
On some platforms we have alignment restrictions when accessing LMEM from the GTT. In the next patch few patches we need to be able to modify the page-tables directly via the GTT itself.
Suggested-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Ramalingam C ramalingam.c@intel.com --- drivers/gpu/drm/i915/gt/intel_gtt.h | 10 +++++++++- drivers/gpu/drm/i915/gt/intel_ppgtt.c | 16 ++++++++++++---- 2 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index e6ce0be6d484..2c62411eb52c 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -200,6 +200,14 @@ void *__px_vaddr(struct drm_i915_gem_object *p); struct i915_vm_pt_stash { /* preallocated chains of page tables/directories */ struct i915_page_table *pt[2]; + /* + * Optionally override the alignment/size of the physical page that + * contains each PT. If not set defaults back to the usual + * I915_GTT_PAGE_SIZE_4K. This does not influence the other paging + * structures. MUST be a power-of-two. ONLY applicable on discrete + * platforms. + */ + int pt_sz; };
struct i915_vma_ops { @@ -591,7 +599,7 @@ void free_scratch(struct i915_address_space *vm);
struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz); struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz); -struct i915_page_table *alloc_pt(struct i915_address_space *vm); +struct i915_page_table *alloc_pt(struct i915_address_space *vm, int sz); struct i915_page_directory *alloc_pd(struct i915_address_space *vm); struct i915_page_directory *__alloc_pd(int npde);
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c index 043652dc6892..d91e2beb7517 100644 --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c @@ -12,7 +12,7 @@ #include "gen6_ppgtt.h" #include "gen8_ppgtt.h"
-struct i915_page_table *alloc_pt(struct i915_address_space *vm) +struct i915_page_table *alloc_pt(struct i915_address_space *vm, int sz) { struct i915_page_table *pt;
@@ -20,7 +20,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm) if (unlikely(!pt)) return ERR_PTR(-ENOMEM);
- pt->base = vm->alloc_pt_dma(vm, I915_GTT_PAGE_SIZE_4K); + pt->base = vm->alloc_pt_dma(vm, sz); if (IS_ERR(pt->base)) { kfree(pt); return ERR_PTR(-ENOMEM); @@ -221,17 +221,25 @@ int i915_vm_alloc_pt_stash(struct i915_address_space *vm, u64 size) { unsigned long count; - int shift, n; + int shift, n, pt_sz;
shift = vm->pd_shift; if (!shift) return 0;
+ pt_sz = stash->pt_sz; + if (!pt_sz) + pt_sz = I915_GTT_PAGE_SIZE_4K; + else + GEM_BUG_ON(!IS_DGFX(vm->i915)); + + GEM_BUG_ON(!is_power_of_2(pt_sz)); + count = pd_count(size, shift); while (count--) { struct i915_page_table *pt;
- pt = alloc_pt(vm); + pt = alloc_pt(vm, pt_sz); if (IS_ERR(pt)) { i915_vm_free_pt_stash(vm, stash); return PTR_ERR(pt);
From: Matthew Auld matthew.auld@intel.com
If this is LMEM then we get a 32 entry PT, with each PTE pointing to some 64K block of memory, otherwise it's just the usual 512 entry PT. This very much assumes the caller knows what they are doing.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Ramalingam C ramalingam.c@intel.com Reviewed-by: Ramalingam C ramalingam.c@intel.com --- drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 50 ++++++++++++++++++++++++++-- 1 file changed, 48 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index 62471730266c..f574da00eff1 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -715,13 +715,56 @@ static void gen8_ppgtt_insert_entry(struct i915_address_space *vm, gen8_pdp_for_page_index(vm, idx); struct i915_page_directory *pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2)); + struct i915_page_table *pt = i915_pt_entry(pd, gen8_pd_index(idx, 1)); gen8_pte_t *vaddr;
- vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1))); + GEM_BUG_ON(pt->is_compact); + + vaddr = px_vaddr(pt); vaddr[gen8_pd_index(idx, 0)] = gen8_pte_encode(addr, level, flags); clflush_cache_range(&vaddr[gen8_pd_index(idx, 0)], sizeof(*vaddr)); }
+static void __xehpsdv_ppgtt_insert_entry_lm(struct i915_address_space *vm, + dma_addr_t addr, + u64 offset, + enum i915_cache_level level, + u32 flags) +{ + u64 idx = offset >> GEN8_PTE_SHIFT; + struct i915_page_directory * const pdp = + gen8_pdp_for_page_index(vm, idx); + struct i915_page_directory *pd = + i915_pd_entry(pdp, gen8_pd_index(idx, 2)); + struct i915_page_table *pt = i915_pt_entry(pd, gen8_pd_index(idx, 1)); + gen8_pte_t *vaddr; + + GEM_BUG_ON(!IS_ALIGNED(addr, SZ_64K)); + GEM_BUG_ON(!IS_ALIGNED(offset, SZ_64K)); + + if (!pt->is_compact) { + vaddr = px_vaddr(pd); + vaddr[gen8_pd_index(idx, 1)] |= GEN12_PDE_64K; + pt->is_compact = true; + } + + vaddr = px_vaddr(pt); + vaddr[gen8_pd_index(idx, 0) / 16] = gen8_pte_encode(addr, level, flags); +} + +static void xehpsdv_ppgtt_insert_entry(struct i915_address_space *vm, + dma_addr_t addr, + u64 offset, + enum i915_cache_level level, + u32 flags) +{ + if (flags & PTE_LM) + return __xehpsdv_ppgtt_insert_entry_lm(vm, addr, offset, + level, flags); + + return gen8_ppgtt_insert_entry(vm, addr, offset, level, flags); +} + static int gen8_init_scratch(struct i915_address_space *vm) { u32 pte_flags; @@ -921,7 +964,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt,
ppgtt->vm.bind_async_flags = I915_VMA_LOCAL_BIND; ppgtt->vm.insert_entries = gen8_ppgtt_insert; - ppgtt->vm.insert_page = gen8_ppgtt_insert_entry; + if (HAS_64K_PAGES(gt->i915)) + ppgtt->vm.insert_page = xehpsdv_ppgtt_insert_entry; + else + ppgtt->vm.insert_page = gen8_ppgtt_insert_entry; ppgtt->vm.allocate_va_range = gen8_ppgtt_alloc; ppgtt->vm.clear_range = gen8_ppgtt_clear; ppgtt->vm.foreach = gen8_ppgtt_foreach;
From: Matthew Auld matthew.auld@intel.com
This is all kinds of awkward since we now have to contend with using 64K GTT pages when mapping anything in LMEM(including the page-tables themselves).
v2: Rebased [Ram]
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Ramalingam C ramalingam.c@intel.com --- drivers/gpu/drm/i915/gt/intel_migrate.c | 179 +++++++++++++++++++----- 1 file changed, 147 insertions(+), 32 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index 18b44af56969..cac791155244 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -32,6 +32,38 @@ static bool engine_supports_migration(struct intel_engine_cs *engine) return true; }
+static void xehpsdv_toggle_pdes(struct i915_address_space *vm, + struct i915_page_table *pt, + void *data) +{ + struct insert_pte_data *d = data; + + /* + * Insert a dummy PTE into every PT that will map to LMEM to ensure + * we have a correctly setup PDE structure for later use. + */ + vm->insert_page(vm, 0, d->offset, I915_CACHE_NONE, PTE_LM); + GEM_BUG_ON(!pt->is_compact); + d->offset += SZ_2M; +} + +static void xehpsdv_insert_pte(struct i915_address_space *vm, + struct i915_page_table *pt, + void *data) +{ + struct insert_pte_data *d = data; + + /* + * We are playing tricks here, since the actual pt, from the hw + * pov, is only 256bytes with 32 entries, or 4096bytes with 512 + * entries, but we are still guaranteed that the physical + * alignment is 64K underneath for the pt, and we are careful + * not to access the space in the void. + */ + vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE, PTE_LM); + d->offset += SZ_64K; +} + static void insert_pte(struct i915_address_space *vm, struct i915_page_table *pt, void *data) @@ -74,7 +106,12 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt) * i.e. within the same non-preemptible window so that we do not switch * to another migration context that overwrites the PTE. * - * TODO: Add support for huge LMEM PTEs + * On platforms with HAS_64K_PAGES support we have three windows, and + * dedicate two windows just for mapping lmem pages(smem <-> smem is not + * a thing), since we are forced to use 64K GTT pages underneath which + * requires also modifying the PDE. An alternative might be to instead + * map the PD into the GTT, and then on the fly toggle the 4K/64K mode + * in the PDE from the same batch that also modifies the PTEs. */
vm = i915_ppgtt_create(gt, I915_BO_ALLOC_PM_EARLY); @@ -86,6 +123,9 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt) goto err_vm; }
+ if (HAS_64K_PAGES(gt->i915)) + stash.pt_sz = I915_GTT_PAGE_SIZE_64K; + /* * Each engine instance is assigned its own chunk in the VM, so * that we can run multiple instances concurrently @@ -105,14 +145,20 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt) * We copy in 8MiB chunks. Each PDE covers 2MiB, so we need * 4x2 page directories for source/destination. */ - sz = 2 * CHUNK_SZ; + if (HAS_64K_PAGES(gt->i915)) + sz = 3 * CHUNK_SZ; + else + sz = 2 * CHUNK_SZ; d.offset = base + sz;
/* * We need another page directory setup so that we can write * the 8x512 PTE in each chunk. */ - sz += (sz >> 12) * sizeof(u64); + if (HAS_64K_PAGES(gt->i915)) + sz += (sz / SZ_2M) * SZ_64K; + else + sz += (sz >> 12) * sizeof(u64);
err = i915_vm_alloc_pt_stash(&vm->vm, &stash, sz); if (err) @@ -133,7 +179,18 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt) goto err_vm;
/* Now allow the GPU to rewrite the PTE via its own ppGTT */ - vm->vm.foreach(&vm->vm, base, d.offset - base, insert_pte, &d); + if (HAS_64K_PAGES(gt->i915)) { + vm->vm.foreach(&vm->vm, base, d.offset - base, + xehpsdv_insert_pte, &d); + d.offset = base + CHUNK_SZ; + vm->vm.foreach(&vm->vm, + d.offset, + 2 * CHUNK_SZ, + xehpsdv_toggle_pdes, &d); + } else { + vm->vm.foreach(&vm->vm, base, d.offset - base, + insert_pte, &d); + } }
return &vm->vm; @@ -269,19 +326,38 @@ static int emit_pte(struct i915_request *rq, u64 offset, int length) { + bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915); const u64 encode = rq->context->vm->pte_encode(0, cache_level, is_lmem ? PTE_LM : 0); struct intel_ring *ring = rq->ring; - int total = 0; + int pkt, dword_length; + u32 total = 0; + u32 page_size; u32 *hdr, *cs; - int pkt;
GEM_BUG_ON(GRAPHICS_VER(rq->engine->i915) < 8);
+ page_size = I915_GTT_PAGE_SIZE; + dword_length = 0x400; + /* Compute the page directory offset for the target address range */ - offset >>= 12; - offset *= sizeof(u64); - offset += 2 * CHUNK_SZ; + if (has_64K_pages) { + GEM_BUG_ON(!IS_ALIGNED(offset, SZ_2M)); + + offset /= SZ_2M; + offset *= SZ_64K; + offset += 3 * CHUNK_SZ; + + if (is_lmem) { + page_size = I915_GTT_PAGE_SIZE_64K; + dword_length = 0x40; + } + } else { + offset >>= 12; + offset *= sizeof(u64); + offset += 2 * CHUNK_SZ; + } + offset += (u64)rq->engine->instance << 32;
cs = intel_ring_begin(rq, 6); @@ -289,7 +365,7 @@ static int emit_pte(struct i915_request *rq, return PTR_ERR(cs);
/* Pack as many PTE updates as possible into a single MI command */ - pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5); + pkt = min_t(int, dword_length, ring->space / sizeof(u32) + 5); pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5);
hdr = cs; @@ -299,6 +375,8 @@ static int emit_pte(struct i915_request *rq,
do { if (cs - hdr >= pkt) { + int dword_rem; + *hdr += cs - hdr - 2; *cs++ = MI_NOOP;
@@ -310,7 +388,18 @@ static int emit_pte(struct i915_request *rq, if (IS_ERR(cs)) return PTR_ERR(cs);
- pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5); + dword_rem = dword_length; + if (has_64K_pages) { + if (IS_ALIGNED(total, SZ_2M)) { + offset = round_up(offset, SZ_64K); + } else { + dword_rem = SZ_2M - (total & (SZ_2M - 1)); + dword_rem /= page_size; + dword_rem *= 2; + } + } + + pkt = min_t(int, dword_rem, ring->space / sizeof(u32) + 5); pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5);
hdr = cs; @@ -319,13 +408,15 @@ static int emit_pte(struct i915_request *rq, *cs++ = upper_32_bits(offset); }
+ GEM_BUG_ON(!IS_ALIGNED(it->dma, page_size)); + *cs++ = lower_32_bits(encode | it->dma); *cs++ = upper_32_bits(encode | it->dma);
offset += 8; - total += I915_GTT_PAGE_SIZE; + total += page_size;
- it->dma += I915_GTT_PAGE_SIZE; + it->dma += page_size; if (it->dma >= it->max) { it->sg = __sg_next(it->sg); if (!it->sg || sg_dma_len(it->sg) == 0) @@ -356,7 +447,8 @@ static bool wa_1209644611_applies(int ver, u32 size) return height % 4 == 3 && height <= 8; }
-static int emit_copy(struct i915_request *rq, int size) +static int emit_copy(struct i915_request *rq, + u32 dst_offset, u32 src_offset, int size) { const int ver = GRAPHICS_VER(rq->engine->i915); u32 instance = rq->engine->instance; @@ -371,31 +463,31 @@ static int emit_copy(struct i915_request *rq, int size) *cs++ = BLT_DEPTH_32 | PAGE_SIZE; *cs++ = 0; *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4; - *cs++ = CHUNK_SZ; /* dst offset */ + *cs++ = dst_offset; *cs++ = instance; *cs++ = 0; *cs++ = PAGE_SIZE; - *cs++ = 0; /* src offset */ + *cs++ = src_offset; *cs++ = instance; } else if (ver >= 8) { *cs++ = XY_SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (10 - 2); *cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE; *cs++ = 0; *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4; - *cs++ = CHUNK_SZ; /* dst offset */ + *cs++ = dst_offset; *cs++ = instance; *cs++ = 0; *cs++ = PAGE_SIZE; - *cs++ = 0; /* src offset */ + *cs++ = src_offset; *cs++ = instance; } else { GEM_BUG_ON(instance); *cs++ = SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (6 - 2); *cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE; *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE; - *cs++ = CHUNK_SZ; /* dst offset */ + *cs++ = dst_offset; *cs++ = PAGE_SIZE; - *cs++ = 0; /* src offset */ + *cs++ = src_offset; }
intel_ring_advance(rq, cs); @@ -423,6 +515,7 @@ intel_context_migrate_copy(struct intel_context *ce, GEM_BUG_ON(ce->ring->size < SZ_64K);
do { + u32 src_offset, dst_offset; int len;
rq = i915_request_create(ce); @@ -450,15 +543,28 @@ intel_context_migrate_copy(struct intel_context *ce, if (err) goto out_rq;
- len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem, 0, - CHUNK_SZ); + src_offset = 0; + dst_offset = CHUNK_SZ; + if (HAS_64K_PAGES(ce->engine->i915)) { + GEM_BUG_ON(!src_is_lmem && !dst_is_lmem); + + src_offset = 0; + dst_offset = 0; + if (src_is_lmem) + src_offset = CHUNK_SZ; + if (dst_is_lmem) + dst_offset = 2 * CHUNK_SZ; + } + + len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem, + src_offset, CHUNK_SZ); if (len <= 0) { err = len; goto out_rq; }
err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem, - CHUNK_SZ, len); + dst_offset, len); if (err < 0) goto out_rq; if (err < len) { @@ -470,7 +576,7 @@ intel_context_migrate_copy(struct intel_context *ce, if (err) goto out_rq;
- err = emit_copy(rq, len); + err = emit_copy(rq, dst_offset, src_offset, len);
/* Arbitration is re-enabled between requests. */ out_rq: @@ -488,14 +594,18 @@ intel_context_migrate_copy(struct intel_context *ce, return err; }
-static int emit_clear(struct i915_request *rq, int size, u32 value) +static int emit_clear(struct i915_request *rq, + u64 offset, + int size, + u32 value) { const int ver = GRAPHICS_VER(rq->engine->i915); - u32 instance = rq->engine->instance; u32 *cs;
GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
+ offset += (u64)rq->engine->instance << 32; + cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6); if (IS_ERR(cs)) return PTR_ERR(cs); @@ -505,17 +615,17 @@ static int emit_clear(struct i915_request *rq, int size, u32 value) *cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE; *cs++ = 0; *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4; - *cs++ = 0; /* offset */ - *cs++ = instance; + *cs++ = lower_32_bits(offset); + *cs++ = upper_32_bits(offset); *cs++ = value; *cs++ = MI_NOOP; } else { - GEM_BUG_ON(instance); + GEM_BUG_ON(upper_32_bits(offset)); *cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (6 - 2); *cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE; *cs++ = 0; *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4; - *cs++ = 0; + *cs++ = lower_32_bits(offset); *cs++ = value; }
@@ -542,6 +652,7 @@ intel_context_migrate_clear(struct intel_context *ce, GEM_BUG_ON(ce->ring->size < SZ_64K);
do { + u32 offset; int len;
rq = i915_request_create(ce); @@ -569,7 +680,11 @@ intel_context_migrate_clear(struct intel_context *ce, if (err) goto out_rq;
- len = emit_pte(rq, &it, cache_level, is_lmem, 0, CHUNK_SZ); + offset = 0; + if (HAS_64K_PAGES(ce->engine->i915) && is_lmem) + offset = CHUNK_SZ; + + len = emit_pte(rq, &it, cache_level, is_lmem, offset, CHUNK_SZ); if (len <= 0) { err = len; goto out_rq; @@ -579,7 +694,7 @@ intel_context_migrate_clear(struct intel_context *ce, if (err) goto out_rq;
- err = emit_clear(rq, len, value); + err = emit_clear(rq, offset, len, value);
/* Arbitration is re-enabled between requests. */ out_rq:
On 01/02/2022 10:41, Ramalingam C wrote:
From: Matthew Auld matthew.auld@intel.com
This is all kinds of awkward since we now have to contend with using 64K GTT pages when mapping anything in LMEM(including the page-tables themselves).
v2: Rebased [Ram]
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Ramalingam C ramalingam.c@intel.com
This version seems to be missing your review feedback, which I incorporated here[1].
[1] https://patchwork.freedesktop.org/series/97544/
drivers/gpu/drm/i915/gt/intel_migrate.c | 179 +++++++++++++++++++----- 1 file changed, 147 insertions(+), 32 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index 18b44af56969..cac791155244 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -32,6 +32,38 @@ static bool engine_supports_migration(struct intel_engine_cs *engine) return true; }
+static void xehpsdv_toggle_pdes(struct i915_address_space *vm,
struct i915_page_table *pt,
void *data)
+{
- struct insert_pte_data *d = data;
- /*
* Insert a dummy PTE into every PT that will map to LMEM to ensure
* we have a correctly setup PDE structure for later use.
*/
- vm->insert_page(vm, 0, d->offset, I915_CACHE_NONE, PTE_LM);
- GEM_BUG_ON(!pt->is_compact);
- d->offset += SZ_2M;
+}
+static void xehpsdv_insert_pte(struct i915_address_space *vm,
struct i915_page_table *pt,
void *data)
+{
- struct insert_pte_data *d = data;
- /*
* We are playing tricks here, since the actual pt, from the hw
* pov, is only 256bytes with 32 entries, or 4096bytes with 512
* entries, but we are still guaranteed that the physical
* alignment is 64K underneath for the pt, and we are careful
* not to access the space in the void.
*/
- vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE, PTE_LM);
- d->offset += SZ_64K;
+}
- static void insert_pte(struct i915_address_space *vm, struct i915_page_table *pt, void *data)
@@ -74,7 +106,12 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt) * i.e. within the same non-preemptible window so that we do not switch * to another migration context that overwrites the PTE. *
* TODO: Add support for huge LMEM PTEs
* On platforms with HAS_64K_PAGES support we have three windows, and
* dedicate two windows just for mapping lmem pages(smem <-> smem is not
* a thing), since we are forced to use 64K GTT pages underneath which
* requires also modifying the PDE. An alternative might be to instead
* map the PD into the GTT, and then on the fly toggle the 4K/64K mode
* in the PDE from the same batch that also modifies the PTEs.
*/
vm = i915_ppgtt_create(gt, I915_BO_ALLOC_PM_EARLY);
@@ -86,6 +123,9 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt) goto err_vm; }
- if (HAS_64K_PAGES(gt->i915))
stash.pt_sz = I915_GTT_PAGE_SIZE_64K;
- /*
- Each engine instance is assigned its own chunk in the VM, so
- that we can run multiple instances concurrently
@@ -105,14 +145,20 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt) * We copy in 8MiB chunks. Each PDE covers 2MiB, so we need * 4x2 page directories for source/destination. */
sz = 2 * CHUNK_SZ;
if (HAS_64K_PAGES(gt->i915))
sz = 3 * CHUNK_SZ;
else
sz = 2 * CHUNK_SZ;
d.offset = base + sz;
/*
- We need another page directory setup so that we can write
- the 8x512 PTE in each chunk.
*/
sz += (sz >> 12) * sizeof(u64);
if (HAS_64K_PAGES(gt->i915))
sz += (sz / SZ_2M) * SZ_64K;
else
sz += (sz >> 12) * sizeof(u64);
err = i915_vm_alloc_pt_stash(&vm->vm, &stash, sz); if (err)
@@ -133,7 +179,18 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt) goto err_vm;
/* Now allow the GPU to rewrite the PTE via its own ppGTT */
vm->vm.foreach(&vm->vm, base, d.offset - base, insert_pte, &d);
if (HAS_64K_PAGES(gt->i915)) {
vm->vm.foreach(&vm->vm, base, d.offset - base,
xehpsdv_insert_pte, &d);
d.offset = base + CHUNK_SZ;
vm->vm.foreach(&vm->vm,
d.offset,
2 * CHUNK_SZ,
xehpsdv_toggle_pdes, &d);
} else {
vm->vm.foreach(&vm->vm, base, d.offset - base,
insert_pte, &d);
}
}
return &vm->vm;
@@ -269,19 +326,38 @@ static int emit_pte(struct i915_request *rq, u64 offset, int length) {
- bool has_64K_pages = HAS_64K_PAGES(rq->engine->i915); const u64 encode = rq->context->vm->pte_encode(0, cache_level, is_lmem ? PTE_LM : 0); struct intel_ring *ring = rq->ring;
- int total = 0;
- int pkt, dword_length;
- u32 total = 0;
- u32 page_size; u32 *hdr, *cs;
int pkt;
GEM_BUG_ON(GRAPHICS_VER(rq->engine->i915) < 8);
- page_size = I915_GTT_PAGE_SIZE;
- dword_length = 0x400;
- /* Compute the page directory offset for the target address range */
- offset >>= 12;
- offset *= sizeof(u64);
- offset += 2 * CHUNK_SZ;
if (has_64K_pages) {
GEM_BUG_ON(!IS_ALIGNED(offset, SZ_2M));
offset /= SZ_2M;
offset *= SZ_64K;
offset += 3 * CHUNK_SZ;
if (is_lmem) {
page_size = I915_GTT_PAGE_SIZE_64K;
dword_length = 0x40;
}
} else {
offset >>= 12;
offset *= sizeof(u64);
offset += 2 * CHUNK_SZ;
}
offset += (u64)rq->engine->instance << 32;
cs = intel_ring_begin(rq, 6);
@@ -289,7 +365,7 @@ static int emit_pte(struct i915_request *rq, return PTR_ERR(cs);
/* Pack as many PTE updates as possible into a single MI command */
- pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5);
pkt = min_t(int, dword_length, ring->space / sizeof(u32) + 5); pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5);
hdr = cs;
@@ -299,6 +375,8 @@ static int emit_pte(struct i915_request *rq,
do { if (cs - hdr >= pkt) {
int dword_rem;
*hdr += cs - hdr - 2; *cs++ = MI_NOOP;
@@ -310,7 +388,18 @@ static int emit_pte(struct i915_request *rq, if (IS_ERR(cs)) return PTR_ERR(cs);
pkt = min_t(int, 0x400, ring->space / sizeof(u32) + 5);
dword_rem = dword_length;
if (has_64K_pages) {
if (IS_ALIGNED(total, SZ_2M)) {
offset = round_up(offset, SZ_64K);
} else {
dword_rem = SZ_2M - (total & (SZ_2M - 1));
dword_rem /= page_size;
dword_rem *= 2;
}
}
pkt = min_t(int, dword_rem, ring->space / sizeof(u32) + 5); pkt = min_t(int, pkt, (ring->size - ring->emit) / sizeof(u32) + 5); hdr = cs;
@@ -319,13 +408,15 @@ static int emit_pte(struct i915_request *rq, *cs++ = upper_32_bits(offset); }
GEM_BUG_ON(!IS_ALIGNED(it->dma, page_size));
*cs++ = lower_32_bits(encode | it->dma); *cs++ = upper_32_bits(encode | it->dma);
offset += 8;
total += I915_GTT_PAGE_SIZE;
total += page_size;
it->dma += I915_GTT_PAGE_SIZE;
if (it->dma >= it->max) { it->sg = __sg_next(it->sg); if (!it->sg || sg_dma_len(it->sg) == 0)it->dma += page_size;
@@ -356,7 +447,8 @@ static bool wa_1209644611_applies(int ver, u32 size) return height % 4 == 3 && height <= 8; }
-static int emit_copy(struct i915_request *rq, int size) +static int emit_copy(struct i915_request *rq,
{ const int ver = GRAPHICS_VER(rq->engine->i915); u32 instance = rq->engine->instance;u32 dst_offset, u32 src_offset, int size)
@@ -371,31 +463,31 @@ static int emit_copy(struct i915_request *rq, int size) *cs++ = BLT_DEPTH_32 | PAGE_SIZE; *cs++ = 0; *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
*cs++ = CHUNK_SZ; /* dst offset */
*cs++ = instance; *cs++ = 0; *cs++ = PAGE_SIZE;*cs++ = dst_offset;
*cs++ = 0; /* src offset */
*cs++ = instance; } else if (ver >= 8) { *cs++ = XY_SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (10 - 2); *cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE; *cs++ = 0; *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;*cs++ = src_offset;
*cs++ = CHUNK_SZ; /* dst offset */
*cs++ = instance; *cs++ = 0; *cs++ = PAGE_SIZE;*cs++ = dst_offset;
*cs++ = 0; /* src offset */
*cs++ = instance; } else { GEM_BUG_ON(instance); *cs++ = SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (6 - 2); *cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | PAGE_SIZE; *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE;*cs++ = src_offset;
*cs++ = CHUNK_SZ; /* dst offset */
*cs++ = PAGE_SIZE;*cs++ = dst_offset;
*cs++ = 0; /* src offset */
*cs++ = src_offset;
}
intel_ring_advance(rq, cs);
@@ -423,6 +515,7 @@ intel_context_migrate_copy(struct intel_context *ce, GEM_BUG_ON(ce->ring->size < SZ_64K);
do {
u32 src_offset, dst_offset;
int len;
rq = i915_request_create(ce);
@@ -450,15 +543,28 @@ intel_context_migrate_copy(struct intel_context *ce, if (err) goto out_rq;
len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem, 0,
CHUNK_SZ);
src_offset = 0;
dst_offset = CHUNK_SZ;
if (HAS_64K_PAGES(ce->engine->i915)) {
GEM_BUG_ON(!src_is_lmem && !dst_is_lmem);
src_offset = 0;
dst_offset = 0;
if (src_is_lmem)
src_offset = CHUNK_SZ;
if (dst_is_lmem)
dst_offset = 2 * CHUNK_SZ;
}
len = emit_pte(rq, &it_src, src_cache_level, src_is_lmem,
src_offset, CHUNK_SZ);
if (len <= 0) { err = len; goto out_rq; }
err = emit_pte(rq, &it_dst, dst_cache_level, dst_is_lmem,
CHUNK_SZ, len);
if (err < 0) goto out_rq; if (err < len) {dst_offset, len);
@@ -470,7 +576,7 @@ intel_context_migrate_copy(struct intel_context *ce, if (err) goto out_rq;
err = emit_copy(rq, len);
err = emit_copy(rq, dst_offset, src_offset, len);
/* Arbitration is re-enabled between requests. */ out_rq:
@@ -488,14 +594,18 @@ intel_context_migrate_copy(struct intel_context *ce, return err; }
-static int emit_clear(struct i915_request *rq, int size, u32 value) +static int emit_clear(struct i915_request *rq,
u64 offset,
int size,
{ const int ver = GRAPHICS_VER(rq->engine->i915);u32 value)
u32 instance = rq->engine->instance; u32 *cs;
GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
- offset += (u64)rq->engine->instance << 32;
- cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6); if (IS_ERR(cs)) return PTR_ERR(cs);
@@ -505,17 +615,17 @@ static int emit_clear(struct i915_request *rq, int size, u32 value) *cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE; *cs++ = 0; *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;
*cs++ = 0; /* offset */
*cs++ = instance;
*cs++ = lower_32_bits(offset);
*cs++ = value; *cs++ = MI_NOOP; } else {*cs++ = upper_32_bits(offset);
GEM_BUG_ON(instance);
*cs++ = XY_COLOR_BLT_CMD | BLT_WRITE_RGBA | (6 - 2); *cs++ = BLT_DEPTH_32 | BLT_ROP_COLOR_COPY | PAGE_SIZE; *cs++ = 0; *cs++ = size >> PAGE_SHIFT << 16 | PAGE_SIZE / 4;GEM_BUG_ON(upper_32_bits(offset));
*cs++ = 0;
*cs++ = value; }*cs++ = lower_32_bits(offset);
@@ -542,6 +652,7 @@ intel_context_migrate_clear(struct intel_context *ce, GEM_BUG_ON(ce->ring->size < SZ_64K);
do {
u32 offset;
int len;
rq = i915_request_create(ce);
@@ -569,7 +680,11 @@ intel_context_migrate_clear(struct intel_context *ce, if (err) goto out_rq;
len = emit_pte(rq, &it, cache_level, is_lmem, 0, CHUNK_SZ);
offset = 0;
if (HAS_64K_PAGES(ce->engine->i915) && is_lmem)
offset = CHUNK_SZ;
if (len <= 0) { err = len; goto out_rq;len = emit_pte(rq, &it, cache_level, is_lmem, offset, CHUNK_SZ);
@@ -579,7 +694,7 @@ intel_context_migrate_clear(struct intel_context *ce, if (err) goto out_rq;
err = emit_clear(rq, len, value);
err = emit_clear(rq, offset, len, value);
/* Arbitration is re-enabled between requests. */ out_rq:
From: Matthew Auld matthew.auld@intel.com
On discrete platforms like DG2, we need to support a minimum page size of 64K when dealing with device local-memory. This is quite tricky for various reasons, so try to document the new implicit uapi for this.
v3: fix typos and less emphasis v2: Fixed suggestions on formatting [Daniel]
Signed-off-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Robert Beckett bob.beckett@collabora.com Acked-by: Jordan Justen jordan.l.justen@intel.com Reviewed-by: Ramalingam C ramalingam.c@intel.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com cc: Simon Ser contact@emersion.fr cc: Pekka Paalanen ppaalanen@gmail.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: mesa-dev@lists.freedesktop.org Cc: Tony Ye tony.ye@intel.com Cc: Slawomir Milczarek slawomir.milczarek@intel.com --- include/uapi/drm/i915_drm.h | 44 ++++++++++++++++++++++++++++++++----- 1 file changed, 39 insertions(+), 5 deletions(-)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 5e678917da70..77e5e74c32c1 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1118,10 +1118,16 @@ struct drm_i915_gem_exec_object2 { /** * When the EXEC_OBJECT_PINNED flag is specified this is populated by * the user with the GTT offset at which this object will be pinned. + * * When the I915_EXEC_NO_RELOC flag is specified this must contain the * presumed_offset of the object. + * * During execbuffer2 the kernel populates it with the value of the * current GTT offset of the object, for future presumed_offset writes. + * + * See struct drm_i915_gem_create_ext for the rules when dealing with + * alignment restrictions with I915_MEMORY_CLASS_DEVICE, on devices with + * minimum page sizes, like DG2. */ __u64 offset;
@@ -3145,11 +3151,39 @@ struct drm_i915_gem_create_ext { * * The (page-aligned) allocated size for the object will be returned. * - * Note that for some devices we have might have further minimum - * page-size restrictions(larger than 4K), like for device local-memory. - * However in general the final size here should always reflect any - * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS - * extension to place the object in device local-memory. + * + * DG2 64K min page size implications: + * + * On discrete platforms, starting from DG2, we have to contend with GTT + * page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE + * objects. Specifically the hardware only supports 64K or larger GTT + * page sizes for such memory. The kernel will already ensure that all + * I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page + * sizes underneath. + * + * Note that the returned size here will always reflect any required + * rounding up done by the kernel, i.e 4K will now become 64K on devices + * such as DG2. + * + * Special DG2 GTT address alignment requirement: + * + * The GTT alignment will also need to be at least 2M for such objects. + * + * Note that due to how the hardware implements 64K GTT page support, we + * have some further complications: + * + * 1) The entire PDE (which covers a 2MB virtual address range), must + * contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same + * PDE is forbidden by the hardware. + * + * 2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM + * objects. + * + * To keep things simple for userland, we mandate that any GTT mappings + * must be aligned to and rounded up to 2MB. As this only wastes virtual + * address space and avoids userland having to copy any needlessly + * complicated PDE sharing scheme (coloring) and only affects DG2, this + * is deemed to be a good compromise. */ __u64 size; /**
Details of the 64k pagesize support added as part of DG2 enabling and its implicit impact on the uAPI.
v2: improvised the Flat-CCS documentation [Danvet & CQ] v3: made only for 64k pagesize support
Signed-off-by: Ramalingam C ramalingam.c@intel.com cc: Daniel Vetter daniel.vetter@ffwll.ch cc: Matthew Auld matthew.auld@intel.com cc: Simon Ser contact@emersion.fr cc: Pekka Paalanen ppaalanen@gmail.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: mesa-dev@lists.freedesktop.org Cc: Tony Ye tony.ye@intel.com Cc: Slawomir Milczarek slawomir.milczarek@intel.com --- Documentation/gpu/rfc/i915_dg2.rst | 25 +++++++++++++++++++++++++ Documentation/gpu/rfc/index.rst | 3 +++ 2 files changed, 28 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_dg2.rst
diff --git a/Documentation/gpu/rfc/i915_dg2.rst b/Documentation/gpu/rfc/i915_dg2.rst new file mode 100644 index 000000000000..f4eb5a219897 --- /dev/null +++ b/Documentation/gpu/rfc/i915_dg2.rst @@ -0,0 +1,25 @@ +==================== +I915 DG2 RFC Section +==================== + +Upstream plan +============= +Plan to upstream the DG2 enabling is: + +* Merge basic HW enabling for DG2 (Still without pciid) +* Merge the 64k support for lmem +* Merge the flat CCS enabling patches +* Add the pciid for DG2 and enable the DG2 in CI + + +64K page support for lmem +========================= +On DG2 hw, local-memory supports minimum GTT page size of 64k only. 4k is not +supported anymore. + +DG2 hw doesn't support the 64k (lmem) and 4k (smem) pages in the same ppgtt +Page table. Refer the struct drm_i915_gem_create_ext for the implication of +handling the 64k page size. + +.. kernel-doc:: include/uapi/drm/i915_drm.h + :functions: drm_i915_gem_create_ext diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst index 91e93a705230..afb320ed4028 100644 --- a/Documentation/gpu/rfc/index.rst +++ b/Documentation/gpu/rfc/index.rst @@ -20,6 +20,9 @@ host such documentation:
i915_gem_lmem.rst
+.. toctree:: + i915_dg2.rst + .. toctree::
i915_scheduler.rst
On Tue, Feb 01, 2022 at 04:11:22PM +0530, Ramalingam C wrote:
Details of the 64k pagesize support added as part of DG2 enabling and its implicit impact on the uAPI.
v2: improvised the Flat-CCS documentation [Danvet & CQ] v3: made only for 64k pagesize support
Signed-off-by: Ramalingam C ramalingam.c@intel.com cc: Daniel Vetter daniel.vetter@ffwll.ch cc: Matthew Auld matthew.auld@intel.com cc: Simon Ser contact@emersion.fr cc: Pekka Paalanen ppaalanen@gmail.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: mesa-dev@lists.freedesktop.org Cc: Tony Ye tony.ye@intel.com Cc: Slawomir Milczarek slawomir.milczarek@intel.com
Documentation/gpu/rfc/i915_dg2.rst | 25 +++++++++++++++++++++++++ Documentation/gpu/rfc/index.rst | 3 +++ 2 files changed, 28 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_dg2.rst
diff --git a/Documentation/gpu/rfc/i915_dg2.rst b/Documentation/gpu/rfc/i915_dg2.rst new file mode 100644 index 000000000000..f4eb5a219897 --- /dev/null +++ b/Documentation/gpu/rfc/i915_dg2.rst @@ -0,0 +1,25 @@ +==================== +I915 DG2 RFC Section +====================
+Upstream plan +============= +Plan to upstream the DG2 enabling is:
+* Merge basic HW enabling for DG2 (Still without pciid) +* Merge the 64k support for lmem +* Merge the flat CCS enabling patches +* Add the pciid for DG2 and enable the DG2 in CI
does this make sense after the fact? Earlier version of this patch Daniel Vetter asked this to be moved to the be the first patch. I see you added it in the cover letter, but keeping this in gpu/rfc/i915_dg2.rst doesn't make much sense IMO. Maybe just drop this patch?
Lucas De Marchi
+64K page support for lmem +========================= +On DG2 hw, local-memory supports minimum GTT page size of 64k only. 4k is not +supported anymore.
+DG2 hw doesn't support the 64k (lmem) and 4k (smem) pages in the same ppgtt +Page table. Refer the struct drm_i915_gem_create_ext for the implication of +handling the 64k page size.
+.. kernel-doc:: include/uapi/drm/i915_drm.h
:functions: drm_i915_gem_create_ext
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst index 91e93a705230..afb320ed4028 100644 --- a/Documentation/gpu/rfc/index.rst +++ b/Documentation/gpu/rfc/index.rst @@ -20,6 +20,9 @@ host such documentation:
i915_gem_lmem.rst
+.. toctree::
- i915_dg2.rst
.. toctree::
i915_scheduler.rst
-- 2.20.1
On 2022-02-17 at 21:39:16 -0800, Lucas De Marchi wrote:
On Tue, Feb 01, 2022 at 04:11:22PM +0530, Ramalingam C wrote:
Details of the 64k pagesize support added as part of DG2 enabling and its implicit impact on the uAPI.
v2: improvised the Flat-CCS documentation [Danvet & CQ] v3: made only for 64k pagesize support
Signed-off-by: Ramalingam C ramalingam.c@intel.com cc: Daniel Vetter daniel.vetter@ffwll.ch cc: Matthew Auld matthew.auld@intel.com cc: Simon Ser contact@emersion.fr cc: Pekka Paalanen ppaalanen@gmail.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: mesa-dev@lists.freedesktop.org Cc: Tony Ye tony.ye@intel.com Cc: Slawomir Milczarek slawomir.milczarek@intel.com
Documentation/gpu/rfc/i915_dg2.rst | 25 +++++++++++++++++++++++++ Documentation/gpu/rfc/index.rst | 3 +++ 2 files changed, 28 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_dg2.rst
diff --git a/Documentation/gpu/rfc/i915_dg2.rst b/Documentation/gpu/rfc/i915_dg2.rst new file mode 100644 index 000000000000..f4eb5a219897 --- /dev/null +++ b/Documentation/gpu/rfc/i915_dg2.rst @@ -0,0 +1,25 @@ +==================== +I915 DG2 RFC Section +====================
+Upstream plan +============= +Plan to upstream the DG2 enabling is:
+* Merge basic HW enabling for DG2 (Still without pciid) +* Merge the 64k support for lmem +* Merge the flat CCS enabling patches +* Add the pciid for DG2 and enable the DG2 in CI
does this make sense after the fact? Earlier version of this patch Daniel Vetter asked this to be moved to the be the first patch. I see you added it in the cover letter, but keeping this in gpu/rfc/i915_dg2.rst doesn't make much sense IMO. Maybe just drop this patch?
Yes. I couldn't move this to the start of the series as the kdoc referenced here are from later patches of the series.
But now considering we have the Kdoc for uapi at the respective patches itself we could drop this patch.
Daniel, Hope you agree on that?
Ram.
Lucas De Marchi
+64K page support for lmem +========================= +On DG2 hw, local-memory supports minimum GTT page size of 64k only. 4k is not +supported anymore.
+DG2 hw doesn't support the 64k (lmem) and 4k (smem) pages in the same ppgtt +Page table. Refer the struct drm_i915_gem_create_ext for the implication of +handling the 64k page size.
+.. kernel-doc:: include/uapi/drm/i915_drm.h
:functions: drm_i915_gem_create_ext
diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst index 91e93a705230..afb320ed4028 100644 --- a/Documentation/gpu/rfc/index.rst +++ b/Documentation/gpu/rfc/index.rst @@ -20,6 +20,9 @@ host such documentation:
i915_gem_lmem.rst
+.. toctree::
- i915_dg2.rst
.. toctree::
i915_scheduler.rst
-- 2.20.1
From: CQ Tang cq.tang@intel.com
Platforms of XeHP and beyond support 3D surface (buffer) compression and various compression formats. This is accomplished by an additional compression control state (CCS) stored for each surface.
Gen 12 devices(TGL family and DG1) stores compression states in a separate region of memory. It is managed by user-space and has an associated set of user-space managed page tables used by hardware for address translation.
In Xe HP and beyond (XEHPSDV, DG2, etc), there is a new feature introduced i.e Flat CCS. It replaced AUX page tables with a flat indexed region of device memory for storing compression states.
Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Matthew Auld matthew.auld@intel.com Signed-off-by: CQ Tang cq.tang@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Reviewed-by: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/i915_drv.h | 6 ++++++ drivers/gpu/drm/i915/i915_pci.c | 1 + drivers/gpu/drm/i915/intel_device_info.h | 1 + 3 files changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 4afdfa5fd3b3..384977798c8e 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1528,6 +1528,12 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, #define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i)) #define HAS_LMEM(i915) HAS_REGION(i915, REGION_LMEM)
+/* + * Platform has the dedicated compression control state for each lmem surfaces + * stored in lmem to support the 3D and media compression formats. + */ +#define HAS_FLAT_CCS(dev_priv) (INTEL_INFO(dev_priv)->has_flat_ccs) + #define HAS_GT_UC(dev_priv) (INTEL_INFO(dev_priv)->has_gt_uc)
#define HAS_POOLED_EU(dev_priv) (INTEL_INFO(dev_priv)->has_pooled_eu) diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index ce6ae6a3cbdf..3976482582b8 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1003,6 +1003,7 @@ static const struct intel_device_info adl_p_info = { XE_HP_PAGE_SIZES, \ .dma_mask_size = 46, \ .has_64bit_reloc = 1, \ + .has_flat_ccs = 1, \ .has_global_mocs = 1, \ .has_gt_uc = 1, \ .has_llc = 1, \ diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h index d8da40d01dca..ef7c7c988b7b 100644 --- a/drivers/gpu/drm/i915/intel_device_info.h +++ b/drivers/gpu/drm/i915/intel_device_info.h @@ -133,6 +133,7 @@ enum intel_ppgtt_type { func(needs_compact_pt); \ func(gpu_reset_clobbers_display); \ func(has_reset_engine); \ + func(has_flat_ccs); \ func(has_global_mocs); \ func(has_gt_uc); \ func(has_guc_deprivilege); \
From: Abdiel Janulgue abdiel.janulgue@linux.intel.com
A portion of device memory is reserved for Flat CCS so usable device memory will be reduced by size of Flat CCS. Size of Flat CCS is specified in “XEHPSDV_FLAT_CCS_BASE_ADDR”. So to get effective device memory we need to subtract total device memory by Flat CCS memory size.
v2: Addressed the small bar related issue [Matt] Removed a reduntant check [Matt]
Cc: Matthew Auld matthew.auld@intel.com Signed-off-by: Abdiel Janulgue abdiel.janulgue@linux.intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com --- drivers/gpu/drm/i915/gt/intel_gt.c | 19 ++++++++++++++++ drivers/gpu/drm/i915/gt/intel_gt.h | 1 + drivers/gpu/drm/i915/gt/intel_region_lmem.c | 24 +++++++++++++++++++-- drivers/gpu/drm/i915/i915_reg.h | 3 +++ 4 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index f59933abbb3a..e40d98cb3a2d 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -911,6 +911,25 @@ u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg) return intel_uncore_read_fw(gt->uncore, reg); }
+u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg) +{ + int type; + u8 sliceid, subsliceid; + + for (type = 0; type < NUM_STEERING_TYPES; type++) { + if (intel_gt_reg_needs_read_steering(gt, reg, type)) { + intel_gt_get_valid_steering(gt, type, &sliceid, + &subsliceid); + return intel_uncore_read_with_mcr_steering(gt->uncore, + reg, + sliceid, + subsliceid); + } + } + + return intel_uncore_read(gt->uncore, reg); +} + void intel_gt_info_print(const struct intel_gt_info *info, struct drm_printer *p) { diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h index 2dad46c3eff2..0f571c8ee22b 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.h +++ b/drivers/gpu/drm/i915/gt/intel_gt.h @@ -85,6 +85,7 @@ static inline bool intel_gt_needs_read_steering(struct intel_gt *gt, }
u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg); +u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg);
void intel_gt_info_print(const struct intel_gt_info *info, struct drm_printer *p); diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c index 21215a080088..f1d37b46b505 100644 --- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c @@ -205,8 +205,28 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt) if (!IS_DGFX(i915)) return ERR_PTR(-ENODEV);
- /* Stolen starts from GSMBASE on DG1 */ - lmem_size = intel_uncore_read64(uncore, GEN12_GSMBASE); + if (HAS_FLAT_CCS(i915)) { + u64 tile_stolen, flat_ccs_base_addr_reg, flat_ccs_base; + + lmem_size = pci_resource_len(pdev, 2); + flat_ccs_base_addr_reg = intel_gt_read_register(gt, XEHPSDV_FLAT_CCS_BASE_ADDR); + flat_ccs_base = (flat_ccs_base_addr_reg >> XEHPSDV_CCS_BASE_SHIFT) * SZ_64K; + + if (GEM_WARN_ON(lmem_size < flat_ccs_base)) + return ERR_PTR(-ENODEV); + + tile_stolen = lmem_size - flat_ccs_base; + + /* If the FLAT_CCS_BASE_ADDR register is not populated, flag an error */ + if (tile_stolen == lmem_size) + DRM_ERROR("CCS_BASE_ADDR register did not have expected value\n"); + + lmem_size -= tile_stolen; + } else { + /* Stolen starts from GSMBASE without CCS */ + lmem_size = intel_uncore_read64(&i915->uncore, GEN12_GSMBASE); + } +
io_start = pci_resource_start(pdev, 2); if (GEM_WARN_ON(lmem_size > pci_resource_len(pdev, 2))) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 0f36af8dc3a1..9b5423572fe9 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -11651,6 +11651,9 @@ enum skl_power_gate { #define SGGI_DIS REG_BIT(15) #define SGR_DIS REG_BIT(13)
+#define XEHPSDV_FLAT_CCS_BASE_ADDR _MMIO(0x4910) +#define XEHPSDV_CCS_BASE_SHIFT 8 + /* gamt regs */ #define GEN8_L3_LRA_1_GPGPU _MMIO(0x4dd4) #define GEN8_L3_LRA_1_GPGPU_DEFAULT_VALUE_BDW 0x67F1427F /* max/min for LRA1/2 */
On Tue, Feb 01, 2022 at 04:11:24PM +0530, Ramalingam C wrote:
From: Abdiel Janulgue abdiel.janulgue@linux.intel.com
A portion of device memory is reserved for Flat CCS so usable device memory will be reduced by size of Flat CCS. Size of Flat CCS is specified in “XEHPSDV_FLAT_CCS_BASE_ADDR”. So to get effective device memory we need to subtract total device memory by Flat CCS memory size.
v2: Addressed the small bar related issue [Matt] Removed a reduntant check [Matt]
Cc: Matthew Auld matthew.auld@intel.com Signed-off-by: Abdiel Janulgue abdiel.janulgue@linux.intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/gt/intel_gt.c | 19 ++++++++++++++++ drivers/gpu/drm/i915/gt/intel_gt.h | 1 + drivers/gpu/drm/i915/gt/intel_region_lmem.c | 24 +++++++++++++++++++-- drivers/gpu/drm/i915/i915_reg.h | 3 +++ 4 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index f59933abbb3a..e40d98cb3a2d 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -911,6 +911,25 @@ u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg) return intel_uncore_read_fw(gt->uncore, reg); }
+u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg) +{
- int type;
- u8 sliceid, subsliceid;
- for (type = 0; type < NUM_STEERING_TYPES; type++) {
if (intel_gt_reg_needs_read_steering(gt, reg, type)) {
intel_gt_get_valid_steering(gt, type, &sliceid,
&subsliceid);
return intel_uncore_read_with_mcr_steering(gt->uncore,
reg,
sliceid,
subsliceid);
}
- }
- return intel_uncore_read(gt->uncore, reg);
+}
void intel_gt_info_print(const struct intel_gt_info *info, struct drm_printer *p) { diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h index 2dad46c3eff2..0f571c8ee22b 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.h +++ b/drivers/gpu/drm/i915/gt/intel_gt.h @@ -85,6 +85,7 @@ static inline bool intel_gt_needs_read_steering(struct intel_gt *gt, }
u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg); +u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg);
void intel_gt_info_print(const struct intel_gt_info *info, struct drm_printer *p); diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c index 21215a080088..f1d37b46b505 100644 --- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c @@ -205,8 +205,28 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt) if (!IS_DGFX(i915)) return ERR_PTR(-ENODEV);
- /* Stolen starts from GSMBASE on DG1 */
- lmem_size = intel_uncore_read64(uncore, GEN12_GSMBASE);
- if (HAS_FLAT_CCS(i915)) {
u64 tile_stolen, flat_ccs_base_addr_reg, flat_ccs_base;
lmem_size = pci_resource_len(pdev, 2);
flat_ccs_base_addr_reg = intel_gt_read_register(gt, XEHPSDV_FLAT_CCS_BASE_ADDR);
nit since this will need a respin due to conflicts: we usually call _reg an i915_reg_t variable. But here you have the value, not the register. Maybe "flat_ccs_base_addr"?
flat_ccs_base = (flat_ccs_base_addr_reg >> XEHPSDV_CCS_BASE_SHIFT) * SZ_64K;
if (GEM_WARN_ON(lmem_size < flat_ccs_base))
return ERR_PTR(-ENODEV);
tile_stolen = lmem_size - flat_ccs_base;
/* If the FLAT_CCS_BASE_ADDR register is not populated, flag an error */
if (tile_stolen == lmem_size)
DRM_ERROR("CCS_BASE_ADDR register did not have expected value\n");
drm_err()
lmem_size -= tile_stolen;
} else {
/* Stolen starts from GSMBASE without CCS */
lmem_size = intel_uncore_read64(&i915->uncore, GEN12_GSMBASE);
}
io_start = pci_resource_start(pdev, 2); if (GEM_WARN_ON(lmem_size > pci_resource_len(pdev, 2)))
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 0f36af8dc3a1..9b5423572fe9 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -11651,6 +11651,9 @@ enum skl_power_gate { #define SGGI_DIS REG_BIT(15) #define SGR_DIS REG_BIT(13)
+#define XEHPSDV_FLAT_CCS_BASE_ADDR _MMIO(0x4910) +#define XEHPSDV_CCS_BASE_SHIFT 8
you will have a conflict here... I fixed it locally by moving to gt/intel_gt_regs.h
With the above,
Reviewed-by: Lucas De Marchi lucas.demarchi@intel.com
Lucas De Marchi
/* gamt regs */ #define GEN8_L3_LRA_1_GPGPU _MMIO(0x4dd4)
#define GEN8_L3_LRA_1_GPGPU_DEFAULT_VALUE_BDW 0x67F1427F /* max/min for LRA1/2 */
2.20.1
On Fri, Feb 18, 2022 at 02:08:18AM -0800, Lucas De Marchi wrote:
On Tue, Feb 01, 2022 at 04:11:24PM +0530, Ramalingam C wrote:
From: Abdiel Janulgue abdiel.janulgue@linux.intel.com
A portion of device memory is reserved for Flat CCS so usable device memory will be reduced by size of Flat CCS. Size of Flat CCS is specified in “XEHPSDV_FLAT_CCS_BASE_ADDR”. So to get effective device memory we need to subtract total device memory by Flat CCS memory size.
v2: Addressed the small bar related issue [Matt] Removed a reduntant check [Matt]
Cc: Matthew Auld matthew.auld@intel.com Signed-off-by: Abdiel Janulgue abdiel.janulgue@linux.intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/gt/intel_gt.c | 19 ++++++++++++++++ drivers/gpu/drm/i915/gt/intel_gt.h | 1 + drivers/gpu/drm/i915/gt/intel_region_lmem.c | 24 +++++++++++++++++++-- drivers/gpu/drm/i915/i915_reg.h | 3 +++ 4 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index f59933abbb3a..e40d98cb3a2d 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -911,6 +911,25 @@ u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg) return intel_uncore_read_fw(gt->uncore, reg); }
+u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg) +{
- int type;
- u8 sliceid, subsliceid;
- for (type = 0; type < NUM_STEERING_TYPES; type++) {
if (intel_gt_reg_needs_read_steering(gt, reg, type)) {
intel_gt_get_valid_steering(gt, type, &sliceid,
&subsliceid);
return intel_uncore_read_with_mcr_steering(gt->uncore,
reg,
sliceid,
subsliceid);
}
- }
- return intel_uncore_read(gt->uncore, reg);
+}
void intel_gt_info_print(const struct intel_gt_info *info, struct drm_printer *p) { diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h index 2dad46c3eff2..0f571c8ee22b 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.h +++ b/drivers/gpu/drm/i915/gt/intel_gt.h @@ -85,6 +85,7 @@ static inline bool intel_gt_needs_read_steering(struct intel_gt *gt, }
u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg); +u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg);
void intel_gt_info_print(const struct intel_gt_info *info, struct drm_printer *p); diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c index 21215a080088..f1d37b46b505 100644 --- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c @@ -205,8 +205,28 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt) if (!IS_DGFX(i915)) return ERR_PTR(-ENODEV);
- /* Stolen starts from GSMBASE on DG1 */
- lmem_size = intel_uncore_read64(uncore, GEN12_GSMBASE);
- if (HAS_FLAT_CCS(i915)) {
u64 tile_stolen, flat_ccs_base_addr_reg, flat_ccs_base;
lmem_size = pci_resource_len(pdev, 2);
flat_ccs_base_addr_reg = intel_gt_read_register(gt, XEHPSDV_FLAT_CCS_BASE_ADDR);
nit since this will need a respin due to conflicts: we usually call _reg an i915_reg_t variable. But here you have the value, not the register. Maybe "flat_ccs_base_addr"?
flat_ccs_base = (flat_ccs_base_addr_reg >> XEHPSDV_CCS_BASE_SHIFT) * SZ_64K;
if (GEM_WARN_ON(lmem_size < flat_ccs_base))
return ERR_PTR(-ENODEV);
tile_stolen = lmem_size - flat_ccs_base;
/* If the FLAT_CCS_BASE_ADDR register is not populated, flag an error */
if (tile_stolen == lmem_size)
DRM_ERROR("CCS_BASE_ADDR register did not have expected value\n");
drm_err()
lmem_size -= tile_stolen;
} else {
/* Stolen starts from GSMBASE without CCS */
lmem_size = intel_uncore_read64(&i915->uncore, GEN12_GSMBASE);
}
io_start = pci_resource_start(pdev, 2); if (GEM_WARN_ON(lmem_size > pci_resource_len(pdev, 2)))
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 0f36af8dc3a1..9b5423572fe9 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -11651,6 +11651,9 @@ enum skl_power_gate { #define SGGI_DIS REG_BIT(15) #define SGR_DIS REG_BIT(13)
+#define XEHPSDV_FLAT_CCS_BASE_ADDR _MMIO(0x4910) +#define XEHPSDV_CCS_BASE_SHIFT 8
you will have a conflict here... I fixed it locally by moving to gt/intel_gt_regs.h
With the above,
Reviewed-by: Lucas De Marchi lucas.demarchi@intel.com
Also, I may be wrong and couldn't test it today. But... to finally have BAT on CI working, this and "drm/i915/xehpsdv: Add has_flat_ccs to device info" are the only changes from the FlatCCS part that are strictly needed, isn't it? "drm/i915/migrate: add acceleration support for DG2", all the plane format, support for clearing ccs etc are things that could be on top.
Lucas De Marchi
Lucas De Marchi
/* gamt regs */ #define GEN8_L3_LRA_1_GPGPU _MMIO(0x4dd4)
#define GEN8_L3_LRA_1_GPGPU_DEFAULT_VALUE_BDW 0x67F1427F /* max/min for LRA1/2 */
2.20.1
From: Ayaz A Siddiqui ayaz.siddiqui@intel.com
Xe-HP and latest devices support Flat CCS which reserved a portion of the device memory to store compression metadata, during the clearing of device memory buffer object we also need to clear the associated CCS buffer.
Flat CCS memory can not be directly accessed by S/W. Address of CCS buffer associated main BO is automatically calculated by device itself. KMD/UMD can only access this buffer indirectly using XY_CTRL_SURF_COPY_BLT cmd via the address of device memory buffer.
v2: Fixed issues with platform naming [Lucas] v3: Rebased [Ram] Used the round_up funcs [Bob]
Cc: CQ Tang cq.tang@intel.com Signed-off-by: Ayaz A Siddiqui ayaz.siddiqui@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com --- drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 14 +++ drivers/gpu/drm/i915/gt/intel_migrate.c | 114 ++++++++++++++++++- 2 files changed, 125 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h index f8253012d166..07bf5a1753bd 100644 --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h @@ -203,6 +203,20 @@ #define GFX_OP_DRAWRECT_INFO ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3)) #define GFX_OP_DRAWRECT_INFO_I965 ((0x7900<<16)|0x2)
+#define XY_CTRL_SURF_INSTR_SIZE 5 +#define MI_FLUSH_DW_SIZE 3 +#define XY_CTRL_SURF_COPY_BLT ((2 << 29) | (0x48 << 22) | 3) +#define SRC_ACCESS_TYPE_SHIFT 21 +#define DST_ACCESS_TYPE_SHIFT 20 +#define CCS_SIZE_SHIFT 8 +#define XY_CTRL_SURF_MOCS_SHIFT 25 +#define NUM_CCS_BYTES_PER_BLOCK 256 +#define NUM_CCS_BLKS_PER_XFER 1024 +#define INDIRECT_ACCESS 0 +#define DIRECT_ACCESS 1 +#define MI_FLUSH_LLC BIT(9) +#define MI_FLUSH_CCS BIT(16) + #define COLOR_BLT_CMD (2 << 29 | 0x40 << 22 | (5 - 2)) #define XY_COLOR_BLT_CMD (2 << 29 | 0x50 << 22) #define SRC_COPY_BLT_CMD (2 << 29 | 0x43 << 22) diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index cac791155244..3e1cf224cdf0 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -16,6 +16,8 @@ struct insert_pte_data { };
#define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */ +#define GET_CCS_SIZE(i915, size) (HAS_FLAT_CCS(i915) ? \ + DIV_ROUND_UP(size, NUM_CCS_BYTES_PER_BLOCK) : 0)
static bool engine_supports_migration(struct intel_engine_cs *engine) { @@ -594,19 +596,105 @@ intel_context_migrate_copy(struct intel_context *ce, return err; }
+static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags) +{ + /* Mask the 3 LSB to use the PPGTT address space */ + *cmd++ = MI_FLUSH_DW | flags; + *cmd++ = lower_32_bits(dst); + *cmd++ = upper_32_bits(dst); + + return cmd; +} + +static u32 calc_ctrl_surf_instr_size(struct drm_i915_private *i915, int size) +{ + u32 num_cmds, num_blks, total_size; + + if (!GET_CCS_SIZE(i915, size)) + return 0; + + /* + * XY_CTRL_SURF_COPY_BLT transfers CCS in 256 byte + * blocks. one XY_CTRL_SURF_COPY_BLT command can + * trnasfer upto 1024 blocks. + */ + num_blks = GET_CCS_SIZE(i915, size); + num_cmds = (num_blks + (NUM_CCS_BLKS_PER_XFER - 1)) >> 10; + total_size = (XY_CTRL_SURF_INSTR_SIZE) * num_cmds; + + /* + * We need to add a flush before and after + * XY_CTRL_SURF_COPY_BLT + */ + total_size += 2 * MI_FLUSH_DW_SIZE; + return total_size; +} + +static u32 *_i915_ctrl_surf_copy_blt(u32 *cmd, u64 src_addr, u64 dst_addr, + u8 src_mem_access, u8 dst_mem_access, + int src_mocs, int dst_mocs, + u16 num_ccs_blocks) +{ + int i = num_ccs_blocks; + + /* + * The XY_CTRL_SURF_COPY_BLT instruction is used to copy the CCS + * data in and out of the CCS region. + * + * We can copy at most 1024 blocks of 256 bytes using one + * XY_CTRL_SURF_COPY_BLT instruction. + * + * In case we need to copy more than 1024 blocks, we need to add + * another instruction to the same batch buffer. + * + * 1024 blocks of 256 bytes of CCS represent a total 256KB of CCS. + * + * 256 KB of CCS represents 256 * 256 KB = 64 MB of LMEM. + */ + do { + /* + * We use logical AND with 1023 since the size field + * takes values which is in the range of 0 - 1023 + */ + *cmd++ = ((XY_CTRL_SURF_COPY_BLT) | + (src_mem_access << SRC_ACCESS_TYPE_SHIFT) | + (dst_mem_access << DST_ACCESS_TYPE_SHIFT) | + (((i - 1) & 1023) << CCS_SIZE_SHIFT)); + *cmd++ = lower_32_bits(src_addr); + *cmd++ = ((upper_32_bits(src_addr) & 0xFFFF) | + (src_mocs << XY_CTRL_SURF_MOCS_SHIFT)); + *cmd++ = lower_32_bits(dst_addr); + *cmd++ = ((upper_32_bits(dst_addr) & 0xFFFF) | + (dst_mocs << XY_CTRL_SURF_MOCS_SHIFT)); + src_addr += SZ_64M; + dst_addr += SZ_64M; + i -= NUM_CCS_BLKS_PER_XFER; + } while (i > 0); + + return cmd; +} + static int emit_clear(struct i915_request *rq, u64 offset, int size, - u32 value) + u32 value, + bool is_lmem) { + struct drm_i915_private *i915 = rq->engine->i915; const int ver = GRAPHICS_VER(rq->engine->i915); + u32 num_ccs_blks, ccs_ring_size; u32 *cs;
GEM_BUG_ON(size >> PAGE_SHIFT > S16_MAX);
offset += (u64)rq->engine->instance << 32;
- cs = intel_ring_begin(rq, ver >= 8 ? 8 : 6); + /* Clear flat css only when value is 0 */ + ccs_ring_size = (is_lmem && !value) ? + calc_ctrl_surf_instr_size(i915, size) + : 0; + + cs = intel_ring_begin(rq, round_up(ver >= 8 ? 8 + ccs_ring_size : 6, 2)); if (IS_ERR(cs)) return PTR_ERR(cs);
@@ -629,6 +717,26 @@ static int emit_clear(struct i915_request *rq, *cs++ = value; }
+ if (is_lmem && HAS_FLAT_CCS(i915) && !value) { + num_ccs_blks = GET_CCS_SIZE(i915, size); + + /* + * Flat CCS surface can only be accessed via + * XY_CTRL_SURF_COPY_BLT CMD and using indirect + * mapping of associated LMEM. + * We can clear ccs surface by writing all 0s, + * so we will flush the previously cleared buffer + * and use it as a source. + */ + cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS); + cs = _i915_ctrl_surf_copy_blt(cs, offset, offset, + DIRECT_ACCESS, INDIRECT_ACCESS, + 1, 1, num_ccs_blks); + cs = i915_flush_dw(cs, offset, MI_FLUSH_LLC | MI_FLUSH_CCS); + + if (ccs_ring_size & 1) + *cs++ = MI_NOOP; + } intel_ring_advance(rq, cs); return 0; } @@ -694,7 +802,7 @@ intel_context_migrate_clear(struct intel_context *ce, if (err) goto out_rq;
- err = emit_clear(rq, offset, len, value); + err = emit_clear(rq, offset, len, value, is_lmem);
/* Arbitration is re-enabled between requests. */ out_rq:
From: Stanislav Lisovskiy stanislav.lisovskiy@intel.com
This tiling layout uses 4KB tiles in a row-major layout. It has the same shape as Tile Y at two granularities: 4KB (128B x 32) and 64B (16B x 4). It only differs from Tile Y at the 256B granularity in between. At this granularity, Tile Y has a shape of 16B x 32 rows, but this tiling has a shape of 64B x 8 rows.
Reviewed-by: Imre Deak imre.deak@intel.com Acked-by: Nanley Chery nanley.g.chery@intel.com Signed-off-by: Stanislav Lisovskiy stanislav.lisovskiy@intel.com --- include/uapi/drm/drm_fourcc.h | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index fc0c1454d275..b73fe6797fc3 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -572,6 +572,17 @@ extern "C" { */ #define I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC fourcc_mod_code(INTEL, 8)
+/* + * Intel Tile 4 layout + * + * This is a tiled layout using 4KB tiles in a row-major layout. It has the same + * shape as Tile Y at two granularities: 4KB (128B x 32) and 64B (16B x 4). It + * only differs from Tile Y at the 256B granularity in between. At this + * granularity, Tile Y has a shape of 16B x 32 rows, but this tiling has a shape + * of 64B x 8 rows. + */ +#define I915_FORMAT_MOD_4_TILED fourcc_mod_code(INTEL, 9) + /* * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks *
From: Stanislav Lisovskiy stanislav.lisovskiy@intel.com
Tile4 in bspec format is 4K tile organized into 64B subtiles with same basic shape as for legacy TileY which will be supported by Display13.
v2: - Moved Tile4 assocating struct for modifier/display to the beginning(Imre Deak) - Removed unneeded case I915_FORMAT_MOD_4_TILED modifier checks(Imre Deak) - Fixed I915_FORMAT_MOD_4_TILED to be 9 instead of 12 (Imre Deak)
v3: - Rebased patch on top of new changes related to plane_caps. - Added static assert to check that PLANE_CTL_TILING_YF matches PLANE_CTL_TILING_4(Nanley Chery) - Fixed naming and layout description for Tile 4 in drm uapi header(Nanley Chery)
v4: - Extracted drm_fourcc changes to separate patch(Nanley Chery)
Reviewed-by: Imre Deak imre.deak@intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Signed-off-by: Stanislav Lisovskiy stanislav.lisovskiy@intel.com Signed-off-by: Matt Roper matthew.d.roper@intel.com Signed-off-by: Juha-Pekka Heikkilä juha-pekka.heikkila@intel.com --- drivers/gpu/drm/i915/display/intel_display.c | 1 + drivers/gpu/drm/i915/display/intel_fb.c | 15 +++++++++++- drivers/gpu/drm/i915/display/intel_fb.h | 1 + drivers/gpu/drm/i915/display/intel_fbc.c | 1 + .../drm/i915/display/intel_plane_initial.c | 1 + .../drm/i915/display/skl_universal_plane.c | 23 ++++++++++++------- drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/i915_pci.c | 1 + drivers/gpu/drm/i915/i915_reg.h | 1 + drivers/gpu/drm/i915/intel_device_info.h | 1 + drivers/gpu/drm/i915/intel_pm.c | 1 + 11 files changed, 38 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index 75de794185b2..189767cef356 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -7737,6 +7737,7 @@ static int intel_atomic_check_async(struct intel_atomic_state *state, struct int case I915_FORMAT_MOD_X_TILED: case I915_FORMAT_MOD_Y_TILED: case I915_FORMAT_MOD_Yf_TILED: + case I915_FORMAT_MOD_4_TILED: break; default: drm_dbg_kms(&i915->drm, diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c index 23cfe2e5ce2a..94c57facbb46 100644 --- a/drivers/gpu/drm/i915/display/intel_fb.c +++ b/drivers/gpu/drm/i915/display/intel_fb.c @@ -135,11 +135,16 @@ struct intel_modifier_desc { INTEL_PLANE_CAP_CCS_MC) #define INTEL_PLANE_CAP_TILING_MASK (INTEL_PLANE_CAP_TILING_X | \ INTEL_PLANE_CAP_TILING_Y | \ - INTEL_PLANE_CAP_TILING_Yf) + INTEL_PLANE_CAP_TILING_Yf | \ + INTEL_PLANE_CAP_TILING_4) #define INTEL_PLANE_CAP_TILING_NONE 0
static const struct intel_modifier_desc intel_modifiers[] = { { + .modifier = I915_FORMAT_MOD_4_TILED, + .display_ver = { 13, 13 }, + .plane_caps = INTEL_PLANE_CAP_TILING_4, + }, { .modifier = I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS, .display_ver = { 12, 13 }, .plane_caps = INTEL_PLANE_CAP_TILING_Y | INTEL_PLANE_CAP_CCS_MC, @@ -545,6 +550,12 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane) return 128; else return 512; + case I915_FORMAT_MOD_4_TILED: + /* + * Each 4K tile consists of 64B(8*8) subtiles, with + * same shape as Y Tile(i.e 4*16B OWords) + */ + return 128; case I915_FORMAT_MOD_Y_TILED_CCS: if (intel_fb_is_ccs_aux_plane(fb, color_plane)) return 128; @@ -650,6 +661,7 @@ static unsigned int intel_fb_modifier_to_tiling(u64 fb_modifier) return I915_TILING_Y; case INTEL_PLANE_CAP_TILING_X: return I915_TILING_X; + case INTEL_PLANE_CAP_TILING_4: case INTEL_PLANE_CAP_TILING_Yf: case INTEL_PLANE_CAP_TILING_NONE: return I915_TILING_NONE; @@ -737,6 +749,7 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb, case I915_FORMAT_MOD_Y_TILED_CCS: case I915_FORMAT_MOD_Yf_TILED_CCS: case I915_FORMAT_MOD_Y_TILED: + case I915_FORMAT_MOD_4_TILED: case I915_FORMAT_MOD_Yf_TILED: return 1 * 1024 * 1024; default: diff --git a/drivers/gpu/drm/i915/display/intel_fb.h b/drivers/gpu/drm/i915/display/intel_fb.h index ba9df8986c1e..12386f13a4e0 100644 --- a/drivers/gpu/drm/i915/display/intel_fb.h +++ b/drivers/gpu/drm/i915/display/intel_fb.h @@ -27,6 +27,7 @@ struct intel_plane_state; #define INTEL_PLANE_CAP_TILING_X BIT(3) #define INTEL_PLANE_CAP_TILING_Y BIT(4) #define INTEL_PLANE_CAP_TILING_Yf BIT(5) +#define INTEL_PLANE_CAP_TILING_4 BIT(6)
bool intel_fb_is_ccs_modifier(u64 modifier); bool intel_fb_is_rc_ccs_cc_modifier(u64 modifier); diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c b/drivers/gpu/drm/i915/display/intel_fbc.c index bcdffe62f3cb..dccd9b7cde3f 100644 --- a/drivers/gpu/drm/i915/display/intel_fbc.c +++ b/drivers/gpu/drm/i915/display/intel_fbc.c @@ -946,6 +946,7 @@ static bool tiling_is_valid(const struct intel_plane_state *plane_state) case I915_FORMAT_MOD_Y_TILED: case I915_FORMAT_MOD_Yf_TILED: return DISPLAY_VER(i915) >= 9; + case I915_FORMAT_MOD_4_TILED: case I915_FORMAT_MOD_X_TILED: return true; default: diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c b/drivers/gpu/drm/i915/display/intel_plane_initial.c index e4186a0b8edb..426a8bd30abc 100644 --- a/drivers/gpu/drm/i915/display/intel_plane_initial.c +++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c @@ -126,6 +126,7 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc, case DRM_FORMAT_MOD_LINEAR: case I915_FORMAT_MOD_X_TILED: case I915_FORMAT_MOD_Y_TILED: + case I915_FORMAT_MOD_4_TILED: break; default: drm_dbg(&dev_priv->drm, diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c index 1223075595ff..5299dfe68802 100644 --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c @@ -762,6 +762,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier) return PLANE_CTL_TILED_X; case I915_FORMAT_MOD_Y_TILED: return PLANE_CTL_TILED_Y; + case I915_FORMAT_MOD_4_TILED: + return PLANE_CTL_TILED_4; case I915_FORMAT_MOD_Y_TILED_CCS: case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC: return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; @@ -2011,9 +2013,7 @@ static bool gen12_plane_format_mod_supported(struct drm_plane *_plane, case DRM_FORMAT_Y216: case DRM_FORMAT_XVYU12_16161616: case DRM_FORMAT_XVYU16161616: - if (modifier == DRM_FORMAT_MOD_LINEAR || - modifier == I915_FORMAT_MOD_X_TILED || - modifier == I915_FORMAT_MOD_Y_TILED) + if (!intel_fb_is_ccs_modifier(modifier)) return true; fallthrough; default: @@ -2106,6 +2106,8 @@ static u8 skl_get_plane_caps(struct drm_i915_private *i915, caps |= INTEL_PLANE_CAP_TILING_Y; if (DISPLAY_VER(i915) < 12) caps |= INTEL_PLANE_CAP_TILING_Yf; + if (HAS_4TILE(i915)) + caps |= INTEL_PLANE_CAP_TILING_4;
if (skl_plane_has_rc_ccs(i915, pipe, plane_id)) { caps |= INTEL_PLANE_CAP_CCS_RC; @@ -2278,6 +2280,7 @@ skl_get_initial_plane_config(struct intel_crtc *crtc, unsigned int aligned_height; struct drm_framebuffer *fb; struct intel_framebuffer *intel_fb; + static_assert(PLANE_CTL_TILED_YF == PLANE_CTL_TILED_4);
if (!plane->get_hw_state(plane, &pipe)) return; @@ -2340,11 +2343,15 @@ skl_get_initial_plane_config(struct intel_crtc *crtc, else fb->modifier = I915_FORMAT_MOD_Y_TILED; break; - case PLANE_CTL_TILED_YF: - if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE) - fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS; - else - fb->modifier = I915_FORMAT_MOD_Yf_TILED; + case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */ + if (HAS_4TILE(dev_priv)) { + fb->modifier = I915_FORMAT_MOD_4_TILED; + } else { + if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE) + fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS; + else + fb->modifier = I915_FORMAT_MOD_Yf_TILED; + } break; default: MISSING_CASE(tiling); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 384977798c8e..3011c05b5b9c 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1424,6 +1424,7 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, #define CMDPARSER_USES_GGTT(dev_priv) (GRAPHICS_VER(dev_priv) == 7)
#define HAS_LLC(dev_priv) (INTEL_INFO(dev_priv)->has_llc) +#define HAS_4TILE(dev_priv) (INTEL_INFO(dev_priv)->has_4tile) #define HAS_SNOOP(dev_priv) (INTEL_INFO(dev_priv)->has_snoop) #define HAS_EDRAM(dev_priv) ((dev_priv)->edram_size_mb) #define HAS_SECURE_BATCHES(dev_priv) (GRAPHICS_VER(dev_priv) < 6) diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 3976482582b8..436aae34b1f1 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1045,6 +1045,7 @@ static const struct intel_device_info dg2_info = { DGFX_FEATURES, .graphics.rel = 55, .media.rel = 55, + .has_4tile = 1, PLATFORM(INTEL_DG2), .has_guc_deprivilege = 1, .has_64k_pages = 1, diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 9b5423572fe9..14065164fdcf 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -6450,6 +6450,7 @@ enum { #define PLANE_CTL_TILED_X REG_FIELD_PREP(PLANE_CTL_TILED_MASK, 1) #define PLANE_CTL_TILED_Y REG_FIELD_PREP(PLANE_CTL_TILED_MASK, 4) #define PLANE_CTL_TILED_YF REG_FIELD_PREP(PLANE_CTL_TILED_MASK, 5) +#define PLANE_CTL_TILED_4 REG_FIELD_PREP(PLANE_CTL_TILED_MASK, 5) #define PLANE_CTL_ASYNC_FLIP REG_BIT(9) #define PLANE_CTL_FLIP_HORIZONTAL REG_BIT(8) #define PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE REG_BIT(4) /* TGL+ */ diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h index ef7c7c988b7b..081f9849797d 100644 --- a/drivers/gpu/drm/i915/intel_device_info.h +++ b/drivers/gpu/drm/i915/intel_device_info.h @@ -134,6 +134,7 @@ enum intel_ppgtt_type { func(gpu_reset_clobbers_display); \ func(has_reset_engine); \ func(has_flat_ccs); \ + func(has_4tile); \ func(has_global_mocs); \ func(has_gt_uc); \ func(has_guc_deprivilege); \ diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index 46b21680e601..e81791bf7d03 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -5402,6 +5402,7 @@ skl_compute_wm_params(const struct intel_crtc_state *crtc_state, }
wp->y_tiled = modifier == I915_FORMAT_MOD_Y_TILED || + modifier == I915_FORMAT_MOD_4_TILED || modifier == I915_FORMAT_MOD_Yf_TILED || modifier == I915_FORMAT_MOD_Y_TILED_CCS || modifier == I915_FORMAT_MOD_Yf_TILED_CCS;
From: Matt Roper matthew.d.roper@intel.com
DG2 unifies render compression and media compression into a single format for the first time. The programming and buffer layout is supposed to match compression on older gen12 platforms, but the actual compression algorithm is different from any previous platform; as such, we need a new framebuffer modifier to represent buffers in this format, but otherwise we can re-use the existing gen12 compression driver logic.
v2: Display version fix [Imre]
Signed-off-by: Matt Roper matthew.d.roper@intel.com cc: Radhakrishna Sripada radhakrishna.sripada@intel.com Signed-off-by: Mika Kahola mika.kahola@intel.com (v2) cc: Anshuman Gupta anshuman.gupta@intel.com Signed-off-by: Juha-Pekka Heikkilä juha-pekka.heikkila@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com --- drivers/gpu/drm/i915/display/intel_fb.c | 13 ++++++++++ .../drm/i915/display/skl_universal_plane.c | 26 ++++++++++++++++--- include/uapi/drm/drm_fourcc.h | 22 ++++++++++++++++ 3 files changed, 57 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c index 94c57facbb46..4d4d01963f15 100644 --- a/drivers/gpu/drm/i915/display/intel_fb.c +++ b/drivers/gpu/drm/i915/display/intel_fb.c @@ -141,6 +141,14 @@ struct intel_modifier_desc {
static const struct intel_modifier_desc intel_modifiers[] = { { + .modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS, + .display_ver = { 13, 13 }, + .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_MC, + }, { + .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS, + .display_ver = { 13, 13 }, + .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC, + }, { .modifier = I915_FORMAT_MOD_4_TILED, .display_ver = { 13, 13 }, .plane_caps = INTEL_PLANE_CAP_TILING_4, @@ -550,6 +558,8 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane) return 128; else return 512; + case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS: + case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: case I915_FORMAT_MOD_4_TILED: /* * Each 4K tile consists of 64B(8*8) subtiles, with @@ -752,6 +762,9 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb, case I915_FORMAT_MOD_4_TILED: case I915_FORMAT_MOD_Yf_TILED: return 1 * 1024 * 1024; + case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS: + case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: + return 16 * 1024; default: MISSING_CASE(fb->modifier); return 0; diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c index 5299dfe68802..c38ae0876c15 100644 --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c @@ -764,6 +764,14 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier) return PLANE_CTL_TILED_Y; case I915_FORMAT_MOD_4_TILED: return PLANE_CTL_TILED_4; + case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS: + return PLANE_CTL_TILED_4 | + PLANE_CTL_RENDER_DECOMPRESSION_ENABLE | + PLANE_CTL_CLEAR_COLOR_DISABLE; + case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: + return PLANE_CTL_TILED_4 | + PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE | + PLANE_CTL_CLEAR_COLOR_DISABLE; case I915_FORMAT_MOD_Y_TILED_CCS: case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC: return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; @@ -2094,6 +2102,10 @@ static bool gen12_plane_has_mc_ccs(struct drm_i915_private *i915, if (IS_ADLP_DISPLAY_STEP(i915, STEP_A0, STEP_B0)) return false;
+ /* Wa_14013215631 */ + if (IS_DG2_DISPLAY_STEP(i915, STEP_A0, STEP_C0)) + return false; + return plane_id < PLANE_SPRITE4; }
@@ -2335,9 +2347,10 @@ skl_get_initial_plane_config(struct intel_crtc *crtc, case PLANE_CTL_TILED_Y: plane_config->tiling = I915_TILING_Y; if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE) - fb->modifier = DISPLAY_VER(dev_priv) >= 12 ? - I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS : - I915_FORMAT_MOD_Y_TILED_CCS; + if (DISPLAY_VER(dev_priv) >= 12) + fb->modifier = I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS; + else + fb->modifier = I915_FORMAT_MOD_Y_TILED_CCS; else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE) fb->modifier = I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS; else @@ -2345,7 +2358,12 @@ skl_get_initial_plane_config(struct intel_crtc *crtc, break; case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */ if (HAS_4TILE(dev_priv)) { - fb->modifier = I915_FORMAT_MOD_4_TILED; + if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE) + fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS; + else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE) + fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS; + else + fb->modifier = I915_FORMAT_MOD_4_TILED; } else { if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE) fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS; diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index b73fe6797fc3..b8fb7b44c03c 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -583,6 +583,28 @@ extern "C" { */ #define I915_FORMAT_MOD_4_TILED fourcc_mod_code(INTEL, 9)
+/* + * Intel color control surfaces (CCS) for DG2 render compression. + * + * DG2 uses a new compression format for render compression. The general + * layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS, + * but a new hashing/compression algorithm is used, so a fresh modifier must + * be associated with buffers of this type. Render compression uses 128 byte + * compression blocks. + */ +#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS fourcc_mod_code(INTEL, 10) + +/* + * Intel color control surfaces (CCS) for DG2 media compression. + * + * DG2 uses a new compression format for media compression. The general + * layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS, + * but a new hashing/compression algorithm is used, so a fresh modifier must + * be associated with buffers of this type. Media compression uses 256 byte + * compression blocks. + */ +#define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11) + /* * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks *
On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C ramalingam.c@intel.com wrote:
From: Matt Roper matthew.d.roper@intel.com
DG2 unifies render compression and media compression into a single format for the first time. The programming and buffer layout is supposed to match compression on older gen12 platforms, but the actual compression algorithm is different from any previous platform; as such, we need a new framebuffer modifier to represent buffers in this format, but otherwise we can re-use the existing gen12 compression driver logic.
v2: Display version fix [Imre]
Signed-off-by: Matt Roper matthew.d.roper@intel.com cc: Radhakrishna Sripada radhakrishna.sripada@intel.com Signed-off-by: Mika Kahola mika.kahola@intel.com (v2) cc: Anshuman Gupta anshuman.gupta@intel.com Signed-off-by: Juha-Pekka Heikkilä juha-pekka.heikkila@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/display/intel_fb.c | 13 ++++++++++ .../drm/i915/display/skl_universal_plane.c | 26 ++++++++++++++++--- include/uapi/drm/drm_fourcc.h | 22 ++++++++++++++++ 3 files changed, 57 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c index 94c57facbb46..4d4d01963f15 100644 --- a/drivers/gpu/drm/i915/display/intel_fb.c +++ b/drivers/gpu/drm/i915/display/intel_fb.c @@ -141,6 +141,14 @@ struct intel_modifier_desc {
static const struct intel_modifier_desc intel_modifiers[] = { {
.modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS,
.display_ver = { 13, 13 },
.plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_MC,
}, {
.modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
.display_ver = { 13, 13 },
.plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC,
}, { .modifier = I915_FORMAT_MOD_4_TILED, .display_ver = { 13, 13 }, .plane_caps = INTEL_PLANE_CAP_TILING_4,
@@ -550,6 +558,8 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane) return 128; else return 512;
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: case I915_FORMAT_MOD_4_TILED: /* * Each 4K tile consists of 64B(8*8) subtiles, with
@@ -752,6 +762,9 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb, case I915_FORMAT_MOD_4_TILED: case I915_FORMAT_MOD_Yf_TILED: return 1 * 1024 * 1024;
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
return 16 * 1024; default: MISSING_CASE(fb->modifier); return 0;
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c index 5299dfe68802..c38ae0876c15 100644 --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c @@ -764,6 +764,14 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier) return PLANE_CTL_TILED_Y; case I915_FORMAT_MOD_4_TILED: return PLANE_CTL_TILED_4;
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
return PLANE_CTL_TILED_4 |
PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
PLANE_CTL_CLEAR_COLOR_DISABLE;
case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
return PLANE_CTL_TILED_4 |
PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE |
PLANE_CTL_CLEAR_COLOR_DISABLE; case I915_FORMAT_MOD_Y_TILED_CCS: case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC: return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
@@ -2094,6 +2102,10 @@ static bool gen12_plane_has_mc_ccs(struct drm_i915_private *i915, if (IS_ADLP_DISPLAY_STEP(i915, STEP_A0, STEP_B0)) return false;
/* Wa_14013215631 */
if (IS_DG2_DISPLAY_STEP(i915, STEP_A0, STEP_C0))
return false;
return plane_id < PLANE_SPRITE4;
}
@@ -2335,9 +2347,10 @@ skl_get_initial_plane_config(struct intel_crtc *crtc, case PLANE_CTL_TILED_Y: plane_config->tiling = I915_TILING_Y; if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
fb->modifier = DISPLAY_VER(dev_priv) >= 12 ?
I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS :
I915_FORMAT_MOD_Y_TILED_CCS;
if (DISPLAY_VER(dev_priv) >= 12)
fb->modifier = I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS;
else
fb->modifier = I915_FORMAT_MOD_Y_TILED_CCS; else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE) fb->modifier = I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS; else
@@ -2345,7 +2358,12 @@ skl_get_initial_plane_config(struct intel_crtc *crtc, break; case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */ if (HAS_4TILE(dev_priv)) {
fb->modifier = I915_FORMAT_MOD_4_TILED;
if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
else
fb->modifier = I915_FORMAT_MOD_4_TILED; } else { if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE) fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index b73fe6797fc3..b8fb7b44c03c 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -583,6 +583,28 @@ extern "C" { */ #define I915_FORMAT_MOD_4_TILED fourcc_mod_code(INTEL, 9)
+/*
- Intel color control surfaces (CCS) for DG2 render compression.
- DG2 uses a new compression format for render compression. The general
- layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
- but a new hashing/compression algorithm is used, so a fresh modifier must
- be associated with buffers of this type. Render compression uses 128 byte
- compression blocks.
I think I've seen a way to configure the compression block size on TGL at least. I can't find the spec text for that at the moment though... Could we omit these mentions?
- */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS fourcc_mod_code(INTEL, 10)
How about something like:
The main surface is Tile 4 and at plane index 0. The CCS plane is hidden from userspace. The main surface pitch is required to be a multiple of four Tile 4 widths. The CCS is configured with the render compression format associated with the main surface format.
....I think the CCS is technically accessible via the blitter engine, so the part about the plane being "hidden" may need some tweaking.
-Nanley
+/*
- Intel color control surfaces (CCS) for DG2 media compression.
- DG2 uses a new compression format for media compression. The general
- layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
- but a new hashing/compression algorithm is used, so a fresh modifier must
- be associated with buffers of this type. Media compression uses 256 byte
- compression blocks.
- */
+#define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
/*
- Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
-- 2.20.1
On 12.2.2022 3.17, Nanley Chery wrote:
On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C ramalingam.c@intel.com wrote:
From: Matt Roper matthew.d.roper@intel.com
DG2 unifies render compression and media compression into a single format for the first time. The programming and buffer layout is supposed to match compression on older gen12 platforms, but the actual compression algorithm is different from any previous platform; as such, we need a new framebuffer modifier to represent buffers in this format, but otherwise we can re-use the existing gen12 compression driver logic.
v2: Display version fix [Imre]
Signed-off-by: Matt Roper matthew.d.roper@intel.com cc: Radhakrishna Sripada radhakrishna.sripada@intel.com Signed-off-by: Mika Kahola mika.kahola@intel.com (v2) cc: Anshuman Gupta anshuman.gupta@intel.com Signed-off-by: Juha-Pekka Heikkilä juha-pekka.heikkila@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/display/intel_fb.c | 13 ++++++++++ .../drm/i915/display/skl_universal_plane.c | 26 ++++++++++++++++--- include/uapi/drm/drm_fourcc.h | 22 ++++++++++++++++ 3 files changed, 57 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c index 94c57facbb46..4d4d01963f15 100644 --- a/drivers/gpu/drm/i915/display/intel_fb.c +++ b/drivers/gpu/drm/i915/display/intel_fb.c @@ -141,6 +141,14 @@ struct intel_modifier_desc {
static const struct intel_modifier_desc intel_modifiers[] = { {
.modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS,
.display_ver = { 13, 13 },
.plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_MC,
}, {
.modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
.display_ver = { 13, 13 },
.plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC,
}, { .modifier = I915_FORMAT_MOD_4_TILED, .display_ver = { 13, 13 }, .plane_caps = INTEL_PLANE_CAP_TILING_4,
@@ -550,6 +558,8 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane) return 128; else return 512;
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: case I915_FORMAT_MOD_4_TILED: /* * Each 4K tile consists of 64B(8*8) subtiles, with
@@ -752,6 +762,9 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb, case I915_FORMAT_MOD_4_TILED: case I915_FORMAT_MOD_Yf_TILED: return 1 * 1024 * 1024;
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
return 16 * 1024; default: MISSING_CASE(fb->modifier); return 0;
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c index 5299dfe68802..c38ae0876c15 100644 --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c @@ -764,6 +764,14 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier) return PLANE_CTL_TILED_Y; case I915_FORMAT_MOD_4_TILED: return PLANE_CTL_TILED_4;
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
return PLANE_CTL_TILED_4 |
PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
PLANE_CTL_CLEAR_COLOR_DISABLE;
case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
return PLANE_CTL_TILED_4 |
PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE |
PLANE_CTL_CLEAR_COLOR_DISABLE; case I915_FORMAT_MOD_Y_TILED_CCS: case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC: return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
@@ -2094,6 +2102,10 @@ static bool gen12_plane_has_mc_ccs(struct drm_i915_private *i915, if (IS_ADLP_DISPLAY_STEP(i915, STEP_A0, STEP_B0)) return false;
/* Wa_14013215631 */
if (IS_DG2_DISPLAY_STEP(i915, STEP_A0, STEP_C0))
return false;
}return plane_id < PLANE_SPRITE4;
@@ -2335,9 +2347,10 @@ skl_get_initial_plane_config(struct intel_crtc *crtc, case PLANE_CTL_TILED_Y: plane_config->tiling = I915_TILING_Y; if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
fb->modifier = DISPLAY_VER(dev_priv) >= 12 ?
I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS :
I915_FORMAT_MOD_Y_TILED_CCS;
if (DISPLAY_VER(dev_priv) >= 12)
fb->modifier = I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS;
else
fb->modifier = I915_FORMAT_MOD_Y_TILED_CCS; else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE) fb->modifier = I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS; else
@@ -2345,7 +2358,12 @@ skl_get_initial_plane_config(struct intel_crtc *crtc, break; case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */ if (HAS_4TILE(dev_priv)) {
fb->modifier = I915_FORMAT_MOD_4_TILED;
if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
else
fb->modifier = I915_FORMAT_MOD_4_TILED; } else { if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE) fb->modifier = I915_FORMAT_MOD_Yf_TILED_CCS;
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index b73fe6797fc3..b8fb7b44c03c 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -583,6 +583,28 @@ extern "C" { */ #define I915_FORMAT_MOD_4_TILED fourcc_mod_code(INTEL, 9)
+/*
- Intel color control surfaces (CCS) for DG2 render compression.
- DG2 uses a new compression format for render compression. The general
- layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
- but a new hashing/compression algorithm is used, so a fresh modifier must
- be associated with buffers of this type. Render compression uses 128 byte
- compression blocks.
I think I've seen a way to configure the compression block size on TGL at least. I can't find the spec text for that at the moment though... Could we omit these mentions?
Not sure why general possibility of changing compression block size is relevant? All hw features can be changed but this defines how this modifier is being implemented.
Say you take I915_FORMAT_MOD_4_TILED_DG2_RC_CCS framebuffer including control surface and copy it out, then come back and restore framebuffer with same information. It is expected to be valid?
/Juha-Pekka
- */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS fourcc_mod_code(INTEL, 10)
How about something like:
The main surface is Tile 4 and at plane index 0. The CCS plane is hidden from userspace. The main surface pitch is required to be a multiple of four Tile 4 widths. The CCS is configured with the render compression format associated with the main surface format.
....I think the CCS is technically accessible via the blitter engine, so the part about the plane being "hidden" may need some tweaking.
-Nanley
+/*
- Intel color control surfaces (CCS) for DG2 media compression.
- DG2 uses a new compression format for media compression. The general
- layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
- but a new hashing/compression algorithm is used, so a fresh modifier must
- be associated with buffers of this type. Media compression uses 256 byte
- compression blocks.
- */
+#define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
- /*
- Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
-- 2.20.1
-----Original Message----- From: Juha-Pekka Heikkila juhapekka.heikkila@gmail.com Sent: Tuesday, February 15, 2022 6:54 AM To: Nanley Chery nanleychery@gmail.com; C, Ramalingam ramalingam.c@intel.com Cc: intel-gfx intel-gfx@lists.freedesktop.org; Chery, Nanley G nanley.g.chery@intel.com; Auld, Matthew matthew.auld@intel.com; dri- devel dri-devel@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH v5 15/19] drm/i915/dg2: Add DG2 unified compression
On 12.2.2022 3.17, Nanley Chery wrote:
On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C ramalingam.c@intel.com
wrote:
From: Matt Roper matthew.d.roper@intel.com
DG2 unifies render compression and media compression into a single format for the first time. The programming and buffer layout is supposed to match compression on older gen12 platforms, but the actual compression algorithm is different from any previous platform; as such, we need a new framebuffer modifier to represent buffers in this format, but otherwise we can re-use the existing gen12 compression
driver logic.
v2: Display version fix [Imre]
Signed-off-by: Matt Roper matthew.d.roper@intel.com cc: Radhakrishna Sripada radhakrishna.sripada@intel.com Signed-off-by: Mika Kahola mika.kahola@intel.com (v2) cc: Anshuman Gupta anshuman.gupta@intel.com Signed-off-by: Juha-Pekka Heikkilä juha-pekka.heikkila@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/display/intel_fb.c | 13 ++++++++++ .../drm/i915/display/skl_universal_plane.c | 26 ++++++++++++++++--- include/uapi/drm/drm_fourcc.h | 22 ++++++++++++++++ 3 files changed, 57 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c index 94c57facbb46..4d4d01963f15 100644 --- a/drivers/gpu/drm/i915/display/intel_fb.c +++ b/drivers/gpu/drm/i915/display/intel_fb.c @@ -141,6 +141,14 @@ struct intel_modifier_desc {
static const struct intel_modifier_desc intel_modifiers[] = { {
.modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS,
.display_ver = { 13, 13 },
.plane_caps = INTEL_PLANE_CAP_TILING_4 |
INTEL_PLANE_CAP_CCS_MC,
}, {
.modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS,
.display_ver = { 13, 13 },
.plane_caps = INTEL_PLANE_CAP_TILING_4 |
INTEL_PLANE_CAP_CCS_RC,
}, { .modifier = I915_FORMAT_MOD_4_TILED, .display_ver = { 13, 13 }, .plane_caps = INTEL_PLANE_CAP_TILING_4, @@ -550,6
+558,8 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int
color_plane)
return 128; else return 512;
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: case I915_FORMAT_MOD_4_TILED: /* * Each 4K tile consists of 64B(8*8) subtiles, with
@@ -752,6 +762,9 @@ unsigned int intel_surf_alignment(const struct
drm_framebuffer *fb,
case I915_FORMAT_MOD_4_TILED: case I915_FORMAT_MOD_Yf_TILED: return 1 * 1024 * 1024;
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
return 16 * 1024; default: MISSING_CASE(fb->modifier); return 0;
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c index 5299dfe68802..c38ae0876c15 100644 --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c @@ -764,6 +764,14 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier) return PLANE_CTL_TILED_Y; case I915_FORMAT_MOD_4_TILED: return PLANE_CTL_TILED_4;
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
return PLANE_CTL_TILED_4 |
PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
PLANE_CTL_CLEAR_COLOR_DISABLE;
case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS:
return PLANE_CTL_TILED_4 |
PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE |
PLANE_CTL_CLEAR_COLOR_DISABLE; case I915_FORMAT_MOD_Y_TILED_CCS: case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC: return PLANE_CTL_TILED_Y |
PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; @@ -2094,6 +2102,10 @@ static bool gen12_plane_has_mc_ccs(struct
drm_i915_private *i915,
if (IS_ADLP_DISPLAY_STEP(i915, STEP_A0, STEP_B0)) return false;
/* Wa_14013215631 */
if (IS_DG2_DISPLAY_STEP(i915, STEP_A0, STEP_C0))
return false;
}return plane_id < PLANE_SPRITE4;
@@ -2335,9 +2347,10 @@ skl_get_initial_plane_config(struct intel_crtc *crtc, case PLANE_CTL_TILED_Y: plane_config->tiling = I915_TILING_Y; if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
fb->modifier = DISPLAY_VER(dev_priv) >= 12 ?
I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS :
I915_FORMAT_MOD_Y_TILED_CCS;
if (DISPLAY_VER(dev_priv) >= 12)
fb->modifier =
I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS;
else
fb->modifier =
- I915_FORMAT_MOD_Y_TILED_CCS; else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE) fb->modifier = I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS; else
@@ -2345,7 +2358,12 @@ skl_get_initial_plane_config(struct intel_crtc *crtc, break; case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */ if (HAS_4TILE(dev_priv)) {
fb->modifier = I915_FORMAT_MOD_4_TILED;
if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
else
fb->modifier =
- I915_FORMAT_MOD_4_TILED; } else { if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE) fb->modifier =
I915_FORMAT_MOD_Yf_TILED_CCS; diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index b73fe6797fc3..b8fb7b44c03c 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -583,6 +583,28 @@ extern "C" { */ #define I915_FORMAT_MOD_4_TILED fourcc_mod_code(INTEL, 9)
+/*
- Intel color control surfaces (CCS) for DG2 render compression.
- DG2 uses a new compression format for render compression. The
+general
- layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
- but a new hashing/compression algorithm is used, so a fresh
+modifier must
- be associated with buffers of this type. Render compression uses
+128 byte
- compression blocks.
I think I've seen a way to configure the compression block size on TGL at least. I can't find the spec text for that at the moment though... Could we omit these mentions?
Not sure why general possibility of changing compression block size is relevant? All hw features can be changed but this defines how this modifier is being implemented.
I was concerned about compatibility between the different modes, but I've looked into the restrictions here and don't see any problems with this.
Say you take I915_FORMAT_MOD_4_TILED_DG2_RC_CCS framebuffer including control surface and copy it out, then come back and restore framebuffer with same information. It is expected to be valid?
/Juha-Pekka
- */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS
fourcc_mod_code(INTEL,
+10)
How about something like:
The main surface is Tile 4 and at plane index 0. The CCS plane is hidden from userspace. The main surface pitch is required to be a multiple of four Tile 4 widths. The CCS is configured with the render compression format associated with the main surface format.
Actually, let's omit the last sentence. CCS has always been affected by the main surface format, so I don't think there's a need to mention it specifically for the DG2 modifier.
We do need to mention the 4-tile-wide pitch requirement though.
-Nanley
....I think the CCS is technically accessible via the blitter engine, so the part about the plane being "hidden" may need some tweaking.
-Nanley
+/*
- Intel color control surfaces (CCS) for DG2 media compression.
- DG2 uses a new compression format for media compression. The
+general
- layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
- but a new hashing/compression algorithm is used, so a fresh
+modifier must
- be associated with buffers of this type. Media compression uses
+256 byte
- compression blocks.
- */
+#define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS
fourcc_mod_code(INTEL,
+11)
- /*
- Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
-- 2.20.1
On Thu, Feb 17, 2022 at 05:15:15PM +0000, Chery, Nanley G wrote:
[...] --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -583,6 +583,28 @@ extern "C" { */ #define I915_FORMAT_MOD_4_TILED fourcc_mod_code(INTEL, 9)
+/*
- Intel color control surfaces (CCS) for DG2 render compression.
- DG2 uses a new compression format for render compression. The general
- layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
- but a new hashing/compression algorithm is used, so a fresh modifier must
- be associated with buffers of this type. Render compression uses 128 byte
- compression blocks.
I think I've seen a way to configure the compression block size on TGL at least. I can't find the spec text for that at the moment though... Could we omit these mentions?
Not sure why general possibility of changing compression block size is relevant? All hw features can be changed but this defines how this modifier is being implemented.
I was concerned about compatibility between the different modes, but I've looked into the restrictions here and don't see any problems with this.
Say you take I915_FORMAT_MOD_4_TILED_DG2_RC_CCS framebuffer including control surface and copy it out, then come back and restore framebuffer with same information. It is expected to be valid?
/Juha-Pekka
- */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS fourcc_mod_code(INTEL, 10)
How about something like:
The main surface is Tile 4 and at plane index 0. The CCS plane is hidden from userspace. The main surface pitch is required to be a multiple of four Tile 4 widths. The CCS is configured with the render compression format associated with the main surface format.
Actually, let's omit the last sentence. CCS has always been affected by the main surface format, so I don't think there's a need to mention it specifically for the DG2 modifier.
We do need to mention the 4-tile-wide pitch requirement though.
Agreed, the DG2 layout of planes and the tile format used - both different wrt. the GEN12_RC_CCS format - should be described here.
-Nanley
....I think the CCS is technically accessible via the blitter engine, so the part about the plane being "hidden" may need some tweaking.
Maybe outside of the GEM object? Capturing all the above would you be ok with the following?:
Intel color control surfaces (CCS) for DG2 render compression.
The main surface is Tile 4 and at plane index 0. The CCS data is stored outside of the GEM object in a reserved memory area dedicated for the storage of the CCS data from all GEM objects. The main surface pitch is required to be a multiple of four Tile 4 widths.
Intel color control surfaces (CCS) for DG2 media compression.
The main surface is Tile 4 and at plane index 0. For semi-planar formats like NV12, the UV plane is Tile 4 at plane index 1. The CCS data both for the main and semi-planar UV planes are stored outside of the GEM object in a reserved memory area dedicated for the storage of the CCS data from all GEM objects. The main surface pitch is required to be a multiple of four Tile 4 widths.
-Nanley
+/*
- Intel color control surfaces (CCS) for DG2 media compression.
- DG2 uses a new compression format for media compression. The
+general
- layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
- but a new hashing/compression algorithm is used, so a fresh
+modifier must
- be associated with buffers of this type. Media compression uses
+256 byte
- compression blocks.
- */
+#define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS
fourcc_mod_code(INTEL,
+11)
- /*
- Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
-- 2.20.1
-----Original Message----- From: Deak, Imre imre.deak@intel.com Sent: Friday, March 18, 2022 10:40 AM To: Chery, Nanley G nanley.g.chery@intel.com Cc: juhapekka.heikkila@gmail.com; Nanley Chery nanleychery@gmail.com; C, Ramalingam ramalingam.c@intel.com; intel-gfx <intel- gfx@lists.freedesktop.org>; Auld, Matthew matthew.auld@intel.com; dri-devel dri-devel@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH v5 15/19] drm/i915/dg2: Add DG2 unified compression
On Thu, Feb 17, 2022 at 05:15:15PM +0000, Chery, Nanley G wrote:
[...] --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -583,6 +583,28 @@ extern "C" { */ #define I915_FORMAT_MOD_4_TILED fourcc_mod_code(INTEL, 9)
+/*
- Intel color control surfaces (CCS) for DG2 render compression.
- DG2 uses a new compression format for render compression. The general
- layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
- but a new hashing/compression algorithm is used, so a fresh modifier must
- be associated with buffers of this type. Render compression uses 128 byte
- compression blocks.
I think I've seen a way to configure the compression block size on TGL at least. I can't find the spec text for that at the moment though... Could we omit these mentions?
Not sure why general possibility of changing compression block size is relevant? All hw features can be changed but this defines how this modifier is being implemented.
I was concerned about compatibility between the different modes, but I've looked into the restrictions here and don't see any problems with this.
Say you take I915_FORMAT_MOD_4_TILED_DG2_RC_CCS framebuffer including control surface and copy it out, then come back and restore framebuffer with same information. It is expected to be valid?
/Juha-Pekka
- */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS fourcc_mod_code(INTEL, 10)
How about something like:
The main surface is Tile 4 and at plane index 0. The CCS plane is hidden from userspace. The main surface pitch is required to be a multiple of four Tile 4 widths. The CCS is configured with the render compression format associated with the main surface format.
Actually, let's omit the last sentence. CCS has always been affected by the main surface format, so I don't think there's a need to mention it specifically for the DG2 modifier.
We do need to mention the 4-tile-wide pitch requirement though.
Agreed, the DG2 layout of planes and the tile format used - both different wrt. the GEN12_RC_CCS format - should be described here.
-Nanley
....I think the CCS is technically accessible via the blitter engine, so the part about the plane being "hidden" may need some tweaking.
Maybe outside of the GEM object? Capturing all the above would you be ok with the following?:
Intel color control surfaces (CCS) for DG2 render compression.
The main surface is Tile 4 and at plane index 0. The CCS data is stored outside of the GEM object in a reserved memory area dedicated for the storage of the CCS data from all GEM objects. The main surface pitch is required to be a multiple of four Tile 4 widths.
Intel color control surfaces (CCS) for DG2 media compression.
The main surface is Tile 4 and at plane index 0. For semi-planar formats like NV12, the UV plane is Tile 4 at plane index 1. The CCS data both for the main and semi-planar UV planes are stored outside of the GEM object
This kind of implies that the Y plane is the main surface, but it's not more "main" than the UV plane right? Seems like we should specifically call out the Y plane for clarity. Maybe something like:
For semi-planar formats like NV12, the Y and UV planes are Tile 4 and are located at plane indices 0 and 1, respectively. The CCS for all planes are stored outside of the GEM object
in a reserved memory area dedicated for the storage of the CCS data from all GEM objects. The main surface pitch is required to be a multiple of four Tile 4 widths.
Looks good to me. Main suggestion I have here is to substitute "from all GEM objects" with "for all compressible GEM objects". Happy to look at further revisions, but with that change at least, Acked-by: Nanley Chery nanley.g.chery@intel.com
-Nanley
+/*
- Intel color control surfaces (CCS) for DG2 media compression.
- DG2 uses a new compression format for media compression. The
+general
- layout is the same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS,
- but a new hashing/compression algorithm is used, so a fresh
+modifier must
- be associated with buffers of this type. Media compression uses
+256 byte
- compression blocks.
- */
+#define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS
fourcc_mod_code(INTEL,
+11)
- /*
- Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
-- 2.20.1
On Thu, Mar 24, 2022 at 01:40:37AM +0200, Chery, Nanley G wrote:
[...] Capturing all the above would you be ok with the following?:
Intel color control surfaces (CCS) for DG2 render compression.
The main surface is Tile 4 and at plane index 0. The CCS data is stored outside of the GEM object in a reserved memory area dedicated for the storage of the CCS data from all GEM objects. The main surface pitch is required to be a multiple of four Tile 4 widths.
Intel color control surfaces (CCS) for DG2 media compression.
The main surface is Tile 4 and at plane index 0. For semi-planar formats like NV12, the UV plane is Tile 4 at plane index 1. The CCS data both for the main and semi-planar UV planes are stored outside of the GEM object
This kind of implies that the Y plane is the main surface, but it's not more "main" than the UV plane right? Seems like we should specifically call out the Y plane for clarity. Maybe something like:
For semi-planar formats like NV12, the Y and UV planes are Tile 4 and are located at plane indices 0 and 1, respectively. The CCS for all planes are stored outside of the GEM object
Ok, makes sense.
in a reserved memory area dedicated for the storage of the CCS data from all GEM objects. The main surface pitch is required to be a multiple of four Tile 4 widths.
Looks good to me. Main suggestion I have here is to substitute "from all GEM objects" with "for all compressible GEM objects".
"for all RC/RC_CC/MC CCS compressible GEM objects" would be more precise, in case there are other ways to compress data. Either way looks ok to me.
Happy to look at further revisions, but with that change at least, Acked-by: Nanley Chery nanley.g.chery@intel.com
Thanks.
--Imre
From: Mika Kahola mika.kahola@intel.com
DG2 clear color render compression uses Tile4 layout. Therefore, we need to define a new format modifier for uAPI to support clear color rendering.
v2: Display version is fixed. [Imre] KDoc is enhanced for cc modifier. [Nanley & Lionel]
Signed-off-by: Mika Kahola mika.kahola@intel.com cc: Anshuman Gupta anshuman.gupta@intel.com Signed-off-by: Juha-Pekka Heikkilä juha-pekka.heikkila@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com --- drivers/gpu/drm/i915/display/intel_fb.c | 8 ++++++++ drivers/gpu/drm/i915/display/skl_universal_plane.c | 9 ++++++++- include/uapi/drm/drm_fourcc.h | 10 ++++++++++ 3 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c index 4d4d01963f15..3df6ef5ffec5 100644 --- a/drivers/gpu/drm/i915/display/intel_fb.c +++ b/drivers/gpu/drm/i915/display/intel_fb.c @@ -144,6 +144,12 @@ static const struct intel_modifier_desc intel_modifiers[] = { .modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS, .display_ver = { 13, 13 }, .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_MC, + }, { + .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC, + .display_ver = { 13, 13 }, + .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC_CC, + + .ccs.cc_planes = BIT(1), }, { .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS, .display_ver = { 13, 13 }, @@ -559,6 +565,7 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane) else return 512; case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS: + case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: case I915_FORMAT_MOD_4_TILED: /* @@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb, case I915_FORMAT_MOD_Yf_TILED: return 1 * 1024 * 1024; case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS: + case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: return 16 * 1024; default: diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c index c38ae0876c15..b4dced1907c5 100644 --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c @@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier) return PLANE_CTL_TILED_4 | PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE | PLANE_CTL_CLEAR_COLOR_DISABLE; + case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: + return PLANE_CTL_TILED_4 | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; case I915_FORMAT_MOD_Y_TILED_CCS: case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC: return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; @@ -2358,10 +2360,15 @@ skl_get_initial_plane_config(struct intel_crtc *crtc, break; case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */ if (HAS_4TILE(dev_priv)) { - if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE) + u32 rc_mask = PLANE_CTL_RENDER_DECOMPRESSION_ENABLE | + PLANE_CTL_CLEAR_COLOR_DISABLE; + + if ((val & rc_mask) == rc_mask) fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS; else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE) fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS; + else if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE) + fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC; else fb->modifier = I915_FORMAT_MOD_4_TILED; } else { diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -605,6 +605,16 @@ extern "C" { */ #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
+/* + * Intel color control surfaces (CCS) for DG2 clear color render compression. + * + * DG2 uses a unified compression format for clear color render compression. + * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout. + * + * Fast clear color value expected by HW is located in fb at offset 0 of plane#1 + */ +#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC fourcc_mod_code(INTEL, 12) + /* * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks *
On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C ramalingam.c@intel.com wrote:
From: Mika Kahola mika.kahola@intel.com
DG2 clear color render compression uses Tile4 layout. Therefore, we need to define a new format modifier for uAPI to support clear color rendering.
v2: Display version is fixed. [Imre] KDoc is enhanced for cc modifier. [Nanley & Lionel]
Signed-off-by: Mika Kahola mika.kahola@intel.com cc: Anshuman Gupta anshuman.gupta@intel.com Signed-off-by: Juha-Pekka Heikkilä juha-pekka.heikkila@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/display/intel_fb.c | 8 ++++++++ drivers/gpu/drm/i915/display/skl_universal_plane.c | 9 ++++++++- include/uapi/drm/drm_fourcc.h | 10 ++++++++++ 3 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c index 4d4d01963f15..3df6ef5ffec5 100644 --- a/drivers/gpu/drm/i915/display/intel_fb.c +++ b/drivers/gpu/drm/i915/display/intel_fb.c @@ -144,6 +144,12 @@ static const struct intel_modifier_desc intel_modifiers[] = { .modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS, .display_ver = { 13, 13 }, .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_MC,
}, {
.modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC,
.display_ver = { 13, 13 },
.plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC_CC,
.ccs.cc_planes = BIT(1), }, { .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS, .display_ver = { 13, 13 },
@@ -559,6 +565,7 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane) else return 512; case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: case I915_FORMAT_MOD_4_TILED: /*
@@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb, case I915_FORMAT_MOD_Yf_TILED: return 1 * 1024 * 1024; case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: return 16 * 1024; default:
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c index c38ae0876c15..b4dced1907c5 100644 --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c @@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier) return PLANE_CTL_TILED_4 | PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE | PLANE_CTL_CLEAR_COLOR_DISABLE;
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
return PLANE_CTL_TILED_4 | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; case I915_FORMAT_MOD_Y_TILED_CCS: case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC: return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
@@ -2358,10 +2360,15 @@ skl_get_initial_plane_config(struct intel_crtc *crtc, break; case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */ if (HAS_4TILE(dev_priv)) {
if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
u32 rc_mask = PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
PLANE_CTL_CLEAR_COLOR_DISABLE;
if ((val & rc_mask) == rc_mask) fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS; else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE) fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
else if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC; else fb->modifier = I915_FORMAT_MOD_4_TILED; } else {
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -605,6 +605,16 @@ extern "C" { */ #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
+/*
- Intel color control surfaces (CCS) for DG2 clear color render compression.
- DG2 uses a unified compression format for clear color render compression.
What's unified about DG2's compression format? If this doesn't affect the layout, maybe we should drop this sentence.
- The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
This also needs a pitch aligned to four tiles, right? I think we can save some effort by referencing the DG2_RC_CCS modifier here.
- Fast clear color value expected by HW is located in fb at offset 0 of plane#1
Why is the expected offset hardcoded to 0 instead of relying on the offset provided by the modifier API? This looks like a bug.
We should probably give some info about the relevant fields in the fast clear plane (like what's done in the GEN12_RC_CCS_CC modifier).
-Nanley
- */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC fourcc_mod_code(INTEL, 12)
/*
- Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
-- 2.20.1
On 12.2.2022 3.19, Nanley Chery wrote:
On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C ramalingam.c@intel.com wrote:
From: Mika Kahola mika.kahola@intel.com
DG2 clear color render compression uses Tile4 layout. Therefore, we need to define a new format modifier for uAPI to support clear color rendering.
v2: Display version is fixed. [Imre] KDoc is enhanced for cc modifier. [Nanley & Lionel]
Signed-off-by: Mika Kahola mika.kahola@intel.com cc: Anshuman Gupta anshuman.gupta@intel.com Signed-off-by: Juha-Pekka Heikkilä juha-pekka.heikkila@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/display/intel_fb.c | 8 ++++++++ drivers/gpu/drm/i915/display/skl_universal_plane.c | 9 ++++++++- include/uapi/drm/drm_fourcc.h | 10 ++++++++++ 3 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c index 4d4d01963f15..3df6ef5ffec5 100644 --- a/drivers/gpu/drm/i915/display/intel_fb.c +++ b/drivers/gpu/drm/i915/display/intel_fb.c @@ -144,6 +144,12 @@ static const struct intel_modifier_desc intel_modifiers[] = { .modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS, .display_ver = { 13, 13 }, .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_MC,
}, {
.modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC,
.display_ver = { 13, 13 },
.plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC_CC,
.ccs.cc_planes = BIT(1), }, { .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS, .display_ver = { 13, 13 },
@@ -559,6 +565,7 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane) else return 512; case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: case I915_FORMAT_MOD_4_TILED: /*
@@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const struct drm_framebuffer *fb, case I915_FORMAT_MOD_Yf_TILED: return 1 * 1024 * 1024; case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: return 16 * 1024; default:
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c index c38ae0876c15..b4dced1907c5 100644 --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c @@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier) return PLANE_CTL_TILED_4 | PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE | PLANE_CTL_CLEAR_COLOR_DISABLE;
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
return PLANE_CTL_TILED_4 | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; case I915_FORMAT_MOD_Y_TILED_CCS: case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC: return PLANE_CTL_TILED_Y | PLANE_CTL_RENDER_DECOMPRESSION_ENABLE;
@@ -2358,10 +2360,15 @@ skl_get_initial_plane_config(struct intel_crtc *crtc, break; case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */ if (HAS_4TILE(dev_priv)) {
if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
u32 rc_mask = PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
PLANE_CTL_CLEAR_COLOR_DISABLE;
if ((val & rc_mask) == rc_mask) fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS; else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE) fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
else if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC; else fb->modifier = I915_FORMAT_MOD_4_TILED; } else {
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -605,6 +605,16 @@ extern "C" { */ #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11)
+/*
- Intel color control surfaces (CCS) for DG2 clear color render compression.
- DG2 uses a unified compression format for clear color render compression.
What's unified about DG2's compression format? If this doesn't affect the layout, maybe we should drop this sentence.
- The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
This also needs a pitch aligned to four tiles, right? I think we can save some effort by referencing the DG2_RC_CCS modifier here.
- Fast clear color value expected by HW is located in fb at offset 0 of plane#1
Why is the expected offset hardcoded to 0 instead of relying on the offset provided by the modifier API? This looks like a bug.
Hi Nanley,
can you elaborate a bit, which offset from modifier API that applies to cc surface?
We should probably give some info about the relevant fields in the fast clear plane (like what's done in the GEN12_RC_CCS_CC modifier).
agree, that's totally missing here.
/Juha-Pekka
- */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC fourcc_mod_code(INTEL, 12)
- /*
- Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
-- 2.20.1
-----Original Message----- From: Juha-Pekka Heikkila juhapekka.heikkila@gmail.com Sent: Tuesday, February 15, 2022 6:56 AM To: Nanley Chery nanleychery@gmail.com; C, Ramalingam ramalingam.c@intel.com Cc: intel-gfx intel-gfx@lists.freedesktop.org; Chery, Nanley G nanley.g.chery@intel.com; Auld, Matthew matthew.auld@intel.com; dri- devel dri-devel@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
On 12.2.2022 3.19, Nanley Chery wrote:
On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C ramalingam.c@intel.com
wrote:
From: Mika Kahola mika.kahola@intel.com
DG2 clear color render compression uses Tile4 layout. Therefore, we need to define a new format modifier for uAPI to support clear color
rendering.
v2: Display version is fixed. [Imre] KDoc is enhanced for cc modifier. [Nanley & Lionel]
Signed-off-by: Mika Kahola mika.kahola@intel.com cc: Anshuman Gupta anshuman.gupta@intel.com Signed-off-by: Juha-Pekka Heikkilä juha-pekka.heikkila@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/display/intel_fb.c | 8 ++++++++ drivers/gpu/drm/i915/display/skl_universal_plane.c | 9 ++++++++- include/uapi/drm/drm_fourcc.h | 10 ++++++++++ 3 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c index 4d4d01963f15..3df6ef5ffec5 100644 --- a/drivers/gpu/drm/i915/display/intel_fb.c +++ b/drivers/gpu/drm/i915/display/intel_fb.c @@ -144,6 +144,12 @@ static const struct intel_modifier_desc
intel_modifiers[] = {
.modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS, .display_ver = { 13, 13 }, .plane_caps = INTEL_PLANE_CAP_TILING_4 |
INTEL_PLANE_CAP_CCS_MC,
}, {
.modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC,
.display_ver = { 13, 13 },
.plane_caps = INTEL_PLANE_CAP_TILING_4 |
- INTEL_PLANE_CAP_CCS_RC_CC,
.ccs.cc_planes = BIT(1), }, { .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS, .display_ver = { 13, 13 }, @@ -559,6 +565,7 @@
intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane) else return 512; case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: case I915_FORMAT_MOD_4_TILED: /*
@@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const struct
drm_framebuffer *fb,
case I915_FORMAT_MOD_Yf_TILED: return 1 * 1024 * 1024; case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: return 16 * 1024; default:
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c index c38ae0876c15..b4dced1907c5 100644 --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c @@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier) return PLANE_CTL_TILED_4 | PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE | PLANE_CTL_CLEAR_COLOR_DISABLE;
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
return PLANE_CTL_TILED_4 |
- PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; case I915_FORMAT_MOD_Y_TILED_CCS: case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC: return PLANE_CTL_TILED_Y |
PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; @@ -2358,10 +2360,15 @@ skl_get_initial_plane_config(struct intel_crtc
*crtc,
break; case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */ if (HAS_4TILE(dev_priv)) {
if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
u32 rc_mask = PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
PLANE_CTL_CLEAR_COLOR_DISABLE;
if ((val & rc_mask) == rc_mask) fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS; else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE) fb->modifier =
I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
else if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
fb->modifier =
- I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC; else fb->modifier = I915_FORMAT_MOD_4_TILED; } else {
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -605,6 +605,16 @@ extern "C" { */ #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS
fourcc_mod_code(INTEL,
+/*
- Intel color control surfaces (CCS) for DG2 clear color render compression.
- DG2 uses a unified compression format for clear color render
compression.
What's unified about DG2's compression format? If this doesn't affect the layout, maybe we should drop this sentence.
- The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
This also needs a pitch aligned to four tiles, right? I think we can save some effort by referencing the DG2_RC_CCS modifier here.
- Fast clear color value expected by HW is located in fb at offset
- 0 of plane#1
Why is the expected offset hardcoded to 0 instead of relying on the offset provided by the modifier API? This looks like a bug.
Hi Nanley,
can you elaborate a bit, which offset from modifier API that applies to cc surface?
Hi Juha-Pekka,
On the kernel-side of things, I'm thinking of drm_mode_fb_cmd2::offsets[1].
-Nanley
We should probably give some info about the relevant fields in the fast clear plane (like what's done in the GEN12_RC_CCS_CC modifier).
agree, that's totally missing here.
/Juha-Pekka
- */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC
fourcc_mod_code(INTEL,
+12)
- /*
- Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
-- 2.20.1
On 15.2.2022 17.02, Chery, Nanley G wrote:
-----Original Message----- From: Juha-Pekka Heikkila juhapekka.heikkila@gmail.com Sent: Tuesday, February 15, 2022 6:56 AM To: Nanley Chery nanleychery@gmail.com; C, Ramalingam ramalingam.c@intel.com Cc: intel-gfx intel-gfx@lists.freedesktop.org; Chery, Nanley G nanley.g.chery@intel.com; Auld, Matthew matthew.auld@intel.com; dri- devel dri-devel@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
On 12.2.2022 3.19, Nanley Chery wrote:
On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C ramalingam.c@intel.com
wrote:
From: Mika Kahola mika.kahola@intel.com
DG2 clear color render compression uses Tile4 layout. Therefore, we need to define a new format modifier for uAPI to support clear color
rendering.
v2: Display version is fixed. [Imre] KDoc is enhanced for cc modifier. [Nanley & Lionel]
Signed-off-by: Mika Kahola mika.kahola@intel.com cc: Anshuman Gupta anshuman.gupta@intel.com Signed-off-by: Juha-Pekka Heikkilä juha-pekka.heikkila@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/display/intel_fb.c | 8 ++++++++ drivers/gpu/drm/i915/display/skl_universal_plane.c | 9 ++++++++- include/uapi/drm/drm_fourcc.h | 10 ++++++++++ 3 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c index 4d4d01963f15..3df6ef5ffec5 100644 --- a/drivers/gpu/drm/i915/display/intel_fb.c +++ b/drivers/gpu/drm/i915/display/intel_fb.c @@ -144,6 +144,12 @@ static const struct intel_modifier_desc
intel_modifiers[] = {
.modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS, .display_ver = { 13, 13 }, .plane_caps = INTEL_PLANE_CAP_TILING_4 |
INTEL_PLANE_CAP_CCS_MC,
}, {
.modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC,
.display_ver = { 13, 13 },
.plane_caps = INTEL_PLANE_CAP_TILING_4 |
- INTEL_PLANE_CAP_CCS_RC_CC,
.ccs.cc_planes = BIT(1), }, { .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS, .display_ver = { 13, 13 }, @@ -559,6 +565,7 @@
intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane) else return 512; case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: case I915_FORMAT_MOD_4_TILED: /*
@@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const struct
drm_framebuffer *fb,
case I915_FORMAT_MOD_Yf_TILED: return 1 * 1024 * 1024; case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: return 16 * 1024; default:
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c index c38ae0876c15..b4dced1907c5 100644 --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c @@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier) return PLANE_CTL_TILED_4 | PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE | PLANE_CTL_CLEAR_COLOR_DISABLE;
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
return PLANE_CTL_TILED_4 |
- PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; case I915_FORMAT_MOD_Y_TILED_CCS: case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC: return PLANE_CTL_TILED_Y |
PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; @@ -2358,10 +2360,15 @@ skl_get_initial_plane_config(struct intel_crtc
*crtc,
break; case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */ if (HAS_4TILE(dev_priv)) {
if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
u32 rc_mask = PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
PLANE_CTL_CLEAR_COLOR_DISABLE;
if ((val & rc_mask) == rc_mask) fb->modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS; else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE) fb->modifier =
I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
else if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
fb->modifier =
- I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC; else fb->modifier = I915_FORMAT_MOD_4_TILED; } else {
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -605,6 +605,16 @@ extern "C" { */ #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS
fourcc_mod_code(INTEL,
+/*
- Intel color control surfaces (CCS) for DG2 clear color render compression.
- DG2 uses a unified compression format for clear color render
compression.
What's unified about DG2's compression format? If this doesn't affect the layout, maybe we should drop this sentence.
- The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
This also needs a pitch aligned to four tiles, right? I think we can save some effort by referencing the DG2_RC_CCS modifier here.
- Fast clear color value expected by HW is located in fb at offset
- 0 of plane#1
Why is the expected offset hardcoded to 0 instead of relying on the offset provided by the modifier API? This looks like a bug.
Hi Nanley,
can you elaborate a bit, which offset from modifier API that applies to cc surface?
Hi Juha-Pekka,
On the kernel-side of things, I'm thinking of drm_mode_fb_cmd2::offsets[1].
Hi Nanley,
this offset is coming from userspace on creation of framebuffer, at that moment from userspace caller can point to offset of desire. Normally offset[0] is set at 0 and then offset[n] at plane n start which is not stated to have to be exactly after plane n-1 end. Or did I misunderstand what you meant?
For cc plane this offset likely will not be zero anyway and caller can move it as see fit to have cc plane (plane[1]) location[0] at place where wanted to have it.
/Juha-Pekka
We should probably give some info about the relevant fields in the fast clear plane (like what's done in the GEN12_RC_CCS_CC modifier).
agree, that's totally missing here.
/Juha-Pekka
- */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC
fourcc_mod_code(INTEL,
+12)
- /*
- Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
-- 2.20.1
-----Original Message----- From: Juha-Pekka Heikkila juhapekka.heikkila@gmail.com Sent: Tuesday, February 15, 2022 8:15 AM To: Chery, Nanley G nanley.g.chery@intel.com; Nanley Chery nanleychery@gmail.com; C, Ramalingam ramalingam.c@intel.com Cc: intel-gfx intel-gfx@lists.freedesktop.org; Auld, Matthew matthew.auld@intel.com; dri-devel dri-devel@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
On 15.2.2022 17.02, Chery, Nanley G wrote:
-----Original Message----- From: Juha-Pekka Heikkila juhapekka.heikkila@gmail.com Sent: Tuesday, February 15, 2022 6:56 AM To: Nanley Chery nanleychery@gmail.com; C, Ramalingam ramalingam.c@intel.com Cc: intel-gfx intel-gfx@lists.freedesktop.org; Chery, Nanley G nanley.g.chery@intel.com; Auld, Matthew matthew.auld@intel.com; dri- devel dri-devel@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
On 12.2.2022 3.19, Nanley Chery wrote:
On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C ramalingam.c@intel.com
wrote:
From: Mika Kahola mika.kahola@intel.com
DG2 clear color render compression uses Tile4 layout. Therefore, we need to define a new format modifier for uAPI to support clear color
rendering.
v2: Display version is fixed. [Imre] KDoc is enhanced for cc modifier. [Nanley & Lionel]
Signed-off-by: Mika Kahola mika.kahola@intel.com cc: Anshuman Gupta anshuman.gupta@intel.com Signed-off-by: Juha-Pekka Heikkilä juha-pekka.heikkila@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/display/intel_fb.c | 8 ++++++++ drivers/gpu/drm/i915/display/skl_universal_plane.c | 9 ++++++++- include/uapi/drm/drm_fourcc.h | 10 ++++++++++ 3 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c index 4d4d01963f15..3df6ef5ffec5 100644 --- a/drivers/gpu/drm/i915/display/intel_fb.c +++ b/drivers/gpu/drm/i915/display/intel_fb.c @@ -144,6 +144,12 @@ static const struct intel_modifier_desc
intel_modifiers[] = {
.modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS, .display_ver = { 13, 13 }, .plane_caps = INTEL_PLANE_CAP_TILING_4 |
INTEL_PLANE_CAP_CCS_MC,
}, {
.modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC,
.display_ver = { 13, 13 },
.plane_caps = INTEL_PLANE_CAP_TILING_4 |
- INTEL_PLANE_CAP_CCS_RC_CC,
.ccs.cc_planes = BIT(1), }, { .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS, .display_ver = { 13, 13 }, @@ -559,6 +565,7 @@
intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane) else return 512; case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: case I915_FORMAT_MOD_4_TILED: /*
@@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const struct
drm_framebuffer *fb,
case I915_FORMAT_MOD_Yf_TILED: return 1 * 1024 * 1024; case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: return 16 * 1024; default:
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c index c38ae0876c15..b4dced1907c5 100644 --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c @@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier) return PLANE_CTL_TILED_4 | PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE | PLANE_CTL_CLEAR_COLOR_DISABLE;
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
return PLANE_CTL_TILED_4 |
- PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; case I915_FORMAT_MOD_Y_TILED_CCS: case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC: return PLANE_CTL_TILED_Y |
PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; @@ -2358,10 +2360,15 @@ skl_get_initial_plane_config(struct intel_crtc
*crtc,
break; case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */ if (HAS_4TILE(dev_priv)) {
if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
u32 rc_mask =
PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
- PLANE_CTL_CLEAR_COLOR_DISABLE;
if ((val & rc_mask) == rc_mask) fb->modifier =
I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE) fb->modifier =
I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
else if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
fb->modifier =
- I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC; else fb->modifier = I915_FORMAT_MOD_4_TILED; } else {
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -605,6 +605,16 @@ extern "C" { */ #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS
fourcc_mod_code(INTEL,
+/*
- Intel color control surfaces (CCS) for DG2 clear color render
compression.
- DG2 uses a unified compression format for clear color render
compression.
What's unified about DG2's compression format? If this doesn't affect the layout, maybe we should drop this sentence.
- The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
This also needs a pitch aligned to four tiles, right? I think we can save some effort by referencing the DG2_RC_CCS modifier here.
- Fast clear color value expected by HW is located in fb at
- offset
- 0 of plane#1
Why is the expected offset hardcoded to 0 instead of relying on the offset provided by the modifier API? This looks like a bug.
Hi Nanley,
can you elaborate a bit, which offset from modifier API that applies to cc
surface?
Hi Juha-Pekka,
On the kernel-side of things, I'm thinking of drm_mode_fb_cmd2::offsets[1].
Hi Nanley,
this offset is coming from userspace on creation of framebuffer, at that moment from userspace caller can point to offset of desire. Normally offset[0] is set at 0 and then offset[n] at plane n start which is not stated to have to be exactly after plane n-1 end. Or did I misunderstand what you meant?
Perhaps, at least, I'm not sure what you're meaning to say. This modifier description seems to say that the drm_mode_fb_cmd2::offsets value for the clear color plane must be zero. Are you saying that it's correct? This doesn't match the GEN12_RC_CCS_CC behavior and doesn't match mesa's expectations.
-Nanley
For cc plane this offset likely will not be zero anyway and caller can move it as see fit to have cc plane (plane[1]) location[0] at place where wanted to have it.
/Juha-Pekka
We should probably give some info about the relevant fields in the fast clear plane (like what's done in the GEN12_RC_CCS_CC modifier).
agree, that's totally missing here.
/Juha-Pekka
- */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC
fourcc_mod_code(INTEL,
+12)
- /*
- Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
-- 2.20.1
On 15.2.2022 18.44, Chery, Nanley G wrote:
-----Original Message----- From: Juha-Pekka Heikkila juhapekka.heikkila@gmail.com Sent: Tuesday, February 15, 2022 8:15 AM To: Chery, Nanley G nanley.g.chery@intel.com; Nanley Chery nanleychery@gmail.com; C, Ramalingam ramalingam.c@intel.com Cc: intel-gfx intel-gfx@lists.freedesktop.org; Auld, Matthew matthew.auld@intel.com; dri-devel dri-devel@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
On 15.2.2022 17.02, Chery, Nanley G wrote:
-----Original Message----- From: Juha-Pekka Heikkila juhapekka.heikkila@gmail.com Sent: Tuesday, February 15, 2022 6:56 AM To: Nanley Chery nanleychery@gmail.com; C, Ramalingam ramalingam.c@intel.com Cc: intel-gfx intel-gfx@lists.freedesktop.org; Chery, Nanley G nanley.g.chery@intel.com; Auld, Matthew matthew.auld@intel.com; dri- devel dri-devel@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
On 12.2.2022 3.19, Nanley Chery wrote:
On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C ramalingam.c@intel.com
wrote:
From: Mika Kahola mika.kahola@intel.com
DG2 clear color render compression uses Tile4 layout. Therefore, we need to define a new format modifier for uAPI to support clear color
rendering.
v2: Display version is fixed. [Imre] KDoc is enhanced for cc modifier. [Nanley & Lionel]
Signed-off-by: Mika Kahola mika.kahola@intel.com cc: Anshuman Gupta anshuman.gupta@intel.com Signed-off-by: Juha-Pekka Heikkilä juha-pekka.heikkila@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com
drivers/gpu/drm/i915/display/intel_fb.c | 8 ++++++++ drivers/gpu/drm/i915/display/skl_universal_plane.c | 9 ++++++++- include/uapi/drm/drm_fourcc.h | 10 ++++++++++ 3 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c index 4d4d01963f15..3df6ef5ffec5 100644 --- a/drivers/gpu/drm/i915/display/intel_fb.c +++ b/drivers/gpu/drm/i915/display/intel_fb.c @@ -144,6 +144,12 @@ static const struct intel_modifier_desc
intel_modifiers[] = {
.modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS, .display_ver = { 13, 13 }, .plane_caps = INTEL_PLANE_CAP_TILING_4 |
INTEL_PLANE_CAP_CCS_MC,
}, {
.modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC,
.display_ver = { 13, 13 },
.plane_caps = INTEL_PLANE_CAP_TILING_4 |
- INTEL_PLANE_CAP_CCS_RC_CC,
.ccs.cc_planes = BIT(1), }, { .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS, .display_ver = { 13, 13 }, @@ -559,6 +565,7 @@
intel_tile_width_bytes(const struct drm_framebuffer *fb, int color_plane) else return 512; case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: case I915_FORMAT_MOD_4_TILED: /*
@@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const struct
drm_framebuffer *fb,
case I915_FORMAT_MOD_Yf_TILED: return 1 * 1024 * 1024; case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS:
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: return 16 * 1024; default:
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c index c38ae0876c15..b4dced1907c5 100644 --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c @@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier) return PLANE_CTL_TILED_4 | PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE | PLANE_CTL_CLEAR_COLOR_DISABLE;
case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC:
return PLANE_CTL_TILED_4 |
- PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; case I915_FORMAT_MOD_Y_TILED_CCS: case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC: return PLANE_CTL_TILED_Y |
PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; @@ -2358,10 +2360,15 @@ skl_get_initial_plane_config(struct intel_crtc
*crtc,
break; case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+ */ if (HAS_4TILE(dev_priv)) {
if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
u32 rc_mask =
PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
- PLANE_CTL_CLEAR_COLOR_DISABLE;
if ((val & rc_mask) == rc_mask) fb->modifier =
I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
else if (val & PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE) fb->modifier =
I915_FORMAT_MOD_4_TILED_DG2_MC_CCS;
else if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
fb->modifier =
- I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC; else fb->modifier = I915_FORMAT_MOD_4_TILED; } else {
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -605,6 +605,16 @@ extern "C" { */ #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS
fourcc_mod_code(INTEL,
+/*
- Intel color control surfaces (CCS) for DG2 clear color render
compression.
- DG2 uses a unified compression format for clear color render
compression.
What's unified about DG2's compression format? If this doesn't affect the layout, maybe we should drop this sentence.
- The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout.
This also needs a pitch aligned to four tiles, right? I think we can save some effort by referencing the DG2_RC_CCS modifier here.
- Fast clear color value expected by HW is located in fb at
- offset
- 0 of plane#1
Why is the expected offset hardcoded to 0 instead of relying on the offset provided by the modifier API? This looks like a bug.
Hi Nanley,
can you elaborate a bit, which offset from modifier API that applies to cc
surface?
Hi Juha-Pekka,
On the kernel-side of things, I'm thinking of drm_mode_fb_cmd2::offsets[1].
Hi Nanley,
this offset is coming from userspace on creation of framebuffer, at that moment from userspace caller can point to offset of desire. Normally offset[0] is set at 0 and then offset[n] at plane n start which is not stated to have to be exactly after plane n-1 end. Or did I misunderstand what you meant?
Perhaps, at least, I'm not sure what you're meaning to say. This modifier description seems to say that the drm_mode_fb_cmd2::offsets value for the clear color plane must be zero. Are you saying that it's correct? This doesn't match the GEN12_RC_CCS_CC behavior and doesn't match mesa's expectations.
It doesn't say "drm_mode_fb_cmd2::offsets value for the clear color plane must be zero", it says "Fast clear color value expected by HW is located in fb at offset 0 of plane#1".
Plane#1 location is pointed by drm_mode_fb_cmd2::offsets[1] and there's nothing stated about that offset.
These offsets are just offsets to bo which contain the framebuffer information hence drm_mode_fb_cmd2::offsets[1] can be changed as one wish and cc information is found starting at drm_mode_fb_cmd2::offsets[1][0]
/Juha-Pekka
For cc plane this offset likely will not be zero anyway and caller can move it as see fit to have cc plane (plane[1]) location[0] at place where wanted to have it.
/Juha-Pekka
We should probably give some info about the relevant fields in the fast clear plane (like what's done in the GEN12_RC_CCS_CC modifier).
agree, that's totally missing here.
/Juha-Pekka
- */
+#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC
fourcc_mod_code(INTEL,
+12)
- /*
- Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks
-- 2.20.1
-----Original Message----- From: Juha-Pekka Heikkila juhapekka.heikkila@gmail.com Sent: Tuesday, February 15, 2022 9:32 AM To: Chery, Nanley G nanley.g.chery@intel.com; Nanley Chery nanleychery@gmail.com; C, Ramalingam ramalingam.c@intel.com Cc: intel-gfx intel-gfx@lists.freedesktop.org; Auld, Matthew matthew.auld@intel.com; dri-devel dri-devel@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
On 15.2.2022 18.44, Chery, Nanley G wrote:
-----Original Message----- From: Juha-Pekka Heikkila juhapekka.heikkila@gmail.com Sent: Tuesday, February 15, 2022 8:15 AM To: Chery, Nanley G nanley.g.chery@intel.com; Nanley Chery nanleychery@gmail.com; C, Ramalingam ramalingam.c@intel.com Cc: intel-gfx intel-gfx@lists.freedesktop.org; Auld, Matthew matthew.auld@intel.com; dri-devel dri-devel@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
On 15.2.2022 17.02, Chery, Nanley G wrote:
-----Original Message----- From: Juha-Pekka Heikkila juhapekka.heikkila@gmail.com Sent: Tuesday, February 15, 2022 6:56 AM To: Nanley Chery nanleychery@gmail.com; C, Ramalingam ramalingam.c@intel.com Cc: intel-gfx intel-gfx@lists.freedesktop.org; Chery, Nanley G nanley.g.chery@intel.com; Auld, Matthew matthew.auld@intel.com; dri- devel dri-devel@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
On 12.2.2022 3.19, Nanley Chery wrote:
On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C ramalingam.c@intel.com
wrote:
> > From: Mika Kahola mika.kahola@intel.com > > DG2 clear color render compression uses Tile4 layout. Therefore, > we need to define a new format modifier for uAPI to support clear > color
rendering.
> > v2: > Display version is fixed. [Imre] > KDoc is enhanced for cc modifier. [Nanley & Lionel] > > Signed-off-by: Mika Kahola mika.kahola@intel.com > cc: Anshuman Gupta anshuman.gupta@intel.com > Signed-off-by: Juha-Pekka Heikkilä > juha-pekka.heikkila@intel.com > Signed-off-by: Ramalingam C ramalingam.c@intel.com > --- > drivers/gpu/drm/i915/display/intel_fb.c | 8 ++++++++ > drivers/gpu/drm/i915/display/skl_universal_plane.c | 9 ++++++++- > include/uapi/drm/drm_fourcc.h | 10 ++++++++++ > 3 files changed, 26 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/display/intel_fb.c > b/drivers/gpu/drm/i915/display/intel_fb.c > index 4d4d01963f15..3df6ef5ffec5 100644 > --- a/drivers/gpu/drm/i915/display/intel_fb.c > +++ b/drivers/gpu/drm/i915/display/intel_fb.c > @@ -144,6 +144,12 @@ static const struct intel_modifier_desc
intel_modifiers[] = {
> .modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS, > .display_ver = { 13, 13 }, > .plane_caps = INTEL_PLANE_CAP_TILING_4 | > INTEL_PLANE_CAP_CCS_MC, > + }, { > + .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC, > + .display_ver = { 13, 13 }, > + .plane_caps = INTEL_PLANE_CAP_TILING_4 | > + INTEL_PLANE_CAP_CCS_RC_CC, > + > + .ccs.cc_planes = BIT(1), > }, { > .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS, > .display_ver = { 13, 13 }, @@ -559,6 +565,7 @@ > intel_tile_width_bytes(const struct drm_framebuffer *fb, int
color_plane)
> else > return 512; > case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS: > + case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: > case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: > case I915_FORMAT_MOD_4_TILED: > /* > @@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const > struct
drm_framebuffer *fb,
> case I915_FORMAT_MOD_Yf_TILED: > return 1 * 1024 * 1024; > case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS: > + case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: > case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: > return 16 * 1024; > default: > diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c > b/drivers/gpu/drm/i915/display/skl_universal_plane.c > index c38ae0876c15..b4dced1907c5 100644 > --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c > +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c > @@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier) > return PLANE_CTL_TILED_4 | > PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE | > PLANE_CTL_CLEAR_COLOR_DISABLE; > + case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: > + return PLANE_CTL_TILED_4 | > + PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; > case I915_FORMAT_MOD_Y_TILED_CCS: > case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC: > return PLANE_CTL_TILED_Y | > PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; > @@ -2358,10 +2360,15 @@ skl_get_initial_plane_config(struct > intel_crtc
*crtc,
> break; > case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+
*/
> if (HAS_4TILE(dev_priv)) { > - if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE) > + u32 rc_mask =
PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
> + > + PLANE_CTL_CLEAR_COLOR_DISABLE; > + > + if ((val & rc_mask) == rc_mask) > fb->modifier =
I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
> else if (val &
PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
> fb->modifier = > I915_FORMAT_MOD_4_TILED_DG2_MC_CCS; > + else if (val &
PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
> + fb->modifier = > + I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC; > else > fb->modifier = I915_FORMAT_MOD_4_TILED; > } else { > diff --git a/include/uapi/drm/drm_fourcc.h > b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84 > 100644 > --- a/include/uapi/drm/drm_fourcc.h > +++ b/include/uapi/drm/drm_fourcc.h > @@ -605,6 +605,16 @@ extern "C" { > */ > #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS
fourcc_mod_code(INTEL,
> 11) > > +/* > + * Intel color control surfaces (CCS) for DG2 clear color render
compression.
> + * > + * DG2 uses a unified compression format for clear color render
compression.
What's unified about DG2's compression format? If this doesn't affect the layout, maybe we should drop this sentence.
> + * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout. > + *
This also needs a pitch aligned to four tiles, right? I think we can save some effort by referencing the DG2_RC_CCS modifier here.
> + * Fast clear color value expected by HW is located in fb at > + offset > + 0 of plane#1
Why is the expected offset hardcoded to 0 instead of relying on the offset provided by the modifier API? This looks like a bug.
Hi Nanley,
can you elaborate a bit, which offset from modifier API that applies to cc
surface?
Hi Juha-Pekka,
On the kernel-side of things, I'm thinking of
drm_mode_fb_cmd2::offsets[1].
Hi Nanley,
this offset is coming from userspace on creation of framebuffer, at that moment from userspace caller can point to offset of desire. Normally offset[0] is set at 0 and then offset[n] at plane n start which is not stated to have to be exactly after plane n-1 end. Or did I
misunderstand what you meant?
Perhaps, at least, I'm not sure what you're meaning to say. This modifier description seems to say that the drm_mode_fb_cmd2::offsets value for the clear color plane must be zero. Are you saying that it's correct? This doesn't match the GEN12_RC_CCS_CC behavior and doesn't
match mesa's expectations.
It doesn't say "drm_mode_fb_cmd2::offsets value for the clear color plane must be zero", it says "Fast clear color value expected by HW is located in fb at offset 0 of plane#1".
Yes, it doesn't say that exactly, but that's what it seems to say. With every other modifier, it's implied that the data for the plane begins at the offset specified through the modifier API. So, explicitly mentioning it here (and with that wording) conveys a new requirement.
Plane#1 location is pointed by drm_mode_fb_cmd2::offsets[1] and there's nothing stated about that offset.
Technically, plane #1's location is specified to be the combination of ::handles[1] and ::offsets[1]. In practice though, I can imagine that there are areas of the stack that are implicitly requiring that all ::handles[] entries match.
These offsets are just offsets to bo which contain the framebuffer information hence drm_mode_fb_cmd2::offsets[1] can be changed as one wish and cc information is found starting at drm_mode_fb_cmd2::offsets[1][0]
If the clear color handling is the same as GEN12_RC_CCS_CC (apart for the plane index), I propose that we drop this sentence due to avoid any confusion.
This offset discussion raises another question. The description says that the value expected by HW is at offset 0. I'm assuming "HW" is referring to the render engine? The kernel is still giving the display engine the packed values at ::offsets[1] + 16B right?
-Nanley
/Juha-Pekka
For cc plane this offset likely will not be zero anyway and caller can move it as see fit to have cc plane (plane[1]) location[0] at place where
wanted to have it.
/Juha-Pekka
We should probably give some info about the relevant fields in the fast clear plane (like what's done in the GEN12_RC_CCS_CC modifier).
agree, that's totally missing here.
/Juha-Pekka
> + */ > +#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC
fourcc_mod_code(INTEL,
> +12) > + > /* > * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks > * > -- > 2.20.1 >
On 15.2.2022 20.24, Chery, Nanley G wrote:
-----Original Message----- From: Juha-Pekka Heikkila juhapekka.heikkila@gmail.com Sent: Tuesday, February 15, 2022 9:32 AM To: Chery, Nanley G nanley.g.chery@intel.com; Nanley Chery nanleychery@gmail.com; C, Ramalingam ramalingam.c@intel.com Cc: intel-gfx intel-gfx@lists.freedesktop.org; Auld, Matthew matthew.auld@intel.com; dri-devel dri-devel@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
On 15.2.2022 18.44, Chery, Nanley G wrote:
-----Original Message----- From: Juha-Pekka Heikkila juhapekka.heikkila@gmail.com Sent: Tuesday, February 15, 2022 8:15 AM To: Chery, Nanley G nanley.g.chery@intel.com; Nanley Chery nanleychery@gmail.com; C, Ramalingam ramalingam.c@intel.com Cc: intel-gfx intel-gfx@lists.freedesktop.org; Auld, Matthew matthew.auld@intel.com; dri-devel dri-devel@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
On 15.2.2022 17.02, Chery, Nanley G wrote:
-----Original Message----- From: Juha-Pekka Heikkila juhapekka.heikkila@gmail.com Sent: Tuesday, February 15, 2022 6:56 AM To: Nanley Chery nanleychery@gmail.com; C, Ramalingam ramalingam.c@intel.com Cc: intel-gfx intel-gfx@lists.freedesktop.org; Chery, Nanley G nanley.g.chery@intel.com; Auld, Matthew matthew.auld@intel.com; dri- devel dri-devel@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
On 12.2.2022 3.19, Nanley Chery wrote: > On Tue, Feb 1, 2022 at 2:42 AM Ramalingam C > ramalingam.c@intel.com wrote: >> >> From: Mika Kahola mika.kahola@intel.com >> >> DG2 clear color render compression uses Tile4 layout. Therefore, >> we need to define a new format modifier for uAPI to support clear >> color rendering. >> >> v2: >> Display version is fixed. [Imre] >> KDoc is enhanced for cc modifier. [Nanley & Lionel] >> >> Signed-off-by: Mika Kahola mika.kahola@intel.com >> cc: Anshuman Gupta anshuman.gupta@intel.com >> Signed-off-by: Juha-Pekka Heikkilä >> juha-pekka.heikkila@intel.com >> Signed-off-by: Ramalingam C ramalingam.c@intel.com >> --- >> drivers/gpu/drm/i915/display/intel_fb.c | 8 ++++++++ >> drivers/gpu/drm/i915/display/skl_universal_plane.c | 9 ++++++++- >> include/uapi/drm/drm_fourcc.h | 10 ++++++++++ >> 3 files changed, 26 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/i915/display/intel_fb.c >> b/drivers/gpu/drm/i915/display/intel_fb.c >> index 4d4d01963f15..3df6ef5ffec5 100644 >> --- a/drivers/gpu/drm/i915/display/intel_fb.c >> +++ b/drivers/gpu/drm/i915/display/intel_fb.c >> @@ -144,6 +144,12 @@ static const struct intel_modifier_desc intel_modifiers[] = { >> .modifier = I915_FORMAT_MOD_4_TILED_DG2_MC_CCS, >> .display_ver = { 13, 13 }, >> .plane_caps = INTEL_PLANE_CAP_TILING_4 | >> INTEL_PLANE_CAP_CCS_MC, >> + }, { >> + .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC, >> + .display_ver = { 13, 13 }, >> + .plane_caps = INTEL_PLANE_CAP_TILING_4 | >> + INTEL_PLANE_CAP_CCS_RC_CC, >> + >> + .ccs.cc_planes = BIT(1), >> }, { >> .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS, >> .display_ver = { 13, 13 }, @@ -559,6 +565,7 @@ >> intel_tile_width_bytes(const struct drm_framebuffer *fb, int
color_plane)
>> else >> return 512; >> case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS: >> + case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: >> case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: >> case I915_FORMAT_MOD_4_TILED: >> /* >> @@ -763,6 +770,7 @@ unsigned int intel_surf_alignment(const >> struct drm_framebuffer *fb, >> case I915_FORMAT_MOD_Yf_TILED: >> return 1 * 1024 * 1024; >> case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS: >> + case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: >> case I915_FORMAT_MOD_4_TILED_DG2_MC_CCS: >> return 16 * 1024; >> default: >> diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c >> b/drivers/gpu/drm/i915/display/skl_universal_plane.c >> index c38ae0876c15..b4dced1907c5 100644 >> --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c >> +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c >> @@ -772,6 +772,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier) >> return PLANE_CTL_TILED_4 | >> PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE | >> PLANE_CTL_CLEAR_COLOR_DISABLE; >> + case I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC: >> + return PLANE_CTL_TILED_4 | >> + PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; >> case I915_FORMAT_MOD_Y_TILED_CCS: >> case I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC: >> return PLANE_CTL_TILED_Y | >> PLANE_CTL_RENDER_DECOMPRESSION_ENABLE; >> @@ -2358,10 +2360,15 @@ skl_get_initial_plane_config(struct >> intel_crtc *crtc, >> break; >> case PLANE_CTL_TILED_YF: /* aka PLANE_CTL_TILED_4 on XE_LPD+
*/
>> if (HAS_4TILE(dev_priv)) { >> - if (val & PLANE_CTL_RENDER_DECOMPRESSION_ENABLE) >> + u32 rc_mask =
PLANE_CTL_RENDER_DECOMPRESSION_ENABLE |
>> + >> + PLANE_CTL_CLEAR_COLOR_DISABLE; >> + >> + if ((val & rc_mask) == rc_mask) >> fb->modifier =
I915_FORMAT_MOD_4_TILED_DG2_RC_CCS;
>> else if (val &
PLANE_CTL_MEDIA_DECOMPRESSION_ENABLE)
>> fb->modifier = >> I915_FORMAT_MOD_4_TILED_DG2_MC_CCS; >> + else if (val &
PLANE_CTL_RENDER_DECOMPRESSION_ENABLE)
>> + fb->modifier = >> + I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC; >> else >> fb->modifier = I915_FORMAT_MOD_4_TILED; >> } else { >> diff --git a/include/uapi/drm/drm_fourcc.h >> b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84 >> 100644 >> --- a/include/uapi/drm/drm_fourcc.h >> +++ b/include/uapi/drm/drm_fourcc.h >> @@ -605,6 +605,16 @@ extern "C" { >> */ >> #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, >> 11) >> >> +/* >> + * Intel color control surfaces (CCS) for DG2 clear color render
compression.
>> + * >> + * DG2 uses a unified compression format for clear color render compression. > > What's unified about DG2's compression format? If this doesn't > affect the layout, maybe we should drop this sentence. > >> + * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout. >> + * > > This also needs a pitch aligned to four tiles, right? I think we > can save some effort by referencing the DG2_RC_CCS modifier here. > >> + * Fast clear color value expected by HW is located in fb at >> + offset >> + 0 of plane#1 > > Why is the expected offset hardcoded to 0 instead of relying on > the offset provided by the modifier API? This looks like a bug.
Hi Nanley,
can you elaborate a bit, which offset from modifier API that applies to cc
surface?
Hi Juha-Pekka,
On the kernel-side of things, I'm thinking of
drm_mode_fb_cmd2::offsets[1].
Hi Nanley,
this offset is coming from userspace on creation of framebuffer, at that moment from userspace caller can point to offset of desire. Normally offset[0] is set at 0 and then offset[n] at plane n start which is not stated to have to be exactly after plane n-1 end. Or did I
misunderstand what you meant?
Perhaps, at least, I'm not sure what you're meaning to say. This modifier description seems to say that the drm_mode_fb_cmd2::offsets value for the clear color plane must be zero. Are you saying that it's correct? This doesn't match the GEN12_RC_CCS_CC behavior and doesn't
match mesa's expectations.
It doesn't say "drm_mode_fb_cmd2::offsets value for the clear color plane must be zero", it says "Fast clear color value expected by HW is located in fb at offset 0 of plane#1".
Yes, it doesn't say that exactly, but that's what it seems to say. With every other modifier, it's implied that the data for the plane begins at the offset specified through the modifier API. So, explicitly mentioning it here (and with that wording) conveys a new requirement.
I don't have objections on changing this description but for reference gen12 version of the same says "The main surface is Y-tiled and is at plane index 0 whereas CCS is linear and at index 1. The clear color is stored at index 2, and the pitch should be ignored.", only plane indexes are mentioned. I anyway wrote neither of these descriptions.
Plane#1 location is pointed by drm_mode_fb_cmd2::offsets[1] and there's nothing stated about that offset.
Technically, plane #1's location is specified to be the combination of ::handles[1] and ::offsets[1]. In practice though, I can imagine that there are areas of the stack that are implicitly requiring that all ::handles[] entries match.
I didn't think we needed to go deeper as you started to just talk about how drm_mode_fb_cmd2::offsets[1] not being used. Let's not waste time.
These offsets are just offsets to bo which contain the framebuffer information hence drm_mode_fb_cmd2::offsets[1] can be changed as one wish and cc information is found starting at drm_mode_fb_cmd2::offsets[1][0]
If the clear color handling is the same as GEN12_RC_CCS_CC (apart for the plane index), I propose that we drop this sentence due to avoid any confusion.
But it need to defined as part of the modifier. It's the modifier features which are being described here.
This offset discussion raises another question. The description says that the value expected by HW is at offset 0. I'm assuming "HW" is referring to the render engine? The kernel is still giving the display engine the packed values at ::offsets[1] + 16B right?
Generally answer is yes but these parts you can see in patch "[PATCH v5 17/19] drm/i915/dg2: Flat CCS Support" and should be discussed there. Here "HW" should probably be changed something meaningful though.
/Juha-Pekka
For cc plane this offset likely will not be zero anyway and caller can move it as see fit to have cc plane (plane[1]) location[0] at place where
wanted to have it.
/Juha-Pekka
> > We should probably give some info about the relevant fields in the > fast clear plane (like what's done in the GEN12_RC_CCS_CC modifier).
agree, that's totally missing here.
/Juha-Pekka
> >> + */ >> +#define I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC fourcc_mod_code(INTEL, >> +12) >> + >> /* >> * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks >> * >> -- >> 2.20.1 >>
Hi Nanley, JP,
On Tue, Feb 15, 2022 at 09:34:22PM +0200, Juha-Pekka Heikkila wrote:
[...]
> > > diff --git a/include/uapi/drm/drm_fourcc.h > > > b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84 100644 > > > --- a/include/uapi/drm/drm_fourcc.h > > > +++ b/include/uapi/drm/drm_fourcc.h > > > @@ -605,6 +605,16 @@ extern "C" { > > > */ > > > #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11) > > > > > > +/* > > > + * Intel color control surfaces (CCS) for DG2 clear color render compression. > > > + * > > > + * DG2 uses a unified compression format for clear color render compression. > > > > What's unified about DG2's compression format? If this doesn't > > affect the layout, maybe we should drop this sentence.
Unified here probably refers to the fact the DG2 render engine is capable of generating both a render and a media compressed surface as opposed to earlier platforms. The display engine still needs to know which compression format the FB uses, hence we need both an RC and MC modifier. Based on this I also think we can drop the mention of unified compression.
> > > + * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout. > > > + * > > > > This also needs a pitch aligned to four tiles, right? I think we > > can save some effort by referencing the DG2_RC_CCS modifier here. > > > > > + * Fast clear color value expected by HW is located in fb at offset 0 of plane#1 > > > > Why is the expected offset hardcoded to 0 instead of relying on > > the offset provided by the modifier API? This looks like a bug. > > Hi Nanley, > > can you elaborate a bit, which offset from modifier API that > applies to cc surface?
Hi Juha-Pekka,
On the kernel-side of things, I'm thinking of drm_mode_fb_cmd2::offsets[1].
Hi Nanley,
this offset is coming from userspace on creation of framebuffer, at that moment from userspace caller can point to offset of desire. Normally offset[0] is set at 0 and then offset[n] at plane n start which is not stated to have to be exactly after plane n-1 end. Or did I misunderstand what you meant?
Perhaps, at least, I'm not sure what you're meaning to say. This modifier description seems to say that the drm_mode_fb_cmd2::offsets value for the clear color plane must be zero. Are you saying that it's correct? This doesn't match the GEN12_RC_CCS_CC behavior and doesn't match mesa's expectations.
It doesn't say "drm_mode_fb_cmd2::offsets value for the clear color plane must be zero", it says "Fast clear color value expected by HW is located in fb at offset 0 of plane#1".
Yes, it doesn't say that exactly, but that's what it seems to say. With every other modifier, it's implied that the data for the plane begins at the offset specified through the modifier API. So, explicitly mentioning it here (and with that wording) conveys a new requirement.
I don't have objections on changing this description but for reference gen12 version of the same says "The main surface is Y-tiled and is at plane index 0 whereas CCS is linear and at index 1. The clear color is stored at index 2, and the pitch should be ignored.", only plane indexes are mentioned. I anyway wrote neither of these descriptions.
Plane#1 location is pointed by drm_mode_fb_cmd2::offsets[1] and there's nothing stated about that offset.
Technically, plane #1's location is specified to be the combination of ::handles[1] and ::offsets[1]. In practice though, I can imagine that there are areas of the stack that are implicitly requiring that all ::handles[] entries match.
The FB modifier API requires all ::handles[] to match, that is all planes must be contained in one GEM object.
I didn't think we needed to go deeper as you started to just talk about how drm_mode_fb_cmd2::offsets[1] not being used. Let's not waste time.
These offsets are just offsets to bo which contain the framebuffer information hence drm_mode_fb_cmd2::offsets[1] can be changed as one wish and cc information is found starting at drm_mode_fb_cmd2::offsets[1][0]
If the clear color handling is the same as GEN12_RC_CCS_CC (apart for the plane index), I propose that we drop this sentence due to avoid any confusion.
But it need to defined as part of the modifier. It's the modifier features which are being described here.
This offset discussion raises another question. The description says that the value expected by HW is at offset 0. I'm assuming "HW" is referring to the render engine? The kernel is still giving the display engine the packed values at ::offsets[1] + 16B right?
Generally answer is yes but these parts you can see in patch "[PATCH v5 17/19] drm/i915/dg2: Flat CCS Support" and should be discussed there. Here "HW" should probably be changed something meaningful though.
The 256 bit clear color format starting at plane index 1 matches the one described at I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC . So yes, "HW" refers to the render engine and display consumes the 64 bit data at ::offset[1] + 16 bytes (and DE ignores the 64 bit data starting at ::offset[1] + 24 bytes.
The following captures all the above, would it be ok?:
Intel Color Control Surface with Clear Color (CCS) for DG2 render compression.
The main surface is Tile 4 and at plane index 0. The CCS data is stored outside of the GEM object in a reserved memory area dedicated for the storage of the CCS data from all GEM objects. The main surface pitch is required to be a multiple of four Tile 4 widths. The clear color is stored at plane index 1 and the pitch should be ignored. The format of the 256 bits clear color data matches the one used for the I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC modifier, see its description for details.
--Imre
-----Original Message----- From: Deak, Imre imre.deak@intel.com Sent: Monday, March 21, 2022 6:20 AM To: Chery, Nanley G nanley.g.chery@intel.com; Juha-Pekka Heikkila juhapekka.heikkila@gmail.com Cc: Nanley Chery nanleychery@gmail.com; C, Ramalingam ramalingam.c@intel.com; intel-gfx intel-gfx@lists.freedesktop.org; Auld, Matthew matthew.auld@intel.com; dri-devel dri-devel@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
Hi Nanley, JP,
On Tue, Feb 15, 2022 at 09:34:22PM +0200, Juha-Pekka Heikkila wrote:
[...]
> > > > diff --git a/include/uapi/drm/drm_fourcc.h > > > > b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84 100644 > > > > --- a/include/uapi/drm/drm_fourcc.h > > > > +++ b/include/uapi/drm/drm_fourcc.h > > > > @@ -605,6 +605,16 @@ extern "C" { > > > > */ > > > > #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11) > > > > > > > > +/* > > > > + * Intel color control surfaces (CCS) for DG2 clear color render compression. > > > > + * > > > > + * DG2 uses a unified compression format for clear color render compression. > > > > > > What's unified about DG2's compression format? If this doesn't > > > affect the layout, maybe we should drop this sentence.
Unified here probably refers to the fact the DG2 render engine is capable of generating both a render and a media compressed surface as opposed to earlier platforms. The display engine still needs to know which compression format the FB uses, hence we need both an RC and MC modifier. Based on this I also think we can drop the mention of unified compression.
> > > > + * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout. > > > > + * > > > > > > This also needs a pitch aligned to four tiles, right? I think we > > > can save some effort by referencing the DG2_RC_CCS modifier here. > > > > > > > + * Fast clear color value expected by HW is located in fb at offset 0 of plane#1 > > > > > > Why is the expected offset hardcoded to 0 instead of relying on > > > the offset provided by the modifier API? This looks like a bug. > > > > Hi Nanley, > > > > can you elaborate a bit, which offset from modifier API that > > applies to cc surface? > > Hi Juha-Pekka, > > On the kernel-side of things, I'm thinking of drm_mode_fb_cmd2::offsets[1].
Hi Nanley,
this offset is coming from userspace on creation of framebuffer, at that moment from userspace caller can point to offset of desire. Normally offset[0] is set at 0 and then offset[n] at plane n start which is not stated to have to be exactly after plane n-1 end. Or did I misunderstand what you meant?
Perhaps, at least, I'm not sure what you're meaning to say. This modifier description seems to say that the drm_mode_fb_cmd2::offsets value for the clear color plane must be zero. Are you saying that it's correct? This doesn't match the GEN12_RC_CCS_CC behavior and doesn't match mesa's expectations.
It doesn't say "drm_mode_fb_cmd2::offsets value for the clear color plane must be zero", it says "Fast clear color value expected by HW is located in fb at offset 0 of plane#1".
Yes, it doesn't say that exactly, but that's what it seems to say. With every other modifier, it's implied that the data for the plane begins at the offset specified through the modifier API. So, explicitly mentioning it here (and with that wording) conveys a new requirement.
I don't have objections on changing this description but for reference gen12 version of the same says "The main surface is Y-tiled and is at plane index 0 whereas CCS is linear and at index 1. The clear color is stored at index 2, and the pitch should be ignored.", only plane indexes are mentioned. I anyway wrote neither of these descriptions.
Plane#1 location is pointed by drm_mode_fb_cmd2::offsets[1] and there's nothing stated about that offset.
Technically, plane #1's location is specified to be the combination of ::handles[1] and ::offsets[1]. In practice though, I can imagine that there are areas of the stack that are implicitly requiring that all ::handles[] entries match.
The FB modifier API requires all ::handles[] to match, that is all planes must be contained in one GEM object.
This is a requirement for i915, or for all drm drivers? I couldn't find anything in the generic DRM headers or docs requiring this. Feel free to ping me about this offline.
I didn't think we needed to go deeper as you started to just talk about how drm_mode_fb_cmd2::offsets[1] not being used. Let's not waste time.
These offsets are just offsets to bo which contain the framebuffer information hence drm_mode_fb_cmd2::offsets[1] can be changed as one wish and cc information is found starting at drm_mode_fb_cmd2::offsets[1][0]
If the clear color handling is the same as GEN12_RC_CCS_CC (apart for the plane index), I propose that we drop this sentence due to avoid any confusion.
But it need to defined as part of the modifier. It's the modifier features which are being described here.
This offset discussion raises another question. The description says that the value expected by HW is at offset 0. I'm assuming "HW" is referring to the render engine? The kernel is still giving the display engine the packed values at ::offsets[1] + 16B right?
Generally answer is yes but these parts you can see in patch "[PATCH v5 17/19] drm/i915/dg2: Flat CCS Support" and should be discussed there. Here "HW" should probably be changed something meaningful though.
The 256 bit clear color format starting at plane index 1 matches the one described at I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC . So yes, "HW" refers to the render engine and display consumes the 64 bit data at ::offset[1] + 16 bytes (and DE ignores the 64 bit data starting at ::offset[1] + 24 bytes.
The following captures all the above, would it be ok?:
Intel Color Control Surface with Clear Color (CCS) for DG2 render compression.
The main surface is Tile 4 and at plane index 0. The CCS data is stored outside of the GEM object in a reserved memory area dedicated for the storage of the CCS data from all GEM objects. The main surface pitch is
^ "for all compressible" ? (since SMEM objects don't have this)
required to be a multiple of four Tile 4 widths. The clear color is stored at plane index 1 and the pitch should be ignored. The format of the 256 bits clear color data matches the one used for the
^ "256 bits of"?
I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC modifier, see its description for details.
Looks good to me. With the above minor changes, Acked-by: Nanley Chery nanley.g.chery@intel.com
--Imre
On Thu, Mar 24, 2022 at 01:42:33AM +0200, Chery, Nanley G wrote:
-----Original Message----- From: Deak, Imre imre.deak@intel.com Sent: Monday, March 21, 2022 6:20 AM To: Chery, Nanley G nanley.g.chery@intel.com; Juha-Pekka Heikkila juhapekka.heikkila@gmail.com Cc: Nanley Chery nanleychery@gmail.com; C, Ramalingam ramalingam.c@intel.com; intel-gfx intel-gfx@lists.freedesktop.org; Auld, Matthew matthew.auld@intel.com; dri-devel dri-devel@lists.freedesktop.org Subject: Re: [Intel-gfx] [PATCH v5 16/19] uapi/drm/dg2: Introduce format modifier for DG2 clear color
Hi Nanley, JP,
On Tue, Feb 15, 2022 at 09:34:22PM +0200, Juha-Pekka Heikkila wrote:
[...]
> > > > > diff --git a/include/uapi/drm/drm_fourcc.h > > > > > b/include/uapi/drm/drm_fourcc.h index b8fb7b44c03c..697614ea4b84 100644 > > > > > --- a/include/uapi/drm/drm_fourcc.h > > > > > +++ b/include/uapi/drm/drm_fourcc.h > > > > > @@ -605,6 +605,16 @@ extern "C" { > > > > > */ > > > > > #define I915_FORMAT_MOD_4_TILED_DG2_MC_CCS fourcc_mod_code(INTEL, 11) > > > > > > > > > > +/* > > > > > + * Intel color control surfaces (CCS) for DG2 clear color render compression. > > > > > + * > > > > > + * DG2 uses a unified compression format for clear color render compression. > > > > > > > > What's unified about DG2's compression format? If this doesn't > > > > affect the layout, maybe we should drop this sentence.
Unified here probably refers to the fact the DG2 render engine is capable of generating both a render and a media compressed surface as opposed to earlier platforms. The display engine still needs to know which compression format the FB uses, hence we need both an RC and MC modifier. Based on this I also think we can drop the mention of unified compression.
> > > > > + * The general layout is a tiled layout using 4Kb tiles i.e. Tile4 layout. > > > > > + * > > > > > > > > This also needs a pitch aligned to four tiles, right? I think we > > > > can save some effort by referencing the DG2_RC_CCS modifier here. > > > > > > > > > + * Fast clear color value expected by HW is located in fb at offset 0 of plane#1 > > > > > > > > Why is the expected offset hardcoded to 0 instead of relying on > > > > the offset provided by the modifier API? This looks like a bug. > > > > > > Hi Nanley, > > > > > > can you elaborate a bit, which offset from modifier API that > > > applies to cc surface? > > > > Hi Juha-Pekka, > > > > On the kernel-side of things, I'm thinking of drm_mode_fb_cmd2::offsets[1]. > > Hi Nanley, > > this offset is coming from userspace on creation of framebuffer, at > that moment from userspace caller can point to offset of desire. > Normally offset[0] is set at 0 and then offset[n] at plane n start > which is not stated to have to be exactly after plane n-1 end. Or did I > misunderstand what you meant?
Perhaps, at least, I'm not sure what you're meaning to say. This modifier description seems to say that the drm_mode_fb_cmd2::offsets value for the clear color plane must be zero. Are you saying that it's correct? This doesn't match the GEN12_RC_CCS_CC behavior and doesn't match mesa's expectations.
It doesn't say "drm_mode_fb_cmd2::offsets value for the clear color plane must be zero", it says "Fast clear color value expected by HW is located in fb at offset 0 of plane#1".
Yes, it doesn't say that exactly, but that's what it seems to say. With every other modifier, it's implied that the data for the plane begins at the offset specified through the modifier API. So, explicitly mentioning it here (and with that wording) conveys a new requirement.
I don't have objections on changing this description but for reference gen12 version of the same says "The main surface is Y-tiled and is at plane index 0 whereas CCS is linear and at index 1. The clear color is stored at index 2, and the pitch should be ignored.", only plane indexes are mentioned. I anyway wrote neither of these descriptions.
Plane#1 location is pointed by drm_mode_fb_cmd2::offsets[1] and there's nothing stated about that offset.
Technically, plane #1's location is specified to be the combination of ::handles[1] and ::offsets[1]. In practice though, I can imagine that there are areas of the stack that are implicitly requiring that all ::handles[] entries match.
The FB modifier API requires all ::handles[] to match, that is all planes must be contained in one GEM object.
This is a requirement for i915, or for all drm drivers? I couldn't find anything in the generic DRM headers or docs requiring this. Feel free to ping me about this offline.
It's only an i915 requirement actually.
IIUC, it was added for the SKL CCS modifiers (2e2adb05736c3), where SKL has a restriction on the location of the CCS (and UV) planes. The feasible way to conform to these limits was to require all planes to reside in the same GEM object.
For DG2 there was no plan to expose the CCS plane, so wrt. that this restriction didn't make a difference.
I didn't think we needed to go deeper as you started to just talk about how drm_mode_fb_cmd2::offsets[1] not being used. Let's not waste time.
These offsets are just offsets to bo which contain the framebuffer information hence drm_mode_fb_cmd2::offsets[1] can be changed as one wish and cc information is found starting at drm_mode_fb_cmd2::offsets[1][0]
If the clear color handling is the same as GEN12_RC_CCS_CC (apart for the plane index), I propose that we drop this sentence due to avoid any confusion.
But it need to defined as part of the modifier. It's the modifier features which are being described here.
This offset discussion raises another question. The description says that the value expected by HW is at offset 0. I'm assuming "HW" is referring to the render engine? The kernel is still giving the display engine the packed values at ::offsets[1] + 16B right?
Generally answer is yes but these parts you can see in patch "[PATCH v5 17/19] drm/i915/dg2: Flat CCS Support" and should be discussed there. Here "HW" should probably be changed something meaningful though.
The 256 bit clear color format starting at plane index 1 matches the one described at I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC . So yes, "HW" refers to the render engine and display consumes the 64 bit data at ::offset[1] + 16 bytes (and DE ignores the 64 bit data starting at ::offset[1] + 24 bytes.
The following captures all the above, would it be ok?:
Intel Color Control Surface with Clear Color (CCS) for DG2 render compression.
The main surface is Tile 4 and at plane index 0. The CCS data is stored outside of the GEM object in a reserved memory area dedicated for the storage of the CCS data from all GEM objects. The main surface pitch is
^ "for all compressible" ? (since SMEM objects don't have this)
Makes sense, optionally also mentioning that "for all RC/RC_CC/MC CCS compressible GEM objects".
required to be a multiple of four Tile 4 widths. The clear color is stored at plane index 1 and the pitch should be ignored. The format of the 256 bits clear color data matches the one used for the
^ "256 bits of"?
Ok.
I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS_CC modifier, see its description for details.
Looks good to me. With the above minor changes, Acked-by: Nanley Chery nanley.g.chery@intel.com
Thanks.
--Imre
From: Anshuman Gupta anshuman.gupta@intel.com
DG2 onwards discrete gfx has support for new flat CCS mapping, which brings in display feature in to avoid Aux walk for compressed surface. This support build on top of Flat CCS support added in XEHPSDV. FLAT CCS surface base address should be 64k aligned, Compressed displayable surfaces must use tile4 format.
HAS: 1407880786 B.Spec : 7655 B.Spec : 53902
Cc: Mika Kahola mika.kahola@intel.com Signed-off-by: Anshuman Gupta anshuman.gupta@intel.com Signed-off-by: Juha-Pekka Heikkilä juha-pekka.heikkila@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com --- drivers/gpu/drm/i915/display/intel_display.c | 4 ++- drivers/gpu/drm/i915/display/intel_fb.c | 32 +++++++++++++------ .../drm/i915/display/skl_universal_plane.c | 16 ++++++---- 3 files changed, 36 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index 189767cef356..2828ae612179 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -8588,7 +8588,9 @@ static void intel_atomic_prepare_plane_clear_colors(struct intel_atomic_state *s
/* * The layout of the fast clear color value expected by HW - * (the DRM ABI requiring this value to be located in fb at offset 0 of plane#2): + * (the DRM ABI requiring this value to be located in fb at + * offset 0 of cc plane, plane #2 previous generations or + * plane #1 for flat ccs): * - 4 x 4 bytes per-channel value * (in surface type specific float/int format provided by the fb user) * - 8 bytes native color value used by the display diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c index 3df6ef5ffec5..e94923e9dbb1 100644 --- a/drivers/gpu/drm/i915/display/intel_fb.c +++ b/drivers/gpu/drm/i915/display/intel_fb.c @@ -107,6 +107,21 @@ static const struct drm_format_info gen12_ccs_cc_formats[] = { .hsub = 1, .vsub = 1, .has_alpha = true }, };
+static const struct drm_format_info gen12_flat_ccs_cc_formats[] = { + { .format = DRM_FORMAT_XRGB8888, .depth = 24, .num_planes = 2, + .char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 }, + .hsub = 1, .vsub = 1, }, + { .format = DRM_FORMAT_XBGR8888, .depth = 24, .num_planes = 2, + .char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 }, + .hsub = 1, .vsub = 1, }, + { .format = DRM_FORMAT_ARGB8888, .depth = 32, .num_planes = 2, + .char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 }, + .hsub = 1, .vsub = 1, .has_alpha = true }, + { .format = DRM_FORMAT_ABGR8888, .depth = 32, .num_planes = 2, + .char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 }, + .hsub = 1, .vsub = 1, .has_alpha = true }, +}; + struct intel_modifier_desc { u64 modifier; struct { @@ -150,6 +165,8 @@ static const struct intel_modifier_desc intel_modifiers[] = { .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC_CC,
.ccs.cc_planes = BIT(1), + + FORMAT_OVERRIDE(gen12_flat_ccs_cc_formats), }, { .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS, .display_ver = { 13, 13 }, @@ -399,17 +416,13 @@ bool intel_fb_plane_supports_modifier(struct intel_plane *plane, u64 modifier) static bool format_is_yuv_semiplanar(const struct intel_modifier_desc *md, const struct drm_format_info *info) { - int yuv_planes; - if (!info->is_yuv) return false;
- if (plane_caps_contain_any(md->plane_caps, INTEL_PLANE_CAP_CCS_MASK)) - yuv_planes = 4; + if (hweight8(md->ccs.planar_aux_planes) == 2) + return info->num_planes == 4; else - yuv_planes = 2; - - return info->num_planes == yuv_planes; + return info->num_planes == 2; }
/** @@ -534,12 +547,13 @@ static unsigned int gen12_ccs_aux_stride(struct intel_framebuffer *fb, int ccs_p
int skl_main_to_aux_plane(const struct drm_framebuffer *fb, int main_plane) { + const struct intel_modifier_desc *md = lookup_modifier(fb->modifier); struct drm_i915_private *i915 = to_i915(fb->dev);
- if (intel_fb_is_ccs_modifier(fb->modifier)) + if (md->ccs.packed_aux_planes | md->ccs.planar_aux_planes) return main_to_ccs_plane(fb, main_plane); else if (DISPLAY_VER(i915) < 11 && - intel_format_info_is_yuv_semiplanar(fb->format, fb->modifier)) + format_is_yuv_semiplanar(md, fb->format)) return 1; else return 0; diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c index b4dced1907c5..18e50583abaa 100644 --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c @@ -1176,8 +1176,10 @@ skl_plane_update_arm(struct intel_plane *plane, intel_de_write_fw(dev_priv, PLANE_OFFSET(pipe, plane_id), PLANE_OFFSET_Y(y) | PLANE_OFFSET_X(x));
- intel_de_write_fw(dev_priv, PLANE_AUX_DIST(pipe, plane_id), - skl_plane_aux_dist(plane_state, color_plane)); + /* FLAT CCS doesn't need to program AUX_DIST */ + if (!HAS_FLAT_CCS(dev_priv)) + intel_de_write_fw(dev_priv, PLANE_AUX_DIST(pipe, plane_id), + skl_plane_aux_dist(plane_state, color_plane));
if (DISPLAY_VER(dev_priv) < 11) intel_de_write_fw(dev_priv, PLANE_AUX_OFFSET(pipe, plane_id), @@ -1557,9 +1559,10 @@ static int skl_check_main_surface(struct intel_plane_state *plane_state)
/* * CCS AUX surface doesn't have its own x/y offsets, we must make sure - * they match with the main surface x/y offsets. + * they match with the main surface x/y offsets. On DG2 + * there's no aux plane on fb so skip this checking. */ - if (intel_fb_is_ccs_modifier(fb->modifier)) { + if (intel_fb_is_ccs_modifier(fb->modifier) && aux_plane) { while (!skl_check_main_ccs_coordinates(plane_state, x, y, offset, aux_plane)) { if (offset == 0) @@ -1603,6 +1606,8 @@ static int skl_check_nv12_aux_surface(struct intel_plane_state *plane_state) const struct drm_framebuffer *fb = plane_state->hw.fb; unsigned int rotation = plane_state->hw.rotation; int uv_plane = 1; + int ccs_plane = intel_fb_is_ccs_modifier(fb->modifier) ? + skl_main_to_aux_plane(fb, uv_plane) : 0; int max_width = intel_plane_max_width(plane, fb, uv_plane, rotation); int max_height = intel_plane_max_height(plane, fb, uv_plane, rotation); int x = plane_state->uapi.src.x1 >> 17; @@ -1623,8 +1628,7 @@ static int skl_check_nv12_aux_surface(struct intel_plane_state *plane_state) offset = intel_plane_compute_aligned_offset(&x, &y, plane_state, uv_plane);
- if (intel_fb_is_ccs_modifier(fb->modifier)) { - int ccs_plane = main_to_ccs_plane(fb, uv_plane); + if (ccs_plane) { u32 aux_offset = plane_state->view.color_plane[ccs_plane].offset; u32 alignment = intel_surf_alignment(fb, uv_plane);
On Tue, Feb 01, 2022 at 04:11:30PM +0530, Ramalingam C wrote:
From: Anshuman Gupta anshuman.gupta@intel.com
DG2 onwards discrete gfx has support for new flat CCS mapping, which brings in display feature in to avoid Aux walk for compressed surface. This support build on top of Flat CCS support added in XEHPSDV. FLAT CCS surface base address should be 64k aligned, Compressed displayable surfaces must use tile4 format.
HAS: 1407880786 B.Spec : 7655 B.Spec : 53902
Cc: Mika Kahola mika.kahola@intel.com Signed-off-by: Anshuman Gupta anshuman.gupta@intel.com Signed-off-by: Juha-Pekka Heikkilä juha-pekka.heikkila@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com
Reviewed-by: Imre Deak imre.deak@intel.com
drivers/gpu/drm/i915/display/intel_display.c | 4 ++- drivers/gpu/drm/i915/display/intel_fb.c | 32 +++++++++++++------ .../drm/i915/display/skl_universal_plane.c | 16 ++++++---- 3 files changed, 36 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index 189767cef356..2828ae612179 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -8588,7 +8588,9 @@ static void intel_atomic_prepare_plane_clear_colors(struct intel_atomic_state *s
/* * The layout of the fast clear color value expected by HW
* (the DRM ABI requiring this value to be located in fb at offset 0 of plane#2):
* (the DRM ABI requiring this value to be located in fb at
* offset 0 of cc plane, plane #2 previous generations or
* plane #1 for flat ccs):
- 4 x 4 bytes per-channel value
- (in surface type specific float/int format provided by the fb user)
- 8 bytes native color value used by the display
diff --git a/drivers/gpu/drm/i915/display/intel_fb.c b/drivers/gpu/drm/i915/display/intel_fb.c index 3df6ef5ffec5..e94923e9dbb1 100644 --- a/drivers/gpu/drm/i915/display/intel_fb.c +++ b/drivers/gpu/drm/i915/display/intel_fb.c @@ -107,6 +107,21 @@ static const struct drm_format_info gen12_ccs_cc_formats[] = { .hsub = 1, .vsub = 1, .has_alpha = true }, };
+static const struct drm_format_info gen12_flat_ccs_cc_formats[] = {
- { .format = DRM_FORMAT_XRGB8888, .depth = 24, .num_planes = 2,
.char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 },
.hsub = 1, .vsub = 1, },
- { .format = DRM_FORMAT_XBGR8888, .depth = 24, .num_planes = 2,
.char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 },
.hsub = 1, .vsub = 1, },
- { .format = DRM_FORMAT_ARGB8888, .depth = 32, .num_planes = 2,
.char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 },
.hsub = 1, .vsub = 1, .has_alpha = true },
- { .format = DRM_FORMAT_ABGR8888, .depth = 32, .num_planes = 2,
.char_per_block = { 4, 0 }, .block_w = { 1, 2 }, .block_h = { 1, 1 },
.hsub = 1, .vsub = 1, .has_alpha = true },
+};
struct intel_modifier_desc { u64 modifier; struct { @@ -150,6 +165,8 @@ static const struct intel_modifier_desc intel_modifiers[] = { .plane_caps = INTEL_PLANE_CAP_TILING_4 | INTEL_PLANE_CAP_CCS_RC_CC,
.ccs.cc_planes = BIT(1),
}, { .modifier = I915_FORMAT_MOD_4_TILED_DG2_RC_CCS, .display_ver = { 13, 13 },FORMAT_OVERRIDE(gen12_flat_ccs_cc_formats),
@@ -399,17 +416,13 @@ bool intel_fb_plane_supports_modifier(struct intel_plane *plane, u64 modifier) static bool format_is_yuv_semiplanar(const struct intel_modifier_desc *md, const struct drm_format_info *info) {
int yuv_planes;
if (!info->is_yuv) return false;
if (plane_caps_contain_any(md->plane_caps, INTEL_PLANE_CAP_CCS_MASK))
yuv_planes = 4;
- if (hweight8(md->ccs.planar_aux_planes) == 2)
elsereturn info->num_planes == 4;
yuv_planes = 2;
- return info->num_planes == yuv_planes;
return info->num_planes == 2;
}
/** @@ -534,12 +547,13 @@ static unsigned int gen12_ccs_aux_stride(struct intel_framebuffer *fb, int ccs_p
int skl_main_to_aux_plane(const struct drm_framebuffer *fb, int main_plane) {
- const struct intel_modifier_desc *md = lookup_modifier(fb->modifier); struct drm_i915_private *i915 = to_i915(fb->dev);
- if (intel_fb_is_ccs_modifier(fb->modifier))
- if (md->ccs.packed_aux_planes | md->ccs.planar_aux_planes) return main_to_ccs_plane(fb, main_plane); else if (DISPLAY_VER(i915) < 11 &&
intel_format_info_is_yuv_semiplanar(fb->format, fb->modifier))
return 1; else return 0;format_is_yuv_semiplanar(md, fb->format))
diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c b/drivers/gpu/drm/i915/display/skl_universal_plane.c index b4dced1907c5..18e50583abaa 100644 --- a/drivers/gpu/drm/i915/display/skl_universal_plane.c +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c @@ -1176,8 +1176,10 @@ skl_plane_update_arm(struct intel_plane *plane, intel_de_write_fw(dev_priv, PLANE_OFFSET(pipe, plane_id), PLANE_OFFSET_Y(y) | PLANE_OFFSET_X(x));
- intel_de_write_fw(dev_priv, PLANE_AUX_DIST(pipe, plane_id),
skl_plane_aux_dist(plane_state, color_plane));
/* FLAT CCS doesn't need to program AUX_DIST */
if (!HAS_FLAT_CCS(dev_priv))
intel_de_write_fw(dev_priv, PLANE_AUX_DIST(pipe, plane_id),
skl_plane_aux_dist(plane_state, color_plane));
if (DISPLAY_VER(dev_priv) < 11) intel_de_write_fw(dev_priv, PLANE_AUX_OFFSET(pipe, plane_id),
@@ -1557,9 +1559,10 @@ static int skl_check_main_surface(struct intel_plane_state *plane_state)
/* * CCS AUX surface doesn't have its own x/y offsets, we must make sure
* they match with the main surface x/y offsets.
* they match with the main surface x/y offsets. On DG2
*/* there's no aux plane on fb so skip this checking.
- if (intel_fb_is_ccs_modifier(fb->modifier)) {
- if (intel_fb_is_ccs_modifier(fb->modifier) && aux_plane) { while (!skl_check_main_ccs_coordinates(plane_state, x, y, offset, aux_plane)) { if (offset == 0)
@@ -1603,6 +1606,8 @@ static int skl_check_nv12_aux_surface(struct intel_plane_state *plane_state) const struct drm_framebuffer *fb = plane_state->hw.fb; unsigned int rotation = plane_state->hw.rotation; int uv_plane = 1;
- int ccs_plane = intel_fb_is_ccs_modifier(fb->modifier) ?
int max_width = intel_plane_max_width(plane, fb, uv_plane, rotation); int max_height = intel_plane_max_height(plane, fb, uv_plane, rotation); int x = plane_state->uapi.src.x1 >> 17;skl_main_to_aux_plane(fb, uv_plane) : 0;
@@ -1623,8 +1628,7 @@ static int skl_check_nv12_aux_surface(struct intel_plane_state *plane_state) offset = intel_plane_compute_aligned_offset(&x, &y, plane_state, uv_plane);
- if (intel_fb_is_ccs_modifier(fb->modifier)) {
int ccs_plane = main_to_ccs_plane(fb, uv_plane);
- if (ccs_plane) { u32 aux_offset = plane_state->view.color_plane[ccs_plane].offset; u32 alignment = intel_surf_alignment(fb, uv_plane);
-- 2.20.1
Documents the Flat-CCS feature and kernel handling required along with modifiers used.
Signed-off-by: Ramalingam C ramalingam.c@intel.com cc: Simon Ser contact@emersion.fr cc: Pekka Paalanen ppaalanen@gmail.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: mesa-dev@lists.freedesktop.org Cc: Tony Ye tony.ye@intel.com Cc: Slawomir Milczarek slawomir.milczarek@intel.com --- drivers/gpu/drm/i915/gt/intel_migrate.c | 47 +++++++++++++++++++++++++ 1 file changed, 47 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index 3e1cf224cdf0..5bdab0b3c735 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -596,6 +596,53 @@ intel_context_migrate_copy(struct intel_context *ce, return err; }
+/** + * DOC: Flat-CCS - Memory compression for Local memory + * + * On Xe-HP and later devices, we use dedicated compression control state (CCS) + * stored in local memory for each surface, to support the 3D and media + * compression formats. + * + * The memory required for the CCS of the entire local memory is 1/256 of the + * local memory size. So before the kernel boot, the required memory is reserved + * for the CCS data and a secure register will be programmed with the CCS base + * address. + * + * Flat CCS data needs to be cleared when a lmem object is allocated. + * And CCS data can be copied in and out of CCS region through + * XY_CTRL_SURF_COPY_BLT. CPU can't access the CCS data directly. + * + * When we exaust the lmem, if the object's placements support smem, then we can + * directly decompress the compressed lmem object into smem and start using it + * from smem itself. + * + * But when we need to swapout the compressed lmem object into a smem region + * though objects' placement doesn't support smem, then we copy the lmem content + * as it is into smem region along with ccs data (using XY_CTRL_SURF_COPY_BLT). + * When the object is referred, lmem content will be swaped in along with + * restoration of the CCS data (using XY_CTRL_SURF_COPY_BLT) at corresponding + * location. + * + * + * Flat-CCS Modifiers for different compression formats + * ---------------------------------------------------- + * + * I915_FORMAT_MOD_F_TILED_DG2_RC_CCS - used to indicate the buffers of Flat CCS + * render compression formats. Though the general layout is same as + * I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS, new hashing/compression algorithm is + * used. Render compression uses 128 byte compression blocks + * + * I915_FORMAT_MOD_F_TILED_DG2_MC_CCS -used to indicate the buffers of Flat CCS + * media compression formats. Though the general layout is same as + * I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS, new hashing/compression algorithm is + * used. Media compression uses 256 byte compression blocks. + * + * I915_FORMAT_MOD_F_TILED_DG2_RC_CCS_CC - used to indicate the buffers of Flat + * CCS clear color render compression formats. Unified compression format for + * clear color render compression. The genral layout is a tiled layout using + * 4Kb tiles i.e Tile4 layout. + */ + static inline u32 *i915_flush_dw(u32 *cmd, u64 dst, u32 flags) { /* Mask the 3 LSB to use the PPGTT address space */
Details of the Flat-CCS getting added as part of DG2 enabling and its implicit impact on the uAPI.
Signed-off-by: Ramalingam C ramalingam.c@intel.com cc: Daniel Vetter daniel.vetter@ffwll.ch cc: Matthew Auld matthew.auld@intel.com cc: Simon Ser contact@emersion.fr cc: Pekka Paalanen ppaalanen@gmail.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: mesa-dev@lists.freedesktop.org Cc: Tony Ye tony.ye@intel.com Cc: Slawomir Milczarek slawomir.milczarek@intel.com --- Documentation/gpu/rfc/i915_dg2.rst | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/Documentation/gpu/rfc/i915_dg2.rst b/Documentation/gpu/rfc/i915_dg2.rst index f4eb5a219897..9d28b1812bc7 100644 --- a/Documentation/gpu/rfc/i915_dg2.rst +++ b/Documentation/gpu/rfc/i915_dg2.rst @@ -23,3 +23,10 @@ handling the 64k page size.
.. kernel-doc:: include/uapi/drm/i915_drm.h :functions: drm_i915_gem_create_ext + + +Flat CCS support for lmem +========================= + +.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_migrate.c + :doc: Flat-CCS - Memory compression for Local memory
Just a note here. To enable the dg2 with basic support sooner on CI we have taken a subset of this series separtely at https://patchwork.freedesktop.org/series/100419/
Remaining patches will be pursued on top the above series. Thanks for the review comments. We will fix them working with reviewers. Thanks.
Ram.
On 2022-02-01 at 16:11:13 +0530, Ramalingam C wrote:
This series introduces the enabling patches for new memory compression feature Flat CCS and 64k page support for i915 local memory, along with documentation on the uAPI impact. Included the details of the feature and the implications on the uAPI below. Which is also added into Documentation/gpu/rfc/i915_dg2.rst
DG2 64K page size support:
On discrete platforms, starting from DG2, we have to contend with GTT page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE objects. Specifically the hardware only supports 64K or larger GTT page sizes for such memory. The kernel will already ensure that all I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page sizes underneath.
Note that the returned size here will always reflect any required rounding up done by the kernel, i.e 4K will now become 64K on devices such as DG2.
Special DG2 GTT address alignment requirement:
The GTT alignment will also need to be at least 2M for such objects.
Note that due to how the hardware implements 64K GTT page support, we have some further complications:
- The entire PDE (which covers a 2MB virtual address range), must
contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same PDE is forbidden by the hardware.
- We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM
objects.
To keep things simple for userland, we mandate that any GTT mappings must be aligned to and rounded up to 2MB. As this only wastes virtual address space and avoids userland having to copy any needlessly complicated PDE sharing scheme (coloring) and only affects DG2, this is deemed to be a good compromise.
Flat CCS support for lmem
On Xe-HP and later devices, we use dedicated compression control state (CCS) stored in local memory for each surface, to support the 3D and media compression formats.
The memory required for the CCS of the entire local memory is 1/256 of the local memory size. So before the kernel boot, the required memory is reserved for the CCS data and a secure register will be programmed with the CCS base address.
Flat CCS data needs to be cleared when a lmem object is allocated. And CCS data can be copied in and out of CCS region through XY_CTRL_SURF_COPY_BLT. CPU can’t access the CCS data directly.
When we exaust the lmem, if the object’s placements support smem, then we can directly decompress the compressed lmem object into smem and start using it from smem itself.
But when we need to swapout the compressed lmem object into a smem region though objects’ placement doesn’t support smem, then we copy the lmem content as it is into smem region along with ccs data (using XY_CTRL_SURF_COPY_BLT). When the object is referred, lmem content will be swaped in along with restoration of the CCS data (using XY_CTRL_SURF_COPY_BLT) at corresponding location.
Flat-CCS Modifiers for different compression formats
I915_FORMAT_MOD_4_TILED_DG2_RC_CCS - used to indicate the buffers of Flat CCS render compression formats. Though the general layout is same as I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS, new hashing/compression algorithm is used. Render compression uses 128 byte compression blocks
I915_FORMAT_MOD_4_TILED_DG2_MC_CCS -used to indicate the buffers of Flat CCS media compression formats. Though the general layout is same as I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS, new hashing/compression algorithm is used. Media compression uses 256 byte compression blocks.
I915_FORMAT_MOD_4_TILED_DG2_RC_CCS_CC - used to indicate the buffers of Flat CCS clear color render compression formats. Unified compression format for clear color render compression. The genral layout is a tiled layout using 4Kb tiles i.e Tile4 layout. Fast clear color value expected by HW is located in fb at offset 0 of plane#1
v2: Fixed some formatting issues and platform naming issues Added some more documentation on Flat-CCS
v3: Plane programming is handled for flat-ccs and clear color Tile4 and flat ccs modifier patches are rebased on table based modifier reference method Three patches are squashed Y tile is pruned for DG2. flat_ccs_cc plane format info is added Added mesa, compute and media ppl for required uAPI ack.
v4: Rebasing of the patches
v5: KDoc is enhanced for cc modifier. [Nanley & Lionel] inbuild macro usage for functional fix [Bob] Addressed review comments from Matt Platform coverage fix for modifiers [Imre]
Abdiel Janulgue (1): drm/i915/lmem: Enable lmem for platforms with Flat CCS
Anshuman Gupta (1): drm/i915/dg2: Flat CCS Support
Ayaz A Siddiqui (1): drm/i915/gt: Clear compress metadata for Xe_HP platforms
CQ Tang (1): drm/i915/xehpsdv: Add has_flat_ccs to device info
Matt Roper (1): drm/i915/dg2: Add DG2 unified compression
Matthew Auld (6): drm/i915: enforce min GTT alignment for discrete cards drm/i915: support 64K GTT pages for discrete cards drm/i915/gtt: allow overriding the pt alignment drm/i915/gtt: add xehpsdv_ppgtt_insert_entry drm/i915/migrate: add acceleration support for DG2 drm/i915/uapi: document behaviour for DG2 64K support
Mika Kahola (1): uapi/drm/dg2: Introduce format modifier for DG2 clear color
Ramalingam C (4): drm/i915: add needs_compact_pt flag Doc/gpu/rfc/i915: i915 DG2 64k pagesize uAPI drm/i915/Flat-CCS: Document on Flat-CCS memory compression Doc/gpu/rfc/i915: i915 DG2 flat-CCS uAPI
Robert Beckett (1): drm/i915: add gtt misalignment test
Stanislav Lisovskiy (2): drm/i915: Introduce new Tile 4 format drm/i915/dg2: Tile 4 plane format support
Documentation/gpu/rfc/i915_dg2.rst | 32 ++ Documentation/gpu/rfc/index.rst | 3 + drivers/gpu/drm/i915/display/intel_display.c | 5 +- drivers/gpu/drm/i915/display/intel_fb.c | 68 +++- drivers/gpu/drm/i915/display/intel_fb.h | 1 + drivers/gpu/drm/i915/display/intel_fbc.c | 1 + .../drm/i915/display/intel_plane_initial.c | 1 + .../drm/i915/display/skl_universal_plane.c | 70 +++- .../gpu/drm/i915/gem/selftests/huge_pages.c | 60 ++++ .../i915/gem/selftests/i915_gem_client_blt.c | 21 +- drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 158 +++++++- drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 14 + drivers/gpu/drm/i915/gt/intel_gt.c | 19 + drivers/gpu/drm/i915/gt/intel_gt.h | 1 + drivers/gpu/drm/i915/gt/intel_gtt.c | 12 + drivers/gpu/drm/i915/gt/intel_gtt.h | 31 +- drivers/gpu/drm/i915/gt/intel_migrate.c | 336 ++++++++++++++++-- drivers/gpu/drm/i915/gt/intel_ppgtt.c | 17 +- drivers/gpu/drm/i915/gt/intel_region_lmem.c | 24 +- drivers/gpu/drm/i915/i915_drv.h | 18 +- drivers/gpu/drm/i915/i915_pci.c | 4 + drivers/gpu/drm/i915/i915_reg.h | 4 + drivers/gpu/drm/i915/i915_vma.c | 9 + drivers/gpu/drm/i915/intel_device_info.h | 3 + drivers/gpu/drm/i915/intel_pm.c | 1 + drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 224 ++++++++++-- include/uapi/drm/drm_fourcc.h | 43 +++ include/uapi/drm/i915_drm.h | 44 ++- 28 files changed, 1102 insertions(+), 122 deletions(-) create mode 100644 Documentation/gpu/rfc/i915_dg2.rst
-- 2.20.1
dri-devel@lists.freedesktop.org