Next batch of DG1 patches. With this we should now get a booting DG1 system with the kernel selftests passing.
Anshuman Gupta (1): drm/i915/oprom: Basic sanitization
Anusha Srivatsa (1): drm/i915/lmem: Bypass aperture when lmem is available
CQ Tang (3): drm/i915: Create stolen memory region from local memory drm/i915/stolen: enforce the min_page_size contract drm/i915/stolen: pass the allocation flags
Chris Wilson (2): drm/i915/gt: Skip aperture remapping selftest where there is no aperture drm/i915/selftests: Only query RAPL for integrated power measurements
Clint Taylor (3): drm/i915/dg1: Read OPROM via SPI controller drm/i915/dg1: Compute MEM Bandwidth using MCHBAR drm/i915/dg1: Double memory bandwidth available
José Roberto de Souza (1): drm/i915: WA for zero memory channel
Matt Roper (1): drm/i915/lmem: Fail driver init if LMEM training failed
Matthew Auld (3): drm/i915/stolen: treat stolen local as normal local memory drm/i915/gtt: map the PD up front drm/i915/gtt/dgfx: place the PD in LMEM
Mohammed Khajapasha (2): drm/i915/fbdev: Use lmem physical addresses for fb_mmap() on discrete drm/i915: Return error value when bo not in LMEM for discrete
Venkata Ramana Nayana (1): drm/i915/dg1: Fix mapping type for default state object
Venkata Sandeep Dhanalakota (1): drm/i915: Update the helper to set correct mapping
drivers/gpu/drm/i915/display/intel_bios.c | 75 +++++++- drivers/gpu/drm/i915/display/intel_bw.c | 63 ++++++- drivers/gpu/drm/i915/display/intel_display.c | 10 ++ drivers/gpu/drm/i915/display/intel_fbdev.c | 51 ++++-- drivers/gpu/drm/i915/display/intel_opregion.c | 169 ++++++++++++++++++ drivers/gpu/drm/i915/display/intel_opregion.h | 38 +++- drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 20 ++- drivers/gpu/drm/i915/gem/i915_gem_lmem.h | 5 + drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 116 ++++++++++-- drivers/gpu/drm/i915/gem/i915_gem_stolen.h | 3 + .../drm/i915/gem/selftests/i915_gem_context.c | 11 +- drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 11 +- drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 31 ++-- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 +- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- drivers/gpu/drm/i915/gt/intel_ggtt.c | 2 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 71 +++++--- drivers/gpu/drm/i915/gt/intel_gtt.h | 12 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 4 +- drivers/gpu/drm/i915/gt/intel_ppgtt.c | 7 +- drivers/gpu/drm/i915/gt/intel_ring.c | 9 +- drivers/gpu/drm/i915/gt/selftest_context.c | 3 +- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 +- drivers/gpu/drm/i915/gt/selftest_lrc.c | 4 +- drivers/gpu/drm/i915/gt/selftest_rc6.c | 32 ++-- drivers/gpu/drm/i915/gt/selftest_rps.c | 2 +- drivers/gpu/drm/i915/gt/shmem_utils.c | 4 +- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 4 +- drivers/gpu/drm/i915/gt/uc/intel_huc.c | 4 +- drivers/gpu/drm/i915/i915_drv.h | 11 +- drivers/gpu/drm/i915/i915_pci.c | 2 +- drivers/gpu/drm/i915/i915_reg.h | 12 ++ drivers/gpu/drm/i915/i915_vma.c | 22 ++- drivers/gpu/drm/i915/intel_memory_region.c | 6 + drivers/gpu/drm/i915/intel_memory_region.h | 5 +- drivers/gpu/drm/i915/intel_uncore.c | 12 ++ drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 10 +- drivers/gpu/drm/i915/selftests/i915_perf.c | 3 +- drivers/gpu/drm/i915/selftests/i915_vma.c | 3 + drivers/gpu/drm/i915/selftests/igt_spinner.c | 4 +- drivers/gpu/drm/i915/selftests/librapl.c | 10 
++ drivers/gpu/drm/i915/selftests/librapl.h | 4 + 42 files changed, 716 insertions(+), 158 deletions(-)
From: Chris Wilson chris@chris-wilson.co.uk
If there is no mappable aperture, we cannot remap it for access, and the selftest is void.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Reviewed-by: Matthew Auld matthew.auld@intel.com Reviewed-by: Imre Deak imre.deak@intel.com --- drivers/gpu/drm/i915/selftests/i915_vma.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c index 5fe7b80ca0bd..dd0607254a95 100644 --- a/drivers/gpu/drm/i915/selftests/i915_vma.c +++ b/drivers/gpu/drm/i915/selftests/i915_vma.c @@ -967,6 +967,9 @@ static int igt_vma_remapped_gtt(void *arg) intel_wakeref_t wakeref; int err = 0;
+ if (!i915_ggtt_has_aperture(&i915->ggtt)) + return 0; + obj = i915_gem_object_create_internal(i915, 10 * 10 * PAGE_SIZE); if (IS_ERR(obj)) return PTR_ERR(obj);
On Mon, Apr 12, 2021 at 10:05:08AM +0100, Matthew Auld wrote:
From: Chris Wilson chris@chris-wilson.co.uk
If there is no mappable aperture, we cannot remap it for access, and the selftest is void.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Reviewed-by: Matthew Auld matthew.auld@intel.com Reviewed-by: Imre Deak imre.deak@intel.com
I guess subject should have i915/selftest in it? Also if you resubmit other people's code needs your sob. Otherwise looks reasonable. -Daniel
drivers/gpu/drm/i915/selftests/i915_vma.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c index 5fe7b80ca0bd..dd0607254a95 100644 --- a/drivers/gpu/drm/i915/selftests/i915_vma.c +++ b/drivers/gpu/drm/i915/selftests/i915_vma.c @@ -967,6 +967,9 @@ static int igt_vma_remapped_gtt(void *arg) intel_wakeref_t wakeref; int err = 0;
- if (!i915_ggtt_has_aperture(&i915->ggtt))
return 0;
- obj = i915_gem_object_create_internal(i915, 10 * 10 * PAGE_SIZE); if (IS_ERR(obj)) return PTR_ERR(obj);
-- 2.26.3
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
From: Chris Wilson chris@chris-wilson.co.uk
RAPL provides an on-package power measurement which does not encompass discrete graphics, so let's avoid using the igfx measurements when testing dgfx. Later we will abstract the simple librapl interface over hwmon so that we can verify basic power consumption scenarios.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Reviewed-by: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gt/selftest_rc6.c | 32 +++++++++++++++--------- drivers/gpu/drm/i915/gt/selftest_rps.c | 2 +- drivers/gpu/drm/i915/selftests/librapl.c | 10 ++++++++ drivers/gpu/drm/i915/selftests/librapl.h | 4 +++ 4 files changed, 35 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/selftest_rc6.c b/drivers/gpu/drm/i915/gt/selftest_rc6.c index f097e420ac45..710f825f6e5a 100644 --- a/drivers/gpu/drm/i915/gt/selftest_rc6.c +++ b/drivers/gpu/drm/i915/gt/selftest_rc6.c @@ -34,6 +34,7 @@ int live_rc6_manual(void *arg) struct intel_rc6 *rc6 = &gt->rc6; u64 rc0_power, rc6_power; intel_wakeref_t wakeref; + bool has_power; ktime_t dt; u64 res[2]; int err = 0; @@ -50,6 +51,7 @@ int live_rc6_manual(void *arg) if (IS_VALLEYVIEW(gt->i915) || IS_CHERRYVIEW(gt->i915)) return 0;
+ has_power = librapl_supported(gt->i915); wakeref = intel_runtime_pm_get(gt->uncore->rpm);
/* Force RC6 off for starters */ @@ -71,11 +73,14 @@ int live_rc6_manual(void *arg) goto out_unlock; }
- rc0_power = div64_u64(NSEC_PER_SEC * rc0_power, ktime_to_ns(dt)); - if (!rc0_power) { - pr_err("No power measured while in RC0\n"); - err = -EINVAL; - goto out_unlock; + if (has_power) { + rc0_power = div64_u64(NSEC_PER_SEC * rc0_power, + ktime_to_ns(dt)); + if (!rc0_power) { + pr_err("No power measured while in RC0\n"); + err = -EINVAL; + goto out_unlock; + } }
/* Manually enter RC6 */ @@ -97,13 +102,16 @@ int live_rc6_manual(void *arg) err = -EINVAL; }
- rc6_power = div64_u64(NSEC_PER_SEC * rc6_power, ktime_to_ns(dt)); - pr_info("GPU consumed %llduW in RC0 and %llduW in RC6\n", - rc0_power, rc6_power); - if (2 * rc6_power > rc0_power) { - pr_err("GPU leaked energy while in RC6!\n"); - err = -EINVAL; - goto out_unlock; + if (has_power) { + rc6_power = div64_u64(NSEC_PER_SEC * rc6_power, + ktime_to_ns(dt)); + pr_info("GPU consumed %llduW in RC0 and %llduW in RC6\n", + rc0_power, rc6_power); + if (2 * rc6_power > rc0_power) { + pr_err("GPU leaked energy while in RC6!\n"); + err = -EINVAL; + goto out_unlock; + } }
/* Restore what should have been the original state! */ diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c index 967641fee42a..adf7fdbc00f7 100644 --- a/drivers/gpu/drm/i915/gt/selftest_rps.c +++ b/drivers/gpu/drm/i915/gt/selftest_rps.c @@ -1139,7 +1139,7 @@ int live_rps_power(void *arg) if (!intel_rps_is_enabled(rps) || INTEL_GEN(gt->i915) < 6) return 0;
- if (!librapl_energy_uJ()) + if (!librapl_supported(gt->i915)) return 0;
if (igt_spinner_init(&spin, gt)) diff --git a/drivers/gpu/drm/i915/selftests/librapl.c b/drivers/gpu/drm/i915/selftests/librapl.c index 58710ac3f979..eb03b5b28bad 100644 --- a/drivers/gpu/drm/i915/selftests/librapl.c +++ b/drivers/gpu/drm/i915/selftests/librapl.c @@ -5,8 +5,18 @@
#include <asm/msr.h>
+#include "i915_drv.h" #include "librapl.h"
+bool librapl_supported(const struct drm_i915_private *i915) +{ + /* Discrete cards require hwmon integration */ + if (IS_DGFX(i915)) + return false; + + return librapl_energy_uJ(); +} + u64 librapl_energy_uJ(void) { unsigned long long power; diff --git a/drivers/gpu/drm/i915/selftests/librapl.h b/drivers/gpu/drm/i915/selftests/librapl.h index 887f3e91dd05..e3b24fad0a7a 100644 --- a/drivers/gpu/drm/i915/selftests/librapl.h +++ b/drivers/gpu/drm/i915/selftests/librapl.h @@ -8,6 +8,10 @@
#include <linux/types.h>
+struct drm_i915_private; + +bool librapl_supported(const struct drm_i915_private *i915); + u64 librapl_energy_uJ(void);
#endif /* SELFTEST_LIBRAPL_H */
From: CQ Tang cq.tang@intel.com
Add "REGION_STOLEN" device info to dg1, create stolen memory region from upper portion of local device memory, starting from DSMBASE.
v2: - s/drm_info/drm_dbg; userspace likely doesn't care about stolen. - mem->type is only setup after the region probe, so setting the name as stolen-local or stolen-system based on this value won't work. Split system vs local stolen setup to fix this. - kill all the region->devmem/is_devmem stuff. We already differentiate the different types of stolen so such things shouldn't be needed anymore.
Signed-off-by: CQ Tang cq.tang@intel.com Signed-off-by: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 99 +++++++++++++++++++--- drivers/gpu/drm/i915/gem/i915_gem_stolen.h | 3 + drivers/gpu/drm/i915/i915_pci.c | 2 +- drivers/gpu/drm/i915/i915_reg.h | 1 + drivers/gpu/drm/i915/intel_memory_region.c | 6 ++ drivers/gpu/drm/i915/intel_memory_region.h | 5 +- 6 files changed, 102 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index b0597de206de..56dd58bef5ee 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -10,6 +10,7 @@ #include <drm/drm_mm.h> #include <drm/i915_drm.h>
+#include "gem/i915_gem_lmem.h" #include "gem/i915_gem_region.h" #include "i915_drv.h" #include "i915_gem_stolen.h" @@ -121,6 +122,14 @@ static int i915_adjust_stolen(struct drm_i915_private *i915, } }
+ /* + * With device local memory, we don't need to check the address range, + * this is device memory physical address, could overlap with system + * memory. + */ + if (HAS_LMEM(i915)) + return 0; + /* * Verify that nothing else uses this physical address. Stolen * memory should be reserved by the BIOS and hidden from the @@ -374,8 +383,9 @@ static void icl_get_stolen_reserved(struct drm_i915_private *i915, } }
-static int i915_gem_init_stolen(struct drm_i915_private *i915) +static int i915_gem_init_stolen(struct intel_memory_region *mem) { + struct drm_i915_private *i915 = mem->i915; struct intel_uncore *uncore = &i915->uncore; resource_size_t reserved_base, stolen_top; resource_size_t reserved_total, reserved_size; @@ -396,10 +406,10 @@ static int i915_gem_init_stolen(struct drm_i915_private *i915) return 0; }
- if (resource_size(&intel_graphics_stolen_res) == 0) + if (resource_size(&mem->region) == 0) return 0;
- i915->dsm = intel_graphics_stolen_res; + i915->dsm = mem->region;
if (i915_adjust_stolen(i915, &i915->dsm)) return 0; @@ -684,23 +694,36 @@ static int _i915_gem_object_stolen_init(struct intel_memory_region *mem, return ret; }
+struct intel_memory_region *i915_stolen_region(struct drm_i915_private *i915) +{ + if (HAS_LMEM(i915)) + return i915->mm.regions[INTEL_REGION_STOLEN_LMEM]; + + return i915->mm.regions[INTEL_REGION_STOLEN_SMEM]; +} + struct drm_i915_gem_object * i915_gem_object_create_stolen(struct drm_i915_private *i915, resource_size_t size) { - return i915_gem_object_create_region(i915->mm.regions[INTEL_REGION_STOLEN_SMEM], + return i915_gem_object_create_region(i915_stolen_region(i915), size, I915_BO_ALLOC_CONTIGUOUS); }
static int init_stolen(struct intel_memory_region *mem) { - intel_memory_region_set_name(mem, "stolen"); + if (HAS_LMEM(mem->i915)) { + if (!io_mapping_init_wc(&mem->iomap, + mem->io_start, + resource_size(&mem->region))) + return -EIO; + }
/* * Initialise stolen early so that we may reserve preallocated * objects for the BIOS to KMS transition. */ - return i915_gem_init_stolen(mem->i915); + return i915_gem_init_stolen(mem); }
static void release_stolen(struct intel_memory_region *mem) @@ -714,13 +737,65 @@ static const struct intel_memory_region_ops i915_region_stolen_ops = { .init_object = _i915_gem_object_stolen_init, };
+static struct intel_memory_region * +setup_lmem_stolen(struct drm_i915_private *i915) +{ + struct intel_uncore *uncore = &i915->uncore; + struct pci_dev *pdev = i915->drm.pdev; + struct intel_memory_region *mem; + resource_size_t io_start; + resource_size_t lmem_size; + u64 lmem_base; + + if (!IS_DGFX(i915)) + return ERR_PTR(-ENODEV); + + lmem_base = intel_uncore_read64(uncore, GEN12_DSMBASE); + lmem_size = pci_resource_len(pdev, 2) - lmem_base; + io_start = pci_resource_start(pdev, 2) + lmem_base; + + mem = intel_memory_region_create(i915, lmem_base, lmem_size, + I915_GTT_PAGE_SIZE_4K, io_start, + &i915_region_stolen_ops); + if (IS_ERR(mem)) + return mem; + + drm_dbg(&i915->drm, "Stolen Local memory: %pR\n", &mem->region); + drm_dbg(&i915->drm, "Stolen Local memory IO start: %pa\n", + &mem->io_start); + + intel_memory_region_set_name(mem, "stolen-local"); + + return mem; +} + +static struct intel_memory_region* +setup_smem_stolen(struct drm_i915_private *i915) +{ + struct intel_memory_region *mem; + + mem = intel_memory_region_create(i915, + intel_graphics_stolen_res.start, + resource_size(&intel_graphics_stolen_res), + PAGE_SIZE, 0, + &i915_region_stolen_ops); + if (IS_ERR(mem)) + return mem; + + intel_memory_region_set_name(mem, "stolen-system"); + + return mem; +} + struct intel_memory_region *i915_gem_stolen_setup(struct drm_i915_private *i915) { - return intel_memory_region_create(i915, - intel_graphics_stolen_res.start, - resource_size(&intel_graphics_stolen_res), - PAGE_SIZE, 0, - &i915_region_stolen_ops); + struct intel_memory_region *mem; + + mem = setup_lmem_stolen(i915); + if (mem == ERR_PTR(-ENODEV)) + mem = setup_smem_stolen(i915); + + return mem; }
struct drm_i915_gem_object * @@ -728,7 +803,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *i915, resource_size_t stolen_offset, resource_size_t size) { - struct intel_memory_region *mem = i915->mm.regions[INTEL_REGION_STOLEN_SMEM]; + struct intel_memory_region *mem = i915_stolen_region(i915); struct drm_i915_gem_object *obj; struct drm_mm_node *stolen; int ret; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.h b/drivers/gpu/drm/i915/gem/i915_gem_stolen.h index b03489706796..2d1ce7fec61c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.h @@ -22,6 +22,9 @@ int i915_gem_stolen_insert_node_in_range(struct drm_i915_private *dev_priv, void i915_gem_stolen_remove_node(struct drm_i915_private *dev_priv, struct drm_mm_node *node); struct intel_memory_region *i915_gem_stolen_setup(struct drm_i915_private *i915); + +struct intel_memory_region *i915_stolen_region(struct drm_i915_private *i915); + struct drm_i915_gem_object * i915_gem_object_create_stolen(struct drm_i915_private *dev_priv, resource_size_t size); diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 480553746794..53f5d1e6daef 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -906,7 +906,7 @@ static const struct intel_device_info rkl_info = {
#define GEN12_DGFX_FEATURES \ GEN12_FEATURES, \ - .memory_regions = REGION_SMEM | REGION_LMEM, \ + .memory_regions = REGION_SMEM | REGION_LMEM | REGION_STOLEN_LMEM, \ .has_master_unit_irq = 1, \ .has_llc = 0, \ .has_snoop = 1, \ diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index e087bcd21911..4108f2a7ebfa 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -12191,6 +12191,7 @@ enum skl_power_gate { #define GEN12_GLOBAL_MOCS(i) _MMIO(0x4000 + (i) * 4) /* Global MOCS regs */
#define GEN12_GSMBASE _MMIO(0x108100) +#define GEN12_DSMBASE _MMIO(0x1080C0)
/* gamt regs */ #define GEN8_L3_LRA_1_GPGPU _MMIO(0x4dd4) diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index bf837b6bb185..ac90b76a3fa0 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -22,6 +22,10 @@ static const struct { .class = INTEL_MEMORY_STOLEN_SYSTEM, .instance = 0, }, + [INTEL_REGION_STOLEN_LMEM] = { + .class = INTEL_MEMORY_STOLEN_LOCAL, + .instance = 0, + }, };
struct intel_memory_region * @@ -278,6 +282,8 @@ int intel_memory_regions_hw_probe(struct drm_i915_private *i915) case INTEL_MEMORY_SYSTEM: mem = i915_gem_shmem_setup(i915); break; + case INTEL_MEMORY_STOLEN_LOCAL: + fallthrough; case INTEL_MEMORY_STOLEN_SYSTEM: mem = i915_gem_stolen_setup(i915); break; diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h index edd49067c8ca..4c8ec15af55f 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.h +++ b/drivers/gpu/drm/i915/intel_memory_region.h @@ -26,18 +26,21 @@ enum intel_memory_type { INTEL_MEMORY_SYSTEM = 0, INTEL_MEMORY_LOCAL, INTEL_MEMORY_STOLEN_SYSTEM, + INTEL_MEMORY_STOLEN_LOCAL, };
enum intel_region_id { INTEL_REGION_SMEM = 0, INTEL_REGION_LMEM, INTEL_REGION_STOLEN_SMEM, + INTEL_REGION_STOLEN_LMEM, INTEL_REGION_UNKNOWN, /* Should be last */ };
#define REGION_SMEM BIT(INTEL_REGION_SMEM) #define REGION_LMEM BIT(INTEL_REGION_LMEM) #define REGION_STOLEN_SMEM BIT(INTEL_REGION_STOLEN_SMEM) +#define REGION_STOLEN_LMEM BIT(INTEL_REGION_STOLEN_LMEM)
#define I915_ALLOC_MIN_PAGE_SIZE BIT(0) #define I915_ALLOC_CONTIGUOUS BIT(1) @@ -82,7 +85,7 @@ struct intel_memory_region { u16 type; u16 instance; enum intel_region_id id; - char name[8]; + char name[16];
struct list_head reserved;
On 12/04/2021 10:05, Matthew Auld wrote:
From: CQ Tang cq.tang@intel.com
Add "REGION_STOLEN" device info to dg1, create stolen memory region from upper portion of local device memory, starting from DSMBASE.
v2: - s/drm_info/drm_dbg; userspace likely doesn't care about stolen. - mem->type is only setup after the region probe, so setting the name as stolen-local or stolen-system based on this value won't work. Split system vs local stolen setup to fix this. - kill all the region->devmem/is_devmem stuff. We already differentiate the different types of stolen so such things shouldn't be needed anymore.
Signed-off-by: CQ Tang cq.tang@intel.com Signed-off-by: Matthew Auld matthew.auld@intel.com
drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 99 +++++++++++++++++++--- drivers/gpu/drm/i915/gem/i915_gem_stolen.h | 3 + drivers/gpu/drm/i915/i915_pci.c | 2 +- drivers/gpu/drm/i915/i915_reg.h | 1 + drivers/gpu/drm/i915/intel_memory_region.c | 6 ++ drivers/gpu/drm/i915/intel_memory_region.h | 5 +- 6 files changed, 102 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index b0597de206de..56dd58bef5ee 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -10,6 +10,7 @@ #include <drm/drm_mm.h> #include <drm/i915_drm.h>
+#include "gem/i915_gem_lmem.h" #include "gem/i915_gem_region.h" #include "i915_drv.h" #include "i915_gem_stolen.h" @@ -121,6 +122,14 @@ static int i915_adjust_stolen(struct drm_i915_private *i915, } }
- /*
* With device local memory, we don't need to check the address range,
* this is device memory physical address, could overlap with system
* memory.
*/
- if (HAS_LMEM(i915))
return 0;
- /*
- Verify that nothing else uses this physical address. Stolen
- memory should be reserved by the BIOS and hidden from the
@@ -374,8 +383,9 @@ static void icl_get_stolen_reserved(struct drm_i915_private *i915, } }
-static int i915_gem_init_stolen(struct drm_i915_private *i915) +static int i915_gem_init_stolen(struct intel_memory_region *mem) {
- struct drm_i915_private *i915 = mem->i915; struct intel_uncore *uncore = &i915->uncore; resource_size_t reserved_base, stolen_top; resource_size_t reserved_total, reserved_size;
@@ -396,10 +406,10 @@ static int i915_gem_init_stolen(struct drm_i915_private *i915) return 0; }
- if (resource_size(&intel_graphics_stolen_res) == 0)
- if (resource_size(&mem->region) == 0) return 0;
- i915->dsm = intel_graphics_stolen_res;
i915->dsm = mem->region;
if (i915_adjust_stolen(i915, &i915->dsm)) return 0;
@@ -684,23 +694,36 @@ static int _i915_gem_object_stolen_init(struct intel_memory_region *mem, return ret; }
+struct intel_memory_region *i915_stolen_region(struct drm_i915_private *i915) +{
- if (HAS_LMEM(i915))
return i915->mm.regions[INTEL_REGION_STOLEN_LMEM];
- return i915->mm.regions[INTEL_REGION_STOLEN_SMEM];
+}
Could be a bikeshedding comment only - especially since I think this path gets very little used at runtime so it is most likely pointless to fiddle with it, but it just strikes me a bit not fully elegant to do:
i915_gem_object_create_stolen -> i915_gem_object_create_region -> i915_stolen_region
And end up in here, when alternative could be at driver init:
i915->stolen_region_id = HAS_LMEM() ? ... : ...;
i915_gem_object_create_stolen -> i915_gem_object_create_region(i915->mm.regions[i915->stolen_region_id]);
Or pointer to region. Would avoid having to export i915_stolen_region as well.
Or is i915->dsm already the right thing? Because..
- struct drm_i915_gem_object * i915_gem_object_create_stolen(struct drm_i915_private *i915, resource_size_t size) {
- return i915_gem_object_create_region(i915->mm.regions[INTEL_REGION_STOLEN_SMEM],
return i915_gem_object_create_region(i915_stolen_region(i915), size, I915_BO_ALLOC_CONTIGUOUS); }
static int init_stolen(struct intel_memory_region *mem) {
- intel_memory_region_set_name(mem, "stolen");
if (HAS_LMEM(mem->i915)) {
if (!io_mapping_init_wc(&mem->iomap,
mem->io_start,
resource_size(&mem->region)))
return -EIO;
}
/*
- Initialise stolen early so that we may reserve preallocated
- objects for the BIOS to KMS transition.
*/
- return i915_gem_init_stolen(mem->i915);
- return i915_gem_init_stolen(mem);
... I find the mem region init paths a bit convoluted, stolen especially, and struggle to figure it out every time.
For instance we have i915_region_stolen_ops shared between system and local stolen. But then shared vfuncs branch depending on system vs stolen?
i915_gem_init_stolen is shared - but which parts of it are relevant for local stolen?
}
static void release_stolen(struct intel_memory_region *mem) @@ -714,13 +737,65 @@ static const struct intel_memory_region_ops i915_region_stolen_ops = { .init_object = _i915_gem_object_stolen_init, };
+static struct intel_memory_region * +setup_lmem_stolen(struct drm_i915_private *i915) +{
- struct intel_uncore *uncore = &i915->uncore;
- struct pci_dev *pdev = i915->drm.pdev;
- struct intel_memory_region *mem;
- resource_size_t io_start;
- resource_size_t lmem_size;
- u64 lmem_base;
- if (!IS_DGFX(i915))
return ERR_PTR(-ENODEV);
- lmem_base = intel_uncore_read64(uncore, GEN12_DSMBASE);
- lmem_size = pci_resource_len(pdev, 2) - lmem_base;
- io_start = pci_resource_start(pdev, 2) + lmem_base;
- mem = intel_memory_region_create(i915, lmem_base, lmem_size,
I915_GTT_PAGE_SIZE_4K, io_start,
&i915_region_stolen_ops);
- if (IS_ERR(mem))
return mem;
- drm_dbg(&i915->drm, "Stolen Local memory: %pR\n", &mem->region);
- drm_dbg(&i915->drm, "Stolen Local memory IO start: %pa\n",
&mem->io_start);
Could these messages be consolidated with the system stolen ones (i915_gem_setup_stolen?) and based off the memory_region data printed from common i915_gem_stolen_setup?
- intel_memory_region_set_name(mem, "stolen-local");
- return mem;
+}
+static struct intel_memory_region*
Space before asterisk.
+setup_smem_stolen(struct drm_i915_private *i915) +{
- struct intel_memory_region *mem;
- mem = intel_memory_region_create(i915,
intel_graphics_stolen_res.start,
resource_size(&intel_graphics_stolen_res),
PAGE_SIZE, 0,
&i915_region_stolen_ops);
- if (IS_ERR(mem))
return mem;
- intel_memory_region_set_name(mem, "stolen-system");
I assume this name, although changed from the current ("stolen"), is not exported anywhere to matter?
- return mem;
+}
- struct intel_memory_region *i915_gem_stolen_setup(struct drm_i915_private *i915) {
- return intel_memory_region_create(i915,
intel_graphics_stolen_res.start,
resource_size(&intel_graphics_stolen_res),
PAGE_SIZE, 0,
&i915_region_stolen_ops);
struct intel_memory_region *mem;
mem = setup_lmem_stolen(i915);
if (mem == ERR_PTR(-ENODEV))
mem = setup_smem_stolen(i915);
return mem; }
struct drm_i915_gem_object *
@@ -728,7 +803,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *i915, resource_size_t stolen_offset, resource_size_t size) {
- struct intel_memory_region *mem = i915->mm.regions[INTEL_REGION_STOLEN_SMEM];
- struct intel_memory_region *mem = i915_stolen_region(i915); struct drm_i915_gem_object *obj; struct drm_mm_node *stolen; int ret;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.h b/drivers/gpu/drm/i915/gem/i915_gem_stolen.h index b03489706796..2d1ce7fec61c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.h @@ -22,6 +22,9 @@ int i915_gem_stolen_insert_node_in_range(struct drm_i915_private *dev_priv, void i915_gem_stolen_remove_node(struct drm_i915_private *dev_priv, struct drm_mm_node *node); struct intel_memory_region *i915_gem_stolen_setup(struct drm_i915_private *i915);
+struct intel_memory_region *i915_stolen_region(struct drm_i915_private *i915);
- struct drm_i915_gem_object * i915_gem_object_create_stolen(struct drm_i915_private *dev_priv, resource_size_t size);
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 480553746794..53f5d1e6daef 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -906,7 +906,7 @@ static const struct intel_device_info rkl_info = {
#define GEN12_DGFX_FEATURES \ GEN12_FEATURES, \
- .memory_regions = REGION_SMEM | REGION_LMEM, \
- .memory_regions = REGION_SMEM | REGION_LMEM | REGION_STOLEN_LMEM, \ .has_master_unit_irq = 1, \ .has_llc = 0, \ .has_snoop = 1, \
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index e087bcd21911..4108f2a7ebfa 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -12191,6 +12191,7 @@ enum skl_power_gate { #define GEN12_GLOBAL_MOCS(i) _MMIO(0x4000 + (i) * 4) /* Global MOCS regs */
#define GEN12_GSMBASE _MMIO(0x108100) +#define GEN12_DSMBASE _MMIO(0x1080C0)
/* gamt regs */ #define GEN8_L3_LRA_1_GPGPU _MMIO(0x4dd4) diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index bf837b6bb185..ac90b76a3fa0 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -22,6 +22,10 @@ static const struct { .class = INTEL_MEMORY_STOLEN_SYSTEM, .instance = 0, },
[INTEL_REGION_STOLEN_LMEM] = {
.class = INTEL_MEMORY_STOLEN_LOCAL,
.instance = 0,
}, };
struct intel_memory_region *
@@ -278,6 +282,8 @@ int intel_memory_regions_hw_probe(struct drm_i915_private *i915) case INTEL_MEMORY_SYSTEM: mem = i915_gem_shmem_setup(i915); break;
case INTEL_MEMORY_STOLEN_LOCAL:
fallthrough;
case INTEL_MEMORY_STOLEN_SYSTEM: mem = i915_gem_stolen_setup(i915); break;
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h index edd49067c8ca..4c8ec15af55f 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.h +++ b/drivers/gpu/drm/i915/intel_memory_region.h @@ -26,18 +26,21 @@ enum intel_memory_type { INTEL_MEMORY_SYSTEM = 0, INTEL_MEMORY_LOCAL, INTEL_MEMORY_STOLEN_SYSTEM,
INTEL_MEMORY_STOLEN_LOCAL, };
enum intel_region_id { INTEL_REGION_SMEM = 0, INTEL_REGION_LMEM, INTEL_REGION_STOLEN_SMEM,
INTEL_REGION_STOLEN_LMEM, INTEL_REGION_UNKNOWN, /* Should be last */ };
#define REGION_SMEM BIT(INTEL_REGION_SMEM) #define REGION_LMEM BIT(INTEL_REGION_LMEM) #define REGION_STOLEN_SMEM BIT(INTEL_REGION_STOLEN_SMEM)
+#define REGION_STOLEN_LMEM BIT(INTEL_REGION_STOLEN_LMEM)
#define I915_ALLOC_MIN_PAGE_SIZE BIT(0) #define I915_ALLOC_CONTIGUOUS BIT(1) @@ -82,7 +85,7 @@ struct intel_memory_region { u16 type; u16 instance; enum intel_region_id id;
- char name[8];
char name[16];
struct list_head reserved;
Regards,
Tvrtko
On 14/04/2021 16:01, Tvrtko Ursulin wrote:
On 12/04/2021 10:05, Matthew Auld wrote:
From: CQ Tang cq.tang@intel.com
Add "REGION_STOLEN" device info to dg1, create stolen memory region from upper portion of local device memory, starting from DSMBASE.
v2:     - s/drm_info/drm_dbg; userspace likely doesn't care about stolen.     - mem->type is only setup after the region probe, so setting the name       as stolen-local or stolen-system based on this value won't work. Split       system vs local stolen setup to fix this.     - kill all the region->devmem/is_devmem stuff. We already differentiate       the different types of stolen so such things shouldn't be needed       anymore.
Signed-off-by: CQ Tang cq.tang@intel.com Signed-off-by: Matthew Auld matthew.auld@intel.com
drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 99 +++++++++++++++++++---  drivers/gpu/drm/i915/gem/i915_gem_stolen.h | 3 +  drivers/gpu/drm/i915/i915_pci.c           | 2 +-  drivers/gpu/drm/i915/i915_reg.h           | 1 +  drivers/gpu/drm/i915/intel_memory_region.c | 6 ++  drivers/gpu/drm/i915/intel_memory_region.h | 5 +-  6 files changed, 102 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index b0597de206de..56dd58bef5ee 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -10,6 +10,7 @@ #include <drm/drm_mm.h> #include <drm/i915_drm.h> +#include "gem/i915_gem_lmem.h" #include "gem/i915_gem_region.h" #include "i915_drv.h" #include "i915_gem_stolen.h" @@ -121,6 +122,14 @@ static int i915_adjust_stolen(struct drm_i915_private *i915, } } + /* + * With device local memory, we don't need to check the address range, + * this is device memory physical address, could overlap with system + * memory. + */ + if (HAS_LMEM(i915)) + return 0;
/*       * Verify that nothing else uses this physical address. Stolen       * memory should be reserved by the BIOS and hidden from the @@ -374,8 +383,9 @@ static void icl_get_stolen_reserved(struct drm_i915_private *i915,      }  } -static int i915_gem_init_stolen(struct drm_i915_private *i915) +static int i915_gem_init_stolen(struct intel_memory_region *mem)  { +   struct drm_i915_private *i915 = mem->i915;      struct intel_uncore *uncore = &i915->uncore;      resource_size_t reserved_base, stolen_top;      resource_size_t reserved_total, reserved_size; @@ -396,10 +406,10 @@ static int i915_gem_init_stolen(struct drm_i915_private *i915)          return 0;      } -   if (resource_size(&intel_graphics_stolen_res) == 0) +   if (resource_size(&mem->region) == 0)          return 0; -   i915->dsm = intel_graphics_stolen_res; +   i915->dsm = mem->region;      if (i915_adjust_stolen(i915, &i915->dsm))          return 0; @@ -684,23 +694,36 @@ static int _i915_gem_object_stolen_init(struct intel_memory_region *mem,      return ret;  } +struct intel_memory_region *i915_stolen_region(struct drm_i915_private *i915) +{ +   if (HAS_LMEM(i915)) +       return i915->mm.regions[INTEL_REGION_STOLEN_LMEM];
+Â Â Â return i915->mm.regions[INTEL_REGION_STOLEN_SMEM]; +}
Could be a bikeshedding comment only - especially since I think this path gets very little used at runtime so it is most likely pointless to fiddle with it, but it just strikes me a bit not fully elegant to do:
i915_gem_object_create_stolen  -> i915_gem_object_create_region    -> i915_stolen_region
And end up in here, when alternative could be at driver init:
i915->stolen_region_id = HAS_LMEM() ? ... : ...;
i915_gem_object_create_stolen  -> i915_gem_object_create_region(i915->mm.regions[i915->stolen_region_id]);
Or pointer to region. Would avoid having to export i915_stolen_region as well.
Or is i915->dsm already the right thing? Because..
I guess we could just have an i915->stolen_region short-cut or something?
struct drm_i915_gem_object *  i915_gem_object_create_stolen(struct drm_i915_private *i915,                    resource_size_t size)  { -   return i915_gem_object_create_region(i915->mm.regions[INTEL_REGION_STOLEN_SMEM], +   return i915_gem_object_create_region(i915_stolen_region(i915),                           size, I915_BO_ALLOC_CONTIGUOUS);  }  static int init_stolen(struct intel_memory_region *mem)  { -   intel_memory_region_set_name(mem, "stolen"); +   if (HAS_LMEM(mem->i915)) { +       if (!io_mapping_init_wc(&mem->iomap, +                   mem->io_start, +                   resource_size(&mem->region))) +           return -EIO; +   }      /*       * Initialise stolen early so that we may reserve preallocated       * objects for the BIOS to KMS transition.       */ -   return i915_gem_init_stolen(mem->i915); +   return i915_gem_init_stolen(mem);
... I find the mem region init paths a bit convoluted, stolen especially, and struggle to figure it out every time.
For instance we have i915_region_stolen_ops shared between system and local stolen. But then shared vfuncs branch depending on system vs local?
We could split the intel_memory_region ops? Maybe that will make it slightly less muddled?
The probing is slightly different, but that's kind of expected since it's quite different from the HW pov.
But once we get an intel_memory_region, it should be the same whether it's stolen device memory or whatever.
i915_gem_init_stolen is shared - but which parts of it are relevant for local stolen?
Asking all the difficult questions :)
It's just to populate dsm I think. I can rip that out and then we don't call i915_gem_init_stolen() for the stolen device memory path? Maybe that will look slightly better?
} Â static void release_stolen(struct intel_memory_region *mem) @@ -714,13 +737,65 @@ static const struct intel_memory_region_ops i915_region_stolen_ops = { Â Â Â Â Â .init_object = _i915_gem_object_stolen_init, Â }; +static struct intel_memory_region * +setup_lmem_stolen(struct drm_i915_private *i915) +{ +Â Â Â struct intel_uncore *uncore = &i915->uncore; +Â Â Â struct pci_dev *pdev = i915->drm.pdev; +Â Â Â struct intel_memory_region *mem; +Â Â Â resource_size_t io_start; +Â Â Â resource_size_t lmem_size; +Â Â Â u64 lmem_base;
+Â Â Â if (!IS_DGFX(i915)) +Â Â Â Â Â Â Â return ERR_PTR(-ENODEV);
+Â Â Â lmem_base = intel_uncore_read64(uncore, GEN12_DSMBASE); +Â Â Â lmem_size = pci_resource_len(pdev, 2) - lmem_base; +Â Â Â io_start = pci_resource_start(pdev, 2) + lmem_base;
+Â Â Â mem = intel_memory_region_create(i915, lmem_base, lmem_size, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â I915_GTT_PAGE_SIZE_4K, io_start, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â &i915_region_stolen_ops); +Â Â Â if (IS_ERR(mem)) +Â Â Â Â Â Â Â return mem;
+Â Â Â drm_dbg(&i915->drm, "Stolen Local memory: %pR\n", &mem->region); +Â Â Â drm_dbg(&i915->drm, "Stolen Local memory IO start: %pa\n", +Â Â Â Â Â Â Â &mem->io_start);
Could these messages be consolidated with the system stolen ones (i915_gem_setup_stolen?) and based off the memory_region data printed from common i915_gem_stolen_setup?
+Â Â Â intel_memory_region_set_name(mem, "stolen-local");
+Â Â Â return mem; +}
+static struct intel_memory_region*
Space before asterisk.
+setup_smem_stolen(struct drm_i915_private *i915) +{ +Â Â Â struct intel_memory_region *mem;
+Â Â Â mem = intel_memory_region_create(i915, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â intel_graphics_stolen_res.start, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â resource_size(&intel_graphics_stolen_res), +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â PAGE_SIZE, 0, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â &i915_region_stolen_ops); +Â Â Â if (IS_ERR(mem)) +Â Â Â Â Â Â Â return mem;
+Â Â Â intel_memory_region_set_name(mem, "stolen-system");
I assume this name, although changed from the current ("stolen"), is not exported anywhere to matter?
Yeah, it's just for internal use, and some debugfs.
+Â Â Â return mem; +}
struct intel_memory_region *i915_gem_stolen_setup(struct drm_i915_private *i915) Â { -Â Â Â return intel_memory_region_create(i915, -Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â intel_graphics_stolen_res.start, -Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â resource_size(&intel_graphics_stolen_res), -Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â PAGE_SIZE, 0, -Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â &i915_region_stolen_ops); +Â Â Â struct intel_memory_region *mem;
+Â Â Â mem = setup_lmem_stolen(i915); +Â Â Â if (mem == ERR_PTR(-ENODEV)) +Â Â Â Â Â Â Â mem = setup_smem_stolen(i915);
+Â Â Â return mem; Â } Â struct drm_i915_gem_object * @@ -728,7 +803,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *i915, Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â resource_size_t stolen_offset, Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â resource_size_t size) Â { -Â Â Â struct intel_memory_region *mem = i915->mm.regions[INTEL_REGION_STOLEN_SMEM]; +Â Â Â struct intel_memory_region *mem = i915_stolen_region(i915); Â Â Â Â Â struct drm_i915_gem_object *obj; Â Â Â Â Â struct drm_mm_node *stolen; Â Â Â Â Â int ret; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.h b/drivers/gpu/drm/i915/gem/i915_gem_stolen.h index b03489706796..2d1ce7fec61c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.h @@ -22,6 +22,9 @@ int i915_gem_stolen_insert_node_in_range(struct drm_i915_private *dev_priv, Â void i915_gem_stolen_remove_node(struct drm_i915_private *dev_priv, Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â struct drm_mm_node *node); Â struct intel_memory_region *i915_gem_stolen_setup(struct drm_i915_private *i915);
+struct intel_memory_region *i915_stolen_region(struct drm_i915_private *i915);
struct drm_i915_gem_object * Â i915_gem_object_create_stolen(struct drm_i915_private *dev_priv, Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â resource_size_t size); diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 480553746794..53f5d1e6daef 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -906,7 +906,7 @@ static const struct intel_device_info rkl_info = { Â #define GEN12_DGFX_FEATURES \ Â Â Â Â Â GEN12_FEATURES, \ -Â Â Â .memory_regions = REGION_SMEM | REGION_LMEM, \ +Â Â Â .memory_regions = REGION_SMEM | REGION_LMEM | REGION_STOLEN_LMEM, \ Â Â Â Â Â .has_master_unit_irq = 1, \ Â Â Â Â Â .has_llc = 0, \ Â Â Â Â Â .has_snoop = 1, \ diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index e087bcd21911..4108f2a7ebfa 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -12191,6 +12191,7 @@ enum skl_power_gate { Â #define GEN12_GLOBAL_MOCS(i)Â Â Â _MMIO(0x4000 + (i) * 4) /* Global MOCS regs */ Â #define GEN12_GSMBASEÂ Â Â Â Â Â Â Â Â Â Â _MMIO(0x108100) +#define GEN12_DSMBASEÂ Â Â Â Â Â Â Â Â Â Â _MMIO(0x1080C0) Â /* gamt regs */ Â #define GEN8_L3_LRA_1_GPGPU _MMIO(0x4dd4) diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c index bf837b6bb185..ac90b76a3fa0 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.c +++ b/drivers/gpu/drm/i915/intel_memory_region.c @@ -22,6 +22,10 @@ static const struct { Â Â Â Â Â Â Â Â Â .class = INTEL_MEMORY_STOLEN_SYSTEM, Â Â Â Â Â Â Â Â Â .instance = 0, Â Â Â Â Â }, +Â Â Â [INTEL_REGION_STOLEN_LMEM] = { +Â Â Â Â Â Â Â .class = INTEL_MEMORY_STOLEN_LOCAL, +Â Â Â Â Â Â Â .instance = 0, +Â Â Â }, Â }; Â struct intel_memory_region * @@ -278,6 +282,8 @@ int intel_memory_regions_hw_probe(struct drm_i915_private *i915) Â Â Â Â Â Â Â Â Â case INTEL_MEMORY_SYSTEM: Â Â Â Â Â Â Â Â Â Â Â Â Â mem = i915_gem_shmem_setup(i915); Â Â Â Â Â Â Â Â Â Â Â Â Â break; +Â Â Â Â Â Â Â case 
INTEL_MEMORY_STOLEN_LOCAL: +Â Â Â Â Â Â Â Â Â Â Â fallthrough; Â Â Â Â Â Â Â Â Â case INTEL_MEMORY_STOLEN_SYSTEM: Â Â Â Â Â Â Â Â Â Â Â Â Â mem = i915_gem_stolen_setup(i915); Â Â Â Â Â Â Â Â Â Â Â Â Â break; diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h index edd49067c8ca..4c8ec15af55f 100644 --- a/drivers/gpu/drm/i915/intel_memory_region.h +++ b/drivers/gpu/drm/i915/intel_memory_region.h @@ -26,18 +26,21 @@ enum intel_memory_type { Â Â Â Â Â INTEL_MEMORY_SYSTEM = 0, Â Â Â Â Â INTEL_MEMORY_LOCAL, Â Â Â Â Â INTEL_MEMORY_STOLEN_SYSTEM, +Â Â Â INTEL_MEMORY_STOLEN_LOCAL, Â }; Â enum intel_region_id { Â Â Â Â Â INTEL_REGION_SMEM = 0, Â Â Â Â Â INTEL_REGION_LMEM, Â Â Â Â Â INTEL_REGION_STOLEN_SMEM, +Â Â Â INTEL_REGION_STOLEN_LMEM, Â Â Â Â Â INTEL_REGION_UNKNOWN, /* Should be last */ Â }; Â #define REGION_SMEMÂ Â Â Â BIT(INTEL_REGION_SMEM) Â #define REGION_LMEMÂ Â Â Â BIT(INTEL_REGION_LMEM) Â #define REGION_STOLEN_SMEMÂ Â BIT(INTEL_REGION_STOLEN_SMEM) +#define REGION_STOLEN_LMEMÂ Â BIT(INTEL_REGION_STOLEN_LMEM) Â #define I915_ALLOC_MIN_PAGE_SIZEÂ BIT(0) Â #define I915_ALLOC_CONTIGUOUSÂ Â Â Â BIT(1) @@ -82,7 +85,7 @@ struct intel_memory_region { Â Â Â Â Â u16 type; Â Â Â Â Â u16 instance; Â Â Â Â Â enum intel_region_id id; -Â Â Â char name[8]; +Â Â Â char name[16]; Â Â Â Â Â struct list_head reserved;
Regards,
Tvrtko
On 16/04/2021 16:04, Matthew Auld wrote:
On 14/04/2021 16:01, Tvrtko Ursulin wrote:
On 12/04/2021 10:05, Matthew Auld wrote:
From: CQ Tang cq.tang@intel.com
Add "REGION_STOLEN" device info to dg1, create stolen memory region from upper portion of local device memory, starting from DSMBASE.
v2:     - s/drm_info/drm_dbg; userspace likely doesn't care about stolen.     - mem->type is only setup after the region probe, so setting the name       as stolen-local or stolen-system based on this value won't work. Split       system vs local stolen setup to fix this.     - kill all the region->devmem/is_devmem stuff. We already differentiate       the different types of stolen so such things shouldn't be needed       anymore.
Signed-off-by: CQ Tang cq.tang@intel.com Signed-off-by: Matthew Auld matthew.auld@intel.com
drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 99 +++++++++++++++++++---  drivers/gpu/drm/i915/gem/i915_gem_stolen.h | 3 +  drivers/gpu/drm/i915/i915_pci.c           | 2 +-  drivers/gpu/drm/i915/i915_reg.h           | 1 +  drivers/gpu/drm/i915/intel_memory_region.c | 6 ++  drivers/gpu/drm/i915/intel_memory_region.h | 5 +-  6 files changed, 102 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index b0597de206de..56dd58bef5ee 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -10,6 +10,7 @@ Â #include <drm/drm_mm.h> Â #include <drm/i915_drm.h> +#include "gem/i915_gem_lmem.h" Â #include "gem/i915_gem_region.h" Â #include "i915_drv.h" Â #include "i915_gem_stolen.h" @@ -121,6 +122,14 @@ static int i915_adjust_stolen(struct drm_i915_private *i915, Â Â Â Â Â Â Â Â Â } Â Â Â Â Â } +Â Â Â /* +Â Â Â Â * With device local memory, we don't need to check the address range, +Â Â Â Â * this is device memory physical address, could overlap with system +Â Â Â Â * memory. +Â Â Â Â */ +Â Â Â if (HAS_LMEM(i915)) +Â Â Â Â Â Â Â return 0;
/*       * Verify that nothing else uses this physical address. Stolen       * memory should be reserved by the BIOS and hidden from the @@ -374,8 +383,9 @@ static void icl_get_stolen_reserved(struct drm_i915_private *i915,      }  } -static int i915_gem_init_stolen(struct drm_i915_private *i915) +static int i915_gem_init_stolen(struct intel_memory_region *mem)  { +   struct drm_i915_private *i915 = mem->i915;      struct intel_uncore *uncore = &i915->uncore;      resource_size_t reserved_base, stolen_top;      resource_size_t reserved_total, reserved_size; @@ -396,10 +406,10 @@ static int i915_gem_init_stolen(struct drm_i915_private *i915)          return 0;      } -   if (resource_size(&intel_graphics_stolen_res) == 0) +   if (resource_size(&mem->region) == 0)          return 0; -   i915->dsm = intel_graphics_stolen_res; +   i915->dsm = mem->region;      if (i915_adjust_stolen(i915, &i915->dsm))          return 0; @@ -684,23 +694,36 @@ static int _i915_gem_object_stolen_init(struct intel_memory_region *mem,      return ret;  } +struct intel_memory_region *i915_stolen_region(struct drm_i915_private *i915) +{ +   if (HAS_LMEM(i915)) +       return i915->mm.regions[INTEL_REGION_STOLEN_LMEM];
+Â Â Â return i915->mm.regions[INTEL_REGION_STOLEN_SMEM]; +}
Could be a bikeshedding comment only - especially since I think this path gets very little used at runtime so it is most likely pointless to fiddle with it, but it just strikes me a bit not fully elegant to do:
i915_gem_object_create_stolen   -> i915_gem_object_create_region     -> i915_stolen_region
And end up in here, when alternative could be at driver init:
i915->stolen_region_id = HAS_LMEM() ? ... : ...;
i915_gem_object_create_stolen   -> i915_gem_object_create_region(i915->mm.regions[i915->stolen_region_id]);
Or pointer to region. Would avoid having to export i915_stolen_region as well.
Or is i915->dsm already the right thing? Because..
I guess we could just have an i915->stolen_region short-cut or something?
i915->dsm is not it? Where does i915_gem_init_stolen exit for local-stolen then? At the "resource_size(&mem->region) == 0" check?
struct drm_i915_gem_object * Â i915_gem_object_create_stolen(struct drm_i915_private *i915, Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â resource_size_t size) Â { -Â Â Â return i915_gem_object_create_region(i915->mm.regions[INTEL_REGION_STOLEN_SMEM],
+   return i915_gem_object_create_region(i915_stolen_region(i915),                           size, I915_BO_ALLOC_CONTIGUOUS);  }  static int init_stolen(struct intel_memory_region *mem)  { -   intel_memory_region_set_name(mem, "stolen"); +   if (HAS_LMEM(mem->i915)) { +       if (!io_mapping_init_wc(&mem->iomap, +                   mem->io_start, +                   resource_size(&mem->region))) +           return -EIO; +   }      /*       * Initialise stolen early so that we may reserve preallocated       * objects for the BIOS to KMS transition.       */ -   return i915_gem_init_stolen(mem->i915); +   return i915_gem_init_stolen(mem);
... I find the mem region init paths a bit convoluted, stolen especially, and struggle to figure it out every time.
For instance we have i915_region_stolen_ops shared between system and local stolen. But then shared vfuncs branch depending on system vs local?
We could split the intel_memory_region ops? Maybe that will make it slightly less muddled?
I think so. Each vfunc table with it's own ->init() should make it easier to follow.
The probing is slightly different, but that's kind of expected since it's quite different from the HW pov.
But once we get an intel_memory_region, it should be the same whether it's stolen device memory or whatever.
i915_gem_init_stolen is shared - but which parts of it are relevant for local stolen?
Asking all the difficult questions :)
It's just to populate dsm I think. I can rip that out and then we don't call i915_gem_init_stolen() for the stolen device memory path? Maybe that will look slightly better?
Yes, with the above approach of two struct intel_memory_region_ops? Even if some vfuncs are shared it should be better.
I am also confused by ->release ie. i915_gem_cleanup_stolen. How does that work for two stolen regions, I mean one i915->mm.stolen?
Regards,
Tvrtko
Underneath it's the same stuff, so things like the PTE_LM bits for the GTT should just keep working as-is.
Signed-off-by: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c index ce1c83c13d05..017db8f71130 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c @@ -19,7 +19,10 @@ const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops = {
bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj) { - return obj->ops == &i915_gem_lmem_obj_ops; + struct intel_memory_region *mr = obj->mm.region; + + return mr && (mr->type == INTEL_MEMORY_LOCAL || + mr->type == INTEL_MEMORY_STOLEN_LOCAL); }
struct drm_i915_gem_object *
On 12/04/2021 10:05, Matthew Auld wrote:
Underneath it's the same stuff, so things like the PTE_LM bits for the GTT should just keep working as-is.
Signed-off-by: Matthew Auld matthew.auld@intel.com
drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c index ce1c83c13d05..017db8f71130 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c @@ -19,7 +19,10 @@ const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops = {
bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj) {
- return obj->ops == &i915_gem_lmem_obj_ops;
struct intel_memory_region *mr = obj->mm.region;
return mr && (mr->type == INTEL_MEMORY_LOCAL ||
mr->type == INTEL_MEMORY_STOLEN_LOCAL);
}
struct drm_i915_gem_object *
Passable I guess. Although there is also i915_gem_object_is_stolen so it is not immediately clear what are the semantics of i915_gem_object_is_lmem vs that one. Almost like we need more "hierarchy" in region types, or flags of some sort, but I haven't looked at the callers to have a good idea what would work best.
Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Regards,
Tvrtko
From: CQ Tang cq.tang@intel.com
Since stolen can now be device local-memory underneath, we should try to enforce any min_page_size restrictions when allocating pages.
Signed-off-by: CQ Tang cq.tang@intel.com Signed-off-by: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index 56dd58bef5ee..f713eabb7671 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -677,7 +677,8 @@ static int _i915_gem_object_stolen_init(struct intel_memory_region *mem, if (!stolen) return -ENOMEM;
- ret = i915_gem_stolen_insert_node(i915, stolen, size, 4096); + ret = i915_gem_stolen_insert_node(i915, stolen, size, + mem->min_page_size); if (ret) goto err_free;
@@ -817,8 +818,8 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *i915,
/* KISS and expect everything to be page-aligned */ if (GEM_WARN_ON(size == 0) || - GEM_WARN_ON(!IS_ALIGNED(size, I915_GTT_PAGE_SIZE)) || - GEM_WARN_ON(!IS_ALIGNED(stolen_offset, I915_GTT_MIN_ALIGNMENT))) + GEM_WARN_ON(!IS_ALIGNED(size, mem->min_page_size)) || + GEM_WARN_ON(!IS_ALIGNED(stolen_offset, mem->min_page_size))) return ERR_PTR(-EINVAL);
stolen = kzalloc(sizeof(*stolen), GFP_KERNEL);
On 12/04/2021 10:05, Matthew Auld wrote:
From: CQ Tang cq.tang@intel.com
Since stolen can now be device local-memory underneath, we should try to enforce any min_page_size restrictions when allocating pages.
Signed-off-by: CQ Tang cq.tang@intel.com Signed-off-by: Matthew Auld matthew.auld@intel.com
drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index 56dd58bef5ee..f713eabb7671 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -677,7 +677,8 @@ static int _i915_gem_object_stolen_init(struct intel_memory_region *mem, if (!stolen) return -ENOMEM;
- ret = i915_gem_stolen_insert_node(i915, stolen, size, 4096);
- ret = i915_gem_stolen_insert_node(i915, stolen, size,
if (ret) goto err_free;mem->min_page_size);
@@ -817,8 +818,8 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *i915,
/* KISS and expect everything to be page-aligned */ if (GEM_WARN_ON(size == 0) ||
GEM_WARN_ON(!IS_ALIGNED(size, I915_GTT_PAGE_SIZE)) ||
GEM_WARN_ON(!IS_ALIGNED(stolen_offset, I915_GTT_MIN_ALIGNMENT)))
GEM_WARN_ON(!IS_ALIGNED(size, mem->min_page_size)) ||
GEM_WARN_ON(!IS_ALIGNED(stolen_offset, mem->min_page_size)))
return ERR_PTR(-EINVAL);
stolen = kzalloc(sizeof(*stolen), GFP_KERNEL);
Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Regards,
Tvrtko
From: CQ Tang cq.tang@intel.com
Stolen memory is always allocated as physically contiguous pages, mark the object flags as such.
Signed-off-by: CQ Tang cq.tang@intel.com Signed-off-by: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index f713eabb7671..49a2dfcc8ba7 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -633,14 +633,15 @@ static const struct drm_i915_gem_object_ops i915_gem_object_stolen_ops = {
static int __i915_gem_object_create_stolen(struct intel_memory_region *mem, struct drm_i915_gem_object *obj, - struct drm_mm_node *stolen) + struct drm_mm_node *stolen, + unsigned int flags) { static struct lock_class_key lock_class; unsigned int cache_level; int err;
drm_gem_private_object_init(&mem->i915->drm, &obj->base, stolen->size); - i915_gem_object_init(obj, &i915_gem_object_stolen_ops, &lock_class, 0); + i915_gem_object_init(obj, &i915_gem_object_stolen_ops, &lock_class, flags);
obj->stolen = stolen; obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT; @@ -682,7 +683,7 @@ static int _i915_gem_object_stolen_init(struct intel_memory_region *mem, if (ret) goto err_free;
- ret = __i915_gem_object_create_stolen(mem, obj, stolen); + ret = __i915_gem_object_create_stolen(mem, obj, stolen, flags); if (ret) goto err_remove;
@@ -840,7 +841,8 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *i915, goto err_stolen; }
- ret = __i915_gem_object_create_stolen(mem, obj, stolen); + ret = __i915_gem_object_create_stolen(mem, obj, stolen, + I915_BO_ALLOC_CONTIGUOUS); if (ret) goto err_object_free;
On 12/04/2021 10:05, Matthew Auld wrote:
From: CQ Tang cq.tang@intel.com
Stolen memory is always allocated as physically contiguous pages, mark the object flags as such.
Signed-off-by: CQ Tang cq.tang@intel.com Signed-off-by: Matthew Auld matthew.auld@intel.com
drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index f713eabb7671..49a2dfcc8ba7 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -633,14 +633,15 @@ static const struct drm_i915_gem_object_ops i915_gem_object_stolen_ops = {
static int __i915_gem_object_create_stolen(struct intel_memory_region *mem, struct drm_i915_gem_object *obj,
struct drm_mm_node *stolen)
struct drm_mm_node *stolen,
unsigned int flags)
{ static struct lock_class_key lock_class; unsigned int cache_level; int err;
drm_gem_private_object_init(&mem->i915->drm, &obj->base, stolen->size);
- i915_gem_object_init(obj, &i915_gem_object_stolen_ops, &lock_class, 0);
i915_gem_object_init(obj, &i915_gem_object_stolen_ops, &lock_class, flags);
obj->stolen = stolen; obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
@@ -682,7 +683,7 @@ static int _i915_gem_object_stolen_init(struct intel_memory_region *mem, if (ret) goto err_free;
- ret = __i915_gem_object_create_stolen(mem, obj, stolen);
- ret = __i915_gem_object_create_stolen(mem, obj, stolen, flags); if (ret) goto err_remove;
@@ -840,7 +841,8 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *i915, goto err_stolen; }
- ret = __i915_gem_object_create_stolen(mem, obj, stolen);
- ret = __i915_gem_object_create_stolen(mem, obj, stolen,
if (ret) goto err_object_free;I915_BO_ALLOC_CONTIGUOUS);
Are all stolen objects always contiguous, or only the ones allocated by i915_gem_object_create_stolen_for_preallocated? If the former, should __i915_gem_object_create_stolen just set the flag without the need to pass it in?
Regards,
Tvrtko
On 14/04/2021 16:09, Tvrtko Ursulin wrote:
On 12/04/2021 10:05, Matthew Auld wrote:
From: CQ Tang cq.tang@intel.com
Stolen memory is always allocated as physically contiguous pages, mark the object flags as such.
Signed-off-by: CQ Tang cq.tang@intel.com Signed-off-by: Matthew Auld matthew.auld@intel.com
drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 10 ++++++---- Â 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index f713eabb7671..49a2dfcc8ba7 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -633,14 +633,15 @@ static const struct drm_i915_gem_object_ops i915_gem_object_stolen_ops = { Â static int __i915_gem_object_create_stolen(struct intel_memory_region *mem, Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â struct drm_i915_gem_object *obj, -Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â struct drm_mm_node *stolen) +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â struct drm_mm_node *stolen, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â unsigned int flags) Â { Â Â Â Â Â static struct lock_class_key lock_class; Â Â Â Â Â unsigned int cache_level; Â Â Â Â Â int err; Â Â Â Â Â drm_gem_private_object_init(&mem->i915->drm, &obj->base, stolen->size); -Â Â Â i915_gem_object_init(obj, &i915_gem_object_stolen_ops, &lock_class, 0); +Â Â Â i915_gem_object_init(obj, &i915_gem_object_stolen_ops, &lock_class, flags); Â Â Â Â Â obj->stolen = stolen; Â Â Â Â Â obj->read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT; @@ -682,7 +683,7 @@ static int _i915_gem_object_stolen_init(struct intel_memory_region *mem, Â Â Â Â Â if (ret) Â Â Â Â Â Â Â Â Â goto err_free; -Â Â Â ret = __i915_gem_object_create_stolen(mem, obj, stolen); +Â Â Â ret = __i915_gem_object_create_stolen(mem, obj, stolen, flags); Â Â Â Â Â if (ret) Â Â Â Â Â Â Â Â Â goto err_remove; @@ -840,7 +841,8 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *i915, Â Â Â Â Â Â Â Â Â goto err_stolen; Â Â Â Â Â } -Â Â Â ret = __i915_gem_object_create_stolen(mem, obj, stolen); +Â Â Â ret = __i915_gem_object_create_stolen(mem, obj, stolen, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â I915_BO_ALLOC_CONTIGUOUS); Â Â Â Â Â if (ret) Â Â Â Â Â Â Â Â Â goto err_object_free;
Are all stolen objects always contiguous, or only the ones allocated by i915_gem_object_create_stolen_for_preallocated? If the former, should __i915_gem_object_create_stolen just set the flag without the need to pass it in?
Yes, all stolen objects are physically contiguous. Agreed, moving the I915_BO_ALLOC_CONTIGUOUS into __i915_gem_object_create_stolen() makes more sense here.
Regards,
Tvrtko
From: Mohammed Khajapasha mohammed.khajapasha@intel.com
Use the local memory IO BAR address for fbdev's fb_mmap() operation on discrete; fbdev uses the physical address of our framebuffer for its fb_mmap() fn.
Signed-off-by: Mohammed Khajapasha mohammed.khajapasha@intel.com --- drivers/gpu/drm/i915/display/intel_fbdev.c | 29 +++++++++++++++++----- 1 file changed, 23 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c b/drivers/gpu/drm/i915/display/intel_fbdev.c index ccd00e65a5fe..2b37959da747 100644 --- a/drivers/gpu/drm/i915/display/intel_fbdev.c +++ b/drivers/gpu/drm/i915/display/intel_fbdev.c @@ -41,6 +41,8 @@ #include <drm/drm_fb_helper.h> #include <drm/drm_fourcc.h>
+#include "gem/i915_gem_lmem.h" + #include "i915_drv.h" #include "intel_display_types.h" #include "intel_fbdev.h" @@ -178,6 +180,7 @@ static int intelfb_create(struct drm_fb_helper *helper, unsigned long flags = 0; bool prealloc = false; void __iomem *vaddr; + struct drm_i915_gem_object *obj; int ret;
if (intel_fb && @@ -232,13 +235,27 @@ static int intelfb_create(struct drm_fb_helper *helper, info->fbops = &intelfb_ops;
/* setup aperture base/size for vesafb takeover */ - info->apertures->ranges[0].base = ggtt->gmadr.start; - info->apertures->ranges[0].size = ggtt->mappable_end; + obj = intel_fb_obj(&intel_fb->base); + if (i915_gem_object_is_lmem(obj)) { + struct intel_memory_region *mem = obj->mm.region; + + info->apertures->ranges[0].base = mem->io_start; + info->apertures->ranges[0].size = mem->total; + + /* Use fbdev's framebuffer from lmem for discrete */ + info->fix.smem_start = + (unsigned long)(mem->io_start + + i915_gem_object_get_dma_address(obj, 0)); + info->fix.smem_len = obj->base.size; + } else { + info->apertures->ranges[0].base = ggtt->gmadr.start; + info->apertures->ranges[0].size = ggtt->mappable_end;
- /* Our framebuffer is the entirety of fbdev's system memory */ - info->fix.smem_start = - (unsigned long)(ggtt->gmadr.start + vma->node.start); - info->fix.smem_len = vma->node.size; + /* Our framebuffer is the entirety of fbdev's system memory */ + info->fix.smem_start = + (unsigned long)(ggtt->gmadr.start + vma->node.start); + info->fix.smem_len = vma->node.size; + }
vaddr = i915_vma_pin_iomap(vma); if (IS_ERR(vaddr)) {
On Mon, Apr 12, 2021 at 10:05:14AM +0100, Matthew Auld wrote:
From: Mohammed Khajapasha mohammed.khajapasha@intel.com
Use the local memory IO BAR address for fbdev's fb_mmap() operation on discrete; fbdev uses the physical address of our framebuffer for its fb_mmap() fn.
Signed-off-by: Mohammed Khajapasha mohammed.khajapasha@intel.com
Sob missing (I didn't check all previous patches), but also I think we should aim more to reuse drm fbdev helpers and retire our owns here. Eventually, long-term, and all that. -Daniel
drivers/gpu/drm/i915/display/intel_fbdev.c | 29 +++++++++++++++++----- 1 file changed, 23 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c b/drivers/gpu/drm/i915/display/intel_fbdev.c index ccd00e65a5fe..2b37959da747 100644 --- a/drivers/gpu/drm/i915/display/intel_fbdev.c +++ b/drivers/gpu/drm/i915/display/intel_fbdev.c @@ -41,6 +41,8 @@ #include <drm/drm_fb_helper.h> #include <drm/drm_fourcc.h>
+#include "gem/i915_gem_lmem.h"
#include "i915_drv.h" #include "intel_display_types.h" #include "intel_fbdev.h" @@ -178,6 +180,7 @@ static int intelfb_create(struct drm_fb_helper *helper, unsigned long flags = 0; bool prealloc = false; void __iomem *vaddr;
struct drm_i915_gem_object *obj; int ret;
if (intel_fb &&
@@ -232,13 +235,27 @@ static int intelfb_create(struct drm_fb_helper *helper, info->fbops = &intelfb_ops;
/* setup aperture base/size for vesafb takeover */
- info->apertures->ranges[0].base = ggtt->gmadr.start;
- info->apertures->ranges[0].size = ggtt->mappable_end;
- obj = intel_fb_obj(&intel_fb->base);
- if (i915_gem_object_is_lmem(obj)) {
struct intel_memory_region *mem = obj->mm.region;
info->apertures->ranges[0].base = mem->io_start;
info->apertures->ranges[0].size = mem->total;
/* Use fbdev's framebuffer from lmem for discrete */
info->fix.smem_start =
(unsigned long)(mem->io_start +
i915_gem_object_get_dma_address(obj, 0));
info->fix.smem_len = obj->base.size;
- } else {
info->apertures->ranges[0].base = ggtt->gmadr.start;
info->apertures->ranges[0].size = ggtt->mappable_end;
- /* Our framebuffer is the entirety of fbdev's system memory */
- info->fix.smem_start =
(unsigned long)(ggtt->gmadr.start + vma->node.start);
- info->fix.smem_len = vma->node.size;
/* Our framebuffer is the entirety of fbdev's system memory */
info->fix.smem_start =
(unsigned long)(ggtt->gmadr.start + vma->node.start);
info->fix.smem_len = vma->node.size;
}
vaddr = i915_vma_pin_iomap(vma); if (IS_ERR(vaddr)) {
-- 2.26.3
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
From: Mohammed Khajapasha mohammed.khajapasha@intel.com
Return an EREMOTE value when the framebuffer object is not backed by LMEM for discrete. If local memory is supported by the hardware, the framebuffer's backing GEM objects should be from local memory.
Signed-off-by: Mohammed Khajapasha mohammed.khajapasha@intel.com --- drivers/gpu/drm/i915/display/intel_display.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index 411b46c012f8..57b06d8728af 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -63,6 +63,7 @@ #include "display/intel_vdsc.h" #include "display/intel_vrr.h"
+#include "gem/i915_gem_lmem.h" #include "gem/i915_gem_object.h"
#include "gt/intel_rps.h" @@ -11279,11 +11280,20 @@ intel_user_framebuffer_create(struct drm_device *dev, struct drm_framebuffer *fb; struct drm_i915_gem_object *obj; struct drm_mode_fb_cmd2 mode_cmd = *user_mode_cmd; + struct drm_i915_private *i915;
obj = i915_gem_object_lookup(filp, mode_cmd.handles[0]); if (!obj) return ERR_PTR(-ENOENT);
+ /* object is backed with LMEM for discrete */ + i915 = to_i915(obj->base.dev); + if (HAS_LMEM(i915) && !i915_gem_object_is_lmem(obj)) { + /* object is "remote", not in local memory */ + i915_gem_object_put(obj); + return ERR_PTR(-EREMOTE); + } + fb = intel_framebuffer_create(obj, &mode_cmd); i915_gem_object_put(obj);
On 12/04/2021 10:05, Matthew Auld wrote:
From: Mohammed Khajapasha mohammed.khajapasha@intel.com
Return EREMOTE value when frame buffer object is not backed by LMEM for discrete. If Local memory is supported by hardware the framebuffer backing gem objects should be from local memory.
Signed-off-by: Mohammed Khajapasha mohammed.khajapasha@intel.com
drivers/gpu/drm/i915/display/intel_display.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index 411b46c012f8..57b06d8728af 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -63,6 +63,7 @@ #include "display/intel_vdsc.h" #include "display/intel_vrr.h"
+#include "gem/i915_gem_lmem.h" #include "gem/i915_gem_object.h"
#include "gt/intel_rps.h" @@ -11279,11 +11280,20 @@ intel_user_framebuffer_create(struct drm_device *dev, struct drm_framebuffer *fb; struct drm_i915_gem_object *obj; struct drm_mode_fb_cmd2 mode_cmd = *user_mode_cmd;
struct drm_i915_private *i915;
obj = i915_gem_object_lookup(filp, mode_cmd.handles[0]); if (!obj) return ERR_PTR(-ENOENT);
/* object is backed with LMEM for discrete */
i915 = to_i915(obj->base.dev);
if (HAS_LMEM(i915) && !i915_gem_object_is_lmem(obj)) {
/* object is "remote", not in local memory */
i915_gem_object_put(obj);
return ERR_PTR(-EREMOTE);
I am a fan of rich errnos and this one feels appropriately descriptive, but please get an ack from Daniel or so.
Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Regards,
Tvrtko
- }
- fb = intel_framebuffer_create(obj, &mode_cmd); i915_gem_object_put(obj);
From: Matt Roper matthew.d.roper@intel.com
Boot firmware performs memory training and health assessment during startup. If the memory training fails, the firmware will consider the GPU unusable and will instruct the punit to keep the GT powered down. If this happens, our driver will be unable to communicate with the GT (all GT registers will read back as 0, forcewake requests will timeout, etc.) so we should abort driver initialization if this happens. We can confirm that LMEM was initialized successfully via sgunit register GU_CNTL.
Bspec: 53111 Signed-off-by: Matt Roper matthew.d.roper@intel.com Cc: Caz Yokoyama Caz.Yokoyama@intel.com Reviewed-by: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/i915_reg.h | 3 +++ drivers/gpu/drm/i915/intel_uncore.c | 12 ++++++++++++ 2 files changed, 15 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 4108f2a7ebfa..da73dc939e58 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -487,6 +487,9 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define GAB_CTL _MMIO(0x24000) #define GAB_CTL_CONT_AFTER_PAGEFAULT (1 << 8)
+#define GU_CNTL _MMIO(0x101010) +#define LMEM_INIT REG_BIT(7) + #define GEN6_STOLEN_RESERVED _MMIO(0x1082C0) #define GEN6_STOLEN_RESERVED_ADDR_MASK (0xFFF << 20) #define GEN7_STOLEN_RESERVED_ADDR_MASK (0x3FFF << 18) diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index 661b50191f2b..4d0605757428 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -1917,6 +1917,18 @@ int intel_uncore_init_mmio(struct intel_uncore *uncore) if (ret) return ret;
+ /* + * The boot firmware initializes local memory and assesses its health. + * If memory training fails, the punit will have been instructed to + * keep the GT powered down; we won't be able to communicate with it + * and we should not continue with driver initialization. + */ + if (IS_DGFX(i915) && + !(__raw_uncore_read32(uncore, GU_CNTL) & LMEM_INIT)) { + drm_err(&i915->drm, "LMEM not initialized by firmware\n"); + return -ENODEV; + } + if (INTEL_GEN(i915) > 5 && !intel_vgpu_active(i915)) uncore->flags |= UNCORE_HAS_FORCEWAKE;
From: Venkata Ramana Nayana venkata.ramana.nayana@intel.com
Use I915_MAP_WC when the default state object is allocated in LMEM.
Signed-off-by: Venkata Ramana Nayana venkata.ramana.nayana@intel.com Reviewed-by: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gt/shmem_utils.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c index f8f02aab842b..0683b27a3890 100644 --- a/drivers/gpu/drm/i915/gt/shmem_utils.c +++ b/drivers/gpu/drm/i915/gt/shmem_utils.c @@ -8,6 +8,7 @@ #include <linux/shmem_fs.h>
#include "gem/i915_gem_object.h" +#include "gem/i915_gem_lmem.h" #include "shmem_utils.h"
struct file *shmem_create_from_data(const char *name, void *data, size_t len) @@ -39,7 +40,8 @@ struct file *shmem_create_from_object(struct drm_i915_gem_object *obj) return file; }
- ptr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); + ptr = i915_gem_object_pin_map_unlocked(obj, i915_gem_object_is_lmem(obj) ? + I915_MAP_WC : I915_MAP_WB); if (IS_ERR(ptr)) return ERR_CAST(ptr);
From: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com
Determine the possible coherent map type based on the object's location, and on whether the target has an LLC or the caller requires an always-coherent mapping.
Cc: Matthew Auld matthew.auld@intel.com Cc: CQ Tang cq.tang@intel.com Suggested-by: Michal Wajdeczko michal.wajdeczko@intel.com Signed-off-by: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 ++- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 4 +++- drivers/gpu/drm/i915/gt/intel_ring.c | 9 ++++++--- drivers/gpu/drm/i915/gt/selftest_context.c | 3 ++- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 ++-- drivers/gpu/drm/i915/gt/selftest_lrc.c | 4 +++- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 4 +++- drivers/gpu/drm/i915/gt/uc/intel_huc.c | 4 +++- drivers/gpu/drm/i915/i915_drv.h | 11 +++++++++-- drivers/gpu/drm/i915/selftests/igt_spinner.c | 4 ++-- 11 files changed, 36 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index efe935f80c1a..b79568d370f5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -664,7 +664,8 @@ static int init_status_page(struct intel_engine_cs *engine) if (ret) goto err;
- vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); + vaddr = i915_gem_object_pin_map(obj, + i915_coherent_map_type(engine->i915, obj, true)); if (IS_ERR(vaddr)) { ret = PTR_ERR(vaddr); goto err_unpin; diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 7c9af86fdb1e..47f4397095e5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -23,7 +23,7 @@ static void dbg_poison_ce(struct intel_context *ce)
if (ce->state) { struct drm_i915_gem_object *obj = ce->state->obj; - int type = i915_coherent_map_type(ce->engine->i915); + int type = i915_coherent_map_type(ce->engine->i915, obj, true); void *map;
if (!i915_gem_object_trylock(obj)) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index e86897cde984..aafe2a4df496 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -903,7 +903,9 @@ lrc_pre_pin(struct intel_context *ce, GEM_BUG_ON(!i915_vma_is_pinned(ce->state));
*vaddr = i915_gem_object_pin_map(ce->state->obj, - i915_coherent_map_type(ce->engine->i915) | + i915_coherent_map_type(ce->engine->i915, + ce->state->obj, + false) | I915_MAP_OVERRIDE);
return PTR_ERR_OR_ZERO(*vaddr); diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c index aee0a77c77e0..3cf6c7e68108 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring.c +++ b/drivers/gpu/drm/i915/gt/intel_ring.c @@ -53,9 +53,12 @@ int intel_ring_pin(struct intel_ring *ring, struct i915_gem_ww_ctx *ww)
if (i915_vma_is_map_and_fenceable(vma)) addr = (void __force *)i915_vma_pin_iomap(vma); - else - addr = i915_gem_object_pin_map(vma->obj, - i915_coherent_map_type(vma->vm->i915)); + else { + int type = i915_coherent_map_type(vma->vm->i915, vma->obj, false); + + addr = i915_gem_object_pin_map(vma->obj, type); + } + if (IS_ERR(addr)) { ret = PTR_ERR(addr); goto err_ring; diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c index b9bdd1d23243..26685b927169 100644 --- a/drivers/gpu/drm/i915/gt/selftest_context.c +++ b/drivers/gpu/drm/i915/gt/selftest_context.c @@ -88,7 +88,8 @@ static int __live_context_size(struct intel_engine_cs *engine) goto err;
vaddr = i915_gem_object_pin_map_unlocked(ce->state->obj, - i915_coherent_map_type(engine->i915)); + i915_coherent_map_type(engine->i915, + ce->state->obj, false)); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); intel_context_unpin(ce); diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index 746985971c3a..5b63d4df8c93 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -69,7 +69,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt) h->seqno = memset(vaddr, 0xff, PAGE_SIZE);
vaddr = i915_gem_object_pin_map_unlocked(h->obj, - i915_coherent_map_type(gt->i915)); + i915_coherent_map_type(gt->i915, h->obj, false)); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); goto err_unpin_hws; @@ -130,7 +130,7 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine) return ERR_CAST(obj); }
- vaddr = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(gt->i915)); + vaddr = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(gt->i915, obj, false)); if (IS_ERR(vaddr)) { i915_gem_object_put(obj); i915_vm_put(vm); diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index 85e7df6a5123..d8f6623524e8 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -1221,7 +1221,9 @@ static int compare_isolation(struct intel_engine_cs *engine, }
lrc = i915_gem_object_pin_map_unlocked(ce->state->obj, - i915_coherent_map_type(engine->i915)); + i915_coherent_map_type(engine->i915, + ce->state->obj, + false)); if (IS_ERR(lrc)) { err = PTR_ERR(lrc); goto err_B1; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 78305b2ec89d..adae04c47aab 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -682,7 +682,9 @@ int intel_guc_allocate_and_map_vma(struct intel_guc *guc, u32 size, if (IS_ERR(vma)) return PTR_ERR(vma);
- vaddr = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WB); + vaddr = i915_gem_object_pin_map_unlocked(vma->obj, + i915_coherent_map_type(guc_to_gt(guc)->i915, + vma->obj, true)); if (IS_ERR(vaddr)) { i915_vma_unpin_and_release(&vma, 0); return PTR_ERR(vaddr); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c index 2126dd81ac38..56d2144dc6a0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c @@ -82,7 +82,9 @@ static int intel_huc_rsa_data_create(struct intel_huc *huc) if (IS_ERR(vma)) return PTR_ERR(vma);
- vaddr = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WB); + vaddr = i915_gem_object_pin_map_unlocked(vma->obj, + i915_coherent_map_type(gt->i915, + vma->obj, true)); if (IS_ERR(vaddr)) { i915_vma_unpin_and_release(&vma, 0); return PTR_ERR(vaddr); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 69e43bf91a15..2abbc06712a4 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -78,6 +78,7 @@ #include "gem/i915_gem_context_types.h" #include "gem/i915_gem_shrinker.h" #include "gem/i915_gem_stolen.h" +#include "gem/i915_gem_lmem.h"
#include "gt/intel_engine.h" #include "gt/intel_gt_types.h" @@ -1921,9 +1922,15 @@ static inline int intel_hws_csb_write_index(struct drm_i915_private *i915) }
static inline enum i915_map_type -i915_coherent_map_type(struct drm_i915_private *i915) +i915_coherent_map_type(struct drm_i915_private *i915, + struct drm_i915_gem_object *obj, bool always_coherent) { - return HAS_LLC(i915) ? I915_MAP_WB : I915_MAP_WC; + if (i915_gem_object_is_lmem(obj)) + return I915_MAP_WC; + if (HAS_LLC(i915) || always_coherent) + return I915_MAP_WB; + else + return I915_MAP_WC; }
#endif diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c index cfbbe415b57c..5fe397b7d1d9 100644 --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c @@ -94,9 +94,9 @@ int igt_spinner_pin(struct igt_spinner *spin, }
if (!spin->batch) { - unsigned int mode = - i915_coherent_map_type(spin->gt->i915); + unsigned int mode;
+ mode = i915_coherent_map_type(spin->gt->i915, spin->obj, false); vaddr = igt_spinner_pin_obj(ce, ww, spin->obj, mode, &spin->batch_vma); if (IS_ERR(vaddr)) return PTR_ERR(vaddr);
On 12/04/2021 10:05, Matthew Auld wrote:
From: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com
Determine the possible coherent map type based on object location, and if target has llc or if user requires an always coherent mapping.
Cc: Matthew Auld matthew.auld@intel.com Cc: CQ Tang cq.tang@intel.com Suggested-by: Michal Wajdeczko michal.wajdeczko@intel.com Signed-off-by: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com
drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 ++- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 4 +++- drivers/gpu/drm/i915/gt/intel_ring.c | 9 ++++++--- drivers/gpu/drm/i915/gt/selftest_context.c | 3 ++- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 ++-- drivers/gpu/drm/i915/gt/selftest_lrc.c | 4 +++- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 4 +++- drivers/gpu/drm/i915/gt/uc/intel_huc.c | 4 +++- drivers/gpu/drm/i915/i915_drv.h | 11 +++++++++-- drivers/gpu/drm/i915/selftests/igt_spinner.c | 4 ++-- 11 files changed, 36 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index efe935f80c1a..b79568d370f5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -664,7 +664,8 @@ static int init_status_page(struct intel_engine_cs *engine) if (ret) goto err;
- vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
- vaddr = i915_gem_object_pin_map(obj,
if (IS_ERR(vaddr)) { ret = PTR_ERR(vaddr); goto err_unpin;i915_coherent_map_type(engine->i915, obj, true));
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 7c9af86fdb1e..47f4397095e5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -23,7 +23,7 @@ static void dbg_poison_ce(struct intel_context *ce)
if (ce->state) { struct drm_i915_gem_object *obj = ce->state->obj;
int type = i915_coherent_map_type(ce->engine->i915);
int type = i915_coherent_map_type(ce->engine->i915, obj, true);
void *map;
if (!i915_gem_object_trylock(obj))
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index e86897cde984..aafe2a4df496 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -903,7 +903,9 @@ lrc_pre_pin(struct intel_context *ce, GEM_BUG_ON(!i915_vma_is_pinned(ce->state));
*vaddr = i915_gem_object_pin_map(ce->state->obj,
i915_coherent_map_type(ce->engine->i915) |
i915_coherent_map_type(ce->engine->i915,
ce->state->obj,
false) | I915_MAP_OVERRIDE);
return PTR_ERR_OR_ZERO(*vaddr);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c index aee0a77c77e0..3cf6c7e68108 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring.c +++ b/drivers/gpu/drm/i915/gt/intel_ring.c @@ -53,9 +53,12 @@ int intel_ring_pin(struct intel_ring *ring, struct i915_gem_ww_ctx *ww)
if (i915_vma_is_map_and_fenceable(vma)) addr = (void __force *)i915_vma_pin_iomap(vma);
- else
addr = i915_gem_object_pin_map(vma->obj,
i915_coherent_map_type(vma->vm->i915));
- else {
int type = i915_coherent_map_type(vma->vm->i915, vma->obj, false);
addr = i915_gem_object_pin_map(vma->obj, type);
- }
- if (IS_ERR(addr)) { ret = PTR_ERR(addr); goto err_ring;
diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c index b9bdd1d23243..26685b927169 100644 --- a/drivers/gpu/drm/i915/gt/selftest_context.c +++ b/drivers/gpu/drm/i915/gt/selftest_context.c @@ -88,7 +88,8 @@ static int __live_context_size(struct intel_engine_cs *engine) goto err;
vaddr = i915_gem_object_pin_map_unlocked(ce->state->obj,
i915_coherent_map_type(engine->i915));
i915_coherent_map_type(engine->i915,
if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); intel_context_unpin(ce);ce->state->obj, false));
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index 746985971c3a..5b63d4df8c93 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -69,7 +69,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt) h->seqno = memset(vaddr, 0xff, PAGE_SIZE);
vaddr = i915_gem_object_pin_map_unlocked(h->obj,
i915_coherent_map_type(gt->i915));
if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); goto err_unpin_hws;i915_coherent_map_type(gt->i915, h->obj, false));
@@ -130,7 +130,7 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine) return ERR_CAST(obj); }
- vaddr = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(gt->i915));
- vaddr = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(gt->i915, obj, false)); if (IS_ERR(vaddr)) { i915_gem_object_put(obj); i915_vm_put(vm);
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index 85e7df6a5123..d8f6623524e8 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -1221,7 +1221,9 @@ static int compare_isolation(struct intel_engine_cs *engine, }
lrc = i915_gem_object_pin_map_unlocked(ce->state->obj,
i915_coherent_map_type(engine->i915));
i915_coherent_map_type(engine->i915,
ce->state->obj,
if (IS_ERR(lrc)) { err = PTR_ERR(lrc); goto err_B1;false));
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 78305b2ec89d..adae04c47aab 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -682,7 +682,9 @@ int intel_guc_allocate_and_map_vma(struct intel_guc *guc, u32 size, if (IS_ERR(vma)) return PTR_ERR(vma);
- vaddr = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WB);
- vaddr = i915_gem_object_pin_map_unlocked(vma->obj,
i915_coherent_map_type(guc_to_gt(guc)->i915,
if (IS_ERR(vaddr)) { i915_vma_unpin_and_release(&vma, 0); return PTR_ERR(vaddr);vma->obj, true));
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c index 2126dd81ac38..56d2144dc6a0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c @@ -82,7 +82,9 @@ static int intel_huc_rsa_data_create(struct intel_huc *huc) if (IS_ERR(vma)) return PTR_ERR(vma);
- vaddr = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WB);
- vaddr = i915_gem_object_pin_map_unlocked(vma->obj,
i915_coherent_map_type(gt->i915,
if (IS_ERR(vaddr)) { i915_vma_unpin_and_release(&vma, 0); return PTR_ERR(vaddr);vma->obj, true));
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 69e43bf91a15..2abbc06712a4 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -78,6 +78,7 @@ #include "gem/i915_gem_context_types.h" #include "gem/i915_gem_shrinker.h" #include "gem/i915_gem_stolen.h" +#include "gem/i915_gem_lmem.h"
#include "gt/intel_engine.h" #include "gt/intel_gt_types.h" @@ -1921,9 +1922,15 @@ static inline int intel_hws_csb_write_index(struct drm_i915_private *i915) }
static inline enum i915_map_type -i915_coherent_map_type(struct drm_i915_private *i915) +i915_coherent_map_type(struct drm_i915_private *i915,
{struct drm_i915_gem_object *obj, bool always_coherent)
- return HAS_LLC(i915) ? I915_MAP_WB : I915_MAP_WC;
- if (i915_gem_object_is_lmem(obj))
return I915_MAP_WC;
- if (HAS_LLC(i915) || always_coherent)
return I915_MAP_WB;
- else
return I915_MAP_WC;
Seems this patch is doing two things.
First it is adding lmem support to this helper by always returning WC for lmem objects.
Secondly it is introducing an idea of "always coherent" in a helper called i915_coherent_map_type. Could someone explain what is coherent vs always coherent?
And also, why is always coherent happy with WB? Sounds counter intuitive to me.
Regards,
Tvrtko
}
#endif diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c index cfbbe415b57c..5fe397b7d1d9 100644 --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c @@ -94,9 +94,9 @@ int igt_spinner_pin(struct igt_spinner *spin, }
if (!spin->batch) {
unsigned int mode =
i915_coherent_map_type(spin->gt->i915);
unsigned int mode;
mode = i915_coherent_map_type(spin->gt->i915, spin->obj, false);
vaddr = igt_spinner_pin_obj(ce, ww, spin->obj, mode, &spin->batch_vma); if (IS_ERR(vaddr)) return PTR_ERR(vaddr);
On Wed, 14 Apr 2021 at 16:22, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 12/04/2021 10:05, Matthew Auld wrote:
From: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com
Determine the possible coherent map type based on object location, and if target has llc or if user requires an always coherent mapping.
Cc: Matthew Auld matthew.auld@intel.com Cc: CQ Tang cq.tang@intel.com Suggested-by: Michal Wajdeczko michal.wajdeczko@intel.com Signed-off-by: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com
drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 ++- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 4 +++- drivers/gpu/drm/i915/gt/intel_ring.c | 9 ++++++--- drivers/gpu/drm/i915/gt/selftest_context.c | 3 ++- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 ++-- drivers/gpu/drm/i915/gt/selftest_lrc.c | 4 +++- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 4 +++- drivers/gpu/drm/i915/gt/uc/intel_huc.c | 4 +++- drivers/gpu/drm/i915/i915_drv.h | 11 +++++++++-- drivers/gpu/drm/i915/selftests/igt_spinner.c | 4 ++-- 11 files changed, 36 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index efe935f80c1a..b79568d370f5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -664,7 +664,8 @@ static int init_status_page(struct intel_engine_cs *engine) if (ret) goto err;
vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
vaddr = i915_gem_object_pin_map(obj,
i915_coherent_map_type(engine->i915, obj, true)); if (IS_ERR(vaddr)) { ret = PTR_ERR(vaddr); goto err_unpin;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 7c9af86fdb1e..47f4397095e5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -23,7 +23,7 @@ static void dbg_poison_ce(struct intel_context *ce)
if (ce->state) { struct drm_i915_gem_object *obj = ce->state->obj;
int type = i915_coherent_map_type(ce->engine->i915);
int type = i915_coherent_map_type(ce->engine->i915, obj, true); void *map; if (!i915_gem_object_trylock(obj))
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index e86897cde984..aafe2a4df496 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -903,7 +903,9 @@ lrc_pre_pin(struct intel_context *ce, GEM_BUG_ON(!i915_vma_is_pinned(ce->state));
*vaddr = i915_gem_object_pin_map(ce->state->obj,
i915_coherent_map_type(ce->engine->i915) |
i915_coherent_map_type(ce->engine->i915,
ce->state->obj,
false) | I915_MAP_OVERRIDE); return PTR_ERR_OR_ZERO(*vaddr);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c index aee0a77c77e0..3cf6c7e68108 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring.c +++ b/drivers/gpu/drm/i915/gt/intel_ring.c @@ -53,9 +53,12 @@ int intel_ring_pin(struct intel_ring *ring, struct i915_gem_ww_ctx *ww)
if (i915_vma_is_map_and_fenceable(vma)) addr = (void __force *)i915_vma_pin_iomap(vma);
else
addr = i915_gem_object_pin_map(vma->obj,
i915_coherent_map_type(vma->vm->i915));
else {
int type = i915_coherent_map_type(vma->vm->i915, vma->obj, false);
addr = i915_gem_object_pin_map(vma->obj, type);
}
if (IS_ERR(addr)) { ret = PTR_ERR(addr); goto err_ring;
diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c index b9bdd1d23243..26685b927169 100644 --- a/drivers/gpu/drm/i915/gt/selftest_context.c +++ b/drivers/gpu/drm/i915/gt/selftest_context.c @@ -88,7 +88,8 @@ static int __live_context_size(struct intel_engine_cs *engine) goto err;
vaddr = i915_gem_object_pin_map_unlocked(ce->state->obj,
i915_coherent_map_type(engine->i915));
i915_coherent_map_type(engine->i915,
ce->state->obj, false)); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); intel_context_unpin(ce);
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index 746985971c3a..5b63d4df8c93 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -69,7 +69,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt) h->seqno = memset(vaddr, 0xff, PAGE_SIZE);
vaddr = i915_gem_object_pin_map_unlocked(h->obj,
i915_coherent_map_type(gt->i915));
i915_coherent_map_type(gt->i915, h->obj, false)); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); goto err_unpin_hws;
@@ -130,7 +130,7 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine) return ERR_CAST(obj); }
vaddr = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(gt->i915));
vaddr = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(gt->i915, obj, false)); if (IS_ERR(vaddr)) { i915_gem_object_put(obj); i915_vm_put(vm);
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index 85e7df6a5123..d8f6623524e8 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -1221,7 +1221,9 @@ static int compare_isolation(struct intel_engine_cs *engine, }
lrc = i915_gem_object_pin_map_unlocked(ce->state->obj,
i915_coherent_map_type(engine->i915));
i915_coherent_map_type(engine->i915,
ce->state->obj,
false)); if (IS_ERR(lrc)) { err = PTR_ERR(lrc); goto err_B1;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 78305b2ec89d..adae04c47aab 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -682,7 +682,9 @@ int intel_guc_allocate_and_map_vma(struct intel_guc *guc, u32 size, if (IS_ERR(vma)) return PTR_ERR(vma);
vaddr = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WB);
vaddr = i915_gem_object_pin_map_unlocked(vma->obj,
i915_coherent_map_type(guc_to_gt(guc)->i915,
vma->obj, true)); if (IS_ERR(vaddr)) { i915_vma_unpin_and_release(&vma, 0); return PTR_ERR(vaddr);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c index 2126dd81ac38..56d2144dc6a0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c @@ -82,7 +82,9 @@ static int intel_huc_rsa_data_create(struct intel_huc *huc) if (IS_ERR(vma)) return PTR_ERR(vma);
vaddr = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WB);
vaddr = i915_gem_object_pin_map_unlocked(vma->obj,
i915_coherent_map_type(gt->i915,
vma->obj, true)); if (IS_ERR(vaddr)) { i915_vma_unpin_and_release(&vma, 0); return PTR_ERR(vaddr);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 69e43bf91a15..2abbc06712a4 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -78,6 +78,7 @@ #include "gem/i915_gem_context_types.h" #include "gem/i915_gem_shrinker.h" #include "gem/i915_gem_stolen.h" +#include "gem/i915_gem_lmem.h"
#include "gt/intel_engine.h" #include "gt/intel_gt_types.h" @@ -1921,9 +1922,15 @@ static inline int intel_hws_csb_write_index(struct drm_i915_private *i915) }
static inline enum i915_map_type -i915_coherent_map_type(struct drm_i915_private *i915) +i915_coherent_map_type(struct drm_i915_private *i915,
{struct drm_i915_gem_object *obj, bool always_coherent)
return HAS_LLC(i915) ? I915_MAP_WB : I915_MAP_WC;
if (i915_gem_object_is_lmem(obj))
return I915_MAP_WC;
if (HAS_LLC(i915) || always_coherent)
return I915_MAP_WB;
else
return I915_MAP_WC;
Seems this patch is doing two things.
First it is adding lmem support to this helper by always returning WC for lmem objects.
Secondly it is introducing an idea of "always coherent" in a helper called i915_coherent_map_type. Could someone explain what is coherent vs always coherent?
And also, why is always coherent happy with WB? Sounds counter intuitive to me.
All this does is try to keep the existing behaviour intact, whilst also ensuring that all lmem objects are mapped using only WC, no matter what. The always_coherent=true thing is for the existing places where we sometimes map the object using WB, without first considering whether the device has the fast shared LLC vs snooping. Yes, it's slightly ugly :)
Regards,
Tvrtko
}
#endif diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c index cfbbe415b57c..5fe397b7d1d9 100644 --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c @@ -94,9 +94,9 @@ int igt_spinner_pin(struct igt_spinner *spin, }
if (!spin->batch) {
unsigned int mode =
i915_coherent_map_type(spin->gt->i915);
unsigned int mode;
mode = i915_coherent_map_type(spin->gt->i915, spin->obj, false); vaddr = igt_spinner_pin_obj(ce, ww, spin->obj, mode, &spin->batch_vma); if (IS_ERR(vaddr)) return PTR_ERR(vaddr);
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On 14/04/2021 17:20, Matthew Auld wrote:
On Wed, 14 Apr 2021 at 16:22, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 12/04/2021 10:05, Matthew Auld wrote:
From: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com
Determine the possible coherent map type based on object location, and if target has llc or if user requires an always coherent mapping.
Cc: Matthew Auld matthew.auld@intel.com Cc: CQ Tang cq.tang@intel.com Suggested-by: Michal Wajdeczko michal.wajdeczko@intel.com Signed-off-by: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com
drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 ++- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 4 +++- drivers/gpu/drm/i915/gt/intel_ring.c | 9 ++++++--- drivers/gpu/drm/i915/gt/selftest_context.c | 3 ++- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 ++-- drivers/gpu/drm/i915/gt/selftest_lrc.c | 4 +++- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 4 +++- drivers/gpu/drm/i915/gt/uc/intel_huc.c | 4 +++- drivers/gpu/drm/i915/i915_drv.h | 11 +++++++++-- drivers/gpu/drm/i915/selftests/igt_spinner.c | 4 ++-- 11 files changed, 36 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index efe935f80c1a..b79568d370f5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -664,7 +664,8 @@ static int init_status_page(struct intel_engine_cs *engine) if (ret) goto err;
vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
vaddr = i915_gem_object_pin_map(obj,
i915_coherent_map_type(engine->i915, obj, true)); if (IS_ERR(vaddr)) { ret = PTR_ERR(vaddr); goto err_unpin;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 7c9af86fdb1e..47f4397095e5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -23,7 +23,7 @@ static void dbg_poison_ce(struct intel_context *ce)
if (ce->state) { struct drm_i915_gem_object *obj = ce->state->obj;
int type = i915_coherent_map_type(ce->engine->i915);
int type = i915_coherent_map_type(ce->engine->i915, obj, true); void *map; if (!i915_gem_object_trylock(obj))
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index e86897cde984..aafe2a4df496 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -903,7 +903,9 @@ lrc_pre_pin(struct intel_context *ce, GEM_BUG_ON(!i915_vma_is_pinned(ce->state));
*vaddr = i915_gem_object_pin_map(ce->state->obj,
i915_coherent_map_type(ce->engine->i915) |
i915_coherent_map_type(ce->engine->i915,
ce->state->obj,
false) | I915_MAP_OVERRIDE); return PTR_ERR_OR_ZERO(*vaddr);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c index aee0a77c77e0..3cf6c7e68108 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring.c +++ b/drivers/gpu/drm/i915/gt/intel_ring.c @@ -53,9 +53,12 @@ int intel_ring_pin(struct intel_ring *ring, struct i915_gem_ww_ctx *ww)
if (i915_vma_is_map_and_fenceable(vma)) addr = (void __force *)i915_vma_pin_iomap(vma);
else
addr = i915_gem_object_pin_map(vma->obj,
i915_coherent_map_type(vma->vm->i915));
else {
int type = i915_coherent_map_type(vma->vm->i915, vma->obj, false);
addr = i915_gem_object_pin_map(vma->obj, type);
}
if (IS_ERR(addr)) { ret = PTR_ERR(addr); goto err_ring;
diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c index b9bdd1d23243..26685b927169 100644 --- a/drivers/gpu/drm/i915/gt/selftest_context.c +++ b/drivers/gpu/drm/i915/gt/selftest_context.c @@ -88,7 +88,8 @@ static int __live_context_size(struct intel_engine_cs *engine) goto err;
vaddr = i915_gem_object_pin_map_unlocked(ce->state->obj,
i915_coherent_map_type(engine->i915));
i915_coherent_map_type(engine->i915,
ce->state->obj, false)); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); intel_context_unpin(ce);
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index 746985971c3a..5b63d4df8c93 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -69,7 +69,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt) h->seqno = memset(vaddr, 0xff, PAGE_SIZE);
vaddr = i915_gem_object_pin_map_unlocked(h->obj,
i915_coherent_map_type(gt->i915));
i915_coherent_map_type(gt->i915, h->obj, false)); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); goto err_unpin_hws;
@@ -130,7 +130,7 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine) return ERR_CAST(obj); }
vaddr = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(gt->i915));
vaddr = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(gt->i915, obj, false)); if (IS_ERR(vaddr)) { i915_gem_object_put(obj); i915_vm_put(vm);
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index 85e7df6a5123..d8f6623524e8 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -1221,7 +1221,9 @@ static int compare_isolation(struct intel_engine_cs *engine, }
lrc = i915_gem_object_pin_map_unlocked(ce->state->obj,
i915_coherent_map_type(engine->i915));
i915_coherent_map_type(engine->i915,
ce->state->obj,
false)); if (IS_ERR(lrc)) { err = PTR_ERR(lrc); goto err_B1;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 78305b2ec89d..adae04c47aab 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -682,7 +682,9 @@ int intel_guc_allocate_and_map_vma(struct intel_guc *guc, u32 size, if (IS_ERR(vma)) return PTR_ERR(vma);
vaddr = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WB);
vaddr = i915_gem_object_pin_map_unlocked(vma->obj,
i915_coherent_map_type(guc_to_gt(guc)->i915,
vma->obj, true)); if (IS_ERR(vaddr)) { i915_vma_unpin_and_release(&vma, 0); return PTR_ERR(vaddr);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c index 2126dd81ac38..56d2144dc6a0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c @@ -82,7 +82,9 @@ static int intel_huc_rsa_data_create(struct intel_huc *huc) if (IS_ERR(vma)) return PTR_ERR(vma);
vaddr = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WB);
vaddr = i915_gem_object_pin_map_unlocked(vma->obj,
i915_coherent_map_type(gt->i915,
vma->obj, true)); if (IS_ERR(vaddr)) { i915_vma_unpin_and_release(&vma, 0); return PTR_ERR(vaddr);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 69e43bf91a15..2abbc06712a4 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -78,6 +78,7 @@ #include "gem/i915_gem_context_types.h" #include "gem/i915_gem_shrinker.h" #include "gem/i915_gem_stolen.h" +#include "gem/i915_gem_lmem.h"
#include "gt/intel_engine.h" #include "gt/intel_gt_types.h" @@ -1921,9 +1922,15 @@ static inline int intel_hws_csb_write_index(struct drm_i915_private *i915) }
static inline enum i915_map_type -i915_coherent_map_type(struct drm_i915_private *i915) +i915_coherent_map_type(struct drm_i915_private *i915,
{struct drm_i915_gem_object *obj, bool always_coherent)
return HAS_LLC(i915) ? I915_MAP_WB : I915_MAP_WC;
if (i915_gem_object_is_lmem(obj))
return I915_MAP_WC;
if (HAS_LLC(i915) || always_coherent)
return I915_MAP_WB;
else
return I915_MAP_WC;
Seems this patch is doing two things.
First it is adding lmem support to this helper by always returning WC for lmem objects.
Secondly it is introducing an idea of "always coherent" in a helper called i915_coherent_map_type. Could someone explain what is coherent vs always coherent?
And also, why is always coherent happy with WB? Sounds counter intuitive to me.
All this does is try to keep the existing behaviour intact, whilst also ensuring that all lmem objects are mapped using only WC, no matter what. The always_coherent=true thing is for the existing places where we sometimes map the object using WB, without first considering whether the device has the fast shared LLC vs snooping. Yes, it's slightly ugly :)
Not fully following - if we had to write kerneldoc for always_coherent input argument - what it would say?
Regards,
Tvrtko
On Thu, 15 Apr 2021 at 09:21, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 14/04/2021 17:20, Matthew Auld wrote:
On Wed, 14 Apr 2021 at 16:22, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 12/04/2021 10:05, Matthew Auld wrote:
From: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com
Determine the possible coherent map type based on object location, and if target has llc or if user requires an always coherent mapping.
Cc: Matthew Auld matthew.auld@intel.com Cc: CQ Tang cq.tang@intel.com Suggested-by: Michal Wajdeczko michal.wajdeczko@intel.com Signed-off-by: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com
drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 ++- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 4 +++- drivers/gpu/drm/i915/gt/intel_ring.c | 9 ++++++--- drivers/gpu/drm/i915/gt/selftest_context.c | 3 ++- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 ++-- drivers/gpu/drm/i915/gt/selftest_lrc.c | 4 +++- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 4 +++- drivers/gpu/drm/i915/gt/uc/intel_huc.c | 4 +++- drivers/gpu/drm/i915/i915_drv.h | 11 +++++++++-- drivers/gpu/drm/i915/selftests/igt_spinner.c | 4 ++-- 11 files changed, 36 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index efe935f80c1a..b79568d370f5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -664,7 +664,8 @@ static int init_status_page(struct intel_engine_cs *engine) if (ret) goto err;
vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
vaddr = i915_gem_object_pin_map(obj,
i915_coherent_map_type(engine->i915, obj, true)); if (IS_ERR(vaddr)) { ret = PTR_ERR(vaddr); goto err_unpin;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 7c9af86fdb1e..47f4397095e5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -23,7 +23,7 @@ static void dbg_poison_ce(struct intel_context *ce)
if (ce->state) { struct drm_i915_gem_object *obj = ce->state->obj;
int type = i915_coherent_map_type(ce->engine->i915);
int type = i915_coherent_map_type(ce->engine->i915, obj, true); void *map; if (!i915_gem_object_trylock(obj))
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index e86897cde984..aafe2a4df496 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -903,7 +903,9 @@ lrc_pre_pin(struct intel_context *ce, GEM_BUG_ON(!i915_vma_is_pinned(ce->state));
*vaddr = i915_gem_object_pin_map(ce->state->obj,
i915_coherent_map_type(ce->engine->i915) |
i915_coherent_map_type(ce->engine->i915,
ce->state->obj,
false) | I915_MAP_OVERRIDE); return PTR_ERR_OR_ZERO(*vaddr);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c index aee0a77c77e0..3cf6c7e68108 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring.c +++ b/drivers/gpu/drm/i915/gt/intel_ring.c @@ -53,9 +53,12 @@ int intel_ring_pin(struct intel_ring *ring, struct i915_gem_ww_ctx *ww)
if (i915_vma_is_map_and_fenceable(vma)) addr = (void __force *)i915_vma_pin_iomap(vma);
else
addr = i915_gem_object_pin_map(vma->obj,
i915_coherent_map_type(vma->vm->i915));
else {
int type = i915_coherent_map_type(vma->vm->i915, vma->obj, false);
addr = i915_gem_object_pin_map(vma->obj, type);
}
if (IS_ERR(addr)) { ret = PTR_ERR(addr); goto err_ring;
diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c index b9bdd1d23243..26685b927169 100644 --- a/drivers/gpu/drm/i915/gt/selftest_context.c +++ b/drivers/gpu/drm/i915/gt/selftest_context.c @@ -88,7 +88,8 @@ static int __live_context_size(struct intel_engine_cs *engine) goto err;
vaddr = i915_gem_object_pin_map_unlocked(ce->state->obj,
i915_coherent_map_type(engine->i915));
i915_coherent_map_type(engine->i915,
ce->state->obj, false)); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); intel_context_unpin(ce);
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index 746985971c3a..5b63d4df8c93 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -69,7 +69,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt) h->seqno = memset(vaddr, 0xff, PAGE_SIZE);
vaddr = i915_gem_object_pin_map_unlocked(h->obj,
i915_coherent_map_type(gt->i915));
i915_coherent_map_type(gt->i915, h->obj, false)); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); goto err_unpin_hws;
@@ -130,7 +130,7 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine) return ERR_CAST(obj); }
vaddr = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(gt->i915));
vaddr = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(gt->i915, obj, false)); if (IS_ERR(vaddr)) { i915_gem_object_put(obj); i915_vm_put(vm);
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index 85e7df6a5123..d8f6623524e8 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -1221,7 +1221,9 @@ static int compare_isolation(struct intel_engine_cs *engine, }
lrc = i915_gem_object_pin_map_unlocked(ce->state->obj,
i915_coherent_map_type(engine->i915));
i915_coherent_map_type(engine->i915,
ce->state->obj,
false)); if (IS_ERR(lrc)) { err = PTR_ERR(lrc); goto err_B1;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 78305b2ec89d..adae04c47aab 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -682,7 +682,9 @@ int intel_guc_allocate_and_map_vma(struct intel_guc *guc, u32 size, if (IS_ERR(vma)) return PTR_ERR(vma);
vaddr = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WB);
vaddr = i915_gem_object_pin_map_unlocked(vma->obj,
i915_coherent_map_type(guc_to_gt(guc)->i915,
vma->obj, true)); if (IS_ERR(vaddr)) { i915_vma_unpin_and_release(&vma, 0); return PTR_ERR(vaddr);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c index 2126dd81ac38..56d2144dc6a0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c @@ -82,7 +82,9 @@ static int intel_huc_rsa_data_create(struct intel_huc *huc) if (IS_ERR(vma)) return PTR_ERR(vma);
vaddr = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WB);
vaddr = i915_gem_object_pin_map_unlocked(vma->obj,
i915_coherent_map_type(gt->i915,
vma->obj, true)); if (IS_ERR(vaddr)) { i915_vma_unpin_and_release(&vma, 0); return PTR_ERR(vaddr);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 69e43bf91a15..2abbc06712a4 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -78,6 +78,7 @@ #include "gem/i915_gem_context_types.h" #include "gem/i915_gem_shrinker.h" #include "gem/i915_gem_stolen.h" +#include "gem/i915_gem_lmem.h"
#include "gt/intel_engine.h" #include "gt/intel_gt_types.h" @@ -1921,9 +1922,15 @@ static inline int intel_hws_csb_write_index(struct drm_i915_private *i915) }
static inline enum i915_map_type -i915_coherent_map_type(struct drm_i915_private *i915) +i915_coherent_map_type(struct drm_i915_private *i915,
{struct drm_i915_gem_object *obj, bool always_coherent)
return HAS_LLC(i915) ? I915_MAP_WB : I915_MAP_WC;
if (i915_gem_object_is_lmem(obj))
return I915_MAP_WC;
if (HAS_LLC(i915) || always_coherent)
return I915_MAP_WB;
else
return I915_MAP_WC;
Seems this patch is doing two things.
First it is adding lmem support to this helper by always returning WC for lmem objects.
Secondly it is introducing an idea of "always coherent" in a helper called i915_coherent_map_type. Could someone explain what is coherent vs always coherent?
And also, why is always coherent happy with WB? Sounds counter intuitive to me.
All this does is try to keep the existing behaviour intact, whilst also ensuring that all lmem objects are mapped using only WC, no matter what. The always_coherent=true thing is for the existing places where we sometimes map the object using WB, without first considering whether the device has the fast shared LLC vs snooping. Yes, it's slightly ugly :)
Not fully following - if we had to write kerneldoc for always_coherent input argument - what it would say?
@always_coherent - If true we should always try to map the object using WB. If false we should only map as WB if the device supports the fast shared LLC, in the case of snooped devices we will map using WC. Note that if the resource is lmem then we will always map as WC, regardless of the value of always_coherent, since that's all we currently support.
Maybe the naming is poor?
Regards,
Tvrtko
On 15/04/2021 10:23, Matthew Auld wrote:
On Thu, 15 Apr 2021 at 09:21, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 14/04/2021 17:20, Matthew Auld wrote:
On Wed, 14 Apr 2021 at 16:22, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 12/04/2021 10:05, Matthew Auld wrote:
From: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com
Determine the possible coherent map type based on object location, and if target has llc or if user requires an always coherent mapping.
Cc: Matthew Auld matthew.auld@intel.com Cc: CQ Tang cq.tang@intel.com Suggested-by: Michal Wajdeczko michal.wajdeczko@intel.com Signed-off-by: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com
drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 ++- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 4 +++- drivers/gpu/drm/i915/gt/intel_ring.c | 9 ++++++--- drivers/gpu/drm/i915/gt/selftest_context.c | 3 ++- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 ++-- drivers/gpu/drm/i915/gt/selftest_lrc.c | 4 +++- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 4 +++- drivers/gpu/drm/i915/gt/uc/intel_huc.c | 4 +++- drivers/gpu/drm/i915/i915_drv.h | 11 +++++++++-- drivers/gpu/drm/i915/selftests/igt_spinner.c | 4 ++-- 11 files changed, 36 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index efe935f80c1a..b79568d370f5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -664,7 +664,8 @@ static int init_status_page(struct intel_engine_cs *engine) if (ret) goto err;
vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
vaddr = i915_gem_object_pin_map(obj,
i915_coherent_map_type(engine->i915, obj, true)); if (IS_ERR(vaddr)) { ret = PTR_ERR(vaddr); goto err_unpin;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 7c9af86fdb1e..47f4397095e5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -23,7 +23,7 @@ static void dbg_poison_ce(struct intel_context *ce)
if (ce->state) { struct drm_i915_gem_object *obj = ce->state->obj;
int type = i915_coherent_map_type(ce->engine->i915);
int type = i915_coherent_map_type(ce->engine->i915, obj, true); void *map; if (!i915_gem_object_trylock(obj))
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index e86897cde984..aafe2a4df496 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -903,7 +903,9 @@ lrc_pre_pin(struct intel_context *ce, GEM_BUG_ON(!i915_vma_is_pinned(ce->state));
*vaddr = i915_gem_object_pin_map(ce->state->obj,
i915_coherent_map_type(ce->engine->i915) |
i915_coherent_map_type(ce->engine->i915,
ce->state->obj,
false) | I915_MAP_OVERRIDE); return PTR_ERR_OR_ZERO(*vaddr);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c index aee0a77c77e0..3cf6c7e68108 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring.c +++ b/drivers/gpu/drm/i915/gt/intel_ring.c @@ -53,9 +53,12 @@ int intel_ring_pin(struct intel_ring *ring, struct i915_gem_ww_ctx *ww)
if (i915_vma_is_map_and_fenceable(vma)) addr = (void __force *)i915_vma_pin_iomap(vma);
else
addr = i915_gem_object_pin_map(vma->obj,
i915_coherent_map_type(vma->vm->i915));
else {
int type = i915_coherent_map_type(vma->vm->i915, vma->obj, false);
addr = i915_gem_object_pin_map(vma->obj, type);
}
if (IS_ERR(addr)) { ret = PTR_ERR(addr); goto err_ring;
diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c index b9bdd1d23243..26685b927169 100644 --- a/drivers/gpu/drm/i915/gt/selftest_context.c +++ b/drivers/gpu/drm/i915/gt/selftest_context.c @@ -88,7 +88,8 @@ static int __live_context_size(struct intel_engine_cs *engine) goto err;
vaddr = i915_gem_object_pin_map_unlocked(ce->state->obj,
i915_coherent_map_type(engine->i915));
i915_coherent_map_type(engine->i915,
ce->state->obj, false)); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); intel_context_unpin(ce);
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index 746985971c3a..5b63d4df8c93 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -69,7 +69,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt) h->seqno = memset(vaddr, 0xff, PAGE_SIZE);
vaddr = i915_gem_object_pin_map_unlocked(h->obj,
i915_coherent_map_type(gt->i915));
i915_coherent_map_type(gt->i915, h->obj, false)); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); goto err_unpin_hws;
@@ -130,7 +130,7 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine) return ERR_CAST(obj); }
vaddr = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(gt->i915));
vaddr = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(gt->i915, obj, false)); if (IS_ERR(vaddr)) { i915_gem_object_put(obj); i915_vm_put(vm);
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index 85e7df6a5123..d8f6623524e8 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -1221,7 +1221,9 @@ static int compare_isolation(struct intel_engine_cs *engine, }
lrc = i915_gem_object_pin_map_unlocked(ce->state->obj,
i915_coherent_map_type(engine->i915));
i915_coherent_map_type(engine->i915,
ce->state->obj,
false)); if (IS_ERR(lrc)) { err = PTR_ERR(lrc); goto err_B1;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 78305b2ec89d..adae04c47aab 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -682,7 +682,9 @@ int intel_guc_allocate_and_map_vma(struct intel_guc *guc, u32 size, if (IS_ERR(vma)) return PTR_ERR(vma);
vaddr = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WB);
vaddr = i915_gem_object_pin_map_unlocked(vma->obj,
i915_coherent_map_type(guc_to_gt(guc)->i915,
vma->obj, true)); if (IS_ERR(vaddr)) { i915_vma_unpin_and_release(&vma, 0); return PTR_ERR(vaddr);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c index 2126dd81ac38..56d2144dc6a0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c @@ -82,7 +82,9 @@ static int intel_huc_rsa_data_create(struct intel_huc *huc) if (IS_ERR(vma)) return PTR_ERR(vma);
vaddr = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WB);
vaddr = i915_gem_object_pin_map_unlocked(vma->obj,
i915_coherent_map_type(gt->i915,
vma->obj, true)); if (IS_ERR(vaddr)) { i915_vma_unpin_and_release(&vma, 0); return PTR_ERR(vaddr);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 69e43bf91a15..2abbc06712a4 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -78,6 +78,7 @@ #include "gem/i915_gem_context_types.h" #include "gem/i915_gem_shrinker.h" #include "gem/i915_gem_stolen.h" +#include "gem/i915_gem_lmem.h"
#include "gt/intel_engine.h" #include "gt/intel_gt_types.h"
@@ -1921,9 +1922,15 @@ static inline int intel_hws_csb_write_index(struct drm_i915_private *i915) }
static inline enum i915_map_type
-i915_coherent_map_type(struct drm_i915_private *i915) +i915_coherent_map_type(struct drm_i915_private *i915,
{struct drm_i915_gem_object *obj, bool always_coherent)
return HAS_LLC(i915) ? I915_MAP_WB : I915_MAP_WC;
if (i915_gem_object_is_lmem(obj))
return I915_MAP_WC;
if (HAS_LLC(i915) || always_coherent)
return I915_MAP_WB;
else
return I915_MAP_WC;
Seems this patch is doing two things.
First it is adding lmem support to this helper by always returning WC for lmem objects.
Secondly it is introducing an idea of "always coherent" in a helper called i915_coherent_map_type. Could someone explain what is coherent vs always coherent?
And also, why is always coherent happy with WB? Sounds counter intuitive to me.
All this does is try to keep the existing behaviour intact, whilst also ensuring that all lmem objects are mapped using only WC, no matter what. The always_coherent=true thing is for the existing places where we sometimes map the object using WB, without first considering whether the device has the fast shared LLC vs snooping. Yes, it's slightly ugly :)
Not fully following - if we had to write kerneldoc for always_coherent input argument - what it would say?
@always_coherent - If true we should always try to map the object using WB. If false we should only map as WB if the device supports the fast shared LLC, in the case of snooped devices we will map using WC. Note that if the resource is lmem then we will always map as WC, regardless of the value of always_coherent, since that's all we currently support.
Maybe the naming is poor?
Maybe just confusing to me, not sure yet.
So always_coherent is not about how the caller wants to use it, but about platform knowledge? Or a performance concern for LLC vs snooping cases? Does WB work (coherently) on snooping platforms?
Regards,
Tvrtko
On 15/04/2021 12:05, Tvrtko Ursulin wrote:
On 15/04/2021 10:23, Matthew Auld wrote:
On Thu, 15 Apr 2021 at 09:21, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 14/04/2021 17:20, Matthew Auld wrote:
On Wed, 14 Apr 2021 at 16:22, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 12/04/2021 10:05, Matthew Auld wrote:
From: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com
Determine the possible coherent map type based on object location, and if target has llc or if user requires an always coherent mapping.
Cc: Matthew Auld matthew.auld@intel.com Cc: CQ Tang cq.tang@intel.com Suggested-by: Michal Wajdeczko michal.wajdeczko@intel.com Signed-off-by: Venkata Sandeep Dhanalakota
venkata.s.dhanalakota@intel.com
drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 3 ++-    drivers/gpu/drm/i915/gt/intel_engine_pm.c   | 2 +-    drivers/gpu/drm/i915/gt/intel_lrc.c         | 4 +++-    drivers/gpu/drm/i915/gt/intel_ring.c        | 9 ++++++---    drivers/gpu/drm/i915/gt/selftest_context.c  | 3 ++-    drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 ++--    drivers/gpu/drm/i915/gt/selftest_lrc.c      | 4 +++-    drivers/gpu/drm/i915/gt/uc/intel_guc.c      | 4 +++-    drivers/gpu/drm/i915/gt/uc/intel_huc.c      | 4 +++-    drivers/gpu/drm/i915/i915_drv.h             | 11 +++++++++--    drivers/gpu/drm/i915/selftests/igt_spinner.c | 4 ++--    11 files changed, 36 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index efe935f80c1a..b79568d370f5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -664,7 +664,8 @@ static int init_status_page(struct intel_engine_cs *engine) if (ret) goto err;
- vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); + vaddr = i915_gem_object_pin_map(obj,
i915_coherent_map_type(engine->i915, obj, true)); if (IS_ERR(vaddr)) { ret = PTR_ERR(vaddr); goto err_unpin; diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 7c9af86fdb1e..47f4397095e5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -23,7 +23,7 @@ static void dbg_poison_ce(struct intel_context *ce)
if (ce->state) { struct drm_i915_gem_object *obj = ce->state->obj; - int type = i915_coherent_map_type(ce->engine->i915); + int type = i915_coherent_map_type(ce->engine->i915, obj, true); void *map;
if (!i915_gem_object_trylock(obj)) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index e86897cde984..aafe2a4df496 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -903,7 +903,9 @@ lrc_pre_pin(struct intel_context *ce, GEM_BUG_ON(!i915_vma_is_pinned(ce->state));
*vaddr = i915_gem_object_pin_map(ce->state->obj,
i915_coherent_map_type(ce->engine->i915) |
i915_coherent_map_type(ce->engine->i915,
ce->state->obj,
false) | I915_MAP_OVERRIDE);
return PTR_ERR_OR_ZERO(*vaddr); diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c index aee0a77c77e0..3cf6c7e68108 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring.c +++ b/drivers/gpu/drm/i915/gt/intel_ring.c @@ -53,9 +53,12 @@ int intel_ring_pin(struct intel_ring *ring, struct i915_gem_ww_ctx *ww)
if (i915_vma_is_map_and_fenceable(vma)) addr = (void __force *)i915_vma_pin_iomap(vma); - else - addr = i915_gem_object_pin_map(vma->obj,
i915_coherent_map_type(vma->vm->i915)); + else { + int type = i915_coherent_map_type(vma->vm->i915, vma->obj, false);
+ addr = i915_gem_object_pin_map(vma->obj, type); + }
if (IS_ERR(addr)) { ret = PTR_ERR(addr); goto err_ring; diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c index b9bdd1d23243..26685b927169 100644 --- a/drivers/gpu/drm/i915/gt/selftest_context.c +++ b/drivers/gpu/drm/i915/gt/selftest_context.c @@ -88,7 +88,8 @@ static int __live_context_size(struct intel_engine_cs *engine) goto err;
vaddr = i915_gem_object_pin_map_unlocked(ce->state->obj,
i915_coherent_map_type(engine->i915));
i915_coherent_map_type(engine->i915,
ce->state->obj, false)); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); intel_context_unpin(ce); diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index 746985971c3a..5b63d4df8c93 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -69,7 +69,7 @@ static int hang_init(struct hang *h, struct intel_gt *gt) h->seqno = memset(vaddr, 0xff, PAGE_SIZE);
vaddr = i915_gem_object_pin_map_unlocked(h->obj,
i915_coherent_map_type(gt->i915));
i915_coherent_map_type(gt->i915, h->obj, false)); if (IS_ERR(vaddr)) { err = PTR_ERR(vaddr); goto err_unpin_hws; @@ -130,7 +130,7 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine) return ERR_CAST(obj); }
- vaddr = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(gt->i915)); + vaddr = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(gt->i915, obj, false)); if (IS_ERR(vaddr)) { i915_gem_object_put(obj); i915_vm_put(vm); diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index 85e7df6a5123..d8f6623524e8 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -1221,7 +1221,9 @@ static int compare_isolation(struct intel_engine_cs *engine, }
lrc = i915_gem_object_pin_map_unlocked(ce->state->obj,
i915_coherent_map_type(engine->i915));
i915_coherent_map_type(engine->i915, + ce->state->obj,
+ false));
if (IS_ERR(lrc)) { err = PTR_ERR(lrc); goto err_B1; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 78305b2ec89d..adae04c47aab 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -682,7 +682,9 @@ int intel_guc_allocate_and_map_vma(struct intel_guc *guc, u32 size, if (IS_ERR(vma)) return PTR_ERR(vma);
- vaddr = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WB); + vaddr = i915_gem_object_pin_map_unlocked(vma->obj,
i915_coherent_map_type(guc_to_gt(guc)->i915,
vma->obj, true)); if (IS_ERR(vaddr)) { i915_vma_unpin_and_release(&vma, 0); return PTR_ERR(vaddr); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c index 2126dd81ac38..56d2144dc6a0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c @@ -82,7 +82,9 @@ static int intel_huc_rsa_data_create(struct intel_huc *huc) if (IS_ERR(vma)) return PTR_ERR(vma);
- vaddr = i915_gem_object_pin_map_unlocked(vma->obj, I915_MAP_WB); + vaddr = i915_gem_object_pin_map_unlocked(vma->obj,
i915_coherent_map_type(gt->i915,
vma->obj, true)); if (IS_ERR(vaddr)) { i915_vma_unpin_and_release(&vma, 0); return PTR_ERR(vaddr); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 69e43bf91a15..2abbc06712a4 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -78,6 +78,7 @@ #include "gem/i915_gem_context_types.h" #include "gem/i915_gem_shrinker.h" #include "gem/i915_gem_stolen.h" +#include "gem/i915_gem_lmem.h"
#include "gt/intel_engine.h" #include "gt/intel_gt_types.h" @@ -1921,9 +1922,15 @@ static inline int intel_hws_csb_write_index(struct drm_i915_private *i915) }
static inline enum i915_map_type -i915_coherent_map_type(struct drm_i915_private *i915) +i915_coherent_map_type(struct drm_i915_private *i915, + struct drm_i915_gem_object *obj, bool always_coherent) { - return HAS_LLC(i915) ? I915_MAP_WB : I915_MAP_WC; + if (i915_gem_object_is_lmem(obj)) + return I915_MAP_WC; + if (HAS_LLC(i915) || always_coherent) + return I915_MAP_WB; + else + return I915_MAP_WC;
Seems this patch is doing two things.
First it is adding lmem support to this helper by always returning WC for lmem objects.
Secondly it is introducing an idea of "always coherent" in a helper called i915_coherent_map_type. Could someone explain what is coherent vs always coherent?
And also, why is always coherent happy with WB? Sounds counter intuitive to me.
All this does is try to keep the existing behaviour intact, whilst also ensuring that all lmem objects are mapped using only WC, no matter what. The always_coherent=true thing is for the existing places where we sometimes map the object using WB, without first considering whether the device has the fast shared LLC vs snooping. Yes, it's slightly ugly :)
Not fully following - if we had to write kerneldoc for always_coherent input argument - what it would say?
@always_coherent - If true we should always try to map the object using WB. If false we should only map as WB if the device supports the fast shared LLC; in the case of snooped devices we will instead map using WC. Note that if the resource is lmem then we will always map as WC, regardless of the value of always_coherent, since that's all we currently support.
Maybe the naming is poor?
Maybe just confusing to me, not sure yet.
So always_coherent is not about how the callers want to use it, but about platform knowledge? Or a performance concern for LLC vs snooping cases? Does WB work (coherently) on snooping platforms?
The always_coherent=true is for the existing callers that want WB, regardless of LLC vs snooping.
The other callers use the existing i915_coherent_map_type() which only gives out WB for LLC platforms.
AFAIK, LLC vs snooping should offer the same in terms of coherency, but in terms of performance the shared LLC is much faster, and so for snooping platforms we choose to not enable WB everywhere.
On top of that we now have lmem, but for that we only allow WC. This patch just rolls all of that into one helper, while keeping the existing behaviour unchanged.
Regards,
Tvrtko
On 19/04/2021 12:30, Matthew Auld wrote:
On 15/04/2021 12:05, Tvrtko Ursulin wrote:
On 15/04/2021 10:23, Matthew Auld wrote:
On Thu, 15 Apr 2021 at 09:21, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 14/04/2021 17:20, Matthew Auld wrote:
On Wed, 14 Apr 2021 at 16:22, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 12/04/2021 10:05, Matthew Auld wrote: > From: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com > > Determine the possible coherent map type based on object location, > and if target has llc or if user requires an always coherent > mapping. > > Cc: Matthew Auld matthew.auld@intel.com > Cc: CQ Tang cq.tang@intel.com > Suggested-by: Michal Wajdeczko michal.wajdeczko@intel.com > Signed-off-by: Venkata Sandeep Dhanalakota > venkata.s.dhanalakota@intel.com > --- >    drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 3 ++- >    drivers/gpu/drm/i915/gt/intel_engine_pm.c   | 2 +- >    drivers/gpu/drm/i915/gt/intel_lrc.c         | 4 +++- >    drivers/gpu/drm/i915/gt/intel_ring.c        | 9 ++++++--- >    drivers/gpu/drm/i915/gt/selftest_context.c  | 3 ++- >    drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 ++-- >    drivers/gpu/drm/i915/gt/selftest_lrc.c      | 4 +++- >    drivers/gpu/drm/i915/gt/uc/intel_guc.c      | 4 +++- >    drivers/gpu/drm/i915/gt/uc/intel_huc.c      | 4 +++- >    drivers/gpu/drm/i915/i915_drv.h             | 11 +++++++++-- >    drivers/gpu/drm/i915/selftests/igt_spinner.c | 4 ++-- >    11 files changed, 36 insertions(+), 16 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > index efe935f80c1a..b79568d370f5 100644 > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > @@ -664,7 +664,8 @@ static int init_status_page(struct > intel_engine_cs *engine) >        if (ret) >                goto err; > > -    vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); > +    vaddr = i915_gem_object_pin_map(obj, > + i915_coherent_map_type(engine->i915, obj, true)); >        if (IS_ERR(vaddr)) { >                ret = PTR_ERR(vaddr); >                goto err_unpin; > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c > b/drivers/gpu/drm/i915/gt/intel_engine_pm.c > index 7c9af86fdb1e..47f4397095e5 100644 > --- 
a/drivers/gpu/drm/i915/gt/intel_engine_pm.c > +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c > @@ -23,7 +23,7 @@ static void dbg_poison_ce(struct intel_context > *ce) > >        if (ce->state) { >                struct drm_i915_gem_object *obj = ce->state->obj; > -            int type = i915_coherent_map_type(ce->engine->i915); > +            int type = i915_coherent_map_type(ce->engine->i915, > obj, true); >                void *map; > >                if (!i915_gem_object_trylock(obj)) > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c > b/drivers/gpu/drm/i915/gt/intel_lrc.c > index e86897cde984..aafe2a4df496 100644 > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c > @@ -903,7 +903,9 @@ lrc_pre_pin(struct intel_context *ce, >        GEM_BUG_ON(!i915_vma_is_pinned(ce->state)); > >        *vaddr = i915_gem_object_pin_map(ce->state->obj, > - i915_coherent_map_type(ce->engine->i915) | > + i915_coherent_map_type(ce->engine->i915, > + ce->state->obj, > + false) | >                                         I915_MAP_OVERRIDE); > >        return PTR_ERR_OR_ZERO(*vaddr); > diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c > b/drivers/gpu/drm/i915/gt/intel_ring.c > index aee0a77c77e0..3cf6c7e68108 100644 > --- a/drivers/gpu/drm/i915/gt/intel_ring.c > +++ b/drivers/gpu/drm/i915/gt/intel_ring.c > @@ -53,9 +53,12 @@ int intel_ring_pin(struct intel_ring *ring, > struct i915_gem_ww_ctx *ww) > >        if (i915_vma_is_map_and_fenceable(vma)) >                addr = (void __force *)i915_vma_pin_iomap(vma); > -    else > -            addr = i915_gem_object_pin_map(vma->obj, > - i915_coherent_map_type(vma->vm->i915)); > +    else { > +            int type = i915_coherent_map_type(vma->vm->i915, > vma->obj, false); > + > +            addr = i915_gem_object_pin_map(vma->obj, type); > +    } > + >        if (IS_ERR(addr)) { >                ret = PTR_ERR(addr); >                goto err_ring; > diff --git 
a/drivers/gpu/drm/i915/gt/selftest_context.c > b/drivers/gpu/drm/i915/gt/selftest_context.c > index b9bdd1d23243..26685b927169 100644 > --- a/drivers/gpu/drm/i915/gt/selftest_context.c > +++ b/drivers/gpu/drm/i915/gt/selftest_context.c > @@ -88,7 +88,8 @@ static int __live_context_size(struct > intel_engine_cs *engine) >                goto err; > >        vaddr = i915_gem_object_pin_map_unlocked(ce->state->obj, > - i915_coherent_map_type(engine->i915)); > + i915_coherent_map_type(engine->i915, > + ce->state->obj, false)); >        if (IS_ERR(vaddr)) { >                err = PTR_ERR(vaddr); >                intel_context_unpin(ce); > diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c > b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c > index 746985971c3a..5b63d4df8c93 100644 > --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c > +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c > @@ -69,7 +69,7 @@ static int hang_init(struct hang *h, struct > intel_gt *gt) >        h->seqno = memset(vaddr, 0xff, PAGE_SIZE); > >        vaddr = i915_gem_object_pin_map_unlocked(h->obj, > - i915_coherent_map_type(gt->i915)); > + i915_coherent_map_type(gt->i915, h->obj, false)); >        if (IS_ERR(vaddr)) { >                err = PTR_ERR(vaddr); >                goto err_unpin_hws; > @@ -130,7 +130,7 @@ hang_create_request(struct hang *h, struct > intel_engine_cs *engine) >                return ERR_CAST(obj); >        } > > -    vaddr = i915_gem_object_pin_map_unlocked(obj, > i915_coherent_map_type(gt->i915)); > +    vaddr = i915_gem_object_pin_map_unlocked(obj, > i915_coherent_map_type(gt->i915, obj, false)); >        if (IS_ERR(vaddr)) { >                i915_gem_object_put(obj); >                i915_vm_put(vm); > diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c > b/drivers/gpu/drm/i915/gt/selftest_lrc.c > index 85e7df6a5123..d8f6623524e8 100644 > --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c > +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c > @@ -1221,7 +1221,9 @@ static int 
compare_isolation(struct > intel_engine_cs *engine, >        } > >        lrc = i915_gem_object_pin_map_unlocked(ce->state->obj, > - i915_coherent_map_type(engine->i915)); > + i915_coherent_map_type(engine->i915, > + > ce->state->obj, > + > false)); >        if (IS_ERR(lrc)) { >                err = PTR_ERR(lrc); >                goto err_B1; > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c > b/drivers/gpu/drm/i915/gt/uc/intel_guc.c > index 78305b2ec89d..adae04c47aab 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c > @@ -682,7 +682,9 @@ int intel_guc_allocate_and_map_vma(struct > intel_guc *guc, u32 size, >        if (IS_ERR(vma)) >                return PTR_ERR(vma); > > -    vaddr = i915_gem_object_pin_map_unlocked(vma->obj, > I915_MAP_WB); > +    vaddr = i915_gem_object_pin_map_unlocked(vma->obj, > + i915_coherent_map_type(guc_to_gt(guc)->i915, > + vma->obj, true)); >        if (IS_ERR(vaddr)) { >                i915_vma_unpin_and_release(&vma, 0); >                return PTR_ERR(vaddr); > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c > b/drivers/gpu/drm/i915/gt/uc/intel_huc.c > index 2126dd81ac38..56d2144dc6a0 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c > +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c > @@ -82,7 +82,9 @@ static int intel_huc_rsa_data_create(struct > intel_huc *huc) >        if (IS_ERR(vma)) >                return PTR_ERR(vma); > > -    vaddr = i915_gem_object_pin_map_unlocked(vma->obj, > I915_MAP_WB); > +    vaddr = i915_gem_object_pin_map_unlocked(vma->obj, > + i915_coherent_map_type(gt->i915, > + vma->obj, true)); >        if (IS_ERR(vaddr)) { >                i915_vma_unpin_and_release(&vma, 0); >                return PTR_ERR(vaddr); > diff --git a/drivers/gpu/drm/i915/i915_drv.h > b/drivers/gpu/drm/i915/i915_drv.h > index 69e43bf91a15..2abbc06712a4 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -78,6 +78,7 @@ >    #include 
"gem/i915_gem_context_types.h" >    #include "gem/i915_gem_shrinker.h" >    #include "gem/i915_gem_stolen.h" > +#include "gem/i915_gem_lmem.h" > >    #include "gt/intel_engine.h" >    #include "gt/intel_gt_types.h" > @@ -1921,9 +1922,15 @@ static inline int > intel_hws_csb_write_index(struct drm_i915_private *i915) >    } > >    static inline enum i915_map_type > -i915_coherent_map_type(struct drm_i915_private *i915) > +i915_coherent_map_type(struct drm_i915_private *i915, > +                   struct drm_i915_gem_object *obj, bool > always_coherent) >    { > -    return HAS_LLC(i915) ? I915_MAP_WB : I915_MAP_WC; > +    if (i915_gem_object_is_lmem(obj)) > +            return I915_MAP_WC; > +    if (HAS_LLC(i915) || always_coherent) > +            return I915_MAP_WB; > +    else > +            return I915_MAP_WC;
Seems this patch is doing two things.
First it is adding lmem support to this helper by always returning WC for lmem objects.
Secondly it is introducing an idea of "always coherent" in a helper called i915_coherent_map_type. Could someone explain what is coherent vs always coherent?
And also, why is always coherent happy with WB? Sounds counter intuitive to me.
All this does is try to keep the existing behaviour intact, whilst also ensuring that all lmem objects are mapped using only WC, no matter what. The always_coherent=true thing is for the existing places where we sometimes map the object using WB, without first considering whether the device has the fast shared LLC vs snooping. Yes, it's slightly ugly :)
Not fully following - if we had to write kerneldoc for always_coherent input argument - what it would say?
@always_coherent - If true we should always try to map the object using WB. If false we should only map as WB if the device supports the fast shared LLC; in the case of snooped devices we will instead map using WC. Note that if the resource is lmem then we will always map as WC, regardless of the value of always_coherent, since that's all we currently support.
Maybe the naming is poor?
Maybe just confusing to me, not sure yet.
So always_coherent is not about how the callers want to use it, but about platform knowledge? Or a performance concern for LLC vs snooping cases? Does WB work (coherently) on snooping platforms?
The always_coherent=true is for the existing callers that want WB, regardless of LLC vs snooping.
The other callers use the existing i915_coherent_map_type() which only gives out WB for LLC platforms.
AFAIK, LLC vs snooping should offer the same in terms of coherency, but in terms of performance the shared LLC is much faster, and so for snooping platforms we choose to not enable WB everywhere.
On top of that we now have lmem, but for that we only allow WC. This patch just rolls all of that into one helper, while keeping the existing behaviour unchanged.
Thanks. But I am still struggling with the API. :(
Is the introduction of the always_coherent flag even required in the context of DG1? AFAICT for lmem objects the flag is ignored, so no?
Regards,
Tvrtko
On 19/04/2021 15:07, Tvrtko Ursulin wrote:
On 19/04/2021 12:30, Matthew Auld wrote:
On 15/04/2021 12:05, Tvrtko Ursulin wrote:
On 15/04/2021 10:23, Matthew Auld wrote:
On Thu, 15 Apr 2021 at 09:21, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 14/04/2021 17:20, Matthew Auld wrote:
On Wed, 14 Apr 2021 at 16:22, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote: > > > On 12/04/2021 10:05, Matthew Auld wrote: >> From: Venkata Sandeep Dhanalakota venkata.s.dhanalakota@intel.com >> >> Determine the possible coherent map type based on object location, >> and if target has llc or if user requires an always coherent >> mapping. >> >> Cc: Matthew Auld matthew.auld@intel.com >> Cc: CQ Tang cq.tang@intel.com >> Suggested-by: Michal Wajdeczko michal.wajdeczko@intel.com >> Signed-off-by: Venkata Sandeep Dhanalakota >> venkata.s.dhanalakota@intel.com >> --- >>    drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 3 ++- >>    drivers/gpu/drm/i915/gt/intel_engine_pm.c   | 2 +- >>    drivers/gpu/drm/i915/gt/intel_lrc.c         | 4 +++- >>    drivers/gpu/drm/i915/gt/intel_ring.c        | 9 ++++++--- >>    drivers/gpu/drm/i915/gt/selftest_context.c  | 3 ++- >>    drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 ++-- >>    drivers/gpu/drm/i915/gt/selftest_lrc.c      | 4 +++- >>    drivers/gpu/drm/i915/gt/uc/intel_guc.c      | 4 +++- >>    drivers/gpu/drm/i915/gt/uc/intel_huc.c      | 4 +++- >>    drivers/gpu/drm/i915/i915_drv.h             | 11 +++++++++-- >>    drivers/gpu/drm/i915/selftests/igt_spinner.c | 4 ++-- >>    11 files changed, 36 insertions(+), 16 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c >> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c >> index efe935f80c1a..b79568d370f5 100644 >> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c >> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c >> @@ -664,7 +664,8 @@ static int init_status_page(struct >> intel_engine_cs *engine) >>        if (ret) >>                goto err; >> >> -    vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); >> +    vaddr = i915_gem_object_pin_map(obj, >> + i915_coherent_map_type(engine->i915, obj, true)); >>        if (IS_ERR(vaddr)) { >>                ret = PTR_ERR(vaddr); >>                goto err_unpin; >> diff --git 
a/drivers/gpu/drm/i915/gt/intel_engine_pm.c >> b/drivers/gpu/drm/i915/gt/intel_engine_pm.c >> index 7c9af86fdb1e..47f4397095e5 100644 >> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c >> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c >> @@ -23,7 +23,7 @@ static void dbg_poison_ce(struct intel_context >> *ce) >> >>        if (ce->state) { >>                struct drm_i915_gem_object *obj = ce->state->obj; >> -            int type = i915_coherent_map_type(ce->engine->i915); >> +            int type = >> i915_coherent_map_type(ce->engine->i915, obj, true); >>                void *map; >> >>                if (!i915_gem_object_trylock(obj)) >> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c >> b/drivers/gpu/drm/i915/gt/intel_lrc.c >> index e86897cde984..aafe2a4df496 100644 >> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c >> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c >> @@ -903,7 +903,9 @@ lrc_pre_pin(struct intel_context *ce, >>        GEM_BUG_ON(!i915_vma_is_pinned(ce->state)); >> >>        *vaddr = i915_gem_object_pin_map(ce->state->obj, >> - i915_coherent_map_type(ce->engine->i915) | >> + i915_coherent_map_type(ce->engine->i915, >> + ce->state->obj, >> + false) | >>                                         I915_MAP_OVERRIDE); >> >>        return PTR_ERR_OR_ZERO(*vaddr); >> diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c >> b/drivers/gpu/drm/i915/gt/intel_ring.c >> index aee0a77c77e0..3cf6c7e68108 100644 >> --- a/drivers/gpu/drm/i915/gt/intel_ring.c >> +++ b/drivers/gpu/drm/i915/gt/intel_ring.c >> @@ -53,9 +53,12 @@ int intel_ring_pin(struct intel_ring *ring, >> struct i915_gem_ww_ctx *ww) >> >>        if (i915_vma_is_map_and_fenceable(vma)) >>                addr = (void __force *)i915_vma_pin_iomap(vma); >> -    else >> -            addr = i915_gem_object_pin_map(vma->obj, >> - i915_coherent_map_type(vma->vm->i915)); >> +    else { >> +            int type = i915_coherent_map_type(vma->vm->i915, >> vma->obj, false); >> + >> +            addr = 
i915_gem_object_pin_map(vma->obj, type); >> +    } >> + >>        if (IS_ERR(addr)) { >>                ret = PTR_ERR(addr); >>                goto err_ring; >> diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c >> b/drivers/gpu/drm/i915/gt/selftest_context.c >> index b9bdd1d23243..26685b927169 100644 >> --- a/drivers/gpu/drm/i915/gt/selftest_context.c >> +++ b/drivers/gpu/drm/i915/gt/selftest_context.c >> @@ -88,7 +88,8 @@ static int __live_context_size(struct >> intel_engine_cs *engine) >>                goto err; >> >>        vaddr = i915_gem_object_pin_map_unlocked(ce->state->obj, >> - i915_coherent_map_type(engine->i915)); >> + i915_coherent_map_type(engine->i915, >> + ce->state->obj, false)); >>        if (IS_ERR(vaddr)) { >>                err = PTR_ERR(vaddr); >>                intel_context_unpin(ce); >> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >> b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >> index 746985971c3a..5b63d4df8c93 100644 >> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >> @@ -69,7 +69,7 @@ static int hang_init(struct hang *h, struct >> intel_gt *gt) >>        h->seqno = memset(vaddr, 0xff, PAGE_SIZE); >> >>        vaddr = i915_gem_object_pin_map_unlocked(h->obj, >> - i915_coherent_map_type(gt->i915)); >> + i915_coherent_map_type(gt->i915, h->obj, false)); >>        if (IS_ERR(vaddr)) { >>                err = PTR_ERR(vaddr); >>                goto err_unpin_hws; >> @@ -130,7 +130,7 @@ hang_create_request(struct hang *h, struct >> intel_engine_cs *engine) >>                return ERR_CAST(obj); >>        } >> >> -    vaddr = i915_gem_object_pin_map_unlocked(obj, >> i915_coherent_map_type(gt->i915)); >> +    vaddr = i915_gem_object_pin_map_unlocked(obj, >> i915_coherent_map_type(gt->i915, obj, false)); >>        if (IS_ERR(vaddr)) { >>                i915_gem_object_put(obj); >>                i915_vm_put(vm); >> diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c 
>> b/drivers/gpu/drm/i915/gt/selftest_lrc.c >> index 85e7df6a5123..d8f6623524e8 100644 >> --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c >> +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c >> @@ -1221,7 +1221,9 @@ static int compare_isolation(struct >> intel_engine_cs *engine, >>        } >> >>        lrc = i915_gem_object_pin_map_unlocked(ce->state->obj, >> - i915_coherent_map_type(engine->i915)); >> + i915_coherent_map_type(engine->i915, >> + ce->state->obj, >> + false)); >>        if (IS_ERR(lrc)) { >>                err = PTR_ERR(lrc); >>                goto err_B1; >> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c >> b/drivers/gpu/drm/i915/gt/uc/intel_guc.c >> index 78305b2ec89d..adae04c47aab 100644 >> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c >> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c >> @@ -682,7 +682,9 @@ int intel_guc_allocate_and_map_vma(struct >> intel_guc *guc, u32 size, >>        if (IS_ERR(vma)) >>                return PTR_ERR(vma); >> >> -    vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >> I915_MAP_WB); >> +    vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >> + i915_coherent_map_type(guc_to_gt(guc)->i915, >> + vma->obj, true)); >>        if (IS_ERR(vaddr)) { >>                i915_vma_unpin_and_release(&vma, 0); >>                return PTR_ERR(vaddr); >> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c >> b/drivers/gpu/drm/i915/gt/uc/intel_huc.c >> index 2126dd81ac38..56d2144dc6a0 100644 >> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c >> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c >> @@ -82,7 +82,9 @@ static int intel_huc_rsa_data_create(struct >> intel_huc *huc) >>        if (IS_ERR(vma)) >>                return PTR_ERR(vma); >> >> -    vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >> I915_MAP_WB); >> +    vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >> + i915_coherent_map_type(gt->i915, >> + vma->obj, true)); >>        if (IS_ERR(vaddr)) { >>                i915_vma_unpin_and_release(&vma, 0); >>                
return PTR_ERR(vaddr); >> diff --git a/drivers/gpu/drm/i915/i915_drv.h >> b/drivers/gpu/drm/i915/i915_drv.h >> index 69e43bf91a15..2abbc06712a4 100644 >> --- a/drivers/gpu/drm/i915/i915_drv.h >> +++ b/drivers/gpu/drm/i915/i915_drv.h >> @@ -78,6 +78,7 @@ >>    #include "gem/i915_gem_context_types.h" >>    #include "gem/i915_gem_shrinker.h" >>    #include "gem/i915_gem_stolen.h" >> +#include "gem/i915_gem_lmem.h" >> >>    #include "gt/intel_engine.h" >>    #include "gt/intel_gt_types.h" >> @@ -1921,9 +1922,15 @@ static inline int >> intel_hws_csb_write_index(struct drm_i915_private *i915) >>    } >> >>    static inline enum i915_map_type >> -i915_coherent_map_type(struct drm_i915_private *i915) >> +i915_coherent_map_type(struct drm_i915_private *i915, >> +                   struct drm_i915_gem_object *obj, bool >> always_coherent) >>    { >> -    return HAS_LLC(i915) ? I915_MAP_WB : I915_MAP_WC; >> +    if (i915_gem_object_is_lmem(obj)) >> +            return I915_MAP_WC; >> +    if (HAS_LLC(i915) || always_coherent) >> +            return I915_MAP_WB; >> +    else >> +            return I915_MAP_WC; > > Seems this patch is doing two things. > > First it is adding lmem support to this helper by always > returning WC > for lmem objects. > > Secondly it is introducing an idea of "always coherent" in a helper > called i915_coherent_map_type. Could someone explain what is > coherent vs > always coherent? > > And also, why is always coherent happy with WB? Sounds counter > intuitive > to me.
All this does is try to keep the existing behaviour intact, whilst also ensuring that all lmem objects are mapped using only WC, no matter what. The always_coherent=true thing is for the existing places where we sometimes map the object using WB, without first considering whether the device has the fast shared LLC vs snooping. Yes, it's slightly ugly :)
Not fully following - if we had to write kerneldoc for always_coherent input argument - what it would say?
@always_coherent - If true we should always try to map the object using WB. If false we should only map as WB if the device supports the fast shared LLC; in the case of snooped devices we will instead map using WC. Note that if the resource is lmem then we will always map as WC, regardless of the value of always_coherent, since that's all we currently support.
Maybe the naming is poor?
Maybe just confusing to me, not sure yet.
So always_coherent is not about how the callers want to use it, but about platform knowledge? Or a performance concern for LLC vs snooping cases? Does WB work (coherently) on snooping platforms?
The always_coherent=true is for the existing callers that want WB, regardless of LLC vs snooping.
The other callers use the existing i915_coherent_map_type() which only gives out WB for LLC platforms.
AFAIK, LLC vs snooping should offer the same in terms of coherency, but in terms of performance the shared LLC is much faster, and so for snooping platforms we choose to not enable WB everywhere.
On top of that we now have lmem, but for that we only allow WC. This patch just rolls all of that into one helper, while keeping the existing behaviour unchanged.
Thanks. But I am still struggling with the API. :(
Is the introduction of the always_coherent flag even required in the context of DG1? AFAICT for lmem objects the flag is ignored, so no?
If we drop the flag/helper thing, then we need something like:
type = WB; if (i915_gem_object_is_lmem(obj)) type = WC;
vaddr = i915_gem_object_pin_map(obj, type);
In all the places where we currently do:
vaddr = i915_gem_object_pin_map(obj, WB);
Where obj can be lmem, so ctx, ring, guc etc. Is that better or worse? The existing i915_coherent_map_type() callers should work as-is, since DG1 is snooped. And this patch just extends that to cover all cases.
Perhaps we need a new helper instead? Maybe you have a better idea?
Regards,
Tvrtko
On 19/04/2021 15:37, Matthew Auld wrote:
On 19/04/2021 15:07, Tvrtko Ursulin wrote:
On 19/04/2021 12:30, Matthew Auld wrote:
On 15/04/2021 12:05, Tvrtko Ursulin wrote:
On 15/04/2021 10:23, Matthew Auld wrote:
On Thu, 15 Apr 2021 at 09:21, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 14/04/2021 17:20, Matthew Auld wrote: > On Wed, 14 Apr 2021 at 16:22, Tvrtko Ursulin > tvrtko.ursulin@linux.intel.com wrote: >> >> >> On 12/04/2021 10:05, Matthew Auld wrote: >>> From: Venkata Sandeep Dhanalakota >>> venkata.s.dhanalakota@intel.com >>> >>> Determine the possible coherent map type based on object location, >>> and if target has llc or if user requires an always coherent >>> mapping. >>> >>> Cc: Matthew Auld matthew.auld@intel.com >>> Cc: CQ Tang cq.tang@intel.com >>> Suggested-by: Michal Wajdeczko michal.wajdeczko@intel.com >>> Signed-off-by: Venkata Sandeep Dhanalakota >>> venkata.s.dhanalakota@intel.com >>> --- >>>    drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 3 ++- >>>    drivers/gpu/drm/i915/gt/intel_engine_pm.c   | 2 +- >>>    drivers/gpu/drm/i915/gt/intel_lrc.c         | 4 +++- >>>    drivers/gpu/drm/i915/gt/intel_ring.c        | 9 ++++++--- >>>    drivers/gpu/drm/i915/gt/selftest_context.c  | 3 ++- >>>    drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 ++-- >>>    drivers/gpu/drm/i915/gt/selftest_lrc.c      | 4 +++- >>>    drivers/gpu/drm/i915/gt/uc/intel_guc.c      | 4 +++- >>>    drivers/gpu/drm/i915/gt/uc/intel_huc.c      | 4 +++- >>>    drivers/gpu/drm/i915/i915_drv.h             | 11 +++++++++-- >>>    drivers/gpu/drm/i915/selftests/igt_spinner.c | 4 ++-- >>>    11 files changed, 36 insertions(+), 16 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>> index efe935f80c1a..b79568d370f5 100644 >>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>> @@ -664,7 +664,8 @@ static int init_status_page(struct >>> intel_engine_cs *engine) >>>        if (ret) >>>                goto err; >>> >>> -    vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); >>> +    vaddr = i915_gem_object_pin_map(obj, >>> + i915_coherent_map_type(engine->i915, obj, true)); >>>        if (IS_ERR(vaddr)) { >>>                ret = 
PTR_ERR(vaddr); >>>                goto err_unpin; >>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>> b/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>> index 7c9af86fdb1e..47f4397095e5 100644 >>> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>> @@ -23,7 +23,7 @@ static void dbg_poison_ce(struct >>> intel_context *ce) >>> >>>        if (ce->state) { >>>                struct drm_i915_gem_object *obj = ce->state->obj; >>> -            int type = i915_coherent_map_type(ce->engine->i915); >>> +            int type = >>> i915_coherent_map_type(ce->engine->i915, obj, true); >>>                void *map; >>> >>>                if (!i915_gem_object_trylock(obj)) >>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c >>> b/drivers/gpu/drm/i915/gt/intel_lrc.c >>> index e86897cde984..aafe2a4df496 100644 >>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c >>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c >>> @@ -903,7 +903,9 @@ lrc_pre_pin(struct intel_context *ce, >>>        GEM_BUG_ON(!i915_vma_is_pinned(ce->state)); >>> >>>        *vaddr = i915_gem_object_pin_map(ce->state->obj, >>> - i915_coherent_map_type(ce->engine->i915) | >>> + i915_coherent_map_type(ce->engine->i915, >>> + ce->state->obj, >>> + false) | >>>                                         I915_MAP_OVERRIDE); >>> >>>        return PTR_ERR_OR_ZERO(*vaddr); >>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c >>> b/drivers/gpu/drm/i915/gt/intel_ring.c >>> index aee0a77c77e0..3cf6c7e68108 100644 >>> --- a/drivers/gpu/drm/i915/gt/intel_ring.c >>> +++ b/drivers/gpu/drm/i915/gt/intel_ring.c >>> @@ -53,9 +53,12 @@ int intel_ring_pin(struct intel_ring *ring, >>> struct i915_gem_ww_ctx *ww) >>> >>>        if (i915_vma_is_map_and_fenceable(vma)) >>>                addr = (void __force *)i915_vma_pin_iomap(vma); >>> -    else >>> -            addr = i915_gem_object_pin_map(vma->obj, >>> - i915_coherent_map_type(vma->vm->i915)); >>> +    else { >>> +            int type 
= i915_coherent_map_type(vma->vm->i915, >>> vma->obj, false); >>> + >>> +            addr = i915_gem_object_pin_map(vma->obj, type); >>> +    } >>> + >>>        if (IS_ERR(addr)) { >>>                ret = PTR_ERR(addr); >>>                goto err_ring; >>> diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c >>> b/drivers/gpu/drm/i915/gt/selftest_context.c >>> index b9bdd1d23243..26685b927169 100644 >>> --- a/drivers/gpu/drm/i915/gt/selftest_context.c >>> +++ b/drivers/gpu/drm/i915/gt/selftest_context.c >>> @@ -88,7 +88,8 @@ static int __live_context_size(struct >>> intel_engine_cs *engine) >>>                goto err; >>> >>>        vaddr = i915_gem_object_pin_map_unlocked(ce->state->obj, >>> - i915_coherent_map_type(engine->i915)); >>> + i915_coherent_map_type(engine->i915, >>> + ce->state->obj, false)); >>>        if (IS_ERR(vaddr)) { >>>                err = PTR_ERR(vaddr); >>>                intel_context_unpin(ce); >>> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>> b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>> index 746985971c3a..5b63d4df8c93 100644 >>> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>> @@ -69,7 +69,7 @@ static int hang_init(struct hang *h, struct >>> intel_gt *gt) >>>        h->seqno = memset(vaddr, 0xff, PAGE_SIZE); >>> >>>        vaddr = i915_gem_object_pin_map_unlocked(h->obj, >>> - i915_coherent_map_type(gt->i915)); >>> + i915_coherent_map_type(gt->i915, h->obj, false)); >>>        if (IS_ERR(vaddr)) { >>>                err = PTR_ERR(vaddr); >>>                goto err_unpin_hws; >>> @@ -130,7 +130,7 @@ hang_create_request(struct hang *h, struct >>> intel_engine_cs *engine) >>>                return ERR_CAST(obj); >>>        } >>> >>> -    vaddr = i915_gem_object_pin_map_unlocked(obj, >>> i915_coherent_map_type(gt->i915)); >>> +    vaddr = i915_gem_object_pin_map_unlocked(obj, >>> i915_coherent_map_type(gt->i915, obj, false)); >>>        if 
(IS_ERR(vaddr)) { >>>                i915_gem_object_put(obj); >>>                i915_vm_put(vm); >>> diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c >>> b/drivers/gpu/drm/i915/gt/selftest_lrc.c >>> index 85e7df6a5123..d8f6623524e8 100644 >>> --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c >>> +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c >>> @@ -1221,7 +1221,9 @@ static int compare_isolation(struct >>> intel_engine_cs *engine, >>>        } >>> >>>        lrc = i915_gem_object_pin_map_unlocked(ce->state->obj, >>> - i915_coherent_map_type(engine->i915)); >>> + i915_coherent_map_type(engine->i915, >>> + ce->state->obj, >>> + false)); >>>        if (IS_ERR(lrc)) { >>>                err = PTR_ERR(lrc); >>>                goto err_B1; >>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>> b/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>> index 78305b2ec89d..adae04c47aab 100644 >>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>> @@ -682,7 +682,9 @@ int intel_guc_allocate_and_map_vma(struct >>> intel_guc *guc, u32 size, >>>        if (IS_ERR(vma)) >>>                return PTR_ERR(vma); >>> >>> -    vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>> I915_MAP_WB); >>> +    vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>> + i915_coherent_map_type(guc_to_gt(guc)->i915, >>> + vma->obj, true)); >>>        if (IS_ERR(vaddr)) { >>>                i915_vma_unpin_and_release(&vma, 0); >>>                return PTR_ERR(vaddr); >>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>> b/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>> index 2126dd81ac38..56d2144dc6a0 100644 >>> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>> @@ -82,7 +82,9 @@ static int intel_huc_rsa_data_create(struct >>> intel_huc *huc) >>>        if (IS_ERR(vma)) >>>                return PTR_ERR(vma); >>> >>> -    vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>> I915_MAP_WB); >>> +    vaddr = 
i915_gem_object_pin_map_unlocked(vma->obj, >>> + i915_coherent_map_type(gt->i915, >>> + vma->obj, true)); >>>        if (IS_ERR(vaddr)) { >>>                i915_vma_unpin_and_release(&vma, 0); >>>                return PTR_ERR(vaddr); >>> diff --git a/drivers/gpu/drm/i915/i915_drv.h >>> b/drivers/gpu/drm/i915/i915_drv.h >>> index 69e43bf91a15..2abbc06712a4 100644 >>> --- a/drivers/gpu/drm/i915/i915_drv.h >>> +++ b/drivers/gpu/drm/i915/i915_drv.h >>> @@ -78,6 +78,7 @@ >>>    #include "gem/i915_gem_context_types.h" >>>    #include "gem/i915_gem_shrinker.h" >>>    #include "gem/i915_gem_stolen.h" >>> +#include "gem/i915_gem_lmem.h" >>> >>>    #include "gt/intel_engine.h" >>>    #include "gt/intel_gt_types.h" >>> @@ -1921,9 +1922,15 @@ static inline int >>> intel_hws_csb_write_index(struct drm_i915_private *i915) >>>    } >>> >>>    static inline enum i915_map_type >>> -i915_coherent_map_type(struct drm_i915_private *i915) >>> +i915_coherent_map_type(struct drm_i915_private *i915, >>> +                   struct drm_i915_gem_object *obj, bool >>> always_coherent) >>>    { >>> -    return HAS_LLC(i915) ? I915_MAP_WB : I915_MAP_WC; >>> +    if (i915_gem_object_is_lmem(obj)) >>> +            return I915_MAP_WC; >>> +    if (HAS_LLC(i915) || always_coherent) >>> +            return I915_MAP_WB; >>> +    else >>> +            return I915_MAP_WC; >> >> Seems this patch is doing two things. >> >> First it is adding lmem support to this helper by always >> returning WC >> for lmem objects. >> >> Secondly it is introducing an idea of "always coherent" in a helper >> called i915_coherent_map_type. Could someone explain what is >> coherent vs >> always coherent? >> >> And also, why is always coherent happy with WB? Sounds counter >> intuitive >> to me. > > All this does is try to keep the existing behaviour intact, whilst > also ensuring that all lmem objects are mapped using only WC, no > matter what. 
The always_coherent=true thing is for the existing > places > where we sometimes map the object using WB, without first > considering > whether the device has the fast shared LLC vs snooping. Yes, it's > slightly ugly :)
Not fully following - if we had to write kerneldoc for always_coherent input argument - what it would say?
@always_coherent - If true we should always try to map the object using WB. If false we should only map as WB if the device supports the fast shared LLC; in the case of snooped devices we will map using WC. Note that if the resource is lmem then we will always map as WC, regardless of the value of always_coherent, since that's all we currently support.
Maybe the naming is poor?
Maybe just confusing to me, not sure yet.
So always_coherent is not about how the callers wants to use it, but about platform knowledge? Or a performance concern for LLC vs snooping cases? Does WB works (coherently) on snooping platforms?
The always_coherent=true is for the existing callers that want WB, regardless of LLC vs snooping.
The other callers use the existing i915_coherent_map_type() which only gives out WB for LLC platforms.
AFAIK, LLC vs snooping should offer the same in terms of coherency, but in terms of performance the shared LLC is much faster, and so for snooping platforms we choose to not enable WB everywhere.
On top of that we now have lmem, but for that we only allow WC. This patch just rolls all of that into one helper, while keeping the existing behaviour unchanged.
Thanks. But I am still struggling with the API. :(
Is the introduction of always_coherent flag in the context of DG1 required even? AFAICT for lmem objects the flag is ignored so no?
If we drop the flag/helper thing, then we need something like:
type = WB; if (i915_gem_object_is_lmem(obj)) type = WC;
vaddr = i915_gem_object_pin_map(obj, type);
In all the places where we currently do:
vaddr = i915_gem_object_pin_map(obj, WB);
Where obj can be lmem, so ctx, ring, guc etc. Is that better or worse? The existing i915_coherent_map_type() callers should work as-is, since DG1 is snooped. And this patch just extends that to cover all cases.
Perhaps we need a new helper instead? Maybe you have a better idea?
Not yet. Would it make sense to put something in kerneldoc about when callers might choose always_coherent true vs false? In terms of expected usage (frequency, simplicity?) and any rules with regards when callers need to worry about flushing/ordering when there are mixed read and writes?
Regards,
Tvrtko
On 19/04/2021 16:01, Tvrtko Ursulin wrote:
On 19/04/2021 15:37, Matthew Auld wrote:
On 19/04/2021 15:07, Tvrtko Ursulin wrote:
On 19/04/2021 12:30, Matthew Auld wrote:
On 15/04/2021 12:05, Tvrtko Ursulin wrote:
On 15/04/2021 10:23, Matthew Auld wrote:
On Thu, 15 Apr 2021 at 09:21, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote: > > > On 14/04/2021 17:20, Matthew Auld wrote: >> On Wed, 14 Apr 2021 at 16:22, Tvrtko Ursulin >> tvrtko.ursulin@linux.intel.com wrote: >>> >>> >>> On 12/04/2021 10:05, Matthew Auld wrote: >>>> From: Venkata Sandeep Dhanalakota >>>> venkata.s.dhanalakota@intel.com >>>> >>>> Determine the possible coherent map type based on object >>>> location, >>>> and if target has llc or if user requires an always coherent >>>> mapping. >>>> >>>> Cc: Matthew Auld matthew.auld@intel.com >>>> Cc: CQ Tang cq.tang@intel.com >>>> Suggested-by: Michal Wajdeczko michal.wajdeczko@intel.com >>>> Signed-off-by: Venkata Sandeep Dhanalakota >>>> venkata.s.dhanalakota@intel.com >>>> --- >>>>    drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 3 ++- >>>>    drivers/gpu/drm/i915/gt/intel_engine_pm.c   | 2 +- >>>>    drivers/gpu/drm/i915/gt/intel_lrc.c         | 4 +++- >>>>    drivers/gpu/drm/i915/gt/intel_ring.c        | 9 ++++++--- >>>>    drivers/gpu/drm/i915/gt/selftest_context.c  | 3 ++- >>>>    drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 ++-- >>>>    drivers/gpu/drm/i915/gt/selftest_lrc.c      | 4 +++- >>>>    drivers/gpu/drm/i915/gt/uc/intel_guc.c      | 4 +++- >>>>    drivers/gpu/drm/i915/gt/uc/intel_huc.c      | 4 +++- >>>>    drivers/gpu/drm/i915/i915_drv.h             | 11 +++++++++-- >>>>    drivers/gpu/drm/i915/selftests/igt_spinner.c | 4 ++-- >>>>    11 files changed, 36 insertions(+), 16 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>> index efe935f80c1a..b79568d370f5 100644 >>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>> @@ -664,7 +664,8 @@ static int init_status_page(struct >>>> intel_engine_cs *engine) >>>>        if (ret) >>>>                goto err; >>>> >>>> -    vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); >>>> +    vaddr = 
i915_gem_object_pin_map(obj, >>>> + i915_coherent_map_type(engine->i915, obj, true)); >>>>        if (IS_ERR(vaddr)) { >>>>                ret = PTR_ERR(vaddr); >>>>                goto err_unpin; >>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>> b/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>> index 7c9af86fdb1e..47f4397095e5 100644 >>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>> @@ -23,7 +23,7 @@ static void dbg_poison_ce(struct >>>> intel_context *ce) >>>> >>>>        if (ce->state) { >>>>                struct drm_i915_gem_object *obj = ce->state->obj; >>>> -            int type = >>>> i915_coherent_map_type(ce->engine->i915); >>>> +            int type = >>>> i915_coherent_map_type(ce->engine->i915, obj, true); >>>>                void *map; >>>> >>>>                if (!i915_gem_object_trylock(obj)) >>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c >>>> b/drivers/gpu/drm/i915/gt/intel_lrc.c >>>> index e86897cde984..aafe2a4df496 100644 >>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c >>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c >>>> @@ -903,7 +903,9 @@ lrc_pre_pin(struct intel_context *ce, >>>>        GEM_BUG_ON(!i915_vma_is_pinned(ce->state)); >>>> >>>>        *vaddr = i915_gem_object_pin_map(ce->state->obj, >>>> - i915_coherent_map_type(ce->engine->i915) | >>>> + i915_coherent_map_type(ce->engine->i915, >>>> + ce->state->obj, >>>> + false) | >>>>                                         I915_MAP_OVERRIDE); >>>> >>>>        return PTR_ERR_OR_ZERO(*vaddr); >>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c >>>> b/drivers/gpu/drm/i915/gt/intel_ring.c >>>> index aee0a77c77e0..3cf6c7e68108 100644 >>>> --- a/drivers/gpu/drm/i915/gt/intel_ring.c >>>> +++ b/drivers/gpu/drm/i915/gt/intel_ring.c >>>> @@ -53,9 +53,12 @@ int intel_ring_pin(struct intel_ring *ring, >>>> struct i915_gem_ww_ctx *ww) >>>> >>>>        if (i915_vma_is_map_and_fenceable(vma)) >>>>                addr = (void 
__force *)i915_vma_pin_iomap(vma); >>>> -    else >>>> -            addr = i915_gem_object_pin_map(vma->obj, >>>> - i915_coherent_map_type(vma->vm->i915)); >>>> +    else { >>>> +            int type = i915_coherent_map_type(vma->vm->i915, >>>> vma->obj, false); >>>> + >>>> +            addr = i915_gem_object_pin_map(vma->obj, type); >>>> +    } >>>> + >>>>        if (IS_ERR(addr)) { >>>>                ret = PTR_ERR(addr); >>>>                goto err_ring; >>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c >>>> b/drivers/gpu/drm/i915/gt/selftest_context.c >>>> index b9bdd1d23243..26685b927169 100644 >>>> --- a/drivers/gpu/drm/i915/gt/selftest_context.c >>>> +++ b/drivers/gpu/drm/i915/gt/selftest_context.c >>>> @@ -88,7 +88,8 @@ static int __live_context_size(struct >>>> intel_engine_cs *engine) >>>>                goto err; >>>> >>>>        vaddr = i915_gem_object_pin_map_unlocked(ce->state->obj, >>>> - i915_coherent_map_type(engine->i915)); >>>> + i915_coherent_map_type(engine->i915, >>>> + ce->state->obj, false)); >>>>        if (IS_ERR(vaddr)) { >>>>                err = PTR_ERR(vaddr); >>>>                intel_context_unpin(ce); >>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>> b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>> index 746985971c3a..5b63d4df8c93 100644 >>>> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>> @@ -69,7 +69,7 @@ static int hang_init(struct hang *h, struct >>>> intel_gt *gt) >>>>        h->seqno = memset(vaddr, 0xff, PAGE_SIZE); >>>> >>>>        vaddr = i915_gem_object_pin_map_unlocked(h->obj, >>>> - i915_coherent_map_type(gt->i915)); >>>> + i915_coherent_map_type(gt->i915, h->obj, false)); >>>>        if (IS_ERR(vaddr)) { >>>>                err = PTR_ERR(vaddr); >>>>                goto err_unpin_hws; >>>> @@ -130,7 +130,7 @@ hang_create_request(struct hang *h, struct >>>> intel_engine_cs *engine) >>>>                return ERR_CAST(obj); 
>>>>        } >>>> >>>> -    vaddr = i915_gem_object_pin_map_unlocked(obj, >>>> i915_coherent_map_type(gt->i915)); >>>> +    vaddr = i915_gem_object_pin_map_unlocked(obj, >>>> i915_coherent_map_type(gt->i915, obj, false)); >>>>        if (IS_ERR(vaddr)) { >>>>                i915_gem_object_put(obj); >>>>                i915_vm_put(vm); >>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>> b/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>> index 85e7df6a5123..d8f6623524e8 100644 >>>> --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>> +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>> @@ -1221,7 +1221,9 @@ static int compare_isolation(struct >>>> intel_engine_cs *engine, >>>>        } >>>> >>>>        lrc = i915_gem_object_pin_map_unlocked(ce->state->obj, >>>> - i915_coherent_map_type(engine->i915)); >>>> + i915_coherent_map_type(engine->i915, >>>> + ce->state->obj, >>>> + false)); >>>>        if (IS_ERR(lrc)) { >>>>                err = PTR_ERR(lrc); >>>>                goto err_B1; >>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>> index 78305b2ec89d..adae04c47aab 100644 >>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>> @@ -682,7 +682,9 @@ int intel_guc_allocate_and_map_vma(struct >>>> intel_guc *guc, u32 size, >>>>        if (IS_ERR(vma)) >>>>                return PTR_ERR(vma); >>>> >>>> -    vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>>> I915_MAP_WB); >>>> +    vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>>> + i915_coherent_map_type(guc_to_gt(guc)->i915, >>>> + vma->obj, true)); >>>>        if (IS_ERR(vaddr)) { >>>>                i915_vma_unpin_and_release(&vma, 0); >>>>                return PTR_ERR(vaddr); >>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>> b/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>> index 2126dd81ac38..56d2144dc6a0 100644 >>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>> +++ 
b/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>> @@ -82,7 +82,9 @@ static int intel_huc_rsa_data_create(struct >>>> intel_huc *huc) >>>>        if (IS_ERR(vma)) >>>>                return PTR_ERR(vma); >>>> >>>> -    vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>>> I915_MAP_WB); >>>> +    vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>>> + i915_coherent_map_type(gt->i915, >>>> + vma->obj, true)); >>>>        if (IS_ERR(vaddr)) { >>>>                i915_vma_unpin_and_release(&vma, 0); >>>>                return PTR_ERR(vaddr); >>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h >>>> b/drivers/gpu/drm/i915/i915_drv.h >>>> index 69e43bf91a15..2abbc06712a4 100644 >>>> --- a/drivers/gpu/drm/i915/i915_drv.h >>>> +++ b/drivers/gpu/drm/i915/i915_drv.h >>>> @@ -78,6 +78,7 @@ >>>>    #include "gem/i915_gem_context_types.h" >>>>    #include "gem/i915_gem_shrinker.h" >>>>    #include "gem/i915_gem_stolen.h" >>>> +#include "gem/i915_gem_lmem.h" >>>> >>>>    #include "gt/intel_engine.h" >>>>    #include "gt/intel_gt_types.h" >>>> @@ -1921,9 +1922,15 @@ static inline int >>>> intel_hws_csb_write_index(struct drm_i915_private *i915) >>>>    } >>>> >>>>    static inline enum i915_map_type >>>> -i915_coherent_map_type(struct drm_i915_private *i915) >>>> +i915_coherent_map_type(struct drm_i915_private *i915, >>>> +                   struct drm_i915_gem_object *obj, bool >>>> always_coherent) >>>>    { >>>> -    return HAS_LLC(i915) ? I915_MAP_WB : I915_MAP_WC; >>>> +    if (i915_gem_object_is_lmem(obj)) >>>> +            return I915_MAP_WC; >>>> +    if (HAS_LLC(i915) || always_coherent) >>>> +            return I915_MAP_WB; >>>> +    else >>>> +            return I915_MAP_WC; >>> >>> Seems this patch is doing two things. >>> >>> First it is adding lmem support to this helper by always >>> returning WC >>> for lmem objects. >>> >>> Secondly it is introducing an idea of "always coherent" in a >>> helper >>> called i915_coherent_map_type. 
Could someone explain what is >>> coherent vs >>> always coherent? >>> >>> And also, why is always coherent happy with WB? Sounds counter >>> intuitive >>> to me. >> >> All this does is try to keep the existing behaviour intact, whilst >> also ensuring that all lmem objects are mapped using only WC, no >> matter what. The always_coherent=true thing is for the existing >> places >> where we sometimes map the object using WB, without first >> considering >> whether the device has the fast shared LLC vs snooping. Yes, it's >> slightly ugly :) > > Not fully following - if we had to write kerneldoc for > always_coherent > input argument - what it would say?
@always_coherent - If true we should always try to map the object using WB. If false we should only map as WB if the device supports the fast shared LLC; in the case of snooped devices we will map using WC. Note that if the resource is lmem then we will always map as WC, regardless of the value of always_coherent, since that's all we currently support.
Maybe the naming is poor?
Maybe just confusing to me, not sure yet.
So always_coherent is not about how the callers wants to use it, but about platform knowledge? Or a performance concern for LLC vs snooping cases? Does WB works (coherently) on snooping platforms?
The always_coherent=true is for the existing callers that want WB, regardless of LLC vs snooping.
The other callers use the existing i915_coherent_map_type() which only gives out WB for LLC platforms.
AFAIK, LLC vs snooping should offer the same in terms of coherency, but in terms of performance the shared LLC is much faster, and so for snooping platforms we choose to not enable WB everywhere.
On top of that we now have lmem, but for that we only allow WC. This patch just rolls all of that into one helper, while keeping the existing behaviour unchanged.
Thanks. But I am still struggling with the API. :(
Is the introduction of always_coherent flag in the context of DG1 required even? AFAICT for lmem objects the flag is ignored so no?
If we drop the flag/helper thing, then we need something like:
type = WB; if (i915_gem_object_is_lmem(obj)) type = WC;
vaddr = i915_gem_object_pin_map(obj, type);
In all the places where we currently do:
vaddr = i915_gem_object_pin_map(obj, WB);
Where obj can be lmem, so ctx, ring, guc etc. Is that better or worse? The existing i915_coherent_map_type() callers should work as-is, since DG1 is snooped. And this patch just extends that to cover all cases.
Perhaps we need a new helper instead? Maybe you have a better idea?
Not yet. Would it make sense to put something in kerneldoc about when callers might choose always_coherent true vs false? In terms of expected usage (frequency, simplicity?) and any rules with regards when callers need to worry about flushing/ordering when there are mixed read and writes?
Hmmm, looking at this again, maybe for now we should just go with:
type = WB; if (i915_gem_object_is_lmem(obj)) type = WC;
vaddr = i915_gem_object_pin_map(obj, type)
Which is way less confusing, plus there are only a handful of places where we need this, so doesn't seem too bad?
Alternatively, we could wrap that in something like:
/* Returns WB for system memory, or WC for local memory */ void *i915_gem_object_pin_map_default(obj);
Thoughts?
Regards,
Tvrtko
On 21/04/2021 12:42, Matthew Auld wrote:
On 19/04/2021 16:01, Tvrtko Ursulin wrote:
On 19/04/2021 15:37, Matthew Auld wrote:
On 19/04/2021 15:07, Tvrtko Ursulin wrote:
On 19/04/2021 12:30, Matthew Auld wrote:
On 15/04/2021 12:05, Tvrtko Ursulin wrote:
On 15/04/2021 10:23, Matthew Auld wrote: > On Thu, 15 Apr 2021 at 09:21, Tvrtko Ursulin > tvrtko.ursulin@linux.intel.com wrote: >> >> >> On 14/04/2021 17:20, Matthew Auld wrote: >>> On Wed, 14 Apr 2021 at 16:22, Tvrtko Ursulin >>> tvrtko.ursulin@linux.intel.com wrote: >>>> >>>> >>>> On 12/04/2021 10:05, Matthew Auld wrote: >>>>> From: Venkata Sandeep Dhanalakota >>>>> venkata.s.dhanalakota@intel.com >>>>> >>>>> Determine the possible coherent map type based on object >>>>> location, >>>>> and if target has llc or if user requires an always coherent >>>>> mapping. >>>>> >>>>> Cc: Matthew Auld matthew.auld@intel.com >>>>> Cc: CQ Tang cq.tang@intel.com >>>>> Suggested-by: Michal Wajdeczko michal.wajdeczko@intel.com >>>>> Signed-off-by: Venkata Sandeep Dhanalakota >>>>> venkata.s.dhanalakota@intel.com >>>>> --- >>>>>    drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 3 ++- >>>>>    drivers/gpu/drm/i915/gt/intel_engine_pm.c   | 2 +- >>>>>    drivers/gpu/drm/i915/gt/intel_lrc.c         | 4 +++- >>>>>    drivers/gpu/drm/i915/gt/intel_ring.c        | 9 ++++++--- >>>>>    drivers/gpu/drm/i915/gt/selftest_context.c  | 3 ++- >>>>>    drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 ++-- >>>>>    drivers/gpu/drm/i915/gt/selftest_lrc.c      | 4 +++- >>>>>    drivers/gpu/drm/i915/gt/uc/intel_guc.c      | 4 +++- >>>>>    drivers/gpu/drm/i915/gt/uc/intel_huc.c      | 4 +++- >>>>>    drivers/gpu/drm/i915/i915_drv.h             | 11 >>>>> +++++++++-- >>>>>    drivers/gpu/drm/i915/selftests/igt_spinner.c | 4 ++-- >>>>>    11 files changed, 36 insertions(+), 16 deletions(-) >>>>> >>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>>> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>>> index efe935f80c1a..b79568d370f5 100644 >>>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>>> @@ -664,7 +664,8 @@ static int init_status_page(struct >>>>> intel_engine_cs *engine) >>>>>        if (ret) >>>>>                goto err; 
>>>>> >>>>> -    vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); >>>>> +    vaddr = i915_gem_object_pin_map(obj, >>>>> + i915_coherent_map_type(engine->i915, obj, true)); >>>>>        if (IS_ERR(vaddr)) { >>>>>                ret = PTR_ERR(vaddr); >>>>>                goto err_unpin; >>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>>> b/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>>> index 7c9af86fdb1e..47f4397095e5 100644 >>>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>>> @@ -23,7 +23,7 @@ static void dbg_poison_ce(struct >>>>> intel_context *ce) >>>>> >>>>>        if (ce->state) { >>>>>                struct drm_i915_gem_object *obj = >>>>> ce->state->obj; >>>>> -            int type = >>>>> i915_coherent_map_type(ce->engine->i915); >>>>> +            int type = >>>>> i915_coherent_map_type(ce->engine->i915, obj, true); >>>>>                void *map; >>>>> >>>>>                if (!i915_gem_object_trylock(obj)) >>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c >>>>> b/drivers/gpu/drm/i915/gt/intel_lrc.c >>>>> index e86897cde984..aafe2a4df496 100644 >>>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c >>>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c >>>>> @@ -903,7 +903,9 @@ lrc_pre_pin(struct intel_context *ce, >>>>>        GEM_BUG_ON(!i915_vma_is_pinned(ce->state)); >>>>> >>>>>        *vaddr = i915_gem_object_pin_map(ce->state->obj, >>>>> - i915_coherent_map_type(ce->engine->i915) | >>>>> + i915_coherent_map_type(ce->engine->i915, >>>>> + ce->state->obj, >>>>> + false) | >>>>>                                         I915_MAP_OVERRIDE); >>>>> >>>>>        return PTR_ERR_OR_ZERO(*vaddr); >>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c >>>>> b/drivers/gpu/drm/i915/gt/intel_ring.c >>>>> index aee0a77c77e0..3cf6c7e68108 100644 >>>>> --- a/drivers/gpu/drm/i915/gt/intel_ring.c >>>>> +++ b/drivers/gpu/drm/i915/gt/intel_ring.c >>>>> @@ -53,9 +53,12 @@ int intel_ring_pin(struct 
intel_ring >>>>> *ring, struct i915_gem_ww_ctx *ww) >>>>> >>>>>        if (i915_vma_is_map_and_fenceable(vma)) >>>>>                addr = (void __force *)i915_vma_pin_iomap(vma); >>>>> -    else >>>>> -            addr = i915_gem_object_pin_map(vma->obj, >>>>> - i915_coherent_map_type(vma->vm->i915)); >>>>> +    else { >>>>> +            int type = >>>>> i915_coherent_map_type(vma->vm->i915, vma->obj, false); >>>>> + >>>>> +            addr = i915_gem_object_pin_map(vma->obj, type); >>>>> +    } >>>>> + >>>>>        if (IS_ERR(addr)) { >>>>>                ret = PTR_ERR(addr); >>>>>                goto err_ring; >>>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c >>>>> b/drivers/gpu/drm/i915/gt/selftest_context.c >>>>> index b9bdd1d23243..26685b927169 100644 >>>>> --- a/drivers/gpu/drm/i915/gt/selftest_context.c >>>>> +++ b/drivers/gpu/drm/i915/gt/selftest_context.c >>>>> @@ -88,7 +88,8 @@ static int __live_context_size(struct >>>>> intel_engine_cs *engine) >>>>>                goto err; >>>>> >>>>>        vaddr = i915_gem_object_pin_map_unlocked(ce->state->obj, >>>>> - i915_coherent_map_type(engine->i915)); >>>>> + i915_coherent_map_type(engine->i915, >>>>> + ce->state->obj, false)); >>>>>        if (IS_ERR(vaddr)) { >>>>>                err = PTR_ERR(vaddr); >>>>>                intel_context_unpin(ce); >>>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>>> b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>>> index 746985971c3a..5b63d4df8c93 100644 >>>>> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>>> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>>> @@ -69,7 +69,7 @@ static int hang_init(struct hang *h, struct >>>>> intel_gt *gt) >>>>>        h->seqno = memset(vaddr, 0xff, PAGE_SIZE); >>>>> >>>>>        vaddr = i915_gem_object_pin_map_unlocked(h->obj, >>>>> - i915_coherent_map_type(gt->i915)); >>>>> + i915_coherent_map_type(gt->i915, h->obj, false)); >>>>>        if (IS_ERR(vaddr)) { >>>>>                err = 
PTR_ERR(vaddr); >>>>>                goto err_unpin_hws; >>>>> @@ -130,7 +130,7 @@ hang_create_request(struct hang *h, >>>>> struct intel_engine_cs *engine) >>>>>                return ERR_CAST(obj); >>>>>        } >>>>> >>>>> -    vaddr = i915_gem_object_pin_map_unlocked(obj, >>>>> i915_coherent_map_type(gt->i915)); >>>>> +    vaddr = i915_gem_object_pin_map_unlocked(obj, >>>>> i915_coherent_map_type(gt->i915, obj, false)); >>>>>        if (IS_ERR(vaddr)) { >>>>>                i915_gem_object_put(obj); >>>>>                i915_vm_put(vm); >>>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>>> b/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>>> index 85e7df6a5123..d8f6623524e8 100644 >>>>> --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>>> +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>>> @@ -1221,7 +1221,9 @@ static int compare_isolation(struct >>>>> intel_engine_cs *engine, >>>>>        } >>>>> >>>>>        lrc = i915_gem_object_pin_map_unlocked(ce->state->obj, >>>>> - i915_coherent_map_type(engine->i915)); >>>>> + i915_coherent_map_type(engine->i915, >>>>> + ce->state->obj, >>>>> + false)); >>>>>        if (IS_ERR(lrc)) { >>>>>                err = PTR_ERR(lrc); >>>>>                goto err_B1; >>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>>> index 78305b2ec89d..adae04c47aab 100644 >>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>>> @@ -682,7 +682,9 @@ int intel_guc_allocate_and_map_vma(struct >>>>> intel_guc *guc, u32 size, >>>>>        if (IS_ERR(vma)) >>>>>                return PTR_ERR(vma); >>>>> >>>>> -    vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>>>> I915_MAP_WB); >>>>> +    vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>>>> + i915_coherent_map_type(guc_to_gt(guc)->i915, >>>>> + vma->obj, true)); >>>>>        if (IS_ERR(vaddr)) { >>>>>                i915_vma_unpin_and_release(&vma, 0); >>>>>                return 
PTR_ERR(vaddr); >>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>>> b/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>>> index 2126dd81ac38..56d2144dc6a0 100644 >>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>>> @@ -82,7 +82,9 @@ static int intel_huc_rsa_data_create(struct >>>>> intel_huc *huc) >>>>>        if (IS_ERR(vma)) >>>>>                return PTR_ERR(vma); >>>>> >>>>> -    vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>>>> I915_MAP_WB); >>>>> +    vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>>>> + i915_coherent_map_type(gt->i915, >>>>> + vma->obj, true)); >>>>>        if (IS_ERR(vaddr)) { >>>>>                i915_vma_unpin_and_release(&vma, 0); >>>>>                return PTR_ERR(vaddr); >>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h >>>>> b/drivers/gpu/drm/i915/i915_drv.h >>>>> index 69e43bf91a15..2abbc06712a4 100644 >>>>> --- a/drivers/gpu/drm/i915/i915_drv.h >>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h >>>>> @@ -78,6 +78,7 @@ >>>>>    #include "gem/i915_gem_context_types.h" >>>>>    #include "gem/i915_gem_shrinker.h" >>>>>    #include "gem/i915_gem_stolen.h" >>>>> +#include "gem/i915_gem_lmem.h" >>>>> >>>>>    #include "gt/intel_engine.h" >>>>>    #include "gt/intel_gt_types.h" >>>>> @@ -1921,9 +1922,15 @@ static inline int >>>>> intel_hws_csb_write_index(struct drm_i915_private *i915) >>>>>    } >>>>> >>>>>    static inline enum i915_map_type >>>>> -i915_coherent_map_type(struct drm_i915_private *i915) >>>>> +i915_coherent_map_type(struct drm_i915_private *i915, >>>>> +                   struct drm_i915_gem_object *obj, bool >>>>> always_coherent) >>>>>    { >>>>> -    return HAS_LLC(i915) ? 
I915_MAP_WB : I915_MAP_WC; >>>>> +    if (i915_gem_object_is_lmem(obj)) >>>>> +            return I915_MAP_WC; >>>>> +    if (HAS_LLC(i915) || always_coherent) >>>>> +            return I915_MAP_WB; >>>>> +    else >>>>> +            return I915_MAP_WC; >>>> >>>> Seems this patch is doing two things. >>>> >>>> First it is adding lmem support to this helper by always >>>> returning WC >>>> for lmem objects. >>>> >>>> Secondly it is introducing an idea of "always coherent" in a >>>> helper >>>> called i915_coherent_map_type. Could someone explain what is >>>> coherent vs >>>> always coherent? >>>> >>>> And also, why is always coherent happy with WB? Sounds counter >>>> intuitive >>>> to me. >>> >>> All this does is try to keep the existing behaviour intact, whilst >>> also ensuring that all lmem objects are mapped using only WC, no >>> matter what. The always_coherent=true thing is for the existing >>> places >>> where we sometimes map the object using WB, without first >>> considering >>> whether the device has the fast shared LLC vs snooping. Yes, it's >>> slightly ugly :) >> >> Not fully following - if we had to write kerneldoc for >> always_coherent >> input argument - what it would say? > > @always_coherent - If true we should always try to map the object > using WB. If false we should only map as WB if the device > supports the > fast shared LLC, in the case of snooped devices we will map use WC. > Note that If the resource is lmem then we will always map as WC, > regardless of the value of always_coherent, since that's all we > currently support. > > Maybe the naming is poor?
Maybe just confusing to me, not sure yet.
So always_coherent is not about how the callers wants to use it, but about platform knowledge? Or a performance concern for LLC vs snooping cases? Does WB works (coherently) on snooping platforms?
The always_coherent=true is for the existing callers that want WB, regardless of LLC vs snooping.
The other callers use the existing i915_coherent_map_type() which only gives out WB for LLC platforms.
AFAIK, LLC vs snooping should offer the same in terms of coherency, but in terms of performance the shared LLC is much faster, and so for snooping platforms we choose to not enable WB everywhere.
On top of that we now have lmem, but for that we only allow WC. This patch just rolls all of that into one helper, while keeping the existing behaviour unchanged.
Thanks. But I am still struggling with the API. :(
Is the introduction of always_coherent flag in the context of DG1 required even? AFAICT for lmem objects the flag is ignored so no?
If we drop the flag/helper thing, then we need something like:
type = WB; if (i915_gem_object_is_lmem(obj)) type = WC;
vaddr = i915_gem_object_pin_map(obj, type);
In all the places where we currently do:
vaddr = i915_gem_object_pin_map(obj, WB);
Where obj can be lmem, so ctx, ring, guc etc. Is that better or worse? The existing i915_coherent_map_type() callers should work as-is, since DG1 is snooped. And this patch just extends that to cover all cases.
Perhaps we need a new helper instead? Maybe you have a better idea?
Not yet. Would it make sense to put something in kerneldoc about when callers might choose always_coherent true vs false? In terms of expected usage (frequency, simplicity?) and any rules with regards when callers need to worry about flushing/ordering when there are mixed read and writes?
Hmmm, looking at this again, maybe for now we should just go with:
type = WB; if (i915_gem_object_is_lmem(obj)) type = WC;
vaddr = i915_gem_object_pin_map(obj, type)
Which is way less confusing, plus there are only a handful of places where we need this, so doesn't seem too bad?
Alternatively, we could wrap that in something like:
/* Returns WB for system memory, or WC for local memory */ void *i915_gem_object_pin_map_default(obj);
Thoughts?
I went and looked at the use sites to try and figure it out.
First thing, the bool always_coherent story is only relevant when we decide to place some object in system memory. Otherwise mapping is always WC so I guess our code needs to handle it anyway. Well, if the assumption is that we can change the location of the objects and it all just keeps working? Or that is not the goal?
Let see about the users (ignoring selftests):
1) lrc_reg_state and ring; always_coherent=false
Update frequency medium and mostly write from the CPU side.
They say always_coherent=false - which means they have to handle being given a WC mapping anyway.
What is the benefit of ever selecting WB here?
2) Engine status page; always_coherent=true
Frequently read and written from the CPU and GPU so cost of snooping is therefore fine? Apart from having to be ready to deal with WC anyway.
3) dbg_poison_ce; always_coherent=true
Writes to lrc_reg_state once - meh. Could just as well always ask for WC.
4) intel_guc_allocate_and_map_vma; always_coherent=true
This one has three users:
a) guc_stage_desc_pool_create stage_desc_pool_vaddr
This one seems write once at init.
b) intel_guc_ct_init
Use for CT communication so similar to CSB on engine status page in principle. But code also has to deal with WC when object is in lmem.
c) intel_guc_ads_create
CPU appears to only write on init and GPU reset.
5) intel_huc_rsa_data_create; always_coherent=true
Called from intel_huc_init so it appears write once from CPU. Not sure why it would need a coherent mapping if that is correct.
I think this exercise left me equally confused. Because flushing and read-write ordering rules are different between WB and WC. And code which accesses all these mappings either has to know which one is in use, or does not care. For the latter case we have to be sure about that for every path.
The write on init / reset ones are easy enough and it doesn't really matter for them to use the coherent helper.
Lrc_reg_state as well I think can be WC with explicit flushing - it has to on lmem, no?
This leaves the status page (CSB, etc) and GuC CT. Those are frequent R/W but also code has to be able to handle WC so what is the benefit of WB? Does it end up faster than if it was WC, considering explicit flushes/barriers are still in there?
Regards,
Tvrtko
On Wed, 21 Apr 2021 at 16:41, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 21/04/2021 12:42, Matthew Auld wrote:
On 19/04/2021 16:01, Tvrtko Ursulin wrote:
On 19/04/2021 15:37, Matthew Auld wrote:
On 19/04/2021 15:07, Tvrtko Ursulin wrote:
On 19/04/2021 12:30, Matthew Auld wrote:
On 15/04/2021 12:05, Tvrtko Ursulin wrote: > > On 15/04/2021 10:23, Matthew Auld wrote: >> On Thu, 15 Apr 2021 at 09:21, Tvrtko Ursulin >> tvrtko.ursulin@linux.intel.com wrote: >>> >>> >>> On 14/04/2021 17:20, Matthew Auld wrote: >>>> On Wed, 14 Apr 2021 at 16:22, Tvrtko Ursulin >>>> tvrtko.ursulin@linux.intel.com wrote: >>>>> >>>>> >>>>> On 12/04/2021 10:05, Matthew Auld wrote: >>>>>> From: Venkata Sandeep Dhanalakota >>>>>> venkata.s.dhanalakota@intel.com >>>>>> >>>>>> Determine the possible coherent map type based on object >>>>>> location, >>>>>> and if target has llc or if user requires an always coherent >>>>>> mapping. >>>>>> >>>>>> Cc: Matthew Auld matthew.auld@intel.com >>>>>> Cc: CQ Tang cq.tang@intel.com >>>>>> Suggested-by: Michal Wajdeczko michal.wajdeczko@intel.com >>>>>> Signed-off-by: Venkata Sandeep Dhanalakota >>>>>> venkata.s.dhanalakota@intel.com >>>>>> --- >>>>>> drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 ++- >>>>>> drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- >>>>>> drivers/gpu/drm/i915/gt/intel_lrc.c | 4 +++- >>>>>> drivers/gpu/drm/i915/gt/intel_ring.c | 9 ++++++--- >>>>>> drivers/gpu/drm/i915/gt/selftest_context.c | 3 ++- >>>>>> drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 ++-- >>>>>> drivers/gpu/drm/i915/gt/selftest_lrc.c | 4 +++- >>>>>> drivers/gpu/drm/i915/gt/uc/intel_guc.c | 4 +++- >>>>>> drivers/gpu/drm/i915/gt/uc/intel_huc.c | 4 +++- >>>>>> drivers/gpu/drm/i915/i915_drv.h | 11 >>>>>> +++++++++-- >>>>>> drivers/gpu/drm/i915/selftests/igt_spinner.c | 4 ++-- >>>>>> 11 files changed, 36 insertions(+), 16 deletions(-) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>>>> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>>>> index efe935f80c1a..b79568d370f5 100644 >>>>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>>>> @@ -664,7 +664,8 @@ static int init_status_page(struct >>>>>> intel_engine_cs *engine) >>>>>> if (ret) >>>>>> goto err; >>>>>> >>>>>> - 
vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); >>>>>> + vaddr = i915_gem_object_pin_map(obj, >>>>>> + i915_coherent_map_type(engine->i915, obj, true)); >>>>>> if (IS_ERR(vaddr)) { >>>>>> ret = PTR_ERR(vaddr); >>>>>> goto err_unpin; >>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>>>> b/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>>>> index 7c9af86fdb1e..47f4397095e5 100644 >>>>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>>>> @@ -23,7 +23,7 @@ static void dbg_poison_ce(struct >>>>>> intel_context *ce) >>>>>> >>>>>> if (ce->state) { >>>>>> struct drm_i915_gem_object *obj = >>>>>> ce->state->obj; >>>>>> - int type = >>>>>> i915_coherent_map_type(ce->engine->i915); >>>>>> + int type = >>>>>> i915_coherent_map_type(ce->engine->i915, obj, true); >>>>>> void *map; >>>>>> >>>>>> if (!i915_gem_object_trylock(obj)) >>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c >>>>>> b/drivers/gpu/drm/i915/gt/intel_lrc.c >>>>>> index e86897cde984..aafe2a4df496 100644 >>>>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c >>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c >>>>>> @@ -903,7 +903,9 @@ lrc_pre_pin(struct intel_context *ce, >>>>>> GEM_BUG_ON(!i915_vma_is_pinned(ce->state)); >>>>>> >>>>>> *vaddr = i915_gem_object_pin_map(ce->state->obj, >>>>>> - i915_coherent_map_type(ce->engine->i915) | >>>>>> + i915_coherent_map_type(ce->engine->i915, >>>>>> + ce->state->obj, >>>>>> + false) | >>>>>> I915_MAP_OVERRIDE); >>>>>> >>>>>> return PTR_ERR_OR_ZERO(*vaddr); >>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c >>>>>> b/drivers/gpu/drm/i915/gt/intel_ring.c >>>>>> index aee0a77c77e0..3cf6c7e68108 100644 >>>>>> --- a/drivers/gpu/drm/i915/gt/intel_ring.c >>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_ring.c >>>>>> @@ -53,9 +53,12 @@ int intel_ring_pin(struct intel_ring >>>>>> *ring, struct i915_gem_ww_ctx *ww) >>>>>> >>>>>> if (i915_vma_is_map_and_fenceable(vma)) >>>>>> addr = (void __force 
*)i915_vma_pin_iomap(vma); >>>>>> - else >>>>>> - addr = i915_gem_object_pin_map(vma->obj, >>>>>> - i915_coherent_map_type(vma->vm->i915)); >>>>>> + else { >>>>>> + int type = >>>>>> i915_coherent_map_type(vma->vm->i915, vma->obj, false); >>>>>> + >>>>>> + addr = i915_gem_object_pin_map(vma->obj, type); >>>>>> + } >>>>>> + >>>>>> if (IS_ERR(addr)) { >>>>>> ret = PTR_ERR(addr); >>>>>> goto err_ring; >>>>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c >>>>>> b/drivers/gpu/drm/i915/gt/selftest_context.c >>>>>> index b9bdd1d23243..26685b927169 100644 >>>>>> --- a/drivers/gpu/drm/i915/gt/selftest_context.c >>>>>> +++ b/drivers/gpu/drm/i915/gt/selftest_context.c >>>>>> @@ -88,7 +88,8 @@ static int __live_context_size(struct >>>>>> intel_engine_cs *engine) >>>>>> goto err; >>>>>> >>>>>> vaddr = i915_gem_object_pin_map_unlocked(ce->state->obj, >>>>>> - i915_coherent_map_type(engine->i915)); >>>>>> + i915_coherent_map_type(engine->i915, >>>>>> + ce->state->obj, false)); >>>>>> if (IS_ERR(vaddr)) { >>>>>> err = PTR_ERR(vaddr); >>>>>> intel_context_unpin(ce); >>>>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>>>> b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>>>> index 746985971c3a..5b63d4df8c93 100644 >>>>>> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>>>> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>>>> @@ -69,7 +69,7 @@ static int hang_init(struct hang *h, struct >>>>>> intel_gt *gt) >>>>>> h->seqno = memset(vaddr, 0xff, PAGE_SIZE); >>>>>> >>>>>> vaddr = i915_gem_object_pin_map_unlocked(h->obj, >>>>>> - i915_coherent_map_type(gt->i915)); >>>>>> + i915_coherent_map_type(gt->i915, h->obj, false)); >>>>>> if (IS_ERR(vaddr)) { >>>>>> err = PTR_ERR(vaddr); >>>>>> goto err_unpin_hws; >>>>>> @@ -130,7 +130,7 @@ hang_create_request(struct hang *h, >>>>>> struct intel_engine_cs *engine) >>>>>> return ERR_CAST(obj); >>>>>> } >>>>>> >>>>>> - vaddr = i915_gem_object_pin_map_unlocked(obj, >>>>>> i915_coherent_map_type(gt->i915)); >>>>>> 
+ vaddr = i915_gem_object_pin_map_unlocked(obj, >>>>>> i915_coherent_map_type(gt->i915, obj, false)); >>>>>> if (IS_ERR(vaddr)) { >>>>>> i915_gem_object_put(obj); >>>>>> i915_vm_put(vm); >>>>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>>>> b/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>>>> index 85e7df6a5123..d8f6623524e8 100644 >>>>>> --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>>>> +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>>>> @@ -1221,7 +1221,9 @@ static int compare_isolation(struct >>>>>> intel_engine_cs *engine, >>>>>> } >>>>>> >>>>>> lrc = i915_gem_object_pin_map_unlocked(ce->state->obj, >>>>>> - i915_coherent_map_type(engine->i915)); >>>>>> + i915_coherent_map_type(engine->i915, >>>>>> + ce->state->obj, >>>>>> + false)); >>>>>> if (IS_ERR(lrc)) { >>>>>> err = PTR_ERR(lrc); >>>>>> goto err_B1; >>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>>>> index 78305b2ec89d..adae04c47aab 100644 >>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>>>> @@ -682,7 +682,9 @@ int intel_guc_allocate_and_map_vma(struct >>>>>> intel_guc *guc, u32 size, >>>>>> if (IS_ERR(vma)) >>>>>> return PTR_ERR(vma); >>>>>> >>>>>> - vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>>>>> I915_MAP_WB); >>>>>> + vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>>>>> + i915_coherent_map_type(guc_to_gt(guc)->i915, >>>>>> + vma->obj, true)); >>>>>> if (IS_ERR(vaddr)) { >>>>>> i915_vma_unpin_and_release(&vma, 0); >>>>>> return PTR_ERR(vaddr); >>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>>>> b/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>>>> index 2126dd81ac38..56d2144dc6a0 100644 >>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>>>> @@ -82,7 +82,9 @@ static int intel_huc_rsa_data_create(struct >>>>>> intel_huc *huc) >>>>>> if (IS_ERR(vma)) >>>>>> return PTR_ERR(vma); >>>>>> >>>>>> - vaddr = 
i915_gem_object_pin_map_unlocked(vma->obj, >>>>>> I915_MAP_WB); >>>>>> + vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>>>>> + i915_coherent_map_type(gt->i915, >>>>>> + vma->obj, true)); >>>>>> if (IS_ERR(vaddr)) { >>>>>> i915_vma_unpin_and_release(&vma, 0); >>>>>> return PTR_ERR(vaddr); >>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h >>>>>> b/drivers/gpu/drm/i915/i915_drv.h >>>>>> index 69e43bf91a15..2abbc06712a4 100644 >>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h >>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h >>>>>> @@ -78,6 +78,7 @@ >>>>>> #include "gem/i915_gem_context_types.h" >>>>>> #include "gem/i915_gem_shrinker.h" >>>>>> #include "gem/i915_gem_stolen.h" >>>>>> +#include "gem/i915_gem_lmem.h" >>>>>> >>>>>> #include "gt/intel_engine.h" >>>>>> #include "gt/intel_gt_types.h" >>>>>> @@ -1921,9 +1922,15 @@ static inline int >>>>>> intel_hws_csb_write_index(struct drm_i915_private *i915) >>>>>> } >>>>>> >>>>>> static inline enum i915_map_type >>>>>> -i915_coherent_map_type(struct drm_i915_private *i915) >>>>>> +i915_coherent_map_type(struct drm_i915_private *i915, >>>>>> + struct drm_i915_gem_object *obj, bool >>>>>> always_coherent) >>>>>> { >>>>>> - return HAS_LLC(i915) ? I915_MAP_WB : I915_MAP_WC; >>>>>> + if (i915_gem_object_is_lmem(obj)) >>>>>> + return I915_MAP_WC; >>>>>> + if (HAS_LLC(i915) || always_coherent) >>>>>> + return I915_MAP_WB; >>>>>> + else >>>>>> + return I915_MAP_WC; >>>>> >>>>> Seems this patch is doing two things. >>>>> >>>>> First it is adding lmem support to this helper by always >>>>> returning WC >>>>> for lmem objects. >>>>> >>>>> Secondly it is introducing an idea of "always coherent" in a >>>>> helper >>>>> called i915_coherent_map_type. Could someone explain what is >>>>> coherent vs >>>>> always coherent? >>>>> >>>>> And also, why is always coherent happy with WB? Sounds counter >>>>> intuitive >>>>> to me. 
>>>> >>>> All this does is try to keep the existing behaviour intact, whilst >>>> also ensuring that all lmem objects are mapped using only WC, no >>>> matter what. The always_coherent=true thing is for the existing >>>> places >>>> where we sometimes map the object using WB, without first >>>> considering >>>> whether the device has the fast shared LLC vs snooping. Yes, it's >>>> slightly ugly :) >>> >>> Not fully following - if we had to write kerneldoc for >>> always_coherent >>> input argument - what it would say? >> >> @always_coherent - If true we should always try to map the object >> using WB. If false we should only map as WB if the device >> supports the >> fast shared LLC, in the case of snooped devices we will map use WC. >> Note that If the resource is lmem then we will always map as WC, >> regardless of the value of always_coherent, since that's all we >> currently support. >> >> Maybe the naming is poor? > > Maybe just confusing to me, not sure yet. > > So always_coherent is not about how the callers wants to use it, > but about platform knowledge? Or a performance concern for LLC vs > snooping cases? Does WB works (coherently) on snooping platforms?
The always_coherent=true is for the existing callers that want WB, regardless of LLC vs snooping.
The other callers use the existing i915_coherent_map_type() which only gives out WB for LLC platforms.
AFAIK, LLC vs snooping should offer the same in terms of coherency, but in terms of performance the shared LLC is much faster, and so for snooping platforms we choose to not enable WB everywhere.
On top of that we now have lmem, but for that we only allow WC. This patch just rolls all of that into one helper, while keeping the existing behaviour unchanged.
Thanks. But I am still struggling with the API. :(
Is the introduction of always_coherent flag in the context of DG1 required even? AFAICT for lmem objects the flag is ignored so no?
If we drop the flag/helper thing, then we need something like:
type = WB; if (i915_gem_object_is_lmem(obj)) type = WC;
vaddr = i915_gem_object_pin_map(obj, type);
In all the places where we currently do:
vaddr = i915_gem_object_pin_map(obj, WB);
Where obj can be lmem, so ctx, ring, guc etc. Is that better or worse? The existing i915_coherent_map_type() callers should work as-is, since DG1 is snooped. And this patch just extends that to cover all cases.
Perhaps we need a new helper instead? Maybe you have a better idea?
Not yet. Would it make sense to put something in kerneldoc about when callers might choose always_coherent true vs false? In terms of expected usage (frequency, simplicity?) and any rules with regards when callers need to worry about flushing/ordering when there are mixed read and writes?
Hmmm, looking at this again, maybe for now we should just go with:
type = WB; if (i915_gem_object_is_lmem(obj)) type = WC;
vaddr = i915_gem_object_pin_map(obj, type)
Which is way less confusing, plus there are only a handful of places where we need this, so doesn't seem too bad?
Alternatively, we could wrap that in something like:
/* Returns WB for system memory, or WC for local memory */ void *i915_gem_object_pin_map_default(obj);
Thoughts?
I went and looked at the use sites to try and figure it out.
First thing, the bool always_coherent story is only relevant when we decide to place some object in system memory. Otherwise mapping is always WC so I guess our code needs to handle it anyway. Well, if the assumption is that we can change the location of the objects and it all just keeps working? Or that is not the goal?
I guess your concern is that mapping as WC has different semantics, and that might somehow break the caller?
Let see about the users (ignoring selftests):
- lrc_reg_state and ring; always_coherent=false
Update frequency medium and mostly write from the CPU side.
They say always_coherent=false - which means they have to handle being given a WC mapping anyway.
What is the benefit of ever selecting WB here?
- Engine status page; always_coherent=true
Frequently read and written from the CPU and GPU so cost of snooping is therefore fine? Apart from having to be ready to deal with WC anyway.
- dbg_poison_ce; always_coherent=true
Writes to lrc_reg_state once - meh. Could just as well always ask for WC.
- intel_guc_allocate_and_map_vma; always_coherent=true
This one has three users:
a) guc_stage_desc_pool_create stage_desc_pool_vaddr
This one seems write once at init.
b) intel_guc_ct_init
Use for CT communication so similar to CSB on engine status page in principle. But code also has to deal with WC when object is in lmem.
c) intel_guc_ads_create
CPU appears to only write on init and GPU reset.
- intel_huc_rsa_data_create; always_coherent=true
Called from intel_huc_init so it appears write once from CPU. Not sure why it would need a coherent mapping if that is correct.
I think this exercise left me equally confused. Because flushing and read-write ordering rules are different between WB and WC. And code which accesses all these mappings either has to know which one is in use, or does not care. For the latter case we have to be sure about that for every path.
Users of pin_map() are generally meant to call flush_map() where appropriate, which should do the right thing for us. For WC it only needs to flush the wcb. For WB it's more complicated since that depends on if the object is considered coherent or not, if it is then we don't need to do anything, otherwise we need to clflush.
Also note that we If we just map the buffer as WB, that by itself doesn't magically enable snooping for the pages AFAIK. We still have to tell the GPU that these pages are meant to be coherent, which we always do for LLC platforms I think, since the shared LLC is considered fast, whereas on snooping platforms, we don't enable this by default, and have this as CACHE_NONE instead(see shmem_object_init for example), and incur the cost of additional clflushing. Doing an explicit i915_gem_object_set_coherency(I915_CACHE_LLC) I think will mark the object as coherent for us. I think there are also some matching GTT bits for caching.
Also for DG1 you apparently can't disable snooping, as per what Daniel was saying in another thread.
The write on init / reset ones are easy enough and it doesn't really matter for them to use the coherent helper.
Lrc_reg_state as well I think can be WC with explicit flushing - it has to on lmem, no?
I doubt it has to be, since the GPU still just accesses it through the GTT.
This leaves the status page (CSB, etc) and GuC CT. Those are frequent R/W but also code has to be able to handle WC so what is the benefit of WB? Does it end up faster than if it was WC, considering explicit flushes/barriers are still in there?
No idea for GuC, but for the hwsp it's still in system memory, and is WB, even for discrete. Chris measured this to be more performant with our execlists submission path than say just sticking it in lmem, and mapping it as WC.
Regards,
Tvrtko
On Wed, 21 Apr 2021 at 20:13, Matthew Auld matthew.william.auld@gmail.com wrote:
On Wed, 21 Apr 2021 at 16:41, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 21/04/2021 12:42, Matthew Auld wrote:
On 19/04/2021 16:01, Tvrtko Ursulin wrote:
On 19/04/2021 15:37, Matthew Auld wrote:
On 19/04/2021 15:07, Tvrtko Ursulin wrote:
On 19/04/2021 12:30, Matthew Auld wrote: > On 15/04/2021 12:05, Tvrtko Ursulin wrote: >> >> On 15/04/2021 10:23, Matthew Auld wrote: >>> On Thu, 15 Apr 2021 at 09:21, Tvrtko Ursulin >>> tvrtko.ursulin@linux.intel.com wrote: >>>> >>>> >>>> On 14/04/2021 17:20, Matthew Auld wrote: >>>>> On Wed, 14 Apr 2021 at 16:22, Tvrtko Ursulin >>>>> tvrtko.ursulin@linux.intel.com wrote: >>>>>> >>>>>> >>>>>> On 12/04/2021 10:05, Matthew Auld wrote: >>>>>>> From: Venkata Sandeep Dhanalakota >>>>>>> venkata.s.dhanalakota@intel.com >>>>>>> >>>>>>> Determine the possible coherent map type based on object >>>>>>> location, >>>>>>> and if target has llc or if user requires an always coherent >>>>>>> mapping. >>>>>>> >>>>>>> Cc: Matthew Auld matthew.auld@intel.com >>>>>>> Cc: CQ Tang cq.tang@intel.com >>>>>>> Suggested-by: Michal Wajdeczko michal.wajdeczko@intel.com >>>>>>> Signed-off-by: Venkata Sandeep Dhanalakota >>>>>>> venkata.s.dhanalakota@intel.com >>>>>>> --- >>>>>>> drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 ++- >>>>>>> drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- >>>>>>> drivers/gpu/drm/i915/gt/intel_lrc.c | 4 +++- >>>>>>> drivers/gpu/drm/i915/gt/intel_ring.c | 9 ++++++--- >>>>>>> drivers/gpu/drm/i915/gt/selftest_context.c | 3 ++- >>>>>>> drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 ++-- >>>>>>> drivers/gpu/drm/i915/gt/selftest_lrc.c | 4 +++- >>>>>>> drivers/gpu/drm/i915/gt/uc/intel_guc.c | 4 +++- >>>>>>> drivers/gpu/drm/i915/gt/uc/intel_huc.c | 4 +++- >>>>>>> drivers/gpu/drm/i915/i915_drv.h | 11 >>>>>>> +++++++++-- >>>>>>> drivers/gpu/drm/i915/selftests/igt_spinner.c | 4 ++-- >>>>>>> 11 files changed, 36 insertions(+), 16 deletions(-) >>>>>>> >>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>>>>> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>>>>> index efe935f80c1a..b79568d370f5 100644 >>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>>>>> @@ -664,7 +664,8 @@ static int 
init_status_page(struct >>>>>>> intel_engine_cs *engine) >>>>>>> if (ret) >>>>>>> goto err; >>>>>>> >>>>>>> - vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); >>>>>>> + vaddr = i915_gem_object_pin_map(obj, >>>>>>> + i915_coherent_map_type(engine->i915, obj, true)); >>>>>>> if (IS_ERR(vaddr)) { >>>>>>> ret = PTR_ERR(vaddr); >>>>>>> goto err_unpin; >>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>>>>> b/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>>>>> index 7c9af86fdb1e..47f4397095e5 100644 >>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>>>>> @@ -23,7 +23,7 @@ static void dbg_poison_ce(struct >>>>>>> intel_context *ce) >>>>>>> >>>>>>> if (ce->state) { >>>>>>> struct drm_i915_gem_object *obj = >>>>>>> ce->state->obj; >>>>>>> - int type = >>>>>>> i915_coherent_map_type(ce->engine->i915); >>>>>>> + int type = >>>>>>> i915_coherent_map_type(ce->engine->i915, obj, true); >>>>>>> void *map; >>>>>>> >>>>>>> if (!i915_gem_object_trylock(obj)) >>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c >>>>>>> b/drivers/gpu/drm/i915/gt/intel_lrc.c >>>>>>> index e86897cde984..aafe2a4df496 100644 >>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c >>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c >>>>>>> @@ -903,7 +903,9 @@ lrc_pre_pin(struct intel_context *ce, >>>>>>> GEM_BUG_ON(!i915_vma_is_pinned(ce->state)); >>>>>>> >>>>>>> *vaddr = i915_gem_object_pin_map(ce->state->obj, >>>>>>> - i915_coherent_map_type(ce->engine->i915) | >>>>>>> + i915_coherent_map_type(ce->engine->i915, >>>>>>> + ce->state->obj, >>>>>>> + false) | >>>>>>> I915_MAP_OVERRIDE); >>>>>>> >>>>>>> return PTR_ERR_OR_ZERO(*vaddr); >>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c >>>>>>> b/drivers/gpu/drm/i915/gt/intel_ring.c >>>>>>> index aee0a77c77e0..3cf6c7e68108 100644 >>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_ring.c >>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_ring.c >>>>>>> @@ -53,9 +53,12 @@ int 
intel_ring_pin(struct intel_ring >>>>>>> *ring, struct i915_gem_ww_ctx *ww) >>>>>>> >>>>>>> if (i915_vma_is_map_and_fenceable(vma)) >>>>>>> addr = (void __force *)i915_vma_pin_iomap(vma); >>>>>>> - else >>>>>>> - addr = i915_gem_object_pin_map(vma->obj, >>>>>>> - i915_coherent_map_type(vma->vm->i915)); >>>>>>> + else { >>>>>>> + int type = >>>>>>> i915_coherent_map_type(vma->vm->i915, vma->obj, false); >>>>>>> + >>>>>>> + addr = i915_gem_object_pin_map(vma->obj, type); >>>>>>> + } >>>>>>> + >>>>>>> if (IS_ERR(addr)) { >>>>>>> ret = PTR_ERR(addr); >>>>>>> goto err_ring; >>>>>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c >>>>>>> b/drivers/gpu/drm/i915/gt/selftest_context.c >>>>>>> index b9bdd1d23243..26685b927169 100644 >>>>>>> --- a/drivers/gpu/drm/i915/gt/selftest_context.c >>>>>>> +++ b/drivers/gpu/drm/i915/gt/selftest_context.c >>>>>>> @@ -88,7 +88,8 @@ static int __live_context_size(struct >>>>>>> intel_engine_cs *engine) >>>>>>> goto err; >>>>>>> >>>>>>> vaddr = i915_gem_object_pin_map_unlocked(ce->state->obj, >>>>>>> - i915_coherent_map_type(engine->i915)); >>>>>>> + i915_coherent_map_type(engine->i915, >>>>>>> + ce->state->obj, false)); >>>>>>> if (IS_ERR(vaddr)) { >>>>>>> err = PTR_ERR(vaddr); >>>>>>> intel_context_unpin(ce); >>>>>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>>>>> b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>>>>> index 746985971c3a..5b63d4df8c93 100644 >>>>>>> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>>>>> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>>>>> @@ -69,7 +69,7 @@ static int hang_init(struct hang *h, struct >>>>>>> intel_gt *gt) >>>>>>> h->seqno = memset(vaddr, 0xff, PAGE_SIZE); >>>>>>> >>>>>>> vaddr = i915_gem_object_pin_map_unlocked(h->obj, >>>>>>> - i915_coherent_map_type(gt->i915)); >>>>>>> + i915_coherent_map_type(gt->i915, h->obj, false)); >>>>>>> if (IS_ERR(vaddr)) { >>>>>>> err = PTR_ERR(vaddr); >>>>>>> goto err_unpin_hws; >>>>>>> @@ -130,7 +130,7 @@ 
hang_create_request(struct hang *h, >>>>>>> struct intel_engine_cs *engine) >>>>>>> return ERR_CAST(obj); >>>>>>> } >>>>>>> >>>>>>> - vaddr = i915_gem_object_pin_map_unlocked(obj, >>>>>>> i915_coherent_map_type(gt->i915)); >>>>>>> + vaddr = i915_gem_object_pin_map_unlocked(obj, >>>>>>> i915_coherent_map_type(gt->i915, obj, false)); >>>>>>> if (IS_ERR(vaddr)) { >>>>>>> i915_gem_object_put(obj); >>>>>>> i915_vm_put(vm); >>>>>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>>>>> b/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>>>>> index 85e7df6a5123..d8f6623524e8 100644 >>>>>>> --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>>>>> +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>>>>> @@ -1221,7 +1221,9 @@ static int compare_isolation(struct >>>>>>> intel_engine_cs *engine, >>>>>>> } >>>>>>> >>>>>>> lrc = i915_gem_object_pin_map_unlocked(ce->state->obj, >>>>>>> - i915_coherent_map_type(engine->i915)); >>>>>>> + i915_coherent_map_type(engine->i915, >>>>>>> + ce->state->obj, >>>>>>> + false)); >>>>>>> if (IS_ERR(lrc)) { >>>>>>> err = PTR_ERR(lrc); >>>>>>> goto err_B1; >>>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>>>>> index 78305b2ec89d..adae04c47aab 100644 >>>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>>>>> @@ -682,7 +682,9 @@ int intel_guc_allocate_and_map_vma(struct >>>>>>> intel_guc *guc, u32 size, >>>>>>> if (IS_ERR(vma)) >>>>>>> return PTR_ERR(vma); >>>>>>> >>>>>>> - vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>>>>>> I915_MAP_WB); >>>>>>> + vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>>>>>> + i915_coherent_map_type(guc_to_gt(guc)->i915, >>>>>>> + vma->obj, true)); >>>>>>> if (IS_ERR(vaddr)) { >>>>>>> i915_vma_unpin_and_release(&vma, 0); >>>>>>> return PTR_ERR(vaddr); >>>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>>>>> b/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>>>>> index 2126dd81ac38..56d2144dc6a0 100644 
>>>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>>>>> @@ -82,7 +82,9 @@ static int intel_huc_rsa_data_create(struct >>>>>>> intel_huc *huc) >>>>>>> if (IS_ERR(vma)) >>>>>>> return PTR_ERR(vma); >>>>>>> >>>>>>> - vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>>>>>> I915_MAP_WB); >>>>>>> + vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>>>>>> + i915_coherent_map_type(gt->i915, >>>>>>> + vma->obj, true)); >>>>>>> if (IS_ERR(vaddr)) { >>>>>>> i915_vma_unpin_and_release(&vma, 0); >>>>>>> return PTR_ERR(vaddr); >>>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h >>>>>>> b/drivers/gpu/drm/i915/i915_drv.h >>>>>>> index 69e43bf91a15..2abbc06712a4 100644 >>>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h >>>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h >>>>>>> @@ -78,6 +78,7 @@ >>>>>>> #include "gem/i915_gem_context_types.h" >>>>>>> #include "gem/i915_gem_shrinker.h" >>>>>>> #include "gem/i915_gem_stolen.h" >>>>>>> +#include "gem/i915_gem_lmem.h" >>>>>>> >>>>>>> #include "gt/intel_engine.h" >>>>>>> #include "gt/intel_gt_types.h" >>>>>>> @@ -1921,9 +1922,15 @@ static inline int >>>>>>> intel_hws_csb_write_index(struct drm_i915_private *i915) >>>>>>> } >>>>>>> >>>>>>> static inline enum i915_map_type >>>>>>> -i915_coherent_map_type(struct drm_i915_private *i915) >>>>>>> +i915_coherent_map_type(struct drm_i915_private *i915, >>>>>>> + struct drm_i915_gem_object *obj, bool >>>>>>> always_coherent) >>>>>>> { >>>>>>> - return HAS_LLC(i915) ? I915_MAP_WB : I915_MAP_WC; >>>>>>> + if (i915_gem_object_is_lmem(obj)) >>>>>>> + return I915_MAP_WC; >>>>>>> + if (HAS_LLC(i915) || always_coherent) >>>>>>> + return I915_MAP_WB; >>>>>>> + else >>>>>>> + return I915_MAP_WC; >>>>>> >>>>>> Seems this patch is doing two things. >>>>>> >>>>>> First it is adding lmem support to this helper by always >>>>>> returning WC >>>>>> for lmem objects. 
>>>>>> >>>>>> Secondly it is introducing an idea of "always coherent" in a >>>>>> helper >>>>>> called i915_coherent_map_type. Could someone explain what is >>>>>> coherent vs >>>>>> always coherent? >>>>>> >>>>>> And also, why is always coherent happy with WB? Sounds counter >>>>>> intuitive >>>>>> to me. >>>>> >>>>> All this does is try to keep the existing behaviour intact, whilst >>>>> also ensuring that all lmem objects are mapped using only WC, no >>>>> matter what. The always_coherent=true thing is for the existing >>>>> places >>>>> where we sometimes map the object using WB, without first >>>>> considering >>>>> whether the device has the fast shared LLC vs snooping. Yes, it's >>>>> slightly ugly :) >>>> >>>> Not fully following - if we had to write kerneldoc for >>>> always_coherent >>>> input argument - what it would say? >>> >>> @always_coherent - If true we should always try to map the object >>> using WB. If false we should only map as WB if the device >>> supports the >>> fast shared LLC, in the case of snooped devices we will map use WC. >>> Note that If the resource is lmem then we will always map as WC, >>> regardless of the value of always_coherent, since that's all we >>> currently support. >>> >>> Maybe the naming is poor? >> >> Maybe just confusing to me, not sure yet. >> >> So always_coherent is not about how the callers wants to use it, >> but about platform knowledge? Or a performance concern for LLC vs >> snooping cases? Does WB works (coherently) on snooping platforms? > > The always_coherent=true is for the existing callers that want WB, > regardless of LLC vs snooping. > > The other callers use the existing i915_coherent_map_type() which > only gives out WB for LLC platforms. > > AFAIK, LLC vs snooping should offer the same in terms of coherency, > but in terms of performance the shared LLC is much faster, and so > for snooping platforms we choose to not enable WB everywhere. 
> > On top of that we now have lmem, but for that we only allow WC. > This patch just rolls all of that into one helper, while keeping > the existing behaviour unchanged.
Thanks. But I am still struggling with the API. :(
Is the introduction of always_coherent flag in the context of DG1 required even? AFAICT for lmem objects the flag is ignored so no?
If we drop the flag/helper thing, then we need something like:
type = WB; if (i915_gem_object_is_lmem(obj)) type = WC;
vaddr = i915_gem_object_pin_map(obj, type);
In all the places where we currently do:
vaddr = i915_gem_object_pin_map(obj, WB);
Where obj can be lmem, so ctx, ring, guc etc. Is that better or worse? The existing i915_coherent_map_type() callers should work as-is, since DG1 is snooped. And this patch just extends that to cover all cases.
Perhaps we need a new helper instead? Maybe you have a better idea?
Not yet. Would it make sense to put something in kerneldoc about when callers might choose always_coherent true vs false? In terms of expected usage (frequency, simplicity?) and any rules with regards when callers need to worry about flushing/ordering when there are mixed read and writes?
Hmmm, looking at this again, maybe for now we should just go with:
type = WB; if (i915_gem_object_is_lmem(obj)) type = WC;
vaddr = i915_gem_object_pin_map(obj, type)
Which is way less confusing, plus there are only a handful of places where we need this, so doesn't seem too bad?
Alternatively, we could wrap that in something like:
/* Returns WB for system memory, or WC for local memory */ void *i915_gem_object_pin_map_default(obj);
Thoughts?
I went and looked at the use sites to try and figure it out.
First thing, the bool always_coherent story is only relevant when we decide to place some object in system memory. Otherwise mapping is always WC so I guess our code needs to handle it anyway. Well, if the assumption is that we can change the location of the objects and it all just keeps working? Or that is not the goal?
I guess your concern is that mapping as WC has different semantics, and that might somehow break the caller?
Let's see about the users (ignoring selftests):
- lrc_reg_state and ring; always_coherent=false
Update frequency medium and mostly write from the CPU side.
They say always_coherent=false - which means they have to handle being given a WC mapping anyway.
What is the benefit of ever selecting WB here?
- Engine status page; always_coherent=true
Frequently read and written from the CPU and GPU so cost of snooping is therefore fine? Apart from having to be ready to deal with WC anyway.
- dbg_poison_ce; always_coherent=true
Writes to lrc_reg_state once - meh. Could just as well always ask for WC.
- intel_guc_allocate_and_map_vma; always_coherent=true
This one has three users:
a) guc_stage_desc_pool_create stage_desc_pool_vaddr
This one seems write once at init.
b) intel_guc_ct_init
Use for CT communication so similar to CSB on engine status page in principle. But code also has to deal with WC when object is in lmem.
c) intel_guc_ads_create
CPU appears to only write on init and GPU reset.
- intel_huc_rsa_data_create; always_coherent=true
Called from intel_huc_init so it appears write once from CPU. Not sure why it would need a coherent mapping if that is correct.
I think this exercise left me equally confused. Because flushing and read-write ordering rules are different between WB and WC. And code which accesses all these mappings either has to know which one is in use, or does not care. For the latter case we have to be sure about that for every path.
Users of pin_map() are generally meant to call flush_map() where appropriate, which should do the right thing for us. For WC it only needs to flush the wcb. For WB it's more complicated since that depends on if the object is considered coherent or not, if it is then we don't need to do anything, otherwise we need to clflush.
Also note that if we just map the buffer as WB, that by itself doesn't magically enable snooping for the pages AFAIK. We still have to tell the GPU that these pages are meant to be coherent, which we always do for LLC platforms I think, since the shared LLC is considered fast, whereas on snooping platforms, we don't enable this by default, and have this as CACHE_NONE instead (see shmem_object_init for example), and incur the cost of additional clflushing. Doing an explicit i915_gem_object_set_coherency(I915_CACHE_LLC) I think will mark the object as coherent for us. I think there are also some matching GTT bits for caching.
Also for DG1 you apparently can't disable snooping, as per what Daniel was saying in another thread.
The write on init / reset ones are easy enough and it doesn't really matter for them to use the coherent helper.
Lrc_reg_state as well I think can be WC with explicit flushing - it has to on lmem, no?
I doubt it has to be, since the GPU still just accesses it through the GTT.
This leaves the status page (CSB, etc) and GuC CT. Those are frequent R/W but also code has to be able to handle WC so what is the benefit of WB? It ends up faster than if it was WC, considering explicit flushes/barriers are still in there?
No idea for GuC, but for the hwsp it's still in system memory, and is WB, even for discrete. Chris measured this to be more performant with our execlists submission path than say just sticking it in lmem, and mapping it as WC.
Ping? How should we proceed with this patch?
Regards,
Tvrtko
On 26/04/2021 09:57, Matthew Auld wrote:
On Wed, 21 Apr 2021 at 20:13, Matthew Auld matthew.william.auld@gmail.com wrote:
On Wed, 21 Apr 2021 at 16:41, Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
On 21/04/2021 12:42, Matthew Auld wrote:
On 19/04/2021 16:01, Tvrtko Ursulin wrote:
On 19/04/2021 15:37, Matthew Auld wrote:
On 19/04/2021 15:07, Tvrtko Ursulin wrote: > > On 19/04/2021 12:30, Matthew Auld wrote: >> On 15/04/2021 12:05, Tvrtko Ursulin wrote: >>> >>> On 15/04/2021 10:23, Matthew Auld wrote: >>>> On Thu, 15 Apr 2021 at 09:21, Tvrtko Ursulin >>>> tvrtko.ursulin@linux.intel.com wrote: >>>>> >>>>> >>>>> On 14/04/2021 17:20, Matthew Auld wrote: >>>>>> On Wed, 14 Apr 2021 at 16:22, Tvrtko Ursulin >>>>>> tvrtko.ursulin@linux.intel.com wrote: >>>>>>> >>>>>>> >>>>>>> On 12/04/2021 10:05, Matthew Auld wrote: >>>>>>>> From: Venkata Sandeep Dhanalakota >>>>>>>> venkata.s.dhanalakota@intel.com >>>>>>>> >>>>>>>> Determine the possible coherent map type based on object >>>>>>>> location, >>>>>>>> and if target has llc or if user requires an always coherent >>>>>>>> mapping. >>>>>>>> >>>>>>>> Cc: Matthew Auld matthew.auld@intel.com >>>>>>>> Cc: CQ Tang cq.tang@intel.com >>>>>>>> Suggested-by: Michal Wajdeczko michal.wajdeczko@intel.com >>>>>>>> Signed-off-by: Venkata Sandeep Dhanalakota >>>>>>>> venkata.s.dhanalakota@intel.com >>>>>>>> --- >>>>>>>> drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 ++- >>>>>>>> drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- >>>>>>>> drivers/gpu/drm/i915/gt/intel_lrc.c | 4 +++- >>>>>>>> drivers/gpu/drm/i915/gt/intel_ring.c | 9 ++++++--- >>>>>>>> drivers/gpu/drm/i915/gt/selftest_context.c | 3 ++- >>>>>>>> drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 ++-- >>>>>>>> drivers/gpu/drm/i915/gt/selftest_lrc.c | 4 +++- >>>>>>>> drivers/gpu/drm/i915/gt/uc/intel_guc.c | 4 +++- >>>>>>>> drivers/gpu/drm/i915/gt/uc/intel_huc.c | 4 +++- >>>>>>>> drivers/gpu/drm/i915/i915_drv.h | 11 >>>>>>>> +++++++++-- >>>>>>>> drivers/gpu/drm/i915/selftests/igt_spinner.c | 4 ++-- >>>>>>>> 11 files changed, 36 insertions(+), 16 deletions(-) >>>>>>>> >>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>>>>>> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>>>>>> index efe935f80c1a..b79568d370f5 100644 >>>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>>>>>> +++ 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c >>>>>>>> @@ -664,7 +664,8 @@ static int init_status_page(struct >>>>>>>> intel_engine_cs *engine) >>>>>>>> if (ret) >>>>>>>> goto err; >>>>>>>> >>>>>>>> - vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); >>>>>>>> + vaddr = i915_gem_object_pin_map(obj, >>>>>>>> + i915_coherent_map_type(engine->i915, obj, true)); >>>>>>>> if (IS_ERR(vaddr)) { >>>>>>>> ret = PTR_ERR(vaddr); >>>>>>>> goto err_unpin; >>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>>>>>> b/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>>>>>> index 7c9af86fdb1e..47f4397095e5 100644 >>>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>>>>>> @@ -23,7 +23,7 @@ static void dbg_poison_ce(struct >>>>>>>> intel_context *ce) >>>>>>>> >>>>>>>> if (ce->state) { >>>>>>>> struct drm_i915_gem_object *obj = >>>>>>>> ce->state->obj; >>>>>>>> - int type = >>>>>>>> i915_coherent_map_type(ce->engine->i915); >>>>>>>> + int type = >>>>>>>> i915_coherent_map_type(ce->engine->i915, obj, true); >>>>>>>> void *map; >>>>>>>> >>>>>>>> if (!i915_gem_object_trylock(obj)) >>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c >>>>>>>> b/drivers/gpu/drm/i915/gt/intel_lrc.c >>>>>>>> index e86897cde984..aafe2a4df496 100644 >>>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c >>>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c >>>>>>>> @@ -903,7 +903,9 @@ lrc_pre_pin(struct intel_context *ce, >>>>>>>> GEM_BUG_ON(!i915_vma_is_pinned(ce->state)); >>>>>>>> >>>>>>>> *vaddr = i915_gem_object_pin_map(ce->state->obj, >>>>>>>> - i915_coherent_map_type(ce->engine->i915) | >>>>>>>> + i915_coherent_map_type(ce->engine->i915, >>>>>>>> + ce->state->obj, >>>>>>>> + false) | >>>>>>>> I915_MAP_OVERRIDE); >>>>>>>> >>>>>>>> return PTR_ERR_OR_ZERO(*vaddr); >>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c >>>>>>>> b/drivers/gpu/drm/i915/gt/intel_ring.c >>>>>>>> index aee0a77c77e0..3cf6c7e68108 100644 >>>>>>>> --- 
a/drivers/gpu/drm/i915/gt/intel_ring.c >>>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_ring.c >>>>>>>> @@ -53,9 +53,12 @@ int intel_ring_pin(struct intel_ring >>>>>>>> *ring, struct i915_gem_ww_ctx *ww) >>>>>>>> >>>>>>>> if (i915_vma_is_map_and_fenceable(vma)) >>>>>>>> addr = (void __force *)i915_vma_pin_iomap(vma); >>>>>>>> - else >>>>>>>> - addr = i915_gem_object_pin_map(vma->obj, >>>>>>>> - i915_coherent_map_type(vma->vm->i915)); >>>>>>>> + else { >>>>>>>> + int type = >>>>>>>> i915_coherent_map_type(vma->vm->i915, vma->obj, false); >>>>>>>> + >>>>>>>> + addr = i915_gem_object_pin_map(vma->obj, type); >>>>>>>> + } >>>>>>>> + >>>>>>>> if (IS_ERR(addr)) { >>>>>>>> ret = PTR_ERR(addr); >>>>>>>> goto err_ring; >>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c >>>>>>>> b/drivers/gpu/drm/i915/gt/selftest_context.c >>>>>>>> index b9bdd1d23243..26685b927169 100644 >>>>>>>> --- a/drivers/gpu/drm/i915/gt/selftest_context.c >>>>>>>> +++ b/drivers/gpu/drm/i915/gt/selftest_context.c >>>>>>>> @@ -88,7 +88,8 @@ static int __live_context_size(struct >>>>>>>> intel_engine_cs *engine) >>>>>>>> goto err; >>>>>>>> >>>>>>>> vaddr = i915_gem_object_pin_map_unlocked(ce->state->obj, >>>>>>>> - i915_coherent_map_type(engine->i915)); >>>>>>>> + i915_coherent_map_type(engine->i915, >>>>>>>> + ce->state->obj, false)); >>>>>>>> if (IS_ERR(vaddr)) { >>>>>>>> err = PTR_ERR(vaddr); >>>>>>>> intel_context_unpin(ce); >>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>>>>>> b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>>>>>> index 746985971c3a..5b63d4df8c93 100644 >>>>>>>> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>>>>>> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c >>>>>>>> @@ -69,7 +69,7 @@ static int hang_init(struct hang *h, struct >>>>>>>> intel_gt *gt) >>>>>>>> h->seqno = memset(vaddr, 0xff, PAGE_SIZE); >>>>>>>> >>>>>>>> vaddr = i915_gem_object_pin_map_unlocked(h->obj, >>>>>>>> - i915_coherent_map_type(gt->i915)); >>>>>>>> + 
i915_coherent_map_type(gt->i915, h->obj, false)); >>>>>>>> if (IS_ERR(vaddr)) { >>>>>>>> err = PTR_ERR(vaddr); >>>>>>>> goto err_unpin_hws; >>>>>>>> @@ -130,7 +130,7 @@ hang_create_request(struct hang *h, >>>>>>>> struct intel_engine_cs *engine) >>>>>>>> return ERR_CAST(obj); >>>>>>>> } >>>>>>>> >>>>>>>> - vaddr = i915_gem_object_pin_map_unlocked(obj, >>>>>>>> i915_coherent_map_type(gt->i915)); >>>>>>>> + vaddr = i915_gem_object_pin_map_unlocked(obj, >>>>>>>> i915_coherent_map_type(gt->i915, obj, false)); >>>>>>>> if (IS_ERR(vaddr)) { >>>>>>>> i915_gem_object_put(obj); >>>>>>>> i915_vm_put(vm); >>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>>>>>> b/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>>>>>> index 85e7df6a5123..d8f6623524e8 100644 >>>>>>>> --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>>>>>> +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c >>>>>>>> @@ -1221,7 +1221,9 @@ static int compare_isolation(struct >>>>>>>> intel_engine_cs *engine, >>>>>>>> } >>>>>>>> >>>>>>>> lrc = i915_gem_object_pin_map_unlocked(ce->state->obj, >>>>>>>> - i915_coherent_map_type(engine->i915)); >>>>>>>> + i915_coherent_map_type(engine->i915, >>>>>>>> + ce->state->obj, >>>>>>>> + false)); >>>>>>>> if (IS_ERR(lrc)) { >>>>>>>> err = PTR_ERR(lrc); >>>>>>>> goto err_B1; >>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>>>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>>>>>> index 78305b2ec89d..adae04c47aab 100644 >>>>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c >>>>>>>> @@ -682,7 +682,9 @@ int intel_guc_allocate_and_map_vma(struct >>>>>>>> intel_guc *guc, u32 size, >>>>>>>> if (IS_ERR(vma)) >>>>>>>> return PTR_ERR(vma); >>>>>>>> >>>>>>>> - vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>>>>>>> I915_MAP_WB); >>>>>>>> + vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>>>>>>> + i915_coherent_map_type(guc_to_gt(guc)->i915, >>>>>>>> + vma->obj, true)); >>>>>>>> if (IS_ERR(vaddr)) { >>>>>>>> 
i915_vma_unpin_and_release(&vma, 0); >>>>>>>> return PTR_ERR(vaddr); >>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>>>>>> b/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>>>>>> index 2126dd81ac38..56d2144dc6a0 100644 >>>>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c >>>>>>>> @@ -82,7 +82,9 @@ static int intel_huc_rsa_data_create(struct >>>>>>>> intel_huc *huc) >>>>>>>> if (IS_ERR(vma)) >>>>>>>> return PTR_ERR(vma); >>>>>>>> >>>>>>>> - vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>>>>>>> I915_MAP_WB); >>>>>>>> + vaddr = i915_gem_object_pin_map_unlocked(vma->obj, >>>>>>>> + i915_coherent_map_type(gt->i915, >>>>>>>> + vma->obj, true)); >>>>>>>> if (IS_ERR(vaddr)) { >>>>>>>> i915_vma_unpin_and_release(&vma, 0); >>>>>>>> return PTR_ERR(vaddr); >>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h >>>>>>>> b/drivers/gpu/drm/i915/i915_drv.h >>>>>>>> index 69e43bf91a15..2abbc06712a4 100644 >>>>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h >>>>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h >>>>>>>> @@ -78,6 +78,7 @@ >>>>>>>> #include "gem/i915_gem_context_types.h" >>>>>>>> #include "gem/i915_gem_shrinker.h" >>>>>>>> #include "gem/i915_gem_stolen.h" >>>>>>>> +#include "gem/i915_gem_lmem.h" >>>>>>>> >>>>>>>> #include "gt/intel_engine.h" >>>>>>>> #include "gt/intel_gt_types.h" >>>>>>>> @@ -1921,9 +1922,15 @@ static inline int >>>>>>>> intel_hws_csb_write_index(struct drm_i915_private *i915) >>>>>>>> } >>>>>>>> >>>>>>>> static inline enum i915_map_type >>>>>>>> -i915_coherent_map_type(struct drm_i915_private *i915) >>>>>>>> +i915_coherent_map_type(struct drm_i915_private *i915, >>>>>>>> + struct drm_i915_gem_object *obj, bool >>>>>>>> always_coherent) >>>>>>>> { >>>>>>>> - return HAS_LLC(i915) ? 
I915_MAP_WB : I915_MAP_WC; >>>>>>>> + if (i915_gem_object_is_lmem(obj)) >>>>>>>> + return I915_MAP_WC; >>>>>>>> + if (HAS_LLC(i915) || always_coherent) >>>>>>>> + return I915_MAP_WB; >>>>>>>> + else >>>>>>>> + return I915_MAP_WC; >>>>>>> >>>>>>> Seems this patch is doing two things. >>>>>>> >>>>>>> First it is adding lmem support to this helper by always >>>>>>> returning WC >>>>>>> for lmem objects. >>>>>>> >>>>>>> Secondly it is introducing an idea of "always coherent" in a >>>>>>> helper >>>>>>> called i915_coherent_map_type. Could someone explain what is >>>>>>> coherent vs >>>>>>> always coherent? >>>>>>> >>>>>>> And also, why is always coherent happy with WB? Sounds counter >>>>>>> intuitive >>>>>>> to me. >>>>>> >>>>>> All this does is try to keep the existing behaviour intact, whilst >>>>>> also ensuring that all lmem objects are mapped using only WC, no >>>>>> matter what. The always_coherent=true thing is for the existing >>>>>> places >>>>>> where we sometimes map the object using WB, without first >>>>>> considering >>>>>> whether the device has the fast shared LLC vs snooping. Yes, it's >>>>>> slightly ugly :) >>>>> >>>>> Not fully following - if we had to write kerneldoc for >>>>> always_coherent >>>>> input argument - what it would say? >>>> >>>> @always_coherent - If true we should always try to map the object >>>> using WB. If false we should only map as WB if the device >>>> supports the >>>> fast shared LLC, in the case of snooped devices we will map use WC. >>>> Note that If the resource is lmem then we will always map as WC, >>>> regardless of the value of always_coherent, since that's all we >>>> currently support. >>>> >>>> Maybe the naming is poor? >>> >>> Maybe just confusing to me, not sure yet. >>> >>> So always_coherent is not about how the callers wants to use it, >>> but about platform knowledge? Or a performance concern for LLC vs >>> snooping cases? Does WB works (coherently) on snooping platforms? 
>> >> The always_coherent=true is for the existing callers that want WB, >> regardless of LLC vs snooping. >> >> The other callers use the existing i915_coherent_map_type() which >> only gives out WB for LLC platforms. >> >> AFAIK, LLC vs snooping should offer the same in terms of coherency, >> but in terms of performance the shared LLC is much faster, and so >> for snooping platforms we choose to not enable WB everywhere. >> >> On top of that we now have lmem, but for that we only allow WC. >> This patch just rolls all of that into one helper, while keeping >> the existing behaviour unchanged. > > Thanks. But I am still struggling with the API. :( > > Is the introduction of always_coherent flag in the context of DG1 > required even? AFAICT for lmem objects the flag is ignored so no?
If we drop the flag/helper thing, then we need something like:
type = WB; if (i915_gem_object_is_lmem(obj)) type = WC;
vaddr = i915_gem_object_pin_map(obj, type);
In all the places where we currently do:
vaddr = i915_gem_object_pin_map(obj, WB);
Where obj can be lmem, so ctx, ring, guc etc. Is that better or worse? The existing i915_coherent_map_type() callers should work as-is, since DG1 is snooped. And this patch just extends that to cover all cases.
Perhaps we need a new helper instead? Maybe you have a better idea?
Not yet. Would it make sense to put something in kerneldoc about when callers might choose always_coherent true vs false? In terms of expected usage (frequency, simplicity?) and any rules with regards when callers need to worry about flushing/ordering when there are mixed read and writes?
Hmmm, looking at this again, maybe for now we should just go with:
type = WB; if (i915_gem_object_is_lmem(obj)) type = WC;
vaddr = i915_gem_object_pin_map(obj, type)
Which is way less confusing, plus there are only a handful of places where we need this, so doesn't seem too bad?
Alternatively, we could wrap that in something like:
/* Returns WB for system memory, or WC for local memory */ void *i915_gem_object_pin_map_default(obj);
Thoughts?
I went and looked at the use sites to try and figure it out.
First thing, the bool always_coherent story is only relevant when we decide to place some object in system memory. Otherwise mapping is always WC so I guess our code needs to handle it anyway. Well, if the assumption is that we can change the location of the objects and it all just keeps working? Or that is not the goal?
I guess your concern is that mapping as WC has different semantics, and that might somehow break the caller?
Let's see about the users (ignoring selftests):
- lrc_reg_state and ring; always_coherent=false
Update frequency medium and mostly write from the CPU side.
They say always_coherent=false - which means they have to handle being given a WC mapping anyway.
What is the benefit of ever selecting WB here?
- Engine status page; always_coherent=true
Frequently read and written from the CPU and GPU so cost of snooping is therefore fine? Apart from having to be ready to deal with WC anyway.
- dbg_poison_ce; always_coherent=true
Writes to lrc_reg_state once - meh. Could just as well always ask for WC.
- intel_guc_allocate_and_map_vma; always_coherent=true
This one has three users:
a) guc_stage_desc_pool_create stage_desc_pool_vaddr
This one seems write once at init.
b) intel_guc_ct_init
Use for CT communication so similar to CSB on engine status page in principle. But code also has to deal with WC when object is in lmem.
c) intel_guc_ads_create
CPU appears to only write on init and GPU reset.
- intel_huc_rsa_data_create; always_coherent=true
Called from intel_huc_init so it appears write once from CPU. Not sure why it would need a coherent mapping if that is correct.
I think this exercise left me equally confused. Because flushing and read-write ordering rules are different between WB and WC. And code which accesses all these mappings either has to know which one is in use, or does not care. For the latter case we have to be sure about that for every path.
Users of pin_map() are generally meant to call flush_map() where appropriate, which should do the right thing for us. For WC it only needs to flush the wcb. For WB it's more complicated since that depends on if the object is considered coherent or not, if it is then we don't need to do anything, otherwise we need to clflush.
Also note that if we just map the buffer as WB, that by itself doesn't magically enable snooping for the pages AFAIK. We still have to tell the GPU that these pages are meant to be coherent, which we always do for LLC platforms I think, since the shared LLC is considered fast, whereas on snooping platforms, we don't enable this by default, and have this as CACHE_NONE instead (see shmem_object_init for example), and incur the cost of additional clflushing. Doing an explicit i915_gem_object_set_coherency(I915_CACHE_LLC) I think will mark the object as coherent for us. I think there are also some matching GTT bits for caching.
Also for DG1 you apparently can't disable snooping, as per what Daniel was saying in another thread.
The write on init / reset ones are easy enough and it doesn't really matter for them to use the coherent helper.
Lrc_reg_state as well I think can be WC with explicit flushing - it has to on lmem, no?
I doubt it has to be, since the GPU still just accesses it through the GTT.
This leaves the status page (CSB, etc) and GuC CT. Those are frequent R/W but also code has to be able to handle WC so what is the benefit of WB? It ends up faster than if it was WC, considering explicit flushes/barriers are still in there?
No idea for GuC, but for the hwsp it's still in system memory, and is WB, even for discrete. Chris measured this to be more performant with our execlists submission path than say just sticking it in lmem, and mapping it as WC.
Ping? How should we proceed with this patch?
I just re-freshed my memory on when the write combine buffer gets flushed and realized uncached reads are also an implicit flush. So my complications from earlier reply were purely mine and I think you can proceed with the patch as is.
Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Regards,
Tvrtko
From: Anusha Srivatsa anusha.srivatsa@intel.com
In the scenario where local memory is available, we have to rely on CPU access via lmem directly instead of aperture.
v2: gmch is only relevant for much older hw, therefore we can drop the has_aperture check since it should always be present on such platforms. (Chris)
Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Dhinakaran Pandiyan dhinakaran.pandiyan@intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Chris P Wilson chris.p.wilson@intel.com Cc: Daniel Vetter daniel.vetter@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: CQ Tang cq.tang@intel.com Signed-off-by: Anusha Srivatsa anusha.srivatsa@intel.com --- drivers/gpu/drm/i915/display/intel_fbdev.c | 22 +++++++++++++++------- drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 15 +++++++++++++++ drivers/gpu/drm/i915/gem/i915_gem_lmem.h | 5 +++++ drivers/gpu/drm/i915/i915_vma.c | 19 +++++++++++++------ 4 files changed, 48 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c b/drivers/gpu/drm/i915/display/intel_fbdev.c index 2b37959da747..4af40229f5ec 100644 --- a/drivers/gpu/drm/i915/display/intel_fbdev.c +++ b/drivers/gpu/drm/i915/display/intel_fbdev.c @@ -139,14 +139,22 @@ static int intelfb_alloc(struct drm_fb_helper *helper, size = mode_cmd.pitches[0] * mode_cmd.height; size = PAGE_ALIGN(size);
- /* If the FB is too big, just don't use it since fbdev is not very - * important and we should probably use that space with FBC or other - * features. */ obj = ERR_PTR(-ENODEV); - if (size * 2 < dev_priv->stolen_usable_size) - obj = i915_gem_object_create_stolen(dev_priv, size); - if (IS_ERR(obj)) - obj = i915_gem_object_create_shmem(dev_priv, size); + if (HAS_LMEM(dev_priv)) { + obj = i915_gem_object_create_lmem(dev_priv, size, + I915_BO_ALLOC_CONTIGUOUS); + } else { + /* + * If the FB is too big, just don't use it since fbdev is not very + * important and we should probably use that space with FBC or other + * features. + */ + if (size * 2 < dev_priv->stolen_usable_size) + obj = i915_gem_object_create_stolen(dev_priv, size); + if (IS_ERR(obj)) + obj = i915_gem_object_create_shmem(dev_priv, size); + } + if (IS_ERR(obj)) { drm_err(&dev_priv->drm, "failed to allocate framebuffer\n"); return PTR_ERR(obj); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c index 017db8f71130..f44bdd08f7cb 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c @@ -17,6 +17,21 @@ const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops = { .release = i915_gem_object_release_memory_region, };
+void __iomem * +i915_gem_object_lmem_io_map(struct drm_i915_gem_object *obj, + unsigned long n, + unsigned long size) +{ + resource_size_t offset; + + GEM_BUG_ON(!i915_gem_object_is_contiguous(obj)); + + offset = i915_gem_object_get_dma_address(obj, n); + offset -= obj->mm.region->region.start; + + return io_mapping_map_wc(&obj->mm.region->iomap, offset, size); +} + bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj) { struct intel_memory_region *mr = obj->mm.region; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h index 036d53c01de9..fac6bc5a5ebb 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h @@ -14,6 +14,11 @@ struct intel_memory_region;
extern const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops;
+void __iomem * +i915_gem_object_lmem_io_map(struct drm_i915_gem_object *obj, + unsigned long n, + unsigned long size); + bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj);
struct drm_i915_gem_object * diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 07490db51cdc..e24d33aecac4 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -27,6 +27,7 @@
#include "display/intel_frontbuffer.h"
+#include "gem/i915_gem_lmem.h" #include "gt/intel_engine.h" #include "gt/intel_engine_heartbeat.h" #include "gt/intel_gt.h" @@ -448,9 +449,11 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma) void __iomem *ptr; int err;
- if (GEM_WARN_ON(!i915_vma_is_map_and_fenceable(vma))) { - err = -ENODEV; - goto err; + if (!i915_gem_object_is_lmem(vma->obj)) { + if (GEM_WARN_ON(!i915_vma_is_map_and_fenceable(vma))) { + err = -ENODEV; + goto err; + } }
GEM_BUG_ON(!i915_vma_is_ggtt(vma)); @@ -458,9 +461,13 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma)
ptr = READ_ONCE(vma->iomap); if (ptr == NULL) { - ptr = io_mapping_map_wc(&i915_vm_to_ggtt(vma->vm)->iomap, - vma->node.start, - vma->node.size); + if (i915_gem_object_is_lmem(vma->obj)) + ptr = i915_gem_object_lmem_io_map(vma->obj, 0, + vma->obj->base.size); + else + ptr = io_mapping_map_wc(&i915_vm_to_ggtt(vma->vm)->iomap, + vma->node.start, + vma->node.size); if (ptr == NULL) { err = -ENOMEM; goto err;
On 12/04/2021 10:05, Matthew Auld wrote:
From: Anusha Srivatsa anusha.srivatsa@intel.com
In the scenario where local memory is available, we have to rely on CPU access via lmem directly instead of aperture.
v2: gmch is only relevant for much older hw, therefore we can drop the has_aperture check since it should always be present on such platforms. (Chris)
Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Dhinakaran Pandiyan dhinakaran.pandiyan@intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Chris P Wilson chris.p.wilson@intel.com Cc: Daniel Vetter daniel.vetter@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: CQ Tang cq.tang@intel.com Signed-off-by: Anusha Srivatsa anusha.srivatsa@intel.com
drivers/gpu/drm/i915/display/intel_fbdev.c | 22 +++++++++++++++------- drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 15 +++++++++++++++ drivers/gpu/drm/i915/gem/i915_gem_lmem.h | 5 +++++ drivers/gpu/drm/i915/i915_vma.c | 19 +++++++++++++------ 4 files changed, 48 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c b/drivers/gpu/drm/i915/display/intel_fbdev.c index 2b37959da747..4af40229f5ec 100644 --- a/drivers/gpu/drm/i915/display/intel_fbdev.c +++ b/drivers/gpu/drm/i915/display/intel_fbdev.c @@ -139,14 +139,22 @@ static int intelfb_alloc(struct drm_fb_helper *helper, size = mode_cmd.pitches[0] * mode_cmd.height; size = PAGE_ALIGN(size);
- /* If the FB is too big, just don't use it since fbdev is not very
* important and we should probably use that space with FBC or other
obj = ERR_PTR(-ENODEV);* features. */
- if (size * 2 < dev_priv->stolen_usable_size)
obj = i915_gem_object_create_stolen(dev_priv, size);
- if (IS_ERR(obj))
obj = i915_gem_object_create_shmem(dev_priv, size);
- if (HAS_LMEM(dev_priv)) {
obj = i915_gem_object_create_lmem(dev_priv, size,
I915_BO_ALLOC_CONTIGUOUS);
Has to be contiguous? Question for display experts I guess.
[Comes back later.] Ah for iomap? Put a comment to that effect perhaps?
- } else {
/*
* If the FB is too big, just don't use it since fbdev is not very
* important and we should probably use that space with FBC or other
* features.
*/
if (size * 2 < dev_priv->stolen_usable_size)
obj = i915_gem_object_create_stolen(dev_priv, size);
if (IS_ERR(obj))
obj = i915_gem_object_create_shmem(dev_priv, size);
- }
Could we keep the IS_ERR ordered allocation order to save having to re-indent? Bike shed so optional..
- if (IS_ERR(obj)) { drm_err(&dev_priv->drm, "failed to allocate framebuffer\n"); return PTR_ERR(obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c index 017db8f71130..f44bdd08f7cb 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c @@ -17,6 +17,21 @@ const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops = { .release = i915_gem_object_release_memory_region, };
+void __iomem * +i915_gem_object_lmem_io_map(struct drm_i915_gem_object *obj,
unsigned long n,
unsigned long size)
+{
- resource_size_t offset;
- GEM_BUG_ON(!i915_gem_object_is_contiguous(obj));
- offset = i915_gem_object_get_dma_address(obj, n);
- offset -= obj->mm.region->region.start;
- return io_mapping_map_wc(&obj->mm.region->iomap, offset, size);
+}
- bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj) { struct intel_memory_region *mr = obj->mm.region;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h index 036d53c01de9..fac6bc5a5ebb 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h @@ -14,6 +14,11 @@ struct intel_memory_region;
extern const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops;
+void __iomem * +i915_gem_object_lmem_io_map(struct drm_i915_gem_object *obj,
unsigned long n,
unsigned long size);
bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj);
struct drm_i915_gem_object *
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 07490db51cdc..e24d33aecac4 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -27,6 +27,7 @@
#include "display/intel_frontbuffer.h"
+#include "gem/i915_gem_lmem.h" #include "gt/intel_engine.h" #include "gt/intel_engine_heartbeat.h" #include "gt/intel_gt.h" @@ -448,9 +449,11 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma) void __iomem *ptr; int err;
- if (GEM_WARN_ON(!i915_vma_is_map_and_fenceable(vma))) {
err = -ENODEV;
goto err;
if (!i915_gem_object_is_lmem(vma->obj)) {
if (GEM_WARN_ON(!i915_vma_is_map_and_fenceable(vma))) {
err = -ENODEV;
goto err;
}
}
GEM_BUG_ON(!i915_vma_is_ggtt(vma));
@@ -458,9 +461,13 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma)
ptr = READ_ONCE(vma->iomap); if (ptr == NULL) {
ptr = io_mapping_map_wc(&i915_vm_to_ggtt(vma->vm)->iomap,
vma->node.start,
vma->node.size);
if (i915_gem_object_is_lmem(vma->obj))
ptr = i915_gem_object_lmem_io_map(vma->obj, 0,
vma->obj->base.size);
Can the vma size be bigger than the object here? Given how the code below works off of vma->node.size.
else
ptr = io_mapping_map_wc(&i915_vm_to_ggtt(vma->vm)->iomap,
vma->node.start,
vma->node.size);
Looks a bit odd that this calls the same io_mapping_map_wc as i915_gem_object_lmem_io_map ends up doing. Perhaps that suggests there should be a single helper here but I am not sure what would be elegant.
Regards,
Tvrtko
if (ptr == NULL) { err = -ENOMEM; goto err;
On 14/04/2021 16:33, Tvrtko Ursulin wrote:
On 12/04/2021 10:05, Matthew Auld wrote:
From: Anusha Srivatsa anusha.srivatsa@intel.com
In the scenario where local memory is available, we have to rely on CPU access via lmem directly instead of aperture.
v2: gmch is only relevant for much older hw, therefore we can drop the has_aperture check since it should always be present on such platforms. (Chris)
Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Dhinakaran Pandiyan dhinakaran.pandiyan@intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Chris P Wilson chris.p.wilson@intel.com Cc: Daniel Vetter daniel.vetter@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: CQ Tang cq.tang@intel.com Signed-off-by: Anusha Srivatsa anusha.srivatsa@intel.com
drivers/gpu/drm/i915/display/intel_fbdev.c | 22 +++++++++++++++-------  drivers/gpu/drm/i915/gem/i915_gem_lmem.c  | 15 +++++++++++++++  drivers/gpu/drm/i915/gem/i915_gem_lmem.h  | 5 +++++  drivers/gpu/drm/i915/i915_vma.c           | 19 +++++++++++++------  4 files changed, 48 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c b/drivers/gpu/drm/i915/display/intel_fbdev.c index 2b37959da747..4af40229f5ec 100644 --- a/drivers/gpu/drm/i915/display/intel_fbdev.c +++ b/drivers/gpu/drm/i915/display/intel_fbdev.c @@ -139,14 +139,22 @@ static int intelfb_alloc(struct drm_fb_helper *helper, Â Â Â Â Â size = mode_cmd.pitches[0] * mode_cmd.height; Â Â Â Â Â size = PAGE_ALIGN(size); -Â Â Â /* If the FB is too big, just don't use it since fbdev is not very -Â Â Â Â * important and we should probably use that space with FBC or other -Â Â Â Â * features. */ Â Â Â Â Â obj = ERR_PTR(-ENODEV); -Â Â Â if (size * 2 < dev_priv->stolen_usable_size) -Â Â Â Â Â Â Â obj = i915_gem_object_create_stolen(dev_priv, size); -Â Â Â if (IS_ERR(obj)) -Â Â Â Â Â Â Â obj = i915_gem_object_create_shmem(dev_priv, size); +Â Â Â if (HAS_LMEM(dev_priv)) { +Â Â Â Â Â Â Â obj = i915_gem_object_create_lmem(dev_priv, size, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â I915_BO_ALLOC_CONTIGUOUS);
Has to be contiguous? Question for display experts I guess.
[Comes back later.] Ah for iomap? Put a comment to that effect perhaps?
I don't think it has to be, since we could in theory just use pin_map() underneath, which can already deal with non-contiguous chunks of lmem, although that might bring in ww locking. I think for now just add a comment and mark this as XXX, and potentially revisit as follow up?
+Â Â Â } else { +Â Â Â Â Â Â Â /* +Â Â Â Â Â Â Â Â * If the FB is too big, just don't use it since fbdev is not very +Â Â Â Â Â Â Â Â * important and we should probably use that space with FBC or other +Â Â Â Â Â Â Â Â * features. +Â Â Â Â Â Â Â Â */ +Â Â Â Â Â Â Â if (size * 2 < dev_priv->stolen_usable_size) +Â Â Â Â Â Â Â Â Â Â Â obj = i915_gem_object_create_stolen(dev_priv, size); +Â Â Â Â Â Â Â if (IS_ERR(obj)) +Â Â Â Â Â Â Â Â Â Â Â obj = i915_gem_object_create_shmem(dev_priv, size); +Â Â Â }
Could we keep the IS_ERR ordered allocation order to save having to re-indent? Bike shed so optional..
if (IS_ERR(obj)) { Â Â Â Â Â Â Â Â Â drm_err(&dev_priv->drm, "failed to allocate framebuffer\n"); Â Â Â Â Â Â Â Â Â return PTR_ERR(obj); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c index 017db8f71130..f44bdd08f7cb 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c @@ -17,6 +17,21 @@ const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops = { Â Â Â Â Â .release = i915_gem_object_release_memory_region, Â }; +void __iomem * +i915_gem_object_lmem_io_map(struct drm_i915_gem_object *obj, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â unsigned long n, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â unsigned long size) +{ +Â Â Â resource_size_t offset;
+Â Â Â GEM_BUG_ON(!i915_gem_object_is_contiguous(obj));
+Â Â Â offset = i915_gem_object_get_dma_address(obj, n); +Â Â Â offset -= obj->mm.region->region.start;
+Â Â Â return io_mapping_map_wc(&obj->mm.region->iomap, offset, size); +}
bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj) Â { Â Â Â Â Â struct intel_memory_region *mr = obj->mm.region; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h index 036d53c01de9..fac6bc5a5ebb 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h @@ -14,6 +14,11 @@ struct intel_memory_region; Â extern const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops; +void __iomem * +i915_gem_object_lmem_io_map(struct drm_i915_gem_object *obj, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â unsigned long n, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â unsigned long size);
bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj); Â struct drm_i915_gem_object * diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 07490db51cdc..e24d33aecac4 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -27,6 +27,7 @@ Â #include "display/intel_frontbuffer.h" +#include "gem/i915_gem_lmem.h" Â #include "gt/intel_engine.h" Â #include "gt/intel_engine_heartbeat.h" Â #include "gt/intel_gt.h" @@ -448,9 +449,11 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma) Â Â Â Â Â void __iomem *ptr; Â Â Â Â Â int err; -Â Â Â if (GEM_WARN_ON(!i915_vma_is_map_and_fenceable(vma))) { -Â Â Â Â Â Â Â err = -ENODEV; -Â Â Â Â Â Â Â goto err; +Â Â Â if (!i915_gem_object_is_lmem(vma->obj)) { +Â Â Â Â Â Â Â if (GEM_WARN_ON(!i915_vma_is_map_and_fenceable(vma))) { +Â Â Â Â Â Â Â Â Â Â Â err = -ENODEV; +Â Â Â Â Â Â Â Â Â Â Â goto err; +Â Â Â Â Â Â Â } Â Â Â Â Â } Â Â Â Â Â GEM_BUG_ON(!i915_vma_is_ggtt(vma)); @@ -458,9 +461,13 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma) Â Â Â Â Â ptr = READ_ONCE(vma->iomap); Â Â Â Â Â if (ptr == NULL) { -Â Â Â Â Â Â Â ptr = io_mapping_map_wc(&i915_vm_to_ggtt(vma->vm)->iomap, -Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â vma->node.start, -Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â vma->node.size); +Â Â Â Â Â Â Â if (i915_gem_object_is_lmem(vma->obj)) +Â Â Â Â Â Â Â Â Â Â Â ptr = i915_gem_object_lmem_io_map(vma->obj, 0, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â vma->obj->base.size);
Can the vma size be bigger than the object here? Given how the code below works off of vma->node.size.
I don't know tbh. But in general node.size can definitely be larger than vma->size/obj->base.size.
For the iomap version below, it's using the mappable aperture, which requires reserving a vma node into the mappable part of the GGTT first, so using node.size here make sense, since the node reflects the window into the mappable aperture.
For the lmem case though that might be bogus, since the vma has no relationship with LMEM_BAR, since really it's the object, hence why we use the obj->base.size instead. Although really it might make more sense to use pin_map() instead for the lmem case, if it's possible.
+Â Â Â Â Â Â Â else +Â Â Â Â Â Â Â Â Â Â Â ptr = io_mapping_map_wc(&i915_vm_to_ggtt(vma->vm)->iomap, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â vma->node.start, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â vma->node.size);
Looks a bit odd that this calls the same io_mapping_map_wc as i915_gem_object_lmem_io_map ends up doing. Perhaps that suggests there should be a single helper here but I am not sure what would be elegant.
Regards,
Tvrtko
if (ptr == NULL) { Â Â Â Â Â Â Â Â Â Â Â Â Â err = -ENOMEM; Â Â Â Â Â Â Â Â Â Â Â Â Â goto err;
On 16/04/2021 15:25, Matthew Auld wrote:
On 14/04/2021 16:33, Tvrtko Ursulin wrote:
On 12/04/2021 10:05, Matthew Auld wrote:
From: Anusha Srivatsa anusha.srivatsa@intel.com
In the scenario where local memory is available, we have to rely on CPU access via lmem directly instead of aperture.
v2: gmch is only relevant for much older hw, therefore we can drop the has_aperture check since it should always be present on such platforms. (Chris)
Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Dhinakaran Pandiyan dhinakaran.pandiyan@intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Chris P Wilson chris.p.wilson@intel.com Cc: Daniel Vetter daniel.vetter@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: CQ Tang cq.tang@intel.com Signed-off-by: Anusha Srivatsa anusha.srivatsa@intel.com
drivers/gpu/drm/i915/display/intel_fbdev.c | 22 +++++++++++++++-------  drivers/gpu/drm/i915/gem/i915_gem_lmem.c  | 15 +++++++++++++++  drivers/gpu/drm/i915/gem/i915_gem_lmem.h  | 5 +++++  drivers/gpu/drm/i915/i915_vma.c           | 19 +++++++++++++------  4 files changed, 48 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c b/drivers/gpu/drm/i915/display/intel_fbdev.c index 2b37959da747..4af40229f5ec 100644 --- a/drivers/gpu/drm/i915/display/intel_fbdev.c +++ b/drivers/gpu/drm/i915/display/intel_fbdev.c @@ -139,14 +139,22 @@ static int intelfb_alloc(struct drm_fb_helper *helper, Â Â Â Â Â size = mode_cmd.pitches[0] * mode_cmd.height; Â Â Â Â Â size = PAGE_ALIGN(size); -Â Â Â /* If the FB is too big, just don't use it since fbdev is not very -Â Â Â Â * important and we should probably use that space with FBC or other -Â Â Â Â * features. */ Â Â Â Â Â obj = ERR_PTR(-ENODEV); -Â Â Â if (size * 2 < dev_priv->stolen_usable_size) -Â Â Â Â Â Â Â obj = i915_gem_object_create_stolen(dev_priv, size); -Â Â Â if (IS_ERR(obj)) -Â Â Â Â Â Â Â obj = i915_gem_object_create_shmem(dev_priv, size); +Â Â Â if (HAS_LMEM(dev_priv)) { +Â Â Â Â Â Â Â obj = i915_gem_object_create_lmem(dev_priv, size, +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â I915_BO_ALLOC_CONTIGUOUS);
Has to be contiguous? Question for display experts I guess.
[Comes back later.] Ah for iomap? Put a comment to that effect perhaps?
I don't think it has to be, since we could in theory just use pin_map() underneath, which can already deal with non-contiguous chunks of lmem, although that might bring in ww locking. I think for now just add a comment and mark this as XXX, and potentially revisit as follow up?
Sure.
Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Regards,
Tvrtko
From: Clint Taylor clinton.a.taylor@intel.com
Read OPROM SPI through MMIO and find VBT entry since we can't use OpRegion and PCI mapping may not work on some systems due to the BIOS not leaving the Option ROM mapped.
v2 by Jani: - switch to intel_uncore_read/intel_uncore_write
Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Tomas Winkler tomas.winkler@intel.com Cc: Jon Bloomfield jon.bloomfield@intel.com Signed-off-by: Clint Taylor clinton.a.taylor@intel.com Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com Signed-off-by: Jani Nikula jani.nikula@intel.com --- drivers/gpu/drm/i915/display/intel_bios.c | 80 +++++++++++++++++++++-- drivers/gpu/drm/i915/i915_reg.h | 8 +++ 2 files changed, 82 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_bios.c b/drivers/gpu/drm/i915/display/intel_bios.c index ea4837d485a1..f9dc651f1652 100644 --- a/drivers/gpu/drm/i915/display/intel_bios.c +++ b/drivers/gpu/drm/i915/display/intel_bios.c @@ -2238,6 +2238,66 @@ bool intel_bios_is_valid_vbt(const void *buf, size_t size) return vbt; }
+static struct vbt_header *spi_oprom_get_vbt(struct drm_i915_private *i915) +{ + u32 count, data, found, store = 0; + u32 static_region, oprom_offset; + u32 oprom_size = 0x200000; + u16 vbt_size; + u32 *vbt; + + static_region = intel_uncore_read(&i915->uncore, SPI_STATIC_REGIONS); + static_region &= OPTIONROM_SPI_REGIONID_MASK; + intel_uncore_write(&i915->uncore, PRIMARY_SPI_REGIONID, static_region); + + oprom_offset = intel_uncore_read(&i915->uncore, OROM_OFFSET); + oprom_offset &= OROM_OFFSET_MASK; + + for (count = 0; count < oprom_size; count += 4) { + intel_uncore_write(&i915->uncore, PRIMARY_SPI_ADDRESS, oprom_offset + count); + data = intel_uncore_read(&i915->uncore, PRIMARY_SPI_TRIGGER); + + if (data == *((const u32 *)"$VBT")) { + found = oprom_offset + count; + break; + } + } + + if (count >= oprom_size) + goto err_not_found; + + /* Get VBT size and allocate space for the VBT */ + intel_uncore_write(&i915->uncore, PRIMARY_SPI_ADDRESS, found + + offsetof(struct vbt_header, vbt_size)); + vbt_size = intel_uncore_read(&i915->uncore, PRIMARY_SPI_TRIGGER); + vbt_size &= 0xffff; + + vbt = kzalloc(vbt_size, GFP_KERNEL); + if (!vbt) { + DRM_ERROR("Unable to allocate %u bytes for VBT storage\n", + vbt_size); + goto err_not_found; + } + + for (count = 0; count < vbt_size; count += 4) { + intel_uncore_write(&i915->uncore, PRIMARY_SPI_ADDRESS, found + count); + data = intel_uncore_read(&i915->uncore, PRIMARY_SPI_TRIGGER); + *(vbt + store++) = data; + } + + if (!intel_bios_is_valid_vbt(vbt, vbt_size)) + goto err_free_vbt; + + DRM_DEBUG_KMS("Found valid VBT in SPI flash\n"); + + return (struct vbt_header *)vbt; + +err_free_vbt: + kfree(vbt); +err_not_found: + return NULL; +} + static struct vbt_header *oprom_get_vbt(struct drm_i915_private *i915) { struct pci_dev *pdev = to_pci_dev(i915->drm.dev); @@ -2287,6 +2347,8 @@ static struct vbt_header *oprom_get_vbt(struct drm_i915_private *i915)
pci_unmap_rom(pdev, oprom);
+ DRM_DEBUG_KMS("Found valid VBT in PCI ROM\n"); + return vbt;
err_free_vbt: @@ -2321,17 +2383,23 @@ void intel_bios_init(struct drm_i915_private *i915)
init_vbt_defaults(i915);
- /* If the OpRegion does not have VBT, look in PCI ROM. */ + /* + * If the OpRegion does not have VBT, look in SPI flash through MMIO or + * PCI mapping + */ + if (!vbt && IS_DGFX(i915)) { + oprom_vbt = spi_oprom_get_vbt(i915); + vbt = oprom_vbt; + } + if (!vbt) { oprom_vbt = oprom_get_vbt(i915); - if (!oprom_vbt) - goto out; - vbt = oprom_vbt; - - drm_dbg_kms(&i915->drm, "Found valid VBT in PCI ROM\n"); }
+ if (!vbt) + goto out; + bdb = get_bdb_header(vbt); i915->vbt.version = bdb->version;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index da73dc939e58..54ff63b86df6 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -12540,6 +12540,14 @@ enum skl_power_gate { #define DP_PIN_ASSIGNMENT_MASK(idx) (0xf << ((idx) * 4)) #define DP_PIN_ASSIGNMENT(idx, x) ((x) << ((idx) * 4))
+#define PRIMARY_SPI_TRIGGER _MMIO(0x102040) +#define PRIMARY_SPI_ADDRESS _MMIO(0x102080) +#define PRIMARY_SPI_REGIONID _MMIO(0x102084) +#define SPI_STATIC_REGIONS _MMIO(0x102090) +#define OPTIONROM_SPI_REGIONID_MASK REG_GENMASK(7, 0) +#define OROM_OFFSET _MMIO(0x1020c0) +#define OROM_OFFSET_MASK REG_GENMASK(20, 16) + /* This register controls the Display State Buffer (DSB) engines. */ #define _DSBSL_INSTANCE_BASE 0x70B00 #define DSBSL_INSTANCE(pipe, id) (_DSBSL_INSTANCE_BASE + \
On Mon, Apr 12, 2021 at 10:05:20AM +0100, Matthew Auld wrote:
From: Clint Taylor clinton.a.taylor@intel.com
Read OPROM SPI through MMIO and find VBT entry since we can't use OpRegion and PCI mapping may not work on some systems due to the BIOS not leaving the Option ROM mapped.
I was surprised to see we still don't have this patch applied. There is some coding style to fix, but if we don't have it we are basically relying on the fallback of using a fake/hardcoded vbt. I will do some fixups and re-submit.
Lucas De Marchi
v2 by Jani:
- switch to intel_uncore_read/intel_uncore_write
Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Tomas Winkler tomas.winkler@intel.com Cc: Jon Bloomfield jon.bloomfield@intel.com Signed-off-by: Clint Taylor clinton.a.taylor@intel.com Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com Signed-off-by: Jani Nikula jani.nikula@intel.com
drivers/gpu/drm/i915/display/intel_bios.c | 80 +++++++++++++++++++++-- drivers/gpu/drm/i915/i915_reg.h | 8 +++ 2 files changed, 82 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_bios.c b/drivers/gpu/drm/i915/display/intel_bios.c index ea4837d485a1..f9dc651f1652 100644 --- a/drivers/gpu/drm/i915/display/intel_bios.c +++ b/drivers/gpu/drm/i915/display/intel_bios.c @@ -2238,6 +2238,66 @@ bool intel_bios_is_valid_vbt(const void *buf, size_t size) return vbt; }
+static struct vbt_header *spi_oprom_get_vbt(struct drm_i915_private *i915) +{
- u32 count, data, found, store = 0;
- u32 static_region, oprom_offset;
- u32 oprom_size = 0x200000;
- u16 vbt_size;
- u32 *vbt;
- static_region = intel_uncore_read(&i915->uncore, SPI_STATIC_REGIONS);
- static_region &= OPTIONROM_SPI_REGIONID_MASK;
- intel_uncore_write(&i915->uncore, PRIMARY_SPI_REGIONID, static_region);
- oprom_offset = intel_uncore_read(&i915->uncore, OROM_OFFSET);
- oprom_offset &= OROM_OFFSET_MASK;
- for (count = 0; count < oprom_size; count += 4) {
intel_uncore_write(&i915->uncore, PRIMARY_SPI_ADDRESS, oprom_offset + count);
data = intel_uncore_read(&i915->uncore, PRIMARY_SPI_TRIGGER);
if (data == *((const u32 *)"$VBT")) {
found = oprom_offset + count;
break;
}
- }
- if (count >= oprom_size)
goto err_not_found;
- /* Get VBT size and allocate space for the VBT */
- intel_uncore_write(&i915->uncore, PRIMARY_SPI_ADDRESS, found +
offsetof(struct vbt_header, vbt_size));
- vbt_size = intel_uncore_read(&i915->uncore, PRIMARY_SPI_TRIGGER);
- vbt_size &= 0xffff;
- vbt = kzalloc(vbt_size, GFP_KERNEL);
- if (!vbt) {
DRM_ERROR("Unable to allocate %u bytes for VBT storage\n",
vbt_size);
goto err_not_found;
- }
- for (count = 0; count < vbt_size; count += 4) {
intel_uncore_write(&i915->uncore, PRIMARY_SPI_ADDRESS, found + count);
data = intel_uncore_read(&i915->uncore, PRIMARY_SPI_TRIGGER);
*(vbt + store++) = data;
- }
- if (!intel_bios_is_valid_vbt(vbt, vbt_size))
goto err_free_vbt;
- DRM_DEBUG_KMS("Found valid VBT in SPI flash\n");
- return (struct vbt_header *)vbt;
+err_free_vbt:
- kfree(vbt);
+err_not_found:
- return NULL;
+}
static struct vbt_header *oprom_get_vbt(struct drm_i915_private *i915) { struct pci_dev *pdev = to_pci_dev(i915->drm.dev); @@ -2287,6 +2347,8 @@ static struct vbt_header *oprom_get_vbt(struct drm_i915_private *i915)
pci_unmap_rom(pdev, oprom);
- DRM_DEBUG_KMS("Found valid VBT in PCI ROM\n");
- return vbt;
err_free_vbt: @@ -2321,17 +2383,23 @@ void intel_bios_init(struct drm_i915_private *i915)
init_vbt_defaults(i915);
- /* If the OpRegion does not have VBT, look in PCI ROM. */
- /*
* If the OpRegion does not have VBT, look in SPI flash through MMIO or
* PCI mapping
*/
- if (!vbt && IS_DGFX(i915)) {
oprom_vbt = spi_oprom_get_vbt(i915);
vbt = oprom_vbt;
- }
- if (!vbt) { oprom_vbt = oprom_get_vbt(i915);
if (!oprom_vbt)
goto out;
- vbt = oprom_vbt;
}drm_dbg_kms(&i915->drm, "Found valid VBT in PCI ROM\n");
- if (!vbt)
goto out;
- bdb = get_bdb_header(vbt); i915->vbt.version = bdb->version;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index da73dc939e58..54ff63b86df6 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -12540,6 +12540,14 @@ enum skl_power_gate { #define DP_PIN_ASSIGNMENT_MASK(idx) (0xf << ((idx) * 4)) #define DP_PIN_ASSIGNMENT(idx, x) ((x) << ((idx) * 4))
+#define PRIMARY_SPI_TRIGGER _MMIO(0x102040) +#define PRIMARY_SPI_ADDRESS _MMIO(0x102080) +#define PRIMARY_SPI_REGIONID _MMIO(0x102084) +#define SPI_STATIC_REGIONS _MMIO(0x102090) +#define OPTIONROM_SPI_REGIONID_MASK REG_GENMASK(7, 0) +#define OROM_OFFSET _MMIO(0x1020c0) +#define OROM_OFFSET_MASK REG_GENMASK(20, 16)
/* This register controls the Display State Buffer (DSB) engines. */ #define _DSBSL_INSTANCE_BASE 0x70B00
#define DSBSL_INSTANCE(pipe, id) (_DSBSL_INSTANCE_BASE + \
2.26.3
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
From: Anshuman Gupta anshuman.gupta@intel.com
Sanitize OPROM header, CPD signature and OPROM PCI version. OPROM_HEADER, EXPANSION_ROM_HEADER and OPROM_MEU_BLOB structures and PCI struct offsets are provided by GSC counterparts. These are yet to be documented in B.Spec. After successful sanitization, extract VBT from opregion image.
v2: - Used macro for OPROM header magic 0xaa55 [Rodrigo] - Added a OPROM layout. [Uma] - Extract opregion from OPROM package and then extract VBT from opregion to have backward compatibility with older IFWI.
v3: - Moved opreg stuff to intel_opregion.{c,h}. [Uma] - Memory leak and intel_oprom_verify_signature return value fixes. [Uma]
v4: - Fix return code storage for oprom_image_parse_helper (Matt)
v5 by Jani: - switch to intel_uncore_read/intel_uncore_write
v6 by Khajapasha: - Rename intel_oprom_verify_signature() to intel_spi_get_oprom_opreg() [Jani, Nikula] - Use u32 data type for opregion size [Jani, Nikula]
Cc: Jani Nikula jani.nikula@intel.com Cc: Uma Shankar uma.shankar@intel.com Signed-off-by: Anshuman Gupta anshuman.gupta@intel.com Signed-off-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Mohammed Khajapasha mohammed.khajapasha@intel.com --- drivers/gpu/drm/i915/display/intel_bios.c | 47 +++-- drivers/gpu/drm/i915/display/intel_opregion.c | 169 ++++++++++++++++++ drivers/gpu/drm/i915/display/intel_opregion.h | 38 +++- 3 files changed, 227 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_bios.c b/drivers/gpu/drm/i915/display/intel_bios.c index f9dc651f1652..59eec8333723 100644 --- a/drivers/gpu/drm/i915/display/intel_bios.c +++ b/drivers/gpu/drm/i915/display/intel_bios.c @@ -2240,37 +2240,36 @@ bool intel_bios_is_valid_vbt(const void *buf, size_t size)
static struct vbt_header *spi_oprom_get_vbt(struct drm_i915_private *i915) { - u32 count, data, found, store = 0; - u32 static_region, oprom_offset; - u32 oprom_size = 0x200000; + u32 count, found, opreg_size; + u32 *vbt, *oprom_opreg = NULL; u16 vbt_size; - u32 *vbt; + u8 *parse_ptr;
- static_region = intel_uncore_read(&i915->uncore, SPI_STATIC_REGIONS); - static_region &= OPTIONROM_SPI_REGIONID_MASK; - intel_uncore_write(&i915->uncore, PRIMARY_SPI_REGIONID, static_region); - - oprom_offset = intel_uncore_read(&i915->uncore, OROM_OFFSET); - oprom_offset &= OROM_OFFSET_MASK; + if (intel_spi_get_oprom_opreg(i915, &oprom_opreg, &opreg_size)) { + drm_err(&i915->drm, "oprom signature verification failed\n"); + goto err_not_found; + }
- for (count = 0; count < oprom_size; count += 4) { - intel_uncore_write(&i915->uncore, PRIMARY_SPI_ADDRESS, oprom_offset + count); - data = intel_uncore_read(&i915->uncore, PRIMARY_SPI_TRIGGER); + if (!oprom_opreg) { + drm_err(&i915->drm, "opregion not found\n"); + goto err_not_found; + }
- if (data == *((const u32 *)"$VBT")) { - found = oprom_offset + count; + for (count = 0; count < opreg_size; count += 4) { + if (oprom_opreg[count / 4] == *((const u32 *)"$VBT")) { + found = count; break; } }
- if (count >= oprom_size) + if (count >= opreg_size) { + drm_err(&i915->drm, "VBT not found in opregion\n"); goto err_not_found; + }
/* Get VBT size and allocate space for the VBT */ - intel_uncore_write(&i915->uncore, PRIMARY_SPI_ADDRESS, found + - offsetof(struct vbt_header, vbt_size)); - vbt_size = intel_uncore_read(&i915->uncore, PRIMARY_SPI_TRIGGER); - vbt_size &= 0xffff; + parse_ptr = (u8 *)oprom_opreg + found; + vbt_size = ((struct vbt_header *)parse_ptr)->vbt_size;
vbt = kzalloc(vbt_size, GFP_KERNEL); if (!vbt) { @@ -2279,16 +2278,12 @@ static struct vbt_header *spi_oprom_get_vbt(struct drm_i915_private *i915) goto err_not_found; }
- for (count = 0; count < vbt_size; count += 4) { - intel_uncore_write(&i915->uncore, PRIMARY_SPI_ADDRESS, found + count); - data = intel_uncore_read(&i915->uncore, PRIMARY_SPI_TRIGGER); - *(vbt + store++) = data; - } - + memcpy(vbt, parse_ptr, vbt_size); if (!intel_bios_is_valid_vbt(vbt, vbt_size)) goto err_free_vbt;
DRM_DEBUG_KMS("Found valid VBT in SPI flash\n"); + kfree(oprom_opreg);
return (struct vbt_header *)vbt;
diff --git a/drivers/gpu/drm/i915/display/intel_opregion.c b/drivers/gpu/drm/i915/display/intel_opregion.c index dfd724e506b5..e9ccd8265a1f 100644 --- a/drivers/gpu/drm/i915/display/intel_opregion.c +++ b/drivers/gpu/drm/i915/display/intel_opregion.c @@ -983,6 +983,175 @@ int intel_opregion_setup(struct drm_i915_private *dev_priv) return err; }
+static int oprom_image_parse_helper(u8 *parse_ptr, u8 *last_img, u8 *code_type, + struct drm_i915_private *i915) +{ + u8 size_512_bytes; + + if (((union oprom_header *)parse_ptr)->signature != OPROM_IMAGE_MAGIC) { + drm_err(&i915->drm, "Wrong OPROM header signature.\n"); + return -EINVAL; + } + + size_512_bytes = parse_ptr[((struct expansion_rom_header *)parse_ptr)->pcistructoffset + PCI_IMAGE_LENGTH_OFFSET]; + *code_type = parse_ptr[((struct expansion_rom_header *)parse_ptr)->pcistructoffset + PCI_CODE_TYPE_OFFSET]; + *last_img = parse_ptr[((struct expansion_rom_header *)parse_ptr)->pcistructoffset + PCI_LAST_IMAGE_INDICATOR_OFFSET]; + + return size_512_bytes; +} + +static void spi_read_oprom_helper(size_t len, u32 offset, u32 *buf, + struct drm_i915_private *dev_priv) +{ + u32 count, data; + + for (count = 0; count < len; count += 4) { + intel_uncore_write(&dev_priv->uncore, PRIMARY_SPI_ADDRESS, offset + count); + data = intel_uncore_read(&dev_priv->uncore, PRIMARY_SPI_TRIGGER); + buf[count / 4] = data; + } +} + +/** + * + DASH+G OPROM IMAGE LAYOUT + + * +--------+-------+---------------------------+ + * | Offset | Value | ROM Header Fields +-----> Image 1 (CSS) + * +--------------------------------------------+ + * | 0h | 55h | ROM Signature Byte1 | + * | 1h | AAh | ROM Signature Byte2 | + * | 2h | xx | Reserved | + * | 18+19h| xx | Ptr to PCI DataStructure | + * +----------------+---------------------------+ + * | PCI Data Structure | + * +--------------------------------------------+ + * | . . . | + * | . . . | + * | 10 + xx + Image Length | + * | 14 + xx + Code Type | + * | 15 + xx + Last Image Indicator | + * | . . . | + * +--------------------------------------------+ + * | MEU BLOB | + * +--------------------------------------------+ + * | CPD Header | + * | CPD Entry | + * | Reserved | + * | SignedDataPart1 | + * | PublicKey | + * | RSA Signature | + * | SignedDataPart2 | + * | IFWI Metadata | + * +--------+-------+---------------------------+ + * | . 
| . | . | + * | . | . | . | + * +--------------------------------------------+ + * | Offset | Value | ROM Header Fields +-----> Image 2 (Config Data) (Offset: 0x800) + * +--------------------------------------------+ + * | 0h | 55h | ROM Signature Byte1 | + * | 1h | AAh | ROM Signature Byte2 | + * | 2h | xx | Reserved | + * | 18+19h| xx | Ptr to PCI DataStructure | + * +----------------+---------------------------+ + * | PCI Data Structure | + * +--------------------------------------------+ + * | . . . | + * | . . . | + * | 10 + xx + Image Length | + * | 14 + xx + Code Type | + * | 15 + xx + Last Image Indicator | + * | . . . | + * | 1A + 3C + Ptr to Opregion Signature | + * | . . . | + * | . . . | + * | 83Ch + IntelGraphicsMem | <---+ Opregion Signature + * +--------+-----------------------------------+ + * + * intel_spi_get_oprom_opreg() get OPROM image. + * @i915: pointer to i915 device. + * @opreg: pointer to opregion buffer output. + * @opreg_size: pointer to opregion size output. 
+ */ +int +intel_spi_get_oprom_opreg(struct drm_i915_private *i915, u32 **opreg, + u32 *opreg_size) +{ + u8 img_sig[sizeof(OPREGION_SIGNATURE)]; + u8 code_type, last_img; + u32 static_region, offset, img_len; + u32 *oprom_img, *oprom_img_hdr; + u16 opreg_base; + u8 *parse_ptr; + int img_size; + int ret = -EINVAL; + + /* initialize SPI to read the OPROM */ + static_region = intel_uncore_read(&i915->uncore, SPI_STATIC_REGIONS); + static_region &= OPTIONROM_SPI_REGIONID_MASK; + intel_uncore_write(&i915->uncore, PRIMARY_SPI_REGIONID, static_region); + /* read OPROM offset in SPI flash */ + offset = intel_uncore_read(&i915->uncore, OROM_OFFSET); + offset &= OROM_OFFSET_MASK; + + oprom_img_hdr = kzalloc(OPROM_INITIAL_READ_SIZE, GFP_KERNEL); + if (!oprom_img_hdr) + return -ENOMEM; + + do { + spi_read_oprom_helper(OPROM_INITIAL_READ_SIZE, offset, + oprom_img_hdr, i915); + img_size = oprom_image_parse_helper((u8 *)oprom_img_hdr, &last_img, + &code_type, i915); + if (img_size <= 0) { + ret = -EINVAL; + goto err_free_hdr; + } + + img_len = img_size * OPROM_BYTE_BOUNDARY; + oprom_img = kzalloc(img_len, GFP_KERNEL); + if (!oprom_img) { + ret = -ENOMEM; + goto err_free_hdr; + } + + spi_read_oprom_helper(img_len, offset, oprom_img, i915); + parse_ptr = (u8 *)oprom_img; + offset = offset + img_len; + + /* opregion base offset */ + opreg_base = ((struct expansion_rom_header *)parse_ptr)->opregion_base; + /* CPD or opreg signature is present at opregion_base offset */ + memcpy(img_sig, parse_ptr + opreg_base, sizeof(OPREGION_SIGNATURE)); + + if (!memcmp(img_sig, OPREGION_SIGNATURE, sizeof(OPREGION_SIGNATURE) - 1)) { + *opreg = oprom_img; + *opreg_size = img_len; + drm_dbg_kms(&i915->drm, "Found opregion image\n"); + ret = 0; + break; + } else if (!memcmp(img_sig, CPD_SIGNATURE, NUM_CPD_BYTES)) { + if (code_type != OPROM_CSS_CODE_TYPE) { + drm_err(&i915->drm, "Invalid OPROM\n"); + ret = -EINVAL; + goto err_free_img; + } + drm_dbg_kms(&i915->drm, "Found CSS image\n"); + /* proceed 
here onwards for signature authentication */ + kfree(oprom_img); + continue; + } + + } while (last_img != LAST_IMG_INDICATOR); + + return ret; + +err_free_img: + kfree(oprom_img); +err_free_hdr: + kfree(oprom_img_hdr); + + return ret; +} + static int intel_use_opregion_panel_type_callback(const struct dmi_system_id *id) { DRM_INFO("Using panel type from OpRegion on %s\n", id->ident); diff --git a/drivers/gpu/drm/i915/display/intel_opregion.h b/drivers/gpu/drm/i915/display/intel_opregion.h index 4aa68ffbd30e..de53dde10dd9 100644 --- a/drivers/gpu/drm/i915/display/intel_opregion.h +++ b/drivers/gpu/drm/i915/display/intel_opregion.h @@ -54,6 +54,34 @@ struct intel_opregion {
#define OPREGION_SIZE (8 * 1024)
+#define CPD_SIGNATURE "$CPD" /* CPD Signature */ +#define NUM_CPD_BYTES 4 +#define PCI_IMAGE_LENGTH_OFFSET 0x10 +#define PCI_CODE_TYPE_OFFSET 0x14 +#define PCI_LAST_IMAGE_INDICATOR_OFFSET 0x15 +#define LAST_IMG_INDICATOR 0x80 +#define OPROM_IMAGE_MAGIC 0xAA55 /* Little Endian */ +#define OPROM_CSS_CODE_TYPE 0xF0 +#define OPROM_BYTE_BOUNDARY 512 /* OPROM image sizes are indicated in 512 byte boundaries */ +#define OPROM_INITIAL_READ_SIZE 60 /* Read 60 bytes to compute the Img Len from PCI structure */ + +union oprom_header { + u32 data; + struct { + u16 signature; /* Offset[0x0]: Header 0x55 0xAA */ + u8 sizein512bytes; + u8 reserved; + }; +}; + +struct expansion_rom_header { + union oprom_header header; /* Offset[0x0]: Oprom Header */ + u16 vbiospostoffset; /* Offset[0x4]: pointer to VBIOS entry point */ + u8 resvd[0x12]; + u16 pcistructoffset; /* Offset[0x18]: Contains pointer PCI Data Structure */ + u16 opregion_base; /* Offset[0x1A]: Offset to Opregion Base start */ +}; + #ifdef CONFIG_ACPI
int intel_opregion_setup(struct drm_i915_private *dev_priv); @@ -72,6 +100,9 @@ int intel_opregion_notify_adapter(struct drm_i915_private *dev_priv, pci_power_t state); int intel_opregion_get_panel_type(struct drm_i915_private *dev_priv);
+int intel_spi_get_oprom_opreg(struct drm_i915_private *i915, u32 **opreg, + u32 *opreg_size); + #else /* CONFIG_ACPI*/
static inline int intel_opregion_setup(struct drm_i915_private *dev_priv) @@ -117,6 +148,11 @@ static inline int intel_opregion_get_panel_type(struct drm_i915_private *dev) return -ENODEV; }
-#endif /* CONFIG_ACPI */ +static int intel_spi_get_oprom_opreg(struct drm_i915_private *i915, u32 **opreg, + u32 *opreg_size) +{ + return 0; +}
+#endif /* CONFIG_ACPI */ #endif
Hi Matthew,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on drm-intel/for-linux-next] [also build test WARNING on drm-tip/drm-tip] [cannot apply to drm-exynos/exynos-drm-next tegra-drm/drm/tegra/for-next drm/drm-next v5.12-rc7] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Matthew-Auld/More-DG1-enabling/2021... base: git://anongit.freedesktop.org/drm-intel for-linux-next config: x86_64-randconfig-c022-20210412 (attached as .config) compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot lkp@intel.com
cocci warnings: (new ones prefixed by >>)
drivers/gpu/drm/i915/display/intel_bios.c:2274:7-14: WARNING opportunity for kmemdup
Please review and possibly fold the followup patch.
--- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
From: kernel test robot lkp@intel.com
drivers/gpu/drm/i915/display/intel_bios.c:2274:7-14: WARNING opportunity for kmemdup
Use kmemdup rather than duplicating its implementation
Generated by: scripts/coccinelle/api/memdup.cocci
CC: Anshuman Gupta anshuman.gupta@intel.com Reported-by: kernel test robot lkp@intel.com Signed-off-by: kernel test robot lkp@intel.com ---
url: https://github.com/0day-ci/linux/commits/Matthew-Auld/More-DG1-enabling/2021... base: git://anongit.freedesktop.org/drm-intel for-linux-next
intel_bios.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/gpu/drm/i915/display/intel_bios.c +++ b/drivers/gpu/drm/i915/display/intel_bios.c @@ -2271,14 +2271,13 @@ static struct vbt_header *spi_oprom_get_ parse_ptr = (u8 *)oprom_opreg + found; vbt_size = ((struct vbt_header *)parse_ptr)->vbt_size;
- vbt = kzalloc(vbt_size, GFP_KERNEL); + vbt = kmemdup(parse_ptr, vbt_size, GFP_KERNEL); if (!vbt) { DRM_ERROR("Unable to allocate %u bytes for VBT storage\n", vbt_size); goto err_not_found; }
- memcpy(vbt, parse_ptr, vbt_size); if (!intel_bios_is_valid_vbt(vbt, vbt_size)) goto err_free_vbt;
On Mon, 12 Apr 2021, Matthew Auld matthew.auld@intel.com wrote:
From: Anshuman Gupta anshuman.gupta@intel.com
Sanitize OPROM header, CPD signature and OPROM PCI version. OPROM_HEADER, EXPANSION_ROM_HEADER and OPROM_MEU_BLOB structures and PCI struct offsets are provided by GSC counterparts. These are yet to be documented in B.Spec. After successful sanitization, extract VBT from opregion image.
So I don't understand what the point is with two consecutive patches where the latter rewrites a lot of the former.
BR, Jani.
v2:
- Used macro for OPROM header magic 0xaa55 [Rodrigo]
- Added a OPROM layout. [Uma]
- Extract opregion from OPROM package and then extract VBT from opregion to have backward compatibility with older IFWI.
v3:
- Moved opreg stuff to intel_opregion.{c,h}. [Uma]
- Memory leak and intel_oprom_verify_signature return value fixes. [Uma]
v4:
- Fix return code storage for oprom_image_parse_helper (Matt)
v5 by Jani:
- switch to intel_uncore_read/intel_uncore_write
v6 by Khajapasha:
- Rename intel_oprom_verify_signature() to intel_spi_get_oprom_opreg() [Jani Nikula]
- Use u32 data type for opregion size [Jani Nikula]
Cc: Jani Nikula jani.nikula@intel.com Cc: Uma Shankar uma.shankar@intel.com Signed-off-by: Anshuman Gupta anshuman.gupta@intel.com Signed-off-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Mohammed Khajapasha mohammed.khajapasha@intel.com
drivers/gpu/drm/i915/display/intel_bios.c | 47 +++-- drivers/gpu/drm/i915/display/intel_opregion.c | 169 ++++++++++++++++++ drivers/gpu/drm/i915/display/intel_opregion.h | 38 +++- 3 files changed, 227 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_bios.c b/drivers/gpu/drm/i915/display/intel_bios.c index f9dc651f1652..59eec8333723 100644 --- a/drivers/gpu/drm/i915/display/intel_bios.c +++ b/drivers/gpu/drm/i915/display/intel_bios.c @@ -2240,37 +2240,36 @@ bool intel_bios_is_valid_vbt(const void *buf, size_t size)
static struct vbt_header *spi_oprom_get_vbt(struct drm_i915_private *i915) {
- u32 count, data, found, store = 0;
- u32 static_region, oprom_offset;
- u32 oprom_size = 0x200000;
- u32 count, found, opreg_size;
- u32 *vbt, *oprom_opreg = NULL; u16 vbt_size;
- u32 *vbt;
- u8 *parse_ptr;
- static_region = intel_uncore_read(&i915->uncore, SPI_STATIC_REGIONS);
- static_region &= OPTIONROM_SPI_REGIONID_MASK;
- intel_uncore_write(&i915->uncore, PRIMARY_SPI_REGIONID, static_region);
- oprom_offset = intel_uncore_read(&i915->uncore, OROM_OFFSET);
- oprom_offset &= OROM_OFFSET_MASK;
- if (intel_spi_get_oprom_opreg(i915, &oprom_opreg, &opreg_size)) {
drm_err(&i915->drm, "oprom signature verification failed\n");
goto err_not_found;
- }
- for (count = 0; count < oprom_size; count += 4) {
intel_uncore_write(&i915->uncore, PRIMARY_SPI_ADDRESS, oprom_offset + count);
data = intel_uncore_read(&i915->uncore, PRIMARY_SPI_TRIGGER);
- if (!oprom_opreg) {
drm_err(&i915->drm, "opregion not found\n");
goto err_not_found;
- }
if (data == *((const u32 *)"$VBT")) {
found = oprom_offset + count;
- for (count = 0; count < opreg_size; count += 4) {
if (oprom_opreg[count / 4] == *((const u32 *)"$VBT")) {
} }found = count; break;
- if (count >= oprom_size)
if (count >= opreg_size) {
drm_err(&i915->drm, "VBT not found in opregion\n");
goto err_not_found;
}
/* Get VBT size and allocate space for the VBT */
- intel_uncore_write(&i915->uncore, PRIMARY_SPI_ADDRESS, found +
offsetof(struct vbt_header, vbt_size));
- vbt_size = intel_uncore_read(&i915->uncore, PRIMARY_SPI_TRIGGER);
- vbt_size &= 0xffff;
parse_ptr = (u8 *)oprom_opreg + found;
vbt_size = ((struct vbt_header *)parse_ptr)->vbt_size;
vbt = kzalloc(vbt_size, GFP_KERNEL); if (!vbt) {
@@ -2279,16 +2278,12 @@ static struct vbt_header *spi_oprom_get_vbt(struct drm_i915_private *i915) goto err_not_found; }
- for (count = 0; count < vbt_size; count += 4) {
intel_uncore_write(&i915->uncore, PRIMARY_SPI_ADDRESS, found + count);
data = intel_uncore_read(&i915->uncore, PRIMARY_SPI_TRIGGER);
*(vbt + store++) = data;
- }
memcpy(vbt, parse_ptr, vbt_size); if (!intel_bios_is_valid_vbt(vbt, vbt_size)) goto err_free_vbt;
DRM_DEBUG_KMS("Found valid VBT in SPI flash\n");
kfree(oprom_opreg);
return (struct vbt_header *)vbt;
diff --git a/drivers/gpu/drm/i915/display/intel_opregion.c b/drivers/gpu/drm/i915/display/intel_opregion.c index dfd724e506b5..e9ccd8265a1f 100644 --- a/drivers/gpu/drm/i915/display/intel_opregion.c +++ b/drivers/gpu/drm/i915/display/intel_opregion.c @@ -983,6 +983,175 @@ int intel_opregion_setup(struct drm_i915_private *dev_priv) return err; }
+static int oprom_image_parse_helper(u8 *parse_ptr, u8 *last_img, u8 *code_type,
struct drm_i915_private *i915)
+{
- u8 size_512_bytes;
- if (((union oprom_header *)parse_ptr)->signature != OPROM_IMAGE_MAGIC) {
drm_err(&i915->drm, "Wrong OPROM header signature.\n");
return -EINVAL;
- }
- size_512_bytes = parse_ptr[((struct expansion_rom_header *)parse_ptr)->pcistructoffset + PCI_IMAGE_LENGTH_OFFSET];
- *code_type = parse_ptr[((struct expansion_rom_header *)parse_ptr)->pcistructoffset + PCI_CODE_TYPE_OFFSET];
- *last_img = parse_ptr[((struct expansion_rom_header *)parse_ptr)->pcistructoffset + PCI_LAST_IMAGE_INDICATOR_OFFSET];
- return size_512_bytes;
+}
+static void spi_read_oprom_helper(size_t len, u32 offset, u32 *buf,
struct drm_i915_private *dev_priv)
+{
- u32 count, data;
- for (count = 0; count < len; count += 4) {
intel_uncore_write(&dev_priv->uncore, PRIMARY_SPI_ADDRESS, offset + count);
data = intel_uncore_read(&dev_priv->uncore, PRIMARY_SPI_TRIGGER);
buf[count / 4] = data;
- }
+}
+/**
DASH+G OPROM IMAGE LAYOUT +
- +--------+-------+---------------------------+
- | Offset | Value | ROM Header Fields +-----> Image 1 (CSS)
- +--------------------------------------------+
- | 0h | 55h | ROM Signature Byte1 |
- | 1h | AAh | ROM Signature Byte2 |
- | 2h | xx | Reserved |
- | 18+19h| xx | Ptr to PCI DataStructure |
- +----------------+---------------------------+
- | PCI Data Structure |
- +--------------------------------------------+
- | . . . |
- | . . . |
- | 10 + xx + Image Length |
- | 14 + xx + Code Type |
- | 15 + xx + Last Image Indicator |
- | . . . |
- +--------------------------------------------+
- | MEU BLOB |
- +--------------------------------------------+
- | CPD Header |
- | CPD Entry |
- | Reserved |
- | SignedDataPart1 |
- | PublicKey |
- | RSA Signature |
- | SignedDataPart2 |
- | IFWI Metadata |
- +--------+-------+---------------------------+
- | . | . | . |
- | . | . | . |
- +--------------------------------------------+
- | Offset | Value | ROM Header Fields +-----> Image 2 (Config Data) (Offset: 0x800)
- +--------------------------------------------+
- | 0h | 55h | ROM Signature Byte1 |
- | 1h | AAh | ROM Signature Byte2 |
- | 2h | xx | Reserved |
- | 18+19h| xx | Ptr to PCI DataStructure |
- +----------------+---------------------------+
- | PCI Data Structure |
- +--------------------------------------------+
- | . . . |
- | . . . |
- | 10 + xx + Image Length |
- | 14 + xx + Code Type |
- | 15 + xx + Last Image Indicator |
- | . . . |
- | 1A + 3C + Ptr to Opregion Signature |
- | . . . |
- | . . . |
- | 83Ch + IntelGraphicsMem | <---+ Opregion Signature
- +--------+-----------------------------------+
- intel_spi_get_oprom_opreg() get OPROM image.
- @i915: pointer to i915 device.
- @opreg: pointer to opregion buffer output.
- @opreg_size: pointer to opregion size output.
- */
+int +intel_spi_get_oprom_opreg(struct drm_i915_private *i915, u32 **opreg,
u32 *opreg_size)
+{
- u8 img_sig[sizeof(OPREGION_SIGNATURE)];
- u8 code_type, last_img;
- u32 static_region, offset, img_len;
- u32 *oprom_img, *oprom_img_hdr;
- u16 opreg_base;
- u8 *parse_ptr;
- int img_size;
- int ret = -EINVAL;
- /* initialize SPI to read the OPROM */
- static_region = intel_uncore_read(&i915->uncore, SPI_STATIC_REGIONS);
- static_region &= OPTIONROM_SPI_REGIONID_MASK;
- intel_uncore_write(&i915->uncore, PRIMARY_SPI_REGIONID, static_region);
- /* read OPROM offset in SPI flash */
- offset = intel_uncore_read(&i915->uncore, OROM_OFFSET);
- offset &= OROM_OFFSET_MASK;
- oprom_img_hdr = kzalloc(OPROM_INITIAL_READ_SIZE, GFP_KERNEL);
- if (!oprom_img_hdr)
return -ENOMEM;
- do {
spi_read_oprom_helper(OPROM_INITIAL_READ_SIZE, offset,
oprom_img_hdr, i915);
img_size = oprom_image_parse_helper((u8 *)oprom_img_hdr, &last_img,
&code_type, i915);
if (img_size <= 0) {
ret = -EINVAL;
goto err_free_hdr;
}
img_len = img_size * OPROM_BYTE_BOUNDARY;
oprom_img = kzalloc(img_len, GFP_KERNEL);
if (!oprom_img) {
ret = -ENOMEM;
goto err_free_hdr;
}
spi_read_oprom_helper(img_len, offset, oprom_img, i915);
parse_ptr = (u8 *)oprom_img;
offset = offset + img_len;
/* opregion base offset */
opreg_base = ((struct expansion_rom_header *)parse_ptr)->opregion_base;
/* CPD or opreg signature is present at opregion_base offset */
memcpy(img_sig, parse_ptr + opreg_base, sizeof(OPREGION_SIGNATURE));
if (!memcmp(img_sig, OPREGION_SIGNATURE, sizeof(OPREGION_SIGNATURE) - 1)) {
*opreg = oprom_img;
*opreg_size = img_len;
drm_dbg_kms(&i915->drm, "Found opregion image\n");
ret = 0;
break;
} else if (!memcmp(img_sig, CPD_SIGNATURE, NUM_CPD_BYTES)) {
if (code_type != OPROM_CSS_CODE_TYPE) {
drm_err(&i915->drm, "Invalid OPROM\n");
ret = -EINVAL;
goto err_free_img;
}
drm_dbg_kms(&i915->drm, "Found CSS image\n");
/* proceed here onwards for signature authentication */
kfree(oprom_img);
continue;
}
- } while (last_img != LAST_IMG_INDICATOR);
- return ret;
+err_free_img:
- kfree(oprom_img);
+err_free_hdr:
- kfree(oprom_img_hdr);
- return ret;
+}
static int intel_use_opregion_panel_type_callback(const struct dmi_system_id *id) { DRM_INFO("Using panel type from OpRegion on %s\n", id->ident); diff --git a/drivers/gpu/drm/i915/display/intel_opregion.h b/drivers/gpu/drm/i915/display/intel_opregion.h index 4aa68ffbd30e..de53dde10dd9 100644 --- a/drivers/gpu/drm/i915/display/intel_opregion.h +++ b/drivers/gpu/drm/i915/display/intel_opregion.h @@ -54,6 +54,34 @@ struct intel_opregion {
#define OPREGION_SIZE (8 * 1024)
+#define CPD_SIGNATURE "$CPD" /* CPD Signature */ +#define NUM_CPD_BYTES 4 +#define PCI_IMAGE_LENGTH_OFFSET 0x10 +#define PCI_CODE_TYPE_OFFSET 0x14 +#define PCI_LAST_IMAGE_INDICATOR_OFFSET 0x15 +#define LAST_IMG_INDICATOR 0x80 +#define OPROM_IMAGE_MAGIC 0xAA55 /* Little Endian */ +#define OPROM_CSS_CODE_TYPE 0xF0 +#define OPROM_BYTE_BOUNDARY 512 /* OPROM image sizes are indicated in 512 byte boundaries */ +#define OPROM_INITIAL_READ_SIZE 60 /* Read 60 bytes to compute the Img Len from PCI structure */
+union oprom_header {
- u32 data;
- struct {
u16 signature; /* Offset[0x0]: Header 0x55 0xAA */
u8 sizein512bytes;
u8 reserved;
- };
+};
+struct expansion_rom_header {
- union oprom_header header; /* Offset[0x0]: Oprom Header */
- u16 vbiospostoffset; /* Offset[0x4]: pointer to VBIOS entry point */
- u8 resvd[0x12];
- u16 pcistructoffset; /* Offset[0x18]: Contains pointer PCI Data Structure */
- u16 opregion_base; /* Offset[0x1A]: Offset to Opregion Base start */
+};
#ifdef CONFIG_ACPI
int intel_opregion_setup(struct drm_i915_private *dev_priv); @@ -72,6 +100,9 @@ int intel_opregion_notify_adapter(struct drm_i915_private *dev_priv, pci_power_t state); int intel_opregion_get_panel_type(struct drm_i915_private *dev_priv);
+int intel_spi_get_oprom_opreg(struct drm_i915_private *i915, u32 **opreg,
u32 *opreg_size);
#else /* CONFIG_ACPI*/
static inline int intel_opregion_setup(struct drm_i915_private *dev_priv) @@ -117,6 +148,11 @@ static inline int intel_opregion_get_panel_type(struct drm_i915_private *dev) return -ENODEV; }
-#endif /* CONFIG_ACPI */ +static int intel_spi_get_oprom_opreg(struct drm_i915_private *i915, u32 **opreg,
u32 *opreg_size)
+{
- return 0;
+}
+#endif /* CONFIG_ACPI */ #endif
On Mon, May 17, 2021 at 02:57:33PM +0300, Jani Nikula wrote:
On Mon, 12 Apr 2021, Matthew Auld matthew.auld@intel.com wrote:
From: Anshuman Gupta anshuman.gupta@intel.com
Sanitize OPROM header, CPD signature and OPROM PCI version. OPROM_HEADER, EXPANSION_ROM_HEADER and OPROM_MEU_BLOB structures and PCI struct offsets are provided by GSC counterparts. These are yet to be Documented in B.Spec. After successful sanitization, extract VBT from opregion image.
So I don't understand what the point is with two consecutive patches where the latter rewrites a lot of the former.
I actually wonder what's the point of this. Getting it from spi is already the fallback and looks much more complex. Yes, it's pretty detailed and document the format pretty well, but it still looks more complex than the initial code. Do you see additional benefit in this one?
Lucas De Marchi
On Fri, 17 Sep 2021, Lucas De Marchi lucas.demarchi@intel.com wrote:
On Mon, May 17, 2021 at 02:57:33PM +0300, Jani Nikula wrote:
On Mon, 12 Apr 2021, Matthew Auld matthew.auld@intel.com wrote:
From: Anshuman Gupta anshuman.gupta@intel.com
Sanitize OPROM header, CPD signature and OPROM PCI version. OPROM_HEADER, EXPANSION_ROM_HEADER and OPROM_MEU_BLOB structures and PCI struct offsets are provided by GSC counterparts. These are yet to be Documented in B.Spec. After successful sanitization, extract VBT from opregion image.
So I don't understand what the point is with two consecutive patches where the latter rewrites a lot of the former.
I actually wonder what's the point of this. Getting it from spi is already the fallback and looks much more complex. Yes, it's pretty detailed and document the format pretty well, but it still looks more complex than the initial code. Do you see additional benefit in this one?
The commit message doesn't really explain much. Anshuman?
BR, Jani.
-----Original Message----- From: Nikula, Jani jani.nikula@intel.com Sent: Monday, September 20, 2021 1:12 PM To: De Marchi, Lucas lucas.demarchi@intel.com Cc: Auld, Matthew matthew.auld@intel.com; intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Gupta, Anshuman anshuman.gupta@intel.com Subject: Re: [Intel-gfx] [PATCH 14/19] drm/i915/oprom: Basic sanitization
On Fri, 17 Sep 2021, Lucas De Marchi lucas.demarchi@intel.com wrote:
On Mon, May 17, 2021 at 02:57:33PM +0300, Jani Nikula wrote:
On Mon, 12 Apr 2021, Matthew Auld matthew.auld@intel.com wrote:
From: Anshuman Gupta anshuman.gupta@intel.com
Sanitize OPROM header, CPD signature and OPROM PCI version. OPROM_HEADER, EXPANSION_ROM_HEADER and OPROM_MEU_BLOB
structures and
PCI struct offsets are provided by GSC counterparts. These are yet to be Documented in B.Spec. After successful sanitization, extract VBT from opregion image.
So I don't understand what the point is with two consecutive patches where the latter rewrites a lot of the former.
I actually wonder what's the point of this. Getting it from spi is already the fallback and looks much more complex. Yes, it's pretty detailed and document the format pretty well, but it still looks more complex than the initial code. Do you see additional benefit in this one?
Getting opregion image from spi is needed to get the intel_opregion and its mailboxes on discrete card.
The commit message doesn't really explain much. Anshuman?
I will get rework of the patches and float it again. Thanks, Anshuman Gupta.
BR, Jani.
-- Jani Nikula, Intel Open Source Graphics Center
On Mon, 20 Sep 2021, "Gupta, Anshuman" anshuman.gupta@intel.com wrote:
-----Original Message----- From: Nikula, Jani jani.nikula@intel.com Sent: Monday, September 20, 2021 1:12 PM To: De Marchi, Lucas lucas.demarchi@intel.com Cc: Auld, Matthew matthew.auld@intel.com; intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Gupta, Anshuman anshuman.gupta@intel.com Subject: Re: [Intel-gfx] [PATCH 14/19] drm/i915/oprom: Basic sanitization
On Fri, 17 Sep 2021, Lucas De Marchi lucas.demarchi@intel.com wrote:
On Mon, May 17, 2021 at 02:57:33PM +0300, Jani Nikula wrote:
On Mon, 12 Apr 2021, Matthew Auld matthew.auld@intel.com wrote:
From: Anshuman Gupta anshuman.gupta@intel.com
Sanitize OPROM header, CPD signature and OPROM PCI version. OPROM_HEADER, EXPANSION_ROM_HEADER and OPROM_MEU_BLOB
structures and
PCI struct offsets are provided by GSC counterparts. These are yet to be Documented in B.Spec. After successful sanitization, extract VBT from opregion image.
So I don't understand what the point is with two consecutive patches where the latter rewrites a lot of the former.
I actually wonder what's the point of this. Getting it from spi is already the fallback and looks much more complex. Yes, it's pretty detailed and document the format pretty well, but it still looks more complex than the initial code. Do you see additional benefit in this one?
Getting opregion image from spi is needed to get the intel_opregion and its mailboxes on discrete card.
I mean what's the point of the "drm/i915/oprom: Basic sanitization" patch? And if that's needed, then why is it separate from "drm/i915/dg1: Read OPROM via SPI controller"?
The commit message doesn't really explain much. Anshuman?
I will get rework of the patches and float it again.
Lucas already sent something, please sync with him.
BR, Jani.
Thanks, Anshuman Gupta.
BR, Jani.
-- Jani Nikula, Intel Open Source Graphics Center
On Mon, Sep 20, 2021 at 08:04:32AM +0000, Gupta, Anshuman wrote:
-----Original Message----- From: Nikula, Jani jani.nikula@intel.com Sent: Monday, September 20, 2021 1:12 PM To: De Marchi, Lucas lucas.demarchi@intel.com Cc: Auld, Matthew matthew.auld@intel.com; intel-gfx@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Gupta, Anshuman anshuman.gupta@intel.com Subject: Re: [Intel-gfx] [PATCH 14/19] drm/i915/oprom: Basic sanitization
On Fri, 17 Sep 2021, Lucas De Marchi lucas.demarchi@intel.com wrote:
On Mon, May 17, 2021 at 02:57:33PM +0300, Jani Nikula wrote:
On Mon, 12 Apr 2021, Matthew Auld matthew.auld@intel.com wrote:
From: Anshuman Gupta anshuman.gupta@intel.com
Sanitize OPROM header, CPD signature and OPROM PCI version. OPROM_HEADER, EXPANSION_ROM_HEADER and OPROM_MEU_BLOB
structures and
PCI struct offsets are provided by GSC counterparts. These are yet to be Documented in B.Spec. After successful sanitization, extract VBT from opregion image.
So I don't understand what the point is with two consecutive patches where the latter rewrites a lot of the former.
I actually wonder what's the point of this. Getting it from spi is already the fallback and looks much more complex. Yes, it's pretty detailed and document the format pretty well, but it still looks more complex than the initial code. Do you see additional benefit in this one?
Getting opregion image from spi is needed to get the intel_opregion and its mailboxes on discrete card.
The commit message doesn't really explain much. Anshuman?
I will get rework of the patches and float it again.
from this patch the only thing I see it's doing is to get the VBT from inside opregion... it moves the read part to helper methods and apparently it supports multiple images...?
The question here is not why we are reading from spi, but rather what this is doing that the previous commit wasn't already.
Lucas De Marchi
From: José Roberto de Souza jose.souza@intel.com
Commit c457d9cf256e ("drm/i915: Make sure we have enough memory bandwidth on ICL") assumes that we always have a non-zero dram_info->channels and uses it as a divisor. We need num memory channels to be at least 1 for sane bw limits checking, even when PCode returns 0, so lets force it to 1 in this case.
Cc: Stanislav Lisovskiy stanislav.lisovskiy@intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com Cc: Ville Syrjälä ville.syrjala@linux.intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com --- drivers/gpu/drm/i915/display/intel_bw.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/i915/display/intel_bw.c b/drivers/gpu/drm/i915/display/intel_bw.c index 584ab5ce4106..c5f70f3e930e 100644 --- a/drivers/gpu/drm/i915/display/intel_bw.c +++ b/drivers/gpu/drm/i915/display/intel_bw.c @@ -175,6 +175,7 @@ static int icl_get_bw_info(struct drm_i915_private *dev_priv, const struct intel "Failed to get memory subsystem information, ignoring bandwidth limits"); return ret; } + num_channels = max_t(u8, 1, num_channels);
deinterleave = DIV_ROUND_UP(num_channels, is_y_tile ? 4 : 2); dclk_max = icl_sagv_max_dclk(&qi);
On Mon, 2021-04-12 at 10:05 +0100, Matthew Auld wrote:
From: José Roberto de Souza jose.souza@intel.com
Commit c457d9cf256e ("drm/i915: Make sure we have enough memory bandwidth on ICL") assumes that we always have a non-zero dram_info->channels and uses it as a divisor. We need num memory channels to be at least 1 for sane bw limits checking, even when PCode returns 0, so lets force it to 1 in this case.
Missing my sob.
Cc: Stanislav Lisovskiy stanislav.lisovskiy@intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com Cc: Ville Syrjälä ville.syrjala@linux.intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com
drivers/gpu/drm/i915/display/intel_bw.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/i915/display/intel_bw.c b/drivers/gpu/drm/i915/display/intel_bw.c index 584ab5ce4106..c5f70f3e930e 100644 --- a/drivers/gpu/drm/i915/display/intel_bw.c +++ b/drivers/gpu/drm/i915/display/intel_bw.c @@ -175,6 +175,7 @@ static int icl_get_bw_info(struct drm_i915_private *dev_priv, const struct intel "Failed to get memory subsystem information, ignoring bandwidth limits"); return ret; }
- num_channels = max_t(u8, 1, num_channels);
deinterleave = DIV_ROUND_UP(num_channels, is_y_tile ? 4 : 2); dclk_max = icl_sagv_max_dclk(&qi);
From: Clint Taylor clinton.a.taylor@intel.com
The PUNIT FW is currently returning 0 for all memory bandwidth parameters. Read the values directly from MCHBAR offsets 0x5918 and 0x4000(4). This is a temporary WA until the PUNIT FW returns valid values.
v2 (Lucas): Add error to log since this is fixed in new pcode available on IFWI WW14. Also fix checkpatch warnings.
v3 by Jani: - switch to intel_uncore_read/intel_uncore_write
Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: Matt Roper matthew.d.roper@intel.com Cc: Jani Saarinen jani.saarinen@intel.com Signed-off-by: Clint Taylor clinton.a.taylor@intel.com Signed-off-by: Jani Nikula jani.nikula@intel.com --- drivers/gpu/drm/i915/display/intel_bw.c | 54 ++++++++++++++++++++++++- 1 file changed, 53 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_bw.c b/drivers/gpu/drm/i915/display/intel_bw.c index c5f70f3e930e..99cae0dc0ca2 100644 --- a/drivers/gpu/drm/i915/display/intel_bw.c +++ b/drivers/gpu/drm/i915/display/intel_bw.c @@ -23,6 +23,53 @@ struct intel_qgv_info { u8 t_bl; };
+#define SA_PERF_STATUS_0_0_0_MCHBAR_PC _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x5918) +#define DG1_QCLK_RATIO_MASK (0xFF << 2) +#define DG1_QCLK_RATIO_SHIFT 2 +#define DG1_QCLK_REFERENCE (1 << 10) + +#define MCHBAR_CH0_CR_TC_PRE_0_0_0_MCHBAR _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x4000) +#define MCHBAR_CH0_CR_TC_PRE_0_0_0_MCHBAR_HIGH _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x4004) +#define MCHBAR_CH1_CR_TC_PRE_0_0_0_MCHBAR _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x4400) +#define MCHBAR_CH1_CR_TC_PRE_0_0_0_MCHBAR_HIGH _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x4404) +#define DG1_DRAM_T_RCD_MASK (0x7F << 9) +#define DG1_DRAM_T_RCD_SHIFT 9 +#define DG1_DRAM_T_RDPRE_MASK (0x3F << 11) +#define DG1_DRAM_T_RDPRE_SHIFT 11 +#define DG1_DRAM_T_RAS_MASK (0xFF << 1) +#define DG1_DRAM_T_RAS_SHIFT 1 +#define DG1_DRAM_T_RP_MASK (0x7F << 0) +#define DG1_DRAM_T_RP_SHIFT 0 + +static int dg1_mchbar_read_qgv_point_info(struct drm_i915_private *dev_priv, + struct intel_qgv_point *sp, + int point) +{ + u32 val = 0; + u32 dclk_ratio = 0, dclk_reference = 0; + + val = intel_uncore_read(&dev_priv->uncore, SA_PERF_STATUS_0_0_0_MCHBAR_PC); + dclk_ratio = (val & DG1_QCLK_RATIO_MASK) >> DG1_QCLK_RATIO_SHIFT; + if (val & DG1_QCLK_REFERENCE) + dclk_reference = 6; /* 6 * 16.666 MHz = 100 MHz */ + else + dclk_reference = 8; /* 8 * 16.666 MHz = 133 MHz */ + sp->dclk = dclk_ratio * dclk_reference; + if (sp->dclk == 0) + return -EINVAL; + + val = intel_uncore_read(&dev_priv->uncore, MCHBAR_CH0_CR_TC_PRE_0_0_0_MCHBAR); + sp->t_rp = (val & DG1_DRAM_T_RP_MASK) >> DG1_DRAM_T_RP_SHIFT; + sp->t_rdpre = (val & DG1_DRAM_T_RDPRE_MASK) >> DG1_DRAM_T_RDPRE_SHIFT; + + val = intel_uncore_read(&dev_priv->uncore, MCHBAR_CH0_CR_TC_PRE_0_0_0_MCHBAR_HIGH); + sp->t_rcd = (val & DG1_DRAM_T_RCD_MASK) >> DG1_DRAM_T_RCD_SHIFT; + sp->t_ras = (val & DG1_DRAM_T_RAS_MASK) >> DG1_DRAM_T_RAS_SHIFT; + + sp->t_rc = sp->t_rp + sp->t_ras; + return 0; +} + static int icl_pcode_read_qgv_point_info(struct drm_i915_private *dev_priv, struct intel_qgv_point *sp, int point) @@ 
-100,7 +147,12 @@ static int icl_get_qgv_points(struct drm_i915_private *dev_priv, struct intel_qgv_point *sp = &qi->points[i];
ret = icl_pcode_read_qgv_point_info(dev_priv, sp, i); - if (ret) + if (IS_DG1(dev_priv) && (ret || sp->dclk == 0)) { + drm_dbg_kms(&dev_priv->drm, "Failed to get memory subsystem information via pcode. IFWI needs update. Trying with MCHBAR\n"); + ret = dg1_mchbar_read_qgv_point_info(dev_priv, sp, i); + if (ret) + return ret; + } else if (ret) return ret;
drm_dbg_kms(&dev_priv->drm,
From: Clint Taylor clinton.a.taylor@intel.com
Use MCHBAR Gear_type information to compute memory bandwidth available during MCHBAR calculations.
v2 by Jani: - switch to intel_uncore_read/intel_uncore_write
Tested-by: Swati Sharma swati2.sharma@intel.com Cc: Swati Sharma swati2.sharma@intel.com Cc: Ville Syrjälä ville.syrjala@linux.intel.com Signed-off-by: Clint Taylor clinton.a.taylor@intel.com Signed-off-by: Jani Nikula jani.nikula@intel.com --- drivers/gpu/drm/i915/display/intel_bw.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_bw.c b/drivers/gpu/drm/i915/display/intel_bw.c index 99cae0dc0ca2..6c02bd52ce45 100644 --- a/drivers/gpu/drm/i915/display/intel_bw.c +++ b/drivers/gpu/drm/i915/display/intel_bw.c @@ -41,6 +41,9 @@ struct intel_qgv_info { #define DG1_DRAM_T_RP_MASK (0x7F << 0) #define DG1_DRAM_T_RP_SHIFT 0
+#define ICL_GEAR_TYPE_MASK (0x01 << 16) +#define ICL_GEAR_TYPE_SHIFT 16 + static int dg1_mchbar_read_qgv_point_info(struct drm_i915_private *dev_priv, struct intel_qgv_point *sp, int point) @@ -55,6 +58,11 @@ static int dg1_mchbar_read_qgv_point_info(struct drm_i915_private *dev_priv, else dclk_reference = 8; /* 8 * 16.666 MHz = 133 MHz */ sp->dclk = dclk_ratio * dclk_reference; + + val = intel_uncore_read(&dev_priv->uncore, SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU); + if ((val & ICL_GEAR_TYPE_MASK) >> ICL_GEAR_TYPE_SHIFT) + sp->dclk *= 2; + if (sp->dclk == 0) return -EINVAL;
We need to generalize our accessor for the page directories and tables from using the simple kmap_atomic to support local memory, and this setup must be done on acquisition of the backing storage prior to entering fence execution contexts. Here we replace the kmap with the object mapping code that for simple single page shmemfs object will return a plain kmap, that is then kept for the lifetime of the page directory.
v2: (Thomas) Rebase on dma_resv and obj->mm.lock removal.
Signed-off-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Chris Wilson chris@chris-wilson.co.uk --- .../drm/i915/gem/selftests/i915_gem_context.c | 11 +---- drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 11 ++--- drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 26 ++++------ drivers/gpu/drm/i915/gt/intel_ggtt.c | 2 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 48 +++++++++---------- drivers/gpu/drm/i915/gt/intel_gtt.h | 11 +++-- drivers/gpu/drm/i915/gt/intel_ppgtt.c | 7 ++- drivers/gpu/drm/i915/i915_vma.c | 3 +- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 10 ++-- drivers/gpu/drm/i915/selftests/i915_perf.c | 3 +- 10 files changed, 54 insertions(+), 78 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c index 5fef592390cb..ce70d0a3afb2 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c @@ -1740,7 +1740,6 @@ static int read_from_scratch(struct i915_gem_context *ctx, static int check_scratch_page(struct i915_gem_context *ctx, u32 *out) { struct i915_address_space *vm; - struct page *page; u32 *vaddr; int err = 0;
@@ -1748,24 +1747,18 @@ static int check_scratch_page(struct i915_gem_context *ctx, u32 *out) if (!vm) return -ENODEV;
- page = __px_page(vm->scratch[0]); - if (!page) { + if (!vm->scratch[0]) { pr_err("No scratch page!\n"); return -EINVAL; }
- vaddr = kmap(page); - if (!vaddr) { - pr_err("No (mappable) scratch page!\n"); - return -EINVAL; - } + vaddr = __px_vaddr(vm->scratch[0]);
memcpy(out, vaddr, sizeof(*out)); if (memchr_inv(vaddr, *out, PAGE_SIZE)) { pr_err("Inconsistent initial state of scratch page!\n"); err = -EINVAL; } - kunmap(page);
return err; } diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c index e08dff376339..21b1085769be 100644 --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c @@ -96,9 +96,8 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm, * entries back to scratch. */
- vaddr = kmap_atomic_px(pt); + vaddr = px_vaddr(pt); memset32(vaddr + pte, scratch_pte, count); - kunmap_atomic(vaddr);
pte = 0; } @@ -120,7 +119,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
GEM_BUG_ON(!pd->entry[act_pt]);
- vaddr = kmap_atomic_px(i915_pt_entry(pd, act_pt)); + vaddr = px_vaddr(i915_pt_entry(pd, act_pt)); do { GEM_BUG_ON(sg_dma_len(iter.sg) < I915_GTT_PAGE_SIZE); vaddr[act_pte] = pte_encode | GEN6_PTE_ADDR_ENCODE(iter.dma); @@ -136,12 +135,10 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm, }
if (++act_pte == GEN6_PTES) { - kunmap_atomic(vaddr); - vaddr = kmap_atomic_px(i915_pt_entry(pd, ++act_pt)); + vaddr = px_vaddr(i915_pt_entry(pd, ++act_pt)); act_pte = 0; } } while (1); - kunmap_atomic(vaddr);
vma->page_sizes.gtt = I915_GTT_PAGE_SIZE; } @@ -235,7 +232,7 @@ static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt) goto err_scratch0; }
- ret = pin_pt_dma(vm, vm->scratch[1]); + ret = map_pt_dma(vm, vm->scratch[1]); if (ret) goto err_scratch1;
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index 176c19633412..f83496836f0f 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -242,11 +242,10 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm, atomic_read(&pt->used)); GEM_BUG_ON(!count || count >= atomic_read(&pt->used));
- vaddr = kmap_atomic_px(pt); + vaddr = px_vaddr(pt); memset64(vaddr + gen8_pd_index(start, 0), vm->scratch[0]->encode, count); - kunmap_atomic(vaddr);
atomic_sub(count, &pt->used); start += count; @@ -375,7 +374,7 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt, gen8_pte_t *vaddr;
pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2)); - vaddr = kmap_atomic_px(i915_pt_entry(pd, gen8_pd_index(idx, 1))); + vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1))); do { GEM_BUG_ON(sg_dma_len(iter->sg) < I915_GTT_PAGE_SIZE); vaddr[gen8_pd_index(idx, 0)] = pte_encode | iter->dma; @@ -402,12 +401,10 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt, }
clflush_cache_range(vaddr, PAGE_SIZE); - kunmap_atomic(vaddr); - vaddr = kmap_atomic_px(i915_pt_entry(pd, gen8_pd_index(idx, 1))); + vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1))); } } while (1); clflush_cache_range(vaddr, PAGE_SIZE); - kunmap_atomic(vaddr);
return idx; } @@ -442,7 +439,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, encode |= GEN8_PDE_PS_2M; page_size = I915_GTT_PAGE_SIZE_2M;
- vaddr = kmap_atomic_px(pd); + vaddr = px_vaddr(pd); } else { struct i915_page_table *pt = i915_pt_entry(pd, __gen8_pte_index(start, 1)); @@ -457,7 +454,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, rem >= (I915_PDES - index) * I915_GTT_PAGE_SIZE)) maybe_64K = __gen8_pte_index(start, 1);
- vaddr = kmap_atomic_px(pt); + vaddr = px_vaddr(pt); }
do { @@ -491,7 +488,6 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, } while (rem >= page_size && index < I915_PDES);
clflush_cache_range(vaddr, PAGE_SIZE); - kunmap_atomic(vaddr);
/* * Is it safe to mark the 2M block as 64K? -- Either we have @@ -505,9 +501,8 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, !iter->sg && IS_ALIGNED(vma->node.start + vma->node.size, I915_GTT_PAGE_SIZE_2M)))) { - vaddr = kmap_atomic_px(pd); + vaddr = px_vaddr(pd); vaddr[maybe_64K] |= GEN8_PDE_IPS_64K; - kunmap_atomic(vaddr); page_size = I915_GTT_PAGE_SIZE_64K;
/* @@ -523,12 +518,11 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, u16 i;
encode = vma->vm->scratch[0]->encode; - vaddr = kmap_atomic_px(i915_pt_entry(pd, maybe_64K)); + vaddr = px_vaddr(i915_pt_entry(pd, maybe_64K));
for (i = 1; i < index; i += 16) memset64(vaddr + i, encode, 15);
- kunmap_atomic(vaddr); } }
@@ -602,7 +596,7 @@ static int gen8_init_scratch(struct i915_address_space *vm) if (IS_ERR(obj)) goto free_scratch;
- ret = pin_pt_dma(vm, obj); + ret = map_pt_dma(vm, obj); if (ret) { i915_gem_object_put(obj); goto free_scratch; @@ -639,7 +633,7 @@ static int gen8_preallocate_top_level_pdp(struct i915_ppgtt *ppgtt) if (IS_ERR(pde)) return PTR_ERR(pde);
- err = pin_pt_dma(vm, pde->pt.base); + err = map_pt_dma(vm, pde->pt.base); if (err) { i915_gem_object_put(pde->pt.base); free_pd(vm, pde); @@ -675,7 +669,7 @@ gen8_alloc_top_pd(struct i915_address_space *vm) goto err_pd; }
- err = pin_pt_dma(vm, pd->pt.base); + err = map_pt_dma(vm, pd->pt.base); if (err) goto err_pd;
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index 670c1271e7d5..d94628b9d89e 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -657,7 +657,7 @@ static int init_aliasing_ppgtt(struct i915_ggtt *ggtt) goto err_ppgtt;
i915_gem_object_lock(ppgtt->vm.scratch[0], NULL); - err = i915_vm_pin_pt_stash(&ppgtt->vm, &stash); + err = i915_vm_map_pt_stash(&ppgtt->vm, &stash); i915_gem_object_unlock(ppgtt->vm.scratch[0]); if (err) goto err_stash; diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c index 941f8af016d6..d386b89e2758 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.c +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c @@ -25,27 +25,25 @@ struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz) return obj; }
-int pin_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj) +int map_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj) { - int err; + void *vaddr;
- i915_gem_object_lock(obj, NULL); - err = i915_gem_object_pin_pages(obj); - i915_gem_object_unlock(obj); - if (err) - return err; + vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); + if (IS_ERR(vaddr)) + return PTR_ERR(vaddr);
i915_gem_object_make_unshrinkable(obj); return 0; }
-int pin_pt_dma_locked(struct i915_address_space *vm, struct drm_i915_gem_object *obj) +int map_pt_dma_locked(struct i915_address_space *vm, struct drm_i915_gem_object *obj) { - int err; + void *vaddr;
- err = i915_gem_object_pin_pages(obj); - if (err) - return err; + vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); + if (IS_ERR(vaddr)) + return PTR_ERR(vaddr);
i915_gem_object_make_unshrinkable(obj); return 0; @@ -155,6 +153,14 @@ void clear_pages(struct i915_vma *vma) memset(&vma->page_sizes, 0, sizeof(vma->page_sizes)); }
+void *__px_vaddr(struct drm_i915_gem_object *p) +{ + enum i915_map_type type; + + GEM_BUG_ON(!i915_gem_object_has_pages(p)); + return page_unpack_bits(p->mm.mapping, &type); +} + dma_addr_t __px_dma(struct drm_i915_gem_object *p) { GEM_BUG_ON(!i915_gem_object_has_pages(p)); @@ -170,32 +176,22 @@ struct page *__px_page(struct drm_i915_gem_object *p) void fill_page_dma(struct drm_i915_gem_object *p, const u64 val, unsigned int count) { - struct page *page = __px_page(p); - void *vaddr; + void *vaddr = __px_vaddr(p);
- vaddr = kmap(page); memset64(vaddr, val, count); clflush_cache_range(vaddr, PAGE_SIZE); - kunmap(page); }
static void poison_scratch_page(struct drm_i915_gem_object *scratch) { - struct sgt_iter sgt; - struct page *page; + void *vaddr = __px_vaddr(scratch); u8 val;
val = 0; if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) val = POISON_FREE;
- for_each_sgt_page(page, sgt, scratch->mm.pages) { - void *vaddr; - - vaddr = kmap(page); - memset(vaddr, val, PAGE_SIZE); - kunmap(page); - } + memset(vaddr, val, scratch->base.size); }
int setup_scratch_page(struct i915_address_space *vm) @@ -225,7 +221,7 @@ int setup_scratch_page(struct i915_address_space *vm) if (IS_ERR(obj)) goto skip;
- if (pin_pt_dma(vm, obj)) + if (map_pt_dma(vm, obj)) goto skip_obj;
/* We need a single contiguous page for our scratch */ diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index e67e34e17913..40e486704558 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -180,6 +180,9 @@ struct page *__px_page(struct drm_i915_gem_object *p); dma_addr_t __px_dma(struct drm_i915_gem_object *p); #define px_dma(px) (__px_dma(px_base(px)))
+void *__px_vaddr(struct drm_i915_gem_object *p); +#define px_vaddr(px) (__px_vaddr(px_base(px))) + #define px_pt(px) \ __px_choose_expr(px, struct i915_page_table *, __x, \ __px_choose_expr(px, struct i915_page_directory *, &__x->pt, \ @@ -511,8 +514,6 @@ struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt); void i915_ggtt_suspend(struct i915_ggtt *gtt); void i915_ggtt_resume(struct i915_ggtt *ggtt);
-#define kmap_atomic_px(px) kmap_atomic(__px_page(px_base(px))) - void fill_page_dma(struct drm_i915_gem_object *p, const u64 val, unsigned int count);
@@ -530,8 +531,8 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm); struct i915_page_directory *alloc_pd(struct i915_address_space *vm); struct i915_page_directory *__alloc_pd(int npde);
-int pin_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj); -int pin_pt_dma_locked(struct i915_address_space *vm, struct drm_i915_gem_object *obj); +int map_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj); +int map_pt_dma_locked(struct i915_address_space *vm, struct drm_i915_gem_object *obj);
void free_px(struct i915_address_space *vm, struct i915_page_table *pt, int lvl); @@ -578,7 +579,7 @@ void setup_private_pat(struct intel_uncore *uncore); int i915_vm_alloc_pt_stash(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, u64 size); -int i915_vm_pin_pt_stash(struct i915_address_space *vm, +int i915_vm_map_pt_stash(struct i915_address_space *vm, struct i915_vm_pt_stash *stash); void i915_vm_free_pt_stash(struct i915_address_space *vm, struct i915_vm_pt_stash *stash); diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c index 014ae8ac4480..4e3d80c2295c 100644 --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c @@ -87,11 +87,10 @@ write_dma_entry(struct drm_i915_gem_object * const pdma, const unsigned short idx, const u64 encoded_entry) { - u64 * const vaddr = kmap_atomic(__px_page(pdma)); + u64 * const vaddr = __px_vaddr(pdma);
vaddr[idx] = encoded_entry; clflush_cache_range(&vaddr[idx], sizeof(u64)); - kunmap_atomic(vaddr); }
void @@ -258,7 +257,7 @@ int i915_vm_alloc_pt_stash(struct i915_address_space *vm, return 0; }
-int i915_vm_pin_pt_stash(struct i915_address_space *vm, +int i915_vm_map_pt_stash(struct i915_address_space *vm, struct i915_vm_pt_stash *stash) { struct i915_page_table *pt; @@ -266,7 +265,7 @@ int i915_vm_pin_pt_stash(struct i915_address_space *vm,
for (n = 0; n < ARRAY_SIZE(stash->pt); n++) { for (pt = stash->pt[n]; pt; pt = pt->stash) { - err = pin_pt_dma_locked(vm, pt->base); + err = map_pt_dma_locked(vm, pt->base); if (err) return err; } diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index e24d33aecac4..c68a743fac2a 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -912,8 +912,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, if (err) goto err_fence;
- err = i915_vm_pin_pt_stash(vma->vm, - &work->stash); + err = i915_vm_map_pt_stash(vma->vm, &work->stash); if (err) goto err_fence; } diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index 2e4f06eaacc1..e060e455e9f6 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -186,7 +186,7 @@ static int igt_ppgtt_alloc(void *arg) if (err) goto err_ppgtt_cleanup;
- err = i915_vm_pin_pt_stash(&ppgtt->vm, &stash); + err = i915_vm_map_pt_stash(&ppgtt->vm, &stash); if (err) { i915_vm_free_pt_stash(&ppgtt->vm, &stash); goto err_ppgtt_cleanup; @@ -208,7 +208,7 @@ static int igt_ppgtt_alloc(void *arg) if (err) goto err_ppgtt_cleanup;
- err = i915_vm_pin_pt_stash(&ppgtt->vm, &stash); + err = i915_vm_map_pt_stash(&ppgtt->vm, &stash); if (err) { i915_vm_free_pt_stash(&ppgtt->vm, &stash); goto err_ppgtt_cleanup; @@ -325,11 +325,10 @@ static int lowlevel_hole(struct i915_address_space *vm, BIT_ULL(size))) goto alloc_vm_end;
- err = i915_vm_pin_pt_stash(vm, &stash); + err = i915_vm_map_pt_stash(vm, &stash); if (!err) vm->allocate_va_range(vm, &stash, addr, BIT_ULL(size)); - i915_vm_free_pt_stash(vm, &stash); alloc_vm_end: if (err == -EDEADLK) { @@ -1967,10 +1966,9 @@ static int igt_cs_tlb(void *arg) if (err) goto end_ww;
- err = i915_vm_pin_pt_stash(vm, &stash); + err = i915_vm_map_pt_stash(vm, &stash); if (!err) vm->allocate_va_range(vm, &stash, offset, chunk_size); - i915_vm_free_pt_stash(vm, &stash); end_ww: if (err == -EDEADLK) { diff --git a/drivers/gpu/drm/i915/selftests/i915_perf.c b/drivers/gpu/drm/i915/selftests/i915_perf.c index e9d86dab8677..bfb0290967a1 100644 --- a/drivers/gpu/drm/i915/selftests/i915_perf.c +++ b/drivers/gpu/drm/i915/selftests/i915_perf.c @@ -307,7 +307,7 @@ static int live_noa_gpr(void *arg) }
/* Poison the ce->vm so we detect writes not to the GGTT gt->scratch */ - scratch = kmap(__px_page(ce->vm->scratch[0])); + scratch = __px_vaddr(ce->vm->scratch[0]); memset(scratch, POISON_FREE, PAGE_SIZE);
rq = intel_context_create_request(ce); @@ -405,7 +405,6 @@ static int live_noa_gpr(void *arg) out_rq: i915_request_put(rq); out_ce: - kunmap(__px_page(ce->vm->scratch[0])); intel_context_put(ce); out: stream_destroy(stream);
On Mon, Apr 12, 2021 at 10:05:25AM +0100, Matthew Auld wrote:
We need to generalize our accessor for the page directories and tables from using the simple kmap_atomic to support local memory, and this setup must be done on acquisition of the backing storage prior to entering fence execution contexts. Here we replace the kmap with the object mapping code that for a simple single page shmemfs object will return a plain kmap, that is then kept for the lifetime of the page directory.
v2: (Thomas) Rebase on dma_resv and obj->mm.lock removal.
Signed-off-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Chris Wilson chris@chris-wilson.co.uk
So I wanted to understand what px stands for as an abbreviation, and dug all the way down to this:
commit 567047be2a7ede082d29f45524c287b87bd75e53 Author: Mika Kuoppala mika.kuoppala@linux.intel.com Date: Thu Jun 25 18:35:12 2015 +0300
drm/i915/gtt: Use macros to access dma mapped pages
I still have no idea what it means, I guess px = page. But I also committed this, so I guess can blame myself :-)
But while digging I've stumbled over this here
commit 6eebfe8a10a62139d681e2f1af1386252742278b Author: Chris Wilson chris@chris-wilson.co.uk Date: Fri Jul 12 08:58:18 2019 +0100
drm/i915/gtt: Use shallow dma pages for scratch
And that's some serious wtf. Yes we've done some compile-time type casting automagic between i915_priv and dev in the past, and I think even that was bad taste. But it was justified with that we have these everywhere (especially in the mmio macros), and it would be a terrible flag day.
But I'm not seeing any need for auto-casting for these pages here, and I'm not aware that we're doing this anywhere else in kernel code. There is some macro-trickery in lockdep annotations, but that relies on the lockdep map having the same struct member name in all lock types, and is not exposed to drivers at all.
Am I missing something, or why do we have this compile-time type casting stuff going on in i915 page accessors? -Daniel
.../drm/i915/gem/selftests/i915_gem_context.c | 11 +---- drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 11 ++--- drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 26 ++++------ drivers/gpu/drm/i915/gt/intel_ggtt.c | 2 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 48 +++++++++---------- drivers/gpu/drm/i915/gt/intel_gtt.h | 11 +++-- drivers/gpu/drm/i915/gt/intel_ppgtt.c | 7 ++- drivers/gpu/drm/i915/i915_vma.c | 3 +- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 10 ++-- drivers/gpu/drm/i915/selftests/i915_perf.c | 3 +- 10 files changed, 54 insertions(+), 78 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c index 5fef592390cb..ce70d0a3afb2 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c @@ -1740,7 +1740,6 @@ static int read_from_scratch(struct i915_gem_context *ctx, static int check_scratch_page(struct i915_gem_context *ctx, u32 *out) { struct i915_address_space *vm;
- struct page *page; u32 *vaddr; int err = 0;
@@ -1748,24 +1747,18 @@ static int check_scratch_page(struct i915_gem_context *ctx, u32 *out) if (!vm) return -ENODEV;
- page = __px_page(vm->scratch[0]);
- if (!page) {
- if (!vm->scratch[0]) { pr_err("No scratch page!\n"); return -EINVAL; }
- vaddr = kmap(page);
- if (!vaddr) {
pr_err("No (mappable) scratch page!\n");
return -EINVAL;
- }
vaddr = __px_vaddr(vm->scratch[0]);
memcpy(out, vaddr, sizeof(*out)); if (memchr_inv(vaddr, *out, PAGE_SIZE)) { pr_err("Inconsistent initial state of scratch page!\n"); err = -EINVAL; }
kunmap(page);
return err;
} diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c index e08dff376339..21b1085769be 100644 --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c @@ -96,9 +96,8 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm, * entries back to scratch. */
vaddr = kmap_atomic_px(pt);
memset32(vaddr + pte, scratch_pte, count);vaddr = px_vaddr(pt);
kunmap_atomic(vaddr);
pte = 0; }
@@ -120,7 +119,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
GEM_BUG_ON(!pd->entry[act_pt]);
- vaddr = kmap_atomic_px(i915_pt_entry(pd, act_pt));
- vaddr = px_vaddr(i915_pt_entry(pd, act_pt)); do { GEM_BUG_ON(sg_dma_len(iter.sg) < I915_GTT_PAGE_SIZE); vaddr[act_pte] = pte_encode | GEN6_PTE_ADDR_ENCODE(iter.dma);
@@ -136,12 +135,10 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm, }
if (++act_pte == GEN6_PTES) {
kunmap_atomic(vaddr);
vaddr = kmap_atomic_px(i915_pt_entry(pd, ++act_pt));
} } while (1);vaddr = px_vaddr(i915_pt_entry(pd, ++act_pt)); act_pte = 0;
kunmap_atomic(vaddr);
vma->page_sizes.gtt = I915_GTT_PAGE_SIZE;
} @@ -235,7 +232,7 @@ static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt) goto err_scratch0; }
- ret = pin_pt_dma(vm, vm->scratch[1]);
- ret = map_pt_dma(vm, vm->scratch[1]); if (ret) goto err_scratch1;
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index 176c19633412..f83496836f0f 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -242,11 +242,10 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm, atomic_read(&pt->used)); GEM_BUG_ON(!count || count >= atomic_read(&pt->used));
vaddr = kmap_atomic_px(pt);
vaddr = px_vaddr(pt); memset64(vaddr + gen8_pd_index(start, 0), vm->scratch[0]->encode, count);
kunmap_atomic(vaddr); atomic_sub(count, &pt->used); start += count;
@@ -375,7 +374,7 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt, gen8_pte_t *vaddr;
pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2));
- vaddr = kmap_atomic_px(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
- vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1))); do { GEM_BUG_ON(sg_dma_len(iter->sg) < I915_GTT_PAGE_SIZE); vaddr[gen8_pd_index(idx, 0)] = pte_encode | iter->dma;
@@ -402,12 +401,10 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt, }
clflush_cache_range(vaddr, PAGE_SIZE);
kunmap_atomic(vaddr);
vaddr = kmap_atomic_px(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
} } while (1); clflush_cache_range(vaddr, PAGE_SIZE);vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
kunmap_atomic(vaddr);
return idx;
} @@ -442,7 +439,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, encode |= GEN8_PDE_PS_2M; page_size = I915_GTT_PAGE_SIZE_2M;
vaddr = kmap_atomic_px(pd);
} else { struct i915_page_table *pt = i915_pt_entry(pd, __gen8_pte_index(start, 1));vaddr = px_vaddr(pd);
@@ -457,7 +454,7 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, rem >= (I915_PDES - index) * I915_GTT_PAGE_SIZE)) maybe_64K = __gen8_pte_index(start, 1);
vaddr = kmap_atomic_px(pt);
vaddr = px_vaddr(pt);
}
do {
@@ -491,7 +488,6 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, } while (rem >= page_size && index < I915_PDES);
clflush_cache_range(vaddr, PAGE_SIZE);
kunmap_atomic(vaddr);
/*
- Is it safe to mark the 2M block as 64K? -- Either we have
@@ -505,9 +501,8 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, !iter->sg && IS_ALIGNED(vma->node.start + vma->node.size, I915_GTT_PAGE_SIZE_2M)))) {
vaddr = kmap_atomic_px(pd);
vaddr = px_vaddr(pd); vaddr[maybe_64K] |= GEN8_PDE_IPS_64K;
kunmap_atomic(vaddr); page_size = I915_GTT_PAGE_SIZE_64K; /*
@@ -523,12 +518,11 @@ static void gen8_ppgtt_insert_huge(struct i915_vma *vma, u16 i;
encode = vma->vm->scratch[0]->encode;
vaddr = kmap_atomic_px(i915_pt_entry(pd, maybe_64K));
vaddr = px_vaddr(i915_pt_entry(pd, maybe_64K)); for (i = 1; i < index; i += 16) memset64(vaddr + i, encode, 15);
}kunmap_atomic(vaddr); }
@@ -602,7 +596,7 @@ static int gen8_init_scratch(struct i915_address_space *vm) if (IS_ERR(obj)) goto free_scratch;
ret = pin_pt_dma(vm, obj);
if (ret) { i915_gem_object_put(obj); goto free_scratch;ret = map_pt_dma(vm, obj);
@@ -639,7 +633,7 @@ static int gen8_preallocate_top_level_pdp(struct i915_ppgtt *ppgtt) if (IS_ERR(pde)) return PTR_ERR(pde);
err = pin_pt_dma(vm, pde->pt.base);
if (err) { i915_gem_object_put(pde->pt.base); free_pd(vm, pde);err = map_pt_dma(vm, pde->pt.base);
@@ -675,7 +669,7 @@ gen8_alloc_top_pd(struct i915_address_space *vm) goto err_pd; }
- err = pin_pt_dma(vm, pd->pt.base);
- err = map_pt_dma(vm, pd->pt.base); if (err) goto err_pd;
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index 670c1271e7d5..d94628b9d89e 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -657,7 +657,7 @@ static int init_aliasing_ppgtt(struct i915_ggtt *ggtt) goto err_ppgtt;
i915_gem_object_lock(ppgtt->vm.scratch[0], NULL);
- err = i915_vm_pin_pt_stash(&ppgtt->vm, &stash);
- err = i915_vm_map_pt_stash(&ppgtt->vm, &stash); i915_gem_object_unlock(ppgtt->vm.scratch[0]); if (err) goto err_stash;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c index 941f8af016d6..d386b89e2758 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.c +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c @@ -25,27 +25,25 @@ struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz) return obj; }
-int pin_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj) +int map_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj) {
- int err;
- void *vaddr;
- i915_gem_object_lock(obj, NULL);
- err = i915_gem_object_pin_pages(obj);
- i915_gem_object_unlock(obj);
- if (err)
return err;
vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB);
if (IS_ERR(vaddr))
return PTR_ERR(vaddr);
i915_gem_object_make_unshrinkable(obj); return 0;
}
-int pin_pt_dma_locked(struct i915_address_space *vm, struct drm_i915_gem_object *obj) +int map_pt_dma_locked(struct i915_address_space *vm, struct drm_i915_gem_object *obj) {
- int err;
- void *vaddr;
- err = i915_gem_object_pin_pages(obj);
- if (err)
return err;
vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
if (IS_ERR(vaddr))
return PTR_ERR(vaddr);
i915_gem_object_make_unshrinkable(obj); return 0;
@@ -155,6 +153,14 @@ void clear_pages(struct i915_vma *vma) memset(&vma->page_sizes, 0, sizeof(vma->page_sizes)); }
+void *__px_vaddr(struct drm_i915_gem_object *p) +{
- enum i915_map_type type;
- GEM_BUG_ON(!i915_gem_object_has_pages(p));
- return page_unpack_bits(p->mm.mapping, &type);
+}
dma_addr_t __px_dma(struct drm_i915_gem_object *p) { GEM_BUG_ON(!i915_gem_object_has_pages(p)); @@ -170,32 +176,22 @@ struct page *__px_page(struct drm_i915_gem_object *p) void fill_page_dma(struct drm_i915_gem_object *p, const u64 val, unsigned int count) {
- struct page *page = __px_page(p);
- void *vaddr;
- void *vaddr = __px_vaddr(p);
- vaddr = kmap(page); memset64(vaddr, val, count); clflush_cache_range(vaddr, PAGE_SIZE);
- kunmap(page);
}
static void poison_scratch_page(struct drm_i915_gem_object *scratch) {
- struct sgt_iter sgt;
- struct page *page;
void *vaddr = __px_vaddr(scratch); u8 val;
val = 0; if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) val = POISON_FREE;
- for_each_sgt_page(page, sgt, scratch->mm.pages) {
void *vaddr;
vaddr = kmap(page);
memset(vaddr, val, PAGE_SIZE);
kunmap(page);
- }
- memset(vaddr, val, scratch->base.size);
}
int setup_scratch_page(struct i915_address_space *vm) @@ -225,7 +221,7 @@ int setup_scratch_page(struct i915_address_space *vm) if (IS_ERR(obj)) goto skip;
if (pin_pt_dma(vm, obj))
if (map_pt_dma(vm, obj)) goto skip_obj;
/* We need a single contiguous page for our scratch */
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index e67e34e17913..40e486704558 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -180,6 +180,9 @@ struct page *__px_page(struct drm_i915_gem_object *p); dma_addr_t __px_dma(struct drm_i915_gem_object *p); #define px_dma(px) (__px_dma(px_base(px)))
+void *__px_vaddr(struct drm_i915_gem_object *p); +#define px_vaddr(px) (__px_vaddr(px_base(px)))
#define px_pt(px) \ __px_choose_expr(px, struct i915_page_table *, __x, \ __px_choose_expr(px, struct i915_page_directory *, &__x->pt, \ @@ -511,8 +514,6 @@ struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt); void i915_ggtt_suspend(struct i915_ggtt *gtt); void i915_ggtt_resume(struct i915_ggtt *ggtt);
-#define kmap_atomic_px(px) kmap_atomic(__px_page(px_base(px)))
void fill_page_dma(struct drm_i915_gem_object *p, const u64 val, unsigned int count);
@@ -530,8 +531,8 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm); struct i915_page_directory *alloc_pd(struct i915_address_space *vm); struct i915_page_directory *__alloc_pd(int npde);
-int pin_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj); -int pin_pt_dma_locked(struct i915_address_space *vm, struct drm_i915_gem_object *obj); +int map_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj); +int map_pt_dma_locked(struct i915_address_space *vm, struct drm_i915_gem_object *obj);
void free_px(struct i915_address_space *vm, struct i915_page_table *pt, int lvl); @@ -578,7 +579,7 @@ void setup_private_pat(struct intel_uncore *uncore); int i915_vm_alloc_pt_stash(struct i915_address_space *vm, struct i915_vm_pt_stash *stash, u64 size); -int i915_vm_pin_pt_stash(struct i915_address_space *vm, +int i915_vm_map_pt_stash(struct i915_address_space *vm, struct i915_vm_pt_stash *stash); void i915_vm_free_pt_stash(struct i915_address_space *vm, struct i915_vm_pt_stash *stash); diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c index 014ae8ac4480..4e3d80c2295c 100644 --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c @@ -87,11 +87,10 @@ write_dma_entry(struct drm_i915_gem_object * const pdma, const unsigned short idx, const u64 encoded_entry) {
- u64 * const vaddr = kmap_atomic(__px_page(pdma));
u64 * const vaddr = __px_vaddr(pdma);
vaddr[idx] = encoded_entry; clflush_cache_range(&vaddr[idx], sizeof(u64));
- kunmap_atomic(vaddr);
}
void @@ -258,7 +257,7 @@ int i915_vm_alloc_pt_stash(struct i915_address_space *vm, return 0; }
-int i915_vm_pin_pt_stash(struct i915_address_space *vm, +int i915_vm_map_pt_stash(struct i915_address_space *vm, struct i915_vm_pt_stash *stash) { struct i915_page_table *pt; @@ -266,7 +265,7 @@ int i915_vm_pin_pt_stash(struct i915_address_space *vm,
for (n = 0; n < ARRAY_SIZE(stash->pt); n++) { for (pt = stash->pt[n]; pt; pt = pt->stash) {
err = pin_pt_dma_locked(vm, pt->base);
}err = map_pt_dma_locked(vm, pt->base); if (err) return err;
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index e24d33aecac4..c68a743fac2a 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -912,8 +912,7 @@ int i915_vma_pin_ww(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, if (err) goto err_fence;
err = i915_vm_pin_pt_stash(vma->vm,
&work->stash);
}err = i915_vm_map_pt_stash(vma->vm, &work->stash); if (err) goto err_fence;
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index 2e4f06eaacc1..e060e455e9f6 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -186,7 +186,7 @@ static int igt_ppgtt_alloc(void *arg) if (err) goto err_ppgtt_cleanup;
err = i915_vm_pin_pt_stash(&ppgtt->vm, &stash);
if (err) { i915_vm_free_pt_stash(&ppgtt->vm, &stash); goto err_ppgtt_cleanup;err = i915_vm_map_pt_stash(&ppgtt->vm, &stash);
@@ -208,7 +208,7 @@ static int igt_ppgtt_alloc(void *arg) if (err) goto err_ppgtt_cleanup;
err = i915_vm_pin_pt_stash(&ppgtt->vm, &stash);
if (err) { i915_vm_free_pt_stash(&ppgtt->vm, &stash); goto err_ppgtt_cleanup;err = i915_vm_map_pt_stash(&ppgtt->vm, &stash);
@@ -325,11 +325,10 @@ static int lowlevel_hole(struct i915_address_space *vm, BIT_ULL(size))) goto alloc_vm_end;
err = i915_vm_pin_pt_stash(vm, &stash);
err = i915_vm_map_pt_stash(vm, &stash); if (!err) vm->allocate_va_range(vm, &stash, addr, BIT_ULL(size));
i915_vm_free_pt_stash(vm, &stash);
alloc_vm_end: if (err == -EDEADLK) { @@ -1967,10 +1966,9 @@ static int igt_cs_tlb(void *arg) if (err) goto end_ww;
err = i915_vm_pin_pt_stash(vm, &stash);
err = i915_vm_map_pt_stash(vm, &stash); if (!err) vm->allocate_va_range(vm, &stash, offset, chunk_size);
i915_vm_free_pt_stash(vm, &stash);
end_ww: if (err == -EDEADLK) { diff --git a/drivers/gpu/drm/i915/selftests/i915_perf.c b/drivers/gpu/drm/i915/selftests/i915_perf.c index e9d86dab8677..bfb0290967a1 100644 --- a/drivers/gpu/drm/i915/selftests/i915_perf.c +++ b/drivers/gpu/drm/i915/selftests/i915_perf.c @@ -307,7 +307,7 @@ static int live_noa_gpr(void *arg) }
/* Poison the ce->vm so we detect writes not to the GGTT gt->scratch */
- scratch = kmap(__px_page(ce->vm->scratch[0]));
scratch = __px_vaddr(ce->vm->scratch[0]); memset(scratch, POISON_FREE, PAGE_SIZE);
rq = intel_context_create_request(ce);
@@ -405,7 +405,6 @@ static int live_noa_gpr(void *arg) out_rq: i915_request_put(rq); out_ce:
- kunmap(__px_page(ce->vm->scratch[0])); intel_context_put(ce);
out: stream_destroy(stream); -- 2.26.3
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Mon, 12 Apr 2021, Daniel Vetter daniel@ffwll.ch wrote:
And that's some serious wtf. Yes we've done some compile-time type casting automagic between i915_priv and dev in the past, and I think even that was bad taste. But it was justified with that we have these everywhere (especially in the mmio macros), and it would be a terrible flag day.
FWIW, we had the dev_priv/dev macro trickery for a while to not have that flag day conversion, until everything used i915 or &i915->drm. But we got rid of it afterwards.
BR, Jani.
On Mon, Apr 12, 2021 at 07:01:19PM +0300, Jani Nikula wrote:
On Mon, 12 Apr 2021, Daniel Vetter daniel@ffwll.ch wrote:
And that's some serious wtf. Yes we've done some compile-time type casting automagic between i915_priv and dev in the past, and I think even that was bad taste. But it was justified with that we have these everywhere (especially in the mmio macros), and it would be a terrible flag day.
FWIW, we had the dev_priv/dev macro trickery for a while to not have that flag day conversion, until everything used i915 or &i915->drm. But we got rid of it afterwards.
Yay, and yes that was the plan to avoid the flag day. And not as a great coding pattern that everyone should imitate ... -Daniel
On Mon, 12 Apr 2021 at 16:17, Daniel Vetter daniel@ffwll.ch wrote:
On Mon, Apr 12, 2021 at 10:05:25AM +0100, Matthew Auld wrote:
We need to generalize our accessor for the page directories and tables from using the simple kmap_atomic to support local memory, and this setup must be done on acquisition of the backing storage prior to entering fence execution contexts. Here we replace the kmap with the object mapping code that for a simple single-page shmemfs object will return a plain kmap, which is then kept for the lifetime of the page directory.
v2: (Thomas) Rebase on dma_resv and obj->mm.lock removal.
Signed-off-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Chris Wilson chris@chris-wilson.co.uk
So I wanted to understand what px stands for as an abbreviation, and dug all the way down to this:
commit 567047be2a7ede082d29f45524c287b87bd75e53 Author: Mika Kuoppala mika.kuoppala@linux.intel.com Date: Thu Jun 25 18:35:12 2015 +0300
drm/i915/gtt: Use macros to access dma mapped pages
I still have no idea what it means, I guess px = page. But I also committed this, so I guess can blame myself :-)
But while digging I've stumbled over this here
commit 6eebfe8a10a62139d681e2f1af1386252742278b Author: Chris Wilson chris@chris-wilson.co.uk Date: Fri Jul 12 08:58:18 2019 +0100
drm/i915/gtt: Use shallow dma pages for scratch
And that's some serious wtf. Yes we've done some compile-time type casting automagic between i915_priv and dev in the past, and I think even that was bad taste. But it was justified with that we have these everywhere (especially in the mmio macros), and it would be a terrible flag day.
But I'm not seeing any need for auto-casting for these pages here, and I'm not aware that we're doing this anywhere else in kernel code. There is some macro-trickery in lockdep annotations, but that relies on the lockdep map having the same struct member name in all lock types, and is not exposed to drivers at all.
Am I missing something, or why do we have this compile-time type casting stuff going on in i915 page accessors?
I think 'x' in the px family of macros/functions is meant in the variable/polymorphic sense, so it can potentially be a pt, pd, etc underneath. If you look at px_base() for example all it does is fish out the base GEM object from the structure, using the known-at-compile-time-type, which then lets us get at the dma address, vaddr etc.
It does seem pretty magical, but seems ok to me, if it means less typing?
On Mon, Apr 12, 2021 at 6:08 PM Matthew Auld matthew.william.auld@gmail.com wrote:
On Mon, 12 Apr 2021 at 16:17, Daniel Vetter daniel@ffwll.ch wrote:
On Mon, Apr 12, 2021 at 10:05:25AM +0100, Matthew Auld wrote:
We need to generalize our accessor for the page directories and tables from using the simple kmap_atomic to support local memory, and this setup must be done on acquisition of the backing storage prior to entering fence execution contexts. Here we replace the kmap with the object mapping code that for a simple single-page shmemfs object will return a plain kmap, which is then kept for the lifetime of the page directory.
v2: (Thomas) Rebase on dma_resv and obj->mm.lock removal.
Signed-off-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Chris Wilson chris@chris-wilson.co.uk
So I wanted to understand what px stands for as an abbreviation, and dug all the way down to this:
commit 567047be2a7ede082d29f45524c287b87bd75e53 Author: Mika Kuoppala mika.kuoppala@linux.intel.com Date: Thu Jun 25 18:35:12 2015 +0300
drm/i915/gtt: Use macros to access dma mapped pages
I still have no idea what it means, I guess px = page. But I also committed this, so I guess can blame myself :-)
But while digging I've stumbled over this here
commit 6eebfe8a10a62139d681e2f1af1386252742278b Author: Chris Wilson chris@chris-wilson.co.uk Date: Fri Jul 12 08:58:18 2019 +0100
drm/i915/gtt: Use shallow dma pages for scratch
And that's some serious wtf. Yes we've done some compile-time type casting automagic between i915_priv and dev in the past, and I think even that was bad taste. But it was justified with that we have these everywhere (especially in the mmio macros), and it would be a terrible flag day.
But I'm not seeing any need for auto-casting for these pages here, and I'm not aware that we're doing this anywhere else in kernel code. There is some macro-trickery in lockdep annotations, but that relies on the lockdep map having the same struct member name in all lock types, and is not exposed to drivers at all.
Am I missing something, or why do we have this compile-time type casting stuff going on in i915 page accessors?
I think 'x' in the px family of macros/functions is meant in the variable/polymorphic sense, so it can potentially be a pt, pd, etc underneath. If you look at px_base() for example all it does is fish out the base GEM object from the structure, using the known-at-compile-time-type, which then lets us get at the dma address, vaddr etc.
Yeah, but that's not how things landed. px predates the magic polymorphism. I think the px just stands for page, or at least originally only stood for page. I'm not sure honestly. It seems to be just used for page directory type of things, but I haven't found that written down anywhere.
It does seem pretty magical, but seems ok to me, if it means less typing?
That's the worst justification. Code is generally write once, read many times. Optimizing for writing at the cost of magic indirection is generally not the right tradeoff in the kernel, where any indirection could hide a major gotcha. In huge userspace applications fancy abstraction and polymorphism is often the right thing to do, but there you also have a real compiler with a real typesystem (generally at least) helping you out. Or it's yolo duct-taping with lots of tests, where the speed at which you can hack up something matters more than being able to read it quickly.
We're typing C here. It is generally rather verbose, with type casting all done explicitly. -Daniel
On Mon, 12 Apr 2021 at 18:01, Daniel Vetter daniel@ffwll.ch wrote:
On Mon, Apr 12, 2021 at 6:08 PM Matthew Auld matthew.william.auld@gmail.com wrote:
On Mon, 12 Apr 2021 at 16:17, Daniel Vetter daniel@ffwll.ch wrote:
On Mon, Apr 12, 2021 at 10:05:25AM +0100, Matthew Auld wrote:
We need to generalize our accessor for the page directories and tables from using the simple kmap_atomic to support local memory, and this setup must be done on acquisition of the backing storage prior to entering fence execution contexts. Here we replace the kmap with the object mapping code that for a simple single-page shmemfs object will return a plain kmap, which is then kept for the lifetime of the page directory.
v2: (Thomas) Rebase on dma_resv and obj->mm.lock removal.
Signed-off-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Chris Wilson chris@chris-wilson.co.uk
So I wanted to understand what px stands for as an abbreviation, and dug all the way down to this:
commit 567047be2a7ede082d29f45524c287b87bd75e53 Author: Mika Kuoppala mika.kuoppala@linux.intel.com Date: Thu Jun 25 18:35:12 2015 +0300
drm/i915/gtt: Use macros to access dma mapped pages
I still have no idea what it means, I guess px = page. But I also committed this, so I guess can blame myself :-)
But while digging I've stumbled over this here
commit 6eebfe8a10a62139d681e2f1af1386252742278b Author: Chris Wilson chris@chris-wilson.co.uk Date: Fri Jul 12 08:58:18 2019 +0100
drm/i915/gtt: Use shallow dma pages for scratch
And that's some serious wtf. Yes we've done some compile-time type casting automagic between i915_priv and dev in the past, and I think even that was bad taste. But it was justified with that we have these everywhere (especially in the mmio macros), and it would be a terrible flag day.
But I'm not seeing any need for auto-casting for these pages here, and I'm not aware that we're doing this anywhere else in kernel code. There is some macro-trickery in lockdep annotations, but that relies on the lockdep map having the same struct member name in all lock types, and is not exposed to drivers at all.
Am I missing something, or why do we have this compile-time type casting stuff going on in i915 page accessors?
I think 'x' in the px family of macros/functions is meant in the variable/polymorphic sense, so it can potentially be a pt, pd, etc underneath. If you look at px_base() for example all it does is fish out the base GEM object from the structure, using the known-at-compile-time-type, which then lets us get at the dma address, vaddr etc.
Yeah, but that's not how things landed. px predates the magic polymorphism. I think the px just stands for page, or at least originally only stood for page. I'm not sure honestly. It seems to be just used for page directory type of things, but I haven't found that written down anywhere.
It does seem pretty magical, but seems ok to me, if it means less typing?
That's the worst justification. Code is generally write once, read many times. Optimizing for writing at the cost of magic indirection is generally not the right tradeoff in the kernel, where any indirection could hide a major gotcha. In huge userspace applications fancy abstraction and polymorphism is often the right thing to do, but there you also have a real compiler with a real typesystem (generally at least) helping you out. Or it's yolo duct-taping with lots of tests, where the speed at which you can hack up something matters more than being able to read it quickly.
We're typing C here. It is generally rather verbose, with type casting all done explicitly.
Ok. So should we change this around for this patch? The px_ stuff is already quite prevalent it seems, and the px_vaddr() is just one part of it? Maybe just add pt_vaddr(), pd_vaddr() etc instead?
-Daniel
Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
On Tue, Apr 13, 2021 at 11:29 AM Matthew Auld matthew.william.auld@gmail.com wrote:
On Mon, 12 Apr 2021 at 18:01, Daniel Vetter daniel@ffwll.ch wrote:
On Mon, Apr 12, 2021 at 6:08 PM Matthew Auld matthew.william.auld@gmail.com wrote:
On Mon, 12 Apr 2021 at 16:17, Daniel Vetter daniel@ffwll.ch wrote:
On Mon, Apr 12, 2021 at 10:05:25AM +0100, Matthew Auld wrote:
We need to generalize our accessor for the page directories and tables from using the simple kmap_atomic to support local memory, and this setup must be done on acquisition of the backing storage prior to entering fence execution contexts. Here we replace the kmap with the object mapping code that for a simple single-page shmemfs object will return a plain kmap, which is then kept for the lifetime of the page directory.
v2: (Thomas) Rebase on dma_resv and obj->mm.lock removal.
Signed-off-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Chris Wilson chris@chris-wilson.co.uk
So I wanted to understand what px stands for as an abbreviation, and dug all the way down to this:
commit 567047be2a7ede082d29f45524c287b87bd75e53 Author: Mika Kuoppala mika.kuoppala@linux.intel.com Date: Thu Jun 25 18:35:12 2015 +0300
drm/i915/gtt: Use macros to access dma mapped pages
I still have no idea what it means, I guess px = page. But I also committed this, so I guess can blame myself :-)
But while digging I've stumbled over this here
commit 6eebfe8a10a62139d681e2f1af1386252742278b Author: Chris Wilson chris@chris-wilson.co.uk Date: Fri Jul 12 08:58:18 2019 +0100
drm/i915/gtt: Use shallow dma pages for scratch
And that's some serious wtf. Yes we've done some compile-time type casting automagic between i915_priv and dev in the past, and I think even that was bad taste. But it was justified with that we have these everywhere (especially in the mmio macros), and it would be a terrible flag day.
But I'm not seeing any need for auto-casting for these pages here, and I'm not aware that we're doing this anywhere else in kernel code. There is some macro-trickery in lockdep annotations, but that relies on the lockdep map having the same struct member name in all lock types, and is not exposed to drivers at all.
Am I missing something, or why do we have this compile-time type casting stuff going on in i915 page accessors?
I think 'x' in the px family of macros/functions is meant in the variable/polymorphic sense, so it can potentially be a pt, pd, etc underneath. If you look at px_base() for example all it does is fish out the base GEM object from the structure, using the known-at-compile-time-type, which then lets us get at the dma address, vaddr etc.
Yeah, but that's not how things landed. px predates the magic polymorphism. I think the px just stands for page, or at least originally only stood for page. I'm not sure honestly. It seems to be just used for page directory type of things, but I haven't found that written down anywhere.
It does seem pretty magical, but seems ok to me, if it means less typing?
That's the worst justification. Code is generally write once, read many times. Optimizing for writing at the cost of magic indirection is generally not the right tradeoff in the kernel, where any indirection could hide a major gotcha. In huge userspace applications fancy abstraction and polymorphism is often the right thing to do, but there you also have a real compiler with a real typesystem (generally at least) helping you out. Or it's yolo duct-taping with lots of tests, where the speed at which you can hack up something matters more than being able to read it quickly.
We're typing C here. It is generally rather verbose, with type casting all done explicitly.
Ok. So should we change this around for this patch? The px_ stuff is already quite prevalent it seems, and the px_vaddr() is just one part of it? Maybe just add pt_vaddr(), pd_vaddr() etc instead?
Nah, that was just an orthogonal observation. The confusion with magic type-aware macros is preexisting and widespread, there's no point holding up dg1 code with that. But it is maybe something we should put on our cleanup list. Or at least have a better explanation for why exactly it is needed. Also note I'm not worried about the px stuff standing for pt/pd/whatever, it's the magic type casting property of these macros added with the 2nd patch I've mentioned above that looks rather questionable to me. Maybe as transition thing like we've done with i915_priv pointers, but not something that we should build on top for long term. -Daniel
It's a requirement that for dgfx we place all the paging structures in device local-memory.
Signed-off-by: Matthew Auld matthew.auld@intel.com --- drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 5 ++++- drivers/gpu/drm/i915/gt/intel_gtt.c | 27 +++++++++++++++++++++++++-- drivers/gpu/drm/i915/gt/intel_gtt.h | 1 + 3 files changed, 30 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index f83496836f0f..11fb5df45a0f 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -712,7 +712,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt) */ ppgtt->vm.has_read_only = !IS_GEN_RANGE(gt->i915, 11, 12);
- ppgtt->vm.alloc_pt_dma = alloc_pt_dma; + if (HAS_LMEM(gt->i915)) + ppgtt->vm.alloc_pt_dma = alloc_pt_lmem; + else + ppgtt->vm.alloc_pt_dma = alloc_pt_dma;
err = gen8_init_scratch(&ppgtt->vm); if (err) diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c index d386b89e2758..1eeeab45445c 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.c +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c @@ -7,10 +7,23 @@
#include <linux/fault-inject.h>
+#include "gem/i915_gem_lmem.h" #include "i915_trace.h" #include "intel_gt.h" #include "intel_gtt.h"
+struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz) +{ + struct drm_i915_gem_object *obj; + + obj = i915_gem_object_create_lmem(vm->i915, sz, 0); + + /* ensure all dma objects have the same reservation class */ + if (!IS_ERR(obj)) + obj->base.resv = &vm->resv; + return obj; +} + struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz) { struct drm_i915_gem_object *obj; @@ -27,9 +40,14 @@ struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz)
int map_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj) { + enum i915_map_type type; void *vaddr;
- vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB); + type = I915_MAP_WB; + if (i915_gem_object_is_lmem(obj)) + type = I915_MAP_WC; + + vaddr = i915_gem_object_pin_map_unlocked(obj, type); if (IS_ERR(vaddr)) return PTR_ERR(vaddr);
@@ -39,9 +57,14 @@ int map_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj)
int map_pt_dma_locked(struct i915_address_space *vm, struct drm_i915_gem_object *obj) { + enum i915_map_type type; void *vaddr;
- vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB); + type = I915_MAP_WB; + if (i915_gem_object_is_lmem(obj)) + type = I915_MAP_WC; + + vaddr = i915_gem_object_pin_map(obj, type); if (IS_ERR(vaddr)) return PTR_ERR(vaddr);
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index 40e486704558..44ce27c51631 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -527,6 +527,7 @@ int setup_scratch_page(struct i915_address_space *vm); void free_scratch(struct i915_address_space *vm);
struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz); +struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz); struct i915_page_table *alloc_pt(struct i915_address_space *vm); struct i915_page_directory *alloc_pd(struct i915_address_space *vm); struct i915_page_directory *__alloc_pd(int npde);
On 12/04/2021 10:05, Matthew Auld wrote:
It's a requirement that for dgfx we place all the paging structures in device local-memory.
Signed-off-by: Matthew Auld matthew.auld@intel.com
drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 5 ++++- drivers/gpu/drm/i915/gt/intel_gtt.c | 27 +++++++++++++++++++++++++-- drivers/gpu/drm/i915/gt/intel_gtt.h | 1 + 3 files changed, 30 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index f83496836f0f..11fb5df45a0f 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -712,7 +712,10 @@ struct i915_ppgtt *gen8_ppgtt_create(struct intel_gt *gt) */ ppgtt->vm.has_read_only = !IS_GEN_RANGE(gt->i915, 11, 12);
- ppgtt->vm.alloc_pt_dma = alloc_pt_dma;
if (HAS_LMEM(gt->i915))
ppgtt->vm.alloc_pt_dma = alloc_pt_lmem;
else
ppgtt->vm.alloc_pt_dma = alloc_pt_dma;
err = gen8_init_scratch(&ppgtt->vm); if (err)
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c index d386b89e2758..1eeeab45445c 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.c +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c @@ -7,10 +7,23 @@
#include <linux/fault-inject.h>
+#include "gem/i915_gem_lmem.h" #include "i915_trace.h" #include "intel_gt.h" #include "intel_gtt.h"
+struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz) +{
- struct drm_i915_gem_object *obj;
- obj = i915_gem_object_create_lmem(vm->i915, sz, 0);
- /* ensure all dma objects have the same reservation class */
- if (!IS_ERR(obj))
obj->base.resv = &vm->resv;
- return obj;
+}
- struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz) { struct drm_i915_gem_object *obj;
@@ -27,9 +40,14 @@ struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz)
int map_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj) {
- enum i915_map_type type; void *vaddr;
- vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB);
- type = I915_MAP_WB;
- if (i915_gem_object_is_lmem(obj))
type = I915_MAP_WC;
Not trusting the "always coherent" helper from earlier in the series?
Regards,
Tvrtko
- vaddr = i915_gem_object_pin_map_unlocked(obj, type); if (IS_ERR(vaddr)) return PTR_ERR(vaddr);
@@ -39,9 +57,14 @@ int map_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj)
int map_pt_dma_locked(struct i915_address_space *vm, struct drm_i915_gem_object *obj) {
- enum i915_map_type type; void *vaddr;
- vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
- type = I915_MAP_WB;
- if (i915_gem_object_is_lmem(obj))
type = I915_MAP_WC;
- vaddr = i915_gem_object_pin_map(obj, type); if (IS_ERR(vaddr)) return PTR_ERR(vaddr);
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index 40e486704558..44ce27c51631 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -527,6 +527,7 @@ int setup_scratch_page(struct i915_address_space *vm); void free_scratch(struct i915_address_space *vm);
struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz); +struct drm_i915_gem_object *alloc_pt_lmem(struct i915_address_space *vm, int sz); struct i915_page_table *alloc_pt(struct i915_address_space *vm); struct i915_page_directory *alloc_pd(struct i915_address_space *vm); struct i915_page_directory *__alloc_pd(int npde);
dri-devel@lists.freedesktop.org