Our uncore MMIO functions for reading/writing registers have become very complicated over time. There's significant macro magic used to generate several nearly-identical functions that only really differ in terms of which platform-specific shadow register table they should check on write operations. We can significantly simplify our MMIO handlers by storing a reference to the current platform's shadow table within the 'struct intel_uncore' the same way we already do for forcewake; this allows us to consolidate the multiple variants of each 'write' function down to just a single 'fwtable' version that gets the shadow table out of the uncore struct rather than hardcoding the name of a specific platform's table. We can do similar consolidation on the MMIO read side by creating a single-entry forcewake table to replace the open-coded range check they had been using previously.
The final patch of the series adds a new shadow table for DG2; this becomes quite clean and simple now, given the refactoring in the first five patches.
Matt Roper (6): drm/i915/uncore: Convert gen6/gen7 read operations to fwtable drm/i915/uncore: Associate shadow table with uncore drm/i915/uncore: Replace gen8 write functions with general fwtable drm/i915/uncore: Drop gen11/gen12 mmio write handlers drm/i915/uncore: Drop gen11 mmio read handlers drm/i915/dg2: Add DG2-specific shadow register table
drivers/gpu/drm/i915/intel_uncore.c | 190 ++++++++++-------- drivers/gpu/drm/i915/intel_uncore.h | 7 + drivers/gpu/drm/i915/selftests/intel_uncore.c | 1 + 3 files changed, 110 insertions(+), 88 deletions(-)
On gen6-gen8 (except vlv/chv) we don't use a forcewake lookup table; we simply check whether the register offset is < 0x40000, and return FORCEWAKE_RENDER if it is. To prepare for upcoming refactoring, let's define a single-entry forcewake table from [0x0, 0x3ffff] and switch these platforms over to use the fwtable reader functions.
Signed-off-by: Matt Roper matthew.d.roper@intel.com --- drivers/gpu/drm/i915/intel_uncore.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index f9767054dbdf..7f92f12d95f2 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -1064,6 +1064,10 @@ gen6_reg_write_fw_domains(struct intel_uncore *uncore, i915_reg_t reg) __fwd; \ })
+static const struct intel_forcewake_range __gen6_fw_ranges[] = { + GEN_FW_RANGE(0x0, 0x3ffff, FORCEWAKE_RENDER), +}; + /* *Must* be sorted by offset ranges! See intel_fw_table_check(). */ static const struct intel_forcewake_range __chv_fw_ranges[] = { GEN_FW_RANGE(0x2000, 0x3fff, FORCEWAKE_RENDER), @@ -1623,7 +1627,6 @@ __gen_read(func, 64)
__gen_reg_read_funcs(gen11_fwtable); __gen_reg_read_funcs(fwtable); -__gen_reg_read_funcs(gen6);
#undef __gen_reg_read_funcs #undef GEN6_READ_FOOTER @@ -2111,15 +2114,17 @@ static int uncore_forcewake_init(struct intel_uncore *uncore) ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable); ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable); } else if (GRAPHICS_VER(i915) == 8) { + ASSIGN_FW_DOMAINS_TABLE(uncore, __gen6_fw_ranges); ASSIGN_WRITE_MMIO_VFUNCS(uncore, gen8); - ASSIGN_READ_MMIO_VFUNCS(uncore, gen6); + ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable); } else if (IS_VALLEYVIEW(i915)) { ASSIGN_FW_DOMAINS_TABLE(uncore, __vlv_fw_ranges); ASSIGN_WRITE_MMIO_VFUNCS(uncore, gen6); ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable); } else if (IS_GRAPHICS_VER(i915, 6, 7)) { + ASSIGN_FW_DOMAINS_TABLE(uncore, __gen6_fw_ranges); ASSIGN_WRITE_MMIO_VFUNCS(uncore, gen6); - ASSIGN_READ_MMIO_VFUNCS(uncore, gen6); + ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable); }
uncore->pmic_bus_access_nb.notifier_call = i915_pmic_bus_access_notifier;
On 10/09/2021 06:33, Matt Roper wrote:
On gen6-gen8 (except vlv/chv) we don't use a forcewake lookup table; we simply check whether the register offset is < 0x40000, and return FORCEWAKE_RENDER if it is. To prepare for upcoming refactoring, let's define a single-entry forcewake table from [0x0, 0x3ffff] and switch these platforms over to use the fwtable reader functions.
Signed-off-by: Matt Roper matthew.d.roper@intel.com
drivers/gpu/drm/i915/intel_uncore.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index f9767054dbdf..7f92f12d95f2 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -1064,6 +1064,10 @@ gen6_reg_write_fw_domains(struct intel_uncore *uncore, i915_reg_t reg) __fwd; \ })
Is __gen6_reg_read_fw_domains left orphaned somewhere around here or in a later patch?
Regards,
Tvrtko
+static const struct intel_forcewake_range __gen6_fw_ranges[] = {
- GEN_FW_RANGE(0x0, 0x3ffff, FORCEWAKE_RENDER),
+};
- /* *Must* be sorted by offset ranges! See intel_fw_table_check(). */ static const struct intel_forcewake_range __chv_fw_ranges[] = { GEN_FW_RANGE(0x2000, 0x3fff, FORCEWAKE_RENDER),
@@ -1623,7 +1627,6 @@ __gen_read(func, 64)
__gen_reg_read_funcs(gen11_fwtable); __gen_reg_read_funcs(fwtable); -__gen_reg_read_funcs(gen6);
#undef __gen_reg_read_funcs #undef GEN6_READ_FOOTER @@ -2111,15 +2114,17 @@ static int uncore_forcewake_init(struct intel_uncore *uncore) ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable); ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable); } else if (GRAPHICS_VER(i915) == 8) {
ASSIGN_WRITE_MMIO_VFUNCS(uncore, gen8);ASSIGN_FW_DOMAINS_TABLE(uncore, __gen6_fw_ranges);
ASSIGN_READ_MMIO_VFUNCS(uncore, gen6);
} else if (IS_VALLEYVIEW(i915)) { ASSIGN_FW_DOMAINS_TABLE(uncore, __vlv_fw_ranges); ASSIGN_WRITE_MMIO_VFUNCS(uncore, gen6); ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable); } else if (IS_GRAPHICS_VER(i915, 6, 7)) {ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable);
ASSIGN_WRITE_MMIO_VFUNCS(uncore, gen6);ASSIGN_FW_DOMAINS_TABLE(uncore, __gen6_fw_ranges);
ASSIGN_READ_MMIO_VFUNCS(uncore, gen6);
ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable);
}
uncore->pmic_bus_access_nb.notifier_call = i915_pmic_bus_access_notifier;
Store a reference to a platform's shadow table inside the uncore, the same as we do with the forcewake table. This will allow us to use a single set of functions that operate on the shadow table reference rather than generating lots of nearly-identical functions via macros that differ only in terms of the table that they reference.
Signed-off-by: Matt Roper matthew.d.roper@intel.com --- drivers/gpu/drm/i915/intel_uncore.c | 40 ++++++++++++++++++++--------- drivers/gpu/drm/i915/intel_uncore.h | 7 +++++ 2 files changed, 35 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index 7f92f12d95f2..159880e3e77d 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -1036,17 +1036,19 @@ static int mmio_range_cmp(u32 key, const struct i915_range *range) return 0; }
-#define __is_X_shadowed(x) \ -static bool is_##x##_shadowed(u32 offset) \ -{ \ - const struct i915_range *regs = x##_shadowed_regs; \ - return BSEARCH(offset, regs, ARRAY_SIZE(x##_shadowed_regs), \ +static bool +is_shadowed(struct intel_uncore *uncore, u32 offset) +{ + if (drm_WARN_ON(&uncore->i915->drm, !uncore->shadowed_reg_table)) + return false; + + return BSEARCH(offset, + uncore->shadowed_reg_table, + uncore->shadowed_reg_table_entries, mmio_range_cmp); \ }
-__is_X_shadowed(gen8) -__is_X_shadowed(gen11) -__is_X_shadowed(gen12) +
static enum forcewake_domains gen6_reg_write_fw_domains(struct intel_uncore *uncore, i915_reg_t reg) @@ -1057,7 +1059,7 @@ gen6_reg_write_fw_domains(struct intel_uncore *uncore, i915_reg_t reg) #define __gen8_reg_write_fw_domains(uncore, offset) \ ({ \ enum forcewake_domains __fwd; \ - if (NEEDS_FORCE_WAKE(offset) && !is_gen8_shadowed(offset)) \ + if (NEEDS_FORCE_WAKE(offset) && !is_shadowed(uncore, offset)) \ __fwd = FORCEWAKE_RENDER; \ else \ __fwd = 0; \ @@ -1091,7 +1093,7 @@ static const struct intel_forcewake_range __chv_fw_ranges[] = { #define __fwtable_reg_write_fw_domains(uncore, offset) \ ({ \ enum forcewake_domains __fwd = 0; \ - if (NEEDS_FORCE_WAKE((offset)) && !is_gen8_shadowed(offset)) \ + if (NEEDS_FORCE_WAKE((offset)) && !is_shadowed(uncore, offset)) \ __fwd = find_fw_domain(uncore, offset); \ __fwd; \ }) @@ -1100,7 +1102,7 @@ static const struct intel_forcewake_range __chv_fw_ranges[] = { ({ \ enum forcewake_domains __fwd = 0; \ const u32 __offset = (offset); \ - if (!is_gen11_shadowed(__offset)) \ + if (!is_shadowed(uncore, __offset)) \ __fwd = find_fw_domain(uncore, __offset); \ __fwd; \ }) @@ -1109,7 +1111,7 @@ static const struct intel_forcewake_range __chv_fw_ranges[] = { ({ \ enum forcewake_domains __fwd = 0; \ const u32 __offset = (offset); \ - if (!is_gen12_shadowed(__offset)) \ + if (!is_shadowed(uncore, __offset)) \ __fwd = find_fw_domain(uncore, __offset); \ __fwd; \ }) @@ -1715,6 +1717,7 @@ __gen_write(func, 8) \ __gen_write(func, 16) \ __gen_write(func, 32)
+ __gen_reg_write_funcs(gen12_fwtable); __gen_reg_write_funcs(gen11_fwtable); __gen_reg_write_funcs(fwtable); @@ -1979,6 +1982,12 @@ static int intel_uncore_fw_domains_init(struct intel_uncore *uncore) (uncore)->fw_domains_table_entries = ARRAY_SIZE((d)); \ }
+#define ASSIGN_SHADOW_TABLE(uncore, d) \ +{ \ + (uncore)->shadowed_reg_table = d; \ + (uncore)->shadowed_reg_table_entries = ARRAY_SIZE((d)); \ +} + static int i915_pmic_bus_access_notifier(struct notifier_block *nb, unsigned long action, void *data) { @@ -2091,30 +2100,37 @@ static int uncore_forcewake_init(struct intel_uncore *uncore)
if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 55)) { ASSIGN_FW_DOMAINS_TABLE(uncore, __dg2_fw_ranges); + ASSIGN_SHADOW_TABLE(uncore, gen12_shadowed_regs); ASSIGN_WRITE_MMIO_VFUNCS(uncore, gen12_fwtable); ASSIGN_READ_MMIO_VFUNCS(uncore, gen11_fwtable); } else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) { ASSIGN_FW_DOMAINS_TABLE(uncore, __xehp_fw_ranges); + ASSIGN_SHADOW_TABLE(uncore, gen12_shadowed_regs); ASSIGN_WRITE_MMIO_VFUNCS(uncore, gen12_fwtable); ASSIGN_READ_MMIO_VFUNCS(uncore, gen11_fwtable); } else if (GRAPHICS_VER(i915) >= 12) { ASSIGN_FW_DOMAINS_TABLE(uncore, __gen12_fw_ranges); + ASSIGN_SHADOW_TABLE(uncore, gen12_shadowed_regs); ASSIGN_WRITE_MMIO_VFUNCS(uncore, gen12_fwtable); ASSIGN_READ_MMIO_VFUNCS(uncore, gen11_fwtable); } else if (GRAPHICS_VER(i915) == 11) { ASSIGN_FW_DOMAINS_TABLE(uncore, __gen11_fw_ranges); + ASSIGN_SHADOW_TABLE(uncore, gen11_shadowed_regs); ASSIGN_WRITE_MMIO_VFUNCS(uncore, gen11_fwtable); ASSIGN_READ_MMIO_VFUNCS(uncore, gen11_fwtable); } else if (IS_GRAPHICS_VER(i915, 9, 10)) { ASSIGN_FW_DOMAINS_TABLE(uncore, __gen9_fw_ranges); + ASSIGN_SHADOW_TABLE(uncore, gen8_shadowed_regs); ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable); ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable); } else if (IS_CHERRYVIEW(i915)) { ASSIGN_FW_DOMAINS_TABLE(uncore, __chv_fw_ranges); + ASSIGN_SHADOW_TABLE(uncore, gen8_shadowed_regs); ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable); ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable); } else if (GRAPHICS_VER(i915) == 8) { ASSIGN_FW_DOMAINS_TABLE(uncore, __gen6_fw_ranges); + ASSIGN_SHADOW_TABLE(uncore, gen8_shadowed_regs); ASSIGN_WRITE_MMIO_VFUNCS(uncore, gen8); ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable); } else if (IS_VALLEYVIEW(i915)) { diff --git a/drivers/gpu/drm/i915/intel_uncore.h b/drivers/gpu/drm/i915/intel_uncore.h index 531665b08039..2f31c50eeae2 100644 --- a/drivers/gpu/drm/i915/intel_uncore.h +++ b/drivers/gpu/drm/i915/intel_uncore.h @@ -142,6 +142,13 @@ struct intel_uncore { const struct intel_forcewake_range *fw_domains_table; unsigned int fw_domains_table_entries;
+ /* + * Shadowed registers are special cases where we can safely write + * to the register *without* grabbing forcewake. + */ + const struct i915_range *shadowed_reg_table; + unsigned int shadowed_reg_table_entries; + struct notifier_block pmic_bus_access_nb; struct intel_uncore_funcs funcs;
Now that we have both a standard forcewake table (albeit a single-entry table) and the shadow table stored in the uncore, we can drop the gen8-specific write handlers in favor of the general fwtable version.
Signed-off-by: Matt Roper matthew.d.roper@intel.com --- drivers/gpu/drm/i915/intel_uncore.c | 13 +------------ 1 file changed, 1 insertion(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index 159880e3e77d..80533cf24fe8 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -1056,16 +1056,6 @@ gen6_reg_write_fw_domains(struct intel_uncore *uncore, i915_reg_t reg) return FORCEWAKE_RENDER; }
-#define __gen8_reg_write_fw_domains(uncore, offset) \ -({ \ - enum forcewake_domains __fwd; \ - if (NEEDS_FORCE_WAKE(offset) && !is_shadowed(uncore, offset)) \ - __fwd = FORCEWAKE_RENDER; \ - else \ - __fwd = 0; \ - __fwd; \ -}) - static const struct intel_forcewake_range __gen6_fw_ranges[] = { GEN_FW_RANGE(0x0, 0x3ffff, FORCEWAKE_RENDER), }; @@ -1721,7 +1711,6 @@ __gen_write(func, 32) __gen_reg_write_funcs(gen12_fwtable); __gen_reg_write_funcs(gen11_fwtable); __gen_reg_write_funcs(fwtable); -__gen_reg_write_funcs(gen8);
#undef __gen_reg_write_funcs #undef GEN6_WRITE_FOOTER @@ -2131,7 +2120,7 @@ static int uncore_forcewake_init(struct intel_uncore *uncore) } else if (GRAPHICS_VER(i915) == 8) { ASSIGN_FW_DOMAINS_TABLE(uncore, __gen6_fw_ranges); ASSIGN_SHADOW_TABLE(uncore, gen8_shadowed_regs); - ASSIGN_WRITE_MMIO_VFUNCS(uncore, gen8); + ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable); ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable); } else if (IS_VALLEYVIEW(i915)) { ASSIGN_FW_DOMAINS_TABLE(uncore, __vlv_fw_ranges);
Now that the reference to the shadow table is stored within the uncore, we don't need to generate separate fwtable, gen11_fwtable, and gen12_fwtable variants of the register write functions; a single 'fwtable' implementation will work for all of those platforms now.
Note that while consolidating down to __fwtable_reg_write_fw_domains() we drop the NEEDS_FORCE_WAKE() check. If an MMIO offset is outside that range on older platforms, it also won't be part of a range in the forcewake table, so a '0' will be returned anyway.
Signed-off-by: Matt Roper matthew.d.roper@intel.com --- drivers/gpu/drm/i915/intel_uncore.c | 54 +++++++++-------------------- 1 file changed, 16 insertions(+), 38 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index 80533cf24fe8..c181e74fbf43 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -1081,23 +1081,6 @@ static const struct intel_forcewake_range __chv_fw_ranges[] = { };
#define __fwtable_reg_write_fw_domains(uncore, offset) \ -({ \ - enum forcewake_domains __fwd = 0; \ - if (NEEDS_FORCE_WAKE((offset)) && !is_shadowed(uncore, offset)) \ - __fwd = find_fw_domain(uncore, offset); \ - __fwd; \ -}) - -#define __gen11_fwtable_reg_write_fw_domains(uncore, offset) \ -({ \ - enum forcewake_domains __fwd = 0; \ - const u32 __offset = (offset); \ - if (!is_shadowed(uncore, __offset)) \ - __fwd = find_fw_domain(uncore, __offset); \ - __fwd; \ -}) - -#define __gen12_fwtable_reg_write_fw_domains(uncore, offset) \ ({ \ enum forcewake_domains __fwd = 0; \ const u32 __offset = (offset); \ @@ -1685,34 +1668,29 @@ __gen6_write(8) __gen6_write(16) __gen6_write(32)
-#define __gen_write(func, x) \ +#define __gen_fwtable_write(x) \ static void \ -func##_write##x(struct intel_uncore *uncore, i915_reg_t reg, u##x val, bool trace) { \ +fwtable_write##x(struct intel_uncore *uncore, i915_reg_t reg, u##x val, bool trace) { \ enum forcewake_domains fw_engine; \ GEN6_WRITE_HEADER; \ - fw_engine = __##func##_reg_write_fw_domains(uncore, offset); \ + fw_engine = __fwtable_reg_write_fw_domains(uncore, offset); \ if (fw_engine) \ __force_wake_auto(uncore, fw_engine); \ __raw_uncore_write##x(uncore, reg, val); \ GEN6_WRITE_FOOTER; \ }
-#define __gen_reg_write_funcs(func) \ -static enum forcewake_domains \ -func##_reg_write_fw_domains(struct intel_uncore *uncore, i915_reg_t reg) { \ - return __##func##_reg_write_fw_domains(uncore, i915_mmio_reg_offset(reg)); \ -} \ -\ -__gen_write(func, 8) \ -__gen_write(func, 16) \ -__gen_write(func, 32) - +static enum forcewake_domains +fwtable_reg_write_fw_domains(struct intel_uncore *uncore, i915_reg_t reg) +{ + return __fwtable_reg_write_fw_domains(uncore, i915_mmio_reg_offset(reg)); +}
-__gen_reg_write_funcs(gen12_fwtable); -__gen_reg_write_funcs(gen11_fwtable); -__gen_reg_write_funcs(fwtable); +__gen_fwtable_write(8) +__gen_fwtable_write(16) +__gen_fwtable_write(32)
-#undef __gen_reg_write_funcs +#undef __gen_fwtable_write #undef GEN6_WRITE_FOOTER #undef GEN6_WRITE_HEADER
@@ -2090,22 +2068,22 @@ static int uncore_forcewake_init(struct intel_uncore *uncore) if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 55)) { ASSIGN_FW_DOMAINS_TABLE(uncore, __dg2_fw_ranges); ASSIGN_SHADOW_TABLE(uncore, gen12_shadowed_regs); - ASSIGN_WRITE_MMIO_VFUNCS(uncore, gen12_fwtable); + ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable); ASSIGN_READ_MMIO_VFUNCS(uncore, gen11_fwtable); } else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) { ASSIGN_FW_DOMAINS_TABLE(uncore, __xehp_fw_ranges); ASSIGN_SHADOW_TABLE(uncore, gen12_shadowed_regs); - ASSIGN_WRITE_MMIO_VFUNCS(uncore, gen12_fwtable); + ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable); ASSIGN_READ_MMIO_VFUNCS(uncore, gen11_fwtable); } else if (GRAPHICS_VER(i915) >= 12) { ASSIGN_FW_DOMAINS_TABLE(uncore, __gen12_fw_ranges); ASSIGN_SHADOW_TABLE(uncore, gen12_shadowed_regs); - ASSIGN_WRITE_MMIO_VFUNCS(uncore, gen12_fwtable); + ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable); ASSIGN_READ_MMIO_VFUNCS(uncore, gen11_fwtable); } else if (GRAPHICS_VER(i915) == 11) { ASSIGN_FW_DOMAINS_TABLE(uncore, __gen11_fw_ranges); ASSIGN_SHADOW_TABLE(uncore, gen11_shadowed_regs); - ASSIGN_WRITE_MMIO_VFUNCS(uncore, gen11_fwtable); + ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable); ASSIGN_READ_MMIO_VFUNCS(uncore, gen11_fwtable); } else if (IS_GRAPHICS_VER(i915, 9, 10)) { ASSIGN_FW_DOMAINS_TABLE(uncore, __gen9_fw_ranges);
Consolidate down to just a single 'fwtable' implementation. For reads we don't need to worry about shadow tables. Also, the NEEDS_FORCE_WAKE() check we previously had in the fwtable implementation can be dropped --- if a register is outside that range on one of the old platforms, then it won't belong to any forcewake range and 0 will be returned anyway.
Signed-off-by: Matt Roper matthew.d.roper@intel.com --- drivers/gpu/drm/i915/intel_uncore.c | 45 +++++++++++------------------ 1 file changed, 17 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index c181e74fbf43..95398cb69722 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -935,14 +935,6 @@ static const struct intel_forcewake_range __vlv_fw_ranges[] = { };
#define __fwtable_reg_read_fw_domains(uncore, offset) \ -({ \ - enum forcewake_domains __fwd = 0; \ - if (NEEDS_FORCE_WAKE((offset))) \ - __fwd = find_fw_domain(uncore, offset); \ - __fwd; \ -}) - -#define __gen11_fwtable_reg_read_fw_domains(uncore, offset) \ find_fw_domain(uncore, offset)
/* *Must* be sorted by offset! See intel_shadow_table_check(). */ @@ -1577,33 +1569,30 @@ static inline void __force_wake_auto(struct intel_uncore *uncore, ___force_wake_auto(uncore, fw_domains); }
-#define __gen_read(func, x) \ +#define __gen_fwtable_read(x) \ static u##x \ -func##_read##x(struct intel_uncore *uncore, i915_reg_t reg, bool trace) { \ +fwtable_read##x(struct intel_uncore *uncore, i915_reg_t reg, bool trace) \ +{ \ enum forcewake_domains fw_engine; \ GEN6_READ_HEADER(x); \ - fw_engine = __##func##_reg_read_fw_domains(uncore, offset); \ + fw_engine = __fwtable_reg_read_fw_domains(uncore, offset); \ if (fw_engine) \ __force_wake_auto(uncore, fw_engine); \ val = __raw_uncore_read##x(uncore, reg); \ GEN6_READ_FOOTER; \ }
-#define __gen_reg_read_funcs(func) \ -static enum forcewake_domains \ -func##_reg_read_fw_domains(struct intel_uncore *uncore, i915_reg_t reg) { \ - return __##func##_reg_read_fw_domains(uncore, i915_mmio_reg_offset(reg)); \ -} \ -\ -__gen_read(func, 8) \ -__gen_read(func, 16) \ -__gen_read(func, 32) \ -__gen_read(func, 64) +static enum forcewake_domains +fwtable_reg_read_fw_domains(struct intel_uncore *uncore, i915_reg_t reg) { + return __fwtable_reg_read_fw_domains(uncore, i915_mmio_reg_offset(reg)); +}
-__gen_reg_read_funcs(gen11_fwtable); -__gen_reg_read_funcs(fwtable); +__gen_fwtable_read(8) +__gen_fwtable_read(16) +__gen_fwtable_read(32) +__gen_fwtable_read(64)
-#undef __gen_reg_read_funcs +#undef __gen_fwtable_read #undef GEN6_READ_FOOTER #undef GEN6_READ_HEADER
@@ -2069,22 +2058,22 @@ static int uncore_forcewake_init(struct intel_uncore *uncore) ASSIGN_FW_DOMAINS_TABLE(uncore, __dg2_fw_ranges); ASSIGN_SHADOW_TABLE(uncore, gen12_shadowed_regs); ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable); - ASSIGN_READ_MMIO_VFUNCS(uncore, gen11_fwtable); + ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable); } else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) { ASSIGN_FW_DOMAINS_TABLE(uncore, __xehp_fw_ranges); ASSIGN_SHADOW_TABLE(uncore, gen12_shadowed_regs); ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable); - ASSIGN_READ_MMIO_VFUNCS(uncore, gen11_fwtable); + ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable); } else if (GRAPHICS_VER(i915) >= 12) { ASSIGN_FW_DOMAINS_TABLE(uncore, __gen12_fw_ranges); ASSIGN_SHADOW_TABLE(uncore, gen12_shadowed_regs); ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable); - ASSIGN_READ_MMIO_VFUNCS(uncore, gen11_fwtable); + ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable); } else if (GRAPHICS_VER(i915) == 11) { ASSIGN_FW_DOMAINS_TABLE(uncore, __gen11_fw_ranges); ASSIGN_SHADOW_TABLE(uncore, gen11_shadowed_regs); ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable); - ASSIGN_READ_MMIO_VFUNCS(uncore, gen11_fwtable); + ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable); } else if (IS_GRAPHICS_VER(i915, 9, 10)) { ASSIGN_FW_DOMAINS_TABLE(uncore, __gen9_fw_ranges); ASSIGN_SHADOW_TABLE(uncore, gen8_shadowed_regs);
On 10/09/2021 06:33, Matt Roper wrote:
Consolidate down to just a single 'fwtable' implementation. For reads we don't need to worry about shadow tables. Also, the NEEDS_FORCE_WAKE() check we previously had in the fwtable implementation can be dropped --- if a register is outside that range on one of the old platforms, then it won't belong to any forcewake range and 0 will be returned anyway.
Signed-off-by: Matt Roper matthew.d.roper@intel.com
drivers/gpu/drm/i915/intel_uncore.c | 45 +++++++++++------------------ 1 file changed, 17 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index c181e74fbf43..95398cb69722 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -935,14 +935,6 @@ static const struct intel_forcewake_range __vlv_fw_ranges[] = { };
#define __fwtable_reg_read_fw_domains(uncore, offset) \ -({ \
- enum forcewake_domains __fwd = 0; \
- if (NEEDS_FORCE_WAKE((offset))) \
__fwd = find_fw_domain(uncore, offset); \
- __fwd; \
-})
-#define __gen11_fwtable_reg_read_fw_domains(uncore, offset) \ find_fw_domain(uncore, offset)
Looks like you can drop this macro and just call find_fw_domain or you think there is value to keep it?
Regards,
Tvrtko
/* *Must* be sorted by offset! See intel_shadow_table_check(). */ @@ -1577,33 +1569,30 @@ static inline void __force_wake_auto(struct intel_uncore *uncore, ___force_wake_auto(uncore, fw_domains); }
-#define __gen_read(func, x) \ +#define __gen_fwtable_read(x) \ static u##x \ -func##_read##x(struct intel_uncore *uncore, i915_reg_t reg, bool trace) { \ +fwtable_read##x(struct intel_uncore *uncore, i915_reg_t reg, bool trace) \ +{ \ enum forcewake_domains fw_engine; \ GEN6_READ_HEADER(x); \
- fw_engine = __##func##_reg_read_fw_domains(uncore, offset); \
- fw_engine = __fwtable_reg_read_fw_domains(uncore, offset); \ if (fw_engine) \ __force_wake_auto(uncore, fw_engine); \ val = __raw_uncore_read##x(uncore, reg); \ GEN6_READ_FOOTER; \ }
-#define __gen_reg_read_funcs(func) \ -static enum forcewake_domains \ -func##_reg_read_fw_domains(struct intel_uncore *uncore, i915_reg_t reg) { \
- return __##func##_reg_read_fw_domains(uncore, i915_mmio_reg_offset(reg)); \
-} \ -\ -__gen_read(func, 8) \ -__gen_read(func, 16) \ -__gen_read(func, 32) \ -__gen_read(func, 64) +static enum forcewake_domains +fwtable_reg_read_fw_domains(struct intel_uncore *uncore, i915_reg_t reg) {
- return __fwtable_reg_read_fw_domains(uncore, i915_mmio_reg_offset(reg));
+}
-__gen_reg_read_funcs(gen11_fwtable); -__gen_reg_read_funcs(fwtable); +__gen_fwtable_read(8) +__gen_fwtable_read(16) +__gen_fwtable_read(32) +__gen_fwtable_read(64)
-#undef __gen_reg_read_funcs +#undef __gen_fwtable_read #undef GEN6_READ_FOOTER #undef GEN6_READ_HEADER
@@ -2069,22 +2058,22 @@ static int uncore_forcewake_init(struct intel_uncore *uncore) ASSIGN_FW_DOMAINS_TABLE(uncore, __dg2_fw_ranges); ASSIGN_SHADOW_TABLE(uncore, gen12_shadowed_regs); ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable);
ASSIGN_READ_MMIO_VFUNCS(uncore, gen11_fwtable);
} else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) { ASSIGN_FW_DOMAINS_TABLE(uncore, __xehp_fw_ranges); ASSIGN_SHADOW_TABLE(uncore, gen12_shadowed_regs); ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable);ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable);
ASSIGN_READ_MMIO_VFUNCS(uncore, gen11_fwtable);
} else if (GRAPHICS_VER(i915) >= 12) { ASSIGN_FW_DOMAINS_TABLE(uncore, __gen12_fw_ranges); ASSIGN_SHADOW_TABLE(uncore, gen12_shadowed_regs); ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable);ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable);
ASSIGN_READ_MMIO_VFUNCS(uncore, gen11_fwtable);
} else if (GRAPHICS_VER(i915) == 11) { ASSIGN_FW_DOMAINS_TABLE(uncore, __gen11_fw_ranges); ASSIGN_SHADOW_TABLE(uncore, gen11_shadowed_regs); ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable);ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable);
ASSIGN_READ_MMIO_VFUNCS(uncore, gen11_fwtable);
} else if (IS_GRAPHICS_VER(i915, 9, 10)) { ASSIGN_FW_DOMAINS_TABLE(uncore, __gen9_fw_ranges); ASSIGN_SHADOW_TABLE(uncore, gen8_shadowed_regs);ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable);
We thought the DG2 table of shadowed registers would be the same as the gen12/xehp table, but it turns out that there are a few minor differences that require us to define a new DG2-specific table: * One register is removed (0xC4D4) * One register is added (0xC4E0)
Signed-off-by: Matt Roper matthew.d.roper@intel.com --- drivers/gpu/drm/i915/intel_uncore.c | 41 ++++++++++++++++++- drivers/gpu/drm/i915/selftests/intel_uncore.c | 1 + 2 files changed, 41 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index 95398cb69722..088d9b9d1224 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -1018,6 +1018,45 @@ static const struct i915_range gen12_shadowed_regs[] = { { .start = 0x1F8510, .end = 0x1F8550 }, };
+static const struct i915_range dg2_shadowed_regs[] = { + { .start = 0x2030, .end = 0x2030 }, + { .start = 0x2510, .end = 0x2550 }, + { .start = 0xA008, .end = 0xA00C }, + { .start = 0xA188, .end = 0xA188 }, + { .start = 0xA278, .end = 0xA278 }, + { .start = 0xA540, .end = 0xA56C }, + { .start = 0xC4C8, .end = 0xC4C8 }, + { .start = 0xC4E0, .end = 0xC4E0 }, + { .start = 0xC600, .end = 0xC600 }, + { .start = 0xC658, .end = 0xC658 }, + { .start = 0x22030, .end = 0x22030 }, + { .start = 0x22510, .end = 0x22550 }, + { .start = 0x1C0030, .end = 0x1C0030 }, + { .start = 0x1C0510, .end = 0x1C0550 }, + { .start = 0x1C4030, .end = 0x1C4030 }, + { .start = 0x1C4510, .end = 0x1C4550 }, + { .start = 0x1C8030, .end = 0x1C8030 }, + { .start = 0x1C8510, .end = 0x1C8550 }, + { .start = 0x1D0030, .end = 0x1D0030 }, + { .start = 0x1D0510, .end = 0x1D0550 }, + { .start = 0x1D4030, .end = 0x1D4030 }, + { .start = 0x1D4510, .end = 0x1D4550 }, + { .start = 0x1D8030, .end = 0x1D8030 }, + { .start = 0x1D8510, .end = 0x1D8550 }, + { .start = 0x1E0030, .end = 0x1E0030 }, + { .start = 0x1E0510, .end = 0x1E0550 }, + { .start = 0x1E4030, .end = 0x1E4030 }, + { .start = 0x1E4510, .end = 0x1E4550 }, + { .start = 0x1E8030, .end = 0x1E8030 }, + { .start = 0x1E8510, .end = 0x1E8550 }, + { .start = 0x1F0030, .end = 0x1F0030 }, + { .start = 0x1F0510, .end = 0x1F0550 }, + { .start = 0x1F4030, .end = 0x1F4030 }, + { .start = 0x1F4510, .end = 0x1F4550 }, + { .start = 0x1F8030, .end = 0x1F8030 }, + { .start = 0x1F8510, .end = 0x1F8550 }, +}; + static int mmio_range_cmp(u32 key, const struct i915_range *range) { if (key < range->start) @@ -2056,7 +2095,7 @@ static int uncore_forcewake_init(struct intel_uncore *uncore)
if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 55)) { ASSIGN_FW_DOMAINS_TABLE(uncore, __dg2_fw_ranges); - ASSIGN_SHADOW_TABLE(uncore, gen12_shadowed_regs); + ASSIGN_SHADOW_TABLE(uncore, dg2_shadowed_regs); ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable); ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable); } else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) { diff --git a/drivers/gpu/drm/i915/selftests/intel_uncore.c b/drivers/gpu/drm/i915/selftests/intel_uncore.c index 22ef2c87df1a..bc8128170a99 100644 --- a/drivers/gpu/drm/i915/selftests/intel_uncore.c +++ b/drivers/gpu/drm/i915/selftests/intel_uncore.c @@ -68,6 +68,7 @@ static int intel_shadow_table_check(void) { gen8_shadowed_regs, ARRAY_SIZE(gen8_shadowed_regs) }, { gen11_shadowed_regs, ARRAY_SIZE(gen11_shadowed_regs) }, { gen12_shadowed_regs, ARRAY_SIZE(gen12_shadowed_regs) }, + { dg2_shadowed_regs, ARRAY_SIZE(dg2_shadowed_regs) }, }; const struct i915_range *range; unsigned int i, j;
On 10/09/2021 06:33, Matt Roper wrote:
Our uncore MMIO functions for reading/writing registers have become very complicated over time. There's significant macro magic used to generate several nearly-identical functions that only really differ in terms of which platform-specific shadow register table they should check on write operations. We can significantly simplify our MMIO handlers by storing a reference to the current platform's shadow table within the 'struct intel_uncore' the same way we already do for forcewake; this allows us to consolidate the multiple variants of each 'write' function down to just a single 'fwtable' version that gets the shadow table out of the uncore struct rather than hardcoding the name of a specific platform's table. We can do similar consolidation on the MMIO read side by creating a single-entry forcewake table to replace the open-coded range check they had been using previously.
The final patch of the series adds a new shadow table for DG2; this becomes quite clean and simple now, given the refactoring in the first five patches.
Tidy and it ends up saving kernel binary size.
However I am undecided yet, because one thing to note is that the trade off is source code and kernel text consolidation at the expense of more indirect calls at runtime and larger common read/write functions.
To expand, current code generates a bunch of per gen functions but in doing so it manages to inline a bunch of checks like NEEDS_FORCE_WAKE and BSEARCH (from find_fw_domain) so at runtime each platform mmio read/write does not have to do indirect calls to do lookups.
It may matter a lot in the grand scheme of things but this trade off is something to note in the cover letter I think.
Regards,
Tvrtko
Matt Roper (6): drm/i915/uncore: Convert gen6/gen7 read operations to fwtable drm/i915/uncore: Associate shadow table with uncore drm/i915/uncore: Replace gen8 write functions with general fwtable drm/i915/uncore: Drop gen11/gen12 mmio write handlers drm/i915/uncore: Drop gen11 mmio read handlers drm/i915/dg2: Add DG2-specific shadow register table
drivers/gpu/drm/i915/intel_uncore.c | 190 ++++++++++-------- drivers/gpu/drm/i915/intel_uncore.h | 7 + drivers/gpu/drm/i915/selftests/intel_uncore.c | 1 + 3 files changed, 110 insertions(+), 88 deletions(-)
On Fri, Sep 10, 2021 at 02:03:44PM +0100, Tvrtko Ursulin wrote:
On 10/09/2021 06:33, Matt Roper wrote:
Our uncore MMIO functions for reading/writing registers have become very complicated over time. There's significant macro magic used to generate several nearly-identical functions that only really differ in terms of which platform-specific shadow register table they should check on write operations. We can significantly simplify our MMIO handlers by storing a reference to the current platform's shadow table within the 'struct intel_uncore' the same way we already do for forcewake; this allows us to consolidate the multiple variants of each 'write' function down to just a single 'fwtable' version that gets the shadow table out of the uncore struct rather than hardcoding the name of a specific platform's table. We can do similar consolidation on the MMIO read side by creating a single-entry forcewake table to replace the open-coded range check they had been using previously.
The final patch of the series adds a new shadow table for DG2; this becomes quite clean and simple now, given the refactoring in the first five patches.
Tidy and it ends up saving kernel binary size.
However I am undecided yet, because one thing to note is that the trade off is source code and kernel text consolidation at the expense of more indirect calls at runtime and larger common read/write functions.
To expand, current code generates a bunch of per gen functions but in doing so it manages to inline a bunch of checks like NEEDS_FORCE_WAKE and BSEARCH (from find_fw_domain) so at runtime each platform mmio read/write does not have to do indirect calls to do lookups.
It may matter a lot in the grand scheme of things but this trade off is something to note in the cover letter I think.
That's true. However it seems like if the extra indirect calls are good enough for our forcewake lookups (which are called more frequently and have to search through much larger tables) then using the same strategy for shadow registers should be less of a concern. Plus most of timing-critical parts of the code don't call through this at all; they just grab an explicit forcewake and then issue a bunch of *_fw() operations that skip all the per-register forcewake and shadow handling.
But you're right that this is something I should mention more clearly in the cover letter.
Matt
Regards,
Tvrtko
Matt Roper (6): drm/i915/uncore: Convert gen6/gen7 read operations to fwtable drm/i915/uncore: Associate shadow table with uncore drm/i915/uncore: Replace gen8 write functions with general fwtable drm/i915/uncore: Drop gen11/gen12 mmio write handlers drm/i915/uncore: Drop gen11 mmio read handlers drm/i915/dg2: Add DG2-specific shadow register table
drivers/gpu/drm/i915/intel_uncore.c | 190 ++++++++++-------- drivers/gpu/drm/i915/intel_uncore.h | 7 + drivers/gpu/drm/i915/selftests/intel_uncore.c | 1 + 3 files changed, 110 insertions(+), 88 deletions(-)
On 10/09/2021 15:24, Matt Roper wrote:
On Fri, Sep 10, 2021 at 02:03:44PM +0100, Tvrtko Ursulin wrote:
On 10/09/2021 06:33, Matt Roper wrote:
Our uncore MMIO functions for reading/writing registers have become very complicated over time. There's significant macro magic used to generate several nearly-identical functions that only really differ in terms of which platform-specific shadow register table they should check on write operations. We can significantly simplify our MMIO handlers by storing a reference to the current platform's shadow table within the 'struct intel_uncore' the same way we already do for forcewake; this allows us to consolidate the multiple variants of each 'write' function down to just a single 'fwtable' version that gets the shadow table out of the uncore struct rather than hardcoding the name of a specific platform's table. We can do similar consolidation on the MMIO read side by creating a single-entry forcewake table to replace the open-coded range check they had been using previously.
The final patch of the series adds a new shadow table for DG2; this becomes quite clean and simple now, given the refactoring in the first five patches.
Tidy and it ends up saving kernel binary size.
However I am undecided yet, because one thing to note is that the trade off is source code and kernel text consolidation at the expense of more indirect calls at runtime and larger common read/write functions.
To expand, current code generates a bunch of per gen functions but in doing so it manages to inline a bunch of checks like NEEDS_FORCE_WAKE and BSEARCH (from find_fw_domain) so at runtime each platform mmio read/write does not have to do indirect calls to do lookups.
It may matter a lot in the grand scheme of things but this trade off is something to note in the cover letter I think.
That's true. However it seems like if the extra indirect calls are good enough for our forcewake lookups (which are called more frequently and have to search through much larger tables) then using the same strategy for shadow registers should be less of a concern. Plus most of timing-critical parts of the code don't call through this at all; they just grab an explicit forcewake and then issue a bunch of *_fw() operations that skip all the per-register forcewake and shadow handling.
With lookups you mean intel_uncore_forcewake_for_reg? Yeah I don't have a good idea of how many of those followed by "_fw" accessors we have vs "un-optimized" access. But it's a good point.
I was mostly coming from the point of view of old platforms like gen6, where with this series reads go from inlined checks (NEEDS_FORCE_WAKE) to always calling find_fw_domain. Just because it is a bit unfortunate to burden old CPUs (they are not getting any faster) with executing more code. It's not nice when old hardware gets slower and slower with software updates. :) But whether or not this case would at all be measurable.. probably not. Unless some compounding effects, like "death by thousand cuts", would come into play.
Regards,
Tvrtko
But you're right that this is something I should mention more clearly in the cover letter.
Matt
Regards,
Tvrtko
Matt Roper (6): drm/i915/uncore: Convert gen6/gen7 read operations to fwtable drm/i915/uncore: Associate shadow table with uncore drm/i915/uncore: Replace gen8 write functions with general fwtable drm/i915/uncore: Drop gen11/gen12 mmio write handlers drm/i915/uncore: Drop gen11 mmio read handlers drm/i915/dg2: Add DG2-specific shadow register table
drivers/gpu/drm/i915/intel_uncore.c | 190 ++++++++++-------- drivers/gpu/drm/i915/intel_uncore.h | 7 + drivers/gpu/drm/i915/selftests/intel_uncore.c | 1 + 3 files changed, 110 insertions(+), 88 deletions(-)
On Fri, Sep 10, 2021 at 04:03:50PM +0100, Tvrtko Ursulin wrote:
On 10/09/2021 15:24, Matt Roper wrote:
On Fri, Sep 10, 2021 at 02:03:44PM +0100, Tvrtko Ursulin wrote:
On 10/09/2021 06:33, Matt Roper wrote:
Our uncore MMIO functions for reading/writing registers have become very complicated over time. There's significant macro magic used to generate several nearly-identical functions that only really differ in terms of which platform-specific shadow register table they should check on write operations. We can significantly simplify our MMIO handlers by storing a reference to the current platform's shadow table within the 'struct intel_uncore' the same way we already do for forcewake; this allows us to consolidate the multiple variants of each 'write' function down to just a single 'fwtable' version that gets the shadow table out of the uncore struct rather than hardcoding the name of a specific platform's table. We can do similar consolidation on the MMIO read side by creating a single-entry forcewake table to replace the open-coded range check they had been using previously.
The final patch of the series adds a new shadow table for DG2; this becomes quite clean and simple now, given the refactoring in the first five patches.
Tidy and it ends up saving kernel binary size.
However I am undecided yet, because one thing to note is that the trade off is source code and kernel text consolidation at the expense of more indirect calls at runtime and larger common read/write functions.
To expand, current code generates a bunch of per gen functions but in doing so it manages to inline a bunch of checks like NEEDS_FORCE_WAKE and BSEARCH (from find_fw_domain) so at runtime each platform mmio read/write does not have to do indirect calls to do lookups.
It may matter a lot in the grand scheme of things but this trade off is something to note in the cover letter I think.
That's true. However it seems like if the extra indirect calls are good enough for our forcewake lookups (which are called more frequently and have to search through much larger tables) then using the same strategy for shadow registers should be less of a concern. Plus most of timing-critical parts of the code don't call through this at all; they just grab an explicit forcewake and then issue a bunch of *_fw() operations that skip all the per-register forcewake and shadow handling.
With lookups you mean intel_uncore_forcewake_for_reg? Yeah I don't have a good idea of how many of those followed by "_fw" accessors we have vs "un-optimized" access. But it's a good point.
I was mostly coming from the point of view of old platforms like gen6, where with this series reads go from inlined checks (NEEDS_FORCE_WAKE) to always calling find_fw_domain. Just because it is a bit unfortunate to burden old CPUs (they are not getting any faster) with executing more code. It's not nice when old hardware gets slower and slower with software updates. :) But whether or not this case would at all be measurable.. probably not. Unless some compounding effects, like "death by thousand cuts", would come into play.
Chris pointed out in an offline mail that NEEDS_FORCE_WAKE does cut cut out a lot of display MMIO lookups. So I think it might be worth adding that back, but also adding an "|| GEN11_BSD_RING_BASE" so that it will still be accurate for the newer platforms too.
But I think another thing to consider here would be that we might want to switch our intel_de_{read,write} wrappers to call raw mmio directly, to completely bypass forcewake and shadow logic.
Matt
Regards,
Tvrtko
But you're right that this is something I should mention more clearly in the cover letter.
Matt
Regards,
Tvrtko
Matt Roper (6): drm/i915/uncore: Convert gen6/gen7 read operations to fwtable drm/i915/uncore: Associate shadow table with uncore drm/i915/uncore: Replace gen8 write functions with general fwtable drm/i915/uncore: Drop gen11/gen12 mmio write handlers drm/i915/uncore: Drop gen11 mmio read handlers drm/i915/dg2: Add DG2-specific shadow register table
drivers/gpu/drm/i915/intel_uncore.c | 190 ++++++++++-------- drivers/gpu/drm/i915/intel_uncore.h | 7 + drivers/gpu/drm/i915/selftests/intel_uncore.c | 1 + 3 files changed, 110 insertions(+), 88 deletions(-)
dri-devel@lists.freedesktop.org