From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
We have a statement from HW designers that the GPU read regression when using 2M pages was fixed from Icelake onwards, which was also confirmed by benchmarking Eero did last year:
""" When IOMMU is disabled, enabling THP causes following perf changes on TGL-H (GT1):
10-15% SynMark Batch[0-3] 5-10% MemBW GPU texture, SynMark ShMapVsm 3-5% SynMark TerrainFly* + Geom* + Fill* + CSCloth + Batch4 1-3% GpuTest Triangle, SynMark TexMem* + DeferredAA + Batch[5-7] + few others -7% MemBW GPU blend
In the above 3D benchmark names, * means all the variants of tests with the same prefix. For example, "SynMark TexMem*" means both the TexMem128 & TexMem512 tests in the synthetic (Intel internal) SynMark test suite.
In the (public, but proprietary) GfxBench & GLB(enchmark) test suites, there are both onscreen and offscreen variants of each test. Unless explicitly stated otherwise, numbers are for both variants.
All tests are run with FullHD monitor. All tests are fullscreen except for GLB and GpuTest ones, which are run in 1/2 screen window (GpuTest triangle is run both in fullscreen and 1/2 screen window). """
Since the only regression is MemBW GPU blend, against many more gains, it sounds like it is time to enable THP on Gen11+.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
References: https://gitlab.freedesktop.org/drm/intel/-/issues/430
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gemfs.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gemfs.c b/drivers/gpu/drm/i915/gem/i915_gemfs.c
index ee87874e59dc..c5a6bbc842fc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gemfs.c
+++ b/drivers/gpu/drm/i915/gem/i915_gemfs.c
@@ -28,12 +28,14 @@ int i915_gemfs_init(struct drm_i915_private *i915)
 	 *
 	 * One example, although it is probably better with a per-file
	 * control, is selecting huge page allocations ("huge=within_size").
-	 * However, we only do so to offset the overhead of iommu lookups
-	 * due to bandwidth issues (slow reads) on Broadwell+.
+	 * However, we only do so on platforms which benefit from it, or to
+	 * offset the overhead of iommu lookups, where with the latter it is
+	 * a net win even on platforms which would otherwise see performance
+	 * regressions such as the slow reads issue on Broadwell and Skylake.
 	 */
 
 	opts = NULL;
-	if (i915_vtd_active(i915)) {
+	if (GRAPHICS_VER(i915) >= 11 || i915_vtd_active(i915)) {
 		if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
 			opts = huge_opt;
 			drm_info(&i915->drm,
@@ -41,7 +43,10 @@ int i915_gemfs_init(struct drm_i915_private *i915)
 				 opts);
 		} else {
 			drm_notice(&i915->drm,
-				   "Transparent Hugepage support is recommended for optimal performance when IOMMU is enabled!\n");
+				   "Transparent Hugepage support is recommended for optimal performance%s\n",
+				   GRAPHICS_VER(i915) >= 11 ?
+				   " on this platform!" :
+				   " when IOMMU is enabled!");
 		}
 	}
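For readers skimming the hunks, the decision this patch changes boils down to the following consolidated view of the relevant part of i915_gemfs_init() (assembled purely from the diff above, shown for readability rather than as a separate implementation):

	opts = NULL;
	if (GRAPHICS_VER(i915) >= 11 || i915_vtd_active(i915)) {
		if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
			/* Ask the private tmpfs mount for huge pages. */
			opts = huge_opt; /* "huge=within_size" */
			drm_info(&i915->drm,
				 "Transparent Hugepage mode '%s'\n", opts);
		} else {
			/* Gen11+ now wants THP regardless of the IOMMU state. */
			drm_notice(&i915->drm,
				   "Transparent Hugepage support is recommended for optimal performance%s\n",
				   GRAPHICS_VER(i915) >= 11 ?
				   " on this platform!" :
				   " when IOMMU is enabled!");
		}
	}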
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
If i915 does not want to use huge pages there is a) no point in setting up the private mount and b) should the former fail, it is misleading to log that THP support is disabled from the caller, which does not even know whether the callee tried to enable it.

Fix both by restructuring the flow in i915_gemfs_init and, at the same time, note the failure to set it up in all cases.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 11 +-----
 drivers/gpu/drm/i915/gem/i915_gemfs.c     | 45 ++++++++++-------------
 drivers/gpu/drm/i915/gem/i915_gemfs.h     |  3 +-
 3 files changed, 23 insertions(+), 36 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index c2a3e388fcb4..955844f19193 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -671,17 +671,10 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *dev_priv,
 
 static int init_shmem(struct intel_memory_region *mem)
 {
-	int err;
-
-	err = i915_gemfs_init(mem->i915);
-	if (err) {
-		DRM_NOTE("Unable to create a private tmpfs mount, hugepage support will be disabled(%d).\n",
-			 err);
-	}
-
+	i915_gemfs_init(mem->i915);
 	intel_memory_region_set_name(mem, "system");
 
-	return 0; /* Don't error, we can simply fallback to the kernel mnt */
+	return 0; /* We have fallback to the kernel mnt if gemfs init failed. */
 }
 
 static int release_shmem(struct intel_memory_region *mem)
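The fallback mentioned in the new comment is not part of this diff; it lives in the shmem object creation path. From memory of the surrounding code (so treat names and flags as approximate), __create_shmem() in i915_gem_shmem.c picks the mount roughly like this:

	if (i915->mm.gemfs)
		filp = shmem_file_setup_with_mnt(i915->mm.gemfs, "i915",
						 size, VM_NORESERVE);
	else
		/* gemfs init failed or was skipped: fall back to shm_mnt. */
		filp = shmem_file_setup("i915", size, VM_NORESERVE);

Either way object creation succeeds, which is why init_shmem() can unconditionally return 0.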
diff --git a/drivers/gpu/drm/i915/gem/i915_gemfs.c b/drivers/gpu/drm/i915/gem/i915_gemfs.c
index c5a6bbc842fc..46b9a17d6abc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gemfs.c
+++ b/drivers/gpu/drm/i915/gem/i915_gemfs.c
@@ -11,16 +11,11 @@
 #include "i915_gemfs.h"
 #include "i915_utils.h"
 
-int i915_gemfs_init(struct drm_i915_private *i915)
+void i915_gemfs_init(struct drm_i915_private *i915)
 {
 	char huge_opt[] = "huge=within_size"; /* r/w */
 	struct file_system_type *type;
 	struct vfsmount *gemfs;
-	char *opts;
-
-	type = get_fs_type("tmpfs");
-	if (!type)
-		return -ENODEV;
 
 	/*
 	 * By creating our own shmemfs mountpoint, we can pass in
@@ -34,29 +29,29 @@ int i915_gemfs_init(struct drm_i915_private *i915)
 	 * regressions such as the slow reads issue on Broadwell and Skylake.
 	 */
 
-	opts = NULL;
-	if (GRAPHICS_VER(i915) >= 11 || i915_vtd_active(i915)) {
-		if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
-			opts = huge_opt;
-			drm_info(&i915->drm,
-				 "Transparent Hugepage mode '%s'\n",
-				 opts);
-		} else {
-			drm_notice(&i915->drm,
-				   "Transparent Hugepage support is recommended for optimal performance%s\n",
-				   GRAPHICS_VER(i915) >= 11 ?
-				   " on this platform!" :
-				   " when IOMMU is enabled!");
-		}
-	}
+	if (GRAPHICS_VER(i915) < 11 && !i915_vtd_active(i915))
+		return;
+
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		goto err;
 
-	gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, opts);
+	type = get_fs_type("tmpfs");
+	if (!type)
+		goto err;
+
+	gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, huge_opt);
 	if (IS_ERR(gemfs))
-		return PTR_ERR(gemfs);
+		goto err;
 
 	i915->mm.gemfs = gemfs;
-
-	return 0;
+	drm_info(&i915->drm, "Using Transparent Hugepages\n");
+	return;
+
+err:
+	drm_notice(&i915->drm,
+		   "Transparent Hugepage support is recommended for optimal performance%s\n",
+		   GRAPHICS_VER(i915) >= 11 ? " on this platform!" :
+					      " when IOMMU is enabled!");
 }
 
 void i915_gemfs_fini(struct drm_i915_private *i915)
diff --git a/drivers/gpu/drm/i915/gem/i915_gemfs.h b/drivers/gpu/drm/i915/gem/i915_gemfs.h
index 2a1e59af3e4a..5d835e44c4f6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gemfs.h
+++ b/drivers/gpu/drm/i915/gem/i915_gemfs.h
@@ -9,8 +9,7 @@
 
 struct drm_i915_private;
 
-int i915_gemfs_init(struct drm_i915_private *i915);
-
+void i915_gemfs_init(struct drm_i915_private *i915);
 void i915_gemfs_fini(struct drm_i915_private *i915);
 
 #endif
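Assembled from the hunks above (the diff itself remains the authoritative form), the restructured function ends up reading:

	void i915_gemfs_init(struct drm_i915_private *i915)
	{
		char huge_opt[] = "huge=within_size"; /* r/w */
		struct file_system_type *type;
		struct vfsmount *gemfs;

		/* No benefit expected: quietly keep using the kernel mount. */
		if (GRAPHICS_VER(i915) < 11 && !i915_vtd_active(i915))
			return;

		if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
			goto err;

		type = get_fs_type("tmpfs");
		if (!type)
			goto err;

		gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, huge_opt);
		if (IS_ERR(gemfs))
			goto err;

		i915->mm.gemfs = gemfs;
		drm_info(&i915->drm, "Using Transparent Hugepages\n");
		return;

	err:
		/* Every failure path now logs, not just the caller. */
		drm_notice(&i915->drm,
			   "Transparent Hugepage support is recommended for optimal performance%s\n",
			   GRAPHICS_VER(i915) >= 11 ? " on this platform!" :
						      " when IOMMU is enabled!");
	}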
On 29/04/2022 11:04, Tvrtko Ursulin wrote:
fwiw, for the series,
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
On 09/05/2022 11:49, Matthew Auld wrote:
fwiw, for the series,
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Thanks! With a statement from the HW architects, benchmark results from Eero and an r-b from you, I think it is justified to push this, so I have. Let's see if someone notices an improvement.
Regards,
Tvrtko