From: Matthew Auld matthew.auld@intel.com
Since the object might still be active here, the shrink_all will simply ignore it, which blows up in the test, since the pages will still be there. Currently THP is disabled which should result in the test being skipped, but if we ever re-enable THP we might start seeing the failure. Fix this by forcing I915_SHRINK_ACTIVE.
Signed-off-by: Matthew Auld matthew.auld@intel.com
Cc: Tvrtko Ursulin tvrtko.ursulin@intel.com
Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
---
 drivers/gpu/drm/i915/gem/selftests/huge_pages.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index a094f3ce1a90..acc435f14ac9 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1572,12 +1572,15 @@ static int igt_shrink_thp(void *arg)
 		goto out_put;

 	/*
-	 * Now that the pages are *unpinned* shrink-all should invoke
+	 * Now that the pages are *unpinned* shrinking should invoke
 	 * shmem to truncate our pages.
 	 */
-	i915_gem_shrink_all(i915);
+	i915_gem_shrink(NULL, i915, -1UL, NULL,
+			I915_SHRINK_BOUND |
+			I915_SHRINK_UNBOUND |
+			I915_SHRINK_ACTIVE);
 	if (i915_gem_object_has_pages(obj)) {
-		pr_err("shrink-all didn't truncate the pages\n");
+		pr_err("shrinking didn't truncate the pages\n");
 		err = -EINVAL;
 		goto out_put;
 	}
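To illustrate why the flag mask matters, here is a minimal userspace sketch of the filtering described in the commit message. It is only a toy model: the toy_* names, the flag values and the struct are made up for this example and are not the driver's definitions, and the real shrinker does considerably more work than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch; the driver defines its own. */
#define TOY_SHRINK_UNBOUND (1u << 0)
#define TOY_SHRINK_BOUND   (1u << 1)
#define TOY_SHRINK_ACTIVE  (1u << 2)

/* Hypothetical stand-in for an object: only the state this sketch needs. */
struct toy_object {
	bool bound;     /* currently has a GPU binding */
	bool active;    /* the GPU may still be using it */
	bool has_pages; /* backing pages are still allocated */
};

/* Returns true if an object in this state may be reclaimed under 'flags'. */
static bool toy_can_shrink(const struct toy_object *obj, unsigned int flags)
{
	/* An active object is skipped unless the caller forces ACTIVE. */
	if (obj->active && !(flags & TOY_SHRINK_ACTIVE))
		return false;
	if (obj->bound)
		return (flags & TOY_SHRINK_BOUND) != 0;
	return (flags & TOY_SHRINK_UNBOUND) != 0;
}

/* Drops the pages of every eligible object; returns how many were released. */
static size_t toy_shrink(struct toy_object *objs, size_t count,
			 unsigned int flags)
{
	size_t released = 0;

	assert(objs != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (objs[i].has_pages && toy_can_shrink(&objs[i], flags)) {
			objs[i].has_pages = false;
			released++;
		}
	}
	return released;
}
```

In this model, an object that is still active keeps its pages when only the BOUND and UNBOUND bits are passed, which is the situation the selftest could hit; adding the ACTIVE bit makes the same call release them.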
From: Tvrtko Ursulin tvrtko.ursulin@intel.com
Usage of Transparent Hugepages was disabled in 9987da4b5dcf ("drm/i915: Disable THP until we have a GPU read BW W/A"), but since it appears that the majority of performance regressions reported with an enabled IOMMU can be almost eliminated by turning them on, let's do that by adding a couple of Kconfig options.
To err on the side of safety we keep the current default in cases where IOMMU is not active, and only when it is active do we default to the "huge=within_size" mode. Although there would probably be wins from enabling them throughout, more extensive testing across benchmarks and platforms would need to be done.
With the patch and IOMMU enabled my local testing on a small Skylake part shows OglVSTangent regression being reduced from ~14% to ~2%.
v2:
 * Add Kconfig dependency to transparent hugepages and some help text.
 * Move to helper for easier handling of kernel build options.

References: b901bb89324a ("drm/i915/gemfs: enable THP")
References: 9987da4b5dcf ("drm/i915: Disable THP until we have a GPU read BW W/A")
References: https://gitlab.freedesktop.org/drm/intel/-/issues/430
Co-developed-by: Chris Wilson chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk
Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com
Cc: Matthew Auld matthew.auld@intel.com
Cc: Eero Tamminen eero.t.tamminen@intel.com
Cc: Tvrtko Ursulin tvrtko.ursulin@intel.com
Cc: Rodrigo Vivi rodrigo.vivi@intel.com
Signed-off-by: Tvrtko Ursulin tvrtko.ursulin@intel.com
Reviewed-by: Rodrigo Vivi rodrigo.vivi@intel.com # v1
---
 drivers/gpu/drm/i915/Kconfig.profile  | 73 +++++++++++++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gemfs.c | 27 ++++++++--
 2 files changed, 97 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
index 39328567c200..d49ee794732f 100644
--- a/drivers/gpu/drm/i915/Kconfig.profile
+++ b/drivers/gpu/drm/i915/Kconfig.profile
@@ -119,3 +119,76 @@ config DRM_I915_TIMESLICE_DURATION
 	  /sys/class/drm/card?/engine/*/timeslice_duration_ms

 	  May be 0 to disable timeslicing.
+
+choice
+	prompt "Transparent Hugepage Support (native)"
+	default DRM_I915_THP_NATIVE_NEVER
+	depends on TRANSPARENT_HUGEPAGE
+	help
+	  Select the preferred method for allocating from Transparent Hugepages
+	  when IOMMU is not enabled.
+
+	config DRM_I915_THP_NATIVE_NEVER
+	bool "Never"
+	help
+	  Disable using THP for system memory allocations, individually
+	  allocating each 4K chunk as a separate page. It is unlikely that such
+	  individual allocations will return contiguous memory.
+
+	config DRM_I915_THP_NATIVE_WITHIN
+	bool "Within size"
+	help
+	  Allocate whole 2M superpages while those chunks do not exceed the
+	  object size. The remainder of the object will be allocated from 4K
+	  pages. No overallocation.
+
+	config DRM_I915_THP_NATIVE_ALWAYS
+	bool "Always"
+	help
+	  Allocate the whole object using 2M superpages, even if the object does
+	  not require an exact number of superpages.
+
+endchoice
+
+config DRM_I915_THP_NATIVE
+	string
+	default "always" if DRM_I915_THP_NATIVE_ALWAYS
+	default "within_size" if DRM_I915_THP_NATIVE_WITHIN
+	default "never" if DRM_I915_THP_NATIVE_NEVER
+
+choice
+	prompt "Transparent Hugepage Support (IOMMU)"
+	default DRM_I915_THP_IOMMU_WITHIN if TRANSPARENT_HUGEPAGE=y
+	default DRM_I915_THP_IOMMU_NEVER if TRANSPARENT_HUGEPAGE=n
+	depends on TRANSPARENT_HUGEPAGE
+	help
+	  Select the preferred method for allocating from Transparent Hugepages
+	  with IOMMU active.
+
+	config DRM_I915_THP_IOMMU_NEVER
+	bool "Never"
+	help
+	  Disable using THP for system memory allocations, individually
+	  allocating each 4K chunk as a separate page. It is unlikely that such
+	  individual allocations will return contiguous memory.
+
+	config DRM_I915_THP_IOMMU_WITHIN
+	bool "Within size"
+	help
+	  Allocate whole 2M superpages while those chunks do not exceed the
+	  object size. The remainder of the object will be allocated from 4K
+	  pages. No overallocation.
+
+	config DRM_I915_THP_IOMMU_ALWAYS
+	bool "Always"
+	help
+	  Allocate the whole object using 2M superpages, even if the object does
+	  not require an exact number of superpages.
+
+endchoice
+
+config DRM_I915_THP_IOMMU
+	string
+	default "always" if DRM_I915_THP_IOMMU_ALWAYS
+	default "within_size" if DRM_I915_THP_IOMMU_WITHIN
+	default "never" if DRM_I915_THP_IOMMU_NEVER
diff --git a/drivers/gpu/drm/i915/gem/i915_gemfs.c b/drivers/gpu/drm/i915/gem/i915_gemfs.c
index 5e6e8c91ab38..871cbfb02fdf 100644
--- a/drivers/gpu/drm/i915/gem/i915_gemfs.c
+++ b/drivers/gpu/drm/i915/gem/i915_gemfs.c
@@ -11,6 +11,26 @@
 #include "i915_drv.h"
 #include "i915_gemfs.h"

+#if defined(CONFIG_DRM_I915_THP_NATIVE) && defined(CONFIG_DRM_I915_THP_IOMMU)
+static char *gemfd_mount_opts(struct drm_i915_private *i915)
+{
+	static char thp_native[] = "huge=" CONFIG_DRM_I915_THP_NATIVE;
+	static char thp_iommu[] = "huge=" CONFIG_DRM_I915_THP_IOMMU;
+	char *opts;
+
+	opts = intel_vtd_active() ? thp_iommu : thp_native;
+	drm_info(&i915->drm, "Transparent Hugepage mode '%s'", opts);
+
+	return opts;
+}
+#else
+static char *gemfd_mount_opts(struct drm_i915_private *i915)
+{
+	return NULL;
+}
+#endif
+
+
 int i915_gemfs_init(struct drm_i915_private *i915)
 {
 	struct file_system_type *type;
@@ -26,10 +46,11 @@ int i915_gemfs_init(struct drm_i915_private *i915)
 	 *
 	 * One example, although it is probably better with a per-file
 	 * control, is selecting huge page allocations ("huge=within_size").
-	 * Currently unused due to bandwidth issues (slow reads) on Broadwell+.
+	 * However, we only do so to offset the overhead of iommu lookups
+	 * due to bandwidth issues (slow reads) on Broadwell+.
 	 */
-
-	gemfs = kern_mount(type);
+	gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name,
+			       gemfd_mount_opts(i915));
 	if (IS_ERR(gemfs))
 		return PTR_ERR(gemfs);

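The option selection in the helper above can be tried outside the kernel. The sketch below is a userspace approximation only: the TOY_* macros stand in for the two Kconfig strings, the boolean parameter stands in for the driver's runtime VT-d check, and toy_thp_enabled() is a hypothetical helper added for this example. The values chosen match the defaults proposed in the patch ("never" without IOMMU, "within_size" with it).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the two Kconfig strings added by the patch. */
#define TOY_THP_NATIVE "never"
#define TOY_THP_IOMMU  "within_size"

/*
 * Adjacent string literals are concatenated at compile time, so each
 * array holds a complete "huge=<mode>" mount option.
 */
static const char toy_opts_native[] = "huge=" TOY_THP_NATIVE;
static const char toy_opts_iommu[]  = "huge=" TOY_THP_IOMMU;

/* 'iommu_active' stands in for the driver's runtime IOMMU check. */
static const char *toy_mount_opts(bool iommu_active)
{
	return iommu_active ? toy_opts_iommu : toy_opts_native;
}

/* Hypothetical helper: true unless the option string selects "never". */
static bool toy_thp_enabled(const char *opts)
{
	assert(opts != NULL);
	return strcmp(opts, "huge=never") != 0;
}
```

With these stand-in values, toy_mount_opts(true) yields "huge=within_size" and toy_mount_opts(false) yields "huge=never", so only the IOMMU case changes behaviour relative to a plain mount without huge pages.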
On Thu, Jul 29, 2021 at 1:19 PM Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
> From: Tvrtko Ursulin tvrtko.ursulin@intel.com
> Usage of Transparent Hugepages was disabled in 9987da4b5dcf ("drm/i915: Disable THP until we have a GPU read BW W/A"), but since it appears that the majority of performance regressions reported with an enabled IOMMU can be almost eliminated by turning them on, let's do that by adding a couple of Kconfig options.
> To err on the side of safety we keep the current default in cases where IOMMU is not active, and only when it is active do we default to the "huge=within_size" mode. Although there would probably be wins from enabling them throughout, more extensive testing across benchmarks and platforms would need to be done.
> With the patch and IOMMU enabled my local testing on a small Skylake part shows OglVSTangent regression being reduced from ~14% to ~2%.
I guess the 14% regression is iommu disabled vs iommu enabled? Would be good to clarify that.
> v2:
> - Add Kconfig dependency to transparent hugepages and some help text.
Uh I'm really not a huge fan of Kconfig for everything, especially for tuning stuff. Maybe, if there's a need, a module param for debugging, but otherwise can't we just pick the right default?
And it very much sounds like the right default here is "enable it unconditionally if we have iommu support".
-Daniel
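For reference, the three modes described in the Kconfig help text can be turned into a small page-count calculation. This is a best-case sketch that follows the wording of the help text only: it assumes a 2M superpage can always be allocated when asked for, which the real allocator does not guarantee, and all toy_* names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SZ_4K ((size_t)4096)
#define TOY_SZ_2M ((size_t)2 * 1024 * 1024)

enum toy_thp_mode { TOY_THP_NEVER, TOY_THP_WITHIN_SIZE, TOY_THP_ALWAYS };

struct toy_split {
	size_t huge_pages;  /* number of 2M superpages */
	size_t small_pages; /* number of 4K pages */
};

/* Best-case split of an object of 'size' bytes under the given mode. */
static struct toy_split toy_split_object(size_t size, enum toy_thp_mode mode)
{
	struct toy_split s = { 0, 0 };

	assert(size % TOY_SZ_4K == 0); /* objects are whole 4K pages here */

	switch (mode) {
	case TOY_THP_NEVER:
		/* Every 4K chunk is a separate page. */
		s.small_pages = size / TOY_SZ_4K;
		break;
	case TOY_THP_WITHIN_SIZE:
		/* Whole superpages that fit, remainder from 4K pages. */
		s.huge_pages = size / TOY_SZ_2M;
		s.small_pages = (size % TOY_SZ_2M) / TOY_SZ_4K;
		break;
	case TOY_THP_ALWAYS:
		/* Round up to whole superpages; may overallocate. */
		s.huge_pages = (size + TOY_SZ_2M - 1) / TOY_SZ_2M;
		break;
	}
	return s;
}

/* Total bytes backing the object for a given split. */
static size_t toy_split_bytes(struct toy_split s)
{
	return s.huge_pages * TOY_SZ_2M + s.small_pages * TOY_SZ_4K;
}
```

For a 5M object this gives 1280 small pages for "never", two superpages plus 256 small pages for "within_size", and three superpages (6M backing, so 1M of overallocation) for "always".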
On 29/07/2021 13:07, Daniel Vetter wrote:
> On Thu, Jul 29, 2021 at 1:19 PM Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
> > From: Tvrtko Ursulin tvrtko.ursulin@intel.com
> > Usage of Transparent Hugepages was disabled in 9987da4b5dcf ("drm/i915: Disable THP until we have a GPU read BW W/A"), but since it appears that the majority of performance regressions reported with an enabled IOMMU can be almost eliminated by turning them on, let's do that by adding a couple of Kconfig options.
> > To err on the side of safety we keep the current default in cases where IOMMU is not active, and only when it is active do we default to the "huge=within_size" mode. Although there would probably be wins from enabling them throughout, more extensive testing across benchmarks and platforms would need to be done.
> > With the patch and IOMMU enabled my local testing on a small Skylake part shows OglVSTangent regression being reduced from ~14% to ~2%.
> I guess the 14% regression is iommu disabled vs iommu enabled? Would be good to clarify that.
Should be clear from the first paragraph above - "...majority of performance regressions reported with an _enabled_ IOMMU can be almost eliminated...".
> > v2:
> > - Add Kconfig dependency to transparent hugepages and some help text.
> Uh I'm really not a huge fan of Kconfig for everything, especially for tuning stuff. Maybe, if there's a need, a module param for debugging, but otherwise can't we just pick the right default?
Kconfig is picking the right default so I do not see a problem with allowing an override from a deep enough menu. But I also do not feel so strongly about bikeshedding this to no kconfig, or a module param, or whatever - there are votes for all three options already, as usual. Main problem I have is actually..
> And it very much sounds like the right default here is "enable it unconditionally if we have iommu support".
.. about this - who knows? I will remind you of a certain VLK-20150 which I thought was very important for going forward but was falling on deaf ears for years. As such I am waiting for Eero to come back and improvise some unofficial testing. It's extra bewildering to me given how we had the facility and then shut it down just like that.
Regards,
Tvrtko
On Thu, Jul 29, 2021 at 2:21 PM Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
> On 29/07/2021 13:07, Daniel Vetter wrote:
> > On Thu, Jul 29, 2021 at 1:19 PM Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:
> > > From: Tvrtko Ursulin tvrtko.ursulin@intel.com
> > > Usage of Transparent Hugepages was disabled in 9987da4b5dcf ("drm/i915: Disable THP until we have a GPU read BW W/A"), but since it appears that the majority of performance regressions reported with an enabled IOMMU can be almost eliminated by turning them on, let's do that by adding a couple of Kconfig options.
> > > To err on the side of safety we keep the current default in cases where IOMMU is not active, and only when it is active do we default to the "huge=within_size" mode. Although there would probably be wins from enabling them throughout, more extensive testing across benchmarks and platforms would need to be done.
> > > With the patch and IOMMU enabled my local testing on a small Skylake part shows OglVSTangent regression being reduced from ~14% to ~2%.
> > I guess the 14% regression is iommu disabled vs iommu enabled? Would be good to clarify that.
> Should be clear from the first paragraph above - "...majority of performance regressions reported with an _enabled_ IOMMU can be almost eliminated...".
Yeah I inferred, but might be good to hammer that in by repeating, like
"reduced from 14% (for IOMMU on vs off case) to 2% (IOMMU on with THP enabled vs IOMMU off with THP disabled)"
> > > v2:
> > > - Add Kconfig dependency to transparent hugepages and some help text.
> > Uh I'm really not a huge fan of Kconfig for everything, especially for tuning stuff. Maybe, if there's a need, a module param for debugging, but otherwise can't we just pick the right default?
> Kconfig is picking the right default so I do not see a problem with allowing an override from a deep enough menu. But I also do not feel so strongly about bikeshedding this to no kconfig, or a module param, or whatever - there are votes for all three options already, as usual. Main problem I have is actually..
Yeah that's pretty much what Kconfig is abused for: Everyone brings their bikeshed because they're not quite happy, and it gets "resolved" by Kconfigs to give everyone what they want. It just leads to a combinatorial explosion that no one tests. Hence unless we have a demonstrated benefit of the choices there's going to be one default, and you get to decide (which you did).
> > And it very much sounds like the right default here is "enable it unconditionally if we have iommu support".
> .. about this - who knows? I will remind you of a certain VLK-20150 which I thought was very important for going forward but was falling on deaf ears for years. As such I am waiting for Eero to come back and improvise some unofficial testing. It's extra bewildering to me given how we had the facility and then shut it down just like that.
Oh sure the general performance tuning is terrible, and also the specific case of when to use THP. But we're looking at the very specific case of "IOMMU is enabled and it sucks away perf", and it looks like enabling THP is the answer. So let's just do that.
Ofc we don't have full perf data, but we never have that even with a nice perf lab (there's always more to benchmark than there's machine time), so just doing as good as we can is imo perfectly fine. You've put in the work (at least a bit), you get to pick the default until we find something new.
-Daniel
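The baseline question raised in this exchange comes down to which configuration each percentage is measured against. A small sketch of that arithmetic follows, assuming both figures are taken relative to the same "IOMMU off, THP disabled" score as suggested in the reply; the function name and the example scores used alongside it are made up purely to reproduce the ~14% and ~2% figures and are not measured data.

```c
#include <assert.h>

/*
 * Regression of 'measured' relative to 'baseline', in percent.
 * A positive result means 'measured' scored lower than the baseline.
 */
static double toy_regression_pct(double baseline, double measured)
{
	assert(baseline > 0.0);
	return 100.0 * (baseline - measured) / baseline;
}
```

With a hypothetical baseline score of 100 for IOMMU off with THP disabled, a score of 86 with IOMMU on gives a 14% regression, and a score of 98 with IOMMU on and THP enabled gives 2% against that same baseline.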
dri-devel@lists.freedesktop.org