This series continues support for 64K pages for discrete cards. It supersedes the 64K patches from https://patchwork.freedesktop.org/series/95686/#rev4 Changes since that series:
- set min alignment for DG2 to 2MB in i915_address_space_init - replace coloring with simpler 2MB VA alignment for lmem buffers - enforce alignment to 2MB for lmem objects on DG2 in i915_vma_insert - expand vma reservation to round up to 2MB on DG2 in i915_vma_insert - add alignment test
v2: rebase and fix for async vma that landed v3: * fix uapi doc typos * add needs_compact_pt flag patch * cleanup vma expansion to use vm->min_alignment instead of hard coding v4: * fix err return in igt_ppgtt_compact test * placate ci robot with explicit enum conversion in misaligned_pin * remove some blank lines v5: * fix obj alignment requirements querying for internal buffers that have no memory region associated. (fixes v3 bug)
Matthew Auld (3): drm/i915: enforce min GTT alignment for discrete cards drm/i915: support 64K GTT pages for discrete cards drm/i915/uapi: document behaviour for DG2 64K support
Ramalingam C (1): drm/i915: add needs_compact_pt flag
Robert Beckett (1): drm/i915: add gtt misalignment test
.../gpu/drm/i915/gem/selftests/huge_pages.c | 60 +++++ .../i915/gem/selftests/i915_gem_client_blt.c | 23 +- drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 108 ++++++++- drivers/gpu/drm/i915/gt/intel_gtt.c | 12 + drivers/gpu/drm/i915/gt/intel_gtt.h | 21 ++ drivers/gpu/drm/i915/gt/intel_ppgtt.c | 1 + drivers/gpu/drm/i915/i915_drv.h | 10 +- drivers/gpu/drm/i915/i915_pci.c | 2 + drivers/gpu/drm/i915/i915_vma.c | 9 + drivers/gpu/drm/i915/intel_device_info.h | 1 + drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 224 +++++++++++++++--- include/uapi/drm/i915_drm.h | 44 +++- 12 files changed, 463 insertions(+), 52 deletions(-)
From: Ramalingam C ramalingam.c@intel.com
Add a new platform flag, needs_compact_pt, to mark the requirement of compact pt layout support for the ppGTT when using 64K GTT pages.
With this flag has_64k_pages will only indicate requirement of 64K GTT page sizes or larger for device local memory access.
Suggested-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Robert Beckett bob.beckett@collabora.com --- drivers/gpu/drm/i915/i915_drv.h | 10 +++++++--- drivers/gpu/drm/i915/i915_pci.c | 2 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + 3 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 44c1f98144b4..1258b7779705 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1512,12 +1512,16 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
/* * Set this flag, when platform requires 64K GTT page sizes or larger for - * device local memory access. Also this flag implies that we require or - * at least support the compact PT layout for the ppGTT when using the 64K - * GTT pages. + * device local memory access. */ #define HAS_64K_PAGES(dev_priv) (INTEL_INFO(dev_priv)->has_64k_pages)
+/* Set this flag when platform doesn't allow both 64k pages and 4k pages in + * the same PT. this flag means we need to support compact PT layout for the + * ppGTT when using the 64K GTT pages. + */ +#define NEEDS_COMPACT_PT(dev_priv) (INTEL_INFO(dev_priv)->needs_compact_pt) + #define HAS_IPC(dev_priv) (INTEL_INFO(dev_priv)->display.has_ipc)
#define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i)) diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 4081fd50ba9d..799b56569ef5 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1028,6 +1028,7 @@ static const struct intel_device_info xehpsdv_info = { PLATFORM(INTEL_XEHPSDV), .display = { }, .has_64k_pages = 1, + .needs_compact_pt = 1, .platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VECS1) | BIT(VECS2) | BIT(VECS3) | @@ -1045,6 +1046,7 @@ static const struct intel_device_info dg2_info = { .media.rel = 55, PLATFORM(INTEL_DG2), .has_64k_pages = 1, + .needs_compact_pt = 1, .platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VECS1) | diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h index 3699b1c539ea..c8aaf646430c 100644 --- a/drivers/gpu/drm/i915/intel_device_info.h +++ b/drivers/gpu/drm/i915/intel_device_info.h @@ -130,6 +130,7 @@ enum intel_ppgtt_type { /* Keep has_* in alphabetical order */ \ func(has_64bit_reloc); \ func(has_64k_pages); \ + func(needs_compact_pt); \ func(gpu_reset_clobbers_display); \ func(has_reset_engine); \ func(has_global_mocs); \
On 1/25/22 20:35, Robert Beckett wrote:
From: Ramalingam C ramalingam.c@intel.com
Add a new platform flag, needs_compact_pt, to mark the requirement of compact pt layout support for the ppGTT when using 64K GTT pages.
With this flag has_64k_pages will only indicate requirement of 64K GTT page sizes or larger for device local memory access.
Suggested-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Robert Beckett bob.beckett@collabora.com
drivers/gpu/drm/i915/i915_drv.h | 10 +++++++--- drivers/gpu/drm/i915/i915_pci.c | 2 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + 3 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 44c1f98144b4..1258b7779705 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1512,12 +1512,16 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
/*
- Set this flag, when platform requires 64K GTT page sizes or larger for
- device local memory access. Also this flag implies that we require or
- at least support the compact PT layout for the ppGTT when using the 64K
- GTT pages.
Why do we remove these comment lines?
*/ #define HAS_64K_PAGES(dev_priv) (INTEL_INFO(dev_priv)->has_64k_pages)
- device local memory access.
+/* Set this flag when platform doesn't allow both 64k pages and 4k pages in
First line of multi-line comments should be empty.
- the same PT. this flag means we need to support compact PT layout for the
- ppGTT when using the 64K GTT pages.
- */
+#define NEEDS_COMPACT_PT(dev_priv) (INTEL_INFO(dev_priv)->needs_compact_pt)
#define HAS_IPC(dev_priv) (INTEL_INFO(dev_priv)->display.has_ipc)
#define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i))
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 4081fd50ba9d..799b56569ef5 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1028,6 +1028,7 @@ static const struct intel_device_info xehpsdv_info = { PLATFORM(INTEL_XEHPSDV), .display = { }, .has_64k_pages = 1,
- .needs_compact_pt = 1, .platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VECS1) | BIT(VECS2) | BIT(VECS3) |
@@ -1045,6 +1046,7 @@ static const struct intel_device_info dg2_info = { .media.rel = 55, PLATFORM(INTEL_DG2), .has_64k_pages = 1,
- .needs_compact_pt = 1, .platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VECS1) |
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h index 3699b1c539ea..c8aaf646430c 100644 --- a/drivers/gpu/drm/i915/intel_device_info.h +++ b/drivers/gpu/drm/i915/intel_device_info.h @@ -130,6 +130,7 @@ enum intel_ppgtt_type { /* Keep has_* in alphabetical order */ \ func(has_64bit_reloc); \ func(has_64k_pages); \
- func(needs_compact_pt); \ func(gpu_reset_clobbers_display); \ func(has_reset_engine); \ func(has_global_mocs); \
On 26/01/2022 13:49, Thomas Hellström (Intel) wrote:
On 1/25/22 20:35, Robert Beckett wrote:
From: Ramalingam C ramalingam.c@intel.com
Add a new platform flag, needs_compact_pt, to mark the requirement of compact pt layout support for the ppGTT when using 64K GTT pages.
With this flag has_64k_pages will only indicate requirement of 64K GTT page sizes or larger for device local memory access.
Suggested-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Robert Beckett bob.beckett@collabora.com
drivers/gpu/drm/i915/i915_drv.h | 10 +++++++--- drivers/gpu/drm/i915/i915_pci.c | 2 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + 3 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 44c1f98144b4..1258b7779705 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1512,12 +1512,16 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, /* * Set this flag, when platform requires 64K GTT page sizes or larger for
- device local memory access. Also this flag implies that we require or
- at least support the compact PT layout for the ppGTT when using
the 64K
- GTT pages.
Why do we remove these comment lines?
Because HAS_64K_PAGES now means just 64K page, it no longer means also requires compact pt. This is to support other products that will have 64K but not have the PDE non-sharing restriction in future.
Those lines moved to the next change NEEDS_COMPACT_PT, which is now separate.
- device local memory access.
*/ #define HAS_64K_PAGES(dev_priv) (INTEL_INFO(dev_priv)->has_64k_pages) +/* Set this flag when platform doesn't allow both 64k pages and 4k pages in
First line of multi-line comments should be empty.
thanks. I'm surprised checkpatch didnt spot that.
- the same PT. this flag means we need to support compact PT layout
for the
- ppGTT when using the 64K GTT pages.
- */
+#define NEEDS_COMPACT_PT(dev_priv) (INTEL_INFO(dev_priv)->needs_compact_pt)
#define HAS_IPC(dev_priv) (INTEL_INFO(dev_priv)->display.has_ipc) #define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i)) diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 4081fd50ba9d..799b56569ef5 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1028,6 +1028,7 @@ static const struct intel_device_info xehpsdv_info = { PLATFORM(INTEL_XEHPSDV), .display = { }, .has_64k_pages = 1, + .needs_compact_pt = 1, .platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VECS1) | BIT(VECS2) | BIT(VECS3) | @@ -1045,6 +1046,7 @@ static const struct intel_device_info dg2_info = { .media.rel = 55, PLATFORM(INTEL_DG2), .has_64k_pages = 1, + .needs_compact_pt = 1, .platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VECS1) | diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h index 3699b1c539ea..c8aaf646430c 100644 --- a/drivers/gpu/drm/i915/intel_device_info.h +++ b/drivers/gpu/drm/i915/intel_device_info.h @@ -130,6 +130,7 @@ enum intel_ppgtt_type { /* Keep has_* in alphabetical order */ \ func(has_64bit_reloc); \ func(has_64k_pages); \ + func(needs_compact_pt); \ func(gpu_reset_clobbers_display); \ func(has_reset_engine); \ func(has_global_mocs); \
On 1/26/22 18:11, Robert Beckett wrote:
On 26/01/2022 13:49, Thomas Hellström (Intel) wrote:
On 1/25/22 20:35, Robert Beckett wrote:
From: Ramalingam C ramalingam.c@intel.com
Add a new platform flag, needs_compact_pt, to mark the requirement of compact pt layout support for the ppGTT when using 64K GTT pages.
With this flag has_64k_pages will only indicate requirement of 64K GTT page sizes or larger for device local memory access.
Suggested-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Robert Beckett bob.beckett@collabora.com
drivers/gpu/drm/i915/i915_drv.h | 10 +++++++--- drivers/gpu/drm/i915/i915_pci.c | 2 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + 3 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 44c1f98144b4..1258b7779705 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1512,12 +1512,16 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, /* * Set this flag, when platform requires 64K GTT page sizes or larger for
- device local memory access. Also this flag implies that we
require or
- at least support the compact PT layout for the ppGTT when using
the 64K
- GTT pages.
Why do we remove these comment lines?
Because HAS_64K_PAGES now means just 64K page, it no longer means also requires compact pt. This is to support other products that will have 64K but not have the PDE non-sharing restriction in future.
Those lines moved to the next change NEEDS_COMPACT_PT, which is now separate.
Yes, NEEDS_COMPACT_PT indicates that compact is *required* but does "HAS_64K_PAGES" still mean compact is supported? That information is lost.
/Thomas
On 27/01/2022 09:37, Thomas Hellström (Intel) wrote:
On 1/26/22 18:11, Robert Beckett wrote:
On 26/01/2022 13:49, Thomas Hellström (Intel) wrote:
On 1/25/22 20:35, Robert Beckett wrote:
From: Ramalingam C ramalingam.c@intel.com
Add a new platform flag, needs_compact_pt, to mark the requirement of compact pt layout support for the ppGTT when using 64K GTT pages.
With this flag has_64k_pages will only indicate requirement of 64K GTT page sizes or larger for device local memory access.
Suggested-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Robert Beckett bob.beckett@collabora.com
drivers/gpu/drm/i915/i915_drv.h | 10 +++++++--- drivers/gpu/drm/i915/i915_pci.c | 2 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + 3 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 44c1f98144b4..1258b7779705 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1512,12 +1512,16 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, /* * Set this flag, when platform requires 64K GTT page sizes or larger for
- device local memory access. Also this flag implies that we
require or
- at least support the compact PT layout for the ppGTT when using
the 64K
- GTT pages.
Why do we remove these comment lines?
Because HAS_64K_PAGES now means just 64K page, it no longer means also requires compact pt. This is to support other products that will have 64K but not have the PDE non-sharing restriction in future.
Those lines moved to the next change NEEDS_COMPACT_PT, which is now separate.
Yes, NEEDS_COMPACT_PT indicates that compact is *required* but does "HAS_64K_PAGES" still mean compact is supported? That information is lost.
Not any more. I discussed the ambiguity of the original wording with mauld on irc. We came to the conclusion that HAS_64K_PAGES should mean just that, that the hw has support for 64K pages, and says nothing about compact-pt at all.
NEEDS_COMPACT_PT means that the hw has compact-pt support and the driver is required to use it as it is a hw limitation.
There will be other devices that can support compact-pt but do not mandate its use. In this case, the current code would not use them, but there is potential for some future opportunistic use of that in the driver if desired (currently expected to include accelerated move/clear). If any opportunistic use is added to the driver, a new flag can be added along with the code that uses it to indicate compact-pt availability that is not mandatory (HAS_COMPACT_PT most likely), but as there is no code requiring it currently it should not be added yet, and the comments left as this patch does.
/Thomas
On 27/01/2022 09:37, Thomas Hellström (Intel) wrote:
On 1/26/22 18:11, Robert Beckett wrote:
On 26/01/2022 13:49, Thomas Hellström (Intel) wrote:
On 1/25/22 20:35, Robert Beckett wrote:
From: Ramalingam C ramalingam.c@intel.com
Add a new platform flag, needs_compact_pt, to mark the requirement of compact pt layout support for the ppGTT when using 64K GTT pages.
With this flag has_64k_pages will only indicate requirement of 64K GTT page sizes or larger for device local memory access.
Suggested-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Robert Beckett bob.beckett@collabora.com
drivers/gpu/drm/i915/i915_drv.h | 10 +++++++--- drivers/gpu/drm/i915/i915_pci.c | 2 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + 3 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 44c1f98144b4..1258b7779705 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1512,12 +1512,16 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, /* * Set this flag, when platform requires 64K GTT page sizes or larger for
- device local memory access. Also this flag implies that we
require or
- at least support the compact PT layout for the ppGTT when using
the 64K
- GTT pages.
Why do we remove these comment lines?
Because HAS_64K_PAGES now means just 64K page, it no longer means also requires compact pt. This is to support other products that will have 64K but not have the PDE non-sharing restriction in future.
Those lines moved to the next change NEEDS_COMPACT_PT, which is now separate.
Yes, NEEDS_COMPACT_PT indicates that compact is *required* but does "HAS_64K_PAGES" still mean compact is supported? That information is lost.
Not any more. I discussed the ambiguity of the original wording with mauld on irc. We came to the conclusion that HAS_64K_PAGES should mean just that, that the hw has support for 64K pages, and says nothing about compact-pt at all.
NEEDS_COMPACT_PT means that the hw has compact-pt support and the driver is required to use it as it is a hw limitation.
There will be other devices that can support compact-pt but do not mandate its use. In this case, the current code would not use them, but there is potential for some future opportunistic use of that in the driver if desired (currently expected to include accelerated move/clear). If any opportunistic use is added to the driver, a new flag can be added along with the code that uses it to indicate compact-pt availability that is not mandatory (HAS_COMPACT_PT most likely), but as there is no code requiring it currently it should not be added yet, and the comments left as this patch does.
/Thomas
On 1/31/22 15:19, Robert Beckett wrote:
On 27/01/2022 09:37, Thomas Hellström (Intel) wrote:
On 1/26/22 18:11, Robert Beckett wrote:
On 26/01/2022 13:49, Thomas Hellström (Intel) wrote:
On 1/25/22 20:35, Robert Beckett wrote:
From: Ramalingam C ramalingam.c@intel.com
Add a new platform flag, needs_compact_pt, to mark the requirement of compact pt layout support for the ppGTT when using 64K GTT pages.
With this flag has_64k_pages will only indicate requirement of 64K GTT page sizes or larger for device local memory access.
Suggested-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Robert Beckett bob.beckett@collabora.com
drivers/gpu/drm/i915/i915_drv.h | 10 +++++++--- drivers/gpu/drm/i915/i915_pci.c | 2 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + 3 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 44c1f98144b4..1258b7779705 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1512,12 +1512,16 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, /* * Set this flag, when platform requires 64K GTT page sizes or larger for
- device local memory access. Also this flag implies that we
require or
- at least support the compact PT layout for the ppGTT when
using the 64K
- GTT pages.
Why do we remove these comment lines?
Because HAS_64K_PAGES now means just 64K page, it no longer means also requires compact pt. This is to support other products that will have 64K but not have the PDE non-sharing restriction in future.
Those lines moved to the next change NEEDS_COMPACT_PT, which is now separate.
Yes, NEEDS_COMPACT_PT indicates that compact is *required* but does "HAS_64K_PAGES" still mean compact is supported? That information is lost.
Not any more. I discussed the ambiguity of the original wording with mauld on irc. We came to the conclusion that HAS_64K_PAGES should mean just that, that the hw has support for 64K pages, and says nothing about compact-pt at all.
NEEDS_COMPACT_PT means that the hw has compact-pt support and the driver is required to use it as it is a hw limitation.
There will be other devices that can support compact-pt but do not mandate its use. In this case, the current code would not use them, but there is potential for some future opportunistic use of that in the driver if desired (currently expected to include accelerated move/clear). If any opportunistic use is added to the driver, a new flag can be added along with the code that uses it to indicate compact-pt availability that is not mandatory (HAS_COMPACT_PT most likely), but as there is no code requiring it currently it should not be added yet, and the comments left as this patch does.
Ok, Thanks for the clarification.
Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com
/Thomas
From: Matthew Auld matthew.auld@intel.com
For local-memory objects we need to align the GTT addresses to 64K, both for the ppgtt and ggtt.
We need to support vm->min_alignment > 4K, depending on the vm itself and the type of object we are inserting. With this in mind update the GTT selftests to take this into account.
For compact-pt we further align and pad lmem object GTT addresses to 2MB to ensure PDEs contain consistent page sizes as required by the HW.
v3: * use needs_compact_pt flag to discriminate between 64K and 64K with compact-pt * add i915_vm_obj_min_alignment * use i915_vm_obj_min_alignment to round up vma reservation if compact-pt instead of hard coding v5: * fix i915_vm_obj_min_alignment for internal objects which have no memory region
Signed-off-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Robert Beckett bob.beckett@collabora.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com --- .../i915/gem/selftests/i915_gem_client_blt.c | 23 +++-- drivers/gpu/drm/i915/gt/intel_gtt.c | 12 +++ drivers/gpu/drm/i915/gt/intel_gtt.h | 18 ++++ drivers/gpu/drm/i915/i915_vma.c | 9 ++ drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++------- 5 files changed, 117 insertions(+), 41 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c index c8ff8bf0986d..f0bfce53258f 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c @@ -39,6 +39,7 @@ struct tiled_blits { struct blit_buffer scratch; struct i915_vma *batch; u64 hole; + u64 align; u32 width; u32 height; }; @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng) goto err_free; }
- hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4); + t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! */ + t->align = max(t->align, + i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL)); + t->align = max(t->align, + i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM)); + + hole_size = 2 * round_up(WIDTH * HEIGHT * 4, t->align); hole_size *= 2; /* room to maneuver */ - hole_size += 2 * I915_GTT_MIN_ALIGNMENT; + hole_size += 2 * t->align; /* padding on either side */
mutex_lock(&t->ce->vm->mutex); memset(&hole, 0, sizeof(hole)); err = drm_mm_insert_node_in_range(&t->ce->vm->mm, &hole, - hole_size, 0, I915_COLOR_UNEVICTABLE, + hole_size, t->align, + I915_COLOR_UNEVICTABLE, 0, U64_MAX, DRM_MM_INSERT_BEST); if (!err) @@ -428,7 +436,7 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng) goto err_put; }
- t->hole = hole.start + I915_GTT_MIN_ALIGNMENT; + t->hole = hole.start + t->align; pr_info("Using hole at %llx\n", t->hole);
err = tiled_blits_create_buffers(t, WIDTH, HEIGHT, prng); @@ -455,7 +463,7 @@ static void tiled_blits_destroy(struct tiled_blits *t) static int tiled_blits_prepare(struct tiled_blits *t, struct rnd_state *prng) { - u64 offset = PAGE_ALIGN(t->width * t->height * 4); + u64 offset = round_up(t->width * t->height * 4, t->align); u32 *map; int err; int i; @@ -486,8 +494,7 @@ static int tiled_blits_prepare(struct tiled_blits *t,
static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng) { - u64 offset = - round_up(t->width * t->height * 4, 2 * I915_GTT_MIN_ALIGNMENT); + u64 offset = round_up(t->width * t->height * 4, 2 * t->align); int err;
/* We want to check position invariant tiling across GTT eviction */ @@ -500,7 +507,7 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
/* Reposition so that we overlap the old addresses, and slightly off */ err = tiled_blit(t, - &t->buffers[2], t->hole + I915_GTT_MIN_ALIGNMENT, + &t->buffers[2], t->hole + t->align, &t->buffers[1], t->hole + 3 * offset / 2); if (err) return err; diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c index 46be4197b93f..df23ebdfc994 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.c +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c @@ -223,6 +223,18 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass)
GEM_BUG_ON(!vm->total); drm_mm_init(&vm->mm, 0, vm->total); + + memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT, + ARRAY_SIZE(vm->min_alignment)); + + if (HAS_64K_PAGES(vm->i915) && NEEDS_COMPACT_PT(vm->i915)) { + vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M; + vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_2M; + } else if (HAS_64K_PAGES(vm->i915)) { + vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_64K; + vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_64K; + } + vm->mm.head_node.color = I915_COLOR_UNEVICTABLE;
INIT_LIST_HEAD(&vm->bound_list); diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index 8073438b67c8..ba9f040f8606 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -29,6 +29,8 @@ #include "i915_selftest.h" #include "i915_vma_resource.h" #include "i915_vma_types.h" +#include "i915_params.h" +#include "intel_memory_region.h"
#define I915_GFP_ALLOW_FAIL (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
@@ -223,6 +225,7 @@ struct i915_address_space { struct device *dma; u64 total; /* size addr space maps (ex. 2GB for ggtt) */ u64 reserved; /* size addr space reserved */ + u64 min_alignment[INTEL_MEMORY_STOLEN_LOCAL + 1];
unsigned int bind_async_flags;
@@ -384,6 +387,21 @@ i915_vm_has_scratch_64K(struct i915_address_space *vm) return vm->scratch_order == get_order(I915_GTT_PAGE_SIZE_64K); }
+static inline u64 i915_vm_min_alignment(struct i915_address_space *vm, + enum intel_memory_type type) +{ + return vm->min_alignment[type]; +} + +static inline u64 i915_vm_obj_min_alignment(struct i915_address_space *vm, + struct drm_i915_gem_object *obj) +{ + struct intel_memory_region *mr = READ_ONCE(obj->mm.region); + enum intel_memory_type type = mr ? mr->type : INTEL_MEMORY_SYSTEM; + + return i915_vm_min_alignment(vm, type); +} + static inline bool i915_vm_has_cache_coloring(struct i915_address_space *vm) { diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index b959e904c4d3..3b31d2ba6864 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -745,6 +745,14 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, end = min_t(u64, end, (1ULL << 32) - I915_GTT_PAGE_SIZE); GEM_BUG_ON(!IS_ALIGNED(end, I915_GTT_PAGE_SIZE));
+ alignment = max(alignment, i915_vm_obj_min_alignment(vma->vm, vma->obj)); + /* + * for compact-pt we round up the reservation to prevent + * any smaller pages being used within the same PDE + */ + if (NEEDS_COMPACT_PT(vma->vm->i915)) + size = round_up(size, alignment); + /* If binding the object/GGTT view requires more space than the entire * aperture has, reject it early before evicting everything in a vain * attempt to find space. @@ -757,6 +765,7 @@ i915_vma_insert(struct i915_vma *vma, struct i915_gem_ww_ctx *ww, }
color = 0; + if (i915_vm_has_cache_coloring(vma->vm)) color = vma->obj->cache_level;
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index fba1c8be1649..b80788a2b7f9 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -238,6 +238,8 @@ static int lowlevel_hole(struct i915_address_space *vm, u64 hole_start, u64 hole_end, unsigned long end_time) { + const unsigned int min_alignment = + i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM); I915_RND_STATE(seed_prng); struct i915_vma_resource *mock_vma_res; unsigned int size; @@ -251,9 +253,10 @@ static int lowlevel_hole(struct i915_address_space *vm, I915_RND_SUBSTATE(prng, seed_prng); struct drm_i915_gem_object *obj; unsigned int *order, count, n; - u64 hole_size; + u64 hole_size, aligned_size;
- hole_size = (hole_end - hole_start) >> size; + aligned_size = max_t(u32, ilog2(min_alignment), size); + hole_size = (hole_end - hole_start) >> aligned_size; if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32)) hole_size = KMALLOC_MAX_SIZE / sizeof(u32); count = hole_size >> 1; @@ -274,8 +277,8 @@ static int lowlevel_hole(struct i915_address_space *vm, } GEM_BUG_ON(!order);
- GEM_BUG_ON(count * BIT_ULL(size) > vm->total); - GEM_BUG_ON(hole_start + count * BIT_ULL(size) > hole_end); + GEM_BUG_ON(count * BIT_ULL(aligned_size) > vm->total); + GEM_BUG_ON(hole_start + count * BIT_ULL(aligned_size) > hole_end);
/* Ignore allocation failures (i.e. don't report them as * a test failure) as we are purposefully allocating very @@ -298,10 +301,10 @@ static int lowlevel_hole(struct i915_address_space *vm, }
for (n = 0; n < count; n++) { - u64 addr = hole_start + order[n] * BIT_ULL(size); + u64 addr = hole_start + order[n] * BIT_ULL(aligned_size); intel_wakeref_t wakeref;
- GEM_BUG_ON(addr + BIT_ULL(size) > vm->total); + GEM_BUG_ON(addr + BIT_ULL(aligned_size) > vm->total);
if (igt_timeout(end_time, "%s timed out before %d/%d\n", @@ -344,7 +347,7 @@ static int lowlevel_hole(struct i915_address_space *vm, }
mock_vma_res->bi.pages = obj->mm.pages; - mock_vma_res->node_size = BIT_ULL(size); + mock_vma_res->node_size = BIT_ULL(aligned_size); mock_vma_res->start = addr;
with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref) @@ -355,7 +358,7 @@ static int lowlevel_hole(struct i915_address_space *vm,
i915_random_reorder(order, count, &prng); for (n = 0; n < count; n++) { - u64 addr = hole_start + order[n] * BIT_ULL(size); + u64 addr = hole_start + order[n] * BIT_ULL(aligned_size); intel_wakeref_t wakeref;
GEM_BUG_ON(addr + BIT_ULL(size) > vm->total); @@ -399,8 +402,10 @@ static int fill_hole(struct i915_address_space *vm, { const u64 hole_size = hole_end - hole_start; struct drm_i915_gem_object *obj; + const unsigned int min_alignment = + i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM); const unsigned long max_pages = - min_t(u64, ULONG_MAX - 1, hole_size/2 >> PAGE_SHIFT); + min_t(u64, ULONG_MAX - 1, (hole_size / 2) >> ilog2(min_alignment)); const unsigned long max_step = max(int_sqrt(max_pages), 2UL); unsigned long npages, prime, flags; struct i915_vma *vma; @@ -441,14 +446,17 @@ static int fill_hole(struct i915_address_space *vm,
offset = p->offset; list_for_each_entry(obj, &objects, st_link) { + u64 aligned_size = round_up(obj->base.size, + min_alignment); + vma = i915_vma_instance(obj, vm, NULL); if (IS_ERR(vma)) continue;
if (p->step < 0) { - if (offset < hole_start + obj->base.size) + if (offset < hole_start + aligned_size) break; - offset -= obj->base.size; + offset -= aligned_size; }
err = i915_vma_pin(vma, 0, 0, offset | flags); @@ -470,22 +478,25 @@ static int fill_hole(struct i915_address_space *vm, i915_vma_unpin(vma);
if (p->step > 0) { - if (offset + obj->base.size > hole_end) + if (offset + aligned_size > hole_end) break; - offset += obj->base.size; + offset += aligned_size; } }
offset = p->offset; list_for_each_entry(obj, &objects, st_link) { + u64 aligned_size = round_up(obj->base.size, + min_alignment); + vma = i915_vma_instance(obj, vm, NULL); if (IS_ERR(vma)) continue;
if (p->step < 0) { - if (offset < hole_start + obj->base.size) + if (offset < hole_start + aligned_size) break; - offset -= obj->base.size; + offset -= aligned_size; }
if (!drm_mm_node_allocated(&vma->node) || @@ -506,22 +517,25 @@ static int fill_hole(struct i915_address_space *vm, }
if (p->step > 0) { - if (offset + obj->base.size > hole_end) + if (offset + aligned_size > hole_end) break; - offset += obj->base.size; + offset += aligned_size; } }
offset = p->offset; list_for_each_entry_reverse(obj, &objects, st_link) { + u64 aligned_size = round_up(obj->base.size, + min_alignment); + vma = i915_vma_instance(obj, vm, NULL); if (IS_ERR(vma)) continue;
if (p->step < 0) { - if (offset < hole_start + obj->base.size) + if (offset < hole_start + aligned_size) break; - offset -= obj->base.size; + offset -= aligned_size; }
err = i915_vma_pin(vma, 0, 0, offset | flags); @@ -543,22 +557,25 @@ static int fill_hole(struct i915_address_space *vm, i915_vma_unpin(vma);
if (p->step > 0) { - if (offset + obj->base.size > hole_end) + if (offset + aligned_size > hole_end) break; - offset += obj->base.size; + offset += aligned_size; } }
offset = p->offset; list_for_each_entry_reverse(obj, &objects, st_link) { + u64 aligned_size = round_up(obj->base.size, + min_alignment); + vma = i915_vma_instance(obj, vm, NULL); if (IS_ERR(vma)) continue;
if (p->step < 0) { - if (offset < hole_start + obj->base.size) + if (offset < hole_start + aligned_size) break; - offset -= obj->base.size; + offset -= aligned_size; }
if (!drm_mm_node_allocated(&vma->node) || @@ -579,9 +596,9 @@ static int fill_hole(struct i915_address_space *vm, }
if (p->step > 0) { - if (offset + obj->base.size > hole_end) + if (offset + aligned_size > hole_end) break; - offset += obj->base.size; + offset += aligned_size; } } } @@ -611,6 +628,7 @@ static int walk_hole(struct i915_address_space *vm, const u64 hole_size = hole_end - hole_start; const unsigned long max_pages = min_t(u64, ULONG_MAX - 1, hole_size >> PAGE_SHIFT); + unsigned long min_alignment; unsigned long flags; u64 size;
@@ -620,6 +638,8 @@ static int walk_hole(struct i915_address_space *vm, if (i915_is_ggtt(vm)) flags |= PIN_GLOBAL;
+ min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM); + for_each_prime_number_from(size, 1, max_pages) { struct drm_i915_gem_object *obj; struct i915_vma *vma; @@ -638,7 +658,7 @@ static int walk_hole(struct i915_address_space *vm,
for (addr = hole_start; addr + obj->base.size < hole_end; - addr += obj->base.size) { + addr += round_up(obj->base.size, min_alignment)) { err = i915_vma_pin(vma, 0, 0, addr | flags); if (err) { pr_err("%s bind failed at %llx + %llx [hole %llx- %llx] with err=%d\n", @@ -690,6 +710,7 @@ static int pot_hole(struct i915_address_space *vm, { struct drm_i915_gem_object *obj; struct i915_vma *vma; + unsigned int min_alignment; unsigned long flags; unsigned int pot; int err = 0; @@ -698,6 +719,8 @@ static int pot_hole(struct i915_address_space *vm, if (i915_is_ggtt(vm)) flags |= PIN_GLOBAL;
+ min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM); + obj = i915_gem_object_create_internal(vm->i915, 2 * I915_GTT_PAGE_SIZE); if (IS_ERR(obj)) return PTR_ERR(obj); @@ -710,13 +733,13 @@ static int pot_hole(struct i915_address_space *vm,
/* Insert a pair of pages across every pot boundary within the hole */ for (pot = fls64(hole_end - 1) - 1; - pot > ilog2(2 * I915_GTT_PAGE_SIZE); + pot > ilog2(2 * min_alignment); pot--) { u64 step = BIT_ULL(pot); u64 addr;
- for (addr = round_up(hole_start + I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE; - addr <= round_down(hole_end - 2*I915_GTT_PAGE_SIZE, step) - I915_GTT_PAGE_SIZE; + for (addr = round_up(hole_start + min_alignment, step) - min_alignment; + addr <= round_down(hole_end - (2 * min_alignment), step) - min_alignment; addr += step) { err = i915_vma_pin(vma, 0, 0, addr | flags); if (err) { @@ -761,6 +784,7 @@ static int drunk_hole(struct i915_address_space *vm, unsigned long end_time) { I915_RND_STATE(prng); + unsigned int min_alignment; unsigned int size; unsigned long flags;
@@ -768,15 +792,18 @@ static int drunk_hole(struct i915_address_space *vm, if (i915_is_ggtt(vm)) flags |= PIN_GLOBAL;
+ min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM); + /* Keep creating larger objects until one cannot fit into the hole */ for (size = 12; (hole_end - hole_start) >> size; size++) { struct drm_i915_gem_object *obj; unsigned int *order, count, n; struct i915_vma *vma; - u64 hole_size; + u64 hole_size, aligned_size; int err = -ENODEV;
- hole_size = (hole_end - hole_start) >> size; + aligned_size = max_t(u32, ilog2(min_alignment), size); + hole_size = (hole_end - hole_start) >> aligned_size; if (hole_size > KMALLOC_MAX_SIZE / sizeof(u32)) hole_size = KMALLOC_MAX_SIZE / sizeof(u32); count = hole_size >> 1; @@ -816,7 +843,7 @@ static int drunk_hole(struct i915_address_space *vm, GEM_BUG_ON(vma->size != BIT_ULL(size));
for (n = 0; n < count; n++) { - u64 addr = hole_start + order[n] * BIT_ULL(size); + u64 addr = hole_start + order[n] * BIT_ULL(aligned_size);
err = i915_vma_pin(vma, 0, 0, addr | flags); if (err) { @@ -868,11 +895,14 @@ static int __shrink_hole(struct i915_address_space *vm, { struct drm_i915_gem_object *obj; unsigned long flags = PIN_OFFSET_FIXED | PIN_USER; + unsigned int min_alignment; unsigned int order = 12; LIST_HEAD(objects); int err = 0; u64 addr;
+ min_alignment = i915_vm_min_alignment(vm, INTEL_MEMORY_SYSTEM); + /* Keep creating larger objects until one cannot fit into the hole */ for (addr = hole_start; addr < hole_end; ) { struct i915_vma *vma; @@ -913,7 +943,7 @@ static int __shrink_hole(struct i915_address_space *vm, }
i915_vma_unpin(vma); - addr += size; + addr += round_up(size, min_alignment);
/* * Since we are injecting allocation faults at random intervals,
On 1/25/22 20:35, Robert Beckett wrote:
From: Matthew Auld matthew.auld@intel.com
For local-memory objects we need to align the GTT addresses to 64K, both for the ppgtt and ggtt.
We need to support vm->min_alignment > 4K, depending on the vm itself and the type of object we are inserting. With this in mind update the GTT selftests to take this into account.
For compact-pt we further align and pad lmem object GTT addresses to 2MB to ensure PDEs contain consistent page sizes as required by the HW.
v3:
- use needs_compact_pt flag to discriminate between 64K and 64K with compact-pt
- add i915_vm_obj_min_alignment
- use i915_vm_obj_min_alignment to round up vma reservation if compact-pt instead of hard coding
v5:
- fix i915_vm_obj_min_alignment for internal objects which have no memory region
Signed-off-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Robert Beckett bob.beckett@collabora.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com
.../i915/gem/selftests/i915_gem_client_blt.c | 23 +++-- drivers/gpu/drm/i915/gt/intel_gtt.c | 12 +++ drivers/gpu/drm/i915/gt/intel_gtt.h | 18 ++++ drivers/gpu/drm/i915/i915_vma.c | 9 ++ drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++------- 5 files changed, 117 insertions(+), 41 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c index c8ff8bf0986d..f0bfce53258f 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c @@ -39,6 +39,7 @@ struct tiled_blits { struct blit_buffer scratch; struct i915_vma *batch; u64 hole;
- u64 align; u32 width; u32 height; };
@@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng) goto err_free; }
- hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4);
- t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! */
- t->align = max(t->align,
i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL));
- t->align = max(t->align,
i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
Don't we always end up with 2M here, regardless of the vm restrictions?
On 26/01/2022 15:45, Thomas Hellström (Intel) wrote:
On 1/25/22 20:35, Robert Beckett wrote:
From: Matthew Auld matthew.auld@intel.com
For local-memory objects we need to align the GTT addresses to 64K, both for the ppgtt and ggtt.
We need to support vm->min_alignment > 4K, depending on the vm itself and the type of object we are inserting. With this in mind update the GTT selftests to take this into account.
For compact-pt we further align and pad lmem object GTT addresses to 2MB to ensure PDEs contain consistent page sizes as required by the HW.
v3: * use needs_compact_pt flag to discriminate between 64K and 64K with compact-pt * add i915_vm_obj_min_alignment * use i915_vm_obj_min_alignment to round up vma reservation if compact-pt instead of hard coding v5: * fix i915_vm_obj_min_alignment for internal objects which have no memory region
Signed-off-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Robert Beckett bob.beckett@collabora.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com
.../i915/gem/selftests/i915_gem_client_blt.c | 23 +++-- drivers/gpu/drm/i915/gt/intel_gtt.c | 12 +++ drivers/gpu/drm/i915/gt/intel_gtt.h | 18 ++++ drivers/gpu/drm/i915/i915_vma.c | 9 ++ drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 ++++++++++++------- 5 files changed, 117 insertions(+), 41 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c index c8ff8bf0986d..f0bfce53258f 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c @@ -39,6 +39,7 @@ struct tiled_blits { struct blit_buffer scratch; struct i915_vma *batch; u64 hole; + u64 align; u32 width; u32 height; }; @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng) goto err_free; } - hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4); + t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! */ + t->align = max(t->align, + i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL)); + t->align = max(t->align, + i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM));
Don't we always end up with 2M here, regardless of the vm restrictions?
agreed. I will drop the 2M worst case.
From: Matthew Auld matthew.auld@intel.com
discrete cards optimise 64K GTT pages for local-memory, since everything should be allocated at 64K granularity. We say goodbye to sparse entries, and instead get a compact 256B page-table for 64K pages, which should be more cache friendly. 4K pages for local-memory are no longer supported by the HW.
v4: don't return uninitialized err in igt_ppgtt_compact Reported-by: kernel test robot lkp@intel.com
Signed-off-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Stuart Summers stuart.summers@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Robert Beckett bob.beckett@collabora.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com --- .../gpu/drm/i915/gem/selftests/huge_pages.c | 60 ++++++++++ drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 108 +++++++++++++++++- drivers/gpu/drm/i915/gt/intel_gtt.h | 3 + drivers/gpu/drm/i915/gt/intel_ppgtt.c | 1 + 4 files changed, 169 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c index f36191ebf964..a7d9bdb85d70 100644 --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c @@ -1478,6 +1478,65 @@ static int igt_ppgtt_sanity_check(void *arg) return err; }
+static int igt_ppgtt_compact(void *arg) +{ + struct drm_i915_private *i915 = arg; + struct drm_i915_gem_object *obj; + int err; + + /* + * Simple test to catch issues with compact 64K pages -- since the pt is + * compacted to 256B that gives us 32 entries per pt, however since the + * backing page for the pt is 4K, any extra entries we might incorrectly + * write out should be ignored by the HW. If ever hit such a case this + * test should catch it since some of our writes would land in scratch. + */ + + if (!HAS_64K_PAGES(i915)) { + pr_info("device lacks compact 64K page support, skipping\n"); + return 0; + } + + if (!HAS_LMEM(i915)) { + pr_info("device lacks LMEM support, skipping\n"); + return 0; + } + + /* We want the range to cover multiple page-table boundaries. */ + obj = i915_gem_object_create_lmem(i915, SZ_4M, 0); + if (IS_ERR(obj)) + return PTR_ERR(obj); + + err = i915_gem_object_pin_pages_unlocked(obj); + if (err) + goto out_put; + + if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) { + pr_info("LMEM compact unable to allocate huge-page(s)\n"); + goto out_unpin; + } + + /* + * Disable 2M GTT pages by forcing the page-size to 64K for the GTT + * insertion. + */ + obj->mm.page_sizes.sg = I915_GTT_PAGE_SIZE_64K; + + err = igt_write_huge(i915, obj); + if (err) + pr_err("LMEM compact write-huge failed\n"); + +out_unpin: + i915_gem_object_unpin_pages(obj); +out_put: + i915_gem_object_put(obj); + + if (err == -ENOMEM) + err = 0; + + return err; +} + static int igt_tmpfs_fallback(void *arg) { struct drm_i915_private *i915 = arg; @@ -1735,6 +1794,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *i915) SUBTEST(igt_tmpfs_fallback), SUBTEST(igt_ppgtt_smoke_huge), SUBTEST(igt_ppgtt_sanity_check), + SUBTEST(igt_ppgtt_compact), };
if (!HAS_PPGTT(i915)) { diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index c43e724afa9f..62471730266c 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -233,6 +233,8 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm, start, end, lvl); } else { unsigned int count; + unsigned int pte = gen8_pd_index(start, 0); + unsigned int num_ptes; u64 *vaddr;
count = gen8_pt_count(start, end); @@ -242,10 +244,18 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm, atomic_read(&pt->used)); GEM_BUG_ON(!count || count >= atomic_read(&pt->used));
+ num_ptes = count; + if (pt->is_compact) { + GEM_BUG_ON(num_ptes % 16); + GEM_BUG_ON(pte % 16); + num_ptes /= 16; + pte /= 16; + } + vaddr = px_vaddr(pt); - memset64(vaddr + gen8_pd_index(start, 0), + memset64(vaddr + pte, vm->scratch[0]->encode, - count); + num_ptes);
atomic_sub(count, &pt->used); start += count; @@ -453,6 +463,95 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt, return idx; }
+static void +xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm, + struct i915_vma_resource *vma_res, + struct sgt_dma *iter, + enum i915_cache_level cache_level, + u32 flags) +{ + const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags); + unsigned int rem = sg_dma_len(iter->sg); + u64 start = vma_res->start; + + GEM_BUG_ON(!i915_vm_is_4lvl(vm)); + + do { + struct i915_page_directory * const pdp = + gen8_pdp_for_page_address(vm, start); + struct i915_page_directory * const pd = + i915_pd_entry(pdp, __gen8_pte_index(start, 2)); + struct i915_page_table *pt = + i915_pt_entry(pd, __gen8_pte_index(start, 1)); + gen8_pte_t encode = pte_encode; + unsigned int page_size; + gen8_pte_t *vaddr; + u16 index, max; + + max = I915_PDES; + + if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_2M && + IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) && + rem >= I915_GTT_PAGE_SIZE_2M && + !__gen8_pte_index(start, 0)) { + index = __gen8_pte_index(start, 1); + encode |= GEN8_PDE_PS_2M; + page_size = I915_GTT_PAGE_SIZE_2M; + + vaddr = px_vaddr(pd); + } else { + if (encode & GEN12_PPGTT_PTE_LM) { + GEM_BUG_ON(__gen8_pte_index(start, 0) % 16); + GEM_BUG_ON(rem < I915_GTT_PAGE_SIZE_64K); + GEM_BUG_ON(!IS_ALIGNED(iter->dma, + I915_GTT_PAGE_SIZE_64K)); + + index = __gen8_pte_index(start, 0) / 16; + page_size = I915_GTT_PAGE_SIZE_64K; + + max /= 16; + + vaddr = px_vaddr(pd); + vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K; + + pt->is_compact = true; + } else { + GEM_BUG_ON(pt->is_compact); + index = __gen8_pte_index(start, 0); + page_size = I915_GTT_PAGE_SIZE; + } + + vaddr = px_vaddr(pt); + } + + do { + GEM_BUG_ON(rem < page_size); + vaddr[index++] = encode | iter->dma; + + start += page_size; + iter->dma += page_size; + rem -= page_size; + if (iter->dma >= iter->max) { + iter->sg = __sg_next(iter->sg); + if (!iter->sg) + break; + + rem = sg_dma_len(iter->sg); + if (!rem) + break; + + iter->dma = sg_dma_address(iter->sg); + iter->max = iter->dma + rem; + + if (unlikely(!IS_ALIGNED(iter->dma, page_size))) + break; + } + } while (rem >= page_size && index < max); + + vma_res->page_sizes_gtt |= page_size; + } while (iter->sg && sg_dma_len(iter->sg)); +} + static void gen8_ppgtt_insert_huge(struct i915_address_space *vm, struct i915_vma_resource *vma_res, struct sgt_dma *iter, @@ -586,7 +685,10 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm, struct sgt_dma iter = sgt_dma(vma_res);
if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) { - gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags); + if (HAS_64K_PAGES(vm->i915)) + xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags); + else + gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags); } else { u64 idx = vma_res->start >> GEN8_PTE_SHIFT;
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index ba9f040f8606..e6ce0be6d484 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -92,6 +92,8 @@ typedef u64 gen8_pte_t;
#define GEN12_GGTT_PTE_LM BIT_ULL(1)
+#define GEN12_PDE_64K BIT(6) + /* * Cacheability Control is a 4-bit value. The low three bits are stored in bits * 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE. @@ -160,6 +162,7 @@ struct i915_page_table { atomic_t used; struct i915_page_table *stash; }; + bool is_compact; };
struct i915_page_directory { diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c index 48e6e2f87700..043652dc6892 100644 --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c @@ -26,6 +26,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm) return ERR_PTR(-ENOMEM); }
+ pt->is_compact = false; atomic_set(&pt->used, 0); return pt; }
On 1/25/22 20:35, Robert Beckett wrote:
From: Matthew Auld matthew.auld@intel.com
discrete cards optimise 64K GTT pages for local-memory, since everything should be allocated at 64K granularity. We say goodbye to sparse entries, and instead get a compact 256B page-table for 64K pages, which should be more cache friendly. 4K pages for local-memory are no longer supported by the HW.
v4: don't return uninitialized err in igt_ppgtt_compact Reported-by: kernel test robot lkp@intel.com
Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com
Signed-off-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Stuart Summers stuart.summers@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Robert Beckett bob.beckett@collabora.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com
.../gpu/drm/i915/gem/selftests/huge_pages.c | 60 ++++++++++ drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 108 +++++++++++++++++- drivers/gpu/drm/i915/gt/intel_gtt.h | 3 + drivers/gpu/drm/i915/gt/intel_ppgtt.c | 1 + 4 files changed, 169 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c index f36191ebf964..a7d9bdb85d70 100644 --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c @@ -1478,6 +1478,65 @@ static int igt_ppgtt_sanity_check(void *arg) return err; }
+static int igt_ppgtt_compact(void *arg) +{
- struct drm_i915_private *i915 = arg;
- struct drm_i915_gem_object *obj;
- int err;
- /*
* Simple test to catch issues with compact 64K pages -- since the pt is
* compacted to 256B that gives us 32 entries per pt, however since the
* backing page for the pt is 4K, any extra entries we might incorrectly
* write out should be ignored by the HW. If ever hit such a case this
* test should catch it since some of our writes would land in scratch.
*/
- if (!HAS_64K_PAGES(i915)) {
pr_info("device lacks compact 64K page support, skipping\n");
return 0;
- }
- if (!HAS_LMEM(i915)) {
pr_info("device lacks LMEM support, skipping\n");
return 0;
- }
- /* We want the range to cover multiple page-table boundaries. */
- obj = i915_gem_object_create_lmem(i915, SZ_4M, 0);
- if (IS_ERR(obj))
return PTR_ERR(obj);
- err = i915_gem_object_pin_pages_unlocked(obj);
- if (err)
goto out_put;
- if (obj->mm.page_sizes.phys < I915_GTT_PAGE_SIZE_64K) {
pr_info("LMEM compact unable to allocate huge-page(s)\n");
goto out_unpin;
- }
- /*
* Disable 2M GTT pages by forcing the page-size to 64K for the GTT
* insertion.
*/
- obj->mm.page_sizes.sg = I915_GTT_PAGE_SIZE_64K;
- err = igt_write_huge(i915, obj);
- if (err)
pr_err("LMEM compact write-huge failed\n");
+out_unpin:
- i915_gem_object_unpin_pages(obj);
+out_put:
- i915_gem_object_put(obj);
- if (err == -ENOMEM)
err = 0;
- return err;
+}
- static int igt_tmpfs_fallback(void *arg) { struct drm_i915_private *i915 = arg;
@@ -1735,6 +1794,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *i915) SUBTEST(igt_tmpfs_fallback), SUBTEST(igt_ppgtt_smoke_huge), SUBTEST(igt_ppgtt_sanity_check),
SUBTEST(igt_ppgtt_compact),
};
if (!HAS_PPGTT(i915)) {
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index c43e724afa9f..62471730266c 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -233,6 +233,8 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm, start, end, lvl); } else { unsigned int count;
unsigned int pte = gen8_pd_index(start, 0);
unsigned int num_ptes; u64 *vaddr; count = gen8_pt_count(start, end);
@@ -242,10 +244,18 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm, atomic_read(&pt->used)); GEM_BUG_ON(!count || count >= atomic_read(&pt->used));
num_ptes = count;
if (pt->is_compact) {
GEM_BUG_ON(num_ptes % 16);
GEM_BUG_ON(pte % 16);
num_ptes /= 16;
pte /= 16;
}
vaddr = px_vaddr(pt);
memset64(vaddr + gen8_pd_index(start, 0),
memset64(vaddr + pte, vm->scratch[0]->encode,
count);
num_ptes); atomic_sub(count, &pt->used); start += count;
@@ -453,6 +463,95 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt, return idx; }
+static void +xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
struct i915_vma_resource *vma_res,
struct sgt_dma *iter,
enum i915_cache_level cache_level,
u32 flags)
+{
- const gen8_pte_t pte_encode = vm->pte_encode(0, cache_level, flags);
- unsigned int rem = sg_dma_len(iter->sg);
- u64 start = vma_res->start;
- GEM_BUG_ON(!i915_vm_is_4lvl(vm));
- do {
struct i915_page_directory * const pdp =
gen8_pdp_for_page_address(vm, start);
struct i915_page_directory * const pd =
i915_pd_entry(pdp, __gen8_pte_index(start, 2));
struct i915_page_table *pt =
i915_pt_entry(pd, __gen8_pte_index(start, 1));
gen8_pte_t encode = pte_encode;
unsigned int page_size;
gen8_pte_t *vaddr;
u16 index, max;
max = I915_PDES;
if (vma_res->bi.page_sizes.sg & I915_GTT_PAGE_SIZE_2M &&
IS_ALIGNED(iter->dma, I915_GTT_PAGE_SIZE_2M) &&
rem >= I915_GTT_PAGE_SIZE_2M &&
!__gen8_pte_index(start, 0)) {
index = __gen8_pte_index(start, 1);
encode |= GEN8_PDE_PS_2M;
page_size = I915_GTT_PAGE_SIZE_2M;
vaddr = px_vaddr(pd);
} else {
if (encode & GEN12_PPGTT_PTE_LM) {
GEM_BUG_ON(__gen8_pte_index(start, 0) % 16);
GEM_BUG_ON(rem < I915_GTT_PAGE_SIZE_64K);
GEM_BUG_ON(!IS_ALIGNED(iter->dma,
I915_GTT_PAGE_SIZE_64K));
index = __gen8_pte_index(start, 0) / 16;
page_size = I915_GTT_PAGE_SIZE_64K;
max /= 16;
vaddr = px_vaddr(pd);
vaddr[__gen8_pte_index(start, 1)] |= GEN12_PDE_64K;
pt->is_compact = true;
} else {
GEM_BUG_ON(pt->is_compact);
index = __gen8_pte_index(start, 0);
page_size = I915_GTT_PAGE_SIZE;
}
vaddr = px_vaddr(pt);
}
do {
GEM_BUG_ON(rem < page_size);
vaddr[index++] = encode | iter->dma;
start += page_size;
iter->dma += page_size;
rem -= page_size;
if (iter->dma >= iter->max) {
iter->sg = __sg_next(iter->sg);
if (!iter->sg)
break;
rem = sg_dma_len(iter->sg);
if (!rem)
break;
iter->dma = sg_dma_address(iter->sg);
iter->max = iter->dma + rem;
if (unlikely(!IS_ALIGNED(iter->dma, page_size)))
break;
}
} while (rem >= page_size && index < max);
vma_res->page_sizes_gtt |= page_size;
- } while (iter->sg && sg_dma_len(iter->sg));
+}
- static void gen8_ppgtt_insert_huge(struct i915_address_space *vm, struct i915_vma_resource *vma_res, struct sgt_dma *iter,
@@ -586,7 +685,10 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm, struct sgt_dma iter = sgt_dma(vma_res);
if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
if (HAS_64K_PAGES(vm->i915))
xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
else
} else { u64 idx = vma_res->start >> GEN8_PTE_SHIFT;gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h index ba9f040f8606..e6ce0be6d484 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.h +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h @@ -92,6 +92,8 @@ typedef u64 gen8_pte_t;
#define GEN12_GGTT_PTE_LM BIT_ULL(1)
+#define GEN12_PDE_64K BIT(6)
- /*
- Cacheability Control is a 4-bit value. The low three bits are stored in bits
- 3:1 of the PTE, while the fourth bit is stored in bit 11 of the PTE.
@@ -160,6 +162,7 @@ struct i915_page_table { atomic_t used; struct i915_page_table *stash; };
bool is_compact; };
struct i915_page_directory {
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c index 48e6e2f87700..043652dc6892 100644 --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c @@ -26,6 +26,7 @@ struct i915_page_table *alloc_pt(struct i915_address_space *vm) return ERR_PTR(-ENOMEM); }
- pt->is_compact = false; atomic_set(&pt->used, 0); return pt; }
add test to check handling of misaligned offsets and sizes
v4: * remove spurious blank lines * explicitly cast intel_region_id to intel_memory_type in misaligned_pin Reported-by: kernel test robot lkp@intel.com
Signed-off-by: Robert Beckett bob.beckett@collabora.com --- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 128 ++++++++++++++++++ 1 file changed, 128 insertions(+)
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index b80788a2b7f9..f082b5ff3b5e 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -22,10 +22,12 @@ * */
+#include "gt/intel_gtt.h" #include <linux/list_sort.h> #include <linux/prime_numbers.h>
#include "gem/i915_gem_context.h" +#include "gem/i915_gem_region.h" #include "gem/selftests/mock_context.h" #include "gt/intel_context.h" #include "gt/intel_gpu_commands.h" @@ -1067,6 +1069,120 @@ static int shrink_boom(struct i915_address_space *vm, return err; }
+static int misaligned_case(struct i915_address_space *vm, struct intel_memory_region *mr, + u64 addr, u64 size, unsigned long flags) +{ + struct drm_i915_gem_object *obj; + struct i915_vma *vma; + int err = 0; + u64 expected_vma_size, expected_node_size; + + obj = i915_gem_object_create_region(mr, size, 0, 0); + if (IS_ERR(obj)) + return PTR_ERR(obj); + + vma = i915_vma_instance(obj, vm, NULL); + if (IS_ERR(vma)) { + err = PTR_ERR(vma); + goto err_put; + } + + err = i915_vma_pin(vma, 0, 0, addr | flags); + if (err) + goto err_put; + i915_vma_unpin(vma); + + if (!drm_mm_node_allocated(&vma->node)) { + err = -EINVAL; + goto err_put; + } + + if (i915_vma_misplaced(vma, 0, 0, addr | flags)) { + err = -EINVAL; + goto err_put; + } + + expected_vma_size = round_up(size, 1 << (ffs(vma->resource->page_sizes_gtt) - 1)); + expected_node_size = expected_vma_size; + + if (IS_DG2(vm->i915) && i915_gem_object_is_lmem(obj)) { + /* dg2 should expand lmem node to 2MB */ + expected_vma_size = round_up(size, I915_GTT_PAGE_SIZE_64K); + expected_node_size = round_up(size, I915_GTT_PAGE_SIZE_2M); + } + + if (vma->size != expected_vma_size || vma->node.size != expected_node_size) { + err = i915_vma_unbind(vma); + err = -EBADSLT; + goto err_put; + } + + err = i915_vma_unbind(vma); + if (err) + goto err_put; + + GEM_BUG_ON(drm_mm_node_allocated(&vma->node)); + +err_put: + i915_gem_object_put(obj); + cleanup_freed_objects(vm->i915); + return err; +} + +static int misaligned_pin(struct i915_address_space *vm, + u64 hole_start, u64 hole_end, + unsigned long end_time) +{ + struct intel_memory_region *mr; + enum intel_region_id id; + unsigned long flags = PIN_OFFSET_FIXED | PIN_USER; + int err = 0; + u64 hole_size = hole_end - hole_start; + + if (i915_is_ggtt(vm)) + flags |= PIN_GLOBAL; + + for_each_memory_region(mr, vm->i915, id) { + u64 min_alignment = i915_vm_min_alignment(vm, (enum intel_memory_type)id); + u64 size = min_alignment; + u64 addr = round_up(hole_start + (hole_size / 2), min_alignment); + + /* we can't test < 4k alignment due to flags being encoded in lower bits */ + if (min_alignment != I915_GTT_PAGE_SIZE_4K) { + err = misaligned_case(vm, mr, addr + (min_alignment / 2), size, flags); + /* misaligned should error with -EINVAL*/ + if (!err) + err = -EBADSLT; + if (err != -EINVAL) + return err; + } + + /* test for vma->size expansion to min page size */ + err = misaligned_case(vm, mr, addr, PAGE_SIZE, flags); + if (min_alignment > hole_size) { + if (!err) + err = -EBADSLT; + else if (err == -ENOSPC) + err = 0; + } + if (err) + return err; + + /* test for intermediate size not expanding vma->size for large alignments */ + err = misaligned_case(vm, mr, addr, size / 2, flags); + if (min_alignment > hole_size) { + if (!err) + err = -EBADSLT; + else if (err == -ENOSPC) + err = 0; + } + if (err) + return err; + } + + return 0; +} + static int exercise_ppgtt(struct drm_i915_private *dev_priv, int (*func)(struct i915_address_space *vm, u64 hole_start, u64 hole_end, @@ -1136,6 +1252,11 @@ static int igt_ppgtt_shrink_boom(void *arg) return exercise_ppgtt(arg, shrink_boom); }
+static int igt_ppgtt_misaligned_pin(void *arg) +{ + return exercise_ppgtt(arg, misaligned_pin); +} + static int sort_holes(void *priv, const struct list_head *A, const struct list_head *B) { @@ -1208,6 +1329,11 @@ static int igt_ggtt_lowlevel(void *arg) return exercise_ggtt(arg, lowlevel_hole); }
+static int igt_ggtt_misaligned_pin(void *arg) +{ + return exercise_ggtt(arg, misaligned_pin); +} + static int igt_ggtt_page(void *arg) { const unsigned int count = PAGE_SIZE/sizeof(u32); @@ -2180,12 +2306,14 @@ int i915_gem_gtt_live_selftests(struct drm_i915_private *i915) SUBTEST(igt_ppgtt_fill), SUBTEST(igt_ppgtt_shrink), SUBTEST(igt_ppgtt_shrink_boom), + SUBTEST(igt_ppgtt_misaligned_pin), SUBTEST(igt_ggtt_lowlevel), SUBTEST(igt_ggtt_drunk), SUBTEST(igt_ggtt_walk), SUBTEST(igt_ggtt_pot), SUBTEST(igt_ggtt_fill), SUBTEST(igt_ggtt_page), + SUBTEST(igt_ggtt_misaligned_pin), SUBTEST(igt_cs_tlb), };
On 1/25/22 20:35, Robert Beckett wrote:
add test to check handling of misaligned offsets and sizes
v4:
- remove spurious blank lines
- explicitly cast intel_region_id to intel_memory_type in misaligned_pin
Reported-by: kernel test robot lkp@intel.com
Signed-off-by: Robert Beckett bob.beckett@collabora.com
drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 128 ++++++++++++++++++ 1 file changed, 128 insertions(+)
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index b80788a2b7f9..f082b5ff3b5e 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -22,10 +22,12 @@
*/
+#include "gt/intel_gtt.h" #include <linux/list_sort.h> #include <linux/prime_numbers.h>
#include "gem/i915_gem_context.h" +#include "gem/i915_gem_region.h" #include "gem/selftests/mock_context.h" #include "gt/intel_context.h" #include "gt/intel_gpu_commands.h" @@ -1067,6 +1069,120 @@ static int shrink_boom(struct i915_address_space *vm, return err; }
+static int misaligned_case(struct i915_address_space *vm, struct intel_memory_region *mr,
u64 addr, u64 size, unsigned long flags)
+{
- struct drm_i915_gem_object *obj;
- struct i915_vma *vma;
- int err = 0;
- u64 expected_vma_size, expected_node_size;
- obj = i915_gem_object_create_region(mr, size, 0, 0);
- if (IS_ERR(obj))
return PTR_ERR(obj);
- vma = i915_vma_instance(obj, vm, NULL);
- if (IS_ERR(vma)) {
err = PTR_ERR(vma);
goto err_put;
- }
- err = i915_vma_pin(vma, 0, 0, addr | flags);
- if (err)
goto err_put;
- i915_vma_unpin(vma);
- if (!drm_mm_node_allocated(&vma->node)) {
err = -EINVAL;
goto err_put;
- }
- if (i915_vma_misplaced(vma, 0, 0, addr | flags)) {
err = -EINVAL;
goto err_put;
- }
- expected_vma_size = round_up(size, 1 << (ffs(vma->resource->page_sizes_gtt) - 1));
- expected_node_size = expected_vma_size;
- if (IS_DG2(vm->i915) && i915_gem_object_is_lmem(obj)) {
/* dg2 should expand lmem node to 2MB */
Should this test be NEEDS_COMPACT_PT()?
Otherwise LGTM. Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com
On 26/01/2022 14:05, Thomas Hellström (Intel) wrote:
On 1/25/22 20:35, Robert Beckett wrote:
add test to check handling of misaligned offsets and sizes
v4: * remove spurious blank lines * explicitly cast intel_region_id to intel_memory_type in misaligned_pin Reported-by: kernel test robot lkp@intel.com
Signed-off-by: Robert Beckett bob.beckett@collabora.com
drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 128 ++++++++++++++++++ 1 file changed, 128 insertions(+)
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index b80788a2b7f9..f082b5ff3b5e 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -22,10 +22,12 @@ * */ +#include "gt/intel_gtt.h" #include <linux/list_sort.h> #include <linux/prime_numbers.h> #include "gem/i915_gem_context.h" +#include "gem/i915_gem_region.h" #include "gem/selftests/mock_context.h" #include "gt/intel_context.h" #include "gt/intel_gpu_commands.h" @@ -1067,6 +1069,120 @@ static int shrink_boom(struct i915_address_space *vm, return err; } +static int misaligned_case(struct i915_address_space *vm, struct intel_memory_region *mr, + u64 addr, u64 size, unsigned long flags) +{ + struct drm_i915_gem_object *obj; + struct i915_vma *vma; + int err = 0; + u64 expected_vma_size, expected_node_size;
+ obj = i915_gem_object_create_region(mr, size, 0, 0); + if (IS_ERR(obj)) + return PTR_ERR(obj);
+ vma = i915_vma_instance(obj, vm, NULL); + if (IS_ERR(vma)) { + err = PTR_ERR(vma); + goto err_put; + }
+ err = i915_vma_pin(vma, 0, 0, addr | flags); + if (err) + goto err_put; + i915_vma_unpin(vma);
+ if (!drm_mm_node_allocated(&vma->node)) { + err = -EINVAL; + goto err_put; + }
+ if (i915_vma_misplaced(vma, 0, 0, addr | flags)) { + err = -EINVAL; + goto err_put; + }
+ expected_vma_size = round_up(size, 1 << (ffs(vma->resource->page_sizes_gtt) - 1)); + expected_node_size = expected_vma_size;
+ if (IS_DG2(vm->i915) && i915_gem_object_is_lmem(obj)) { + /* dg2 should expand lmem node to 2MB */
Should this test be NEEDS_COMPACT_PT()?
Otherwise LGTM. Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com
Thanks. Good catch, forgot to retrofit the new macro here.
From: Matthew Auld matthew.auld@intel.com
On discrete platforms like DG2, we need to support a minimum page size of 64K when dealing with device local-memory. This is quite tricky for various reasons, so try to document the new implicit uapi for this.
v3: fix typos and less emphasis v2: Fixed suggestions on formatting [Daniel]
Signed-off-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Robert Beckett bob.beckett@collabora.com Acked-by: Jordan Justen jordan.l.justen@intel.com Reviewed-by: Ramalingam C ramalingam.c@intel.com cc: Simon Ser contact@emersion.fr cc: Pekka Paalanen ppaalanen@gmail.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: mesa-dev@lists.freedesktop.org Cc: Tony Ye tony.ye@intel.com Cc: Slawomir Milczarek slawomir.milczarek@intel.com --- include/uapi/drm/i915_drm.h | 44 ++++++++++++++++++++++++++++++++----- 1 file changed, 39 insertions(+), 5 deletions(-)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 5e678917da70..77e5e74c32c1 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1118,10 +1118,16 @@ struct drm_i915_gem_exec_object2 { /** * When the EXEC_OBJECT_PINNED flag is specified this is populated by * the user with the GTT offset at which this object will be pinned. + * * When the I915_EXEC_NO_RELOC flag is specified this must contain the * presumed_offset of the object. + * * During execbuffer2 the kernel populates it with the value of the * current GTT offset of the object, for future presumed_offset writes. + * + * See struct drm_i915_gem_create_ext for the rules when dealing with + * alignment restrictions with I915_MEMORY_CLASS_DEVICE, on devices with + * minimum page sizes, like DG2. */ __u64 offset;
@@ -3145,11 +3151,39 @@ struct drm_i915_gem_create_ext { * * The (page-aligned) allocated size for the object will be returned. * - * Note that for some devices we have might have further minimum - * page-size restrictions(larger than 4K), like for device local-memory. - * However in general the final size here should always reflect any - * rounding up, if for example using the I915_GEM_CREATE_EXT_MEMORY_REGIONS - * extension to place the object in device local-memory. + * + * DG2 64K min page size implications: + * + * On discrete platforms, starting from DG2, we have to contend with GTT + * page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE + * objects. Specifically the hardware only supports 64K or larger GTT + * page sizes for such memory. The kernel will already ensure that all + * I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page + * sizes underneath. + * + * Note that the returned size here will always reflect any required + * rounding up done by the kernel, i.e 4K will now become 64K on devices + * such as DG2. + * + * Special DG2 GTT address alignment requirement: + * + * The GTT alignment will also need to be at least 2M for such objects. + * + * Note that due to how the hardware implements 64K GTT page support, we + * have some further complications: + * + * 1) The entire PDE (which covers a 2MB virtual address range), must + * contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same + * PDE is forbidden by the hardware. + * + * 2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM + * objects. + * + * To keep things simple for userland, we mandate that any GTT mappings + * must be aligned to and rounded up to 2MB. As this only wastes virtual + * address space and avoids userland having to copy any needlessly + * complicated PDE sharing scheme (coloring) and only affects DG2, this + * is deemed to be a good compromise. */ __u64 size; /**
On 1/25/22 20:35, Robert Beckett wrote:
From: Matthew Auld matthew.auld@intel.com
On discrete platforms like DG2, we need to support a minimum page size of 64K when dealing with device local-memory. This is quite tricky for various reasons, so try to document the new implicit uapi for this.
v3: fix typos and less emphasis v2: Fixed suggestions on formatting [Daniel]
Signed-off-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Robert Beckett bob.beckett@collabora.com Acked-by: Jordan Justen jordan.l.justen@intel.com Reviewed-by: Ramalingam C ramalingam.c@intel.com cc: Simon Ser contact@emersion.fr cc: Pekka Paalanen ppaalanen@gmail.com Cc: Jordan Justen jordan.l.justen@intel.com Cc: Kenneth Graunke kenneth@whitecape.org Cc: mesa-dev@lists.freedesktop.org Cc: Tony Ye tony.ye@intel.com Cc: Slawomir Milczarek slawomir.milczarek@intel.com
Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com
dri-devel@lists.freedesktop.org