drm_memcpy_from_wc() performs a fast copy from WC memory using non-temporal instructions. Currently there are two similar implementations of this function: drm_memcpy_from_wc() in drm_cache.c and i915_memcpy_from_wc() in i915/i915_memcpy.c. drm_memcpy_from_wc() was added recently by the series https://patchwork.freedesktop.org/patch/436276/?series=90681&rev=6
The goal of this patch series is to convert all users of i915_memcpy_from_wc() to drm_memcpy_from_wc(), so that there is a common implementation in drm, and eventually to remove the copy from i915.
Another benefit of using the memcpy functions from drm is that drm_memcpy_from_wc() is available on non-x86 architectures. i915_memcpy_from_wc() is implemented only for x86, which prevents building i915 for ARM64. drm_memcpy_from_wc() does the fast copy using non-temporal instructions on x86 and falls back to the memcpy() family of functions on other architectures.
Another major difference is that, unlike i915_memcpy_from_wc(), drm_memcpy_from_wc() does not fail if the passed address is not aligned for use with non-temporal load instructions, or if the platform lacks support for those instructions (non-temporal loads are provided by the SSE4.1 instruction set extension). Instead, drm_memcpy_from_wc() completes the copy using fallback functions. This relieves the caller from checking the return value of i915_memcpy_from_wc() and explicitly invoking a fallback.
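A minimal userspace sketch of the two calling conventions, assuming a plain memcpy() as a stand-in for the SSE4.1 non-temporal path. The names cpu_has_sse41, fast_copy_model(), old_style_copy(), new_style_copy() and caller_old() are hypothetical, and the alignment checks are modeled on the ones visible in this series rather than copied from either implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the runtime CPU feature check. */
static bool cpu_has_sse41 = true;

/* Stand-in for the non-temporal (movntdqa) path: a plain memcpy() here. */
static void fast_copy_model(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

/* i915_memcpy_from_wc()-style contract: may refuse and return false. */
static bool old_style_copy(void *dst, const void *src, size_t len)
{
	if (!cpu_has_sse41 || (((uintptr_t)dst | (uintptr_t)src | len) & 15))
		return false;
	fast_copy_model(dst, src, len);
	return true;
}

/* With the old contract, every caller carries its own fallback. */
static void caller_old(void *dst, const void *src, size_t len)
{
	if (!old_style_copy(dst, src, len))
		memcpy(dst, src, len);
}

/* drm_memcpy_from_wc()-style contract: always completes the copy. */
static void new_style_copy(void *dst, const void *src, size_t len)
{
	if (cpu_has_sse41 && !(((uintptr_t)src | len) & 15))
		fast_copy_model(dst, src, len);
	else
		memcpy(dst, src, len);	/* fallback lives inside the helper */
}
```

The point of the second form is that the fallback is written once, inside the helper, instead of at every call site.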
A follow-up series will remove the memcpy_from_wc functions from i915 once the dependency is completely removed.
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Balasubramani Vivekanandan (7):
  drm: Relax alignment constraint for destination address
  drm: Add drm_memcpy_from_wc() variant which accepts destination address
  drm/i915: use the memcpy_from_wc call from the drm
  drm/i915/guc: use the memcpy_from_wc call from the drm
  drm/i915/selftests: use the memcpy_from_wc call from the drm
  drm/i915/gt: Avoid direct dereferencing of io memory
  drm/i915: Avoid dereferencing io mapped memory
 drivers/gpu/drm/drm_cache.c                   | 98 +++++++++++++++++--
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  8 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      | 21 ++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 11 ++-
 drivers/gpu/drm/i915/i915_gpu_error.c         | 45 +++++----
 .../drm/i915/selftests/intel_memory_region.c  |  8 +-
 include/drm/drm_cache.h                       |  3 +
 7 files changed, 148 insertions(+), 46 deletions(-)
The destination address does not need to be aligned to a 16-byte boundary in order to use non-temporal instructions for the copy. Non-temporal instructions are used only to load from the source address, which is the side with the alignment constraint. When storing to the destination, we only need to pick the right instruction depending on whether the destination address is aligned or not.
__memcpy_ntdqu is copied from i915/i915_memcpy.c
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
---
 drivers/gpu/drm/drm_cache.c | 44 ++++++++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index c3e6e615bf09..a21c1350eb09 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -278,18 +278,50 @@ static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)
 	kernel_fpu_end();
 }
 
+static void __memcpy_ntdqu(void *dst, const void *src, unsigned long len)
+{
+	kernel_fpu_begin();
+
+	while (len >= 4) {
+		asm("movntdqa (%0), %%xmm0\n"
+		    "movntdqa 16(%0), %%xmm1\n"
+		    "movntdqa 32(%0), %%xmm2\n"
+		    "movntdqa 48(%0), %%xmm3\n"
+		    "movups %%xmm0, (%1)\n"
+		    "movups %%xmm1, 16(%1)\n"
+		    "movups %%xmm2, 32(%1)\n"
+		    "movups %%xmm3, 48(%1)\n"
+		    :: "r" (src), "r" (dst) : "memory");
+		src += 64;
+		dst += 64;
+		len -= 4;
+	}
+	while (len--) {
+		asm("movntdqa (%0), %%xmm0\n"
+		    "movups %%xmm0, (%1)\n"
+		    :: "r" (src), "r" (dst) : "memory");
+		src += 16;
+		dst += 16;
+	}
+
+	kernel_fpu_end();
+}
+
 /*
  * __drm_memcpy_from_wc copies @len bytes from @src to @dst using
- * non-temporal instructions where available. Note that all arguments
- * (@src, @dst) must be aligned to 16 bytes and @len must be a multiple
- * of 16.
+ * non-temporal instructions where available. Note that @src must be aligned to
+ * 16 bytes and @len must be a multiple of 16.
  */
 static void __drm_memcpy_from_wc(void *dst, const void *src, unsigned long len)
 {
-	if (unlikely(((unsigned long)dst | (unsigned long)src | len) & 15))
+	if (unlikely(((unsigned long)src | len) & 15)) {
 		memcpy(dst, src, len);
-	else if (likely(len))
-		__memcpy_ntdqa(dst, src, len >> 4);
+	} else if (likely(len)) {
+		if (IS_ALIGNED((unsigned long)dst, 16))
+			__memcpy_ntdqa(dst, src, len >> 4);
+		else
+			__memcpy_ntdqu(dst, src, len >> 4);
+	}
 }
 
 /**
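To illustrate the loop shape and the new dispatch, here is a portable userspace model, assuming a 16-byte memcpy() as a stand-in for one movntdqa load plus one movups store. It does not use non-temporal instructions; model_memcpy_ntdqu(), copy_chunk16() and pick_path() are hypothetical names. As in the patch, the length argument of the copy loop counts 16-byte chunks, and only the alignment of dst decides between the aligned and unaligned store variants:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for one movntdqa load followed by one movups store. */
static void copy_chunk16(unsigned char *dst, const unsigned char *src)
{
	memcpy(dst, src, 16);
}

/* Same loop shape as __memcpy_ntdqu(): @len counts 16-byte chunks. */
static void model_memcpy_ntdqu(void *dst_, const void *src_, unsigned long len)
{
	unsigned char *dst = dst_;
	const unsigned char *src = src_;

	while (len >= 4) {		/* 4 x 16 = 64 bytes per iteration */
		copy_chunk16(dst, src);
		copy_chunk16(dst + 16, src + 16);
		copy_chunk16(dst + 32, src + 32);
		copy_chunk16(dst + 48, src + 48);
		src += 64;
		dst += 64;
		len -= 4;
	}
	while (len--) {			/* tail: remaining 16-byte chunks */
		copy_chunk16(dst, src);
		src += 16;
		dst += 16;
	}
}

/* Which branch of the patched __drm_memcpy_from_wc() runs (model only). */
enum wc_path { WC_MEMCPY, WC_NTDQA, WC_NTDQU, WC_NOTHING };

static enum wc_path pick_path(const void *dst, const void *src,
			      unsigned long len)
{
	if (((uintptr_t)src | len) & 15)
		return WC_MEMCPY;	/* src or len unsuitable for movntdqa */
	if (!len)
		return WC_NOTHING;
	return ((uintptr_t)dst & 15) ? WC_NTDQU : WC_NTDQA;
}
```

Only the store instruction depends on dst, which is why relaxing the dst check costs nothing on the load side.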
On Tue, Feb 22, 2022 at 08:22:00PM +0530, Balasubramani Vivekanandan wrote:
> diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> index c3e6e615bf09..a21c1350eb09 100644
> --- a/drivers/gpu/drm/drm_cache.c
> +++ b/drivers/gpu/drm/drm_cache.c
> @@ -278,18 +278,50 @@ static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)
>  	kernel_fpu_end();
>  }
>
> +static void __memcpy_ntdqu(void *dst, const void *src, unsigned long len)
> +{
> +	kernel_fpu_begin();
> +
> +	while (len >= 4) {
> +		asm("movntdqa (%0), %%xmm0\n"
> +		    "movntdqa 16(%0), %%xmm1\n"
> +		    "movntdqa 32(%0), %%xmm2\n"
> +		    "movntdqa 48(%0), %%xmm3\n"
> +		    "movups %%xmm0, (%1)\n"
> +		    "movups %%xmm1, 16(%1)\n"
> +		    "movups %%xmm2, 32(%1)\n"
> +		    "movups %%xmm3, 48(%1)\n"
> +		    :: "r" (src), "r" (dst) : "memory");
> +		src += 64;
> +		dst += 64;
> +		len -= 4;
> +	}
> +	while (len--) {
> +		asm("movntdqa (%0), %%xmm0\n"
> +		    "movups %%xmm0, (%1)\n"
> +		    :: "r" (src), "r" (dst) : "memory");
> +		src += 16;
> +		dst += 16;
ok, this takes care of the tail
> +	}
> +
> +	kernel_fpu_end();
> +}
> +
>  /*
>   * __drm_memcpy_from_wc copies @len bytes from @src to @dst using
> - * non-temporal instructions where available. Note that all arguments
> - * (@src, @dst) must be aligned to 16 bytes and @len must be a multiple
> - * of 16.
> + * non-temporal instructions where available. Note that @src must be aligned to
> + * 16 bytes and @len must be a multiple of 16.
>   */
>  static void __drm_memcpy_from_wc(void *dst, const void *src, unsigned long len)
>  {
> -	if (unlikely(((unsigned long)dst | (unsigned long)src | len) & 15))
> +	if (unlikely(((unsigned long)src | len) & 15)) {
>  		memcpy(dst, src, len);
> -	else if (likely(len))
> -		__memcpy_ntdqa(dst, src, len >> 4);
> +	} else if (likely(len)) {
> +		if (IS_ALIGNED((unsigned long)dst, 16))
We may want to just extend this function to deal with dst not being aligned, but this may be done on top.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Lucas De Marchi
> +			__memcpy_ntdqa(dst, src, len >> 4);
> +		else
> +			__memcpy_ntdqu(dst, src, len >> 4);
> +	}
>  }
>
>  /**
> -- 
> 2.25.1
A fast copy using non-temporal instructions on x86 currently exists in two places: one in the i915 driver at i915/i915_memcpy.c, and another in drm_cache.c. The plan is to remove the duplicate implementation from the i915 driver and use the functions from drm_cache.c.
Add a variant of drm_memcpy_from_wc() to drm_cache.c that takes a plain address instead of an iosys_map for the destination. A very common scenario in i915 is copying from WC memory, which may be io memory or system memory, to a destination address in system memory. To avoid the overhead of creating an iosys_map for the destination, the new variant accepts the address directly.
Also export a new function from drm_cache.c that reports whether the platform supports fast copy; i915 needs it.
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
---
 drivers/gpu/drm/drm_cache.c | 54 +++++++++++++++++++++++++++++++++++++
 include/drm/drm_cache.h     |  3 +++
 2 files changed, 57 insertions(+)
diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index a21c1350eb09..eb0bcd33665e 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -358,6 +358,54 @@ void drm_memcpy_from_wc(struct iosys_map *dst,
 }
 EXPORT_SYMBOL(drm_memcpy_from_wc);
 
+/**
+ * drm_memcpy_from_wc_vaddr - Perform the fastest available memcpy from a source
+ * that may be WC.
+ * @dst: The destination pointer
+ * @src: The source pointer
+ * @len: The size of the area to transfer in bytes
+ *
+ * Same as drm_memcpy_from_wc except destination is accepted as system memory
+ * address. Useful in situations where passing destination address as iosys_map
+ * is simply an overhead and can be avoided.
+ */
+void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src,
+			      unsigned long len)
+{
+	if (WARN_ON(in_interrupt())) {
+		iosys_map_memcpy_from(dst, src, 0, len);
+		return;
+	}
+
+	if (static_branch_likely(&has_movntdqa)) {
+		__drm_memcpy_from_wc(dst,
+				     src->is_iomem ?
+				     (void const __force *)src->vaddr_iomem :
+				     src->vaddr,
+				     len);
+		return;
+	}
+
+	iosys_map_memcpy_from(dst, src, 0, len);
+}
+EXPORT_SYMBOL(drm_memcpy_from_wc_vaddr);
+
+/*
+ * drm_memcpy_fastcopy_supported - Returns if fast copy using non-temporal
+ * instructions is supported
+ *
+ * Returns true if platform has support for fast copying from wc memory type
+ * using non-temporal instructions. Else false.
+ */
+bool drm_memcpy_fastcopy_supported(void)
+{
+	if (static_branch_likely(&has_movntdqa))
+		return true;
+
+	return false;
+}
+EXPORT_SYMBOL(drm_memcpy_fastcopy_supported);
+
 /*
  * drm_memcpy_init_early - One time initialization of the WC memcpy code
  */
@@ -382,6 +430,12 @@ void drm_memcpy_from_wc(struct iosys_map *dst,
 }
 EXPORT_SYMBOL(drm_memcpy_from_wc);
 
+bool drm_memcpy_fastcopy_supported(void)
+{
+	return false;
+}
+EXPORT_SYMBOL(drm_memcpy_fastcopy_supported);
+
 void drm_memcpy_init_early(void)
 {
 }
diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
index 22deb216b59c..8f48e4dcd7dc 100644
--- a/include/drm/drm_cache.h
+++ b/include/drm/drm_cache.h
@@ -77,4 +77,7 @@ void drm_memcpy_init_early(void);
 void drm_memcpy_from_wc(struct iosys_map *dst,
 			const struct iosys_map *src,
 			unsigned long len);
+bool drm_memcpy_fastcopy_supported(void);
+void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src,
+			      unsigned long len);
 #endif
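A userspace mock of the control flow in drm_memcpy_from_wc_vaddr(). struct mock_map, the two flags and the return codes are hypothetical stand-ins for struct iosys_map, in_interrupt() and the has_movntdqa static key; memcpy() replaces both the fast and the fallback copy, so only the path selection is modeled:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Mimics only the iosys_map fields the helper reads (not the real type). */
struct mock_map {
	union {
		void *vaddr_iomem;	/* would be a __iomem pointer in the kernel */
		void *vaddr;
	};
	bool is_iomem;
};

enum copy_path { PATH_SLOW, PATH_FAST };

static bool mock_in_interrupt;		/* stand-in for in_interrupt() */
static bool mock_has_movntdqa = true;	/* stand-in for the static key */

static enum copy_path mock_memcpy_from_wc_vaddr(void *dst,
						const struct mock_map *src,
						unsigned long len)
{
	const void *from = src->is_iomem ? src->vaddr_iomem : src->vaddr;

	if (mock_in_interrupt || !mock_has_movntdqa) {
		memcpy(dst, from, len);	/* iosys_map_memcpy_from() stand-in */
		return PATH_SLOW;
	}
	memcpy(dst, from, len);		/* __drm_memcpy_from_wc() stand-in */
	return PATH_FAST;
}

/* Mirrors drm_memcpy_fastcopy_supported(): reports the feature flag. */
static bool mock_fastcopy_supported(void)
{
	return mock_has_movntdqa;
}
```

Taking the destination as a plain void * means the caller only builds a map for the source, which is the side that may be io memory.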
On Tue, Feb 22, 2022 at 08:22:01PM +0530, Balasubramani Vivekanandan wrote:
> diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> index a21c1350eb09..eb0bcd33665e 100644
> --- a/drivers/gpu/drm/drm_cache.c
> +++ b/drivers/gpu/drm/drm_cache.c
> @@ -358,6 +358,54 @@ void drm_memcpy_from_wc(struct iosys_map *dst,
>  }
>  EXPORT_SYMBOL(drm_memcpy_from_wc);
>
> +/**
> + * drm_memcpy_from_wc_vaddr - Perform the fastest available memcpy from a source
> + * that may be WC.
.... to a destination in system memory.
> + * @dst: The destination pointer
> + * @src: The source pointer
> + * @len: The size of the area to transfer in bytes
> + *
> + * Same as drm_memcpy_from_wc except destination is accepted as system memory
> + * address. Useful in situations where passing destination address as iosys_map
> + * is simply an overhead and can be avoided.
although one could do drm_memcpy_from_wc(IOSYS_MAP_INIT_VADDR(addr), ...
(if IOSYS_MAP_INIT_VADDR provided a cast to the struct).
> + */
> +void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src,
The name here is confusing, as we are copying *to* system memory. Maybe drm_memcpy_vaddr_from_wc()? Not sure it's better. Maybe someone in Cc has a better suggestion.

(To be honest, the whole _from_wc() suffix sounds weird when we are checking I/O vs. system memory... it may have been the motivation, but maybe it shouldn't be the name of the memcpy() variant.)
The implementation looks ok and follows drm_memcpy_from_wc()
Lucas De Marchi
> +			      unsigned long len)
> +{
> +	if (WARN_ON(in_interrupt())) {
> +		iosys_map_memcpy_from(dst, src, 0, len);
> +		return;
> +	}
> +
> +	if (static_branch_likely(&has_movntdqa)) {
> +		__drm_memcpy_from_wc(dst,
> +				     src->is_iomem ?
> +				     (void const __force *)src->vaddr_iomem :
> +				     src->vaddr,
> +				     len);
> +		return;
> +	}
> +
> +	iosys_map_memcpy_from(dst, src, 0, len);
> +}
> +EXPORT_SYMBOL(drm_memcpy_from_wc_vaddr);
> +
> +/*
> + * drm_memcpy_fastcopy_supported - Returns if fast copy using non-temporal
> + * instructions is supported
> + *
> + * Returns true if platform has support for fast copying from wc memory type
> + * using non-temporal instructions. Else false.
> + */
> +bool drm_memcpy_fastcopy_supported(void)
> +{
> +	if (static_branch_likely(&has_movntdqa))
> +		return true;
> +
> +	return false;
> +}
> +EXPORT_SYMBOL(drm_memcpy_fastcopy_supported);
> +
>  /*
>   * drm_memcpy_init_early - One time initialization of the WC memcpy code
>   */
> @@ -382,6 +430,12 @@ void drm_memcpy_from_wc(struct iosys_map *dst,
>  }
>  EXPORT_SYMBOL(drm_memcpy_from_wc);
>
> +bool drm_memcpy_fastcopy_supported(void)
> +{
> +	return false;
> +}
> +EXPORT_SYMBOL(drm_memcpy_fastcopy_supported);
> +
>  void drm_memcpy_init_early(void)
>  {
>  }
> diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
> index 22deb216b59c..8f48e4dcd7dc 100644
> --- a/include/drm/drm_cache.h
> +++ b/include/drm/drm_cache.h
> @@ -77,4 +77,7 @@ void drm_memcpy_init_early(void);
>  void drm_memcpy_from_wc(struct iosys_map *dst,
>  			const struct iosys_map *src,
>  			unsigned long len);
> +bool drm_memcpy_fastcopy_supported(void);
> +void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src,
> +			      unsigned long len);
>  #endif
> -- 
> 2.25.1
On Mon, Feb 28, 2022 at 11:48:58PM -0800, Lucas De Marchi wrote:
> although one could do drm_memcpy_from_wc(IOSYS_MAP_INIT_VADDR(addr), ...
... Just making sure you don't take that as a suggestion; I was just thinking out loud. And as is, it doesn't work, as the function expects an iosys_map *.

Lucas De Marchi
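For reference, a C99 compound literal is one way to get a pointer to a temporary map from a brace initializer. This is a mock: whether IOSYS_MAP_INIT_VADDR() expands to a bare brace initializer that can follow a (struct iosys_map) cast is an assumption here, and struct mock_map, MOCK_MAP_INIT_VADDR(), mock_copy_map_to_map() and copy_into_plain_buffer() are all hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Mock map type plus a brace-initializer macro (both hypothetical). */
struct mock_map {
	void *vaddr;
	bool is_iomem;
};

#define MOCK_MAP_INIT_VADDR(vaddr_) { .vaddr = (vaddr_), .is_iomem = false }

/* Takes both maps by pointer, like drm_memcpy_from_wc(). */
static void mock_copy_map_to_map(struct mock_map *dst,
				 const struct mock_map *src,
				 unsigned long len)
{
	memcpy(dst->vaddr, src->vaddr, len);
}

static void copy_into_plain_buffer(void *addr, const struct mock_map *src,
				   unsigned long len)
{
	/*
	 * The compound literal turns the brace initializer into an unnamed
	 * object whose address can be passed, so no named map is declared.
	 */
	mock_copy_map_to_map(&(struct mock_map)MOCK_MAP_INIT_VADDR(addr),
			     src, len);
}
```

The unnamed object lives until the end of the enclosing block, so its address must not be kept beyond the call.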
The memcpy_from_wc functions in i915_memcpy.c will be removed and replaced by the implementation in drm_cache.c. Update the code to use the functions provided by drm_cache.c.
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 2d593d573ef1..49ff8e3e71d9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -449,16 +449,16 @@ static void
 i915_gem_object_read_from_page_iomap(struct drm_i915_gem_object *obj, u64 offset, void *dst, int size)
 {
 	void __iomem *src_map;
-	void __iomem *src_ptr;
+	struct iosys_map src_ptr;
+
 	dma_addr_t dma = i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT);
 
 	src_map = io_mapping_map_wc(&obj->mm.region->iomap,
 				    dma - obj->mm.region->region.start,
 				    PAGE_SIZE);
 
-	src_ptr = src_map + offset_in_page(offset);
-	if (!i915_memcpy_from_wc(dst, (void __force *)src_ptr, size))
-		memcpy_fromio(dst, src_ptr, size);
+	iosys_map_set_vaddr_iomem(&src_ptr, (src_map + offset_in_page(offset)));
+	drm_memcpy_from_wc_vaddr(dst, &src_ptr, size);
 
 	io_mapping_unmap(src_map);
 }
On Tue, Feb 22, 2022 at 08:22:02PM +0530, Balasubramani Vivekanandan wrote:
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 2d593d573ef1..49ff8e3e71d9 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -449,16 +449,16 @@ static void
>  i915_gem_object_read_from_page_iomap(struct drm_i915_gem_object *obj, u64 offset, void *dst, int size)
>  {
>  	void __iomem *src_map;
> -	void __iomem *src_ptr;
> +	struct iosys_map src_ptr;
> +
>  	dma_addr_t dma = i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT);
>
>  	src_map = io_mapping_map_wc(&obj->mm.region->iomap,
>  				    dma - obj->mm.region->region.start,
>  				    PAGE_SIZE);
>
> -	src_ptr = src_map + offset_in_page(offset);
> -	if (!i915_memcpy_from_wc(dst, (void __force *)src_ptr, size))
> -		memcpy_fromio(dst, src_ptr, size);
> +	iosys_map_set_vaddr_iomem(&src_ptr, (src_map + offset_in_page(offset)));
Too many parentheses -----------------------^

Other than that:

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Lucas De Marchi
> +	drm_memcpy_from_wc_vaddr(dst, &src_ptr, size);
>
>  	io_mapping_unmap(src_map);
>  }
> -- 
> 2.25.1
The memcpy_from_wc functions in i915_memcpy.c will be removed and replaced by the implementation in drm_cache.c. Update the code to use the functions provided by drm_cache.c.
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
index b53f61f3101f..1990762f07de 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
@@ -3,6 +3,7 @@
  * Copyright © 2014-2019 Intel Corporation
  */
 
+#include <drm/drm_cache.h>
 #include <linux/debugfs.h>
 
 #include "gt/intel_gt.h"
@@ -205,6 +206,7 @@ static void guc_read_update_log_buffer(struct intel_guc_log *log)
 	enum guc_log_buffer_type type;
 	void *src_data, *dst_data;
 	bool new_overflow;
+	struct iosys_map src_map;
 
 	mutex_lock(&log->relay.lock);
 
@@ -281,14 +283,17 @@ static void guc_read_update_log_buffer(struct intel_guc_log *log)
 		}
 
 		/* Just copy the newly written data */
+		iosys_map_set_vaddr(&src_map, src_data);
 		if (read_offset > write_offset) {
-			i915_memcpy_from_wc(dst_data, src_data, write_offset);
+			drm_memcpy_from_wc_vaddr(dst_data, &src_map,
+						 write_offset);
 			bytes_to_copy = buffer_size - read_offset;
 		} else {
 			bytes_to_copy = write_offset - read_offset;
 		}
-		i915_memcpy_from_wc(dst_data + read_offset,
-				    src_data + read_offset, bytes_to_copy);
+		iosys_map_incr(&src_map, read_offset);
+		drm_memcpy_from_wc_vaddr(dst_data + read_offset, &src_map,
+					 bytes_to_copy);
 
 		src_data += buffer_size;
 		dst_data += buffer_size;
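The wrap-around logic in the last hunk can be modeled in userspace as follows. copy_new_log_data() is a hypothetical helper and memcpy() stands in for drm_memcpy_from_wc_vaddr(); the offsets behave as in guc_read_update_log_buffer(), where a read_offset past write_offset means the writer has wrapped:

```c
#include <assert.h>
#include <string.h>

/*
 * src and dst are same-sized buffers; only the bytes between read_offset
 * and write_offset (wrapping at buffer_size) are refreshed in dst.
 */
static void copy_new_log_data(unsigned char *dst, const unsigned char *src,
			      unsigned int buffer_size,
			      unsigned int read_offset,
			      unsigned int write_offset)
{
	unsigned int bytes_to_copy;

	assert(read_offset <= buffer_size && write_offset <= buffer_size);

	if (read_offset > write_offset) {
		/* Writer wrapped: first the head [0, write_offset) ... */
		memcpy(dst, src, write_offset);
		/* ... then the tail [read_offset, buffer_size). */
		bytes_to_copy = buffer_size - read_offset;
	} else {
		bytes_to_copy = write_offset - read_offset;
	}
	/* In the patch, iosys_map_incr() advances the source map here. */
	memcpy(dst + read_offset, src + read_offset, bytes_to_copy);
}
```

Advancing the map with iosys_map_incr() instead of doing pointer arithmetic keeps the code correct whether the map holds a system or an io address.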
On Tue, Feb 22, 2022 at 08:22:03PM +0530, Balasubramani Vivekanandan wrote:
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> index b53f61f3101f..1990762f07de 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
> @@ -3,6 +3,7 @@
>   * Copyright © 2014-2019 Intel Corporation
>   */
>
> +#include <drm/drm_cache.h>
>  #include <linux/debugfs.h>
>
>  #include "gt/intel_gt.h"
> @@ -205,6 +206,7 @@ static void guc_read_update_log_buffer(struct intel_guc_log *log)
>  	enum guc_log_buffer_type type;
>  	void *src_data, *dst_data;
>  	bool new_overflow;
> +	struct iosys_map src_map;
>
>  	mutex_lock(&log->relay.lock);
>
> @@ -281,14 +283,17 @@ static void guc_read_update_log_buffer(struct intel_guc_log *log)
>  		}
>
>  		/* Just copy the newly written data */
> +		iosys_map_set_vaddr(&src_map, src_data);
src is not guaranteed to come from system memory... src comes from intel_guc_allocate_vma(), which may call either i915_gem_object_create_lmem() or i915_gem_object_create_shmem(), depending on whether the platform has lmem.
I guess you will need to check if the obj is in lmem and initialize src_map accordingly.
Lucas De Marchi
>  		if (read_offset > write_offset) {
> -			i915_memcpy_from_wc(dst_data, src_data, write_offset);
> +			drm_memcpy_from_wc_vaddr(dst_data, &src_map,
> +						 write_offset);
>  			bytes_to_copy = buffer_size - read_offset;
>  		} else {
>  			bytes_to_copy = write_offset - read_offset;
>  		}
> -		i915_memcpy_from_wc(dst_data + read_offset,
> -				    src_data + read_offset, bytes_to_copy);
> +		iosys_map_incr(&src_map, read_offset);
> +		drm_memcpy_from_wc_vaddr(dst_data + read_offset, &src_map,
> +					 bytes_to_copy);
>
>  		src_data += buffer_size;
>  		dst_data += buffer_size;
> -- 
> 2.25.1
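A sketch of the fix suggested in the review, using mock types: struct mock_map and the two setters only imitate iosys_map_set_vaddr() and iosys_map_set_vaddr_iomem(), and obj_is_lmem is a hypothetical flag standing in for a placement check on the log object (presumably something like i915_gem_object_is_lmem(), though which check fits here is left open):

```c
#include <assert.h>
#include <stdbool.h>

/* Mock map: one pointer field plays both vaddr and vaddr_iomem. */
struct mock_map {
	void *vaddr;
	bool is_iomem;
};

static void mock_map_set_vaddr(struct mock_map *map, void *vaddr)
{
	map->vaddr = vaddr;
	map->is_iomem = false;
}

static void mock_map_set_vaddr_iomem(struct mock_map *map, void *vaddr_iomem)
{
	map->vaddr = vaddr_iomem;
	map->is_iomem = true;
}

/* Initialize the source map according to where the buffer lives. */
static void init_src_map(struct mock_map *map, void *src_data,
			 bool obj_is_lmem)
{
	if (obj_is_lmem)
		mock_map_set_vaddr_iomem(map, src_data);
	else
		mock_map_set_vaddr(map, src_data);
}
```

With is_iomem set correctly, the copy helper can pick io-safe accessors for lmem instead of treating the pointer as ordinary system memory.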
The memcpy_from_wc functions in i915_memcpy.c will be removed and replaced by the implementation in drm_cache.c. Update the code to use the functions provided by drm_cache.c.
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
---
 drivers/gpu/drm/i915/selftests/intel_memory_region.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
index 7acba1d2135e..d7531aa6965a 100644
--- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
@@ -7,6 +7,7 @@
 #include <linux/sort.h>
 
 #include <drm/drm_buddy.h>
+#include <drm/drm_cache.h>
 
 #include "../i915_selftest.h"
 
@@ -1033,7 +1034,10 @@ static inline void igt_memcpy(void *dst, const void *src, size_t size)
 
 static inline void igt_memcpy_from_wc(void *dst, const void *src, size_t size)
 {
-	i915_memcpy_from_wc(dst, src, size);
+	struct iosys_map src_map;
+
+	iosys_map_set_vaddr(&src_map, (void *)src);
+	drm_memcpy_from_wc_vaddr(dst, &src_map, size);
 }
 
 static int _perf_memcpy(struct intel_memory_region *src_mr,
@@ -1057,7 +1061,7 @@ static int _perf_memcpy(struct intel_memory_region *src_mr,
 		{
 			"memcpy_from_wc",
 			igt_memcpy_from_wc,
-			!i915_has_memcpy_from_wc(),
+			!drm_memcpy_fastcopy_supported(),
 		},
 	};
 	struct drm_i915_gem_object *src, *dst;
On Tue, Feb 22, 2022 at 08:22:04PM +0530, Balasubramani Vivekanandan wrote:
> diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> index 7acba1d2135e..d7531aa6965a 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> @@ -7,6 +7,7 @@
>  #include <linux/sort.h>
>
>  #include <drm/drm_buddy.h>
> +#include <drm/drm_cache.h>
>
>  #include "../i915_selftest.h"
>
> @@ -1033,7 +1034,10 @@ static inline void igt_memcpy(void *dst, const void *src, size_t size)
>
>  static inline void igt_memcpy_from_wc(void *dst, const void *src, size_t size)
>  {
> -	i915_memcpy_from_wc(dst, src, size);
> +	struct iosys_map src_map;
> +
> +	iosys_map_set_vaddr(&src_map, (void *)src);
src is not guaranteed to be system memory. See perf_memcpy():
	for_each_memory_region(src_mr, i915, src_id) {
		for_each_memory_region(dst_mr, i915, dst_id) {
			...
Lucas De Marchi
> +	drm_memcpy_from_wc_vaddr(dst, &src_map, size);
>  }
>
>  static int _perf_memcpy(struct intel_memory_region *src_mr,
> @@ -1057,7 +1061,7 @@ static int _perf_memcpy(struct intel_memory_region *src_mr,
>  		{
>  			"memcpy_from_wc",
>  			igt_memcpy_from_wc,
> -			!i915_has_memcpy_from_wc(),
> +			!drm_memcpy_fastcopy_supported(),
>  		},
>  	};
>  	struct drm_i915_gem_object *src, *dst;
> -- 
> 2.25.1
To ensure portability, io mapped memory should not be dereferenced directly; it should be read, written and copied using helper functions.

i915_memcpy_from_wc() was used to copy the data from io memory into a temporary buffer, and a pointer to the temporary buffer was passed to the CRC calculation function. But i915_memcpy_from_wc() only performs the copy if the platform supports fast copy using non-temporal instructions; otherwise, the pointer to io memory was passed to the CRC calculation. The CRC function then dereferences io memory directly, which would not work properly on non-x86 platforms. To make this portable, always compute the CRC over the temporary buffer, never over the io memory.

drm_memcpy_from_wc_vaddr() is now used for copying instead of i915_memcpy_from_wc() for two reasons:
- i915_memcpy_from_wc() will be deprecated.
- drm_memcpy_from_wc_vaddr() does not fail if fast copy is not supported; it falls back to memcpy_fromio for the copy.
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_reset.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c
index 37c38bdd5f47..79d2bd7ef3b9 100644
--- a/drivers/gpu/drm/i915/gt/selftest_reset.c
+++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
@@ -3,6 +3,7 @@
  * Copyright © 2018 Intel Corporation
  */
 
+#include <drm/drm_cache.h>
 #include <linux/crc32.h>
 
 #include "gem/i915_gem_stolen.h"
@@ -82,7 +83,7 @@ __igt_reset_stolen(struct intel_gt *gt,
 	for (page = 0; page < num_pages; page++) {
 		dma_addr_t dma = (dma_addr_t)dsm->start + (page << PAGE_SHIFT);
 		void __iomem *s;
-		void *in;
+		struct iosys_map src_map;
 
 		ggtt->vm.insert_page(&ggtt->vm, dma,
 				     ggtt->error_capture.start,
@@ -98,10 +99,9 @@ __igt_reset_stolen(struct intel_gt *gt,
 				       ((page + 1) << PAGE_SHIFT) - 1))
 			memset_io(s, STACK_MAGIC, PAGE_SIZE);
 
-		in = (void __force *)s;
-		if (i915_memcpy_from_wc(tmp, in, PAGE_SIZE))
-			in = tmp;
-		crc[page] = crc32_le(0, in, PAGE_SIZE);
+		iosys_map_set_vaddr_iomem(&src_map, s);
+		drm_memcpy_from_wc_vaddr(tmp, &src_map, PAGE_SIZE);
+		crc[page] = crc32_le(0, tmp, PAGE_SIZE);
 
 		io_mapping_unmap(s);
 	}
@@ -122,7 +122,7 @@ __igt_reset_stolen(struct intel_gt *gt,
 	for (page = 0; page < num_pages; page++) {
 		dma_addr_t dma = (dma_addr_t)dsm->start + (page << PAGE_SHIFT);
 		void __iomem *s;
-		void *in;
+		struct iosys_map src_map;
 		u32 x;
 
 		ggtt->vm.insert_page(&ggtt->vm, dma,
@@ -134,10 +134,9 @@ __igt_reset_stolen(struct intel_gt *gt,
 				      ggtt->error_capture.start,
 				      PAGE_SIZE);
 
-		in = (void __force *)s;
-		if (i915_memcpy_from_wc(tmp, in, PAGE_SIZE))
-			in = tmp;
-		x = crc32_le(0, in, PAGE_SIZE);
+		iosys_map_set_vaddr_iomem(&src_map, s);
+		drm_memcpy_from_wc_vaddr(tmp, &src_map, PAGE_SIZE);
+		x = crc32_le(0, tmp, PAGE_SIZE);
 
 		if (x != crc[page] &&
 		    !__drm_mm_interval_first(&gt->i915->mm.stolen,
@@ -146,7 +145,7 @@ __igt_reset_stolen(struct intel_gt *gt,
 			pr_debug("unused stolen page %pa modified by GPU reset\n",
 				 &page);
 			if (count++ == 0)
-				igt_hexdump(in, PAGE_SIZE);
+				igt_hexdump(tmp, PAGE_SIZE);
 			max = page;
 		}
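The pattern this patch enforces, reduced to a userspace sketch: the (possibly io) source is always copied into the ordinary buffer tmp, and only tmp is checksummed. memcpy() stands in for drm_memcpy_from_wc_vaddr(), and crc32_bitwise() and checksum_via_tmp() are hypothetical helpers; crc32_bitwise() computes the standard reflected CRC-32 (init and final XOR 0xFFFFFFFF), which is not the same value as the kernel's crc32_le(0, ...):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise reflected CRC-32, polynomial 0xEDB88320. */
static uint32_t crc32_bitwise(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/*
 * Never checksum the source in place: copy into tmp first, then
 * checksum tmp, whether or not a fast copy path is available.
 */
static uint32_t checksum_via_tmp(void *tmp, const void *io_src, size_t len)
{
	memcpy(tmp, io_src, len);
	return crc32_bitwise(tmp, len);
}
```

Because the CRC now always reads tmp, the hexdump on mismatch can also use tmp unconditionally, as the last hunk does.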
The pointer passed to zlib_deflate() for compression could point to io mapped memory, which could then end up being dereferenced directly. Currently, io mapped memory is copied to a temporary buffer, which is then passed to zlib_deflate(), only when the platform supports fast copy using non-temporal instructions. If the platform lacks that support, the io mapped memory is used directly.
Direct dereferencing of io memory makes the driver non-portable outside x86 and should be avoided.
With this patch, io memory is always copied to a temporary buffer, irrespective of platform support for fast copy, and the i915_has_memcpy_from_wc() check is removed. drm_memcpy_from_wc_vaddr() is now used for copying instead of i915_memcpy_from_wc() for two reasons:
- i915_memcpy_from_wc() will be deprecated.
- drm_memcpy_from_wc_vaddr() does not fail if fast copy is not supported; instead, it completes the copy using memcpy_fromio as a fallback.
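The input selection that compress_page() ends up with can be sketched as below, assuming 4 KiB pages. struct mock_map, MODEL_PAGE_SIZE and pick_deflate_input() are hypothetical; memcpy() stands in for drm_memcpy_from_wc_vaddr():

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096	/* assumption: 4 KiB pages */

/* Mock of the two iosys_map fields compress_page() looks at. */
struct mock_map {
	void *vaddr;
	bool is_iomem;
};

/*
 * Returns the pointer handed to the compressor as its input: io memory
 * is first staged in tmp, system memory is used as-is.
 */
static const void *pick_deflate_input(const struct mock_map *src, void *tmp)
{
	if (src->is_iomem) {
		memcpy(tmp, src->vaddr, MODEL_PAGE_SIZE);
		return tmp;
	}
	return src->vaddr;
}
```

Because io memory now always goes through tmp, compress_init() in this patch fails when tmp cannot be allocated instead of carrying on without it.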
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 45 +++++++++++++++------------
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 1d042551619e..0c5917a7a545 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -258,9 +258,12 @@ static bool compress_init(struct i915_vma_compress *c)
 		return false;
 	}
 
-	c->tmp = NULL;
-	if (i915_has_memcpy_from_wc())
-		c->tmp = pool_alloc(&c->pool, ALLOW_FAIL);
+	c->tmp = pool_alloc(&c->pool, ALLOW_FAIL);
+	if (!c->tmp) {
+		kfree(zstream->workspace);
+		pool_fini(&c->pool);
+		return false;
+	}
 
 	return true;
 }
@@ -292,15 +295,17 @@ static void *compress_next_page(struct i915_vma_compress *c,
 }
 
 static int compress_page(struct i915_vma_compress *c,
-			 void *src,
-			 struct i915_vma_coredump *dst,
-			 bool wc)
+			 struct iosys_map *src,
+			 struct i915_vma_coredump *dst)
 {
 	struct z_stream_s *zstream = &c->zstream;
 
-	zstream->next_in = src;
-	if (wc && c->tmp && i915_memcpy_from_wc(c->tmp, src, PAGE_SIZE))
+	if (src->is_iomem) {
+		drm_memcpy_from_wc_vaddr(c->tmp, src, PAGE_SIZE);
 		zstream->next_in = c->tmp;
+	} else {
+		zstream->next_in = src->vaddr;
+	}
 	zstream->avail_in = PAGE_SIZE;
 
 	do {
@@ -389,9 +394,8 @@ static bool compress_start(struct i915_vma_compress *c)
 }
 
 static int compress_page(struct i915_vma_compress *c,
-			 void *src,
-			 struct i915_vma_coredump *dst,
-			 bool wc)
+			 struct iosys_map *src,
+			 struct i915_vma_coredump *dst)
 {
 	void *ptr;
 
@@ -399,8 +403,7 @@ static int compress_page(struct i915_vma_compress *c,
 	if (!ptr)
 		return -ENOMEM;
 
-	if (!(wc && i915_memcpy_from_wc(ptr, src, PAGE_SIZE)))
-		memcpy(ptr, src, PAGE_SIZE);
+	drm_memcpy_from_wc_vaddr(ptr, src, PAGE_SIZE);
 	list_add_tail(&virt_to_page(ptr)->lru, &dst->page_list);
 	cond_resched();
 
@@ -1054,6 +1057,7 @@ i915_vma_coredump_create(const struct intel_gt *gt,
 	if (drm_mm_node_allocated(&ggtt->error_capture)) {
 		void __iomem *s;
 		dma_addr_t dma;
+		struct iosys_map src;
 
 		for_each_sgt_daddr(dma, iter, vma_res->bi.pages) {
 			mutex_lock(&ggtt->error_mutex);
@@ -1062,9 +1066,8 @@ i915_vma_coredump_create(const struct intel_gt *gt,
 			mb();
 
 			s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
-			ret = compress_page(compress,
-					    (void __force *)s, dst,
-					    true);
+			iosys_map_set_vaddr_iomem(&src, s);
+			ret = compress_page(compress, &src, dst);
 			io_mapping_unmap(s);
 
 			mb();
@@ -1076,6 +1079,7 @@ i915_vma_coredump_create(const struct intel_gt *gt,
 	} else if (vma_res->bi.lmem) {
 		struct intel_memory_region *mem = vma_res->mr;
 		dma_addr_t dma;
+		struct iosys_map src;
 
 		for_each_sgt_daddr(dma, iter, vma_res->bi.pages) {
 			void __iomem *s;
@@ -1083,15 +1087,15 @@ i915_vma_coredump_create(const struct intel_gt *gt,
 			s = io_mapping_map_wc(&mem->iomap,
 					      dma - mem->region.start,
 					      PAGE_SIZE);
-			ret = compress_page(compress,
-					    (void __force *)s, dst,
-					    true);
+			iosys_map_set_vaddr_iomem(&src, s);
+			ret = compress_page(compress, &src, dst);
 			io_mapping_unmap(s);
 			if (ret)
 				break;
 		}
 	} else {
 		struct page *page;
+		struct iosys_map src;
 
 		for_each_sgt_page(page, iter, vma_res->bi.pages) {
 			void *s;
@@ -1099,7 +1103,8 @@ i915_vma_coredump_create(const struct intel_gt *gt,
 			drm_clflush_pages(&page, 1);
 
 			s = kmap(page);
-			ret = compress_page(compress, s, dst, false);
+			iosys_map_set_vaddr(&src, s);
+			ret = compress_page(compress, &src, dst);
 			kunmap(page);
 
 			drm_clflush_pages(&page, 1);
On 22/02/2022 15:51, Balasubramani Vivekanandan wrote:
drm_memcpy_from_wc() performs a fast copy from WC memory using non-temporal instructions. There are currently two similar implementations of this function: one in drm_cache.c as drm_memcpy_from_wc(), and another in i915/i915_memcpy.c as i915_memcpy_from_wc(). drm_memcpy_from_wc() was added recently through the series https://patchwork.freedesktop.org/patch/436276/?series=90681&rev=6
The goal of this patch series is to change all users of i915_memcpy_from_wc() to drm_memcpy_from_wc(), have a common implementation in drm, and eventually remove the copy from i915.
Another benefit of using the memcpy functions from drm is that drm_memcpy_from_wc() is available on non-x86 architectures. i915_memcpy_from_wc() is implemented only for x86 and prevents building i915 for ARM64. drm_memcpy_from_wc() performs the fast copy using non-temporal instructions on x86 and falls back to the memcpy() family of functions on other architectures.
Another major difference is that, unlike i915_memcpy_from_wc(), drm_memcpy_from_wc() does not fail if the passed address is not aligned as required by the non-temporal load instructions, or if the platform lacks support for those instructions (non-temporal loads are provided by the SSE4.1 instruction set extension). Instead, drm_memcpy_from_wc() completes the copy using fallback functions. This relieves the caller from checking the return value of i915_memcpy_from_wc() and explicitly using a fallback.
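The "never fail" behaviour described above can be sketched as follows. This is an assumption-laden userspace illustration, not the drm_cache.c implementation: `sketch_memcpy_from_wc()` is a hypothetical name, the non-temporal path (`_mm_stream_load_si128()`, i.e. MOVNTDQA) is compiled only when the compiler targets SSE4.1 (`__SSE4_1__`) and is taken only for a 16-byte aligned source and length, and every other case falls back to memcpy() rather than returning an error. The destination is written with unaligned stores, in the spirit of the "Relax alignment constraint for destination address" patch in this series.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE4_1__)
#include <smmintrin.h>
#endif

/*
 * Hypothetical copy-from-WC helper that never fails: non-temporal
 * 16-byte loads when possible, plain memcpy() otherwise.
 */
static void sketch_memcpy_from_wc(void *dst, const void *src, size_t len)
{
#if defined(__SSE4_1__)
	/* MOVNTDQA needs a 16-byte aligned source; the length is also
	 * required to be a multiple of 16 to keep the loop simple.
	 * The destination may be unaligned (unaligned stores are used). */
	if ((((uintptr_t)src | (uintptr_t)len) & 15) == 0) {
		const __m128i *s = (const __m128i *)src;
		unsigned char *d = (unsigned char *)dst;

		for (size_t i = 0; i < len / 16; i++)
			_mm_storeu_si128((__m128i *)(d + 16 * i),
					 _mm_stream_load_si128((__m128i *)(s + i)));
		return;
	}
#endif
	/* Fallback: no SSE4.1 or misaligned source/length. In the kernel,
	 * an __iomem source would be copied with memcpy_fromio() instead. */
	memcpy(dst, src, len);
}
```

Because the helper has no failure return, a caller can replace the `if (!i915_memcpy_from_wc(...)) memcpy(...)` pattern with a single call.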
A follow-up series will be created to remove the memcpy_from_wc functions from i915 once the dependency is completely removed.
Overall the series looks good to me, but I think you can add another patch to remove i915_memcpy_from_wc(), as I don't see any other usages left after this series. Maybe I am missing something?
Regards, Nirmoy
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>

Balasubramani Vivekanandan (7):
  drm: Relax alignment constraint for destination address
  drm: Add drm_memcpy_from_wc() variant which accepts destination address
  drm/i915: use the memcpy_from_wc call from the drm
  drm/i915/guc: use the memcpy_from_wc call from the drm
  drm/i915/selftests: use the memcpy_from_wc call from the drm
  drm/i915/gt: Avoid direct dereferencing of io memory
  drm/i915: Avoid dereferencing io mapped memory

 drivers/gpu/drm/drm_cache.c                   | 98 +++++++++++++++++--
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  8 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      | 21 ++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 11 ++-
 drivers/gpu/drm/i915/i915_gpu_error.c         | 45 +++++----
 .../drm/i915/selftests/intel_memory_region.c  |  8 +-
 include/drm/drm_cache.h                       |  3 +
 7 files changed, 148 insertions(+), 46 deletions(-)
On 23.02.2022 10:02, Das, Nirmoy wrote:
Overall the series looks good to me, but I think you can add another patch to remove i915_memcpy_from_wc(), as I don't see any other usages left after this series. Maybe I am missing something?
I have changed all users of i915_memcpy_from_wc() to the drm function. But there is another function, i915_unaligned_memcpy_from_wc(), in i915_memcpy.c which blocks the complete removal of the i915_memcpy.c file from i915. This function accepts an unaligned source address, does the fast copy only for the aligned region of memory, and copies the remaining part using memcpy. I could move i915_unaligned_memcpy_from_wc() to drm as well, but since it is more of a platform-specific handling, I am not sure it makes sense to keep it in drm. Otherwise, I have to retain i915_unaligned_memcpy_from_wc() inside i915 and refactor it to use drm_memcpy_from_wc() instead of __memcpy_ntdqu(). But before making more changes, I wanted feedback on the current ones, so I went ahead and posted the series for review.
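The split described here (fast copy only for the aligned region, memcpy for the rest) can be sketched in userspace C as below. This is an illustration under stated assumptions, not the i915 code: `sketch_unaligned_memcpy_from_wc()` and `fast_copy_aligned16()` are hypothetical names, the 16-byte boundary reflects the MOVNTDQA source-alignment requirement, and the fast primitive is replaced by plain memcpy() so the sketch runs on any architecture.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an aligned non-temporal copy primitive
 * (e.g. a MOVNTDQA loop); plain memcpy() keeps the sketch portable. */
static void fast_copy_aligned16(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

/* Sketch: copy an unaligned source as head / aligned body / tail. */
static void sketch_unaligned_memcpy_from_wc(void *dst, const void *src,
					    size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	/* Bytes needed to bring the source up to a 16-byte boundary. */
	size_t head = (16 - ((uintptr_t)s & 15)) & 15;
	size_t body;

	if (head > len)
		head = len;
	memcpy(d, s, head);
	d += head;
	s += head;
	len -= head;

	/* Aligned middle part: whole 16-byte blocks only. */
	body = len & ~(size_t)15;
	if (body)
		fast_copy_aligned16(d, s, body);
	d += body;
	s += body;
	len -= body;

	/* Remaining tail, shorter than 16 bytes. */
	memcpy(d, s, len);
}
```

If such a helper stayed in i915, `fast_copy_aligned16()` is the spot where a call to drm_memcpy_from_wc() could replace __memcpy_ntdqu(), as suggested above.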
Regards, Bala
On 23/02/2022 12:08, Balasubramani Vivekanandan wrote:
I have changed all users of i915_memcpy_from_wc() to the drm function. But there is another function, i915_unaligned_memcpy_from_wc(), in i915_memcpy.c which blocks the complete removal of the i915_memcpy.c file from i915. This function accepts an unaligned source address, does the fast copy only for the aligned region of memory, and copies the remaining part using memcpy. I could move i915_unaligned_memcpy_from_wc() to drm as well, but since it is more of a platform-specific handling, I am not sure it makes sense to keep it in drm. Otherwise, I have to retain i915_unaligned_memcpy_from_wc() inside i915 and refactor it to use drm_memcpy_from_wc() instead of __memcpy_ntdqu().
I think for completeness it makes sense to remove i915_memcpy_from_wc() and its helper functions in this series. I don't think we can keep i915_unaligned_memcpy_from_wc() if we want i915 on ARM[0], so I think you can remove the usages of i915_unaligned_memcpy_from_wc() as well.
[0] IIUC, the CI_BUG_ON() check in i915_unaligned_memcpy_from_wc() will raise a build error on ARM.
Regards,
Nirmoy
dri-devel@lists.freedesktop.org