drm_memcpy_from_wc() performs a fast copy from WC (write-combining) memory using non-temporal instructions. There are currently two similar implementations of this function: drm_memcpy_from_wc() in drm_cache.c and i915_memcpy_from_wc() in i915/i915_memcpy.c. drm_memcpy_from_wc() was added recently through the series https://patchwork.freedesktop.org/patch/436276/?series=90681&rev=6
The goal of this patch series is to switch all users of i915_memcpy_from_wc() to drm_memcpy_from_wc(), so that there is a single common implementation in drm, and eventually to remove the copy from i915.
Another benefit of using the memcpy functions from drm is that drm_memcpy_from_wc() is available on non-x86 architectures. i915_memcpy_from_wc() is implemented only for x86, which prevents building i915 for ARM64. drm_memcpy_from_wc() does a fast copy using non-temporal instructions on x86 and falls back to the memcpy() family of functions on other architectures.
Another major difference is that, unlike i915_memcpy_from_wc(), drm_memcpy_from_wc() does not fail if the passed address is not suitably aligned for non-temporal load instructions or if the platform lacks support for those instructions (non-temporal loads are provided by the SSE4.1 instruction set extension). Instead, drm_memcpy_from_wc() completes the copy using fallback functions. This relieves the caller from checking the return value, as it must with i915_memcpy_from_wc(), and explicitly using a fallback.
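The difference in caller contract can be sketched in plain userspace C. The two helpers below are hypothetical stand-ins, not the kernel functions: legacy_copy_from_wc() mimics a contract that returns false when the fast path cannot be used, and unified_copy_from_wc() mimics a contract that always completes the copy. fast_path_available is an assumed flag standing in for the SSE4.1 check, and memcpy() stands in for the non-temporal copy loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stands in for the runtime SSE4.1 (movntdqa) check. */
static bool fast_path_available = true;

/* Sketch of a "may refuse" contract: the caller must check the result. */
static bool legacy_copy_from_wc(void *dst, const void *src, size_t len)
{
	if (!fast_path_available)
		return false;
	if (((uintptr_t)dst | (uintptr_t)src | len) & 15)
		return false;

	memcpy(dst, src, len);	/* stand-in for the non-temporal copy loop */
	return true;
}

/* Sketch of an "always completes" contract: the fallback is internal. */
static void unified_copy_from_wc(void *dst, const void *src, size_t len)
{
	/* This is the fallback each caller previously had to open-code. */
	if (!legacy_copy_from_wc(dst, src, len))
		memcpy(dst, src, len);
}
```

With the second form a call site shrinks to a single call, with no return value to inspect and no per-caller fallback.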
A follow-up series will remove the memcpy_from_wc functions from i915 once the dependency is completely removed.
v2: Fixed a missing check for whether the address is from system memory or io memory, and use the right initialization function to construct the iosys_map structure (review feedback from Lucas)
Cc: Jani Nikula jani.nikula@intel.com
Cc: Lucas De Marchi lucas.demarchi@intel.com
Cc: David Airlie airlied@linux.ie
Cc: Daniel Vetter daniel@ffwll.ch
Cc: Chris Wilson chris.p.wilson@intel.com
Cc: Thomas Hellström thomas.hellstrom@linux.intel.com
Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com
Cc: Rodrigo Vivi rodrigo.vivi@intel.com
Cc: Tvrtko Ursulin tvrtko.ursulin@linux.intel.com
Cc: Nirmoy Das nirmoy.das@intel.com
Balasubramani Vivekanandan (7):
  drm: Relax alignment constraint for destination address
  drm: Add drm_memcpy_from_wc() variant which accepts destination address
  drm/i915: use the memcpy_from_wc call from the drm
  drm/i915/guc: use the memcpy_from_wc call from the drm
  drm/i915/selftests: use the memcpy_from_wc call from the drm
  drm/i915/gt: Avoid direct dereferencing of io memory
  drm/i915: Avoid dereferencing io mapped memory

 drivers/gpu/drm/drm_cache.c                   | 98 +++++++++++++++++--
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  6 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      | 21 ++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 15 ++-
 drivers/gpu/drm/i915/i915_gpu_error.c         | 45 +++++----
 .../drm/i915/selftests/intel_memory_region.c  | 41 +++++---
 include/drm/drm_cache.h                       |  3 +
 7 files changed, 174 insertions(+), 55 deletions(-)
The destination address does not need to be aligned to a 16-byte boundary for the non-temporal instructions to be used while copying. Non-temporal instructions are used only for loading from the source address, and it is the source that has the alignment constraint. We only need to take care to use the right store instruction, depending on whether the destination address is aligned or not.
__memcpy_ntdqu is copied from i915/i915_memcpy.c
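The resulting dispatch can be sketched in portable userspace C. This is only an illustration of the decision logic described above: the enum and function names are hypothetical, and memcpy() stands in for the non-temporal loads and the aligned or unaligned stores, which need kernel FPU context and SSE4.1 in the real code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical labels for the paths, used only in this sketch. */
enum wc_copy_path {
	WC_COPY_NONE,		/* zero length, nothing copied */
	WC_COPY_MEMCPY,		/* source or length not a multiple of 16 */
	WC_COPY_NT_ALIGNED,	/* non-temporal load, aligned store */
	WC_COPY_NT_UNALIGNED,	/* non-temporal load, unaligned store */
};

static enum wc_copy_path wc_copy_sketch(void *dst, const void *src, size_t len)
{
	/* Only the source address and the length constrain the fast path. */
	if (((uintptr_t)src | len) & 15) {
		memcpy(dst, src, len);
		return WC_COPY_MEMCPY;
	}
	if (!len)
		return WC_COPY_NONE;

	/* memcpy() stands in for the load/store loop on either path. */
	memcpy(dst, src, len);
	return ((uintptr_t)dst & 15) ? WC_COPY_NT_UNALIGNED : WC_COPY_NT_ALIGNED;
}
```

The destination alignment only selects which store flavour is used; it no longer forces the plain memcpy() path.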
Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Cc: Maxime Ripard mripard@kernel.org
Cc: Thomas Zimmermann tzimmermann@suse.de
Cc: David Airlie airlied@linux.ie
Cc: Daniel Vetter daniel@ffwll.ch
Cc: Chris Wilson chris.p.wilson@intel.com

Signed-off-by: Balasubramani Vivekanandan balasubramani.vivekanandan@intel.com
Reviewed-by: Lucas De Marchi lucas.demarchi@intel.com
---
 drivers/gpu/drm/drm_cache.c | 44 ++++++++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index c3e6e615bf09..a21c1350eb09 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -278,18 +278,50 @@ static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)
 	kernel_fpu_end();
 }
 
+static void __memcpy_ntdqu(void *dst, const void *src, unsigned long len)
+{
+	kernel_fpu_begin();
+
+	while (len >= 4) {
+		asm("movntdqa (%0), %%xmm0\n"
+		    "movntdqa 16(%0), %%xmm1\n"
+		    "movntdqa 32(%0), %%xmm2\n"
+		    "movntdqa 48(%0), %%xmm3\n"
+		    "movups %%xmm0, (%1)\n"
+		    "movups %%xmm1, 16(%1)\n"
+		    "movups %%xmm2, 32(%1)\n"
+		    "movups %%xmm3, 48(%1)\n"
+		    :: "r" (src), "r" (dst) : "memory");
+		src += 64;
+		dst += 64;
+		len -= 4;
+	}
+	while (len--) {
+		asm("movntdqa (%0), %%xmm0\n"
+		    "movups %%xmm0, (%1)\n"
+		    :: "r" (src), "r" (dst) : "memory");
+		src += 16;
+		dst += 16;
+	}
+
+	kernel_fpu_end();
+}
+
 /*
  * __drm_memcpy_from_wc copies @len bytes from @src to @dst using
- * non-temporal instructions where available. Note that all arguments
- * (@src, @dst) must be aligned to 16 bytes and @len must be a multiple
- * of 16.
+ * non-temporal instructions where available. Note that @src must be aligned to
+ * 16 bytes and @len must be a multiple of 16.
  */
 static void __drm_memcpy_from_wc(void *dst, const void *src, unsigned long len)
 {
-	if (unlikely(((unsigned long)dst | (unsigned long)src | len) & 15))
+	if (unlikely(((unsigned long)src | len) & 15)) {
 		memcpy(dst, src, len);
-	else if (likely(len))
-		__memcpy_ntdqa(dst, src, len >> 4);
+	} else if (likely(len)) {
+		if (IS_ALIGNED((unsigned long)dst, 16))
+			__memcpy_ntdqa(dst, src, len >> 4);
+		else
+			__memcpy_ntdqu(dst, src, len >> 4);
+	}
 }
 
 /**
Fast copy using non-temporal instructions for x86 currently exists in two places: one in the i915 driver at i915/i915_memcpy.c and another in drm_cache.c. The plan is to remove the duplicate implementation in the i915 driver and use the functions from drm_cache.c.
A variant of drm_memcpy_from_wc() is added in drm_cache.c that accepts a plain address instead of an iosys_map for the destination. It is very common in i915 to copy from WC memory, which may be io memory or system memory, to a destination address pointing to system memory. To avoid the overhead of creating an iosys_map for the destination, the new variant accepts the address directly.
A new function is also exported from drm_cache.c to report whether fast copy is supported by the platform. It is required for i915.
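How a caller would use such a variant can be sketched with a userspace mock. Everything prefixed with mock_ below is hypothetical and only imitates the shape of struct iosys_map and its two setters; in the kernel the io-memory pointer carries the __iomem annotation and the copy picks movntdqa or a fallback, whereas here a plain memcpy() stands in for both.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace mock of the kernel's struct iosys_map, for illustration only. */
struct mock_iosys_map {
	union {
		void *vaddr_iomem;	/* an __iomem pointer in the kernel */
		void *vaddr;
	};
	bool is_iomem;
};

static void mock_map_set_vaddr(struct mock_iosys_map *map, void *vaddr)
{
	map->vaddr = vaddr;
	map->is_iomem = false;
}

static void mock_map_set_vaddr_iomem(struct mock_iosys_map *map,
				     void *vaddr_iomem)
{
	map->vaddr_iomem = vaddr_iomem;
	map->is_iomem = true;
}

/* The source is described by a map; the destination is a plain pointer. */
static void mock_memcpy_from_wc_vaddr(void *dst,
				      const struct mock_iosys_map *src,
				      size_t len)
{
	const void *from = src->is_iomem ? src->vaddr_iomem : src->vaddr;

	memcpy(dst, from, len);	/* stand-in for the fast or fallback copy */
}
```

The caller describes only the source with a map, choosing the setter that matches where the source lives, and passes the system-memory destination as an ordinary pointer.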
Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com
Cc: Maxime Ripard mripard@kernel.org
Cc: Thomas Zimmermann tzimmermann@suse.de
Cc: David Airlie airlied@linux.ie
Cc: Daniel Vetter daniel@ffwll.ch
Cc: Thomas Hellström thomas.hellstrom@linux.intel.com
Cc: Lucas De Marchi lucas.demarchi@intel.com

Signed-off-by: Balasubramani Vivekanandan balasubramani.vivekanandan@intel.com
---
 drivers/gpu/drm/drm_cache.c | 54 +++++++++++++++++++++++++++++++++++++
 include/drm/drm_cache.h     |  3 +++
 2 files changed, 57 insertions(+)
diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index a21c1350eb09..97959eecc300 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -358,6 +358,54 @@ void drm_memcpy_from_wc(struct iosys_map *dst,
 }
 EXPORT_SYMBOL(drm_memcpy_from_wc);
 
+/**
+ * drm_memcpy_from_wc_vaddr - Perform the fastest available memcpy from a source
+ * that may be WC to a destination in system memory.
+ * @dst: The destination pointer
+ * @src: The source pointer
+ * @len: The size of the area to transfer in bytes
+ *
+ * Same as drm_memcpy_from_wc except destination is accepted as system memory
+ * address. Useful in situations where passing destination address as iosys_map
+ * is simply an overhead and can be avoided.
+ */
+void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src,
+			      unsigned long len)
+{
+	if (WARN_ON(in_interrupt())) {
+		iosys_map_memcpy_from(dst, src, 0, len);
+		return;
+	}
+
+	if (static_branch_likely(&has_movntdqa)) {
+		__drm_memcpy_from_wc(dst,
+				     src->is_iomem ?
+				     (void const __force *)src->vaddr_iomem :
+				     src->vaddr,
+				     len);
+		return;
+	}
+
+	iosys_map_memcpy_from(dst, src, 0, len);
+}
+EXPORT_SYMBOL(drm_memcpy_from_wc_vaddr);
+
+/*
+ * drm_memcpy_fastcopy_supported - Returns if fast copy using non-temporal
+ * instructions is supported
+ *
+ * Returns true if platform has support for fast copying from wc memory type
+ * using non-temporal instructions. Else false.
+ */
+bool drm_memcpy_fastcopy_supported(void)
+{
+	if (static_branch_likely(&has_movntdqa))
+		return true;
+
+	return false;
+}
+EXPORT_SYMBOL(drm_memcpy_fastcopy_supported);
+
 /*
  * drm_memcpy_init_early - One time initialization of the WC memcpy code
  */
@@ -382,6 +430,12 @@ void drm_memcpy_from_wc(struct iosys_map *dst,
 }
 EXPORT_SYMBOL(drm_memcpy_from_wc);
 
+bool drm_memcpy_fastcopy_supported(void)
+{
+	return false;
+}
+EXPORT_SYMBOL(drm_memcpy_fastcopy_supported);
+
 void drm_memcpy_init_early(void)
 {
 }
diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
index 22deb216b59c..8f48e4dcd7dc 100644
--- a/include/drm/drm_cache.h
+++ b/include/drm/drm_cache.h
@@ -77,4 +77,7 @@ void drm_memcpy_init_early(void);
 void drm_memcpy_from_wc(struct iosys_map *dst,
 			const struct iosys_map *src,
 			unsigned long len);
+bool drm_memcpy_fastcopy_supported(void);
+void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src,
+			      unsigned long len);
 #endif
On Thu, Mar 03, 2022 at 11:30:08PM +0530, Balasubramani Vivekanandan wrote:
> +/**
> + * drm_memcpy_from_wc_vaddr - Perform the fastest available memcpy from a source
> + * that may be WC to a destination in system memory.
> + * @dst: The destination pointer
> + * @src: The source pointer
> + * @len: The size of the area to transfer in bytes
> + *
> + * Same as drm_memcpy_from_wc except destination is accepted as system memory

drm_memcpy_from_wc() for kernel doc

> + * address. Useful in situations where passing destination address as iosys_map
> + * is simply an overhead and can be avoided.
> + */
> +void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src,

As in the first version, still don't like the name, but ok.

Reviewed-by: Lucas De Marchi lucas.demarchi@intel.com

Lucas De Marchi
The memcpy_from_wc functions in i915_memcpy.c will be removed and replaced by the implementation in drm_cache.c. Update i915_gem_object.c to use the functions provided by drm_cache.c.
Signed-off-by: Balasubramani Vivekanandan balasubramani.vivekanandan@intel.com
Reviewed-by: Lucas De Marchi lucas.demarchi@intel.com
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 372bc220faeb..5de657c3190e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -438,6 +438,8 @@ i915_gem_object_read_from_page_iomap(struct drm_i915_gem_object *obj, u64 offset
 {
 	void __iomem *src_map;
 	void __iomem *src_ptr;
+	struct iosys_map src;
+
 	dma_addr_t dma = i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT);
 
 	src_map = io_mapping_map_wc(&obj->mm.region->iomap,
@@ -445,8 +447,8 @@ i915_gem_object_read_from_page_iomap(struct drm_i915_gem_object *obj, u64 offset
 				    PAGE_SIZE);
 
 	src_ptr = src_map + offset_in_page(offset);
-	if (!i915_memcpy_from_wc(dst, (void __force *)src_ptr, size))
-		memcpy_fromio(dst, src_ptr, size);
+	iosys_map_set_vaddr_iomem(&src, src_ptr);
+	drm_memcpy_from_wc_vaddr(dst, &src, size);
 
 	io_mapping_unmap(src_map);
 }
The memcpy_from_wc functions in i915_memcpy.c will be removed and replaced by the implementation in drm_cache.c. Update the GuC log code to use the functions provided by drm_cache.c.
v2: Check whether the log object is allocated from local memory or system memory and set up the iosys_map accordingly (Lucas)
Cc: Lucas De Marchi lucas.demarchi@intel.com

Signed-off-by: Balasubramani Vivekanandan balasubramani.vivekanandan@intel.com
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
index a24dc6441872..b9db765627ea 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c
@@ -3,6 +3,7 @@
  * Copyright © 2014-2019 Intel Corporation
  */
 
+#include <drm/drm_cache.h>
 #include <linux/debugfs.h>
 #include <linux/string_helpers.h>
 
@@ -206,6 +207,7 @@ static void guc_read_update_log_buffer(struct intel_guc_log *log)
 	enum guc_log_buffer_type type;
 	void *src_data, *dst_data;
 	bool new_overflow;
+	struct iosys_map src_map;
 
 	mutex_lock(&log->relay.lock);
 
@@ -282,14 +284,21 @@ static void guc_read_update_log_buffer(struct intel_guc_log *log)
 		}
 
 		/* Just copy the newly written data */
+		if (i915_gem_object_is_lmem(log->vma->obj))
+			iosys_map_set_vaddr_iomem(&src_map, (void __iomem *)src_data);
+		else
+			iosys_map_set_vaddr(&src_map, src_data);
+
 		if (read_offset > write_offset) {
-			i915_memcpy_from_wc(dst_data, src_data, write_offset);
+			drm_memcpy_from_wc_vaddr(dst_data, &src_map,
+						 write_offset);
 			bytes_to_copy = buffer_size - read_offset;
 		} else {
 			bytes_to_copy = write_offset - read_offset;
 		}
-		i915_memcpy_from_wc(dst_data + read_offset,
-				    src_data + read_offset, bytes_to_copy);
+		iosys_map_incr(&src_map, read_offset);
+		drm_memcpy_from_wc_vaddr(dst_data + read_offset, &src_map,
+					 bytes_to_copy);
 
 		src_data += buffer_size;
 		dst_data += buffer_size;
On Thu, Mar 03, 2022 at 11:30:10PM +0530, Balasubramani Vivekanandan wrote:
> +		if (i915_gem_object_is_lmem(log->vma->obj))
> +			iosys_map_set_vaddr_iomem(&src_map, (void __iomem *)src_data);
> +		else
> +			iosys_map_set_vaddr(&src_map, src_data);
It would be better to keep this outside of the loop. So inside the loop we can use only iosys_map_incr(&src_map, buffer_size). However you'd also have to handle the read_offset. The iosys_map_ API has both a src_offset and dst_offset due to situations like that. Maybe this is missing in the new drm_memcpy_* function you're adding?
This function was not correct wrt to IO memory access with the other 2 places in this function doing plain memcpy(). Since we are starting to use iosys_map here, we probably should handle this commit as "migrate to iosys_map", and convert those. In your current final state we have 3 variables aliasing the same memory location. IMO it will be error prone to keep it like that
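The offset-based approach suggested above can be sketched in userspace C. The helper names are hypothetical; copy_from_offset() is only modelled on the offset-taking style of iosys_map_memcpy_from(), and a plain base pointer stands in for a map that is set up once outside the loop.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper modelled on the offset-taking style of
 * iosys_map_memcpy_from(); a plain pointer stands in for the map.
 */
static void copy_from_offset(void *dst, const void *src_base,
			     size_t src_offset, size_t len)
{
	memcpy(dst, (const unsigned char *)src_base + src_offset, len);
}

/*
 * Copy the newly written part of one ring-style log buffer that starts at
 * buf_offset inside src_base. The base is never modified, so the same
 * mapping can be reused for every buffer in the loop.
 * Returns the number of bytes copied.
 */
static size_t copy_new_data(unsigned char *dst, const void *src_base,
			    size_t buf_offset, size_t buffer_size,
			    size_t read_offset, size_t write_offset)
{
	size_t copied = 0;
	size_t bytes_to_copy;

	if (read_offset > write_offset) {
		/* The writer wrapped: copy [0, write_offset) first. */
		copy_from_offset(dst, src_base, buf_offset, write_offset);
		copied += write_offset;
		bytes_to_copy = buffer_size - read_offset;
	} else {
		bytes_to_copy = write_offset - read_offset;
	}

	/* Then copy the range that starts at read_offset. */
	copy_from_offset(dst + read_offset, src_base,
			 buf_offset + read_offset, bytes_to_copy);
	return copied + bytes_to_copy;
}
```

Because every call passes an explicit offset, there is a single description of the source memory instead of several variables that alias it and have to be advanced in step.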
+Michal, some questions:
- I'm not very familiar with the relayfs API. Is the `dst_data += PAGE_SIZE;` really correct?
- Could you double check this patch and ack if ok?
Heads up that since the log buffer is potentially in lmem, we will need to convert this function to take that into account. All those accesses to log_buf_state need to use the proper kernel abstraction for system vs I/O memory.
thanks Lucas De Marchi
On 21.03.2022 22:14, Lucas De Marchi wrote:
> +Michal, some questions:

@Lucas, better to ask Alan who is making some changes around GuC log

@Alan, can you help answer below questions?

thanks,
Michal

> - I'm not very familiar with the relayfs API. Is the `dst_data += PAGE_SIZE;` really correct?
> - Could you double check this patch and ack if ok?
>
> Heads up that since the log buffer is potentially in lmem, we will need to convert this function to take that into account. All those accesses to log_buf_state need to use the proper kernel abstraction for system vs I/O memory.
>
> thanks
> Lucas De Marchi
On 21.03.2022 14:14, Lucas De Marchi wrote:
> It would be better to keep this outside of the loop. So inside the loop we can use only iosys_map_incr(&src_map, buffer_size). However you'd also have to handle the read_offset. The iosys_map_ API has both a src_offset and dst_offset due to situations like that. Maybe this is missing in the new drm_memcpy_* function you're adding?
>
> This function was not correct wrt to IO memory access with the other 2 places in this function doing plain memcpy(). Since we are starting to use iosys_map here, we probably should handle this commit as "migrate to iosys_map", and convert those. In your current final state we have 3 variables aliasing the same memory location. IMO it will be error prone to keep it like that

Yes, it is a good suggestion to completely change the reading of the GuC log for the relay to use the iosys_map interfaces. Though it was planned eventually, doing it now with this series will avoid mixing memcpy() with the drm_memcpy_*() functions (which need iosys_map parameters). I will make the changes.
On 3/21/2022 2:14 PM, Lucas De Marchi wrote:
> +Michal, some questions:
>
> - I'm not very familiar with the relayfs API. Is the `dst_data += PAGE_SIZE;` really correct?

This is a bit weird due to how i915 uses the relay for the GuC logs, but it should be functionally correct. Each relay buffer is the same size as the GuC log buffer in i915 (which is guaranteed to be greater than PAGE_SIZE) and we always switch to a new relay buffer each time we dump new data, so we're guaranteed to have the space we need. We do some pointer magic because instead of just blindly copying the whole local log buffer to the relay buffer, we copy the header (which is in the first page) first, then we copy the rest of the logs (2nd page and onwards) based on what the header tells us has been filled out.

> - Could you double check this patch and ack if ok?

The approach looks good to me, but I agree that at this point we might as well do a full conversion to iosys_map. As you already mentioned, the memcpy that copies the header would also need to be updated for that, because it accesses the same memory as src_data, while the other memcpy is from the local copy of the header to the relay, so it should be safe to not convert.

Daniele
The memcpy_from_wc functions in i915_memcpy.c will be removed and replaced by the implementation in drm_cache.c. Update the memory region selftests to use the functions provided by drm_cache.c.
v2: Check whether the source and destination memory addresses are from local memory or system memory and initialize the iosys_map accordingly (Lucas)
Cc: Lucas De Marchi lucas.demarchi@intel.com
Cc: Matthew Auld matthew.auld@intel.com
Cc: Thomas Hellström thomas.hellstrom@linux.intel.com

Signed-off-by: Balasubramani Vivekanandan balasubramani.vivekanandan@intel.com
---
 .../drm/i915/selftests/intel_memory_region.c | 41 +++++++++++++------
 1 file changed, 28 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
index ba32893e0873..d16ecb905f3b 100644
--- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
@@ -7,6 +7,7 @@
 #include <linux/sort.h>
 
 #include <drm/drm_buddy.h>
+#include <drm/drm_cache.h>
 
 #include "../i915_selftest.h"
 
@@ -1133,7 +1134,7 @@ static const char *repr_type(u32 type)
 
 static struct drm_i915_gem_object *
 create_region_for_mapping(struct intel_memory_region *mr, u64 size, u32 type,
-			  void **out_addr)
+			  struct iosys_map *out_addr)
 {
 	struct drm_i915_gem_object *obj;
 	void *addr;
@@ -1153,7 +1154,11 @@ create_region_for_mapping(struct intel_memory_region *mr, u64 size, u32 type,
 		return addr;
 	}
 
-	*out_addr = addr;
+	if (i915_gem_object_is_lmem(obj))
+		iosys_map_set_vaddr_iomem(out_addr, (void __iomem *)addr);
+	else
+		iosys_map_set_vaddr(out_addr, addr);
+
 	return obj;
 }
 
@@ -1164,24 +1169,33 @@ static int wrap_ktime_compare(const void *A, const void *B)
 	return ktime_compare(*a, *b);
 }
 
-static void igt_memcpy_long(void *dst, const void *src, size_t size)
+static void igt_memcpy_long(struct iosys_map *dst, struct iosys_map *src,
+			    size_t size)
 {
-	unsigned long *tmp = dst;
-	const unsigned long *s = src;
+	unsigned long *tmp = dst->is_iomem ?
+				(unsigned long __force *)dst->vaddr_iomem :
+				dst->vaddr;
+	const unsigned long *s = src->is_iomem ?
+				(unsigned long __force *)src->vaddr_iomem :
+				src->vaddr;
 
 	size = size / sizeof(unsigned long);
 	while (size--)
 		*tmp++ = *s++;
 }
 
-static inline void igt_memcpy(void *dst, const void *src, size_t size)
+static inline void igt_memcpy(struct iosys_map *dst, struct iosys_map *src,
+			      size_t size)
 {
-	memcpy(dst, src, size);
+	memcpy(dst->is_iomem ? (void __force *)dst->vaddr_iomem : dst->vaddr,
+	       src->is_iomem ? (void __force *)src->vaddr_iomem : src->vaddr,
+	       size);
 }
 
-static inline void igt_memcpy_from_wc(void *dst, const void *src, size_t size)
+static inline void igt_memcpy_from_wc(struct iosys_map *dst, struct iosys_map *src,
+				      size_t size)
 {
-	i915_memcpy_from_wc(dst, src, size);
+	drm_memcpy_from_wc(dst, src, size);
 }
 
 static int _perf_memcpy(struct intel_memory_region *src_mr,
@@ -1191,7 +1205,8 @@ static int _perf_memcpy(struct intel_memory_region *src_mr,
 	struct drm_i915_private *i915 = src_mr->i915;
 	const struct {
 		const char *name;
-		void (*copy)(void *dst, const void *src, size_t size);
+		void (*copy)(struct iosys_map *dst, struct iosys_map *src,
+			     size_t size);
 		bool skip;
 	} tests[] = {
 		{
@@ -1205,11 +1220,11 @@ static int _perf_memcpy(struct intel_memory_region *src_mr,
 		{
 			"memcpy_from_wc",
 			igt_memcpy_from_wc,
-			!i915_has_memcpy_from_wc(),
+			!drm_memcpy_fastcopy_supported(),
 		},
 	};
 	struct drm_i915_gem_object *src, *dst;
-	void *src_addr, *dst_addr;
+	struct iosys_map src_addr, dst_addr;
 	int ret = 0;
 	int i;
 
@@ -1237,7 +1252,7 @@ static int _perf_memcpy(struct intel_memory_region *src_mr,
 
 			t0 = ktime_get();
 
-			tests[i].copy(dst_addr, src_addr, size);
+			tests[i].copy(&dst_addr, &src_addr, size);
 
 			t1 = ktime_get();
 			t[pass] = ktime_sub(t1, t0);
+Thomas Zimmermann and +Daniel Vetter
Could you take a look below regarding the I/O to I/O memory access?
On Thu, Mar 03, 2022 at 11:30:11PM +0530, Balasubramani Vivekanandan wrote:
memcpy_from_wc functions in i915_memcpy.c will be removed and replaced by the implementation in drm_cache.c. Updated to use the functions provided by drm_cache.c.
v2: check if the source and destination memory address is from local memory or system memory and initialize the iosys_map accordingly (Lucas)
Cc: Lucas De Marchi lucas.demarchi@intel.com
Cc: Matthew Auld matthew.auld@intel.com
Cc: Thomas Hellström thomas.hellstrom@linux.intel.com
Signed-off-by: Balasubramani Vivekanandan balasubramani.vivekanandan@intel.com

 .../drm/i915/selftests/intel_memory_region.c | 41 +++++++++++++------
 1 file changed, 28 insertions(+), 13 deletions(-)
@@ -1164,24 +1169,33 @@ static int wrap_ktime_compare(const void *A, const void *B) return ktime_compare(*a, *b); }
-static void igt_memcpy_long(void *dst, const void *src, size_t size)
+static void igt_memcpy_long(struct iosys_map *dst, struct iosys_map *src,
+			    size_t size)
 {
-	unsigned long *tmp = dst;
-	const unsigned long *s = src;
+	unsigned long *tmp = dst->is_iomem ?
+				(unsigned long __force *)dst->vaddr_iomem :
+				dst->vaddr;

if we access vaddr_iomem/vaddr we basically break the promise of abstracting system and I/O memory. There is no point in receiving struct iosys_map as an argument and then breaking the abstraction.

+	const unsigned long *s = src->is_iomem ?
+				(unsigned long __force *)src->vaddr_iomem :
+				src->vaddr;

 	size = size / sizeof(unsigned long);
 	while (size--)
 		*tmp++ = *s++;

so we basically want to copy from one place to the other on a word boundary. And it may be

a) I/O -> I/O or
b) system -> I/O or
c) I/O -> system

(b) and (c) should work, but AFAICS (a) is not possible with the current iosys-map API. Not even the underlying APIs have that abstracted. Both memcpy_fromio() and memcpy_toio() expect one of them to be RAM (system memory).

I remember seeing people using a temporary buffer in system memory for proxying the copy. But maybe we need an abstraction for that? Also adding Thomas Zimmermann here for that question.

and since this is a selftest testing the performance of the memcpy from one memory region to the other, it would be good to have this test executed to a) make sure it still works and b) record in the commit message any possible slowdown we are incurring.

thanks
Lucas De Marchi
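For readers following the I/O -> I/O question, the temporary-buffer approach mentioned above can be sketched roughly as below. This is a userspace model only, not kernel code: model_memcpy_fromio() and model_memcpy_toio() are stand-ins for memcpy_fromio()/memcpy_toio(), and model_memcpy_io_to_io() is a hypothetical helper name that does not exist in the iosys-map API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace stand-ins (assumption): in the kernel these would be
 * memcpy_fromio()/memcpy_toio() operating on __iomem pointers.
 */
static void model_memcpy_fromio(void *dst, const volatile void *src, size_t n)
{
	memcpy(dst, (const void *)src, n);
}

static void model_memcpy_toio(volatile void *dst, const void *src, size_t n)
{
	memcpy((void *)dst, src, n);
}

/*
 * Hypothetical helper: copy I/O -> I/O through a bounce buffer in system
 * memory, one chunk at a time, so that every step has RAM on one side.
 */
static void model_memcpy_io_to_io(volatile void *dst, const volatile void *src,
				  size_t len, void *bounce, size_t bounce_len)
{
	size_t off = 0;

	assert(bounce != NULL && bounce_len > 0);

	while (off < len) {
		size_t chunk = len - off < bounce_len ? len - off : bounce_len;

		model_memcpy_fromio(bounce, (const volatile char *)src + off, chunk);
		model_memcpy_toio((volatile char *)dst + off, bounce, chunk);
		off += chunk;
	}
}
```

Each byte is moved twice, so a perf selftest built on such a helper would measure the proxy cost rather than a direct copy.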
Now Cc'ing Daniel properly
Lucas De Marchi
On 21.03.2022 16:07, Lucas De Marchi wrote:
if we access vaddr_iomem/vaddr we basically break the promise of abstracting system and I/O memory. There is no point in receiving struct iosys_map as argument and then break the abstraction.
Hi Lucas,

In this patch I did not attempt to convert the memory accesses to the iosys_map interfaces that abstract system and I/O memory. The test functions take iosys_map structures instead of raw pointers for the benefit of igt_memcpy_from_wc(), which needs iosys_map variables to pass to drm_memcpy_from_wc(). The other test functions also receive iosys_map structures, but I have kept their behavior the same as before by converting the iosys_map structures back to pointers.

I briefly tried using iosys_map structures to perform the memory copy inside the other test functions, but dropped it after I realized that support is lacking for (a) mentioned below in your comment. Since bringing in support for (a) requires some discussion, I did not proceed with it.

Regards,
Bala
+	const unsigned long *s = src->is_iomem ?
+				(unsigned long __force *)src->vaddr_iomem :
+				src->vaddr;

 	size = size / sizeof(unsigned long);
 	while (size--)
 		*tmp++ = *s++;

so we basically want to copy from one place to the other on a word boundary. And it may be

a) I/O -> I/O or
b) system -> I/O or
c) I/O -> system

(b) and (c) should work, but AFAICS (a) is not possible with the current iosys-map API. Not even the underlying APIs have that abstracted. Both memcpy_fromio() and memcpy_toio() expect one of them to be RAM (system memory).

I remember seeing people using a temporary buffer in system memory for proxying the copy. But maybe we need an abstraction for that? Also adding Thomas Zimmermann here for that question.

and since this is a selftest testing the performance of the memcpy from one memory region to the other, it would be good to have this test executed to a) make sure it still works and b) record in the commit message any possible slowdown we are incurring.

thanks
Lucas De Marchi
io mapped memory should not be dereferenced directly, to ensure portability. io memory should be read, written and copied using helper functions.

i915_memcpy_from_wc() was used to copy the data from io memory to a temporary buffer, and a pointer to the temporary buffer was passed to the CRC calculation function. But i915_memcpy_from_wc() only copies if the platform supports fast copy using non-temporal instructions. Otherwise the pointer to io memory was passed for CRC calculation. The CRC function then dereferences io memory directly, which would not work properly on non-x86 platforms.

To make it portable, the temporary buffer must always be used for the CRC, never io memory.

drm_memcpy_from_wc_vaddr() is now used for copying instead of i915_memcpy_from_wc() for two reasons:
- i915_memcpy_from_wc() will be deprecated.
- drm_memcpy_from_wc_vaddr() does not fail if fast copy is not supported; it uses memcpy_fromio() as fallback for the copy.
Cc: Matthew Brost matthew.brost@intel.com
Cc: Michał Winiarski michal.winiarski@intel.com
Signed-off-by: Balasubramani Vivekanandan balasubramani.vivekanandan@intel.com
---
 drivers/gpu/drm/i915/gt/selftest_reset.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c
index 37c38bdd5f47..79d2bd7ef3b9 100644
--- a/drivers/gpu/drm/i915/gt/selftest_reset.c
+++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
@@ -3,6 +3,7 @@
  * Copyright © 2018 Intel Corporation
  */

+#include <drm/drm_cache.h>
 #include <linux/crc32.h>

 #include "gem/i915_gem_stolen.h"
@@ -82,7 +83,7 @@ __igt_reset_stolen(struct intel_gt *gt,
 	for (page = 0; page < num_pages; page++) {
 		dma_addr_t dma = (dma_addr_t)dsm->start + (page << PAGE_SHIFT);
 		void __iomem *s;
-		void *in;
+		struct iosys_map src_map;

 		ggtt->vm.insert_page(&ggtt->vm, dma,
 				     ggtt->error_capture.start,
@@ -98,10 +99,9 @@ __igt_reset_stolen(struct intel_gt *gt,
 					     ((page + 1) << PAGE_SHIFT) - 1))
 			memset_io(s, STACK_MAGIC, PAGE_SIZE);

-		in = (void __force *)s;
-		if (i915_memcpy_from_wc(tmp, in, PAGE_SIZE))
-			in = tmp;
-		crc[page] = crc32_le(0, in, PAGE_SIZE);
+		iosys_map_set_vaddr_iomem(&src_map, s);
+		drm_memcpy_from_wc_vaddr(tmp, &src_map, PAGE_SIZE);
+		crc[page] = crc32_le(0, tmp, PAGE_SIZE);

 		io_mapping_unmap(s);
 	}
@@ -122,7 +122,7 @@ __igt_reset_stolen(struct intel_gt *gt,
 	for (page = 0; page < num_pages; page++) {
 		dma_addr_t dma = (dma_addr_t)dsm->start + (page << PAGE_SHIFT);
 		void __iomem *s;
-		void *in;
+		struct iosys_map src_map;
 		u32 x;

 		ggtt->vm.insert_page(&ggtt->vm, dma,
@@ -134,10 +134,9 @@ __igt_reset_stolen(struct intel_gt *gt,
 				      ggtt->error_capture.start,
 				      PAGE_SIZE);

-		in = (void __force *)s;
-		if (i915_memcpy_from_wc(tmp, in, PAGE_SIZE))
-			in = tmp;
-		x = crc32_le(0, in, PAGE_SIZE);
+		iosys_map_set_vaddr_iomem(&src_map, s);
+		drm_memcpy_from_wc_vaddr(tmp, &src_map, PAGE_SIZE);
+		x = crc32_le(0, tmp, PAGE_SIZE);

 		if (x != crc[page] &&
 		    !__drm_mm_interval_first(&gt->i915->mm.stolen,
@@ -146,7 +145,7 @@ __igt_reset_stolen(struct intel_gt *gt,
 				pr_debug("unused stolen page %pa modified by GPU reset\n",
 					 &page);
 			if (count++ == 0)
-				igt_hexdump(in, PAGE_SIZE);
+				igt_hexdump(tmp, PAGE_SIZE);
 			max = page;
 		}

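The behavioural difference this patch relies on can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the drm implementation: model_fast_copy() mimics a copy that refuses when the fast path is unusable, model_copy_from_wc() mimics a copy that always completes via a fallback, and model_checksum() is a toy stand-in for crc32_le(). All names, and the 16-byte alignment rule used here, are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Old-style model: returns false when the fast path cannot be used. */
static bool model_fast_copy(void *dst, const void *src, size_t len,
			    bool have_fast)
{
	/* Assumed constraint for this model: everything 16-byte aligned. */
	if (!have_fast || ((((uintptr_t)src | (uintptr_t)dst | len) & 15) != 0))
		return false;

	memcpy(dst, src, len);	/* stands in for the non-temporal loads */
	return true;
}

/* New-style model: always completes the copy, falling back if needed. */
static void model_copy_from_wc(void *dst, const void *src, size_t len,
			       bool have_fast)
{
	if (!model_fast_copy(dst, src, len, have_fast))
		memcpy(dst, src, len);	/* stands in for memcpy_fromio() */
}

/* Toy checksum standing in for crc32_le(); it is not a real CRC. */
static uint32_t model_checksum(const unsigned char *p, size_t len)
{
	uint32_t sum = 0;

	while (len--)
		sum = sum * 31 + *p++;
	return sum;
}

/* Checksum a page of "I/O" memory strictly through the temporary buffer. */
static uint32_t model_page_checksum(void *tmp, const void *io, size_t len,
				    bool have_fast)
{
	assert(tmp != NULL);
	model_copy_from_wc(tmp, io, len, have_fast);
	return model_checksum(tmp, len);
}
```

With this shape the checksum input is the same buffer whether or not the fast path is available, which is the property the selftest change is after.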
The pointer passed to zlib_deflate() for compression could point to io mapped memory and might end up being dereferenced directly. io mapped memory is copied to a temporary buffer, which is then passed to zlib_deflate(), only when the platform supports fast copy using non-temporal instructions. If the platform lacks support, the io mapped memory is used directly.

Direct dereferencing of io memory makes the driver non-portable outside x86 and should be avoided.

With this patch, io memory is always copied to a temporary buffer irrespective of platform support for fast copy. The i915_has_memcpy_from_wc() check is removed. drm_memcpy_from_wc_vaddr() is now used for copying instead of i915_memcpy_from_wc() for two reasons:
- i915_memcpy_from_wc() will be deprecated.
- drm_memcpy_from_wc_vaddr() does not fail if fast copy is not supported; it continues copying using memcpy_fromio() as fallback.
Signed-off-by: Balasubramani Vivekanandan balasubramani.vivekanandan@intel.com
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 45 +++++++++++++++------------
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 4967e79806f8..1ca5072b85db 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -259,9 +259,12 @@ static bool compress_init(struct i915_vma_compress *c)
 		return false;
 	}

-	c->tmp = NULL;
-	if (i915_has_memcpy_from_wc())
-		c->tmp = pool_alloc(&c->pool, ALLOW_FAIL);
+	c->tmp = pool_alloc(&c->pool, ALLOW_FAIL);
+	if (!c->tmp) {
+		kfree(zstream->workspace);
+		pool_fini(&c->pool);
+		return false;
+	}

 	return true;
 }
@@ -293,15 +296,17 @@ static void *compress_next_page(struct i915_vma_compress *c,
 }

 static int compress_page(struct i915_vma_compress *c,
-			 void *src,
-			 struct i915_vma_coredump *dst,
-			 bool wc)
+			 struct iosys_map *src,
+			 struct i915_vma_coredump *dst)
 {
 	struct z_stream_s *zstream = &c->zstream;

-	zstream->next_in = src;
-	if (wc && c->tmp && i915_memcpy_from_wc(c->tmp, src, PAGE_SIZE))
+	if (src->is_iomem) {
+		drm_memcpy_from_wc_vaddr(c->tmp, src, PAGE_SIZE);
 		zstream->next_in = c->tmp;
+	} else {
+		zstream->next_in = src->vaddr;
+	}
 	zstream->avail_in = PAGE_SIZE;

 	do {
@@ -390,9 +395,8 @@ static bool compress_start(struct i915_vma_compress *c)
 }

 static int compress_page(struct i915_vma_compress *c,
-			 void *src,
-			 struct i915_vma_coredump *dst,
-			 bool wc)
+			 struct iosys_map *src,
+			 struct i915_vma_coredump *dst)
 {
 	void *ptr;

@@ -400,8 +404,7 @@ static int compress_page(struct i915_vma_compress *c,
 	if (!ptr)
 		return -ENOMEM;

-	if (!(wc && i915_memcpy_from_wc(ptr, src, PAGE_SIZE)))
-		memcpy(ptr, src, PAGE_SIZE);
+	drm_memcpy_from_wc_vaddr(ptr, src, PAGE_SIZE);
 	list_add_tail(&virt_to_page(ptr)->lru, &dst->page_list);
 	cond_resched();

@@ -1055,6 +1058,7 @@ i915_vma_coredump_create(const struct intel_gt *gt,
 	if (drm_mm_node_allocated(&ggtt->error_capture)) {
 		void __iomem *s;
 		dma_addr_t dma;
+		struct iosys_map src;

 		for_each_sgt_daddr(dma, iter, vma_res->bi.pages) {
 			mutex_lock(&ggtt->error_mutex);
@@ -1063,9 +1067,8 @@ i915_vma_coredump_create(const struct intel_gt *gt,
 			mb();

 			s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
-			ret = compress_page(compress,
-					    (void __force *)s, dst,
-					    true);
+			iosys_map_set_vaddr_iomem(&src, s);
+			ret = compress_page(compress, &src, dst);
 			io_mapping_unmap(s);

 			mb();
@@ -1077,6 +1080,7 @@ i915_vma_coredump_create(const struct intel_gt *gt,
 	} else if (vma_res->bi.lmem) {
 		struct intel_memory_region *mem = vma_res->mr;
 		dma_addr_t dma;
+		struct iosys_map src;

 		for_each_sgt_daddr(dma, iter, vma_res->bi.pages) {
 			void __iomem *s;
@@ -1084,15 +1088,15 @@ i915_vma_coredump_create(const struct intel_gt *gt,
 			s = io_mapping_map_wc(&mem->iomap,
 					      dma - mem->region.start,
 					      PAGE_SIZE);
-			ret = compress_page(compress,
-					    (void __force *)s, dst,
-					    true);
+			iosys_map_set_vaddr_iomem(&src, s);
+			ret = compress_page(compress, &src, dst);
 			io_mapping_unmap(s);
 			if (ret)
 				break;
 		}
 	} else {
 		struct page *page;
+		struct iosys_map src;

 		for_each_sgt_page(page, iter, vma_res->bi.pages) {
 			void *s;
@@ -1100,7 +1104,8 @@ i915_vma_coredump_create(const struct intel_gt *gt,
 			drm_clflush_pages(&page, 1);

 			s = kmap(page);
-			ret = compress_page(compress, s, dst, false);
+			iosys_map_set_vaddr(&src, s);
+			ret = compress_page(compress, &src, dst);
 			kunmap(page);

 			drm_clflush_pages(&page, 1);
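The input selection that the zlib variant of compress_page() performs after this patch can be illustrated with a small userspace model. It is a sketch only: struct model_map is a simplified stand-in for struct iosys_map (a single pointer field models both vaddr and vaddr_iomem), the setters mimic iosys_map_set_vaddr()/iosys_map_set_vaddr_iomem(), and model_compress_input() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct iosys_map (assumption for this sketch). */
struct model_map {
	bool is_iomem;
	void *vaddr;	/* models both vaddr and vaddr_iomem */
};

static void model_map_set_vaddr(struct model_map *m, void *p)
{
	m->is_iomem = false;
	m->vaddr = p;
}

static void model_map_set_vaddr_iomem(struct model_map *m, void *p)
{
	m->is_iomem = true;
	m->vaddr = p;
}

/*
 * Return the pointer a compressor may read from: I/O memory is always
 * staged in tmp first, system memory is used in place.
 */
static const void *model_compress_input(const struct model_map *src,
					void *tmp, size_t len)
{
	assert(tmp != NULL);

	if (src->is_iomem) {
		/* stands in for drm_memcpy_from_wc_vaddr(tmp, src, len) */
		memcpy(tmp, src->vaddr, len);
		return tmp;
	}
	return src->vaddr;
}
```

Because the staging buffer is now mandatory for I/O sources, the patch also makes compress_init() fail when that buffer cannot be allocated instead of continuing without it.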
looks good to me overall but I would get others r-b.
Patches 1-3 Reviewed-by: Nirmoy Das nirmoy.das@intel.com
Patches 4-7 Acked-by: Nirmoy Das nirmoy.das@intel.com
On 03/03/2022 19:00, Balasubramani Vivekanandan wrote:
Balasubramani Vivekanandan (7):
  drm: Relax alignment constraint for destination address
  drm: Add drm_memcpy_from_wc() variant which accepts destination address
  drm/i915: use the memcpy_from_wc call from the drm
  drm/i915/guc: use the memcpy_from_wc call from the drm
  drm/i915/selftests: use the memcpy_from_wc call from the drm
  drm/i915/gt: Avoid direct dereferencing of io memory
  drm/i915: Avoid dereferencing io mapped memory

 drivers/gpu/drm/drm_cache.c                   | 98 +++++++++++++++++--
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  6 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      | 21 ++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 15 ++-
 drivers/gpu/drm/i915/i915_gpu_error.c         | 45 +++++----
 .../drm/i915/selftests/intel_memory_region.c  | 41 +++++---
 include/drm/drm_cache.h                       |  3 +
 7 files changed, 174 insertions(+), 55 deletions(-)