drm_memcpy_from_wc() performs a fast copy from WC memory using non-temporal instructions. There are currently two similar implementations of this function: drm_memcpy_from_wc() in drm_cache.c and i915_memcpy_from_wc() in i915/i915_memcpy.c. drm_memcpy_from_wc() was added recently through the series https://patchwork.freedesktop.org/patch/436276/?series=90681&rev=6
The goal of this patch series is to convert all users of i915_memcpy_from_wc() to drm_memcpy_from_wc(), so that there is a single common implementation in drm, and eventually to remove the copy from i915.
Another benefit of using memcpy functions from drm is that drm_memcpy_from_wc() is available for non-x86 architectures. i915_memcpy_from_wc() is implemented only for x86 and prevents building i915 for ARM64. drm_memcpy_from_wc() does fast copy using non-temporal instructions for x86 and for other architectures makes use of memcpy() family of functions as fallback.
Another major difference is that, unlike i915_memcpy_from_wc(), drm_memcpy_from_wc() does not fail if the passed address is not suitably aligned for non-temporal load instructions or if the platform lacks support for those instructions (non-temporal loads are provided by the SSE4.1 instruction set extension). Instead, drm_memcpy_from_wc() falls back to regular copy functions to complete the copy. This relieves the caller from checking a return value, as is required with i915_memcpy_from_wc(), and from explicitly invoking a fallback.
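For illustration only (not part of the series): a minimal userspace sketch of the difference in calling convention described above. The names old_style_copy(), new_style_copy() and fast_path_ok() are hypothetical, memcpy() stands in for the non-temporal copy, and the alignment rule is simplified to source address and length.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical check: is the fast path usable for these arguments? */
static bool fast_path_ok(const void *src, size_t len, bool has_sse41)
{
	return has_sse41 && ((((uintptr_t)src) | len) & 15) == 0;
}

/*
 * Old style (models i915_memcpy_from_wc()): returns false WITHOUT
 * copying when the fast path cannot be used, so the caller must
 * check the result and fall back on its own.
 */
static bool old_style_copy(void *dst, const void *src, size_t len,
			   bool has_sse41)
{
	if (!fast_path_ok(src, len, has_sse41))
		return false;
	memcpy(dst, src, len);	/* stands in for the non-temporal copy */
	return true;
}

/*
 * New style (models drm_memcpy_from_wc()): the fallback lives inside
 * the helper, so the copy always completes and nothing is returned.
 */
static void new_style_copy(void *dst, const void *src, size_t len,
			   bool has_sse41)
{
	if (!old_style_copy(dst, src, len, has_sse41))
		memcpy(dst, src, len);	/* internal fallback */
}
```

With the old style every call site needs an "if (!copy(...)) fallback(...)" pair; with the new style the call site is a single statement.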
A follow-up series will be created to remove the memcpy_from_wc functions from i915 once all dependencies on them are removed.
v2: Fixed missing check to find if the address is from system memory or io memory and use the right initialization function to construct the iosys_map structure (Review feedback from Lucas)
v3: "drm/i915/guc: use the memcpy_from_wc call from the drm" replaced by patch "drm/i915/guc: use iosys_map abstraction to access GuC log". New patch does a wider change compared to the old patch. It completely changes the access to GuC log using iosys_map abstraction, in addition to using drm_memcpy_from_wc.
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Balasubramani Vivekanandan (7):
  drm: Relax alignment constraint for destination address
  drm: Add drm_memcpy_from_wc() variant which accepts destination address
  drm/i915: use the memcpy_from_wc call from the drm
  drm/i915/guc: use iosys_map abstraction to access GuC log
  drm/i915/selftests: use the memcpy_from_wc call from the drm
  drm/i915/gt: Avoid direct dereferencing of io memory
  drm/i915: Avoid dereferencing io mapped memory
 drivers/gpu/drm/drm_cache.c                   | 99 +++++++++++++++++--
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  8 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      | 21 ++--
 drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h |  2 +-
 .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 52 +++++++---
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 77 +++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.h    |  3 +-
 drivers/gpu/drm/i915/i915_gpu_error.c         | 45 +++++----
 .../drm/i915/selftests/intel_memory_region.c  | 41 +++++---
 include/drm/drm_cache.h                       |  3 +
 10 files changed, 261 insertions(+), 90 deletions(-)
There is no need for the destination address to be aligned to a 16-byte boundary in order to use non-temporal instructions while copying. Non-temporal instructions are used only for loading from the source address, which does have alignment constraints. For the destination, we only need to pick the right store instruction depending on whether the destination address is aligned or not.
__memcpy_ntdqu is copied from i915/i915_memcpy.c
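For illustration only: a userspace sketch of the path selection that __drm_memcpy_from_wc() performs after this change. The enum and pick_copy_path() are hypothetical names; the function only reports which path would be taken and does not copy anything.

```c
#include <stddef.h>
#include <stdint.h>

enum copy_path {
	PATH_NONE,			/* len == 0, nothing to do */
	PATH_MEMCPY,			/* src or len not 16-byte friendly */
	PATH_NT_ALIGNED_STORE,		/* models __memcpy_ntdqa() */
	PATH_NT_UNALIGNED_STORE,	/* models __memcpy_ntdqu() */
};

static enum copy_path pick_copy_path(const void *dst, const void *src,
				     size_t len)
{
	/* Only the source and the length constrain the non-temporal load. */
	if ((((uintptr_t)src) | len) & 15)
		return PATH_MEMCPY;
	if (len == 0)
		return PATH_NONE;
	/* The destination only decides which store instruction is used. */
	if (((uintptr_t)dst & 15) == 0)
		return PATH_NT_ALIGNED_STORE;
	return PATH_NT_UNALIGNED_STORE;
}
```

Before this patch, a misaligned destination would have selected the plain memcpy() path; now it only selects the unaligned-store variant.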
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris.p.wilson@intel.com>

Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
---
 drivers/gpu/drm/drm_cache.c | 44 ++++++++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 7051c9c909c2..2e2545df3310 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -278,18 +278,50 @@ static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)
 	kernel_fpu_end();
 }
 
+static void __memcpy_ntdqu(void *dst, const void *src, unsigned long len)
+{
+	kernel_fpu_begin();
+
+	while (len >= 4) {
+		asm("movntdqa (%0), %%xmm0\n"
+		    "movntdqa 16(%0), %%xmm1\n"
+		    "movntdqa 32(%0), %%xmm2\n"
+		    "movntdqa 48(%0), %%xmm3\n"
+		    "movups %%xmm0, (%1)\n"
+		    "movups %%xmm1, 16(%1)\n"
+		    "movups %%xmm2, 32(%1)\n"
+		    "movups %%xmm3, 48(%1)\n"
+		    :: "r" (src), "r" (dst) : "memory");
+		src += 64;
+		dst += 64;
+		len -= 4;
+	}
+	while (len--) {
+		asm("movntdqa (%0), %%xmm0\n"
+		    "movups %%xmm0, (%1)\n"
+		    :: "r" (src), "r" (dst) : "memory");
+		src += 16;
+		dst += 16;
+	}
+
+	kernel_fpu_end();
+}
+
 /*
  * __drm_memcpy_from_wc copies @len bytes from @src to @dst using
- * non-temporal instructions where available. Note that all arguments
- * (@src, @dst) must be aligned to 16 bytes and @len must be a multiple
- * of 16.
+ * non-temporal instructions where available. Note that @src must be aligned to
+ * 16 bytes and @len must be a multiple of 16.
  */
 static void __drm_memcpy_from_wc(void *dst, const void *src, unsigned long len)
 {
-	if (unlikely(((unsigned long)dst | (unsigned long)src | len) & 15))
+	if (unlikely(((unsigned long)src | len) & 15)) {
 		memcpy(dst, src, len);
-	else if (likely(len))
-		__memcpy_ntdqa(dst, src, len >> 4);
+	} else if (likely(len)) {
+		if (IS_ALIGNED((unsigned long)dst, 16))
+			__memcpy_ntdqa(dst, src, len >> 4);
+		else
+			__memcpy_ntdqu(dst, src, len >> 4);
+	}
 }
 
 /**
Fast copy using non-temporal instructions for x86 currently exists in two locations: one in the i915 driver at i915/i915_memcpy.c and another in drm_cache.c. The plan is to remove the duplicate implementation in the i915 driver and use the functions from drm_cache.c.
A variant of drm_memcpy_from_wc() is added in drm_cache.c which accepts a plain address instead of an iosys_map for the destination. In i915 it is very common to copy from WC memory, which may be io memory or system memory, to a destination in system memory. To avoid the overhead of creating an iosys_map for the destination, the new variant accepts the address directly.
A new function is also exported from drm_cache.c to report whether fast copy is supported by the platform. i915 needs this.
v2: Added a new argument to drm_memcpy_from_wc_vaddr() which provides the offset into the src address to start copy from.
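For illustration only: a userspace sketch of the calling shape of the new variant, with a plain destination pointer, a map-like source and a byte offset into the source. struct map_model and copy_from_map_to_vaddr() are hypothetical stand-ins, not the kernel's struct iosys_map or drm_memcpy_from_wc_vaddr(); memcpy() stands in for both the non-temporal path and the fallback.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of a source mapping; NOT the kernel definition. */
struct map_model {
	bool is_iomem;	/* true if the pointer refers to io memory */
	void *vaddr;	/* one pointer is enough for this model */
};

/*
 * Copy @len bytes starting @src_offset bytes into @src to the plain
 * system-memory pointer @dst. No map has to be built for @dst.
 */
static void copy_from_map_to_vaddr(void *dst, const struct map_model *src,
				   size_t src_offset, size_t len)
{
	const unsigned char *base = src->vaddr;

	memcpy(dst, base + src_offset, len);
}
```

Taking the offset as a separate argument lets callers keep one map for a whole buffer and read from arbitrary positions without adjusting the map itself.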
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
---
 drivers/gpu/drm/drm_cache.c | 55 +++++++++++++++++++++++++++++++++++++
 include/drm/drm_cache.h     |  3 ++
 2 files changed, 58 insertions(+)
diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 2e2545df3310..8c7af755f7bc 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -358,6 +358,55 @@ void drm_memcpy_from_wc(struct iosys_map *dst,
 }
 EXPORT_SYMBOL(drm_memcpy_from_wc);
 
+/**
+ * drm_memcpy_from_wc_vaddr - Perform the fastest available memcpy from a source
+ * that may be WC to a destination in system memory.
+ * @dst: The destination pointer
+ * @src: The source pointer
+ * @src_offset: The offset from which to copy
+ * @len: The size of the area to transfer in bytes
+ *
+ * Same as drm_memcpy_from_wc except destination is accepted as system memory
+ * address. Useful in situations where passing destination address as iosys_map
+ * is simply an overhead and can be avoided.
+ */
+void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src,
+			      size_t src_offset, unsigned long len)
+{
+	const void *src_addr = src->is_iomem ?
+				(void const __force *)src->vaddr_iomem :
+				src->vaddr;
+
+	if (WARN_ON(in_interrupt())) {
+		iosys_map_memcpy_from(dst, src, src_offset, len);
+		return;
+	}
+
+	if (static_branch_likely(&has_movntdqa)) {
+		__drm_memcpy_from_wc(dst, src_addr + src_offset, len);
+		return;
+	}
+
+	iosys_map_memcpy_from(dst, src, src_offset, len);
+}
+EXPORT_SYMBOL(drm_memcpy_from_wc_vaddr);
+
+/*
+ * drm_memcpy_fastcopy_supported - Returns if fast copy using non-temporal
+ * instructions is supported
+ *
+ * Returns true if platform has support for fast copying from wc memory type
+ * using non-temporal instructions. Else false.
+ */
+bool drm_memcpy_fastcopy_supported(void)
+{
+	if (static_branch_likely(&has_movntdqa))
+		return true;
+
+	return false;
+}
+EXPORT_SYMBOL(drm_memcpy_fastcopy_supported);
+
 /*
  * drm_memcpy_init_early - One time initialization of the WC memcpy code
  */
@@ -382,6 +431,12 @@ void drm_memcpy_from_wc(struct iosys_map *dst,
 }
 EXPORT_SYMBOL(drm_memcpy_from_wc);
 
+bool drm_memcpy_fastcopy_supported(void)
+{
+	return false;
+}
+EXPORT_SYMBOL(drm_memcpy_fastcopy_supported);
+
 void drm_memcpy_init_early(void)
 {
 }
diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
index 22deb216b59c..d1b57c84a659 100644
--- a/include/drm/drm_cache.h
+++ b/include/drm/drm_cache.h
@@ -77,4 +77,7 @@ void drm_memcpy_init_early(void);
 void drm_memcpy_from_wc(struct iosys_map *dst,
 			const struct iosys_map *src,
 			unsigned long len);
+bool drm_memcpy_fastcopy_supported(void);
+void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src,
+			      size_t src_offset, unsigned long len);
 #endif
The memcpy_from_wc functions in i915_memcpy.c will be removed and replaced by the implementation in drm_cache.c. Update the GEM object code to use the functions provided by drm_cache.c.
v2: Pass newly added src offset argument to the modified drm_memcpy_from_wc_vaddr() function.
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 06b1b188ce5a..c1ff0a591a24 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -438,16 +438,16 @@ static void
 i915_gem_object_read_from_page_iomap(struct drm_i915_gem_object *obj, u64 offset, void *dst, int size)
 {
 	void __iomem *src_map;
-	void __iomem *src_ptr;
+	struct iosys_map src;
+
 	dma_addr_t dma = i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT);
 
 	src_map = io_mapping_map_wc(&obj->mm.region->iomap,
 				    dma - obj->mm.region->region.start,
 				    PAGE_SIZE);
 
-	src_ptr = src_map + offset_in_page(offset);
-	if (!i915_memcpy_from_wc(dst, (void __force *)src_ptr, size))
-		memcpy_fromio(dst, src_ptr, size);
+	iosys_map_set_vaddr_iomem(&src, src_map);
+	drm_memcpy_from_wc_vaddr(dst, &src, offset_in_page(offset), size);
 
 	io_mapping_unmap(src_map);
 }
The pointer to the GuC log may point to system memory or to device memory, depending on whether the GuC log is backed by system memory or by GPU local memory. If the GuC log is in local memory, we need to use the memcpy_[from/to]io APIs to access the logs in order to support i915 on non-x86 architectures. The iosys_map family of APIs provides the abstraction needed to access such pointers. There is parallel work ongoing to move all such memory accesses in i915 to the iosys_map APIs. The pointer to the GuC log is ported to iosys_map in this patch because that provides a good base for the change to drm_memcpy_from_wc.
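For illustration only: the diff in this patch updates the flush_to_file bitfield through the `flags` word that shares a union with it, because a field accessor cannot address a bitfield member. A userspace sketch of that read-modify-write follows. struct log_state_model is a reduced model of the state structure (only the members needed here), and rd_u32()/wr_u32() are hypothetical stand-ins for the iosys_map field accessors.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reduced model of the log buffer state; only the relevant members. */
struct log_state_model {
	uint32_t read_ptr;
	union {
		struct {
			uint32_t flush_to_file:1;
			uint32_t buffer_full_cnt:4;
			uint32_t reserved:27;
		};
		uint32_t flags;	/* whole-word view of the bitfields */
	};
};

/* Hypothetical accessors: read/write one u32 at a byte offset. */
static uint32_t rd_u32(const void *base, size_t off)
{
	uint32_t v;

	memcpy(&v, (const unsigned char *)base + off, sizeof(v));
	return v;
}

static void wr_u32(void *base, size_t off, uint32_t v)
{
	memcpy((unsigned char *)base + off, &v, sizeof(v));
}

/* Clear flush_to_file in the shared buffer without touching its siblings. */
static void clear_flush_to_file(void *shared, size_t state_off)
{
	struct log_state_model local;
	size_t flags_off = state_off + offsetof(struct log_state_model, flags);

	local.flags = rd_u32(shared, flags_off);	/* read the whole word */
	local.flush_to_file = 0;			/* edit the local copy */
	wr_u32(shared, flags_off, local.flags);		/* write the word back */
}
```

Editing a local copy and writing the whole word back keeps every access to the shared buffer a plain 32-bit read or write, which is what an accessor-based API can express.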
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h |  2 +-
 .../gpu/drm/i915/gt/uc/intel_guc_capture.c    | 52 +++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 77 ++++++++++++++-----
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.h    |  3 +-
 4 files changed, 98 insertions(+), 36 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h b/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h index 3624abfd22d1..47bed2a0c409 100644 --- a/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h +++ b/drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h @@ -21,7 +21,7 @@ struct file; */ struct __guc_capture_bufstate { u32 size; - void *data; + struct iosys_map *data_map; u32 rd; u32 wr; }; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c index c4e25966d3e9..c4f7a28956b8 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c @@ -5,6 +5,7 @@
#include <linux/types.h>
+#include <drm/drm_cache.h> #include <drm/drm_print.h>
#include "gt/intel_engine_regs.h" @@ -826,7 +827,6 @@ guc_capture_log_remove_dw(struct intel_guc *guc, struct __guc_capture_bufstate * struct drm_i915_private *i915 = guc_to_gt(guc)->i915; int tries = 2; int avail = 0; - u32 *src_data;
if (!guc_capture_buf_cnt(buf)) return 0; @@ -834,8 +834,7 @@ guc_capture_log_remove_dw(struct intel_guc *guc, struct __guc_capture_bufstate * while (tries--) { avail = guc_capture_buf_cnt_to_end(buf); if (avail >= sizeof(u32)) { - src_data = (u32 *)(buf->data + buf->rd); - *dw = *src_data; + *dw = iosys_map_rd(buf->data_map, buf->rd, u32); buf->rd += 4; return 4; } @@ -852,7 +851,7 @@ guc_capture_data_extracted(struct __guc_capture_bufstate *b, int size, void *dest) { if (guc_capture_buf_cnt_to_end(b) >= size) { - memcpy(dest, (b->data + b->rd), size); + drm_memcpy_from_wc_vaddr(dest, b->data_map, b->rd, size); b->rd += size; return true; } @@ -1343,22 +1342,24 @@ static void __guc_capture_process_output(struct intel_guc *guc) struct intel_uc *uc = container_of(guc, typeof(*uc), guc); struct drm_i915_private *i915 = guc_to_gt(guc)->i915; struct guc_log_buffer_state log_buf_state_local; - struct guc_log_buffer_state *log_buf_state; + unsigned int capture_offset; struct __guc_capture_bufstate buf; - void *src_data = NULL; + struct iosys_map src_map; bool new_overflow; int ret;
- log_buf_state = guc->log.buf_addr + - (sizeof(struct guc_log_buffer_state) * GUC_CAPTURE_LOG_BUFFER); - src_data = guc->log.buf_addr + intel_guc_get_log_buffer_offset(GUC_CAPTURE_LOG_BUFFER); + src_map = IOSYS_MAP_INIT_OFFSET(&guc->log.buf_map, + intel_guc_get_log_buffer_offset(GUC_CAPTURE_LOG_BUFFER));
/* * Make a copy of the state structure, inside GuC log buffer * (which is uncached mapped), on the stack to avoid reading * from it multiple times. */ - memcpy(&log_buf_state_local, log_buf_state, sizeof(struct guc_log_buffer_state)); + capture_offset = sizeof(struct guc_log_buffer_state) * GUC_CAPTURE_LOG_BUFFER; + drm_memcpy_from_wc_vaddr(&log_buf_state_local, &guc->log.buf_map, + capture_offset, + sizeof(struct guc_log_buffer_state)); buffer_size = intel_guc_get_log_buffer_size(GUC_CAPTURE_LOG_BUFFER); read_offset = log_buf_state_local.read_ptr; write_offset = log_buf_state_local.sampled_write_ptr; @@ -1385,7 +1386,7 @@ static void __guc_capture_process_output(struct intel_guc *guc) buf.size = buffer_size; buf.rd = read_offset; buf.wr = write_offset; - buf.data = src_data; + buf.data_map = &src_map;
if (!uc->reset_in_progress) { do { @@ -1394,8 +1395,33 @@ static void __guc_capture_process_output(struct intel_guc *guc) }
/* Update the state of log buffer err-cap state */ - log_buf_state->read_ptr = write_offset; - log_buf_state->flush_to_file = 0; + iosys_map_wr_field(&guc->log.buf_map, capture_offset, + struct guc_log_buffer_state, read_ptr, write_offset); + + /* flush_to_file is a bitfield. iosys_map_wr_field cannot be used to + * update bitfield member types. We make use of another member variable + * `flags` which is a union of flush_to_file as following, to update + * the flush_to_file bitfield. + * + * ==================================================================== + * union { + * struct { + * u32 flush_to_file:1; + * u32 buffer_full_cnt:4; + * u32 reserved:27; + * }; + * u32 flags; + * }; + * ==================================================================== + */ + log_buf_state_local.flags = iosys_map_rd_field(&guc->log.buf_map, + capture_offset, + struct guc_log_buffer_state, + flags); + log_buf_state_local.flush_to_file = 0; + iosys_map_wr_field(&guc->log.buf_map, capture_offset, + struct guc_log_buffer_state, flags, + log_buf_state_local.flags); __guc_capture_flushlog_complete(guc); }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c index 78d2989fe917..62b917fbe731 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.c @@ -3,6 +3,7 @@ * Copyright © 2014-2019 Intel Corporation */
+#include <drm/drm_cache.h> #include <linux/debugfs.h> #include <linux/string_helpers.h>
@@ -217,11 +218,13 @@ size_t intel_guc_get_log_buffer_offset(enum guc_log_buffer_type type) static void _guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log) { unsigned int buffer_size, read_offset, write_offset, bytes_to_copy, full_cnt; - struct guc_log_buffer_state *log_buf_state, *log_buf_snapshot_state; + struct guc_log_buffer_state *log_buf_snapshot_state; struct guc_log_buffer_state log_buf_state_local; enum guc_log_buffer_type type; - void *src_data, *dst_data; + void *dst_data; bool new_overflow; + struct iosys_map src_data; + unsigned int type_offset;
mutex_lock(&log->relay.lock);
@@ -229,8 +232,7 @@ static void _guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log) goto out_unlock;
/* Get the pointer to shared GuC log buffer */ - src_data = log->buf_addr; - log_buf_state = src_data; + src_data = log->buf_map;
/* Get the pointer to local buffer to store the logs */ log_buf_snapshot_state = dst_data = guc_get_write_buffer(log); @@ -247,18 +249,22 @@ static void _guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log) }
/* Actual logs are present from the 2nd page */ - src_data += PAGE_SIZE; + iosys_map_incr(&src_data, PAGE_SIZE); dst_data += PAGE_SIZE;
/* For relay logging, we exclude error state capture */ - for (type = GUC_DEBUG_LOG_BUFFER; type <= GUC_CRASH_DUMP_LOG_BUFFER; type++) { + for (type = GUC_DEBUG_LOG_BUFFER, type_offset = 0; + type < GUC_CRASH_DUMP_LOG_BUFFER; + type++, type_offset += sizeof(struct guc_log_buffer_state)) { /* * Make a copy of the state structure, inside GuC log buffer * (which is uncached mapped), on the stack to avoid reading * from it multiple times. */ - memcpy(&log_buf_state_local, log_buf_state, - sizeof(struct guc_log_buffer_state)); + iosys_map_memcpy_from(&log_buf_state_local, &log->buf_map, + type_offset, + sizeof(struct guc_log_buffer_state)); + buffer_size = intel_guc_get_log_buffer_size(type); read_offset = log_buf_state_local.read_ptr; write_offset = log_buf_state_local.sampled_write_ptr; @@ -268,15 +274,39 @@ static void _guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log) log->stats[type].flush += log_buf_state_local.flush_to_file; new_overflow = intel_guc_check_log_buf_overflow(log, type, full_cnt);
- /* Update the state of shared log buffer */ - log_buf_state->read_ptr = write_offset; - log_buf_state->flush_to_file = 0; - log_buf_state++; - /* First copy the state structure in snapshot buffer */ memcpy(log_buf_snapshot_state, &log_buf_state_local, sizeof(struct guc_log_buffer_state));
+ /* Update the state of shared log buffer */ + iosys_map_wr_field(&log->buf_map, type_offset, + struct guc_log_buffer_state, read_ptr, + write_offset); + /* flush_to_file is a bitfield. iosys_map_wr_field cannot be used to + * update bitfield member types. We make use of another member variable + * `flags` which is a union of flush_to_file as following, to update + * the flush_to_file bitfield. + * + * ==================================================================== + * union { + * struct { + * u32 flush_to_file:1; + * u32 buffer_full_cnt:4; + * u32 reserved:27; + * }; + * u32 flags; + * }; + * ==================================================================== + */ + log_buf_state_local.flags = iosys_map_rd_field(&log->buf_map, + type_offset, + struct guc_log_buffer_state, + flags); + log_buf_state_local.flush_to_file = 0; + iosys_map_wr_field(&log->buf_map, type_offset, + struct guc_log_buffer_state, flags, + log_buf_state_local.flags); + /* * The write pointer could have been updated by GuC firmware, * after sending the flush interrupt to Host, for consistency @@ -301,15 +331,16 @@ static void _guc_log_copy_debuglogs_for_relay(struct intel_guc_log *log)
/* Just copy the newly written data */ if (read_offset > write_offset) { - i915_memcpy_from_wc(dst_data, src_data, write_offset); + drm_memcpy_from_wc_vaddr(dst_data, &src_data, 0, + write_offset); bytes_to_copy = buffer_size - read_offset; } else { bytes_to_copy = write_offset - read_offset; } - i915_memcpy_from_wc(dst_data + read_offset, - src_data + read_offset, bytes_to_copy); + drm_memcpy_from_wc_vaddr(dst_data + read_offset, &src_data, + read_offset, bytes_to_copy);
- src_data += buffer_size; + iosys_map_incr(&src_data, buffer_size); dst_data += buffer_size; }
@@ -331,7 +362,7 @@ static int guc_log_relay_map(struct intel_guc_log *log) { lockdep_assert_held(&log->relay.lock);
-	if (!log->vma || !log->buf_addr)
+	if (!log->vma || iosys_map_is_null(&log->buf_map))
 		return -ENODEV;
/* @@ -505,7 +536,11 @@ int intel_guc_log_create(struct intel_guc_log *log) i915_vma_unpin_and_release(&log->vma, 0); goto err; } - log->buf_addr = vaddr; + + if (i915_gem_object_is_lmem(log->vma->obj)) + iosys_map_set_vaddr_iomem(&log->buf_map, (void __iomem *)vaddr); + else + iosys_map_set_vaddr(&log->buf_map, vaddr);
log->level = __get_default_log_level(log); DRM_DEBUG_DRIVER("guc_log_level=%d (%s, verbose:%s, verbosity:%d)\n", @@ -522,7 +557,7 @@ int intel_guc_log_create(struct intel_guc_log *log)
void intel_guc_log_destroy(struct intel_guc_log *log) { - log->buf_addr = NULL; + iosys_map_clear(&log->buf_map); i915_vma_unpin_and_release(&log->vma, I915_VMA_RELEASE_MAP); }
@@ -568,7 +603,7 @@ int intel_guc_log_set_level(struct intel_guc_log *log, u32 level)
bool intel_guc_log_relay_created(const struct intel_guc_log *log) { - return log->buf_addr; + return !iosys_map_is_null(&log->buf_map); }
int intel_guc_log_relay_open(struct intel_guc_log *log) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h index 18007e639be9..a66e882ba716 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_log.h @@ -6,6 +6,7 @@ #ifndef _INTEL_GUC_LOG_H_ #define _INTEL_GUC_LOG_H_
+#include <linux/iosys-map.h> #include <linux/mutex.h> #include <linux/relay.h> #include <linux/workqueue.h> @@ -49,7 +50,7 @@ struct intel_guc; struct intel_guc_log { u32 level; struct i915_vma *vma; - void *buf_addr; + struct iosys_map buf_map; struct { bool buf_in_use; bool started;
The memcpy_from_wc functions in i915_memcpy.c will be removed and replaced by the implementation in drm_cache.c. Update the memory region selftests to use the functions provided by drm_cache.c.
v2: check if the source and destination memory address is from local memory or system memory and initialize the iosys_map accordingly (Lucas)
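For illustration only: a userspace sketch of the v2 change, where the map is initialized differently depending on whether the object lives in local (io) memory or in system memory. struct map_model and the map_*() helpers are hypothetical models of struct iosys_map and its setters, and the is_lmem flag stands in for the i915_gem_object_is_lmem() check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of struct iosys_map; NOT the kernel definition. */
struct map_model {
	bool is_iomem;
	union {
		void *vaddr;		/* valid when !is_iomem */
		void *vaddr_iomem;	/* valid when is_iomem */
	};
};

static void map_set_vaddr(struct map_model *m, void *p)
{
	m->is_iomem = false;
	m->vaddr = p;
}

static void map_set_vaddr_iomem(struct map_model *m, void *p)
{
	m->is_iomem = true;
	m->vaddr_iomem = p;
}

/* Pick the initializer that matches where the object's memory lives. */
static void map_init_for_object(struct map_model *m, void *addr, bool is_lmem)
{
	if (is_lmem)
		map_set_vaddr_iomem(m, addr);
	else
		map_set_vaddr(m, addr);
}
```

Recording the memory type once at mapping time means later copy helpers can choose the right access method from the map alone.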
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
---
 .../drm/i915/selftests/intel_memory_region.c  | 41 +++++++++++++------
 1 file changed, 28 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c index 73eb53edb8de..420210c20ad5 100644 --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c @@ -7,6 +7,7 @@ #include <linux/sort.h>
#include <drm/drm_buddy.h> +#include <drm/drm_cache.h>
#include "../i915_selftest.h"
@@ -1141,7 +1142,7 @@ static const char *repr_type(u32 type)
static struct drm_i915_gem_object * create_region_for_mapping(struct intel_memory_region *mr, u64 size, u32 type, - void **out_addr) + struct iosys_map *out_addr) { struct drm_i915_gem_object *obj; void *addr; @@ -1161,7 +1162,11 @@ create_region_for_mapping(struct intel_memory_region *mr, u64 size, u32 type, return addr; }
- *out_addr = addr; + if (i915_gem_object_is_lmem(obj)) + iosys_map_set_vaddr_iomem(out_addr, (void __iomem *)addr); + else + iosys_map_set_vaddr(out_addr, addr); + return obj; }
@@ -1172,24 +1177,33 @@ static int wrap_ktime_compare(const void *A, const void *B) return ktime_compare(*a, *b); }
-static void igt_memcpy_long(void *dst, const void *src, size_t size) +static void igt_memcpy_long(struct iosys_map *dst, struct iosys_map *src, + size_t size) { - unsigned long *tmp = dst; - const unsigned long *s = src; + unsigned long *tmp = dst->is_iomem ? + (unsigned long __force *)dst->vaddr_iomem : + dst->vaddr; + const unsigned long *s = src->is_iomem ? + (unsigned long __force *)src->vaddr_iomem : + src->vaddr;
size = size / sizeof(unsigned long); while (size--) *tmp++ = *s++; }
-static inline void igt_memcpy(void *dst, const void *src, size_t size) +static inline void igt_memcpy(struct iosys_map *dst, struct iosys_map *src, + size_t size) { - memcpy(dst, src, size); + memcpy(dst->is_iomem ? (void __force *)dst->vaddr_iomem : dst->vaddr, + src->is_iomem ? (void __force *)src->vaddr_iomem : src->vaddr, + size); }
-static inline void igt_memcpy_from_wc(void *dst, const void *src, size_t size) +static inline void igt_memcpy_from_wc(struct iosys_map *dst, struct iosys_map *src, + size_t size) { - i915_memcpy_from_wc(dst, src, size); + drm_memcpy_from_wc(dst, src, size); }
static int _perf_memcpy(struct intel_memory_region *src_mr, @@ -1199,7 +1213,8 @@ static int _perf_memcpy(struct intel_memory_region *src_mr, struct drm_i915_private *i915 = src_mr->i915; const struct { const char *name; - void (*copy)(void *dst, const void *src, size_t size); + void (*copy)(struct iosys_map *dst, struct iosys_map *src, + size_t size); bool skip; } tests[] = { { @@ -1213,11 +1228,11 @@ static int _perf_memcpy(struct intel_memory_region *src_mr, { "memcpy_from_wc", igt_memcpy_from_wc, - !i915_has_memcpy_from_wc(), + !drm_memcpy_fastcopy_supported(), }, }; struct drm_i915_gem_object *src, *dst; - void *src_addr, *dst_addr; + struct iosys_map src_addr, dst_addr; int ret = 0; int i;
@@ -1245,7 +1260,7 @@ static int _perf_memcpy(struct intel_memory_region *src_mr,
t0 = ktime_get();
- tests[i].copy(dst_addr, src_addr, size); + tests[i].copy(&dst_addr, &src_addr, size);
t1 = ktime_get(); t[pass] = ktime_sub(t1, t0);
io mapped memory should not be dereferenced directly, to ensure portability; it should be read, written or copied using helper functions. i915_memcpy_from_wc() was used to copy the data from io memory to a temporary buffer, and a pointer to the temporary buffer was passed to the CRC calculation function. But i915_memcpy_from_wc() only copies if the platform supports fast copy using non-temporal instructions. Otherwise the pointer to io memory was passed for CRC calculation, so the CRC function would dereference io memory directly, which would not work properly on non-x86 platforms. To make this portable, the temporary buffer must always be used for the CRC, never the io memory. drm_memcpy_from_wc_vaddr() is now used for copying instead of i915_memcpy_from_wc() for 2 reasons:
- i915_memcpy_from_wc() will be deprecated.
- drm_memcpy_from_wc_vaddr() does not fail if fast copy is not supported, but uses memcpy_fromio as a fallback for copying.
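For illustration only: a userspace sketch of the "always checksum the temporary buffer" pattern described above. crc32_model() is the common reflected CRC-32 (polynomial 0xEDB88320, with initial and final inversion) and is an assumption of this sketch; it is not claimed to be bit-identical to the kernel's crc32_le(0, ...). memcpy() stands in for drm_memcpy_from_wc_vaddr().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320); sketch only. */
static uint32_t crc32_model(const unsigned char *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= p[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/*
 * Copy the (possibly io) source into @tmp first, then checksum @tmp.
 * The checksum routine never sees the original source pointer.
 */
static uint32_t crc_via_bounce(unsigned char *tmp, const void *io_src,
			       size_t len)
{
	memcpy(tmp, io_src, len);	/* stands in for the WC copy helper */
	return crc32_model(tmp, len);
}
```

Because the copy helper cannot fail, there is no longer a branch in which the raw source pointer reaches the checksum code.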
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>

Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_reset.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c
index 37c38bdd5f47..7a455583c687 100644
--- a/drivers/gpu/drm/i915/gt/selftest_reset.c
+++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
@@ -3,6 +3,7 @@
  * Copyright © 2018 Intel Corporation
  */
 
+#include <drm/drm_cache.h>
 #include <linux/crc32.h>
 
 #include "gem/i915_gem_stolen.h"
@@ -82,7 +83,7 @@ __igt_reset_stolen(struct intel_gt *gt,
 	for (page = 0; page < num_pages; page++) {
 		dma_addr_t dma = (dma_addr_t)dsm->start + (page << PAGE_SHIFT);
 		void __iomem *s;
-		void *in;
+		struct iosys_map src_map;
 
 		ggtt->vm.insert_page(&ggtt->vm, dma,
 				     ggtt->error_capture.start,
@@ -98,10 +99,9 @@ __igt_reset_stolen(struct intel_gt *gt,
 					     ((page + 1) << PAGE_SHIFT) - 1))
 			memset_io(s, STACK_MAGIC, PAGE_SIZE);
 
-		in = (void __force *)s;
-		if (i915_memcpy_from_wc(tmp, in, PAGE_SIZE))
-			in = tmp;
-		crc[page] = crc32_le(0, in, PAGE_SIZE);
+		iosys_map_set_vaddr_iomem(&src_map, s);
+		drm_memcpy_from_wc_vaddr(tmp, &src_map, 0, PAGE_SIZE);
+		crc[page] = crc32_le(0, tmp, PAGE_SIZE);
 
 		io_mapping_unmap(s);
 	}
@@ -122,7 +122,7 @@ __igt_reset_stolen(struct intel_gt *gt,
 	for (page = 0; page < num_pages; page++) {
 		dma_addr_t dma = (dma_addr_t)dsm->start + (page << PAGE_SHIFT);
 		void __iomem *s;
-		void *in;
+		struct iosys_map src_map;
 		u32 x;
 
 		ggtt->vm.insert_page(&ggtt->vm, dma,
@@ -134,10 +134,9 @@ __igt_reset_stolen(struct intel_gt *gt,
 				     ggtt->error_capture.start,
 				     PAGE_SIZE);
 
-		in = (void __force *)s;
-		if (i915_memcpy_from_wc(tmp, in, PAGE_SIZE))
-			in = tmp;
-		x = crc32_le(0, in, PAGE_SIZE);
+		iosys_map_set_vaddr_iomem(&src_map, s);
+		drm_memcpy_from_wc_vaddr(tmp, &src_map, 0, PAGE_SIZE);
+		x = crc32_le(0, tmp, PAGE_SIZE);
 
 		if (x != crc[page] &&
 		    !__drm_mm_interval_first(&gt->i915->mm.stolen,
@@ -146,7 +145,7 @@ __igt_reset_stolen(struct intel_gt *gt,
 			pr_debug("unused stolen page %pa modified by GPU reset\n",
 				 &page);
 			if (count++ == 0)
-				igt_hexdump(in, PAGE_SIZE);
+				igt_hexdump(tmp, PAGE_SIZE);
 			max = page;
 		}
The pointer passed to zlib_deflate() for compression could point to io mapped memory, which would then be dereferenced directly. io mapped memory is copied to a temporary buffer, which is then passed to zlib_deflate(), only when the platform supports fast copy using non-temporal instructions. If the platform lacks support, the io mapped memory is used directly.
Direct dereferencing of io memory makes the driver non-portable outside x86 and should be avoided.
With this patch, io memory is always copied to a temporary buffer, irrespective of platform support for fast copy. The i915_has_memcpy_from_wc() check is removed, and drm_memcpy_from_wc_vaddr() is now used for copying instead of i915_memcpy_from_wc() for 2 reasons:
- i915_memcpy_from_wc() will be deprecated.
- drm_memcpy_from_wc_vaddr() does not fail if fast copy is not supported; instead it continues copying using memcpy_fromio as a fallback.
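For illustration only: a userspace sketch of how the compressor input is chosen after this change. struct map_model and pick_compress_input() are hypothetical names; memcpy() stands in for drm_memcpy_from_wc_vaddr(), and the returned pointer models what would be assigned to the stream's input pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of a source mapping; NOT the kernel definition. */
struct map_model {
	bool is_iomem;	/* true if the page is io mapped */
	void *vaddr;	/* one pointer is enough for this model */
};

/*
 * Return the pointer the compressor may read from. io memory is always
 * bounced through @tmp; system memory is handed over as-is.
 */
static const void *pick_compress_input(const struct map_model *src,
				       void *tmp, size_t len)
{
	if (src->is_iomem) {
		memcpy(tmp, src->vaddr, len);	/* stands in for the WC copy */
		return tmp;
	}
	return src->vaddr;
}
```

The decision now depends only on the memory type recorded in the map, not on whether the platform has a fast copy path, which is why the temporary buffer must always be allocated.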
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 45 +++++++++++++++------------
 1 file changed, 25 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 0512c66fa4f3..9cafacb4ceb6 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -262,9 +262,12 @@ static bool compress_init(struct i915_vma_compress *c) return false; }
- c->tmp = NULL; - if (i915_has_memcpy_from_wc()) - c->tmp = pool_alloc(&c->pool, ALLOW_FAIL); + c->tmp = pool_alloc(&c->pool, ALLOW_FAIL); + if (!c->tmp) { + kfree(zstream->workspace); + pool_fini(&c->pool); + return false; + }
return true; } @@ -296,15 +299,17 @@ static void *compress_next_page(struct i915_vma_compress *c, }
static int compress_page(struct i915_vma_compress *c, - void *src, - struct i915_vma_coredump *dst, - bool wc) + struct iosys_map *src, + struct i915_vma_coredump *dst) { struct z_stream_s *zstream = &c->zstream;
- zstream->next_in = src; - if (wc && c->tmp && i915_memcpy_from_wc(c->tmp, src, PAGE_SIZE)) + if (src->is_iomem) { + drm_memcpy_from_wc_vaddr(c->tmp, src, 0, PAGE_SIZE); zstream->next_in = c->tmp; + } else { + zstream->next_in = src->vaddr; + } zstream->avail_in = PAGE_SIZE;
do { @@ -393,9 +398,8 @@ static bool compress_start(struct i915_vma_compress *c) }
static int compress_page(struct i915_vma_compress *c, - void *src, - struct i915_vma_coredump *dst, - bool wc) + struct iosys_map *src, + struct i915_vma_coredump *dst) { void *ptr;
@@ -403,8 +407,7 @@ static int compress_page(struct i915_vma_compress *c, if (!ptr) return -ENOMEM;
- if (!(wc && i915_memcpy_from_wc(ptr, src, PAGE_SIZE))) - memcpy(ptr, src, PAGE_SIZE); + drm_memcpy_from_wc_vaddr(ptr, src, 0, PAGE_SIZE); list_add_tail(&virt_to_page(ptr)->lru, &dst->page_list); cond_resched();
@@ -1092,6 +1095,7 @@ i915_vma_coredump_create(const struct intel_gt *gt, if (drm_mm_node_allocated(&ggtt->error_capture)) { void __iomem *s; dma_addr_t dma; + struct iosys_map src;
for_each_sgt_daddr(dma, iter, vma_res->bi.pages) { mutex_lock(&ggtt->error_mutex); @@ -1100,9 +1104,8 @@ i915_vma_coredump_create(const struct intel_gt *gt, mb();
s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE); - ret = compress_page(compress, - (void __force *)s, dst, - true); + iosys_map_set_vaddr_iomem(&src, s); + ret = compress_page(compress, &src, dst); io_mapping_unmap(s);
mb(); @@ -1114,6 +1117,7 @@ i915_vma_coredump_create(const struct intel_gt *gt, } else if (vma_res->bi.lmem) { struct intel_memory_region *mem = vma_res->mr; dma_addr_t dma; + struct iosys_map src;
for_each_sgt_daddr(dma, iter, vma_res->bi.pages) { void __iomem *s; @@ -1121,15 +1125,15 @@ i915_vma_coredump_create(const struct intel_gt *gt, s = io_mapping_map_wc(&mem->iomap, dma - mem->region.start, PAGE_SIZE); - ret = compress_page(compress, - (void __force *)s, dst, - true); + iosys_map_set_vaddr_iomem(&src, s); + ret = compress_page(compress, &src, dst); io_mapping_unmap(s); if (ret) break; } } else { struct page *page; + struct iosys_map src;
for_each_sgt_page(page, iter, vma_res->bi.pages) { void *s; @@ -1137,7 +1141,8 @@ i915_vma_coredump_create(const struct intel_gt *gt, drm_clflush_pages(&page, 1);
s = kmap(page); - ret = compress_page(compress, s, dst, false); + iosys_map_set_vaddr(&src, s); + ret = compress_page(compress, &src, dst); kunmap(page);
drm_clflush_pages(&page, 1);
dri-devel@lists.freedesktop.org