Hi all,
The last version of this work was sent a while ago here:
http://lists.freedesktop.org/archives/dri-devel/2015-August/089263.html
So let's recap this series:
1. It adds a vendor-independent client interface for mapping GEM objects through PRIME; in other words, it implements userspace mmap() on a dma-buf fd. This can be used for texturing from a CPU-rendered buffer, or for passing buffers between processes without copying in userspace.
2. The series lets the client write to the mmap'ed memory.
3. It deals with GPU and CPU cache synchronization.
Based on previous discussions, people seem to be fine with 1. and 2., but not really with 3., given that cache coherency is a bit more annoying to deal with.
It's easier to use this new infrastructure on "coherent hardware" (systems where the memory cache is shared by the GPU and CPU), because such systems rarely need that kind of synchronization. But it would be much more convenient to expose the very same interface to clients regardless of whether the underlying hardware is cache coherent or not.
One idea that came up was to force clients to call the sync ioctls after the dma-buf has been mmap'ed. But apparently there's no easy and performant way to do so, because walking the page table entries and checking the dirty bits seems too costly. Also, depending on the order of instructions sent to the devices, a sync call might be needed after the mapped region is accessed as well, to flush all cachelines and make sure, for example, that the GPU domain won't read stale data. So that would make things even more complicated if we ever decided to go in the direction of forcing sync ioctls. The alternative therefore is to simply document it very well, with strong wording telling clients to use the sync ioctls regardless, otherwise they will misbehave. Do we have objections, or maybe other, wiser ways to circumvent this? I made similar comments in August and no one has come up with better ideas.
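To illustrate the expected usage, here is a minimal userspace sketch (a hypothetical helper, not part of the series; dmabuf_cpu_write and its fd/size/data parameters are made up for the example) written against the DMA_BUF_IOCTL_SYNC uapi added later in this series, with error handling trimmed:

#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/dma-buf.h>	/* struct dma_buf_sync, DMA_BUF_IOCTL_SYNC */

/* Write 'len' bytes into a mappable dma-buf of 'size' bytes. */
static int dmabuf_cpu_write(int dmabuf_fd, size_t size,
			    const void *data, size_t len)
{
	struct dma_buf_sync sync = { 0 };
	void *ptr;
	int ret = -1;

	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   dmabuf_fd, 0);
	if (ptr == MAP_FAILED)
		return -1;

	/* SYNC_START before touching the mapping. */
	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE;
	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync) == 0) {
		memcpy(ptr, data, len);

		/* SYNC_END afterwards, so the GPU won't read stale data. */
		sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
		ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
	}

	munmap(ptr, size);
	return ret;
}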
Lastly, the difference in the v6 series is that I've basically addressed the concerns pointed out in the igt tests, organized those changes a bit better (into smaller patches), documented the usage of the sync ioctls, and extensively tested this on different types of hardware.
https://github.com/tiagovignatti/drm-intel/commits/drm-intel-nightly_dma-buf... https://github.com/tiagovignatti/intel-gpu-tools/commits/dma-buf-mmap-v6
Tiago
Daniel Thompson (1):
  drm: prime: Honour O_RDWR during prime-handle-to-fd

Daniel Vetter (1):
  dma-buf: Add ioctls to allow userspace to flush

Tiago Vignatti (3):
  dma-buf: Remove range-based flush
  drm/i915: Implement end_cpu_access
  drm/i915: Use CPU mapping for userspace dma-buf mmap()

 Documentation/dma-buf-sharing.txt         | 41 +++++++++++++++-------
 drivers/dma-buf/dma-buf.c                 | 56 ++++++++++++++++++++++++++-----
 drivers/gpu/drm/drm_prime.c               | 10 ++----
 drivers/gpu/drm/i915/i915_gem_dmabuf.c    | 42 +++++++++++++++++++++--
 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |  4 +--
 drivers/gpu/drm/udl/udl_fb.c              |  2 --
 drivers/staging/android/ion/ion.c         |  6 ++--
 drivers/staging/android/ion/ion_test.c    |  4 +--
 include/linux/dma-buf.h                   | 12 +++----
 include/uapi/drm/drm.h                    |  1 +
 include/uapi/linux/dma-buf.h              | 38 +++++++++++++++++++++
 11 files changed, 169 insertions(+), 47 deletions(-)
 create mode 100644 include/uapi/linux/dma-buf.h
And the igt changes:

Rob Bradford (1):
  prime_mmap: Add new test for calling mmap() on dma-buf fds

Tiago Vignatti (5):
  lib: Add gem_userptr and __gem_userptr helpers
  prime_mmap: Add basic tests to write in a bo using CPU
  lib: Add prime_sync_start and prime_sync_end helpers
  tests: Add kms_mmap_write_crc for cache coherency tests
  tests: Add prime_mmap_coherency for cache coherency tests

 benchmarks/gem_userptr_benchmark.c |  55 +----
 lib/ioctl_wrappers.c               |  92 +++++++
 lib/ioctl_wrappers.h               |  32 +++
 tests/Makefile.sources             |   3 +
 tests/gem_userptr_blits.c          | 104 ++------
 tests/kms_mmap_write_crc.c         | 281 +++++++++++++++++++++
 tests/prime_mmap.c                 | 494 +++++++++++++++++++++++++++++++++++++
 tests/prime_mmap_coherency.c       | 246 ++++++++++++++++++
 8 files changed, 1180 insertions(+), 127 deletions(-)
 create mode 100644 tests/kms_mmap_write_crc.c
 create mode 100644 tests/prime_mmap.c
 create mode 100644 tests/prime_mmap_coherency.c
From: Daniel Thompson daniel.thompson@linaro.org
Currently DRM_IOCTL_PRIME_HANDLE_TO_FD rejects all flags except (DRM|O)_CLOEXEC, making it difficult (maybe impossible) for userspace to mmap() the resulting dma-buf even when this is supported by the DRM driver.
It is trivial to relax the restriction and permit read/write access. This is safe because the flags are seldom touched by drm; mostly they are passed verbatim to dma_buf calls.
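To put it concretely, here is a hedged userspace sketch of what this enables (map_bo_rw, drm_fd, gem_handle and size are made-up names for the example; header paths and error handling are simplified, and the exporting driver must support dma-buf mmap, as added for i915 later in this series):

#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <xf86drm.h>	/* drmIoctl(), from libdrm */
#include <drm.h>	/* struct drm_prime_handle, DRM_CLOEXEC, DRM_RDWR */

/* Export a GEM handle as a writable dma-buf fd and map it read/write. */
static void *map_bo_rw(int drm_fd, uint32_t gem_handle, size_t size)
{
	struct drm_prime_handle args = {
		.handle = gem_handle,
		/* DRM_RDWR is what this patch starts to allow. */
		.flags = DRM_CLOEXEC | DRM_RDWR,
	};

	if (drmIoctl(drm_fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args))
		return MAP_FAILED;

	/* A PROT_WRITE mapping needs the fd to have been exported with DRM_RDWR. */
	return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
		    args.fd, 0);
}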
v3 (Tiago): removed unused flags variable from drm_prime_handle_to_fd_ioctl.
Reviewed-by: Chris Wilson chris@chris-wilson.co.uk
Signed-off-by: Daniel Thompson daniel.thompson@linaro.org
Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
---
 drivers/gpu/drm/drm_prime.c | 10 +++-------
 include/uapi/drm/drm.h      |  1 +
 2 files changed, 4 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c index 27aa718..df6cdc7 100644 --- a/drivers/gpu/drm/drm_prime.c +++ b/drivers/gpu/drm/drm_prime.c @@ -329,7 +329,7 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops = { * drm_gem_prime_export - helper library implementation of the export callback * @dev: drm_device to export from * @obj: GEM object to export - * @flags: flags like DRM_CLOEXEC + * @flags: flags like DRM_CLOEXEC and DRM_RDWR * * This is the implementation of the gem_prime_export functions for GEM drivers * using the PRIME helpers. @@ -628,7 +628,6 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) { struct drm_prime_handle *args = data; - uint32_t flags;
if (!drm_core_check_feature(dev, DRIVER_PRIME)) return -EINVAL; @@ -637,14 +636,11 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data, return -ENOSYS;
/* check flags are valid */ - if (args->flags & ~DRM_CLOEXEC) + if (args->flags & ~(DRM_CLOEXEC | DRM_RDWR)) return -EINVAL;
- /* we only want to pass DRM_CLOEXEC which is == O_CLOEXEC */ - flags = args->flags & DRM_CLOEXEC; - return dev->driver->prime_handle_to_fd(dev, file_priv, - args->handle, flags, &args->fd); + args->handle, args->flags, &args->fd); }
int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data, diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h index b4e92eb..a0ebfe7 100644 --- a/include/uapi/drm/drm.h +++ b/include/uapi/drm/drm.h @@ -669,6 +669,7 @@ struct drm_set_client_cap { __u64 value; };
+#define DRM_RDWR O_RDWR #define DRM_CLOEXEC O_CLOEXEC struct drm_prime_handle { __u32 handle;
This patch removes range-based information used for optimizations in begin_cpu_access and end_cpu_access.
We don't have any user or implementation using range-based flushes. There seems to be a consensus that if we ever want something like that again (or something even more robust, using 2D/3D sub-range regions), we can build it on top of the upcoming dma-buf sync ioctl.
Cc: Sumit Semwal sumit.semwal@linaro.org
Cc: Daniel Vetter daniel.vetter@intel.com
Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
---
 Documentation/dma-buf-sharing.txt         | 19 ++++++++-----------
 drivers/dma-buf/dma-buf.c                 | 13 ++++---------
 drivers/gpu/drm/i915/i915_gem_dmabuf.c    |  2 +-
 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |  4 ++--
 drivers/gpu/drm/udl/udl_fb.c              |  2 --
 drivers/staging/android/ion/ion.c         |  6 ++----
 drivers/staging/android/ion/ion_test.c    |  4 ++--
 include/linux/dma-buf.h                   | 12 +++++-------
 8 files changed, 24 insertions(+), 38 deletions(-)
diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt index 480c8de..4f4a84b 100644 --- a/Documentation/dma-buf-sharing.txt +++ b/Documentation/dma-buf-sharing.txt @@ -257,17 +257,15 @@ Access to a dma_buf from the kernel context involves three steps:
Interface: int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, - size_t start, size_t len, enum dma_data_direction direction)
This allows the exporter to ensure that the memory is actually available for cpu access - the exporter might need to allocate or swap-in and pin the backing storage. The exporter also needs to ensure that cpu access is - coherent for the given range and access direction. The range and access - direction can be used by the exporter to optimize the cache flushing, i.e. - access outside of the range or with a different direction (read instead of - write) might return stale or even bogus data (e.g. when the exporter needs to - copy the data to temporary storage). + coherent for the access direction. The direction can be used by the exporter + to optimize the cache flushing, i.e. access with a different direction (read + instead of write) might return stale or even bogus data (e.g. when the + exporter needs to copy the data to temporary storage).
This step might fail, e.g. in oom conditions.
@@ -322,14 +320,13 @@ Access to a dma_buf from the kernel context involves three steps:
3. Finish access
- When the importer is done accessing the range specified in begin_cpu_access, - it needs to announce this to the exporter (to facilitate cache flushing and - unpinning of any pinned resources). The result of any dma_buf kmap calls - after end_cpu_access is undefined. + When the importer is done accessing the CPU, it needs to announce this to + the exporter (to facilitate cache flushing and unpinning of any pinned + resources). The result of any dma_buf kmap calls after end_cpu_access is + undefined.
Interface: void dma_buf_end_cpu_access(struct dma_buf *dma_buf, - size_t start, size_t len, enum dma_data_direction dir);
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 155c146..b2ac13b 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -539,13 +539,11 @@ EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment); * preparations. Coherency is only guaranteed in the specified range for the * specified access direction. * @dmabuf: [in] buffer to prepare cpu access for. - * @start: [in] start of range for cpu access. - * @len: [in] length of range for cpu access. * @direction: [in] length of range for cpu access. * * Can return negative error values, returns 0 on success. */ -int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len, +int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, enum dma_data_direction direction) { int ret = 0; @@ -554,8 +552,7 @@ int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len, return -EINVAL;
if (dmabuf->ops->begin_cpu_access) - ret = dmabuf->ops->begin_cpu_access(dmabuf, start, - len, direction); + ret = dmabuf->ops->begin_cpu_access(dmabuf, direction);
return ret; } @@ -567,19 +564,17 @@ EXPORT_SYMBOL_GPL(dma_buf_begin_cpu_access); * actions. Coherency is only guaranteed in the specified range for the * specified access direction. * @dmabuf: [in] buffer to complete cpu access for. - * @start: [in] start of range for cpu access. - * @len: [in] length of range for cpu access. * @direction: [in] length of range for cpu access. * * This call must always succeed. */ -void dma_buf_end_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len, +void dma_buf_end_cpu_access(struct dma_buf *dmabuf, enum dma_data_direction direction) { WARN_ON(!dmabuf);
if (dmabuf->ops->end_cpu_access) - dmabuf->ops->end_cpu_access(dmabuf, start, len, direction); + dmabuf->ops->end_cpu_access(dmabuf, direction); } EXPORT_SYMBOL_GPL(dma_buf_end_cpu_access);
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c index e9c2bfd..65ab2bd 100644 --- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c @@ -196,7 +196,7 @@ static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct * return -EINVAL; }
-static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, size_t start, size_t length, enum dma_data_direction direction) +static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction) { struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf); struct drm_device *dev = obj->base.dev; diff --git a/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c b/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c index 27c2976..aebae1c 100644 --- a/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c +++ b/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c @@ -79,7 +79,7 @@ static void omap_gem_dmabuf_release(struct dma_buf *buffer)
static int omap_gem_dmabuf_begin_cpu_access(struct dma_buf *buffer, - size_t start, size_t len, enum dma_data_direction dir) + enum dma_data_direction dir) { struct drm_gem_object *obj = buffer->priv; struct page **pages; @@ -94,7 +94,7 @@ static int omap_gem_dmabuf_begin_cpu_access(struct dma_buf *buffer, }
static void omap_gem_dmabuf_end_cpu_access(struct dma_buf *buffer, - size_t start, size_t len, enum dma_data_direction dir) + enum dma_data_direction dir) { struct drm_gem_object *obj = buffer->priv; omap_gem_put_pages(obj); diff --git a/drivers/gpu/drm/udl/udl_fb.c b/drivers/gpu/drm/udl/udl_fb.c index 200419d..c427499 100644 --- a/drivers/gpu/drm/udl/udl_fb.c +++ b/drivers/gpu/drm/udl/udl_fb.c @@ -409,7 +409,6 @@ static int udl_user_framebuffer_dirty(struct drm_framebuffer *fb,
if (ufb->obj->base.import_attach) { ret = dma_buf_begin_cpu_access(ufb->obj->base.import_attach->dmabuf, - 0, ufb->obj->base.size, DMA_FROM_DEVICE); if (ret) goto unlock; @@ -425,7 +424,6 @@ static int udl_user_framebuffer_dirty(struct drm_framebuffer *fb,
if (ufb->obj->base.import_attach) { dma_buf_end_cpu_access(ufb->obj->base.import_attach->dmabuf, - 0, ufb->obj->base.size, DMA_FROM_DEVICE); }
diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index e237e9f..0754a37 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c @@ -1057,8 +1057,7 @@ static void ion_dma_buf_kunmap(struct dma_buf *dmabuf, unsigned long offset, { }
-static int ion_dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, - size_t len, +static int ion_dma_buf_begin_cpu_access(struct dma_buf *dmabuf, enum dma_data_direction direction) { struct ion_buffer *buffer = dmabuf->priv; @@ -1076,8 +1075,7 @@ static int ion_dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, return PTR_ERR_OR_ZERO(vaddr); }
-static void ion_dma_buf_end_cpu_access(struct dma_buf *dmabuf, size_t start, - size_t len, +static void ion_dma_buf_end_cpu_access(struct dma_buf *dmabuf, enum dma_data_direction direction) { struct ion_buffer *buffer = dmabuf->priv; diff --git a/drivers/staging/android/ion/ion_test.c b/drivers/staging/android/ion/ion_test.c index b8dcf5a..da34bc12 100644 --- a/drivers/staging/android/ion/ion_test.c +++ b/drivers/staging/android/ion/ion_test.c @@ -109,7 +109,7 @@ static int ion_handle_test_kernel(struct dma_buf *dma_buf, void __user *ptr, if (offset > dma_buf->size || size > dma_buf->size - offset) return -EINVAL;
- ret = dma_buf_begin_cpu_access(dma_buf, offset, size, dir); + ret = dma_buf_begin_cpu_access(dma_buf, dir); if (ret) return ret;
@@ -139,7 +139,7 @@ static int ion_handle_test_kernel(struct dma_buf *dma_buf, void __user *ptr, copy_offset = 0; } err: - dma_buf_end_cpu_access(dma_buf, offset, size, dir); + dma_buf_end_cpu_access(dma_buf, dir); return ret; }
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index f98bd70..532108e 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -54,7 +54,7 @@ struct dma_buf_attachment; * @release: release this buffer; to be called after the last dma_buf_put. * @begin_cpu_access: [optional] called before cpu access to invalidate cpu * caches and allocate backing storage (if not yet done) - * respectively pin the objet into memory. + * respectively pin the object into memory. * @end_cpu_access: [optional] called after cpu access to flush caches. * @kmap_atomic: maps a page from the buffer into kernel address * space, users may not block until the subsequent unmap call. @@ -93,10 +93,8 @@ struct dma_buf_ops { /* after final dma_buf_put() */ void (*release)(struct dma_buf *);
- int (*begin_cpu_access)(struct dma_buf *, size_t, size_t, - enum dma_data_direction); - void (*end_cpu_access)(struct dma_buf *, size_t, size_t, - enum dma_data_direction); + int (*begin_cpu_access)(struct dma_buf *, enum dma_data_direction); + void (*end_cpu_access)(struct dma_buf *, enum dma_data_direction); void *(*kmap_atomic)(struct dma_buf *, unsigned long); void (*kunmap_atomic)(struct dma_buf *, unsigned long, void *); void *(*kmap)(struct dma_buf *, unsigned long); @@ -224,9 +222,9 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *, enum dma_data_direction); void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *, enum dma_data_direction); -int dma_buf_begin_cpu_access(struct dma_buf *dma_buf, size_t start, size_t len, +int dma_buf_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction dir); -void dma_buf_end_cpu_access(struct dma_buf *dma_buf, size_t start, size_t len, +void dma_buf_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction dir); void *dma_buf_kmap_atomic(struct dma_buf *, unsigned long); void dma_buf_kunmap_atomic(struct dma_buf *, unsigned long, void *);
From: Daniel Vetter daniel.vetter@ffwll.ch
Userspace might need some sort of cache coherency management, e.g. when CPU and GPU domains are being accessed through dma-buf at the same time. To circumvent this problem there are begin/end coherency markers, which forward directly to the existing dma-buf device driver vfunc hooks. Userspace can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be used like this:
  - mmap the dma-buf fd
  - for each drawing/upload cycle on the CPU: 1. SYNC_START ioctl, 2. read/write to the mmap area, 3. SYNC_END ioctl. This can be repeated as often as you want (with the new data being consumed by the GPU or, say, the scanout device)
  - munmap once you don't need the buffer any more
v2 (Tiago): Fix header file type names (u64 -> __u64)
v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end dma-buf functions. Check for overflows in start/length.
v4 (Tiago): use 2d regions for sync.
v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and remove range information from struct dma_buf_sync.
v6 (Tiago): use __u64 structured padded flags instead enum. Adjust documentation about the recommendation on using sync ioctls.
Cc: Sumit Semwal sumit.semwal@linaro.org
Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
---
 Documentation/dma-buf-sharing.txt | 22 +++++++++++++++++++-
 drivers/dma-buf/dma-buf.c         | 43 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/dma-buf.h      | 38 ++++++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+), 1 deletion(-)
 create mode 100644 include/uapi/linux/dma-buf.h
diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt index 4f4a84b..2ddd4b2 100644 --- a/Documentation/dma-buf-sharing.txt +++ b/Documentation/dma-buf-sharing.txt @@ -350,7 +350,27 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases: handles, too). So it's beneficial to support this in a similar fashion on dma-buf to have a good transition path for existing Android userspace.
- No special interfaces, userspace simply calls mmap on the dma-buf fd. + No special interfaces, userspace simply calls mmap on the dma-buf fd. Very + important to note though is that, even if it is not mandatory, the userspace + is strongly recommended to always use the cache synchronization ioctl + (DMA_BUF_IOCTL_SYNC) discussed next. + + Some systems might need some sort of cache coherency management e.g. when + CPU and GPU domains are being accessed through dma-buf at the same time. To + circumvent this problem there are begin/end coherency markers, that forward + directly to existing dma-buf device drivers vfunc hooks. Userspace can make + use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence + would be used like following: + - mmap dma-buf fd + - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write + to mmap area 3. SYNC_END ioctl. This can be repeated as often as you + want (with the new data being consumed by the GPU or say scanout device) + - munmap once you don't need the buffer any more + + In principle systems with the memory cache shared by the GPU and CPU may + not need SYNC_START and SYNC_END but still, userspace is always encouraged + to use these ioctls before and after, respectively, when accessing the + mapped address.
2. Supporting existing mmap interfaces in importers
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index b2ac13b..9a298bd 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -34,6 +34,8 @@ #include <linux/poll.h> #include <linux/reservation.h>
+#include <uapi/linux/dma-buf.h> + static inline int is_dma_buf_file(struct file *);
struct dma_buf_list { @@ -251,11 +253,52 @@ out: return events; }
+static long dma_buf_ioctl(struct file *file, + unsigned int cmd, unsigned long arg) +{ + struct dma_buf *dmabuf; + struct dma_buf_sync sync; + enum dma_data_direction direction; + + dmabuf = file->private_data; + + if (!is_dma_buf_file(file)) + return -EINVAL; + + switch (cmd) { + case DMA_BUF_IOCTL_SYNC: + if (copy_from_user(&sync, (void __user *) arg, sizeof(sync))) + return -EFAULT; + + if (sync.flags & DMA_BUF_SYNC_RW) + direction = DMA_BIDIRECTIONAL; + else if (sync.flags & DMA_BUF_SYNC_READ) + direction = DMA_FROM_DEVICE; + else if (sync.flags & DMA_BUF_SYNC_WRITE) + direction = DMA_TO_DEVICE; + else + return -EINVAL; + + if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK) + return -EINVAL; + + if (sync.flags & DMA_BUF_SYNC_END) + dma_buf_end_cpu_access(dmabuf, direction); + else + dma_buf_begin_cpu_access(dmabuf, direction); + + return 0; + default: + return -ENOTTY; + } +} + static const struct file_operations dma_buf_fops = { .release = dma_buf_release, .mmap = dma_buf_mmap_internal, .llseek = dma_buf_llseek, .poll = dma_buf_poll, + .unlocked_ioctl = dma_buf_ioctl, };
/* diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h new file mode 100644 index 0000000..bd195f2 --- /dev/null +++ b/include/uapi/linux/dma-buf.h @@ -0,0 +1,38 @@ +/* + * Framework for buffer objects that can be shared across devices/subsystems. + * + * Copyright(C) 2015 Intel Ltd + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program. If not, see http://www.gnu.org/licenses/. + */ + +#ifndef _DMA_BUF_UAPI_H_ +#define _DMA_BUF_UAPI_H_ + +/* begin/end dma-buf functions used for userspace mmap. */ +struct dma_buf_sync { + __u64 flags; +}; + +#define DMA_BUF_SYNC_READ (1 << 0) +#define DMA_BUF_SYNC_WRITE (2 << 0) +#define DMA_BUF_SYNC_RW (3 << 0) +#define DMA_BUF_SYNC_START (0 << 2) +#define DMA_BUF_SYNC_END (1 << 2) +#define DMA_BUF_SYNC_VALID_FLAGS_MASK \ + (DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END) + +#define DMA_BUF_BASE 'b' +#define DMA_BUF_IOCTL_SYNC _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync) + +#endif
On Wed, Dec 16, 2015 at 5:25 PM, Tiago Vignatti tiago.vignatti@intel.com wrote:
From: Daniel Vetter daniel.vetter@ffwll.ch
The userspace might need some sort of cache coherency management e.g. when CPU and GPU domains are being accessed through dma-buf at the same time. To circumvent this problem there are begin/end coherency markers, that forward directly to existing dma-buf device drivers vfunc hooks. Userspace can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be used like following: - mmap dma-buf fd - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write to mmap area 3. SYNC_END ioctl. This can be repeated as often as you want (with the new data being consumed by the GPU or say scanout device) - munmap once you don't need the buffer any more
v2 (Tiago): Fix header file type names (u64 -> __u64) v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end dma-buf functions. Check for overflows in start/length. v4 (Tiago): use 2d regions for sync. v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and remove range information from struct dma_buf_sync. v6 (Tiago): use __u64 structured padded flags instead enum. Adjust documentation about the recommendation on using sync ioctls.
Cc: Sumit Semwal sumit.semwal@linaro.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
Documentation/dma-buf-sharing.txt | 22 +++++++++++++++++++- drivers/dma-buf/dma-buf.c | 43 +++++++++++++++++++++++++++++++++++++++ include/uapi/linux/dma-buf.h | 38 ++++++++++++++++++++++++++++++++++ 3 files changed, 102 insertions(+), 1 deletion(-) create mode 100644 include/uapi/linux/dma-buf.h
diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt index 4f4a84b..2ddd4b2 100644 --- a/Documentation/dma-buf-sharing.txt +++ b/Documentation/dma-buf-sharing.txt @@ -350,7 +350,27 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases: handles, too). So it's beneficial to support this in a similar fashion on dma-buf to have a good transition path for existing Android userspace.
- No special interfaces, userspace simply calls mmap on the dma-buf fd.
- No special interfaces, userspace simply calls mmap on the dma-buf fd. Very
- important to note though is that, even if it is not mandatory, the userspace
- is strongly recommended to always use the cache synchronization ioctl
- (DMA_BUF_IOCTL_SYNC) discussed next.
- Some systems might need some sort of cache coherency management e.g. when
- CPU and GPU domains are being accessed through dma-buf at the same time. To
- circumvent this problem there are begin/end coherency markers, that forward
- directly to existing dma-buf device drivers vfunc hooks. Userspace can make
- use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
- would be used like following:
- mmap dma-buf fd
- for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
want (with the new data being consumed by the GPU or say scanout device)
- munmap once you don't need the buffer any more
- In principle systems with the memory cache shared by the GPU and CPU may
- not need SYNC_START and SYNC_END but still, userspace is always encouraged
- to use these ioctls before and after, respectively, when accessing the
- mapped address.
- Supporting existing mmap interfaces in importers
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index b2ac13b..9a298bd 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -34,6 +34,8 @@ #include <linux/poll.h> #include <linux/reservation.h>
+#include <uapi/linux/dma-buf.h>
static inline int is_dma_buf_file(struct file *);
struct dma_buf_list { @@ -251,11 +253,52 @@ out: return events; }
+static long dma_buf_ioctl(struct file *file,
unsigned int cmd, unsigned long arg)
+{
struct dma_buf *dmabuf;
struct dma_buf_sync sync;
enum dma_data_direction direction;
dmabuf = file->private_data;
if (!is_dma_buf_file(file))
return -EINVAL;
switch (cmd) {
case DMA_BUF_IOCTL_SYNC:
if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
return -EFAULT;
if (sync.flags & DMA_BUF_SYNC_RW)
direction = DMA_BIDIRECTIONAL;
else if (sync.flags & DMA_BUF_SYNC_READ)
direction = DMA_FROM_DEVICE;
else if (sync.flags & DMA_BUF_SYNC_WRITE)
direction = DMA_TO_DEVICE;
else
return -EINVAL;
if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
return -EINVAL;
if (sync.flags & DMA_BUF_SYNC_END)
dma_buf_end_cpu_access(dmabuf, direction);
else
dma_buf_begin_cpu_access(dmabuf, direction);
return 0;
default:
return -ENOTTY;
}
+}
static const struct file_operations dma_buf_fops = { .release = dma_buf_release, .mmap = dma_buf_mmap_internal, .llseek = dma_buf_llseek, .poll = dma_buf_poll,
.unlocked_ioctl = dma_buf_ioctl,
};
/* diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h new file mode 100644 index 0000000..bd195f2 --- /dev/null +++ b/include/uapi/linux/dma-buf.h @@ -0,0 +1,38 @@ +/*
- Framework for buffer objects that can be shared across devices/subsystems.
- Copyright(C) 2015 Intel Ltd
- This program is free software; you can redistribute it and/or modify it
- under the terms of the GNU General Public License version 2 as published by
- the Free Software Foundation.
- This program is distributed in the hope that it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
- You should have received a copy of the GNU General Public License along with
- this program. If not, see http://www.gnu.org/licenses/.
- */
+#ifndef _DMA_BUF_UAPI_H_ +#define _DMA_BUF_UAPI_H_
+/* begin/end dma-buf functions used for userspace mmap. */ +struct dma_buf_sync {
__u64 flags;
+};
+#define DMA_BUF_SYNC_READ (1 << 0) +#define DMA_BUF_SYNC_WRITE (2 << 0) +#define DMA_BUF_SYNC_RW (3 << 0)
Maybe make this:
#define DMA_BUF_SYNC_RW (DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE)
or maybe:
#define DMA_BUF_SYNC_RW_MASK (3 << 0)
+#define DMA_BUF_SYNC_START (0 << 2) +#define DMA_BUF_SYNC_END (1 << 2)
if you go with the mask above, maybe: #define DMA_BUF_SYNC_MASK (1 << 2)
+#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
(DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)
+#define DMA_BUF_BASE 'b' +#define DMA_BUF_IOCTL_SYNC _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)
+#endif
On 12/17/2015 04:19 PM, Alex Deucher wrote:
On Wed, Dec 16, 2015 at 5:25 PM, Tiago Vignatti tiago.vignatti@intel.com wrote:
From: Daniel Vetter daniel.vetter@ffwll.ch
The userspace might need some sort of cache coherency management e.g. when CPU and GPU domains are being accessed through dma-buf at the same time. To circumvent this problem there are begin/end coherency markers, that forward directly to existing dma-buf device drivers vfunc hooks. Userspace can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be used like following: - mmap dma-buf fd - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write to mmap area 3. SYNC_END ioctl. This can be repeated as often as you want (with the new data being consumed by the GPU or say scanout device) - munmap once you don't need the buffer any more
v2 (Tiago): Fix header file type names (u64 -> __u64) v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end dma-buf functions. Check for overflows in start/length. v4 (Tiago): use 2d regions for sync. v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and remove range information from struct dma_buf_sync. v6 (Tiago): use __u64 structured padded flags instead enum. Adjust documentation about the recommendation on using sync ioctls.
Cc: Sumit Semwal sumit.semwal@linaro.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
Documentation/dma-buf-sharing.txt | 22 +++++++++++++++++++- drivers/dma-buf/dma-buf.c | 43 +++++++++++++++++++++++++++++++++++++++ include/uapi/linux/dma-buf.h | 38 ++++++++++++++++++++++++++++++++++ 3 files changed, 102 insertions(+), 1 deletion(-) create mode 100644 include/uapi/linux/dma-buf.h
diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt index 4f4a84b..2ddd4b2 100644 --- a/Documentation/dma-buf-sharing.txt +++ b/Documentation/dma-buf-sharing.txt @@ -350,7 +350,27 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases: handles, too). So it's beneficial to support this in a similar fashion on dma-buf to have a good transition path for existing Android userspace.
- No special interfaces, userspace simply calls mmap on the dma-buf fd.
No special interfaces, userspace simply calls mmap on the dma-buf fd. Very
important to note though is that, even if it is not mandatory, the userspace
is strongly recommended to always use the cache synchronization ioctl
(DMA_BUF_IOCTL_SYNC) discussed next.
Some systems might need some sort of cache coherency management e.g. when
CPU and GPU domains are being accessed through dma-buf at the same time. To
circumvent this problem there are begin/end coherency markers, that forward
directly to existing dma-buf device drivers vfunc hooks. Userspace can make
use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
would be used like following:
- mmap dma-buf fd
- for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
want (with the new data being consumed by the GPU or say scanout device)
- munmap once you don't need the buffer any more
In principle systems with the memory cache shared by the GPU and CPU may
not need SYNC_START and SYNC_END but still, userspace is always encouraged
to use these ioctls before and after, respectively, when accessing the
mapped address.
- Supporting existing mmap interfaces in importers
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index b2ac13b..9a298bd 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -34,6 +34,8 @@ #include <linux/poll.h> #include <linux/reservation.h>
+#include <uapi/linux/dma-buf.h>
static inline int is_dma_buf_file(struct file *);
struct dma_buf_list {
@@ -251,11 +253,52 @@ out: return events; }
+static long dma_buf_ioctl(struct file *file,
unsigned int cmd, unsigned long arg)
+{
struct dma_buf *dmabuf;
struct dma_buf_sync sync;
enum dma_data_direction direction;
dmabuf = file->private_data;
if (!is_dma_buf_file(file))
return -EINVAL;
switch (cmd) {
case DMA_BUF_IOCTL_SYNC:
if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
return -EFAULT;
if (sync.flags & DMA_BUF_SYNC_RW)
direction = DMA_BIDIRECTIONAL;
else if (sync.flags & DMA_BUF_SYNC_READ)
direction = DMA_FROM_DEVICE;
else if (sync.flags & DMA_BUF_SYNC_WRITE)
direction = DMA_TO_DEVICE;
else
return -EINVAL;
if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
return -EINVAL;
if (sync.flags & DMA_BUF_SYNC_END)
dma_buf_end_cpu_access(dmabuf, direction);
else
dma_buf_begin_cpu_access(dmabuf, direction);
return 0;
default:
return -ENOTTY;
}
+}
static const struct file_operations dma_buf_fops = { .release = dma_buf_release, .mmap = dma_buf_mmap_internal, .llseek = dma_buf_llseek, .poll = dma_buf_poll,
.unlocked_ioctl = dma_buf_ioctl,
};
/*
diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h new file mode 100644 index 0000000..bd195f2 --- /dev/null +++ b/include/uapi/linux/dma-buf.h @@ -0,0 +1,38 @@ +/*
- Framework for buffer objects that can be shared across devices/subsystems.
- Copyright(C) 2015 Intel Ltd
- This program is free software; you can redistribute it and/or modify it
- under the terms of the GNU General Public License version 2 as published by
- the Free Software Foundation.
- This program is distributed in the hope that it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
- You should have received a copy of the GNU General Public License along with
- this program. If not, see http://www.gnu.org/licenses/.
- */
+#ifndef _DMA_BUF_UAPI_H_ +#define _DMA_BUF_UAPI_H_
+/* begin/end dma-buf functions used for userspace mmap. */ +struct dma_buf_sync {
__u64 flags;
+};
+#define DMA_BUF_SYNC_READ (1 << 0) +#define DMA_BUF_SYNC_WRITE (2 << 0) +#define DMA_BUF_SYNC_RW (3 << 0)
Maybe make this:
#define DMA_BUF_SYNC_RW (DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE)
or maybe:
#define DMA_BUF_SYNC_RW_MASK (3 << 0)
Thanks Alex. I think I like your first option better, as it's a bit more suggestive than the other -- I'm changing to it now.
Tiago
On 12/16/2015 11:25 PM, Tiago Vignatti wrote:
From: Daniel Vetter daniel.vetter@ffwll.ch
The userspace might need some sort of cache coherency management e.g. when CPU and GPU domains are being accessed through dma-buf at the same time. To circumvent this problem there are begin/end coherency markers, that forward directly to existing dma-buf device drivers vfunc hooks. Userspace can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be used like following: - mmap dma-buf fd - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write to mmap area 3. SYNC_END ioctl. This can be repeated as often as you want (with the new data being consumed by the GPU or say scanout device) - munmap once you don't need the buffer any more
v2 (Tiago): Fix header file type names (u64 -> __u64) v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end dma-buf functions. Check for overflows in start/length. v4 (Tiago): use 2d regions for sync. v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and remove range information from struct dma_buf_sync. v6 (Tiago): use __u64 structured padded flags instead enum. Adjust documentation about the recommendation on using sync ioctls.
Cc: Sumit Semwal sumit.semwal@linaro.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
Documentation/dma-buf-sharing.txt | 22 +++++++++++++++++++- drivers/dma-buf/dma-buf.c | 43 +++++++++++++++++++++++++++++++++++++++ include/uapi/linux/dma-buf.h | 38 ++++++++++++++++++++++++++++++++++ 3 files changed, 102 insertions(+), 1 deletion(-) create mode 100644 include/uapi/linux/dma-buf.h
diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt index 4f4a84b..2ddd4b2 100644 --- a/Documentation/dma-buf-sharing.txt +++ b/Documentation/dma-buf-sharing.txt @@ -350,7 +350,27 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases: handles, too). So it's beneficial to support this in a similar fashion on dma-buf to have a good transition path for existing Android userspace.
- No special interfaces, userspace simply calls mmap on the dma-buf fd.
- No special interfaces, userspace simply calls mmap on the dma-buf fd. Very
- important to note though is that, even if it is not mandatory, the userspace
- is strongly recommended to always use the cache synchronization ioctl
- (DMA_BUF_IOCTL_SYNC) discussed next.
- Some systems might need some sort of cache coherency management e.g. when
- CPU and GPU domains are being accessed through dma-buf at the same time. To
- circumvent this problem there are begin/end coherency markers, that forward
- directly to existing dma-buf device drivers vfunc hooks. Userspace can make
- use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
- would be used like following:
- mmap dma-buf fd
- for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
want (with the new data being consumed by the GPU or say scanout device)
- munmap once you don't need the buffer any more
- In principle systems with the memory cache shared by the GPU and CPU may
- not need SYNC_START and SYNC_END but still, userspace is always encouraged
- to use these ioctls before and after, respectively, when accessing the
- mapped address.
I think the wording here is far too weak. If this is a generic user-space interface and syncing is required for:
a) Correctness: then syncing must be mandatory.
b) Optimal performance: then an implementation must generate expected results also in the absence of SYNC ioctls, but is allowed to rely on correct pairing of SYNC_START and SYNC_END to render correctly.
Also, recalling the discussion of multidimensional sync, which we said we would add when it was needed: my worst nightmare is when (not if) a number of important applications start to abuse this interface for small writes or reads to large DMA buffers. It will work flawlessly on coherent architectures, and probably on some incoherent architectures as well, but will probably be mostly useless on VMware. What is the plan for adding 2D sync to make it work? How do we persuade app developers to think of damage regions, and can we count on support from the DRM community when this happens?
/Thomas
- Supporting existing mmap interfaces in importers
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index b2ac13b..9a298bd 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -34,6 +34,8 @@ #include <linux/poll.h> #include <linux/reservation.h>
+#include <uapi/linux/dma-buf.h>
static inline int is_dma_buf_file(struct file *);
struct dma_buf_list { @@ -251,11 +253,52 @@ out: return events; }
+static long dma_buf_ioctl(struct file *file,
unsigned int cmd, unsigned long arg)
+{
- struct dma_buf *dmabuf;
- struct dma_buf_sync sync;
- enum dma_data_direction direction;
- dmabuf = file->private_data;
- if (!is_dma_buf_file(file))
return -EINVAL;
- switch (cmd) {
- case DMA_BUF_IOCTL_SYNC:
if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
return -EFAULT;
if (sync.flags & DMA_BUF_SYNC_RW)
direction = DMA_BIDIRECTIONAL;
else if (sync.flags & DMA_BUF_SYNC_READ)
direction = DMA_FROM_DEVICE;
else if (sync.flags & DMA_BUF_SYNC_WRITE)
direction = DMA_TO_DEVICE;
else
return -EINVAL;
if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
return -EINVAL;
if (sync.flags & DMA_BUF_SYNC_END)
dma_buf_end_cpu_access(dmabuf, direction);
else
dma_buf_begin_cpu_access(dmabuf, direction);
return 0;
- default:
return -ENOTTY;
- }
+}
static const struct file_operations dma_buf_fops = { .release = dma_buf_release, .mmap = dma_buf_mmap_internal, .llseek = dma_buf_llseek, .poll = dma_buf_poll,
- .unlocked_ioctl = dma_buf_ioctl,
};
/* diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h new file mode 100644 index 0000000..bd195f2 --- /dev/null +++ b/include/uapi/linux/dma-buf.h @@ -0,0 +1,38 @@ +/*
- Framework for buffer objects that can be shared across devices/subsystems.
- Copyright(C) 2015 Intel Ltd
- This program is free software; you can redistribute it and/or modify it
- under the terms of the GNU General Public License version 2 as published by
- the Free Software Foundation.
- This program is distributed in the hope that it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
- You should have received a copy of the GNU General Public License along with
- this program. If not, see http://www.gnu.org/licenses/.
- */
+#ifndef _DMA_BUF_UAPI_H_ +#define _DMA_BUF_UAPI_H_
+/* begin/end dma-buf functions used for userspace mmap. */ +struct dma_buf_sync {
- __u64 flags;
+};
+#define DMA_BUF_SYNC_READ (1 << 0) +#define DMA_BUF_SYNC_WRITE (2 << 0) +#define DMA_BUF_SYNC_RW (3 << 0) +#define DMA_BUF_SYNC_START (0 << 2) +#define DMA_BUF_SYNC_END (1 << 2) +#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
- (DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)
+#define DMA_BUF_BASE 'b' +#define DMA_BUF_IOCTL_SYNC _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)
+#endif
On Thu, Dec 17, 2015 at 10:58:20PM +0100, Thomas Hellstrom wrote:
On 12/16/2015 11:25 PM, Tiago Vignatti wrote:
From: Daniel Vetter daniel.vetter@ffwll.ch
The userspace might need some sort of cache coherency management e.g. when CPU and GPU domains are being accessed through dma-buf at the same time. To circumvent this problem there are begin/end coherency markers, that forward directly to existing dma-buf device drivers vfunc hooks. Userspace can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be used like following: - mmap dma-buf fd - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write to mmap area 3. SYNC_END ioctl. This can be repeated as often as you want (with the new data being consumed by the GPU or say scanout device) - munmap once you don't need the buffer any more
v2 (Tiago): Fix header file type names (u64 -> __u64) v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end dma-buf functions. Check for overflows in start/length. v4 (Tiago): use 2d regions for sync. v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and remove range information from struct dma_buf_sync. v6 (Tiago): use __u64 structured padded flags instead enum. Adjust documentation about the recommendation on using sync ioctls.
Cc: Sumit Semwal sumit.semwal@linaro.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
Documentation/dma-buf-sharing.txt | 22 +++++++++++++++++++- drivers/dma-buf/dma-buf.c | 43 +++++++++++++++++++++++++++++++++++++++ include/uapi/linux/dma-buf.h | 38 ++++++++++++++++++++++++++++++++++ 3 files changed, 102 insertions(+), 1 deletion(-) create mode 100644 include/uapi/linux/dma-buf.h
diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt index 4f4a84b..2ddd4b2 100644 --- a/Documentation/dma-buf-sharing.txt +++ b/Documentation/dma-buf-sharing.txt @@ -350,7 +350,27 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases: handles, too). So it's beneficial to support this in a similar fashion on dma-buf to have a good transition path for existing Android userspace.
- No special interfaces, userspace simply calls mmap on the dma-buf fd.
- No special interfaces, userspace simply calls mmap on the dma-buf fd. Very
- important to note though is that, even if it is not mandatory, the userspace
- is strongly recommended to always use the cache synchronization ioctl
- (DMA_BUF_IOCTL_SYNC) discussed next.
- Some systems might need some sort of cache coherency management e.g. when
- CPU and GPU domains are being accessed through dma-buf at the same time. To
- circumvent this problem there are begin/end coherency markers, that forward
- directly to existing dma-buf device drivers vfunc hooks. Userspace can make
- use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
- would be used like following:
- mmap dma-buf fd
- for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
want (with the new data being consumed by the GPU or say scanout device)
- munmap once you don't need the buffer any more
- In principle systems with the memory cache shared by the GPU and CPU may
- not need SYNC_START and SYNC_END but still, userspace is always encouraged
- to use these ioctls before and after, respectively, when accessing the
- mapped address.
I think the wording here is far too weak. If this is a generic user-space interface and syncing is required for a) Correctness: then syncing must be mandatory. b) Optimal performance then an implementation must generate expected results also in the absence of SYNC ioctls, but is allowed to rely on correct pairing of SYNC_START and SYNC_END to render correctly.
Yeah, I guess the wording should be stronger to make it clear you'd better call these, always.
Also recalling the discussion of multidimensional sync, which we said we would add when it was needed, my worst nightmare is when (not if) there are a number of important applications starting to abuse this interface for small writes or reads to large DMA buffers. It will work flawlessly on coherent architectures, and probably some incoherent architectures as well, but will probably be mostly useless on VMware. What is the plan for adding 2D sync to make it work? How do we pursuade app developers to think of damage regions, and can we count on support from the DRM community when this happens?
The current use-case in cros seems to be full-blown buffer uploads, nothing partial. I don't think there's much we can really do (flushing for small uploads will make i915 crawl too, albeit you'd still see a slideshow rather than a complete stop), except maybe improve the docs with a statement that trying to use dma-buf mmap for small uploads will result in horrible performance. I think the inevitable reality is that some truly horrible software will always ship. I'm pretty hopeful though that this won't be too common, since the main users for anything dma-buf are on mobile, and those systems (at least on i915) are all incoherent. So as long as it runs acceptably on i915, I think it should run at least ok-ish on vmwgfx too. -Daniel
On 12/17/2015 07:58 PM, Thomas Hellstrom wrote:
On 12/16/2015 11:25 PM, Tiago Vignatti wrote:
From: Daniel Vetter daniel.vetter@ffwll.ch
The userspace might need some sort of cache coherency management e.g. when CPU and GPU domains are being accessed through dma-buf at the same time. To circumvent this problem there are begin/end coherency markers, that forward directly to existing dma-buf device drivers vfunc hooks. Userspace can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be used like following: - mmap dma-buf fd - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write to mmap area 3. SYNC_END ioctl. This can be repeated as often as you want (with the new data being consumed by the GPU or say scanout device) - munmap once you don't need the buffer any more
v2 (Tiago): Fix header file type names (u64 -> __u64) v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end dma-buf functions. Check for overflows in start/length. v4 (Tiago): use 2d regions for sync. v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and remove range information from struct dma_buf_sync. v6 (Tiago): use __u64 structured padded flags instead enum. Adjust documentation about the recommendation on using sync ioctls.
Cc: Sumit Semwal sumit.semwal@linaro.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
Documentation/dma-buf-sharing.txt | 22 +++++++++++++++++++- drivers/dma-buf/dma-buf.c | 43 +++++++++++++++++++++++++++++++++++++++ include/uapi/linux/dma-buf.h | 38 ++++++++++++++++++++++++++++++++++ 3 files changed, 102 insertions(+), 1 deletion(-) create mode 100644 include/uapi/linux/dma-buf.h
diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt index 4f4a84b..2ddd4b2 100644 --- a/Documentation/dma-buf-sharing.txt +++ b/Documentation/dma-buf-sharing.txt @@ -350,7 +350,27 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases: handles, too). So it's beneficial to support this in a similar fashion on dma-buf to have a good transition path for existing Android userspace.
- No special interfaces, userspace simply calls mmap on the dma-buf fd.
- No special interfaces, userspace simply calls mmap on the dma-buf fd. Very
- important to note though is that, even if it is not mandatory, the userspace
- is strongly recommended to always use the cache synchronization ioctl
- (DMA_BUF_IOCTL_SYNC) discussed next.
- Some systems might need some sort of cache coherency management e.g. when
- CPU and GPU domains are being accessed through dma-buf at the same time. To
- circumvent this problem there are begin/end coherency markers, that forward
- directly to existing dma-buf device drivers vfunc hooks. Userspace can make
- use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
- would be used like following:
- mmap dma-buf fd
- for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
want (with the new data being consumed by the GPU or say scanout device)
- munmap once you don't need the buffer any more
- In principle systems with the memory cache shared by the GPU and CPU may
- not need SYNC_START and SYNC_END but still, userspace is always encouraged
- to use these ioctls before and after, respectively, when accessing the
- mapped address.
I think the wording here is far too weak. If this is a generic user-space interface and syncing is required for a) Correctness: then syncing must be mandatory. b) Optimal performance then an implementation must generate expected results also in the absence of SYNC ioctls, but is allowed to rely on correct pairing of SYNC_START and SYNC_END to render correctly.
Thomas, do you think the following write-up captures this?
- No special interfaces, userspace simply calls mmap on the dma-buf fd.
+ No special interfaces, userspace simply calls mmap on the dma-buf fd, making
+ sure that the cache synchronization ioctl (DMA_BUF_IOCTL_SYNC) is *always*
+ used when the access happens. This is discussed next paragraphs.
+
+ Some systems might need some sort of cache coherency management e.g. when
+ CPU and GPU domains are being accessed through dma-buf at the same time. To
+ circumvent this problem there are begin/end coherency markers, that forward
+ directly to existing dma-buf device drivers vfunc hooks. Userspace can make
+ use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
+ would be used like following:
+   - mmap dma-buf fd
+   - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
+     to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
+     want (with the new data being consumed by the GPU or say scanout device)
+   - munmap once you don't need the buffer any more
+
+ Therefore, for correctness and optimal performance, systems with the memory
+ cache shared by the GPU and CPU i.e. the "coherent" and also "incoherent"
+ systems are always required to use SYNC_START and SYNC_END before and
+ after, respectively, when accessing the mapped address.
Thank you,
Tiago
On 12/18/2015 08:50 PM, Tiago Vignatti wrote:
On 12/17/2015 07:58 PM, Thomas Hellstrom wrote:
On 12/16/2015 11:25 PM, Tiago Vignatti wrote:
From: Daniel Vetter daniel.vetter@ffwll.ch
The userspace might need some sort of cache coherency management e.g. when CPU and GPU domains are being accessed through dma-buf at the same time. To circumvent this problem there are begin/end coherency markers, that forward directly to existing dma-buf device drivers vfunc hooks. Userspace can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be used like following: - mmap dma-buf fd - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write to mmap area 3. SYNC_END ioctl. This can be repeated as often as you want (with the new data being consumed by the GPU or say scanout device) - munmap once you don't need the buffer any more
v2 (Tiago): Fix header file type names (u64 -> __u64) v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end dma-buf functions. Check for overflows in start/length. v4 (Tiago): use 2d regions for sync. v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and remove range information from struct dma_buf_sync. v6 (Tiago): use __u64 structured padded flags instead enum. Adjust documentation about the recommendation on using sync ioctls.
Cc: Sumit Semwal sumit.semwal@linaro.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
Documentation/dma-buf-sharing.txt | 22 +++++++++++++++++++- drivers/dma-buf/dma-buf.c | 43 +++++++++++++++++++++++++++++++++++++++ include/uapi/linux/dma-buf.h | 38 ++++++++++++++++++++++++++++++++++ 3 files changed, 102 insertions(+), 1 deletion(-) create mode 100644 include/uapi/linux/dma-buf.h
diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt index 4f4a84b..2ddd4b2 100644 --- a/Documentation/dma-buf-sharing.txt +++ b/Documentation/dma-buf-sharing.txt @@ -350,7 +350,27 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases: handles, too). So it's beneficial to support this in a similar fashion on dma-buf to have a good transition path for existing Android userspace.
-   No special interfaces, userspace simply calls mmap on the dma-buf fd.
+   No special interfaces, userspace simply calls mmap on the dma-buf fd. Very
+   important to note though is that, even if it is not mandatory, the userspace
+   is strongly recommended to always use the cache synchronization ioctl
+   (DMA_BUF_IOCTL_SYNC) discussed next.
+
+   Some systems might need some sort of cache coherency management e.g. when
+   CPU and GPU domains are being accessed through dma-buf at the same time. To
+   circumvent this problem there are begin/end coherency markers, that forward
+   directly to existing dma-buf device drivers vfunc hooks. Userspace can make
+   use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
+   would be used like following:
+     - mmap dma-buf fd
+     - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
+       to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
+       want (with the new data being consumed by the GPU or say scanout device)
+     - munmap once you don't need the buffer any more
+
+   In principle systems with the memory cache shared by the GPU and CPU may
+   not need SYNC_START and SYNC_END but still, userspace is always encouraged
+   to use these ioctls before and after, respectively, when accessing the
+   mapped address.
I think the wording here is far too weak. If this is a generic user-space interface and syncing is required for:
a) Correctness: then syncing must be mandatory.
b) Optimal performance: then an implementation must generate expected results also in the absence of SYNC ioctls, but is allowed to rely on correct pairing of SYNC_START and SYNC_END to render correctly.
Thomas, do you think the following write-up captures this?
-   No special interfaces, userspace simply calls mmap on the dma-buf fd.
+   No special interfaces, userspace simply calls mmap on the dma-buf fd, making
+   sure that the cache synchronization ioctl (DMA_BUF_IOCTL_SYNC) is *always*
+   used when the access happens. This is discussed next paragraphs.
+
+   Some systems might need some sort of cache coherency management e.g. when
+   CPU and GPU domains are being accessed through dma-buf at the same time. To
+   circumvent this problem there are begin/end coherency markers, that forward
+   directly to existing dma-buf device drivers vfunc hooks. Userspace can make
+   use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
+   would be used like following:
+     - mmap dma-buf fd
+     - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
+       to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
+       want (with the new data being consumed by the GPU or say scanout device)
+     - munmap once you don't need the buffer any more
+
+   Therefore, for correctness and optimal performance, systems with the memory
+   cache shared by the GPU and CPU i.e. the "coherent" and also "incoherent"
+   systems are always required to use SYNC_START and SYNC_END before and
+   after, respectively, when accessing the mapped address.
Thank you,
Tiago
Yes, that sounds better,
Thanks, Thomas
This function is meant to be used with dma-buf mmap, when finishing the CPU access of the mapped pointer.
The error case should be rare though, requiring the buffer to become active during the sync period and the end_cpu_access to be interrupted. So we take an uninterruptible mutex_lock, to make sure the error is reported if it ever happens.
v2: disable interruption to make sure errors are reported.
v3: update to the new end_cpu_access API.
Reviewed-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com --- drivers/gpu/drm/i915/i915_gem_dmabuf.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c index 65ab2bd..9dba876 100644 --- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c @@ -212,6 +212,27 @@ static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_dire return ret; }
+static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction) +{ + struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf); + struct drm_device *dev = obj->base.dev; + struct drm_i915_private *dev_priv = to_i915(dev); + bool was_interruptible, write = (direction == DMA_BIDIRECTIONAL || direction == DMA_TO_DEVICE); + int ret; + + mutex_lock(&dev->struct_mutex); + was_interruptible = dev_priv->mm.interruptible; + dev_priv->mm.interruptible = false; + + ret = i915_gem_object_set_to_gtt_domain(obj, write); + + dev_priv->mm.interruptible = was_interruptible; + mutex_unlock(&dev->struct_mutex); + + if (unlikely(ret)) + DRM_ERROR("unable to flush buffer following CPU access; rendering may be corrupt\n"); +} + static const struct dma_buf_ops i915_dmabuf_ops = { .map_dma_buf = i915_gem_map_dma_buf, .unmap_dma_buf = i915_gem_unmap_dma_buf, @@ -224,6 +245,7 @@ static const struct dma_buf_ops i915_dmabuf_ops = { .vmap = i915_gem_dmabuf_vmap, .vunmap = i915_gem_dmabuf_vunmap, .begin_cpu_access = i915_gem_begin_cpu_access, + .end_cpu_access = i915_gem_end_cpu_access, };
struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
On Wed, Dec 16, 2015 at 08:25:36PM -0200, Tiago Vignatti wrote:
This function is meant to be used with dma-buf mmap, when finishing the CPU access of the mapped pointer.
+static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
+{
+	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+	struct drm_device *dev = obj->base.dev;
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	bool was_interruptible, write = (direction == DMA_BIDIRECTIONAL || direction == DMA_TO_DEVICE);
+	int ret;
+
+	mutex_lock(&dev->struct_mutex);
+	was_interruptible = dev_priv->mm.interruptible;
+	dev_priv->mm.interruptible = false;
+
+	ret = i915_gem_object_set_to_gtt_domain(obj, write);
This only needs to pass .write=false. The dma-buf direction is only for the period of the user access, and we are now flushing the caches. This is equivalent to the sw-finish ioctl and ideally we just want the i915_gem_object_flush_cpu_write_domain(). -Chris
On 12/17/2015 06:01 AM, Chris Wilson wrote:
On Wed, Dec 16, 2015 at 08:25:36PM -0200, Tiago Vignatti wrote:
This function is meant to be used with dma-buf mmap, when finishing the CPU access of the mapped pointer.
+static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
+{
+	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+	struct drm_device *dev = obj->base.dev;
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	bool was_interruptible, write = (direction == DMA_BIDIRECTIONAL || direction == DMA_TO_DEVICE);
+	int ret;
+
+	mutex_lock(&dev->struct_mutex);
+	was_interruptible = dev_priv->mm.interruptible;
+	dev_priv->mm.interruptible = false;
+
+	ret = i915_gem_object_set_to_gtt_domain(obj, write);
This only needs to pass .write=false. The dma-buf direction is only for the period of the user access, and we are now flushing the caches. This is equivalent to the sw-finish ioctl and ideally we just want the i915_gem_object_flush_cpu_write_domain().
in fact, the only usage I've found so far for end_cpu_access is when the pinned buffer is scanned out. Should I pretty much copy sw-finish in end_cpu_access then?
Thanks,
Tiago
On 12/18/2015 05:02 PM, Tiago Vignatti wrote:
On 12/17/2015 06:01 AM, Chris Wilson wrote:
On Wed, Dec 16, 2015 at 08:25:36PM -0200, Tiago Vignatti wrote:
This function is meant to be used with dma-buf mmap, when finishing the CPU access of the mapped pointer.
+static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
+{
+	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+	struct drm_device *dev = obj->base.dev;
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	bool was_interruptible, write = (direction == DMA_BIDIRECTIONAL || direction == DMA_TO_DEVICE);
+	int ret;
+
+	mutex_lock(&dev->struct_mutex);
+	was_interruptible = dev_priv->mm.interruptible;
+	dev_priv->mm.interruptible = false;
+
+	ret = i915_gem_object_set_to_gtt_domain(obj, write);
This only needs to pass .write=false. The dma-buf direction is only for the period of the user access, and we are now flushing the caches. This is equivalent to the sw-finish ioctl and ideally we just want the i915_gem_object_flush_cpu_write_domain().
in fact, the only usage I've found so far for end_cpu_access is when the pinned buffer is scanned out. Should I pretty much copy sw-finish in end_cpu_access then?
And do you think it's okay to declare i915_gem_object_flush_cpu_write_domain outside its file's only scope?
Tiago
On Fri, Dec 18, 2015 at 05:19:24PM -0200, Tiago Vignatti wrote:
On 12/18/2015 05:02 PM, Tiago Vignatti wrote:
On 12/17/2015 06:01 AM, Chris Wilson wrote:
On Wed, Dec 16, 2015 at 08:25:36PM -0200, Tiago Vignatti wrote:
This function is meant to be used with dma-buf mmap, when finishing the CPU access of the mapped pointer.
+static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
+{
+	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
+	struct drm_device *dev = obj->base.dev;
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	bool was_interruptible, write = (direction == DMA_BIDIRECTIONAL || direction == DMA_TO_DEVICE);
+	int ret;
+
+	mutex_lock(&dev->struct_mutex);
+	was_interruptible = dev_priv->mm.interruptible;
+	dev_priv->mm.interruptible = false;
+
+	ret = i915_gem_object_set_to_gtt_domain(obj, write);
This only needs to pass .write=false. The dma-buf direction is only for the period of the user access, and we are now flushing the caches. This is equivalent to the sw-finish ioctl and ideally we just want the i915_gem_object_flush_cpu_write_domain().
in fact, the only usage I've found so far for end_cpu_access is when the pinned buffer is scanned out. Should I pretty much copy sw-finish in end_cpu_access then?
And do you think it's okay to declare i915_gem_object_flush_cpu_write_domain outside its file's only scope?
Whilst the simplicity of just doing the flush appeals, calling set_gtt_domain(write=false) isn't that much heavier (the difference will be lost in the noise of any clflushing) and is going to be always correct. Whereas just flushing the CPU domain may be a hassle for us in future. -Chris
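To make Chris' suggestion concrete, here is a sketch of how the hook from the patch above would look with only the set_to_gtt_domain() call changed to write=false; this illustrates the review comment, it is not the final patch:

static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
{
	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
	struct drm_device *dev = obj->base.dev;
	struct drm_i915_private *dev_priv = to_i915(dev);
	bool was_interruptible;
	int ret;

	mutex_lock(&dev->struct_mutex);
	was_interruptible = dev_priv->mm.interruptible;
	dev_priv->mm.interruptible = false;

	/* Always flush back towards the GTT domain; write=false suffices here
	 * because we only need the CPU caches flushed, regardless of the
	 * dma-buf direction the user passed in. */
	ret = i915_gem_object_set_to_gtt_domain(obj, false);

	dev_priv->mm.interruptible = was_interruptible;
	mutex_unlock(&dev->struct_mutex);

	if (unlikely(ret))
		DRM_ERROR("unable to flush buffer following CPU access; rendering may be corrupt\n");
}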
Userspace is the one in charge of flushing CPU caches, by wrapping the mmap access with begin{,end}_cpu_access.

v2: Remove the LLC check because we have dma-buf sync providers now. Also, fix return before transferring ownership when mmap fails.
v3: Fix return values.
v4: !obj->base.filp is user triggerable, so removed the WARN_ON.
Reviewed-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com --- drivers/gpu/drm/i915/i915_gem_dmabuf.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c index 9dba876..b7e7a90 100644 --- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c @@ -193,7 +193,23 @@ static void i915_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_n
static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma) { - return -EINVAL; + struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf); + int ret; + + if (obj->base.size < vma->vm_end - vma->vm_start) + return -EINVAL; + + if (!obj->base.filp) + return -ENODEV; + + ret = obj->base.filp->f_op->mmap(obj->base.filp, vma); + if (ret) + return ret; + + fput(vma->vm_file); + vma->vm_file = get_file(obj->base.filp); + + return 0; }
static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
This patch moves userptr definitions and helpers implementation that were locally in gem_userptr_benchmark and gem_userptr_blits to the library, so other tests can make use of them as well. There's no functional changes.
v2: added __ function to differentiate when errors want to be handled back in the caller; bring gem_userptr_sync back to gem_userptr_blits; added gtkdoc.
Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com --- benchmarks/gem_userptr_benchmark.c | 55 +++----------------- lib/ioctl_wrappers.c | 41 +++++++++++++++ lib/ioctl_wrappers.h | 13 +++++ tests/gem_userptr_blits.c | 104 ++++++++++--------------------------- 4 files changed, 86 insertions(+), 127 deletions(-)
diff --git a/benchmarks/gem_userptr_benchmark.c b/benchmarks/gem_userptr_benchmark.c index 1eae7ff..f7716df 100644 --- a/benchmarks/gem_userptr_benchmark.c +++ b/benchmarks/gem_userptr_benchmark.c @@ -58,17 +58,6 @@ #define PAGE_SIZE 4096 #endif
-#define LOCAL_I915_GEM_USERPTR 0x33 -#define LOCAL_IOCTL_I915_GEM_USERPTR DRM_IOWR (DRM_COMMAND_BASE + LOCAL_I915_GEM_USERPTR, struct local_i915_gem_userptr) -struct local_i915_gem_userptr { - uint64_t user_ptr; - uint64_t user_size; - uint32_t flags; -#define LOCAL_I915_USERPTR_READ_ONLY (1<<0) -#define LOCAL_I915_USERPTR_UNSYNCHRONIZED (1<<31) - uint32_t handle; -}; - static uint32_t userptr_flags = LOCAL_I915_USERPTR_UNSYNCHRONIZED;
#define BO_SIZE (65536) @@ -83,30 +72,6 @@ static void gem_userptr_test_synchronized(void) userptr_flags = 0; }
-static int gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t *handle) -{ - struct local_i915_gem_userptr userptr; - int ret; - - userptr.user_ptr = (uintptr_t)ptr; - userptr.user_size = size; - userptr.flags = userptr_flags; - if (read_only) - userptr.flags |= LOCAL_I915_USERPTR_READ_ONLY; - - ret = drmIoctl(fd, LOCAL_IOCTL_I915_GEM_USERPTR, &userptr); - if (ret) - ret = errno; - igt_skip_on_f(ret == ENODEV && - (userptr_flags & LOCAL_I915_USERPTR_UNSYNCHRONIZED) == 0 && - !read_only, - "Skipping, synchronized mappings with no kernel CONFIG_MMU_NOTIFIER?"); - if (ret == 0) - *handle = userptr.handle; - - return ret; -} - static void **handle_ptr_map; static unsigned int num_handle_ptr_map;
@@ -144,8 +109,7 @@ static uint32_t create_userptr_bo(int fd, int size) ret = posix_memalign(&ptr, PAGE_SIZE, size); igt_assert(ret == 0);
- ret = gem_userptr(fd, (uint32_t *)ptr, size, 0, &handle); - igt_assert(ret == 0); + gem_userptr(fd, (uint32_t *)ptr, size, 0, userptr_flags, &handle); add_handle_ptr(handle, ptr);
return handle; @@ -167,7 +131,7 @@ static int has_userptr(int fd) assert(posix_memalign(&ptr, PAGE_SIZE, PAGE_SIZE) == 0); oldflags = userptr_flags; gem_userptr_test_unsynchronized(); - ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle); + ret = __gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle); userptr_flags = oldflags; if (ret != 0) { free(ptr); @@ -379,9 +343,7 @@ static void test_impact_overlap(int fd, const char *prefix)
for (i = 0, p = block; i < nr_bos[subtest]; i++, p += PAGE_SIZE) - ret = gem_userptr(fd, (uint32_t *)p, BO_SIZE, 0, - &handles[i]); - igt_assert(ret == 0); + gem_userptr(fd, (uint32_t *)p, BO_SIZE, 0, userptr_flags, &handles[i]); }
if (nr_bos[subtest] > 0) @@ -427,7 +389,6 @@ static void test_single(int fd) char *ptr, *bo_ptr; uint32_t handle = 0; unsigned long iter = 0; - int ret; unsigned long map_size = BO_SIZE + PAGE_SIZE - 1;
ptr = mmap(NULL, map_size, PROT_READ | PROT_WRITE, @@ -439,8 +400,7 @@ static void test_single(int fd) start_test(test_duration_sec);
while (run_test) { - ret = gem_userptr(fd, bo_ptr, BO_SIZE, 0, &handle); - assert(ret == 0); + gem_userptr(fd, bo_ptr, BO_SIZE, 0, userptr_flags, &handle); gem_close(fd, handle); iter++; } @@ -456,7 +416,6 @@ static void test_multiple(int fd, unsigned int batch, int random) uint32_t handles[10000]; int map[10000]; unsigned long iter = 0; - int ret; int i; unsigned long map_size = batch * BO_SIZE + PAGE_SIZE - 1;
@@ -478,10 +437,8 @@ static void test_multiple(int fd, unsigned int batch, int random) if (random) igt_permute_array(map, batch, igt_exchange_int); for (i = 0; i < batch; i++) { - ret = gem_userptr(fd, bo_ptr + map[i] * BO_SIZE, - BO_SIZE, - 0, &handles[i]); - assert(ret == 0); + gem_userptr(fd, bo_ptr + map[i] * BO_SIZE, BO_SIZE, + 0, userptr_flags, &handles[i]); } if (random) igt_permute_array(map, batch, igt_exchange_int); diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c index e348f26..6cad8a2 100644 --- a/lib/ioctl_wrappers.c +++ b/lib/ioctl_wrappers.c @@ -871,6 +871,47 @@ void gem_context_require_ban_period(int fd) igt_require(has_ban_period); }
+int __gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t flags, uint32_t *handle) +{ + struct local_i915_gem_userptr userptr; + int ret; + + memset(&userptr, 0, sizeof(userptr)); + userptr.user_ptr = (uintptr_t)ptr; + userptr.user_size = size; + userptr.flags = flags; + if (read_only) + userptr.flags |= LOCAL_I915_USERPTR_READ_ONLY; + + ret = drmIoctl(fd, LOCAL_IOCTL_I915_GEM_USERPTR, &userptr); + if (ret) + ret = errno; + igt_skip_on_f(ret == ENODEV && + (flags & LOCAL_I915_USERPTR_UNSYNCHRONIZED) == 0 && + !read_only, + "Skipping, synchronized mappings with no kernel CONFIG_MMU_NOTIFIER?"); + if (ret == 0) + *handle = userptr.handle; + + return ret; +} + +/** + * gem_userptr: + * @fd: open i915 drm file descriptor + * @ptr: userptr pointer to be passed + * @size: desired size of the buffer + * @read_only: specify whether userptr is opened read only + * @flags: other userptr flags + * @handle: returned handle for the object + * + * Returns userptr handle for the GEM object. + */ +void gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t flags, uint32_t *handle) +{ + igt_assert_eq(__gem_userptr(fd, ptr, size, read_only, flags, handle), 0); +} + /** * gem_sw_finish: * @fd: open i915 drm file descriptor diff --git a/lib/ioctl_wrappers.h b/lib/ioctl_wrappers.h index fe2f687..bb8a858 100644 --- a/lib/ioctl_wrappers.h +++ b/lib/ioctl_wrappers.h @@ -112,6 +112,19 @@ void gem_context_require_param(int fd, uint64_t param); void gem_context_get_param(int fd, struct local_i915_gem_context_param *p); void gem_context_set_param(int fd, struct local_i915_gem_context_param *p);
+#define LOCAL_I915_GEM_USERPTR 0x33 +#define LOCAL_IOCTL_I915_GEM_USERPTR DRM_IOWR (DRM_COMMAND_BASE + LOCAL_I915_GEM_USERPTR, struct local_i915_gem_userptr) +struct local_i915_gem_userptr { + uint64_t user_ptr; + uint64_t user_size; + uint32_t flags; +#define LOCAL_I915_USERPTR_READ_ONLY (1<<0) +#define LOCAL_I915_USERPTR_UNSYNCHRONIZED (1<<31) + uint32_t handle; +}; +void gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t flags, uint32_t *handle); +int __gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t flags, uint32_t *handle); + void gem_sw_finish(int fd, uint32_t handle);
bool gem_bo_busy(int fd, uint32_t handle); diff --git a/tests/gem_userptr_blits.c b/tests/gem_userptr_blits.c index 6d38260..95d7ca2 100644 --- a/tests/gem_userptr_blits.c +++ b/tests/gem_userptr_blits.c @@ -61,17 +61,6 @@ #define PAGE_SIZE 4096 #endif
-#define LOCAL_I915_GEM_USERPTR 0x33 -#define LOCAL_IOCTL_I915_GEM_USERPTR DRM_IOWR (DRM_COMMAND_BASE + LOCAL_I915_GEM_USERPTR, struct local_i915_gem_userptr) -struct local_i915_gem_userptr { - uint64_t user_ptr; - uint64_t user_size; - uint32_t flags; -#define LOCAL_I915_USERPTR_READ_ONLY (1<<0) -#define LOCAL_I915_USERPTR_UNSYNCHRONIZED (1<<31) - uint32_t handle; -}; - static uint32_t userptr_flags = LOCAL_I915_USERPTR_UNSYNCHRONIZED;
#define WIDTH 512 @@ -89,32 +78,6 @@ static void gem_userptr_test_synchronized(void) userptr_flags = 0; }
-static int gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t *handle) -{ - struct local_i915_gem_userptr userptr; - int ret; - - memset(&userptr, 0, sizeof(userptr)); - userptr.user_ptr = (uintptr_t)ptr; - userptr.user_size = size; - userptr.flags = userptr_flags; - if (read_only) - userptr.flags |= LOCAL_I915_USERPTR_READ_ONLY; - - ret = drmIoctl(fd, LOCAL_IOCTL_I915_GEM_USERPTR, &userptr); - if (ret) - ret = errno; - igt_skip_on_f(ret == ENODEV && - (userptr_flags & LOCAL_I915_USERPTR_UNSYNCHRONIZED) == 0 && - !read_only, - "Skipping, synchronized mappings with no kernel CONFIG_MMU_NOTIFIER?"); - if (ret == 0) - *handle = userptr.handle; - - return ret; -} - - static void gem_userptr_sync(int fd, uint32_t handle) { gem_set_domain(fd, handle, I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_CPU); @@ -289,10 +252,9 @@ static uint32_t create_userptr(int fd, uint32_t val, uint32_t *ptr) { uint32_t handle; - int i, ret; + int i;
- ret = gem_userptr(fd, ptr, sizeof(linear), 0, &handle); - igt_assert_eq(ret, 0); + gem_userptr(fd, ptr, sizeof(linear), 0, userptr_flags, &handle); igt_assert(handle != 0);
/* Fill the BO with dwords starting at val */ @@ -363,7 +325,6 @@ static uint32_t create_userptr_bo(int fd, uint64_t size) { void *ptr; uint32_t handle; - int ret;
ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, @@ -371,8 +332,7 @@ static uint32_t create_userptr_bo(int fd, uint64_t size) -1, 0); igt_assert(ptr != MAP_FAILED);
- ret = gem_userptr(fd, (uint32_t *)ptr, size, 0, &handle); - igt_assert_eq(ret, 0); + gem_userptr(fd, (uint32_t *)ptr, size, 0, userptr_flags, &handle); add_handle_ptr(handle, ptr, size);
return handle; @@ -450,7 +410,7 @@ static int has_userptr(int fd) igt_assert(posix_memalign(&ptr, PAGE_SIZE, PAGE_SIZE) == 0); oldflags = userptr_flags; gem_userptr_test_unsynchronized(); - ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle); + ret = __gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle); userptr_flags = oldflags; if (ret != 0) { free(ptr); @@ -509,7 +469,7 @@ static int test_access_control(int fd)
igt_assert(posix_memalign(&ptr, PAGE_SIZE, PAGE_SIZE) == 0);
- ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle); + ret = __gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle); if (ret == 0) gem_close(fd, handle); free(ptr); @@ -524,11 +484,9 @@ static int test_access_control(int fd) static int test_invalid_null_pointer(int fd) { uint32_t handle; - int ret;
/* NULL pointer. */ - ret = gem_userptr(fd, NULL, PAGE_SIZE, 0, &handle); - igt_assert_eq(ret, 0); + gem_userptr(fd, NULL, PAGE_SIZE, 0, userptr_flags, &handle);
copy(fd, handle, handle, ~0); /* QQQ Precise errno? */ gem_close(fd, handle); @@ -540,7 +498,6 @@ static int test_invalid_gtt_mapping(int fd) { uint32_t handle, handle2; void *ptr; - int ret;
/* GTT mapping */ handle = create_bo(fd, 0); @@ -550,8 +507,7 @@ static int test_invalid_gtt_mapping(int fd) igt_assert(((unsigned long)ptr & (PAGE_SIZE - 1)) == 0); igt_assert((sizeof(linear) & (PAGE_SIZE - 1)) == 0);
- ret = gem_userptr(fd, ptr, sizeof(linear), 0, &handle2); - igt_assert_eq(ret, 0); + gem_userptr(fd, ptr, sizeof(linear), 0, userptr_flags, &handle2); copy(fd, handle2, handle2, ~0); /* QQQ Precise errno? */ gem_close(fd, handle2);
@@ -594,8 +550,7 @@ static void test_forked_access(int fd) #ifdef MADV_DONTFORK ret |= madvise(ptr1, sizeof(linear), MADV_DONTFORK); #endif - ret |= gem_userptr(fd, ptr1, sizeof(linear), 0, &handle1); - igt_assert_eq(ret, 0); + gem_userptr(fd, ptr1, sizeof(linear), 0, userptr_flags, &handle1); igt_assert(ptr1); igt_assert(handle1);
@@ -603,8 +558,7 @@ static void test_forked_access(int fd) #ifdef MADV_DONTFORK ret |= madvise(ptr2, sizeof(linear), MADV_DONTFORK); #endif - ret |= gem_userptr(fd, ptr2, sizeof(linear), 0, &handle2); - igt_assert_eq(ret, 0); + gem_userptr(fd, ptr2, sizeof(linear), 0, userptr_flags, &handle2); igt_assert(ptr2); igt_assert(handle2);
@@ -651,8 +605,7 @@ static int test_forbidden_ops(int fd)
igt_assert(posix_memalign(&ptr, PAGE_SIZE, PAGE_SIZE) == 0);
- ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle); - igt_assert_eq(ret, 0); + gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle);
/* pread/pwrite are not always forbidden, but when they * are they should fail with EINVAL. @@ -839,19 +792,19 @@ static int test_usage_restrictions(int fd) igt_assert(posix_memalign(&ptr, PAGE_SIZE, PAGE_SIZE * 2) == 0);
/* Address not aligned. */ - ret = gem_userptr(fd, (char *)ptr + 1, PAGE_SIZE, 0, &handle); + ret = __gem_userptr(fd, (char *)ptr + 1, PAGE_SIZE, 0, userptr_flags, &handle); igt_assert_neq(ret, 0);
/* Size not rounded to page size. */ - ret = gem_userptr(fd, ptr, PAGE_SIZE - 1, 0, &handle); + ret = __gem_userptr(fd, ptr, PAGE_SIZE - 1, 0, userptr_flags, &handle); igt_assert_neq(ret, 0);
/* Both wrong. */ - ret = gem_userptr(fd, (char *)ptr + 1, PAGE_SIZE - 1, 0, &handle); + ret = __gem_userptr(fd, (char *)ptr + 1, PAGE_SIZE - 1, 0, userptr_flags, &handle); igt_assert_neq(ret, 0);
/* Read-only not supported. */ - ret = gem_userptr(fd, (char *)ptr, PAGE_SIZE, 1, &handle); + ret = __gem_userptr(fd, (char *)ptr, PAGE_SIZE, 1, userptr_flags, &handle); igt_assert_neq(ret, 0);
free(ptr); @@ -873,7 +826,7 @@ static int test_create_destroy(int fd, int time) for (n = 0; n < 1000; n++) { igt_assert(posix_memalign(&ptr, PAGE_SIZE, PAGE_SIZE) == 0);
- do_or_die(gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle)); + do_or_die(__gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle));
gem_close(fd, handle); free(ptr); @@ -1065,41 +1018,40 @@ static void test_overlap(int fd, int expected)
igt_assert(posix_memalign((void *)&ptr, PAGE_SIZE, PAGE_SIZE * 3) == 0);
- ret = gem_userptr(fd, ptr + PAGE_SIZE, PAGE_SIZE, 0, &handle); - igt_assert_eq(ret, 0); + gem_userptr(fd, ptr + PAGE_SIZE, PAGE_SIZE, 0, userptr_flags, &handle);
/* before, no overlap */ - ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle2); + ret = __gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle2); if (ret == 0) gem_close(fd, handle2); igt_assert_eq(ret, 0);
/* after, no overlap */ - ret = gem_userptr(fd, ptr + PAGE_SIZE * 2, PAGE_SIZE, 0, &handle2); + ret = __gem_userptr(fd, ptr + PAGE_SIZE * 2, PAGE_SIZE, 0, userptr_flags, &handle2); if (ret == 0) gem_close(fd, handle2); igt_assert_eq(ret, 0);
/* exactly overlapping */ - ret = gem_userptr(fd, ptr + PAGE_SIZE, PAGE_SIZE, 0, &handle2); + ret = __gem_userptr(fd, ptr + PAGE_SIZE, PAGE_SIZE, 0, userptr_flags, &handle2); if (ret == 0) gem_close(fd, handle2); igt_assert(ret == 0 || ret == expected);
/* start overlaps */ - ret = gem_userptr(fd, ptr, PAGE_SIZE * 2, 0, &handle2); + ret = __gem_userptr(fd, ptr, PAGE_SIZE * 2, 0, userptr_flags, &handle2); if (ret == 0) gem_close(fd, handle2); igt_assert(ret == 0 || ret == expected);
/* end overlaps */ - ret = gem_userptr(fd, ptr + PAGE_SIZE, PAGE_SIZE * 2, 0, &handle2); + ret = __gem_userptr(fd, ptr + PAGE_SIZE, PAGE_SIZE * 2, 0, userptr_flags, &handle2); if (ret == 0) gem_close(fd, handle2); igt_assert(ret == 0 || ret == expected);
/* subsumes */ - ret = gem_userptr(fd, ptr, PAGE_SIZE * 3, 0, &handle2); + ret = __gem_userptr(fd, ptr, PAGE_SIZE * 3, 0, userptr_flags, &handle2); if (ret == 0) gem_close(fd, handle2); igt_assert(ret == 0 || ret == expected); @@ -1124,8 +1076,7 @@ static void test_unmap(int fd, int expected) bo_ptr = (char *)ALIGN((unsigned long)ptr, PAGE_SIZE);
for (i = 0; i < num_obj; i++, bo_ptr += sizeof(linear)) { - ret = gem_userptr(fd, bo_ptr, sizeof(linear), 0, &bo[i]); - igt_assert_eq(ret, 0); + gem_userptr(fd, bo_ptr, sizeof(linear), 0, userptr_flags, &bo[i]); }
bo[num_obj] = create_bo(fd, 0); @@ -1159,8 +1110,7 @@ static void test_unmap_after_close(int fd) bo_ptr = (char *)ALIGN((unsigned long)ptr, PAGE_SIZE);
for (i = 0; i < num_obj; i++, bo_ptr += sizeof(linear)) { - ret = gem_userptr(fd, bo_ptr, sizeof(linear), 0, &bo[i]); - igt_assert_eq(ret, 0); + gem_userptr(fd, bo_ptr, sizeof(linear), 0, userptr_flags, &bo[i]); }
bo[num_obj] = create_bo(fd, 0); @@ -1230,8 +1180,7 @@ static void test_stress_mm(int fd) igt_assert_eq(ret, 0);
while (loops--) { - ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle); - igt_assert_eq(ret, 0); + gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle);
gem_close(fd, handle); } @@ -1265,8 +1214,7 @@ static void *mm_userptr_close_thread(void *data) while (!t->stop) { pthread_mutex_unlock(&t->mutex); for (int i = 0; i < num_handles; i++) - igt_assert_eq(gem_userptr(t->fd, t->ptr, PAGE_SIZE, 0, &handle[i]), - 0); + gem_userptr(t->fd, t->ptr, PAGE_SIZE, 0, userptr_flags, &handle[i]); for (int i = 0; i < num_handles; i++) gem_close(t->fd, handle[i]); pthread_mutex_lock(&t->mutex);
From: Rob Bradford rob@linux.intel.com
This test has the following subtests:
 - test_correct for correctness of the data
 - test_map_unmap checks for mapping idempotency
 - test_reprime checks for dma-buf creation idempotency
 - test_forked checks for multiprocess access
 - test_refcounting checks for buffer reference counting
 - test_dup checks that dup()ing the fd works
 - test_userptr makes sure mmap fails due to the lack of obj->base.filp in a userptr
 - test_errors checks the error return values for failures
 - test_aperture_limit tests multiple buffer creation at the gtt aperture limit
v2 (Tiago): Removed pattern_check(), which was walking through a useless iterator. Removed superfluous PROT_WRITE from gem_mmap, in test_correct(). Added binary file to .gitignore.
v3 (Tiago): squash patch "prime_mmap: Test for userptr mmap" into this one.
v4 (Tiago): use synchronized userptr for testing. Add test for buffer overlapping.
Signed-off-by: Rob Bradford rob@linux.intel.com Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com --- tests/Makefile.sources | 1 + tests/prime_mmap.c | 417 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 418 insertions(+) create mode 100644 tests/prime_mmap.c
diff --git a/tests/Makefile.sources b/tests/Makefile.sources index d594038..75f3cb0 100644 --- a/tests/Makefile.sources +++ b/tests/Makefile.sources @@ -96,6 +96,7 @@ TESTS_progs_M = \ pm_rps \ pm_rc6_residency \ pm_sseu \ + prime_mmap \ prime_self_import \ template \ $(NULL) diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c new file mode 100644 index 0000000..95304a9 --- /dev/null +++ b/tests/prime_mmap.c @@ -0,0 +1,417 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + * Rob Bradford <rob at linux.intel.com> + * + */ + +/* + * Testcase: Check whether mmap()ing dma-buf works + */ +#define _GNU_SOURCE +#include <unistd.h> +#include <stdlib.h> +#include <stdio.h> +#include <string.h> +#include <fcntl.h> +#include <inttypes.h> +#include <errno.h> +#include <sys/stat.h> +#include <sys/ioctl.h> +#include <pthread.h> + +#include "drm.h" +#include "i915_drm.h" +#include "drmtest.h" +#include "igt_debugfs.h" +#include "ioctl_wrappers.h" + +#define BO_SIZE (16*1024) + +static int fd; + +char pattern[] = {0xff, 0x00, 0x00, 0x00, + 0x00, 0xff, 0x00, 0x00, + 0x00, 0x00, 0xff, 0x00, + 0x00, 0x00, 0x00, 0xff}; + +static void +fill_bo(uint32_t handle, size_t size) +{ + off_t i; + for (i = 0; i < size; i+=sizeof(pattern)) + { + gem_write(fd, handle, i, pattern, sizeof(pattern)); + } +} + +static void +test_correct(void) +{ + int dma_buf_fd; + char *ptr1, *ptr2; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + /* Check correctness vs GEM_MMAP_GTT */ + ptr1 = gem_mmap__gtt(fd, handle, BO_SIZE, PROT_READ); + ptr2 = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr1 != MAP_FAILED); + igt_assert(ptr2 != MAP_FAILED); + igt_assert(memcmp(ptr1, ptr2, BO_SIZE) == 0); + + /* Check pattern correctness */ + igt_assert(memcmp(ptr2, pattern, sizeof(pattern)) == 0); + + munmap(ptr1, BO_SIZE); + munmap(ptr2, BO_SIZE); + close(dma_buf_fd); + gem_close(fd, handle); +} + +static void +test_map_unmap(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + + /* 
Unmap and remap */ + munmap(ptr, BO_SIZE); + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + + munmap(ptr, BO_SIZE); + close(dma_buf_fd); + gem_close(fd, handle); +} + +/* prime and then unprime and then prime again the same handle */ +static void +test_reprime(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + + close (dma_buf_fd); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + munmap(ptr, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + + munmap(ptr, BO_SIZE); + close(dma_buf_fd); + gem_close(fd, handle); +} + +/* map from another process */ +static void +test_forked(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + igt_fork(childno, 1) { + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + munmap(ptr, BO_SIZE); + close(dma_buf_fd); + } + close(dma_buf_fd); + igt_waitchildren(); + gem_close(fd, handle); +} + +static void +test_refcounting(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + /* Close gem object before mapping */ + gem_close(fd, handle); + + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + munmap(ptr, BO_SIZE); + close (dma_buf_fd); +} + +/* dup before mmap */ +static void +test_dup(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = dup(prime_handle_to_fd(fd, handle)); + igt_assert(errno == 0); + + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + munmap(ptr, BO_SIZE); + gem_close(fd, handle); + close (dma_buf_fd); +} + + +/* Used for error case testing to avoid wrapper */ +static int prime_handle_to_fd_no_assert(uint32_t handle, int *fd_out) +{ + struct drm_prime_handle args; + int ret; + + args.handle = handle; + args.flags = DRM_CLOEXEC; + args.fd = -1; + + ret = drmIoctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args); + if (ret) + ret = errno; + *fd_out = args.fd; + + return ret; +} + +/* test for mmap(dma_buf_export(userptr)) */ +static void +test_userptr(void) +{ + int ret, dma_buf_fd; + void *ptr; + uint32_t handle; + + /* create userptr bo */ + ret = posix_memalign(&ptr, 4096, BO_SIZE); + igt_assert_eq(ret, 0); + + /* we are not allowed to export unsynchronized userptr. 
Just create a normal + * one */ + gem_userptr(fd, (uint32_t *)ptr, BO_SIZE, 0, 0, &handle); + + /* export userptr */ + ret = prime_handle_to_fd_no_assert(handle, &dma_buf_fd); + if (ret) { + igt_assert(ret == EINVAL || ret == ENODEV); + goto free_userptr; + } else { + igt_assert_eq(ret, 0); + igt_assert_lte(0, dma_buf_fd); + } + + /* a userptr doesn't have the obj->base.filp, but can be exported via + * dma-buf, so make sure it fails here */ + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr == MAP_FAILED && errno == ENODEV); +free_userptr: + gem_close(fd, handle); + close(dma_buf_fd); +} + +static void +test_errors(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + /* Close gem object before priming */ + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + gem_close(fd, handle); + prime_handle_to_fd_no_assert(handle, &dma_buf_fd); + igt_assert(dma_buf_fd == -1 && errno == ENOENT); + errno = 0; + + /* close fd before mapping */ + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + close(dma_buf_fd); + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr == MAP_FAILED && errno == EBADF); + errno = 0; + gem_close(fd, handle); + + /* Map too big */ + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + ptr = mmap(NULL, BO_SIZE * 2, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr == MAP_FAILED && errno == EINVAL); + errno = 0; + close(dma_buf_fd); + gem_close(fd, handle); + + /* Overlapping the end of the buffer */ + handle = gem_create(fd, BO_SIZE); + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, BO_SIZE / 2); + igt_assert(ptr == MAP_FAILED && errno == EINVAL); + errno = 0; + close(dma_buf_fd); + gem_close(fd, handle); +} + +static void +test_aperture_limit(void) +{ + int dma_buf_fd1, dma_buf_fd2; + char *ptr1, *ptr2; + uint32_t handle1, handle2; + /* Two buffers the sum of which > mappable aperture */ + uint64_t size1 = (gem_mappable_aperture_size() * 7) / 8; + uint64_t size2 = (gem_mappable_aperture_size() * 3) / 8; + + handle1 = gem_create(fd, size1); + fill_bo(handle1, BO_SIZE); + + dma_buf_fd1 = prime_handle_to_fd(fd, handle1); + igt_assert(errno == 0); + ptr1 = mmap(NULL, size1, PROT_READ, MAP_SHARED, dma_buf_fd1, 0); + igt_assert(ptr1 != MAP_FAILED); + igt_assert(memcmp(ptr1, pattern, sizeof(pattern)) == 0); + + handle2 = gem_create(fd, size1); + fill_bo(handle2, BO_SIZE); + dma_buf_fd2 = prime_handle_to_fd(fd, handle2); + igt_assert(errno == 0); + ptr2 = mmap(NULL, size2, PROT_READ, MAP_SHARED, dma_buf_fd2, 0); + igt_assert(ptr2 != MAP_FAILED); + igt_assert(memcmp(ptr2, pattern, sizeof(pattern)) == 0); + + igt_assert(memcmp(ptr1, ptr2, BO_SIZE) == 0); + + munmap(ptr1, size1); + munmap(ptr2, size2); + close(dma_buf_fd1); + close(dma_buf_fd2); + gem_close(fd, handle1); + gem_close(fd, handle2); +} + +static int +check_for_dma_buf_mmap(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + int ret = 1; + + handle = gem_create(fd, BO_SIZE); + dma_buf_fd = prime_handle_to_fd(fd, handle); + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + if (ptr != MAP_FAILED) + ret = 0; + munmap(ptr, BO_SIZE); + gem_close(fd, handle); + close(dma_buf_fd); + return ret; +} + +igt_main +{ + struct { + const char *name; + void (*fn)(void); + 
} tests[] = { + { "test_correct", test_correct }, + { "test_map_unmap", test_map_unmap }, + { "test_reprime", test_reprime }, + { "test_forked", test_forked }, + { "test_refcounting", test_refcounting }, + { "test_dup", test_dup }, + { "test_userptr", test_userptr }, + { "test_errors", test_errors }, + { "test_aperture_limit", test_aperture_limit }, + }; + int i; + + igt_fixture { + fd = drm_open_driver(DRIVER_INTEL); + errno = 0; + } + + igt_skip_on((check_for_dma_buf_mmap() != 0)); + + for (i = 0; i < ARRAY_SIZE(tests); i++) { + igt_subtest(tests[i].name) + tests[i].fn(); + } + + igt_fixture + close(fd); +}
This patch adds test_correct_cpu_write, which maps the texture buffer through a prime fd and then writes directly to it using the CPU. It stresses the driver to guarantee cache synchronization among the different domains.
This test also adds test_forked_cpu_write, which creates the GEM bo in one process and passes its prime handle to another process, which in turn uses the handle only to map and write. Roughly speaking this test simulates the Chrome OS architecture, where the Web content ("unprivileged process") maps and CPU-draws a buffer that was previously allocated in the GPU process ("privileged process").
This requires kernel modifications (Daniel Thompson's "drm: prime: Honour O_RDWR during prime-handle-to-fd"), and therefore prime_handle_to_fd_for_mmap is added, failing when those are missing. Also, upcoming tests (e.g. the next patch) are going to use it as well, so make it public and available in the lib.
v2: adds prime_handle_to_fd_for_mmap for skipping the test in older kernels, and a test for invalid flags.
Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com --- lib/ioctl_wrappers.c | 25 +++++++++++++++ lib/ioctl_wrappers.h | 4 +++ tests/prime_mmap.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++---- 3 files changed, 112 insertions(+), 6 deletions(-)
diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c index 6cad8a2..86a61ba 100644 --- a/lib/ioctl_wrappers.c +++ b/lib/ioctl_wrappers.c @@ -1329,6 +1329,31 @@ int prime_handle_to_fd(int fd, uint32_t handle) }
/** + * prime_handle_to_fd_for_mmap: + * @fd: open i915 drm file descriptor + * @handle: file-private gem buffer object handle + * + * Same as prime_handle_to_fd above but with DRM_RDWR capabilities, which can + * be useful for writing into the mmap'ed dma-buf file-descriptor. + * + * Returns: The created dma-buf fd handle or -1 if the ioctl fails. + */ +int prime_handle_to_fd_for_mmap(int fd, uint32_t handle) +{ + struct drm_prime_handle args; + + memset(&args, 0, sizeof(args)); + args.handle = handle; + args.flags = DRM_CLOEXEC | DRM_RDWR; + args.fd = -1; + + if (drmIoctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args) != 0) + return -1; + + return args.fd; +} + +/** * prime_fd_to_handle: * @fd: open i915 drm file descriptor * @dma_buf_fd: dma-buf fd handle diff --git a/lib/ioctl_wrappers.h b/lib/ioctl_wrappers.h index bb8a858..d3ffba2 100644 --- a/lib/ioctl_wrappers.h +++ b/lib/ioctl_wrappers.h @@ -149,6 +149,10 @@ void gem_require_ring(int fd, int ring_id);
/* prime */ int prime_handle_to_fd(int fd, uint32_t handle); +#ifndef DRM_RDWR +#define DRM_RDWR O_RDWR +#endif +int prime_handle_to_fd_for_mmap(int fd, uint32_t handle); uint32_t prime_fd_to_handle(int fd, int dma_buf_fd); off_t prime_get_size(int dma_buf_fd);
diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c index 95304a9..269ada6 100644 --- a/tests/prime_mmap.c +++ b/tests/prime_mmap.c @@ -22,6 +22,7 @@ * * Authors: * Rob Bradford <rob at linux.intel.com> + * Tiago Vignatti <tiago.vignatti at intel.com> * */
@@ -66,6 +67,12 @@ fill_bo(uint32_t handle, size_t size) }
static void +fill_bo_cpu(char *ptr) +{ + memcpy(ptr, pattern, sizeof(pattern)); +} + +static void test_correct(void) { int dma_buf_fd; @@ -180,6 +187,65 @@ test_forked(void) gem_close(fd, handle); }
+/* test simple CPU write */ +static void +test_correct_cpu_write(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd_for_mmap(fd, handle); + + /* Skip if DRM_RDWR is not supported */ + igt_skip_on(errno == EINVAL); + + /* Check correctness of map using write protection (PROT_WRITE) */ + ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + + /* Fill bo using CPU */ + fill_bo_cpu(ptr); + + /* Check pattern correctness */ + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + + munmap(ptr, BO_SIZE); + close(dma_buf_fd); + gem_close(fd, handle); +} + +/* map from another process and then write using CPU */ +static void +test_forked_cpu_write(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd_for_mmap(fd, handle); + + /* Skip if DRM_RDWR is not supported */ + igt_skip_on(errno == EINVAL); + + igt_fork(childno, 1) { + ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE , MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + fill_bo_cpu(ptr); + + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + munmap(ptr, BO_SIZE); + close(dma_buf_fd); + } + close(dma_buf_fd); + igt_waitchildren(); + gem_close(fd, handle); +} + static void test_refcounting(void) { @@ -224,15 +290,14 @@ test_dup(void) close (dma_buf_fd); }
- /* Used for error case testing to avoid wrapper */ -static int prime_handle_to_fd_no_assert(uint32_t handle, int *fd_out) +static int prime_handle_to_fd_no_assert(uint32_t handle, int flags, int *fd_out) { struct drm_prime_handle args; int ret;
args.handle = handle; - args.flags = DRM_CLOEXEC; + args.flags = flags; args.fd = -1;
ret = drmIoctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args); @@ -260,7 +325,7 @@ test_userptr(void) gem_userptr(fd, (uint32_t *)ptr, BO_SIZE, 0, 0, &handle);
/* export userptr */ - ret = prime_handle_to_fd_no_assert(handle, &dma_buf_fd); + ret = prime_handle_to_fd_no_assert(handle, DRM_CLOEXEC, &dma_buf_fd); if (ret) { igt_assert(ret == EINVAL || ret == ENODEV); goto free_userptr; @@ -281,15 +346,25 @@ free_userptr: static void test_errors(void) { - int dma_buf_fd; + int i, dma_buf_fd; char *ptr; uint32_t handle; + int invalid_flags[] = {DRM_CLOEXEC - 1, DRM_CLOEXEC + 1, + DRM_RDWR - 1, DRM_RDWR + 1}; + + /* Test for invalid flags */ + handle = gem_create(fd, BO_SIZE); + for (i = 0; i < sizeof(invalid_flags) / sizeof(invalid_flags[0]); i++) { + prime_handle_to_fd_no_assert(handle, invalid_flags[i], &dma_buf_fd); + igt_assert_eq(errno, EINVAL); + errno = 0; + }
/* Close gem object before priming */ handle = gem_create(fd, BO_SIZE); fill_bo(handle, BO_SIZE); gem_close(fd, handle); - prime_handle_to_fd_no_assert(handle, &dma_buf_fd); + prime_handle_to_fd_no_assert(handle, DRM_CLOEXEC, &dma_buf_fd); igt_assert(dma_buf_fd == -1 && errno == ENOENT); errno = 0;
@@ -392,6 +467,8 @@ igt_main { "test_map_unmap", test_map_unmap }, { "test_reprime", test_reprime }, { "test_forked", test_forked }, + { "test_correct_cpu_write", test_correct_cpu_write }, + { "test_forked_cpu_write", test_forked_cpu_write }, { "test_refcounting", test_refcounting }, { "test_dup", test_dup }, { "test_userptr", test_userptr },
This patch adds dma-buf mmap synchronization ioctls that can be used by tests for cache coherency management e.g. when CPU and GPU domains are being accessed through dma-buf at the same time.
Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com --- lib/ioctl_wrappers.c | 26 ++++++++++++++++++++++++++ lib/ioctl_wrappers.h | 15 +++++++++++++++ 2 files changed, 41 insertions(+)
diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c index 86a61ba..0d84d00 100644 --- a/lib/ioctl_wrappers.c +++ b/lib/ioctl_wrappers.c @@ -1400,6 +1400,32 @@ off_t prime_get_size(int dma_buf_fd) }
/** + * prime_sync_start + * @dma_buf_fd: dma-buf fd handle + */ +void prime_sync_start(int dma_buf_fd) +{ + struct local_dma_buf_sync sync_start; + + memset(&sync_start, 0, sizeof(sync_start)); + sync_start.flags = LOCAL_DMA_BUF_SYNC_START | LOCAL_DMA_BUF_SYNC_RW; + do_ioctl(dma_buf_fd, LOCAL_DMA_BUF_IOCTL_SYNC, &sync_start); +} + +/** + * prime_sync_end + * @dma_buf_fd: dma-buf fd handle + */ +void prime_sync_end(int dma_buf_fd) +{ + struct local_dma_buf_sync sync_end; + + memset(&sync_end, 0, sizeof(sync_end)); + sync_end.flags = LOCAL_DMA_BUF_SYNC_END | LOCAL_DMA_BUF_SYNC_RW; + do_ioctl(dma_buf_fd, LOCAL_DMA_BUF_IOCTL_SYNC, &sync_end); +} + +/** * igt_require_fb_modifiers: * @fd: Open DRM file descriptor. * diff --git a/lib/ioctl_wrappers.h b/lib/ioctl_wrappers.h index d3ffba2..cbd7a73 100644 --- a/lib/ioctl_wrappers.h +++ b/lib/ioctl_wrappers.h @@ -148,6 +148,19 @@ void gem_require_caching(int fd); void gem_require_ring(int fd, int ring_id);
/* prime */ +struct local_dma_buf_sync { + uint64_t flags; +}; + +#define LOCAL_DMA_BUF_SYNC_RW (3 << 0) +#define LOCAL_DMA_BUF_SYNC_START (0 << 2) +#define LOCAL_DMA_BUF_SYNC_END (1 << 2) +#define DMA_BUF_SYNC_VALID_FLAGS_MASK \ + (DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END) + +#define LOCAL_DMA_BUF_BASE 'b' +#define LOCAL_DMA_BUF_IOCTL_SYNC _IOW(LOCAL_DMA_BUF_BASE, 0, struct local_dma_buf_sync) + int prime_handle_to_fd(int fd, uint32_t handle); #ifndef DRM_RDWR #define DRM_RDWR O_RDWR @@ -155,6 +168,8 @@ int prime_handle_to_fd(int fd, uint32_t handle); int prime_handle_to_fd_for_mmap(int fd, uint32_t handle); uint32_t prime_fd_to_handle(int fd, int dma_buf_fd); off_t prime_get_size(int dma_buf_fd); +void prime_sync_start(int dma_buf_fd); +void prime_sync_end(int dma_buf_fd);
/* addfb2 fb modifiers */ struct local_drm_mode_fb_cmd2 {
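For reference, the intended call pattern for these helpers in a test body looks roughly as follows (a sketch only: fd, handle and BO_SIZE come from the surrounding test, and prime_handle_to_fd_for_mmap is the helper added earlier in this series):

	int dma_buf_fd = prime_handle_to_fd_for_mmap(fd, handle);
	char *ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);

	igt_assert(ptr != MAP_FAILED);

	prime_sync_start(dma_buf_fd);	/* begin_cpu_access in the exporter */
	memset(ptr, 0xff, BO_SIZE);	/* CPU writes through the mapping */
	prime_sync_end(dma_buf_fd);	/* end_cpu_access, flush for GPU/scanout */

	munmap(ptr, BO_SIZE);
	close(dma_buf_fd);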
On Wed, Dec 16, 2015 at 08:25:41PM -0200, Tiago Vignatti wrote:
This patch adds dma-buf mmap synchronization ioctls that can be used by tests for cache coherency management e.g. when CPU and GPU domains are being accessed through dma-buf at the same time.
Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
lib/ioctl_wrappers.c | 26 ++++++++++++++++++++++++++ lib/ioctl_wrappers.h | 15 +++++++++++++++ 2 files changed, 41 insertions(+)
diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c index 86a61ba..0d84d00 100644 --- a/lib/ioctl_wrappers.c +++ b/lib/ioctl_wrappers.c @@ -1400,6 +1400,32 @@ off_t prime_get_size(int dma_buf_fd) }
/**
- prime_sync_start
- @dma_buf_fd: dma-buf fd handle
- */
+void prime_sync_start(int dma_buf_fd) +{
- struct local_dma_buf_sync sync_start;
- memset(&sync_start, 0, sizeof(sync_start));
- sync_start.flags = LOCAL_DMA_BUF_SYNC_START | LOCAL_DMA_BUF_SYNC_RW;
- do_ioctl(dma_buf_fd, LOCAL_DMA_BUF_IOCTL_SYNC, &sync_start);
+}
+/**
- prime_sync_end
- @dma_buf_fd: dma-buf fd handle
- */
+void prime_sync_end(int dma_buf_fd) +{
- struct local_dma_buf_sync sync_end;
- memset(&sync_end, 0, sizeof(sync_end));
- sync_end.flags = LOCAL_DMA_BUF_SYNC_END | LOCAL_DMA_BUF_SYNC_RW;
- do_ioctl(dma_buf_fd, LOCAL_DMA_BUF_IOCTL_SYNC, &sync_end);
+}
+/**
- igt_require_fb_modifiers:
- @fd: Open DRM file descriptor.
diff --git a/lib/ioctl_wrappers.h b/lib/ioctl_wrappers.h index d3ffba2..cbd7a73 100644 --- a/lib/ioctl_wrappers.h +++ b/lib/ioctl_wrappers.h @@ -148,6 +148,19 @@ void gem_require_caching(int fd); void gem_require_ring(int fd, int ring_id);
/* prime */ +struct local_dma_buf_sync {
- uint64_t flags;
+};
+#define LOCAL_DMA_BUF_SYNC_RW (3 << 0) +#define LOCAL_DMA_BUF_SYNC_START (0 << 2) +#define LOCAL_DMA_BUF_SYNC_END (1 << 2) +#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
(DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)
What seems to be missing is negative ioctl tests, i.e. making sure invalid flags and the like get properly rejected. -Daniel
+#define LOCAL_DMA_BUF_BASE 'b' +#define LOCAL_DMA_BUF_IOCTL_SYNC _IOW(LOCAL_DMA_BUF_BASE, 0, struct local_dma_buf_sync)
int prime_handle_to_fd(int fd, uint32_t handle); #ifndef DRM_RDWR #define DRM_RDWR O_RDWR @@ -155,6 +168,8 @@ int prime_handle_to_fd(int fd, uint32_t handle); int prime_handle_to_fd_for_mmap(int fd, uint32_t handle); uint32_t prime_fd_to_handle(int fd, int dma_buf_fd); off_t prime_get_size(int dma_buf_fd); +void prime_sync_start(int dma_buf_fd); +void prime_sync_end(int dma_buf_fd);
/* addfb2 fb modifiers */ struct local_drm_mode_fb_cmd2 { -- 2.1.4
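Following up on the negative-test request above, such a check could look roughly like this (a sketch only: the subtest name and the chosen invalid values are made up, and it assumes the kernel rejects flag bits outside its valid mask with -EINVAL):

static void test_invalid_sync_flags(int dma_buf_fd)
{
	struct local_dma_buf_sync sync;
	uint64_t invalid[] = { LOCAL_DMA_BUF_SYNC_END << 1, 1ULL << 33, ~0ULL };
	int i;

	for (i = 0; i < ARRAY_SIZE(invalid); i++) {
		memset(&sync, 0, sizeof(sync));
		sync.flags = invalid[i];
		/* the ioctl must fail and report EINVAL for unknown flag bits */
		igt_assert(drmIoctl(dma_buf_fd, LOCAL_DMA_BUF_IOCTL_SYNC, &sync) == -1);
		igt_assert_eq(errno, EINVAL);
	}
}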
This program can be used to detect when CPU writes to the dma-buf mapped object don't land in the scanout due to cache incoherency.
Although this seems to be a problem inherent to non-LLC machines ("Atom"), this particular test catches cache dirt on scanout on LLC machines as well. It's inspired by Ville's kms_pwrite_crc.c and can also be used to test the correctness of the driver's begin_cpu_access and end_cpu_access (which requires the i915 implementation).
To see the need for the flush, one has to run this same binary a few times because it's not 100% reproducible -- what I usually do is the following, using the '-n' option to not call the sync ioctls:
$ while ((1)) ; do ./kms_mmap_write_crc -n; done # in terminal A $ find / # in terminal B
That will most likely thrash the memory while the test catches the coherency issue. If you now drop '-n', then things should just work as expected.
I tested this with !llc and llc platforms, BYT and IVY respectively.
v2: use prime_handle_to_fd_for_mmap instead.
v3: merge end_cpu_access() patch with this and provide options to disable sync.
v4: use library's prime_sync_{start,end} instead.
Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com --- tests/Makefile.sources | 1 + tests/kms_mmap_write_crc.c | 281 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 282 insertions(+) create mode 100644 tests/kms_mmap_write_crc.c
diff --git a/tests/Makefile.sources b/tests/Makefile.sources index 75f3cb0..ad2dd6a 100644 --- a/tests/Makefile.sources +++ b/tests/Makefile.sources @@ -168,6 +168,7 @@ TESTS_progs = \ kms_3d \ kms_fence_pin_leak \ kms_force_connector_basic \ + kms_mmap_write_crc \ kms_pwrite_crc \ kms_sink_crc_basic \ prime_udl \ diff --git a/tests/kms_mmap_write_crc.c b/tests/kms_mmap_write_crc.c new file mode 100644 index 0000000..6a12539 --- /dev/null +++ b/tests/kms_mmap_write_crc.c @@ -0,0 +1,281 @@ +/* + * Copyright © 2015 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + * Tiago Vignatti <tiago.vignatti at intel.com> + */ + +#include <errno.h> +#include <limits.h> +#include <stdbool.h> +#include <stdio.h> +#include <string.h> + +#include "drmtest.h" +#include "igt_debugfs.h" +#include "igt_kms.h" +#include "intel_chipset.h" +#include "ioctl_wrappers.h" +#include "igt_aux.h" + +IGT_TEST_DESCRIPTION( + "Use the display CRC support to validate mmap write to an already uncached future scanout buffer."); + +typedef struct { + int drm_fd; + igt_display_t display; + struct igt_fb fb[2]; + igt_output_t *output; + igt_plane_t *primary; + enum pipe pipe; + igt_crc_t ref_crc; + igt_pipe_crc_t *pipe_crc; + uint32_t devid; +} data_t; + +static int ioctl_sync = true; +int dma_buf_fd; + +static char *dmabuf_mmap_framebuffer(int drm_fd, struct igt_fb *fb) +{ + char *ptr = NULL; + + dma_buf_fd = prime_handle_to_fd_for_mmap(drm_fd, fb->gem_handle); + igt_skip_on(dma_buf_fd == -1 && errno == EINVAL); + + ptr = mmap(NULL, fb->size, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + + return ptr; +} + +static void test(data_t *data) +{ + igt_display_t *display = &data->display; + igt_output_t *output = data->output; + struct igt_fb *fb = &data->fb[1]; + drmModeModeInfo *mode; + cairo_t *cr; + char *ptr; + uint32_t caching; + void *buf; + igt_crc_t crc; + + mode = igt_output_get_mode(output); + + /* create a non-white fb where we can write later */ + igt_create_fb(data->drm_fd, mode->hdisplay, mode->vdisplay, + DRM_FORMAT_XRGB8888, LOCAL_DRM_FORMAT_MOD_NONE, fb); + + ptr = dmabuf_mmap_framebuffer(data->drm_fd, fb); + + cr = igt_get_cairo_ctx(data->drm_fd, fb); + igt_paint_test_pattern(cr, fb->width, fb->height); + cairo_destroy(cr); + + /* flip to it to make it UC/WC and fully flushed */ + igt_plane_set_fb(data->primary, fb); + igt_display_commit(display); + + /* flip back the original white 
buffer */
+	igt_plane_set_fb(data->primary, &data->fb[0]);
+	igt_display_commit(display);
+
+	/* make sure caching mode has become UC/WT */
+	caching = gem_get_caching(data->drm_fd, fb->gem_handle);
+	igt_assert(caching == I915_CACHING_NONE || caching == I915_CACHING_DISPLAY);
+
+	/*
+	 * firstly demonstrate the need for DMA_BUF_SYNC_START ("begin_cpu_access")
+	 */
+	if (ioctl_sync)
+		prime_sync_start(dma_buf_fd);
+
+	/* use dmabuf pointer to make the other fb all white too */
+	buf = malloc(fb->size);
+	igt_assert(buf != NULL);
+	memset(buf, 0xff, fb->size);
+	memcpy(ptr, buf, fb->size);
+	free(buf);
+
+	/* and flip to it */
+	igt_plane_set_fb(data->primary, fb);
+	igt_display_commit(display);
+
+	/* check that the crc is as expected, which requires that caches got flushed */
+	igt_pipe_crc_collect_crc(data->pipe_crc, &crc);
+	igt_assert_crc_equal(&crc, &data->ref_crc);
+
+	/*
+	 * now demonstrate the need for DMA_BUF_SYNC_END ("end_cpu_access")
+	 */
+
+	/* start over, writing non-white to the fb again and flip to it to make it
+	 * fully flushed */
+	cr = igt_get_cairo_ctx(data->drm_fd, fb);
+	igt_paint_test_pattern(cr, fb->width, fb->height);
+	cairo_destroy(cr);
+
+	igt_plane_set_fb(data->primary, fb);
+	igt_display_commit(display);
+
+	/* sync start, to move to CPU domain */
+	if (ioctl_sync)
+		prime_sync_start(dma_buf_fd);
+
+	/* use dmabuf pointer in the same fb to make it all white */
+	buf = malloc(fb->size);
+	igt_assert(buf != NULL);
+	memset(buf, 0xff, fb->size);
+	memcpy(ptr, buf, fb->size);
+	free(buf);
+
+	/* if we don't change to the GTT domain again, the whites won't get flushed
+	 * and therefore we demonstrate the need for sync end here */
+	if (ioctl_sync)
+		prime_sync_end(dma_buf_fd);
+
+	/* check that the crc is as expected, which requires that caches got flushed */
+	igt_pipe_crc_collect_crc(data->pipe_crc, &crc);
+	igt_assert_crc_equal(&crc, &data->ref_crc);
+}
+
+static bool prepare_crtc(data_t *data)
+{
+	igt_display_t *display = &data->display;
+	igt_output_t *output = data->output;
+	drmModeModeInfo *mode;
+
+	/* select the pipe we want to use */
+	igt_output_set_pipe(output, data->pipe);
+	igt_display_commit(display);
+
+	if (!output->valid) {
+		igt_output_set_pipe(output, PIPE_ANY);
+		igt_display_commit(display);
+		return false;
+	}
+
+	mode = igt_output_get_mode(output);
+
+	/* create a white reference fb and flip to it */
+	igt_create_color_fb(data->drm_fd, mode->hdisplay, mode->vdisplay,
+			    DRM_FORMAT_XRGB8888, LOCAL_DRM_FORMAT_MOD_NONE,
+			    1.0, 1.0, 1.0, &data->fb[0]);
+
+	data->primary = igt_output_get_plane(output, IGT_PLANE_PRIMARY);
+
+	igt_plane_set_fb(data->primary, &data->fb[0]);
+	igt_display_commit(display);
+
+	if (data->pipe_crc)
+		igt_pipe_crc_free(data->pipe_crc);
+
+	data->pipe_crc = igt_pipe_crc_new(data->pipe,
+					  INTEL_PIPE_CRC_SOURCE_AUTO);
+
+	/* get reference crc for the white fb */
+	igt_pipe_crc_collect_crc(data->pipe_crc, &data->ref_crc);
+
+	return true;
+}
+
+static void cleanup_crtc(data_t *data)
+{
+	igt_display_t *display = &data->display;
+	igt_output_t *output = data->output;
+
+	igt_pipe_crc_free(data->pipe_crc);
+	data->pipe_crc = NULL;
+
+	igt_plane_set_fb(data->primary, NULL);
+
+	igt_output_set_pipe(output, PIPE_ANY);
+	igt_display_commit(display);
+
+	igt_remove_fb(data->drm_fd, &data->fb[0]);
+	igt_remove_fb(data->drm_fd, &data->fb[1]);
+}
+
+static void run_test(data_t *data)
+{
+	igt_display_t *display = &data->display;
+	igt_output_t *output;
+	enum pipe pipe;
+
+	for_each_connected_output(display, output) {
+		data->output = output;
+		for_each_pipe(display, pipe) {
+			data->pipe = pipe;
+
+			if (!prepare_crtc(data))
+				continue;
+
+			test(data);
+			cleanup_crtc(data);
+
+			/* once is enough */
+			return;
+		}
+	}
+
+	igt_skip("no valid crtc/connector combinations found\n");
+}
+
+static int opt_handler(int opt, int opt_index, void *data)
+{
+	if (opt == 'n') {
+		ioctl_sync = false;
+		igt_info("set via cmd line to not use sync ioctls\n");
+	}
+
+	return 0;
+}
+
+static data_t data;
+
+int main(int argc, char **argv)
+{
+	igt_simple_init_parse_opts(&argc, argv, "n", NULL, NULL, opt_handler, NULL);
+
+	igt_skip_on_simulation();
+
+	igt_fixture {
+		data.drm_fd = drm_open_driver_master(DRIVER_INTEL);
+
+		data.devid = intel_get_drm_devid(data.drm_fd);
+
+		kmstest_set_vt_graphics_mode();
+
+		igt_require_pipe_crc();
+
+		igt_display_init(&data.display, data.drm_fd);
+	}
+
+	run_test(&data);
+
+	igt_fixture {
+		igt_display_fini(&data.display);
+	}
+
+	igt_exit();
+}
On Wed, Dec 16, 2015 at 08:25:42PM -0200, Tiago Vignatti wrote:
This program can be used to detect when CPU writes to the dma-buf mapped object don't land in the scanout due to cache incoherency.
Although this seems to be a problem inherent to non-LLC machines ("Atom"), this particular test catches cache dirt on scanout on LLC machines as well. It's inspired by Ville's kms_pwrite_crc.c and can also be used to test the correctness of the driver's begin_cpu_access and end_cpu_access (which requires the i915 implementation).
To see the need for the flush, one has to run this same binary a few times because it's not 100% reproducible -- what I usually do is the following, using the '-n' option to skip the sync ioctls:
$ while ((1)) ; do ./kms_mmap_write_crc -n; done   # in terminal A
$ find /                                           # in terminal B
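For reference, this is roughly the userspace pattern the test exercises, and what the prime_sync_start()/prime_sync_end() helpers wrap. The struct and flag names below follow the dma-buf sync uapi added by this series as it eventually landed, so treat it as a minimal sketch rather than the exact v6 interface:

#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/dma-buf.h>

static void write_mapped_dmabuf(int dma_buf_fd, const void *src, size_t size)
{
	struct dma_buf_sync sync = { .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW };
	void *ptr;

	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);
	/* error handling omitted for brevity */

	/* begin_cpu_access: make the pages coherent before any CPU access */
	ioctl(dma_buf_fd, DMA_BUF_IOCTL_SYNC, &sync);

	memcpy(ptr, src, size);

	/* end_cpu_access: flush the CPU writes so GPU/scanout won't read stale data */
	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
	ioctl(dma_buf_fd, DMA_BUF_IOCTL_SYNC, &sync);

	munmap(ptr, size);
}

Running with '-n' skips exactly these two ioctl calls, which is what makes the CRC mismatch reproducible on incoherent paths.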
Sounds like we need an igt_fork_memhog_helper() and to repeat the test for 20s? until failure. -Chris
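A rough sketch of what that could look like, replacing the single test() pass in run_test(); note that igt_fork_memhog_helper() is only the helper being proposed here (it does not exist yet), and the timed loop is illustrative:

	time_t start = time(NULL);	/* needs <time.h> */

	igt_fork_memhog_helper();	/* hypothetical helper: generates memory pressure */

	do {
		if (!prepare_crtc(data))
			break;
		test(data);		/* the CRC assert fails once a write is missed */
		cleanup_crtc(data);
	} while (time(NULL) - start < 20);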
Different from kms_mmap_write_crc, which captures coherency issues within the scanout-mapped buffer, this one is meant to test dma-buf mmap mostly on !llc platforms and to provoke coherency bugs so we know where we need the sync ioctls.
I tested this on !llc and llc platforms, BYT and IVB respectively.
Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
---
 tests/Makefile.sources       |   1 +
 tests/prime_mmap_coherency.c | 246 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 247 insertions(+)
 create mode 100644 tests/prime_mmap_coherency.c

diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index ad2dd6a..78605c6 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -97,6 +97,7 @@ TESTS_progs_M = \
 	pm_rc6_residency \
 	pm_sseu \
 	prime_mmap \
+	prime_mmap_coherency \
 	prime_self_import \
 	template \
 	$(NULL)
diff --git a/tests/prime_mmap_coherency.c b/tests/prime_mmap_coherency.c
new file mode 100644
index 0000000..a9a2664
--- /dev/null
+++ b/tests/prime_mmap_coherency.c
@@ -0,0 +1,246 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Tiago Vignatti
+ */
+
+/** @file prime_mmap_coherency.c
+ *
+ * TODO: need to show the need for prime_sync_end().
+ */
+
+#include "igt.h"
+
+IGT_TEST_DESCRIPTION("Test dma-buf mmap on !llc platforms mostly and provoke"
+		" coherency bugs so we know for sure where we need the sync ioctls.");
+
+#define ROUNDS 20
+
+int fd;
+int stale = 0;
+static drm_intel_bufmgr *bufmgr;
+struct intel_batchbuffer *batch;
+static int width = 1024, height = 1024;
+
+/*
+ * Exercises the need for read flush:
+ * 1. create a BO and write '0's, in GTT domain.
+ * 2. read BO using the dma-buf CPU mmap.
+ * 3. write '1's, in GTT domain.
+ * 4. read again through the mapped dma-buf.
+ */
+static void test_read_flush(bool expect_stale_cache)
+{
+	drm_intel_bo *bo_1;
+	drm_intel_bo *bo_2;
+	uint32_t *ptr_cpu;
+	uint32_t *ptr_gtt;
+	int dma_buf_fd, i;
+
+	if (expect_stale_cache)
+		igt_require(!gem_has_llc(fd));
+
+	bo_1 = drm_intel_bo_alloc(bufmgr, "BO 1", width * height * 4, 4096);
+
+	/* STEP #1: put the BO 1 in GTT domain. We use the blitter to copy and fill
+	 * zeros to BO 1, so commands will be submitted and likely to place BO 1 in
+	 * the GTT domain. */
+	bo_2 = drm_intel_bo_alloc(bufmgr, "BO 2", width * height * 4, 4096);
+	intel_copy_bo(batch, bo_1, bo_2, width * height);
+	gem_sync(fd, bo_1->handle);
+	drm_intel_bo_unreference(bo_2);
+
+	/* STEP #2: read BO 1 using the dma-buf CPU mmap. This dirties the CPU caches. */
+	dma_buf_fd = prime_handle_to_fd_for_mmap(fd, bo_1->handle);
+	igt_skip_on(errno == EINVAL);
+
+	ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+		       MAP_SHARED, dma_buf_fd, 0);
+	igt_assert(ptr_cpu != MAP_FAILED);
+
+	for (i = 0; i < (width * height) / 4; i++)
+		igt_assert_eq(ptr_cpu[i], 0);
+
+	/* STEP #3: write 0x11 into BO 1. */
+	bo_2 = drm_intel_bo_alloc(bufmgr, "BO 2", width * height * 4, 4096);
+	ptr_gtt = gem_mmap__gtt(fd, bo_2->handle, width * height, PROT_READ | PROT_WRITE);
+	memset(ptr_gtt, 0x11, width * height);
+	munmap(ptr_gtt, width * height);
+
+	intel_copy_bo(batch, bo_1, bo_2, width * height);
+	gem_sync(fd, bo_1->handle);
+	drm_intel_bo_unreference(bo_2);
+
+	/* STEP #4: read again using the CPU mmap. Doing #1 before #3 makes sure we
+	 * don't do a full CPU cache flush in step #3 again. That makes sure all the
+	 * stale cachelines from step #2 survive (mostly, a few will be evicted)
+	 * until we try to read them again in step #4. This behavior could be fixed
+	 * by flushing the CPU cache right before accessing the CPU pointer */
+	if (!expect_stale_cache)
+		prime_sync_start(dma_buf_fd);
+
+	for (i = 0; i < (width * height) / 4; i++)
+		if (ptr_cpu[i] != 0x11111111) {
+			igt_warn_on_f(!expect_stale_cache,
+				      "Found 0x%08x at offset 0x%08x\n", ptr_cpu[i], i);
+			stale++;
+		}
+
+	drm_intel_bo_unreference(bo_1);
+	munmap(ptr_cpu, width * height);
+}
+
+/*
+ * Exercises the need for write flush:
+ * 1. create BO 1 and write '0's, in GTT domain.
+ * 2. write '1's into BO 1 using the dma-buf CPU mmap.
+ * 3. copy BO 1 to new BO 2, in GTT domain.
+ * 4. read via dma-buf mmap BO 2.
+ */
+static void test_write_flush(bool expect_stale_cache)
+{
+	drm_intel_bo *bo_1;
+	drm_intel_bo *bo_2;
+	uint32_t *ptr_cpu;
+	uint32_t *ptr2_cpu;
+	int dma_buf_fd, dma_buf2_fd, i;
+
+	if (expect_stale_cache)
+		igt_require(!gem_has_llc(fd));
+
+	bo_1 = drm_intel_bo_alloc(bufmgr, "BO 1", width * height * 4, 4096);
+
+	/* STEP #1: Put the BO 1 in GTT domain. We use the blitter to copy and fill
+	 * zeros to BO 1, so commands will be submitted and likely to place BO 1 in
+	 * the GTT domain. */
+	bo_2 = drm_intel_bo_alloc(bufmgr, "BO 2", width * height * 4, 4096);
+	intel_copy_bo(batch, bo_1, bo_2, width * height);
+	gem_sync(fd, bo_1->handle);
+	drm_intel_bo_unreference(bo_2);
+
+	/* STEP #2: Write '1's into BO 1 using the dma-buf CPU mmap. */
+	dma_buf_fd = prime_handle_to_fd_for_mmap(fd, bo_1->handle);
+	igt_skip_on(errno == EINVAL);
+
+	ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+		       MAP_SHARED, dma_buf_fd, 0);
+	igt_assert(ptr_cpu != MAP_FAILED);
+
+	/* This is the main point of this test: !llc hw requires a cache write
+	 * flush right here (explained in step #4). */
+	if (!expect_stale_cache)
+		prime_sync_start(dma_buf_fd);
+
+	memset(ptr_cpu, 0x11, width * height);
+
+	/* STEP #3: Copy BO 1 into BO 2, using blitter. */
+	bo_2 = drm_intel_bo_alloc(bufmgr, "BO 2", width * height * 4, 4096);
+	intel_copy_bo(batch, bo_2, bo_1, width * height);
+	gem_sync(fd, bo_2->handle);
+
+	/* STEP #4: compare BO 2 against written BO 1. In !llc hardware, there
+	 * should be some cache lines that didn't get flushed out and are still 0,
+	 * requiring cache flush before the write in step 2. */
+	dma_buf2_fd = prime_handle_to_fd_for_mmap(fd, bo_2->handle);
+	igt_skip_on(errno == EINVAL);
+
+	ptr2_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+			MAP_SHARED, dma_buf2_fd, 0);
+	igt_assert(ptr2_cpu != MAP_FAILED);
+
+	for (i = 0; i < (width * height) / 4; i++)
+		if (ptr2_cpu[i] != 0x11111111) {
+			igt_warn_on_f(!expect_stale_cache,
+				      "Found 0x%08x at offset 0x%08x\n", ptr2_cpu[i], i);
+			stale++;
+		}
+
+	drm_intel_bo_unreference(bo_1);
+	drm_intel_bo_unreference(bo_2);
+	munmap(ptr_cpu, width * height);
+}
+
+int main(int argc, char **argv)
+{
+	int i;
+	bool expect_stale_cache;
+	igt_subtest_init(argc, argv);
+
+	igt_fixture {
+		fd = drm_open_driver(DRIVER_INTEL);
+
+		bufmgr = drm_intel_bufmgr_gem_init(fd, 4096);
+		batch = intel_batchbuffer_alloc(bufmgr, intel_get_drm_devid(fd));
+	}
+
+	/* Cache coherency and the eviction are pretty much unpredictable, so
+	 * reproducing boils down to trial and error to hit different scenarios.
+	 * TODO: We may want to improve tests a bit by picking random subranges. */
+	igt_info("%d rounds for each test\n", ROUNDS);
+	igt_subtest("read") {
+		stale = 0;
+		expect_stale_cache = false;
+		igt_info("exercising read flush\n");
+		for (i = 0; i < ROUNDS; i++)
+			test_read_flush(expect_stale_cache);
+		igt_fail_on_f(stale, "num of stale cache lines %d\n", stale);
+	}
+
+	/* Only for !llc platforms */
+	igt_subtest("read-and-fail") {
+		stale = 0;
+		expect_stale_cache = true;
+		igt_info("exercising read flush and expect to fail on !llc\n");
+		for (i = 0; i < ROUNDS; i++)
+			test_read_flush(expect_stale_cache);
+		igt_fail_on_f(!stale, "couldn't find any stale cache lines\n");
+	}
+
+	igt_subtest("write") {
+		stale = 0;
+		expect_stale_cache = false;
+		igt_info("exercising write flush\n");
+		for (i = 0; i < ROUNDS; i++)
+			test_write_flush(expect_stale_cache);
+		igt_fail_on_f(stale, "num of stale cache lines %d\n", stale);
+	}
+
+	/* Only for !llc platforms */
+	igt_subtest("write-and-fail") {
+		stale = 0;
+		expect_stale_cache = true;
+		igt_info("exercising write flush and expect to fail on !llc\n");
+		for (i = 0; i < ROUNDS; i++)
+			test_write_flush(expect_stale_cache);
+		igt_fail_on_f(!stale, "couldn't find any stale cache lines\n");
+	}
+
+	igt_fixture {
+		intel_batchbuffer_free(batch);
+		drm_intel_bufmgr_destroy(bufmgr);
+
+		close(fd);
+	}
+
+	igt_exit();
+}
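Once built, the !llc failure modes can be exercised individually via igt's usual subtest selection (assuming the standard --run-subtest/--list-subtests options of the igt core), e.g.:

$ ./prime_mmap_coherency --run-subtest read-and-fail
$ ./prime_mmap_coherency --run-subtest write-and-fail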
On Wed, Dec 16, 2015 at 08:25:32PM -0200, Tiago Vignatti wrote:
One idea that came up was to force clients to call the sync ioctls after the dma-buf was mmaped. But apparently there's no easy, and performant, way to do so cause seems too costly to go over the page table entry and check the dirty bits. Also, depending on the instructions order sent for the devices, it might be needed a sync call after the mapped region gets accessed as well, to flush all cachelines and make sure for example the GPU domain won't read stale data. So that would make the things even more complicated, if we ever decide to go to this direction of forcing sync ioctls. The alternative therefore is to simply document it very well, strong wording the clients to use the sync ioctl regardless otherwise they will mis-behave. Do we have objections or maybe other wiser ways to circumvent this? I've made similar comments in August and no one has came up with better ideas.
I still think this is as good as it'll get. We can't force userspace to behave without a serious perf hit, and without enforcing it all the time there's not much use in it. Also there's the problem that mmap interfaces in the kernel don't really allow you (at least easily) to intercept mmap access.
It might make sense later on as a debug feature in case you do have a coherency bug and want to know who screwed up. Similar to what exists for the dma api.
Quickly looked through the patches and looks really nice. Especially the test coverage (including frontbuffer coherency checks for i915) is awesome. Imo as soon as we have an ack from chromium upstream that they're ok with this approach and your chromium patches, and after detailed code review is done this can go in. An ack from Thomas Hellstrom on the simplified coherency management interface would be good too.
Thanks, Daniel