Hey back,
Thank you Daniel, Chris, Alex and Thomas for reviewing the last series. I think I've addressed most of the comments now in version 7, including:
- more explicit wording in the doc about sync usage.
- always passing .write = false in i915 end_cpu_access.
- a new sync invalid-flags test (igt).
- in kms_mmap_write_crc, using a CPU hog and several test rounds to catch the sync problems (igt).
Here are the trees:
https://github.com/tiagovignatti/drm-intel/commits/drm-intel-nightly_dma-buf... https://github.com/tiagovignatti/intel-gpu-tools/commits/dma-buf-mmap-v7
Also, the Chrome OS side is in progress. This past week I've mostly been struggling with failed attempts to build it (it boots and then goes to a black screen. Sigh.) and with finding a way to make my funky BayTrail-T laptop with 32-bit UEFI firmware boot up (success with Ubuntu, but no success yet with CrOS). A WIP of the Chromium changes can be seen here anyway:
https://codereview.chromium.org/1262043002/
Happy Holidays!
Tiago
From: Daniel Thompson daniel.thompson@linaro.org
Currently DRM_IOCTL_PRIME_HANDLE_TO_FD rejects all flags except (DRM|O)_CLOEXEC, making it difficult (maybe impossible) for userspace to mmap() the resulting dma-buf even when this is supported by the DRM driver.
It is trivial to relax the restriction and permit read/write access. This is safe because the flags are seldom touched by drm; mostly they are passed verbatim to dma_buf calls.
v3 (Tiago): removed unused flags variable from drm_prime_handle_to_fd_ioctl.
Reviewed-by: Chris Wilson chris@chris-wilson.co.uk
Signed-off-by: Daniel Thompson daniel.thompson@linaro.org
Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
---
 drivers/gpu/drm/drm_prime.c | 10 +++-------
 include/uapi/drm/drm.h      |  1 +
 2 files changed, 4 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 27aa718..df6cdc7 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -329,7 +329,7 @@ static const struct dma_buf_ops drm_gem_prime_dmabuf_ops =  {
  * drm_gem_prime_export - helper library implementation of the export callback
  * @dev: drm_device to export from
  * @obj: GEM object to export
- * @flags: flags like DRM_CLOEXEC
+ * @flags: flags like DRM_CLOEXEC and DRM_RDWR
  *
  * This is the implementation of the gem_prime_export functions for GEM drivers
  * using the PRIME helpers.
@@ -628,7 +628,6 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
 				 struct drm_file *file_priv)
 {
 	struct drm_prime_handle *args = data;
-	uint32_t flags;
 
 	if (!drm_core_check_feature(dev, DRIVER_PRIME))
 		return -EINVAL;
@@ -637,14 +636,11 @@ int drm_prime_handle_to_fd_ioctl(struct drm_device *dev, void *data,
 		return -ENOSYS;
 
 	/* check flags are valid */
-	if (args->flags & ~DRM_CLOEXEC)
+	if (args->flags & ~(DRM_CLOEXEC | DRM_RDWR))
 		return -EINVAL;
 
-	/* we only want to pass DRM_CLOEXEC which is == O_CLOEXEC */
-	flags = args->flags & DRM_CLOEXEC;
-
 	return dev->driver->prime_handle_to_fd(dev, file_priv,
-			args->handle, flags, &args->fd);
+			args->handle, args->flags, &args->fd);
 }
 
 int drm_prime_fd_to_handle_ioctl(struct drm_device *dev, void *data,
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index b4e92eb..a0ebfe7 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -669,6 +669,7 @@ struct drm_set_client_cap {
 	__u64 value;
 };
 
+#define DRM_RDWR O_RDWR
 #define DRM_CLOEXEC O_CLOEXEC
 struct drm_prime_handle {
 	__u32 handle;
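As an illustration of what the relaxed check enables (this sketch is not part of the patch; the helper name, header paths and error handling are assumptions), userspace could export a GEM handle as a writable dma-buf and map it roughly like this:

#include <stddef.h>
#include <stdint.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <drm/drm.h>

/* Hypothetical helper: export 'handle' as a writable dma-buf fd and mmap it.
 * DRM_RDWR in .flags is what this patch newly permits; without it the mmap
 * below could only ask for PROT_READ. Returns the dma-buf fd or -1. */
static int export_and_map(int drm_fd, uint32_t handle, size_t size, void **map)
{
	struct drm_prime_handle args = {
		.handle = handle,
		.flags = DRM_CLOEXEC | DRM_RDWR,
	};

	if (ioctl(drm_fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args))
		return -1;

	*map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, args.fd, 0);
	return *map == MAP_FAILED ? -1 : args.fd;
}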
This patch removes range-based information used for optimizations in begin_cpu_access and end_cpu_access.
We don't have any users or implementations of the range-based flush. The consensus seems to be that if we ever want something like that again (or something even more expressive, using 2D or 3D sub-range regions), we can add it on top of the upcoming dma-buf sync ioctl.
Cc: Sumit Semwal sumit.semwal@linaro.org
Cc: Daniel Vetter daniel.vetter@intel.com
Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
---
 Documentation/dma-buf-sharing.txt         | 19 ++++++++-----------
 drivers/dma-buf/dma-buf.c                 | 13 ++++---------
 drivers/gpu/drm/i915/i915_gem_dmabuf.c    |  2 +-
 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |  4 ++--
 drivers/gpu/drm/udl/udl_fb.c              |  2 --
 drivers/staging/android/ion/ion.c         |  6 ++----
 drivers/staging/android/ion/ion_test.c    |  4 ++--
 include/linux/dma-buf.h                   | 12 +++++-------
 8 files changed, 24 insertions(+), 38 deletions(-)
diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt index 480c8de..4f4a84b 100644 --- a/Documentation/dma-buf-sharing.txt +++ b/Documentation/dma-buf-sharing.txt @@ -257,17 +257,15 @@ Access to a dma_buf from the kernel context involves three steps:
Interface: int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, - size_t start, size_t len, enum dma_data_direction direction)
This allows the exporter to ensure that the memory is actually available for cpu access - the exporter might need to allocate or swap-in and pin the backing storage. The exporter also needs to ensure that cpu access is - coherent for the given range and access direction. The range and access - direction can be used by the exporter to optimize the cache flushing, i.e. - access outside of the range or with a different direction (read instead of - write) might return stale or even bogus data (e.g. when the exporter needs to - copy the data to temporary storage). + coherent for the access direction. The direction can be used by the exporter + to optimize the cache flushing, i.e. access with a different direction (read + instead of write) might return stale or even bogus data (e.g. when the + exporter needs to copy the data to temporary storage).
This step might fail, e.g. in oom conditions.
@@ -322,14 +320,13 @@ Access to a dma_buf from the kernel context involves three steps:
3. Finish access
- When the importer is done accessing the range specified in begin_cpu_access, - it needs to announce this to the exporter (to facilitate cache flushing and - unpinning of any pinned resources). The result of any dma_buf kmap calls - after end_cpu_access is undefined. + When the importer is done accessing the CPU, it needs to announce this to + the exporter (to facilitate cache flushing and unpinning of any pinned + resources). The result of any dma_buf kmap calls after end_cpu_access is + undefined.
Interface: void dma_buf_end_cpu_access(struct dma_buf *dma_buf, - size_t start, size_t len, enum dma_data_direction dir);
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 155c146..b2ac13b 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -539,13 +539,11 @@ EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment); * preparations. Coherency is only guaranteed in the specified range for the * specified access direction. * @dmabuf: [in] buffer to prepare cpu access for. - * @start: [in] start of range for cpu access. - * @len: [in] length of range for cpu access. * @direction: [in] length of range for cpu access. * * Can return negative error values, returns 0 on success. */ -int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len, +int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, enum dma_data_direction direction) { int ret = 0; @@ -554,8 +552,7 @@ int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len, return -EINVAL;
if (dmabuf->ops->begin_cpu_access) - ret = dmabuf->ops->begin_cpu_access(dmabuf, start, - len, direction); + ret = dmabuf->ops->begin_cpu_access(dmabuf, direction);
return ret; } @@ -567,19 +564,17 @@ EXPORT_SYMBOL_GPL(dma_buf_begin_cpu_access); * actions. Coherency is only guaranteed in the specified range for the * specified access direction. * @dmabuf: [in] buffer to complete cpu access for. - * @start: [in] start of range for cpu access. - * @len: [in] length of range for cpu access. * @direction: [in] length of range for cpu access. * * This call must always succeed. */ -void dma_buf_end_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len, +void dma_buf_end_cpu_access(struct dma_buf *dmabuf, enum dma_data_direction direction) { WARN_ON(!dmabuf);
if (dmabuf->ops->end_cpu_access) - dmabuf->ops->end_cpu_access(dmabuf, start, len, direction); + dmabuf->ops->end_cpu_access(dmabuf, direction); } EXPORT_SYMBOL_GPL(dma_buf_end_cpu_access);
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c index e9c2bfd..65ab2bd 100644 --- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c @@ -196,7 +196,7 @@ static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct * return -EINVAL; }
-static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, size_t start, size_t length, enum dma_data_direction direction) +static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction) { struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf); struct drm_device *dev = obj->base.dev; diff --git a/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c b/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c index 27c2976..aebae1c 100644 --- a/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c +++ b/drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c @@ -79,7 +79,7 @@ static void omap_gem_dmabuf_release(struct dma_buf *buffer)
static int omap_gem_dmabuf_begin_cpu_access(struct dma_buf *buffer, - size_t start, size_t len, enum dma_data_direction dir) + enum dma_data_direction dir) { struct drm_gem_object *obj = buffer->priv; struct page **pages; @@ -94,7 +94,7 @@ static int omap_gem_dmabuf_begin_cpu_access(struct dma_buf *buffer, }
static void omap_gem_dmabuf_end_cpu_access(struct dma_buf *buffer, - size_t start, size_t len, enum dma_data_direction dir) + enum dma_data_direction dir) { struct drm_gem_object *obj = buffer->priv; omap_gem_put_pages(obj); diff --git a/drivers/gpu/drm/udl/udl_fb.c b/drivers/gpu/drm/udl/udl_fb.c index 200419d..c427499 100644 --- a/drivers/gpu/drm/udl/udl_fb.c +++ b/drivers/gpu/drm/udl/udl_fb.c @@ -409,7 +409,6 @@ static int udl_user_framebuffer_dirty(struct drm_framebuffer *fb,
if (ufb->obj->base.import_attach) { ret = dma_buf_begin_cpu_access(ufb->obj->base.import_attach->dmabuf, - 0, ufb->obj->base.size, DMA_FROM_DEVICE); if (ret) goto unlock; @@ -425,7 +424,6 @@ static int udl_user_framebuffer_dirty(struct drm_framebuffer *fb,
if (ufb->obj->base.import_attach) { dma_buf_end_cpu_access(ufb->obj->base.import_attach->dmabuf, - 0, ufb->obj->base.size, DMA_FROM_DEVICE); }
diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index e237e9f..0754a37 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c @@ -1057,8 +1057,7 @@ static void ion_dma_buf_kunmap(struct dma_buf *dmabuf, unsigned long offset, { }
-static int ion_dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, - size_t len, +static int ion_dma_buf_begin_cpu_access(struct dma_buf *dmabuf, enum dma_data_direction direction) { struct ion_buffer *buffer = dmabuf->priv; @@ -1076,8 +1075,7 @@ static int ion_dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, return PTR_ERR_OR_ZERO(vaddr); }
-static void ion_dma_buf_end_cpu_access(struct dma_buf *dmabuf, size_t start, - size_t len, +static void ion_dma_buf_end_cpu_access(struct dma_buf *dmabuf, enum dma_data_direction direction) { struct ion_buffer *buffer = dmabuf->priv; diff --git a/drivers/staging/android/ion/ion_test.c b/drivers/staging/android/ion/ion_test.c index b8dcf5a..da34bc12 100644 --- a/drivers/staging/android/ion/ion_test.c +++ b/drivers/staging/android/ion/ion_test.c @@ -109,7 +109,7 @@ static int ion_handle_test_kernel(struct dma_buf *dma_buf, void __user *ptr, if (offset > dma_buf->size || size > dma_buf->size - offset) return -EINVAL;
- ret = dma_buf_begin_cpu_access(dma_buf, offset, size, dir); + ret = dma_buf_begin_cpu_access(dma_buf, dir); if (ret) return ret;
@@ -139,7 +139,7 @@ static int ion_handle_test_kernel(struct dma_buf *dma_buf, void __user *ptr, copy_offset = 0; } err: - dma_buf_end_cpu_access(dma_buf, offset, size, dir); + dma_buf_end_cpu_access(dma_buf, dir); return ret; }
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index f98bd70..532108e 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -54,7 +54,7 @@ struct dma_buf_attachment; * @release: release this buffer; to be called after the last dma_buf_put. * @begin_cpu_access: [optional] called before cpu access to invalidate cpu * caches and allocate backing storage (if not yet done) - * respectively pin the objet into memory. + * respectively pin the object into memory. * @end_cpu_access: [optional] called after cpu access to flush caches. * @kmap_atomic: maps a page from the buffer into kernel address * space, users may not block until the subsequent unmap call. @@ -93,10 +93,8 @@ struct dma_buf_ops { /* after final dma_buf_put() */ void (*release)(struct dma_buf *);
- int (*begin_cpu_access)(struct dma_buf *, size_t, size_t, - enum dma_data_direction); - void (*end_cpu_access)(struct dma_buf *, size_t, size_t, - enum dma_data_direction); + int (*begin_cpu_access)(struct dma_buf *, enum dma_data_direction); + void (*end_cpu_access)(struct dma_buf *, enum dma_data_direction); void *(*kmap_atomic)(struct dma_buf *, unsigned long); void (*kunmap_atomic)(struct dma_buf *, unsigned long, void *); void *(*kmap)(struct dma_buf *, unsigned long); @@ -224,9 +222,9 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *, enum dma_data_direction); void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *, enum dma_data_direction); -int dma_buf_begin_cpu_access(struct dma_buf *dma_buf, size_t start, size_t len, +int dma_buf_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction dir); -void dma_buf_end_cpu_access(struct dma_buf *dma_buf, size_t start, size_t len, +void dma_buf_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction dir); void *dma_buf_kmap_atomic(struct dma_buf *, unsigned long); void dma_buf_kunmap_atomic(struct dma_buf *, unsigned long, void *);
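For illustration only (not part of the patch; the function name is made up and error handling is minimal), a kernel-side importer bracketing CPU access with the simplified, range-less interface would look roughly like this:

#include <linux/dma-buf.h>
#include <linux/dma-direction.h>

/* Hypothetical importer-side sketch: bracket a CPU read of an imported
 * dma-buf with the begin/end calls as they look after this patch. */
static int example_cpu_read(struct dma_buf *dmabuf)
{
	int ret;

	ret = dma_buf_begin_cpu_access(dmabuf, DMA_FROM_DEVICE);
	if (ret)
		return ret;

	/* ... kmap the pages and read the buffer contents here ... */

	dma_buf_end_cpu_access(dmabuf, DMA_FROM_DEVICE);
	return 0;
}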
From: Daniel Vetter daniel.vetter@ffwll.ch
Userspace might need some sort of cache coherency management, e.g. when CPU and GPU domains access the same dma-buf at the same time. To solve this there are begin/end coherency markers, which forward directly to the existing dma-buf device drivers' vfunc hooks. Userspace can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be used as follows:
- mmap the dma-buf fd
- for each drawing/upload cycle on the CPU: 1. SYNC_START ioctl, 2. read/write to the mmap area, 3. SYNC_END ioctl. This can be repeated as often as you want (with the new data being consumed by the GPU or, say, a scanout device)
- munmap once you don't need the buffer any more
v2 (Tiago): Fix header file type names (u64 -> __u64)
v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags in the begin/end dma-buf functions. Check for overflows in start/length.
v4 (Tiago): Use 2D regions for sync.
v5 (Tiago): Forget about 2D regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and remove the range information from struct dma_buf_sync.
v6 (Tiago): Use __u64 structured padded flags instead of an enum. Adjust the documentation about the recommendation on using the sync ioctls.
v7 (Tiago): Alex's nit on the flags definition and more explicit wording in the doc about sync usage.
Cc: Sumit Semwal sumit.semwal@linaro.org
Signed-off-by: Daniel Vetter daniel.vetter@intel.com
Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
---
 Documentation/dma-buf-sharing.txt | 21 ++++++++++++++++++-
 drivers/dma-buf/dma-buf.c         | 43 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/dma-buf.h      | 38 ++++++++++++++++++++++++++++++++++
 3 files changed, 101 insertions(+), 1 deletion(-)
 create mode 100644 include/uapi/linux/dma-buf.h
diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
index 4f4a84b..32ac32e 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -350,7 +350,26 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases:
    handles, too). So it's beneficial to support this in a similar fashion on
    dma-buf to have a good transition path for existing Android userspace.
 
-   No special interfaces, userspace simply calls mmap on the dma-buf fd.
+   No special interfaces, userspace simply calls mmap on the dma-buf fd, making
+   sure that the cache synchronization ioctl (DMA_BUF_IOCTL_SYNC) is *always*
+   used when the access happens. This is discussed next paragraphs.
+
+   Some systems might need some sort of cache coherency management e.g. when
+   CPU and GPU domains are being accessed through dma-buf at the same time. To
+   circumvent this problem there are begin/end coherency markers, that forward
+   directly to existing dma-buf device drivers vfunc hooks. Userspace can make
+   use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
+   would be used like following:
+     - mmap dma-buf fd
+     - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
+       to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
+       want (with the new data being consumed by the GPU or say scanout device)
+     - munmap once you don't need the buffer any more
+
+   Therefore, for correctness and optimal performance, systems with the memory
+   cache shared by the GPU and CPU i.e. the "coherent" and also the
+   "incoherent" are always required to use SYNC_START and SYNC_END before and
+   after, respectively, when accessing the mapped address.
2. Supporting existing mmap interfaces in importers
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index b2ac13b..9a298bd 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -34,6 +34,8 @@
 #include <linux/poll.h>
 #include <linux/reservation.h>
 
+#include <uapi/linux/dma-buf.h>
+
 static inline int is_dma_buf_file(struct file *);
 
 struct dma_buf_list {
@@ -251,11 +253,52 @@ out:
 	return events;
 }
 
+static long dma_buf_ioctl(struct file *file,
+			  unsigned int cmd, unsigned long arg)
+{
+	struct dma_buf *dmabuf;
+	struct dma_buf_sync sync;
+	enum dma_data_direction direction;
+
+	dmabuf = file->private_data;
+
+	if (!is_dma_buf_file(file))
+		return -EINVAL;
+
+	switch (cmd) {
+	case DMA_BUF_IOCTL_SYNC:
+		if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
+			return -EFAULT;
+
+		if (sync.flags & DMA_BUF_SYNC_RW)
+			direction = DMA_BIDIRECTIONAL;
+		else if (sync.flags & DMA_BUF_SYNC_READ)
+			direction = DMA_FROM_DEVICE;
+		else if (sync.flags & DMA_BUF_SYNC_WRITE)
+			direction = DMA_TO_DEVICE;
+		else
+			return -EINVAL;
+
+		if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
+			return -EINVAL;
+
+		if (sync.flags & DMA_BUF_SYNC_END)
+			dma_buf_end_cpu_access(dmabuf, direction);
+		else
+			dma_buf_begin_cpu_access(dmabuf, direction);
+
+		return 0;
+	default:
+		return -ENOTTY;
+	}
+}
+
 static const struct file_operations dma_buf_fops = {
 	.release	= dma_buf_release,
 	.mmap		= dma_buf_mmap_internal,
 	.llseek		= dma_buf_llseek,
 	.poll		= dma_buf_poll,
+	.unlocked_ioctl	= dma_buf_ioctl,
 };
 
 /*
diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
new file mode 100644
index 0000000..4a9b36b
--- /dev/null
+++ b/include/uapi/linux/dma-buf.h
@@ -0,0 +1,38 @@
+/*
+ * Framework for buffer objects that can be shared across devices/subsystems.
+ *
+ * Copyright(C) 2015 Intel Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program. If not, see http://www.gnu.org/licenses/.
+ */
+
+#ifndef _DMA_BUF_UAPI_H_
+#define _DMA_BUF_UAPI_H_
+
+/* begin/end dma-buf functions used for userspace mmap. */
+struct dma_buf_sync {
+	__u64 flags;
+};
+
+#define DMA_BUF_SYNC_READ      (1 << 0)
+#define DMA_BUF_SYNC_WRITE     (2 << 0)
+#define DMA_BUF_SYNC_RW        (DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE)
+#define DMA_BUF_SYNC_START     (0 << 2)
+#define DMA_BUF_SYNC_END       (1 << 2)
+#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
+	(DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)
+
+#define DMA_BUF_BASE		'b'
+#define DMA_BUF_IOCTL_SYNC	_IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)
+
+#endif
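For illustration only (not part of the patch; the helper name is made up and error checking is omitted), one CPU upload cycle on an already-mmap'ed dma-buf using the proposed ioctl would look roughly like this:

#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Hypothetical helper: bracket a CPU write to the mapped dma-buf with the
 * proposed SYNC_START/SYNC_END markers, as the documentation above asks. */
static void cpu_write_cycle(int dmabuf_fd, void *map, size_t len)
{
	struct dma_buf_sync sync = {
		.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE,
	};

	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

	memset(map, 0xca, len);		/* CPU writes through the mapping */

	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}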
Hi
On Tue, Dec 22, 2015 at 10:36 PM, Tiago Vignatti tiago.vignatti@intel.com wrote:
[...]
+static long dma_buf_ioctl(struct file *file,
unsigned int cmd, unsigned long arg)
+{
struct dma_buf *dmabuf;
struct dma_buf_sync sync;
enum dma_data_direction direction;
dmabuf = file->private_data;
if (!is_dma_buf_file(file))
return -EINVAL;
Why? This can never happen, and you had better not use dma_buf_ioctl() outside of dma_buf_fops. I guess it was simply copied from the other fop callbacks, but I don't see why: dma_buf_poll() doesn't do it, and neither should this, nor any of the other 3 callbacks.
switch (cmd) {
case DMA_BUF_IOCTL_SYNC:
if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
return -EFAULT;
if (sync.flags & DMA_BUF_SYNC_RW)
direction = DMA_BIDIRECTIONAL;
else if (sync.flags & DMA_BUF_SYNC_READ)
direction = DMA_FROM_DEVICE;
else if (sync.flags & DMA_BUF_SYNC_WRITE)
direction = DMA_TO_DEVICE;
else
return -EINVAL;
This looks bogus. It always ends up being "DMA_BIDIRECTIONAL" or EINVAL. I recommend changing it to:
switch (sync.flags & DMA_BUF_SYNC_RW) {
case DMA_BUF_SYNC_READ:
        direction = DMA_FROM_DEVICE;
        break;
case DMA_BUF_SYNC_WRITE:
        direction = DMA_TO_DEVICE;
        break;
case DMA_BUF_SYNC_READ:
        direction = DMA_BIDIRECTIONAL;
        break;
default:
        return -EINVAL;
}
if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
return -EINVAL;
Why isn't this done immediately after copy_from_user()?
if (sync.flags & DMA_BUF_SYNC_END)
dma_buf_end_cpu_access(dmabuf, direction);
else
dma_buf_begin_cpu_access(dmabuf, direction);
Why are SYNC_START and SYNC_END exclusive? It feels very natural to me to invoke both at the same time (especially if two objects are stored in the same dma-buf).
return 0;
default:
return -ENOTTY;
}
+}
static const struct file_operations dma_buf_fops = { .release = dma_buf_release, .mmap = dma_buf_mmap_internal, .llseek = dma_buf_llseek, .poll = dma_buf_poll,
.unlocked_ioctl = dma_buf_ioctl,
};
/* diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h new file mode 100644 index 0000000..4a9b36b --- /dev/null +++ b/include/uapi/linux/dma-buf.h @@ -0,0 +1,38 @@ +/*
- Framework for buffer objects that can be shared across devices/subsystems.
- Copyright(C) 2015 Intel Ltd
- This program is free software; you can redistribute it and/or modify it
- under the terms of the GNU General Public License version 2 as published by
- the Free Software Foundation.
- This program is distributed in the hope that it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
- You should have received a copy of the GNU General Public License along with
- this program. If not, see http://www.gnu.org/licenses/.
- */
+#ifndef _DMA_BUF_UAPI_H_ +#define _DMA_BUF_UAPI_H_
+/* begin/end dma-buf functions used for userspace mmap. */ +struct dma_buf_sync {
__u64 flags;
+};
Please add '#include <linux/types.h>', otherwise this header cannot be compiled on its own (missing __u64).
+#define DMA_BUF_SYNC_READ (1 << 0) +#define DMA_BUF_SYNC_WRITE (2 << 0) +#define DMA_BUF_SYNC_RW (DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE) +#define DMA_BUF_SYNC_START (0 << 2) +#define DMA_BUF_SYNC_END (1 << 2) +#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
(DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)
+#define DMA_BUF_BASE 'b' +#define DMA_BUF_IOCTL_SYNC _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)
Why _IOW? A read-only ioctl should be able to call DMA_BUF_SYNC_READ, right?
Thanks David
+#endif
On Tue, Feb 09, 2016 at 10:26:25AM +0100, David Herrmann wrote:
[...]
Oops, looks like more work is needed. Thanks for your review.
I dropped patches 3-5 again meanwhile from drm-misc. -Daniel
On Tue, Feb 09, 2016 at 11:20:20AM +0100, Daniel Vetter wrote:
[...]
Oops, looks like more work is needed. Thanks for your review.
I dropped patches 3-5 again meanwhile from drm-misc.
Correction: Only dropped this patch, since the internal interfaces stayed the same. -Daniel
Thanks for reviewing, David. Please take a look at my comments in-line.
On 02/09/2016 07:26 AM, David Herrmann wrote:
On Tue, Dec 22, 2015 at 10:36 PM, Tiago Vignatti tiago.vignatti@intel.com wrote:
[...]
+static long dma_buf_ioctl(struct file *file,
unsigned int cmd, unsigned long arg)
+{
struct dma_buf *dmabuf;
struct dma_buf_sync sync;
enum dma_data_direction direction;
dmabuf = file->private_data;
if (!is_dma_buf_file(file))
return -EINVAL;
Why? This can never happen, and you better not use dma_buf_ioctl() outside of dma_buf_fops.. I guess it's simply copied from the other fop callbacks, but I don't see why. dma_buf_poll() doesn't do it, neither should this, or one of the other 3 callbacks.
yup. I fixed now.
switch (cmd) {
case DMA_BUF_IOCTL_SYNC:
if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
return -EFAULT;
if (sync.flags & DMA_BUF_SYNC_RW)
direction = DMA_BIDIRECTIONAL;
else if (sync.flags & DMA_BUF_SYNC_READ)
direction = DMA_FROM_DEVICE;
else if (sync.flags & DMA_BUF_SYNC_WRITE)
direction = DMA_TO_DEVICE;
else
return -EINVAL;
This looks bogus. It always ends up being "DMA_BIDIRECTIONAL" or EINVAL. I recommend changing it to:
switch (sync.flags & DMA_BUF_SYNC_RW) { case DMA_BUF_SYNC_READ: direction = DMA_FROM_DEVICE; break; case DMA_BUF_SYNC_WRITE: direction = DMA_TO_DEVICE; break; case DMA_BUF_SYNC_READ: direction = DMA_BIDIRECTIONAL; break; default: return -EINVAL; }
Hmm, I can't really see what's wrong with my snippet. Why bogus? And can you double-check your suggestion, actually? It looks wrong, with _READ being repeated.
if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
return -EINVAL;
Why isn't this done immediately after copy_from_user()?
done.
if (sync.flags & DMA_BUF_SYNC_END)
dma_buf_end_cpu_access(dmabuf, direction);
else
dma_buf_begin_cpu_access(dmabuf, direction);
Why are SYNC_START and SYNC_END exclusive? It feels very natural to me to invoke both at the same time (especially if two objects are stored in the same dma-buf).
Can you expand a bit and explain how two objects would be stored in the same dma-buf? I didn't consider that case, and if we want it, we'd also need to change the sequence of accesses described in the dma-buf-sharing.txt text I'm proposing in this patch.
return 0;
default:
return -ENOTTY;
}
+}
static const struct file_operations dma_buf_fops = { .release = dma_buf_release, .mmap = dma_buf_mmap_internal, .llseek = dma_buf_llseek, .poll = dma_buf_poll,
.unlocked_ioctl = dma_buf_ioctl,
};
/*
diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h new file mode 100644 index 0000000..4a9b36b --- /dev/null +++ b/include/uapi/linux/dma-buf.h @@ -0,0 +1,38 @@ +/*
- Framework for buffer objects that can be shared across devices/subsystems.
- Copyright(C) 2015 Intel Ltd
- This program is free software; you can redistribute it and/or modify it
- under the terms of the GNU General Public License version 2 as published by
- the Free Software Foundation.
- This program is distributed in the hope that it will be useful, but WITHOUT
- ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
- more details.
- You should have received a copy of the GNU General Public License along with
- this program. If not, see http://www.gnu.org/licenses/.
- */
+#ifndef _DMA_BUF_UAPI_H_ +#define _DMA_BUF_UAPI_H_
+/* begin/end dma-buf functions used for userspace mmap. */ +struct dma_buf_sync {
__u64 flags;
+};
Please add '#include <linux/types.h>', otherwise this header cannot be compiled on its own (missing __u64).
done.
+#define DMA_BUF_SYNC_READ (1 << 0) +#define DMA_BUF_SYNC_WRITE (2 << 0) +#define DMA_BUF_SYNC_RW (DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE) +#define DMA_BUF_SYNC_START (0 << 2) +#define DMA_BUF_SYNC_END (1 << 2) +#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
(DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)
+#define DMA_BUF_BASE 'b' +#define DMA_BUF_IOCTL_SYNC _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)
Why _IOW? A read-only ioctl should be able to call DMA_BUF_SYNC_READ, right?
yup. I've changed it to _IOWR now.
Tiago
On Thu, Feb 11, 2016 at 12:54 PM, Tiago Vignatti tiago.vignatti@intel.com wrote:
Thanks for reviewing, David. Please take a look at my comments in-line.
On 02/09/2016 07:26 AM, David Herrmann wrote:
On Tue, Dec 22, 2015 at 10:36 PM, Tiago Vignatti tiago.vignatti@intel.com wrote:
[...]
switch (cmd) {
case DMA_BUF_IOCTL_SYNC:
if (copy_from_user(&sync, (void __user *) arg,
sizeof(sync)))
return -EFAULT;
if (sync.flags & DMA_BUF_SYNC_RW)
direction = DMA_BIDIRECTIONAL;
else if (sync.flags & DMA_BUF_SYNC_READ)
direction = DMA_FROM_DEVICE;
else if (sync.flags & DMA_BUF_SYNC_WRITE)
direction = DMA_TO_DEVICE;
else
return -EINVAL;
This looks bogus. It always ends up being "DMA_BIDIRECTIONAL" or EINVAL. I recommend changing it to:
switch (sync.flags & DMA_BUF_SYNC_RW) {
case DMA_BUF_SYNC_READ:
        direction = DMA_FROM_DEVICE;
        break;
case DMA_BUF_SYNC_WRITE:
        direction = DMA_TO_DEVICE;
        break;
case DMA_BUF_SYNC_READ:
        direction = DMA_BIDIRECTIONAL;
        break;
default:
        return -EINVAL;
}
hmm I can't really get what's wrong with my snip. Why bogus? Can you double-check actually your suggestion, cause that's wrong with _READ being repeated.
It should be something like:

switch (sync.flags & DMA_BUF_SYNC_RW) {
case DMA_BUF_SYNC_READ:
        direction = DMA_FROM_DEVICE;
        break;
case DMA_BUF_SYNC_WRITE:
        direction = DMA_TO_DEVICE;
        break;
case DMA_BUF_SYNC_RW:
        direction = DMA_BIDIRECTIONAL;
        break;
default:
        return -EINVAL;
}
In your code, the first if case: + if (sync.flags & DMA_BUF_SYNC_RW) Will catch read, write and rw.
Alex
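A tiny self-contained check of Alex's point, using the flag values from the proposed header (READ is 1, WRITE is 2, RW is 3): (flags & DMA_BUF_SYNC_RW) is non-zero for all three values, so the first branch in the v7 code also swallows read-only and write-only, while the intended test compares against the full mask:

#include <stdio.h>

#define DMA_BUF_SYNC_READ  (1 << 0)
#define DMA_BUF_SYNC_WRITE (2 << 0)
#define DMA_BUF_SYNC_RW    (DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE)

int main(void)
{
	unsigned long flags[] = { DMA_BUF_SYNC_READ, DMA_BUF_SYNC_WRITE,
				  DMA_BUF_SYNC_RW };

	for (int i = 0; i < 3; i++) {
		/* the v7 check: true for READ, WRITE and RW alike */
		int broad = !!(flags[i] & DMA_BUF_SYNC_RW);
		/* what was meant: true only when both bits are set */
		int exact = (flags[i] & DMA_BUF_SYNC_RW) == DMA_BUF_SYNC_RW;

		printf("flags=%lu broad=%d exact=%d\n", flags[i], broad, exact);
	}
	return 0;
}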
if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
return -EINVAL;
Why isn't this done immediately after copy_from_user()?
done.
if (sync.flags & DMA_BUF_SYNC_END)
dma_buf_end_cpu_access(dmabuf, direction);
else
dma_buf_begin_cpu_access(dmabuf, direction);
Why are SYNC_START and SYNC_END exclusive? It feels very natural to me to invoke both at the same time (especially if two objects are stored in the same dma-buf).
Can you expand a bit and explain how two objects would be stored in the same dma-buf? I didn't consider this case, and if we want that, we'd also need to change the sequence of accesses described in the dma-buf-sharing.txt changes I'm proposing in this patch.
return 0;
default:
return -ENOTTY;
}
+}
static const struct file_operations dma_buf_fops = { .release = dma_buf_release, .mmap = dma_buf_mmap_internal, .llseek = dma_buf_llseek, .poll = dma_buf_poll,
.unlocked_ioctl = dma_buf_ioctl,
};
/*
diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
new file mode 100644
index 0000000..4a9b36b
--- /dev/null
+++ b/include/uapi/linux/dma-buf.h
@@ -0,0 +1,38 @@
+/*
+ * Framework for buffer objects that can be shared across devices/subsystems.
+ *
+ * Copyright(C) 2015 Intel Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program. If not, see http://www.gnu.org/licenses/.
+ */
+
+#ifndef _DMA_BUF_UAPI_H_
+#define _DMA_BUF_UAPI_H_
+
+/* begin/end dma-buf functions used for userspace mmap. */
+struct dma_buf_sync {
+	__u64 flags;
+};
Please add '#include <linux/types.h>', otherwise this header cannot be compiled on its own (missing __u64).
done.
+#define DMA_BUF_SYNC_READ      (1 << 0)
+#define DMA_BUF_SYNC_WRITE     (2 << 0)
+#define DMA_BUF_SYNC_RW        (DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE)
+#define DMA_BUF_SYNC_START     (0 << 2)
+#define DMA_BUF_SYNC_END       (1 << 2)
+#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
+	(DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)
+
+#define DMA_BUF_BASE		'b'
+#define DMA_BUF_IOCTL_SYNC	_IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)
Why _IOW? A read-only ioctl should be able to call DMA_BUF_SYNC_READ, right?
yup. I've changed it to _IOWR now.
Tiago
Hi
On Thu, Feb 11, 2016 at 6:54 PM, Tiago Vignatti tiago.vignatti@intel.com wrote:
On 02/09/2016 07:26 AM, David Herrmann wrote:
switch (cmd) {
case DMA_BUF_IOCTL_SYNC:
if (copy_from_user(&sync, (void __user *) arg,
sizeof(sync)))
return -EFAULT;
if (sync.flags & DMA_BUF_SYNC_RW)
direction = DMA_BIDIRECTIONAL;
else if (sync.flags & DMA_BUF_SYNC_READ)
direction = DMA_FROM_DEVICE;
else if (sync.flags & DMA_BUF_SYNC_WRITE)
direction = DMA_TO_DEVICE;
else
return -EINVAL;
This looks bogus. It always ends up being "DMA_BIDIRECTIONAL" or EINVAL. I recommend changing it to:
switch (sync.flags & DMA_BUF_SYNC_RW) {
case DMA_BUF_SYNC_READ:
	direction = DMA_FROM_DEVICE;
	break;
case DMA_BUF_SYNC_WRITE:
	direction = DMA_TO_DEVICE;
	break;
case DMA_BUF_SYNC_READ:
	direction = DMA_BIDIRECTIONAL;
	break;
default:
	return -EINVAL;
}
Hmm, I can't really see what's wrong with my snippet. Why bogus? Can you actually double-check your suggestion, because it's the one that's wrong, with _READ being repeated.
You did this:
if (sync.flags & DMA_BUF_SYNC_RW)
...but what you meant is this:
if ((sync.flags & DMA_BUF_SYNC_RW) == DMA_BUF_SYNC_RW)
Feel free to fix it with this simple change. I just thought a switch() statement would be easier to read. And yes, I screwed up the third 'case' statement, which should read DMA_BUF_SYNC_RW rather than DMA_BUF_SYNC_READ. Sorry for that.
if (sync.flags & DMA_BUF_SYNC_END)
dma_buf_end_cpu_access(dmabuf, direction);
else
dma_buf_begin_cpu_access(dmabuf, direction);
Why are SYNC_START and SYNC_END exclusive? It feels very natural to me to invoke both at the same time (especially if two objects are stored in the same dma-buf).
Can you expand a bit and explain how two objects would be stored in the same dma-buf? I didn't consider this case, and if we want that, we'd also need to change the sequence of accesses described in the dma-buf-sharing.txt changes I'm proposing in this patch.
Just store two frames next to each other in the same BO. Create two DRM-FBs with different offsets, covering one frame each. Now you can just switch between the two FBs, backed by the same object.
I'm not saying that this is a good idea. I just wondered why the START/END was exclusive, rather than inclusive. But.. I guess it is cheap enough that someone can just call ioctl(END) followed by ioctl(START).
+#define DMA_BUF_SYNC_READ (1 << 0) +#define DMA_BUF_SYNC_WRITE (2 << 0) +#define DMA_BUF_SYNC_RW (DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE) +#define DMA_BUF_SYNC_START (0 << 2) +#define DMA_BUF_SYNC_END (1 << 2) +#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
(DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)
+#define DMA_BUF_BASE 'b' +#define DMA_BUF_IOCTL_SYNC _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)
Why _IOW? A read-only ioctl should be able to call DMA_BUF_SYNC_READ, right?
yup. I've changed it to _IOWR now.
Well, I'd have used _IOC_NONE, rather than READ/WRITE, but I just checked and it seems vfs doesn't even enforce them. So... eh... I don't care.
Thanks David
On Thu, Feb 11, 2016 at 03:54:38PM -0200, Tiago Vignatti wrote:
Thanks for reviewing, David. Please take a look in my comments in-line.
On 02/09/2016 07:26 AM, David Herrmann wrote:
On Tue, Dec 22, 2015 at 10:36 PM, Tiago Vignatti tiago.vignatti@intel.com wrote:
From: Daniel Vetter daniel.vetter@ffwll.ch
<snip>
+#define DMA_BUF_SYNC_READ (1 << 0) +#define DMA_BUF_SYNC_WRITE (2 << 0) +#define DMA_BUF_SYNC_RW (DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE) +#define DMA_BUF_SYNC_START (0 << 2) +#define DMA_BUF_SYNC_END (1 << 2) +#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
(DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)
+#define DMA_BUF_BASE 'b' +#define DMA_BUF_IOCTL_SYNC _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)
Why _IOW? A read-only ioctl should be able to call DMA_BUF_SYNC_READ, right?
yup. I've changed it to _IOWR now.
AFAICS the ioctl only does copy_from_user() so _IOW seemed perfectly correct to me.
Hi
On Thu, Feb 11, 2016 at 7:08 PM, Ville Syrjälä ville.syrjala@linux.intel.com wrote:
On Thu, Feb 11, 2016 at 03:54:38PM -0200, Tiago Vignatti wrote:
Thanks for reviewing, David. Please take a look in my comments in-line.
On 02/09/2016 07:26 AM, David Herrmann wrote:
On Tue, Dec 22, 2015 at 10:36 PM, Tiago Vignatti tiago.vignatti@intel.com wrote:
From: Daniel Vetter daniel.vetter@ffwll.ch
<snip> > >> + > >> +#define DMA_BUF_SYNC_READ (1 << 0) > >> +#define DMA_BUF_SYNC_WRITE (2 << 0) > >> +#define DMA_BUF_SYNC_RW (DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE) > >> +#define DMA_BUF_SYNC_START (0 << 2) > >> +#define DMA_BUF_SYNC_END (1 << 2) > >> +#define DMA_BUF_SYNC_VALID_FLAGS_MASK \ > >> + (DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END) > >> + > >> +#define DMA_BUF_BASE 'b' > >> +#define DMA_BUF_IOCTL_SYNC _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync) > > > > Why _IOW? A read-only ioctl should be able to call DMA_BUF_SYNC_READ, right? > > yup. I've changed it to _IOWR now.
AFAICS the ioctl only does copy_from_user() so _IOW seemed perfectly correct to me.
AFAIK, the _IOC_DIR() arguments in ioctls specify the mode of the operation, not the way the arguments are used. So if read(2) was an ioctl, it would be annotated as _IOC_READ (even though it _writes_ into the passed buffer). write(2) would be annotated as _IOC_WRITE (even though it _reads_ the buffer). As such, they correspond to the file-access mode, whether you opened it readable and/or writable.
Anyway, turns out VFS does not verify those. As such, you can specify whatever you want. I just checked with different existing ioctls throughout the kernel (`git grep _IOC_DIR`), and they seem to match what I describe.
But I don't care much. I guess _IOWR is fine either way. David
On Thu, Feb 11, 2016 at 07:19:45PM +0100, David Herrmann wrote:
Hi
On Thu, Feb 11, 2016 at 7:08 PM, Ville Syrjälä ville.syrjala@linux.intel.com wrote:
On Thu, Feb 11, 2016 at 03:54:38PM -0200, Tiago Vignatti wrote:
Thanks for reviewing, David. Please take a look in my comments in-line.
On 02/09/2016 07:26 AM, David Herrmann wrote:
On Tue, Dec 22, 2015 at 10:36 PM, Tiago Vignatti tiago.vignatti@intel.com wrote:
From: Daniel Vetter daniel.vetter@ffwll.ch
<snip> > >> + > >> +#define DMA_BUF_SYNC_READ (1 << 0) > >> +#define DMA_BUF_SYNC_WRITE (2 << 0) > >> +#define DMA_BUF_SYNC_RW (DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE) > >> +#define DMA_BUF_SYNC_START (0 << 2) > >> +#define DMA_BUF_SYNC_END (1 << 2) > >> +#define DMA_BUF_SYNC_VALID_FLAGS_MASK \ > >> + (DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END) > >> + > >> +#define DMA_BUF_BASE 'b' > >> +#define DMA_BUF_IOCTL_SYNC _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync) > > > > Why _IOW? A read-only ioctl should be able to call DMA_BUF_SYNC_READ, right? > > yup. I've changed it to _IOWR now.
AFAICS the ioctl only does copy_from_user() so _IOW seemed perfectly correct to me.
AFAIK, the _IOC_DIR() arguments in ioctls specify the mode of the operation, not the way the arguments are used. So if read(2) was an ioctl, it would be annotated as _IOC_READ (even though it _writes_ into the passed buffer). write(2) would be annotated as _IOC_WRITE (even though it _reads_ the buffer). As such, they correspond to the file-access mode, whether you opened it readable and/or writable.
Anyway, turns out VFS does not verify those. As such, you can specify whatever you want. I just checked with different existing ioctls throughout the kernel (`git grep _IOC_DIR`), and they seem to match what I describe.
Not sure which ones you checked, because I don't think I've ever seen that interpretation in the kernel. Though I suppose it often sort of matches, e.g. when the ioctl just gets/sets some value. Any relationship to some hardware operation is mostly coincidental (well, except when the hardware really just does some kind of read/write operation). And for lots of ioctls there is no hardware interaction whatsoever.
As far as checking the file access mode goes, well lots of ioctls totally ignore it.
About the direction you're right. _IOW means userspace writes to the kernel, and _IOR means userspace reads from the kernel. Which is exactly what the code did.
Anyway ioctl-numbers.txt says this: _IO an ioctl with no parameters _IOW an ioctl with write parameters (copy_from_user) _IOR an ioctl with read parameters (copy_to_user) _IOWR an ioctl with both write and read parameters.
so I win ;)
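For illustration, a small userspace check of what the _IOC_DIR() bits of the sync ioctl encode, assuming a Linux toolchain where <sys/ioctl.h> provides the _IOC helpers; the struct and defines are copied from the proposed header so the snippet stands alone:

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/types.h>

/* copied from the proposed include/uapi/linux/dma-buf.h */
struct dma_buf_sync {
	__u64 flags;
};
#define DMA_BUF_BASE		'b'
#define DMA_BUF_IOCTL_SYNC	_IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)

int main(void)
{
	unsigned int dir = _IOC_DIR(DMA_BUF_IOCTL_SYNC);

	/* _IOW sets only the "userspace writes to the kernel" bit, matching
	 * the copy_from_user()-only implementation of the ioctl */
	printf("write bit: %d, read bit: %d\n",
	       !!(dir & _IOC_WRITE), !!(dir & _IOC_READ));
	return 0;
}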
From: Daniel Vetter daniel.vetter@ffwll.ch
The userspace might need some sort of cache coherency management e.g. when CPU and GPU domains are being accessed through dma-buf at the same time. To circumvent this problem there are begin/end coherency markers, that forward directly to existing dma-buf device drivers vfunc hooks. Userspace can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be used like following: - mmap dma-buf fd - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write to mmap area 3. SYNC_END ioctl. This can be repeated as often as you want (with the new data being consumed by the GPU or say scanout device) - munmap once you don't need the buffer any more
v2 (Tiago): Fix header file type names (u64 -> __u64) v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end dma-buf functions. Check for overflows in start/length. v4 (Tiago): use 2d regions for sync. v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and remove range information from struct dma_buf_sync. v6 (Tiago): use __u64 structured padded flags instead enum. Adjust documentation about the recommendation on using sync ioctls. v7 (Tiago): Alex' nit on flags definition and being even more wording in the doc about sync usage. v9 (Tiago): remove useless is_dma_buf_file check. Fix sync.flags conditionals and its mask order check. Add <linux/types.h> include in dma-buf.h.
Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: David Herrmann dh.herrmann@gmail.com Cc: Sumit Semwal sumit.semwal@linaro.org Reviewed-by: Stéphane Marchesin marcheu@chromium.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com ---
I left SYNC_START and SYNC_END exclusive, just as the logic was before. If we see a useful use case, maybe like the one David described of storing two frames next to each other in the same BO, we can patch it up later fairly easily.
About the ioctl direction, just as Ville pointed out, we're only doing copy_from_user at the moment, and it seems that _IOW is all we need. So I didn't touch anything there either.
David, Ville PTAL. Thank you,
Tiago
Documentation/dma-buf-sharing.txt | 21 +++++++++++++++++- drivers/dma-buf/dma-buf.c | 45 +++++++++++++++++++++++++++++++++++++++ include/uapi/linux/dma-buf.h | 40 ++++++++++++++++++++++++++++++++++ 3 files changed, 105 insertions(+), 1 deletion(-) create mode 100644 include/uapi/linux/dma-buf.h
diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt index 4f4a84b..32ac32e 100644 --- a/Documentation/dma-buf-sharing.txt +++ b/Documentation/dma-buf-sharing.txt @@ -350,7 +350,26 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases: handles, too). So it's beneficial to support this in a similar fashion on dma-buf to have a good transition path for existing Android userspace.
- No special interfaces, userspace simply calls mmap on the dma-buf fd. + No special interfaces, userspace simply calls mmap on the dma-buf fd, making + sure that the cache synchronization ioctl (DMA_BUF_IOCTL_SYNC) is *always* + used when the access happens. This is discussed next paragraphs. + + Some systems might need some sort of cache coherency management e.g. when + CPU and GPU domains are being accessed through dma-buf at the same time. To + circumvent this problem there are begin/end coherency markers, that forward + directly to existing dma-buf device drivers vfunc hooks. Userspace can make + use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence + would be used like following: + - mmap dma-buf fd + - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write + to mmap area 3. SYNC_END ioctl. This can be repeated as often as you + want (with the new data being consumed by the GPU or say scanout device) + - munmap once you don't need the buffer any more + + Therefore, for correctness and optimal performance, systems with the memory + cache shared by the GPU and CPU i.e. the "coherent" and also the + "incoherent" are always required to use SYNC_START and SYNC_END before and + after, respectively, when accessing the mapped address.
2. Supporting existing mmap interfaces in importers
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index b2ac13b..9810d1d 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -34,6 +34,8 @@ #include <linux/poll.h> #include <linux/reservation.h>
+#include <uapi/linux/dma-buf.h> + static inline int is_dma_buf_file(struct file *);
struct dma_buf_list { @@ -251,11 +253,54 @@ out: return events; }
+static long dma_buf_ioctl(struct file *file,
+			  unsigned int cmd, unsigned long arg)
+{
+	struct dma_buf *dmabuf;
+	struct dma_buf_sync sync;
+	enum dma_data_direction direction;
+
+	dmabuf = file->private_data;
+
+	switch (cmd) {
+	case DMA_BUF_IOCTL_SYNC:
+		if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
+			return -EFAULT;
+
+		if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
+			return -EINVAL;
+
+		switch (sync.flags & DMA_BUF_SYNC_RW) {
+		case DMA_BUF_SYNC_READ:
+			direction = DMA_FROM_DEVICE;
+			break;
+		case DMA_BUF_SYNC_WRITE:
+			direction = DMA_TO_DEVICE;
+			break;
+		case DMA_BUF_SYNC_RW:
+			direction = DMA_BIDIRECTIONAL;
+			break;
+		default:
+			return -EINVAL;
+		}
+
+		if (sync.flags & DMA_BUF_SYNC_END)
+			dma_buf_end_cpu_access(dmabuf, direction);
+		else
+			dma_buf_begin_cpu_access(dmabuf, direction);
+
+		return 0;
+	default:
+		return -ENOTTY;
+	}
+}
+
 static const struct file_operations dma_buf_fops = {
 	.release	= dma_buf_release,
 	.mmap		= dma_buf_mmap_internal,
 	.llseek		= dma_buf_llseek,
 	.poll		= dma_buf_poll,
+	.unlocked_ioctl	= dma_buf_ioctl,
 };
/* diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h new file mode 100644 index 0000000..fb0dedb --- /dev/null +++ b/include/uapi/linux/dma-buf.h @@ -0,0 +1,40 @@ +/* + * Framework for buffer objects that can be shared across devices/subsystems. + * + * Copyright(C) 2015 Intel Ltd + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program. If not, see http://www.gnu.org/licenses/. + */ + +#ifndef _DMA_BUF_UAPI_H_ +#define _DMA_BUF_UAPI_H_ + +#include <linux/types.h> + +/* begin/end dma-buf functions used for userspace mmap. */ +struct dma_buf_sync { + __u64 flags; +}; + +#define DMA_BUF_SYNC_READ (1 << 0) +#define DMA_BUF_SYNC_WRITE (2 << 0) +#define DMA_BUF_SYNC_RW (DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE) +#define DMA_BUF_SYNC_START (0 << 2) +#define DMA_BUF_SYNC_END (1 << 2) +#define DMA_BUF_SYNC_VALID_FLAGS_MASK \ + (DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END) + +#define DMA_BUF_BASE 'b' +#define DMA_BUF_IOCTL_SYNC _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync) + +#endif
Hi
On Thu, Feb 11, 2016 at 11:04 PM, Tiago Vignatti tiago.vignatti@intel.com wrote:
From: Daniel Vetter daniel.vetter@ffwll.ch
The userspace might need some sort of cache coherency management e.g. when CPU and GPU domains are being accessed through dma-buf at the same time. To circumvent this problem there are begin/end coherency markers, that forward directly to existing dma-buf device drivers vfunc hooks. Userspace can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be used like following: - mmap dma-buf fd - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write to mmap area 3. SYNC_END ioctl. This can be repeated as often as you want (with the new data being consumed by the GPU or say scanout device) - munmap once you don't need the buffer any more
v2 (Tiago): Fix header file type names (u64 -> __u64) v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end dma-buf functions. Check for overflows in start/length. v4 (Tiago): use 2d regions for sync. v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and remove range information from struct dma_buf_sync. v6 (Tiago): use __u64 structured padded flags instead enum. Adjust documentation about the recommendation on using sync ioctls. v7 (Tiago): Alex' nit on flags definition and being even more wording in the doc about sync usage. v9 (Tiago): remove useless is_dma_buf_file check. Fix sync.flags conditionals and its mask order check. Add <linux/types.h> include in dma-buf.h.
Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: David Herrmann dh.herrmann@gmail.com Cc: Sumit Semwal sumit.semwal@linaro.org Reviewed-by: Stéphane Marchesin marcheu@chromium.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
I left SYNC_START and SYNC_END exclusive, just as the logic was before. If we see a useful use case, maybe like the one David described of storing two frames next to each other in the same BO, we can patch it up later fairly easily.
About the ioctl direction, just as Ville pointed out, we're only doing copy_from_user at the moment, and it seems that _IOW is all we need. So I didn't touch anything there either.
Fair enough.
All I have left are comments on coding-style, and I'll refrain from verbalizing them.. (gnah, it is so tempting).
Reviewed-by: David Herrmann dh.herrmann@gmail.com
Thanks! David
Documentation/dma-buf-sharing.txt | 21 +++++++++++++++++- drivers/dma-buf/dma-buf.c | 45 +++++++++++++++++++++++++++++++++++++++ include/uapi/linux/dma-buf.h | 40 ++++++++++++++++++++++++++++++++++ 3 files changed, 105 insertions(+), 1 deletion(-) create mode 100644 include/uapi/linux/dma-buf.h
diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
index 4f4a84b..32ac32e 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -350,7 +350,26 @@ Being able to mmap an export dma-buf buffer object has 2 main use-cases:
    handles, too). So it's beneficial to support this in a similar fashion on
    dma-buf to have a good transition path for existing Android userspace.

-   No special interfaces, userspace simply calls mmap on the dma-buf fd.
+   No special interfaces, userspace simply calls mmap on the dma-buf fd, making
+   sure that the cache synchronization ioctl (DMA_BUF_IOCTL_SYNC) is *always*
+   used when the access happens. This is discussed next paragraphs.
+
+   Some systems might need some sort of cache coherency management e.g. when
+   CPU and GPU domains are being accessed through dma-buf at the same time. To
+   circumvent this problem there are begin/end coherency markers, that forward
+   directly to existing dma-buf device drivers vfunc hooks. Userspace can make
+   use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence
+   would be used like following:
+     - mmap dma-buf fd
+     - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write
+       to mmap area 3. SYNC_END ioctl. This can be repeated as often as you
+       want (with the new data being consumed by the GPU or say scanout device)
+     - munmap once you don't need the buffer any more
+
+   Therefore, for correctness and optimal performance, systems with the memory
+   cache shared by the GPU and CPU i.e. the "coherent" and also the
+   "incoherent" are always required to use SYNC_START and SYNC_END before and
+   after, respectively, when accessing the mapped address.

 2. Supporting existing mmap interfaces in importers
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index b2ac13b..9810d1d 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -34,6 +34,8 @@ #include <linux/poll.h> #include <linux/reservation.h>
+#include <uapi/linux/dma-buf.h>
static inline int is_dma_buf_file(struct file *);
struct dma_buf_list { @@ -251,11 +253,54 @@ out: return events; }
+static long dma_buf_ioctl(struct file *file,
unsigned int cmd, unsigned long arg)
+{
struct dma_buf *dmabuf;
struct dma_buf_sync sync;
enum dma_data_direction direction;
dmabuf = file->private_data;
switch (cmd) {
case DMA_BUF_IOCTL_SYNC:
if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
return -EFAULT;
if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
return -EINVAL;
switch (sync.flags & DMA_BUF_SYNC_RW) {
case DMA_BUF_SYNC_READ:
direction = DMA_FROM_DEVICE;
break;
case DMA_BUF_SYNC_WRITE:
direction = DMA_TO_DEVICE;
break;
case DMA_BUF_SYNC_RW:
direction = DMA_BIDIRECTIONAL;
break;
default:
return -EINVAL;
}
if (sync.flags & DMA_BUF_SYNC_END)
dma_buf_end_cpu_access(dmabuf, direction);
else
dma_buf_begin_cpu_access(dmabuf, direction);
return 0;
default:
return -ENOTTY;
}
+}
static const struct file_operations dma_buf_fops = { .release = dma_buf_release, .mmap = dma_buf_mmap_internal, .llseek = dma_buf_llseek, .poll = dma_buf_poll,
.unlocked_ioctl = dma_buf_ioctl,
};
 /*
diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
new file mode 100644
index 0000000..fb0dedb
--- /dev/null
+++ b/include/uapi/linux/dma-buf.h
@@ -0,0 +1,40 @@
+/*
+ * Framework for buffer objects that can be shared across devices/subsystems.
+ *
+ * Copyright(C) 2015 Intel Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program. If not, see http://www.gnu.org/licenses/.
+ */
+
+#ifndef _DMA_BUF_UAPI_H_
+#define _DMA_BUF_UAPI_H_
+
+#include <linux/types.h>
+
+/* begin/end dma-buf functions used for userspace mmap. */
+struct dma_buf_sync {
+	__u64 flags;
+};
+
+#define DMA_BUF_SYNC_READ      (1 << 0)
+#define DMA_BUF_SYNC_WRITE     (2 << 0)
+#define DMA_BUF_SYNC_RW        (DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE)
+#define DMA_BUF_SYNC_START     (0 << 2)
+#define DMA_BUF_SYNC_END       (1 << 2)
+#define DMA_BUF_SYNC_VALID_FLAGS_MASK \
+	(DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END)
+
+#define DMA_BUF_BASE		'b'
+#define DMA_BUF_IOCTL_SYNC	_IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)
+
+#endif
2.1.4
On Fri, Feb 12, 2016 at 03:50:02PM +0100, David Herrmann wrote:
Hi
On Thu, Feb 11, 2016 at 11:04 PM, Tiago Vignatti tiago.vignatti@intel.com wrote:
From: Daniel Vetter daniel.vetter@ffwll.ch
The userspace might need some sort of cache coherency management e.g. when CPU and GPU domains are being accessed through dma-buf at the same time. To circumvent this problem there are begin/end coherency markers, that forward directly to existing dma-buf device drivers vfunc hooks. Userspace can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl. The sequence would be used like following: - mmap dma-buf fd - for each drawing/upload cycle in CPU 1. SYNC_START ioctl, 2. read/write to mmap area 3. SYNC_END ioctl. This can be repeated as often as you want (with the new data being consumed by the GPU or say scanout device) - munmap once you don't need the buffer any more
v2 (Tiago): Fix header file type names (u64 -> __u64) v3 (Tiago): Add documentation. Use enum dma_buf_sync_flags to the begin/end dma-buf functions. Check for overflows in start/length. v4 (Tiago): use 2d regions for sync. v5 (Tiago): forget about 2d regions (v4); use _IOW in DMA_BUF_IOCTL_SYNC and remove range information from struct dma_buf_sync. v6 (Tiago): use __u64 structured padded flags instead enum. Adjust documentation about the recommendation on using sync ioctls. v7 (Tiago): Alex' nit on flags definition and being even more wording in the doc about sync usage. v9 (Tiago): remove useless is_dma_buf_file check. Fix sync.flags conditionals and its mask order check. Add <linux/types.h> include in dma-buf.h.
Cc: Ville Syrjälä ville.syrjala@linux.intel.com Cc: David Herrmann dh.herrmann@gmail.com Cc: Sumit Semwal sumit.semwal@linaro.org Reviewed-by: Stéphane Marchesin marcheu@chromium.org Signed-off-by: Daniel Vetter daniel.vetter@intel.com Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
I left SYNC_START and SYNC_END exclusive, just how the logic was before. If we see an useful use case, maybe like the way David said, to store two frames next to each other in the same BO, we can patch up later fairly easily.
About the ioctl direction, just like Ville pointed, we're doing only copy_from_user at the moment and seems that _IOW is all we need. So I also didn't touch anything on that.
Fair enough.
All I have left are comments on coding-style, and I'll refrain from verbalizing them.. (gnah, it is so tempting).
Reviewed-by: David Herrmann dh.herrmann@gmail.com
Merged to drm-misc, thanks. -Daniel
On Thu, Feb 11, 2016 at 08:04:51PM -0200, Tiago Vignatti wrote:
+static long dma_buf_ioctl(struct file *file,
unsigned int cmd, unsigned long arg)
+{
- struct dma_buf *dmabuf;
- struct dma_buf_sync sync;
- enum dma_data_direction direction;
- dmabuf = file->private_data;
- switch (cmd) {
- case DMA_BUF_IOCTL_SYNC:
if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
return -EFAULT;
if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
return -EINVAL;
switch (sync.flags & DMA_BUF_SYNC_RW) {
case DMA_BUF_SYNC_READ:
direction = DMA_FROM_DEVICE;
break;
case DMA_BUF_SYNC_WRITE:
direction = DMA_TO_DEVICE;
break;
case DMA_BUF_SYNC_RW:
direction = DMA_BIDIRECTIONAL;
break;
default:
return -EINVAL;
}
if (sync.flags & DMA_BUF_SYNC_END)
dma_buf_end_cpu_access(dmabuf, direction);
else
dma_buf_begin_cpu_access(dmabuf, direction);
We forgot to report the error back to userspace. Might as well fixup the callchain to propagate error from end-cpu-access as well. Found after updating igt/gem_concurrent_blit to exercise dmabuf mmaps vs the GPU. -Chris
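A sketch of the direction Chris suggests, assuming dma_buf_end_cpu_access() is also converted to return an error as part of the callchain fixup (at this point it returns void); the dma_buf_do_sync() name is only for illustration, and the DMA_BUF_IOCTL_SYNC case would then do "return dma_buf_do_sync(dmabuf, sync.flags, direction);" instead of unconditionally returning 0:

static int dma_buf_do_sync(struct dma_buf *dmabuf, __u64 flags,
			   enum dma_data_direction direction)
{
	/* forward the begin/end status to the ioctl caller */
	if (flags & DMA_BUF_SYNC_END)
		return dma_buf_end_cpu_access(dmabuf, direction);

	return dma_buf_begin_cpu_access(dmabuf, direction);
}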
On Thu, Feb 25, 2016 at 06:01:22PM +0000, Chris Wilson wrote:
On Thu, Feb 11, 2016 at 08:04:51PM -0200, Tiago Vignatti wrote:
+static long dma_buf_ioctl(struct file *file,
unsigned int cmd, unsigned long arg)
+{
- struct dma_buf *dmabuf;
- struct dma_buf_sync sync;
- enum dma_data_direction direction;
- dmabuf = file->private_data;
- switch (cmd) {
- case DMA_BUF_IOCTL_SYNC:
if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
return -EFAULT;
if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
return -EINVAL;
switch (sync.flags & DMA_BUF_SYNC_RW) {
case DMA_BUF_SYNC_READ:
direction = DMA_FROM_DEVICE;
break;
case DMA_BUF_SYNC_WRITE:
direction = DMA_TO_DEVICE;
break;
case DMA_BUF_SYNC_RW:
direction = DMA_BIDIRECTIONAL;
break;
default:
return -EINVAL;
}
if (sync.flags & DMA_BUF_SYNC_END)
dma_buf_end_cpu_access(dmabuf, direction);
else
dma_buf_begin_cpu_access(dmabuf, direction);
We forgot to report the error back to userspace. Might as well fixup the callchain to propagate error from end-cpu-access as well. Found after updating igt/gem_concurrent_blit to exercise dmabuf mmaps vs the GPU.
EINTR? Do we need to make this ABI - I guess so? Tiago, do you have patches? See drmIoctl() in libdrm for what's needed on the userspace side if my guess is right. -Daniel
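For the userspace side Daniel refers to, a restart loop along the lines of libdrm's drmIoctl(), adapted to the dma-buf fd; the helper name is made up for illustration, and whether EAGAIN also needs restarting depends on the ABI decision above:

#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

static int dmabuf_sync_restartable(int dmabuf_fd, uint64_t flags)
{
	struct dma_buf_sync sync = { .flags = flags };
	int ret;

	do {
		ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}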
On Mon, Feb 29, 2016 at 03:54:19PM +0100, Daniel Vetter wrote:
On Thu, Feb 25, 2016 at 06:01:22PM +0000, Chris Wilson wrote:
On Thu, Feb 11, 2016 at 08:04:51PM -0200, Tiago Vignatti wrote:
+static long dma_buf_ioctl(struct file *file,
unsigned int cmd, unsigned long arg)
+{
- struct dma_buf *dmabuf;
- struct dma_buf_sync sync;
- enum dma_data_direction direction;
- dmabuf = file->private_data;
- switch (cmd) {
- case DMA_BUF_IOCTL_SYNC:
if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
return -EFAULT;
if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
return -EINVAL;
switch (sync.flags & DMA_BUF_SYNC_RW) {
case DMA_BUF_SYNC_READ:
direction = DMA_FROM_DEVICE;
break;
case DMA_BUF_SYNC_WRITE:
direction = DMA_TO_DEVICE;
break;
case DMA_BUF_SYNC_RW:
direction = DMA_BIDIRECTIONAL;
break;
default:
return -EINVAL;
}
if (sync.flags & DMA_BUF_SYNC_END)
dma_buf_end_cpu_access(dmabuf, direction);
else
dma_buf_begin_cpu_access(dmabuf, direction);
We forgot to report the error back to userspace. Might as well fixup the callchain to propagate error from end-cpu-access as well. Found after updating igt/gem_concurrent_blit to exercise dmabuf mmaps vs the GPU.
EINTR? Do we need to make this ABI - I guess so? Tiago, do you have patches? See drmIoctl() in libdrm for what's needed on the userspace side if my guess is right.
EINTR is the easiest, but conceivably we could also get EIO and currently EAGAIN.
I am also seeing some strange timing dependent (i.e. valgrind doesn't show up anything client side and the tests then pass) failures (SIGSEGV, SIGBUS) with !llc. -Chris
On Mon, Feb 29, 2016 at 03:02:09PM +0000, Chris Wilson wrote:
On Mon, Feb 29, 2016 at 03:54:19PM +0100, Daniel Vetter wrote:
On Thu, Feb 25, 2016 at 06:01:22PM +0000, Chris Wilson wrote:
On Thu, Feb 11, 2016 at 08:04:51PM -0200, Tiago Vignatti wrote:
+static long dma_buf_ioctl(struct file *file,
unsigned int cmd, unsigned long arg)
+{
- struct dma_buf *dmabuf;
- struct dma_buf_sync sync;
- enum dma_data_direction direction;
- dmabuf = file->private_data;
- switch (cmd) {
- case DMA_BUF_IOCTL_SYNC:
if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
return -EFAULT;
if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
return -EINVAL;
switch (sync.flags & DMA_BUF_SYNC_RW) {
case DMA_BUF_SYNC_READ:
direction = DMA_FROM_DEVICE;
break;
case DMA_BUF_SYNC_WRITE:
direction = DMA_TO_DEVICE;
break;
case DMA_BUF_SYNC_RW:
direction = DMA_BIDIRECTIONAL;
break;
default:
return -EINVAL;
}
if (sync.flags & DMA_BUF_SYNC_END)
dma_buf_end_cpu_access(dmabuf, direction);
else
dma_buf_begin_cpu_access(dmabuf, direction);
We forgot to report the error back to userspace. Might as well fixup the callchain to propagate error from end-cpu-access as well. Found after updating igt/gem_concurrent_blit to exercise dmabuf mmaps vs the GPU.
EINTR? Do we need to make this ABI - I guess so? Tiago, do you have patches? See drmIoctl() in libdrm for what's needed on the userspace side if my guess is right.
EINTR is the easiest, but conceivably we could also get EIO and currently EAGAIN.
I am also seeing some strange timing dependent (i.e. valgrind doesn't show up anything client side and the tests then pass) failures (SIGSEGV, SIGBUS) with !llc.
Tiago, ping. Also probably a gap in igt coverage besides the kernel side. -Daniel
On 03/05/2016 06:34 AM, Daniel Vetter wrote:
On Mon, Feb 29, 2016 at 03:02:09PM +0000, Chris Wilson wrote:
On Mon, Feb 29, 2016 at 03:54:19PM +0100, Daniel Vetter wrote:
On Thu, Feb 25, 2016 at 06:01:22PM +0000, Chris Wilson wrote:
On Thu, Feb 11, 2016 at 08:04:51PM -0200, Tiago Vignatti wrote:
+static long dma_buf_ioctl(struct file *file,
unsigned int cmd, unsigned long arg)
+{
- struct dma_buf *dmabuf;
- struct dma_buf_sync sync;
- enum dma_data_direction direction;
- dmabuf = file->private_data;
- switch (cmd) {
- case DMA_BUF_IOCTL_SYNC:
if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
return -EFAULT;
if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
return -EINVAL;
switch (sync.flags & DMA_BUF_SYNC_RW) {
case DMA_BUF_SYNC_READ:
direction = DMA_FROM_DEVICE;
break;
case DMA_BUF_SYNC_WRITE:
direction = DMA_TO_DEVICE;
break;
case DMA_BUF_SYNC_RW:
direction = DMA_BIDIRECTIONAL;
break;
default:
return -EINVAL;
}
if (sync.flags & DMA_BUF_SYNC_END)
dma_buf_end_cpu_access(dmabuf, direction);
else
dma_buf_begin_cpu_access(dmabuf, direction);
We forgot to report the error back to userspace. Might as well fixup the callchain to propagate error from end-cpu-access as well. Found after updating igt/gem_concurrent_blit to exercise dmabuf mmaps vs the GPU.
EINTR? Do we need to make this ABI - I guess so? Tiago, do you have patches? See drmIoctl() in libdrm for what's needed on the userspace side if my guess is right.
EINTR is the easiest, but conceivably we could also get EIO and currently EAGAIN.
I am also seeing some strange timing dependent (i.e. valgrind doesn't show up anything client side and the tests then pass) failures (SIGSEGV, SIGBUS) with !llc.
Tiago, ping. Also probably a gap in igt coverage besides the kernel side. -Daniel
Hey guys! I'm back from vacation now. I'll take a look on it in the next days and then come back to you.
Tiago
On Mon, Mar 14, 2016 at 05:21:10PM -0300, Tiago Vignatti wrote:
On 03/05/2016 06:34 AM, Daniel Vetter wrote:
On Mon, Feb 29, 2016 at 03:02:09PM +0000, Chris Wilson wrote:
On Mon, Feb 29, 2016 at 03:54:19PM +0100, Daniel Vetter wrote:
On Thu, Feb 25, 2016 at 06:01:22PM +0000, Chris Wilson wrote:
On Thu, Feb 11, 2016 at 08:04:51PM -0200, Tiago Vignatti wrote:
+static long dma_buf_ioctl(struct file *file,
unsigned int cmd, unsigned long arg)
+{
- struct dma_buf *dmabuf;
- struct dma_buf_sync sync;
- enum dma_data_direction direction;
- dmabuf = file->private_data;
- switch (cmd) {
- case DMA_BUF_IOCTL_SYNC:
if (copy_from_user(&sync, (void __user *) arg, sizeof(sync)))
return -EFAULT;
if (sync.flags & ~DMA_BUF_SYNC_VALID_FLAGS_MASK)
return -EINVAL;
switch (sync.flags & DMA_BUF_SYNC_RW) {
case DMA_BUF_SYNC_READ:
direction = DMA_FROM_DEVICE;
break;
case DMA_BUF_SYNC_WRITE:
direction = DMA_TO_DEVICE;
break;
case DMA_BUF_SYNC_RW:
direction = DMA_BIDIRECTIONAL;
break;
default:
return -EINVAL;
}
if (sync.flags & DMA_BUF_SYNC_END)
dma_buf_end_cpu_access(dmabuf, direction);
else
dma_buf_begin_cpu_access(dmabuf, direction);
We forgot to report the error back to userspace. Might as well fixup the callchain to propagate error from end-cpu-access as well. Found after updating igt/gem_concurrent_blit to exercise dmabuf mmaps vs the GPU.
EINTR? Do we need to make this ABI - I guess so? Tiago, do you have patches? See drmIoctl() in libdrm for what's needed on the userspace side if my guess is right.
EINTR is the easiest, but conceivably we could also get EIO and currently EAGAIN.
I am also seeing some strange timing dependent (i.e. valgrind doesn't show up anything client side and the tests then pass) failures (SIGSEGV, SIGBUS) with !llc.
Tiago, ping. Also probably a gap in igt coverage besides the kernel side. -Daniel
Hey guys! I'm back from vacation now. I'll take a look on it in the next days and then come back to you.
An unpolished version: https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=breadcrumbs&id=b... -Chris
This patch adds ioctl-errors subtest to be used for exercising prime sync ioctl errors.
The subtest constantly interrupts, via signals, a function doing concurrent blits, to stress the correct usage of prime_sync_* and make sure these ioctl errors are handled accordingly. It is important to note that in case of a failure (e.g. a case where the ioctl is not retried after an error return) this test does not reliably catch the problem with 100% accuracy.
Cc: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com ---
Chris, your unpolished dma-buf patch for adding return error into the ioctl calls lgtm. Let me know if you think this kind of test is useful now in igt.
Thanks
Tiago
tests/prime_mmap_coherency.c | 87 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 87 insertions(+)
diff --git a/tests/prime_mmap_coherency.c b/tests/prime_mmap_coherency.c index 180d8a4..bae2144 100644 --- a/tests/prime_mmap_coherency.c +++ b/tests/prime_mmap_coherency.c @@ -180,6 +180,88 @@ static void test_write_flush(bool expect_stale_cache) munmap(ptr_cpu, width * height); }
+static void blit_and_cmp(void) +{ + drm_intel_bo *bo_1; + drm_intel_bo *bo_2; + uint32_t *ptr_cpu; + uint32_t *ptr2_cpu; + int dma_buf_fd, dma_buf2_fd, i; + int local_fd; + drm_intel_bufmgr *local_bufmgr; + struct intel_batchbuffer *local_batch; + + /* recreate process local variables */ + local_fd = drm_open_driver(DRIVER_INTEL); + local_bufmgr = drm_intel_bufmgr_gem_init(local_fd, 4096); + igt_assert(local_bufmgr); + + local_batch = intel_batchbuffer_alloc(local_bufmgr, intel_get_drm_devid(local_fd)); + igt_assert(local_batch); + + bo_1 = drm_intel_bo_alloc(local_bufmgr, "BO 1", width * height * 4, 4096); + dma_buf_fd = prime_handle_to_fd_for_mmap(local_fd, bo_1->handle); + igt_skip_on(errno == EINVAL); + + ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE, + MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr_cpu != MAP_FAILED); + + bo_2 = drm_intel_bo_alloc(local_bufmgr, "BO 2", width * height * 4, 4096); + dma_buf2_fd = prime_handle_to_fd_for_mmap(local_fd, bo_2->handle); + + ptr2_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE, + MAP_SHARED, dma_buf2_fd, 0); + igt_assert(ptr2_cpu != MAP_FAILED); + + /* Fill up BO 1 with '1's and BO 2 with '0's */ + prime_sync_start(dma_buf_fd, true); + memset(ptr_cpu, 0x11, width * height); + prime_sync_end(dma_buf_fd, true); + + prime_sync_start(dma_buf2_fd, true); + memset(ptr2_cpu, 0x00, width * height); + prime_sync_end(dma_buf2_fd, true); + + /* Copy BO 1 into BO 2, using blitter. */ + intel_copy_bo(local_batch, bo_2, bo_1, width * height); + usleep(0); /* let someone else claim the mutex */ + + /* Compare BOs. If prime_sync_* were executed properly, the caches + * should be synced. */ + prime_sync_start(dma_buf2_fd, true); + for (i = 0; i < (width * height) / 4; i++) + igt_fail_on_f(ptr2_cpu[i] != 0x11111111, "Found 0x%08x at offset 0x%08x\n", ptr2_cpu[i], i); + prime_sync_end(dma_buf2_fd, true); + + drm_intel_bo_unreference(bo_1); + drm_intel_bo_unreference(bo_2); + munmap(ptr_cpu, width * height); + munmap(ptr2_cpu, width * height); +} + +/* + * Constantly interrupt concurrent blits to stress out prime_sync_* and make + * sure these ioctl errors are handled accordingly. + * + * Important to note that in case of failure (e.g. in a case where the ioctl + * wouldn't try again in a return error) this test does not reliably catch the + * problem with 100% of accuracy. + */ +static void test_ioctl_errors(void) +{ + int i; + int num_children = 8*sysconf(_SC_NPROCESSORS_ONLN); + + igt_fork_signal_helper(); + igt_fork(child, num_children) { + for (i = 0; i < ROUNDS; i++) + blit_and_cmp(); + } + igt_waitchildren(); + igt_stop_signal_helper(); +} + int main(int argc, char **argv) { int i; @@ -235,6 +317,11 @@ int main(int argc, char **argv) igt_fail_on_f(!stale, "couldn't find any stale cache lines\n"); }
+ igt_subtest("ioctl-errors") { + igt_info("exercising concurrent blit to get ioctl errors\n"); + test_ioctl_errors(); + } + igt_fixture { intel_batchbuffer_free(batch); drm_intel_bufmgr_destroy(bufmgr);
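For readers outside igt: the prime_sync_start()/prime_sync_end() calls above wrap DMA_BUF_IOCTL_SYNC. A minimal sketch of such a wrapper, assuming the uapi from the kernel patch is available as <linux/dma-buf.h>; the real igt library helpers may differ in details:

#include <stdbool.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>
#include "igt.h"

static void sketch_prime_sync_start(int dmabuf_fd, bool write)
{
	struct dma_buf_sync sync;

	memset(&sync, 0, sizeof(sync));
	sync.flags = DMA_BUF_SYNC_START |
		     (write ? DMA_BUF_SYNC_RW : DMA_BUF_SYNC_READ);

	igt_assert(ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync) == 0);
}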
On Thu, Mar 17, 2016 at 03:18:03PM -0300, Tiago Vignatti wrote:
This patch adds ioctl-errors subtest to be used for exercising prime sync ioctl errors.
The subtest constantly interrupts, via signals, a function doing concurrent blits, to stress the correct usage of prime_sync_* and make sure these ioctl errors are handled accordingly. It is important to note that in case of a failure (e.g. a case where the ioctl is not retried after an error return) this test does not reliably catch the problem with 100% accuracy.
Cc: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
Chris, your unpolished dma-buf patch for adding return error into the ioctl calls lgtm. Let me know if you think this kind of test is useful now in igt.
Thanks
Tiago
tests/prime_mmap_coherency.c | 87 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 87 insertions(+)
diff --git a/tests/prime_mmap_coherency.c b/tests/prime_mmap_coherency.c index 180d8a4..bae2144 100644 --- a/tests/prime_mmap_coherency.c +++ b/tests/prime_mmap_coherency.c @@ -180,6 +180,88 @@ static void test_write_flush(bool expect_stale_cache) munmap(ptr_cpu, width * height); }
+static void blit_and_cmp(void) +{
- drm_intel_bo *bo_1;
- drm_intel_bo *bo_2;
- uint32_t *ptr_cpu;
- uint32_t *ptr2_cpu;
- int dma_buf_fd, dma_buf2_fd, i;
- int local_fd;
- drm_intel_bufmgr *local_bufmgr;
- struct intel_batchbuffer *local_batch;
- /* recreate process local variables */
- local_fd = drm_open_driver(DRIVER_INTEL);
- local_bufmgr = drm_intel_bufmgr_gem_init(local_fd, 4096);
- igt_assert(local_bufmgr);
- local_batch = intel_batchbuffer_alloc(local_bufmgr, intel_get_drm_devid(local_fd));
- igt_assert(local_batch);
- bo_1 = drm_intel_bo_alloc(local_bufmgr, "BO 1", width * height * 4, 4096);
- dma_buf_fd = prime_handle_to_fd_for_mmap(local_fd, bo_1->handle);
- igt_skip_on(errno == EINVAL);
- ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
MAP_SHARED, dma_buf_fd, 0);
- igt_assert(ptr_cpu != MAP_FAILED);
- bo_2 = drm_intel_bo_alloc(local_bufmgr, "BO 2", width * height * 4, 4096);
- dma_buf2_fd = prime_handle_to_fd_for_mmap(local_fd, bo_2->handle);
- ptr2_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
MAP_SHARED, dma_buf2_fd, 0);
- igt_assert(ptr2_cpu != MAP_FAILED);
- /* Fill up BO 1 with '1's and BO 2 with '0's */
- prime_sync_start(dma_buf_fd, true);
- memset(ptr_cpu, 0x11, width * height);
- prime_sync_end(dma_buf_fd, true);
- prime_sync_start(dma_buf2_fd, true);
- memset(ptr2_cpu, 0x00, width * height);
- prime_sync_end(dma_buf2_fd, true);
- /* Copy BO 1 into BO 2, using blitter. */
- intel_copy_bo(local_batch, bo_2, bo_1, width * height);
- usleep(0); /* let someone else claim the mutex */
- /* Compare BOs. If prime_sync_* were executed properly, the caches
* should be synced. */
- prime_sync_start(dma_buf2_fd, true);
Maybe false here? Not that it makes any difference for the driver atm.
- for (i = 0; i < (width * height) / 4; i++)
igt_fail_on_f(ptr2_cpu[i] != 0x11111111, "Found 0x%08x at offset 0x%08x\n", ptr2_cpu[i], i);
- prime_sync_end(dma_buf2_fd, true);
- drm_intel_bo_unreference(bo_1);
- drm_intel_bo_unreference(bo_2);
- munmap(ptr_cpu, width * height);
- munmap(ptr2_cpu, width * height);
Do we have anything that verifies that dmabuf maps persist beyond gem_close() on the original bo?
Yes, that test should hit all interruptible paths we have in dmabuf and would be a great addition to igt. -Chris
On 03/17/2016 06:01 PM, Chris Wilson wrote:
On Thu, Mar 17, 2016 at 03:18:03PM -0300, Tiago Vignatti wrote:
This patch adds ioctl-errors subtest to be used for exercising prime sync ioctl errors.
The subtest constantly interrupts, via signals, a function doing concurrent blits, to stress the correct usage of prime_sync_* and make sure these ioctl errors are handled accordingly. It is important to note that in case of a failure (e.g. a case where the ioctl is not retried after an error return) this test does not reliably catch the problem with 100% accuracy.
Cc: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
Chris, your unpolished dma-buf patch for adding return error into the ioctl calls lgtm. Let me know if you think this kind of test is useful now in igt.
Thanks
Tiago
tests/prime_mmap_coherency.c | 87 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 87 insertions(+)
diff --git a/tests/prime_mmap_coherency.c b/tests/prime_mmap_coherency.c index 180d8a4..bae2144 100644 --- a/tests/prime_mmap_coherency.c +++ b/tests/prime_mmap_coherency.c @@ -180,6 +180,88 @@ static void test_write_flush(bool expect_stale_cache) munmap(ptr_cpu, width * height); }
+static void blit_and_cmp(void) +{
- drm_intel_bo *bo_1;
- drm_intel_bo *bo_2;
- uint32_t *ptr_cpu;
- uint32_t *ptr2_cpu;
- int dma_buf_fd, dma_buf2_fd, i;
- int local_fd;
- drm_intel_bufmgr *local_bufmgr;
- struct intel_batchbuffer *local_batch;
- /* recreate process local variables */
- local_fd = drm_open_driver(DRIVER_INTEL);
- local_bufmgr = drm_intel_bufmgr_gem_init(local_fd, 4096);
- igt_assert(local_bufmgr);
- local_batch = intel_batchbuffer_alloc(local_bufmgr, intel_get_drm_devid(local_fd));
- igt_assert(local_batch);
- bo_1 = drm_intel_bo_alloc(local_bufmgr, "BO 1", width * height * 4, 4096);
- dma_buf_fd = prime_handle_to_fd_for_mmap(local_fd, bo_1->handle);
- igt_skip_on(errno == EINVAL);
- ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
MAP_SHARED, dma_buf_fd, 0);
- igt_assert(ptr_cpu != MAP_FAILED);
- bo_2 = drm_intel_bo_alloc(local_bufmgr, "BO 2", width * height * 4, 4096);
- dma_buf2_fd = prime_handle_to_fd_for_mmap(local_fd, bo_2->handle);
- ptr2_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
MAP_SHARED, dma_buf2_fd, 0);
- igt_assert(ptr2_cpu != MAP_FAILED);
- /* Fill up BO 1 with '1's and BO 2 with '0's */
- prime_sync_start(dma_buf_fd, true);
- memset(ptr_cpu, 0x11, width * height);
- prime_sync_end(dma_buf_fd, true);
- prime_sync_start(dma_buf2_fd, true);
- memset(ptr2_cpu, 0x00, width * height);
- prime_sync_end(dma_buf2_fd, true);
- /* Copy BO 1 into BO 2, using blitter. */
- intel_copy_bo(local_batch, bo_2, bo_1, width * height);
- usleep(0); /* let someone else claim the mutex */
- /* Compare BOs. If prime_sync_* were executed properly, the caches
* should be synced. */
- prime_sync_start(dma_buf2_fd, true);
Maybe false here? Not that it makes any difference for the driver atm.
oh, my bad.
- for (i = 0; i < (width * height) / 4; i++)
igt_fail_on_f(ptr2_cpu[i] != 0x11111111, "Found 0x%08x at offset 0x%08x\n", ptr2_cpu[i], i);
- prime_sync_end(dma_buf2_fd, true);
- drm_intel_bo_unreference(bo_1);
- drm_intel_bo_unreference(bo_2);
- munmap(ptr_cpu, width * height);
- munmap(ptr2_cpu, width * height);
Do we have anything that verifies that dmabuf maps persist beyond gem_close() on the original bo?
that's test_refcounting in prime_mmap.c
Yes, that test should hit all interruptible paths we have in dmabuf and would be a great addition to igt.
cool, thanks. I'm resending now v2.
Tiago
This patch adds ioctl-errors subtest to be used for exercising prime sync ioctl errors.
The subtest constantly interrupts, via signals, a function doing concurrent blits, to stress the correct usage of prime_sync_* and make sure these ioctl errors are handled accordingly. It is important to note that in case of a failure (e.g. a case where the ioctl is not retried after an error return) this test does not reliably catch the problem with 100% accuracy.
v2: fix prime sync direction when reading mmap'ed file.
Cc: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com --- tests/prime_mmap_coherency.c | 87 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 87 insertions(+)
diff --git a/tests/prime_mmap_coherency.c b/tests/prime_mmap_coherency.c index 180d8a4..80d1c1f 100644 --- a/tests/prime_mmap_coherency.c +++ b/tests/prime_mmap_coherency.c @@ -180,6 +180,88 @@ static void test_write_flush(bool expect_stale_cache) munmap(ptr_cpu, width * height); }
+static void blit_and_cmp(void) +{ + drm_intel_bo *bo_1; + drm_intel_bo *bo_2; + uint32_t *ptr_cpu; + uint32_t *ptr2_cpu; + int dma_buf_fd, dma_buf2_fd, i; + int local_fd; + drm_intel_bufmgr *local_bufmgr; + struct intel_batchbuffer *local_batch; + + /* recreate process local variables */ + local_fd = drm_open_driver(DRIVER_INTEL); + local_bufmgr = drm_intel_bufmgr_gem_init(local_fd, 4096); + igt_assert(local_bufmgr); + + local_batch = intel_batchbuffer_alloc(local_bufmgr, intel_get_drm_devid(local_fd)); + igt_assert(local_batch); + + bo_1 = drm_intel_bo_alloc(local_bufmgr, "BO 1", width * height * 4, 4096); + dma_buf_fd = prime_handle_to_fd_for_mmap(local_fd, bo_1->handle); + igt_skip_on(errno == EINVAL); + + ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE, + MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr_cpu != MAP_FAILED); + + bo_2 = drm_intel_bo_alloc(local_bufmgr, "BO 2", width * height * 4, 4096); + dma_buf2_fd = prime_handle_to_fd_for_mmap(local_fd, bo_2->handle); + + ptr2_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE, + MAP_SHARED, dma_buf2_fd, 0); + igt_assert(ptr2_cpu != MAP_FAILED); + + /* Fill up BO 1 with '1's and BO 2 with '0's */ + prime_sync_start(dma_buf_fd, true); + memset(ptr_cpu, 0x11, width * height); + prime_sync_end(dma_buf_fd, true); + + prime_sync_start(dma_buf2_fd, true); + memset(ptr2_cpu, 0x00, width * height); + prime_sync_end(dma_buf2_fd, true); + + /* Copy BO 1 into BO 2, using blitter. */ + intel_copy_bo(local_batch, bo_2, bo_1, width * height); + usleep(0); /* let someone else claim the mutex */ + + /* Compare BOs. If prime_sync_* were executed properly, the caches + * should be synced. */ + prime_sync_start(dma_buf2_fd, false); + for (i = 0; i < (width * height) / 4; i++) + igt_fail_on_f(ptr2_cpu[i] != 0x11111111, "Found 0x%08x at offset 0x%08x\n", ptr2_cpu[i], i); + prime_sync_end(dma_buf2_fd, false); + + drm_intel_bo_unreference(bo_1); + drm_intel_bo_unreference(bo_2); + munmap(ptr_cpu, width * height); + munmap(ptr2_cpu, width * height); +} + +/* + * Constantly interrupt concurrent blits to stress out prime_sync_* and make + * sure these ioctl errors are handled accordingly. + * + * Important to note that in case of failure (e.g. in a case where the ioctl + * wouldn't try again in a return error) this test does not reliably catch the + * problem with 100% of accuracy. + */ +static void test_ioctl_errors(void) +{ + int i; + int num_children = 8*sysconf(_SC_NPROCESSORS_ONLN); + + igt_fork_signal_helper(); + igt_fork(child, num_children) { + for (i = 0; i < ROUNDS; i++) + blit_and_cmp(); + } + igt_waitchildren(); + igt_stop_signal_helper(); +} + int main(int argc, char **argv) { int i; @@ -235,6 +317,11 @@ int main(int argc, char **argv) igt_fail_on_f(!stale, "couldn't find any stale cache lines\n"); }
+ igt_subtest("ioctl-errors") { + igt_info("exercising concurrent blit to get ioctl errors\n"); + test_ioctl_errors(); + } + igt_fixture { intel_batchbuffer_free(batch); drm_intel_bufmgr_destroy(bufmgr);
On Thu, Mar 17, 2016 at 06:18:06PM -0300, Tiago Vignatti wrote:
+static void test_ioctl_errors(void) +{
- int i;
- int num_children = 8*sysconf(_SC_NPROCESSORS_ONLN);
- igt_fork_signal_helper();
- igt_fork(child, num_children) {
Actually there is a danger here in only using a fork-bomb: we may lose out on signal interruptions due to busyspins on the mutex (i.e. the contention hiding critical paths, since we crucially rely on timing).
For fun, I would use:
int ncpus = sysconf(_SC_NPROCESSORS_ONLN);
int num_children;

igt_fork_signal_helper();
for (int num_children = 1; num_children <= 8 * ncpus; num_children <<= 1) {
	igt_fork(child, num_children) {
		struct timespec start = {};
		while (igt_seconds_elapsed(&start) <= num_children)
			blit_and_cmp();
	}
}
igt_waitchildren();
igt_stop_signal_helper();
-Chris
On Fri, Mar 18, 2016 at 09:44:40AM +0000, Chris Wilson wrote:
On Thu, Mar 17, 2016 at 06:18:06PM -0300, Tiago Vignatti wrote:
+static void test_ioctl_errors(void) +{
- int i;
- int num_children = 8*sysconf(_SC_NPROCESSORS_ONLN);
- igt_fork_signal_helper();
- igt_fork(child, num_children) {
Actually there is a danger here in only using a fork-bomb: we may lose out on signal interruptions due to busyspins on the mutex (i.e. the contention hiding critical paths, since we crucially rely on timing).
For fun, I would use:
int ncpus = sysconf(_SC_NPROCESSORS_ONLN); int num_children;
igt_fork_signal_helper(); for (int num_children = 1; num_children <= 8 *ncpus; num_children <<= 1) { igt_fork(child, num_children) { struct timespec start = {}; while (igt_seconds_elapsed(&start) <= num_children) blit_and_cmp(); } } igt_waitchildren();
With the bug ofc! -Chris
This patch adds ioctl-errors subtest to be used for exercising prime sync ioctl errors.
The subtest constantly interrupts, via signals, a function doing concurrent blits, to stress the correct usage of prime_sync_* and make sure these ioctl errors are handled accordingly. It is important to note that in case of a failure (e.g. a case where the ioctl is not retried after an error return) this test does not reliably catch the problem with 100% accuracy.
v2: fix prime sync direction when reading mmap'ed file. v3: change the upper bound using time rather than loops
Cc: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com --- tests/prime_mmap_coherency.c | 89 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 89 insertions(+)
diff --git a/tests/prime_mmap_coherency.c b/tests/prime_mmap_coherency.c index 180d8a4..d2b2a4f 100644 --- a/tests/prime_mmap_coherency.c +++ b/tests/prime_mmap_coherency.c @@ -180,6 +180,90 @@ static void test_write_flush(bool expect_stale_cache) munmap(ptr_cpu, width * height); }
+static void blit_and_cmp(void) +{ + drm_intel_bo *bo_1; + drm_intel_bo *bo_2; + uint32_t *ptr_cpu; + uint32_t *ptr2_cpu; + int dma_buf_fd, dma_buf2_fd, i; + int local_fd; + drm_intel_bufmgr *local_bufmgr; + struct intel_batchbuffer *local_batch; + + /* recreate process local variables */ + local_fd = drm_open_driver(DRIVER_INTEL); + local_bufmgr = drm_intel_bufmgr_gem_init(local_fd, 4096); + igt_assert(local_bufmgr); + + local_batch = intel_batchbuffer_alloc(local_bufmgr, intel_get_drm_devid(local_fd)); + igt_assert(local_batch); + + bo_1 = drm_intel_bo_alloc(local_bufmgr, "BO 1", width * height * 4, 4096); + dma_buf_fd = prime_handle_to_fd_for_mmap(local_fd, bo_1->handle); + igt_skip_on(errno == EINVAL); + + ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE, + MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr_cpu != MAP_FAILED); + + bo_2 = drm_intel_bo_alloc(local_bufmgr, "BO 2", width * height * 4, 4096); + dma_buf2_fd = prime_handle_to_fd_for_mmap(local_fd, bo_2->handle); + + ptr2_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE, + MAP_SHARED, dma_buf2_fd, 0); + igt_assert(ptr2_cpu != MAP_FAILED); + + /* Fill up BO 1 with '1's and BO 2 with '0's */ + prime_sync_start(dma_buf_fd, true); + memset(ptr_cpu, 0x11, width * height); + prime_sync_end(dma_buf_fd, true); + + prime_sync_start(dma_buf2_fd, true); + memset(ptr2_cpu, 0x00, width * height); + prime_sync_end(dma_buf2_fd, true); + + /* Copy BO 1 into BO 2, using blitter. */ + intel_copy_bo(local_batch, bo_2, bo_1, width * height); + usleep(0); /* let someone else claim the mutex */ + + /* Compare BOs. If prime_sync_* were executed properly, the caches + * should be synced. */ + prime_sync_start(dma_buf2_fd, false); + for (i = 0; i < (width * height) / 4; i++) + igt_fail_on_f(ptr2_cpu[i] != 0x11111111, "Found 0x%08x at offset 0x%08x\n", ptr2_cpu[i], i); + prime_sync_end(dma_buf2_fd, false); + + drm_intel_bo_unreference(bo_1); + drm_intel_bo_unreference(bo_2); + munmap(ptr_cpu, width * height); + munmap(ptr2_cpu, width * height); +} + +/* + * Constantly interrupt concurrent blits to stress out prime_sync_* and make + * sure these ioctl errors are handled accordingly. + * + * Important to note that in case of failure (e.g. in a case where the ioctl + * wouldn't try again in a return error) this test does not reliably catch the + * problem with 100% of accuracy. + */ +static void test_ioctl_errors(void) +{ + int ncpus = sysconf(_SC_NPROCESSORS_ONLN); + + igt_fork_signal_helper(); + for (int num_children = 1; num_children <= 8 *ncpus; num_children <<= 1) { + igt_fork(child, num_children) { + struct timespec start = {}; + while (igt_nsec_elapsed(&start) <= num_children) + blit_and_cmp(); + } + igt_waitchildren(); + } + igt_stop_signal_helper(); +} + int main(int argc, char **argv) { int i; @@ -235,6 +319,11 @@ int main(int argc, char **argv) igt_fail_on_f(!stale, "couldn't find any stale cache lines\n"); }
+ igt_subtest("ioctl-errors") { + igt_info("exercising concurrent blit to get ioctl errors\n"); + test_ioctl_errors(); + } + igt_fixture { intel_batchbuffer_free(batch); drm_intel_bufmgr_destroy(bufmgr);
On Fri, Mar 18, 2016 at 03:08:56PM -0300, Tiago Vignatti wrote:
This patch adds ioctl-errors subtest to be used for exercising prime sync ioctl errors.
The subtest constantly interrupts via signals a function doing concurrent blit to stress out the right usage of prime_sync_*, making sure these ioctl errors are handled accordingly. Important to note that in case of failure (e.g. in a case where the ioctl wouldn't try again in a return error) this test does not reliably catch the problem with 100% of accuracy.
v2: fix prime sync direction when reading mmap'ed file. v3: change the upper bound using time rather than loops
Cc: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
I'm probably blind, but where is the reviewed kernel patch for this? If it's somewhere hidden, please resubmit with all the whizzbang stuff needed for merging added ;-)
Thanks, Daniel
On 03/18/2016 03:11 PM, Daniel Vetter wrote:
On Fri, Mar 18, 2016 at 03:08:56PM -0300, Tiago Vignatti wrote:
This patch adds ioctl-errors subtest to be used for exercising prime sync ioctl errors.
The subtest constantly interrupts via signals a function doing concurrent blit to stress out the right usage of prime_sync_*, making sure these ioctl errors are handled accordingly. Important to note that in case of failure (e.g. in a case where the ioctl wouldn't try again in a return error) this test does not reliably catch the problem with 100% of accuracy.
v2: fix prime sync direction when reading mmap'ed file. v3: change the upper bound using time rather than loops
Cc: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
I'm probably blind, but where is the reviewed kernel patch for this? If it's somewhere hidden, please resubmit with all the whizzbang stuff needed for merging added ;-)
Thanks, Daniel
You're not blind Daniel :) Chris will be sending the kernel side but regardless this igt test should be good to go even without the kernel patch.
Tiago
On Fri, Mar 18, 2016 at 03:08:56PM -0300, Tiago Vignatti wrote:
This patch adds ioctl-errors subtest to be used for exercising prime sync ioctl errors.
The subtest constantly interrupts via signals a function doing concurrent blit to stress out the right usage of prime_sync_*, making sure these ioctl errors are handled accordingly. Important to note that in case of failure (e.g. in a case where the ioctl wouldn't try again in a return error) this test does not reliably catch the problem with 100% of accuracy.
v2: fix prime sync direction when reading mmap'ed file. v3: change the upper bound using time rather than loops
Cc: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
+static void test_ioctl_errors(void) +{
+ int ncpus = sysconf(_SC_NPROCESSORS_ONLN);
+
+ igt_fork_signal_helper();
+ for (int num_children = 1; num_children <= 8 *ncpus; num_children <<= 1) {
Hmm, that's a lot of buffers....
I'm going to stick a couple of intel_require_memory() and intel_check_memory() calls here.
Wait that's no moon. Oops, give me back my swap!
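A sketch of the kind of guard being suggested, assuming IGT's intel_require_memory() helper and a hypothetical placement at the top of test_ioctl_errors(); the exact accounting Chris pushed may differ:

	/* Hypothetical: skip rather than swap-thrash if the worst case
	 * (8 * ncpus children, each allocating two width*height*4 BOs and
	 * mmapping both) does not fit in RAM. */
	intel_require_memory(2 * 8 * ncpus, width * height * 4, CHECK_RAM);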
igt_fork(child, num_children) {
struct timespec start = {};
while (igt_nsec_elapsed(&start) <= num_children)
igt_nsec_elapsed().... Barely any time at all!
Presumed you wanted seconds, fixed the memleak and pushed. Thanks, -Chris
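For reference, the pushed fix presumably switches the loop in test_ioctl_errors() to seconds, along these lines (igt_seconds_elapsed() being the seconds-granularity counterpart of igt_nsec_elapsed(); treat the exact form as an assumption):

		igt_fork(child, num_children) {
			struct timespec start = {};

			/* run each child for ~num_children seconds, not nanoseconds */
			while (igt_seconds_elapsed(&start) <= num_children)
				blit_and_cmp();
		}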
This function is meant to be used with dma-buf mmap, when finishing the CPU access of the mapped pointer.
The error case should be rare, though: it requires the buffer to become active during the sync period and end_cpu_access to be interrupted. So we use an uninterruptible mutex_lock and just print an error if it ever happens.
v2: disable interruption to make sure errors are reported. v3: update to the new end_cpu_access API. v7: use .write = false because end_cpu_access doesn't need to know whether the access was a write.
Reviewed-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com --- drivers/gpu/drm/i915/i915_gem_dmabuf.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c index 65ab2bd..8c9ed2a 100644 --- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c @@ -212,6 +212,27 @@ static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_dire return ret; }
+static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction) +{ + struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf); + struct drm_device *dev = obj->base.dev; + struct drm_i915_private *dev_priv = to_i915(dev); + bool was_interruptible; + int ret; + + mutex_lock(&dev->struct_mutex); + was_interruptible = dev_priv->mm.interruptible; + dev_priv->mm.interruptible = false; + + ret = i915_gem_object_set_to_gtt_domain(obj, false); + + dev_priv->mm.interruptible = was_interruptible; + mutex_unlock(&dev->struct_mutex); + + if (unlikely(ret)) + DRM_ERROR("unable to flush buffer following CPU access; rendering may be corrupt\n"); +} + static const struct dma_buf_ops i915_dmabuf_ops = { .map_dma_buf = i915_gem_map_dma_buf, .unmap_dma_buf = i915_gem_unmap_dma_buf, @@ -224,6 +245,7 @@ static const struct dma_buf_ops i915_dmabuf_ops = { .vmap = i915_gem_dmabuf_vmap, .vunmap = i915_gem_dmabuf_vunmap, .begin_cpu_access = i915_gem_begin_cpu_access, + .end_cpu_access = i915_gem_end_cpu_access, };
struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
Userspace is in charge of flushing the CPU caches, by wrapping its mmap access with begin{,end}_cpu_access.
v2: Remove the LLC check because we have dma-buf sync providers now. Also, fix it to return before transferring ownership when mmap fails. v3: Fix return values. v4: !obj->base.filp is user triggerable, so remove the WARN_ON.
Reviewed-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com --- drivers/gpu/drm/i915/i915_gem_dmabuf.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c index 8c9ed2a..1f3eef6 100644 --- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c @@ -193,7 +193,23 @@ static void i915_gem_dmabuf_kunmap(struct dma_buf *dma_buf, unsigned long page_n
static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *vma) { - return -EINVAL; + struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf); + int ret; + + if (obj->base.size < vma->vm_end - vma->vm_start) + return -EINVAL; + + if (!obj->base.filp) + return -ENODEV; + + ret = obj->base.filp->f_op->mmap(obj->base.filp, vma); + if (ret) + return ret; + + fput(vma->vm_file); + vma->vm_file = get_file(obj->base.filp); + + return 0; }
static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
This patch moves the userptr definitions and helper implementations that were local to gem_userptr_benchmark and gem_userptr_blits into the library, so other tests can make use of them as well. There are no functional changes.
v2: added a __-prefixed variant so errors can be handled back in the caller; brought gem_userptr_sync back to gem_userptr_blits; added gtkdoc.
Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com --- benchmarks/gem_userptr_benchmark.c | 55 +++----------------- lib/ioctl_wrappers.c | 41 +++++++++++++++ lib/ioctl_wrappers.h | 13 +++++ tests/gem_userptr_blits.c | 104 ++++++++++--------------------------- 4 files changed, 86 insertions(+), 127 deletions(-)
diff --git a/benchmarks/gem_userptr_benchmark.c b/benchmarks/gem_userptr_benchmark.c index 1eae7ff..f7716df 100644 --- a/benchmarks/gem_userptr_benchmark.c +++ b/benchmarks/gem_userptr_benchmark.c @@ -58,17 +58,6 @@ #define PAGE_SIZE 4096 #endif
-#define LOCAL_I915_GEM_USERPTR 0x33 -#define LOCAL_IOCTL_I915_GEM_USERPTR DRM_IOWR (DRM_COMMAND_BASE + LOCAL_I915_GEM_USERPTR, struct local_i915_gem_userptr) -struct local_i915_gem_userptr { - uint64_t user_ptr; - uint64_t user_size; - uint32_t flags; -#define LOCAL_I915_USERPTR_READ_ONLY (1<<0) -#define LOCAL_I915_USERPTR_UNSYNCHRONIZED (1<<31) - uint32_t handle; -}; - static uint32_t userptr_flags = LOCAL_I915_USERPTR_UNSYNCHRONIZED;
#define BO_SIZE (65536) @@ -83,30 +72,6 @@ static void gem_userptr_test_synchronized(void) userptr_flags = 0; }
-static int gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t *handle) -{ - struct local_i915_gem_userptr userptr; - int ret; - - userptr.user_ptr = (uintptr_t)ptr; - userptr.user_size = size; - userptr.flags = userptr_flags; - if (read_only) - userptr.flags |= LOCAL_I915_USERPTR_READ_ONLY; - - ret = drmIoctl(fd, LOCAL_IOCTL_I915_GEM_USERPTR, &userptr); - if (ret) - ret = errno; - igt_skip_on_f(ret == ENODEV && - (userptr_flags & LOCAL_I915_USERPTR_UNSYNCHRONIZED) == 0 && - !read_only, - "Skipping, synchronized mappings with no kernel CONFIG_MMU_NOTIFIER?"); - if (ret == 0) - *handle = userptr.handle; - - return ret; -} - static void **handle_ptr_map; static unsigned int num_handle_ptr_map;
@@ -144,8 +109,7 @@ static uint32_t create_userptr_bo(int fd, int size) ret = posix_memalign(&ptr, PAGE_SIZE, size); igt_assert(ret == 0);
- ret = gem_userptr(fd, (uint32_t *)ptr, size, 0, &handle); - igt_assert(ret == 0); + gem_userptr(fd, (uint32_t *)ptr, size, 0, userptr_flags, &handle); add_handle_ptr(handle, ptr);
return handle; @@ -167,7 +131,7 @@ static int has_userptr(int fd) assert(posix_memalign(&ptr, PAGE_SIZE, PAGE_SIZE) == 0); oldflags = userptr_flags; gem_userptr_test_unsynchronized(); - ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle); + ret = __gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle); userptr_flags = oldflags; if (ret != 0) { free(ptr); @@ -379,9 +343,7 @@ static void test_impact_overlap(int fd, const char *prefix)
for (i = 0, p = block; i < nr_bos[subtest]; i++, p += PAGE_SIZE) - ret = gem_userptr(fd, (uint32_t *)p, BO_SIZE, 0, - &handles[i]); - igt_assert(ret == 0); + gem_userptr(fd, (uint32_t *)p, BO_SIZE, 0, userptr_flags, &handles[i]); }
if (nr_bos[subtest] > 0) @@ -427,7 +389,6 @@ static void test_single(int fd) char *ptr, *bo_ptr; uint32_t handle = 0; unsigned long iter = 0; - int ret; unsigned long map_size = BO_SIZE + PAGE_SIZE - 1;
ptr = mmap(NULL, map_size, PROT_READ | PROT_WRITE, @@ -439,8 +400,7 @@ static void test_single(int fd) start_test(test_duration_sec);
while (run_test) { - ret = gem_userptr(fd, bo_ptr, BO_SIZE, 0, &handle); - assert(ret == 0); + gem_userptr(fd, bo_ptr, BO_SIZE, 0, userptr_flags, &handle); gem_close(fd, handle); iter++; } @@ -456,7 +416,6 @@ static void test_multiple(int fd, unsigned int batch, int random) uint32_t handles[10000]; int map[10000]; unsigned long iter = 0; - int ret; int i; unsigned long map_size = batch * BO_SIZE + PAGE_SIZE - 1;
@@ -478,10 +437,8 @@ static void test_multiple(int fd, unsigned int batch, int random) if (random) igt_permute_array(map, batch, igt_exchange_int); for (i = 0; i < batch; i++) { - ret = gem_userptr(fd, bo_ptr + map[i] * BO_SIZE, - BO_SIZE, - 0, &handles[i]); - assert(ret == 0); + gem_userptr(fd, bo_ptr + map[i] * BO_SIZE, BO_SIZE, + 0, userptr_flags, &handles[i]); } if (random) igt_permute_array(map, batch, igt_exchange_int); diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c index e348f26..6cad8a2 100644 --- a/lib/ioctl_wrappers.c +++ b/lib/ioctl_wrappers.c @@ -871,6 +871,47 @@ void gem_context_require_ban_period(int fd) igt_require(has_ban_period); }
+int __gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t flags, uint32_t *handle) +{ + struct local_i915_gem_userptr userptr; + int ret; + + memset(&userptr, 0, sizeof(userptr)); + userptr.user_ptr = (uintptr_t)ptr; + userptr.user_size = size; + userptr.flags = flags; + if (read_only) + userptr.flags |= LOCAL_I915_USERPTR_READ_ONLY; + + ret = drmIoctl(fd, LOCAL_IOCTL_I915_GEM_USERPTR, &userptr); + if (ret) + ret = errno; + igt_skip_on_f(ret == ENODEV && + (flags & LOCAL_I915_USERPTR_UNSYNCHRONIZED) == 0 && + !read_only, + "Skipping, synchronized mappings with no kernel CONFIG_MMU_NOTIFIER?"); + if (ret == 0) + *handle = userptr.handle; + + return ret; +} + +/** + * gem_userptr: + * @fd: open i915 drm file descriptor + * @ptr: userptr pointer to be passed + * @size: desired size of the buffer + * @read_only: specify whether userptr is opened read only + * @flags: other userptr flags + * @handle: returned handle for the object + * + * Returns userptr handle for the GEM object. + */ +void gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t flags, uint32_t *handle) +{ + igt_assert_eq(__gem_userptr(fd, ptr, size, read_only, flags, handle), 0); +} + /** * gem_sw_finish: * @fd: open i915 drm file descriptor diff --git a/lib/ioctl_wrappers.h b/lib/ioctl_wrappers.h index fe2f687..bb8a858 100644 --- a/lib/ioctl_wrappers.h +++ b/lib/ioctl_wrappers.h @@ -112,6 +112,19 @@ void gem_context_require_param(int fd, uint64_t param); void gem_context_get_param(int fd, struct local_i915_gem_context_param *p); void gem_context_set_param(int fd, struct local_i915_gem_context_param *p);
+#define LOCAL_I915_GEM_USERPTR 0x33 +#define LOCAL_IOCTL_I915_GEM_USERPTR DRM_IOWR (DRM_COMMAND_BASE + LOCAL_I915_GEM_USERPTR, struct local_i915_gem_userptr) +struct local_i915_gem_userptr { + uint64_t user_ptr; + uint64_t user_size; + uint32_t flags; +#define LOCAL_I915_USERPTR_READ_ONLY (1<<0) +#define LOCAL_I915_USERPTR_UNSYNCHRONIZED (1<<31) + uint32_t handle; +}; +void gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t flags, uint32_t *handle); +int __gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t flags, uint32_t *handle); + void gem_sw_finish(int fd, uint32_t handle);
bool gem_bo_busy(int fd, uint32_t handle); diff --git a/tests/gem_userptr_blits.c b/tests/gem_userptr_blits.c index 6d38260..95d7ca2 100644 --- a/tests/gem_userptr_blits.c +++ b/tests/gem_userptr_blits.c @@ -61,17 +61,6 @@ #define PAGE_SIZE 4096 #endif
-#define LOCAL_I915_GEM_USERPTR 0x33 -#define LOCAL_IOCTL_I915_GEM_USERPTR DRM_IOWR (DRM_COMMAND_BASE + LOCAL_I915_GEM_USERPTR, struct local_i915_gem_userptr) -struct local_i915_gem_userptr { - uint64_t user_ptr; - uint64_t user_size; - uint32_t flags; -#define LOCAL_I915_USERPTR_READ_ONLY (1<<0) -#define LOCAL_I915_USERPTR_UNSYNCHRONIZED (1<<31) - uint32_t handle; -}; - static uint32_t userptr_flags = LOCAL_I915_USERPTR_UNSYNCHRONIZED;
#define WIDTH 512 @@ -89,32 +78,6 @@ static void gem_userptr_test_synchronized(void) userptr_flags = 0; }
-static int gem_userptr(int fd, void *ptr, int size, int read_only, uint32_t *handle) -{ - struct local_i915_gem_userptr userptr; - int ret; - - memset(&userptr, 0, sizeof(userptr)); - userptr.user_ptr = (uintptr_t)ptr; - userptr.user_size = size; - userptr.flags = userptr_flags; - if (read_only) - userptr.flags |= LOCAL_I915_USERPTR_READ_ONLY; - - ret = drmIoctl(fd, LOCAL_IOCTL_I915_GEM_USERPTR, &userptr); - if (ret) - ret = errno; - igt_skip_on_f(ret == ENODEV && - (userptr_flags & LOCAL_I915_USERPTR_UNSYNCHRONIZED) == 0 && - !read_only, - "Skipping, synchronized mappings with no kernel CONFIG_MMU_NOTIFIER?"); - if (ret == 0) - *handle = userptr.handle; - - return ret; -} - - static void gem_userptr_sync(int fd, uint32_t handle) { gem_set_domain(fd, handle, I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_CPU); @@ -289,10 +252,9 @@ static uint32_t create_userptr(int fd, uint32_t val, uint32_t *ptr) { uint32_t handle; - int i, ret; + int i;
- ret = gem_userptr(fd, ptr, sizeof(linear), 0, &handle); - igt_assert_eq(ret, 0); + gem_userptr(fd, ptr, sizeof(linear), 0, userptr_flags, &handle); igt_assert(handle != 0);
/* Fill the BO with dwords starting at val */ @@ -363,7 +325,6 @@ static uint32_t create_userptr_bo(int fd, uint64_t size) { void *ptr; uint32_t handle; - int ret;
ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, @@ -371,8 +332,7 @@ static uint32_t create_userptr_bo(int fd, uint64_t size) -1, 0); igt_assert(ptr != MAP_FAILED);
- ret = gem_userptr(fd, (uint32_t *)ptr, size, 0, &handle); - igt_assert_eq(ret, 0); + gem_userptr(fd, (uint32_t *)ptr, size, 0, userptr_flags, &handle); add_handle_ptr(handle, ptr, size);
return handle; @@ -450,7 +410,7 @@ static int has_userptr(int fd) igt_assert(posix_memalign(&ptr, PAGE_SIZE, PAGE_SIZE) == 0); oldflags = userptr_flags; gem_userptr_test_unsynchronized(); - ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle); + ret = __gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle); userptr_flags = oldflags; if (ret != 0) { free(ptr); @@ -509,7 +469,7 @@ static int test_access_control(int fd)
igt_assert(posix_memalign(&ptr, PAGE_SIZE, PAGE_SIZE) == 0);
- ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle); + ret = __gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle); if (ret == 0) gem_close(fd, handle); free(ptr); @@ -524,11 +484,9 @@ static int test_access_control(int fd) static int test_invalid_null_pointer(int fd) { uint32_t handle; - int ret;
/* NULL pointer. */ - ret = gem_userptr(fd, NULL, PAGE_SIZE, 0, &handle); - igt_assert_eq(ret, 0); + gem_userptr(fd, NULL, PAGE_SIZE, 0, userptr_flags, &handle);
copy(fd, handle, handle, ~0); /* QQQ Precise errno? */ gem_close(fd, handle); @@ -540,7 +498,6 @@ static int test_invalid_gtt_mapping(int fd) { uint32_t handle, handle2; void *ptr; - int ret;
/* GTT mapping */ handle = create_bo(fd, 0); @@ -550,8 +507,7 @@ static int test_invalid_gtt_mapping(int fd) igt_assert(((unsigned long)ptr & (PAGE_SIZE - 1)) == 0); igt_assert((sizeof(linear) & (PAGE_SIZE - 1)) == 0);
- ret = gem_userptr(fd, ptr, sizeof(linear), 0, &handle2); - igt_assert_eq(ret, 0); + gem_userptr(fd, ptr, sizeof(linear), 0, userptr_flags, &handle2); copy(fd, handle2, handle2, ~0); /* QQQ Precise errno? */ gem_close(fd, handle2);
@@ -594,8 +550,7 @@ static void test_forked_access(int fd) #ifdef MADV_DONTFORK ret |= madvise(ptr1, sizeof(linear), MADV_DONTFORK); #endif - ret |= gem_userptr(fd, ptr1, sizeof(linear), 0, &handle1); - igt_assert_eq(ret, 0); + gem_userptr(fd, ptr1, sizeof(linear), 0, userptr_flags, &handle1); igt_assert(ptr1); igt_assert(handle1);
@@ -603,8 +558,7 @@ static void test_forked_access(int fd) #ifdef MADV_DONTFORK ret |= madvise(ptr2, sizeof(linear), MADV_DONTFORK); #endif - ret |= gem_userptr(fd, ptr2, sizeof(linear), 0, &handle2); - igt_assert_eq(ret, 0); + gem_userptr(fd, ptr2, sizeof(linear), 0, userptr_flags, &handle2); igt_assert(ptr2); igt_assert(handle2);
@@ -651,8 +605,7 @@ static int test_forbidden_ops(int fd)
igt_assert(posix_memalign(&ptr, PAGE_SIZE, PAGE_SIZE) == 0);
- ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle); - igt_assert_eq(ret, 0); + gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle);
/* pread/pwrite are not always forbidden, but when they * are they should fail with EINVAL. @@ -839,19 +792,19 @@ static int test_usage_restrictions(int fd) igt_assert(posix_memalign(&ptr, PAGE_SIZE, PAGE_SIZE * 2) == 0);
/* Address not aligned. */ - ret = gem_userptr(fd, (char *)ptr + 1, PAGE_SIZE, 0, &handle); + ret = __gem_userptr(fd, (char *)ptr + 1, PAGE_SIZE, 0, userptr_flags, &handle); igt_assert_neq(ret, 0);
/* Size not rounded to page size. */ - ret = gem_userptr(fd, ptr, PAGE_SIZE - 1, 0, &handle); + ret = __gem_userptr(fd, ptr, PAGE_SIZE - 1, 0, userptr_flags, &handle); igt_assert_neq(ret, 0);
/* Both wrong. */ - ret = gem_userptr(fd, (char *)ptr + 1, PAGE_SIZE - 1, 0, &handle); + ret = __gem_userptr(fd, (char *)ptr + 1, PAGE_SIZE - 1, 0, userptr_flags, &handle); igt_assert_neq(ret, 0);
/* Read-only not supported. */ - ret = gem_userptr(fd, (char *)ptr, PAGE_SIZE, 1, &handle); + ret = __gem_userptr(fd, (char *)ptr, PAGE_SIZE, 1, userptr_flags, &handle); igt_assert_neq(ret, 0);
free(ptr); @@ -873,7 +826,7 @@ static int test_create_destroy(int fd, int time) for (n = 0; n < 1000; n++) { igt_assert(posix_memalign(&ptr, PAGE_SIZE, PAGE_SIZE) == 0);
- do_or_die(gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle)); + do_or_die(__gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle));
gem_close(fd, handle); free(ptr); @@ -1065,41 +1018,40 @@ static void test_overlap(int fd, int expected)
igt_assert(posix_memalign((void *)&ptr, PAGE_SIZE, PAGE_SIZE * 3) == 0);
- ret = gem_userptr(fd, ptr + PAGE_SIZE, PAGE_SIZE, 0, &handle); - igt_assert_eq(ret, 0); + gem_userptr(fd, ptr + PAGE_SIZE, PAGE_SIZE, 0, userptr_flags, &handle);
/* before, no overlap */ - ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle2); + ret = __gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle2); if (ret == 0) gem_close(fd, handle2); igt_assert_eq(ret, 0);
/* after, no overlap */ - ret = gem_userptr(fd, ptr + PAGE_SIZE * 2, PAGE_SIZE, 0, &handle2); + ret = __gem_userptr(fd, ptr + PAGE_SIZE * 2, PAGE_SIZE, 0, userptr_flags, &handle2); if (ret == 0) gem_close(fd, handle2); igt_assert_eq(ret, 0);
/* exactly overlapping */ - ret = gem_userptr(fd, ptr + PAGE_SIZE, PAGE_SIZE, 0, &handle2); + ret = __gem_userptr(fd, ptr + PAGE_SIZE, PAGE_SIZE, 0, userptr_flags, &handle2); if (ret == 0) gem_close(fd, handle2); igt_assert(ret == 0 || ret == expected);
/* start overlaps */ - ret = gem_userptr(fd, ptr, PAGE_SIZE * 2, 0, &handle2); + ret = __gem_userptr(fd, ptr, PAGE_SIZE * 2, 0, userptr_flags, &handle2); if (ret == 0) gem_close(fd, handle2); igt_assert(ret == 0 || ret == expected);
/* end overlaps */ - ret = gem_userptr(fd, ptr + PAGE_SIZE, PAGE_SIZE * 2, 0, &handle2); + ret = __gem_userptr(fd, ptr + PAGE_SIZE, PAGE_SIZE * 2, 0, userptr_flags, &handle2); if (ret == 0) gem_close(fd, handle2); igt_assert(ret == 0 || ret == expected);
/* subsumes */ - ret = gem_userptr(fd, ptr, PAGE_SIZE * 3, 0, &handle2); + ret = __gem_userptr(fd, ptr, PAGE_SIZE * 3, 0, userptr_flags, &handle2); if (ret == 0) gem_close(fd, handle2); igt_assert(ret == 0 || ret == expected); @@ -1124,8 +1076,7 @@ static void test_unmap(int fd, int expected) bo_ptr = (char *)ALIGN((unsigned long)ptr, PAGE_SIZE);
for (i = 0; i < num_obj; i++, bo_ptr += sizeof(linear)) { - ret = gem_userptr(fd, bo_ptr, sizeof(linear), 0, &bo[i]); - igt_assert_eq(ret, 0); + gem_userptr(fd, bo_ptr, sizeof(linear), 0, userptr_flags, &bo[i]); }
bo[num_obj] = create_bo(fd, 0); @@ -1159,8 +1110,7 @@ static void test_unmap_after_close(int fd) bo_ptr = (char *)ALIGN((unsigned long)ptr, PAGE_SIZE);
for (i = 0; i < num_obj; i++, bo_ptr += sizeof(linear)) { - ret = gem_userptr(fd, bo_ptr, sizeof(linear), 0, &bo[i]); - igt_assert_eq(ret, 0); + gem_userptr(fd, bo_ptr, sizeof(linear), 0, userptr_flags, &bo[i]); }
bo[num_obj] = create_bo(fd, 0); @@ -1230,8 +1180,7 @@ static void test_stress_mm(int fd) igt_assert_eq(ret, 0);
while (loops--) { - ret = gem_userptr(fd, ptr, PAGE_SIZE, 0, &handle); - igt_assert_eq(ret, 0); + gem_userptr(fd, ptr, PAGE_SIZE, 0, userptr_flags, &handle);
gem_close(fd, handle); } @@ -1265,8 +1214,7 @@ static void *mm_userptr_close_thread(void *data) while (!t->stop) { pthread_mutex_unlock(&t->mutex); for (int i = 0; i < num_handles; i++) - igt_assert_eq(gem_userptr(t->fd, t->ptr, PAGE_SIZE, 0, &handle[i]), - 0); + gem_userptr(t->fd, t->ptr, PAGE_SIZE, 0, userptr_flags, &handle[i]); for (int i = 0; i < num_handles; i++) gem_close(t->fd, handle[i]); pthread_mutex_lock(&t->mutex);
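For quick reference, the calling convention after this move looks roughly like the fragment below (assuming the surrounding test already has an open i915 fd; the split between the asserting wrapper and the __ "raw" variant is the point of v2):

	void *ptr;
	uint32_t handle;

	igt_assert(posix_memalign(&ptr, 4096, 4096) == 0);

	/* asserts on failure (and skips when CONFIG_MMU_NOTIFIER is missing) */
	gem_userptr(fd, ptr, 4096, 0, 0, &handle);
	gem_close(fd, handle);

	/* __ variant hands the errno back so negative tests can check it */
	igt_assert_neq(__gem_userptr(fd, (char *)ptr + 1, 4096, 0, 0, &handle), 0);
	free(ptr);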
From: Rob Bradford rob@linux.intel.com
This test has the following subtests:
- test_correct for correctness of the data
- test_map_unmap checks for mapping idempotency
- test_reprime checks for dma-buf creation idempotency
- test_forked checks for multiprocess access
- test_refcounting checks for buffer reference counting
- test_dup checks that dup()ing the fd works
- test_userptr makes sure mmap fails, due to the lack of obj->base.filp in a userptr
- test_errors checks the error return values for failures
- test_aperture_limit tests multiple buffer creation at the gtt aperture limit
v2 (Tiago): Removed pattern_check(), which was walking through a useless iterator. Removed superfluous PROT_WRITE from gem_mmap, in test_correct(). Added binary file to .gitignore v3 (Tiago): squash patch "prime_mmap: Test for userptr mmap" into this one. v4 (Tiago): use synchronized userptr for testing. Add test for buffer overlapping.
Signed-off-by: Rob Bradford rob@linux.intel.com Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com --- tests/Makefile.sources | 1 + tests/prime_mmap.c | 417 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 418 insertions(+) create mode 100644 tests/prime_mmap.c
diff --git a/tests/Makefile.sources b/tests/Makefile.sources index d594038..75f3cb0 100644 --- a/tests/Makefile.sources +++ b/tests/Makefile.sources @@ -96,6 +96,7 @@ TESTS_progs_M = \ pm_rps \ pm_rc6_residency \ pm_sseu \ + prime_mmap \ prime_self_import \ template \ $(NULL) diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c new file mode 100644 index 0000000..95304a9 --- /dev/null +++ b/tests/prime_mmap.c @@ -0,0 +1,417 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + * Rob Bradford <rob at linux.intel.com> + * + */ + +/* + * Testcase: Check whether mmap()ing dma-buf works + */ +#define _GNU_SOURCE +#include <unistd.h> +#include <stdlib.h> +#include <stdio.h> +#include <string.h> +#include <fcntl.h> +#include <inttypes.h> +#include <errno.h> +#include <sys/stat.h> +#include <sys/ioctl.h> +#include <pthread.h> + +#include "drm.h" +#include "i915_drm.h" +#include "drmtest.h" +#include "igt_debugfs.h" +#include "ioctl_wrappers.h" + +#define BO_SIZE (16*1024) + +static int fd; + +char pattern[] = {0xff, 0x00, 0x00, 0x00, + 0x00, 0xff, 0x00, 0x00, + 0x00, 0x00, 0xff, 0x00, + 0x00, 0x00, 0x00, 0xff}; + +static void +fill_bo(uint32_t handle, size_t size) +{ + off_t i; + for (i = 0; i < size; i+=sizeof(pattern)) + { + gem_write(fd, handle, i, pattern, sizeof(pattern)); + } +} + +static void +test_correct(void) +{ + int dma_buf_fd; + char *ptr1, *ptr2; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + /* Check correctness vs GEM_MMAP_GTT */ + ptr1 = gem_mmap__gtt(fd, handle, BO_SIZE, PROT_READ); + ptr2 = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr1 != MAP_FAILED); + igt_assert(ptr2 != MAP_FAILED); + igt_assert(memcmp(ptr1, ptr2, BO_SIZE) == 0); + + /* Check pattern correctness */ + igt_assert(memcmp(ptr2, pattern, sizeof(pattern)) == 0); + + munmap(ptr1, BO_SIZE); + munmap(ptr2, BO_SIZE); + close(dma_buf_fd); + gem_close(fd, handle); +} + +static void +test_map_unmap(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + + /* 
Unmap and remap */ + munmap(ptr, BO_SIZE); + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + + munmap(ptr, BO_SIZE); + close(dma_buf_fd); + gem_close(fd, handle); +} + +/* prime and then unprime and then prime again the same handle */ +static void +test_reprime(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + + close (dma_buf_fd); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + munmap(ptr, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + + munmap(ptr, BO_SIZE); + close(dma_buf_fd); + gem_close(fd, handle); +} + +/* map from another process */ +static void +test_forked(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + + igt_fork(childno, 1) { + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + munmap(ptr, BO_SIZE); + close(dma_buf_fd); + } + close(dma_buf_fd); + igt_waitchildren(); + gem_close(fd, handle); +} + +static void +test_refcounting(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + /* Close gem object before mapping */ + gem_close(fd, handle); + + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + munmap(ptr, BO_SIZE); + close (dma_buf_fd); +} + +/* dup before mmap */ +static void +test_dup(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + + dma_buf_fd = dup(prime_handle_to_fd(fd, handle)); + igt_assert(errno == 0); + + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + munmap(ptr, BO_SIZE); + gem_close(fd, handle); + close (dma_buf_fd); +} + + +/* Used for error case testing to avoid wrapper */ +static int prime_handle_to_fd_no_assert(uint32_t handle, int *fd_out) +{ + struct drm_prime_handle args; + int ret; + + args.handle = handle; + args.flags = DRM_CLOEXEC; + args.fd = -1; + + ret = drmIoctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args); + if (ret) + ret = errno; + *fd_out = args.fd; + + return ret; +} + +/* test for mmap(dma_buf_export(userptr)) */ +static void +test_userptr(void) +{ + int ret, dma_buf_fd; + void *ptr; + uint32_t handle; + + /* create userptr bo */ + ret = posix_memalign(&ptr, 4096, BO_SIZE); + igt_assert_eq(ret, 0); + + /* we are not allowed to export unsynchronized userptr. 
Just create a normal + * one */ + gem_userptr(fd, (uint32_t *)ptr, BO_SIZE, 0, 0, &handle); + + /* export userptr */ + ret = prime_handle_to_fd_no_assert(handle, &dma_buf_fd); + if (ret) { + igt_assert(ret == EINVAL || ret == ENODEV); + goto free_userptr; + } else { + igt_assert_eq(ret, 0); + igt_assert_lte(0, dma_buf_fd); + } + + /* a userptr doesn't have the obj->base.filp, but can be exported via + * dma-buf, so make sure it fails here */ + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr == MAP_FAILED && errno == ENODEV); +free_userptr: + gem_close(fd, handle); + close(dma_buf_fd); +} + +static void +test_errors(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + /* Close gem object before priming */ + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + gem_close(fd, handle); + prime_handle_to_fd_no_assert(handle, &dma_buf_fd); + igt_assert(dma_buf_fd == -1 && errno == ENOENT); + errno = 0; + + /* close fd before mapping */ + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + close(dma_buf_fd); + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr == MAP_FAILED && errno == EBADF); + errno = 0; + gem_close(fd, handle); + + /* Map too big */ + handle = gem_create(fd, BO_SIZE); + fill_bo(handle, BO_SIZE); + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + ptr = mmap(NULL, BO_SIZE * 2, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr == MAP_FAILED && errno == EINVAL); + errno = 0; + close(dma_buf_fd); + gem_close(fd, handle); + + /* Overlapping the end of the buffer */ + handle = gem_create(fd, BO_SIZE); + dma_buf_fd = prime_handle_to_fd(fd, handle); + igt_assert(errno == 0); + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, BO_SIZE / 2); + igt_assert(ptr == MAP_FAILED && errno == EINVAL); + errno = 0; + close(dma_buf_fd); + gem_close(fd, handle); +} + +static void +test_aperture_limit(void) +{ + int dma_buf_fd1, dma_buf_fd2; + char *ptr1, *ptr2; + uint32_t handle1, handle2; + /* Two buffers the sum of which > mappable aperture */ + uint64_t size1 = (gem_mappable_aperture_size() * 7) / 8; + uint64_t size2 = (gem_mappable_aperture_size() * 3) / 8; + + handle1 = gem_create(fd, size1); + fill_bo(handle1, BO_SIZE); + + dma_buf_fd1 = prime_handle_to_fd(fd, handle1); + igt_assert(errno == 0); + ptr1 = mmap(NULL, size1, PROT_READ, MAP_SHARED, dma_buf_fd1, 0); + igt_assert(ptr1 != MAP_FAILED); + igt_assert(memcmp(ptr1, pattern, sizeof(pattern)) == 0); + + handle2 = gem_create(fd, size1); + fill_bo(handle2, BO_SIZE); + dma_buf_fd2 = prime_handle_to_fd(fd, handle2); + igt_assert(errno == 0); + ptr2 = mmap(NULL, size2, PROT_READ, MAP_SHARED, dma_buf_fd2, 0); + igt_assert(ptr2 != MAP_FAILED); + igt_assert(memcmp(ptr2, pattern, sizeof(pattern)) == 0); + + igt_assert(memcmp(ptr1, ptr2, BO_SIZE) == 0); + + munmap(ptr1, size1); + munmap(ptr2, size2); + close(dma_buf_fd1); + close(dma_buf_fd2); + gem_close(fd, handle1); + gem_close(fd, handle2); +} + +static int +check_for_dma_buf_mmap(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + int ret = 1; + + handle = gem_create(fd, BO_SIZE); + dma_buf_fd = prime_handle_to_fd(fd, handle); + ptr = mmap(NULL, BO_SIZE, PROT_READ, MAP_SHARED, dma_buf_fd, 0); + if (ptr != MAP_FAILED) + ret = 0; + munmap(ptr, BO_SIZE); + gem_close(fd, handle); + close(dma_buf_fd); + return ret; +} + +igt_main +{ + struct { + const char *name; + void (*fn)(void); + 
} tests[] = { + { "test_correct", test_correct }, + { "test_map_unmap", test_map_unmap }, + { "test_reprime", test_reprime }, + { "test_forked", test_forked }, + { "test_refcounting", test_refcounting }, + { "test_dup", test_dup }, + { "test_userptr", test_userptr }, + { "test_errors", test_errors }, + { "test_aperture_limit", test_aperture_limit }, + }; + int i; + + igt_fixture { + fd = drm_open_driver(DRIVER_INTEL); + errno = 0; + } + + igt_skip_on((check_for_dma_buf_mmap() != 0)); + + for (i = 0; i < ARRAY_SIZE(tests); i++) { + igt_subtest(tests[i].name) + tests[i].fn(); + } + + igt_fixture + close(fd); +}
This patch adds test_correct_cpu_write, which maps the texture buffer through a prime fd and then writes directly to it using the CPU. It stresses the driver to guarantee cache synchronization among the different domains.
This test also adds test_forked_cpu_write, which creates the GEM bo in one process and passes its prime handle to another process, which in turn uses the handle only to map and write. Roughly speaking, this test simulates the Chrome OS architecture, where the Web content ("unprivileged process") maps and CPU-draws a buffer that was previously allocated in the GPU process ("privileged process").
This requires kernel modifications (Daniel Thompson's "drm: prime: Honour O_RDWR during prime-handle-to-fd"), and therefore prime_handle_to_fd_for_mmap is added, failing when those modifications are missing so tests can skip. Also, upcoming tests (e.g. the next patch) are going to use it as well, so make it public and available in the lib.
v2: adds prime_handle_to_fd_for_mmap for skipping the test on older kernels, and a test for invalid flags.
Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com --- lib/ioctl_wrappers.c | 25 +++++++++++++++ lib/ioctl_wrappers.h | 4 +++ tests/prime_mmap.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++---- 3 files changed, 112 insertions(+), 6 deletions(-)
diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c index 6cad8a2..86a61ba 100644 --- a/lib/ioctl_wrappers.c +++ b/lib/ioctl_wrappers.c @@ -1329,6 +1329,31 @@ int prime_handle_to_fd(int fd, uint32_t handle) }
/** + * prime_handle_to_fd_for_mmap: + * @fd: open i915 drm file descriptor + * @handle: file-private gem buffer object handle + * + * Same as prime_handle_to_fd above but with DRM_RDWR capabilities, which can + * be useful for writing into the mmap'ed dma-buf file-descriptor. + * + * Returns: The created dma-buf fd handle or -1 if the ioctl fails. + */ +int prime_handle_to_fd_for_mmap(int fd, uint32_t handle) +{ + struct drm_prime_handle args; + + memset(&args, 0, sizeof(args)); + args.handle = handle; + args.flags = DRM_CLOEXEC | DRM_RDWR; + args.fd = -1; + + if (drmIoctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args) != 0) + return -1; + + return args.fd; +} + +/** * prime_fd_to_handle: * @fd: open i915 drm file descriptor * @dma_buf_fd: dma-buf fd handle diff --git a/lib/ioctl_wrappers.h b/lib/ioctl_wrappers.h index bb8a858..d3ffba2 100644 --- a/lib/ioctl_wrappers.h +++ b/lib/ioctl_wrappers.h @@ -149,6 +149,10 @@ void gem_require_ring(int fd, int ring_id);
/* prime */ int prime_handle_to_fd(int fd, uint32_t handle); +#ifndef DRM_RDWR +#define DRM_RDWR O_RDWR +#endif +int prime_handle_to_fd_for_mmap(int fd, uint32_t handle); uint32_t prime_fd_to_handle(int fd, int dma_buf_fd); off_t prime_get_size(int dma_buf_fd);
diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c index 95304a9..269ada6 100644 --- a/tests/prime_mmap.c +++ b/tests/prime_mmap.c @@ -22,6 +22,7 @@ * * Authors: * Rob Bradford <rob at linux.intel.com> + * Tiago Vignatti <tiago.vignatti at intel.com> * */
@@ -66,6 +67,12 @@ fill_bo(uint32_t handle, size_t size) }
static void +fill_bo_cpu(char *ptr) +{ + memcpy(ptr, pattern, sizeof(pattern)); +} + +static void test_correct(void) { int dma_buf_fd; @@ -180,6 +187,65 @@ test_forked(void) gem_close(fd, handle); }
+/* test simple CPU write */ +static void +test_correct_cpu_write(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd_for_mmap(fd, handle); + + /* Skip if DRM_RDWR is not supported */ + igt_skip_on(errno == EINVAL); + + /* Check correctness of map using write protection (PROT_WRITE) */ + ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + + /* Fill bo using CPU */ + fill_bo_cpu(ptr); + + /* Check pattern correctness */ + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + + munmap(ptr, BO_SIZE); + close(dma_buf_fd); + gem_close(fd, handle); +} + +/* map from another process and then write using CPU */ +static void +test_forked_cpu_write(void) +{ + int dma_buf_fd; + char *ptr; + uint32_t handle; + + handle = gem_create(fd, BO_SIZE); + + dma_buf_fd = prime_handle_to_fd_for_mmap(fd, handle); + + /* Skip if DRM_RDWR is not supported */ + igt_skip_on(errno == EINVAL); + + igt_fork(childno, 1) { + ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE , MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + fill_bo_cpu(ptr); + + igt_assert(memcmp(ptr, pattern, sizeof(pattern)) == 0); + munmap(ptr, BO_SIZE); + close(dma_buf_fd); + } + close(dma_buf_fd); + igt_waitchildren(); + gem_close(fd, handle); +} + static void test_refcounting(void) { @@ -224,15 +290,14 @@ test_dup(void) close (dma_buf_fd); }
- /* Used for error case testing to avoid wrapper */ -static int prime_handle_to_fd_no_assert(uint32_t handle, int *fd_out) +static int prime_handle_to_fd_no_assert(uint32_t handle, int flags, int *fd_out) { struct drm_prime_handle args; int ret;
args.handle = handle; - args.flags = DRM_CLOEXEC; + args.flags = flags; args.fd = -1;
ret = drmIoctl(fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &args); @@ -260,7 +325,7 @@ test_userptr(void) gem_userptr(fd, (uint32_t *)ptr, BO_SIZE, 0, 0, &handle);
/* export userptr */ - ret = prime_handle_to_fd_no_assert(handle, &dma_buf_fd); + ret = prime_handle_to_fd_no_assert(handle, DRM_CLOEXEC, &dma_buf_fd); if (ret) { igt_assert(ret == EINVAL || ret == ENODEV); goto free_userptr; @@ -281,15 +346,25 @@ free_userptr: static void test_errors(void) { - int dma_buf_fd; + int i, dma_buf_fd; char *ptr; uint32_t handle; + int invalid_flags[] = {DRM_CLOEXEC - 1, DRM_CLOEXEC + 1, + DRM_RDWR - 1, DRM_RDWR + 1}; + + /* Test for invalid flags */ + handle = gem_create(fd, BO_SIZE); + for (i = 0; i < sizeof(invalid_flags) / sizeof(invalid_flags[0]); i++) { + prime_handle_to_fd_no_assert(handle, invalid_flags[i], &dma_buf_fd); + igt_assert_eq(errno, EINVAL); + errno = 0; + }
/* Close gem object before priming */ handle = gem_create(fd, BO_SIZE); fill_bo(handle, BO_SIZE); gem_close(fd, handle); - prime_handle_to_fd_no_assert(handle, &dma_buf_fd); + prime_handle_to_fd_no_assert(handle, DRM_CLOEXEC, &dma_buf_fd); igt_assert(dma_buf_fd == -1 && errno == ENOENT); errno = 0;
@@ -392,6 +467,8 @@ igt_main { "test_map_unmap", test_map_unmap }, { "test_reprime", test_reprime }, { "test_forked", test_forked }, + { "test_correct_cpu_write", test_correct_cpu_write }, + { "test_forked_cpu_write", test_forked_cpu_write }, { "test_refcounting", test_refcounting }, { "test_dup", test_dup }, { "test_userptr", test_userptr },
This patch adds dma-buf mmap synchronization ioctls that can be used by tests for cache coherency management e.g. when CPU and GPU domains are being accessed through dma-buf at the same time.
v7: add sync invalid flags test.
Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com --- lib/ioctl_wrappers.c | 26 ++++++++++++++++++++++++++ lib/ioctl_wrappers.h | 17 +++++++++++++++++ tests/prime_mmap.c | 25 +++++++++++++++++++++++++ 3 files changed, 68 insertions(+)
diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c index 86a61ba..0d84d00 100644 --- a/lib/ioctl_wrappers.c +++ b/lib/ioctl_wrappers.c @@ -1400,6 +1400,32 @@ off_t prime_get_size(int dma_buf_fd) }
/** + * prime_sync_start + * @dma_buf_fd: dma-buf fd handle + */ +void prime_sync_start(int dma_buf_fd) +{ + struct local_dma_buf_sync sync_start; + + memset(&sync_start, 0, sizeof(sync_start)); + sync_start.flags = LOCAL_DMA_BUF_SYNC_START | LOCAL_DMA_BUF_SYNC_RW; + do_ioctl(dma_buf_fd, LOCAL_DMA_BUF_IOCTL_SYNC, &sync_start); +} + +/** + * prime_sync_end + * @dma_buf_fd: dma-buf fd handle + */ +void prime_sync_end(int dma_buf_fd) +{ + struct local_dma_buf_sync sync_end; + + memset(&sync_end, 0, sizeof(sync_end)); + sync_end.flags = LOCAL_DMA_BUF_SYNC_END | LOCAL_DMA_BUF_SYNC_RW; + do_ioctl(dma_buf_fd, LOCAL_DMA_BUF_IOCTL_SYNC, &sync_end); +} + +/** * igt_require_fb_modifiers: * @fd: Open DRM file descriptor. * diff --git a/lib/ioctl_wrappers.h b/lib/ioctl_wrappers.h index d3ffba2..d004165 100644 --- a/lib/ioctl_wrappers.h +++ b/lib/ioctl_wrappers.h @@ -148,6 +148,21 @@ void gem_require_caching(int fd); void gem_require_ring(int fd, int ring_id);
/* prime */ +struct local_dma_buf_sync { + uint64_t flags; +}; + +#define LOCAL_DMA_BUF_SYNC_READ (1 << 0) +#define LOCAL_DMA_BUF_SYNC_WRITE (2 << 0) +#define LOCAL_DMA_BUF_SYNC_RW (LOCAL_DMA_BUF_SYNC_READ | LOCAL_DMA_BUF_SYNC_WRITE) +#define LOCAL_DMA_BUF_SYNC_START (0 << 2) +#define LOCAL_DMA_BUF_SYNC_END (1 << 2) +#define LOCAL_DMA_BUF_SYNC_VALID_FLAGS_MASK \ + (LOCAL_DMA_BUF_SYNC_RW | LOCAL_DMA_BUF_SYNC_END) + +#define LOCAL_DMA_BUF_BASE 'b' +#define LOCAL_DMA_BUF_IOCTL_SYNC _IOW(LOCAL_DMA_BUF_BASE, 0, struct local_dma_buf_sync) + int prime_handle_to_fd(int fd, uint32_t handle); #ifndef DRM_RDWR #define DRM_RDWR O_RDWR @@ -155,6 +170,8 @@ int prime_handle_to_fd(int fd, uint32_t handle); int prime_handle_to_fd_for_mmap(int fd, uint32_t handle); uint32_t prime_fd_to_handle(int fd, int dma_buf_fd); off_t prime_get_size(int dma_buf_fd); +void prime_sync_start(int dma_buf_fd); +void prime_sync_end(int dma_buf_fd);
/* addfb2 fb modifiers */ struct local_drm_mode_fb_cmd2 { diff --git a/tests/prime_mmap.c b/tests/prime_mmap.c index 269ada6..29a0cfd 100644 --- a/tests/prime_mmap.c +++ b/tests/prime_mmap.c @@ -401,6 +401,30 @@ test_errors(void) gem_close(fd, handle); }
+/* Test for invalid flags on sync ioctl */ +static void +test_invalid_sync_flags(void) +{ + int i, dma_buf_fd; + uint32_t handle; + struct local_dma_buf_sync sync; + int invalid_flags[] = {-1, + 0x00, + LOCAL_DMA_BUF_SYNC_RW + 1, + LOCAL_DMA_BUF_SYNC_VALID_FLAGS_MASK + 1}; + + handle = gem_create(fd, BO_SIZE); + dma_buf_fd = prime_handle_to_fd(fd, handle); + for (i = 0; i < sizeof(invalid_flags) / sizeof(invalid_flags[0]); i++) { + memset(&sync, 0, sizeof(sync)); + sync.flags = invalid_flags[i]; + + drmIoctl(dma_buf_fd, LOCAL_DMA_BUF_IOCTL_SYNC, &sync); + igt_assert_eq(errno, EINVAL); + errno = 0; + } +} + static void test_aperture_limit(void) { @@ -473,6 +497,7 @@ igt_main { "test_dup", test_dup }, { "test_userptr", test_userptr }, { "test_errors", test_errors }, + { "test_invalid_sync_flags", test_invalid_sync_flags }, { "test_aperture_limit", test_aperture_limit }, }; int i;
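Putting the library pieces together, a minimal sketch of how a test is expected to bracket CPU access with these wrappers (fragment assuming the prime_mmap.c context, i.e. an open fd and BO_SIZE; it uses the single-argument prime_sync_start/end added here, while later revisions in this thread grow a read/write argument):

	uint32_t handle = gem_create(fd, BO_SIZE);
	int dma_buf_fd = prime_handle_to_fd_for_mmap(fd, handle);
	char *ptr;

	igt_skip_on(dma_buf_fd == -1 && errno == EINVAL); /* kernel lacks DRM_RDWR */

	ptr = mmap(NULL, BO_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0);
	igt_assert(ptr != MAP_FAILED);

	prime_sync_start(dma_buf_fd);  /* begin_cpu_access: make caches coherent */
	memset(ptr, 0xff, BO_SIZE);
	prime_sync_end(dma_buf_fd);    /* end_cpu_access: flush before GPU/scanout use */

	munmap(ptr, BO_SIZE);
	close(dma_buf_fd);
	gem_close(fd, handle);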
This program can be used to detect when CPU writes to the dma-buf mapped object don't land in scanout due to cache incoherency.
Although this seems to be inherently a problem of non-LLC machines ("Atom"), this particular test catches stale cache contents on scanout on LLC machines as well. It's inspired by Ville's kms_pwrite_crc.c and can also be used to test the correctness of the driver's begin_cpu_access and end_cpu_access (which requires the i915 implementation).
To see the need for the flush, run with the '-n' option so the sync ioctls are not called; a rather simple CPU hog then trashes the caches while the test catches the coherency issue. If you drop '-n', things should just work as expected.
I tested this on !llc and llc platforms, BYT and IVY respectively.
v2: use prime_handle_to_fd_for_mmap instead. v3: merge end_cpu_access() patch with this and provide options to disable sync. v4: use library's prime_sync_{start,end} instead. v7: use CPU hog instead and use testing rounds to catch the sync problems.
Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com --- tests/Makefile.sources | 1 + tests/kms_mmap_write_crc.c | 313 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 314 insertions(+) create mode 100644 tests/kms_mmap_write_crc.c
diff --git a/tests/Makefile.sources b/tests/Makefile.sources index 75f3cb0..ad2dd6a 100644 --- a/tests/Makefile.sources +++ b/tests/Makefile.sources @@ -168,6 +168,7 @@ TESTS_progs = \ kms_3d \ kms_fence_pin_leak \ kms_force_connector_basic \ + kms_mmap_write_crc \ kms_pwrite_crc \ kms_sink_crc_basic \ prime_udl \ diff --git a/tests/kms_mmap_write_crc.c b/tests/kms_mmap_write_crc.c new file mode 100644 index 0000000..6984bbd --- /dev/null +++ b/tests/kms_mmap_write_crc.c @@ -0,0 +1,313 @@ +/* + * Copyright © 2015 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + * Tiago Vignatti <tiago.vignatti at intel.com> + */ + +#include <errno.h> +#include <limits.h> +#include <stdbool.h> +#include <stdio.h> +#include <string.h> + +#include "drmtest.h" +#include "igt_debugfs.h" +#include "igt_kms.h" +#include "intel_chipset.h" +#include "ioctl_wrappers.h" +#include "igt_aux.h" + +IGT_TEST_DESCRIPTION( + "Use the display CRC support to validate mmap write to an already uncached future scanout buffer."); + +#define ROUNDS 10 + +typedef struct { + int drm_fd; + igt_display_t display; + struct igt_fb fb[2]; + igt_output_t *output; + igt_plane_t *primary; + enum pipe pipe; + igt_crc_t ref_crc; + igt_pipe_crc_t *pipe_crc; + uint32_t devid; +} data_t; + +static int ioctl_sync = true; +int dma_buf_fd; + +static char *dmabuf_mmap_framebuffer(int drm_fd, struct igt_fb *fb) +{ + char *ptr = NULL; + + dma_buf_fd = prime_handle_to_fd_for_mmap(drm_fd, fb->gem_handle); + igt_skip_on(dma_buf_fd == -1 && errno == EINVAL); + + ptr = mmap(NULL, fb->size, PROT_READ | PROT_WRITE, MAP_SHARED, dma_buf_fd, 0); + igt_assert(ptr != MAP_FAILED); + + return ptr; +} + +static void test(data_t *data) +{ + igt_display_t *display = &data->display; + igt_output_t *output = data->output; + struct igt_fb *fb = &data->fb[1]; + drmModeModeInfo *mode; + cairo_t *cr; + char *ptr; + uint32_t caching; + void *buf; + igt_crc_t crc; + + mode = igt_output_get_mode(output); + + /* create a non-white fb where we can write later */ + igt_create_fb(data->drm_fd, mode->hdisplay, mode->vdisplay, + DRM_FORMAT_XRGB8888, LOCAL_DRM_FORMAT_MOD_NONE, fb); + + ptr = dmabuf_mmap_framebuffer(data->drm_fd, fb); + + cr = igt_get_cairo_ctx(data->drm_fd, fb); + igt_paint_test_pattern(cr, fb->width, fb->height); + cairo_destroy(cr); + + /* flip to it to make it UC/WC and fully flushed */ + igt_plane_set_fb(data->primary, fb); + igt_display_commit(display); + + /* flip 
back the original white buffer */
+	igt_plane_set_fb(data->primary, &data->fb[0]);
+	igt_display_commit(display);
+
+	/* make sure caching mode has become UC/WT */
+	caching = gem_get_caching(data->drm_fd, fb->gem_handle);
+	igt_assert(caching == I915_CACHING_NONE || caching == I915_CACHING_DISPLAY);
+
+	/*
+	 * firstly demonstrate the need for DMA_BUF_SYNC_START ("begin_cpu_access")
+	 */
+	if (ioctl_sync)
+		prime_sync_start(dma_buf_fd);
+
+	/* use dmabuf pointer to make the other fb all white too */
+	buf = malloc(fb->size);
+	igt_assert(buf != NULL);
+	memset(buf, 0xff, fb->size);
+	memcpy(ptr, buf, fb->size);
+	free(buf);
+
+	/* and flip to it */
+	igt_plane_set_fb(data->primary, fb);
+	igt_display_commit(display);
+
+	/* check that the crc is as expected, which requires that caches got flushed */
+	igt_pipe_crc_collect_crc(data->pipe_crc, &crc);
+	igt_assert_crc_equal(&crc, &data->ref_crc);
+
+	/*
+	 * now demonstrate the need for DMA_BUF_SYNC_END ("end_cpu_access")
+	 */
+
+	/* start over, writing non-white to the fb again and flip to it to make it
+	 * fully flushed */
+	cr = igt_get_cairo_ctx(data->drm_fd, fb);
+	igt_paint_test_pattern(cr, fb->width, fb->height);
+	cairo_destroy(cr);
+
+	igt_plane_set_fb(data->primary, fb);
+	igt_display_commit(display);
+
+	/* sync start, to move to CPU domain */
+	if (ioctl_sync)
+		prime_sync_start(dma_buf_fd);
+
+	/* use dmabuf pointer in the same fb to make it all white */
+	buf = malloc(fb->size);
+	igt_assert(buf != NULL);
+	memset(buf, 0xff, fb->size);
+	memcpy(ptr, buf, fb->size);
+	free(buf);
+
+	/* if we don't change to the GTT domain again, the whites won't get flushed
+	 * and therefore we demonstrate the need for sync end here */
+	if (ioctl_sync)
+		prime_sync_end(dma_buf_fd);
+
+	/* check that the crc is as expected, which requires that caches got flushed */
+	igt_pipe_crc_collect_crc(data->pipe_crc, &crc);
+	igt_assert_crc_equal(&crc, &data->ref_crc);
+}
+
+static bool prepare_crtc(data_t *data)
+{
+	igt_display_t *display = &data->display;
+	igt_output_t *output = data->output;
+	drmModeModeInfo *mode;
+
+	/* select the pipe we want to use */
+	igt_output_set_pipe(output, data->pipe);
+	igt_display_commit(display);
+
+	if (!output->valid) {
+		igt_output_set_pipe(output, PIPE_ANY);
+		igt_display_commit(display);
+		return false;
+	}
+
+	mode = igt_output_get_mode(output);
+
+	/* create a white reference fb and flip to it */
+	igt_create_color_fb(data->drm_fd, mode->hdisplay, mode->vdisplay,
+			    DRM_FORMAT_XRGB8888, LOCAL_DRM_FORMAT_MOD_NONE,
+			    1.0, 1.0, 1.0, &data->fb[0]);
+
+	data->primary = igt_output_get_plane(output, IGT_PLANE_PRIMARY);
+
+	igt_plane_set_fb(data->primary, &data->fb[0]);
+	igt_display_commit(display);
+
+	if (data->pipe_crc)
+		igt_pipe_crc_free(data->pipe_crc);
+
+	data->pipe_crc = igt_pipe_crc_new(data->pipe,
+					  INTEL_PIPE_CRC_SOURCE_AUTO);
+
+	/* get reference crc for the white fb */
+	igt_pipe_crc_collect_crc(data->pipe_crc, &data->ref_crc);
+
+	return true;
+}
+
+static void cleanup_crtc(data_t *data)
+{
+	igt_display_t *display = &data->display;
+	igt_output_t *output = data->output;
+
+	igt_pipe_crc_free(data->pipe_crc);
+	data->pipe_crc = NULL;
+
+	igt_plane_set_fb(data->primary, NULL);
+
+	igt_output_set_pipe(output, PIPE_ANY);
+	igt_display_commit(display);
+
+	igt_remove_fb(data->drm_fd, &data->fb[0]);
+	igt_remove_fb(data->drm_fd, &data->fb[1]);
+}
+
+static void run_test(data_t *data)
+{
+	igt_display_t *display = &data->display;
+	igt_output_t *output;
+	enum pipe pipe;
+
+	for_each_connected_output(display, output) {
+		data->output = output;
+		for_each_pipe(display, pipe) {
+			data->pipe = pipe;
+
+			if (!prepare_crtc(data))
+				continue;
+
+			test(data);
+			cleanup_crtc(data);
+
+			/* once is enough */
+			return;
+		}
+	}
+
+	igt_skip("no valid crtc/connector combinations found\n");
+}
+
+/**
+ * fork_cpuhog_helper:
+ *
+ * Fork a child process that loops indefinitely to consume CPU. This is used to
+ * fill the CPU caches with random information so they can get stalled, most
+ * likely provoking incoherency with the GPU.
+ */
+static void fork_cpuhog_helper(void) {
+
+	/* TODO: if the parent is about to die before its child, e.g.
+	 * igt_assert_crc_equal() fails, there will be a boring exit handler
+	 * waiting for the child to exit also. A workaround is to simply disable
+	 * that handler, but this needs to be fixed properly in an elegant way. */
+	igt_disable_exit_handler();
+
+	igt_fork(hog, 1) {
+		while (1) {
+			usleep(10); /* quite random really. */
+
+			if ((int)getppid() == 1) /* Parent has died, so must we. */
+				exit(0);
+		}
+	}
+}
+
+static int opt_handler(int opt, int opt_index, void *data)
+{
+	if (opt == 'n') {
+		ioctl_sync = false;
+		igt_info("set via cmd line to not use sync ioctls\n");
+	}
+
+	return 0;
+}
+
+static data_t data;
+
+int main(int argc, char **argv)
+{
+	int i;
+	igt_simple_init_parse_opts(&argc, argv, "n", NULL, NULL, opt_handler, NULL);
+
+	igt_skip_on_simulation();
+
+	igt_fixture {
+		data.drm_fd = drm_open_driver_master(DRIVER_INTEL);
+
+		data.devid = intel_get_drm_devid(data.drm_fd);
+
+		kmstest_set_vt_graphics_mode();
+
+		igt_require_pipe_crc();
+
+		igt_display_init(&data.display, data.drm_fd);
+	}
+
+	igt_info("Using %d rounds for the test\n", ROUNDS);
+	fork_cpuhog_helper();
+
+	for (i = 0; i < ROUNDS; i++)
+		run_test(&data);
+
+	igt_fixture {
+		igt_display_fini(&data.display);
+	}
+
+	igt_exit();
+}
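For reference, the test defaults to using the sync ioctls; the 'n' option handled in opt_handler() above disables them, which is handy for demonstrating the failure mode. A hypothetical invocation (binary path assumed from the usual IGT build layout) would be roughly:

	./tests/kms_mmap_write_crc	# with sync ioctls, CRCs expected to match
	./tests/kms_mmap_write_crc -n	# without sync ioctls, may show CRC mismatches when caches aren't flushed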
Unlike kms_mmap_write_crc, which catches coherency issues in the scanout-mapped buffer, this test is meant to exercise dma-buf mmap mostly on !llc platforms and to provoke coherency bugs, so we know exactly where the sync ioctls are needed.
I tested this on !llc and llc platforms, BYT (Baytrail) and IVB (Ivybridge) respectively.
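For context, the userspace pattern these tests exercise is the bracketed CPU access below. This is only a minimal sketch, assuming the struct dma_buf_sync / DMA_BUF_IOCTL_SYNC uapi proposed earlier in this series; the IGT helpers prime_sync_start()/prime_sync_end() used by the test are expected to reduce to roughly this:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Bracket CPU access to a mmap'ed dma-buf so caches get flushed/invalidated
 * around it. 'dma_buf_fd' is the exported PRIME fd, 'ptr'/'size' the mapping. */
static void cpu_fill_white(int dma_buf_fd, void *ptr, size_t size)
{
	struct dma_buf_sync sync = { 0 };

	/* begin_cpu_access: make the CPU view coherent before touching it */
	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW;
	ioctl(dma_buf_fd, DMA_BUF_IOCTL_SYNC, &sync);

	memset(ptr, 0xff, size);	/* CPU writes through the dma-buf mmap */

	/* end_cpu_access: flush CPU writes back so the GPU/display sees them */
	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
	ioctl(dma_buf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}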
Signed-off-by: Tiago Vignatti tiago.vignatti@intel.com
---
 tests/Makefile.sources       |   1 +
 tests/prime_mmap_coherency.c | 246 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 247 insertions(+)
 create mode 100644 tests/prime_mmap_coherency.c
diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index ad2dd6a..78605c6 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -97,6 +97,7 @@ TESTS_progs_M = \
 	pm_rc6_residency \
 	pm_sseu \
 	prime_mmap \
+	prime_mmap_coherency \
 	prime_self_import \
 	template \
 	$(NULL)
diff --git a/tests/prime_mmap_coherency.c b/tests/prime_mmap_coherency.c
new file mode 100644
index 0000000..a9a2664
--- /dev/null
+++ b/tests/prime_mmap_coherency.c
@@ -0,0 +1,246 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Tiago Vignatti
+ */
+
+/** @file prime_mmap_coherency.c
+ *
+ * TODO: need to show the need for prime_sync_end().
+ */
+
+#include "igt.h"
+
+IGT_TEST_DESCRIPTION("Test dma-buf mmap on !llc platforms mostly and provoke"
+		" coherency bugs so we know for sure where we need the sync ioctls.");
+
+#define ROUNDS 20
+
+int fd;
+int stale = 0;
+static drm_intel_bufmgr *bufmgr;
+struct intel_batchbuffer *batch;
+static int width = 1024, height = 1024;
+
+/*
+ * Exercises the need for read flush:
+ *   1. create a BO and write '0's, in GTT domain.
+ *   2. read BO using the dma-buf CPU mmap.
+ *   3. write '1's, in GTT domain.
+ *   4. read again through the mapped dma-buf.
+ */
+static void test_read_flush(bool expect_stale_cache)
+{
+	drm_intel_bo *bo_1;
+	drm_intel_bo *bo_2;
+	uint32_t *ptr_cpu;
+	uint32_t *ptr_gtt;
+	int dma_buf_fd, i;
+
+	if (expect_stale_cache)
+		igt_require(!gem_has_llc(fd));
+
+	bo_1 = drm_intel_bo_alloc(bufmgr, "BO 1", width * height * 4, 4096);
+
+	/* STEP #1: put the BO 1 in GTT domain. We use the blitter to copy and fill
+	 * zeros to BO 1, so commands will be submitted and likely to place BO 1 in
+	 * the GTT domain. */
+	bo_2 = drm_intel_bo_alloc(bufmgr, "BO 2", width * height * 4, 4096);
+	intel_copy_bo(batch, bo_1, bo_2, width * height);
+	gem_sync(fd, bo_1->handle);
+	drm_intel_bo_unreference(bo_2);
+
+	/* STEP #2: read BO 1 using the dma-buf CPU mmap. This dirties the CPU caches. */
+	dma_buf_fd = prime_handle_to_fd_for_mmap(fd, bo_1->handle);
+	igt_skip_on(errno == EINVAL);
+
+	ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+		       MAP_SHARED, dma_buf_fd, 0);
+	igt_assert(ptr_cpu != MAP_FAILED);
+
+	for (i = 0; i < (width * height) / 4; i++)
+		igt_assert_eq(ptr_cpu[i], 0);
+
+	/* STEP #3: write 0x11 into BO 1. */
+	bo_2 = drm_intel_bo_alloc(bufmgr, "BO 2", width * height * 4, 4096);
+	ptr_gtt = gem_mmap__gtt(fd, bo_2->handle, width * height, PROT_READ | PROT_WRITE);
+	memset(ptr_gtt, 0x11, width * height);
+	munmap(ptr_gtt, width * height);
+
+	intel_copy_bo(batch, bo_1, bo_2, width * height);
+	gem_sync(fd, bo_1->handle);
+	drm_intel_bo_unreference(bo_2);
+
+	/* STEP #4: read again using the CPU mmap. Doing #1 before #3 makes sure we
+	 * don't do a full CPU cache flush in step #3 again. That makes sure all the
+	 * stale cachelines from step #2 survive (mostly, a few will be evicted)
+	 * until we try to read them again in step #4. This behavior could be fixed
+	 * by a CPU read flush right before accessing the CPU pointer. */
+	if (!expect_stale_cache)
+		prime_sync_start(dma_buf_fd);
+
+	for (i = 0; i < (width * height) / 4; i++)
+		if (ptr_cpu[i] != 0x11111111) {
+			igt_warn_on_f(!expect_stale_cache,
+				      "Found 0x%08x at offset 0x%08x\n", ptr_cpu[i], i);
+			stale++;
+		}
+
+	drm_intel_bo_unreference(bo_1);
+	munmap(ptr_cpu, width * height);
+}
+
+/*
+ * Exercises the need for write flush:
+ *   1. create BO 1 and write '0's, in GTT domain.
+ *   2. write '1's into BO 1 using the dma-buf CPU mmap.
+ *   3. copy BO 1 to new BO 2, in GTT domain.
+ *   4. read BO 2 via dma-buf mmap.
+ */
+static void test_write_flush(bool expect_stale_cache)
+{
+	drm_intel_bo *bo_1;
+	drm_intel_bo *bo_2;
+	uint32_t *ptr_cpu;
+	uint32_t *ptr2_cpu;
+	int dma_buf_fd, dma_buf2_fd, i;
+
+	if (expect_stale_cache)
+		igt_require(!gem_has_llc(fd));
+
+	bo_1 = drm_intel_bo_alloc(bufmgr, "BO 1", width * height * 4, 4096);
+
+	/* STEP #1: Put the BO 1 in GTT domain. We use the blitter to copy and fill
+	 * zeros to BO 1, so commands will be submitted and likely to place BO 1 in
+	 * the GTT domain. */
+	bo_2 = drm_intel_bo_alloc(bufmgr, "BO 2", width * height * 4, 4096);
+	intel_copy_bo(batch, bo_1, bo_2, width * height);
+	gem_sync(fd, bo_1->handle);
+	drm_intel_bo_unreference(bo_2);
+
+	/* STEP #2: Write '1's into BO 1 using the dma-buf CPU mmap. */
+	dma_buf_fd = prime_handle_to_fd_for_mmap(fd, bo_1->handle);
+	igt_skip_on(errno == EINVAL);
+
+	ptr_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+		       MAP_SHARED, dma_buf_fd, 0);
+	igt_assert(ptr_cpu != MAP_FAILED);
+
+	/* This is the main point of this test: !llc hw requires a cache write
+	 * flush right here (explained in step #4). */
+	if (!expect_stale_cache)
+		prime_sync_start(dma_buf_fd);
+
+	memset(ptr_cpu, 0x11, width * height);
+
+	/* STEP #3: Copy BO 1 into BO 2, using blitter. */
+	bo_2 = drm_intel_bo_alloc(bufmgr, "BO 2", width * height * 4, 4096);
+	intel_copy_bo(batch, bo_2, bo_1, width * height);
+	gem_sync(fd, bo_2->handle);
+
+	/* STEP #4: compare BO 2 against written BO 1. On !llc hardware, there
+	 * should be some cache lines that didn't get flushed out and are still 0,
+	 * requiring a cache flush before the write in step #2. */
+	dma_buf2_fd = prime_handle_to_fd_for_mmap(fd, bo_2->handle);
+	igt_skip_on(errno == EINVAL);
+
+	ptr2_cpu = mmap(NULL, width * height, PROT_READ | PROT_WRITE,
+			MAP_SHARED, dma_buf2_fd, 0);
+	igt_assert(ptr2_cpu != MAP_FAILED);
+
+	for (i = 0; i < (width * height) / 4; i++)
+		if (ptr2_cpu[i] != 0x11111111) {
+			igt_warn_on_f(!expect_stale_cache,
+				      "Found 0x%08x at offset 0x%08x\n", ptr2_cpu[i], i);
+			stale++;
+		}
+
+	drm_intel_bo_unreference(bo_1);
+	drm_intel_bo_unreference(bo_2);
+	munmap(ptr_cpu, width * height);
+}
+
+int main(int argc, char **argv)
+{
+	int i;
+	bool expect_stale_cache;
+	igt_subtest_init(argc, argv);
+
+	igt_fixture {
+		fd = drm_open_driver(DRIVER_INTEL);
+
+		bufmgr = drm_intel_bufmgr_gem_init(fd, 4096);
+		batch = intel_batchbuffer_alloc(bufmgr, intel_get_drm_devid(fd));
+	}
+
+	/* Cache coherency and eviction are pretty much unpredictable, so
+	 * reproducing boils down to trial and error to hit different scenarios.
+	 * TODO: We may want to improve the tests a bit by picking random subranges. */
+	igt_info("%d rounds for each test\n", ROUNDS);
+	igt_subtest("read") {
+		stale = 0;
+		expect_stale_cache = false;
+		igt_info("exercising read flush\n");
+		for (i = 0; i < ROUNDS; i++)
+			test_read_flush(expect_stale_cache);
+		igt_fail_on_f(stale, "num of stale cache lines %d\n", stale);
+	}
+
+	/* Only for !llc platforms */
+	igt_subtest("read-and-fail") {
+		stale = 0;
+		expect_stale_cache = true;
+		igt_info("exercising read flush and expect to fail on !llc\n");
+		for (i = 0; i < ROUNDS; i++)
+			test_read_flush(expect_stale_cache);
+		igt_fail_on_f(!stale, "couldn't find any stale cache lines\n");
+	}
+
+	igt_subtest("write") {
+		stale = 0;
+		expect_stale_cache = false;
+		igt_info("exercising write flush\n");
+		for (i = 0; i < ROUNDS; i++)
+			test_write_flush(expect_stale_cache);
+		igt_fail_on_f(stale, "num of stale cache lines %d\n", stale);
+	}
+
+	/* Only for !llc platforms */
+	igt_subtest("write-and-fail") {
+		stale = 0;
+		expect_stale_cache = true;
+		igt_info("exercising write flush and expect to fail on !llc\n");
+		for (i = 0; i < ROUNDS; i++)
+			test_write_flush(expect_stale_cache);
+		igt_fail_on_f(!stale, "couldn't find any stale cache lines\n");
+	}
+
+	igt_fixture {
+		intel_batchbuffer_free(batch);
+		drm_intel_bufmgr_destroy(bufmgr);
+
+		close(fd);
+	}
+
+	igt_exit();
+}
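For completeness, once built these cases can be run individually with the standard IGT subtest options (binary path assumed from the usual IGT build layout), e.g.:

	./tests/prime_mmap_coherency --list-subtests
	./tests/prime_mmap_coherency --run-subtest read-and-fail

The *-and-fail variants are the interesting ones on !llc hardware: they skip the sync ioctls on purpose and are expected to observe stale cache lines.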
On Tue, Dec 22, 2015 at 1:36 PM, Tiago Vignatti tiago.vignatti@intel.com wrote:
For the series:
Reviewed-by: Stéphane Marchesin marcheu@chromium.org
On 02/04/2016 06:55 PM, Stéphane Marchesin wrote:
Thank you! Daniel, here are the trees ready for pulling (I hope) now:
https://github.com/tiagovignatti/drm-intel/commits/drm-intel-nightly_dma-buf... https://github.com/tiagovignatti/intel-gpu-tools/commits/dma-buf-mmap-v8
Tiago
On Fri, Feb 05, 2016 at 11:53:03AM -0200, Tiago Vignatti wrote:
Pulled in the kernel bits, but the igt ones need to be rebased. Can you please resend?
Thanks, Daniel