This series contains the drm and i915 bits required to implement the VK_KHR_external_semaphore and VK_KHR_external_fence extensions on top of DRM sync objects.
Patches 1-3 are cleanups and fixes for the core syncobj APIs.
Patch 4 adds syncobj support to the i915 driver. I'm re-sending it because I've rebased the latest version on top of patches 1-3.
Patch 6 is technically not required by the end of the series but is needed for i915 if we don't have patches 7-9. I think it's a good and possibly useful clean-up, but I'm more than willing to drop it in favor of 7-9.
Patch 7 adds a reset ioctl which allows the caller to reset the syncobj by setting the fence to NULL. This is needed to implement vkResetFences.
The real meat of the series is patches 8 and 9. The previous wait implementation returned -EINVAL if you ever tried to wait on a syncobj which did not yet have a dma_fence. Patch 9 adds a new wait flag which makes the wait first block until the syncobj has a fence and then wait on that fence. This is needed to properly implement the vkWaitForFences behavior.
This series can be found as a branch here:
https://cgit.freedesktop.org/~jekstrand/linux/log/?h=drm-syncobj-wait-submit...
Patches to the Intel Vulkan driver to implement VK_KHR_external_fence on top of this kernel interface can be found here:
https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=wip/anv-external-fence
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@redhat.com>
Dave Airlie (1):
  drm/syncobj: add sync obj wait interface. (v8)

Jason Ekstrand (8):
  drm/syncobj: Rename fence_get to find_fence
  drm/syncobj: Lock around drm_syncobj::fence
  drm/syncobj: Remove file_private from replace_fence
  i915: Add support for drm syncobjs
  dma-buf/dma-fence: Allow wait_any_timeout without default_wait
  drm/syncobj: Add a reset ioctl
  drm/syncobj: Add a callback mechanism for replace_fence
  drm/syncobj: Allow wait for submit and signal behavior
 drivers/dma-buf/dma-fence.c                |   5 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     |   5 +-
 drivers/gpu/drm/drm_internal.h             |   4 +
 drivers/gpu/drm/drm_ioctl.c                |   4 +
 drivers/gpu/drm/drm_syncobj.c              | 522 ++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_drv.c            |   3 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 146 +++++++-
 include/drm/drm_syncobj.h                  |  46 ++-
 include/uapi/drm/drm.h                     |  19 ++
 include/uapi/drm/i915_drm.h                |  30 +-
 10 files changed, 752 insertions(+), 32 deletions(-)
The function has far more in common with drm_syncobj_find than with any of the get/put functions.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |  2 +-
 drivers/gpu/drm/drm_syncobj.c          | 10 +++++-----
 include/drm/drm_syncobj.h              |  6 +++---
 3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index aeee684..0aa33ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -984,7 +984,7 @@ static int amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,
 {
 	int r;
 	struct dma_fence *fence;
-	r = drm_syncobj_fence_get(p->filp, handle, &fence);
+	r = drm_syncobj_find_fence(p->filp, handle, &fence);
 	if (r)
 		return r;
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 89441bc..6f2a6041 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -97,9 +97,9 @@ void drm_syncobj_replace_fence(struct drm_file *file_private,
 }
 EXPORT_SYMBOL(drm_syncobj_replace_fence);
-int drm_syncobj_fence_get(struct drm_file *file_private,
-			  u32 handle,
-			  struct dma_fence **fence)
+int drm_syncobj_find_fence(struct drm_file *file_private,
+			   u32 handle,
+			   struct dma_fence **fence)
 {
 	struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
 	int ret = 0;
@@ -114,7 +114,7 @@ int drm_syncobj_fence_get(struct drm_file *file_private,
 	drm_syncobj_put(syncobj);
 	return ret;
 }
-EXPORT_SYMBOL(drm_syncobj_fence_get);
+EXPORT_SYMBOL(drm_syncobj_find_fence);
 /**
  * drm_syncobj_free - free a sync object.
@@ -309,7 +309,7 @@ int drm_syncobj_export_sync_file(struct drm_file *file_private,
 	if (fd < 0)
 		return fd;
-	ret = drm_syncobj_fence_get(file_private, handle, &fence);
+	ret = drm_syncobj_find_fence(file_private, handle, &fence);
 	if (ret)
 		goto err_put_fd;
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
index 2c3610a..9ffff22 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -82,9 +82,9 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
 void drm_syncobj_replace_fence(struct drm_file *file_private,
			       struct drm_syncobj *syncobj,
			       struct dma_fence *fence);
-int drm_syncobj_fence_get(struct drm_file *file_private,
-			  u32 handle,
-			  struct dma_fence **fence);
+int drm_syncobj_find_fence(struct drm_file *file_private,
+			   u32 handle,
+			   struct dma_fence **fence);
 void drm_syncobj_free(struct kref *kref);
#endif
The atomic exchange we were doing in replace_fence was sufficient for the case where it raced with itself. However, if a replace_fence races with a dma_fence_get(syncobj->fence), the entire replace_fence may happen between the point where one thread reads the syncobj->fence pointer and the point where it calls dma_fence_get() on it. If this happens, the fence's reference may be dropped before the reader gets a chance to take one of its own.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
---
 drivers/gpu/drm/drm_syncobj.c | 25 +++++++++++++++++++++++--
 include/drm/drm_syncobj.h     |  6 ++++++
 2 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 6f2a6041..dafca83 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -75,6 +75,22 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
 }
 EXPORT_SYMBOL(drm_syncobj_find);
+struct dma_fence *drm_syncobj_fence_get(struct drm_syncobj *syncobj)
+{
+	struct dma_fence *fence;
+
+	/* Lock to prevent someone from replacing the fence and dropping
+	 * the syncobj's reference between when we get the dma_fence
+	 * pointer and when we have a chance to incref.
+	 */
+	spin_lock(&syncobj->lock);
+	fence = dma_fence_get(syncobj->fence);
+	spin_unlock(&syncobj->lock);
+
+	return fence;
+}
+EXPORT_SYMBOL(drm_syncobj_fence_get);
+
 /**
  * drm_syncobj_replace_fence - replace fence in a sync object.
  * @file_private: drm file private pointer.
@@ -91,7 +107,11 @@ void drm_syncobj_replace_fence(struct drm_file *file_private,
 	if (fence)
 		dma_fence_get(fence);
-	old_fence = xchg(&syncobj->fence, fence);
+
+	spin_lock(&syncobj->lock);
+	old_fence = syncobj->fence;
+	syncobj->fence = fence;
+	spin_unlock(&syncobj->lock);
 	dma_fence_put(old_fence);
 }
@@ -107,7 +127,7 @@ int drm_syncobj_find_fence(struct drm_file *file_private,
 	if (!syncobj)
 		return -ENOENT;
-	*fence = dma_fence_get(syncobj->fence);
+	*fence = drm_syncobj_fence_get(syncobj);
 	if (!*fence) {
 		ret = -EINVAL;
 	}
@@ -143,6 +163,7 @@ static int drm_syncobj_create(struct drm_file *file_private,
 		return -ENOMEM;
 	kref_init(&syncobj->refcount);
+	spin_lock_init(&syncobj->lock);
 	idr_preload(GFP_KERNEL);
 	spin_lock(&file_private->syncobj_table_lock);
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
index 9ffff22..422d631 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -46,6 +46,11 @@ struct drm_syncobj {
	 */
 	struct dma_fence *fence;
 	/**
+	 * @lock:
+	 * locks fence.
+	 */
+	spinlock_t lock;
+	/**
	 * @file:
	 * a file backing for this syncobj.
	 */
@@ -79,6 +84,7 @@ drm_syncobj_put(struct drm_syncobj *obj)
 struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
				     u32 handle);
+struct dma_fence *drm_syncobj_fence_get(struct drm_syncobj *syncobj);
 void drm_syncobj_replace_fence(struct drm_file *file_private,
			       struct drm_syncobj *syncobj,
			       struct dma_fence *fence);
On Wed, Aug 9, 2017 at 2:21 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
Doesn't that also require that we start using an RCU for syncobj? That was my interpretation of the hieroglyphics above the definition of get_rcu_safe()
Quoting Jason Ekstrand (2017-08-10 01:31:52)
That we use RCU for the fence, which we do. i.e. so that the fence pointer remains valid during the atomic_inc_unless_zero. The caller is responsible for making sure that the syncobj remains valid across the call (which we do using a ref, but it too could use RCU if that was ever desired). -Chris
Quoting Jason Ekstrand (2017-08-10 01:31:52)
Interesting you mention RCUing syncobj... The spinlock in drm_syncobj_find() is the first contention point we hit. If we do make syncobj RCU'ed we can avoid that lock. -Chris
It's completely unused.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 +--
 drivers/gpu/drm/drm_syncobj.c          | 5 ++---
 include/drm/drm_syncobj.h              | 3 +--
 3 files changed, 4 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 0aa33ae..3314702 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1070,8 +1070,7 @@ static void amdgpu_cs_post_dependencies(struct amdgpu_cs_parser *p)
 	int i;
 	for (i = 0; i < p->num_post_dep_syncobjs; ++i) {
-		drm_syncobj_replace_fence(p->filp, p->post_dep_syncobjs[i],
-					  p->fence);
+		drm_syncobj_replace_fence(p->post_dep_syncobjs[i], p->fence);
 	}
 }
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index dafca83..7fdf7e8 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -99,8 +99,7 @@ EXPORT_SYMBOL(drm_syncobj_fence_get);
  *
  * This replaces the fence on a sync object.
  */
-void drm_syncobj_replace_fence(struct drm_file *file_private,
-			       struct drm_syncobj *syncobj,
+void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
			       struct dma_fence *fence)
 {
 	struct dma_fence *old_fence = NULL;
@@ -313,7 +312,7 @@ int drm_syncobj_import_sync_file_fence(struct drm_file *file_private,
 		return -ENOENT;
 	}
-	drm_syncobj_replace_fence(file_private, syncobj, fence);
+	drm_syncobj_replace_fence(syncobj, fence);
 	dma_fence_put(fence);
 	drm_syncobj_put(syncobj);
 	return 0;
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
index 422d631..1c81658 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -85,8 +85,7 @@ drm_syncobj_put(struct drm_syncobj *obj)
 struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
				     u32 handle);
 struct dma_fence *drm_syncobj_fence_get(struct drm_syncobj *syncobj);
-void drm_syncobj_replace_fence(struct drm_file *file_private,
-			       struct drm_syncobj *syncobj,
+void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
			       struct dma_fence *fence);
 int drm_syncobj_find_fence(struct drm_file *file_private,
			   u32 handle,
This commit adds support for waiting on or signaling DRM syncobjs as part of execbuf. It does so by hijacking the currently unused cliprects pointer to instead point to an array of i915_gem_exec_fence structs, each of which contains a DRM syncobj and a flags parameter specifying whether to wait on it or to signal it. This implementation theoretically allows both flags to be set, in which case it waits on the dma_fence that was in the syncobj and then immediately replaces it with the dma_fence from the current execbuf.
v2:
 - Rebase on new syncobj API
v3:
 - Pull everything out into helpers
 - Do all allocation in gem_execbuffer2
 - Pack the flags in the bottom 2 bits of the drm_syncobj*
v4:
 - Prevent a potential race on syncobj->fence
Testcase: igt/gem_exec_fence/syncobj*
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c            |   3 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 146 ++++++++++++++++++++++++++++-
 include/uapi/drm/i915_drm.h                |  30 +++++-
 3 files changed, 171 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 6a22f0e..f8f3d45 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -384,6 +384,7 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_FENCE:
 	case I915_PARAM_HAS_EXEC_CAPTURE:
 	case I915_PARAM_HAS_EXEC_BATCH_FIRST:
+	case I915_PARAM_HAS_EXEC_FENCE_ARRAY:
 		/* For the time being all of these are always true;
		 * if some supported hardware does not have one of these
		 * features this value needs to be provided from
@@ -2733,7 +2734,7 @@ static struct drm_driver driver = {
	 */
	.driver_features =
	    DRIVER_HAVE_IRQ | DRIVER_IRQ_SHARED | DRIVER_GEM | DRIVER_PRIME |
-	    DRIVER_RENDER | DRIVER_MODESET | DRIVER_ATOMIC,
+	    DRIVER_RENDER | DRIVER_MODESET | DRIVER_ATOMIC | DRIVER_SYNCOBJ,
	.release = i915_driver_release,
	.open = i915_driver_open,
	.lastclose = i915_driver_lastclose,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 26eedb1..936fb0b 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -33,6 +33,7 @@
 #include <drm/drmP.h>
 #include <drm/i915_drm.h>
+#include <drm/drm_syncobj.h>
 #include "i915_drv.h"
 #include "i915_gem_clflush.h"
@@ -1800,8 +1801,10 @@ static bool i915_gem_check_execbuffer(struct drm_i915_gem_execbuffer2 *exec)
 		return false;
 	/* Kernel clipping was a DRI1 misfeature */
-	if (exec->num_cliprects || exec->cliprects_ptr)
-		return false;
+	if (!(exec->flags & I915_EXEC_FENCE_ARRAY)) {
+		if (exec->num_cliprects || exec->cliprects_ptr)
+			return false;
+	}
 	if (exec->DR4 == 0xffffffff) {
 		DRM_DEBUG("UXA submitting garbage DR4, fixing up\n");
@@ -2029,11 +2032,125 @@ eb_select_engine(struct drm_i915_private *dev_priv,
 	return engine;
 }
+static void __free_fence_array(struct drm_syncobj **fences, unsigned int n)
+{
+	while (n--)
+		drm_syncobj_put(ptr_mask_bits(fences[n], 2));
+	kvfree(fences);
+}
+
+static struct drm_syncobj **get_fence_array(struct drm_i915_gem_execbuffer2 *args,
+					    struct drm_file *file)
+{
+	const unsigned int nfences = args->num_cliprects;
+	struct drm_i915_gem_exec_fence __user *user;
+	struct drm_syncobj **fences;
+	unsigned int n;
+	int err;
+
+	if (!(args->flags & I915_EXEC_FENCE_ARRAY))
+		return NULL;
+
+	if (nfences > SIZE_MAX / sizeof(*fences))
+		return ERR_PTR(-EINVAL);
+
+	user = u64_to_user_ptr(args->cliprects_ptr);
+	if (!access_ok(VERIFY_READ, user, nfences * 2 * sizeof(u32)))
+		return ERR_PTR(-EFAULT);
+
+	fences = kvmalloc_array(args->num_cliprects, sizeof(*fences),
+				__GFP_NOWARN | GFP_TEMPORARY);
+	if (!fences)
+		return ERR_PTR(-ENOMEM);
+
+	for (n = 0; n < nfences; n++) {
+		struct drm_i915_gem_exec_fence fence;
+		struct drm_syncobj *syncobj;
+
+		if (__copy_from_user(&fence, user++, sizeof(fence))) {
+			err = -EFAULT;
+			goto err;
+		}
+
+		syncobj = drm_syncobj_find(file, fence.handle);
+		if (!syncobj) {
+			DRM_DEBUG("Invalid syncobj handle provided\n");
+			err = -EINVAL;
+			goto err;
+		}
+
+		fences[n] = ptr_pack_bits(syncobj, fence.flags, 2);
+	}
+
+	return fences;
+
+err:
+	__free_fence_array(fences, n);
+	return ERR_PTR(err);
+}
+
+static void put_fence_array(struct drm_i915_gem_execbuffer2 *args,
+			    struct drm_syncobj **fences)
+{
+	if (!fences)
+		return;
+	__free_fence_array(fences, args->num_cliprects);
+}
+
+static int await_fence_array(struct i915_execbuffer *eb,
+			     struct drm_syncobj **fences)
+{
+	const unsigned int nfences = eb->args->num_cliprects;
+	unsigned int n;
+	int err;
+
+	for (n = 0; n < nfences; n++) {
+		struct drm_syncobj *syncobj;
+		struct dma_fence *fence;
+		unsigned int flags;
+
+		syncobj = ptr_unpack_bits(fences[n], &flags, 2);
+		if (!(flags & I915_EXEC_FENCE_WAIT))
+			continue;
+
+		fence = drm_syncobj_fence_get(syncobj);
+		if (!fence)
+			return -EINVAL;
+
+		err = i915_gem_request_await_dma_fence(eb->request, fence);
+		dma_fence_put(fence);
+		if (err < 0)
+			return err;
+	}
+
+	return 0;
+}
+
+static void signal_fence_array(struct i915_execbuffer *eb,
+			       struct drm_syncobj **fences)
+{
+	const unsigned int nfences = eb->args->num_cliprects;
+	struct dma_fence * const fence = &eb->request->fence;
+	unsigned int n;
+
+	for (n = 0; n < nfences; n++) {
+		struct drm_syncobj *syncobj;
+		unsigned int flags;
+
+		syncobj = ptr_unpack_bits(fences[n], &flags, 2);
+		if (!(flags & I915_EXEC_FENCE_SIGNAL))
+			continue;
+
+		drm_syncobj_replace_fence(syncobj, fence);
+	}
+}
+
 static int
 i915_gem_do_execbuffer(struct drm_device *dev,
		       struct drm_file *file,
		       struct drm_i915_gem_execbuffer2 *args,
-		       struct drm_i915_gem_exec_object2 *exec)
+		       struct drm_i915_gem_exec_object2 *exec,
+		       struct drm_syncobj **fences)
 {
 	struct i915_execbuffer eb;
 	struct dma_fence *in_fence = NULL;
@@ -2221,6 +2338,12 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		goto err_request;
 	}
+	if (fences) {
+		err = await_fence_array(&eb, fences);
+		if (err < 0)
+			goto err_request;
+	}
+
 	if (out_fence_fd != -1) {
 		out_fence = sync_file_create(&eb.request->fence);
 		if (!out_fence) {
@@ -2244,6 +2367,9 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	__i915_add_request(eb.request, err == 0);
 	add_to_client(eb.request, file);
+	if (fences)
+		signal_fence_array(&eb, fences);
+
 	if (out_fence) {
 		if (err == 0) {
 			fd_install(out_fence_fd, out_fence->file);
@@ -2347,7 +2473,7 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
 		exec2_list[i].flags = 0;
 	}
-	err = i915_gem_do_execbuffer(dev, file, &exec2, exec2_list);
+	err = i915_gem_do_execbuffer(dev, file, &exec2, exec2_list, NULL);
 	if (exec2.flags & __EXEC_HAS_RELOC) {
 		struct drm_i915_gem_exec_object __user *user_exec_list =
			u64_to_user_ptr(args->buffers_ptr);
@@ -2381,6 +2507,7 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
			    sizeof(unsigned int));
 	struct drm_i915_gem_execbuffer2 *args = data;
 	struct drm_i915_gem_exec_object2 *exec2_list;
+	struct drm_syncobj **fences = NULL;
 	int err;
 	if (args->buffer_count < 1 ||
	    args->buffer_count > SIZE_MAX / sz - 1) {
@@ -2407,7 +2534,15 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
 		return -EFAULT;
 	}
-	err = i915_gem_do_execbuffer(dev, file, args, exec2_list);
+	if (args->flags & I915_EXEC_FENCE_ARRAY) {
+		fences = get_fence_array(args, file);
+		if (IS_ERR(fences)) {
+			kvfree(exec2_list);
+			return PTR_ERR(fences);
+		}
+	}
+
+	err = i915_gem_do_execbuffer(dev, file, args, exec2_list, fences);
 	/*
	 * Now that we have begun execution of the batchbuffer, we ignore
@@ -2438,5 +2573,6 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
 	args->flags &= ~__I915_EXEC_UNKNOWN_FLAGS;
 	kvfree(exec2_list);
+	put_fence_array(args, fences);
 	return err;
 }
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7650e1d..3a0b61a 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -475,6 +475,11 @@ typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_HAS_EXEC_BATCH_FIRST	 48
+/* Query whether DRM_I915_GEM_EXECBUFFER2 supports supplying an array of
+ * drm_i915_gem_exec_fence structures.  See I915_EXEC_FENCE_ARRAY.
+ */
+#define I915_PARAM_HAS_EXEC_FENCE_ARRAY  49
+
 typedef struct drm_i915_getparam {
 	__s32 param;
	/*
@@ -856,6 +861,17 @@ struct drm_i915_gem_exec_object2 {
 	__u64 rsvd2;
 };
+struct drm_i915_gem_exec_fence {
+	/**
+	 * User's handle for a dma-fence to wait on or signal.
+	 */
+	__u32 handle;
+
+#define I915_EXEC_FENCE_WAIT            (1<<0)
+#define I915_EXEC_FENCE_SIGNAL          (1<<1)
+	__u32 flags;
+};
+
 struct drm_i915_gem_execbuffer2 {
	/**
	 * List of gem_exec_object2 structs
	 */
@@ -870,7 +886,10 @@ struct drm_i915_gem_execbuffer2 {
 	__u32 DR1;
 	__u32 DR4;
 	__u32 num_cliprects;
-	/** This is a struct drm_clip_rect *cliprects */
+	/** This is a struct drm_clip_rect *cliprects if I915_EXEC_FENCE_ARRAY
+	 * is not set.  If I915_EXEC_FENCE_ARRAY is set, then this is a
+	 * struct drm_i915_gem_exec_fence *fences.
+	 */
 	__u64 cliprects_ptr;
 #define I915_EXEC_RING_MASK              (7<<0)
 #define I915_EXEC_DEFAULT                (0<<0)
@@ -971,7 +990,14 @@ struct drm_i915_gem_execbuffer2 {
	 * element).
	 */
 #define I915_EXEC_BATCH_FIRST		(1<<18)
-#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_BATCH_FIRST<<1))
+
+/* Setting I915_FENCE_ARRAY implies that num_cliprects and cliprects_ptr
+ * define an array of i915_gem_exec_fence structures which specify a set of
+ * dma fences to wait upon or signal.
+ */
+#define I915_EXEC_FENCE_ARRAY   (1<<19)
+
+#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_ARRAY<<1))
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
From: Dave Airlie <airlied@redhat.com>
This interface will allow sync objects to be used to back Vulkan fences. This API is pretty much the Vulkan fence waiting API, and I've ported the code from amdgpu.
v2: accept relative timeout, pass remaining time back to userspace.
v3: return to absolute timeouts.
v4: absolute zero = poll,
    rewrite any/all code to have same operation for arrays,
    return -EINVAL for 0 fences.
v4.1: fixup fences allocation check, use u64_to_user_ptr
v5: move to sec/nsec, and use timespec64 for calcs.
v6: use -ETIME and drop the out status flag. (-ETIME is suggested by ickle,
    I can feel a shed painting)
v7: talked to Daniel/Arnd, use ktime and ns everywhere.
v8: be more careful in the timeout calculations,
    use uint32_t for counter variables so we don't overflow,
    gracefully handle -ENOENT being returned from dma_fence_wait_timeout
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
---
 drivers/gpu/drm/drm_internal.h |   2 +
 drivers/gpu/drm/drm_ioctl.c    |   2 +
 drivers/gpu/drm/drm_syncobj.c  | 142 +++++++++++++++++++++++++++++++++++++++++
 include/uapi/drm/drm.h         |  12 ++++
 4 files changed, 158 insertions(+)
diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 068b685..f3c4e57 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -163,3 +163,5 @@ int drm_syncobj_handle_to_fd_ioctl(struct drm_device *dev, void *data,
				   struct drm_file *file_private);
 int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
				   struct drm_file *file_private);
+int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
+			   struct drm_file *file_private);
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index f7f150e..310f3ec 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -657,6 +657,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
		      DRM_UNLOCKED|DRM_RENDER_ALLOW),
	DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE, drm_syncobj_fd_to_handle_ioctl,
		      DRM_UNLOCKED|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
+		      DRM_UNLOCKED|DRM_RENDER_ALLOW),
 };
 #define DRM_CORE_IOCTL_COUNT	ARRAY_SIZE( drm_ioctls )
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 7fdf7e8..2f14e38 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1,5 +1,7 @@
 /*
  * Copyright 2017 Red Hat
+ * Parts ported from amdgpu (fence wait code).
+ * Copyright 2016 Advanced Micro Devices, Inc.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
  * copy of this software and associated documentation files (the "Software"),
@@ -31,6 +33,9 @@
  * that contain an optional fence. The fence can be updated with a new
  * fence, or be NULL.
  *
+ * syncobj's can be waited upon, where it will wait for the underlying
+ * fence.
+ *
  * syncobj's can be export to fd's and back, these fd's are opaque and
  * have no other use case, except passing the syncobj between processes.
  *
@@ -471,3 +476,140 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
 	return drm_syncobj_fd_to_handle(file_private, args->fd, &args->handle);
 }
+
+/**
+ * drm_timeout_abs_to_jiffies - calculate jiffies timeout from absolute value
+ *
+ * @timeout_nsec: timeout nsec component in ns, 0 for poll
+ *
+ * Calculate the timeout in jiffies from an absolute time in sec/nsec.
+ */
+static signed long drm_timeout_abs_to_jiffies(int64_t timeout_nsec)
+{
+	ktime_t abs_timeout, now;
+	u64 timeout_ns, timeout_jiffies64;
+
+	/* make 0 timeout means poll - absolute 0 doesn't seem valid */
+	if (timeout_nsec == 0)
+		return 0;
+
+	abs_timeout = ns_to_ktime(timeout_nsec);
+	now = ktime_get();
+
+	if (!ktime_after(abs_timeout, now))
+		return 0;
+
+	timeout_ns = ktime_to_ns(ktime_sub(abs_timeout, now));
+
+	timeout_jiffies64 = nsecs_to_jiffies64(timeout_ns);
+	/*  clamp timeout to avoid infinite timeout */
+	if (timeout_jiffies64 >= MAX_SCHEDULE_TIMEOUT - 1)
+		return MAX_SCHEDULE_TIMEOUT - 1;
+
+	return timeout_jiffies64 + 1;
+}
+
+static int drm_syncobj_wait_fences(struct drm_device *dev,
+				   struct drm_file *file_private,
+				   struct drm_syncobj_wait *wait,
+				   struct dma_fence **fences)
+{
+	signed long timeout = drm_timeout_abs_to_jiffies(wait->timeout_nsec);
+	signed long ret = 0;
+	uint32_t first = ~0;
+
+	if (wait->flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) {
+		uint32_t i;
+		for (i = 0; i < wait->count_handles; i++) {
+			ret = dma_fence_wait_timeout(fences[i], true, timeout);
+
+			/* Various dma_fence wait callbacks will return
+			 * ENOENT to indicate that the fence has already
+			 * been signaled.  We need to sanitize this to 0 so
+			 * we don't return early and the client doesn't see
+			 * an unexpected error.
+			 */
+			if (ret == -ENOENT)
+				ret = 0;
+
+			if (ret < 0)
+				return ret;
+			if (ret == 0)
+				break;
+			timeout = ret;
+		}
+		first = 0;
+	} else {
+		ret = dma_fence_wait_any_timeout(fences,
+						 wait->count_handles,
+						 true, timeout,
+						 &first);
+	}
+
+	if (ret < 0)
+		return ret;
+
+	wait->first_signaled = first;
+	if (ret == 0)
+		return -ETIME;
+	return 0;
+}
+
+int
+drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
+		       struct drm_file *file_private)
+{
+	struct drm_syncobj_wait *args = data;
+	uint32_t *handles;
+	struct dma_fence **fences;
+	int ret = 0;
+	uint32_t i;
+
+	if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+		return -ENODEV;
+
+	if (args->flags != 0 && args->flags != DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL)
+		return -EINVAL;
+
+	if (args->count_handles == 0)
+		return -EINVAL;
+
+	/* Get the handles from userspace */
+	handles = kmalloc_array(args->count_handles, sizeof(uint32_t),
+				GFP_KERNEL);
+	if (handles == NULL)
+		return -ENOMEM;
+
+	if (copy_from_user(handles,
+			   u64_to_user_ptr(args->handles),
+			   sizeof(uint32_t) * args->count_handles)) {
+		ret = -EFAULT;
+		goto err_free_handles;
+	}
+
+	fences = kcalloc(args->count_handles,
+			 sizeof(struct dma_fence *), GFP_KERNEL);
+	if (!fences) {
+		ret = -ENOMEM;
+		goto err_free_handles;
+	}
+
+	for (i = 0; i < args->count_handles; i++) {
+		ret = drm_syncobj_find_fence(file_private, handles[i],
+					     &fences[i]);
+		if (ret)
+			goto err_free_fence_array;
+	}
+
+	ret = drm_syncobj_wait_fences(dev, file_private,
+				      args, fences);
+
+err_free_fence_array:
+	for (i = 0; i < args->count_handles; i++)
+		dma_fence_put(fences[i]);
+	kfree(fences);
+err_free_handles:
+	kfree(handles);
+
+	return ret;
+}
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 101593a..0757c1a 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -718,6 +718,17 @@ struct drm_syncobj_handle {
 	__u32 pad;
 };

+#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL (1 << 0)
+struct drm_syncobj_wait {
+	__u64 handles;
+	/* absolute timeout */
+	__s64 timeout_nsec;
+	__u32 count_handles;
+	__u32 flags;
+	__u32 first_signaled; /* only valid when not waiting all */
+	__u32 pad;
+};
+
 #if defined(__cplusplus)
 }
 #endif
@@ -840,6 +851,7 @@ extern "C" {
 #define DRM_IOCTL_SYNCOBJ_DESTROY	DRM_IOWR(0xC0, struct drm_syncobj_destroy)
 #define DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD	DRM_IOWR(0xC1, struct drm_syncobj_handle)
 #define DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE	DRM_IOWR(0xC2, struct drm_syncobj_handle)
+#define DRM_IOCTL_SYNCOBJ_WAIT		DRM_IOWR(0xC3, struct drm_syncobj_wait)
 /**
  * Device specific ioctls should only be in their respective headers
dma_fence_wait_any_timeout only relies on two things to work correctly:
1) The SIGNALED flag to be set upon fence signal
2) The callback list to get called upon fence signal
Both of these are part of the core dma-fence API so it should work fine even with a non-default wait function.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
---
 drivers/dma-buf/dma-fence.c | 5 -----
 1 file changed, 5 deletions(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 57da14c..1218692 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -501,11 +501,6 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
 	for (i = 0; i < count; ++i) {
 		struct dma_fence *fence = fences[i];
-		if (fence->ops->wait != dma_fence_default_wait) {
-			ret = -EINVAL;
-			goto fence_rm_cb;
-		}
-
 		cb[i].task = current;
 		if (dma_fence_add_callback(fence, &cb[i].base,
					   dma_fence_default_wait_cb)) {
This just resets the dma_fence to NULL so the syncobj looks as if it has never been signaled. This will be useful once we add the new wait API allowing waits with "submit and signal" behavior.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
---
 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_ioctl.c    |  2 ++
 drivers/gpu/drm/drm_syncobj.c  | 24 ++++++++++++++++++++++++
 include/uapi/drm/drm.h         |  6 ++++++
 4 files changed, 34 insertions(+)
diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index f3c4e57..6e8bc4d 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -165,3 +165,5 @@ int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
				   struct drm_file *file_private);
 int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
			   struct drm_file *file_private);
+int drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
+			    struct drm_file *file_private);
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 310f3ec..1acf6ac 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -659,6 +659,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
		      DRM_UNLOCKED|DRM_RENDER_ALLOW),
	DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
		      DRM_UNLOCKED|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_RESET, drm_syncobj_reset_ioctl,
+		      DRM_UNLOCKED|DRM_RENDER_ALLOW),
 };
 #define DRM_CORE_IOCTL_COUNT	ARRAY_SIZE( drm_ioctls )
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 2f14e38..25807f5 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -613,3 +613,27 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
 	return ret;
 }
+
+int
+drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
+			struct drm_file *file_private)
+{
+	struct drm_syncobj_reset *args = data;
+	struct drm_syncobj *syncobj;
+
+	if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+		return -ENODEV;
+
+	if (args->flags != 0)
+		return -EINVAL;
+
+	syncobj = drm_syncobj_find(file_private, args->handle);
+	if (!syncobj)
+		return -ENOENT;
+
+	drm_syncobj_replace_fence(syncobj, NULL);
+
+	drm_syncobj_put(syncobj);
+
+	return 0;
+}
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 0757c1a..4b301b4 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -729,6 +729,11 @@ struct drm_syncobj_wait {
 	__u32 pad;
 };
+struct drm_syncobj_reset {
+	__u32 handle;
+	__u32 flags;
+};
+
 #if defined(__cplusplus)
 }
 #endif
@@ -852,6 +857,7 @@ extern "C" {
 #define DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD	DRM_IOWR(0xC1, struct drm_syncobj_handle)
 #define DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE	DRM_IOWR(0xC2, struct drm_syncobj_handle)
 #define DRM_IOCTL_SYNCOBJ_WAIT		DRM_IOWR(0xC3, struct drm_syncobj_wait)
+#define DRM_IOCTL_SYNCOBJ_RESET		DRM_IOWR(0xC4, struct drm_syncobj_reset)
 /**
  * Device specific ioctls should only be in their respective headers
It is useful in certain circumstances to know when the fence is replaced in a syncobj. Specifically, it may be useful to know when the fence goes from NULL to something valid. This does make syncobj_replace_fence a little more expensive because it has to take a lock, but in the common case where there is no callback list it spends only a very short time inside the lock.
Signed-off-by: Jason Ekstrand jason@jlekstrand.net --- drivers/gpu/drm/drm_syncobj.c | 51 ++++++++++++++++++++++++++++++++++++++++++- include/drm/drm_syncobj.h | 33 +++++++++++++++++++++++++++- 2 files changed, 82 insertions(+), 2 deletions(-)
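The single-shot "fire callbacks on fence replacement" pattern the patch adds can be illustrated with a small userspace model. This is only a sketch under stated assumptions: all `model_*` names are hypothetical, the kernel code uses a spinlock and `struct list_head` rather than a bare singly linked list, and locking is omitted here entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of the syncobj callback mechanism: a sync
 * object holds an optional fence pointer plus an intrusive list of
 * callbacks that fire exactly once, and are unlinked, when the fence is
 * replaced. */

struct model_syncobj;

struct model_cb {
	struct model_cb *next;
	void (*func)(struct model_syncobj *obj, struct model_cb *cb);
};

struct model_syncobj {
	void *fence;             /* stands in for struct dma_fence * */
	struct model_cb *cb_list;
};

static int model_fired;      /* counts demo-callback invocations */

static void model_count_cb(struct model_syncobj *obj, struct model_cb *cb)
{
	(void)obj;
	(void)cb;
	model_fired++;
}

static void model_add_callback(struct model_syncobj *obj, struct model_cb *cb,
			       void (*func)(struct model_syncobj *,
					    struct model_cb *))
{
	cb->func = func;
	cb->next = obj->cb_list;
	obj->cb_list = cb;
}

/* Replace the fence and fire every pending callback exactly once,
 * emptying the list -- the same single-shot semantics as the patch. */
static void model_replace_fence(struct model_syncobj *obj, void *fence)
{
	struct model_cb *cb = obj->cb_list;

	obj->fence = fence;
	obj->cb_list = NULL;
	while (cb) {
		struct model_cb *next = cb->next;

		cb->func(obj, cb);
		cb = next;
	}
}
```

Note that a second replace does not re-fire an already-consumed callback; a waiter that still cares has to re-register, which is exactly what the wait paths later in the series do.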
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index 25807f5..510dfc2 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -96,6 +96,46 @@ struct dma_fence *drm_syncobj_fence_get(struct drm_syncobj *syncobj) } EXPORT_SYMBOL(drm_syncobj_fence_get);
+static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj, + struct drm_syncobj_cb *cb, + drm_syncobj_func_t func) +{ + cb->func = func; + list_add_tail(&cb->node, &syncobj->cb_list); +} + +/** + * drm_syncobj_add_callback - adds a callback to syncobj::cb_list + * @syncobj: Sync object to which to add the callback + * @cb: Callback to add + * @func: Func to use when initializing the drm_syncobj_cb struct + * + * This adds a callback to be called the next time the fence is replaced. + */ +void drm_syncobj_add_callback(struct drm_syncobj *syncobj, + struct drm_syncobj_cb *cb, + drm_syncobj_func_t func) +{ + spin_lock(&syncobj->lock); + drm_syncobj_add_callback_locked(syncobj, cb, func); + spin_unlock(&syncobj->lock); +} +EXPORT_SYMBOL(drm_syncobj_add_callback); + +/** + * drm_syncobj_remove_callback - removes a callback from syncobj::cb_list + * @syncobj: Sync object from which to remove the callback + * @cb: Callback to remove + */ +void drm_syncobj_remove_callback(struct drm_syncobj *syncobj, + struct drm_syncobj_cb *cb) +{ + spin_lock(&syncobj->lock); + list_del_init(&cb->node); + spin_unlock(&syncobj->lock); +} +EXPORT_SYMBOL(drm_syncobj_remove_callback); + /** * drm_syncobj_replace_fence - replace fence in a sync object. * @file_private: drm file private pointer. @@ -108,13 +148,21 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, struct dma_fence *fence) { struct dma_fence *old_fence = NULL; + struct drm_syncobj_cb *cur, *tmp;
if (fence) dma_fence_get(fence);
spin_lock(&syncobj->lock); + old_fence = syncobj->fence; syncobj->fence = fence; + + list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) { + list_del_init(&cur->node); + cur->func(syncobj, cur); + } + spin_unlock(&syncobj->lock);
dma_fence_put(old_fence); @@ -151,7 +199,7 @@ void drm_syncobj_free(struct kref *kref) struct drm_syncobj *syncobj = container_of(kref, struct drm_syncobj, refcount); - dma_fence_put(syncobj->fence); + drm_syncobj_replace_fence(syncobj, NULL); kfree(syncobj); } EXPORT_SYMBOL(drm_syncobj_free); @@ -167,6 +215,7 @@ static int drm_syncobj_create(struct drm_file *file_private, return -ENOMEM;
kref_init(&syncobj->refcount); + INIT_LIST_HEAD(&syncobj->cb_list); spin_lock_init(&syncobj->lock);
idr_preload(GFP_KERNEL); diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h index 1c81658..a81af76 100644 --- a/include/drm/drm_syncobj.h +++ b/include/drm/drm_syncobj.h @@ -28,6 +28,8 @@
#include "linux/dma-fence.h"
+struct drm_syncobj_cb; + /** * struct drm_syncobj - sync object. * @@ -46,8 +48,13 @@ struct drm_syncobj { */ struct dma_fence *fence; /** + * @cb_list: + * List of callbacks to call when the fence gets replaced + */ + struct list_head cb_list; + /** * @spinlock: - * locks fence. + * locks fence and cb_list. */ spinlock_t lock; /** @@ -57,6 +64,25 @@ struct drm_syncobj { struct file *file; };
+typedef void (*drm_syncobj_func_t)(struct drm_syncobj *syncobj, + struct drm_syncobj_cb *cb); + +/** + * struct drm_syncobj_cb - callback for drm_syncobj_add_callback + * @node: used by drm_syncobj_add_callback to append this struct to + * syncobj::cb_list + * @func: drm_syncobj_func_t to call + * + * This struct will be initialized by drm_syncobj_add_callback; additional + * data can be passed along by embedding drm_syncobj_cb in another struct. + * The callback will get called the next time drm_syncobj_replace_fence is + * called. + */ +struct drm_syncobj_cb { + struct list_head node; + drm_syncobj_func_t func; +}; + void drm_syncobj_free(struct kref *kref);
/** @@ -85,6 +111,11 @@ drm_syncobj_put(struct drm_syncobj *obj) struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private, u32 handle); struct dma_fence *drm_syncobj_fence_get(struct drm_syncobj *syncobj); +void drm_syncobj_add_callback(struct drm_syncobj *syncobj, + struct drm_syncobj_cb *cb, + drm_syncobj_func_t func); +void drm_syncobj_remove_callback(struct drm_syncobj *syncobj, + struct drm_syncobj_cb *cb); void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, struct dma_fence *fence); int drm_syncobj_find_fence(struct drm_file *file_private,
Vulkan VkFence semantics require that the application be able to perform a CPU wait on work which may not yet have been submitted. This is perfectly safe because the CPU wait has a timeout which will get triggered eventually if no work is ever submitted. This behavior is advantageous for multi-threaded workloads because, so long as all of the threads agree on what fences to use up-front, you don't have the extra cross-thread synchronization cost of thread A telling thread B that it has submitted its dependent work and thread B is now free to wait.
Within a single process, this can be implemented in the userspace driver by doing exactly the same kind of tracking the app would have to do using POSIX condition variables or similar. However, in order for this to work cross-process (as is required by VK_KHR_external_fence), we need to handle this in the kernel.
This commit adds a WAIT_FOR_SUBMIT flag to DRM_IOCTL_SYNCOBJ_WAIT which instructs the IOCTL to wait for the syncobj to have a non-null fence and then wait on the fence. Combined with DRM_IOCTL_SYNCOBJ_RESET, you can easily get the Vulkan behavior.
Signed-off-by: Jason Ekstrand jason@jlekstrand.net --- drivers/gpu/drm/drm_syncobj.c | 319 ++++++++++++++++++++++++++++++++++++++---- include/uapi/drm/drm.h | 1 + 2 files changed, 293 insertions(+), 27 deletions(-)
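The wait semantics the commit message describes can be sketched as a small userspace model. This is a hypothetical simulation, not kernel code: `model_syncobj_wait` is an invented name, times are abstract ticks, and the return convention loosely mirrors the patch (-EINVAL for an empty syncobj without the flag, 0 for a timeout, and the remaining time on success, like the remaining-jiffies return of dma_fence_wait_timeout).

```c
#include <assert.h>
#include <errno.h>

/* Model of a single-syncobj wait.
 *
 * submit_time:  tick at which a fence appears in the syncobj; 0 means a
 *               fence is already present at entry, negative means no work
 *               is ever submitted.
 * signal_time:  tick at which that fence signals (>= submit_time).
 * timeout:      total ticks the caller is willing to wait.
 * wait_for_submit: models DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT.
 */
static long model_syncobj_wait(long submit_time, long signal_time,
			       long timeout, int wait_for_submit)
{
	int has_fence_at_entry = (submit_time == 0);

	/* Without the flag, waiting on a fence-less syncobj is an error. */
	if (!has_fence_at_entry && !wait_for_submit)
		return -EINVAL;

	/* With the flag, first wait for a fence to show up, then wait on
	 * it; if nothing is ever submitted, the timeout fires. */
	if (submit_time < 0 || signal_time > timeout)
		return 0;	/* timed out */

	return timeout - signal_time;	/* signaled; time left */
}
```

This is exactly the "safe because the CPU wait has a timeout" property: a fence that never shows up degrades to an ordinary timeout rather than an error or a hang.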
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index 510dfc2..ecfc43b 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -51,6 +51,7 @@ #include <linux/fs.h> #include <linux/anon_inodes.h> #include <linux/sync_file.h> +#include <linux/sched/signal.h>
#include "drm_internal.h" #include <drm/drm_syncobj.h> @@ -104,6 +105,27 @@ static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj, list_add_tail(&cb->node, &syncobj->cb_list); }
+static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj, + struct dma_fence **fence, + struct drm_syncobj_cb *cb, + drm_syncobj_func_t func) +{ + int ret; + + spin_lock(&syncobj->lock); + if (syncobj->fence) { + *fence = dma_fence_get(syncobj->fence); + ret = 1; + } else { + *fence = NULL; + drm_syncobj_add_callback_locked(syncobj, cb, func); + ret = 0; + } + spin_unlock(&syncobj->lock); + + return ret; +} + /** * drm_syncobj_add_callback - adds a callback to syncobj::cb_list * @syncobj: Sync object to which to add the callback @@ -526,6 +548,255 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data, &args->handle); }
+static int drm_syncobj_signaled(struct drm_syncobj *syncobj, + uint32_t wait_flags) +{ + struct dma_fence *fence; + int ret; + + fence = drm_syncobj_fence_get(syncobj); + if (!fence) { + if (wait_flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) + return 0; + else + return -EINVAL; + } + + ret = dma_fence_is_signaled(fence) ? 1 : 0; + + dma_fence_put(fence); + + return ret; +} + +struct syncobj_get_wait_cb { + struct drm_syncobj_cb base; + struct task_struct *task; + struct dma_fence **fence; +}; + +static void syncobj_get_wait_cb_func(struct drm_syncobj *syncobj, + struct drm_syncobj_cb *wait_cb) +{ + struct syncobj_get_wait_cb *wait = + container_of(wait_cb, struct syncobj_get_wait_cb, base); + + /* This happens inside the syncobj lock */ + *wait->fence = dma_fence_get(syncobj->fence); + wake_up_process(wait->task); +} + +static signed long +drm_syncobj_get_fence_wait_timeout(struct drm_syncobj *syncobj, + struct dma_fence **fence, + signed long timeout) +{ + struct syncobj_get_wait_cb cb; + signed long ret; + + if (timeout == 0) { + *fence = drm_syncobj_fence_get(syncobj); + if (*fence) + return 1; + return 0; + } + + cb.task = current; + cb.fence = fence; + + if (drm_syncobj_fence_get_or_add_callback(syncobj, fence, &cb.base, + syncobj_get_wait_cb_func)) { + /* We got the fence immediately. No callback was installed. */ + return timeout; + } + + ret = timeout; + while (ret > 0) { + set_current_state(TASK_INTERRUPTIBLE); + + if (*fence) + break; + + ret = schedule_timeout(ret); + + if (ret > 0 && signal_pending(current)) + ret = -ERESTARTSYS; + } + + __set_current_state(TASK_RUNNING); + + drm_syncobj_remove_callback(syncobj, &cb.base); + + /* It's possible that we got the fence after timing out or getting + * interrupted but before removing the callback. 
+ */ + if (ret <= 0 && *fence) + ret = 1; + + return ret; +} + +static signed long drm_syncobj_wait_timeout(struct drm_syncobj *syncobj, + uint32_t flags, + signed long timeout) +{ + struct dma_fence *fence; + signed long ret; + + if (timeout == 0) + return drm_syncobj_signaled(syncobj, flags); + + if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { + ret = drm_syncobj_get_fence_wait_timeout(syncobj, &fence, + timeout); + if (ret <= 0) + return ret; + timeout = ret; + } else { + fence = drm_syncobj_fence_get(syncobj); + } + + if (!fence) + return -EINVAL; + + ret = dma_fence_wait_timeout(fence, true, timeout); + + /* Various dma_fence wait callbacks will return ENOENT to indicate + * that the fence has already been signaled. We sanitize this as + * returning with all of the time left. + */ + if (ret == -ENOENT) + ret = timeout; + + dma_fence_put(fence); + + return ret; +} + +struct syncobj_wait_any_entry { + struct task_struct *task; + struct dma_fence *fence; + struct dma_fence_cb fence_cb; + struct drm_syncobj_cb syncobj_cb; +}; + +static void syncobj_wait_any_fence_func(struct dma_fence *fence, + struct dma_fence_cb *cb) +{ + struct syncobj_wait_any_entry *wait = + container_of(cb, struct syncobj_wait_any_entry, fence_cb); + + wake_up_process(wait->task); +} + +static void syncobj_wait_any_syncobj_func(struct drm_syncobj *syncobj, + struct drm_syncobj_cb *cb) +{ + struct syncobj_wait_any_entry *wait = + container_of(cb, struct syncobj_wait_any_entry, syncobj_cb); + + /* This happens inside the syncobj lock */ + wait->fence = dma_fence_get(syncobj->fence); + wake_up_process(wait->task); +} + +static signed long drm_syncobj_wait_any_timeout(struct drm_syncobj **syncobjs, + uint32_t count, + uint32_t flags, + signed long timeout, + uint32_t *idx) +{ + struct syncobj_wait_any_entry *entries; + struct dma_fence *fence; + signed long ret; + uint32_t i; + + if (timeout == 0) { + for (i = 0; i < count; ++i) { + ret = drm_syncobj_signaled(syncobjs[i], flags); + if (ret 
< 0) + return ret; + if (ret == 0) + continue; + if (idx) + *idx = i; + return 1; + } + + return 0; + } + + entries = kcalloc(count, sizeof(*entries), GFP_KERNEL); + if (!entries) + return -ENOMEM; + + for (i = 0; i < count; ++i) { + entries[i].task = current; + + if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { + drm_syncobj_fence_get_or_add_callback(syncobjs[i], + &entries[i].fence, + &entries[i].syncobj_cb, + syncobj_wait_any_syncobj_func); + } else { + entries[i].fence = drm_syncobj_fence_get(syncobjs[i]); + if (!entries[i].fence) { + ret = -EINVAL; + goto err_cleanup_entries; + } + } + } + + ret = timeout; + while (ret > 0) { + for (i = 0; i < count; ++i) { + fence = entries[i].fence; + if (!fence) + continue; + + if (dma_fence_is_signaled(fence)) { + if (idx) + *idx = i; + goto done_waiting; + } + + if (!entries[i].fence_cb.func && + dma_fence_add_callback(fence, + &entries[i].fence_cb, + syncobj_wait_any_fence_func)) { + /* The fence is already signaled */ + if (idx) + *idx = i; + goto done_waiting; + } + } + + set_current_state(TASK_INTERRUPTIBLE); + + ret = schedule_timeout(ret); + + if (ret > 0 && signal_pending(current)) + ret = -ERESTARTSYS; + } + +done_waiting: + __set_current_state(TASK_RUNNING); + +err_cleanup_entries: + for (i = 0; i < count; ++i) { + if (entries[i].syncobj_cb.func) + drm_syncobj_remove_callback(syncobjs[i], + &entries[i].syncobj_cb); + if (entries[i].fence_cb.func) + dma_fence_remove_callback(entries[i].fence, + &entries[i].fence_cb); + dma_fence_put(entries[i].fence); + } + kfree(entries); + + return ret; +} + /** * drm_timeout_abs_to_jiffies - calculate jiffies timeout from absolute value * @@ -561,7 +832,7 @@ static signed long drm_timeout_abs_to_jiffies(int64_t timeout_nsec) static int drm_syncobj_wait_fences(struct drm_device *dev, struct drm_file *file_private, struct drm_syncobj_wait *wait, - struct dma_fence **fences) + struct drm_syncobj **syncobjs) { signed long timeout = 
drm_timeout_abs_to_jiffies(wait->timeout_nsec); signed long ret = 0; @@ -570,17 +841,9 @@ static int drm_syncobj_wait_fences(struct drm_device *dev, if (wait->flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) { uint32_t i; for (i = 0; i < wait->count_handles; i++) { - ret = dma_fence_wait_timeout(fences[i], true, timeout); - - /* Various dma_fence wait callbacks will return - * ENOENT to indicate that the fence has already - * been signaled. We need to sanitize this to 0 so - * we don't return early and the client doesn't see - * an unexpected error. - */ - if (ret == -ENOENT) - ret = 0; - + ret = drm_syncobj_wait_timeout(syncobjs[i], + wait->flags, + timeout); if (ret < 0) return ret; if (ret == 0) @@ -589,10 +852,10 @@ static int drm_syncobj_wait_fences(struct drm_device *dev, } first = 0; } else { - ret = dma_fence_wait_any_timeout(fences, - wait->count_handles, - true, timeout, - &first); + ret = drm_syncobj_wait_any_timeout(syncobjs, + wait->count_handles, + wait->flags, + timeout, &first); }
if (ret < 0) @@ -610,14 +873,15 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data, { struct drm_syncobj_wait *args = data; uint32_t *handles; - struct dma_fence **fences; + struct drm_syncobj **syncobjs; int ret = 0; uint32_t i;
if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ)) return -ENODEV;
- if (args->flags != 0 && args->flags != DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) + if (args->flags & ~(DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL | + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT)) return -EINVAL;
if (args->count_handles == 0) @@ -636,27 +900,28 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data, goto err_free_handles; }
- fences = kcalloc(args->count_handles, - sizeof(struct dma_fence *), GFP_KERNEL); - if (!fences) { + syncobjs = kcalloc(args->count_handles, + sizeof(struct drm_syncobj *), GFP_KERNEL); + if (!syncobjs) { ret = -ENOMEM; goto err_free_handles; }
for (i = 0; i < args->count_handles; i++) { - ret = drm_syncobj_find_fence(file_private, handles[i], - &fences[i]); - if (ret) + syncobjs[i] = drm_syncobj_find(file_private, handles[i]); + if (!syncobjs[i]) { + ret = -ENOENT; goto err_free_fence_array; + } }
ret = drm_syncobj_wait_fences(dev, file_private, - args, fences); + args, syncobjs);
err_free_fence_array: for (i = 0; i < args->count_handles; i++) - dma_fence_put(fences[i]); - kfree(fences); + drm_syncobj_put(syncobjs[i]); + kfree(syncobjs); err_free_handles: kfree(handles);
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h index 4b301b4..f8ec8fe 100644 --- a/include/uapi/drm/drm.h +++ b/include/uapi/drm/drm.h @@ -719,6 +719,7 @@ struct drm_syncobj_handle { };
#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL (1 << 0) +#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT (1 << 1) struct drm_syncobj_wait { __u64 handles; /* absolute timeout */
Vulkan VkFence semantics require that the application be able to perform a CPU wait on work which may not yet have been submitted. This is perfectly safe because the CPU wait has a timeout which will get triggered eventually if no work is ever submitted. This behavior is advantageous for multi-threaded workloads because, so long as all of the threads agree on what fences to use up-front, you don't have the extra cross-thread synchronization cost of thread A telling thread B that it has submitted its dependent work and thread B is now free to wait.
Within a single process, this can be implemented in the userspace driver by doing exactly the same kind of tracking the app would have to do using POSIX condition variables or similar. However, in order for this to work cross-process (as is required by VK_KHR_external_fence), we need to handle this in the kernel.
This commit adds a WAIT_FOR_SUBMIT flag to DRM_IOCTL_SYNCOBJ_WAIT which instructs the IOCTL to wait for the syncobj to have a non-null fence and then wait on the fence. Combined with DRM_IOCTL_SYNCOBJ_RESET, you can easily get the Vulkan behavior.
v2: - Fix a bug in the invalid syncobj error path - Unify the wait-all and wait-any cases
Signed-off-by: Jason Ekstrand jason@jlekstrand.net Cc: Dave Airlie airlied@redhat.com ---
I realized today (this is what happens when you sleep) that it takes almost no work to make the wait-any path also handle wait-all. By unifying the two, I deleted over a hundred lines of code from the implementation.
drivers/gpu/drm/drm_syncobj.c | 243 ++++++++++++++++++++++++++++++++++-------- include/uapi/drm/drm.h | 1 + 2 files changed, 199 insertions(+), 45 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index 510dfc2..5e7f654 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -51,6 +51,7 @@ #include <linux/fs.h> #include <linux/anon_inodes.h> #include <linux/sync_file.h> +#include <linux/sched/signal.h>
#include "drm_internal.h" #include <drm/drm_syncobj.h> @@ -104,6 +105,27 @@ static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj, list_add_tail(&cb->node, &syncobj->cb_list); }
+static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj, + struct dma_fence **fence, + struct drm_syncobj_cb *cb, + drm_syncobj_func_t func) +{ + int ret; + + spin_lock(&syncobj->lock); + if (syncobj->fence) { + *fence = dma_fence_get(syncobj->fence); + ret = 1; + } else { + *fence = NULL; + drm_syncobj_add_callback_locked(syncobj, cb, func); + ret = 0; + } + spin_unlock(&syncobj->lock); + + return ret; +} + /** * drm_syncobj_add_callback - adds a callback to syncobj::cb_list * @syncobj: Sync object to which to add the callback @@ -526,6 +548,159 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data, &args->handle); }
+static int drm_syncobj_signaled(struct drm_syncobj *syncobj, + uint32_t wait_flags) +{ + struct dma_fence *fence; + int ret; + + fence = drm_syncobj_fence_get(syncobj); + if (!fence) { + if (wait_flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) + return 0; + else + return -EINVAL; + } + + ret = dma_fence_is_signaled(fence) ? 1 : 0; + + dma_fence_put(fence); + + return ret; +} + +struct syncobj_wait_entry { + struct task_struct *task; + struct dma_fence *fence; + struct dma_fence_cb fence_cb; + struct drm_syncobj_cb syncobj_cb; +}; + +static void syncobj_wait_fence_func(struct dma_fence *fence, + struct dma_fence_cb *cb) +{ + struct syncobj_wait_entry *wait = + container_of(cb, struct syncobj_wait_entry, fence_cb); + + wake_up_process(wait->task); +} + +static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj, + struct drm_syncobj_cb *cb) +{ + struct syncobj_wait_entry *wait = + container_of(cb, struct syncobj_wait_entry, syncobj_cb); + + /* This happens inside the syncobj lock */ + wait->fence = dma_fence_get(syncobj->fence); + wake_up_process(wait->task); +} + +static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, + uint32_t count, + uint32_t flags, + signed long timeout, + uint32_t *idx) +{ + struct syncobj_wait_entry *entries; + struct dma_fence *fence; + signed long ret; + uint32_t signaled_count, i; + + if (timeout == 0) { + signaled_count = 0; + for (i = 0; i < count; ++i) { + ret = drm_syncobj_signaled(syncobjs[i], flags); + if (ret < 0) + return ret; + if (ret == 0) + continue; + if (signaled_count == 0 && idx) + *idx = i; + signaled_count++; + } + + if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) + return signaled_count == count ? 1 : 0; + else + return signaled_count > 0 ? 
1 : 0; + } + + entries = kcalloc(count, sizeof(*entries), GFP_KERNEL); + if (!entries) + return -ENOMEM; + + for (i = 0; i < count; ++i) { + entries[i].task = current; + + if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { + drm_syncobj_fence_get_or_add_callback(syncobjs[i], + &entries[i].fence, + &entries[i].syncobj_cb, + syncobj_wait_syncobj_func); + } else { + entries[i].fence = drm_syncobj_fence_get(syncobjs[i]); + if (!entries[i].fence) { + ret = -EINVAL; + goto err_cleanup_entries; + } + } + } + + ret = timeout; + while (ret > 0) { + signaled_count = 0; + for (i = 0; i < count; ++i) { + fence = entries[i].fence; + if (!fence) + continue; + + if (dma_fence_is_signaled(fence) || + (!entries[i].fence_cb.func && + dma_fence_add_callback(fence, + &entries[i].fence_cb, + syncobj_wait_fence_func))) { + /* The fence has been signaled */ + if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) { + signaled_count++; + } else { + if (idx) + *idx = i; + goto done_waiting; + } + } + } + + if ((flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) && + signaled_count == count) + goto done_waiting; + + set_current_state(TASK_INTERRUPTIBLE); + + ret = schedule_timeout(ret); + + if (ret > 0 && signal_pending(current)) + ret = -ERESTARTSYS; + } + +done_waiting: + __set_current_state(TASK_RUNNING); + +err_cleanup_entries: + for (i = 0; i < count; ++i) { + if (entries[i].syncobj_cb.func) + drm_syncobj_remove_callback(syncobjs[i], + &entries[i].syncobj_cb); + if (entries[i].fence_cb.func) + dma_fence_remove_callback(entries[i].fence, + &entries[i].fence_cb); + dma_fence_put(entries[i].fence); + } + kfree(entries); + + return ret; +} + /** * drm_timeout_abs_to_jiffies - calculate jiffies timeout from absolute value * @@ -558,43 +733,19 @@ static signed long drm_timeout_abs_to_jiffies(int64_t timeout_nsec) return timeout_jiffies64 + 1; }
-static int drm_syncobj_wait_fences(struct drm_device *dev, - struct drm_file *file_private, - struct drm_syncobj_wait *wait, - struct dma_fence **fences) +static int drm_syncobj_array_wait(struct drm_device *dev, + struct drm_file *file_private, + struct drm_syncobj_wait *wait, + struct drm_syncobj **syncobjs) { signed long timeout = drm_timeout_abs_to_jiffies(wait->timeout_nsec); signed long ret = 0; uint32_t first = ~0;
- if (wait->flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) { - uint32_t i; - for (i = 0; i < wait->count_handles; i++) { - ret = dma_fence_wait_timeout(fences[i], true, timeout); - - /* Various dma_fence wait callbacks will return - * ENOENT to indicate that the fence has already - * been signaled. We need to sanitize this to 0 so - * we don't return early and the client doesn't see - * an unexpected error. - */ - if (ret == -ENOENT) - ret = 0; - - if (ret < 0) - return ret; - if (ret == 0) - break; - timeout = ret; - } - first = 0; - } else { - ret = dma_fence_wait_any_timeout(fences, - wait->count_handles, - true, timeout, - &first); - } - + ret = drm_syncobj_array_wait_timeout(syncobjs, + wait->count_handles, + wait->flags, + timeout, &first); if (ret < 0) return ret;
@@ -610,14 +761,15 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data, { struct drm_syncobj_wait *args = data; uint32_t *handles; - struct dma_fence **fences; + struct drm_syncobj **syncobjs; int ret = 0; uint32_t i;
if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ)) return -ENODEV;
- if (args->flags != 0 && args->flags != DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) + if (args->flags & ~(DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL | + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT)) return -EINVAL;
if (args->count_handles == 0) @@ -636,27 +788,28 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data, goto err_free_handles; }
- fences = kcalloc(args->count_handles, - sizeof(struct dma_fence *), GFP_KERNEL); - if (!fences) { + syncobjs = kcalloc(args->count_handles, + sizeof(struct drm_syncobj *), GFP_KERNEL); + if (!syncobjs) { ret = -ENOMEM; goto err_free_handles; }
for (i = 0; i < args->count_handles; i++) { - ret = drm_syncobj_find_fence(file_private, handles[i], - &fences[i]); - if (ret) + syncobjs[i] = drm_syncobj_find(file_private, handles[i]); + if (!syncobjs[i]) { + ret = -ENOENT; goto err_free_fence_array; + } }
- ret = drm_syncobj_wait_fences(dev, file_private, - args, fences); + ret = drm_syncobj_array_wait(dev, file_private, + args, syncobjs);
err_free_fence_array: - for (i = 0; i < args->count_handles; i++) - dma_fence_put(fences[i]); - kfree(fences); + while (i-- > 0) + drm_syncobj_put(syncobjs[i]); + kfree(syncobjs); err_free_handles: kfree(handles);
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h index 4b301b4..f8ec8fe 100644 --- a/include/uapi/drm/drm.h +++ b/include/uapi/drm/drm.h @@ -719,6 +719,7 @@ struct drm_syncobj_handle { };
#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL (1 << 0) +#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT (1 << 1) struct drm_syncobj_wait { __u64 handles; /* absolute timeout */
Quoting Jason Ekstrand (2017-08-09 18:00:54)
This is a bit of the puzzle that is missing.
So we are taking a snapshot here. It looks like this could have been done using a dma_fence_array + dma_fence_proxy for capturing the future fence.
This is the first step in the wait-loop, you set TASK_INTERRUPTIBLE before doing any checks. So that if any state changes whilst you are in the middle of those checks, the schedule_timeout is a nop and you can check again. -Chris
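The ordering Chris describes has a direct userspace analogue: in the kernel you set TASK_INTERRUPTIBLE *before* checking the condition, so a wake-up racing with the check turns schedule_timeout() into a no-op; with pthreads, the equivalent race-free ordering is checking the predicate under the same mutex the signaler holds, and letting pthread_cond_wait() drop the lock and sleep atomically. A minimal sketch (all names hypothetical; in real use submit_thread would run via pthread_create):

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int fence_present;

/* Signaler: flip the predicate and wake waiters, under the lock, so a
 * waiter can never observe "no fence" after the signal has been sent. */
static void *submit_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	fence_present = 1;		/* "fence shows up" */
	pthread_cond_signal(&cond);
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Waiter: check the predicate while holding the lock, then sleep
 * atomically; re-check after every wake-up (spurious wake-ups happen). */
static int wait_for_fence(void)
{
	pthread_mutex_lock(&lock);
	while (!fence_present)
		pthread_cond_wait(&cond, &lock);
	pthread_mutex_unlock(&lock);
	return 1;
}
```

The check-under-lock here plays the same role as set_current_state() before the checks in the kernel loop: there is no window in which a wake-up can be lost between testing the condition and going to sleep.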
On 09.08.2017 at 19:57, Chris Wilson wrote:
Yeah, completely agree.
I would rather drop the approach with the custom wait and try to use wait_event_interruptible here.
As far as I can see that should make the whole patch much cleaner in general.
Regards, Christian.
On Wed, Aug 9, 2017 at 11:25 AM, Christian König deathsimple@vodafone.de wrote:
It's in the previous patch.
I'm not sure what you mean. Is that something in the future fence series that I should be looking at?
Sounds good to me.
--Jason
Quoting Chris Wilson (2017-08-09 18:57:44)
A quick sketch of this idea looks like:
void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, struct dma_fence *fence) { - struct dma_fence *old_fence; + unsigned long flags;
- if (fence) - dma_fence_get(fence); - old_fence = xchg(&syncobj->fence, fence); - - dma_fence_put(old_fence); + spin_lock_irqsave(&syncobj->lock, flags); + dma_fence_replace_proxy(&syncobj->fence, fence); + spin_unlock_irqrestore(&syncobj->lock, flags); }
+int +drm_syncobj_wait_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_private) +{ + struct drm_syncobj_wait *args = data; + struct dma_fence **fences; + struct dma_fence_array *array; + unsigned long timeout; + unsigned int count; + u32 *handles; + int ret = 0; + u32 i; + + BUILD_BUG_ON(sizeof(*fences) < sizeof(*handles)); + + if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ)) + return -ENODEV; + + if (args->flags != 0 && args->flags != DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) + return -EINVAL; + + count = args->count_handles; + if (!count) + return -EINVAL; + + /* Get the handles from userspace */ + fences = kmalloc_array(count, + sizeof(struct dma_fence *), + __GFP_NOWARN | GFP_KERNEL); + if (!fences) + return -ENOMEM; + + handles = (void *)fences + count * (sizeof(*fences) - sizeof(*handles)); + if (copy_from_user(handles, + u64_to_user_ptr(args->handles), + sizeof(u32) * count)) { + ret = -EFAULT; + goto err; + } + + for (i = 0; i < count; i++) { + struct drm_syncobj *s; + + ret = -ENOENT; + s = drm_syncobj_find(file_private, handles[i]); + if (s) { + ret = 0; + spin_lock_irq(&s->lock); + if (!s->fence) { + if (args->flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) + s->fence = dma_fence_create_proxy(); + else + ret = -EINVAL; + } + if (s->fence) + fences[i] = dma_fence_get(s->fence); + spin_unlock_irq(&s->lock); + } + if (ret) { + count = i; + goto err_fences; + } + } + + array = dma_fence_array_create(count, fences, 0, 0, + !(args->flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL)); + if (!array) { + ret = -ENOMEM; + goto err_fences; + } + + timeout = drm_timeout_abs_to_jiffies(args->timeout_nsec); + timeout = dma_fence_wait_timeout(&array->base, true, timeout); + args->first_signaled = array->first_signaled; + dma_fence_put(&array->base); + + return timeout < 0 ? timeout : 0; + +err_fences: + for (i = 0; i < count; i++) + dma_fence_put(fences[i]); +err: + kfree(fences); + return ret; +}
The key advantage is that this translate the ioctl into a dma-fence-array which already has to deal with the mess, sharing the burden. (But it does require a trivial patch to dma-fence-array to record the first signaled fence.)
However, this installs the proxy into syncobj->fence, with the result that any concurrent wait also becomes a WAIT_FOR_SUBMIT. The behaviour of drm_syncobj is then quite inconsistent: sometimes it will wait for a future fence, sometimes it will report an error.
Furthermore, do we want future-fences in the execbuf API? etc. -Chris
On Wed, Aug 9, 2017 at 3:41 PM, Chris Wilson chris@chris-wilson.co.uk wrote:
Yeah, that's not good. I thought about a variety of solutions to try and re-use more core dma_fence code. Ultimately I chose the current one because it takes a snapshot of the syncobjs and then, from that point forward, it's consistent with its snapshot. Nothing I was able to come up with based on core dma_fence wait routines does that.
--Jason
Quoting Jason Ekstrand (2017-08-10 00:53:00)
So if we choose to keep the proxy local to the fence-array and not replace syncobj->fence, we can still reduce this to a plain dma-fence-array + wait.
Pertinent details:
+static void syncobj_notify(struct drm_syncobj *syncobj, struct dma_fence *fence) +{ + struct drm_syncobj_cb *cb, *cn; + unsigned long flags; + + /* This is just an opencoded waitqueue; FIXME! */ + spin_lock_irqsave(&syncobj->lock, flags); + list_for_each_entry_safe(cb, cn, &syncobj->cb_list, link) + cb->func(cb, fence); + INIT_LIST_HEAD(&syncobj->cb_list); + spin_unlock_irqrestore(&syncobj->lock, flags); +} + /** * drm_syncobj_replace_fence - replace fence in a sync object. * @syncobj: Sync object to replace fence in @@ -89,7 +109,10 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
if (fence) dma_fence_get(fence); + old_fence = xchg(&syncobj->fence, fence); + if (!old_fence && fence) + syncobj_notify(syncobj, fence);
dma_fence_put(old_fence); } @@ -124,6 +147,9 @@ void drm_syncobj_free(struct kref *kref) struct drm_syncobj *syncobj = container_of(kref, struct drm_syncobj, refcount); + + syncobj_notify(syncobj, NULL); + dma_fence_put(syncobj->fence); kfree(syncobj); } @@ -140,6 +166,8 @@ static int drm_syncobj_create(struct drm_file *file_private, return -ENOMEM;
kref_init(&syncobj->refcount); + spin_lock_init(&syncobj->lock); + INIT_LIST_HEAD(&syncobj->cb_list);
idr_preload(GFP_KERNEL); spin_lock(&file_private->syncobj_table_lock);
+struct future_fence {
+	struct drm_syncobj_cb base;
+	struct dma_fence **slot;
+};
+
+static void future_fence_cb(struct drm_syncobj_cb *cb, struct dma_fence *fence)
+{
+	struct future_fence *ff = container_of(cb, typeof(*ff), base);
+
+	if (fence)
+		dma_fence_replace_proxy(ff->slot, fence);
+	else
+		dma_fence_signal(*ff->slot);
+
+	kfree(ff);
+}
+
+int
+drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
+		       struct drm_file *file_private)
+{
+	struct drm_syncobj_wait *args = data;
+	struct dma_fence_array *array;
+	struct dma_fence **fences;
+	unsigned int count, i;
+	long timeout;
+	u32 *handles;
+	int ret;
+
+	BUILD_BUG_ON(sizeof(*fences) < sizeof(*handles));
+
+	if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+		return -ENODEV;
+
+	if (args->flags & ~(DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL |
+			    DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT))
+		return -EINVAL;
+
+	count = args->count_handles;
+	if (!count)
+		return -EINVAL;
+
+	/* Get the handles from userspace */
+	fences = kmalloc_array(count,
+			       sizeof(struct dma_fence *),
+			       __GFP_NOWARN | GFP_KERNEL);
+	if (!fences)
+		return -ENOMEM;
+
+	handles = (void *)fences + count * (sizeof(*fences) - sizeof(*handles));
+	if (copy_from_user(handles,
+			   u64_to_user_ptr(args->handles),
+			   sizeof(u32) * count)) {
+		ret = -EFAULT;
+		goto err;
+	}
+
+	for (i = 0; i < count; i++) {
+		struct drm_syncobj *s;
+
+		ret = -ENOENT;
+		s = drm_syncobj_find(file_private, handles[i]);
+		if (s) {
+			ret = 0;
+			spin_lock_irq(&s->lock);
+			if (s->fence) {
+				fences[i] = dma_fence_get(s->fence);
+			} else if (args->flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) {
+				struct future_fence *cb;
+
+				cb = kmalloc(sizeof(*cb), GFP_KERNEL);
+				fences[i] = dma_fence_create_proxy();
+				if (cb && fences[i]) {
+					cb->slot = &fences[i];
+					cb->base.func = future_fence_cb;
+					list_add(&cb->base.link, &s->cb_list);
+				} else {
+					kfree(cb);
+					dma_fence_put(fences[i]);
+					ret = -ENOMEM;
+				}
+			} else {
+				ret = -EINVAL;
+			}
+			spin_unlock_irq(&s->lock);
+		}
+		if (ret) {
+			count = i;
+			goto err_fences;
+		}
+	}
+
+	array = dma_fence_array_create(count, fences, 0, 0,
+				       !(args->flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL));
+	if (!array) {
+		ret = -ENOMEM;
+		goto err_fences;
+	}
+
+	timeout = drm_timeout_abs_to_jiffies(args->timeout_nsec);
+	timeout = dma_fence_wait_timeout(&array->base, true, timeout);
+	args->first_signaled = array->first_signaled;
+	dma_fence_put(&array->base);
+
+	return timeout < 0 ? timeout : timeout == 0 ? -ETIME : 0; /* Gah. */
+
+err_fences:
+	for (i = 0; i < count; i++)
+		dma_fence_put(fences[i]);
+err:
+	kfree(fences);
+	return ret;
+}
On Thu, Aug 10, 2017 at 4:00 AM, Chris Wilson chris@chris-wilson.co.uk wrote:
Where does this come from? I can't find it on dri-devel, and "proxy" doesn't show up anywhere in the dma-buf sources. What does it do?
Am 10.08.2017 um 01:53 schrieb Jason Ekstrand:
As Chris pointed out, that's really not a good idea.
Most of the time we need the behavior of reporting an error; only when the flag is given should we wait until some fence shows up.
In general I suggest that we only support this use case in the form of a wait_event_interruptible() on setting the first fence on an object.
Waiting on the first of multiple objects wouldn't be possible (you would always wait until all objects have fences), but I think that is acceptable.
The big advantage of the wait_event_interruptible() interface is that it makes your condition checking and process handling bulletproof; from what I have seen so far, your code is a bit buggy in that regard.
Christian.
On Thu, Aug 10, 2017 at 5:26 AM, Christian König deathsimple@vodafone.de wrote:
What isn't a good idea?
Not really. If you're doing a wait-any, then you want it to return as soon as you have a signaled fence even if half your sync objects never get fences. At least that's what's required for implementing vkWaitForFences. The idea is that, if the WAIT_FOR_SUBMIT flag is set, then a syncobj without a fence and a syncobj with an untriggered fence are identical as far as waiting is concerned: You wait until it has a signaled fence.
In v3, I used wait_event_interruptible. Does it have those same bugs?
--Jason
Am 10.08.2017 um 16:32 schrieb Jason Ekstrand:
However, this installs the proxy into syncobj->fence, with the result that any concurrent wait also becomes a WAIT_FOR_SUBMIT.
Installing that proxy. I'm always happy if someone reuses the code (after all I wrote a good part of it), but I think we should rather try to make this as clean as possible.
Crap, that makes the whole thing much harder to implement.
I haven't seen that yet, but now that you use wait_event_interruptible() the problem should be gone.
Regards, Christian.
Quoting Jason Ekstrand (2017-08-09 18:00:54)
There's a very annoying laxness in the dma_fence API here, in that backends are not required to automatically report when a fence is signaled prior to fence->ops->enable_signaling() being called. So here, if we fail to match signaled_count, we need to fall through and try a 0 timeout wait!
Christian, dma_fence_wait_any_timeout() has this same bug you told me off for, e.g. commit 698c0f7ff216 ("dma-buf/fence: revert "don't wait when specified timeout is zero" (v2)")! -Chris
On Wed, Aug 9, 2017 at 2:31 PM, Chris Wilson chris@chris-wilson.co.uk wrote:
That is very annoying! I'll see how bad the fall-through is...
Vulkan VkFence semantics require that the application be able to perform a CPU wait on work which may not yet have been submitted. This is perfectly safe because the CPU wait has a timeout which will get triggered eventually if no work is ever submitted. This behavior is advantageous for multi-threaded workloads because, so long as all of the threads agree on what fences to use up-front, you don't have the extra cross-thread synchronization cost of thread A telling thread B that it has submitted its dependent work and thread B is now free to wait.
Within a single process, this can be implemented in the userspace driver by doing exactly the same kind of tracking the app would have to do using posix condition variables or similar. However, in order for this to work cross-process (as is required by VK_KHR_external_fence), we need to handle this in the kernel.
This commit adds a WAIT_FOR_SUBMIT flag to DRM_IOCTL_SYNCOBJ_WAIT which instructs the IOCTL to wait for the syncobj to have a non-null fence and then wait on the fence. Combined with DRM_IOCTL_SYNCOBJ_RESET, you can easily get the Vulkan behavior.
v2:
 - Fix a bug in the invalid syncobj error path
 - Unify the wait-all and wait-any cases
v3:
 - Unify the timeout == 0 case a bit with the timeout > 0 case
 - Use wait_event_interruptible_timeout
Signed-off-by: Jason Ekstrand jason@jlekstrand.net
Cc: Dave Airlie airlied@redhat.com
---
 drivers/gpu/drm/drm_syncobj.c | 239 ++++++++++++++++++++++++++++++++++--------
 include/uapi/drm/drm.h        |   1 +
 2 files changed, 195 insertions(+), 45 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 510dfc2..6e6fb86 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -51,6 +51,7 @@
 #include <linux/fs.h>
 #include <linux/anon_inodes.h>
 #include <linux/sync_file.h>
+#include <linux/sched/signal.h>
 #include "drm_internal.h"
 #include <drm/drm_syncobj.h>
@@ -104,6 +105,27 @@ static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj,
 	list_add_tail(&cb->node, &syncobj->cb_list);
 }
+static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,
+						 struct dma_fence **fence,
+						 struct drm_syncobj_cb *cb,
+						 drm_syncobj_func_t func)
+{
+	int ret;
+
+	spin_lock(&syncobj->lock);
+	if (syncobj->fence) {
+		*fence = dma_fence_get(syncobj->fence);
+		ret = 1;
+	} else {
+		*fence = NULL;
+		drm_syncobj_add_callback_locked(syncobj, cb, func);
+		ret = 0;
+	}
+	spin_unlock(&syncobj->lock);
+
+	return ret;
+}
+
 /**
  * drm_syncobj_add_callback - adds a callback to syncobj::cb_list
  * @syncobj: Sync object to which to add the callback
@@ -526,6 +548,155 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
 			       &args->handle);
 }
+struct syncobj_wait_entry {
+	wait_queue_head_t *wq;
+	struct dma_fence *fence;
+	struct dma_fence_cb fence_cb;
+	struct drm_syncobj_cb syncobj_cb;
+};
+
+static void syncobj_wait_fence_func(struct dma_fence *fence,
+				    struct dma_fence_cb *cb)
+{
+	struct syncobj_wait_entry *wait =
+		container_of(cb, struct syncobj_wait_entry, fence_cb);
+
+	wake_up(wait->wq);
+}
+
+static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj,
+				      struct drm_syncobj_cb *cb)
+{
+	struct syncobj_wait_entry *wait =
+		container_of(cb, struct syncobj_wait_entry, syncobj_cb);
+
+	/* This happens inside the syncobj lock */
+	wait->fence = dma_fence_get(syncobj->fence);
+	wake_up(wait->wq);
+}
+
+static bool drm_syncobj_array_wait_complete(struct syncobj_wait_entry *entries,
+					    uint32_t count,
+					    uint32_t flags,
+					    uint32_t *idx)
+{
+	struct dma_fence *fence;
+	uint32_t signaled_count;
+	int i;
+
+	signaled_count = 0;
+	for (i = 0; i < count; ++i) {
+		fence = entries[i].fence;
+		if (!fence)
+			continue;
+
+		if (dma_fence_is_signaled(fence) ||
+		    (!entries[i].fence_cb.func &&
+		     dma_fence_add_callback(fence,
+					    &entries[i].fence_cb,
+					    syncobj_wait_fence_func))) {
+			/* The fence has been signaled */
+			if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) {
+				signaled_count++;
+			} else {
+				if (idx)
+					*idx = i;
+				return true;
+			}
+		}
+	}
+
+	if (signaled_count == count)
+		return true;
+
+	return false;
+}
+
+static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
+						  uint32_t count,
+						  uint32_t flags,
+						  signed long timeout,
+						  uint32_t *idx)
+{
+	wait_queue_head_t syncobj_wait_queue;
+	struct syncobj_wait_entry *entries;
+	signed long ret;
+	uint32_t signaled_count, i;
+
+	entries = kcalloc(count, sizeof(*entries), GFP_KERNEL);
+	if (!entries)
+		return -ENOMEM;
+
+	/* Walk the list of sync objects and initialize entries.  We do
+	 * this up-front so that we can properly return -EINVAL if there is
+	 * a syncobj with a missing fence and then never have the chance of
+	 * returning -EINVAL again.
+	 */
+	signaled_count = 0;
+	for (i = 0; i < count; ++i) {
+		entries[i].wq = &syncobj_wait_queue;
+		entries[i].fence = drm_syncobj_fence_get(syncobjs[i]);
+		if (!entries[i].fence) {
+			if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) {
+				continue;
+			} else {
+				ret = -EINVAL;
+				goto cleanup_entries;
+			}
+		}
+
+		if (dma_fence_is_signaled(entries[i].fence)) {
+			if (signaled_count == 0 && idx)
+				*idx = i;
+			signaled_count++;
+		}
+	}
+
+	if (signaled_count == count ||
+	    (signaled_count > 0 &&
+	     !(flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL))) {
+		ret = max_t(signed long, timeout, 1);
+		goto cleanup_entries;
+	}
+
+	/* There's a very annoying laxness in the dma_fence API here, in
+	 * that backends are not required to automatically report when a
+	 * fence is signaled prior to fence->ops->enable_signaling() being
+	 * called. So here if we fail to match signaled_count, we need to
+	 * fall through and try a 0 timeout wait!
+	 */
+
+	init_waitqueue_head(&syncobj_wait_queue);
+
+	if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) {
+		for (i = 0; i < count; ++i) {
+			drm_syncobj_fence_get_or_add_callback(syncobjs[i],
+							      &entries[i].fence,
+							      &entries[i].syncobj_cb,
+							      syncobj_wait_syncobj_func);
+		}
+	}
+
+	ret = wait_event_interruptible_timeout(
+		syncobj_wait_queue,
+		drm_syncobj_array_wait_complete(entries, count, flags, idx),
+		timeout);
+
+cleanup_entries:
+	for (i = 0; i < count; ++i) {
+		if (entries[i].syncobj_cb.func)
+			drm_syncobj_remove_callback(syncobjs[i],
+						    &entries[i].syncobj_cb);
+		if (entries[i].fence_cb.func)
+			dma_fence_remove_callback(entries[i].fence,
+						  &entries[i].fence_cb);
+		dma_fence_put(entries[i].fence);
+	}
+	kfree(entries);
+
+	return ret;
+}
+
 /**
  * drm_timeout_abs_to_jiffies - calculate jiffies timeout from absolute value
  *
@@ -558,43 +729,19 @@ static signed long drm_timeout_abs_to_jiffies(int64_t timeout_nsec)
 	return timeout_jiffies64 + 1;
 }
-static int drm_syncobj_wait_fences(struct drm_device *dev,
-				   struct drm_file *file_private,
-				   struct drm_syncobj_wait *wait,
-				   struct dma_fence **fences)
+static int drm_syncobj_array_wait(struct drm_device *dev,
+				  struct drm_file *file_private,
+				  struct drm_syncobj_wait *wait,
+				  struct drm_syncobj **syncobjs)
 {
 	signed long timeout = drm_timeout_abs_to_jiffies(wait->timeout_nsec);
 	signed long ret = 0;
 	uint32_t first = ~0;

-	if (wait->flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) {
-		uint32_t i;
-		for (i = 0; i < wait->count_handles; i++) {
-			ret = dma_fence_wait_timeout(fences[i], true, timeout);
-
-			/* Various dma_fence wait callbacks will return
-			 * ENOENT to indicate that the fence has already
-			 * been signaled.  We need to sanitize this to 0 so
-			 * we don't return early and the client doesn't see
-			 * an unexpected error.
-			 */
-			if (ret == -ENOENT)
-				ret = 0;
-
-			if (ret < 0)
-				return ret;
-			if (ret == 0)
-				break;
-			timeout = ret;
-		}
-		first = 0;
-	} else {
-		ret = dma_fence_wait_any_timeout(fences,
-						 wait->count_handles,
-						 true, timeout,
-						 &first);
-	}
-
+	ret = drm_syncobj_array_wait_timeout(syncobjs,
+					     wait->count_handles,
+					     wait->flags,
+					     timeout, &first);
 	if (ret < 0)
 		return ret;
@@ -610,14 +757,15 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
 {
 	struct drm_syncobj_wait *args = data;
 	uint32_t *handles;
-	struct dma_fence **fences;
+	struct drm_syncobj **syncobjs;
 	int ret = 0;
 	uint32_t i;

 	if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
 		return -ENODEV;

-	if (args->flags != 0 && args->flags != DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL)
+	if (args->flags & ~(DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL |
+			    DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT))
 		return -EINVAL;

 	if (args->count_handles == 0)
@@ -636,27 +784,28 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
 		goto err_free_handles;
 	}

-	fences = kcalloc(args->count_handles,
-			 sizeof(struct dma_fence *), GFP_KERNEL);
-	if (!fences) {
+	syncobjs = kcalloc(args->count_handles,
+			   sizeof(struct drm_syncobj *), GFP_KERNEL);
+	if (!syncobjs) {
 		ret = -ENOMEM;
 		goto err_free_handles;
 	}

 	for (i = 0; i < args->count_handles; i++) {
-		ret = drm_syncobj_find_fence(file_private, handles[i],
-					     &fences[i]);
-		if (ret)
+		syncobjs[i] = drm_syncobj_find(file_private, handles[i]);
+		if (!syncobjs[i]) {
+			ret = -ENOENT;
 			goto err_free_fence_array;
+		}
 	}

-	ret = drm_syncobj_wait_fences(dev, file_private,
-				      args, fences);
+	ret = drm_syncobj_array_wait(dev, file_private,
+				     args, syncobjs);

 err_free_fence_array:
-	for (i = 0; i < args->count_handles; i++)
-		dma_fence_put(fences[i]);
-	kfree(fences);
+	while (i-- > 0)
+		drm_syncobj_put(syncobjs[i]);
+	kfree(syncobjs);
 err_free_handles:
 	kfree(handles);
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 4b301b4..f8ec8fe 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -719,6 +719,7 @@ struct drm_syncobj_handle {
 };

 #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL (1 << 0)
+#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT (1 << 1)
 struct drm_syncobj_wait {
 	__u64 handles;
 	/* absolute timeout */
This series does the same thing as my earlier series in that it adds a sync object wait interface, complete with a WAIT_FOR_SUBMIT flag. While the uapi remains unchanged, the guts look a bit different. Instead of adding a callback mechanism to drm_syncobj that fires whenever replace_fence is called, it now uses proxy fences. drm_syncobj_fence_get still returns NULL whenever the sync object is in an unsubmitted state, but there is a new drm_syncobj_fence_proxy_get which returns either the real fence or a proxy fence that will be triggered the next time replace_fence is called with a non-NULL replacement. This does make both drm_syncobj_fence_get and drm_syncobj_replace_fence a tiny bit more expensive, but it lets us do it all without locking.
This series can be found as a branch here:
https://cgit.freedesktop.org/~jekstrand/linux/log/?h=drm-syncobj-wait-submit...
IGT tests for DRM_IOCTL_SYNCOBJ_WAIT and DRM_IOCTL_SYNCOBJ_RESET can be found on patchwork here:
https://patchwork.freedesktop.org/series/28666/
Patches to the Intel Vulkan driver to implement VK_KHR_external_fence on top of this kernel interface can be found here:
https://cgit.freedesktop.org/~jekstrand/mesa/log/?h=wip/anv-external-fence
Cc: Dave Airlie airlied@redhat.com
Cc: Chris Wilson chris@chris-wilson.co.uk
Cc: Christian König christian.koenig@amd.com

Chris Wilson (2):
  dma-buf/dma-fence: Signal all callbacks from dma_fence_release()
  dma-buf/dma-fence: Add a mechanism for proxy fences

Dave Airlie (1):
  drm/syncobj: add sync obj wait interface. (v8)

Jason Ekstrand (6):
  drm/syncobj: Rename fence_get to find_fence
  drm/syncobj: Add a race-free drm_syncobj_fence_get helper
  i915: Add support for drm syncobjs
  dma-buf/dma-fence: Allow wait_any_timeout without default_wait (v2)
  drm/syncobj: Add a reset ioctl
  drm/syncobj: Allow wait for submit and signal behavior (v4)

 drivers/dma-buf/Makefile                   |   4 +-
 drivers/dma-buf/dma-fence-proxy.c          | 186 +++++++++++++++++++
 drivers/dma-buf/dma-fence.c                |  34 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     |   2 +-
 drivers/gpu/drm/drm_internal.h             |   4 +
 drivers/gpu/drm/drm_ioctl.c                |   4 +
 drivers/gpu/drm/drm_syncobj.c              | 275 +++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_drv.c            |   3 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 146 ++++++++++++++-
 include/drm/drm_syncobj.h                  |  15 +-
 include/linux/dma-fence-proxy.h            |  25 +++
 include/uapi/drm/drm.h                     |  19 ++
 include/uapi/drm/i915_drm.h                |  30 +++-
 13 files changed, 710 insertions(+), 37 deletions(-)
 create mode 100644 drivers/dma-buf/dma-fence-proxy.c
 create mode 100644 include/linux/dma-fence-proxy.h
The function has far more in common with drm_syncobj_find than with any of the get/put functions.
Signed-off-by: Jason Ekstrand jason@jlekstrand.net
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |  2 +-
 drivers/gpu/drm/drm_syncobj.c          | 10 +++++-----
 include/drm/drm_syncobj.h              |  6 +++---
 3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 3378951..dcfb056 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1038,7 +1038,7 @@ static int amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,
 {
 	int r;
 	struct dma_fence *fence;
-	r = drm_syncobj_fence_get(p->filp, handle, &fence);
+	r = drm_syncobj_find_fence(p->filp, handle, &fence);
 	if (r)
 		return r;
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index a5b38a8..0412b0b 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -95,9 +95,9 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
 }
 EXPORT_SYMBOL(drm_syncobj_replace_fence);

-int drm_syncobj_fence_get(struct drm_file *file_private,
-			  u32 handle,
-			  struct dma_fence **fence)
+int drm_syncobj_find_fence(struct drm_file *file_private,
+			   u32 handle,
+			   struct dma_fence **fence)
 {
 	struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
 	int ret = 0;
@@ -112,7 +112,7 @@ int drm_syncobj_fence_get(struct drm_file *file_private,
 	drm_syncobj_put(syncobj);
 	return ret;
 }
-EXPORT_SYMBOL(drm_syncobj_fence_get);
+EXPORT_SYMBOL(drm_syncobj_find_fence);

 /**
  * drm_syncobj_free - free a sync object.
@@ -307,7 +307,7 @@ int drm_syncobj_export_sync_file(struct drm_file *file_private,
 	if (fd < 0)
 		return fd;

-	ret = drm_syncobj_fence_get(file_private, handle, &fence);
+	ret = drm_syncobj_find_fence(file_private, handle, &fence);
 	if (ret)
 		goto err_put_fd;
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
index 89976da..7d4ad77 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -81,9 +81,9 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
 				     u32 handle);
 void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
 			       struct dma_fence *fence);
-int drm_syncobj_fence_get(struct drm_file *file_private,
-			  u32 handle,
-			  struct dma_fence **fence);
+int drm_syncobj_find_fence(struct drm_file *file_private,
+			   u32 handle,
+			   struct dma_fence **fence);
 void drm_syncobj_free(struct kref *kref);

 #endif
The atomic exchange operation in drm_syncobj_replace_fence is sufficient for the case where it races with itself. However, if you have a race between a replace_fence and dma_fence_get(syncobj->fence), you may end up with the entire replace_fence happening between the point in time where the one thread gets the syncobj->fence pointer and when it calls dma_fence_get() on it. If this happens, then the reference may be dropped before we get a chance to get a new one. The new helper uses dma_fence_get_rcu_safe to get rid of the race.
This is also needed because it allows us to do a bit more than just get a reference in drm_syncobj_fence_get should we wish to do so.
Signed-off-by: Jason Ekstrand jason@jlekstrand.net
---
 drivers/gpu/drm/drm_syncobj.c | 12 +++++++++---
 include/drm/drm_syncobj.h     |  6 +++++-
 2 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 0412b0b..0190ec2 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -75,6 +75,12 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
 }
 EXPORT_SYMBOL(drm_syncobj_find);

+struct dma_fence *drm_syncobj_fence_get(struct drm_syncobj *syncobj)
+{
+	return dma_fence_get_rcu_safe(&syncobj->_fence);
+}
+EXPORT_SYMBOL(drm_syncobj_fence_get);
+
 /**
  * drm_syncobj_replace_fence - replace fence in a sync object.
  * @syncobj: Sync object to replace fence in
@@ -89,7 +95,7 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,

 	if (fence)
 		dma_fence_get(fence);
-	old_fence = xchg(&syncobj->fence, fence);
+	old_fence = xchg(&syncobj->_fence, fence);

 	dma_fence_put(old_fence);
 }
@@ -105,7 +111,7 @@ int drm_syncobj_find_fence(struct drm_file *file_private,
 	if (!syncobj)
 		return -ENOENT;

-	*fence = dma_fence_get(syncobj->fence);
+	*fence = drm_syncobj_fence_get(syncobj);
 	if (!*fence) {
 		ret = -EINVAL;
 	}
@@ -125,7 +131,7 @@ void drm_syncobj_free(struct kref *kref)
 	struct drm_syncobj *syncobj = container_of(kref,
						   struct drm_syncobj,
						   refcount);
-	dma_fence_put(syncobj->fence);
+	dma_fence_put(syncobj->_fence);
 	kfree(syncobj);
 }
 EXPORT_SYMBOL(drm_syncobj_free);
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
index 7d4ad77..c06a441 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -43,8 +43,11 @@ struct drm_syncobj {
 	/**
 	 * @fence:
 	 * NULL or a pointer to the fence bound to this object.
+	 *
+	 * This pointer should not be accessed directly.  Instead, use
+	 * drm_syncobj_fence_get or drm_syncobj_replace_fence.
 	 */
-	struct dma_fence *fence;
+	struct dma_fence *_fence;
 	/**
 	 * @file:
 	 * a file backing for this syncobj.
@@ -79,6 +82,7 @@ drm_syncobj_put(struct drm_syncobj *obj)

 struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
 				     u32 handle);
+struct dma_fence *drm_syncobj_fence_get(struct drm_syncobj *syncobj);
 void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
 			       struct dma_fence *fence);
 int drm_syncobj_find_fence(struct drm_file *file_private,
Hi Jason,
[auto build test WARNING on drm/drm-next] [also build test WARNING on v4.13-rc5 next-20170811] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Jason-Ekstrand/drm-syncobj-Add-full... base: git://people.freedesktop.org/~airlied/linux.git drm-next reproduce: make htmldocs
All warnings (new ones prefixed by >>):
>> include/drm/drm_syncobj.h:57: warning: No description found for parameter '_fence'
>> include/drm/drm_syncobj.h:57: warning: Excess struct/union/enum/typedef member 'fence' description in 'drm_syncobj'
Having multiple values in <test> isn't supported and may not work as expected Fontconfig warning: "~/.fonts.conf", line 82: Having multiple values in <test> isn't supported and may not work as expected Fontconfig warning: "~/.fonts.conf", line 96: Having multiple values in <test> isn't supported and may not work as expected Fontconfig warning: "~/.fonts.conf", line 109: Having multiple values in <test> isn't supported and may not work as expected Fontconfig warning: "~/.fonts.conf", line 122: Having multiple values in <test> isn't supported and may not work as expected Fontconfig warning: "~/.fonts.conf", line 133: Having multiple values in <test> isn't supported and may not work as expected Fontconfig warning: "~/.fonts.conf", line 164: Having multiple values in <test> isn't supported and may not work as expected Fontconfig warning: "~/.fonts.conf", line 193: Having multiple values in <test> isn't supported and may not work as expected Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 43: Having multiple values in <test> isn't supported and may not work as expected Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 56: Having multiple values in <test> isn't supported and may not work as expected Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 69: Having multiple values in <test> isn't supported and may not work as expected Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 82: Having multiple values in <test> isn't supported and may not work as expected Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 96: Having multiple values in <test> isn't supported and may not work as expected Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 109: Having multiple values in <test> isn't supported and may not work as expected Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 122: Having multiple values in <test> isn't supported and 
may not work as expected Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 133: Having multiple values in <test> isn't supported and may not work as expected Fontconfig warning: "/home/kbuild/.config/fontconfig/fonts.conf", line 164: Having multiple values in <test> isn't supported and may not work as expected
vim +/_fence +57 include/drm/drm_syncobj.h
e9083420 Dave Airlie 2017-04-04 @57
:::::: The code at line 57 was first introduced by commit
:::::: e9083420bbacce27e43d418064d0d2dfb4b37aaa drm: introduce sync objects (v4)
:::::: TO: Dave Airlie airlied@redhat.com
:::::: CC: Dave Airlie airlied@redhat.com
---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                 Intel Corporation
This commit adds support for waiting on or signaling DRM syncobjs as part of execbuf. It does so by hijacking the currently unused cliprects pointer to instead point to an array of i915_gem_exec_fence structs, each of which contains a DRM syncobj and a flags parameter specifying whether to wait on it or to signal it. This implementation theoretically allows both flags to be set, in which case it waits on the dma_fence that was in the syncobj and then immediately replaces it with the dma_fence from the current execbuf.
v2: - Rebase on new syncobj API
v3: - Pull everything out into helpers
    - Do all allocation in gem_execbuffer2
    - Pack the flags in the bottom 2 bits of the drm_syncobj*
v4: - Prevent a potential race on syncobj->fence
Testcase: igt/gem_exec_fence/syncobj* Signed-off-by: Jason Ekstrand jason@jlekstrand.net Reviewed-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Chris Wilson chris@chris-wilson.co.uk --- drivers/gpu/drm/i915/i915_drv.c | 3 +- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 146 ++++++++++++++++++++++++++++- include/uapi/drm/i915_drm.h | 30 +++++- 3 files changed, 171 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 4c96a72..50db490 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -388,6 +388,7 @@ static int i915_getparam(struct drm_device *dev, void *data, case I915_PARAM_HAS_EXEC_FENCE: case I915_PARAM_HAS_EXEC_CAPTURE: case I915_PARAM_HAS_EXEC_BATCH_FIRST: + case I915_PARAM_HAS_EXEC_FENCE_ARRAY: /* For the time being all of these are always true; * if some supported hardware does not have one of these * features this value needs to be provided from @@ -2738,7 +2739,7 @@ static struct drm_driver driver = { */ .driver_features = DRIVER_HAVE_IRQ | DRIVER_IRQ_SHARED | DRIVER_GEM | DRIVER_PRIME | - DRIVER_RENDER | DRIVER_MODESET | DRIVER_ATOMIC, + DRIVER_RENDER | DRIVER_MODESET | DRIVER_ATOMIC | DRIVER_SYNCOBJ, .release = i915_driver_release, .open = i915_driver_open, .lastclose = i915_driver_lastclose, diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 929f275..8df845b 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -33,6 +33,7 @@
#include <drm/drmP.h> #include <drm/i915_drm.h> +#include <drm/drm_syncobj.h>
#include "i915_drv.h" #include "i915_gem_clflush.h" @@ -1882,8 +1883,10 @@ static bool i915_gem_check_execbuffer(struct drm_i915_gem_execbuffer2 *exec) return false;
/* Kernel clipping was a DRI1 misfeature */ - if (exec->num_cliprects || exec->cliprects_ptr) - return false; + if (!(exec->flags & I915_EXEC_FENCE_ARRAY)) { + if (exec->num_cliprects || exec->cliprects_ptr) + return false; + }
if (exec->DR4 == 0xffffffff) { DRM_DEBUG("UXA submitting garbage DR4, fixing up\n"); @@ -2114,11 +2117,125 @@ eb_select_engine(struct drm_i915_private *dev_priv, return engine; }
+static void __free_fence_array(struct drm_syncobj **fences, unsigned int n) +{ + while (n--) + drm_syncobj_put(ptr_mask_bits(fences[n], 2)); + kvfree(fences); +} + +static struct drm_syncobj **get_fence_array(struct drm_i915_gem_execbuffer2 *args, + struct drm_file *file) +{ + const unsigned int nfences = args->num_cliprects; + struct drm_i915_gem_exec_fence __user *user; + struct drm_syncobj **fences; + unsigned int n; + int err; + + if (!(args->flags & I915_EXEC_FENCE_ARRAY)) + return NULL; + + if (nfences > SIZE_MAX / sizeof(*fences)) + return ERR_PTR(-EINVAL); + + user = u64_to_user_ptr(args->cliprects_ptr); + if (!access_ok(VERIFY_READ, user, nfences * 2 * sizeof(u32))) + return ERR_PTR(-EFAULT); + + fences = kvmalloc_array(args->num_cliprects, sizeof(*fences), + __GFP_NOWARN | GFP_TEMPORARY); + if (!fences) + return ERR_PTR(-ENOMEM); + + for (n = 0; n < nfences; n++) { + struct drm_i915_gem_exec_fence fence; + struct drm_syncobj *syncobj; + + if (__copy_from_user(&fence, user++, sizeof(fence))) { + err = -EFAULT; + goto err; + } + + syncobj = drm_syncobj_find(file, fence.handle); + if (!syncobj) { + DRM_DEBUG("Invalid syncobj handle provided\n"); + err = -EINVAL; + goto err; + } + + fences[n] = ptr_pack_bits(syncobj, fence.flags, 2); + } + + return fences; + +err: + __free_fence_array(fences, n); + return ERR_PTR(err); +} + +static void put_fence_array(struct drm_i915_gem_execbuffer2 *args, + struct drm_syncobj **fences) +{ + if (!fences) + return; + __free_fence_array(fences, args->num_cliprects); +} + +static int await_fence_array(struct i915_execbuffer *eb, + struct drm_syncobj **fences) +{ + const unsigned int nfences = eb->args->num_cliprects; + unsigned int n; + int err; + + for (n = 0; n < nfences; n++) { + struct drm_syncobj *syncobj; + struct dma_fence *fence; + unsigned int flags; + + syncobj = ptr_unpack_bits(fences[n], &flags, 2); + if (!(flags & I915_EXEC_FENCE_WAIT)) + continue; + + fence = drm_syncobj_fence_get(syncobj); + if (!fence) + return 
-EINVAL; + + err = i915_gem_request_await_dma_fence(eb->request, fence); + dma_fence_put(fence); + if (err < 0) + return err; + } + + return 0; +} + +static void signal_fence_array(struct i915_execbuffer *eb, + struct drm_syncobj **fences) +{ + const unsigned int nfences = eb->args->num_cliprects; + struct dma_fence * const fence = &eb->request->fence; + unsigned int n; + + for (n = 0; n < nfences; n++) { + struct drm_syncobj *syncobj; + unsigned int flags; + + syncobj = ptr_unpack_bits(fences[n], &flags, 2); + if (!(flags & I915_EXEC_FENCE_SIGNAL)) + continue; + + drm_syncobj_replace_fence(syncobj, fence); + } +} + static int i915_gem_do_execbuffer(struct drm_device *dev, struct drm_file *file, struct drm_i915_gem_execbuffer2 *args, - struct drm_i915_gem_exec_object2 *exec) + struct drm_i915_gem_exec_object2 *exec, + struct drm_syncobj **fences) { struct i915_execbuffer eb; struct dma_fence *in_fence = NULL; @@ -2304,6 +2421,12 @@ i915_gem_do_execbuffer(struct drm_device *dev, goto err_request; }
+ if (fences) { + err = await_fence_array(&eb, fences); + if (err < 0) + goto err_request; + } + if (out_fence_fd != -1) { out_fence = sync_file_create(&eb.request->fence); if (!out_fence) { @@ -2327,6 +2450,9 @@ i915_gem_do_execbuffer(struct drm_device *dev, __i915_add_request(eb.request, err == 0); add_to_client(eb.request, file);
+ if (fences) + signal_fence_array(&eb, fences); + if (out_fence) { if (err == 0) { fd_install(out_fence_fd, out_fence->file); @@ -2428,7 +2554,7 @@ i915_gem_execbuffer(struct drm_device *dev, void *data, exec2_list[i].flags = 0; }
- err = i915_gem_do_execbuffer(dev, file, &exec2, exec2_list); + err = i915_gem_do_execbuffer(dev, file, &exec2, exec2_list, NULL); if (exec2.flags & __EXEC_HAS_RELOC) { struct drm_i915_gem_exec_object __user *user_exec_list = u64_to_user_ptr(args->buffers_ptr); @@ -2460,6 +2586,7 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data, const size_t sz = sizeof(struct drm_i915_gem_exec_object2); struct drm_i915_gem_execbuffer2 *args = data; struct drm_i915_gem_exec_object2 *exec2_list; + struct drm_syncobj **fences = NULL; int err;
if (args->buffer_count < 1 || args->buffer_count > SIZE_MAX / sz - 1) { @@ -2486,7 +2613,15 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data, return -EFAULT; }
- err = i915_gem_do_execbuffer(dev, file, args, exec2_list); + if (args->flags & I915_EXEC_FENCE_ARRAY) { + fences = get_fence_array(args, file); + if (IS_ERR(fences)) { + kvfree(exec2_list); + return PTR_ERR(fences); + } + } + + err = i915_gem_do_execbuffer(dev, file, args, exec2_list, fences);
/* * Now that we have begun execution of the batchbuffer, we ignore @@ -2517,5 +2652,6 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
args->flags &= ~__I915_EXEC_UNKNOWN_FLAGS; kvfree(exec2_list); + put_fence_array(args, fences); return err; } diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 7ccbd6a..bb3fb49 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -431,6 +431,11 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_HAS_EXEC_BATCH_FIRST 48
+/* Query whether DRM_I915_GEM_EXECBUFFER2 supports supplying an array of + * drm_i915_gem_exec_fence structures. See I915_EXEC_FENCE_ARRAY. + */ +#define I915_PARAM_HAS_EXEC_FENCE_ARRAY 49 + typedef struct drm_i915_getparam { __s32 param; /* @@ -812,6 +817,17 @@ struct drm_i915_gem_exec_object2 { __u64 rsvd2; };
+struct drm_i915_gem_exec_fence { + /** + * User's handle for a dma-fence to wait on or signal. + */ + __u32 handle; + +#define I915_EXEC_FENCE_WAIT (1<<0) +#define I915_EXEC_FENCE_SIGNAL (1<<1) + __u32 flags; +}; + struct drm_i915_gem_execbuffer2 { /** * List of gem_exec_object2 structs @@ -826,7 +842,10 @@ struct drm_i915_gem_execbuffer2 { __u32 DR1; __u32 DR4; __u32 num_cliprects; - /** This is a struct drm_clip_rect *cliprects */ + /** This is a struct drm_clip_rect *cliprects if I915_EXEC_FENCE_ARRAY + * is not set. If I915_EXEC_FENCE_ARRAY is set, then this is a + * struct drm_i915_gem_exec_fence *fences. + */ __u64 cliprects_ptr; #define I915_EXEC_RING_MASK (7<<0) #define I915_EXEC_DEFAULT (0<<0) @@ -927,7 +946,14 @@ struct drm_i915_gem_execbuffer2 { * element). */ #define I915_EXEC_BATCH_FIRST (1<<18) -#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_BATCH_FIRST<<1)) + +/* Setting I915_FENCE_ARRAY implies that num_cliprects and cliprects_ptr + * define an array of i915_gem_exec_fence structures which specify a set of + * dma fences to wait upon or signal. + */ +#define I915_EXEC_FENCE_ARRAY (1<<19) + +#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_ARRAY<<1))
#define I915_EXEC_CONTEXT_ID_MASK (0xffffffff) #define i915_execbuffer2_set_context_id(eb2, context) \
We now have reviewed userspace for this:
https://lists.freedesktop.org/archives/mesa-dev/2017-August/166200.html
On Fri, Aug 11, 2017 at 3:39 PM, Jason Ekstrand jason@jlekstrand.net wrote:
From: Dave Airlie airlied@redhat.com
This interface will allow sync objects to be used to back Vulkan fences. This API is pretty much the Vulkan fence waiting API, and I've ported the code from amdgpu.
v2: accept relative timeout, pass remaining time back to userspace.
v3: return to absolute timeouts.
v4: absolute zero = poll,
    rewrite any/all code to have same operation for arrays,
    return -EINVAL for 0 fences.
v4.1: fixup fences allocation check, use u64_to_user_ptr
v5: move to sec/nsec, and use timespec64 for calcs.
v6: use -ETIME and drop the out status flag. (-ETIME is suggested by ickle, I can feel a shed painting)
v7: talked to Daniel/Arnd, use ktime and ns everywhere.
v8: be more careful in the timeout calculations,
    use uint32_t for counter variables so we don't overflow,
    gracefully handle -ENOENT being returned from dma_fence_wait_timeout
Signed-off-by: Dave Airlie airlied@redhat.com Reviewed-by: Jason Ekstrand jason@jlekstrand.net --- drivers/gpu/drm/drm_internal.h | 2 + drivers/gpu/drm/drm_ioctl.c | 2 + drivers/gpu/drm/drm_syncobj.c | 142 +++++++++++++++++++++++++++++++++++++++++ include/uapi/drm/drm.h | 12 ++++ 4 files changed, 158 insertions(+)
diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h index 4e906b8..534e5ac 100644 --- a/drivers/gpu/drm/drm_internal.h +++ b/drivers/gpu/drm/drm_internal.h @@ -167,3 +167,5 @@ int drm_syncobj_handle_to_fd_ioctl(struct drm_device *dev, void *data, struct drm_file *file_private); int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data, struct drm_file *file_private); +int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_private); diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c index 8bfeb32..2ab0ff90 100644 --- a/drivers/gpu/drm/drm_ioctl.c +++ b/drivers/gpu/drm/drm_ioctl.c @@ -657,6 +657,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = { DRM_UNLOCKED|DRM_RENDER_ALLOW), DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE, drm_syncobj_fd_to_handle_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW), + DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl, + DRM_UNLOCKED|DRM_RENDER_ALLOW), };
#define DRM_CORE_IOCTL_COUNT ARRAY_SIZE( drm_ioctls ) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index 0190ec2..e1f96b9 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -1,5 +1,7 @@ /* * Copyright 2017 Red Hat + * Parts ported from amdgpu (fence wait code). + * Copyright 2016 Advanced Micro Devices, Inc. * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the "Software"), @@ -31,6 +33,9 @@ * that contain an optional fence. The fence can be updated with a new * fence, or be NULL. * + * syncobj's can be waited upon, where it will wait for the underlying + * fence. + * * syncobj's can be export to fd's and back, these fd's are opaque and * have no other use case, except passing the syncobj between processes. * @@ -453,3 +458,140 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data, return drm_syncobj_fd_to_handle(file_private, args->fd, &args->handle); } + +/** + * drm_timeout_abs_to_jiffies - calculate jiffies timeout from absolute value + * + * @timeout_nsec: timeout nsec component in ns, 0 for poll + * + * Calculate the timeout in jiffies from an absolute time in sec/nsec. 
+ */ +static signed long drm_timeout_abs_to_jiffies(int64_t timeout_nsec) +{ + ktime_t abs_timeout, now; + u64 timeout_ns, timeout_jiffies64; + + /* make 0 timeout means poll - absolute 0 doesn't seem valid */ + if (timeout_nsec == 0) + return 0; + + abs_timeout = ns_to_ktime(timeout_nsec); + now = ktime_get(); + + if (!ktime_after(abs_timeout, now)) + return 0; + + timeout_ns = ktime_to_ns(ktime_sub(abs_timeout, now)); + + timeout_jiffies64 = nsecs_to_jiffies64(timeout_ns); + /* clamp timeout to avoid infinite timeout */ + if (timeout_jiffies64 >= MAX_SCHEDULE_TIMEOUT - 1) + return MAX_SCHEDULE_TIMEOUT - 1; + + return timeout_jiffies64 + 1; +} + +static int drm_syncobj_wait_fences(struct drm_device *dev, + struct drm_file *file_private, + struct drm_syncobj_wait *wait, + struct dma_fence **fences) +{ + signed long timeout = drm_timeout_abs_to_jiffies(wait->timeout_nsec); + signed long ret = 0; + uint32_t first = ~0; + + if (wait->flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) { + uint32_t i; + for (i = 0; i < wait->count_handles; i++) { + ret = dma_fence_wait_timeout(fences[i], true, timeout); + + /* Various dma_fence wait callbacks will return + * ENOENT to indicate that the fence has already + * been signaled. We need to sanitize this to 0 so + * we don't return early and the client doesn't see + * an unexpected error. 
+ */ + if (ret == -ENOENT) + ret = 0; + + if (ret < 0) + return ret; + if (ret == 0) + break; + timeout = ret; + } + first = 0; + } else { + ret = dma_fence_wait_any_timeout(fences, + wait->count_handles, + true, timeout, + &first); + } + + if (ret < 0) + return ret; + + wait->first_signaled = first; + if (ret == 0) + return -ETIME; + return 0; +} + +int +drm_syncobj_wait_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_private) +{ + struct drm_syncobj_wait *args = data; + uint32_t *handles; + struct dma_fence **fences; + int ret = 0; + uint32_t i; + + if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ)) + return -ENODEV; + + if (args->flags != 0 && args->flags != DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) + return -EINVAL; + + if (args->count_handles == 0) + return -EINVAL; + + /* Get the handles from userspace */ + handles = kmalloc_array(args->count_handles, sizeof(uint32_t), + GFP_KERNEL); + if (handles == NULL) + return -ENOMEM; + + if (copy_from_user(handles, + u64_to_user_ptr(args->handles), + sizeof(uint32_t) * args->count_handles)) { + ret = -EFAULT; + goto err_free_handles; + } + + fences = kcalloc(args->count_handles, + sizeof(struct dma_fence *), GFP_KERNEL); + if (!fences) { + ret = -ENOMEM; + goto err_free_handles; + } + + for (i = 0; i < args->count_handles; i++) { + ret = drm_syncobj_find_fence(file_private, handles[i], + &fences[i]); + if (ret) + goto err_free_fence_array; + } + + ret = drm_syncobj_wait_fences(dev, file_private, + args, fences); + +err_free_fence_array: + for (i = 0; i < args->count_handles; i++) + dma_fence_put(fences[i]); + kfree(fences); +err_free_handles: + kfree(handles); + + return ret; +} diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h index 101593a..0757c1a 100644 --- a/include/uapi/drm/drm.h +++ b/include/uapi/drm/drm.h @@ -718,6 +718,17 @@ struct drm_syncobj_handle { __u32 pad; };
+#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL (1 << 0) +struct drm_syncobj_wait { + __u64 handles; + /* absolute timeout */ + __s64 timeout_nsec; + __u32 count_handles; + __u32 flags; + __u32 first_signaled; /* only valid when not waiting all */ + __u32 pad; +}; + #if defined(__cplusplus) } #endif @@ -840,6 +851,7 @@ extern "C" { #define DRM_IOCTL_SYNCOBJ_DESTROY DRM_IOWR(0xC0, struct drm_syncobj_destroy) #define DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD DRM_IOWR(0xC1, struct drm_syncobj_handle) #define DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE DRM_IOWR(0xC2, struct drm_syncobj_handle) +#define DRM_IOCTL_SYNCOBJ_WAIT DRM_IOWR(0xC3, struct drm_syncobj_wait)
/** * Device specific ioctls should only be in their respective headers
dma_fence_wait_any_timeout only relies on two things to work correctly:
1) The callback list to get called upon fence signal
2) The SIGNALED flag to be set upon fence signal
The first of these is part of the core dma-fence API. The second is only mostly part of the core dma-fence API, with the one issue that the flag may not be set and dma_fence_is_signaled may never return true if fence->ops->enable_signaling() has not been called. It's easy enough to work around that by always using dma_fence_is_signaled instead of manually checking the bit, and falling through to a zero-timeout wait if none of the fences report that they are signaled.
v2: - Use dma_fence_is_signaled in test_signaled_any
    - Fall through to a zero-timeout wait
Cc: Christian König christian.koenig@amd.com Signed-off-by: Jason Ekstrand jason@jlekstrand.net --- drivers/dma-buf/dma-fence.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 9a30279..0cac367 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -437,8 +437,7 @@ dma_fence_test_signaled_any(struct dma_fence **fences, uint32_t count, int i;
for (i = 0; i < count; ++i) { - struct dma_fence *fence = fences[i]; - if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) { + if (dma_fence_is_signaled(fences[i])) { if (idx) *idx = i; return true; @@ -484,7 +483,13 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count, return 1; }
- return 0; + /* There's a very annoying laxness in the dma_fence API + * here, in that backends are not required to automatically + * report when a fence is signaled prior to + * fence->ops->enable_signaling() being called. So here if + * we fail to match signaled_count, we need to fall through + * and try a 0 timeout wait! + */ }
cb = kcalloc(count, sizeof(struct default_wait_cb), GFP_KERNEL); @@ -496,11 +501,6 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count, for (i = 0; i < count; ++i) { struct dma_fence *fence = fences[i];
- if (fence->ops->wait != dma_fence_default_wait) { - ret = -EINVAL; - goto fence_rm_cb; - } - cb[i].task = current; if (dma_fence_add_callback(fence, &cb[i].base, dma_fence_default_wait_cb)) { @@ -511,7 +511,7 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count, } }
- while (ret > 0) { + do { if (intr) set_current_state(TASK_INTERRUPTIBLE); else @@ -524,7 +524,7 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
if (ret > 0 && intr && signal_pending(current)) ret = -ERESTARTSYS; - } + } while (ret > 0);
__set_current_state(TASK_RUNNING);
This just resets the dma_fence to NULL so it looks like it's never been signaled. This will be useful once we add the new wait API for allowing wait on "submit and signal" behavior.
Signed-off-by: Jason Ekstrand jason@jlekstrand.net --- drivers/gpu/drm/drm_internal.h | 2 ++ drivers/gpu/drm/drm_ioctl.c | 2 ++ drivers/gpu/drm/drm_syncobj.c | 24 ++++++++++++++++++++++++ include/uapi/drm/drm.h | 6 ++++++ 4 files changed, 34 insertions(+)
diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h index 534e5ac..83f1615 100644 --- a/drivers/gpu/drm/drm_internal.h +++ b/drivers/gpu/drm/drm_internal.h @@ -169,3 +169,5 @@ int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data, struct drm_file *file_private); int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file_private); +int drm_syncobj_reset_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_private); diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c index 2ab0ff90..4185483 100644 --- a/drivers/gpu/drm/drm_ioctl.c +++ b/drivers/gpu/drm/drm_ioctl.c @@ -659,6 +659,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = { DRM_UNLOCKED|DRM_RENDER_ALLOW), DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW), + DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_RESET, drm_syncobj_reset_ioctl, + DRM_UNLOCKED|DRM_RENDER_ALLOW), };
#define DRM_CORE_IOCTL_COUNT ARRAY_SIZE( drm_ioctls ) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index e1f96b9..226f1e7 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -595,3 +595,27 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
return ret; } + +int +drm_syncobj_reset_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_private) +{ + struct drm_syncobj_reset *args = data; + struct drm_syncobj *syncobj; + + if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ)) + return -ENODEV; + + if (args->flags != 0) + return -EINVAL; + + syncobj = drm_syncobj_find(file_private, args->handle); + if (!syncobj) + return -ENOENT; + + drm_syncobj_replace_fence(syncobj, NULL); + + drm_syncobj_put(syncobj); + + return 0; +} diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h index 0757c1a..4b301b4 100644 --- a/include/uapi/drm/drm.h +++ b/include/uapi/drm/drm.h @@ -729,6 +729,11 @@ struct drm_syncobj_wait { __u32 pad; };
+struct drm_syncobj_reset { + __u32 handle; + __u32 flags; +}; + #if defined(__cplusplus) } #endif @@ -852,6 +857,7 @@ extern "C" { #define DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD DRM_IOWR(0xC1, struct drm_syncobj_handle) #define DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE DRM_IOWR(0xC2, struct drm_syncobj_handle) #define DRM_IOCTL_SYNCOBJ_WAIT DRM_IOWR(0xC3, struct drm_syncobj_wait) +#define DRM_IOCTL_SYNCOBJ_RESET DRM_IOWR(0xC4, struct drm_syncobj_reset)
/** * Device specific ioctls should only be in their respective headers
From: Chris Wilson chris@chris-wilson.co.uk
Freeing a fence whilst there are pending callbacks is an illegal scenario. Currently, we emit a WARN and then cast aside the callbacks, leaving them dangling. Alternatively, we could set an error on the fence and then signal it, so that any dependency chains from the fence can be tidied up; if they care, they can check for the error.
The question is whether or not the cure is worse than the disease (premature fence signaling is never pretty).
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk --- drivers/dma-buf/dma-fence.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 0cac367..4062708 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -172,7 +172,19 @@ void dma_fence_release(struct kref *kref)
trace_dma_fence_destroy(fence);
- WARN_ON(!list_empty(&fence->cb_list)); + if (WARN_ON(!list_empty(&fence->cb_list))) { + unsigned long flags; + + /* + * This should never happen, but if it does make sure that we + * don't leave chains dangling. We set the error flag first + * so that the callbacks know this signal is due to an error. + */ + spin_lock_irqsave(fence->lock, flags); + fence->error = -EDEADLK; + dma_fence_signal_locked(fence); + spin_unlock_irqrestore(fence->lock, flags); + }
if (fence->ops->release) fence->ops->release(fence);
Hi,
2017-08-11 Jason Ekstrand jason@jlekstrand.net:
Thanks for the patch!
Reviewed-by: Gustavo Padovan gustavo.padovan@collabora.com
Quoting Gustavo Padovan (2018-01-31 12:32:21)
If you are picking up this version, please also include a Reviewed-by: Christian König christian.koenig@amd.com given to an earlier posting, which included a discussion on the pitiful lack of EHUMAN. -Chris
From: Chris Wilson chris@chris-wilson.co.uk
Proxy fences allow you to create a placeholder fence which you can later assign to the "real" fence, at which point the two look indistinguishable to anyone other than the driver which created the real fence. These should be used with care, as it is the responsibility of the proxy fence's creator to ensure that it is eventually assigned a real fence so it gets signaled.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
---
 drivers/dma-buf/Makefile          |   4 +-
 drivers/dma-buf/dma-fence-proxy.c | 186 ++++++++++++++++++++++++++++++++++++++
 include/linux/dma-fence-proxy.h   |  25 +++++
 3 files changed, 214 insertions(+), 1 deletion(-)
 create mode 100644 drivers/dma-buf/dma-fence-proxy.c
 create mode 100644 include/linux/dma-fence-proxy.h

diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index c33bf88..e468215 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,3 +1,5 @@
-obj-y := dma-buf.o dma-fence.o dma-fence-array.o reservation.o seqno-fence.o
+obj-y := dma-buf.o \
+	 dma-fence.o dma-fence-array.o dma-fence-proxy.o \
+	 reservation.o seqno-fence.o
 obj-$(CONFIG_SYNC_FILE)	+= sync_file.o
 obj-$(CONFIG_SW_SYNC)	+= sw_sync.o sync_debug.o
diff --git a/drivers/dma-buf/dma-fence-proxy.c b/drivers/dma-buf/dma-fence-proxy.c
new file mode 100644
index 0000000..61bf8e5
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-proxy.c
@@ -0,0 +1,186 @@
+/*
+ * dma-fence-proxy: placeholder unsignaled fence
+ *
+ * Copyright (C) 2017 Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/dma-fence.h>
+#include <linux/export.h>
+#include <linux/irq_work.h>
+#include <linux/slab.h>
+
+struct dma_fence_proxy {
+	struct dma_fence base;
+	spinlock_t lock;
+
+	const char *driver_name;
+	void *tag;
+
+	struct dma_fence *real;
+	struct dma_fence_cb cb;
+	struct irq_work work;
+};
+
+static const char *proxy_get_driver_name(struct dma_fence *fence)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+
+	return p->real ? p->real->ops->get_driver_name(p->real) :
+			 p->driver_name;
+}
+
+static const char *proxy_get_timeline_name(struct dma_fence *fence)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+
+	return p->real ? p->real->ops->get_timeline_name(p->real) : "unset";
+}
+
+static void proxy_irq_work(struct irq_work *work)
+{
+	struct dma_fence_proxy *p = container_of(work, typeof(*p), work);
+
+	dma_fence_signal(&p->base);
+	dma_fence_put(&p->base);
+}
+
+static void proxy_callback(struct dma_fence *fence, struct dma_fence_cb *cb)
+{
+	struct dma_fence_proxy *p = container_of(cb, typeof(*p), cb);
+
+	/* beware the alleged spinlock inversion */
+	irq_work_queue(&p->work);
+}
+
+static bool proxy_enable_signaling(struct dma_fence *fence)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+
+	if (!p->real)
+		return true;
+
+	if (dma_fence_add_callback(p->real, &p->cb, proxy_callback))
+		return false;
+
+	dma_fence_get(fence);
+	return true;
+}
+
+static bool proxy_signaled(struct dma_fence *fence)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+
+	return p->real ? dma_fence_is_signaled(p->real) : false;
+}
+
+static void proxy_release(struct dma_fence *fence)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+
+	if (!p->real)
+		dma_fence_signal(&p->base);
+
+	dma_fence_put(p->real);
+	dma_fence_free(&p->base);
+}
+
+static const struct dma_fence_ops dma_fence_proxy_ops = {
+	.get_driver_name = proxy_get_driver_name,
+	.get_timeline_name = proxy_get_timeline_name,
+	.enable_signaling = proxy_enable_signaling,
+	.signaled = proxy_signaled,
+	.wait = dma_fence_default_wait,
+	.release = proxy_release,
+};
+
+/**
+ * dma_fence_create_proxy - Create an unset proxy dma-fence
+ * @driver_name: The driver name to report; must outlive the fence
+ * @tag: A pointer which uniquely identifies the creator
+ */
+struct dma_fence *dma_fence_create_proxy(const char *driver_name, void *tag)
+{
+	struct dma_fence_proxy *p;
+
+	p = kzalloc(sizeof(*p), GFP_KERNEL);
+	if (!p)
+		return NULL;
+
+	p->driver_name = driver_name;
+	p->tag = tag;
+	spin_lock_init(&p->lock);
+	dma_fence_init(&p->base, &dma_fence_proxy_ops, &p->lock, 0, 0);
+	init_irq_work(&p->work, proxy_irq_work);
+
+	return &p->base;
+}
+EXPORT_SYMBOL(dma_fence_create_proxy);
+
+static bool dma_fence_is_proxy(struct dma_fence *fence)
+{
+	return fence->ops == &dma_fence_proxy_ops;
+}
+
+/**
+ * dma_fence_is_proxy_tagged - identify a proxy fence
+ * @fence: The fence to identify
+ * @tag: The tag pointer provided to dma_fence_create_proxy
+ *
+ * This returns true if @fence is a proxy fence and @tag is the same pointer
+ * as the tag provided to dma_fence_create_proxy.
+ */
+bool dma_fence_is_proxy_tagged(struct dma_fence *fence, void *tag)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+
+	if (!dma_fence_is_proxy(fence))
+		return false;
+
+	return p->tag == tag;
+}
+EXPORT_SYMBOL(dma_fence_is_proxy_tagged);
+
+/**
+ * dma_fence_proxy_assign - assign a fence to a proxy fence
+ * @proxy: The proxy fence
+ * @real: The real fence to assign to @proxy
+ *
+ * This assigns the given real fence to the proxy fence. From this point
+ * forward, the proxy fence will be almost indistinguishable from the real
+ * fence. It will report the same driver and timeline names and will signal
+ * when the real fence signals. If the real fence is already signaled when
+ * this function is called, the proxy will signal as soon as it has any
+ * listeners, possibly immediately.
+ */
+void dma_fence_proxy_assign(struct dma_fence *proxy, struct dma_fence *real)
+{
+	struct dma_fence_proxy *p = container_of(proxy, typeof(*p), base);
+	unsigned long flags;
+
+	BUG_ON(!dma_fence_is_proxy(proxy));
+	BUG_ON(p->real);
+
+	spin_lock_irqsave(p->base.lock, flags);
+
+	p->real = dma_fence_get(real);
+
+	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &p->base.flags)) {
+		if (dma_fence_add_callback(real, &p->cb, proxy_callback))
+			dma_fence_signal_locked(&p->base);
+		else
+			dma_fence_get(&p->base);
+	} else if (dma_fence_is_signaled(real)) {
+		dma_fence_signal_locked(&p->base);
+	}
+
+	spin_unlock_irqrestore(p->base.lock, flags);
+}
+EXPORT_SYMBOL(dma_fence_proxy_assign);
diff --git a/include/linux/dma-fence-proxy.h b/include/linux/dma-fence-proxy.h
new file mode 100644
index 0000000..177a5165
--- /dev/null
+++ b/include/linux/dma-fence-proxy.h
@@ -0,0 +1,25 @@
+/*
+ * dma-fence-proxy: allows waiting upon unset fences
+ *
+ * Copyright (C) 2017 Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#ifndef __LINUX_DMA_FENCE_PROXY_H
+#define __LINUX_DMA_FENCE_PROXY_H
+
+#include <linux/dma-fence.h>
+
+struct dma_fence *dma_fence_create_proxy(const char *driver_name, void *tag);
+bool dma_fence_is_proxy_tagged(struct dma_fence *fence, void *tag);
+void dma_fence_proxy_assign(struct dma_fence *proxy, struct dma_fence *real);
+
+#endif /* __LINUX_DMA_FENCE_PROXY_H */
Vulkan VkFence semantics require that the application be able to perform a CPU wait on work which may not yet have been submitted. This is perfectly safe because the CPU wait has a timeout which will get triggered eventually if no work is ever submitted. This behavior is advantageous for multi-threaded workloads because, so long as all of the threads agree on what fences to use up-front, you don't have the extra cross-thread synchronization cost of thread A telling thread B that it has submitted its dependent work and thread B is now free to wait.
Within a single process, this can be implemented in the userspace driver by doing exactly the same kind of tracking the app would have to do using POSIX condition variables or similar. However, in order for this to work cross-process (as is required by VK_KHR_external_fence), we need to handle this in the kernel.
This commit adds a WAIT_FOR_SUBMIT flag to DRM_IOCTL_SYNCOBJ_WAIT which instructs the IOCTL to wait for the syncobj to have a non-null fence and then wait on the fence. Combined with DRM_IOCTL_SYNCOBJ_RESET, you can easily get the Vulkan behavior.
v2: - Fix a bug in the invalid syncobj error path
    - Unify the wait-all and wait-any cases
v3: - Unify the timeout == 0 case a bit with the timeout > 0 case
    - Use wait_event_interruptible_timeout
v4: - Use proxy fence
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/drm_syncobj.c | 95 +++++++++++++++++++++++++++++++++++++++----
 include/drm/drm_syncobj.h     |  3 +-
 include/uapi/drm/drm.h        |  1 +
 3 files changed, 91 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 226f1e7..4f7c5e5 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -51,6 +51,7 @@
 #include <linux/fs.h>
 #include <linux/anon_inodes.h>
 #include <linux/sync_file.h>
+#include <linux/dma-fence-proxy.h>
 
 #include "drm_internal.h"
 #include <drm/drm_syncobj.h>
@@ -82,10 +83,45 @@ EXPORT_SYMBOL(drm_syncobj_find);
 
 struct dma_fence *drm_syncobj_fence_get(struct drm_syncobj *syncobj)
 {
-	return dma_fence_get_rcu_safe(&syncobj->_fence);
+	struct dma_fence *fence = dma_fence_get_rcu_safe(&syncobj->_fence);
+
+	/* Don't hand out our internal proxy fence. Proxy fences are only used
+	 * for implementing WAIT_FLAGS_WAIT_FOR_SUBMIT behavior and should not
+	 * be handed out to random drivers unless they are prepared to deal
+	 * with the possibility that the fence will never get signaled.
+	 */
+	if (fence && dma_fence_is_proxy_tagged(fence, drm_syncobj_free)) {
+		dma_fence_put(fence);
+		return NULL;
+	}
+
+	return fence;
 }
 EXPORT_SYMBOL(drm_syncobj_fence_get);
 
+static struct dma_fence *drm_syncobj_fence_proxy_get(struct drm_syncobj *syncobj)
+{
+	struct dma_fence *fence, *proxy;
+
+	do {
+		fence = dma_fence_get_rcu_safe(&syncobj->_fence);
+		if (fence)
+			return fence;
+
+		proxy = dma_fence_create_proxy("drm_syncobj", drm_syncobj_free);
+		if (!proxy)
+			return NULL;
+
+		fence = cmpxchg(&syncobj->_fence, NULL, proxy);
+		if (!fence)
+			return dma_fence_get(proxy);
+
+		dma_fence_put(proxy);
+	} while (1);
+}
+
 /**
  * drm_syncobj_replace_fence - replace fence in a sync object.
  * @syncobj: Sync object to replace fence in
@@ -97,10 +133,40 @@
 void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
 			       struct dma_fence *fence)
 {
 	struct dma_fence *old_fence;
+	bool old_fence_is_proxy;
 
-	if (fence)
-		dma_fence_get(fence);
-	old_fence = xchg(&syncobj->_fence, fence);
+	dma_fence_get(fence);
+
+	do {
+		old_fence = dma_fence_get_rcu_safe(&syncobj->_fence);
+		old_fence_is_proxy = old_fence &&
+			dma_fence_is_proxy_tagged(old_fence, drm_syncobj_free);
+		dma_fence_put(old_fence);
+
+		if (!fence && old_fence_is_proxy) {
+			/* If we're replacing a proxy with NULL, just leave
+			 * the proxy.
+			 */
+			return;
+		}
+	} while (cmpxchg(&syncobj->_fence, old_fence, fence) != old_fence);
+
+	if (fence && old_fence_is_proxy) {
+		/* If we just replaced a proxy fence with a real fence,
+		 * assign the real fence to the proxy so that it gets
+		 * triggered when the real fence triggers. If we are
+		 * replacing a proxy with NULL (such as through
+		 * DRM_IOCTL_SYNCOBJ_RESET), we drop the fence on the floor
+		 * and it will never get signaled. This is ok because the
+		 * only code which waits on our proxy fences is
+		 * SYNCOBJ_WAIT with WAIT_FLAGS_WAIT_FOR_SUBMIT set and the
+		 * wait ioctl has a timeout which will eventually trigger.
+		 */
+		dma_fence_proxy_assign(old_fence, fence);
+	}
 
 	dma_fence_put(old_fence);
 }
@@ -544,13 +610,15 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
 	struct drm_syncobj_wait *args = data;
 	uint32_t *handles;
 	struct dma_fence **fences;
+	struct drm_syncobj *syncobj;
 	int ret = 0;
 	uint32_t i;
 
 	if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
 		return -ENODEV;
 
-	if (args->flags != 0 && args->flags != DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL)
+	if (args->flags & ~(DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL |
+			    DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT))
 		return -EINVAL;
 
 	if (args->count_handles == 0)
@@ -577,8 +645,21 @@
 	}
 
 	for (i = 0; i < args->count_handles; i++) {
-		ret = drm_syncobj_find_fence(file_private, handles[i],
-					     &fences[i]);
+		syncobj = drm_syncobj_find(file_private, handles[i]);
+		if (!syncobj) {
+			ret = -ENOENT;
+			goto err_free_fence_array;
+		}
+
+		if (args->flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) {
+			fences[i] = drm_syncobj_fence_proxy_get(syncobj);
+			if (!fences[i])
+				ret = -ENOMEM;
+		} else {
+			fences[i] = drm_syncobj_fence_get(syncobj);
+			if (!fences[i])
+				ret = -EINVAL;
+		}
 		if (ret)
 			goto err_free_fence_array;
 	}
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
index c06a441..29037d5 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -42,7 +42,8 @@ struct drm_syncobj {
 	struct kref refcount;
 	/**
 	 * @fence:
-	 * NULL or a pointer to the fence bound to this object.
+	 * NULL or a pointer to the fence bound to this object or a pointer
+	 * to a proxy fence which will be assigned to the next bound fence.
 	 *
 	 * This pointer should not be accessed directly. Instead, use
 	 * drm_syncobj_fence_get or drm_syncobj_replace_fence.
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 4b301b4..f8ec8fe 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -719,6 +719,7 @@ struct drm_syncobj_handle {
 };
 
 #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL (1 << 0)
+#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT (1 << 1)
 struct drm_syncobj_wait {
 	__u64 handles;
 	/* absolute timeout */
Patches #1-#4 are Acked-by: Christian König christian.koenig@amd.com.
Patch #5: NAK, that will break radeon.
On radeon we need the non-default wait or otherwise we can run into a situation where we never signal a fence.
The general question is why do you need this?
Patch #6: Yes, please. Patch is Reviewed-by: Christian König christian.koenig@amd.com.
Patch #7: Already gave my rb on the patch Chris send out earlier.
Patch #8: NAK to the whole approach.
IIRC we discussed a very similar thing during the initial fence bringup and also during the fence_array development.
The problem is that you can easily build ring dependencies and so deadlocks with it.
I would really prefer an approach which is completely contained inside the syncobj code base.
Regards, Christian.
Am 12.08.2017 um 00:39 schrieb Jason Ekstrand:
On August 13, 2017 6:19:53 AM Christian König christian.koenig@amd.com wrote:
Because i915 sets a non-default wait function, calling wait_any with fences from i915 immediately bails with -EINVAL. This makes it work even with non-default waits.
Are you ok with the approach of internally making a proxy, so long as all the proxy code is inside syncobj? I'd also be happy to go back to the original approach with v3 of the last patch.
Am 13.08.2017 um 17:26 schrieb Jason Ekstrand:
Ok well, let me refine the question: Why does i915 set a non-default wait function?
In radeon we have it because we need to handle 10+ years of different hardware generations, each with a bunch of separate bugs in their fence handling (some of which remain unsolved even today).
Yes, that would be a start.
In general, if possible, I would rather avoid the whole proxy/callback handling altogether, but that probably only works for wait-any if the waitqueue is global, and that wouldn't be ideal either.
I'd also be happy to go back to the original approach with v3 of the last patch.
v3 looked like it should work as well, I would just drop abusing the fence callback structure for the signaling.
Ideally we would finally come up with an interface to wait for multiple waitqueue at the same time, but that probably goes a bit to far.
For now just use a single linked list to store all processes waiting for a fence to arrive, or something like this.
Regards, Christian.
On August 13, 2017 8:52:21 AM Christian König christian.koenig@amd.com wrote:
I have no idea.
So how does wait_any returning -EINVAL for non-default waits help radeon?
I'm happy to get rid of the proxies. They did work nicely but I'm not really attached to them
I don't really know what you're suggesting. Patch v3 has a single waitqueue per process. Are you suggesting one per fence?
--Jason
On August 13, 2017 4:14:45 PM Jason Ekstrand jason@jlekstrand.net wrote:
I apologize for my plethora of questions. I'm still very new to this stuff and don't know what all is out there.
Am 14.08.2017 um 01:14 schrieb Jason Ekstrand:
Can you figure that out? I'm not completely against removing that limitation, but it would be a lot cleaner if we can just fix i915 to not set a non-default wait function.
When the wait function detects a problem it reports -EDEADLK to the caller to signal that the hardware is stuck.
The caller then goes up the call chain and ultimately reports this to the IOCTL, where the error is handled with a hardware reset. And yeah, I know perfectly well that this design sucks badly.
The point is we should try to prevent that the wait for any function is even used together with a radeon fence.
Yeah, more or less. What I'm suggesting is to use one wait_queue_head_t per drm_syncobj.
See, a wait_queue is a callback mechanism anyway, so you are wrapping a callback mechanism inside another callback mechanism, and that doesn't really make much sense.
The problem is that we don't have a wait_event_* variant which can wait for multiple events. Somebody should really add something like that :)
Regards, Christian.
On Mon, Aug 14, 2017 at 12:36 AM, Christian König christian.koenig@amd.com wrote:
I asked on IRC how bad it would be and Chris replied with "a lot of work". He didn't say it's impossible but, apparently, it is a pile of work.
Could we use dma_fence::err for this? i.e., could the radeon driver set the error bit and then have wait_any check for errors on wakeup and report the first one it sees?
The point is we should try to prevent that the wait for any function is even used together with a radeon fence.
Does radeon not support sync_file? Because this is the same mechanism it uses for poll.
Fair enough. There is one little snag though: We need to wait on sync objects and fences at the same time in order for WAIT_ANY | WAIT_FOR_SUBMIT to work. I see two options here:
1) Convert dma-fence to use waitqueue instead of its callback mechanism and add a wait_queue_any. A quick grep for dma_fence_add_callback says that this would affect four drivers.
2) Drop waitqueues and go back and fix patch v2 so that it does the wait correctly.
--Jason
On Mon, Aug 14, 2017 at 8:08 AM, Jason Ekstrand jason@jlekstrand.net wrote:
For what it's worth, my proxy-based implementation does not suffer from the same issues as the one suggested by Chris. I was very careful to ensure that drm_syncobj_fence_get will never return a proxy - if the current fence is a proxy, it returns NULL. It should look no different to drivers than any of the other interfaces I've proposed from that perspective.
The more I think about it, the less sense using waitqueues makes. The fundamental problem here is that the event we are waiting on is actually the concatenation of two events: submit and signal. Since we are waiting on several of these pairs of concatenated events simultaneously, the only two options we have are to either combine them into one event (the proxy approach) or to implement a wait which is capable of handling both at the same time. I don't see a way to do the latter with wait queues.
--Jason
Agree completely.
Essentially we would need to enable wait_event_* to wait for multiple events and then convert all the fence callback stuff to wait_event structures.
But that is certainly outside the scope of this patchset, so feel free to go ahead with the approach of waiting manually (but please without the bugs).
Well, if you've got a student/intern with free time, that would certainly be a nice cleanup task to start on kernel work.
Regards, Christian.
On Wed, Aug 16, 2017 at 9:53 AM, Christian König christian.koenig@amd.com wrote:
Well, the patches I sent last night should do just that. It's mostly the original approach but with the bugfixes from versions 3 and 4. Modulo finding additional bugs, I think they should be good to go.
--Jason
Am 21.08.2017 um 23:42 schrieb Jason Ekstrand:
I just have way too much to do at the moment (as usual). Feel free to add an Acked-by: Christian König <christian.koenig@amd.com> to the patches in the meantime, but a detailed review will have to wait a bit.
Sorry for the delay, Christian.
dri-devel@lists.freedesktop.org