Calling dma_fence_enable_sw_signaling() from this callback is certainly nonsense: dma_fence_enable_sw_signaling() is the very function that invokes this callback in the first place.
Signed-off-by: Chunming Zhou david1.zhou@amd.com Cc: Jason Ekstrand jason@jlekstrand.net Reviewed-by: Christian König christian.koenig@amd.com Acked-by: Daniel Vetter daniel@ffwll.ch --- drivers/gpu/drm/drm_syncobj.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index 3a8837c49639..d17ed75ac7e2 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -184,7 +184,6 @@ static const char *drm_syncobj_null_fence_get_name(struct dma_fence *fence)
static bool drm_syncobj_null_fence_enable_signaling(struct dma_fence *fence) { - dma_fence_enable_sw_signaling(fence); return !dma_fence_is_signaled(fence); }
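For context, a simplified sketch of the dma-fence core path (not the exact kernel source) shows why the removed call could never do anything useful: dma_fence_enable_sw_signaling() is itself the caller of the &dma_fence_ops.enable_signaling hook, so invoking it again from inside that hook only re-enters a path that has already run.

/* Simplified sketch of the dma-fence core, for illustration only. */
void dma_fence_enable_sw_signaling(struct dma_fence *fence)
{
	unsigned long flags;

	if (!test_and_set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
			      &fence->flags) &&
	    !dma_fence_is_signaled(fence)) {
		spin_lock_irqsave(fence->lock, flags);
		/* The .enable_signaling callback is invoked right here, so
		 * calling dma_fence_enable_sw_signaling() from inside the
		 * callback is a no-op at best. */
		if (!fence->ops->enable_signaling(fence))
			dma_fence_signal_locked(fence);
		spin_unlock_irqrestore(fence->lock, flags);
	}
}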
The stub fence will be used by the timeline syncobj as well, so rename the null fence to a stub fence.
Signed-off-by: Chunming Zhou david1.zhou@amd.com Cc: Jason Ekstrand jason@jlekstrand.net Reviewed-by: Christian König christian.koenig@amd.com --- drivers/gpu/drm/drm_syncobj.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index d17ed75ac7e2..d4b48fb410a1 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -172,37 +172,37 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, } EXPORT_SYMBOL(drm_syncobj_replace_fence);
-struct drm_syncobj_null_fence { +struct drm_syncobj_stub_fence { struct dma_fence base; spinlock_t lock; };
-static const char *drm_syncobj_null_fence_get_name(struct dma_fence *fence) +static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence) { - return "syncobjnull"; + return "syncobjstub"; }
-static bool drm_syncobj_null_fence_enable_signaling(struct dma_fence *fence) +static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence) { return !dma_fence_is_signaled(fence); }
-static const struct dma_fence_ops drm_syncobj_null_fence_ops = { - .get_driver_name = drm_syncobj_null_fence_get_name, - .get_timeline_name = drm_syncobj_null_fence_get_name, - .enable_signaling = drm_syncobj_null_fence_enable_signaling, +static const struct dma_fence_ops drm_syncobj_stub_fence_ops = { + .get_driver_name = drm_syncobj_stub_fence_get_name, + .get_timeline_name = drm_syncobj_stub_fence_get_name, + .enable_signaling = drm_syncobj_stub_fence_enable_signaling, .release = NULL, };
static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) { - struct drm_syncobj_null_fence *fence; + struct drm_syncobj_stub_fence *fence; fence = kzalloc(sizeof(*fence), GFP_KERNEL); if (fence == NULL) return -ENOMEM;
spin_lock_init(&fence->lock); - dma_fence_init(&fence->base, &drm_syncobj_null_fence_ops, + dma_fence_init(&fence->base, &drm_syncobj_stub_fence_ops, &fence->lock, 0, 0); dma_fence_signal(&fence->base);
Expand drm_syncobj_find_fence() with a timeline point parameter so that we can fetch the fence of a specific timeline point. v2: The parameter fence is the result of the function and should come last.
Signed-off-by: Chunming Zhou david1.zhou@amd.com Reviewed-by: Christian König christian.koenig@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- drivers/gpu/drm/drm_syncobj.c | 5 +++-- drivers/gpu/drm/v3d/v3d_gem.c | 4 ++-- drivers/gpu/drm/vc4/vc4_gem.c | 2 +- include/drm/drm_syncobj.h | 2 +- 5 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 7a625f3989a0..0e8cf088175f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1062,7 +1062,7 @@ static int amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p, { int r; struct dma_fence *fence; - r = drm_syncobj_find_fence(p->filp, handle, &fence); + r = drm_syncobj_find_fence(p->filp, handle, 0, &fence); if (r) return r;
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index d4b48fb410a1..2dcb60f4c0f7 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -217,6 +217,7 @@ static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) * drm_syncobj_find_fence - lookup and reference the fence in a sync object * @file_private: drm file private pointer * @handle: sync object handle to lookup. + * @point: timeline point * @fence: out parameter for the fence * * This is just a convenience function that combines drm_syncobj_find() and @@ -227,7 +228,7 @@ static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) * dma_fence_put(). */ int drm_syncobj_find_fence(struct drm_file *file_private, - u32 handle, + u32 handle, u64 point, struct dma_fence **fence) { struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle); @@ -498,7 +499,7 @@ static int drm_syncobj_export_sync_file(struct drm_file *file_private, if (fd < 0) return fd;
- ret = drm_syncobj_find_fence(file_private, handle, &fence); + ret = drm_syncobj_find_fence(file_private, handle, 0, &fence); if (ret) goto err_put_fd;
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c index e1fcbb4cd0ae..d25c35c45c33 100644 --- a/drivers/gpu/drm/v3d/v3d_gem.c +++ b/drivers/gpu/drm/v3d/v3d_gem.c @@ -521,12 +521,12 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data, kref_init(&exec->refcount);
ret = drm_syncobj_find_fence(file_priv, args->in_sync_bcl, - &exec->bin.in_fence); + 0, &exec->bin.in_fence); if (ret == -EINVAL) goto fail;
ret = drm_syncobj_find_fence(file_priv, args->in_sync_rcl, - &exec->render.in_fence); + 0, &exec->render.in_fence); if (ret == -EINVAL) goto fail;
diff --git a/drivers/gpu/drm/vc4/vc4_gem.c b/drivers/gpu/drm/vc4/vc4_gem.c index 7910b9acedd6..928718b467bd 100644 --- a/drivers/gpu/drm/vc4/vc4_gem.c +++ b/drivers/gpu/drm/vc4/vc4_gem.c @@ -1173,7 +1173,7 @@ vc4_submit_cl_ioctl(struct drm_device *dev, void *data,
if (args->in_sync) { ret = drm_syncobj_find_fence(file_priv, args->in_sync, - &in_fence); + 0, &in_fence); if (ret) goto fail;
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h index e419c79ba94d..ab9055f943c7 100644 --- a/include/drm/drm_syncobj.h +++ b/include/drm/drm_syncobj.h @@ -134,7 +134,7 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private, void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, struct dma_fence *fence); int drm_syncobj_find_fence(struct drm_file *file_private, - u32 handle, + u32 handle, u64 point, struct dma_fence **fence); void drm_syncobj_free(struct kref *kref); int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags,
Expand drm_syncobj_replace_fence() with a timeline point parameter so that we can place a fence at a specific timeline point. v2: change the function parameter order.
Signed-off-by: Chunming Zhou david1.zhou@amd.com Reviewed-by: Christian König christian.koenig@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- drivers/gpu/drm/drm_syncobj.c | 14 ++++++++------ drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 +- drivers/gpu/drm/v3d/v3d_gem.c | 2 +- drivers/gpu/drm/vc4/vc4_gem.c | 2 +- include/drm/drm_syncobj.h | 2 +- 6 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 0e8cf088175f..1dba9223927a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1151,7 +1151,7 @@ static void amdgpu_cs_post_dependencies(struct amdgpu_cs_parser *p) int i;
for (i = 0; i < p->num_post_dep_syncobjs; ++i) - drm_syncobj_replace_fence(p->post_dep_syncobjs[i], p->fence); + drm_syncobj_replace_fence(p->post_dep_syncobjs[i], 0, p->fence); }
static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index 2dcb60f4c0f7..ab43559398d0 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -140,11 +140,13 @@ void drm_syncobj_remove_callback(struct drm_syncobj *syncobj, /** * drm_syncobj_replace_fence - replace fence in a sync object. * @syncobj: Sync object to replace fence in + * @point: timeline point * @fence: fence to install in sync file. * - * This replaces the fence on a sync object. + * This replaces the fence on a sync object, or a timeline point fence. */ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, + u64 point, struct dma_fence *fence) { struct dma_fence *old_fence; @@ -206,7 +208,7 @@ static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) &fence->lock, 0, 0); dma_fence_signal(&fence->base);
- drm_syncobj_replace_fence(syncobj, &fence->base); + drm_syncobj_replace_fence(syncobj, 0, &fence->base);
dma_fence_put(&fence->base);
@@ -257,7 +259,7 @@ void drm_syncobj_free(struct kref *kref) struct drm_syncobj *syncobj = container_of(kref, struct drm_syncobj, refcount); - drm_syncobj_replace_fence(syncobj, NULL); + drm_syncobj_replace_fence(syncobj, 0, NULL); kfree(syncobj); } EXPORT_SYMBOL(drm_syncobj_free); @@ -297,7 +299,7 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags, }
if (fence) - drm_syncobj_replace_fence(syncobj, fence); + drm_syncobj_replace_fence(syncobj, 0, fence);
*out_syncobj = syncobj; return 0; @@ -482,7 +484,7 @@ static int drm_syncobj_import_sync_file_fence(struct drm_file *file_private, return -ENOENT; }
- drm_syncobj_replace_fence(syncobj, fence); + drm_syncobj_replace_fence(syncobj, 0, fence); dma_fence_put(fence); drm_syncobj_put(syncobj); return 0; @@ -964,7 +966,7 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data, return ret;
for (i = 0; i < args->count_handles; i++) - drm_syncobj_replace_fence(syncobjs[i], NULL); + drm_syncobj_replace_fence(syncobjs[i], 0, NULL);
drm_syncobj_array_free(syncobjs, args->count_handles);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 60dc2a865f5f..7209dd832d39 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -2211,7 +2211,7 @@ signal_fence_array(struct i915_execbuffer *eb, if (!(flags & I915_EXEC_FENCE_SIGNAL)) continue;
- drm_syncobj_replace_fence(syncobj, fence); + drm_syncobj_replace_fence(syncobj, 0, fence); } }
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c index d25c35c45c33..edb4b3651e1d 100644 --- a/drivers/gpu/drm/v3d/v3d_gem.c +++ b/drivers/gpu/drm/v3d/v3d_gem.c @@ -586,7 +586,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data, /* Update the return sync object for the */ sync_out = drm_syncobj_find(file_priv, args->out_sync); if (sync_out) { - drm_syncobj_replace_fence(sync_out, + drm_syncobj_replace_fence(sync_out, 0, &exec->render.base.s_fence->finished); drm_syncobj_put(sync_out); } diff --git a/drivers/gpu/drm/vc4/vc4_gem.c b/drivers/gpu/drm/vc4/vc4_gem.c index 928718b467bd..5b22e996af6c 100644 --- a/drivers/gpu/drm/vc4/vc4_gem.c +++ b/drivers/gpu/drm/vc4/vc4_gem.c @@ -681,7 +681,7 @@ vc4_queue_submit(struct drm_device *dev, struct vc4_exec_info *exec, exec->fence = &fence->base;
if (out_sync) - drm_syncobj_replace_fence(out_sync, exec->fence); + drm_syncobj_replace_fence(out_sync, 0, exec->fence);
vc4_update_bo_seqnos(exec, seqno);
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h index ab9055f943c7..425432b85a87 100644 --- a/include/drm/drm_syncobj.h +++ b/include/drm/drm_syncobj.h @@ -131,7 +131,7 @@ drm_syncobj_fence_get(struct drm_syncobj *syncobj)
struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private, u32 handle); -void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, +void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, u64 point, struct dma_fence *fence); int drm_syncobj_find_fence(struct drm_file *file_private, u32 handle, u64 point,
VK_KHR_timeline_semaphore: This extension introduces a new type of semaphore that has an integer payload identifying a point in a timeline. Such timeline semaphores support the following operations: * CPU query - A host operation that allows querying the payload of the timeline semaphore. * CPU wait - A host operation that allows a blocking wait for a timeline semaphore to reach a specified value. * Device wait - A device operation that allows waiting for a timeline semaphore to reach a specified value. * Device signal - A device operation that allows advancing the timeline semaphore to a specified value.
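For reference, a hedged sketch of how an application exercises the host-side operations. The entry points and structures below are taken from the published form of VK_KHR_timeline_semaphore and are assumptions for illustration only (they are not part of this patch set), assuming a VkDevice device and a timeline VkSemaphore timeline_sem; device wait/signal are expressed per submit via VkTimelineSemaphoreSubmitInfoKHR chained into VkSubmitInfo, and error handling is omitted:

uint64_t current_value = 0;
uint64_t wait_value = 8;	/* hypothetical timeline point */

/* CPU query of the semaphore payload */
vkGetSemaphoreCounterValueKHR(device, timeline_sem, &current_value);

/* CPU wait for the timeline to reach wait_value */
VkSemaphoreWaitInfoKHR wait_info = {
	.sType = VK_STRUCTURE_TYPE_SEMAPHORE_WAIT_INFO_KHR,
	.semaphoreCount = 1,
	.pSemaphores = &timeline_sem,
	.pValues = &wait_value,
};
vkWaitSemaphoresKHR(device, &wait_info, UINT64_MAX);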
Since it's a timeline, an earlier time point (PT) is always signaled before a later PT. a. signal PT design: signal PT fence N depends on the PT[N-1] fence and on the signal operation's fence; when the PT[N] fence is signaled, the timeline advances to the value of PT[N]. b. wait PT design: a wait PT fence is signaled when the timeline reaches its point value; whenever the timeline advances, the wait PT values are compared against the new timeline value, and any wait PT whose value is at or below the timeline value is signaled, otherwise it stays pending. A semaphore wait operation can wait on any point of the timeline, so an RB tree is needed to keep the wait PTs ordered. A wait PT can also arrive ahead of its signal PT, so a submission fence is needed to handle that.
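The rule in (b) boils down to the following minimal sketch; the types here (struct timeline, struct wait_pt) are hypothetical stand-ins for the structures the patch introduces, and locking, reference counting and the normal/timeline distinction are omitted:

struct wait_pt {
	struct dma_fence fence;
	u64 value;
	struct rb_node node;
};

struct timeline {
	u64 value;
	struct rb_root wait_pt_tree;	/* ordered by wait_pt->value */
};

static void timeline_advance(struct timeline *tl, u64 signaled_value)
{
	struct rb_node *node = rb_first(&tl->wait_pt_tree);

	tl->value = signaled_value;	/* PT[N]'s fence has signaled */
	while (node) {
		struct wait_pt *wpt = rb_entry(node, struct wait_pt, node);

		/* The tree is ordered by value, so stop at the first wait
		 * PT that is still ahead of the timeline. */
		if (wpt->value > tl->value)
			break;
		node = rb_next(node);
		rb_erase(&wpt->node, &tl->wait_pt_tree);
		dma_fence_signal(&wpt->fence);
	}
}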
v2: 1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian) 2. move unexposed definitions to the .c file. (Daniel Vetter) 3. split up the change to drm_syncobj_find_fence() into a separate patch. (Christian) 4. split up the change to drm_syncobj_replace_fence() into a separate patch. 5. drop the submission_fence implementation and instead use wait_event() for that. (Christian) 6. WARN_ON(point != 0) for the NORMAL type syncobj case. (Daniel Vetter)
v3: 1. replace the normal syncobj with the timeline implementation. (Vetter and Christian) a. a normal syncobj signal op creates a signal PT at the tail of the signal PT list. b. a normal syncobj wait op creates a wait PT with the last signal point, and this wait PT is only signaled by the related signal PT. 2. some bug fixes and cleanups 3. tested with ./deqp-vk -n dEQP-VK*semaphore* for the normal syncobj
TODO: 1. CPU query and wait on timeline semaphore.
Signed-off-by: Chunming Zhou david1.zhou@amd.com Cc: Christian Konig christian.koenig@amd.com Cc: Dave Airlie airlied@redhat.com Cc: Daniel Rakos Daniel.Rakos@amd.com Cc: Daniel Vetter daniel@ffwll.ch --- drivers/gpu/drm/drm_syncobj.c | 593 ++++++++++++++++----- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 4 +- include/drm/drm_syncobj.h | 78 +-- include/uapi/drm/drm.h | 1 + 4 files changed, 505 insertions(+), 171 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index ab43559398d0..f701d9ef1b81 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -56,6 +56,50 @@ #include "drm_internal.h" #include <drm/drm_syncobj.h>
+/* merge normal syncobj to timeline syncobj, the point interval is 1 */ +#define DRM_SYNCOBJ_NORMAL_POINT 1 + +struct drm_syncobj_stub_fence { + struct dma_fence base; + spinlock_t lock; +}; + +static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence) +{ + return "syncobjstub"; +} + +static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence) +{ + return !dma_fence_is_signaled(fence); +} +static void drm_syncobj_stub_fence_release(struct dma_fence *f) +{ + kfree(f); +} +static const struct dma_fence_ops drm_syncobj_stub_fence_ops = { + .get_driver_name = drm_syncobj_stub_fence_get_name, + .get_timeline_name = drm_syncobj_stub_fence_get_name, + .enable_signaling = drm_syncobj_stub_fence_enable_signaling, + .release = drm_syncobj_stub_fence_release, +}; + +struct drm_syncobj_wait_pt { + struct drm_syncobj_stub_fence base; + u64 value; + struct rb_node node; +}; +struct drm_syncobj_signal_pt { + struct drm_syncobj_stub_fence base; + struct dma_fence *signal_fence; + struct dma_fence *pre_pt_base; + struct dma_fence_cb signal_cb; + struct dma_fence_cb pre_pt_cb; + struct drm_syncobj *syncobj; + u64 value; + struct list_head list; +}; + /** * drm_syncobj_find - lookup and reference a sync object. * @file_private: drm file private pointer @@ -82,59 +126,247 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private, } EXPORT_SYMBOL(drm_syncobj_find);
-static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb, - drm_syncobj_func_t func) +static void drm_syncobj_timeline_init(struct drm_syncobj_timeline + *syncobj_timeline) { - cb->func = func; - list_add_tail(&cb->node, &syncobj->cb_list); + syncobj_timeline->timeline_context = dma_fence_context_alloc(1); + syncobj_timeline->timeline = 0; + syncobj_timeline->signal_point = 0; + init_waitqueue_head(&syncobj_timeline->wq); + + syncobj_timeline->wait_pt_tree = RB_ROOT; + INIT_LIST_HEAD(&syncobj_timeline->signal_pt_list); }
-static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj, - struct dma_fence **fence, - struct drm_syncobj_cb *cb, - drm_syncobj_func_t func) +static void drm_syncobj_timeline_fini(struct drm_syncobj *syncobj, + struct drm_syncobj_timeline *syncobj_timeline) { - int ret; + struct rb_node *node = NULL; + struct drm_syncobj_wait_pt *wait_pt = NULL; + struct drm_syncobj_signal_pt *signal_pt = NULL, *tmp; + + spin_lock(&syncobj->lock); + for(node = rb_first(&syncobj_timeline->wait_pt_tree); + node != NULL; ) { + wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + node = rb_next(node); + rb_erase(&wait_pt->node, + &syncobj_timeline->wait_pt_tree); + RB_CLEAR_NODE(&wait_pt->node); + spin_unlock(&syncobj->lock); + dma_fence_wait(&wait_pt->base.base, true); + spin_lock(&syncobj->lock); + /* kfree(wait_pt) is excuted by fence put */ + dma_fence_put(&wait_pt->base.base); + } + list_for_each_entry_safe(signal_pt, tmp, + &syncobj_timeline->signal_pt_list, list) { + list_del(&signal_pt->list); + if (signal_pt->signal_fence) { + dma_fence_remove_callback(signal_pt->signal_fence, + &signal_pt->signal_cb); + dma_fence_put(signal_pt->signal_fence); + } + if (signal_pt->pre_pt_base) { + dma_fence_remove_callback(signal_pt->pre_pt_base, + &signal_pt->pre_pt_cb); + dma_fence_put(signal_pt->pre_pt_base); + } + dma_fence_put(&signal_pt->base.base); + } + spin_unlock(&syncobj->lock); +} +
- *fence = drm_syncobj_fence_get(syncobj); - if (*fence) - return 1; +static bool drm_syncobj_normal_signal_wait_pts(struct drm_syncobj *syncobj, + u64 signal_pt) +{ + struct rb_node *node = NULL; + struct drm_syncobj_wait_pt *wait_pt = NULL;
spin_lock(&syncobj->lock); - /* We've already tried once to get a fence and failed. Now that we - * have the lock, try one more time just to be sure we don't add a - * callback when a fence has already been set. - */ - if (syncobj->fence) { - *fence = dma_fence_get(rcu_dereference_protected(syncobj->fence, - lockdep_is_held(&syncobj->lock))); - ret = 1; - } else { - *fence = NULL; - drm_syncobj_add_callback_locked(syncobj, cb, func); - ret = 0; + for(node = rb_first(&syncobj->syncobj_timeline.wait_pt_tree); + node != NULL; ) { + wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + node = rb_next(node); + /* for normal syncobj */ + if (wait_pt->value == signal_pt) { + dma_fence_signal(&wait_pt->base.base); + rb_erase(&wait_pt->node, + &syncobj->syncobj_timeline.wait_pt_tree); + RB_CLEAR_NODE(&wait_pt->node); + /* kfree(wait_pt) is excuted by fence put */ + dma_fence_put(&wait_pt->base.base); + spin_unlock(&syncobj->lock); + return true; + } } spin_unlock(&syncobj->lock); + return false; +}
- return ret; +static void drm_syncobj_timeline_signal_wait_pts(struct drm_syncobj *syncobj) +{ + struct rb_node *node = NULL; + struct drm_syncobj_wait_pt *wait_pt = NULL; + + spin_lock(&syncobj->lock); + for(node = rb_first(&syncobj->syncobj_timeline.wait_pt_tree); + node != NULL; ) { + wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + node = rb_next(node); + if (wait_pt->value <= syncobj->syncobj_timeline.timeline) { + dma_fence_signal(&wait_pt->base.base); + rb_erase(&wait_pt->node, + &syncobj->syncobj_timeline.wait_pt_tree); + RB_CLEAR_NODE(&wait_pt->node); + /* kfree(wait_pt) is excuted by fence put */ + dma_fence_put(&wait_pt->base.base); + } else { + /* the loop is from left to right, the later entry value is + * bigger, so don't need to check any more */ + break; + } + } + spin_unlock(&syncobj->lock); }
-void drm_syncobj_add_callback(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb, - drm_syncobj_func_t func) + + +static void pt_fence_cb(struct drm_syncobj_signal_pt *signal_pt) { + struct dma_fence *fence = NULL; + struct drm_syncobj *syncobj; + struct drm_syncobj_signal_pt *tail_pt; + u64 pt_value = signal_pt->value; + + dma_fence_signal(&signal_pt->base.base); + fence = signal_pt->signal_fence; + signal_pt->signal_fence = NULL; + dma_fence_put(fence); + fence = signal_pt->pre_pt_base; + signal_pt->pre_pt_base = NULL; + dma_fence_put(fence); + + syncobj = signal_pt->syncobj; spin_lock(&syncobj->lock); - drm_syncobj_add_callback_locked(syncobj, cb, func); + syncobj->syncobj_timeline.timeline = pt_value; + tail_pt = list_last_entry(&syncobj->syncobj_timeline.signal_pt_list, + struct drm_syncobj_signal_pt, list); + if ((syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL && signal_pt != tail_pt) + || syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + list_del(&signal_pt->list); + /* kfree(signal_pt) will be executed by below fence put */ + dma_fence_put(&signal_pt->base.base); + } spin_unlock(&syncobj->lock); + if (syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) + drm_syncobj_normal_signal_wait_pts(syncobj, pt_value); + else + drm_syncobj_timeline_signal_wait_pts(syncobj); } +static void pt_signal_fence_func(struct dma_fence *fence, + struct dma_fence_cb *cb) +{ + struct drm_syncobj_signal_pt *signal_pt = + container_of(cb, struct drm_syncobj_signal_pt, signal_cb);
-void drm_syncobj_remove_callback(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb) + if (signal_pt->pre_pt_base && + !dma_fence_is_signaled(signal_pt->pre_pt_base)) + return; + + pt_fence_cb(signal_pt); +} +static void pt_pre_fence_func(struct dma_fence *fence, + struct dma_fence_cb *cb) +{ + struct drm_syncobj_signal_pt *signal_pt = + container_of(cb, struct drm_syncobj_signal_pt, pre_pt_cb); + + if (signal_pt->signal_fence && + !dma_fence_is_signaled(signal_pt->pre_pt_base)) + return; + + pt_fence_cb(signal_pt); +} + +static int drm_syncobj_timeline_create_signal_pt(struct drm_syncobj *syncobj, + struct dma_fence *fence, + u64 point) { + struct drm_syncobj_signal_pt *signal_pt = + kzalloc(sizeof(struct drm_syncobj_signal_pt), GFP_KERNEL); + struct drm_syncobj_stub_fence *base; + struct drm_syncobj_signal_pt *tail_pt; + struct dma_fence *tail_pt_fence = NULL; + int ret = 0; + + if (!signal_pt) + return -ENOMEM; + if (syncobj->syncobj_timeline.signal_point >= point) { + DRM_WARN("A later signal is ready!"); + goto out; + } + if (!fence) + goto out; + dma_fence_get(fence); spin_lock(&syncobj->lock); - list_del_init(&cb->node); + base = &signal_pt->base; + spin_lock_init(&base->lock); + dma_fence_init(&base->base, + &drm_syncobj_stub_fence_ops, + &base->lock, + syncobj->syncobj_timeline.timeline_context, point); + signal_pt->signal_fence = fence; + /* timeline syncobj must take this dependency */ + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + if (!list_empty(&syncobj->syncobj_timeline.signal_pt_list)) { + tail_pt = list_last_entry(&syncobj->syncobj_timeline.signal_pt_list, + struct drm_syncobj_signal_pt, list); + tail_pt_fence = &tail_pt->base.base; + if (dma_fence_is_signaled(tail_pt_fence)) + tail_pt_fence = NULL; + else + signal_pt->pre_pt_base = + dma_fence_get(tail_pt_fence); + } + } + + signal_pt->value = point; + syncobj->syncobj_timeline.signal_point = point; + signal_pt->syncobj = syncobj; + INIT_LIST_HEAD(&signal_pt->list); + list_add_tail(&signal_pt->list, &syncobj->syncobj_timeline.signal_pt_list); spin_unlock(&syncobj->lock); + wake_up_all(&syncobj->syncobj_timeline.wq); + /** + * Every pt is depending on signal fence and previous pt fence, add + * callbacks to them + */ + ret = dma_fence_add_callback(signal_pt->signal_fence, + &signal_pt->signal_cb, + pt_signal_fence_func); + + if (signal_pt->pre_pt_base) { + ret = dma_fence_add_callback(signal_pt->pre_pt_base, + &signal_pt->pre_pt_cb, + pt_pre_fence_func); + if (ret == -ENOENT) + pt_pre_fence_func(signal_pt->pre_pt_base, + &signal_pt->pre_pt_cb); + else if (ret) + goto out; + } else if (ret == -ENOENT) { + pt_signal_fence_func(signal_pt->signal_fence, + &signal_pt->signal_cb); + } else if (ret) { + goto out; + } + + return 0; +out: + dma_fence_put(&signal_pt->base.base); + return ret; }
/** @@ -149,53 +381,30 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, u64 point, struct dma_fence *fence) { - struct dma_fence *old_fence; - struct drm_syncobj_cb *cur, *tmp; - - if (fence) - dma_fence_get(fence); - - spin_lock(&syncobj->lock); - - old_fence = rcu_dereference_protected(syncobj->fence, - lockdep_is_held(&syncobj->lock)); - rcu_assign_pointer(syncobj->fence, fence); - - if (fence != old_fence) { - list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) { - list_del_init(&cur->node); - cur->func(syncobj, cur); + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + if (fence) + drm_syncobj_timeline_create_signal_pt(syncobj, fence, + point); + return; + } else if (syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) { + u64 pt_value; + + if (!fence) { + drm_syncobj_timeline_fini(syncobj, + &syncobj->syncobj_timeline); + drm_syncobj_timeline_init(&syncobj->syncobj_timeline); + return; } + pt_value = syncobj->syncobj_timeline.signal_point + + DRM_SYNCOBJ_NORMAL_POINT; + drm_syncobj_timeline_create_signal_pt(syncobj, fence, pt_value); + return; + } else { + DRM_ERROR("the syncobj type isn't support\n"); } - - spin_unlock(&syncobj->lock); - - dma_fence_put(old_fence); } EXPORT_SYMBOL(drm_syncobj_replace_fence);
-struct drm_syncobj_stub_fence { - struct dma_fence base; - spinlock_t lock; -}; - -static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence) -{ - return "syncobjstub"; -} - -static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence) -{ - return !dma_fence_is_signaled(fence); -} - -static const struct dma_fence_ops drm_syncobj_stub_fence_ops = { - .get_driver_name = drm_syncobj_stub_fence_get_name, - .get_timeline_name = drm_syncobj_stub_fence_get_name, - .enable_signaling = drm_syncobj_stub_fence_enable_signaling, - .release = NULL, -}; - static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) { struct drm_syncobj_stub_fence *fence; @@ -215,6 +424,143 @@ static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) return 0; }
+static struct drm_syncobj_wait_pt * +drm_syncobj_timeline_lookup_wait_pt_fence(struct drm_syncobj *syncobj, + u64 point, + struct dma_fence **fence) +{ + struct rb_node *node = syncobj->syncobj_timeline.wait_pt_tree.rb_node; + struct drm_syncobj_wait_pt *wait_pt = NULL; + + + spin_lock(&syncobj->lock); + while(node) { + int result; + + wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + result = point - wait_pt->value; + if (result < 0) { + node = node->rb_left; + } else if (result > 0) { + node = node->rb_right; + } else { + if (fence) + *fence = dma_fence_get(&wait_pt->base.base); + break; + } + } + spin_unlock(&syncobj->lock); + + return wait_pt; +} + +static struct drm_syncobj_wait_pt * +drm_syncobj_timeline_create_wait_pt_and_return_fence(struct drm_syncobj *syncobj, + u64 point, + struct dma_fence **fence) +{ + struct drm_syncobj_wait_pt *wait_pt = kzalloc(sizeof(*wait_pt), + GFP_KERNEL); + struct drm_syncobj_stub_fence *base; + struct rb_node **new = &(syncobj->syncobj_timeline.wait_pt_tree.rb_node), *parent = NULL; + struct drm_syncobj_signal_pt *tail_signal_pt = + list_last_entry(&syncobj->syncobj_timeline.signal_pt_list, + struct drm_syncobj_signal_pt, list); + + if (!wait_pt) + return NULL; + base = &wait_pt->base; + spin_lock_init(&base->lock); + dma_fence_init(&base->base, + &drm_syncobj_stub_fence_ops, + &base->lock, + syncobj->syncobj_timeline.timeline_context, point); + wait_pt->value = point; + + /* wait pt must be in an order, so that we can easily lookup and signal + * it */ + spin_lock(&syncobj->lock); + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE && + point <= syncobj->syncobj_timeline.timeline) + dma_fence_signal(&wait_pt->base.base); + if ((syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) && + (point == tail_signal_pt->value) && + dma_fence_is_signaled(&tail_signal_pt->base.base)) + dma_fence_signal(&wait_pt->base.base); + while(*new) { + struct drm_syncobj_wait_pt *this = + rb_entry(*new, struct drm_syncobj_wait_pt, node); + int result = wait_pt->value - this->value; + + parent = *new; + if (result < 0) + new = &((*new)->rb_left); + else if (result > 0) + new = &((*new)->rb_right); + else + goto exist; + } + if (fence) + *fence = dma_fence_get(&wait_pt->base.base); + rb_link_node(&wait_pt->node, parent, new); + rb_insert_color(&wait_pt->node, &syncobj->syncobj_timeline.wait_pt_tree); + spin_unlock(&syncobj->lock); + return wait_pt; +exist: + spin_unlock(&syncobj->lock); + dma_fence_put(&wait_pt->base.base); + wait_pt = drm_syncobj_timeline_lookup_wait_pt_fence(syncobj, point, + fence); + return wait_pt; +} + +static struct dma_fence * +drm_syncobj_timeline_point_get(struct drm_syncobj *syncobj, u64 point, u64 flags) +{ + struct drm_syncobj_wait_pt *wait_pt; + struct dma_fence *fence = NULL; + + /* already signaled, simply return a signaled stub fence */ + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE && + point <= syncobj->syncobj_timeline.timeline) { + struct drm_syncobj_stub_fence *fence; + + fence = kzalloc(sizeof(*fence), GFP_KERNEL); + if (fence == NULL) + return NULL; + + spin_lock_init(&fence->lock); + dma_fence_init(&fence->base, &drm_syncobj_stub_fence_ops, + &fence->lock, 0, 0); + dma_fence_signal(&fence->base); + return &fence->base; + } + + /* check if the wait pt exists */ + wait_pt = drm_syncobj_timeline_lookup_wait_pt_fence(syncobj, point, + &fence); + if (!fence) { + /* This is a new wait pt, so create it */ + wait_pt = drm_syncobj_timeline_create_wait_pt_and_return_fence(syncobj, point, + &fence); + if (!fence) + return NULL; + } + if 
(wait_pt) { + int ret = 0; + + if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { + ret = wait_event_interruptible_timeout(syncobj->syncobj_timeline.wq, + wait_pt->value <= syncobj->syncobj_timeline.signal_point, + msecs_to_jiffies(10000)); /* wait 10s */ + if (ret <= 0) + return NULL; + } + return fence; + } + return NULL; +} + /** * drm_syncobj_find_fence - lookup and reference the fence in a sync object * @file_private: drm file private pointer @@ -229,20 +575,45 @@ static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) * contains a reference to the fence, which must be released by calling * dma_fence_put(). */ -int drm_syncobj_find_fence(struct drm_file *file_private, - u32 handle, u64 point, - struct dma_fence **fence) +int drm_syncobj_search_fence(struct drm_syncobj *syncobj, u64 point, + u64 flags, struct dma_fence **fence) { - struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle); int ret = 0;
if (!syncobj) return -ENOENT;
- *fence = drm_syncobj_fence_get(syncobj); + if (syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) { + /*NORMAL syncobj always wait on last pt */ + u64 tail_pt_value = syncobj->syncobj_timeline.signal_point; + + if (tail_pt_value == 0) + tail_pt_value += DRM_SYNCOBJ_NORMAL_POINT; + /* NORMAL syncobj doesn't care point value */ + WARN_ON(point != 0); + *fence = drm_syncobj_timeline_point_get(syncobj, tail_pt_value, + flags); + } else if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + *fence = drm_syncobj_timeline_point_get(syncobj, point, + flags); + } else { + DRM_ERROR("Don't support this type syncobj\n"); + *fence = NULL; + } if (!*fence) { ret = -EINVAL; } + return ret; +} +EXPORT_SYMBOL(drm_syncobj_search_fence); +int drm_syncobj_find_fence(struct drm_file *file_private, + u32 handle, u64 point, + struct dma_fence **fence) { + struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle); + + int ret = drm_syncobj_search_fence(syncobj, point, + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT, + fence); drm_syncobj_put(syncobj); return ret; } @@ -259,7 +630,7 @@ void drm_syncobj_free(struct kref *kref) struct drm_syncobj *syncobj = container_of(kref, struct drm_syncobj, refcount); - drm_syncobj_replace_fence(syncobj, 0, NULL); + drm_syncobj_timeline_fini(syncobj, &syncobj->syncobj_timeline); kfree(syncobj); } EXPORT_SYMBOL(drm_syncobj_free); @@ -287,8 +658,12 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags, return -ENOMEM;
kref_init(&syncobj->refcount); - INIT_LIST_HEAD(&syncobj->cb_list); spin_lock_init(&syncobj->lock); + if (flags & DRM_SYNCOBJ_CREATE_TYPE_TIMELINE) + syncobj->type = DRM_SYNCOBJ_TYPE_TIMELINE; + else + syncobj->type = DRM_SYNCOBJ_TYPE_NORMAL; + drm_syncobj_timeline_init(&syncobj->syncobj_timeline);
if (flags & DRM_SYNCOBJ_CREATE_SIGNALED) { ret = drm_syncobj_assign_null_handle(syncobj); @@ -646,7 +1021,6 @@ struct syncobj_wait_entry { struct task_struct *task; struct dma_fence *fence; struct dma_fence_cb fence_cb; - struct drm_syncobj_cb syncobj_cb; };
static void syncobj_wait_fence_func(struct dma_fence *fence, @@ -658,18 +1032,6 @@ static void syncobj_wait_fence_func(struct dma_fence *fence, wake_up_process(wait->task); }
-static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb) -{ - struct syncobj_wait_entry *wait = - container_of(cb, struct syncobj_wait_entry, syncobj_cb); - - /* This happens inside the syncobj lock */ - wait->fence = dma_fence_get(rcu_dereference_protected(syncobj->fence, - lockdep_is_held(&syncobj->lock))); - wake_up_process(wait->task); -} - static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, uint32_t count, uint32_t flags, @@ -693,14 +1055,11 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, signaled_count = 0; for (i = 0; i < count; ++i) { entries[i].task = current; - entries[i].fence = drm_syncobj_fence_get(syncobjs[i]); + ret = drm_syncobj_search_fence(syncobjs[i], 0, 0, + &entries[i].fence); if (!entries[i].fence) { - if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { - continue; - } else { - ret = -EINVAL; - goto cleanup_entries; - } + ret = -EINVAL; + goto cleanup_entries; }
if (dma_fence_is_signaled(entries[i].fence)) { @@ -728,15 +1087,6 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, * fallthough and try a 0 timeout wait! */
- if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { - for (i = 0; i < count; ++i) { - drm_syncobj_fence_get_or_add_callback(syncobjs[i], - &entries[i].fence, - &entries[i].syncobj_cb, - syncobj_wait_syncobj_func); - } - } - do { set_current_state(TASK_INTERRUPTIBLE);
@@ -784,13 +1134,10 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
cleanup_entries: for (i = 0; i < count; ++i) { - if (entries[i].syncobj_cb.func) - drm_syncobj_remove_callback(syncobjs[i], - &entries[i].syncobj_cb); + dma_fence_put(entries[i].fence); if (entries[i].fence_cb.func) dma_fence_remove_callback(entries[i].fence, &entries[i].fence_cb); - dma_fence_put(entries[i].fence); } kfree(entries);
@@ -965,12 +1312,20 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data, if (ret < 0) return ret;
- for (i = 0; i < args->count_handles; i++) - drm_syncobj_replace_fence(syncobjs[i], 0, NULL); - + for (i = 0; i < args->count_handles; i++) { + if (syncobjs[i]->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + DRM_ERROR("timeline syncobj cannot reset!\n"); + ret = -EINVAL; + goto out; + } + drm_syncobj_timeline_fini(syncobjs[i], + &syncobjs[i]->syncobj_timeline); + drm_syncobj_timeline_init(&syncobjs[i]->syncobj_timeline); + } +out: drm_syncobj_array_free(syncobjs, args->count_handles);
- return 0; + return ret; }
int diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 7209dd832d39..bb20d318c9d6 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -2182,7 +2182,9 @@ await_fence_array(struct i915_execbuffer *eb, if (!(flags & I915_EXEC_FENCE_WAIT)) continue;
- fence = drm_syncobj_fence_get(syncobj); + drm_syncobj_search_fence(syncobj, 0, + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT, + &fence); if (!fence) return -EINVAL;
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h index 425432b85a87..657c97dc25ec 100644 --- a/include/drm/drm_syncobj.h +++ b/include/drm/drm_syncobj.h @@ -30,6 +30,25 @@
struct drm_syncobj_cb;
+enum drm_syncobj_type { + DRM_SYNCOBJ_TYPE_NORMAL, + DRM_SYNCOBJ_TYPE_TIMELINE +}; + +struct drm_syncobj_timeline { + wait_queue_head_t wq; + u64 timeline_context; + /** + * @timeline: syncobj timeline + */ + u64 timeline; + u64 signal_point; + + + struct rb_root wait_pt_tree; + struct list_head signal_pt_list; +}; + /** * struct drm_syncobj - sync object. * @@ -41,19 +60,16 @@ struct drm_syncobj { */ struct kref refcount; /** - * @fence: - * NULL or a pointer to the fence bound to this object. - * - * This field should not be used directly. Use drm_syncobj_fence_get() - * and drm_syncobj_replace_fence() instead. + * @type: indicate syncobj type */ - struct dma_fence __rcu *fence; + enum drm_syncobj_type type; /** - * @cb_list: List of callbacks to call when the &fence gets replaced. + * @syncobj_timeline: timeline */ - struct list_head cb_list; + struct drm_syncobj_timeline syncobj_timeline; + /** - * @lock: Protects &cb_list and write-locks &fence. + * @lock: Protects timeline list and write-locks &fence. */ spinlock_t lock; /** @@ -62,25 +78,6 @@ struct drm_syncobj { struct file *file; };
-typedef void (*drm_syncobj_func_t)(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb); - -/** - * struct drm_syncobj_cb - callback for drm_syncobj_add_callback - * @node: used by drm_syncob_add_callback to append this struct to - * &drm_syncobj.cb_list - * @func: drm_syncobj_func_t to call - * - * This struct will be initialized by drm_syncobj_add_callback, additional - * data can be passed along by embedding drm_syncobj_cb in another struct. - * The callback will get called the next time drm_syncobj_replace_fence is - * called. - */ -struct drm_syncobj_cb { - struct list_head node; - drm_syncobj_func_t func; -}; - void drm_syncobj_free(struct kref *kref);
/** @@ -106,29 +103,6 @@ drm_syncobj_put(struct drm_syncobj *obj) kref_put(&obj->refcount, drm_syncobj_free); }
-/** - * drm_syncobj_fence_get - get a reference to a fence in a sync object - * @syncobj: sync object. - * - * This acquires additional reference to &drm_syncobj.fence contained in @obj, - * if not NULL. It is illegal to call this without already holding a reference. - * No locks required. - * - * Returns: - * Either the fence of @obj or NULL if there's none. - */ -static inline struct dma_fence * -drm_syncobj_fence_get(struct drm_syncobj *syncobj) -{ - struct dma_fence *fence; - - rcu_read_lock(); - fence = dma_fence_get_rcu_safe(&syncobj->fence); - rcu_read_unlock(); - - return fence; -} - struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private, u32 handle); void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, u64 point, @@ -142,5 +116,7 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags, int drm_syncobj_get_handle(struct drm_file *file_private, struct drm_syncobj *syncobj, u32 *handle); int drm_syncobj_get_fd(struct drm_syncobj *syncobj, int *p_fd); +int drm_syncobj_search_fence(struct drm_syncobj *syncobj, u64 point, + u64 flags, struct dma_fence **fence);
#endif diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h index 300f336633f2..cebdb2541eb7 100644 --- a/include/uapi/drm/drm.h +++ b/include/uapi/drm/drm.h @@ -717,6 +717,7 @@ struct drm_prime_handle { struct drm_syncobj_create { __u32 handle; #define DRM_SYNCOBJ_CREATE_SIGNALED (1 << 0) +#define DRM_SYNCOBJ_CREATE_TYPE_TIMELINE (1 << 1) __u32 flags; };
On 29.08.2018 at 12:44, Chunming Zhou wrote:
VK_KHR_timeline_semaphore: This extension introduces a new type of semaphore that has an integer payload identifying a point in a timeline. Such timeline semaphores support the following operations: * CPU query - A host operation that allows querying the payload of the timeline semaphore. * CPU wait - A host operation that allows a blocking wait for a timeline semaphore to reach a specified value. * Device wait - A device operation that allows waiting for a timeline semaphore to reach a specified value. * Device signal - A device operation that allows advancing the timeline semaphore to a specified value.
Since it's a timeline, that means the front time point(PT) always is signaled before the late PT. a. signal PT design: Signal PT fence N depends on PT[N-1] fence and signal opertion fence, when PT[N] fence is signaled, the timeline will increase to value of PT[N]. b. wait PT design: Wait PT fence is signaled by reaching timeline point value, when timeline is increasing, will compare wait PTs value with new timeline value, if PT value is lower than timeline value, then wait PT will be signaled, otherwise keep in list. semaphore wait operation can wait on any point of timeline, so need a RB tree to order them. And wait PT could ahead of signal PT, we need a sumission fence to perform that.
v2:
- remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
- move unexposed denitions to .c file. (Daniel Vetter)
- split up the change to drm_syncobj_find_fence() in a separate patch. (Christian)
- split up the change to drm_syncobj_replace_fence() in a separate patch.
- drop the submission_fence implementation and instead use wait_event() for that. (Christian)
- WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)
v3:
- replace normal syncobj with timeline implemenation. (Vetter and Christian) a. normal syncobj signal op will create a signal PT to tail of signal pt list. b. normal syncobj wait op will create a wait pt with last signal point, and this wait PT is only signaled by related signal point PT.
- some bug fix and clean up
- tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj
TODO:
- CPU query and wait on timeline semaphore.
Signed-off-by: Chunming Zhou david1.zhou@amd.com Cc: Christian Konig christian.koenig@amd.com Cc: Dave Airlie airlied@redhat.com Cc: Daniel Rakos Daniel.Rakos@amd.com Cc: Daniel Vetter daniel@ffwll.ch
drivers/gpu/drm/drm_syncobj.c | 593 ++++++++++++++++----- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 4 +- include/drm/drm_syncobj.h | 78 +-- include/uapi/drm/drm.h | 1 + 4 files changed, 505 insertions(+), 171 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index ab43559398d0..f701d9ef1b81 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -56,6 +56,50 @@ #include "drm_internal.h" #include <drm/drm_syncobj.h>
+/* merge normal syncobj to timeline syncobj, the point interval is 1 */
+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
+struct drm_syncobj_stub_fence {
+	struct dma_fence base;
+	spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence)
+{
+	return "syncobjstub";
+}
+
+static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence)
+{
+	return !dma_fence_is_signaled(fence);
+}
+static void drm_syncobj_stub_fence_release(struct dma_fence *f)
+{
+	kfree(f);
+}
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+	.get_driver_name = drm_syncobj_stub_fence_get_name,
+	.get_timeline_name = drm_syncobj_stub_fence_get_name,
+	.enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+	.release = drm_syncobj_stub_fence_release,
+};
Do we really need to move that around? Could reduce the size of the patch quite a bit if we don't.
+struct drm_syncobj_wait_pt {
+	struct drm_syncobj_stub_fence base;
+	u64 value;
+	struct rb_node node;
+};
+struct drm_syncobj_signal_pt {
+	struct drm_syncobj_stub_fence base;
+	struct dma_fence *signal_fence;
+	struct dma_fence *pre_pt_base;
+	struct dma_fence_cb signal_cb;
+	struct dma_fence_cb pre_pt_cb;
+	struct drm_syncobj *syncobj;
+	u64 value;
+	struct list_head list;
+};
What are those two structures good for and why is the stub fence their base?
 /**
  * drm_syncobj_find - lookup and reference a sync object.
  * @file_private: drm file private pointer
@@ -82,59 +126,247 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
 }
 EXPORT_SYMBOL(drm_syncobj_find);

-static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj,
-					    struct drm_syncobj_cb *cb,
-					    drm_syncobj_func_t func)
+static void drm_syncobj_timeline_init(struct drm_syncobj_timeline
+				      *syncobj_timeline)
 {
-	cb->func = func;
-	list_add_tail(&cb->node, &syncobj->cb_list);
+	syncobj_timeline->timeline_context = dma_fence_context_alloc(1);
+	syncobj_timeline->timeline = 0;
+	syncobj_timeline->signal_point = 0;
+	init_waitqueue_head(&syncobj_timeline->wq);
+
+	syncobj_timeline->wait_pt_tree = RB_ROOT;
+	INIT_LIST_HEAD(&syncobj_timeline->signal_pt_list);
 }

-static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,
-						 struct dma_fence **fence,
-						 struct drm_syncobj_cb *cb,
-						 drm_syncobj_func_t func)
+static void drm_syncobj_timeline_fini(struct drm_syncobj *syncobj,
+				      struct drm_syncobj_timeline *syncobj_timeline)
 {
-	int ret;
+	struct rb_node *node = NULL;
+	struct drm_syncobj_wait_pt *wait_pt = NULL;
+	struct drm_syncobj_signal_pt *signal_pt = NULL, *tmp;
+
+	spin_lock(&syncobj->lock);
+	for(node = rb_first(&syncobj_timeline->wait_pt_tree);
+	    node != NULL; ) {
Better use rbtree_postorder_for_each_entry_safe() for this.
Regards, Christian.
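For reference, the cleanup loop Christian refers to could be written with that helper roughly as follows; this is an untested sketch of the suggestion, not code from the thread, and the waiting and locking of the original loop are left out:

static void drm_syncobj_timeline_free_wait_pts(struct drm_syncobj_timeline *tl)
{
	struct drm_syncobj_wait_pt *wait_pt, *next;

	/* Post-order traversal lets each node be released without any
	 * rb_erase()/rb_next() bookkeeping; reset the root afterwards. */
	rbtree_postorder_for_each_entry_safe(wait_pt, next,
					     &tl->wait_pt_tree, node) {
		dma_fence_put(&wait_pt->base.base);	/* final put frees it */
	}
	tl->wait_pt_tree = RB_ROOT;
}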
wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
node = rb_next(node);
rb_erase(&wait_pt->node,
&syncobj_timeline->wait_pt_tree);
RB_CLEAR_NODE(&wait_pt->node);
spin_unlock(&syncobj->lock);
dma_fence_wait(&wait_pt->base.base, true);
spin_lock(&syncobj->lock);
/* kfree(wait_pt) is excuted by fence put */
dma_fence_put(&wait_pt->base.base);
- }
- list_for_each_entry_safe(signal_pt, tmp,
&syncobj_timeline->signal_pt_list, list) {
list_del(&signal_pt->list);
if (signal_pt->signal_fence) {
dma_fence_remove_callback(signal_pt->signal_fence,
&signal_pt->signal_cb);
dma_fence_put(signal_pt->signal_fence);
}
if (signal_pt->pre_pt_base) {
dma_fence_remove_callback(signal_pt->pre_pt_base,
&signal_pt->pre_pt_cb);
dma_fence_put(signal_pt->pre_pt_base);
}
dma_fence_put(&signal_pt->base.base);
- }
- spin_unlock(&syncobj->lock);
+}
- *fence = drm_syncobj_fence_get(syncobj);
- if (*fence)
return 1;
+static bool drm_syncobj_normal_signal_wait_pts(struct drm_syncobj *syncobj,
u64 signal_pt)
+{
struct rb_node *node = NULL;
struct drm_syncobj_wait_pt *wait_pt = NULL;
spin_lock(&syncobj->lock);
- /* We've already tried once to get a fence and failed. Now that we
* have the lock, try one more time just to be sure we don't add a
* callback when a fence has already been set.
*/
- if (syncobj->fence) {
*fence = dma_fence_get(rcu_dereference_protected(syncobj->fence,
lockdep_is_held(&syncobj->lock)));
ret = 1;
- } else {
*fence = NULL;
drm_syncobj_add_callback_locked(syncobj, cb, func);
ret = 0;
- for(node = rb_first(&syncobj->syncobj_timeline.wait_pt_tree);
node != NULL; ) {
wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
node = rb_next(node);
/* for normal syncobj */
if (wait_pt->value == signal_pt) {
dma_fence_signal(&wait_pt->base.base);
rb_erase(&wait_pt->node,
&syncobj->syncobj_timeline.wait_pt_tree);
RB_CLEAR_NODE(&wait_pt->node);
/* kfree(wait_pt) is excuted by fence put */
dma_fence_put(&wait_pt->base.base);
spin_unlock(&syncobj->lock);
return true;
} spin_unlock(&syncobj->lock);}
- return false;
+}
- return ret;
+static void drm_syncobj_timeline_signal_wait_pts(struct drm_syncobj *syncobj) +{
- struct rb_node *node = NULL;
- struct drm_syncobj_wait_pt *wait_pt = NULL;
- spin_lock(&syncobj->lock);
- for(node = rb_first(&syncobj->syncobj_timeline.wait_pt_tree);
node != NULL; ) {
wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
node = rb_next(node);
if (wait_pt->value <= syncobj->syncobj_timeline.timeline) {
dma_fence_signal(&wait_pt->base.base);
rb_erase(&wait_pt->node,
&syncobj->syncobj_timeline.wait_pt_tree);
RB_CLEAR_NODE(&wait_pt->node);
/* kfree(wait_pt) is excuted by fence put */
dma_fence_put(&wait_pt->base.base);
} else {
/* the loop is from left to right, the later entry value is
* bigger, so don't need to check any more */
break;
}
- }
- spin_unlock(&syncobj->lock); }
-void drm_syncobj_add_callback(struct drm_syncobj *syncobj,
struct drm_syncobj_cb *cb,
drm_syncobj_func_t func)
+static void pt_fence_cb(struct drm_syncobj_signal_pt *signal_pt) {
- struct dma_fence *fence = NULL;
- struct drm_syncobj *syncobj;
- struct drm_syncobj_signal_pt *tail_pt;
- u64 pt_value = signal_pt->value;
- dma_fence_signal(&signal_pt->base.base);
- fence = signal_pt->signal_fence;
- signal_pt->signal_fence = NULL;
- dma_fence_put(fence);
- fence = signal_pt->pre_pt_base;
- signal_pt->pre_pt_base = NULL;
- dma_fence_put(fence);
- syncobj = signal_pt->syncobj; spin_lock(&syncobj->lock);
- drm_syncobj_add_callback_locked(syncobj, cb, func);
- syncobj->syncobj_timeline.timeline = pt_value;
- tail_pt = list_last_entry(&syncobj->syncobj_timeline.signal_pt_list,
struct drm_syncobj_signal_pt, list);
- if ((syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL && signal_pt != tail_pt)
|| syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) {
list_del(&signal_pt->list);
/* kfree(signal_pt) will be executed by below fence put */
dma_fence_put(&signal_pt->base.base);
- } spin_unlock(&syncobj->lock);
- if (syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL)
drm_syncobj_normal_signal_wait_pts(syncobj, pt_value);
- else
}drm_syncobj_timeline_signal_wait_pts(syncobj);
+static void pt_signal_fence_func(struct dma_fence *fence,
struct dma_fence_cb *cb)
+{
- struct drm_syncobj_signal_pt *signal_pt =
container_of(cb, struct drm_syncobj_signal_pt, signal_cb);
-void drm_syncobj_remove_callback(struct drm_syncobj *syncobj,
struct drm_syncobj_cb *cb)
- if (signal_pt->pre_pt_base &&
!dma_fence_is_signaled(signal_pt->pre_pt_base))
return;
- pt_fence_cb(signal_pt);
+} +static void pt_pre_fence_func(struct dma_fence *fence,
struct dma_fence_cb *cb)
+{
- struct drm_syncobj_signal_pt *signal_pt =
container_of(cb, struct drm_syncobj_signal_pt, pre_pt_cb);
- if (signal_pt->signal_fence &&
!dma_fence_is_signaled(signal_pt->pre_pt_base))
return;
- pt_fence_cb(signal_pt);
+}
+static int drm_syncobj_timeline_create_signal_pt(struct drm_syncobj *syncobj,
struct dma_fence *fence,
{u64 point)
- struct drm_syncobj_signal_pt *signal_pt =
kzalloc(sizeof(struct drm_syncobj_signal_pt), GFP_KERNEL);
- struct drm_syncobj_stub_fence *base;
- struct drm_syncobj_signal_pt *tail_pt;
- struct dma_fence *tail_pt_fence = NULL;
- int ret = 0;
- if (!signal_pt)
return -ENOMEM;
- if (syncobj->syncobj_timeline.signal_point >= point) {
DRM_WARN("A later signal is ready!");
goto out;
- }
- if (!fence)
goto out;
- dma_fence_get(fence); spin_lock(&syncobj->lock);
- list_del_init(&cb->node);
- base = &signal_pt->base;
- spin_lock_init(&base->lock);
- dma_fence_init(&base->base,
&drm_syncobj_stub_fence_ops,
&base->lock,
syncobj->syncobj_timeline.timeline_context, point);
- signal_pt->signal_fence = fence;
- /* timeline syncobj must take this dependency */
- if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) {
if (!list_empty(&syncobj->syncobj_timeline.signal_pt_list)) {
tail_pt = list_last_entry(&syncobj->syncobj_timeline.signal_pt_list,
struct drm_syncobj_signal_pt, list);
tail_pt_fence = &tail_pt->base.base;
if (dma_fence_is_signaled(tail_pt_fence))
tail_pt_fence = NULL;
else
signal_pt->pre_pt_base =
dma_fence_get(tail_pt_fence);
}
- }
- signal_pt->value = point;
- syncobj->syncobj_timeline.signal_point = point;
- signal_pt->syncobj = syncobj;
- INIT_LIST_HEAD(&signal_pt->list);
- list_add_tail(&signal_pt->list, &syncobj->syncobj_timeline.signal_pt_list); spin_unlock(&syncobj->lock);
- wake_up_all(&syncobj->syncobj_timeline.wq);
- /**
* Every pt is depending on signal fence and previous pt fence, add
* callbacks to them
*/
- ret = dma_fence_add_callback(signal_pt->signal_fence,
&signal_pt->signal_cb,
pt_signal_fence_func);
- if (signal_pt->pre_pt_base) {
ret = dma_fence_add_callback(signal_pt->pre_pt_base,
&signal_pt->pre_pt_cb,
pt_pre_fence_func);
if (ret == -ENOENT)
pt_pre_fence_func(signal_pt->pre_pt_base,
&signal_pt->pre_pt_cb);
else if (ret)
goto out;
- } else if (ret == -ENOENT) {
pt_signal_fence_func(signal_pt->signal_fence,
&signal_pt->signal_cb);
- } else if (ret) {
goto out;
- }
- return 0;
+out:
dma_fence_put(&signal_pt->base.base);
return ret; }
/**
@@ -149,53 +381,30 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, u64 point, struct dma_fence *fence) {
- struct dma_fence *old_fence;
- struct drm_syncobj_cb *cur, *tmp;
- if (fence)
dma_fence_get(fence);
- spin_lock(&syncobj->lock);
- old_fence = rcu_dereference_protected(syncobj->fence,
lockdep_is_held(&syncobj->lock));
- rcu_assign_pointer(syncobj->fence, fence);
- if (fence != old_fence) {
list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
list_del_init(&cur->node);
cur->func(syncobj, cur);
- if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) {
if (fence)
drm_syncobj_timeline_create_signal_pt(syncobj, fence,
point);
return;
- } else if (syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) {
u64 pt_value;
if (!fence) {
drm_syncobj_timeline_fini(syncobj,
&syncobj->syncobj_timeline);
drm_syncobj_timeline_init(&syncobj->syncobj_timeline);
}return;
pt_value = syncobj->syncobj_timeline.signal_point +
DRM_SYNCOBJ_NORMAL_POINT;
drm_syncobj_timeline_create_signal_pt(syncobj, fence, pt_value);
return;
- } else {
}DRM_ERROR("the syncobj type isn't support\n");
- spin_unlock(&syncobj->lock);
- dma_fence_put(old_fence); } EXPORT_SYMBOL(drm_syncobj_replace_fence);
-struct drm_syncobj_stub_fence {
- struct dma_fence base;
- spinlock_t lock;
-};
-static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence) -{
return "syncobjstub";
-}
-static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence) -{
- return !dma_fence_is_signaled(fence);
-}
-static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
- .get_driver_name = drm_syncobj_stub_fence_get_name,
- .get_timeline_name = drm_syncobj_stub_fence_get_name,
- .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
- .release = NULL,
-};
- static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) { struct drm_syncobj_stub_fence *fence;
@@ -215,6 +424,143 @@ static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) return 0; }
+static struct drm_syncobj_wait_pt * +drm_syncobj_timeline_lookup_wait_pt_fence(struct drm_syncobj *syncobj,
u64 point,
struct dma_fence **fence)
+{
- struct rb_node *node = syncobj->syncobj_timeline.wait_pt_tree.rb_node;
- struct drm_syncobj_wait_pt *wait_pt = NULL;
- spin_lock(&syncobj->lock);
- while(node) {
int result;
wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
result = point - wait_pt->value;
if (result < 0) {
node = node->rb_left;
} else if (result > 0) {
node = node->rb_right;
} else {
if (fence)
*fence = dma_fence_get(&wait_pt->base.base);
break;
}
- }
- spin_unlock(&syncobj->lock);
- return wait_pt;
+}
+static struct drm_syncobj_wait_pt * +drm_syncobj_timeline_create_wait_pt_and_return_fence(struct drm_syncobj *syncobj,
u64 point,
struct dma_fence **fence)
+{
- struct drm_syncobj_wait_pt *wait_pt = kzalloc(sizeof(*wait_pt),
GFP_KERNEL);
- struct drm_syncobj_stub_fence *base;
- struct rb_node **new = &(syncobj->syncobj_timeline.wait_pt_tree.rb_node), *parent = NULL;
- struct drm_syncobj_signal_pt *tail_signal_pt =
list_last_entry(&syncobj->syncobj_timeline.signal_pt_list,
struct drm_syncobj_signal_pt, list);
- if (!wait_pt)
return NULL;
- base = &wait_pt->base;
- spin_lock_init(&base->lock);
- dma_fence_init(&base->base,
&drm_syncobj_stub_fence_ops,
&base->lock,
syncobj->syncobj_timeline.timeline_context, point);
- wait_pt->value = point;
- /* wait pt must be in an order, so that we can easily lookup and signal
* it */
- spin_lock(&syncobj->lock);
- if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE &&
point <= syncobj->syncobj_timeline.timeline)
dma_fence_signal(&wait_pt->base.base);
- if ((syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) &&
(point == tail_signal_pt->value) &&
dma_fence_is_signaled(&tail_signal_pt->base.base))
dma_fence_signal(&wait_pt->base.base);
- while(*new) {
struct drm_syncobj_wait_pt *this =
rb_entry(*new, struct drm_syncobj_wait_pt, node);
int result = wait_pt->value - this->value;
parent = *new;
if (result < 0)
new = &((*new)->rb_left);
else if (result > 0)
new = &((*new)->rb_right);
else
goto exist;
- }
- if (fence)
*fence = dma_fence_get(&wait_pt->base.base);
- rb_link_node(&wait_pt->node, parent, new);
- rb_insert_color(&wait_pt->node, &syncobj->syncobj_timeline.wait_pt_tree);
- spin_unlock(&syncobj->lock);
- return wait_pt;
+exist:
- spin_unlock(&syncobj->lock);
- dma_fence_put(&wait_pt->base.base);
- wait_pt = drm_syncobj_timeline_lookup_wait_pt_fence(syncobj, point,
fence);
- return wait_pt;
+}
+static struct dma_fence *
+drm_syncobj_timeline_point_get(struct drm_syncobj *syncobj, u64 point, u64 flags)
+{
+	struct drm_syncobj_wait_pt *wait_pt;
+	struct dma_fence *fence = NULL;
+
+	/* already signaled, simply return a signaled stub fence */
+	if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE &&
+	    point <= syncobj->syncobj_timeline.timeline) {
+		struct drm_syncobj_stub_fence *fence;
+
+		fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+		if (fence == NULL)
+			return NULL;
+
+		spin_lock_init(&fence->lock);
+		dma_fence_init(&fence->base, &drm_syncobj_stub_fence_ops,
+			       &fence->lock, 0, 0);
+		dma_fence_signal(&fence->base);
+		return &fence->base;
+	}
+
+	/* check if the wait pt exists */
+	wait_pt = drm_syncobj_timeline_lookup_wait_pt_fence(syncobj, point,
+							    &fence);
+	if (!fence) {
+		/* This is a new wait pt, so create it */
+		wait_pt = drm_syncobj_timeline_create_wait_pt_and_return_fence(syncobj, point,
+										&fence);
+		if (!fence)
+			return NULL;
+	}
+	if (wait_pt) {
+		int ret = 0;
+
+		if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) {
+			ret = wait_event_interruptible_timeout(syncobj->syncobj_timeline.wq,
+							       wait_pt->value <= syncobj->syncobj_timeline.signal_point,
+							       msecs_to_jiffies(10000)); /* wait 10s */
+			if (ret <= 0)
+				return NULL;
+		}
+		return fence;
+	}
+	return NULL;
+}
 /**
  * drm_syncobj_find_fence - lookup and reference the fence in a sync object
  * @file_private: drm file private pointer
@@ -229,20 +575,45 @@ static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
  * contains a reference to the fence, which must be released by calling
  * dma_fence_put().
  */
-int drm_syncobj_find_fence(struct drm_file *file_private,
-			   u32 handle, u64 point,
-			   struct dma_fence **fence)
+int drm_syncobj_search_fence(struct drm_syncobj *syncobj, u64 point,
+			     u64 flags, struct dma_fence **fence)
 {
-	struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
 	int ret = 0;
 
 	if (!syncobj)
 		return -ENOENT;
-	*fence = drm_syncobj_fence_get(syncobj);
+	if (syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) {
+		/*NORMAL syncobj always wait on last pt */
+		u64 tail_pt_value = syncobj->syncobj_timeline.signal_point;
+
+		if (tail_pt_value == 0)
+			tail_pt_value += DRM_SYNCOBJ_NORMAL_POINT;
+		/* NORMAL syncobj doesn't care point value */
+		WARN_ON(point != 0);
+		*fence = drm_syncobj_timeline_point_get(syncobj, tail_pt_value,
+							flags);
+	} else if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) {
+		*fence = drm_syncobj_timeline_point_get(syncobj, point,
+							flags);
+	} else {
+		DRM_ERROR("Don't support this type syncobj\n");
+		*fence = NULL;
+	}
 	if (!*fence) {
 		ret = -EINVAL;
 	}
+	return ret;
+}
+EXPORT_SYMBOL(drm_syncobj_search_fence);
+
+int drm_syncobj_find_fence(struct drm_file *file_private,
+			   u32 handle, u64 point,
+			   struct dma_fence **fence)
+{
+	struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
+	int ret = drm_syncobj_search_fence(syncobj, point,
+					   DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
+					   fence);
 	drm_syncobj_put(syncobj);
 	return ret;
 }
@@ -259,7 +630,7 @@ void drm_syncobj_free(struct kref *kref)
 	struct drm_syncobj *syncobj = container_of(kref,
 						   struct drm_syncobj,
 						   refcount);
-	drm_syncobj_replace_fence(syncobj, 0, NULL);
+	drm_syncobj_timeline_fini(syncobj, &syncobj->syncobj_timeline);
 	kfree(syncobj);
 }
 EXPORT_SYMBOL(drm_syncobj_free);
@@ -287,8 +658,12 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags,
 		return -ENOMEM;
 
 	kref_init(&syncobj->refcount);
-	INIT_LIST_HEAD(&syncobj->cb_list);
 	spin_lock_init(&syncobj->lock);
+	if (flags & DRM_SYNCOBJ_CREATE_TYPE_TIMELINE)
+		syncobj->type = DRM_SYNCOBJ_TYPE_TIMELINE;
+	else
+		syncobj->type = DRM_SYNCOBJ_TYPE_NORMAL;
+	drm_syncobj_timeline_init(&syncobj->syncobj_timeline);
 
 	if (flags & DRM_SYNCOBJ_CREATE_SIGNALED) {
 		ret = drm_syncobj_assign_null_handle(syncobj);
@@ -646,7 +1021,6 @@ struct syncobj_wait_entry {
 	struct task_struct *task;
 	struct dma_fence *fence;
 	struct dma_fence_cb fence_cb;
-	struct drm_syncobj_cb syncobj_cb;
 };
 
 static void syncobj_wait_fence_func(struct dma_fence *fence,
@@ -658,18 +1032,6 @@ static void syncobj_wait_fence_func(struct dma_fence *fence,
 	wake_up_process(wait->task);
 }
-static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj,
-				      struct drm_syncobj_cb *cb)
-{
-	struct syncobj_wait_entry *wait =
-		container_of(cb, struct syncobj_wait_entry, syncobj_cb);
-
-	/* This happens inside the syncobj lock */
-	wait->fence = dma_fence_get(rcu_dereference_protected(syncobj->fence,
-							      lockdep_is_held(&syncobj->lock)));
-	wake_up_process(wait->task);
-}
-
 static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
 						  uint32_t count,
 						  uint32_t flags,
@@ -693,14 +1055,11 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
 	signaled_count = 0;
 	for (i = 0; i < count; ++i) {
 		entries[i].task = current;
-		entries[i].fence = drm_syncobj_fence_get(syncobjs[i]);
+		ret = drm_syncobj_search_fence(syncobjs[i], 0, 0,
+					       &entries[i].fence);
 		if (!entries[i].fence) {
-			if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) {
-				continue;
-			} else {
-				ret = -EINVAL;
-				goto cleanup_entries;
-			}
+			ret = -EINVAL;
+			goto cleanup_entries;
 		}
 
 		if (dma_fence_is_signaled(entries[i].fence)) {
@@ -728,15 +1087,6 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
 	 * fallthough and try a 0 timeout wait!
 	 */
 
-	if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) {
-		for (i = 0; i < count; ++i) {
-			drm_syncobj_fence_get_or_add_callback(syncobjs[i],
-							      &entries[i].fence,
-							      &entries[i].syncobj_cb,
-							      syncobj_wait_syncobj_func);
-		}
-	}
-
 	do {
 		set_current_state(TASK_INTERRUPTIBLE);
@@ -784,13 +1134,10 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
 
 cleanup_entries:
 	for (i = 0; i < count; ++i) {
-		if (entries[i].syncobj_cb.func)
-			drm_syncobj_remove_callback(syncobjs[i],
-						    &entries[i].syncobj_cb);
+		dma_fence_put(entries[i].fence);
 		if (entries[i].fence_cb.func)
 			dma_fence_remove_callback(entries[i].fence,
 						  &entries[i].fence_cb);
-		dma_fence_put(entries[i].fence);
 	}
 	kfree(entries);
@@ -965,12 +1312,20 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
 	if (ret < 0)
 		return ret;
 
-	for (i = 0; i < args->count_handles; i++)
-		drm_syncobj_replace_fence(syncobjs[i], 0, NULL);
+	for (i = 0; i < args->count_handles; i++) {
+		if (syncobjs[i]->type == DRM_SYNCOBJ_TYPE_TIMELINE) {
+			DRM_ERROR("timeline syncobj cannot reset!\n");
+			ret = -EINVAL;
+			goto out;
+		}
+		drm_syncobj_timeline_fini(syncobjs[i],
+					  &syncobjs[i]->syncobj_timeline);
+		drm_syncobj_timeline_init(&syncobjs[i]->syncobj_timeline);
+	}
+out:
 	drm_syncobj_array_free(syncobjs, args->count_handles);
 
-	return 0;
+	return ret;
 }
int
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 7209dd832d39..bb20d318c9d6 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2182,7 +2182,9 @@ await_fence_array(struct i915_execbuffer *eb,
 		if (!(flags & I915_EXEC_FENCE_WAIT))
 			continue;
 
-		fence = drm_syncobj_fence_get(syncobj);
+		drm_syncobj_search_fence(syncobj, 0,
+					 DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
+					 &fence);
 		if (!fence)
 			return -EINVAL;
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h index 425432b85a87..657c97dc25ec 100644 --- a/include/drm/drm_syncobj.h +++ b/include/drm/drm_syncobj.h @@ -30,6 +30,25 @@
struct drm_syncobj_cb;
+enum drm_syncobj_type {
+	DRM_SYNCOBJ_TYPE_NORMAL,
+	DRM_SYNCOBJ_TYPE_TIMELINE
+};
+
+struct drm_syncobj_timeline {
+	wait_queue_head_t wq;
+	u64 timeline_context;
+	/**
+	 * @timeline: syncobj timeline
+	 */
+	u64 timeline;
+	u64 signal_point;
+	struct rb_root wait_pt_tree;
+	struct list_head signal_pt_list;
+};
 /**
  * struct drm_syncobj - sync object.
  *
@@ -41,19 +60,16 @@ struct drm_syncobj {
 	 */
 	struct kref refcount;
 	/**
-	 * @fence:
-	 * NULL or a pointer to the fence bound to this object.
-	 *
-	 * This field should not be used directly. Use drm_syncobj_fence_get()
-	 * and drm_syncobj_replace_fence() instead.
+	 * @type: indicate syncobj type
 	 */
-	struct dma_fence __rcu *fence;
+	enum drm_syncobj_type type;
 	/**
-	 * @cb_list: List of callbacks to call when the &fence gets replaced.
+	 * @syncobj_timeline: timeline
 	 */
-	struct list_head cb_list;
+	struct drm_syncobj_timeline syncobj_timeline;
 	/**
-	 * @lock: Protects &cb_list and write-locks &fence.
+	 * @lock: Protects timeline list and write-locks &fence.
 	 */
 	spinlock_t lock;
 	/**
@@ -62,25 +78,6 @@ struct drm_syncobj {
 	struct file *file;
 };
-typedef void (*drm_syncobj_func_t)(struct drm_syncobj *syncobj,
-				   struct drm_syncobj_cb *cb);
-
-/**
- * struct drm_syncobj_cb - callback for drm_syncobj_add_callback
- * @node: used by drm_syncob_add_callback to append this struct to
- *	  &drm_syncobj.cb_list
- * @func: drm_syncobj_func_t to call
- *
- * This struct will be initialized by drm_syncobj_add_callback, additional
- * data can be passed along by embedding drm_syncobj_cb in another struct.
- * The callback will get called the next time drm_syncobj_replace_fence is
- * called.
- */
-struct drm_syncobj_cb {
-	struct list_head node;
-	drm_syncobj_func_t func;
-};
void drm_syncobj_free(struct kref *kref);
/**
@@ -106,29 +103,6 @@ drm_syncobj_put(struct drm_syncobj *obj) kref_put(&obj->refcount, drm_syncobj_free); }
-/**
- * drm_syncobj_fence_get - get a reference to a fence in a sync object
- * @syncobj: sync object.
- *
- * This acquires additional reference to &drm_syncobj.fence contained in @obj,
- * if not NULL. It is illegal to call this without already holding a reference.
- * No locks required.
- *
- * Returns:
- * Either the fence of @obj or NULL if there's none.
- */
-static inline struct dma_fence *
-drm_syncobj_fence_get(struct drm_syncobj *syncobj)
-{
-	struct dma_fence *fence;
-
-	rcu_read_lock();
-	fence = dma_fence_get_rcu_safe(&syncobj->fence);
-	rcu_read_unlock();
-
-	return fence;
-}
 struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
 				     u32 handle);
 void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, u64 point,
@@ -142,5 +116,7 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags,
 int drm_syncobj_get_handle(struct drm_file *file_private,
 			   struct drm_syncobj *syncobj, u32 *handle);
 int drm_syncobj_get_fd(struct drm_syncobj *syncobj, int *p_fd);
+int drm_syncobj_search_fence(struct drm_syncobj *syncobj, u64 point,
+			     u64 flags, struct dma_fence **fence);
 
 #endif
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h index 300f336633f2..cebdb2541eb7 100644 --- a/include/uapi/drm/drm.h +++ b/include/uapi/drm/drm.h @@ -717,6 +717,7 @@ struct drm_prime_handle { struct drm_syncobj_create { __u32 handle; #define DRM_SYNCOBJ_CREATE_SIGNALED (1 << 0) +#define DRM_SYNCOBJ_CREATE_TYPE_TIMELINE (1 << 1) __u32 flags; };
On 2018-08-29 19:42, Christian König wrote:
On 2018-08-29 12:44, Chunming Zhou wrote:
VK_KHR_timeline_semaphore: This extension introduces a new type of semaphore that has an integer payload identifying a point in a timeline. Such timeline semaphores support the following operations:
* CPU query - A host operation that allows querying the payload of the timeline semaphore.
* CPU wait - A host operation that allows a blocking wait for a timeline semaphore to reach a specified value.
* Device wait - A device operation that allows waiting for a timeline semaphore to reach a specified value.
* Device signal - A device operation that allows advancing the timeline semaphore to a specified value.
Since it is a timeline, an earlier time point (PT) is always signaled before a later PT.
a. signal PT design: signal PT fence N depends on the PT[N-1] fence and the signal operation fence; when the PT[N] fence is signaled, the timeline advances to the value of PT[N].
b. wait PT design: a wait PT fence is signaled when the timeline reaches its point value. Whenever the timeline advances, the wait PT values are compared against the new timeline value; if a PT value is lower than the timeline value, that wait PT is signaled, otherwise it stays in the list. A semaphore wait operation can wait on any point of the timeline, so an RB tree is needed to order them. A wait PT can also be ahead of any signal PT, so a submission fence is needed to handle that.
v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate patch. (Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate patch.
5. drop the submission_fence implementation and instead use wait_event() for that. (Christian)
6. WARN_ON(point != 0) for the NORMAL type syncobj case. (Daniel Vetter)
v3:
1. replace the normal syncobj with the timeline implementation. (Vetter and Christian)
   a. a normal syncobj signal op creates a signal PT at the tail of the signal pt list.
   b. a normal syncobj wait op creates a wait pt with the last signal point, and this wait PT is only signaled by the related signal point PT.
2. some bug fixes and clean up
3. tested by ./deqp-vk -n dEQP-VK*semaphore* for the normal syncobj
TODO:
- CPU query and wait on timeline semaphore.
Signed-off-by: Chunming Zhou david1.zhou@amd.com Cc: Christian Konig christian.koenig@amd.com Cc: Dave Airlie airlied@redhat.com Cc: Daniel Rakos Daniel.Rakos@amd.com Cc: Daniel Vetter daniel@ffwll.ch
drivers/gpu/drm/drm_syncobj.c | 593 ++++++++++++++++----- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 4 +- include/drm/drm_syncobj.h | 78 +-- include/uapi/drm/drm.h | 1 + 4 files changed, 505 insertions(+), 171 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index ab43559398d0..f701d9ef1b81 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -56,6 +56,50 @@ #include "drm_internal.h" #include <drm/drm_syncobj.h> +/* merge normal syncobj to timeline syncobj, the point interval is 1 */ +#define DRM_SYNCOBJ_NORMAL_POINT 1
+struct drm_syncobj_stub_fence { + struct dma_fence base; + spinlock_t lock; +};
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence) +{ + return "syncobjstub"; +}
+static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence) +{ + return !dma_fence_is_signaled(fence); +} +static void drm_syncobj_stub_fence_release(struct dma_fence *f) +{ + kfree(f); +} +static const struct dma_fence_ops drm_syncobj_stub_fence_ops = { + .get_driver_name = drm_syncobj_stub_fence_get_name, + .get_timeline_name = drm_syncobj_stub_fence_get_name, + .enable_signaling = drm_syncobj_stub_fence_enable_signaling, + .release = drm_syncobj_stub_fence_release, +};
Do we really need to move that around? Could reduce the size of the patch quite a bit if we don't.
The stub fence is used widely in both the normal and the timeline syncobj. If you think it increases the patch size too much, I can do a separate patch for moving it.
+struct drm_syncobj_wait_pt { + struct drm_syncobj_stub_fence base; + u64 value; + struct rb_node node; +}; +struct drm_syncobj_signal_pt { + struct drm_syncobj_stub_fence base; + struct dma_fence *signal_fence; + struct dma_fence *pre_pt_base; + struct dma_fence_cb signal_cb; + struct dma_fence_cb pre_pt_cb; + struct drm_syncobj *syncobj; + u64 value; + struct list_head list; +};
What are those two structures good for?
They are used to record the points of wait ops and signal ops. For the timeline type they are connected by the timeline value and work like this:
a. a signal pt advances the timeline value to the signal_pt value when the signal_pt is signaled. The signal_pt depends on the previous pt fence and on its own signal fence from the signal op.
b. the wait pt tree is checked for entries whose value the timeline value has now passed.
For the normal type it works like this:
a. a signal pt increases syncobj_timeline->signal_point by 1 every time a signal op is performed.
b. when a wait op comes in, the wait pt fetches the last signal pt value above as its wait point. The wait pt is only signaled by the signal_pt with the equal point value.
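As a rough sketch of that point bookkeeping for the NORMAL type (fragments only, mirroring the patch; the variable names here are illustrative):

	/* each signal op advances the last queued signal point by the
	 * fixed interval (DRM_SYNCOBJ_NORMAL_POINT == 1) */
	u64 pt_value = syncobj->syncobj_timeline.signal_point +
		       DRM_SYNCOBJ_NORMAL_POINT;

	/* a wait op latches the last queued signal point; if nothing has
	 * been queued yet it waits for the first point */
	u64 wait_value = syncobj->syncobj_timeline.signal_point ?
			 syncobj->syncobj_timeline.signal_point :
			 DRM_SYNCOBJ_NORMAL_POINT;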
and why is the stub fence their base?
Good question. I originally tried to kzalloc them separately while debugging, but I ran into a problem: I could look up a wait_pt or signal_pt successfully, but by the time I tried to use it, it was sometimes already freed, which resulted in a NULL pointer dereference. Also, when we look them up we usually need their stub fence as well, and their lifecycles are the same. For those reasons I made the stub fence their base.
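As a minimal sketch of the embedding pattern being described here (all names are illustrative; only the dma_fence APIs are real), the point's lifetime simply follows the embedded fence's refcount:

	#include <linux/dma-fence.h>
	#include <linux/slab.h>

	/* Illustrative only: a point object whose lifetime follows the
	 * refcount of its embedded fence. The fence would be set up with
	 * dma_fence_init(&pt->base, &example_pt_ops, &pt->lock, ctx, seqno). */
	struct example_pt {
		struct dma_fence base;	/* refcounted base, first member */
		spinlock_t lock;
		u64 value;
	};

	static const char *example_pt_name(struct dma_fence *f)
	{
		return "example_pt";
	}

	static bool example_pt_enable_signaling(struct dma_fence *f)
	{
		return !dma_fence_is_signaled(f);
	}

	static void example_pt_release(struct dma_fence *f)
	{
		/* runs when the last dma_fence_put() drops the refcount */
		kfree(container_of(f, struct example_pt, base));
	}

	static const struct dma_fence_ops example_pt_ops = {
		.get_driver_name = example_pt_name,
		.get_timeline_name = example_pt_name,
		.enable_signaling = example_pt_enable_signaling,
		.release = example_pt_release,
	};

	/* A lookup hands the point out only together with a fence
	 * reference, so it cannot be freed under the caller. */
	static struct example_pt *example_pt_get(struct example_pt *pt)
	{
		dma_fence_get(&pt->base);
		return pt;
	}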
/** * drm_syncobj_find - lookup and reference a sync object. * @file_private: drm file private pointer @@ -82,59 +126,247 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private, } EXPORT_SYMBOL(drm_syncobj_find); -static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb, - drm_syncobj_func_t func) +static void drm_syncobj_timeline_init(struct drm_syncobj_timeline + *syncobj_timeline) { - cb->func = func; - list_add_tail(&cb->node, &syncobj->cb_list); + syncobj_timeline->timeline_context = dma_fence_context_alloc(1); + syncobj_timeline->timeline = 0; + syncobj_timeline->signal_point = 0; + init_waitqueue_head(&syncobj_timeline->wq);
+ syncobj_timeline->wait_pt_tree = RB_ROOT; + INIT_LIST_HEAD(&syncobj_timeline->signal_pt_list); } -static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj, - struct dma_fence **fence, - struct drm_syncobj_cb *cb, - drm_syncobj_func_t func) +static void drm_syncobj_timeline_fini(struct drm_syncobj *syncobj, + struct drm_syncobj_timeline *syncobj_timeline) { - int ret; + struct rb_node *node = NULL; + struct drm_syncobj_wait_pt *wait_pt = NULL; + struct drm_syncobj_signal_pt *signal_pt = NULL, *tmp;
+ spin_lock(&syncobj->lock); + for(node = rb_first(&syncobj_timeline->wait_pt_tree); + node != NULL; ) {
Better use rbtree_postorder_for_each_entry_safe() for this.
From the comments, it seems we cannot use it here, because we need to erase nodes here. The comments say: "Note, however, that it cannot handle other modifications that re-order the rbtree it is iterating over. This includes calling rb_erase() on @pos, as rb_erase() may rebalance the tree, causing us to miss some nodes."
Thanks, David Zhou
Regards, Christian.
+ wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + node = rb_next(node); + rb_erase(&wait_pt->node, + &syncobj_timeline->wait_pt_tree); + RB_CLEAR_NODE(&wait_pt->node); + spin_unlock(&syncobj->lock); + dma_fence_wait(&wait_pt->base.base, true); + spin_lock(&syncobj->lock); + /* kfree(wait_pt) is excuted by fence put */ + dma_fence_put(&wait_pt->base.base); + } + list_for_each_entry_safe(signal_pt, tmp, + &syncobj_timeline->signal_pt_list, list) { + list_del(&signal_pt->list); + if (signal_pt->signal_fence) {
- dma_fence_remove_callback(signal_pt->signal_fence,
+ &signal_pt->signal_cb); + dma_fence_put(signal_pt->signal_fence); + } + if (signal_pt->pre_pt_base) {
- dma_fence_remove_callback(signal_pt->pre_pt_base,
+ &signal_pt->pre_pt_cb); + dma_fence_put(signal_pt->pre_pt_base); + } + dma_fence_put(&signal_pt->base.base); + } + spin_unlock(&syncobj->lock); +}
- *fence = drm_syncobj_fence_get(syncobj); - if (*fence) - return 1; +static bool drm_syncobj_normal_signal_wait_pts(struct drm_syncobj *syncobj, + u64 signal_pt) +{ + struct rb_node *node = NULL; + struct drm_syncobj_wait_pt *wait_pt = NULL; spin_lock(&syncobj->lock); - /* We've already tried once to get a fence and failed. Now that we - * have the lock, try one more time just to be sure we don't add a - * callback when a fence has already been set. - */ - if (syncobj->fence) { - *fence = dma_fence_get(rcu_dereference_protected(syncobj->fence,
- lockdep_is_held(&syncobj->lock)));
- ret = 1; - } else { - *fence = NULL; - drm_syncobj_add_callback_locked(syncobj, cb, func); - ret = 0; + for(node = rb_first(&syncobj->syncobj_timeline.wait_pt_tree); + node != NULL; ) { + wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + node = rb_next(node); + /* for normal syncobj */ + if (wait_pt->value == signal_pt) { + dma_fence_signal(&wait_pt->base.base); + rb_erase(&wait_pt->node,
- &syncobj->syncobj_timeline.wait_pt_tree);
+ RB_CLEAR_NODE(&wait_pt->node); + /* kfree(wait_pt) is excuted by fence put */ + dma_fence_put(&wait_pt->base.base); + spin_unlock(&syncobj->lock); + return true; + } } spin_unlock(&syncobj->lock); + return false; +} - return ret; +static void drm_syncobj_timeline_signal_wait_pts(struct drm_syncobj *syncobj) +{ + struct rb_node *node = NULL; + struct drm_syncobj_wait_pt *wait_pt = NULL;
+ spin_lock(&syncobj->lock); + for(node = rb_first(&syncobj->syncobj_timeline.wait_pt_tree); + node != NULL; ) { + wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + node = rb_next(node); + if (wait_pt->value <= syncobj->syncobj_timeline.timeline) { + dma_fence_signal(&wait_pt->base.base); + rb_erase(&wait_pt->node,
- &syncobj->syncobj_timeline.wait_pt_tree);
+ RB_CLEAR_NODE(&wait_pt->node); + /* kfree(wait_pt) is excuted by fence put */ + dma_fence_put(&wait_pt->base.base); + } else { + /* the loop is from left to right, the later entry value is + * bigger, so don't need to check any more */ + break; + } + } + spin_unlock(&syncobj->lock); } -void drm_syncobj_add_callback(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb, - drm_syncobj_func_t func)
+static void pt_fence_cb(struct drm_syncobj_signal_pt *signal_pt) { + struct dma_fence *fence = NULL; + struct drm_syncobj *syncobj; + struct drm_syncobj_signal_pt *tail_pt; + u64 pt_value = signal_pt->value;
+ dma_fence_signal(&signal_pt->base.base); + fence = signal_pt->signal_fence; + signal_pt->signal_fence = NULL; + dma_fence_put(fence); + fence = signal_pt->pre_pt_base; + signal_pt->pre_pt_base = NULL; + dma_fence_put(fence);
+ syncobj = signal_pt->syncobj; spin_lock(&syncobj->lock); - drm_syncobj_add_callback_locked(syncobj, cb, func); + syncobj->syncobj_timeline.timeline = pt_value; + tail_pt = list_last_entry(&syncobj->syncobj_timeline.signal_pt_list, + struct drm_syncobj_signal_pt, list); + if ((syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL && signal_pt != tail_pt) + || syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + list_del(&signal_pt->list); + /* kfree(signal_pt) will be executed by below fence put */ + dma_fence_put(&signal_pt->base.base); + } spin_unlock(&syncobj->lock); + if (syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) + drm_syncobj_normal_signal_wait_pts(syncobj, pt_value); + else + drm_syncobj_timeline_signal_wait_pts(syncobj); } +static void pt_signal_fence_func(struct dma_fence *fence, + struct dma_fence_cb *cb) +{ + struct drm_syncobj_signal_pt *signal_pt = + container_of(cb, struct drm_syncobj_signal_pt, signal_cb); -void drm_syncobj_remove_callback(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb) + if (signal_pt->pre_pt_base && + !dma_fence_is_signaled(signal_pt->pre_pt_base)) + return;
+ pt_fence_cb(signal_pt); +} +static void pt_pre_fence_func(struct dma_fence *fence, + struct dma_fence_cb *cb) +{ + struct drm_syncobj_signal_pt *signal_pt = + container_of(cb, struct drm_syncobj_signal_pt, pre_pt_cb);
+ if (signal_pt->signal_fence && + !dma_fence_is_signaled(signal_pt->pre_pt_base)) + return;
+ pt_fence_cb(signal_pt); +}
+static int drm_syncobj_timeline_create_signal_pt(struct drm_syncobj *syncobj, + struct dma_fence *fence, + u64 point) { + struct drm_syncobj_signal_pt *signal_pt = + kzalloc(sizeof(struct drm_syncobj_signal_pt), GFP_KERNEL); + struct drm_syncobj_stub_fence *base; + struct drm_syncobj_signal_pt *tail_pt; + struct dma_fence *tail_pt_fence = NULL; + int ret = 0;
+ if (!signal_pt) + return -ENOMEM; + if (syncobj->syncobj_timeline.signal_point >= point) { + DRM_WARN("A later signal is ready!"); + goto out; + } + if (!fence) + goto out; + dma_fence_get(fence); spin_lock(&syncobj->lock); - list_del_init(&cb->node); + base = &signal_pt->base; + spin_lock_init(&base->lock); + dma_fence_init(&base->base, + &drm_syncobj_stub_fence_ops, + &base->lock, + syncobj->syncobj_timeline.timeline_context, point); + signal_pt->signal_fence = fence; + /* timeline syncobj must take this dependency */ + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + if (!list_empty(&syncobj->syncobj_timeline.signal_pt_list)) { + tail_pt = list_last_entry(&syncobj->syncobj_timeline.signal_pt_list, + struct drm_syncobj_signal_pt, list); + tail_pt_fence = &tail_pt->base.base; + if (dma_fence_is_signaled(tail_pt_fence)) + tail_pt_fence = NULL; + else + signal_pt->pre_pt_base = + dma_fence_get(tail_pt_fence); + } + }
+ signal_pt->value = point; + syncobj->syncobj_timeline.signal_point = point; + signal_pt->syncobj = syncobj; + INIT_LIST_HEAD(&signal_pt->list); + list_add_tail(&signal_pt->list, &syncobj->syncobj_timeline.signal_pt_list); spin_unlock(&syncobj->lock); + wake_up_all(&syncobj->syncobj_timeline.wq); + /** + * Every pt is depending on signal fence and previous pt fence, add + * callbacks to them + */ + ret = dma_fence_add_callback(signal_pt->signal_fence, + &signal_pt->signal_cb, + pt_signal_fence_func);
+ if (signal_pt->pre_pt_base) { + ret = dma_fence_add_callback(signal_pt->pre_pt_base, + &signal_pt->pre_pt_cb, + pt_pre_fence_func); + if (ret == -ENOENT) + pt_pre_fence_func(signal_pt->pre_pt_base, + &signal_pt->pre_pt_cb); + else if (ret) + goto out; + } else if (ret == -ENOENT) { + pt_signal_fence_func(signal_pt->signal_fence, + &signal_pt->signal_cb); + } else if (ret) { + goto out; + }
+ return 0; +out: + dma_fence_put(&signal_pt->base.base); + return ret; } /** @@ -149,53 +381,30 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, u64 point, struct dma_fence *fence) { - struct dma_fence *old_fence; - struct drm_syncobj_cb *cur, *tmp;
- if (fence) - dma_fence_get(fence);
- spin_lock(&syncobj->lock);
- old_fence = rcu_dereference_protected(syncobj->fence,
- lockdep_is_held(&syncobj->lock));
- rcu_assign_pointer(syncobj->fence, fence);
- if (fence != old_fence) { - list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) { - list_del_init(&cur->node); - cur->func(syncobj, cur); + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + if (fence) + drm_syncobj_timeline_create_signal_pt(syncobj, fence, + point); + return; + } else if (syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) { + u64 pt_value;
+ if (!fence) { + drm_syncobj_timeline_fini(syncobj, + &syncobj->syncobj_timeline);
- drm_syncobj_timeline_init(&syncobj->syncobj_timeline);
+ return; } + pt_value = syncobj->syncobj_timeline.signal_point + + DRM_SYNCOBJ_NORMAL_POINT; + drm_syncobj_timeline_create_signal_pt(syncobj, fence, pt_value); + return; + } else { + DRM_ERROR("the syncobj type isn't support\n"); }
- spin_unlock(&syncobj->lock);
- dma_fence_put(old_fence); } EXPORT_SYMBOL(drm_syncobj_replace_fence); -struct drm_syncobj_stub_fence { - struct dma_fence base; - spinlock_t lock; -};
-static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence) -{ - return "syncobjstub"; -}
-static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence) -{ - return !dma_fence_is_signaled(fence); -}
-static const struct dma_fence_ops drm_syncobj_stub_fence_ops = { - .get_driver_name = drm_syncobj_stub_fence_get_name, - .get_timeline_name = drm_syncobj_stub_fence_get_name, - .enable_signaling = drm_syncobj_stub_fence_enable_signaling, - .release = NULL, -};
static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) { struct drm_syncobj_stub_fence *fence; @@ -215,6 +424,143 @@ static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) return 0; } +static struct drm_syncobj_wait_pt * +drm_syncobj_timeline_lookup_wait_pt_fence(struct drm_syncobj *syncobj, + u64 point, + struct dma_fence **fence) +{ + struct rb_node *node = syncobj->syncobj_timeline.wait_pt_tree.rb_node; + struct drm_syncobj_wait_pt *wait_pt = NULL;
+ spin_lock(&syncobj->lock); + while(node) { + int result;
+ wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + result = point - wait_pt->value; + if (result < 0) { + node = node->rb_left; + } else if (result > 0) { + node = node->rb_right; + } else { + if (fence) + *fence = dma_fence_get(&wait_pt->base.base); + break; + } + } + spin_unlock(&syncobj->lock);
+ return wait_pt; +}
+static struct drm_syncobj_wait_pt * +drm_syncobj_timeline_create_wait_pt_and_return_fence(struct drm_syncobj *syncobj, + u64 point, + struct dma_fence **fence) +{ + struct drm_syncobj_wait_pt *wait_pt = kzalloc(sizeof(*wait_pt), + GFP_KERNEL); + struct drm_syncobj_stub_fence *base; + struct rb_node **new = &(syncobj->syncobj_timeline.wait_pt_tree.rb_node), *parent = NULL; + struct drm_syncobj_signal_pt *tail_signal_pt =
- list_last_entry(&syncobj->syncobj_timeline.signal_pt_list,
+ struct drm_syncobj_signal_pt, list);
+ if (!wait_pt) + return NULL; + base = &wait_pt->base; + spin_lock_init(&base->lock); + dma_fence_init(&base->base, + &drm_syncobj_stub_fence_ops, + &base->lock, + syncobj->syncobj_timeline.timeline_context, point); + wait_pt->value = point;
+ /* wait pt must be in an order, so that we can easily lookup and signal + * it */ + spin_lock(&syncobj->lock); + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE && + point <= syncobj->syncobj_timeline.timeline) + dma_fence_signal(&wait_pt->base.base); + if ((syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) && + (point == tail_signal_pt->value) &&
- dma_fence_is_signaled(&tail_signal_pt->base.base))
+ dma_fence_signal(&wait_pt->base.base); + while(*new) { + struct drm_syncobj_wait_pt *this = + rb_entry(*new, struct drm_syncobj_wait_pt, node); + int result = wait_pt->value - this->value;
+ parent = *new; + if (result < 0) + new = &((*new)->rb_left); + else if (result > 0) + new = &((*new)->rb_right); + else + goto exist; + } + if (fence) + *fence = dma_fence_get(&wait_pt->base.base); + rb_link_node(&wait_pt->node, parent, new); + rb_insert_color(&wait_pt->node, &syncobj->syncobj_timeline.wait_pt_tree); + spin_unlock(&syncobj->lock); + return wait_pt; +exist: + spin_unlock(&syncobj->lock); + dma_fence_put(&wait_pt->base.base); + wait_pt = drm_syncobj_timeline_lookup_wait_pt_fence(syncobj, point, + fence); + return wait_pt; +}
+static struct dma_fence * +drm_syncobj_timeline_point_get(struct drm_syncobj *syncobj, u64 point, u64 flags) +{ + struct drm_syncobj_wait_pt *wait_pt; + struct dma_fence *fence = NULL;
+ /* already signaled, simply return a signaled stub fence */ + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE && + point <= syncobj->syncobj_timeline.timeline) { + struct drm_syncobj_stub_fence *fence;
+ fence = kzalloc(sizeof(*fence), GFP_KERNEL); + if (fence == NULL) + return NULL;
+ spin_lock_init(&fence->lock); + dma_fence_init(&fence->base, &drm_syncobj_stub_fence_ops, + &fence->lock, 0, 0); + dma_fence_signal(&fence->base); + return &fence->base; + }
+ /* check if the wait pt exists */ + wait_pt = drm_syncobj_timeline_lookup_wait_pt_fence(syncobj, point, + &fence); + if (!fence) { + /* This is a new wait pt, so create it */ + wait_pt = drm_syncobj_timeline_create_wait_pt_and_return_fence(syncobj, point, + &fence); + if (!fence) + return NULL; + } + if (wait_pt) { + int ret = 0;
+ if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { + ret = wait_event_interruptible_timeout(syncobj->syncobj_timeline.wq, + wait_pt->value <= syncobj->syncobj_timeline.signal_point, + msecs_to_jiffies(10000)); /* wait 10s */ + if (ret <= 0) + return NULL; + } + return fence; + } + return NULL; +}
/** * drm_syncobj_find_fence - lookup and reference the fence in a sync object * @file_private: drm file private pointer @@ -229,20 +575,45 @@ static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) * contains a reference to the fence, which must be released by calling * dma_fence_put(). */ -int drm_syncobj_find_fence(struct drm_file *file_private, - u32 handle, u64 point, - struct dma_fence **fence) +int drm_syncobj_search_fence(struct drm_syncobj *syncobj, u64 point, + u64 flags, struct dma_fence **fence) { - struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle); int ret = 0; if (!syncobj) return -ENOENT; - *fence = drm_syncobj_fence_get(syncobj); + if (syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) { + /*NORMAL syncobj always wait on last pt */ + u64 tail_pt_value = syncobj->syncobj_timeline.signal_point;
+ if (tail_pt_value == 0) + tail_pt_value += DRM_SYNCOBJ_NORMAL_POINT; + /* NORMAL syncobj doesn't care point value */ + WARN_ON(point != 0); + *fence = drm_syncobj_timeline_point_get(syncobj, tail_pt_value, + flags); + } else if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + *fence = drm_syncobj_timeline_point_get(syncobj, point, + flags); + } else { + DRM_ERROR("Don't support this type syncobj\n"); + *fence = NULL; + } if (!*fence) { ret = -EINVAL; } + return ret; +} +EXPORT_SYMBOL(drm_syncobj_search_fence); +int drm_syncobj_find_fence(struct drm_file *file_private, + u32 handle, u64 point, + struct dma_fence **fence) { + struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
+ int ret = drm_syncobj_search_fence(syncobj, point, + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT, + fence); drm_syncobj_put(syncobj); return ret; } @@ -259,7 +630,7 @@ void drm_syncobj_free(struct kref *kref) struct drm_syncobj *syncobj = container_of(kref, struct drm_syncobj, refcount); - drm_syncobj_replace_fence(syncobj, 0, NULL); + drm_syncobj_timeline_fini(syncobj, &syncobj->syncobj_timeline); kfree(syncobj); } EXPORT_SYMBOL(drm_syncobj_free); @@ -287,8 +658,12 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags, return -ENOMEM; kref_init(&syncobj->refcount); - INIT_LIST_HEAD(&syncobj->cb_list); spin_lock_init(&syncobj->lock); + if (flags & DRM_SYNCOBJ_CREATE_TYPE_TIMELINE) + syncobj->type = DRM_SYNCOBJ_TYPE_TIMELINE; + else + syncobj->type = DRM_SYNCOBJ_TYPE_NORMAL;
- drm_syncobj_timeline_init(&syncobj->syncobj_timeline);
if (flags & DRM_SYNCOBJ_CREATE_SIGNALED) { ret = drm_syncobj_assign_null_handle(syncobj); @@ -646,7 +1021,6 @@ struct syncobj_wait_entry { struct task_struct *task; struct dma_fence *fence; struct dma_fence_cb fence_cb; - struct drm_syncobj_cb syncobj_cb; }; static void syncobj_wait_fence_func(struct dma_fence *fence, @@ -658,18 +1032,6 @@ static void syncobj_wait_fence_func(struct dma_fence *fence, wake_up_process(wait->task); } -static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb) -{ - struct syncobj_wait_entry *wait = - container_of(cb, struct syncobj_wait_entry, syncobj_cb);
- /* This happens inside the syncobj lock */ - wait->fence = dma_fence_get(rcu_dereference_protected(syncobj->fence,
- lockdep_is_held(&syncobj->lock)));
- wake_up_process(wait->task); -}
static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, uint32_t count, uint32_t flags, @@ -693,14 +1055,11 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, signaled_count = 0; for (i = 0; i < count; ++i) { entries[i].task = current; - entries[i].fence = drm_syncobj_fence_get(syncobjs[i]); + ret = drm_syncobj_search_fence(syncobjs[i], 0, 0, + &entries[i].fence); if (!entries[i].fence) { - if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { - continue; - } else { - ret = -EINVAL; - goto cleanup_entries; - } + ret = -EINVAL; + goto cleanup_entries; } if (dma_fence_is_signaled(entries[i].fence)) { @@ -728,15 +1087,6 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, * fallthough and try a 0 timeout wait! */ - if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { - for (i = 0; i < count; ++i) { - drm_syncobj_fence_get_or_add_callback(syncobjs[i], - &entries[i].fence, - &entries[i].syncobj_cb, - syncobj_wait_syncobj_func); - } - }
do { set_current_state(TASK_INTERRUPTIBLE); @@ -784,13 +1134,10 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, cleanup_entries: for (i = 0; i < count; ++i) { - if (entries[i].syncobj_cb.func) - drm_syncobj_remove_callback(syncobjs[i], - &entries[i].syncobj_cb); + dma_fence_put(entries[i].fence); if (entries[i].fence_cb.func) dma_fence_remove_callback(entries[i].fence, &entries[i].fence_cb); - dma_fence_put(entries[i].fence); } kfree(entries); @@ -965,12 +1312,20 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data, if (ret < 0) return ret; - for (i = 0; i < args->count_handles; i++) - drm_syncobj_replace_fence(syncobjs[i], 0, NULL);
+ for (i = 0; i < args->count_handles; i++) { + if (syncobjs[i]->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + DRM_ERROR("timeline syncobj cannot reset!\n"); + ret = -EINVAL; + goto out; + } + drm_syncobj_timeline_fini(syncobjs[i], + &syncobjs[i]->syncobj_timeline);
- drm_syncobj_timeline_init(&syncobjs[i]->syncobj_timeline);
+ } +out: drm_syncobj_array_free(syncobjs, args->count_handles); - return 0; + return ret; } int diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 7209dd832d39..bb20d318c9d6 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -2182,7 +2182,9 @@ await_fence_array(struct i915_execbuffer *eb, if (!(flags & I915_EXEC_FENCE_WAIT)) continue; - fence = drm_syncobj_fence_get(syncobj); + drm_syncobj_search_fence(syncobj, 0, + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT, + &fence); if (!fence) return -EINVAL; diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h index 425432b85a87..657c97dc25ec 100644 --- a/include/drm/drm_syncobj.h +++ b/include/drm/drm_syncobj.h @@ -30,6 +30,25 @@ struct drm_syncobj_cb; +enum drm_syncobj_type { + DRM_SYNCOBJ_TYPE_NORMAL, + DRM_SYNCOBJ_TYPE_TIMELINE +};
+struct drm_syncobj_timeline { + wait_queue_head_t wq; + u64 timeline_context; + /** + * @timeline: syncobj timeline + */ + u64 timeline; + u64 signal_point;
+ struct rb_root wait_pt_tree; + struct list_head signal_pt_list; +};
/** * struct drm_syncobj - sync object. * @@ -41,19 +60,16 @@ struct drm_syncobj { */ struct kref refcount; /** - * @fence: - * NULL or a pointer to the fence bound to this object. - * - * This field should not be used directly. Use drm_syncobj_fence_get() - * and drm_syncobj_replace_fence() instead. + * @type: indicate syncobj type */ - struct dma_fence __rcu *fence; + enum drm_syncobj_type type; /** - * @cb_list: List of callbacks to call when the &fence gets replaced. + * @syncobj_timeline: timeline */ - struct list_head cb_list; + struct drm_syncobj_timeline syncobj_timeline;
/** - * @lock: Protects &cb_list and write-locks &fence. + * @lock: Protects timeline list and write-locks &fence. */ spinlock_t lock; /** @@ -62,25 +78,6 @@ struct drm_syncobj { struct file *file; }; -typedef void (*drm_syncobj_func_t)(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb);
-/**
- struct drm_syncobj_cb - callback for drm_syncobj_add_callback
- @node: used by drm_syncob_add_callback to append this struct to
- * &drm_syncobj.cb_list
- @func: drm_syncobj_func_t to call
- This struct will be initialized by drm_syncobj_add_callback,
additional
- data can be passed along by embedding drm_syncobj_cb in another
struct.
- The callback will get called the next time
drm_syncobj_replace_fence is
- called.
- */
-struct drm_syncobj_cb { - struct list_head node; - drm_syncobj_func_t func; -};
void drm_syncobj_free(struct kref *kref); /** @@ -106,29 +103,6 @@ drm_syncobj_put(struct drm_syncobj *obj) kref_put(&obj->refcount, drm_syncobj_free); } -/**
- drm_syncobj_fence_get - get a reference to a fence in a sync object
- @syncobj: sync object.
- This acquires additional reference to &drm_syncobj.fence
contained in @obj,
- if not NULL. It is illegal to call this without already holding a
reference.
- No locks required.
- Returns:
- Either the fence of @obj or NULL if there's none.
- */
-static inline struct dma_fence * -drm_syncobj_fence_get(struct drm_syncobj *syncobj) -{ - struct dma_fence *fence;
- rcu_read_lock(); - fence = dma_fence_get_rcu_safe(&syncobj->fence); - rcu_read_unlock();
- return fence; -}
struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private, u32 handle); void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, u64 point, @@ -142,5 +116,7 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags, int drm_syncobj_get_handle(struct drm_file *file_private, struct drm_syncobj *syncobj, u32 *handle); int drm_syncobj_get_fd(struct drm_syncobj *syncobj, int *p_fd); +int drm_syncobj_search_fence(struct drm_syncobj *syncobj, u64 point, + u64 flags, struct dma_fence **fence); #endif diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h index 300f336633f2..cebdb2541eb7 100644 --- a/include/uapi/drm/drm.h +++ b/include/uapi/drm/drm.h @@ -717,6 +717,7 @@ struct drm_prime_handle { struct drm_syncobj_create { __u32 handle; #define DRM_SYNCOBJ_CREATE_SIGNALED (1 << 0) +#define DRM_SYNCOBJ_CREATE_TYPE_TIMELINE (1 << 1) __u32 flags; };
On 2018-08-30 05:50, zhoucm1 wrote:
On 2018-08-29 19:42, Christian König wrote:
On 2018-08-29 12:44, Chunming Zhou wrote:
VK_KHR_timeline_semaphore: This extension introduces a new type of semaphore that has an integer payload identifying a point in a timeline. Such timeline semaphores support the following operations: * CPU query - A host operation that allows querying the payload of the timeline semaphore. * CPU wait - A host operation that allows a blocking wait for a timeline semaphore to reach a specified value. * Device wait - A device operation that allows waiting for a timeline semaphore to reach a specified value. * Device signal - A device operation that allows advancing the timeline semaphore to a specified value.
Since it's a timeline, that means the front time point(PT) always is signaled before the late PT. a. signal PT design: Signal PT fence N depends on PT[N-1] fence and signal opertion fence, when PT[N] fence is signaled, the timeline will increase to value of PT[N]. b. wait PT design: Wait PT fence is signaled by reaching timeline point value, when timeline is increasing, will compare wait PTs value with new timeline value, if PT value is lower than timeline value, then wait PT will be signaled, otherwise keep in list. semaphore wait operation can wait on any point of timeline, so need a RB tree to order them. And wait PT could ahead of signal PT, we need a sumission fence to perform that.
v2:
- remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
- move unexposed denitions to .c file. (Daniel Vetter)
- split up the change to drm_syncobj_find_fence() in a separate
patch. (Christian) 4. split up the change to drm_syncobj_replace_fence() in a separate patch. 5. drop the submission_fence implementation and instead use wait_event() for that. (Christian) 6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)
v3:
- replace normal syncobj with timeline implemenation. (Vetter and
Christian) a. normal syncobj signal op will create a signal PT to tail of signal pt list. b. normal syncobj wait op will create a wait pt with last signal point, and this wait PT is only signaled by related signal point PT. 2. some bug fix and clean up 3. tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj
TODO:
- CPU query and wait on timeline semaphore.
Signed-off-by: Chunming Zhou david1.zhou@amd.com Cc: Christian Konig christian.koenig@amd.com Cc: Dave Airlie airlied@redhat.com Cc: Daniel Rakos Daniel.Rakos@amd.com Cc: Daniel Vetter daniel@ffwll.ch
drivers/gpu/drm/drm_syncobj.c | 593 ++++++++++++++++----- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 4 +- include/drm/drm_syncobj.h | 78 +-- include/uapi/drm/drm.h | 1 + 4 files changed, 505 insertions(+), 171 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index ab43559398d0..f701d9ef1b81 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -56,6 +56,50 @@ #include "drm_internal.h" #include <drm/drm_syncobj.h> +/* merge normal syncobj to timeline syncobj, the point interval is 1 */ +#define DRM_SYNCOBJ_NORMAL_POINT 1
+struct drm_syncobj_stub_fence { + struct dma_fence base; + spinlock_t lock; +};
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence) +{ + return "syncobjstub"; +}
+static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence) +{ + return !dma_fence_is_signaled(fence); +} +static void drm_syncobj_stub_fence_release(struct dma_fence *f) +{ + kfree(f); +} +static const struct dma_fence_ops drm_syncobj_stub_fence_ops = { + .get_driver_name = drm_syncobj_stub_fence_get_name, + .get_timeline_name = drm_syncobj_stub_fence_get_name, + .enable_signaling = drm_syncobj_stub_fence_enable_signaling, + .release = drm_syncobj_stub_fence_release, +};
Do we really need to move that around? Could reduce the size of the patch quite a bit if we don't.
stub fence is used widely in both normal and timeline syncobj, if you think which increase patch size, I can do a separate patch for that.
+struct drm_syncobj_wait_pt { + struct drm_syncobj_stub_fence base; + u64 value; + struct rb_node node; +}; +struct drm_syncobj_signal_pt { + struct drm_syncobj_stub_fence base; + struct dma_fence *signal_fence; + struct dma_fence *pre_pt_base; + struct dma_fence_cb signal_cb; + struct dma_fence_cb pre_pt_cb; + struct drm_syncobj *syncobj; + u64 value; + struct list_head list; +};
What are those two structures good for
They are used to record wait ops points and signal ops points. For timeline, they are connected by timeline value, works like: a. signal pt increase timeline value to signal_pt value when signal_pt is signaled. signal_pt is depending on previous pt fence and itself signal fence from signal ops. b. wait pt tree checks if timeline value over itself value.
For normal, works like: a. signal pt increase 1 for syncobj_timeline->signal_point every time when signal ops is performed. b. when wait ops is comming, wait pt will fetch above last signal pt value as its wait point. wait pt will be only signaled by equal point value signal_pt.
and why is the stub fence their base?
Good question, I tried to kzalloc for them as well when I debug them, I encountered a problem: I lookup/find wait_pt or signal_pt successfully, but when I tried to use them, they are freed sometimes, and results in NULL point. and generally, when lookup them, we often need their stub fence as well, and in the meantime, their lifecycle are same. Above reason, I make stub fence as their base.
That sounds like you only did this because you messed up the lifecycle.
In addition to that, I don't think you correctly considered the lifecycle of the waits and the sync object itself. E.g. blocking in drm_syncobj_timeline_fini() until all waits are done is not a good idea.
What you should do instead is to create a fence_array object with all the fences we need to wait for when a wait point is requested.
Otherwise somebody can easily construct a situation where timeline sync obj A waits on B and B waits on A.
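As a rough sketch of that direction, assuming the fences a wait point depends on are collected up front instead of blocking later (the helper and its parameters are hypothetical; dma_fence_array_create() is the existing helper from linux/dma-fence-array.h):

	#include <linux/dma-fence-array.h>
	#include <linux/slab.h>

	/*
	 * Hypothetical helper: bundle the fences a wait point depends on
	 * into a single dma_fence_array. On success the array takes over
	 * the 'fences' allocation and the references it contains.
	 */
	static struct dma_fence *example_build_wait_fence(struct dma_fence **deps,
							  int count, u64 context,
							  unsigned seqno)
	{
		struct dma_fence_array *array;
		struct dma_fence **fences;
		int i;

		fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL);
		if (!fences)
			return NULL;
		for (i = 0; i < count; i++)
			fences[i] = dma_fence_get(deps[i]);

		/* signal_on_any = false: signals once all deps have signaled */
		array = dma_fence_array_create(count, fences, context, seqno,
					       false);
		if (!array) {
			for (i = 0; i < count; i++)
				dma_fence_put(fences[i]);
			kfree(fences);
			return NULL;
		}
		return &array->base;
	}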
/** * drm_syncobj_find - lookup and reference a sync object. * @file_private: drm file private pointer @@ -82,59 +126,247 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private, } EXPORT_SYMBOL(drm_syncobj_find); -static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb, - drm_syncobj_func_t func) +static void drm_syncobj_timeline_init(struct drm_syncobj_timeline + *syncobj_timeline) { - cb->func = func; - list_add_tail(&cb->node, &syncobj->cb_list); + syncobj_timeline->timeline_context = dma_fence_context_alloc(1); + syncobj_timeline->timeline = 0; + syncobj_timeline->signal_point = 0; + init_waitqueue_head(&syncobj_timeline->wq);
+ syncobj_timeline->wait_pt_tree = RB_ROOT; + INIT_LIST_HEAD(&syncobj_timeline->signal_pt_list); } -static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj, - struct dma_fence **fence, - struct drm_syncobj_cb *cb, - drm_syncobj_func_t func) +static void drm_syncobj_timeline_fini(struct drm_syncobj *syncobj, + struct drm_syncobj_timeline *syncobj_timeline) { - int ret; + struct rb_node *node = NULL; + struct drm_syncobj_wait_pt *wait_pt = NULL; + struct drm_syncobj_signal_pt *signal_pt = NULL, *tmp;
+ spin_lock(&syncobj->lock); + for(node = rb_first(&syncobj_timeline->wait_pt_tree); + node != NULL; ) {
Better use rbtree_postorder_for_each_entry_safe() for this.
From the comments, seems we cannot use it here, we need erase node here. Comments: * Note, however, that it cannot handle other modifications that re-order the * rbtree it is iterating over. This includes calling rb_erase() on @pos, as * rb_erase() may rebalance the tree, causing us to miss some nodes. */
Yeah, but your current implementation has the same problem :)
You need to use something like "while (node = rb_first(...))" instead if you want to destruct the whole tree.
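A sketch of that teardown pattern, reusing the structure names from the patch (the helper name is illustrative, not from the patch):

	static void example_timeline_destroy_wait_pts(struct drm_syncobj *syncobj)
	{
		struct drm_syncobj_timeline *timeline = &syncobj->syncobj_timeline;
		struct rb_node *node;

		/* always take the current leftmost node, so rb_erase()
		 * rebalancing cannot make the walk skip entries */
		while ((node = rb_first(&timeline->wait_pt_tree))) {
			struct drm_syncobj_wait_pt *wait_pt =
				rb_entry(node, struct drm_syncobj_wait_pt, node);

			rb_erase(&wait_pt->node, &timeline->wait_pt_tree);
			RB_CLEAR_NODE(&wait_pt->node);
			/* kfree() happens via the stub fence release */
			dma_fence_put(&wait_pt->base.base);
		}
	}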
Regards, Christian.
Thanks, David Zhou
Regards, Christian.
+ wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + node = rb_next(node); + rb_erase(&wait_pt->node, + &syncobj_timeline->wait_pt_tree); + RB_CLEAR_NODE(&wait_pt->node); + spin_unlock(&syncobj->lock); + dma_fence_wait(&wait_pt->base.base, true); + spin_lock(&syncobj->lock); + /* kfree(wait_pt) is excuted by fence put */ + dma_fence_put(&wait_pt->base.base); + } + list_for_each_entry_safe(signal_pt, tmp, + &syncobj_timeline->signal_pt_list, list) { + list_del(&signal_pt->list); + if (signal_pt->signal_fence) {
- dma_fence_remove_callback(signal_pt->signal_fence,
+ &signal_pt->signal_cb); + dma_fence_put(signal_pt->signal_fence); + } + if (signal_pt->pre_pt_base) {
- dma_fence_remove_callback(signal_pt->pre_pt_base,
+ &signal_pt->pre_pt_cb); + dma_fence_put(signal_pt->pre_pt_base); + } + dma_fence_put(&signal_pt->base.base); + } + spin_unlock(&syncobj->lock); +}
- *fence = drm_syncobj_fence_get(syncobj); - if (*fence) - return 1; +static bool drm_syncobj_normal_signal_wait_pts(struct drm_syncobj *syncobj, + u64 signal_pt) +{ + struct rb_node *node = NULL; + struct drm_syncobj_wait_pt *wait_pt = NULL; spin_lock(&syncobj->lock); - /* We've already tried once to get a fence and failed. Now that we - * have the lock, try one more time just to be sure we don't add a - * callback when a fence has already been set. - */ - if (syncobj->fence) { - *fence = dma_fence_get(rcu_dereference_protected(syncobj->fence,
- lockdep_is_held(&syncobj->lock)));
- ret = 1; - } else { - *fence = NULL; - drm_syncobj_add_callback_locked(syncobj, cb, func); - ret = 0; + for(node = rb_first(&syncobj->syncobj_timeline.wait_pt_tree); + node != NULL; ) { + wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + node = rb_next(node); + /* for normal syncobj */ + if (wait_pt->value == signal_pt) { + dma_fence_signal(&wait_pt->base.base); + rb_erase(&wait_pt->node,
- &syncobj->syncobj_timeline.wait_pt_tree);
+ RB_CLEAR_NODE(&wait_pt->node); + /* kfree(wait_pt) is excuted by fence put */ + dma_fence_put(&wait_pt->base.base); + spin_unlock(&syncobj->lock); + return true; + } } spin_unlock(&syncobj->lock); + return false; +} - return ret; +static void drm_syncobj_timeline_signal_wait_pts(struct drm_syncobj *syncobj) +{ + struct rb_node *node = NULL; + struct drm_syncobj_wait_pt *wait_pt = NULL;
+ spin_lock(&syncobj->lock); + for(node = rb_first(&syncobj->syncobj_timeline.wait_pt_tree); + node != NULL; ) { + wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + node = rb_next(node); + if (wait_pt->value <= syncobj->syncobj_timeline.timeline) { + dma_fence_signal(&wait_pt->base.base); + rb_erase(&wait_pt->node,
- &syncobj->syncobj_timeline.wait_pt_tree);
+ RB_CLEAR_NODE(&wait_pt->node); + /* kfree(wait_pt) is excuted by fence put */ + dma_fence_put(&wait_pt->base.base); + } else { + /* the loop is from left to right, the later entry value is + * bigger, so don't need to check any more */ + break; + } + } + spin_unlock(&syncobj->lock); } -void drm_syncobj_add_callback(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb, - drm_syncobj_func_t func)
+static void pt_fence_cb(struct drm_syncobj_signal_pt *signal_pt) { + struct dma_fence *fence = NULL; + struct drm_syncobj *syncobj; + struct drm_syncobj_signal_pt *tail_pt; + u64 pt_value = signal_pt->value;
+ dma_fence_signal(&signal_pt->base.base); + fence = signal_pt->signal_fence; + signal_pt->signal_fence = NULL; + dma_fence_put(fence); + fence = signal_pt->pre_pt_base; + signal_pt->pre_pt_base = NULL; + dma_fence_put(fence);
+ syncobj = signal_pt->syncobj; spin_lock(&syncobj->lock); - drm_syncobj_add_callback_locked(syncobj, cb, func); + syncobj->syncobj_timeline.timeline = pt_value; + tail_pt = list_last_entry(&syncobj->syncobj_timeline.signal_pt_list, + struct drm_syncobj_signal_pt, list); + if ((syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL && signal_pt != tail_pt) + || syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + list_del(&signal_pt->list); + /* kfree(signal_pt) will be executed by below fence put */ + dma_fence_put(&signal_pt->base.base); + } spin_unlock(&syncobj->lock); + if (syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) + drm_syncobj_normal_signal_wait_pts(syncobj, pt_value); + else + drm_syncobj_timeline_signal_wait_pts(syncobj); } +static void pt_signal_fence_func(struct dma_fence *fence, + struct dma_fence_cb *cb) +{ + struct drm_syncobj_signal_pt *signal_pt = + container_of(cb, struct drm_syncobj_signal_pt, signal_cb); -void drm_syncobj_remove_callback(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb) + if (signal_pt->pre_pt_base && + !dma_fence_is_signaled(signal_pt->pre_pt_base)) + return;
+ pt_fence_cb(signal_pt); +} +static void pt_pre_fence_func(struct dma_fence *fence, + struct dma_fence_cb *cb) +{ + struct drm_syncobj_signal_pt *signal_pt = + container_of(cb, struct drm_syncobj_signal_pt, pre_pt_cb);
+ if (signal_pt->signal_fence && + !dma_fence_is_signaled(signal_pt->pre_pt_base)) + return;
+ pt_fence_cb(signal_pt); +}
+static int drm_syncobj_timeline_create_signal_pt(struct drm_syncobj *syncobj, + struct dma_fence *fence, + u64 point) { + struct drm_syncobj_signal_pt *signal_pt = + kzalloc(sizeof(struct drm_syncobj_signal_pt), GFP_KERNEL); + struct drm_syncobj_stub_fence *base; + struct drm_syncobj_signal_pt *tail_pt; + struct dma_fence *tail_pt_fence = NULL; + int ret = 0;
+ if (!signal_pt) + return -ENOMEM; + if (syncobj->syncobj_timeline.signal_point >= point) { + DRM_WARN("A later signal is ready!"); + goto out; + } + if (!fence) + goto out; + dma_fence_get(fence); spin_lock(&syncobj->lock); - list_del_init(&cb->node); + base = &signal_pt->base; + spin_lock_init(&base->lock); + dma_fence_init(&base->base, + &drm_syncobj_stub_fence_ops, + &base->lock, + syncobj->syncobj_timeline.timeline_context, point); + signal_pt->signal_fence = fence; + /* timeline syncobj must take this dependency */ + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + if (!list_empty(&syncobj->syncobj_timeline.signal_pt_list)) { + tail_pt = list_last_entry(&syncobj->syncobj_timeline.signal_pt_list, + struct drm_syncobj_signal_pt, list); + tail_pt_fence = &tail_pt->base.base; + if (dma_fence_is_signaled(tail_pt_fence)) + tail_pt_fence = NULL; + else + signal_pt->pre_pt_base = + dma_fence_get(tail_pt_fence); + } + }
+ signal_pt->value = point; + syncobj->syncobj_timeline.signal_point = point; + signal_pt->syncobj = syncobj; + INIT_LIST_HEAD(&signal_pt->list); + list_add_tail(&signal_pt->list, &syncobj->syncobj_timeline.signal_pt_list); spin_unlock(&syncobj->lock); + wake_up_all(&syncobj->syncobj_timeline.wq); + /** + * Every pt is depending on signal fence and previous pt fence, add + * callbacks to them + */ + ret = dma_fence_add_callback(signal_pt->signal_fence, + &signal_pt->signal_cb, + pt_signal_fence_func);
+ if (signal_pt->pre_pt_base) { + ret = dma_fence_add_callback(signal_pt->pre_pt_base, + &signal_pt->pre_pt_cb, + pt_pre_fence_func); + if (ret == -ENOENT) + pt_pre_fence_func(signal_pt->pre_pt_base, + &signal_pt->pre_pt_cb); + else if (ret) + goto out; + } else if (ret == -ENOENT) { + pt_signal_fence_func(signal_pt->signal_fence, + &signal_pt->signal_cb); + } else if (ret) { + goto out; + }
+ return 0; +out: + dma_fence_put(&signal_pt->base.base); + return ret; } /** @@ -149,53 +381,30 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, u64 point, struct dma_fence *fence) { - struct dma_fence *old_fence; - struct drm_syncobj_cb *cur, *tmp;
- if (fence) - dma_fence_get(fence);
- spin_lock(&syncobj->lock);
- old_fence = rcu_dereference_protected(syncobj->fence,
- lockdep_is_held(&syncobj->lock));
- rcu_assign_pointer(syncobj->fence, fence);
- if (fence != old_fence) { - list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) { - list_del_init(&cur->node); - cur->func(syncobj, cur); + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + if (fence) + drm_syncobj_timeline_create_signal_pt(syncobj, fence, + point); + return; + } else if (syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) { + u64 pt_value;
+ if (!fence) { + drm_syncobj_timeline_fini(syncobj, + &syncobj->syncobj_timeline);
- drm_syncobj_timeline_init(&syncobj->syncobj_timeline);
+ return; } + pt_value = syncobj->syncobj_timeline.signal_point + + DRM_SYNCOBJ_NORMAL_POINT; + drm_syncobj_timeline_create_signal_pt(syncobj, fence, pt_value); + return; + } else { + DRM_ERROR("the syncobj type isn't support\n"); }
- spin_unlock(&syncobj->lock);
- dma_fence_put(old_fence); } EXPORT_SYMBOL(drm_syncobj_replace_fence); -struct drm_syncobj_stub_fence { - struct dma_fence base; - spinlock_t lock; -};
-static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence) -{ - return "syncobjstub"; -}
-static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence) -{ - return !dma_fence_is_signaled(fence); -}
-static const struct dma_fence_ops drm_syncobj_stub_fence_ops = { - .get_driver_name = drm_syncobj_stub_fence_get_name, - .get_timeline_name = drm_syncobj_stub_fence_get_name, - .enable_signaling = drm_syncobj_stub_fence_enable_signaling, - .release = NULL, -};
static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) { struct drm_syncobj_stub_fence *fence; @@ -215,6 +424,143 @@ static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) return 0; } +static struct drm_syncobj_wait_pt * +drm_syncobj_timeline_lookup_wait_pt_fence(struct drm_syncobj *syncobj, + u64 point, + struct dma_fence **fence) +{ + struct rb_node *node = syncobj->syncobj_timeline.wait_pt_tree.rb_node; + struct drm_syncobj_wait_pt *wait_pt = NULL;
+ spin_lock(&syncobj->lock); + while(node) { + int result;
+ wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + result = point - wait_pt->value; + if (result < 0) { + node = node->rb_left; + } else if (result > 0) { + node = node->rb_right; + } else { + if (fence) + *fence = dma_fence_get(&wait_pt->base.base); + break; + } + } + spin_unlock(&syncobj->lock);
+ return wait_pt; +}
+static struct drm_syncobj_wait_pt * +drm_syncobj_timeline_create_wait_pt_and_return_fence(struct drm_syncobj *syncobj, + u64 point, + struct dma_fence **fence) +{ + struct drm_syncobj_wait_pt *wait_pt = kzalloc(sizeof(*wait_pt), + GFP_KERNEL); + struct drm_syncobj_stub_fence *base; + struct rb_node **new = &(syncobj->syncobj_timeline.wait_pt_tree.rb_node), *parent = NULL; + struct drm_syncobj_signal_pt *tail_signal_pt =
- list_last_entry(&syncobj->syncobj_timeline.signal_pt_list,
+ struct drm_syncobj_signal_pt, list);
+ if (!wait_pt) + return NULL; + base = &wait_pt->base; + spin_lock_init(&base->lock); + dma_fence_init(&base->base, + &drm_syncobj_stub_fence_ops, + &base->lock, + syncobj->syncobj_timeline.timeline_context, point); + wait_pt->value = point;
+ /* wait pt must be in an order, so that we can easily lookup and signal + * it */ + spin_lock(&syncobj->lock); + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE && + point <= syncobj->syncobj_timeline.timeline) + dma_fence_signal(&wait_pt->base.base); + if ((syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) && + (point == tail_signal_pt->value) &&
- dma_fence_is_signaled(&tail_signal_pt->base.base))
+ dma_fence_signal(&wait_pt->base.base); + while(*new) { + struct drm_syncobj_wait_pt *this = + rb_entry(*new, struct drm_syncobj_wait_pt, node); + int result = wait_pt->value - this->value;
+ parent = *new; + if (result < 0) + new = &((*new)->rb_left); + else if (result > 0) + new = &((*new)->rb_right); + else + goto exist; + } + if (fence) + *fence = dma_fence_get(&wait_pt->base.base); + rb_link_node(&wait_pt->node, parent, new); + rb_insert_color(&wait_pt->node, &syncobj->syncobj_timeline.wait_pt_tree); + spin_unlock(&syncobj->lock); + return wait_pt; +exist: + spin_unlock(&syncobj->lock); + dma_fence_put(&wait_pt->base.base); + wait_pt = drm_syncobj_timeline_lookup_wait_pt_fence(syncobj, point, + fence); + return wait_pt; +}
+static struct dma_fence * +drm_syncobj_timeline_point_get(struct drm_syncobj *syncobj, u64 point, u64 flags) +{ + struct drm_syncobj_wait_pt *wait_pt; + struct dma_fence *fence = NULL;
+ /* already signaled, simply return a signaled stub fence */ + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE && + point <= syncobj->syncobj_timeline.timeline) { + struct drm_syncobj_stub_fence *fence;
+ fence = kzalloc(sizeof(*fence), GFP_KERNEL); + if (fence == NULL) + return NULL;
+ spin_lock_init(&fence->lock); + dma_fence_init(&fence->base, &drm_syncobj_stub_fence_ops, + &fence->lock, 0, 0); + dma_fence_signal(&fence->base); + return &fence->base; + }
+ /* check if the wait pt exists */ + wait_pt = drm_syncobj_timeline_lookup_wait_pt_fence(syncobj, point, + &fence); + if (!fence) { + /* This is a new wait pt, so create it */ + wait_pt = drm_syncobj_timeline_create_wait_pt_and_return_fence(syncobj, point, + &fence); + if (!fence) + return NULL; + } + if (wait_pt) { + int ret = 0;
+ if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { + ret = wait_event_interruptible_timeout(syncobj->syncobj_timeline.wq, + wait_pt->value <= syncobj->syncobj_timeline.signal_point, + msecs_to_jiffies(10000)); /* wait 10s */ + if (ret <= 0) + return NULL; + } + return fence; + } + return NULL; +}
/** * drm_syncobj_find_fence - lookup and reference the fence in a sync object * @file_private: drm file private pointer @@ -229,20 +575,45 @@ static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) * contains a reference to the fence, which must be released by calling * dma_fence_put(). */ -int drm_syncobj_find_fence(struct drm_file *file_private, - u32 handle, u64 point, - struct dma_fence **fence) +int drm_syncobj_search_fence(struct drm_syncobj *syncobj, u64 point, + u64 flags, struct dma_fence **fence) { - struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle); int ret = 0; if (!syncobj) return -ENOENT; - *fence = drm_syncobj_fence_get(syncobj); + if (syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) { + /*NORMAL syncobj always wait on last pt */ + u64 tail_pt_value = syncobj->syncobj_timeline.signal_point;
+ if (tail_pt_value == 0) + tail_pt_value += DRM_SYNCOBJ_NORMAL_POINT; + /* NORMAL syncobj doesn't care point value */ + WARN_ON(point != 0); + *fence = drm_syncobj_timeline_point_get(syncobj, tail_pt_value, + flags); + } else if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + *fence = drm_syncobj_timeline_point_get(syncobj, point, + flags); + } else { + DRM_ERROR("Don't support this type syncobj\n"); + *fence = NULL; + } if (!*fence) { ret = -EINVAL; } + return ret; +} +EXPORT_SYMBOL(drm_syncobj_search_fence); +int drm_syncobj_find_fence(struct drm_file *file_private, + u32 handle, u64 point, + struct dma_fence **fence) { + struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
+ int ret = drm_syncobj_search_fence(syncobj, point, + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT, + fence); drm_syncobj_put(syncobj); return ret; } @@ -259,7 +630,7 @@ void drm_syncobj_free(struct kref *kref) struct drm_syncobj *syncobj = container_of(kref, struct drm_syncobj, refcount); - drm_syncobj_replace_fence(syncobj, 0, NULL); + drm_syncobj_timeline_fini(syncobj, &syncobj->syncobj_timeline); kfree(syncobj); } EXPORT_SYMBOL(drm_syncobj_free); @@ -287,8 +658,12 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags, return -ENOMEM; kref_init(&syncobj->refcount); - INIT_LIST_HEAD(&syncobj->cb_list); spin_lock_init(&syncobj->lock); + if (flags & DRM_SYNCOBJ_CREATE_TYPE_TIMELINE) + syncobj->type = DRM_SYNCOBJ_TYPE_TIMELINE; + else + syncobj->type = DRM_SYNCOBJ_TYPE_NORMAL;
- drm_syncobj_timeline_init(&syncobj->syncobj_timeline);
if (flags & DRM_SYNCOBJ_CREATE_SIGNALED) { ret = drm_syncobj_assign_null_handle(syncobj); @@ -646,7 +1021,6 @@ struct syncobj_wait_entry { struct task_struct *task; struct dma_fence *fence; struct dma_fence_cb fence_cb; - struct drm_syncobj_cb syncobj_cb; }; static void syncobj_wait_fence_func(struct dma_fence *fence, @@ -658,18 +1032,6 @@ static void syncobj_wait_fence_func(struct dma_fence *fence, wake_up_process(wait->task); } -static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb) -{ - struct syncobj_wait_entry *wait = - container_of(cb, struct syncobj_wait_entry, syncobj_cb);
- /* This happens inside the syncobj lock */ - wait->fence = dma_fence_get(rcu_dereference_protected(syncobj->fence,
- lockdep_is_held(&syncobj->lock)));
- wake_up_process(wait->task); -}
static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, uint32_t count, uint32_t flags, @@ -693,14 +1055,11 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, signaled_count = 0; for (i = 0; i < count; ++i) { entries[i].task = current; - entries[i].fence = drm_syncobj_fence_get(syncobjs[i]); + ret = drm_syncobj_search_fence(syncobjs[i], 0, 0, + &entries[i].fence); if (!entries[i].fence) { - if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { - continue; - } else { - ret = -EINVAL; - goto cleanup_entries; - } + ret = -EINVAL; + goto cleanup_entries; } if (dma_fence_is_signaled(entries[i].fence)) { @@ -728,15 +1087,6 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, * fallthough and try a 0 timeout wait! */ - if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { - for (i = 0; i < count; ++i) {
- drm_syncobj_fence_get_or_add_callback(syncobjs[i],
- &entries[i].fence, - &entries[i].syncobj_cb, - syncobj_wait_syncobj_func); - } - }
do { set_current_state(TASK_INTERRUPTIBLE); @@ -784,13 +1134,10 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, cleanup_entries: for (i = 0; i < count; ++i) { - if (entries[i].syncobj_cb.func) - drm_syncobj_remove_callback(syncobjs[i], - &entries[i].syncobj_cb); + dma_fence_put(entries[i].fence); if (entries[i].fence_cb.func) dma_fence_remove_callback(entries[i].fence, &entries[i].fence_cb); - dma_fence_put(entries[i].fence); } kfree(entries); @@ -965,12 +1312,20 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data, if (ret < 0) return ret; - for (i = 0; i < args->count_handles; i++) - drm_syncobj_replace_fence(syncobjs[i], 0, NULL);
+ for (i = 0; i < args->count_handles; i++) { + if (syncobjs[i]->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + DRM_ERROR("timeline syncobj cannot reset!\n"); + ret = -EINVAL; + goto out; + } + drm_syncobj_timeline_fini(syncobjs[i], + &syncobjs[i]->syncobj_timeline);
- drm_syncobj_timeline_init(&syncobjs[i]->syncobj_timeline);
+ } +out: drm_syncobj_array_free(syncobjs, args->count_handles); - return 0; + return ret; } int diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 7209dd832d39..bb20d318c9d6 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -2182,7 +2182,9 @@ await_fence_array(struct i915_execbuffer *eb, if (!(flags & I915_EXEC_FENCE_WAIT)) continue; - fence = drm_syncobj_fence_get(syncobj); + drm_syncobj_search_fence(syncobj, 0, + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT, + &fence); if (!fence) return -EINVAL; diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h index 425432b85a87..657c97dc25ec 100644 --- a/include/drm/drm_syncobj.h +++ b/include/drm/drm_syncobj.h @@ -30,6 +30,25 @@ struct drm_syncobj_cb; +enum drm_syncobj_type { + DRM_SYNCOBJ_TYPE_NORMAL, + DRM_SYNCOBJ_TYPE_TIMELINE +};
+struct drm_syncobj_timeline { + wait_queue_head_t wq; + u64 timeline_context; + /** + * @timeline: syncobj timeline + */ + u64 timeline; + u64 signal_point;
+ struct rb_root wait_pt_tree; + struct list_head signal_pt_list; +};
/** * struct drm_syncobj - sync object. * @@ -41,19 +60,16 @@ struct drm_syncobj { */ struct kref refcount; /** - * @fence: - * NULL or a pointer to the fence bound to this object. - * - * This field should not be used directly. Use drm_syncobj_fence_get() - * and drm_syncobj_replace_fence() instead. + * @type: indicate syncobj type */ - struct dma_fence __rcu *fence; + enum drm_syncobj_type type; /** - * @cb_list: List of callbacks to call when the &fence gets replaced. + * @syncobj_timeline: timeline */ - struct list_head cb_list; + struct drm_syncobj_timeline syncobj_timeline;
/** - * @lock: Protects &cb_list and write-locks &fence. + * @lock: Protects timeline list and write-locks &fence. */ spinlock_t lock; /** @@ -62,25 +78,6 @@ struct drm_syncobj { struct file *file; }; -typedef void (*drm_syncobj_func_t)(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb);
-/**
- struct drm_syncobj_cb - callback for drm_syncobj_add_callback
- @node: used by drm_syncob_add_callback to append this struct to
- * &drm_syncobj.cb_list
- @func: drm_syncobj_func_t to call
- This struct will be initialized by drm_syncobj_add_callback,
additional
- data can be passed along by embedding drm_syncobj_cb in another
struct.
- The callback will get called the next time
drm_syncobj_replace_fence is
- called.
- */
-struct drm_syncobj_cb { - struct list_head node; - drm_syncobj_func_t func; -};
void drm_syncobj_free(struct kref *kref); /** @@ -106,29 +103,6 @@ drm_syncobj_put(struct drm_syncobj *obj) kref_put(&obj->refcount, drm_syncobj_free); } -/**
- drm_syncobj_fence_get - get a reference to a fence in a sync object
- @syncobj: sync object.
- This acquires additional reference to &drm_syncobj.fence
contained in @obj,
- if not NULL. It is illegal to call this without already holding
a reference.
- No locks required.
- Returns:
- Either the fence of @obj or NULL if there's none.
- */
-static inline struct dma_fence * -drm_syncobj_fence_get(struct drm_syncobj *syncobj) -{ - struct dma_fence *fence;
- rcu_read_lock(); - fence = dma_fence_get_rcu_safe(&syncobj->fence); - rcu_read_unlock();
- return fence; -}
struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private, u32 handle); void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, u64 point, @@ -142,5 +116,7 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags, int drm_syncobj_get_handle(struct drm_file *file_private, struct drm_syncobj *syncobj, u32 *handle); int drm_syncobj_get_fd(struct drm_syncobj *syncobj, int *p_fd); +int drm_syncobj_search_fence(struct drm_syncobj *syncobj, u64 point, + u64 flags, struct dma_fence **fence); #endif diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h index 300f336633f2..cebdb2541eb7 100644 --- a/include/uapi/drm/drm.h +++ b/include/uapi/drm/drm.h @@ -717,6 +717,7 @@ struct drm_prime_handle { struct drm_syncobj_create { __u32 handle; #define DRM_SYNCOBJ_CREATE_SIGNALED (1 << 0) +#define DRM_SYNCOBJ_CREATE_TYPE_TIMELINE (1 << 1) __u32 flags; };
On 2018-08-30 15:25, Christian König wrote:
On 2018-08-30 05:50, zhoucm1 wrote:
On 2018-08-29 19:42, Christian König wrote:
On 2018-08-29 12:44, Chunming Zhou wrote:
VK_KHR_timeline_semaphore: This extension introduces a new type of semaphore that has an integer payload identifying a point in a timeline. Such timeline semaphores support the following operations: * CPU query - A host operation that allows querying the payload of the timeline semaphore. * CPU wait - A host operation that allows a blocking wait for a timeline semaphore to reach a specified value. * Device wait - A device operation that allows waiting for a timeline semaphore to reach a specified value. * Device signal - A device operation that allows advancing the timeline semaphore to a specified value.
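For reference, a rough userspace sketch of those four operations; the names assume the final VK_KHR_timeline_semaphore headers and an already created device/queue, and the snippet is illustrative only, not part of this kernel patch:

	/* assumes #include <vulkan/vulkan.h> with VK_KHR_timeline_semaphore enabled */
	VkSemaphoreTypeCreateInfoKHR type_info = {
		.sType = VK_STRUCTURE_TYPE_SEMAPHORE_TYPE_CREATE_INFO_KHR,
		.semaphoreType = VK_SEMAPHORE_TYPE_TIMELINE_KHR,
		.initialValue = 0,
	};
	VkSemaphoreCreateInfo create_info = {
		.sType = VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO,
		.pNext = &type_info,
	};
	VkSemaphore sem;
	vkCreateSemaphore(device, &create_info, NULL, &sem);

	/* CPU query: read the current payload */
	uint64_t value;
	vkGetSemaphoreCounterValueKHR(device, sem, &value);

	/* CPU wait: block until the timeline reaches point 8 */
	uint64_t wait_point = 8;
	VkSemaphoreWaitInfoKHR wait_info = {
		.sType = VK_STRUCTURE_TYPE_SEMAPHORE_WAIT_INFO_KHR,
		.semaphoreCount = 1,
		.pSemaphores = &sem,
		.pValues = &wait_point,
	};
	vkWaitSemaphoresKHR(device, &wait_info, UINT64_MAX);

	/* Device wait/signal: attach timeline points to a queue submission */
	uint64_t submit_wait = 8, submit_signal = 9;
	VkTimelineSemaphoreSubmitInfoKHR timeline_info = {
		.sType = VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO_KHR,
		.waitSemaphoreValueCount = 1,
		.pWaitSemaphoreValues = &submit_wait,
		.signalSemaphoreValueCount = 1,
		.pSignalSemaphoreValues = &submit_signal,
	};
	VkPipelineStageFlags wait_stage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
	VkSubmitInfo submit = {
		.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
		.pNext = &timeline_info,
		.waitSemaphoreCount = 1,
		.pWaitSemaphores = &sem,
		.pWaitDstStageMask = &wait_stage,
		.signalSemaphoreCount = 1,
		.pSignalSemaphores = &sem,
	};
	vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);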
Since it is a timeline, an earlier time point (PT) is always signaled before a later PT. a. signal PT design: signal PT fence N depends on the PT[N-1] fence and the fence from the signal operation; when the PT[N] fence is signaled, the timeline advances to the value of PT[N]. b. wait PT design: a wait PT fence is signaled when the timeline reaches its point value; whenever the timeline advances, the wait PT values are compared against the new timeline value, and any PT whose value is not greater than the timeline value is signaled, otherwise it stays in the list. A semaphore wait operation can wait on any point of the timeline, so an RB tree is needed to keep the wait points ordered. And since a wait PT can be ahead of the signal PT, a submission fence is needed to handle that.
v2:
- remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
- move unexposed definitions to the .c file. (Daniel Vetter)
- split up the change to drm_syncobj_find_fence() in a separate patch. (Christian)
- split up the change to drm_syncobj_replace_fence() in a separate patch.
- drop the submission_fence implementation and instead use wait_event() for that. (Christian)
- WARN_ON(point != 0) for the NORMAL type syncobj case. (Daniel Vetter)
v3:
- replace the normal syncobj with the timeline implementation. (Vetter and Christian)
  a. a normal syncobj signal op creates a signal PT at the tail of the signal pt list.
  b. a normal syncobj wait op creates a wait pt with the last signal point, and this wait PT is only signaled by the matching signal point PT.
- some bug fixes and clean up
- tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj
TODO:
- CPU query and wait on timeline semaphore.
Signed-off-by: Chunming Zhou david1.zhou@amd.com Cc: Christian Konig christian.koenig@amd.com Cc: Dave Airlie airlied@redhat.com Cc: Daniel Rakos Daniel.Rakos@amd.com Cc: Daniel Vetter daniel@ffwll.ch
drivers/gpu/drm/drm_syncobj.c | 593 ++++++++++++++++----- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 4 +- include/drm/drm_syncobj.h | 78 +-- include/uapi/drm/drm.h | 1 + 4 files changed, 505 insertions(+), 171 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index ab43559398d0..f701d9ef1b81 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -56,6 +56,50 @@ #include "drm_internal.h" #include <drm/drm_syncobj.h> +/* merge normal syncobj to timeline syncobj, the point interval is 1 */ +#define DRM_SYNCOBJ_NORMAL_POINT 1
+struct drm_syncobj_stub_fence { + struct dma_fence base; + spinlock_t lock; +};
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence) +{ + return "syncobjstub"; +}
+static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence) +{ + return !dma_fence_is_signaled(fence); +} +static void drm_syncobj_stub_fence_release(struct dma_fence *f) +{ + kfree(f); +} +static const struct dma_fence_ops drm_syncobj_stub_fence_ops = { + .get_driver_name = drm_syncobj_stub_fence_get_name, + .get_timeline_name = drm_syncobj_stub_fence_get_name, + .enable_signaling = drm_syncobj_stub_fence_enable_signaling, + .release = drm_syncobj_stub_fence_release, +};
Do we really need to move that around? Could reduce the size of the patch quite a bit if we don't.
The stub fence is used widely by both the normal and the timeline syncobj. If you think it increases the patch size too much, I can split it out into a separate patch.
+struct drm_syncobj_wait_pt { + struct drm_syncobj_stub_fence base; + u64 value; + struct rb_node node; +}; +struct drm_syncobj_signal_pt { + struct drm_syncobj_stub_fence base; + struct dma_fence *signal_fence; + struct dma_fence *pre_pt_base; + struct dma_fence_cb signal_cb; + struct dma_fence_cb pre_pt_cb; + struct drm_syncobj *syncobj; + u64 value; + struct list_head list; +};
What are those two structures good for?
They are used to record the points of wait ops and signal ops. For a timeline, they are connected by the timeline value and work like this: a. a signal pt advances the timeline value to its own value when the signal_pt is signaled; a signal_pt depends on the previous pt fence and on its own signal fence from the signal op. b. the wait pt tree checks whether the timeline value has passed a wait pt's value.
For a normal syncobj it works like this: a. a signal pt increases syncobj_timeline->signal_point by 1 every time a signal op is performed. b. when a wait op comes in, the wait pt takes the last signal pt value above as its wait point. The wait pt is only signaled by the signal_pt with the equal point value.
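Condensed, those two rules look roughly like this (illustrative helpers only; these are hypothetical and not functions from the patch):

	static u64 next_signal_point(struct drm_syncobj *syncobj, u64 requested)
	{
		/* NORMAL: every signal op bumps the point by the fixed interval (1) */
		if (syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL)
			return syncobj->syncobj_timeline.signal_point +
			       DRM_SYNCOBJ_NORMAL_POINT;
		/* TIMELINE: userspace picks the point; it has to be monotonic */
		return requested;
	}

	static bool wait_pt_ready(struct drm_syncobj *syncobj, u64 wait_value,
				  u64 completed_signal_value)
	{
		/* TIMELINE: any wait point at or below the current timeline value */
		if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE)
			return wait_value <= syncobj->syncobj_timeline.timeline;
		/* NORMAL: only the signal point with the exact same value releases it */
		return wait_value == completed_signal_value;
	}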
and why is the stub fence their base?
Good question. I tried to kzalloc them separately while debugging, but I ran into a problem: I could look up a wait_pt or signal_pt successfully, yet by the time I tried to use it, it had sometimes already been freed, which resulted in a NULL pointer dereference. Also, when we look them up we usually need their stub fence as well, and their lifecycles are the same. For those reasons I made the stub fence their base.
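In other words, the wait point's lifetime rides on the embedded stub fence's refcount: a lookup takes a fence reference under the syncobj lock before using the point (illustrative fragment with hypothetical context, not a quote from the patch):

	spin_lock(&syncobj->lock);
	wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
	/* pin the wait point while we still hold the lock */
	fence = dma_fence_get(&wait_pt->base.base);
	spin_unlock(&syncobj->lock);

	/* ... use wait_pt / fence without holding the lock ... */

	/* the final put is what frees the whole wait_pt */
	dma_fence_put(fence);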
That sounds like you only did this because you messed up the lifecycle.
In addition to that, I don't think you correctly considered the lifecycle of the waits and the sync object itself. E.g. blocking in drm_syncobj_timeline_fini() until all waits are done is not a good idea.
What you should do instead is create a fence_array object with all the fences we need to wait for when a wait point is requested.
Yeah, that was the result of our initial discussion, but when I tried to do it I found it cannot meet the advance-signal requirement: a. We need to consider that the wait and signal pt values are not a one-to-one match, so it is difficult to find the dependent point; at the very least there is some overhead. b. Because we allow "wait-before-signal", when it happens there is no signal fence yet which could be used to create the fence array. So the timeline value is a good way to resolve that.
Otherwise somebody can easily construct a situation where timeline sync obj A waits on B and B waits on A.
The same situation can also happen with a fence-array; we can only treat it as a programming bug caused by incorrect use.
/** * drm_syncobj_find - lookup and reference a sync object. * @file_private: drm file private pointer @@ -82,59 +126,247 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private, } EXPORT_SYMBOL(drm_syncobj_find); -static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb, - drm_syncobj_func_t func) +static void drm_syncobj_timeline_init(struct drm_syncobj_timeline + *syncobj_timeline) { - cb->func = func; - list_add_tail(&cb->node, &syncobj->cb_list); + syncobj_timeline->timeline_context = dma_fence_context_alloc(1); + syncobj_timeline->timeline = 0; + syncobj_timeline->signal_point = 0; + init_waitqueue_head(&syncobj_timeline->wq);
+ syncobj_timeline->wait_pt_tree = RB_ROOT;
- INIT_LIST_HEAD(&syncobj_timeline->signal_pt_list);
} -static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj, - struct dma_fence **fence, - struct drm_syncobj_cb *cb, - drm_syncobj_func_t func) +static void drm_syncobj_timeline_fini(struct drm_syncobj *syncobj, + struct drm_syncobj_timeline *syncobj_timeline) { - int ret; + struct rb_node *node = NULL; + struct drm_syncobj_wait_pt *wait_pt = NULL; + struct drm_syncobj_signal_pt *signal_pt = NULL, *tmp;
+ spin_lock(&syncobj->lock); + for(node = rb_first(&syncobj_timeline->wait_pt_tree); + node != NULL; ) {
Better use rbtree_postorder_for_each_entry_safe() for this.
From the comments it seems we cannot use it here, because we need to erase nodes here. Comments: * Note, however, that it cannot handle other modifications that re-order the * rbtree it is iterating over. This includes calling rb_erase() on @pos, as * rb_erase() may rebalance the tree, causing us to miss some nodes. */
Yeah, but your current implementation has the same problem :)
You need to use something like "while (node = rb_first(...))" instead if you want to destruct the whole tree.
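A minimal sketch of that pattern, assuming nothing else touches the tree during teardown (a hypothetical helper, not the patch's drm_syncobj_timeline_fini()):

	static void wait_pt_tree_destroy(struct drm_syncobj *syncobj,
					 struct rb_root *root)
	{
		struct rb_node *node;

		spin_lock(&syncobj->lock);
		while ((node = rb_first(root))) {
			struct drm_syncobj_wait_pt *wait_pt =
				rb_entry(node, struct drm_syncobj_wait_pt, node);

			/* restarting from rb_first() after each erase is safe
			 * even if rb_erase() rebalances the tree */
			rb_erase(node, root);
			RB_CLEAR_NODE(node);
			/* dropping the last reference frees the wait point */
			dma_fence_put(&wait_pt->base.base);
		}
		spin_unlock(&syncobj->lock);
	}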
Regards, Christian.
Thanks, David Zhou
Regards, Christian.
+ wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + node = rb_next(node);
I already get the next node before erasing the current one, so the "for (...)" loop is equivalent to a "while (...)" loop.
Thanks, David Zhou
- rb_erase(&wait_pt->node,
+ &syncobj_timeline->wait_pt_tree); + RB_CLEAR_NODE(&wait_pt->node); + spin_unlock(&syncobj->lock); + dma_fence_wait(&wait_pt->base.base, true); + spin_lock(&syncobj->lock); + /* kfree(wait_pt) is excuted by fence put */ + dma_fence_put(&wait_pt->base.base); + } + list_for_each_entry_safe(signal_pt, tmp, + &syncobj_timeline->signal_pt_list, list) { + list_del(&signal_pt->list); + if (signal_pt->signal_fence) {
- dma_fence_remove_callback(signal_pt->signal_fence,
+ &signal_pt->signal_cb); + dma_fence_put(signal_pt->signal_fence); + } + if (signal_pt->pre_pt_base) {
- dma_fence_remove_callback(signal_pt->pre_pt_base,
+ &signal_pt->pre_pt_cb); + dma_fence_put(signal_pt->pre_pt_base); + } + dma_fence_put(&signal_pt->base.base); + } + spin_unlock(&syncobj->lock); +}
- *fence = drm_syncobj_fence_get(syncobj); - if (*fence) - return 1; +static bool drm_syncobj_normal_signal_wait_pts(struct drm_syncobj *syncobj, + u64 signal_pt) +{ + struct rb_node *node = NULL; + struct drm_syncobj_wait_pt *wait_pt = NULL; spin_lock(&syncobj->lock); - /* We've already tried once to get a fence and failed. Now that we - * have the lock, try one more time just to be sure we don't add a - * callback when a fence has already been set. - */ - if (syncobj->fence) { - *fence = dma_fence_get(rcu_dereference_protected(syncobj->fence,
- lockdep_is_held(&syncobj->lock)));
- ret = 1; - } else { - *fence = NULL; - drm_syncobj_add_callback_locked(syncobj, cb, func); - ret = 0; + for(node = rb_first(&syncobj->syncobj_timeline.wait_pt_tree); + node != NULL; ) { + wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + node = rb_next(node); + /* for normal syncobj */ + if (wait_pt->value == signal_pt) { + dma_fence_signal(&wait_pt->base.base); + rb_erase(&wait_pt->node,
- &syncobj->syncobj_timeline.wait_pt_tree);
+ RB_CLEAR_NODE(&wait_pt->node); + /* kfree(wait_pt) is excuted by fence put */ + dma_fence_put(&wait_pt->base.base); + spin_unlock(&syncobj->lock); + return true; + } } spin_unlock(&syncobj->lock); + return false; +} - return ret; +static void drm_syncobj_timeline_signal_wait_pts(struct drm_syncobj *syncobj) +{ + struct rb_node *node = NULL; + struct drm_syncobj_wait_pt *wait_pt = NULL;
+ spin_lock(&syncobj->lock); + for(node = rb_first(&syncobj->syncobj_timeline.wait_pt_tree); + node != NULL; ) { + wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + node = rb_next(node); + if (wait_pt->value <= syncobj->syncobj_timeline.timeline) { + dma_fence_signal(&wait_pt->base.base); + rb_erase(&wait_pt->node,
- &syncobj->syncobj_timeline.wait_pt_tree);
+ RB_CLEAR_NODE(&wait_pt->node); + /* kfree(wait_pt) is excuted by fence put */ + dma_fence_put(&wait_pt->base.base); + } else { + /* the loop is from left to right, the later entry value is + * bigger, so don't need to check any more */ + break; + } + } + spin_unlock(&syncobj->lock); } -void drm_syncobj_add_callback(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb, - drm_syncobj_func_t func)
+static void pt_fence_cb(struct drm_syncobj_signal_pt *signal_pt) { + struct dma_fence *fence = NULL; + struct drm_syncobj *syncobj; + struct drm_syncobj_signal_pt *tail_pt; + u64 pt_value = signal_pt->value;
+ dma_fence_signal(&signal_pt->base.base); + fence = signal_pt->signal_fence; + signal_pt->signal_fence = NULL; + dma_fence_put(fence); + fence = signal_pt->pre_pt_base; + signal_pt->pre_pt_base = NULL; + dma_fence_put(fence);
+ syncobj = signal_pt->syncobj; spin_lock(&syncobj->lock); - drm_syncobj_add_callback_locked(syncobj, cb, func); + syncobj->syncobj_timeline.timeline = pt_value; + tail_pt = list_last_entry(&syncobj->syncobj_timeline.signal_pt_list, + struct drm_syncobj_signal_pt, list); + if ((syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL && signal_pt != tail_pt) + || syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + list_del(&signal_pt->list); + /* kfree(signal_pt) will be executed by below fence put */ + dma_fence_put(&signal_pt->base.base); + } spin_unlock(&syncobj->lock); + if (syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) + drm_syncobj_normal_signal_wait_pts(syncobj, pt_value); + else + drm_syncobj_timeline_signal_wait_pts(syncobj); } +static void pt_signal_fence_func(struct dma_fence *fence, + struct dma_fence_cb *cb) +{ + struct drm_syncobj_signal_pt *signal_pt = + container_of(cb, struct drm_syncobj_signal_pt, signal_cb); -void drm_syncobj_remove_callback(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb) + if (signal_pt->pre_pt_base && + !dma_fence_is_signaled(signal_pt->pre_pt_base)) + return;
+ pt_fence_cb(signal_pt); +} +static void pt_pre_fence_func(struct dma_fence *fence, + struct dma_fence_cb *cb) +{ + struct drm_syncobj_signal_pt *signal_pt = + container_of(cb, struct drm_syncobj_signal_pt, pre_pt_cb);
+ if (signal_pt->signal_fence && + !dma_fence_is_signaled(signal_pt->pre_pt_base)) + return;
+ pt_fence_cb(signal_pt); +}
+static int drm_syncobj_timeline_create_signal_pt(struct drm_syncobj *syncobj, + struct dma_fence *fence, + u64 point) { + struct drm_syncobj_signal_pt *signal_pt = + kzalloc(sizeof(struct drm_syncobj_signal_pt), GFP_KERNEL); + struct drm_syncobj_stub_fence *base; + struct drm_syncobj_signal_pt *tail_pt; + struct dma_fence *tail_pt_fence = NULL; + int ret = 0;
+ if (!signal_pt) + return -ENOMEM; + if (syncobj->syncobj_timeline.signal_point >= point) { + DRM_WARN("A later signal is ready!"); + goto out; + } + if (!fence) + goto out; + dma_fence_get(fence); spin_lock(&syncobj->lock); - list_del_init(&cb->node); + base = &signal_pt->base; + spin_lock_init(&base->lock); + dma_fence_init(&base->base, + &drm_syncobj_stub_fence_ops, + &base->lock,
- syncobj->syncobj_timeline.timeline_context, point);
+ signal_pt->signal_fence = fence; + /* timeline syncobj must take this dependency */ + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + if (!list_empty(&syncobj->syncobj_timeline.signal_pt_list)) { + tail_pt = list_last_entry(&syncobj->syncobj_timeline.signal_pt_list, + struct drm_syncobj_signal_pt, list); + tail_pt_fence = &tail_pt->base.base; + if (dma_fence_is_signaled(tail_pt_fence)) + tail_pt_fence = NULL; + else + signal_pt->pre_pt_base = + dma_fence_get(tail_pt_fence); + } + }
+ signal_pt->value = point; + syncobj->syncobj_timeline.signal_point = point; + signal_pt->syncobj = syncobj; + INIT_LIST_HEAD(&signal_pt->list); + list_add_tail(&signal_pt->list, &syncobj->syncobj_timeline.signal_pt_list); spin_unlock(&syncobj->lock); + wake_up_all(&syncobj->syncobj_timeline.wq); + /** + * Every pt is depending on signal fence and previous pt fence, add + * callbacks to them + */ + ret = dma_fence_add_callback(signal_pt->signal_fence, + &signal_pt->signal_cb, + pt_signal_fence_func);
+ if (signal_pt->pre_pt_base) { + ret = dma_fence_add_callback(signal_pt->pre_pt_base, + &signal_pt->pre_pt_cb, + pt_pre_fence_func); + if (ret == -ENOENT) + pt_pre_fence_func(signal_pt->pre_pt_base, + &signal_pt->pre_pt_cb); + else if (ret) + goto out; + } else if (ret == -ENOENT) { + pt_signal_fence_func(signal_pt->signal_fence, + &signal_pt->signal_cb); + } else if (ret) { + goto out; + }
+ return 0; +out: + dma_fence_put(&signal_pt->base.base); + return ret; } /** @@ -149,53 +381,30 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, u64 point, struct dma_fence *fence) { - struct dma_fence *old_fence; - struct drm_syncobj_cb *cur, *tmp;
- if (fence) - dma_fence_get(fence);
- spin_lock(&syncobj->lock);
- old_fence = rcu_dereference_protected(syncobj->fence,
- lockdep_is_held(&syncobj->lock));
- rcu_assign_pointer(syncobj->fence, fence);
- if (fence != old_fence) { - list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) { - list_del_init(&cur->node); - cur->func(syncobj, cur); + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + if (fence) + drm_syncobj_timeline_create_signal_pt(syncobj, fence, + point); + return; + } else if (syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) { + u64 pt_value;
+ if (!fence) { + drm_syncobj_timeline_fini(syncobj,
- &syncobj->syncobj_timeline);
- drm_syncobj_timeline_init(&syncobj->syncobj_timeline);
+ return; } + pt_value = syncobj->syncobj_timeline.signal_point + + DRM_SYNCOBJ_NORMAL_POINT; + drm_syncobj_timeline_create_signal_pt(syncobj, fence, pt_value); + return; + } else { + DRM_ERROR("the syncobj type isn't support\n"); }
- spin_unlock(&syncobj->lock);
- dma_fence_put(old_fence); } EXPORT_SYMBOL(drm_syncobj_replace_fence); -struct drm_syncobj_stub_fence { - struct dma_fence base; - spinlock_t lock; -};
-static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence) -{ - return "syncobjstub"; -}
-static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence) -{ - return !dma_fence_is_signaled(fence); -}
-static const struct dma_fence_ops drm_syncobj_stub_fence_ops = { - .get_driver_name = drm_syncobj_stub_fence_get_name, - .get_timeline_name = drm_syncobj_stub_fence_get_name, - .enable_signaling = drm_syncobj_stub_fence_enable_signaling, - .release = NULL, -};
static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) { struct drm_syncobj_stub_fence *fence; @@ -215,6 +424,143 @@ static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) return 0; } +static struct drm_syncobj_wait_pt * +drm_syncobj_timeline_lookup_wait_pt_fence(struct drm_syncobj *syncobj, + u64 point, + struct dma_fence **fence) +{ + struct rb_node *node = syncobj->syncobj_timeline.wait_pt_tree.rb_node; + struct drm_syncobj_wait_pt *wait_pt = NULL;
+ spin_lock(&syncobj->lock); + while(node) { + int result;
+ wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + result = point - wait_pt->value; + if (result < 0) { + node = node->rb_left; + } else if (result > 0) { + node = node->rb_right; + } else { + if (fence) + *fence = dma_fence_get(&wait_pt->base.base); + break; + } + } + spin_unlock(&syncobj->lock);
+ return wait_pt; +}
+static struct drm_syncobj_wait_pt * +drm_syncobj_timeline_create_wait_pt_and_return_fence(struct drm_syncobj *syncobj, + u64 point, + struct dma_fence **fence) +{ + struct drm_syncobj_wait_pt *wait_pt = kzalloc(sizeof(*wait_pt), + GFP_KERNEL); + struct drm_syncobj_stub_fence *base; + struct rb_node **new = &(syncobj->syncobj_timeline.wait_pt_tree.rb_node), *parent = NULL; + struct drm_syncobj_signal_pt *tail_signal_pt =
- list_last_entry(&syncobj->syncobj_timeline.signal_pt_list,
+ struct drm_syncobj_signal_pt, list);
+ if (!wait_pt) + return NULL; + base = &wait_pt->base; + spin_lock_init(&base->lock); + dma_fence_init(&base->base, + &drm_syncobj_stub_fence_ops, + &base->lock,
- syncobj->syncobj_timeline.timeline_context, point);
+ wait_pt->value = point;
+ /* wait pt must be in an order, so that we can easily lookup and signal + * it */ + spin_lock(&syncobj->lock); + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE && + point <= syncobj->syncobj_timeline.timeline) + dma_fence_signal(&wait_pt->base.base); + if ((syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) && + (point == tail_signal_pt->value) &&
- dma_fence_is_signaled(&tail_signal_pt->base.base))
+ dma_fence_signal(&wait_pt->base.base); + while(*new) { + struct drm_syncobj_wait_pt *this = + rb_entry(*new, struct drm_syncobj_wait_pt, node); + int result = wait_pt->value - this->value;
+ parent = *new; + if (result < 0) + new = &((*new)->rb_left); + else if (result > 0) + new = &((*new)->rb_right); + else + goto exist; + } + if (fence) + *fence = dma_fence_get(&wait_pt->base.base); + rb_link_node(&wait_pt->node, parent, new); + rb_insert_color(&wait_pt->node, &syncobj->syncobj_timeline.wait_pt_tree); + spin_unlock(&syncobj->lock); + return wait_pt; +exist: + spin_unlock(&syncobj->lock); + dma_fence_put(&wait_pt->base.base); + wait_pt = drm_syncobj_timeline_lookup_wait_pt_fence(syncobj, point, + fence); + return wait_pt; +}
+static struct dma_fence * +drm_syncobj_timeline_point_get(struct drm_syncobj *syncobj, u64 point, u64 flags) +{ + struct drm_syncobj_wait_pt *wait_pt; + struct dma_fence *fence = NULL;
+ /* already signaled, simply return a signaled stub fence */ + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE && + point <= syncobj->syncobj_timeline.timeline) { + struct drm_syncobj_stub_fence *fence;
+ fence = kzalloc(sizeof(*fence), GFP_KERNEL); + if (fence == NULL) + return NULL;
+ spin_lock_init(&fence->lock); + dma_fence_init(&fence->base, &drm_syncobj_stub_fence_ops, + &fence->lock, 0, 0); + dma_fence_signal(&fence->base); + return &fence->base; + }
+ /* check if the wait pt exists */ + wait_pt = drm_syncobj_timeline_lookup_wait_pt_fence(syncobj, point, + &fence); + if (!fence) { + /* This is a new wait pt, so create it */ + wait_pt = drm_syncobj_timeline_create_wait_pt_and_return_fence(syncobj, point, + &fence); + if (!fence) + return NULL; + } + if (wait_pt) { + int ret = 0;
+ if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { + ret = wait_event_interruptible_timeout(syncobj->syncobj_timeline.wq, + wait_pt->value <= syncobj->syncobj_timeline.signal_point,
- msecs_to_jiffies(10000)); /* wait 10s */
+ if (ret <= 0) + return NULL; + } + return fence; + } + return NULL; +}
/** * drm_syncobj_find_fence - lookup and reference the fence in a sync object * @file_private: drm file private pointer @@ -229,20 +575,45 @@ static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) * contains a reference to the fence, which must be released by calling * dma_fence_put(). */ -int drm_syncobj_find_fence(struct drm_file *file_private, - u32 handle, u64 point, - struct dma_fence **fence) +int drm_syncobj_search_fence(struct drm_syncobj *syncobj, u64 point, + u64 flags, struct dma_fence **fence) { - struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle); int ret = 0; if (!syncobj) return -ENOENT; - *fence = drm_syncobj_fence_get(syncobj); + if (syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) { + /*NORMAL syncobj always wait on last pt */ + u64 tail_pt_value = syncobj->syncobj_timeline.signal_point;
+ if (tail_pt_value == 0) + tail_pt_value += DRM_SYNCOBJ_NORMAL_POINT; + /* NORMAL syncobj doesn't care point value */ + WARN_ON(point != 0); + *fence = drm_syncobj_timeline_point_get(syncobj, tail_pt_value, + flags); + } else if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + *fence = drm_syncobj_timeline_point_get(syncobj, point, + flags); + } else { + DRM_ERROR("Don't support this type syncobj\n"); + *fence = NULL; + } if (!*fence) { ret = -EINVAL; } + return ret; +} +EXPORT_SYMBOL(drm_syncobj_search_fence); +int drm_syncobj_find_fence(struct drm_file *file_private, + u32 handle, u64 point, + struct dma_fence **fence) { + struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
+ int ret = drm_syncobj_search_fence(syncobj, point, + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT, + fence); drm_syncobj_put(syncobj); return ret; } @@ -259,7 +630,7 @@ void drm_syncobj_free(struct kref *kref) struct drm_syncobj *syncobj = container_of(kref, struct drm_syncobj, refcount); - drm_syncobj_replace_fence(syncobj, 0, NULL); + drm_syncobj_timeline_fini(syncobj, &syncobj->syncobj_timeline); kfree(syncobj); } EXPORT_SYMBOL(drm_syncobj_free); @@ -287,8 +658,12 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags, return -ENOMEM; kref_init(&syncobj->refcount); - INIT_LIST_HEAD(&syncobj->cb_list); spin_lock_init(&syncobj->lock); + if (flags & DRM_SYNCOBJ_CREATE_TYPE_TIMELINE) + syncobj->type = DRM_SYNCOBJ_TYPE_TIMELINE; + else + syncobj->type = DRM_SYNCOBJ_TYPE_NORMAL;
- drm_syncobj_timeline_init(&syncobj->syncobj_timeline);
if (flags & DRM_SYNCOBJ_CREATE_SIGNALED) { ret = drm_syncobj_assign_null_handle(syncobj); @@ -646,7 +1021,6 @@ struct syncobj_wait_entry { struct task_struct *task; struct dma_fence *fence; struct dma_fence_cb fence_cb; - struct drm_syncobj_cb syncobj_cb; }; static void syncobj_wait_fence_func(struct dma_fence *fence, @@ -658,18 +1032,6 @@ static void syncobj_wait_fence_func(struct dma_fence *fence, wake_up_process(wait->task); } -static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb) -{ - struct syncobj_wait_entry *wait = - container_of(cb, struct syncobj_wait_entry, syncobj_cb);
- /* This happens inside the syncobj lock */ - wait->fence = dma_fence_get(rcu_dereference_protected(syncobj->fence,
- lockdep_is_held(&syncobj->lock)));
- wake_up_process(wait->task); -}
static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, uint32_t count, uint32_t flags, @@ -693,14 +1055,11 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, signaled_count = 0; for (i = 0; i < count; ++i) { entries[i].task = current; - entries[i].fence = drm_syncobj_fence_get(syncobjs[i]); + ret = drm_syncobj_search_fence(syncobjs[i], 0, 0, + &entries[i].fence); if (!entries[i].fence) { - if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { - continue; - } else { - ret = -EINVAL; - goto cleanup_entries; - } + ret = -EINVAL; + goto cleanup_entries; } if (dma_fence_is_signaled(entries[i].fence)) { @@ -728,15 +1087,6 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, * fallthough and try a 0 timeout wait! */ - if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { - for (i = 0; i < count; ++i) {
- drm_syncobj_fence_get_or_add_callback(syncobjs[i],
- &entries[i].fence,
- &entries[i].syncobj_cb,
- syncobj_wait_syncobj_func);
- } - }
do { set_current_state(TASK_INTERRUPTIBLE); @@ -784,13 +1134,10 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, cleanup_entries: for (i = 0; i < count; ++i) { - if (entries[i].syncobj_cb.func) - drm_syncobj_remove_callback(syncobjs[i], - &entries[i].syncobj_cb); + dma_fence_put(entries[i].fence); if (entries[i].fence_cb.func) dma_fence_remove_callback(entries[i].fence, &entries[i].fence_cb); - dma_fence_put(entries[i].fence); } kfree(entries); @@ -965,12 +1312,20 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data, if (ret < 0) return ret; - for (i = 0; i < args->count_handles; i++) - drm_syncobj_replace_fence(syncobjs[i], 0, NULL);
+ for (i = 0; i < args->count_handles; i++) { + if (syncobjs[i]->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + DRM_ERROR("timeline syncobj cannot reset!\n"); + ret = -EINVAL; + goto out; + } + drm_syncobj_timeline_fini(syncobjs[i],
- &syncobjs[i]->syncobj_timeline);
- drm_syncobj_timeline_init(&syncobjs[i]->syncobj_timeline);
+ } +out: drm_syncobj_array_free(syncobjs, args->count_handles); - return 0; + return ret; } int diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 7209dd832d39..bb20d318c9d6 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -2182,7 +2182,9 @@ await_fence_array(struct i915_execbuffer *eb, if (!(flags & I915_EXEC_FENCE_WAIT)) continue; - fence = drm_syncobj_fence_get(syncobj); + drm_syncobj_search_fence(syncobj, 0,
- DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
+ &fence); if (!fence) return -EINVAL; diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h index 425432b85a87..657c97dc25ec 100644 --- a/include/drm/drm_syncobj.h +++ b/include/drm/drm_syncobj.h @@ -30,6 +30,25 @@ struct drm_syncobj_cb; +enum drm_syncobj_type { + DRM_SYNCOBJ_TYPE_NORMAL, + DRM_SYNCOBJ_TYPE_TIMELINE +};
+struct drm_syncobj_timeline { + wait_queue_head_t wq; + u64 timeline_context; + /** + * @timeline: syncobj timeline + */ + u64 timeline; + u64 signal_point;
+ struct rb_root wait_pt_tree; + struct list_head signal_pt_list; +};
/** * struct drm_syncobj - sync object. * @@ -41,19 +60,16 @@ struct drm_syncobj { */ struct kref refcount; /** - * @fence: - * NULL or a pointer to the fence bound to this object. - * - * This field should not be used directly. Use drm_syncobj_fence_get() - * and drm_syncobj_replace_fence() instead. + * @type: indicate syncobj type */ - struct dma_fence __rcu *fence; + enum drm_syncobj_type type; /** - * @cb_list: List of callbacks to call when the &fence gets replaced. + * @syncobj_timeline: timeline */ - struct list_head cb_list; + struct drm_syncobj_timeline syncobj_timeline;
/** - * @lock: Protects &cb_list and write-locks &fence. + * @lock: Protects timeline list and write-locks &fence. */ spinlock_t lock; /** @@ -62,25 +78,6 @@ struct drm_syncobj { struct file *file; }; -typedef void (*drm_syncobj_func_t)(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb);
-/**
- struct drm_syncobj_cb - callback for drm_syncobj_add_callback
- @node: used by drm_syncob_add_callback to append this struct to
- * &drm_syncobj.cb_list
- @func: drm_syncobj_func_t to call
- This struct will be initialized by drm_syncobj_add_callback,
additional
- data can be passed along by embedding drm_syncobj_cb in another
struct.
- The callback will get called the next time
drm_syncobj_replace_fence is
- called.
- */
-struct drm_syncobj_cb { - struct list_head node; - drm_syncobj_func_t func; -};
void drm_syncobj_free(struct kref *kref); /** @@ -106,29 +103,6 @@ drm_syncobj_put(struct drm_syncobj *obj) kref_put(&obj->refcount, drm_syncobj_free); } -/**
- drm_syncobj_fence_get - get a reference to a fence in a sync
object
- @syncobj: sync object.
- This acquires additional reference to &drm_syncobj.fence
contained in @obj,
- if not NULL. It is illegal to call this without already holding
a reference.
- No locks required.
- Returns:
- Either the fence of @obj or NULL if there's none.
- */
-static inline struct dma_fence * -drm_syncobj_fence_get(struct drm_syncobj *syncobj) -{ - struct dma_fence *fence;
- rcu_read_lock(); - fence = dma_fence_get_rcu_safe(&syncobj->fence); - rcu_read_unlock();
- return fence; -}
struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private, u32 handle); void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, u64 point, @@ -142,5 +116,7 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags, int drm_syncobj_get_handle(struct drm_file *file_private, struct drm_syncobj *syncobj, u32 *handle); int drm_syncobj_get_fd(struct drm_syncobj *syncobj, int *p_fd); +int drm_syncobj_search_fence(struct drm_syncobj *syncobj, u64 point, + u64 flags, struct dma_fence **fence); #endif diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h index 300f336633f2..cebdb2541eb7 100644 --- a/include/uapi/drm/drm.h +++ b/include/uapi/drm/drm.h @@ -717,6 +717,7 @@ struct drm_prime_handle { struct drm_syncobj_create { __u32 handle; #define DRM_SYNCOBJ_CREATE_SIGNALED (1 << 0) +#define DRM_SYNCOBJ_CREATE_TYPE_TIMELINE (1 << 1) __u32 flags; };
[SNIP]
+struct drm_syncobj_wait_pt { + struct drm_syncobj_stub_fence base; + u64 value; + struct rb_node node; +}; +struct drm_syncobj_signal_pt { + struct drm_syncobj_stub_fence base; + struct dma_fence *signal_fence; + struct dma_fence *pre_pt_base; + struct dma_fence_cb signal_cb; + struct dma_fence_cb pre_pt_cb; + struct drm_syncobj *syncobj; + u64 value; + struct list_head list; +};
What are those two structures good for?
They are used to record the points of wait ops and signal ops. For a timeline, they are connected by the timeline value and work like this: a. a signal pt advances the timeline value to its own value when the signal_pt is signaled; a signal_pt depends on the previous pt fence and on its own signal fence from the signal op. b. the wait pt tree checks whether the timeline value has passed a wait pt's value.
For a normal syncobj it works like this: a. a signal pt increases syncobj_timeline->signal_point by 1 every time a signal op is performed. b. when a wait op comes in, the wait pt takes the last signal pt value above as its wait point. The wait pt is only signaled by the signal_pt with the equal point value.
and why is the stub fence their base?
Good question. I tried to kzalloc them separately while debugging, but I ran into a problem: I could look up a wait_pt or signal_pt successfully, yet by the time I tried to use it, it had sometimes already been freed, which resulted in a NULL pointer dereference. Also, when we look them up we usually need their stub fence as well, and their lifecycles are the same. For those reasons I made the stub fence their base.
That sounds like you only did this because you messed up the lifecycle.
In addition to that, I don't think you correctly considered the lifecycle of the waits and the sync object itself. E.g. blocking in drm_syncobj_timeline_fini() until all waits are done is not a good idea.
What you should do instead is create a fence_array object with all the fences we need to wait for when a wait point is requested.
Yeah, that was the result of our initial discussion, but when I tried to do it I found it cannot meet the advance-signal requirement: a. We need to consider that the wait and signal pt values are not a one-to-one match, so it is difficult to find the dependent point; at the very least there is some overhead.
As far as I can see that is independent of using a fence array here. You can either use a ring buffer or an rb-tree, but when you want to wait for a specific point we need to condense the not-yet-signaled fences into an array.
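Just to illustrate the idea, a rough sketch of condensing the pending fences up to a requested point into a dma_fence_array (locking and error handling simplified; collect_point_fences() is a made-up helper, not code from this patch):

	#include <linux/dma-fence-array.h>
	#include <linux/slab.h>

	#define MAX_POINT_FENCES 64

	static struct dma_fence *
	collect_point_fences(struct drm_syncobj *syncobj, u64 point)
	{
		struct drm_syncobj_signal_pt *pt;
		struct dma_fence **fences;
		struct dma_fence_array *array;
		unsigned int count = 0;

		fences = kmalloc_array(MAX_POINT_FENCES, sizeof(*fences), GFP_KERNEL);
		if (!fences)
			return NULL;

		/* gather the not-yet-signaled fences at or below the requested point */
		spin_lock(&syncobj->lock);
		list_for_each_entry(pt, &syncobj->syncobj_timeline.signal_pt_list, list) {
			if (pt->value > point)
				break;
			if (count < MAX_POINT_FENCES &&
			    !dma_fence_is_signaled(pt->signal_fence))
				fences[count++] = dma_fence_get(pt->signal_fence);
		}
		spin_unlock(&syncobj->lock);

		/* signal_on_any = false: the array signals once all fences signal;
		 * on success it takes ownership of the fences array and references */
		array = dma_fence_array_create(count, fences,
					       dma_fence_context_alloc(1), 1, false);
		if (!array) {
			while (count--)
				dma_fence_put(fences[count]);
			kfree(fences);
			return NULL;
		}
		return &array->base;
	}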
b. Because we allow "wait-before-signal", when it happens there is no signal fence yet which could be used to create the fence array.
Well, again, we DON'T support wait-before-signal here. I will certainly NAK any implementation which tries to do this until we have figured out all the resource management constraints, and I still don't see how we can do this.
So the timeline value is a good way to resolve that.
Otherwise somebody can easily construct a situation where timeline sync obj A waits on B and B waits on A.
The same situation can also happen with a fence-array; we can only treat it as a programming bug caused by incorrect use.
No, fence-array is initialized only once with a static list of fences. This way it is impossible to add the fence-array to itself for example.
E.g. you can't build circular dependencies with that.
Better use rbtree_postorder_for_each_entry_safe() for this.
From the comments it seems we cannot use it here, because we need to erase nodes here. Comments: * Note, however, that it cannot handle other modifications that re-order the * rbtree it is iterating over. This includes calling rb_erase() on @pos, as * rb_erase() may rebalance the tree, causing us to miss some nodes. */
Yeah, but your current implementation has the same problem :)
You need to use something like "while (node = rb_first(...))" instead if you want to destruct the whole tree.
Regards, Christian.
Thanks, David Zhou
Regards, Christian.
+ wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + node = rb_next(node);
I already get the next node before erasing the current one, so the "for (...)" loop is equivalent to a "while (...)" loop.
That still doesn't work. The problem is that because of an erase, the next node might skip over some nodes when the tree is rebalanced.
What you need to do is to either not erase the nodes at all (because we re-initialize the tree anyway) or always use rb_first().
Regards, Christian.
Thanks, David Zhou
- rb_erase(&wait_pt->node,
+ &syncobj_timeline->wait_pt_tree); + RB_CLEAR_NODE(&wait_pt->node); + spin_unlock(&syncobj->lock); + dma_fence_wait(&wait_pt->base.base, true); + spin_lock(&syncobj->lock); + /* kfree(wait_pt) is excuted by fence put */ + dma_fence_put(&wait_pt->base.base); + } + list_for_each_entry_safe(signal_pt, tmp,
- &syncobj_timeline->signal_pt_list, list) {
+ list_del(&signal_pt->list); + if (signal_pt->signal_fence) {
- dma_fence_remove_callback(signal_pt->signal_fence,
+ &signal_pt->signal_cb); + dma_fence_put(signal_pt->signal_fence); + } + if (signal_pt->pre_pt_base) {
- dma_fence_remove_callback(signal_pt->pre_pt_base,
+ &signal_pt->pre_pt_cb); + dma_fence_put(signal_pt->pre_pt_base); + } + dma_fence_put(&signal_pt->base.base); + } + spin_unlock(&syncobj->lock); +}
- *fence = drm_syncobj_fence_get(syncobj); - if (*fence) - return 1; +static bool drm_syncobj_normal_signal_wait_pts(struct drm_syncobj *syncobj, + u64 signal_pt) +{ + struct rb_node *node = NULL; + struct drm_syncobj_wait_pt *wait_pt = NULL; spin_lock(&syncobj->lock); - /* We've already tried once to get a fence and failed. Now that we - * have the lock, try one more time just to be sure we don't add a - * callback when a fence has already been set. - */ - if (syncobj->fence) { - *fence = dma_fence_get(rcu_dereference_protected(syncobj->fence,
- lockdep_is_held(&syncobj->lock)));
- ret = 1; - } else { - *fence = NULL; - drm_syncobj_add_callback_locked(syncobj, cb, func); - ret = 0; + for(node = rb_first(&syncobj->syncobj_timeline.wait_pt_tree); + node != NULL; ) { + wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + node = rb_next(node); + /* for normal syncobj */ + if (wait_pt->value == signal_pt) { + dma_fence_signal(&wait_pt->base.base); + rb_erase(&wait_pt->node,
- &syncobj->syncobj_timeline.wait_pt_tree);
+ RB_CLEAR_NODE(&wait_pt->node); + /* kfree(wait_pt) is excuted by fence put */ + dma_fence_put(&wait_pt->base.base); + spin_unlock(&syncobj->lock); + return true; + } } spin_unlock(&syncobj->lock); + return false; +} - return ret; +static void drm_syncobj_timeline_signal_wait_pts(struct drm_syncobj *syncobj) +{ + struct rb_node *node = NULL; + struct drm_syncobj_wait_pt *wait_pt = NULL;
+ spin_lock(&syncobj->lock); + for(node = rb_first(&syncobj->syncobj_timeline.wait_pt_tree); + node != NULL; ) { + wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + node = rb_next(node); + if (wait_pt->value <= syncobj->syncobj_timeline.timeline) { + dma_fence_signal(&wait_pt->base.base); + rb_erase(&wait_pt->node,
- &syncobj->syncobj_timeline.wait_pt_tree);
+ RB_CLEAR_NODE(&wait_pt->node); + /* kfree(wait_pt) is excuted by fence put */ + dma_fence_put(&wait_pt->base.base); + } else { + /* the loop is from left to right, the later entry value is + * bigger, so don't need to check any more */ + break; + } + } + spin_unlock(&syncobj->lock); } -void drm_syncobj_add_callback(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb, - drm_syncobj_func_t func)
+static void pt_fence_cb(struct drm_syncobj_signal_pt *signal_pt) { + struct dma_fence *fence = NULL; + struct drm_syncobj *syncobj; + struct drm_syncobj_signal_pt *tail_pt; + u64 pt_value = signal_pt->value;
+ dma_fence_signal(&signal_pt->base.base); + fence = signal_pt->signal_fence; + signal_pt->signal_fence = NULL; + dma_fence_put(fence); + fence = signal_pt->pre_pt_base; + signal_pt->pre_pt_base = NULL; + dma_fence_put(fence);
+ syncobj = signal_pt->syncobj; spin_lock(&syncobj->lock); - drm_syncobj_add_callback_locked(syncobj, cb, func); + syncobj->syncobj_timeline.timeline = pt_value; + tail_pt = list_last_entry(&syncobj->syncobj_timeline.signal_pt_list, + struct drm_syncobj_signal_pt, list); + if ((syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL && signal_pt != tail_pt) + || syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + list_del(&signal_pt->list); + /* kfree(signal_pt) will be executed by below fence put */ + dma_fence_put(&signal_pt->base.base); + } spin_unlock(&syncobj->lock); + if (syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) + drm_syncobj_normal_signal_wait_pts(syncobj, pt_value); + else + drm_syncobj_timeline_signal_wait_pts(syncobj); } +static void pt_signal_fence_func(struct dma_fence *fence, + struct dma_fence_cb *cb) +{ + struct drm_syncobj_signal_pt *signal_pt = + container_of(cb, struct drm_syncobj_signal_pt, signal_cb); -void drm_syncobj_remove_callback(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb) + if (signal_pt->pre_pt_base &&
- !dma_fence_is_signaled(signal_pt->pre_pt_base))
+ return;
+ pt_fence_cb(signal_pt); +} +static void pt_pre_fence_func(struct dma_fence *fence, + struct dma_fence_cb *cb) +{ + struct drm_syncobj_signal_pt *signal_pt = + container_of(cb, struct drm_syncobj_signal_pt, pre_pt_cb);
+ if (signal_pt->signal_fence &&
- !dma_fence_is_signaled(signal_pt->pre_pt_base))
+ return;
+ pt_fence_cb(signal_pt); +}
+static int drm_syncobj_timeline_create_signal_pt(struct drm_syncobj *syncobj, + struct dma_fence *fence, + u64 point) { + struct drm_syncobj_signal_pt *signal_pt = + kzalloc(sizeof(struct drm_syncobj_signal_pt), GFP_KERNEL); + struct drm_syncobj_stub_fence *base; + struct drm_syncobj_signal_pt *tail_pt; + struct dma_fence *tail_pt_fence = NULL; + int ret = 0;
+ if (!signal_pt) + return -ENOMEM; + if (syncobj->syncobj_timeline.signal_point >= point) { + DRM_WARN("A later signal is ready!"); + goto out; + } + if (!fence) + goto out; + dma_fence_get(fence); spin_lock(&syncobj->lock); - list_del_init(&cb->node); + base = &signal_pt->base; + spin_lock_init(&base->lock); + dma_fence_init(&base->base, + &drm_syncobj_stub_fence_ops, + &base->lock,
- syncobj->syncobj_timeline.timeline_context, point);
+ signal_pt->signal_fence = fence; + /* timeline syncobj must take this dependency */ + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + if (!list_empty(&syncobj->syncobj_timeline.signal_pt_list)) { + tail_pt = list_last_entry(&syncobj->syncobj_timeline.signal_pt_list, + struct drm_syncobj_signal_pt, list); + tail_pt_fence = &tail_pt->base.base; + if (dma_fence_is_signaled(tail_pt_fence)) + tail_pt_fence = NULL; + else + signal_pt->pre_pt_base = + dma_fence_get(tail_pt_fence); + } + }
+ signal_pt->value = point; + syncobj->syncobj_timeline.signal_point = point; + signal_pt->syncobj = syncobj; + INIT_LIST_HEAD(&signal_pt->list); + list_add_tail(&signal_pt->list, &syncobj->syncobj_timeline.signal_pt_list); spin_unlock(&syncobj->lock); + wake_up_all(&syncobj->syncobj_timeline.wq); + /** + * Every pt is depending on signal fence and previous pt fence, add + * callbacks to them + */ + ret = dma_fence_add_callback(signal_pt->signal_fence, + &signal_pt->signal_cb, + pt_signal_fence_func);
+ if (signal_pt->pre_pt_base) { + ret = dma_fence_add_callback(signal_pt->pre_pt_base, + &signal_pt->pre_pt_cb, + pt_pre_fence_func); + if (ret == -ENOENT) + pt_pre_fence_func(signal_pt->pre_pt_base, + &signal_pt->pre_pt_cb); + else if (ret) + goto out; + } else if (ret == -ENOENT) { + pt_signal_fence_func(signal_pt->signal_fence, + &signal_pt->signal_cb); + } else if (ret) { + goto out; + }
+ return 0; +out: + dma_fence_put(&signal_pt->base.base); + return ret; } /** @@ -149,53 +381,30 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, u64 point, struct dma_fence *fence) { - struct dma_fence *old_fence; - struct drm_syncobj_cb *cur, *tmp;
- if (fence) - dma_fence_get(fence);
- spin_lock(&syncobj->lock);
- old_fence = rcu_dereference_protected(syncobj->fence,
- lockdep_is_held(&syncobj->lock));
- rcu_assign_pointer(syncobj->fence, fence);
- if (fence != old_fence) { - list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) { - list_del_init(&cur->node); - cur->func(syncobj, cur); + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + if (fence)
- drm_syncobj_timeline_create_signal_pt(syncobj, fence,
+ point); + return; + } else if (syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) { + u64 pt_value;
+ if (!fence) { + drm_syncobj_timeline_fini(syncobj,
- &syncobj->syncobj_timeline);
- drm_syncobj_timeline_init(&syncobj->syncobj_timeline);
+ return; } + pt_value = syncobj->syncobj_timeline.signal_point + + DRM_SYNCOBJ_NORMAL_POINT; + drm_syncobj_timeline_create_signal_pt(syncobj, fence, pt_value); + return; + } else { + DRM_ERROR("the syncobj type isn't support\n"); }
- spin_unlock(&syncobj->lock);
- dma_fence_put(old_fence); } EXPORT_SYMBOL(drm_syncobj_replace_fence); -struct drm_syncobj_stub_fence { - struct dma_fence base; - spinlock_t lock; -};
-static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence) -{ - return "syncobjstub"; -}
-static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence) -{ - return !dma_fence_is_signaled(fence); -}
-static const struct dma_fence_ops drm_syncobj_stub_fence_ops = { - .get_driver_name = drm_syncobj_stub_fence_get_name, - .get_timeline_name = drm_syncobj_stub_fence_get_name, - .enable_signaling = drm_syncobj_stub_fence_enable_signaling, - .release = NULL, -};
static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) { struct drm_syncobj_stub_fence *fence; @@ -215,6 +424,143 @@ static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) return 0; } +static struct drm_syncobj_wait_pt * +drm_syncobj_timeline_lookup_wait_pt_fence(struct drm_syncobj *syncobj, + u64 point, + struct dma_fence **fence) +{ + struct rb_node *node = syncobj->syncobj_timeline.wait_pt_tree.rb_node; + struct drm_syncobj_wait_pt *wait_pt = NULL;
+ spin_lock(&syncobj->lock); + while(node) { + int result;
+ wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + result = point - wait_pt->value; + if (result < 0) { + node = node->rb_left; + } else if (result > 0) { + node = node->rb_right; + } else { + if (fence) + *fence = dma_fence_get(&wait_pt->base.base); + break; + } + } + spin_unlock(&syncobj->lock);
+ return wait_pt; +}
+static struct drm_syncobj_wait_pt * +drm_syncobj_timeline_create_wait_pt_and_return_fence(struct drm_syncobj *syncobj, + u64 point, + struct dma_fence **fence) +{ + struct drm_syncobj_wait_pt *wait_pt = kzalloc(sizeof(*wait_pt), + GFP_KERNEL); + struct drm_syncobj_stub_fence *base; + struct rb_node **new = &(syncobj->syncobj_timeline.wait_pt_tree.rb_node), *parent = NULL; + struct drm_syncobj_signal_pt *tail_signal_pt =
- list_last_entry(&syncobj->syncobj_timeline.signal_pt_list,
+ struct drm_syncobj_signal_pt, list);
+ if (!wait_pt) + return NULL; + base = &wait_pt->base; + spin_lock_init(&base->lock); + dma_fence_init(&base->base, + &drm_syncobj_stub_fence_ops, + &base->lock,
- syncobj->syncobj_timeline.timeline_context, point);
+ wait_pt->value = point;
+ /* wait pt must be in an order, so that we can easily lookup and signal + * it */ + spin_lock(&syncobj->lock); + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE && + point <= syncobj->syncobj_timeline.timeline) + dma_fence_signal(&wait_pt->base.base); + if ((syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) && + (point == tail_signal_pt->value) &&
- dma_fence_is_signaled(&tail_signal_pt->base.base))
+ dma_fence_signal(&wait_pt->base.base); + while(*new) { + struct drm_syncobj_wait_pt *this = + rb_entry(*new, struct drm_syncobj_wait_pt, node); + int result = wait_pt->value - this->value;
+ parent = *new; + if (result < 0) + new = &((*new)->rb_left); + else if (result > 0) + new = &((*new)->rb_right); + else + goto exist; + } + if (fence) + *fence = dma_fence_get(&wait_pt->base.base); + rb_link_node(&wait_pt->node, parent, new); + rb_insert_color(&wait_pt->node, &syncobj->syncobj_timeline.wait_pt_tree); + spin_unlock(&syncobj->lock); + return wait_pt; +exist: + spin_unlock(&syncobj->lock); + dma_fence_put(&wait_pt->base.base); + wait_pt = drm_syncobj_timeline_lookup_wait_pt_fence(syncobj, point, + fence); + return wait_pt; +}
+static struct dma_fence * +drm_syncobj_timeline_point_get(struct drm_syncobj *syncobj, u64 point, u64 flags) +{ + struct drm_syncobj_wait_pt *wait_pt; + struct dma_fence *fence = NULL;
+ /* already signaled, simply return a signaled stub fence */ + if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE && + point <= syncobj->syncobj_timeline.timeline) { + struct drm_syncobj_stub_fence *fence;
+ fence = kzalloc(sizeof(*fence), GFP_KERNEL); + if (fence == NULL) + return NULL;
+ spin_lock_init(&fence->lock); + dma_fence_init(&fence->base, &drm_syncobj_stub_fence_ops, + &fence->lock, 0, 0); + dma_fence_signal(&fence->base); + return &fence->base; + }
+ /* check if the wait pt exists */ + wait_pt = drm_syncobj_timeline_lookup_wait_pt_fence(syncobj, point, + &fence); + if (!fence) { + /* This is a new wait pt, so create it */ + wait_pt = drm_syncobj_timeline_create_wait_pt_and_return_fence(syncobj, point, + &fence); + if (!fence) + return NULL; + } + if (wait_pt) { + int ret = 0;
+ if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { + ret = wait_event_interruptible_timeout(syncobj->syncobj_timeline.wq, + wait_pt->value <= syncobj->syncobj_timeline.signal_point,
- msecs_to_jiffies(10000)); /* wait 10s */
+ if (ret <= 0) + return NULL; + } + return fence; + } + return NULL; +}
/** * drm_syncobj_find_fence - lookup and reference the fence in a sync object * @file_private: drm file private pointer @@ -229,20 +575,45 @@ static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj) * contains a reference to the fence, which must be released by calling * dma_fence_put(). */ -int drm_syncobj_find_fence(struct drm_file *file_private, - u32 handle, u64 point, - struct dma_fence **fence) +int drm_syncobj_search_fence(struct drm_syncobj *syncobj, u64 point, + u64 flags, struct dma_fence **fence) { - struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle); int ret = 0; if (!syncobj) return -ENOENT; - *fence = drm_syncobj_fence_get(syncobj); + if (syncobj->type == DRM_SYNCOBJ_TYPE_NORMAL) { + /*NORMAL syncobj always wait on last pt */ + u64 tail_pt_value = syncobj->syncobj_timeline.signal_point;
+ if (tail_pt_value == 0) + tail_pt_value += DRM_SYNCOBJ_NORMAL_POINT; + /* NORMAL syncobj doesn't care point value */ + WARN_ON(point != 0); + *fence = drm_syncobj_timeline_point_get(syncobj, tail_pt_value, + flags); + } else if (syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + *fence = drm_syncobj_timeline_point_get(syncobj, point, + flags); + } else { + DRM_ERROR("Don't support this type syncobj\n"); + *fence = NULL; + } if (!*fence) { ret = -EINVAL; } + return ret; +} +EXPORT_SYMBOL(drm_syncobj_search_fence); +int drm_syncobj_find_fence(struct drm_file *file_private, + u32 handle, u64 point, + struct dma_fence **fence) { + struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
+ int ret = drm_syncobj_search_fence(syncobj, point,
- DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
+ fence); drm_syncobj_put(syncobj); return ret; } @@ -259,7 +630,7 @@ void drm_syncobj_free(struct kref *kref) struct drm_syncobj *syncobj = container_of(kref, struct drm_syncobj, refcount); - drm_syncobj_replace_fence(syncobj, 0, NULL); + drm_syncobj_timeline_fini(syncobj, &syncobj->syncobj_timeline); kfree(syncobj); } EXPORT_SYMBOL(drm_syncobj_free); @@ -287,8 +658,12 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags, return -ENOMEM; kref_init(&syncobj->refcount); - INIT_LIST_HEAD(&syncobj->cb_list); spin_lock_init(&syncobj->lock); + if (flags & DRM_SYNCOBJ_CREATE_TYPE_TIMELINE) + syncobj->type = DRM_SYNCOBJ_TYPE_TIMELINE; + else + syncobj->type = DRM_SYNCOBJ_TYPE_NORMAL;
- drm_syncobj_timeline_init(&syncobj->syncobj_timeline);
if (flags & DRM_SYNCOBJ_CREATE_SIGNALED) { ret = drm_syncobj_assign_null_handle(syncobj); @@ -646,7 +1021,6 @@ struct syncobj_wait_entry { struct task_struct *task; struct dma_fence *fence; struct dma_fence_cb fence_cb; - struct drm_syncobj_cb syncobj_cb; }; static void syncobj_wait_fence_func(struct dma_fence *fence, @@ -658,18 +1032,6 @@ static void syncobj_wait_fence_func(struct dma_fence *fence, wake_up_process(wait->task); } -static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb) -{ - struct syncobj_wait_entry *wait = - container_of(cb, struct syncobj_wait_entry, syncobj_cb);
- /* This happens inside the syncobj lock */ - wait->fence = dma_fence_get(rcu_dereference_protected(syncobj->fence,
- lockdep_is_held(&syncobj->lock)));
- wake_up_process(wait->task); -}
static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, uint32_t count, uint32_t flags, @@ -693,14 +1055,11 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, signaled_count = 0; for (i = 0; i < count; ++i) { entries[i].task = current; - entries[i].fence = drm_syncobj_fence_get(syncobjs[i]); + ret = drm_syncobj_search_fence(syncobjs[i], 0, 0, + &entries[i].fence); if (!entries[i].fence) { - if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { - continue; - } else { - ret = -EINVAL; - goto cleanup_entries; - } + ret = -EINVAL; + goto cleanup_entries; } if (dma_fence_is_signaled(entries[i].fence)) { @@ -728,15 +1087,6 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, * fallthough and try a 0 timeout wait! */ - if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { - for (i = 0; i < count; ++i) {
- drm_syncobj_fence_get_or_add_callback(syncobjs[i],
- &entries[i].fence,
- &entries[i].syncobj_cb,
- syncobj_wait_syncobj_func);
- } - }
do { set_current_state(TASK_INTERRUPTIBLE); @@ -784,13 +1134,10 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs, cleanup_entries: for (i = 0; i < count; ++i) { - if (entries[i].syncobj_cb.func) - drm_syncobj_remove_callback(syncobjs[i], - &entries[i].syncobj_cb); + dma_fence_put(entries[i].fence); if (entries[i].fence_cb.func) dma_fence_remove_callback(entries[i].fence, &entries[i].fence_cb); - dma_fence_put(entries[i].fence); } kfree(entries); @@ -965,12 +1312,20 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data, if (ret < 0) return ret; - for (i = 0; i < args->count_handles; i++) - drm_syncobj_replace_fence(syncobjs[i], 0, NULL);
+ for (i = 0; i < args->count_handles; i++) { + if (syncobjs[i]->type == DRM_SYNCOBJ_TYPE_TIMELINE) { + DRM_ERROR("timeline syncobj cannot reset!\n"); + ret = -EINVAL; + goto out; + } + drm_syncobj_timeline_fini(syncobjs[i],
- &syncobjs[i]->syncobj_timeline);
- drm_syncobj_timeline_init(&syncobjs[i]->syncobj_timeline);
+ } +out: drm_syncobj_array_free(syncobjs, args->count_handles); - return 0; + return ret; } int diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 7209dd832d39..bb20d318c9d6 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -2182,7 +2182,9 @@ await_fence_array(struct i915_execbuffer *eb, if (!(flags & I915_EXEC_FENCE_WAIT)) continue; - fence = drm_syncobj_fence_get(syncobj); + drm_syncobj_search_fence(syncobj, 0,
- DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
+ &fence); if (!fence) return -EINVAL; diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h index 425432b85a87..657c97dc25ec 100644 --- a/include/drm/drm_syncobj.h +++ b/include/drm/drm_syncobj.h @@ -30,6 +30,25 @@ struct drm_syncobj_cb; +enum drm_syncobj_type { + DRM_SYNCOBJ_TYPE_NORMAL, + DRM_SYNCOBJ_TYPE_TIMELINE +};
+struct drm_syncobj_timeline { + wait_queue_head_t wq; + u64 timeline_context; + /** + * @timeline: syncobj timeline + */ + u64 timeline; + u64 signal_point;
+ struct rb_root wait_pt_tree; + struct list_head signal_pt_list; +};
/** * struct drm_syncobj - sync object. * @@ -41,19 +60,16 @@ struct drm_syncobj { */ struct kref refcount; /** - * @fence: - * NULL or a pointer to the fence bound to this object. - * - * This field should not be used directly. Use drm_syncobj_fence_get() - * and drm_syncobj_replace_fence() instead. + * @type: indicate syncobj type */ - struct dma_fence __rcu *fence; + enum drm_syncobj_type type; /** - * @cb_list: List of callbacks to call when the &fence gets replaced. + * @syncobj_timeline: timeline */ - struct list_head cb_list; + struct drm_syncobj_timeline syncobj_timeline;
/** - * @lock: Protects &cb_list and write-locks &fence. + * @lock: Protects timeline list and write-locks &fence. */ spinlock_t lock; /** @@ -62,25 +78,6 @@ struct drm_syncobj { struct file *file; }; -typedef void (*drm_syncobj_func_t)(struct drm_syncobj *syncobj, - struct drm_syncobj_cb *cb);
-/**
- struct drm_syncobj_cb - callback for drm_syncobj_add_callback
- @node: used by drm_syncob_add_callback to append this struct to
- * &drm_syncobj.cb_list
- @func: drm_syncobj_func_t to call
- This struct will be initialized by drm_syncobj_add_callback,
additional
- data can be passed along by embedding drm_syncobj_cb in
another struct.
- The callback will get called the next time
drm_syncobj_replace_fence is
- called.
- */
-struct drm_syncobj_cb { - struct list_head node; - drm_syncobj_func_t func; -};
void drm_syncobj_free(struct kref *kref); /** @@ -106,29 +103,6 @@ drm_syncobj_put(struct drm_syncobj *obj) kref_put(&obj->refcount, drm_syncobj_free); } -/**
- drm_syncobj_fence_get - get a reference to a fence in a sync
object
- @syncobj: sync object.
- This acquires additional reference to &drm_syncobj.fence
contained in @obj,
- if not NULL. It is illegal to call this without already
holding a reference.
- No locks required.
- Returns:
- Either the fence of @obj or NULL if there's none.
- */
-static inline struct dma_fence * -drm_syncobj_fence_get(struct drm_syncobj *syncobj) -{ - struct dma_fence *fence;
- rcu_read_lock(); - fence = dma_fence_get_rcu_safe(&syncobj->fence); - rcu_read_unlock();
- return fence; -}
struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private, u32 handle); void drm_syncobj_replace_fence(struct drm_syncobj *syncobj, u64 point, @@ -142,5 +116,7 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags, int drm_syncobj_get_handle(struct drm_file *file_private, struct drm_syncobj *syncobj, u32 *handle); int drm_syncobj_get_fd(struct drm_syncobj *syncobj, int *p_fd); +int drm_syncobj_search_fence(struct drm_syncobj *syncobj, u64 point, + u64 flags, struct dma_fence **fence); #endif diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h index 300f336633f2..cebdb2541eb7 100644 --- a/include/uapi/drm/drm.h +++ b/include/uapi/drm/drm.h @@ -717,6 +717,7 @@ struct drm_prime_handle { struct drm_syncobj_create { __u32 handle; #define DRM_SYNCOBJ_CREATE_SIGNALED (1 << 0) +#define DRM_SYNCOBJ_CREATE_TYPE_TIMELINE (1 << 1) __u32 flags; };
On 2018/8/30 19:32, Christian König wrote:
[SNIP]
+struct drm_syncobj_wait_pt {
+	struct drm_syncobj_stub_fence base;
+	u64 value;
+	struct rb_node node;
+};
+struct drm_syncobj_signal_pt {
+	struct drm_syncobj_stub_fence base;
+	struct dma_fence *signal_fence;
+	struct dma_fence *pre_pt_base;
+	struct dma_fence_cb signal_cb;
+	struct dma_fence_cb pre_pt_cb;
+	struct drm_syncobj *syncobj;
+	u64 value;
+	struct list_head list;
+};
What are those two structures good for?
They are used to record wait-op points and signal-op points. For timeline syncobjs they are connected by the timeline value and work like this: a. A signal_pt raises the timeline value to its own value when it is signaled; each signal_pt depends on the previous point's fence and on its own signal fence from the signal op. b. The wait_pt tree checks whether the timeline value has reached the wait point's value.
For normal syncobjs it works like this: a. Every signal op increments syncobj_timeline->signal_point by one. b. When a wait op comes in, the wait_pt takes the last signal point value as its wait point; the wait_pt is only signaled by the signal_pt with exactly that value.
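To make the two modes above easier to follow, here is a tiny user-space C model of just the point bookkeeping; it is only a sketch of the semantics described in this mail, not the kernel implementation, and all names in it (toy_syncobj, toy_signal, toy_wait_ready) are invented for the example.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy model of the bookkeeping only; all names are invented for illustration. */
struct toy_syncobj {
	uint64_t timeline;     /* value reached so far (advanced when a signal_pt signals) */
	uint64_t signal_point; /* highest point any signal op has been issued for          */
};

/* A signal op is issued for a point; when its fence signals, the timeline advances. */
static void toy_signal(struct toy_syncobj *s, uint64_t point)
{
	if (point > s->signal_point)
		s->signal_point = point;
	if (point > s->timeline)
		s->timeline = point;   /* in the kernel this happens in the fence callback */
}

/* A wait point is satisfied once the timeline has reached its value. */
static bool toy_wait_ready(const struct toy_syncobj *s, uint64_t wait_value)
{
	return s->timeline >= wait_value;
}

int main(void)
{
	struct toy_syncobj s = { 0, 0 };

	toy_signal(&s, 1);   /* "normal" mode would simply use signal_point + 1 here  */
	toy_signal(&s, 10);  /* timeline mode can jump ahead to an arbitrary point    */
	printf("wait for 5:  %s\n", toy_wait_ready(&s, 5)  ? "ready" : "pending");
	printf("wait for 20: %s\n", toy_wait_ready(&s, 20) ? "ready" : "pending");
	return 0;
}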
and why is the stub fence their base?
Good question. I initially tried a plain kzalloc for them as well, but while debugging I hit a problem: I could look up/find a wait_pt or signal_pt successfully, yet by the time I used it, it had sometimes already been freed, resulting in a NULL pointer dereference. In general, when we look them up we usually need their stub fence as well, and their lifecycles are the same. For those reasons I made the stub fence their base.
That sounds like you only did this because you messed up the lifecycle.
In addition, I don't think you correctly considered the lifecycle of the waits and of the sync object itself. E.g. blocking in drm_syncobj_timeline_fini() until all waits are done is not a good idea.
What you should do instead is to create a fence_array object with all the fences we need to wait for when a wait point is requested.
Yeah, that was the result of our initial discussion, but when I tried to do it, I found it cannot meet the advance-signal requirement: a. We need to consider that wait and signal point values are not a one-to-one match, so it is difficult to find the dependent point; at the very least there is some overhead.
As far as I can see that is independent of using a fence array here. See, you can either use a ring buffer or an rb-tree, but when you want to wait for a specific point we need to condense the not-yet-signaled fences into an array.
Again, we need to find the range the specific point falls into; we should stay close to timeline semantics. I also referred to the sw_sync.c timeline, where a wait_pt is usually signaled by a timeline point. And I agree we can implement it with several methods, but I don't think there are fundamental differences.
b. Because we allow "wait-before-signal", when it happens there is no signal fence yet which could be used to create the fence array.
Well, again, we DON'T support wait-before-signal here. I will certainly NAK any implementation which tries to do this until we have figured out all the resource management constraints, and I still don't see how we can do this.
Yes, we don't support real wait-before-signal in the patch now, just a fake wait-before-signal, which still blocks CS submission with wait_event until the signal operation arrives, which is the conclusion we discussed before.
So the timeline value is a good way to resolve that.
Otherwise somebody can easily construct a situation where timeline sync obj A waits on B and B waits on A.
The same situation can also happen with a fence-array; we can only treat that as a programming bug from incorrect use.
No, fence-array is initialized only once with a static list of fences. This way it is impossible to add the fence-array to itself for example.
E.g. you can't build circular dependencies with that.
We already wait for the signal operation event, so circular dependencies can never be built that way. The principle is the same.
Better use rbtree_postorder_for_each_entry_safe() for this.
From the comments it seems we cannot use it here, since we need to erase nodes. The comments say:
 * Note, however, that it cannot handle other modifications that re-order the
 * rbtree it is iterating over. This includes calling rb_erase() on @pos, as
 * rb_erase() may rebalance the tree, causing us to miss some nodes.
 */
Yeah, but your current implementation has the same problem :)
You need to use something like "while (node = rb_first(...))" instead if you want to destruct the whole tree.
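For reference, the destruction loop Christian describes could look roughly like the sketch below. It uses the real rbtree helpers (rb_first(), rb_erase(), rb_entry(), RB_CLEAR_NODE()) together with the structures from this patch, but it is only an illustration of the loop shape, not compile-tested, and it glosses over locking and any waiters that may still be pending.

static void example_destroy_wait_pts(struct drm_syncobj_timeline *tl)
{
	struct rb_node *node;

	/* Always take the leftmost node, so rebalancing after rb_erase()
	 * cannot make us skip entries. */
	while ((node = rb_first(&tl->wait_pt_tree))) {
		struct drm_syncobj_wait_pt *wait_pt =
			rb_entry(node, struct drm_syncobj_wait_pt, node);

		rb_erase(node, &tl->wait_pt_tree);
		RB_CLEAR_NODE(node);
		/* kfree(wait_pt) happens in the fence release path */
		dma_fence_put(&wait_pt->base.base);
	}
}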
OK, got it, I will change it in v4.
Thanks, David Zhou
Regards, Christian.
Thanks, David Zhou
Regards, Christian.
+		wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
+		node = rb_next(node);
I already fetch the next node before erasing the current one, so the "for (..." form is equivalent to the "while (...)" form.
That still doesn't work. The problem is that because of an erase, the already-fetched next node might skip some nodes when the tree rebalances.
What you need to do is to either not erase the nodes at all (because we re-initialize the tree anyway) or always use rb_first().
Regards, Christian.
Thanks, David Zhou
- rb_erase(&wait_pt->node,
+ &syncobj_timeline->wait_pt_tree); + RB_CLEAR_NODE(&wait_pt->node); + spin_unlock(&syncobj->lock); + dma_fence_wait(&wait_pt->base.base, true); + spin_lock(&syncobj->lock); + /* kfree(wait_pt) is excuted by fence put */ + dma_fence_put(&wait_pt->base.base); + } + list_for_each_entry_safe(signal_pt, tmp,
- &syncobj_timeline->signal_pt_list, list) {
+ list_del(&signal_pt->list); + if (signal_pt->signal_fence) {
- dma_fence_remove_callback(signal_pt->signal_fence,
- &signal_pt->signal_cb);
+ dma_fence_put(signal_pt->signal_fence); + } + if (signal_pt->pre_pt_base) {
- dma_fence_remove_callback(signal_pt->pre_pt_base,
- &signal_pt->pre_pt_cb);
+ dma_fence_put(signal_pt->pre_pt_base); + } + dma_fence_put(&signal_pt->base.base); + } + spin_unlock(&syncobj->lock); +}
On 03.09.2018 at 06:13, Chunming Zhou wrote:
On 2018/8/30 19:32, Christian König wrote:
[SNIP]
> + > +struct drm_syncobj_wait_pt { > + struct drm_syncobj_stub_fence base; > + u64 value; > + struct rb_node node; > +}; > +struct drm_syncobj_signal_pt { > + struct drm_syncobj_stub_fence base; > + struct dma_fence *signal_fence; > + struct dma_fence *pre_pt_base; > + struct dma_fence_cb signal_cb; > + struct dma_fence_cb pre_pt_cb; > + struct drm_syncobj *syncobj; > + u64 value; > + struct list_head list; > +};
What are those two structures good for
They are used to record wait ops points and signal ops points. For timeline, they are connected by timeline value, works like: a. signal pt increase timeline value to signal_pt value when signal_pt is signaled. signal_pt is depending on previous pt fence and itself signal fence from signal ops. b. wait pt tree checks if timeline value over itself value.
For normal, works like: a. signal pt increase 1 for syncobj_timeline->signal_point every time when signal ops is performed. b. when wait ops is comming, wait pt will fetch above last signal pt value as its wait point. wait pt will be only signaled by equal point value signal_pt.
and why is the stub fence their base?
Good question, I tried to kzalloc for them as well when I debug them, I encountered a problem: I lookup/find wait_pt or signal_pt successfully, but when I tried to use them, they are freed sometimes, and results in NULL point. and generally, when lookup them, we often need their stub fence as well, and in the meantime, their lifecycle are same. Above reason, I make stub fence as their base.
That sounds like you only did this because you messed up the lifecycle.
Additional to that I don't think you correctly considered the lifecycle of the waits and the sync object itself. E.g. blocking in drm_syncobj_timeline_fini() until all waits are done is not a good idea.
What you should do instead is to create a fence_array object with all the fence we need to wait for when a wait point is requested.
Yeah, this is our initial discussion result, but when I tried to do that, I found that cannot meet the advance signal requirement: a. We need to consider the wait and signal pt value are not one-to-one match, it's difficult to find dependent point, at least, there is some overhead.
As far as I can see that is independent of using a fence array here. See you can either use a ring buffer or an rb-tree, but when you want to wait for a specific point we need to condense the not yet signaled fences into an array.
again, need to find the range of where the specific point in, we should close to timeline semantics, I also refered the sw_sync.c timeline, usally wait_pt is signalled by timeline point. And I agree we can implement it with several methods, but I don't think there are basical differences.
The problem is that with your current approach you need the sync_obj alive for the synchronization to work. That is most likely not a good idea.
In addition, you enable signaling without any need from the waiting side. That is rather bad for implementations which need that optimization.
I suggest either condensing all the fences you need to wait for into an array during the wait operation, or reference counting the sync_obj and only enabling signaling on the fences when requested by a wait.
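As an illustration of the first option, a helper along these lines could gather the not-yet-signaled points up to the requested value and wrap them in a dma_fence_array, so the resulting fence no longer depends on the syncobj staying alive. This is only a sketch of the idea: the helper name is made up, error handling is trimmed, and the fixed 32-entry bound is just a simplification for the example.

static struct dma_fence *
example_collect_wait_fences(struct drm_syncobj *syncobj, u64 point)
{
	struct drm_syncobj_signal_pt *pt;
	struct dma_fence **fences;
	struct dma_fence_array *array;
	int num = 0;

	fences = kcalloc(32, sizeof(*fences), GFP_KERNEL);
	if (!fences)
		return NULL;

	spin_lock(&syncobj->lock);
	list_for_each_entry(pt, &syncobj->syncobj_timeline.signal_pt_list, list) {
		if (pt->value > point)
			break;
		if (dma_fence_is_signaled(&pt->base.base))
			continue;	/* already done, nothing to wait for */
		if (num < 32)
			fences[num++] = dma_fence_get(&pt->base.base);
	}
	spin_unlock(&syncobj->lock);

	/* The caller now owns a single fence that signals once all points up
	 * to 'point' have signaled, independent of the syncobj lifetime. */
	array = dma_fence_array_create(num, fences,
				       dma_fence_context_alloc(1), 1, false);
	return array ? &array->base : NULL;
}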
b. because we allowed "wait-before-signal", if "wait-before-signal" happens, there isn't signal fence which can be used to create fence array.
Well, again we DON'T support wait-before-signal here. I will certainly NAK any implementation which tries to do this until we haven't figured out all the resource management constraints and I still don't see how we can do this.
yes, we don't support real wait-before-signal in patch now, just a fake wait-before-signal, which still wait on CS submission until signal operation coming through wait_event, which is the conclusion we disscussed before.
Well in this case we should call it wait-for-signal and not wait-before-signal :)
So timeline value is good to resolve that.
Otherwise somebody can easily construct a situation where timeline sync obj A waits on B and B waits on A.
Same situation also can happens with fence-array, we only can see this is a programming bug with incorrect use.
No, fence-array is initialized only once with a static list of fences. This way it is impossible to add the fence-array to itself for example.
E.g. you can't build circle dependencies with that.
we already wait for signal operation event, so never build circle dependencies with that. The theory is same.
Yeah, ok that is certainly true.
Regards, Christian.
On 2018/9/3 16:50, Christian König wrote:
On 03.09.2018 at 06:13, Chunming Zhou wrote:
On 2018/8/30 19:32, Christian König wrote:
[SNIP]
> >> + >> +struct drm_syncobj_wait_pt { >> + struct drm_syncobj_stub_fence base; >> + u64 value; >> + struct rb_node node; >> +}; >> +struct drm_syncobj_signal_pt { >> + struct drm_syncobj_stub_fence base; >> + struct dma_fence *signal_fence; >> + struct dma_fence *pre_pt_base; >> + struct dma_fence_cb signal_cb; >> + struct dma_fence_cb pre_pt_cb; >> + struct drm_syncobj *syncobj; >> + u64 value; >> + struct list_head list; >> +}; > > What are those two structures good for They are used to record wait ops points and signal ops points. For timeline, they are connected by timeline value, works like: a. signal pt increase timeline value to signal_pt value when signal_pt is signaled. signal_pt is depending on previous pt fence and itself signal fence from signal ops. b. wait pt tree checks if timeline value over itself value.
For normal, works like: a. signal pt increase 1 for syncobj_timeline->signal_point every time when signal ops is performed. b. when wait ops is comming, wait pt will fetch above last signal pt value as its wait point. wait pt will be only signaled by equal point value signal_pt.
> and why is the stub fence their base? Good question, I tried to kzalloc for them as well when I debug them, I encountered a problem: I lookup/find wait_pt or signal_pt successfully, but when I tried to use them, they are freed sometimes, and results in NULL point. and generally, when lookup them, we often need their stub fence as well, and in the meantime, their lifecycle are same. Above reason, I make stub fence as their base.
That sounds like you only did this because you messed up the lifecycle.
Additional to that I don't think you correctly considered the lifecycle of the waits and the sync object itself. E.g. blocking in drm_syncobj_timeline_fini() until all waits are done is not a good idea.
What you should do instead is to create a fence_array object with all the fence we need to wait for when a wait point is requested.
Yeah, this is our initial discussion result, but when I tried to do that, I found that cannot meet the advance signal requirement: a. We need to consider the wait and signal pt value are not one-to-one match, it's difficult to find dependent point, at least, there is some overhead.
As far as I can see that is independent of using a fence array here. See you can either use a ring buffer or an rb-tree, but when you want to wait for a specific point we need to condense the not yet signaled fences into an array.
again, need to find the range of where the specific point in, we should close to timeline semantics, I also refered the sw_sync.c timeline, usally wait_pt is signalled by timeline point. And I agree we can implement it with several methods, but I don't think there are basical differences.
The problem is that with your current approach you need the sync_obj alive for the synchronization to work. That is most likely not a good idea.
Indeed, I will fix that. How about only creating a fence array for every wait_pt when the syncobj is released? At release time every wait_pt must already have waited for its signal operation, so we can easily condense the fences for each wait_pt. In the meantime we can still take advantage of timeline-based wait points.
Additional to that you enable signaling without a need from the waiting side. That is rather bad for implementations which need that optimization.
Do you mean that advancing the timeline from the signal fence is not better, and that we should only update the timeline value when requested by a wait_pt? That way we would not update the timeline value immediately and could not free the signal_pt immediately, and we would also need to account for that in CPU query and wait.
Thanks, David Zhou
On 03.09.2018 at 12:07, Chunming Zhou wrote:
On 2018/9/3 16:50, Christian König wrote:
On 03.09.2018 at 06:13, Chunming Zhou wrote:
On 2018/8/30 19:32, Christian König wrote:
[SNIP]
>> >>> + >>> +struct drm_syncobj_wait_pt { >>> + struct drm_syncobj_stub_fence base; >>> + u64 value; >>> + struct rb_node node; >>> +}; >>> +struct drm_syncobj_signal_pt { >>> + struct drm_syncobj_stub_fence base; >>> + struct dma_fence *signal_fence; >>> + struct dma_fence *pre_pt_base; >>> + struct dma_fence_cb signal_cb; >>> + struct dma_fence_cb pre_pt_cb; >>> + struct drm_syncobj *syncobj; >>> + u64 value; >>> + struct list_head list; >>> +}; >> >> What are those two structures good for > They are used to record wait ops points and signal ops points. > For timeline, they are connected by timeline value, works like: > a. signal pt increase timeline value to signal_pt value when > signal_pt is signaled. signal_pt is depending on previous pt > fence and itself signal fence from signal ops. > b. wait pt tree checks if timeline value over itself value. > > For normal, works like: > a. signal pt increase 1 for syncobj_timeline->signal_point > every time when signal ops is performed. > b. when wait ops is comming, wait pt will fetch above last > signal pt value as its wait point. wait pt will be only signaled > by equal point value signal_pt. > > >> and why is the stub fence their base? > Good question, I tried to kzalloc for them as well when I debug > them, I encountered a problem: > I lookup/find wait_pt or signal_pt successfully, but when I > tried to use them, they are freed sometimes, and results in NULL > point. > and generally, when lookup them, we often need their stub fence > as well, and in the meantime, their lifecycle are same. > Above reason, I make stub fence as their base.
That sounds like you only did this because you messed up the lifecycle.
Additional to that I don't think you correctly considered the lifecycle of the waits and the sync object itself. E.g. blocking in drm_syncobj_timeline_fini() until all waits are done is not a good idea.
What you should do instead is to create a fence_array object with all the fence we need to wait for when a wait point is requested.
Yeah, this is our initial discussion result, but when I tried to do that, I found that cannot meet the advance signal requirement: a. We need to consider the wait and signal pt value are not one-to-one match, it's difficult to find dependent point, at least, there is some overhead.
As far as I can see that is independent of using a fence array here. See you can either use a ring buffer or an rb-tree, but when you want to wait for a specific point we need to condense the not yet signaled fences into an array.
again, need to find the range of where the specific point in, we should close to timeline semantics, I also refered the sw_sync.c timeline, usally wait_pt is signalled by timeline point. And I agree we can implement it with several methods, but I don't think there are basical differences.
The problem is that with your current approach you need the sync_obj alive for the synchronization to work. That is most likely not a good idea.
Indeed, I will fix that. How about only creating a fence array for every wait_pt when the syncobj is released? At syncobj release time a wait_pt must already have waited for the signal operation, so we can easily condense the fences for every wait_pt. And in the meantime we can keep the advantage of timeline-based wait_pts.
That could work, but I don't see how you want to replace the already issued fence with a fence_array when the sync object is destroyed.
Additional to that I would rather prefer a consistent handling, e.g. without extra rarely used code paths.
Additional to that you enable signaling without a need from the waiting side. That is rather bad for implementations which need that optimization.
Do you mean that increasing the timeline based on the signal fence is not better? Only update the timeline value when requested by a wait_pt?
Yes, exactly.
This way we will not update the timeline value immediately and cannot free the signal_pt immediately, and we also need to consider the impact on CPU query and wait.
That is actually the better coding style. We usually try to avoid doing things in interrupt handlers as much as possible.
How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted node.
3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection and remove the first node from the tree as long as it is signaled.
5. When enable_signaling is requested for a node we cascade that to the left using rb_prev. This ensures that signaling is enabled for the current fence as well as all previous fences.
6. A wait just looks into the tree for the signal point lower or equal of the requested sequence number.
7. When the sync object is released we use rbtree_postorder_for_each_entry_safe() and drop the extra reference to each node, but never call rb_erase! This way the rb_tree stays in memory, but without a root (e.g. the sync object). It only destructs itself when the looked up references to the nodes are dropped.
Well that is quite a bunch of logic, but I think that should work fine.
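To make points 4 and 7 of that idea concrete, a sketch under assumed names could look like the following; the node layout, the tree argument and the reference handling are placeholders for illustration, not the actual implementation (point 5, the rb_prev cascade, is not sketched here):

/* Hypothetical signal point node used only for this sketch. */
struct syncobj_signal_node {
	struct dma_fence fence;
	struct rb_node node;
	u64 value;
};

/* Point 4: before each signal/wait operation, drop already signaled
 * points from the left end of the tree. */
static void syncobj_garbage_collect(struct rb_root *tree)
{
	struct rb_node *node;

	while ((node = rb_first(tree))) {
		struct syncobj_signal_node *pt =
			rb_entry(node, struct syncobj_signal_node, node);

		if (!dma_fence_is_signaled(&pt->fence))
			break;

		rb_erase(node, tree);
		RB_CLEAR_NODE(node);
		dma_fence_put(&pt->fence);	/* reference held by the syncobj */
	}
}

/* Point 7: on syncobj release, drop the syncobj's references without
 * calling rb_erase(); nodes free themselves once the last looked-up
 * reference is dropped. */
static void syncobj_release_points(struct rb_root *tree)
{
	struct syncobj_signal_node *pt, *next;

	rbtree_postorder_for_each_entry_safe(pt, next, tree, node)
		dma_fence_put(&pt->fence);
}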
Regards, Christian.
Thanks, David Zhou
I suggest either condensing all the fences you need to wait for into an array during the wait operation, or reference counting the sync_obj and only enabling the signaling on the fences when requested by a wait.
b. Because we allow "wait-before-signal": if "wait-before-signal" happens, there is no signal fence yet that could be used to create the fence array.
Well, again, we DON'T support wait-before-signal here. I will certainly NAK any implementation which tries to do this until we have figured out all the resource management constraints, and I still don't see how we can do this.
Yes, we don't support real wait-before-signal in the patch now, just a fake wait-before-signal, which still blocks CS submission via wait_event until the signal operation comes in; that is the conclusion we discussed before.
Well in this case we should call it wait-for-signal and not wait-before-signal :)
So timeline value is good to resolve that.
Otherwise somebody can easily construct a situation where timeline sync obj A waits on B and B waits on A.
The same situation can also happen with fence-array; we can only treat this as a programming bug caused by incorrect use.
No, fence-array is initialized only once with a static list of fences. This way it is impossible to add the fence-array to itself for example.
E.g. you can't build circle dependencies with that.
We already wait for the signal operation event, so we never build circular dependencies with that. The theory is the same.
Yeah, ok that is certainly true.
Regards, Christian.
Better use rbtree_postorder_for_each_entry_safe() for this.
[SNIP]
Yeah, but your current implementation has the same problem :)
You need to use something like "while (node = rb_first(...))" instead if you want to destruct the whole tree.
OK, got it, I will change it in v4.
Thanks, David Zhou
On 2018-09-03 19:19, Christian König wrote:
On 03.09.2018 12:07, Chunming Zhou wrote:
On 2018/9/3 16:50, Christian König wrote:
On 03.09.2018 06:13, Chunming Zhou wrote:
On 2018/8/30 19:32, Christian König wrote:
[SNIP]
[SNIP]
As far as I can see that is independent of using a fence array here. See you can either use a ring buffer or an rb-tree, but when you want to wait for a specific point we need to condense the not yet signaled fences into an array.
again, need to find the range of where the specific point in, we should close to timeline semantics, I also refered the sw_sync.c timeline, usally wait_pt is signalled by timeline point. And I agree we can implement it with several methods, but I don't think there are basical differences.
The problem is that with your current approach you need the sync_obj alive for the synchronization to work. That is most likely not a good idea.
Indeed, I will fix that. How abount only creating fence array for every wait pt when syncobj release? when syncobj release, wait pt must have waited the signal opertation, then we can easily condense fences for every wait pt. And meantime, we can take timeline based wait pt advantage.
That could work, but I don't see how you want to replace the already issued fence with a fence_array when the sync object is destroyed.
Additional to that I would rather prefer a consistent handling, e.g. without extra rarely used code paths.
Ah, I found an easy way: we just need to make the syncobj_timeline structure reference counted. This way the syncobj itself can be released first; wait_pt/signal_pt don't need the syncobj at all, since every wait_pt/signal_pt keeps a reference to the syncobj_timeline.
Additional to that you enable signaling without a need from the waiting side. That is rather bad for implementations which need that optimization.
Do you mean increasing timeline based on signal fence is not better? only update timeline value when requested by a wait pt?
Yes, exactly.
This way, we will not update timeline value immidiately and cannot free signal pt immidiately, and we also need to consider it to CPU query and wait.
That is actually the better coding style. We usually try to avoid doing things in interrupt handlers as much as possible.
OK, I see your concern. How about delaying the handling to a workqueue? This way we only increase the timeline value and wake up the workqueue in the fence callback. Is that acceptable?
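Something like the split described here might be sketched as below; the structures, field names and the work item are invented for illustration only, and INIT_WORK()/dma_fence_add_callback() wiring is left out:

/* Hypothetical types for this sketch only. */
struct syncobj_timeline_sketch {
	u64 value;
	spinlock_t lock;
	struct work_struct work;
};

struct syncobj_signal_cb_sketch {
	struct dma_fence_cb cb;
	struct syncobj_timeline_sketch *timeline;
	u64 point;
};

static void syncobj_timeline_sketch_worker(struct work_struct *work)
{
	/* Heavy lifting (signaling wait points, freeing signal points,
	 * garbage collection) runs here in process context instead of
	 * in the interrupt handler. */
}

static void syncobj_timeline_sketch_fence_cb(struct dma_fence *fence,
					     struct dma_fence_cb *cb)
{
	struct syncobj_signal_cb_sketch *scb =
		container_of(cb, struct syncobj_signal_cb_sketch, cb);
	struct syncobj_timeline_sketch *tl = scb->timeline;
	unsigned long flags;

	/* Interrupt context: only advance the timeline value ... */
	spin_lock_irqsave(&tl->lock, flags);
	if (scb->point > tl->value)
		tl->value = scb->point;
	spin_unlock_irqrestore(&tl->lock, flags);

	/* ... and defer everything else to the workqueue. */
	schedule_work(&tl->work);
}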
How about this idea:
- Each signaling point is a fence implementation with an rb node.
- Each node keeps a reference to the last previously inserted node.
- Each node is referenced by the sync object itself.
- Before each signal/wait operation we do a garbage collection and
remove the first node from the tree as long as it is signaled.
- When enable_signaling is requested for a node we cascade that to
the left using rb_prev. This ensures that signaling is enabled for the current fence as well as all previous fences.
- A wait just looks into the tree for the signal point lower or equal
of the requested sequence number.
- When the sync object is released we use
rbtree_postorder_for_each_entry_safe() and drop the extra reference to each node, but never call rb_erase! This way the rb_tree stays in memory, but without a root (e.g. the sync object). It only destructs itself when the looked up references to the nodes are dropped.
Well that is quite a bunch of logic, but I think that should work fine.
Yeah, it could work; a simple timeline reference can also solve the 'free' problem.
Thanks, David Zhou
Regards, Christian.
Thanks, David Zhou
I suggest to either condense all the fences you need to wait for in an array during the wait operation, or reference count the sync_obj and only enable the signaling on the fences when requested by a wait.
b. because we allowed "wait-before-signal", if "wait-before-signal" happens, there isn't signal fence which can be used to create fence array.
Well, again we DON'T support wait-before-signal here. I will certainly NAK any implementation which tries to do this until we haven't figured out all the resource management constraints and I still don't see how we can do this.
yes, we don't support real wait-before-signal in patch now, just a fake wait-before-signal, which still wait on CS submission until signal operation coming through wait_event, which is the conclusion we disscussed before.
Well in this case we should call it wait-for-signal and not wait-before-signal :)
So timeline value is good to resolve that.
[SNIP]
No, fence-array is initialized only once with a static list of fences. This way it is impossible to add the fence-array to itself for example.
E.g. you can't build circle dependencies with that.
we already wait for signal operation event, so never build circle dependencies with that. The theory is same.
Yeah, ok that is certainly true.
Regards, Christian.
[SNIP]
OK, got it, I will change it in v4.
Thanks, David Zhou
On 04.09.2018 06:04, zhoucm1 wrote:
On 2018-09-03 19:19, Christian König wrote:
On 03.09.2018 12:07, Chunming Zhou wrote:
On 2018/9/3 16:50, Christian König wrote:
On 03.09.2018 06:13, Chunming Zhou wrote:
On 2018/8/30 19:32, Christian König wrote:
[SNIP]
As far as I can see that is independent of using a fence array here. See you can either use a ring buffer or an rb-tree, but when you want to wait for a specific point we need to condense the not yet signaled fences into an array.
again, need to find the range of where the specific point in, we should close to timeline semantics, I also refered the sw_sync.c timeline, usally wait_pt is signalled by timeline point. And I agree we can implement it with several methods, but I don't think there are basical differences.
The problem is that with your current approach you need the sync_obj alive for the synchronization to work. That is most likely not a good idea.
Indeed, I will fix that. How abount only creating fence array for every wait pt when syncobj release? when syncobj release, wait pt must have waited the signal opertation, then we can easily condense fences for every wait pt. And meantime, we can take timeline based wait pt advantage.
That could work, but I don't see how you want to replace the already issued fence with a fence_array when the sync object is destroyed.
Additional to that I would rather prefer a consistent handling, e.g. without extra rarely used code paths.
Ah, I find a easy way, we just need to make syncobj_timeline structure as a reference. This way syncobj itself can be released first, wait_pt/signal_pt don't need syncobj at all. every wait_pt/signal_pt keep a reference of syncobj_timeline.
I've thought about that as well, but came to the conclusion that you run into problems because of circular dependencies.
E.g. sync_obj references sync_point and sync_point references sync_obj.
Additional to that it has a quite a bit larger memory footprint because you need to keep the sync_obj around as well.
Additional to that you enable signaling without a need from the waiting side. That is rather bad for implementations which need that optimization.
Do you mean increasing timeline based on signal fence is not better? only update timeline value when requested by a wait pt?
Yes, exactly.
This way, we will not update timeline value immidiately and cannot free signal pt immidiately, and we also need to consider it to CPU query and wait.
That is actually the better coding style. We usually try to avoid doing things in interrupt handlers as much as possible.
OK, I see your concern, how about to delay handling to a workqueue? this way, we only increase timeline value and wake up workqueue in fence cb, is that acceptable?
A little bit, but not much better. The nice thing with garbage collection is that the CPU time is accounted to the process causing the garbage.
How about this idea:
- Each signaling point is a fence implementation with an rb node.
- Each node keeps a reference to the last previously inserted node.
- Each node is referenced by the sync object itself.
- Before each signal/wait operation we do a garbage collection and
remove the first node from the tree as long as it is signaled.
- When enable_signaling is requested for a node we cascade that to
the left using rb_prev. This ensures that signaling is enabled for the current fence as well as all previous fences.
- A wait just looks into the tree for the signal point lower or
equal of the requested sequence number.
- When the sync object is released we use
rbtree_postorder_for_each_entry_safe() and drop the extra reference to each node, but never call rb_erase! This way the rb_tree stays in memory, but without a root (e.g. the sync object). It only destructs itself when the looked up references to the nodes are dropped.
Well that is quite a bunch of logic, but I think that should work fine.
Yeah, it could work, simple timeline reference also can solve 'free' problem.
I think this approach is still quite a bit better, e.g. you don't run into circular dependency problems, it needs less memory, and each node always has the same size, which means we can use a kmem_cache for it.
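For reference, the kmem_cache point could be wired up roughly like this; the cache name and node type reuse the hypothetical syncobj_signal_node from the earlier sketch, so none of these identifiers come from the actual patch:

static struct kmem_cache *syncobj_point_cache;

/* Called once, e.g. from module init. Every node then comes from the
 * slab cache, which is cheap because all nodes have the same size. */
static int syncobj_point_cache_init(void)
{
	syncobj_point_cache = kmem_cache_create("drm_syncobj_point",
						sizeof(struct syncobj_signal_node),
						0, SLAB_HWCACHE_ALIGN, NULL);
	return syncobj_point_cache ? 0 : -ENOMEM;
}

/* Allocation/free replacing kzalloc()/kfree() for the nodes. */
static struct syncobj_signal_node *syncobj_point_alloc(void)
{
	return kmem_cache_zalloc(syncobj_point_cache, GFP_KERNEL);
}

static void syncobj_point_free(struct syncobj_signal_node *pt)
{
	kmem_cache_free(syncobj_point_cache, pt);
}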
Regards, Christian.
Thanks, David Zhou
On 2018-09-04 15:00, Christian König wrote:
On 04.09.2018 06:04, zhoucm1 wrote:
On 2018-09-03 19:19, Christian König wrote:
On 03.09.2018 12:07, Chunming Zhou wrote:
On 2018/9/3 16:50, Christian König wrote:
On 03.09.2018 06:13, Chunming Zhou wrote:
On 2018/8/30 19:32, Christian König wrote: [SNIP]
The problem is that with your current approach you need the sync_obj alive for the synchronization to work. That is most likely not a good idea.
Indeed, I will fix that. How abount only creating fence array for every wait pt when syncobj release? when syncobj release, wait pt must have waited the signal opertation, then we can easily condense fences for every wait pt. And meantime, we can take timeline based wait pt advantage.
That could work, but I don't see how you want to replace the already issued fence with a fence_array when the sync object is destroyed.
Additional to that I would rather prefer a consistent handling, e.g. without extra rarely used code paths.
Ah, I find a easy way, we just need to make syncobj_timeline structure as a reference. This way syncobj itself can be released first, wait_pt/signal_pt don't need syncobj at all. every wait_pt/signal_pt keep a reference of syncobj_timeline.
I've thought about that as well, but came to the conclusion that you run into problems because of circle dependencies.
E.g. sync_obj references sync_point and sync_point references sync_obj.
The sync_obj can be freed first; only the sync point references the syncobj_timeline structure, and syncobj_timeline doesn't reference the sync_pt, so there is no circular dependency.
Additional to that it is quite a bit larger memory footprint because you need to keep the sync_obj around as well.
All signaled sync_pts are freed immediately, except the syncobj_timeline structure; where does the extra memory footprint come from?
Additional to that you enable signaling without a need from the waiting side. That is rather bad for implementations which need that optimization.
Do you mean increasing timeline based on signal fence is not better? only update timeline value when requested by a wait pt?
Yes, exactly.
This way, we will not update timeline value immidiately and cannot free signal pt immidiately, and we also need to consider it to CPU query and wait.
That is actually the better coding style. We usually try to avoid doing things in interrupt handlers as much as possible.
OK, I see your concern, how about to delay handling to a workqueue? this way, we only increase timeline value and wake up workqueue in fence cb, is that acceptable?
A little bit, but not much better. The nice thing with garbage collection is that the CPU time is accounted to the process causing the garbage.
How about this idea:
- Each signaling point is a fence implementation with an rb node.
- Each node keeps a reference to the last previously inserted node.
- Each node is referenced by the sync object itself.
- Before each signal/wait operation we do a garbage collection and
remove the first node from the tree as long as it is signaled.
- When enable_signaling is requested for a node we cascade that to
the left using rb_prev. This ensures that signaling is enabled for the current fence as well as all previous fences.
- A wait just looks into the tree for the signal point lower or
equal of the requested sequence number.
After re-thinking your idea, I think it doesn't work, since there is no timeline value acting as a line: signal_pt values don't have to be contiguous and can jump because of signal operations, like 1, 4, 8, 15, 19. E.g. there are five signal_pts, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19; if a wait_pt is 7, do you mean this wait only needs signal_pt1 and signal_pt4? That's certainly not right; we need to make sure the timeline value is bigger than the wait_pt value, which means signal_pt8 is needed for wait_pt7.
- When the sync object is released we use
rbtree_postorder_for_each_entry_safe() and drop the extra reference to each node, but never call rb_erase! This way the rb_tree stays in memory, but without a root (e.g. the sync object). It only destructs itself when the looked up references to the nodes are dropped.
And here, who will destroy the rb node, since nobody does enable_signaling and there is no callback for the nodes to free themselves?
Regards, David Zhou
Well that is quite a bunch of logic, but I think that should work fine.
Yeah, it could work, simple timeline reference also can solve 'free' problem.
I think this approach is still quite a bit better,
e.g. you don't run into circle dependency problems, it needs less memory and each node has always the same size which means we can use a kmem_cache for it.
Regards, Christian.
Thanks, David Zhou
On 04.09.2018 09:53, zhoucm1 wrote:
[SNIP]
How about this idea:
- Each signaling point is a fence implementation with an rb node.
- Each node keeps a reference to the last previously inserted node.
- Each node is referenced by the sync object itself.
- Before each signal/wait operation we do a garbage collection and
remove the first node from the tree as long as it is signaled.
- When enable_signaling is requested for a node we cascade that to
the left using rb_prev. This ensures that signaling is enabled for the current fence as well as all previous fences.
- A wait just looks into the tree for the signal point lower or
equal of the requested sequence number.
After re-thought your idea, I think it doesn't work since there is no timeline value as a line: signal pt value doesn't must be continues, which can be jump by signal operation, like 1, 4, 8, 15, 19, e.g. there are five singal_pt, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19, if a wait pt is 7, do you mean this wait only needs signal_pt1 and signal_pt4??? That's certainly not right, we need to make sure the timeline value is bigger than wait pt value, that means signal_pt8 is need for wait_pt7.
That can be defined as we like it, e.g. when a wait operation asks for 7 we can return 8 as well.
The key is that as soon as a signal point is added, adding a previous point is no longer allowed.
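A lookup matching that definition (return the lowest signal point greater than or equal to the requested value) might be sketched like this, again with the hypothetical node type from the earlier sketch rather than anything from the actual patch:

/* Illustrative ceiling search over a value-ordered rb-tree. */
static struct syncobj_signal_node *
syncobj_find_signal_point(struct rb_root *tree, u64 point)
{
	struct rb_node *node = tree->rb_node;
	struct syncobj_signal_node *best = NULL;

	while (node) {
		struct syncobj_signal_node *pt =
			rb_entry(node, struct syncobj_signal_node, node);

		if (pt->value >= point) {
			/* Candidate; keep looking left for a smaller match. */
			best = pt;
			node = node->rb_left;
		} else {
			node = node->rb_right;
		}
	}

	/* NULL means there is no pending point >= the requested value; the
	 * caller has to check the request against the last garbage
	 * collected / signaled value. */
	return best;
}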
- When the sync object is released we use
rbtree_postorder_for_each_entry_safe() and drop the extra reference to each node, but never call rb_erase! This way the rb_tree stays in memory, but without a root (e.g. the sync object). It only destructs itself when the looked up references to the nodes are dropped.
And here, who will destroy rb node since no one do enable_signaling, and there is no callback to free themselves.
The node will be destroyed when the last reference drops, not when enable_signaling is called.
In other words the sync_obj keeps the references to each tree object to provide the wait operation, as soon as the sync_obj is destroyed we don't need that functionality any more.
We don't even need to wait for anything to be signaled, this way we can drop all unused signal points as soon as the sync_obj is destroyed.
Only the used ones will stay alive and provide the necessary functionality to provide the signal for each wait operation.
Regards, Christian.
Regards, David Zhou
Well that is quite a bunch of logic, but I think that should work fine.
Yeah, it could work, simple timeline reference also can solve 'free' problem.
I think this approach is still quite a bit better,
e.g. you don't run into circle dependency problems, it needs less memory and each node has always the same size which means we can use a kmem_cache for it.
Regards, Christian.
Thanks, David Zhou
On 2018-09-04 16:05, Christian König wrote:
On 04.09.2018 09:53, zhoucm1 wrote:
[SNIP]
How about this idea:
- Each signaling point is a fence implementation with an rb node.
- Each node keeps a reference to the last previously inserted node.
- Each node is referenced by the sync object itself.
- Before each signal/wait operation we do a garbage collection
and remove the first node from the tree as long as it is signaled.
- When enable_signaling is requested for a node we cascade that
to the left using rb_prev. This ensures that signaling is enabled for the current fence as well as all previous fences.
- A wait just looks into the tree for the signal point lower or
equal of the requested sequence number.
After re-thought your idea, I think it doesn't work since there is no timeline value as a line: signal pt value doesn't must be continues, which can be jump by signal operation, like 1, 4, 8, 15, 19, e.g. there are five singal_pt, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19, if a wait pt is 7, do you mean this wait only needs signal_pt1 and signal_pt4??? That's certainly not right, we need to make sure the timeline value is bigger than wait pt value, that means signal_pt8 is need for wait_pt7.
That can be defined as we like it, e.g. when a wait operation asks for 7 we can return 8 as well.
If it is defined like this, then the problem comes back: if 8 has already been removed by garbage collection, will you return 15?

Since there is no timeline acting as a line, I think this is not the right direction.
The key is that as soon as a signal point is added adding a previous point is no longer allowed.
That's the intention.
Regards, David Zhou
- When the sync object is released we use
rbtree_postorder_for_each_entry_safe() and drop the extra reference to each node, but never call rb_erase! This way the rb_tree stays in memory, but without a root (e.g. the sync object). It only destructs itself when the looked up references to the nodes are dropped.
And here, who will destroy rb node since no one do enable_signaling, and there is no callback to free themselves.
The node will be destroyed when the last reference drops, not when enable_signaling is called.
In other words the sync_obj keeps the references to each tree object to provide the wait operation, as soon as the sync_obj is destroyed we don't need that functionality any more.
We don't even need to wait for anything to be signaled, this way we can drop all unused signal points as soon as the sync_obj is destroyed.
Only the used ones will stay alive and provide the necessary functionality to provide the signal for each wait operation.
Regards, Christian.
Regards, David Zhou
Well that is quite a bunch of logic, but I think that should work fine.
Yeah, it could work, simple timeline reference also can solve 'free' problem.
I think this approach is still quite a bit better,
e.g. you don't run into circle dependency problems, it needs less memory and each node has always the same size which means we can use a kmem_cache for it.
Regards, Christian.
Thanks, David Zhou
amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
On 04.09.2018 10:27, zhoucm1 wrote:
On 2018-09-04 16:05, Christian König wrote:
On 04.09.2018 09:53, zhoucm1 wrote:
[SNIP]
How about this idea:
- Each signaling point is a fence implementation with an rb node.
- Each node keeps a reference to the last previously inserted node.
- Each node is referenced by the sync object itself.
- Before each signal/wait operation we do a garbage collection
and remove the first node from the tree as long as it is signaled.
- When enable_signaling is requested for a node we cascade that
to the left using rb_prev. This ensures that signaling is enabled for the current fence as well as all previous fences.
- A wait just looks into the tree for the signal point lower or
equal of the requested sequence number.
After re-thought your idea, I think it doesn't work since there is no timeline value as a line: signal pt value doesn't must be continues, which can be jump by signal operation, like 1, 4, 8, 15, 19, e.g. there are five singal_pt, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19, if a wait pt is 7, do you mean this wait only needs signal_pt1 and signal_pt4??? That's certainly not right, we need to make sure the timeline value is bigger than wait pt value, that means signal_pt8 is need for wait_pt7.
That can be defined as we like it, e.g. when a wait operation asks for 7 we can return 8 as well.
If defined this, then problem is coming again, if 8 is removed when garbage collection, you will return 15?
The garbage collection is only done for signaled nodes. So when 8 is already garbage collected and 7 is asked we know that we don't need to return anything.
Since there is no timeline as a line, I think this is not right direction.
That is actually intended. There is no infinite timeline here, just a window of the last not yet signaled fences.
Otherwise you will never be able to release nodes from the tree since you always need to keep them around just in case somebody asks for a lower number.
Regards, Christian.
The key is that as soon as a signal point is added adding a previous point is no longer allowed.
That's intention.
Regards, David Zhou
- When the sync object is released we use
rbtree_postorder_for_each_entry_safe() and drop the extra reference to each node, but never call rb_erase! This way the rb_tree stays in memory, but without a root (e.g. the sync object). It only destructs itself when the looked up references to the nodes are dropped.
And here, who will destroy rb node since no one do enable_signaling, and there is no callback to free themselves.
The node will be destroyed when the last reference drops, not when enable_signaling is called.
In other words the sync_obj keeps the references to each tree object to provide the wait operation, as soon as the sync_obj is destroyed we don't need that functionality any more.
We don't even need to wait for anything to be signaled, this way we can drop all unused signal points as soon as the sync_obj is destroyed.
Only the used ones will stay alive and provide the necessary functionality to provide the signal for each wait operation.
Regards, Christian.
Regards, David Zhou
Well that is quite a bunch of logic, but I think that should work fine.
Yeah, it could work, simple timeline reference also can solve 'free' problem.
I think this approach is still quite a bit better,
e.g. you don't run into circle dependency problems, it needs less memory and each node has always the same size which means we can use a kmem_cache for it.
Regards, Christian.
Thanks, David Zhou
On 2018-09-04 16:42, Christian König wrote:
On 04.09.2018 10:27, zhoucm1 wrote:
On 2018-09-04 16:05, Christian König wrote:
On 04.09.2018 09:53, zhoucm1 wrote:
[SNIP]
[SNIP]
After re-thought your idea, I think it doesn't work since there is no timeline value as a line: signal pt value doesn't must be continues, which can be jump by signal operation, like 1, 4, 8, 15, 19, e.g. there are five singal_pt, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19, if a wait pt is 7, do you mean this wait only needs signal_pt1 and signal_pt4??? That's certainly not right, we need to make sure the timeline value is bigger than wait pt value, that means signal_pt8 is need for wait_pt7.
That can be defined as we like it, e.g. when a wait operation asks for 7 we can return 8 as well.
If defined this, then problem is coming again, if 8 is removed when garbage collection, you will return 15?
The garbage collection is only done for signaled nodes. So when 8 is already garbage collected and 7 is asked we know that we don't need to return anything.
8 is a signaled node; the waitA/signal operation does the garbage collection, so how does waitB(7) know the garbage collection history?
Since there is no timeline as a line, I think this is not right direction.
That is actually intended. There is no infinite timeline here, just a windows of the last not yet signaled fences.
No one said it's an infinite timeline; the timeline will stop increasing when the syncobj is released.
Anyway, kref is a good way to solve the 'free' problem. I will try to use it to improve my patch and, of course, will refer to your idea. :)
Thanks, David Zhou
Otherwise you will never be able to release nodes from the tree since you always need to keep them around just in case somebody asks for a lower number.
Regards, Christian.
The key is that as soon as a signal point is added adding a previous point is no longer allowed.
That's intention.
Regards, David Zhou
[SNIP]
And here, who will destroy rb node since no one do enable_signaling, and there is no callback to free themselves.
The node will be destroyed when the last reference drops, not when enable_signaling is called.
In other words the sync_obj keeps the references to each tree object to provide the wait operation, as soon as the sync_obj is destroyed we don't need that functionality any more.
We don't even need to wait for anything to be signaled, this way we can drop all unused signal points as soon as the sync_obj is destroyed.
Only the used ones will stay alive and provide the necessary functionality to provide the signal for each wait operation.
Regards, Christian.
Regards, David Zhou
[SNIP]
I think this approach is still quite a bit better,
e.g. you don't run into circle dependency problems, it needs less memory and each node has always the same size which means we can use a kmem_cache for it.
Regards, Christian.
Thanks, David Zhou
On 04.09.2018 11:00, zhoucm1 wrote:
On 2018-09-04 16:42, Christian König wrote:
On 04.09.2018 10:27, zhoucm1 wrote:
On 2018-09-04 16:05, Christian König wrote:
On 04.09.2018 09:53, zhoucm1 wrote:
[SNIP]
[SNIP]
After re-thought your idea, I think it doesn't work since there is no timeline value as a line: signal pt value doesn't must be continues, which can be jump by signal operation, like 1, 4, 8, 15, 19, e.g. there are five singal_pt, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19, if a wait pt is 7, do you mean this wait only needs signal_pt1 and signal_pt4??? That's certainly not right, we need to make sure the timeline value is bigger than wait pt value, that means signal_pt8 is need for wait_pt7.
That can be defined as we like it, e.g. when a wait operation asks for 7 we can return 8 as well.
If defined this, then problem is coming again, if 8 is removed when garbage collection, you will return 15?
The garbage collection is only done for signaled nodes. So when 8 is already garbage collected and 7 is asked we know that we don't need to return anything.
8 is a signaled node, waitA/signal operation do garbage collection, how waitB(7) know the garbage history?
Well, we of course keep track of what the last garbage collected number is, don't we?
Since there is no timeline as a line, I think this is not right direction.
That is actually intended. There is no infinite timeline here, just a windows of the last not yet signaled fences.
No one said the it's a infinite timeline, timeline will stop increasing when syncobj is released.
Yeah, but the syncobj can live for a very very long time. Not having some form of shrinking it when fences are signaled is certainly not going to fly very far.
Regards, Christian.
Anyway kref is a good way to solve the 'free' problem, I will try to use it improve my patch, of course, will refer your idea.:)
Thanks, David Zhou
Otherwise you will never be able to release nodes from the tree since you always need to keep them around just in case somebody asks for a lower number.
Regards, Christian.
The key is that as soon as a signal point is added adding a previous point is no longer allowed.
That's intention.
Regards, David Zhou
[SNIP]
And here, who will destroy rb node since no one do enable_signaling, and there is no callback to free themselves.
The node will be destroyed when the last reference drops, not when enable_signaling is called.
In other words the sync_obj keeps the references to each tree object to provide the wait operation, as soon as the sync_obj is destroyed we don't need that functionality any more.
We don't even need to wait for anything to be signaled, this way we can drop all unused signal points as soon as the sync_obj is destroyed.
Only the used ones will stay alive and provide the necessary functionality to provide the signal for each wait operation.
Regards, Christian.
Regards, David Zhou
[SNIP]
I think this approach is still quite a bit better,
e.g. you don't run into circle dependency problems, it needs less memory and each node has always the same size which means we can use a kmem_cache for it.
Regards, Christian.
On 2018-09-04 17:20, Christian König wrote:
On 04.09.2018 11:00, zhoucm1 wrote:
On 2018-09-04 16:42, Christian König wrote:
On 04.09.2018 10:27, zhoucm1 wrote:
On 2018-09-04 16:05, Christian König wrote:
On 04.09.2018 09:53, zhoucm1 wrote:
[SNIP]
That can be defined as we like it, e.g. when a wait operation asks for 7 we can return 8 as well.
If defined this, then problem is coming again, if 8 is removed when garbage collection, you will return 15?
The garbage collection is only done for signaled nodes. So when 8 is already garbage collected and 7 is asked we know that we don't need to return anything.
8 is a signaled node, waitA/signal operation do garbage collection, how waitB(7) know the garbage history?
Well we of course keep what the last garbage collected number is, don't we?
Since there is no timeline as a line, I think this is not right direction.
That is actually intended. There is no infinite timeline here, just a windows of the last not yet signaled fences.
No one said the it's a infinite timeline, timeline will stop increasing when syncobj is released.
Yeah, but the syncobj can live for a very very long time. Not having some form of shrinking it when fences are signaled is certainly not going to fly very far.
I will try to fix this problem. BTW, when I tried your suggestion, I found it will be difficult to implement drm_syncobj_array_wait_timeout with your idea, since it needs first_signaled. If there is an un-signaled syncobj, we still have to register a callback on the timeline value change, which brings us back to needing enable_signaling.
Thanks, David Zhou
Regards, Christian.
Anyway kref is a good way to solve the 'free' problem, I will try to use it improve my patch, of course, will refer your idea.:)
Thanks, David Zhou
Otherwise you will never be able to release nodes from the tree since you always need to keep them around just in case somebody asks for a lower number.
Regards, Christian.
The key is that as soon as a signal point is added adding a previous point is no longer allowed.
That's intention.
Regards, David Zhou
[SNIP]
The node will be destroyed when the last reference drops, not when enable_signaling is called.
In other words the sync_obj keeps the references to each tree object to provide the wait operation, as soon as the sync_obj is destroyed we don't need that functionality any more.
We don't even need to wait for anything to be signaled, this way we can drop all unused signal points as soon as the sync_obj is destroyed.
Only the used ones will stay alive and provide the necessary functionality to provide the signal for each wait operation.
Regards, Christian.
[SNIP]
On 05.09.2018 05:36, zhoucm1 wrote:
On 2018-09-04 17:20, Christian König wrote:
On 04.09.2018 11:00, zhoucm1 wrote:
On 2018-09-04 16:42, Christian König wrote:
On 04.09.2018 10:27, zhoucm1 wrote:
On 2018-09-04 16:05, Christian König wrote:
On 04.09.2018 09:53, zhoucm1 wrote: [SNIP]
That can be defined as we like it, e.g. when a wait operation asks for 7 we can return 8 as well.
If defined this, then problem is coming again, if 8 is removed when garbage collection, you will return 15?
The garbage collection is only done for signaled nodes. So when 8 is already garbage collected and 7 is asked we know that we don't need to return anything.
8 is a signaled node, waitA/signal operation do garbage collection, how waitB(7) know the garbage history?
Well we of course keep what the last garbage collected number is, don't we?
Since there is no timeline as a line, I think this is not right direction.
That is actually intended. There is no infinite timeline here, just a windows of the last not yet signaled fences.
No one said the it's a infinite timeline, timeline will stop increasing when syncobj is released.
Yeah, but the syncobj can live for a very very long time. Not having some form of shrinking it when fences are signaled is certainly not going to fly very far.
I will try to fix this problem. btw, when I try your suggestion, I find it will be difficult to implement drm_syncobj_array_wait_timeout by your idea, since it needs first_signaled. if there is un-signaled syncobj, we will still register cb to timeline value change, then still back to need enble_signaling.
I don't fully understand the problem here. When we need to wait for a fence it is obvious that we need to enable signaling. So what exactly is the issue?
BTW: I'm going to commit patches #1-#4 to drm-misc-next now if nobody is going to object.
Christian.
Thanks, David Zhou
Regards, Christian.
Anyway kref is a good way to solve the 'free' problem, I will try to use it improve my patch, of course, will refer your idea.:)
Thanks, David Zhou
Otherwise you will never be able to release nodes from the tree since you always need to keep them around just in case somebody asks for a lower number.
Regards, Christian.
The key is that as soon as a signal point is added adding a previous point is no longer allowed.
That's intention.
Regards, David Zhou
[SNIP]
The node will be destroyed when the last reference drops, not when enable_signaling is called.
In other words the sync_obj keeps the references to each tree object to provide the wait operation, as soon as the sync_obj is destroyed we don't need that functionality any more.
We don't even need to wait for anything to be signaled, this way we can drop all unused signal points as soon as the sync_obj is destroyed.
Only the used ones will stay alive and provide the necessary functionality to provide the signal for each wait operation.
Regards, Christian.
[SNIP]