Quoting Sean Paul (2017-06-29 19:10:11)
On Thu, Jun 29, 2017 at 01:59:30PM +0100, Chris Wilson wrote:
Reduce the list iteration when incrementing the timeline by storing the fences in increasing order.
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Sumit Semwal sumit.semwal@linaro.org Cc: Sean Paul seanpaul@chromium.org Cc: Gustavo Padovan gustavo@padovan.org
drivers/dma-buf/sw_sync.c | 49 ++++++++++++++++++++++++++++++++++++++------ drivers/dma-buf/sync_debug.h | 5 +++++ 2 files changed, 48 insertions(+), 6 deletions(-)
diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c index e51fe11bbbea..8cef5d345316 100644 --- a/drivers/dma-buf/sw_sync.c +++ b/drivers/dma-buf/sw_sync.c @@ -96,6 +96,7 @@ static struct sync_timeline *sync_timeline_create(const char *name) obj->context = dma_fence_context_alloc(1); strlcpy(obj->name, name, sizeof(obj->name));
obj->pt_tree = RB_ROOT; INIT_LIST_HEAD(&obj->pt_list); spin_lock_init(&obj->lock);
@@ -142,9 +143,13 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
obj->value += inc;
list_for_each_entry_safe(pt, next, &obj->pt_list, link)
if (dma_fence_is_signaled_locked(&pt->base))
list_del_init(&pt->link);
list_for_each_entry_safe(pt, next, &obj->pt_list, link) {
if (!dma_fence_is_signaled_locked(&pt->base))
break;
list_del_init(&pt->link);
rb_erase(&pt->node, &obj->pt_tree);
} spin_unlock_irq(&obj->lock);
} @@ -174,8 +179,38 @@ static struct sync_pt *sync_pt_create(struct sync_timeline *obj, INIT_LIST_HEAD(&pt->link);
spin_lock_irq(&obj->lock);
if (!dma_fence_is_signaled_locked(&pt->base))
list_add_tail(&pt->link, &obj->pt_list);
if (!dma_fence_is_signaled_locked(&pt->base)) {
struct rb_node **p = &obj->pt_tree.rb_node;
struct rb_node *parent = NULL;
while (*p) {
struct sync_pt *other;
int cmp;
parent = *p;
other = rb_entry(parent, typeof(*pt), node);
cmp = value - other->base.seqno;
We're imposing an implicit limitation on userspace here that points cannot be
INT_MAX apart (since they'll be in the wrong order in the tree).
That's not a new limitation. It's an artifact of using the u32 timeline/seqno.
I wonder how much this patch will actually save, given that the number of active points on a given timeline should be small. Do we have any evidence that this optimization is warranted?
The good news is that for empty/small trees, the overhead is tiny, less than the cost of the syscall. I honestly don't know who uses sw_sync and so would benefit. I figured I would throw it out here since it was trivial. -Chris