Hi guys,
this is the next iteration of the fence array patch set.
Daniel suggested that I provide an example of how this functionality might be used by a driver, so I added a few additional patches to this series to show what I want to do with this in the amdgpu driver.
The main idea is that for each VMID we have a set of hardware fences which are currently using this VMID. Now when a new command submission needs a VMID, we construct a fence array which signals when any of the VMIDs becomes available and give that back to the scheduler.
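To make that concrete, here is a rough sketch of how such an any-signal fence array could be built (this is illustration only, not part of the series: collect_vmid_fences() is a made-up helper, and the signal_on_any parameter is the one added later in this series):

	/* Sketch: build a fence array over the fences still using the
	 * VMIDs, signaling as soon as ANY of them completes. */
	struct fence *wait_for_any_vmid(struct amdgpu_device *adev,
					u64 context, unsigned seqno)
	{
		struct fence **fences;
		struct fence_array *array;
		unsigned count;

		/* hypothetical helper: one fence per busy VMID */
		fences = collect_vmid_fences(adev, &count);
		if (!fences)
			return NULL;

		/* signal_on_any = true: any VMID becoming free wakes us */
		array = fence_array_create(count, fences, context, seqno,
					   true);
		if (!array) {
			while (count--)
				fence_put(fences[count]);
			kfree(fences);
			return NULL;
		}

		/* hand this fence to the scheduler as the job dependency */
		return &array->base;
	}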
Testing this also turned up a rather stupid typo in the code, and I've tried to incorporate the comments from Chris and Daniel as well.
I think it's ready to land now, but as usual feel free to take it apart.
Cheers, Christian.
From: Christian König christian.koenig@amd.com
Fence contexts are created on the fly, for example by the GPU scheduler used in the amdgpu driver as a result of a userspace request. Because of this, userspace could in theory force a wraparound of the 32-bit context number if it doesn't behave well.
Avoid this by increasing the context number to 64 bits. This way, even when userspace manages to allocate a billion contexts per second, it takes more than 500 years for the context number to wrap around.
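(Checking the arithmetic: 2^64 / 10^9 allocations per second is about 1.8 * 10^10 seconds, and a year is roughly 3.15 * 10^7 seconds, so the counter lasts about 584 years under that worst-case load.)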
v2: fix printf formats as well.
Signed-off-by: Christian König christian.koenig@amd.com
---
 drivers/dma-buf/fence.c                 |  8 ++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c  |  2 +-
 drivers/gpu/drm/etnaviv/etnaviv_gpu.h   |  2 +-
 drivers/gpu/drm/nouveau/nouveau_fence.h |  3 ++-
 drivers/gpu/drm/qxl/qxl_release.c       |  2 +-
 drivers/gpu/drm/radeon/radeon.h         |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c   |  2 +-
 drivers/staging/android/sync.h          |  3 ++-
 include/linux/fence.h                   | 13 +++++++------
 10 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/drivers/dma-buf/fence.c b/drivers/dma-buf/fence.c
index 7b05dbe..4d51f9e 100644
--- a/drivers/dma-buf/fence.c
+++ b/drivers/dma-buf/fence.c
@@ -35,7 +35,7 @@ EXPORT_TRACEPOINT_SYMBOL(fence_emit);
  * context or not. One device can have multiple separate contexts,
  * and they're used if some engine can run independently of another.
  */
-static atomic_t fence_context_counter = ATOMIC_INIT(0);
+static atomic64_t fence_context_counter = ATOMIC64_INIT(0);
 
 /**
  * fence_context_alloc - allocate an array of fence contexts
@@ -44,10 +44,10 @@ static atomic_t fence_context_counter = ATOMIC_INIT(0);
  * This function will return the first index of the number of fences allocated.
  * The fence context is used for setting fence->context to a unique number.
  */
-unsigned fence_context_alloc(unsigned num)
+u64 fence_context_alloc(unsigned num)
 {
 	BUG_ON(!num);
-	return atomic_add_return(num, &fence_context_counter) - num;
+	return atomic64_add_return(num, &fence_context_counter) - num;
 }
 EXPORT_SYMBOL(fence_context_alloc);
 
@@ -513,7 +513,7 @@ EXPORT_SYMBOL(fence_wait_any_timeout);
  */
 void
 fence_init(struct fence *fence, const struct fence_ops *ops,
-	   spinlock_t *lock, unsigned context, unsigned seqno)
+	   spinlock_t *lock, u64 context, unsigned seqno)
 {
 	BUG_ON(!lock);
 	BUG_ON(!ops || !ops->wait || !ops->enable_signaling ||
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 3d0cec2..7d25977 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -2045,7 +2045,7 @@ struct amdgpu_device {
 	struct amdgpu_irq_src		hpd_irq;
 
 	/* rings */
-	unsigned			fence_context;
+	u64				fence_context;
 	unsigned			num_rings;
 	struct amdgpu_ring		*rings[AMDGPU_MAX_RINGS];
 	bool				ib_pool_ready;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
index 48618ee..d8af37a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
@@ -428,7 +428,7 @@ void amdgpu_sa_bo_dump_debug_info(struct amdgpu_sa_manager *sa_manager,
 			   soffset, eoffset, eoffset - soffset);
 
 		if (i->fence)
-			seq_printf(m, " protected by 0x%08x on context %d",
+			seq_printf(m, " protected by 0x%08x on context %llu",
 				   i->fence->seqno, i->fence->context);
 
 		seq_printf(m, "\n");
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.h b/drivers/gpu/drm/etnaviv/etnaviv_gpu.h
index f233ac4..6a69214 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.h
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.h
@@ -123,7 +123,7 @@ struct etnaviv_gpu {
 	u32 completed_fence;
 	u32 retired_fence;
 	wait_queue_head_t fence_event;
-	unsigned int fence_context;
+	u64 fence_context;
 	spinlock_t fence_spinlock;
 
 	/* worker for handling active-list retiring: */
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h b/drivers/gpu/drm/nouveau/nouveau_fence.h
index 2e3a62d..64c4ce7 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.h
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
@@ -57,7 +57,8 @@ struct nouveau_fence_priv {
 	int  (*context_new)(struct nouveau_channel *);
 	void (*context_del)(struct nouveau_channel *);
 
-	u32 contexts, context_base;
+	u32 contexts;
+	u64 context_base;
 	bool uevent;
 };
 
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 4efa8e2..f599cd0 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -96,7 +96,7 @@ retry:
 		return 0;
 
 	if (have_drawable_releases && sc > 300) {
-		FENCE_WARN(fence, "failed to wait on release %d "
+		FENCE_WARN(fence, "failed to wait on release %llu "
 				  "after spincount %d\n",
 			   fence->context & ~0xf0000000, sc);
 		goto signaled;
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 80b24a4..5633ee3 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -2386,7 +2386,7 @@ struct radeon_device {
 	struct radeon_mman		mman;
 	struct radeon_fence_driver	fence_drv[RADEON_NUM_RINGS];
 	wait_queue_head_t		fence_queue;
-	unsigned			fence_context;
+	u64				fence_context;
 	struct mutex			ring_lock;
 	struct radeon_ring		ring[RADEON_NUM_RINGS];
 	bool				ib_pool_ready;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 8e689b4..96308e3 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -46,7 +46,7 @@ struct vmw_fence_manager {
 	bool goal_irq_on; /* Protected by @goal_irq_mutex */
 	bool seqno_valid; /* Protected by @lock, and may not be set to true
 			     without the @goal_irq_mutex held. */
-	unsigned ctx;
+	u64 ctx;
 };
 
 struct vmw_user_fence {
diff --git a/drivers/staging/android/sync.h b/drivers/staging/android/sync.h
index afa0752..9298677 100644
--- a/drivers/staging/android/sync.h
+++ b/drivers/staging/android/sync.h
@@ -96,7 +96,8 @@ struct sync_timeline {
 
 	/* protected by child_list_lock */
 	bool destroyed;
-	int context, value;
+	u64 context;
+	int value;
 
 	struct list_head child_list_head;
 	spinlock_t child_list_lock;
diff --git a/include/linux/fence.h b/include/linux/fence.h
index 5aa95eb..a2c4f53 100644
--- a/include/linux/fence.h
+++ b/include/linux/fence.h
@@ -75,7 +75,8 @@ struct fence {
 	struct rcu_head rcu;
 	struct list_head cb_list;
 	spinlock_t *lock;
-	unsigned context, seqno;
+	u64 context;
+	unsigned seqno;
 	unsigned long flags;
 	ktime_t timestamp;
 	int status;
@@ -176,7 +177,7 @@ struct fence_ops {
 };
 
 void fence_init(struct fence *fence, const struct fence_ops *ops,
-		spinlock_t *lock, unsigned context, unsigned seqno);
+		spinlock_t *lock, u64 context, unsigned seqno);
 
 void fence_release(struct kref *kref);
 void fence_free(struct fence *fence);
@@ -350,27 +351,27 @@ static inline signed long fence_wait(struct fence *fence, bool intr)
 	return ret < 0 ? ret : 0;
 }
 
-unsigned fence_context_alloc(unsigned num);
+u64 fence_context_alloc(unsigned num);
 
 #define FENCE_TRACE(f, fmt, args...) \
 	do {							\
 		struct fence *__ff = (f);			\
 		if (config_enabled(CONFIG_FENCE_TRACE))		\
-			pr_info("f %u#%u: " fmt,		\
+			pr_info("f %llu#%u: " fmt,		\
 				__ff->context, __ff->seqno, ##args); \
 	} while (0)
 
 #define FENCE_WARN(f, fmt, args...) \
 	do {							\
 		struct fence *__ff = (f);			\
-		pr_warn("f %u#%u: " fmt, __ff->context, __ff->seqno, \
+		pr_warn("f %llu#%u: " fmt, __ff->context, __ff->seqno, \
			##args);				\
 	} while (0)
 
 #define FENCE_ERR(f, fmt, args...) \
 	do {							\
 		struct fence *__ff = (f);			\
-		pr_err("f %u#%u: " fmt, __ff->context, __ff->seqno, \
+		pr_err("f %llu#%u: " fmt, __ff->context, __ff->seqno, \
			##args);				\
 	} while (0)
2016-06-01 Christian König deathsimple@vodafone.de:
[snip]
Reviewed-by: Gustavo Padovan gustavo.padovan@collabora.co.uk
Gustavo
From: Gustavo Padovan gustavo.padovan@collabora.co.uk
struct fence_collection inherits from struct fence and carries a collection of fences that need to be waited on together.
It is useful to translate a sync_file to a fence to remove the complexity of dealing with sync_files on DRM drivers. So even if there are many fences in the sync_file that need to be waited on for a commit to happen, they all get added to the fence_collection and passed for DRM use as a standard struct fence.
That means that no changes are needed in any driver besides supporting fences.
fence_collection's fence doesn't belong to any timeline context, so fence_is_later() and fence_later() are not meant to be called with fence_collection fences.
v2: Comments by Daniel Vetter:
- merge fence_collection_init() and fence_collection_add()
- only add callbacks at ->enable_signalling()
- remove fence_collection_put()
- check for type on to_fence_collection()
- adjust fence_is_later() and fence_later() to WARN_ON() if they
are used with collection fences.

v3: - Initialize fence_cb.node at fence init.

Comments by Chris Wilson:
- return "unbound" on fence_collection_get_timeline_name()
- don't stop adding callbacks if one fails
- remove redundant !! on fence_collection_enable_signaling()
- remove redundant () on fence_collection_signaled
- use fence_default_wait() instead

v4 (chk): Rework, simplification and cleanup:
- Drop FENCE_NO_CONTEXT handling, always allocate a context.
- Rename to fence_array.
- Return fixed driver name.
- Register only one callback at a time.
- Document that create function takes ownership of array.

v5 (chk): More work and fixes:
- Avoid deadlocks by adding all callbacks at once again.
- Stop trying to remove the callbacks.
- Provide context and sequence number for the array fence.

v6 (chk): Fixes found during testing
- Fix stupid typo in _enable_signaling().
Signed-off-by: Gustavo Padovan gustavo.padovan@collabora.co.uk
Signed-off-by: Christian König christian.koenig@amd.com
---
 drivers/dma-buf/Makefile      |   2 +-
 drivers/dma-buf/fence-array.c | 127 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/fence-array.h   |  72 ++++++++++++++++++++++++
 3 files changed, 200 insertions(+), 1 deletion(-)
 create mode 100644 drivers/dma-buf/fence-array.c
 create mode 100644 include/linux/fence-array.h

diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index 57a675f..85f6928 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1 +1 @@
-obj-y := dma-buf.o fence.o reservation.o seqno-fence.o
+obj-y := dma-buf.o fence.o reservation.o seqno-fence.o fence-array.o
diff --git a/drivers/dma-buf/fence-array.c b/drivers/dma-buf/fence-array.c
new file mode 100644
index 0000000..8141217
--- /dev/null
+++ b/drivers/dma-buf/fence-array.c
@@ -0,0 +1,127 @@
+/*
+ * fence-array: aggregate fences to be waited together
+ *
+ * Copyright (C) 2016 Collabora Ltd
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ * Authors:
+ *	Gustavo Padovan gustavo@padovan.org
+ *	Christian König christian.koenig@amd.com
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/export.h>
+#include <linux/slab.h>
+#include <linux/fence-array.h>
+
+static void fence_array_cb_func(struct fence *f, struct fence_cb *cb);
+
+static const char *fence_array_get_driver_name(struct fence *fence)
+{
+	return "fence_array";
+}
+
+static const char *fence_array_get_timeline_name(struct fence *fence)
+{
+	return "unbound";
+}
+
+static void fence_array_cb_func(struct fence *f, struct fence_cb *cb)
+{
+	struct fence_array_cb *array_cb =
+		container_of(cb, struct fence_array_cb, cb);
+	struct fence_array *array = array_cb->array;
+
+	if (atomic_dec_and_test(&array->num_pending))
+		fence_signal(&array->base);
+}
+
+static bool fence_array_enable_signaling(struct fence *fence)
+{
+	struct fence_array *array = to_fence_array(fence);
+	struct fence_array_cb *cb = (void *)(&array[1]);
+	unsigned i;
+
+	for (i = 0; i < array->num_fences; ++i) {
+		cb[i].array = array;
+		if (fence_add_callback(array->fences[i], &cb[i].cb,
+				       fence_array_cb_func))
+			if (atomic_dec_and_test(&array->num_pending))
+				return false;
+	}
+
+	return true;
+}
+
+static bool fence_array_signaled(struct fence *fence)
+{
+	struct fence_array *array = to_fence_array(fence);
+
+	return atomic_read(&array->num_pending) == 0;
+}
+
+static void fence_array_release(struct fence *fence)
+{
+	struct fence_array *array = to_fence_array(fence);
+	unsigned i;
+
+	for (i = 0; i < array->num_fences; ++i)
+		fence_put(array->fences[i]);
+
+	kfree(array->fences);
+	fence_free(fence);
+}
+
+const struct fence_ops fence_array_ops = {
+	.get_driver_name = fence_array_get_driver_name,
+	.get_timeline_name = fence_array_get_timeline_name,
+	.enable_signaling = fence_array_enable_signaling,
+	.signaled = fence_array_signaled,
+	.wait = fence_default_wait,
+	.release = fence_array_release,
+};
+
+/**
+ * fence_array_create - Create a custom fence array
+ * @num_fences:	[in]	number of fences to add in the array
+ * @fences:	[in]	array containing the fences
+ * @context:	[in]	fence context to use
+ * @seqno:	[in]	sequence number to use
+ *
+ * Allocate a fence_array object and initialize the base fence with fence_init().
+ * In case of error it returns NULL.
+ *
+ * The caller should allocate the fences array with num_fences size
+ * and fill it with the fences it wants to add to the object. Ownership of this
+ * array is taken and fence_put() is used on each fence on release.
+ */
+struct fence_array *fence_array_create(int num_fences, struct fence **fences,
+				       u64 context, unsigned seqno)
+{
+	struct fence_array *array;
+	size_t size = sizeof(*array);
+
+	/* Allocate the callback structures behind the array. */
+	size += num_fences * sizeof(struct fence_array_cb);
+	array = kzalloc(size, GFP_KERNEL);
+	if (!array)
+		return NULL;
+
+	spin_lock_init(&array->lock);
+	fence_init(&array->base, &fence_array_ops, &array->lock,
+		   context, seqno);
+
+	array->num_fences = num_fences;
+	atomic_set(&array->num_pending, num_fences);
+	array->fences = fences;
+
+	return array;
+}
+EXPORT_SYMBOL(fence_array_create);
diff --git a/include/linux/fence-array.h b/include/linux/fence-array.h
new file mode 100644
index 0000000..593ab98
--- /dev/null
+++ b/include/linux/fence-array.h
@@ -0,0 +1,72 @@
+/*
+ * fence-array: aggregates fences to be waited together
+ *
+ * Copyright (C) 2016 Collabora Ltd
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ * Authors:
+ *	Gustavo Padovan gustavo@padovan.org
+ *	Christian König christian.koenig@amd.com
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+
+#ifndef __LINUX_FENCE_ARRAY_H
+#define __LINUX_FENCE_ARRAY_H
+
+#include <linux/fence.h>
+
+/**
+ * struct fence_array_cb - callback helper for fence array
+ * @cb:		fence callback structure for signaling
+ * @array:	reference to the parent fence array object
+ */
+struct fence_array_cb {
+	struct fence_cb cb;
+	struct fence_array *array;
+};
+
+/**
+ * struct fence_array - fence to represent an array of fences
+ * @base:		fence base class
+ * @lock:		spinlock for fence handling
+ * @num_fences:		number of fences in the array
+ * @num_pending:	fences in the array still pending
+ * @fences:		array of the fences
+ */
+struct fence_array {
+	struct fence base;
+
+	spinlock_t lock;
+	unsigned num_fences;
+	atomic_t num_pending;
+	struct fence **fences;
+};
+
+extern const struct fence_ops fence_array_ops;
+
+/**
+ * to_fence_array - cast a fence to a fence_array
+ * @fence: fence to cast to a fence_array
+ *
+ * Returns NULL if the fence is not a fence_array,
+ * or the fence_array otherwise.
+ */
+static inline struct fence_array *to_fence_array(struct fence *fence)
+{
+	if (fence->ops != &fence_array_ops)
+		return NULL;
+
+	return container_of(fence, struct fence_array, base);
+}
+
+struct fence_array *fence_array_create(int num_fences, struct fence **fences,
+				       u64 context, unsigned seqno);
+
+#endif /* __LINUX_FENCE_ARRAY_H */
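For readers following along, a minimal usage sketch of the API added above ('a' and 'b' stand in for real fences; the function name is made up). Note that on success the create function takes ownership of the fences array and of the references stored in it:

	#include <linux/slab.h>
	#include <linux/fence-array.h>

	static int example_wait_for_both(struct fence *a, struct fence *b)
	{
		struct fence **fences;
		struct fence_array *array;

		fences = kmalloc_array(2, sizeof(*fences), GFP_KERNEL);
		if (!fences)
			return -ENOMEM;

		fences[0] = fence_get(a);
		fences[1] = fence_get(b);

		/* on success the array owns 'fences' and both references */
		array = fence_array_create(2, fences,
					   fence_context_alloc(1), 1);
		if (!array) {
			fence_put(a);
			fence_put(b);
			kfree(fences);
			return -ENOMEM;
		}

		fence_wait(&array->base, false);	/* blocks until both signaled */
		fence_put(&array->base);	/* releases fences[] via ->release */
		return 0;
	}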
Hi Christian,
2016-06-01 Christian König deathsimple@vodafone.de:
From: Gustavo Padovan gustavo.padovan@collabora.co.uk
struct fence_collection inherits from struct fence and carries a collection of fences that need to be waited on together.
It is useful to translate a sync_file to a fence to remove the complexity of dealing with sync_files on DRM drivers. So even if there are many fences in the sync_file that need to be waited on for a commit to happen, they all get added to the fence_collection and passed for DRM use as a standard struct fence.
That means that no changes are needed in any driver besides supporting fences.
fence_collection's fence doesn't belong to any timeline context, so fence_is_later() and fence_later() are not meant to be called with fence_collection fences.
The commit message needs to be fixed to mention fence_array instead of fence_collection, and we do create fence contexts for fence_arrays now.
Gustavo
2016-06-01 Christian König deathsimple@vodafone.de:
[snip]
This is working fine for me. Once the commit message is fixed:
Reviewed-by: Gustavo Padovan gustavo.padovan@collabora.co.uk
You need to Cc Sumit here to decide who is picking these patches. It would be better to get them through the drm trees so we would need his Ack at least.
Gustavo
Hi Christian, Gustavo,
Thanks for these patches.
On 1 June 2016 at 20:55, Gustavo Padovan gustavo@padovan.org wrote:
2016-06-01 Christian König deathsimple@vodafone.de:
[snip]
This is working fine for me. Once the commit message is fixed:
Reviewed-by: Gustavo Padovan gustavo.padovan@collabora.co.uk
You need to Cc Sumit here to decide who is picking these patches. It would be better to get them through the drm trees so we would need his Ack at least.
+1 on the commit message consistency change; post that, please feel free to add my Acked-by: Sumit Semwal sumit.semwal@linaro.org
for the 3 dma-buf/fences patches and request to take them via DRM tree as rightly suggested by Gustavo.
BR, Sumit.
On Wed, Jun 01, 2016 at 09:54:04PM +0530, Sumit Semwal wrote:
Hi Christian, Gustavo,
Thanks for these patches.
On 1 June 2016 at 20:55, Gustavo Padovan gustavo@padovan.org wrote:
2016-06-01 Christian König deathsimple@vodafone.de:
[snip]
This is working fine for me. Once the commit message is fixed:
Reviewed-by: Gustavo Padovan gustavo.padovan@collabora.co.uk
You need to Cc Sumit here to decide who is picking these patches. It would be better to get them through the drm trees so we would need his Ack at least.
+1 on the commit message consistency change; post that, please feel free to add my Acked-by: Sumit Semwal sumit.semwal@linaro.org
for the 3 dma-buf/fences patches and request to take them via DRM tree as rightly suggested by Gustavo.
Commit message improved and all 3 queued up for drm-misc. I haven't pushed out the new branch yet though, since I'm waiting for Dave to open drm-next and pull in a few other bits first so that I can rebase.
Thanks, Daniel
Am 02.06.2016 um 00:44 schrieb Daniel Vetter:
On Wed, Jun 01, 2016 at 09:54:04PM +0530, Sumit Semwal wrote:
Hi Christian, Gustavo,
Thanks for these patches.
On 1 June 2016 at 20:55, Gustavo Padovan gustavo@padovan.org wrote:
2016-06-01 Christian König deathsimple@vodafone.de:
[snip]
This is working fine for me. Once the commit message is fixed:
Reviewed-by: Gustavo Padovan gustavo.padovan@collabora.co.uk
You need to Cc Sumit here to decide who is picking these patches. It would be better to get them through the drm trees so we would need his Ack at least.
+1 on the commit message consistency change; post that, please feel free to add my Acked-by: Sumit Semwal sumit.semwal@linaro.org
for the 3 dma-buf/fences patches and request to take them via DRM tree as rightly suggested by Gustavo.
Commit message improved and all 3 queued up for drm-misc. I haven't pushed out the new branch yet though, since I'm waiting for Dave to open drm-next and pull in a few other bits first so that I can rebase.
Thanks, and sorry for missing the commit message. I was a bit too concentrated on fixing the code.
Leave me a note when the branch is publicly available, because I then want to rebase the amdgpu changes on top and send a pull request to Alex.
Christian.
From: Christian König christian.koenig@amd.com
If @signal_on_any is true the fence array signals if any fence in the array signals, otherwise it signals when all fences in the array signal.
v2: fix signaled test and add comment suggested by Chris Wilson.
Signed-off-by: Christian König christian.koenig@amd.com
---
 drivers/dma-buf/fence-array.c | 33 +++++++++++++++++++++++++--------
 include/linux/fence-array.h   |  3 ++-
 2 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/dma-buf/fence-array.c b/drivers/dma-buf/fence-array.c
index 8141217..a8731c8 100644
--- a/drivers/dma-buf/fence-array.c
+++ b/drivers/dma-buf/fence-array.c
@@ -41,6 +41,7 @@ static void fence_array_cb_func(struct fence *f, struct fence_cb *cb)
 
 	if (atomic_dec_and_test(&array->num_pending))
 		fence_signal(&array->base);
+	fence_put(&array->base);
 }
 
 static bool fence_array_enable_signaling(struct fence *fence)
@@ -51,10 +52,21 @@ static bool fence_array_enable_signaling(struct fence *fence)
 
 	for (i = 0; i < array->num_fences; ++i) {
 		cb[i].array = array;
+		/*
+		 * As we may report that the fence is signaled before all
+		 * callbacks are complete, we need to take an additional
+		 * reference count on the array so that we do not free it too
+		 * early. The core fence handling will only hold the reference
+		 * until we signal the array as complete (but that is now
+		 * insufficient).
+		 */
+		fence_get(&array->base);
 		if (fence_add_callback(array->fences[i], &cb[i].cb,
-				       fence_array_cb_func))
+				       fence_array_cb_func)) {
+			fence_put(&array->base);
 			if (atomic_dec_and_test(&array->num_pending))
 				return false;
+		}
 	}
 
 	return true;
@@ -64,7 +76,7 @@ static bool fence_array_signaled(struct fence *fence)
 {
 	struct fence_array *array = to_fence_array(fence);
 
-	return atomic_read(&array->num_pending) == 0;
+	return atomic_read(&array->num_pending) <= 0;
 }
 
 static void fence_array_release(struct fence *fence)
@@ -90,10 +102,11 @@ const struct fence_ops fence_array_ops = {
 
 /**
  * fence_array_create - Create a custom fence array
- * @num_fences:	[in]	number of fences to add in the array
- * @fences:	[in]	array containing the fences
- * @context:	[in]	fence context to use
- * @seqno:	[in]	sequence number to use
+ * @num_fences:		[in]	number of fences to add in the array
+ * @fences:		[in]	array containing the fences
+ * @context:		[in]	fence context to use
+ * @seqno:		[in]	sequence number to use
+ * @signal_on_any	[in]	signal on any fence in the array
  *
  * Allocate a fence_array object and initialize the base fence with fence_init().
  * In case of error it returns NULL.
@@ -101,9 +114,13 @@ const struct fence_ops fence_array_ops = {
  * The caller should allocate the fences array with num_fences size
  * and fill it with the fences it wants to add to the object. Ownership of this
  * array is taken and fence_put() is used on each fence on release.
+ *
+ * If @signal_on_any is true the fence array signals if any fence in the array
+ * signals, otherwise it signals when all fences in the array signal.
  */
 struct fence_array *fence_array_create(int num_fences, struct fence **fences,
-				       u64 context, unsigned seqno)
+				       u64 context, unsigned seqno,
+				       bool signal_on_any)
 {
 	struct fence_array *array;
 	size_t size = sizeof(*array);
@@ -119,7 +136,7 @@ struct fence_array *fence_array_create(int num_fences, struct fence **fences,
 		   context, seqno);
 
 	array->num_fences = num_fences;
-	atomic_set(&array->num_pending, num_fences);
+	atomic_set(&array->num_pending, signal_on_any ? 1 : num_fences);
 	array->fences = fences;
 
 	return array;
diff --git a/include/linux/fence-array.h b/include/linux/fence-array.h
index 593ab98..86baaa4 100644
--- a/include/linux/fence-array.h
+++ b/include/linux/fence-array.h
@@ -67,6 +67,7 @@ static inline struct fence_array *to_fence_array(struct fence *fence)
 }
 
 struct fence_array *fence_array_create(int num_fences, struct fence **fences,
-				       u64 context, unsigned seqno);
+				       u64 context, unsigned seqno,
+				       bool signal_on_any);
 
 #endif /* __LINUX_FENCE_ARRAY_H */
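As a usage note (my sketch, not part of the patch; 'n', 'fences' and 'context' are placeholders): the only behavioral knob is the initial num_pending value, so the two modes look like this at the call site:

	/* signals when ALL fences signal (num_pending == num_fences) */
	array = fence_array_create(n, fences, context, 1, false);

	/* signals when ANY fence signals (num_pending == 1) */
	array = fence_array_create(n, fences, context, 1, true);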
2016-06-01 Christian König deathsimple@vodafone.de:
[snip]
Reviewed-by: Gustavo Padovan gustavo.padovan@collabora.co.uk
Gustavo
From: Christian König christian.koenig@amd.com
It's not obvious what it should do.
Signed-off-by: Christian König christian.koenig@amd.com
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index 34a9280..e0ff1a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -297,6 +297,13 @@ int amdgpu_sync_cycle_fences(struct amdgpu_sync *dst, struct amdgpu_sync *src,
 	return 0;
 }
 
+/**
+ * amdgpu_sync_get_fence - get the next fence from the sync object
+ *
+ * @sync: sync object to use
+ *
+ * Get and removes the next fence from the sync object not signaled yet.
+ */
 struct fence *amdgpu_sync_get_fence(struct amdgpu_sync *sync)
 {
 	struct amdgpu_sync_entry *e;
From: Christian König christian.koenig@amd.com
Make it two events, one for when the job is scheduled and one for when it is finished.
Signed-off-by: Christian König christian.koenig@amd.com
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c         |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h       |  4 +-
 drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h |  4 +-
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.c   | 34 +++++++------
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.h   | 19 ++++----
 drivers/gpu/drm/amd/scheduler/sched_fence.c     | 63 ++++++++++++++++++-------
 6 files changed, 79 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index d4791a7..ddfed93 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -85,7 +85,7 @@ static void amdgpu_job_free_resources(struct amdgpu_job *job)
 	unsigned i;
 
 	/* use sched fence if available */
-	f = job->base.s_fence ? &job->base.s_fence->base : job->fence;
+	f = job->base.s_fence ? &job->base.s_fence->finished : job->fence;
 
 	for (i = 0; i < job->num_ibs; ++i)
 		amdgpu_ib_free(job->adev, &job->ibs[i], f);
@@ -143,7 +143,7 @@ static struct fence *amdgpu_job_dependency(struct amd_sched_job *sched_job)
 	int r;
 
 	r = amdgpu_vm_grab_id(vm, ring, &job->sync,
-			      &job->base.s_fence->base,
+			      &job->base.s_fence->finished,
 			      &job->vm_id, &job->vm_pd_addr);
 	if (r)
 		DRM_ERROR("Error getting VM ID (%d)\n", r);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index 38e5689..ecd08f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -102,7 +102,7 @@ TRACE_EVENT(amdgpu_cs_ioctl,
 		   __entry->adev = job->adev;
 		   __entry->sched_job = &job->base;
 		   __entry->ib = job->ibs;
-		   __entry->fence = &job->base.s_fence->base;
+		   __entry->fence = &job->base.s_fence->finished;
 		   __entry->ring_name = job->ring->name;
 		   __entry->num_ibs = job->num_ibs;
 		   ),
@@ -127,7 +127,7 @@ TRACE_EVENT(amdgpu_sched_run_job,
 		   __entry->adev = job->adev;
 		   __entry->sched_job = &job->base;
 		   __entry->ib = job->ibs;
-		   __entry->fence = &job->base.s_fence->base;
+		   __entry->fence = &job->base.s_fence->finished;
 		   __entry->ring_name = job->ring->name;
 		   __entry->num_ibs = job->num_ibs;
 		   ),
diff --git a/drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h b/drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h
index c89dc77..b961a1c 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h
+++ b/drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h
@@ -26,7 +26,7 @@ TRACE_EVENT(amd_sched_job,
 	    TP_fast_assign(
 			   __entry->entity = sched_job->s_entity;
 			   __entry->sched_job = sched_job;
-			   __entry->fence = &sched_job->s_fence->base;
+			   __entry->fence = &sched_job->s_fence->finished;
 			   __entry->name = sched_job->sched->name;
 			   __entry->job_count = kfifo_len(
 				   &sched_job->s_entity->job_queue) / sizeof(sched_job);
@@ -46,7 +46,7 @@ TRACE_EVENT(amd_sched_process_job,
 	    ),
 
 	    TP_fast_assign(
-		    __entry->fence = &fence->base;
+		    __entry->fence = &fence->finished;
 		    ),
 	    TP_printk("fence=%p signaled", __entry->fence)
 );
diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
index 2425172..74aa0b3 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
@@ -140,7 +140,7 @@ int amd_sched_entity_init(struct amd_gpu_scheduler *sched,
 		return r;
 
 	atomic_set(&entity->fence_seq, 0);
-	entity->fence_context = fence_context_alloc(1);
+	entity->fence_context = fence_context_alloc(2);
 
 	return 0;
 }
@@ -251,17 +251,21 @@ static bool amd_sched_entity_add_dependency_cb(struct amd_sched_entity *entity)
 
 	s_fence = to_amd_sched_fence(fence);
 	if (s_fence && s_fence->sched == sched) {
-		/* Fence is from the same scheduler */
-		if (test_bit(AMD_SCHED_FENCE_SCHEDULED_BIT, &fence->flags)) {
-			/* Ignore it when it is already scheduled */
-			fence_put(entity->dependency);
-			return false;
-		}
 
-		/* Wait for fence to be scheduled */
-		entity->cb.func = amd_sched_entity_clear_dep;
-		list_add_tail(&entity->cb.node, &s_fence->scheduled_cb);
-		return true;
+		/*
+		 * Fence is from the same scheduler, only need to wait for
+		 * it to be scheduled
+		 */
+		fence = fence_get(&s_fence->scheduled);
+		fence_put(entity->dependency);
+		entity->dependency = fence;
+		if (!fence_add_callback(fence, &entity->cb,
+					amd_sched_entity_clear_dep))
+			return true;
+
+		/* Ignore it when it is already scheduled */
+		fence_put(fence);
+		return false;
 	}
 
 	if (!fence_add_callback(entity->dependency, &entity->cb,
@@ -389,7 +393,7 @@ void amd_sched_entity_push_job(struct amd_sched_job *sched_job)
 	struct amd_sched_entity *entity = sched_job->s_entity;
 
 	trace_amd_sched_job(sched_job);
-	fence_add_callback(&sched_job->s_fence->base, &sched_job->finish_cb,
+	fence_add_callback(&sched_job->s_fence->finished, &sched_job->finish_cb,
 			   amd_sched_job_finish_cb);
 	wait_event(entity->sched->job_scheduled,
 		   amd_sched_entity_in(sched_job));
@@ -412,7 +416,7 @@ int amd_sched_job_init(struct amd_sched_job *job,
 	INIT_DELAYED_WORK(&job->work_tdr, amd_sched_job_timedout);
 
 	if (fence)
-		*fence = &job->s_fence->base;
+		*fence = &job->s_fence->finished;
 	return 0;
 }
 
@@ -463,10 +467,10 @@ static void amd_sched_process_job(struct fence *f, struct fence_cb *cb)
 	struct amd_gpu_scheduler *sched = s_fence->sched;
 
 	atomic_dec(&sched->hw_rq_count);
-	amd_sched_fence_signal(s_fence);
+	amd_sched_fence_finished(s_fence);
 
 	trace_amd_sched_process_job(s_fence);
-	fence_put(&s_fence->base);
+	fence_put(&s_fence->finished);
 	wake_up_interruptible(&sched->wake_up_worker);
 }
 
diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
index e63034e..3e989b1 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
+++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
@@ -27,8 +27,6 @@
 #include <linux/kfifo.h>
 #include <linux/fence.h>
 
-#define AMD_SCHED_FENCE_SCHEDULED_BIT	FENCE_FLAG_USER_BITS
-
 struct amd_gpu_scheduler;
 struct amd_sched_rq;
 
@@ -68,9 +66,9 @@ struct amd_sched_rq {
 };
 
 struct amd_sched_fence {
-	struct fence			base;
+	struct fence			scheduled;
+	struct fence			finished;
 	struct fence_cb			cb;
-	struct list_head		scheduled_cb;
 	struct amd_gpu_scheduler	*sched;
 	spinlock_t			lock;
 	void				*owner;
@@ -86,14 +84,15 @@ struct amd_sched_job {
 	struct delayed_work		work_tdr;
 };
 
-extern const struct fence_ops amd_sched_fence_ops;
+extern const struct fence_ops amd_sched_fence_ops_scheduled;
+extern const struct fence_ops amd_sched_fence_ops_finished;
 static inline struct amd_sched_fence *to_amd_sched_fence(struct fence *f)
 {
-	struct amd_sched_fence *__f = container_of(f, struct amd_sched_fence,
-						   base);
+	if (f->ops == &amd_sched_fence_ops_scheduled)
+		return container_of(f, struct amd_sched_fence, scheduled);
 
-	if (__f->base.ops == &amd_sched_fence_ops)
-		return __f;
+	if (f->ops == &amd_sched_fence_ops_finished)
+		return container_of(f, struct amd_sched_fence, finished);
 
 	return NULL;
 }
@@ -148,7 +147,7 @@ void amd_sched_entity_push_job(struct amd_sched_job *sched_job);
 struct amd_sched_fence *amd_sched_fence_create(
 	struct amd_sched_entity *s_entity, void *owner);
 void amd_sched_fence_scheduled(struct amd_sched_fence *fence);
-void amd_sched_fence_signal(struct amd_sched_fence *fence);
+void amd_sched_fence_finished(struct amd_sched_fence *fence);
 int amd_sched_job_init(struct amd_sched_job *job,
 		       struct amd_gpu_scheduler *sched,
 		       struct amd_sched_entity *entity,
diff --git a/drivers/gpu/drm/amd/scheduler/sched_fence.c b/drivers/gpu/drm/amd/scheduler/sched_fence.c
index 71931bc..a5e3fef 100644
--- a/drivers/gpu/drm/amd/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/amd/scheduler/sched_fence.c
@@ -37,36 +37,37 @@ struct amd_sched_fence *amd_sched_fence_create(struct amd_sched_entity *entity,
 	if (fence == NULL)
 		return NULL;
 
-	INIT_LIST_HEAD(&fence->scheduled_cb);
 	fence->owner = owner;
 	fence->sched = entity->sched;
 	spin_lock_init(&fence->lock);
 
 	seq = atomic_inc_return(&entity->fence_seq);
-	fence_init(&fence->base, &amd_sched_fence_ops, &fence->lock,
-		   entity->fence_context, seq);
+	fence_init(&fence->scheduled, &amd_sched_fence_ops_scheduled,
+		   &fence->lock, entity->fence_context, seq);
+	fence_init(&fence->finished, &amd_sched_fence_ops_finished,
+		   &fence->lock, entity->fence_context + 1, seq);
 
 	return fence;
 }
 
-void amd_sched_fence_signal(struct amd_sched_fence *fence)
+void amd_sched_fence_scheduled(struct amd_sched_fence *fence)
 {
-	int ret = fence_signal(&fence->base);
+	int ret = fence_signal(&fence->scheduled);
+
 	if (!ret)
-		FENCE_TRACE(&fence->base, "signaled from irq context\n");
+		FENCE_TRACE(&fence->scheduled, "signaled from irq context\n");
 	else
-		FENCE_TRACE(&fence->base, "was already signaled\n");
+		FENCE_TRACE(&fence->scheduled, "was already signaled\n");
 }
 
-void amd_sched_fence_scheduled(struct amd_sched_fence *s_fence)
+void amd_sched_fence_finished(struct amd_sched_fence *fence)
 {
-	struct fence_cb *cur, *tmp;
+	int ret = fence_signal(&fence->finished);
 
-	set_bit(AMD_SCHED_FENCE_SCHEDULED_BIT, &s_fence->base.flags);
-	list_for_each_entry_safe(cur, tmp, &s_fence->scheduled_cb, node) {
-		list_del_init(&cur->node);
-		cur->func(&s_fence->base, cur);
-	}
+	if (!ret)
+		FENCE_TRACE(&fence->finished, "signaled from irq context\n");
+	else
+		FENCE_TRACE(&fence->finished, "was already signaled\n");
 }
 
 static const char *amd_sched_fence_get_driver_name(struct fence *fence)
@@ -96,6 +97,7 @@ static void amd_sched_fence_free(struct rcu_head *rcu)
 {
 	struct fence *f = container_of(rcu, struct fence, rcu);
 	struct amd_sched_fence *fence = to_amd_sched_fence(f);
+
 	kmem_cache_free(sched_fence_slab, fence);
 }
 
@@ -107,16 +109,41 @@ static void amd_sched_fence_free(struct rcu_head *rcu)
  * This function is called when the reference count becomes zero.
  * It just RCU schedules freeing up the fence.
  */
-static void amd_sched_fence_release(struct fence *f)
+static void amd_sched_fence_release_scheduled(struct fence *f)
+{
+	struct amd_sched_fence *fence = to_amd_sched_fence(f);
+
+	call_rcu(&fence->finished.rcu, amd_sched_fence_free);
+}
+
+/**
+ * amd_sched_fence_release_finished - drop extra reference
+ *
+ * @f: fence
+ *
+ * Drop the extra reference from the scheduled fence to the base fence.
+ */
+static void amd_sched_fence_release_finished(struct fence *f)
 {
-	call_rcu(&f->rcu, amd_sched_fence_free);
+	struct amd_sched_fence *fence = to_amd_sched_fence(f);
+
+	fence_put(&fence->scheduled);
 }
 
-const struct fence_ops amd_sched_fence_ops = {
+const struct fence_ops amd_sched_fence_ops_scheduled = {
+	.get_driver_name = amd_sched_fence_get_driver_name,
+	.get_timeline_name = amd_sched_fence_get_timeline_name,
+	.enable_signaling = amd_sched_fence_enable_signaling,
+	.signaled = NULL,
+	.wait = fence_default_wait,
+	.release = amd_sched_fence_release_scheduled,
+};
+
+const struct fence_ops amd_sched_fence_ops_finished = {
 	.get_driver_name = amd_sched_fence_get_driver_name,
 	.get_timeline_name = amd_sched_fence_get_timeline_name,
 	.enable_signaling = amd_sched_fence_enable_signaling,
 	.signaled = NULL,
 	.wait = fence_default_wait,
-	.release = amd_sched_fence_release,
+	.release = amd_sched_fence_release_finished,
 };
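As an illustration of what the split buys (my sketch with assumed locals f, ring and dependency; the in-tree consumer is amdgpu_sync_is_idle() later in this series): a dependency that comes from the same ring only has to reach the head of the queue, not finish on the hardware, since the ring keeps its own ordering:

	/* sketch: dependency handling with the split fences */
	struct amd_sched_fence *s_fence = to_amd_sched_fence(f);

	if (s_fence && s_fence->sched == &ring->sched) {
		/* same ring: the job just has to be scheduled, the
		 * hardware will keep the ring's ordering for us */
		dependency = fence_get(&s_fence->scheduled);
	} else {
		/* different ring/context: wait for actual completion */
		dependency = fence_get(f);
	}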
From: Christian König christian.koenig@amd.com
Stop hiding bugs; instead, print a proper error when the scheduler doesn't handle all dependencies.
Signed-off-by: Christian König christian.koenig@amd.com
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h      |  1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  6 +-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 19 -------------------
 3 files changed, 1 insertion(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 7d25977..2f4ab1b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -601,7 +601,6 @@ bool amdgpu_sync_is_idle(struct amdgpu_sync *sync);
 int amdgpu_sync_cycle_fences(struct amdgpu_sync *dst, struct amdgpu_sync *src,
 			     struct fence *fence);
 struct fence *amdgpu_sync_get_fence(struct amdgpu_sync *sync);
-int amdgpu_sync_wait(struct amdgpu_sync *sync);
 void amdgpu_sync_free(struct amdgpu_sync *sync);
 int amdgpu_sync_init(void);
 void amdgpu_sync_fini(void);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index ddfed93..009e905 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -166,11 +166,7 @@ static struct fence *amdgpu_job_run(struct amd_sched_job *sched_job)
 	}
 	job = to_amdgpu_job(sched_job);
 
-	r = amdgpu_sync_wait(&job->sync);
-	if (r) {
-		DRM_ERROR("failed to sync wait (%d)\n", r);
-		return NULL;
-	}
+	BUG_ON(!amdgpu_sync_is_idle(&job->sync));
 
 	trace_amdgpu_sched_run_job(job);
 	r = amdgpu_ib_schedule(job->ring, job->num_ibs, job->ibs,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index e0ff1a1..c0ed5b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -326,25 +326,6 @@ struct fence *amdgpu_sync_get_fence(struct amdgpu_sync *sync)
 	return NULL;
 }
 
-int amdgpu_sync_wait(struct amdgpu_sync *sync)
-{
-	struct amdgpu_sync_entry *e;
-	struct hlist_node *tmp;
-	int i, r;
-
-	hash_for_each_safe(sync->fences, i, tmp, e, node) {
-		r = fence_wait(e->fence, false);
-		if (r)
-			return r;
-
-		hash_del(&e->node);
-		fence_put(e->fence);
-		kmem_cache_free(amdgpu_sync_slab, e);
-	}
-
-	return 0;
-}
-
 /**
  * amdgpu_sync_free - free the sync object
  *
From: Christian König christian.koenig@amd.com
Check if the sync object is idle depending on the ring a submission works with.
Signed-off-by: Christian König christian.koenig@amd.com
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h      |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 17 +++++++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |  4 ++--
 4 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 2f4ab1b..f154d9f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -597,7 +597,8 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
 		     struct amdgpu_sync *sync,
 		     struct reservation_object *resv,
 		     void *owner);
-bool amdgpu_sync_is_idle(struct amdgpu_sync *sync);
+bool amdgpu_sync_is_idle(struct amdgpu_sync *sync,
+			 struct amdgpu_ring *ring);
 int amdgpu_sync_cycle_fences(struct amdgpu_sync *dst, struct amdgpu_sync *src,
 			     struct fence *fence);
 struct fence *amdgpu_sync_get_fence(struct amdgpu_sync *sync);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 009e905..e395bbe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -166,7 +166,7 @@ static struct fence *amdgpu_job_run(struct amd_sched_job *sched_job)
 	}
 	job = to_amdgpu_job(sched_job);
 
-	BUG_ON(!amdgpu_sync_is_idle(&job->sync));
+	BUG_ON(!amdgpu_sync_is_idle(&job->sync, NULL));
 
 	trace_amdgpu_sched_run_job(job);
 	r = amdgpu_ib_schedule(job->ring, job->num_ibs, job->ibs,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index c0ed5b9..a2766d7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -226,10 +226,13 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
  * amdgpu_sync_is_idle - test if all fences are signaled
  *
  * @sync: the sync object
+ * @ring: optional ring to use for test
  *
- * Returns true if all fences in the sync object are signaled.
+ * Returns true if all fences in the sync object are signaled or scheduled to
+ * the ring (if provided).
  */
-bool amdgpu_sync_is_idle(struct amdgpu_sync *sync)
+bool amdgpu_sync_is_idle(struct amdgpu_sync *sync,
+			 struct amdgpu_ring *ring)
 {
 	struct amdgpu_sync_entry *e;
 	struct hlist_node *tmp;
@@ -237,6 +240,16 @@ bool amdgpu_sync_is_idle(struct amdgpu_sync *sync)
 
 	hash_for_each_safe(sync->fences, i, tmp, e, node) {
 		struct fence *f = e->fence;
+		struct amd_sched_fence *s_fence = to_amd_sched_fence(f);
+
+		if (ring && s_fence) {
+			/* For fences from the same ring it is sufficient
+			 * when they are scheduled.
+			 */
+			if (s_fence->sched == &ring->sched &&
+			    fence_is_signaled(&s_fence->scheduled))
+				continue;
+		}
 
 		if (fence_is_signaled(f)) {
 			hash_del(&e->node);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 62a4c12..2bcb623 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -240,13 +240,13 @@ int amdgpu_vm_grab_id(struct amdgpu_vm *vm, struct amdgpu_ring *ring,
 				      struct amdgpu_vm_id,
 				      list);
 
-	if (!amdgpu_sync_is_idle(&id->active)) {
+	if (!amdgpu_sync_is_idle(&id->active, NULL)) {
 		struct list_head *head = &adev->vm_manager.ids_lru;
 		struct amdgpu_vm_id *tmp;
 
 		list_for_each_entry_safe(id, tmp, &adev->vm_manager.ids_lru,
 					 list) {
-			if (amdgpu_sync_is_idle(&id->active)) {
+			if (amdgpu_sync_is_idle(&id->active, NULL)) {
 				list_move(&id->list, head);
 				head = &id->list;
 			}
From: Christian König christian.koenig@amd.com
Prefer to use VMIDs which are idle on the ring we want to submit to. This also removes the bubbling of idle VMIDs up the LRU, which is actually not beneficial.
Signed-off-by: Christian König christian.koenig@amd.com
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 2bcb623..b6484a2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -236,21 +236,15 @@ int amdgpu_vm_grab_id(struct amdgpu_vm *vm, struct amdgpu_ring *ring,
 
 	} while (i != ring->idx);
 
-	id = list_first_entry(&adev->vm_manager.ids_lru,
-			      struct amdgpu_vm_id,
-			      list);
-
-	if (!amdgpu_sync_is_idle(&id->active, NULL)) {
-		struct list_head *head = &adev->vm_manager.ids_lru;
-		struct amdgpu_vm_id *tmp;
-
-		list_for_each_entry_safe(id, tmp, &adev->vm_manager.ids_lru,
-					 list) {
-			if (amdgpu_sync_is_idle(&id->active, NULL)) {
-				list_move(&id->list, head);
-				head = &id->list;
-			}
-		}
+	/* Check if we have an idle VMID */
+	list_for_each_entry(id, &adev->vm_manager.ids_lru, list) {
+		if (amdgpu_sync_is_idle(&id->active, ring))
+			break;
+
+	}
+
+	/* If we can't find a idle VMID to use, just wait for the oldest */
+	if (&id->list == &adev->vm_manager.ids_lru) {
 		id = list_first_entry(&adev->vm_manager.ids_lru,
 				      struct amdgpu_vm_id,
 				      list);
From: Christian König christian.koenig@amd.com
This fixes a fairness problem with the GPU scheduler: a VM with a lot of jobs could previously starve VMs with fewer jobs.
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 113 +++++++++++++++++---------------- 1 file changed, 59 insertions(+), 54 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index b6484a2..f206820 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -179,75 +179,80 @@ int amdgpu_vm_grab_id(struct amdgpu_vm *vm, struct amdgpu_ring *ring, uint64_t pd_addr = amdgpu_bo_gpu_offset(vm->page_directory); struct amdgpu_device *adev = ring->adev; struct fence *updates = sync->last_vm_update; - struct amdgpu_vm_id *id; + struct amdgpu_vm_id *id, *idle; unsigned i = ring->idx; int r;
mutex_lock(&adev->vm_manager.lock);
- /* Check if we can use a VMID already assigned to this VM */ - do { - struct fence *flushed; - - id = vm->ids[i++]; - if (i == AMDGPU_MAX_RINGS) - i = 0; - - /* Check all the prerequisites to using this VMID */ - if (!id) - continue; - - if (atomic64_read(&id->owner) != vm->client_id) - continue; - - if (pd_addr != id->pd_gpu_addr) - continue; + /* Check if we have an idle VMID */ + list_for_each_entry(idle, &adev->vm_manager.ids_lru, list) { + if (amdgpu_sync_is_idle(&idle->active, ring)) + break;
- if (id->last_user != ring && - (!id->last_flush || !fence_is_signaled(id->last_flush))) - continue; + }
- flushed = id->flushed_updates; - if (updates && (!flushed || fence_is_later(updates, flushed))) - continue; + /* If we can't find an idle VMID to use, just wait for the oldest */ + if (&idle->list == &adev->vm_manager.ids_lru) { + id = list_first_entry(&adev->vm_manager.ids_lru, + struct amdgpu_vm_id, + list); + } else { + /* Check if we can use a VMID already assigned to this VM */ + do { + struct fence *flushed; + + id = vm->ids[i++]; + if (i == AMDGPU_MAX_RINGS) + i = 0; + + /* Check all the prerequisites to using this VMID */ + if (!id) + continue; + + if (atomic64_read(&id->owner) != vm->client_id) + continue; + + if (pd_addr != id->pd_gpu_addr) + continue; + + if (id->last_user != ring && (!id->last_flush || + !fence_is_signaled(id->last_flush))) + continue; + + flushed = id->flushed_updates; + if (updates && (!flushed || + fence_is_later(updates, flushed))) + continue; + + /* Good we can use this VMID */ + if (id->last_user == ring) { + r = amdgpu_sync_fence(ring->adev, sync, + id->first); + if (r) + goto error; + }
- /* Good we can use this VMID */ - if (id->last_user == ring) { - r = amdgpu_sync_fence(ring->adev, sync, - id->first); + /* And remember this submission as user of the VMID */ + r = amdgpu_sync_fence(ring->adev, &id->active, fence); if (r) goto error; - } - - /* And remember this submission as user of the VMID */ - r = amdgpu_sync_fence(ring->adev, &id->active, fence); - if (r) - goto error;
- list_move_tail(&id->list, &adev->vm_manager.ids_lru); - vm->ids[ring->idx] = id; + list_move_tail(&id->list, &adev->vm_manager.ids_lru); + vm->ids[ring->idx] = id;
- *vm_id = id - adev->vm_manager.ids; - *vm_pd_addr = AMDGPU_VM_NO_FLUSH; - trace_amdgpu_vm_grab_id(vm, ring->idx, *vm_id, *vm_pd_addr); + *vm_id = id - adev->vm_manager.ids; + *vm_pd_addr = AMDGPU_VM_NO_FLUSH; + trace_amdgpu_vm_grab_id(vm, ring->idx, *vm_id, + *vm_pd_addr);
- mutex_unlock(&adev->vm_manager.lock); - return 0; + mutex_unlock(&adev->vm_manager.lock); + return 0;
- } while (i != ring->idx); + } while (i != ring->idx);
- /* Check if we have an idle VMID */ - list_for_each_entry(id, &adev->vm_manager.ids_lru, list) { - if (amdgpu_sync_is_idle(&id->active, ring)) - break; - - } - - /* If we can't find an idle VMID to use, just wait for the oldest */ - if (&id->list == &adev->vm_manager.ids_lru) { - id = list_first_entry(&adev->vm_manager.ids_lru, - struct amdgpu_vm_id, - list); + /* Still no ID to use? Then use the idle one found earlier */ + id = idle; }
r = amdgpu_sync_cycle_fences(sync, &id->active, fence);
From: Christian König christian.koenig@amd.com
Just wait for any fence to become available, instead of waiting for the last entry of the LRU.
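The core of the change is how the fences of the still-busy VMIDs are folded into a single fence the scheduler can block on. A minimal sketch of that construction (the helper name any_vmid_available is made up for illustration; fence_array_create() and its signal_on_any semantics come from the fence array patches earlier in this series):

#include <linux/fence-array.h>

/* Build a fence that signals as soon as ANY of the given fences
 * signals, i.e. as soon as any VMID becomes available again.
 */
static struct fence *any_vmid_available(struct amdgpu_device *adev,
					struct amdgpu_ring *ring,
					struct fence **fences,
					unsigned num_fences)
{
	/* Each ring gets its own fence context, so the arrays created
	 * for one ring form their own ordered timeline.
	 */
	u64 context = adev->vm_manager.fence_context + ring->idx;
	unsigned seqno = ++adev->vm_manager.seqno[ring->idx];
	struct fence_array *array;
	unsigned i;

	/* The array takes ownership of the fence references and of the
	 * fences[] memory itself, so grab a reference on each first.
	 */
	for (i = 0; i < num_fences; ++i)
		fence_get(fences[i]);

	array = fence_array_create(num_fences, fences, context, seqno,
				   true /* signal_on_any */);
	return array ? &array->base : NULL;
}

In amdgpu_vm_grab_id() below, fences[] is filled with the result of amdgpu_sync_peek_fence() for each busy VMID before the array is handed to the sync object.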
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 10 +- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 69 +++----------- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 155 +++++++++++++++++++------------ 4 files changed, 117 insertions(+), 119 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index f154d9f..52326d3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -597,10 +597,8 @@ int amdgpu_sync_resv(struct amdgpu_device *adev, struct amdgpu_sync *sync, struct reservation_object *resv, void *owner); -bool amdgpu_sync_is_idle(struct amdgpu_sync *sync, - struct amdgpu_ring *ring); -int amdgpu_sync_cycle_fences(struct amdgpu_sync *dst, struct amdgpu_sync *src, - struct fence *fence); +struct fence *amdgpu_sync_peek_fence(struct amdgpu_sync *sync, + struct amdgpu_ring *ring); struct fence *amdgpu_sync_get_fence(struct amdgpu_sync *sync); void amdgpu_sync_free(struct amdgpu_sync *sync); int amdgpu_sync_init(void); @@ -910,6 +908,10 @@ struct amdgpu_vm_manager { struct list_head ids_lru; struct amdgpu_vm_id ids[AMDGPU_NUM_VM];
+ /* Handling of VM fences */ + u64 fence_context; + unsigned seqno[AMDGPU_MAX_RINGS]; + uint32_t max_pfn; /* vram base address for page table entry */ u64 vram_base_offset; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c index e395bbe..b50a845 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c @@ -166,7 +166,7 @@ static struct fence *amdgpu_job_run(struct amd_sched_job *sched_job) } job = to_amdgpu_job(sched_job);
- BUG_ON(!amdgpu_sync_is_idle(&job->sync, NULL)); + BUG_ON(amdgpu_sync_peek_fence(&job->sync, NULL));
trace_amdgpu_sched_run_job(job); r = amdgpu_ib_schedule(job->ring, job->num_ibs, job->ibs, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c index a2766d7..5c8d302 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c @@ -223,16 +223,16 @@ int amdgpu_sync_resv(struct amdgpu_device *adev, }
/** - * amdgpu_sync_is_idle - test if all fences are signaled + * amdgpu_sync_peek_fence - get the next fence not signaled yet * * @sync: the sync object * @ring: optional ring to use for test * - * Returns true if all fences in the sync object are signaled or scheduled to - * the ring (if provided). + * Returns the next fence not signaled yet without removing it from the sync + * object. */ -bool amdgpu_sync_is_idle(struct amdgpu_sync *sync, - struct amdgpu_ring *ring) +struct fence *amdgpu_sync_peek_fence(struct amdgpu_sync *sync, + struct amdgpu_ring *ring) { struct amdgpu_sync_entry *e; struct hlist_node *tmp; @@ -246,68 +246,25 @@ bool amdgpu_sync_is_idle(struct amdgpu_sync *sync, /* For fences from the same ring it is sufficient * when they are scheduled. */ - if (s_fence->sched == &ring->sched && - fence_is_signaled(&s_fence->scheduled)) - continue; - } + if (s_fence->sched == &ring->sched) { + if (fence_is_signaled(&s_fence->scheduled)) + continue;
- if (fence_is_signaled(f)) { - hash_del(&e->node); - fence_put(f); - kmem_cache_free(amdgpu_sync_slab, e); - continue; + return &s_fence->scheduled; + } }
- return false; - } - - return true; -} - -/** - * amdgpu_sync_cycle_fences - move fences from one sync object into another - * - * @dst: the destination sync object - * @src: the source sync object - * @fence: fence to add to source - * - * Remove all fences from source and put them into destination and add - * fence as new one into source. - */ -int amdgpu_sync_cycle_fences(struct amdgpu_sync *dst, struct amdgpu_sync *src, - struct fence *fence) -{ - struct amdgpu_sync_entry *e, *newone; - struct hlist_node *tmp; - int i; - - /* Allocate the new entry before moving the old ones */ - newone = kmem_cache_alloc(amdgpu_sync_slab, GFP_KERNEL); - if (!newone) - return -ENOMEM; - - hash_for_each_safe(src->fences, i, tmp, e, node) { - struct fence *f = e->fence; - - hash_del(&e->node); if (fence_is_signaled(f)) { + hash_del(&e->node); fence_put(f); kmem_cache_free(amdgpu_sync_slab, e); continue; }
- if (amdgpu_sync_add_later(dst, f)) { - kmem_cache_free(amdgpu_sync_slab, e); - continue; - } - - hash_add(dst->fences, &e->node, f->context); + return f; }
- hash_add(src->fences, &newone->node, fence->context); - newone->fence = fence_get(fence); - - return 0; + return NULL; }
/** diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index f206820..8ea1c73 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -25,6 +25,7 @@ * Alex Deucher * Jerome Glisse */ +#include <linux/fence-array.h> #include <drm/drmP.h> #include <drm/amdgpu_drm.h> #include "amdgpu.h" @@ -180,82 +181,116 @@ int amdgpu_vm_grab_id(struct amdgpu_vm *vm, struct amdgpu_ring *ring, struct amdgpu_device *adev = ring->adev; struct fence *updates = sync->last_vm_update; struct amdgpu_vm_id *id, *idle; - unsigned i = ring->idx; - int r; + struct fence **fences; + unsigned i; + int r = 0; + + fences = kmalloc_array(adev->vm_manager.num_ids, sizeof(void *), + GFP_KERNEL); + if (!fences) + return -ENOMEM;
mutex_lock(&adev->vm_manager.lock);
/* Check if we have an idle VMID */ + i = 0; list_for_each_entry(idle, &adev->vm_manager.ids_lru, list) { - if (amdgpu_sync_is_idle(&idle->active, ring)) + fences[i] = amdgpu_sync_peek_fence(&idle->active, ring); + if (!fences[i]) break; - + ++i; }
- /* If we can't find an idle VMID to use, just wait for the oldest */ + /* If we can't find an idle VMID to use, wait until one becomes available */ if (&idle->list == &adev->vm_manager.ids_lru) { - id = list_first_entry(&adev->vm_manager.ids_lru, - struct amdgpu_vm_id, - list); - } else { - /* Check if we can use a VMID already assigned to this VM */ - do { - struct fence *flushed; - - id = vm->ids[i++]; - if (i == AMDGPU_MAX_RINGS) - i = 0; - - /* Check all the prerequisites to using this VMID */ - if (!id) - continue; - - if (atomic64_read(&id->owner) != vm->client_id) - continue; - - if (pd_addr != id->pd_gpu_addr) - continue; - - if (id->last_user != ring && (!id->last_flush || - !fence_is_signaled(id->last_flush))) - continue; - - flushed = id->flushed_updates; - if (updates && - (!flushed || fence_is_later(updates, flushed))) - continue; - - /* Good we can use this VMID */ - if (id->last_user == ring) { - r = amdgpu_sync_fence(ring->adev, sync, - id->first); - if (r) - goto error; - } + u64 fence_context = adev->vm_manager.fence_context + ring->idx; + unsigned seqno = ++adev->vm_manager.seqno[ring->idx]; + struct fence_array *array; + unsigned j; + + for (j = 0; j < i; ++j) + fence_get(fences[j]); + + array = fence_array_create(i, fences, fence_context, + seqno, true); + if (!array) { + for (j = 0; j < i; ++j) + fence_put(fences[j]); + kfree(fences); + r = -ENOMEM; + goto error; + } + + + r = amdgpu_sync_fence(ring->adev, sync, &array->base); + fence_put(&array->base); + if (r) + goto error; + + mutex_unlock(&adev->vm_manager.lock); + return 0; + + } + kfree(fences); + + /* Check if we can use a VMID already assigned to this VM */ + i = ring->idx; + do { + struct fence *flushed; + + id = vm->ids[i++]; + if (i == AMDGPU_MAX_RINGS) + i = 0;
- /* And remember this submission as user of the VMID */ - r = amdgpu_sync_fence(ring->adev, &id->active, fence); + /* Check all the prerequisites to using this VMID */ + if (!id) + continue; + + if (atomic64_read(&id->owner) != vm->client_id) + continue; + + if (pd_addr != id->pd_gpu_addr) + continue; + + if (id->last_user != ring && + (!id->last_flush || !fence_is_signaled(id->last_flush))) + continue; + + flushed = id->flushed_updates; + if (updates && + (!flushed || fence_is_later(updates, flushed))) + continue; + + /* Good we can use this VMID */ + if (id->last_user == ring) { + r = amdgpu_sync_fence(ring->adev, sync, + id->first); if (r) goto error; + } + + /* And remember this submission as user of the VMID */ + r = amdgpu_sync_fence(ring->adev, &id->active, fence); + if (r) + goto error;
- list_move_tail(&id->list, &adev->vm_manager.ids_lru); - vm->ids[ring->idx] = id; + list_move_tail(&id->list, &adev->vm_manager.ids_lru); + vm->ids[ring->idx] = id;
- *vm_id = id - adev->vm_manager.ids; - *vm_pd_addr = AMDGPU_VM_NO_FLUSH; - trace_amdgpu_vm_grab_id(vm, ring->idx, *vm_id, - *vm_pd_addr); + *vm_id = id - adev->vm_manager.ids; + *vm_pd_addr = AMDGPU_VM_NO_FLUSH; + trace_amdgpu_vm_grab_id(vm, ring->idx, *vm_id, *vm_pd_addr);
- mutex_unlock(&adev->vm_manager.lock); - return 0; + mutex_unlock(&adev->vm_manager.lock); + return 0;
- } while (i != ring->idx); + } while (i != ring->idx);
- /* Still no ID to use? Then use the idle one found earlier */ - id = idle; - } + /* Still no ID to use? Then use the idle one found earlier */ + id = idle;
- r = amdgpu_sync_cycle_fences(sync, &id->active, fence); + /* Remember this submission as user of the VMID */ + r = amdgpu_sync_fence(ring->adev, &id->active, fence); if (r) goto error;
@@ -1515,6 +1550,10 @@ void amdgpu_vm_manager_init(struct amdgpu_device *adev) &adev->vm_manager.ids_lru); }
+ adev->vm_manager.fence_context = fence_context_alloc(AMDGPU_MAX_RINGS); + for (i = 0; i < AMDGPU_MAX_RINGS; ++i) + adev->vm_manager.seqno[i] = 0; + atomic_set(&adev->vm_manager.vm_pte_next_ring, 0); atomic64_set(&adev->vm_manager.client_counter, 0); }
From: Christian König christian.koenig@amd.com
vm_flush() now comes directly after vm_grab_id(), so the ring we grabbed the ID on is guaranteed to be the ring we flush on. That makes the last_user tracking redundant and lets us emit the flush fence unconditionally.
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 31 +++++++++++-------------------- 2 files changed, 11 insertions(+), 21 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 52326d3..e054542 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -886,7 +886,6 @@ struct amdgpu_vm_id { struct fence *first; struct amdgpu_sync active; struct fence *last_flush; - struct amdgpu_ring *last_user; atomic64_t owner;
uint64_t pd_gpu_addr; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 8ea1c73..48d5ad18 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -237,6 +237,7 @@ int amdgpu_vm_grab_id(struct amdgpu_vm *vm, struct amdgpu_ring *ring, i = ring->idx; do { struct fence *flushed; + bool same_ring = ring->idx == i;
id = vm->ids[i++]; if (i == AMDGPU_MAX_RINGS) @@ -252,7 +253,7 @@ int amdgpu_vm_grab_id(struct amdgpu_vm *vm, struct amdgpu_ring *ring, if (pd_addr != id->pd_gpu_addr) continue;
- if (id->last_user != ring && + if (!same_ring && (!id->last_flush || !fence_is_signaled(id->last_flush))) continue;
@@ -261,15 +262,9 @@ int amdgpu_vm_grab_id(struct amdgpu_vm *vm, struct amdgpu_ring *ring, (!flushed || fence_is_later(updates, flushed))) continue;
- /* Good we can use this VMID */ - if (id->last_user == ring) { - r = amdgpu_sync_fence(ring->adev, sync, - id->first); - if (r) - goto error; - } - - /* And remember this submission as user of the VMID */ + /* Good we can use this VMID. Remember this submission as + * user of the VMID. + */ r = amdgpu_sync_fence(ring->adev, &id->active, fence); if (r) goto error; @@ -306,7 +301,6 @@ int amdgpu_vm_grab_id(struct amdgpu_vm *vm, struct amdgpu_ring *ring, id->pd_gpu_addr = pd_addr;
list_move_tail(&id->list, &adev->vm_manager.ids_lru); - id->last_user = ring; atomic64_set(&id->owner, vm->client_id); vm->ids[ring->idx] = id;
@@ -357,16 +351,13 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, trace_amdgpu_vm_flush(pd_addr, ring->idx, vm_id); amdgpu_ring_emit_vm_flush(ring, vm_id, pd_addr);
+ r = amdgpu_fence_emit(ring, &fence); + if (r) + return r; + mutex_lock(&adev->vm_manager.lock); - if ((id->pd_gpu_addr == pd_addr) && (id->last_user == ring)) { - r = amdgpu_fence_emit(ring, &fence); - if (r) { - mutex_unlock(&adev->vm_manager.lock); - return r; - } - fence_put(id->last_flush); - id->last_flush = fence; - } + fence_put(id->last_flush); + id->last_flush = fence; mutex_unlock(&adev->vm_manager.lock); }
On Wed, Jun 1, 2016 at 9:10 AM, Christian König deathsimple@vodafone.de wrote:
From: Christian König christian.koenig@amd.com
vm_flush() now comes directly after vm_grab_id().
Signed-off-by: Christian König christian.koenig@amd.com
For the series: Acked-by: Alex Deucher alexander.deucher@amd.com
On Wed, Jun 1, 2016 at 9:10 AM, Christian König deathsimple@vodafone.de wrote:
The main idea is that for each VMID we have a set of hardware fences which are currently using this VMID. Now when a new command submission needs a VMID we construct a fence array which should signal when any of the VMIDs becomes available, and give that back to our scheduler.
For those who are not familiar: VMID = Virtual Memory ID. AMD GPUs have multiple virtual address space contexts which can be in flight on the GPU at any given time. The VMID is used to select which VM context you want to use for a specific GPU operation.
Alex