Hi all,
I think I finally got them all while auditing all drivers for how they deal with dma-resv dependencies in their command submission ioctls.
This series is incomplete; it also needs a few things from Christian:
- nouveau fix for waiting for all fences
- various patches for fixing up dma-buf/resv functions to always wait for all fences (dma-buf poll, is_signalled, ...)
- I do include the one msm patch from Christian here, since there was another issue in msm that needed fixing, and to make sure we have the complete set for msm
Two main things:

- Fix drivers that currently can break the DAG. I opted for the dumbest possible way and not for rolling out dma_fence_chain - this can be fixed later on if needed.

- Allow shared fences to be decoupled from the exclusive slot, which mostly means we can't skip waiting for the exclusive fence when there are shared fences present; we have to wait for all fences (see the sketch below). This is a semantic change compared to what we've had thus far, but it really makes a ton of sense given where things are heading.
Note that this means the import/export patches from Jason need to be adjusted too to fit.
Plus some docs for dma-resv, they've been rather lacking.
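To make the second point concrete, here is a minimal sketch of what "wait for all fences" looks like for a driver, modelled on the msm_gem_sync_object() code further down in this series. The helper name is made up and the caller is assumed to hold the reservation lock; it only illustrates the semantics, it is not meant to be applied anywhere.

/* illustration only: wait for the exclusive and all shared fences */
static int example_wait_all_fences(struct dma_resv *resv, bool intr)
{
	struct dma_resv_list *fobj;
	struct dma_fence *fence;
	int i, ret;

	/* the exclusive fence can no longer be skipped just because
	 * shared fences are present */
	fence = dma_resv_excl_fence(resv);
	if (fence) {
		ret = dma_fence_wait(fence, intr);
		if (ret)
			return ret;
	}

	fobj = dma_resv_shared_list(resv);
	for (i = 0; fobj && i < fobj->shared_count; i++) {
		fence = rcu_dereference_protected(fobj->shared[i],
						  dma_resv_held(resv));
		ret = dma_fence_wait(fence, intr);
		if (ret)
			return ret;
	}

	return 0;
}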
Testing and review highly welcome.
Christian König (1):
  drm/msm: always wait for the exclusive fence

Daniel Vetter (6):
  drm/msm: Don't break exclusive fence ordering
  drm/etnaviv: Don't break exclusive fence ordering
  drm/i915: delete exclude argument from i915_sw_fence_await_reservation
  drm/i915: Always wait for the exclusive fence
  drm/i915: Don't break exclusive fence ordering
  dma-resv: Give the docs a do-over

 drivers/dma-buf/dma-resv.c                   |  22 +++-
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c |   8 +-
 drivers/gpu/drm/i915/display/intel_display.c |   4 +-
 drivers/gpu/drm/i915/gem/i915_gem_clflush.c  |   2 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c   |   8 +-
 drivers/gpu/drm/i915/i915_sw_fence.c         |  10 +-
 drivers/gpu/drm/i915/i915_sw_fence.h         |   1 -
 drivers/gpu/drm/msm/msm_gem.c                |  16 ++-
 drivers/gpu/drm/msm/msm_gem_submit.c         |   3 +-
 include/linux/dma-resv.h                     | 104 +++++++++++++++++-
 10 files changed, 142 insertions(+), 36 deletions(-)
There's only one exclusive slot, and we must not break the ordering.
A better fix would be to use a dma_fence_chain or _array like e.g. amdgpu now uses, but
- msm has a synchronous dma_fence_wait for anything from another context, so doesn't seem to care much,
- and it probably makes sense to lift this into dma-resv.c code as a proper concept, so that drivers don't have to hack up their own solution each on their own.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Sean Paul <sean@poorly.run>
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
---
 drivers/gpu/drm/msm/msm_gem_submit.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index b71da71a3dd8..edd0051d849f 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -306,7 +306,8 @@ static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit)
 			return ret;
 		}
 
-		if (no_implicit)
+		/* exclusive fences must be ordered */
+		if (no_implicit && !write)
 			continue;
 
 		ret = msm_gem_sync_object(&msm_obj->base, submit->ring->fctx,
From: Christian König <ckoenig.leichtzumerken@gmail.com>
Drivers also need to sync to the exclusive fence when a shared one is present.
Signed-off-by: Christian König <christian.koenig@amd.com>
[danvet: Not that hard to compile-test on arm ...]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Sean Paul <sean@poorly.run>
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
---
 drivers/gpu/drm/msm/msm_gem.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 141178754231..d9c4f1deeafb 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -812,17 +812,15 @@ int msm_gem_sync_object(struct drm_gem_object *obj,
 	struct dma_fence *fence;
 	int i, ret;
 
-	fobj = dma_resv_shared_list(obj->resv);
-	if (!fobj || (fobj->shared_count == 0)) {
-		fence = dma_resv_excl_fence(obj->resv);
-		/* don't need to wait on our own fences, since ring is fifo */
-		if (fence && (fence->context != fctx->context)) {
-			ret = dma_fence_wait(fence, true);
-			if (ret)
-				return ret;
-		}
+	fence = dma_resv_excl_fence(obj->resv);
+	/* don't need to wait on our own fences, since ring is fifo */
+	if (fence && (fence->context != fctx->context)) {
+		ret = dma_fence_wait(fence, true);
+		if (ret)
+			return ret;
 	}
 
+	fobj = dma_resv_shared_list(obj->resv);
 	if (!exclusive || !fobj)
 		return 0;
There's only one exclusive slot, and we must not break the ordering.
A better fix would be to use a dma_fence_chain or _array like e.g. amdgpu now uses, but it probably makes sense to lift this into dma-resv.c code as a proper concept, so that drivers don't have to hack up their own solution each on their own. Hence go with the simple fix for now.
Another option is the fence import ioctl from Jason:
https://lore.kernel.org/dri-devel/20210610210925.642582-7-jason@jlekstrand.n...
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: etnaviv@lists.freedesktop.org
---
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
index 92478a50a580..5c4fed2b7c6a 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
@@ -178,18 +178,20 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit)
 	for (i = 0; i < submit->nr_bos; i++) {
 		struct etnaviv_gem_submit_bo *bo = &submit->bos[i];
 		struct dma_resv *robj = bo->obj->base.resv;
+		bool write = bo->flags & ETNA_SUBMIT_BO_WRITE;
 
-		if (!(bo->flags & ETNA_SUBMIT_BO_WRITE)) {
+		if (!(write)) {
 			ret = dma_resv_reserve_shared(robj, 1);
 			if (ret)
 				return ret;
 		}
 
-		if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT)
+		/* exclusive fences must be ordered */
+		if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT && !write)
 			continue;
 
 		ret = drm_sched_job_await_implicit(&submit->sched_job,
 						   &bo->obj->base,
-						   bo->flags & ETNA_SUBMIT_BO_WRITE);
+						   write);
 		if (ret)
 			return ret;
 	}
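As an aside for readers less familiar with the shared-fence API: the dma_resv_reserve_shared() call kept in the hunk above is the first half of the usual two-step pattern, roughly as sketched below (illustration only, with a made-up fence variable; the add itself happens later in the submit path, once the job can no longer fail).

	/* before the point of no return: make sure a slot is available,
	 * this step can still fail */
	ret = dma_resv_reserve_shared(robj, 1);
	if (ret)
		return ret;

	/* ... point of no return in command submission ... */

	/* adding the fence itself cannot fail anymore */
	dma_resv_add_shared_fence(robj, job_fence);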
Hi Daniel,
I'm feeling like I miss a ton of context here, so some maybe dumb questions/remarks below.
Am Dienstag, dem 06.07.2021 um 12:12 +0200 schrieb Daniel Vetter:
There's only one exclusive slot, and we must not break the ordering.
A better fix would be to use a dma_fence_chain or _array like e.g. amdgpu now uses, but it probably makes sense to lift this into dma-resv.c code as a proper concept, so that drivers don't have to hack up their own solution each on their own. Hence go with the simple fix for now.
Another option is the fence import ioctl from Jason:
https://lore.kernel.org/dri-devel/20210610210925.642582-7-jason@jlekstrand.n...
Sorry, but why is the fence import ioctl an alternative to the fix proposed in this patch?
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Lucas Stach l.stach@pengutronix.de Cc: Russell King linux+etnaviv@armlinux.org.uk Cc: Christian Gmeiner christian.gmeiner@gmail.com Cc: etnaviv@lists.freedesktop.org
drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c index 92478a50a580..5c4fed2b7c6a 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c @@ -178,18 +178,20 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit) for (i = 0; i < submit->nr_bos; i++) { struct etnaviv_gem_submit_bo *bo = &submit->bos[i]; struct dma_resv *robj = bo->obj->base.resv;
bool write = bo->flags & ETNA_SUBMIT_BO_WRITE;
if (!(bo->flags & ETNA_SUBMIT_BO_WRITE)) {
if (!(write)) {
No parenthesis around the write needed.
ret = dma_resv_reserve_shared(robj, 1); if (ret) return ret; }
if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT)
/* exclusive fences must be ordered */
I feel like this comment isn't really helpful. It might tell you all you need to know if you just paged in all the context around dma_resv and the dependency graph, but it's not more than noise to me right now.
I guess the comment should answer the question against what the exclusive fence we are going to add needs to be ordered and why it isn't safe to skip implicit sync in that case.
Regards, Lucas
if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT && !write) continue;
ret = drm_sched_job_await_implicit(&submit->sched_job, &bo->obj->base,
bo->flags & ETNA_SUBMIT_BO_WRITE);
if (ret) return ret; }write);
On Wed, Jul 7, 2021 at 10:54 AM Lucas Stach l.stach@pengutronix.de wrote:
Hi Daniel,
I'm feeling like I miss a ton of context here, so some maybe dumb questions/remarks below.
Am Dienstag, dem 06.07.2021 um 12:12 +0200 schrieb Daniel Vetter:
There's only one exclusive slot, and we must not break the ordering.
A better fix would be to use a dma_fence_chain or _array like e.g. amdgpu now uses, but it probably makes sense to lift this into dma-resv.c code as a proper concept, so that drivers don't have to hack up their own solution each on their own. Hence go with the simple fix for now.
Another option is the fence import ioctl from Jason:
https://lore.kernel.org/dri-devel/20210610210925.642582-7-jason@jlekstrand.n...
Sorry, but why is the fence import ioctl an alternative to the fix proposed in this patch?
It's not an alternative to fixing the issue here, it's an alternative to trying to fix this here without adding more dependencies. Depends a bit what exactly you want to do, I just linked this for the bigger picture.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Lucas Stach l.stach@pengutronix.de Cc: Russell King linux+etnaviv@armlinux.org.uk Cc: Christian Gmeiner christian.gmeiner@gmail.com Cc: etnaviv@lists.freedesktop.org
drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c index 92478a50a580..5c4fed2b7c6a 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c @@ -178,18 +178,20 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit) for (i = 0; i < submit->nr_bos; i++) { struct etnaviv_gem_submit_bo *bo = &submit->bos[i]; struct dma_resv *robj = bo->obj->base.resv;
bool write = bo->flags & ETNA_SUBMIT_BO_WRITE;
if (!(bo->flags & ETNA_SUBMIT_BO_WRITE)) {
if (!(write)) {
No parenthesis around the write needed.
ret = dma_resv_reserve_shared(robj, 1); if (ret) return ret; }
if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT)
/* exclusive fences must be ordered */
I feel like this comment isn't really helpful. It might tell you all you need to know if you just paged in all the context around dma_resv and the dependency graph, but it's not more than noise to me right now.
I guess the comment should answer the question against what the exclusive fence we are going to add needs to be ordered and why it isn't safe to skip implicit sync in that case.
The full-length comment is the doc patch, which is last in the series. Essentially the rule is that you're not allowed to drop fences on the floor (which you do if you just set a new write fence and don't take any of the existing fences as dependencies), at least for shared buffers. But since it's easier to be consistent I think it's best if we do this just across the board.
Like the commit message explains, there are a few ways to fix this:
- just treat it as implicit synced
- chain the fences together - that way you don't sync, but also there's no fence that's being lost. This is what amdgpu does, and also what the new import ioctl does.
2nd option is a lot more involved, and since all the dma-resv stuff is a bit under discussion, I went with the most minimal thing possible. -Daniel
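For reference, the chaining option would look roughly like the sketch below. This is only an illustration of the idea, not code from this series: the helper name is made up, and actually installing the chain as the new exclusive fence today still means poking at dma_resv internals directly (see the rcu_assign_pointer remark in the docs patch).

/* illustration only: string the old exclusive fence and the new job
 * fence together so that neither is dropped on the floor */
static int example_chain_excl_fence(struct dma_resv *resv,
				    struct dma_fence *new_fence, u64 seqno)
{
	struct dma_fence_chain *chain;

	chain = kzalloc(sizeof(*chain), GFP_KERNEL);
	if (!chain)
		return -ENOMEM;

	/* caller must hold dma_resv_lock(resv); dma_fence_chain_init()
	 * takes over the two fence references passed in */
	dma_fence_chain_init(chain, dma_fence_get(dma_resv_excl_fence(resv)),
			     dma_fence_get(new_fence), seqno);

	/* ... then install &chain->base as the new exclusive fence,
	 * leaving the shared fences untouched */
	return 0;
}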
Regards, Lucas
if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT && !write) continue; ret = drm_sched_job_await_implicit(&submit->sched_job, &bo->obj->base,
bo->flags & ETNA_SUBMIT_BO_WRITE);
write); if (ret) return ret; }
Am Mittwoch, dem 07.07.2021 um 13:37 +0200 schrieb Daniel Vetter:
On Wed, Jul 7, 2021 at 10:54 AM Lucas Stach l.stach@pengutronix.de wrote:
Hi Daniel,
I'm feeling like I miss a ton of context here, so some maybe dumb questions/remarks below.
Am Dienstag, dem 06.07.2021 um 12:12 +0200 schrieb Daniel Vetter:
There's only one exclusive slot, and we must not break the ordering.
A better fix would be to use a dma_fence_chain or _array like e.g. amdgpu now uses, but it probably makes sense to lift this into dma-resv.c code as a proper concept, so that drivers don't have to hack up their own solution each on their own. Hence go with the simple fix for now.
Another option is the fence import ioctl from Jason:
https://lore.kernel.org/dri-devel/20210610210925.642582-7-jason@jlekstrand.n...
Sorry, but why is the fence import ioctl an alternative to the fix proposed in this patch?
It's not an alternative to fixing the issue here, it's an alternative to trying to fix this here without adding more dependencies. Depends a bit what exactly you want to do, I just linked this for the bigger picture.
I appreciate the bigger picture, it just makes it harder to follow what is going on in this exact commit when trying to match up the code change with the commit message. I would have expected this link to only be part of the cover letter explaining the series, instead of being part of the commit message.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Lucas Stach l.stach@pengutronix.de Cc: Russell King linux+etnaviv@armlinux.org.uk Cc: Christian Gmeiner christian.gmeiner@gmail.com Cc: etnaviv@lists.freedesktop.org
drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c index 92478a50a580..5c4fed2b7c6a 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c @@ -178,18 +178,20 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit) for (i = 0; i < submit->nr_bos; i++) { struct etnaviv_gem_submit_bo *bo = &submit->bos[i]; struct dma_resv *robj = bo->obj->base.resv;
bool write = bo->flags & ETNA_SUBMIT_BO_WRITE;
if (!(bo->flags & ETNA_SUBMIT_BO_WRITE)) {
if (!(write)) {
No parenthesis around the write needed.
ret = dma_resv_reserve_shared(robj, 1); if (ret) return ret; }
if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT)
/* exclusive fences must be ordered */
I feel like this comment isn't really helpful. It might tell you all you need to know if you just paged in all the context around dma_resv and the dependency graph, but it's not more than noise to me right now.
I guess the comment should answer the question against what the exclusive fence we are going to add needs to be ordered and why it isn't safe to skip implicit sync in that case.
The full-length comment is the doc patch, which is last in the series. Essentially the rule is that you're not allowed to drop fences on the floor (which you do if you just set a new write fence and don't take any of the existing fences as dependencies), at least for shared buffers. But since it's easier to be consistent I think it's best if we do this just across the board.
Like the commit message explains, there are a few ways to fix this:
- just treat it as implicit synced
- chain the fences together - that way you don't sync, but also
there's no fence that's being lost. This is what amdgpu does, and also what the new import ioctl does.
2nd option is a lot more involved, and since all the dma-resv stuff is a bit under discussion, I went with the most minimal thing possible.
Thanks for the confirmation, that was as much as I figured from the doc patch as well. So with that in mind I would expect this comment to read something like this: "Adding a new exclusive fence drops all previous fences from the dma_resv. To avoid violating the signalling order we err on the side of over-synchronizing by waiting for the existing fences, even if userspace asked us to ignore them."
Regards, Lucas
-Daniel
Regards, Lucas
if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT && !write) continue; ret = drm_sched_job_await_implicit(&submit->sched_job, &bo->obj->base,
bo->flags & ETNA_SUBMIT_BO_WRITE);
write); if (ret) return ret; }
On Wed, Jul 7, 2021 at 2:31 PM Lucas Stach l.stach@pengutronix.de wrote:
Am Mittwoch, dem 07.07.2021 um 13:37 +0200 schrieb Daniel Vetter:
On Wed, Jul 7, 2021 at 10:54 AM Lucas Stach l.stach@pengutronix.de wrote:
Hi Daniel,
I'm feeling like I miss a ton of context here, so some maybe dumb questions/remarks below.
Am Dienstag, dem 06.07.2021 um 12:12 +0200 schrieb Daniel Vetter:
There's only one exclusive slot, and we must not break the ordering.
A better fix would be to use a dma_fence_chain or _array like e.g. amdgpu now uses, but it probably makes sense to lift this into dma-resv.c code as a proper concept, so that drivers don't have to hack up their own solution each on their own. Hence go with the simple fix for now.
Another option is the fence import ioctl from Jason:
https://lore.kernel.org/dri-devel/20210610210925.642582-7-jason@jlekstrand.n...
Sorry, but why is the fence import ioctl an alternative to the fix proposed in this patch?
It's not an alternative to fixing the issue here, it's an alternative to trying to fix this here without adding more dependencies. Depends a bit what exactly you want to do, I just linked this for the bigger picture.
I appreciate the bigger picture, it just makes it harder to follow what is going on in this exact commit when trying to match up the code change with the commit message. I would have expected this link to only be part of the cover letter explaining the series, instead of being part of the commit message.
I wanted to list all the better fix options in the commit message so that you have the full context.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Lucas Stach l.stach@pengutronix.de Cc: Russell King linux+etnaviv@armlinux.org.uk Cc: Christian Gmeiner christian.gmeiner@gmail.com Cc: etnaviv@lists.freedesktop.org
drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c index 92478a50a580..5c4fed2b7c6a 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c @@ -178,18 +178,20 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit) for (i = 0; i < submit->nr_bos; i++) { struct etnaviv_gem_submit_bo *bo = &submit->bos[i]; struct dma_resv *robj = bo->obj->base.resv;
bool write = bo->flags & ETNA_SUBMIT_BO_WRITE;
if (!(bo->flags & ETNA_SUBMIT_BO_WRITE)) {
if (!(write)) {
No parenthesis around the write needed.
ret = dma_resv_reserve_shared(robj, 1); if (ret) return ret; }
if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT)
/* exclusive fences must be ordered */
I feel like this comment isn't really helpful. It might tell you all you need to know if you just paged in all the context around dma_resv and the dependency graph, but it's not more than noise to me right now.
I guess the comment should answer the question against what the exclusive fence we are going to add needs to be ordered and why it isn't safe to skip implicit sync in that case.
The full-length comment is the doc patch, which is last in the series. Essentially the rule is that you're not allowed to drop fences on the floor (which you do if you just set a new write fence and don't take any of the existing fences as dependencies), at least for shared buffers. But since it's easier to be consistent I think it's best if we do this just across the board.
Like the commit message explains, there are a few ways to fix this:
- just treat it as implicit synced
- chain the fences together - that way you don't sync, but also
there's no fence that's being lost. This is what amdgpu does, and also what the new import ioctl does.
2nd option is a lot more involved, and since all the dma-resv stuff is a bit under discussion, I went with the most minimal thing possible.
Thanks for the confirmation, that was as much as I figured from the doc patch as well. So with that in mind I would expect this comment to read something like this: "Adding a new exclusive fence drops all previous fences from the dma_resv. To avoid violating the signalling order we err on the side of over-synchronizing by waiting for the existing fences, even if userspace asked us to ignore them."
Thanks for the suggestion, I've applied that to all three patches for msm/etnaviv and i915.
I hope with that added there's enough context in the commit message that at least the gist is understandable without full context down the road. -Daniel
Regards, Lucas
-Daniel
Regards, Lucas
if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT && !write) continue; ret = drm_sched_job_await_implicit(&submit->sched_job, &bo->obj->base,
bo->flags & ETNA_SUBMIT_BO_WRITE);
write); if (ret) return ret; }
No longer used, the last user disappeared with
commit d07f0e59b2c762584478920cd2d11fba2980a94a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Oct 28 13:58:44 2016 +0100
drm/i915: Move GEM activity tracking into a common struct reservation_object
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
---
 drivers/gpu/drm/i915/display/intel_display.c   | 4 ++--
 drivers/gpu/drm/i915/gem/i915_gem_clflush.c    | 2 +-
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 +-
 drivers/gpu/drm/i915/i915_sw_fence.c           | 6 +-----
 drivers/gpu/drm/i915/i915_sw_fence.h           | 1 -
 5 files changed, 5 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index 98e0f4ed7e4a..678c7839034e 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -11119,7 +11119,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane, */ if (intel_crtc_needs_modeset(crtc_state)) { ret = i915_sw_fence_await_reservation(&state->commit_ready, - old_obj->base.resv, NULL, + old_obj->base.resv, false, 0, GFP_KERNEL); if (ret < 0) @@ -11153,7 +11153,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane, struct dma_fence *fence;
ret = i915_sw_fence_await_reservation(&state->commit_ready, - obj->base.resv, NULL, + obj->base.resv, false, i915_fence_timeout(dev_priv), GFP_KERNEL); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c index daf9284ef1f5..93439d2c7a58 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c @@ -106,7 +106,7 @@ bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, clflush = clflush_work_create(obj); if (clflush) { i915_sw_fence_await_reservation(&clflush->base.chain, - obj->base.resv, NULL, true, + obj->base.resv, true, i915_fence_timeout(to_i915(obj->base.dev)), I915_FENCE_GFP); dma_resv_add_excl_fence(obj->base.resv, &clflush->base.dma); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 68ce366f46cf..47e07179347a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -2095,7 +2095,7 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
/* Wait for all writes (and relocs) into the batch to complete */ err = i915_sw_fence_await_reservation(&pw->base.chain, - pw->batch->resv, NULL, false, + pw->batch->resv, false, 0, I915_FENCE_GFP); if (err < 0) goto err_commit; diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c index c589a681da77..91711a46b1c7 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence.c +++ b/drivers/gpu/drm/i915/i915_sw_fence.c @@ -567,7 +567,6 @@ int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
int i915_sw_fence_await_reservation(struct i915_sw_fence *fence, struct dma_resv *resv, - const struct dma_fence_ops *exclude, bool write, unsigned long timeout, gfp_t gfp) @@ -587,9 +586,6 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence, return ret;
for (i = 0; i < count; i++) { - if (shared[i]->ops == exclude) - continue; - pending = i915_sw_fence_await_dma_fence(fence, shared[i], timeout, @@ -609,7 +605,7 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence, excl = dma_resv_get_excl_unlocked(resv); }
- if (ret >= 0 && excl && excl->ops != exclude) { + if (ret >= 0 && excl) { pending = i915_sw_fence_await_dma_fence(fence, excl, timeout, diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h index 30a863353ee6..6572f01668e4 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence.h +++ b/drivers/gpu/drm/i915/i915_sw_fence.h @@ -86,7 +86,6 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
int i915_sw_fence_await_reservation(struct i915_sw_fence *fence, struct dma_resv *resv, - const struct dma_fence_ops *exclude, bool write, unsigned long timeout, gfp_t gfp);
We're lifting, or well, clarifying that the restriction that shared fences have to be strictly after the exclusive one doesn't apply anymore.
So adjust the code to always also wait for the exclusive fence.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
---
 drivers/gpu/drm/i915/i915_sw_fence.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 91711a46b1c7..271d321cea83 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -601,10 +601,10 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
 		for (i = 0; i < count; i++)
 			dma_fence_put(shared[i]);
 		kfree(shared);
-	} else {
-		excl = dma_resv_get_excl_unlocked(resv);
 	}
 
+	excl = dma_resv_get_excl_unlocked(resv);
+
 	if (ret >= 0 && excl) {
 		pending = i915_sw_fence_await_dma_fence(fence,
 							excl,
On Tue, 6 Jul 2021 at 11:12, Daniel Vetter daniel.vetter@ffwll.ch wrote:
We're lifting, or well, clarifying that the restriction that shared fences have to be strictly after the exclusive one doesn't apply anymore.
So adjust the code to always also wait for the exclusive fence.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: "Thomas Hellström" thomas.hellstrom@linux.intel.com Cc: Jason Ekstrand jason@jlekstrand.net
drivers/gpu/drm/i915/i915_sw_fence.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c index 91711a46b1c7..271d321cea83 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence.c +++ b/drivers/gpu/drm/i915/i915_sw_fence.c @@ -601,10 +601,10 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence, for (i = 0; i < count; i++) dma_fence_put(shared[i]); kfree(shared);
} else {
excl = dma_resv_get_excl_unlocked(resv); }
excl = dma_resv_get_excl_unlocked(resv);
The dma_resv_get_fences() call looks like it already fishes out the exclusive fence. Does this not leak the extra ref now?
if (ret >= 0 && excl) { pending = i915_sw_fence_await_dma_fence(fence, excl,
On Tue, Jul 6, 2021 at 2:47 PM Matthew Auld matthew.william.auld@gmail.com wrote:
On Tue, 6 Jul 2021 at 11:12, Daniel Vetter daniel.vetter@ffwll.ch wrote:
We're lifting, or well, clarifying that the restriction that shared fences have to be strictly after the exclusive one doesn't apply anymore.
So adjust the code to always also wait for the exclusive fence.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: "Thomas Hellström" thomas.hellstrom@linux.intel.com Cc: Jason Ekstrand jason@jlekstrand.net
drivers/gpu/drm/i915/i915_sw_fence.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c index 91711a46b1c7..271d321cea83 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence.c +++ b/drivers/gpu/drm/i915/i915_sw_fence.c @@ -601,10 +601,10 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence, for (i = 0; i < count; i++) dma_fence_put(shared[i]); kfree(shared);
} else {
excl = dma_resv_get_excl_unlocked(resv); }
excl = dma_resv_get_excl_unlocked(resv);
The dma_resv_get_fences() call looks like it already fishes out the exclusive fence. Does this not leak the extra ref now?
Oh right I overlooked this, we already pick up the exclusive fence unconditionally. Control flow here was a bit too clever for my parser. I'll drop this patch in the next round. -Daniel
if (ret >= 0 && excl) { pending = i915_sw_fence_await_dma_fence(fence, excl,
There's only one exclusive slot, and we must not break the ordering.
A better fix would be to use a dma_fence_chain or _array like e.g. amdgpu now uses, but it probably makes sense to lift this into dma-resv.c code as a proper concept, so that drivers don't have to hack up their own solution each on their own. Hence go with the simple fix for now.
Another option is the fence import ioctl from Jason:
https://lore.kernel.org/dri-devel/20210610210925.642582-7-jason@jlekstrand.n...
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 47e07179347a..9d717c8842e2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1775,6 +1775,7 @@ static int eb_move_to_gpu(struct i915_execbuffer *eb)
 		struct i915_vma *vma = ev->vma;
 		unsigned int flags = ev->flags;
 		struct drm_i915_gem_object *obj = vma->obj;
+		bool async, write;
 
 		assert_vma_held(vma);
 
@@ -1806,7 +1807,10 @@ static int eb_move_to_gpu(struct i915_execbuffer *eb)
 			flags &= ~EXEC_OBJECT_ASYNC;
 		}
 
-		if (err == 0 && !(flags & EXEC_OBJECT_ASYNC)) {
+		async = flags & EXEC_OBJECT_ASYNC;
+		write = flags & EXEC_OBJECT_WRITE;
+
+		if (err == 0 && (!async || write)) {
 			err = i915_request_await_object
 				(eb->request, obj, flags & EXEC_OBJECT_WRITE);
 		}
Specifically document the new/clarified rules around how the shared fences do not have any ordering requirements against the exclusive fence.
But also document all the things a bit better: given how central struct dma_resv is to dynamic buffer management, the docs have been very inadequate.
- Lots more links to other pieces of the puzzle. Unfortunately ttm_buffer_object has no docs, so no links :-(
- Explain/complain a bit about dma_resv_locking_ctx(). I still don't like that one, but fixing the ttm call chains is going to be horrible. Plus we want to plug in real slowpath locking when we do that anyway.
- Main part of the patch is some actual docs for struct dma_resv.
Overall I think we still have a lot of bad naming in this area (e.g. dma_resv.fence is singular, but contains the multiple shared fences), but I think that's more indicative of how the semantics and rules are just not great.
Another thing that's real awkward is how chaining exclusive fences right now means direct dma_resv.exclusive_fence pointer access with an rcu_assign_pointer. Not so great either.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
---
 drivers/dma-buf/dma-resv.c |  22 ++++++--
 include/linux/dma-resv.h   | 104 +++++++++++++++++++++++++++++++++++--
 2 files changed, 116 insertions(+), 10 deletions(-)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index f26c71747d43..898f8d894bbd 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -48,6 +48,8 @@ * write operations) or N shared fences (read operations). The RCU * mechanism is used to protect read access to fences from locked * write-side updates. + * + * See struct dma_resv for more details. */
DEFINE_WD_CLASS(reservation_ww_class); @@ -137,7 +139,11 @@ EXPORT_SYMBOL(dma_resv_fini); * @num_fences: number of fences we want to add * * Should be called before dma_resv_add_shared_fence(). Must - * be called with obj->lock held. + * be called with @obj locked through dma_resv_lock(). + * + * Note that the preallocated slots need to be re-reserved if @obj is unlocked + * at any time before callind dma_resv_add_shared_fence(). This is validate when + * CONFIG_DEBUG_MUTEXES is enabled. * * RETURNS * Zero for success, or -errno @@ -234,8 +240,10 @@ EXPORT_SYMBOL(dma_resv_reset_shared_max); * @obj: the reservation object * @fence: the shared fence to add * - * Add a fence to a shared slot, obj->lock must be held, and + * Add a fence to a shared slot, @obj must be locked with dma_resv_lock(), and * dma_resv_reserve_shared() has been called. + * + * See also &dma_resv.fence for a discussion of the semantics. */ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence) { @@ -280,7 +288,9 @@ EXPORT_SYMBOL(dma_resv_add_shared_fence); * @obj: the reservation object * @fence: the shared fence to add * - * Add a fence to the exclusive slot. The obj->lock must be held. + * Add a fence to the exclusive slot. @obj must be locked with dma_resv_lock(). + * Note that this function replaces all fences attached to @obj, see also + * &dma_resv.fence_excl for a discussion of the semantics. */ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence) { @@ -609,9 +619,11 @@ static inline int dma_resv_test_signaled_single(struct dma_fence *passed_fence) * fence * * Callers are not required to hold specific locks, but maybe hold - * dma_resv_lock() already + * dma_resv_lock() already. + * * RETURNS - * true if all fences signaled, else false + * + * True if all fences signaled, else false. */ bool dma_resv_test_signaled(struct dma_resv *obj, bool test_all) { diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index e1ca2080a1ff..c77fd54d033f 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -62,16 +62,90 @@ struct dma_resv_list {
/** * struct dma_resv - a reservation object manages fences for a buffer - * @lock: update side lock - * @seq: sequence count for managing RCU read-side synchronization - * @fence_excl: the exclusive fence, if there is one currently - * @fence: list of current shared fences + * + * There are multiple uses for this, with sometimes slightly different rules in + * how the fence slots are used. + * + * One use is to synchronize cross-driver access to a struct dma_buf, either for + * dynamic buffer management or just to handle implicit synchronization between + * different users of the buffer in userspace. See &dma_buf.resv for a more + * in-depth discussion. + * + * The other major use is to manage access and locking within a driver in a + * buffer based memory manager. struct ttm_buffer_object is the canonical + * example here, since this is were reservation objects originated from. But use + * in drivers is spreading and some drivers also manage struct + * drm_gem_object with the same scheme. */ struct dma_resv { + /** + * @lock: + * + * Update side lock. Don't use directly, instead use the wrapper + * functions like dma_resv_lock() and dma_resv_unlock(). + * + * Drivers which use the reservation object to manage memory dynamically + * also use this lock to protect buffer object state like placement, + * allocation policies or throughout command submission. + */ struct ww_mutex lock; + + /** + * @seq: + * + * Sequence count for managing RCU read-side synchronization, allows + * read-only access to @fence_excl and @fence while ensuring we take a + * consistent snapshot. + */ seqcount_ww_mutex_t seq;
+ /** + * @fence_excl: + * + * The exclusive fence, if there is one currently. + * + * There are two was to update this fence: + * + * - First by calling dma_resv_add_excl_fence(), which replaces all + * fences attached to the reservation object. To guarantee that no + * fences are lost this new fence must signal only after all previous + * fences, both shared and exclusive, have signalled. In some cases it + * is convenient to achieve that by attaching a struct dma_fence_array + * with all the new and old fences. + * + * - Alternatively the fence can be set directly, which leaves the + * shared fences unchanged. To guarantee that no fences are lost this + * new fence must signale only after the previous exclusive fence has + * singalled. Since the shared fences are staying intact, it is not + * necessary to maintain any ordering against those. If semantically + * only a new access is added without actually treating the previous + * one as a dependency the exclusive fences can be strung together + * using struct dma_fence_chain. + * + * Note that actual semantics of what an exclusive or shared fence mean + * is defined by the user, for reservation objects shared across drivers + * see &dma_buf.resv. + */ struct dma_fence __rcu *fence_excl; + + /** + * @fence: + * + * List of current shared fences. + * + * There are no ordering constraints of shared fences against the + * exclusive fence slot. If a waiter needs to wait for all access, it + * has to wait for both set of fences to signal. + * + * A new fence is added by calling dma_resv_add_shared_fence(). Since + * this often needs to be done past the point of no return in command + * submission it cannot fail, and therefor sufficient slots need to be + * reserved by calling dma_resv_reserve_shared(). + * + * Note that actual semantics of what an exclusive or shared fence mean + * is defined by the user, for reservation objects shared across drivers + * see &dma_buf.resv. + */ struct dma_resv_list __rcu *fence; };
@@ -98,6 +172,13 @@ static inline void dma_resv_reset_shared_max(struct dma_resv *obj) {} * undefined order, a #ww_acquire_ctx is passed to unwind if a cycle * is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation * object may be locked by itself by passing NULL as @ctx. + * + * When a die situation is indicated by returning -EDEADLK all locks held by + * @ctx must be unlocked and then dma_resv_lock_slow() called on @obj. + * + * Unlocked by calling dma_resv_lock(). + * + * See also dma_resv_lock_interruptible() for the interruptible variant. */ static inline int dma_resv_lock(struct dma_resv *obj, struct ww_acquire_ctx *ctx) @@ -119,6 +200,12 @@ static inline int dma_resv_lock(struct dma_resv *obj, * undefined order, a #ww_acquire_ctx is passed to unwind if a cycle * is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation * object may be locked by itself by passing NULL as @ctx. + * + * When a die situation is indicated by returning -EDEADLK all locks held by + * @ctx must be unlocked and then dma_resv_lock_slow_interruptible() called on + * @obj. + * + * Unlocked by calling dma_resv_lock(). */ static inline int dma_resv_lock_interruptible(struct dma_resv *obj, struct ww_acquire_ctx *ctx) @@ -134,6 +221,8 @@ static inline int dma_resv_lock_interruptible(struct dma_resv *obj, * Acquires the reservation object after a die case. This function * will sleep until the lock becomes available. See dma_resv_lock() as * well. + * + * See also dma_resv_lock_slow_interruptible() for the interruptible variant. */ static inline void dma_resv_lock_slow(struct dma_resv *obj, struct ww_acquire_ctx *ctx) @@ -167,7 +256,7 @@ static inline int dma_resv_lock_slow_interruptible(struct dma_resv *obj, * if they overlap with a writer. * * Also note that since no context is provided, no deadlock protection is - * possible. + * possible, which is also not needed for a trylock. * * Returns true if the lock was acquired, false otherwise. */ @@ -193,6 +282,11 @@ static inline bool dma_resv_is_locked(struct dma_resv *obj) * * Returns the context used to lock a reservation object or NULL if no context * was used or the object is not locked at all. + * + * WARNING: This interface is pretty horrible, but TTM needs it because it + * doesn't pass the struct ww_acquire_ctx around in some very long callchains. + * Everyone else just uses it to check whether they're holding a reservation or + * not. */ static inline struct ww_acquire_ctx *dma_resv_locking_ctx(struct dma_resv *obj) {
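As a side note for readers less familiar with the ww_mutex semantics the new kerneldoc describes, the -EDEADLK handling it asks for looks roughly like the sketch below for the simple two-object case. This is purely illustrative and not part of the patch; the function name is made up.

/* illustration only: lock two reservation objects with backoff handling */
static void example_lock_pair(struct dma_resv *a, struct dma_resv *b)
{
	struct ww_acquire_ctx ctx;
	struct dma_resv *contended = NULL;

	ww_acquire_init(&ctx, &reservation_ww_class);
retry:
	if (contended)
		/* sleep until the lock we backed off from is free again */
		dma_resv_lock_slow(contended, &ctx);

	if (contended != a && dma_resv_lock(a, &ctx) == -EDEADLK) {
		if (contended)
			dma_resv_unlock(contended);
		contended = a;
		goto retry;
	}

	if (contended != b && dma_resv_lock(b, &ctx) == -EDEADLK) {
		dma_resv_unlock(a);
		contended = b;
		goto retry;
	}

	ww_acquire_done(&ctx);

	/* ... add fences, update placement, etc. on both objects ... */

	dma_resv_unlock(a);
	dma_resv_unlock(b);
	ww_acquire_fini(&ctx);
}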
On Tue, 6 Jul 2021 at 11:12, Daniel Vetter daniel.vetter@ffwll.ch wrote:
Specifically document the new/clarified rules around how the shared fences do not have any ordering requirements against the exclusive fence.
But also document all the things a bit better: given how central struct dma_resv is to dynamic buffer management, the docs have been very inadequate.
Lots more links to other pieces of the puzzle. Unfortunately ttm_buffer_object has no docs, so no links :-(
Explain/complain a bit about dma_resv_locking_ctx(). I still don't like that one, but fixing the ttm call chains is going to be horrible. Plus we want to plug in real slowpath locking when we do that anyway.
Main part of the patch is some actual docs for struct dma_resv.
Overall I think we still have a lot of bad naming in this area (e.g. dma_resv.fence is singular, but contains the multiple shared fences), but I think that's more indicative of how the semantics and rules are just not great.
Another thing that's real awkward is how chaining exclusive fences right now means direct dma_resv.exclusive_fence pointer access with an rcu_assign_pointer. Not so great either.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Sumit Semwal sumit.semwal@linaro.org Cc: "Christian König" christian.koenig@amd.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
drivers/dma-buf/dma-resv.c | 22 ++++++-- include/linux/dma-resv.h | 104 +++++++++++++++++++++++++++++++++++-- 2 files changed, 116 insertions(+), 10 deletions(-)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index f26c71747d43..898f8d894bbd 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -48,6 +48,8 @@
- write operations) or N shared fences (read operations). The RCU
- mechanism is used to protect read access to fences from locked
- write-side updates.
*/
- See struct dma_resv for more details.
DEFINE_WD_CLASS(reservation_ww_class); @@ -137,7 +139,11 @@ EXPORT_SYMBOL(dma_resv_fini);
- @num_fences: number of fences we want to add
- Should be called before dma_resv_add_shared_fence(). Must
- be called with obj->lock held.
- be called with @obj locked through dma_resv_lock().
- Note that the preallocated slots need to be re-reserved if @obj is unlocked
- at any time before callind dma_resv_add_shared_fence(). This is validate when
s/callind/calling s/validate/validated
- CONFIG_DEBUG_MUTEXES is enabled.
- RETURNS
- Zero for success, or -errno
@@ -234,8 +240,10 @@ EXPORT_SYMBOL(dma_resv_reset_shared_max);
- @obj: the reservation object
- @fence: the shared fence to add
- Add a fence to a shared slot, obj->lock must be held, and
- Add a fence to a shared slot, @obj must be locked with dma_resv_lock(), and
- dma_resv_reserve_shared() has been called.
*/
- See also &dma_resv.fence for a discussion of the semantics.
void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence) { @@ -280,7 +288,9 @@ EXPORT_SYMBOL(dma_resv_add_shared_fence);
- @obj: the reservation object
- @fence: the shared fence to add
"the exclusive fence", or perhaps "the fence to add to the exclusive slot"?
- Add a fence to the exclusive slot. The obj->lock must be held.
- Add a fence to the exclusive slot. @obj must be locked with dma_resv_lock().
- Note that this function replaces all fences attached to @obj, see also
*/
- &dma_resv.fence_excl for a discussion of the semantics.
void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence) { @@ -609,9 +619,11 @@ static inline int dma_resv_test_signaled_single(struct dma_fence *passed_fence)
- fence
- Callers are not required to hold specific locks, but maybe hold
- dma_resv_lock() already
- dma_resv_lock() already.
- RETURNS
- true if all fences signaled, else false
*/
- True if all fences signaled, else false.
bool dma_resv_test_signaled(struct dma_resv *obj, bool test_all) { diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index e1ca2080a1ff..c77fd54d033f 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -62,16 +62,90 @@ struct dma_resv_list {
/**
- struct dma_resv - a reservation object manages fences for a buffer
- @lock: update side lock
- @seq: sequence count for managing RCU read-side synchronization
- @fence_excl: the exclusive fence, if there is one currently
- @fence: list of current shared fences
- There are multiple uses for this, with sometimes slightly different rules in
- how the fence slots are used.
- One use is to synchronize cross-driver access to a struct dma_buf, either for
- dynamic buffer management or just to handle implicit synchronization between
- different users of the buffer in userspace. See &dma_buf.resv for a more
- in-depth discussion.
- The other major use is to manage access and locking within a driver in a
- buffer based memory manager. struct ttm_buffer_object is the canonical
- example here, since this is were reservation objects originated from. But use
s/were/where
- in drivers is spreading and some drivers also manage struct
*/
- drm_gem_object with the same scheme.
struct dma_resv {
/**
* @lock:
*
* Update side lock. Don't use directly, instead use the wrapper
* functions like dma_resv_lock() and dma_resv_unlock().
*
* Drivers which use the reservation object to manage memory dynamically
* also use this lock to protect buffer object state like placement,
* allocation policies or throughout command submission.
*/ struct ww_mutex lock;
/**
* @seq:
*
* Sequence count for managing RCU read-side synchronization, allows
* read-only access to @fence_excl and @fence while ensuring we take a
* consistent snapshot.
*/ seqcount_ww_mutex_t seq;
/**
* @fence_excl:
*
* The exclusive fence, if there is one currently.
*
* There are two was to update this fence:
s/was/ways
*
* - First by calling dma_resv_add_excl_fence(), which replaces all
* fences attached to the reservation object. To guarantee that no
* fences are lost this new fence must signal only after all previous
* fences, both shared and exclusive, have signalled. In some cases it/
Random slash at the end
* is convenient to achieve that by attaching a struct dma_fence_array
* with all the new and old fences.
*
* - Alternatively the fence can be set directly, which leaves the
* shared fences unchanged. To guarantee that no fences are lost this
* new fence must signale only after the previous exclusive fence has
s/signale/signal
* singalled. Since the shared fences are staying intact, it is not
s/singalled/signalled
* necessary to maintain any ordering against those. If semantically
* only a new access is added without actually treating the previous
* one as a dependency the exclusive fences can be strung together
* using struct dma_fence_chain.
*
* Note that actual semantics of what an exclusive or shared fence mean
* is defined by the user, for reservation objects shared across drivers
* see &dma_buf.resv.
*/ struct dma_fence __rcu *fence_excl;
/**
* @fence:
*
* List of current shared fences.
*
* There are no ordering constraints of shared fences against the
* exclusive fence slot. If a waiter needs to wait for all access, it
* has to wait for both set of fences to signal.
*
* A new fence is added by calling dma_resv_add_shared_fence(). Since
* this often needs to be done past the point of no return in command
* submission it cannot fail, and therefor sufficient slots need to be
s/therefor/therefore
* reserved by calling dma_resv_reserve_shared().
*
* Note that actual semantics of what an exclusive or shared fence mean
* is defined by the user, for reservation objects shared across drivers
* see &dma_buf.resv.
*/ struct dma_resv_list __rcu *fence;
};
@@ -98,6 +172,13 @@ static inline void dma_resv_reset_shared_max(struct dma_resv *obj) {}
- undefined order, a #ww_acquire_ctx is passed to unwind if a cycle
- is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation
- object may be locked by itself by passing NULL as @ctx.
- When a die situation is indicated by returning -EDEADLK all locks held by
- @ctx must be unlocked and then dma_resv_lock_slow() called on @obj.
- Unlocked by calling dma_resv_lock().
dma_resv_unlock()
*/
- See also dma_resv_lock_interruptible() for the interruptible variant.
static inline int dma_resv_lock(struct dma_resv *obj, struct ww_acquire_ctx *ctx) @@ -119,6 +200,12 @@ static inline int dma_resv_lock(struct dma_resv *obj,
- undefined order, a #ww_acquire_ctx is passed to unwind if a cycle
- is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation
- object may be locked by itself by passing NULL as @ctx.
- When a die situation is indicated by returning -EDEADLK all locks held by
- @ctx must be unlocked and then dma_resv_lock_slow_interruptible() called on
- @obj.
- Unlocked by calling dma_resv_lock().
dma_resv_unlock()
fwiw, Reviewed-by: Matthew Auld matthew.auld@intel.com
*/ static inline int dma_resv_lock_interruptible(struct dma_resv *obj, struct ww_acquire_ctx *ctx) @@ -134,6 +221,8 @@ static inline int dma_resv_lock_interruptible(struct dma_resv *obj,
- Acquires the reservation object after a die case. This function
- will sleep until the lock becomes available. See dma_resv_lock() as
- well.
*/
- See also dma_resv_lock_slow_interruptible() for the interruptible variant.
static inline void dma_resv_lock_slow(struct dma_resv *obj, struct ww_acquire_ctx *ctx) @@ -167,7 +256,7 @@ static inline int dma_resv_lock_slow_interruptible(struct dma_resv *obj,
- if they overlap with a writer.
- Also note that since no context is provided, no deadlock protection is
- possible.
*/
- possible, which is also not needed for a trylock.
- Returns true if the lock was acquired, false otherwise.
@@ -193,6 +282,11 @@ static inline bool dma_resv_is_locked(struct dma_resv *obj)
- Returns the context used to lock a reservation object or NULL if no context
- was used or the object is not locked at all.
- WARNING: This interface is pretty horrible, but TTM needs it because it
- doesn't pass the struct ww_acquire_ctx around in some very long callchains.
- Everyone else just uses it to check whether they're holding a reservation or
*/
- not.
static inline struct ww_acquire_ctx *dma_resv_locking_ctx(struct dma_resv *obj) {
On Tue, Jul 6, 2021 at 5:12 AM Daniel Vetter daniel.vetter@ffwll.ch wrote:
Specifically document the new/clarified rules around how the shared fences do not have any ordering requirements against the exclusive fence.
But also document all the things a bit better: given how central struct dma_resv is to dynamic buffer management, the docs have been very inadequate.
Lots more links to other pieces of the puzzle. Unfortunately ttm_buffer_object has no docs, so no links :-(
Explain/complain a bit about dma_resv_locking_ctx(). I still don't like that one, but fixing the ttm call chains is going to be horrible. Plus we want to plug in real slowpath locking when we do that anyway.
Main part of the patch is some actual docs for struct dma_resv.
Overall I think we still have a lot of bad naming in this area (e.g. dma_resv.fence is singular, but contains the multiple shared fences), but I think that's more indicative of how the semantics and rules are just not great.
Another thing that's real awkward is how chaining exclusive fences right now means direct dma_resv.exclusive_fence pointer access with an rcu_assign_pointer. Not so great either.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Sumit Semwal sumit.semwal@linaro.org Cc: "Christian König" christian.koenig@amd.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
drivers/dma-buf/dma-resv.c | 22 ++++++-- include/linux/dma-resv.h | 104 +++++++++++++++++++++++++++++++++++-- 2 files changed, 116 insertions(+), 10 deletions(-)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index f26c71747d43..898f8d894bbd 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -48,6 +48,8 @@
- write operations) or N shared fences (read operations). The RCU
- mechanism is used to protect read access to fences from locked
- write-side updates.
*/
- See struct dma_resv for more details.
DEFINE_WD_CLASS(reservation_ww_class); @@ -137,7 +139,11 @@ EXPORT_SYMBOL(dma_resv_fini);
- @num_fences: number of fences we want to add
- Should be called before dma_resv_add_shared_fence(). Must
- be called with obj->lock held.
- be called with @obj locked through dma_resv_lock().
- Note that the preallocated slots need to be re-reserved if @obj is unlocked
- at any time before callind dma_resv_add_shared_fence(). This is validate when
validated
- CONFIG_DEBUG_MUTEXES is enabled.
- RETURNS
- Zero for success, or -errno
@@ -234,8 +240,10 @@ EXPORT_SYMBOL(dma_resv_reset_shared_max);
- @obj: the reservation object
- @fence: the shared fence to add
- Add a fence to a shared slot, obj->lock must be held, and
- Add a fence to a shared slot, @obj must be locked with dma_resv_lock(), and
- dma_resv_reserve_shared() has been called.
*/
- See also &dma_resv.fence for a discussion of the semantics.
void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence) { @@ -280,7 +288,9 @@ EXPORT_SYMBOL(dma_resv_add_shared_fence);
- @obj: the reservation object
- @fence: the shared fence to add
- Add a fence to the exclusive slot. The obj->lock must be held.
- Add a fence to the exclusive slot. @obj must be locked with dma_resv_lock().
- Note that this function replaces all fences attached to @obj, see also
*/
- &dma_resv.fence_excl for a discussion of the semantics.
void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence) { @@ -609,9 +619,11 @@ static inline int dma_resv_test_signaled_single(struct dma_fence *passed_fence)
- fence
- Callers are not required to hold specific locks, but maybe hold
- dma_resv_lock() already
- dma_resv_lock() already.
- RETURNS
- true if all fences signaled, else false
*/
- True if all fences signaled, else false.
bool dma_resv_test_signaled(struct dma_resv *obj, bool test_all) { diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index e1ca2080a1ff..c77fd54d033f 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -62,16 +62,90 @@ struct dma_resv_list {
/**
- struct dma_resv - a reservation object manages fences for a buffer
- @lock: update side lock
- @seq: sequence count for managing RCU read-side synchronization
- @fence_excl: the exclusive fence, if there is one currently
- @fence: list of current shared fences
- There are multiple uses for this, with sometimes slightly different rules in
- how the fence slots are used.
- One use is to synchronize cross-driver access to a struct dma_buf, either for
- dynamic buffer management or just to handle implicit synchronization between
- different users of the buffer in userspace. See &dma_buf.resv for a more
- in-depth discussion.
- The other major use is to manage access and locking within a driver in a
- buffer based memory manager. struct ttm_buffer_object is the canonical
- example here, since this is were reservation objects originated from. But use
s/were/where/
- in drivers is spreading and some drivers also manage struct
*/
- drm_gem_object with the same scheme.
struct dma_resv {
/**
* @lock:
*
* Update side lock. Don't use directly, instead use the wrapper
* functions like dma_resv_lock() and dma_resv_unlock().
*
* Drivers which use the reservation object to manage memory dynamically
* also use this lock to protect buffer object state like placement,
* allocation policies or throughout command submission.
*/ struct ww_mutex lock;
/**
* @seq:
*
* Sequence count for managing RCU read-side synchronization, allows
* read-only access to @fence_excl and @fence while ensuring we take a
* consistent snapshot.
*/ seqcount_ww_mutex_t seq;
/**
* @fence_excl:
*
* The exclusive fence, if there is one currently.
*
* There are two was to update this fence:
*
* - First by calling dma_resv_add_excl_fence(), which replaces all
* fences attached to the reservation object. To guarantee that no
* fences are lost this new fence must signal only after all previous
s/lost this/lost, this/
* fences, both shared and exclusive, have signalled. In some cases it
* is convenient to achieve that by attaching a struct dma_fence_array
* with all the new and old fences.
*
* - Alternatively the fence can be set directly, which leaves the
* shared fences unchanged. To guarantee that no fences are lost this
* new fence must signale only after the previous exclusive fence has
s/signale/signal/
* singalled. Since the shared fences are staying intact, it is not
* necessary to maintain any ordering against those. If semantically
* only a new access is added without actually treating the previous
* one as a dependency the exclusive fences can be strung together
* using struct dma_fence_chain.
*
* Note that actual semantics of what an exclusive or shared fence mean
* is defined by the user, for reservation objects shared across drivers
* see &dma_buf.resv.
*/ struct dma_fence __rcu *fence_excl;
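(Aside: the "set directly" variant described above, strung together with a struct dma_fence_chain, would look roughly like the sketch below. Untested illustration only, not code from any driver: the function name is made up, and the seqcount update plus the reference bookkeeping a real update needs are deliberately glossed over.)

#include <linux/dma-fence.h>
#include <linux/dma-fence-chain.h>
#include <linux/dma-resv.h>
#include <linux/slab.h>

/*
 * Rough sketch: append @fence behind the current exclusive fence with a
 * dma_fence_chain, leaving the shared fences untouched. Caller holds
 * dma_resv_lock(). The raw pointer write at the end is exactly the awkward
 * bit the commit message complains about; seqcount handling is omitted.
 */
static int sketch_chain_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
{
        struct dma_fence *old = rcu_dereference_protected(obj->fence_excl,
                                                          dma_resv_held(obj));
        struct dma_fence_chain *chain;

        chain = kmalloc(sizeof(*chain), GFP_KERNEL);
        if (!chain)
                return -ENOMEM;

        /*
         * The chain inherits the reservation object's reference to @old and
         * takes a new one on @fence; the seqno is arbitrary for this sketch.
         */
        dma_fence_chain_init(chain, old, dma_fence_get(fence), 1);

        rcu_assign_pointer(obj->fence_excl, &chain->base);
        return 0;
}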
/**
* @fence:
*
* List of current shared fences.
*
* There are no ordering constraints of shared fences against the
* exclusive fence slot. If a waiter needs to wait for all access, it
* has to wait for both set of fences to signal.
s/set of fences/sets of fences/
*
* A new fence is added by calling dma_resv_add_shared_fence(). Since
* this often needs to be done past the point of no return in command
* submission it cannot fail, and therefor sufficient slots need to be
s/therefor/therefore/
* reserved by calling dma_resv_reserve_shared().
*
* Note that actual semantics of what an exclusive or shared fence mean
* is defined by the user, for reservation objects shared across drivers
* see &dma_buf.resv.
*/ struct dma_resv_list __rcu *fence;
};
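(Another aside, tying the @fence_excl and @fence rules together: a driver's submission path following the text above would roughly do the below. Untested sketch; the function names and the is_write flag are invented for illustration, the dma_resv_* calls are the ones documented in this patch.)

#include <linux/dma-fence.h>
#include <linux/dma-resv.h>

/*
 * Rough sketch of attaching a job fence at submission time: reserve a shared
 * slot while failing is still allowed, then add the fence past the point of
 * no return. For the write case the new exclusive fence must, per the
 * @fence_excl rules above, not signal before all fences it replaces.
 */
static int sketch_attach_job_fence(struct dma_resv *resv,
                                   struct dma_fence *job_fence, bool is_write)
{
        int ret;

        ret = dma_resv_lock(resv, NULL);
        if (ret)
                return ret;

        if (!is_write) {
                /* must happen before the point of no return, it can fail */
                ret = dma_resv_reserve_shared(resv, 1);
                if (ret)
                        goto unlock;
        }

        /* ... point of no return, the job is committed to the hardware ... */

        if (is_write)
                dma_resv_add_excl_fence(resv, job_fence);
        else
                dma_resv_add_shared_fence(resv, job_fence);

unlock:
        dma_resv_unlock(resv);
        return ret;
}

/* A waiter that cares about all access must check both sets of fences. */
static bool sketch_buffer_idle(struct dma_resv *resv)
{
        return dma_resv_test_signaled(resv, true /* test_all */);
}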
@@ -98,6 +172,13 @@ static inline void dma_resv_reset_shared_max(struct dma_resv *obj) {}
- undefined order, a #ww_acquire_ctx is passed to unwind if a cycle
- is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation
- object may be locked by itself by passing NULL as @ctx.
- When a die situation is indicated by returning -EDEADLK all locks held by
- @ctx must be unlocked and then dma_resv_lock_slow() called on @obj.
- Unlocked by calling dma_resv_lock().
dma_resv_unlock()?
- See also dma_resv_lock_interruptible() for the interruptible variant.
*/
static inline int dma_resv_lock(struct dma_resv *obj, struct ww_acquire_ctx *ctx) @@ -119,6 +200,12 @@ static inline int dma_resv_lock(struct dma_resv *obj,
- undefined order, a #ww_acquire_ctx is passed to unwind if a cycle
- is detected. See ww_mutex_lock() and ww_acquire_init(). A reservation
- object may be locked by itself by passing NULL as @ctx.
- When a die situation is indicated by returning -EDEADLK all locks held by
- @ctx must be unlocked and then dma_resv_lock_slow_interruptible() called on
- @obj.
- Unlocked by calling dma_resv_lock().
dma_resv_unlock()?
*/ static inline int dma_resv_lock_interruptible(struct dma_resv *obj, struct ww_acquire_ctx *ctx) @@ -134,6 +221,8 @@ static inline int dma_resv_lock_interruptible(struct dma_resv *obj,
- Acquires the reservation object after a die case. This function
- will sleep until the lock becomes available. See dma_resv_lock() as
- well.
- See also dma_resv_lock_slow_interruptible() for the interruptible variant.
*/
static inline void dma_resv_lock_slow(struct dma_resv *obj, struct ww_acquire_ctx *ctx) @@ -167,7 +256,7 @@ static inline int dma_resv_lock_slow_interruptible(struct dma_resv *obj,
- if they overlap with a writer.
- Also note that since no context is provided, no deadlock protection is
- possible.
- possible, which is also not needed for a trylock.
- Returns true if the lock was acquired, false otherwise.
*/
@@ -193,6 +282,11 @@ static inline bool dma_resv_is_locked(struct dma_resv *obj)
- Returns the context used to lock a reservation object or NULL if no context
- was used or the object is not locked at all.
- WARNING: This interface is pretty horrible, but TTM needs it because it
- doesn't pass the struct ww_acquire_ctx around in some very long callchains.
- Everyone else just uses it to check whether they're holding a reservation or
- not.
*/
static inline struct ww_acquire_ctx *dma_resv_locking_ctx(struct dma_resv *obj) { -- 2.32.0
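P.S.: since the -EDEADLK dance documented for dma_resv_lock()/dma_resv_lock_slow() above is easy to get wrong, roughly what it looks like for two buffers (untested sketch, the function name is made up, everything else is the real API):

#include <linux/bug.h>
#include <linux/dma-resv.h>
#include <linux/kernel.h>
#include <linux/ww_mutex.h>

/*
 * Rough sketch of the documented backoff: on -EDEADLK drop every lock held
 * under the acquire context, reacquire the contended object with the slowpath
 * variant, then retry in the new order. Both buffers stay locked on return;
 * the caller unlocks them and calls ww_acquire_fini(ctx).
 */
static void sketch_lock_both(struct dma_resv *a, struct dma_resv *b,
                             struct ww_acquire_ctx *ctx)
{
        struct dma_resv *first = a, *second = b;
        int ret;

        ww_acquire_init(ctx, &reservation_ww_class);

        /* the first lock in a fresh context cannot hit -EDEADLK */
        ret = dma_resv_lock(first, ctx);
        WARN_ON(ret);

        while ((ret = dma_resv_lock(second, ctx)) == -EDEADLK) {
                dma_resv_unlock(first);
                dma_resv_lock_slow(second, ctx);
                swap(first, second);
        }
        WARN_ON(ret);

        ww_acquire_done(ctx);
}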
On 06.07.21 12:12, Daniel Vetter wrote:
Specifically document the new/clarified rules around how the shared fences do not have any ordering requirements against the exclusive fence.
But also document all the things a bit better, given how central struct dma_resv is to dynamic buffer management, the docs have been very inadequate.
Lots more links to other pieces of the puzzle. Unfortunately ttm_buffer_object has no docs, so no links :-(
Explain/complain a bit about dma_resv_locking_ctx(). I still don't like that one, but fixing the ttm call chains is going to be horrible. Plus we want to plug in real slowpath locking when we do that anyway.
Main part of the patch is some actual docs for struct dma_resv.
Overall I think we still have a lot of bad naming in this area (e.g. dma_resv.fence is singular, but contains the multiple shared fences), but I think that's more indicative of how the semantics and rules are just not great.
Another thing that's really awkward is how chaining exclusive fences right now means direct dma_resv.fence_excl pointer access with an rcu_assign_pointer. Not so great either.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Sumit Semwal sumit.semwal@linaro.org Cc: "Christian König" christian.koenig@amd.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
drivers/dma-buf/dma-resv.c | 22 ++++++-- include/linux/dma-resv.h | 104 +++++++++++++++++++++++++++++++++++-- 2 files changed, 116 insertions(+), 10 deletions(-)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index f26c71747d43..898f8d894bbd 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -48,6 +48,8 @@
- write operations) or N shared fences (read operations). The RCU
- mechanism is used to protect read access to fences from locked
- write-side updates.
- See struct dma_resv for more details.
*/
DEFINE_WD_CLASS(reservation_ww_class);
@@ -137,7 +139,11 @@ EXPORT_SYMBOL(dma_resv_fini);
- @num_fences: number of fences we want to add
- Should be called before dma_resv_add_shared_fence(). Must
- be called with obj->lock held.
- be called with @obj locked through dma_resv_lock().
- Note that the preallocated slots need to be re-reserved if @obj is unlocked
- at any time before callind dma_resv_add_shared_fence(). This is validate when
- CONFIG_DEBUG_MUTEXES is enabled.
- RETURNS
- Zero for success, or -errno
@@ -234,8 +240,10 @@ EXPORT_SYMBOL(dma_resv_reset_shared_max);
- @obj: the reservation object
- @fence: the shared fence to add
- Add a fence to a shared slot, obj->lock must be held, and
- Add a fence to a shared slot, @obj must be locked with dma_resv_lock(), and
- dma_resv_reserve_shared() has been called.
- See also &dma_resv.fence for a discussion of the semantics.
*/ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence) {
@@ -280,7 +288,9 @@ EXPORT_SYMBOL(dma_resv_add_shared_fence);
- @obj: the reservation object
- @fence: the shared fence to add
- Add a fence to the exclusive slot. The obj->lock must be held.
- Add a fence to the exclusive slot. @obj must be locked with dma_resv_lock().
- Note that this function replaces all fences attached to @obj, see also
- &dma_resv.fence_excl for a discussion of the semantics.
*/ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence) {
@@ -609,9 +619,11 @@ static inline int dma_resv_test_signaled_single(struct dma_fence *passed_fence)
- fence
- Callers are not required to hold specific locks, but maybe hold
- dma_resv_lock() already
- dma_resv_lock() already.
- RETURNS
- true if all fences signaled, else false
- True if all fences signaled, else false.
*/ bool dma_resv_test_signaled(struct dma_resv *obj, bool test_all) {
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index e1ca2080a1ff..c77fd54d033f 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -62,16 +62,90 @@ struct dma_resv_list {
/**
- struct dma_resv - a reservation object manages fences for a buffer
- @lock: update side lock
- @seq: sequence count for managing RCU read-side synchronization
- @fence_excl: the exclusive fence, if there is one currently
- @fence: list of current shared fences
- There are multiple uses for this, with sometimes slightly different rules in
- how the fence slots are used.
- One use is to synchronize cross-driver access to a struct dma_buf, either for
- dynamic buffer management or just to handle implicit synchronization between
- different users of the buffer in userspace. See &dma_buf.resv for a more
- in-depth discussion.
- The other major use is to manage access and locking within a driver in a
- buffer based memory manager. struct ttm_buffer_object is the canonical
- example here, since this is were reservation objects originated from. But use
- in drivers is spreading and some drivers also manage struct
- drm_gem_object with the same scheme.
I would still make that even harder, e.g. mentioning that you run into use after free and the resulting memory corruption if you don't obey the rules.
Apart from that with the spelling stuff pointed out by others fixed the patch is Reviewed-by: Christian König christian.koenig@amd.com
Regards, Christian.
On Wed, Jul 7, 2021 at 10:06 AM Christian König ckoenig.leichtzumerken@gmail.com wrote:
On 06.07.21 12:12, Daniel Vetter wrote:
Specifically document the new/clarified rules around how the shared fences do not have any ordering requirements against the exclusive fence.
But also document all the things a bit better, given how central struct dma_resv is to dynamic buffer management, the docs have been very inadequate.
Lots more links to other pieces of the puzzle. Unfortunately ttm_buffer_object has no docs, so no links :-(
Explain/complain a bit about dma_resv_locking_ctx(). I still don't like that one, but fixing the ttm call chains is going to be horrible. Plus we want to plug in real slowpath locking when we do that anyway.
Main part of the patch is some actual docs for struct dma_resv.
Overall I think we still have a lot of bad naming in this area (e.g. dma_resv.fence is singular, but contains the multiple shared fences), but I think that's more indicative of how the semantics and rules are just not great.
Another thing that's really awkward is how chaining exclusive fences right now means direct dma_resv.fence_excl pointer access with an rcu_assign_pointer. Not so great either.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Sumit Semwal sumit.semwal@linaro.org Cc: "Christian König" christian.koenig@amd.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org
drivers/dma-buf/dma-resv.c | 22 ++++++-- include/linux/dma-resv.h | 104 +++++++++++++++++++++++++++++++++++-- 2 files changed, 116 insertions(+), 10 deletions(-)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index f26c71747d43..898f8d894bbd 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -48,6 +48,8 @@
- write operations) or N shared fences (read operations). The RCU
- mechanism is used to protect read access to fences from locked
- write-side updates.
- See struct dma_resv for more details.
*/
DEFINE_WD_CLASS(reservation_ww_class);
@@ -137,7 +139,11 @@ EXPORT_SYMBOL(dma_resv_fini);
- @num_fences: number of fences we want to add
- Should be called before dma_resv_add_shared_fence(). Must
- be called with obj->lock held.
- be called with @obj locked through dma_resv_lock().
- Note that the preallocated slots need to be re-reserved if @obj is unlocked
- at any time before callind dma_resv_add_shared_fence(). This is validate when
- CONFIG_DEBUG_MUTEXES is enabled.
- RETURNS
- Zero for success, or -errno
@@ -234,8 +240,10 @@ EXPORT_SYMBOL(dma_resv_reset_shared_max);
- @obj: the reservation object
- @fence: the shared fence to add
- Add a fence to a shared slot, obj->lock must be held, and
- Add a fence to a shared slot, @obj must be locked with dma_resv_lock(), and
- dma_resv_reserve_shared() has been called.
- See also &dma_resv.fence for a discussion of the semantics.
*/ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence) {
@@ -280,7 +288,9 @@ EXPORT_SYMBOL(dma_resv_add_shared_fence);
- @obj: the reservation object
- @fence: the shared fence to add
- Add a fence to the exclusive slot. The obj->lock must be held.
- Add a fence to the exclusive slot. @obj must be locked with dma_resv_lock().
- Note that this function replaces all fences attached to @obj, see also
- &dma_resv.fence_excl for a discussion of the semantics.
*/ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence) {
@@ -609,9 +619,11 @@ static inline int dma_resv_test_signaled_single(struct dma_fence *passed_fence)
- fence
- Callers are not required to hold specific locks, but maybe hold
- dma_resv_lock() already
- dma_resv_lock() already.
- RETURNS
- true if all fences signaled, else false
- True if all fences signaled, else false.
*/ bool dma_resv_test_signaled(struct dma_resv *obj, bool test_all) {
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index e1ca2080a1ff..c77fd54d033f 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -62,16 +62,90 @@ struct dma_resv_list {
/**
- struct dma_resv - a reservation object manages fences for a buffer
- @lock: update side lock
- @seq: sequence count for managing RCU read-side synchronization
- @fence_excl: the exclusive fence, if there is one currently
- @fence: list of current shared fences
- There are multiple uses for this, with sometimes slightly different rules in
- how the fence slots are used.
- One use is to synchronize cross-driver access to a struct dma_buf, either for
- dynamic buffer management or just to handle implicit synchronization between
- different users of the buffer in userspace. See &dma_buf.resv for a more
- in-depth discussion.
- The other major use is to manage access and locking within a driver in a
- buffer based memory manager. struct ttm_buffer_object is the canonical
- example here, since this is were reservation objects originated from. But use
- in drivers is spreading and some drivers also manage struct
- drm_gem_object with the same scheme.
I would still make that even harder, e.g. mentioning that you run into use after free and the resulting memory corruption if you don't obey the rules.
Hm I think that's best documented in dma_buf.resv kerneldoc, pointing at the rules here and explaining what can go wrong for the other driver if we just overwrite fences and break the DAG. I'll add something there. -Daniel
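Roughly the importer-side pattern such kerneldoc would have to warn about is the (hypothetical, untested) sketch below; it is only safe because the rules in the patch guarantee the exclusive fence never signals before the work it replaced:

#include <linux/dma-fence.h>
#include <linux/dma-resv.h>

/*
 * Hypothetical importer trusting the exclusive slot before reusing backing
 * storage. If another driver replaced the exclusive fence with one that is
 * not ordered after the previous fences, i.e. broke the DAG, the wait below
 * can complete while the GPU is still writing and the reuse becomes a
 * use-after-free.
 */
static int sketch_wait_before_reuse(struct dma_resv *resv)
{
        struct dma_fence *excl;
        int ret;

        ret = dma_resv_lock(resv, NULL);
        if (ret)
                return ret;
        excl = dma_fence_get(rcu_dereference_protected(resv->fence_excl,
                                                       dma_resv_held(resv)));
        dma_resv_unlock(resv);

        if (excl) {
                ret = dma_fence_wait(excl, false);
                dma_fence_put(excl);
                if (ret)
                        return ret;
        }

        /* ... only now is it safe to unmap or reuse the backing storage ... */
        return 0;
}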
Apart from that with the spelling stuff pointed out by others fixed the patch is Reviewed-by: Christian König christian.koenig@amd.com
Regards, Christian.