From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
In short this makes i915 work for hybrid setups (DRI_PRIME=1 with Mesa) when rendering is done on Intel dgfx and scanout/composition on Intel igfx.
Before this patch the driver was not quite ready for that setup, mainly because it was able to emit a semaphore wait between the two GPUs, which results in deadlocks because semaphore target location in HWSP is neither shared between the two, nor mapped in both GGTT spaces.
To fix it the patch adds an additional check to a couple of relevant code paths in order to prevent using semaphores for inter-engine synchronisation when relevant objects are not in the same GGTT space.
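For illustration, below is a minimal, self-contained userspace model of the new guard. The struct and function names are simplified stand-ins chosen to mirror the driver, not the real i915 types; it only demonstrates the GGTT pointer-equality check that decides between a hardware semaphore wait and a software fence wait.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for struct i915_ggtt, struct intel_gt, etc. */
struct ggtt { int id; };
struct gt { struct ggtt *ggtt; };
struct engine { struct gt *gt; };
struct request { struct engine *engine; };

/*
 * A semaphore wait polls a location in the waiter's HWSP through the
 * GGTT, so it can only work when both requests see the same GGTT.
 */
static bool can_use_semaphore_wait(struct request *to, struct request *from)
{
	return to->engine->gt->ggtt == from->engine->gt->ggtt;
}

int main(void)
{
	struct ggtt igfx_ggtt = { 0 }, dgfx_ggtt = { 1 };
	struct gt igfx_gt = { &igfx_ggtt }, dgfx_gt = { &dgfx_ggtt };
	struct engine igfx_engine = { &igfx_gt }, dgfx_engine = { &dgfx_gt };
	struct request scanout = { &igfx_engine }, render = { &dgfx_engine };

	printf("same GGTT:  %d\n", can_use_semaphore_wait(&scanout, &scanout)); /* 1 */
	printf("cross GGTT: %d\n", can_use_semaphore_wait(&scanout, &render));  /* 0 */
	return 0;
}

When the check fails, the patch below makes emit_semaphore_wait() take its existing await_fence fallback instead, which is exactly the cross-GGTT case described above.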
v2:
 * Avoid adding rq->i915. (Chris)

v3:
 * Use GGTT which describes the limit more precisely.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 79da5eca60af..4f189982f67e 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1145,6 +1145,12 @@ __emit_semaphore_wait(struct i915_request *to,
 	return 0;
 }
 
+static bool
+can_use_semaphore_wait(struct i915_request *to, struct i915_request *from)
+{
+	return to->engine->gt->ggtt == from->engine->gt->ggtt;
+}
+
 static int
 emit_semaphore_wait(struct i915_request *to,
 		    struct i915_request *from,
@@ -1153,6 +1159,9 @@ emit_semaphore_wait(struct i915_request *to,
 	const intel_engine_mask_t mask = READ_ONCE(from->engine)->mask;
 	struct i915_sw_fence *wait = &to->submit;
 
+	if (!can_use_semaphore_wait(to, from))
+		goto await_fence;
+
 	if (!intel_context_use_semaphores(to->context))
 		goto await_fence;
 
@@ -1256,7 +1265,8 @@ __i915_request_await_execution(struct i915_request *to,
 	 * immediate execution, and so we must wait until it reaches the
 	 * active slot.
 	 */
-	if (intel_engine_has_semaphores(to->engine) &&
+	if (can_use_semaphore_wait(to, from) &&
+	    intel_engine_has_semaphores(to->engine) &&
 	    !i915_request_has_initial_breadcrumb(to)) {
 		err = __emit_semaphore_wait(to, from, from->fence.seqno - 1);
 		if (err < 0)
Hi, Tvrtko,
On 10/5/21 13:31, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
In short this makes i915 work for hybrid setups (DRI_PRIME=1 with Mesa) when rendering is done on Intel dgfx and scanout/composition on Intel igfx.
Before this patch the driver was not quite ready for that setup, mainly because it was able to emit a semaphore wait between the two GPUs, which results in deadlocks because semaphore target location in HWSP is neither shared between the two, nor mapped in both GGTT spaces.
To fix it the patch adds an additional check to a couple of relevant code paths in order to prevent using semaphores for inter-engine synchronisation when relevant objects are not in the same GGTT space.
v2:
 * Avoid adding rq->i915. (Chris)

v3:
 * Use GGTT which describes the limit more precisely.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
An IMO pretty important bugfix. I read up a bit on the previous discussion on this, and from what I understand the other two options were
1) Ripping out the semaphore code,
2) Consider dma-fences from other instances of the same driver as foreign.
For imported dma-bufs we do 2), but particularly with lmem and p2p that's a more straightforward decision.
I don't think 1) is a reasonable approach to fix this bug, (but perhaps as a general cleanup?), and for 2) yes I guess we might end up doing that, unless we find some real benefits in treating same-driver-separate-device dma-fences as local, but for this particular bug, IMO this is a reasonable fix.
So,
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
 drivers/gpu/drm/i915/i915_request.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 79da5eca60af..4f189982f67e 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1145,6 +1145,12 @@ __emit_semaphore_wait(struct i915_request *to,
 	return 0;
 }
 
+static bool
+can_use_semaphore_wait(struct i915_request *to, struct i915_request *from)
+{
+	return to->engine->gt->ggtt == from->engine->gt->ggtt;
+}
+
 static int
 emit_semaphore_wait(struct i915_request *to,
 		    struct i915_request *from,
@@ -1153,6 +1159,9 @@ emit_semaphore_wait(struct i915_request *to,
 	const intel_engine_mask_t mask = READ_ONCE(from->engine)->mask;
 	struct i915_sw_fence *wait = &to->submit;
 
+	if (!can_use_semaphore_wait(to, from))
+		goto await_fence;
+
 	if (!intel_context_use_semaphores(to->context))
 		goto await_fence;
 
@@ -1256,7 +1265,8 @@ __i915_request_await_execution(struct i915_request *to,
 	 * immediate execution, and so we must wait until it reaches the
 	 * active slot.
 	 */
-	if (intel_engine_has_semaphores(to->engine) &&
+	if (can_use_semaphore_wait(to, from) &&
+	    intel_engine_has_semaphores(to->engine) &&
 	    !i915_request_has_initial_breadcrumb(to)) {
 		err = __emit_semaphore_wait(to, from, from->fence.seqno - 1);
 		if (err < 0)
On 05/10/2021 14:05, Thomas Hellström wrote:
Hi, Tvrtko,
On 10/5/21 13:31, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
In short this makes i915 work for hybrid setups (DRI_PRIME=1 with Mesa) when rendering is done on Intel dgfx and scanout/composition on Intel igfx.
Before this patch the driver was not quite ready for that setup, mainly because it was able to emit a semaphore wait between the two GPUs, which results in deadlocks because semaphore target location in HWSP is neither shared between the two, nor mapped in both GGTT spaces.
To fix it the patch adds an additional check to a couple of relevant code paths in order to prevent using semaphores for inter-engine synchronisation when relevant objects are not in the same GGTT space.
v2:
 * Avoid adding rq->i915. (Chris)

v3:
 * Use GGTT which describes the limit more precisely.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
An IMO pretty important bugfix. I read up a bit on the previous discussion on this, and from what I understand the other two options were
1) Ripping out the semaphore code,
2) Consider dma-fences from other instances of the same driver as foreign.
Yes, with the caveat on the second point that there is a multi-tile scenario, granted of limited consequence because it only applies if someone tries to run that without GuC, where the "same driver" check is not enough. This patch handles that case as well. And of course it is hypothetical that someone would be able to create an inter-tile dependency there; probably nothing in the current code does it.
For imported dma-bufs we do 2), but particularly with lmem and p2p that's a more straightforward decision.
I am not immediately familiar with p2p considerations.
I don't think 1) is a reasonable approach to fix this bug, (but perhaps as a general cleanup?), and for 2) yes I guess we might end up doing that, unless we find some real benefits in treating same-driver-separate-device dma-fences as local, but for this particular bug, IMO this is a reasonable fix.
On the option of removing the inter-engine semaphore optimisation, I would not call it a cleanup since it had clear performance benefits. I personally don't have those benchmark results saved though, so I'd proceed with caution there if the code can harmlessly remain in the confines of the execlists backend.
Second topic, the whole same-driver fence upcast issue, which I suppose can be discussed along the lines of whether priority inheritance across drivers is useful. Like, for instance, the page flip priority boost, which currently does safely work between i915 instances and is relevant to hybrid graphics. It was safe when I looked at it, courtesy of the global scheduler lock. The question is whether we want to keep that and formalise it via a more explicit/generic cross-driver API. So unless it is not safe after all, I wouldn't rip it out before the discussion on the big picture happens.
So,
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Thanks, I'll push it once it's again cleared by CI.
Regards,
Tvrtko
 drivers/gpu/drm/i915/i915_request.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 79da5eca60af..4f189982f67e 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1145,6 +1145,12 @@ __emit_semaphore_wait(struct i915_request *to,
 	return 0;
 }
 
+static bool
+can_use_semaphore_wait(struct i915_request *to, struct i915_request *from)
+{
+	return to->engine->gt->ggtt == from->engine->gt->ggtt;
+}
+
 static int
 emit_semaphore_wait(struct i915_request *to,
 		    struct i915_request *from,
@@ -1153,6 +1159,9 @@ emit_semaphore_wait(struct i915_request *to,
 	const intel_engine_mask_t mask = READ_ONCE(from->engine)->mask;
 	struct i915_sw_fence *wait = &to->submit;
 
+	if (!can_use_semaphore_wait(to, from))
+		goto await_fence;
+
 	if (!intel_context_use_semaphores(to->context))
 		goto await_fence;
 
@@ -1256,7 +1265,8 @@ __i915_request_await_execution(struct i915_request *to,
 	 * immediate execution, and so we must wait until it reaches the
 	 * active slot.
 	 */
-	if (intel_engine_has_semaphores(to->engine) &&
+	if (can_use_semaphore_wait(to, from) &&
+	    intel_engine_has_semaphores(to->engine) &&
 	    !i915_request_has_initial_breadcrumb(to)) {
 		err = __emit_semaphore_wait(to, from, from->fence.seqno - 1);
 		if (err < 0)
On Tue, Oct 05, 2021 at 03:05:25PM +0200, Thomas Hellström wrote:
Hi, Tvrtko,
On 10/5/21 13:31, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
In short this makes i915 work for hybrid setups (DRI_PRIME=1 with Mesa) when rendering is done on Intel dgfx and scanout/composition on Intel igfx.
Before this patch the driver was not quite ready for that setup, mainly because it was able to emit a semaphore wait between the two GPUs, which results in deadlocks because semaphore target location in HWSP is neither shared between the two, nor mapped in both GGTT spaces.
To fix it the patch adds an additional check to a couple of relevant code paths in order to prevent using semaphores for inter-engine synchronisation when relevant objects are not in the same GGTT space.
v2:
 * Avoid adding rq->i915. (Chris)

v3:
 * Use GGTT which describes the limit more precisely.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
An IMO pretty important bugfix. I read up a bit on the previous discussion on this, and from what I understand the other two options were
1) Ripping out the semaphore code,
2) Consider dma-fences from other instances of the same driver as foreign.
For imported dma-bufs we do 2), but particularly with lmem and p2p that's a more straightforward decision.
I don't think 1) is a reasonable approach to fix this bug, (but perhaps as a general cleanup?), and for 2) yes I guess we might end up doing that, unless we find some real benefits in treating same-driver-separate-device dma-fences as local, but for this particular bug, IMO this is a reasonable fix.
The foreign dma-fences have uapi impact, which Tvrtko shrugged off as "it's a good idea", and no, it's really just not. So we still need to do this properly.
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
But I'm also ok with just merging this as-is so the situation doesn't become too entertaining.

-Daniel
 drivers/gpu/drm/i915/i915_request.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 79da5eca60af..4f189982f67e 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1145,6 +1145,12 @@ __emit_semaphore_wait(struct i915_request *to,
 	return 0;
 }
 
+static bool
+can_use_semaphore_wait(struct i915_request *to, struct i915_request *from)
+{
+	return to->engine->gt->ggtt == from->engine->gt->ggtt;
+}
+
 static int
 emit_semaphore_wait(struct i915_request *to,
 		    struct i915_request *from,
@@ -1153,6 +1159,9 @@ emit_semaphore_wait(struct i915_request *to,
 	const intel_engine_mask_t mask = READ_ONCE(from->engine)->mask;
 	struct i915_sw_fence *wait = &to->submit;
 
+	if (!can_use_semaphore_wait(to, from))
+		goto await_fence;
+
 	if (!intel_context_use_semaphores(to->context))
 		goto await_fence;
 
@@ -1256,7 +1265,8 @@ __i915_request_await_execution(struct i915_request *to,
 	 * immediate execution, and so we must wait until it reaches the
 	 * active slot.
 	 */
-	if (intel_engine_has_semaphores(to->engine) &&
+	if (can_use_semaphore_wait(to, from) &&
+	    intel_engine_has_semaphores(to->engine) &&
 	    !i915_request_has_initial_breadcrumb(to)) {
 		err = __emit_semaphore_wait(to, from, from->fence.seqno - 1);
 		if (err < 0)
On 13/10/2021 13:06, Daniel Vetter wrote:
On Tue, Oct 05, 2021 at 03:05:25PM +0200, Thomas Hellström wrote:
Hi, Tvrtko,
On 10/5/21 13:31, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
In short this makes i915 work for hybrid setups (DRI_PRIME=1 with Mesa) when rendering is done on Intel dgfx and scanout/composition on Intel igfx.
Before this patch the driver was not quite ready for that setup, mainly because it was able to emit a semaphore wait between the two GPUs, which results in deadlocks because semaphore target location in HWSP is neither shared between the two, nor mapped in both GGTT spaces.
To fix it the patch adds an additional check to a couple of relevant code paths in order to prevent using semaphores for inter-engine synchronisation when relevant objects are not in the same GGTT space.
v2:
 * Avoid adding rq->i915. (Chris)

v3:
 * Use GGTT which describes the limit more precisely.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
An IMO pretty important bugfix. I read up a bit on the previous discussion on this, and from what I understand the other two options were
1) Ripping out the semaphore code,
2) Consider dma-fences from other instances of the same driver as foreign.
For imported dma-bufs we do 2), but particularly with lmem and p2p that's a more straightforward decision.
I don't think 1) is a reasonable approach to fix this bug, (but perhaps as a general cleanup?), and for 2) yes I guess we might end up doing that, unless we find some real benefits in treating same-driver-separate-device dma-fences as local, but for this particular bug, IMO this is a reasonable fix.
The foreign dma-fences have uapi impact, which Tvrtko shrugged off as "it's a good idea", and no, it's really just not. So we still need to do this properly.
I always said let's merge the fix and discuss it. The fix only improved one failure and did not introduce any of the new issues you are worried about; they were all already there.
So let's start the discussion: why is it not a good idea to extend the concept of priority inheritance to the hybrid case?
Today we can have a high priority compositor waiting for client rendering, or even I915_PRIORITY_DISPLAY, which I _think_ somehow ties into page flips with full screen stuff, and with igpu we do priority inheritance in those cases. Why is it a bad idea to do the same in the hybrid setup?
Regards,
Tvrtko
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
But I'm also ok with just merging this as-is so the situation doesn't become too entertaining.

-Daniel
 drivers/gpu/drm/i915/i915_request.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 79da5eca60af..4f189982f67e 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1145,6 +1145,12 @@ __emit_semaphore_wait(struct i915_request *to,
 	return 0;
 }
 
+static bool
+can_use_semaphore_wait(struct i915_request *to, struct i915_request *from)
+{
+	return to->engine->gt->ggtt == from->engine->gt->ggtt;
+}
+
 static int
 emit_semaphore_wait(struct i915_request *to,
 		    struct i915_request *from,
@@ -1153,6 +1159,9 @@ emit_semaphore_wait(struct i915_request *to,
 	const intel_engine_mask_t mask = READ_ONCE(from->engine)->mask;
 	struct i915_sw_fence *wait = &to->submit;
 
+	if (!can_use_semaphore_wait(to, from))
+		goto await_fence;
+
 	if (!intel_context_use_semaphores(to->context))
 		goto await_fence;
 
@@ -1256,7 +1265,8 @@ __i915_request_await_execution(struct i915_request *to,
 	 * immediate execution, and so we must wait until it reaches the
 	 * active slot.
 	 */
-	if (intel_engine_has_semaphores(to->engine) &&
+	if (can_use_semaphore_wait(to, from) &&
+	    intel_engine_has_semaphores(to->engine) &&
 	    !i915_request_has_initial_breadcrumb(to)) {
 		err = __emit_semaphore_wait(to, from, from->fence.seqno - 1);
 		if (err < 0)