[PATCH 0/7] CT changes required for GuC submission

List overview All Threads
Download

newer

older

[PATCH 1/2] drm/gud: Add Raspberry...

Start fixing the shared to...

Matthew Brost

6 Jul 2021 6 Jul '21

10:20 p.m.

As part of enabling GuC submission discussed in [1], [2], and [3] we need optimize and update the CT code as this is now in the critical path of submission. This series includes the patches to do that which is the first 7 patches from [3]. The patches should have addressed all the feedback in [3] and should be ready to merge once CI returns a we get a few more RBs.

v2: Fix checkpatch warning, address a couple of Michal's comments v3: Address John Harrison's comments

Signed-off-by: Matthew Brost matthew.brost@intel.com

[1] https://patchwork.freedesktop.org/series/89844/ [2] https://patchwork.freedesktop.org/series/91417/ [3] https://patchwork.freedesktop.org/series/91840/

Signed-off-by: Matthew Brost matthew.brost@intel.com

John Harrison (1): drm/i915/guc: Module load failure test for CT buffer creation

Matthew Brost (6): drm/i915/guc: Relax CTB response timeout drm/i915/guc: Improve error message for unsolicited CT response drm/i915/guc: Increase size of CTB buffers drm/i915/guc: Add non blocking CTB send function drm/i915/guc: Add stall timer to non blocking CTB send function drm/i915/guc: Optimize CTB writes and reads

.../gt/uc/abi/guc_communication_ctb_abi.h | 3 +- drivers/gpu/drm/i915/gt/uc/intel_guc.h | 11 +- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 249 +++++++++++++++--- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 14 +- 4 files changed, 231 insertions(+), 46 deletions(-)

-- 2.28.0

Show replies by date

Matthew Brost

6 Jul 6 Jul

10:20 p.m.

New subject: [PATCH 1/7] drm/i915/guc: Relax CTB response timeout

In upcoming patch we will allow more CTB requests to be sent in parallel to the GuC for processing, so we shouldn't assume any more that GuC will always reply without 10ms.

Use bigger value hardcoded value of 1s instead.

v2: Add CONFIG_DRM_I915_GUC_CTB_TIMEOUT config option v3: (Daniel Vetter) - Use hardcoded value of 1s rather than config option v4: (Michal) - Use defines for timeout values

Signed-off-by: Matthew Brost matthew.brost@intel.com Cc: Michal Wajdeczko michal.wajdeczko@intel.com Reviewed-by: Michal Wajdeczko michal.wajdeczko@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 43409044528e..b86575b99537 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -474,14 +474,18 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status) /* * Fast commands should complete in less than 10us, so sample quickly * up to that length of time, then switch to a slower sleep-wait loop. - * No GuC command should ever take longer than 10ms. + * No GuC command should ever take longer than 10ms but many GuC + * commands can be inflight at time, so use a 1s timeout on the slower + * sleep-wait loop. */ +#define GUC_CTB_RESPONSE_TIMEOUT_SHORT_MS 10 +#define GUC_CTB_RESPONSE_TIMEOUT_LONG_MS 1000 #define done \ (FIELD_GET(GUC_HXG_MSG_0_ORIGIN, READ_ONCE(req->status)) == \ GUC_HXG_ORIGIN_GUC) - err = wait_for_us(done, 10); + err = wait_for_us(done, GUC_CTB_RESPONSE_TIMEOUT_SHORT_MS); if (err) - err = wait_for(done, 10); + err = wait_for(done, GUC_CTB_RESPONSE_TIMEOUT_LONG_MS); #undef done

if (unlikely(err))

-- 2.28.0

Matthew Brost

10:20 p.m.

New subject: [PATCH 2/7] drm/i915/guc: Improve error message for unsolicited CT response

Improve the error message when a unsolicited CT response is received by printing fence that couldn't be found, the last fence, and all requests with a response outstanding.

Signed-off-by: Matthew Brost matthew.brost@intel.com Reviewed-by: Michal Wajdeczko michal.wajdeczko@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index b86575b99537..80db59b45c45 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -732,12 +732,16 @@ static int ct_handle_response(struct intel_guc_ct *ct, struct ct_incoming_msg *r found = true; break; } - spin_unlock_irqrestore(&ct->requests.lock, flags); - if (!found) { CT_ERROR(ct, "Unsolicited response (fence %u)\n", fence); - return -ENOKEY; + CT_ERROR(ct, "Could not find fence=%u, last_fence=%u\n", fence, + ct->requests.last_fence); + list_for_each_entry(req, &ct->requests.pending, link) + CT_ERROR(ct, "request %u awaits response\n", + req->fence); + err = -ENOKEY; } + spin_unlock_irqrestore(&ct->requests.lock, flags);

if (unlikely(err)) return err;

-- 2.28.0

Matthew Brost

10:20 p.m.

New subject: [PATCH 3/7] drm/i915/guc: Increase size of CTB buffers

With the introduction of non-blocking CTBs more than one CTB can be in flight at a time. Increasing the size of the CTBs should reduce how often software hits the case where no space is available in the CTB buffer.

Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com Reviewed-by: Michal Wajdeczko michal.wajdeczko@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 80db59b45c45..43e03aa2dde8 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -58,11 +58,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct) * +--------+-----------------------------------------------+------+ * * Size of each `CT Buffer`_ must be multiple of 4K. - * As we don't expect too many messages, for now use minimum sizes. + * We don't expect too many messages in flight at any time, unless we are + * using the GuC submission. In that case each request requires a minimum + * 2 dwords which gives us a maximum 256 queue'd requests. Hopefully this + * enough space to avoid backpressure on the driver. We increase the size + * of the receive buffer (relative to the send) to ensure a G2H response + * CTB has a landing spot. */ #define CTB_DESC_SIZE ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K) #define CTB_H2G_BUFFER_SIZE (SZ_4K) -#define CTB_G2H_BUFFER_SIZE (SZ_4K) +#define CTB_G2H_BUFFER_SIZE (4 * CTB_H2G_BUFFER_SIZE)

struct ct_request { struct list_head link; @@ -643,7 +648,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) /* beware of buffer wrap case */ if (unlikely(available < 0)) available += size; - CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail); + CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size); GEM_BUG_ON(available < 0);

header = cmds[head];

-- 2.28.0

Matthew Brost

10:20 p.m.

New subject: [PATCH 4/7] drm/i915/guc: Add non blocking CTB send function

Add non blocking CTB send function, intel_guc_send_nb. GuC submission will send CTBs in the critical path and does not need to wait for these CTBs to complete before moving on, hence the need for this new function.

The non-blocking CTB now must have a flow control mechanism to ensure the buffer isn't overrun. A lazy spin wait is used as we believe the flow control condition should be rare with a properly sized buffer.

The function, intel_guc_send_nb, is exported in this patch but unused. Several patches later in the series make use of this function.

v2: (Michal) - Use define for H2G room calculations - Move INTEL_GUC_SEND_NB define (Daniel Vetter) - Use msleep_interruptible rather than cond_resched v3: (Michal) - Move includes to following patch - s/INTEL_GUC_SEND_NB/INTEL_GUC_CT_SEND_NB/g v4: (John H) - Update comment, add type local variable

Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com Reviewed-by: John Harrison John.C.Harrison@Intel.com --- .../gt/uc/abi/guc_communication_ctb_abi.h | 3 +- drivers/gpu/drm/i915/gt/uc/intel_guc.h | 11 ++- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 ++++++++++++++++--- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 4 +- 4 files changed, 91 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h index e933ca02d0eb..99e1fad5ca20 100644 --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h @@ -79,7 +79,8 @@ static_assert(sizeof(struct guc_ct_buffer_desc) == 64); * +---+-------+--------------------------------------------------------------+ */

-#define GUC_CTB_MSG_MIN_LEN 1u +#define GUC_CTB_HDR_LEN 1u +#define GUC_CTB_MSG_MIN_LEN GUC_CTB_HDR_LEN #define GUC_CTB_MSG_MAX_LEN 256u #define GUC_CTB_MSG_0_FENCE (0xffff << 16) #define GUC_CTB_MSG_0_FORMAT (0xf << 12) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 4abc59f6f3cd..72e4653222e2 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -74,7 +74,14 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log) static inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len) { - return intel_guc_ct_send(&guc->ct, action, len, NULL, 0); + return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0); +} + +static +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len) +{ + return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, + INTEL_GUC_CT_SEND_NB); }

static inline int @@ -82,7 +89,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len, u32 *response_buf, u32 response_buf_size) { return intel_guc_ct_send(&guc->ct, action, len, - response_buf, response_buf_size); + response_buf, response_buf_size, 0); }

static inline void intel_guc_to_host_event_handler(struct intel_guc *guc) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 43e03aa2dde8..3d6cba8d91ad 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -3,6 +3,8 @@ * Copyright © 2016-2019 Intel Corporation */

+#include <linux/circ_buf.h> + #include "i915_drv.h" #include "intel_guc_ct.h" #include "gt/intel_gt.h" @@ -373,7 +375,7 @@ static void write_barrier(struct intel_guc_ct *ct) static int ct_write(struct intel_guc_ct *ct, const u32 *action, u32 len /* in dwords */, - u32 fence) + u32 fence, u32 flags) { struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc; @@ -383,6 +385,7 @@ static int ct_write(struct intel_guc_ct *ct, u32 used; u32 header; u32 hxg; + u32 type; u32 *cmds = ctb->cmds; unsigned int i;

@@ -408,8 +411,8 @@ static int ct_write(struct intel_guc_ct *ct, else used = tail - head;

- /* make sure there is a space including extra dw for the fence */ - if (unlikely(used + len + 1 >= size)) + /* make sure there is a space including extra dw for the header */ + if (unlikely(used + len + GUC_CTB_HDR_LEN >= size)) return -ENOSPC;

/* @@ -421,9 +424,11 @@ static int ct_write(struct intel_guc_ct *ct, FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) | FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);

- hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) | - FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION | - GUC_HXG_REQUEST_MSG_0_DATA0, action[0]); + type = (flags & INTEL_GUC_CT_SEND_NB) ? GUC_HXG_TYPE_EVENT : + GUC_HXG_TYPE_REQUEST; + hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, type) | + FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION | + GUC_HXG_EVENT_MSG_0_DATA0, action[0]);

CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n", tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]); @@ -500,6 +505,48 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status) return err; }

+static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) +{ + struct guc_ct_buffer_desc *desc = ctb->desc; + u32 head = READ_ONCE(desc->head); + u32 space; + + space = CIRC_SPACE(desc->tail, head, ctb->size); + + return space >= len_dw; +} + +static int ct_send_nb(struct intel_guc_ct *ct, + const u32 *action, + u32 len, + u32 flags) +{ + struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; + unsigned long spin_flags; + u32 fence; + int ret; + + spin_lock_irqsave(&ctb->lock, spin_flags); + + ret = h2g_has_room(ctb, len + GUC_CTB_HDR_LEN); + if (unlikely(!ret)) { + ret = -EBUSY; + goto out; + } + + fence = ct_get_next_fence(ct); + ret = ct_write(ct, action, len, fence, flags); + if (unlikely(ret)) + goto out; + + intel_guc_notify(ct_to_guc(ct)); + +out: + spin_unlock_irqrestore(&ctb->lock, spin_flags); + + return ret; +} + static int ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, @@ -507,8 +554,10 @@ static int ct_send(struct intel_guc_ct *ct, u32 response_buf_size, u32 *status) { + struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct ct_request request; unsigned long flags; + unsigned int sleep_period_ms = 1; u32 fence; int err;

@@ -516,8 +565,24 @@ static int ct_send(struct intel_guc_ct *ct, GEM_BUG_ON(!len); GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK); GEM_BUG_ON(!response_buf && response_buf_size); + might_sleep(); + + /* + * We use a lazy spin wait loop here as we believe that if the CT + * buffers are sized correctly the flow control condition should be + * rare. + */ +retry: + spin_lock_irqsave(&ctb->lock, flags); + if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) { + spin_unlock_irqrestore(&ctb->lock, flags);

- spin_lock_irqsave(&ct->ctbs.send.lock, flags); + if (msleep_interruptible(sleep_period_ms)) + return -EINTR; + sleep_period_ms = sleep_period_ms << 1; + + goto retry; + }

fence = ct_get_next_fence(ct); request.fence = fence; @@ -529,9 +594,9 @@ static int ct_send(struct intel_guc_ct *ct, list_add_tail(&request.link, &ct->requests.pending); spin_unlock(&ct->requests.lock);

- err = ct_write(ct, action, len, fence); + err = ct_write(ct, action, len, fence, 0);

- spin_unlock_irqrestore(&ct->ctbs.send.lock, flags); + spin_unlock_irqrestore(&ctb->lock, flags);

if (unlikely(err)) goto unlink; @@ -571,7 +636,7 @@ static int ct_send(struct intel_guc_ct *ct, * Command Transport (CT) buffer based GuC send function. */ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, - u32 *response_buf, u32 response_buf_size) + u32 *response_buf, u32 response_buf_size, u32 flags) { u32 status = ~0; /* undefined */ int ret; @@ -581,6 +646,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, return -ENODEV; }

+ if (flags & INTEL_GUC_CT_SEND_NB) + return ct_send_nb(ct, action, len, flags); + ret = ct_send(ct, action, len, response_buf, response_buf_size, &status); if (unlikely(ret < 0)) { CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n", diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index 1ae2dde6db93..5bb8bef024c8 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -42,7 +42,6 @@ struct intel_guc_ct_buffer { bool broken; };

- /** Top-level structure for Command Transport related data * * Includes a pair of CT buffers for bi-directional communication and tracking @@ -87,8 +86,9 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct) return ct->enabled; }

+#define INTEL_GUC_CT_SEND_NB BIT(31) int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, - u32 *response_buf, u32 response_buf_size); + u32 *response_buf, u32 response_buf_size, u32 flags); void intel_guc_ct_event_handler(struct intel_guc_ct *ct);

#endif /* _INTEL_GUC_CT_H_ */

-- 2.28.0

Matthew Brost

10:20 p.m.

New subject: [PATCH 5/7] drm/i915/guc: Add stall timer to non blocking CTB send function

Implement a stall timer which fails H2G CTBs once a period of time with no forward progress is reached to prevent deadlock.

v2: (Michal) - Improve error message in ct_deadlock() - Set broken when ct_deadlock() returns true - Return -EPIPE on ct_deadlock() v3: (Michal) - Add ms to stall timer comment (Matthew) - Move broken check to intel_guc_ct_send()

Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com Reviewed-by: John Harrison John.C.Harrison@Intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 62 ++++++++++++++++++++--- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 4 ++ 2 files changed, 59 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 3d6cba8d91ad..db3e85b89573 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -4,6 +4,9 @@ */

#include <linux/circ_buf.h> +#include <linux/ktime.h> +#include <linux/time64.h> +#include <linux/timekeeping.h>

#include "i915_drv.h" #include "intel_guc_ct.h" @@ -316,6 +319,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct) goto err_deregister;

ct->enabled = true; + ct->stall_time = KTIME_MAX;

return 0;

@@ -389,9 +393,6 @@ static int ct_write(struct intel_guc_ct *ct, u32 *cmds = ctb->cmds; unsigned int i;

- if (unlikely(ctb->broken)) - return -EPIPE; - if (unlikely(desc->status)) goto corrupted;

@@ -505,6 +506,25 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status) return err; }

+#define GUC_CTB_TIMEOUT_MS 1500 +static inline bool ct_deadlocked(struct intel_guc_ct *ct) +{ + long timeout = GUC_CTB_TIMEOUT_MS; + bool ret = ktime_ms_delta(ktime_get(), ct->stall_time) > timeout; + + if (unlikely(ret)) { + struct guc_ct_buffer_desc *send = ct->ctbs.send.desc; + struct guc_ct_buffer_desc *recv = ct->ctbs.send.desc; + + CT_ERROR(ct, "Communication stalled for %lld ms, desc status=%#x,%#x\n", + ktime_ms_delta(ktime_get(), ct->stall_time), + send->status, recv->status); + ct->ctbs.send.broken = true; + } + + return ret; +} + static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) { struct guc_ct_buffer_desc *desc = ctb->desc; @@ -516,6 +536,26 @@ static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) return space >= len_dw; }

+static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw) +{ + struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; + + lockdep_assert_held(&ct->ctbs.send.lock); + + if (unlikely(!h2g_has_room(ctb, len_dw))) { + if (ct->stall_time == KTIME_MAX) + ct->stall_time = ktime_get(); + + if (unlikely(ct_deadlocked(ct))) + return -EPIPE; + else + return -EBUSY; + } + + ct->stall_time = KTIME_MAX; + return 0; +} + static int ct_send_nb(struct intel_guc_ct *ct, const u32 *action, u32 len, @@ -528,11 +568,9 @@ static int ct_send_nb(struct intel_guc_ct *ct,

spin_lock_irqsave(&ctb->lock, spin_flags);

- ret = h2g_has_room(ctb, len + GUC_CTB_HDR_LEN); - if (unlikely(!ret)) { - ret = -EBUSY; + ret = has_room_nb(ct, len + GUC_CTB_HDR_LEN); + if (unlikely(ret)) goto out; - }

fence = ct_get_next_fence(ct); ret = ct_write(ct, action, len, fence, flags); @@ -575,8 +613,13 @@ static int ct_send(struct intel_guc_ct *ct, retry: spin_lock_irqsave(&ctb->lock, flags); if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) { + if (ct->stall_time == KTIME_MAX) + ct->stall_time = ktime_get(); spin_unlock_irqrestore(&ctb->lock, flags);

+ if (unlikely(ct_deadlocked(ct))) + return -EPIPE; + if (msleep_interruptible(sleep_period_ms)) return -EINTR; sleep_period_ms = sleep_period_ms << 1; @@ -584,6 +627,8 @@ static int ct_send(struct intel_guc_ct *ct, goto retry; }

+ ct->stall_time = KTIME_MAX; + fence = ct_get_next_fence(ct); request.fence = fence; request.status = 0; @@ -646,6 +691,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, return -ENODEV; }

+ if (unlikely(ct->ctbs.send.broken)) + return -EPIPE; + if (flags & INTEL_GUC_CT_SEND_NB) return ct_send_nb(ct, action, len, flags);

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index 5bb8bef024c8..bee03794c1eb 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -9,6 +9,7 @@ #include <linux/interrupt.h> #include <linux/spinlock.h> #include <linux/workqueue.h> +#include <linux/ktime.h>

#include "intel_guc_fwif.h"

@@ -68,6 +69,9 @@ struct intel_guc_ct { struct list_head incoming; /* incoming requests */ struct work_struct worker; /* handler for incoming requests */ } requests; + + /** @stall_time: time of first time a CTB submission is stalled */ + ktime_t stall_time; };

void intel_guc_ct_init_early(struct intel_guc_ct *ct);

-- 2.28.0

Matthew Brost

10:20 p.m.

New subject: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads

CTB writes are now in the path of command submission and should be optimized for performance. Rather than reading CTB descriptor values (e.g. head, tail) which could result in accesses across the PCIe bus, store shadow local copies and only read/write the descriptor values when absolutely necessary. Also store the current space in the each channel locally.

v2: (Michal) - Add additional sanity checks for head / tail pointers - Use GUC_CTB_HDR_LEN rather than magic 1 v3: (Michal / John H) - Drop redundant check of head value

Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++++++++++++++-------- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 6 ++ 2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index db3e85b89573..4a73a1f03a9b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { ctb->broken = false; + ctb->tail = 0; + ctb->head = 0; + ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size); + guc_ct_buffer_desc_init(ctb->desc); }

@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct, { struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc; - u32 head = desc->head; - u32 tail = desc->tail; + u32 tail = ctb->tail; u32 size = ctb->size; - u32 used; u32 header; u32 hxg; u32 type; @@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct, if (unlikely(desc->status)) goto corrupted;

- if (unlikely((tail | head) >= size)) { + GEM_BUG_ON(tail > size); + +#ifdef CONFIG_DRM_I915_DEBUG_GUC + if (unlikely(tail != READ_ONCE(desc->tail))) { + CT_ERROR(ct, "Tail was modified %u != %u\n", + desc->tail, ctb->tail); + desc->status |= GUC_CTB_STATUS_MISMATCH; + goto corrupted; + } + if (unlikely((desc->tail | desc->head) >= size)) { CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n", - head, tail, size); + desc->head, desc->tail, size); desc->status |= GUC_CTB_STATUS_OVERFLOW; goto corrupted; } - - /* - * tail == head condition indicates empty. GuC FW does not support - * using up the entire buffer to get tail == head meaning full. - */ - if (tail < head) - used = (size - head) + tail; - else - used = tail - head; - - /* make sure there is a space including extra dw for the header */ - if (unlikely(used + len + GUC_CTB_HDR_LEN >= size)) - return -ENOSPC; +#endif

/* * dw0: CT header (including fence) @@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct, write_barrier(ct);

/* now update descriptor */ + ctb->tail = tail; WRITE_ONCE(desc->tail, tail); + ctb->space -= len + GUC_CTB_HDR_LEN;

return 0;

@@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct, * @req: pointer to pending request * @status: placeholder for status * - * For each sent request, Guc shall send bac CT response message. + * For each sent request, GuC shall send back CT response message. * Our message handler will update status of tracked request once * response message with given fence is received. Wait here and * check for valid response status value. @@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct) return ret; }

-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw) { - struct guc_ct_buffer_desc *desc = ctb->desc; - u32 head = READ_ONCE(desc->head); + struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; + u32 head; u32 space;

- space = CIRC_SPACE(desc->tail, head, ctb->size); + if (ctb->space >= len_dw) + return true; + + head = READ_ONCE(ctb->desc->head); + if (unlikely(head > ctb->size)) { + CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n", + ctb->desc->head, ctb->desc->tail, ctb->size); + ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW; + ctb->broken = true; + return false; + } + + space = CIRC_SPACE(ctb->tail, head, ctb->size); + ctb->space = space;

return space >= len_dw; }

static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw) { - struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; - lockdep_assert_held(&ct->ctbs.send.lock);

- if (unlikely(!h2g_has_room(ctb, len_dw))) { + if (unlikely(!h2g_has_room(ct, len_dw))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get();

@@ -612,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct, */ retry: spin_lock_irqsave(&ctb->lock, flags); - if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) { + if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get(); spin_unlock_irqrestore(&ctb->lock, flags); @@ -732,7 +744,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) { struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv; struct guc_ct_buffer_desc *desc = ctb->desc; - u32 head = desc->head; + u32 head = ctb->head; u32 tail = desc->tail; u32 size = ctb->size; u32 *cmds = ctb->cmds; @@ -747,12 +759,29 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) if (unlikely(desc->status)) goto corrupted;

- if (unlikely((tail | head) >= size)) { + GEM_BUG_ON(head > size); + +#ifdef CONFIG_DRM_I915_DEBUG_GUC + if (unlikely(head != READ_ONCE(desc->head))) { + CT_ERROR(ct, "Head was modified %u != %u\n", + desc->head, ctb->head); + desc->status |= GUC_CTB_STATUS_MISMATCH; + goto corrupted; + } + if (unlikely((desc->tail | desc->head) >= size)) { CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n", head, tail, size); desc->status |= GUC_CTB_STATUS_OVERFLOW; goto corrupted; } +#else + if (unlikely(tail >= size)) { + CT_ERROR(ct, "Invalid offsets tail=%u (size=%u)\n", + tail, size); + desc->status |= GUC_CTB_STATUS_OVERFLOW; + goto corrupted; + } +#endif

/* tail == head condition indicates empty */ available = tail - head; @@ -802,6 +831,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) } CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);

+ ctb->head = head; /* now update descriptor */ WRITE_ONCE(desc->head, head);

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index bee03794c1eb..edd1bba0445d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -33,6 +33,9 @@ struct intel_guc; * @desc: pointer to the buffer descriptor * @cmds: pointer to the commands buffer * @size: size of the commands buffer in dwords + * @head: local shadow copy of head in dwords + * @tail: local shadow copy of tail in dwords + * @space: local shadow copy of space in dwords * @broken: flag to indicate if descriptor data is broken */ struct intel_guc_ct_buffer { @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer { struct guc_ct_buffer_desc *desc; u32 *cmds; u32 size; + u32 tail; + u32 head; + u32 space; bool broken; };

-- 2.28.0

John Harrison

10:51 p.m.

New subject: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads

On 7/6/2021 15:20, Matthew Brost wrote:

...

CTB writes are now in the path of command submission and should be optimized for performance. Rather than reading CTB descriptor values (e.g. head, tail) which could result in accesses across the PCIe bus, store shadow local copies and only read/write the descriptor values when absolutely necessary. Also store the current space in the each channel locally.

v2: (Michal)

Add additional sanity checks for head / tail pointers

Use GUC_CTB_HDR_LEN rather than magic 1

v3: (Michal / John H)

Drop redundant check of head value

Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com

drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++++++++++++++-------- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 6 ++ 2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index db3e85b89573..4a73a1f03a9b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { ctb->broken = false;

ctb->tail = 0;

ctb->head = 0;

ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);

guc_ct_buffer_desc_init(ctb->desc); }

@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct, { struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc;

u32 head = desc->head;

u32 tail = desc->tail;

u32 tail = ctb->tail; u32 size = ctb->size;

u32 used; u32 header; u32 hxg; u32 type;

@@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct, if (unlikely(desc->status)) goto corrupted;

if (unlikely((tail | head) >= size)) {

GEM_BUG_ON(tail > size);

+#ifdef CONFIG_DRM_I915_DEBUG_GUC
if (unlikely(tail != READ_ONCE(desc->tail))) {
CT_ERROR(ct, "Tail was modified %u != %u\n",
	 desc->tail, ctb->tail);
desc->status |= GUC_CTB_STATUS_MISMATCH;
goto corrupted;
}

if (unlikely((desc->tail | desc->head) >= size)) {

Same arguments below about head apply to tail here. Also, there is no #else check on ctb->head?

...

CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
	 head, tail, size);
	 desc->head, desc->tail, size);
desc->status |= GUC_CTB_STATUS_OVERFLOW; goto corrupted; }
/*
* tail == head condition indicates empty. GuC FW does not support
* using up the entire buffer to get tail == head meaning full.
*/
if (tail < head)
used = (size - head) + tail;
else
used = tail - head;
/* make sure there is a space including extra dw for the header */

if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
return -ENOSPC;
+#endif

/* * dw0: CT header (including fence) @@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct, write_barrier(ct);

/* now update descriptor */

ctb->tail = tail; WRITE_ONCE(desc->tail, tail);

ctb->space -= len + GUC_CTB_HDR_LEN;

return 0;

@@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct,

@req: pointer to pending request

@status: placeholder for status

For each sent request, Guc shall send bac CT response message.

For each sent request, GuC shall send back CT response message.

Our message handler will update status of tracked request once

response message with given fence is received. Wait here and

check for valid response status value.

@@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct) return ret; }

-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw) {

struct guc_ct_buffer_desc *desc = ctb->desc;

u32 head = READ_ONCE(desc->head);

struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;

u32 head; u32 space;

space = CIRC_SPACE(desc->tail, head, ctb->size);
if (ctb->space >= len_dw)
return true;
head = READ_ONCE(ctb->desc->head);

if (unlikely(head > ctb->size)) {
CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
	 ctb->desc->head, ctb->desc->tail, ctb->size);
ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
ctb->broken = true;
return false;
}

space = CIRC_SPACE(ctb->tail, head, ctb->size);

ctb->space = space;

return space >= len_dw; }

static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw) {
struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;

lockdep_assert_held(&ct->ctbs.send.lock);

if (unlikely(!h2g_has_room(ctb, len_dw))) {

if (unlikely(!h2g_has_room(ct, len_dw))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get();

@@ -612,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct, */ retry: spin_lock_irqsave(&ctb->lock, flags);

if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {

if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get(); spin_unlock_irqrestore(&ctb->lock, flags);

@@ -732,7 +744,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) { struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv; struct guc_ct_buffer_desc *desc = ctb->desc;

u32 head = desc->head;

u32 head = ctb->head; u32 tail = desc->tail; u32 size = ctb->size; u32 *cmds = ctb->cmds;

@@ -747,12 +759,29 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) if (unlikely(desc->status)) goto corrupted;

if (unlikely((tail | head) >= size)) {

GEM_BUG_ON(head > size);

+#ifdef CONFIG_DRM_I915_DEBUG_GUC
if (unlikely(head != READ_ONCE(desc->head))) {
CT_ERROR(ct, "Head was modified %u != %u\n",
	 desc->head, ctb->head);
desc->status |= GUC_CTB_STATUS_MISMATCH;
goto corrupted;
}

if (unlikely((desc->tail | desc->head) >= size)) {

As per comment in other thread, the check on head here is redundant because you have already hit a BUG_ON(ctb->head > size) followed by CT_ERROR(ctb->head != desc->head). Therefore, you can't get here if 'desc->head > size'.

...

CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
	 head, tail, size);
desc->status |= GUC_CTB_STATUS_OVERFLOW;
goto corrupted;
} +#else
if (unlikely(tail >= size)) {
CT_ERROR(ct, "Invalid offsets tail=%u (size=%u)\n",
	 tail, size);
desc->status |= GUC_CTB_STATUS_OVERFLOW;
goto corrupted;
}
+#endif

/* tail == head condition indicates empty */ available = tail - head; @@ -802,6 +831,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) } CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);

ctb->head = head; /* now update descriptor */ WRITE_ONCE(desc->head, head);

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index bee03794c1eb..edd1bba0445d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -33,6 +33,9 @@ struct intel_guc;

@desc: pointer to the buffer descriptor

@cmds: pointer to the commands buffer

@size: size of the commands buffer in dwords

@head: local shadow copy of head in dwords

@tail: local shadow copy of tail in dwords

@space: local shadow copy of space in dwords

@broken: flag to indicate if descriptor data is broken

*/ struct intel_guc_ct_buffer {

@@ -40,6 +43,9 @@ struct intel_guc_ct_buffer { struct guc_ct_buffer_desc *desc; u32 *cmds; u32 size;

u32 tail;

u32 head;

u32 space; bool broken; };

Matthew Brost

7 Jul 7 Jul

5:50 p.m.

New subject: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads

On Tue, Jul 06, 2021 at 03:51:00PM -0700, John Harrison wrote:

...

On 7/6/2021 15:20, Matthew Brost wrote:

...
CTB writes are now in the path of command submission and should be optimized for performance. Rather than reading CTB descriptor values (e.g. head, tail) which could result in accesses across the PCIe bus, store shadow local copies and only read/write the descriptor values when absolutely necessary. Also store the current space in the each channel locally.

v2: (Michal)

Add additional sanity checks for head / tail pointers

Use GUC_CTB_HDR_LEN rather than magic 1

v3: (Michal / John H)

Drop redundant check of head value

Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com

drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++++++++++++++-------- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 6 ++ 2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index db3e85b89573..4a73a1f03a9b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { ctb->broken = false;

ctb->tail = 0;

ctb->head = 0;

ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);

guc_ct_buffer_desc_init(ctb->desc); }

@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct, { struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc;

u32 head = desc->head;

u32 tail = desc->tail;

u32 tail = ctb->tail; u32 size = ctb->size;

u32 used; u32 header; u32 hxg; u32 type;

@@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct, if (unlikely(desc->status)) goto corrupted;

if (unlikely((tail | head) >= size)) {

GEM_BUG_ON(tail > size);

+#ifdef CONFIG_DRM_I915_DEBUG_GUC
if (unlikely(tail != READ_ONCE(desc->tail))) {
CT_ERROR(ct, "Tail was modified %u != %u\n",
	 desc->tail, ctb->tail);
desc->status |= GUC_CTB_STATUS_MISMATCH;
goto corrupted;
}

if (unlikely((desc->tail | desc->head) >= size)) {
Same arguments below about head apply to tail here. Also, there is no #else

Yes, desc->tail can be removed from this check. Same for head below. Can you fix this when merging?

...

check on ctb->head?

ctb->head variable isn't used in this path nor is ctb->tail in the other. In the other path desc->tail is checked as it is read while desc->head isn't needed to be read here. The other path can also likely be reworked to pull the tail check outside of the if / else define block.

...

...
CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
	 head, tail, size);
	 desc->head, desc->tail, size);
desc->status |= GUC_CTB_STATUS_OVERFLOW; goto corrupted; }
/*
* tail == head condition indicates empty. GuC FW does not support
* using up the entire buffer to get tail == head meaning full.
*/
if (tail < head)
used = (size - head) + tail;
else
used = tail - head;
/* make sure there is a space including extra dw for the header */

if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
return -ENOSPC;
+#endif /* * dw0: CT header (including fence) @@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct, write_barrier(ct); /* now update descriptor */

ctb->tail = tail; WRITE_ONCE(desc->tail, tail);

ctb->space -= len + GUC_CTB_HDR_LEN; return 0;

@@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct,

@req: pointer to pending request

@status: placeholder for status

For each sent request, Guc shall send bac CT response message.

For each sent request, GuC shall send back CT response message.

Our message handler will update status of tracked request once

response message with given fence is received. Wait here and

check for valid response status value.

@@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct) return ret; } -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw) {

struct guc_ct_buffer_desc *desc = ctb->desc;

u32 head = READ_ONCE(desc->head);

struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;

u32 head; u32 space;

space = CIRC_SPACE(desc->tail, head, ctb->size);
if (ctb->space >= len_dw)
return true;
head = READ_ONCE(ctb->desc->head);

if (unlikely(head > ctb->size)) {
CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
	 ctb->desc->head, ctb->desc->tail, ctb->size);
ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
ctb->broken = true;
return false;
}

space = CIRC_SPACE(ctb->tail, head, ctb->size);

ctb->space = space; return space >= len_dw; } static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw) {
struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;

lockdep_assert_held(&ct->ctbs.send.lock);

if (unlikely(!h2g_has_room(ctb, len_dw))) {

if (unlikely(!h2g_has_room(ct, len_dw))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get();

@@ -612,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct, */ retry: spin_lock_irqsave(&ctb->lock, flags);

if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {

if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get(); spin_unlock_irqrestore(&ctb->lock, flags);

@@ -732,7 +744,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) { struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv; struct guc_ct_buffer_desc *desc = ctb->desc;

u32 head = desc->head;

u32 head = ctb->head; u32 tail = desc->tail; u32 size = ctb->size; u32 *cmds = ctb->cmds;

@@ -747,12 +759,29 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) if (unlikely(desc->status)) goto corrupted;

if (unlikely((tail | head) >= size)) {

GEM_BUG_ON(head > size);

+#ifdef CONFIG_DRM_I915_DEBUG_GUC
if (unlikely(head != READ_ONCE(desc->head))) {
CT_ERROR(ct, "Head was modified %u != %u\n",
	 desc->head, ctb->head);
desc->status |= GUC_CTB_STATUS_MISMATCH;
goto corrupted;
}

if (unlikely((desc->tail | desc->head) >= size)) {
As per comment in other thread, the check on head here is redundant because you have already hit a BUG_ON(ctb->head > size) followed by CT_ERROR(ctb->head != desc->head). Therefore, you can't get here if 'desc->head > size'.

Yep. See above we can likely just delete this.

...

CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
	 head, tail, size);
desc->status |= GUC_CTB_STATUS_OVERFLOW;
goto corrupted;

} +#else

if (unlikely(tail >= size)) {

CT_ERROR(ct, "Invalid offsets tail=%u (size=%u)\n",

```
	 tail, size);
```

desc->status |= GUC_CTB_STATUS_OVERFLOW;

```
goto corrupted;
```
}

Now we can move this outside if/else define block as it is same check as above. Again can you do this when you merge this?

Matt

...

...
+#endif /* tail == head condition indicates empty */ available = tail - head; @@ -802,6 +831,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) } CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);

ctb->head = head; /* now update descriptor */ WRITE_ONCE(desc->head, head);

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index bee03794c1eb..edd1bba0445d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -33,6 +33,9 @@ struct intel_guc;

@desc: pointer to the buffer descriptor

@cmds: pointer to the commands buffer

@size: size of the commands buffer in dwords

@head: local shadow copy of head in dwords

@tail: local shadow copy of tail in dwords

@space: local shadow copy of space in dwords

@broken: flag to indicate if descriptor data is broken

*/ struct intel_guc_ct_buffer {

@@ -40,6 +43,9 @@ struct intel_guc_ct_buffer { struct guc_ct_buffer_desc *desc; u32 *cmds; u32 size;

u32 tail;

u32 head;

u32 space; bool broken; };

John Harrison

6:19 p.m.

New subject: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads

On 7/7/2021 10:50, Matthew Brost wrote:

...

On Tue, Jul 06, 2021 at 03:51:00PM -0700, John Harrison wrote:

...
On 7/6/2021 15:20, Matthew Brost wrote:

...
CTB writes are now in the path of command submission and should be optimized for performance. Rather than reading CTB descriptor values (e.g. head, tail) which could result in accesses across the PCIe bus, store shadow local copies and only read/write the descriptor values when absolutely necessary. Also store the current space in the each channel locally.

v2: (Michal) - Add additional sanity checks for head / tail pointers - Use GUC_CTB_HDR_LEN rather than magic 1 v3: (Michal / John H) - Drop redundant check of head value

Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com

drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++++++++++++++-------- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 6 ++ 2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index db3e85b89573..4a73a1f03a9b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { ctb->broken = false;

ctb->tail = 0;

ctb->head = 0;

ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);

guc_ct_buffer_desc_init(ctb->desc); }

@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct, { struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc;

u32 head = desc->head;

u32 tail = desc->tail;

u32 tail = ctb->tail; u32 size = ctb->size;

u32 used; u32 header; u32 hxg; u32 type;

@@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct, if (unlikely(desc->status)) goto corrupted;

if (unlikely((tail | head) >= size)) {

GEM_BUG_ON(tail > size);

+#ifdef CONFIG_DRM_I915_DEBUG_GUC
if (unlikely(tail != READ_ONCE(desc->tail))) {
CT_ERROR(ct, "Tail was modified %u != %u\n",
	 desc->tail, ctb->tail);
desc->status |= GUC_CTB_STATUS_MISMATCH;
goto corrupted;
}

if (unlikely((desc->tail | desc->head) >= size)) {
Same arguments below about head apply to tail here. Also, there is no #else
Yes, desc->tail can be removed from this check. Same for head below. Can you fix this when merging?

...
check on ctb->head?

ctb->head variable isn't used in this path nor is ctb->tail in the other. In the other path desc->tail is checked as it is read while desc->head isn't needed to be read here. The other path can also likely be reworked to pull the tail check outside of the if / else define block.

...
...
	CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
	 head, tail, size);
	 desc->head, desc->tail, size);
desc->status |= GUC_CTB_STATUS_OVERFLOW;
goto corrupted;
}
/*
* tail == head condition indicates empty. GuC FW does not support
* using up the entire buffer to get tail == head meaning full.
*/
if (tail < head)
used = (size - head) + tail;
else
used = tail - head;
/* make sure there is a space including extra dw for the header */

if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
return -ENOSPC;
+#endif /* * dw0: CT header (including fence) @@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct, write_barrier(ct); /* now update descriptor */

ctb->tail = tail; WRITE_ONCE(desc->tail, tail);

ctb->space -= len + GUC_CTB_HDR_LEN; return 0;

@@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct, * @req: pointer to pending request * @status: placeholder for status *

For each sent request, Guc shall send bac CT response message.

For each sent request, GuC shall send back CT response message.

Our message handler will update status of tracked request once

response message with given fence is received. Wait here and

check for valid response status value.

@@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct) return ret; } -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw) {

struct guc_ct_buffer_desc *desc = ctb->desc;

u32 head = READ_ONCE(desc->head);

struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;

u32 head; u32 space;

space = CIRC_SPACE(desc->tail, head, ctb->size);
if (ctb->space >= len_dw)
return true;
head = READ_ONCE(ctb->desc->head);

if (unlikely(head > ctb->size)) {
CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
	 ctb->desc->head, ctb->desc->tail, ctb->size);
ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
ctb->broken = true;
return false;
}

space = CIRC_SPACE(ctb->tail, head, ctb->size);

ctb->space = space; return space >= len_dw; } static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw) {
struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;

lockdep_assert_held(&ct->ctbs.send.lock);

if (unlikely(!h2g_has_room(ctb, len_dw))) {

if (unlikely(!h2g_has_room(ct, len_dw))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get();

@@ -612,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct, */ retry: spin_lock_irqsave(&ctb->lock, flags);

if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {

if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get(); spin_unlock_irqrestore(&ctb->lock, flags);

@@ -732,7 +744,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) { struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv; struct guc_ct_buffer_desc *desc = ctb->desc;

u32 head = desc->head;

u32 head = ctb->head; u32 tail = desc->tail; u32 size = ctb->size; u32 *cmds = ctb->cmds;

@@ -747,12 +759,29 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) if (unlikely(desc->status)) goto corrupted;

if (unlikely((tail | head) >= size)) {

GEM_BUG_ON(head > size);

+#ifdef CONFIG_DRM_I915_DEBUG_GUC
if (unlikely(head != READ_ONCE(desc->head))) {
CT_ERROR(ct, "Head was modified %u != %u\n",
	 desc->head, ctb->head);
desc->status |= GUC_CTB_STATUS_MISMATCH;
goto corrupted;
}

if (unlikely((desc->tail | desc->head) >= size)) {
As per comment in other thread, the check on head here is redundant because you have already hit a BUG_ON(ctb->head > size) followed by CT_ERROR(ctb->head != desc->head). Therefore, you can't get here if 'desc->head > size'.
Yep. See above we can likely just delete this.

...
...
	CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
		 head, tail, size);
	desc->status |= GUC_CTB_STATUS_OVERFLOW;
	goto corrupted;
}
+#else
if (unlikely(tail >= size)) {
CT_ERROR(ct, "Invalid offsets tail=%u (size=%u)\n",
	 tail, size);
desc->status |= GUC_CTB_STATUS_OVERFLOW;
goto corrupted;
}
Now we can move this outside if/else define block as it is same check as above. Again can you do this when you merge this?

Matt

Given that a) there are multiple changes which are not trivial one liners and b) I personally prefer keeping the CT_ERRORs and dropping the BUG_ON, I would recommend that you repost an updated patch how you want it changed. Shouldn't need to repost the whole set, just this one patch. And maybe get it reviewed by Michal as he seems to be in agreement with your preferred direction.

John.

...

...
...
+#endif /* tail == head condition indicates empty */ available = tail - head; @@ -802,6 +831,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) } CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);

ctb->head = head; /* now update descriptor */ WRITE_ONCE(desc->head, head);

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index bee03794c1eb..edd1bba0445d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -33,6 +33,9 @@ struct intel_guc; * @desc: pointer to the buffer descriptor * @cmds: pointer to the commands buffer * @size: size of the commands buffer in dwords

@head: local shadow copy of head in dwords

@tail: local shadow copy of tail in dwords

@space: local shadow copy of space in dwords

@broken: flag to indicate if descriptor data is broken

*/

struct intel_guc_ct_buffer {

@@ -40,6 +43,9 @@ struct intel_guc_ct_buffer { struct guc_ct_buffer_desc *desc; u32 *cmds; u32 size;

u32 tail;

u32 head;

u32 space; bool broken; };

Matthew Brost

6:56 p.m.

New subject: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads

On Wed, Jul 07, 2021 at 11:19:01AM -0700, John Harrison wrote:

...

On 7/7/2021 10:50, Matthew Brost wrote:

...
On Tue, Jul 06, 2021 at 03:51:00PM -0700, John Harrison wrote:

...
On 7/6/2021 15:20, Matthew Brost wrote:

...
CTB writes are now in the path of command submission and should be optimized for performance. Rather than reading CTB descriptor values (e.g. head, tail) which could result in accesses across the PCIe bus, store shadow local copies and only read/write the descriptor values when absolutely necessary. Also store the current space in the each channel locally.

v2: (Michal) - Add additional sanity checks for head / tail pointers - Use GUC_CTB_HDR_LEN rather than magic 1 v3: (Michal / John H) - Drop redundant check of head value

Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com

drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 88 +++++++++++++++-------- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 6 ++ 2 files changed, 65 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index db3e85b89573..4a73a1f03a9b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { ctb->broken = false;

ctb->tail = 0;

ctb->head = 0;

ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);

guc_ct_buffer_desc_init(ctb->desc); }

@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct, { struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc;

u32 head = desc->head;

u32 tail = desc->tail;

u32 tail = ctb->tail; u32 size = ctb->size;

u32 used; u32 header; u32 hxg; u32 type;

@@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct, if (unlikely(desc->status)) goto corrupted;

if (unlikely((tail | head) >= size)) {

GEM_BUG_ON(tail > size);

+#ifdef CONFIG_DRM_I915_DEBUG_GUC
if (unlikely(tail != READ_ONCE(desc->tail))) {
CT_ERROR(ct, "Tail was modified %u != %u\n",
	 desc->tail, ctb->tail);
desc->status |= GUC_CTB_STATUS_MISMATCH;
goto corrupted;
}

if (unlikely((desc->tail | desc->head) >= size)) {
Same arguments below about head apply to tail here. Also, there is no #else
Yes, desc->tail can be removed from this check. Same for head below. Can you fix this when merging?

...
check on ctb->head?

ctb->head variable isn't used in this path nor is ctb->tail in the other. In the other path desc->tail is checked as it is read while desc->head isn't needed to be read here. The other path can also likely be reworked to pull the tail check outside of the if / else define block.

...
...
	CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
	 head, tail, size);
	 desc->head, desc->tail, size);
desc->status |= GUC_CTB_STATUS_OVERFLOW;
goto corrupted;
}
/*
* tail == head condition indicates empty. GuC FW does not support
* using up the entire buffer to get tail == head meaning full.
*/
if (tail < head)
used = (size - head) + tail;
else
used = tail - head;
/* make sure there is a space including extra dw for the header */

if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
return -ENOSPC;
+#endif /* * dw0: CT header (including fence) @@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct, write_barrier(ct); /* now update descriptor */

ctb->tail = tail; WRITE_ONCE(desc->tail, tail);

ctb->space -= len + GUC_CTB_HDR_LEN; return 0;

@@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct, * @req: pointer to pending request * @status: placeholder for status *

For each sent request, Guc shall send bac CT response message.

For each sent request, GuC shall send back CT response message.

Our message handler will update status of tracked request once

response message with given fence is received. Wait here and

check for valid response status value.

@@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct) return ret; } -static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw) {

struct guc_ct_buffer_desc *desc = ctb->desc;

u32 head = READ_ONCE(desc->head);

struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;

u32 head; u32 space;

space = CIRC_SPACE(desc->tail, head, ctb->size);
if (ctb->space >= len_dw)
return true;
head = READ_ONCE(ctb->desc->head);

if (unlikely(head > ctb->size)) {
CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
	 ctb->desc->head, ctb->desc->tail, ctb->size);
ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
ctb->broken = true;
return false;
}

space = CIRC_SPACE(ctb->tail, head, ctb->size);

ctb->space = space; return space >= len_dw; } static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw) {
struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;

lockdep_assert_held(&ct->ctbs.send.lock);

if (unlikely(!h2g_has_room(ctb, len_dw))) {

if (unlikely(!h2g_has_room(ct, len_dw))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get();

@@ -612,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct, */ retry: spin_lock_irqsave(&ctb->lock, flags);

if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {

if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get(); spin_unlock_irqrestore(&ctb->lock, flags);

@@ -732,7 +744,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) { struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv; struct guc_ct_buffer_desc *desc = ctb->desc;

u32 head = desc->head;

u32 head = ctb->head; u32 tail = desc->tail; u32 size = ctb->size; u32 *cmds = ctb->cmds;

@@ -747,12 +759,29 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) if (unlikely(desc->status)) goto corrupted;

if (unlikely((tail | head) >= size)) {

GEM_BUG_ON(head > size);

+#ifdef CONFIG_DRM_I915_DEBUG_GUC
if (unlikely(head != READ_ONCE(desc->head))) {
CT_ERROR(ct, "Head was modified %u != %u\n",
	 desc->head, ctb->head);
desc->status |= GUC_CTB_STATUS_MISMATCH;
goto corrupted;
}

if (unlikely((desc->tail | desc->head) >= size)) {
As per comment in other thread, the check on head here is redundant because you have already hit a BUG_ON(ctb->head > size) followed by CT_ERROR(ctb->head != desc->head). Therefore, you can't get here if 'desc->head > size'.
Yep. See above we can likely just delete this.

...
...
	CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
		 head, tail, size);
	desc->status |= GUC_CTB_STATUS_OVERFLOW;
	goto corrupted;
}
+#else
if (unlikely(tail >= size)) {
CT_ERROR(ct, "Invalid offsets tail=%u (size=%u)\n",
	 tail, size);
desc->status |= GUC_CTB_STATUS_OVERFLOW;
goto corrupted;
}
Now we can move this outside if/else define block as it is same check as above. Again can you do this when you merge this?

Matt
Given that a) there are multiple changes which are not trivial one liners and b) I personally prefer keeping the CT_ERRORs and dropping the BUG_ON, I would recommend that you repost an updated patch how you want it changed. Shouldn't need to repost the whole set, just this one patch. And maybe get it reviewed by Michal as he seems to be in agreement with your preferred direction.

Ok, I sent it but I looks like patchworks didn't like it. Anyways we should be able to review that patch.

Matt

...

John.

...
...
...
+#endif /* tail == head condition indicates empty */ available = tail - head; @@ -802,6 +831,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) } CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);

ctb->head = head; /* now update descriptor */ WRITE_ONCE(desc->head, head);

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index bee03794c1eb..edd1bba0445d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -33,6 +33,9 @@ struct intel_guc; * @desc: pointer to the buffer descriptor * @cmds: pointer to the commands buffer * @size: size of the commands buffer in dwords

@head: local shadow copy of head in dwords

@tail: local shadow copy of tail in dwords

@space: local shadow copy of space in dwords

@broken: flag to indicate if descriptor data is broken

*/

struct intel_guc_ct_buffer {

@@ -40,6 +43,9 @@ struct intel_guc_ct_buffer { struct guc_ct_buffer_desc *desc; u32 *cmds; u32 size;

u32 tail;

u32 head;

u32 space; bool broken; };

John Harrison

8:21 p.m.

New subject: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads

On 7/7/2021 11:56, Matthew Brost wrote: <snip>

...

Ok, I sent it but I looks like patchworks didn't like it. Anyways we should be able to review that patch.

Matt

Maybe because it came out as 6/56 instead of 6/7? Also, not sure if it needs to be in reply to 0/7 or 6/7?

John.

Matthew Brost

8:23 p.m.

New subject: [PATCH 6/7] drm/i915/guc: Optimize CTB writes and reads

On Wed, Jul 07, 2021 at 01:21:35PM -0700, John Harrison wrote:

...

On 7/7/2021 11:56, Matthew Brost wrote:

<snip> > Ok, I sent it but I looks like patchworks didn't like it. Anyways we > should be able to review that patch. > > Matt Maybe because it came out as 6/56 instead of 6/7? Also, not sure if it needs to be in reply to 0/7 or 6/7?

Yea, that is probably it. I think 6/7 would've made patckworks happy.

Matt

...

John.

Matthew Brost

6 Jul 6 Jul

10:20 p.m.

New subject: [PATCH 7/7] drm/i915/guc: Module load failure test for CT buffer creation

From: John Harrison John.C.Harrison@Intel.com

Add several module failure load inject points in the CT buffer creation code path.

Signed-off-by: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com Reviewed-by: Michal Wajdeczko michal.wajdeczko@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 8 ++++++++ 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 4a73a1f03a9b..5448377026e0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -175,6 +175,10 @@ static int ct_register_buffer(struct intel_guc_ct *ct, u32 type, { int err;

+ err = i915_inject_probe_error(guc_to_gt(ct_to_guc(ct))->i915, -ENXIO); + if (unlikely(err)) + return err; + err = guc_action_register_ct_buffer(ct_to_guc(ct), type, desc_addr, buff_addr, size); if (unlikely(err)) @@ -226,6 +230,10 @@ int intel_guc_ct_init(struct intel_guc_ct *ct) u32 *cmds; int err;

+ err = i915_inject_probe_error(guc_to_gt(guc)->i915, -ENXIO); + if (err) + return err; + GEM_BUG_ON(ct->vma);

blob_size = 2 * CTB_DESC_SIZE + CTB_H2G_BUFFER_SIZE + CTB_G2H_BUFFER_SIZE;

-- 2.28.0

Matthew Brost

7 Jul 7 Jul

7:09 p.m.

New subject: [PATCH 06/56] drm/i915/guc: Optimize CTB writes and reads

v2: (Michal) - Add additional sanity checks for head / tail pointers - Use GUC_CTB_HDR_LEN rather than magic 1 v3: (Michal / John H) - Drop redundant check of head value v4: (John H) - Drop redundant checks of tail / head values

Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 87 ++++++++++++++--------- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 6 ++ 2 files changed, 61 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index db3e85b89573..37fe9f3bbce3 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { ctb->broken = false; + ctb->tail = 0; + ctb->head = 0; + ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size); + guc_ct_buffer_desc_init(ctb->desc); }

- if (unlikely((tail | head) >= size)) { - CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n", - head, tail, size); + GEM_BUG_ON(tail > size); + +#ifdef CONFIG_DRM_I915_DEBUG_GUC + if (unlikely(tail != READ_ONCE(desc->tail))) { + CT_ERROR(ct, "Tail was modified %u != %u\n", + desc->tail, ctb->tail); + desc->status |= GUC_CTB_STATUS_MISMATCH; + goto corrupted; + } + if (unlikely(desc->head >= size)) { + CT_ERROR(ct, "Invalid offsets head=%u (size=%u)\n", + desc->head, size); desc->status |= GUC_CTB_STATUS_OVERFLOW; goto corrupted; } - - /* - * tail == head condition indicates empty. GuC FW does not support - * using up the entire buffer to get tail == head meaning full. - */ - if (tail < head) - used = (size - head) + tail; - else - used = tail - head; - - /* make sure there is a space including extra dw for the header */ - if (unlikely(used + len + GUC_CTB_HDR_LEN >= size)) - return -ENOSPC; +#endif

/* * dw0: CT header (including fence) @@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct, write_barrier(ct);

/* now update descriptor */ + ctb->tail = tail; WRITE_ONCE(desc->tail, tail); + ctb->space -= len + GUC_CTB_HDR_LEN;

return 0;

return space >= len_dw; }

static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw) { - struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; - lockdep_assert_held(&ct->ctbs.send.lock);

- if (unlikely(!h2g_has_room(ctb, len_dw))) { + if (unlikely(!h2g_has_room(ct, len_dw))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get();

@@ -612,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct, */ retry: spin_lock_irqsave(&ctb->lock, flags); - if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) { + if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get(); spin_unlock_irqrestore(&ctb->lock, flags); @@ -732,7 +744,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) { struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv; struct guc_ct_buffer_desc *desc = ctb->desc; - u32 head = desc->head; + u32 head = ctb->head; u32 tail = desc->tail; u32 size = ctb->size; u32 *cmds = ctb->cmds; @@ -747,9 +759,19 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) if (unlikely(desc->status)) goto corrupted;

- if (unlikely((tail | head) >= size)) { - CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n", - head, tail, size); + GEM_BUG_ON(head > size); + +#ifdef CONFIG_DRM_I915_DEBUG_GUC + if (unlikely(head != READ_ONCE(desc->head))) { + CT_ERROR(ct, "Head was modified %u != %u\n", + desc->head, ctb->head); + desc->status |= GUC_CTB_STATUS_MISMATCH; + goto corrupted; + } +#endif + if (unlikely(tail >= size)) { + CT_ERROR(ct, "Invalid offsets tail=%u (size=%u)\n", + tail, size); desc->status |= GUC_CTB_STATUS_OVERFLOW; goto corrupted; } @@ -802,6 +824,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) } CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);

+ ctb->head = head; /* now update descriptor */ WRITE_ONCE(desc->head, head);

-- 2.28.0

Michal Wajdeczko

8:30 p.m.

New subject: [PATCH 06/56] drm/i915/guc: Optimize CTB writes and reads

Hi,

On 07.07.2021 21:09, Matthew Brost wrote:

...

CTB writes are now in the path of command submission and should be optimized for performance. Rather than reading CTB descriptor values (e.g. head, tail) which could result in accesses across the PCIe bus, store shadow local copies and only read/write the descriptor values when absolutely necessary. Also store the current space in the each channel locally.

v2: (Michal)

Add additional sanity checks for head / tail pointers

Use GUC_CTB_HDR_LEN rather than magic 1

v3: (Michal / John H)

Drop redundant check of head value

v4: (John H)

Drop redundant checks of tail / head values

mostly nits, but since you will be sending it again...

...

Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com

drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 87 ++++++++++++++--------- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 6 ++ 2 files changed, 61 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index db3e85b89573..37fe9f3bbce3 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { ctb->broken = false;

ctb->tail = 0;

ctb->head = 0;

ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);

guc_ct_buffer_desc_init(ctb->desc);

}

@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct, { struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc;

u32 head = desc->head;

u32 tail = desc->tail;

u32 tail = ctb->tail; u32 size = ctb->size;

u32 used; u32 header; u32 hxg; u32 type;

@@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct, if (unlikely(desc->status)) goto corrupted;
if (unlikely((tail | head) >= size)) {
CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
	 head, tail, size);
GEM_BUG_ON(tail > size);

+#ifdef CONFIG_DRM_I915_DEBUG_GUC
if (unlikely(tail != READ_ONCE(desc->tail))) {
CT_ERROR(ct, "Tail was modified %u != %u\n",
	 desc->tail, ctb->tail);

here you accessing desc->tail again so maybe we can use:

u32 raw __maybe_unused;

raw = READ_ONCE(desc->tail); if (unlikely(raw != tail)) ... CT_ERROR(..., raw, tail);

...

desc->status |= GUC_CTB_STATUS_MISMATCH;
goto corrupted;
}

if (unlikely(desc->head >= size)) {

above you are reading value from desc using READ_ONCE, could be

raw = READ_ONCE(desc->head); if (unlikely(raw >= size)) ... CT_ERROR(..., raw, size);

...

CT_ERROR(ct, "Invalid offsets head=%u (size=%u)\n",

"Invalid head offset %u > %u\n" ?

...

	 desc->head, size);
desc->status |= GUC_CTB_STATUS_OVERFLOW; goto corrupted; }
/*
* tail == head condition indicates empty. GuC FW does not support
* using up the entire buffer to get tail == head meaning full.
*/
if (tail < head)
used = (size - head) + tail;
else
used = tail - head;
/* make sure there is a space including extra dw for the header */

if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
return -ENOSPC;
+#endif

/* * dw0: CT header (including fence) @@ -453,7 +452,9 @@ static int ct_write(struct intel_guc_ct *ct, write_barrier(ct);

/* now update descriptor */

ctb->tail = tail; WRITE_ONCE(desc->tail, tail);

ctb->space -= len + GUC_CTB_HDR_LEN;

maybe keep ctb updates together?

+ /* update local copies */ + ctb->tail = tail; + ctb->space -= len + GUC_CTB_HDR_LEN; + /* now update descriptor */ WRITE_ONCE(desc->tail, tail);

...

return 0;

@@ -469,7 +470,7 @@ static int ct_write(struct intel_guc_ct *ct,

@req: pointer to pending request

@status: placeholder for status

For each sent request, Guc shall send bac CT response message.

For each sent request, GuC shall send back CT response message.

Our message handler will update status of tracked request once

response message with given fence is received. Wait here and

check for valid response status value.

@@ -525,24 +526,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct) return ret; }

-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw) {

struct guc_ct_buffer_desc *desc = ctb->desc;

u32 head = READ_ONCE(desc->head);

struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;

keep struct guc_ct_buffer_desc *desc = ctb->desc;

...

u32 head; u32 space;

space = CIRC_SPACE(desc->tail, head, ctb->size);
if (ctb->space >= len_dw)
return true;
head = READ_ONCE(ctb->desc->head);

if (unlikely(head > ctb->size)) {
CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",

"Invalid head offset %u > %u\n" ?

...

	 ctb->desc->head, ctb->desc->tail, ctb->size);

ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;

```
ctb->broken = true;
```

so above references to desc will be simpler

...

return false;
}

space = CIRC_SPACE(ctb->tail, head, ctb->size);

ctb->space = space;

return space >= len_dw;
}

static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw) {

struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;

lockdep_assert_held(&ct->ctbs.send.lock);

if (unlikely(!h2g_has_room(ctb, len_dw))) {

if (unlikely(!h2g_has_room(ct, len_dw))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get();

@@ -612,7 +624,7 @@ static int ct_send(struct intel_guc_ct *ct, */ retry: spin_lock_irqsave(&ctb->lock, flags);

if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {

if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get(); spin_unlock_irqrestore(&ctb->lock, flags);

@@ -732,7 +744,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) { struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv; struct guc_ct_buffer_desc *desc = ctb->desc;

u32 head = desc->head;

u32 head = ctb->head; u32 tail = desc->tail;

READ_ONCE ?

...

u32 size = ctb->size; u32 *cmds = ctb->cmds; @@ -747,9 +759,19 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) if (unlikely(desc->status)) goto corrupted;
if (unlikely((tail | head) >= size)) {
CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
	 head, tail, size);
GEM_BUG_ON(head > size);

+#ifdef CONFIG_DRM_I915_DEBUG_GUC
if (unlikely(head != READ_ONCE(desc->head))) {
CT_ERROR(ct, "Head was modified %u != %u\n",
	 desc->head, ctb->head);
desc->status |= GUC_CTB_STATUS_MISMATCH;
goto corrupted;
}
+#endif
if (unlikely(tail >= size)) {
CT_ERROR(ct, "Invalid offsets tail=%u (size=%u)\n",
	 tail, size);
desc->status |= GUC_CTB_STATUS_OVERFLOW; goto corrupted; }
@@ -802,6 +824,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) } CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);

ctb->head = head;

+ empty line

...

/* now update descriptor */ WRITE_ONCE(desc->head, head);

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index bee03794c1eb..edd1bba0445d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -33,6 +33,9 @@ struct intel_guc;

@desc: pointer to the buffer descriptor

@cmds: pointer to the commands buffer

@size: size of the commands buffer in dwords

@head: local shadow copy of head in dwords

@tail: local shadow copy of tail in dwords

@space: local shadow copy of space in dwords

@broken: flag to indicate if descriptor data is broken

*/

struct intel_guc_ct_buffer { @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer { struct guc_ct_buffer_desc *desc; u32 *cmds; u32 size;

u32 tail;

u32 head;

u32 space;

it looks that we update space only for send/H2G ctb but maybe we should also track free space in recv/G2H ctb ? then we can report different stress conditions if space is less than 25% of size or so (someday)

Michal

...

bool broken; };

Matthew Brost

11:25 p.m.

New subject: [PATCH 06/7] drm/i915/guc: Optimize CTB writes and reads

Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 92 +++++++++++++++-------- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 6 ++ 2 files changed, 66 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index db3e85b89573..d552d3016779 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { ctb->broken = false; + ctb->tail = 0; + ctb->head = 0; + ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size); + guc_ct_buffer_desc_init(ctb->desc); }

- if (unlikely((tail | head) >= size)) { - CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n", - head, tail, size); + GEM_BUG_ON(tail > size); + +#ifdef CONFIG_DRM_I915_DEBUG_GUC + if (unlikely(tail != READ_ONCE(desc->tail))) { + CT_ERROR(ct, "Tail was modified %u != %u\n", + desc->tail, tail); + desc->status |= GUC_CTB_STATUS_MISMATCH; + goto corrupted; + } + if (unlikely(desc->head >= size)) { + CT_ERROR(ct, "Invalid head offset %u >= %u)\n", + desc->head, size); desc->status |= GUC_CTB_STATUS_OVERFLOW; goto corrupted; } - - /* - * tail == head condition indicates empty. GuC FW does not support - * using up the entire buffer to get tail == head meaning full. - */ - if (tail < head) - used = (size - head) + tail; - else - used = tail - head; - - /* make sure there is a space including extra dw for the header */ - if (unlikely(used + len + GUC_CTB_HDR_LEN >= size)) - return -ENOSPC; +#endif

/* * dw0: CT header (including fence) @@ -452,6 +451,10 @@ static int ct_write(struct intel_guc_ct *ct, */ write_barrier(ct);

+ /* update local copies */ + ctb->tail = tail; + ctb->space -= len + GUC_CTB_HDR_LEN; + /* now update descriptor */ WRITE_ONCE(desc->tail, tail);

@@ -469,7 +472,7 @@ static int ct_write(struct intel_guc_ct *ct, * @req: pointer to pending request * @status: placeholder for status * - * For each sent request, Guc shall send bac CT response message. + * For each sent request, GuC shall send back CT response message. * Our message handler will update status of tracked request once * response message with given fence is received. Wait here and * check for valid response status value. @@ -525,24 +528,36 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct) return ret; }

-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw) { + struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc; - u32 head = READ_ONCE(desc->head); + u32 head; u32 space;

- space = CIRC_SPACE(desc->tail, head, ctb->size); + if (ctb->space >= len_dw) + return true; + + head = READ_ONCE(desc->head); + if (unlikely(head > ctb->size)) { + CT_ERROR(ct, "Invalid head offset %u >= %u)\n", + head, ctb->size); + desc->status |= GUC_CTB_STATUS_OVERFLOW; + ctb->broken = true; + return false; + } + + space = CIRC_SPACE(ctb->tail, head, ctb->size); + ctb->space = space;

return space >= len_dw; }

static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw) { - struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; - lockdep_assert_held(&ct->ctbs.send.lock);

- if (unlikely(!h2g_has_room(ctb, len_dw))) { + if (unlikely(!h2g_has_room(ct, len_dw))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get();

@@ -612,7 +627,7 @@ static int ct_send(struct intel_guc_ct *ct, */ retry: spin_lock_irqsave(&ctb->lock, flags); - if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) { + if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get(); spin_unlock_irqrestore(&ctb->lock, flags); @@ -732,8 +747,8 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) { struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv; struct guc_ct_buffer_desc *desc = ctb->desc; - u32 head = desc->head; - u32 tail = desc->tail; + u32 head = ctb->head; + u32 tail = READ_ONCE(desc->tail); u32 size = ctb->size; u32 *cmds = ctb->cmds; s32 available; @@ -747,9 +762,19 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) if (unlikely(desc->status)) goto corrupted;

- if (unlikely((tail | head) >= size)) { - CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n", - head, tail, size); + GEM_BUG_ON(head > size); + +#ifdef CONFIG_DRM_I915_DEBUG_GUC + if (unlikely(head != READ_ONCE(desc->head))) { + CT_ERROR(ct, "Head was modified %u != %u\n", + desc->head, head); + desc->status |= GUC_CTB_STATUS_MISMATCH; + goto corrupted; + } +#endif + if (unlikely(tail >= size)) { + CT_ERROR(ct, "Invalid tail offset %u >= %u)\n", + tail, size); desc->status |= GUC_CTB_STATUS_OVERFLOW; goto corrupted; } @@ -802,6 +827,9 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) } CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);

+ /* update local copies */ + ctb->head = head; + /* now update descriptor */ WRITE_ONCE(desc->head, head);

-- 2.28.0

Michal Wajdeczko

8 Jul 8 Jul

1:23 p.m.

New subject: [PATCH 06/7] drm/i915/guc: Optimize CTB writes and reads

On 08.07.2021 01:25, Matthew Brost wrote:

...

CTB writes are now in the path of command submission and should be optimized for performance. Rather than reading CTB descriptor values (e.g. head, tail) which could result in accesses across the PCIe bus, store shadow local copies and only read/write the descriptor values when absolutely necessary. Also store the current space in the each channel locally.

v2: (Michal)

Add additional sanity checks for head / tail pointers

Use GUC_CTB_HDR_LEN rather than magic 1

v3: (Michal / John H)

Drop redundant check of head value

v4: (John H)

Drop redundant checks of tail / head values

v5: (Michal)

Address more nits

Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com

drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 92 +++++++++++++++-------- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 6 ++ 2 files changed, 66 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index db3e85b89573..d552d3016779 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { ctb->broken = false;

ctb->tail = 0;

ctb->head = 0;

ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);

guc_ct_buffer_desc_init(ctb->desc);

}

@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct, { struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc;

u32 head = desc->head;

u32 tail = desc->tail;

u32 tail = ctb->tail; u32 size = ctb->size;

u32 used; u32 header; u32 hxg; u32 type;

@@ -396,25 +398,22 @@ static int ct_write(struct intel_guc_ct *ct, if (unlikely(desc->status)) goto corrupted;
if (unlikely((tail | head) >= size)) {
CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
	 head, tail, size);
GEM_BUG_ON(tail > size);

+#ifdef CONFIG_DRM_I915_DEBUG_GUC
if (unlikely(tail != READ_ONCE(desc->tail))) {
CT_ERROR(ct, "Tail was modified %u != %u\n",
	 desc->tail, tail);
desc->status |= GUC_CTB_STATUS_MISMATCH;
goto corrupted;
}

if (unlikely(desc->head >= size)) {

READ_ONCE wouldn't hurt

...

CT_ERROR(ct, "Invalid head offset %u >= %u)\n",
	 desc->head, size);
desc->status |= GUC_CTB_STATUS_OVERFLOW; goto corrupted; }
/*
* tail == head condition indicates empty. GuC FW does not support
* using up the entire buffer to get tail == head meaning full.
*/
if (tail < head)
used = (size - head) + tail;
else
used = tail - head;
/* make sure there is a space including extra dw for the header */

if (unlikely(used + len + GUC_CTB_HDR_LEN >= size))
return -ENOSPC;
+#endif

/* * dw0: CT header (including fence) @@ -452,6 +451,10 @@ static int ct_write(struct intel_guc_ct *ct, */ write_barrier(ct);

/* update local copies */

ctb->tail = tail;

ctb->space -= len + GUC_CTB_HDR_LEN;

it looks that we rely on previous call to h2g_has_room(), but maybe for completeness we should have sanity check in this function as well:

GEM_BUG_ON(ctb->space < len + HDR_LEN);

not a blocker, other LGTM,

Reviewed-by: Michal Wajdeczko michal.wajdeczko@intel.com

Michal

...

/* now update descriptor */ WRITE_ONCE(desc->tail, tail);

@@ -469,7 +472,7 @@ static int ct_write(struct intel_guc_ct *ct,

@req: pointer to pending request

@status: placeholder for status

For each sent request, Guc shall send bac CT response message.

For each sent request, GuC shall send back CT response message.

Our message handler will update status of tracked request once

response message with given fence is received. Wait here and

check for valid response status value.

@@ -525,24 +528,36 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct) return ret; }

-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw) {

struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc;

u32 head = READ_ONCE(desc->head);

u32 head; u32 space;

space = CIRC_SPACE(desc->tail, head, ctb->size);
if (ctb->space >= len_dw)
return true;
head = READ_ONCE(desc->head);

if (unlikely(head > ctb->size)) {
CT_ERROR(ct, "Invalid head offset %u >= %u)\n",
	 head, ctb->size);
desc->status |= GUC_CTB_STATUS_OVERFLOW;
ctb->broken = true;
return false;
}

space = CIRC_SPACE(ctb->tail, head, ctb->size);

ctb->space = space;

return space >= len_dw;
}

static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw) {

struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;

lockdep_assert_held(&ct->ctbs.send.lock);

if (unlikely(!h2g_has_room(ctb, len_dw))) {

if (unlikely(!h2g_has_room(ct, len_dw))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get();

@@ -612,7 +627,7 @@ static int ct_send(struct intel_guc_ct *ct, */ retry: spin_lock_irqsave(&ctb->lock, flags);

if (unlikely(!h2g_has_room(ctb, len + GUC_CTB_HDR_LEN))) {

if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get(); spin_unlock_irqrestore(&ctb->lock, flags);

@@ -732,8 +747,8 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) { struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv; struct guc_ct_buffer_desc *desc = ctb->desc;

u32 head = desc->head;

u32 tail = desc->tail;

u32 head = ctb->head;

u32 tail = READ_ONCE(desc->tail); u32 size = ctb->size; u32 *cmds = ctb->cmds; s32 available;

@@ -747,9 +762,19 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) if (unlikely(desc->status)) goto corrupted;
if (unlikely((tail | head) >= size)) {
CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
	 head, tail, size);
GEM_BUG_ON(head > size);

+#ifdef CONFIG_DRM_I915_DEBUG_GUC
if (unlikely(head != READ_ONCE(desc->head))) {
CT_ERROR(ct, "Head was modified %u != %u\n",
	 desc->head, head);
desc->status |= GUC_CTB_STATUS_MISMATCH;
goto corrupted;
}
+#endif
if (unlikely(tail >= size)) {
CT_ERROR(ct, "Invalid tail offset %u >= %u)\n",
	 tail, size);
desc->status |= GUC_CTB_STATUS_OVERFLOW; goto corrupted; }
@@ -802,6 +827,9 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) } CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);

/* update local copies */

ctb->head = head;

/* now update descriptor */ WRITE_ONCE(desc->head, head);

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index bee03794c1eb..edd1bba0445d 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -33,6 +33,9 @@ struct intel_guc;

@desc: pointer to the buffer descriptor

@cmds: pointer to the commands buffer

@size: size of the commands buffer in dwords

@head: local shadow copy of head in dwords

@tail: local shadow copy of tail in dwords

@space: local shadow copy of space in dwords

@broken: flag to indicate if descriptor data is broken

*/

struct intel_guc_ct_buffer { @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer { struct guc_ct_buffer_desc *desc; u32 *cmds; u32 size;

u32 tail;

u32 head;

u32 space; bool broken;

};

Matthew Brost

12:30 a.m.

New subject: [PATCH 0/2] Introduce set_parallel2 extension

Based on upstream feedback [1] the current set_parallel extension isn't suitable. Add a single patch to DII implementing the new interface agreed two upstream [2]. Intended to enable the UMDs with the upstream interface while maintaining the old interface on DII.

Quick IGT to prove this is working should be list shortly.

v2: Move single patch in GuC section on pile, align with agreed to upstream interface, only include prelim* definitions. v3: Enable set_parallel2 via SET_PARAM IOCTL, resend for CI v4: Fix regression when patch was merge - only do parallel checks on user engine sets

Signed-off-by: Matthew Brost matthew.brost@intel.com

[1] https://patchwork.freedesktop.org/patch/432205/?series=89840&rev=1 [2] https://patchwork.freedesktop.org/patch/438911/?series=91417&rev=1

Signed-off-by: Matthew Brost matthew.brost@intel.com

--- baseline: b7227afd06bac1fe6719136e2ddd2bfed1d85feb pile-commit: b7a2c9136977a385659a71df837cbe5a1f775b32 range-diff: -: ------------ > 930: ad12b87b91af INTEL_DII/NOT_UPSTREAM: drm/i915: Introduce set_parallel2 extension 1083: 73e59e150cde ! 1084: 79b296835b1c INTEL_DII/FIXME: drm/i915/perf: add a parameter to control the size of OA buffer 1120: edbc20ae1355 ! 1121: 30d02d618229 INTEL_DII/FIXME: drm/i915: Add context parameter for debug flags 1293: 997b317fc408 ! 1294: 016b5903b0a0 INTEL_DII: drm/i915/perf: Add OA formats for XEHPSDV 1364: 136064b76b92 ! 1365: 5f564d553dc8 INTEL_DII: drm/i915/xehpsdv: Expand total numbers of supported engines up to 256 1403: 67b729033e82 ! 1404: 4398a2322f2f INTEL_DII: drm/i915/xehpsdv: Impose ULLS context restrictions 1405: b8dd2a22a952 ! 1406: dd2fab232cf1 INTEL_DII: drm/i915: Add context methods to suspend and resume requests 1670: b4633106fa13 ! 1671: 53b4a54ee2cc INTEL_DII: drm/i915/pxp: interface for marking contexts as using protected content 1671: 22369ab70556 ! 1672: 42234590cdf5 INTEL_DII: drm/i915/pxp: start the arb session on demand

series | 1 + ...IXME-drm-i915-perf-add-a-parameter-to-con.patch | 4 +- ...IXME-drm-i915-Add-context-parameter-for-d.patch | 18 +- ...-drm-i915-perf-Add-OA-formats-for-XEHPSDV.patch | 4 +- ...rm-i915-xehpsdv-Expand-total-numbers-of-s.patch | 2 +- ...rm-i915-xehpsdv-Impose-ULLS-context-restr.patch | 12 +- ...rm-i915-Add-context-methods-to-suspend-an.patch | 38 +- ...rm-i915-pxp-interface-for-marking-context.patch | 16 +- ...rm-i915-pxp-start-the-arb-session-on-dema.patch | 2 +- ...OT_UPSTREAM-drm-i915-Introduce-set_parall.patch | 676 +++++++++++++++++++++ 10 files changed, 725 insertions(+), 48 deletions(-)

diff --git a/series b/series index 8b77d52df40c..7db508ea974d 100644 --- a/series +++ b/series @@ -929,6 +929,7 @@ 0001-INTEL_DII-drm-i915-guc-Increase-GuC-log-size-for-CON.patch 0001-INTEL_DII-NOT_UPSTREAM-drm-i915-Dump-error-capture-t.patch 0001-INTEL_DII-NOT_UPSTREAM-drm-i915-guc-Dump-error-captu.patch +0001-INTEL_DII-NOT_UPSTREAM-drm-i915-Introduce-set_parall.patch 0001-INTEL_DII-END-GuC-submission-and-slpc-support.patch 0001-INTEL_DII-BEGIN-SR-IOV-ENABLING.patch 0001-INTEL_DII-drm-i915-guc-Update-GuC-to-62.0.3.patch diff --git a/0001-INTEL_DII-FIXME-drm-i915-perf-add-a-parameter-to-con.patch b/0001-INTEL_DII-FIXME-drm-i915-perf-add-a-parameter-to-con.patch index dd654f144374..b7a637b3813f 100644 --- a/0001-INTEL_DII-FIXME-drm-i915-perf-add-a-parameter-to-con.patch +++ b/0001-INTEL_DII-FIXME-drm-i915-perf-add-a-parameter-to-con.patch @@ -384,8 +384,8 @@ diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h --- a/include/uapi/drm/i915_drm_prelim.h +++ b/include/uapi/drm/i915_drm_prelim.h -@@ -393,6 +393,36 @@ struct prelim_i915_context_param_engines { - #define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */ +@@ -508,6 +508,36 @@ struct prelim_i915_context_param_engines { + #define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT (PRELIM_I915_USER_EXT | 3) /* see prelim_i915_context_engines_parallel2_submit */ };

+enum prelim_drm_i915_perf_property_id { diff --git a/0001-INTEL_DII-FIXME-drm-i915-Add-context-parameter-for-d.patch b/0001-INTEL_DII-FIXME-drm-i915-Add-context-parameter-for-d.patch index dfd5790ac2b8..71a5943b5536 100644 --- a/0001-INTEL_DII-FIXME-drm-i915-Add-context-parameter-for-d.patch +++ b/0001-INTEL_DII-FIXME-drm-i915-Add-context-parameter-for-d.patch @@ -44,7 +44,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/ }

static void __free_engines(struct i915_gem_engines *e, unsigned int count) -@@ -2252,6 +2255,76 @@ static int set_priority(struct i915_gem_context *ctx, +@@ -2436,6 +2439,76 @@ static int set_priority(struct i915_gem_context *ctx, return 0; }

@@ -121,7 +121,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/ static int ctx_setparam(struct drm_i915_file_private *fpriv, struct i915_gem_context *ctx, struct drm_i915_gem_context_param *args) -@@ -2321,6 +2394,11 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv, +@@ -2505,6 +2578,11 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv, ret = set_ringsize(ctx, args); break;

@@ -133,7 +133,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/ case I915_CONTEXT_PARAM_BAN_PERIOD: default: ret = -EINVAL; -@@ -2777,6 +2855,11 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, +@@ -2961,6 +3039,11 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, ret = get_ringsize(ctx, args); break;

@@ -184,7 +184,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h -@@ -285,6 +285,24 @@ intel_context_clear_nopreempt(struct intel_context *ce) +@@ -296,6 +296,24 @@ intel_context_clear_nopreempt(struct intel_context *ce) ce->emit_bb_start = ce->engine->emit_bb_start; }

@@ -212,19 +212,19 @@ diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/i diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h -@@ -114,6 +114,7 @@ struct intel_context { - #define CONTEXT_FORCE_SINGLE_SUBMISSION 7 +@@ -115,6 +115,7 @@ struct intel_context { #define CONTEXT_NOPREEMPT 8 #define CONTEXT_LRCA_DIRTY 9 -+#define CONTEXT_DEBUG 10 + #define CONTEXT_NO_PREEMPT_MID_BATCH 10 ++#define CONTEXT_DEBUG 11

struct { u64 timeout_us; diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h --- a/include/uapi/drm/i915_drm_prelim.h +++ b/include/uapi/drm/i915_drm_prelim.h -@@ -395,6 +395,32 @@ struct prelim_i915_context_param_engines { - #define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */ +@@ -510,6 +510,32 @@ struct prelim_i915_context_param_engines { + #define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT (PRELIM_I915_USER_EXT | 3) /* see prelim_i915_context_engines_parallel2_submit */ };

+struct prelim_drm_i915_gem_context_param { diff --git a/0001-INTEL_DII-drm-i915-perf-Add-OA-formats-for-XEHPSDV.patch b/0001-INTEL_DII-drm-i915-perf-Add-OA-formats-for-XEHPSDV.patch index 19a07b3926ae..f62d7848e091 100644 --- a/0001-INTEL_DII-drm-i915-perf-Add-OA-formats-for-XEHPSDV.patch +++ b/0001-INTEL_DII-drm-i915-perf-Add-OA-formats-for-XEHPSDV.patch @@ -204,8 +204,8 @@ diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h --- a/include/uapi/drm/i915_drm_prelim.h +++ b/include/uapi/drm/i915_drm_prelim.h -@@ -435,6 +435,27 @@ struct prelim_i915_context_param_engines { - #define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */ +@@ -550,6 +550,27 @@ struct prelim_i915_context_param_engines { + #define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT (PRELIM_I915_USER_EXT | 3) /* see prelim_i915_context_engines_parallel2_submit */ };

+enum prelim_drm_i915_oa_format { diff --git a/0001-INTEL_DII-drm-i915-xehpsdv-Expand-total-numbers-of-s.patch b/0001-INTEL_DII-drm-i915-xehpsdv-Expand-total-numbers-of-s.patch index 05a84884a3d1..ee486b95d11e 100644 --- a/0001-INTEL_DII-drm-i915-xehpsdv-Expand-total-numbers-of-s.patch +++ b/0001-INTEL_DII-drm-i915-xehpsdv-Expand-total-numbers-of-s.patch @@ -76,7 +76,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i9

/* Kernel clipping was a DRI1 misfeature */ if (!(exec->flags & I915_EXEC_FENCE_ARRAY)) { -@@ -3233,9 +3235,12 @@ eb_select_engine(struct i915_execbuffer *eb) +@@ -3233,9 +3235,12 @@ eb_select_engine(struct i915_execbuffer *eb, unsigned int batch_number) int err;

if (i915_gem_context_user_engines(eb->gem_context)) diff --git a/0001-INTEL_DII-drm-i915-xehpsdv-Impose-ULLS-context-restr.patch b/0001-INTEL_DII-drm-i915-xehpsdv-Impose-ULLS-context-restr.patch index 38ad84c4dc12..80880e3008cc 100644 --- a/0001-INTEL_DII-drm-i915-xehpsdv-Impose-ULLS-context-restr.patch +++ b/0001-INTEL_DII-drm-i915-xehpsdv-Impose-ULLS-context-restr.patch @@ -76,7 +76,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i9 if (intel_context_nopreempt(eb->context) || intel_context_debug(eb->context)) __set_bit(I915_FENCE_FLAG_NOPREEMPT, &eb->request->fence.flags); -@@ -3453,6 +3462,13 @@ static int eb_request_add(struct i915_execbuffer *eb, int err) +@@ -3463,6 +3472,13 @@ static int eb_request_add(struct i915_execbuffer *eb, int err)

trace_i915_request_add(rq);

@@ -90,7 +90,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i9 prev = __i915_request_commit(rq);

/* Check that the context wasn't destroyed before submission */ -@@ -3531,6 +3547,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, +@@ -3541,6 +3557,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, int err; bool first = batch_number == 0; bool last = batch_number + 1 == num_batches; @@ -98,7 +98,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i9

BUILD_BUG_ON(__EXEC_INTERNAL_FLAGS & ~__I915_EXEC_ILLEGAL_FLAGS); BUILD_BUG_ON(__EXEC_OBJECT_INTERNAL_FLAGS & -@@ -3582,6 +3599,13 @@ i915_gem_do_execbuffer(struct drm_device *dev, +@@ -3592,6 +3609,13 @@ i915_gem_do_execbuffer(struct drm_device *dev, if (unlikely(err)) goto err_destroy;

@@ -109,7 +109,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i9 + goto err_context; + } + - err = eb_select_engine(&eb); + err = eb_select_engine(&eb, batch_number); if (unlikely(err)) goto err_context; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c @@ -239,7 +239,7 @@ diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_req /* * Requests on the same timeline are explicitly ordered, along * with their dependencies, by i915_request_add() which ensures -@@ -2126,6 +2181,7 @@ long i915_request_wait(struct i915_request *rq, +@@ -2121,6 +2176,7 @@ long i915_request_wait(struct i915_request *rq, { might_sleep(); GEM_BUG_ON(timeout < 0); @@ -247,7 +247,7 @@ diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_req

if (dma_fence_is_signaled(&rq->fence)) return timeout; -@@ -2331,6 +2387,8 @@ static struct i915_global_request global = { { +@@ -2326,6 +2382,8 @@ static struct i915_global_request global = { {

int __init i915_global_request_init(void) { diff --git a/0001-INTEL_DII-drm-i915-Add-context-methods-to-suspend-an.patch b/0001-INTEL_DII-drm-i915-Add-context-methods-to-suspend-an.patch index 7d523c8dadba..44fd93184b8a 100644 --- a/0001-INTEL_DII-drm-i915-Add-context-methods-to-suspend-an.patch +++ b/0001-INTEL_DII-drm-i915-Add-context-methods-to-suspend-an.patch @@ -52,7 +52,7 @@ diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/i void intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine) { -@@ -475,6 +481,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine) +@@ -476,6 +482,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine) ce->guc_id = GUC_INVALID_LRC_ID; INIT_LIST_HEAD(&ce->guc_id_link);

@@ -62,7 +62,7 @@ diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/i i915_active_init(&ce->active, __intel_context_active, __intel_context_retire); } -@@ -485,6 +494,7 @@ void intel_context_fini(struct intel_context *ce) +@@ -486,6 +495,7 @@ void intel_context_fini(struct intel_context *ce)

if (ce->last_rq) i915_request_put(ce->last_rq); @@ -73,7 +73,7 @@ diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/i diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h -@@ -252,6 +252,54 @@ static inline bool intel_context_ban(struct intel_context *ce, +@@ -263,6 +263,54 @@ static inline bool intel_context_ban(struct intel_context *ce, return ret; }

@@ -152,10 +152,10 @@ diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i91 void (*enter)(struct intel_context *ce); void (*exit)(struct intel_context *ce);

-@@ -241,6 +248,9 @@ struct intel_context { +@@ -245,6 +252,9 @@ struct intel_context {

- /* Last request submitted on a parent */ - struct i915_request *last_rq; + /* Parallel submit mutex */ + struct mutex parallel_submit; + + /* GuC context blocked fence */ + struct i915_sw_fence guc_blocked; @@ -231,7 +231,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm

if (!enabled) { GEM_BUG_ON(context_pending_enable(ce)); -@@ -1103,6 +1137,8 @@ static void __guc_context_destroy(struct intel_context *ce); +@@ -1102,6 +1136,8 @@ static void __guc_context_destroy(struct intel_context *ce); static void release_guc_id(struct intel_guc *guc, struct intel_context *ce); static void guc_signal_context_fence(struct intel_context *ce); static void guc_cancel_context_requests(struct intel_context *ce); @@ -240,7 +240,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm

static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) { -@@ -1143,6 +1179,8 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) +@@ -1142,6 +1178,8 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)

/* Not mutualy exclusive with above if statement. */ if (pending_disable) { @@ -249,7 +249,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm guc_signal_context_fence(ce); if (banned) { guc_cancel_context_requests(ce); -@@ -1150,7 +1188,12 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) +@@ -1149,7 +1187,12 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) } intel_context_sched_disable_unpin(ce); atomic_dec(&guc->outstanding_submission_g2h); @@ -262,7 +262,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm } } } -@@ -2549,6 +2592,22 @@ static void guc_parent_context_unpin(struct intel_context *ce) +@@ -2551,6 +2594,22 @@ static void guc_parent_context_unpin(struct intel_context *ce) __guc_context_unpin(ce); }

@@ -285,7 +285,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm static void __guc_context_sched_disable(struct intel_guc *guc, struct intel_context *ce, u16 guc_id) -@@ -2576,10 +2635,13 @@ static void __guc_context_sched_disable(struct intel_guc *guc, +@@ -2578,10 +2637,13 @@ static void __guc_context_sched_disable(struct intel_guc *guc, G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true); }

@@ -299,7 +299,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm intel_context_get(ce);

return ce->guc_id; -@@ -2677,6 +2739,132 @@ static void guc_context_sched_disable(struct intel_context *ce) +@@ -2679,6 +2741,132 @@ static void guc_context_sched_disable(struct intel_context *ce) intel_context_sched_disable_unpin(ce); }

@@ -432,7 +432,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm int intel_guc_modify_scheduling(struct intel_guc *guc, bool enable) { struct intel_gt *gt = guc_to_gt(guc); -@@ -2991,6 +3179,9 @@ static const struct intel_context_ops guc_context_ops = { +@@ -2993,6 +3181,9 @@ static const struct intel_context_ops guc_context_ops = {

.ban = guc_context_ban,

@@ -442,7 +442,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm .enter = intel_context_enter_engine, .exit = guc_context_exit,

-@@ -3380,6 +3571,9 @@ static const struct intel_context_ops virtual_guc_context_ops = { +@@ -3382,6 +3573,9 @@ static const struct intel_context_ops virtual_guc_context_ops = {

.ban = guc_context_ban,

@@ -452,7 +452,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm .enter = guc_virtual_context_enter, .exit = guc_virtual_context_exit,

-@@ -3457,6 +3651,9 @@ static const struct intel_context_ops parent_context_ops = { +@@ -3459,6 +3653,9 @@ static const struct intel_context_ops parent_context_ops = {

.ban = guc_context_ban,

@@ -462,7 +462,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm .enter = intel_context_enter_engine, .exit = intel_context_exit_engine,

-@@ -3476,6 +3673,9 @@ static const struct intel_context_ops virtual_parent_context_ops = { +@@ -3478,6 +3675,9 @@ static const struct intel_context_ops virtual_parent_context_ops = {

.ban = guc_context_ban,

@@ -472,7 +472,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm .enter = guc_virtual_context_enter, .exit = guc_virtual_context_exit,

-@@ -3487,6 +3687,9 @@ static const struct intel_context_ops virtual_parent_context_ops = { +@@ -3489,6 +3689,9 @@ static const struct intel_context_ops virtual_parent_context_ops = { static const struct intel_context_ops child_context_ops = { .alloc = guc_context_alloc,

@@ -482,7 +482,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm .enter = intel_context_enter_engine, .exit = guc_context_exit,

-@@ -3497,6 +3700,9 @@ static const struct intel_context_ops child_context_ops = { +@@ -3499,6 +3702,9 @@ static const struct intel_context_ops child_context_ops = { static const struct intel_context_ops virtual_child_context_ops = { .alloc = guc_virtual_context_alloc,

@@ -492,7 +492,7 @@ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm .enter = guc_virtual_context_enter, .exit = guc_virtual_context_exit,

-@@ -4440,6 +4646,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, +@@ -4441,6 +4647,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, clr_context_banned(ce); clr_context_pending_disable(ce); __guc_signal_context_fence(ce); diff --git a/0001-INTEL_DII-drm-i915-pxp-interface-for-marking-context.patch b/0001-INTEL_DII-drm-i915-pxp-interface-for-marking-context.patch index 6b38bd36d21b..8a6b9561eb24 100644 --- a/0001-INTEL_DII-drm-i915-pxp-interface-for-marking-context.patch +++ b/0001-INTEL_DII-drm-i915-pxp-interface-for-marking-context.patch @@ -56,7 +56,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/ #include "i915_gem_context.h" #include "i915_gem_ioctls.h" #include "i915_globals.h" -@@ -2574,6 +2576,40 @@ static int set_acc(struct i915_gem_context *ctx, +@@ -2769,6 +2771,40 @@ static int set_acc(struct i915_gem_context *ctx, return 0; }

@@ -97,7 +97,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/ static int ctx_setparam(struct drm_i915_file_private *fpriv, struct i915_gem_context *ctx, struct drm_i915_gem_context_param *args, -@@ -2607,6 +2643,8 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv, +@@ -2802,6 +2838,8 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv, ret = -EPERM; else if (args->value) i915_gem_context_set_bannable(ctx); @@ -106,7 +106,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/ else i915_gem_context_clear_bannable(ctx); break; -@@ -2614,10 +2652,12 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv, +@@ -2809,10 +2847,12 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv, case I915_CONTEXT_PARAM_RECOVERABLE: if (args->size) ret = -EINVAL; @@ -122,7 +122,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/ break;

case I915_CONTEXT_PARAM_PRIORITY: -@@ -2664,6 +2704,9 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv, +@@ -2865,6 +2905,9 @@ static int ctx_setparam(struct drm_i915_file_private *fpriv, case I915_CONTEXT_PARAM_DEBUG_FLAGS: ret = set_debug_flags(ctx, args); break; @@ -132,7 +132,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/

case I915_CONTEXT_PARAM_BAN_PERIOD: default: -@@ -3157,6 +3200,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, +@@ -3358,6 +3401,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, case I915_CONTEXT_PARAM_DEBUG_FLAGS: ret = get_debug_flags(ctx, args); break; @@ -142,7 +142,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/

case I915_CONTEXT_PARAM_BAN_PERIOD: default: -@@ -3281,6 +3327,11 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, +@@ -3482,6 +3528,11 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, args->batch_active = atomic_read(&ctx->guilty_count); args->batch_pending = atomic_read(&ctx->active_count);

@@ -225,7 +225,7 @@ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i9 eb->gem_context = ctx; if (rcu_access_pointer(ctx->vm)) eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT; -@@ -3301,6 +3308,17 @@ eb_select_engine(struct i915_execbuffer *eb) +@@ -3311,6 +3318,17 @@ eb_select_engine(struct i915_execbuffer *eb, unsigned int batch_number)

intel_gt_pm_get(ce->engine->gt);

@@ -348,7 +348,7 @@ diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h --- a/include/uapi/drm/i915_drm_prelim.h +++ b/include/uapi/drm/i915_drm_prelim.h -@@ -893,6 +893,26 @@ struct prelim_drm_i915_gem_context_param { +@@ -1003,6 +1003,26 @@ struct prelim_drm_i915_gem_context_param { #define I915_CONTEXT_PARAM_ACC 0xd };

diff --git a/0001-INTEL_DII-drm-i915-pxp-start-the-arb-session-on-dema.patch b/0001-INTEL_DII-drm-i915-pxp-start-the-arb-session-on-dema.patch index 5ee627b00811..4b4326057959 100644 --- a/0001-INTEL_DII-drm-i915-pxp-start-the-arb-session-on-dema.patch +++ b/0001-INTEL_DII-drm-i915-pxp-start-the-arb-session-on-dema.patch @@ -22,7 +22,7 @@ Reviewed-by: Rodrigo Vivi rodrigo.vivi@intel.com diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c -@@ -3309,9 +3309,11 @@ eb_select_engine(struct i915_execbuffer *eb) +@@ -3319,9 +3319,11 @@ eb_select_engine(struct i915_execbuffer *eb, unsigned int batch_number) intel_gt_pm_get(ce->engine->gt);

if (i915_gem_context_uses_protected_content(eb->gem_context)) { diff --git a/0001-INTEL_DII-NOT_UPSTREAM-drm-i915-Introduce-set_parall.patch b/0001-INTEL_DII-NOT_UPSTREAM-drm-i915-Introduce-set_parall.patch new file mode 100644 index 000000000000..415fbd930383 --- /dev/null +++ b/0001-INTEL_DII-NOT_UPSTREAM-drm-i915-Introduce-set_parall.patch @@ -0,0 +1,676 @@ +From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001 +From: Matthew Brost matthew.brost@intel.com +Date: Wed, 7 Jul 2021 16:55:03 -0700 +Subject: [PATCH] INTEL_DII/NOT_UPSTREAM: drm/i915: Introduce set_parallel2 + extension + +Based on upstream feedback the set_parallel extension isn't suitable as +it looks a bit too much like the bonding extension. Introduce a +set_parallel2 extension which configures parallel submission in a single +extension and in a single slot. This compares to old set_parallel +extension which configured parallel submission across multiple slots. + +Also remove the ability for the user to pass in the number of BBs in +the execbuf IOCTL. The number of BBs is now implied based on the width +of the context in the slot. + +This patch is intended in enable UMDs for the upstream direction while +maintaining the old set_parallel extension to not break UMDs. Once UMDs +have been updated to use new extension the old one can be removed from +DII. + +v2: Only enable parallel submission on engines set by user + +Signed-off-by: Matthew Brost matthew.brost@intel.com +--- + drivers/gpu/drm/i915/gem/i915_gem_context.c | 190 +++++++++++++++++- + .../gpu/drm/i915/gem/i915_gem_context_types.h | 6 - + .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 73 +++++-- + drivers/gpu/drm/i915/gt/intel_context.c | 2 + + drivers/gpu/drm/i915/gt/intel_context.h | 11 + + drivers/gpu/drm/i915/gt/intel_context_types.h | 4 + + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 +- + drivers/gpu/drm/i915/i915_request.c | 7 +- + include/uapi/drm/i915_drm_prelim.h | 115 +++++++++++ + 9 files changed, 376 insertions(+), 36 deletions(-) + +diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c +--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c ++++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c +@@ -374,7 +374,6 @@ void i915_gem_context_release(struct kref *ref) + mutex_destroy(&ctx->engines_mutex); + mutex_destroy(&ctx->lut_mutex); + mutex_destroy(&ctx->mutex); +- mutex_destroy(&ctx->parallel_submit); + + kfree_rcu(ctx, rcu); + } +@@ -699,8 +698,6 @@ __create_context(struct drm_i915_private *i915) + mutex_init(&ctx->mutex); + INIT_LIST_HEAD(&ctx->link); + +- mutex_init(&ctx->parallel_submit); +- + spin_lock_init(&ctx->stale.lock); + INIT_LIST_HEAD(&ctx->stale.engines); + +@@ -1857,6 +1854,48 @@ static bool validate_parallel_engines_layout(const struct set_engines *set) + return true; + } + ++/* ++ * Engine must be same class and form a logically contiguous mask. ++ * ++ * FIXME: Logical mask check not 100% correct but good enough for the PoC ++ */ ++static bool __validate_parallel_engines_layout(struct drm_i915_private *i915, ++ struct intel_context *parent) ++{ ++ u8 engine_class = parent->engine->class; ++ u8 num_siblings = hweight_long(parent->engine->logical_mask); ++ struct intel_context *child; ++ intel_engine_mask_t logical_mask = parent->engine->logical_mask; ++ ++ for_each_child(parent, child) { ++ if (child->engine->class != engine_class) { ++ drm_dbg(&i915->drm, "Class mismatch: %u, %u", ++ engine_class, child->engine->class); ++ return false; ++ } ++ if (hweight_long(child->engine->logical_mask) != num_siblings) { ++ drm_dbg(&i915->drm, "Sibling mismatch: %u, %lu", ++ num_siblings, ++ hweight_long(child->engine->logical_mask)); ++ return false; ++ } ++ if (logical_mask & child->engine->logical_mask) { ++ drm_dbg(&i915->drm, "Overlapping logical mask: 0x%04x, 0x%04x", ++ logical_mask, child->engine->logical_mask); ++ return false; ++ } ++ logical_mask |= child->engine->logical_mask; ++ } ++ ++ if (!is_power_of_2((logical_mask >> (ffs(logical_mask) - 1)) + 1)) { ++ drm_dbg(&i915->drm, "Non-contiguous logical mask: 0x%04x", ++ logical_mask); ++ return false; ++ } ++ ++ return true; ++} ++ + static int + set_engines__parallel_submit(struct i915_user_extension __user *base, void *data) + { +@@ -2009,11 +2048,156 @@ set_engines__parallel_submit(struct i915_user_extension __user *base, void *data + return err; + } + ++static int ++set_engines__parallel2_submit(struct i915_user_extension __user *base, ++ void *data) ++{ ++ struct prelim_drm_i915_context_engines_parallel2_submit __user *ext = ++ container_of_user(base, typeof(*ext), base); ++ const struct set_engines *set = data; ++ struct drm_i915_private *i915 = set->ctx->i915; ++ struct intel_context *parent, *child, *ce; ++ u64 flags; ++ int err = 0, n, i, j; ++ u16 slot, width, num_siblings; ++ struct intel_engine_cs **siblings = NULL; ++ ++ if (!(intel_uc_uses_guc_submission(&i915->gt.uc))) ++ return -ENODEV; ++ ++ if (get_user(slot, &ext->engine_index)) ++ return -EFAULT; ++ ++ if (get_user(width, &ext->width)) ++ return -EFAULT; ++ ++ if (get_user(num_siblings, &ext->num_siblings)) ++ return -EFAULT; ++ ++ if (slot >= set->engines->num_engines) { ++ drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n", ++ slot, set->engines->num_engines); ++ return -EINVAL; ++ } ++ ++ parent = set->engines->engines[slot]; ++ if (parent) { ++ drm_dbg(&i915->drm, "Context index[%d] not NULL\n", slot); ++ return -EINVAL; ++ } ++ ++ if (get_user(flags, &ext->flags)) ++ return -EFAULT; ++ ++ if (flags) { ++ drm_dbg(&i915->drm, "Unknown flags 0x%02llx", flags); ++ return -EINVAL; ++ } ++ ++ for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) { ++ err = check_user_mbz(&ext->mbz64[n]); ++ if (err) ++ return err; ++ } ++ ++ if (width < 1) { ++ drm_dbg(&i915->drm, "Width (%d) < 1 \n", width); ++ return -EINVAL; ++ } ++ ++ if (num_siblings < 1) { ++ drm_dbg(&i915->drm, "Number siblings (%d) < 1 \n", ++ num_siblings); ++ return -EINVAL; ++ } ++ ++ siblings = kmalloc_array(num_siblings, ++ sizeof(*siblings), ++ GFP_KERNEL); ++ if (!siblings) ++ return -ENOMEM; ++ ++ mutex_lock(&set->ctx->mutex); ++ ++ /* Create contexts / engines */ ++ for (i = 0; i < width; ++i) { ++ for (j = 0; j < num_siblings; ++j) { ++ struct i915_engine_class_instance ci; ++ ++ if (copy_from_user(&ci, &ext->engines[i * num_siblings + j], ++ sizeof(ci))) { ++ err = -EFAULT; ++ goto out_err; ++ } ++ ++ siblings[j] = intel_engine_lookup_user(i915, ++ ci.engine_class, ++ ci.engine_instance); ++ if (!siblings[j]) { ++ drm_dbg(&i915->drm, ++ "Invalid sibling[%d]: { class:%d, inst:%d }\n", ++ n, ci.engine_class, ci.engine_instance); ++ err = -EINVAL; ++ goto out_err; ++ } ++ } ++ ++ ce = intel_engine_create_virtual(siblings, num_siblings, ++ FORCE_VIRTUAL); ++ if (IS_ERR(ce)) { ++ err = PTR_ERR(ce); ++ goto out_err; ++ } ++ intel_context_set_gem(ce, set->ctx); ++ ++ if (i == 0) { ++ parent = ce; ++ __set_bit(CONTEXT_NO_PREEMPT_MID_BATCH, &ce->flags); ++ } else { ++ intel_context_bind_parent_child(parent, ce); ++ err = intel_context_alloc_state(ce); ++ if (err) ++ goto out_err; ++ } ++ } ++ ++ if (!__validate_parallel_engines_layout(i915, parent)) { ++ drm_dbg(&i915->drm, "Invalidate parallel context layout"); ++ err = -EINVAL; ++ goto out_err; ++ } ++ ++ intel_guc_configure_parent_context(parent); ++ if (cmpxchg(&set->engines->engines[slot], NULL, parent)) { ++ err = -EEXIST; ++ goto out_err; ++ } ++ ++ kfree(siblings); ++ mutex_unlock(&set->ctx->mutex); ++ ++ return 0; ++ ++out_err: ++ if (parent) { ++ for_each_child(parent, child) ++ intel_context_put(child); ++ intel_context_put(parent); ++ set->engines->engines[slot] = NULL; ++ } ++ kfree(siblings); ++ mutex_unlock(&set->ctx->mutex); ++ ++ return err; ++} ++ + static const i915_user_extension_fn set_engines__extensions[] = { + [I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance, + [I915_CONTEXT_ENGINES_EXT_BOND] = set_engines__bond, + [PRELIM_I915_USER_EXT_MASK(PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT)] = + set_engines__parallel_submit, ++ [PRELIM_I915_USER_EXT_MASK(PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT)] = ++ set_engines__parallel2_submit, + }; + + static int +diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h +--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h ++++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h +@@ -194,12 +194,6 @@ struct i915_gem_context { + */ + u64 fence_context; + +- /** +- * @parallel_submit: Ensure only 1 parallel submission is happening on +- * this context at a time. +- */ +- struct mutex parallel_submit; +- + /** + * @seqno: Seqno when using when a parallel context. + */ +diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c ++++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +@@ -1633,7 +1633,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb, + goto err_unmap; + + if (engine == eb->context->engine && +- !i915_gem_context_is_parallel(eb->gem_context)) { ++ !intel_context_is_parallel(eb->context)) { + rq = i915_request_create(eb->context); + } else { + struct intel_context *ce = eb->reloc_context; +@@ -1727,7 +1727,7 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb, + struct intel_engine_cs *engine = eb->engine; + + if (!reloc_can_use_engine(engine) || +- i915_gem_context_is_parallel(eb->gem_context)) { ++ intel_context_is_parallel(eb->context)) { + engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0]; + if (!engine) + return ERR_PTR(-ENODEV); +@@ -3223,7 +3223,7 @@ eb_select_legacy_ring(struct i915_execbuffer *eb) + } + + static int +-eb_select_engine(struct i915_execbuffer *eb) ++eb_select_engine(struct i915_execbuffer *eb, unsigned int batch_number) + { + struct intel_context *ce; + unsigned int idx; +@@ -3238,6 +3238,16 @@ eb_select_engine(struct i915_execbuffer *eb) + if (IS_ERR(ce)) + return PTR_ERR(ce); + ++ if (batch_number > 0 && ++ !i915_gem_context_is_parallel(eb->gem_context)) { ++ struct intel_context *parent = ce; ++ for_each_child(parent, ce) ++ if (!--batch_number) ++ break; ++ intel_context_put(parent); ++ intel_context_get(ce); ++ } ++ + intel_gt_pm_get(ce->engine->gt); + + if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) { +@@ -3562,7 +3572,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, + if (unlikely(err)) + goto err_destroy; + +- err = eb_select_engine(&eb); ++ err = eb_select_engine(&eb, batch_number); + if (unlikely(err)) + goto err_context; + +@@ -3751,6 +3761,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, + const size_t count = args->buffer_count; + unsigned int num_batches, i; + int err, start_context; ++ bool is_parallel = false; ++ struct intel_context *parent = NULL; + + if (!check_buffer_count(count)) { + drm_dbg(&i915->drm, "execbuf2 with %zd buffers\n", count); +@@ -3782,15 +3794,35 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, + I915_EXEC_NUMBER_BB_LSB) + + ((args->flags & PRELIM_I915_EXEC_NUMBER_BB_MASK) >> + PRELIM_I915_EXEC_NUMBER_BB_LSB)) + 1; +- if (i915_gem_context_is_parallel(ctx)) { +- if (num_batches > count || +- start_context + num_batches > ctx->width) { +- err = -EINVAL; +- goto err_context; ++ ++ if (i915_gem_context_user_engines(ctx)) { ++ parent = i915_gem_context_get_engine(ctx, start_context); ++ if (IS_ERR(parent)) { ++ i915_gem_context_put(ctx); ++ return PTR_ERR(parent); + } + +- if (i915_gem_context_is_bb_preempt_boundary(ctx) && +- (start_context || num_batches != ctx->width)) { ++ is_parallel = i915_gem_context_is_parallel(ctx) || ++ intel_context_is_parallel(parent); ++ if (i915_gem_context_is_parallel(ctx)) { ++ if (num_batches > count || ++ start_context + num_batches > ctx->width) { ++ err = -EINVAL; ++ goto err_context; ++ } ++ ++ if (i915_gem_context_is_bb_preempt_boundary(ctx) && ++ (start_context || num_batches != ctx->width)) { ++ err = -EINVAL; ++ goto err_context; ++ } ++ } else if (intel_context_is_parallel(parent)) { ++ if (num_batches != 1) ++ return -EINVAL; ++ num_batches = parent->guc_number_children + 1; ++ if (num_batches > count) ++ return -EINVAL; ++ } else if(num_batches > 1) { + err = -EINVAL; + goto err_context; + } +@@ -3827,8 +3859,7 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, + * properly, also this is needed to create an excl fence for an dma buf + * objects these BBs touch. + */ +- if (args->flags & I915_EXEC_FENCE_OUT || +- i915_gem_context_is_parallel(ctx)) { ++ if (args->flags & I915_EXEC_FENCE_OUT || is_parallel) { + out_fences = kcalloc(num_batches, sizeof(*out_fences), + GFP_KERNEL); + if (!out_fences) { +@@ -3874,8 +3905,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, + * in intel_context sequence, thus only 1 submission can happen at a + * time. + */ +- if (i915_gem_context_is_parallel(ctx)) +- mutex_lock(&ctx->parallel_submit); ++ if (is_parallel) ++ mutex_lock(&parent->parallel_submit); + + err = i915_gem_do_execbuffer(dev, file, args, exec2_list, + args->flags & I915_EXEC_BATCH_FIRST ? +@@ -3889,8 +3920,10 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, + &ww); + + for (i = 1; err == 0 && i < num_batches; i++) { +- args->flags &= ~I915_EXEC_RING_MASK; +- args->flags |= start_context + i; ++ if (i915_gem_context_is_parallel(ctx)) { ++ args->flags &= ~I915_EXEC_RING_MASK; ++ args->flags |= start_context + i; ++ } + args->batch_len = 0; + + err = i915_gem_do_execbuffer(dev, file, args, exec2_list, +@@ -3905,8 +3938,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, + &ww); + } + +- if (i915_gem_context_is_parallel(ctx)) +- mutex_unlock(&ctx->parallel_submit); ++ if (is_parallel) ++ mutex_unlock(&parent->parallel_submit); + + /* + * Now that we have begun execution of the batchbuffer, we ignore +@@ -4009,6 +4042,8 @@ end:; + dma_fence_put(in_fence); + err_context: + i915_gem_context_put(ctx); ++ if (parent) ++ intel_context_put(parent); + + return err; + } +diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c +--- a/drivers/gpu/drm/i915/gt/intel_context.c ++++ b/drivers/gpu/drm/i915/gt/intel_context.c +@@ -460,6 +460,7 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine) + INIT_LIST_HEAD(&ce->signals); + + mutex_init(&ce->pin_mutex); ++ mutex_init(&ce->parallel_submit); + + spin_lock_init(&ce->guc_state.lock); + INIT_LIST_HEAD(&ce->guc_state.fences); +@@ -491,6 +492,7 @@ void intel_context_fini(struct intel_context *ce) + intel_context_put(child); + + mutex_destroy(&ce->pin_mutex); ++ mutex_destroy(&ce->parallel_submit); + i915_active_fini(&ce->active); + } + +diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h +--- a/drivers/gpu/drm/i915/gt/intel_context.h ++++ b/drivers/gpu/drm/i915/gt/intel_context.h +@@ -52,6 +52,11 @@ static inline bool intel_context_is_parent(struct intel_context *ce) + return !!ce->guc_number_children; + } + ++static inline bool intel_context_is_parallel(struct intel_context *ce) ++{ ++ return intel_context_is_child(ce) || intel_context_is_parent(ce); ++} ++ + /* Only should be called directly by selftests */ + void __intel_context_bind_parent_child(struct intel_context *parent, + struct intel_context *child); +@@ -204,6 +209,12 @@ static inline bool intel_context_is_barrier(const struct intel_context *ce) + return test_bit(CONTEXT_BARRIER_BIT, &ce->flags); + } + ++static inline bool ++intel_context_is_no_preempt_mid_batch(const struct intel_context *ce) ++{ ++ return test_bit(CONTEXT_NO_PREEMPT_MID_BATCH, &ce->flags); ++} ++ + static inline bool intel_context_is_closed(const struct intel_context *ce) + { + return test_bit(CONTEXT_CLOSED_BIT, &ce->flags); +diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h +--- a/drivers/gpu/drm/i915/gt/intel_context_types.h ++++ b/drivers/gpu/drm/i915/gt/intel_context_types.h +@@ -114,6 +114,7 @@ struct intel_context { + #define CONTEXT_FORCE_SINGLE_SUBMISSION 7 + #define CONTEXT_NOPREEMPT 8 + #define CONTEXT_LRCA_DIRTY 9 ++#define CONTEXT_NO_PREEMPT_MID_BATCH 10 + + struct { + u64 timeout_us; +@@ -239,6 +240,9 @@ struct intel_context { + + /* Last request submitted on a parent */ + struct i915_request *last_rq; ++ ++ /* Parallel submit mutex */ ++ struct mutex parallel_submit; + }; + + #endif /* __INTEL_CONTEXT_TYPES__ */ +diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c ++++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +@@ -798,8 +798,7 @@ static inline int rq_prio(const struct i915_request *rq) + + static inline bool is_multi_lrc(struct intel_context *ce) + { +- return intel_context_is_child(ce) || +- intel_context_is_parent(ce); ++ return intel_context_is_parallel(ce); + } + + static inline bool is_multi_lrc_rq(struct i915_request *rq) +@@ -3458,6 +3457,7 @@ void intel_guc_configure_parent_context(struct intel_context *ce) + bb_preempt_boundary = + i915_gem_context_is_bb_preempt_boundary(ctx); + rcu_read_unlock(); ++ bb_preempt_boundary |= intel_context_is_no_preempt_mid_batch(ce); + if (bb_preempt_boundary) { + ce->emit_bb_start = emit_bb_start_parent_bb_preempt_boundary; + ce->emit_fini_breadcrumb = +diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c +--- a/drivers/gpu/drm/i915/i915_request.c ++++ b/drivers/gpu/drm/i915/i915_request.c +@@ -1606,14 +1606,9 @@ i915_request_await_object(struct i915_request *to, + return ret; + } + +-static inline bool is_parallel(struct intel_context *ce) +-{ +- return intel_context_is_child(ce) || intel_context_is_parent(ce); +-} +- + static inline bool is_parallel_rq(struct i915_request *rq) + { +- return is_parallel(rq->context); ++ return intel_context_is_parallel(rq->context); + } + + static inline struct intel_context *request_to_parent(struct i915_request *rq) +diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h +--- a/include/uapi/drm/i915_drm_prelim.h ++++ b/include/uapi/drm/i915_drm_prelim.h +@@ -370,9 +370,124 @@ struct prelim_i915_context_engines_parallel_submit { + } __attribute__ ((packed)); + #define i915_context_engines_parallel_submit prelim_i915_context_engines_parallel_submit + ++/** ++ * struct prelim_drm_i915_context_engines_parallel2_submit - Configure engine ++ * for parallel submission. ++ * ++ * Setup a slot in the context engine map to allow multiple BBs to be submitted ++ * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU ++ * in parallel. Multiple hardware contexts are created internally in the i915 ++ * run these BBs. Once a slot is configured for N BBs only N BBs can be ++ * submitted in each execbuf IOCTL and this is implicit behavior e.g. The user ++ * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL knows how ++ * many BBs there are based on the slot's configuration. The N BBs are the last ++ * N buffer objects or first N if I915_EXEC_BATCH_FIRST is set. ++ * ++ * The default placement behavior is to create implicit bonds between each ++ * context if each context maps to more than 1 physical engine (e.g. context is ++ * a virtual engine). Also we only allow contexts of same engine class and these ++ * contexts must be in logically contiguous order. Examples of the placement ++ * behavior described below. Lastly, the default is to not allow BBs to ++ * preempted mid BB rather insert coordinated preemption on all hardware ++ * contexts between each set of BBs. Flags may be added in the future to change ++ * both of these default behaviors. ++ * ++ * Returns -EINVAL if hardware context placement configuration is invalid or if ++ * the placement configuration isn't supported on the platform / submission ++ * interface. ++ * Returns -ENODEV if extension isn't supported on the platform / submission ++ * inteface. ++ * ++ * .. code-block:: ++ * ++ * Example 1 pseudo code: ++ * CS[X] = generic engine of same class, logical instance X ++ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE ++ * set_engines(INVALID) ++ * set_parallel(engine_index=0, width=2, num_siblings=1, ++ * engines=CS[0],CS[1]) ++ * ++ * Results in the following valid placement: ++ * CS[0], CS[1] ++ * ++ * Example 2 pseudo code: ++ * CS[X] = generic engine of same class, logical instance X ++ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE ++ * set_engines(INVALID) ++ * set_parallel(engine_index=0, width=2, num_siblings=2, ++ * engines=CS[0],CS[2],CS[1],CS[3]) ++ * ++ * Results in the following valid placements: ++ * CS[0], CS[1] ++ * CS[2], CS[3] ++ * ++ * This can also be thought of as 2 virtual engines described by 2-D array ++ * in the engines the field with bonds placed between each index of the ++ * virtual engines. e.g. CS[0] is bonded to CS[1], CS[2] is bonded to ++ * CS[3]. ++ * VE[0] = CS[0], CS[2] ++ * VE[1] = CS[1], CS[3] ++ * ++ * Example 3 pseudo code: ++ * CS[X] = generic engine of same class, logical instance X ++ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE ++ * set_engines(INVALID) ++ * set_parallel(engine_index=0, width=2, num_siblings=2, ++ * engines=CS[0],CS[1],CS[1],CS[3]) ++ * ++ * Results in the following valid and invalid placements: ++ * CS[0], CS[1] ++ * CS[1], CS[3] - Not logical contiguous, return -EINVAL ++ */ ++struct prelim_drm_i915_context_engines_parallel2_submit { ++ /** ++ * @base: base user extension. ++ */ ++ struct i915_user_extension base; ++ ++ /** ++ * @engine_index: slot for parallel engine ++ */ ++ __u16 engine_index; ++ ++ /** ++ * @width: number of contexts per parallel engine ++ */ ++ __u16 width; ++ ++ /** ++ * @num_siblings: number of siblings per context ++ */ ++ __u16 num_siblings; ++ ++ /** ++ * @mbz16: reserved for future use; must be zero ++ */ ++ __u16 mbz16; ++ ++ /** ++ * @flags: all undefined flags must be zero, currently not defined flags ++ */ ++ __u64 flags; ++ ++ /** ++ * @mbz64: reserved for future use; must be zero ++ */ ++ __u64 mbz64[3]; ++ ++ /** ++ * @engines: 2-d array of engine instances to configure parallel engine ++ * ++ * length = width (i) * num_siblings (j) ++ * index = j + i * num_siblings ++ */ ++ struct i915_engine_class_instance engines[0]; ++} __attribute__ ((packed)); ++ + struct prelim_i915_context_param_engines { + #define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT (PRELIM_I915_USER_EXT | 2) /* see prelim_i915_context_engines_parallel_submit */ + #define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */ ++#define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT (PRELIM_I915_USER_EXT | 3) /* see prelim_i915_context_engines_parallel2_submit */ + }; + + enum prelim_drm_i915_gem_memory_class { -- git-pile 0.97

Matthew Brost

12:30 a.m.

New subject: [PATCH 1/2] INTEL_DII/NOT_UPSTREAM: drm/i915: Introduce set_parallel2 extension

Based on upstream feedback the set_parallel extension isn't suitable as it looks a bit too much like the bonding extension. Introduce a set_parallel2 extension which configures parallel submission in a single extension and in a single slot. This compares to old set_parallel extension which configured parallel submission across multiple slots.

Also remove the ability for the user to pass in the number of BBs in the execbuf IOCTL. The number of BBs is now implied based on the width of the context in the slot.

This patch is intended in enable UMDs for the upstream direction while maintaining the old set_parallel extension to not break UMDs. Once UMDs have been updated to use new extension the old one can be removed from DII.

v2: Only enable parallel submission on engines set by user

Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 190 +++++++++++++++++- .../gpu/drm/i915/gem/i915_gem_context_types.h | 6 - .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 73 +++++-- drivers/gpu/drm/i915/gt/intel_context.c | 2 + drivers/gpu/drm/i915/gt/intel_context.h | 11 + drivers/gpu/drm/i915/gt/intel_context_types.h | 4 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 +- drivers/gpu/drm/i915/i915_request.c | 7 +- include/uapi/drm/i915_drm_prelim.h | 115 +++++++++++ 9 files changed, 376 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 2e3c0bf09eff..c8da87dde1fa 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -374,7 +374,6 @@ void i915_gem_context_release(struct kref *ref) mutex_destroy(&ctx->engines_mutex); mutex_destroy(&ctx->lut_mutex); mutex_destroy(&ctx->mutex); - mutex_destroy(&ctx->parallel_submit);

kfree_rcu(ctx, rcu); } @@ -699,8 +698,6 @@ __create_context(struct drm_i915_private *i915) mutex_init(&ctx->mutex); INIT_LIST_HEAD(&ctx->link);

- mutex_init(&ctx->parallel_submit); - spin_lock_init(&ctx->stale.lock); INIT_LIST_HEAD(&ctx->stale.engines);

@@ -1857,6 +1854,48 @@ static bool validate_parallel_engines_layout(const struct set_engines *set) return true; }

+/* + * Engine must be same class and form a logically contiguous mask. + * + * FIXME: Logical mask check not 100% correct but good enough for the PoC + */ +static bool __validate_parallel_engines_layout(struct drm_i915_private *i915, + struct intel_context *parent) +{ + u8 engine_class = parent->engine->class; + u8 num_siblings = hweight_long(parent->engine->logical_mask); + struct intel_context *child; + intel_engine_mask_t logical_mask = parent->engine->logical_mask; + + for_each_child(parent, child) { + if (child->engine->class != engine_class) { + drm_dbg(&i915->drm, "Class mismatch: %u, %u", + engine_class, child->engine->class); + return false; + } + if (hweight_long(child->engine->logical_mask) != num_siblings) { + drm_dbg(&i915->drm, "Sibling mismatch: %u, %lu", + num_siblings, + hweight_long(child->engine->logical_mask)); + return false; + } + if (logical_mask & child->engine->logical_mask) { + drm_dbg(&i915->drm, "Overlapping logical mask: 0x%04x, 0x%04x", + logical_mask, child->engine->logical_mask); + return false; + } + logical_mask |= child->engine->logical_mask; + } + + if (!is_power_of_2((logical_mask >> (ffs(logical_mask) - 1)) + 1)) { + drm_dbg(&i915->drm, "Non-contiguous logical mask: 0x%04x", + logical_mask); + return false; + } + + return true; +} + static int set_engines__parallel_submit(struct i915_user_extension __user *base, void *data) { @@ -2009,11 +2048,156 @@ set_engines__parallel_submit(struct i915_user_extension __user *base, void *data return err; }

+static int +set_engines__parallel2_submit(struct i915_user_extension __user *base, + void *data) +{ + struct prelim_drm_i915_context_engines_parallel2_submit __user *ext = + container_of_user(base, typeof(*ext), base); + const struct set_engines *set = data; + struct drm_i915_private *i915 = set->ctx->i915; + struct intel_context *parent, *child, *ce; + u64 flags; + int err = 0, n, i, j; + u16 slot, width, num_siblings; + struct intel_engine_cs **siblings = NULL; + + if (!(intel_uc_uses_guc_submission(&i915->gt.uc))) + return -ENODEV; + + if (get_user(slot, &ext->engine_index)) + return -EFAULT; + + if (get_user(width, &ext->width)) + return -EFAULT; + + if (get_user(num_siblings, &ext->num_siblings)) + return -EFAULT; + + if (slot >= set->engines->num_engines) { + drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n", + slot, set->engines->num_engines); + return -EINVAL; + } + + parent = set->engines->engines[slot]; + if (parent) { + drm_dbg(&i915->drm, "Context index[%d] not NULL\n", slot); + return -EINVAL; + } + + if (get_user(flags, &ext->flags)) + return -EFAULT; + + if (flags) { + drm_dbg(&i915->drm, "Unknown flags 0x%02llx", flags); + return -EINVAL; + } + + for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) { + err = check_user_mbz(&ext->mbz64[n]); + if (err) + return err; + } + + if (width < 1) { + drm_dbg(&i915->drm, "Width (%d) < 1 \n", width); + return -EINVAL; + } + + if (num_siblings < 1) { + drm_dbg(&i915->drm, "Number siblings (%d) < 1 \n", + num_siblings); + return -EINVAL; + } + + siblings = kmalloc_array(num_siblings, + sizeof(*siblings), + GFP_KERNEL); + if (!siblings) + return -ENOMEM; + + mutex_lock(&set->ctx->mutex); + + /* Create contexts / engines */ + for (i = 0; i < width; ++i) { + for (j = 0; j < num_siblings; ++j) { + struct i915_engine_class_instance ci; + + if (copy_from_user(&ci, &ext->engines[i * num_siblings + j], + sizeof(ci))) { + err = -EFAULT; + goto out_err; + } + + siblings[j] = intel_engine_lookup_user(i915, + ci.engine_class, + ci.engine_instance); + if (!siblings[j]) { + drm_dbg(&i915->drm, + "Invalid sibling[%d]: { class:%d, inst:%d }\n", + n, ci.engine_class, ci.engine_instance); + err = -EINVAL; + goto out_err; + } + } + + ce = intel_engine_create_virtual(siblings, num_siblings, + FORCE_VIRTUAL); + if (IS_ERR(ce)) { + err = PTR_ERR(ce); + goto out_err; + } + intel_context_set_gem(ce, set->ctx); + + if (i == 0) { + parent = ce; + __set_bit(CONTEXT_NO_PREEMPT_MID_BATCH, &ce->flags); + } else { + intel_context_bind_parent_child(parent, ce); + err = intel_context_alloc_state(ce); + if (err) + goto out_err; + } + } + + if (!__validate_parallel_engines_layout(i915, parent)) { + drm_dbg(&i915->drm, "Invalidate parallel context layout"); + err = -EINVAL; + goto out_err; + } + + intel_guc_configure_parent_context(parent); + if (cmpxchg(&set->engines->engines[slot], NULL, parent)) { + err = -EEXIST; + goto out_err; + } + + kfree(siblings); + mutex_unlock(&set->ctx->mutex); + + return 0; + +out_err: + if (parent) { + for_each_child(parent, child) + intel_context_put(child); + intel_context_put(parent); + set->engines->engines[slot] = NULL; + } + kfree(siblings); + mutex_unlock(&set->ctx->mutex); + + return err; +} + static const i915_user_extension_fn set_engines__extensions[] = { [I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance, [I915_CONTEXT_ENGINES_EXT_BOND] = set_engines__bond, [PRELIM_I915_USER_EXT_MASK(PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT)] = set_engines__parallel_submit, + [PRELIM_I915_USER_EXT_MASK(PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT)] = + set_engines__parallel2_submit, };

static int diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h index 4f5785c7528b..0c79000e95fe 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h @@ -194,12 +194,6 @@ struct i915_gem_context { */ u64 fence_context;

- /** - * @parallel_submit: Ensure only 1 parallel submission is happening on - * this context at a time. - */ - struct mutex parallel_submit; - /** * @seqno: Seqno when using when a parallel context. */ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index d3f70899e7ab..2253336e9791 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1633,7 +1633,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb, goto err_unmap;

if (engine == eb->context->engine && - !i915_gem_context_is_parallel(eb->gem_context)) { + !intel_context_is_parallel(eb->context)) { rq = i915_request_create(eb->context); } else { struct intel_context *ce = eb->reloc_context; @@ -1727,7 +1727,7 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb, struct intel_engine_cs *engine = eb->engine;

if (!reloc_can_use_engine(engine) || - i915_gem_context_is_parallel(eb->gem_context)) { + intel_context_is_parallel(eb->context)) { engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0]; if (!engine) return ERR_PTR(-ENODEV); @@ -3223,7 +3223,7 @@ eb_select_legacy_ring(struct i915_execbuffer *eb) }

static int -eb_select_engine(struct i915_execbuffer *eb) +eb_select_engine(struct i915_execbuffer *eb, unsigned int batch_number) { struct intel_context *ce; unsigned int idx; @@ -3238,6 +3238,16 @@ eb_select_engine(struct i915_execbuffer *eb) if (IS_ERR(ce)) return PTR_ERR(ce);

+ if (batch_number > 0 && + !i915_gem_context_is_parallel(eb->gem_context)) { + struct intel_context *parent = ce; + for_each_child(parent, ce) + if (!--batch_number) + break; + intel_context_put(parent); + intel_context_get(ce); + } + intel_gt_pm_get(ce->engine->gt);

if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) { @@ -3562,7 +3572,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, if (unlikely(err)) goto err_destroy;

- err = eb_select_engine(&eb); + err = eb_select_engine(&eb, batch_number); if (unlikely(err)) goto err_context;

@@ -3751,6 +3761,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, const size_t count = args->buffer_count; unsigned int num_batches, i; int err, start_context; + bool is_parallel = false; + struct intel_context *parent = NULL;

if (!check_buffer_count(count)) { drm_dbg(&i915->drm, "execbuf2 with %zd buffers\n", count); @@ -3782,15 +3794,35 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, I915_EXEC_NUMBER_BB_LSB) + ((args->flags & PRELIM_I915_EXEC_NUMBER_BB_MASK) >> PRELIM_I915_EXEC_NUMBER_BB_LSB)) + 1; - if (i915_gem_context_is_parallel(ctx)) { - if (num_batches > count || - start_context + num_batches > ctx->width) { - err = -EINVAL; - goto err_context; + + if (i915_gem_context_user_engines(ctx)) { + parent = i915_gem_context_get_engine(ctx, start_context); + if (IS_ERR(parent)) { + i915_gem_context_put(ctx); + return PTR_ERR(parent); }

- if (i915_gem_context_is_bb_preempt_boundary(ctx) && - (start_context || num_batches != ctx->width)) { + is_parallel = i915_gem_context_is_parallel(ctx) || + intel_context_is_parallel(parent); + if (i915_gem_context_is_parallel(ctx)) { + if (num_batches > count || + start_context + num_batches > ctx->width) { + err = -EINVAL; + goto err_context; + } + + if (i915_gem_context_is_bb_preempt_boundary(ctx) && + (start_context || num_batches != ctx->width)) { + err = -EINVAL; + goto err_context; + } + } else if (intel_context_is_parallel(parent)) { + if (num_batches != 1) + return -EINVAL; + num_batches = parent->guc_number_children + 1; + if (num_batches > count) + return -EINVAL; + } else if(num_batches > 1) { err = -EINVAL; goto err_context; } @@ -3827,8 +3859,7 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, * properly, also this is needed to create an excl fence for an dma buf * objects these BBs touch. */ - if (args->flags & I915_EXEC_FENCE_OUT || - i915_gem_context_is_parallel(ctx)) { + if (args->flags & I915_EXEC_FENCE_OUT || is_parallel) { out_fences = kcalloc(num_batches, sizeof(*out_fences), GFP_KERNEL); if (!out_fences) { @@ -3874,8 +3905,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, * in intel_context sequence, thus only 1 submission can happen at a * time. */ - if (i915_gem_context_is_parallel(ctx)) - mutex_lock(&ctx->parallel_submit); + if (is_parallel) + mutex_lock(&parent->parallel_submit);

err = i915_gem_do_execbuffer(dev, file, args, exec2_list, args->flags & I915_EXEC_BATCH_FIRST ? @@ -3889,8 +3920,10 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, &ww);

for (i = 1; err == 0 && i < num_batches; i++) { - args->flags &= ~I915_EXEC_RING_MASK; - args->flags |= start_context + i; + if (i915_gem_context_is_parallel(ctx)) { + args->flags &= ~I915_EXEC_RING_MASK; + args->flags |= start_context + i; + } args->batch_len = 0;

err = i915_gem_do_execbuffer(dev, file, args, exec2_list, @@ -3905,8 +3938,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, &ww); }

- if (i915_gem_context_is_parallel(ctx)) - mutex_unlock(&ctx->parallel_submit); + if (is_parallel) + mutex_unlock(&parent->parallel_submit);

/* * Now that we have begun execution of the batchbuffer, we ignore @@ -4009,6 +4042,8 @@ end:; dma_fence_put(in_fence); err_context: i915_gem_context_put(ctx); + if (parent) + intel_context_put(parent);

return err; } diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 84ce1f4d3051..2e45c128ee0b 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -460,6 +460,7 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine) INIT_LIST_HEAD(&ce->signals);

mutex_init(&ce->pin_mutex); + mutex_init(&ce->parallel_submit);

spin_lock_init(&ce->guc_state.lock); INIT_LIST_HEAD(&ce->guc_state.fences); @@ -491,6 +492,7 @@ void intel_context_fini(struct intel_context *ce) intel_context_put(child);

mutex_destroy(&ce->pin_mutex); + mutex_destroy(&ce->parallel_submit); i915_active_fini(&ce->active); }

diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index fc2da541d633..5ae6cc7374c8 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -52,6 +52,11 @@ static inline bool intel_context_is_parent(struct intel_context *ce) return !!ce->guc_number_children; }

+static inline bool intel_context_is_parallel(struct intel_context *ce) +{ + return intel_context_is_child(ce) || intel_context_is_parent(ce); +} + /* Only should be called directly by selftests */ void __intel_context_bind_parent_child(struct intel_context *parent, struct intel_context *child); @@ -204,6 +209,12 @@ static inline bool intel_context_is_barrier(const struct intel_context *ce) return test_bit(CONTEXT_BARRIER_BIT, &ce->flags); }

+static inline bool +intel_context_is_no_preempt_mid_batch(const struct intel_context *ce) +{ + return test_bit(CONTEXT_NO_PREEMPT_MID_BATCH, &ce->flags); +} + static inline bool intel_context_is_closed(const struct intel_context *ce) { return test_bit(CONTEXT_CLOSED_BIT, &ce->flags); diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 44bc13ee9f8b..74c23c34ee50 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -114,6 +114,7 @@ struct intel_context { #define CONTEXT_FORCE_SINGLE_SUBMISSION 7 #define CONTEXT_NOPREEMPT 8 #define CONTEXT_LRCA_DIRTY 9 +#define CONTEXT_NO_PREEMPT_MID_BATCH 10

struct { u64 timeout_us; @@ -239,6 +240,9 @@ struct intel_context {

/* Last request submitted on a parent */ struct i915_request *last_rq; + + /* Parallel submit mutex */ + struct mutex parallel_submit; };

#endif /* __INTEL_CONTEXT_TYPES__ */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index c9e873f075a7..2b2c243ffd16 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -798,8 +798,7 @@ static inline int rq_prio(const struct i915_request *rq)

static inline bool is_multi_lrc(struct intel_context *ce) { - return intel_context_is_child(ce) || - intel_context_is_parent(ce); + return intel_context_is_parallel(ce); }

static inline bool is_multi_lrc_rq(struct i915_request *rq) @@ -3458,6 +3457,7 @@ void intel_guc_configure_parent_context(struct intel_context *ce) bb_preempt_boundary = i915_gem_context_is_bb_preempt_boundary(ctx); rcu_read_unlock(); + bb_preempt_boundary |= intel_context_is_no_preempt_mid_batch(ce); if (bb_preempt_boundary) { ce->emit_bb_start = emit_bb_start_parent_bb_preempt_boundary; ce->emit_fini_breadcrumb = diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 092e9d892c30..76fa06f06947 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1606,14 +1606,9 @@ i915_request_await_object(struct i915_request *to, return ret; }

-static inline bool is_parallel(struct intel_context *ce) -{ - return intel_context_is_child(ce) || intel_context_is_parent(ce); -} - static inline bool is_parallel_rq(struct i915_request *rq) { - return is_parallel(rq->context); + return intel_context_is_parallel(rq->context); }

static inline struct intel_context *request_to_parent(struct i915_request *rq) diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h index a885c14585aa..6d5d7eb54813 100644 --- a/include/uapi/drm/i915_drm_prelim.h +++ b/include/uapi/drm/i915_drm_prelim.h @@ -370,9 +370,124 @@ struct prelim_i915_context_engines_parallel_submit { } __attribute__ ((packed)); #define i915_context_engines_parallel_submit prelim_i915_context_engines_parallel_submit

+/** + * struct prelim_drm_i915_context_engines_parallel2_submit - Configure engine + * for parallel submission. + * + * Setup a slot in the context engine map to allow multiple BBs to be submitted + * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU + * in parallel. Multiple hardware contexts are created internally in the i915 + * run these BBs. Once a slot is configured for N BBs only N BBs can be + * submitted in each execbuf IOCTL and this is implicit behavior e.g. The user + * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL knows how + * many BBs there are based on the slot's configuration. The N BBs are the last + * N buffer objects or first N if I915_EXEC_BATCH_FIRST is set. + * + * The default placement behavior is to create implicit bonds between each + * context if each context maps to more than 1 physical engine (e.g. context is + * a virtual engine). Also we only allow contexts of same engine class and these + * contexts must be in logically contiguous order. Examples of the placement + * behavior described below. Lastly, the default is to not allow BBs to + * preempted mid BB rather insert coordinated preemption on all hardware + * contexts between each set of BBs. Flags may be added in the future to change + * both of these default behaviors. + * + * Returns -EINVAL if hardware context placement configuration is invalid or if + * the placement configuration isn't supported on the platform / submission + * interface. + * Returns -ENODEV if extension isn't supported on the platform / submission + * inteface. + * + * .. code-block:: + * + * Example 1 pseudo code: + * CS[X] = generic engine of same class, logical instance X + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE + * set_engines(INVALID) + * set_parallel(engine_index=0, width=2, num_siblings=1, + * engines=CS[0],CS[1]) + * + * Results in the following valid placement: + * CS[0], CS[1] + * + * Example 2 pseudo code: + * CS[X] = generic engine of same class, logical instance X + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE + * set_engines(INVALID) + * set_parallel(engine_index=0, width=2, num_siblings=2, + * engines=CS[0],CS[2],CS[1],CS[3]) + * + * Results in the following valid placements: + * CS[0], CS[1] + * CS[2], CS[3] + * + * This can also be thought of as 2 virtual engines described by 2-D array + * in the engines the field with bonds placed between each index of the + * virtual engines. e.g. CS[0] is bonded to CS[1], CS[2] is bonded to + * CS[3]. + * VE[0] = CS[0], CS[2] + * VE[1] = CS[1], CS[3] + * + * Example 3 pseudo code: + * CS[X] = generic engine of same class, logical instance X + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE + * set_engines(INVALID) + * set_parallel(engine_index=0, width=2, num_siblings=2, + * engines=CS[0],CS[1],CS[1],CS[3]) + * + * Results in the following valid and invalid placements: + * CS[0], CS[1] + * CS[1], CS[3] - Not logical contiguous, return -EINVAL + */ +struct prelim_drm_i915_context_engines_parallel2_submit { + /** + * @base: base user extension. + */ + struct i915_user_extension base; + + /** + * @engine_index: slot for parallel engine + */ + __u16 engine_index; + + /** + * @width: number of contexts per parallel engine + */ + __u16 width; + + /** + * @num_siblings: number of siblings per context + */ + __u16 num_siblings; + + /** + * @mbz16: reserved for future use; must be zero + */ + __u16 mbz16; + + /** + * @flags: all undefined flags must be zero, currently not defined flags + */ + __u64 flags; + + /** + * @mbz64: reserved for future use; must be zero + */ + __u64 mbz64[3]; + + /** + * @engines: 2-d array of engine instances to configure parallel engine + * + * length = width (i) * num_siblings (j) + * index = j + i * num_siblings + */ + struct i915_engine_class_instance engines[0]; +} __attribute__ ((packed)); + struct prelim_i915_context_param_engines { #define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT (PRELIM_I915_USER_EXT | 2) /* see prelim_i915_context_engines_parallel_submit */ #define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */ +#define PRELIM_I915_CONTEXT_ENGINES_EXT_PARALLEL2_SUBMIT (PRELIM_I915_USER_EXT | 3) /* see prelim_i915_context_engines_parallel2_submit */ };

enum prelim_drm_i915_gem_memory_class {

Matthew Brost

12:30 a.m.

New subject: [PATCH 2/2] REVIEW: Full tree diff against internal/internal

Auto-generated diff between internal/internal..internal --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 190 +++++++++++++++++++++- drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 6 - drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 73 ++++++--- drivers/gpu/drm/i915/gt/intel_context.c | 2 + drivers/gpu/drm/i915/gt/intel_context.h | 11 ++ drivers/gpu/drm/i915/gt/intel_context_types.h | 6 +- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 4 +- drivers/gpu/drm/i915/i915_request.c | 7 +- include/uapi/drm/i915_drm_prelim.h | 115 +++++++++++++ 9 files changed, 377 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 82b7af55ba05..583113c58e9c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -389,7 +389,6 @@ void i915_gem_context_release(struct kref *ref) mutex_destroy(&ctx->engines_mutex); mutex_destroy(&ctx->lut_mutex); mutex_destroy(&ctx->mutex); - mutex_destroy(&ctx->parallel_submit);

kfree_rcu(ctx, rcu); } @@ -769,8 +768,6 @@ static struct i915_gem_context *__create_context(struct intel_gt *gt) mutex_init(&ctx->mutex); INIT_LIST_HEAD(&ctx->link);

- mutex_init(&ctx->parallel_submit); - spin_lock_init(&ctx->stale.lock); INIT_LIST_HEAD(&ctx->stale.engines);

@@ -2093,6 +2090,48 @@ static bool validate_parallel_engines_layout(const struct set_engines *set) return true; }

+/* + * Engine must be same class and form a logically contiguous mask. + * + * FIXME: Logical mask check not 100% correct but good enough for the PoC + */ +static bool __validate_parallel_engines_layout(struct drm_i915_private *i915, + struct intel_context *parent) +{ + u8 engine_class = parent->engine->class; + u8 num_siblings = hweight_long(parent->engine->logical_mask); + struct intel_context *child; + intel_engine_mask_t logical_mask = parent->engine->logical_mask; + + for_each_child(parent, child) { + if (child->engine->class != engine_class) { + drm_dbg(&i915->drm, "Class mismatch: %u, %u", + engine_class, child->engine->class); + return false; + } + if (hweight_long(child->engine->logical_mask) != num_siblings) { + drm_dbg(&i915->drm, "Sibling mismatch: %u, %lu", + num_siblings, + hweight_long(child->engine->logical_mask)); + return false; + } + if (logical_mask & child->engine->logical_mask) { + drm_dbg(&i915->drm, "Overlapping logical mask: 0x%04x, 0x%04x", + logical_mask, child->engine->logical_mask); + return false; + } + logical_mask |= child->engine->logical_mask; + } + + if (!is_power_of_2((logical_mask >> (ffs(logical_mask) - 1)) + 1)) { + drm_dbg(&i915->drm, "Non-contiguous logical mask: 0x%04x", + logical_mask); + return false; + } + + return true; +} + static int set_engines__parallel_submit(struct i915_user_extension __user *base, void *data) { @@ -2245,11 +2284,156 @@ set_engines__parallel_submit(struct i915_user_extension __user *base, void *data return err; }

static int diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h index 70b6103d9a32..c70c6d042f06 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h @@ -206,12 +206,6 @@ struct i915_gem_context { */ u64 fence_context;

- /** - * @parallel_submit: Ensure only 1 parallel submission is happening on - * this context at a time. - */ - struct mutex parallel_submit; - /** * @seqno: Seqno when using when a parallel context. */ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index cdf37c65b904..3cd4327c25f3 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1682,7 +1682,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb, goto err_unmap;

if (engine == eb->context->engine && - !i915_gem_context_is_parallel(eb->gem_context)) { + !intel_context_is_parallel(eb->context)) { rq = i915_request_create(eb->context); } else { struct intel_context *ce = eb->reloc_context; @@ -1776,7 +1776,7 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb, struct intel_engine_cs *engine = eb->engine;

if (!reloc_can_use_engine(engine) || - i915_gem_context_is_parallel(eb->gem_context)) { + intel_context_is_parallel(eb->context)) { engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0]; if (!engine) return ERR_PTR(-ENODEV); @@ -3308,7 +3308,7 @@ eb_select_legacy_ring(struct i915_execbuffer *eb) }

static int -eb_select_engine(struct i915_execbuffer *eb) +eb_select_engine(struct i915_execbuffer *eb, unsigned int batch_number) { struct intel_context *ce; unsigned int idx; @@ -3326,6 +3326,16 @@ eb_select_engine(struct i915_execbuffer *eb) if (IS_ERR(ce)) return PTR_ERR(ce);

if (i915_gem_context_uses_protected_content(eb->gem_context)) { @@ -3687,7 +3697,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, goto err_context; }

- err = eb_select_engine(&eb); + err = eb_select_engine(&eb, batch_number); if (unlikely(err)) goto err_context;

@@ -3900,6 +3910,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, const size_t count = args->buffer_count; unsigned int num_batches, i; int err, start_context; + bool is_parallel = false; + struct intel_context *parent = NULL;

if (!check_buffer_count(count)) { drm_dbg(&i915->drm, "execbuf2 with %zd buffers\n", count); @@ -3936,15 +3948,35 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, I915_EXEC_NUMBER_BB_LSB) + ((args->flags & PRELIM_I915_EXEC_NUMBER_BB_MASK) >> PRELIM_I915_EXEC_NUMBER_BB_LSB)) + 1; - if (i915_gem_context_is_parallel(ctx)) { - if (num_batches > count || - start_context + num_batches > ctx->width) { - err = -EINVAL; - goto err_context; + + if (i915_gem_context_user_engines(ctx)) { + parent = i915_gem_context_get_engine(ctx, start_context); + if (IS_ERR(parent)) { + i915_gem_context_put(ctx); + return PTR_ERR(parent); }

- if (i915_gem_context_is_bb_preempt_boundary(ctx) && - (start_context || num_batches != ctx->width)) { + is_parallel = i915_gem_context_is_parallel(ctx) || + intel_context_is_parallel(parent); + if (i915_gem_context_is_parallel(ctx)) { + if (num_batches > count || + start_context + num_batches > ctx->width) { + err = -EINVAL; + goto err_context; + } + + if (i915_gem_context_is_bb_preempt_boundary(ctx) && + (start_context || num_batches != ctx->width)) { + err = -EINVAL; + goto err_context; + } + } else if (intel_context_is_parallel(parent)) { + if (num_batches != 1) + return -EINVAL; + num_batches = parent->guc_number_children + 1; + if (num_batches > count) + return -EINVAL; + } else if(num_batches > 1) { err = -EINVAL; goto err_context; } @@ -3981,8 +4013,7 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, * properly, also this is needed to create an excl fence for an dma buf * objects these BBs touch. */ - if (args->flags & I915_EXEC_FENCE_OUT || - i915_gem_context_is_parallel(ctx)) { + if (args->flags & I915_EXEC_FENCE_OUT || is_parallel) { out_fences = kcalloc(num_batches, sizeof(*out_fences), GFP_KERNEL); if (!out_fences) { @@ -4028,8 +4059,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, * in intel_context sequence, thus only 1 submission can happen at a * time. */ - if (i915_gem_context_is_parallel(ctx)) - mutex_lock(&ctx->parallel_submit); + if (is_parallel) + mutex_lock(&parent->parallel_submit);

err = i915_gem_do_execbuffer(dev, file, args, exec2_list, args->flags & I915_EXEC_BATCH_FIRST ? @@ -4043,8 +4074,10 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, &ww);

err = i915_gem_do_execbuffer(dev, file, args, exec2_list, @@ -4059,8 +4092,8 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, &ww); }

- if (i915_gem_context_is_parallel(ctx)) - mutex_unlock(&ctx->parallel_submit); + if (is_parallel) + mutex_unlock(&parent->parallel_submit);

/* * Now that we have begun execution of the batchbuffer, we ignore @@ -4164,6 +4197,8 @@ end:; dma_fence_put(in_fence); err_context: i915_gem_context_put(ctx); + if (parent) + intel_context_put(parent);

return err; } diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 3e704be30501..fa36419b7fa4 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -469,6 +469,7 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine) INIT_LIST_HEAD(&ce->signals);

mutex_init(&ce->pin_mutex); + mutex_init(&ce->parallel_submit);

spin_lock_init(&ce->guc_state.lock); INIT_LIST_HEAD(&ce->guc_state.fences); @@ -507,6 +508,7 @@ void intel_context_fini(struct intel_context *ce) intel_context_put(child);

mutex_destroy(&ce->pin_mutex); + mutex_destroy(&ce->parallel_submit); i915_active_fini(&ce->active); }

diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index 88d4a71c5d29..c401cad7a231 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -52,6 +52,11 @@ static inline bool intel_context_is_parent(struct intel_context *ce) return !!ce->guc_number_children; }

+static inline bool intel_context_is_parallel(struct intel_context *ce) +{ + return intel_context_is_child(ce) || intel_context_is_parent(ce); +} + /* Only should be called directly by selftests */ void __intel_context_bind_parent_child(struct intel_context *parent, struct intel_context *child); @@ -205,6 +210,12 @@ static inline bool intel_context_is_barrier(const struct intel_context *ce) return test_bit(CONTEXT_BARRIER_BIT, &ce->flags); }

+static inline bool +intel_context_is_no_preempt_mid_batch(const struct intel_context *ce) +{ + return test_bit(CONTEXT_NO_PREEMPT_MID_BATCH, &ce->flags); +} + static inline bool intel_context_is_closed(const struct intel_context *ce) { return test_bit(CONTEXT_CLOSED_BIT, &ce->flags); diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 7bf73704b250..a7c9bf87ef23 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -125,7 +125,8 @@ struct intel_context { #define CONTEXT_FORCE_SINGLE_SUBMISSION 7 #define CONTEXT_NOPREEMPT 8 #define CONTEXT_LRCA_DIRTY 9 -#define CONTEXT_DEBUG 10 +#define CONTEXT_NO_PREEMPT_MID_BATCH 10 +#define CONTEXT_DEBUG 11

struct { u64 timeout_us; @@ -252,6 +253,9 @@ struct intel_context { /* Last request submitted on a parent */ struct i915_request *last_rq;

+ /* Parallel submit mutex */ + struct mutex parallel_submit; + /* GuC context blocked fence */ struct i915_sw_fence guc_blocked; }; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index c285160be8b3..c935c8b7f557 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -832,8 +832,7 @@ static inline int rq_prio(const struct i915_request *rq)

static inline bool is_multi_lrc(struct intel_context *ce) { - return intel_context_is_child(ce) || - intel_context_is_parent(ce); + return intel_context_is_parallel(ce); }

static inline bool is_multi_lrc_rq(struct i915_request *rq) @@ -3827,6 +3826,7 @@ void intel_guc_configure_parent_context(struct intel_context *ce) bb_preempt_boundary = i915_gem_context_is_bb_preempt_boundary(ctx); rcu_read_unlock(); + bb_preempt_boundary |= intel_context_is_no_preempt_mid_batch(ce); if (bb_preempt_boundary) { ce->emit_bb_start = emit_bb_start_parent_bb_preempt_boundary; ce->emit_fini_breadcrumb = diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 65c31d359e81..e1db7d9f27f8 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1721,14 +1721,9 @@ i915_request_await_object(struct i915_request *to, return ret; }

static inline struct intel_context *request_to_parent(struct i915_request *rq) diff --git a/include/uapi/drm/i915_drm_prelim.h b/include/uapi/drm/i915_drm_prelim.h index 577e99801690..ac5cc639e078 100644 --- a/include/uapi/drm/i915_drm_prelim.h +++ b/include/uapi/drm/i915_drm_prelim.h @@ -884,9 +884,124 @@ struct prelim_i915_context_engines_parallel_submit { } __attribute__ ((packed)); #define i915_context_engines_parallel_submit prelim_i915_context_engines_parallel_submit

enum prelim_drm_i915_oa_format { -- git-pile 0.97

1392

Age (days ago)

1394

Last active (days ago)

dri-devel@lists.freedesktop.org

20 comments

3 participants

tags (0)

participants (3)

John Harrison
Matthew Brost
Michal Wajdeczko