As discussed in [1] and [2], we are enabling GuC submission support in the i915 driver. This is a subset of the patches in step 5 described in [1]; basically it is the absolute minimum needed to enable CI with GuC submission on gen11+ platforms.

This series itself will likely be broken down into smaller patch sets to merge, likely CTB changes, basic submission, virtual engines, and resets.

A follow-up series will address the missing patches remaining from [1].

Locally tested on a TGL machine and basic tests seem to be passing.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
[1] https://patchwork.freedesktop.org/series/89844/
[2] https://patchwork.freedesktop.org/series/91417/
Daniele Ceraolo Spurio (1):
  drm/i915/guc: Unblock GuC submission on Gen11+

John Harrison (10):
  drm/i915/guc: Module load failure test for CT buffer creation
  drm/i915: Track 'serial' counts for virtual engines
  drm/i915/guc: Provide mmio list to be saved/restored on engine reset
  drm/i915/guc: Don't complain about reset races
  drm/i915/guc: Enable GuC engine reset
  drm/i915/guc: Fix for error capture after full GPU reset with GuC
  drm/i915/guc: Hook GuC scheduling policies up
  drm/i915/guc: Connect reset modparam updates to GuC policy flags
  drm/i915/guc: Include scheduling policies in the debugfs state dump
  drm/i915/guc: Add golden context to GuC ADS

Matthew Brost (36):
  drm/i915/guc: Relax CTB response timeout
  drm/i915/guc: Improve error message for unsolicited CT response
  drm/i915/guc: Increase size of CTB buffers
  drm/i915/guc: Add non blocking CTB send function
  drm/i915/guc: Add stall timer to non blocking CTB send function
  drm/i915/guc: Optimize CTB writes and reads
  drm/i915/guc: Add new GuC interface defines and structures
  drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor
  drm/i915/guc: Add lrc descriptor context lookup array
  drm/i915/guc: Implement GuC submission tasklet
  drm/i915/guc: Add bypass tasklet submission path to GuC
  drm/i915/guc: Implement GuC context operations for new inteface
  drm/i915/guc: Insert fence on context when deregistering
  drm/i915/guc: Defer context unpin until scheduling is disabled
  drm/i915/guc: Disable engine barriers with GuC during unpin
  drm/i915/guc: Extend deregistration fence to schedule disable
  drm/i915: Disable preempt busywait when using GuC scheduling
  drm/i915/guc: Ensure request ordering via completion fences
  drm/i915/guc: Disable semaphores when using GuC scheduling
  drm/i915/guc: Ensure G2H response has space in buffer
  drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
  drm/i915/guc: Update GuC debugfs to support new GuC
  drm/i915/guc: Add several request trace points
  drm/i915: Add intel_context tracing
  drm/i915/guc: GuC virtual engines
  drm/i915: Hold reference to intel_context over life of i915_request
  drm/i915/guc: Disable bonding extension with GuC submission
  drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
  drm/i915/guc: Reset implementation for new GuC interface
  drm/i915: Reset GPU immediately if submission is disabled
  drm/i915/guc: Add disable interrupts to guc sanitize
  drm/i915/guc: Suspend/resume implementation for new interface
  drm/i915/guc: Handle context reset notification
  drm/i915/guc: Handle engine reset failure notification
  drm/i915/guc: Enable the timer expired interrupt for GuC
  drm/i915/guc: Capture error state on context reset
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   30 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.h   |    1 +
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |    3 +-
 drivers/gpu/drm/i915/gt/gen8_engine_cs.c      |    6 +-
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   |   41 +-
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   |   14 +-
 .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |    7 +
 drivers/gpu/drm/i915/gt/intel_context.c       |   41 +-
 drivers/gpu/drm/i915/gt/intel_context.h       |   31 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |   49 +
 drivers/gpu/drm/i915/gt/intel_engine.h        |   72 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  182 +-
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   71 +-
 .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |    4 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   12 +-
 .../drm/i915/gt/intel_execlists_submission.c  |  234 +-
 .../drm/i915/gt/intel_execlists_submission.h  |   11 -
 drivers/gpu/drm/i915/gt/intel_gt.c            |   21 +
 drivers/gpu/drm/i915/gt/intel_gt.h            |    2 +
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |    6 +-
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   |   22 +-
 drivers/gpu/drm/i915/gt/intel_gt_requests.h   |    9 +-
 drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |    1 -
 drivers/gpu/drm/i915/gt/intel_reset.c         |   20 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |   28 +
 drivers/gpu/drm/i915/gt/intel_rps.c           |    4 +
 drivers/gpu/drm/i915/gt/intel_workarounds.c   |   46 +-
 .../gpu/drm/i915/gt/intel_workarounds_types.h |    1 +
 drivers/gpu/drm/i915/gt/mock_engine.c         |   41 +-
 drivers/gpu/drm/i915/gt/selftest_context.c    |   10 +
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |   20 +-
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |   15 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |   82 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  106 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |  460 +++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h    |    3 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  318 ++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |   22 +-
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    |   25 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   88 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 2197 +++++++++++++++--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   17 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  102 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   11 +
 drivers/gpu/drm/i915/i915_debugfs.c           |    2 +
 drivers/gpu/drm/i915/i915_debugfs_params.c    |   31 +
 drivers/gpu/drm/i915/i915_gem_evict.c         |    1 +
 drivers/gpu/drm/i915/i915_gpu_error.c         |   25 +-
 drivers/gpu/drm/i915/i915_reg.h               |    2 +
 drivers/gpu/drm/i915/i915_request.c           |  159 +-
 drivers/gpu/drm/i915/i915_request.h           |   21 +
 drivers/gpu/drm/i915/i915_scheduler.c         |    6 +
 drivers/gpu/drm/i915/i915_scheduler.h         |    6 +
 drivers/gpu/drm/i915/i915_scheduler_types.h   |    5 +
 drivers/gpu/drm/i915/i915_trace.h             |  197 +-
 .../gpu/drm/i915/selftests/igt_live_test.c    |    2 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |    3 +-
 57 files changed, 4159 insertions(+), 787 deletions(-)
In an upcoming patch we will allow more CTB requests to be sent in parallel to the GuC for processing, so we shouldn't assume any more that the GuC will always reply within 10ms.

Use a bigger hardcoded value of 1s instead.
v2: Add CONFIG_DRM_I915_GUC_CTB_TIMEOUT config option
v3: (Daniel Vetter)
 - Use hardcoded value of 1s rather than config option
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 43409044528e..a59e239497ee 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -474,14 +474,16 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
 	/*
 	 * Fast commands should complete in less than 10us, so sample quickly
 	 * up to that length of time, then switch to a slower sleep-wait loop.
-	 * No GuC command should ever take longer than 10ms.
+	 * No GuC command should ever take longer than 10ms but many GuC
+	 * commands can be inflight at time, so use a 1s timeout on the slower
+	 * sleep-wait loop.
 	 */
 #define done \
 	(FIELD_GET(GUC_HXG_MSG_0_ORIGIN, READ_ONCE(req->status)) == \
 	 GUC_HXG_ORIGIN_GUC)
 	err = wait_for_us(done, 10);
 	if (err)
-		err = wait_for(done, 10);
+		err = wait_for(done, 1000);
 #undef done
if (unlikely(err))
On 24.06.2021 09:04, Matthew Brost wrote:
In an upcoming patch we will allow more CTB requests to be sent in parallel to the GuC for processing, so we shouldn't assume any more that the GuC will always reply within 10ms.

Use a bigger hardcoded value of 1s instead.
v2: Add CONFIG_DRM_I915_GUC_CTB_TIMEOUT config option v3: (Daniel Vetter)
- Use hardcoded value of 1s rather than config option
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 43409044528e..a59e239497ee 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -474,14 +474,16 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
 	/*
 	 * Fast commands should complete in less than 10us, so sample quickly
 	 * up to that length of time, then switch to a slower sleep-wait loop.
-	 * No GuC command should ever take longer than 10ms.
+	 * No GuC command should ever take longer than 10ms but many GuC
+	 * commands can be inflight at time, so use a 1s timeout on the slower
+	 * sleep-wait loop.
 	 */
 #define done \
 	(FIELD_GET(GUC_HXG_MSG_0_ORIGIN, READ_ONCE(req->status)) == \
 	 GUC_HXG_ORIGIN_GUC)
 	err = wait_for_us(done, 10);
 	if (err)
-		err = wait_for(done, 10);
+		err = wait_for(done, 1000);
can we add #defines for these 10/1000 values? with that
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
#undef done
if (unlikely(err))
Improve the error message when an unsolicited CT response is received by printing the fence that couldn't be found, the last fence, and all requests with a response outstanding.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index a59e239497ee..07f080ddb9ae 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -730,12 +730,16 @@ static int ct_handle_response(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 			found = true;
 			break;
 		}
-	spin_unlock_irqrestore(&ct->requests.lock, flags);
-
 	if (!found) {
 		CT_ERROR(ct, "Unsolicited response (fence %u)\n", fence);
-		return -ENOKEY;
+		CT_ERROR(ct, "Could not find fence=%u, last_fence=%u\n", fence,
+			 ct->requests.last_fence);
+		list_for_each_entry(req, &ct->requests.pending, link)
+			CT_ERROR(ct, "request %u awaits response\n",
+				 req->fence);
+		err = -ENOKEY;
 	}
+	spin_unlock_irqrestore(&ct->requests.lock, flags);
if (unlikely(err)) return err;
On 24.06.2021 09:04, Matthew Brost wrote:
Improve the error message when an unsolicited CT response is received by printing the fence that couldn't be found, the last fence, and all requests with a response outstanding.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index a59e239497ee..07f080ddb9ae 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -730,12 +730,16 @@ static int ct_handle_response(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 			found = true;
 			break;
 		}
-	spin_unlock_irqrestore(&ct->requests.lock, flags);
-
 	if (!found) {
 		CT_ERROR(ct, "Unsolicited response (fence %u)\n", fence);
-		return -ENOKEY;
+		CT_ERROR(ct, "Could not find fence=%u, last_fence=%u\n", fence,
+			 ct->requests.last_fence);
+		list_for_each_entry(req, &ct->requests.pending, link)
+			CT_ERROR(ct, "request %u awaits response\n",
+				 req->fence);
not quite sure how listing the awaiting requests could help here (if we suspect this is a duplicated reply, then we should rather track a short list of already processed messages to look there) but since it does not hurt too much, this is:
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
err = -ENOKEY;
}
spin_unlock_irqrestore(&ct->requests.lock, flags);
if (unlikely(err)) return err;
With the introduction of non-blocking CTBs more than one CTB can be in flight at a time. Increasing the size of the CTBs should reduce how often software hits the case where no space is available in the CTB buffer.
Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 07f080ddb9ae..a17215920e58 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -58,11 +58,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
  * +--------+-----------------------------------------------+------+
  *
  * Size of each `CT Buffer`_ must be multiple of 4K.
- * As we don't expect too many messages, for now use minimum sizes.
+ * We don't expect too many messages in flight at any time, unless we are
+ * using the GuC submission. In that case each request requires a minimum
+ * 2 dwords which gives us a maximum 256 queue'd requests. Hopefully this
+ * enough space to avoid backpressure on the driver. We increase the size
+ * of the receive buffer (relative to the send) to ensure a G2H response
+ * CTB has a landing spot.
  */
 #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
 #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
-#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
+#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)

 struct ct_request {
 	struct list_head link;
@@ -641,7 +646,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	/* beware of buffer wrap case */
 	if (unlikely(available < 0))
 		available += size;
-	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
+	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
 	GEM_BUG_ON(available < 0);
header = cmds[head];
On 24.06.2021 09:04, Matthew Brost wrote:
With the introduction of non-blocking CTBs more than one CTB can be in flight at a time. Increasing the size of the CTBs should reduce how often software hits the case where no space is available in the CTB buffer.
Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 07f080ddb9ae..a17215920e58 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -58,11 +58,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
  * +--------+-----------------------------------------------+------+
  *
  * Size of each `CT Buffer`_ must be multiple of 4K.
- * As we don't expect too many messages, for now use minimum sizes.
+ * We don't expect too many messages in flight at any time, unless we are
+ * using the GuC submission. In that case each request requires a minimum
+ * 2 dwords which gives us a maximum 256 queue'd requests. Hopefully this
+ * enough space to avoid backpressure on the driver. We increase the size
+ * of the receive buffer (relative to the send) to ensure a G2H response
+ * CTB has a landing spot.
  */
 #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
 #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
-#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
+#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)

 struct ct_request {
 	struct list_head link;
@@ -641,7 +646,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	/* beware of buffer wrap case */
 	if (unlikely(available < 0))
 		available += size;
-	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
+	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
CTB size is already printed in intel_guc_ct_init() and is fixed, so not sure if repeating it on every ct_read has any benefit
GEM_BUG_ON(available < 0);
header = cmds[head];
On Thu, Jun 24, 2021 at 03:49:55PM +0200, Michal Wajdeczko wrote:
On 24.06.2021 09:04, Matthew Brost wrote:
With the introduction of non-blocking CTBs more than one CTB can be in flight at a time. Increasing the size of the CTBs should reduce how often software hits the case where no space is available in the CTB buffer.
Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 07f080ddb9ae..a17215920e58 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -58,11 +58,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
  * +--------+-----------------------------------------------+------+
  *
  * Size of each `CT Buffer`_ must be multiple of 4K.
- * As we don't expect too many messages, for now use minimum sizes.
+ * We don't expect too many messages in flight at any time, unless we are
+ * using the GuC submission. In that case each request requires a minimum
+ * 2 dwords which gives us a maximum 256 queue'd requests. Hopefully this
+ * enough space to avoid backpressure on the driver. We increase the size
+ * of the receive buffer (relative to the send) to ensure a G2H response
+ * CTB has a landing spot.
  */
 #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
 #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
-#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
+#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)

 struct ct_request {
 	struct list_head link;
@@ -641,7 +646,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	/* beware of buffer wrap case */
 	if (unlikely(available < 0))
 		available += size;
-	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
+	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
CTB size is already printed in intel_guc_ct_init() and is fixed so not sure if repeating it on every ct_read has any benefit
I'd say the more debug the better, and if CT_DEBUG is enabled the logs are very verbose, so an extra value doesn't really hurt.
Matt
GEM_BUG_ON(available < 0);
header = cmds[head];
On 24.06.2021 17:41, Matthew Brost wrote:
On Thu, Jun 24, 2021 at 03:49:55PM +0200, Michal Wajdeczko wrote:
On 24.06.2021 09:04, Matthew Brost wrote:
With the introduction of non-blocking CTBs more than one CTB can be in flight at a time. Increasing the size of the CTBs should reduce how often software hits the case where no space is available in the CTB buffer.
Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 07f080ddb9ae..a17215920e58 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -58,11 +58,16 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
  * +--------+-----------------------------------------------+------+
  *
  * Size of each `CT Buffer`_ must be multiple of 4K.
- * As we don't expect too many messages, for now use minimum sizes.
+ * We don't expect too many messages in flight at any time, unless we are
+ * using the GuC submission. In that case each request requires a minimum
+ * 2 dwords which gives us a maximum 256 queue'd requests. Hopefully this
+ * enough space to avoid backpressure on the driver. We increase the size
+ * of the receive buffer (relative to the send) to ensure a G2H response
+ * CTB has a landing spot.
  */
 #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
 #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
-#define CTB_G2H_BUFFER_SIZE	(SZ_4K)
+#define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)

 struct ct_request {
 	struct list_head link;
@@ -641,7 +646,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	/* beware of buffer wrap case */
 	if (unlikely(available < 0))
 		available += size;
-	CT_DEBUG(ct, "available %d (%u:%u)\n", available, head, tail);
+	CT_DEBUG(ct, "available %d (%u:%u:%u)\n", available, head, tail, size);
CTB size is already printed in intel_guc_ct_init() and is fixed so not sure if repeating it on every ct_read has any benefit
I'd say the more debug the better, and if CT_DEBUG is enabled the logs are very verbose, so an extra value doesn't really hurt.
fair, but this doesn't mean we should add little/no-value items; anyway, since DEBUG_GUC is off by default, this is:
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Matt
GEM_BUG_ON(available < 0);
header = cmds[head];
Add a non-blocking CTB send function, intel_guc_send_nb. GuC submission will send CTBs in the critical path and does not need to wait for these CTBs to complete before moving on, hence the need for this new function.
The non-blocking CTB now must have a flow control mechanism to ensure the buffer isn't overrun. A lazy spin wait is used as we believe the flow control condition should be rare with a properly sized buffer.
The function, intel_guc_send_nb, is exported in this patch but unused. Several patches later in the series make use of this function.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 +++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +-
 3 files changed, 82 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 4abc59f6f3cd..24b1df6ad4ae 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
 static
 inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
 {
-	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
+	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
+}
+
+#define INTEL_GUC_SEND_NB		BIT(31)
+static
+inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
+{
+	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
+				 INTEL_GUC_SEND_NB);
 }

 static inline int
@@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
 			   u32 *response_buf, u32 response_buf_size)
 {
 	return intel_guc_ct_send(&guc->ct, action, len,
-				 response_buf, response_buf_size);
+				 response_buf, response_buf_size, 0);
 }

 static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index a17215920e58..c9a65d05911f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -3,6 +3,11 @@
  * Copyright © 2016-2019 Intel Corporation
  */

+#include <linux/circ_buf.h>
+#include <linux/ktime.h>
+#include <linux/time64.h>
+#include <linux/timekeeping.h>
+
 #include "i915_drv.h"
 #include "intel_guc_ct.h"
 #include "gt/intel_gt.h"
@@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct)
 static int ct_write(struct intel_guc_ct *ct,
 		    const u32 *action,
 		    u32 len /* in dwords */,
-		    u32 fence)
+		    u32 fence, u32 flags)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct guc_ct_buffer_desc *desc = ctb->desc;
@@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct,
 		 FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) |
 		 FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);

-	hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
-	      FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
-			 GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
+	hxg = (flags & INTEL_GUC_SEND_NB) ?
+		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
+		 FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
+			    GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
+		(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
+		 FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
+			    GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));

 	CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n",
 		 tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]);
@@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status)
 	return err;
 }

+static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+{
+	struct guc_ct_buffer_desc *desc = ctb->desc;
+	u32 head = READ_ONCE(desc->head);
+	u32 space;
+
+	space = CIRC_SPACE(desc->tail, head, ctb->size);
+
+	return space >= len_dw;
+}
+
+static int ct_send_nb(struct intel_guc_ct *ct,
+		      const u32 *action,
+		      u32 len,
+		      u32 flags)
+{
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+	unsigned long spin_flags;
+	u32 fence;
+	int ret;
+
+	spin_lock_irqsave(&ctb->lock, spin_flags);
+
+	ret = h2g_has_room(ctb, len + 1);
+	if (unlikely(ret))
+		goto out;
+
+	fence = ct_get_next_fence(ct);
+	ret = ct_write(ct, action, len, fence, flags);
+	if (unlikely(ret))
+		goto out;
+
+	intel_guc_notify(ct_to_guc(ct));
+
+out:
+	spin_unlock_irqrestore(&ctb->lock, spin_flags);
+
+	return ret;
+}
+
 static int ct_send(struct intel_guc_ct *ct,
 		   const u32 *action,
 		   u32 len,
@@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct,
 		   u32 response_buf_size,
 		   u32 *status)
 {
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct ct_request request;
 	unsigned long flags;
 	u32 fence;
@@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct,
 	GEM_BUG_ON(!len);
 	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
 	GEM_BUG_ON(!response_buf && response_buf_size);
+	might_sleep();

+	/*
+	 * We use a lazy spin wait loop here as we believe that if the CT
+	 * buffers are sized correctly the flow control condition should be
+	 * rare.
+	 */
+retry:
 	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
+	if (unlikely(!h2g_has_room(ctb, len + 1))) {
+		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+		cond_resched();
+		goto retry;
+	}

 	fence = ct_get_next_fence(ct);
 	request.fence = fence;
@@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct,
 	list_add_tail(&request.link, &ct->requests.pending);
 	spin_unlock(&ct->requests.lock);

-	err = ct_write(ct, action, len, fence);
+	err = ct_write(ct, action, len, fence, 0);

 	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);

@@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
  * Command Transport (CT) buffer based GuC send function.
  */
 int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
-		      u32 *response_buf, u32 response_buf_size)
+		      u32 *response_buf, u32 response_buf_size, u32 flags)
 {
 	u32 status = ~0; /* undefined */
 	int ret;
@@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 		return -ENODEV;
 	}

+	if (flags & INTEL_GUC_SEND_NB)
+		return ct_send_nb(ct, action, len, flags);
+
 	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
 	if (unlikely(ret < 0)) {
 		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 1ae2dde6db93..eb69263324ba 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -42,7 +42,6 @@ struct intel_guc_ct_buffer {
 	bool broken;
 };

-
 /** Top-level structure for Command Transport related data
  *
  * Includes a pair of CT buffers for bi-directional communication and tracking
@@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
 }

 int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
-		      u32 *response_buf, u32 response_buf_size);
+		      u32 *response_buf, u32 response_buf_size, u32 flags);
 void intel_guc_ct_event_handler(struct intel_guc_ct *ct);

 #endif /* _INTEL_GUC_CT_H_ */
On 24.06.2021 09:04, Matthew Brost wrote:
Add a non-blocking CTB send function, intel_guc_send_nb. GuC submission will send CTBs in the critical path and does not need to wait for these CTBs to complete before moving on, hence the need for this new function.
The non-blocking CTB now must have a flow control mechanism to ensure the buffer isn't overrun. A lazy spin wait is used as we believe the flow control condition should be rare with a properly sized buffer.
The function, intel_guc_send_nb, is exported in this patch but unused. Several patches later in the series make use of this function.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
 drivers/gpu/drm/i915/gt/uc/intel_guc.h    | 12 +++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h |  3 +-
 3 files changed, 82 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 4abc59f6f3cd..24b1df6ad4ae 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log) static inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len) {
- return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
- return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
+}
+#define INTEL_GUC_SEND_NB BIT(31)
hmm, this flag really belongs to intel_guc_ct_send(), so it should be defined as a CTB flag near that function declaration
+static +inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len) +{
- return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
INTEL_GUC_SEND_NB);
}
static inline int @@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len, u32 *response_buf, u32 response_buf_size) { return intel_guc_ct_send(&guc->ct, action, len,
response_buf, response_buf_size);
response_buf, response_buf_size, 0);
}
static inline void intel_guc_to_host_event_handler(struct intel_guc *guc) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index a17215920e58..c9a65d05911f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -3,6 +3,11 @@
- Copyright © 2016-2019 Intel Corporation
*/
+#include <linux/circ_buf.h> +#include <linux/ktime.h> +#include <linux/time64.h> +#include <linux/timekeeping.h>
#include "i915_drv.h" #include "intel_guc_ct.h" #include "gt/intel_gt.h" @@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct) static int ct_write(struct intel_guc_ct *ct, const u32 *action, u32 len /* in dwords */,
u32 fence)
u32 fence, u32 flags)
{ struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc; @@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct, FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) | FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
- hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
- hxg = (flags & INTEL_GUC_SEND_NB) ?
(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
or, as we already switched to accepting and returning whole HXG messages in guc_send_mmio(), maybe we should do the same for the CTB variant too and, instead of using an extra flag, just let the caller prepare a proper HXG header with the HXG_EVENT type; then the CTB code can just look at this type to decide which code path to use
note that existing callers should not be impacted, as the full HXG header for the REQUEST message looks exactly the same as the "action" code alone.
CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n", tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]); @@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status) return err; }
+static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+{
+	struct guc_ct_buffer_desc *desc = ctb->desc;
+	u32 head = READ_ONCE(desc->head);
+	u32 space;
+
+	space = CIRC_SPACE(desc->tail, head, ctb->size);
+
+	return space >= len_dw;
here you are returning true(1) as "has room"
+}
+static int ct_send_nb(struct intel_guc_ct *ct,
+		      const u32 *action,
+		      u32 len,
+		      u32 flags)
+{
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+	unsigned long spin_flags;
+	u32 fence;
+	int ret;
+
+	spin_lock_irqsave(&ctb->lock, spin_flags);
+
+	ret = h2g_has_room(ctb, len + 1);
but here you treat "1" as an error
and this "1" is GUC_HXG_MSG_MIN_LEN, right?
+	if (unlikely(ret))
+		goto out;
+
+	fence = ct_get_next_fence(ct);
+	ret = ct_write(ct, action, len, fence, flags);
+	if (unlikely(ret))
+		goto out;
+
+	intel_guc_notify(ct_to_guc(ct));
+
+out:
+	spin_unlock_irqrestore(&ctb->lock, spin_flags);
+
+	return ret;
+}
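The inversion flagged in the comments above (a boolean "has room" fed into an error-style `if (ret) goto out;` check) can be reproduced in a self-contained model. `CIRC_SPACE` is copied from `linux/circ_buf.h`; everything else here is illustrative, not the driver's actual fix:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CIRC_SPACE as defined in linux/circ_buf.h (size must be a power of two). */
#define CIRC_SPACE(tail, head, size) (((head) - (tail) - 1) & ((size) - 1))

/* Boolean predicate, as in the patch: true means "there is room". */
static bool h2g_has_room(uint32_t tail, uint32_t head, uint32_t size,
			 uint32_t len_dw)
{
	return CIRC_SPACE(tail, head, size) >= len_dw;
}

/* ct_send_nb() does `ret = h2g_has_room(...); if (unlikely(ret)) goto out;`,
 * so the success case (true == 1) is treated as failure. One corrected
 * shape maps the predicate to an errno-style result instead: */
static int h2g_check_room(uint32_t tail, uint32_t head, uint32_t size,
			  uint32_t len_dw)
{
	return h2g_has_room(tail, head, size, len_dw) ? 0 : -16; /* -EBUSY */
}
```

With this shape, `if (ret) goto out;` correctly bails only when the buffer is full.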
 static int ct_send(struct intel_guc_ct *ct,
 		   const u32 *action,
 		   u32 len,
@@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct,
 		   u32 response_buf_size,
 		   u32 *status)
 {
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	struct ct_request request;
 	unsigned long flags;
 	u32 fence;
@@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct,
 	GEM_BUG_ON(!len);
 	GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK);
 	GEM_BUG_ON(!response_buf && response_buf_size);
+	might_sleep();
+
+	/*
+	 * We use a lazy spin wait loop here as we believe that if the CT
+	 * buffers are sized correctly the flow control condition should be
+	 * rare.
shouldn't we at least try to log such cases, ratelimited, to find out how "rare" it is, or, if it is really unlikely, just return -EBUSY as in the non-blocking send case?
+	 */
+retry:
+	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
+
+	if (unlikely(!h2g_has_room(ctb, len + 1))) {
+		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+		cond_resched();
+		goto retry;
+	}
 	fence = ct_get_next_fence(ct);
 	request.fence = fence;
@@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct,
 	list_add_tail(&request.link, &ct->requests.pending);
 	spin_unlock(&ct->requests.lock);
 
-	err = ct_write(ct, action, len, fence);
+	err = ct_write(ct, action, len, fence, 0);
 
 	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
@@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
  * Command Transport (CT) buffer based GuC send function.
  */
 int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
-		      u32 *response_buf, u32 response_buf_size)
+		      u32 *response_buf, u32 response_buf_size, u32 flags)
 {
 	u32 status = ~0; /* undefined */
 	int ret;
@@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 		return -ENODEV;
 	}
 
+	if (flags & INTEL_GUC_SEND_NB)
+		return ct_send_nb(ct, action, len, flags);
+
 	ret = ct_send(ct, action, len, response_buf, response_buf_size, &status);
 	if (unlikely(ret < 0)) {
 		CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 1ae2dde6db93..eb69263324ba 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -42,7 +42,6 @@ struct intel_guc_ct_buffer {
 	bool broken;
 };
 
 /**
  * Top-level structure for Command Transport related data
  *
  * Includes a pair of CT buffers for bi-directional communication and tracking
@@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
 }
 
 int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
-		      u32 *response_buf, u32 response_buf_size);
+		      u32 *response_buf, u32 response_buf_size, u32 flags);
 void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
 
 #endif /* _INTEL_GUC_CT_H_ */
On Thu, Jun 24, 2021 at 04:48:32PM +0200, Michal Wajdeczko wrote:
On 24.06.2021 09:04, Matthew Brost wrote:
Add non blocking CTB send function, intel_guc_send_nb. GuC submission will send CTBs in the critical path and does not need to wait for these CTBs to complete before moving on, hence the need for this new function.
The non-blocking CTB now must have a flow control mechanism to ensure the buffer isn't overrun. A lazy spin wait is used as we believe the flow control condition should be rare with a properly sized buffer.
The function, intel_guc_send_nb, is exported in this patch but unused. Several patches later in the series make use of this function.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc.h | 12 +++- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++-- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 3 +- 3 files changed, 82 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 4abc59f6f3cd..24b1df6ad4ae 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log) static inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len) {
- return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
- return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
+}
+#define INTEL_GUC_SEND_NB BIT(31)
hmm, this flag really belongs to intel_guc_ct_send(), so it should be defined as a CTB flag near that function's declaration
I can move this up a few lines.
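A sketch of the relocation Michal suggests and Matt agrees to: the non-blocking flag declared next to the CTB-level entry point it modifies (intel_guc_ct.h) rather than in intel_guc.h. The flag value mirrors the patch's BIT(31); the `INTEL_GUC_CT_SEND_NB` name and the move itself are only proposed in this thread:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
struct intel_guc_ct;		/* opaque for this sketch */

/* Declared next to intel_guc_ct_send(), since that is the function whose
 * behaviour it changes. BIT(31) stays clear of the message length bits. */
#define INTEL_GUC_CT_SEND_NB	(1u << 31)

int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
		      u32 *response_buf, u32 response_buf_size, u32 flags);
```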
+static
+inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
+{
- return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
INTEL_GUC_SEND_NB);
}
static inline int @@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len, u32 *response_buf, u32 response_buf_size) { return intel_guc_ct_send(&guc->ct, action, len,
response_buf, response_buf_size);
response_buf, response_buf_size, 0);
}
static inline void intel_guc_to_host_event_handler(struct intel_guc *guc) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index a17215920e58..c9a65d05911f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -3,6 +3,11 @@
- Copyright © 2016-2019 Intel Corporation
*/
+#include <linux/circ_buf.h>
+#include <linux/ktime.h>
+#include <linux/time64.h>
+#include <linux/timekeeping.h>
#include "i915_drv.h" #include "intel_guc_ct.h" #include "gt/intel_gt.h" @@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct) static int ct_write(struct intel_guc_ct *ct, const u32 *action, u32 len /* in dwords */,
u32 fence)
u32 fence, u32 flags)
{ struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc; @@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct, FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) | FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
- hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
- hxg = (flags & INTEL_GUC_SEND_NB) ?
(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
Or, since we already switched to accepting and returning whole HXG messages in guc_send_mmio(), maybe we should do the same for the CTB variant too: instead of using an extra flag, let the caller prepare a proper HXG header with the HXG_EVENT type, and then have the CTB code look at that type to decide which code path to use.
Not sure I follow. Anyway, could this be done in a follow up by you if you want this change?
note that existing callers should not be impacted, as the full HXG header for a REQUEST message looks exactly the same as the "action" code alone
CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n", tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]); @@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status) return err; }
+static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) +{
- struct guc_ct_buffer_desc *desc = ctb->desc;
- u32 head = READ_ONCE(desc->head);
- u32 space;
- space = CIRC_SPACE(desc->tail, head, ctb->size);
- return space >= len_dw;
here you are returning true(1) as has room
See below.
+}
+static int ct_send_nb(struct intel_guc_ct *ct,
const u32 *action,
u32 len,
u32 flags)
+{
- struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
- unsigned long spin_flags;
- u32 fence;
- int ret;
- spin_lock_irqsave(&ctb->lock, spin_flags);
- ret = h2g_has_room(ctb, len + 1);
but here you treat "1" as an error
Yes, this patch is broken but fixed in a follow up one. Regardless, I'll fix this patch in place.
and this "1" is GUC_HXG_MSG_MIN_LEN, right?
Not exactly. This is following how ct_send() uses the action and len fields. The action[0] field goes in the HXG header and the extra + 1 is for the CT header.
- if (unlikely(ret))
goto out;
- fence = ct_get_next_fence(ct);
- ret = ct_write(ct, action, len, fence, flags);
- if (unlikely(ret))
goto out;
- intel_guc_notify(ct_to_guc(ct));
+out:
- spin_unlock_irqrestore(&ctb->lock, spin_flags);
- return ret;
+}
static int ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, @@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct, u32 response_buf_size, u32 *status) {
- struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct ct_request request; unsigned long flags; u32 fence;
@@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct, GEM_BUG_ON(!len); GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK); GEM_BUG_ON(!response_buf && response_buf_size);
might_sleep();
/*
* We use a lazy spin wait loop here as we believe that if the CT
* buffers are sized correctly the flow control condition should be
* rare.
shouldn't we at least try to log such cases, ratelimited, to find out how "rare" it is, or, if it is really unlikely, just return -EBUSY as in the non-blocking send case?
Definitely not return -EBUSY as this is a blocking call. Perhaps we can log this, but IGTs can likely hit it rather easily. It is really only interesting if real workloads hit this. Regardless, that can be a follow up.
Matt
*/
+retry: spin_lock_irqsave(&ct->ctbs.send.lock, flags);
if (unlikely(!h2g_has_room(ctb, len + 1))) {
spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
cond_resched();
goto retry;
}
fence = ct_get_next_fence(ct); request.fence = fence;
@@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct, list_add_tail(&request.link, &ct->requests.pending); spin_unlock(&ct->requests.lock);
- err = ct_write(ct, action, len, fence);
err = ct_write(ct, action, len, fence, 0);
spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
@@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
- Command Transport (CT) buffer based GuC send function.
*/ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
u32 *response_buf, u32 response_buf_size)
u32 *response_buf, u32 response_buf_size, u32 flags)
{ u32 status = ~0; /* undefined */ int ret; @@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, return -ENODEV; }
- if (flags & INTEL_GUC_SEND_NB)
return ct_send_nb(ct, action, len, flags);
- ret = ct_send(ct, action, len, response_buf, response_buf_size, &status); if (unlikely(ret < 0)) { CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index 1ae2dde6db93..eb69263324ba 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -42,7 +42,6 @@ struct intel_guc_ct_buffer { bool broken; };
/** Top-level structure for Command Transport related data
- Includes a pair of CT buffers for bi-directional communication and tracking
@@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct) }
int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
u32 *response_buf, u32 response_buf_size);
u32 *response_buf, u32 response_buf_size, u32 flags);
void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
#endif /* _INTEL_GUC_CT_H_ */
On 24.06.2021 17:49, Matthew Brost wrote:
On Thu, Jun 24, 2021 at 04:48:32PM +0200, Michal Wajdeczko wrote:
On 24.06.2021 09:04, Matthew Brost wrote:
Add non blocking CTB send function, intel_guc_send_nb. GuC submission will send CTBs in the critical path and does not need to wait for these CTBs to complete before moving on, hence the need for this new function.
The non-blocking CTB now must have a flow control mechanism to ensure the buffer isn't overrun. A lazy spin wait is used as we believe the flow control condition should be rare with a properly sized buffer.
The function, intel_guc_send_nb, is exported in this patch but unused. Several patches later in the series make use of this function.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc.h | 12 +++- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++-- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 3 +- 3 files changed, 82 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 4abc59f6f3cd..24b1df6ad4ae 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log) static inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len) {
- return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
- return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
+}
+#define INTEL_GUC_SEND_NB BIT(31)
hmm, this flag really belongs to intel_guc_ct_send() so it should be defined as CTB flag near that function declaration
I can move this up a few lines.
+static
+inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
+{
- return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
INTEL_GUC_SEND_NB);
}
static inline int @@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len, u32 *response_buf, u32 response_buf_size) { return intel_guc_ct_send(&guc->ct, action, len,
response_buf, response_buf_size);
response_buf, response_buf_size, 0);
}
static inline void intel_guc_to_host_event_handler(struct intel_guc *guc) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index a17215920e58..c9a65d05911f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -3,6 +3,11 @@
- Copyright © 2016-2019 Intel Corporation
*/
+#include <linux/circ_buf.h>
+#include <linux/ktime.h>
+#include <linux/time64.h>
+#include <linux/timekeeping.h>
#include "i915_drv.h" #include "intel_guc_ct.h" #include "gt/intel_gt.h" @@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct) static int ct_write(struct intel_guc_ct *ct, const u32 *action, u32 len /* in dwords */,
u32 fence)
u32 fence, u32 flags)
{ struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc; @@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct, FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) | FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
- hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
- hxg = (flags & INTEL_GUC_SEND_NB) ?
(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
Or, since we already switched to accepting and returning whole HXG messages in guc_send_mmio(), maybe we should do the same for the CTB variant too: instead of using an extra flag, let the caller prepare a proper HXG header with the HXG_EVENT type, and then have the CTB code look at that type to decide which code path to use.
Not sure I follow. Anyway, could this be done in a follow up by you if you want this change?
note that existing callers should not be impacted, as the full HXG header for a REQUEST message looks exactly the same as the "action" code alone
CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n", tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]); @@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status) return err; }
+static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) +{
- struct guc_ct_buffer_desc *desc = ctb->desc;
- u32 head = READ_ONCE(desc->head);
- u32 space;
- space = CIRC_SPACE(desc->tail, head, ctb->size);
- return space >= len_dw;
here you are returning true(1) as has room
See below.
+}
+static int ct_send_nb(struct intel_guc_ct *ct,
const u32 *action,
u32 len,
u32 flags)
+{
- struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
- unsigned long spin_flags;
- u32 fence;
- int ret;
- spin_lock_irqsave(&ctb->lock, spin_flags);
- ret = h2g_has_room(ctb, len + 1);
but here you treat "1" as an error
Yes, this patch is broken but fixed in a follow up one. Regardless, I'll fix this patch in place.
and this "1" is GUC_HXG_MSG_MIN_LEN, right?
Not exactly. This is following how ct_send() uses the action and len fields. The action[0] field goes in the HXG header and the extra + 1 is for the CT header.
well, "len" already counts "action", so treating the input as a full HXG message (including the HXG header) would make it cleaner
we can try to do it later, but by doing it right now we would avoid introducing this send_nb() function and then deprecating it again long term
- if (unlikely(ret))
goto out;
- fence = ct_get_next_fence(ct);
- ret = ct_write(ct, action, len, fence, flags);
- if (unlikely(ret))
goto out;
- intel_guc_notify(ct_to_guc(ct));
+out:
- spin_unlock_irqrestore(&ctb->lock, spin_flags);
- return ret;
+}
static int ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, @@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct, u32 response_buf_size, u32 *status) {
- struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct ct_request request; unsigned long flags; u32 fence;
@@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct, GEM_BUG_ON(!len); GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK); GEM_BUG_ON(!response_buf && response_buf_size);
might_sleep();
/*
* We use a lazy spin wait loop here as we believe that if the CT
* buffers are sized correctly the flow control condition should be
* rare.
shouldn't we at least try to log such cases, ratelimited, to find out how "rare" it is, or, if it is really unlikely, just return -EBUSY as in the non-blocking send case?
Definitely not return -EBUSY as this is a blocking call. Perhaps we can log
blocking calls can still fail for various reasons, a full CTB being one of them; if we return an error (now broken) for the non-blocking variant then we should do the same for the blocking variant as well and let the caller decide about next steps
this, but IGTs can likely hit it rather easily. It is really only interesting if real workloads hit this. Regardless, that can be a follow up.
if we hide the retry in a silent loop then we will never find out whether we hit this condition (in IGT or a real workload)
Matt
*/
+retry: spin_lock_irqsave(&ct->ctbs.send.lock, flags);
if (unlikely(!h2g_has_room(ctb, len + 1))) {
spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
cond_resched();
goto retry;
}
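One way to make the "silent loop" observable, per Michal's concern above: count flow-control stalls so they can surface in debugfs or a ratelimited log. This is a userspace model with a fake space counter standing in for the GuC consuming entries; none of these names exist in the driver:

```c
#include <assert.h>
#include <stdbool.h>

static unsigned long ct_flow_control_stalls;	/* hypothetical counter */

/* Stand-in for h2g_has_room() plus reservation against a space count. */
static bool try_reserve(unsigned int *space, unsigned int need)
{
	if (*space < need)
		return false;
	*space -= need;
	return true;
}

static int send_blocking(unsigned int *space, unsigned int need)
{
	bool stalled = false;

	while (!try_reserve(space, need)) {
		if (!stalled) {
			stalled = true;
			/* counted once per message; a ratelimited log or a
			 * debugfs dump of this answers "how rare is it?" */
			ct_flow_control_stalls++;
		}
		(*space)++;	/* fake: firmware frees one entry per spin */
	}
	return 0;
}
```

The counter costs nothing on the fast path and turns "we believe it is rare" into something measurable.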
fence = ct_get_next_fence(ct); request.fence = fence;
@@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct, list_add_tail(&request.link, &ct->requests.pending); spin_unlock(&ct->requests.lock);
- err = ct_write(ct, action, len, fence);
err = ct_write(ct, action, len, fence, 0);
spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
@@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
- Command Transport (CT) buffer based GuC send function.
*/ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
u32 *response_buf, u32 response_buf_size)
u32 *response_buf, u32 response_buf_size, u32 flags)
{ u32 status = ~0; /* undefined */ int ret; @@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, return -ENODEV; }
- if (flags & INTEL_GUC_SEND_NB)
return ct_send_nb(ct, action, len, flags);
- ret = ct_send(ct, action, len, response_buf, response_buf_size, &status); if (unlikely(ret < 0)) { CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index 1ae2dde6db93..eb69263324ba 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -42,7 +42,6 @@ struct intel_guc_ct_buffer { bool broken; };
/** Top-level structure for Command Transport related data
- Includes a pair of CT buffers for bi-directional communication and tracking
@@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct) }
int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
u32 *response_buf, u32 response_buf_size);
u32 *response_buf, u32 response_buf_size, u32 flags);
void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
#endif /* _INTEL_GUC_CT_H_ */
On Thu, Jun 24, 2021 at 07:02:18PM +0200, Michal Wajdeczko wrote:
On 24.06.2021 17:49, Matthew Brost wrote:
On Thu, Jun 24, 2021 at 04:48:32PM +0200, Michal Wajdeczko wrote:
On 24.06.2021 09:04, Matthew Brost wrote:
Add non blocking CTB send function, intel_guc_send_nb. GuC submission will send CTBs in the critical path and does not need to wait for these CTBs to complete before moving on, hence the need for this new function.
The non-blocking CTB now must have a flow control mechanism to ensure the buffer isn't overrun. A lazy spin wait is used as we believe the flow control condition should be rare with a properly sized buffer.
The function, intel_guc_send_nb, is exported in this patch but unused. Several patches later in the series make use of this function.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc.h | 12 +++- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++-- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 3 +- 3 files changed, 82 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 4abc59f6f3cd..24b1df6ad4ae 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log) static inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len) {
- return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
- return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
+}
+#define INTEL_GUC_SEND_NB BIT(31)
hmm, this flag really belongs to intel_guc_ct_send() so it should be defined as CTB flag near that function declaration
I can move this up a few lines.
+static
+inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
+{
- return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
INTEL_GUC_SEND_NB);
}
static inline int @@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len, u32 *response_buf, u32 response_buf_size) { return intel_guc_ct_send(&guc->ct, action, len,
response_buf, response_buf_size);
response_buf, response_buf_size, 0);
}
static inline void intel_guc_to_host_event_handler(struct intel_guc *guc) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index a17215920e58..c9a65d05911f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -3,6 +3,11 @@
- Copyright © 2016-2019 Intel Corporation
*/
+#include <linux/circ_buf.h>
+#include <linux/ktime.h>
+#include <linux/time64.h>
+#include <linux/timekeeping.h>
#include "i915_drv.h" #include "intel_guc_ct.h" #include "gt/intel_gt.h" @@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct) static int ct_write(struct intel_guc_ct *ct, const u32 *action, u32 len /* in dwords */,
u32 fence)
u32 fence, u32 flags)
{ struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc; @@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct, FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) | FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
- hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
- hxg = (flags & INTEL_GUC_SEND_NB) ?
(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
Or, since we already switched to accepting and returning whole HXG messages in guc_send_mmio(), maybe we should do the same for the CTB variant too: instead of using an extra flag, let the caller prepare a proper HXG header with the HXG_EVENT type, and then have the CTB code look at that type to decide which code path to use.
Not sure I follow. Anyway, could this be done in a follow up by you if you want this change?
note that existing callers should not be impacted, as the full HXG header for a REQUEST message looks exactly the same as the "action" code alone
CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n", tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]); @@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status) return err; }
+static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) +{
- struct guc_ct_buffer_desc *desc = ctb->desc;
- u32 head = READ_ONCE(desc->head);
- u32 space;
- space = CIRC_SPACE(desc->tail, head, ctb->size);
- return space >= len_dw;
here you are returning true(1) as has room
See below.
+}
+static int ct_send_nb(struct intel_guc_ct *ct,
const u32 *action,
u32 len,
u32 flags)
+{
- struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
- unsigned long spin_flags;
- u32 fence;
- int ret;
- spin_lock_irqsave(&ctb->lock, spin_flags);
- ret = h2g_has_room(ctb, len + 1);
but here you treat "1" as an error
Yes, this patch is broken but fixed in a follow up one. Regardless, I'll fix this patch in place.
and this "1" is GUC_HXG_MSG_MIN_LEN, right?
Not exactly. This is following how ct_send() uses the action and len fields. The action[0] field goes in the HXG header and the extra + 1 is for the CT header.
well, "len" already counts "action", so treating the input as a full HXG message (including the HXG header) would make it cleaner
Yes, I know. See above. To me GUC_HXG_MSG_MIN_LEN makes zero sense here and it is worse than adding + 1. This + 1 accounts for the CT header, not the HXG header. If anything, we add a new define, GUC_CT_HDR_LEN, and use that.
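The define Matt proposes, sketched below. `GUC_CT_HDR_LEN` does not exist in the patch; it is only the name suggested in this thread to make the "+ 1" self-documenting:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical define from the discussion: the CTB header is one dword. */
#define GUC_CT_HDR_LEN	1u

/* action[0] folds into the HXG header, so the wire length is the action
 * payload plus the CT header -- i.e. the "len + 1" in the room checks. */
static uint32_t h2g_msg_len_dw(uint32_t action_len_dw)
{
	return action_len_dw + GUC_CT_HDR_LEN;
}
```

The room checks would then read `h2g_has_room(ctb, h2g_msg_len_dw(len))` instead of a bare `len + 1`.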
Matt
we can try to do it later, but by doing it right now we would avoid introducing this send_nb() function and then deprecating it again long term
- if (unlikely(ret))
goto out;
- fence = ct_get_next_fence(ct);
- ret = ct_write(ct, action, len, fence, flags);
- if (unlikely(ret))
goto out;
- intel_guc_notify(ct_to_guc(ct));
+out:
- spin_unlock_irqrestore(&ctb->lock, spin_flags);
- return ret;
+}
static int ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, @@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct, u32 response_buf_size, u32 *status) {
- struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct ct_request request; unsigned long flags; u32 fence;
@@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct, GEM_BUG_ON(!len); GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK); GEM_BUG_ON(!response_buf && response_buf_size);
might_sleep();
/*
* We use a lazy spin wait loop here as we believe that if the CT
* buffers are sized correctly the flow control condition should be
* rare.
shouldn't we at least try to log such cases, ratelimited, to find out how "rare" it is, or, if it is really unlikely, just return -EBUSY as in the non-blocking send case?
Definitely not return -EBUSY as this is a blocking call. Perhaps we can log
blocking calls can still fail for various reasons, a full CTB being one of them; if we return an error (now broken) for the non-blocking variant then we should do the same for the blocking variant as well and let the caller decide about next steps
this, but IGTs can likely hit it rather easily. It is really only interesting if real workloads hit this. Regardless, that can be a follow up.
if we hide the retry in a silent loop then we will never find out whether we hit this condition (in IGT or a real workload)
Matt
*/
+retry: spin_lock_irqsave(&ct->ctbs.send.lock, flags);
if (unlikely(!h2g_has_room(ctb, len + 1))) {
spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
cond_resched();
goto retry;
}
fence = ct_get_next_fence(ct); request.fence = fence;
@@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct, list_add_tail(&request.link, &ct->requests.pending); spin_unlock(&ct->requests.lock);
- err = ct_write(ct, action, len, fence);
err = ct_write(ct, action, len, fence, 0);
spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
@@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
- Command Transport (CT) buffer based GuC send function.
*/ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
u32 *response_buf, u32 response_buf_size)
u32 *response_buf, u32 response_buf_size, u32 flags)
{ u32 status = ~0; /* undefined */ int ret; @@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, return -ENODEV; }
- if (flags & INTEL_GUC_SEND_NB)
return ct_send_nb(ct, action, len, flags);
- ret = ct_send(ct, action, len, response_buf, response_buf_size, &status); if (unlikely(ret < 0)) { CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index 1ae2dde6db93..eb69263324ba 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -42,7 +42,6 @@ struct intel_guc_ct_buffer { bool broken; };
/** Top-level structure for Command Transport related data
- Includes a pair of CT buffers for bi-directional communication and tracking
@@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct) }
int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
u32 *response_buf, u32 response_buf_size);
u32 *response_buf, u32 response_buf_size, u32 flags);
void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
#endif /* _INTEL_GUC_CT_H_ */
On 25.06.2021 00:41, Matthew Brost wrote:
On Thu, Jun 24, 2021 at 07:02:18PM +0200, Michal Wajdeczko wrote:
On 24.06.2021 17:49, Matthew Brost wrote:
On Thu, Jun 24, 2021 at 04:48:32PM +0200, Michal Wajdeczko wrote:
On 24.06.2021 09:04, Matthew Brost wrote:
Add non blocking CTB send function, intel_guc_send_nb. GuC submission will send CTBs in the critical path and does not need to wait for these CTBs to complete before moving on, hence the need for this new function.
The non-blocking CTB now must have a flow control mechanism to ensure the buffer isn't overrun. A lazy spin wait is used as we believe the flow control condition should be rare with a properly sized buffer.
The function, intel_guc_send_nb, is exported in this patch but unused. Several patches later in the series make use of this function.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc.h | 12 +++- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 77 +++++++++++++++++++++-- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 3 +- 3 files changed, 82 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 4abc59f6f3cd..24b1df6ad4ae 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -74,7 +74,15 @@ static inline struct intel_guc *log_to_guc(struct intel_guc_log *log) static inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len) {
- return intel_guc_ct_send(&guc->ct, action, len, NULL, 0);
- return intel_guc_ct_send(&guc->ct, action, len, NULL, 0, 0);
+}
+#define INTEL_GUC_SEND_NB BIT(31)
hmm, this flag really belongs to intel_guc_ct_send() so it should be defined as a CTB flag near that function declaration
I can move this up a few lines.
+static inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len) +{
- return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
INTEL_GUC_SEND_NB);
}
static inline int @@ -82,7 +90,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len, u32 *response_buf, u32 response_buf_size) { return intel_guc_ct_send(&guc->ct, action, len,
response_buf, response_buf_size);
response_buf, response_buf_size, 0);
}
static inline void intel_guc_to_host_event_handler(struct intel_guc *guc) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index a17215920e58..c9a65d05911f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -3,6 +3,11 @@
- Copyright © 2016-2019 Intel Corporation
*/
+#include <linux/circ_buf.h> +#include <linux/ktime.h> +#include <linux/time64.h> +#include <linux/timekeeping.h>
#include "i915_drv.h" #include "intel_guc_ct.h" #include "gt/intel_gt.h" @@ -373,7 +378,7 @@ static void write_barrier(struct intel_guc_ct *ct) static int ct_write(struct intel_guc_ct *ct, const u32 *action, u32 len /* in dwords */,
u32 fence)
u32 fence, u32 flags)
{ struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc; @@ -421,9 +426,13 @@ static int ct_write(struct intel_guc_ct *ct, FIELD_PREP(GUC_CTB_MSG_0_NUM_DWORDS, len) | FIELD_PREP(GUC_CTB_MSG_0_FENCE, fence);
- hxg = FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
GUC_HXG_REQUEST_MSG_0_DATA0, action[0]);
- hxg = (flags & INTEL_GUC_SEND_NB) ?
(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_EVENT) |
FIELD_PREP(GUC_HXG_EVENT_MSG_0_ACTION |
GUC_HXG_EVENT_MSG_0_DATA0, action[0])) :
(FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_REQUEST) |
FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION |
GUC_HXG_REQUEST_MSG_0_DATA0, action[0]));
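For readers unfamiliar with the FIELD_PREP() used in the hunk above: it shifts a value into position under a contiguous bitmask. A standalone model in plain C (the real macro lives in <linux/bitfield.h> and adds compile-time mask checks this sketch omits; __builtin_ctz assumes GCC/Clang):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the kernel's FIELD_PREP(): place "val" at the
 * low bit of a contiguous "mask". No compile-time validation here. */
#define FIELD_PREP(mask, val) \
	(((uint32_t)(val) << __builtin_ctz(mask)) & (uint32_t)(mask))
```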
or, as we have already switched to accepting and returning whole HXG messages in guc_send_mmio(), maybe we should do the same for the CTB variant too: instead of using an extra flag, just let the caller prepare a proper HXG header with the HXG_EVENT type, and then in the CTB code look at this type to decide which code path to use
Not sure I follow. Anyway, could this be done in a follow-up by you if you want this change?
note that existing callers should not be impacted, as the full HXG header for a REQUEST message looks exactly the same as the "action" code alone.
CT_DEBUG(ct, "writing (tail %u) %*ph %*ph %*ph\n", tail, 4, &header, 4, &hxg, 4 * (len - 1), &action[1]); @@ -498,6 +507,46 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status) return err; }
+static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) +{
- struct guc_ct_buffer_desc *desc = ctb->desc;
- u32 head = READ_ONCE(desc->head);
- u32 space;
- space = CIRC_SPACE(desc->tail, head, ctb->size);
- return space >= len_dw;
here you are returning true (1) as "has room"
See below.
+}
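The h2g_has_room() check above relies on the kernel's CIRC_SPACE() macro from <linux/circ_buf.h>. A userspace model of the same space computation (ring size must be a power of two; one slot is kept unused so "full" can be told apart from "empty"):

```c
#include <assert.h>
#include <stdint.h>

/* Same definition as the kernel's <linux/circ_buf.h>: free slots in a
 * ring of power-of-two "size", keeping one slot unused so full != empty. */
#define CIRC_SPACE(tail, head, size) \
	(((head) - ((tail) + 1)) & ((size) - 1))

/* Model of h2g_has_room(): can the send ring take len_dw more dwords? */
static int h2g_has_room_model(uint32_t head, uint32_t tail, uint32_t size,
			      uint32_t len_dw)
{
	return CIRC_SPACE(tail, head, size) >= len_dw;
}
```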
+static int ct_send_nb(struct intel_guc_ct *ct,
const u32 *action,
u32 len,
u32 flags)
+{
- struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
- unsigned long spin_flags;
- u32 fence;
- int ret;
- spin_lock_irqsave(&ctb->lock, spin_flags);
- ret = h2g_has_room(ctb, len + 1);
but here you treat "1" as an error
Yes, this patch is broken but fixed in a follow-up one. Regardless, I'll fix this patch in place.
and this "1" is GUC_HXG_MSG_MIN_LEN, right ?
Not exactly. This is following how ct_send() uses the action + len fields. The action[0] field goes in the HXG header and the extra + 1 is for the CT header.
well, "len" already counts "action", so treating the input as a full HXG message (including the HXG header) would make it cleaner
Yes, I know. See above. To me GUC_HXG_MSG_MIN_LEN makes zero sense here and is worse than adding + 1. This + 1 accounts for the CT header, not the HXG header. If anything, we should add a new define, GUC_CT_HDR_LEN, and use that.
you mean GUC_CTB_MSG_MIN_LEN ? it's already there [1]
[1] https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/i915/gt/uc/abi...
Matt
we can try to do it later, but by doing it right now we would avoid introducing this send_nb() function and then deprecating it again long term
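To make the "+ 1" accounting from this exchange concrete: "len" counts the action dwords (action[0] doubles as the HXG header, so HXG costs no extra dword), while the CTB framing adds one more header dword on the ring. A tiny illustration, using GUC_CTB_HDR_LEN as the hypothetical name floated in the thread:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical define from the discussion: the CTB header is one dword. */
#define GUC_CTB_HDR_LEN 1

/* Ring footprint of an H2G message whose action is "len" dwords:
 * action[0] becomes the HXG header (no extra dword for HXG), but the
 * CTB header still costs one -- hence h2g_has_room(ctb, len + 1). */
static uint32_t ctb_msg_dwords(uint32_t len)
{
	return GUC_CTB_HDR_LEN + len;
}
```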
- if (unlikely(ret))
goto out;
- fence = ct_get_next_fence(ct);
- ret = ct_write(ct, action, len, fence, flags);
- if (unlikely(ret))
goto out;
- intel_guc_notify(ct_to_guc(ct));
+out:
- spin_unlock_irqrestore(&ctb->lock, spin_flags);
- return ret;
+}
static int ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, @@ -505,6 +554,7 @@ static int ct_send(struct intel_guc_ct *ct, u32 response_buf_size, u32 *status) {
- struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct ct_request request; unsigned long flags; u32 fence;
@@ -514,8 +564,20 @@ static int ct_send(struct intel_guc_ct *ct, GEM_BUG_ON(!len); GEM_BUG_ON(len & ~GUC_CT_MSG_LEN_MASK); GEM_BUG_ON(!response_buf && response_buf_size);
might_sleep();
/*
* We use a lazy spin wait loop here as we believe that if the CT
* buffers are sized correctly the flow control condition should be
* rare.
shouldn't we at least try to log such cases with RATE_LIMITED to find out how "rare" this is, or if it is really that unlikely, just return -EBUSY as in the case of a non-blocking send?
Definitely not return -EBUSY, as this is a blocking call. Perhaps we can log
blocking calls can still fail for various reasons, a full CTB being one of them, and if we return an error (now broken) for the non-blocking variant then we should do the same for the blocking variant as well and let the caller decide about next steps
this, but IGTs can likely hit this rather easily. It is really only interesting if real workloads hit this. Regardless, that can be a follow-up.
if we hide the retry in a silent loop then we will never find out whether we hit this condition (in IGT or a real WL) or not
Matt
*/
+retry: spin_lock_irqsave(&ct->ctbs.send.lock, flags);
if (unlikely(!h2g_has_room(ctb, len + 1))) {
spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
cond_resched();
goto retry;
}
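The retry loop above is the "lazy spin wait" from the commit message: drop the lock, yield, and try again. A userspace model of that control flow (sched_yield() standing in for cond_resched(); the fake ring is a test double, not kernel code):

```c
#include <assert.h>
#include <sched.h>
#include <stdint.h>

/* Test double: reports "no room" for the first "busy" calls. */
struct fake_ring {
	int busy;
};

static int fake_has_room(struct fake_ring *r, uint32_t len_dw)
{
	(void)len_dw;
	return r->busy-- <= 0;
}

/* Model of the lazy spin wait in ct_send(): loop until there is room,
 * yielding the CPU between checks instead of burning it. */
static int wait_for_room(struct fake_ring *r, uint32_t len_dw)
{
	while (!fake_has_room(r, len_dw))
		sched_yield();	/* stands in for cond_resched() */
	return 0;
}
```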
fence = ct_get_next_fence(ct); request.fence = fence;
@@ -527,7 +589,7 @@ static int ct_send(struct intel_guc_ct *ct, list_add_tail(&request.link, &ct->requests.pending); spin_unlock(&ct->requests.lock);
- err = ct_write(ct, action, len, fence);
err = ct_write(ct, action, len, fence, 0);
spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
@@ -569,7 +631,7 @@ static int ct_send(struct intel_guc_ct *ct,
- Command Transport (CT) buffer based GuC send function.
*/ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
u32 *response_buf, u32 response_buf_size)
u32 *response_buf, u32 response_buf_size, u32 flags)
{ u32 status = ~0; /* undefined */ int ret; @@ -579,6 +641,9 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, return -ENODEV; }
- if (flags & INTEL_GUC_SEND_NB)
return ct_send_nb(ct, action, len, flags);
- ret = ct_send(ct, action, len, response_buf, response_buf_size, &status); if (unlikely(ret < 0)) { CT_ERROR(ct, "Sending action %#x failed (err=%d status=%#X)\n",
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index 1ae2dde6db93..eb69263324ba 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -42,7 +42,6 @@ struct intel_guc_ct_buffer { bool broken; };
/** Top-level structure for Command Transport related data
- Includes a pair of CT buffers for bi-directional communication and tracking
@@ -88,7 +87,7 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct) }
int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
u32 *response_buf, u32 response_buf_size);
u32 *response_buf, u32 response_buf_size, u32 flags);
void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
#endif /* _INTEL_GUC_CT_H_ */
On Fri, Jun 25, 2021 at 01:50:21PM +0200, Michal Wajdeczko wrote:
On 25.06.2021 00:41, Matthew Brost wrote:
On Thu, Jun 24, 2021 at 07:02:18PM +0200, Michal Wajdeczko wrote:
On 24.06.2021 17:49, Matthew Brost wrote:
On Thu, Jun 24, 2021 at 04:48:32PM +0200, Michal Wajdeczko wrote:
On 24.06.2021 09:04, Matthew Brost wrote:
[snip]
and this "1" is GUC_HXG_MSG_MIN_LEN, right ?
Not exactly. This is following how ct_send() uses the action + len field. Action[0] field goes in the HXG header and extra + 1 is for the CT header.
well, "len" already counts "action" so by treating input as full HXG message (including HXG header) will make it cleaner
Yes, I know. See above. To me GUC_HXG_MSG_MIN_LEN makes zero sense here and is worse than adding + 1. This + 1 accounts for the CT header, not the HXG header. If anything, we should add a new define, GUC_CT_HDR_LEN, and use that.
you mean GUC_CTB_MSG_MIN_LEN ? it's already there [1]
Kinda? I think we should have a define GUC_CTB_HDR_LEN which is 1, with GUC_CTB_MSG_MIN_LEN defined as GUC_CTB_HDR_LEN. 'GUC_CTB_HDR_LEN' makes it clear that the + 1 is referring to the header. I've done this in a branch of these patches already.
Matt
[1] https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/i915/gt/uc/abi...
Matt
[snip]
On Thu, Jun 24, 2021 at 07:02:18PM +0200, Michal Wajdeczko wrote:
On 24.06.2021 17:49, Matthew Brost wrote:
On Thu, Jun 24, 2021 at 04:48:32PM +0200, Michal Wajdeczko wrote:
On 24.06.2021 09:04, Matthew Brost wrote:
[snip]
/*
* We use a lazy spin wait loop here as we believe that if the CT
* buffers are sized correctly the flow control condition should be
* rare.
shouldn't we at least try to log such cases with RATE_LIMITED to find out how "rare" it is, or if really unlikely just return -EBUSY as in case of non-blocking send ?
Definitely not return -EBUSY, as this is a blocking call. Perhaps we can log
blocking calls can still fail for various reasons, a full CTB being one of them, and if we return an error (now broken) for the non-blocking variant then we should do the same for the blocking variant as well and let the caller decide about next steps
And we would have to rewrite the rest of the stack for the new behavior, which seems wrong. This function is allowed to block, so let it.
If you want to do this then by all means go ahead, but I'll likely NACK it as over-engineered.
this, but IGTs can likely hit this rather easily. It is really only interesting if real workloads hit this. Regardless, that can be a follow-up.
if we hide the retry in a silent loop then we will never find out whether we hit this condition (in IGT or a real WL) or not
We don't care if this is hit in IGTs, as that isn't a real-world use case, and it will spam dmesg if we hit it. I can make a note of this as an open and we can revisit later.
Matt
Matt
*/
[snip]
Implement a stall timer which fails H2G CTB sends once a period of time passes with no forward progress, to prevent deadlock.
Also update ct_write to return -EIO rather than -EPIPE on a corrupted descriptor.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 47 +++++++++++++++++++++-- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 4 ++ 2 files changed, 48 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index c9a65d05911f..27ec30b5ef47 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -319,6 +319,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct) goto err_deregister;
ct->enabled = true; + ct->stall_time = KTIME_MAX;
return 0;
@@ -392,7 +393,7 @@ static int ct_write(struct intel_guc_ct *ct, unsigned int i;
if (unlikely(ctb->broken)) - return -EPIPE; + return -EIO;
if (unlikely(desc->status)) goto corrupted; @@ -464,7 +465,7 @@ static int ct_write(struct intel_guc_ct *ct, CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n", desc->head, desc->tail, desc->status); ctb->broken = true; - return -EPIPE; + return -EIO; }
/** @@ -507,6 +508,18 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status) return err; }
+#define GUC_CTB_TIMEOUT_MS 1500
+static inline bool ct_deadlocked(struct intel_guc_ct *ct)
+{
+	long timeout = GUC_CTB_TIMEOUT_MS;
+	bool ret = ktime_ms_delta(ktime_get(), ct->stall_time) > timeout;
+
+	if (unlikely(ret))
+		CT_ERROR(ct, "CT deadlocked\n");
+
+	return ret;
+}
+
 static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
 {
 	struct guc_ct_buffer_desc *desc = ctb->desc;
@@ -518,6 +531,26 @@ static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
 	return space >= len_dw;
 }

+static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
+{
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+
+	lockdep_assert_held(&ct->ctbs.send.lock);
+
+	if (unlikely(!h2g_has_room(ctb, len_dw))) {
+		if (ct->stall_time == KTIME_MAX)
+			ct->stall_time = ktime_get();
+
+		if (unlikely(ct_deadlocked(ct)))
+			return -EIO;
+		else
+			return -EBUSY;
+	}
+
+	ct->stall_time = KTIME_MAX;
+	return 0;
+}
+
 static int ct_send_nb(struct intel_guc_ct *ct,
 		      const u32 *action,
 		      u32 len,
@@ -530,7 +563,7 @@ static int ct_send_nb(struct intel_guc_ct *ct,

 	spin_lock_irqsave(&ctb->lock, spin_flags);

-	ret = h2g_has_room(ctb, len + 1);
+	ret = has_room_nb(ct, len + 1);
 	if (unlikely(ret))
 		goto out;

@@ -574,11 +607,19 @@ static int ct_send(struct intel_guc_ct *ct,
 retry:
 	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
 	if (unlikely(!h2g_has_room(ctb, len + 1))) {
+		if (ct->stall_time == KTIME_MAX)
+			ct->stall_time = ktime_get();
 		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+
+		if (unlikely(ct_deadlocked(ct)))
+			return -EIO;
+
 		cond_resched();
 		goto retry;
 	}
+ ct->stall_time = KTIME_MAX; + fence = ct_get_next_fence(ct); request.fence = fence; request.status = 0; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index eb69263324ba..55ef7c52472f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -9,6 +9,7 @@ #include <linux/interrupt.h> #include <linux/spinlock.h> #include <linux/workqueue.h> +#include <linux/ktime.h>
#include "intel_guc_fwif.h"
@@ -68,6 +69,9 @@ struct intel_guc_ct { struct list_head incoming; /* incoming requests */ struct work_struct worker; /* handler for incoming requests */ } requests; + + /** @stall_time: time of first time a CTB submission is stalled */ + ktime_t stall_time; };
void intel_guc_ct_init_early(struct intel_guc_ct *ct);
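The stall-timer logic added by this patch can be modeled in isolation: record the time of the first failed space check, reset it on forward progress, and declare a deadlock once the stall outlives the timeout. A userspace sketch (now_ms stands in for ktime_get()/ktime_ms_delta(); the error constants mirror -EBUSY and -EIO):

```c
#include <assert.h>
#include <stdint.h>

#define GUC_CTB_TIMEOUT_MS	1500
#define STALL_NONE		INT64_MAX	/* models KTIME_MAX */
#define ERR_BUSY		(-16)		/* -EBUSY: no room, retry later */
#define ERR_IO			(-5)		/* -EIO: stall outlived timeout */

/* Model of has_room_nb()'s stall tracking: arm the timer on the first
 * stall, clear it on progress, fail hard once the timeout elapses. */
static int check_stall(int64_t *stall_time, int has_room, int64_t now_ms)
{
	if (has_room) {
		*stall_time = STALL_NONE;	/* forward progress: reset */
		return 0;
	}
	if (*stall_time == STALL_NONE)
		*stall_time = now_ms;		/* first stall: arm timer */
	if (now_ms - *stall_time > GUC_CTB_TIMEOUT_MS)
		return ERR_IO;
	return ERR_BUSY;
}
```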
On 24.06.2021 09:04, Matthew Brost wrote:
Implement a stall timer which fails H2G CTB sends once a period of time passes with no forward progress, to prevent deadlock.
Also update ct_write to return -EIO rather than -EPIPE on a corrupted descriptor.
by doing so you will have the same error code for two different problems:
a) corrupted CTB descriptor (definitely unrecoverable)
b) long stall in CTB processing (still recoverable)
while caller is explicitly instructed to retry only on:
c) temporary stall in CTB processing (likely recoverable)
so why do we want to limit our diagnostics?
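For what it's worth, the practical difference between the two codes shows up on the caller side. Below is a minimal standalone sketch (not the actual i915 code; fake_send, send_with_retry and the retry budget are invented for illustration) of the policy being discussed: retry on -EBUSY, give up on -EIO:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical send function: returns -EBUSY while the buffer is
 * temporarily full (recoverable), 0 once the message goes through. */
static int fake_send(int *attempts_left)
{
	if (*attempts_left > 0) {
		(*attempts_left)--;
		return -EBUSY;	/* temporary stall: caller may retry */
	}
	return 0;
}

/* Caller-side policy: retry only on -EBUSY, bail out otherwise. */
static int send_with_retry(int *attempts_left, int max_retries)
{
	int ret;

	do {
		ret = fake_send(attempts_left);
		if (ret != -EBUSY)
			return ret;
	} while (max_retries-- > 0);

	return -EIO;	/* stall lasted too long: treat as unrecoverable */
}
```

Collapsing both failure modes into -EIO means this caller cannot tell a dead channel from a long stall, which is the diagnostics concern raised above.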
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 47 +++++++++++++++++++++-- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 4 ++ 2 files changed, 48 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index c9a65d05911f..27ec30b5ef47 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -319,6 +319,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
		goto err_deregister;

	ct->enabled = true;
+	ct->stall_time = KTIME_MAX;

	return 0;
@@ -392,7 +393,7 @@ static int ct_write(struct intel_guc_ct *ct,
	unsigned int i;

	if (unlikely(ctb->broken))
-		return -EPIPE;
+		return -EIO;

	if (unlikely(desc->status))
		goto corrupted;

@@ -464,7 +465,7 @@ static int ct_write(struct intel_guc_ct *ct,
	CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n",
		 desc->head, desc->tail, desc->status);
	ctb->broken = true;
-	return -EPIPE;
+	return -EIO;
 }
/** @@ -507,6 +508,18 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status) return err; }
+#define GUC_CTB_TIMEOUT_MS 1500
it's 150% of core CTB timeout, maybe we should correlate them ?
+static inline bool ct_deadlocked(struct intel_guc_ct *ct)
+{
+	long timeout = GUC_CTB_TIMEOUT_MS;
+	bool ret = ktime_ms_delta(ktime_get(), ct->stall_time) > timeout;
+
+	if (unlikely(ret))
+		CT_ERROR(ct, "CT deadlocked\n");
nit: in commit message you said all these changes are to "prevent deadlock" so maybe this message should rather be:
	int delta = ktime_ms_delta(ktime_get(), ct->stall_time);

	CT_ERROR(ct, "Communication stalled for %dms\n", delta);
(note that CT_ERROR already adds "CT" prefix)
+	return ret;
+}
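The stall-timer check quoted above is small enough to model outside the kernel. A hedged sketch (plain millisecond timestamps instead of ktime_t; the names and sentinel are invented for the example) of what ct_deadlocked() computes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STALL_TIMEOUT_MS 1500		/* mirrors GUC_CTB_TIMEOUT_MS */
#define STALL_NONE UINT64_MAX		/* mirrors ct->stall_time == KTIME_MAX */

/* stall_time_ms records when the first blocked submission was seen;
 * a "deadlock" is declared once the stall outlasts the timeout. */
static bool ct_deadlocked_model(uint64_t now_ms, uint64_t stall_time_ms)
{
	if (stall_time_ms == STALL_NONE)	/* no stall in progress */
		return false;
	return now_ms - stall_time_ms > STALL_TIMEOUT_MS;
}
```

Resetting stall_time to the sentinel on every successful send is what makes the timeout measure continuous lack of forward progress rather than cumulative stalls.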
static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) { struct guc_ct_buffer_desc *desc = ctb->desc; @@ -518,6 +531,26 @@ static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) return space >= len_dw; }
+static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
+{
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+
+	lockdep_assert_held(&ct->ctbs.send.lock);
+
+	if (unlikely(!h2g_has_room(ctb, len_dw))) {
+		if (ct->stall_time == KTIME_MAX)
+			ct->stall_time = ktime_get();
+
+		if (unlikely(ct_deadlocked(ct)))
and maybe above message should be printed somewhere around here when we detect "deadlock" for the first time?
+			return -EIO;
+		else
+			return -EBUSY;
+	}
+
+	ct->stall_time = KTIME_MAX;
+	return 0;
+}
 static int ct_send_nb(struct intel_guc_ct *ct,
		       const u32 *action,
		       u32 len,
@@ -530,7 +563,7 @@ static int ct_send_nb(struct intel_guc_ct *ct,

	spin_lock_irqsave(&ctb->lock, spin_flags);

-	ret = h2g_has_room(ctb, len + 1);
+	ret = has_room_nb(ct, len + 1);
	if (unlikely(ret))
		goto out;
@@ -574,11 +607,19 @@ static int ct_send(struct intel_guc_ct *ct,
 retry:
	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
	if (unlikely(!h2g_has_room(ctb, len + 1))) {
+		if (ct->stall_time == KTIME_MAX)
+			ct->stall_time = ktime_get();
as this is a repeated pattern, maybe it should be moved to h2g_has_room or other wrapper ?
		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+
+		if (unlikely(ct_deadlocked(ct)))
+			return -EIO;
+
		cond_resched();
		goto retry;
	}
ct->stall_time = KTIME_MAX;
this one too
	fence = ct_get_next_fence(ct);
	request.fence = fence;
	request.status = 0;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index eb69263324ba..55ef7c52472f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -9,6 +9,7 @@ #include <linux/interrupt.h> #include <linux/spinlock.h> #include <linux/workqueue.h> +#include <linux/ktime.h>
#include "intel_guc_fwif.h"
@@ -68,6 +69,9 @@ struct intel_guc_ct { struct list_head incoming; /* incoming requests */ struct work_struct worker; /* handler for incoming requests */ } requests;
+	/** @stall_time: time of first time a CTB submission is stalled */
+	ktime_t stall_time;
};
void intel_guc_ct_init_early(struct intel_guc_ct *ct);
On Thu, Jun 24, 2021 at 07:37:01PM +0200, Michal Wajdeczko wrote:
On 24.06.2021 09:04, Matthew Brost wrote:
Implement a stall timer which fails H2G CTBs once a period of time with no forward progress is reached to prevent deadlock.
Also update ct_write to return -EIO rather than -EPIPE on a corrupted descriptor.
by doing so you will have the same error code for two different problems:
a) corrupted CTB descriptor (definitely unrecoverable) b) long stall in CTB processing (still recoverable)
As already discussed, both are treated exactly the same by the rest of the stack, so we return a single error code.
while caller is explicitly instructed to retry only on:
c) temporary stall in CTB processing (likely recoverable)
so why do we want to limit our diagnostics?
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 47 +++++++++++++++++++++-- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 4 ++ 2 files changed, 48 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index c9a65d05911f..27ec30b5ef47 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -319,6 +319,7 @@ int intel_guc_ct_enable(struct intel_guc_ct *ct)
		goto err_deregister;

	ct->enabled = true;
+	ct->stall_time = KTIME_MAX;

	return 0;
@@ -392,7 +393,7 @@ static int ct_write(struct intel_guc_ct *ct,
	unsigned int i;

	if (unlikely(ctb->broken))
-		return -EPIPE;
+		return -EIO;

	if (unlikely(desc->status))
		goto corrupted;

@@ -464,7 +465,7 @@ static int ct_write(struct intel_guc_ct *ct,
	CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u status=%#x\n",
		 desc->head, desc->tail, desc->status);
	ctb->broken = true;
-	return -EPIPE;
+	return -EIO;
 }
/** @@ -507,6 +508,18 @@ static int wait_for_ct_request_update(struct ct_request *req, u32 *status) return err; }
+#define GUC_CTB_TIMEOUT_MS 1500
it's 150% of core CTB timeout, maybe we should correlate them ?
Seems overkill.
+static inline bool ct_deadlocked(struct intel_guc_ct *ct)
+{
+	long timeout = GUC_CTB_TIMEOUT_MS;
+	bool ret = ktime_ms_delta(ktime_get(), ct->stall_time) > timeout;
+
+	if (unlikely(ret))
+		CT_ERROR(ct, "CT deadlocked\n");
nit: in commit message you said all these changes are to "prevent deadlock" so maybe this message should rather be:
	int delta = ktime_ms_delta(ktime_get(), ct->stall_time);

	CT_ERROR(ct, "Communication stalled for %dms\n", delta);
Sure.
(note that CT_ERROR already adds "CT" prefix)
+	return ret;
+}
static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) { struct guc_ct_buffer_desc *desc = ctb->desc; @@ -518,6 +531,26 @@ static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) return space >= len_dw; }
+static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
+{
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+
+	lockdep_assert_held(&ct->ctbs.send.lock);
+
+	if (unlikely(!h2g_has_room(ctb, len_dw))) {
+		if (ct->stall_time == KTIME_MAX)
+			ct->stall_time = ktime_get();
+
+		if (unlikely(ct_deadlocked(ct)))
and maybe above message should be printed somewhere around here when we detect "deadlock" for the first time?
Not sure I follow. The error message is in the correct place, if you ask me. We probably should also set the broken flag when the message is printed, though.
+			return -EIO;
+		else
+			return -EBUSY;
+	}
+
+	ct->stall_time = KTIME_MAX;
+	return 0;
+}
 static int ct_send_nb(struct intel_guc_ct *ct,
		       const u32 *action,
		       u32 len,
@@ -530,7 +563,7 @@ static int ct_send_nb(struct intel_guc_ct *ct,

	spin_lock_irqsave(&ctb->lock, spin_flags);

-	ret = h2g_has_room(ctb, len + 1);
+	ret = has_room_nb(ct, len + 1);
	if (unlikely(ret))
		goto out;
@@ -574,11 +607,19 @@ static int ct_send(struct intel_guc_ct *ct,
 retry:
	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
	if (unlikely(!h2g_has_room(ctb, len + 1))) {
+		if (ct->stall_time == KTIME_MAX)
+			ct->stall_time = ktime_get();
as this is a repeated pattern, maybe it should be moved to h2g_has_room or other wrapper ?
Once we check G2H credits the pattern changes, hence the reason this is outside of the wrapper. Also, IMO we sometimes go a little overboard with wrappers, and it makes the code harder to understand.
		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+
+		if (unlikely(ct_deadlocked(ct)))
+			return -EIO;
+
		cond_resched();
		goto retry;
	}
ct->stall_time = KTIME_MAX;
this one too
Same as above.
Matt
	fence = ct_get_next_fence(ct);
	request.fence = fence;
	request.status = 0;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index eb69263324ba..55ef7c52472f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -9,6 +9,7 @@ #include <linux/interrupt.h> #include <linux/spinlock.h> #include <linux/workqueue.h> +#include <linux/ktime.h>
#include "intel_guc_fwif.h"
@@ -68,6 +69,9 @@ struct intel_guc_ct { struct list_head incoming; /* incoming requests */ struct work_struct worker; /* handler for incoming requests */ } requests;
+	/** @stall_time: time of first time a CTB submission is stalled */
+	ktime_t stall_time;
};
void intel_guc_ct_init_early(struct intel_guc_ct *ct);
CTB writes are now in the path of command submission and should be optimized for performance. Rather than reading CTB descriptor values (e.g. head, tail), which could result in accesses across the PCIe bus, store local shadow copies and only read/write the descriptor values when absolutely necessary. Also store the current space in each channel locally.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 76 ++++++++++++++--------- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 6 ++ 2 files changed, 51 insertions(+), 31 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 27ec30b5ef47..1fd5c69358ef 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
 static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
 {
	ctb->broken = false;
+	ctb->tail = 0;
+	ctb->head = 0;
+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+
	guc_ct_buffer_desc_init(ctb->desc);
 }
@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct,
 {
	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
-	u32 tail = desc->tail;
+	u32 tail = ctb->tail;
	u32 size = ctb->size;
-	u32 used;
	u32 header;
	u32 hxg;
	u32 *cmds = ctb->cmds;

@@ -398,25 +400,14 @@ static int ct_write(struct intel_guc_ct *ct,
	if (unlikely(desc->status))
		goto corrupted;
-	if (unlikely((tail | head) >= size)) {
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely((desc->tail | desc->head) >= size)) {
		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-			 head, tail, size);
+			 desc->head, desc->tail, size);
		desc->status |= GUC_CTB_STATUS_OVERFLOW;
		goto corrupted;
	}
-
-	/*
-	 * tail == head condition indicates empty. GuC FW does not support
-	 * using up the entire buffer to get tail == head meaning full.
-	 */
-	if (tail < head)
-		used = (size - head) + tail;
-	else
-		used = tail - head;
-
-	/* make sure there is a space including extra dw for the fence */
-	if (unlikely(used + len + 1 >= size))
-		return -ENOSPC;
+#endif
	/*
	 * dw0: CT header (including fence)

@@ -457,7 +448,9 @@ static int ct_write(struct intel_guc_ct *ct,
	write_barrier(ct);

	/* now update descriptor */
+	ctb->tail = tail;
	WRITE_ONCE(desc->tail, tail);
+	ctb->space -= len + 1;
return 0;
@@ -473,7 +466,7 @@ static int ct_write(struct intel_guc_ct *ct,
  * @req: pointer to pending request
  * @status: placeholder for status
  *
- * For each sent request, Guc shall send bac CT response message.
+ * For each sent request, GuC shall send back CT response message.
  * Our message handler will update status of tracked request once
  * response message with given fence is received. Wait here and
  * check for valid response status value.

@@ -520,24 +513,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
	return ret;
 }
-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = READ_ONCE(desc->head);
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+	u32 head;
	u32 space;

-	space = CIRC_SPACE(desc->tail, head, ctb->size);
+	if (ctb->space >= len_dw)
+		return true;
+
+	head = READ_ONCE(ctb->desc->head);
+	if (unlikely(head > ctb->size)) {
+		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
+			 ctb->desc->head, ctb->desc->tail, ctb->size);
+		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		ctb->broken = true;
+		return false;
+	}
+
+	space = CIRC_SPACE(ctb->tail, head, ctb->size);
+	ctb->space = space;

	return space >= len_dw;
 }
 static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
-
	lockdep_assert_held(&ct->ctbs.send.lock);

-	if (unlikely(!h2g_has_room(ctb, len_dw))) {
+	if (unlikely(!h2g_has_room(ct, len_dw))) {
		if (ct->stall_time == KTIME_MAX)
			ct->stall_time = ktime_get();
@@ -606,10 +610,10 @@ static int ct_send(struct intel_guc_ct *ct,
	 */
 retry:
	spin_lock_irqsave(&ct->ctbs.send.lock, flags);
-	if (unlikely(!h2g_has_room(ctb, len + 1))) {
+	if (unlikely(!h2g_has_room(ct, len + 1))) {
		if (ct->stall_time == KTIME_MAX)
			ct->stall_time = ktime_get();
-		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+		spin_unlock_irqrestore(&ctb->lock, flags);

		if (unlikely(ct_deadlocked(ct)))
			return -EIO;

@@ -632,7 +636,7 @@ static int ct_send(struct intel_guc_ct *ct,
err = ct_write(ct, action, len, fence, 0);
-	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+	spin_unlock_irqrestore(&ctb->lock, flags);
	if (unlikely(err))
		goto unlink;

@@ -720,7 +724,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 {
	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
+	u32 head = ctb->head;
	u32 tail = desc->tail;
	u32 size = ctb->size;
	u32 *cmds = ctb->cmds;

@@ -735,12 +739,21 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
	if (unlikely(desc->status))
		goto corrupted;
-	if (unlikely((tail | head) >= size)) {
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
+	if (unlikely((desc->tail | desc->head) >= size)) {
		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
			 head, tail, size);
		desc->status |= GUC_CTB_STATUS_OVERFLOW;
		goto corrupted;
	}
+#else
+	if (unlikely((tail | ctb->head) >= size)) {
+		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
+			 head, tail, size);
+		desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		goto corrupted;
+	}
+#endif
	/* tail == head condition indicates empty */
	available = tail - head;

@@ -790,6 +803,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
	}
	CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);

+	ctb->head = head;
	/* now update descriptor */
	WRITE_ONCE(desc->head, head);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 55ef7c52472f..9924335e2ee6 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -33,6 +33,9 @@ struct intel_guc;
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer in dwords
+ * @head: local shadow copy of head in dwords
+ * @tail: local shadow copy of tail in dwords
+ * @space: local shadow copy of space in dwords
  * @broken: flag to indicate if descriptor data is broken
  */
 struct intel_guc_ct_buffer {
@@ -40,6 +43,9 @@ struct intel_guc_ct_buffer {
	struct guc_ct_buffer_desc *desc;
	u32 *cmds;
	u32 size;
+	u32 tail;
+	u32 head;
+	u32 space;
	bool broken;
 };
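The shadow-copy scheme in this patch can be illustrated with a small userspace model. This is only a sketch under assumptions (power-of-two buffer size, invented struct and function names; CIRC_SPACE reproduced from <linux/circ_buf.h>): it mirrors the fast path that avoids touching the descriptor and the slow path that refreshes the cached space from the GuC-owned head:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CIRC_SPACE as defined in <linux/circ_buf.h>; size must be a power of two. */
#define CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

/* Toy model: 'space' is maintained locally and only refreshed from the
 * (PCIe-resident) descriptor head when the cached value is too small. */
struct ctb_model {
	uint32_t tail;		/* local shadow of desc->tail */
	uint32_t space;		/* local shadow of free space */
	uint32_t size;
	uint32_t desc_head;	/* stands in for READ_ONCE(desc->head) */
};

static bool model_has_room(struct ctb_model *ctb, uint32_t len_dw)
{
	if (ctb->space >= len_dw)	/* fast path: no descriptor read */
		return true;

	/* slow path: re-read head and recompute the cached space */
	ctb->space = CIRC_SPACE(ctb->tail, ctb->desc_head, ctb->size);
	return ctb->space >= len_dw;
}

static void model_write(struct ctb_model *ctb, uint32_t len_dw)
{
	ctb->tail = (ctb->tail + len_dw) & (ctb->size - 1);
	ctb->space -= len_dw;	/* pay for the message locally */
}
```

In the common case a write only decrements the local space counter; the descriptor head is read across the bus only when the cached space falls short.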
On 24.06.2021 09:04, Matthew Brost wrote:
CTB writes are now in the path of command submission and should be optimized for performance. Rather than reading CTB descriptor values (e.g. head, tail), which could result in accesses across the PCIe bus, store local shadow copies and only read/write the descriptor values when absolutely necessary. Also store the current space in each channel locally.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 76 ++++++++++++++--------- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 6 ++ 2 files changed, 51 insertions(+), 31 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 27ec30b5ef47..1fd5c69358ef 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { ctb->broken = false;
+	ctb->tail = 0;
+	ctb->head = 0;
+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+
	guc_ct_buffer_desc_init(ctb->desc);
}
@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct, { struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
-	u32 tail = desc->tail;
+	u32 tail = ctb->tail;
	u32 size = ctb->size;
-	u32 used;
	u32 header;
	u32 hxg;
	u32 *cmds = ctb->cmds;
@@ -398,25 +400,14 @@ static int ct_write(struct intel_guc_ct *ct, if (unlikely(desc->status)) goto corrupted;
- if (unlikely((tail | head) >= size)) {
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
since we are caching tail, we may want to check if it's sill correct:
	tail = READ_ONCE(desc->tail);
	if (unlikely(tail != ctb->tail)) {
		CT_ERROR(ct, "Tail was modified %u != %u\n", tail, ctb->tail);
		desc->status |= GUC_CTB_STATUS_MISMATCH;
		goto corrupted;
	}
and since we own the tail then we can be more strict:
GEM_BUG_ON(tail > size);
and then finally just check GuC head:
head = READ_ONCE(desc->head); if (unlikely(head >= size)) { ...
+	if (unlikely((desc->tail | desc->head) >= size)) {
		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-			 head, tail, size);
+			 desc->head, desc->tail, size);
		desc->status |= GUC_CTB_STATUS_OVERFLOW;
		goto corrupted;
	}
-
-	/*
-	 * tail == head condition indicates empty. GuC FW does not support
-	 * using up the entire buffer to get tail == head meaning full.
-	 */
-	if (tail < head)
-		used = (size - head) + tail;
-	else
-		used = tail - head;
-
-	/* make sure there is a space including extra dw for the fence */
-	if (unlikely(used + len + 1 >= size))
-		return -ENOSPC;
+#endif
/* * dw0: CT header (including fence) @@ -457,7 +448,9 @@ static int ct_write(struct intel_guc_ct *ct, write_barrier(ct);
/* now update descriptor */
+	ctb->tail = tail;
	WRITE_ONCE(desc->tail, tail);
+	ctb->space -= len + 1;
this magic "1" is likely GUC_CTB_MSG_MIN_LEN, right ?
return 0;
@@ -473,7 +466,7 @@ static int ct_write(struct intel_guc_ct *ct,
  * @req: pointer to pending request
  * @status: placeholder for status
  *
- * For each sent request, Guc shall send bac CT response message.
+ * For each sent request, GuC shall send back CT response message.
  * Our message handler will update status of tracked request once
  * response message with given fence is received. Wait here and
  * check for valid response status value.
@@ -520,24 +513,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct) return ret; }
-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = READ_ONCE(desc->head);
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+	u32 head;
	u32 space;

-	space = CIRC_SPACE(desc->tail, head, ctb->size);
+	if (ctb->space >= len_dw)
+		return true;
+
+	head = READ_ONCE(ctb->desc->head);
+	if (unlikely(head > ctb->size)) {
+		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
+			 ctb->desc->head, ctb->desc->tail, ctb->size);
+		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		ctb->broken = true;
+		return false;
+	}
+
+	space = CIRC_SPACE(ctb->tail, head, ctb->size);
+	ctb->space = space;
maybe here we could mark stall_time ?
	if (space >= len_dw)
		return true;

	if (ct->stall_time == KTIME_MAX)
		ct->stall_time = ktime_get();

	return false;
return space >= len_dw;
btw, maybe to avoid filling CTB to the last dword, this should be
space > len_dw
note the earlier comment:
	/*
	 * tail == head condition indicates empty. GuC FW does not support
	 * using up the entire buffer to get tail == head meaning full.
	 */
}
 static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
-
	lockdep_assert_held(&ct->ctbs.send.lock);

-	if (unlikely(!h2g_has_room(ctb, len_dw))) {
+	if (unlikely(!h2g_has_room(ct, len_dw))) {
		if (ct->stall_time == KTIME_MAX)
			ct->stall_time = ktime_get();
@@ -606,10 +610,10 @@ static int ct_send(struct intel_guc_ct *ct, */ retry: spin_lock_irqsave(&ct->ctbs.send.lock, flags);
-	if (unlikely(!h2g_has_room(ctb, len + 1))) {
+	if (unlikely(!h2g_has_room(ct, len + 1))) {
		if (ct->stall_time == KTIME_MAX)
			ct->stall_time = ktime_get();
-		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+		spin_unlock_irqrestore(&ctb->lock, flags);
if (unlikely(ct_deadlocked(ct))) return -EIO;
@@ -632,7 +636,7 @@ static int ct_send(struct intel_guc_ct *ct,
err = ct_write(ct, action, len, fence, 0);
-	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+	spin_unlock_irqrestore(&ctb->lock, flags);
if (unlikely(err)) goto unlink;
@@ -720,7 +724,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) { struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv; struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
+	u32 head = ctb->head;
	u32 tail = desc->tail;
	u32 size = ctb->size;
	u32 *cmds = ctb->cmds;
@@ -735,12 +739,21 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) if (unlikely(desc->status)) goto corrupted;
- if (unlikely((tail | head) >= size)) {
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
as above we may want to check if our cached head was not modified
+	if (unlikely((desc->tail | desc->head) >= size)) {
		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
			 head, tail, size);
		desc->status |= GUC_CTB_STATUS_OVERFLOW;
		goto corrupted;
	}
+#else
+	if (unlikely((tail | ctb->head) >= size)) {
+		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
+			 head, tail, size);
+		desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		goto corrupted;
+	}
+#endif
/* tail == head condition indicates empty */ available = tail - head; @@ -790,6 +803,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) } CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
+	ctb->head = head;
	/* now update descriptor */
	WRITE_ONCE(desc->head, head);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index 55ef7c52472f..9924335e2ee6 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -33,6 +33,9 @@ struct intel_guc;
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer in dwords
+ * @head: local shadow copy of head in dwords
+ * @tail: local shadow copy of tail in dwords
+ * @space: local shadow copy of space in dwords
  * @broken: flag to indicate if descriptor data is broken
  */
struct intel_guc_ct_buffer { @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer { struct guc_ct_buffer_desc *desc; u32 *cmds; u32 size;
+	u32 tail;
+	u32 head;
+	u32 space;
in later patch this is changing to atomic_t maybe we can start with it ?
bool broken; };
On Fri, Jun 25, 2021 at 03:09:29PM +0200, Michal Wajdeczko wrote:
On 24.06.2021 09:04, Matthew Brost wrote:
CTB writes are now in the path of command submission and should be optimized for performance. Rather than reading CTB descriptor values (e.g. head, tail), which could result in accesses across the PCIe bus, store local shadow copies and only read/write the descriptor values when absolutely necessary. Also store the current space in each channel locally.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 76 ++++++++++++++--------- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 6 ++ 2 files changed, 51 insertions(+), 31 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 27ec30b5ef47..1fd5c69358ef 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { ctb->broken = false;
+	ctb->tail = 0;
+	ctb->head = 0;
+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+
	guc_ct_buffer_desc_init(ctb->desc);
}
@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct, { struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
-	u32 tail = desc->tail;
+	u32 tail = ctb->tail;
	u32 size = ctb->size;
-	u32 used;
	u32 header;
	u32 hxg;
	u32 *cmds = ctb->cmds;
@@ -398,25 +400,14 @@ static int ct_write(struct intel_guc_ct *ct, if (unlikely(desc->status)) goto corrupted;
- if (unlikely((tail | head) >= size)) {
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
since we are caching tail, we may want to check if it's sill correct:
	tail = READ_ONCE(desc->tail);
	if (unlikely(tail != ctb->tail)) {
		CT_ERROR(ct, "Tail was modified %u != %u\n", tail, ctb->tail);
		desc->status |= GUC_CTB_STATUS_MISMATCH;
		goto corrupted;
	}
and since we own the tail then we can be more strict:
GEM_BUG_ON(tail > size);
and then finally just check GuC head:
head = READ_ONCE(desc->head); if (unlikely(head >= size)) { ...
Sure, but still hidden behind CONFIG_DRM_I915_DEBUG_GUC, right?
+	if (unlikely((desc->tail | desc->head) >= size)) {
		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
-			 head, tail, size);
+			 desc->head, desc->tail, size);
		desc->status |= GUC_CTB_STATUS_OVERFLOW;
		goto corrupted;
	}
-
-	/*
-	 * tail == head condition indicates empty. GuC FW does not support
-	 * using up the entire buffer to get tail == head meaning full.
-	 */
-	if (tail < head)
-		used = (size - head) + tail;
-	else
-		used = tail - head;
-
-	/* make sure there is a space including extra dw for the fence */
-	if (unlikely(used + len + 1 >= size))
-		return -ENOSPC;
+#endif
/* * dw0: CT header (including fence) @@ -457,7 +448,9 @@ static int ct_write(struct intel_guc_ct *ct, write_barrier(ct);
/* now update descriptor */
+	ctb->tail = tail;
	WRITE_ONCE(desc->tail, tail);
+	ctb->space -= len + 1;
this magic "1" is likely GUC_CTB_MSG_MIN_LEN, right ?
Yes.
return 0;
@@ -473,7 +466,7 @@ static int ct_write(struct intel_guc_ct *ct,
  * @req: pointer to pending request
  * @status: placeholder for status
  *
- * For each sent request, Guc shall send bac CT response message.
+ * For each sent request, GuC shall send back CT response message.
  * Our message handler will update status of tracked request once
  * response message with given fence is received. Wait here and
  * check for valid response status value.
@@ -520,24 +513,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct) return ret; }
-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw)
+static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = READ_ONCE(desc->head);
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
+	u32 head;
	u32 space;

-	space = CIRC_SPACE(desc->tail, head, ctb->size);
+	if (ctb->space >= len_dw)
+		return true;
+
+	head = READ_ONCE(ctb->desc->head);
+	if (unlikely(head > ctb->size)) {
+		CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
+			 ctb->desc->head, ctb->desc->tail, ctb->size);
+		ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		ctb->broken = true;
+		return false;
+	}
+
+	space = CIRC_SPACE(ctb->tail, head, ctb->size);
+	ctb->space = space;
maybe here we could mark stall_time ?
	if (space >= len_dw)
		return true;

	if (ct->stall_time == KTIME_MAX)
		ct->stall_time = ktime_get();

	return false;
No. See my earlier comment [1] about why I'd rather leave this to the caller.
[1] https://patchwork.freedesktop.org/patch/440703/?series=91840&rev=1
return space >= len_dw;
btw, maybe to avoid filling CTB to the last dword, this should be
space > len_dw
CIRC_SPACE leaves an extra DW already.
note the earlier comment:
	/*
	 * tail == head condition indicates empty. GuC FW does not support
	 * using up the entire buffer to get tail == head meaning full.
	 */
Yes, again CIRC_SPACE uses this same algorithm.
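The point about CIRC_SPACE can be checked directly. The macros below are reproduced from <linux/circ_buf.h> (valid only for power-of-two sizes): CIRC_SPACE never reports more than size - 1 free entries, so one dword always stays unused and tail can never catch up to head on a full buffer, which is exactly the constraint the comment describes:

```c
#include <assert.h>
#include <stdint.h>

/* From <linux/circ_buf.h>; the mask trick requires a power-of-two size. */
#define CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

/* Helper so the property reads naturally: free space in a buffer where
 * the producer is at 'head' and the consumer at 'tail'. */
static uint32_t free_dwords(uint32_t head, uint32_t tail, uint32_t size)
{
	return CIRC_SPACE(head, tail, size);
}
```

An empty buffer of size 8 reports 7 free entries, and a buffer is considered full while one entry is still physically unused.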
Matt
}
 static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
 {
-	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
-
	lockdep_assert_held(&ct->ctbs.send.lock);

-	if (unlikely(!h2g_has_room(ctb, len_dw))) {
+	if (unlikely(!h2g_has_room(ct, len_dw))) {
		if (ct->stall_time == KTIME_MAX)
			ct->stall_time = ktime_get();
@@ -606,10 +610,10 @@ static int ct_send(struct intel_guc_ct *ct, */ retry: spin_lock_irqsave(&ct->ctbs.send.lock, flags);
-	if (unlikely(!h2g_has_room(ctb, len + 1))) {
+	if (unlikely(!h2g_has_room(ct, len + 1))) {
		if (ct->stall_time == KTIME_MAX)
			ct->stall_time = ktime_get();
-		spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+		spin_unlock_irqrestore(&ctb->lock, flags);
if (unlikely(ct_deadlocked(ct))) return -EIO;
@@ -632,7 +636,7 @@ static int ct_send(struct intel_guc_ct *ct,
err = ct_write(ct, action, len, fence, 0);
-	spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
+	spin_unlock_irqrestore(&ctb->lock, flags);
if (unlikely(err)) goto unlink;
@@ -720,7 +724,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) { struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv; struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
+	u32 head = ctb->head;
	u32 tail = desc->tail;
	u32 size = ctb->size;
	u32 *cmds = ctb->cmds;
@@ -735,12 +739,21 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) if (unlikely(desc->status)) goto corrupted;
- if (unlikely((tail | head) >= size)) {
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
as above we may want to check if our cached head was not modified
+	if (unlikely((desc->tail | desc->head) >= size)) {
		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
			 head, tail, size);
		desc->status |= GUC_CTB_STATUS_OVERFLOW;
		goto corrupted;
	}
+#else
+	if (unlikely((tail | ctb->head) >= size)) {
+		CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
+			 head, tail, size);
+		desc->status |= GUC_CTB_STATUS_OVERFLOW;
+		goto corrupted;
+	}
+#endif
/* tail == head condition indicates empty */ available = tail - head; @@ -790,6 +803,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) } CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
+	ctb->head = head;
	/* now update descriptor */
	WRITE_ONCE(desc->head, head);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index 55ef7c52472f..9924335e2ee6 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -33,6 +33,9 @@ struct intel_guc;
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer in dwords
+ * @head: local shadow copy of head in dwords
+ * @tail: local shadow copy of tail in dwords
+ * @space: local shadow copy of space in dwords
  * @broken: flag to indicate if descriptor data is broken
  */
struct intel_guc_ct_buffer { @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer { struct guc_ct_buffer_desc *desc; u32 *cmds; u32 size;
+	u32 tail;
+	u32 head;
+	u32 space;
in later patch this is changing to atomic_t maybe we can start with it ?
bool broken; };
On Fri, Jun 25, 2021 at 03:09:29PM +0200, Michal Wajdeczko wrote:
On 24.06.2021 09:04, Matthew Brost wrote:
CTB writes are now in the path of command submission and should be optimized for performance. Rather than reading CTB descriptor values (e.g. head, tail), which could result in accesses across the PCIe bus, store local shadow copies and only read/write the descriptor values when absolutely necessary. Also store the current space of each channel locally.
Missed two comments, addressed below.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 76 ++++++++++++++--------- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 6 ++ 2 files changed, 51 insertions(+), 31 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 27ec30b5ef47..1fd5c69358ef 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -130,6 +130,10 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc) static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb) { ctb->broken = false;
+	ctb->tail = 0;
+	ctb->head = 0;
+	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+
 	guc_ct_buffer_desc_init(ctb->desc);
}
@@ -383,10 +387,8 @@ static int ct_write(struct intel_guc_ct *ct, { struct intel_guc_ct_buffer *ctb = &ct->ctbs.send; struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
-	u32 tail = desc->tail;
+	u32 tail = ctb->tail;
 	u32 size = ctb->size;
-	u32 used;
 	u32 header;
 	u32 hxg;
 	u32 *cmds = ctb->cmds;
@@ -398,25 +400,14 @@ static int ct_write(struct intel_guc_ct *ct, if (unlikely(desc->status)) goto corrupted;
- if (unlikely((tail | head) >= size)) {
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
since we are caching tail, we may want to check if it's still correct:
tail = READ_ONCE(desc->tail); if (unlikely(tail != ctb->tail)) { CT_ERROR(ct, "Tail was modified %u != %u\n", tail, ctb->tail); desc->status |= GUC_CTB_STATUS_MISMATCH; goto corrupted; }
and since we own the tail then we can be more strict:
GEM_BUG_ON(tail > size);
and then finally just check GuC head:
head = READ_ONCE(desc->head); if (unlikely(head >= size)) { ...
- if (unlikely((desc->tail | desc->head) >= size)) { CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
head, tail, size);
desc->status |= GUC_CTB_STATUS_OVERFLOW; goto corrupted; }desc->head, desc->tail, size);
-	/*
-	 * tail == head condition indicates empty. GuC FW does not support
-	 * using up the entire buffer to get tail == head meaning full.
-	 */
-	if (tail < head)
-		used = (size - head) + tail;
-	else
-		used = tail - head;
-
-	/* make sure there is a space including extra dw for the fence */
-	if (unlikely(used + len + 1 >= size))
-		return -ENOSPC;
+#endif
/* * dw0: CT header (including fence) @@ -457,7 +448,9 @@ static int ct_write(struct intel_guc_ct *ct, write_barrier(ct);
/* now update descriptor */
+	ctb->tail = tail;
 	WRITE_ONCE(desc->tail, tail);
+	ctb->space -= len + 1;
this magic "1" is likely GUC_CTB_MSG_MIN_LEN, right ?
return 0;
@@ -473,7 +466,7 @@ static int ct_write(struct intel_guc_ct *ct,
- @req: pointer to pending request
- @status: placeholder for status
- * For each sent request, Guc shall send bac CT response message.
+ * For each sent request, GuC shall send back CT response message.
- Our message handler will update status of tracked request once
- response message with given fence is received. Wait here and
- check for valid response status value.
@@ -520,24 +513,35 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct) return ret; }
-static inline bool h2g_has_room(struct intel_guc_ct_buffer *ctb, u32 len_dw) +static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw) {
- struct guc_ct_buffer_desc *desc = ctb->desc;
- u32 head = READ_ONCE(desc->head);
- struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
- u32 head; u32 space;
- space = CIRC_SPACE(desc->tail, head, ctb->size);
- if (ctb->space >= len_dw)
return true;
- head = READ_ONCE(ctb->desc->head);
- if (unlikely(head > ctb->size)) {
CT_ERROR(ct, "Corrupted descriptor head=%u tail=%u size=%u\n",
ctb->desc->head, ctb->desc->tail, ctb->size);
ctb->desc->status |= GUC_CTB_STATUS_OVERFLOW;
ctb->broken = true;
return false;
- }
- space = CIRC_SPACE(ctb->tail, head, ctb->size);
- ctb->space = space;
maybe here we could mark stall_time ?
if (space >= len_dw) return true;
if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get(); return false;
return space >= len_dw;
btw, maybe to avoid filling CTB to the last dword, this should be
space > len_dw
note the earlier comment:
/*
- tail == head condition indicates empty. GuC FW does not support
- using up the entire buffer to get tail == head meaning full.
*/
}
static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw) {
struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
lockdep_assert_held(&ct->ctbs.send.lock);
if (unlikely(!h2g_has_room(ctb, len_dw))) {
- if (unlikely(!h2g_has_room(ct, len_dw))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get();
@@ -606,10 +610,10 @@ static int ct_send(struct intel_guc_ct *ct, */ retry: spin_lock_irqsave(&ct->ctbs.send.lock, flags);
- if (unlikely(!h2g_has_room(ctb, len + 1))) {
- if (unlikely(!h2g_has_room(ct, len + 1))) { if (ct->stall_time == KTIME_MAX) ct->stall_time = ktime_get();
spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
spin_unlock_irqrestore(&ctb->lock, flags);
if (unlikely(ct_deadlocked(ct))) return -EIO;
@@ -632,7 +636,7 @@ static int ct_send(struct intel_guc_ct *ct,
err = ct_write(ct, action, len, fence, 0);
- spin_unlock_irqrestore(&ct->ctbs.send.lock, flags);
spin_unlock_irqrestore(&ctb->lock, flags);
if (unlikely(err)) goto unlink;
@@ -720,7 +724,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) { struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv; struct guc_ct_buffer_desc *desc = ctb->desc;
-	u32 head = desc->head;
+	u32 head = ctb->head;
 	u32 tail = desc->tail;
 	u32 size = ctb->size;
 	u32 *cmds = ctb->cmds;
@@ -735,12 +739,21 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) if (unlikely(desc->status)) goto corrupted;
- if (unlikely((tail | head) >= size)) {
+#ifdef CONFIG_DRM_I915_DEBUG_GUC
as above we may want to check if our cached head was not modified
Sure.
- if (unlikely((desc->tail | desc->head) >= size)) { CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n", head, tail, size); desc->status |= GUC_CTB_STATUS_OVERFLOW; goto corrupted; }
+#else
- if (unlikely((tail | ctb->head) >= size)) {
CT_ERROR(ct, "Invalid offsets head=%u tail=%u (size=%u)\n",
head, tail, size);
desc->status |= GUC_CTB_STATUS_OVERFLOW;
goto corrupted;
- }
+#endif
/* tail == head condition indicates empty */ available = tail - head; @@ -790,6 +803,7 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) } CT_DEBUG(ct, "received %*ph\n", 4 * len, (*msg)->msg);
+	ctb->head = head;
 	/* now update descriptor */
 	WRITE_ONCE(desc->head, head);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index 55ef7c52472f..9924335e2ee6 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -33,6 +33,9 @@ struct intel_guc;
 * @desc: pointer to the buffer descriptor
 * @cmds: pointer to the commands buffer
 * @size: size of the commands buffer in dwords
+ * @head: local shadow copy of head in dwords
+ * @tail: local shadow copy of tail in dwords
+ * @space: local shadow copy of space in dwords
 * @broken: flag to indicate if descriptor data is broken
 */
struct intel_guc_ct_buffer { @@ -40,6 +43,9 @@ struct intel_guc_ct_buffer { struct guc_ct_buffer_desc *desc; u32 *cmds; u32 size;
+	u32 tail;
+	u32 head;
+	u32 space;
in later patch this is changing to atomic_t maybe we can start with it ?
I'd rather leave this as is. It doesn't make sense to use an atomic here, but the G2H credits patch makes it clear why we need an atomic.
Matt
bool broken; };
From: John Harrison John.C.Harrison@Intel.com
Add several module failure load inject points in the CT buffer creation code path.
Signed-off-by: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com Reviewed-by: Michal Wajdeczko michal.wajdeczko@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 1fd5c69358ef..8e0ed7d8feb3 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -175,6 +175,10 @@ static int ct_register_buffer(struct intel_guc_ct *ct, u32 type, { int err;
+ err = i915_inject_probe_error(guc_to_gt(ct_to_guc(ct))->i915, -ENXIO); + if (unlikely(err)) + return err; + err = guc_action_register_ct_buffer(ct_to_guc(ct), type, desc_addr, buff_addr, size); if (unlikely(err)) @@ -226,6 +230,10 @@ int intel_guc_ct_init(struct intel_guc_ct *ct) u32 *cmds; int err;
+ err = i915_inject_probe_error(guc_to_gt(guc)->i915, -ENXIO); + if (err) + return err; + GEM_BUG_ON(ct->vma);
blob_size = 2 * CTB_DESC_SIZE + CTB_H2G_BUFFER_SIZE + CTB_G2H_BUFFER_SIZE;
Add new GuC interface defines and structures while maintaining old ones in parallel.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h | 14 +++++++ drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 41 +++++++++++++++++++ 2 files changed, 55 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h index 2d6198e63ebe..57e18babdf4b 100644 --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h @@ -124,10 +124,24 @@ enum intel_guc_action { INTEL_GUC_ACTION_FORCE_LOG_BUFFER_FLUSH = 0x302, INTEL_GUC_ACTION_ENTER_S_STATE = 0x501, INTEL_GUC_ACTION_EXIT_S_STATE = 0x502, + INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE = 0x506, + INTEL_GUC_ACTION_SCHED_CONTEXT = 0x1000, + INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET = 0x1001, + INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE = 0x1002, + INTEL_GUC_ACTION_SCHED_ENGINE_MODE_SET = 0x1003, + INTEL_GUC_ACTION_SCHED_ENGINE_MODE_DONE = 0x1004, + INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY = 0x1005, + INTEL_GUC_ACTION_SET_CONTEXT_EXECUTION_QUANTUM = 0x1006, + INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT = 0x1007, + INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION = 0x1008, + INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION = 0x1009, INTEL_GUC_ACTION_SLPC_REQUEST = 0x3003, INTEL_GUC_ACTION_AUTHENTICATE_HUC = 0x4000, + INTEL_GUC_ACTION_REGISTER_CONTEXT = 0x4502, + INTEL_GUC_ACTION_DEREGISTER_CONTEXT = 0x4503, INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505, INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506, + INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600, INTEL_GUC_ACTION_LIMIT };
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h index 617ec601648d..28245a217a39 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h @@ -17,6 +17,9 @@ #include "abi/guc_communication_ctb_abi.h" #include "abi/guc_messages_abi.h"
+#define GUC_CONTEXT_DISABLE 0 +#define GUC_CONTEXT_ENABLE 1 + #define GUC_CLIENT_PRIORITY_KMD_HIGH 0 #define GUC_CLIENT_PRIORITY_HIGH 1 #define GUC_CLIENT_PRIORITY_KMD_NORMAL 2 @@ -26,6 +29,9 @@ #define GUC_MAX_STAGE_DESCRIPTORS 1024 #define GUC_INVALID_STAGE_ID GUC_MAX_STAGE_DESCRIPTORS
+#define GUC_MAX_LRC_DESCRIPTORS 65535 +#define GUC_INVALID_LRC_ID GUC_MAX_LRC_DESCRIPTORS + #define GUC_RENDER_ENGINE 0 #define GUC_VIDEO_ENGINE 1 #define GUC_BLITTER_ENGINE 2 @@ -237,6 +243,41 @@ struct guc_stage_desc { u64 desc_private; } __packed;
+#define CONTEXT_REGISTRATION_FLAG_KMD BIT(0) + +#define CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US 1000000 +#define CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US 500000 + +/* Preempt to idle on quantum expiry */ +#define CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE BIT(0) + +/* + * GuC Context registration descriptor. + * FIXME: This is only required to exist during context registration. + * The current 1:1 between guc_lrc_desc and LRCs for the lifetime of the LRC + * is not required. + */ +struct guc_lrc_desc { + u32 hw_context_desc; + u32 slpm_perf_mode_hint; /* SPLC v1 only */ + u32 slpm_freq_hint; + u32 engine_submit_mask; /* In logical space */ + u8 engine_class; + u8 reserved0[3]; + u32 priority; + u32 process_desc; + u32 wq_addr; + u32 wq_size; + u32 context_flags; /* CONTEXT_REGISTRATION_* */ + /* Time for one workload to execute. (in micro seconds) */ + u32 execution_quantum; + /* Time to wait for a preemption request to complete before issuing a + * reset. (in micro seconds). */ + u32 preemption_timeout; + u32 policy_flags; /* CONTEXT_POLICY_* */ + u32 reserved1[19]; +} __packed; + #define GUC_POWER_UNSPECIFIED 0 #define GUC_POWER_D0 1 #define GUC_POWER_D1 2
On 6/24/2021 00:04, Matthew Brost wrote:
Add new GuC interface defines and structures while maintaining old ones in parallel.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
I think there was some difference of opinion over whether these additions should be squashed into the specific patches that first use them. However, on the grounds that this is basically a patch-ordering style comment that doesn't change the final product, plus we need to get this stuff merged efficiently and not spend forever rebasing and refactoring...
Reviewed-by: John Harrison John.C.Harrison@Intel.com
.../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h | 14 +++++++ drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 41 +++++++++++++++++++ 2 files changed, 55 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h index 2d6198e63ebe..57e18babdf4b 100644 --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h @@ -124,10 +124,24 @@ enum intel_guc_action { INTEL_GUC_ACTION_FORCE_LOG_BUFFER_FLUSH = 0x302, INTEL_GUC_ACTION_ENTER_S_STATE = 0x501, INTEL_GUC_ACTION_EXIT_S_STATE = 0x502,
- INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE = 0x506,
- INTEL_GUC_ACTION_SCHED_CONTEXT = 0x1000,
- INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET = 0x1001,
- INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE = 0x1002,
- INTEL_GUC_ACTION_SCHED_ENGINE_MODE_SET = 0x1003,
- INTEL_GUC_ACTION_SCHED_ENGINE_MODE_DONE = 0x1004,
- INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY = 0x1005,
- INTEL_GUC_ACTION_SET_CONTEXT_EXECUTION_QUANTUM = 0x1006,
- INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT = 0x1007,
- INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION = 0x1008,
- INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION = 0x1009, INTEL_GUC_ACTION_SLPC_REQUEST = 0x3003, INTEL_GUC_ACTION_AUTHENTICATE_HUC = 0x4000,
- INTEL_GUC_ACTION_REGISTER_CONTEXT = 0x4502,
- INTEL_GUC_ACTION_DEREGISTER_CONTEXT = 0x4503, INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505, INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
- INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600, INTEL_GUC_ACTION_LIMIT };
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h index 617ec601648d..28245a217a39 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h @@ -17,6 +17,9 @@ #include "abi/guc_communication_ctb_abi.h" #include "abi/guc_messages_abi.h"
+#define GUC_CONTEXT_DISABLE 0 +#define GUC_CONTEXT_ENABLE 1
- #define GUC_CLIENT_PRIORITY_KMD_HIGH 0 #define GUC_CLIENT_PRIORITY_HIGH 1 #define GUC_CLIENT_PRIORITY_KMD_NORMAL 2
@@ -26,6 +29,9 @@ #define GUC_MAX_STAGE_DESCRIPTORS 1024 #define GUC_INVALID_STAGE_ID GUC_MAX_STAGE_DESCRIPTORS
+#define GUC_MAX_LRC_DESCRIPTORS 65535 +#define GUC_INVALID_LRC_ID GUC_MAX_LRC_DESCRIPTORS
- #define GUC_RENDER_ENGINE 0 #define GUC_VIDEO_ENGINE 1 #define GUC_BLITTER_ENGINE 2
@@ -237,6 +243,41 @@ struct guc_stage_desc { u64 desc_private; } __packed;
+#define CONTEXT_REGISTRATION_FLAG_KMD BIT(0)
+#define CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US 1000000 +#define CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US 500000
+/* Preempt to idle on quantum expiry */ +#define CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE BIT(0)
+/*
- GuC Context registration descriptor.
- FIXME: This is only required to exist during context registration.
- The current 1:1 between guc_lrc_desc and LRCs for the lifetime of the LRC
- is not required.
- */
+struct guc_lrc_desc {
- u32 hw_context_desc;
- u32 slpm_perf_mode_hint; /* SPLC v1 only */
- u32 slpm_freq_hint;
- u32 engine_submit_mask; /* In logical space */
- u8 engine_class;
- u8 reserved0[3];
- u32 priority;
- u32 process_desc;
- u32 wq_addr;
- u32 wq_size;
- u32 context_flags; /* CONTEXT_REGISTRATION_* */
- /* Time for one workload to execute. (in micro seconds) */
- u32 execution_quantum;
- /* Time to wait for a preemption request to complete before issuing a
* reset. (in micro seconds). */
- u32 preemption_timeout;
- u32 policy_flags; /* CONTEXT_POLICY_* */
- u32 reserved1[19];
+} __packed;
- #define GUC_POWER_UNSPECIFIED 0 #define GUC_POWER_D0 1 #define GUC_POWER_D1 2
On Tue, Jun 29, 2021 at 02:11:00PM -0700, John Harrison wrote:
On 6/24/2021 00:04, Matthew Brost wrote:
Add new GuC interface defines and structures while maintaining old ones in parallel.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
I think there was some difference of opinion over whether these additions should be squashed into the specific patches that first use them. However, on the grounds that this is basically a patch-ordering style comment that doesn't change the final product, plus we need to get this stuff merged efficiently and not spend forever rebasing and refactoring...
Agree this doesn't need to be squashed, as it doesn't break anything and this is all dead code anyway until we enable submission at the end of the series.
Matt
Reviewed-by: John Harrison John.C.Harrison@Intel.com
.../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h | 14 +++++++ drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 41 +++++++++++++++++++ 2 files changed, 55 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h index 2d6198e63ebe..57e18babdf4b 100644 --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h @@ -124,10 +124,24 @@ enum intel_guc_action { INTEL_GUC_ACTION_FORCE_LOG_BUFFER_FLUSH = 0x302, INTEL_GUC_ACTION_ENTER_S_STATE = 0x501, INTEL_GUC_ACTION_EXIT_S_STATE = 0x502,
- INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE = 0x506,
- INTEL_GUC_ACTION_SCHED_CONTEXT = 0x1000,
- INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET = 0x1001,
- INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE = 0x1002,
- INTEL_GUC_ACTION_SCHED_ENGINE_MODE_SET = 0x1003,
- INTEL_GUC_ACTION_SCHED_ENGINE_MODE_DONE = 0x1004,
- INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY = 0x1005,
- INTEL_GUC_ACTION_SET_CONTEXT_EXECUTION_QUANTUM = 0x1006,
- INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT = 0x1007,
- INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION = 0x1008,
- INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION = 0x1009, INTEL_GUC_ACTION_SLPC_REQUEST = 0x3003, INTEL_GUC_ACTION_AUTHENTICATE_HUC = 0x4000,
- INTEL_GUC_ACTION_REGISTER_CONTEXT = 0x4502,
- INTEL_GUC_ACTION_DEREGISTER_CONTEXT = 0x4503, INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505, INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
- INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600, INTEL_GUC_ACTION_LIMIT };
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h index 617ec601648d..28245a217a39 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h @@ -17,6 +17,9 @@ #include "abi/guc_communication_ctb_abi.h" #include "abi/guc_messages_abi.h" +#define GUC_CONTEXT_DISABLE 0 +#define GUC_CONTEXT_ENABLE 1
- #define GUC_CLIENT_PRIORITY_KMD_HIGH 0 #define GUC_CLIENT_PRIORITY_HIGH 1 #define GUC_CLIENT_PRIORITY_KMD_NORMAL 2
@@ -26,6 +29,9 @@ #define GUC_MAX_STAGE_DESCRIPTORS 1024 #define GUC_INVALID_STAGE_ID GUC_MAX_STAGE_DESCRIPTORS +#define GUC_MAX_LRC_DESCRIPTORS 65535 +#define GUC_INVALID_LRC_ID GUC_MAX_LRC_DESCRIPTORS
- #define GUC_RENDER_ENGINE 0 #define GUC_VIDEO_ENGINE 1 #define GUC_BLITTER_ENGINE 2
@@ -237,6 +243,41 @@ struct guc_stage_desc { u64 desc_private; } __packed; +#define CONTEXT_REGISTRATION_FLAG_KMD BIT(0)
+#define CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US 1000000 +#define CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US 500000
+/* Preempt to idle on quantum expiry */ +#define CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE BIT(0)
+/*
- GuC Context registration descriptor.
- FIXME: This is only required to exist during context registration.
- The current 1:1 between guc_lrc_desc and LRCs for the lifetime of the LRC
- is not required.
- */
+struct guc_lrc_desc {
- u32 hw_context_desc;
- u32 slpm_perf_mode_hint; /* SPLC v1 only */
- u32 slpm_freq_hint;
- u32 engine_submit_mask; /* In logical space */
- u8 engine_class;
- u8 reserved0[3];
- u32 priority;
- u32 process_desc;
- u32 wq_addr;
- u32 wq_size;
- u32 context_flags; /* CONTEXT_REGISTRATION_* */
- /* Time for one workload to execute. (in micro seconds) */
- u32 execution_quantum;
- /* Time to wait for a preemption request to complete before issuing a
* reset. (in micro seconds). */
- u32 preemption_timeout;
- u32 policy_flags; /* CONTEXT_POLICY_* */
- u32 reserved1[19];
+} __packed;
- #define GUC_POWER_UNSPECIFIED 0 #define GUC_POWER_D0 1 #define GUC_POWER_D1 2
Remove old GuC stage descriptor, add lrc descriptor which will be used by the new GuC interface implemented in this patch series.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc.h | 4 +- drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 65 ----------------- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 72 ++++++------------- 3 files changed, 25 insertions(+), 116 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 24b1df6ad4ae..b28fa54214f2 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -43,8 +43,8 @@ struct intel_guc { struct i915_vma *ads_vma; struct __guc_ads_blob *ads_blob;
- struct i915_vma *stage_desc_pool; - void *stage_desc_pool_vaddr; + struct i915_vma *lrc_desc_pool; + void *lrc_desc_pool_vaddr;
/* Control params for fw initialization */ u32 params[GUC_CTL_MAX_DWORDS]; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h index 28245a217a39..4e4edc368b77 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h @@ -26,9 +26,6 @@ #define GUC_CLIENT_PRIORITY_NORMAL 3 #define GUC_CLIENT_PRIORITY_NUM 4
-#define GUC_MAX_STAGE_DESCRIPTORS 1024 -#define GUC_INVALID_STAGE_ID GUC_MAX_STAGE_DESCRIPTORS - #define GUC_MAX_LRC_DESCRIPTORS 65535 #define GUC_INVALID_LRC_ID GUC_MAX_LRC_DESCRIPTORS
@@ -181,68 +178,6 @@ struct guc_process_desc { u32 reserved[30]; } __packed;
-/* engine id and context id is packed into guc_execlist_context.context_id*/ -#define GUC_ELC_CTXID_OFFSET 0 -#define GUC_ELC_ENGINE_OFFSET 29 - -/* The execlist context including software and HW information */ -struct guc_execlist_context { - u32 context_desc; - u32 context_id; - u32 ring_status; - u32 ring_lrca; - u32 ring_begin; - u32 ring_end; - u32 ring_next_free_location; - u32 ring_current_tail_pointer_value; - u8 engine_state_submit_value; - u8 engine_state_wait_value; - u16 pagefault_count; - u16 engine_submit_queue_count; -} __packed; - -/* - * This structure describes a stage set arranged for a particular communication - * between uKernel (GuC) and Driver (KMD). Technically, this is known as a - * "GuC Context descriptor" in the specs, but we use the term "stage descriptor" - * to avoid confusion with all the other things already named "context" in the - * driver. A static pool of these descriptors are stored inside a GEM object - * (stage_desc_pool) which is held for the entire lifetime of our interaction - * with the GuC, being allocated before the GuC is loaded with its firmware. - */ -struct guc_stage_desc { - u32 sched_common_area; - u32 stage_id; - u32 pas_id; - u8 engines_used; - u64 db_trigger_cpu; - u32 db_trigger_uk; - u64 db_trigger_phy; - u16 db_id; - - struct guc_execlist_context lrc[GUC_MAX_ENGINES_NUM]; - - u8 attribute; - - u32 priority; - - u32 wq_sampled_tail_offset; - u32 wq_total_submit_enqueues; - - u32 process_desc; - u32 wq_addr; - u32 wq_size; - - u32 engine_presence; - - u8 engine_suspended; - - u8 reserved0[3]; - u64 reserved1[1]; - - u64 desc_private; -} __packed; - #define CONTEXT_REGISTRATION_FLAG_KMD BIT(0)
#define CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US 1000000 diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index e9c237b18692..a366890fb840 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -65,57 +65,35 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb) return rb_entry(rb, struct i915_priolist, node); }
-static struct guc_stage_desc *__get_stage_desc(struct intel_guc *guc, u32 id) +/* Future patches will use this function */ +__attribute__ ((unused)) +static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index) { - struct guc_stage_desc *base = guc->stage_desc_pool_vaddr; + struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
- return &base[id]; -} - -static int guc_stage_desc_pool_create(struct intel_guc *guc) -{ - u32 size = PAGE_ALIGN(sizeof(struct guc_stage_desc) * - GUC_MAX_STAGE_DESCRIPTORS); + GEM_BUG_ON(index >= GUC_MAX_LRC_DESCRIPTORS);
- return intel_guc_allocate_and_map_vma(guc, size, &guc->stage_desc_pool, - &guc->stage_desc_pool_vaddr); + return &base[index]; }
-static void guc_stage_desc_pool_destroy(struct intel_guc *guc) -{ - i915_vma_unpin_and_release(&guc->stage_desc_pool, I915_VMA_RELEASE_MAP); -} - -/* - * Initialise/clear the stage descriptor shared with the GuC firmware. - * - * This descriptor tells the GuC where (in GGTT space) to find the important - * data structures related to work submission (process descriptor, write queue, - * etc). - */ -static void guc_stage_desc_init(struct intel_guc *guc) +static int guc_lrc_desc_pool_create(struct intel_guc *guc) { - struct guc_stage_desc *desc; - - /* we only use 1 stage desc, so hardcode it to 0 */ - desc = __get_stage_desc(guc, 0); - memset(desc, 0, sizeof(*desc)); - - desc->attribute = GUC_STAGE_DESC_ATTR_ACTIVE | - GUC_STAGE_DESC_ATTR_KERNEL; + u32 size; + int ret;
- desc->stage_id = 0; - desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL; + size = PAGE_ALIGN(sizeof(struct guc_lrc_desc) * + GUC_MAX_LRC_DESCRIPTORS); + ret = intel_guc_allocate_and_map_vma(guc, size, &guc->lrc_desc_pool, + (void **)&guc->lrc_desc_pool_vaddr); + if (ret) + return ret;
- desc->wq_size = GUC_WQ_SIZE; + return 0; }
-static void guc_stage_desc_fini(struct intel_guc *guc) +static void guc_lrc_desc_pool_destroy(struct intel_guc *guc) { - struct guc_stage_desc *desc; - - desc = __get_stage_desc(guc, 0); - memset(desc, 0, sizeof(*desc)); + i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP); }
static void guc_add_request(struct intel_guc *guc, struct i915_request *rq) @@ -410,26 +388,25 @@ int intel_guc_submission_init(struct intel_guc *guc) { int ret;
- if (guc->stage_desc_pool) + if (guc->lrc_desc_pool) return 0;
- ret = guc_stage_desc_pool_create(guc); + ret = guc_lrc_desc_pool_create(guc); if (ret) return ret; /* * Keep static analysers happy, let them know that we allocated the * vma after testing that it didn't exist earlier. */ - GEM_BUG_ON(!guc->stage_desc_pool); + GEM_BUG_ON(!guc->lrc_desc_pool);
return 0; }
void intel_guc_submission_fini(struct intel_guc *guc) { - if (guc->stage_desc_pool) { - guc_stage_desc_pool_destroy(guc); - } + if (guc->lrc_desc_pool) + guc_lrc_desc_pool_destroy(guc); }
static int guc_context_alloc(struct intel_context *ce) @@ -695,7 +672,6 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
void intel_guc_submission_enable(struct intel_guc *guc) { - guc_stage_desc_init(guc); }
void intel_guc_submission_disable(struct intel_guc *guc) @@ -705,8 +681,6 @@ void intel_guc_submission_disable(struct intel_guc *guc) GEM_BUG_ON(gt->awake); /* GT should be parked first */
/* Note: By the time we're here, GuC may have already been reset */ - - guc_stage_desc_fini(guc); }
static bool __guc_submission_selected(struct intel_guc *guc)
On 6/24/2021 00:04, Matthew Brost wrote:
Remove old GuC stage descriptor, add lrc descriptor which will be used by the new GuC interface implemented in this patch series.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
Reviewed-by: John Harrison John.C.Harrison@Intel.com
Add an lrc descriptor context lookup array which can resolve the intel_context from the lrc descriptor index. In addition to the lookup, it can determine whether the context for an lrc descriptor index is currently registered with the GuC by checking if an entry for that index is present. Future patches in the series will make use of this array.
Cc: John Harrison john.c.harrison@intel.com
Signed-off-by: Matthew Brost matthew.brost@intel.com
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  5 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +++++++++++++++++--
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index b28fa54214f2..2313d9fc087b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -6,6 +6,8 @@
 #ifndef _INTEL_GUC_H_
 #define _INTEL_GUC_H_

+#include "linux/xarray.h"
+
 #include "intel_uncore.h"
 #include "intel_guc_fw.h"
 #include "intel_guc_fwif.h"
@@ -46,6 +48,9 @@ struct intel_guc {
 	struct i915_vma *lrc_desc_pool;
 	void *lrc_desc_pool_vaddr;

+	/* guc_id to intel_context lookup */
+	struct xarray context_lookup;
+
 	/* Control params for fw initialization */
 	u32 params[GUC_CTL_MAX_DWORDS];

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a366890fb840..23a94a896a0b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -65,8 +65,6 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 	return rb_entry(rb, struct i915_priolist, node);
 }

-/* Future patches will use this function */
-__attribute__ ((unused))
 static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
 {
 	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
@@ -76,6 +74,15 @@ static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
 	return &base[index];
 }

+static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)
+{
+	struct intel_context *ce = xa_load(&guc->context_lookup, id);
+
+	GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS);
+
+	return ce;
+}
+
 static int guc_lrc_desc_pool_create(struct intel_guc *guc)
 {
 	u32 size;
@@ -96,6 +103,25 @@ static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
 	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
 }

+static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
+{
+	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+
+	memset(desc, 0, sizeof(*desc));
+	xa_erase_irq(&guc->context_lookup, id);
+}
+
+static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
+{
+	return __get_context(guc, id);
+}
+
+static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
+					   struct intel_context *ce)
+{
+	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
+}
+
 static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	/* Leaving stub as this function will be used in future patches */
@@ -400,6 +426,8 @@ int intel_guc_submission_init(struct intel_guc *guc)
 	 */
 	GEM_BUG_ON(!guc->lrc_desc_pool);

+	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
+
 	return 0;
 }
On 24.06.2021 09:04, Matthew Brost wrote:
Add lrc descriptor context lookup array which can resolve the intel_context from the lrc descriptor index. In addition to lookup, it can determine in the lrc descriptor context is currently registered with the GuC by checking if an entry for a descriptor index is present. Future patches in the series will make use of this array.
s/lrc/LRC
Cc: John Harrison john.c.harrison@intel.com
Signed-off-by: Matthew Brost matthew.brost@intel.com
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  5 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +++++++++++++++++--
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index b28fa54214f2..2313d9fc087b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -6,6 +6,8 @@
 #ifndef _INTEL_GUC_H_
 #define _INTEL_GUC_H_
+#include "linux/xarray.h"
#include <linux/xarray.h>
 #include "intel_uncore.h"
 #include "intel_guc_fw.h"
 #include "intel_guc_fwif.h"
@@ -46,6 +48,9 @@ struct intel_guc {
 	struct i915_vma *lrc_desc_pool;
 	void *lrc_desc_pool_vaddr;

+	/* guc_id to intel_context lookup */
+	struct xarray context_lookup;
+
 	/* Control params for fw initialization */
 	u32 params[GUC_CTL_MAX_DWORDS];
btw, IIRC there was idea to move most struct definitions to intel_guc_types.h, is this still a plan ?
On Fri, Jun 25, 2021 at 03:17:51PM +0200, Michal Wajdeczko wrote:
On 24.06.2021 09:04, Matthew Brost wrote:
Add lrc descriptor context lookup array which can resolve the intel_context from the lrc descriptor index. In addition to lookup, it can determine in the lrc descriptor context is currently registered with the GuC by checking if an entry for a descriptor index is present. Future patches in the series will make use of this array.
s/lrc/LRC
I guess? lrc and LRC are used interchangeably throughout the current code base.
Cc: John Harrison john.c.harrison@intel.com
Signed-off-by: Matthew Brost matthew.brost@intel.com
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  5 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +++++++++++++++++--
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index b28fa54214f2..2313d9fc087b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -6,6 +6,8 @@
 #ifndef _INTEL_GUC_H_
 #define _INTEL_GUC_H_
+#include "linux/xarray.h"
#include <linux/xarray.h>
Yep.
 #include "intel_uncore.h"
 #include "intel_guc_fw.h"
 #include "intel_guc_fwif.h"
@@ -46,6 +48,9 @@ struct intel_guc {
 	struct i915_vma *lrc_desc_pool;
 	void *lrc_desc_pool_vaddr;

+	/* guc_id to intel_context lookup */
+	struct xarray context_lookup;
+
 	/* Control params for fw initialization */
 	u32 params[GUC_CTL_MAX_DWORDS];
btw, IIRC there was idea to move most struct definitions to intel_guc_types.h, is this still a plan ?
I don't ever recall discussing this but we can certainly do this. For what it is worth we do introduce intel_guc_submission_types.h a bit later. I'll make a note about intel_guc_types.h though.
Matt
On 6/25/2021 10:26, Matthew Brost wrote:
On Fri, Jun 25, 2021 at 03:17:51PM +0200, Michal Wajdeczko wrote:
On 24.06.2021 09:04, Matthew Brost wrote:
Add lrc descriptor context lookup array which can resolve the intel_context from the lrc descriptor index. In addition to lookup, it can determine in the lrc descriptor context is currently registered with the GuC by checking if an entry for a descriptor index is present. Future patches in the series will make use of this array.
s/lrc/LRC
I guess? lrc and LRC are used interchangeably throughout the current code base.
It is an abbreviation so LRC is technically the correct version for a comment. The fact that other existing comments are incorrect is not a valid reason to perpetuate a mistake :). Might as well fix it if you are going to repost the patch anyway for any other reason, but I would not call it a blocking issue.
Also, 'can determine in the' should be 'can determine if the'. Again, not exactly a blocking issue but should be fixed.
Cc: John Harrison john.c.harrison@intel.com
Signed-off-by: Matthew Brost matthew.brost@intel.com

 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  5 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +++++++++++++++++--
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index b28fa54214f2..2313d9fc087b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -6,6 +6,8 @@
 #ifndef _INTEL_GUC_H_
 #define _INTEL_GUC_H_
+#include "linux/xarray.h"
#include <linux/xarray.h>
Yep.
 #include "intel_uncore.h"
 #include "intel_guc_fw.h"
 #include "intel_guc_fwif.h"
@@ -46,6 +48,9 @@ struct intel_guc {
 	struct i915_vma *lrc_desc_pool;
 	void *lrc_desc_pool_vaddr;

+	/* guc_id to intel_context lookup */
+	struct xarray context_lookup;
+
 	/* Control params for fw initialization */
 	u32 params[GUC_CTL_MAX_DWORDS];
btw, IIRC there was idea to move most struct definitions to intel_guc_types.h, is this still a plan ?
I don't ever recall discussing this but we can certainly do this. For what it is worth we do introduce intel_guc_submission_types.h a bit later. I'll make a note about intel_guc_types.h though.
Matt
Yeah, my only recollection was about the submission types header. Are there sufficient non-submission fields in the GuC structure to warrant a general GuC types header?
With the commit message tweaks and #include fix mentioned above, it looks good to me. Reviewed-by: John Harrison John.C.Harrison@Intel.com
Implement GuC submission tasklet for new interface. The new GuC interface uses H2G to submit contexts to the GuC. Since H2G use a single channel, a single tasklet submits is used for the submission path.
Also the per engine interrupt handler has been updated to disable the rescheduling of the physical engine tasklet, when using GuC scheduling, as the physical engine tasklet is no longer used.
In this patch the field guc_id has been added to intel_context but is not yet assigned. Patches later in the series will assign this value.
Cc: John Harrison john.c.harrison@intel.com
Signed-off-by: Matthew Brost matthew.brost@intel.com
---
 drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 231 +++++++++---------
 3 files changed, 127 insertions(+), 117 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index ed8c447a7346..bb6fef7eae52 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -136,6 +136,15 @@ struct intel_context {
 	struct intel_sseu sseu;

 	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
+
+	/* GuC scheduling state that does not require a lock. */
+	atomic_t guc_sched_state_no_lock;
+
+	/*
+	 * GuC lrc descriptor ID - Not assigned in this patch but future patches
+	 * in the series will.
+	 */
+	u16 guc_id;
 };

 #endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 2313d9fc087b..9ba8219475b2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -30,6 +30,10 @@ struct intel_guc {
 	struct intel_guc_log log;
 	struct intel_guc_ct ct;

+	/* Global engine used to submit requests to GuC */
+	struct i915_sched_engine *sched_engine;
+	struct i915_request *stalled_request;
+
 	/* intel_guc_recv interrupt related state */
 	spinlock_t irq_lock;
 	unsigned int msg_enabled_mask;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 23a94a896a0b..ee933efbf0ff 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -60,6 +60,31 @@

 #define GUC_REQUEST_SIZE 64 /* bytes */

+/*
+ * Below is a set of functions which control the GuC scheduling state which do
+ * not require a lock as all state transitions are mutually exclusive. i.e. It
+ * is not possible for the context pinning code and submission, for the same
+ * context, to be executing simultaneously. We still need an atomic as it is
+ * possible for some of the bits to changing at the same time though.
+ */
+#define SCHED_STATE_NO_LOCK_ENABLED	BIT(0)
+static inline bool context_enabled(struct intel_context *ce)
+{
+	return (atomic_read(&ce->guc_sched_state_no_lock) &
+		SCHED_STATE_NO_LOCK_ENABLED);
+}
+
+static inline void set_context_enabled(struct intel_context *ce)
+{
+	atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_enabled(struct intel_context *ce)
+{
+	atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED,
+		   &ce->guc_sched_state_no_lock);
+}
+
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
 	return rb_entry(rb, struct i915_priolist, node);
@@ -122,37 +147,29 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
 	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
 }

-static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
+static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
-	/* Leaving stub as this function will be used in future patches */
-}
+	int err;
+	struct intel_context *ce = rq->context;
+	u32 action[3];
+	int len = 0;
+	bool enabled = context_enabled(ce);

-/*
- * When we're doing submissions using regular execlists backend, writing to
- * ELSP from CPU side is enough to make sure that writes to ringbuffer pages
- * pinned in mappable aperture portion of GGTT are visible to command streamer.
- * Writes done by GuC on our behalf are not guaranteeing such ordering,
- * therefore, to ensure the flush, we're issuing a POSTING READ.
- */
-static void flush_ggtt_writes(struct i915_vma *vma)
-{
-	if (i915_vma_is_map_and_fenceable(vma))
-		intel_uncore_posting_read_fw(vma->vm->gt->uncore,
-					     GUC_STATUS);
-}
+	if (!enabled) {
+		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
+		action[len++] = ce->guc_id;
+		action[len++] = GUC_CONTEXT_ENABLE;
+	} else {
+		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
+		action[len++] = ce->guc_id;
+	}

-static void guc_submit(struct intel_engine_cs *engine,
-		       struct i915_request **out,
-		       struct i915_request **end)
-{
-	struct intel_guc *guc = &engine->gt->uc.guc;
+	err = intel_guc_send_nb(guc, action, len);

-	do {
-		struct i915_request *rq = *out++;
+	if (!enabled && !err)
+		set_context_enabled(ce);

-		flush_ggtt_writes(rq->ring->vma);
-		guc_add_request(guc, rq);
-	} while (out != end);
+	return err;
 }
 static inline int rq_prio(const struct i915_request *rq)
@@ -160,125 +177,88 @@ static inline int rq_prio(const struct i915_request *rq)
 	return rq->sched.attr.priority;
 }

-static struct i915_request *schedule_in(struct i915_request *rq, int idx)
+static int guc_dequeue_one_context(struct intel_guc *guc)
 {
-	trace_i915_request_in(rq, idx);
-
-	/*
-	 * Currently we are not tracking the rq->context being inflight
-	 * (ce->inflight = rq->engine). It is only used by the execlists
-	 * backend at the moment, a similar counting strategy would be
-	 * required if we generalise the inflight tracking.
-	 */
-
-	__intel_gt_pm_get(rq->engine->gt);
-	return i915_request_get(rq);
-}
-
-static void schedule_out(struct i915_request *rq)
-{
-	trace_i915_request_out(rq);
-
-	intel_gt_pm_put_async(rq->engine->gt);
-	i915_request_put(rq);
-}
-
-static void __guc_dequeue(struct intel_engine_cs *engine)
-{
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct i915_sched_engine * const sched_engine = engine->sched_engine;
-	struct i915_request **first = execlists->inflight;
-	struct i915_request ** const last_port = first + execlists->port_mask;
-	struct i915_request *last = first[0];
-	struct i915_request **port;
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	struct i915_request *last = NULL;
 	bool submit = false;
 	struct rb_node *rb;
+	int ret;

 	lockdep_assert_held(&sched_engine->lock);

-	if (last) {
-		if (*++first)
-			return;
-
-		last = NULL;
+	if (guc->stalled_request) {
+		submit = true;
+		last = guc->stalled_request;
+		goto resubmit;
 	}

-	/*
-	 * We write directly into the execlists->inflight queue and don't use
-	 * the execlists->pending queue, as we don't have a distinct switch
-	 * event.
-	 */
-	port = first;
 	while ((rb = rb_first_cached(&sched_engine->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 		struct i915_request *rq, *rn;

 		priolist_for_each_request_consume(rq, rn, p) {
-			if (last && rq->context != last->context) {
-				if (port == last_port)
-					goto done;
-
-				*port = schedule_in(last,
-						    port - execlists->inflight);
-				port++;
-			}
+			if (last && rq->context != last->context)
+				goto done;

 			list_del_init(&rq->sched.link);
+
 			__i915_request_submit(rq);
-			submit = true;
+
+			trace_i915_request_in(rq, 0);
 			last = rq;
+			submit = true;
 		}

 		rb_erase_cached(&p->node, &sched_engine->queue);
 		i915_priolist_free(p);
 	}
 done:
-	sched_engine->queue_priority_hint =
-		rb ? to_priolist(rb)->priority : INT_MIN;
 	if (submit) {
-		*port = schedule_in(last, port - execlists->inflight);
-		*++port = NULL;
-		guc_submit(engine, first, port);
+		last->context->lrc_reg_state[CTX_RING_TAIL] =
+			intel_ring_set_tail(last->ring, last->tail);
+resubmit:
+		/*
+		 * We only check for -EBUSY here even though it is possible for
+		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
+		 * died and a full GPU needs to be done. The hangcheck will
+		 * eventually detect that the GuC has died and trigger this
+		 * reset so no need to handle -EDEADLK here.
+		 */
+		ret = guc_add_request(guc, last);
+		if (ret == -EBUSY) {
+			tasklet_schedule(&sched_engine->tasklet);
+			guc->stalled_request = last;
+			return false;
+		}
 	}
-	execlists->active = execlists->inflight;
+
+	guc->stalled_request = NULL;
+	return submit;
 }
 static void guc_submission_tasklet(struct tasklet_struct *t)
 {
 	struct i915_sched_engine *sched_engine =
 		from_tasklet(sched_engine, t, tasklet);
-	struct intel_engine_cs * const engine = sched_engine->private_data;
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct i915_request **port, *rq;
 	unsigned long flags;
+	bool loop;

-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-
-	for (port = execlists->inflight; (rq = *port); port++) {
-		if (!i915_request_completed(rq))
-			break;
-
-		schedule_out(rq);
-	}
-	if (port != execlists->inflight) {
-		int idx = port - execlists->inflight;
-		int rem = ARRAY_SIZE(execlists->inflight) - idx;
-		memmove(execlists->inflight, port, rem * sizeof(*port));
-	}
+	spin_lock_irqsave(&sched_engine->lock, flags);

-	__guc_dequeue(engine);
+	do {
+		loop = guc_dequeue_one_context(sched_engine->private_data);
+	} while (loop);

-	i915_sched_engine_reset_on_empty(engine->sched_engine);
+	i915_sched_engine_reset_on_empty(sched_engine);

-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }

 static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
 {
-	if (iir & GT_RENDER_USER_INTERRUPT) {
+	if (iir & GT_RENDER_USER_INTERRUPT)
 		intel_engine_signal_breadcrumbs(engine);
-		tasklet_hi_schedule(&engine->sched_engine->tasklet);
-	}
 }
 static void guc_reset_prepare(struct intel_engine_cs *engine)
@@ -349,6 +329,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	struct rb_node *rb;
 	unsigned long flags;

+	/* Can be called during boot if GuC fails to load */
+	if (!engine->gt)
+		return;
+
 	ENGINE_TRACE(engine, "\n");

 	/*
@@ -433,8 +417,11 @@ int intel_guc_submission_init(struct intel_guc *guc)

 void intel_guc_submission_fini(struct intel_guc *guc)
 {
-	if (guc->lrc_desc_pool)
-		guc_lrc_desc_pool_destroy(guc);
+	if (!guc->lrc_desc_pool)
+		return;
+
+	guc_lrc_desc_pool_destroy(guc);
+	i915_sched_engine_put(guc->sched_engine);
 }

 static int guc_context_alloc(struct intel_context *ce)
@@ -499,32 +486,32 @@ static int guc_request_alloc(struct i915_request *request)
 	return 0;
 }
-static inline void queue_request(struct intel_engine_cs *engine,
+static inline void queue_request(struct i915_sched_engine *sched_engine,
 				 struct i915_request *rq,
 				 int prio)
 {
 	GEM_BUG_ON(!list_empty(&rq->sched.link));
 	list_add_tail(&rq->sched.link,
-		      i915_sched_lookup_priolist(engine->sched_engine, prio));
+		      i915_sched_lookup_priolist(sched_engine, prio));
 	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 }

 static void guc_submit_request(struct i915_request *rq)
 {
-	struct intel_engine_cs *engine = rq->engine;
+	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
 	unsigned long flags;

 	/* Will be called from irq-context when using foreign fences. */
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
+	spin_lock_irqsave(&sched_engine->lock, flags);

-	queue_request(engine, rq, rq_prio(rq));
+	queue_request(sched_engine, rq, rq_prio(rq));

-	GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
+	GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
 	GEM_BUG_ON(list_empty(&rq->sched.link));

-	tasklet_hi_schedule(&engine->sched_engine->tasklet);
+	tasklet_hi_schedule(&sched_engine->tasklet);

-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }

 static void sanitize_hwsp(struct intel_engine_cs *engine)
@@ -602,8 +589,6 @@ static void guc_release(struct intel_engine_cs *engine)
 {
 	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */

-	tasklet_kill(&engine->sched_engine->tasklet);
-
 	intel_engine_cleanup_common(engine);
 	lrc_fini_wa_ctx(engine);
 }
@@ -674,6 +659,7 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine)
 int intel_guc_submission_setup(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
+	struct intel_guc *guc = &engine->gt->uc.guc;

 	/*
 	 * The setup relies on several assumptions (e.g. irqs always enabled)
@@ -681,7 +667,18 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 	 */
 	GEM_BUG_ON(GRAPHICS_VER(i915) < 11);

-	tasklet_setup(&engine->sched_engine->tasklet, guc_submission_tasklet);
+	if (!guc->sched_engine) {
+		guc->sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
+		if (!guc->sched_engine)
+			return -ENOMEM;
+
+		guc->sched_engine->schedule = i915_schedule;
+		guc->sched_engine->private_data = guc;
+		tasklet_setup(&guc->sched_engine->tasklet,
+			      guc_submission_tasklet);
+	}
+	i915_sched_engine_put(engine->sched_engine);
+	engine->sched_engine = i915_sched_engine_get(guc->sched_engine);

 	guc_default_vfuncs(engine);
 	guc_default_irqs(engine);
On 6/24/2021 00:04, Matthew Brost wrote:
Implement GuC submission tasklet for new interface. The new GuC interface uses H2G to submit contexts to the GuC. Since H2G use a single channel, a single tasklet submits is used for the submission path.
Re-word? 'a single tasklet submits is used...' doesn't make sense.
Also the per engine interrupt handler has been updated to disable the rescheduling of the physical engine tasklet, when using GuC scheduling, as the physical engine tasklet is no longer used.
In this patch the field, guc_id, has been added to intel_context and is not assigned. Patches later in the series will assign this value.
Cc: John Harrison john.c.harrison@intel.com
Signed-off-by: Matthew Brost matthew.brost@intel.com

 drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 231 +++++++++---------
 3 files changed, 127 insertions(+), 117 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index ed8c447a7346..bb6fef7eae52 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -136,6 +136,15 @@ struct intel_context {
 	struct intel_sseu sseu;
 	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
+
+	/* GuC scheduling state that does not require a lock. */
Maybe 'GuC scheduling state flags that do not require a lock'? Otherwise it just looks like a counter or something.
+	atomic_t guc_sched_state_no_lock;
+
+	/*
+	 * GuC lrc descriptor ID - Not assigned in this patch but future patches
Not a blocker but s/lrc/LRC/ would keep Michal happy ;). Although presumably this comment is at least being amended by later patches in the series.
+	 * in the series will.
+	 */
+	u16 guc_id;
 };
#endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 2313d9fc087b..9ba8219475b2 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -30,6 +30,10 @@ struct intel_guc { struct intel_guc_log log; struct intel_guc_ct ct;
- /* Global engine used to submit requests to GuC */
- struct i915_sched_engine *sched_engine;
- struct i915_request *stalled_request;
- /* intel_guc_recv interrupt related state */ spinlock_t irq_lock; unsigned int msg_enabled_mask;
last->context->lrc_reg_state[CTX_RING_TAIL] =
intel_ring_set_tail(last->ring, last->tail);
+resubmit:
/*
* We only check for -EBUSY here even though it is possible for
* -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
* died and a full GPU needs to be done. The hangcheck will
'full GPU reset'. Although I believe strictly speaking, it is a 'full GT reset'. There are other bits of the GPU beyond the GT.
* eventually detect that the GuC has died and trigger this
* reset so no need to handle -EDEADLK here.
*/
ret = guc_add_request(guc, last);
if (ret == -EBUSY) {
tasklet_schedule(&sched_engine->tasklet);
guc->stalled_request = last;
return false;
}}
- execlists->active = execlists->inflight;
guc->stalled_request = NULL;
return submit; }
static void guc_submission_tasklet(struct tasklet_struct *t) { struct i915_sched_engine *sched_engine = from_tasklet(sched_engine, t, tasklet);
- struct intel_engine_cs * const engine = sched_engine->private_data;
- struct intel_engine_execlists * const execlists = &engine->execlists;
- struct i915_request **port, *rq; unsigned long flags;
- bool loop;
- spin_lock_irqsave(&engine->sched_engine->lock, flags);
- for (port = execlists->inflight; (rq = *port); port++) {
if (!i915_request_completed(rq))
break;
schedule_out(rq);
- }
- if (port != execlists->inflight) {
int idx = port - execlists->inflight;
int rem = ARRAY_SIZE(execlists->inflight) - idx;
memmove(execlists->inflight, port, rem * sizeof(*port));
- }
- spin_lock_irqsave(&sched_engine->lock, flags);
- __guc_dequeue(engine);
- do {
loop = guc_dequeue_one_context(sched_engine->private_data);
- } while (loop);
- i915_sched_engine_reset_on_empty(engine->sched_engine);
- i915_sched_engine_reset_on_empty(sched_engine);
- spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
- spin_unlock_irqrestore(&sched_engine->lock, flags); }
Not a blocker but it has to be said that it would be much easier to remove all of the above if the delete was split into a separate patch. Having two completely disparate threads of code interwoven in the diff makes it much harder to see what the new version is doing!
static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir) {
- if (iir & GT_RENDER_USER_INTERRUPT) {
- if (iir & GT_RENDER_USER_INTERRUPT) intel_engine_signal_breadcrumbs(engine);
tasklet_hi_schedule(&engine->sched_engine->tasklet);
} }
static void guc_reset_prepare(struct intel_engine_cs *engine)
@@ -349,6 +329,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine) struct rb_node *rb; unsigned long flags;
/* Can be called during boot if GuC fails to load */
if (!engine->gt)
return;
ENGINE_TRACE(engine, "\n");
/*
@@ -433,8 +417,11 @@ int intel_guc_submission_init(struct intel_guc *guc)
void intel_guc_submission_fini(struct intel_guc *guc) {
- if (guc->lrc_desc_pool)
guc_lrc_desc_pool_destroy(guc);
if (!guc->lrc_desc_pool)
return;
guc_lrc_desc_pool_destroy(guc);
i915_sched_engine_put(guc->sched_engine); }
static int guc_context_alloc(struct intel_context *ce)
@@ -499,32 +486,32 @@ static int guc_request_alloc(struct i915_request *request) return 0; }
-static inline void queue_request(struct intel_engine_cs *engine, +static inline void queue_request(struct i915_sched_engine *sched_engine, struct i915_request *rq, int prio) { GEM_BUG_ON(!list_empty(&rq->sched.link)); list_add_tail(&rq->sched.link,
i915_sched_lookup_priolist(engine->sched_engine, prio));
i915_sched_lookup_priolist(sched_engine, prio));
set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); }
static void guc_submit_request(struct i915_request *rq) {
- struct intel_engine_cs *engine = rq->engine;
struct i915_sched_engine *sched_engine = rq->engine->sched_engine; unsigned long flags;
/* Will be called from irq-context when using foreign fences. */
- spin_lock_irqsave(&engine->sched_engine->lock, flags);
- spin_lock_irqsave(&sched_engine->lock, flags);
- queue_request(engine, rq, rq_prio(rq));
- queue_request(sched_engine, rq, rq_prio(rq));
- GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
- GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine)); GEM_BUG_ON(list_empty(&rq->sched.link));
- tasklet_hi_schedule(&engine->sched_engine->tasklet);
- tasklet_hi_schedule(&sched_engine->tasklet);
- spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
spin_unlock_irqrestore(&sched_engine->lock, flags); }
static void sanitize_hwsp(struct intel_engine_cs *engine)
@@ -602,8 +589,6 @@ static void guc_release(struct intel_engine_cs *engine) { engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
- tasklet_kill(&engine->sched_engine->tasklet);
- intel_engine_cleanup_common(engine); lrc_fini_wa_ctx(engine); }
@@ -674,6 +659,7 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine) int intel_guc_submission_setup(struct intel_engine_cs *engine) { struct drm_i915_private *i915 = engine->i915;
struct intel_guc *guc = &engine->gt->uc.guc;
/*
- The setup relies on several assumptions (e.g. irqs always enabled)
@@ -681,7 +667,18 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine) */ GEM_BUG_ON(GRAPHICS_VER(i915) < 11);
- tasklet_setup(&engine->sched_engine->tasklet, guc_submission_tasklet);
- if (!guc->sched_engine) {
guc->sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
Does the re-work of the sched_engine create/destroy happen later in this patch series? Wasn't there issues with the wrong destroy function being called in certain situations? Or do those issues (and fixes) only come in with the virtual engine support?
John.
if (!guc->sched_engine)
return -ENOMEM;
guc->sched_engine->schedule = i915_schedule;
guc->sched_engine->private_data = guc;
tasklet_setup(&guc->sched_engine->tasklet,
guc_submission_tasklet);
}
i915_sched_engine_put(engine->sched_engine);
engine->sched_engine = i915_sched_engine_get(guc->sched_engine);
guc_default_vfuncs(engine); guc_default_irqs(engine);
On Tue, Jun 29, 2021 at 03:04:56PM -0700, John Harrison wrote:
On 6/24/2021 00:04, Matthew Brost wrote:
Implement GuC submission tasklet for new interface. The new GuC interface uses H2G to submit contexts to the GuC. Since H2G use a single channel, a single tasklet submits is used for the submission path.
Re-word? 'a single tasklet submits is used...' doesn't make sense.
Will do.
Also the per engine interrupt handler has been updated to disable the rescheduling of the physical engine tasklet, when using GuC scheduling, as the physical engine tasklet is no longer used.
In this patch the field, guc_id, has been added to intel_context and is not assigned. Patches later in the series will assign this value.
Cc: John Harrisonjohn.c.harrison@intel.com Signed-off-by: Matthew Brostmatthew.brost@intel.com
drivers/gpu/drm/i915/gt/intel_context_types.h | 9 + drivers/gpu/drm/i915/gt/uc/intel_guc.h | 4 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 231 +++++++++--------- 3 files changed, 127 insertions(+), 117 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index ed8c447a7346..bb6fef7eae52 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -136,6 +136,15 @@ struct intel_context { struct intel_sseu sseu; u8 wa_bb_page; /* if set, page num reserved for context workarounds */
- /* GuC scheduling state that does not require a lock. */
Maybe 'GuC scheduling state flags that do not require a lock'? Otherwise it just looks like a counter or something.
Sure.
- atomic_t guc_sched_state_no_lock;
- /*
* GuC lrc descriptor ID - Not assigned in this patch but future patches
Not a blocker but s/lrc/LRC/ would keep Michal happy ;). Although presumably this comment is at least being amended by later patches in the series.
Will fix.
* in the series will.
*/
- u16 guc_id; }; #endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 2313d9fc087b..9ba8219475b2 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -30,6 +30,10 @@ struct intel_guc { struct intel_guc_log log; struct intel_guc_ct ct;
- /* Global engine used to submit requests to GuC */
- struct i915_sched_engine *sched_engine;
- struct i915_request *stalled_request;
- /* intel_guc_recv interrupt related state */ spinlock_t irq_lock; unsigned int msg_enabled_mask;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 23a94a896a0b..ee933efbf0ff 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -60,6 +60,31 @@ #define GUC_REQUEST_SIZE 64 /* bytes */ +/*
- Below is a set of functions which control the GuC scheduling state which do
- not require a lock as all state transitions are mutually exclusive. i.e. It
- is not possible for the context pinning code and submission, for the same
- context, to be executing simultaneously. We still need an atomic as it is
- possible for some of the bits to changing at the same time though.
- */
+#define SCHED_STATE_NO_LOCK_ENABLED BIT(0) +static inline bool context_enabled(struct intel_context *ce) +{
- return (atomic_read(&ce->guc_sched_state_no_lock) &
SCHED_STATE_NO_LOCK_ENABLED);
+}
+static inline void set_context_enabled(struct intel_context *ce) +{
- atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock);
+}
+static inline void clr_context_enabled(struct intel_context *ce) +{
- atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED,
&ce->guc_sched_state_no_lock);
+}
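As an aside for anyone following along: the no-lock helpers above are just lock-free bit operations on a single atomic word. A self-contained userspace sketch of the same scheme using C11 stdatomic (the ctx_model struct and the harness are illustrative stand-ins, not the i915 types):

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SCHED_STATE_NO_LOCK_ENABLED (1u << 0)

/* Stand-in for the guc_sched_state_no_lock field of intel_context. */
struct ctx_model {
	atomic_uint sched_state_no_lock;
};

static bool context_enabled(struct ctx_model *ce)
{
	/* Plain atomic load; no lock needed to observe the bit. */
	return atomic_load(&ce->sched_state_no_lock) &
	       SCHED_STATE_NO_LOCK_ENABLED;
}

static void set_context_enabled(struct ctx_model *ce)
{
	/* Atomic read-modify-write, mirrors atomic_or() in the patch. */
	atomic_fetch_or(&ce->sched_state_no_lock,
			SCHED_STATE_NO_LOCK_ENABLED);
}

static void clr_context_enabled(struct ctx_model *ce)
{
	/* Mirrors atomic_and() with the inverted mask. */
	atomic_fetch_and(&ce->sched_state_no_lock,
			 ~SCHED_STATE_NO_LOCK_ENABLED);
}
```

The atomic is still required because other state bits may be added to the same word later and flipped concurrently, even though the transitions of any single bit are mutually exclusive.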
- static inline struct i915_priolist *to_priolist(struct rb_node *rb) { return rb_entry(rb, struct i915_priolist, node);
@@ -122,37 +147,29 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id, xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC); } -static void guc_add_request(struct intel_guc *guc, struct i915_request *rq) +static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) {
- /* Leaving stub as this function will be used in future patches */
-}
- int err;
- struct intel_context *ce = rq->context;
- u32 action[3];
- int len = 0;
- bool enabled = context_enabled(ce);
-/*
- When we're doing submissions using regular execlists backend, writing to
- ELSP from CPU side is enough to make sure that writes to ringbuffer pages
- pinned in mappable aperture portion of GGTT are visible to command streamer.
- Writes done by GuC on our behalf are not guaranteeing such ordering,
- therefore, to ensure the flush, we're issuing a POSTING READ.
- */
-static void flush_ggtt_writes(struct i915_vma *vma) -{
- if (i915_vma_is_map_and_fenceable(vma))
intel_uncore_posting_read_fw(vma->vm->gt->uncore,
GUC_STATUS);
-}
- if (!enabled) {
action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
action[len++] = ce->guc_id;
action[len++] = GUC_CONTEXT_ENABLE;
- } else {
action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
action[len++] = ce->guc_id;
- }
-static void guc_submit(struct intel_engine_cs *engine,
struct i915_request **out,
struct i915_request **end)
-{
- struct intel_guc *guc = &engine->gt->uc.guc;
- err = intel_guc_send_nb(guc, action, len);
- do {
struct i915_request *rq = *out++;
- if (!enabled && !err)
set_context_enabled(ce);
flush_ggtt_writes(rq->ring->vma);
guc_add_request(guc, rq);
- } while (out != end);
- return err; } static inline int rq_prio(const struct i915_request *rq)
@@ -160,125 +177,88 @@ static inline int rq_prio(const struct i915_request *rq) return rq->sched.attr.priority; } -static struct i915_request *schedule_in(struct i915_request *rq, int idx) +static int guc_dequeue_one_context(struct intel_guc *guc) {
- trace_i915_request_in(rq, idx);
- /*
* Currently we are not tracking the rq->context being inflight
* (ce->inflight = rq->engine). It is only used by the execlists
* backend at the moment, a similar counting strategy would be
* required if we generalise the inflight tracking.
*/
- __intel_gt_pm_get(rq->engine->gt);
- return i915_request_get(rq);
-}
-static void schedule_out(struct i915_request *rq) -{
- trace_i915_request_out(rq);
- intel_gt_pm_put_async(rq->engine->gt);
- i915_request_put(rq);
-}
-static void __guc_dequeue(struct intel_engine_cs *engine) -{
- struct intel_engine_execlists * const execlists = &engine->execlists;
- struct i915_sched_engine * const sched_engine = engine->sched_engine;
- struct i915_request **first = execlists->inflight;
- struct i915_request ** const last_port = first + execlists->port_mask;
- struct i915_request *last = first[0];
- struct i915_request **port;
- struct i915_sched_engine * const sched_engine = guc->sched_engine;
- struct i915_request *last = NULL; bool submit = false; struct rb_node *rb;
- int ret; lockdep_assert_held(&sched_engine->lock);
- if (last) {
if (*++first)
return;
last = NULL;
- if (guc->stalled_request) {
submit = true;
last = guc->stalled_request;
goto resubmit;
}
- /*
* We write directly into the execlists->inflight queue and don't use
* the execlists->pending queue, as we don't have a distinct switch
* event.
*/
- port = first; while ((rb = rb_first_cached(&sched_engine->queue))) { struct i915_priolist *p = to_priolist(rb); struct i915_request *rq, *rn; priolist_for_each_request_consume(rq, rn, p) {
if (last && rq->context != last->context) {
if (port == last_port)
goto done;
*port = schedule_in(last,
port - execlists->inflight);
port++;
}
if (last && rq->context != last->context)
goto done; list_del_init(&rq->sched.link);
__i915_request_submit(rq);
submit = true;
trace_i915_request_in(rq, 0); last = rq;
}
rb_erase_cached(&p->node, &sched_engine->queue);
i915_priolist_free(p);
}
done:
sched_engine->queue_priority_hint =
rb ? to_priolist(rb)->priority : INT_MIN;
if (submit) {
*port = schedule_in(last, port - execlists->inflight);
*++port = NULL;
guc_submit(engine, first, port);
last->context->lrc_reg_state[CTX_RING_TAIL] =
intel_ring_set_tail(last->ring, last->tail);
+resubmit:
/*
* We only check for -EBUSY here even though it is possible for
* -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
* died and a full GPU needs to be done. The hangcheck will
'full GPU reset'. Although I believe strictly speaking, it is a 'full GT reset'. There are other bits of the GPU beyond the GT.
Yep, will fix.
* eventually detect that the GuC has died and trigger this
* reset so no need to handle -EDEADLK here.
*/
ret = guc_add_request(guc, last);
if (ret == -EBUSY) {
tasklet_schedule(&sched_engine->tasklet);
guc->stalled_request = last;
return false;
}
}
- execlists->active = execlists->inflight;
- guc->stalled_request = NULL;
- return submit; } static void guc_submission_tasklet(struct tasklet_struct *t) { struct i915_sched_engine *sched_engine = from_tasklet(sched_engine, t, tasklet);
- struct intel_engine_cs * const engine = sched_engine->private_data;
- struct intel_engine_execlists * const execlists = &engine->execlists;
- struct i915_request **port, *rq; unsigned long flags;
- bool loop;
- spin_lock_irqsave(&engine->sched_engine->lock, flags);
- for (port = execlists->inflight; (rq = *port); port++) {
if (!i915_request_completed(rq))
break;
schedule_out(rq);
- }
- if (port != execlists->inflight) {
int idx = port - execlists->inflight;
int rem = ARRAY_SIZE(execlists->inflight) - idx;
memmove(execlists->inflight, port, rem * sizeof(*port));
- }
- spin_lock_irqsave(&sched_engine->lock, flags);
- __guc_dequeue(engine);
- do {
loop = guc_dequeue_one_context(sched_engine->private_data);
- } while (loop);
- i915_sched_engine_reset_on_empty(engine->sched_engine);
- i915_sched_engine_reset_on_empty(sched_engine);
- spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
- spin_unlock_irqrestore(&sched_engine->lock, flags); }
Not a blocker but it has to be said that it would be much easier to remove all of the above if the delete was split into a separate patch. Having two completely disparate threads of code interwoven in the diff makes it much harder to see what the new version is doing!
Yes, it would be easier to read if this code was deleted in a separate patch. I'll keep that in mind going forward. No promises, but perhaps I'll do this in the next rev.
static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir) {
- if (iir & GT_RENDER_USER_INTERRUPT) {
- if (iir & GT_RENDER_USER_INTERRUPT) intel_engine_signal_breadcrumbs(engine);
tasklet_hi_schedule(&engine->sched_engine->tasklet);
- } } static void guc_reset_prepare(struct intel_engine_cs *engine)
@@ -349,6 +329,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine) struct rb_node *rb; unsigned long flags;
- /* Can be called during boot if GuC fails to load */
- if (!engine->gt)
return;
- ENGINE_TRACE(engine, "\n"); /*
@@ -433,8 +417,11 @@ int intel_guc_submission_init(struct intel_guc *guc) void intel_guc_submission_fini(struct intel_guc *guc) {
- if (guc->lrc_desc_pool)
guc_lrc_desc_pool_destroy(guc);
- if (!guc->lrc_desc_pool)
return;
- guc_lrc_desc_pool_destroy(guc);
- i915_sched_engine_put(guc->sched_engine); } static int guc_context_alloc(struct intel_context *ce)
@@ -499,32 +486,32 @@ static int guc_request_alloc(struct i915_request *request) return 0; } -static inline void queue_request(struct intel_engine_cs *engine, +static inline void queue_request(struct i915_sched_engine *sched_engine, struct i915_request *rq, int prio) { GEM_BUG_ON(!list_empty(&rq->sched.link)); list_add_tail(&rq->sched.link,
i915_sched_lookup_priolist(engine->sched_engine, prio));
i915_sched_lookup_priolist(sched_engine, prio));
set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); } static void guc_submit_request(struct i915_request *rq) {
- struct intel_engine_cs *engine = rq->engine;
- struct i915_sched_engine *sched_engine = rq->engine->sched_engine; unsigned long flags; /* Will be called from irq-context when using foreign fences. */
- spin_lock_irqsave(&engine->sched_engine->lock, flags);
- spin_lock_irqsave(&sched_engine->lock, flags);
- queue_request(engine, rq, rq_prio(rq));
- queue_request(sched_engine, rq, rq_prio(rq));
- GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
- GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine)); GEM_BUG_ON(list_empty(&rq->sched.link));
- tasklet_hi_schedule(&engine->sched_engine->tasklet);
- tasklet_hi_schedule(&sched_engine->tasklet);
- spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
- spin_unlock_irqrestore(&sched_engine->lock, flags); } static void sanitize_hwsp(struct intel_engine_cs *engine)
@@ -602,8 +589,6 @@ static void guc_release(struct intel_engine_cs *engine) { engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
- tasklet_kill(&engine->sched_engine->tasklet);
- intel_engine_cleanup_common(engine); lrc_fini_wa_ctx(engine); }
@@ -674,6 +659,7 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine) int intel_guc_submission_setup(struct intel_engine_cs *engine) { struct drm_i915_private *i915 = engine->i915;
- struct intel_guc *guc = &engine->gt->uc.guc; /*
- The setup relies on several assumptions (e.g. irqs always enabled)
@@ -681,7 +667,18 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine) */ GEM_BUG_ON(GRAPHICS_VER(i915) < 11);
- tasklet_setup(&engine->sched_engine->tasklet, guc_submission_tasklet);
- if (!guc->sched_engine) {
guc->sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
Does the re-work of the sched_engine create/destroy happen later in this patch series? Wasn't there issues with the wrong destroy function being called in certain situations? Or do those issues (and fixes) only come in with the virtual engine support?
We didn't need the destroy until we introduced the guc_submit_engine, but that is changing after the KASAN bug fix I sent out today for the internal version of this code. I've already reworked my upstream branch to add a destroy vfunc for sched_engine in a separate patch a bit later in the series.
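For reference, the get/put plus destroy-vfunc shape being discussed would look roughly like this in plain userspace C. All names here are stand-ins (the real i915_sched_engine uses a kref and i915_sched_engine_get/put); the point is only that the last put invokes whichever destroy the owning backend installed:

```c
#include <stdlib.h>

/* Userspace model of a refcounted engine with a destroy vfunc. */
struct sched_engine {
	int refcount;
	void (*destroy)(struct sched_engine *se); /* hypothetical vfunc */
};

static void default_destroy(struct sched_engine *se)
{
	free(se);
}

static struct sched_engine *sched_engine_create(void)
{
	struct sched_engine *se = calloc(1, sizeof(*se));

	if (!se)
		return NULL;
	se->refcount = 1;
	se->destroy = default_destroy;
	return se;
}

static struct sched_engine *sched_engine_get(struct sched_engine *se)
{
	se->refcount++;
	return se;
}

static void sched_engine_put(struct sched_engine *se)
{
	/* Last reference dropped: run the backend-specific teardown. */
	if (--se->refcount == 0)
		se->destroy(se);
}
```

With this shape, a backend that needs extra teardown (e.g. the virtual-engine case) swaps in its own destroy function at create time instead of callers having to know which cleanup path to take.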
Matt
John.
if (!guc->sched_engine)
return -ENOMEM;
guc->sched_engine->schedule = i915_schedule;
guc->sched_engine->private_data = guc;
tasklet_setup(&guc->sched_engine->tasklet,
guc_submission_tasklet);
- }
- i915_sched_engine_put(engine->sched_engine);
- engine->sched_engine = i915_sched_engine_get(guc->sched_engine); guc_default_vfuncs(engine); guc_default_irqs(engine);
Add bypass tasklet submission path to GuC. The tasklet is only used if the H2G channel has backpressure.
Signed-off-by: Matthew Brost matthew.brost@intel.com --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 37 +++++++++++++++---- 1 file changed, 29 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index ee933efbf0ff..38aff83ee9fa 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -172,6 +172,12 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) return err; }
+static inline void guc_set_lrc_tail(struct i915_request *rq) +{ + rq->context->lrc_reg_state[CTX_RING_TAIL] = + intel_ring_set_tail(rq->ring, rq->tail); +} + static inline int rq_prio(const struct i915_request *rq) { return rq->sched.attr.priority; @@ -215,8 +221,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc) } done: if (submit) { - last->context->lrc_reg_state[CTX_RING_TAIL] = - intel_ring_set_tail(last->ring, last->tail); + guc_set_lrc_tail(last); resubmit: /* * We only check for -EBUSY here even though it is possible for @@ -496,20 +501,36 @@ static inline void queue_request(struct i915_sched_engine *sched_engine, set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); }
+static int guc_bypass_tasklet_submit(struct intel_guc *guc, + struct i915_request *rq) +{ + int ret; + + __i915_request_submit(rq); + + trace_i915_request_in(rq, 0); + + guc_set_lrc_tail(rq); + ret = guc_add_request(guc, rq); + if (ret == -EBUSY) + guc->stalled_request = rq; + + return ret; +} + static void guc_submit_request(struct i915_request *rq) { struct i915_sched_engine *sched_engine = rq->engine->sched_engine; + struct intel_guc *guc = &rq->engine->gt->uc.guc; unsigned long flags;
/* Will be called from irq-context when using foreign fences. */ spin_lock_irqsave(&sched_engine->lock, flags);
- queue_request(sched_engine, rq, rq_prio(rq)); - - GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine)); - GEM_BUG_ON(list_empty(&rq->sched.link)); - - tasklet_hi_schedule(&sched_engine->tasklet); + if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine)) + queue_request(sched_engine, rq, rq_prio(rq)); + else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY) + tasklet_hi_schedule(&sched_engine->tasklet);
spin_unlock_irqrestore(&sched_engine->lock, flags); }
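To make the ordering rule in guc_submit_request() explicit: a request may bypass the tasklet only when no request is stalled and nothing is queued ahead of it; otherwise it must take the priolist/tasklet path so submission order is preserved. A toy userspace model of that decision (all names and the counters are illustrative, not the driver's types):

```c
#include <stdbool.h>

#define MODEL_EBUSY (-16) /* models -EBUSY */

struct guc_model {
	bool stalled;           /* a prior request is stuck behind a full CTB */
	int queue_depth;        /* requests already waiting on the tasklet path */
	bool ctb_full;          /* would guc_add_request() return -EBUSY? */
	bool tasklet_scheduled;
	int direct_submits;
};

/* Models guc_bypass_tasklet_submit(): try the H2G channel directly. */
static int bypass_submit(struct guc_model *g)
{
	if (g->ctb_full) {
		g->stalled = true; /* guc->stalled_request = rq */
		return MODEL_EBUSY;
	}
	g->direct_submits++;
	return 0;
}

/* Models guc_submit_request(): bypass only when nothing is queued ahead. */
static void submit_request(struct guc_model *g)
{
	if (g->stalled || g->queue_depth > 0)
		g->queue_depth++;              /* must preserve ordering */
	else if (bypass_submit(g) == MODEL_EBUSY)
		g->tasklet_scheduled = true;   /* retry from the tasklet */
}
```

Once a request stalls, every later request is funnelled through the queue until the tasklet drains the backlog, which is what keeps the bypass path from reordering submissions.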
On 6/24/2021 00:04, Matthew Brost wrote:
Add bypass tasklet submission path to GuC. The tasklet is only used if the H2G channel has backpressure.
Signed-off-by: Matthew Brost matthew.brost@intel.com
Reviewed-by: John Harrison John.C.Harrison@Intel.com
.../gpu/drm/i915/gt/uc/intel_guc_submission.c | 37 +++++++++++++++---- 1 file changed, 29 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index ee933efbf0ff..38aff83ee9fa 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -172,6 +172,12 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) return err; }
+static inline void guc_set_lrc_tail(struct i915_request *rq) +{
- rq->context->lrc_reg_state[CTX_RING_TAIL] =
intel_ring_set_tail(rq->ring, rq->tail);
+}
- static inline int rq_prio(const struct i915_request *rq) { return rq->sched.attr.priority;
@@ -215,8 +221,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc) } done: if (submit) {
last->context->lrc_reg_state[CTX_RING_TAIL] =
intel_ring_set_tail(last->ring, last->tail);
guc_set_lrc_tail(last);
resubmit:
/*
- We only check for -EBUSY here even though it is possible for
@@ -496,20 +501,36 @@ static inline void queue_request(struct i915_sched_engine *sched_engine, set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); }
+static int guc_bypass_tasklet_submit(struct intel_guc *guc,
struct i915_request *rq)
+{
- int ret;
- __i915_request_submit(rq);
- trace_i915_request_in(rq, 0);
- guc_set_lrc_tail(rq);
- ret = guc_add_request(guc, rq);
- if (ret == -EBUSY)
guc->stalled_request = rq;
- return ret;
+}
static void guc_submit_request(struct i915_request *rq) { struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
struct intel_guc *guc = &rq->engine->gt->uc.guc; unsigned long flags;
/* Will be called from irq-context when using foreign fences. */ spin_lock_irqsave(&sched_engine->lock, flags);
- queue_request(sched_engine, rq, rq_prio(rq));
- GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
- GEM_BUG_ON(list_empty(&rq->sched.link));
- tasklet_hi_schedule(&sched_engine->tasklet);
if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
queue_request(sched_engine, rq, rq_prio(rq));
else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
tasklet_hi_schedule(&sched_engine->tasklet);
spin_unlock_irqrestore(&sched_engine->lock, flags); }
Implement GuC context operations which includes GuC specific operations alloc, pin, unpin, and destroy.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/intel_context.c | 5 + drivers/gpu/drm/i915/gt/intel_context_types.h | 22 +- drivers/gpu/drm/i915/gt/intel_lrc_reg.h | 1 - drivers/gpu/drm/i915/gt/uc/intel_guc.h | 34 + drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 4 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 664 ++++++++++++++++-- drivers/gpu/drm/i915/i915_reg.h | 1 + drivers/gpu/drm/i915/i915_request.c | 1 + 8 files changed, 677 insertions(+), 55 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 4033184f13b9..2b68af16222c 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -383,6 +383,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
mutex_init(&ce->pin_mutex);
+ spin_lock_init(&ce->guc_state.lock); + + ce->guc_id = GUC_INVALID_LRC_ID; + INIT_LIST_HEAD(&ce->guc_id_link); + i915_active_init(&ce->active, __intel_context_active, __intel_context_retire, 0); } diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index bb6fef7eae52..ce7c69b34cd1 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -95,6 +95,7 @@ struct intel_context { #define CONTEXT_BANNED 6 #define CONTEXT_FORCE_SINGLE_SUBMISSION 7 #define CONTEXT_NOPREEMPT 8 +#define CONTEXT_LRCA_DIRTY 9
struct { u64 timeout_us; @@ -137,14 +138,29 @@ struct intel_context {
u8 wa_bb_page; /* if set, page num reserved for context workarounds */
+ struct { + /** lock: protects everything in guc_state */ + spinlock_t lock; + /** + * sched_state: scheduling state of this context using GuC + * submission + */ + u8 sched_state; + } guc_state; + /* GuC scheduling state that does not require a lock. */ atomic_t guc_sched_state_no_lock;
+ /* GuC lrc descriptor ID */ + u16 guc_id; + + /* GuC lrc descriptor reference count */ + atomic_t guc_id_ref; + /* - * GuC lrc descriptor ID - Not assigned in this patch but future patches - * in the series will. + * GuC ID link - in list when unpinned but guc_id still valid in GuC */ - u16 guc_id; + struct list_head guc_id_link; };
#endif /* __INTEL_CONTEXT_TYPES__ */ diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h index 41e5350a7a05..49d4857ad9b7 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h @@ -87,7 +87,6 @@ #define GEN11_CSB_WRITE_PTR_MASK (GEN11_CSB_PTR_MASK << 0)
#define MAX_CONTEXT_HW_ID (1 << 21) /* exclusive */ -#define MAX_GUC_CONTEXT_HW_ID (1 << 20) /* exclusive */ #define GEN11_MAX_CONTEXT_HW_ID (1 << 11) /* exclusive */ /* in Gen12 ID 0x7FF is reserved to indicate idle */ #define GEN12_MAX_CONTEXT_HW_ID (GEN11_MAX_CONTEXT_HW_ID - 1) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 9ba8219475b2..d44316dc914b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -44,6 +44,14 @@ struct intel_guc { void (*disable)(struct intel_guc *guc); } interrupts;
+ /* + * contexts_lock protects the pool of free guc ids and a linked list of + * guc ids available to be stolen + */ + spinlock_t contexts_lock; + struct ida guc_ids; + struct list_head guc_id_list; + bool submission_selected;
struct i915_vma *ads_vma; @@ -102,6 +110,29 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len, response_buf, response_buf_size, 0); }
+static inline int intel_guc_send_busy_loop(struct intel_guc* guc, + const u32 *action, + u32 len, + bool loop) +{ + int err; + + /* No sleeping with spin locks, just busy loop */ + might_sleep_if(loop && (!in_atomic() && !irqs_disabled())); + +retry: + err = intel_guc_send_nb(guc, action, len); + if (unlikely(err == -EBUSY && loop)) { + if (likely(!in_atomic() && !irqs_disabled())) + cond_resched(); + else + cpu_relax(); + goto retry; + } + + return err; +} + static inline void intel_guc_to_host_event_handler(struct intel_guc *guc) { intel_guc_ct_event_handler(&guc->ct); @@ -203,6 +234,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask) int intel_guc_reset_engine(struct intel_guc *guc, struct intel_engine_cs *engine);
+int intel_guc_deregister_done_process_msg(struct intel_guc *guc, + const u32 *msg, u32 len); + void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
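A userspace approximation of the intel_guc_send_busy_loop() retry pattern quoted above: retry on -EBUSY, yielding the CPU when sleeping is allowed and (in the kernel) spinning with cpu_relax() otherwise. fake_send() and the countdown are stand-ins for the real H2G send:

```c
#include <sched.h>
#include <stdbool.h>

#define MODEL_EBUSY (-16)

/* Stand-in for intel_guc_send_nb(): fails until the channel drains. */
static int fake_send(int *busy_countdown)
{
	if (*busy_countdown > 0) {
		(*busy_countdown)--;
		return MODEL_EBUSY;
	}
	return 0;
}

/* Models the busy-loop wrapper: retry -EBUSY until the send goes through. */
static int send_busy_loop(int *busy_countdown, bool may_sleep)
{
	int err;

	for (;;) {
		err = fake_send(busy_countdown);
		if (err != MODEL_EBUSY)
			return err;
		if (may_sleep)
			sched_yield(); /* cond_resched() analogue */
		/* else: the kernel version would cpu_relax() and spin */
	}
}
```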
#endif diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 8e0ed7d8feb3..42a7daef2ff6 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -901,6 +901,10 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r case INTEL_GUC_ACTION_DEFAULT: ret = intel_guc_to_host_process_recv_msg(guc, payload, len); break; + case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE: + ret = intel_guc_deregister_done_process_msg(guc, payload, + len); + break; default: ret = -EOPNOTSUPP; break; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 38aff83ee9fa..d39579ac2faa 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -13,7 +13,9 @@ #include "gt/intel_gt.h" #include "gt/intel_gt_irq.h" #include "gt/intel_gt_pm.h" +#include "gt/intel_gt_requests.h" #include "gt/intel_lrc.h" +#include "gt/intel_lrc_reg.h" #include "gt/intel_mocs.h" #include "gt/intel_ring.h"
@@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct intel_context *ce) &ce->guc_sched_state_no_lock); }
+/* + * Below is a set of functions which control the GuC scheduling state which + * require a lock, aside from the special case where the functions are called + * from guc_lrc_desc_pin(). In that case it isn't possible for any other code + * path to be executing on the context. + */ +#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER BIT(0) +#define SCHED_STATE_DESTROYED BIT(1) +static inline void init_sched_state(struct intel_context *ce) +{ + /* Only should be called from guc_lrc_desc_pin() */ + atomic_set(&ce->guc_sched_state_no_lock, 0); + ce->guc_state.sched_state = 0; +} + +static inline bool +context_wait_for_deregister_to_register(struct intel_context *ce) +{ + return (ce->guc_state.sched_state & + SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER); +} + +static inline void +set_context_wait_for_deregister_to_register(struct intel_context *ce) +{ + /* Only should be called from guc_lrc_desc_pin() */ + ce->guc_state.sched_state |= + SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER; +} + +static inline void +clr_context_wait_for_deregister_to_register(struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); + ce->guc_state.sched_state = + (ce->guc_state.sched_state & + ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER); +} + +static inline bool +context_destroyed(struct intel_context *ce) +{ + return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED); +} + +static inline void +set_context_destroyed(struct intel_context *ce) +{ + lockdep_assert_held(&ce->guc_state.lock); + ce->guc_state.sched_state |= SCHED_STATE_DESTROYED; +} + +static inline bool context_guc_id_invalid(struct intel_context *ce) +{ + return (ce->guc_id == GUC_INVALID_LRC_ID); +} + +static inline void set_context_guc_id_invalid(struct intel_context *ce) +{ + ce->guc_id = GUC_INVALID_LRC_ID; +} + +static inline struct intel_guc *ce_to_guc(struct intel_context *ce) +{ + return &ce->engine->gt->uc.guc; +} + static inline struct i915_priolist *to_priolist(struct rb_node *rb) { return 
rb_entry(rb, struct i915_priolist, node); @@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) int len = 0; bool enabled = context_enabled(ce);
+ GEM_BUG_ON(!atomic_read(&ce->guc_id_ref)); + GEM_BUG_ON(context_guc_id_invalid(ce)); + if (!enabled) { action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET; action[len++] = ce->guc_id; @@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
+ spin_lock_init(&guc->contexts_lock); + INIT_LIST_HEAD(&guc->guc_id_list); + ida_init(&guc->guc_ids); + return 0; }
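The guc_ids pool initialized above hands out IDs starting at GUC_ID_START, keeping the low range reserved. A toy model of that allocation policy, scaled down for illustration (MAX_IDS is a hypothetical small bound, not GUC_MAX_LRC_DESCRIPTORS, and a flat array replaces the kernel ida):

```c
#include <assert.h>

/* Toy model of the guc_ids allocator: ids below GUC_ID_START are reserved,
 * and the rest are handed out smallest-first, mirroring what
 * ida_simple_get(&guc->guc_ids, GUC_ID_START, GUC_MAX_LRC_DESCRIPTORS, ...)
 * does in the patch. MAX_IDS is scaled down for illustration. */
#define GUC_ID_START	64
#define MAX_IDS		128

static unsigned char id_used[MAX_IDS];

static int new_guc_id(void)
{
	int i;

	for (i = GUC_ID_START; i < MAX_IDS; i++) {
		if (!id_used[i]) {
			id_used[i] = 1;
			return i;
		}
	}
	return -1;	/* exhausted: the driver would try stealing next */
}

static void release_guc_id(int id)
{
	id_used[id] = 0;
}
```

When the pool is exhausted the real code falls back to steal_guc_id(), taking an id from an unpinned context on guc->guc_id_list.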
@@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct intel_guc *guc) i915_sched_engine_put(guc->sched_engine); }
-static int guc_context_alloc(struct intel_context *ce) +static inline void queue_request(struct i915_sched_engine *sched_engine, + struct i915_request *rq, + int prio) { - return lrc_alloc(ce, ce->engine); + GEM_BUG_ON(!list_empty(&rq->sched.link)); + list_add_tail(&rq->sched.link, + i915_sched_lookup_priolist(sched_engine, prio)); + set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); +} + +static int guc_bypass_tasklet_submit(struct intel_guc *guc, + struct i915_request *rq) +{ + int ret; + + __i915_request_submit(rq); + + trace_i915_request_in(rq, 0); + + guc_set_lrc_tail(rq); + ret = guc_add_request(guc, rq); + if (ret == -EBUSY) + guc->stalled_request = rq; + + return ret; +} + +static void guc_submit_request(struct i915_request *rq) +{ + struct i915_sched_engine *sched_engine = rq->engine->sched_engine; + struct intel_guc *guc = &rq->engine->gt->uc.guc; + unsigned long flags; + + /* Will be called from irq-context when using foreign fences. */ + spin_lock_irqsave(&sched_engine->lock, flags); + + if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine)) + queue_request(sched_engine, rq, rq_prio(rq)); + else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY) + tasklet_hi_schedule(&sched_engine->tasklet); + + spin_unlock_irqrestore(&sched_engine->lock, flags); +} + +#define GUC_ID_START 64 /* First 64 guc_ids reserved */ +static int new_guc_id(struct intel_guc *guc) +{ + return ida_simple_get(&guc->guc_ids, GUC_ID_START, + GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL | + __GFP_RETRY_MAYFAIL | __GFP_NOWARN); +} + +static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce) +{ + if (!context_guc_id_invalid(ce)) { + ida_simple_remove(&guc->guc_ids, ce->guc_id); + reset_lrc_desc(guc, ce->guc_id); + set_context_guc_id_invalid(ce); + } + if (!list_empty(&ce->guc_id_link)) + list_del_init(&ce->guc_id_link); +} + +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce) +{ + unsigned long flags; + + 
spin_lock_irqsave(&guc->contexts_lock, flags); + __release_guc_id(guc, ce); + spin_unlock_irqrestore(&guc->contexts_lock, flags); +} + +static int steal_guc_id(struct intel_guc *guc) +{ + struct intel_context *ce; + int guc_id; + + if (!list_empty(&guc->guc_id_list)) { + ce = list_first_entry(&guc->guc_id_list, + struct intel_context, + guc_id_link); + + GEM_BUG_ON(atomic_read(&ce->guc_id_ref)); + GEM_BUG_ON(context_guc_id_invalid(ce)); + + list_del_init(&ce->guc_id_link); + guc_id = ce->guc_id; + set_context_guc_id_invalid(ce); + return guc_id; + } else { + return -EAGAIN; + } +} + +static int assign_guc_id(struct intel_guc *guc, u16 *out) +{ + int ret; + + ret = new_guc_id(guc); + if (unlikely(ret < 0)) { + ret = steal_guc_id(guc); + if (ret < 0) + return ret; + } + + *out = ret; + return 0; +} + +#define PIN_GUC_ID_TRIES 4 +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce) +{ + int ret = 0; + unsigned long flags, tries = PIN_GUC_ID_TRIES; + + GEM_BUG_ON(atomic_read(&ce->guc_id_ref)); + +try_again: + spin_lock_irqsave(&guc->contexts_lock, flags); + + if (context_guc_id_invalid(ce)) { + ret = assign_guc_id(guc, &ce->guc_id); + if (ret) + goto out_unlock; + ret = 1; /* Indicates a newly assigned HW context */ + } + if (!list_empty(&ce->guc_id_link)) + list_del_init(&ce->guc_id_link); + atomic_inc(&ce->guc_id_ref); + +out_unlock: + spin_unlock_irqrestore(&guc->contexts_lock, flags); + + /* + * -EAGAIN indicates no guc_ids are available, let's retire any + * outstanding requests to see if that frees up a guc_id. If the first + * retire didn't help, insert a sleep with the timeslice duration before + * attempting to retire more requests. Double the sleep period each + * subsequent pass before finally giving up. The sleep period has a max + * of 100ms and a minimum of 1ms.
+ */ + if (ret == -EAGAIN && --tries) { + if (PIN_GUC_ID_TRIES - tries > 1) { + unsigned int timeslice_shifted = + ce->engine->props.timeslice_duration_ms << + (PIN_GUC_ID_TRIES - tries - 2); + unsigned int max = min_t(unsigned int, 100, + timeslice_shifted); + + msleep(max_t(unsigned int, max, 1)); + } + intel_gt_retire_requests(guc_to_gt(guc)); + goto try_again; + } + + return ret; +} + +static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce) +{ + unsigned long flags; + + GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0); + + spin_lock_irqsave(&guc->contexts_lock, flags); + if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) && + !atomic_read(&ce->guc_id_ref)) + list_add_tail(&ce->guc_id_link, &guc->guc_id_list); + spin_unlock_irqrestore(&guc->contexts_lock, flags); +} + +static int __guc_action_register_context(struct intel_guc *guc, + u32 guc_id, + u32 offset) +{ + u32 action[] = { + INTEL_GUC_ACTION_REGISTER_CONTEXT, + guc_id, + offset, + }; + + return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true); +} + +static int register_context(struct intel_context *ce) +{ + struct intel_guc *guc = ce_to_guc(ce); + u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) + + ce->guc_id * sizeof(struct guc_lrc_desc); + + return __guc_action_register_context(guc, ce->guc_id, offset); +} + +static int __guc_action_deregister_context(struct intel_guc *guc, + u32 guc_id) +{ + u32 action[] = { + INTEL_GUC_ACTION_DEREGISTER_CONTEXT, + guc_id, + }; + + return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true); +} + +static int deregister_context(struct intel_context *ce, u32 guc_id) +{ + struct intel_guc *guc = ce_to_guc(ce); + + return __guc_action_deregister_context(guc, guc_id); +} + +static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask) +{ + switch (class) { + case RENDER_CLASS: + return mask >> RCS0; + case VIDEO_ENHANCEMENT_CLASS: + return mask >> VECS0; + case VIDEO_DECODE_CLASS: + return 
mask >> VCS0; + case COPY_ENGINE_CLASS: + return mask >> BCS0; + default: + GEM_BUG_ON("Invalid Class"); + return 0; + } +} + +static void guc_context_policy_init(struct intel_engine_cs *engine, + struct guc_lrc_desc *desc) +{ + desc->policy_flags = 0; + + desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US; + desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US; +} + +static int guc_lrc_desc_pin(struct intel_context *ce) +{ + struct intel_runtime_pm *runtime_pm = + &ce->engine->gt->i915->runtime_pm; + struct intel_engine_cs *engine = ce->engine; + struct intel_guc *guc = &engine->gt->uc.guc; + u32 desc_idx = ce->guc_id; + struct guc_lrc_desc *desc; + bool context_registered; + intel_wakeref_t wakeref; + int ret = 0; + + GEM_BUG_ON(!engine->mask); + + /* + * Ensure LRC + CT vmas are in the same region, as the write barrier is + * done based on the CT vma region. + */ + GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) != + i915_gem_object_is_lmem(ce->ring->vma->obj)); + + context_registered = lrc_desc_registered(guc, desc_idx); + + reset_lrc_desc(guc, desc_idx); + set_lrc_desc_registered(guc, desc_idx, ce); + + desc = __get_lrc_desc(guc, desc_idx); + desc->engine_class = engine_class_to_guc_class(engine->class); + desc->engine_submit_mask = adjust_engine_mask(engine->class, + engine->mask); + desc->hw_context_desc = ce->lrc.lrca; + desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL; + desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD; + guc_context_policy_init(engine, desc); + init_sched_state(ce); + + /* + * The context_lookup xarray is used to determine if the hardware + * context is currently registered. There are two cases in which it + * could be registered: either the guc_id has been stolen from another + * context or the lrc descriptor address of this context has changed. In + * either case the context needs to be deregistered with the GuC before + * registering this context.
+ */ + if (context_registered) { + set_context_wait_for_deregister_to_register(ce); + intel_context_get(ce); + + /* + * If stealing the guc_id, this ce has the same guc_id as the + * context whose guc_id was stolen. + */ + with_intel_runtime_pm(runtime_pm, wakeref) + ret = deregister_context(ce, ce->guc_id); + } else { + with_intel_runtime_pm(runtime_pm, wakeref) + ret = register_context(ce); + } + + return ret; }
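The exhaustion backoff described in the pin_guc_id() comment above can be sketched as a pure function. This is an illustrative parameterization, not driver code: 'pass' counts failed attempts starting at 1, and PIN_GUC_ID_TRIES mirrors the driver constant.

```c
#include <assert.h>

/* Models the sleep schedule pin_guc_id() uses when guc_ids are exhausted:
 * no sleep on the first retry (just retire requests), then the engine
 * timeslice doubled on each later pass, clamped to [1, 100] ms. */
#define PIN_GUC_ID_TRIES 4

static unsigned int backoff_ms(unsigned int timeslice_ms, unsigned int pass)
{
	unsigned int shifted, capped;

	if (pass <= 1)
		return 0;	/* first retry only retires outstanding requests */

	/* pass 2 sleeps one timeslice, pass 3 two timeslices, and so on */
	shifted = timeslice_ms << (pass - 2);
	capped = shifted < 100 ? shifted : 100;
	return capped > 1 ? capped : 1;	/* clamp to [1, 100] ms */
}
```

So with a 5 ms timeslice the schedule is 0, 5, 10 ms across the PIN_GUC_ID_TRIES attempts before pin_guc_id() finally gives up with -EAGAIN.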
static int guc_context_pre_pin(struct intel_context *ce, @@ -443,36 +813,137 @@ static int guc_context_pre_pin(struct intel_context *ce,
static int guc_context_pin(struct intel_context *ce, void *vaddr) { + if (i915_ggtt_offset(ce->state) != + (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK)) + set_bit(CONTEXT_LRCA_DIRTY, &ce->flags); + return lrc_pin(ce, ce->engine, vaddr); }
+static void guc_context_unpin(struct intel_context *ce) +{ + unpin_guc_id(ce_to_guc(ce), ce); + lrc_unpin(ce); +} + +static void guc_context_post_unpin(struct intel_context *ce) +{ + lrc_post_unpin(ce); +} + +static inline void guc_lrc_desc_unpin(struct intel_context *ce) +{ + struct intel_engine_cs *engine = ce->engine; + struct intel_guc *guc = &engine->gt->uc.guc; + unsigned long flags; + + GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id)); + GEM_BUG_ON(ce != __get_context(guc, ce->guc_id)); + + spin_lock_irqsave(&ce->guc_state.lock, flags); + set_context_destroyed(ce); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + + deregister_context(ce, ce->guc_id); +} + +static void guc_context_destroy(struct kref *kref) +{ + struct intel_context *ce = container_of(kref, typeof(*ce), ref); + struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm; + struct intel_guc *guc = &ce->engine->gt->uc.guc; + intel_wakeref_t wakeref; + unsigned long flags; + + /* + * If the guc_id is invalid this context has been stolen and we can free + * it immediately. It can also be freed immediately if the context is + * not registered with the GuC. + */ + if (context_guc_id_invalid(ce) || + !lrc_desc_registered(guc, ce->guc_id)) { + release_guc_id(guc, ce); + lrc_destroy(kref); + return; + } + + /* + * We have to acquire the context spinlock and check guc_id again; if it + * is valid it hasn't been stolen and needs to be deregistered. We + * delete this context from the list of unpinned guc_ids available to + * steal, to seal a race with guc_lrc_desc_pin(). When the G2H CTB + * returns indicating this context has been deregistered the guc_id is + * returned to the pool of available guc_ids.
+ */ + spin_lock_irqsave(&guc->contexts_lock, flags); + if (context_guc_id_invalid(ce)) { + __release_guc_id(guc, ce); + spin_unlock_irqrestore(&guc->contexts_lock, flags); + lrc_destroy(kref); + return; + } + + if (!list_empty(&ce->guc_id_link)) + list_del_init(&ce->guc_id_link); + spin_unlock_irqrestore(&guc->contexts_lock, flags); + + /* + * We defer GuC context deregistration until the context is destroyed + * in order to save on CTBs. With this optimization ideally we only need + * 1 CTB to register the context during the first pin and 1 CTB to + * deregister the context when the context is destroyed. Without this + * optimization, a CTB would be needed on every pin & unpin. + * + * XXX: Need to acquire the runtime wakeref as this can be triggered + * from context_free_worker when no runtime wakeref is held. + * guc_lrc_desc_unpin requires the runtime wakeref because an H2G CTB + * message is sent to the GuC to deregister the context. A future patch + * may defer this H2G CTB if the runtime wakeref is zero. + */ + with_intel_runtime_pm(runtime_pm, wakeref) + guc_lrc_desc_unpin(ce); +} + +static int guc_context_alloc(struct intel_context *ce) +{ + return lrc_alloc(ce, ce->engine); +} + static const struct intel_context_ops guc_context_ops = { .alloc = guc_context_alloc,
.pre_pin = guc_context_pre_pin, .pin = guc_context_pin, - .unpin = lrc_unpin, - .post_unpin = lrc_post_unpin, + .unpin = guc_context_unpin, + .post_unpin = guc_context_post_unpin,
.enter = intel_context_enter_engine, .exit = intel_context_exit_engine,
.reset = lrc_reset, - .destroy = lrc_destroy, + .destroy = guc_context_destroy, };
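The deferred-deregistration scheme hangs off the context's kref: guc_context_destroy() only runs when the last reference is dropped. A minimal userspace model of that pattern (toy_ctx and its helpers are hypothetical stand-ins, with C11 atomics replacing the kernel's kref):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal model of the kref pattern behind guc_context_destroy(): the
 * release callback runs exactly once, when the last reference is dropped,
 * which is what lets the driver defer GuC deregistration to destroy time. */
struct toy_ctx {
	atomic_int ref;
	bool released;
};

static void toy_ctx_get(struct toy_ctx *ce)
{
	atomic_fetch_add(&ce->ref, 1);
}

static void toy_ctx_release(struct toy_ctx *ce)
{
	/* In the driver this is where guc_lrc_desc_unpin() would send the
	 * H2G deregister message and lrc_destroy() would free the context. */
	ce->released = true;
}

static void toy_ctx_put(struct toy_ctx *ce)
{
	/* fetch_sub returns the old value: 1 means we dropped the last ref */
	if (atomic_fetch_sub(&ce->ref, 1) == 1)
		toy_ctx_release(ce);
}
```

This is also why guc_lrc_desc_pin() takes an extra intel_context_get() while waiting for the deregister-done G2H: the context must stay alive until the handler's intel_context_put().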
-static int guc_request_alloc(struct i915_request *request) +static bool context_needs_register(struct intel_context *ce, bool new_guc_id) { + return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) || + !lrc_desc_registered(ce_to_guc(ce), ce->guc_id); +} + +static int guc_request_alloc(struct i915_request *rq) +{ + struct intel_context *ce = rq->context; + struct intel_guc *guc = ce_to_guc(ce); int ret;
- GEM_BUG_ON(!intel_context_is_pinned(request->context)); + GEM_BUG_ON(!intel_context_is_pinned(rq->context));
/* * Flush enough space to reduce the likelihood of waiting after * we start building the request - in which case we will just * have to repeat work. */ - request->reserved_space += GUC_REQUEST_SIZE; + rq->reserved_space += GUC_REQUEST_SIZE;
/* * Note that after this point, we have committed to using @@ -483,56 +954,47 @@ static int guc_request_alloc(struct i915_request *request) */
/* Unconditionally invalidate GPU caches and TLBs. */ - ret = request->engine->emit_flush(request, EMIT_INVALIDATE); + ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE); if (ret) return ret;
- request->reserved_space -= GUC_REQUEST_SIZE; - return 0; -} - -static inline void queue_request(struct i915_sched_engine *sched_engine, - struct i915_request *rq, - int prio) -{ - GEM_BUG_ON(!list_empty(&rq->sched.link)); - list_add_tail(&rq->sched.link, - i915_sched_lookup_priolist(sched_engine, prio)); - set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); -} - -static int guc_bypass_tasklet_submit(struct intel_guc *guc, - struct i915_request *rq) -{ - int ret; - - __i915_request_submit(rq); + rq->reserved_space -= GUC_REQUEST_SIZE;
- trace_i915_request_in(rq, 0); - - guc_set_lrc_tail(rq); - ret = guc_add_request(guc, rq); - if (ret == -EBUSY) - guc->stalled_request = rq; - - return ret; -} - -static void guc_submit_request(struct i915_request *rq) -{ - struct i915_sched_engine *sched_engine = rq->engine->sched_engine; - struct intel_guc *guc = &rq->engine->gt->uc.guc; - unsigned long flags; + /* + * Call pin_guc_id here rather than in the pinning step as with + * dma_resv, contexts can be repeatedly pinned / unpinned, thrashing the + * guc_ids and creating horrible race conditions. This is especially bad + * when guc_ids are being stolen due to oversubscription. By the time + * this function is reached, it is guaranteed that the guc_id will be + * persistent until the generated request is retired, thus sealing these + * race conditions. It is still safe to fail here if guc_ids are + * exhausted and return -EAGAIN to the user indicating that they can try + * again in the future. + * + * There is no need for a lock here as the timeline mutex ensures at + * most one context can be executing this code path at once. The + * guc_id_ref is incremented once for every request in flight and + * decremented on each retire. When it is zero, a lock around the + * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id. + */ + if (atomic_add_unless(&ce->guc_id_ref, 1, 0)) + return 0;
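The atomic_add_unless(&ce->guc_id_ref, 1, 0) fast path above is an increment-unless-zero: a reference is only taken if one is already held, so the zero case always falls through to the locked pin_guc_id() path. A sketch of that primitive with C11 atomics:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Sketch of the kernel's atomic_add_unless(v, 1, 0) semantics: increment v
 * only if it is currently non-zero, returning whether the increment
 * happened. The CAS loop makes the check-and-increment atomic. */
static bool add_unless_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
		/* on failure the CAS reloaded 'old'; loop and retry */
	}
	return false;	/* count was zero: caller must take the slow path */
}
```

In the patch, a false result means guc_request_alloc() must call pin_guc_id(), which takes guc->contexts_lock to seal the race with unpin_guc_id().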
- /* Will be called from irq-context when using foreign fences. */ - spin_lock_irqsave(&sched_engine->lock, flags); + ret = pin_guc_id(guc, ce); /* returns 1 if new guc_id assigned */ + if (unlikely(ret < 0)) + return ret; + if (context_needs_register(ce, !!ret)) { + ret = guc_lrc_desc_pin(ce); + if (unlikely(ret)) { /* unwind */ + atomic_dec(&ce->guc_id_ref); + unpin_guc_id(guc, ce); + return ret; + } + }
- if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine)) - queue_request(sched_engine, rq, rq_prio(rq)); - else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY) - tasklet_hi_schedule(&sched_engine->tasklet); + clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
- spin_unlock_irqrestore(&sched_engine->lock, flags); + return 0; }
static void sanitize_hwsp(struct intel_engine_cs *engine) @@ -606,6 +1068,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine) engine->submit_request = guc_submit_request; }
+static inline void guc_kernel_context_pin(struct intel_guc *guc, + struct intel_context *ce) +{ + if (context_guc_id_invalid(ce)) + pin_guc_id(guc, ce); + guc_lrc_desc_pin(ce); +} + +static inline void guc_init_lrc_mapping(struct intel_guc *guc) +{ + struct intel_gt *gt = guc_to_gt(guc); + struct intel_engine_cs *engine; + enum intel_engine_id id; + + /* make sure all descriptors are clean... */ + xa_destroy(&guc->context_lookup); + + /* + * Some contexts might have been pinned before we enabled GuC + * submission, so we need to add them to the GuC bookkeeping. + * Also, after a GuC reset, we want to make sure that the information + * shared with the GuC is properly reset. The kernel lrcs are not + * attached to the gem_context, so they need to be added separately. + * + * Note: we purposely do not check the error return of + * guc_lrc_desc_pin, because that function can only fail in two cases. + * One, if there aren't enough free IDs, but we're guaranteed to have + * enough here (we're either only pinning a handful of lrc on first boot + * or we're re-pinning lrcs that were already pinned before the reset). + * Two, if the GuC has died and CTBs can't make forward progress. + * Presumably, the GuC should be alive as this function is called on + * driver load or after a reset. Even if it is dead, another full GPU + * reset will be triggered and this function would be called again. + */ + + for_each_engine(engine, gt, id) + if (engine->kernel_context) + guc_kernel_context_pin(guc, engine->kernel_context); +} + static void guc_release(struct intel_engine_cs *engine) { engine->sanitize = NULL; /* no longer in control, nothing to sanitize */ @@ -718,6 +1220,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
void intel_guc_submission_enable(struct intel_guc *guc) { + guc_init_lrc_mapping(guc); }
void intel_guc_submission_disable(struct intel_guc *guc) @@ -743,3 +1246,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc) { guc->submission_selected = __guc_submission_selected(guc); } + +static inline struct intel_context * +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx) +{ + struct intel_context *ce; + + if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) { + drm_dbg(&guc_to_gt(guc)->i915->drm, + "Invalid desc_idx %u", desc_idx); + return NULL; + } + + ce = __get_context(guc, desc_idx); + if (unlikely(!ce)) { + drm_dbg(&guc_to_gt(guc)->i915->drm, + "Context is NULL, desc_idx %u", desc_idx); + return NULL; + } + + return ce; +} + +int intel_guc_deregister_done_process_msg(struct intel_guc *guc, + const u32 *msg, + u32 len) +{ + struct intel_context *ce; + u32 desc_idx = msg[0]; + + if (unlikely(len < 1)) { + drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len); + return -EPROTO; + } + + ce = g2h_context_lookup(guc, desc_idx); + if (unlikely(!ce)) + return -EPROTO; + + if (context_wait_for_deregister_to_register(ce)) { + struct intel_runtime_pm *runtime_pm = + &ce->engine->gt->i915->runtime_pm; + intel_wakeref_t wakeref; + + /* + * Previous owner of this guc_id has been deregistered, now safe + * to register this context. + */ + with_intel_runtime_pm(runtime_pm, wakeref) + register_context(ce); + clr_context_wait_for_deregister_to_register(ce); + intel_context_put(ce); + } else if (context_destroyed(ce)) { + /* Context has been destroyed */ + release_guc_id(guc, ce); + lrc_destroy(&ce->ref); + } + + return 0; } diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index c857fafb8a30..a9c2242d61a2 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -4142,6 +4142,7 @@ enum { FAULT_AND_CONTINUE /* Unsupported */ };
+#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12) #define GEN8_CTX_VALID (1 << 0) #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1) #define GEN8_CTX_FORCE_RESTORE (1 << 2) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index c5989c0b83d3..9dad3df5eaf7 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -419,6 +419,7 @@ bool i915_request_retire(struct i915_request *rq) */ if (!list_empty(&rq->sched.link)) remove_from_engine(rq); + atomic_dec(&rq->context->guc_id_ref); GEM_BUG_ON(!llist_empty(&rq->execute_cb));
__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
On 24.06.2021 09:04, Matthew Brost wrote:
Implement GuC context operations which includes GuC specific operations alloc, pin, unpin, and destroy.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/intel_context.c | 5 + drivers/gpu/drm/i915/gt/intel_context_types.h | 22 +- drivers/gpu/drm/i915/gt/intel_lrc_reg.h | 1 - drivers/gpu/drm/i915/gt/uc/intel_guc.h | 34 + drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 4 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 664 ++++++++++++++++-- drivers/gpu/drm/i915/i915_reg.h | 1 + drivers/gpu/drm/i915/i915_request.c | 1 + 8 files changed, 677 insertions(+), 55 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 4033184f13b9..2b68af16222c 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -383,6 +383,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
mutex_init(&ce->pin_mutex);
+ spin_lock_init(&ce->guc_state.lock);
+ ce->guc_id = GUC_INVALID_LRC_ID;
+ INIT_LIST_HEAD(&ce->guc_id_link);
i915_active_init(&ce->active, __intel_context_active, __intel_context_retire, 0);
} diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index bb6fef7eae52..ce7c69b34cd1 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -95,6 +95,7 @@ struct intel_context { #define CONTEXT_BANNED 6 #define CONTEXT_FORCE_SINGLE_SUBMISSION 7 #define CONTEXT_NOPREEMPT 8 +#define CONTEXT_LRCA_DIRTY 9
struct { u64 timeout_us; @@ -137,14 +138,29 @@ struct intel_context {
u8 wa_bb_page; /* if set, page num reserved for context workarounds */
struct {
/** lock: protects everything in guc_state */
spinlock_t lock;
/**
* sched_state: scheduling state of this context using GuC
* submission
*/
u8 sched_state;
} guc_state;
/* GuC scheduling state that does not require a lock. */ atomic_t guc_sched_state_no_lock;
/* GuC lrc descriptor ID */
u16 guc_id;
/* GuC lrc descriptor reference count */
atomic_t guc_id_ref;
/* GuC ID link - in list when unpinned but guc_id still valid in GuC */
struct list_head guc_id_link;
Some fields are being added with kerneldoc comments and some without; what's the rule?
};
#endif /* __INTEL_CONTEXT_TYPES__ */ diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h index 41e5350a7a05..49d4857ad9b7 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h @@ -87,7 +87,6 @@ #define GEN11_CSB_WRITE_PTR_MASK (GEN11_CSB_PTR_MASK << 0)
#define MAX_CONTEXT_HW_ID (1 << 21) /* exclusive */ -#define MAX_GUC_CONTEXT_HW_ID (1 << 20) /* exclusive */ #define GEN11_MAX_CONTEXT_HW_ID (1 << 11) /* exclusive */ /* in Gen12 ID 0x7FF is reserved to indicate idle */ #define GEN12_MAX_CONTEXT_HW_ID (GEN11_MAX_CONTEXT_HW_ID - 1) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 9ba8219475b2..d44316dc914b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -44,6 +44,14 @@ struct intel_guc { void (*disable)(struct intel_guc *guc); } interrupts;
/*
* contexts_lock protects the pool of free guc ids and a linked list of
* guc ids available to be stolen
*/
spinlock_t contexts_lock;
struct ida guc_ids;
struct list_head guc_id_list;
bool submission_selected;
struct i915_vma *ads_vma;
@@ -102,6 +110,29 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len, response_buf, response_buf_size, 0); }
+static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
const u32 *action,
u32 len,
bool loop)
+{
- int err;
- /* No sleeping with spin locks, just busy loop */
- might_sleep_if(loop && (!in_atomic() && !irqs_disabled()));
+retry:
- err = intel_guc_send_nb(guc, action, len);
- if (unlikely(err == -EBUSY && loop)) {
if (likely(!in_atomic() && !irqs_disabled()))
cond_resched();
else
cpu_relax();
goto retry;
- }
- return err;
+}
static inline void intel_guc_to_host_event_handler(struct intel_guc *guc) { intel_guc_ct_event_handler(&guc->ct); @@ -203,6 +234,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask) int intel_guc_reset_engine(struct intel_guc *guc, struct intel_engine_cs *engine);
+int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
const u32 *msg, u32 len);
void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
#endif diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 8e0ed7d8feb3..42a7daef2ff6 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -901,6 +901,10 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r case INTEL_GUC_ACTION_DEFAULT: ret = intel_guc_to_host_process_recv_msg(guc, payload, len); break;
- case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
ret = intel_guc_deregister_done_process_msg(guc, payload,
len);
default: ret = -EOPNOTSUPP; break;break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 38aff83ee9fa..d39579ac2faa 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -13,7 +13,9 @@ #include "gt/intel_gt.h" #include "gt/intel_gt_irq.h" #include "gt/intel_gt_pm.h" +#include "gt/intel_gt_requests.h" #include "gt/intel_lrc.h" +#include "gt/intel_lrc_reg.h" #include "gt/intel_mocs.h" #include "gt/intel_ring.h"
@@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct intel_context *ce) &ce->guc_sched_state_no_lock); }
+/*
- Below is a set of functions which control the GuC scheduling state which
- require a lock, aside from the special case where the functions are called
- from guc_lrc_desc_pin(). In that case it isn't possible for any other code
- path to be executing on the context.
- */
+#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER BIT(0) +#define SCHED_STATE_DESTROYED BIT(1) +static inline void init_sched_state(struct intel_context *ce) +{
- /* Only should be called from guc_lrc_desc_pin() */
- atomic_set(&ce->guc_sched_state_no_lock, 0);
- ce->guc_state.sched_state = 0;
+}
+static inline bool +context_wait_for_deregister_to_register(struct intel_context *ce) +{
- return (ce->guc_state.sched_state &
SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
+}
+static inline void +set_context_wait_for_deregister_to_register(struct intel_context *ce) +{
- /* Only should be called from guc_lrc_desc_pin() */
- ce->guc_state.sched_state |=
SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
+}
+static inline void +clr_context_wait_for_deregister_to_register(struct intel_context *ce) +{
- lockdep_assert_held(&ce->guc_state.lock);
- ce->guc_state.sched_state =
(ce->guc_state.sched_state &
~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
+}
+static inline bool +context_destroyed(struct intel_context *ce) +{
- return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
+}
+static inline void +set_context_destroyed(struct intel_context *ce) +{
- lockdep_assert_held(&ce->guc_state.lock);
- ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
+}
+static inline bool context_guc_id_invalid(struct intel_context *ce) +{
- return (ce->guc_id == GUC_INVALID_LRC_ID);
+}
+static inline void set_context_guc_id_invalid(struct intel_context *ce) +{
- ce->guc_id = GUC_INVALID_LRC_ID;
+}
+static inline struct intel_guc *ce_to_guc(struct intel_context *ce) +{
- return &ce->engine->gt->uc.guc;
+}
static inline struct i915_priolist *to_priolist(struct rb_node *rb) { return rb_entry(rb, struct i915_priolist, node); @@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) int len = 0; bool enabled = context_enabled(ce);
- GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
- GEM_BUG_ON(context_guc_id_invalid(ce));
- if (!enabled) { action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET; action[len++] = ce->guc_id;
@@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
- spin_lock_init(&guc->contexts_lock);
- INIT_LIST_HEAD(&guc->guc_id_list);
- ida_init(&guc->guc_ids);
- return 0;
}
@@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct intel_guc *guc) i915_sched_engine_put(guc->sched_engine); }
-static int guc_context_alloc(struct intel_context *ce) +static inline void queue_request(struct i915_sched_engine *sched_engine,
struct i915_request *rq,
int prio)
{
- return lrc_alloc(ce, ce->engine);
- GEM_BUG_ON(!list_empty(&rq->sched.link));
- list_add_tail(&rq->sched.link,
i915_sched_lookup_priolist(sched_engine, prio));
- set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+}
+static int guc_bypass_tasklet_submit(struct intel_guc *guc,
struct i915_request *rq)
+{
- int ret;
- __i915_request_submit(rq);
- trace_i915_request_in(rq, 0);
- guc_set_lrc_tail(rq);
- ret = guc_add_request(guc, rq);
- if (ret == -EBUSY)
guc->stalled_request = rq;
- return ret;
+}
+static void guc_submit_request(struct i915_request *rq) +{
- struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
- struct intel_guc *guc = &rq->engine->gt->uc.guc;
- unsigned long flags;
- /* Will be called from irq-context when using foreign fences. */
- spin_lock_irqsave(&sched_engine->lock, flags);
- if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
queue_request(sched_engine, rq, rq_prio(rq));
- else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
tasklet_hi_schedule(&sched_engine->tasklet);
- spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
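The submit path above takes the direct (tasklet-bypass) route only when nothing is already stalled on the CTB and the scheduler is empty, so request ordering is preserved; otherwise the request is queued and the tasklet drains it. A tiny model of that decision (helper name invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* Bypass the tasklet only when no request is stalled waiting for CTB
 * space and the sched_engine has nothing queued ahead of us. */
static bool can_bypass_tasklet(bool stalled_request, bool sched_engine_empty)
{
	return !stalled_request && sched_engine_empty;
}
```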
+#define GUC_ID_START 64 /* First 64 guc_ids reserved */ +static int new_guc_id(struct intel_guc *guc) +{
- return ida_simple_get(&guc->guc_ids, GUC_ID_START,
GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
__GFP_RETRY_MAYFAIL | __GFP_NOWARN);
+}
+static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce) +{
- if (!context_guc_id_invalid(ce)) {
ida_simple_remove(&guc->guc_ids, ce->guc_id);
reset_lrc_desc(guc, ce->guc_id);
set_context_guc_id_invalid(ce);
- }
- if (!list_empty(&ce->guc_id_link))
list_del_init(&ce->guc_id_link);
+}
+static void release_guc_id(struct intel_guc *guc, struct intel_context *ce) +{
- unsigned long flags;
- spin_lock_irqsave(&guc->contexts_lock, flags);
- __release_guc_id(guc, ce);
- spin_unlock_irqrestore(&guc->contexts_lock, flags);
+}
+static int steal_guc_id(struct intel_guc *guc) +{
- struct intel_context *ce;
- int guc_id;
- if (!list_empty(&guc->guc_id_list)) {
ce = list_first_entry(&guc->guc_id_list,
struct intel_context,
guc_id_link);
GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
GEM_BUG_ON(context_guc_id_invalid(ce));
list_del_init(&ce->guc_id_link);
guc_id = ce->guc_id;
set_context_guc_id_invalid(ce);
return guc_id;
- } else {
return -EAGAIN;
- }
+}
+static int assign_guc_id(struct intel_guc *guc, u16 *out) +{
- int ret;
- ret = new_guc_id(guc);
- if (unlikely(ret < 0)) {
ret = steal_guc_id(guc);
if (ret < 0)
return ret;
- }
- *out = ret;
- return 0;
+}
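assign_guc_id() first tries a fresh ID from the allocator and only falls back to stealing one from an unpinned context when the pool is exhausted. A sketch of that fallback flow, with the two allocator results passed in as stubs (the stub parameters are invented for illustration):

```c
#include <assert.h>

/* Model of assign_guc_id(): try new_guc_id() first, fall back to
 * steal_guc_id() only on failure; propagate -EAGAIN when neither works. */
static int model_assign_guc_id(int fresh_id, int stolen_id, unsigned short *out)
{
	int ret = fresh_id;              /* new_guc_id() result */

	if (ret < 0) {
		ret = stolen_id;         /* steal_guc_id() result */
		if (ret < 0)
			return ret;      /* nothing to steal either */
	}
	*out = (unsigned short)ret;
	return 0;
}
```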
+#define PIN_GUC_ID_TRIES 4 +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce) +{
- int ret = 0;
- unsigned long flags, tries = PIN_GUC_ID_TRIES;
- GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
+try_again:
- spin_lock_irqsave(&guc->contexts_lock, flags);
- if (context_guc_id_invalid(ce)) {
ret = assign_guc_id(guc, &ce->guc_id);
if (ret)
goto out_unlock;
ret = 1; // Indicates newly assigned HW context
C++ style comment
- }
- if (!list_empty(&ce->guc_id_link))
list_del_init(&ce->guc_id_link);
- atomic_inc(&ce->guc_id_ref);
+out_unlock:
- spin_unlock_irqrestore(&guc->contexts_lock, flags);
- /*
* -EAGAIN indicates no guc_ids are available, let's retire any
* outstanding requests to see if that frees up a guc_id. If the first
* retire didn't help, insert a sleep with the timeslice duration before
* attempting to retire more requests. Double the sleep period each
* subsequent pass before finally giving up. The sleep period has a max of
* 100ms and a min of 1ms.
*/
- if (ret == -EAGAIN && --tries) {
if (PIN_GUC_ID_TRIES - tries > 1) {
unsigned int timeslice_shifted =
ce->engine->props.timeslice_duration_ms <<
(PIN_GUC_ID_TRIES - tries - 2);
unsigned int max = min_t(unsigned int, 100,
timeslice_shifted);
msleep(max_t(unsigned int, max, 1));
}
intel_gt_retire_requests(guc_to_gt(guc));
goto try_again;
- }
- return ret;
+}
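The retry comment above describes an exponential backoff: no sleep on the first retry, then the timeslice duration doubled on each further pass, clamped to a 100ms maximum and a 1ms minimum. A standalone re-creation of just that arithmetic (helper name invented for illustration; `tries` is the remaining try count as in the loop above):

```c
#include <assert.h>

#define PIN_GUC_ID_TRIES 4

/* Sleep period for a given number of remaining tries: timeslice shifted
 * left once per extra pass, clamped to [1, 100] ms. */
static unsigned int backoff_ms(unsigned int timeslice_ms, unsigned int tries)
{
	unsigned int shifted = timeslice_ms << (PIN_GUC_ID_TRIES - tries - 2);
	unsigned int max = shifted < 100 ? shifted : 100;

	return max > 1 ? max : 1;
}
```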
+static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce) +{
- unsigned long flags;
- GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
- spin_lock_irqsave(&guc->contexts_lock, flags);
- if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
!atomic_read(&ce->guc_id_ref))
list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
- spin_unlock_irqrestore(&guc->contexts_lock, flags);
+}
+static int __guc_action_register_context(struct intel_guc *guc,
u32 guc_id,
u32 offset)
+{
- u32 action[] = {
INTEL_GUC_ACTION_REGISTER_CONTEXT,
guc_id,
offset,
- };
- return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+}
+static int register_context(struct intel_context *ce) +{
- struct intel_guc *guc = ce_to_guc(ce);
- u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
ce->guc_id * sizeof(struct guc_lrc_desc);
- return __guc_action_register_context(guc, ce->guc_id, offset);
+}
+static int __guc_action_deregister_context(struct intel_guc *guc,
u32 guc_id)
+{
- u32 action[] = {
INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
guc_id,
- };
- return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+}
+static int deregister_context(struct intel_context *ce, u32 guc_id) +{
- struct intel_guc *guc = ce_to_guc(ce);
- return __guc_action_deregister_context(guc, guc_id);
+}
+static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask) +{
- switch (class) {
- case RENDER_CLASS:
return mask >> RCS0;
- case VIDEO_ENHANCEMENT_CLASS:
return mask >> VECS0;
- case VIDEO_DECODE_CLASS:
return mask >> VCS0;
- case COPY_ENGINE_CLASS:
return mask >> BCS0;
- default:
GEM_BUG_ON("Invalid Class");
return 0;
- }
+}
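adjust_engine_mask() rebases the physical engine mask so bit 0 corresponds to the first instance of the class, which is the layout GuC expects in engine_submit_mask. A sketch of the idea; the bit positions below are assumptions for illustration, not the exact i915 enum values:

```c
#include <assert.h>

/* Illustrative instance-0 bit positions; the real values live in the
 * i915 engine id enum. */
enum { MODEL_RCS0 = 0, MODEL_BCS0 = 1, MODEL_VCS0 = 2, MODEL_VECS0 = 6 };

/* Shift the mask down so the class's first instance lands at bit 0. */
static unsigned int adjust_mask(int class_first_instance, unsigned int mask)
{
	return mask >> class_first_instance;
}
```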
+static void guc_context_policy_init(struct intel_engine_cs *engine,
struct guc_lrc_desc *desc)
+{
- desc->policy_flags = 0;
- desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
- desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
+}
+static int guc_lrc_desc_pin(struct intel_context *ce) +{
- struct intel_runtime_pm *runtime_pm =
&ce->engine->gt->i915->runtime_pm;
- struct intel_engine_cs *engine = ce->engine;
- struct intel_guc *guc = &engine->gt->uc.guc;
- u32 desc_idx = ce->guc_id;
- struct guc_lrc_desc *desc;
- bool context_registered;
- intel_wakeref_t wakeref;
- int ret = 0;
- GEM_BUG_ON(!engine->mask);
- /*
* Ensure the LRC + CT vmas are in the same region, as the write barrier is done
* based on CT vma region.
*/
- GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
i915_gem_object_is_lmem(ce->ring->vma->obj));
- context_registered = lrc_desc_registered(guc, desc_idx);
- reset_lrc_desc(guc, desc_idx);
- set_lrc_desc_registered(guc, desc_idx, ce);
- desc = __get_lrc_desc(guc, desc_idx);
- desc->engine_class = engine_class_to_guc_class(engine->class);
- desc->engine_submit_mask = adjust_engine_mask(engine->class,
engine->mask);
- desc->hw_context_desc = ce->lrc.lrca;
- desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
- desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
- guc_context_policy_init(engine, desc);
- init_sched_state(ce);
- /*
* The context_lookup xarray is used to determine if the hardware
* context is currently registered. There are two cases in which it
* could be registered: either the guc_id has been stolen from
* another context, or the lrc descriptor address of this context has
* changed. In either case the context needs to be deregistered with the
* GuC before registering this context.
*/
- if (context_registered) {
set_context_wait_for_deregister_to_register(ce);
intel_context_get(ce);
/*
* If stealing the guc_id, this ce has the same guc_id as the
* context whose guc_id was stolen.
*/
with_intel_runtime_pm(runtime_pm, wakeref)
ret = deregister_context(ce, ce->guc_id);
- } else {
with_intel_runtime_pm(runtime_pm, wakeref)
ret = register_context(ce);
- }
- return ret;
}
static int guc_context_pre_pin(struct intel_context *ce, @@ -443,36 +813,137 @@ static int guc_context_pre_pin(struct intel_context *ce,
static int guc_context_pin(struct intel_context *ce, void *vaddr) {
- if (i915_ggtt_offset(ce->state) !=
(ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
- return lrc_pin(ce, ce->engine, vaddr);
}
+static void guc_context_unpin(struct intel_context *ce) +{
- unpin_guc_id(ce_to_guc(ce), ce);
- lrc_unpin(ce);
+}
+static void guc_context_post_unpin(struct intel_context *ce) +{
- lrc_post_unpin(ce);
+}
+static inline void guc_lrc_desc_unpin(struct intel_context *ce) +{
- struct intel_engine_cs *engine = ce->engine;
- struct intel_guc *guc = &engine->gt->uc.guc;
- unsigned long flags;
- GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
- GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
- spin_lock_irqsave(&ce->guc_state.lock, flags);
- set_context_destroyed(ce);
- spin_unlock_irqrestore(&ce->guc_state.lock, flags);
- deregister_context(ce, ce->guc_id);
+}
+static void guc_context_destroy(struct kref *kref) +{
- struct intel_context *ce = container_of(kref, typeof(*ce), ref);
- struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
- struct intel_guc *guc = &ce->engine->gt->uc.guc;
- intel_wakeref_t wakeref;
- unsigned long flags;
- /*
* If the guc_id is invalid this context has been stolen and we can free
* it immediately. Also can be freed immediately if the context is not
* registered with the GuC.
*/
- if (context_guc_id_invalid(ce) ||
!lrc_desc_registered(guc, ce->guc_id)) {
release_guc_id(guc, ce);
lrc_destroy(kref);
return;
- }
- /*
* We have to acquire the context spinlock and check guc_id again, if it
* is valid it hasn't been stolen and needs to be deregistered. We
* delete this context from the list of unpinned guc_ids available to
* steal, to seal a race with guc_lrc_desc_pin(). When the G2H CTB
* returns indicating this context has been deregistered the guc_id is
* returned to the pool of available guc_ids.
*/
- spin_lock_irqsave(&guc->contexts_lock, flags);
- if (context_guc_id_invalid(ce)) {
__release_guc_id(guc, ce);
spin_unlock_irqrestore(&guc->contexts_lock, flags);
lrc_destroy(kref);
return;
- }
- if (!list_empty(&ce->guc_id_link))
list_del_init(&ce->guc_id_link);
- spin_unlock_irqrestore(&guc->contexts_lock, flags);
- /*
* We defer GuC context deregistration until the context is destroyed
* in order to save on CTBs. With this optimization ideally we only need
* 1 CTB to register the context during the first pin and 1 CTB to
* deregister the context when the context is destroyed. Without this
* optimization, a CTB would be needed every pin & unpin.
*
* XXX: Need to acquire the runtime wakeref as this can be triggered
* from context_free_worker when no runtime wakeref is held.
* guc_lrc_desc_unpin requires the runtime as a GuC register is written
* in H2G CTB to deregister the context. A future patch may defer this
* H2G CTB if the runtime wakeref is zero.
*/
- with_intel_runtime_pm(runtime_pm, wakeref)
guc_lrc_desc_unpin(ce);
+}
+static int guc_context_alloc(struct intel_context *ce) +{
- return lrc_alloc(ce, ce->engine);
+}
static const struct intel_context_ops guc_context_ops = { .alloc = guc_context_alloc,
.pre_pin = guc_context_pre_pin, .pin = guc_context_pin,
- .unpin = lrc_unpin,
- .post_unpin = lrc_post_unpin,
.unpin = guc_context_unpin,
.post_unpin = guc_context_post_unpin,
.enter = intel_context_enter_engine, .exit = intel_context_exit_engine,
.reset = lrc_reset,
- .destroy = lrc_destroy,
- .destroy = guc_context_destroy,
};
-static int guc_request_alloc(struct i915_request *request) +static bool context_needs_register(struct intel_context *ce, bool new_guc_id) {
- return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
+}
+static int guc_request_alloc(struct i915_request *rq) +{
- struct intel_context *ce = rq->context;
- struct intel_guc *guc = ce_to_guc(ce); int ret;
- GEM_BUG_ON(!intel_context_is_pinned(request->context));
GEM_BUG_ON(!intel_context_is_pinned(rq->context));
/*
- Flush enough space to reduce the likelihood of waiting after
- we start building the request - in which case we will just
- have to repeat work.
*/
- request->reserved_space += GUC_REQUEST_SIZE;
rq->reserved_space += GUC_REQUEST_SIZE;
/*
- Note that after this point, we have committed to using
@@ -483,56 +954,47 @@ static int guc_request_alloc(struct i915_request *request) */
/* Unconditionally invalidate GPU caches and TLBs. */
- ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
- ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE); if (ret) return ret;
- request->reserved_space -= GUC_REQUEST_SIZE;
- return 0;
-}
-static inline void queue_request(struct i915_sched_engine *sched_engine,
struct i915_request *rq,
int prio)
-{
- GEM_BUG_ON(!list_empty(&rq->sched.link));
- list_add_tail(&rq->sched.link,
i915_sched_lookup_priolist(sched_engine, prio));
- set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-}
-static int guc_bypass_tasklet_submit(struct intel_guc *guc,
struct i915_request *rq)
-{
- int ret;
- __i915_request_submit(rq);
- rq->reserved_space -= GUC_REQUEST_SIZE;
- trace_i915_request_in(rq, 0);
- guc_set_lrc_tail(rq);
- ret = guc_add_request(guc, rq);
- if (ret == -EBUSY)
guc->stalled_request = rq;
- return ret;
-}
-static void guc_submit_request(struct i915_request *rq) -{
- struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
- struct intel_guc *guc = &rq->engine->gt->uc.guc;
- unsigned long flags;
- /*
* Call pin_guc_id here rather than in the pinning step as with
* dma_resv, contexts can be repeatedly pinned / unpinned, thrashing the
* guc_ids and creating horrible race conditions. This is especially bad
* when guc_ids are being stolen due to over subscription. By the time
* this function is reached, it is guaranteed that the guc_id will be
* persistent until the generated request is retired. Thus, sealing these
* race conditions. It is still safe to fail here if guc_ids are
* exhausted and return -EAGAIN to the user indicating that they can try
* again in the future.
*
* There is no need for a lock here as the timeline mutex ensures at
* most one context can be executing this code path at once. The
* guc_id_ref is incremented once for every request in flight and
* decremented on each retire. When it is zero, a lock around the
* increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
*/
- if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
return 0;
- /* Will be called from irq-context when using foreign fences. */
- spin_lock_irqsave(&sched_engine->lock, flags);
- ret = pin_guc_id(guc, ce); /* returns 1 if new guc_id assigned */
- if (unlikely(ret < 0))
return ret;
- if (context_needs_register(ce, !!ret)) {
ret = guc_lrc_desc_pin(ce);
if (unlikely(ret)) { /* unwind */
atomic_dec(&ce->guc_id_ref);
unpin_guc_id(guc, ce);
return ret;
}
- }
- if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
queue_request(sched_engine, rq, rq_prio(rq));
- else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
tasklet_hi_schedule(&sched_engine->tasklet);
- clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
- spin_unlock_irqrestore(&sched_engine->lock, flags);
- return 0;
}
static void sanitize_hwsp(struct intel_engine_cs *engine) @@ -606,6 +1068,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine) engine->submit_request = guc_submit_request; }
+static inline void guc_kernel_context_pin(struct intel_guc *guc,
struct intel_context *ce)
+{
- if (context_guc_id_invalid(ce))
pin_guc_id(guc, ce);
- guc_lrc_desc_pin(ce);
+}
+static inline void guc_init_lrc_mapping(struct intel_guc *guc) +{
- struct intel_gt *gt = guc_to_gt(guc);
- struct intel_engine_cs *engine;
- enum intel_engine_id id;
- /* make sure all descriptors are clean... */
- xa_destroy(&guc->context_lookup);
- /*
* Some contexts might have been pinned before we enabled GuC
* submission, so we need to add them to the GuC bookkeeping.
* Also, after a GuC reset we want to make sure that the information
* shared with GuC is properly reset. The kernel lrcs are not attached
* to the gem_context, so they need to be added separately.
*
* Note: we purposely do not check the error return of
* guc_lrc_desc_pin, because that function can only fail in two cases.
* One, if there aren't enough free IDs, but we're guaranteed to have
* enough here (we're either only pinning a handful of lrc on first boot
* or we're re-pinning lrcs that were already pinned before the reset).
* Two, if the GuC has died and CTBs can't make forward progress.
* Presumably, the GuC should be alive as this function is called on
* driver load or after a reset. Even if it is dead, another full GPU
* reset will be triggered and this function would be called again.
*/
- for_each_engine(engine, gt, id)
if (engine->kernel_context)
guc_kernel_context_pin(guc, engine->kernel_context);
+}
static void guc_release(struct intel_engine_cs *engine) { engine->sanitize = NULL; /* no longer in control, nothing to sanitize */ @@ -718,6 +1220,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
void intel_guc_submission_enable(struct intel_guc *guc) {
- guc_init_lrc_mapping(guc);
}
void intel_guc_submission_disable(struct intel_guc *guc) @@ -743,3 +1246,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc) { guc->submission_selected = __guc_submission_selected(guc); }
+static inline struct intel_context * +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx) +{
- struct intel_context *ce;
- if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
drm_dbg(&guc_to_gt(guc)->i915->drm,
"Invalid desc_idx %u", desc_idx);
just debug? why not a (fatal) error?
return NULL;
- }
- ce = __get_context(guc, desc_idx);
- if (unlikely(!ce)) {
drm_dbg(&guc_to_gt(guc)->i915->drm,
"Context is NULL, desc_idx %u", desc_idx);
not an error?
return NULL;
- }
- return ce;
+}
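g2h_context_lookup() performs two validation steps on a descriptor index coming from the GuC: a bounds check, then a registered-context check. A small model, with a fixed-size table standing in for the context_lookup xarray:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_DESC 8

/* Reject out-of-range indices, then indices with no registered context;
 * both cases return NULL so the caller can treat the G2H as malformed. */
static void *model_g2h_lookup(void *table[MODEL_MAX_DESC], unsigned int desc_idx)
{
	if (desc_idx >= MODEL_MAX_DESC)
		return NULL;		/* invalid index from GuC */

	return table[desc_idx];		/* may be NULL: not registered */
}
```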
+int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
const u32 *msg,
u32 len)
+{
- struct intel_context *ce;
- u32 desc_idx = msg[0];
- if (unlikely(len < 1)) {
drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
return -EPROTO;
- }
- ce = g2h_context_lookup(guc, desc_idx);
- if (unlikely(!ce))
return -EPROTO;
- if (context_wait_for_deregister_to_register(ce)) {
struct intel_runtime_pm *runtime_pm =
&ce->engine->gt->i915->runtime_pm;
intel_wakeref_t wakeref;
/*
* Previous owner of this guc_id has been deregistered, now safe
* register this context.
*/
with_intel_runtime_pm(runtime_pm, wakeref)
register_context(ce);
clr_context_wait_for_deregister_to_register(ce);
intel_context_put(ce);
- } else if (context_destroyed(ce)) {
/* Context has been destroyed */
release_guc_id(guc, ce);
lrc_destroy(&ce->ref);
- }
- return 0;
+} diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index c857fafb8a30..a9c2242d61a2 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -4142,6 +4142,7 @@ enum { FAULT_AND_CONTINUE /* Unsupported */ };
+#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12) #define GEN8_CTX_VALID (1 << 0) #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1) #define GEN8_CTX_FORCE_RESTORE (1 << 2) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index c5989c0b83d3..9dad3df5eaf7 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -419,6 +419,7 @@ bool i915_request_retire(struct i915_request *rq) */ if (!list_empty(&rq->sched.link)) remove_from_engine(rq);
atomic_dec(&rq->context->guc_id_ref); GEM_BUG_ON(!llist_empty(&rq->execute_cb));
__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
On Fri, Jun 25, 2021 at 03:25:13PM +0200, Michal Wajdeczko wrote:
On 24.06.2021 09:04, Matthew Brost wrote:
Implement GuC context operations, which include the GuC-specific alloc, pin, unpin, and destroy operations.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/intel_context.c | 5 + drivers/gpu/drm/i915/gt/intel_context_types.h | 22 +- drivers/gpu/drm/i915/gt/intel_lrc_reg.h | 1 - drivers/gpu/drm/i915/gt/uc/intel_guc.h | 34 + drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 4 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 664 ++++++++++++++++-- drivers/gpu/drm/i915/i915_reg.h | 1 + drivers/gpu/drm/i915/i915_request.c | 1 + 8 files changed, 677 insertions(+), 55 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 4033184f13b9..2b68af16222c 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -383,6 +383,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
mutex_init(&ce->pin_mutex);
- spin_lock_init(&ce->guc_state.lock);
- ce->guc_id = GUC_INVALID_LRC_ID;
- INIT_LIST_HEAD(&ce->guc_id_link);
- i915_active_init(&ce->active, __intel_context_active, __intel_context_retire, 0);
} diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index bb6fef7eae52..ce7c69b34cd1 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -95,6 +95,7 @@ struct intel_context { #define CONTEXT_BANNED 6 #define CONTEXT_FORCE_SINGLE_SUBMISSION 7 #define CONTEXT_NOPREEMPT 8 +#define CONTEXT_LRCA_DIRTY 9
struct { u64 timeout_us; @@ -137,14 +138,29 @@ struct intel_context {
u8 wa_bb_page; /* if set, page num reserved for context workarounds */
struct {
/** lock: protects everything in guc_state */
spinlock_t lock;
/**
* sched_state: scheduling state of this context using GuC
* submission
*/
u8 sched_state;
} guc_state;
/* GuC scheduling state that does not require a lock. */ atomic_t guc_sched_state_no_lock;
/* GuC lrc descriptor ID */
u16 guc_id;
/* GuC lrc descriptor reference count */
atomic_t guc_id_ref;
/*
* GuC lrc descriptor ID - Not assigned in this patch but future patches
* in the series will.
*/* GuC ID link - in list when unpinned but guc_id still valid in GuC
- u16 guc_id;
- struct list_head guc_id_link;
some fields are being added with kerneldoc, some without. What's the rule?
Yea, idk. I think we need to scrub all the structures in the driver and add kernel doc everywhere. IMO not a blocker though, as I think all the structures are going to be reworked with OO concepts after the GuC code lands, before moving to the DRM scheduler. That would be the logical time to update all the kernel doc too.
};
#endif /* __INTEL_CONTEXT_TYPES__ */ diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h index 41e5350a7a05..49d4857ad9b7 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h @@ -87,7 +87,6 @@ #define GEN11_CSB_WRITE_PTR_MASK (GEN11_CSB_PTR_MASK << 0)
#define MAX_CONTEXT_HW_ID (1 << 21) /* exclusive */ -#define MAX_GUC_CONTEXT_HW_ID (1 << 20) /* exclusive */ #define GEN11_MAX_CONTEXT_HW_ID (1 << 11) /* exclusive */ /* in Gen12 ID 0x7FF is reserved to indicate idle */ #define GEN12_MAX_CONTEXT_HW_ID (GEN11_MAX_CONTEXT_HW_ID - 1) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 9ba8219475b2..d44316dc914b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -44,6 +44,14 @@ struct intel_guc { void (*disable)(struct intel_guc *guc); } interrupts;
/*
* contexts_lock protects the pool of free guc ids and a linked list of
* guc ids available to be stolen
*/
spinlock_t contexts_lock;
struct ida guc_ids;
struct list_head guc_id_list;
bool submission_selected;
struct i915_vma *ads_vma;
@@ -102,6 +110,29 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len, response_buf, response_buf_size, 0); }
+static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
const u32 *action,
u32 len,
bool loop)
+{
- int err;
- /* No sleeping with spin locks, just busy loop */
- might_sleep_if(loop && (!in_atomic() && !irqs_disabled()));
+retry:
- err = intel_guc_send_nb(guc, action, len);
- if (unlikely(err == -EBUSY && loop)) {
if (likely(!in_atomic() && !irqs_disabled()))
cond_resched();
else
cpu_relax();
goto retry;
- }
- return err;
+}
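The busy-loop helper above simply resends while the transport reports -EBUSY. A userspace model of that control flow; the kernel version additionally picks cond_resched() vs cpu_relax() depending on context, and the stub transport below (succeeding on the third attempt) is invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_EBUSY 16

static int attempts;

/* Stub for intel_guc_send_nb(): busy twice, then success. */
static int stub_send(void)
{
	return ++attempts < 3 ? -MODEL_EBUSY : 0;
}

/* Retry while busy if loop is set; otherwise return the error as-is. */
static int model_send_busy_loop(bool loop)
{
	int err;

retry:
	err = stub_send();
	if (err == -MODEL_EBUSY && loop)
		goto retry;
	return err;
}
```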
static inline void intel_guc_to_host_event_handler(struct intel_guc *guc) { intel_guc_ct_event_handler(&guc->ct); @@ -203,6 +234,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask) int intel_guc_reset_engine(struct intel_guc *guc, struct intel_engine_cs *engine);
+int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
const u32 *msg, u32 len);
void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
#endif diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 8e0ed7d8feb3..42a7daef2ff6 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -901,6 +901,10 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r case INTEL_GUC_ACTION_DEFAULT: ret = intel_guc_to_host_process_recv_msg(guc, payload, len); break;
- case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
ret = intel_guc_deregister_done_process_msg(guc, payload,
len);
break;
default: ret = -EOPNOTSUPP; break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 38aff83ee9fa..d39579ac2faa 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -13,7 +13,9 @@ #include "gt/intel_gt.h" #include "gt/intel_gt_irq.h" #include "gt/intel_gt_pm.h" +#include "gt/intel_gt_requests.h" #include "gt/intel_lrc.h" +#include "gt/intel_lrc_reg.h" #include "gt/intel_mocs.h" #include "gt/intel_ring.h"
@@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct intel_context *ce) &ce->guc_sched_state_no_lock); }
+/*
- Below is a set of functions which control the GuC scheduling state and
- require a lock, aside from the special case where the functions are called
- from guc_lrc_desc_pin(). In that case it isn't possible for any other code
- path to be executing on the context.
- */
+#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER BIT(0) +#define SCHED_STATE_DESTROYED BIT(1) +static inline void init_sched_state(struct intel_context *ce) +{
- /* Only should be called from guc_lrc_desc_pin() */
- atomic_set(&ce->guc_sched_state_no_lock, 0);
- ce->guc_state.sched_state = 0;
+}
+static inline bool +context_wait_for_deregister_to_register(struct intel_context *ce) +{
- return (ce->guc_state.sched_state &
SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
+}
+static inline void +set_context_wait_for_deregister_to_register(struct intel_context *ce) +{
- /* Only should be called from guc_lrc_desc_pin() */
- ce->guc_state.sched_state |=
SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
+}
+static inline void +clr_context_wait_for_deregister_to_register(struct intel_context *ce) +{
- lockdep_assert_held(&ce->guc_state.lock);
- ce->guc_state.sched_state =
(ce->guc_state.sched_state &
~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
+}
+static inline bool +context_destroyed(struct intel_context *ce) +{
- return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
+}
+static inline void +set_context_destroyed(struct intel_context *ce) +{
- lockdep_assert_held(&ce->guc_state.lock);
- ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
+}
+static inline bool context_guc_id_invalid(struct intel_context *ce) +{
- return (ce->guc_id == GUC_INVALID_LRC_ID);
+}
+static inline void set_context_guc_id_invalid(struct intel_context *ce) +{
- ce->guc_id = GUC_INVALID_LRC_ID;
+}
+static inline struct intel_guc *ce_to_guc(struct intel_context *ce) +{
- return &ce->engine->gt->uc.guc;
+}
static inline struct i915_priolist *to_priolist(struct rb_node *rb) { return rb_entry(rb, struct i915_priolist, node); @@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) int len = 0; bool enabled = context_enabled(ce);
- GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
- GEM_BUG_ON(context_guc_id_invalid(ce));
- if (!enabled) { action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET; action[len++] = ce->guc_id;
@@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
- spin_lock_init(&guc->contexts_lock);
- INIT_LIST_HEAD(&guc->guc_id_list);
- ida_init(&guc->guc_ids);
- return 0;
}
@@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct intel_guc *guc) i915_sched_engine_put(guc->sched_engine); }
-static int guc_context_alloc(struct intel_context *ce) +static inline void queue_request(struct i915_sched_engine *sched_engine,
struct i915_request *rq,
int prio)
{
- return lrc_alloc(ce, ce->engine);
- GEM_BUG_ON(!list_empty(&rq->sched.link));
- list_add_tail(&rq->sched.link,
i915_sched_lookup_priolist(sched_engine, prio));
- set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+}
+static int guc_bypass_tasklet_submit(struct intel_guc *guc,
struct i915_request *rq)
+{
- int ret;
- __i915_request_submit(rq);
- trace_i915_request_in(rq, 0);
- guc_set_lrc_tail(rq);
- ret = guc_add_request(guc, rq);
- if (ret == -EBUSY)
guc->stalled_request = rq;
- return ret;
+}
+static void guc_submit_request(struct i915_request *rq) +{
- struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
- struct intel_guc *guc = &rq->engine->gt->uc.guc;
- unsigned long flags;
- /* Will be called from irq-context when using foreign fences. */
- spin_lock_irqsave(&sched_engine->lock, flags);
- if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
queue_request(sched_engine, rq, rq_prio(rq));
- else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
tasklet_hi_schedule(&sched_engine->tasklet);
- spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+#define GUC_ID_START 64 /* First 64 guc_ids reserved */ +static int new_guc_id(struct intel_guc *guc) +{
- return ida_simple_get(&guc->guc_ids, GUC_ID_START,
GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
__GFP_RETRY_MAYFAIL | __GFP_NOWARN);
+}
+static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce) +{
- if (!context_guc_id_invalid(ce)) {
ida_simple_remove(&guc->guc_ids, ce->guc_id);
reset_lrc_desc(guc, ce->guc_id);
set_context_guc_id_invalid(ce);
- }
- if (!list_empty(&ce->guc_id_link))
list_del_init(&ce->guc_id_link);
+}
+static void release_guc_id(struct intel_guc *guc, struct intel_context *ce) +{
- unsigned long flags;
- spin_lock_irqsave(&guc->contexts_lock, flags);
- __release_guc_id(guc, ce);
- spin_unlock_irqrestore(&guc->contexts_lock, flags);
+}
+static int steal_guc_id(struct intel_guc *guc) +{
- struct intel_context *ce;
- int guc_id;
- if (!list_empty(&guc->guc_id_list)) {
ce = list_first_entry(&guc->guc_id_list,
struct intel_context,
guc_id_link);
GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
GEM_BUG_ON(context_guc_id_invalid(ce));
list_del_init(&ce->guc_id_link);
guc_id = ce->guc_id;
set_context_guc_id_invalid(ce);
return guc_id;
- } else {
return -EAGAIN;
- }
+}
+static int assign_guc_id(struct intel_guc *guc, u16 *out) +{
- int ret;
- ret = new_guc_id(guc);
- if (unlikely(ret < 0)) {
ret = steal_guc_id(guc);
if (ret < 0)
return ret;
- }
- *out = ret;
- return 0;
+}
+#define PIN_GUC_ID_TRIES 4 +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce) +{
- int ret = 0;
- unsigned long flags, tries = PIN_GUC_ID_TRIES;
- GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
+try_again:
- spin_lock_irqsave(&guc->contexts_lock, flags);
- if (context_guc_id_invalid(ce)) {
ret = assign_guc_id(guc, &ce->guc_id);
if (ret)
goto out_unlock;
ret = 1; // Indicates newly assigned HW context
C++ style comment
Yep, will fix.
- }
- if (!list_empty(&ce->guc_id_link))
list_del_init(&ce->guc_id_link);
- atomic_inc(&ce->guc_id_ref);
+out_unlock:
- spin_unlock_irqrestore(&guc->contexts_lock, flags);
- /*
* -EAGAIN indicates no guc_ids are available, let's retire any
* outstanding requests to see if that frees up a guc_id. If the first
* retire didn't help, insert a sleep with the timeslice duration before
* attempting to retire more requests. Double the sleep period each
* subsequent pass before finally giving up. The sleep period has a max of
* 100ms and a min of 1ms.
*/
- if (ret == -EAGAIN && --tries) {
if (PIN_GUC_ID_TRIES - tries > 1) {
unsigned int timeslice_shifted =
ce->engine->props.timeslice_duration_ms <<
(PIN_GUC_ID_TRIES - tries - 2);
unsigned int max = min_t(unsigned int, 100,
timeslice_shifted);
msleep(max_t(unsigned int, max, 1));
}
intel_gt_retire_requests(guc_to_gt(guc));
goto try_again;
- }
- return ret;
+}
+static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce) +{
- unsigned long flags;
- GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
- spin_lock_irqsave(&guc->contexts_lock, flags);
- if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
!atomic_read(&ce->guc_id_ref))
list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
- spin_unlock_irqrestore(&guc->contexts_lock, flags);
+}
+static int __guc_action_register_context(struct intel_guc *guc,
u32 guc_id,
u32 offset)
+{
- u32 action[] = {
INTEL_GUC_ACTION_REGISTER_CONTEXT,
guc_id,
offset,
- };
- return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+}
+static int register_context(struct intel_context *ce) +{
- struct intel_guc *guc = ce_to_guc(ce);
- u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
ce->guc_id * sizeof(struct guc_lrc_desc);
- return __guc_action_register_context(guc, ce->guc_id, offset);
+}
+static int __guc_action_deregister_context(struct intel_guc *guc,
u32 guc_id)
+{
- u32 action[] = {
INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
guc_id,
- };
- return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+}
+static int deregister_context(struct intel_context *ce, u32 guc_id) +{
- struct intel_guc *guc = ce_to_guc(ce);
- return __guc_action_deregister_context(guc, guc_id);
+}
+static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask) +{
- switch (class) {
- case RENDER_CLASS:
return mask >> RCS0;
- case VIDEO_ENHANCEMENT_CLASS:
return mask >> VECS0;
- case VIDEO_DECODE_CLASS:
return mask >> VCS0;
- case COPY_ENGINE_CLASS:
return mask >> BCS0;
- default:
GEM_BUG_ON("Invalid Class");
return 0;
- }
+}
+static void guc_context_policy_init(struct intel_engine_cs *engine,
struct guc_lrc_desc *desc)
+{
- desc->policy_flags = 0;
- desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
- desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
+}
+static int guc_lrc_desc_pin(struct intel_context *ce) +{
- struct intel_runtime_pm *runtime_pm =
&ce->engine->gt->i915->runtime_pm;
- struct intel_engine_cs *engine = ce->engine;
- struct intel_guc *guc = &engine->gt->uc.guc;
- u32 desc_idx = ce->guc_id;
- struct guc_lrc_desc *desc;
- bool context_registered;
- intel_wakeref_t wakeref;
- int ret = 0;
- GEM_BUG_ON(!engine->mask);
- /*
* Ensure the LRC and CT vmas are in the same region, as the write
* barrier is done based on the CT vma region.
*/
- GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
i915_gem_object_is_lmem(ce->ring->vma->obj));
- context_registered = lrc_desc_registered(guc, desc_idx);
- reset_lrc_desc(guc, desc_idx);
- set_lrc_desc_registered(guc, desc_idx, ce);
- desc = __get_lrc_desc(guc, desc_idx);
- desc->engine_class = engine_class_to_guc_class(engine->class);
- desc->engine_submit_mask = adjust_engine_mask(engine->class,
engine->mask);
- desc->hw_context_desc = ce->lrc.lrca;
- desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
- desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
- guc_context_policy_init(engine, desc);
- init_sched_state(ce);
- /*
* The context_lookup xarray is used to determine if the hardware
* context is currently registered. There are two cases in which it
* could be registered: either the guc_id has been stolen from
* another context or the lrc descriptor address of this context has
* changed. In either case the context needs to be deregistered with the
* GuC before registering this context.
*/
- if (context_registered) {
set_context_wait_for_deregister_to_register(ce);
intel_context_get(ce);
/*
* If stealing the guc_id, this ce has the same guc_id as the
* context whose guc_id was stolen.
*/
with_intel_runtime_pm(runtime_pm, wakeref)
ret = deregister_context(ce, ce->guc_id);
- } else {
with_intel_runtime_pm(runtime_pm, wakeref)
ret = register_context(ce);
- }
- return ret;
}
static int guc_context_pre_pin(struct intel_context *ce, @@ -443,36 +813,137 @@ static int guc_context_pre_pin(struct intel_context *ce,
static int guc_context_pin(struct intel_context *ce, void *vaddr) {
- if (i915_ggtt_offset(ce->state) !=
(ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
- return lrc_pin(ce, ce->engine, vaddr);
}
+static void guc_context_unpin(struct intel_context *ce) +{
- unpin_guc_id(ce_to_guc(ce), ce);
- lrc_unpin(ce);
+}
+static void guc_context_post_unpin(struct intel_context *ce) +{
- lrc_post_unpin(ce);
+}
+static inline void guc_lrc_desc_unpin(struct intel_context *ce) +{
- struct intel_engine_cs *engine = ce->engine;
- struct intel_guc *guc = &engine->gt->uc.guc;
- unsigned long flags;
- GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
- GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
- spin_lock_irqsave(&ce->guc_state.lock, flags);
- set_context_destroyed(ce);
- spin_unlock_irqrestore(&ce->guc_state.lock, flags);
- deregister_context(ce, ce->guc_id);
+}
+static void guc_context_destroy(struct kref *kref) +{
- struct intel_context *ce = container_of(kref, typeof(*ce), ref);
- struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
- struct intel_guc *guc = &ce->engine->gt->uc.guc;
- intel_wakeref_t wakeref;
- unsigned long flags;
- /*
* If the guc_id is invalid this context has been stolen and we can free
* it immediately. Also can be freed immediately if the context is not
* registered with the GuC.
*/
- if (context_guc_id_invalid(ce) ||
!lrc_desc_registered(guc, ce->guc_id)) {
release_guc_id(guc, ce);
lrc_destroy(kref);
return;
- }
- /*
* We have to acquire the context spinlock and check guc_id again, if it
* is valid it hasn't been stolen and needs to be deregistered. We
* delete this context from the list of unpinned guc_ids available to
* steal, to seal a race with guc_lrc_desc_pin(). When the G2H CTB
* returns indicating this context has been deregistered the guc_id is
* returned to the pool of available guc_ids.
*/
- spin_lock_irqsave(&guc->contexts_lock, flags);
- if (context_guc_id_invalid(ce)) {
__release_guc_id(guc, ce);
spin_unlock_irqrestore(&guc->contexts_lock, flags);
lrc_destroy(kref);
return;
- }
- if (!list_empty(&ce->guc_id_link))
list_del_init(&ce->guc_id_link);
- spin_unlock_irqrestore(&guc->contexts_lock, flags);
- /*
* We defer GuC context deregistration until the context is destroyed
* in order to save on CTBs. With this optimization ideally we only need
* 1 CTB to register the context during the first pin and 1 CTB to
* deregister the context when the context is destroyed. Without this
* optimization, a CTB would be needed every pin & unpin.
*
* XXX: Need to acquire the runtime wakeref as this can be triggered
* from context_free_worker when no runtime wakeref is held.
* guc_lrc_desc_unpin requires the runtime as a GuC register is written
* in H2G CTB to deregister the context. A future patch may defer this
* H2G CTB if the runtime wakeref is zero.
*/
- with_intel_runtime_pm(runtime_pm, wakeref)
guc_lrc_desc_unpin(ce);
+}
+static int guc_context_alloc(struct intel_context *ce) +{
- return lrc_alloc(ce, ce->engine);
+}
static const struct intel_context_ops guc_context_ops = { .alloc = guc_context_alloc,
.pre_pin = guc_context_pre_pin, .pin = guc_context_pin,
- .unpin = lrc_unpin,
- .post_unpin = lrc_post_unpin,
.unpin = guc_context_unpin,
.post_unpin = guc_context_post_unpin,
.enter = intel_context_enter_engine, .exit = intel_context_exit_engine,
.reset = lrc_reset,
- .destroy = lrc_destroy,
- .destroy = guc_context_destroy,
};
-static int guc_request_alloc(struct i915_request *request) +static bool context_needs_register(struct intel_context *ce, bool new_guc_id) {
- return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
+}
+static int guc_request_alloc(struct i915_request *rq) +{
- struct intel_context *ce = rq->context;
- struct intel_guc *guc = ce_to_guc(ce); int ret;
- GEM_BUG_ON(!intel_context_is_pinned(request->context));
GEM_BUG_ON(!intel_context_is_pinned(rq->context));
/*
- Flush enough space to reduce the likelihood of waiting after
- we start building the request - in which case we will just
- have to repeat work.
*/
- request->reserved_space += GUC_REQUEST_SIZE;
rq->reserved_space += GUC_REQUEST_SIZE;
/*
- Note that after this point, we have committed to using
@@ -483,56 +954,47 @@ static int guc_request_alloc(struct i915_request *request) */
/* Unconditionally invalidate GPU caches and TLBs. */
- ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
- ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE); if (ret) return ret;
- request->reserved_space -= GUC_REQUEST_SIZE;
- return 0;
-}
-static inline void queue_request(struct i915_sched_engine *sched_engine,
struct i915_request *rq,
int prio)
-{
- GEM_BUG_ON(!list_empty(&rq->sched.link));
- list_add_tail(&rq->sched.link,
i915_sched_lookup_priolist(sched_engine, prio));
- set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-}
-static int guc_bypass_tasklet_submit(struct intel_guc *guc,
struct i915_request *rq)
-{
- int ret;
- __i915_request_submit(rq);
- rq->reserved_space -= GUC_REQUEST_SIZE;
- trace_i915_request_in(rq, 0);
- guc_set_lrc_tail(rq);
- ret = guc_add_request(guc, rq);
- if (ret == -EBUSY)
guc->stalled_request = rq;
- return ret;
-}
-static void guc_submit_request(struct i915_request *rq) -{
- struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
- struct intel_guc *guc = &rq->engine->gt->uc.guc;
- unsigned long flags;
- /*
* Call pin_guc_id here rather than in the pinning step as with
* dma_resv, contexts can be repeatedly pinned / unpinned trashing the
* guc_ids and creating horrible race conditions. This is especially bad
* when guc_ids are being stolen due to over subscription. By the time
* this function is reached, it is guaranteed that the guc_id will be
* persistent until the generated request is retired, thus sealing these
* race conditions. It is still safe to fail here if guc_ids are
* exhausted and return -EAGAIN to the user indicating that they can try
* again in the future.
*
* There is no need for a lock here as the timeline mutex ensures at
* most one context can be executing this code path at once. The
* guc_id_ref is incremented once for every request in flight and
* decremented on each retire. When it is zero, a lock around the
* increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
*/
- if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
return 0;
- /* Will be called from irq-context when using foreign fences. */
- spin_lock_irqsave(&sched_engine->lock, flags);
- ret = pin_guc_id(guc, ce); /* returns 1 if new guc_id assigned */
- if (unlikely(ret < 0))
return ret;
- if (context_needs_register(ce, !!ret)) {
ret = guc_lrc_desc_pin(ce);
if (unlikely(ret)) { /* unwind */
atomic_dec(&ce->guc_id_ref);
unpin_guc_id(guc, ce);
return ret;
}
- }
- if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
queue_request(sched_engine, rq, rq_prio(rq));
- else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
tasklet_hi_schedule(&sched_engine->tasklet);
- clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
- spin_unlock_irqrestore(&sched_engine->lock, flags);
- return 0;
}
static void sanitize_hwsp(struct intel_engine_cs *engine) @@ -606,6 +1068,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine) engine->submit_request = guc_submit_request; }
+static inline void guc_kernel_context_pin(struct intel_guc *guc,
struct intel_context *ce)
+{
- if (context_guc_id_invalid(ce))
pin_guc_id(guc, ce);
- guc_lrc_desc_pin(ce);
+}
+static inline void guc_init_lrc_mapping(struct intel_guc *guc) +{
- struct intel_gt *gt = guc_to_gt(guc);
- struct intel_engine_cs *engine;
- enum intel_engine_id id;
- /* make sure all descriptors are clean... */
- xa_destroy(&guc->context_lookup);
- /*
* Some contexts might have been pinned before we enabled GuC
* submission, so we need to add them to the GuC bookkeeping.
* Also, after a reset of the GuC we want to make sure that the information
* shared with GuC is properly reset. The kernel lrcs are not attached
* to the gem_context, so they need to be added separately.
*
* Note: we purposely do not check the error return of
* guc_lrc_desc_pin, because that function can only fail in two cases.
* One, if there aren't enough free IDs, but we're guaranteed to have
* enough here (we're either only pinning a handful of lrc on first boot
* or we're re-pinning lrcs that were already pinned before the reset).
* Two, if the GuC has died and CTBs can't make forward progress.
* Presumably, the GuC should be alive as this function is called on
* driver load or after a reset. Even if it is dead, another full GPU
* reset will be triggered and this function would be called again.
*/
- for_each_engine(engine, gt, id)
if (engine->kernel_context)
guc_kernel_context_pin(guc, engine->kernel_context);
+}
static void guc_release(struct intel_engine_cs *engine) { engine->sanitize = NULL; /* no longer in control, nothing to sanitize */ @@ -718,6 +1220,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
void intel_guc_submission_enable(struct intel_guc *guc) {
- guc_init_lrc_mapping(guc);
}
void intel_guc_submission_disable(struct intel_guc *guc) @@ -743,3 +1246,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc) { guc->submission_selected = __guc_submission_selected(guc); }
+static inline struct intel_context * +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx) +{
- struct intel_context *ce;
- if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
drm_dbg(&guc_to_gt(guc)->i915->drm,
"Invalid desc_idx %u", desc_idx);
just debug? why not an (fatal) error ?
This is a G2H communication. We can't crash the driver if the GuC gives us crap.
return NULL;
- }
- ce = __get_context(guc, desc_idx);
- if (unlikely(!ce)) {
drm_dbg(&guc_to_gt(guc)->i915->drm,
"Context is NULL, desc_idx %u", desc_idx);
not an error ?
It is an error, we return NULL.
BTW - Can you include your name on the last comment? That makes it easier to know when I've reached the end of your comments.
Matt
return NULL;
- }
- return ce;
+}
+int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
const u32 *msg,
u32 len)
+{
- struct intel_context *ce;
- u32 desc_idx = msg[0];
- if (unlikely(len < 1)) {
drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
return -EPROTO;
- }
- ce = g2h_context_lookup(guc, desc_idx);
- if (unlikely(!ce))
return -EPROTO;
- if (context_wait_for_deregister_to_register(ce)) {
struct intel_runtime_pm *runtime_pm =
&ce->engine->gt->i915->runtime_pm;
intel_wakeref_t wakeref;
/*
* Previous owner of this guc_id has been deregistered, now safe to
* register this context.
*/
with_intel_runtime_pm(runtime_pm, wakeref)
register_context(ce);
clr_context_wait_for_deregister_to_register(ce);
intel_context_put(ce);
- } else if (context_destroyed(ce)) {
/* Context has been destroyed */
release_guc_id(guc, ce);
lrc_destroy(&ce->ref);
- }
- return 0;
+} diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index c857fafb8a30..a9c2242d61a2 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -4142,6 +4142,7 @@ enum { FAULT_AND_CONTINUE /* Unsupported */ };
+#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12) #define GEN8_CTX_VALID (1 << 0) #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1) #define GEN8_CTX_FORCE_RESTORE (1 << 2) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index c5989c0b83d3..9dad3df5eaf7 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -419,6 +419,7 @@ bool i915_request_retire(struct i915_request *rq) */ if (!list_empty(&rq->sched.link)) remove_from_engine(rq);
atomic_dec(&rq->context->guc_id_ref); GEM_BUG_ON(!llist_empty(&rq->execute_cb));
__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
Sometimes during context pinning a context with the same guc_id is registered with the GuC. In this case a deregister must be done before the context can be registered. A fence is inserted on all requests while the deregister is in flight. Once the G2H is received indicating the deregistration is complete the context is registered and the fence is released.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/intel_context.c | 1 + drivers/gpu/drm/i915/gt/intel_context_types.h | 5 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 51 ++++++++++++++++++- drivers/gpu/drm/i915/i915_request.h | 8 +++ 4 files changed, 63 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 2b68af16222c..f750c826e19d 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -384,6 +384,7 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine) mutex_init(&ce->pin_mutex);
spin_lock_init(&ce->guc_state.lock); + INIT_LIST_HEAD(&ce->guc_state.fences);
ce->guc_id = GUC_INVALID_LRC_ID; INIT_LIST_HEAD(&ce->guc_id_link); diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index ce7c69b34cd1..beafe55a9101 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -146,6 +146,11 @@ struct intel_context { * submission */ u8 sched_state; + /* + * fences: maintains of list of requests that have a submit + * fence related to GuC submission + */ + struct list_head fences; } guc_state;
/* GuC scheduling state that does not require a lock. */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index d39579ac2faa..49e5d460d54b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -924,6 +924,30 @@ static const struct intel_context_ops guc_context_ops = { .destroy = guc_context_destroy, };
+static void __guc_signal_context_fence(struct intel_context *ce) +{ + struct i915_request *rq; + + lockdep_assert_held(&ce->guc_state.lock); + + list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link) + i915_sw_fence_complete(&rq->submit); + + INIT_LIST_HEAD(&ce->guc_state.fences); +} + +static void guc_signal_context_fence(struct intel_context *ce) +{ + unsigned long flags; + + GEM_BUG_ON(!context_wait_for_deregister_to_register(ce)); + + spin_lock_irqsave(&ce->guc_state.lock, flags); + clr_context_wait_for_deregister_to_register(ce); + __guc_signal_context_fence(ce); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); +} + static bool context_needs_register(struct intel_context *ce, bool new_guc_id) { return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) || @@ -934,6 +958,7 @@ static int guc_request_alloc(struct i915_request *rq) { struct intel_context *ce = rq->context; struct intel_guc *guc = ce_to_guc(ce); + unsigned long flags; int ret;
GEM_BUG_ON(!intel_context_is_pinned(rq->context)); @@ -978,7 +1003,7 @@ static int guc_request_alloc(struct i915_request *rq) * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id. */ if (atomic_add_unless(&ce->guc_id_ref, 1, 0)) - return 0; + goto out;
ret = pin_guc_id(guc, ce); /* returns 1 if new guc_id assigned */ if (unlikely(ret < 0)) @@ -994,6 +1019,28 @@ static int guc_request_alloc(struct i915_request *rq)
clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
+out: + /* + * We block all requests on this context if a G2H is pending for a + * context deregistration as the GuC will fail a context registration + * while this G2H is pending. Once a G2H returns, the fence is released + * that is blocking these requests (see guc_signal_context_fence). + * + * We can safely check the below field outside of the lock as it isn't + * possible for this field to transition from being clear to set but + * converse is possible, hence the need for the check within the lock. + */ + if (likely(!context_wait_for_deregister_to_register(ce))) + return 0; + + spin_lock_irqsave(&ce->guc_state.lock, flags); + if (context_wait_for_deregister_to_register(ce)) { + i915_sw_fence_await(&rq->submit); + + list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences); + } + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + return 0; }
@@ -1295,7 +1342,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc, */ with_intel_runtime_pm(runtime_pm, wakeref) register_context(ce); - clr_context_wait_for_deregister_to_register(ce); + guc_signal_context_fence(ce); intel_context_put(ce); } else if (context_destroyed(ce)) { /* Context has been destroyed */ diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 239964bec1fa..f870cd75a001 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -285,6 +285,14 @@ struct i915_request { struct hrtimer timer; } watchdog;
+ /* + * Requests may need to be stalled when using GuC submission waiting for + * certain GuC operations to complete. If that is the case, stalled + * requests are added to a per context list of stalled requests. The + * below list_head is the link in that list. + */ + struct list_head guc_fence_link; + I915_SELFTEST_DECLARE(struct { struct list_head link; unsigned long delay;
On 6/24/2021 00:04, Matthew Brost wrote:
Sometime during context pinning a context with the same guc_id is
Sometime*s*
registered with the GuC. In this a case deregister must be before before
before before -> done before
the context can be registered. A fence is inserted on all requests while the deregister is in flight. Once the G2H is received indicating the deregistration is complete the context is registered and the fence is released.
Cc: John Harrisonjohn.c.harrison@intel.com Signed-off-by: Matthew Brostmatthew.brost@intel.com
With the above text fixed up: Reviewed-by: John Harrison John.C.Harrison@Intel.com
drivers/gpu/drm/i915/gt/intel_context.c | 1 + drivers/gpu/drm/i915/gt/intel_context_types.h | 5 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 51 ++++++++++++++++++- drivers/gpu/drm/i915/i915_request.h | 8 +++ 4 files changed, 63 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 2b68af16222c..f750c826e19d 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -384,6 +384,7 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine) mutex_init(&ce->pin_mutex);
spin_lock_init(&ce->guc_state.lock);
INIT_LIST_HEAD(&ce->guc_state.fences);
ce->guc_id = GUC_INVALID_LRC_ID; INIT_LIST_HEAD(&ce->guc_id_link);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index ce7c69b34cd1..beafe55a9101 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -146,6 +146,11 @@ struct intel_context { * submission */ u8 sched_state;
/*
* fences: maintains a list of requests that have a submit
* fence related to GuC submission
*/
struct list_head fences;
} guc_state;
/* GuC scheduling state that does not require a lock. */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index d39579ac2faa..49e5d460d54b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -924,6 +924,30 @@ static const struct intel_context_ops guc_context_ops = { .destroy = guc_context_destroy, };
+static void __guc_signal_context_fence(struct intel_context *ce) +{
- struct i915_request *rq;
- lockdep_assert_held(&ce->guc_state.lock);
- list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link)
i915_sw_fence_complete(&rq->submit);
- INIT_LIST_HEAD(&ce->guc_state.fences);
+}
+static void guc_signal_context_fence(struct intel_context *ce) +{
- unsigned long flags;
- GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
- spin_lock_irqsave(&ce->guc_state.lock, flags);
- clr_context_wait_for_deregister_to_register(ce);
- __guc_signal_context_fence(ce);
- spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+}
- static bool context_needs_register(struct intel_context *ce, bool new_guc_id) { return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
@@ -934,6 +958,7 @@ static int guc_request_alloc(struct i915_request *rq) { struct intel_context *ce = rq->context; struct intel_guc *guc = ce_to_guc(ce);
unsigned long flags; int ret;
GEM_BUG_ON(!intel_context_is_pinned(rq->context));
@@ -978,7 +1003,7 @@ static int guc_request_alloc(struct i915_request *rq) * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id. */ if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
return 0;
goto out;
ret = pin_guc_id(guc, ce); /* returns 1 if new guc_id assigned */ if (unlikely(ret < 0))
@@ -994,6 +1019,28 @@ static int guc_request_alloc(struct i915_request *rq)
clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
+out:
- /*
* We block all requests on this context if a G2H is pending for a
* context deregistration as the GuC will fail a context registration
* while this G2H is pending. Once a G2H returns, the fence is released
* that is blocking these requests (see guc_signal_context_fence).
*
* We can safely check the below field outside of the lock as it isn't
* possible for this field to transition from being clear to set but
* converse is possible, hence the need for the check within the lock.
*/
- if (likely(!context_wait_for_deregister_to_register(ce)))
return 0;
- spin_lock_irqsave(&ce->guc_state.lock, flags);
- if (context_wait_for_deregister_to_register(ce)) {
i915_sw_fence_await(&rq->submit);
list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences);
- }
- spin_unlock_irqrestore(&ce->guc_state.lock, flags);
- return 0; }
@@ -1295,7 +1342,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc, */ with_intel_runtime_pm(runtime_pm, wakeref) register_context(ce);
clr_context_wait_for_deregister_to_register(ce);
guc_signal_context_fence(ce);
intel_context_put(ce); } else if (context_destroyed(ce)) { /* Context has been destroyed */
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 239964bec1fa..f870cd75a001 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -285,6 +285,14 @@ struct i915_request { struct hrtimer timer; } watchdog;
- /*
* Requests may need to be stalled when using GuC submission waiting for
* certain GuC operations to complete. If that is the case, stalled
* requests are added to a per context list of stalled requests. The
* below list_head is the link in that list.
*/
- struct list_head guc_fence_link;
- I915_SELFTEST_DECLARE(struct { struct list_head link; unsigned long delay;
With GuC scheduling, it isn't safe to unpin a context while scheduling is enabled for that context as the GuC may touch some of the pinned state (e.g. LRC). To ensure scheduling isn't enabled when an unpin is done, a callback is added to intel_context_unpin when pin count == 1 to disable scheduling for that context. When the response CTB is received, it is safe to do the final unpin.
Future patches may add a heuristic / delay to schedule the disable callback to avoid thrashing on schedule enable / disable.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/intel_context.c | 4 +- drivers/gpu/drm/i915/gt/intel_context.h | 27 +++- drivers/gpu/drm/i915/gt/intel_context_types.h | 2 + drivers/gpu/drm/i915/gt/uc/intel_guc.h | 2 + drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 3 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 145 +++++++++++++++++- 6 files changed, 179 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index f750c826e19d..1499b8aace2a 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -306,9 +306,9 @@ int __intel_context_do_pin(struct intel_context *ce) return err; }
-void intel_context_unpin(struct intel_context *ce) +void __intel_context_do_unpin(struct intel_context *ce, int sub) { - if (!atomic_dec_and_test(&ce->pin_count)) + if (!atomic_sub_and_test(sub, &ce->pin_count)) return;
CE_TRACE(ce, "unpin\n"); diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index f83a73a2b39f..8a7199afbe61 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -113,7 +113,32 @@ static inline void __intel_context_pin(struct intel_context *ce) atomic_inc(&ce->pin_count); }
-void intel_context_unpin(struct intel_context *ce); +void __intel_context_do_unpin(struct intel_context *ce, int sub); + +static inline void intel_context_sched_disable_unpin(struct intel_context *ce) +{ + __intel_context_do_unpin(ce, 2); +} + +static inline void intel_context_unpin(struct intel_context *ce) +{ + if (!ce->ops->sched_disable) { + __intel_context_do_unpin(ce, 1); + } else { + /* + * Move ownership of this pin to the scheduling disable which is + * an async operation. When that operation completes the above + * intel_context_sched_disable_unpin is called potentially + * unpinning the context. + */ + while (!atomic_add_unless(&ce->pin_count, -1, 1)) { + if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) { + ce->ops->sched_disable(ce); + break; + } + } + } +}
 void intel_context_enter_engine(struct intel_context *ce);
 void intel_context_exit_engine(struct intel_context *ce);

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index beafe55a9101..e7af6a2368f8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -43,6 +43,8 @@ struct intel_context_ops {
 	void (*enter)(struct intel_context *ce);
 	void (*exit)(struct intel_context *ce);

+	void (*sched_disable)(struct intel_context *ce);
+	void (*reset)(struct intel_context *ce);
 	void (*destroy)(struct kref *kref);
 };

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index d44316dc914b..b43ec56986b5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -236,6 +236,8 @@ int intel_guc_reset_engine(struct intel_guc *guc,
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg, u32 len);
+int intel_guc_sched_done_process_msg(struct intel_guc *guc,
+				     const u32 *msg, u32 len);

 void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 42a7daef2ff6..7491f041859e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -905,6 +905,9 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 		ret = intel_guc_deregister_done_process_msg(guc, payload, len);
 		break;
+	case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
+		ret = intel_guc_sched_done_process_msg(guc, payload, len);
+		break;
 	default:
 		ret = -EOPNOTSUPP;
 		break;

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 49e5d460d54b..0386ccd5a481 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -70,6 +70,7 @@
  * possible for some of the bits to changing at the same time though.
  */
 #define SCHED_STATE_NO_LOCK_ENABLED		BIT(0)
+#define SCHED_STATE_NO_LOCK_PENDING_ENABLE	BIT(1)
 static inline bool context_enabled(struct intel_context *ce)
 {
 	return (atomic_read(&ce->guc_sched_state_no_lock) &
@@ -87,6 +88,24 @@ static inline void clr_context_enabled(struct intel_context *ce)
 		   &ce->guc_sched_state_no_lock);
 }

+static inline bool context_pending_enable(struct intel_context *ce)
+{
+	return (atomic_read(&ce->guc_sched_state_no_lock) &
+		SCHED_STATE_NO_LOCK_PENDING_ENABLE);
+}
+
+static inline void set_context_pending_enable(struct intel_context *ce)
+{
+	atomic_or(SCHED_STATE_NO_LOCK_PENDING_ENABLE,
+		  &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_pending_enable(struct intel_context *ce)
+{
+	atomic_and((u32)~SCHED_STATE_NO_LOCK_PENDING_ENABLE,
+		   &ce->guc_sched_state_no_lock);
+}
+
 /*
  * Below is a set of functions which control the GuC scheduling state which
  * require a lock, aside from the special case where the functions are called
@@ -95,6 +114,7 @@ static inline void clr_context_enabled(struct intel_context *ce)
  */
 #define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
 #define SCHED_STATE_DESTROYED				BIT(1)
+#define SCHED_STATE_PENDING_DISABLE			BIT(2)
 static inline void init_sched_state(struct intel_context *ce)
 {
 	/* Only should be called from guc_lrc_desc_pin() */
@@ -139,6 +159,24 @@ set_context_destroyed(struct intel_context *ce)
 	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
 }

+static inline bool context_pending_disable(struct intel_context *ce)
+{
+	return (ce->guc_state.sched_state & SCHED_STATE_PENDING_DISABLE);
+}
+
+static inline void set_context_pending_disable(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state |= SCHED_STATE_PENDING_DISABLE;
+}
+
+static inline void clr_context_pending_disable(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state =
+		(ce->guc_state.sched_state & ~SCHED_STATE_PENDING_DISABLE);
+}
+
 static inline bool context_guc_id_invalid(struct intel_context *ce)
 {
 	return (ce->guc_id == GUC_INVALID_LRC_ID);
@@ -231,6 +269,8 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
 		action[len++] = ce->guc_id;
 		action[len++] = GUC_CONTEXT_ENABLE;
+		set_context_pending_enable(ce);
+		intel_context_get(ce);
 	} else {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
 		action[len++] = ce->guc_id;
@@ -238,8 +278,12 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)

 	err = intel_guc_send_nb(guc, action, len);

-	if (!enabled && !err)
+	if (!enabled && !err) {
 		set_context_enabled(ce);
+	} else if (!enabled) {
+		clr_context_pending_enable(ce);
+		intel_context_put(ce);
+	}

 	return err;
 }
@@ -831,6 +875,60 @@ static void guc_context_post_unpin(struct intel_context *ce)
 	lrc_post_unpin(ce);
 }

+static void __guc_context_sched_disable(struct intel_guc *guc,
+					struct intel_context *ce,
+					u16 guc_id)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET,
+		guc_id,	/* ce->guc_id not stable */
+		GUC_CONTEXT_DISABLE
+	};
+
+	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
+
+	intel_context_get(ce);
+
+	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+}
+
+static u16 prep_context_pending_disable(struct intel_context *ce)
+{
+	set_context_pending_disable(ce);
+	clr_context_enabled(ce);
+
+	return ce->guc_id;
+}
+
+static void guc_context_sched_disable(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
+	unsigned long flags;
+	u16 guc_id;
+	intel_wakeref_t wakeref;
+
+	if (context_guc_id_invalid(ce) ||
+	    !lrc_desc_registered(guc, ce->guc_id)) {
+		clr_context_enabled(ce);
+		goto unpin;
+	}
+
+	if (!context_enabled(ce))
+		goto unpin;
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	guc_id = prep_context_pending_disable(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+	with_intel_runtime_pm(runtime_pm, wakeref)
+		__guc_context_sched_disable(guc, ce, guc_id);
+
+	return;
+unpin:
+	intel_context_sched_disable_unpin(ce);
+}
+
 static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 {
 	struct intel_engine_cs *engine = ce->engine;
@@ -839,6 +937,7 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)

 	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
 	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
+	GEM_BUG_ON(context_enabled(ce));

 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 	set_context_destroyed(ce);
@@ -920,6 +1019,8 @@ static const struct intel_context_ops guc_context_ops = {
 	.enter = intel_context_enter_engine,
 	.exit = intel_context_exit_engine,

+	.sched_disable = guc_context_sched_disable,
+	.reset = lrc_reset,
 	.destroy = guc_context_destroy,
 };
@@ -1352,3 +1453,45 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,

 	return 0;
 }
+
+int intel_guc_sched_done_process_msg(struct intel_guc *guc,
+				     const u32 *msg,
+				     u32 len)
+{
+	struct intel_context *ce;
+	unsigned long flags;
+	u32 desc_idx = msg[0];
+
+	if (unlikely(len < 2)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+		return -EPROTO;
+	}
+
+	ce = g2h_context_lookup(guc, desc_idx);
+	if (unlikely(!ce))
+		return -EPROTO;
+
+	if (unlikely(context_destroyed(ce) ||
+		     (!context_pending_enable(ce) &&
+		      !context_pending_disable(ce)))) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm,
+			"Bad context sched_state 0x%x, 0x%x, desc_idx %u",
+			atomic_read(&ce->guc_sched_state_no_lock),
+			ce->guc_state.sched_state, desc_idx);
+		return -EPROTO;
+	}
+
+	if (context_pending_enable(ce)) {
+		clr_context_pending_enable(ce);
+	} else if (context_pending_disable(ce)) {
+		intel_context_sched_disable_unpin(ce);
+
+		spin_lock_irqsave(&ce->guc_state.lock, flags);
+		clr_context_pending_disable(ce);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+	}
+
+	intel_context_put(ce);
+
+	return 0;
+}
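The patch above splits the scheduling state into a lock-free word (enabled / pending-enable, touched on the submission path) and a locked word (pending-disable / destroyed). The enable flow can be modelled in userspace to make the transitions concrete. This is only an illustrative sketch with hypothetical names, using C11 atomics in place of the kernel's atomic_t helpers; the real driver code is in the diff above:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy flags mirroring the patch's lock-free state word. */
#define NO_LOCK_ENABLED        (1u << 0)
#define NO_LOCK_PENDING_ENABLE (1u << 1)

struct toy_ce {
	atomic_uint sched_state_no_lock; /* submission path, no lock needed */
};

/*
 * First request on a disabled context: mark the enable as in flight
 * before sending the H2G, then mark enabled on a successful send or
 * roll back the pending flag if the send failed (as guc_add_request
 * does in the patch).
 */
static void toy_submit(struct toy_ce *ce, bool send_ok)
{
	atomic_fetch_or(&ce->sched_state_no_lock, NO_LOCK_PENDING_ENABLE);

	if (send_ok)
		atomic_fetch_or(&ce->sched_state_no_lock, NO_LOCK_ENABLED);
	else
		atomic_fetch_and(&ce->sched_state_no_lock,
				 ~NO_LOCK_PENDING_ENABLE);
}

/* G2H SCHED_CONTEXT_MODE_DONE for an enable: clear the pending flag. */
static void toy_sched_done(struct toy_ce *ce)
{
	atomic_fetch_and(&ce->sched_state_no_lock, ~NO_LOCK_PENDING_ENABLE);
}
```

The pending-enable flag is what lets the G2H handler distinguish an enable completion from a disable completion without taking the lock.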
On 6/24/2021 00:04, Matthew Brost wrote:
With GuC scheduling, it isn't safe to unpin a context while scheduling is enabled for that context, as the GuC may touch some of the pinned state (e.g. the LRC). To ensure scheduling isn't enabled when an unpin is done, a callback is added to intel_context_unpin when the pin count == 1 to disable scheduling for that context. When the response CTB is received, it is safe to do the final unpin.

Future patches may add a heuristic / delay to schedule the disable callback, to avoid thrashing on schedule enable / disable.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
Reviewed-by: John Harrison John.C.Harrison@Intel.com
 drivers/gpu/drm/i915/gt/intel_context.c       |   4 +-
 drivers/gpu/drm/i915/gt/intel_context.h       |  27 +++-
 drivers/gpu/drm/i915/gt/intel_context_types.h |   2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   3 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 145 +++++++++++++++++-
 6 files changed, 179 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index f750c826e19d..1499b8aace2a 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -306,9 +306,9 @@ int __intel_context_do_pin(struct intel_context *ce)
 	return err;
 }

-void intel_context_unpin(struct intel_context *ce)
+void __intel_context_do_unpin(struct intel_context *ce, int sub)
 {
-	if (!atomic_dec_and_test(&ce->pin_count))
+	if (!atomic_sub_and_test(sub, &ce->pin_count))
 		return;

 	CE_TRACE(ce, "unpin\n");

diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index f83a73a2b39f..8a7199afbe61 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -113,7 +113,32 @@ static inline void __intel_context_pin(struct intel_context *ce)
 	atomic_inc(&ce->pin_count);
 }

-void intel_context_unpin(struct intel_context *ce);
+void __intel_context_do_unpin(struct intel_context *ce, int sub);
+
+static inline void intel_context_sched_disable_unpin(struct intel_context *ce)
+{
+	__intel_context_do_unpin(ce, 2);
+}
+
+static inline void intel_context_unpin(struct intel_context *ce)
+{
+	if (!ce->ops->sched_disable) {
+		__intel_context_do_unpin(ce, 1);
+	} else {
+		/*
+		 * Move ownership of this pin to the scheduling disable which is
+		 * an async operation. When that operation completes the above
+		 * intel_context_sched_disable_unpin is called potentially
+		 * unpinning the context.
+		 */
+		while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
+			if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
+				ce->ops->sched_disable(ce);
+				break;
+			}
+		}
+	}
+}
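The pin-count handoff in intel_context_unpin is the subtle part: the last unpin must not drop the count directly, but instead convert it from 1 to 2 and hand both references to the async sched_disable, whose completion path releases them in one go via intel_context_sched_disable_unpin (a subtract of 2). A userspace model of that cmpxchg protocol, illustrative only, with C11 atomics standing in for the kernel's atomic_add_unless/atomic_cmpxchg:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int pin_count;
static bool sched_disable_requested;

/* Stand-in for ce->ops->sched_disable(): just record the request. */
static void toy_sched_disable(void)
{
	sched_disable_requested = true;
}

/* Mirror of the kernel's atomic_add_unless(): add @a unless value is @u. */
static bool toy_add_unless(atomic_int *v, int a, int u)
{
	int c = atomic_load(v);

	while (c != u) {
		/* On failure, c is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &c, c + a))
			return true;
	}
	return false;
}

static void toy_unpin(void)
{
	/* Not the last pin: plain decrement. */
	while (!toy_add_unless(&pin_count, -1, 1)) {
		/* Last pin: convert 1 -> 2, hand ownership to sched_disable. */
		int expected = 1;

		if (atomic_compare_exchange_strong(&pin_count, &expected, 2)) {
			toy_sched_disable();
			break;
		}
		/* Raced with a concurrent pin; retry the plain decrement. */
	}
}

/* Completion path: drop both references at once. */
static void toy_sched_disable_unpin(void)
{
	atomic_fetch_sub(&pin_count, 2);
}
```

The retry loop is what makes a race with a concurrent pin safe: if the count moves away from 1 between the add_unless and the cmpxchg, the unpin falls back to a plain decrement instead of issuing a spurious disable.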
Disable engine barriers for unpinning with GuC. This feature isn't needed with the GuC, as it disables context scheduling before unpinning, which guarantees the HW will not reference the context. Hence it is not necessary to defer unpinning until a kernel context request completes on each engine in the context's engine mask.
Cc: John Harrison john.c.harrison@intel.com
Signed-off-by: Matthew Brost matthew.brost@intel.com
Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
---
 drivers/gpu/drm/i915/gt/intel_context.c    |  2 +-
 drivers/gpu/drm/i915/gt/intel_context.h    |  1 +
 drivers/gpu/drm/i915/gt/selftest_context.c | 10 ++++++++++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 1499b8aace2a..7f97753ab164 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -80,7 +80,7 @@ static int intel_context_active_acquire(struct intel_context *ce)

 	__i915_active_acquire(&ce->active);

-	if (intel_context_is_barrier(ce))
+	if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))
 		return 0;

 	/* Preallocate tracking nodes */

diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index 8a7199afbe61..a592a9605dc8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -16,6 +16,7 @@
 #include "intel_engine_types.h"
 #include "intel_ring_types.h"
 #include "intel_timeline_types.h"
+#include "uc/intel_guc_submission.h"

 #define CE_TRACE(ce, fmt, ...) do {					\
 	const struct intel_context *ce__ = (ce);			\

diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c
index 26685b927169..fa7b99a671dd 100644
--- a/drivers/gpu/drm/i915/gt/selftest_context.c
+++ b/drivers/gpu/drm/i915/gt/selftest_context.c
@@ -209,7 +209,13 @@ static int __live_active_context(struct intel_engine_cs *engine)
 	 * This test makes sure that the context is kept alive until a
 	 * subsequent idle-barrier (emitted when the engine wakeref hits 0
 	 * with no more outstanding requests).
+	 *
+	 * In GuC submission mode we don't use idle barriers and we instead
+	 * get a message from the GuC to signal that it is safe to unpin the
+	 * context from memory.
 	 */
+	if (intel_engine_uses_guc(engine))
+		return 0;

 	if (intel_engine_pm_is_awake(engine)) {
 		pr_err("%s is awake before starting %s!\n",
@@ -357,7 +363,11 @@ static int __live_remote_context(struct intel_engine_cs *engine)
 	 * on the context image remotely (intel_context_prepare_remote_request),
 	 * which inserts foreign fences into intel_context.active, does not
 	 * clobber the idle-barrier.
+	 *
+	 * In GuC submission mode we don't use idle barriers.
 	 */
+	if (intel_engine_uses_guc(engine))
+		return 0;

 	if (intel_engine_pm_is_awake(engine)) {
 		pr_err("%s is awake before starting %s!\n",
On 6/24/2021 00:04, Matthew Brost wrote:
@@ -80,7 +80,7 @@ static int intel_context_active_acquire(struct intel_context *ce)

 	__i915_active_acquire(&ce->active);

-	if (intel_context_is_barrier(ce))
+	if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))
 		return 0;
Would be better to have a scheduler flag to say whether barriers are required or not. That would prevent polluting front end code with back end details.
John.
On Fri, Jul 09, 2021 at 03:53:29PM -0700, John Harrison wrote:
On 6/24/2021 00:04, Matthew Brost wrote:
Would be better to have a scheduler flag to say whether barriers are required or not. That would prevent polluting front end code with back end details.
I guess an engine flag is slightly better, but I still don't love it, as we have to test whether the context is a barrier (kernel context) and then call a function that is basically backend specific. IMO we really need to push all of this to a vfunc. If you really want me to make this an engine flag I can, but in the end it seems like that would just churn the code (adding an engine flag only to remove it later). I think this is a cleanup we should note down and figure out a bit later, as nothing is functionally wrong and it is quite clear that it should be cleaned up.
Matt
On 7/9/2021 20:00, Matthew Brost wrote:
I guess an engine flag is slightly better but I still don't love that as we have to test if the context is a barrier (kernel context) and then call a function that is basically backend specific after. IMO we really need to push all of this to a vfunc. If you really want me to make this an engine flag I can, but in the end it just seems like that will trash the code (adding an engine flag just to remove it). I think this is just a clean up we write down, and figure out a bit later as nothing is functionally wrong + quite clear that it is something that should be cleaned up.
Matt
Flag, vfunc, whatever. I just mean that it would be better to abstract it out in some manner. Maybe a flag/vfunc on the ce object? That way it would swallow the 'ignore kernel contexts' test as well. But yes, probably best to add it to the todo list and move on as it is not going to be a two minute quick fix. I've added a comment to the Jira, so...
Reviewed-by: John Harrison John.C.Harrison@Intel.com
John.
On Mon, Jul 12, 2021 at 7:57 PM John Harrison john.c.harrison@intel.com wrote:
Flag, vfunc, whatever. I just mean that it would be better to abstract it out in some manner. Maybe a flag/vfunc on the ce object? That way it would swallow the 'ignore kernel contexts' test as well. But yes, probably best to add it to the todo list and move on as it is not going to be a two minute quick fix. I've added a comment to the Jira, so...
The plan is: - merge guc backend - convert over to drm/scheduler as a proper interface between higher levels and the scheduler backend - shovel as much as reasonable of the execlist specifics into the execlist backend
Right now our frontend code is essentially designed to assume the execlist backend is the true way to build a scheduler, and everything else is a special case. We can't reasonably fix this by sprinkling lots of vfuncs all over the place, and hence we imo shouldn't try, at least not until the big picture is in much better shape. -Daniel
Reviewed-by: John Harrison John.C.Harrison@Intel.com
John.
/* Preallocate tracking nodes */
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index 8a7199afbe61..a592a9605dc8 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -16,6 +16,7 @@ #include "intel_engine_types.h" #include "intel_ring_types.h" #include "intel_timeline_types.h" +#include "uc/intel_guc_submission.h" #define CE_TRACE(ce, fmt, ...) do { \ const struct intel_context *ce__ = (ce); \ diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c index 26685b927169..fa7b99a671dd 100644 --- a/drivers/gpu/drm/i915/gt/selftest_context.c +++ b/drivers/gpu/drm/i915/gt/selftest_context.c @@ -209,7 +209,13 @@ static int __live_active_context(struct intel_engine_cs *engine) * This test makes sure that the context is kept alive until a * subsequent idle-barrier (emitted when the engine wakeref hits 0 * with no more outstanding requests).
- In GuC submission mode we don't use idle barriers and we instead
- get a message from the GuC to signal that it is safe to unpin the
- context from memory.
*/
- if (intel_engine_uses_guc(engine))
- return 0;
if (intel_engine_pm_is_awake(engine)) { pr_err("%s is awake before starting %s!\n",
@@ -357,7 +363,11 @@ static int __live_remote_context(struct intel_engine_cs *engine) * on the context image remotely (intel_context_prepare_remote_request), * which inserts foreign fences into intel_context.active, does not * clobber the idle-barrier.
- In GuC submission mode we don't use idle barriers.
*/
- if (intel_engine_uses_guc(engine))
- return 0;
if (intel_engine_pm_is_awake(engine)) { pr_err("%s is awake before starting %s!\n",
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Extend the deregistration context fence to also fence when a GuC context has a scheduling disable pending.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 37 +++++++++++++++---- 1 file changed, 30 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 0386ccd5a481..0a6ccdf32316 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -918,7 +918,19 @@ static void guc_context_sched_disable(struct intel_context *ce) goto unpin;
spin_lock_irqsave(&ce->guc_state.lock, flags); + + /* + * We have to check if the context has been pinned again as another pin + * operation is allowed to pass this function. Checking the pin count + * here synchronizes this function with guc_request_alloc ensuring a + * request doesn't slip through the 'context_pending_disable' fence. + */ + if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) { + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + return; + } guc_id = prep_context_pending_disable(ce); + spin_unlock_irqrestore(&ce->guc_state.lock, flags);
with_intel_runtime_pm(runtime_pm, wakeref) @@ -1123,19 +1135,22 @@ static int guc_request_alloc(struct i915_request *rq) out: /* * We block all requests on this context if a G2H is pending for a - * context deregistration as the GuC will fail a context registration - * while this G2H is pending. Once a G2H returns, the fence is released - * that is blocking these requests (see guc_signal_context_fence). + * schedule disable or context deregistration as the GuC will fail a + * schedule enable or context registration if either G2H is pending + * respectfully. Once a G2H returns, the fence is released that is + * blocking these requests (see guc_signal_context_fence). * - * We can safely check the below field outside of the lock as it isn't - * possible for this field to transition from being clear to set but + * We can safely check the below fields outside of the lock as it isn't + * possible for these fields to transition from being clear to set but * converse is possible, hence the need for the check within the lock. */ - if (likely(!context_wait_for_deregister_to_register(ce))) + if (likely(!context_wait_for_deregister_to_register(ce) && + !context_pending_disable(ce))) return 0;
spin_lock_irqsave(&ce->guc_state.lock, flags); - if (context_wait_for_deregister_to_register(ce)) { + if (context_wait_for_deregister_to_register(ce) || + context_pending_disable(ce)) { i915_sw_fence_await(&rq->submit);
list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences); @@ -1484,10 +1499,18 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, if (context_pending_enable(ce)) { clr_context_pending_enable(ce); } else if (context_pending_disable(ce)) { + /* + * Unpin must be done before __guc_signal_context_fence, + * otherwise a race exists between the requests getting + * submitted + retired before this unpin completes resulting in + * the pin_count going to zero and the context still being + * enabled. + */ intel_context_sched_disable_unpin(ce);
spin_lock_irqsave(&ce->guc_state.lock, flags); clr_context_pending_disable(ce); + __guc_signal_context_fence(ce); spin_unlock_irqrestore(&ce->guc_state.lock, flags); }
On 6/24/2021 00:04, Matthew Brost wrote:
Extend the deregistration context fence to also fence when a GuC context has a scheduling disable pending.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
.../gpu/drm/i915/gt/uc/intel_guc_submission.c | 37 +++++++++++++++---- 1 file changed, 30 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 0386ccd5a481..0a6ccdf32316 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -918,7 +918,19 @@ static void guc_context_sched_disable(struct intel_context *ce) goto unpin;
spin_lock_irqsave(&ce->guc_state.lock, flags);
- /*
* We have to check if the context has been pinned again as another pin
* operation is allowed to pass this function. Checking the pin count
* here synchronizes this function with guc_request_alloc ensuring a
* request doesn't slip through the 'context_pending_disable' fence.
*/
The pin count is an atomic so doesn't need the spinlock. Also the above comment 'checking the pin count here synchronizes ...' seems wrong. Isn't the point that acquiring the spinlock is what synchronises with guc_request_alloc? So the comment should be before the spinlock acquire and should mention using the spinlock for this purpose?
John.
if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
return;
}
guc_id = prep_context_pending_disable(ce);
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
with_intel_runtime_pm(runtime_pm, wakeref)
@@ -1123,19 +1135,22 @@ static int guc_request_alloc(struct i915_request *rq) out: /* * We block all requests on this context if a G2H is pending for a
* context deregistration as the GuC will fail a context registration
* while this G2H is pending. Once a G2H returns, the fence is released
* that is blocking these requests (see guc_signal_context_fence).
* schedule disable or context deregistration as the GuC will fail a
* schedule enable or context registration if either G2H is pending
* respectfully. Once a G2H returns, the fence is released that is
* blocking these requests (see guc_signal_context_fence).
* We can safely check the below field outside of the lock as it isn't
* possible for this field to transition from being clear to set but
* We can safely check the below fields outside of the lock as it isn't
* possible for these fields to transition from being clear to set but
- converse is possible, hence the need for the check within the lock.
*/
- if (likely(!context_wait_for_deregister_to_register(ce)))
if (likely(!context_wait_for_deregister_to_register(ce) &&
!context_pending_disable(ce)))
return 0;
spin_lock_irqsave(&ce->guc_state.lock, flags);
- if (context_wait_for_deregister_to_register(ce)) {
if (context_wait_for_deregister_to_register(ce) ||
context_pending_disable(ce)) {
i915_sw_fence_await(&rq->submit);
list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences);
@@ -1484,10 +1499,18 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, if (context_pending_enable(ce)) { clr_context_pending_enable(ce); } else if (context_pending_disable(ce)) {
/*
* Unpin must be done before __guc_signal_context_fence,
* otherwise a race exists between the requests getting
* submitted + retired before this unpin completes resulting in
* the pin_count going to zero and the context still being
* enabled.
*/
intel_context_sched_disable_unpin(ce);
spin_lock_irqsave(&ce->guc_state.lock, flags); clr_context_pending_disable(ce);
__guc_signal_context_fence(ce);
spin_unlock_irqrestore(&ce->guc_state.lock, flags); }
On Fri, Jul 09, 2021 at 03:59:11PM -0700, John Harrison wrote:
On 6/24/2021 00:04, Matthew Brost wrote:
Extend the deregistration context fence to also fence when a GuC context has a scheduling disable pending.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
.../gpu/drm/i915/gt/uc/intel_guc_submission.c | 37 +++++++++++++++---- 1 file changed, 30 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 0386ccd5a481..0a6ccdf32316 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -918,7 +918,19 @@ static void guc_context_sched_disable(struct intel_context *ce) goto unpin; spin_lock_irqsave(&ce->guc_state.lock, flags);
- /*
* We have to check if the context has been pinned again as another pin
* operation is allowed to pass this function. Checking the pin count
* here synchronizes this function with guc_request_alloc ensuring a
* request doesn't slip through the 'context_pending_disable' fence.
*/
The pin count is an atomic so doesn't need the spinlock. Also the above
How about?
/* * We have to check if the context has been pinned again as another pin * operation is allowed to pass this function. Checking the pin count, * within ce->guc_state.lock, synchronizes this function with * guc_request_alloc ensuring a request doesn't slip through the * 'context_pending_disable' fence. Checking within the spin lock (can't * sleep) ensures another process doesn't pin this context and generate * a request before we set the 'context_pending_disable' flag here. */
Matt
comment 'checking the pin count here synchronizes ...' seems wrong. Isn't the point that acquiring the spinlock is what synchronises with guc_request_alloc? So the comment should be before the spinlock acquire and should mention using the spinlock for this purpose?
John.
- if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
return;
}
guc_id = prep_context_pending_disable(ce);
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
with_intel_runtime_pm(runtime_pm, wakeref)
@@ -1123,19 +1135,22 @@ static int guc_request_alloc(struct i915_request *rq) out: /* * We block all requests on this context if a G2H is pending for a
* context deregistration as the GuC will fail a context registration
* while this G2H is pending. Once a G2H returns, the fence is released
* that is blocking these requests (see guc_signal_context_fence).
* schedule disable or context deregistration as the GuC will fail a
* schedule enable or context registration if either G2H is pending
* respectfully. Once a G2H returns, the fence is released that is
* blocking these requests (see guc_signal_context_fence).
* We can safely check the below field outside of the lock as it isn't
* possible for this field to transition from being clear to set but
* We can safely check the below fields outside of the lock as it isn't
* possible for these fields to transition from being clear to set but
- converse is possible, hence the need for the check within the lock.
*/
- if (likely(!context_wait_for_deregister_to_register(ce)))
- if (likely(!context_wait_for_deregister_to_register(ce) &&
- !context_pending_disable(ce)))
return 0;
spin_lock_irqsave(&ce->guc_state.lock, flags);
- if (context_wait_for_deregister_to_register(ce)) {
- if (context_wait_for_deregister_to_register(ce) ||
- context_pending_disable(ce)) {
i915_sw_fence_await(&rq->submit);
list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences);
@@ -1484,10 +1499,18 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, if (context_pending_enable(ce)) { clr_context_pending_enable(ce); } else if (context_pending_disable(ce)) {
/*
* Unpin must be done before __guc_signal_context_fence,
* otherwise a race exists between the requests getting
* submitted + retired before this unpin completes resulting in
* the pin_count going to zero and the context still being
* enabled.
*/
intel_context_sched_disable_unpin(ce);
spin_lock_irqsave(&ce->guc_state.lock, flags);
clr_context_pending_disable(ce);
__guc_signal_context_fence(ce);
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
}
On 7/9/2021 20:36, Matthew Brost wrote:
On Fri, Jul 09, 2021 at 03:59:11PM -0700, John Harrison wrote:
On 6/24/2021 00:04, Matthew Brost wrote:
Extend the deregistration context fence to also fence when a GuC context has a scheduling disable pending.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
.../gpu/drm/i915/gt/uc/intel_guc_submission.c | 37 +++++++++++++++---- 1 file changed, 30 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 0386ccd5a481..0a6ccdf32316 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -918,7 +918,19 @@ static void guc_context_sched_disable(struct intel_context *ce) goto unpin; spin_lock_irqsave(&ce->guc_state.lock, flags);
- /*
* We have to check if the context has been pinned again as another pin
* operation is allowed to pass this function. Checking the pin count
* here synchronizes this function with guc_request_alloc ensuring a
* request doesn't slip through the 'context_pending_disable' fence.
*/
The pin count is an atomic so doesn't need the spinlock. Also the above
How about?
/*
- We have to check if the context has been pinned again as another pin
- operation is allowed to pass this function. Checking the pin count,
- within ce->guc_state.lock, synchronizes this function with
- guc_request_alloc ensuring a request doesn't slip through the
- 'context_pending_disable' fence. Checking within the spin lock (can't
- sleep) ensures another process doesn't pin this context and generate
- a request before we set the 'context_pending_disable' flag here.
*/
Matt
Sounds good. With that added in: Reviewed-by: John Harrison John.C.Harrison@Intel.com
comment 'checking the pin count here synchronizes ...' seems wrong. Isn't the point that acquiring the spinlock is what synchronises with guc_request_alloc? So the comment should be before the spinlock acquire and should mention using the spinlock for this purpose?
John.
- if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
return;
}
guc_id = prep_context_pending_disable(ce);
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
with_intel_runtime_pm(runtime_pm, wakeref)
@@ -1123,19 +1135,22 @@ static int guc_request_alloc(struct i915_request *rq) out: /* * We block all requests on this context if a G2H is pending for a
* context deregistration as the GuC will fail a context registration
* while this G2H is pending. Once a G2H returns, the fence is released
* that is blocking these requests (see guc_signal_context_fence).
* schedule disable or context deregistration as the GuC will fail a
* schedule enable or context registration if either G2H is pending
* respectfully. Once a G2H returns, the fence is released that is
* blocking these requests (see guc_signal_context_fence).
* We can safely check the below field outside of the lock as it isn't
* possible for this field to transition from being clear to set but
* We can safely check the below fields outside of the lock as it isn't
* possible for these fields to transition from being clear to set but
- converse is possible, hence the need for the check within the lock.
*/
- if (likely(!context_wait_for_deregister_to_register(ce)))
- if (likely(!context_wait_for_deregister_to_register(ce) &&
- !context_pending_disable(ce)))
return 0;
spin_lock_irqsave(&ce->guc_state.lock, flags);
- if (context_wait_for_deregister_to_register(ce)) {
- if (context_wait_for_deregister_to_register(ce) ||
context_pending_disable(ce)) { i915_sw_fence_await(&rq->submit); list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences);
@@ -1484,10 +1499,18 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, if (context_pending_enable(ce)) { clr_context_pending_enable(ce); } else if (context_pending_disable(ce)) {
/*
* Unpin must be done before __guc_signal_context_fence,
* otherwise a race exists between the requests getting
* submitted + retired before this unpin completes resulting in
* the pin_count going to zero and the context still being
* enabled.
*/
intel_context_sched_disable_unpin(ce);
spin_lock_irqsave(&ce->guc_state.lock, flags);
clr_context_pending_disable(ce);
__guc_signal_context_fence(ce);
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
}
Disable preempt busywait when using GuC scheduling. This isn't need as the GuC control preemption when scheduling.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c index 87b06572fd2e..f7aae502ec3d 100644 --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c @@ -506,7 +506,8 @@ gen8_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs) *cs++ = MI_USER_INTERRUPT;
*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; - if (intel_engine_has_semaphores(rq->engine)) + if (intel_engine_has_semaphores(rq->engine) && + !intel_uc_uses_guc_submission(&rq->engine->gt->uc)) cs = emit_preempt_busywait(rq, cs);
rq->tail = intel_ring_offset(rq, cs); @@ -598,7 +599,8 @@ gen12_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs) *cs++ = MI_USER_INTERRUPT;
*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; - if (intel_engine_has_semaphores(rq->engine)) + if (intel_engine_has_semaphores(rq->engine) && + !intel_uc_uses_guc_submission(&rq->engine->gt->uc)) cs = gen12_emit_preempt_busywait(rq, cs);
rq->tail = intel_ring_offset(rq, cs);
On 6/24/2021 00:04, Matthew Brost wrote:
Disable preempt busywait when using GuC scheduling. This isn't need as
needed
the GuC control preemption when scheduling.
controls
With the above fixed: Reviewed-by: John Harrison John.C.Harrison@Intel.com
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c index 87b06572fd2e..f7aae502ec3d 100644 --- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c @@ -506,7 +506,8 @@ gen8_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs) *cs++ = MI_USER_INTERRUPT;
*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
- if (intel_engine_has_semaphores(rq->engine))
if (intel_engine_has_semaphores(rq->engine) &&
!intel_uc_uses_guc_submission(&rq->engine->gt->uc))
cs = emit_preempt_busywait(rq, cs);
rq->tail = intel_ring_offset(rq, cs);
@@ -598,7 +599,8 @@ gen12_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs) *cs++ = MI_USER_INTERRUPT;
*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
- if (intel_engine_has_semaphores(rq->engine))
if (intel_engine_has_semaphores(rq->engine) &&
!intel_uc_uses_guc_submission(&rq->engine->gt->uc))
cs = gen12_emit_preempt_busywait(rq, cs);
rq->tail = intel_ring_offset(rq, cs);
If two requests are on the same ring, they are explicitly ordered by the HW, so a submission fence is sufficient to ensure ordering when using the new GuC submission interface. Conversely, if two requests share a timeline and are on the same physical engine but in different contexts, this doesn't ensure ordering with the new GuC submission interface, so a completion fence needs to be used instead.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 1 - drivers/gpu/drm/i915/i915_request.c | 17 +++++++++++++---- 2 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 0a6ccdf32316..010e46dd6b16 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -926,7 +926,6 @@ static void guc_context_sched_disable(struct intel_context *ce) * request doesn't slip through the 'context_pending_disable' fence. */ if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) { - spin_unlock_irqrestore(&ce->guc_state.lock, flags); return; } guc_id = prep_context_pending_disable(ce); diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 9dad3df5eaf7..d92c9f25c9f4 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -444,6 +444,7 @@ void i915_request_retire_upto(struct i915_request *rq)
do { tmp = list_first_entry(&tl->requests, typeof(*tmp), link); + GEM_BUG_ON(!i915_request_completed(tmp)); } while (i915_request_retire(tmp) && tmp != rq); }
@@ -1405,6 +1406,9 @@ i915_request_await_external(struct i915_request *rq, struct dma_fence *fence) return err; }
+static int +i915_request_await_request(struct i915_request *to, struct i915_request *from); + int i915_request_await_execution(struct i915_request *rq, struct dma_fence *fence, @@ -1464,12 +1468,13 @@ await_request_submit(struct i915_request *to, struct i915_request *from) * the waiter to be submitted immediately to the physical engine * as it may then bypass the virtual request. */ - if (to->engine == READ_ONCE(from->engine)) + if (to->engine == READ_ONCE(from->engine)) { return i915_sw_fence_await_sw_fence_gfp(&to->submit, &from->submit, I915_FENCE_GFP); - else + } else { return __i915_request_await_execution(to, from, NULL); + } }
static int @@ -1493,7 +1498,8 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from) return ret; }
- if (is_power_of_2(to->execution_mask | READ_ONCE(from->execution_mask))) + if (!intel_engine_uses_guc(to->engine) && + is_power_of_2(to->execution_mask | READ_ONCE(from->execution_mask))) ret = await_request_submit(to, from); else ret = emit_semaphore_wait(to, from, I915_FENCE_GFP); @@ -1654,6 +1660,8 @@ __i915_request_add_to_timeline(struct i915_request *rq) prev = to_request(__i915_active_fence_set(&timeline->last_request, &rq->fence)); if (prev && !__i915_request_is_complete(prev)) { + bool uses_guc = intel_engine_uses_guc(rq->engine); + /* * The requests are supposed to be kept in order. However, * we need to be wary in case the timeline->last_request @@ -1664,7 +1672,8 @@ __i915_request_add_to_timeline(struct i915_request *rq) i915_seqno_passed(prev->fence.seqno, rq->fence.seqno));
- if (is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask)) + if ((!uses_guc && is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask)) || + (uses_guc && prev->context == rq->context)) i915_sw_fence_await_sw_fence(&rq->submit, &prev->submit, &rq->submitq);
On 6/24/2021 12:04 AM, Matthew Brost wrote:
If two requests are on the same ring, they are explicitly ordered by the HW, so a submission fence is sufficient to ensure ordering when using the new GuC submission interface. Conversely, if two requests share a timeline and are on the same physical engine but in different contexts, this doesn't ensure ordering with the new GuC submission interface, so a completion fence needs to be used instead.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
.../gpu/drm/i915/gt/uc/intel_guc_submission.c | 1 - drivers/gpu/drm/i915/i915_request.c | 17 +++++++++++++---- 2 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 0a6ccdf32316..010e46dd6b16 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -926,7 +926,6 @@ static void guc_context_sched_disable(struct intel_context *ce) * request doesn't slip through the 'context_pending_disable' fence. */ if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
Why is this unlock() being dropped here?
return;
} guc_id = prep_context_pending_disable(ce); diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 9dad3df5eaf7..d92c9f25c9f4 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -444,6 +444,7 @@ void i915_request_retire_upto(struct i915_request *rq)
do { tmp = list_first_entry(&tl->requests, typeof(*tmp), link);
GEM_BUG_ON(!i915_request_completed(tmp));
This condition in the BUG_ON is not a new requirement introduced by the changes below, right? just want to make sure I'm not missing anything.
} while (i915_request_retire(tmp) && tmp != rq); }
@@ -1405,6 +1406,9 @@ i915_request_await_external(struct i915_request *rq, struct dma_fence *fence) return err; }
+static int +i915_request_await_request(struct i915_request *to, struct i915_request *from);
- int i915_request_await_execution(struct i915_request *rq, struct dma_fence *fence,
@@ -1464,12 +1468,13 @@ await_request_submit(struct i915_request *to, struct i915_request *from) * the waiter to be submitted immediately to the physical engine * as it may then bypass the virtual request. */
- if (to->engine == READ_ONCE(from->engine))
- if (to->engine == READ_ONCE(from->engine)) { return i915_sw_fence_await_sw_fence_gfp(&to->submit, &from->submit, I915_FENCE_GFP);
- else
- } else { return __i915_request_await_execution(to, from, NULL);
- }
{ } are not needed here. I'm guessing they're leftover from a dropped change.
}
static int @@ -1493,7 +1498,8 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from) return ret; }
- if (is_power_of_2(to->execution_mask | READ_ONCE(from->execution_mask)))
- if (!intel_engine_uses_guc(to->engine) &&
- is_power_of_2(to->execution_mask | READ_ONCE(from->execution_mask)))
ret = await_request_submit(to, from); else ret = emit_semaphore_wait(to, from, I915_FENCE_GFP);
@@ -1654,6 +1660,8 @@ __i915_request_add_to_timeline(struct i915_request *rq) prev = to_request(__i915_active_fence_set(&timeline->last_request, &rq->fence)); if (prev && !__i915_request_is_complete(prev)) {
bool uses_guc = intel_engine_uses_guc(rq->engine);
- /*
- The requests are supposed to be kept in order. However,
- we need to be wary in case the timeline->last_request
@@ -1664,7 +1672,8 @@ __i915_request_add_to_timeline(struct i915_request *rq) i915_seqno_passed(prev->fence.seqno, rq->fence.seqno));
if (is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask))
if ((!uses_guc && is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask)) ||
(uses_guc && prev->context == rq->context))
Would it be worth adding an engine flag instead of checking which back-end is in use? I915_ENGINE_IS_FIFO or something. Not a blocker.
Daniele
i915_sw_fence_await_sw_fence(&rq->submit, &prev->submit, &rq->submitq);
Semaphores are an optimization and not required for basic GuC submission to work properly. Disable them until we have time to implement semaphores for GuC submission and tune them for performance. Also, the long-term direction is to delete semaphores from the i915 entirely, which is another reason not to enable them for GuC submission.
v2: Reword commit message
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 7720b8c22c81..5c07e6abf16a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -230,7 +230,8 @@ static void intel_context_set_gem(struct intel_context *ce, ce->timeline = intel_timeline_get(ctx->timeline);
if (ctx->sched.priority >= I915_PRIORITY_NORMAL && - intel_engine_has_timeslices(ce->engine)) + intel_engine_has_timeslices(ce->engine) && + intel_engine_has_semaphores(ce->engine)) __set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags);
intel_context_set_watchdog_us(ce, ctx->watchdog.timeout_us); @@ -1938,7 +1939,8 @@ static int __apply_priority(struct intel_context *ce, void *arg) if (!intel_engine_has_timeslices(ce->engine)) return 0;
- if (ctx->sched.priority >= I915_PRIORITY_NORMAL) + if (ctx->sched.priority >= I915_PRIORITY_NORMAL && + intel_engine_has_semaphores(ce->engine)) intel_context_set_use_semaphores(ce); else intel_context_clear_use_semaphores(ce);
On 6/24/2021 00:04, Matthew Brost wrote:
Semaphores are an optimization and not required for basic GuC submission to work properly. Disable them until we have time to implement semaphores for GuC submission and tune them for performance. Also, the long-term direction is to delete semaphores from the i915 entirely, which is another reason not to enable them for GuC submission.
v2: Reword commit message
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
I think the commit description does not really match the patch content. The description is valid but the 'disable' is done by simply not setting the enable flag (done in the execlist back end and presumably not done in the GuC back end). However, what the patch is actually doing seems to be fixing bugs with the 'are semaphores enabled' mechanism. I.e. correcting pieces of code that used semaphores without checking if they are enabled. And presumably this would be broken if someone tried to disable semaphores in execlist mode for any reason?
So I think keeping the existing comment text is fine but something should be added to explain the actual changes.
John.
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 7720b8c22c81..5c07e6abf16a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -230,7 +230,8 @@ static void intel_context_set_gem(struct intel_context *ce,
 		ce->timeline = intel_timeline_get(ctx->timeline);
 
 	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
-	    intel_engine_has_timeslices(ce->engine))
+	    intel_engine_has_timeslices(ce->engine) &&
+	    intel_engine_has_semaphores(ce->engine))
 		__set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags);
 
 	intel_context_set_watchdog_us(ce, ctx->watchdog.timeout_us);
@@ -1938,7 +1939,8 @@ static int __apply_priority(struct intel_context *ce, void *arg)
 	if (!intel_engine_has_timeslices(ce->engine))
 		return 0;
 
-	if (ctx->sched.priority >= I915_PRIORITY_NORMAL)
+	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
+	    intel_engine_has_semaphores(ce->engine))
 		intel_context_set_use_semaphores(ce);
 	else
 		intel_context_clear_use_semaphores(ce);
On Fri, Jul 09, 2021 at 04:53:37PM -0700, John Harrison wrote:
On 6/24/2021 00:04, Matthew Brost wrote:
Semaphores are an optimization and are not required for basic GuC submission to work properly. Disable them until we have time to implement semaphore support and tune it for performance. The long-term direction is also to delete semaphores from the i915 entirely, which is another reason not to enable them for GuC submission.
v2: Reword commit message
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
I think the commit description does not really match the patch content. The description is valid but the 'disable' is done by simply not setting the enable flag (done in the execlist back end and presumably not done in the GuC back end). However, what the patch is actually doing seems to be fixing bugs with the 'are semaphores enabled' mechanism. I.e. correcting pieces of code that used semaphores without checking if they are enabled. And presumably this would be broken if someone tried to disable semaphores in execlist mode for any reason?
So I think keeping the existing comment text is fine but something should be added to explain the actual changes.
Yes, the commit message is wrong. This is more or less a bug fix to the existing code. Will update.
Matt
John.
Ensure a G2H response will have space in the buffer before sending the H2G CTB, as the GuC can't handle any backpressure on the G2H interface.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        | 13 +++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 76 +++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 13 ++--
 5 files changed, 87 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index b43ec56986b5..24e7a924134e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -95,11 +95,17 @@ inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
 }
 
 #define INTEL_GUC_SEND_NB		BIT(31)
+#define INTEL_GUC_SEND_G2H_DW_SHIFT	0
+#define INTEL_GUC_SEND_G2H_DW_MASK	(0xff << INTEL_GUC_SEND_G2H_DW_SHIFT)
+#define MAKE_SEND_FLAGS(len) \
+	({GEM_BUG_ON(!FIELD_FIT(INTEL_GUC_SEND_G2H_DW_MASK, len)); \
+	(FIELD_PREP(INTEL_GUC_SEND_G2H_DW_MASK, len) | INTEL_GUC_SEND_NB);})
 static
-inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
+inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len,
+			     u32 g2h_len_dw)
 {
 	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
-				 INTEL_GUC_SEND_NB);
+				 MAKE_SEND_FLAGS(g2h_len_dw));
 }
 
 static inline int
@@ -113,6 +119,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
 
 static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
 					   const u32 *action, u32 len,
+					   u32 g2h_len_dw,
 					   bool loop)
 {
 	int err;
@@ -121,7 +128,7 @@ static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
 	might_sleep_if(loop && (!in_atomic() && !irqs_disabled()));
 
 retry:
-	err = intel_guc_send_nb(guc, action, len);
+	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (unlikely(err == -EBUSY && loop)) {
 		if (likely(!in_atomic() && !irqs_disabled()))
 			cond_resched();
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 7491f041859e..a60970e85635 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -73,6 +73,7 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
 #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
 #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
 #define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
+#define G2H_ROOM_BUFFER_SIZE	(PAGE_SIZE)
 
 struct ct_request {
 	struct list_head link;
@@ -129,23 +130,27 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
 
 static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
 {
+	u32 space;
+
 	ctb->broken = false;
 	ctb->tail = 0;
 	ctb->head = 0;
-	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+	space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size) - ctb->resv_space;
+	atomic_set(&ctb->space, space);
 
 	guc_ct_buffer_desc_init(ctb->desc);
 }
 
 static void guc_ct_buffer_init(struct intel_guc_ct_buffer *ctb,
 			       struct guc_ct_buffer_desc *desc,
-			       u32 *cmds, u32 size_in_bytes)
+			       u32 *cmds, u32 size_in_bytes, u32 resv_space)
 {
 	GEM_BUG_ON(size_in_bytes % 4);
 
 	ctb->desc = desc;
 	ctb->cmds = cmds;
 	ctb->size = size_in_bytes / 4;
+	ctb->resv_space = resv_space / 4;
 
 	guc_ct_buffer_reset(ctb);
 }
@@ -226,6 +231,7 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
 	struct guc_ct_buffer_desc *desc;
 	u32 blob_size;
 	u32 cmds_size;
+	u32 resv_space;
 	void *blob;
 	u32 *cmds;
 	int err;
@@ -250,19 +256,23 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
 	desc = blob;
 	cmds = blob + 2 * CTB_DESC_SIZE;
 	cmds_size = CTB_H2G_BUFFER_SIZE;
-	CT_DEBUG(ct, "%s desc %#tx cmds %#tx size %u\n", "send",
-		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size);
+	resv_space = 0;
+	CT_DEBUG(ct, "%s desc %#tx cmds %#tx size %u/%u\n", "send",
+		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size,
+		 resv_space);
 
-	guc_ct_buffer_init(&ct->ctbs.send, desc, cmds, cmds_size);
+	guc_ct_buffer_init(&ct->ctbs.send, desc, cmds, cmds_size, resv_space);
 
 	/* store pointers to desc and cmds for recv ctb */
 	desc = blob + CTB_DESC_SIZE;
 	cmds = blob + 2 * CTB_DESC_SIZE + CTB_H2G_BUFFER_SIZE;
 	cmds_size = CTB_G2H_BUFFER_SIZE;
-	CT_DEBUG(ct, "%s desc %#tx cmds %#tx size %u\n", "recv",
-		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size);
+	resv_space = G2H_ROOM_BUFFER_SIZE;
+	CT_DEBUG(ct, "%s desc %#tx cmds %#tx size %u/%u\n", "recv",
+		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size,
+		 resv_space);
 
-	guc_ct_buffer_init(&ct->ctbs.recv, desc, cmds, cmds_size);
+	guc_ct_buffer_init(&ct->ctbs.recv, desc, cmds, cmds_size, resv_space);
 
 	return 0;
 }
@@ -458,7 +468,7 @@ static int ct_write(struct intel_guc_ct *ct,
 	/* now update descriptor */
 	ctb->tail = tail;
 	WRITE_ONCE(desc->tail, tail);
-	ctb->space -= len + 1;
+	atomic_sub(len + 1, &ctb->space);
 
 	return 0;
 
@@ -521,13 +531,34 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
 	return ret;
 }
 
+static inline bool g2h_has_room(struct intel_guc_ct *ct, u32 g2h_len_dw)
+{
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
+
+	/*
+	 * We leave a certain amount of space in the G2H CTB buffer for
+	 * unexpected G2H CTBs (e.g. logging, engine hang, etc...)
+	 */
+	return !g2h_len_dw || atomic_read(&ctb->space) >= g2h_len_dw;
+}
+
+static inline void g2h_reserve_space(struct intel_guc_ct *ct, u32 g2h_len_dw)
+{
+	lockdep_assert_held(&ct->ctbs.send.lock);
+
+	GEM_BUG_ON(!g2h_has_room(ct, g2h_len_dw));
+
+	if (g2h_len_dw)
+		atomic_sub(g2h_len_dw, &ct->ctbs.recv.space);
+}
+
 static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	u32 head;
 	u32 space;
 
-	if (ctb->space >= len_dw)
+	if (atomic_read(&ctb->space) >= len_dw)
 		return true;
 
 	head = READ_ONCE(ctb->desc->head);
@@ -540,16 +571,16 @@ static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 	}
 
 	space = CIRC_SPACE(ctb->tail, head, ctb->size);
-	ctb->space = space;
+	atomic_set(&ctb->space, space);
 
 	return space >= len_dw;
 }
 
-static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
+static int has_room_nb(struct intel_guc_ct *ct, u32 h2g_dw, u32 g2h_dw)
 {
 	lockdep_assert_held(&ct->ctbs.send.lock);
 
-	if (unlikely(!h2g_has_room(ct, len_dw))) {
+	if (unlikely(!h2g_has_room(ct, h2g_dw) || !g2h_has_room(ct, g2h_dw))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 
@@ -563,6 +594,9 @@ static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
 	return 0;
 }
 
+#define G2H_LEN_DW(f) \
+	FIELD_GET(INTEL_GUC_SEND_G2H_DW_MASK, f) ? \
+	FIELD_GET(INTEL_GUC_SEND_G2H_DW_MASK, f) + GUC_CTB_HXG_MSG_MIN_LEN : 0
 static int ct_send_nb(struct intel_guc_ct *ct,
 		      const u32 *action,
 		      u32 len,
@@ -570,12 +604,13 @@ static int ct_send_nb(struct intel_guc_ct *ct,
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	unsigned long spin_flags;
+	u32 g2h_len_dw = G2H_LEN_DW(flags);
 	u32 fence;
 	int ret;
 
 	spin_lock_irqsave(&ctb->lock, spin_flags);
 
-	ret = has_room_nb(ct, len + 1);
+	ret = has_room_nb(ct, len + 1, g2h_len_dw);
 	if (unlikely(ret))
 		goto out;
 
@@ -584,6 +619,7 @@ static int ct_send_nb(struct intel_guc_ct *ct,
 	if (unlikely(ret))
 		goto out;
 
+	g2h_reserve_space(ct, g2h_len_dw);
 	intel_guc_notify(ct_to_guc(ct));
 
 out:
@@ -965,10 +1001,22 @@ static void ct_incoming_request_worker_func(struct work_struct *w)
 static int ct_handle_event(struct intel_guc_ct *ct, struct ct_incoming_msg *request)
 {
 	const u32 *hxg = &request->msg[GUC_CTB_MSG_MIN_LEN];
+	u32 action = FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, hxg[0]);
 	unsigned long flags;
 
 	GEM_BUG_ON(FIELD_GET(GUC_HXG_MSG_0_TYPE, hxg[0]) != GUC_HXG_TYPE_EVENT);
 
+	/*
+	 * Adjusting the space must be done in IRQ or deadlock can occur as the
+	 * CTB processing in the below workqueue can send CTBs which creates a
+	 * circular dependency if the space was returned there.
+	 */
+	switch (action) {
+	case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
+	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
+		atomic_add(request->size, &ct->ctbs.recv.space);
+	}
+
 	spin_lock_irqsave(&ct->requests.lock, flags);
 	list_add_tail(&request->link, &ct->requests.incoming);
 	spin_unlock_irqrestore(&ct->requests.lock, flags);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 9924335e2ee6..660bf37238e2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -33,6 +33,7 @@ struct intel_guc;
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer in dwords
+ * @resv_space: reserved space in buffer in dwords
  * @head: local shadow copy of head in dwords
  * @tail: local shadow copy of tail in dwords
  * @space: local shadow copy of space in dwords
@@ -43,9 +44,10 @@ struct intel_guc_ct_buffer {
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
 	u32 size;
+	u32 resv_space;
 	u32 tail;
 	u32 head;
-	u32 space;
+	atomic_t space;
 	bool broken;
 };
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 4e4edc368b77..94bb1ca6f889 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -17,6 +17,10 @@
 #include "abi/guc_communication_ctb_abi.h"
 #include "abi/guc_messages_abi.h"
 
+/* Payload length only i.e. don't include G2H header length */
+#define G2H_LEN_DW_SCHED_CONTEXT_MODE_SET	2
+#define G2H_LEN_DW_DEREGISTER_CONTEXT		1
+
 #define GUC_CONTEXT_DISABLE		0
 #define GUC_CONTEXT_ENABLE		1
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 010e46dd6b16..ef24758c4266 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -260,6 +260,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	struct intel_context *ce = rq->context;
 	u32 action[3];
 	int len = 0;
+	u32 g2h_len_dw = 0;
 	bool enabled = context_enabled(ce);
 
 	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
@@ -271,13 +272,13 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		action[len++] = GUC_CONTEXT_ENABLE;
 		set_context_pending_enable(ce);
 		intel_context_get(ce);
+		g2h_len_dw = G2H_LEN_DW_SCHED_CONTEXT_MODE_SET;
 	} else {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
 		action[len++] = ce->guc_id;
 	}
 
-	err = intel_guc_send_nb(guc, action, len);
-
+	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (!enabled && !err) {
 		set_context_enabled(ce);
 	} else if (!enabled) {
@@ -730,7 +731,7 @@ static int __guc_action_register_context(struct intel_guc *guc,
 		offset,
 	};
 
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
 }
 
 static int register_context(struct intel_context *ce)
@@ -750,7 +751,8 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
 		guc_id,
 	};
 
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
 }
 
 static int deregister_context(struct intel_context *ce, u32 guc_id)
@@ -889,7 +891,8 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 
 	intel_context_get(ce);
 
-	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
 }
 
 static u16 prep_context_pending_disable(struct intel_context *ce)
On 6/24/2021 00:04, Matthew Brost wrote:
Ensure a G2H response will have space in the buffer before sending the H2G CTB, as the GuC can't handle any backpressure on the G2H interface.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        | 13 +++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 76 +++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 13 ++--
 5 files changed, 87 insertions(+), 23 deletions(-)

[...]

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 7491f041859e..a60970e85635 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -73,6 +73,7 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
 #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
 #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
 #define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
+#define G2H_ROOM_BUFFER_SIZE	(PAGE_SIZE)
Any particular reason why PAGE_SIZE instead of SZ_4K? I'm not seeing anything in the code that is actually related to page sizes. Seems like '(CTB_G2H_BUFFER_SIZE / 4)' would be a more correct way to express it. Unless I'm missing something about how it's used?
John.
On Tue, Jul 13, 2021 at 11:36:05AM -0700, John Harrison wrote:
On 6/24/2021 00:04, Matthew Brost wrote:
Ensure a G2H response will have space in the buffer before sending the H2G CTB, as the GuC can't handle any backpressure on the G2H interface.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        | 13 +++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 76 +++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 13 ++--
 5 files changed, 87 insertions(+), 23 deletions(-)

[...]

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 7491f041859e..a60970e85635 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -73,6 +73,7 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
 #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
 #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
 #define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
+#define G2H_ROOM_BUFFER_SIZE	(PAGE_SIZE)
Any particular reason why PAGE_SIZE instead of SZ_4K? I'm not seeing anything in the code that is actually related to page sizes. Seems like '(CTB_G2H_BUFFER_SIZE / 4)' would be a more correct way to express it. Unless I'm missing something about how it's used?
Yes, CTB_G2H_BUFFER_SIZE / 4 is better.
Matt
John.
@@ -584,6 +619,7 @@ static int ct_send_nb(struct intel_guc_ct *ct, if (unlikely(ret)) goto out;
- g2h_reserve_space(ct, g2h_len_dw); intel_guc_notify(ct_to_guc(ct)); out:
@@ -965,10 +1001,22 @@ static void ct_incoming_request_worker_func(struct work_struct *w) static int ct_handle_event(struct intel_guc_ct *ct, struct ct_incoming_msg *request) { const u32 *hxg = &request->msg[GUC_CTB_MSG_MIN_LEN];
- u32 action = FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, hxg[0]); unsigned long flags; GEM_BUG_ON(FIELD_GET(GUC_HXG_MSG_0_TYPE, hxg[0]) != GUC_HXG_TYPE_EVENT);
- /*
* Adjusting the space must be done in IRQ or deadlock can occur as the
* CTB processing in the below workqueue can send CTBs which creates a
* circular dependency if the space was returned there.
*/
- switch (action) {
- case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
- case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
atomic_add(request->size, &ct->ctbs.recv.space);
- }
- spin_lock_irqsave(&ct->requests.lock, flags); list_add_tail(&request->link, &ct->requests.incoming); spin_unlock_irqrestore(&ct->requests.lock, flags);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index 9924335e2ee6..660bf37238e2 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -33,6 +33,7 @@ struct intel_guc;
- @desc: pointer to the buffer descriptor
- @cmds: pointer to the commands buffer
- @size: size of the commands buffer in dwords
- @resv_space: reserved space in buffer in dwords
- @head: local shadow copy of head in dwords
- @tail: local shadow copy of tail in dwords
- @space: local shadow copy of space in dwords
@@ -43,9 +44,10 @@ struct intel_guc_ct_buffer { struct guc_ct_buffer_desc *desc; u32 *cmds; u32 size;
- u32 resv_space; u32 tail; u32 head;
- u32 space;
- atomic_t space; bool broken; };
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h index 4e4edc368b77..94bb1ca6f889 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h @@ -17,6 +17,10 @@ #include "abi/guc_communication_ctb_abi.h" #include "abi/guc_messages_abi.h" +/* Payload length only i.e. don't include G2H header length */ +#define G2H_LEN_DW_SCHED_CONTEXT_MODE_SET 2 +#define G2H_LEN_DW_DEREGISTER_CONTEXT 1
- #define GUC_CONTEXT_DISABLE 0 #define GUC_CONTEXT_ENABLE 1
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 010e46dd6b16..ef24758c4266 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -260,6 +260,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) struct intel_context *ce = rq->context; u32 action[3]; int len = 0;
- u32 g2h_len_dw = 0; bool enabled = context_enabled(ce); GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
@@ -271,13 +272,13 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) action[len++] = GUC_CONTEXT_ENABLE; set_context_pending_enable(ce); intel_context_get(ce);
} else { action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT; action[len++] = ce->guc_id; }g2h_len_dw = G2H_LEN_DW_SCHED_CONTEXT_MODE_SET;
- err = intel_guc_send_nb(guc, action, len);
- err = intel_guc_send_nb(guc, action, len, g2h_len_dw); if (!enabled && !err) { set_context_enabled(ce); } else if (!enabled) {
@@ -730,7 +731,7 @@ static int __guc_action_register_context(struct intel_guc *guc, offset, };
- return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
- return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true); } static int register_context(struct intel_context *ce)
@@ -750,7 +751,8 @@ static int __guc_action_deregister_context(struct intel_guc *guc, guc_id, };
- return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
- return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
} static int deregister_context(struct intel_context *ce, u32 guc_id)G2H_LEN_DW_DEREGISTER_CONTEXT, true);
@@ -889,7 +891,8 @@ static void __guc_context_sched_disable(struct intel_guc *guc, intel_context_get(ce);
- intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
- intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
} static u16 prep_context_pending_disable(struct intel_context *ce)G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
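For reference, the h2g_has_room() change above still leans on the kernel's circular-buffer helpers; a standalone copy of the two macros from include/linux/circ_buf.h shows the arithmetic (size must be a power of two, and one slot is kept open so that head == tail unambiguously means "empty"):

```c
#include <assert.h>

/*
 * Standalone copy of the kernel's circular-buffer macros from
 * include/linux/circ_buf.h. CIRC_CNT() is the number of filled
 * entries, CIRC_SPACE() the number of free entries; "size" must be a
 * power of two and one slot is always kept open.
 */
#define CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))
```

With a 16-entry ring, an empty buffer reports 15 free slots; once the producer has written 15 entries the space drops to 0 even though one slot is physically unused.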
On 7/14/2021 17:06, Matthew Brost wrote:
On Tue, Jul 13, 2021 at 11:36:05AM -0700, John Harrison wrote:
On 6/24/2021 00:04, Matthew Brost wrote:
Ensure G2H response has space in the buffer before sending H2G CTB as the GuC can't handle any backpressure on the G2H interface.
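The credit scheme the commit message describes can be sketched in userspace C (illustrative names and sizes, not the i915 code): reserve the reply's worth of G2H dwords before ringing the H2G doorbell, return the credits once the reply has been consumed, and always hold back a reserve for unsolicited G2H traffic:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sizes, in dwords; the reserve mirrors G2H_ROOM_BUFFER_SIZE. */
#define G2H_BUFFER_DW	4096u	/* total G2H command space */
#define G2H_RESV_DW	1024u	/* kept back for unsolicited G2H */

static atomic_uint g2h_space = G2H_BUFFER_DW - G2H_RESV_DW;

/* Mirror of g2h_has_room(): actions with no reply (0 dwords) always fit. */
static bool g2h_has_room(unsigned int g2h_len_dw)
{
	return !g2h_len_dw || atomic_load(&g2h_space) >= g2h_len_dw;
}

/* Called under the send lock before the H2G is submitted. */
static bool g2h_reserve_space(unsigned int g2h_len_dw)
{
	if (!g2h_has_room(g2h_len_dw))
		return false;	/* caller backs off, the -EBUSY path */
	atomic_fetch_sub(&g2h_space, g2h_len_dw);
	return true;
}

/* Called from IRQ context once the G2H message has been consumed. */
static void g2h_release_space(unsigned int msg_size_dw)
{
	atomic_fetch_add(&g2h_space, msg_size_dw);
}
```

The point of the reserve is that the GuC cannot be back-pressured on G2H: the driver must guarantee room for every reply it provokes, while the held-back dwords absorb logging and error notifications it did not ask for.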
Signed-off-by: John Harrison John.C.Harrison@Intel.com
Signed-off-by: Matthew Brost matthew.brost@intel.com
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        | 13 +++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 76 +++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 13 ++--
 5 files changed, 87 insertions(+), 23 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index b43ec56986b5..24e7a924134e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -95,11 +95,17 @@ inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
 }

 #define INTEL_GUC_SEND_NB		BIT(31)
+#define INTEL_GUC_SEND_G2H_DW_SHIFT	0
+#define INTEL_GUC_SEND_G2H_DW_MASK	(0xff << INTEL_GUC_SEND_G2H_DW_SHIFT)
+#define MAKE_SEND_FLAGS(len) \
+	({GEM_BUG_ON(!FIELD_FIT(INTEL_GUC_SEND_G2H_DW_MASK, len)); \
+	(FIELD_PREP(INTEL_GUC_SEND_G2H_DW_MASK, len) | INTEL_GUC_SEND_NB);})
 static
-inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
+inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len,
+			     u32 g2h_len_dw)
 {
 	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
-				 INTEL_GUC_SEND_NB);
+				 MAKE_SEND_FLAGS(g2h_len_dw));
 }

 static inline int
@@ -113,6 +119,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
 static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
 					   const u32 *action,
 					   u32 len,
+					   u32 g2h_len_dw,
 					   bool loop)
 {
 	int err;
@@ -121,7 +128,7 @@ static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
 	might_sleep_if(loop && (!in_atomic() && !irqs_disabled()));
 retry:
-	err = intel_guc_send_nb(guc, action, len);
+	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (unlikely(err == -EBUSY && loop)) {
 		if (likely(!in_atomic() && !irqs_disabled()))
 			cond_resched();

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 7491f041859e..a60970e85635 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -73,6 +73,7 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
 #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
 #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
 #define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
+#define G2H_ROOM_BUFFER_SIZE	(PAGE_SIZE)
Any particular reason why PAGE_SIZE instead of SZ_4K? I'm not seeing anything in the code that is actually related to page sizes. Seems like '(CTB_G2H_BUFFER_SIZE / 4)' would be a more correct way to express it. Unless I'm missing something about how it's used?
Yes, CTB_G2H_BUFFER_SIZE / 4 is better.
Matt
Okay. With that changed:
Reviewed-by: John Harrison John.C.Harrison@Intel.com
John.
When running the GuC the GPU can't be considered idle if the GuC still has contexts pinned. As such, a call has been added in intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for the number of unpinned contexts to go to zero.
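The accounting this commit message describes can be sketched in userspace C (names are illustrative, not the driver's): every H2G that expects a G2H reply bumps a counter, every processed reply drops it, and the GuC counts as idle once the counter drains to zero:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace sketch of the outstanding_submission_g2h accounting this
 * patch adds; the kernel version wakes waiters on ct->wq when the
 * counter hits zero.
 */
static atomic_uint outstanding_g2h;

/* Called after a successful H2G send that will produce a G2H reply. */
static void g2h_outstanding_inc(void)
{
	atomic_fetch_add(&outstanding_g2h, 1);
}

/*
 * Called once a G2H reply has been processed; returns true when this
 * was the last outstanding reply, i.e. when waiters should be woken
 * (the atomic_dec_and_test() pattern in the kernel code).
 */
static bool g2h_outstanding_dec_and_test(void)
{
	return atomic_fetch_sub(&outstanding_g2h, 1) == 1;
}

static bool guc_submission_idle(void)
{
	return atomic_load(&outstanding_g2h) == 0;
}
```

Using the counter rather than inspecting GuC state keeps the idle check lock-free: the wait path only needs to sleep until the last in-flight reply has been retired.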
v2: rtimeout -> remaining_timeout
Cc: John Harrison john.c.harrison@intel.com
Signed-off-by: Matthew Brost matthew.brost@intel.com
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
 drivers/gpu/drm/i915/gt/intel_gt.c            | 19 ++++
 drivers/gpu/drm/i915/gt/intel_gt.h            |  2 +
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 22 ++---
 drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  9 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 88 ++++++++++++++++++-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |  5 ++
 drivers/gpu/drm/i915/i915_debugfs.c           |  1 +
 drivers/gpu/drm/i915/i915_gem_evict.c         |  1 +
 .../gpu/drm/i915/selftests/igt_live_test.c    |  2 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
 14 files changed, 137 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 2fd155742bd2..335b955d5b4b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -644,7 +644,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
 		goto insert;

 	/* Attempt to reap some mmap space from dead objects */
-	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
+	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
+					       NULL);
 	if (err)
 		goto err;

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index e714e21c0a4d..acfdd53b2678 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -585,6 +585,25 @@ static void __intel_gt_disable(struct intel_gt *gt)
 	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
 }

+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
+{
+	long remaining_timeout;
+
+	/* If the device is asleep, we have no requests outstanding */
+	if (!intel_gt_pm_is_awake(gt))
+		return 0;
+
+	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
+							   &remaining_timeout)) > 0) {
+		cond_resched();
+		if (signal_pending(current))
+			return -EINTR;
+	}
+
+	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
+							  remaining_timeout);
+}
+
 int intel_gt_init(struct intel_gt *gt)
 {
 	int err;

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index e7aabe0cc5bf..74e771871a9b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);

 void intel_gt_driver_late_release(struct intel_gt *gt);

+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
+
 void intel_gt_check_and_clear_faults(struct intel_gt *gt);
 void intel_gt_clear_error_registers(struct intel_gt *gt,
 				    intel_engine_mask_t engine_mask);

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index 647eca9d867a..39f5e824dac5 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -13,6 +13,7 @@
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
 #include "intel_timeline.h"
+#include "uc/intel_uc.h"

 static bool retire_requests(struct intel_timeline *tl)
 {
@@ -130,7 +131,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine)
 	GEM_BUG_ON(engine->retire);
 }

-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+				      long *remaining_timeout)
 {
 	struct intel_gt_timelines *timelines = &gt->timelines;
 	struct intel_timeline *tl, *tn;
@@ -195,22 +197,10 @@ out_active:	spin_lock(&timelines->lock);
 	if (flush_submission(gt, timeout)) /* Wait, there's more! */
 		active_count++;

-	return active_count ? timeout : 0;
-}
-
-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
-{
-	/* If the device is asleep, we have no requests outstanding */
-	if (!intel_gt_pm_is_awake(gt))
-		return 0;
-
-	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
-		cond_resched();
-		if (signal_pending(current))
-			return -EINTR;
-	}
+	if (remaining_timeout)
+		*remaining_timeout = timeout;

-	return timeout;
+	return active_count ? timeout : 0;
 }

 static void retire_work_handler(struct work_struct *work)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
index fcc30a6e4fe9..51dbe0e3294e 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
@@ -6,14 +6,17 @@
 #ifndef INTEL_GT_REQUESTS_H
 #define INTEL_GT_REQUESTS_H

+#include <stddef.h>
+
 struct intel_engine_cs;
 struct intel_gt;
 struct intel_timeline;

-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+				      long *remaining_timeout);
 static inline void intel_gt_retire_requests(struct intel_gt *gt)
 {
-	intel_gt_retire_requests_timeout(gt, 0);
+	intel_gt_retire_requests_timeout(gt, 0, NULL);
 }

 void intel_engine_init_retire(struct intel_engine_cs *engine);
@@ -21,8 +24,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine,
 			     struct intel_timeline *tl);
 void intel_engine_fini_retire(struct intel_engine_cs *engine);

-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
-
 void intel_gt_init_requests(struct intel_gt *gt);
 void intel_gt_park_requests(struct intel_gt *gt);
 void intel_gt_unpark_requests(struct intel_gt *gt);

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 24e7a924134e..22eb1e9cca41 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -38,6 +38,8 @@ struct intel_guc {
 	spinlock_t irq_lock;
 	unsigned int msg_enabled_mask;

+	atomic_t outstanding_submission_g2h;
+
 	struct {
 		void (*reset)(struct intel_guc *guc);
 		void (*enable)(struct intel_guc *guc);
@@ -238,6 +240,8 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
 	spin_unlock_irq(&guc->irq_lock);
 }

+int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
+
 int intel_guc_reset_engine(struct intel_guc *guc,
 			   struct intel_engine_cs *engine);

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index a60970e85635..e0f92e28350c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -109,6 +109,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
 	INIT_LIST_HEAD(&ct->requests.incoming);
 	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
 	tasklet_setup(&ct->receive_tasklet, ct_receive_tasklet_func);
+	init_waitqueue_head(&ct->wq);
 }

 static inline const char *guc_ct_buffer_type_to_str(u32 type)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 660bf37238e2..ab1b79ab960b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -10,6 +10,7 @@
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
 #include <linux/ktime.h>
+#include <linux/wait.h>

 #include "intel_guc_fwif.h"

@@ -68,6 +69,9 @@ struct intel_guc_ct {

 	struct tasklet_struct receive_tasklet;

+	/** @wq: wait queue for g2h channel */
+	wait_queue_head_t wq;
+
 	struct {
 		u16 last_fence; /* last fence used to send request */

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index ef24758c4266..d1a28283a9ae 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -254,6 +254,74 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
 	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
 }

+static int guc_submission_busy_loop(struct intel_guc* guc,
+				    const u32 *action,
+				    u32 len,
+				    u32 g2h_len_dw,
+				    bool loop)
+{
+	int err;
+
+	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+
+	if (!err && g2h_len_dw)
+		atomic_inc(&guc->outstanding_submission_g2h);
+
+	return err;
+}
+
+static int guc_wait_for_pending_msg(struct intel_guc *guc,
+				    atomic_t *wait_var,
+				    bool interruptible,
+				    long timeout)
+{
+	const int state = interruptible ?
+		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
+	DEFINE_WAIT(wait);
+
+	might_sleep();
+	GEM_BUG_ON(timeout < 0);
+
+	if (!atomic_read(wait_var))
+		return 0;
+
+	if (!timeout)
+		return -ETIME;
+
+	for (;;) {
+		prepare_to_wait(&guc->ct.wq, &wait, state);
+
+		if (!atomic_read(wait_var))
+			break;
+
+		if (signal_pending_state(state, current)) {
+			timeout = -ERESTARTSYS;
+			break;
+		}
+
+		if (!timeout) {
+			timeout = -ETIME;
+			break;
+		}
+
+		timeout = io_schedule_timeout(timeout);
+	}
+	finish_wait(&guc->ct.wq, &wait);
+
+	return (timeout < 0) ? timeout : 0;
+}
+
+int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
+{
+	bool interruptible = true;
+
+	if (unlikely(timeout < 0))
+		timeout = -timeout, interruptible = false;
+
+	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
+					interruptible, timeout);
+}
+
 static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	int err;
@@ -280,6 +348,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)

 	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (!enabled && !err) {
+		atomic_inc(&guc->outstanding_submission_g2h);
 		set_context_enabled(ce);
 	} else if (!enabled) {
 		clr_context_pending_enable(ce);
@@ -731,7 +800,7 @@ static int __guc_action_register_context(struct intel_guc *guc,
 		offset,
 	};

-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
 }

 static int register_context(struct intel_context *ce)
@@ -751,7 +820,7 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
 		guc_id,
 	};

-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
 					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
 }

@@ -868,7 +937,9 @@ static int guc_context_pin(struct intel_context *ce, void *vaddr)

 static void guc_context_unpin(struct intel_context *ce)
 {
-	unpin_guc_id(ce_to_guc(ce), ce);
+	struct intel_guc *guc = ce_to_guc(ce);
+
+	unpin_guc_id(guc, ce);
 	lrc_unpin(ce);
 }

@@ -891,7 +962,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc,

 	intel_context_get(ce);

-	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
 				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
 }

@@ -1433,6 +1504,12 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
 	return ce;
 }

+static void decr_outstanding_submission_g2h(struct intel_guc *guc)
+{
+	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
+		wake_up_all(&guc->ct.wq);
+}
+
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg,
 					  u32 len)
@@ -1468,6 +1545,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		lrc_destroy(&ce->ref);
 	}

+	decr_outstanding_submission_g2h(guc);
+
 	return 0;
 }

@@ -1516,6 +1595,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 	}

+	decr_outstanding_submission_g2h(guc);
 	intel_context_put(ce);

 	return 0;

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index 9c954c589edf..c4cef885e984 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -81,6 +81,11 @@ uc_state_checkers(guc, guc_submission);
 #undef uc_state_checkers
 #undef __uc_state_checker

+static inline int intel_uc_wait_for_idle(struct intel_uc *uc, long timeout)
+{
+	return intel_guc_wait_for_idle(&uc->guc, timeout);
+}
+
 #define intel_uc_ops_function(_NAME, _OPS, _TYPE, _RET) \
 static inline _TYPE intel_uc_##_NAME(struct intel_uc *uc) \
 { \

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index cc745751ac53..277800987bf8 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -36,6 +36,7 @@
 #include "gt/intel_gt_clock_utils.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_pm.h"
+#include "gt/intel_gt.h"
 #include "gt/intel_gt_requests.h"
 #include "gt/intel_reset.h"
 #include "gt/intel_rc6.h"

diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 4d2d59a9942b..2b73ddb11c66 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -27,6 +27,7 @@
  */

 #include "gem/i915_gem_context.h"
+#include "gt/intel_gt.h"
 #include "gt/intel_gt_requests.h"

 #include "i915_drv.h"

diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
index c130010a7033..1c721542e277 100644
--- a/drivers/gpu/drm/i915/selftests/igt_live_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
@@ -5,7 +5,7 @@
  */

 #include "i915_drv.h"
-#include "gt/intel_gt_requests.h"
+#include "gt/intel_gt.h"

 #include "../i915_selftest.h"
 #include "igt_flush_test.h"

diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index d189c4bd4bef..4f8180146888 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -52,7 +52,8 @@ void mock_device_flush(struct drm_i915_private *i915)
 	do {
 		for_each_engine(engine, gt, id)
 			mock_engine_flush(engine);
-	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT));
+	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT,
+						  NULL));
 }

 static void mock_device_release(struct drm_device *dev)
On 6/24/2021 00:04, Matthew Brost wrote:
When running the GuC the GPU can't be considered idle if the GuC still has contexts pinned. As such, a call has been added in intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for the number of unpinned contexts to go to zero.
v2: rtimeout -> remaining_timeout
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
 drivers/gpu/drm/i915/gt/intel_gt.c            | 19 ++++
 drivers/gpu/drm/i915/gt/intel_gt.h            |  2 +
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 22 ++---
 drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  9 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 88 ++++++++++++++++++-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |  5 ++
 drivers/gpu/drm/i915/i915_debugfs.c           |  1 +
 drivers/gpu/drm/i915/i915_gem_evict.c         |  1 +
 .../gpu/drm/i915/selftests/igt_live_test.c    |  2 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
 14 files changed, 137 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index 2fd155742bd2..335b955d5b4b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -644,7 +644,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj, goto insert;
 	/* Attempt to reap some mmap space from dead objects */
-	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
+	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
+					       NULL);
 	if (err)
 		goto err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index e714e21c0a4d..acfdd53b2678 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -585,6 +585,25 @@ static void __intel_gt_disable(struct intel_gt *gt) GEM_BUG_ON(intel_gt_pm_is_awake(gt)); }
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
+{
+	long remaining_timeout;
+
+	/* If the device is asleep, we have no requests outstanding */
+	if (!intel_gt_pm_is_awake(gt))
+		return 0;
+
+	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
+							   &remaining_timeout)) > 0) {
+		cond_resched();
+		if (signal_pending(current))
+			return -EINTR;
+	}
+
+	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
+							  remaining_timeout);
+}
+
 int intel_gt_init(struct intel_gt *gt)
 {
 	int err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h index e7aabe0cc5bf..74e771871a9b 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.h +++ b/drivers/gpu/drm/i915/gt/intel_gt.h @@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
void intel_gt_driver_late_release(struct intel_gt *gt);
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
+
 void intel_gt_check_and_clear_faults(struct intel_gt *gt);
 void intel_gt_clear_error_registers(struct intel_gt *gt,
 				    intel_engine_mask_t engine_mask);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c index 647eca9d867a..39f5e824dac5 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c @@ -13,6 +13,7 @@ #include "intel_gt_pm.h" #include "intel_gt_requests.h" #include "intel_timeline.h" +#include "uc/intel_uc.h"
Why is this needed?
static bool retire_requests(struct intel_timeline *tl) { @@ -130,7 +131,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine) GEM_BUG_ON(engine->retire); }
-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+				      long *remaining_timeout)
 {
 	struct intel_gt_timelines *timelines = &gt->timelines;
 	struct intel_timeline *tl, *tn;
@@ -195,22 +197,10 @@ out_active:	spin_lock(&timelines->lock);
 	if (flush_submission(gt, timeout)) /* Wait, there's more! */
 		active_count++;
 
-	return active_count ? timeout : 0;
-}
-
-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
-{
-	/* If the device is asleep, we have no requests outstanding */
-	if (!intel_gt_pm_is_awake(gt))
-		return 0;
-
-	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
-		cond_resched();
-		if (signal_pending(current))
-			return -EINTR;
-	}
+	if (remaining_timeout)
+		*remaining_timeout = timeout;
 
-	return timeout;
+	return active_count ? timeout : 0;
 }
static void retire_work_handler(struct work_struct *work)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h index fcc30a6e4fe9..51dbe0e3294e 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h @@ -6,14 +6,17 @@ #ifndef INTEL_GT_REQUESTS_H #define INTEL_GT_REQUESTS_H
+#include <stddef.h>
Why is this needed?
struct intel_engine_cs; struct intel_gt; struct intel_timeline;
-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+				      long *remaining_timeout);
 
 static inline void intel_gt_retire_requests(struct intel_gt *gt)
 {
-	intel_gt_retire_requests_timeout(gt, 0);
+	intel_gt_retire_requests_timeout(gt, 0, NULL);
 }
void intel_engine_init_retire(struct intel_engine_cs *engine);
@@ -21,8 +24,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine, struct intel_timeline *tl); void intel_engine_fini_retire(struct intel_engine_cs *engine);
-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
-
 void intel_gt_init_requests(struct intel_gt *gt);
 void intel_gt_park_requests(struct intel_gt *gt);
 void intel_gt_unpark_requests(struct intel_gt *gt);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 24e7a924134e..22eb1e9cca41 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -38,6 +38,8 @@ struct intel_guc { spinlock_t irq_lock; unsigned int msg_enabled_mask;
+	atomic_t outstanding_submission_g2h;
+
 	struct {
 		void (*reset)(struct intel_guc *guc);
 		void (*enable)(struct intel_guc *guc);
@@ -238,6 +240,8 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask) spin_unlock_irq(&guc->irq_lock); }
+int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
+
 int intel_guc_reset_engine(struct intel_guc *guc,
 			   struct intel_engine_cs *engine);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index a60970e85635..e0f92e28350c 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -109,6 +109,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct) INIT_LIST_HEAD(&ct->requests.incoming); INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func); tasklet_setup(&ct->receive_tasklet, ct_receive_tasklet_func);
+	init_waitqueue_head(&ct->wq);
 }
static inline const char *guc_ct_buffer_type_to_str(u32 type)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index 660bf37238e2..ab1b79ab960b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -10,6 +10,7 @@ #include <linux/spinlock.h> #include <linux/workqueue.h> #include <linux/ktime.h> +#include <linux/wait.h>
#include "intel_guc_fwif.h"
@@ -68,6 +69,9 @@ struct intel_guc_ct {
struct tasklet_struct receive_tasklet;
+	/** @wq: wait queue for g2h channel */
+	wait_queue_head_t wq;
+
 	struct {
 		u16 last_fence; /* last fence used to send request */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index ef24758c4266..d1a28283a9ae 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -254,6 +254,74 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id, xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC); }
+static int guc_submission_busy_loop(struct intel_guc *guc,
I think this name is misleading. It would be better as guc_submission_send_busy_loop.
+				    const u32 *action,
+				    u32 len,
+				    u32 g2h_len_dw,
+				    bool loop)
+{
+	int err;
+
+	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+
+	if (!err && g2h_len_dw)
+		atomic_inc(&guc->outstanding_submission_g2h);
+
+	return err;
+}
+static int guc_wait_for_pending_msg(struct intel_guc *guc,
+				    atomic_t *wait_var,
+				    bool interruptible,
+				    long timeout)
+{
+	const int state = interruptible ?
+		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
+	DEFINE_WAIT(wait);
+
+	might_sleep();
+	GEM_BUG_ON(timeout < 0);
+
+	if (!atomic_read(wait_var))
+		return 0;
+
+	if (!timeout)
+		return -ETIME;
+
+	for (;;) {
+		prepare_to_wait(&guc->ct.wq, &wait, state);
+
+		if (!atomic_read(wait_var))
+			break;
+
+		if (signal_pending_state(state, current)) {
+			timeout = -ERESTARTSYS;
+			break;
+		}
+
+		if (!timeout) {
+			timeout = -ETIME;
+			break;
+		}
+
+		timeout = io_schedule_timeout(timeout);
+	}
+	finish_wait(&guc->ct.wq, &wait);
+
+	return (timeout < 0) ? timeout : 0;
+}
+int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
+{
+	bool interruptible = true;
+
+	if (unlikely(timeout < 0))
+		timeout = -timeout, interruptible = false;
Why is this a comma bridged statement rather than just two lines with braces on the if?
And overloading negative timeouts to mean non-interruptible seems unnecessarily convoluted in the first place. Why not just have an interruptible parameter? I'm also not seeing how the timeout gets to be negative in the first place?
+	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
+					interruptible, timeout);
+}
+
 static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	int err;
@@ -280,6 +348,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (!enabled && !err) {
+		atomic_inc(&guc->outstanding_submission_g2h);
 		set_context_enabled(ce);
 	} else if (!enabled) {
 		clr_context_pending_enable(ce);
@@ -731,7 +800,7 @@ static int __guc_action_register_context(struct intel_guc *guc, offset, };
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
 }
static int register_context(struct intel_context *ce)
@@ -751,7 +820,7 @@ static int __guc_action_deregister_context(struct intel_guc *guc, guc_id, };
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
 					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
 }
@@ -868,7 +937,9 @@ static int guc_context_pin(struct intel_context *ce, void *vaddr)
static void guc_context_unpin(struct intel_context *ce) {
-	unpin_guc_id(ce_to_guc(ce), ce);
+	struct intel_guc *guc = ce_to_guc(ce);
+
+	unpin_guc_id(guc, ce);
Should this be part of this patch?
lrc_unpin(ce); }
@@ -891,7 +962,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
intel_context_get(ce);
-	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
 				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
 }
@@ -1433,6 +1504,12 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx) return ce; }
+static void decr_outstanding_submission_g2h(struct intel_guc *guc)
+{
+	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
+		wake_up_all(&guc->ct.wq);
+}
+
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg,
 					  u32 len)
@@ -1468,6 +1545,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc, lrc_destroy(&ce->ref); }
+	decr_outstanding_submission_g2h(guc);
+
 	return 0;
 }
@@ -1516,6 +1595,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, spin_unlock_irqrestore(&ce->guc_state.lock, flags); }
+	decr_outstanding_submission_g2h(guc);
 	intel_context_put(ce);
return 0;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h index 9c954c589edf..c4cef885e984 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h @@ -81,6 +81,11 @@ uc_state_checkers(guc, guc_submission); #undef uc_state_checkers #undef __uc_state_checker
+static inline int intel_uc_wait_for_idle(struct intel_uc *uc, long timeout)
+{
+	return intel_guc_wait_for_idle(&uc->guc, timeout);
+}
+
 #define intel_uc_ops_function(_NAME, _OPS, _TYPE, _RET) \
 static inline _TYPE intel_uc_##_NAME(struct intel_uc *uc) \
 { \
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index cc745751ac53..277800987bf8 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -36,6 +36,7 @@
 #include "gt/intel_gt_clock_utils.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_pm.h"
+#include "gt/intel_gt.h"
All of these extra includes seem incorrect. There is no code change in any of the files below that would warrant a new include.
John.
 #include "gt/intel_gt_requests.h"
 #include "gt/intel_reset.h"
 #include "gt/intel_rc6.h"
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 4d2d59a9942b..2b73ddb11c66 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -27,6 +27,7 @@
  */
 
 #include "gem/i915_gem_context.h"
+#include "gt/intel_gt.h"
 #include "gt/intel_gt_requests.h"
 
 #include "i915_drv.h"
diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
index c130010a7033..1c721542e277 100644
--- a/drivers/gpu/drm/i915/selftests/igt_live_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
@@ -5,7 +5,7 @@
  */
 
 #include "i915_drv.h"
-#include "gt/intel_gt_requests.h"
+#include "gt/intel_gt.h"
 
 #include "../i915_selftest.h"
 #include "igt_flush_test.h"
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index d189c4bd4bef..4f8180146888 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -52,7 +52,8 @@ void mock_device_flush(struct drm_i915_private *i915)
 	do {
 		for_each_engine(engine, gt, id)
 			mock_engine_flush(engine);
-	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT));
+	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT,
+						  NULL));
}
static void mock_device_release(struct drm_device *dev)
On Fri, Jul 09, 2021 at 05:16:34PM -0700, John Harrison wrote:
On 6/24/2021 00:04, Matthew Brost wrote:
When running the GuC the GPU can't be considered idle if the GuC still has contexts pinned. As such, a call has been added in intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for the number of unpinned contexts to go to zero.
v2: rtimeout -> remaining_timeout
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
 drivers/gpu/drm/i915/gt/intel_gt.c            | 19 ++++
 drivers/gpu/drm/i915/gt/intel_gt.h            |  2 +
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 22 ++---
 drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  9 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 88 ++++++++++++++++++-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |  5 ++
 drivers/gpu/drm/i915/i915_debugfs.c           |  1 +
 drivers/gpu/drm/i915/i915_gem_evict.c         |  1 +
 .../gpu/drm/i915/selftests/igt_live_test.c    |  2 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
 14 files changed, 137 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index 2fd155742bd2..335b955d5b4b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -644,7 +644,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj, goto insert; /* Attempt to reap some mmap space from dead objects */
-	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
+	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
+					       NULL);
 	if (err)
 		goto err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index e714e21c0a4d..acfdd53b2678 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -585,6 +585,25 @@ static void __intel_gt_disable(struct intel_gt *gt)
 	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
 }
 
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
+{
+	long remaining_timeout;
+
+	/* If the device is asleep, we have no requests outstanding */
+	if (!intel_gt_pm_is_awake(gt))
+		return 0;
+
+	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
+							   &remaining_timeout)) > 0) {
+		cond_resched();
+		if (signal_pending(current))
+			return -EINTR;
+	}
+
+	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
+							  remaining_timeout);
+}
+
 int intel_gt_init(struct intel_gt *gt)
 {
 	int err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h index e7aabe0cc5bf..74e771871a9b 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.h +++ b/drivers/gpu/drm/i915/gt/intel_gt.h @@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt); void intel_gt_driver_late_release(struct intel_gt *gt); +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
- void intel_gt_check_and_clear_faults(struct intel_gt *gt); void intel_gt_clear_error_registers(struct intel_gt *gt, intel_engine_mask_t engine_mask);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c index 647eca9d867a..39f5e824dac5 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c @@ -13,6 +13,7 @@ #include "intel_gt_pm.h" #include "intel_gt_requests.h" #include "intel_timeline.h" +#include "uc/intel_uc.h"
Why is this needed?
It is not; likely a holdover from internal churn.
 static bool retire_requests(struct intel_timeline *tl)
 {
@@ -130,7 +131,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine)
 	GEM_BUG_ON(engine->retire);
 }
 
-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+				      long *remaining_timeout)
 {
 	struct intel_gt_timelines *timelines = &gt->timelines;
 	struct intel_timeline *tl, *tn;
@@ -195,22 +197,10 @@ out_active: spin_lock(&timelines->lock); if (flush_submission(gt, timeout)) /* Wait, there's more! */ active_count++;
-	return active_count ? timeout : 0;
-}
-
-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
-{
-	/* If the device is asleep, we have no requests outstanding */
-	if (!intel_gt_pm_is_awake(gt))
-		return 0;
-
-	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
-		cond_resched();
-		if (signal_pending(current))
-			return -EINTR;
-	}
+	if (remaining_timeout)
+		*remaining_timeout = timeout;
 
-	return timeout;
+	return active_count ? timeout : 0;
 }
 
 static void retire_work_handler(struct work_struct *work)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h index fcc30a6e4fe9..51dbe0e3294e 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h @@ -6,14 +6,17 @@ #ifndef INTEL_GT_REQUESTS_H #define INTEL_GT_REQUESTS_H +#include <stddef.h>
Why is this needed?
I could swear I needed stddef.h for NULL on a different machine of mine. It seems to be quite happy without it on my current machine. Can remove.
 struct intel_engine_cs;
 struct intel_gt;
 struct intel_timeline;
 
-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+				      long *remaining_timeout);
 
 static inline void intel_gt_retire_requests(struct intel_gt *gt)
 {
-	intel_gt_retire_requests_timeout(gt, 0);
+	intel_gt_retire_requests_timeout(gt, 0, NULL);
 }
 
 void intel_engine_init_retire(struct intel_engine_cs *engine);
@@ -21,8 +24,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine,
 			     struct intel_timeline *tl);
 void intel_engine_fini_retire(struct intel_engine_cs *engine);
 
-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
-
 void intel_gt_init_requests(struct intel_gt *gt);
 void intel_gt_park_requests(struct intel_gt *gt);
 void intel_gt_unpark_requests(struct intel_gt *gt);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 24e7a924134e..22eb1e9cca41 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -38,6 +38,8 @@ struct intel_guc { spinlock_t irq_lock; unsigned int msg_enabled_mask;
+	atomic_t outstanding_submission_g2h;
+
 	struct {
 		void (*reset)(struct intel_guc *guc);
 		void (*enable)(struct intel_guc *guc);
@@ -238,6 +240,8 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
 	spin_unlock_irq(&guc->irq_lock);
 }
 
+int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
+
 int intel_guc_reset_engine(struct intel_guc *guc,
 			   struct intel_engine_cs *engine);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index a60970e85635..e0f92e28350c 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -109,6 +109,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct) INIT_LIST_HEAD(&ct->requests.incoming); INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func); tasklet_setup(&ct->receive_tasklet, ct_receive_tasklet_func);
+	init_waitqueue_head(&ct->wq);
 }
 
 static inline const char *guc_ct_buffer_type_to_str(u32 type)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index 660bf37238e2..ab1b79ab960b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -10,6 +10,7 @@ #include <linux/spinlock.h> #include <linux/workqueue.h> #include <linux/ktime.h> +#include <linux/wait.h> #include "intel_guc_fwif.h" @@ -68,6 +69,9 @@ struct intel_guc_ct { struct tasklet_struct receive_tasklet;
+	/** @wq: wait queue for g2h channel */
+	wait_queue_head_t wq;
+
 	struct {
 		u16 last_fence; /* last fence used to send request */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index ef24758c4266..d1a28283a9ae 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -254,6 +254,74 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id, xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC); } +static int guc_submission_busy_loop(struct intel_guc* guc,
I think this name is misleading. It would be better as guc_submission_send_busy_loop.
Yep, better name. Will fix.
+				    const u32 *action,
+				    u32 len,
+				    u32 g2h_len_dw,
+				    bool loop)
+{
+	int err;
+
+	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+
+	if (!err && g2h_len_dw)
+		atomic_inc(&guc->outstanding_submission_g2h);
+
+	return err;
+}
+static int guc_wait_for_pending_msg(struct intel_guc *guc,
+				    atomic_t *wait_var,
+				    bool interruptible,
+				    long timeout)
+{
+	const int state = interruptible ?
+		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
+	DEFINE_WAIT(wait);
+
+	might_sleep();
+	GEM_BUG_ON(timeout < 0);
+
+	if (!atomic_read(wait_var))
+		return 0;
+
+	if (!timeout)
+		return -ETIME;
+
+	for (;;) {
+		prepare_to_wait(&guc->ct.wq, &wait, state);
+
+		if (!atomic_read(wait_var))
+			break;
+
+		if (signal_pending_state(state, current)) {
+			timeout = -ERESTARTSYS;
+			break;
+		}
+
+		if (!timeout) {
+			timeout = -ETIME;
+			break;
+		}
+
+		timeout = io_schedule_timeout(timeout);
+	}
+	finish_wait(&guc->ct.wq, &wait);
+
+	return (timeout < 0) ? timeout : 0;
+}
+int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
+{
+	bool interruptible = true;
+
+	if (unlikely(timeout < 0))
+		timeout = -timeout, interruptible = false;
Why is this a comma bridged statement rather than just two lines with braces on the if?
And overloading negative timeouts to mean non-interruptible seems unnecessarily convoluted in the first place. Why not just have an interruptible parameter? I'm also not seeing how the timeout gets to be negative in the first place?
Copy paste from some other code, can remove the comma and replace with 2 lines.
This is how intel_gt_wait_for_idle works, which in turn calls this. I'm not saying that giving a negative parameter a special meaning is right, it's just how it is currently done. Now that you mention it, with remaining_timeout I may have broken this too. How about I just add an interruptible parameter rather than this convoluted scheme, as you suggest?
+	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
+					interruptible, timeout);
+}
+
 static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	int err;
@@ -280,6 +348,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 
 	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (!enabled && !err) {
+		atomic_inc(&guc->outstanding_submission_g2h);
 		set_context_enabled(ce);
 	} else if (!enabled) {
 		clr_context_pending_enable(ce);
@@ -731,7 +800,7 @@ static int __guc_action_register_context(struct intel_guc *guc, offset, };
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
 }
 
 static int register_context(struct intel_context *ce)
@@ -751,7 +820,7 @@ static int __guc_action_deregister_context(struct intel_guc *guc, guc_id, };
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
 					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
 }
@@ -868,7 +937,9 @@ static int guc_context_pin(struct intel_context *ce, void *vaddr) static void guc_context_unpin(struct intel_context *ce) {
-	unpin_guc_id(ce_to_guc(ce), ce);
+	struct intel_guc *guc = ce_to_guc(ce);
+
+	unpin_guc_id(guc, ce);
Should this be part of this patch?
Not likely. Let me see what is going on here.
lrc_unpin(ce); } @@ -891,7 +962,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc, intel_context_get(ce);
-	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
 				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
 }
@@ -1433,6 +1504,12 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
 	return ce;
 }
 
+static void decr_outstanding_submission_g2h(struct intel_guc *guc)
+{
+	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
+		wake_up_all(&guc->ct.wq);
+}
+
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg,
 					  u32 len)
@@ -1468,6 +1545,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc, lrc_destroy(&ce->ref); }
+	decr_outstanding_submission_g2h(guc);
+
 	return 0;
 }
@@ -1516,6 +1595,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, spin_unlock_irqrestore(&ce->guc_state.lock, flags); }
+	decr_outstanding_submission_g2h(guc);
 	intel_context_put(ce);
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index 9c954c589edf..c4cef885e984 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -81,6 +81,11 @@ uc_state_checkers(guc, guc_submission);
 #undef uc_state_checkers
 #undef __uc_state_checker
 
+static inline int intel_uc_wait_for_idle(struct intel_uc *uc, long timeout)
+{
+	return intel_guc_wait_for_idle(&uc->guc, timeout);
+}
+
 #define intel_uc_ops_function(_NAME, _OPS, _TYPE, _RET) \
 static inline _TYPE intel_uc_##_NAME(struct intel_uc *uc) \
 { \
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index cc745751ac53..277800987bf8 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -36,6 +36,7 @@
 #include "gt/intel_gt_clock_utils.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_pm.h"
+#include "gt/intel_gt.h"
All of these extra includes seem incorrect. There is no code change in any of the files below that would warrant a new include.
Well this is surely wrong as it is included two lines above. Will fix.
Matt
John.
 #include "gt/intel_gt_requests.h"
 #include "gt/intel_reset.h"
 #include "gt/intel_rc6.h"
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 4d2d59a9942b..2b73ddb11c66 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -27,6 +27,7 @@
  */
 
 #include "gem/i915_gem_context.h"
+#include "gt/intel_gt.h"
 #include "gt/intel_gt_requests.h"
 
 #include "i915_drv.h"
diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
index c130010a7033..1c721542e277 100644
--- a/drivers/gpu/drm/i915/selftests/igt_live_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
@@ -5,7 +5,7 @@
  */
 
 #include "i915_drv.h"
-#include "gt/intel_gt_requests.h"
+#include "gt/intel_gt.h"
 
 #include "../i915_selftest.h"
 #include "igt_flush_test.h"
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index d189c4bd4bef..4f8180146888 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -52,7 +52,8 @@ void mock_device_flush(struct drm_i915_private *i915)
 	do {
 		for_each_engine(engine, gt, id)
 			mock_engine_flush(engine);
-	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT));
+	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT,
+						  NULL));
 }
 
 static void mock_device_release(struct drm_device *dev)
On Sat, Jul 10, 2021 at 03:55:02AM +0000, Matthew Brost wrote:
On Fri, Jul 09, 2021 at 05:16:34PM -0700, John Harrison wrote:
On 6/24/2021 00:04, Matthew Brost wrote:
When running the GuC the GPU can't be considered idle if the GuC still has contexts pinned. As such, a call has been added in intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for the number of unpinned contexts to go to zero.
v2: rtimeout -> remaining_timeout
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
 drivers/gpu/drm/i915/gt/intel_gt.c            | 19 ++++
 drivers/gpu/drm/i915/gt/intel_gt.h            |  2 +
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 22 ++---
 drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  9 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 88 ++++++++++++++++++-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |  5 ++
 drivers/gpu/drm/i915/i915_debugfs.c           |  1 +
 drivers/gpu/drm/i915/i915_gem_evict.c         |  1 +
 .../gpu/drm/i915/selftests/igt_live_test.c    |  2 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
 14 files changed, 137 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index 2fd155742bd2..335b955d5b4b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -644,7 +644,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj, goto insert; /* Attempt to reap some mmap space from dead objects */
-	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
+	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
+					       NULL);
 	if (err)
 		goto err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index e714e21c0a4d..acfdd53b2678 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -585,6 +585,25 @@ static void __intel_gt_disable(struct intel_gt *gt)
 	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
 }
 
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
+{
+	long remaining_timeout;
+
+	/* If the device is asleep, we have no requests outstanding */
+	if (!intel_gt_pm_is_awake(gt))
+		return 0;
+
+	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
+							   &remaining_timeout)) > 0) {
+		cond_resched();
+		if (signal_pending(current))
+			return -EINTR;
+	}
+
+	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
+							  remaining_timeout);
+}
+
 int intel_gt_init(struct intel_gt *gt)
 {
 	int err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h index e7aabe0cc5bf..74e771871a9b 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.h +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
 void intel_gt_driver_late_release(struct intel_gt *gt);
 
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
+
 void intel_gt_check_and_clear_faults(struct intel_gt *gt);
 void intel_gt_clear_error_registers(struct intel_gt *gt,
 				   intel_engine_mask_t engine_mask);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c index 647eca9d867a..39f5e824dac5 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -13,6 +13,7 @@
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
 #include "intel_timeline.h"
+#include "uc/intel_uc.h"
Why is this needed?
It is not, likely holdover from internal churn.
 static bool retire_requests(struct intel_timeline *tl)
 {
@@ -130,7 +131,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine)
 	GEM_BUG_ON(engine->retire);
 }
 
-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+				      long *remaining_timeout)
 {
 	struct intel_gt_timelines *timelines = &gt->timelines;
 	struct intel_timeline *tl, *tn;
@@ -195,22 +197,10 @@ out_active:	spin_lock(&timelines->lock);
 	if (flush_submission(gt, timeout)) /* Wait, there's more! */
 		active_count++;
 
-	return active_count ? timeout : 0;
-}
-
-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
-{
-	/* If the device is asleep, we have no requests outstanding */
-	if (!intel_gt_pm_is_awake(gt))
-		return 0;
-
-	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
-		cond_resched();
-		if (signal_pending(current))
-			return -EINTR;
-	}
+	if (remaining_timeout)
+		*remaining_timeout = timeout;
 
-	return timeout;
+	return active_count ? timeout : 0;
 }
 
 static void retire_work_handler(struct work_struct *work)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h index fcc30a6e4fe9..51dbe0e3294e 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h @@ -6,14 +6,17 @@ #ifndef INTEL_GT_REQUESTS_H #define INTEL_GT_REQUESTS_H +#include <stddef.h>
Why is this needed?
I swear I needed stddef.h for NULL on a different machine of mine. It seems to be quite happy without it on my current machine. Can remove.
And it in fact does need to be included. See CI results below: https://patchwork.freedesktop.org/series/91840/#rev3
Matt
 struct intel_engine_cs;
 struct intel_gt;
 struct intel_timeline;
 
-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+				      long *remaining_timeout);
 
 static inline void intel_gt_retire_requests(struct intel_gt *gt)
 {
-	intel_gt_retire_requests_timeout(gt, 0);
+	intel_gt_retire_requests_timeout(gt, 0, NULL);
 }
 
 void intel_engine_init_retire(struct intel_engine_cs *engine);
@@ -21,8 +24,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine,
 			     struct intel_timeline *tl);
 void intel_engine_fini_retire(struct intel_engine_cs *engine);
 
-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
-
 void intel_gt_init_requests(struct intel_gt *gt);
 void intel_gt_park_requests(struct intel_gt *gt);
 void intel_gt_unpark_requests(struct intel_gt *gt);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 24e7a924134e..22eb1e9cca41 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -38,6 +38,8 @@ struct intel_guc {
 	spinlock_t irq_lock;
 	unsigned int msg_enabled_mask;
 
+	atomic_t outstanding_submission_g2h;
+
 	struct {
 		void (*reset)(struct intel_guc *guc);
 		void (*enable)(struct intel_guc *guc);
@@ -238,6 +240,8 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
 	spin_unlock_irq(&guc->irq_lock);
 }
 
+int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
+
 int intel_guc_reset_engine(struct intel_guc *guc,
 			   struct intel_engine_cs *engine);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index a60970e85635..e0f92e28350c 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -109,6 +109,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
 	INIT_LIST_HEAD(&ct->requests.incoming);
 	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
 	tasklet_setup(&ct->receive_tasklet, ct_receive_tasklet_func);
+	init_waitqueue_head(&ct->wq);
 }
 
 static inline const char *guc_ct_buffer_type_to_str(u32 type)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index 660bf37238e2..ab1b79ab960b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -10,6 +10,7 @@
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
 #include <linux/ktime.h>
+#include <linux/wait.h>
 
 #include "intel_guc_fwif.h"
 
@@ -68,6 +69,9 @@ struct intel_guc_ct {
 	struct tasklet_struct receive_tasklet;
 
+	/** @wq: wait queue for g2h channel */
+	wait_queue_head_t wq;
+
 	struct {
 		u16 last_fence; /* last fence used to send request */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index ef24758c4266..d1a28283a9ae 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -254,6 +254,74 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
 	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
 }
 
+static int guc_submission_busy_loop(struct intel_guc* guc,
I think this name is misleading. It would be better as guc_submission_send_busy_loop.
Yep, better name. Will fix.
+				    const u32 *action,
+				    u32 len,
+				    u32 g2h_len_dw,
+				    bool loop)
+{
+	int err;
+
+	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+
+	if (!err && g2h_len_dw)
+		atomic_inc(&guc->outstanding_submission_g2h);
+
+	return err;
+}
+
+static int guc_wait_for_pending_msg(struct intel_guc *guc,
+				    atomic_t *wait_var,
+				    bool interruptible,
+				    long timeout)
+{
+	const int state = interruptible ?
+		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
+	DEFINE_WAIT(wait);
+
+	might_sleep();
+	GEM_BUG_ON(timeout < 0);
+
+	if (!atomic_read(wait_var))
+		return 0;
+
+	if (!timeout)
+		return -ETIME;
+
+	for (;;) {
+		prepare_to_wait(&guc->ct.wq, &wait, state);
+
+		if (!atomic_read(wait_var))
+			break;
+
+		if (signal_pending_state(state, current)) {
+			timeout = -ERESTARTSYS;
+			break;
+		}
+
+		if (!timeout) {
+			timeout = -ETIME;
+			break;
+		}
+
+		timeout = io_schedule_timeout(timeout);
+	}
+	finish_wait(&guc->ct.wq, &wait);
+
+	return (timeout < 0) ? timeout : 0;
+}
+
+int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
+{
+	bool interruptible = true;
+
+	if (unlikely(timeout < 0))
+		timeout = -timeout, interruptible = false;
Why is this a comma bridged statement rather than just two lines with braces on the if?
And overloading negative timeouts to mean non-interruptible seems unnecessarily convoluted in the first place. Why not just have an interruptible parameter? I'm also not seeing how the timeout gets to be negative in the first place?
Copy paste from some other code, can remove the comma and replace with 2 lines.
This is how intel_gt_wait_for_idle works, which in turn calls this. Not saying the negative parameter meaning something special is right, just how it is currently done. Now that you mention it, with the remaining_timeout I may have broken this too. How about I just add a parameter rather than this convoluted scheme, as you suggest.
+	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
+					interruptible, timeout);
+}
+
 static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	int err;
@@ -280,6 +348,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 
 	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (!enabled && !err) {
+		atomic_inc(&guc->outstanding_submission_g2h);
 		set_context_enabled(ce);
 	} else if (!enabled) {
 		clr_context_pending_enable(ce);
@@ -731,7 +800,7 @@ static int __guc_action_register_context(struct intel_guc *guc,
 		offset,
 	};
 
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
 }
 
 static int register_context(struct intel_context *ce)
@@ -751,7 +820,7 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
 		guc_id,
 	};
 
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
 					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
 }
@@ -868,7 +937,9 @@ static int guc_context_pin(struct intel_context *ce, void *vaddr)
 
 static void guc_context_unpin(struct intel_context *ce)
 {
-	unpin_guc_id(ce_to_guc(ce), ce);
+	struct intel_guc *guc = ce_to_guc(ce);
+
+	unpin_guc_id(guc, ce);
Should this be part of this patch?
Not likely. Let me see what is going on here.
 	lrc_unpin(ce);
 }
 
@@ -891,7 +962,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 
 	intel_context_get(ce);
 
-	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
 				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
 }
@@ -1433,6 +1504,12 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
 	return ce;
 }
 
+static void decr_outstanding_submission_g2h(struct intel_guc *guc)
+{
+	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
+		wake_up_all(&guc->ct.wq);
+}
+
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg,
 					  u32 len)
@@ -1468,6 +1545,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		lrc_destroy(&ce->ref);
 	}
 
+	decr_outstanding_submission_g2h(guc);
+
 	return 0;
 }
@@ -1516,6 +1595,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 	}
 
+	decr_outstanding_submission_g2h(guc);
 	intel_context_put(ce);
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h index 9c954c589edf..c4cef885e984 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -81,6 +81,11 @@ uc_state_checkers(guc, guc_submission);
 #undef uc_state_checkers
 #undef __uc_state_checker
 
+static inline int intel_uc_wait_for_idle(struct intel_uc *uc, long timeout)
+{
+	return intel_guc_wait_for_idle(&uc->guc, timeout);
+}
+
 #define intel_uc_ops_function(_NAME, _OPS, _TYPE, _RET) \
 static inline _TYPE intel_uc_##_NAME(struct intel_uc *uc) \
 { \
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index cc745751ac53..277800987bf8 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -36,6 +36,7 @@ #include "gt/intel_gt_clock_utils.h" #include "gt/intel_gt.h" #include "gt/intel_gt_pm.h" +#include "gt/intel_gt.h"
All of these extra includes seem incorrect. There is no code change in any of the files below that would warrant a new include.
Well this is surely wrong as it is included two lines above. Will fix.
Matt
John.
#include "gt/intel_gt_requests.h" #include "gt/intel_reset.h" #include "gt/intel_rc6.h" diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c index 4d2d59a9942b..2b73ddb11c66 100644 --- a/drivers/gpu/drm/i915/i915_gem_evict.c +++ b/drivers/gpu/drm/i915/i915_gem_evict.c @@ -27,6 +27,7 @@ */ #include "gem/i915_gem_context.h" +#include "gt/intel_gt.h" #include "gt/intel_gt_requests.h" #include "i915_drv.h" diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c index c130010a7033..1c721542e277 100644 --- a/drivers/gpu/drm/i915/selftests/igt_live_test.c +++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c @@ -5,7 +5,7 @@ */ #include "i915_drv.h" -#include "gt/intel_gt_requests.h" +#include "gt/intel_gt.h" #include "../i915_selftest.h" #include "igt_flush_test.h" diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c index d189c4bd4bef..4f8180146888 100644 --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c @@ -52,7 +52,8 @@ void mock_device_flush(struct drm_i915_private *i915) do { for_each_engine(engine, gt, id) mock_engine_flush(engine);
-	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT));
+	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT,
+						  NULL));
 }
 
 static void mock_device_release(struct drm_device *dev)
Update GuC debugfs to support the new GuC structures.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 22 ++++++++ drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 3 ++ .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c | 23 +++++++- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 52 +++++++++++++++++++ .../gpu/drm/i915/gt/uc/intel_guc_submission.h | 4 ++ drivers/gpu/drm/i915/i915_debugfs.c | 1 + 6 files changed, 104 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index e0f92e28350c..4ed074df88e5 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -1135,3 +1135,25 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
ct_try_receive_message(ct); } + +void intel_guc_log_ct_info(struct intel_guc_ct *ct, + struct drm_printer *p) +{ + if (!ct->enabled) { + drm_puts(p, "CT disabled\n"); + return; + } + + drm_printf(p, "H2G Space: %u\n", + atomic_read(&ct->ctbs.send.space) * 4); + drm_printf(p, "Head: %u\n", + ct->ctbs.send.desc->head); + drm_printf(p, "Tail: %u\n", + ct->ctbs.send.desc->tail); + drm_printf(p, "G2H Space: %u\n", + atomic_read(&ct->ctbs.recv.space) * 4); + drm_printf(p, "Head: %u\n", + ct->ctbs.recv.desc->head); + drm_printf(p, "Tail: %u\n", + ct->ctbs.recv.desc->tail); +} diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index ab1b79ab960b..f62eb06b32fc 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -16,6 +16,7 @@
struct i915_vma; struct intel_guc; +struct drm_printer;
/** * DOC: Command Transport (CT). @@ -106,4 +107,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, u32 *response_buf, u32 response_buf_size, u32 flags); void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
+void intel_guc_log_ct_info(struct intel_guc_ct *ct, struct drm_printer *p); + #endif /* _INTEL_GUC_CT_H_ */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c index fe7cb7b29a1e..62b9ce0fafaa 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c @@ -9,6 +9,8 @@ #include "intel_guc.h" #include "intel_guc_debugfs.h" #include "intel_guc_log_debugfs.h" +#include "gt/uc/intel_guc_ct.h" +#include "gt/uc/intel_guc_submission.h"
static int guc_info_show(struct seq_file *m, void *data) { @@ -22,16 +24,35 @@ static int guc_info_show(struct seq_file *m, void *data) drm_puts(&p, "\n"); intel_guc_log_info(&guc->log, &p);
- /* Add more as required ... */ + if (!intel_guc_submission_is_used(guc)) + return 0; + + intel_guc_log_ct_info(&guc->ct, &p); + intel_guc_log_submission_info(guc, &p);
return 0; } DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_info);
+static int guc_registered_contexts_show(struct seq_file *m, void *data) +{ + struct intel_guc *guc = m->private; + struct drm_printer p = drm_seq_file_printer(m); + + if (!intel_guc_submission_is_used(guc)) + return -ENODEV; + + intel_guc_log_context_info(guc, &p); + + return 0; +} +DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_registered_contexts); + void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root) { static const struct debugfs_gt_file files[] = { { "guc_info", &guc_info_fops, NULL }, + { "guc_registered_contexts", &guc_registered_contexts_fops, NULL }, };
if (!intel_guc_is_supported(guc)) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index d1a28283a9ae..89b3c7e5d15b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1600,3 +1600,55 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
return 0; } + +void intel_guc_log_submission_info(struct intel_guc *guc, + struct drm_printer *p) +{ + struct i915_sched_engine *sched_engine = guc->sched_engine; + struct rb_node *rb; + unsigned long flags; + + drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n", + atomic_read(&guc->outstanding_submission_g2h)); + drm_printf(p, "GuC tasklet count: %u\n\n", + atomic_read(&sched_engine->tasklet.count)); + + spin_lock_irqsave(&sched_engine->lock, flags); + drm_printf(p, "Requests in GuC submit tasklet:\n"); + for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) { + struct i915_priolist *pl = to_priolist(rb); + struct i915_request *rq; + + priolist_for_each_request(rq, pl) + drm_printf(p, "guc_id=%u, seqno=%llu\n", + rq->context->guc_id, + rq->fence.seqno); + } + spin_unlock_irqrestore(&sched_engine->lock, flags); + drm_printf(p, "\n"); +} + +void intel_guc_log_context_info(struct intel_guc *guc, + struct drm_printer *p) +{ + struct intel_context *ce; + unsigned long index; + + xa_for_each(&guc->context_lookup, index, ce) { + drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id); + drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca); + drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n", + ce->ring->head, + ce->lrc_reg_state[CTX_RING_HEAD]); + drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n", + ce->ring->tail, + ce->lrc_reg_state[CTX_RING_TAIL]); + drm_printf(p, "\t\tContext Pin Count: %u\n", + atomic_read(&ce->pin_count)); + drm_printf(p, "\t\tGuC ID Ref Count: %u\n", + atomic_read(&ce->guc_id_ref)); + drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n", + ce->guc_state.sched_state, + atomic_read(&ce->guc_sched_state_no_lock)); + } +} diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h index 3f7005018939..6453e2bfa151 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h @@ -10,6 +10,7 @@
#include "intel_guc.h"
+struct drm_printer; struct intel_engine_cs;
void intel_guc_submission_init_early(struct intel_guc *guc); @@ -20,6 +21,9 @@ void intel_guc_submission_fini(struct intel_guc *guc); int intel_guc_preempt_work_create(struct intel_guc *guc); void intel_guc_preempt_work_destroy(struct intel_guc *guc); int intel_guc_submission_setup(struct intel_engine_cs *engine); +void intel_guc_log_submission_info(struct intel_guc *guc, + struct drm_printer *p); +void intel_guc_log_context_info(struct intel_guc *guc, struct drm_printer *p);
static inline bool intel_guc_submission_is_supported(struct intel_guc *guc) { diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 277800987bf8..a9084789deff 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -50,6 +50,7 @@ #include "i915_trace.h" #include "intel_pm.h" #include "intel_sideband.h" +#include "gt/intel_lrc_reg.h"
static inline struct drm_i915_private *node_to_i915(struct drm_info_node *node) {
On 6/24/2021 00:04, Matthew Brost wrote:
- drm_printf(p, "GuC tasklet count: %u\n\n",
atomic_read(&sched_engine->tasklet.count));
Does sched_engine need a null check?
+#include "gt/intel_lrc_reg.h"
Obsolete include again?
John.
On Mon, Jul 12, 2021 at 11:05:59AM -0700, John Harrison wrote:
On 6/24/2021 00:04, Matthew Brost wrote:
- drm_printf(p, "GuC tasklet count: %u\n\n",
atomic_read(&sched_engine->tasklet.count));
Does sched_engine need a null check?
Yes it does. Have this fixed locally already.
+#include "gt/intel_lrc_reg.h"
Obsolete include again?
CTX_RING_TAIL /CTX_RING_HEAD are in gt/intel_lrc_reg.h
Matt
John.
static inline struct drm_i915_private *node_to_i915(struct drm_info_node *node) {
On 7/12/2021 13:59, Matthew Brost wrote:
On Mon, Jul 12, 2021 at 11:05:59AM -0700, John Harrison wrote:
On 6/24/2021 00:04, Matthew Brost wrote:
Update GuC debugfs to support the new GuC structures.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 22 ++++++++ drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 3 ++ .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c | 23 +++++++- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 52 +++++++++++++++++++ .../gpu/drm/i915/gt/uc/intel_guc_submission.h | 4 ++ drivers/gpu/drm/i915/i915_debugfs.c | 1 + 6 files changed, 104 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index e0f92e28350c..4ed074df88e5 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -1135,3 +1135,25 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct) ct_try_receive_message(ct); }
+void intel_guc_log_ct_info(struct intel_guc_ct *ct,
struct drm_printer *p)
+{
- if (!ct->enabled) {
drm_puts(p, "CT disabled\n");
return;
- }
- drm_printf(p, "H2G Space: %u\n",
atomic_read(&ct->ctbs.send.space) * 4);
- drm_printf(p, "Head: %u\n",
ct->ctbs.send.desc->head);
- drm_printf(p, "Tail: %u\n",
ct->ctbs.send.desc->tail);
- drm_printf(p, "G2H Space: %u\n",
atomic_read(&ct->ctbs.recv.space) * 4);
- drm_printf(p, "Head: %u\n",
ct->ctbs.recv.desc->head);
- drm_printf(p, "Tail: %u\n",
ct->ctbs.recv.desc->tail);
+} diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index ab1b79ab960b..f62eb06b32fc 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -16,6 +16,7 @@ struct i915_vma; struct intel_guc; +struct drm_printer; /** * DOC: Command Transport (CT). @@ -106,4 +107,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, u32 *response_buf, u32 response_buf_size, u32 flags); void intel_guc_ct_event_handler(struct intel_guc_ct *ct); +void intel_guc_log_ct_info(struct intel_guc_ct *ct, struct drm_printer *p);
- #endif /* _INTEL_GUC_CT_H_ */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c index fe7cb7b29a1e..62b9ce0fafaa 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c @@ -9,6 +9,8 @@ #include "intel_guc.h" #include "intel_guc_debugfs.h" #include "intel_guc_log_debugfs.h" +#include "gt/uc/intel_guc_ct.h" +#include "gt/uc/intel_guc_submission.h" static int guc_info_show(struct seq_file *m, void *data) { @@ -22,16 +24,35 @@ static int guc_info_show(struct seq_file *m, void *data) drm_puts(&p, "\n"); intel_guc_log_info(&guc->log, &p);
- /* Add more as required ... */
- if (!intel_guc_submission_is_used(guc))
return 0;
- intel_guc_log_ct_info(&guc->ct, &p);
- intel_guc_log_submission_info(guc, &p); return 0; } DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_info);
+static int guc_registered_contexts_show(struct seq_file *m, void *data) +{
- struct intel_guc *guc = m->private;
- struct drm_printer p = drm_seq_file_printer(m);
- if (!intel_guc_submission_is_used(guc))
return -ENODEV;
- intel_guc_log_context_info(guc, &p);
- return 0;
+} +DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_registered_contexts);
void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root) { static const struct debugfs_gt_file files[] = { { "guc_info", &guc_info_fops, NULL },
{ "guc_registered_contexts", &guc_registered_contexts_fops, NULL },
};
if (!intel_guc_is_supported(guc))
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index d1a28283a9ae..89b3c7e5d15b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1600,3 +1600,55 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, return 0; }
+void intel_guc_log_submission_info(struct intel_guc *guc,
struct drm_printer *p)
+{
- struct i915_sched_engine *sched_engine = guc->sched_engine;
- struct rb_node *rb;
- unsigned long flags;
- drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n",
atomic_read(&guc->outstanding_submission_g2h));
- drm_printf(p, "GuC tasklet count: %u\n\n",
atomic_read(&sched_engine->tasklet.count));
Does sched_engine need a null check?
Yes it does. Have this fixed locally already.
- spin_lock_irqsave(&sched_engine->lock, flags);
- drm_printf(p, "Requests in GuC submit tasklet:\n");
- for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) {
struct i915_priolist *pl = to_priolist(rb);
struct i915_request *rq;
priolist_for_each_request(rq, pl)
drm_printf(p, "guc_id=%u, seqno=%llu\n",
rq->context->guc_id,
rq->fence.seqno);
- }
- spin_unlock_irqrestore(&sched_engine->lock, flags);
- drm_printf(p, "\n");
+}
+void intel_guc_log_context_info(struct intel_guc *guc,
struct drm_printer *p)
+{
- struct intel_context *ce;
- unsigned long index;
- xa_for_each(&guc->context_lookup, index, ce) {
drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
ce->ring->head,
ce->lrc_reg_state[CTX_RING_HEAD]);
drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
ce->ring->tail,
ce->lrc_reg_state[CTX_RING_TAIL]);
drm_printf(p, "\t\tContext Pin Count: %u\n",
atomic_read(&ce->pin_count));
drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
atomic_read(&ce->guc_id_ref));
drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
ce->guc_state.sched_state,
atomic_read(&ce->guc_sched_state_no_lock));
- }
+} diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h index 3f7005018939..6453e2bfa151 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h @@ -10,6 +10,7 @@ #include "intel_guc.h" +struct drm_printer; struct intel_engine_cs; void intel_guc_submission_init_early(struct intel_guc *guc); @@ -20,6 +21,9 @@ void intel_guc_submission_fini(struct intel_guc *guc); int intel_guc_preempt_work_create(struct intel_guc *guc); void intel_guc_preempt_work_destroy(struct intel_guc *guc); int intel_guc_submission_setup(struct intel_engine_cs *engine); +void intel_guc_log_submission_info(struct intel_guc *guc,
struct drm_printer *p);
+void intel_guc_log_context_info(struct intel_guc *guc, struct drm_printer *p); static inline bool intel_guc_submission_is_supported(struct intel_guc *guc) { diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 277800987bf8..a9084789deff 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -50,6 +50,7 @@ #include "i915_trace.h" #include "intel_pm.h" #include "intel_sideband.h" +#include "gt/intel_lrc_reg.h"
Obsolete include again?
CTX_RING_TAIL /CTX_RING_HEAD are in gt/intel_lrc_reg.h
Matt
But those are not being added to i915_debugfs.c, only to intel_guc_submission.c. So why is the include being added here?
John.
John.
static inline struct drm_i915_private *node_to_i915(struct drm_info_node *node) {
On 24.06.2021 09:04, Matthew Brost wrote:
Update GuC debugfs to support the new GuC structures.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 22 ++++++++ drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 3 ++ .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c | 23 +++++++- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 52 +++++++++++++++++++ .../gpu/drm/i915/gt/uc/intel_guc_submission.h | 4 ++ drivers/gpu/drm/i915/i915_debugfs.c | 1 + 6 files changed, 104 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index e0f92e28350c..4ed074df88e5 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -1135,3 +1135,25 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
ct_try_receive_message(ct); }
+void intel_guc_log_ct_info(struct intel_guc_ct *ct,
this is not "guc log" function, it is "guc ct" one, so:
void intel_guc_ct_print_info(struct intel_guc_ct *ct,
struct drm_printer *p)
+{
- if (!ct->enabled) {
drm_puts(p, "CT disabled\n");
nit: maybe
drm_printf(p, "CT %s\n", enableddisabled(false));
return;
- }
- drm_printf(p, "H2G Space: %u\n",
atomic_read(&ct->ctbs.send.space) * 4);
don't you want to print size ? or GGTT offset ?
- drm_printf(p, "Head: %u\n",
ct->ctbs.send.desc->head);
- drm_printf(p, "Tail: %u\n",
ct->ctbs.send.desc->tail);
- drm_printf(p, "G2H Space: %u\n",
atomic_read(&ct->ctbs.recv.space) * 4);
- drm_printf(p, "Head: %u\n",
ct->ctbs.recv.desc->head);
- drm_printf(p, "Tail: %u\n",
ct->ctbs.recv.desc->tail);
hmm, what about adding helper:
static void dump_ctb(struct intel_guc_ct_buffer *ctb, *p) { drm_printf(p, "Size: %u\n", ctb->size); drm_printf(p, "Space: %u\n", atomic_read(&ctb->space) * 4); drm_printf(p, "Head: %u\n", ctb->desc->head); drm_printf(p, "Tail: %u\n", ctb->desc->tail); }
and then:
drm_printf(p, "H2G:\n"); dump_ctb(&ct->ctbs.send, p); drm_printf(p, "G2H:\n"); dump_ctb(&ct->ctbs.recv, p);
or
dump_ctb(&ct->ctbs.send, "H2G", p); dump_ctb(&ct->ctbs.recv, "G2H", p);
+} diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index ab1b79ab960b..f62eb06b32fc 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -16,6 +16,7 @@
struct i915_vma; struct intel_guc; +struct drm_printer;
/**
- DOC: Command Transport (CT).
@@ -106,4 +107,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, u32 *response_buf, u32 response_buf_size, u32 flags); void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
+void intel_guc_log_ct_info(struct intel_guc_ct *ct, struct drm_printer *p);
#endif /* _INTEL_GUC_CT_H_ */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c index fe7cb7b29a1e..62b9ce0fafaa 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c @@ -9,6 +9,8 @@ #include "intel_guc.h" #include "intel_guc_debugfs.h" #include "intel_guc_log_debugfs.h" +#include "gt/uc/intel_guc_ct.h" +#include "gt/uc/intel_guc_submission.h"
static int guc_info_show(struct seq_file *m, void *data) { @@ -22,16 +24,35 @@ static int guc_info_show(struct seq_file *m, void *data) drm_puts(&p, "\n"); intel_guc_log_info(&guc->log, &p);
- /* Add more as required ... */
if (!intel_guc_submission_is_used(guc))
return 0;
intel_guc_log_ct_info(&guc->ct, &p);
intel_guc_log_submission_info(guc, &p);
return 0;
} DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_info);
+static int guc_registered_contexts_show(struct seq_file *m, void *data) +{
- struct intel_guc *guc = m->private;
- struct drm_printer p = drm_seq_file_printer(m);
- if (!intel_guc_submission_is_used(guc))
return -ENODEV;
- intel_guc_log_context_info(guc, &p);
- return 0;
+} +DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_registered_contexts);
void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root) { static const struct debugfs_gt_file files[] = { { "guc_info", &guc_info_fops, NULL },
{ "guc_registered_contexts", &guc_registered_contexts_fops, NULL },
};
if (!intel_guc_is_supported(guc))
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index d1a28283a9ae..89b3c7e5d15b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1600,3 +1600,55 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
return 0; }
+void intel_guc_log_submission_info(struct intel_guc *guc,
use correct prefix:
void intel_guc_submission_print_info(struct intel_guc *guc,
struct drm_printer *p)
+{
- struct i915_sched_engine *sched_engine = guc->sched_engine;
- struct rb_node *rb;
- unsigned long flags;
- drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n",
atomic_read(&guc->outstanding_submission_g2h));
- drm_printf(p, "GuC tasklet count: %u\n\n",
atomic_read(&sched_engine->tasklet.count));
- spin_lock_irqsave(&sched_engine->lock, flags);
- drm_printf(p, "Requests in GuC submit tasklet:\n");
- for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) {
struct i915_priolist *pl = to_priolist(rb);
struct i915_request *rq;
priolist_for_each_request(rq, pl)
drm_printf(p, "guc_id=%u, seqno=%llu\n",
rq->context->guc_id,
rq->fence.seqno);
- }
- spin_unlock_irqrestore(&sched_engine->lock, flags);
- drm_printf(p, "\n");
+}
+void intel_guc_log_context_info(struct intel_guc *guc,
use correct prefix:
void intel_guc_submission_print_context_info(struct intel_guc *guc,
Michal
struct drm_printer *p)
+{
- struct intel_context *ce;
- unsigned long index;
- xa_for_each(&guc->context_lookup, index, ce) {
drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
ce->ring->head,
ce->lrc_reg_state[CTX_RING_HEAD]);
drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
ce->ring->tail,
ce->lrc_reg_state[CTX_RING_TAIL]);
drm_printf(p, "\t\tContext Pin Count: %u\n",
atomic_read(&ce->pin_count));
drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
atomic_read(&ce->guc_id_ref));
drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
ce->guc_state.sched_state,
atomic_read(&ce->guc_sched_state_no_lock));
- }
+} diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h index 3f7005018939..6453e2bfa151 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h @@ -10,6 +10,7 @@
#include "intel_guc.h"
+struct drm_printer; struct intel_engine_cs;
void intel_guc_submission_init_early(struct intel_guc *guc); @@ -20,6 +21,9 @@ void intel_guc_submission_fini(struct intel_guc *guc); int intel_guc_preempt_work_create(struct intel_guc *guc); void intel_guc_preempt_work_destroy(struct intel_guc *guc); int intel_guc_submission_setup(struct intel_engine_cs *engine); +void intel_guc_log_submission_info(struct intel_guc *guc,
struct drm_printer *p);
+void intel_guc_log_context_info(struct intel_guc *guc, struct drm_printer *p);
static inline bool intel_guc_submission_is_supported(struct intel_guc *guc) { diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 277800987bf8..a9084789deff 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -50,6 +50,7 @@ #include "i915_trace.h" #include "intel_pm.h" #include "intel_sideband.h" +#include "gt/intel_lrc_reg.h"
static inline struct drm_i915_private *node_to_i915(struct drm_info_node *node) {
On Tue, Jul 13, 2021 at 10:51:35AM +0200, Michal Wajdeczko wrote:
On 24.06.2021 09:04, Matthew Brost wrote:
Update GuC debugfs to support the new GuC structures.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 22 ++++++++ drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h | 3 ++ .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c | 23 +++++++- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 52 +++++++++++++++++++ .../gpu/drm/i915/gt/uc/intel_guc_submission.h | 4 ++ drivers/gpu/drm/i915/i915_debugfs.c | 1 + 6 files changed, 104 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index e0f92e28350c..4ed074df88e5 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -1135,3 +1135,25 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
ct_try_receive_message(ct); }
+void intel_guc_log_ct_info(struct intel_guc_ct *ct,
this is not "guc log" function, it is "guc ct" one, so:
void intel_guc_ct_print_info(struct intel_guc_ct *ct,
Sure.
struct drm_printer *p)
+{
- if (!ct->enabled) {
drm_puts(p, "CT disabled\n");
nit: maybe
drm_printf(p, "CT %s\n", enableddisabled(false));
Sure.
return;
- }
- drm_printf(p, "H2G Space: %u\n",
atomic_read(&ct->ctbs.send.space) * 4);
don't you want to print size ? or GGTT offset ?
I don't think so.
- drm_printf(p, "Head: %u\n",
ct->ctbs.send.desc->head);
- drm_printf(p, "Tail: %u\n",
ct->ctbs.send.desc->tail);
- drm_printf(p, "G2H Space: %u\n",
atomic_read(&ct->ctbs.recv.space) * 4);
- drm_printf(p, "Head: %u\n",
ct->ctbs.recv.desc->head);
- drm_printf(p, "Tail: %u\n",
ct->ctbs.recv.desc->tail);
hmm, what about adding helper:
static void dump_ctb(struct intel_guc_ct_buffer *ctb, *p) { drm_printf(p, "Size: %u\n", ctb->size); drm_printf(p, "Space: %u\n", atomic_read(&ctb->space) * 4); drm_printf(p, "Head: %u\n", ctb->desc->head); drm_printf(p, "Tail: %u\n", ctb->desc->tail); }
and then:
drm_printf(p, "H2G:\n"); dump_ctb(&ct->ctbs.send, p); drm_printf(p, "G2H:\n"); dump_ctb(&ct->ctbs.recv, p);
or
dump_ctb(&ct->ctbs.send, "H2G", p); dump_ctb(&ct->ctbs.recv, "G2H", p);
Seems unnecessary.
+} diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h index ab1b79ab960b..f62eb06b32fc 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h @@ -16,6 +16,7 @@
struct i915_vma; struct intel_guc; +struct drm_printer;
/**
- DOC: Command Transport (CT).
@@ -106,4 +107,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, u32 *response_buf, u32 response_buf_size, u32 flags); void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
+void intel_guc_log_ct_info(struct intel_guc_ct *ct, struct drm_printer *p);
#endif /* _INTEL_GUC_CT_H_ */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c index fe7cb7b29a1e..62b9ce0fafaa 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c @@ -9,6 +9,8 @@ #include "intel_guc.h" #include "intel_guc_debugfs.h" #include "intel_guc_log_debugfs.h" +#include "gt/uc/intel_guc_ct.h" +#include "gt/uc/intel_guc_submission.h"
static int guc_info_show(struct seq_file *m, void *data) { @@ -22,16 +24,35 @@ static int guc_info_show(struct seq_file *m, void *data) drm_puts(&p, "\n"); intel_guc_log_info(&guc->log, &p);
- /* Add more as required ... */
if (!intel_guc_submission_is_used(guc))
return 0;
intel_guc_log_ct_info(&guc->ct, &p);
intel_guc_log_submission_info(guc, &p);
return 0;
} DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_info);
+static int guc_registered_contexts_show(struct seq_file *m, void *data) +{
- struct intel_guc *guc = m->private;
- struct drm_printer p = drm_seq_file_printer(m);
- if (!intel_guc_submission_is_used(guc))
return -ENODEV;
- intel_guc_log_context_info(guc, &p);
- return 0;
+} +DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_registered_contexts);
void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root) { static const struct debugfs_gt_file files[] = { { "guc_info", &guc_info_fops, NULL },
{ "guc_registered_contexts", &guc_registered_contexts_fops, NULL },
};
if (!intel_guc_is_supported(guc))
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index d1a28283a9ae..89b3c7e5d15b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1600,3 +1600,55 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
return 0; }
+void intel_guc_log_submission_info(struct intel_guc *guc,
use correct prefix:
I think 'correct' is the wrong term here; 'the way I'd name it' seems more accurate.
void intel_guc_submission_print_info(struct intel_guc *guc,
But, yes will change.
struct drm_printer *p)
+{
- struct i915_sched_engine *sched_engine = guc->sched_engine;
- struct rb_node *rb;
- unsigned long flags;
- drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n",
atomic_read(&guc->outstanding_submission_g2h));
- drm_printf(p, "GuC tasklet count: %u\n\n",
atomic_read(&sched_engine->tasklet.count));
- spin_lock_irqsave(&sched_engine->lock, flags);
- drm_printf(p, "Requests in GuC submit tasklet:\n");
- for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) {
struct i915_priolist *pl = to_priolist(rb);
struct i915_request *rq;
priolist_for_each_request(rq, pl)
drm_printf(p, "guc_id=%u, seqno=%llu\n",
rq->context->guc_id,
rq->fence.seqno);
- }
- spin_unlock_irqrestore(&sched_engine->lock, flags);
- drm_printf(p, "\n");
+}
+void intel_guc_log_context_info(struct intel_guc *guc,
use correct prefix:
Same as above.
Matt
void intel_guc_submission_print_context_info(struct intel_guc *guc,
Michal
struct drm_printer *p)
+{
- struct intel_context *ce;
- unsigned long index;
- xa_for_each(&guc->context_lookup, index, ce) {
drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
ce->ring->head,
ce->lrc_reg_state[CTX_RING_HEAD]);
drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
ce->ring->tail,
ce->lrc_reg_state[CTX_RING_TAIL]);
drm_printf(p, "\t\tContext Pin Count: %u\n",
atomic_read(&ce->pin_count));
drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
atomic_read(&ce->guc_id_ref));
drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
ce->guc_state.sched_state,
atomic_read(&ce->guc_sched_state_no_lock));
- }
+} diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h index 3f7005018939..6453e2bfa151 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h @@ -10,6 +10,7 @@
#include "intel_guc.h"
+struct drm_printer; struct intel_engine_cs;
void intel_guc_submission_init_early(struct intel_guc *guc); @@ -20,6 +21,9 @@ void intel_guc_submission_fini(struct intel_guc *guc); int intel_guc_preempt_work_create(struct intel_guc *guc); void intel_guc_preempt_work_destroy(struct intel_guc *guc); int intel_guc_submission_setup(struct intel_engine_cs *engine); +void intel_guc_log_submission_info(struct intel_guc *guc,
struct drm_printer *p);
+void intel_guc_log_context_info(struct intel_guc *guc, struct drm_printer *p);
static inline bool intel_guc_submission_is_supported(struct intel_guc *guc) { diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 277800987bf8..a9084789deff 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -50,6 +50,7 @@ #include "i915_trace.h" #include "intel_pm.h" #include "intel_sideband.h" +#include "gt/intel_lrc_reg.h"
static inline struct drm_i915_private *node_to_i915(struct drm_info_node *node) {
Add trace points for request dependencies and GuC submit. Extended existing request trace points to include submit fence value,, guc_id, and ring tail value.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 ++ drivers/gpu/drm/i915/i915_request.c | 3 ++ drivers/gpu/drm/i915/i915_trace.h | 39 ++++++++++++++++++- 3 files changed, 43 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 89b3c7e5d15b..c2327eebc09c 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -422,6 +422,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc) guc->stalled_request = last; return false; } + trace_i915_request_guc_submit(last); }
guc->stalled_request = NULL; @@ -642,6 +643,8 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc, ret = guc_add_request(guc, rq); if (ret == -EBUSY) guc->stalled_request = rq; + else + trace_i915_request_guc_submit(rq);
return ret; } diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index d92c9f25c9f4..7f7aa096e873 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1344,6 +1344,9 @@ __i915_request_await_execution(struct i915_request *to, return err; }
+ trace_i915_request_dep_to(to); + trace_i915_request_dep_from(from); + /* Couple the dependency tree for PI on this exposed to->fence */ if (to->engine->sched_engine->schedule) { err = i915_sched_node_add_dependency(&to->sched, diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index 6778ad2a14a4..b02d04b6c8f6 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -794,22 +794,27 @@ DECLARE_EVENT_CLASS(i915_request, TP_STRUCT__entry( __field(u32, dev) __field(u64, ctx) + __field(u32, guc_id) __field(u16, class) __field(u16, instance) __field(u32, seqno) + __field(u32, tail) ),
TP_fast_assign( __entry->dev = rq->engine->i915->drm.primary->index; __entry->class = rq->engine->uabi_class; __entry->instance = rq->engine->uabi_instance; + __entry->guc_id = rq->context->guc_id; __entry->ctx = rq->fence.context; __entry->seqno = rq->fence.seqno; + __entry->tail = rq->tail; ),
- TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u", + TP_printk("dev=%u, engine=%u:%u, guc_id=%u, ctx=%llu, seqno=%u, tail=%u", __entry->dev, __entry->class, __entry->instance, - __entry->ctx, __entry->seqno) + __entry->guc_id, __entry->ctx, __entry->seqno, + __entry->tail) );
DEFINE_EVENT(i915_request, i915_request_add, @@ -818,6 +823,21 @@ DEFINE_EVENT(i915_request, i915_request_add, );
#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS) +DEFINE_EVENT(i915_request, i915_request_dep_to, + TP_PROTO(struct i915_request *rq), + TP_ARGS(rq) +); + +DEFINE_EVENT(i915_request, i915_request_dep_from, + TP_PROTO(struct i915_request *rq), + TP_ARGS(rq) +); + +DEFINE_EVENT(i915_request, i915_request_guc_submit, + TP_PROTO(struct i915_request *rq), + TP_ARGS(rq) +); + DEFINE_EVENT(i915_request, i915_request_submit, TP_PROTO(struct i915_request *rq), TP_ARGS(rq) @@ -887,6 +907,21 @@ TRACE_EVENT(i915_request_out,
#else #if !defined(TRACE_HEADER_MULTI_READ) +static inline void +trace_i915_request_dep_to(struct i915_request *rq) +{ +} + +static inline void +trace_i915_request_dep_from(struct i915_request *rq) +{ +} + +static inline void +trace_i915_request_guc_submit(struct i915_request *rq) +{ +} + static inline void trace_i915_request_submit(struct i915_request *rq) {
On 6/24/2021 00:04, Matthew Brost wrote:
Add trace points for request dependencies and GuC submit. Extended existing request trace points to include submit fence value,, guc_id,
Excessive punctuation. Or maybe should say 'fence value, tail, guc_id'? With that fixed:
Reviewed-by: John Harrison John.C.Harrison@Intel.com
and ring tail value.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
.../gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 ++ drivers/gpu/drm/i915/i915_request.c | 3 ++ drivers/gpu/drm/i915/i915_trace.h | 39 ++++++++++++++++++- 3 files changed, 43 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 89b3c7e5d15b..c2327eebc09c 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -422,6 +422,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc) guc->stalled_request = last; return false; }
trace_i915_request_guc_submit(last);
}
guc->stalled_request = NULL;
@@ -642,6 +643,8 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc, ret = guc_add_request(guc, rq); if (ret == -EBUSY) guc->stalled_request = rq;
else
trace_i915_request_guc_submit(rq);
return ret; }
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index d92c9f25c9f4..7f7aa096e873 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1344,6 +1344,9 @@ __i915_request_await_execution(struct i915_request *to, return err; }
- trace_i915_request_dep_to(to);
- trace_i915_request_dep_from(from);
- /* Couple the dependency tree for PI on this exposed to->fence */ if (to->engine->sched_engine->schedule) { err = i915_sched_node_add_dependency(&to->sched,
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index 6778ad2a14a4..b02d04b6c8f6 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -794,22 +794,27 @@ DECLARE_EVENT_CLASS(i915_request, TP_STRUCT__entry( __field(u32, dev) __field(u64, ctx)
__field(u32, guc_id) __field(u16, class) __field(u16, instance) __field(u32, seqno)
__field(u32, tail) ),
TP_fast_assign( __entry->dev = rq->engine->i915->drm.primary->index; __entry->class = rq->engine->uabi_class; __entry->instance = rq->engine->uabi_instance;
__entry->guc_id = rq->context->guc_id; __entry->ctx = rq->fence.context; __entry->seqno = rq->fence.seqno;
__entry->tail = rq->tail; ),
TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u",
TP_printk("dev=%u, engine=%u:%u, guc_id=%u, ctx=%llu, seqno=%u, tail=%u", __entry->dev, __entry->class, __entry->instance,
__entry->ctx, __entry->seqno)
__entry->guc_id, __entry->ctx, __entry->seqno,
__entry->tail)
);
DEFINE_EVENT(i915_request, i915_request_add,
@@ -818,6 +823,21 @@ DEFINE_EVENT(i915_request, i915_request_add, );
#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS) +DEFINE_EVENT(i915_request, i915_request_dep_to,
TP_PROTO(struct i915_request *rq),
TP_ARGS(rq)
+);
+DEFINE_EVENT(i915_request, i915_request_dep_from,
TP_PROTO(struct i915_request *rq),
TP_ARGS(rq)
+);
+DEFINE_EVENT(i915_request, i915_request_guc_submit,
TP_PROTO(struct i915_request *rq),
TP_ARGS(rq)
+);
- DEFINE_EVENT(i915_request, i915_request_submit, TP_PROTO(struct i915_request *rq), TP_ARGS(rq)
@@ -887,6 +907,21 @@ TRACE_EVENT(i915_request_out,
#else #if !defined(TRACE_HEADER_MULTI_READ) +static inline void +trace_i915_request_dep_to(struct i915_request *rq) +{ +}
+static inline void +trace_i915_request_dep_from(struct i915_request *rq) +{ +}
+static inline void +trace_i915_request_guc_submit(struct i915_request *rq) +{ +}
- static inline void trace_i915_request_submit(struct i915_request *rq) {
On 24/06/2021 08:04, Matthew Brost wrote:
Add trace points for request dependencies and GuC submit. Extend existing request trace points to include the submit fence value, guc_id, and ring tail value.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
.../gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 ++ drivers/gpu/drm/i915/i915_request.c | 3 ++ drivers/gpu/drm/i915/i915_trace.h | 39 ++++++++++++++++++- 3 files changed, 43 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 89b3c7e5d15b..c2327eebc09c 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -422,6 +422,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc) guc->stalled_request = last; return false; }
trace_i915_request_guc_submit(last);
}
guc->stalled_request = NULL;
@@ -642,6 +643,8 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc, ret = guc_add_request(guc, rq); if (ret == -EBUSY) guc->stalled_request = rq;
else
trace_i915_request_guc_submit(rq);
return ret; }
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index d92c9f25c9f4..7f7aa096e873 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1344,6 +1344,9 @@ __i915_request_await_execution(struct i915_request *to, return err; }
- trace_i915_request_dep_to(to);
- trace_i915_request_dep_from(from);
Are those two guaranteed to be atomic, i.e. no other dep_to/dep_from can get injected in the middle of them, and if so, what guarantees that?
Actually we had an internal discussion going in November 2019 on these very tracepoints which I think was left hanging in the air.
There I was suggesting you create a single tracepoint in the format of "from -> to", so it's clear without any doubt what is going on.
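A combined tracepoint along those lines could look roughly like this — a sketch only; the event name `i915_request_await` and the exact field set are illustrative, not something from the patch, with the fence fields mirroring the existing i915_request event class:

```c
/* Sketch of a single "from -> to" dependency tracepoint; the event
 * name and fields are illustrative, not taken from the patch. */
TRACE_EVENT(i915_request_await,
	    TP_PROTO(struct i915_request *from, struct i915_request *to),
	    TP_ARGS(from, to),

	    TP_STRUCT__entry(
			     __field(u64, from_ctx)
			     __field(u32, from_seqno)
			     __field(u64, to_ctx)
			     __field(u32, to_seqno)
			     ),

	    TP_fast_assign(
			   __entry->from_ctx = from->fence.context;
			   __entry->from_seqno = from->fence.seqno;
			   __entry->to_ctx = to->fence.context;
			   __entry->to_seqno = to->fence.seqno;
			   ),

	    TP_printk("from=%llu:%u -> to=%llu:%u",
		      __entry->from_ctx, __entry->from_seqno,
		      __entry->to_ctx, __entry->to_seqno)
);
```

Because both endpoints are captured in a single event, the atomicity question above would also go away — there is nothing left to interleave.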
I also suggested this should go outside the GuC patch since it is backend agnostic.
I also asked why only this flavour of dependencies and not all. You said this was the handy one for debugging GuC backend issues. I said in that case you should name it trace_i915_request_await_request so it is clearer it does not cover all dependencies.
As it stands it is a bit misleadingly named, has that question mark around atomicity, and also is not GuC specific. So really I wouldn't think it passes the bar in the current state. Regards,
Tvrtko
P.S. Same discussion from 2019 also talked about trace_i915_request_guc_submit and how it exactly aligns to existing request in tracepoint. You were saying the new one is handy because it corresponds with H2G, as the last request_in of the group would trigger it. I was saying that then you could either know implicitly last request_in triggers H2G, or that you could consider adding explicit H2G tracepoints.
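An explicit H2G tracepoint, if added, could be as simple as the following sketch, fired at the point the driver writes a message into the CT buffer — the name `intel_guc_ct_h2g` and its fields are illustrative assumptions, not existing driver code:

```c
/* Sketch of an explicit H2G tracepoint; name and fields are
 * illustrative, not from the patch. Would be fired where the
 * driver commits a message into the CT send buffer. */
TRACE_EVENT(intel_guc_ct_h2g,
	    TP_PROTO(u32 action, u32 len),
	    TP_ARGS(action, len),

	    TP_STRUCT__entry(
			     __field(u32, action)
			     __field(u32, len)
			     ),

	    TP_fast_assign(
			   __entry->action = action;
			   __entry->len = len;
			   ),

	    TP_printk("action=0x%x, len=%u", __entry->action, __entry->len)
);
```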
- /* Couple the dependency tree for PI on this exposed to->fence */ if (to->engine->sched_engine->schedule) { err = i915_sched_node_add_dependency(&to->sched,
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index 6778ad2a14a4..b02d04b6c8f6 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -794,22 +794,27 @@ DECLARE_EVENT_CLASS(i915_request, TP_STRUCT__entry( __field(u32, dev) __field(u64, ctx)
__field(u32, guc_id) __field(u16, class) __field(u16, instance) __field(u32, seqno)
__field(u32, tail) ),
TP_fast_assign( __entry->dev = rq->engine->i915->drm.primary->index; __entry->class = rq->engine->uabi_class; __entry->instance = rq->engine->uabi_instance;
__entry->guc_id = rq->context->guc_id; __entry->ctx = rq->fence.context; __entry->seqno = rq->fence.seqno;
__entry->tail = rq->tail; ),
TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u",
TP_printk("dev=%u, engine=%u:%u, guc_id=%u, ctx=%llu, seqno=%u, tail=%u", __entry->dev, __entry->class, __entry->instance,
__entry->ctx, __entry->seqno)
__entry->guc_id, __entry->ctx, __entry->seqno,
__entry->tail)
);
DEFINE_EVENT(i915_request, i915_request_add,
@@ -818,6 +823,21 @@ DEFINE_EVENT(i915_request, i915_request_add, );
#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS) +DEFINE_EVENT(i915_request, i915_request_dep_to,
TP_PROTO(struct i915_request *rq),
TP_ARGS(rq)
+);
+DEFINE_EVENT(i915_request, i915_request_dep_from,
TP_PROTO(struct i915_request *rq),
TP_ARGS(rq)
+);
+DEFINE_EVENT(i915_request, i915_request_guc_submit,
TP_PROTO(struct i915_request *rq),
TP_ARGS(rq)
+);
- DEFINE_EVENT(i915_request, i915_request_submit, TP_PROTO(struct i915_request *rq), TP_ARGS(rq)
@@ -887,6 +907,21 @@ TRACE_EVENT(i915_request_out,
#else #if !defined(TRACE_HEADER_MULTI_READ) +static inline void +trace_i915_request_dep_to(struct i915_request *rq) +{ +}
+static inline void +trace_i915_request_dep_from(struct i915_request *rq) +{ +}
+static inline void +trace_i915_request_guc_submit(struct i915_request *rq) +{ +}
- static inline void trace_i915_request_submit(struct i915_request *rq) {
On Tue, Jul 13, 2021 at 10:06:17AM +0100, Tvrtko Ursulin wrote:
On 24/06/2021 08:04, Matthew Brost wrote:
Add trace points for request dependencies and GuC submit. Extend existing request trace points to include the submit fence value, guc_id, and ring tail value.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
.../gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 ++ drivers/gpu/drm/i915/i915_request.c | 3 ++ drivers/gpu/drm/i915/i915_trace.h | 39 ++++++++++++++++++- 3 files changed, 43 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 89b3c7e5d15b..c2327eebc09c 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -422,6 +422,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc) guc->stalled_request = last; return false; }
trace_i915_request_guc_submit(last);
}
guc->stalled_request = NULL;
@@ -642,6 +643,8 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc, ret = guc_add_request(guc, rq); if (ret == -EBUSY) guc->stalled_request = rq;
- else
trace_i915_request_guc_submit(rq);
return ret; }
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index d92c9f25c9f4..7f7aa096e873 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1344,6 +1344,9 @@ __i915_request_await_execution(struct i915_request *to, return err; }
- trace_i915_request_dep_to(to);
- trace_i915_request_dep_from(from);
Are those two guaranteed to be atomic, i.e. no other dep_to/dep_from can get injected in the middle of them, and if so, what guarantees that?
These are not atomic, but in practice I've never seen out-of-order tracepoints.
Actually we had an internal discussion going in November 2019 on these very tracepoints which I think was left hanging in the air.
There I was suggesting you create a single tracepoint in the format of "from -> to", so it's clear without any doubt what is going on.
Not sure if it is worth adding a custom trace point for this.
I also suggested this should go outside the GuC patch since it is backend agnostic.
I guess, but does it really matter?
I also asked why only this flavour of dependencies and not all. You said this was the handy one for debugging GuC backend issues. I said in that case you should name it trace_i915_request_await_request so it is clearer it does not cover all dependencies.
Can't we look at the code? For kernel dev trace points I don't think it is too much to ask a developer to grep around the code. Also, you likely only turn these on if you know what you are doing anyway.
As it stands it is a bit misleadingly named, has that question mark around atomicity, and also is not GuC specific. So really I wouldn't think it passes the bar in the current state.
I'll just delete them.
Regards,
Tvrtko
P.S. Same discussion from 2019 also talked about trace_i915_request_guc_submit and how it exactly aligns to existing request in tracepoint.
It doesn't align. You literally make the point about how it doesn't align below.
You were saying the new one is handy because it corresponds with H2G, as the last request_in of the group would trigger it. I was saying that then you could either know implicitly last request_in triggers H2G, or that you could consider adding explicit H2G tracepoints.
Yes, we have a trace point for every H2G. Again, the users of these tracepoints know what they mean.
Matt
- /* Couple the dependency tree for PI on this exposed to->fence */ if (to->engine->sched_engine->schedule) { err = i915_sched_node_add_dependency(&to->sched,
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index 6778ad2a14a4..b02d04b6c8f6 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -794,22 +794,27 @@ DECLARE_EVENT_CLASS(i915_request, TP_STRUCT__entry( __field(u32, dev) __field(u64, ctx)
__field(u32, guc_id) __field(u16, class) __field(u16, instance) __field(u32, seqno)
__field(u32, tail) ),
TP_fast_assign( __entry->dev = rq->engine->i915->drm.primary->index; __entry->class = rq->engine->uabi_class; __entry->instance = rq->engine->uabi_instance;
__entry->guc_id = rq->context->guc_id; __entry->ctx = rq->fence.context; __entry->seqno = rq->fence.seqno;
__entry->tail = rq->tail; ),
TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u",
TP_printk("dev=%u, engine=%u:%u, guc_id=%u, ctx=%llu, seqno=%u, tail=%u", __entry->dev, __entry->class, __entry->instance,
__entry->ctx, __entry->seqno)
__entry->guc_id, __entry->ctx, __entry->seqno,
__entry->tail)
);
DEFINE_EVENT(i915_request, i915_request_add,
@@ -818,6 +823,21 @@ DEFINE_EVENT(i915_request, i915_request_add, ); #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS) +DEFINE_EVENT(i915_request, i915_request_dep_to,
TP_PROTO(struct i915_request *rq),
TP_ARGS(rq)
+);
+DEFINE_EVENT(i915_request, i915_request_dep_from,
TP_PROTO(struct i915_request *rq),
TP_ARGS(rq)
+);
+DEFINE_EVENT(i915_request, i915_request_guc_submit,
TP_PROTO(struct i915_request *rq),
TP_ARGS(rq)
+);
- DEFINE_EVENT(i915_request, i915_request_submit, TP_PROTO(struct i915_request *rq), TP_ARGS(rq)
@@ -887,6 +907,21 @@ TRACE_EVENT(i915_request_out, #else #if !defined(TRACE_HEADER_MULTI_READ) +static inline void +trace_i915_request_dep_to(struct i915_request *rq) +{ +}
+static inline void +trace_i915_request_dep_from(struct i915_request *rq) +{ +}
+static inline void +trace_i915_request_guc_submit(struct i915_request *rq) +{ +}
- static inline void trace_i915_request_submit(struct i915_request *rq) {
On 20/07/2021 02:59, Matthew Brost wrote:
On Tue, Jul 13, 2021 at 10:06:17AM +0100, Tvrtko Ursulin wrote:
On 24/06/2021 08:04, Matthew Brost wrote:
Add trace points for request dependencies and GuC submit. Extend existing request trace points to include the submit fence value, guc_id, and ring tail value.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
.../gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 ++ drivers/gpu/drm/i915/i915_request.c | 3 ++ drivers/gpu/drm/i915/i915_trace.h | 39 ++++++++++++++++++- 3 files changed, 43 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 89b3c7e5d15b..c2327eebc09c 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -422,6 +422,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc) guc->stalled_request = last; return false; }
trace_i915_request_guc_submit(last);
}
guc->stalled_request = NULL;
@@ -642,6 +643,8 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc, ret = guc_add_request(guc, rq); if (ret == -EBUSY) guc->stalled_request = rq;
- else
trace_i915_request_guc_submit(rq);
return ret; }
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index d92c9f25c9f4..7f7aa096e873 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1344,6 +1344,9 @@ __i915_request_await_execution(struct i915_request *to, return err; }
- trace_i915_request_dep_to(to);
- trace_i915_request_dep_from(from);
Are those two guaranteed to be atomic, i.e. no other dep_to/dep_from can get injected in the middle of them, and if so, what guarantees that?
These are not atomic, but in practice I've never seen out-of-order tracepoints.
Actually we had an internal discussion going in November 2019 on these very tracepoints which I think was left hanging in the air.
There I was suggesting you create a single tracepoint in the format of "from -> to", so it's clear without any doubt what is going on.
Not sure if it is worth adding a custom trace point for this.
Custom as in not inheriting from the i915_request class, you mean? It's not that hard really.
I also suggested this should go outside the GuC patch since it is backend agnostic.
I guess, but does it really matter?
IMO following best practices and established conventions matters a lot.
Regards,
Tvrtko
Add intel_context tracing. These trace points are particularly helpful when debugging the GuC firmware and can be enabled via the CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS kernel config option.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/intel_context.c | 6 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 14 ++ drivers/gpu/drm/i915/i915_trace.h | 148 +++++++++++++++++- 3 files changed, 166 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 7f97753ab164..b24a1b7a3f88 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -8,6 +8,7 @@
#include "i915_drv.h" #include "i915_globals.h" +#include "i915_trace.h"
#include "intel_context.h" #include "intel_engine.h" @@ -28,6 +29,7 @@ static void rcu_context_free(struct rcu_head *rcu) { struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
+ trace_intel_context_free(ce); kmem_cache_free(global.slab_ce, ce); }
@@ -46,6 +48,7 @@ intel_context_create(struct intel_engine_cs *engine) return ERR_PTR(-ENOMEM);
intel_context_init(ce, engine); + trace_intel_context_create(ce); return ce; }
@@ -268,6 +271,8 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
GEM_BUG_ON(!intel_context_is_pinned(ce)); /* no overflow! */
+ trace_intel_context_do_pin(ce); + err_unlock: mutex_unlock(&ce->pin_mutex); err_post_unpin: @@ -323,6 +328,7 @@ void __intel_context_do_unpin(struct intel_context *ce, int sub) */ intel_context_get(ce); intel_context_active_release(ce); + trace_intel_context_do_unpin(ce); intel_context_put(ce); }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index c2327eebc09c..d605af0d66e6 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -348,6 +348,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
err = intel_guc_send_nb(guc, action, len, g2h_len_dw); if (!enabled && !err) { + trace_intel_context_sched_enable(ce); atomic_inc(&guc->outstanding_submission_g2h); set_context_enabled(ce); } else if (!enabled) { @@ -812,6 +813,8 @@ static int register_context(struct intel_context *ce) u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) + ce->guc_id * sizeof(struct guc_lrc_desc);
+ trace_intel_context_register(ce); + return __guc_action_register_context(guc, ce->guc_id, offset); }
@@ -831,6 +834,8 @@ static int deregister_context(struct intel_context *ce, u32 guc_id) { struct intel_guc *guc = ce_to_guc(ce);
+ trace_intel_context_deregister(ce); + return __guc_action_deregister_context(guc, guc_id); }
@@ -905,6 +910,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce) * GuC before registering this context. */ if (context_registered) { + trace_intel_context_steal_guc_id(ce); set_context_wait_for_deregister_to_register(ce); intel_context_get(ce);
@@ -963,6 +969,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
+ trace_intel_context_sched_disable(ce); intel_context_get(ce);
guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), @@ -1119,6 +1126,9 @@ static void __guc_signal_context_fence(struct intel_context *ce)
lockdep_assert_held(&ce->guc_state.lock);
+ if (!list_empty(&ce->guc_state.fences)) + trace_intel_context_fence_release(ce); + list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link) i915_sw_fence_complete(&rq->submit);
@@ -1529,6 +1539,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc, if (unlikely(!ce)) return -EPROTO;
+ trace_intel_context_deregister_done(ce); + if (context_wait_for_deregister_to_register(ce)) { struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm; @@ -1580,6 +1592,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, return -EPROTO; }
+ trace_intel_context_sched_done(ce); + if (context_pending_enable(ce)) { clr_context_pending_enable(ce); } else if (context_pending_disable(ce)) { diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index b02d04b6c8f6..97c2e83984ed 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -818,8 +818,8 @@ DECLARE_EVENT_CLASS(i915_request, );
DEFINE_EVENT(i915_request, i915_request_add, - TP_PROTO(struct i915_request *rq), - TP_ARGS(rq) + TP_PROTO(struct i915_request *rq), + TP_ARGS(rq) );
#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS) @@ -905,6 +905,90 @@ TRACE_EVENT(i915_request_out, __entry->ctx, __entry->seqno, __entry->completed) );
+DECLARE_EVENT_CLASS(intel_context, + TP_PROTO(struct intel_context *ce), + TP_ARGS(ce), + + TP_STRUCT__entry( + __field(u32, guc_id) + __field(int, pin_count) + __field(u32, sched_state) + __field(u32, guc_sched_state_no_lock) + ), + + TP_fast_assign( + __entry->guc_id = ce->guc_id; + __entry->pin_count = atomic_read(&ce->pin_count); + __entry->sched_state = ce->guc_state.sched_state; + __entry->guc_sched_state_no_lock = + atomic_read(&ce->guc_sched_state_no_lock); + ), + + TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x", + __entry->guc_id, __entry->pin_count, __entry->sched_state, + __entry->guc_sched_state_no_lock) +); + +DEFINE_EVENT(intel_context, intel_context_register, + TP_PROTO(struct intel_context *ce), + TP_ARGS(ce) +); + +DEFINE_EVENT(intel_context, intel_context_deregister, + TP_PROTO(struct intel_context *ce), + TP_ARGS(ce) +); + +DEFINE_EVENT(intel_context, intel_context_deregister_done, + TP_PROTO(struct intel_context *ce), + TP_ARGS(ce) +); + +DEFINE_EVENT(intel_context, intel_context_sched_enable, + TP_PROTO(struct intel_context *ce), + TP_ARGS(ce) +); + +DEFINE_EVENT(intel_context, intel_context_sched_disable, + TP_PROTO(struct intel_context *ce), + TP_ARGS(ce) +); + +DEFINE_EVENT(intel_context, intel_context_sched_done, + TP_PROTO(struct intel_context *ce), + TP_ARGS(ce) +); + +DEFINE_EVENT(intel_context, intel_context_create, + TP_PROTO(struct intel_context *ce), + TP_ARGS(ce) +); + +DEFINE_EVENT(intel_context, intel_context_fence_release, + TP_PROTO(struct intel_context *ce), + TP_ARGS(ce) +); + +DEFINE_EVENT(intel_context, intel_context_free, + TP_PROTO(struct intel_context *ce), + TP_ARGS(ce) +); + +DEFINE_EVENT(intel_context, intel_context_steal_guc_id, + TP_PROTO(struct intel_context *ce), + TP_ARGS(ce) +); + +DEFINE_EVENT(intel_context, intel_context_do_pin, + TP_PROTO(struct intel_context *ce), + TP_ARGS(ce) +); + +DEFINE_EVENT(intel_context, intel_context_do_unpin, + TP_PROTO(struct intel_context *ce), + TP_ARGS(ce) +); + 
#else #if !defined(TRACE_HEADER_MULTI_READ) static inline void @@ -941,6 +1025,66 @@ static inline void trace_i915_request_out(struct i915_request *rq) { } + +static inline void +trace_intel_context_register(struct intel_context *ce) +{ +} + +static inline void +trace_intel_context_deregister(struct intel_context *ce) +{ +} + +static inline void +trace_intel_context_deregister_done(struct intel_context *ce) +{ +} + +static inline void +trace_intel_context_sched_enable(struct intel_context *ce) +{ +} + +static inline void +trace_intel_context_sched_disable(struct intel_context *ce) +{ +} + +static inline void +trace_intel_context_sched_done(struct intel_context *ce) +{ +} + +static inline void +trace_intel_context_create(struct intel_context *ce) +{ +} + +static inline void +trace_intel_context_fence_release(struct intel_context *ce) +{ +} + +static inline void +trace_intel_context_free(struct intel_context *ce) +{ +} + +static inline void +trace_intel_context_steal_guc_id(struct intel_context *ce) +{ +} + +static inline void +trace_intel_context_do_pin(struct intel_context *ce) +{ +} + +static inline void +trace_intel_context_do_unpin(struct intel_context *ce) +{ +} #endif #endif
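For reference, assuming tracefs is mounted at the usual location and the kernel was built with CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS, the new events can be exercised like any other i915 tracepoint — the paths below are the standard tracefs layout, not something this patch adds:

```shell
# Enable a few of the new intel_context events (requires root and a
# kernel with CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS enabled).
cd /sys/kernel/tracing
echo 1 > events/i915/intel_context_do_pin/enable
echo 1 > events/i915/intel_context_sched_enable/enable
echo 1 > events/i915/intel_context_sched_disable/enable

# Stream the trace while running a GPU workload in another terminal.
cat trace_pipe
```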
On 6/24/2021 00:04, Matthew Brost wrote:
Add intel_context tracing. These trace points are particularly helpful when debugging the GuC firmware and can be enabled via the CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS kernel config option.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/intel_context.c | 6 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 14 ++ drivers/gpu/drm/i915/i915_trace.h | 148 +++++++++++++++++- 3 files changed, 166 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 7f97753ab164..b24a1b7a3f88 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -8,6 +8,7 @@
#include "i915_drv.h" #include "i915_globals.h" +#include "i915_trace.h"
#include "intel_context.h" #include "intel_engine.h" @@ -28,6 +29,7 @@ static void rcu_context_free(struct rcu_head *rcu) { struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
- trace_intel_context_free(ce); kmem_cache_free(global.slab_ce, ce); }
@@ -46,6 +48,7 @@ intel_context_create(struct intel_engine_cs *engine) return ERR_PTR(-ENOMEM);
intel_context_init(ce, engine);
- trace_intel_context_create(ce); return ce; }
@@ -268,6 +271,8 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
GEM_BUG_ON(!intel_context_is_pinned(ce)); /* no overflow! */
- trace_intel_context_do_pin(ce);
- err_unlock: mutex_unlock(&ce->pin_mutex); err_post_unpin:
@@ -323,6 +328,7 @@ void __intel_context_do_unpin(struct intel_context *ce, int sub) */ intel_context_get(ce); intel_context_active_release(ce);
- trace_intel_context_do_unpin(ce); intel_context_put(ce); }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index c2327eebc09c..d605af0d66e6 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -348,6 +348,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
err = intel_guc_send_nb(guc, action, len, g2h_len_dw); if (!enabled && !err) {
trace_intel_context_sched_enable(ce);
atomic_inc(&guc->outstanding_submission_g2h); set_context_enabled(ce); } else if (!enabled) {
@@ -812,6 +813,8 @@ static int register_context(struct intel_context *ce) u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) + ce->guc_id * sizeof(struct guc_lrc_desc);
- trace_intel_context_register(ce);
- return __guc_action_register_context(guc, ce->guc_id, offset); }
@@ -831,6 +834,8 @@ static int deregister_context(struct intel_context *ce, u32 guc_id) { struct intel_guc *guc = ce_to_guc(ce);
- trace_intel_context_deregister(ce);
- return __guc_action_deregister_context(guc, guc_id); }
@@ -905,6 +910,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce) * GuC before registering this context. */ if (context_registered) {
trace_intel_context_steal_guc_id(ce);
set_context_wait_for_deregister_to_register(ce); intel_context_get(ce);
@@ -963,6 +969,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
trace_intel_context_sched_disable(ce); intel_context_get(ce);
guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
@@ -1119,6 +1126,9 @@ static void __guc_signal_context_fence(struct intel_context *ce)
lockdep_assert_held(&ce->guc_state.lock);
- if (!list_empty(&ce->guc_state.fences))
trace_intel_context_fence_release(ce);
- list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link) i915_sw_fence_complete(&rq->submit);
@@ -1529,6 +1539,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc, if (unlikely(!ce)) return -EPROTO;
- trace_intel_context_deregister_done(ce);
- if (context_wait_for_deregister_to_register(ce)) { struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
@@ -1580,6 +1592,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, return -EPROTO; }
- trace_intel_context_sched_done(ce);
- if (context_pending_enable(ce)) { clr_context_pending_enable(ce); } else if (context_pending_disable(ce)) {
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index b02d04b6c8f6..97c2e83984ed 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -818,8 +818,8 @@ DECLARE_EVENT_CLASS(i915_request, );
DEFINE_EVENT(i915_request, i915_request_add,
TP_PROTO(struct i915_request *rq),
TP_ARGS(rq)
TP_PROTO(struct i915_request *rq),
TP_ARGS(rq)
Is this an intentional whitespace change?
);
#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS) @@ -905,6 +905,90 @@ TRACE_EVENT(i915_request_out, __entry->ctx, __entry->seqno, __entry->completed) );
+DECLARE_EVENT_CLASS(intel_context,
TP_PROTO(struct intel_context *ce),
TP_ARGS(ce),
TP_STRUCT__entry(
__field(u32, guc_id)
__field(int, pin_count)
__field(u32, sched_state)
__field(u32, guc_sched_state_no_lock)
),
TP_fast_assign(
__entry->guc_id = ce->guc_id;
__entry->pin_count = atomic_read(&ce->pin_count);
__entry->sched_state = ce->guc_state.sched_state;
__entry->guc_sched_state_no_lock =
atomic_read(&ce->guc_sched_state_no_lock);
),
TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x",
__entry->guc_id, __entry->pin_count, __entry->sched_state,
__entry->guc_sched_state_no_lock)
+);
+DEFINE_EVENT(intel_context, intel_context_register,
TP_PROTO(struct intel_context *ce),
TP_ARGS(ce)
+);
+DEFINE_EVENT(intel_context, intel_context_deregister,
TP_PROTO(struct intel_context *ce),
TP_ARGS(ce)
+);
+DEFINE_EVENT(intel_context, intel_context_deregister_done,
TP_PROTO(struct intel_context *ce),
TP_ARGS(ce)
+);
+DEFINE_EVENT(intel_context, intel_context_sched_enable,
TP_PROTO(struct intel_context *ce),
TP_ARGS(ce)
+);
+DEFINE_EVENT(intel_context, intel_context_sched_disable,
TP_PROTO(struct intel_context *ce),
TP_ARGS(ce)
+);
+DEFINE_EVENT(intel_context, intel_context_sched_done,
TP_PROTO(struct intel_context *ce),
TP_ARGS(ce)
+);
+DEFINE_EVENT(intel_context, intel_context_create,
TP_PROTO(struct intel_context *ce),
TP_ARGS(ce)
+);
+DEFINE_EVENT(intel_context, intel_context_fence_release,
TP_PROTO(struct intel_context *ce),
TP_ARGS(ce)
+);
+DEFINE_EVENT(intel_context, intel_context_free,
TP_PROTO(struct intel_context *ce),
TP_ARGS(ce)
+);
+DEFINE_EVENT(intel_context, intel_context_steal_guc_id,
TP_PROTO(struct intel_context *ce),
TP_ARGS(ce)
+);
+DEFINE_EVENT(intel_context, intel_context_do_pin,
TP_PROTO(struct intel_context *ce),
TP_ARGS(ce)
+);
+DEFINE_EVENT(intel_context, intel_context_do_unpin,
TP_PROTO(struct intel_context *ce),
TP_ARGS(ce)
+);
- #else #if !defined(TRACE_HEADER_MULTI_READ) static inline void
@@ -941,6 +1025,66 @@ static inline void trace_i915_request_out(struct i915_request *rq) { }
+static inline void +trace_intel_context_register(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_deregister(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_deregister_done(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_sched_enable(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_sched_disable(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_sched_done(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_create(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_fence_release(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_free(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_steal_guc_id(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_do_pin(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_do_unpin(struct intel_context *ce) +{ +} #endif #endif
On Mon, Jul 12, 2021 at 11:10:40AM -0700, John Harrison wrote:
On 6/24/2021 00:04, Matthew Brost wrote:
Add intel_context tracing. These trace points are particularly helpful when debugging the GuC firmware and can be enabled via the CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS kernel config option.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/intel_context.c | 6 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 14 ++ drivers/gpu/drm/i915/i915_trace.h | 148 +++++++++++++++++- 3 files changed, 166 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 7f97753ab164..b24a1b7a3f88 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -8,6 +8,7 @@ #include "i915_drv.h" #include "i915_globals.h" +#include "i915_trace.h" #include "intel_context.h" #include "intel_engine.h" @@ -28,6 +29,7 @@ static void rcu_context_free(struct rcu_head *rcu) { struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
- trace_intel_context_free(ce); kmem_cache_free(global.slab_ce, ce); }
@@ -46,6 +48,7 @@ intel_context_create(struct intel_engine_cs *engine) return ERR_PTR(-ENOMEM); intel_context_init(ce, engine);
- trace_intel_context_create(ce); return ce; }
@@ -268,6 +271,8 @@ int __intel_context_do_pin_ww(struct intel_context *ce, GEM_BUG_ON(!intel_context_is_pinned(ce)); /* no overflow! */
- trace_intel_context_do_pin(ce);
- err_unlock: mutex_unlock(&ce->pin_mutex); err_post_unpin:
@@ -323,6 +328,7 @@ void __intel_context_do_unpin(struct intel_context *ce, int sub) */ intel_context_get(ce); intel_context_active_release(ce);
- trace_intel_context_do_unpin(ce); intel_context_put(ce); }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index c2327eebc09c..d605af0d66e6 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -348,6 +348,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) err = intel_guc_send_nb(guc, action, len, g2h_len_dw); if (!enabled && !err) {
trace_intel_context_sched_enable(ce);
atomic_inc(&guc->outstanding_submission_g2h); set_context_enabled(ce); } else if (!enabled) {
@@ -812,6 +813,8 @@ static int register_context(struct intel_context *ce) u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) + ce->guc_id * sizeof(struct guc_lrc_desc);
+ trace_intel_context_register(ce);
+
return __guc_action_register_context(guc, ce->guc_id, offset); }
@@ -831,6 +834,8 @@ static int deregister_context(struct intel_context *ce, u32 guc_id) { struct intel_guc *guc = ce_to_guc(ce);
+ trace_intel_context_deregister(ce);
+
return __guc_action_deregister_context(guc, guc_id); }
@@ -905,6 +910,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce) * GuC before registering this context. */ if (context_registered) {
+ trace_intel_context_steal_guc_id(ce);
set_context_wait_for_deregister_to_register(ce); intel_context_get(ce);
@@ -963,6 +969,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc, GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
+ trace_intel_context_sched_disable(ce); intel_context_get(ce); guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
@@ -1119,6 +1126,9 @@ static void __guc_signal_context_fence(struct intel_context *ce) lockdep_assert_held(&ce->guc_state.lock);
+ if (!list_empty(&ce->guc_state.fences))
+ trace_intel_context_fence_release(ce);
+
list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link) i915_sw_fence_complete(&rq->submit);
@@ -1529,6 +1539,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc, if (unlikely(!ce)) return -EPROTO;
+ trace_intel_context_deregister_done(ce);
+
if (context_wait_for_deregister_to_register(ce)) { struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
@@ -1580,6 +1592,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, return -EPROTO; }
+ trace_intel_context_sched_done(ce);
+
if (context_pending_enable(ce)) { clr_context_pending_enable(ce); } else if (context_pending_disable(ce)) {
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index b02d04b6c8f6..97c2e83984ed 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -818,8 +818,8 @@ DECLARE_EVENT_CLASS(i915_request, ); DEFINE_EVENT(i915_request, i915_request_add,
-	   TP_PROTO(struct i915_request *rq),
-	   TP_ARGS(rq)
+	    TP_PROTO(struct i915_request *rq),
+	    TP_ARGS(rq)
Is this an intentional white space change?
Yea, probably should be in the previous patch though. Before this change the arguments were misaligned.
Matt
); #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS) @@ -905,6 +905,90 @@ TRACE_EVENT(i915_request_out, __entry->ctx, __entry->seqno, __entry->completed) ); +DECLARE_EVENT_CLASS(intel_context,
+	    TP_PROTO(struct intel_context *ce),
+	    TP_ARGS(ce),
+
+	    TP_STRUCT__entry(
+			     __field(u32, guc_id)
+			     __field(int, pin_count)
+			     __field(u32, sched_state)
+			     __field(u32, guc_sched_state_no_lock)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->guc_id = ce->guc_id;
+			   __entry->pin_count = atomic_read(&ce->pin_count);
+			   __entry->sched_state = ce->guc_state.sched_state;
+			   __entry->guc_sched_state_no_lock =
+				atomic_read(&ce->guc_sched_state_no_lock);
+			   ),
+
+	    TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x",
+		      __entry->guc_id, __entry->pin_count, __entry->sched_state,
+		      __entry->guc_sched_state_no_lock)
+);
+
+DEFINE_EVENT(intel_context, intel_context_register,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_deregister,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_deregister_done,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_sched_enable,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_sched_disable,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_sched_done,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_create,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_fence_release,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_free,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_steal_guc_id,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_do_pin,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_do_unpin,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
#else #if !defined(TRACE_HEADER_MULTI_READ) static inline void
@@ -941,6 +1025,66 @@ static inline void trace_i915_request_out(struct i915_request *rq) { }
+static inline void +trace_intel_context_register(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_deregister(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_deregister_done(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_sched_enable(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_sched_disable(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_sched_done(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_create(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_fence_release(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_free(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_steal_guc_id(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_do_pin(struct intel_context *ce) +{ +}
+static inline void +trace_intel_context_do_unpin(struct intel_context *ce) +{ +} #endif #endif
On 7/12/2021 14:47, Matthew Brost wrote:
On Mon, Jul 12, 2021 at 11:10:40AM -0700, John Harrison wrote:
On 6/24/2021 00:04, Matthew Brost wrote:
Add intel_context tracing. These trace points are particularly helpful when debugging the GuC firmware and can be enabled via the CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS kernel config option.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
Okay, one can never tell if the alignment is out for reals or just because the email viewer and/or diff prefixes are playing silly buggers with tab spacing. And yeah, would make more sense to bump the change into the request trace point patch. With that done...
Reviewed-by: John Harrison John.C.Harrison@Intel.com
Implement GuC virtual engines. A rather simple implementation: allocate an engine, set up the context enter/exit functions to point at virtual-engine-specific functions, set all other variables/functions to the GuC versions, and set the engine mask to that of all the siblings.
Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 19 +- drivers/gpu/drm/i915/gem/i915_gem_context.h | 1 + drivers/gpu/drm/i915/gt/intel_context_types.h | 10 + drivers/gpu/drm/i915/gt/intel_engine.h | 45 +++- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 14 + .../drm/i915/gt/intel_execlists_submission.c | 186 +++++++------ .../drm/i915/gt/intel_execlists_submission.h | 11 - drivers/gpu/drm/i915/gt/selftest_execlists.c | 20 +- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 253 +++++++++++++++++- .../gpu/drm/i915/gt/uc/intel_guc_submission.h | 2 + 10 files changed, 429 insertions(+), 132 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 5c07e6abf16a..8a9293e0ca92 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -72,7 +72,6 @@ #include "gt/intel_context_param.h" #include "gt/intel_engine_heartbeat.h" #include "gt/intel_engine_user.h" -#include "gt/intel_execlists_submission.h" /* virtual_engine */ #include "gt/intel_gpu_commands.h" #include "gt/intel_ring.h"
@@ -1568,9 +1567,6 @@ set_engines__load_balance(struct i915_user_extension __user *base, void *data) if (!HAS_EXECLISTS(i915)) return -ENODEV;
- if (intel_uc_uses_guc_submission(&i915->gt.uc)) - return -ENODEV; /* not implement yet */ - if (get_user(idx, &ext->engine_index)) return -EFAULT;
@@ -1627,7 +1623,7 @@ set_engines__load_balance(struct i915_user_extension __user *base, void *data) } }
- ce = intel_execlists_create_virtual(siblings, n); + ce = intel_engine_create_virtual(siblings, n); if (IS_ERR(ce)) { err = PTR_ERR(ce); goto out_siblings; @@ -1723,13 +1719,9 @@ set_engines__bond(struct i915_user_extension __user *base, void *data) * A non-virtual engine has no siblings to choose between; and * a submit fence will always be directed to the one engine. */ - if (intel_engine_is_virtual(virtual)) { - err = intel_virtual_engine_attach_bond(virtual, - master, - bond); - if (err) - return err; - } + err = intel_engine_attach_bond(virtual, master, bond); + if (err) + return err; }
return 0; @@ -2116,8 +2108,7 @@ static int clone_engines(struct i915_gem_context *dst, * the virtual engine instead. */ if (intel_engine_is_virtual(engine)) - clone->engines[n] = - intel_execlists_clone_virtual(engine); + clone->engines[n] = intel_engine_clone_virtual(engine); else clone->engines[n] = intel_context_create(engine); if (IS_ERR_OR_NULL(clone->engines[n])) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h index b5c908f3f4f2..ba772762f7b9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h @@ -10,6 +10,7 @@ #include "i915_gem_context_types.h"
#include "gt/intel_context.h" +#include "gt/intel_engine.h"
#include "i915_drv.h" #include "i915_gem.h" diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index e7af6a2368f8..6945963a31ba 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -47,6 +47,16 @@ struct intel_context_ops {
void (*reset)(struct intel_context *ce); void (*destroy)(struct kref *kref); + + /* virtual engine/context interface */ + struct intel_context *(*create_virtual)(struct intel_engine_cs **engine, + unsigned int count); + struct intel_context *(*clone_virtual)(struct intel_engine_cs *engine); + struct intel_engine_cs *(*get_sibling)(struct intel_engine_cs *engine, + unsigned int sibling); + int (*attach_bond)(struct intel_engine_cs *engine, + const struct intel_engine_cs *master, + const struct intel_engine_cs *sibling); };
struct intel_context { diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index f911c1224ab2..923eaee627b3 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -273,13 +273,56 @@ intel_engine_has_preempt_reset(const struct intel_engine_cs *engine) return intel_engine_has_preemption(engine); }
+struct intel_context * +intel_engine_create_virtual(struct intel_engine_cs **siblings, + unsigned int count); + +static inline bool +intel_virtual_engine_has_heartbeat(const struct intel_engine_cs *engine) +{ + if (intel_engine_uses_guc(engine)) + return intel_guc_virtual_engine_has_heartbeat(engine); + else + GEM_BUG_ON("Only should be called in GuC submission"); + + return false; +} + static inline bool intel_engine_has_heartbeat(const struct intel_engine_cs *engine) { if (!IS_ACTIVE(CONFIG_DRM_I915_HEARTBEAT_INTERVAL)) return false;
- return READ_ONCE(engine->props.heartbeat_interval_ms); + if (intel_engine_is_virtual(engine)) + return intel_virtual_engine_has_heartbeat(engine); + else + return READ_ONCE(engine->props.heartbeat_interval_ms); +} + +static inline struct intel_context * +intel_engine_clone_virtual(struct intel_engine_cs *src) +{ + GEM_BUG_ON(!intel_engine_is_virtual(src)); + return src->cops->clone_virtual(src); +} + +static inline int +intel_engine_attach_bond(struct intel_engine_cs *engine, + const struct intel_engine_cs *master, + const struct intel_engine_cs *sibling) +{ + if (!engine->cops->attach_bond) + return 0; + + return engine->cops->attach_bond(engine, master, sibling); +} + +static inline struct intel_engine_cs * +intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling) +{ + GEM_BUG_ON(!intel_engine_is_virtual(engine)); + return engine->cops->get_sibling(engine, sibling); }
#endif /* _INTEL_RINGBUFFER_H_ */ diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 88694822716a..d13b1716c29e 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1736,6 +1736,20 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine, ktime_t *now) return total; }
+struct intel_context * +intel_engine_create_virtual(struct intel_engine_cs **siblings, + unsigned int count) +{ + if (count == 0) + return ERR_PTR(-EINVAL); + + if (count == 1) + return intel_context_create(siblings[0]); + + GEM_BUG_ON(!siblings[0]->cops->create_virtual); + return siblings[0]->cops->create_virtual(siblings, count); +} + static bool match_ring(struct i915_request *rq) { u32 ring = ENGINE_READ(rq->engine, RING_START); diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index cdb2126a159a..bd4ced794ff9 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -205,6 +205,9 @@ static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine) return container_of(engine, struct virtual_engine, base); }
+static struct intel_context * +execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count); + static struct i915_request * __active_request(const struct intel_timeline * const tl, struct i915_request *rq, @@ -2560,6 +2563,8 @@ static const struct intel_context_ops execlists_context_ops = {
.reset = lrc_reset, .destroy = lrc_destroy, + + .create_virtual = execlists_create_virtual, };
static int emit_pdps(struct i915_request *rq) @@ -3506,6 +3511,94 @@ static void virtual_context_exit(struct intel_context *ce) intel_engine_pm_put(ve->siblings[n]); }
+static struct intel_engine_cs * +virtual_get_sibling(struct intel_engine_cs *engine, unsigned int sibling) +{ + struct virtual_engine *ve = to_virtual_engine(engine); + + if (sibling >= ve->num_siblings) + return NULL; + + return ve->siblings[sibling]; +} + +static struct intel_context * +virtual_clone(struct intel_engine_cs *src) +{ + struct virtual_engine *se = to_virtual_engine(src); + struct intel_context *dst; + + dst = execlists_create_virtual(se->siblings, se->num_siblings); + if (IS_ERR(dst)) + return dst; + + if (se->num_bonds) { + struct virtual_engine *de = to_virtual_engine(dst->engine); + + de->bonds = kmemdup(se->bonds, + sizeof(*se->bonds) * se->num_bonds, + GFP_KERNEL); + if (!de->bonds) { + intel_context_put(dst); + return ERR_PTR(-ENOMEM); + } + + de->num_bonds = se->num_bonds; + } + + return dst; +} + +static struct ve_bond * +virtual_find_bond(struct virtual_engine *ve, + const struct intel_engine_cs *master) +{ + int i; + + for (i = 0; i < ve->num_bonds; i++) { + if (ve->bonds[i].master == master) + return &ve->bonds[i]; + } + + return NULL; +} + +static int virtual_attach_bond(struct intel_engine_cs *engine, + const struct intel_engine_cs *master, + const struct intel_engine_cs *sibling) +{ + struct virtual_engine *ve = to_virtual_engine(engine); + struct ve_bond *bond; + int n; + + /* Sanity check the sibling is part of the virtual engine */ + for (n = 0; n < ve->num_siblings; n++) + if (sibling == ve->siblings[n]) + break; + if (n == ve->num_siblings) + return -EINVAL; + + bond = virtual_find_bond(ve, master); + if (bond) { + bond->sibling_mask |= sibling->mask; + return 0; + } + + bond = krealloc(ve->bonds, + sizeof(*bond) * (ve->num_bonds + 1), + GFP_KERNEL); + if (!bond) + return -ENOMEM; + + bond[ve->num_bonds].master = master; + bond[ve->num_bonds].sibling_mask = sibling->mask; + + ve->bonds = bond; + ve->num_bonds++; + + return 0; +} + static const struct intel_context_ops virtual_context_ops = { .flags = COPS_HAS_INFLIGHT,
@@ -3520,6 +3613,10 @@ static const struct intel_context_ops virtual_context_ops = { .exit = virtual_context_exit,
.destroy = virtual_context_destroy, + + .clone_virtual = virtual_clone, + .get_sibling = virtual_get_sibling, + .attach_bond = virtual_attach_bond, };
static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve) @@ -3668,20 +3765,6 @@ static void virtual_submit_request(struct i915_request *rq) spin_unlock_irqrestore(&ve->base.sched_engine->lock, flags); }
-static struct ve_bond * -virtual_find_bond(struct virtual_engine *ve, - const struct intel_engine_cs *master) -{ - int i; - - for (i = 0; i < ve->num_bonds; i++) { - if (ve->bonds[i].master == master) - return &ve->bonds[i]; - } - - return NULL; -} - static void virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal) { @@ -3704,20 +3787,13 @@ virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal) to_request(signal)->execution_mask &= ~allowed; }
-struct intel_context * -intel_execlists_create_virtual(struct intel_engine_cs **siblings, - unsigned int count) +static struct intel_context * +execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count) { struct virtual_engine *ve; unsigned int n; int err;
- if (count == 0) - return ERR_PTR(-EINVAL); - - if (count == 1) - return intel_context_create(siblings[0]); - ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL); if (!ve) return ERR_PTR(-ENOMEM); @@ -3850,70 +3926,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, return ERR_PTR(err); }
-struct intel_context * -intel_execlists_clone_virtual(struct intel_engine_cs *src) -{ - struct virtual_engine *se = to_virtual_engine(src); - struct intel_context *dst; - - dst = intel_execlists_create_virtual(se->siblings, - se->num_siblings); - if (IS_ERR(dst)) - return dst; - - if (se->num_bonds) { - struct virtual_engine *de = to_virtual_engine(dst->engine); - - de->bonds = kmemdup(se->bonds, - sizeof(*se->bonds) * se->num_bonds, - GFP_KERNEL); - if (!de->bonds) { - intel_context_put(dst); - return ERR_PTR(-ENOMEM); - } - - de->num_bonds = se->num_bonds; - } - - return dst; -} - -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine, - const struct intel_engine_cs *master, - const struct intel_engine_cs *sibling) -{ - struct virtual_engine *ve = to_virtual_engine(engine); - struct ve_bond *bond; - int n; - - /* Sanity check the sibling is part of the virtual engine */ - for (n = 0; n < ve->num_siblings; n++) - if (sibling == ve->siblings[n]) - break; - if (n == ve->num_siblings) - return -EINVAL; - - bond = virtual_find_bond(ve, master); - if (bond) { - bond->sibling_mask |= sibling->mask; - return 0; - } - - bond = krealloc(ve->bonds, - sizeof(*bond) * (ve->num_bonds + 1), - GFP_KERNEL); - if (!bond) - return -ENOMEM; - - bond[ve->num_bonds].master = master; - bond[ve->num_bonds].sibling_mask = sibling->mask; - - ve->bonds = bond; - ve->num_bonds++; - - return 0; -} - void intel_execlists_show_requests(struct intel_engine_cs *engine, struct drm_printer *m, void (*show_request)(struct drm_printer *m, diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h index 4ca9b475e252..74041b1994af 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h @@ -32,15 +32,4 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine, int indent), unsigned int max);
-struct intel_context * -intel_execlists_create_virtual(struct intel_engine_cs **siblings, - unsigned int count); - -struct intel_context * -intel_execlists_clone_virtual(struct intel_engine_cs *src); - -int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine, - const struct intel_engine_cs *master, - const struct intel_engine_cs *sibling); - #endif /* __INTEL_EXECLISTS_SUBMISSION_H__ */ diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index 08896ae027d5..88aac9977e09 100644 --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ -3727,7 +3727,7 @@ static int nop_virtual_engine(struct intel_gt *gt, GEM_BUG_ON(!nctx || nctx > ARRAY_SIZE(ve));
for (n = 0; n < nctx; n++) { - ve[n] = intel_execlists_create_virtual(siblings, nsibling); + ve[n] = intel_engine_create_virtual(siblings, nsibling); if (IS_ERR(ve[n])) { err = PTR_ERR(ve[n]); nctx = n; @@ -3923,7 +3923,7 @@ static int mask_virtual_engine(struct intel_gt *gt, * restrict it to our desired engine within the virtual engine. */
- ve = intel_execlists_create_virtual(siblings, nsibling); + ve = intel_engine_create_virtual(siblings, nsibling); if (IS_ERR(ve)) { err = PTR_ERR(ve); goto out_close; @@ -4054,7 +4054,7 @@ static int slicein_virtual_engine(struct intel_gt *gt, i915_request_add(rq); }
- ce = intel_execlists_create_virtual(siblings, nsibling); + ce = intel_engine_create_virtual(siblings, nsibling); if (IS_ERR(ce)) { err = PTR_ERR(ce); goto out; @@ -4106,7 +4106,7 @@ static int sliceout_virtual_engine(struct intel_gt *gt,
/* XXX We do not handle oversubscription and fairness with normal rq */ for (n = 0; n < nsibling; n++) { - ce = intel_execlists_create_virtual(siblings, nsibling); + ce = intel_engine_create_virtual(siblings, nsibling); if (IS_ERR(ce)) { err = PTR_ERR(ce); goto out; @@ -4208,7 +4208,7 @@ static int preserved_virtual_engine(struct intel_gt *gt, if (err) goto out_scratch;
- ve = intel_execlists_create_virtual(siblings, nsibling); + ve = intel_engine_create_virtual(siblings, nsibling); if (IS_ERR(ve)) { err = PTR_ERR(ve); goto out_scratch; @@ -4431,16 +4431,16 @@ static int bond_virtual_engine(struct intel_gt *gt, for (n = 0; n < nsibling; n++) { struct intel_context *ve;
- ve = intel_execlists_create_virtual(siblings, nsibling); + ve = intel_engine_create_virtual(siblings, nsibling); if (IS_ERR(ve)) { err = PTR_ERR(ve); onstack_fence_fini(&fence); goto out; }
- err = intel_virtual_engine_attach_bond(ve->engine, - master, - siblings[n]); + err = intel_engine_attach_bond(ve->engine, + master, + siblings[n]); if (err) { intel_context_put(ve); onstack_fence_fini(&fence); @@ -4576,7 +4576,7 @@ static int reset_virtual_engine(struct intel_gt *gt, if (igt_spinner_init(&spin, gt)) return -ENOMEM;
- ve = intel_execlists_create_virtual(siblings, nsibling); + ve = intel_engine_create_virtual(siblings, nsibling); if (IS_ERR(ve)) { err = PTR_ERR(ve); goto out_spin; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index d605af0d66e6..ccbcf024b31b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -60,6 +60,15 @@ * */
+/* GuC Virtual Engine */
+struct guc_virtual_engine {
+	struct intel_engine_cs base;
+	struct intel_context context;
+};
+
+static struct intel_context *
+guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
+
 #define GUC_REQUEST_SIZE	64 /* bytes */

 /*
@@ -928,20 +937,35 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
 	return ret;
 }

-static int guc_context_pre_pin(struct intel_context *ce,
-			       struct i915_gem_ww_ctx *ww,
-			       void **vaddr)
+static int __guc_context_pre_pin(struct intel_context *ce,
+				 struct intel_engine_cs *engine,
+				 struct i915_gem_ww_ctx *ww,
+				 void **vaddr)
 {
-	return lrc_pre_pin(ce, ce->engine, ww, vaddr);
+	return lrc_pre_pin(ce, engine, ww, vaddr);
 }

-static int guc_context_pin(struct intel_context *ce, void *vaddr)
+static int __guc_context_pin(struct intel_context *ce,
+			     struct intel_engine_cs *engine,
+			     void *vaddr)
 {
 	if (i915_ggtt_offset(ce->state) !=
 	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
 		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);

-	return lrc_pin(ce, ce->engine, vaddr);
+	return lrc_pin(ce, engine, vaddr);
+}
+
+static int guc_context_pre_pin(struct intel_context *ce,
+			       struct i915_gem_ww_ctx *ww,
+			       void **vaddr)
+{
+	return __guc_context_pre_pin(ce, ce->engine, ww, vaddr);
+}
+
+static int guc_context_pin(struct intel_context *ce, void *vaddr)
+{
+	return __guc_context_pin(ce, ce->engine, vaddr);
 }

 static void guc_context_unpin(struct intel_context *ce)
@@ -1041,6 +1065,21 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 	deregister_context(ce, ce->guc_id);
 }

+static void __guc_context_destroy(struct intel_context *ce)
+{
+	lrc_fini(ce);
+	intel_context_fini(ce);
+
+	if (intel_engine_is_virtual(ce->engine)) {
+		struct guc_virtual_engine *ve =
+			container_of(ce, typeof(*ve), context);
+
+		kfree(ve);
+	} else {
+		intel_context_free(ce);
+	}
+}
+
 static void guc_context_destroy(struct kref *kref)
 {
 	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
@@ -1057,7 +1096,7 @@ static void guc_context_destroy(struct kref *kref)
 	if (context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		release_guc_id(guc, ce);
-		lrc_destroy(kref);
+		__guc_context_destroy(ce);
 		return;
 	}

@@ -1073,7 +1112,7 @@ static void guc_context_destroy(struct kref *kref)
 	if (context_guc_id_invalid(ce)) {
 		__release_guc_id(guc, ce);
 		spin_unlock_irqrestore(&guc->contexts_lock, flags);
-		lrc_destroy(kref);
+		__guc_context_destroy(ce);
 		return;
 	}

@@ -1118,6 +1157,8 @@ static const struct intel_context_ops guc_context_ops = {

 	.reset = lrc_reset,
 	.destroy = guc_context_destroy,
+
+	.create_virtual = guc_create_virtual,
 };

 static void __guc_signal_context_fence(struct intel_context *ce)
@@ -1246,6 +1287,96 @@ static int guc_request_alloc(struct i915_request *rq)
 	return 0;
 }

+static struct intel_engine_cs *
+guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
+{
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp, mask = ve->mask;
+	unsigned int num_siblings = 0;
+
+	for_each_engine_masked(engine, ve->gt, mask, tmp)
+		if (num_siblings++ == sibling)
+			return engine;
+
+	return NULL;
+}
+
+static int guc_virtual_context_pre_pin(struct intel_context *ce,
+				       struct i915_gem_ww_ctx *ww,
+				       void **vaddr)
+{
+	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
+
+	return __guc_context_pre_pin(ce, engine, ww, vaddr);
+}
+
+static int guc_virtual_context_pin(struct intel_context *ce, void *vaddr)
+{
+	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
+
+	return __guc_context_pin(ce, engine, vaddr);
+}
+
+static void guc_virtual_context_enter(struct intel_context *ce)
+{
+	intel_engine_mask_t tmp, mask = ce->engine->mask;
+	struct intel_engine_cs *engine;
+
+	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
+		intel_engine_pm_get(engine);
+
+	intel_timeline_enter(ce->timeline);
+}
+
+static void guc_virtual_context_exit(struct intel_context *ce)
+{
+	intel_engine_mask_t tmp, mask = ce->engine->mask;
+	struct intel_engine_cs *engine;
+
+	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
+		intel_engine_pm_put(engine);
+
+	intel_timeline_exit(ce->timeline);
+}
+
+static int guc_virtual_context_alloc(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
+
+	return lrc_alloc(ce, engine);
+}
+
+static struct intel_context *guc_clone_virtual(struct intel_engine_cs *src)
+{
+	struct intel_engine_cs *siblings[GUC_MAX_INSTANCES_PER_CLASS], *engine;
+	intel_engine_mask_t tmp, mask = src->mask;
+	unsigned int num_siblings = 0;
+
+	for_each_engine_masked(engine, src->gt, mask, tmp)
+		siblings[num_siblings++] = engine;
+
+	return guc_create_virtual(siblings, num_siblings);
+}
+
+static const struct intel_context_ops virtual_guc_context_ops = {
+	.alloc = guc_virtual_context_alloc,
+
+	.pre_pin = guc_virtual_context_pre_pin,
+	.pin = guc_virtual_context_pin,
+	.unpin = guc_context_unpin,
+	.post_unpin = guc_context_post_unpin,
+
+	.enter = guc_virtual_context_enter,
+	.exit = guc_virtual_context_exit,
+
+	.sched_disable = guc_context_sched_disable,
+
+	.destroy = guc_context_destroy,
+
+	.clone_virtual = guc_clone_virtual,
+	.get_sibling = guc_virtual_get_sibling,
+};
+
 static void sanitize_hwsp(struct intel_engine_cs *engine)
 {
 	struct intel_timeline *tl;
@@ -1557,7 +1688,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 	} else if (context_destroyed(ce)) {
 		/* Context has been destroyed */
 		release_guc_id(guc, ce);
-		lrc_destroy(&ce->ref);
+		__guc_context_destroy(ce);
 	}

 	decr_outstanding_submission_g2h(guc);
@@ -1669,3 +1800,107 @@ void intel_guc_log_context_info(struct intel_guc *guc,
 			   atomic_read(&ce->guc_sched_state_no_lock));
 	}
 }
+
+static struct intel_context *
+guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
+{
+	struct guc_virtual_engine *ve;
+	struct intel_guc *guc;
+	unsigned int n;
+	int err;
+
+	ve = kzalloc(sizeof(*ve), GFP_KERNEL);
+	if (!ve)
+		return ERR_PTR(-ENOMEM);
+
+	guc = &siblings[0]->gt->uc.guc;
+
+	ve->base.i915 = siblings[0]->i915;
+	ve->base.gt = siblings[0]->gt;
+	ve->base.uncore = siblings[0]->uncore;
+	ve->base.id = -1;
+
+	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
+	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
+	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
+	ve->base.saturated = ALL_ENGINES;
+
+	ve->base.breadcrumbs = intel_breadcrumbs_create(&ve->base);
+	if (!ve->base.breadcrumbs) {
+		kfree(ve);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
+
+	ve->base.sched_engine = i915_sched_engine_get(guc->sched_engine);
+
+	ve->base.cops = &virtual_guc_context_ops;
+	ve->base.request_alloc = guc_request_alloc;
+
+	ve->base.submit_request = guc_submit_request;
+
+	ve->base.flags = I915_ENGINE_IS_VIRTUAL;
+
+	intel_context_init(&ve->context, &ve->base);
+
+	for (n = 0; n < count; n++) {
+		struct intel_engine_cs *sibling = siblings[n];
+
+		GEM_BUG_ON(!is_power_of_2(sibling->mask));
+		if (sibling->mask & ve->base.mask) {
+			DRM_DEBUG("duplicate %s entry in load balancer\n",
+				  sibling->name);
+			err = -EINVAL;
+			goto err_put;
+		}
+
+		ve->base.mask |= sibling->mask;
+
+		if (n != 0 && ve->base.class != sibling->class) {
+			DRM_DEBUG("invalid mixing of engine class, sibling %d, already %d\n",
+				  sibling->class, ve->base.class);
+			err = -EINVAL;
+			goto err_put;
+		} else if (n == 0) {
+			ve->base.class = sibling->class;
+			ve->base.uabi_class = sibling->uabi_class;
+			snprintf(ve->base.name, sizeof(ve->base.name),
+				 "v%dx%d", ve->base.class, count);
+			ve->base.context_size = sibling->context_size;
+
+			ve->base.emit_bb_start = sibling->emit_bb_start;
+			ve->base.emit_flush = sibling->emit_flush;
+			ve->base.emit_init_breadcrumb =
+				sibling->emit_init_breadcrumb;
+			ve->base.emit_fini_breadcrumb =
+				sibling->emit_fini_breadcrumb;
+			ve->base.emit_fini_breadcrumb_dw =
+				sibling->emit_fini_breadcrumb_dw;
+
+			ve->base.flags |= sibling->flags;
+
+			ve->base.props.timeslice_duration_ms =
+				sibling->props.timeslice_duration_ms;
+		}
+	}
+
+	return &ve->context;
+
+err_put:
+	intel_context_put(&ve->context);
+	return ERR_PTR(err);
+}
+
+bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve)
+{
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp, mask = ve->mask;
+
+	for_each_engine_masked(engine, ve->gt, mask, tmp)
+		if (READ_ONCE(engine->props.heartbeat_interval_ms))
+			return true;

+	return false;
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 6453e2bfa151..95df5ab06031 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -25,6 +25,8 @@ void intel_guc_log_submission_info(struct intel_guc *guc,
 				   struct drm_printer *p);
 void intel_guc_log_context_info(struct intel_guc *guc, struct drm_printer *p);

+bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
+
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
 	/* XXX: GuC submission is unavailable for now */
On 6/24/2021 12:04 AM, Matthew Brost wrote:
Implement GuC virtual engines. This is a rather simple implementation: basically, allocate an engine, set the context enter/exit functions to virtual-engine-specific ones, point all other variables/functions at the GuC versions, and set the engine mask to the union of all the siblings' masks.
Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gem/i915_gem_context.c | 19 +- drivers/gpu/drm/i915/gem/i915_gem_context.h | 1 + drivers/gpu/drm/i915/gt/intel_context_types.h | 10 + drivers/gpu/drm/i915/gt/intel_engine.h | 45 +++- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 14 + .../drm/i915/gt/intel_execlists_submission.c | 186 +++++++------ .../drm/i915/gt/intel_execlists_submission.h | 11 - drivers/gpu/drm/i915/gt/selftest_execlists.c | 20 +- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 253 +++++++++++++++++- .../gpu/drm/i915/gt/uc/intel_guc_submission.h | 2 + 10 files changed, 429 insertions(+), 132 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 5c07e6abf16a..8a9293e0ca92 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -72,7 +72,6 @@ #include "gt/intel_context_param.h" #include "gt/intel_engine_heartbeat.h" #include "gt/intel_engine_user.h" -#include "gt/intel_execlists_submission.h" /* virtual_engine */ #include "gt/intel_gpu_commands.h" #include "gt/intel_ring.h"
@@ -1568,9 +1567,6 @@ set_engines__load_balance(struct i915_user_extension __user *base, void *data) if (!HAS_EXECLISTS(i915)) return -ENODEV;
-	if (intel_uc_uses_guc_submission(&i915->gt.uc))
-		return -ENODEV; /* not implement yet */
-
 	if (get_user(idx, &ext->engine_index))
 		return -EFAULT;
@@ -1627,7 +1623,7 @@ set_engines__load_balance(struct i915_user_extension __user *base, void *data) } }
-	ce = intel_execlists_create_virtual(siblings, n);
+	ce = intel_engine_create_virtual(siblings, n);
 	if (IS_ERR(ce)) {
 		err = PTR_ERR(ce);
 		goto out_siblings;
@@ -1723,13 +1719,9 @@ set_engines__bond(struct i915_user_extension __user *base, void *data)
 	 * A non-virtual engine has no siblings to choose between; and
 	 * a submit fence will always be directed to the one engine.
 	 */
-	if (intel_engine_is_virtual(virtual)) {
-		err = intel_virtual_engine_attach_bond(virtual,
-						       master,
-						       bond);
-		if (err)
-			return err;
-	}
+	err = intel_engine_attach_bond(virtual, master, bond);
+	if (err)
+		return err;
 	}

 	return 0;
@@ -2116,8 +2108,7 @@ static int clone_engines(struct i915_gem_context *dst,
 		 * the virtual engine instead.
 		 */
 		if (intel_engine_is_virtual(engine))
-			clone->engines[n] =
-				intel_execlists_clone_virtual(engine);
+			clone->engines[n] = intel_engine_clone_virtual(engine);
 		else
 			clone->engines[n] = intel_context_create(engine);
 		if (IS_ERR_OR_NULL(clone->engines[n])) {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h index b5c908f3f4f2..ba772762f7b9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h @@ -10,6 +10,7 @@ #include "i915_gem_context_types.h"
#include "gt/intel_context.h" +#include "gt/intel_engine.h"
#include "i915_drv.h" #include "i915_gem.h" diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index e7af6a2368f8..6945963a31ba 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -47,6 +47,16 @@ struct intel_context_ops {
void (*reset)(struct intel_context *ce); void (*destroy)(struct kref *kref);
+	/* virtual engine/context interface */
+	struct intel_context *(*create_virtual)(struct intel_engine_cs **engine,
+						unsigned int count);
+	struct intel_context *(*clone_virtual)(struct intel_engine_cs *engine);
+	struct intel_engine_cs *(*get_sibling)(struct intel_engine_cs *engine,
+					       unsigned int sibling);
+	int (*attach_bond)(struct intel_engine_cs *engine,
+			   const struct intel_engine_cs *master,
+			   const struct intel_engine_cs *sibling);

Cloning and bonding for virtual engines have been removed, so these can be dropped. I'll skip reviewing the related code in this patch.
};
struct intel_context { diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index f911c1224ab2..923eaee627b3 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -273,13 +273,56 @@ intel_engine_has_preempt_reset(const struct intel_engine_cs *engine) return intel_engine_has_preemption(engine); }
+struct intel_context *
+intel_engine_create_virtual(struct intel_engine_cs **siblings,
+			    unsigned int count);
+
+static inline bool
+intel_virtual_engine_has_heartbeat(const struct intel_engine_cs *engine)
+{
+	if (intel_engine_uses_guc(engine))
+		return intel_guc_virtual_engine_has_heartbeat(engine);
+	else
+		GEM_BUG_ON("Only should be called in GuC submission");
+
+	return false;
+}

This could use a better explanation. Maybe something like:

static inline bool
intel_virtual_engine_has_heartbeat(const struct intel_engine_cs *engine)
{
	/*
	 * For non-GuC submission we expect the back-end to look at the
	 * heartbeat status of the actual physical engine that the work
	 * has been (or is being) scheduled on, so we should only reach
	 * here with GuC submission enabled.
	 */
	GEM_BUG_ON(!intel_engine_uses_guc(engine));

	return intel_guc_virtual_engine_has_heartbeat(engine);
}
 static inline bool
 intel_engine_has_heartbeat(const struct intel_engine_cs *engine)
 {
 	if (!IS_ACTIVE(CONFIG_DRM_I915_HEARTBEAT_INTERVAL))
 		return false;

-	return READ_ONCE(engine->props.heartbeat_interval_ms);
+	if (intel_engine_is_virtual(engine))
+		return intel_virtual_engine_has_heartbeat(engine);
+	else
+		return READ_ONCE(engine->props.heartbeat_interval_ms);
 }
+static inline struct intel_context *
+intel_engine_clone_virtual(struct intel_engine_cs *src)
+{
+	GEM_BUG_ON(!intel_engine_is_virtual(src));
+	return src->cops->clone_virtual(src);
+}
+
+static inline int
+intel_engine_attach_bond(struct intel_engine_cs *engine,
+			 const struct intel_engine_cs *master,
+			 const struct intel_engine_cs *sibling)
+{
+	if (!engine->cops->attach_bond)
+		return 0;
+
+	return engine->cops->attach_bond(engine, master, sibling);
+}
+
+static inline struct intel_engine_cs *
+intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
+{
+	GEM_BUG_ON(!intel_engine_is_virtual(engine));
+	return engine->cops->get_sibling(engine, sibling);
+}
#endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 88694822716a..d13b1716c29e 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1736,6 +1736,20 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine, ktime_t *now) return total; }
+struct intel_context *
+intel_engine_create_virtual(struct intel_engine_cs **siblings,
+			    unsigned int count)
+{
+	if (count == 0)
+		return ERR_PTR(-EINVAL);
+
+	if (count == 1)
+		return intel_context_create(siblings[0]);
+
+	GEM_BUG_ON(!siblings[0]->cops->create_virtual);
+	return siblings[0]->cops->create_virtual(siblings, count);
+}
+
 static bool match_ring(struct i915_request *rq)
 {
 	u32 ring = ENGINE_READ(rq->engine, RING_START);
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index cdb2126a159a..bd4ced794ff9 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -205,6 +205,9 @@ static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine) return container_of(engine, struct virtual_engine, base); }
+static struct intel_context *
+execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
+
 static struct i915_request *
 __active_request(const struct intel_timeline * const tl,
 		 struct i915_request *rq,
@@ -2560,6 +2563,8 @@ static const struct intel_context_ops execlists_context_ops = {

 	.reset = lrc_reset,
 	.destroy = lrc_destroy,
+
+	.create_virtual = execlists_create_virtual,
 };
static int emit_pdps(struct i915_request *rq)
@@ -3506,6 +3511,94 @@ static void virtual_context_exit(struct intel_context *ce) intel_engine_pm_put(ve->siblings[n]); }
+static struct intel_engine_cs * +virtual_get_sibling(struct intel_engine_cs *engine, unsigned int sibling) +{
- struct virtual_engine *ve = to_virtual_engine(engine);
- if (sibling >= ve->num_siblings)
return NULL;
- return ve->siblings[sibling];
+}
+static struct intel_context * +virtual_clone(struct intel_engine_cs *src) +{
- struct virtual_engine *se = to_virtual_engine(src);
- struct intel_context *dst;
- dst = execlists_create_virtual(se->siblings, se->num_siblings);
- if (IS_ERR(dst))
return dst;
- if (se->num_bonds) {
struct virtual_engine *de = to_virtual_engine(dst->engine);
de->bonds = kmemdup(se->bonds,
sizeof(*se->bonds) * se->num_bonds,
GFP_KERNEL);
if (!de->bonds) {
intel_context_put(dst);
return ERR_PTR(-ENOMEM);
}
de->num_bonds = se->num_bonds;
- }
- return dst;
+}
+static struct ve_bond * +virtual_find_bond(struct virtual_engine *ve,
const struct intel_engine_cs *master)
+{
- int i;
- for (i = 0; i < ve->num_bonds; i++) {
if (ve->bonds[i].master == master)
return &ve->bonds[i];
- }
- return NULL;
+}
+static int virtual_attach_bond(struct intel_engine_cs *engine,
const struct intel_engine_cs *master,
const struct intel_engine_cs *sibling)
+{
- struct virtual_engine *ve = to_virtual_engine(engine);
- struct ve_bond *bond;
- int n;
- /* Sanity check the sibling is part of the virtual engine */
- for (n = 0; n < ve->num_siblings; n++)
if (sibling == ve->siblings[n])
break;
- if (n == ve->num_siblings)
return -EINVAL;
- bond = virtual_find_bond(ve, master);
- if (bond) {
bond->sibling_mask |= sibling->mask;
return 0;
- }
- bond = krealloc(ve->bonds,
sizeof(*bond) * (ve->num_bonds + 1),
GFP_KERNEL);
- if (!bond)
return -ENOMEM;
- bond[ve->num_bonds].master = master;
- bond[ve->num_bonds].sibling_mask = sibling->mask;
- ve->bonds = bond;
- ve->num_bonds++;
- return 0;
+}
- static const struct intel_context_ops virtual_context_ops = { .flags = COPS_HAS_INFLIGHT,
@@ -3520,6 +3613,10 @@ static const struct intel_context_ops virtual_context_ops = { .exit = virtual_context_exit,
.destroy = virtual_context_destroy,
.clone_virtual = virtual_clone,
.get_sibling = virtual_get_sibling,
.attach_bond = virtual_attach_bond, };
static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve)
@@ -3668,20 +3765,6 @@ static void virtual_submit_request(struct i915_request *rq) spin_unlock_irqrestore(&ve->base.sched_engine->lock, flags); }
-static struct ve_bond * -virtual_find_bond(struct virtual_engine *ve,
const struct intel_engine_cs *master)
-{
- int i;
- for (i = 0; i < ve->num_bonds; i++) {
if (ve->bonds[i].master == master)
return &ve->bonds[i];
- }
- return NULL;
-}
- static void virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal) {
@@ -3704,20 +3787,13 @@ virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal) to_request(signal)->execution_mask &= ~allowed; }
-struct intel_context *
-intel_execlists_create_virtual(struct intel_engine_cs **siblings,
-			       unsigned int count)
+static struct intel_context *
+execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 {
 	struct virtual_engine *ve;
 	unsigned int n;
 	int err;

-	if (count == 0)
-		return ERR_PTR(-EINVAL);
-
-	if (count == 1)
-		return intel_context_create(siblings[0]);
-
 	ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL);
 	if (!ve)
 		return ERR_PTR(-ENOMEM);
@@ -3850,70 +3926,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings, return ERR_PTR(err); }
-struct intel_context * -intel_execlists_clone_virtual(struct intel_engine_cs *src) -{
- struct virtual_engine *se = to_virtual_engine(src);
- struct intel_context *dst;
- dst = intel_execlists_create_virtual(se->siblings,
se->num_siblings);
- if (IS_ERR(dst))
return dst;
- if (se->num_bonds) {
struct virtual_engine *de = to_virtual_engine(dst->engine);
de->bonds = kmemdup(se->bonds,
sizeof(*se->bonds) * se->num_bonds,
GFP_KERNEL);
if (!de->bonds) {
intel_context_put(dst);
return ERR_PTR(-ENOMEM);
}
de->num_bonds = se->num_bonds;
- }
- return dst;
-}
-int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
const struct intel_engine_cs *master,
const struct intel_engine_cs *sibling)
-{
- struct virtual_engine *ve = to_virtual_engine(engine);
- struct ve_bond *bond;
- int n;
- /* Sanity check the sibling is part of the virtual engine */
- for (n = 0; n < ve->num_siblings; n++)
if (sibling == ve->siblings[n])
break;
- if (n == ve->num_siblings)
return -EINVAL;
- bond = virtual_find_bond(ve, master);
- if (bond) {
bond->sibling_mask |= sibling->mask;
return 0;
- }
- bond = krealloc(ve->bonds,
sizeof(*bond) * (ve->num_bonds + 1),
GFP_KERNEL);
- if (!bond)
return -ENOMEM;
- bond[ve->num_bonds].master = master;
- bond[ve->num_bonds].sibling_mask = sibling->mask;
- ve->bonds = bond;
- ve->num_bonds++;
- return 0;
-}
- void intel_execlists_show_requests(struct intel_engine_cs *engine, struct drm_printer *m, void (*show_request)(struct drm_printer *m,
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h index 4ca9b475e252..74041b1994af 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h @@ -32,15 +32,4 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine, int indent), unsigned int max);
-struct intel_context * -intel_execlists_create_virtual(struct intel_engine_cs **siblings,
unsigned int count);
-struct intel_context * -intel_execlists_clone_virtual(struct intel_engine_cs *src);
-int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
const struct intel_engine_cs *master,
const struct intel_engine_cs *sibling);
- #endif /* __INTEL_EXECLISTS_SUBMISSION_H__ */
diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c index 08896ae027d5..88aac9977e09 100644 --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c @@ -3727,7 +3727,7 @@ static int nop_virtual_engine(struct intel_gt *gt, GEM_BUG_ON(!nctx || nctx > ARRAY_SIZE(ve));
 	for (n = 0; n < nctx; n++) {
-		ve[n] = intel_execlists_create_virtual(siblings, nsibling);
+		ve[n] = intel_engine_create_virtual(siblings, nsibling);
 		if (IS_ERR(ve[n])) {
 			err = PTR_ERR(ve[n]);
 			nctx = n;
@@ -3923,7 +3923,7 @@ static int mask_virtual_engine(struct intel_gt *gt, * restrict it to our desired engine within the virtual engine. */
-	ve = intel_execlists_create_virtual(siblings, nsibling);
+	ve = intel_engine_create_virtual(siblings, nsibling);
 	if (IS_ERR(ve)) {
 		err = PTR_ERR(ve);
 		goto out_close;
@@ -4054,7 +4054,7 @@ static int slicein_virtual_engine(struct intel_gt *gt, i915_request_add(rq); }
-	ce = intel_execlists_create_virtual(siblings, nsibling);
+	ce = intel_engine_create_virtual(siblings, nsibling);
 	if (IS_ERR(ce)) {
 		err = PTR_ERR(ce);
 		goto out;
@@ -4106,7 +4106,7 @@ static int sliceout_virtual_engine(struct intel_gt *gt,
 	/* XXX We do not handle oversubscription and fairness with normal rq */
 	for (n = 0; n < nsibling; n++) {
-		ce = intel_execlists_create_virtual(siblings, nsibling);
+		ce = intel_engine_create_virtual(siblings, nsibling);
 		if (IS_ERR(ce)) {
 			err = PTR_ERR(ce);
 			goto out;
@@ -4208,7 +4208,7 @@ static int preserved_virtual_engine(struct intel_gt *gt, if (err) goto out_scratch;
-	ve = intel_execlists_create_virtual(siblings, nsibling);
+	ve = intel_engine_create_virtual(siblings, nsibling);
 	if (IS_ERR(ve)) {
 		err = PTR_ERR(ve);
 		goto out_scratch;
@@ -4431,16 +4431,16 @@ static int bond_virtual_engine(struct intel_gt *gt, for (n = 0; n < nsibling; n++) { struct intel_context *ve;
-		ve = intel_execlists_create_virtual(siblings, nsibling);
+		ve = intel_engine_create_virtual(siblings, nsibling);
 		if (IS_ERR(ve)) {
 			err = PTR_ERR(ve);
 			onstack_fence_fini(&fence);
 			goto out;
 		}

-		err = intel_virtual_engine_attach_bond(ve->engine,
-						       master,
-						       siblings[n]);
+		err = intel_engine_attach_bond(ve->engine,
+					       master,
+					       siblings[n]);
 		if (err) {
 			intel_context_put(ve);
 			onstack_fence_fini(&fence);
@@ -4576,7 +4576,7 @@ static int reset_virtual_engine(struct intel_gt *gt, if (igt_spinner_init(&spin, gt)) return -ENOMEM;
-	ve = intel_execlists_create_virtual(siblings, nsibling);
+	ve = intel_engine_create_virtual(siblings, nsibling);
 	if (IS_ERR(ve)) {
 		err = PTR_ERR(ve);
 		goto out_spin;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index d605af0d66e6..ccbcf024b31b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -60,6 +60,15 @@
*/
+/* GuC Virtual Engine */ +struct guc_virtual_engine {
- struct intel_engine_cs base;
- struct intel_context context;
+};
+static struct intel_context * +guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
#define GUC_REQUEST_SIZE 64 /* bytes */
/*
@@ -928,20 +937,35 @@ static int guc_lrc_desc_pin(struct intel_context *ce) return ret; }
-static int guc_context_pre_pin(struct intel_context *ce,
-			       struct i915_gem_ww_ctx *ww,
-			       void **vaddr)
+static int __guc_context_pre_pin(struct intel_context *ce,
+				 struct intel_engine_cs *engine,
+				 struct i915_gem_ww_ctx *ww,
+				 void **vaddr)
 {
-	return lrc_pre_pin(ce, ce->engine, ww, vaddr);
+	return lrc_pre_pin(ce, engine, ww, vaddr);
 }
-static int guc_context_pin(struct intel_context *ce, void *vaddr)
+static int __guc_context_pin(struct intel_context *ce,
+			     struct intel_engine_cs *engine,
+			     void *vaddr)
 {
 	if (i915_ggtt_offset(ce->state) !=
 	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
 		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);

-	return lrc_pin(ce, ce->engine, vaddr);
+	return lrc_pin(ce, engine, vaddr);
+}
+static int guc_context_pre_pin(struct intel_context *ce,
+			       struct i915_gem_ww_ctx *ww,
+			       void **vaddr)
+{
+	return __guc_context_pre_pin(ce, ce->engine, ww, vaddr);
+}
+
+static int guc_context_pin(struct intel_context *ce, void *vaddr)
+{
+	return __guc_context_pin(ce, ce->engine, vaddr);
+}
static void guc_context_unpin(struct intel_context *ce)
@@ -1041,6 +1065,21 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce) deregister_context(ce, ce->guc_id); }
+static void __guc_context_destroy(struct intel_context *ce) +{
- lrc_fini(ce);
- intel_context_fini(ce);
- if (intel_engine_is_virtual(ce->engine)) {
struct guc_virtual_engine *ve =
container_of(ce, typeof(*ve), context);
kfree(ve);
- } else {
intel_context_free(ce);
- }
+}
- static void guc_context_destroy(struct kref *kref) { struct intel_context *ce = container_of(kref, typeof(*ce), ref);
@@ -1057,7 +1096,7 @@ static void guc_context_destroy(struct kref *kref) if (context_guc_id_invalid(ce) || !lrc_desc_registered(guc, ce->guc_id)) { release_guc_id(guc, ce);
-		lrc_destroy(kref);
AFAICS after this patch we only have one use of lrc_destroy() inside the execlists file, while we do have two open-coded implementations (here and in the execlists VE code). Since lrc_fini() and intel_context_fini() are still always called as a pair, maybe we can replace lrc_destroy() with a function that calls those two (i.e. basically just remove the free() from lrc_destroy())? Can be done as a follow-up.
+		__guc_context_destroy(ce);
 		return;
 	}
@@ -1073,7 +1112,7 @@ static void guc_context_destroy(struct kref *kref) if (context_guc_id_invalid(ce)) { __release_guc_id(guc, ce); spin_unlock_irqrestore(&guc->contexts_lock, flags);
-		lrc_destroy(kref);
+		__guc_context_destroy(ce);
 		return;
 	}
@@ -1118,6 +1157,8 @@ static const struct intel_context_ops guc_context_ops = {
.reset = lrc_reset, .destroy = guc_context_destroy,
.create_virtual = guc_create_virtual, };
static void __guc_signal_context_fence(struct intel_context *ce)
@@ -1246,6 +1287,96 @@ static int guc_request_alloc(struct i915_request *rq) return 0; }
+static struct intel_engine_cs * +guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling) +{
- struct intel_engine_cs *engine;
- intel_engine_mask_t tmp, mask = ve->mask;
- unsigned int num_siblings = 0;
- for_each_engine_masked(engine, ve->gt, mask, tmp)
if (num_siblings++ == sibling)
return engine;
- return NULL;
+}
+static int guc_virtual_context_pre_pin(struct intel_context *ce,
struct i915_gem_ww_ctx *ww,
void **vaddr)
+{
- struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
- return __guc_context_pre_pin(ce, engine, ww, vaddr);
+}
+static int guc_virtual_context_pin(struct intel_context *ce, void *vaddr) +{
- struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
- return __guc_context_pin(ce, engine, vaddr);
+}
+static void guc_virtual_context_enter(struct intel_context *ce) +{
- intel_engine_mask_t tmp, mask = ce->engine->mask;
- struct intel_engine_cs *engine;
- for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
intel_engine_pm_get(engine);
- intel_timeline_enter(ce->timeline);
+}
+static void guc_virtual_context_exit(struct intel_context *ce) +{
- intel_engine_mask_t tmp, mask = ce->engine->mask;
- struct intel_engine_cs *engine;
- for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
intel_engine_pm_put(engine);
- intel_timeline_exit(ce->timeline);
+}
+static int guc_virtual_context_alloc(struct intel_context *ce) +{
- struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
- return lrc_alloc(ce, engine);
+}
+static struct intel_context *guc_clone_virtual(struct intel_engine_cs *src) +{
- struct intel_engine_cs *siblings[GUC_MAX_INSTANCES_PER_CLASS], *engine;
- intel_engine_mask_t tmp, mask = src->mask;
- unsigned int num_siblings = 0;
- for_each_engine_masked(engine, src->gt, mask, tmp)
siblings[num_siblings++] = engine;
- return guc_create_virtual(siblings, num_siblings);
+}
+static const struct intel_context_ops virtual_guc_context_ops = {
- .alloc = guc_virtual_context_alloc,
- .pre_pin = guc_virtual_context_pre_pin,
- .pin = guc_virtual_context_pin,
- .unpin = guc_context_unpin,
- .post_unpin = guc_context_post_unpin,
- .enter = guc_virtual_context_enter,
- .exit = guc_virtual_context_exit,
- .sched_disable = guc_context_sched_disable,
- .destroy = guc_context_destroy,
- .clone_virtual = guc_clone_virtual,
- .get_sibling = guc_virtual_get_sibling,
+};
- static void sanitize_hwsp(struct intel_engine_cs *engine) { struct intel_timeline *tl;
@@ -1557,7 +1688,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc, } else if (context_destroyed(ce)) { /* Context has been destroyed */ release_guc_id(guc, ce);
-		lrc_destroy(&ce->ref);
+		__guc_context_destroy(ce);
}
decr_outstanding_submission_g2h(guc);
@@ -1669,3 +1800,107 @@ void intel_guc_log_context_info(struct intel_guc *guc, atomic_read(&ce->guc_sched_state_no_lock)); } }
+static struct intel_context * +guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count) +{
- struct guc_virtual_engine *ve;
- struct intel_guc *guc;
- unsigned int n;
- int err;
- ve = kzalloc(sizeof(*ve), GFP_KERNEL);
- if (!ve)
return ERR_PTR(-ENOMEM);
- guc = &siblings[0]->gt->uc.guc;
- ve->base.i915 = siblings[0]->i915;
- ve->base.gt = siblings[0]->gt;
- ve->base.uncore = siblings[0]->uncore;
- ve->base.id = -1;
- ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
- ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
- ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
- ve->base.saturated = ALL_ENGINES;
Most of these settings are the same for both execlists and GuC and aren't back-end dependent. Maybe we can have a:

intel_virtual_engine_init_early(struct intel_engine_cs *engine,
				struct intel_engine_cs *sibling);

and call that from both places? Can be done as a follow-up.
- ve->base.breadcrumbs = intel_breadcrumbs_create(&ve->base);
- if (!ve->base.breadcrumbs) {
kfree(ve);
return ERR_PTR(-ENOMEM);
- }
- snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
- ve->base.sched_engine = i915_sched_engine_get(guc->sched_engine);
- ve->base.cops = &virtual_guc_context_ops;
- ve->base.request_alloc = guc_request_alloc;
- ve->base.submit_request = guc_submit_request;
- ve->base.flags = I915_ENGINE_IS_VIRTUAL;
- intel_context_init(&ve->context, &ve->base);
- for (n = 0; n < count; n++) {
struct intel_engine_cs *sibling = siblings[n];
GEM_BUG_ON(!is_power_of_2(sibling->mask));
if (sibling->mask & ve->base.mask) {
DRM_DEBUG("duplicate %s entry in load balancer\n",
sibling->name);
err = -EINVAL;
goto err_put;
}
ve->base.mask |= sibling->mask;
if (n != 0 && ve->base.class != sibling->class) {
DRM_DEBUG("invalid mixing of engine class, sibling %d, already %d\n",
sibling->class, ve->base.class);
err = -EINVAL;
goto err_put;
} else if (n == 0) {
ve->base.class = sibling->class;
ve->base.uabi_class = sibling->uabi_class;
snprintf(ve->base.name, sizeof(ve->base.name),
"v%dx%d", ve->base.class, count);
ve->base.context_size = sibling->context_size;
ve->base.emit_bb_start = sibling->emit_bb_start;
ve->base.emit_flush = sibling->emit_flush;
ve->base.emit_init_breadcrumb =
sibling->emit_init_breadcrumb;
ve->base.emit_fini_breadcrumb =
sibling->emit_fini_breadcrumb;
ve->base.emit_fini_breadcrumb_dw =
sibling->emit_fini_breadcrumb_dw;
ve->base.flags |= sibling->flags;
Same here, most of these settings from the sibling are the same. intel_virtual_engine_inherit_from_sibling()?
Apart from the various nits the code LGTM, but I'll wait until the next spin for an r-b since a good chunk of the patch is going away.
Daniele
ve->base.props.timeslice_duration_ms =
sibling->props.timeslice_duration_ms;
}
- }
- return &ve->context;
+err_put:
- intel_context_put(&ve->context);
- return ERR_PTR(err);
+}
+bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve)
+{
- struct intel_engine_cs *engine;
- intel_engine_mask_t tmp, mask = ve->mask;
- for_each_engine_masked(engine, ve->gt, mask, tmp)
if (READ_ONCE(engine->props.heartbeat_interval_ms))
return true;
- return false;
+}

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 6453e2bfa151..95df5ab06031 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -25,6 +25,8 @@ void intel_guc_log_submission_info(struct intel_guc *guc, struct drm_printer *p);
void intel_guc_log_context_info(struct intel_guc *guc, struct drm_printer *p);
+bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
- static inline bool intel_guc_submission_is_supported(struct intel_guc *guc) { /* XXX: GuC submission is unavailable for now */
From: John Harrison John.C.Harrison@Intel.com
The serial number tracking of engines happens at the backend of request submission and was expecting to only be given physical engines. However, in GuC submission mode, the decomposition of virtual to physical engines does not happen in i915. Instead, requests are submitted with their virtual engine mask all the way through to the hardware (i.e. to GuC). This would mean that the heartbeat code thinks the physical engines are idle due to the serial number not incrementing.
This patch updates the tracking to decompose virtual engines into their physical constituents and tracks the request against each. This is not entirely accurate as the GuC will only be issuing the request to one physical engine. However, it is the best that i915 can do given that it has no knowledge of the GuC's scheduling decisions.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 ++ .../gpu/drm/i915/gt/intel_execlists_submission.c | 6 ++++++ drivers/gpu/drm/i915/gt/intel_ring_submission.c | 6 ++++++ drivers/gpu/drm/i915/gt/mock_engine.c | 6 ++++++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 16 ++++++++++++++++ drivers/gpu/drm/i915/i915_request.c | 4 +++- 6 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 5b91068ab277..1dc59e6c9a92 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -388,6 +388,8 @@ struct intel_engine_cs { void (*park)(struct intel_engine_cs *engine); void (*unpark)(struct intel_engine_cs *engine);
+ void (*bump_serial)(struct intel_engine_cs *engine); + void (*set_default_submission)(struct intel_engine_cs *engine);
const struct intel_context_ops *cops; diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index bd4ced794ff9..9cfb8800a0e6 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3203,6 +3203,11 @@ static void execlists_release(struct intel_engine_cs *engine) lrc_fini_wa_ctx(engine); }
+static void execlist_bump_serial(struct intel_engine_cs *engine) +{ + engine->serial++; +} + static void logical_ring_default_vfuncs(struct intel_engine_cs *engine) { @@ -3212,6 +3217,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
engine->cops = &execlists_context_ops; engine->request_alloc = execlists_request_alloc; + engine->bump_serial = execlist_bump_serial;
engine->reset.prepare = execlists_reset_prepare; engine->reset.rewind = execlists_reset_rewind; diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c index 5d42a12ef3d6..e1506b280df1 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c @@ -1044,6 +1044,11 @@ static void setup_irq(struct intel_engine_cs *engine) } }
+static void ring_bump_serial(struct intel_engine_cs *engine) +{ + engine->serial++; +} + static void setup_common(struct intel_engine_cs *engine) { struct drm_i915_private *i915 = engine->i915; @@ -1063,6 +1068,7 @@ static void setup_common(struct intel_engine_cs *engine)
engine->cops = &ring_context_ops; engine->request_alloc = ring_request_alloc; + engine->bump_serial = ring_bump_serial;
/* * Using a global execution timeline; the previous final breadcrumb is diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c index 68970398e4ef..9203c766db80 100644 --- a/drivers/gpu/drm/i915/gt/mock_engine.c +++ b/drivers/gpu/drm/i915/gt/mock_engine.c @@ -292,6 +292,11 @@ static void mock_engine_release(struct intel_engine_cs *engine) intel_engine_fini_retire(engine); }
+static void mock_bump_serial(struct intel_engine_cs *engine) +{ + engine->serial++; +} + struct intel_engine_cs *mock_engine(struct drm_i915_private *i915, const char *name, int id) @@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
engine->base.cops = &mock_context_ops; engine->base.request_alloc = mock_request_alloc; + engine->base.bump_serial = mock_bump_serial; engine->base.emit_flush = mock_emit_flush; engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb; engine->base.submit_request = mock_submit_request; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index ccbcf024b31b..d1badd7137b7 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1496,6 +1496,20 @@ static void guc_release(struct intel_engine_cs *engine) lrc_fini_wa_ctx(engine); }
+static void guc_bump_serial(struct intel_engine_cs *engine) +{ + engine->serial++; +} + +static void virtual_guc_bump_serial(struct intel_engine_cs *engine) +{ + struct intel_engine_cs *e; + intel_engine_mask_t tmp, mask = engine->mask; + + for_each_engine_masked(e, engine->gt, mask, tmp) + e->serial++; +} + static void guc_default_vfuncs(struct intel_engine_cs *engine) { /* Default vfuncs which can be overridden by each engine. */ @@ -1504,6 +1518,7 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
engine->cops = &guc_context_ops; engine->request_alloc = guc_request_alloc; + engine->bump_serial = guc_bump_serial;
engine->sched_engine->schedule = i915_schedule;
@@ -1836,6 +1851,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
ve->base.cops = &virtual_guc_context_ops; ve->base.request_alloc = guc_request_alloc; + ve->base.bump_serial = virtual_guc_bump_serial;
ve->base.submit_request = guc_submit_request;
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 7f7aa096e873..de9deb95b8b1 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -692,7 +692,9 @@ bool __i915_request_submit(struct i915_request *request) request->ring->vaddr + request->postfix);
trace_i915_request_execute(request); - engine->serial++; + if (engine->bump_serial) + engine->bump_serial(engine); + result = true;
GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
On 6/24/2021 00:04, Matthew Brost wrote:
From: John Harrison John.C.Harrison@Intel.com
The serial number tracking of engines happens at the backend of request submission and was expecting to only be given physical engines. However, in GuC submission mode, the decomposition of virtual to physical engines does not happen in i915. Instead, requests are submitted with their virtual engine mask all the way through to the hardware (i.e. to GuC). This would mean that the heartbeat code thinks the physical engines are idle due to the serial number not incrementing.
This patch updates the tracking to decompose virtual engines into their physical constituents and tracks the request against each. This is not entirely accurate as the GuC will only be issuing the request to one physical engine. However, it is the best that i915 can do given that it has no knowledge of the GuC's scheduling decisions.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
Need to pull in the updated subject line and commit description from Tvrtko in the RFC patch set review.
John.
drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 ++ .../gpu/drm/i915/gt/intel_execlists_submission.c | 6 ++++++ drivers/gpu/drm/i915/gt/intel_ring_submission.c | 6 ++++++ drivers/gpu/drm/i915/gt/mock_engine.c | 6 ++++++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 16 ++++++++++++++++ drivers/gpu/drm/i915/i915_request.c | 4 +++- 6 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 5b91068ab277..1dc59e6c9a92 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -388,6 +388,8 @@ struct intel_engine_cs { void (*park)(struct intel_engine_cs *engine); void (*unpark)(struct intel_engine_cs *engine);
void (*bump_serial)(struct intel_engine_cs *engine);
void (*set_default_submission)(struct intel_engine_cs *engine);
const struct intel_context_ops *cops;
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index bd4ced794ff9..9cfb8800a0e6 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3203,6 +3203,11 @@ static void execlists_release(struct intel_engine_cs *engine) lrc_fini_wa_ctx(engine); }
+static void execlist_bump_serial(struct intel_engine_cs *engine) +{
- engine->serial++;
+}
- static void logical_ring_default_vfuncs(struct intel_engine_cs *engine) {
@@ -3212,6 +3217,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
engine->cops = &execlists_context_ops; engine->request_alloc = execlists_request_alloc;
engine->bump_serial = execlist_bump_serial;
engine->reset.prepare = execlists_reset_prepare; engine->reset.rewind = execlists_reset_rewind;
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c index 5d42a12ef3d6..e1506b280df1 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c @@ -1044,6 +1044,11 @@ static void setup_irq(struct intel_engine_cs *engine) } }
+static void ring_bump_serial(struct intel_engine_cs *engine) +{
- engine->serial++;
+}
- static void setup_common(struct intel_engine_cs *engine) { struct drm_i915_private *i915 = engine->i915;
@@ -1063,6 +1068,7 @@ static void setup_common(struct intel_engine_cs *engine)
engine->cops = &ring_context_ops; engine->request_alloc = ring_request_alloc;
engine->bump_serial = ring_bump_serial;
/*
- Using a global execution timeline; the previous final breadcrumb is
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c index 68970398e4ef..9203c766db80 100644 --- a/drivers/gpu/drm/i915/gt/mock_engine.c +++ b/drivers/gpu/drm/i915/gt/mock_engine.c @@ -292,6 +292,11 @@ static void mock_engine_release(struct intel_engine_cs *engine) intel_engine_fini_retire(engine); }
+static void mock_bump_serial(struct intel_engine_cs *engine) +{
- engine->serial++;
+}
- struct intel_engine_cs *mock_engine(struct drm_i915_private *i915, const char *name, int id)
@@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
engine->base.cops = &mock_context_ops; engine->base.request_alloc = mock_request_alloc;
- engine->base.bump_serial = mock_bump_serial; engine->base.emit_flush = mock_emit_flush; engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb; engine->base.submit_request = mock_submit_request;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index ccbcf024b31b..d1badd7137b7 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1496,6 +1496,20 @@ static void guc_release(struct intel_engine_cs *engine) lrc_fini_wa_ctx(engine); }
+static void guc_bump_serial(struct intel_engine_cs *engine) +{
- engine->serial++;
+}
+static void virtual_guc_bump_serial(struct intel_engine_cs *engine) +{
- struct intel_engine_cs *e;
- intel_engine_mask_t tmp, mask = engine->mask;
- for_each_engine_masked(e, engine->gt, mask, tmp)
e->serial++;
+}
- static void guc_default_vfuncs(struct intel_engine_cs *engine) { /* Default vfuncs which can be overridden by each engine. */
@@ -1504,6 +1518,7 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
engine->cops = &guc_context_ops; engine->request_alloc = guc_request_alloc;
engine->bump_serial = guc_bump_serial;
engine->sched_engine->schedule = i915_schedule;
@@ -1836,6 +1851,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
ve->base.cops = &virtual_guc_context_ops; ve->base.request_alloc = guc_request_alloc;
ve->base.bump_serial = virtual_guc_bump_serial;
ve->base.submit_request = guc_submit_request;
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 7f7aa096e873..de9deb95b8b1 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -692,7 +692,9 @@ bool __i915_request_submit(struct i915_request *request) request->ring->vaddr + request->postfix);
trace_i915_request_execute(request);
- engine->serial++;
if (engine->bump_serial)
engine->bump_serial(engine);
result = true;
GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
On Mon, Jul 12, 2021 at 11:11:48AM -0700, John Harrison wrote:
On 6/24/2021 00:04, Matthew Brost wrote:
From: John Harrison John.C.Harrison@Intel.com
The serial number tracking of engines happens at the backend of request submission and was expecting to only be given physical engines. However, in GuC submission mode, the decomposition of virtual to physical engines does not happen in i915. Instead, requests are submitted with their virtual engine mask all the way through to the hardware (i.e. to GuC). This would mean that the heartbeat code thinks the physical engines are idle due to the serial number not incrementing.
This patch updates the tracking to decompose virtual engines into their physical constituents and tracks the request against each. This is not entirely accurate as the GuC will only be issuing the request to one physical engine. However, it is the best that i915 can do given that it has no knowledge of the GuC's scheduling decisions.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
Need to pull in the updated subject line and commit description from Tvrtko in the RFC patch set review.
Yep, forgot to do this. Will do in next rev.
Matt
John.
drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 ++ .../gpu/drm/i915/gt/intel_execlists_submission.c | 6 ++++++ drivers/gpu/drm/i915/gt/intel_ring_submission.c | 6 ++++++ drivers/gpu/drm/i915/gt/mock_engine.c | 6 ++++++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 16 ++++++++++++++++ drivers/gpu/drm/i915/i915_request.c | 4 +++- 6 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 5b91068ab277..1dc59e6c9a92 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -388,6 +388,8 @@ struct intel_engine_cs { void (*park)(struct intel_engine_cs *engine); void (*unpark)(struct intel_engine_cs *engine);
- void (*bump_serial)(struct intel_engine_cs *engine);
- void (*set_default_submission)(struct intel_engine_cs *engine); const struct intel_context_ops *cops;
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index bd4ced794ff9..9cfb8800a0e6 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3203,6 +3203,11 @@ static void execlists_release(struct intel_engine_cs *engine) lrc_fini_wa_ctx(engine); } +static void execlist_bump_serial(struct intel_engine_cs *engine) +{
- engine->serial++;
+}
- static void logical_ring_default_vfuncs(struct intel_engine_cs *engine) {
@@ -3212,6 +3217,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine) engine->cops = &execlists_context_ops; engine->request_alloc = execlists_request_alloc;
- engine->bump_serial = execlist_bump_serial; engine->reset.prepare = execlists_reset_prepare; engine->reset.rewind = execlists_reset_rewind;
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c index 5d42a12ef3d6..e1506b280df1 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c @@ -1044,6 +1044,11 @@ static void setup_irq(struct intel_engine_cs *engine) } } +static void ring_bump_serial(struct intel_engine_cs *engine) +{
- engine->serial++;
+}
- static void setup_common(struct intel_engine_cs *engine) { struct drm_i915_private *i915 = engine->i915;
@@ -1063,6 +1068,7 @@ static void setup_common(struct intel_engine_cs *engine) engine->cops = &ring_context_ops; engine->request_alloc = ring_request_alloc;
- engine->bump_serial = ring_bump_serial; /*
- Using a global execution timeline; the previous final breadcrumb is
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c index 68970398e4ef..9203c766db80 100644 --- a/drivers/gpu/drm/i915/gt/mock_engine.c +++ b/drivers/gpu/drm/i915/gt/mock_engine.c @@ -292,6 +292,11 @@ static void mock_engine_release(struct intel_engine_cs *engine) intel_engine_fini_retire(engine); } +static void mock_bump_serial(struct intel_engine_cs *engine) +{
- engine->serial++;
+}
- struct intel_engine_cs *mock_engine(struct drm_i915_private *i915, const char *name, int id)
@@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915, engine->base.cops = &mock_context_ops; engine->base.request_alloc = mock_request_alloc;
- engine->base.bump_serial = mock_bump_serial; engine->base.emit_flush = mock_emit_flush; engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb; engine->base.submit_request = mock_submit_request;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index ccbcf024b31b..d1badd7137b7 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1496,6 +1496,20 @@ static void guc_release(struct intel_engine_cs *engine) lrc_fini_wa_ctx(engine); } +static void guc_bump_serial(struct intel_engine_cs *engine) +{
- engine->serial++;
+}
+static void virtual_guc_bump_serial(struct intel_engine_cs *engine) +{
- struct intel_engine_cs *e;
- intel_engine_mask_t tmp, mask = engine->mask;
- for_each_engine_masked(e, engine->gt, mask, tmp)
e->serial++;
+}
- static void guc_default_vfuncs(struct intel_engine_cs *engine) { /* Default vfuncs which can be overridden by each engine. */
@@ -1504,6 +1518,7 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine) engine->cops = &guc_context_ops; engine->request_alloc = guc_request_alloc;
- engine->bump_serial = guc_bump_serial; engine->sched_engine->schedule = i915_schedule;
@@ -1836,6 +1851,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count) ve->base.cops = &virtual_guc_context_ops; ve->base.request_alloc = guc_request_alloc;
- ve->base.bump_serial = virtual_guc_bump_serial; ve->base.submit_request = guc_submit_request;
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 7f7aa096e873..de9deb95b8b1 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -692,7 +692,9 @@ bool __i915_request_submit(struct i915_request *request) request->ring->vaddr + request->postfix); trace_i915_request_execute(request);
- engine->serial++;
- if (engine->bump_serial)
engine->bump_serial(engine);
- result = true; GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
Hold a reference to the intel_context over the life of an i915_request. Without this an i915_request can exist after the context has been destroyed (e.g. request retired, context closed, but user space holds a reference to the request from an out fence). In the case of GuC submission + virtual engine, the engine that the request references is also destroyed, which can trigger a bad pointer deref in fence ops (e.g. i915_fence_get_driver_name). We could likely change i915_fence_get_driver_name to avoid touching the engine, but let's just be safe and hold the intel_context reference.
Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/i915_request.c | 54 ++++++++++++----------------- 1 file changed, 22 insertions(+), 32 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index de9deb95b8b1..dec5a35c9aa2 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -126,39 +126,17 @@ static void i915_fence_release(struct dma_fence *fence) i915_sw_fence_fini(&rq->semaphore);
/* - * Keep one request on each engine for reserved use under mempressure - * - * We do not hold a reference to the engine here and so have to be - * very careful in what rq->engine we poke. The virtual engine is - * referenced via the rq->context and we released that ref during - * i915_request_retire(), ergo we must not dereference a virtual - * engine here. Not that we would want to, as the only consumer of - * the reserved engine->request_pool is the power management parking, - * which must-not-fail, and that is only run on the physical engines. - * - * Since the request must have been executed to be have completed, - * we know that it will have been processed by the HW and will - * not be unsubmitted again, so rq->engine and rq->execution_mask - * at this point is stable. rq->execution_mask will be a single - * bit if the last and _only_ engine it could execution on was a - * physical engine, if it's multiple bits then it started on and - * could still be on a virtual engine. Thus if the mask is not a - * power-of-two we assume that rq->engine may still be a virtual - * engine and so a dangling invalid pointer that we cannot dereference - * - * For example, consider the flow of a bonded request through a virtual - * engine. The request is created with a wide engine mask (all engines - * that we might execute on). On processing the bond, the request mask - * is reduced to one or more engines. If the request is subsequently - * bound to a single engine, it will then be constrained to only - * execute on that engine and never returned to the virtual engine - * after timeslicing away, see __unwind_incomplete_requests(). Thus we - * know that if the rq->execution_mask is a single bit, rq->engine - * can be a physical engine with the exact corresponding mask. + * Keep one request on each engine for reserved use under mempressure, + * do not use with virtual engines as this really is only needed for + * kernel contexts. 
*/ - if (is_power_of_2(rq->execution_mask) && - !cmpxchg(&rq->engine->request_pool, NULL, rq)) + if (!intel_engine_is_virtual(rq->engine) && + !cmpxchg(&rq->engine->request_pool, NULL, rq)) { + intel_context_put(rq->context); return; + } + + intel_context_put(rq->context);
kmem_cache_free(global.slab_requests, rq); } @@ -977,7 +955,18 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp) } }
- rq->context = ce; + /* + * Hold a reference to the intel_context over life of an i915_request. + * Without this an i915_request can exist after the context has been + * destroyed (e.g. request retired, context closed, but user space holds + * a reference to the request from an out fence). In the case of GuC + * submission + virtual engine, the engine that the request references + * is also destroyed which can trigger bad pointer dref in fence ops + * (e.g. i915_fence_get_driver_name). We could likely change these + * functions to avoid touching the engine but let's just be safe and + * hold the intel_context reference. + */ + rq->context = intel_context_get(ce); rq->engine = ce->engine; rq->ring = ce->ring; rq->execution_mask = ce->engine->mask; @@ -1054,6 +1043,7 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp) GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
err_free: + intel_context_put(ce); kmem_cache_free(global.slab_requests, rq); err_unreserve: intel_context_unpin(ce);
On 6/24/2021 00:04, Matthew Brost wrote:
Hold a reference to the intel_context over the life of an i915_request. Without this an i915_request can exist after the context has been destroyed (e.g. request retired, context closed, but user space holds a reference to the request from an out fence). In the case of GuC submission + virtual engine, the engine that the request references is also destroyed, which can trigger a bad pointer deref in fence ops (e.g.
Maybe quickly explain why this is different for GuC submission vs execlist? Presumably it is about only decomposing virtual engines to physical ones in execlist mode?
i915_fence_get_driver_name). We could likely change i915_fence_get_driver_name to avoid touching the engine but let's just be safe and hold the intel_context reference.
Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/i915_request.c | 54 ++++++++++++----------------- 1 file changed, 22 insertions(+), 32 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index de9deb95b8b1..dec5a35c9aa2 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -126,39 +126,17 @@ static void i915_fence_release(struct dma_fence *fence) i915_sw_fence_fini(&rq->semaphore);
/*
* Keep one request on each engine for reserved use under mempressure
*
* We do not hold a reference to the engine here and so have to be
* very careful in what rq->engine we poke. The virtual engine is
* referenced via the rq->context and we released that ref during
* i915_request_retire(), ergo we must not dereference a virtual
* engine here. Not that we would want to, as the only consumer of
* the reserved engine->request_pool is the power management parking,
* which must-not-fail, and that is only run on the physical engines.
*
* Since the request must have been executed to have completed,
* we know that it will have been processed by the HW and will
* not be unsubmitted again, so rq->engine and rq->execution_mask
* at this point is stable. rq->execution_mask will be a single
* bit if the last and _only_ engine it could execute on was a
* physical engine, if it's multiple bits then it started on and
* could still be on a virtual engine. Thus if the mask is not a
* power-of-two we assume that rq->engine may still be a virtual
* engine and so a dangling invalid pointer that we cannot dereference
*
* For example, consider the flow of a bonded request through a virtual
* engine. The request is created with a wide engine mask (all engines
* that we might execute on). On processing the bond, the request mask
* is reduced to one or more engines. If the request is subsequently
* bound to a single engine, it will then be constrained to only
* execute on that engine and never returned to the virtual engine
* after timeslicing away, see __unwind_incomplete_requests(). Thus we
* know that if the rq->execution_mask is a single bit, rq->engine
* can be a physical engine with the exact corresponding mask.
* Keep one request on each engine for reserved use under mempressure,
* do not use with virtual engines as this really is only needed for
* kernel contexts.
*/
- if (is_power_of_2(rq->execution_mask) &&
!cmpxchg(&rq->engine->request_pool, NULL, rq))
- if (!intel_engine_is_virtual(rq->engine) &&
!cmpxchg(&rq->engine->request_pool, NULL, rq)) {
intel_context_put(rq->context);
return;
- }
- intel_context_put(rq->context);
The put is actually unconditional? So it could be moved before the if?
John.
kmem_cache_free(global.slab_requests, rq); } @@ -977,7 +955,18 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp) } }
- rq->context = ce;
- /*
* Hold a reference to the intel_context over life of an i915_request.
* Without this an i915_request can exist after the context has been
* destroyed (e.g. request retired, context closed, but user space holds
* a reference to the request from an out fence). In the case of GuC
* submission + virtual engine, the engine that the request references
* is also destroyed which can trigger bad pointer deref in fence ops
* (e.g. i915_fence_get_driver_name). We could likely change these
* functions to avoid touching the engine but let's just be safe and
* hold the intel_context reference.
*/
- rq->context = intel_context_get(ce); rq->engine = ce->engine; rq->ring = ce->ring; rq->execution_mask = ce->engine->mask;
@@ -1054,6 +1043,7 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp) GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
err_free:
- intel_context_put(ce); kmem_cache_free(global.slab_requests, rq); err_unreserve: intel_context_unpin(ce);
On Mon, Jul 12, 2021 at 11:23:14AM -0700, John Harrison wrote:
On 6/24/2021 00:04, Matthew Brost wrote:
Hold a reference to the intel_context over the life of an i915_request. Without this an i915_request can exist after the context has been destroyed (e.g. request retired, context closed, but user space holds a reference to the request from an out fence). In the case of GuC submission + virtual engine, the engine that the request references is also destroyed, which can trigger a bad pointer deref in fence ops (e.g.
Maybe quickly explain why this is different for GuC submission vs execlist? Presumably it is about only decomposing virtual engines to physical ones in execlist mode?
Yes, it's because in execlists we always end up pointing to a physical engine in the end, while in GuC mode we can be pointing to a dynamically allocated virtual engine. I can update the comment.
i915_fence_get_driver_name). We could likely change i915_fence_get_driver_name to avoid touching the engine but let's just be safe and hold the intel_context reference.
Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/i915_request.c | 54 ++++++++++++----------------- 1 file changed, 22 insertions(+), 32 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index de9deb95b8b1..dec5a35c9aa2 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -126,39 +126,17 @@ static void i915_fence_release(struct dma_fence *fence) i915_sw_fence_fini(&rq->semaphore); /*
* Keep one request on each engine for reserved use under mempressure
*
* We do not hold a reference to the engine here and so have to be
* very careful in what rq->engine we poke. The virtual engine is
* referenced via the rq->context and we released that ref during
* i915_request_retire(), ergo we must not dereference a virtual
* engine here. Not that we would want to, as the only consumer of
* the reserved engine->request_pool is the power management parking,
* which must-not-fail, and that is only run on the physical engines.
*
* Since the request must have been executed to be have completed,
* we know that it will have been processed by the HW and will
* not be unsubmitted again, so rq->engine and rq->execution_mask
* at this point is stable. rq->execution_mask will be a single
* bit if the last and _only_ engine it could execution on was a
* physical engine, if it's multiple bits then it started on and
* could still be on a virtual engine. Thus if the mask is not a
* power-of-two we assume that rq->engine may still be a virtual
* engine and so a dangling invalid pointer that we cannot dereference
*
* For example, consider the flow of a bonded request through a virtual
* engine. The request is created with a wide engine mask (all engines
* that we might execute on). On processing the bond, the request mask
* is reduced to one or more engines. If the request is subsequently
* bound to a single engine, it will then be constrained to only
* execute on that engine and never returned to the virtual engine
* after timeslicing away, see __unwind_incomplete_requests(). Thus we
* know that if the rq->execution_mask is a single bit, rq->engine
* can be a physical engine with the exact corresponding mask.
* Keep one request on each engine for reserved use under mempressure,
* do not use with virtual engines as this really is only needed for
	 * kernel contexts.
	 */
-	if (is_power_of_2(rq->execution_mask) &&
-	    !cmpxchg(&rq->engine->request_pool, NULL, rq))
+	if (!intel_engine_is_virtual(rq->engine) &&
+	    !cmpxchg(&rq->engine->request_pool, NULL, rq)) {
+		intel_context_put(rq->context);
 		return;
+	}
+	intel_context_put(rq->context);
The put is actually unconditional? So it could be moved before the if?
Yep, I think so.
Matt
John.
 	kmem_cache_free(global.slab_requests, rq);
 }
@@ -977,7 +955,18 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 		}
 	}
-	rq->context = ce;
+	/*
+	 * Hold a reference to the intel_context over the life of an
+	 * i915_request. Without this an i915_request can exist after the
+	 * context has been destroyed (e.g. request retired, context closed,
+	 * but user space holds a reference to the request from an out fence).
+	 * In the case of GuC submission + virtual engine, the engine that the
+	 * request references is also destroyed which can trigger a bad
+	 * pointer deref in fence ops (e.g. i915_fence_get_driver_name). We
+	 * could likely change these functions to avoid touching the engine
+	 * but let's just be safe and hold the intel_context reference.
+	 */
+	rq->context = intel_context_get(ce);
 	rq->engine = ce->engine;
 	rq->ring = ce->ring;
 	rq->execution_mask = ce->engine->mask;
@@ -1054,6 +1043,7 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
 err_free:
+	intel_context_put(ce);
 	kmem_cache_free(global.slab_requests, rq);
 err_unreserve:
 	intel_context_unpin(ce);
On Mon, Jul 12, 2021 at 08:05:30PM +0000, Matthew Brost wrote:
On Mon, Jul 12, 2021 at 11:23:14AM -0700, John Harrison wrote:
On 6/24/2021 00:04, Matthew Brost wrote:
Hold a reference to the intel_context over the life of an i915_request. Without this an i915_request can exist after the context has been destroyed (e.g. request retired, context closed, but user space holds a reference to the request from an out fence). In the case of GuC submission + virtual engine, the engine that the request references is also destroyed, which can trigger a bad pointer deref in fence ops (e.g.
Maybe quickly explain why this is different for GuC submission vs execlist? Presumably it is about only decomposing virtual engines to physical ones in execlist mode?
Yes, it's because in execlists we always end up pointing to a physical engine in the end, while in GuC mode we can be pointing to a dynamically allocated virtual engine. I can update the comment.
i915_fence_get_driver_name). We could likely change i915_fence_get_driver_name to avoid touching the engine but let's just be safe and hold the intel_context reference.
Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/i915_request.c | 54 ++++++++++++----------------- 1 file changed, 22 insertions(+), 32 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index de9deb95b8b1..dec5a35c9aa2 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -126,39 +126,17 @@ static void i915_fence_release(struct dma_fence *fence) i915_sw_fence_fini(&rq->semaphore); /*
* Keep one request on each engine for reserved use under mempressure
*
* We do not hold a reference to the engine here and so have to be
* very careful in what rq->engine we poke. The virtual engine is
* referenced via the rq->context and we released that ref during
* i915_request_retire(), ergo we must not dereference a virtual
* engine here. Not that we would want to, as the only consumer of
* the reserved engine->request_pool is the power management parking,
* which must-not-fail, and that is only run on the physical engines.
*
* Since the request must have been executed to be have completed,
* we know that it will have been processed by the HW and will
* not be unsubmitted again, so rq->engine and rq->execution_mask
* at this point is stable. rq->execution_mask will be a single
* bit if the last and _only_ engine it could execution on was a
* physical engine, if it's multiple bits then it started on and
* could still be on a virtual engine. Thus if the mask is not a
* power-of-two we assume that rq->engine may still be a virtual
* engine and so a dangling invalid pointer that we cannot dereference
*
* For example, consider the flow of a bonded request through a virtual
* engine. The request is created with a wide engine mask (all engines
* that we might execute on). On processing the bond, the request mask
* is reduced to one or more engines. If the request is subsequently
* bound to a single engine, it will then be constrained to only
* execute on that engine and never returned to the virtual engine
* after timeslicing away, see __unwind_incomplete_requests(). Thus we
* know that if the rq->execution_mask is a single bit, rq->engine
* can be a physical engine with the exact corresponding mask.
* Keep one request on each engine for reserved use under mempressure,
* do not use with virtual engines as this really is only needed for
	 * kernel contexts.
	 */
-	if (is_power_of_2(rq->execution_mask) &&
-	    !cmpxchg(&rq->engine->request_pool, NULL, rq))
+	if (!intel_engine_is_virtual(rq->engine) &&
+	    !cmpxchg(&rq->engine->request_pool, NULL, rq)) {
+		intel_context_put(rq->context);
 		return;
+	}
+	intel_context_put(rq->context);
The put is actually unconditional? So it could be moved before the if?
Yep, I think so.
Wait, nope. We reference rq->engine, which could be a virtual engine, and the intel_context_put could free that engine. So we need to do the put after we reference it.
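The ordering hazard described here can be shown with a hypothetical mock (the `mock_*` names are invented for illustration, not i915 code): with GuC submission the virtual engine's storage is owned by the context, so the final put must come only after the last engine dereference.

```c
#include <assert.h>
#include <stdbool.h>

struct mock_engine {
	bool is_virtual;
	bool freed;
};

/* A GuC virtual engine lives inside (and dies with) its context. */
struct mock_context {
	int refcount;
	struct mock_engine engine;
};

static void context_put(struct mock_context *ce)
{
	if (--ce->refcount == 0)
		ce->engine.freed = true; /* stand-in for freeing ce + engine */
}

/*
 * Mirrors the fixed i915_fence_release() flow: inspect rq->engine first,
 * drop the final context reference last. Hoisting context_put() above the
 * engine check would read freed memory for virtual engines.
 */
static bool fence_release(struct mock_context *ce)
{
	struct mock_engine *engine = &ce->engine;
	bool pooled = false;

	/* Safe: the context reference still keeps the engine alive here. */
	if (!engine->is_virtual && !engine->freed)
		pooled = true;	/* would park rq in engine->request_pool */

	/* Only now is it safe to drop the last reference. */
	context_put(ce);
	return pooled;
}
```

For a virtual engine the request is never parked in the pool, and the engine is only freed after the check has run.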
Matt
Matt
John.
 	kmem_cache_free(global.slab_requests, rq);
 }
@@ -977,7 +955,18 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 		}
 	}
-	rq->context = ce;
+	/*
+	 * Hold a reference to the intel_context over the life of an
+	 * i915_request. Without this an i915_request can exist after the
+	 * context has been destroyed (e.g. request retired, context closed,
+	 * but user space holds a reference to the request from an out fence).
+	 * In the case of GuC submission + virtual engine, the engine that the
+	 * request references is also destroyed which can trigger a bad
+	 * pointer deref in fence ops (e.g. i915_fence_get_driver_name). We
+	 * could likely change these functions to avoid touching the engine
+	 * but let's just be safe and hold the intel_context reference.
+	 */
+	rq->context = intel_context_get(ce);
 	rq->engine = ce->engine;
 	rq->ring = ce->ring;
 	rq->execution_mask = ce->engine->mask;
@@ -1054,6 +1043,7 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
 err_free:
+	intel_context_put(ce);
 	kmem_cache_free(global.slab_requests, rq);
 err_unreserve:
 	intel_context_unpin(ce);
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On 7/12/2021 14:36, Matthew Brost wrote:
On Mon, Jul 12, 2021 at 08:05:30PM +0000, Matthew Brost wrote:
On Mon, Jul 12, 2021 at 11:23:14AM -0700, John Harrison wrote:
On 6/24/2021 00:04, Matthew Brost wrote:
Hold a reference to the intel_context over the life of an i915_request. Without this an i915_request can exist after the context has been destroyed (e.g. request retired, context closed, but user space holds a reference to the request from an out fence). In the case of GuC submission + virtual engine, the engine that the request references is also destroyed, which can trigger a bad pointer deref in fence ops (e.g.
Maybe quickly explain why this is different for GuC submission vs execlist? Presumably it is about only decomposing virtual engines to physical ones in execlist mode?
Yes, it's because in execlists we always end up pointing to a physical engine in the end, while in GuC mode we can be pointing to a dynamically allocated virtual engine. I can update the comment.
i915_fence_get_driver_name). We could likely change i915_fence_get_driver_name to avoid touching the engine but let's just be safe and hold the intel_context reference.
Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/i915_request.c | 54 ++++++++++++----------------- 1 file changed, 22 insertions(+), 32 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index de9deb95b8b1..dec5a35c9aa2 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -126,39 +126,17 @@ static void i915_fence_release(struct dma_fence *fence) i915_sw_fence_fini(&rq->semaphore); /*
* Keep one request on each engine for reserved use under mempressure
*
* We do not hold a reference to the engine here and so have to be
* very careful in what rq->engine we poke. The virtual engine is
* referenced via the rq->context and we released that ref during
* i915_request_retire(), ergo we must not dereference a virtual
* engine here. Not that we would want to, as the only consumer of
* the reserved engine->request_pool is the power management parking,
* which must-not-fail, and that is only run on the physical engines.
*
* Since the request must have been executed to be have completed,
* we know that it will have been processed by the HW and will
* not be unsubmitted again, so rq->engine and rq->execution_mask
* at this point is stable. rq->execution_mask will be a single
* bit if the last and _only_ engine it could execution on was a
* physical engine, if it's multiple bits then it started on and
* could still be on a virtual engine. Thus if the mask is not a
* power-of-two we assume that rq->engine may still be a virtual
* engine and so a dangling invalid pointer that we cannot dereference
*
* For example, consider the flow of a bonded request through a virtual
* engine. The request is created with a wide engine mask (all engines
* that we might execute on). On processing the bond, the request mask
* is reduced to one or more engines. If the request is subsequently
* bound to a single engine, it will then be constrained to only
* execute on that engine and never returned to the virtual engine
* after timeslicing away, see __unwind_incomplete_requests(). Thus we
* know that if the rq->execution_mask is a single bit, rq->engine
* can be a physical engine with the exact corresponding mask.
* Keep one request on each engine for reserved use under mempressure,
* do not use with virtual engines as this really is only needed for
	 * kernel contexts.
	 */
-	if (is_power_of_2(rq->execution_mask) &&
-	    !cmpxchg(&rq->engine->request_pool, NULL, rq))
+	if (!intel_engine_is_virtual(rq->engine) &&
+	    !cmpxchg(&rq->engine->request_pool, NULL, rq)) {
+		intel_context_put(rq->context);
 		return;
+	}
+	intel_context_put(rq->context);
The put is actually unconditional? So it could be moved before the if?
Yep, I think so.
Wait, nope. We reference rq->engine, which could be a virtual engine, and the intel_context_put could free that engine. So we need to do the put after we reference it.
Matt
Doh! That's a pretty good reason.
Okay, with a tweaked description to explain about virtual engines being different on GuC vs execlist...
Reviewed-by: John Harrison John.C.Harrison@Intel.com
Matt
John.
 	kmem_cache_free(global.slab_requests, rq);
 }
@@ -977,7 +955,18 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 		}
 	}
-	rq->context = ce;
+	/*
+	 * Hold a reference to the intel_context over the life of an
+	 * i915_request. Without this an i915_request can exist after the
+	 * context has been destroyed (e.g. request retired, context closed,
+	 * but user space holds a reference to the request from an out fence).
+	 * In the case of GuC submission + virtual engine, the engine that the
+	 * request references is also destroyed which can trigger a bad
+	 * pointer deref in fence ops (e.g. i915_fence_get_driver_name). We
+	 * could likely change these functions to avoid touching the engine
+	 * but let's just be safe and hold the intel_context reference.
+	 */
+	rq->context = intel_context_get(ce);
 	rq->engine = ce->engine;
 	rq->ring = ce->ring;
 	rq->execution_mask = ce->engine->mask;
@@ -1054,6 +1043,7 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
 err_free:
+	intel_context_put(ce);
 	kmem_cache_free(global.slab_requests, rq);
 err_unreserve:
 	intel_context_unpin(ce);
Update the bonding extension to return -ENODEV when using GuC submission as this extension fundamentally will not work with the GuC submission interface.
Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 8a9293e0ca92..0429aa4172bf 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1674,6 +1674,11 @@ set_engines__bond(struct i915_user_extension __user *base, void *data) } virtual = set->engines->engines[idx]->engine;
+ if (intel_engine_uses_guc(virtual)) { + DRM_DEBUG("bonding extension not supported with GuC submission"); + return -ENODEV; + } + err = check_user_mbz(&ext->flags); if (err) return err;
On 6/24/2021 00:04, Matthew Brost wrote:
Update the bonding extension to return -ENODEV when using GuC submission as this extension fundamentally will not work with the GuC submission interface.
Signed-off-by: Matthew Brost matthew.brost@intel.com
Reviewed-by: John Harrison John.C.Harrison@Intel.com
drivers/gpu/drm/i915/gem/i915_gem_context.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 8a9293e0ca92..0429aa4172bf 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1674,6 +1674,11 @@ set_engines__bond(struct i915_user_extension __user *base, void *data) } virtual = set->engines->engines[idx]->engine;
- if (intel_engine_uses_guc(virtual)) {
DRM_DEBUG("bonding extension not supported with GuC submission");
return -ENODEV;
- }
- err = check_user_mbz(&ext->flags); if (err) return err;
With GuC virtual engines the physical engine on which a request executes and completes isn't known to the i915. Therefore we can't attach a request to a physical engine's breadcrumbs. To work around this we create a single breadcrumbs object per engine class when using GuC submission and direct all physical engine interrupts to it.
Signed-off-by: Matthew Brost matthew.brost@intel.com CC: John Harrison John.C.Harrison@Intel.com --- drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 41 +++++------- drivers/gpu/drm/i915/gt/intel_breadcrumbs.h | 14 +++- .../gpu/drm/i915/gt/intel_breadcrumbs_types.h | 7 ++ drivers/gpu/drm/i915/gt/intel_engine.h | 3 + drivers/gpu/drm/i915/gt/intel_engine_cs.c | 28 +++++++- drivers/gpu/drm/i915/gt/intel_engine_types.h | 1 - .../drm/i915/gt/intel_execlists_submission.c | 2 +- drivers/gpu/drm/i915/gt/mock_engine.c | 4 +- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 67 +++++++++++++++++-- 9 files changed, 131 insertions(+), 36 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c index 38cc42783dfb..2007dc6f6b99 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c @@ -15,28 +15,14 @@ #include "intel_gt_pm.h" #include "intel_gt_requests.h"
-static bool irq_enable(struct intel_engine_cs *engine) +static bool irq_enable(struct intel_breadcrumbs *b) { - if (!engine->irq_enable) - return false; - - /* Caller disables interrupts */ - spin_lock(&engine->gt->irq_lock); - engine->irq_enable(engine); - spin_unlock(&engine->gt->irq_lock); - - return true; + return intel_engine_irq_enable(b->irq_engine); }
-static void irq_disable(struct intel_engine_cs *engine) +static void irq_disable(struct intel_breadcrumbs *b) { - if (!engine->irq_disable) - return; - - /* Caller disables interrupts */ - spin_lock(&engine->gt->irq_lock); - engine->irq_disable(engine); - spin_unlock(&engine->gt->irq_lock); + intel_engine_irq_disable(b->irq_engine); }
static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b) @@ -57,7 +43,7 @@ static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b) WRITE_ONCE(b->irq_armed, true);
/* Requests may have completed before we could enable the interrupt. */ - if (!b->irq_enabled++ && irq_enable(b->irq_engine)) + if (!b->irq_enabled++ && b->irq_enable(b)) irq_work_queue(&b->irq_work); }
@@ -76,7 +62,7 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b) { GEM_BUG_ON(!b->irq_enabled); if (!--b->irq_enabled) - irq_disable(b->irq_engine); + b->irq_disable(b);
WRITE_ONCE(b->irq_armed, false); intel_gt_pm_put_async(b->irq_engine->gt); @@ -281,7 +267,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine) if (!b) return NULL;
- b->irq_engine = irq_engine; + kref_init(&b->ref);
spin_lock_init(&b->signalers_lock); INIT_LIST_HEAD(&b->signalers); @@ -290,6 +276,10 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine) spin_lock_init(&b->irq_lock); init_irq_work(&b->irq_work, signal_irq_work);
+ b->irq_engine = irq_engine; + b->irq_enable = irq_enable; + b->irq_disable = irq_disable; + return b; }
@@ -303,9 +293,9 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b) spin_lock_irqsave(&b->irq_lock, flags);
if (b->irq_enabled) - irq_enable(b->irq_engine); + b->irq_enable(b); else - irq_disable(b->irq_engine); + b->irq_disable(b);
spin_unlock_irqrestore(&b->irq_lock, flags); } @@ -325,11 +315,14 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b) } }
-void intel_breadcrumbs_free(struct intel_breadcrumbs *b) +void intel_breadcrumbs_free(struct kref *kref) { + struct intel_breadcrumbs *b = container_of(kref, typeof(*b), ref); + irq_work_sync(&b->irq_work); GEM_BUG_ON(!list_empty(&b->signalers)); GEM_BUG_ON(b->irq_armed); + kfree(b); }
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h index 3ce5ce270b04..72105b74663d 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h @@ -17,7 +17,7 @@ struct intel_breadcrumbs;
struct intel_breadcrumbs * intel_breadcrumbs_create(struct intel_engine_cs *irq_engine); -void intel_breadcrumbs_free(struct intel_breadcrumbs *b); +void intel_breadcrumbs_free(struct kref *kref);
void intel_breadcrumbs_reset(struct intel_breadcrumbs *b); void __intel_breadcrumbs_park(struct intel_breadcrumbs *b); @@ -48,4 +48,16 @@ void i915_request_cancel_breadcrumb(struct i915_request *request); void intel_context_remove_breadcrumbs(struct intel_context *ce, struct intel_breadcrumbs *b);
+static inline struct intel_breadcrumbs * +intel_breadcrumbs_get(struct intel_breadcrumbs *b) +{ + kref_get(&b->ref); + return b; +} + +static inline void intel_breadcrumbs_put(struct intel_breadcrumbs *b) +{ + kref_put(&b->ref, intel_breadcrumbs_free); +} + #endif /* __INTEL_BREADCRUMBS__ */ diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h index 3a084ce8ff5e..a4e146684be8 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h @@ -7,10 +7,13 @@ #define __INTEL_BREADCRUMBS_TYPES__
#include <linux/irq_work.h> +#include <linux/kref.h> #include <linux/list.h> #include <linux/spinlock.h> #include <linux/types.h>
+typedef u8 intel_engine_mask_t; + /* * Rather than have every client wait upon all user interrupts, * with the herd waking after every interrupt and each doing the @@ -29,6 +32,7 @@ * the overhead of waking that client is much preferred. */ struct intel_breadcrumbs { + struct kref ref; atomic_t active;
spinlock_t signalers_lock; /* protects the list of signalers */ @@ -42,7 +46,10 @@ struct intel_breadcrumbs { bool irq_armed;
/* Not all breadcrumbs are attached to physical HW */ + intel_engine_mask_t engine_mask; struct intel_engine_cs *irq_engine; + bool (*irq_enable)(struct intel_breadcrumbs *b); + void (*irq_disable)(struct intel_breadcrumbs *b); };
#endif /* __INTEL_BREADCRUMBS_TYPES__ */ diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index 923eaee627b3..e9e0657f847a 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -212,6 +212,9 @@ void intel_engine_get_instdone(const struct intel_engine_cs *engine,
void intel_engine_init_execlists(struct intel_engine_cs *engine);
+bool intel_engine_irq_enable(struct intel_engine_cs *engine); +void intel_engine_irq_disable(struct intel_engine_cs *engine); + static inline void __intel_engine_reset(struct intel_engine_cs *engine, bool stalled) { diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index d13b1716c29e..69245670b8b0 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -739,7 +739,7 @@ static int engine_setup_common(struct intel_engine_cs *engine) err_cmd_parser: i915_sched_engine_put(engine->sched_engine); err_sched_engine: - intel_breadcrumbs_free(engine->breadcrumbs); + intel_breadcrumbs_put(engine->breadcrumbs); err_status: cleanup_status_page(engine); return err; @@ -947,7 +947,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine) GEM_BUG_ON(!list_empty(&engine->sched_engine->requests));
i915_sched_engine_put(engine->sched_engine); - intel_breadcrumbs_free(engine->breadcrumbs); + intel_breadcrumbs_put(engine->breadcrumbs);
intel_engine_fini_retire(engine); intel_engine_cleanup_cmd_parser(engine); @@ -1264,6 +1264,30 @@ bool intel_engines_are_idle(struct intel_gt *gt) return true; }
+bool intel_engine_irq_enable(struct intel_engine_cs *engine) +{ + if (!engine->irq_enable) + return false; + + /* Caller disables interrupts */ + spin_lock(&engine->gt->irq_lock); + engine->irq_enable(engine); + spin_unlock(&engine->gt->irq_lock); + + return true; +} + +void intel_engine_irq_disable(struct intel_engine_cs *engine) +{ + if (!engine->irq_disable) + return; + + /* Caller disables interrupts */ + spin_lock(&engine->gt->irq_lock); + engine->irq_disable(engine); + spin_unlock(&engine->gt->irq_lock); +} + void intel_engines_reset_default_submission(struct intel_gt *gt) { struct intel_engine_cs *engine; diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 1dc59e6c9a92..e7cb6a06db9d 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -64,7 +64,6 @@ struct intel_gt; struct intel_ring; struct intel_uncore;
-typedef u8 intel_engine_mask_t; #define ALL_ENGINES ((intel_engine_mask_t)~0ul)
struct intel_hw_status_page { diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 9cfb8800a0e6..c10ea6080752 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3419,7 +3419,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk) intel_context_fini(&ve->context);
if (ve->base.breadcrumbs) - intel_breadcrumbs_free(ve->base.breadcrumbs); + intel_breadcrumbs_put(ve->base.breadcrumbs); if (ve->base.sched_engine) i915_sched_engine_put(ve->base.sched_engine); intel_engine_free_request_pool(&ve->base); diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c index 9203c766db80..fc5a65ab1937 100644 --- a/drivers/gpu/drm/i915/gt/mock_engine.c +++ b/drivers/gpu/drm/i915/gt/mock_engine.c @@ -284,7 +284,7 @@ static void mock_engine_release(struct intel_engine_cs *engine) GEM_BUG_ON(timer_pending(&mock->hw_delay));
i915_sched_engine_put(engine->sched_engine); - intel_breadcrumbs_free(engine->breadcrumbs); + intel_breadcrumbs_put(engine->breadcrumbs);
intel_context_unpin(engine->kernel_context); intel_context_put(engine->kernel_context); @@ -376,7 +376,7 @@ int mock_engine_init(struct intel_engine_cs *engine) return 0;
err_breadcrumbs: - intel_breadcrumbs_free(engine->breadcrumbs); + intel_breadcrumbs_put(engine->breadcrumbs); err_schedule: i915_sched_engine_put(engine->sched_engine); return -ENOMEM; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index d1badd7137b7..83058df5ba01 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1074,6 +1074,9 @@ static void __guc_context_destroy(struct intel_context *ce) struct guc_virtual_engine *ve = container_of(ce, typeof(*ve), context);
+ if (ve->base.breadcrumbs) + intel_breadcrumbs_put(ve->base.breadcrumbs); + kfree(ve); } else { intel_context_free(ce); @@ -1377,6 +1380,62 @@ static const struct intel_context_ops virtual_guc_context_ops = { .get_sibling = guc_virtual_get_sibling, };
+static bool +guc_irq_enable_breadcrumbs(struct intel_breadcrumbs *b) +{ + struct intel_engine_cs *sibling; + intel_engine_mask_t tmp, mask = b->engine_mask; + bool result = false; + + for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp) + result |= intel_engine_irq_enable(sibling); + + return result; +} + +static void +guc_irq_disable_breadcrumbs(struct intel_breadcrumbs *b) +{ + struct intel_engine_cs *sibling; + intel_engine_mask_t tmp, mask = b->engine_mask; + + for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp) + intel_engine_irq_disable(sibling); +} + +static void guc_init_breadcrumbs(struct intel_engine_cs *engine) +{ + int i; + + /* + * In GuC submission mode we do not know which physical engine a request + * will be scheduled on, this creates a problem because the breadcrumb + * interrupt is per physical engine. To work around this we attach + * requests and direct all breadcrumb interrupts to the first instance + * of an engine per class. In addition all breadcrumb interrupts are + * enabled / disabled across an engine class in unison. + */ + for (i = 0; i < MAX_ENGINE_INSTANCE; ++i) { + struct intel_engine_cs *sibling = + engine->gt->engine_class[engine->class][i]; + + if (sibling) { + if (engine->breadcrumbs != sibling->breadcrumbs) { + intel_breadcrumbs_put(engine->breadcrumbs); + engine->breadcrumbs = + intel_breadcrumbs_get(sibling->breadcrumbs); + } + break; + } + } + + if (engine->breadcrumbs) { + engine->breadcrumbs->engine_mask |= engine->mask; + engine->breadcrumbs->irq_enable = guc_irq_enable_breadcrumbs; + engine->breadcrumbs->irq_disable = guc_irq_disable_breadcrumbs; + } +} + static void sanitize_hwsp(struct intel_engine_cs *engine) { struct intel_timeline *tl; @@ -1600,6 +1659,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
guc_default_vfuncs(engine); guc_default_irqs(engine); + guc_init_breadcrumbs(engine);
if (engine->class == RENDER_CLASS) rcs_submission_override(engine); @@ -1839,11 +1899,6 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count) ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL; ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL; ve->base.saturated = ALL_ENGINES; - ve->base.breadcrumbs = intel_breadcrumbs_create(&ve->base); - if (!ve->base.breadcrumbs) { - kfree(ve); - return ERR_PTR(-ENOMEM); - }
snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
@@ -1892,6 +1947,8 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count) sibling->emit_fini_breadcrumb; ve->base.emit_fini_breadcrumb_dw = sibling->emit_fini_breadcrumb_dw; + ve->base.breadcrumbs = + intel_breadcrumbs_get(sibling->breadcrumbs);
ve->base.flags |= sibling->flags;
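The sharing scheme guc_init_breadcrumbs() sets up can be sketched with a hypothetical mock (the `mock_*` names and `init_class_breadcrumbs` helper are invented for illustration, not i915 code): every engine of a class drops its own breadcrumbs in favour of the first instance's, with sharing tracked by a reference count and the class membership recorded in an engine mask.

```c
#include <assert.h>
#include <stddef.h>

struct mock_breadcrumbs {
	int ref;			/* kref analogue */
	unsigned int engine_mask;	/* engines signalling through this object */
};

struct mock_engine {
	unsigned int mask;
	struct mock_breadcrumbs own;	 /* created at engine setup */
	struct mock_breadcrumbs *active; /* what the engine actually uses */
};

/*
 * Point every engine of a class at the first instance's breadcrumbs,
 * mirroring the put/get dance in guc_init_breadcrumbs().
 */
static void init_class_breadcrumbs(struct mock_engine *class_engines, int count)
{
	struct mock_breadcrumbs *shared = &class_engines[0].own;
	int i;

	for (i = 0; i < count; i++) {
		struct mock_engine *e = &class_engines[i];

		if (e->active != shared) {
			e->active->ref--;	/* intel_breadcrumbs_put() */
			shared->ref++;		/* intel_breadcrumbs_get() */
			e->active = shared;
		}
		shared->engine_mask |= e->mask;
	}
}
```

After setup, an interrupt on any engine of the class can be handled by the one shared signaler, which is why irq enable/disable must then fan out across the whole class mask.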
On 6/24/2021 00:04, Matthew Brost wrote:
With GuC virtual engines the physical engine on which a request executes and completes isn't known to the i915. Therefore we can't attach a request to a physical engine's breadcrumbs. To work around this we create a single breadcrumbs object per engine class when using GuC submission and direct all physical engine interrupts to it.
Signed-off-by: Matthew Brost matthew.brost@intel.com CC: John Harrison John.C.Harrison@Intel.com
drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 41 +++++------- drivers/gpu/drm/i915/gt/intel_breadcrumbs.h | 14 +++- .../gpu/drm/i915/gt/intel_breadcrumbs_types.h | 7 ++ drivers/gpu/drm/i915/gt/intel_engine.h | 3 + drivers/gpu/drm/i915/gt/intel_engine_cs.c | 28 +++++++- drivers/gpu/drm/i915/gt/intel_engine_types.h | 1 - .../drm/i915/gt/intel_execlists_submission.c | 2 +- drivers/gpu/drm/i915/gt/mock_engine.c | 4 +- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 67 +++++++++++++++++-- 9 files changed, 131 insertions(+), 36 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c index 38cc42783dfb..2007dc6f6b99 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c @@ -15,28 +15,14 @@ #include "intel_gt_pm.h" #include "intel_gt_requests.h"
-static bool irq_enable(struct intel_engine_cs *engine) +static bool irq_enable(struct intel_breadcrumbs *b) {
- if (!engine->irq_enable)
return false;
- /* Caller disables interrupts */
- spin_lock(&engine->gt->irq_lock);
- engine->irq_enable(engine);
- spin_unlock(&engine->gt->irq_lock);
-	return true;
+	return intel_engine_irq_enable(b->irq_engine);
 }
-static void irq_disable(struct intel_engine_cs *engine) +static void irq_disable(struct intel_breadcrumbs *b) {
- if (!engine->irq_disable)
return;
- /* Caller disables interrupts */
- spin_lock(&engine->gt->irq_lock);
- engine->irq_disable(engine);
- spin_unlock(&engine->gt->irq_lock);
+	intel_engine_irq_disable(b->irq_engine);
 }
static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
@@ -57,7 +43,7 @@ static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b) WRITE_ONCE(b->irq_armed, true);
/* Requests may have completed before we could enable the interrupt. */
-	if (!b->irq_enabled++ && irq_enable(b->irq_engine))
+	if (!b->irq_enabled++ && b->irq_enable(b))
 		irq_work_queue(&b->irq_work);
 }
@@ -76,7 +62,7 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b) { GEM_BUG_ON(!b->irq_enabled); if (!--b->irq_enabled)
-		irq_disable(b->irq_engine);
+		b->irq_disable(b);
WRITE_ONCE(b->irq_armed, false); intel_gt_pm_put_async(b->irq_engine->gt);
@@ -281,7 +267,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
 	if (!b)
 		return NULL;

-	b->irq_engine = irq_engine;
+	kref_init(&b->ref);

 	spin_lock_init(&b->signalers_lock);
 	INIT_LIST_HEAD(&b->signalers);
@@ -290,6 +276,10 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
 	spin_lock_init(&b->irq_lock);
 	init_irq_work(&b->irq_work, signal_irq_work);

+	b->irq_engine = irq_engine;
+	b->irq_enable = irq_enable;
+	b->irq_disable = irq_disable;
+
 	return b;
 }
@@ -303,9 +293,9 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b)
 	spin_lock_irqsave(&b->irq_lock, flags);

 	if (b->irq_enabled)
-		irq_enable(b->irq_engine);
+		b->irq_enable(b);
 	else
-		irq_disable(b->irq_engine);
+		b->irq_disable(b);

 	spin_unlock_irqrestore(&b->irq_lock, flags);
 }
@@ -325,11 +315,14 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
 	}
 }

-void intel_breadcrumbs_free(struct intel_breadcrumbs *b)
+void intel_breadcrumbs_free(struct kref *kref)
 {
+	struct intel_breadcrumbs *b = container_of(kref, typeof(*b), ref);
+
 	irq_work_sync(&b->irq_work);
 	GEM_BUG_ON(!list_empty(&b->signalers));
 	GEM_BUG_ON(b->irq_armed);
+
 	kfree(b);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
index 3ce5ce270b04..72105b74663d 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
@@ -17,7 +17,7 @@ struct intel_breadcrumbs;

 struct intel_breadcrumbs *
 intel_breadcrumbs_create(struct intel_engine_cs *irq_engine);
-void intel_breadcrumbs_free(struct intel_breadcrumbs *b);
+void intel_breadcrumbs_free(struct kref *kref);

 void intel_breadcrumbs_reset(struct intel_breadcrumbs *b);
 void __intel_breadcrumbs_park(struct intel_breadcrumbs *b);
@@ -48,4 +48,16 @@ void i915_request_cancel_breadcrumb(struct i915_request *request);
 void intel_context_remove_breadcrumbs(struct intel_context *ce,
				      struct intel_breadcrumbs *b);
+static inline struct intel_breadcrumbs *
+intel_breadcrumbs_get(struct intel_breadcrumbs *b)
+{
+	kref_get(&b->ref);
+	return b;
+}
+
+static inline void intel_breadcrumbs_put(struct intel_breadcrumbs *b)
+{
+	kref_put(&b->ref, intel_breadcrumbs_free);
+}
+
 #endif /* __INTEL_BREADCRUMBS__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
index 3a084ce8ff5e..a4e146684be8 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
@@ -7,10 +7,13 @@
 #define __INTEL_BREADCRUMBS_TYPES__

 #include <linux/irq_work.h>
+#include <linux/kref.h>
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>

+typedef u8 intel_engine_mask_t;
Seems like the wrong place for this. Can it be moved to gt/intel_engine_types.h instead?
 /*
  * Rather than have every client wait upon all user interrupts,
  * with the herd waking after every interrupt and each doing the
@@ -29,6 +32,7 @@
  * the overhead of waking that client is much preferred.
  */
 struct intel_breadcrumbs {
+	struct kref ref;
 	atomic_t active;

 	spinlock_t signalers_lock; /* protects the list of signalers */
@@ -42,7 +46,10 @@ struct intel_breadcrumbs {
 	bool irq_armed;

 	/* Not all breadcrumbs are attached to physical HW */
+	intel_engine_mask_t engine_mask;
 	struct intel_engine_cs *irq_engine;
+	bool (*irq_enable)(struct intel_breadcrumbs *b);
+	void (*irq_disable)(struct intel_breadcrumbs *b);
 };

 #endif /* __INTEL_BREADCRUMBS_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 923eaee627b3..e9e0657f847a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -212,6 +212,9 @@ void intel_engine_get_instdone(const struct intel_engine_cs *engine,

 void intel_engine_init_execlists(struct intel_engine_cs *engine);

+bool intel_engine_irq_enable(struct intel_engine_cs *engine);
+void intel_engine_irq_disable(struct intel_engine_cs *engine);
+
 static inline void __intel_engine_reset(struct intel_engine_cs *engine,
					bool stalled)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index d13b1716c29e..69245670b8b0 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -739,7 +739,7 @@ static int engine_setup_common(struct intel_engine_cs *engine)
 err_cmd_parser:
 	i915_sched_engine_put(engine->sched_engine);
 err_sched_engine:
-	intel_breadcrumbs_free(engine->breadcrumbs);
+	intel_breadcrumbs_put(engine->breadcrumbs);
 err_status:
 	cleanup_status_page(engine);
 	return err;
@@ -947,7 +947,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 	GEM_BUG_ON(!list_empty(&engine->sched_engine->requests));

 	i915_sched_engine_put(engine->sched_engine);
-	intel_breadcrumbs_free(engine->breadcrumbs);
+	intel_breadcrumbs_put(engine->breadcrumbs);

 	intel_engine_fini_retire(engine);
 	intel_engine_cleanup_cmd_parser(engine);
@@ -1264,6 +1264,30 @@ bool intel_engines_are_idle(struct intel_gt *gt)
 	return true;
 }

+bool intel_engine_irq_enable(struct intel_engine_cs *engine)
+{
+	if (!engine->irq_enable)
+		return false;
+
+	/* Caller disables interrupts */
+	spin_lock(&engine->gt->irq_lock);
+	engine->irq_enable(engine);
+	spin_unlock(&engine->gt->irq_lock);
+
+	return true;
+}
+
+void intel_engine_irq_disable(struct intel_engine_cs *engine)
+{
+	if (!engine->irq_disable)
+		return;
+
+	/* Caller disables interrupts */
+	spin_lock(&engine->gt->irq_lock);
+	engine->irq_disable(engine);
+	spin_unlock(&engine->gt->irq_lock);
+}
+
 void intel_engines_reset_default_submission(struct intel_gt *gt)
 {
 	struct intel_engine_cs *engine;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 1dc59e6c9a92..e7cb6a06db9d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -64,7 +64,6 @@ struct intel_gt;
 struct intel_ring;
 struct intel_uncore;

-typedef u8 intel_engine_mask_t;
Oh.
The following fixes this for me:

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
index 3ce5ce270b04..ac5cdd6ff3f4 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
@@ -10,6 +10,7 @@
 #include <linux/irq_work.h>

 #include "intel_engine_types.h"
+#include "gt/intel_breadcrumbs_types.h"

 struct drm_printer;
 struct i915_request;
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
index 3a084ce8ff5e..260ccd5c1573 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
@@ -11,6 +11,8 @@
 #include <linux/spinlock.h>
 #include <linux/types.h>

+#include "gt/intel_engine_types.h"
+
 /*
  * Rather than have every client wait upon all user interrupts,
  * with the herd waking after every interrupt and each doing the
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 1cb9c3b70b29..da15b8b3c1f7 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -21,7 +21,6 @@
 #include "i915_pmu.h"
 #include "i915_priolist_types.h"
 #include "i915_selftest.h"
-#include "intel_breadcrumbs_types.h"
 #include "intel_sseu.h"
 #include "intel_timeline_types.h"
 #include "intel_uncore.h"
John.
#define ALL_ENGINES ((intel_engine_mask_t)~0ul)
 struct intel_hw_status_page {
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 9cfb8800a0e6..c10ea6080752 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3419,7 +3419,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
 	intel_context_fini(&ve->context);
 	if (ve->base.breadcrumbs)
-		intel_breadcrumbs_free(ve->base.breadcrumbs);
+		intel_breadcrumbs_put(ve->base.breadcrumbs);
 	if (ve->base.sched_engine)
 		i915_sched_engine_put(ve->base.sched_engine);
 	intel_engine_free_request_pool(&ve->base);
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index 9203c766db80..fc5a65ab1937 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -284,7 +284,7 @@ static void mock_engine_release(struct intel_engine_cs *engine)
 	GEM_BUG_ON(timer_pending(&mock->hw_delay));

 	i915_sched_engine_put(engine->sched_engine);
-	intel_breadcrumbs_free(engine->breadcrumbs);
+	intel_breadcrumbs_put(engine->breadcrumbs);

 	intel_context_unpin(engine->kernel_context);
 	intel_context_put(engine->kernel_context);
@@ -376,7 +376,7 @@ int mock_engine_init(struct intel_engine_cs *engine)
 	return 0;

 err_breadcrumbs:
-	intel_breadcrumbs_free(engine->breadcrumbs);
+	intel_breadcrumbs_put(engine->breadcrumbs);
 err_schedule:
 	i915_sched_engine_put(engine->sched_engine);
 	return -ENOMEM;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index d1badd7137b7..83058df5ba01 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1074,6 +1074,9 @@ static void __guc_context_destroy(struct intel_context *ce)
 		struct guc_virtual_engine *ve =
 			container_of(ce, typeof(*ve), context);

+		if (ve->base.breadcrumbs)
+			intel_breadcrumbs_put(ve->base.breadcrumbs);
+
 		kfree(ve);
 	} else {
 		intel_context_free(ce);
@@ -1377,6 +1380,62 @@ static const struct intel_context_ops virtual_guc_context_ops = {
 	.get_sibling = guc_virtual_get_sibling,
 };
+static bool
+guc_irq_enable_breadcrumbs(struct intel_breadcrumbs *b)
+{
+	struct intel_engine_cs *sibling;
+	intel_engine_mask_t tmp, mask = b->engine_mask;
+	bool result = false;
+
+	for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp)
+		result |= intel_engine_irq_enable(sibling);
+
+	return result;
+}
+
+static void
+guc_irq_disable_breadcrumbs(struct intel_breadcrumbs *b)
+{
+	struct intel_engine_cs *sibling;
+	intel_engine_mask_t tmp, mask = b->engine_mask;
+
+	for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp)
+		intel_engine_irq_disable(sibling);
+}
+static void guc_init_breadcrumbs(struct intel_engine_cs *engine)
+{
+	int i;
+
+	/*
+	 * In GuC submission mode we do not know which physical engine a request
+	 * will be scheduled on, this creates a problem because the breadcrumb
+	 * interrupt is per physical engine. To work around this we attach
+	 * requests and direct all breadcrumb interrupts to the first instance
+	 * of an engine per class. In addition all breadcrumb interrupts are
+	 * enabled / disabled across an engine class in unison.
+	 */
+	for (i = 0; i < MAX_ENGINE_INSTANCE; ++i) {
+		struct intel_engine_cs *sibling =
+			engine->gt->engine_class[engine->class][i];
+
+		if (sibling) {
+			if (engine->breadcrumbs != sibling->breadcrumbs) {
+				intel_breadcrumbs_put(engine->breadcrumbs);
+				engine->breadcrumbs =
+					intel_breadcrumbs_get(sibling->breadcrumbs);
+			}
+			break;
+		}
+	}
+
+	if (engine->breadcrumbs) {
+		engine->breadcrumbs->engine_mask |= engine->mask;
+		engine->breadcrumbs->irq_enable = guc_irq_enable_breadcrumbs;
+		engine->breadcrumbs->irq_disable = guc_irq_disable_breadcrumbs;
+	}
+}
+
 static void sanitize_hwsp(struct intel_engine_cs *engine)
 {
 	struct intel_timeline *tl;
@@ -1600,6 +1659,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)

 	guc_default_vfuncs(engine);
 	guc_default_irqs(engine);
+	guc_init_breadcrumbs(engine);

 	if (engine->class == RENDER_CLASS)
 		rcs_submission_override(engine);
@@ -1839,11 +1899,6 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
 	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
 	ve->base.saturated = ALL_ENGINES;
-	ve->base.breadcrumbs = intel_breadcrumbs_create(&ve->base);
-	if (!ve->base.breadcrumbs) {
-		kfree(ve);
-		return ERR_PTR(-ENOMEM);
-	}

 	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
@@ -1892,6 +1947,8 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 			sibling->emit_fini_breadcrumb;
 		ve->base.emit_fini_breadcrumb_dw =
 			sibling->emit_fini_breadcrumb_dw;
+		ve->base.breadcrumbs =
+			intel_breadcrumbs_get(sibling->breadcrumbs);

 		ve->base.flags |= sibling->flags;
Reset implementation for the new GuC interface. This is the legacy reset implementation, used when the i915 owns the engine hang check. Future patches will offload the engine hang check to the GuC, but we will continue to maintain this legacy path as a fallback; it is also required if the GuC dies.
With the new GuC interface it is not possible to reset individual engines - it is only possible to reset the GPU entirely. This patch forces an entire chip reset if any engine hangs.
Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   3 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   6 +
 .../drm/i915/gt/intel_execlists_submission.c  |  40 ++
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   6 +-
 drivers/gpu/drm/i915/gt/intel_reset.c         |  18 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  22 +
 drivers/gpu/drm/i915/gt/mock_engine.c         |  31 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  13 -
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   8 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 581 ++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  39 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   3 +
 drivers/gpu/drm/i915/i915_request.c           |  41 +-
 drivers/gpu/drm/i915/i915_request.h           |   2 +
 15 files changed, 649 insertions(+), 171 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index b24a1b7a3f88..2f01437056a8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -392,6 +392,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	spin_lock_init(&ce->guc_state.lock);
 	INIT_LIST_HEAD(&ce->guc_state.fences);

+	spin_lock_init(&ce->guc_active.lock);
+	INIT_LIST_HEAD(&ce->guc_active.requests);
+
 	ce->guc_id = GUC_INVALID_LRC_ID;
 	INIT_LIST_HEAD(&ce->guc_id_link);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 6945963a31ba..b63c8cf7823b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -165,6 +165,13 @@ struct intel_context {
 		struct list_head fences;
 	} guc_state;

+	struct {
+		/** lock: protects everything in guc_active */
+		spinlock_t lock;
+		/** requests: active requests on this context */
+		struct list_head requests;
+	} guc_active;
+
 	/* GuC scheduling state that does not require a lock. */
 	atomic_t guc_sched_state_no_lock;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index e7cb6a06db9d..f9d264c008e8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -426,6 +426,12 @@ struct intel_engine_cs {

 	void	(*release)(struct intel_engine_cs *engine);

+	/*
+	 * Add / remove request from engine active tracking
+	 */
+	void	(*add_active_request)(struct i915_request *rq);
+	void	(*remove_active_request)(struct i915_request *rq);
+
 	struct intel_engine_execlists execlists;
/* diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index c10ea6080752..c301a2d088b1 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3118,6 +3118,42 @@ static void execlists_park(struct intel_engine_cs *engine) cancel_timer(&engine->execlists.preempt); }
+static void add_to_engine(struct i915_request *rq) +{ + lockdep_assert_held(&rq->engine->sched_engine->lock); + list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests); +} + +static void remove_from_engine(struct i915_request *rq) +{ + struct intel_engine_cs *engine, *locked; + + /* + * Virtual engines complicate acquiring the engine timeline lock, + * as their rq->engine pointer is not stable until under that + * engine lock. The simple ploy we use is to take the lock then + * check that the rq still belongs to the newly locked engine. + */ + locked = READ_ONCE(rq->engine); + spin_lock_irq(&locked->sched_engine->lock); + while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) { + spin_unlock(&locked->sched_engine->lock); + spin_lock(&engine->sched_engine->lock); + locked = engine; + } + list_del_init(&rq->sched.link); + + clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); + clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags); + + /* Prevent further __await_execution() registering a cb, then flush */ + set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags); + + spin_unlock_irq(&locked->sched_engine->lock); + + i915_request_notify_execute_cb_imm(rq); +} + static bool can_preempt(struct intel_engine_cs *engine) { if (GRAPHICS_VER(engine->i915) > 8) @@ -3218,6 +3254,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine) engine->cops = &execlists_context_ops; engine->request_alloc = execlists_request_alloc; engine->bump_serial = execlist_bump_serial; + engine->add_active_request = add_to_engine; + engine->remove_active_request = remove_from_engine;
engine->reset.prepare = execlists_reset_prepare; engine->reset.rewind = execlists_reset_rewind; @@ -3912,6 +3950,8 @@ execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count) "v%dx%d", ve->base.class, count); ve->base.context_size = sibling->context_size;
+ ve->base.add_active_request = sibling->add_active_request; + ve->base.remove_active_request = sibling->remove_active_request; ve->base.emit_bb_start = sibling->emit_bb_start; ve->base.emit_flush = sibling->emit_flush; ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb; diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c index aef3084e8b16..463a6ae605a0 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c @@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force) if (intel_gt_is_wedged(gt)) intel_gt_unset_wedged(gt);
-	intel_uc_sanitize(&gt->uc);
-
 	for_each_engine(engine, gt, id)
 		if (engine->reset.prepare)
 			engine->reset.prepare(engine);
@@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 		__intel_engine_reset(engine, false);
 	}

+	intel_uc_reset(&gt->uc, false);
+
 	for_each_engine(engine, gt, id)
 		if (engine->reset.finish)
 			engine->reset.finish(engine);
@@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
 		goto err_wedged;
 	}

+	intel_uc_reset_finish(&gt->uc);
+
 	intel_rps_enable(&gt->rps);
 	intel_llc_enable(&gt->llc);
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 72251638d4ea..2987282dff6d 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
 		__intel_engine_reset(engine, stalled_mask & engine->mask);
 	local_bh_enable();

+	intel_uc_reset(&gt->uc, true);
+
 	intel_ggtt_restore_fences(gt->ggtt);

 	return err;
@@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
 		if (awake & engine->mask)
 			intel_engine_pm_put(engine);
 	}
+
+	intel_uc_reset_finish(&gt->uc);
 }
 static void nop_submit_request(struct i915_request *request)
@@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
 	for_each_engine(engine, gt, id)
 		if (engine->reset.cancel)
 			engine->reset.cancel(engine);
+	intel_uc_cancel_requests(&gt->uc);
 	local_bh_enable();

 	reset_finish(gt, awake);
@@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
 	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));

+	if (intel_engine_uses_guc(engine))
+		return -ENODEV;
+
 	if (!intel_engine_pm_get_if_awake(engine))
 		return 0;
@@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 			   "Resetting %s for %s\n", engine->name, msg);
 	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);

-	if (intel_engine_uses_guc(engine))
-		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
-	else
-		ret = intel_gt_reset_engine(engine);
+	ret = intel_gt_reset_engine(engine);
 	if (ret) {
 		/* If we fail here, we expect to fallback to a global reset */
-		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
+		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
 		goto out;
 	}
@@ -1341,7 +1346,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
+	if (!intel_uc_uses_guc_submission(&gt->uc) &&
+	    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
 		local_bh_disable();
 		for_each_engine_masked(engine, gt, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index e1506b280df1..99dcdc8fba12 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -1049,6 +1049,25 @@ static void ring_bump_serial(struct intel_engine_cs *engine)
 	engine->serial++;
 }
+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	spin_lock_irq(&rq->engine->sched_engine->lock);
+	list_del_init(&rq->sched.link);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&rq->engine->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static void setup_common(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
@@ -1066,6 +1085,9 @@ static void setup_common(struct intel_engine_cs *engine)
 	engine->reset.cancel = reset_cancel;
 	engine->reset.finish = reset_finish;
+ engine->add_active_request = add_to_engine; + engine->remove_active_request = remove_from_engine; + engine->cops = &ring_context_ops; engine->request_alloc = ring_request_alloc; engine->bump_serial = ring_bump_serial; diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c index fc5a65ab1937..c12ff3a75ce6 100644 --- a/drivers/gpu/drm/i915/gt/mock_engine.c +++ b/drivers/gpu/drm/i915/gt/mock_engine.c @@ -235,6 +235,35 @@ static void mock_submit_request(struct i915_request *request) spin_unlock_irqrestore(&engine->hw_lock, flags); }
+static void mock_add_to_engine(struct i915_request *rq) +{ + lockdep_assert_held(&rq->engine->sched_engine->lock); + list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests); +} + +static void mock_remove_from_engine(struct i915_request *rq) +{ + struct intel_engine_cs *engine, *locked; + + /* + * Virtual engines complicate acquiring the engine timeline lock, + * as their rq->engine pointer is not stable until under that + * engine lock. The simple ploy we use is to take the lock then + * check that the rq still belongs to the newly locked engine. + */ + + locked = READ_ONCE(rq->engine); + spin_lock_irq(&locked->sched_engine->lock); + while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) { + spin_unlock(&locked->sched_engine->lock); + spin_lock(&engine->sched_engine->lock); + locked = engine; + } + list_del_init(&rq->sched.link); + spin_unlock_irq(&locked->sched_engine->lock); +} + + static void mock_reset_prepare(struct intel_engine_cs *engine) { } @@ -327,6 +356,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915, engine->base.emit_flush = mock_emit_flush; engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb; engine->base.submit_request = mock_submit_request; + engine->base.add_active_request = mock_add_to_engine; + engine->base.remove_active_request = mock_remove_from_engine;
engine->base.reset.prepare = mock_reset_prepare; engine->base.reset.rewind = mock_reset_rewind; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 6661dcb02239..9b09395b998f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -572,19 +572,6 @@ int intel_guc_suspend(struct intel_guc *guc) return 0; }
-/** - * intel_guc_reset_engine() - ask GuC to reset an engine - * @guc: intel_guc structure - * @engine: engine to be reset - */ -int intel_guc_reset_engine(struct intel_guc *guc, - struct intel_engine_cs *engine) -{ - /* XXX: to be implemented with submission interface rework */ - - return -ENODEV; -} - /** * intel_guc_resume() - notify GuC resuming from suspend state * @guc: the guc diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 22eb1e9cca41..40c9868762d7 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -242,14 +242,16 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
-int intel_guc_reset_engine(struct intel_guc *guc, - struct intel_engine_cs *engine); - int intel_guc_deregister_done_process_msg(struct intel_guc *guc, const u32 *msg, u32 len); int intel_guc_sched_done_process_msg(struct intel_guc *guc, const u32 *msg, u32 len);
+void intel_guc_submission_reset_prepare(struct intel_guc *guc); +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled); +void intel_guc_submission_reset_finish(struct intel_guc *guc); +void intel_guc_submission_cancel_requests(struct intel_guc *guc); + void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
#endif diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 83058df5ba01..b8c894ad8caf 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -141,7 +141,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce) static inline void set_context_wait_for_deregister_to_register(struct intel_context *ce) { - /* Only should be called from guc_lrc_desc_pin() */ + /* Only should be called from guc_lrc_desc_pin() without lock */ ce->guc_state.sched_state |= SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER; } @@ -241,15 +241,31 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
static void guc_lrc_desc_pool_destroy(struct intel_guc *guc) { + guc->lrc_desc_pool_vaddr = NULL; i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP); }
+static inline bool guc_submission_initialized(struct intel_guc *guc) +{ + return guc->lrc_desc_pool_vaddr != NULL; +} + static inline void reset_lrc_desc(struct intel_guc *guc, u32 id) { - struct guc_lrc_desc *desc = __get_lrc_desc(guc, id); + if (likely(guc_submission_initialized(guc))) { + struct guc_lrc_desc *desc = __get_lrc_desc(guc, id); + unsigned long flags;
- memset(desc, 0, sizeof(*desc)); - xa_erase_irq(&guc->context_lookup, id); + memset(desc, 0, sizeof(*desc)); + + /* + * xarray API doesn't have xa_erase_irqsave wrapper, so calling + * the lower level functions directly. + */ + xa_lock_irqsave(&guc->context_lookup, flags); + __xa_erase(&guc->context_lookup, id); + xa_unlock_irqrestore(&guc->context_lookup, flags); + } }
static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id) @@ -260,7 +276,15 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id) static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id, struct intel_context *ce) { - xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC); + unsigned long flags; + + /* + * xarray API doesn't have xa_save_irqsave wrapper, so calling the + * lower level functions directly. + */ + xa_lock_irqsave(&guc->context_lookup, flags); + __xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC); + xa_unlock_irqrestore(&guc->context_lookup, flags); }
static int guc_submission_busy_loop(struct intel_guc* guc, @@ -331,6 +355,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout) interruptible, timeout); }
+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop); + static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) { int err; @@ -338,11 +364,22 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) u32 action[3]; int len = 0; u32 g2h_len_dw = 0; - bool enabled = context_enabled(ce); + bool enabled;
GEM_BUG_ON(!atomic_read(&ce->guc_id_ref)); GEM_BUG_ON(context_guc_id_invalid(ce));
+ /* + * Corner case where the GuC firmware was blown away and reloaded while + * this context was pinned. + */ + if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) { + err = guc_lrc_desc_pin(ce, false); + if (unlikely(err)) + goto out; + } + enabled = context_enabled(ce); + if (!enabled) { action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET; action[len++] = ce->guc_id; @@ -365,6 +402,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) intel_context_put(ce); }
+out: return err; }
@@ -419,15 +457,10 @@ static int guc_dequeue_one_context(struct intel_guc *guc) if (submit) { guc_set_lrc_tail(last); resubmit: - /* - * We only check for -EBUSY here even though it is possible for - * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has - * died and a full GPU needs to be done. The hangcheck will - * eventually detect that the GuC has died and trigger this - * reset so no need to handle -EDEADLK here. - */ ret = guc_add_request(guc, last); - if (ret == -EBUSY) { + if (unlikely(ret == -EIO)) + goto deadlk; + else if (ret == -EBUSY) { tasklet_schedule(&sched_engine->tasklet); guc->stalled_request = last; return false; @@ -437,6 +470,11 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
guc->stalled_request = NULL; return submit; + +deadlk: + sched_engine->tasklet.callback = NULL; + tasklet_disable_nosync(&sched_engine->tasklet); + return false; }
static void guc_submission_tasklet(struct tasklet_struct *t) @@ -463,27 +501,165 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir) intel_engine_signal_breadcrumbs(engine); }
-static void guc_reset_prepare(struct intel_engine_cs *engine) +static void __guc_context_destroy(struct intel_context *ce); +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce); +static void guc_signal_context_fence(struct intel_context *ce); + +static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) +{ + struct intel_context *ce; + unsigned long index, flags; + bool pending_disable, pending_enable, deregister, destroyed; + + xa_for_each(&guc->context_lookup, index, ce) { + /* Flush context */ + spin_lock_irqsave(&ce->guc_state.lock, flags); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + + /* + * Once we are at this point submission_disabled() is guaranteed + * to visible to all callers who set the below flags (see above + * flush and flushes in reset_prepare). If submission_disabled() + * is set, the caller shouldn't set these flags. + */ + + destroyed = context_destroyed(ce); + pending_enable = context_pending_enable(ce); + pending_disable = context_pending_disable(ce); + deregister = context_wait_for_deregister_to_register(ce); + init_sched_state(ce); + + if (pending_enable || destroyed || deregister) { + atomic_dec(&guc->outstanding_submission_g2h); + if (deregister) + guc_signal_context_fence(ce); + if (destroyed) { + release_guc_id(guc, ce); + __guc_context_destroy(ce); + } + if (pending_enable|| deregister) + intel_context_put(ce); + } + + /* Not mutualy exclusive with above if statement. 
*/ + if (pending_disable) { + guc_signal_context_fence(ce); + intel_context_sched_disable_unpin(ce); + atomic_dec(&guc->outstanding_submission_g2h); + intel_context_put(ce); + } + } +} + +static inline bool +submission_disabled(struct intel_guc *guc) +{ + struct i915_sched_engine * const sched_engine = guc->sched_engine; + + return unlikely(!__tasklet_is_enabled(&sched_engine->tasklet)); +} + +static void disable_submission(struct intel_guc *guc) +{ + struct i915_sched_engine * const sched_engine = guc->sched_engine; + + if (__tasklet_is_enabled(&sched_engine->tasklet)) { + GEM_BUG_ON(!guc->ct.enabled); + __tasklet_disable_sync_once(&sched_engine->tasklet); + sched_engine->tasklet.callback = NULL; + } +} + +static void enable_submission(struct intel_guc *guc) +{ + struct i915_sched_engine * const sched_engine = guc->sched_engine; + unsigned long flags; + + spin_lock_irqsave(&guc->sched_engine->lock, flags); + sched_engine->tasklet.callback = guc_submission_tasklet; + wmb(); + if (!__tasklet_is_enabled(&sched_engine->tasklet) && + __tasklet_enable(&sched_engine->tasklet)) { + GEM_BUG_ON(!guc->ct.enabled); + + /* And kick in case we missed a new request submission. */ + tasklet_hi_schedule(&sched_engine->tasklet); + } + spin_unlock_irqrestore(&guc->sched_engine->lock, flags); +} + +static void guc_flush_submissions(struct intel_guc *guc) { - ENGINE_TRACE(engine, "\n"); + struct i915_sched_engine * const sched_engine = guc->sched_engine; + unsigned long flags; + + spin_lock_irqsave(&sched_engine->lock, flags); + spin_unlock_irqrestore(&sched_engine->lock, flags); +} + +void intel_guc_submission_reset_prepare(struct intel_guc *guc) +{ + int i; + + if (unlikely(!guc_submission_initialized(guc))) + /* Reset called during driver load? GuC not yet initialised! 
*/ + return; + + disable_submission(guc); + guc->interrupts.disable(guc); + + /* Flush IRQ handler */ + spin_lock_irq(&guc_to_gt(guc)->irq_lock); + spin_unlock_irq(&guc_to_gt(guc)->irq_lock); + + guc_flush_submissions(guc);
/* - * Prevent request submission to the hardware until we have - * completed the reset in i915_gem_reset_finish(). If a request - * is completed by one engine, it may then queue a request - * to a second via its execlists->tasklet *just* as we are - * calling engine->init_hw() and also writing the ELSP. - * Turning off the execlists->tasklet until the reset is over - * prevents the race. - */ - __tasklet_disable_sync_once(&engine->sched_engine->tasklet); + * Handle any outstanding G2Hs before reset. Call IRQ handler directly + * each pass as interrupts have been disabled. We always scrub for + * outstanding G2H as it is possible for outstanding_submission_g2h to + * be incremented after the context state update. + */ + for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) { + intel_guc_to_host_event_handler(guc); +#define wait_for_reset(guc, wait_var) \ + guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20)) + do { + wait_for_reset(guc, &guc->outstanding_submission_g2h); + } while (!list_empty(&guc->ct.requests.incoming)); + } + scrub_guc_desc_for_outstanding_g2h(guc); }
-static void guc_reset_state(struct intel_context *ce, - struct intel_engine_cs *engine, - u32 head, - bool scrub) +static struct intel_engine_cs * +guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling) { + struct intel_engine_cs *engine; + intel_engine_mask_t tmp, mask = ve->mask; + unsigned int num_siblings = 0; + + for_each_engine_masked(engine, ve->gt, mask, tmp) + if (num_siblings++ == sibling) + return engine; + + return NULL; +} + +static inline struct intel_engine_cs * +__context_to_physical_engine(struct intel_context *ce) +{ + struct intel_engine_cs *engine = ce->engine; + + if (intel_engine_is_virtual(engine)) + engine = guc_virtual_get_sibling(engine, 0); + + return engine; +} + +static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub) +{ + struct intel_engine_cs *engine = __context_to_physical_engine(ce); + GEM_BUG_ON(!intel_context_is_pinned(ce));
/* @@ -501,42 +677,147 @@ static void guc_reset_state(struct intel_context *ce, lrc_update_regs(ce, engine, head); }
-static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled) +static void guc_reset_nop(struct intel_engine_cs *engine) { - struct intel_engine_execlists * const execlists = &engine->execlists; - struct i915_request *rq; +} + +static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled) +{ +} + +static void +__unwind_incomplete_requests(struct intel_context *ce) +{ + struct i915_request *rq, *rn; + struct list_head *pl; + int prio = I915_PRIORITY_INVALID; + struct i915_sched_engine * const sched_engine = + ce->engine->sched_engine; unsigned long flags;
- spin_lock_irqsave(&engine->sched_engine->lock, flags); + spin_lock_irqsave(&sched_engine->lock, flags); + spin_lock(&ce->guc_active.lock); + list_for_each_entry_safe(rq, rn, + &ce->guc_active.requests, + sched.link) { + if (i915_request_completed(rq)) + continue; + + list_del_init(&rq->sched.link); + spin_unlock(&ce->guc_active.lock); + + __i915_request_unsubmit(rq); + + /* Push the request back into the queue for later resubmission. */ + GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID); + if (rq_prio(rq) != prio) { + prio = rq_prio(rq); + pl = i915_sched_lookup_priolist(sched_engine, prio); + } + GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine)); + + list_add_tail(&rq->sched.link, pl); + set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
- /* Push back any incomplete requests for replay after the reset. */ - rq = execlists_unwind_incomplete_requests(execlists); - if (!rq) - goto out_unlock; + spin_lock(&ce->guc_active.lock); + } + spin_unlock(&ce->guc_active.lock); + spin_unlock_irqrestore(&sched_engine->lock, flags); +} + +static struct i915_request *context_find_active_request(struct intel_context *ce) +{ + struct i915_request *rq, *active = NULL; + unsigned long flags; + + spin_lock_irqsave(&ce->guc_active.lock, flags); + list_for_each_entry_reverse(rq, &ce->guc_active.requests, + sched.link) { + if (i915_request_completed(rq)) + break; + + active = rq; + } + spin_unlock_irqrestore(&ce->guc_active.lock, flags); + + return active; +} + +static void __guc_reset_context(struct intel_context *ce, bool stalled) +{ + struct i915_request *rq; + u32 head; + + /* + * GuC will implicitly mark the context as non-schedulable + * when it sends the reset notification. Make sure our state + * reflects this change. The context will be marked enabled + * on resubmission. + */ + clr_context_enabled(ce); + + rq = context_find_active_request(ce); + if (!rq) { + head = ce->ring->tail; + stalled = false; + goto out_replay; + }
if (!i915_request_started(rq)) stalled = false;
+ GEM_BUG_ON(i915_active_is_idle(&ce->active)); + head = intel_ring_wrap(ce->ring, rq->head); __i915_request_reset(rq, stalled); - guc_reset_state(rq->context, engine, rq->head, stalled);
-out_unlock: - spin_unlock_irqrestore(&engine->sched_engine->lock, flags); +out_replay: + guc_reset_state(ce, head, stalled); + __unwind_incomplete_requests(ce); }
-static void guc_reset_cancel(struct intel_engine_cs *engine) +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled) +{ + struct intel_context *ce; + unsigned long index; + + if (unlikely(!guc_submission_initialized(guc))) + /* Reset called during driver load? GuC not yet initialised! */ + return; + + xa_for_each(&guc->context_lookup, index, ce) + if (intel_context_is_pinned(ce)) + __guc_reset_context(ce, stalled); + + /* GuC is blown away, drop all references to contexts */ + xa_destroy(&guc->context_lookup); +} + +static void guc_cancel_context_requests(struct intel_context *ce) +{ + struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine; + struct i915_request *rq; + unsigned long flags; + + /* Mark all executing requests as skipped. */ + spin_lock_irqsave(&sched_engine->lock, flags); + spin_lock(&ce->guc_active.lock); + list_for_each_entry(rq, &ce->guc_active.requests, sched.link) + i915_request_put(i915_request_mark_eio(rq)); + spin_unlock(&ce->guc_active.lock); + spin_unlock_irqrestore(&sched_engine->lock, flags); +} + +static void +guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine) { - struct i915_sched_engine * const sched_engine = engine->sched_engine; struct i915_request *rq, *rn; struct rb_node *rb; unsigned long flags;
/* Can be called during boot if GuC fails to load */ - if (!engine->gt) + if (!sched_engine) return;
- ENGINE_TRACE(engine, "\n"); - /* * Before we call engine->cancel_requests(), we should have exclusive * access to the submission state. This is arranged for us by the @@ -553,21 +834,16 @@ static void guc_reset_cancel(struct intel_engine_cs *engine) */ spin_lock_irqsave(&sched_engine->lock, flags);
- /* Mark all executing requests as skipped. */ - list_for_each_entry(rq, &sched_engine->requests, sched.link) { - i915_request_set_error_once(rq, -EIO); - i915_request_mark_complete(rq); - } - /* Flush the queued requests to the timeline list (for retiring). */ while ((rb = rb_first_cached(&sched_engine->queue))) { struct i915_priolist *p = to_priolist(rb);
priolist_for_each_request_consume(rq, rn, p) { list_del_init(&rq->sched.link); + __i915_request_submit(rq); - dma_fence_set_error(&rq->fence, -EIO); - i915_request_mark_complete(rq); + + i915_request_put(i915_request_mark_eio(rq)); }
rb_erase_cached(&p->node, &sched_engine->queue); @@ -582,14 +858,38 @@ static void guc_reset_cancel(struct intel_engine_cs *engine) spin_unlock_irqrestore(&sched_engine->lock, flags); }
-static void guc_reset_finish(struct intel_engine_cs *engine) +void intel_guc_submission_cancel_requests(struct intel_guc *guc) { - if (__tasklet_enable(&engine->sched_engine->tasklet)) - /* And kick in case we missed a new request submission. */ - tasklet_hi_schedule(&engine->sched_engine->tasklet); + struct intel_context *ce; + unsigned long index; + + xa_for_each(&guc->context_lookup, index, ce) + if (intel_context_is_pinned(ce)) + guc_cancel_context_requests(ce);
- ENGINE_TRACE(engine, "depth->%d\n", - atomic_read(&engine->sched_engine->tasklet.count)); + guc_cancel_sched_engine_requests(guc->sched_engine); + + /* GuC is blown away, drop all references to contexts */ + xa_destroy(&guc->context_lookup); +} + +void intel_guc_submission_reset_finish(struct intel_guc *guc) +{ + /* Reset called during driver load or during wedge? */ + if (unlikely(!guc_submission_initialized(guc) || + test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags))) + return; + + /* + * Technically possible for either of these values to be non-zero here, + * but very unlikely + harmless. Regardless let's add a warn so we can + * see in CI if this happens frequently / a precursor to taking down the + * machine. + */ + GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h)); + atomic_set(&guc->outstanding_submission_g2h, 0); + + enable_submission(guc); }
/* @@ -656,6 +956,9 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc, else trace_i915_request_guc_submit(rq);
+ if (unlikely(ret == -EIO)) + disable_submission(guc); + return ret; }
@@ -668,7 +971,8 @@ static void guc_submit_request(struct i915_request *rq) /* Will be called from irq-context when using foreign fences. */ spin_lock_irqsave(&sched_engine->lock, flags);
- if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine)) + if (submission_disabled(guc) || guc->stalled_request || + !i915_sched_engine_is_empty(sched_engine)) queue_request(sched_engine, rq, rq_prio(rq)); else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY) tasklet_hi_schedule(&sched_engine->tasklet); @@ -805,7 +1109,8 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
static int __guc_action_register_context(struct intel_guc *guc, u32 guc_id, - u32 offset) + u32 offset, + bool loop) { u32 action[] = { INTEL_GUC_ACTION_REGISTER_CONTEXT, @@ -813,10 +1118,10 @@ static int __guc_action_register_context(struct intel_guc *guc, offset, };
- return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true); + return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, loop); }
-static int register_context(struct intel_context *ce) +static int register_context(struct intel_context *ce, bool loop) { struct intel_guc *guc = ce_to_guc(ce); u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) + @@ -824,11 +1129,12 @@ static int register_context(struct intel_context *ce)
trace_intel_context_register(ce);
- return __guc_action_register_context(guc, ce->guc_id, offset); + return __guc_action_register_context(guc, ce->guc_id, offset, loop); }
static int __guc_action_deregister_context(struct intel_guc *guc, - u32 guc_id) + u32 guc_id, + bool loop) { u32 action[] = { INTEL_GUC_ACTION_DEREGISTER_CONTEXT, @@ -836,16 +1142,16 @@ static int __guc_action_deregister_context(struct intel_guc *guc, };
return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), - G2H_LEN_DW_DEREGISTER_CONTEXT, true); + G2H_LEN_DW_DEREGISTER_CONTEXT, loop); }
-static int deregister_context(struct intel_context *ce, u32 guc_id) +static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop) { struct intel_guc *guc = ce_to_guc(ce);
trace_intel_context_deregister(ce);
- return __guc_action_deregister_context(guc, guc_id); + return __guc_action_deregister_context(guc, guc_id, loop); }
static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask) @@ -874,7 +1180,7 @@ static void guc_context_policy_init(struct intel_engine_cs *engine, desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US; }
-static int guc_lrc_desc_pin(struct intel_context *ce) +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop) { struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm; @@ -920,18 +1226,44 @@ static int guc_lrc_desc_pin(struct intel_context *ce) */ if (context_registered) { trace_intel_context_steal_guc_id(ce); - set_context_wait_for_deregister_to_register(ce); - intel_context_get(ce); + if (!loop) { + set_context_wait_for_deregister_to_register(ce); + intel_context_get(ce); + } else { + bool disabled; + unsigned long flags; + + /* Seal race with Reset */ + spin_lock_irqsave(&ce->guc_state.lock, flags); + disabled = submission_disabled(guc); + if (likely(!disabled)) { + set_context_wait_for_deregister_to_register(ce); + intel_context_get(ce); + } + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + if (unlikely(disabled)) { + reset_lrc_desc(guc, desc_idx); + return 0; /* Will get registered later */ + } + }
/* * If stealing the guc_id, this ce has the same guc_id as the * context whose guc_id was stolen. */ with_intel_runtime_pm(runtime_pm, wakeref) - ret = deregister_context(ce, ce->guc_id); + ret = deregister_context(ce, ce->guc_id, loop); + if (unlikely(ret == -EBUSY)) { + clr_context_wait_for_deregister_to_register(ce); + intel_context_put(ce); + } } else { with_intel_runtime_pm(runtime_pm, wakeref) - ret = register_context(ce); + ret = register_context(ce, loop); + if (unlikely(ret == -EBUSY)) + reset_lrc_desc(guc, desc_idx); + else if (unlikely(ret == -ENODEV)) + ret = 0; /* Will get registered later */ }
return ret; @@ -994,7 +1326,6 @@ static void __guc_context_sched_disable(struct intel_guc *guc, GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
trace_intel_context_sched_disable(ce); - intel_context_get(ce);
guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true); @@ -1004,6 +1335,7 @@ static u16 prep_context_pending_disable(struct intel_context *ce) { set_context_pending_disable(ce); clr_context_enabled(ce); + intel_context_get(ce);
return ce->guc_id; } @@ -1016,7 +1348,7 @@ static void guc_context_sched_disable(struct intel_context *ce) u16 guc_id; intel_wakeref_t wakeref;
- if (context_guc_id_invalid(ce) || + if (submission_disabled(guc) || context_guc_id_invalid(ce) || !lrc_desc_registered(guc, ce->guc_id)) { clr_context_enabled(ce); goto unpin; @@ -1034,6 +1366,7 @@ static void guc_context_sched_disable(struct intel_context *ce) * request doesn't slip through the 'context_pending_disable' fence. */ if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) { + spin_unlock_irqrestore(&ce->guc_state.lock, flags); return; } guc_id = prep_context_pending_disable(ce); @@ -1050,19 +1383,13 @@ static void guc_context_sched_disable(struct intel_context *ce)
static inline void guc_lrc_desc_unpin(struct intel_context *ce) { - struct intel_engine_cs *engine = ce->engine; - struct intel_guc *guc = &engine->gt->uc.guc; - unsigned long flags; + struct intel_guc *guc = ce_to_guc(ce);
GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id)); GEM_BUG_ON(ce != __get_context(guc, ce->guc_id)); GEM_BUG_ON(context_enabled(ce));
- spin_lock_irqsave(&ce->guc_state.lock, flags); - set_context_destroyed(ce); - spin_unlock_irqrestore(&ce->guc_state.lock, flags); - - deregister_context(ce, ce->guc_id); + deregister_context(ce, ce->guc_id, true); }
static void __guc_context_destroy(struct intel_context *ce) @@ -1090,13 +1417,15 @@ static void guc_context_destroy(struct kref *kref) struct intel_guc *guc = &ce->engine->gt->uc.guc; intel_wakeref_t wakeref; unsigned long flags; + bool disabled;
/* * If the guc_id is invalid this context has been stolen and we can free * it immediately. Also can be freed immediately if the context is not * registered with the GuC. */ - if (context_guc_id_invalid(ce) || + if (submission_disabled(guc) || + context_guc_id_invalid(ce) || !lrc_desc_registered(guc, ce->guc_id)) { release_guc_id(guc, ce); __guc_context_destroy(ce); @@ -1123,6 +1452,18 @@ static void guc_context_destroy(struct kref *kref) list_del_init(&ce->guc_id_link); spin_unlock_irqrestore(&guc->contexts_lock, flags);
+ /* Seal race with Reset */ + spin_lock_irqsave(&ce->guc_state.lock, flags); + disabled = submission_disabled(guc); + if (likely(!disabled)) + set_context_destroyed(ce); + spin_unlock_irqrestore(&ce->guc_state.lock, flags); + if (unlikely(disabled)) { + release_guc_id(guc, ce); + __guc_context_destroy(ce); + return; + } + /* * We defer GuC context deregistration until the context is destroyed * in order to save on CTBs. With this optimization ideally we only need @@ -1145,6 +1486,33 @@ static int guc_context_alloc(struct intel_context *ce) return lrc_alloc(ce, ce->engine); }
+static void add_to_context(struct i915_request *rq) +{ + struct intel_context *ce = rq->context; + + spin_lock(&ce->guc_active.lock); + list_move_tail(&rq->sched.link, &ce->guc_active.requests); + spin_unlock(&ce->guc_active.lock); +} + +static void remove_from_context(struct i915_request *rq) +{ + struct intel_context *ce = rq->context; + + spin_lock_irq(&ce->guc_active.lock); + + list_del_init(&rq->sched.link); + clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); + + /* Prevent further __await_execution() registering a cb, then flush */ + set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags); + + spin_unlock_irq(&ce->guc_active.lock); + + atomic_dec(&ce->guc_id_ref); + i915_request_notify_execute_cb_imm(rq); +} + static const struct intel_context_ops guc_context_ops = { .alloc = guc_context_alloc,
@@ -1183,8 +1551,6 @@ static void guc_signal_context_fence(struct intel_context *ce) { unsigned long flags;
- GEM_BUG_ON(!context_wait_for_deregister_to_register(ce)); - spin_lock_irqsave(&ce->guc_state.lock, flags); clr_context_wait_for_deregister_to_register(ce); __guc_signal_context_fence(ce); @@ -1193,8 +1559,9 @@ static void guc_signal_context_fence(struct intel_context *ce)
static bool context_needs_register(struct intel_context *ce, bool new_guc_id) { - return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) || - !lrc_desc_registered(ce_to_guc(ce), ce->guc_id); + return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) || + !lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) && + !submission_disabled(ce_to_guc(ce)); }
static int guc_request_alloc(struct i915_request *rq) @@ -1252,8 +1619,12 @@ static int guc_request_alloc(struct i915_request *rq) if (unlikely(ret < 0)) return ret; if (context_needs_register(ce, !!ret)) { - ret = guc_lrc_desc_pin(ce); + ret = guc_lrc_desc_pin(ce, true); if (unlikely(ret)) { /* unwind */ + if (ret == -EIO) { + disable_submission(guc); + goto out; /* GPU will be reset */ + } atomic_dec(&ce->guc_id_ref); unpin_guc_id(guc, ce); return ret; @@ -1290,20 +1661,6 @@ static int guc_request_alloc(struct i915_request *rq) return 0; }
-static struct intel_engine_cs * -guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling) -{ - struct intel_engine_cs *engine; - intel_engine_mask_t tmp, mask = ve->mask; - unsigned int num_siblings = 0; - - for_each_engine_masked(engine, ve->gt, mask, tmp) - if (num_siblings++ == sibling) - return engine; - - return NULL; -} - static int guc_virtual_context_pre_pin(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr) @@ -1512,7 +1869,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc, { if (context_guc_id_invalid(ce)) pin_guc_id(guc, ce); - guc_lrc_desc_pin(ce); + guc_lrc_desc_pin(ce, true); }
static inline void guc_init_lrc_mapping(struct intel_guc *guc) @@ -1578,13 +1935,15 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine) engine->cops = &guc_context_ops; engine->request_alloc = guc_request_alloc; engine->bump_serial = guc_bump_serial; + engine->add_active_request = add_to_context; + engine->remove_active_request = remove_from_context;
engine->sched_engine->schedule = i915_schedule;
- engine->reset.prepare = guc_reset_prepare; - engine->reset.rewind = guc_reset_rewind; - engine->reset.cancel = guc_reset_cancel; - engine->reset.finish = guc_reset_finish; + engine->reset.prepare = guc_reset_nop; + engine->reset.rewind = guc_rewind_nop; + engine->reset.cancel = guc_reset_nop; + engine->reset.finish = guc_reset_nop;
engine->emit_flush = gen8_emit_flush_xcs; engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb; @@ -1757,7 +2116,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc, * register this context. */ with_intel_runtime_pm(runtime_pm, wakeref) - register_context(ce); + register_context(ce, true); guc_signal_context_fence(ce); intel_context_put(ce); } else if (context_destroyed(ce)) { @@ -1939,6 +2298,10 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count) "v%dx%d", ve->base.class, count); ve->base.context_size = sibling->context_size;
+ ve->base.add_active_request = + sibling->add_active_request; + ve->base.remove_active_request = + sibling->remove_active_request; ve->base.emit_bb_start = sibling->emit_bb_start; ve->base.emit_flush = sibling->emit_flush; ve->base.emit_init_breadcrumb = diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c index 6d8b9233214e..f0b02200aa01 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c @@ -565,12 +565,49 @@ void intel_uc_reset_prepare(struct intel_uc *uc) { struct intel_guc *guc = &uc->guc;
- if (!intel_guc_is_ready(guc)) + + /* Nothing to do if GuC isn't supported */ + if (!intel_uc_supports_guc(uc)) return;
+ /* Firmware expected to be running when this function is called */ + if (!intel_guc_is_ready(guc)) + goto sanitize; + + if (intel_uc_uses_guc_submission(uc)) + intel_guc_submission_reset_prepare(guc); + +sanitize: __uc_sanitize(uc); }
+void intel_uc_reset(struct intel_uc *uc, bool stalled) +{ + struct intel_guc *guc = &uc->guc; + + /* Firmware can not be running when this function is called */ + if (intel_uc_uses_guc_submission(uc)) + intel_guc_submission_reset(guc, stalled); +} + +void intel_uc_reset_finish(struct intel_uc *uc) +{ + struct intel_guc *guc = &uc->guc; + + /* Firmware expected to be running when this function is called */ + if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc)) + intel_guc_submission_reset_finish(guc); +} + +void intel_uc_cancel_requests(struct intel_uc *uc) +{ + struct intel_guc *guc = &uc->guc; + + /* Firmware can not be running when this function is called */ + if (intel_uc_uses_guc_submission(uc)) + intel_guc_submission_cancel_requests(guc); +} + void intel_uc_runtime_suspend(struct intel_uc *uc) { struct intel_guc *guc = &uc->guc; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h index c4cef885e984..eaa3202192ac 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h @@ -37,6 +37,9 @@ void intel_uc_driver_late_release(struct intel_uc *uc); void intel_uc_driver_remove(struct intel_uc *uc); void intel_uc_init_mmio(struct intel_uc *uc); void intel_uc_reset_prepare(struct intel_uc *uc); +void intel_uc_reset(struct intel_uc *uc, bool stalled); +void intel_uc_reset_finish(struct intel_uc *uc); +void intel_uc_cancel_requests(struct intel_uc *uc); void intel_uc_suspend(struct intel_uc *uc); void intel_uc_runtime_suspend(struct intel_uc *uc); int intel_uc_resume(struct intel_uc *uc); diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index dec5a35c9aa2..192784875a1d 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -194,7 +194,7 @@ static bool irq_work_imm(struct irq_work *wrk) return false; }
-static void __notify_execute_cb_imm(struct i915_request *rq) +void i915_request_notify_execute_cb_imm(struct i915_request *rq) { __notify_execute_cb(rq, irq_work_imm); } @@ -268,37 +268,6 @@ i915_request_active_engine(struct i915_request *rq, return ret; }
- -static void remove_from_engine(struct i915_request *rq) -{ - struct intel_engine_cs *engine, *locked; - - /* - * Virtual engines complicate acquiring the engine timeline lock, - * as their rq->engine pointer is not stable until under that - * engine lock. The simple ploy we use is to take the lock then - * check that the rq still belongs to the newly locked engine. - */ - locked = READ_ONCE(rq->engine); - spin_lock_irq(&locked->sched_engine->lock); - while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) { - spin_unlock(&locked->sched_engine->lock); - spin_lock(&engine->sched_engine->lock); - locked = engine; - } - list_del_init(&rq->sched.link); - - clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); - clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags); - - /* Prevent further __await_execution() registering a cb, then flush */ - set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags); - - spin_unlock_irq(&locked->sched_engine->lock); - - __notify_execute_cb_imm(rq); -} - static void __rq_init_watchdog(struct i915_request *rq) { rq->watchdog.timer.function = NULL; @@ -395,9 +364,7 @@ bool i915_request_retire(struct i915_request *rq) * after removing the breadcrumb and signaling it, so that we do not * inadvertently attach the breadcrumb to a completed request. */ - if (!list_empty(&rq->sched.link)) - remove_from_engine(rq); - atomic_dec(&rq->context->guc_id_ref); + rq->engine->remove_active_request(rq); GEM_BUG_ON(!llist_empty(&rq->execute_cb));
__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */ @@ -539,7 +506,7 @@ __await_execution(struct i915_request *rq, if (llist_add(&cb->work.node.llist, &signal->execute_cb)) { if (i915_request_is_active(signal) || __request_in_flight(signal)) - __notify_execute_cb_imm(signal); + i915_request_notify_execute_cb_imm(signal); }
return 0; @@ -676,7 +643,7 @@ bool __i915_request_submit(struct i915_request *request) result = true;
GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags)); - list_move_tail(&request->sched.link, &engine->sched_engine->requests); + engine->add_active_request(request); active: clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags); set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags); diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index f870cd75a001..bcc6340c505e 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -649,4 +649,6 @@ bool i915_request_active_engine(struct i915_request *rq, struct intel_engine_cs **active);
+void i915_request_notify_execute_cb_imm(struct i915_request *rq); + #endif /* I915_REQUEST_H */
On 6/24/2021 00:05, Matthew Brost wrote:
Reset implementation for the new GuC interface. This is the legacy reset implementation, which is called when the i915 owns the engine hang check. Future patches will offload the engine hang check to the GuC, but we will continue to maintain this legacy path as a fallback; this code path is also required if the GuC dies.
With the new GuC interface it is not possible to reset individual engines - it is only possible to reset the GPU entirely. This patch forces an entire chip reset if any engine hangs.
There seems to be quite a lot more code being changed in the patch than is described above. Sure, it's all in order to support resets, but there is a lot happening to request/context management, support for GuC submission enable/disable, etc. It feels like this patch really should be split into a couple of prep patches followed by the actual reset support. Plus see a couple of minor comments below.
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
 drivers/gpu/drm/i915/gt/intel_context.c       |   3 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   6 +
 .../drm/i915/gt/intel_execlists_submission.c  |  40 ++
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   6 +-
 drivers/gpu/drm/i915/gt/intel_reset.c         |  18 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  22 +
 drivers/gpu/drm/i915/gt/mock_engine.c         |  31 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  13 -
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   8 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 581 ++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  39 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   3 +
 drivers/gpu/drm/i915/i915_request.c           |  41 +-
 drivers/gpu/drm/i915/i915_request.h           |   2 +
 15 files changed, 649 insertions(+), 171 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index b24a1b7a3f88..2f01437056a8 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -392,6 +392,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine) spin_lock_init(&ce->guc_state.lock); INIT_LIST_HEAD(&ce->guc_state.fences);
+	spin_lock_init(&ce->guc_active.lock);
+	INIT_LIST_HEAD(&ce->guc_active.requests);
+
 	ce->guc_id = GUC_INVALID_LRC_ID;
 	INIT_LIST_HEAD(&ce->guc_id_link);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 6945963a31ba..b63c8cf7823b 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -165,6 +165,13 @@ struct intel_context { struct list_head fences; } guc_state;
+	struct {
+		/** lock: protects everything in guc_active */
+		spinlock_t lock;
+		/** requests: active requests on this context */
+		struct list_head requests;
+	} guc_active;
+
 	/* GuC scheduling state that does not require a lock. */
 	atomic_t guc_sched_state_no_lock;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index e7cb6a06db9d..f9d264c008e8 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -426,6 +426,12 @@ struct intel_engine_cs {
void (*release)(struct intel_engine_cs *engine);
+	/*
+	 * Add / remove request from engine active tracking
+	 */
+	void (*add_active_request)(struct i915_request *rq);
+	void (*remove_active_request)(struct i915_request *rq);
+
struct intel_engine_execlists execlists;
/*
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index c10ea6080752..c301a2d088b1 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3118,6 +3118,42 @@ static void execlists_park(struct intel_engine_cs *engine) cancel_timer(&engine->execlists.preempt); }
+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&locked->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static bool can_preempt(struct intel_engine_cs *engine)
 {
 	if (GRAPHICS_VER(engine->i915) > 8)
@@ -3218,6 +3254,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine) engine->cops = &execlists_context_ops; engine->request_alloc = execlists_request_alloc; engine->bump_serial = execlist_bump_serial;
+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;
engine->reset.prepare = execlists_reset_prepare; engine->reset.rewind = execlists_reset_rewind;
@@ -3912,6 +3950,8 @@ execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count) "v%dx%d", ve->base.class, count); ve->base.context_size = sibling->context_size;
ve->base.add_active_request = sibling->add_active_request;
ve->base.emit_bb_start = sibling->emit_bb_start; ve->base.emit_flush = sibling->emit_flush; ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;ve->base.remove_active_request = sibling->remove_active_request;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index aef3084e8b16..463a6ae605a0 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 	if (intel_gt_is_wedged(gt))
 		intel_gt_unset_wedged(gt);
 
-	intel_uc_sanitize(&gt->uc);
-
 	for_each_engine(engine, gt, id)
 		if (engine->reset.prepare)
 			engine->reset.prepare(engine);
@@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 			__intel_engine_reset(engine, false);
 	}
 
+	intel_uc_reset(&gt->uc, false);
+
 	for_each_engine(engine, gt, id)
 		if (engine->reset.finish)
 			engine->reset.finish(engine);
@@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
 		goto err_wedged;
 	}
 
+	intel_uc_reset_finish(&gt->uc);
+
 	intel_rps_enable(&gt->rps);
 	intel_llc_enable(&gt->llc);
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 72251638d4ea..2987282dff6d 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
 		__intel_engine_reset(engine, stalled_mask & engine->mask);
 	local_bh_enable();
 
+	intel_uc_reset(&gt->uc, true);
+
 	intel_ggtt_restore_fences(gt->ggtt);
 
 	return err;
@@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
 		if (awake & engine->mask)
 			intel_engine_pm_put(engine);
 	}
+
+	intel_uc_reset_finish(&gt->uc);
 }
 
 static void nop_submit_request(struct i915_request *request)
@@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
 	for_each_engine(engine, gt, id)
 		if (engine->reset.cancel)
 			engine->reset.cancel(engine);
+	intel_uc_cancel_requests(&gt->uc);
 	local_bh_enable();
 
 	reset_finish(gt, awake);
@@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
 	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
 
+	if (intel_engine_uses_guc(engine))
+		return -ENODEV;
+
 	if (!intel_engine_pm_get_if_awake(engine))
 		return 0;
 
@@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 			   "Resetting %s for %s\n", engine->name, msg);
 	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
 
-	if (intel_engine_uses_guc(engine))
-		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
-	else
-		ret = intel_gt_reset_engine(engine);
+	ret = intel_gt_reset_engine(engine);
 	if (ret) {
 		/* If we fail here, we expect to fallback to a global reset */
-		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
+		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
 		goto out;
 	}
 
@@ -1341,7 +1346,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
+	if (!intel_uc_uses_guc_submission(&gt->uc) &&
+	    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
 		local_bh_disable();
 		for_each_engine_masked(engine, gt, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index e1506b280df1..99dcdc8fba12 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -1049,6 +1049,25 @@ static void ring_bump_serial(struct intel_engine_cs *engine)
 	engine->serial++;
 }
 
+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	spin_lock_irq(&rq->engine->sched_engine->lock);
+	list_del_init(&rq->sched.link);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&rq->engine->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static void setup_common(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
@@ -1066,6 +1085,9 @@ static void setup_common(struct intel_engine_cs *engine)
 	engine->reset.cancel = reset_cancel;
 	engine->reset.finish = reset_finish;
 
+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;
+
 	engine->cops = &ring_context_ops;
 	engine->request_alloc = ring_request_alloc;
 	engine->bump_serial = ring_bump_serial;
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index fc5a65ab1937..c12ff3a75ce6 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -235,6 +235,35 @@ static void mock_submit_request(struct i915_request *request)
 	spin_unlock_irqrestore(&engine->hw_lock, flags);
 }
 
+static void mock_add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void mock_remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+	spin_unlock_irq(&locked->sched_engine->lock);
+}
+
 static void mock_reset_prepare(struct intel_engine_cs *engine)
 {
 }
@@ -327,6 +356,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.emit_flush = mock_emit_flush;
 	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
 	engine->base.submit_request = mock_submit_request;
+	engine->base.add_active_request = mock_add_to_engine;
+	engine->base.remove_active_request = mock_remove_from_engine;
 
 	engine->base.reset.prepare = mock_reset_prepare;
 	engine->base.reset.rewind = mock_reset_rewind;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 6661dcb02239..9b09395b998f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -572,19 +572,6 @@ int intel_guc_suspend(struct intel_guc *guc)
 	return 0;
 }
 
-/**
- * intel_guc_reset_engine() - ask GuC to reset an engine
- * @guc:	intel_guc structure
- * @engine:	engine to be reset
- */
-int intel_guc_reset_engine(struct intel_guc *guc,
-			   struct intel_engine_cs *engine)
-{
-	/* XXX: to be implemented with submission interface rework */
-	return -ENODEV;
-}
-
 /**
  * intel_guc_resume() - notify GuC resuming from suspend state
  * @guc:	the guc
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 22eb1e9cca41..40c9868762d7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -242,14 +242,16 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
 
 int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
 
-int intel_guc_reset_engine(struct intel_guc *guc,
-			   struct intel_engine_cs *engine);
-
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
					  const u32 *msg, u32 len);
 int intel_guc_sched_done_process_msg(struct intel_guc *guc,
				     const u32 *msg, u32 len);
 
+void intel_guc_submission_reset_prepare(struct intel_guc *guc);
+void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
+void intel_guc_submission_reset_finish(struct intel_guc *guc);
+void intel_guc_submission_cancel_requests(struct intel_guc *guc);
+
 void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 83058df5ba01..b8c894ad8caf 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -141,7 +141,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce)
 static inline void
 set_context_wait_for_deregister_to_register(struct intel_context *ce)
 {
-	/* Only should be called from guc_lrc_desc_pin() */
+	/* Only should be called from guc_lrc_desc_pin() without lock */
 	ce->guc_state.sched_state |=
 		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
 }
@@ -241,15 +241,31 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
 
 static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
 {
+	guc->lrc_desc_pool_vaddr = NULL;
 	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
 }
 
+static inline bool guc_submission_initialized(struct intel_guc *guc)
+{
+	return guc->lrc_desc_pool_vaddr != NULL;
+}
+
 static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
 {
-	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+	if (likely(guc_submission_initialized(guc))) {
+		struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+		unsigned long flags;
 
-	memset(desc, 0, sizeof(*desc));
-	xa_erase_irq(&guc->context_lookup, id);
+		memset(desc, 0, sizeof(*desc));
+
+		/*
+		 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
+		 * the lower level functions directly.
+		 */
+		xa_lock_irqsave(&guc->context_lookup, flags);
+		__xa_erase(&guc->context_lookup, id);
+		xa_unlock_irqrestore(&guc->context_lookup, flags);
+	}
 }
 
 static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
@@ -260,7 +276,15 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
 static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
					   struct intel_context *ce)
 {
-	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	unsigned long flags;
+
+	/*
+	 * xarray API doesn't have xa_save_irqsave wrapper, so calling the
+	 * lower level functions directly.
+	 */
+	xa_lock_irqsave(&guc->context_lookup, flags);
+	__xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	xa_unlock_irqrestore(&guc->context_lookup, flags);
 }
 
 static int guc_submission_busy_loop(struct intel_guc *guc,
@@ -331,6 +355,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
					      interruptible, timeout);
 }
 
+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
+
 static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	int err;
@@ -338,11 +364,22 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	u32 action[3];
 	int len = 0;
 	u32 g2h_len_dw = 0;
-	bool enabled = context_enabled(ce);
+	bool enabled;
 
 	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
 	GEM_BUG_ON(context_guc_id_invalid(ce));
 
+	/*
+	 * Corner case where the GuC firmware was blown away and reloaded while
+	 * this context was pinned.
+	 */
+	if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
+		err = guc_lrc_desc_pin(ce, false);
+		if (unlikely(err))
+			goto out;
+	}
+	enabled = context_enabled(ce);
+
 	if (!enabled) {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
 		action[len++] = ce->guc_id;
@@ -365,6 +402,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		intel_context_put(ce);
 	}
 
+out:
 	return err;
 }
@@ -419,15 +457,10 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 	if (submit) {
 		guc_set_lrc_tail(last);
 resubmit:
-		/*
-		 * We only check for -EBUSY here even though it is possible for
-		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
-		 * died and a full GPU needs to be done. The hangcheck will
-		 * eventually detect that the GuC has died and trigger this
-		 * reset so no need to handle -EDEADLK here.
-		 */
 		ret = guc_add_request(guc, last);
-		if (ret == -EBUSY) {
+		if (unlikely(ret == -EIO))
+			goto deadlk;
+		else if (ret == -EBUSY) {
 			tasklet_schedule(&sched_engine->tasklet);
 			guc->stalled_request = last;
 			return false;
@@ -437,6 +470,11 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 
 	guc->stalled_request = NULL;
 	return submit;
+
+deadlk:
+	sched_engine->tasklet.callback = NULL;
+	tasklet_disable_nosync(&sched_engine->tasklet);
+	return false;
 }
 
 static void guc_submission_tasklet(struct tasklet_struct *t)
@@ -463,27 +501,165 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
 		intel_engine_signal_breadcrumbs(engine);
 }
 
-static void guc_reset_prepare(struct intel_engine_cs *engine)
+static void __guc_context_destroy(struct intel_context *ce);
+static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
+static void guc_signal_context_fence(struct intel_context *ce);
+
+static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
+{
+	struct intel_context *ce;
+	unsigned long index, flags;
+	bool pending_disable, pending_enable, deregister, destroyed;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		/* Flush context */
+		spin_lock_irqsave(&ce->guc_state.lock, flags);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		/*
+		 * Once we are at this point submission_disabled() is guaranteed
+		 * to be visible to all callers who set the below flags (see above
+		 * flush and flushes in reset_prepare). If submission_disabled()
+		 * is set, the caller shouldn't set these flags.
+		 */
+		destroyed = context_destroyed(ce);
+		pending_enable = context_pending_enable(ce);
+		pending_disable = context_pending_disable(ce);
+		deregister = context_wait_for_deregister_to_register(ce);
+		init_sched_state(ce);
+
+		if (pending_enable || destroyed || deregister) {
+			atomic_dec(&guc->outstanding_submission_g2h);
+			if (deregister)
+				guc_signal_context_fence(ce);
+			if (destroyed) {
+				release_guc_id(guc, ce);
+				__guc_context_destroy(ce);
+			}
+			if (pending_enable || deregister)
+				intel_context_put(ce);
+		}
+
+		/* Not mutually exclusive with above if statement. */
+		if (pending_disable) {
+			guc_signal_context_fence(ce);
+			intel_context_sched_disable_unpin(ce);
+			atomic_dec(&guc->outstanding_submission_g2h);
+			intel_context_put(ce);
+		}
+	}
+}
+
+static inline bool
+submission_disabled(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+
+	return unlikely(!__tasklet_is_enabled(&sched_engine->tasklet));
+}
+
+static void disable_submission(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+
+	if (__tasklet_is_enabled(&sched_engine->tasklet)) {
+		GEM_BUG_ON(!guc->ct.enabled);
+		__tasklet_disable_sync_once(&sched_engine->tasklet);
+		sched_engine->tasklet.callback = NULL;
+	}
+}
+
+static void enable_submission(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->sched_engine->lock, flags);
+	sched_engine->tasklet.callback = guc_submission_tasklet;
+	wmb();
+	if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
+	    __tasklet_enable(&sched_engine->tasklet)) {
+		GEM_BUG_ON(!guc->ct.enabled);
+
+		/* And kick in case we missed a new request submission. */
+		tasklet_hi_schedule(&sched_engine->tasklet);
+	}
+	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+}
+
+static void guc_flush_submissions(struct intel_guc *guc)
 {
-	ENGINE_TRACE(engine, "\n");
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	unsigned long flags;
+
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+void intel_guc_submission_reset_prepare(struct intel_guc *guc)
+{
+	int i;
+
+	if (unlikely(!guc_submission_initialized(guc)))
+		/* Reset called during driver load? GuC not yet initialised! */
+		return;
Ew. Multi-line if without braces just looks broken even if one line is just a comment.
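For the record, the braced form that avoids the ambiguity would be something like this (style-only sketch, no functional change intended):

	if (unlikely(!guc_submission_initialized(guc))) {
		/* Reset called during driver load? GuC not yet initialised! */
		return;
	}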
+
+	disable_submission(guc);
+	guc->interrupts.disable(guc);
+
+	/* Flush IRQ handler */
+	spin_lock_irq(&guc_to_gt(guc)->irq_lock);
+	spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
+
+	guc_flush_submissions(guc);
 
-	/*
-	 * Prevent request submission to the hardware until we have
-	 * completed the reset in i915_gem_reset_finish(). If a request
-	 * is completed by one engine, it may then queue a request
-	 * to a second via its execlists->tasklet *just* as we are
-	 * calling engine->init_hw() and also writing the ELSP.
-	 * Turning off the execlists->tasklet until the reset is over
-	 * prevents the race.
-	 */
-	__tasklet_disable_sync_once(&engine->sched_engine->tasklet);
+	/*
+	 * Handle any outstanding G2Hs before reset. Call IRQ handler directly
+	 * each pass as interrupts have been disabled. We always scrub for
+	 * outstanding G2H as it is possible for outstanding_submission_g2h to
+	 * be incremented after the context state update.
+	 */
+	for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
+		intel_guc_to_host_event_handler(guc);
+#define wait_for_reset(guc, wait_var) \
+		guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
+		do {
+			wait_for_reset(guc, &guc->outstanding_submission_g2h);
+		} while (!list_empty(&guc->ct.requests.incoming));
+	}
+	scrub_guc_desc_for_outstanding_g2h(guc);
 }
-static void guc_reset_state(struct intel_context *ce,
-			    struct intel_engine_cs *engine,
-			    u32 head,
-			    bool scrub)
+static struct intel_engine_cs *
+guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
+{
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp, mask = ve->mask;
+	unsigned int num_siblings = 0;
+
+	for_each_engine_masked(engine, ve->gt, mask, tmp)
+		if (num_siblings++ == sibling)
+			return engine;
+
+	return NULL;
+}
+
+static inline struct intel_engine_cs *
+__context_to_physical_engine(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = ce->engine;
+
+	if (intel_engine_is_virtual(engine))
+		engine = guc_virtual_get_sibling(engine, 0);
+
+	return engine;
+}
+
+static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
 {
+	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
+
 	GEM_BUG_ON(!intel_context_is_pinned(ce));
 
 	/*
@@ -501,42 +677,147 @@ static void guc_reset_state(struct intel_context *ce,
 	lrc_update_regs(ce, engine, head);
 }
-static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
+static void guc_reset_nop(struct intel_engine_cs *engine)
+{
+}
+
+static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled)
+{
+}
+
+static void
+__unwind_incomplete_requests(struct intel_context *ce)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct i915_request *rq;
+	struct i915_request *rq, *rn;
+	struct list_head *pl;
+	int prio = I915_PRIORITY_INVALID;
+	struct i915_sched_engine * const sched_engine =
+		ce->engine->sched_engine;
 	unsigned long flags;
 
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_lock(&ce->guc_active.lock);
+	list_for_each_entry_safe(rq, rn,
+				 &ce->guc_active.requests,
+				 sched.link) {
+		if (i915_request_completed(rq))
+			continue;
+
+		list_del_init(&rq->sched.link);
+		spin_unlock(&ce->guc_active.lock);
+		__i915_request_unsubmit(rq);
+
+		/* Push the request back into the queue for later resubmission. */
+		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
+		if (rq_prio(rq) != prio) {
+			prio = rq_prio(rq);
+			pl = i915_sched_lookup_priolist(sched_engine, prio);
+		}
+		GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
 
-	/* Push back any incomplete requests for replay after the reset. */
-	rq = execlists_unwind_incomplete_requests(execlists);
-	if (!rq)
-		goto out_unlock;
+		list_add_tail(&rq->sched.link, pl);
+		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+
+		spin_lock(&ce->guc_active.lock);
+	}
+	spin_unlock(&ce->guc_active.lock);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+static struct i915_request *context_find_active_request(struct intel_context *ce)
+{
+	struct i915_request *rq, *active = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ce->guc_active.lock, flags);
+	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
+				    sched.link) {
+		if (i915_request_completed(rq))
+			break;
+
+		active = rq;
+	}
+	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+
+	return active;
+}
+
+static void __guc_reset_context(struct intel_context *ce, bool stalled)
+{
+	struct i915_request *rq;
+	u32 head;
+
+	/*
+	 * GuC will implicitly mark the context as non-schedulable
+	 * when it sends the reset notification. Make sure our state
+	 * reflects this change. The context will be marked enabled
+	 * on resubmission.
+	 */
+	clr_context_enabled(ce);
+
+	rq = context_find_active_request(ce);
+	if (!rq) {
+		head = ce->ring->tail;
+		stalled = false;
+		goto out_replay;
+	}
 
 	if (!i915_request_started(rq))
 		stalled = false;
 
 	GEM_BUG_ON(i915_active_is_idle(&ce->active));
+	head = intel_ring_wrap(ce->ring, rq->head);
 	__i915_request_reset(rq, stalled);
-	guc_reset_state(rq->context, engine, rq->head, stalled);
 
-out_unlock:
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+out_replay:
+	guc_reset_state(ce, head, stalled);
+	__unwind_incomplete_requests(ce);
 }
-static void guc_reset_cancel(struct intel_engine_cs *engine)
+void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
+{
+	struct intel_context *ce;
+	unsigned long index;
+
+	if (unlikely(!guc_submission_initialized(guc)))
+		/* Reset called during driver load? GuC not yet initialised! */
+		return;
And again.
+
+	xa_for_each(&guc->context_lookup, index, ce)
+		if (intel_context_is_pinned(ce))
+			__guc_reset_context(ce, stalled);
+
+	/* GuC is blown away, drop all references to contexts */
+	xa_destroy(&guc->context_lookup);
+}
+static void guc_cancel_context_requests(struct intel_context *ce)
+{
+	struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
+	struct i915_request *rq;
+	unsigned long flags;
+
+	/* Mark all executing requests as skipped. */
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_lock(&ce->guc_active.lock);
+	list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
+		i915_request_put(i915_request_mark_eio(rq));
+	spin_unlock(&ce->guc_active.lock);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+static void
+guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
 {
-	struct i915_sched_engine * const sched_engine = engine->sched_engine;
 	struct i915_request *rq, *rn;
 	struct rb_node *rb;
 	unsigned long flags;
 
 	/* Can be called during boot if GuC fails to load */
-	if (!engine->gt)
+	if (!sched_engine)
 		return;
 
-	ENGINE_TRACE(engine, "\n");
-
 	/*
 	 * Before we call engine->cancel_requests(), we should have exclusive
 	 * access to the submission state. This is arranged for us by the
@@ -553,21 +834,16 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	 */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	/* Mark all executing requests as skipped. */
-	list_for_each_entry(rq, &sched_engine->requests, sched.link) {
-		i915_request_set_error_once(rq, -EIO);
-		i915_request_mark_complete(rq);
-	}
-
 	/* Flush the queued requests to the timeline list (for retiring). */
 	while ((rb = rb_first_cached(&sched_engine->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 
 		priolist_for_each_request_consume(rq, rn, p) {
 			list_del_init(&rq->sched.link);
+
 			__i915_request_submit(rq);
-			dma_fence_set_error(&rq->fence, -EIO);
-			i915_request_mark_complete(rq);
+
+			i915_request_put(i915_request_mark_eio(rq));
 		}
 
 		rb_erase_cached(&p->node, &sched_engine->queue);
@@ -582,14 +858,38 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
-static void guc_reset_finish(struct intel_engine_cs *engine)
+void intel_guc_submission_cancel_requests(struct intel_guc *guc)
 {
-	if (__tasklet_enable(&engine->sched_engine->tasklet))
-		/* And kick in case we missed a new request submission. */
-		tasklet_hi_schedule(&engine->sched_engine->tasklet);
+	struct intel_context *ce;
+	unsigned long index;
+
+	xa_for_each(&guc->context_lookup, index, ce)
+		if (intel_context_is_pinned(ce))
+			guc_cancel_context_requests(ce);
 
-	ENGINE_TRACE(engine, "depth->%d\n",
-		     atomic_read(&engine->sched_engine->tasklet.count));
+	guc_cancel_sched_engine_requests(guc->sched_engine);
+
+	/* GuC is blown away, drop all references to contexts */
+	xa_destroy(&guc->context_lookup);
This function shares most of the code with 'intel_guc_submission_reset'. Can the xa clean be moved to a common helper?
John.
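Something along these lines, perhaps (untested sketch; the helper name and the callback shape are made up for illustration, and the reset path would still need a way to pass 'stalled' through to __guc_reset_context):

	static void
	guc_scrub_all_contexts(struct intel_guc *guc,
			       void (*fn)(struct intel_context *ce))
	{
		struct intel_context *ce;
		unsigned long index;

		xa_for_each(&guc->context_lookup, index, ce)
			if (intel_context_is_pinned(ce))
				fn(ce);

		/* GuC is blown away, drop all references to contexts */
		xa_destroy(&guc->context_lookup);
	}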
+}
+
+void intel_guc_submission_reset_finish(struct intel_guc *guc)
+{
+	/* Reset called during driver load or during wedge? */
+	if (unlikely(!guc_submission_initialized(guc) ||
+		     test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags)))
+		return;
+
+	/*
+	 * Technically possible for either of these values to be non-zero here,
+	 * but very unlikely + harmless. Regardless let's add a warn so we can
+	 * see in CI if this happens frequently / a precursor to taking down the
+	 * machine.
+	 */
+	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
+	atomic_set(&guc->outstanding_submission_g2h, 0);
+
+	enable_submission(guc);
+}
 
@@ -656,6 +956,9 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
 	else
 		trace_i915_request_guc_submit(rq);
 
+	if (unlikely(ret == -EIO))
+		disable_submission(guc);
+
 	return ret;
 }
@@ -668,7 +971,8 @@ static void guc_submit_request(struct i915_request *rq)
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
+	if (submission_disabled(guc) || guc->stalled_request ||
+	    !i915_sched_engine_is_empty(sched_engine))
 		queue_request(sched_engine, rq, rq_prio(rq));
 	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
 		tasklet_hi_schedule(&sched_engine->tasklet);
@@ -805,7 +1109,8 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
 
 static int __guc_action_register_context(struct intel_guc *guc,
					 u32 guc_id,
-					 u32 offset)
+					 u32 offset,
+					 bool loop)
 {
 	u32 action[] = {
 		INTEL_GUC_ACTION_REGISTER_CONTEXT,
@@ -813,10 +1118,10 @@ static int __guc_action_register_context(struct intel_guc *guc,
 		offset,
 	};
 
-	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, loop);
 }
 
-static int register_context(struct intel_context *ce)
+static int register_context(struct intel_context *ce, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
@@ -824,11 +1129,12 @@ static int register_context(struct intel_context *ce)
 
 	trace_intel_context_register(ce);
 
-	return __guc_action_register_context(guc, ce->guc_id, offset);
+	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
 }
 
 static int __guc_action_deregister_context(struct intel_guc *guc,
-					   u32 guc_id)
+					   u32 guc_id,
+					   bool loop)
 {
 	u32 action[] = {
 		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
@@ -836,16 +1142,16 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
 	};
 
 	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
-					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
+					G2H_LEN_DW_DEREGISTER_CONTEXT, loop);
 }
 
-static int deregister_context(struct intel_context *ce, u32 guc_id)
+static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 
 	trace_intel_context_deregister(ce);
 
-	return __guc_action_deregister_context(guc, guc_id);
+	return __guc_action_deregister_context(guc, guc_id, loop);
 }
 
 static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
@@ -874,7 +1180,7 @@ static void guc_context_policy_init(struct intel_engine_cs *engine, desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US; }
-static int guc_lrc_desc_pin(struct intel_context *ce) +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop) { struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm; @@ -920,18 +1226,44 @@ static int guc_lrc_desc_pin(struct intel_context *ce) */ if (context_registered) { trace_intel_context_steal_guc_id(ce);
set_context_wait_for_deregister_to_register(ce);
intel_context_get(ce);
if (!loop) {
set_context_wait_for_deregister_to_register(ce);
intel_context_get(ce);
} else {
bool disabled;
unsigned long flags;
/* Seal race with Reset */
spin_lock_irqsave(&ce->guc_state.lock, flags);
disabled = submission_disabled(guc);
if (likely(!disabled)) {
set_context_wait_for_deregister_to_register(ce);
intel_context_get(ce);
}
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
if (unlikely(disabled)) {
reset_lrc_desc(guc, desc_idx);
return 0; /* Will get registered later */
}
}
/*
- If stealing the guc_id, this ce has the same guc_id as the
- context whos guc_id was stole.
*/ with_intel_runtime_pm(runtime_pm, wakeref)
ret = deregister_context(ce, ce->guc_id);
ret = deregister_context(ce, ce->guc_id, loop);
if (unlikely(ret == -EBUSY)) {
clr_context_wait_for_deregister_to_register(ce);
intel_context_put(ce);
} else { with_intel_runtime_pm(runtime_pm, wakeref)}
ret = register_context(ce);
ret = register_context(ce, loop);
if (unlikely(ret == -EBUSY))
reset_lrc_desc(guc, desc_idx);
else if (unlikely(ret == -ENODEV))
ret = 0; /* Will get registered later */
}
return ret;
@@ -994,7 +1326,6 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
 
 	trace_intel_context_sched_disable(ce);
-	intel_context_get(ce);
 
 	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
@@ -1004,6 +1335,7 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
 {
 	set_context_pending_disable(ce);
 	clr_context_enabled(ce);
+	intel_context_get(ce);
 
 	return ce->guc_id;
 }
@@ -1016,7 +1348,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	u16 guc_id;
 	intel_wakeref_t wakeref;
 
-	if (context_guc_id_invalid(ce) ||
+	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		clr_context_enabled(ce);
 		goto unpin;
@@ -1034,6 +1366,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	 * request doesn't slip through the 'context_pending_disable' fence.
 	 */
 	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 		return;
 	}
 	guc_id = prep_context_pending_disable(ce);
@@ -1050,19 +1383,13 @@ static void guc_context_sched_disable(struct intel_context *ce)
 
 static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 {
-	struct intel_engine_cs *engine = ce->engine;
-	struct intel_guc *guc = &engine->gt->uc.guc;
-	unsigned long flags;
+	struct intel_guc *guc = ce_to_guc(ce);
 
 	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
 	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
 	GEM_BUG_ON(context_enabled(ce));
 
-	spin_lock_irqsave(&ce->guc_state.lock, flags);
-	set_context_destroyed(ce);
-	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
-
-	deregister_context(ce, ce->guc_id);
+	deregister_context(ce, ce->guc_id, true);
 }
 
 static void __guc_context_destroy(struct intel_context *ce)
@@ -1090,13 +1417,15 @@ static void guc_context_destroy(struct kref *kref)
 	struct intel_guc *guc = &ce->engine->gt->uc.guc;
 	intel_wakeref_t wakeref;
 	unsigned long flags;
+	bool disabled;
 
 	/*
 	 * If the guc_id is invalid this context has been stolen and we can free
 	 * it immediately. Also can be freed immediately if the context is not
 	 * registered with the GuC.
 	 */
-	if (context_guc_id_invalid(ce) ||
+	if (submission_disabled(guc) ||
+	    context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		release_guc_id(guc, ce);
 		__guc_context_destroy(ce);
@@ -1123,6 +1452,18 @@ static void guc_context_destroy(struct kref *kref)
 		list_del_init(&ce->guc_id_link);
 	spin_unlock_irqrestore(&guc->contexts_lock, flags);
 
+	/* Seal race with Reset */
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	disabled = submission_disabled(guc);
+	if (likely(!disabled))
+		set_context_destroyed(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+	if (unlikely(disabled)) {
+		release_guc_id(guc, ce);
+		__guc_context_destroy(ce);
+		return;
+	}
+
 	/*
 	 * We defer GuC context deregistration until the context is destroyed
 	 * in order to save on CTBs. With this optimization ideally we only need
+static void add_to_context(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+
+	spin_lock(&ce->guc_active.lock);
+	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
+	spin_unlock(&ce->guc_active.lock);
+}
+static void remove_from_context(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+
+	spin_lock_irq(&ce->guc_active.lock);
+
+	list_del_init(&rq->sched.link);
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&ce->guc_active.lock);
+
+	atomic_dec(&ce->guc_id_ref);
+	i915_request_notify_execute_cb_imm(rq);
+}
static const struct intel_context_ops guc_context_ops = {
	.alloc = guc_context_alloc,
@@ -1183,8 +1551,6 @@ static void guc_signal_context_fence(struct intel_context *ce)
{
	unsigned long flags;
- GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
	spin_lock_irqsave(&ce->guc_state.lock, flags);
	clr_context_wait_for_deregister_to_register(ce);
	__guc_signal_context_fence(ce);
@@ -1193,8 +1559,9 @@ static void guc_signal_context_fence(struct intel_context *ce)
static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
{
-	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
-		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
+	return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
+		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
+		!submission_disabled(ce_to_guc(ce));
}
static int guc_request_alloc(struct i915_request *rq)
@@ -1252,8 +1619,12 @@ static int guc_request_alloc(struct i915_request *rq)
	if (unlikely(ret < 0))
		return ret;
	if (context_needs_register(ce, !!ret)) {
-		ret = guc_lrc_desc_pin(ce);
+		ret = guc_lrc_desc_pin(ce, true);
		if (unlikely(ret)) {	/* unwind */
+			if (ret == -EIO) {
+				disable_submission(guc);
+				goto out;	/* GPU will be reset */
+			}
			atomic_dec(&ce->guc_id_ref);
			unpin_guc_id(guc, ce);
			return ret;
@@ -1290,20 +1661,6 @@ static int guc_request_alloc(struct i915_request *rq)
	return 0;
}
-static struct intel_engine_cs *
-guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
-{
-	struct intel_engine_cs *engine;
-	intel_engine_mask_t tmp, mask = ve->mask;
-	unsigned int num_siblings = 0;
-
-	for_each_engine_masked(engine, ve->gt, mask, tmp)
-		if (num_siblings++ == sibling)
-			return engine;
-
-	return NULL;
-}
static int guc_virtual_context_pre_pin(struct intel_context *ce,
				       struct i915_gem_ww_ctx *ww,
				       void **vaddr)
@@ -1512,7 +1869,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
{
	if (context_guc_id_invalid(ce))
		pin_guc_id(guc, ce);
- guc_lrc_desc_pin(ce);
	guc_lrc_desc_pin(ce, true);
}
static inline void guc_init_lrc_mapping(struct intel_guc *guc)
@@ -1578,13 +1935,15 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
	engine->cops = &guc_context_ops;
	engine->request_alloc = guc_request_alloc;
	engine->bump_serial = guc_bump_serial;
engine->add_active_request = add_to_context;
engine->remove_active_request = remove_from_context;
engine->sched_engine->schedule = i915_schedule;
-	engine->reset.prepare = guc_reset_prepare;
-	engine->reset.rewind = guc_reset_rewind;
-	engine->reset.cancel = guc_reset_cancel;
-	engine->reset.finish = guc_reset_finish;
+	engine->reset.prepare = guc_reset_nop;
+	engine->reset.rewind = guc_rewind_nop;
+	engine->reset.cancel = guc_reset_nop;
+	engine->reset.finish = guc_reset_nop;
	engine->emit_flush = gen8_emit_flush_xcs;
	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
@@ -1757,7 +2116,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
		 * register this context.
		 */
		with_intel_runtime_pm(runtime_pm, wakeref)
-			register_context(ce);
+			register_context(ce, true);

		guc_signal_context_fence(ce);
		intel_context_put(ce);
	} else if (context_destroyed(ce)) {
@@ -1939,6 +2298,10 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
		     "v%dx%d", ve->base.class, count);

	ve->base.context_size = sibling->context_size;
ve->base.add_active_request =
sibling->add_active_request;
ve->base.remove_active_request =
			sibling->remove_active_request;

		ve->base.emit_bb_start = sibling->emit_bb_start;
		ve->base.emit_flush = sibling->emit_flush;
		ve->base.emit_init_breadcrumb =
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 6d8b9233214e..f0b02200aa01 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -565,12 +565,49 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
{
	struct intel_guc *guc = &uc->guc;
-	if (!intel_guc_is_ready(guc))
+	/* Nothing to do if GuC isn't supported */
+	if (!intel_uc_supports_guc(uc))
		return;

+	/* Firmware expected to be running when this function is called */
+	if (!intel_guc_is_ready(guc))
+		goto sanitize;
+
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset_prepare(guc);
+
+sanitize:
	__uc_sanitize(uc);
}
+void intel_uc_reset(struct intel_uc *uc, bool stalled)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware can not be running when this function is called */
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset(guc, stalled);
+}
+void intel_uc_reset_finish(struct intel_uc *uc)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware expected to be running when this function is called */
+	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset_finish(guc);
+}
+void intel_uc_cancel_requests(struct intel_uc *uc)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware can not be running when this function is called */
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_cancel_requests(guc);
+}
void intel_uc_runtime_suspend(struct intel_uc *uc)
{
	struct intel_guc *guc = &uc->guc;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index c4cef885e984..eaa3202192ac 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -37,6 +37,9 @@ void intel_uc_driver_late_release(struct intel_uc *uc);
 void intel_uc_driver_remove(struct intel_uc *uc);
 void intel_uc_init_mmio(struct intel_uc *uc);
 void intel_uc_reset_prepare(struct intel_uc *uc);
+void intel_uc_reset(struct intel_uc *uc, bool stalled);
+void intel_uc_reset_finish(struct intel_uc *uc);
+void intel_uc_cancel_requests(struct intel_uc *uc);
 void intel_uc_suspend(struct intel_uc *uc);
 void intel_uc_runtime_suspend(struct intel_uc *uc);
 int intel_uc_resume(struct intel_uc *uc);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index dec5a35c9aa2..192784875a1d 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -194,7 +194,7 @@ static bool irq_work_imm(struct irq_work *wrk)
	return false;
}
-static void __notify_execute_cb_imm(struct i915_request *rq)
+void i915_request_notify_execute_cb_imm(struct i915_request *rq)
{
	__notify_execute_cb(rq, irq_work_imm);
}

@@ -268,37 +268,6 @@ i915_request_active_engine(struct i915_request *rq,
	return ret;
}
-static void remove_from_engine(struct i915_request *rq)
-{
-	struct intel_engine_cs *engine, *locked;
-
-	/*
-	 * Virtual engines complicate acquiring the engine timeline lock,
-	 * as their rq->engine pointer is not stable until under that
-	 * engine lock. The simple ploy we use is to take the lock then
-	 * check that the rq still belongs to the newly locked engine.
-	 */
-	locked = READ_ONCE(rq->engine);
-	spin_lock_irq(&locked->sched_engine->lock);
-	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
-		spin_unlock(&locked->sched_engine->lock);
-		spin_lock(&engine->sched_engine->lock);
-		locked = engine;
-	}
-	list_del_init(&rq->sched.link);
-
-	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
-
-	/* Prevent further __await_execution() registering a cb, then flush */
-	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
-
-	spin_unlock_irq(&locked->sched_engine->lock);
-
-	__notify_execute_cb_imm(rq);
-}
-
static void __rq_init_watchdog(struct i915_request *rq)
{
	rq->watchdog.timer.function = NULL;
@@ -395,9 +364,7 @@ bool i915_request_retire(struct i915_request *rq)
	 * after removing the breadcrumb and signaling it, so that we do not
	 * inadvertently attach the breadcrumb to a completed request.
	 */
-	if (!list_empty(&rq->sched.link))
-		remove_from_engine(rq);
-	atomic_dec(&rq->context->guc_id_ref);
+	rq->engine->remove_active_request(rq);

	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
@@ -539,7 +506,7 @@ __await_execution(struct i915_request *rq,
	if (llist_add(&cb->work.node.llist, &signal->execute_cb)) {
		if (i915_request_is_active(signal) ||
		    __request_in_flight(signal))
-			__notify_execute_cb_imm(signal);
+			i915_request_notify_execute_cb_imm(signal);
}
return 0;
@@ -676,7 +643,7 @@ bool __i915_request_submit(struct i915_request *request)
	result = true;

	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
-	list_move_tail(&request->sched.link, &engine->sched_engine->requests);
+	engine->add_active_request(request);
active:
	clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags);
	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index f870cd75a001..bcc6340c505e 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -649,4 +649,6 @@ bool i915_request_active_engine(struct i915_request *rq,
				struct intel_engine_cs **active);

+void i915_request_notify_execute_cb_imm(struct i915_request *rq);
+
 #endif /* I915_REQUEST_H */
On Mon, Jul 12, 2021 at 12:58:45PM -0700, John Harrison wrote:
On 6/24/2021 00:05, Matthew Brost wrote:
Reset implementation for new GuC interface. This is the legacy reset implementation which is called when the i915 owns the engine hang check. Future patches will offload the engine hang check to GuC but we will continue to maintain this legacy path as a fallback and this code path is also required if the GuC dies.
With the new GuC interface it is not possible to reset individual engines - it is only possible to reset the GPU entirely. This patch forces an entire chip reset if any engine hangs.
There seems to be quite a lot more code being changed in the patch than is described above. Sure, it's all in order to support resets but there is a lot happening to request/context management, support for GuC submission enable/disable, etc. It feels like this patch really should be split into a couple of prep patches followed by the actual reset support. Plus see couple of minor comments below.
Yea, this is probably the most churned on patch we have, as getting resets to fully work isn't easy. I'll fix the below comments but I don't know if it is worth splitting. Everything in the patch is required to get resets to work, and I think it is better to have it in a single patch so 'git blame' can give you the whole picture.
Matt
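The escalation rule discussed above (with GuC submission the i915 cannot reset a single engine, so any hang becomes a full-chip reset) can be sketched as a tiny standalone model. This is illustrative only: `choose_reset` and its flags are not driver API, they merely mirror the `!intel_uc_uses_guc_submission()` guard this patch adds to `intel_gt_handle_error()` and the `-ENODEV` bail-out in `__intel_engine_reset_bh()`:

```c
#include <stdbool.h>

/*
 * Illustrative model only: per-engine reset is attempted when the
 * platform supports it, the GT is not already wedged, and submission
 * is NOT owned by the GuC; otherwise the driver escalates to a full
 * GT reset.
 */
enum reset_kind { RESET_ENGINE, RESET_FULL_GT };

static enum reset_kind choose_reset(bool has_engine_reset,
				    bool uses_guc_submission,
				    bool gt_wedged)
{
	if (!uses_guc_submission && has_engine_reset && !gt_wedged)
		return RESET_ENGINE;
	return RESET_FULL_GT;
}
```

With execlists on a platform that supports engine reset, the per-engine path is still taken; flipping `uses_guc_submission` always selects the full GT reset, which is exactly the behaviour change described in the commit message.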
Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
 drivers/gpu/drm/i915/gt/intel_context.c       |   3 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   6 +
 .../drm/i915/gt/intel_execlists_submission.c  |  40 ++
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   6 +-
 drivers/gpu/drm/i915/gt/intel_reset.c         |  18 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  22 +
 drivers/gpu/drm/i915/gt/mock_engine.c         |  31 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  13 -
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   8 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 581 ++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  39 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   3 +
 drivers/gpu/drm/i915/i915_request.c           |  41 +-
 drivers/gpu/drm/i915/i915_request.h           |   2 +
 15 files changed, 649 insertions(+), 171 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index b24a1b7a3f88..2f01437056a8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -392,6 +392,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
	spin_lock_init(&ce->guc_state.lock);
	INIT_LIST_HEAD(&ce->guc_state.fences);
+	spin_lock_init(&ce->guc_active.lock);
+	INIT_LIST_HEAD(&ce->guc_active.requests);
+
	ce->guc_id = GUC_INVALID_LRC_ID;
	INIT_LIST_HEAD(&ce->guc_id_link);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 6945963a31ba..b63c8cf7823b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -165,6 +165,13 @@ struct intel_context {
		struct list_head fences;
	} guc_state;
+	struct {
+		/** lock: protects everything in guc_active */
+		spinlock_t lock;
+		/** requests: active requests on this context */
+		struct list_head requests;
+	} guc_active;
+
	/* GuC scheduling state that does not require a lock. */
	atomic_t guc_sched_state_no_lock;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index e7cb6a06db9d..f9d264c008e8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -426,6 +426,12 @@ struct intel_engine_cs {
	void		(*release)(struct intel_engine_cs *engine);
+	/*
+	 * Add / remove request from engine active tracking
+	 */
+	void		(*add_active_request)(struct i915_request *rq);
+	void		(*remove_active_request)(struct i915_request *rq);
+
	struct intel_engine_execlists execlists;

	/*
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index c10ea6080752..c301a2d088b1 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3118,6 +3118,42 @@ static void execlists_park(struct intel_engine_cs *engine)
	cancel_timer(&engine->execlists.preempt);
}

+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+static void remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&locked->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
static bool can_preempt(struct intel_engine_cs *engine)
{
	if (GRAPHICS_VER(engine->i915) > 8)
@@ -3218,6 +3254,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
	engine->cops = &execlists_context_ops;
	engine->request_alloc = execlists_request_alloc;
	engine->bump_serial = execlist_bump_serial;
+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;

	engine->reset.prepare = execlists_reset_prepare;
	engine->reset.rewind = execlists_reset_rewind;
@@ -3912,6 +3950,8 @@ execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
		     "v%dx%d", ve->base.class, count);

	ve->base.context_size = sibling->context_size;
+	ve->base.add_active_request = sibling->add_active_request;
+	ve->base.remove_active_request = sibling->remove_active_request;

	ve->base.emit_bb_start = sibling->emit_bb_start;
	ve->base.emit_flush = sibling->emit_flush;
	ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index aef3084e8b16..463a6ae605a0 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
	if (intel_gt_is_wedged(gt))
		intel_gt_unset_wedged(gt);
-	intel_uc_sanitize(&gt->uc);
-
	for_each_engine(engine, gt, id)
		if (engine->reset.prepare)
			engine->reset.prepare(engine);
@@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
			__intel_engine_reset(engine, false);
	}

+	intel_uc_reset(&gt->uc, false);
+
	for_each_engine(engine, gt, id)
		if (engine->reset.finish)
			engine->reset.finish(engine);
@@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
		goto err_wedged;
	}

+	intel_uc_reset_finish(&gt->uc);
+
	intel_rps_enable(&gt->rps);
	intel_llc_enable(&gt->llc);
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 72251638d4ea..2987282dff6d 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
		__intel_engine_reset(engine, stalled_mask & engine->mask);
	local_bh_enable();

+	intel_uc_reset(&gt->uc, true);
+
	intel_ggtt_restore_fences(gt->ggtt);

	return err;
@@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
		if (awake & engine->mask)
			intel_engine_pm_put(engine);
	}
+
+	intel_uc_reset_finish(&gt->uc);
}

static void nop_submit_request(struct i915_request *request)
@@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
	for_each_engine(engine, gt, id)
		if (engine->reset.cancel)
			engine->reset.cancel(engine);
+	intel_uc_cancel_requests(&gt->uc);
	local_bh_enable();

	reset_finish(gt, awake);
@@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));

+	if (intel_engine_uses_guc(engine))
+		return -ENODEV;
+
	if (!intel_engine_pm_get_if_awake(engine))
		return 0;
@@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
			   "Resetting %s for %s\n", engine->name, msg);
	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);

-	if (intel_engine_uses_guc(engine))
-		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
-	else
-		ret = intel_gt_reset_engine(engine);
+	ret = intel_gt_reset_engine(engine);
	if (ret) {
		/* If we fail here, we expect to fallback to a global reset */
-		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
+		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
		goto out;
	}
@@ -1341,7 +1346,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
	 * Try engine reset when available. We fall back to full reset if
	 * single reset fails.
	 */
-	if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
+	if (!intel_uc_uses_guc_submission(&gt->uc) &&
+	    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
		local_bh_disable();
		for_each_engine_masked(engine, gt, engine_mask, tmp) {
			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index e1506b280df1..99dcdc8fba12 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -1049,6 +1049,25 @@ static void ring_bump_serial(struct intel_engine_cs *engine)
	engine->serial++;
}

+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+static void remove_from_engine(struct i915_request *rq)
+{
+	spin_lock_irq(&rq->engine->sched_engine->lock);
+	list_del_init(&rq->sched.link);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&rq->engine->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
static void setup_common(struct intel_engine_cs *engine)
{
	struct drm_i915_private *i915 = engine->i915;
@@ -1066,6 +1085,9 @@ static void setup_common(struct intel_engine_cs *engine)
	engine->reset.cancel = reset_cancel;
	engine->reset.finish = reset_finish;

+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;
+
	engine->cops = &ring_context_ops;
	engine->request_alloc = ring_request_alloc;
	engine->bump_serial = ring_bump_serial;
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index fc5a65ab1937..c12ff3a75ce6 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -235,6 +235,35 @@ static void mock_submit_request(struct i915_request *request)
	spin_unlock_irqrestore(&engine->hw_lock, flags);
}

+static void mock_add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+static void mock_remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+	spin_unlock_irq(&locked->sched_engine->lock);
+}
static void mock_reset_prepare(struct intel_engine_cs *engine)
{
}
@@ -327,6 +356,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
	engine->base.emit_flush = mock_emit_flush;
	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
	engine->base.submit_request = mock_submit_request;
+	engine->base.add_active_request = mock_add_to_engine;
+	engine->base.remove_active_request = mock_remove_from_engine;

	engine->base.reset.prepare = mock_reset_prepare;
	engine->base.reset.rewind = mock_reset_rewind;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 6661dcb02239..9b09395b998f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -572,19 +572,6 @@ int intel_guc_suspend(struct intel_guc *guc)
	return 0;
}

-/**
- * intel_guc_reset_engine() - ask GuC to reset an engine
- * @guc: intel_guc structure
- * @engine: engine to be reset
- */
-int intel_guc_reset_engine(struct intel_guc *guc,
-			   struct intel_engine_cs *engine)
-{
-	/* XXX: to be implemented with submission interface rework */
-	return -ENODEV;
-}
-
/**
 * intel_guc_resume() - notify GuC resuming from suspend state
 * @guc: the guc
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 22eb1e9cca41..40c9868762d7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -242,14 +242,16 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)

 int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);

-int intel_guc_reset_engine(struct intel_guc *guc,
-			   struct intel_engine_cs *engine);
-
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
					   const u32 *msg, u32 len);
 int intel_guc_sched_done_process_msg(struct intel_guc *guc,
				      const u32 *msg, u32 len);

+void intel_guc_submission_reset_prepare(struct intel_guc *guc);
+void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
+void intel_guc_submission_reset_finish(struct intel_guc *guc);
+void intel_guc_submission_cancel_requests(struct intel_guc *guc);
+
 void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);

 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 83058df5ba01..b8c894ad8caf 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -141,7 +141,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce)
static inline void
set_context_wait_for_deregister_to_register(struct intel_context *ce)
{
-	/* Only should be called from guc_lrc_desc_pin() */
+	/* Only should be called from guc_lrc_desc_pin() without lock */
	ce->guc_state.sched_state |=
		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
}
@@ -241,15 +241,31 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
{
+	guc->lrc_desc_pool_vaddr = NULL;
	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
}
+static inline bool guc_submission_initialized(struct intel_guc *guc)
+{
+	return guc->lrc_desc_pool_vaddr != NULL;
+}
+
static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
{
-	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+	if (likely(guc_submission_initialized(guc))) {
+		struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+		unsigned long flags;

-	memset(desc, 0, sizeof(*desc));
-	xa_erase_irq(&guc->context_lookup, id);
+		memset(desc, 0, sizeof(*desc));
+
+		/*
+		 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
+		 * the lower level functions directly.
+		 */
+		xa_lock_irqsave(&guc->context_lookup, flags);
+		__xa_erase(&guc->context_lookup, id);
+		xa_unlock_irqrestore(&guc->context_lookup, flags);
+	}
}

static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
@@ -260,7 +276,15 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
					   struct intel_context *ce)
{
-	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	unsigned long flags;
+
+	/*
+	 * xarray API doesn't have xa_store_irqsave wrapper, so calling the
+	 * lower level functions directly.
+	 */
+	xa_lock_irqsave(&guc->context_lookup, flags);
+	__xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	xa_unlock_irqrestore(&guc->context_lookup, flags);
}

static int guc_submission_busy_loop(struct intel_guc *guc,
@@ -331,6 +355,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
			      interruptible, timeout);
}

+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
+
static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
{
	int err;
@@ -338,11 +364,22 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
	u32 action[3];
	int len = 0;
	u32 g2h_len_dw = 0;
-	bool enabled = context_enabled(ce);
+	bool enabled;

	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
	GEM_BUG_ON(context_guc_id_invalid(ce));

+	/*
+	 * Corner case where the GuC firmware was blown away and reloaded while
+	 * this context was pinned.
+	 */
+	if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
+		err = guc_lrc_desc_pin(ce, false);
+		if (unlikely(err))
+			goto out;
+	}
+	enabled = context_enabled(ce);
+
	if (!enabled) {
		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
		action[len++] = ce->guc_id;
@@ -365,6 +402,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
		intel_context_put(ce);
	}

+out:
	return err;
}

@@ -419,15 +457,10 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
	if (submit) {
		guc_set_lrc_tail(last);
resubmit:
-		/*
-		 * We only check for -EBUSY here even though it is possible for
-		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
-		 * died and a full GPU needs to be done. The hangcheck will
-		 * eventually detect that the GuC has died and trigger this
-		 * reset so no need to handle -EDEADLK here.
-		 */
		ret = guc_add_request(guc, last);
-		if (ret == -EBUSY) {
+		if (unlikely(ret == -EIO))
+			goto deadlk;
+		else if (ret == -EBUSY) {
			tasklet_schedule(&sched_engine->tasklet);
			guc->stalled_request = last;
			return false;
@@ -437,6 +470,11 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
	guc->stalled_request = NULL;
	return submit;
+
+deadlk:
+	sched_engine->tasklet.callback = NULL;
+	tasklet_disable_nosync(&sched_engine->tasklet);
+	return false;
}

static void guc_submission_tasklet(struct tasklet_struct *t)
@@ -463,27 +501,165 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
	intel_engine_signal_breadcrumbs(engine);
}

-static void guc_reset_prepare(struct intel_engine_cs *engine)
+static void __guc_context_destroy(struct intel_context *ce);
+static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
+static void guc_signal_context_fence(struct intel_context *ce);
+
+static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
+{
+	struct intel_context *ce;
+	unsigned long index, flags;
+	bool pending_disable, pending_enable, deregister, destroyed;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		/* Flush context */
+		spin_lock_irqsave(&ce->guc_state.lock, flags);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		/*
+		 * Once we are at this point submission_disabled() is guaranteed
+		 * to visible to all callers who set the below flags (see above

to be visible

+		 * flush and flushes in reset_prepare). If submission_disabled()
+		 * is set, the caller shouldn't set these flags.
+		 */
+		destroyed = context_destroyed(ce);
+		pending_enable = context_pending_enable(ce);
+		pending_disable = context_pending_disable(ce);
+		deregister = context_wait_for_deregister_to_register(ce);
+		init_sched_state(ce);
+
+		if (pending_enable || destroyed || deregister) {
+			atomic_dec(&guc->outstanding_submission_g2h);
+			if (deregister)
+				guc_signal_context_fence(ce);
+			if (destroyed) {
+				release_guc_id(guc, ce);
+				__guc_context_destroy(ce);
+			}
+			if (pending_enable || deregister)
+				intel_context_put(ce);
+		}
+
+		/* Not mutually exclusive with above if statement. */
+		if (pending_disable) {
+			guc_signal_context_fence(ce);
+			intel_context_sched_disable_unpin(ce);
+			atomic_dec(&guc->outstanding_submission_g2h);
+			intel_context_put(ce);
+		}
+	}
+}
+static inline bool
+submission_disabled(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+
+	return unlikely(!__tasklet_is_enabled(&sched_engine->tasklet));
+}
+static void disable_submission(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+
+	if (__tasklet_is_enabled(&sched_engine->tasklet)) {
+		GEM_BUG_ON(!guc->ct.enabled);
+		__tasklet_disable_sync_once(&sched_engine->tasklet);
+		sched_engine->tasklet.callback = NULL;
+	}
+}
+static void enable_submission(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->sched_engine->lock, flags);
+	sched_engine->tasklet.callback = guc_submission_tasklet;
+	wmb();
+	if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
+	    __tasklet_enable(&sched_engine->tasklet)) {
+		GEM_BUG_ON(!guc->ct.enabled);
+
+		/* And kick in case we missed a new request submission. */
+		tasklet_hi_schedule(&sched_engine->tasklet);
+	}
+	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+}
+static void guc_flush_submissions(struct intel_guc *guc)
 {
-	ENGINE_TRACE(engine, "\n");
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	unsigned long flags;
+
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+void intel_guc_submission_reset_prepare(struct intel_guc *guc)
+{
+	int i;
+
+	if (unlikely(!guc_submission_initialized(guc)))
+		/* Reset called during driver load? GuC not yet initialised! */
+		return;
Ew. Multi-line if without braces just looks broken even if one line is just a comment.
+	disable_submission(guc);
+	guc->interrupts.disable(guc);
+
+	/* Flush IRQ handler */
+	spin_lock_irq(&guc_to_gt(guc)->irq_lock);
+	spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
+	guc_flush_submissions(guc);
-	/*
-	 * Prevent request submission to the hardware until we have
-	 * completed the reset in i915_gem_reset_finish(). If a request
-	 * is completed by one engine, it may then queue a request
-	 * to a second via its execlists->tasklet *just* as we are
-	 * calling engine->init_hw() and also writing the ELSP.
-	 * Turning off the execlists->tasklet until the reset is over
-	 * prevents the race.
-	 */
-	__tasklet_disable_sync_once(&engine->sched_engine->tasklet);
+
+	/*
+	 * Handle any outstanding G2Hs before reset. Call IRQ handler directly
+	 * each pass as interrupts have been disabled. We always scrub for
+	 * outstanding G2H as it is possible for outstanding_submission_g2h to
+	 * be incremented after the context state update.
+	 */
+	for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
+		intel_guc_to_host_event_handler(guc);
+#define wait_for_reset(guc, wait_var) \
+		guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
+		do {
+			wait_for_reset(guc, &guc->outstanding_submission_g2h);
+		} while (!list_empty(&guc->ct.requests.incoming));
+	}
+	scrub_guc_desc_for_outstanding_g2h(guc);
+}
-static void guc_reset_state(struct intel_context *ce,
-			    struct intel_engine_cs *engine,
-			    u32 head,
-			    bool scrub)
+static struct intel_engine_cs *
+guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
+{
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp, mask = ve->mask;
+	unsigned int num_siblings = 0;
+
+	for_each_engine_masked(engine, ve->gt, mask, tmp)
+		if (num_siblings++ == sibling)
+			return engine;
+
+	return NULL;
+}
+static inline struct intel_engine_cs *
+__context_to_physical_engine(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = ce->engine;
+
+	if (intel_engine_is_virtual(engine))
+		engine = guc_virtual_get_sibling(engine, 0);
+
+	return engine;
+}
+static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
+{
+	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
+
+	GEM_BUG_ON(!intel_context_is_pinned(ce));
+
+	/*
@@ -501,42 +677,147 @@ static void guc_reset_state(struct intel_context *ce,
 	lrc_update_regs(ce, engine, head);
 }
 
-static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
+static void guc_reset_nop(struct intel_engine_cs *engine)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct i915_request *rq;
+}
+
+static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled)
+{
+}
+static void
+__unwind_incomplete_requests(struct intel_context *ce)
+{
+	struct i915_request *rq, *rn;
+	struct list_head *pl;
+	int prio = I915_PRIORITY_INVALID;
+	struct i915_sched_engine * const sched_engine =
+		ce->engine->sched_engine;
+	unsigned long flags;
+
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_lock(&ce->guc_active.lock);
+	list_for_each_entry_safe(rq, rn,
+				 &ce->guc_active.requests,
+				 sched.link) {
+		if (i915_request_completed(rq))
+			continue;
+
+		list_del_init(&rq->sched.link);
+		spin_unlock(&ce->guc_active.lock);
+
+		__i915_request_unsubmit(rq);
+
+		/* Push the request back into the queue for later resubmission. */
+		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
+		if (rq_prio(rq) != prio) {
+			prio = rq_prio(rq);
+			pl = i915_sched_lookup_priolist(sched_engine, prio);
+		}
+		GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
+
+		list_add_tail(&rq->sched.link, pl);
+		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-	/* Push back any incomplete requests for replay after the reset. */
-	rq = execlists_unwind_incomplete_requests(execlists);
-	if (!rq)
-		goto out_unlock;
+
+		spin_lock(&ce->guc_active.lock);
+	}
+	spin_unlock(&ce->guc_active.lock);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+static struct i915_request *context_find_active_request(struct intel_context *ce)
+{
+	struct i915_request *rq, *active = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ce->guc_active.lock, flags);
+	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
+				    sched.link) {
+		if (i915_request_completed(rq))
+			break;
+
+		active = rq;
+	}
+	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+
+	return active;
+}
+static void __guc_reset_context(struct intel_context *ce, bool stalled)
+{
+	struct i915_request *rq;
+	u32 head;
+
+	/*
+	 * GuC will implicitly mark the context as non-schedulable
+	 * when it sends the reset notification. Make sure our state
+	 * reflects this change. The context will be marked enabled
+	 * on resubmission.
+	 */
+	clr_context_enabled(ce);
+
+	rq = context_find_active_request(ce);
+	if (!rq) {
+		head = ce->ring->tail;
+		stalled = false;
+		goto out_replay;
+	}
+
+	if (!i915_request_started(rq))
+		stalled = false;
+
+	GEM_BUG_ON(i915_active_is_idle(&ce->active));
+	head = intel_ring_wrap(ce->ring, rq->head);
+	__i915_request_reset(rq, stalled);
-	guc_reset_state(rq->context, engine, rq->head, stalled);
-
-out_unlock:
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+
+out_replay:
+	guc_reset_state(ce, head, stalled);
+	__unwind_incomplete_requests(ce);
+}
-static void guc_reset_cancel(struct intel_engine_cs *engine)
+void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
+{
+	struct intel_context *ce;
+	unsigned long index;
+
+	if (unlikely(!guc_submission_initialized(guc)))
+		/* Reset called during driver load? GuC not yet initialised! */
+		return;
And again.
+	xa_for_each(&guc->context_lookup, index, ce)
+		if (intel_context_is_pinned(ce))
+			__guc_reset_context(ce, stalled);
+
+	/* GuC is blown away, drop all references to contexts */
+	xa_destroy(&guc->context_lookup);
+}
+static void guc_cancel_context_requests(struct intel_context *ce)
+{
+	struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
+	struct i915_request *rq;
+	unsigned long flags;
+
+	/* Mark all executing requests as skipped. */
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_lock(&ce->guc_active.lock);
+	list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
+		i915_request_put(i915_request_mark_eio(rq));
+	spin_unlock(&ce->guc_active.lock);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+static void
+guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
+{
-	struct i915_sched_engine * const sched_engine = engine->sched_engine;
 	struct i915_request *rq, *rn;
 	struct rb_node *rb;
 	unsigned long flags;
 
 	/* Can be called during boot if GuC fails to load */
-	if (!engine->gt)
+	if (!sched_engine)
 		return;
-	ENGINE_TRACE(engine, "\n");
-
 	/*
-	 * Before we call engine->cancel_requests(), we should have exclusive
-	 * access to the submission state. This is arranged for us by the
@@ -553,21 +834,16 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	 */
 	spin_lock_irqsave(&sched_engine->lock, flags);
-	/* Mark all executing requests as skipped. */
-	list_for_each_entry(rq, &sched_engine->requests, sched.link) {
-		i915_request_set_error_once(rq, -EIO);
-		i915_request_mark_complete(rq);
-	}
-
 	/* Flush the queued requests to the timeline list (for retiring). */
 	while ((rb = rb_first_cached(&sched_engine->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 
 		priolist_for_each_request_consume(rq, rn, p) {
 			list_del_init(&rq->sched.link);
 			__i915_request_submit(rq);
-			dma_fence_set_error(&rq->fence, -EIO);
-			i915_request_mark_complete(rq);
+			i915_request_put(i915_request_mark_eio(rq));
 		}
 
 		rb_erase_cached(&p->node, &sched_engine->queue);
@@ -582,14 +858,38 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
-static void guc_reset_finish(struct intel_engine_cs *engine)
+void intel_guc_submission_cancel_requests(struct intel_guc *guc)
 {
-	if (__tasklet_enable(&engine->sched_engine->tasklet))
-		/* And kick in case we missed a new request submission. */
-		tasklet_hi_schedule(&engine->sched_engine->tasklet);
+	struct intel_context *ce;
+	unsigned long index;
+
+	xa_for_each(&guc->context_lookup, index, ce)
+		if (intel_context_is_pinned(ce))
+			guc_cancel_context_requests(ce);
-	ENGINE_TRACE(engine, "depth->%d\n",
-		     atomic_read(&engine->sched_engine->tasklet.count));
+
+	guc_cancel_sched_engine_requests(guc->sched_engine);
+
+	/* GuC is blown away, drop all references to contexts */
+	xa_destroy(&guc->context_lookup);
This function shares most of the code with 'intel_guc_submission_reset'. Can the xa clean be moved to a common helper?
John.
+}
+
+void intel_guc_submission_reset_finish(struct intel_guc *guc)
+{
+	/* Reset called during driver load or during wedge? */
+	if (unlikely(!guc_submission_initialized(guc) ||
+		     test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags)))
+		return;
+
+	/*
+	 * Technically possible for either of these values to be non-zero here,
+	 * but very unlikely + harmless. Regardless let's add a warn so we can
+	 * see in CI if this happens frequently / a precursor to taking down the
+	 * machine.
+	 */
+	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
+	atomic_set(&guc->outstanding_submission_g2h, 0);
+
+	enable_submission(guc);
 }
 
 /*
@@ -656,6 +956,9 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
 	else
 		trace_i915_request_guc_submit(rq);
 
+	if (unlikely(ret == -EIO))
+		disable_submission(guc);
+
 	return ret;
 }
@@ -668,7 +971,8 @@ static void guc_submit_request(struct i915_request *rq)
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
+	if (submission_disabled(guc) || guc->stalled_request ||
+	    !i915_sched_engine_is_empty(sched_engine))
 		queue_request(sched_engine, rq, rq_prio(rq));
 	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
 		tasklet_hi_schedule(&sched_engine->tasklet);
@@ -805,7 +1109,8 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
 
 static int __guc_action_register_context(struct intel_guc *guc,
 					 u32 guc_id,
-					 u32 offset)
+					 u32 offset,
+					 bool loop)
 {
 	u32 action[] = {
 		INTEL_GUC_ACTION_REGISTER_CONTEXT,
@@ -813,10 +1118,10 @@ static int __guc_action_register_context(struct intel_guc *guc,
 		offset,
 	};
 
-	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, loop);
 }
-static int register_context(struct intel_context *ce)
+static int register_context(struct intel_context *ce, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
@@ -824,11 +1129,12 @@ static int register_context(struct intel_context *ce)
 
 	trace_intel_context_register(ce);
 
-	return __guc_action_register_context(guc, ce->guc_id, offset);
+	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
 }
 
 static int __guc_action_deregister_context(struct intel_guc *guc,
-					   u32 guc_id)
+					   u32 guc_id,
+					   bool loop)
 {
 	u32 action[] = {
 		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
@@ -836,16 +1142,16 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
 	};
 
 	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
-					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
+					G2H_LEN_DW_DEREGISTER_CONTEXT, loop);
 }
 
-static int deregister_context(struct intel_context *ce, u32 guc_id)
+static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 
 	trace_intel_context_deregister(ce);
 
-	return __guc_action_deregister_context(guc, guc_id);
+	return __guc_action_deregister_context(guc, guc_id, loop);
 }
 
 static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
@@ -874,7 +1180,7 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
 }
 
-static int guc_lrc_desc_pin(struct intel_context *ce)
+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 {
 	struct intel_runtime_pm *runtime_pm =
 		&ce->engine->gt->i915->runtime_pm;
@@ -920,18 +1226,44 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
 	 */
 	if (context_registered) {
 		trace_intel_context_steal_guc_id(ce);
-		set_context_wait_for_deregister_to_register(ce);
-		intel_context_get(ce);
+		if (!loop) {
+			set_context_wait_for_deregister_to_register(ce);
+			intel_context_get(ce);
+		} else {
+			bool disabled;
+			unsigned long flags;
+
+			/* Seal race with Reset */
+			spin_lock_irqsave(&ce->guc_state.lock, flags);
+			disabled = submission_disabled(guc);
+			if (likely(!disabled)) {
+				set_context_wait_for_deregister_to_register(ce);
+				intel_context_get(ce);
+			}
+			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+			if (unlikely(disabled)) {
+				reset_lrc_desc(guc, desc_idx);
+				return 0;	/* Will get registered later */
+			}
+		}
+
+		/*
+		 * If stealing the guc_id, this ce has the same guc_id as the
+		 * context whose guc_id was stolen.
+		 */
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			ret = deregister_context(ce, ce->guc_id);
+			ret = deregister_context(ce, ce->guc_id, loop);
+		if (unlikely(ret == -EBUSY)) {
+			clr_context_wait_for_deregister_to_register(ce);
+			intel_context_put(ce);
+		}
 	} else {
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			ret = register_context(ce);
+			ret = register_context(ce, loop);
+		if (unlikely(ret == -EBUSY))
+			reset_lrc_desc(guc, desc_idx);
+		else if (unlikely(ret == -ENODEV))
+			ret = 0;	/* Will get registered later */
 	}
 
 	return ret;
@@ -994,7 +1326,6 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
 
 	trace_intel_context_sched_disable(ce);
-	intel_context_get(ce);
 
 	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
 				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
@@ -1004,6 +1335,7 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
 {
 	set_context_pending_disable(ce);
 	clr_context_enabled(ce);
+	intel_context_get(ce);
 
 	return ce->guc_id;
 }
@@ -1016,7 +1348,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	u16 guc_id;
 	intel_wakeref_t wakeref;
 
-	if (context_guc_id_invalid(ce) ||
+	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		clr_context_enabled(ce);
 		goto unpin;
@@ -1034,6 +1366,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	 * request doesn't slip through the 'context_pending_disable' fence.
 	 */
 	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 		return;
 	}
 
 	guc_id = prep_context_pending_disable(ce);
@@ -1050,19 +1383,13 @@ static void guc_context_sched_disable(struct intel_context *ce)
 
 static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 {
-	struct intel_engine_cs *engine = ce->engine;
-	struct intel_guc *guc = &engine->gt->uc.guc;
-	unsigned long flags;
+	struct intel_guc *guc = ce_to_guc(ce);
 
 	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
 	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
 	GEM_BUG_ON(context_enabled(ce));
 
-	spin_lock_irqsave(&ce->guc_state.lock, flags);
-	set_context_destroyed(ce);
-	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
-
-	deregister_context(ce, ce->guc_id);
+	deregister_context(ce, ce->guc_id, true);
 }
 
 static void __guc_context_destroy(struct intel_context *ce)
@@ -1090,13 +1417,15 @@ static void guc_context_destroy(struct kref *kref)
 	struct intel_guc *guc = &ce->engine->gt->uc.guc;
 	intel_wakeref_t wakeref;
 	unsigned long flags;
+	bool disabled;
 
 	/*
 	 * If the guc_id is invalid this context has been stolen and we can free
 	 * it immediately. Also can be freed immediately if the context is not
 	 * registered with the GuC.
 	 */
-	if (context_guc_id_invalid(ce) ||
+	if (submission_disabled(guc) ||
+	    context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		release_guc_id(guc, ce);
 		__guc_context_destroy(ce);
@@ -1123,6 +1452,18 @@ static void guc_context_destroy(struct kref *kref)
 	list_del_init(&ce->guc_id_link);
 	spin_unlock_irqrestore(&guc->contexts_lock, flags);
 
+	/* Seal race with Reset */
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	disabled = submission_disabled(guc);
+	if (likely(!disabled))
+		set_context_destroyed(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+	if (unlikely(disabled)) {
+		release_guc_id(guc, ce);
+		__guc_context_destroy(ce);
+		return;
+	}
+
 	/*
 	 * We defer GuC context deregistration until the context is destroyed
 	 * in order to save on CTBs. With this optimization ideally we only need
@@ -1145,6 +1486,33 @@ static int guc_context_alloc(struct intel_context *ce)
 	return lrc_alloc(ce, ce->engine);
 }
 
+static void add_to_context(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+
+	spin_lock(&ce->guc_active.lock);
+	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
+	spin_unlock(&ce->guc_active.lock);
+}
+
+static void remove_from_context(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+
+	spin_lock_irq(&ce->guc_active.lock);
+
+	list_del_init(&rq->sched.link);
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&ce->guc_active.lock);
+
+	atomic_dec(&ce->guc_id_ref);
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static const struct intel_context_ops guc_context_ops = {
 	.alloc = guc_context_alloc,
@@ -1183,8 +1551,6 @@ static void guc_signal_context_fence(struct intel_context *ce)
 {
 	unsigned long flags;
 
-	GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
-
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 	clr_context_wait_for_deregister_to_register(ce);
 	__guc_signal_context_fence(ce);
@@ -1193,8 +1559,9 @@ static void guc_signal_context_fence(struct intel_context *ce)
 
 static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
 {
-	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
-	       !lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
+	return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
+		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
+		!submission_disabled(ce_to_guc(ce));
 }
 
 static int guc_request_alloc(struct i915_request *rq)
@@ -1252,8 +1619,12 @@ static int guc_request_alloc(struct i915_request *rq)
 	if (unlikely(ret < 0))
 		return ret;
 	if (context_needs_register(ce, !!ret)) {
-		ret = guc_lrc_desc_pin(ce);
+		ret = guc_lrc_desc_pin(ce, true);
 		if (unlikely(ret)) {	/* unwind */
+			if (ret == -EIO) {
+				disable_submission(guc);
+				goto out;	/* GPU will be reset */
+			}
 			atomic_dec(&ce->guc_id_ref);
 			unpin_guc_id(guc, ce);
 			return ret;
@@ -1290,20 +1661,6 @@ static int guc_request_alloc(struct i915_request *rq)
 	return 0;
 }
 
-static struct intel_engine_cs *
-guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
-{
-	struct intel_engine_cs *engine;
-	intel_engine_mask_t tmp, mask = ve->mask;
-	unsigned int num_siblings = 0;
-
-	for_each_engine_masked(engine, ve->gt, mask, tmp)
-		if (num_siblings++ == sibling)
-			return engine;
-
-	return NULL;
-}
-
 static int guc_virtual_context_pre_pin(struct intel_context *ce,
 				       struct i915_gem_ww_ctx *ww,
 				       void **vaddr)
@@ -1512,7 +1869,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
 {
 	if (context_guc_id_invalid(ce))
 		pin_guc_id(guc, ce);
-	guc_lrc_desc_pin(ce);
+	guc_lrc_desc_pin(ce, true);
 }
 
 static inline void guc_init_lrc_mapping(struct intel_guc *guc)
@@ -1578,13 +1935,15 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 	engine->cops = &guc_context_ops;
 	engine->request_alloc = guc_request_alloc;
 	engine->bump_serial = guc_bump_serial;
+	engine->add_active_request = add_to_context;
+	engine->remove_active_request = remove_from_context;
 
 	engine->sched_engine->schedule = i915_schedule;
 
-	engine->reset.prepare = guc_reset_prepare;
-	engine->reset.rewind = guc_reset_rewind;
-	engine->reset.cancel = guc_reset_cancel;
-	engine->reset.finish = guc_reset_finish;
+	engine->reset.prepare = guc_reset_nop;
+	engine->reset.rewind = guc_rewind_nop;
+	engine->reset.cancel = guc_reset_nop;
+	engine->reset.finish = guc_reset_nop;
 
 	engine->emit_flush = gen8_emit_flush_xcs;
 	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
@@ -1757,7 +2116,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		 * register this context.
 		 */
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			register_context(ce);
+			register_context(ce, true);
 		guc_signal_context_fence(ce);
 		intel_context_put(ce);
 	} else if (context_destroyed(ce)) {
@@ -1939,6 +2298,10 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 		 "v%dx%d", ve->base.class, count);
 	ve->base.context_size = sibling->context_size;
 
+	ve->base.add_active_request =
+		sibling->add_active_request;
+	ve->base.remove_active_request =
+		sibling->remove_active_request;
 	ve->base.emit_bb_start = sibling->emit_bb_start;
 	ve->base.emit_flush = sibling->emit_flush;
 	ve->base.emit_init_breadcrumb =
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 6d8b9233214e..f0b02200aa01 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -565,12 +565,49 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
 
-	if (!intel_guc_is_ready(guc))
+	/* Nothing to do if GuC isn't supported */
+	if (!intel_uc_supports_guc(uc))
 		return;
 
+	/* Firmware expected to be running when this function is called */
+	if (!intel_guc_is_ready(guc))
+		goto sanitize;
+
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset_prepare(guc);
+
+sanitize:
 	__uc_sanitize(uc);
 }
 
+void intel_uc_reset(struct intel_uc *uc, bool stalled)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware can not be running when this function is called */
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset(guc, stalled);
+}
+
+void intel_uc_reset_finish(struct intel_uc *uc)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware expected to be running when this function is called */
+	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset_finish(guc);
+}
+
+void intel_uc_cancel_requests(struct intel_uc *uc)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware can not be running when this function is called */
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_cancel_requests(guc);
+}
+
 void intel_uc_runtime_suspend(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index c4cef885e984..eaa3202192ac 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -37,6 +37,9 @@ void intel_uc_driver_late_release(struct intel_uc *uc);
 void intel_uc_driver_remove(struct intel_uc *uc);
 void intel_uc_init_mmio(struct intel_uc *uc);
 void intel_uc_reset_prepare(struct intel_uc *uc);
+void intel_uc_reset(struct intel_uc *uc, bool stalled);
+void intel_uc_reset_finish(struct intel_uc *uc);
+void intel_uc_cancel_requests(struct intel_uc *uc);
 void intel_uc_suspend(struct intel_uc *uc);
 void intel_uc_runtime_suspend(struct intel_uc *uc);
 int intel_uc_resume(struct intel_uc *uc);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index dec5a35c9aa2..192784875a1d 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -194,7 +194,7 @@ static bool irq_work_imm(struct irq_work *wrk)
 	return false;
 }
 
-static void __notify_execute_cb_imm(struct i915_request *rq)
+void i915_request_notify_execute_cb_imm(struct i915_request *rq)
 {
 	__notify_execute_cb(rq, irq_work_imm);
 }
@@ -268,37 +268,6 @@ i915_request_active_engine(struct i915_request *rq,
 	return ret;
 }
 
-static void remove_from_engine(struct i915_request *rq)
-{
-	struct intel_engine_cs *engine, *locked;
-
-	/*
-	 * Virtual engines complicate acquiring the engine timeline lock,
-	 * as their rq->engine pointer is not stable until under that
-	 * engine lock. The simple ploy we use is to take the lock then
-	 * check that the rq still belongs to the newly locked engine.
-	 */
-	locked = READ_ONCE(rq->engine);
-	spin_lock_irq(&locked->sched_engine->lock);
-	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
-		spin_unlock(&locked->sched_engine->lock);
-		spin_lock(&engine->sched_engine->lock);
-		locked = engine;
-	}
-	list_del_init(&rq->sched.link);
-
-	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
-
-	/* Prevent further __await_execution() registering a cb, then flush */
-	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
-
-	spin_unlock_irq(&locked->sched_engine->lock);
-
-	__notify_execute_cb_imm(rq);
-}
-
 static void __rq_init_watchdog(struct i915_request *rq)
 {
 	rq->watchdog.timer.function = NULL;
@@ -395,9 +364,7 @@ bool i915_request_retire(struct i915_request *rq)
 	 * after removing the breadcrumb and signaling it, so that we do not
 	 * inadvertently attach the breadcrumb to a completed request.
	 */
-	if (!list_empty(&rq->sched.link))
-		remove_from_engine(rq);
-	atomic_dec(&rq->context->guc_id_ref);
+	rq->engine->remove_active_request(rq);
 
 	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
 	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
@@ -539,7 +506,7 @@ __await_execution(struct i915_request *rq,
 	if (llist_add(&cb->work.node.llist, &signal->execute_cb)) {
 		if (i915_request_is_active(signal) ||
 		    __request_in_flight(signal))
-			__notify_execute_cb_imm(signal);
+			i915_request_notify_execute_cb_imm(signal);
 	}
 
 	return 0;
@@ -676,7 +643,7 @@ bool __i915_request_submit(struct i915_request *request)
 	result = true;
 
 	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
-	list_move_tail(&request->sched.link, &engine->sched_engine->requests);
+	engine->add_active_request(request);
 active:
 	clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags);
 	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index f870cd75a001..bcc6340c505e 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -649,4 +649,6 @@ bool i915_request_active_engine(struct i915_request *rq,
 				struct intel_engine_cs **active);
 
+void i915_request_notify_execute_cb_imm(struct i915_request *rq);
+
 #endif /* I915_REQUEST_H */
On 24/06/2021 08:05, Matthew Brost wrote:
Reset implementation for the new GuC interface. This is the legacy reset implementation, which is called when the i915 owns the engine hang check. Future patches will offload the engine hang check to the GuC, but we will continue to maintain this legacy path as a fallback; it is also required if the GuC dies.
With the new GuC interface it is not possible to reset individual engines - it is only possible to reset the GPU entirely. This patch forces an entire chip reset if any engine hangs.
No updates after my review comments on 6th of May.
At least:
1. wmb documentation
2. Spin lock cycling: I either didn't understand the explanation or didn't buy it. I don't remember seeing that pattern elsewhere in the driver. You said the spinlock is cycled to make sure what was updated inside it is visible?
3. Dropping the lock protecting the list in the middle of list_for_each_entry_safe and just continuing to iterate like nothing happened (__unwind_incomplete_requests). Again, perhaps I did not understand your explanation properly, but you did appear to write:
""" We only need the active lock for ce->guc_active.requests list. It is indeed safe to drop the lock. """
+	spin_lock(&ce->guc_active.lock);
+	list_for_each_entry_safe(rq, rn,
+				 &ce->guc_active.requests,
+				 sched.link) {
+		if (i915_request_completed(rq))
+			continue;
+
+		list_del_init(&rq->sched.link);
+		spin_unlock(&ce->guc_active.lock);
...
+		spin_lock(&ce->guc_active.lock);
+	}
The safe iterator guards against list_del, but dropping the lock means the state of the overall list can change, so the saved next pointer may or may not still be valid and requests may be missed, I don't know. Needs a comment explaining why it is safe.
Regards,
Tvrtko
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
 drivers/gpu/drm/i915/gt/intel_context.c       |   3 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   6 +
 .../drm/i915/gt/intel_execlists_submission.c  |  40 ++
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   6 +-
 drivers/gpu/drm/i915/gt/intel_reset.c         |  18 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  22 +
 drivers/gpu/drm/i915/gt/mock_engine.c         |  31 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  13 -
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   8 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 581 ++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  39 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   3 +
 drivers/gpu/drm/i915/i915_request.c           |  41 +-
 drivers/gpu/drm/i915/i915_request.h           |   2 +
 15 files changed, 649 insertions(+), 171 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index b24a1b7a3f88..2f01437056a8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -392,6 +392,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	spin_lock_init(&ce->guc_state.lock);
 	INIT_LIST_HEAD(&ce->guc_state.fences);
 
+	spin_lock_init(&ce->guc_active.lock);
+	INIT_LIST_HEAD(&ce->guc_active.requests);
+
 	ce->guc_id = GUC_INVALID_LRC_ID;
 	INIT_LIST_HEAD(&ce->guc_id_link);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 6945963a31ba..b63c8cf7823b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -165,6 +165,13 @@ struct intel_context {
 		struct list_head fences;
 	} guc_state;
 
+	struct {
+		/** lock: protects everything in guc_active */
+		spinlock_t lock;
+		/** requests: active requests on this context */
+		struct list_head requests;
+	} guc_active;
+
 	/* GuC scheduling state that does not require a lock. */
 	atomic_t guc_sched_state_no_lock;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index e7cb6a06db9d..f9d264c008e8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -426,6 +426,12 @@ struct intel_engine_cs {
 
 	void		(*release)(struct intel_engine_cs *engine);
 
+	/*
+	 * Add / remove request from engine active tracking
+	 */
+	void		(*add_active_request)(struct i915_request *rq);
+	void		(*remove_active_request)(struct i915_request *rq);
+
 	struct intel_engine_execlists execlists;
 
 	/*
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index c10ea6080752..c301a2d088b1 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3118,6 +3118,42 @@ static void execlists_park(struct intel_engine_cs *engine)
 	cancel_timer(&engine->execlists.preempt);
 }
 
+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&locked->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static bool can_preempt(struct intel_engine_cs *engine)
 {
 	if (GRAPHICS_VER(engine->i915) > 8)
@@ -3218,6 +3254,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->cops = &execlists_context_ops;
 	engine->request_alloc = execlists_request_alloc;
 	engine->bump_serial = execlist_bump_serial;
+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;
 
 	engine->reset.prepare = execlists_reset_prepare;
 	engine->reset.rewind = execlists_reset_rewind;
@@ -3912,6 +3950,8 @@ execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 		 "v%dx%d", ve->base.class, count);
 	ve->base.context_size = sibling->context_size;
 
+	ve->base.add_active_request = sibling->add_active_request;
+	ve->base.remove_active_request = sibling->remove_active_request;
 	ve->base.emit_bb_start = sibling->emit_bb_start;
 	ve->base.emit_flush = sibling->emit_flush;
 	ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index aef3084e8b16..463a6ae605a0 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 	if (intel_gt_is_wedged(gt))
 		intel_gt_unset_wedged(gt);
 
-	intel_uc_sanitize(&gt->uc);
-
 	for_each_engine(engine, gt, id)
 		if (engine->reset.prepare)
 			engine->reset.prepare(engine);
@@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 			__intel_engine_reset(engine, false);
 	}
 
+	intel_uc_reset(&gt->uc, false);
+
 	for_each_engine(engine, gt, id)
 		if (engine->reset.finish)
 			engine->reset.finish(engine);
@@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
 		goto err_wedged;
 	}
 
+	intel_uc_reset_finish(&gt->uc);
+
 	intel_rps_enable(&gt->rps);
 	intel_llc_enable(&gt->llc);
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 72251638d4ea..2987282dff6d 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
 		__intel_engine_reset(engine, stalled_mask & engine->mask);
 	local_bh_enable();
 
+	intel_uc_reset(&gt->uc, true);
+
 	intel_ggtt_restore_fences(gt->ggtt);
 
 	return err;
@@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
 		if (awake & engine->mask)
 			intel_engine_pm_put(engine);
 	}
+
+	intel_uc_reset_finish(&gt->uc);
 }
 
 static void nop_submit_request(struct i915_request *request)
@@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
 	for_each_engine(engine, gt, id)
 		if (engine->reset.cancel)
 			engine->reset.cancel(engine);
+	intel_uc_cancel_requests(&gt->uc);
 	local_bh_enable();
 
 	reset_finish(gt, awake);
@@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
 	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
 
+	if (intel_engine_uses_guc(engine))
+		return -ENODEV;
+
 	if (!intel_engine_pm_get_if_awake(engine))
 		return 0;
 
@@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 		   "Resetting %s for %s\n", engine->name, msg);
 	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
 
-	if (intel_engine_uses_guc(engine))
-		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
-	else
-		ret = intel_gt_reset_engine(engine);
+	ret = intel_gt_reset_engine(engine);
 	if (ret) {
 		/* If we fail here, we expect to fallback to a global reset */
-		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
+		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
 		goto out;
 	}
 
@@ -1341,7 +1346,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
+	if (!intel_uc_uses_guc_submission(&gt->uc) &&
+	    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
 		local_bh_disable();
 		for_each_engine_masked(engine, gt, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index e1506b280df1..99dcdc8fba12 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -1049,6 +1049,25 @@ static void ring_bump_serial(struct intel_engine_cs *engine)
 	engine->serial++;
 }
 
+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	spin_lock_irq(&rq->engine->sched_engine->lock);
+	list_del_init(&rq->sched.link);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&rq->engine->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static void setup_common(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
@@ -1066,6 +1085,9 @@ static void setup_common(struct intel_engine_cs *engine)
 	engine->reset.cancel = reset_cancel;
 	engine->reset.finish = reset_finish;
 
+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;
+
 	engine->cops = &ring_context_ops;
 	engine->request_alloc = ring_request_alloc;
 	engine->bump_serial = ring_bump_serial;
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index fc5a65ab1937..c12ff3a75ce6 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -235,6 +235,35 @@ static void mock_submit_request(struct i915_request *request)
 	spin_unlock_irqrestore(&engine->hw_lock, flags);
 }
 
+static void mock_add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void mock_remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+	spin_unlock_irq(&locked->sched_engine->lock);
+}
+
 static void mock_reset_prepare(struct intel_engine_cs *engine)
 {
 }
@@ -327,6 +356,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.emit_flush = mock_emit_flush;
 	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
 	engine->base.submit_request = mock_submit_request;
+	engine->base.add_active_request = mock_add_to_engine;
+	engine->base.remove_active_request = mock_remove_from_engine;
 
 	engine->base.reset.prepare = mock_reset_prepare;
 	engine->base.reset.rewind = mock_reset_rewind;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 6661dcb02239..9b09395b998f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -572,19 +572,6 @@ int intel_guc_suspend(struct intel_guc *guc)
 	return 0;
 }
 
-/**
- * intel_guc_reset_engine() - ask GuC to reset an engine
- * @guc:	intel_guc structure
- * @engine:	engine to be reset
- */
-int intel_guc_reset_engine(struct intel_guc *guc,
-			   struct intel_engine_cs *engine)
-{
-	/* XXX: to be implemented with submission interface rework */
-	return -ENODEV;
-}
-
 /**
  * intel_guc_resume() - notify GuC resuming from suspend state
  * @guc:	the guc
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 22eb1e9cca41..40c9868762d7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -242,14 +242,16 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
 
 int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
 
-int intel_guc_reset_engine(struct intel_guc *guc,
-			   struct intel_engine_cs *engine);
-
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg, u32 len);
 int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 				     const u32 *msg, u32 len);
 
+void intel_guc_submission_reset_prepare(struct intel_guc *guc);
+void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
+void intel_guc_submission_reset_finish(struct intel_guc *guc);
+void intel_guc_submission_cancel_requests(struct intel_guc *guc);
+
 void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 83058df5ba01..b8c894ad8caf 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -141,7 +141,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce)
 static inline void
 set_context_wait_for_deregister_to_register(struct intel_context *ce)
 {
-	/* Should only be called from guc_lrc_desc_pin() */
+	/* Should only be called from guc_lrc_desc_pin() without lock */
 	ce->guc_state.sched_state |=
 		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
 }
@@ -241,15 +241,31 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
 
 static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
 {
+	guc->lrc_desc_pool_vaddr = NULL;
 	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
 }
 
+static inline bool guc_submission_initialized(struct intel_guc *guc)
+{
+	return guc->lrc_desc_pool_vaddr != NULL;
+}
+
 static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
 {
-	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+	if (likely(guc_submission_initialized(guc))) {
+		struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+		unsigned long flags;
 
-	memset(desc, 0, sizeof(*desc));
-	xa_erase_irq(&guc->context_lookup, id);
+		memset(desc, 0, sizeof(*desc));
+
+		/*
+		 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
+		 * the lower level functions directly.
+		 */
+		xa_lock_irqsave(&guc->context_lookup, flags);
+		__xa_erase(&guc->context_lookup, id);
+		xa_unlock_irqrestore(&guc->context_lookup, flags);
+	}
 }
 
 static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
@@ -260,7 +276,15 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
 static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
 					   struct intel_context *ce)
 {
-	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	unsigned long flags;
+
+	/*
+	 * xarray API doesn't have xa_store_irqsave wrapper, so calling the
+	 * lower level functions directly.
+	 */
+	xa_lock_irqsave(&guc->context_lookup, flags);
+	__xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	xa_unlock_irqrestore(&guc->context_lookup, flags);
 }
 static int guc_submission_busy_loop(struct intel_guc *guc,
@@ -331,6 +355,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
 			      interruptible, timeout);
 }
 
+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
+
 static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	int err;
@@ -338,11 +364,22 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	u32 action[3];
 	int len = 0;
 	u32 g2h_len_dw = 0;
-	bool enabled = context_enabled(ce);
+	bool enabled;
 
 	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
 	GEM_BUG_ON(context_guc_id_invalid(ce));
 
+	/*
+	 * Corner case where the GuC firmware was blown away and reloaded while
+	 * this context was pinned.
+	 */
+	if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
+		err = guc_lrc_desc_pin(ce, false);
+		if (unlikely(err))
+			goto out;
+	}
+	enabled = context_enabled(ce);
+
 	if (!enabled) {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
 		action[len++] = ce->guc_id;
@@ -365,6 +402,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		intel_context_put(ce);
 	}
 
+out:
 	return err;
 }
@@ -419,15 +457,10 @@ static int guc_dequeue_one_context(struct intel_guc *guc) if (submit) { guc_set_lrc_tail(last); resubmit:
/*
* We only check for -EBUSY here even though it is possible for
* -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
* died and a full GPU needs to be done. The hangcheck will
* eventually detect that the GuC has died and trigger this
* reset so no need to handle -EDEADLK here.
ret = guc_add_request(guc, last);*/
if (ret == -EBUSY) {
if (unlikely(ret == -EIO))
goto deadlk;
else if (ret == -EBUSY) { tasklet_schedule(&sched_engine->tasklet); guc->stalled_request = last; return false;
@@ -437,6 +470,11 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
guc->stalled_request = NULL; return submit;
+deadlk:
sched_engine->tasklet.callback = NULL;
tasklet_disable_nosync(&sched_engine->tasklet);
return false; }
static void guc_submission_tasklet(struct tasklet_struct *t)
@@ -463,27 +501,165 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
 	intel_engine_signal_breadcrumbs(engine);
 }
-static void guc_reset_prepare(struct intel_engine_cs *engine)
-{
-	ENGINE_TRACE(engine, "\n");
-
-	/*
-	 * Prevent request submission to the hardware until we have
-	 * completed the reset in i915_gem_reset_finish(). If a request
-	 * is completed by one engine, it may then queue a request
-	 * to a second via its execlists->tasklet *just* as we are
-	 * calling engine->init_hw() and also writing the ELSP.
-	 * Turning off the execlists->tasklet until the reset is over
-	 * prevents the race.
-	 */
-	__tasklet_disable_sync_once(&engine->sched_engine->tasklet);
-}
+static void __guc_context_destroy(struct intel_context *ce);
+static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
+static void guc_signal_context_fence(struct intel_context *ce);
+
+static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
+{
+	struct intel_context *ce;
+	unsigned long index, flags;
+	bool pending_disable, pending_enable, deregister, destroyed;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		/* Flush context */
+		spin_lock_irqsave(&ce->guc_state.lock, flags);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		/*
+		 * Once we are at this point submission_disabled() is guaranteed
+		 * to be visible to all callers who set the below flags (see above
+		 * flush and flushes in reset_prepare). If submission_disabled()
+		 * is set, the caller shouldn't set these flags.
+		 */
+		destroyed = context_destroyed(ce);
+		pending_enable = context_pending_enable(ce);
+		pending_disable = context_pending_disable(ce);
+		deregister = context_wait_for_deregister_to_register(ce);
+		init_sched_state(ce);
+
+		if (pending_enable || destroyed || deregister) {
+			atomic_dec(&guc->outstanding_submission_g2h);
+			if (deregister)
+				guc_signal_context_fence(ce);
+			if (destroyed) {
+				release_guc_id(guc, ce);
+				__guc_context_destroy(ce);
+			}
+			if (pending_enable || deregister)
+				intel_context_put(ce);
+		}
+
+		/* Not mutually exclusive with above if statement. */
+		if (pending_disable) {
+			guc_signal_context_fence(ce);
+			intel_context_sched_disable_unpin(ce);
+			atomic_dec(&guc->outstanding_submission_g2h);
+			intel_context_put(ce);
+		}
+	}
+}
+
+static inline bool
+submission_disabled(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+
+	return unlikely(!__tasklet_is_enabled(&sched_engine->tasklet));
+}
+
+static void disable_submission(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+
+	if (__tasklet_is_enabled(&sched_engine->tasklet)) {
+		GEM_BUG_ON(!guc->ct.enabled);
+		__tasklet_disable_sync_once(&sched_engine->tasklet);
+		sched_engine->tasklet.callback = NULL;
+	}
+}
+
+static void enable_submission(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->sched_engine->lock, flags);
+	sched_engine->tasklet.callback = guc_submission_tasklet;
+	wmb();
+	if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
+	    __tasklet_enable(&sched_engine->tasklet)) {
+		GEM_BUG_ON(!guc->ct.enabled);
+
+		/* And kick in case we missed a new request submission. */
+		tasklet_hi_schedule(&sched_engine->tasklet);
+	}
+	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+}
+
+static void guc_flush_submissions(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	unsigned long flags;
+
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+void intel_guc_submission_reset_prepare(struct intel_guc *guc)
+{
+	int i;
+
+	if (unlikely(!guc_submission_initialized(guc)))
+		/* Reset called during driver load? GuC not yet initialised! */
+		return;
+
+	disable_submission(guc);
+	guc->interrupts.disable(guc);
+
+	/* Flush IRQ handler */
+	spin_lock_irq(&guc_to_gt(guc)->irq_lock);
+	spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
+
+	guc_flush_submissions(guc);
+
+	/*
+	 * Handle any outstanding G2Hs before reset. Call IRQ handler directly
+	 * each pass as interrupts have been disabled. We always scrub for
+	 * outstanding G2H as it is possible for outstanding_submission_g2h to
+	 * be incremented after the context state update.
+	 */
+	for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
+		intel_guc_to_host_event_handler(guc);
+#define wait_for_reset(guc, wait_var) \
+		guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
+		do {
+			wait_for_reset(guc, &guc->outstanding_submission_g2h);
+		} while (!list_empty(&guc->ct.requests.incoming));
+	}
+	scrub_guc_desc_for_outstanding_g2h(guc);
+}
-static void guc_reset_state(struct intel_context *ce,
-			    struct intel_engine_cs *engine,
-			    u32 head,
-			    bool scrub)
+static struct intel_engine_cs *
+guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
 {
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp, mask = ve->mask;
+	unsigned int num_siblings = 0;
+
+	for_each_engine_masked(engine, ve->gt, mask, tmp)
+		if (num_siblings++ == sibling)
+			return engine;
+
+	return NULL;
+}
+
+static inline struct intel_engine_cs *
+__context_to_physical_engine(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = ce->engine;
+
+	if (intel_engine_is_virtual(engine))
+		engine = guc_virtual_get_sibling(engine, 0);
+
+	return engine;
+}
+
+static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
+{
+	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
+
 	GEM_BUG_ON(!intel_context_is_pinned(ce));
 
 	/*
@@ -501,42 +677,147 @@ static void guc_reset_state(struct intel_context *ce,
 	lrc_update_regs(ce, engine, head);
 }
-static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
+static void guc_reset_nop(struct intel_engine_cs *engine)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct i915_request *rq;
+}
+
+static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled)
+{
+}
+
+static void
+__unwind_incomplete_requests(struct intel_context *ce)
+{
+	struct i915_request *rq, *rn;
+	struct list_head *pl;
+	int prio = I915_PRIORITY_INVALID;
+	struct i915_sched_engine * const sched_engine =
+		ce->engine->sched_engine;
 	unsigned long flags;
 
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_lock(&ce->guc_active.lock);
+	list_for_each_entry_safe(rq, rn,
+				 &ce->guc_active.requests,
+				 sched.link) {
+		if (i915_request_completed(rq))
+			continue;
+
+		list_del_init(&rq->sched.link);
+		spin_unlock(&ce->guc_active.lock);
+
+		__i915_request_unsubmit(rq);
+
+		/* Push the request back into the queue for later resubmission. */
+		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
+		if (rq_prio(rq) != prio) {
+			prio = rq_prio(rq);
+			pl = i915_sched_lookup_priolist(sched_engine, prio);
+		}
+		GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
 
-	/* Push back any incomplete requests for replay after the reset. */
-	rq = execlists_unwind_incomplete_requests(execlists);
-	if (!rq)
-		goto out_unlock;
+		list_add_tail(&rq->sched.link, pl);
+		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+
+		spin_lock(&ce->guc_active.lock);
+	}
+	spin_unlock(&ce->guc_active.lock);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+static struct i915_request *context_find_active_request(struct intel_context *ce)
+{
+	struct i915_request *rq, *active = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ce->guc_active.lock, flags);
+	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
+				    sched.link) {
+		if (i915_request_completed(rq))
+			break;
+
+		active = rq;
+	}
+	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+
+	return active;
+}
+static void __guc_reset_context(struct intel_context *ce, bool stalled)
+{
 	struct i915_request *rq;
+	u32 head;
+
+	/*
+	 * GuC will implicitly mark the context as non-schedulable
+	 * when it sends the reset notification. Make sure our state
+	 * reflects this change. The context will be marked enabled
+	 * on resubmission.
+	 */
+	clr_context_enabled(ce);
+
+	rq = context_find_active_request(ce);
+	if (!rq) {
+		head = ce->ring->tail;
+		stalled = false;
+		goto out_replay;
+	}
 
 	if (!i915_request_started(rq))
 		stalled = false;
 
+	GEM_BUG_ON(i915_active_is_idle(&ce->active));
+	head = intel_ring_wrap(ce->ring, rq->head);
 	__i915_request_reset(rq, stalled);
-	guc_reset_state(rq->context, engine, rq->head, stalled);
 
-out_unlock:
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+out_replay:
+	guc_reset_state(ce, head, stalled);
+	__unwind_incomplete_requests(ce);
 }
-static void guc_reset_cancel(struct intel_engine_cs *engine)
+void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
+{
+	struct intel_context *ce;
+	unsigned long index;
+
+	if (unlikely(!guc_submission_initialized(guc)))
+		/* Reset called during driver load? GuC not yet initialised! */
+		return;
+
+	xa_for_each(&guc->context_lookup, index, ce)
+		if (intel_context_is_pinned(ce))
+			__guc_reset_context(ce, stalled);
+
+	/* GuC is blown away, drop all references to contexts */
+	xa_destroy(&guc->context_lookup);
+}
+
+static void guc_cancel_context_requests(struct intel_context *ce)
+{
+	struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
+	struct i915_request *rq;
+	unsigned long flags;
+
+	/* Mark all executing requests as skipped. */
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_lock(&ce->guc_active.lock);
+	list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
+		i915_request_put(i915_request_mark_eio(rq));
+	spin_unlock(&ce->guc_active.lock);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+static void
+guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
 {
-	struct i915_sched_engine * const sched_engine = engine->sched_engine;
 	struct i915_request *rq, *rn;
 	struct rb_node *rb;
 	unsigned long flags;
 
 	/* Can be called during boot if GuC fails to load */
-	if (!engine->gt)
+	if (!sched_engine)
 		return;
 
-	ENGINE_TRACE(engine, "\n");
-
 	/*
 	 * Before we call engine->cancel_requests(), we should have exclusive
 	 * access to the submission state. This is arranged for us by the
@@ -553,21 +834,16 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	 */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	/* Mark all executing requests as skipped. */
-	list_for_each_entry(rq, &sched_engine->requests, sched.link) {
-		i915_request_set_error_once(rq, -EIO);
-		i915_request_mark_complete(rq);
-	}
-
 	/* Flush the queued requests to the timeline list (for retiring). */
 	while ((rb = rb_first_cached(&sched_engine->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 
 		priolist_for_each_request_consume(rq, rn, p) {
 			list_del_init(&rq->sched.link);
+
 			__i915_request_submit(rq);
-			dma_fence_set_error(&rq->fence, -EIO);
-			i915_request_mark_complete(rq);
+
+			i915_request_put(i915_request_mark_eio(rq));
 		}
 
 		rb_erase_cached(&p->node, &sched_engine->queue);
-static void guc_reset_finish(struct intel_engine_cs *engine)
+void intel_guc_submission_cancel_requests(struct intel_guc *guc)
 {
-	if (__tasklet_enable(&engine->sched_engine->tasklet))
-		/* And kick in case we missed a new request submission. */
-		tasklet_hi_schedule(&engine->sched_engine->tasklet);
+	struct intel_context *ce;
+	unsigned long index;
+
+	xa_for_each(&guc->context_lookup, index, ce)
+		if (intel_context_is_pinned(ce))
+			guc_cancel_context_requests(ce);
 
-	ENGINE_TRACE(engine, "depth->%d\n",
-		     atomic_read(&engine->sched_engine->tasklet.count));
+	guc_cancel_sched_engine_requests(guc->sched_engine);
+
+	/* GuC is blown away, drop all references to contexts */
+	xa_destroy(&guc->context_lookup);
+}
+
+void intel_guc_submission_reset_finish(struct intel_guc *guc)
+{
+	/* Reset called during driver load or during wedge? */
+	if (unlikely(!guc_submission_initialized(guc) ||
+		     test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags)))
+		return;
+
+	/*
+	 * Technically possible for either of these values to be non-zero here,
+	 * but very unlikely + harmless. Regardless let's add a warn so we can
+	 * see in CI if this happens frequently / a precursor to taking down the
+	 * machine.
+	 */
+	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
+	atomic_set(&guc->outstanding_submission_g2h, 0);
+
+	enable_submission(guc);
 }
 /*
@@ -656,6 +956,9 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
 	else
 		trace_i915_request_guc_submit(rq);
 
+	if (unlikely(ret == -EIO))
+		disable_submission(guc);
+
 	return ret;
 }
 
@@ -668,7 +971,8 @@ static void guc_submit_request(struct i915_request *rq)
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
+	if (submission_disabled(guc) || guc->stalled_request ||
+	    !i915_sched_engine_is_empty(sched_engine))
 		queue_request(sched_engine, rq, rq_prio(rq));
 	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
 		tasklet_hi_schedule(&sched_engine->tasklet);
@@ -805,7 +1109,8 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
 
 static int __guc_action_register_context(struct intel_guc *guc,
 					 u32 guc_id,
-					 u32 offset)
+					 u32 offset,
+					 bool loop)
 {
 	u32 action[] = {
 		INTEL_GUC_ACTION_REGISTER_CONTEXT,
@@ -813,10 +1118,10 @@ static int __guc_action_register_context(struct intel_guc *guc,
 		offset,
 	};
 
-	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, loop);
 }
 
-static int register_context(struct intel_context *ce)
+static int register_context(struct intel_context *ce, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
@@ -824,11 +1129,12 @@ static int register_context(struct intel_context *ce)
 
 	trace_intel_context_register(ce);
 
-	return __guc_action_register_context(guc, ce->guc_id, offset);
+	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
 }
 
 static int __guc_action_deregister_context(struct intel_guc *guc,
-					   u32 guc_id)
+					   u32 guc_id,
+					   bool loop)
 {
 	u32 action[] = {
 		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
@@ -836,16 +1142,16 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
 	};
 
 	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
-					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
+					G2H_LEN_DW_DEREGISTER_CONTEXT, loop);
 }
 
-static int deregister_context(struct intel_context *ce, u32 guc_id)
+static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 
 	trace_intel_context_deregister(ce);
 
-	return __guc_action_deregister_context(guc, guc_id);
+	return __guc_action_deregister_context(guc, guc_id, loop);
 }
 
 static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
@@ -874,7 +1180,7 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
 }
 
-static int guc_lrc_desc_pin(struct intel_context *ce)
+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 {
 	struct intel_runtime_pm *runtime_pm =
 		&ce->engine->gt->i915->runtime_pm;
@@ -920,18 +1226,44 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
 	 */
 	if (context_registered) {
 		trace_intel_context_steal_guc_id(ce);
-		set_context_wait_for_deregister_to_register(ce);
-		intel_context_get(ce);
+		if (!loop) {
+			set_context_wait_for_deregister_to_register(ce);
+			intel_context_get(ce);
+		} else {
+			bool disabled;
+			unsigned long flags;
+
+			/* Seal race with Reset */
+			spin_lock_irqsave(&ce->guc_state.lock, flags);
+			disabled = submission_disabled(guc);
+			if (likely(!disabled)) {
+				set_context_wait_for_deregister_to_register(ce);
+				intel_context_get(ce);
+			}
+			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+			if (unlikely(disabled)) {
+				reset_lrc_desc(guc, desc_idx);
+				return 0;	/* Will get registered later */
+			}
+		}
 
 		/*
 		 * If stealing the guc_id, this ce has the same guc_id as the
		 * context whose guc_id was stolen.
 		 */
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			ret = deregister_context(ce, ce->guc_id);
+			ret = deregister_context(ce, ce->guc_id, loop);
+		if (unlikely(ret == -EBUSY)) {
+			clr_context_wait_for_deregister_to_register(ce);
+			intel_context_put(ce);
+		}
 	} else {
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			ret = register_context(ce);
+			ret = register_context(ce, loop);
+		if (unlikely(ret == -EBUSY))
+			reset_lrc_desc(guc, desc_idx);
+		else if (unlikely(ret == -ENODEV))
+			ret = 0;	/* Will get registered later */
 	}
 
 	return ret;
@@ -994,7 +1326,6 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
 
 	trace_intel_context_sched_disable(ce);
-	intel_context_get(ce);
 
 	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
 				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
@@ -1004,6 +1335,7 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
 {
 	set_context_pending_disable(ce);
 	clr_context_enabled(ce);
+	intel_context_get(ce);
 
 	return ce->guc_id;
 }
@@ -1016,7 +1348,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	u16 guc_id;
 	intel_wakeref_t wakeref;
 
-	if (context_guc_id_invalid(ce) ||
+	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		clr_context_enabled(ce);
 		goto unpin;
@@ -1034,6 +1366,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	 * request doesn't slip through the 'context_pending_disable' fence.
 	 */
 	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 		return;
 	}
 	guc_id = prep_context_pending_disable(ce);
@@ -1050,19 +1383,13 @@ static void guc_context_sched_disable(struct intel_context *ce)
 
 static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 {
-	struct intel_engine_cs *engine = ce->engine;
-	struct intel_guc *guc = &engine->gt->uc.guc;
-	unsigned long flags;
+	struct intel_guc *guc = ce_to_guc(ce);
 
 	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
 	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
 	GEM_BUG_ON(context_enabled(ce));
 
-	spin_lock_irqsave(&ce->guc_state.lock, flags);
-	set_context_destroyed(ce);
-	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
-
-	deregister_context(ce, ce->guc_id);
+	deregister_context(ce, ce->guc_id, true);
 }
 
 static void __guc_context_destroy(struct intel_context *ce)
@@ -1090,13 +1417,15 @@ static void guc_context_destroy(struct kref *kref)
 	struct intel_guc *guc = &ce->engine->gt->uc.guc;
 	intel_wakeref_t wakeref;
 	unsigned long flags;
+	bool disabled;
 
 	/*
 	 * If the guc_id is invalid this context has been stolen and we can free
 	 * it immediately. Also can be freed immediately if the context is not
 	 * registered with the GuC.
 	 */
-	if (context_guc_id_invalid(ce) ||
+	if (submission_disabled(guc) ||
+	    context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		release_guc_id(guc, ce);
 		__guc_context_destroy(ce);
@@ -1123,6 +1452,18 @@ static void guc_context_destroy(struct kref *kref)
 	list_del_init(&ce->guc_id_link);
 	spin_unlock_irqrestore(&guc->contexts_lock, flags);
 
+	/* Seal race with Reset */
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	disabled = submission_disabled(guc);
+	if (likely(!disabled))
+		set_context_destroyed(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+	if (unlikely(disabled)) {
+		release_guc_id(guc, ce);
+		__guc_context_destroy(ce);
+		return;
+	}
+
 	/*
 	 * We defer GuC context deregistration until the context is destroyed
 	 * in order to save on CTBs. With this optimization ideally we only need
@@ -1145,6 +1486,33 @@ static int guc_context_alloc(struct intel_context *ce)
 	return lrc_alloc(ce, ce->engine);
 }
+static void add_to_context(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+
+	spin_lock(&ce->guc_active.lock);
+	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
+	spin_unlock(&ce->guc_active.lock);
+}
+
+static void remove_from_context(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+
+	spin_lock_irq(&ce->guc_active.lock);
+
+	list_del_init(&rq->sched.link);
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&ce->guc_active.lock);
+
+	atomic_dec(&ce->guc_id_ref);
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static const struct intel_context_ops guc_context_ops = {
 	.alloc = guc_context_alloc,
@@ -1183,8 +1551,6 @@ static void guc_signal_context_fence(struct intel_context *ce)
 {
 	unsigned long flags;

-	GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
-
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 	clr_context_wait_for_deregister_to_register(ce);
 	__guc_signal_context_fence(ce);

@@ -1193,8 +1559,9 @@ static void guc_signal_context_fence(struct intel_context *ce)

 static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
 {
-	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
-		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
+	return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
+		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
+		!submission_disabled(ce_to_guc(ce));
 }
static int guc_request_alloc(struct i915_request *rq)
@@ -1252,8 +1619,12 @@ static int guc_request_alloc(struct i915_request *rq)
 	if (unlikely(ret < 0))
 		return ret;
 	if (context_needs_register(ce, !!ret)) {
-		ret = guc_lrc_desc_pin(ce);
+		ret = guc_lrc_desc_pin(ce, true);
 		if (unlikely(ret)) {	/* unwind */
+			if (ret == -EIO) {
+				disable_submission(guc);
+				goto out;	/* GPU will be reset */
+			}
 			atomic_dec(&ce->guc_id_ref);
 			unpin_guc_id(guc, ce);
 			return ret;

@@ -1290,20 +1661,6 @@ static int guc_request_alloc(struct i915_request *rq)
 	return 0;
 }
-static struct intel_engine_cs *
-guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
-{
-	struct intel_engine_cs *engine;
-	intel_engine_mask_t tmp, mask = ve->mask;
-	unsigned int num_siblings = 0;
-
-	for_each_engine_masked(engine, ve->gt, mask, tmp)
-		if (num_siblings++ == sibling)
-			return engine;
-
-	return NULL;
-}
-
 static int guc_virtual_context_pre_pin(struct intel_context *ce,
				       struct i915_gem_ww_ctx *ww,
				       void **vaddr)
@@ -1512,7 +1869,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
 {
 	if (context_guc_id_invalid(ce))
 		pin_guc_id(guc, ce);
-	guc_lrc_desc_pin(ce);
+	guc_lrc_desc_pin(ce, true);
 }
static inline void guc_init_lrc_mapping(struct intel_guc *guc)
@@ -1578,13 +1935,15 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 	engine->cops = &guc_context_ops;
 	engine->request_alloc = guc_request_alloc;
 	engine->bump_serial = guc_bump_serial;
+	engine->add_active_request = add_to_context;
+	engine->remove_active_request = remove_from_context;

 	engine->sched_engine->schedule = i915_schedule;

-	engine->reset.prepare = guc_reset_prepare;
-	engine->reset.rewind = guc_reset_rewind;
-	engine->reset.cancel = guc_reset_cancel;
-	engine->reset.finish = guc_reset_finish;
+	engine->reset.prepare = guc_reset_nop;
+	engine->reset.rewind = guc_rewind_nop;
+	engine->reset.cancel = guc_reset_nop;
+	engine->reset.finish = guc_reset_nop;

 	engine->emit_flush = gen8_emit_flush_xcs;
 	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
@@ -1757,7 +2116,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		 * register this context.
 		 */
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			register_context(ce);
+			register_context(ce, true);
 		guc_signal_context_fence(ce);
 		intel_context_put(ce);
 	} else if (context_destroyed(ce)) {

@@ -1939,6 +2298,10 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 		 "v%dx%d", ve->base.class, count);
 	ve->base.context_size = sibling->context_size;

+	ve->base.add_active_request =
+		sibling->add_active_request;
+	ve->base.remove_active_request =
+		sibling->remove_active_request;
 	ve->base.emit_bb_start = sibling->emit_bb_start;
 	ve->base.emit_flush = sibling->emit_flush;
 	ve->base.emit_init_breadcrumb =
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 6d8b9233214e..f0b02200aa01 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -565,12 +565,49 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;

-	if (!intel_guc_is_ready(guc))
+	/* Nothing to do if GuC isn't supported */
+	if (!intel_uc_supports_guc(uc))
 		return;

+	/* Firmware expected to be running when this function is called */
+	if (!intel_guc_is_ready(guc))
+		goto sanitize;
+
 	if (intel_uc_uses_guc_submission(uc))
 		intel_guc_submission_reset_prepare(guc);

+sanitize:
 	__uc_sanitize(uc);
 }

+void intel_uc_reset(struct intel_uc *uc, bool stalled)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware can not be running when this function is called */
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset(guc, stalled);
+}
+
+void intel_uc_reset_finish(struct intel_uc *uc)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware expected to be running when this function is called */
+	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset_finish(guc);
+}
+
+void intel_uc_cancel_requests(struct intel_uc *uc)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware can not be running when this function is called */
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_cancel_requests(guc);
+}
+
 void intel_uc_runtime_suspend(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index c4cef885e984..eaa3202192ac 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -37,6 +37,9 @@ void intel_uc_driver_late_release(struct intel_uc *uc);
 void intel_uc_driver_remove(struct intel_uc *uc);
 void intel_uc_init_mmio(struct intel_uc *uc);
 void intel_uc_reset_prepare(struct intel_uc *uc);
+void intel_uc_reset(struct intel_uc *uc, bool stalled);
+void intel_uc_reset_finish(struct intel_uc *uc);
+void intel_uc_cancel_requests(struct intel_uc *uc);
 void intel_uc_suspend(struct intel_uc *uc);
 void intel_uc_runtime_suspend(struct intel_uc *uc);
 int intel_uc_resume(struct intel_uc *uc);

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index dec5a35c9aa2..192784875a1d 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -194,7 +194,7 @@ static bool irq_work_imm(struct irq_work *wrk)
 	return false;
 }

-static void __notify_execute_cb_imm(struct i915_request *rq)
+void i915_request_notify_execute_cb_imm(struct i915_request *rq)
 {
 	__notify_execute_cb(rq, irq_work_imm);
 }

@@ -268,37 +268,6 @@ i915_request_active_engine(struct i915_request *rq,
 	return ret;
 }
-static void remove_from_engine(struct i915_request *rq)
-{
-	struct intel_engine_cs *engine, *locked;
-
-	/*
-	 * Virtual engines complicate acquiring the engine timeline lock,
-	 * as their rq->engine pointer is not stable until under that
-	 * engine lock. The simple ploy we use is to take the lock then
-	 * check that the rq still belongs to the newly locked engine.
-	 */
-	locked = READ_ONCE(rq->engine);
-	spin_lock_irq(&locked->sched_engine->lock);
-	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
-		spin_unlock(&locked->sched_engine->lock);
-		spin_lock(&engine->sched_engine->lock);
-		locked = engine;
-	}
-	list_del_init(&rq->sched.link);
-
-	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
-
-	/* Prevent further __await_execution() registering a cb, then flush */
-	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
-
-	spin_unlock_irq(&locked->sched_engine->lock);
-
-	__notify_execute_cb_imm(rq);
-}
-
 static void __rq_init_watchdog(struct i915_request *rq)
 {
 	rq->watchdog.timer.function = NULL;
@@ -395,9 +364,7 @@ bool i915_request_retire(struct i915_request *rq)
 	 * after removing the breadcrumb and signaling it, so that we do not
 	 * inadvertently attach the breadcrumb to a completed request.
 	 */
-	if (!list_empty(&rq->sched.link))
-		remove_from_engine(rq);
-	atomic_dec(&rq->context->guc_id_ref);
+	rq->engine->remove_active_request(rq);
 	GEM_BUG_ON(!llist_empty(&rq->execute_cb));

 	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */

@@ -539,7 +506,7 @@ __await_execution(struct i915_request *rq,
 	if (llist_add(&cb->work.node.llist, &signal->execute_cb)) {
 		if (i915_request_is_active(signal) ||
 		    __request_in_flight(signal))
-			__notify_execute_cb_imm(signal);
+			i915_request_notify_execute_cb_imm(signal);
 	}

 	return 0;

@@ -676,7 +643,7 @@ bool __i915_request_submit(struct i915_request *request)
 	result = true;

 	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
-	list_move_tail(&request->sched.link, &engine->sched_engine->requests);
+	engine->add_active_request(request);
 active:
 	clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags);
 	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);

diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index f870cd75a001..bcc6340c505e 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -649,4 +649,6 @@ bool i915_request_active_engine(struct i915_request *rq,
				struct intel_engine_cs **active);

+void i915_request_notify_execute_cb_imm(struct i915_request *rq);
+
 #endif /* I915_REQUEST_H */
On Thu, Jul 15, 2021 at 10:36:51AM +0100, Tvrtko Ursulin wrote:
On 24/06/2021 08:05, Matthew Brost wrote:
Reset implementation for new GuC interface. This is the legacy reset implementation which is called when the i915 owns the engine hang check. Future patches will offload the engine hang check to GuC but we will continue to maintain this legacy path as a fallback and this code path is also required if the GuC dies.
With the new GuC interface it is not possible to reset individual engines - it is only possible to reset the GPU entirely. This patch forces an entire chip reset if any engine hangs.
No updates after my review comments on 6th of May.
At least:
- wmb documentation
Yea, missed this. Checkpatch yelled at me too. Will be fixed in next rev.
- Spin lock cycling I either didn't understand or didn't buy the
explanation. I don't remember seeing that pattern elsewhere in the driver - cycle a spinlock to make sure what was updated inside it is visible you said?
I did respond - not really my fault if you don't understand a fairly simple concept but I'll explain again.
1. Change a variable.
2. Cycle a lock (acquire it, then release it).

At this point, the variable change is guaranteed to be visible to anyone who subsequently acquires the above lock.
I can't be the first person in the Linux kernel to do this nor in the i915.
This basically allows to seal all the reset races without a BKL.
Also I told you I explain in this a doc patch that will get reposted after GuC submission lands: https://patchwork.freedesktop.org/patch/432408/?series=89844&rev=1
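For readers following along, the lock-cycling pattern described above can be modelled in userspace, with a pthread mutex standing in for the kernel spinlock (all names in this sketch are illustrative, not the driver's):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Illustrative userspace model of the "cycle a lock" pattern. */

static pthread_mutex_t guc_state_lock = PTHREAD_MUTEX_INITIALIZER;
static bool submission_disabled_flag;

/* Writer: change the variable, then cycle the lock. */
static void disable_and_cycle(void)
{
	submission_disabled_flag = true;	/* 1. change a variable */

	pthread_mutex_lock(&guc_state_lock);	/* 2. cycle the lock */
	pthread_mutex_unlock(&guc_state_lock);

	/*
	 * From here on, any thread that acquires guc_state_lock is
	 * guaranteed to observe submission_disabled_flag == true:
	 * the lock's acquire/release ordering publishes the store.
	 */
}

/* Reader: must take the lock before checking the flag. */
static bool reader_sees_disabled(void)
{
	bool val;

	pthread_mutex_lock(&guc_state_lock);
	val = submission_disabled_flag;
	pthread_mutex_unlock(&guc_state_lock);
	return val;
}
```

The point of the cycle is that the writer does not need to hold the lock while changing the variable; taking and releasing it afterwards is enough to order the store before any later critical section on the same lock.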
- Dropping the lock protecting the list in the middle of
list_for_each_entry_safe and just continuing to iterate like nothing happened. (__unwind_incomplete_requests) Again, perhaps I did not understand your explanation properly but you did appear to write:
To be honest, looking at the code now we likely don't need to drop the lock, but regardless I don't think we should change this, for the following reasons.
1. I assure you this is safe and works. I can add a better comment explaining this though.
2. This is thoroughly tested, and resets are the hardest thing to get stable and working.
3. This code is literally going to get deleted when we move to the DRM scheduler, as all the tracking / unwinding / resubmission will be in the DRM scheduler core.
4. A 2 second search of the driver found that we do the same thing in intel_gt_retire_requests_timeout, so this isn't unprecedented.
Matt
""" We only need the active lock for ce->guc_active.requests list. It is indeed safe to drop the lock. """
	spin_lock(&ce->guc_active.lock);
	list_for_each_entry_safe(rq, rn,
				 &ce->guc_active.requests,
				 sched.link) {
		if (i915_request_completed(rq))
			continue;

		list_del_init(&rq->sched.link);
		spin_unlock(&ce->guc_active.lock);
		...
		spin_lock(&ce->guc_active.lock);
	}
The safe iterator guards against list_del of the current entry, but dropping the lock means the overall state of the list can change underneath us, so the saved next pointer may or may not remain valid and requests may be missed, I don't know. Needs a comment explaining why it is safe.
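As a rough userspace model of the invariant being debated (illustrative names and types, not the driver's): list_for_each_entry_safe caches the next pointer before the body runs, so unlinking the current entry is fine; dropping the lock mid-walk additionally requires that no other path removes entries from this list concurrently, otherwise the cached next pointer could dangle.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
	int val;
	struct node *next;
};

/* Walk the list "safe"-style, unlinking each entry as we go. */
static int drain_list(struct node **head)
{
	struct node *pos = *head, *next;
	int processed = 0;

	while (pos) {
		next = pos->next;	/* cached before 'pos' is unlinked */

		*head = next;		/* list_del_init() equivalent */
		free(pos);

		/*
		 * The lock protecting the list would be dropped and
		 * re-acquired here; the cached 'next' stays valid only
		 * if nobody else can unlink entries in the meantime.
		 */

		processed++;
		pos = next;
	}
	return processed;
}
```

This is only a sketch of the iteration discipline; the safety argument in the patch rests on the claim that ce->guc_active.requests has no concurrent remover during the reset path.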
Regards,
Tvrtko
Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   3 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   6 +
 .../drm/i915/gt/intel_execlists_submission.c  |  40 ++
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   6 +-
 drivers/gpu/drm/i915/gt/intel_reset.c         |  18 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  22 +
 drivers/gpu/drm/i915/gt/mock_engine.c         |  31 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  13 -
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   8 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 581 ++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  39 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   3 +
 drivers/gpu/drm/i915/i915_request.c           |  41 +-
 drivers/gpu/drm/i915/i915_request.h           |   2 +
 15 files changed, 649 insertions(+), 171 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index b24a1b7a3f88..2f01437056a8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -392,6 +392,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	spin_lock_init(&ce->guc_state.lock);
 	INIT_LIST_HEAD(&ce->guc_state.fences);

+	spin_lock_init(&ce->guc_active.lock);
+	INIT_LIST_HEAD(&ce->guc_active.requests);
+
 	ce->guc_id = GUC_INVALID_LRC_ID;
 	INIT_LIST_HEAD(&ce->guc_id_link);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 6945963a31ba..b63c8cf7823b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -165,6 +165,13 @@ struct intel_context {
 		struct list_head fences;
 	} guc_state;

+	struct {
+		/** lock: protects everything in guc_active */
+		spinlock_t lock;
+		/** requests: active requests on this context */
+		struct list_head requests;
+	} guc_active;
+
 	/* GuC scheduling state that does not require a lock. */
 	atomic_t guc_sched_state_no_lock;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index e7cb6a06db9d..f9d264c008e8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -426,6 +426,12 @@ struct intel_engine_cs {
 	void		(*release)(struct intel_engine_cs *engine);

+	/*
+	 * Add / remove request from engine active tracking
+	 */
+	void		(*add_active_request)(struct i915_request *rq);
+	void		(*remove_active_request)(struct i915_request *rq);
+
 	struct intel_engine_execlists execlists;

 	/*
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index c10ea6080752..c301a2d088b1 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3118,6 +3118,42 @@ static void execlists_park(struct intel_engine_cs *engine)
 	cancel_timer(&engine->execlists.preempt);
 }

+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&locked->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static bool can_preempt(struct intel_engine_cs *engine)
 {
 	if (GRAPHICS_VER(engine->i915) > 8)

@@ -3218,6 +3254,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->cops = &execlists_context_ops;
 	engine->request_alloc = execlists_request_alloc;
 	engine->bump_serial = execlist_bump_serial;
+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;

 	engine->reset.prepare = execlists_reset_prepare;
 	engine->reset.rewind = execlists_reset_rewind;

@@ -3912,6 +3950,8 @@ execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 		 "v%dx%d", ve->base.class, count);
 	ve->base.context_size = sibling->context_size;

+	ve->base.add_active_request = sibling->add_active_request;
+	ve->base.remove_active_request = sibling->remove_active_request;
 	ve->base.emit_bb_start = sibling->emit_bb_start;
 	ve->base.emit_flush = sibling->emit_flush;
 	ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index aef3084e8b16..463a6ae605a0 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 	if (intel_gt_is_wedged(gt))
 		intel_gt_unset_wedged(gt);

-	intel_uc_sanitize(&gt->uc);
-
 	for_each_engine(engine, gt, id)
 		if (engine->reset.prepare)
 			engine->reset.prepare(engine);

@@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 			__intel_engine_reset(engine, false);
 	}

+	intel_uc_reset(&gt->uc, false);
+
 	for_each_engine(engine, gt, id)
 		if (engine->reset.finish)
 			engine->reset.finish(engine);

@@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
 		goto err_wedged;
 	}

+	intel_uc_reset_finish(&gt->uc);
+
 	intel_rps_enable(&gt->rps);
 	intel_llc_enable(&gt->llc);
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 72251638d4ea..2987282dff6d 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
 		__intel_engine_reset(engine, stalled_mask & engine->mask);
 	local_bh_enable();

+	intel_uc_reset(&gt->uc, true);
+
 	intel_ggtt_restore_fences(gt->ggtt);

 	return err;

@@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
 		if (awake & engine->mask)
 			intel_engine_pm_put(engine);
 	}
+
+	intel_uc_reset_finish(&gt->uc);
 }

 static void nop_submit_request(struct i915_request *request)

@@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
 	for_each_engine(engine, gt, id)
 		if (engine->reset.cancel)
 			engine->reset.cancel(engine);
+	intel_uc_cancel_requests(&gt->uc);
 	local_bh_enable();

 	reset_finish(gt, awake);

@@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
 	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));

+	if (intel_engine_uses_guc(engine))
+		return -ENODEV;
+
 	if (!intel_engine_pm_get_if_awake(engine))
 		return 0;

@@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 			   "Resetting %s for %s\n", engine->name, msg);
 	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);

-	if (intel_engine_uses_guc(engine))
-		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
-	else
-		ret = intel_gt_reset_engine(engine);
+	ret = intel_gt_reset_engine(engine);
 	if (ret) {
 		/* If we fail here, we expect to fallback to a global reset */
-		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
+		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
 		goto out;
 	}

@@ -1341,7 +1346,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
+	if (!intel_uc_uses_guc_submission(&gt->uc) &&
+	    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
 		local_bh_disable();
 		for_each_engine_masked(engine, gt, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index e1506b280df1..99dcdc8fba12 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -1049,6 +1049,25 @@ static void ring_bump_serial(struct intel_engine_cs *engine)
 	engine->serial++;
 }

+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	spin_lock_irq(&rq->engine->sched_engine->lock);
+	list_del_init(&rq->sched.link);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&rq->engine->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static void setup_common(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;

@@ -1066,6 +1085,9 @@ static void setup_common(struct intel_engine_cs *engine)
 	engine->reset.cancel = reset_cancel;
 	engine->reset.finish = reset_finish;

+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;
+
 	engine->cops = &ring_context_ops;
 	engine->request_alloc = ring_request_alloc;
 	engine->bump_serial = ring_bump_serial;
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index fc5a65ab1937..c12ff3a75ce6 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -235,6 +235,35 @@ static void mock_submit_request(struct i915_request *request)
 	spin_unlock_irqrestore(&engine->hw_lock, flags);
 }

+static void mock_add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void mock_remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+	spin_unlock_irq(&locked->sched_engine->lock);
+}
+
 static void mock_reset_prepare(struct intel_engine_cs *engine)
 {
 }

@@ -327,6 +356,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.emit_flush = mock_emit_flush;
 	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
 	engine->base.submit_request = mock_submit_request;
+	engine->base.add_active_request = mock_add_to_engine;
+	engine->base.remove_active_request = mock_remove_from_engine;

 	engine->base.reset.prepare = mock_reset_prepare;
 	engine->base.reset.rewind = mock_reset_rewind;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 6661dcb02239..9b09395b998f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -572,19 +572,6 @@ int intel_guc_suspend(struct intel_guc *guc)
 	return 0;
 }

-/**
- * intel_guc_reset_engine() - ask GuC to reset an engine
- * @guc: intel_guc structure
- * @engine: engine to be reset
- */
-int intel_guc_reset_engine(struct intel_guc *guc,
-			   struct intel_engine_cs *engine)
-{
-	/* XXX: to be implemented with submission interface rework */
-	return -ENODEV;
-}
-
 /**
  * intel_guc_resume() - notify GuC resuming from suspend state
  * @guc: the guc

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 22eb1e9cca41..40c9868762d7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -242,14 +242,16 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)

 int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);

-int intel_guc_reset_engine(struct intel_guc *guc,
-			   struct intel_engine_cs *engine);
-
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
					  const u32 *msg, u32 len);
 int intel_guc_sched_done_process_msg(struct intel_guc *guc,
				     const u32 *msg, u32 len);

+void intel_guc_submission_reset_prepare(struct intel_guc *guc);
+void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
+void intel_guc_submission_reset_finish(struct intel_guc *guc);
+void intel_guc_submission_cancel_requests(struct intel_guc *guc);
+
 void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);

 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 83058df5ba01..b8c894ad8caf 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -141,7 +141,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce)
 static inline void
 set_context_wait_for_deregister_to_register(struct intel_context *ce)
 {
-	/* Only should be called from guc_lrc_desc_pin() */
+	/* Only should be called from guc_lrc_desc_pin() without lock */
 	ce->guc_state.sched_state |=
		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
 }

@@ -241,15 +241,31 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
 static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
 {
+	guc->lrc_desc_pool_vaddr = NULL;
 	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
 }

+static inline bool guc_submission_initialized(struct intel_guc *guc)
+{
+	return guc->lrc_desc_pool_vaddr != NULL;
+}
+
 static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
 {
-	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+	if (likely(guc_submission_initialized(guc))) {
+		struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+		unsigned long flags;

-	memset(desc, 0, sizeof(*desc));
-	xa_erase_irq(&guc->context_lookup, id);
+		memset(desc, 0, sizeof(*desc));
+
+		/*
+		 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
+		 * the lower level functions directly.
+		 */
+		xa_lock_irqsave(&guc->context_lookup, flags);
+		__xa_erase(&guc->context_lookup, id);
+		xa_unlock_irqrestore(&guc->context_lookup, flags);
+	}
 }

 static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
@@ -260,7 +276,15 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
 static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
					   struct intel_context *ce)
 {
-	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	unsigned long flags;
+
+	/*
+	 * xarray API doesn't have xa_store_irqsave wrapper, so calling the
+	 * lower level functions directly.
+	 */
+	xa_lock_irqsave(&guc->context_lookup, flags);
+	__xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	xa_unlock_irqrestore(&guc->context_lookup, flags);
 }

 static int guc_submission_busy_loop(struct intel_guc *guc,
@@ -331,6 +355,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
					      interruptible, timeout);
 }

+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
+
 static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	int err;

@@ -338,11 +364,22 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	u32 action[3];
 	int len = 0;
 	u32 g2h_len_dw = 0;
-	bool enabled = context_enabled(ce);
+	bool enabled;

 	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
 	GEM_BUG_ON(context_guc_id_invalid(ce));

+	/*
+	 * Corner case where the GuC firmware was blown away and reloaded while
+	 * this context was pinned.
+	 */
+	if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
+		err = guc_lrc_desc_pin(ce, false);
+		if (unlikely(err))
+			goto out;
+	}
+	enabled = context_enabled(ce);
+
 	if (!enabled) {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
 		action[len++] = ce->guc_id;

@@ -365,6 +402,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		intel_context_put(ce);
 	}

+out:
 	return err;
 }

@@ -419,15 +457,10 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 	if (submit) {
 		guc_set_lrc_tail(last);
 resubmit:
+		/*
+		 * We only check for -EBUSY here even though it is possible for
+		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
+		 * died and a full GPU reset needs to be done. The hangcheck will
+		 * eventually detect that the GuC has died and trigger this
+		 * reset so no need to handle -EDEADLK here.
+		 */
 		ret = guc_add_request(guc, last);
-		if (ret == -EBUSY) {
+		if (unlikely(ret == -EIO))
+			goto deadlk;
+		else if (ret == -EBUSY) {
 			tasklet_schedule(&sched_engine->tasklet);
 			guc->stalled_request = last;
 			return false;

@@ -437,6 +470,11 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 	guc->stalled_request = NULL;
 	return submit;

+deadlk:
+	sched_engine->tasklet.callback = NULL;
+	tasklet_disable_nosync(&sched_engine->tasklet);
+	return false;
 }

 static void guc_submission_tasklet(struct tasklet_struct *t)
@@ -463,27 +501,165 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
		intel_engine_signal_breadcrumbs(engine);
 }

-static void guc_reset_prepare(struct intel_engine_cs *engine)
+static void __guc_context_destroy(struct intel_context *ce);
+static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
+static void guc_signal_context_fence(struct intel_context *ce);
+
+static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
+{
+	struct intel_context *ce;
+	unsigned long index, flags;
+	bool pending_disable, pending_enable, deregister, destroyed;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		/* Flush context */
+		spin_lock_irqsave(&ce->guc_state.lock, flags);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		/*
+		 * Once we are at this point submission_disabled() is guaranteed
+		 * to be visible to all callers who set the below flags (see above
+		 * flush and flushes in reset_prepare). If submission_disabled()
+		 * is set, the caller shouldn't set these flags.
+		 */
+		destroyed = context_destroyed(ce);
+		pending_enable = context_pending_enable(ce);
+		pending_disable = context_pending_disable(ce);
+		deregister = context_wait_for_deregister_to_register(ce);
+		init_sched_state(ce);
+
+		if (pending_enable || destroyed || deregister) {
+			atomic_dec(&guc->outstanding_submission_g2h);
+			if (deregister)
+				guc_signal_context_fence(ce);
+			if (destroyed) {
+				release_guc_id(guc, ce);
+				__guc_context_destroy(ce);
+			}
+			if (pending_enable || deregister)
+				intel_context_put(ce);
+		}
+
+		/* Not mutually exclusive with above if statement. */
+		if (pending_disable) {
+			guc_signal_context_fence(ce);
+			intel_context_sched_disable_unpin(ce);
+			atomic_dec(&guc->outstanding_submission_g2h);
+			intel_context_put(ce);
+		}
+	}
+}
+static inline bool
+submission_disabled(struct intel_guc *guc)
+{
- struct i915_sched_engine * const sched_engine = guc->sched_engine;
- return unlikely(!__tasklet_is_enabled(&sched_engine->tasklet));
+}
+static void disable_submission(struct intel_guc *guc)
+{
- struct i915_sched_engine * const sched_engine = guc->sched_engine;
- if (__tasklet_is_enabled(&sched_engine->tasklet)) {
GEM_BUG_ON(!guc->ct.enabled);
__tasklet_disable_sync_once(&sched_engine->tasklet);
sched_engine->tasklet.callback = NULL;
- }
+}
+static void enable_submission(struct intel_guc *guc)
+{
- struct i915_sched_engine * const sched_engine = guc->sched_engine;
- unsigned long flags;
- spin_lock_irqsave(&guc->sched_engine->lock, flags);
- sched_engine->tasklet.callback = guc_submission_tasklet;
- wmb();
- if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
__tasklet_enable(&sched_engine->tasklet)) {
GEM_BUG_ON(!guc->ct.enabled);
/* And kick in case we missed a new request submission. */
tasklet_hi_schedule(&sched_engine->tasklet);
- }
- spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+}
+static void guc_flush_submissions(struct intel_guc *guc)
+{
- ENGINE_TRACE(engine, "\n");
- struct i915_sched_engine * const sched_engine = guc->sched_engine;
- unsigned long flags;
- spin_lock_irqsave(&sched_engine->lock, flags);
- spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+void intel_guc_submission_reset_prepare(struct intel_guc *guc)
+{
- int i;
- if (unlikely(!guc_submission_initialized(guc)))
/* Reset called during driver load? GuC not yet initialised! */
return;
- disable_submission(guc);
- guc->interrupts.disable(guc);
- /* Flush IRQ handler */
- spin_lock_irq(&guc_to_gt(guc)->irq_lock);
- spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
	guc_flush_submissions(guc);

	/*
* Prevent request submission to the hardware until we have
* completed the reset in i915_gem_reset_finish(). If a request
* is completed by one engine, it may then queue a request
* to a second via its execlists->tasklet *just* as we are
* calling engine->init_hw() and also writing the ELSP.
* Turning off the execlists->tasklet until the reset is over
* prevents the race.
*/
- __tasklet_disable_sync_once(&engine->sched_engine->tasklet);
	/*
	 * Handle any outstanding G2Hs before reset. Call the IRQ handler
	 * directly on each pass, as interrupts have been disabled. We always
	 * scrub for outstanding G2H as it is possible for
	 * outstanding_submission_g2h to be incremented after the context
	 * state update.
	 */
- for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
intel_guc_to_host_event_handler(guc);
+#define wait_for_reset(guc, wait_var) \
guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
do {
wait_for_reset(guc, &guc->outstanding_submission_g2h);
} while (!list_empty(&guc->ct.requests.incoming));
- }
	scrub_guc_desc_for_outstanding_g2h(guc);
}
-static void guc_reset_state(struct intel_context *ce,
struct intel_engine_cs *engine,
u32 head,
bool scrub)
+static struct intel_engine_cs *
+guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
+{
- struct intel_engine_cs *engine;
- intel_engine_mask_t tmp, mask = ve->mask;
- unsigned int num_siblings = 0;
- for_each_engine_masked(engine, ve->gt, mask, tmp)
if (num_siblings++ == sibling)
return engine;
- return NULL;
+}
+static inline struct intel_engine_cs *
+__context_to_physical_engine(struct intel_context *ce)
+{
- struct intel_engine_cs *engine = ce->engine;
- if (intel_engine_is_virtual(engine))
engine = guc_virtual_get_sibling(engine, 0);
- return engine;
+}
+static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
+{
- struct intel_engine_cs *engine = __context_to_physical_engine(ce);
	GEM_BUG_ON(!intel_context_is_pinned(ce));

	/*
@@ -501,42 +677,147 @@ static void guc_reset_state(struct intel_context *ce,
	lrc_update_regs(ce, engine, head);
}

-static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
+static void guc_reset_nop(struct intel_engine_cs *engine)
 {
- struct intel_engine_execlists * const execlists = &engine->execlists;
- struct i915_request *rq;
+}
+
+static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled)
+{
+}
+
+static void
+__unwind_incomplete_requests(struct intel_context *ce)
+{
- struct i915_request *rq, *rn;
- struct list_head *pl;
- int prio = I915_PRIORITY_INVALID;
	struct i915_sched_engine * const sched_engine =
		ce->engine->sched_engine;
	unsigned long flags;
- spin_lock_irqsave(&engine->sched_engine->lock, flags);
- spin_lock_irqsave(&sched_engine->lock, flags);
- spin_lock(&ce->guc_active.lock);
- list_for_each_entry_safe(rq, rn,
&ce->guc_active.requests,
sched.link) {
if (i915_request_completed(rq))
continue;
list_del_init(&rq->sched.link);
spin_unlock(&ce->guc_active.lock);
__i915_request_unsubmit(rq);
/* Push the request back into the queue for later resubmission. */
GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
if (rq_prio(rq) != prio) {
prio = rq_prio(rq);
pl = i915_sched_lookup_priolist(sched_engine, prio);
}
GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
list_add_tail(&rq->sched.link, pl);
set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
- /* Push back any incomplete requests for replay after the reset. */
- rq = execlists_unwind_incomplete_requests(execlists);
- if (!rq)
goto out_unlock;
spin_lock(&ce->guc_active.lock);
- }
- spin_unlock(&ce->guc_active.lock);
- spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+static struct i915_request *context_find_active_request(struct intel_context *ce)
+{
- struct i915_request *rq, *active = NULL;
- unsigned long flags;
- spin_lock_irqsave(&ce->guc_active.lock, flags);
- list_for_each_entry_reverse(rq, &ce->guc_active.requests,
sched.link) {
if (i915_request_completed(rq))
break;
active = rq;
- }
- spin_unlock_irqrestore(&ce->guc_active.lock, flags);
- return active;
+}
+static void __guc_reset_context(struct intel_context *ce, bool stalled)
+{
- struct i915_request *rq;
- u32 head;
- /*
* GuC will implicitly mark the context as non-schedulable
* when it sends the reset notification. Make sure our state
* reflects this change. The context will be marked enabled
* on resubmission.
*/
- clr_context_enabled(ce);
- rq = context_find_active_request(ce);
- if (!rq) {
head = ce->ring->tail;
stalled = false;
goto out_replay;
	}

	if (!i915_request_started(rq))
		stalled = false;
- GEM_BUG_ON(i915_active_is_idle(&ce->active));
	head = intel_ring_wrap(ce->ring, rq->head);
	__i915_request_reset(rq, stalled);
- guc_reset_state(rq->context, engine, rq->head, stalled);
-out_unlock:
- spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+out_replay:
- guc_reset_state(ce, head, stalled);
	__unwind_incomplete_requests(ce);
}

-static void guc_reset_cancel(struct intel_engine_cs *engine)
+void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
+{
- struct intel_context *ce;
- unsigned long index;
- if (unlikely(!guc_submission_initialized(guc)))
/* Reset called during driver load? GuC not yet initialised! */
return;
- xa_for_each(&guc->context_lookup, index, ce)
if (intel_context_is_pinned(ce))
__guc_reset_context(ce, stalled);
- /* GuC is blown away, drop all references to contexts */
- xa_destroy(&guc->context_lookup);
+}
+static void guc_cancel_context_requests(struct intel_context *ce)
+{
- struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
- struct i915_request *rq;
- unsigned long flags;
- /* Mark all executing requests as skipped. */
- spin_lock_irqsave(&sched_engine->lock, flags);
- spin_lock(&ce->guc_active.lock);
- list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
i915_request_put(i915_request_mark_eio(rq));
- spin_unlock(&ce->guc_active.lock);
- spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+static void
+guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
+{
-	struct i915_sched_engine * const sched_engine = engine->sched_engine;
	struct i915_request *rq, *rn;
	struct rb_node *rb;
	unsigned long flags;

	/* Can be called during boot if GuC fails to load */
- if (!engine->gt)
+	if (!sched_engine)
		return;
- ENGINE_TRACE(engine, "\n");
- /*
- Before we call engine->cancel_requests(), we should have exclusive
- access to the submission state. This is arranged for us by the
@@ -553,21 +834,16 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
	 */
	spin_lock_irqsave(&sched_engine->lock, flags);
- /* Mark all executing requests as skipped. */
- list_for_each_entry(rq, &sched_engine->requests, sched.link) {
i915_request_set_error_once(rq, -EIO);
i915_request_mark_complete(rq);
- }
	/* Flush the queued requests to the timeline list (for retiring). */
	while ((rb = rb_first_cached(&sched_engine->queue))) {
		struct i915_priolist *p = to_priolist(rb);

		priolist_for_each_request_consume(rq, rn, p) {
			list_del_init(&rq->sched.link);
			__i915_request_submit(rq);
-			dma_fence_set_error(&rq->fence, -EIO);
-			i915_request_mark_complete(rq);
+			i915_request_put(i915_request_mark_eio(rq));
		}

		rb_erase_cached(&p->node, &sched_engine->queue);
@@ -582,14 +858,38 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
	spin_unlock_irqrestore(&sched_engine->lock, flags);
}

-static void guc_reset_finish(struct intel_engine_cs *engine)
+void intel_guc_submission_cancel_requests(struct intel_guc *guc)
 {
- if (__tasklet_enable(&engine->sched_engine->tasklet))
/* And kick in case we missed a new request submission. */
tasklet_hi_schedule(&engine->sched_engine->tasklet);
- struct intel_context *ce;
- unsigned long index;
- xa_for_each(&guc->context_lookup, index, ce)
if (intel_context_is_pinned(ce))
guc_cancel_context_requests(ce);
- ENGINE_TRACE(engine, "depth->%d\n",
atomic_read(&engine->sched_engine->tasklet.count));
- guc_cancel_sched_engine_requests(guc->sched_engine);
- /* GuC is blown away, drop all references to contexts */
- xa_destroy(&guc->context_lookup);
+}
+void intel_guc_submission_reset_finish(struct intel_guc *guc)
+{
- /* Reset called during driver load or during wedge? */
- if (unlikely(!guc_submission_initialized(guc) ||
test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags)))
return;
- /*
* Technically possible for either of these values to be non-zero here,
* but very unlikely + harmless. Regardless let's add a warn so we can
* see in CI if this happens frequently / a precursor to taking down the
* machine.
*/
- GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
- atomic_set(&guc->outstanding_submission_g2h, 0);
	enable_submission(guc);
}

/*
@@ -656,6 +956,9 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
	else
		trace_i915_request_guc_submit(rq);

+	if (unlikely(ret == -EIO))
+		disable_submission(guc);
+
	return ret;
}
@@ -668,7 +971,8 @@ static void guc_submit_request(struct i915_request *rq)
	/* Will be called from irq-context when using foreign fences. */
	spin_lock_irqsave(&sched_engine->lock, flags);

-	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
+	if (submission_disabled(guc) || guc->stalled_request ||
+	    !i915_sched_engine_is_empty(sched_engine))
		queue_request(sched_engine, rq, rq_prio(rq));
	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
		tasklet_hi_schedule(&sched_engine->tasklet);
@@ -805,7 +1109,8 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
 static int __guc_action_register_context(struct intel_guc *guc,
					 u32 guc_id,
-					 u32 offset)
+					 u32 offset,
+					 bool loop)
 {
	u32 action[] = {
		INTEL_GUC_ACTION_REGISTER_CONTEXT,
@@ -813,10 +1118,10 @@ static int __guc_action_register_context(struct intel_guc *guc,
		offset,
	};

-	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, loop);
}
-static int register_context(struct intel_context *ce)
+static int register_context(struct intel_context *ce, bool loop)
 {
	struct intel_guc *guc = ce_to_guc(ce);
	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
@@ -824,11 +1129,12 @@ static int register_context(struct intel_context *ce)
	trace_intel_context_register(ce);

-	return __guc_action_register_context(guc, ce->guc_id, offset);
+	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
}

 static int __guc_action_deregister_context(struct intel_guc *guc,
-					   u32 guc_id)
+					   u32 guc_id,
+					   bool loop)
 {
	u32 action[] = {
		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
@@ -836,16 +1142,16 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
	};

	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
-					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
+					G2H_LEN_DW_DEREGISTER_CONTEXT, loop);
}
-static int deregister_context(struct intel_context *ce, u32 guc_id)
+static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
 {
	struct intel_guc *guc = ce_to_guc(ce);

	trace_intel_context_deregister(ce);

-	return __guc_action_deregister_context(guc, guc_id);
+	return __guc_action_deregister_context(guc, guc_id, loop);
}

 static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
@@ -874,7 +1180,7 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
}

-static int guc_lrc_desc_pin(struct intel_context *ce)
+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 {
	struct intel_runtime_pm *runtime_pm =
		&ce->engine->gt->i915->runtime_pm;
@@ -920,18 +1226,44 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
	 */
	if (context_registered) {
		trace_intel_context_steal_guc_id(ce);
set_context_wait_for_deregister_to_register(ce);
intel_context_get(ce);
if (!loop) {
set_context_wait_for_deregister_to_register(ce);
intel_context_get(ce);
} else {
bool disabled;
unsigned long flags;
/* Seal race with Reset */
spin_lock_irqsave(&ce->guc_state.lock, flags);
disabled = submission_disabled(guc);
if (likely(!disabled)) {
set_context_wait_for_deregister_to_register(ce);
intel_context_get(ce);
}
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
if (unlikely(disabled)) {
reset_lrc_desc(guc, desc_idx);
return 0; /* Will get registered later */
}
		}

		/*
		 * If stealing the guc_id, this ce has the same guc_id as the
		 * context whose guc_id was stolen.
		 */
		with_intel_runtime_pm(runtime_pm, wakeref)
-			ret = deregister_context(ce, ce->guc_id);
+			ret = deregister_context(ce, ce->guc_id, loop);
		if (unlikely(ret == -EBUSY)) {
			clr_context_wait_for_deregister_to_register(ce);
			intel_context_put(ce);
		}
	} else {
		with_intel_runtime_pm(runtime_pm, wakeref)
-			ret = register_context(ce);
+			ret = register_context(ce, loop);
		if (unlikely(ret == -EBUSY))
			reset_lrc_desc(guc, desc_idx);
		else if (unlikely(ret == -ENODEV))
			ret = 0; /* Will get registered later */
	}

	return ret;
@@ -994,7 +1326,6 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);

	trace_intel_context_sched_disable(ce);
-	intel_context_get(ce);

	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
@@ -1004,6 +1335,7 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
 {
	set_context_pending_disable(ce);
	clr_context_enabled(ce);
+	intel_context_get(ce);

	return ce->guc_id;
}
@@ -1016,7 +1348,7 @@ static void guc_context_sched_disable(struct intel_context *ce) u16 guc_id; intel_wakeref_t wakeref;
-	if (context_guc_id_invalid(ce) ||
+	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
	    !lrc_desc_registered(guc, ce->guc_id)) {
		clr_context_enabled(ce);
		goto unpin;
@@ -1034,6 +1366,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
	 * request doesn't slip through the 'context_pending_disable' fence.
	 */
	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
		return;
	}

	guc_id = prep_context_pending_disable(ce);
@@ -1050,19 +1383,13 @@ static void guc_context_sched_disable(struct intel_context *ce)

 static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 {
-	struct intel_engine_cs *engine = ce->engine;
-	struct intel_guc *guc = &engine->gt->uc.guc;
-	unsigned long flags;
+	struct intel_guc *guc = ce_to_guc(ce);

	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
	GEM_BUG_ON(context_enabled(ce));

-	spin_lock_irqsave(&ce->guc_state.lock, flags);
-	set_context_destroyed(ce);
-	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
-
-	deregister_context(ce, ce->guc_id);
+	deregister_context(ce, ce->guc_id, true);
}

 static void __guc_context_destroy(struct intel_context *ce)
@@ -1090,13 +1417,15 @@ static void guc_context_destroy(struct kref *kref)
	struct intel_guc *guc = &ce->engine->gt->uc.guc;
	intel_wakeref_t wakeref;
	unsigned long flags;
+	bool disabled;

	/*
	 * If the guc_id is invalid this context has been stolen and we can free
	 * it immediately. Also can be freed immediately if the context is not
	 * registered with the GuC.
	 */
-	if (context_guc_id_invalid(ce) ||
+	if (submission_disabled(guc) ||
+	    context_guc_id_invalid(ce) ||
	    !lrc_desc_registered(guc, ce->guc_id)) {
		release_guc_id(guc, ce);
		__guc_context_destroy(ce);
@@ -1123,6 +1452,18 @@ static void guc_context_destroy(struct kref *kref) list_del_init(&ce->guc_id_link); spin_unlock_irqrestore(&guc->contexts_lock, flags);
- /* Seal race with Reset */
- spin_lock_irqsave(&ce->guc_state.lock, flags);
- disabled = submission_disabled(guc);
- if (likely(!disabled))
set_context_destroyed(ce);
- spin_unlock_irqrestore(&ce->guc_state.lock, flags);
- if (unlikely(disabled)) {
release_guc_id(guc, ce);
__guc_context_destroy(ce);
return;
- }
- /*
- We defer GuC context deregistration until the context is destroyed
- in order to save on CTBs. With this optimization ideally we only need
@@ -1145,6 +1486,33 @@ static int guc_context_alloc(struct intel_context *ce)
	return lrc_alloc(ce, ce->engine);
}

+static void add_to_context(struct i915_request *rq)
+{
- struct intel_context *ce = rq->context;
- spin_lock(&ce->guc_active.lock);
- list_move_tail(&rq->sched.link, &ce->guc_active.requests);
- spin_unlock(&ce->guc_active.lock);
+}
+static void remove_from_context(struct i915_request *rq)
+{
- struct intel_context *ce = rq->context;
- spin_lock_irq(&ce->guc_active.lock);
- list_del_init(&rq->sched.link);
- clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
- /* Prevent further __await_execution() registering a cb, then flush */
- set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
- spin_unlock_irq(&ce->guc_active.lock);
- atomic_dec(&ce->guc_id_ref);
- i915_request_notify_execute_cb_imm(rq);
+}
 static const struct intel_context_ops guc_context_ops = {
	.alloc = guc_context_alloc,
@@ -1183,8 +1551,6 @@ static void guc_signal_context_fence(struct intel_context *ce)
 {
	unsigned long flags;

-	GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
-
	spin_lock_irqsave(&ce->guc_state.lock, flags);
	clr_context_wait_for_deregister_to_register(ce);
	__guc_signal_context_fence(ce);
@@ -1193,8 +1559,9 @@ static void guc_signal_context_fence(struct intel_context *ce)

 static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
 {
-	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
-	       !lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
+	return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
+		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
+	       !submission_disabled(ce_to_guc(ce));
}

 static int guc_request_alloc(struct i915_request *rq)
@@ -1252,8 +1619,12 @@ static int guc_request_alloc(struct i915_request *rq)
	if (unlikely(ret < 0))
		return ret;
	if (context_needs_register(ce, !!ret)) {
-		ret = guc_lrc_desc_pin(ce);
+		ret = guc_lrc_desc_pin(ce, true);
		if (unlikely(ret)) {	/* unwind */
+			if (ret == -EIO) {
+				disable_submission(guc);
+				goto out;	/* GPU will be reset */
+			}
			atomic_dec(&ce->guc_id_ref);
			unpin_guc_id(guc, ce);
			return ret;
@@ -1290,20 +1661,6 @@ static int guc_request_alloc(struct i915_request *rq)
	return 0;
}

-static struct intel_engine_cs *
-guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
-{
- struct intel_engine_cs *engine;
- intel_engine_mask_t tmp, mask = ve->mask;
- unsigned int num_siblings = 0;
- for_each_engine_masked(engine, ve->gt, mask, tmp)
if (num_siblings++ == sibling)
return engine;
- return NULL;
-}
- static int guc_virtual_context_pre_pin(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr)
@@ -1512,7 +1869,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
 {
	if (context_guc_id_invalid(ce))
		pin_guc_id(guc, ce);
-	guc_lrc_desc_pin(ce);
+	guc_lrc_desc_pin(ce, true);
}

 static inline void guc_init_lrc_mapping(struct intel_guc *guc)
@@ -1578,13 +1935,15 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
	engine->cops = &guc_context_ops;
	engine->request_alloc = guc_request_alloc;
	engine->bump_serial = guc_bump_serial;
+	engine->add_active_request = add_to_context;
+	engine->remove_active_request = remove_from_context;

	engine->sched_engine->schedule = i915_schedule;
- engine->reset.prepare = guc_reset_prepare;
- engine->reset.rewind = guc_reset_rewind;
- engine->reset.cancel = guc_reset_cancel;
- engine->reset.finish = guc_reset_finish;
- engine->reset.prepare = guc_reset_nop;
- engine->reset.rewind = guc_rewind_nop;
- engine->reset.cancel = guc_reset_nop;
+	engine->reset.finish = guc_reset_nop;

	engine->emit_flush = gen8_emit_flush_xcs;
	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
@@ -1757,7 +2116,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
		 * register this context.
		 */
		with_intel_runtime_pm(runtime_pm, wakeref)
-			register_context(ce);
+			register_context(ce, true);
		guc_signal_context_fence(ce);
		intel_context_put(ce);
	} else if (context_destroyed(ce)) {
@@ -1939,6 +2298,10 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
			     "v%dx%d", ve->base.class, count);
		ve->base.context_size = sibling->context_size;

+		ve->base.add_active_request =
+			sibling->add_active_request;
+		ve->base.remove_active_request =
+			sibling->remove_active_request;
		ve->base.emit_bb_start = sibling->emit_bb_start;
		ve->base.emit_flush = sibling->emit_flush;
		ve->base.emit_init_breadcrumb =
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 6d8b9233214e..f0b02200aa01 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -565,12 +565,49 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
 {
	struct intel_guc *guc = &uc->guc;
-	if (!intel_guc_is_ready(guc))
+	/* Nothing to do if GuC isn't supported */
+	if (!intel_uc_supports_guc(uc))
		return;
- /* Firmware expected to be running when this function is called */
- if (!intel_guc_is_ready(guc))
goto sanitize;
- if (intel_uc_uses_guc_submission(uc))
intel_guc_submission_reset_prepare(guc);
+sanitize:
	__uc_sanitize(uc);
}

+void intel_uc_reset(struct intel_uc *uc, bool stalled)
+{
- struct intel_guc *guc = &uc->guc;
- /* Firmware can not be running when this function is called */
- if (intel_uc_uses_guc_submission(uc))
intel_guc_submission_reset(guc, stalled);
+}
+void intel_uc_reset_finish(struct intel_uc *uc)
+{
- struct intel_guc *guc = &uc->guc;
- /* Firmware expected to be running when this function is called */
- if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
intel_guc_submission_reset_finish(guc);
+}
+void intel_uc_cancel_requests(struct intel_uc *uc)
+{
- struct intel_guc *guc = &uc->guc;
- /* Firmware can not be running when this function is called */
- if (intel_uc_uses_guc_submission(uc))
intel_guc_submission_cancel_requests(guc);
+}
 void intel_uc_runtime_suspend(struct intel_uc *uc)
 {
	struct intel_guc *guc = &uc->guc;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index c4cef885e984..eaa3202192ac 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -37,6 +37,9 @@ void intel_uc_driver_late_release(struct intel_uc *uc);
 void intel_uc_driver_remove(struct intel_uc *uc);
 void intel_uc_init_mmio(struct intel_uc *uc);
 void intel_uc_reset_prepare(struct intel_uc *uc);
+void intel_uc_reset(struct intel_uc *uc, bool stalled);
+void intel_uc_reset_finish(struct intel_uc *uc);
+void intel_uc_cancel_requests(struct intel_uc *uc);
 void intel_uc_suspend(struct intel_uc *uc);
 void intel_uc_runtime_suspend(struct intel_uc *uc);
 int intel_uc_resume(struct intel_uc *uc);

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index dec5a35c9aa2..192784875a1d 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -194,7 +194,7 @@ static bool irq_work_imm(struct irq_work *wrk)
	return false;
}

-static void __notify_execute_cb_imm(struct i915_request *rq)
+void i915_request_notify_execute_cb_imm(struct i915_request *rq)
 {
	__notify_execute_cb(rq, irq_work_imm);
}
@@ -268,37 +268,6 @@ i915_request_active_engine(struct i915_request *rq,
	return ret;
}
-static void remove_from_engine(struct i915_request *rq)
-{
- struct intel_engine_cs *engine, *locked;
- /*
* Virtual engines complicate acquiring the engine timeline lock,
* as their rq->engine pointer is not stable until under that
* engine lock. The simple ploy we use is to take the lock then
* check that the rq still belongs to the newly locked engine.
*/
- locked = READ_ONCE(rq->engine);
- spin_lock_irq(&locked->sched_engine->lock);
- while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
spin_unlock(&locked->sched_engine->lock);
spin_lock(&engine->sched_engine->lock);
locked = engine;
- }
- list_del_init(&rq->sched.link);
- clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
- clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
- /* Prevent further __await_execution() registering a cb, then flush */
- set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
- spin_unlock_irq(&locked->sched_engine->lock);
- __notify_execute_cb_imm(rq);
-}
 static void __rq_init_watchdog(struct i915_request *rq)
 {
	rq->watchdog.timer.function = NULL;
@@ -395,9 +364,7 @@ bool i915_request_retire(struct i915_request *rq)
	 * after removing the breadcrumb and signaling it, so that we do not
	 * inadvertently attach the breadcrumb to a completed request.
	 */
-	if (!list_empty(&rq->sched.link))
-		remove_from_engine(rq);
-	atomic_dec(&rq->context->guc_id_ref);
+	rq->engine->remove_active_request(rq);

	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
@@ -539,7 +506,7 @@ __await_execution(struct i915_request *rq,
	if (llist_add(&cb->work.node.llist, &signal->execute_cb)) {
		if (i915_request_is_active(signal) ||
		    __request_in_flight(signal))
-			__notify_execute_cb_imm(signal);
+			i915_request_notify_execute_cb_imm(signal);
	}

	return 0;
@@ -676,7 +643,7 @@ bool __i915_request_submit(struct i915_request *request)
	result = true;

	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
-	list_move_tail(&request->sched.link, &engine->sched_engine->requests);
+	engine->add_active_request(request);
 active:
	clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags);
	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index f870cd75a001..bcc6340c505e 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -649,4 +649,6 @@ bool i915_request_active_engine(struct i915_request *rq,
			       struct intel_engine_cs **active);

+void i915_request_notify_execute_cb_imm(struct i915_request *rq);
+
 #endif /* I915_REQUEST_H */
On 26/07/2021 23:48, Matthew Brost wrote:
On Thu, Jul 15, 2021 at 10:36:51AM +0100, Tvrtko Ursulin wrote:
On 24/06/2021 08:05, Matthew Brost wrote:
Reset implementation for new GuC interface. This is the legacy reset implementation which is called when the i915 owns the engine hang check. Future patches will offload the engine hang check to GuC but we will continue to maintain this legacy path as a fallback and this code path is also required if the GuC dies.
With the new GuC interface it is not possible to reset individual engines - it is only possible to reset the GPU entirely. This patch forces an entire chip reset if any engine hangs.
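The escalation policy described in the commit message can be sketched in a few lines of standalone C (illustrative only — hypothetical names and mask, not the driver code): any hung-engine mask gets widened to a full-chip reset when GuC owns submission, since per-engine reset is not available through this legacy path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical all-engines mask for illustration. */
#define ALL_ENGINES_MASK 0xffu

/*
 * With GuC submission, a hang on any engine must escalate to a
 * full-chip reset; without it (execlists), only the hung engines
 * are reset.
 */
static uint32_t resolve_reset_mask(uint32_t hung_engine_mask,
				   bool guc_submission)
{
	if (!hung_engine_mask)
		return 0;		/* nothing hung, nothing to reset */
	if (guc_submission)
		return ALL_ENGINES_MASK; /* escalate to full GPU reset */
	return hung_engine_mask;	/* reset only the hung engines */
}
```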
No updates after my review comments on 6th of May.
At least:
- wmb documentation
Yea, missed this. Checkpatch yelled at me too. Will be fixed in next rev.
- Spin lock cycling I either didn't understand or didn't buy the
explanation. I don't remember seeing that pattern elsewhere in the driver - cycle a spinlock to make sure what was updated inside it is visible you said?
I did respond - not really my fault if you don't understand a fairly simple concept but I'll explain again.
1. Change a variable.
2. Cycle a lock.

At this point we know that, for anyone who acquires the above lock, the variable change is visible.
I can't be the first person in the Linux kernel to do this nor in the i915.
Don't know, did not do an exhaustive search. I can understand it being used to make sure any lock taking sections would exit, if they happened to be running simultaneously to the lock cycling code, but you seem to be describing it being used as a memory barrier.
So either a code comment or just use a memory barrier is my ask. There is a requirement to comment memory barriers anyway so if this is effectively one of them it's pretty clear cut.
This basically allows to seal all the reset races without a BKL.
Also I told you I explain in this a doc patch that will get reposted after GuC submission lands: https://patchwork.freedesktop.org/patch/432408/?series=89844&rev=1
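The "cycle a lock" pattern under discussion can be sketched outside the driver with pthreads (a minimal illustration under assumed names — `disabled`, `state_lock` — not i915 code): the writer updates the variable, then takes and immediately releases the lock; lock acquire/release provide the ordering, so any reader that subsequently acquires the same lock observes the update.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static bool disabled;

static void writer_disable(void)
{
	disabled = true;		 /* 1. change the variable */
	pthread_mutex_lock(&state_lock); /* 2. cycle the lock */
	pthread_mutex_unlock(&state_lock);
}

static bool reader_sees_disabled(void)
{
	bool ret;

	/* Acquiring the lock orders this read after the writer's cycle. */
	pthread_mutex_lock(&state_lock);
	ret = disabled;
	pthread_mutex_unlock(&state_lock);
	return ret;
}
```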
- Dropping the lock protecting the list in the middle of
list_for_each_entry_safe and just continuing to iterate like nothing happened. (__unwind_incomplete_requests) Again, perhaps I did not understand your explanation properly but you did appear to write:
To be honest, looking at the code now we likely don't need to drop the lock, but regardless I don't think we should change this, for the following reasons.
Then don't?
- I assure you this is safe and works. I can add a better comment
explaining this though.
Yes please for a comment. Assurances are all good until a new bug is found.
- This is thoroughly tested and resets are the hardest thing to get
stable and working.
Well new bugs are found even after statements of things being well tested so I'd err on the side of caution. And I don't mean your code here but as a general principle.
- This code is literally going to get deleted when we move to the DRM
scheduler as all the tracking / unwinding / resubmission will be in the DRM scheduler core.
Yeah, but if that cannot be guaranteed to happen in the same kernel release then lets not put dodgy code in.
- A 2 second search of the driver found that we do the same thing in
intel_gt_retire_requests_timeout so this isn't unprecedented.
The code there is bit different. It uses list_safe_reset_next after re-acquiring the lock and only then unlinks the current element from the list.
It all boils down to whether something can modify the list in parallel in your case. If it can't, just don't take the lock but instead put a comment saying why the lock does not need to be taken would be my suggestion. That way you avoid having to explain why the iteration is not broken.
Regards,
Tvrtko
Matt
""" We only need the active lock for ce->guc_active.requests list. It is indeed safe to drop the lock. """
- spin_lock(&ce->guc_active.lock);
- list_for_each_entry_safe(rq, rn,
&ce->guc_active.requests,
sched.link) {
if (i915_request_completed(rq))
continue;
list_del_init(&rq->sched.link);
spin_unlock(&ce->guc_active.lock);
...
spin_lock(&ce->guc_active.lock);
- }
Safe iterator guards against list_del but dropping the lock means the state of the overall list can change so next pointer may or may not be valid, requests may be missed, I don't know. Needs a comment explaining why it is safe.
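For reference, the iteration pattern being debated can be sketched generically (hypothetical `struct req` list, not the i915 code): a "safe" iterator saves the next pointer before the body runs, and dropping the lock mid-walk is sound only under the extra assumption that no other path mutates the list in that window — which is exactly the invariant the comment is being asked to state.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical singly-linked request list for illustration. */
struct req {
	struct req *next;
	int done;
	int unsubmitted;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

static void unwind_requests(struct req *head)
{
	struct req *rq, *rn;

	pthread_mutex_lock(&list_lock);
	for (rq = head; rq; rq = rn) {
		rn = rq->next;	/* "safe" iterator: save next first */
		if (rq->done)
			continue;
		/*
		 * Drop the lock to call back out. Safe only because,
		 * by assumption, nothing else adds or removes entries
		 * while submission is disabled, so `rn` stays valid.
		 */
		pthread_mutex_unlock(&list_lock);
		rq->unsubmitted = 1; /* stands in for __i915_request_unsubmit() */
		pthread_mutex_lock(&list_lock);
	}
	pthread_mutex_unlock(&list_lock);
}
```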
Regards,
Tvrtko
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
 drivers/gpu/drm/i915/gt/intel_context.c       |   3 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   6 +
 .../drm/i915/gt/intel_execlists_submission.c  |  40 ++
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   6 +-
 drivers/gpu/drm/i915/gt/intel_reset.c         |  18 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  22 +
 drivers/gpu/drm/i915/gt/mock_engine.c         |  31 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  13 -
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   8 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 581 ++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  39 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   3 +
 drivers/gpu/drm/i915/i915_request.c           |  41 +-
 drivers/gpu/drm/i915/i915_request.h           |   2 +
 15 files changed, 649 insertions(+), 171 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index b24a1b7a3f88..2f01437056a8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -392,6 +392,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	spin_lock_init(&ce->guc_state.lock);
 	INIT_LIST_HEAD(&ce->guc_state.fences);
 
+	spin_lock_init(&ce->guc_active.lock);
+	INIT_LIST_HEAD(&ce->guc_active.requests);
+
 	ce->guc_id = GUC_INVALID_LRC_ID;
 	INIT_LIST_HEAD(&ce->guc_id_link);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 6945963a31ba..b63c8cf7823b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -165,6 +165,13 @@ struct intel_context {
 		struct list_head fences;
 	} guc_state;
 
+	struct {
+		/** lock: protects everything in guc_active */
+		spinlock_t lock;
+		/** requests: active requests on this context */
+		struct list_head requests;
+	} guc_active;
+
 	/* GuC scheduling state that does not require a lock. */
 	atomic_t guc_sched_state_no_lock;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index e7cb6a06db9d..f9d264c008e8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -426,6 +426,12 @@ struct intel_engine_cs {
 	void		(*release)(struct intel_engine_cs *engine);
 
+	/*
+	 * Add / remove request from engine active tracking
+	 */
+	void		(*add_active_request)(struct i915_request *rq);
+	void		(*remove_active_request)(struct i915_request *rq);
+
 	struct intel_engine_execlists execlists;
 
 	/*
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index c10ea6080752..c301a2d088b1 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3118,6 +3118,42 @@ static void execlists_park(struct intel_engine_cs *engine)
 	cancel_timer(&engine->execlists.preempt);
 }
 
+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&locked->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static bool can_preempt(struct intel_engine_cs *engine)
 {
 	if (GRAPHICS_VER(engine->i915) > 8)
@@ -3218,6 +3254,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->cops = &execlists_context_ops;
 	engine->request_alloc = execlists_request_alloc;
 	engine->bump_serial = execlist_bump_serial;
+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;
 
 	engine->reset.prepare = execlists_reset_prepare;
 	engine->reset.rewind = execlists_reset_rewind;
@@ -3912,6 +3950,8 @@ execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 		 "v%dx%d", ve->base.class, count);
 	ve->base.context_size = sibling->context_size;
 
+	ve->base.add_active_request = sibling->add_active_request;
+	ve->base.remove_active_request = sibling->remove_active_request;
 	ve->base.emit_bb_start = sibling->emit_bb_start;
 	ve->base.emit_flush = sibling->emit_flush;
 	ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index aef3084e8b16..463a6ae605a0 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 	if (intel_gt_is_wedged(gt))
 		intel_gt_unset_wedged(gt);
 
-	intel_uc_sanitize(&gt->uc);
-
 	for_each_engine(engine, gt, id)
 		if (engine->reset.prepare)
 			engine->reset.prepare(engine);
@@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 			__intel_engine_reset(engine, false);
 	}
 
+	intel_uc_reset(&gt->uc, false);
+
 	for_each_engine(engine, gt, id)
 		if (engine->reset.finish)
 			engine->reset.finish(engine);
@@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
 			goto err_wedged;
 	}
 
+	intel_uc_reset_finish(&gt->uc);
+
 	intel_rps_enable(&gt->rps);
 	intel_llc_enable(&gt->llc);
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 72251638d4ea..2987282dff6d 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
 		__intel_engine_reset(engine, stalled_mask & engine->mask);
 	local_bh_enable();
 
+	intel_uc_reset(&gt->uc, true);
+
 	intel_ggtt_restore_fences(gt->ggtt);
 
 	return err;
@@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
 		if (awake & engine->mask)
 			intel_engine_pm_put(engine);
 	}
+
+	intel_uc_reset_finish(&gt->uc);
 }
 
 static void nop_submit_request(struct i915_request *request)
@@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
 	for_each_engine(engine, gt, id)
 		if (engine->reset.cancel)
 			engine->reset.cancel(engine);
+	intel_uc_cancel_requests(&gt->uc);
 	local_bh_enable();
 
 	reset_finish(gt, awake);
@@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
 	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
 
+	if (intel_engine_uses_guc(engine))
+		return -ENODEV;
+
 	if (!intel_engine_pm_get_if_awake(engine))
 		return 0;
 
@@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 			   "Resetting %s for %s\n", engine->name, msg);
 	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
 
-	if (intel_engine_uses_guc(engine))
-		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
-	else
-		ret = intel_gt_reset_engine(engine);
+	ret = intel_gt_reset_engine(engine);
 	if (ret) {
 		/* If we fail here, we expect to fallback to a global reset */
-		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
+		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
 		goto out;
 	}
 
@@ -1341,7 +1346,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
+	if (!intel_uc_uses_guc_submission(&gt->uc) &&
+	    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
 		local_bh_disable();
 		for_each_engine_masked(engine, gt, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index e1506b280df1..99dcdc8fba12 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -1049,6 +1049,25 @@ static void ring_bump_serial(struct intel_engine_cs *engine)
 	engine->serial++;
 }
 
+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	spin_lock_irq(&rq->engine->sched_engine->lock);
+	list_del_init(&rq->sched.link);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&rq->engine->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static void setup_common(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
@@ -1066,6 +1085,9 @@ static void setup_common(struct intel_engine_cs *engine)
 	engine->reset.cancel = reset_cancel;
 	engine->reset.finish = reset_finish;
 
+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;
+
 	engine->cops = &ring_context_ops;
 	engine->request_alloc = ring_request_alloc;
 	engine->bump_serial = ring_bump_serial;
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index fc5a65ab1937..c12ff3a75ce6 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -235,6 +235,35 @@ static void mock_submit_request(struct i915_request *request)
 	spin_unlock_irqrestore(&engine->hw_lock, flags);
 }
 
+static void mock_add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void mock_remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+	spin_unlock_irq(&locked->sched_engine->lock);
+}
+
 static void mock_reset_prepare(struct intel_engine_cs *engine)
 {
 }
@@ -327,6 +356,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.emit_flush = mock_emit_flush;
 	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
 	engine->base.submit_request = mock_submit_request;
+	engine->base.add_active_request = mock_add_to_engine;
+	engine->base.remove_active_request = mock_remove_from_engine;
 
 	engine->base.reset.prepare = mock_reset_prepare;
 	engine->base.reset.rewind = mock_reset_rewind;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 6661dcb02239..9b09395b998f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -572,19 +572,6 @@ int intel_guc_suspend(struct intel_guc *guc)
 	return 0;
 }
 
-/**
- * intel_guc_reset_engine() - ask GuC to reset an engine
- * @guc:	intel_guc structure
- * @engine:	engine to be reset
- */
-int intel_guc_reset_engine(struct intel_guc *guc,
-			   struct intel_engine_cs *engine)
-{
-	/* XXX: to be implemented with submission interface rework */
-	return -ENODEV;
-}
-
 /**
  * intel_guc_resume() - notify GuC resuming from suspend state
  * @guc:	the guc
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 22eb1e9cca41..40c9868762d7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -242,14 +242,16 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
 
 int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
 
-int intel_guc_reset_engine(struct intel_guc *guc,
-			   struct intel_engine_cs *engine);
-
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg, u32 len);
 int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 				     const u32 *msg, u32 len);
 
+void intel_guc_submission_reset_prepare(struct intel_guc *guc);
+void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
+void intel_guc_submission_reset_finish(struct intel_guc *guc);
+void intel_guc_submission_cancel_requests(struct intel_guc *guc);
+
 void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 83058df5ba01..b8c894ad8caf 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -141,7 +141,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce)
 static inline void
 set_context_wait_for_deregister_to_register(struct intel_context *ce)
 {
-	/* Only should be called from guc_lrc_desc_pin() */
+	/* Only should be called from guc_lrc_desc_pin() without lock */
 	ce->guc_state.sched_state |=
 		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
 }
@@ -241,15 +241,31 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
 
 static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
 {
+	guc->lrc_desc_pool_vaddr = NULL;
 	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
 }
 
+static inline bool guc_submission_initialized(struct intel_guc *guc)
+{
+	return guc->lrc_desc_pool_vaddr != NULL;
+}
+
 static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
 {
-	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+	if (likely(guc_submission_initialized(guc))) {
+		struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+		unsigned long flags;
 
-	memset(desc, 0, sizeof(*desc));
-	xa_erase_irq(&guc->context_lookup, id);
+		memset(desc, 0, sizeof(*desc));
+
+		/*
+		 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
+		 * the lower level functions directly.
+		 */
+		xa_lock_irqsave(&guc->context_lookup, flags);
+		__xa_erase(&guc->context_lookup, id);
+		xa_unlock_irqrestore(&guc->context_lookup, flags);
+	}
 }
 
 static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
@@ -260,7 +276,15 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
 static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
 					   struct intel_context *ce)
 {
-	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	unsigned long flags;
+
+	/*
+	 * xarray API doesn't have xa_save_irqsave wrapper, so calling the
+	 * lower level functions directly.
+	 */
+	xa_lock_irqsave(&guc->context_lookup, flags);
+	__xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	xa_unlock_irqrestore(&guc->context_lookup, flags);
 }
 
 static int guc_submission_busy_loop(struct intel_guc* guc,
@@ -331,6 +355,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
 					      interruptible, timeout);
 }
 
+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
+
 static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	int err;
@@ -338,11 +364,22 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	u32 action[3];
 	int len = 0;
 	u32 g2h_len_dw = 0;
-	bool enabled = context_enabled(ce);
+	bool enabled;
 
 	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
 	GEM_BUG_ON(context_guc_id_invalid(ce));
 
+	/*
+	 * Corner case where the GuC firmware was blown away and reloaded while
+	 * this context was pinned.
+	 */
+	if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
+		err = guc_lrc_desc_pin(ce, false);
+		if (unlikely(err))
+			goto out;
+	}
+	enabled = context_enabled(ce);
+
 	if (!enabled) {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
 		action[len++] = ce->guc_id;
@@ -365,6 +402,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		intel_context_put(ce);
 	}
 
+out:
 	return err;
 }
 
@@ -419,15 +457,10 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 	if (submit) {
 		guc_set_lrc_tail(last);
 resubmit:
-		/*
-		 * We only check for -EBUSY here even though it is possible for
-		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
-		 * died and a full GPU needs to be done. The hangcheck will
-		 * eventually detect that the GuC has died and trigger this
-		 * reset so no need to handle -EDEADLK here.
-		 */
 		ret = guc_add_request(guc, last);
-		if (ret == -EBUSY) {
+		if (unlikely(ret == -EIO))
+			goto deadlk;
+		else if (ret == -EBUSY) {
 			tasklet_schedule(&sched_engine->tasklet);
 			guc->stalled_request = last;
 			return false;
@@ -437,6 +470,11 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 	guc->stalled_request = NULL;
 	return submit;
+
+deadlk:
+	sched_engine->tasklet.callback = NULL;
+	tasklet_disable_nosync(&sched_engine->tasklet);
+	return false;
 }
 
 static void guc_submission_tasklet(struct tasklet_struct *t)
@@ -463,27 +501,165 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
 		intel_engine_signal_breadcrumbs(engine);
 }
 
-static void guc_reset_prepare(struct intel_engine_cs *engine)
-{
-	ENGINE_TRACE(engine, "\n");
-
-	/*
-	 * Prevent request submission to the hardware until we have
-	 * completed the reset in i915_gem_reset_finish(). If a request
-	 * is completed by one engine, it may then queue a request
-	 * to a second via its execlists->tasklet *just* as we are
-	 * calling engine->init_hw() and also writing the ELSP.
-	 * Turning off the execlists->tasklet until the reset is over
-	 * prevents the race.
-	 */
-	__tasklet_disable_sync_once(&engine->sched_engine->tasklet);
+static void __guc_context_destroy(struct intel_context *ce);
+static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
+static void guc_signal_context_fence(struct intel_context *ce);
+
+static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
+{
+	struct intel_context *ce;
+	unsigned long index, flags;
+	bool pending_disable, pending_enable, deregister, destroyed;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		/* Flush context */
+		spin_lock_irqsave(&ce->guc_state.lock, flags);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		/*
+		 * Once we are at this point submission_disabled() is guaranteed
+		 * to be visible to all callers who set the below flags (see above
+		 * flush and flushes in reset_prepare). If submission_disabled()
+		 * is set, the caller shouldn't set these flags.
+		 */
+
+		destroyed = context_destroyed(ce);
+		pending_enable = context_pending_enable(ce);
+		pending_disable = context_pending_disable(ce);
+		deregister = context_wait_for_deregister_to_register(ce);
+		init_sched_state(ce);
+
+		if (pending_enable || destroyed || deregister) {
+			atomic_dec(&guc->outstanding_submission_g2h);
+			if (deregister)
+				guc_signal_context_fence(ce);
+			if (destroyed) {
+				release_guc_id(guc, ce);
+				__guc_context_destroy(ce);
+			}
+			if (pending_enable || deregister)
+				intel_context_put(ce);
+		}
+
+		/* Not mutually exclusive with above if statement. */
+		if (pending_disable) {
+			guc_signal_context_fence(ce);
+			intel_context_sched_disable_unpin(ce);
+			atomic_dec(&guc->outstanding_submission_g2h);
+			intel_context_put(ce);
+		}
+	}
+}
+
+static inline bool
+submission_disabled(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+
+	return unlikely(!__tasklet_is_enabled(&sched_engine->tasklet));
+}
+
+static void disable_submission(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+
+	if (__tasklet_is_enabled(&sched_engine->tasklet)) {
+		GEM_BUG_ON(!guc->ct.enabled);
+		__tasklet_disable_sync_once(&sched_engine->tasklet);
+		sched_engine->tasklet.callback = NULL;
+	}
+}
+
+static void enable_submission(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->sched_engine->lock, flags);
+	sched_engine->tasklet.callback = guc_submission_tasklet;
+	wmb();
+	if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
+	    __tasklet_enable(&sched_engine->tasklet)) {
+		GEM_BUG_ON(!guc->ct.enabled);
+
+		/* And kick in case we missed a new request submission. */
+		tasklet_hi_schedule(&sched_engine->tasklet);
+	}
+	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+}
+
+static void guc_flush_submissions(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	unsigned long flags;
+
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+void intel_guc_submission_reset_prepare(struct intel_guc *guc)
+{
+	int i;
+
+	if (unlikely(!guc_submission_initialized(guc)))
+		/* Reset called during driver load? GuC not yet initialised! */
+		return;
+
+	disable_submission(guc);
+	guc->interrupts.disable(guc);
+
+	/* Flush IRQ handler */
+	spin_lock_irq(&guc_to_gt(guc)->irq_lock);
+	spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
+
+	guc_flush_submissions(guc);
+
+	/*
+	 * Handle any outstanding G2Hs before reset. Call IRQ handler directly
+	 * each pass as interrupts have been disabled. We always scrub for
+	 * outstanding G2H as it is possible for outstanding_submission_g2h to
+	 * be incremented after the context state update.
+	 */
+	for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
+		intel_guc_to_host_event_handler(guc);
+#define wait_for_reset(guc, wait_var) \
+		guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
+		do {
+			wait_for_reset(guc, &guc->outstanding_submission_g2h);
+		} while (!list_empty(&guc->ct.requests.incoming));
+	}
+	scrub_guc_desc_for_outstanding_g2h(guc);
 }
-static void guc_reset_state(struct intel_context *ce,
-			    struct intel_engine_cs *engine,
-			    u32 head,
-			    bool scrub)
+static struct intel_engine_cs *
+guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
+{
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp, mask = ve->mask;
+	unsigned int num_siblings = 0;
+
+	for_each_engine_masked(engine, ve->gt, mask, tmp)
+		if (num_siblings++ == sibling)
+			return engine;
+
+	return NULL;
+}
+
+static inline struct intel_engine_cs *
+__context_to_physical_engine(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = ce->engine;
+
+	if (intel_engine_is_virtual(engine))
+		engine = guc_virtual_get_sibling(engine, 0);
+
+	return engine;
+}
+
+static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
 {
+	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
+
 	GEM_BUG_ON(!intel_context_is_pinned(ce));
 
 	/*
@@ -501,42 +677,147 @@ static void guc_reset_state(struct intel_context *ce,
 	lrc_update_regs(ce, engine, head);
 }
 
-static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
+static void guc_reset_nop(struct intel_engine_cs *engine)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct i915_request *rq;
-	unsigned long flags;
-
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-
-	/* Push back any incomplete requests for replay after the reset. */
-	rq = execlists_unwind_incomplete_requests(execlists);
-	if (!rq)
-		goto out_unlock;
+}
+
+static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled)
+{
+}
+
+static void
+__unwind_incomplete_requests(struct intel_context *ce)
+{
+	struct i915_request *rq, *rn;
+	struct list_head *pl;
+	int prio = I915_PRIORITY_INVALID;
+	struct i915_sched_engine * const sched_engine =
+		ce->engine->sched_engine;
+	unsigned long flags;
+
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_lock(&ce->guc_active.lock);
+	list_for_each_entry_safe(rq, rn,
+				 &ce->guc_active.requests,
+				 sched.link) {
+		if (i915_request_completed(rq))
+			continue;
+
+		list_del_init(&rq->sched.link);
+		spin_unlock(&ce->guc_active.lock);
+
+		__i915_request_unsubmit(rq);
+
+		/* Push the request back into the queue for later resubmission. */
+		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
+		if (rq_prio(rq) != prio) {
+			prio = rq_prio(rq);
+			pl = i915_sched_lookup_priolist(sched_engine, prio);
+		}
+		GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
+
+		list_add_tail(&rq->sched.link, pl);
+		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+
+		spin_lock(&ce->guc_active.lock);
+	}
+	spin_unlock(&ce->guc_active.lock);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+static struct i915_request *context_find_active_request(struct intel_context *ce)
+{
+	struct i915_request *rq, *active = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ce->guc_active.lock, flags);
+	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
+				    sched.link) {
+		if (i915_request_completed(rq))
+			break;
+
+		active = rq;
+	}
+	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+
+	return active;
+}
+
+static void __guc_reset_context(struct intel_context *ce, bool stalled)
+{
+	struct i915_request *rq;
+	u32 head;
+
+	/*
+	 * GuC will implicitly mark the context as non-schedulable
+	 * when it sends the reset notification. Make sure our state
+	 * reflects this change. The context will be marked enabled
+	 * on resubmission.
+	 */
+	clr_context_enabled(ce);
+	rq = context_find_active_request(ce);
+	if (!rq) {
+		head = ce->ring->tail;
+		stalled = false;
+		goto out_replay;
+	}
 
 	if (!i915_request_started(rq))
 		stalled = false;
 
+	GEM_BUG_ON(i915_active_is_idle(&ce->active));
+	head = intel_ring_wrap(ce->ring, rq->head);
 	__i915_request_reset(rq, stalled);
-	guc_reset_state(rq->context, engine, rq->head, stalled);
 
-out_unlock:
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+out_replay:
+	guc_reset_state(ce, head, stalled);
+	__unwind_incomplete_requests(ce);
 }
-static void guc_reset_cancel(struct intel_engine_cs *engine)
+void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
+{
+	struct intel_context *ce;
+	unsigned long index;
+
+	if (unlikely(!guc_submission_initialized(guc)))
+		/* Reset called during driver load? GuC not yet initialised! */
+		return;
+
+	xa_for_each(&guc->context_lookup, index, ce)
+		if (intel_context_is_pinned(ce))
+			__guc_reset_context(ce, stalled);
+
+	/* GuC is blown away, drop all references to contexts */
+	xa_destroy(&guc->context_lookup);
+}
+
+static void guc_cancel_context_requests(struct intel_context *ce)
+{
+	struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
+	struct i915_request *rq;
+	unsigned long flags;
+
+	/* Mark all executing requests as skipped. */
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_lock(&ce->guc_active.lock);
+	list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
+		i915_request_put(i915_request_mark_eio(rq));
+	spin_unlock(&ce->guc_active.lock);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+static void
+guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
 {
-	struct i915_sched_engine * const sched_engine = engine->sched_engine;
 	struct i915_request *rq, *rn;
 	struct rb_node *rb;
 	unsigned long flags;
 
 	/* Can be called during boot if GuC fails to load */
-	if (!engine->gt)
+	if (!sched_engine)
 		return;
 
-	ENGINE_TRACE(engine, "\n");
-
 	/*
 	 * Before we call engine->cancel_requests(), we should have exclusive
 	 * access to the submission state. This is arranged for us by the
@@ -553,21 +834,16 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	 */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	/* Mark all executing requests as skipped. */
-	list_for_each_entry(rq, &sched_engine->requests, sched.link) {
-		i915_request_set_error_once(rq, -EIO);
-		i915_request_mark_complete(rq);
-	}
-
 	/* Flush the queued requests to the timeline list (for retiring). */
 	while ((rb = rb_first_cached(&sched_engine->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 
 		priolist_for_each_request_consume(rq, rn, p) {
 			list_del_init(&rq->sched.link);
+
 			__i915_request_submit(rq);
-			dma_fence_set_error(&rq->fence, -EIO);
-			i915_request_mark_complete(rq);
+
+			i915_request_put(i915_request_mark_eio(rq));
 		}
 
 		rb_erase_cached(&p->node, &sched_engine->queue);
@@ -582,14 +858,38 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
-static void guc_reset_finish(struct intel_engine_cs *engine)
+void intel_guc_submission_cancel_requests(struct intel_guc *guc)
 {
-	if (__tasklet_enable(&engine->sched_engine->tasklet))
-		/* And kick in case we missed a new request submission. */
-		tasklet_hi_schedule(&engine->sched_engine->tasklet);
+	struct intel_context *ce;
+	unsigned long index;
+
+	xa_for_each(&guc->context_lookup, index, ce)
+		if (intel_context_is_pinned(ce))
+			guc_cancel_context_requests(ce);
 
-	ENGINE_TRACE(engine, "depth->%d\n",
-		     atomic_read(&engine->sched_engine->tasklet.count));
+	guc_cancel_sched_engine_requests(guc->sched_engine);
+
+	/* GuC is blown away, drop all references to contexts */
+	xa_destroy(&guc->context_lookup);
+}
+
+void intel_guc_submission_reset_finish(struct intel_guc *guc)
+{
+	/* Reset called during driver load or during wedge? */
+	if (unlikely(!guc_submission_initialized(guc) ||
+		     test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags)))
+		return;
+
+	/*
+	 * Technically possible for either of these values to be non-zero here,
+	 * but very unlikely + harmless. Regardless let's add a warn so we can
+	 * see in CI if this happens frequently / a precursor to taking down the
+	 * machine.
+	 */
+	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
+	atomic_set(&guc->outstanding_submission_g2h, 0);
+
+	enable_submission(guc);
 }
 
 /*
@@ -656,6 +956,9 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
 	else
 		trace_i915_request_guc_submit(rq);
 
+	if (unlikely(ret == -EIO))
+		disable_submission(guc);
+
 	return ret;
 }
 
@@ -668,7 +971,8 @@ static void guc_submit_request(struct i915_request *rq)
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
+	if (submission_disabled(guc) || guc->stalled_request ||
+	    !i915_sched_engine_is_empty(sched_engine))
 		queue_request(sched_engine, rq, rq_prio(rq));
 	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
 		tasklet_hi_schedule(&sched_engine->tasklet);
@@ -805,7 +1109,8 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
 
 static int __guc_action_register_context(struct intel_guc *guc,
 					 u32 guc_id,
-					 u32 offset)
+					 u32 offset,
+					 bool loop)
 {
 	u32 action[] = {
 		INTEL_GUC_ACTION_REGISTER_CONTEXT,
@@ -813,10 +1118,10 @@ static int __guc_action_register_context(struct intel_guc *guc,
 		offset,
 	};
 
-	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, loop);
 }
 
-static int register_context(struct intel_context *ce)
+static int register_context(struct intel_context *ce, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
@@ -824,11 +1129,12 @@ static int register_context(struct intel_context *ce)
 
 	trace_intel_context_register(ce);
 
-	return __guc_action_register_context(guc, ce->guc_id, offset);
+	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
 }
 
 static int __guc_action_deregister_context(struct intel_guc *guc,
-					   u32 guc_id)
+					   u32 guc_id,
+					   bool loop)
 {
 	u32 action[] = {
 		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
@@ -836,16 +1142,16 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
 	};
 
 	return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
-					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
+					G2H_LEN_DW_DEREGISTER_CONTEXT, loop);
 }
 
-static int deregister_context(struct intel_context *ce, u32 guc_id)
+static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 
 	trace_intel_context_deregister(ce);
 
-	return __guc_action_deregister_context(guc, guc_id);
+	return __guc_action_deregister_context(guc, guc_id, loop);
 }
 
 static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
@@ -874,7 +1180,7 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
 }
 
-static int guc_lrc_desc_pin(struct intel_context *ce)
+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 {
 	struct intel_runtime_pm *runtime_pm =
 		&ce->engine->gt->i915->runtime_pm;
@@ -920,18 +1226,44 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
 	 */
 	if (context_registered) {
 		trace_intel_context_steal_guc_id(ce);
-		set_context_wait_for_deregister_to_register(ce);
-		intel_context_get(ce);
+		if (!loop) {
+			set_context_wait_for_deregister_to_register(ce);
+			intel_context_get(ce);
+		} else {
+			bool disabled;
+			unsigned long flags;
+
+			/* Seal race with Reset */
+			spin_lock_irqsave(&ce->guc_state.lock, flags);
+			disabled = submission_disabled(guc);
+			if (likely(!disabled)) {
+				set_context_wait_for_deregister_to_register(ce);
+				intel_context_get(ce);
+			}
+			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+			if (unlikely(disabled)) {
+				reset_lrc_desc(guc, desc_idx);
+				return 0;	/* Will get registered later */
+			}
+		}
 
 		/*
 		 * If stealing the guc_id, this ce has the same guc_id as the
 		 * context whos guc_id was stole.
 		 */
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			ret = deregister_context(ce, ce->guc_id);
+			ret = deregister_context(ce, ce->guc_id, loop);
+		if (unlikely(ret == -EBUSY)) {
+			clr_context_wait_for_deregister_to_register(ce);
+			intel_context_put(ce);
+		}
 	} else {
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			ret = register_context(ce);
+			ret = register_context(ce, loop);
+		if (unlikely(ret == -EBUSY))
+			reset_lrc_desc(guc, desc_idx);
+		else if (unlikely(ret == -ENODEV))
+			ret = 0;	/* Will get registered later */
 	}
 
 	return ret;
@@ -994,7 +1326,6 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
 
 	trace_intel_context_sched_disable(ce);
-	intel_context_get(ce);
 
 	guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
 				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
@@ -1004,6 +1335,7 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
 {
 	set_context_pending_disable(ce);
 	clr_context_enabled(ce);
+	intel_context_get(ce);
 
 	return ce->guc_id;
 }
@@ -1016,7 +1348,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	u16 guc_id;
 	intel_wakeref_t wakeref;
 
-	if (context_guc_id_invalid(ce) ||
+	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		clr_context_enabled(ce);
 		goto unpin;
@@ -1034,6 +1366,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	 * request doesn't slip through the 'context_pending_disable' fence.
 	 */
 	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 		return;
 	}
 	guc_id = prep_context_pending_disable(ce);
@@ -1050,19 +1383,13 @@ static void guc_context_sched_disable(struct intel_context *ce)
 
 static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 {
-	struct intel_engine_cs *engine = ce->engine;
-	struct intel_guc *guc = &engine->gt->uc.guc;
-	unsigned long flags;
+	struct intel_guc *guc = ce_to_guc(ce);
 
 	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
 	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
 	GEM_BUG_ON(context_enabled(ce));
 
-	spin_lock_irqsave(&ce->guc_state.lock, flags);
-	set_context_destroyed(ce);
-	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
-
-	deregister_context(ce, ce->guc_id);
+	deregister_context(ce, ce->guc_id, true);
 }
 
 static void __guc_context_destroy(struct intel_context *ce)
@@ -1090,13 +1417,15 @@ static void guc_context_destroy(struct kref *kref)
 	struct intel_guc *guc = &ce->engine->gt->uc.guc;
 	intel_wakeref_t wakeref;
 	unsigned long flags;
+	bool disabled;
 
 	/*
 	 * If the guc_id is invalid this context has been stolen and we can free
 	 * it immediately. Also can be freed immediately if the context is not
 	 * registered with the GuC.
 	 */
-	if (context_guc_id_invalid(ce) ||
+	if (submission_disabled(guc) ||
+	    context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		release_guc_id(guc, ce);
 		__guc_context_destroy(ce);
@@ -1123,6 +1452,18 @@ static void guc_context_destroy(struct kref *kref)
 	list_del_init(&ce->guc_id_link);
 	spin_unlock_irqrestore(&guc->contexts_lock, flags);
 
+	/* Seal race with Reset */
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	disabled = submission_disabled(guc);
+	if (likely(!disabled))
+		set_context_destroyed(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+	if (unlikely(disabled)) {
+		release_guc_id(guc, ce);
+		__guc_context_destroy(ce);
+		return;
+	}
+
 	/*
 	 * We defer GuC context deregistration until the context is destroyed
 	 * in order to save on CTBs. With this optimization ideally we only need
@@ -1145,6 +1486,33 @@ static int guc_context_alloc(struct intel_context *ce)
 	return lrc_alloc(ce, ce->engine);
 }
 
+static void add_to_context(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+
+	spin_lock(&ce->guc_active.lock);
+	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
+	spin_unlock(&ce->guc_active.lock);
+}
+
+static void remove_from_context(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+
+	spin_lock_irq(&ce->guc_active.lock);
+
+	list_del_init(&rq->sched.link);
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&ce->guc_active.lock);
+
+	atomic_dec(&ce->guc_id_ref);
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static const struct intel_context_ops guc_context_ops = {
 	.alloc = guc_context_alloc,
@@ -1183,8 +1551,6 @@ static void guc_signal_context_fence(struct intel_context *ce)
 {
 	unsigned long flags;
 
-	GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
-
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 	clr_context_wait_for_deregister_to_register(ce);
 	__guc_signal_context_fence(ce);
@@ -1193,8 +1559,9 @@ static void guc_signal_context_fence(struct intel_context *ce)
 
 static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
 {
-	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
-	       !lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
+	return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
+		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
+		!submission_disabled(ce_to_guc(ce));
 }
 
 static int guc_request_alloc(struct i915_request *rq)
@@ -1252,8 +1619,12 @@ static int guc_request_alloc(struct i915_request *rq)
 	if (unlikely(ret < 0))
 		return ret;
 	if (context_needs_register(ce, !!ret)) {
-		ret = guc_lrc_desc_pin(ce);
+		ret = guc_lrc_desc_pin(ce, true);
 		if (unlikely(ret)) {	/* unwind */
+			if (ret == -EIO) {
+				disable_submission(guc);
+				goto out;	/* GPU will be reset */
+			}
 			atomic_dec(&ce->guc_id_ref);
 			unpin_guc_id(guc, ce);
 			return ret;
@@ -1290,20 +1661,6 @@ static int guc_request_alloc(struct i915_request *rq)
 	return 0;
 }
 
-static struct intel_engine_cs *
-guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
-{
-	struct intel_engine_cs *engine;
-	intel_engine_mask_t tmp, mask = ve->mask;
-	unsigned int num_siblings = 0;
-
-	for_each_engine_masked(engine, ve->gt, mask, tmp)
-		if (num_siblings++ == sibling)
-			return engine;
-
-	return NULL;
-}
-
 static int guc_virtual_context_pre_pin(struct intel_context *ce,
 				       struct i915_gem_ww_ctx *ww,
 				       void **vaddr)
@@ -1512,7 +1869,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
 {
 	if (context_guc_id_invalid(ce))
 		pin_guc_id(guc, ce);
-	guc_lrc_desc_pin(ce);
+	guc_lrc_desc_pin(ce, true);
 }
 
 static inline void guc_init_lrc_mapping(struct intel_guc *guc)
@@ -1578,13 +1935,15 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 	engine->cops = &guc_context_ops;
 	engine->request_alloc = guc_request_alloc;
 	engine->bump_serial = guc_bump_serial;
+	engine->add_active_request = add_to_context;
+	engine->remove_active_request = remove_from_context;
 
 	engine->sched_engine->schedule = i915_schedule;
 
-	engine->reset.prepare = guc_reset_prepare;
-	engine->reset.rewind = guc_reset_rewind;
-	engine->reset.cancel = guc_reset_cancel;
-	engine->reset.finish = guc_reset_finish;
+	engine->reset.prepare = guc_reset_nop;
+	engine->reset.rewind = guc_rewind_nop;
+	engine->reset.cancel = guc_reset_nop;
+	engine->reset.finish = guc_reset_nop;
 
 	engine->emit_flush = gen8_emit_flush_xcs;
 	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
@@ -1757,7 +2116,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		 * register this context.
 		 */
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			register_context(ce);
+			register_context(ce, true);
 		guc_signal_context_fence(ce);
 		intel_context_put(ce);
 	} else if (context_destroyed(ce)) {
@@ -1939,6 +2298,10 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 			     "v%dx%d", ve->base.class, count);
 		ve->base.context_size = sibling->context_size;
 
+		ve->base.add_active_request =
+			sibling->add_active_request;
+		ve->base.remove_active_request =
+			sibling->remove_active_request;
 		ve->base.emit_bb_start = sibling->emit_bb_start;
 		ve->base.emit_flush = sibling->emit_flush;
 		ve->base.emit_init_breadcrumb =
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 6d8b9233214e..f0b02200aa01 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -565,12 +565,49 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
 
-	if (!intel_guc_is_ready(guc))
+	/* Nothing to do if GuC isn't supported */
+	if (!intel_uc_supports_guc(uc))
 		return;
 
+	/* Firmware expected to be running when this function is called */
+	if (!intel_guc_is_ready(guc))
+		goto sanitize;
+
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset_prepare(guc);
+
+sanitize:
 	__uc_sanitize(uc);
 }
 
+void intel_uc_reset(struct intel_uc *uc, bool stalled)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware can not be running when this function is called */
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset(guc, stalled);
+}
+
+void intel_uc_reset_finish(struct intel_uc *uc)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware expected to be running when this function is called */
+	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset_finish(guc);
+}
+
+void intel_uc_cancel_requests(struct intel_uc *uc)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware can not be running when this function is called */
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_cancel_requests(guc);
+}
+
 void intel_uc_runtime_suspend(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index c4cef885e984..eaa3202192ac 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -37,6 +37,9 @@ void intel_uc_driver_late_release(struct intel_uc *uc);
 void intel_uc_driver_remove(struct intel_uc *uc);
 void intel_uc_init_mmio(struct intel_uc *uc);
 void intel_uc_reset_prepare(struct intel_uc *uc);
+void intel_uc_reset(struct intel_uc *uc, bool stalled);
+void intel_uc_reset_finish(struct intel_uc *uc);
+void intel_uc_cancel_requests(struct intel_uc *uc);
 void intel_uc_suspend(struct intel_uc *uc);
 void intel_uc_runtime_suspend(struct intel_uc *uc);
 int intel_uc_resume(struct intel_uc *uc);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index dec5a35c9aa2..192784875a1d 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -194,7 +194,7 @@ static bool irq_work_imm(struct irq_work *wrk)
 	return false;
 }
 
-static void __notify_execute_cb_imm(struct i915_request *rq)
+void i915_request_notify_execute_cb_imm(struct i915_request *rq)
 {
 	__notify_execute_cb(rq, irq_work_imm);
 }
@@ -268,37 +268,6 @@ i915_request_active_engine(struct i915_request *rq,
 	return ret;
 }
-static void remove_from_engine(struct i915_request *rq)
-{
-	struct intel_engine_cs *engine, *locked;
-
-	/*
-	 * Virtual engines complicate acquiring the engine timeline lock,
-	 * as their rq->engine pointer is not stable until under that
-	 * engine lock. The simple ploy we use is to take the lock then
-	 * check that the rq still belongs to the newly locked engine.
-	 */
-	locked = READ_ONCE(rq->engine);
-	spin_lock_irq(&locked->sched_engine->lock);
-	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
-		spin_unlock(&locked->sched_engine->lock);
-		spin_lock(&engine->sched_engine->lock);
-		locked = engine;
-	}
-	list_del_init(&rq->sched.link);
-
-	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
-
-	/* Prevent further __await_execution() registering a cb, then flush */
-	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
-
-	spin_unlock_irq(&locked->sched_engine->lock);
-
-	__notify_execute_cb_imm(rq);
-}
-
 static void __rq_init_watchdog(struct i915_request *rq)
 {
 	rq->watchdog.timer.function = NULL;
@@ -395,9 +364,7 @@ bool i915_request_retire(struct i915_request *rq)
 	 * after removing the breadcrumb and signaling it, so that we do not
 	 * inadvertently attach the breadcrumb to a completed request.
 	 */
-	if (!list_empty(&rq->sched.link))
-		remove_from_engine(rq);
-	atomic_dec(&rq->context->guc_id_ref);
+	rq->engine->remove_active_request(rq);
 	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
 
 	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
@@ -539,7 +506,7 @@ __await_execution(struct i915_request *rq,
 	if (llist_add(&cb->work.node.llist, &signal->execute_cb)) {
 		if (i915_request_is_active(signal) ||
 		    __request_in_flight(signal))
-			__notify_execute_cb_imm(signal);
+			i915_request_notify_execute_cb_imm(signal);
 	}
 
 	return 0;
@@ -676,7 +643,7 @@ bool __i915_request_submit(struct i915_request *request)
 	result = true;
 
 	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
-	list_move_tail(&request->sched.link, &engine->sched_engine->requests);
+	engine->add_active_request(request);
 active:
 	clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags);
 	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index f870cd75a001..bcc6340c505e 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -649,4 +649,6 @@ bool i915_request_active_engine(struct i915_request *rq,
 				struct intel_engine_cs **active);
 
+void i915_request_notify_execute_cb_imm(struct i915_request *rq);
+
 #endif /* I915_REQUEST_H */
On Tue, Jul 27, 2021 at 09:56:06AM +0100, Tvrtko Ursulin wrote:
On 26/07/2021 23:48, Matthew Brost wrote:
On Thu, Jul 15, 2021 at 10:36:51AM +0100, Tvrtko Ursulin wrote:
On 24/06/2021 08:05, Matthew Brost wrote:
Reset implementation for new GuC interface. This is the legacy reset implementation which is called when the i915 owns the engine hang check. Future patches will offload the engine hang check to GuC, but we will continue to maintain this legacy path as a fallback; this code path is also required if the GuC dies.
With the new GuC interface it is not possible to reset individual engines - it is only possible to reset the GPU entirely. This patch forces an entire chip reset if any engine hangs.
No updates after my review comments on 6th of May.
At least:
- wmb documentation
Yea, missed this. Checkpatch yelled at me too. Will be fixed in next rev.
- Spin lock cycling: I either didn't understand or didn't buy the
explanation. I don't remember seeing that pattern elsewhere in the driver - cycle a spinlock to make sure what was updated inside it is visible, you said?
I did respond - not really my fault if you don't understand a fairly simple concept but I'll explain again.
1. Change a variable
2. Cycle a lock

At this point we know that anyone who acquires the above lock sees the variable change.
I can't be the first person in the Linux kernel to do this nor in the i915.
Don't know, did not do an exhaustive search. I can understand it being used to make sure any lock taking sections would exit, if they happened to be running simultaneously to the lock cycling code, but you seem to be describing it being used as a memory barrier.
So either a code comment or just use a memory barrier is my ask. There is a requirement to comment memory barriers anyway so if this is effectively one of them it's pretty clear cut.
This is more than a memory barrier - it ensures not just that the variable change is visible, but that any actions started before the change are complete.
This basically allows to seal all the reset races without a BKL.
Also I told you I explain in this a doc patch that will get reposted after GuC submission lands: https://patchwork.freedesktop.org/patch/432408/?series=89844&rev=1
- Dropping the lock protecting the list in the middle of
list_for_each_entry_safe and just continuing to iterate like nothing happened. (__unwind_incomplete_requests) Again, perhaps I did not understand your explanation properly but you did appear to write:
To be honest, looking at the code now we likely don't need to drop the lock, but regardless I don't think we should change this for the following reasons.
Then don't?
Ok, let me write this down and do this in an immediate follow up after some thorough testing.
Matt
- I assure you this is safe and works. I can add a better comment
explaining this though.
Yes please for a comment. Assurances are all good until a new bug is found.
- This is thoroughly tested and resets are the hardest thing to get
stable and working.
Well new bugs are found even after statements of things being well tested so I'd err on the side of caution. And I don't mean your code here but as a general principle.
- This code is literally going to get deleted when we move to the DRM
scheduler as all the tracking / unwinding / resubmission will be in the DRM scheduler core.
Yeah, but if that cannot be guaranteed to happen in the same kernel release then let's not put dodgy code in.
- A 2 second search of the driver found that we do the same thing in
intel_gt_retire_requests_timeout so this isn't unprecedented.
The code there is bit different. It uses list_safe_reset_next after re-acquiring the lock and only then unlinks the current element from the list.
It all boils down to whether something can modify the list in parallel in your case. If it can't, just don't take the lock but instead put a comment saying why the lock does not need to be taken would be my suggestion. That way you avoid having to explain why the iteration is not broken.
Regards,
Tvrtko
Matt
""" We only need the active lock for ce->guc_active.requests list. It is indeed safe to drop the lock. """
	spin_lock(&ce->guc_active.lock);
	list_for_each_entry_safe(rq, rn,
				 &ce->guc_active.requests,
				 sched.link) {
		if (i915_request_completed(rq))
			continue;

		list_del_init(&rq->sched.link);
		spin_unlock(&ce->guc_active.lock);

		...

		spin_lock(&ce->guc_active.lock);
	}
Safe iterator guards against list_del but dropping the lock means the state of the overall list can change so next pointer may or may not be valid, requests may be missed, I don't know. Needs a comment explaining why it is safe.
Regards,
Tvrtko
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
 drivers/gpu/drm/i915/gt/intel_context.c       |   3 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   6 +
 .../drm/i915/gt/intel_execlists_submission.c  |  40 ++
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   6 +-
 drivers/gpu/drm/i915/gt/intel_reset.c         |  18 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  22 +
 drivers/gpu/drm/i915/gt/mock_engine.c         |  31 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  13 -
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   8 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 581 ++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  39 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   3 +
 drivers/gpu/drm/i915/i915_request.c           |  41 +-
 drivers/gpu/drm/i915/i915_request.h           |   2 +
 15 files changed, 649 insertions(+), 171 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index b24a1b7a3f88..2f01437056a8 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -392,6 +392,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	spin_lock_init(&ce->guc_state.lock);
 	INIT_LIST_HEAD(&ce->guc_state.fences);
 
+	spin_lock_init(&ce->guc_active.lock);
+	INIT_LIST_HEAD(&ce->guc_active.requests);
+
 	ce->guc_id = GUC_INVALID_LRC_ID;
 	INIT_LIST_HEAD(&ce->guc_id_link);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 6945963a31ba..b63c8cf7823b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -165,6 +165,13 @@ struct intel_context {
 		struct list_head fences;
 	} guc_state;
 
+	struct {
+		/** lock: protects everything in guc_active */
+		spinlock_t lock;
+		/** requests: active requests on this context */
+		struct list_head requests;
+	} guc_active;
+
 	/* GuC scheduling state that does not require a lock. */
 	atomic_t guc_sched_state_no_lock;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index e7cb6a06db9d..f9d264c008e8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -426,6 +426,12 @@ struct intel_engine_cs {
 
 	void		(*release)(struct intel_engine_cs *engine);
 
+	/*
+	 * Add / remove request from engine active tracking
+	 */
+	void		(*add_active_request)(struct i915_request *rq);
+	void		(*remove_active_request)(struct i915_request *rq);
+
 	struct intel_engine_execlists execlists;
 
 	/*
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index c10ea6080752..c301a2d088b1 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3118,6 +3118,42 @@ static void execlists_park(struct intel_engine_cs *engine)
 	cancel_timer(&engine->execlists.preempt);
 }
 
+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&locked->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static bool can_preempt(struct intel_engine_cs *engine)
 {
 	if (GRAPHICS_VER(engine->i915) > 8)
@@ -3218,6 +3254,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->cops = &execlists_context_ops;
 	engine->request_alloc = execlists_request_alloc;
 	engine->bump_serial = execlist_bump_serial;
+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;
 
 	engine->reset.prepare = execlists_reset_prepare;
 	engine->reset.rewind = execlists_reset_rewind;
@@ -3912,6 +3950,8 @@ execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 		     "v%dx%d", ve->base.class, count);
 	ve->base.context_size = sibling->context_size;
 
+	ve->base.add_active_request = sibling->add_active_request;
+	ve->base.remove_active_request = sibling->remove_active_request;
 	ve->base.emit_bb_start = sibling->emit_bb_start;
 	ve->base.emit_flush = sibling->emit_flush;
 	ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index aef3084e8b16..463a6ae605a0 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 	if (intel_gt_is_wedged(gt))
 		intel_gt_unset_wedged(gt);
 
-	intel_uc_sanitize(&gt->uc);
-
 	for_each_engine(engine, gt, id)
 		if (engine->reset.prepare)
 			engine->reset.prepare(engine);
@@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 			__intel_engine_reset(engine, false);
 	}
 
+	intel_uc_reset(&gt->uc, false);
+
 	for_each_engine(engine, gt, id)
 		if (engine->reset.finish)
 			engine->reset.finish(engine);
@@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
 		goto err_wedged;
 	}
 
+	intel_uc_reset_finish(&gt->uc);
+
 	intel_rps_enable(&gt->rps);
 	intel_llc_enable(&gt->llc);
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 72251638d4ea..2987282dff6d 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
 		__intel_engine_reset(engine, stalled_mask & engine->mask);
 	local_bh_enable();
 
+	intel_uc_reset(&gt->uc, true);
+
 	intel_ggtt_restore_fences(gt->ggtt);
 
 	return err;
@@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
 		if (awake & engine->mask)
 			intel_engine_pm_put(engine);
 	}
+
+	intel_uc_reset_finish(&gt->uc);
 }
 
 static void nop_submit_request(struct i915_request *request)
@@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
 	for_each_engine(engine, gt, id)
 		if (engine->reset.cancel)
 			engine->reset.cancel(engine);
+	intel_uc_cancel_requests(&gt->uc);
 	local_bh_enable();
 
 	reset_finish(gt, awake);
@@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
 	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
 
+	if (intel_engine_uses_guc(engine))
+		return -ENODEV;
+
 	if (!intel_engine_pm_get_if_awake(engine))
 		return 0;
 
@@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 			   "Resetting %s for %s\n", engine->name, msg);
 	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
 
-	if (intel_engine_uses_guc(engine))
-		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
-	else
-		ret = intel_gt_reset_engine(engine);
+	ret = intel_gt_reset_engine(engine);
 	if (ret) {
 		/* If we fail here, we expect to fallback to a global reset */
-		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
+		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n",
+			     engine->name, ret);
 		goto out;
 	}
@@ -1341,7 +1346,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
+	if (!intel_uc_uses_guc_submission(&gt->uc) &&
+	    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
 		local_bh_disable();
 		for_each_engine_masked(engine, gt, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index e1506b280df1..99dcdc8fba12 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -1049,6 +1049,25 @@ static void ring_bump_serial(struct intel_engine_cs *engine)
 	engine->serial++;
 }
 
+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	spin_lock_irq(&rq->engine->sched_engine->lock);
+	list_del_init(&rq->sched.link);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&rq->engine->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static void setup_common(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
@@ -1066,6 +1085,9 @@ static void setup_common(struct intel_engine_cs *engine)
 	engine->reset.cancel = reset_cancel;
 	engine->reset.finish = reset_finish;
 
+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;
+
 	engine->cops = &ring_context_ops;
 	engine->request_alloc = ring_request_alloc;
 	engine->bump_serial = ring_bump_serial;
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index fc5a65ab1937..c12ff3a75ce6 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -235,6 +235,35 @@ static void mock_submit_request(struct i915_request *request)
 	spin_unlock_irqrestore(&engine->hw_lock, flags);
 }
 
+static void mock_add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void mock_remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+	spin_unlock_irq(&locked->sched_engine->lock);
+}
+
 static void mock_reset_prepare(struct intel_engine_cs *engine)
 {
 }
@@ -327,6 +356,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.emit_flush = mock_emit_flush;
 	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
 	engine->base.submit_request = mock_submit_request;
+	engine->base.add_active_request = mock_add_to_engine;
+	engine->base.remove_active_request = mock_remove_from_engine;
 
 	engine->base.reset.prepare = mock_reset_prepare;
 	engine->base.reset.rewind = mock_reset_rewind;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 6661dcb02239..9b09395b998f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -572,19 +572,6 @@ int intel_guc_suspend(struct intel_guc *guc)
 	return 0;
 }
 
-/**
- * intel_guc_reset_engine() - ask GuC to reset an engine
- * @guc:	intel_guc structure
- * @engine:	engine to be reset
- */
-int intel_guc_reset_engine(struct intel_guc *guc,
-			   struct intel_engine_cs *engine)
-{
-	/* XXX: to be implemented with submission interface rework */
-	return -ENODEV;
-}
-
 /**
  * intel_guc_resume() - notify GuC resuming from suspend state
  * @guc:	the guc
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 22eb1e9cca41..40c9868762d7 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -242,14 +242,16 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask) int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout); -int intel_guc_reset_engine(struct intel_guc *guc,
struct intel_engine_cs *engine);
- int intel_guc_deregister_done_process_msg(struct intel_guc *guc, const u32 *msg, u32 len); int intel_guc_sched_done_process_msg(struct intel_guc *guc, const u32 *msg, u32 len);
+void intel_guc_submission_reset_prepare(struct intel_guc *guc); +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled); +void intel_guc_submission_reset_finish(struct intel_guc *guc); +void intel_guc_submission_cancel_requests(struct intel_guc *guc);
- void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p); #endif
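The four entry points added to intel_guc.h above form a strict lifecycle: prepare gates submission off, reset runs with the firmware stopped, finish re-enables submission. A minimal userspace C model of that ordering (all names here are illustrative stand-ins, not the kernel API):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model only: mirrors the prepare -> reset -> finish ordering
 * of intel_guc_submission_reset_{prepare,finish}(), not the kernel code. */
enum reset_phase { PHASE_RUNNING, PHASE_PREPARED, PHASE_RESET };

struct guc_model {
	enum reset_phase phase;
	bool submission_enabled;
};

static void model_reset_prepare(struct guc_model *guc)
{
	/* Submission is gated off before the hardware reset begins. */
	guc->submission_enabled = false;
	guc->phase = PHASE_PREPARED;
}

static bool model_reset(struct guc_model *guc, bool stalled)
{
	(void)stalled;
	/* Firmware must already be stopped: only legal after prepare. */
	if (guc->phase != PHASE_PREPARED)
		return false;
	guc->phase = PHASE_RESET;
	return true;
}

static void model_reset_finish(struct guc_model *guc)
{
	/* Submission only comes back once the reset fully completes. */
	if (guc->phase == PHASE_RESET) {
		guc->submission_enabled = true;
		guc->phase = PHASE_RUNNING;
	}
}
```

The intel_uc.c hunks later in this patch call these in exactly this order from the GT reset path.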
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 83058df5ba01..b8c894ad8caf 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -141,7 +141,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce) static inline void set_context_wait_for_deregister_to_register(struct intel_context *ce) {
- /* Only should be called from guc_lrc_desc_pin() */
- /* Should only be called from guc_lrc_desc_pin() without lock */
  ce->guc_state.sched_state |= SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
}
@@ -241,15 +241,31 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc) static void guc_lrc_desc_pool_destroy(struct intel_guc *guc) {
- guc->lrc_desc_pool_vaddr = NULL; i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP); }
+static inline bool guc_submission_initialized(struct intel_guc *guc) +{
- return guc->lrc_desc_pool_vaddr != NULL;
+}
- static inline void reset_lrc_desc(struct intel_guc *guc, u32 id) {
- struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
- if (likely(guc_submission_initialized(guc))) {
struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
unsigned long flags;
- memset(desc, 0, sizeof(*desc));
- xa_erase_irq(&guc->context_lookup, id);
memset(desc, 0, sizeof(*desc));
/*
* xarray API doesn't have xa_erase_irqsave wrapper, so calling
* the lower level functions directly.
*/
xa_lock_irqsave(&guc->context_lookup, flags);
__xa_erase(&guc->context_lookup, id);
xa_unlock_irqrestore(&guc->context_lookup, flags);
- } } static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
@@ -260,7 +276,15 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id) static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id, struct intel_context *ce) {
- xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
- unsigned long flags;
- /*
* xarray API doesn't have xa_store_irqsave wrapper, so calling the
* lower level functions directly.
*/
- xa_lock_irqsave(&guc->context_lookup, flags);
- __xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
- xa_unlock_irqrestore(&guc->context_lookup, flags); } static int guc_submission_busy_loop(struct intel_guc* guc,
@@ -331,6 +355,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout) interruptible, timeout); } +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
- static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) { int err;
@@ -338,11 +364,22 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) u32 action[3]; int len = 0; u32 g2h_len_dw = 0;
- bool enabled = context_enabled(ce);
- bool enabled; GEM_BUG_ON(!atomic_read(&ce->guc_id_ref)); GEM_BUG_ON(context_guc_id_invalid(ce));
- /*
* Corner case where the GuC firmware was blown away and reloaded while
* this context was pinned.
*/
- if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
err = guc_lrc_desc_pin(ce, false);
if (unlikely(err))
goto out;
- }
- enabled = context_enabled(ce);
- if (!enabled) { action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET; action[len++] = ce->guc_id;
@@ -365,6 +402,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq) intel_context_put(ce); } +out: return err; } @@ -419,15 +457,10 @@ static int guc_dequeue_one_context(struct intel_guc *guc) if (submit) { guc_set_lrc_tail(last); resubmit:
/*
* We only check for -EBUSY here even though it is possible for
* -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
* died and a full GPU reset needs to be done. The hangcheck will
* eventually detect that the GuC has died and trigger this
* reset so no need to handle -EDEADLK here.
*/ ret = guc_add_request(guc, last);
if (ret == -EBUSY) {
if (unlikely(ret == -EIO))
goto deadlk;
else if (ret == -EBUSY) { tasklet_schedule(&sched_engine->tasklet); guc->stalled_request = last; return false;
@@ -437,6 +470,11 @@ static int guc_dequeue_one_context(struct intel_guc *guc) guc->stalled_request = NULL; return submit;
+deadlk:
- sched_engine->tasklet.callback = NULL;
- tasklet_disable_nosync(&sched_engine->tasklet);
- return false; } static void guc_submission_tasklet(struct tasklet_struct *t)
@@ -463,27 +501,165 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir) intel_engine_signal_breadcrumbs(engine); } -static void guc_reset_prepare(struct intel_engine_cs *engine) +static void __guc_context_destroy(struct intel_context *ce); +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce); +static void guc_signal_context_fence(struct intel_context *ce);
+static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc) +{
- struct intel_context *ce;
- unsigned long index, flags;
- bool pending_disable, pending_enable, deregister, destroyed;
- xa_for_each(&guc->context_lookup, index, ce) {
/* Flush context */
spin_lock_irqsave(&ce->guc_state.lock, flags);
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
/*
* Once we are at this point submission_disabled() is guaranteed
* to be visible to all callers who set the below flags (see above
* flush and flushes in reset_prepare). If submission_disabled()
* is set, the caller shouldn't set these flags.
*/
destroyed = context_destroyed(ce);
pending_enable = context_pending_enable(ce);
pending_disable = context_pending_disable(ce);
deregister = context_wait_for_deregister_to_register(ce);
init_sched_state(ce);
if (pending_enable || destroyed || deregister) {
atomic_dec(&guc->outstanding_submission_g2h);
if (deregister)
guc_signal_context_fence(ce);
if (destroyed) {
release_guc_id(guc, ce);
__guc_context_destroy(ce);
}
if (pending_enable || deregister)
intel_context_put(ce);
}
/* Not mutually exclusive with above if statement. */
if (pending_disable) {
guc_signal_context_fence(ce);
intel_context_sched_disable_unpin(ce);
atomic_dec(&guc->outstanding_submission_g2h);
intel_context_put(ce);
}
- }
+}
+static inline bool +submission_disabled(struct intel_guc *guc) +{
- struct i915_sched_engine * const sched_engine = guc->sched_engine;
- return unlikely(!__tasklet_is_enabled(&sched_engine->tasklet));
+}
+static void disable_submission(struct intel_guc *guc) +{
- struct i915_sched_engine * const sched_engine = guc->sched_engine;
- if (__tasklet_is_enabled(&sched_engine->tasklet)) {
GEM_BUG_ON(!guc->ct.enabled);
__tasklet_disable_sync_once(&sched_engine->tasklet);
sched_engine->tasklet.callback = NULL;
- }
+}
+static void enable_submission(struct intel_guc *guc) +{
- struct i915_sched_engine * const sched_engine = guc->sched_engine;
- unsigned long flags;
- spin_lock_irqsave(&guc->sched_engine->lock, flags);
- sched_engine->tasklet.callback = guc_submission_tasklet;
- wmb();
- if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
__tasklet_enable(&sched_engine->tasklet)) {
GEM_BUG_ON(!guc->ct.enabled);
/* And kick in case we missed a new request submission. */
tasklet_hi_schedule(&sched_engine->tasklet);
- }
- spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+}
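disable_submission()/enable_submission() above gate everything on the tasklet's disable count: enabled means the count is zero, and only the first disabler bumps it. A rough userspace sketch of that counting discipline (hypothetical names; no real tasklet API is used):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for tasklet disable-count semantics:
 * enabled means count == 0, as with __tasklet_is_enabled(). */
struct model_tasklet {
	int count;
	void (*callback)(void);
};

static void model_nop(void) {}

static bool model_enabled(const struct model_tasklet *t)
{
	return t->count == 0;
}

static void model_disable_once(struct model_tasklet *t)
{
	/* Mirrors __tasklet_disable_sync_once(): only the first caller
	 * bumps the count; the callback is cleared so nothing runs. */
	if (model_enabled(t)) {
		t->count++;
		t->callback = NULL;
	}
}

static bool model_enable(struct model_tasklet *t, void (*cb)(void))
{
	/* Mirrors enable_submission(): restore the callback first, then
	 * drop the count; report whether the caller should kick it. */
	t->callback = cb;
	if (!model_enabled(t) && --t->count == 0)
		return true;
	return false;
}
```

Note the "once" property is what makes submission_disabled() (a plain count check) a safe query from other paths during a reset.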
+static void guc_flush_submissions(struct intel_guc *guc) {
- ENGINE_TRACE(engine, "\n");
- struct i915_sched_engine * const sched_engine = guc->sched_engine;
- unsigned long flags;
- spin_lock_irqsave(&sched_engine->lock, flags);
- spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+void intel_guc_submission_reset_prepare(struct intel_guc *guc) +{
- int i;
- if (unlikely(!guc_submission_initialized(guc)))
/* Reset called during driver load? GuC not yet initialised! */
return;
- disable_submission(guc);
- guc->interrupts.disable(guc);
- /* Flush IRQ handler */
- spin_lock_irq(&guc_to_gt(guc)->irq_lock);
- spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
- guc_flush_submissions(guc); /*
* Prevent request submission to the hardware until we have
* completed the reset in i915_gem_reset_finish(). If a request
* is completed by one engine, it may then queue a request
* to a second via its execlists->tasklet *just* as we are
* calling engine->init_hw() and also writing the ELSP.
* Turning off the execlists->tasklet until the reset is over
* prevents the race.
*/
- __tasklet_disable_sync_once(&engine->sched_engine->tasklet);
/*
 * Handle any outstanding G2Hs before reset. Call the IRQ handler directly
 * each pass as interrupts have been disabled. We always scrub for
* outstanding G2H as it is possible for outstanding_submission_g2h to
* be incremented after the context state update.
*/
- for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
intel_guc_to_host_event_handler(guc);
+#define wait_for_reset(guc, wait_var) \
guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
do {
wait_for_reset(guc, &guc->outstanding_submission_g2h);
} while (!list_empty(&guc->ct.requests.incoming));
- }
- scrub_guc_desc_for_outstanding_g2h(guc); }
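The prepare path above polls the G2H handler a bounded number of passes and then falls back to scrub_guc_desc_for_outstanding_g2h() for anything still pending. A compact sketch of that bounded-drain idea (plain C model; the call to intel_guc_to_host_event_handler() is replaced by a counter decrement):

```c
#include <assert.h>

/* Model: each "pass" of the event handler retires one outstanding G2H.
 * Returns how many remain; non-zero means the scrub step must finish up. */
static int model_drain_g2h(int outstanding, int max_passes)
{
	int pass;

	for (pass = 0; pass < max_passes && outstanding; pass++)
		outstanding--;	/* stands in for handling one G2H message */

	return outstanding;
}
```

Bounding the passes matters because the GuC may be dead: the reset must make forward progress even if the handler never observes all the expected responses.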
-static void guc_reset_state(struct intel_context *ce,
struct intel_engine_cs *engine,
u32 head,
bool scrub)
+static struct intel_engine_cs * +guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling) {
- struct intel_engine_cs *engine;
- intel_engine_mask_t tmp, mask = ve->mask;
- unsigned int num_siblings = 0;
- for_each_engine_masked(engine, ve->gt, mask, tmp)
if (num_siblings++ == sibling)
return engine;
- return NULL;
+}
+static inline struct intel_engine_cs * +__context_to_physical_engine(struct intel_context *ce) +{
- struct intel_engine_cs *engine = ce->engine;
- if (intel_engine_is_virtual(engine))
engine = guc_virtual_get_sibling(engine, 0);
- return engine;
+}
+static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub) +{
- struct intel_engine_cs *engine = __context_to_physical_engine(ce);
- GEM_BUG_ON(!intel_context_is_pinned(ce)); /*
@@ -501,42 +677,147 @@ static void guc_reset_state(struct intel_context *ce, lrc_update_regs(ce, engine, head); } -static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled) +static void guc_reset_nop(struct intel_engine_cs *engine) {
- struct intel_engine_execlists * const execlists = &engine->execlists;
- struct i915_request *rq;
+}
+static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled) +{ +}
+static void +__unwind_incomplete_requests(struct intel_context *ce) +{
- struct i915_request *rq, *rn;
- struct list_head *pl;
- int prio = I915_PRIORITY_INVALID;
struct i915_sched_engine * const sched_engine =
	ce->engine->sched_engine;
unsigned long flags;
- spin_lock_irqsave(&engine->sched_engine->lock, flags);
- spin_lock_irqsave(&sched_engine->lock, flags);
- spin_lock(&ce->guc_active.lock);
- list_for_each_entry_safe(rq, rn,
&ce->guc_active.requests,
sched.link) {
if (i915_request_completed(rq))
continue;
list_del_init(&rq->sched.link);
spin_unlock(&ce->guc_active.lock);
__i915_request_unsubmit(rq);
/* Push the request back into the queue for later resubmission. */
GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
if (rq_prio(rq) != prio) {
prio = rq_prio(rq);
pl = i915_sched_lookup_priolist(sched_engine, prio);
}
GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
list_add_tail(&rq->sched.link, pl);
set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
- /* Push back any incomplete requests for replay after the reset. */
- rq = execlists_unwind_incomplete_requests(execlists);
- if (!rq)
goto out_unlock;
spin_lock(&ce->guc_active.lock);
- }
- spin_unlock(&ce->guc_active.lock);
- spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+static struct i915_request *context_find_active_request(struct intel_context *ce) +{
- struct i915_request *rq, *active = NULL;
- unsigned long flags;
- spin_lock_irqsave(&ce->guc_active.lock, flags);
- list_for_each_entry_reverse(rq, &ce->guc_active.requests,
sched.link) {
if (i915_request_completed(rq))
break;
active = rq;
- }
- spin_unlock_irqrestore(&ce->guc_active.lock, flags);
- return active;
+}
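context_find_active_request() above walks the per-context request list in reverse (newest first) and stops at the first completed request, so the value left in `active` is the oldest incomplete request. The same scan over a plain array (illustrative model, not the kernel list API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_req {
	int seqno;
	bool completed;
};

/* reqs[] is ordered oldest -> newest, like ce->guc_active.requests. */
static const struct model_req *model_find_active(const struct model_req *reqs,
						 int n)
{
	const struct model_req *active = NULL;
	int i;

	for (i = n - 1; i >= 0; i--) {	/* reverse walk: newest first */
		if (reqs[i].completed)
			break;		/* everything older is done too */
		active = &reqs[i];
	}
	return active;
}
```

The break is valid because completion is in order within a context: once one request is seen completed, every older one is too.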
+static void __guc_reset_context(struct intel_context *ce, bool stalled) +{
- struct i915_request *rq;
- u32 head;
- /*
* GuC will implicitly mark the context as non-schedulable
* when it sends the reset notification. Make sure our state
* reflects this change. The context will be marked enabled
* on resubmission.
*/
- clr_context_enabled(ce);
- rq = context_find_active_request(ce);
- if (!rq) {
head = ce->ring->tail;
stalled = false;
goto out_replay;
}

if (!i915_request_started(rq))
	stalled = false;
- GEM_BUG_ON(i915_active_is_idle(&ce->active));
- head = intel_ring_wrap(ce->ring, rq->head); __i915_request_reset(rq, stalled);
- guc_reset_state(rq->context, engine, rq->head, stalled);
-out_unlock:
- spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+out_replay:
- guc_reset_state(ce, head, stalled);
- __unwind_incomplete_requests(ce); }
-static void guc_reset_cancel(struct intel_engine_cs *engine) +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled) +{
- struct intel_context *ce;
- unsigned long index;
- if (unlikely(!guc_submission_initialized(guc)))
/* Reset called during driver load? GuC not yet initialised! */
return;
- xa_for_each(&guc->context_lookup, index, ce)
if (intel_context_is_pinned(ce))
__guc_reset_context(ce, stalled);
- /* GuC is blown away, drop all references to contexts */
- xa_destroy(&guc->context_lookup);
+}
+static void guc_cancel_context_requests(struct intel_context *ce) +{
- struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
- struct i915_request *rq;
- unsigned long flags;
- /* Mark all executing requests as skipped. */
- spin_lock_irqsave(&sched_engine->lock, flags);
- spin_lock(&ce->guc_active.lock);
- list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
i915_request_put(i915_request_mark_eio(rq));
- spin_unlock(&ce->guc_active.lock);
- spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+static void +guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine) {
struct i915_sched_engine * const sched_engine = engine->sched_engine;
struct i915_request *rq, *rn;
struct rb_node *rb;
unsigned long flags;

/* Can be called during boot if GuC fails to load */
if (!engine->gt)
if (!sched_engine)
	return;
- ENGINE_TRACE(engine, "\n");
/*
 * Before we call engine->cancel_requests(), we should have exclusive
 * access to the submission state. This is arranged for us by the
@@ -553,21 +834,16 @@ static void guc_reset_cancel(struct intel_engine_cs *engine) */ spin_lock_irqsave(&sched_engine->lock, flags);
- /* Mark all executing requests as skipped. */
- list_for_each_entry(rq, &sched_engine->requests, sched.link) {
i915_request_set_error_once(rq, -EIO);
i915_request_mark_complete(rq);
- }
- /* Flush the queued requests to the timeline list (for retiring). */ while ((rb = rb_first_cached(&sched_engine->queue))) { struct i915_priolist *p = to_priolist(rb); priolist_for_each_request_consume(rq, rn, p) { list_del_init(&rq->sched.link);
__i915_request_submit(rq);
dma_fence_set_error(&rq->fence, -EIO);
i915_request_mark_complete(rq);
i915_request_put(i915_request_mark_eio(rq)); } rb_erase_cached(&p->node, &sched_engine->queue);
@@ -582,14 +858,38 @@ static void guc_reset_cancel(struct intel_engine_cs *engine) spin_unlock_irqrestore(&sched_engine->lock, flags); } -static void guc_reset_finish(struct intel_engine_cs *engine) +void intel_guc_submission_cancel_requests(struct intel_guc *guc) {
- if (__tasklet_enable(&engine->sched_engine->tasklet))
/* And kick in case we missed a new request submission. */
tasklet_hi_schedule(&engine->sched_engine->tasklet);
- struct intel_context *ce;
- unsigned long index;
- xa_for_each(&guc->context_lookup, index, ce)
if (intel_context_is_pinned(ce))
guc_cancel_context_requests(ce);
- ENGINE_TRACE(engine, "depth->%d\n",
atomic_read(&engine->sched_engine->tasklet.count));
- guc_cancel_sched_engine_requests(guc->sched_engine);
- /* GuC is blown away, drop all references to contexts */
- xa_destroy(&guc->context_lookup);
+}
+void intel_guc_submission_reset_finish(struct intel_guc *guc) +{
- /* Reset called during driver load or during wedge? */
- if (unlikely(!guc_submission_initialized(guc) ||
test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags)))
return;
- /*
* Technically possible for either of these values to be non-zero here,
* but very unlikely + harmless. Regardless let's add a warn so we can
* see in CI if this happens frequently / a precursor to taking down the
* machine.
*/
- GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
- atomic_set(&guc->outstanding_submission_g2h, 0);
- enable_submission(guc); } /*
@@ -656,6 +956,9 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc, else trace_i915_request_guc_submit(rq);
- if (unlikely(ret == -EIO))
disable_submission(guc);
- return ret; }
@@ -668,7 +971,8 @@ static void guc_submit_request(struct i915_request *rq) /* Will be called from irq-context when using foreign fences. */ spin_lock_irqsave(&sched_engine->lock, flags);
- if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
if (submission_disabled(guc) || guc->stalled_request ||
    !i915_sched_engine_is_empty(sched_engine))
	queue_request(sched_engine, rq, rq_prio(rq));
else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
	tasklet_hi_schedule(&sched_engine->tasklet);
@@ -805,7 +1109,8 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce) static int __guc_action_register_context(struct intel_guc *guc, u32 guc_id,
u32 offset)
u32 offset,
	bool loop)
{
	u32 action[] = {
		INTEL_GUC_ACTION_REGISTER_CONTEXT,
@@ -813,10 +1118,10 @@ static int __guc_action_register_context(struct intel_guc *guc, offset, };
- return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
- return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), 0, loop); }
-static int register_context(struct intel_context *ce) +static int register_context(struct intel_context *ce, bool loop) { struct intel_guc *guc = ce_to_guc(ce); u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) + @@ -824,11 +1129,12 @@ static int register_context(struct intel_context *ce) trace_intel_context_register(ce);
- return __guc_action_register_context(guc, ce->guc_id, offset);
- return __guc_action_register_context(guc, ce->guc_id, offset, loop); } static int __guc_action_deregister_context(struct intel_guc *guc,
u32 guc_id)
u32 guc_id,
	bool loop)
{
	u32 action[] = {
		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
@@ -836,16 +1142,16 @@ static int __guc_action_deregister_context(struct intel_guc *guc, }; return guc_submission_busy_loop(guc, action, ARRAY_SIZE(action),
G2H_LEN_DW_DEREGISTER_CONTEXT, true);
	G2H_LEN_DW_DEREGISTER_CONTEXT, loop);
}
-static int deregister_context(struct intel_context *ce, u32 guc_id) +static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop) { struct intel_guc *guc = ce_to_guc(ce); trace_intel_context_deregister(ce);
- return __guc_action_deregister_context(guc, guc_id);
- return __guc_action_deregister_context(guc, guc_id, loop); } static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
@@ -874,7 +1180,7 @@ static void guc_context_policy_init(struct intel_engine_cs *engine, desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US; } -static int guc_lrc_desc_pin(struct intel_context *ce) +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop) { struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm; @@ -920,18 +1226,44 @@ static int guc_lrc_desc_pin(struct intel_context *ce) */ if (context_registered) { trace_intel_context_steal_guc_id(ce);
set_context_wait_for_deregister_to_register(ce);
intel_context_get(ce);
if (!loop) {
set_context_wait_for_deregister_to_register(ce);
intel_context_get(ce);
} else {
bool disabled;
unsigned long flags;
/* Seal race with Reset */
spin_lock_irqsave(&ce->guc_state.lock, flags);
disabled = submission_disabled(guc);
if (likely(!disabled)) {
set_context_wait_for_deregister_to_register(ce);
intel_context_get(ce);
}
spin_unlock_irqrestore(&ce->guc_state.lock, flags);
if (unlikely(disabled)) {
reset_lrc_desc(guc, desc_idx);
return 0; /* Will get registered later */
}
}

/*
 * If stealing the guc_id, this ce has the same guc_id as the
 * context whose guc_id was stolen.
 */
with_intel_runtime_pm(runtime_pm, wakeref)
ret = deregister_context(ce, ce->guc_id);
ret = deregister_context(ce, ce->guc_id, loop);
if (unlikely(ret == -EBUSY)) {
clr_context_wait_for_deregister_to_register(ce);
intel_context_put(ce);
	}
} else {
	with_intel_runtime_pm(runtime_pm, wakeref)
ret = register_context(ce);
ret = register_context(ce, loop);
if (unlikely(ret == -EBUSY))
reset_lrc_desc(guc, desc_idx);
else if (unlikely(ret == -ENODEV))
		ret = 0; /* Will get registered later */
}

return ret;
@@ -994,7 +1326,6 @@ static void __guc_context_sched_disable(struct intel_guc *guc, GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID); trace_intel_context_sched_disable(ce);
- intel_context_get(ce); guc_submission_busy_loop(guc, action, ARRAY_SIZE(action), G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
@@ -1004,6 +1335,7 @@ static u16 prep_context_pending_disable(struct intel_context *ce) { set_context_pending_disable(ce); clr_context_enabled(ce);
- intel_context_get(ce); return ce->guc_id; }
@@ -1016,7 +1348,7 @@ static void guc_context_sched_disable(struct intel_context *ce) u16 guc_id; intel_wakeref_t wakeref;
- if (context_guc_id_invalid(ce) ||
- if (submission_disabled(guc) || context_guc_id_invalid(ce) || !lrc_desc_registered(guc, ce->guc_id)) { clr_context_enabled(ce); goto unpin;
@@ -1034,6 +1366,7 @@ static void guc_context_sched_disable(struct intel_context *ce) * request doesn't slip through the 'context_pending_disable' fence. */ if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
	return;
}

guc_id = prep_context_pending_disable(ce);
@@ -1050,19 +1383,13 @@ static void guc_context_sched_disable(struct intel_context *ce) static inline void guc_lrc_desc_unpin(struct intel_context *ce) {
- struct intel_engine_cs *engine = ce->engine;
- struct intel_guc *guc = &engine->gt->uc.guc;
- unsigned long flags;
- struct intel_guc *guc = ce_to_guc(ce); GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id)); GEM_BUG_ON(ce != __get_context(guc, ce->guc_id)); GEM_BUG_ON(context_enabled(ce));
- spin_lock_irqsave(&ce->guc_state.lock, flags);
- set_context_destroyed(ce);
- spin_unlock_irqrestore(&ce->guc_state.lock, flags);
- deregister_context(ce, ce->guc_id);
- deregister_context(ce, ce->guc_id, true); } static void __guc_context_destroy(struct intel_context *ce)
@@ -1090,13 +1417,15 @@ static void guc_context_destroy(struct kref *kref) struct intel_guc *guc = &ce->engine->gt->uc.guc; intel_wakeref_t wakeref; unsigned long flags;
bool disabled;

/*
 * If the guc_id is invalid this context has been stolen and we can free
 * it immediately. Also can be freed immediately if the context is not
 * registered with the GuC.
 */
- if (context_guc_id_invalid(ce) ||
- if (submission_disabled(guc) ||
context_guc_id_invalid(ce) || !lrc_desc_registered(guc, ce->guc_id)) { release_guc_id(guc, ce); __guc_context_destroy(ce);
@@ -1123,6 +1452,18 @@ static void guc_context_destroy(struct kref *kref) list_del_init(&ce->guc_id_link); spin_unlock_irqrestore(&guc->contexts_lock, flags);
- /* Seal race with Reset */
- spin_lock_irqsave(&ce->guc_state.lock, flags);
- disabled = submission_disabled(guc);
- if (likely(!disabled))
set_context_destroyed(ce);
- spin_unlock_irqrestore(&ce->guc_state.lock, flags);
- if (unlikely(disabled)) {
release_guc_id(guc, ce);
__guc_context_destroy(ce);
return;
- }
/*
 * We defer GuC context deregistration until the context is destroyed
 * in order to save on CTBs. With this optimization ideally we only need
@@ -1145,6 +1486,33 @@ static int guc_context_alloc(struct intel_context *ce) return lrc_alloc(ce, ce->engine); } +static void add_to_context(struct i915_request *rq) +{
- struct intel_context *ce = rq->context;
- spin_lock(&ce->guc_active.lock);
- list_move_tail(&rq->sched.link, &ce->guc_active.requests);
- spin_unlock(&ce->guc_active.lock);
+}
+static void remove_from_context(struct i915_request *rq) +{
- struct intel_context *ce = rq->context;
- spin_lock_irq(&ce->guc_active.lock);
- list_del_init(&rq->sched.link);
- clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
- /* Prevent further __await_execution() registering a cb, then flush */
- set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
- spin_unlock_irq(&ce->guc_active.lock);
- atomic_dec(&ce->guc_id_ref);
- i915_request_notify_execute_cb_imm(rq);
+}
- static const struct intel_context_ops guc_context_ops = { .alloc = guc_context_alloc,
@@ -1183,8 +1551,6 @@ static void guc_signal_context_fence(struct intel_context *ce) { unsigned long flags;
- GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
- spin_lock_irqsave(&ce->guc_state.lock, flags); clr_context_wait_for_deregister_to_register(ce); __guc_signal_context_fence(ce);
@@ -1193,8 +1559,9 @@ static void guc_signal_context_fence(struct intel_context *ce) static bool context_needs_register(struct intel_context *ce, bool new_guc_id) {
- return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
- return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
!lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
	!submission_disabled(ce_to_guc(ce));
}

static int guc_request_alloc(struct i915_request *rq)
@@ -1252,8 +1619,12 @@ static int guc_request_alloc(struct i915_request *rq) if (unlikely(ret < 0)) return ret; if (context_needs_register(ce, !!ret)) {
ret = guc_lrc_desc_pin(ce);
ret = guc_lrc_desc_pin(ce, true); if (unlikely(ret)) { /* unwind */
if (ret == -EIO) {
disable_submission(guc);
goto out; /* GPU will be reset */
} atomic_dec(&ce->guc_id_ref); unpin_guc_id(guc, ce); return ret;
@@ -1290,20 +1661,6 @@ static int guc_request_alloc(struct i915_request *rq) return 0; } -static struct intel_engine_cs * -guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling) -{
- struct intel_engine_cs *engine;
- intel_engine_mask_t tmp, mask = ve->mask;
- unsigned int num_siblings = 0;
- for_each_engine_masked(engine, ve->gt, mask, tmp)
if (num_siblings++ == sibling)
return engine;
- return NULL;
-}
- static int guc_virtual_context_pre_pin(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr)
@@ -1512,7 +1869,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc, { if (context_guc_id_invalid(ce)) pin_guc_id(guc, ce);
- guc_lrc_desc_pin(ce);
- guc_lrc_desc_pin(ce, true); } static inline void guc_init_lrc_mapping(struct intel_guc *guc)
@@ -1578,13 +1935,15 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine) engine->cops = &guc_context_ops; engine->request_alloc = guc_request_alloc; engine->bump_serial = guc_bump_serial;
- engine->add_active_request = add_to_context;
- engine->remove_active_request = remove_from_context; engine->sched_engine->schedule = i915_schedule;
- engine->reset.prepare = guc_reset_prepare;
- engine->reset.rewind = guc_reset_rewind;
- engine->reset.cancel = guc_reset_cancel;
- engine->reset.finish = guc_reset_finish;
- engine->reset.prepare = guc_reset_nop;
- engine->reset.rewind = guc_rewind_nop;
- engine->reset.cancel = guc_reset_nop;
- engine->reset.finish = guc_reset_nop; engine->emit_flush = gen8_emit_flush_xcs; engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
@@ -1757,7 +2116,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc, * register this context. */ with_intel_runtime_pm(runtime_pm, wakeref)
register_context(ce);
		register_context(ce, true);
	guc_signal_context_fence(ce);
	intel_context_put(ce);
} else if (context_destroyed(ce)) {
@@ -1939,6 +2298,10 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count) "v%dx%d", ve->base.class, count); ve->base.context_size = sibling->context_size;
ve->base.add_active_request =
sibling->add_active_request;
ve->base.remove_active_request =
sibling->remove_active_request; ve->base.emit_bb_start = sibling->emit_bb_start; ve->base.emit_flush = sibling->emit_flush; ve->base.emit_init_breadcrumb =
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c index 6d8b9233214e..f0b02200aa01 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c @@ -565,12 +565,49 @@ void intel_uc_reset_prepare(struct intel_uc *uc) { struct intel_guc *guc = &uc->guc;
- if (!intel_guc_is_ready(guc))
- /* Nothing to do if GuC isn't supported */
- if (!intel_uc_supports_guc(uc)) return;
- /* Firmware expected to be running when this function is called */
- if (!intel_guc_is_ready(guc))
goto sanitize;
- if (intel_uc_uses_guc_submission(uc))
intel_guc_submission_reset_prepare(guc);
+sanitize: __uc_sanitize(uc); } +void intel_uc_reset(struct intel_uc *uc, bool stalled) +{
- struct intel_guc *guc = &uc->guc;
- /* Firmware can not be running when this function is called */
- if (intel_uc_uses_guc_submission(uc))
intel_guc_submission_reset(guc, stalled);
+}
+void intel_uc_reset_finish(struct intel_uc *uc) +{
- struct intel_guc *guc = &uc->guc;
- /* Firmware expected to be running when this function is called */
- if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
intel_guc_submission_reset_finish(guc);
+}
+void intel_uc_cancel_requests(struct intel_uc *uc) +{
- struct intel_guc *guc = &uc->guc;
- /* Firmware can not be running when this function is called */
- if (intel_uc_uses_guc_submission(uc))
intel_guc_submission_cancel_requests(guc);
+}
- void intel_uc_runtime_suspend(struct intel_uc *uc) { struct intel_guc *guc = &uc->guc;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h index c4cef885e984..eaa3202192ac 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h @@ -37,6 +37,9 @@ void intel_uc_driver_late_release(struct intel_uc *uc); void intel_uc_driver_remove(struct intel_uc *uc); void intel_uc_init_mmio(struct intel_uc *uc); void intel_uc_reset_prepare(struct intel_uc *uc); +void intel_uc_reset(struct intel_uc *uc, bool stalled); +void intel_uc_reset_finish(struct intel_uc *uc); +void intel_uc_cancel_requests(struct intel_uc *uc); void intel_uc_suspend(struct intel_uc *uc); void intel_uc_runtime_suspend(struct intel_uc *uc); int intel_uc_resume(struct intel_uc *uc); diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index dec5a35c9aa2..192784875a1d 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -194,7 +194,7 @@ static bool irq_work_imm(struct irq_work *wrk) return false; } -static void __notify_execute_cb_imm(struct i915_request *rq) +void i915_request_notify_execute_cb_imm(struct i915_request *rq) { __notify_execute_cb(rq, irq_work_imm); } @@ -268,37 +268,6 @@ i915_request_active_engine(struct i915_request *rq, return ret; }
-static void remove_from_engine(struct i915_request *rq) -{
- struct intel_engine_cs *engine, *locked;
- /*
* Virtual engines complicate acquiring the engine timeline lock,
* as their rq->engine pointer is not stable until under that
* engine lock. The simple ploy we use is to take the lock then
* check that the rq still belongs to the newly locked engine.
*/
- locked = READ_ONCE(rq->engine);
- spin_lock_irq(&locked->sched_engine->lock);
- while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
spin_unlock(&locked->sched_engine->lock);
spin_lock(&engine->sched_engine->lock);
locked = engine;
- }
- list_del_init(&rq->sched.link);
- clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
- clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
- /* Prevent further __await_execution() registering a cb, then flush */
- set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
- spin_unlock_irq(&locked->sched_engine->lock);
- __notify_execute_cb_imm(rq);
-}
- static void __rq_init_watchdog(struct i915_request *rq) { rq->watchdog.timer.function = NULL;
@@ -395,9 +364,7 @@ bool i915_request_retire(struct i915_request *rq) * after removing the breadcrumb and signaling it, so that we do not * inadvertently attach the breadcrumb to a completed request. */
-	if (!list_empty(&rq->sched.link))
-		remove_from_engine(rq);
-	atomic_dec(&rq->context->guc_id_ref);
+	rq->engine->remove_active_request(rq);
 	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
 
 	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
@@ -539,7 +506,7 @@ __await_execution(struct i915_request *rq, if (llist_add(&cb->work.node.llist, &signal->execute_cb)) { if (i915_request_is_active(signal) || __request_in_flight(signal))
-		__notify_execute_cb_imm(signal);
+		i915_request_notify_execute_cb_imm(signal);
 	}
 
 	return 0;
@@ -676,7 +643,7 @@ bool __i915_request_submit(struct i915_request *request) result = true; GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
-	list_move_tail(&request->sched.link, &engine->sched_engine->requests);
+	engine->add_active_request(request);
 
 active:
 	clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags);
 	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index f870cd75a001..bcc6340c505e 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -649,4 +649,6 @@ bool i915_request_active_engine(struct i915_request *rq, struct intel_engine_cs **active); +void i915_request_notify_execute_cb_imm(struct i915_request *rq);
 #endif /* I915_REQUEST_H */
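For reference, the hunks above replace open-coded engine-list manipulation in i915_request.c with per-backend vfuncs, so execlists and GuC submission can each supply their own active-request tracking. A minimal standalone sketch of that indirection (the stubbed types and a counter-based backend are illustrative only, not the driver's real definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the driver types. */
struct i915_request;

struct intel_engine_cs {
	int active_count; /* stand-in for sched_engine->requests bookkeeping */
	void (*add_active_request)(struct intel_engine_cs *engine,
				   struct i915_request *rq);
	void (*remove_active_request)(struct intel_engine_cs *engine,
				      struct i915_request *rq);
};

/* A hypothetical execlists-style backend implementation. */
static void execlists_add_active(struct intel_engine_cs *engine,
				 struct i915_request *rq)
{
	(void)rq;
	engine->active_count++;
}

static void execlists_remove_active(struct intel_engine_cs *engine,
				    struct i915_request *rq)
{
	(void)rq;
	engine->active_count--;
}
```

With this shape, __i915_request_submit() and i915_request_retire() call the vfuncs instead of touching the engine lists directly, which is what lets a GuC backend track active requests differently.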
If submission is disabled by the backend for any reason, reset the GPU immediately in the heartbeat code as the backend can't be reenabled until the GPU is reset.
Signed-off-by: Matthew Brost matthew.brost@intel.com --- .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 63 +++++++++++++++---- .../gpu/drm/i915/gt/intel_engine_heartbeat.h | 4 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 9 +++ drivers/gpu/drm/i915/i915_scheduler.c | 6 ++ drivers/gpu/drm/i915/i915_scheduler.h | 6 ++ drivers/gpu/drm/i915/i915_scheduler_types.h | 5 ++ 6 files changed, 80 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c index b6a305e6a974..a8495364d906 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c @@ -70,12 +70,30 @@ static void show_heartbeat(const struct i915_request *rq, { struct drm_printer p = drm_debug_printer("heartbeat");
- intel_engine_dump(engine, &p, - "%s heartbeat {seqno:%llx:%lld, prio:%d} not ticking\n", - engine->name, - rq->fence.context, - rq->fence.seqno, - rq->sched.attr.priority); + if (!rq) { + intel_engine_dump(engine, &p, + "%s heartbeat not ticking\n", + engine->name); + } else { + intel_engine_dump(engine, &p, + "%s heartbeat {seqno:%llx:%lld, prio:%d} not ticking\n", + engine->name, + rq->fence.context, + rq->fence.seqno, + rq->sched.attr.priority); + } +} + +static void +reset_engine(struct intel_engine_cs *engine, struct i915_request *rq) +{ + if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) + show_heartbeat(rq, engine); + + intel_gt_handle_error(engine->gt, engine->mask, + I915_ERROR_CAPTURE, + "stopped heartbeat on %s", + engine->name); }
static void heartbeat(struct work_struct *wrk) @@ -102,6 +120,11 @@ static void heartbeat(struct work_struct *wrk) if (intel_gt_is_wedged(engine->gt)) goto out;
+ if (i915_sched_engine_disabled(engine->sched_engine)) { + reset_engine(engine, engine->heartbeat.systole); + goto out; + } + if (engine->heartbeat.systole) { long delay = READ_ONCE(engine->props.heartbeat_interval_ms);
@@ -139,13 +162,7 @@ static void heartbeat(struct work_struct *wrk) engine->sched_engine->schedule(rq, &attr); local_bh_enable(); } else { - if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) - show_heartbeat(rq, engine); - - intel_gt_handle_error(engine->gt, engine->mask, - I915_ERROR_CAPTURE, - "stopped heartbeat on %s", - engine->name); + reset_engine(engine, rq); }
rq->emitted_jiffies = jiffies; @@ -194,6 +211,26 @@ void intel_engine_park_heartbeat(struct intel_engine_cs *engine) i915_request_put(fetch_and_zero(&engine->heartbeat.systole)); }
+void intel_gt_unpark_heartbeats(struct intel_gt *gt) +{ + struct intel_engine_cs *engine; + enum intel_engine_id id; + + for_each_engine(engine, gt, id) + if (intel_engine_pm_is_awake(engine)) + intel_engine_unpark_heartbeat(engine); + +} + +void intel_gt_park_heartbeats(struct intel_gt *gt) +{ + struct intel_engine_cs *engine; + enum intel_engine_id id; + + for_each_engine(engine, gt, id) + intel_engine_park_heartbeat(engine); +} + void intel_engine_init_heartbeat(struct intel_engine_cs *engine) { INIT_DELAYED_WORK(&engine->heartbeat.work, heartbeat); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h index a488ea3e84a3..5da6d809a87a 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h @@ -7,6 +7,7 @@ #define INTEL_ENGINE_HEARTBEAT_H
struct intel_engine_cs; +struct intel_gt;
void intel_engine_init_heartbeat(struct intel_engine_cs *engine);
@@ -16,6 +17,9 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine, void intel_engine_park_heartbeat(struct intel_engine_cs *engine); void intel_engine_unpark_heartbeat(struct intel_engine_cs *engine);
+void intel_gt_park_heartbeats(struct intel_gt *gt); +void intel_gt_unpark_heartbeats(struct intel_gt *gt); + int intel_engine_pulse(struct intel_engine_cs *engine); int intel_engine_flush_barriers(struct intel_engine_cs *engine);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index b8c894ad8caf..59fca9748c15 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -10,6 +10,7 @@ #include "gt/intel_breadcrumbs.h" #include "gt/intel_context.h" #include "gt/intel_engine_pm.h" +#include "gt/intel_engine_heartbeat.h" #include "gt/intel_gt.h" #include "gt/intel_gt_irq.h" #include "gt/intel_gt_pm.h" @@ -605,6 +606,7 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc) /* Reset called during driver load? GuC not yet initialised! */ return;
+ intel_gt_park_heartbeats(guc_to_gt(guc)); disable_submission(guc); guc->interrupts.disable(guc);
@@ -890,6 +892,7 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc) atomic_set(&guc->outstanding_submission_g2h, 0);
enable_submission(guc); + intel_gt_unpark_heartbeats(guc_to_gt(guc)); }
/* @@ -1859,6 +1862,11 @@ static int guc_resume(struct intel_engine_cs *engine) return 0; }
+static bool guc_sched_engine_disabled(struct i915_sched_engine *sched_engine) +{ + return !sched_engine->tasklet.callback; +} + static void guc_set_default_submission(struct intel_engine_cs *engine) { engine->submit_request = guc_submit_request; @@ -2009,6 +2017,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine) return -ENOMEM;
guc->sched_engine->schedule = i915_schedule; + guc->sched_engine->disabled = guc_sched_engine_disabled; guc->sched_engine->private_data = guc; tasklet_setup(&guc->sched_engine->tasklet, guc_submission_tasklet); diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 3a58a9130309..3fb009ea2cb2 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -440,6 +440,11 @@ void i915_sched_engine_free(struct kref *kref) kfree(sched_engine); }
+static bool default_disabled(struct i915_sched_engine *sched_engine) +{ + return false; +} + struct i915_sched_engine * i915_sched_engine_create(unsigned int subclass) { @@ -453,6 +458,7 @@ i915_sched_engine_create(unsigned int subclass)
sched_engine->queue = RB_ROOT_CACHED; sched_engine->queue_priority_hint = INT_MIN; + sched_engine->disabled = default_disabled;
INIT_LIST_HEAD(&sched_engine->requests); INIT_LIST_HEAD(&sched_engine->hold); diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h index 650ab8e0db9f..72105a53b0e1 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.h +++ b/drivers/gpu/drm/i915/i915_scheduler.h @@ -98,4 +98,10 @@ void i915_request_show_with_schedule(struct drm_printer *m, const char *prefix, int indent);
+static inline bool +i915_sched_engine_disabled(struct i915_sched_engine *sched_engine) +{ + return sched_engine->disabled(sched_engine); +} + #endif /* _I915_SCHEDULER_H_ */ diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h index 5935c3152bdc..cfaf52e528d0 100644 --- a/drivers/gpu/drm/i915/i915_scheduler_types.h +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h @@ -163,6 +163,11 @@ struct i915_sched_engine { */ void *private_data;
+ /** + * @disabled: check if backend has disabled submission + */ + bool (*disabled)(struct i915_sched_engine *sched_engine); + /** * @kick_backend: kick backend after a request's priority has changed */
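The new disabled() hook and the heartbeat early-out can be modelled in isolation. The sketch below is illustrative only (stub types, string return values standing in for the real reset path); it mirrors the control flow the patch adds: a backend that reports itself disabled gets an immediate full reset rather than a heartbeat pulse that could never be submitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Minimal stand-in for the scheduler type; illustrative only. */
struct i915_sched_engine {
	bool (*disabled)(struct i915_sched_engine *sched_engine);
};

static bool default_disabled(struct i915_sched_engine *sched_engine)
{
	(void)sched_engine;
	return false; /* execlists: submission is never backend-disabled */
}

static bool guc_style_disabled(struct i915_sched_engine *sched_engine)
{
	(void)sched_engine;
	return true; /* models !sched_engine->tasklet.callback after reset */
}

static bool i915_sched_engine_disabled(struct i915_sched_engine *se)
{
	return se->disabled(se);
}

/* Mirrors the early-out added to heartbeat(): a disabled backend means
 * no pulse can ever complete, so go straight to a full reset. */
static const char *heartbeat_tick(struct i915_sched_engine *se)
{
	if (i915_sched_engine_disabled(se))
		return "reset_engine"; /* intel_gt_handle_error(...) path */
	return "pulse";                /* normal heartbeat submission path */
}
```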
On 6/24/2021 00:05, Matthew Brost wrote:
If submission is disabled by the backend for any reason, reset the GPU immediately in the heartbeat code as the backend can't be reenabled until the GPU is reset.
Signed-off-by: Matthew Brost matthew.brost@intel.com
Reviewed-by: John Harrison John.C.Harrison@Intel.com
Disable GuC interrupts in intel_guc_sanitize(). Part of this requires moving the guc_*_interrupts() wrapper functions into the header file intel_guc.h.
Signed-off-by: Matthew Brost matthew.brost@intel.com Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> --- drivers/gpu/drm/i915/gt/uc/intel_guc.h | 16 ++++++++++++++++ drivers/gpu/drm/i915/gt/uc/intel_uc.c | 21 +++------------------ 2 files changed, 19 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 40c9868762d7..85ef6767f13b 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -217,9 +217,25 @@ static inline bool intel_guc_is_ready(struct intel_guc *guc) return intel_guc_is_fw_running(guc) && intel_guc_ct_enabled(&guc->ct); }
+static inline void intel_guc_reset_interrupts(struct intel_guc *guc) +{ + guc->interrupts.reset(guc); +} + +static inline void intel_guc_enable_interrupts(struct intel_guc *guc) +{ + guc->interrupts.enable(guc); +} + +static inline void intel_guc_disable_interrupts(struct intel_guc *guc) +{ + guc->interrupts.disable(guc); +} + static inline int intel_guc_sanitize(struct intel_guc *guc) { intel_uc_fw_sanitize(&guc->fw); + intel_guc_disable_interrupts(guc); intel_guc_ct_sanitize(&guc->ct); guc->mmio_msg = 0;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c index f0b02200aa01..ab11fe731ee7 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c @@ -207,21 +207,6 @@ static void guc_handle_mmio_msg(struct intel_guc *guc) spin_unlock_irq(&guc->irq_lock); }
-static void guc_reset_interrupts(struct intel_guc *guc) -{ - guc->interrupts.reset(guc); -} - -static void guc_enable_interrupts(struct intel_guc *guc) -{ - guc->interrupts.enable(guc); -} - -static void guc_disable_interrupts(struct intel_guc *guc) -{ - guc->interrupts.disable(guc); -} - static int guc_enable_communication(struct intel_guc *guc) { struct intel_gt *gt = guc_to_gt(guc); @@ -242,7 +227,7 @@ static int guc_enable_communication(struct intel_guc *guc) guc_get_mmio_msg(guc); guc_handle_mmio_msg(guc);
- guc_enable_interrupts(guc); + intel_guc_enable_interrupts(guc);
/* check for CT messages received before we enabled interrupts */ spin_lock_irq(>->irq_lock); @@ -265,7 +250,7 @@ static void guc_disable_communication(struct intel_guc *guc) */ guc_clear_mmio_msg(guc);
- guc_disable_interrupts(guc); + intel_guc_disable_interrupts(guc);
intel_guc_ct_disable(&guc->ct);
@@ -463,7 +448,7 @@ static int __uc_init_hw(struct intel_uc *uc) if (ret) goto err_out;
- guc_reset_interrupts(guc); + intel_guc_reset_interrupts(guc);
/* WaEnableuKernelHeaderValidFix:skl */ /* WaEnableGuCBootHashCheckNotSet:skl,bxt,kbl */
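The net effect of this patch is an ordering change in intel_guc_sanitize(): interrupt disable is now slotted between the firmware and CT sanitize steps. A minimal sketch of that call order (the stub bodies just record sequencing and are illustrative, not the real driver functions):

```c
#include <assert.h>
#include <string.h>

/* Records the order of the sanitize steps; stubs are illustrative. */
static char order[32];

static void intel_uc_fw_sanitize(void)         { strcat(order, "fw,"); }
static void intel_guc_disable_interrupts(void) { strcat(order, "irq,"); }
static void intel_guc_ct_sanitize(void)        { strcat(order, "ct,"); }

/* Mirrors the patched intel_guc_sanitize(): interrupts are torn down
 * after the firmware state but before the CT channels are sanitized. */
static int intel_guc_sanitize(void)
{
	intel_uc_fw_sanitize();
	intel_guc_disable_interrupts();
	intel_guc_ct_sanitize();
	return 0;
}
```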
On 6/24/2021 00:05, Matthew Brost wrote:
Add disable GuC interrupts to intel_guc_sanitize(). Part of this requires moving the guc_*_interrupt wrapper function into header file intel_guc.h.
Signed-off-by: Matthew Brost matthew.brost@intel.com Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: John Harrison John.C.Harrison@Intel.com
The new GuC interface introduces an MMIO H2G command, INTEL_GUC_ACTION_RESET_CLIENT, which is used to implement suspend. This MMIO tears down any active contexts generating a context reset G2H CTB for each. Once that step completes the GuC tears down the CTB channels. It is safe to suspend once this MMIO H2G command completes and all G2H CTBs have been processed. In practice the i915 will likely never receive a G2H as suspend should only be called after the GPU is idle.
Resume is implemented in the same manner as before - simply reload the GuC firmware and reinitialize everything (e.g. CTB channels, contexts, etc..).
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com Signed-off-by: Michal Wajdeczko michal.wajdeczko@intel.com --- .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc.c | 64 ++++++++----------- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 14 ++-- .../gpu/drm/i915/gt/uc/intel_guc_submission.h | 5 ++ drivers/gpu/drm/i915/gt/uc/intel_uc.c | 20 ++++-- 5 files changed, 53 insertions(+), 51 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h index 57e18babdf4b..596cf4b818e5 100644 --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h @@ -142,6 +142,7 @@ enum intel_guc_action { INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505, INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506, INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600, + INTEL_GUC_ACTION_RESET_CLIENT = 0x5B01, INTEL_GUC_ACTION_LIMIT };
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 9b09395b998f..68266cbffd1f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -524,51 +524,34 @@ int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset) */ int intel_guc_suspend(struct intel_guc *guc) { - struct intel_uncore *uncore = guc_to_gt(guc)->uncore; int ret; - u32 status; u32 action[] = { - INTEL_GUC_ACTION_ENTER_S_STATE, - GUC_POWER_D1, /* any value greater than GUC_POWER_D0 */ + INTEL_GUC_ACTION_RESET_CLIENT, };
- /* - * If GuC communication is enabled but submission is not supported, - * we do not need to suspend the GuC. - */ - if (!intel_guc_submission_is_used(guc) || !intel_guc_is_ready(guc)) + if (!intel_guc_is_ready(guc)) return 0;
- /* - * The ENTER_S_STATE action queues the save/restore operation in GuC FW - * and then returns, so waiting on the H2G is not enough to guarantee - * GuC is done. When all the processing is done, GuC writes - * INTEL_GUC_SLEEP_STATE_SUCCESS to scratch register 14, so we can poll - * on that. Note that GuC does not ensure that the value in the register - * is different from INTEL_GUC_SLEEP_STATE_SUCCESS while the action is - * in progress so we need to take care of that ourselves as well. - */ - - intel_uncore_write(uncore, SOFT_SCRATCH(14), - INTEL_GUC_SLEEP_STATE_INVALID_MASK); - - ret = intel_guc_send(guc, action, ARRAY_SIZE(action)); - if (ret) - return ret; - - ret = __intel_wait_for_register(uncore, SOFT_SCRATCH(14), - INTEL_GUC_SLEEP_STATE_INVALID_MASK, - 0, 0, 10, &status); - if (ret) - return ret; - - if (status != INTEL_GUC_SLEEP_STATE_SUCCESS) { - DRM_ERROR("GuC failed to change sleep state. " - "action=0x%x, err=%u\n", - action[0], status); - return -EIO; + if (intel_guc_submission_is_used(guc)) { + /* + * This H2G MMIO command tears down the GuC in two steps. First it will + * generate a G2H CTB for every active context indicating a reset. In + * practice the i915 shouldn't ever get a G2H as suspend should only be + * called when the GPU is idle. Next, it tears down the CTBs and this + * H2G MMIO command completes. + * + * Don't abort on a failure code from the GuC. Keep going and do the + * clean up in santize() and re-initialisation on resume and hopefully + * the error here won't be problematic. + */ + ret = intel_guc_send_mmio(guc, action, ARRAY_SIZE(action), NULL, 0); + if (ret) + DRM_ERROR("GuC suspend: RESET_CLIENT action failed with error %d!\n", ret); }
+ /* Signal that the GuC isn't running. */ + intel_guc_sanitize(guc); + return 0; }
@@ -578,7 +561,12 @@ int intel_guc_suspend(struct intel_guc *guc) */ int intel_guc_resume(struct intel_guc *guc) { - /* XXX: to be implemented with submission interface rework */ + /* + * NB: This function can still be called even if GuC submission is + * disabled, e.g. if GuC is enabled for HuC authentication only. Thus, + * if any code is later added here, it must be support doing nothing + * if submission is disabled (as per intel_guc_suspend). + */ return 0; }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 59fca9748c15..16b61fe71b07 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -304,10 +304,10 @@ static int guc_submission_busy_loop(struct intel_guc* guc, return err; }
-static int guc_wait_for_pending_msg(struct intel_guc *guc, - atomic_t *wait_var, - bool interruptible, - long timeout) +int intel_guc_wait_for_pending_msg(struct intel_guc *guc, + atomic_t *wait_var, + bool interruptible, + long timeout) { const int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE; @@ -352,8 +352,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout) if (unlikely(timeout < 0)) timeout = -timeout, interruptible = false;
- return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h, - interruptible, timeout); + return intel_guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h, + interruptible, timeout); }
static int guc_lrc_desc_pin(struct intel_context *ce, bool loop); @@ -625,7 +625,7 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc) for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) { intel_guc_to_host_event_handler(guc); #define wait_for_reset(guc, wait_var) \ - guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20)) + intel_guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20)) do { wait_for_reset(guc, &guc->outstanding_submission_g2h); } while (!list_empty(&guc->ct.requests.incoming)); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h index 95df5ab06031..b9b9f0f60f91 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h @@ -27,6 +27,11 @@ void intel_guc_log_context_info(struct intel_guc *guc, struct drm_printer *p);
bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
+int intel_guc_wait_for_pending_msg(struct intel_guc *guc, + atomic_t *wait_var, + bool interruptible, + long timeout); + static inline bool intel_guc_submission_is_supported(struct intel_guc *guc) { /* XXX: GuC submission is unavailable for now */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c index ab11fe731ee7..b523a8521351 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c @@ -596,14 +596,18 @@ void intel_uc_cancel_requests(struct intel_uc *uc) void intel_uc_runtime_suspend(struct intel_uc *uc) { struct intel_guc *guc = &uc->guc; - int err;
if (!intel_guc_is_ready(guc)) return;
- err = intel_guc_suspend(guc); - if (err) - DRM_DEBUG_DRIVER("Failed to suspend GuC, err=%d", err); + /* + * Wait for any outstanding CTB before tearing down communication /w the + * GuC. + */ +#define OUTSTANDING_CTB_TIMEOUT_PERIOD (HZ / 5) + intel_guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h, + false, OUTSTANDING_CTB_TIMEOUT_PERIOD); + GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
guc_disable_communication(guc); } @@ -612,12 +616,16 @@ void intel_uc_suspend(struct intel_uc *uc) { struct intel_guc *guc = &uc->guc; intel_wakeref_t wakeref; + int err;
if (!intel_guc_is_ready(guc)) return;
- with_intel_runtime_pm(uc_to_gt(uc)->uncore->rpm, wakeref) - intel_uc_runtime_suspend(uc); + with_intel_runtime_pm(&uc_to_gt(uc)->i915->runtime_pm, wakeref) { + err = intel_guc_suspend(guc); + if (err) + DRM_DEBUG_DRIVER("Failed to suspend GuC, err=%d", err); + } }
static int __uc_resume(struct intel_uc *uc, bool enable_communication)
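The reworked suspend flow can be summarized as: bail out if the GuC isn't ready, send the RESET_CLIENT H2G MMIO only when submission is in use (treating a failure as non-fatal), then sanitize unconditionally. A sketch of that flow under stated assumptions — struct fields here are stand-ins for the real intel_guc state and the MMIO send, not the driver's API:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of the reworked intel_guc_suspend() flow. */
struct guc_model {
	bool ready;            /* intel_guc_is_ready() */
	bool submission_used;  /* intel_guc_submission_is_used() */
	int  reset_client_sends;
	bool sanitized;        /* intel_guc_sanitize() reached */
};

static int guc_suspend(struct guc_model *guc)
{
	if (!guc->ready)
		return 0; /* nothing to tear down */

	if (guc->submission_used)
		guc->reset_client_sends++; /* INTEL_GUC_ACTION_RESET_CLIENT;
					    * a failure is logged, not fatal */

	guc->sanitized = true; /* signal that the GuC isn't running */
	return 0;
}
```

Note that sanitize runs even when the MMIO send fails, matching the patch's "keep going and clean up on resume" comment.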
On 6/24/2021 00:05, Matthew Brost wrote:
The new GuC interface introduces an MMIO H2G command, INTEL_GUC_ACTION_RESET_CLIENT, which is used to implement suspend. This MMIO tears down any active contexts generating a context reset G2H CTB for each. Once that step completes the GuC tears down the CTB channels. It is safe to suspend once this MMIO H2G command completes and all G2H CTBs have been processed. In practice the i915 will likely never receive a G2H as suspend should only be called after the GPU is idle.
Resume is implemented in the same manner as before - simply reload the GuC firmware and reinitialize everything (e.g. CTB channels, contexts, etc..).
Cc: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com Signed-off-by: Michal Wajdeczko michal.wajdeczko@intel.com
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
.../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc.c | 64 ++++++++----------- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 14 ++-- .../gpu/drm/i915/gt/uc/intel_guc_submission.h | 5 ++ drivers/gpu/drm/i915/gt/uc/intel_uc.c | 20 ++++-- 5 files changed, 53 insertions(+), 51 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h index 57e18babdf4b..596cf4b818e5 100644 --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h @@ -142,6 +142,7 @@ enum intel_guc_action { INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505, INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506, INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
+ INTEL_GUC_ACTION_RESET_CLIENT = 0x5B01,
INTEL_GUC_ACTION_LIMIT };
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 9b09395b998f..68266cbffd1f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -524,51 +524,34 @@ int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset) */ int intel_guc_suspend(struct intel_guc *guc) {
- struct intel_uncore *uncore = guc_to_gt(guc)->uncore;
int ret;
- u32 status;
u32 action[] = {
- INTEL_GUC_ACTION_ENTER_S_STATE,
- GUC_POWER_D1, /* any value greater than GUC_POWER_D0 */
+ INTEL_GUC_ACTION_RESET_CLIENT,
};
- /*
* If GuC communication is enabled but submission is not supported,
* we do not need to suspend the GuC.
*/
- if (!intel_guc_submission_is_used(guc) || !intel_guc_is_ready(guc))
+ if (!intel_guc_is_ready(guc))
return 0;
- /*
* The ENTER_S_STATE action queues the save/restore operation in GuC FW
* and then returns, so waiting on the H2G is not enough to guarantee
* GuC is done. When all the processing is done, GuC writes
* INTEL_GUC_SLEEP_STATE_SUCCESS to scratch register 14, so we can poll
* on that. Note that GuC does not ensure that the value in the register
* is different from INTEL_GUC_SLEEP_STATE_SUCCESS while the action is
* in progress so we need to take care of that ourselves as well.
*/
- intel_uncore_write(uncore, SOFT_SCRATCH(14),
INTEL_GUC_SLEEP_STATE_INVALID_MASK);
- ret = intel_guc_send(guc, action, ARRAY_SIZE(action));
- if (ret)
return ret;
- ret = __intel_wait_for_register(uncore, SOFT_SCRATCH(14),
INTEL_GUC_SLEEP_STATE_INVALID_MASK,
0, 0, 10, &status);
- if (ret)
return ret;
- if (status != INTEL_GUC_SLEEP_STATE_SUCCESS) {
DRM_ERROR("GuC failed to change sleep state. "
"action=0x%x, err=%u\n",
action[0], status);
return -EIO;
if (intel_guc_submission_is_used(guc)) {
/*
* This H2G MMIO command tears down the GuC in two steps. First it will
* generate a G2H CTB for every active context indicating a reset. In
* practice the i915 shouldn't ever get a G2H as suspend should only be
* called when the GPU is idle. Next, it tears down the CTBs and this
* H2G MMIO command completes.
*
* Don't abort on a failure code from the GuC. Keep going and do the
clean up in sanitize() and re-initialisation on resume and hopefully
* the error here won't be problematic.
*/
ret = intel_guc_send_mmio(guc, action, ARRAY_SIZE(action), NULL, 0);
if (ret)
DRM_ERROR("GuC suspend: RESET_CLIENT action failed with error %d!\n", ret);
}
/* Signal that the GuC isn't running. */
intel_guc_sanitize(guc);
return 0; }
@@ -578,7 +561,12 @@ int intel_guc_suspend(struct intel_guc *guc) */ int intel_guc_resume(struct intel_guc *guc) {
- /* XXX: to be implemented with submission interface rework */
- /*
* NB: This function can still be called even if GuC submission is
* disabled, e.g. if GuC is enabled for HuC authentication only. Thus,
* if any code is later added here, it must support doing nothing
* if submission is disabled (as per intel_guc_suspend).
*/
return 0; }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 59fca9748c15..16b61fe71b07 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -304,10 +304,10 @@ static int guc_submission_busy_loop(struct intel_guc* guc, return err; }
-static int guc_wait_for_pending_msg(struct intel_guc *guc,
atomic_t *wait_var,
bool interruptible,
long timeout)
+int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
atomic_t *wait_var,
bool interruptible,
long timeout)
{
const int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
@@ -352,8 +352,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout) if (unlikely(timeout < 0)) timeout = -timeout, interruptible = false;
- return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
interruptible, timeout);
return intel_guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
interruptible, timeout);
}
static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
@@ -625,7 +625,7 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc) for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) { intel_guc_to_host_event_handler(guc); #define wait_for_reset(guc, wait_var) \
- guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
+ intel_guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
do { wait_for_reset(guc, &guc->outstanding_submission_g2h); } while (!list_empty(&guc->ct.requests.incoming));
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h index 95df5ab06031..b9b9f0f60f91 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h @@ -27,6 +27,11 @@ void intel_guc_log_context_info(struct intel_guc *guc, struct drm_printer *p);
bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
+int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
atomic_t *wait_var,
bool interruptible,
long timeout);
- static inline bool intel_guc_submission_is_supported(struct intel_guc *guc) { /* XXX: GuC submission is unavailable for now */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c index ab11fe731ee7..b523a8521351 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c @@ -596,14 +596,18 @@ void intel_uc_cancel_requests(struct intel_uc *uc) void intel_uc_runtime_suspend(struct intel_uc *uc) { struct intel_guc *guc = &uc->guc;
int err;
if (!intel_guc_is_ready(guc)) return;
err = intel_guc_suspend(guc);
if (err)
DRM_DEBUG_DRIVER("Failed to suspend GuC, err=%d", err);
- /*
* Wait for any outstanding CTB before tearing down communication /w the
* GuC.
*/
+#define OUTSTANDING_CTB_TIMEOUT_PERIOD (HZ / 5)
intel_guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
false, OUTSTANDING_CTB_TIMEOUT_PERIOD);
GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
guc_disable_communication(guc); }
@@ -612,12 +616,16 @@ void intel_uc_suspend(struct intel_uc *uc) { struct intel_guc *guc = &uc->guc; intel_wakeref_t wakeref;
int err;
if (!intel_guc_is_ready(guc)) return;
- with_intel_runtime_pm(uc_to_gt(uc)->uncore->rpm, wakeref)
intel_uc_runtime_suspend(uc);
with_intel_runtime_pm(&uc_to_gt(uc)->i915->runtime_pm, wakeref) {
err = intel_guc_suspend(guc);
if (err)
DRM_DEBUG_DRIVER("Failed to suspend GuC, err=%d", err);
} }
static int __uc_resume(struct intel_uc *uc, bool enable_communication)
GuC will issue a reset on detecting an engine hang and will notify the driver via a G2H message. The driver will service the notification by resetting the guilty context to a simple state or banning it completely.
Cc: Matthew Brost matthew.brost@intel.com Cc: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc.h | 2 ++ drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 3 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++++++++++++++++++ drivers/gpu/drm/i915/i915_trace.h | 10 ++++++ 4 files changed, 50 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 85ef6767f13b..e94b0ef733da 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -262,6 +262,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc, const u32 *msg, u32 len); int intel_guc_sched_done_process_msg(struct intel_guc *guc, const u32 *msg, u32 len); +int intel_guc_context_reset_process_msg(struct intel_guc *guc, + const u32 *msg, u32 len);
void intel_guc_submission_reset_prepare(struct intel_guc *guc); void intel_guc_submission_reset(struct intel_guc *guc, bool stalled); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 4ed074df88e5..a2020373b8e8 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -945,6 +945,9 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE: ret = intel_guc_sched_done_process_msg(guc, payload, len); break; + case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION: + ret = intel_guc_context_reset_process_msg(guc, payload, len); + break; default: ret = -EOPNOTSUPP; break; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 16b61fe71b07..9845c5bd9832 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2192,6 +2192,41 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, return 0; }
+static void guc_context_replay(struct intel_context *ce) +{ + struct i915_sched_engine *sched_engine = ce->engine->sched_engine; + + __guc_reset_context(ce, true); + tasklet_hi_schedule(&sched_engine->tasklet); +} + +static void guc_handle_context_reset(struct intel_guc *guc, + struct intel_context *ce) +{ + trace_intel_context_reset(ce); + guc_context_replay(ce); +} + +int intel_guc_context_reset_process_msg(struct intel_guc *guc, + const u32 *msg, u32 len) +{ + struct intel_context *ce; + int desc_idx = msg[0]; + + if (unlikely(len != 1)) { + drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len); + return -EPROTO; + } + + ce = g2h_context_lookup(guc, desc_idx); + if (unlikely(!ce)) + return -EPROTO; + + guc_handle_context_reset(guc, ce); + + return 0; +} + void intel_guc_log_submission_info(struct intel_guc *guc, struct drm_printer *p) { diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index 97c2e83984ed..c095c4d39456 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -929,6 +929,11 @@ DECLARE_EVENT_CLASS(intel_context, __entry->guc_sched_state_no_lock) );
+DEFINE_EVENT(intel_context, intel_context_reset, + TP_PROTO(struct intel_context *ce), + TP_ARGS(ce) +); + DEFINE_EVENT(intel_context, intel_context_register, TP_PROTO(struct intel_context *ce), TP_ARGS(ce) @@ -1026,6 +1031,11 @@ trace_i915_request_out(struct i915_request *rq) { }
+static inline void +trace_intel_context_reset(struct intel_context *ce) +{ +} + static inline void trace_intel_context_register(struct intel_context *ce) {
On 6/24/2021 00:05, Matthew Brost wrote:
GuC will issue a reset on detecting an engine hang and will notify the driver via a G2H message. The driver will service the notification by resetting the guilty context to a simple state or banning it completely.
Cc: Matthew Brost matthew.brost@intel.com Cc: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc.h | 2 ++ drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 3 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++++++++++++++++++ drivers/gpu/drm/i915/i915_trace.h | 10 ++++++ 4 files changed, 50 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 85ef6767f13b..e94b0ef733da 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -262,6 +262,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc, const u32 *msg, u32 len); int intel_guc_sched_done_process_msg(struct intel_guc *guc, const u32 *msg, u32 len); +int intel_guc_context_reset_process_msg(struct intel_guc *guc,
const u32 *msg, u32 len);
void intel_guc_submission_reset_prepare(struct intel_guc *guc); void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 4ed074df88e5..a2020373b8e8 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -945,6 +945,9 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE: ret = intel_guc_sched_done_process_msg(guc, payload, len); break;
+ case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION:
+ ret = intel_guc_context_reset_process_msg(guc, payload, len);
+ break;
default: ret = -EOPNOTSUPP; break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 16b61fe71b07..9845c5bd9832 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2192,6 +2192,41 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, return 0; }
+static void guc_context_replay(struct intel_context *ce) +{
- struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
- __guc_reset_context(ce, true);
- tasklet_hi_schedule(&sched_engine->tasklet);
+}
+static void guc_handle_context_reset(struct intel_guc *guc,
struct intel_context *ce)
+{
- trace_intel_context_reset(ce);
- guc_context_replay(ce);
+}
+int intel_guc_context_reset_process_msg(struct intel_guc *guc,
const u32 *msg, u32 len)
+{
- struct intel_context *ce;
- int desc_idx = msg[0];
Should this dereference be done after checking the length? Or is it guaranteed that the length cannot be zero?
John.
- if (unlikely(len != 1)) {
drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
return -EPROTO;
- }
- ce = g2h_context_lookup(guc, desc_idx);
- if (unlikely(!ce))
return -EPROTO;
- guc_handle_context_reset(guc, ce);
- return 0;
+}
- void intel_guc_log_submission_info(struct intel_guc *guc, struct drm_printer *p) {
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index 97c2e83984ed..c095c4d39456 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -929,6 +929,11 @@ DECLARE_EVENT_CLASS(intel_context, __entry->guc_sched_state_no_lock) );
+DEFINE_EVENT(intel_context, intel_context_reset,
TP_PROTO(struct intel_context *ce),
TP_ARGS(ce)
+);
- DEFINE_EVENT(intel_context, intel_context_register, TP_PROTO(struct intel_context *ce), TP_ARGS(ce)
@@ -1026,6 +1031,11 @@ trace_i915_request_out(struct i915_request *rq) { }
+static inline void +trace_intel_context_reset(struct intel_context *ce) +{ +}
- static inline void trace_intel_context_register(struct intel_context *ce) {
On Mon, Jul 12, 2021 at 03:58:12PM -0700, John Harrison wrote:
On 6/24/2021 00:05, Matthew Brost wrote:
GuC will issue a reset on detecting an engine hang and will notify the driver via a G2H message. The driver will service the notification by resetting the guilty context to a simple state or banning it completely.
Cc: Matthew Brost matthew.brost@intel.com Cc: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc.h | 2 ++ drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 3 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++++++++++++++++++ drivers/gpu/drm/i915/i915_trace.h | 10 ++++++ 4 files changed, 50 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 85ef6767f13b..e94b0ef733da 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -262,6 +262,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc, const u32 *msg, u32 len); int intel_guc_sched_done_process_msg(struct intel_guc *guc, const u32 *msg, u32 len); +int intel_guc_context_reset_process_msg(struct intel_guc *guc,
const u32 *msg, u32 len);
void intel_guc_submission_reset_prepare(struct intel_guc *guc); void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 4ed074df88e5..a2020373b8e8 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -945,6 +945,9 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE: ret = intel_guc_sched_done_process_msg(guc, payload, len); break;
+ case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION:
+ ret = intel_guc_context_reset_process_msg(guc, payload, len);
+ break;
default: ret = -EOPNOTSUPP; break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 16b61fe71b07..9845c5bd9832 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2192,6 +2192,41 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, return 0; } +static void guc_context_replay(struct intel_context *ce) +{
- struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
- __guc_reset_context(ce, true);
- tasklet_hi_schedule(&sched_engine->tasklet);
+}
+static void guc_handle_context_reset(struct intel_guc *guc,
struct intel_context *ce)
+{
- trace_intel_context_reset(ce);
- guc_context_replay(ce);
+}
+int intel_guc_context_reset_process_msg(struct intel_guc *guc,
const u32 *msg, u32 len)
+{
- struct intel_context *ce;
- int desc_idx = msg[0];
Should this dereference be done after checking the length? Or is it guaranteed that the length cannot be zero?
I think for safety, it should be moved.
Matt
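The agreed-upon reordering might look like the following standalone sketch. This models only the validation logic, not the i915 API: the function name and the desc_idx output parameter are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Sketch of the agreed fix: validate the G2H message length before
 * dereferencing msg[0]. Names are illustrative, not the i915 API. */
static int process_context_reset_msg(const uint32_t *msg, uint32_t len,
				     uint32_t *desc_idx_out)
{
	if (len != 1)			/* check the length first ... */
		return -EPROTO;

	*desc_idx_out = msg[0];		/* ... then reading msg[0] is safe */
	return 0;
}
```

With the check first, a zero-length (or otherwise malformed) message can never cause a read past the payload.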
John.
- if (unlikely(len != 1)) {
drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
return -EPROTO;
- }
- ce = g2h_context_lookup(guc, desc_idx);
- if (unlikely(!ce))
return -EPROTO;
- guc_handle_context_reset(guc, ce);
- return 0;
+}
- void intel_guc_log_submission_info(struct intel_guc *guc, struct drm_printer *p) {
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index 97c2e83984ed..c095c4d39456 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -929,6 +929,11 @@ DECLARE_EVENT_CLASS(intel_context, __entry->guc_sched_state_no_lock) ); +DEFINE_EVENT(intel_context, intel_context_reset,
TP_PROTO(struct intel_context *ce),
TP_ARGS(ce)
+);
- DEFINE_EVENT(intel_context, intel_context_register, TP_PROTO(struct intel_context *ce), TP_ARGS(ce)
@@ -1026,6 +1031,11 @@ trace_i915_request_out(struct i915_request *rq) { } +static inline void +trace_intel_context_reset(struct intel_context *ce) +{ +}
- static inline void trace_intel_context_register(struct intel_context *ce) {
GuC will notify the driver, via G2H, if it fails to reset an engine. We recover by resorting to a full GPU reset.
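The escalation path described above can be sketched as a standalone model: the failure G2H carries (class, instance, reason); an unknown engine is a protocol error, while a known engine triggers a full GPU reset. All names and bounds below are invented for the example, not the i915 API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_CLASSES   4
#define MAX_INSTANCES 8

/* Illustrative model of the engine-reset-failure handler. */
struct fake_gt {
	int engine_present[MAX_CLASSES][MAX_INSTANCES];
	int full_reset_count;
};

static int handle_engine_failure(struct fake_gt *gt,
				 const uint32_t *msg, uint32_t len)
{
	uint32_t klass, instance;

	if (len != 3)			/* (class, instance, reason) */
		return -EPROTO;

	klass = msg[0];
	instance = msg[1];
	/* Bounds-check before indexing the engine table. */
	if (klass >= MAX_CLASSES || instance >= MAX_INSTANCES ||
	    !gt->engine_present[klass][instance])
		return -EPROTO;

	gt->full_reset_count++;		/* escalate to a full GPU reset */
	return 0;
}
```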
Signed-off-by: Matthew Brost matthew.brost@intel.com Signed-off-by: Fernando Pacheco fernando.pacheco@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc.h | 2 + drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 3 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 43 +++++++++++++++++++ 3 files changed, 48 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index e94b0ef733da..99742625e6ff 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -264,6 +264,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, const u32 *msg, u32 len); int intel_guc_context_reset_process_msg(struct intel_guc *guc, const u32 *msg, u32 len); +int intel_guc_engine_failure_process_msg(struct intel_guc *guc, + const u32 *msg, u32 len);
void intel_guc_submission_reset_prepare(struct intel_guc *guc); void intel_guc_submission_reset(struct intel_guc *guc, bool stalled); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index a2020373b8e8..dd6177c8d75c 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -948,6 +948,9 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION: ret = intel_guc_context_reset_process_msg(guc, payload, len); break; + case INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION: + ret = intel_guc_engine_failure_process_msg(guc, payload, len); + break; default: ret = -EOPNOTSUPP; break; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 9845c5bd9832..c3223958dfe0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2227,6 +2227,49 @@ int intel_guc_context_reset_process_msg(struct intel_guc *guc, return 0; }
+static struct intel_engine_cs * +guc_lookup_engine(struct intel_guc *guc, u8 guc_class, u8 instance) +{ + struct intel_gt *gt = guc_to_gt(guc); + u8 engine_class = guc_class_to_engine_class(guc_class); + + /* Class index is checked in class converter */ + GEM_BUG_ON(instance > MAX_ENGINE_INSTANCE); + + return gt->engine_class[engine_class][instance]; +} + +int intel_guc_engine_failure_process_msg(struct intel_guc *guc, + const u32 *msg, u32 len) +{ + struct intel_engine_cs *engine; + u8 guc_class, instance; + u32 reason; + + if (unlikely(len != 3)) { + drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len); + return -EPROTO; + } + + guc_class = msg[0]; + instance = msg[1]; + reason = msg[2]; + + engine = guc_lookup_engine(guc, guc_class, instance); + if (unlikely(!engine)) { + drm_dbg(&guc_to_gt(guc)->i915->drm, + "Invalid engine %d:%d", guc_class, instance); + return -EPROTO; + } + + intel_gt_handle_error(guc_to_gt(guc), engine->mask, + I915_ERROR_CAPTURE, + "GuC failed to reset %s (reason=0x%08x)\n", + engine->name, reason); + + return 0; +} + void intel_guc_log_submission_info(struct intel_guc *guc, struct drm_printer *p) {
On 6/24/2021 00:05, Matthew Brost wrote:
GuC will notify the driver, via G2H, if it fails to reset an engine. We recover by resorting to a full GPU reset.
Signed-off-by: Matthew Brost matthew.brost@intel.com Signed-off-by: Fernando Pacheco fernando.pacheco@intel.com
Reviewed-by: John Harrison John.C.Harrison@Intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc.h | 2 + drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 3 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 43 +++++++++++++++++++ 3 files changed, 48 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index e94b0ef733da..99742625e6ff 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -264,6 +264,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, const u32 *msg, u32 len); int intel_guc_context_reset_process_msg(struct intel_guc *guc, const u32 *msg, u32 len); +int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
const u32 *msg, u32 len);
void intel_guc_submission_reset_prepare(struct intel_guc *guc); void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index a2020373b8e8..dd6177c8d75c 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -948,6 +948,9 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION: ret = intel_guc_context_reset_process_msg(guc, payload, len); break;
+ case INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION:
+ ret = intel_guc_engine_failure_process_msg(guc, payload, len);
+ break;
default: ret = -EOPNOTSUPP; break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 9845c5bd9832..c3223958dfe0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2227,6 +2227,49 @@ int intel_guc_context_reset_process_msg(struct intel_guc *guc, return 0; }
+static struct intel_engine_cs * +guc_lookup_engine(struct intel_guc *guc, u8 guc_class, u8 instance) +{
- struct intel_gt *gt = guc_to_gt(guc);
- u8 engine_class = guc_class_to_engine_class(guc_class);
- /* Class index is checked in class converter */
- GEM_BUG_ON(instance > MAX_ENGINE_INSTANCE);
- return gt->engine_class[engine_class][instance];
+}
+int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
const u32 *msg, u32 len)
+{
- struct intel_engine_cs *engine;
- u8 guc_class, instance;
- u32 reason;
- if (unlikely(len != 3)) {
drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
return -EPROTO;
- }
- guc_class = msg[0];
- instance = msg[1];
- reason = msg[2];
- engine = guc_lookup_engine(guc, guc_class, instance);
- if (unlikely(!engine)) {
drm_dbg(&guc_to_gt(guc)->i915->drm,
"Invalid engine %d:%d", guc_class, instance);
return -EPROTO;
- }
- intel_gt_handle_error(guc_to_gt(guc), engine->mask,
I915_ERROR_CAPTURE,
"GuC failed to reset %s (reason=0x%08x)\n",
engine->name, reason);
- return 0;
+}
- void intel_guc_log_submission_info(struct intel_guc *guc, struct drm_printer *p) {
The GuC can implement execution quantums, detect hung contexts and other such things, but it requires the timer expired interrupt to do so.
Signed-off-by: Matthew Brost matthew.brost@intel.com CC: John Harrison John.C.Harrison@Intel.com --- drivers/gpu/drm/i915/gt/intel_rps.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c index 06e9a8ed4e03..0c8e7f2b06f0 100644 --- a/drivers/gpu/drm/i915/gt/intel_rps.c +++ b/drivers/gpu/drm/i915/gt/intel_rps.c @@ -1877,6 +1877,10 @@ void intel_rps_init(struct intel_rps *rps)
if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) < 11) rps->pm_intrmsk_mbz |= GEN8_PMINTR_DISABLE_REDIRECT_TO_GUC; + + /* GuC needs ARAT expired interrupt unmasked */ + if (intel_uc_uses_guc_submission(&rps_to_gt(rps)->uc)) + rps->pm_intrmsk_mbz |= ARAT_EXPIRED_INTRMSK; }
void intel_rps_sanitize(struct intel_rps *rps)
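The mechanism the patch relies on is that pm_intrmsk_mbz holds interrupt-mask bits that "must be zero" in whatever mask the driver writes, i.e. interrupts that must always stay unmasked. A minimal standalone sketch of that semantic (the bit position here is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative: bit position chosen for the sketch, not i915_reg.h. */
#define FAKE_ARAT_EXPIRED_INTRMSK	(1u << 9)

/* Model of the mbz ("must be zero") semantic: bits set in 'mbz' are
 * always cleared from the mask actually written, so those interrupts
 * can never be masked off. */
static uint32_t apply_pm_intrmsk(uint32_t requested_mask, uint32_t mbz)
{
	return requested_mask & ~mbz;
}
```

Adding ARAT_EXPIRED_INTRMSK to pm_intrmsk_mbz when GuC submission is used therefore guarantees the timer expired interrupt reaches the GuC regardless of what mask the rest of the driver requests.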
On 6/24/2021 00:05, Matthew Brost wrote:
The GuC can implement execution quantums, detect hung contexts and other such things, but it requires the timer expired interrupt to do so.
Signed-off-by: Matthew Brost matthew.brost@intel.com CC: John Harrison John.C.Harrison@Intel.com
Reviewed-by: John Harrison John.C.Harrison@Intel.com
drivers/gpu/drm/i915/gt/intel_rps.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c index 06e9a8ed4e03..0c8e7f2b06f0 100644 --- a/drivers/gpu/drm/i915/gt/intel_rps.c +++ b/drivers/gpu/drm/i915/gt/intel_rps.c @@ -1877,6 +1877,10 @@ void intel_rps_init(struct intel_rps *rps)
if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) < 11) rps->pm_intrmsk_mbz |= GEN8_PMINTR_DISABLE_REDIRECT_TO_GUC;
/* GuC needs ARAT expired interrupt unmasked */
if (intel_uc_uses_guc_submission(&rps_to_gt(rps)->uc))
rps->pm_intrmsk_mbz |= ARAT_EXPIRED_INTRMSK;
}
void intel_rps_sanitize(struct intel_rps *rps)
From: John Harrison John.C.Harrison@Intel.com
The driver must provide GuC with a list of mmio registers that should be saved/restored during a GuC-based engine reset. Unfortunately, the list must be dynamically allocated as its size is variable. That means the driver must generate the list twice - once to work out the size and a second time to actually save it.
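The count-then-fill pattern described above can be sketched in isolation: run the same generator twice, first with a NULL output buffer to size the allocation, then again to populate it. This is a generic model of the technique, not the i915 ADS code; the names are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative two-pass list builder. */
struct reg_entry { uint32_t offset; uint32_t value; };

/* When 'out' is NULL, only counts the entries that would be emitted;
 * otherwise writes them. Both passes must walk the same inputs so the
 * counts agree. */
static size_t build_regset(struct reg_entry *out,
			   const uint32_t *offsets, size_t n)
{
	size_t count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (out) {
			out[count].offset = offsets[i];
			out[count].value = 0;	/* filled in at save time */
		}
		count++;
	}
	return count;
}
```

The correctness requirement is that the sizing pass and the fill pass are driven by identical logic; any divergence means the fill pass can overrun the allocation.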
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Fernando Pacheco fernando.pacheco@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com Cc: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Cc: Tvrtko Ursulin tvrtko.ursulin@intel.com --- drivers/gpu/drm/i915/gt/intel_workarounds.c | 46 ++-- .../gpu/drm/i915/gt/intel_workarounds_types.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 199 +++++++++++++++++- drivers/gpu/drm/i915/i915_reg.h | 1 + 5 files changed, 222 insertions(+), 26 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index d9a5a445ceec..9bb85187f071 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -150,13 +150,14 @@ static void _wa_add(struct i915_wa_list *wal, const struct i915_wa *wa) }
static void wa_add(struct i915_wa_list *wal, i915_reg_t reg, - u32 clear, u32 set, u32 read_mask) + u32 clear, u32 set, u32 read_mask, bool masked_reg) { struct i915_wa wa = { .reg = reg, .clr = clear, .set = set, .read = read_mask, + .masked_reg = masked_reg, };
_wa_add(wal, &wa); @@ -165,7 +166,7 @@ static void wa_add(struct i915_wa_list *wal, i915_reg_t reg, static void wa_write_clr_set(struct i915_wa_list *wal, i915_reg_t reg, u32 clear, u32 set) { - wa_add(wal, reg, clear, set, clear); + wa_add(wal, reg, clear, set, clear, false); }
static void @@ -200,20 +201,20 @@ wa_write_clr(struct i915_wa_list *wal, i915_reg_t reg, u32 clr) static void wa_masked_en(struct i915_wa_list *wal, i915_reg_t reg, u32 val) { - wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val); + wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val, true); }
static void wa_masked_dis(struct i915_wa_list *wal, i915_reg_t reg, u32 val) { - wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val); + wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val, true); }
static void wa_masked_field_set(struct i915_wa_list *wal, i915_reg_t reg, u32 mask, u32 val) { - wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask); + wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask, true); }
static void gen6_ctx_workarounds_init(struct intel_engine_cs *engine, @@ -583,10 +584,10 @@ static void icl_ctx_workarounds_init(struct intel_engine_cs *engine, GEN11_BLEND_EMB_FIX_DISABLE_IN_RCC);
/* WaEnableFloatBlendOptimization:icl */ - wa_write_clr_set(wal, - GEN10_CACHE_MODE_SS, - 0, /* write-only, so skip validation */ - _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE)); + wa_add(wal, GEN10_CACHE_MODE_SS, 0, + _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE), + 0 /* write-only, so skip validation */, + true);
/* WaDisableGPGPUMidThreadPreemption:icl */ wa_masked_field_set(wal, GEN8_CS_CHICKEN1, @@ -631,7 +632,7 @@ static void gen12_ctx_gt_tuning_init(struct intel_engine_cs *engine, FF_MODE2, FF_MODE2_TDS_TIMER_MASK, FF_MODE2_TDS_TIMER_128, - 0); + 0, false); }
static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine, @@ -669,7 +670,7 @@ static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine, FF_MODE2, FF_MODE2_GS_TIMER_MASK, FF_MODE2_GS_TIMER_224, - 0); + 0, false); }
static void dg1_ctx_workarounds_init(struct intel_engine_cs *engine, @@ -840,7 +841,7 @@ hsw_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal) wa_add(wal, HSW_ROW_CHICKEN3, 0, _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE), - 0 /* XXX does this reg exist? */); + 0 /* XXX does this reg exist? */, true);
/* WaVSRefCountFullforceMissDisable:hsw */ wa_write_clr(wal, GEN7_FF_THREAD_MODE, GEN7_FF_VS_REF_CNT_FFME); @@ -1929,10 +1930,10 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) * disable bit, which we don't touch here, but it's good * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). */ - wa_add(wal, GEN7_GT_MODE, 0, - _MASKED_FIELD(GEN6_WIZ_HASHING_MASK, - GEN6_WIZ_HASHING_16x4), - GEN6_WIZ_HASHING_16x4); + wa_masked_field_set(wal, + GEN7_GT_MODE, + GEN6_WIZ_HASHING_MASK, + GEN6_WIZ_HASHING_16x4); }
if (IS_GRAPHICS_VER(i915, 6, 7)) @@ -1982,10 +1983,10 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) * disable bit, which we don't touch here, but it's good * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM). */ - wa_add(wal, - GEN6_GT_MODE, 0, - _MASKED_FIELD(GEN6_WIZ_HASHING_MASK, GEN6_WIZ_HASHING_16x4), - GEN6_WIZ_HASHING_16x4); + wa_masked_field_set(wal, + GEN7_GT_MODE, + GEN6_WIZ_HASHING_MASK, + GEN6_WIZ_HASHING_16x4);
/* WaDisable_RenderCache_OperationalFlush:snb */ wa_masked_dis(wal, CACHE_MODE_0, RC_OP_FLUSH_ENABLE); @@ -2006,7 +2007,7 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) wa_add(wal, MI_MODE, 0, _MASKED_BIT_ENABLE(VS_TIMER_DISPATCH), /* XXX bit doesn't stick on Broadwater */ - IS_I965G(i915) ? 0 : VS_TIMER_DISPATCH); + IS_I965G(i915) ? 0 : VS_TIMER_DISPATCH, true);
if (GRAPHICS_VER(i915) == 4) /* @@ -2021,7 +2022,8 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal) */ wa_add(wal, ECOSKPD, 0, _MASKED_BIT_ENABLE(ECO_CONSTANT_BUFFER_SR_DISABLE), - 0 /* XXX bit doesn't stick on Broadwater */); + 0 /* XXX bit doesn't stick on Broadwater */, + true); }
static void diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds_types.h b/drivers/gpu/drm/i915/gt/intel_workarounds_types.h index c214111ea367..1e873681795d 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds_types.h +++ b/drivers/gpu/drm/i915/gt/intel_workarounds_types.h @@ -15,6 +15,7 @@ struct i915_wa { u32 clr; u32 set; u32 read; + bool masked_reg; };
struct i915_wa_list { diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 99742625e6ff..ab1a85b508db 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -58,6 +58,7 @@ struct intel_guc {
struct i915_vma *ads_vma; struct __guc_ads_blob *ads_blob; + u32 ads_regset_size;
struct i915_vma *lrc_desc_pool; void *lrc_desc_pool_vaddr; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c index b82145652d57..9fd3c911f5fb 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c @@ -3,6 +3,8 @@ * Copyright © 2014-2019 Intel Corporation */
+#include <linux/bsearch.h> + #include "gt/intel_gt.h" #include "gt/intel_lrc.h" #include "intel_guc_ads.h" @@ -23,7 +25,12 @@ * | guc_policies | * +---------------------------------------+ * | guc_gt_system_info | - * +---------------------------------------+ + * +---------------------------------------+ <== static + * | guc_mmio_reg[countA] (engine 0.0) | + * | guc_mmio_reg[countB] (engine 0.1) | + * | guc_mmio_reg[countC] (engine 1.0) | + * | ... | + * +---------------------------------------+ <== dynamic * | padding | * +---------------------------------------+ <== 4K aligned * | private data | @@ -35,16 +42,33 @@ struct __guc_ads_blob { struct guc_ads ads; struct guc_policies policies; struct guc_gt_system_info system_info; + /* From here on, location is dynamic! Refer to above diagram. */ + struct guc_mmio_reg regset[0]; } __packed;
+static u32 guc_ads_regset_size(struct intel_guc *guc) +{ + GEM_BUG_ON(!guc->ads_regset_size); + return guc->ads_regset_size; +} + static u32 guc_ads_private_data_size(struct intel_guc *guc) { return PAGE_ALIGN(guc->fw.private_data_size); }
+static u32 guc_ads_regset_offset(struct intel_guc *guc) +{ + return offsetof(struct __guc_ads_blob, regset); +} + static u32 guc_ads_private_data_offset(struct intel_guc *guc) { - return PAGE_ALIGN(sizeof(struct __guc_ads_blob)); + u32 offset; + + offset = guc_ads_regset_offset(guc) + + guc_ads_regset_size(guc); + return PAGE_ALIGN(offset); }
static u32 guc_ads_blob_size(struct intel_guc *guc) @@ -83,6 +107,165 @@ static void guc_mapping_table_init(struct intel_gt *gt, } }
+/* + * The save/restore register list must be pre-calculated to a temporary + * buffer of driver defined size before it can be generated in place + * inside the ADS. + */ +#define MAX_MMIO_REGS 128 /* Arbitrary size, increase as needed */ +struct temp_regset { + struct guc_mmio_reg *registers; + u32 used; + u32 size; +}; + +static int guc_mmio_reg_cmp(const void *a, const void *b) +{ + const struct guc_mmio_reg *ra = a; + const struct guc_mmio_reg *rb = b; + + return (int)ra->offset - (int)rb->offset; +} + +static void guc_mmio_reg_add(struct temp_regset *regset, + u32 offset, u32 flags) +{ + u32 count = regset->used; + struct guc_mmio_reg reg = { + .offset = offset, + .flags = flags, + }; + struct guc_mmio_reg *slot; + + GEM_BUG_ON(count >= regset->size); + + /* + * The mmio list is built using separate lists within the driver. + * It's possible that at some point we may attempt to add the same + * register more than once. Do not consider this an error; silently + * move on if the register is already in the list. + */ + if (bsearch(®, regset->registers, count, + sizeof(reg), guc_mmio_reg_cmp)) + return; + + slot = ®set->registers[count]; + regset->used++; + *slot = reg; + + while (slot-- > regset->registers) { + GEM_BUG_ON(slot[0].offset == slot[1].offset); + if (slot[1].offset > slot[0].offset) + break; + + swap(slot[1], slot[0]); + } +} + +#define GUC_MMIO_REG_ADD(regset, reg, masked) \ + guc_mmio_reg_add(regset, \ + i915_mmio_reg_offset((reg)), \ + (masked) ? 
GUC_REGSET_MASKED : 0) + +static void guc_mmio_regset_init(struct temp_regset *regset, + struct intel_engine_cs *engine) +{ + const u32 base = engine->mmio_base; + struct i915_wa_list *wal = &engine->wa_list; + struct i915_wa *wa; + unsigned int i; + + regset->used = 0; + + GUC_MMIO_REG_ADD(regset, RING_MODE_GEN7(base), true); + GUC_MMIO_REG_ADD(regset, RING_HWS_PGA(base), false); + GUC_MMIO_REG_ADD(regset, RING_IMR(base), false); + + for (i = 0, wa = wal->list; i < wal->count; i++, wa++) + GUC_MMIO_REG_ADD(regset, wa->reg, wa->masked_reg); + + /* Be extra paranoid and include all whitelist registers. */ + for (i = 0; i < RING_MAX_NONPRIV_SLOTS; i++) + GUC_MMIO_REG_ADD(regset, + RING_FORCE_TO_NONPRIV(base, i), + false); + + /* add in local MOCS registers */ + for (i = 0; i < GEN9_LNCFCMOCS_REG_COUNT; i++) + GUC_MMIO_REG_ADD(regset, GEN9_LNCFCMOCS(i), false); +} + +static int guc_mmio_reg_state_query(struct intel_guc *guc) +{ + struct intel_gt *gt = guc_to_gt(guc); + struct intel_engine_cs *engine; + enum intel_engine_id id; + struct temp_regset temp_set; + u32 total; + + /* + * Need to actually build the list in order to filter out + * duplicates and other such data dependent constructions. 
+ */ + temp_set.size = MAX_MMIO_REGS; + temp_set.registers = kmalloc_array(temp_set.size, + sizeof(*temp_set.registers), + GFP_KERNEL); + if (!temp_set.registers) + return -ENOMEM; + + total = 0; + for_each_engine(engine, gt, id) { + guc_mmio_regset_init(&temp_set, engine); + total += temp_set.used; + } + + kfree(temp_set.registers); + + return total * sizeof(struct guc_mmio_reg); +} + +static void guc_mmio_reg_state_init(struct intel_guc *guc, + struct __guc_ads_blob *blob) +{ + struct intel_gt *gt = guc_to_gt(guc); + struct intel_engine_cs *engine; + enum intel_engine_id id; + struct temp_regset temp_set; + struct guc_mmio_reg_set *ads_reg_set; + u32 addr_ggtt, offset; + u8 guc_class; + + offset = guc_ads_regset_offset(guc); + addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset; + temp_set.registers = (struct guc_mmio_reg *) (((u8 *) blob) + offset); + temp_set.size = guc->ads_regset_size / sizeof(temp_set.registers[0]); + + for_each_engine(engine, gt, id) { + /* Class index is checked in class converter */ + GEM_BUG_ON(engine->instance >= GUC_MAX_INSTANCES_PER_CLASS); + + guc_class = engine_class_to_guc_class(engine->class); + ads_reg_set = &blob->ads.reg_state_list[guc_class][engine->instance]; + + guc_mmio_regset_init(&temp_set, engine); + if (!temp_set.used) { + ads_reg_set->address = 0; + ads_reg_set->count = 0; + continue; + } + + ads_reg_set->address = addr_ggtt; + ads_reg_set->count = temp_set.used; + + temp_set.size -= temp_set.used; + temp_set.registers += temp_set.used; + addr_ggtt += temp_set.used * sizeof(struct guc_mmio_reg); + } + + GEM_BUG_ON(temp_set.size); +} + /* * The first 80 dwords of the register state context, containing the * execlists and ppgtt registers. 
@@ -121,8 +304,7 @@ static void __guc_ads_init(struct intel_guc *guc) */ blob->ads.golden_context_lrca[guc_class] = 0; blob->ads.eng_state_size[guc_class] = - intel_engine_context_size(guc_to_gt(guc), - engine_class) - + intel_engine_context_size(gt, engine_class) - skipped_size; }
@@ -153,6 +335,9 @@ static void __guc_ads_init(struct intel_guc *guc) blob->ads.scheduler_policies = base + ptr_offset(blob, policies); blob->ads.gt_system_info = base + ptr_offset(blob, system_info);
+ /* MMIO save/restore list */ + guc_mmio_reg_state_init(guc, blob); + /* Private Data */ blob->ads.private_data = base + guc_ads_private_data_offset(guc);
@@ -173,6 +358,12 @@ int intel_guc_ads_create(struct intel_guc *guc)
GEM_BUG_ON(guc->ads_vma);
+ /* Need to calculate the reg state size dynamically: */ + ret = guc_mmio_reg_state_query(guc); + if (ret < 0) + return ret; + guc->ads_regset_size = ret; + size = guc_ads_blob_size(guc);
ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma, diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index a9c2242d61a2..f1217b5c2ff3 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -12289,6 +12289,7 @@ enum skl_power_gate {
/* MOCS (Memory Object Control State) registers */ #define GEN9_LNCFCMOCS(i) _MMIO(0xb020 + (i) * 4) /* L3 Cache Control */ +#define GEN9_LNCFCMOCS_REG_COUNT 32
#define __GEN9_RCS0_MOCS0 0xc800 #define GEN9_GFX_MOCS(i) _MMIO(__GEN9_RCS0_MOCS0 + (i) * 4)
From: John Harrison John.C.Harrison@Intel.com
It is impossible to seal all race conditions of resets occurring concurrently with other operations, at least not without introducing excessive mutex locking. Instead, don't complain if one occurs. In particular, don't complain about attempting to send an H2G during a reset; whatever the H2G was about should get redone once the reset is over.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 5 ++++- drivers/gpu/drm/i915/gt/uc/intel_uc.c | 3 +++ drivers/gpu/drm/i915/gt/uc/intel_uc.h | 2 ++ 3 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index dd6177c8d75c..3b32755f892e 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -727,7 +727,10 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len, int ret;
if (unlikely(!ct->enabled)) { - WARN(1, "Unexpected send: action=%#x\n", *action); + struct intel_guc *guc = ct_to_guc(ct); + struct intel_uc *uc = container_of(guc, struct intel_uc, guc); + + WARN(!uc->reset_in_progress, "Unexpected send: action=%#x\n", *action); return -ENODEV; }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c index b523a8521351..77c1fe2ed883 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c @@ -550,6 +550,7 @@ void intel_uc_reset_prepare(struct intel_uc *uc) { struct intel_guc *guc = &uc->guc;
+ uc->reset_in_progress = true;
/* Nothing to do if GuC isn't supported */ if (!intel_uc_supports_guc(uc)) @@ -579,6 +580,8 @@ void intel_uc_reset_finish(struct intel_uc *uc) { struct intel_guc *guc = &uc->guc;
+ uc->reset_in_progress = false; + /* Firmware expected to be running when this function is called */ if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc)) intel_guc_submission_reset_finish(guc); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h index eaa3202192ac..91315e3f1c58 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h @@ -30,6 +30,8 @@ struct intel_uc {
/* Snapshot of GuC log from last failed load */ struct drm_i915_gem_object *load_err_log; + + bool reset_in_progress; };
void intel_uc_init_early(struct intel_uc *uc);
On Thu, Jun 24, 2021 at 12:05:08AM -0700, Matthew Brost wrote:
From: John Harrison John.C.Harrison@Intel.com
It is impossible to seal all race conditions of resets occurring concurrently with other operations, at least not without introducing excessive mutex locking. Instead, don't complain if one occurs. In particular, don't complain about attempting to send an H2G during a reset; whatever the H2G was about should get redone once the reset is over.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
Reviewed-by: Matthew Brost matthew.brost@intel.com
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
From: John Harrison John.C.Harrison@Intel.com
Clear the 'disable resets' flag to allow GuC to reset hung contexts (detected via pre-emption timeout).
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c index 9fd3c911f5fb..d3e86ab7508f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c @@ -81,8 +81,7 @@ static void guc_policies_init(struct guc_policies *policies) { policies->dpc_promote_time = GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US; policies->max_num_work_items = GLOBAL_POLICY_MAX_NUM_WI; - /* Disable automatic resets as not yet supported. */ - policies->global_flags = GLOBAL_POLICY_DISABLE_ENGINE_RESET; + policies->global_flags = 0; policies->is_valid = 1; }
On Thu, Jun 24, 2021 at 12:05:09AM -0700, Matthew Brost wrote:
From: John Harrison John.C.Harrison@Intel.com
Clear the 'disable resets' flag to allow GuC to reset hung contexts (detected via pre-emption timeout).
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
Reviewed-by: Matthew Brost matthew.brost@intel.com
We receive notification of an engine reset from GuC at its completion, meaning GuC has potentially already cleared any HW state we may have been interested in capturing. GuC resumes scheduling on the engine post-reset, as the resets are meant to be transparent, further muddling our error state.
There is ongoing work to define an API for a GuC debug state dump. The suggestion for now is to manually disable FW initiated resets in cases where debug state is needed.
Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/intel_context.c | 20 +++++++++++ drivers/gpu/drm/i915/gt/intel_context.h | 3 ++ drivers/gpu/drm/i915/gt/intel_engine.h | 21 ++++++++++- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 11 ++++-- drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++++++++---------- drivers/gpu/drm/i915/i915_gpu_error.c | 25 ++++++++++--- 7 files changed, 91 insertions(+), 26 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 2f01437056a8..3fe7794b2bfd 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -514,6 +514,26 @@ struct i915_request *intel_context_create_request(struct intel_context *ce) return rq; }
+struct i915_request *intel_context_find_active_request(struct intel_context *ce) +{ + struct i915_request *rq, *active = NULL; + unsigned long flags; + + GEM_BUG_ON(!intel_engine_uses_guc(ce->engine)); + + spin_lock_irqsave(&ce->guc_active.lock, flags); + list_for_each_entry_reverse(rq, &ce->guc_active.requests, + sched.link) { + if (i915_request_completed(rq)) + break; + + active = rq; + } + spin_unlock_irqrestore(&ce->guc_active.lock, flags); + + return active; +} + #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "selftest_context.c" #endif diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index a592a9605dc8..3363b59c0c40 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -201,6 +201,9 @@ int intel_context_prepare_remote_request(struct intel_context *ce,
struct i915_request *intel_context_create_request(struct intel_context *ce);
+struct i915_request * +intel_context_find_active_request(struct intel_context *ce); + static inline struct intel_ring *__intel_context_ring_size(u64 sz) { return u64_to_ptr(struct intel_ring, sz); diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index e9e0657f847a..6ea5643a3aaa 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -245,7 +245,7 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine, ktime_t *now);
struct i915_request * -intel_engine_find_active_request(struct intel_engine_cs *engine); +intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine);
u32 intel_engine_context_size(struct intel_gt *gt, u8 class); struct intel_context * @@ -328,4 +328,23 @@ intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling) return engine->cops->get_sibling(engine, sibling); }
+static inline void +intel_engine_set_hung_context(struct intel_engine_cs *engine, + struct intel_context *ce) +{ + engine->hung_ce = ce; +} + +static inline void +intel_engine_clear_hung_context(struct intel_engine_cs *engine) +{ + intel_engine_set_hung_context(engine, NULL); +} + +static inline struct intel_context * +intel_engine_get_hung_context(struct intel_engine_cs *engine) +{ + return engine->hung_ce; +} + #endif /* _INTEL_RINGBUFFER_H_ */ diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 69245670b8b0..1d243b83b023 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1671,7 +1671,7 @@ void intel_engine_dump(struct intel_engine_cs *engine, drm_printf(m, "\tRequests:\n");
spin_lock_irqsave(&engine->sched_engine->lock, flags); - rq = intel_engine_find_active_request(engine); + rq = intel_engine_execlist_find_hung_request(engine); if (rq) { struct intel_timeline *tl = get_timeline(rq);
@@ -1782,10 +1782,17 @@ static bool match_ring(struct i915_request *rq) }
struct i915_request * -intel_engine_find_active_request(struct intel_engine_cs *engine) +intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine) { struct i915_request *request, *active = NULL;
+ /* + * This search does not work in GuC submission mode. However, the GuC + * will report the hanging context directly to the driver itself. So + * the driver should never get here when in GuC mode. + */ + GEM_BUG_ON(intel_uc_uses_guc_submission(&engine->gt->uc)); + /* * We are called by the error capture, reset and to dump engine * state at random points in time. In particular, note that neither is diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index f9d264c008e8..0ceffa2be7a7 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -303,6 +303,8 @@ struct intel_engine_cs { /* keep a request in reserve for a [pm] barrier under oom */ struct i915_request *request_pool;
+ struct intel_context *hung_ce; + struct llist_head barrier_tasks;
struct intel_context *kernel_context; /* pinned */ diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index c3223958dfe0..315edeaa186a 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -727,24 +727,6 @@ __unwind_incomplete_requests(struct intel_context *ce) spin_unlock_irqrestore(&sched_engine->lock, flags); }
-static struct i915_request *context_find_active_request(struct intel_context *ce) -{ - struct i915_request *rq, *active = NULL; - unsigned long flags; - - spin_lock_irqsave(&ce->guc_active.lock, flags); - list_for_each_entry_reverse(rq, &ce->guc_active.requests, - sched.link) { - if (i915_request_completed(rq)) - break; - - active = rq; - } - spin_unlock_irqrestore(&ce->guc_active.lock, flags); - - return active; -} - static void __guc_reset_context(struct intel_context *ce, bool stalled) { struct i915_request *rq; @@ -758,7 +740,7 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled) */ clr_context_enabled(ce);
- rq = context_find_active_request(ce); + rq = intel_context_find_active_request(ce); if (!rq) { head = ce->ring->tail; stalled = false; @@ -2192,6 +2174,20 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, return 0; }
+static void capture_error_state(struct intel_guc *guc, + struct intel_context *ce) +{ + struct intel_gt *gt = guc_to_gt(guc); + struct drm_i915_private *i915 = gt->i915; + struct intel_engine_cs *engine = __context_to_physical_engine(ce); + intel_wakeref_t wakeref; + + intel_engine_set_hung_context(engine, ce); + with_intel_runtime_pm(&i915->runtime_pm, wakeref) + i915_capture_error_state(gt, engine->mask); + atomic_inc(&i915->gpu_error.reset_engine_count[engine->uabi_class]); +} + static void guc_context_replay(struct intel_context *ce) { struct i915_sched_engine *sched_engine = ce->engine->sched_engine; @@ -2204,6 +2200,7 @@ static void guc_handle_context_reset(struct intel_guc *guc, struct intel_context *ce) { trace_intel_context_reset(ce); + capture_error_state(guc, ce); guc_context_replay(ce); }
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index cb182c6d265a..20e0a1bfadc1 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1429,20 +1429,37 @@ capture_engine(struct intel_engine_cs *engine, { struct intel_engine_capture_vma *capture = NULL; struct intel_engine_coredump *ee; - struct i915_request *rq; + struct intel_context *ce; + struct i915_request *rq = NULL; unsigned long flags;
ee = intel_engine_coredump_alloc(engine, GFP_KERNEL); if (!ee) return NULL;
- spin_lock_irqsave(&engine->sched_engine->lock, flags); - rq = intel_engine_find_active_request(engine); + ce = intel_engine_get_hung_context(engine); + if (ce) { + intel_engine_clear_hung_context(engine); + rq = intel_context_find_active_request(ce); + if (!rq || !i915_request_started(rq)) + goto no_request_capture; + } else { + /* + * Getting here with GuC enabled means it is a forced error capture + * with no actual hang. So, no need to attempt the execlist search. + */ + if (!intel_uc_uses_guc_submission(&engine->gt->uc)) { + spin_lock_irqsave(&engine->sched_engine->lock, flags); + rq = intel_engine_execlist_find_hung_request(engine); + spin_unlock_irqrestore(&engine->sched_engine->lock, + flags); + } + } if (rq) capture = intel_engine_coredump_add_request(ee, rq, ATOMIC_MAYFAIL); - spin_unlock_irqrestore(&engine->sched_engine->lock, flags); if (!capture) { +no_request_capture: kfree(ee); return NULL; }
On 6/24/2021 00:05, Matthew Brost wrote:
We receive notification of an engine reset from GuC at its completion, meaning GuC has potentially already cleared any HW state we may have been interested in capturing. GuC resumes scheduling on the engine post-reset, as the resets are meant to be transparent, further muddling our error state.
There is ongoing work to define an API for a GuC debug state dump. The suggestion for now is to manually disable FW initiated resets in cases where debug state is needed.
Signed-off-by: Matthew Brost matthew.brost@intel.com
Reviewed-by: John Harrison John.C.Harrison@Intel.com
drivers/gpu/drm/i915/gt/intel_context.c | 20 +++++++++++ drivers/gpu/drm/i915/gt/intel_context.h | 3 ++ drivers/gpu/drm/i915/gt/intel_engine.h | 21 ++++++++++- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 11 ++++-- drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++++++++---------- drivers/gpu/drm/i915/i915_gpu_error.c | 25 ++++++++++--- 7 files changed, 91 insertions(+), 26 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 2f01437056a8..3fe7794b2bfd 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -514,6 +514,26 @@ struct i915_request *intel_context_create_request(struct intel_context *ce) return rq; }
+struct i915_request *intel_context_find_active_request(struct intel_context *ce) +{
- struct i915_request *rq, *active = NULL;
- unsigned long flags;
- GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));
- spin_lock_irqsave(&ce->guc_active.lock, flags);
- list_for_each_entry_reverse(rq, &ce->guc_active.requests,
sched.link) {
if (i915_request_completed(rq))
break;
active = rq;
- }
- spin_unlock_irqrestore(&ce->guc_active.lock, flags);
- return active;
+}
- #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "selftest_context.c" #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index a592a9605dc8..3363b59c0c40 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -201,6 +201,9 @@ int intel_context_prepare_remote_request(struct intel_context *ce,
struct i915_request *intel_context_create_request(struct intel_context *ce);
+struct i915_request * +intel_context_find_active_request(struct intel_context *ce);
- static inline struct intel_ring *__intel_context_ring_size(u64 sz) { return u64_to_ptr(struct intel_ring, sz);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index e9e0657f847a..6ea5643a3aaa 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -245,7 +245,7 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine, ktime_t *now);
struct i915_request * -intel_engine_find_active_request(struct intel_engine_cs *engine); +intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine);
u32 intel_engine_context_size(struct intel_gt *gt, u8 class); struct intel_context * @@ -328,4 +328,23 @@ intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling) return engine->cops->get_sibling(engine, sibling); }
+static inline void +intel_engine_set_hung_context(struct intel_engine_cs *engine,
struct intel_context *ce)
+{
- engine->hung_ce = ce;
+}
+static inline void +intel_engine_clear_hung_context(struct intel_engine_cs *engine) +{
- intel_engine_set_hung_context(engine, NULL);
+}
+static inline struct intel_context * +intel_engine_get_hung_context(struct intel_engine_cs *engine) +{
- return engine->hung_ce;
+}
- #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 69245670b8b0..1d243b83b023 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1671,7 +1671,7 @@ void intel_engine_dump(struct intel_engine_cs *engine, drm_printf(m, "\tRequests:\n");
spin_lock_irqsave(&engine->sched_engine->lock, flags);
- rq = intel_engine_find_active_request(engine);
- rq = intel_engine_execlist_find_hung_request(engine); if (rq) { struct intel_timeline *tl = get_timeline(rq);
@@ -1782,10 +1782,17 @@ static bool match_ring(struct i915_request *rq) }
struct i915_request * -intel_engine_find_active_request(struct intel_engine_cs *engine) +intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine) { struct i915_request *request, *active = NULL;
- /*
* This search does not work in GuC submission mode. However, the GuC
* will report the hanging context directly to the driver itself. So
* the driver should never get here when in GuC mode.
*/
- GEM_BUG_ON(intel_uc_uses_guc_submission(&engine->gt->uc));
- /*
- We are called by the error capture, reset and to dump engine
- state at random points in time. In particular, note that neither is
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index f9d264c008e8..0ceffa2be7a7 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -303,6 +303,8 @@ struct intel_engine_cs { /* keep a request in reserve for a [pm] barrier under oom */ struct i915_request *request_pool;
struct intel_context *hung_ce;
struct llist_head barrier_tasks;
struct intel_context *kernel_context; /* pinned */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index c3223958dfe0..315edeaa186a 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -727,24 +727,6 @@ __unwind_incomplete_requests(struct intel_context *ce) spin_unlock_irqrestore(&sched_engine->lock, flags); }
-static struct i915_request *context_find_active_request(struct intel_context *ce) -{
- struct i915_request *rq, *active = NULL;
- unsigned long flags;
- spin_lock_irqsave(&ce->guc_active.lock, flags);
- list_for_each_entry_reverse(rq, &ce->guc_active.requests,
sched.link) {
if (i915_request_completed(rq))
break;
active = rq;
- }
- spin_unlock_irqrestore(&ce->guc_active.lock, flags);
- return active;
-}
- static void __guc_reset_context(struct intel_context *ce, bool stalled) { struct i915_request *rq;
@@ -758,7 +740,7 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled) */ clr_context_enabled(ce);
- rq = context_find_active_request(ce);
- rq = intel_context_find_active_request(ce); if (!rq) { head = ce->ring->tail; stalled = false;
@@ -2192,6 +2174,20 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc, return 0; }
+static void capture_error_state(struct intel_guc *guc,
+				struct intel_context *ce)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct drm_i915_private *i915 = gt->i915;
+	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
+	intel_wakeref_t wakeref;
+
+	intel_engine_set_hung_context(engine, ce);
+	with_intel_runtime_pm(&i915->runtime_pm, wakeref)
+		i915_capture_error_state(gt, engine->mask);
+
+	atomic_inc(&i915->gpu_error.reset_engine_count[engine->uabi_class]);
+}
+
 static void guc_context_replay(struct intel_context *ce)
 {
 	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
@@ -2204,6 +2200,7 @@ static void guc_handle_context_reset(struct intel_guc *guc,
 					 struct intel_context *ce)
 {
 	trace_intel_context_reset(ce);
+	capture_error_state(guc, ce);
 	guc_context_replay(ce);
 }
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index cb182c6d265a..20e0a1bfadc1 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1429,20 +1429,37 @@ capture_engine(struct intel_engine_cs *engine,
 {
 	struct intel_engine_capture_vma *capture = NULL;
 	struct intel_engine_coredump *ee;
-	struct i915_request *rq;
+	struct intel_context *ce;
+	struct i915_request *rq = NULL;
 	unsigned long flags;
 
 	ee = intel_engine_coredump_alloc(engine, GFP_KERNEL);
 	if (!ee)
 		return NULL;
 
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-	rq = intel_engine_find_active_request(engine);
+	ce = intel_engine_get_hung_context(engine);
+	if (ce) {
+		intel_engine_clear_hung_context(engine);
+		rq = intel_context_find_active_request(ce);
+		if (!rq || !i915_request_started(rq))
+			goto no_request_capture;
+	} else {
+		/*
+		 * Getting here with GuC enabled means it is a forced error capture
+		 * with no actual hang. So, no need to attempt the execlist search.
+		 */
+		if (!intel_uc_uses_guc_submission(&engine->gt->uc)) {
+			spin_lock_irqsave(&engine->sched_engine->lock, flags);
+			rq = intel_engine_execlist_find_hung_request(engine);
+			spin_unlock_irqrestore(&engine->sched_engine->lock,
+					       flags);
+		}
+	}
 	if (rq)
 		capture = intel_engine_coredump_add_request(ee, rq, ATOMIC_MAYFAIL);
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 	if (!capture) {
+no_request_capture:
 		kfree(ee);
 		return NULL;
 	}
From: John Harrison John.C.Harrison@Intel.com
In the case of a full GPU reset (e.g. because GuC has died or because GuC's hang detection has been disabled), the driver can't rely on GuC reporting the guilty context. Instead, the driver needs to scan all active contexts and find one that is currently executing, as per the execlist mode behaviour. In GuC mode, this scan is different to execlist mode as the active request list is handled very differently.
Similarly, the request state dump in debugfs needs to be handled differently when in GuC submission mode.
Some of the request scanning code has also been refactored to avoid duplication across the multiple code paths that now replicate it.
Signed-off-by: John Harrison john.c.harrison@intel.com
Signed-off-by: Matthew Brost matthew.brost@intel.com
---
 drivers/gpu/drm/i915/gt/intel_engine.h        |   3 +
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 139 ++++++++++++------
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   8 +
 drivers/gpu/drm/i915/gt/intel_reset.c         |   2 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   2 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  67 +++++++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   3 +
 drivers/gpu/drm/i915/i915_request.c           |  41 ++++++
 drivers/gpu/drm/i915/i915_request.h           |  11 ++
 9 files changed, 229 insertions(+), 47 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index 6ea5643a3aaa..9ba131175564 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -240,6 +240,9 @@ __printf(3, 4) void intel_engine_dump(struct intel_engine_cs *engine, struct drm_printer *m, const char *header, ...); +void intel_engine_dump_active_requests(struct list_head *requests, + struct i915_request *hung_rq, + struct drm_printer *m);
ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine, ktime_t *now); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 1d243b83b023..bbea7c9a367d 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1624,6 +1624,97 @@ static void print_properties(struct intel_engine_cs *engine, read_ul(&engine->defaults, p->offset)); }
+static void engine_dump_request(struct i915_request *rq, struct drm_printer *m, const char *msg) +{ + struct intel_timeline *tl = get_timeline(rq); + + i915_request_show(m, rq, msg, 0); + + drm_printf(m, "\t\tring->start: 0x%08x\n", + i915_ggtt_offset(rq->ring->vma)); + drm_printf(m, "\t\tring->head: 0x%08x\n", + rq->ring->head); + drm_printf(m, "\t\tring->tail: 0x%08x\n", + rq->ring->tail); + drm_printf(m, "\t\tring->emit: 0x%08x\n", + rq->ring->emit); + drm_printf(m, "\t\tring->space: 0x%08x\n", + rq->ring->space); + + if (tl) { + drm_printf(m, "\t\tring->hwsp: 0x%08x\n", + tl->hwsp_offset); + intel_timeline_put(tl); + } + + print_request_ring(m, rq); + + if (rq->context->lrc_reg_state) { + drm_printf(m, "Logical Ring Context:\n"); + hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE); + } +} + +void intel_engine_dump_active_requests(struct list_head *requests, + struct i915_request *hung_rq, + struct drm_printer *m) +{ + struct i915_request *rq; + const char *msg; + enum i915_request_state state; + + list_for_each_entry(rq, requests, sched.link) { + if (rq == hung_rq) + continue; + + state = i915_test_request_state(rq); + if (state < I915_REQUEST_QUEUED) + continue; + + if (state == I915_REQUEST_ACTIVE) + msg = "\t\tactive on engine"; + else + msg = "\t\tactive in queue"; + + engine_dump_request(rq, m, msg); + } +} + +static void engine_dump_active_requests(struct intel_engine_cs *engine, struct drm_printer *m) +{ + struct i915_request *hung_rq = NULL; + struct intel_context *ce; + bool guc; + + /* + * No need for an engine->irq_seqno_barrier() before the seqno reads. + * The GPU is still running so requests are still executing and any + * hardware reads will be out of date by the time they are reported. + * But the intention here is just to report an instantaneous snapshot + * so that's fine. 
+ */ + lockdep_assert_held(&engine->sched_engine->lock); + + drm_printf(m, "\tRequests:\n"); + + guc = intel_uc_uses_guc_submission(&engine->gt->uc); + if (guc) { + ce = intel_engine_get_hung_context(engine); + if (ce) + hung_rq = intel_context_find_active_request(ce); + } else + hung_rq = intel_engine_execlist_find_hung_request(engine); + + if (hung_rq) + engine_dump_request(hung_rq, m, "\t\thung"); + + if (guc) + intel_guc_dump_active_requests(engine, hung_rq, m); + else + intel_engine_dump_active_requests(&engine->sched_engine->requests, + hung_rq, m); +} + void intel_engine_dump(struct intel_engine_cs *engine, struct drm_printer *m, const char *header, ...) @@ -1668,39 +1759,9 @@ void intel_engine_dump(struct intel_engine_cs *engine, i915_reset_count(error)); print_properties(engine, m);
- drm_printf(m, "\tRequests:\n"); - spin_lock_irqsave(&engine->sched_engine->lock, flags); - rq = intel_engine_execlist_find_hung_request(engine); - if (rq) { - struct intel_timeline *tl = get_timeline(rq); - - i915_request_show(m, rq, "\t\tactive ", 0); - - drm_printf(m, "\t\tring->start: 0x%08x\n", - i915_ggtt_offset(rq->ring->vma)); - drm_printf(m, "\t\tring->head: 0x%08x\n", - rq->ring->head); - drm_printf(m, "\t\tring->tail: 0x%08x\n", - rq->ring->tail); - drm_printf(m, "\t\tring->emit: 0x%08x\n", - rq->ring->emit); - drm_printf(m, "\t\tring->space: 0x%08x\n", - rq->ring->space); - - if (tl) { - drm_printf(m, "\t\tring->hwsp: 0x%08x\n", - tl->hwsp_offset); - intel_timeline_put(tl); - } - - print_request_ring(m, rq); + engine_dump_active_requests(engine, m);
- if (rq->context->lrc_reg_state) { - drm_printf(m, "Logical Ring Context:\n"); - hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE); - } - } drm_printf(m, "\tOn hold?: %lu\n", list_count(&engine->sched_engine->hold)); spin_unlock_irqrestore(&engine->sched_engine->lock, flags); @@ -1774,13 +1835,6 @@ intel_engine_create_virtual(struct intel_engine_cs **siblings, return siblings[0]->cops->create_virtual(siblings, count); }
-static bool match_ring(struct i915_request *rq)
-{
-	u32 ring = ENGINE_READ(rq->engine, RING_START);
-
-	return ring == i915_ggtt_offset(rq->ring->vma);
-}
-
 struct i915_request *
 intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
 {
@@ -1824,14 +1878,7 @@ intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
 
 	list_for_each_entry(request, &engine->sched_engine->requests,
 			    sched.link) {
-		if (__i915_request_is_complete(request))
-			continue;
-
-		if (!__i915_request_has_started(request))
-			continue;
-
-		/* More than one preemptible request may match! */
-		if (!match_ring(request))
+		if (i915_test_request_state(request) != I915_REQUEST_ACTIVE)
 			continue;
active = request; diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c index a8495364d906..f0768824de6f 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c @@ -90,6 +90,14 @@ reset_engine(struct intel_engine_cs *engine, struct i915_request *rq) if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) show_heartbeat(rq, engine);
+ if (intel_engine_uses_guc(engine)) + /* + * GuC itself is toast or GuC's hang detection + * is disabled. Either way, need to find the + * hang culprit manually. + */ + intel_guc_find_hung_context(engine); + intel_gt_handle_error(engine->gt, engine->mask, I915_ERROR_CAPTURE, "stopped heartbeat on %s", diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index 2987282dff6d..f3cdbf4ba5c8 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -156,7 +156,7 @@ void __i915_request_reset(struct i915_request *rq, bool guilty) if (guilty) { i915_request_set_error_once(rq, -EIO); __i915_request_skip(rq); - if (mark_guilty(rq)) + if (mark_guilty(rq) && !intel_engine_uses_guc(rq->engine)) skip_context(rq); } else { i915_request_set_error_once(rq, -EAGAIN); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index ab1a85b508db..c38365cd5fab 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -268,6 +268,8 @@ int intel_guc_context_reset_process_msg(struct intel_guc *guc, int intel_guc_engine_failure_process_msg(struct intel_guc *guc, const u32 *msg, u32 len);
+void intel_guc_find_hung_context(struct intel_engine_cs *engine); + void intel_guc_submission_reset_prepare(struct intel_guc *guc); void intel_guc_submission_reset(struct intel_guc *guc, bool stalled); void intel_guc_submission_reset_finish(struct intel_guc *guc); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 315edeaa186a..6188189314d5 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2267,6 +2267,73 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc, return 0; }
+void intel_guc_find_hung_context(struct intel_engine_cs *engine) +{ + struct intel_guc *guc = &engine->gt->uc.guc; + struct intel_context *ce; + struct i915_request *rq; + unsigned long index; + + /* Reset called during driver load? GuC not yet initialised! */ + if (unlikely(!guc_submission_initialized(guc))) + return; + + xa_for_each(&guc->context_lookup, index, ce) { + if (!intel_context_is_pinned(ce)) + continue; + + if (intel_engine_is_virtual(ce->engine)) { + if (!(ce->engine->mask & engine->mask)) + continue; + } else { + if (ce->engine != engine) + continue; + } + + list_for_each_entry(rq, &ce->guc_active.requests, sched.link) { + if (i915_test_request_state(rq) != I915_REQUEST_ACTIVE) + continue; + + intel_engine_set_hung_context(engine, ce); + + /* Can only cope with one hang at a time... */ + return; + } + } +} + +void intel_guc_dump_active_requests(struct intel_engine_cs *engine, + struct i915_request *hung_rq, + struct drm_printer *m) +{ + struct intel_guc *guc = &engine->gt->uc.guc; + struct intel_context *ce; + unsigned long index; + unsigned long flags; + + /* Reset called during driver load? GuC not yet initialised! 
*/ + if (unlikely(!guc_submission_initialized(guc))) + return; + + xa_for_each(&guc->context_lookup, index, ce) { + if (!intel_context_is_pinned(ce)) + continue; + + if (intel_engine_is_virtual(ce->engine)) { + if (!(ce->engine->mask & engine->mask)) + continue; + } else { + if (ce->engine != engine) + continue; + } + + spin_lock_irqsave(&ce->guc_active.lock, flags); + intel_engine_dump_active_requests(&ce->guc_active.requests, + hung_rq, m); + spin_unlock_irqrestore(&ce->guc_active.lock, flags); + } +} + void intel_guc_log_submission_info(struct intel_guc *guc, struct drm_printer *p) { diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h index b9b9f0f60f91..a2a3fad72be1 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h @@ -24,6 +24,9 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine); void intel_guc_log_submission_info(struct intel_guc *guc, struct drm_printer *p); void intel_guc_log_context_info(struct intel_guc *guc, struct drm_printer *p); +void intel_guc_dump_active_requests(struct intel_engine_cs *engine, + struct i915_request *hung_rq, + struct drm_printer *m);
bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 192784875a1d..2978c8d45021 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -2076,6 +2076,47 @@ void i915_request_show(struct drm_printer *m, name); }
+static bool engine_match_ring(struct intel_engine_cs *engine, struct i915_request *rq) +{ + u32 ring = ENGINE_READ(engine, RING_START); + + return ring == i915_ggtt_offset(rq->ring->vma); +} + +static bool match_ring(struct i915_request *rq) +{ + struct intel_engine_cs *engine; + bool found; + int i; + + if (!intel_engine_is_virtual(rq->engine)) + return engine_match_ring(rq->engine, rq); + + found = false; + i = 0; + while ((engine = intel_engine_get_sibling(rq->engine, i++))) { + found = engine_match_ring(engine, rq); + if (found) + break; + } + + return found; +} + +enum i915_request_state i915_test_request_state(struct i915_request *rq) +{ + if (i915_request_completed(rq)) + return I915_REQUEST_COMPLETE; + + if (!i915_request_started(rq)) + return I915_REQUEST_PENDING; + + if (match_ring(rq)) + return I915_REQUEST_ACTIVE; + + return I915_REQUEST_QUEUED; +} + #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "selftests/mock_request.c" #include "selftests/i915_request.c" diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index bcc6340c505e..f98385f72782 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -651,4 +651,15 @@ i915_request_active_engine(struct i915_request *rq,
void i915_request_notify_execute_cb_imm(struct i915_request *rq);
+enum i915_request_state +{ + I915_REQUEST_UNKNOWN = 0, + I915_REQUEST_COMPLETE, + I915_REQUEST_PENDING, + I915_REQUEST_QUEUED, + I915_REQUEST_ACTIVE, +}; + +enum i915_request_state i915_test_request_state(struct i915_request *rq); + #endif /* I915_REQUEST_H */
On Thu, Jun 24, 2021 at 12:05:11AM -0700, Matthew Brost wrote:
From: John Harrison John.C.Harrison@Intel.com
In the case of a full GPU reset (e.g. because GuC has died or because GuC's hang detection has been disabled), the driver can't rely on GuC reporting the guilty context. Instead, the driver needs to scan all active contexts and find one that is currently executing, as per the execlist mode behaviour. In GuC mode, this scan is different to execlist mode as the active request list is handled very differently.
Similarly, the request state dump in debugfs needs to be handled differently when in GuC submission mode.
Some of the request scanning code has also been refactored to avoid duplication across the multiple code paths that now replicate it.
Signed-off-by: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
Reviewed-by: Matthew Brost matthew.brost@intel.com
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
From: John Harrison John.C.Harrison@Intel.com
Use the official driver default scheduling policies for configuring the GuC scheduler rather than a bunch of hardcoded values.
Signed-off-by: John Harrison john.c.harrison@intel.com
Signed-off-by: Matthew Brost matthew.brost@intel.com
Cc: Jose Souza jose.souza@intel.com
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    | 44 ++++++++++++++++++-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 +++--
 4 files changed, 53 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 0ceffa2be7a7..37db857bb56c 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -455,6 +455,7 @@ struct intel_engine_cs {
 #define I915_ENGINE_IS_VIRTUAL       BIT(5)
 #define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
 #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
+#define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8)
 	unsigned int flags;
/* diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index c38365cd5fab..905ecbc7dbe3 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -270,6 +270,8 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
void intel_guc_find_hung_context(struct intel_engine_cs *engine);
+int intel_guc_global_policies_update(struct intel_guc *guc);
+
 void intel_guc_submission_reset_prepare(struct intel_guc *guc);
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
 void intel_guc_submission_reset_finish(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index d3e86ab7508f..2ad5fcd4e1b7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -77,14 +77,54 @@ static u32 guc_ads_blob_size(struct intel_guc *guc)
		guc_ads_private_data_size(guc);
 }

-static void guc_policies_init(struct guc_policies *policies)
+static void guc_policies_init(struct intel_guc *guc, struct guc_policies *policies)
 {
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct drm_i915_private *i915 = gt->i915;
+
	policies->dpc_promote_time = GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US;
	policies->max_num_work_items = GLOBAL_POLICY_MAX_NUM_WI;
+
+	policies->global_flags = 0;
+	if (i915->params.reset < 2)
+		policies->global_flags |= GLOBAL_POLICY_DISABLE_ENGINE_RESET;
+
	policies->is_valid = 1;
 }

+static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE,
+		policy_offset
+	};
+
+	return intel_guc_send(guc, action, ARRAY_SIZE(action));
+}
+
+int intel_guc_global_policies_update(struct intel_guc *guc)
+{
+	struct __guc_ads_blob *blob = guc->ads_blob;
+	struct intel_gt *gt = guc_to_gt(guc);
+	intel_wakeref_t wakeref;
+	int ret;
+
+	if (!blob)
+		return -ENOTSUPP;
+
+	GEM_BUG_ON(!blob->ads.scheduler_policies);
+
+	guc_policies_init(guc, &blob->policies);
+
+	if (!intel_guc_is_ready(guc))
+		return 0;
+
+	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
+		ret = guc_action_policies_update(guc, blob->ads.scheduler_policies);
+
+	return ret;
+}
+
 static void guc_mapping_table_init(struct intel_gt *gt,
				    struct guc_gt_system_info *system_info)
 {
@@ -281,7 +321,7 @@ static void __guc_ads_init(struct intel_guc *guc)
	u8 engine_class, guc_class;

	/* GuC scheduling policies */
-	guc_policies_init(&blob->policies);
+	guc_policies_init(guc, &blob->policies);

	/*
	 * GuC expects a per-engine-class context image and size
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 6188189314d5..a427336ce916 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -873,6 +873,7 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
	atomic_set(&guc->outstanding_submission_g2h, 0);

+	intel_guc_global_policies_update(guc);
	enable_submission(guc);
	intel_gt_unpark_heartbeats(guc_to_gt(guc));
 }
@@ -1161,8 +1162,12 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 {
	desc->policy_flags = 0;

-	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
-	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
+	if (engine->flags & I915_ENGINE_WANT_FORCED_PREEMPTION)
+		desc->policy_flags |= CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE;
+
+	/* NB: For both of these, zero means disabled. */
+	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
+	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
 }

 static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
@@ -1945,13 +1950,13 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
	engine->set_default_submission = guc_set_default_submission;

	engine->flags |= I915_ENGINE_HAS_PREEMPTION;
+	engine->flags |= I915_ENGINE_HAS_TIMESLICES;

	/*
	 * TODO: GuC supports timeslicing and semaphores as well, but they're
	 * handled by the firmware so some minor tweaks are required before
	 * enabling.
	 *
-	 * engine->flags |= I915_ENGINE_HAS_TIMESLICES;
	 * engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
	 */
On Thu, Jun 24, 2021 at 12:05:12AM -0700, Matthew Brost wrote:
From: John Harrison John.C.Harrison@Intel.com
Use the official driver default scheduling policies for configuring the GuC scheduler rather than a bunch of hardcoded values.
Signed-off-by: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com Cc: Jose Souza jose.souza@intel.com
-	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
-	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
+	if (engine->flags & I915_ENGINE_WANT_FORCED_PREEMPTION)
I can't see where we set this in this series, although I do see a selftest we need to fixup that sets this. Perhaps we drop this until we fix that selftest? Or at minimum add a comment saying it will be used in the future by selftests. What do you think John?
+		desc->policy_flags |= CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE;
+
+	/* NB: For both of these, zero means disabled. */
+	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
+	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
 }

 static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
@@ -1945,13 +1950,13 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
	engine->set_default_submission = guc_set_default_submission;

	engine->flags |= I915_ENGINE_HAS_PREEMPTION;
+	engine->flags |= I915_ENGINE_HAS_TIMESLICES;

	/*
	 * TODO: GuC supports timeslicing and semaphores as well, but they're
Nit, we now support timeslicing. I can fix that up in next rev.
Matt
	 * handled by the firmware so some minor tweaks are required before
	 * enabling.
	 *
-	 * engine->flags |= I915_ENGINE_HAS_TIMESLICES;
	 * engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
	 */
-- 2.28.0
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On 6/24/2021 17:59, Matthew Brost wrote:
On Thu, Jun 24, 2021 at 12:05:12AM -0700, Matthew Brost wrote:
I can't see where we set this in this series, although I do see a selftest we need to fixup that sets this. Perhaps we drop this until we fix that selftest? Or at minimum add a comment saying it will be used in the future by selftests. What do you think John?
Yeah, it is only ever intended to be used by selftests. So yes, it could be punted down the road until the selftest patch. Likewise the definition for the flag, too.
John.
On Fri, Jun 25, 2021 at 12:10:46PM -0700, John Harrison wrote:
On 6/24/2021 17:59, Matthew Brost wrote:
I can't see where we set this in this series, although I do see a selftest we need to fixup that sets this. Perhaps we drop this until we fix that selftest? Or at minimum add a comment saying it will be used in the future by selftests. What do you think John?
Yeah, it is only ever intended to be used by selftests. So yes, it could be punted down the road until the selftest patch. Likewise the definition for the flag, too.
Ok, fixed that in my branch. With that: Reviewed-by: Matthew Brost matthew.brost@intel.com
From: John Harrison John.C.Harrison@Intel.com
Changing the reset module parameter has no effect on a running GuC. The corresponding entry in the ADS must be updated and then the GuC informed via a Host2GuC message.
The new debugfs interface to module parameters allows this to happen. However, connecting the parameter data address back to anything useful is messy. One option would be to pass a new private data structure address through instead of just the parameter pointer. However, that means having a new (and different) data structure for each parameter and a new (and different) write function for each parameter. This method keeps everything generic by instead using a string lookup on the directory entry name.
Signed-off-by: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 2 +- drivers/gpu/drm/i915/i915_debugfs_params.c | 31 ++++++++++++++++++++++ 2 files changed, 32 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 2ad5fcd4e1b7..c6d0b762d82c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -99,7 +99,7 @@ static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
		policy_offset
	};

-	return intel_guc_send(guc, action, ARRAY_SIZE(action));
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
 }

 int intel_guc_global_policies_update(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/i915_debugfs_params.c b/drivers/gpu/drm/i915/i915_debugfs_params.c
index 4e2b077692cb..8ecd8b42f048 100644
--- a/drivers/gpu/drm/i915/i915_debugfs_params.c
+++ b/drivers/gpu/drm/i915/i915_debugfs_params.c
@@ -6,9 +6,20 @@
 #include <linux/kernel.h>

 #include "i915_debugfs_params.h"
+#include "gt/intel_gt.h"
+#include "gt/uc/intel_guc.h"
 #include "i915_drv.h"
 #include "i915_params.h"

+#define MATCH_DEBUGFS_NODE_NAME(_file, _name) (strcmp((_file)->f_path.dentry->d_name.name, (_name)) == 0)
+
+#define GET_I915(i915, name, ptr) \
+	do { \
+		struct i915_params *params; \
+		params = container_of(((void *) (ptr)), typeof(*params), name); \
+		(i915) = container_of(params, typeof(*(i915)), params); \
+	} while (0)
+
 /* int param */
 static int i915_param_int_show(struct seq_file *m, void *data)
 {
@@ -24,6 +35,16 @@ static int i915_param_int_open(struct inode *inode, struct file *file)
	return single_open(file, i915_param_int_show, inode->i_private);
 }

+static int notify_guc(struct drm_i915_private *i915)
+{
+	int ret = 0;
+
+	if (intel_uc_uses_guc_submission(&i915->gt.uc))
+		ret = intel_guc_global_policies_update(&i915->gt.uc.guc);
+
+	return ret;
+}
+
 static ssize_t i915_param_int_write(struct file *file,
				    const char __user *ubuf, size_t len,
				    loff_t *offp)
@@ -81,8 +102,10 @@ static ssize_t i915_param_uint_write(struct file *file,
				     const char __user *ubuf, size_t len,
				     loff_t *offp)
 {
+	struct drm_i915_private *i915;
	struct seq_file *m = file->private_data;
	unsigned int *value = m->private;
+	unsigned int old = *value;
	int ret;

	ret = kstrtouint_from_user(ubuf, len, 0, value);
@@ -95,6 +118,14 @@ static ssize_t i915_param_uint_write(struct file *file,
		*value = b;
	}

+	if (!ret && MATCH_DEBUGFS_NODE_NAME(file, "reset")) {
+		GET_I915(i915, reset, value);
+
+		ret = notify_guc(i915);
+		if (ret)
+			*value = old;
+	}
+
	return ret ?: len;
 }
On Thu, Jun 24, 2021 at 12:05:13AM -0700, Matthew Brost wrote:
+	if (!ret && MATCH_DEBUGFS_NODE_NAME(file, "reset")) {
+		GET_I915(i915, reset, value);
We might want to make this into a macro in case we need to update more than just "reset" with the GuC going forward but that is not a blocker.
With that: Reviewed-by: Matthew Brost matthew.brost@intel.com
From: John Harrison John.C.Harrison@Intel.com
Added the scheduling policy parameters to the 'guc_info' debugfs state dump.
Signed-off-by: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 13 +++++++++++++ drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h | 2 ++ drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c | 2 ++ 3 files changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index c6d0b762d82c..b8182844aa00 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -92,6 +92,19 @@ static void guc_policies_init(struct intel_guc *guc, struct guc_policies *polici
	policies->is_valid = 1;
 }

+void intel_guc_log_policy_info(struct intel_guc *guc, struct drm_printer *dp)
+{
+	struct __guc_ads_blob *blob = guc->ads_blob;
+
+	if (unlikely(!blob))
+		return;
+
+	drm_printf(dp, "Global scheduling policies:\n");
+	drm_printf(dp, "  DPC promote time = %u\n", blob->policies.dpc_promote_time);
+	drm_printf(dp, "  Max num work items = %u\n", blob->policies.max_num_work_items);
+	drm_printf(dp, "  Flags = %u\n", blob->policies.global_flags);
+}
+
 static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
 {
	u32 action[] = {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
index b00d3ae1113a..0fdcb3583601 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
@@ -7,9 +7,11 @@
 #define _INTEL_GUC_ADS_H_

 struct intel_guc;
+struct drm_printer;

 int intel_guc_ads_create(struct intel_guc *guc);
 void intel_guc_ads_destroy(struct intel_guc *guc);
 void intel_guc_ads_reset(struct intel_guc *guc);
+void intel_guc_log_policy_info(struct intel_guc *guc, struct drm_printer *p);

 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
index 62b9ce0fafaa..9a03ff56e654 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
@@ -10,6 +10,7 @@
 #include "intel_guc_debugfs.h"
 #include "intel_guc_log_debugfs.h"
 #include "gt/uc/intel_guc_ct.h"
+#include "gt/uc/intel_guc_ads.h"
 #include "gt/uc/intel_guc_submission.h"

 static int guc_info_show(struct seq_file *m, void *data)
@@ -29,6 +30,7 @@ static int guc_info_show(struct seq_file *m, void *data)

	intel_guc_log_ct_info(&guc->ct, &p);
	intel_guc_log_submission_info(guc, &p);
+	intel_guc_log_policy_info(guc, &p);

	return 0;
 }
On Thu, Jun 24, 2021 at 12:05:14AM -0700, Matthew Brost wrote:
From: John Harrison John.C.Harrison@Intel.com
Added the scheduling policy parameters to the 'guc_info' debugfs state dump.
Signed-off-by: John Harrison john.c.harrison@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
Reviewed-by: Matthew Brost matthew.brost@intel.com
From: John Harrison John.C.Harrison@Intel.com
The media watchdog mechanism involves GuC doing a silent reset and continuation of the hung context. This requires the i915 driver to provide a golden context to GuC in the ADS.
Signed-off-by: John Harrison John.C.Harrison@Intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/intel_gt.c | 2 + drivers/gpu/drm/i915/gt/uc/intel_guc.c | 5 + drivers/gpu/drm/i915/gt/uc/intel_guc.h | 2 + drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 213 ++++++++++++++++++--- drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_uc.c | 5 + drivers/gpu/drm/i915/gt/uc/intel_uc.h | 1 + 7 files changed, 199 insertions(+), 30 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index acfdd53b2678..ceeb517ba259 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -654,6 +654,8 @@ int intel_gt_init(struct intel_gt *gt) if (err) goto err_gt;
+ intel_uc_init_late(>->uc); + err = i915_inject_probe_error(gt->i915, -EIO); if (err) goto err_gt; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 68266cbffd1f..979128e28372 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -180,6 +180,11 @@ void intel_guc_init_early(struct intel_guc *guc) } }
+void intel_guc_init_late(struct intel_guc *guc) +{ + intel_guc_ads_init_late(guc); +} + static u32 guc_ctl_debug_flags(struct intel_guc *guc) { u32 level = intel_guc_log_get_level(&guc->log); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 905ecbc7dbe3..fae01dc8e1b9 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -59,6 +59,7 @@ struct intel_guc { struct i915_vma *ads_vma; struct __guc_ads_blob *ads_blob; u32 ads_regset_size; + u32 ads_golden_ctxt_size;
struct i915_vma *lrc_desc_pool; void *lrc_desc_pool_vaddr; @@ -176,6 +177,7 @@ static inline u32 intel_guc_ggtt_offset(struct intel_guc *guc, }
void intel_guc_init_early(struct intel_guc *guc); +void intel_guc_init_late(struct intel_guc *guc); void intel_guc_init_send_regs(struct intel_guc *guc); void intel_guc_write_params(struct intel_guc *guc); int intel_guc_init(struct intel_guc *guc); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c index b8182844aa00..dfaeafc512fb 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c @@ -7,6 +7,7 @@
#include "gt/intel_gt.h" #include "gt/intel_lrc.h" +#include "gt/shmem_utils.h" #include "intel_guc_ads.h" #include "intel_guc_fwif.h" #include "intel_uc.h" @@ -33,6 +34,10 @@ * +---------------------------------------+ <== dynamic * | padding | * +---------------------------------------+ <== 4K aligned + * | golden contexts | + * +---------------------------------------+ + * | padding | + * +---------------------------------------+ <== 4K aligned * | private data | * +---------------------------------------+ * | padding | @@ -52,6 +57,11 @@ static u32 guc_ads_regset_size(struct intel_guc *guc) return guc->ads_regset_size; }
+static u32 guc_ads_golden_ctxt_size(struct intel_guc *guc) +{ + return PAGE_ALIGN(guc->ads_golden_ctxt_size); +} + static u32 guc_ads_private_data_size(struct intel_guc *guc) { return PAGE_ALIGN(guc->fw.private_data_size); @@ -62,12 +72,23 @@ static u32 guc_ads_regset_offset(struct intel_guc *guc) return offsetof(struct __guc_ads_blob, regset); }
-static u32 guc_ads_private_data_offset(struct intel_guc *guc) +static u32 guc_ads_golden_ctxt_offset(struct intel_guc *guc) { u32 offset;
offset = guc_ads_regset_offset(guc) + guc_ads_regset_size(guc); + + return PAGE_ALIGN(offset); +} + +static u32 guc_ads_private_data_offset(struct intel_guc *guc) +{ + u32 offset; + + offset = guc_ads_golden_ctxt_offset(guc) + + guc_ads_golden_ctxt_size(guc); + return PAGE_ALIGN(offset); }
@@ -318,53 +339,163 @@ static void guc_mmio_reg_state_init(struct intel_guc *guc, GEM_BUG_ON(temp_set.size); }
-/* - * The first 80 dwords of the register state context, containing the - * execlists and ppgtt registers. - */ -#define LR_HW_CONTEXT_SIZE (80 * sizeof(u32)) +static void fill_engine_enable_masks(struct intel_gt *gt, + struct guc_gt_system_info *info) +{ + info->engine_enabled_masks[GUC_RENDER_CLASS] = 1; + info->engine_enabled_masks[GUC_BLITTER_CLASS] = 1; + info->engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt); + info->engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt); +}
-static void __guc_ads_init(struct intel_guc *guc) +/* Skip execlist and PPGTT registers */ +#define LR_HW_CONTEXT_SIZE (80 * sizeof(u32)) +#define SKIP_SIZE (LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE) + +static int guc_prep_golden_context(struct intel_guc *guc, + struct __guc_ads_blob *blob) { struct intel_gt *gt = guc_to_gt(guc); - struct drm_i915_private *i915 = gt->i915; + u32 addr_ggtt, offset; + u32 total_size = 0, alloc_size, real_size; + u8 engine_class, guc_class; + struct guc_gt_system_info *info, local_info; + + /* + * Reserve the memory for the golden contexts and point GuC at it but + * leave it empty for now. The context data will be filled in later + * once there is something available to put there. + * + * Note that the HWSP and ring context are not included. + * + * Note also that the storage must be pinned in the GGTT, so that the + * address won't change after GuC has been told where to find it. The + * GuC will also validate that the LRC base + size fall within the + * allowed GGTT range. 
+ */ + if (blob) { + offset = guc_ads_golden_ctxt_offset(guc); + addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset; + info = &blob->system_info; + } else { + memset(&local_info, 0, sizeof(local_info)); + info = &local_info; + fill_engine_enable_masks(gt, info); + } + + for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) { + if (engine_class == OTHER_CLASS) + continue; + + guc_class = engine_class_to_guc_class(engine_class); + + if (!info->engine_enabled_masks[guc_class]) + continue; + + real_size = intel_engine_context_size(gt, engine_class); + alloc_size = PAGE_ALIGN(real_size); + total_size += alloc_size; + + if (!blob) + continue; + + blob->ads.eng_state_size[guc_class] = real_size; + blob->ads.golden_context_lrca[guc_class] = addr_ggtt; + addr_ggtt += alloc_size; + } + + if (!blob) + return total_size; + + GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size); + return total_size; +} + +static struct intel_engine_cs *find_engine_state(struct intel_gt *gt, u8 engine_class) +{ + struct intel_engine_cs *engine; + enum intel_engine_id id; + + for_each_engine(engine, gt, id) { + if (engine->class != engine_class) + continue; + + if (!engine->default_state) + continue; + + return engine; + } + + return NULL; +} + +static void guc_init_golden_context(struct intel_guc *guc) +{ struct __guc_ads_blob *blob = guc->ads_blob; - const u32 skipped_size = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE; - u32 base; + struct intel_engine_cs *engine; + struct intel_gt *gt = guc_to_gt(guc); + u32 addr_ggtt, offset; + u32 total_size = 0, alloc_size, real_size; u8 engine_class, guc_class; + u8 *ptr;
- /* GuC scheduling policies */ - guc_policies_init(guc, &blob->policies); + if (!intel_uc_uses_guc_submission(>->uc)) + return; + + GEM_BUG_ON(!blob);
/* - * GuC expects a per-engine-class context image and size - * (minus hwsp and ring context). The context image will be - * used to reinitialize engines after a reset. It must exist - * and be pinned in the GGTT, so that the address won't change after - * we have told GuC where to find it. The context size will be used - * to validate that the LRC base + size fall within allowed GGTT. + * Go back and fill in the golden context data now that it is + * available. */ + offset = guc_ads_golden_ctxt_offset(guc); + addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset; + ptr = ((u8 *) blob) + offset; + for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) { if (engine_class == OTHER_CLASS) continue;
guc_class = engine_class_to_guc_class(engine_class);
- /* - * TODO: Set context pointer to default state to allow - * GuC to re-init guilty contexts after internal reset. - */ - blob->ads.golden_context_lrca[guc_class] = 0; - blob->ads.eng_state_size[guc_class] = - intel_engine_context_size(gt, engine_class) - - skipped_size; + if (!blob->system_info.engine_enabled_masks[guc_class]) + continue; + + real_size = intel_engine_context_size(gt, engine_class); + alloc_size = PAGE_ALIGN(real_size); + total_size += alloc_size; + + engine = find_engine_state(gt, engine_class); + if (!engine) { + drm_err(>->i915->drm, "No engine state recorded for class %d!\n", engine_class); + blob->ads.eng_state_size[guc_class] = 0; + blob->ads.golden_context_lrca[guc_class] = 0; + continue; + } + + GEM_BUG_ON(blob->ads.eng_state_size[guc_class] != real_size); + GEM_BUG_ON(blob->ads.golden_context_lrca[guc_class] != addr_ggtt); + addr_ggtt += alloc_size; + + shmem_read(engine->default_state, SKIP_SIZE, ptr + SKIP_SIZE, real_size); + ptr += alloc_size; }
+ GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size); +} + +static void __guc_ads_init(struct intel_guc *guc) +{ + struct intel_gt *gt = guc_to_gt(guc); + struct drm_i915_private *i915 = gt->i915; + struct __guc_ads_blob *blob = guc->ads_blob; + u32 base; + + /* GuC scheduling policies */ + guc_policies_init(guc, &blob->policies); + /* System info */ - blob->system_info.engine_enabled_masks[GUC_RENDER_CLASS] = 1; - blob->system_info.engine_enabled_masks[GUC_BLITTER_CLASS] = 1; - blob->system_info.engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt); - blob->system_info.engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt); + fill_engine_enable_masks(gt, &blob->system_info);
blob->system_info.generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_SLICE_ENABLED] = hweight8(gt->info.sseu.slice_mask); @@ -379,6 +510,9 @@ static void __guc_ads_init(struct intel_guc *guc) GEN12_DOORBELLS_PER_SQIDI) + 1; }
+ /* Golden contexts for re-initialising after a watchdog reset */ + guc_prep_golden_context(guc, blob); + guc_mapping_table_init(guc_to_gt(guc), &blob->system_info);
base = intel_guc_ggtt_offset(guc, guc->ads_vma); @@ -416,6 +550,13 @@ int intel_guc_ads_create(struct intel_guc *guc) return ret; guc->ads_regset_size = ret;
+ /* Likewise the golden contexts: */ + ret = guc_prep_golden_context(guc, NULL); + if (ret < 0) + return ret; + guc->ads_golden_ctxt_size = ret; + + /* Now the total size can be determined: */ size = guc_ads_blob_size(guc);
ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma, @@ -428,6 +569,18 @@ int intel_guc_ads_create(struct intel_guc *guc) return 0; }
+void intel_guc_ads_init_late(struct intel_guc *guc) +{ + /* + * The golden context setup requires the saved engine state from + * __engines_record_defaults(). However, that requires engines to be + * operational which means the ADS must already have been configured. + * Fortunately, the golden context state is not needed until a hang + * occurs, so it can be filled in during this late init phase. + */ + guc_init_golden_context(guc); +} + void intel_guc_ads_destroy(struct intel_guc *guc) { i915_vma_unpin_and_release(&guc->ads_vma, I915_VMA_RELEASE_MAP); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h index 0fdcb3583601..dac0dc32da34 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h @@ -11,6 +11,7 @@ struct drm_printer;
int intel_guc_ads_create(struct intel_guc *guc); void intel_guc_ads_destroy(struct intel_guc *guc); +void intel_guc_ads_init_late(struct intel_guc *guc); void intel_guc_ads_reset(struct intel_guc *guc); void intel_guc_log_policy_info(struct intel_guc *guc, struct drm_printer *p);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c index 77c1fe2ed883..7a69c3c027e9 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c @@ -120,6 +120,11 @@ void intel_uc_init_early(struct intel_uc *uc) uc->ops = &uc_ops_off; }
+void intel_uc_init_late(struct intel_uc *uc) +{ + intel_guc_init_late(&uc->guc); +} + void intel_uc_driver_late_release(struct intel_uc *uc) { } diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h index 91315e3f1c58..e2da2b6e76e1 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h @@ -35,6 +35,7 @@ struct intel_uc { };
void intel_uc_init_early(struct intel_uc *uc); +void intel_uc_init_late(struct intel_uc *uc); void intel_uc_driver_late_release(struct intel_uc *uc); void intel_uc_driver_remove(struct intel_uc *uc); void intel_uc_init_mmio(struct intel_uc *uc);
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
Unblock GuC submission on Gen11+ platforms.
Signed-off-by: Michal Wajdeczko michal.wajdeczko@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com --- drivers/gpu/drm/i915/gt/uc/intel_guc.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 8 ++++++++ drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h | 3 +-- drivers/gpu/drm/i915/gt/uc/intel_uc.c | 14 +++++++++----- 4 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index fae01dc8e1b9..77981788204f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -54,6 +54,7 @@ struct intel_guc { struct ida guc_ids; struct list_head guc_id_list;
+ bool submission_supported; bool submission_selected;
struct i915_vma *ads_vma; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index a427336ce916..405339202280 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2042,6 +2042,13 @@ void intel_guc_submission_disable(struct intel_guc *guc) /* Note: By the time we're here, GuC may have already been reset */ }
+static bool __guc_submission_supported(struct intel_guc *guc) +{ + /* GuC submission is unavailable for pre-Gen11 */ + return intel_guc_is_supported(guc) && + INTEL_GEN(guc_to_gt(guc)->i915) >= 11; +} + static bool __guc_submission_selected(struct intel_guc *guc) { struct drm_i915_private *i915 = guc_to_gt(guc)->i915; @@ -2054,6 +2061,7 @@ static bool __guc_submission_selected(struct intel_guc *guc)
void intel_guc_submission_init_early(struct intel_guc *guc) { + guc->submission_supported = __guc_submission_supported(guc); guc->submission_selected = __guc_submission_selected(guc); }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h index a2a3fad72be1..be767eb6ff71 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h @@ -37,8 +37,7 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
static inline bool intel_guc_submission_is_supported(struct intel_guc *guc) { - /* XXX: GuC submission is unavailable for now */ - return false; + return guc->submission_supported; }
static inline bool intel_guc_submission_is_wanted(struct intel_guc *guc) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c index 7a69c3c027e9..61be0aa81492 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c @@ -34,8 +34,15 @@ static void uc_expand_default_options(struct intel_uc *uc) return; }
- /* Default: enable HuC authentication only */ - i915->params.enable_guc = ENABLE_GUC_LOAD_HUC; + /* Intermediate platforms are HuC authentication only */ + if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) { + drm_dbg(&i915->drm, "Disabling GuC only due to old platform\n"); + i915->params.enable_guc = ENABLE_GUC_LOAD_HUC; + return; + } + + /* Default: enable HuC authentication and GuC submission */ + i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | ENABLE_GUC_SUBMISSION; }
/* Reset GuC providing us with fresh state for both GuC and HuC. @@ -313,9 +320,6 @@ static int __uc_init(struct intel_uc *uc) if (i915_inject_probe_failure(uc_to_gt(uc)->i915)) return -ENOMEM;
- /* XXX: GuC submission is unavailable for now */ - GEM_BUG_ON(intel_uc_uses_guc_submission(uc)); - ret = intel_guc_init(guc); if (ret) return ret;
On 24/06/2021 10:05, Matthew Brost wrote:
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
Unblock GuC submission on Gen11+ platforms.
Signed-off-by: Michal Wajdeczko michal.wajdeczko@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 8 ++++++++ drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h | 3 +-- drivers/gpu/drm/i915/gt/uc/intel_uc.c | 14 +++++++++----- 4 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index fae01dc8e1b9..77981788204f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -54,6 +54,7 @@ struct intel_guc { struct ida guc_ids; struct list_head guc_id_list;
bool submission_supported; bool submission_selected;
struct i915_vma *ads_vma;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index a427336ce916..405339202280 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2042,6 +2042,13 @@ void intel_guc_submission_disable(struct intel_guc *guc) /* Note: By the time we're here, GuC may have already been reset */ }
+static bool __guc_submission_supported(struct intel_guc *guc)
+{
+	/* GuC submission is unavailable for pre-Gen11 */
+	return intel_guc_is_supported(guc) &&
+	       INTEL_GEN(guc_to_gt(guc)->i915) >= 11;
+}
+
 static bool __guc_submission_selected(struct intel_guc *guc)
 {
 	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
@@ -2054,6 +2061,7 @@ static bool __guc_submission_selected(struct intel_guc *guc)
 void intel_guc_submission_init_early(struct intel_guc *guc)
 {
+	guc->submission_supported = __guc_submission_supported(guc);
 	guc->submission_selected = __guc_submission_selected(guc);
 }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h index a2a3fad72be1..be767eb6ff71 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h @@ -37,8 +37,7 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
-	/* XXX: GuC submission is unavailable for now */
-	return false;
+	return guc->submission_supported;
 }
static inline bool intel_guc_submission_is_wanted(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c index 7a69c3c027e9..61be0aa81492 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c @@ -34,8 +34,15 @@ static void uc_expand_default_options(struct intel_uc *uc) return; }
-	/* Default: enable HuC authentication only */
-	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+	/* Intermediate platforms are HuC authentication only */
+	if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
+		drm_dbg(&i915->drm, "Disabling GuC only due to old platform\n");
This comment does not seem accurate, given that DG1 is barely out, and ADL is not out yet. How about:
"Disabling GuC on untested platforms"?
+		i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+		return;
+	}
+
+	/* Default: enable HuC authentication and GuC submission */
+	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | ENABLE_GUC_SUBMISSION;
This seems to be in contradiction with the GuC submission plan which states:
"Not enabled by default on any current platforms but can be enabled via modparam enable_guc".
When you rework the patch, could you please add a warning when the user force-enables the GuC Command Submission? Something like:
"WARNING: The user force-enabled the experimental GuC command submission backend using i915.enable_guc. Please disable it if experiencing stability issues. No bug reports will be accepted on this backend".
This should allow you to work on the backend, while communicating clearly to users that it is not ready just yet. Once it has matured, the warning can be removed.
Cheers, Martin
}
/* Reset GuC providing us with fresh state for both GuC and HuC. @@ -313,9 +320,6 @@ static int __uc_init(struct intel_uc *uc) if (i915_inject_probe_failure(uc_to_gt(uc)->i915)) return -ENOMEM;
-	/* XXX: GuC submission is unavailable for now */
-	GEM_BUG_ON(intel_uc_uses_guc_submission(uc));
-
 	ret = intel_guc_init(guc);
 	if (ret)
 		return ret;
On Wed, Jun 30, 2021 at 11:22:38AM +0300, Martin Peres wrote:
On 24/06/2021 10:05, Matthew Brost wrote:
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
Unblock GuC submission on Gen11+ platforms.
Signed-off-by: Michal Wajdeczko michal.wajdeczko@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 8 ++++++++ drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h | 3 +-- drivers/gpu/drm/i915/gt/uc/intel_uc.c | 14 +++++++++----- 4 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index fae01dc8e1b9..77981788204f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -54,6 +54,7 @@ struct intel_guc { struct ida guc_ids; struct list_head guc_id_list;
- bool submission_supported; bool submission_selected; struct i915_vma *ads_vma;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index a427336ce916..405339202280 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2042,6 +2042,13 @@ void intel_guc_submission_disable(struct intel_guc *guc) /* Note: By the time we're here, GuC may have already been reset */ } +static bool __guc_submission_supported(struct intel_guc *guc) +{
- /* GuC submission is unavailable for pre-Gen11 */
- return intel_guc_is_supported(guc) &&
INTEL_GEN(guc_to_gt(guc)->i915) >= 11;
+}
- static bool __guc_submission_selected(struct intel_guc *guc) { struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
@@ -2054,6 +2061,7 @@ static bool __guc_submission_selected(struct intel_guc *guc) void intel_guc_submission_init_early(struct intel_guc *guc) {
- guc->submission_supported = __guc_submission_supported(guc); guc->submission_selected = __guc_submission_selected(guc); }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h index a2a3fad72be1..be767eb6ff71 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h @@ -37,8 +37,7 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc, static inline bool intel_guc_submission_is_supported(struct intel_guc *guc) {
- /* XXX: GuC submission is unavailable for now */
- return false;
- return guc->submission_supported; } static inline bool intel_guc_submission_is_wanted(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c index 7a69c3c027e9..61be0aa81492 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c @@ -34,8 +34,15 @@ static void uc_expand_default_options(struct intel_uc *uc) return; }
- /* Default: enable HuC authentication only */
- i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
- /* Intermediate platforms are HuC authentication only */
- if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
drm_dbg(&i915->drm, "Disabling GuC only due to old platform\n");
This comment does not seem accurate, given that DG1 is barely out, and ADL is not out yet. How about:
"Disabling GuC on untested platforms"?
This isn't my comment but it seems right to me. AFAIK this describes the current PR but it is subject to change (i.e. we may enable GuC on DG1 by default at some point).
i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
return;
- }
- /* Default: enable HuC authentication and GuC submission */
- i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | ENABLE_GUC_SUBMISSION;
This seems to be in contradiction with the GuC submission plan which states:
"Not enabled by default on any current platforms but can be enabled via modparam enable_guc".
I don't believe any current platform gets to this point where GuC submission would be enabled by default. The first would be ADL-P, which isn't out yet.
When you rework the patch, could you please add a warning when the user force-enables the GuC Command Submission? Something like:
"WARNING: The user force-enabled the experimental GuC command submission backend using i915.enable_guc. Please disable it if experiencing stability issues. No bug reports will be accepted on this backend".
This should allow you to work on the backend, while communicating clearly to users that it is not ready just yet. Once it has matured, the warning can be removed.
This is a good idea, but the only issue I see is this message blowing up CI. We plan to enable GuC submission, via a modparam, on several platforms (e.g. TGL) where TGL isn't the PR in CI. I think if it is a debug level message CI should be happy, but I'll double check on this.
Matt
Cheers, Martin
} /* Reset GuC providing us with fresh state for both GuC and HuC. @@ -313,9 +320,6 @@ static int __uc_init(struct intel_uc *uc) if (i915_inject_probe_failure(uc_to_gt(uc)->i915)) return -ENOMEM;
- /* XXX: GuC submission is unavailable for now */
- GEM_BUG_ON(intel_uc_uses_guc_submission(uc));
- ret = intel_guc_init(guc); if (ret) return ret;
On 30/06/2021 21:00, Matthew Brost wrote:
On Wed, Jun 30, 2021 at 11:22:38AM +0300, Martin Peres wrote:
On 24/06/2021 10:05, Matthew Brost wrote:
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
Unblock GuC submission on Gen11+ platforms.
Signed-off-by: Michal Wajdeczko michal.wajdeczko@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
drivers/gpu/drm/i915/gt/uc/intel_guc.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 8 ++++++++ drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h | 3 +-- drivers/gpu/drm/i915/gt/uc/intel_uc.c | 14 +++++++++----- 4 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index fae01dc8e1b9..77981788204f 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -54,6 +54,7 @@ struct intel_guc { struct ida guc_ids; struct list_head guc_id_list;
- bool submission_supported; bool submission_selected; struct i915_vma *ads_vma;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index a427336ce916..405339202280 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2042,6 +2042,13 @@ void intel_guc_submission_disable(struct intel_guc *guc) /* Note: By the time we're here, GuC may have already been reset */ } +static bool __guc_submission_supported(struct intel_guc *guc) +{
- /* GuC submission is unavailable for pre-Gen11 */
- return intel_guc_is_supported(guc) &&
INTEL_GEN(guc_to_gt(guc)->i915) >= 11;
+}
- static bool __guc_submission_selected(struct intel_guc *guc) { struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
@@ -2054,6 +2061,7 @@ static bool __guc_submission_selected(struct intel_guc *guc) void intel_guc_submission_init_early(struct intel_guc *guc) {
- guc->submission_supported = __guc_submission_supported(guc); guc->submission_selected = __guc_submission_selected(guc); }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h index a2a3fad72be1..be767eb6ff71 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h @@ -37,8 +37,7 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc, static inline bool intel_guc_submission_is_supported(struct intel_guc *guc) {
- /* XXX: GuC submission is unavailable for now */
- return false;
- return guc->submission_supported; } static inline bool intel_guc_submission_is_wanted(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c index 7a69c3c027e9..61be0aa81492 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c @@ -34,8 +34,15 @@ static void uc_expand_default_options(struct intel_uc *uc) return; }
- /* Default: enable HuC authentication only */
- i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
- /* Intermediate platforms are HuC authentication only */
- if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
drm_dbg(&i915->drm, "Disabling GuC only due to old platform\n");
This comment does not seem accurate, given that DG1 is barely out, and ADL is not out yet. How about:
"Disabling GuC on untested platforms"?
This isn't my comment but it seems right to me. AFAIK this describes the current PR but it is subject to change (i.e. we may enable GuC on DG1 by default at some point).
Well, it's pretty bad PR to say that DG1 and ADL are old when they are not even out ;)
But seriously, fix this sentence, it makes no sense at all unless you are really trying to confuse non-native speakers (and annoy language purists too).
i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
return;
- }
- /* Default: enable HuC authentication and GuC submission */
- i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | ENABLE_GUC_SUBMISSION;
This seems to be in contradiction with the GuC submission plan which states:
"Not enabled by default on any current platforms but can be enabled via modparam enable_guc".
I don't believe any current platform gets this point where GuC submission would be enabled by default. The first would be ADL-P which isn't out yet.
Isn't that exactly what the line above does?
When you rework the patch, could you please add a warning when the user force-enables the GuC Command Submission? Something like:
"WARNING: The user force-enabled the experimental GuC command submission backend using i915.enable_guc. Please disable it if experiencing stability issues. No bug reports will be accepted on this backend".
This should allow you to work on the backend, while communicating clearly to users that it is not ready just yet. Once it has matured, the warning can be removed.
This is a good idea but the only issue I see this message blowing up CI. We plan to enable GuC submission, via a modparam, on several platforms (e.g. TGL) where TGL isn't the PR in CI. I think if is a debug level message CI should be happy but I'll double check on this.
Some taints would be problematic. The only issue you may have is with the IGT reload tests, which could give you a dmesg-warn if you were to use any level under level 5. So put it as an info message and you'll be good :)
In case of doubt, just ask Petri / Adrinael, he is your local IGT maintainer.
Martin
Matt
Cheers, Martin
 	}
 
 	/* Reset GuC providing us with fresh state for both GuC and HuC.
@@ -313,9 +320,6 @@ static int __uc_init(struct intel_uc *uc)
 	if (i915_inject_probe_failure(uc_to_gt(uc)->i915))
 		return -ENOMEM;
-	/* XXX: GuC submission is unavailable for now */
-	GEM_BUG_ON(intel_uc_uses_guc_submission(uc));
-
 	ret = intel_guc_init(guc);
 	if (ret)
 		return ret;
On 01/07/2021 21:24, Martin Peres wrote: [...]
+		i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+		return;
+	}
+
+	/* Default: enable HuC authentication and GuC submission */
+	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | ENABLE_GUC_SUBMISSION;
This seems to be in contradiction with the GuC submission plan which states:
"Not enabled by default on any current platforms but can be enabled via modparam enable_guc".
I don't believe any current platform gets this point where GuC submission would be enabled by default. The first would be ADL-P which isn't out yet.
Isn't that exactly what the line above does?
In case you missed this crucial part of the review. Please answer the above question.
Cheers, Martin
On 02.07.2021 10:13, Martin Peres wrote:
On 01/07/2021 21:24, Martin Peres wrote: [...]
+		i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+		return;
+	}
+
+	/* Default: enable HuC authentication and GuC submission */
+	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | ENABLE_GUC_SUBMISSION;
This seems to be in contradiction with the GuC submission plan which states:
"Not enabled by default on any current platforms but can be enabled via modparam enable_guc".
I don't believe any current platform gets this point where GuC submission would be enabled by default. The first would be ADL-P which isn't out yet.
Isn't that exactly what the line above does?
In case you missed this crucial part of the review. Please answer the above question.
I guess there is some misunderstanding here, and I must admit I had similar doubt, but if you look beyond patch diff and check function code you will find that the very condition is:
	/* Don't enable GuC/HuC on pre-Gen12 */
	if (GRAPHICS_VER(i915) < 12) {
		i915->params.enable_guc = 0;
		return;
	}
so all pre-Gen12 platforms will continue to have GuC/HuC disabled.
Thanks, Michal
On 02/07/2021 16:06, Michal Wajdeczko wrote:
On 02.07.2021 10:13, Martin Peres wrote:
On 01/07/2021 21:24, Martin Peres wrote: [...]
+		i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+		return;
+	}
+
+	/* Default: enable HuC authentication and GuC submission */
+	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | ENABLE_GUC_SUBMISSION;
This seems to be in contradiction with the GuC submission plan which states:
"Not enabled by default on any current platforms but can be enabled via modparam enable_guc".
I don't believe any current platform gets this point where GuC submission would be enabled by default. The first would be ADL-P which isn't out yet.
Isn't that exactly what the line above does?
In case you missed this crucial part of the review. Please answer the above question.
I guess there is some misunderstanding here, and I must admit I had similar doubt, but if you look beyond patch diff and check function code you will find that the very condition is:
	/* Don't enable GuC/HuC on pre-Gen12 */
	if (GRAPHICS_VER(i915) < 12) {
		i915->params.enable_guc = 0;
		return;
	}
so all pre-Gen12 platforms will continue to have GuC/HuC disabled.
Thanks Michal, but then the problem is the other way: how can one enable it on gen11?
I like what Daniele was going for here: separating the capability from the user-requested value, but it seems the patch stopped halfway. How about never touching the parameter, and having an AND between the two values to get the effective enable_guc?
Right now, the code is really confusing :s
Thanks, Martin
Thanks, Michal
On 02.07.2021 15:12, Martin Peres wrote:
On 02/07/2021 16:06, Michal Wajdeczko wrote:
On 02.07.2021 10:13, Martin Peres wrote:
On 01/07/2021 21:24, Martin Peres wrote: [...]
> +		i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
> +		return;
> +	}
> +
> +	/* Default: enable HuC authentication and GuC submission */
> +	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | ENABLE_GUC_SUBMISSION;
This seems to be in contradiction with the GuC submission plan which states:
"Not enabled by default on any current platforms but can be enabled via modparam enable_guc".
I don't believe any current platform gets this point where GuC submission would be enabled by default. The first would be ADL-P which isn't out yet.
Isn't that exactly what the line above does?
In case you missed this crucial part of the review. Please answer the above question.
I guess there is some misunderstanding here, and I must admit I had similar doubt, but if you look beyond patch diff and check function code you will find that the very condition is:
	/* Don't enable GuC/HuC on pre-Gen12 */
	if (GRAPHICS_VER(i915) < 12) {
		i915->params.enable_guc = 0;
		return;
	}
so all pre-Gen12 platforms will continue to have GuC/HuC disabled.
Thanks Michal, but then the problem is the other way: how can one enable it on gen11?
This code converts the default GuC auto mode (enable_guc=-1) into the per-platform desired (tested) GuC/HuC enables.
To override that default, you may still use enable_guc=1 to explicitly enable GuC submission, and since we also have this code:
+static bool __guc_submission_supported(struct intel_guc *guc)
+{
+	/* GuC submission is unavailable for pre-Gen11 */
+	return intel_guc_is_supported(guc) &&
+	       INTEL_GEN(guc_to_gt(guc)->i915) >= 11;
+}
it should work on any Gen11+.
Michal
I like what Daniele was going for here: separating the capability from the user-requested value, but it seems the patch stopped halfway. How about never touching the parameter, and having an AND between the two values to get the effective enable_guc?
Right now, the code is really confusing :s
Thanks, Martin
Thanks, Michal
On 6/30/2021 01:22, Martin Peres wrote:
On 24/06/2021 10:05, Matthew Brost wrote:
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
Unblock GuC submission on Gen11+ platforms.
Signed-off-by: Michal Wajdeczko michal.wajdeczko@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
 drivers/gpu/drm/i915/gt/uc/intel_guc.h            |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 ++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h |  3 +--
 drivers/gpu/drm/i915/gt/uc/intel_uc.c             | 14 +++++++++-----
 4 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index fae01dc8e1b9..77981788204f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -54,6 +54,7 @@ struct intel_guc {
 	struct ida guc_ids;
 	struct list_head guc_id_list;
 
+	bool submission_supported;
 	bool submission_selected;
 
 	struct i915_vma *ads_vma;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a427336ce916..405339202280 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2042,6 +2042,13 @@ void intel_guc_submission_disable(struct intel_guc *guc)
 	/* Note: By the time we're here, GuC may have already been reset */
 }
 
+static bool __guc_submission_supported(struct intel_guc *guc)
+{
+	/* GuC submission is unavailable for pre-Gen11 */
+	return intel_guc_is_supported(guc) &&
+	       INTEL_GEN(guc_to_gt(guc)->i915) >= 11;
+}
+
 static bool __guc_submission_selected(struct intel_guc *guc)
 {
 	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
@@ -2054,6 +2061,7 @@ static bool __guc_submission_selected(struct intel_guc *guc)
 
 void intel_guc_submission_init_early(struct intel_guc *guc)
 {
+	guc->submission_supported = __guc_submission_supported(guc);
 	guc->submission_selected = __guc_submission_selected(guc);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index a2a3fad72be1..be767eb6ff71 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -37,8 +37,7 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
 
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
-	/* XXX: GuC submission is unavailable for now */
-	return false;
+	return guc->submission_supported;
 }
 
 static inline bool intel_guc_submission_is_wanted(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 7a69c3c027e9..61be0aa81492 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -34,8 +34,15 @@ static void uc_expand_default_options(struct intel_uc *uc)
 		return;
 	}
 
-	/* Default: enable HuC authentication only */
-	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+	/* Intermediate platforms are HuC authentication only */
+	if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
+		drm_dbg(&i915->drm, "Disabling GuC only due to old platform\n");
This comment does not seem accurate, given that DG1 is barely out, and ADL is not out yet. How about:
"Disabling GuC on untested platforms"?
Just because something is not in the shops yet does not mean it is new. Technology is always obsolete by the time it goes on sale.
And the issue is not a lack of testing, it is a question of whether we are allowed to change the default on something that has already started being used by customers or not (including pre-release beta customers). I.e. it is basically a political decision not an engineering decision.
+		i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+		return;
+	}
+
+	/* Default: enable HuC authentication and GuC submission */
+	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | ENABLE_GUC_SUBMISSION;
This seems to be in contradiction with the GuC submission plan which states:
"Not enabled by default on any current platforms but can be enabled via modparam enable_guc".
All current platforms have already been explicitly tested for above. This is setting the default on newer platforms - ADL-P and later. For which the official expectation is to have GuC enabled.
When you rework the patch, could you please add a warning when the user force-enables the GuC Command Submission?
There already is one. If you set the module parameter then the kernel is tainted. That means 'here be dragons': you have done something officially unsupported to your kernel, so all bets are off; if it blows up, it is your own problem.
Something like:
"WARNING: The user force-enabled the experimental GuC command submission backend using i915.enable_guc. Please disable it if experiencing stability issues. No bug reports will be accepted on this backend".
This should allow you to work on the backend, while communicating clearly to users that it is not ready just yet. Once it has matured, the warning can be removed.
The fact that ADL-P is not on the shelves in your local retail store should be sufficient to ensure that users are aware that ADL-P support is not entirely mature yet. And in many ways, not just GuC based submission.
John.
Cheers, Martin
 	}
 
 	/* Reset GuC providing us with fresh state for both GuC and HuC.
@@ -313,9 +320,6 @@ static int __uc_init(struct intel_uc *uc)
 	if (i915_inject_probe_failure(uc_to_gt(uc)->i915))
 		return -ENOMEM;
 
-	/* XXX: GuC submission is unavailable for now */
-	GEM_BUG_ON(intel_uc_uses_guc_submission(uc));
-
 	ret = intel_guc_init(guc);
 	if (ret)
 		return ret;
On Wed, 30 Jun 2021 11:58:25 -0700 John Harrison john.c.harrison@intel.com wrote:
On 6/30/2021 01:22, Martin Peres wrote:
On 24/06/2021 10:05, Matthew Brost wrote:
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
Unblock GuC submission on Gen11+ platforms.
Signed-off-by: Michal Wajdeczko michal.wajdeczko@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
 drivers/gpu/drm/i915/gt/uc/intel_guc.h            |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 ++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h |  3 +--
 drivers/gpu/drm/i915/gt/uc/intel_uc.c             | 14 +++++++++-----
 4 files changed, 19 insertions(+), 7 deletions(-)
...
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 7a69c3c027e9..61be0aa81492 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -34,8 +34,15 @@ static void uc_expand_default_options(struct intel_uc *uc)
 		return;
 	}
 
-	/* Default: enable HuC authentication only */
-	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+	/* Intermediate platforms are HuC authentication only */
+	if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
+		drm_dbg(&i915->drm, "Disabling GuC only due to old platform\n");
This comment does not seem accurate, given that DG1 is barely out, and ADL is not out yet. How about:
"Disabling GuC on untested platforms"?
Just because something is not in the shops yet does not mean it is new. Technology is always obsolete by the time it goes on sale.
That is a very good reason to not use terminology like "new", "old", "current", "modern" etc. at all.
End users like me definitely do not share your interpretation of "old".
Thanks, pq
And the issue is not a lack of testing, it is a question of whether we are allowed to change the default on something that has already started being used by customers or not (including pre-release beta customers). I.e. it is basically a political decision not an engineering decision.
On 01/07/2021 11:14, Pekka Paalanen wrote:
On Wed, 30 Jun 2021 11:58:25 -0700 John Harrison john.c.harrison@intel.com wrote:
On 6/30/2021 01:22, Martin Peres wrote:
On 24/06/2021 10:05, Matthew Brost wrote:
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
Unblock GuC submission on Gen11+ platforms.
Signed-off-by: Michal Wajdeczko michal.wajdeczko@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
 drivers/gpu/drm/i915/gt/uc/intel_guc.h            |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 ++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h |  3 +--
 drivers/gpu/drm/i915/gt/uc/intel_uc.c             | 14 +++++++++-----
 4 files changed, 19 insertions(+), 7 deletions(-)
...
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 7a69c3c027e9..61be0aa81492 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -34,8 +34,15 @@ static void uc_expand_default_options(struct intel_uc *uc)
 		return;
 	}
 
-	/* Default: enable HuC authentication only */
-	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+	/* Intermediate platforms are HuC authentication only */
+	if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
+		drm_dbg(&i915->drm, "Disabling GuC only due to old platform\n");
This comment does not seem accurate, given that DG1 is barely out, and ADL is not out yet. How about:
"Disabling GuC on untested platforms"?
Just because something is not in the shops yet does not mean it is new. Technology is always obsolete by the time it goes on sale.
That is a very good reason to not use terminology like "new", "old", "current", "modern" etc. at all.
End users like me definitely do not share your interpretation of "old".
Yep, old and new is relative. In the end, what matters is the validation effort, which is why I was proposing "untested platforms".
Also, remember that you are not writing these messages for Intel engineers, but instead are writing for Linux *users*.
Cheers, Martin
Thanks, pq
And the issue is not a lack of testing, it is a question of whether we are allowed to change the default on something that has already started being used by customers or not (including pre-release beta customers). I.e. it is basically a political decision not an engineering decision.
On Thu, Jul 1, 2021 at 8:27 PM Martin Peres martin.peres@free.fr wrote:
On 01/07/2021 11:14, Pekka Paalanen wrote:
On Wed, 30 Jun 2021 11:58:25 -0700 John Harrison john.c.harrison@intel.com wrote:
On 6/30/2021 01:22, Martin Peres wrote:
On 24/06/2021 10:05, Matthew Brost wrote:
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
Unblock GuC submission on Gen11+ platforms.
Signed-off-by: Michal Wajdeczko michal.wajdeczko@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
 drivers/gpu/drm/i915/gt/uc/intel_guc.h            |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 ++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h |  3 +--
 drivers/gpu/drm/i915/gt/uc/intel_uc.c             | 14 +++++++++-----
 4 files changed, 19 insertions(+), 7 deletions(-)
...
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 7a69c3c027e9..61be0aa81492 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -34,8 +34,15 @@ static void uc_expand_default_options(struct intel_uc *uc)
 		return;
 	}
 
-	/* Default: enable HuC authentication only */
-	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+	/* Intermediate platforms are HuC authentication only */
+	if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
+		drm_dbg(&i915->drm, "Disabling GuC only due to old platform\n");
This comment does not seem accurate, given that DG1 is barely out, and ADL is not out yet. How about:
"Disabling GuC on untested platforms"?
Just because something is not in the shops yet does not mean it is new. Technology is always obsolete by the time it goes on sale.
That is a very good reason to not use terminology like "new", "old", "current", "modern" etc. at all.
End users like me definitely do not share your interpretation of "old".
Yep, old and new is relative. In the end, what matters is the validation effort, which is why I was proposing "untested platforms".
Also, remember that you are not writing these messages for Intel engineers, but instead are writing for Linux *users*.
It's drm_dbg. Users don't read this stuff, at least not users with no clue what the driver does and stuff like that. -Daniel
On Thu, 1 Jul 2021 21:28:06 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Thu, Jul 1, 2021 at 8:27 PM Martin Peres martin.peres@free.fr wrote:
On 01/07/2021 11:14, Pekka Paalanen wrote:
On Wed, 30 Jun 2021 11:58:25 -0700 John Harrison john.c.harrison@intel.com wrote:
On 6/30/2021 01:22, Martin Peres wrote:
On 24/06/2021 10:05, Matthew Brost wrote:
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
Unblock GuC submission on Gen11+ platforms.
Signed-off-by: Michal Wajdeczko michal.wajdeczko@intel.com Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com
 drivers/gpu/drm/i915/gt/uc/intel_guc.h            |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 ++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h |  3 +--
 drivers/gpu/drm/i915/gt/uc/intel_uc.c             | 14 +++++++++-----
 4 files changed, 19 insertions(+), 7 deletions(-)
...
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 7a69c3c027e9..61be0aa81492 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -34,8 +34,15 @@ static void uc_expand_default_options(struct intel_uc *uc)
 		return;
 	}
 
-	/* Default: enable HuC authentication only */
-	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+	/* Intermediate platforms are HuC authentication only */
+	if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
+		drm_dbg(&i915->drm, "Disabling GuC only due to old platform\n");
This comment does not seem accurate, given that DG1 is barely out, and ADL is not out yet. How about:
"Disabling GuC on untested platforms"?
Just because something is not in the shops yet does not mean it is new. Technology is always obsolete by the time it goes on sale.
That is a very good reason to not use terminology like "new", "old", "current", "modern" etc. at all.
End users like me definitely do not share your interpretation of "old".
Yep, old and new is relative. In the end, what matters is the validation effort, which is why I was proposing "untested platforms".
Also, remember that you are not writing these messages for Intel engineers, but instead are writing for Linux *users*.
It's drm_dbg. Users don't read this stuff, at least not users with no clue what the driver does and stuff like that.
If I had a problem, I would read it, and I have no clue what anything of that is.
Thanks, pq
On 02/07/2021 10:29, Pekka Paalanen wrote:
On Thu, 1 Jul 2021 21:28:06 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Thu, Jul 1, 2021 at 8:27 PM Martin Peres martin.peres@free.fr wrote:
On 01/07/2021 11:14, Pekka Paalanen wrote:
On Wed, 30 Jun 2021 11:58:25 -0700 John Harrison john.c.harrison@intel.com wrote:
On 6/30/2021 01:22, Martin Peres wrote:
On 24/06/2021 10:05, Matthew Brost wrote:
>> From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
>>
>> Unblock GuC submission on Gen11+ platforms.
>>
>> Signed-off-by: Michal Wajdeczko michal.wajdeczko@intel.com
>> Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
>> Signed-off-by: Matthew Brost matthew.brost@intel.com
>> ---
>>  drivers/gpu/drm/i915/gt/uc/intel_guc.h            |  1 +
>>  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 ++++++++
>>  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h |  3 +--
>>  drivers/gpu/drm/i915/gt/uc/intel_uc.c             | 14 +++++++++-----
>>  4 files changed, 19 insertions(+), 7 deletions(-)
>>
...
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index 7a69c3c027e9..61be0aa81492 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -34,8 +34,15 @@ static void uc_expand_default_options(struct
> intel_uc *uc)
>  		return;
>  	}
>
> -	/* Default: enable HuC authentication only */
> -	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
> +	/* Intermediate platforms are HuC authentication only */
> +	if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
> +		drm_dbg(&i915->drm, "Disabling GuC only due to old
> platform\n");
This comment does not seem accurate, given that DG1 is barely out, and ADL is not out yet. How about:
"Disabling GuC on untested platforms"?
Just because something is not in the shops yet does not mean it is new. Technology is always obsolete by the time it goes on sale.
That is a very good reason to not use terminology like "new", "old", "current", "modern" etc. at all.
End users like me definitely do not share your interpretation of "old".
Yep, old and new is relative. In the end, what matters is the validation effort, which is why I was proposing "untested platforms".
Also, remember that you are not writing these messages for Intel engineers, but instead are writing for Linux *users*.
It's drm_dbg. Users don't read this stuff, at least not users with no clue what the driver does and stuff like that.
If I had a problem, I would read it, and I have no clue what anything of that is.
Exactly.
This level of defense for what is clearly a bad *debug* message (at the very least, the grammar) makes no sense at all!
I don't want to hear arguments like "not my patch" from a developer who literally sent the patch to the ML and added his SoB to it, nor word games, nor minimizing the problem of having such a message.
All of the above are just clear signals for the community to get off your playground, which is frankly unacceptable. Your email address does not matter.
In the spirit of collaboration, your response should have been "Good catch, how about XXXX or YYYY?". This would not have wasted everyone's time in an attempt to just have it your way.
My level of confidence in this GuC transition was already low, but you guys are working hard to shoot yourself in the foot. Trust should be earned!
Martin
Thanks, pq
On 02.07.2021 10:09, Martin Peres wrote:
On 02/07/2021 10:29, Pekka Paalanen wrote:
On Thu, 1 Jul 2021 21:28:06 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Thu, Jul 1, 2021 at 8:27 PM Martin Peres martin.peres@free.fr wrote:
On 01/07/2021 11:14, Pekka Paalanen wrote:
On Wed, 30 Jun 2021 11:58:25 -0700 John Harrison john.c.harrison@intel.com wrote:
On 6/30/2021 01:22, Martin Peres wrote:
> On 24/06/2021 10:05, Matthew Brost wrote:
>> From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
>>
>> Unblock GuC submission on Gen11+ platforms.
>>
>> Signed-off-by: Michal Wajdeczko michal.wajdeczko@intel.com
>> Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
>> Signed-off-by: Matthew Brost matthew.brost@intel.com
>> ---
>>  drivers/gpu/drm/i915/gt/uc/intel_guc.h            |  1 +
>>  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 ++++++++
>>  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h |  3 +--
>>  drivers/gpu/drm/i915/gt/uc/intel_uc.c             | 14 +++++++++-----
>>  4 files changed, 19 insertions(+), 7 deletions(-)
>>
...
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> index 7a69c3c027e9..61be0aa81492 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> @@ -34,8 +34,15 @@ static void uc_expand_default_options(struct
>> intel_uc *uc)
>>  		return;
>>  	}
>>
>> -	/* Default: enable HuC authentication only */
>> -	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
>> +	/* Intermediate platforms are HuC authentication only */
>> +	if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
>> +		drm_dbg(&i915->drm, "Disabling GuC only due to old
>> platform\n");
>
> This comment does not seem accurate, given that DG1 is barely out, and
> ADL is not out yet. How about:
>
> "Disabling GuC on untested platforms"?
>
Just because something is not in the shops yet does not mean it is new.
Technology is always obsolete by the time it goes on sale.
That is a very good reason to not use terminology like "new", "old", "current", "modern" etc. at all.
End users like me definitely do not share your interpretation of "old".
Yep, old and new is relative. In the end, what matters is the validation effort, which is why I was proposing "untested platforms".
Also, remember that you are not writing these messages for Intel engineers, but instead are writing for Linux *users*.
It's drm_dbg. Users don't read this stuff, at least not users with no clue what the driver does and stuff like that.
If I had a problem, I would read it, and I have no clue what anything of that is.
Exactly.
This level of defense for what is clearly a bad *debug* message (at the very least, the grammar) makes no sense at all!
I don't want to hear arguments like "not my patch" from a developer who literally sent the patch to the ML and added his SoB to it, nor word games, nor minimizing the problem of having such a message.
Agree that 'not my patch' is never a good excuse, but equally we can't blame the original patch author, as the patch has been updated a few times since then.
Maybe, to avoid confusion and simplify reviews, we could split this patch into two smaller ones: a first that really unblocks GuC submission on all Gen11+ (see __guc_submission_supported) and a second that updates the defaults for Gen12+ (see uc_expand_default_options), as the original patch (from ~2019) has evolved beyond what its title/commit message says.
Then we can fix all messaging and make sure it's clear and understood.
Thanks, Michal
All of the above are just clear signals for the community to get off your playground, which is frankly unacceptable. Your email address does not matter.
In the spirit of collaboration, your response should have been "Good catch, how about XXXX or YYYY?". This would not have wasted everyone's time in an attempt to just have it your way.
My level of confidence in this GuC transition was already low, but you guys are working hard to shoot yourself in the foot. Trust should be earned!
Martin
Thanks, pq
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On 02/07/2021 18:07, Michal Wajdeczko wrote:
On 02.07.2021 10:09, Martin Peres wrote:
On 02/07/2021 10:29, Pekka Paalanen wrote:
On Thu, 1 Jul 2021 21:28:06 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Thu, Jul 1, 2021 at 8:27 PM Martin Peres martin.peres@free.fr wrote:
On 01/07/2021 11:14, Pekka Paalanen wrote:
On Wed, 30 Jun 2021 11:58:25 -0700 John Harrison john.c.harrison@intel.com wrote:
> On 6/30/2021 01:22, Martin Peres wrote:
>> On 24/06/2021 10:05, Matthew Brost wrote:
>>> From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
>>>
>>> Unblock GuC submission on Gen11+ platforms.
>>>
>>> Signed-off-by: Michal Wajdeczko michal.wajdeczko@intel.com
>>> Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
>>> Signed-off-by: Matthew Brost matthew.brost@intel.com
>>> ---
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc.h            |  1 +
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 ++++++++
>>>  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h |  3 +--
>>>  drivers/gpu/drm/i915/gt/uc/intel_uc.c             | 14 +++++++++-----
>>>  4 files changed, 19 insertions(+), 7 deletions(-)
>>>
...
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>>> b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>>> index 7a69c3c027e9..61be0aa81492 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>>> @@ -34,8 +34,15 @@ static void uc_expand_default_options(struct
>>> intel_uc *uc)
>>>  		return;
>>>  	}
>>>
>>> -	/* Default: enable HuC authentication only */
>>> -	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
>>> +	/* Intermediate platforms are HuC authentication only */
>>> +	if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
>>> +		drm_dbg(&i915->drm, "Disabling GuC only due to old
>>> platform\n");
>>
>> This comment does not seem accurate, given that DG1 is barely out, and
>> ADL is not out yet. How about:
>>
>> "Disabling GuC on untested platforms"?
>>
> Just because something is not in the shops yet does not mean it is new.
> Technology is always obsolete by the time it goes on sale.
That is a very good reason to not use terminology like "new", "old", "current", "modern" etc. at all.
End users like me definitely do not share your interpretation of "old".
Yep, old and new is relative. In the end, what matters is the validation effort, which is why I was proposing "untested platforms".
Also, remember that you are not writing these messages for Intel engineers, but instead are writing for Linux *users*.
It's drm_dbg. Users don't read this stuff, at least not users with no clue what the driver does and stuff like that.
If I had a problem, I would read it, and I have no clue what anything of that is.
Exactly.
This level of defense for what is clearly a bad *debug* message (at the very least, the grammar) makes no sense at all!
I don't want to hear arguments like "Not my patch" from a developer literally sending the patch to the ML and who added his SoB to the patch, playing with words, or minimizing the problem of having such a message.
Agree that 'not my patch' is never a good excuse, but equally we can't blame original patch author as patch was updated few times since then.
I never wanted to blame the author here, I was only speaking about the handling of feedback on the patch.
Maybe to avoid confusions and simplify reviews, we could split this patch into two smaller: first one that really unblocks GuC submission on all Gen11+ (see __guc_submission_supported) and second one that updates defaults for Gen12+ (see uc_expand_default_options), as original patch (from ~2019) evolved more than what title/commit message says.
Both work for me, as long as it is a collaborative effort.
Cheers, Martin
Then we can fix all messaging and make sure it's clear and understood.
Thanks, Michal
All of the above are just clear signals for the community to get off your playground, which is frankly unacceptable. Your email address does not matter.
In the spirit of collaboration, your response should have been "Good catch, how about XXXX or YYYY?". This would not have wasted everyone's time in an attempt to just have it your way.
My level of confidence in this GuC transition was already low, but you guys are working hard to shoot yourself in the foot. Trust should be earned!
Martin
Thanks, pq
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On 7/3/2021 01:21, Martin Peres wrote:
On 02/07/2021 18:07, Michal Wajdeczko wrote:
On 02.07.2021 10:09, Martin Peres wrote:
On 02/07/2021 10:29, Pekka Paalanen wrote:
On Thu, 1 Jul 2021 21:28:06 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Thu, Jul 1, 2021 at 8:27 PM Martin Peres martin.peres@free.fr wrote:
On 01/07/2021 11:14, Pekka Paalanen wrote:
[snip: quoted patch, the "untested platforms" suggestion, and replies, unchanged from earlier in the thread]
I don't see how replacing 'old' for 'untested' helps anybody to understand anything. Untested just implies we can't be bothered to test stuff before publishing it. And as previously stated, this is purely a political decision not a technical one. Sure, change the message to be 'Disabling GuC submission but enabling HuC loading via GuC on platform XXX' if that makes it clearer what is going on. Or just drop the message completely. It's simply explaining what the default option is for the current platform which you can also get by reading the code. However, I disagree that 'untested' is the correct message. Quite a lot of testing has been happening on TGL+ with GuC submission enabled.
[snip]
Maybe to avoid confusions and simplify reviews, we could split this patch into two smaller: first one that really unblocks GuC submission on all Gen11+ (see __guc_submission_supported) and second one that updates defaults for Gen12+ (see uc_expand_default_options), as original patch (from ~2019) evolved more than what title/commit message says.
Both work for me, as long as it is a collaborative effort.
I'm not seeing how splitting the patch up fixes the complaints about the debug message.
And to be clear, no-one is actually arguing for a code change as such? The issue is just about the text of the debug message? Or did I miss something somewhere?
John.
On Tue, 6 Jul 2021 17:57:35 -0700 John Harrison john.c.harrison@intel.com wrote:
On 7/3/2021 01:21, Martin Peres wrote:
On 02/07/2021 18:07, Michal Wajdeczko wrote:
On 02.07.2021 10:09, Martin Peres wrote:
On 02/07/2021 10:29, Pekka Paalanen wrote:
On Thu, 1 Jul 2021 21:28:06 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Thu, Jul 1, 2021 at 8:27 PM Martin Peres martin.peres@free.fr wrote:
[snip: quoted patch and earlier discussion, unchanged from earlier in the thread]
I don't see how replacing 'old' for 'untested' helps anybody to understand anything. Untested just implies we can't be bothered to test stuff before publishing it. And as previously stated, this is purely a political decision not a technical one. Sure, change the message to be 'Disabling GuC submission but enabling HuC loading via GuC on platform XXX' if that makes it clearer what is going on. Or just drop the message completely. It's simply explaining what the default option is for the current platform which you can also get by reading the code. However, I disagree that 'untested' is the correct message. Quite a lot of testing has been happening on TGL+ with GuC submission enabled.
Hi,
it seems to me that "untested" was just a wrong guess, nothing more. It was presented with "how about?", not as an exact demand.
You don't have to attack that word, just use another phrasing that is both correct and not misleading to the majority of tech savvy people.
Thanks, pq
On 07.07.2021 02:57, John Harrison wrote:
On 7/3/2021 01:21, Martin Peres wrote:
On 02/07/2021 18:07, Michal Wajdeczko wrote:
On 02.07.2021 10:09, Martin Peres wrote:
On 02/07/2021 10:29, Pekka Paalanen wrote:
On Thu, 1 Jul 2021 21:28:06 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Thu, Jul 1, 2021 at 8:27 PM Martin Peres martin.peres@free.fr wrote:
[snip: quoted patch and earlier discussion, unchanged from earlier in the thread]
Maybe to avoid confusions and simplify reviews, we could split this patch into two smaller: first one that really unblocks GuC submission on all Gen11+ (see __guc_submission_supported) and second one that updates defaults for Gen12+ (see uc_expand_default_options), as original patch (from ~2019) evolved more than what title/commit message says.
Both work for me, as long as it is a collaborative effort.
I'm not seeing how splitting the patch up fixes the complaints about the debug message.
I assume it's not just about debug message (but still related)
With separate patches you can explain in commit messages:
patch1: why (from technical point of view) we unblock GuC submission only for Gen11+ (as pre-Gen11 are also using the same GuC firmware so one can assume GuC submission will work there too),
patch2: why (from "political" point of view) we want to turn on GuC submission by default only on selected Gen12+ platforms (as one could wonder why we don't enable GuC submission for Gen11+ since it should work there too).
Then it should be easy to find proper wording for any debug message we may want to add.
And to be clear, no-one is actually arguing for a code change as such? The issue is just about the text of the debug message? Or did I miss something somewhere?
The change itself is trivial, so it is hard to complain; what is missing, IMHO, is a good rationale for why we are making GuC submission enabling so selective.
Michal
On Thu, Jun 24, 2021 at 12:05:16AM -0700, Matthew Brost wrote:
From: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
Unblock GuC submission on Gen11+ platforms.
Signed-off-by: Michal Wajdeczko michal.wajdeczko@intel.com
Signed-off-by: Daniele Ceraolo Spurio daniele.ceraolospurio@intel.com
Signed-off-by: Matthew Brost matthew.brost@intel.com
Updating debug message per feedback, with that: Reviewed-by: Matthew Brost matthew.brost@intel.com
 drivers/gpu/drm/i915/gt/uc/intel_guc.h            |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 ++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h |  3 +--
 drivers/gpu/drm/i915/gt/uc/intel_uc.c             | 14 +++++++++-----
 4 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index fae01dc8e1b9..77981788204f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -54,6 +54,7 @@ struct intel_guc {
 	struct ida guc_ids;
 	struct list_head guc_id_list;
 
+	bool submission_supported;
 	bool submission_selected;
 
 	struct i915_vma *ads_vma;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a427336ce916..405339202280 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2042,6 +2042,13 @@ void intel_guc_submission_disable(struct intel_guc *guc)
 	/* Note: By the time we're here, GuC may have already been reset */
 }
 
+static bool __guc_submission_supported(struct intel_guc *guc)
+{
+	/* GuC submission is unavailable for pre-Gen11 */
+	return intel_guc_is_supported(guc) &&
+	       INTEL_GEN(guc_to_gt(guc)->i915) >= 11;
+}
+
 static bool __guc_submission_selected(struct intel_guc *guc)
 {
 	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
@@ -2054,6 +2061,7 @@ static bool __guc_submission_selected(struct intel_guc *guc)
 
 void intel_guc_submission_init_early(struct intel_guc *guc)
 {
+	guc->submission_supported = __guc_submission_supported(guc);
 	guc->submission_selected = __guc_submission_selected(guc);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index a2a3fad72be1..be767eb6ff71 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -37,8 +37,7 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
 
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
-	/* XXX: GuC submission is unavailable for now */
-	return false;
+	return guc->submission_supported;
 }
 
 static inline bool intel_guc_submission_is_wanted(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 7a69c3c027e9..61be0aa81492 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -34,8 +34,15 @@ static void uc_expand_default_options(struct intel_uc *uc)
 		return;
 	}
 
-	/* Default: enable HuC authentication only */
-	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+	/* Intermediate platforms are HuC authentication only */
+	if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
+		drm_dbg(&i915->drm, "Disabling GuC only due to old platform\n");
+		i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+		return;
+	}
+
+	/* Default: enable HuC authentication and GuC submission */
+	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | ENABLE_GUC_SUBMISSION;
 }
 
 /* Reset GuC providing us with fresh state for both GuC and HuC.
@@ -313,9 +320,6 @@ static int __uc_init(struct intel_uc *uc)
 	if (i915_inject_probe_failure(uc_to_gt(uc)->i915))
 		return -ENOMEM;
 
-	/* XXX: GuC submission is unavailable for now */
-	GEM_BUG_ON(intel_uc_uses_guc_submission(uc));
-
 	ret = intel_guc_init(guc);
 	if (ret)
 		return ret;
-- 2.28.0
Hi Matt & John,
Can you please queue patches with the right Fixes: references to convert all the GuC tracepoints to be protected by the LOW_LEVEL_TRACEPOINTS protection for now. Please do so before next Wednesday so we get it queued in drm-intel-next-fixes.
There's the orthogonal track to discuss what would be the stable set of tracepoints we could expose. However, before that discussion is closed, let's keep a rather strict line to avoid potential maintenance burned.
We can then relax in the future as needed.
Regards, Joonas
Quoting Matthew Brost (2021-06-24 10:04:29)
As discussed in [1], [2] we are enabling GuC submission support in the i915. This is a subset of the patches in step 5 described in [1]; basically it is the minimum required to enable CI with GuC submission on gen11+ platforms.
This series itself will likely be broken down into smaller patch sets to merge. Likely into CTBs changes, basic submission, virtual engines, and resets.
A following series will address the missing patches remaining from [1].
Locally tested on TGL machine and basic tests seem to be passing.
Signed-off-by: Matthew Brost matthew.brost@intel.com
[1] https://patchwork.freedesktop.org/series/89844/ [2] https://patchwork.freedesktop.org/series/91417/
Daniele Ceraolo Spurio (1):
  drm/i915/guc: Unblock GuC submission on Gen11+
John Harrison (10):
  drm/i915/guc: Module load failure test for CT buffer creation
  drm/i915: Track 'serial' counts for virtual engines
  drm/i915/guc: Provide mmio list to be saved/restored on engine reset
  drm/i915/guc: Don't complain about reset races
  drm/i915/guc: Enable GuC engine reset
  drm/i915/guc: Fix for error capture after full GPU reset with GuC
  drm/i915/guc: Hook GuC scheduling policies up
  drm/i915/guc: Connect reset modparam updates to GuC policy flags
  drm/i915/guc: Include scheduling policies in the debugfs state dump
  drm/i915/guc: Add golden context to GuC ADS
Matthew Brost (36):
  drm/i915/guc: Relax CTB response timeout
  drm/i915/guc: Improve error message for unsolicited CT response
  drm/i915/guc: Increase size of CTB buffers
  drm/i915/guc: Add non blocking CTB send function
  drm/i915/guc: Add stall timer to non blocking CTB send function
  drm/i915/guc: Optimize CTB writes and reads
  drm/i915/guc: Add new GuC interface defines and structures
  drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor
  drm/i915/guc: Add lrc descriptor context lookup array
  drm/i915/guc: Implement GuC submission tasklet
  drm/i915/guc: Add bypass tasklet submission path to GuC
  drm/i915/guc: Implement GuC context operations for new inteface
  drm/i915/guc: Insert fence on context when deregistering
  drm/i915/guc: Defer context unpin until scheduling is disabled
  drm/i915/guc: Disable engine barriers with GuC during unpin
  drm/i915/guc: Extend deregistration fence to schedule disable
  drm/i915: Disable preempt busywait when using GuC scheduling
  drm/i915/guc: Ensure request ordering via completion fences
  drm/i915/guc: Disable semaphores when using GuC scheduling
  drm/i915/guc: Ensure G2H response has space in buffer
  drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
  drm/i915/guc: Update GuC debugfs to support new GuC
  drm/i915/guc: Add several request trace points
  drm/i915: Add intel_context tracing
  drm/i915/guc: GuC virtual engines
  drm/i915: Hold reference to intel_context over life of i915_request
  drm/i915/guc: Disable bonding extension with GuC submission
  drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
  drm/i915/guc: Reset implementation for new GuC interface
  drm/i915: Reset GPU immediately if submission is disabled
  drm/i915/guc: Add disable interrupts to guc sanitize
  drm/i915/guc: Suspend/resume implementation for new interface
  drm/i915/guc: Handle context reset notification
  drm/i915/guc: Handle engine reset failure notification
  drm/i915/guc: Enable the timer expired interrupt for GuC
  drm/i915/guc: Capture error state on context reset
 drivers/gpu/drm/i915/gem/i915_gem_context.c        |   30 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.h        |    1 +
 drivers/gpu/drm/i915/gem/i915_gem_mman.c           |    3 +-
 drivers/gpu/drm/i915/gt/gen8_engine_cs.c           |    6 +-
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c        |   41 +-
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.h        |   14 +-
 .../gpu/drm/i915/gt/intel_breadcrumbs_types.h      |    7 +
 drivers/gpu/drm/i915/gt/intel_context.c            |   41 +-
 drivers/gpu/drm/i915/gt/intel_context.h            |   31 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h      |   49 +
 drivers/gpu/drm/i915/gt/intel_engine.h             |   72 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c          |  182 +-
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c       |   71 +-
 .../gpu/drm/i915/gt/intel_engine_heartbeat.h       |    4 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h       |   12 +-
 .../drm/i915/gt/intel_execlists_submission.c       |  234 +-
 .../drm/i915/gt/intel_execlists_submission.h       |   11 -
 drivers/gpu/drm/i915/gt/intel_gt.c                 |   21 +
 drivers/gpu/drm/i915/gt/intel_gt.h                 |    2 +
 drivers/gpu/drm/i915/gt/intel_gt_pm.c              |    6 +-
 drivers/gpu/drm/i915/gt/intel_gt_requests.c        |   22 +-
 drivers/gpu/drm/i915/gt/intel_gt_requests.h        |    9 +-
 drivers/gpu/drm/i915/gt/intel_lrc_reg.h            |    1 -
 drivers/gpu/drm/i915/gt/intel_reset.c              |   20 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c        |   28 +
 drivers/gpu/drm/i915/gt/intel_rps.c                |    4 +
 drivers/gpu/drm/i915/gt/intel_workarounds.c        |   46 +-
 .../gpu/drm/i915/gt/intel_workarounds_types.h      |    1 +
 drivers/gpu/drm/i915/gt/mock_engine.c              |   41 +-
 drivers/gpu/drm/i915/gt/selftest_context.c         |   10 +
 drivers/gpu/drm/i915/gt/selftest_execlists.c       |   20 +-
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h       |   15 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c             |   82 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h             |  106 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c         |  460 +++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h         |    3 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c          |  318 ++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h          |   22 +-
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c         |   25 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h        |   88 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c      | 2197 +++++++++++++++--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h      |   17 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.c              |  102 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h              |   11 +
 drivers/gpu/drm/i915/i915_debugfs.c                |    2 +
 drivers/gpu/drm/i915/i915_debugfs_params.c         |   31 +
 drivers/gpu/drm/i915/i915_gem_evict.c              |    1 +
 drivers/gpu/drm/i915/i915_gpu_error.c              |   25 +-
 drivers/gpu/drm/i915/i915_reg.h                    |    2 +
 drivers/gpu/drm/i915/i915_request.c                |  159 +-
 drivers/gpu/drm/i915/i915_request.h                |   21 +
 drivers/gpu/drm/i915/i915_scheduler.c              |    6 +
 drivers/gpu/drm/i915/i915_scheduler.h              |    6 +
 drivers/gpu/drm/i915/i915_scheduler_types.h        |    5 +
 drivers/gpu/drm/i915/i915_trace.h                  |  197 +-
 .../gpu/drm/i915/selftests/igt_live_test.c         |    2 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c       |    3 +-
 57 files changed, 4159 insertions(+), 787 deletions(-)
-- 2.28.0
On Fri, Oct 22, 2021 at 12:35:04PM +0300, Joonas Lahtinen wrote:
Hi Matt & John,
Can you please queue patches with the right Fixes: references to convert all the GuC tracepoints to be protected by the LOW_LEVEL_TRACEPOINTS protection for now. Please do so before next Wednesday so we get it queued in drm-intel-next-fixes.
Don't we already do that? I checked i915_trace.h and every tracepoint I added (intel_context class, i915_request_guc_submit) is protected by LOW_LEVEL_TRACEPOINTS.
The only thing I changed outside of that protection is adding the guc_id field to existing i915_request class tracepoints. Without the guc_id in those tracepoints these are basically useless with GuC submission. We could revert that if it is a huge deal but as I said then they are useless...
Matt
There's the orthogonal track to discuss what would be the stable set of tracepoints we could expose. However, before that discussion is closed, let's keep a rather strict line to avoid potential maintenance burned.
We can then relax in the future as needed.
Regards, Joonas
Quoting Matthew Brost (2021-06-24 10:04:29)
As discussed in [1], [2] we are enabling GuC submission support in the i915. This is a subset of the patches in step 5 described in [1], basically it is absolute to enable CI with GuC submission on gen11+ platforms.
This series itself will likely be broken down into smaller patch sets to merge. Likely into CTBs changes, basic submission, virtual engines, and resets.
A following series will address the missing patches remaining from [1].
Locally tested on TGL machine and basic tests seem to be passing.
Signed-off-by: Matthew Brost matthew.brost@intel.com
[1] https://patchwork.freedesktop.org/series/89844/ [2] https://patchwork.freedesktop.org/series/91417/
Daniele Ceraolo Spurio (1): drm/i915/guc: Unblock GuC submission on Gen11+
John Harrison (10): drm/i915/guc: Module load failure test for CT buffer creation drm/i915: Track 'serial' counts for virtual engines drm/i915/guc: Provide mmio list to be saved/restored on engine reset drm/i915/guc: Don't complain about reset races drm/i915/guc: Enable GuC engine reset drm/i915/guc: Fix for error capture after full GPU reset with GuC drm/i915/guc: Hook GuC scheduling policies up drm/i915/guc: Connect reset modparam updates to GuC policy flags drm/i915/guc: Include scheduling policies in the debugfs state dump drm/i915/guc: Add golden context to GuC ADS
Matthew Brost (36):
  drm/i915/guc: Relax CTB response timeout
  drm/i915/guc: Improve error message for unsolicited CT response
  drm/i915/guc: Increase size of CTB buffers
  drm/i915/guc: Add non blocking CTB send function
  drm/i915/guc: Add stall timer to non blocking CTB send function
  drm/i915/guc: Optimize CTB writes and reads
  drm/i915/guc: Add new GuC interface defines and structures
  drm/i915/guc: Remove GuC stage descriptor, add lrc descriptor
  drm/i915/guc: Add lrc descriptor context lookup array
  drm/i915/guc: Implement GuC submission tasklet
  drm/i915/guc: Add bypass tasklet submission path to GuC
  drm/i915/guc: Implement GuC context operations for new interface
  drm/i915/guc: Insert fence on context when deregistering
  drm/i915/guc: Defer context unpin until scheduling is disabled
  drm/i915/guc: Disable engine barriers with GuC during unpin
  drm/i915/guc: Extend deregistration fence to schedule disable
  drm/i915: Disable preempt busywait when using GuC scheduling
  drm/i915/guc: Ensure request ordering via completion fences
  drm/i915/guc: Disable semaphores when using GuC scheduling
  drm/i915/guc: Ensure G2H response has space in buffer
  drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
  drm/i915/guc: Update GuC debugfs to support new GuC
  drm/i915/guc: Add several request trace points
  drm/i915: Add intel_context tracing
  drm/i915/guc: GuC virtual engines
  drm/i915: Hold reference to intel_context over life of i915_request
  drm/i915/guc: Disable bonding extension with GuC submission
  drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
  drm/i915/guc: Reset implementation for new GuC interface
  drm/i915: Reset GPU immediately if submission is disabled
  drm/i915/guc: Add disable interrupts to guc sanitize
  drm/i915/guc: Suspend/resume implementation for new interface
  drm/i915/guc: Handle context reset notification
  drm/i915/guc: Handle engine reset failure notification
  drm/i915/guc: Enable the timer expired interrupt for GuC
  drm/i915/guc: Capture error state on context reset
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   30 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.h   |    1 +
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |    3 +-
 drivers/gpu/drm/i915/gt/gen8_engine_cs.c      |    6 +-
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   |   41 +-
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   |   14 +-
 .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |    7 +
 drivers/gpu/drm/i915/gt/intel_context.c       |   41 +-
 drivers/gpu/drm/i915/gt/intel_context.h       |   31 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |   49 +
 drivers/gpu/drm/i915/gt/intel_engine.h        |   72 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  182 +-
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   71 +-
 .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |    4 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   12 +-
 .../drm/i915/gt/intel_execlists_submission.c  |  234 +-
 .../drm/i915/gt/intel_execlists_submission.h  |   11 -
 drivers/gpu/drm/i915/gt/intel_gt.c            |   21 +
 drivers/gpu/drm/i915/gt/intel_gt.h            |    2 +
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |    6 +-
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   |   22 +-
 drivers/gpu/drm/i915/gt/intel_gt_requests.h   |    9 +-
 drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |    1 -
 drivers/gpu/drm/i915/gt/intel_reset.c         |   20 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |   28 +
 drivers/gpu/drm/i915/gt/intel_rps.c           |    4 +
 drivers/gpu/drm/i915/gt/intel_workarounds.c   |   46 +-
 .../gpu/drm/i915/gt/intel_workarounds_types.h |    1 +
 drivers/gpu/drm/i915/gt/mock_engine.c         |   41 +-
 drivers/gpu/drm/i915/gt/selftest_context.c    |   10 +
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |   20 +-
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |   15 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |   82 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  106 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |  460 +++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h    |    3 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  318 ++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |   22 +-
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    |   25 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   88 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 2197 +++++++++++++++--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   17 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  102 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   11 +
 drivers/gpu/drm/i915/i915_debugfs.c           |    2 +
 drivers/gpu/drm/i915/i915_debugfs_params.c    |   31 +
 drivers/gpu/drm/i915/i915_gem_evict.c         |    1 +
 drivers/gpu/drm/i915/i915_gpu_error.c         |   25 +-
 drivers/gpu/drm/i915/i915_reg.h               |    2 +
 drivers/gpu/drm/i915/i915_request.c           |  159 +-
 drivers/gpu/drm/i915/i915_request.h           |   21 +
 drivers/gpu/drm/i915/i915_scheduler.c         |    6 +
 drivers/gpu/drm/i915/i915_scheduler.h         |    6 +
 drivers/gpu/drm/i915/i915_scheduler_types.h   |    5 +
 drivers/gpu/drm/i915/i915_trace.h             |  197 +-
 .../gpu/drm/i915/selftests/igt_live_test.c    |    2 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |    3 +-
 57 files changed, 4159 insertions(+), 787 deletions(-)
-- 2.28.0
Quoting Matthew Brost (2021-10-22 19:42:19)
On Fri, Oct 22, 2021 at 12:35:04PM +0300, Joonas Lahtinen wrote:
Hi Matt & John,
Can you please queue patches with the right Fixes: references to convert all the GuC tracepoints to be protected by the LOW_LEVEL_TRACEPOINTS protection for now. Please do so before next Wednesday so we get it queued in drm-intel-next-fixes.
Don't we already do that? I checked i915_trace.h and every tracepoint I added (intel_context class, i915_request_guc_submit) is protected by LOW_LEVEL_TRACEPOINTS.
The only thing I changed outside of that protection is adding the guc_id field to existing i915_request class tracepoints.
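The shape of the change under discussion can be sketched in plain C. This is an illustrative userspace analogue, not the kernel's TRACE_EVENT machinery in i915_trace.h, and the struct and function names here are invented: the disputed patch appends a guc_id field to the payload of an already-existing event rather than hiding it behind the LOW_LEVEL_TRACEPOINTS guard.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Invented stand-in for the request fields a tracepoint records. */
struct req_trace {
	unsigned int engine_class;
	unsigned int instance;
	unsigned int seqno;
	unsigned int guc_id; /* backend specific; only meaningful with GuC submission */
};

/*
 * Formats one trace record. Before the patch under discussion the record
 * carried only class/instance/seqno; appending guc_id unconditionally
 * changes the event layout for every existing consumer, which is what
 * the LOW_LEVEL_TRACEPOINTS convention is meant to avoid.
 */
int format_request_trace(const struct req_trace *rq, char *buf, size_t len)
{
	return snprintf(buf, len, "class=%u instance=%u seqno=%u guc_id=%u",
			rq->engine_class, rq->instance, rq->seqno, rq->guc_id);
}
```

Any consumer that parses the old three-field layout positionally now sees a fourth field, which is the compatibility concern being raised.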
It's the first search hit for "guc" inside the i915_trace.h file :)
Without the guc_id in those tracepoints these are basically useless with GuC submission. We could revert that if it is a huge deal but as I said then they are useless...
Let's eliminate it for now and restore the tracepoint exactly as it was.
If there is an immediate need, we should instead have an auxiliary tracepoint which is enabled only through LOW_LEVEL_TRACEPOINTS and that amends the information of the basic tracepoint.
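The alternative suggested above can be sketched the same way (again an invented plain-C analogue, not the real i915 code): the base event keeps its old layout, and a separate auxiliary event, compiled in only when the low-level option is set, amends it with the GuC-specific id.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LOW_LEVEL_TRACEPOINTS 1 /* stand-in for the Kconfig guard */

/* Base event: layout identical to what pre-GuC consumers already parse. */
int trace_request_submit(unsigned int seqno, char *buf, size_t len)
{
	return snprintf(buf, len, "i915_request_submit: seqno=%u", seqno);
}

#if LOW_LEVEL_TRACEPOINTS
/*
 * Auxiliary event (invented name): emitted alongside the base one and
 * carrying the extra backend-specific field, so the base event's layout
 * never changes for existing consumers.
 */
int trace_request_guc_id(unsigned int seqno, unsigned int guc_id,
			 char *buf, size_t len)
{
	return snprintf(buf, len, "i915_request_guc_id: seqno=%u guc_id=%u",
			seqno, guc_id);
}
#endif
```

A consumer that wants the extra data correlates the two events by the shared seqno; everyone else keeps parsing the unchanged base event.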
For the longer term solution we should align towards the dma fence tracepoints. When those are combined with the OA information, one should be able to get a good understanding of both the software and hardware scheduling decisions.
Regards, Joonas
Matt
There's the orthogonal track to discuss what would be the stable set of tracepoints we could expose. However, before that discussion is closed, let's keep a rather strict line to avoid potential maintenance burden.
We can then relax in the future as needed.
Regards, Joonas
Quoting Matthew Brost (2021-06-24 10:04:29)
On Mon, Oct 25, 2021 at 12:37:02PM +0300, Joonas Lahtinen wrote:
Quoting Matthew Brost (2021-10-22 19:42:19)
On Fri, Oct 22, 2021 at 12:35:04PM +0300, Joonas Lahtinen wrote:
Hi Matt & John,
Can you please queue patches with the right Fixes: references to convert all the GuC tracepoints to be protected by the LOW_LEVEL_TRACEPOINTS protection for now. Please do so before next Wednesday so we get it queued in drm-intel-next-fixes.
Don't we already do that? I checked i915_trace.h and every tracepoint I added (intel_context class, i915_request_guc_submit) is protected by LOW_LEVEL_TRACEPOINTS.
The only thing I changed outside of that protection is adding the guc_id field to existing i915_request class tracepoints.
It's the first search hit for "guc" inside the i915_trace.h file :)
Without the guc_id in those tracepoints these are basically useless with GuC submission. We could revert that if it is a huge deal but as I said then they are useless...
Let's eliminate it for now and restore the tracepoint exactly as it was.
Don't really agree - so let's render the tracepoints useless? Are tracepoints ABI? I googled this and couldn't really find a definitive answer. If tracepoints are ABI, then OK, I can revert this change, but it is still a poor technical decision (tracepoints should not be ABI).
If there is an immediate need, we should instead have an auxilary tracepoint which is enabled only through LOW_LEVEL_TRACEPOINTS and that amends the information of the basic tracepoint.
Regardless of what I said above, I'll post 2 patches. The 1st just removes the guc_id, the 2nd modifies the tracepoint to include guc_id if LOW_LEVEL_TRACEPOINTS is defined.
For the longer term solution we should align towards the dma fence tracepoints. When those are combined with the OA information, one should be able to get a good understanding of both the software and hardware scheduling decisions.
Not sure about this either. I use these tracepoints to correlate things to the GuC log. Between the two, if you know what you are doing, you can basically figure out everything that is happening. Fields in the trace translate directly to fields in the GuC log. Some of these fields are backend specific, and I'm not sure how they could be pushed into the dma fence tracepoints. For what it is worth, without these tracepoints we'd likely still have a bunch of bugs in the GuC firmware. I understand these tracepoints, several other i915 developers do, and several of the GuC firmware developers do too.
Matt
Quoting Matthew Brost (2021-10-25 18:15:09)
On Mon, Oct 25, 2021 at 12:37:02PM +0300, Joonas Lahtinen wrote:
Quoting Matthew Brost (2021-10-22 19:42:19)
On Fri, Oct 22, 2021 at 12:35:04PM +0300, Joonas Lahtinen wrote:
Hi Matt & John,
Can you please queue patches with the right Fixes: references to convert all the GuC tracepoints to be protected by the LOW_LEVEL_TRACEPOINTS protection for now. Please do so before next Wednesday so we get it queued in drm-intel-next-fixes.
Don't we already do that? I checked i915_trace.h and every tracepoint I added (intel_context class, i915_request_guc_submit) is protected by LOW_LEVEL_TRACEPOINTS.
The only thing I changed outside of that protection is adding the guc_id field to existing i915_request class tracepoints.
It's the first search hit for "guc" inside the i915_trace.h file :)
Without the guc_id in those tracepoints these are basically useless with GuC submission. We could revert that if it is a huge deal but as I said then they are useless...
Let's eliminate it for now and restore the tracepoint exactly as it was.
Don't really agree - so let's render the tracepoints useless? Are tracepoints ABI? I googled this and couldn't really find a definitive answer. If tracepoints are ABI, then OK, I can revert this change, but it is still a poor technical decision (tracepoints should not be ABI).
That's a very heated discussion in general. But the fact is that if tracepoint changes have caused regressions to applications, they have been forced to remain untouched. You are free to raise the discussion with Linus/LKML if you feel that should not be the case. So the end result is that tracepoints are effectively in limbo: not ABI unless some application uses them like ABI.
Feel free to search the intel-gfx/lkml for "tracepoints" keyword and look for threads with many replies. It's not that I would not agree, it's more that I'm not in the mood for repeating that discussion over and over again and always land in the same spot.
So for now, we don't add anything new to tracepoints we can't guarantee to always be there untouched. Similarly, we don't guarantee any of them to remain stable. So we try to be compatible with the limbo.
I'm long overdue waiting for some stable consumer to step up for the tracepoints, so we can then start discussion what would actually be the best way of getting that information out for them. In ~5 years that has not happened.
If there is an immediate need, we should instead have an auxiliary tracepoint which is enabled only through LOW_LEVEL_TRACEPOINTS and that amends the information of the basic tracepoint.
Regardless of what I said above, I'll post 2 patches. The 1st just remove the GuC, the 2nd modify the tracepoint to include guc_id if LOW_LEVEL_TRACEPOINTS is defined.
Thanks. Let's get a patch merged which simply drops the guc_id for now to unblock things.
For the second, an auxiliary tracepoint will be preferred instead of mutating the existing one (regardless of LOW_LEVEL_TRACEPOINTS).
I only noticed a patch that mutates the tracepoints, can you double-check sending the first patch?
Regards, Joonas
On Tue, Oct 26, 2021 at 11:59:35AM +0300, Joonas Lahtinen wrote:
Quoting Matthew Brost (2021-10-25 18:15:09)
On Mon, Oct 25, 2021 at 12:37:02PM +0300, Joonas Lahtinen wrote:
Quoting Matthew Brost (2021-10-22 19:42:19)
On Fri, Oct 22, 2021 at 12:35:04PM +0300, Joonas Lahtinen wrote:
Hi Matt & John,
Can you please queue patches with the right Fixes: references to convert all the GuC tracepoints to be protected by the LOW_LEVEL_TRACEPOINTS protection for now. Please do so before next Wednesday so we get it queued in drm-intel-next-fixes.
Don't we already do that? I checked i915_trace.h and every tracepoint I added (intel_context class, i915_request_guc_submit) is protected by LOW_LEVEL_TRACEPOINTS.
The only thing I changed outside of that protection is adding the guc_id field to existing i915_request class tracepoints.
It's the first search hit for "guc" inside the i915_trace.h file :)
Without the guc_id in those tracepoints these are basically useless with GuC submission. We could revert that if it is a huge deal but as I said then they are useless...
Let's eliminate it for now and restore the tracepoint exactly as it was.
Don't really agree - so let's render the tracepoints useless? Are tracepoints ABI? I googled this and couldn't really find a definitive answer. If tracepoints are ABI, then OK, I can revert this change, but it is still a poor technical decision (tracepoints should not be ABI).
That's a very heated discussion in general. But the fact is that if tracepoint changes have caused regressions to applications, they have been forced to remain untouched. You are free to raise the discussion with Linus/LKML if you feel that should not be the case. So the end result is that tracepoints are effectively in limbo, not ABI unless some application uses them like ABI.
Not trying to start or fight a holy war. If the current rules are don't change tracepoints, we won't. Patch posted, let's stay focused, get an RB, and move on.
Matt
Feel free to search the intel-gfx/lkml for "tracepoints" keyword and look for threads with many replies. It's not that I would not agree, it's more that I'm not in the mood for repeating that discussion over and over again and always land in the same spot.
So for now, we don't add anything new to tracepoints we can't guarantee to always be there untouched. Similarly, we don't guarantee any of them to remain stable. So we try to be compatible with the limbo.
I'm long overdue waiting for some stable consumer to step up for the tracepoints, so we can then start discussion what would actually be the best way of getting that information out for them. In ~5 years that has not happened.
If there is an immediate need, we should instead have an auxiliary tracepoint which is enabled only through LOW_LEVEL_TRACEPOINTS and which amends the information of the basic tracepoint.
Regardless of what I said above, I'll post 2 patches. The 1st just removes the guc_id, the 2nd modifies the tracepoint to include guc_id if LOW_LEVEL_TRACEPOINTS is defined.
Thanks. Let's get a patch merged which simply drops the guc_id for now to unblock things.
For the second, an auxiliary tracepoint would be preferred instead of mutating the existing one (regardless of LOW_LEVEL_TRACEPOINTS).
I only noticed a patch that mutates the tracepoints, can you double-check sending the first patch?
Regards, Joonas
For the longer term solution we should align towards the dma fence tracepoints. When those are combined with the OA information, one should be able to get a good understanding of both the software and hardware scheduling decisions.
Not sure about this either. I use these tracepoints to correlate things to the GuC log. Between the two, if you know what you are doing you can basically figure out everything that is happening. Fields in the trace translate directly to fields in the GuC log. Some of these fields are backend specific, and I'm not sure how they could be pushed into the dma fence tracepoints. For what it is worth, without these tracepoints we'd likely still have a bunch of bugs in the GuC firmware. I understand these points, several other i915 developers do, and several of the GuC firmware developers do too.
Matt
Regards, Joonas
Matt
There's the orthogonal track to discuss what would be the stable set of tracepoints we could expose. However, before that discussion is closed, let's keep a rather strict line to avoid potential maintenance burden.
We can then relax in the future as needed.
Regards, Joonas
Quoting Matthew Brost (2021-06-24 10:04:29)
On Tue, Oct 26, 2021 at 11:59:35AM +0300, Joonas Lahtinen wrote:
I only noticed a patch that mutates the tracepoints, can you double-check sending the first patch?
Sorry for the double reply - missed this one in the first.
I changed my plans/mind after I sent the original email. I only sent a patch which includes guc_id when LOW_LEVEL_TRACEPOINTS is enabled. That is the bare minimum I can live with. Without it, any time there is a problem I end up hacking the kernel, and I can't do that. This is a good compromise.
Matt
Quoting Matthew Brost (2021-10-26 18:51:17)
When it comes to fixing a regression, it should be done with the minimal revert/change with "Fixes:" as suggested originally.
Then we can leave the discussion on how to best cover the gap you pointed out to be resolved in the second patch. There are clearly at least two ways to approach it: either mutate the original tracepoint or add an auxiliary tracepoint to amend the information. So we should have a quick discussion between the involved parties about which is the better approach.
We should not fix the regression in a patch where we also initiate a change in behavior. That'll make bisecting and backporting patches a pain.
So even if the patches were merged back-to-back to the tree at the same time, they should be separate patches. And in this case it seems that the latter patch needs some discussion to reach a rough consensus, and that should not delay the delivery of the fix itself.
I will queue a patch to do the fix. We should continue the discussion about an auxiliary tracepoint vs. mutating an existing tracepoint based on the kernel config option in another thread.
Regards, Joonas
On 10/25/2021 02:37, Joonas Lahtinen wrote:
Quoting Matthew Brost (2021-10-22 19:42:19)
On Fri, Oct 22, 2021 at 12:35:04PM +0300, Joonas Lahtinen wrote:
Hi Matt & John,
Can you please queue patches with the right Fixes: references to put all the GuC tracepoints behind the LOW_LEVEL_TRACEPOINTS guard for now. Please do so before next Wednesday so we get it queued in drm-intel-next-fixes.
Don't we already do that? I checked i915_trace.h and every tracepoint I added (intel_context class, i915_request_guc_submit) is protected by LOW_LEVEL_TRACEPOINTS.
The only thing I changed outside of that protection is adding the guc_id field to existing i915_request class tracepoints.
It's the first search hit for "guc" inside the i915_trace.h file :)
Without the guc_id in those tracepoints they are basically useless with GuC submission. We could revert that change if it is a huge deal but, as I said, then they are useless...
Let's eliminate it for now and restore the tracepoint exactly as it was.
For what purpose?
Your request above was about not adding new tracepoints outside of a low level CONFIG setting. I can understand that on the grounds of not swamping high level tracing with low level details that are not important to the general developer.
But this is not about adding extra tracepoints, this is about making the existing tracepoints usable. With GuC submission, the GuC id is a vital piece of information. Without that, you cannot correlate anything that is happening between i915, GuC and the hardware. Which basically means that for a GuC submission based platform, those tracepoints are useless without this information. And GuC submission is POR for all platforms from ADL-P/DG1 onwards. So by not allowing this update, you are preventing any kind of meaningful debug of any scheduling/execution type issues.
Again, if you are wanting to reduce spam in higher level debug then sure, make the entire set of scheduling tracepoints LOW_LEVEL only. But keeping them around in a censored manner is pointless. They are not ABI, they are allowed to change as and when necessary. And now, it is necessary to update them to match the new POR submission model for current and all future platforms.
If there is an immediate need, we should instead have an auxiliary tracepoint which is enabled only through LOW_LEVEL_TRACEPOINTS and which amends the information of the basic tracepoint.
For the longer term solution we should align towards the dma fence tracepoints. When those are combined with the OA information, one should be able to get a good understanding of both the software and hardware scheduling decisions.
I don't follow this. OA information does not tell you any details of what the GuC is doing. DRM/DMA generic tracepoints certainly won't tell you any hardware/firmware or even i915 specific information.
And that is a much longer term goal than being able to debug current platforms with the current driver.
John.
Regards, Joonas
Matt
There's the orthogonal track of discussing what a stable set of tracepoints we could expose would look like. However, until that discussion is closed, let's keep a rather strict line to avoid potential maintenance burden.
We can then relax in the future as needed.
Regards, Joonas
dri-devel@lists.freedesktop.org