On Tue, Aug 17, 2021 at 11:47:53AM +0200, Daniel Vetter wrote:
On Mon, Aug 16, 2021 at 06:51:25AM -0700, Matthew Brost wrote:
When unblocking a context, do not enable scheduling if the context is banned, guc_id invalid, or not registered.
Fixes: 62eaf0ae217d ("drm/i915/guc: Support request cancellation") Signed-off-by: Matthew Brost matthew.brost@intel.com Cc: stable@vger.kernel.org
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index c3b7bf7319dd..353899634fa8 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -1579,6 +1579,9 @@ static void guc_context_unblock(struct intel_context *ce) spin_lock_irqsave(&ce->guc_state.lock, flags);
if (unlikely(submission_disabled(guc) ||
intel_context_is_banned(ce) ||
context_guc_id_invalid(ce) ||
!lrc_desc_registered(guc, ce->guc_id) || !intel_context_is_pinned(ce) || context_pending_disable(ce) || context_blocked(ce) > 1)) {
I think this entire if condition here is screaming that our intel_context state machinery for guc is way too complex, and on the wrong side of incomprehensible.
Also some of these check state outside of the context, and we don't seem to hold spinlocks for those, or anything else.
I general I have no idea which of these are defensive programming and cannot ever happen, and which actually can happen. There's for sure way too many races going on given that this is all context-local stuff.
A lot of this is guarding against a full GT reset while trying to cancel a request. Full GT resets make everything really hard and in pratice should never really happen because the GuC does per engine / context resets. Unfortunately IGTs do weird things like turn off per engine / contexts resets and full GT reset the only way to recover so the IGTs can will expose all the races around GT reset, especially when we run IGTs a pre-prod HW that tends to hang for whatever reason.
Matt
-Daniel
Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch