On Wed, Jul 19, 2017 at 3:44 PM, Daniel Vetter daniel.vetter@ffwll.ch wrote:
On Wed, Jul 19, 2017 at 3:32 PM, Chris Wilson chris@chris-wilson.co.uk wrote:
Quoting Daniel Vetter (2017-07-19 13:54:56)
... using the biggest hammer we have. This is essentially a weaponized version of the timeout-based wedging Chris added in
commit 36703e79a982c8ce5a8e43833291f2719e92d0d1 Author: Chris Wilson chris@chris-wilson.co.uk Date: Thu Jun 22 11:56:25 2017 +0100
drm/i915: Break modeset deadlocks on reset
Because defense-in-depth is good it's good to still have both. Also note that with the locking change we can now restrict this a lot (old gpus and special testing only), so this doesn't kill the TDR benefits on at least anything remotely modern.
And futuremore with a few tricks it should be possible to make a much more educated guess about whether an atomic commit is stuck waiting on the gpu (atomic_t counting the pending i915_sw_fence used by the atomic modeset code should do it), so we can improve this.
But for now just start with something that is guaranteed to recover faster, for much better CI througput.
This defacto reverts TDR on these platforms, but there's not really a single commit to specify as the sole offender.
Fixes: 4680816be336 ("drm/i915: Wait first for submission, before waiting for request completion") Fixes: 221fe7994554 ("drm/i915: Perform a direct reset of the GPU from the waiter") Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Mika Kuoppala mika.kuoppala@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com
drivers/gpu/drm/i915/intel_display.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 97777ffa1566..010a1f3e000c 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -3471,6 +3471,11 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv) !gpu_reset_clobbers_display(dev_priv)) return;
/* We have a modeset vs reset deadlock, defensively unbreak it.
*
* FIXME: We can do a _lot_ better, this is just a first iteration.*/
You should keep the error message for wedging the device. If all goes well it is removed in a few patches, so shouldn't be an eyesore for long.
Yeah makes sense, just in case the next patches need to be reverted for some reasons. That's why I split it ou. Something like
DRM_ERROR("Wedging gpu to unblock modesets\n");
and then remove that again 2 patches later?
After thinking about this a bit more, s/DRM_ERROR/DRM_DEBUG_KMS/ or so. We know we can deadlock here, complaining about that only spams dmesg and results in noise in CI, hiding worse stuff. The timeout based wedging as fallback should have a DRM_ERROR since it's unexpected, this one here imo sholdn't. And with the follow-up patches it won't be unconditional anymore (if we don't have to revert them again). -Daniel