Re: [PATCH 5/5] drm/i915: Solve the GPU reset vs. modeset deadlocks with an rw_semaphore

30 Jun 2017


      On Fri, Jun 30, 2017 at 03:30:33PM +0200, Daniel Vetter wrote:
...
On Fri, Jun 30, 2017 at 03:25:58PM +0200, Daniel Vetter wrote:
...
On Thu, Jun 29, 2017 at 04:49:48PM +0300, ville.syrjala@linux.intel.com wrote:
...
From: Ville Syrjälä ville.syrjala@linux.intel.com
Introduce an rw_semaphore to protect the display commits. All normal
commits use down_read() and hence can proceed in parallel, but GPU reset
will use down_write() making sure no other commits are in progress when
we have to pull the plug on the display engine on pre-g4x platforms.
There are no modeset/gem locks taken inside __intel_atomic_commit_tail()
itself, and we wait for all dependencies before the down_read(), and
thus there is no chance of deadlocks with this scheme.
How does this solve the deadlock? Afaiui the deadlock is that the gpu
reset stopped unconditionally completing all requests before it did
anything else, including anything with the hw or taking modeset locks.
This ensured that any outstanding flips (we only had pageflips, no atomic
ones back then) could complete (although maybe displaying garbage). The
only thing we had to do was grab the locks (to avoid concurrent modesets)
and then we could happily nuke the hw (since the flips where lost in the
CS anyway), and restore it afterwards.
Then the TDR rewrite came around and broke that, which now makes atomic
time out waiting for the gpu to complete (because the gpu reset waits for
the modeset to complete first). Which means if you want to fix this
without breaking TDR, then you need to cancel the pending atomic commits.
That seems somewhat what you're attempting here with trying to figure out
what the latest hw-committed step is (but that function gets it wrong),
and lots more locking and stuff on top.
Why exactly can't we just go back to the old world of force-completing
dead requests on gen4 and earlier? That would be tons simpler imo instead
of throwing a pile of hacks (which really needs a complete rewrite of the
atomic code in i915) in as a work-around. We didn't have TDR on gen4 and
earlier for years, going back to that isn't going to hurt anyone.
Making working gen4 gpu reset contigent on cancellable atomic commits and
the full commit queue is imo nuts.
And if the GEM folks insist the old behavior can't be restored, then we
just need a tailor-made get-out-of-jail card for gen4 gpu reset somewhere
in i915_sw_fence. Force-completing all render requests atomic updates
depend upon is imo the simplest solution to this, and we've had a driver
that worked like that for years.
And it used to break all the time. I think we've had to fix it at least
three times by now. So I tend to think it's better to fix it in a way
that won't break so easily.
-- 
Ville Syrjälä
Intel OTC

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

Re: [PATCH 5/5] drm/i915: Solve the GPU reset vs. modeset deadlocks with an rw_semaphore