Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915

3 Jun 2021

      On Thu, Jun 3, 2021 at 5:48 AM Matthew Brost matthew.brost@intel.com wrote:
...
On Wed, Jun 02, 2021 at 08:57:02PM +0200, Daniel Vetter wrote:
...
On Wed, Jun 2, 2021 at 5:27 PM Tvrtko Ursulin
tvrtko.ursulin@linux.intel.com wrote:
...
On 25/05/2021 17:45, Matthew Brost wrote:
...
On Tue, May 25, 2021 at 11:32:26AM +0100, Tvrtko Ursulin wrote:
...

Context pinning code with it's magical two adds, subtract and cmpxchg is

dodgy as well.
Daniele tried to remove this and it proved quite difficult + created
even more races in the backend code. This was prior to the pre-pin and
post-unpin code which makes this even more difficult to fix as I believe
these functions would need to be removed first. Not saying we can't
revisit this someday but I personally really like it - it is a clever
way to avoid reentering the pin / unpin code while asynchronous things
are happening rather than some complex locking scheme. Lastly, this code
has proved incredibly stable as I don't think we've had to fix a single
thing in this area since we've been using this code internally.
Pretty much same as above. The code like:
static inline void __intel_context_unpin(struct intel_context *ce)
{
        if (!ce->ops->sched_disable) {
                __intel_context_do_unpin(ce, 1);
        } else {
                while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
                        if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
                                ce->ops->sched_disable(ce);
                                break;
                        }
                }
        }
}
That's pretty much impenetrable for me and the only thing I can think of
here is **ALARM** must be broken! See what others think..
Yea, probably should add a comment:
/*

If the context has the sched_disable function, it isn't safe to unpin
until this function completes. This function is allowed to complete
asynchronously too. To avoid this function from being entered twice
and move ownership of the unpin to this function's completion, adjust
the pin count to 2 before it is entered. When this function completes
the context can call intel_context_sched_unpin which decrements the
pin count by 2 potentially resulting in an unpin.

A while loop is needed to ensure the atomicity of the pin count. e.g.
The below if / else statement has a race:

if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1)
 ce->ops->sched_disable(ce);

else
 atomic_dec(ce, 1);

Two threads could simultaneously fail the if clause resulting in the
pin_count going to 0 with scheduling enabled + the context pinned.

*/
...
pin_count is a hand-rolled mutex, except not actually a real one, and
it's absolutely hiliarous in it's various incarnations (there's one
each on i915_vm, vma, obj and probably a few more).
Not the worst one I've seen by far in the code we've merged already.
Minimally this needs a comment here and in the struct next to
@pin_count to explain where all this is abused, which would already
make it better than most of the in-tree ones.
As part of the ttm conversion we have a plan to sunset the "pin_count
as a lock" stuff, depending how bad that goes we might need to split
up the task for each struct that has such a pin_count.
Didn't know that with the TTM rework this value might go away. If that
is truely the direction I don't see the point in reworking this now. It
100% works and with a comment I think it can be understood what it is
doing.
Well not go away, but things will change. Currently the various
->pin_count sprinkled all over the place have essentially two uses
- pinning stuff long term (scanout, ctxs, anything that stays pinned
after the ioctl is done essentially)
- short-term lock-like construct
There's going to be two changes:
- The short-term pins will be replaced by dma_resv_lock/unlock pairs
- the locking rules for long-term pins will change, because we'll
require that you must hold dma_resv_lock for unpinning. So no more
atomic_t, also no more races for final unpins vs cleanup work
Also now that you've explained the why for this dance, especially the
async part: Since the new unpin will hold dma_resv_lock, we can
create&attach dma_fence for tracking async completion which then the
next operation can wait on.
The awkward state we have right now is that there's a lot of places
where we require the unpin to be done locklessly with these atomic
tricks, so there's going to be quite some surgery involved all over
the code.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915