On Tue, 07 Jun 2022 16:15:19 -0700, John Harrison wrote:
On 6/7/2022 15:29, Dixit, Ashutosh wrote:
On Sat, 14 May 2022 23:05:06 -0700, Vinay Belgaumkar wrote:
SLPC min/max frequency updates require H2G calls. We are seeing timeouts when the GuC channel is backed up and unable to respond in a timely fashion, causing warnings and affecting CI.
This is seen when waitboosting happens during a stress test. This patch updates the waitboost path to use a non-blocking H2G call instead, which returns as soon as the message is successfully transmitted.
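For reference, a minimal sketch of such a non-blocking set-param call, assuming the existing intel_guc_send_nb() helper (the exact action/event encoding below is illustrative, not necessarily what the patch uses):

static int guc_action_slpc_set_param_nb(struct intel_guc *guc, u8 id, u32 value)
{
        u32 request[] = {
                GUC_ACTION_HOST2GUC_PC_SLPC_REQUEST,
                SLPC_EVENT(SLPC_EVENT_PARAMETER_SET, 2),
                id,
                value,
        };
        int ret;

        /*
         * Returns once the message is queued in the CT buffer; there is
         * no wait for (and no way to observe) the FW response here.
         */
        ret = intel_guc_send_nb(guc, request, ARRAY_SIZE(request), 0);

        return ret > 0 ? -EPROTO : ret;
}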
Overall I think this patch is trying to paper over problems in the blocking H2G CT interface (specifically the 1 second timeout in wait_for_ct_request_update()). So I think we should address that problem in the interface directly rather than having each client (SLPC and any future client) work around it. A few points:
- This patch seems to assume that it is 'ok' to ignore the return code from FW for a waitboost request (arguing waitboost is best effort, so it's ok to 'fire and forget'). But the return code is still useful, e.g. in cases where we see performance issues and want to go back and investigate whether FW rejected any waitboost requests.
You still get errors reported in the GuC log. Indeed, some errors (or at least error reasons) are only visible in the log, not in the return code.
OK, so we at least have this method available for debug.
We are already seeing that a 1 second timeout is not sufficient. So why not simply increase that timeout?
In fact, if we are saying that the CT interface is a "reliable" interface (implying no message loss), then to ensure reliability that timeout should not simply be increased; it should be made "infinite" (in quotes).
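To make that concrete, in wait_for_ct_request_update() the change would essentially be from a bounded wait to an unbounded one; a rough sketch (ct_request_done() and ct->wq are hypothetical names):

/* Today (simplified): bounded wait, gives up with -ETIMEDOUT after 1s. */
err = wait_for(ct_request_done(req), 1000);

/*
 * "Infinite" wait: block until the response arrives, relying on GuC
 * reset/recovery to unblock us if FW is truly dead.
 */
wait_event(ct->wq, ct_request_done(req));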
Maybe it would have been best not to have a "blocking" H2G interface at all (with the wait in wait_for_ct_request_update()). Just have an asynchronous interface (which mirrors the actual interface between FW and i915) in which clients register callbacks that are invoked when FW responds. If this is too big a change, we can probably continue with the current blocking interface after increasing the timeout as mentioned above.
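Something like the following, as a sketch (all names hypothetical): clients would queue a request together with a completion callback, and the G2H handler would invoke it when the FW response arrives:

/* All names here are hypothetical. */
struct guc_ct_async_request {
        /* Invoked from the G2H handler with the FW return code. */
        void (*done)(struct guc_ct_async_request *rq, int fw_status);
        void *client_data;
};

int intel_guc_ct_send_async(struct intel_guc_ct *ct,
                            const u32 *action, u32 len,
                            struct guc_ct_async_request *rq);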
Finally, the waitboost request is just the one most likely to get stuck at the back of a full CT queue, since it happens during normal operation. Actually any request, say one initiated from sysfs, can get similarly stuck at the back of a full queue. So any solution should also address that situation (where the return code is needed), and similarly for any future client of the "blocking" (REQUEST/RESPONSE) interface.
The blocking interface is only intended for init time operations, not runtime.
In that case we should probably have code to enforce this in i915.
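E.g. a hypothetical check in the CT send path ('initialized_and_running' is a made-up flag for illustration; INTEL_GUC_CT_SEND_NB is the existing non-blocking flag):

/* Blocking (REQUEST/RESPONSE) sends are init-time only; warn otherwise. */
if (!(flags & INTEL_GUC_CT_SEND_NB) &&
    GEM_WARN_ON(guc->initialized_and_running))
        return -EPERM;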
Stuff where the operation is meant to be synchronous and the KMD should not proceed until it has an ack back from the GuC that the update has taken place. All runtime operations are expected to be asynchronous. If a response is required, then it should be sent via an async callback. E.g. context de-registration is a 'fire and forget' H2G call but gets a 'deregistration complete' G2H notification when it is safe for the KMD to free up the associated storage.
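For reference, that de-registration flow looks roughly like this in i915 today (simplified; error handling and locking omitted):

u32 action[] = {
        INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
        ce->guc_id.id,
};

/* Fire and forget: only reserves space for the later G2H reply. */
intel_guc_send_nb(guc, action, ARRAY_SIZE(action),
                  G2H_LEN_DW_DEREGISTER_CONTEXT);

/*
 * It is the G2H 'deregistration complete' handler, not this call, that
 * lets the KMD free the guc_id and associated storage.
 */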
At present all GuC interactions in intel_guc_slpc.c (in i915) do *not* follow this. They use the REQUEST/RESPONSE FW interface, which is pushed through the blocking H2G CT interface in i915. If we are serious about this, it needs a GuC FW change to use bi-directional EVENTs as in the asynchronous interface (with corresponding changes in intel_guc_slpc.c).
There is an 'errors only' H2G mechanism. That will not send an ack back in the case of a successful H2G but will send back an error notification in the case of a failure. All async H2Gs should really be using that mechanism. I think Michal W did post a patch for it and I was meant to be reviewing it, but it dropped off my radar due to other higher priorities.
These I believe are referred to as FAST_REQUESTs in GuC FW. The fact that success is not communicated back to the KMD might be an issue in cases where the KMD needs to know whether a particular operation was successful (such as for operations initiated via sysfs).
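For reference, on the wire a FAST_REQUEST would just be a different HXG message type, something like the sketch below (GUC_HXG_TYPE_FAST_REQUEST is from the as-yet-unmerged patch; the other field macros are the existing GuC ABI ones):

/* No response on success; an error notification G2H on failure. */
u32 hxg0 = FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_HOST) |
           FIELD_PREP(GUC_HXG_MSG_0_TYPE, GUC_HXG_TYPE_FAST_REQUEST) |
           FIELD_PREP(GUC_HXG_REQUEST_MSG_0_ACTION, action);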
Thanks. -- Ashutosh