On 2021-07-30 12:25 p.m., Daniel Vetter wrote:
On Thu, Jul 29, 2021 at 01:16:55AM -0700, Vivek Kasireddy wrote:
Separating the OUT_FENCE signalling from pageflip completion allows a Guest compositor to start a new repaint cycle with a new buffer instead of waiting for the old buffer to be free.
This work is based on the idea/suggestion from Simon and Pekka.
This capability can be a solution for this issue: https://gitlab.freedesktop.org/wayland/weston/-/issues/514
Corresponding Weston MR: https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/668
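For reference, a minimal sketch of how a compositor requests both signals in one atomic commit with libdrm; prop_id() is a hypothetical property-lookup helper, and this is not code from the series or the Weston MR:

#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* The kernel writes the new out-fence fd through the pointer passed as the
 * CRTC's OUT_FENCE_PTR value; the flip-done event is requested separately
 * via DRM_MODE_PAGE_FLIP_EVENT.  Today both complete at the same point;
 * the series proposes decoupling them. */
static int flip_with_out_fence(int fd, uint32_t crtc_id, uint32_t plane_id,
                               uint32_t fb_id, int *out_fence_fd)
{
        drmModeAtomicReq *req = drmModeAtomicAlloc();
        int ret;

        *out_fence_fd = -1;
        drmModeAtomicAddProperty(req, plane_id,
                                 prop_id(fd, plane_id, "FB_ID"), fb_id);
        drmModeAtomicAddProperty(req, crtc_id,
                                 prop_id(fd, crtc_id, "OUT_FENCE_PTR"),
                                 (uint64_t)(uintptr_t)out_fence_fd);

        ret = drmModeAtomicCommit(fd, req,
                                  DRM_MODE_PAGE_FLIP_EVENT | DRM_MODE_ATOMIC_NONBLOCK,
                                  NULL);
        drmModeAtomicFree(req);
        return ret;
}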
Uh, I kinda wanted to discuss this a bit more before we jump into typing code, but well, I guess it's not that much work yet.
So maybe I'm not understanding the problem, but I think the fundamental underlying issue is that with KMS you can have at most 2 buffers in-flight, due to our queue depth limit of 1 pending flip.
Unfortunately that means for virtual hw, where it takes a few more steps/vblanks until the framebuffer actually shows up on screen and is scanned out, we suffer deeply. The usual fix for that is to trade some latency for throughput and have more buffers in-flight. Which this patch tries to do.
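As an illustration of the queue-depth-1 limit from userspace, assuming hypothetical submit_flip() and wait_for_flip_event() wrappers around a nonblocking drmModeAtomicCommit() and drmHandleEvent():

/* One flip may be pending per CRTC; the atomic helpers reject a second
 * nonblocking commit with -EBUSY until the first one completes, so at most
 * two buffers are in flight (one scanned out, one queued). */
submit_flip(fd, crtc_id, fb_a);                  /* flip #1: queued        */
if (submit_flip(fd, crtc_id, fb_b) == -EBUSY) {  /* flip #2: rejected      */
        wait_for_flip_event(fd);                 /* e.g. drmHandleEvent()  */
        submit_flip(fd, crtc_id, fb_b);          /* now it can be queued   */
}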
Per https://gitlab.freedesktop.org/wayland/weston/-/issues/514#note_986797, IMO the underlying issue is actually that the guest compositor repaint cycle is not aligned with the host compositor's. If they were aligned, the problem would not occur even without allowing multiple page flips in flight, and latency would be lower.
Now I think where we go wrong here is that we're trying to hack this up by defining different semantics for the out-fence and for the drm-event. Imo that's wrong; they're both meant to show exactly the same thing:
- when is the new frame actually visible to the user (as in, eyeballs in a human head, preferably, not the time when we've handed the buffer off to the virtual hw)
- when is the previous buffer no longer being used by the scanout hw
We do cheat a bit right now insofar as we assume they're both the same; as in, panel-side latency is currently the compositor's problem to figure out.
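To make that concrete, a simplified sketch of how a compositor consumes the two signals today, where out_fence_fd is the fd written through OUT_FENCE_PTR (with current drivers both fire at the same point):

#include <poll.h>
#include <xf86drm.h>

static void flip_done(int fd, unsigned int sequence, unsigned int tv_sec,
                      unsigned int tv_usec, unsigned int crtc_id, void *data)
{
        /* "new frame is visible": this timestamp feeds presentation feedback */
}

static void wait_for_completions(int drm_fd, int out_fence_fd)
{
        drmEventContext evctx = {
                .version = DRM_EVENT_CONTEXT_VERSION,
                .page_flip_handler2 = flip_done,
        };
        struct pollfd fds[] = {
                { .fd = drm_fd,       .events = POLLIN },  /* flip event */
                { .fd = out_fence_fd, .events = POLLIN },  /* out-fence  */
        };

        poll(fds, 2, -1);
        if (fds[0].revents & POLLIN)
                drmHandleEvent(drm_fd, &evctx);  /* calls flip_done() */
        /* fds[1] readable == "previous buffer is no longer scanned out" */
}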
So for virtual hw I think the timestamp and event completion really need to happen only when the buffer has been pushed through the entire virtualization chain, i.e. ideally we get the timestamp from the kms driver from the host side. Currently that's not done, so this is most likely quite broken already (virtio relies on the no-vblank auto event sending, which definitely doesn't wait for anything, or I'm completely missing something).
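For reference, the no-vblank auto event sending boils down to roughly this (simplified from the drm_atomic_helper fake-vblank path):

#include <drm/drm_atomic.h>
#include <drm/drm_vblank.h>

/* Simplified from drm_atomic_helper_fake_vblank(): for CRTCs with
 * no_vblank set, the completion event is sent immediately at commit time
 * with a "now" timestamp, without waiting for the host to show anything. */
static void fake_vblank(struct drm_atomic_state *state)
{
        struct drm_crtc_state *new_crtc_state;
        struct drm_crtc *crtc;
        int i;

        for_each_new_crtc_in_state(state, crtc, new_crtc_state, i) {
                unsigned long flags;

                if (!new_crtc_state->no_vblank || !new_crtc_state->event)
                        continue;

                spin_lock_irqsave(&crtc->dev->event_lock, flags);
                drm_crtc_send_vblank_event(crtc, new_crtc_state->event);
                new_crtc_state->event = NULL;
                spin_unlock_irqrestore(&crtc->dev->event_lock, flags);
        }
}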
I think instead of hacking up some ill-defined 1.5 queue depth support, what we should do is support queue depth > 1 properly. So:
Change atomic to support queue depth > 1; this needs to be a per-driver thing due to a bunch of issues in driver code. Essentially drivers must never look at obj->state pointers, and only ever look up state through the passed-in drm_atomic_state * update container.
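Concretely, the rule looks like this in a driver hook (example_atomic_flush() is just an illustrative name):

#include <drm/drm_atomic.h>
#include <drm/drm_crtc.h>

static void example_atomic_flush(struct drm_crtc *crtc,
                                 struct drm_atomic_state *state)
{
        /* Wrong once queue depth > 1: crtc->state may already point at a
         * later commit's state.
         *
         *      struct drm_crtc_state *cs = crtc->state;
         *
         * Correct: look up the state belonging to *this* commit. */
        struct drm_crtc_state *cs =
                drm_atomic_get_new_crtc_state(state, crtc);

        if (cs->event) {
                /* ... */
        }
}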
Aside: virtio should lose all its empty hooks, there's no point in them.
We fix virtio to send out the completion event at the end of this entire pipeline, i.e. virtio code needs to take care of sending out the crtc_state->event correctly.
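A hypothetical sketch of what that could look like; the virtio_example_crtc struct, the pending_event field and the host_flush_done() callback are made up for illustration and are not the actual virtio-gpu code:

#include <drm/drm_atomic.h>
#include <drm/drm_crtc.h>
#include <drm/drm_vblank.h>

struct virtio_example_crtc {
        struct drm_crtc base;
        struct drm_pending_vblank_event *pending_event;
};

#define to_virtio_example_crtc(c) \
        container_of(c, struct virtio_example_crtc, base)

static void virtio_example_atomic_flush(struct drm_crtc *crtc,
                                        struct drm_atomic_state *state)
{
        struct drm_crtc_state *cs = drm_atomic_get_new_crtc_state(state, crtc);
        struct virtio_example_crtc *vcrtc = to_virtio_example_crtc(crtc);

        /* Defer the event instead of letting the fake-vblank path fire it. */
        spin_lock_irq(&crtc->dev->event_lock);
        vcrtc->pending_event = cs->event;
        cs->event = NULL;
        spin_unlock_irq(&crtc->dev->event_lock);
}

/* Called once the host acknowledges that the new framebuffer is actually
 * being scanned out; only now is the event (and its timestamp) sent. */
static void host_flush_done(struct virtio_example_crtc *vcrtc)
{
        spin_lock_irq(&vcrtc->base.dev->event_lock);
        if (vcrtc->pending_event) {
                drm_crtc_send_vblank_event(&vcrtc->base, vcrtc->pending_event);
                vcrtc->pending_event = NULL;
        }
        spin_unlock_irq(&vcrtc->base.dev->event_lock);
}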
We probably also want some kind of (maybe per-crtc) recommended queue depth property so compositors know how many buffers to keep in flight. Not sure about that.
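A hypothetical sketch of exposing such a hint as an immutable per-CRTC property; the name and value range are invented, only the property boilerplate is the existing API:

#include <drm/drm_crtc.h>
#include <drm/drm_property.h>

/* Hypothetical: advertise how many flips the driver can usefully queue. */
static int example_attach_queue_depth_prop(struct drm_crtc *crtc,
                                           unsigned int depth)
{
        struct drm_property *prop;

        prop = drm_property_create_range(crtc->dev, DRM_MODE_PROP_IMMUTABLE,
                                         "SUGGESTED_QUEUE_DEPTH", 1, 8);
        if (!prop)
                return -ENOMEM;

        drm_object_attach_property(&crtc->base, prop, depth);
        return 0;
}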
I'd say there would definitely need to be some kind of signal for the display server that it should queue multiple flips, since this is normally not desirable for latency. In other words, this wouldn't really be useful on bare metal (in contrast to the ability to replace a pending flip with a newer one).