RE: [RFC v1 0/4] drm: Add support for DRM_CAP_DEFERRED_OUT_FENCE capability

4 Aug 2021

Hi Michel,
...
...
...
...
The goal:

Maintain full framerate even when the Guest scanout FB is flipped onto a hardware
plane on the Host -- regardless of either compositor's scheduling policy -- without
making any copies and ensuring that both Host and Guest are not accessing the buffer
at the same time.
...
The problem:

If the Host compositor flips the client's buffer (in this case Guest compositor's
buffer) onto a hardware plane, then it can send a wl_buffer.release event for the
previous buffer only after it gets a pageflip completion. And, if the Guest compositor
takes 10-12 ms to submit a new buffer and given the fact that the Host compositor
waits only for 9 ms, the Guest compositor will miss the Host's repaint cycle,
resulting in a halved frame rate.
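
To make the arithmetic above concrete, here is a toy timing model (a sketch only, not
taken from the patch series). It assumes a 60 Hz Host refresh, which the thread does
not state, and the function and parameter names are hypothetical. The Guest only
starts working on a new buffer once it receives the release, which arrives at a Host
pageflip completion.

```python
import math


def presented_frames(guest_latency_ms, host_wait_ms=9.0,
                     period_ms=1000.0 / 60.0, host_cycles=60):
    """Count Guest frames that reach the Host screen within host_cycles vblanks.

    Toy model (assumptions): Host pageflip completions happen at k * period_ms,
    the Host starts its repaint host_wait_ms after each completion, and the
    Guest submits a buffer guest_latency_ms after it receives the release.
    """
    frames = 0
    release_cycle = 0  # Host vblank index at which the Guest gets the release
    while True:
        submit = release_cycle * period_ms + guest_latency_ms
        # First Host cycle whose repaint start is not earlier than the submit.
        k = max(release_cycle, math.ceil((submit - host_wait_ms) / period_ms))
        flip_done = k + 1  # the buffer is on screen one vblank later
        if flip_done > host_cycles:
            break
        frames += 1
        # The previous buffer is released when this flip completes.
        release_cycle = flip_done
    return frames


if __name__ == "__main__":
    for latency in (5.0, 10.0, 11.0, 12.0):
        print(latency, presented_frames(latency))
```

In this model a Guest that needs 10-12 ms always lands after the Host's 9 ms wait, so
it gets only every other Host cycle, while a Guest that needs 5 ms gets every cycle.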
The solution:

To ensure full framerate, the Guest compositor has to start its repaint cycle
(including the 9 ms wait) when the Host compositor sends the frame callback event to
its clients. In order for this to happen, the dma-fence that the Guest KMS waits on --
before sending pageflip completion -- cannot be tied to a wl_buffer.release event.
This means that the Guest compositor has to be forced to use a new buffer for its
next repaint cycle when it gets a pageflip completion.
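
A toy model of the decoupled scheme described above (again a sketch under assumptions,
not the implementation in this series): the Guest starts a repaint at every Host frame
callback, assumed here to arrive at the start of each Host cycle at 60 Hz, and always
renders into a different buffer instead of waiting for the release. All names are
hypothetical.

```python
import math


def presented_frames_decoupled(guest_latency_ms, host_wait_ms=9.0,
                               period_ms=1000.0 / 60.0, host_cycles=60):
    """Count distinct Guest frames the Host picks up within host_cycles cycles.

    Toy model (assumptions): Guest frame n starts at n * period_ms (the frame
    callback) and is submitted guest_latency_ms later, with
    guest_latency_ms < period_ms. The Host samples the newest submitted frame
    host_wait_ms into each of its cycles.
    """
    frames = 0
    last_taken = -1
    for k in range(host_cycles):
        repaint_start = k * period_ms + host_wait_ms
        # Newest Guest frame already submitted when the Host starts its repaint.
        newest = math.floor((repaint_start - guest_latency_ms) / period_ms)
        if newest >= 0 and newest > last_taken:
            frames += 1
            last_taken = newest
    return frames


if __name__ == "__main__":
    for latency in (5.0, 11.0):
        print(latency, presented_frames_decoupled(latency))
```

With an 11 ms Guest the Host still misses the very first frame, but every later Host
cycle finds a fresh one, at the cost of one extra frame of latency and at least one
additional buffer in flight.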
Is that really the only solution?
[Kasireddy, Vivek] There are a few others I mentioned here:
https://gitlab.freedesktop.org/wayland/weston/-/issues/514#note_986572
But I think none of them are as compelling as this one.
...
If we fix the event timestamps so that both guest and host use the same
timestamp, but then the guest starts 5ms (or something like that) earlier,
then things should work too? I.e.

host compositor starts at (previous_frametime + 9ms)
guest compositor starts at (previous_frametime + 4ms)

Ofc this only works if the frametimes we hand out to both match _exactly_
and are as high-precision as the ones on the host side. Which for many gpu
drivers at least is the case, and all the ones you care about for sure :-)
But if the frametimes the guest receives are the no_vblank fake ones, then
they'll be all over the place and this carefully tuned low-latency redraw
loop falls apart. Aside from the fact that without tuning the guests to
be earlier than the hosts, you're guaranteed to miss every frame (except
when the timing wobbliness in the guest is big enough by chance to make
the deadline on the oddball frame).
[Kasireddy, Vivek] The Guest and Host use different event timestamps as we don't
share these between the Guest and the Host. It does not seem to be causing any other
problems so far but we did try the experiment you mentioned (i.e., adjusting the delays)
and it works. However, this patch series is meant to fix the issue without having to tweak
anything (delays) because we can't do this for every compositor out there.
Maybe there could be a mechanism which allows the compositor in the guest to
automatically adjust its repaint cycle as needed.
This might even be possible without requiring changes in each compositor, by adjusting
the vertical blank periods in the guest to be aligned with the host compositor repaint
cycles. Not sure about that though.
[Kasireddy, Vivek] The problem really is that the Guest compositor -- or any other compositor
for that matter -- assumes that after a pageflip completion, the old buffer submitted in the
previous flip is free and can be reused again. I think this is a guarantee given by KMS. If we have
to enforce this, we (Guest KMS) have to wait until the Host compositor sends a wl_buffer.release --
which can only happen after the Host gets a pageflip completion, assuming it uses hardware planes.
From this point onwards, the Guest compositor only has 9 ms (in the case of Weston) -- or less
based on the Host compositor's scheduling policy -- to submit a new frame.
Although we can adjust the repaint-window of the Guest compositor to ensure a submission
within 9 ms, or increase the delay on the Host, these tweaks are just heuristics. I think a
generic solution that works in all cases requires the Guest compositor to start its repaint
cycle with a new buffer when the Host sends out the frame callback event.
...
Even if not, both this series and making it possible to queue multiple flips require
corresponding changes in each compositor as well to have any effect.
[Kasireddy, Vivek] Yes, unfortunately; but the hope is that the Guest KMS can do most of
the heavy lifting and keep the changes for the compositors generic enough and minimal.
Thanks,
Vivek
...
--
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer
