Hi Michel,
The goal:
- Maintain full framerate even when the Guest scanout FB is flipped onto a hardware plane on the Host -- regardless of either compositor's scheduling policy -- without making any copies and ensuring that both Host and Guest are not accessing the buffer at the same time.
The problem:
- If the Host compositor flips the client's buffer (in this case Guest compositor's buffer) onto a hardware plane, then it can send a wl_buffer.release event for the previous buffer only after it gets a pageflip completion. And, if the Guest compositor takes 10-12 ms to submit a new buffer and given the fact that the Host compositor waits only for 9 ms, the Guest compositor will miss the Host's repaint cycle resulting in halved frame-rate.
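The halving described above can be checked with a small timing model. This is only a sketch: the function name `presented_frames` and the 60 Hz refresh period (16667 us) are assumptions made for illustration, while the 9 ms Host wait and the 10-12 ms Guest latency come from the text. The model assumes the Guest starts repainting when its previous buffer is released at a Host vblank, and that the Host picks up whatever is pending at a fixed delay after each vblank.

```python
def presented_frames(num_refreshes, period_us, host_wait_us, guest_latency_us):
    """Count Guest frames shown over num_refreshes Host refresh cycles.

    Simplified model (assumption): the Guest starts repainting when the
    release arrives with a Host flip completion, submits guest_latency_us
    later, and the Host latches pending buffers host_wait_us after each vblank.
    """
    shown = 0
    guest_start = 0  # Guest begins its first repaint at vblank 0
    end = num_refreshes * period_us
    while True:
        submit = guest_start + guest_latency_us
        # First Host cycle whose latch point is at or after the submission
        # (integer ceiling division, clamped to cycle 0).
        cycle = max(0, -(-(submit - host_wait_us) // period_us))
        flip_done = (cycle + 1) * period_us  # vblank at which the frame appears
        if flip_done > end:
            return shown
        shown += 1
        guest_start = flip_done  # release arrives with the flip completion


# Guest needs 11 ms but the Host latches at 9 ms: every other cycle is missed.
slow = presented_frames(60, 16667, 9000, 11000)
# Guest submits within 8 ms: every Host cycle gets a new frame.
fast = presented_frames(60, 16667, 9000, 8000)
```

Under these assumptions, `slow` is half of `fast`, which matches the halved frame-rate described in the problem statement.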
The solution:
- To ensure full framerate, the Guest compositor has to start its repaint cycle (including the 9 ms wait) when the Host compositor sends the frame callback event to its clients. In order for this to happen, the dma-fence that the Guest KMS waits on -- before sending pageflip completion -- cannot be tied to a wl_buffer.release event. This means that the Guest compositor has to be forced to use a new buffer for its next repaint cycle when it gets a pageflip completion.
Is that really the only solution?
[Kasireddy, Vivek] There are a few others I mentioned here: https://gitlab.freedesktop.org/wayland/weston/-/issues/514#note_986572 But I think none of them are as compelling as this one.
If we fix the event timestamps so that both guest and host use the same timestamp, but then the guest starts 5ms (or something like that) earlier, then things should work too? I.e.
- host compositor starts at (previous_frametime + 9ms)
- guest compositor starts at (previous_frametime + 4ms)
Ofc this only works if the frametimes we hand out to both match _exactly_ and are as high-precision as the ones on the host side. Which for many gpu drivers at least is the case, and all the ones you care about for sure :-)
But if the frametimes the guest receives are the no_vblank fake ones, then they'll be all over the place and this carefully tuned low-latency redraw loop falls apart. Aside from the fact that without tuning the guests to be earlier than the hosts, you're guaranteed to miss every frame (except when the timing wobbliness in the guest is big enough by chance to make the deadline on the oddball frame).
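To make the dependence on matching timestamps concrete, here is a minimal sketch of the deadline check. The function name and the `guest_work_us` parameter (how long the guest needs to produce and submit a frame once it starts) are hypothetical; the 4 ms and 9 ms offsets are the ones suggested above.

```python
def guest_makes_deadline(host_frametime_us, guest_frametime_us, guest_work_us,
                         guest_offset_us=4000, host_offset_us=9000):
    """Return True if the guest's buffer is ready when the host starts repainting.

    Each side schedules its repaint relative to the frame timestamp it was
    handed; guest_work_us is a hypothetical per-frame cost, not a measured one.
    """
    guest_submit = guest_frametime_us + guest_offset_us + guest_work_us
    host_start = host_frametime_us + host_offset_us
    return guest_submit <= host_start


# Identical timestamps: the guest has a 5 ms budget and 4 ms of work fits.
on_time = guest_makes_deadline(0, 0, 4000)
# Guest timestamp is 3 ms late (e.g. a faked one): the same work now misses.
late = guest_makes_deadline(0, 3000, 4000)
```

The point of the sketch is only that any error in the guest's timestamp is subtracted directly from its 5 ms budget.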
[Kasireddy, Vivek] The Guest and Host use different event timestamps as we don't share these between the Guest and the Host. It does not seem to be causing any other problems so far but we did try the experiment you mentioned (i.e., adjusting the delays) and it works. However, this patch series is meant to fix the issue without having to tweak anything (delays) because we can't do this for every compositor out there.
Maybe there could be a mechanism which allows the compositor in the guest to automatically adjust its repaint cycle as needed.
This might even be possible without requiring changes in each compositor, by adjusting the vertical blank periods in the guest to be aligned with the host compositor repaint cycles. Not sure about that though.
[Kasireddy, Vivek] The problem really is that the Guest compositor -- or any other compositor for that matter -- assumes that after a pageflip completion, the old buffer submitted in the previous flip is free and can be reused again. I think this is a guarantee given by KMS. If we have to enforce this, we (Guest KMS) have to wait until the Host compositor sends a wl_buffer.release -- which can only happen after Host gets a pageflip completion assuming it uses hardware planes. From this point onwards, the Guest compositor only has 9 ms (in the case of Weston) -- or less based on the Host compositor's scheduling policy -- to submit a new frame.
Although we can adjust the repaint-window of the Guest compositor to ensure a submission within 9 ms, or increase the delay on the Host, these tweaks are just heuristics. I think a generic solution that works in all cases requires the Guest compositor to start its repaint cycle with a new buffer when the Host sends out the frame callback event.
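A rough sketch of why starting on the frame callback helps, under one assumption that is not stated above: the Host sends the frame callback at its own repaint point, so consecutive repaint points are exactly one refresh period apart. The function name and the 60 Hz period (16667 us) are likewise illustrative. Because the Guest always renders into a new buffer, it never waits for a wl_buffer.release in this model.

```python
def frames_from_frame_callback(num_refreshes, period_us, guest_latency_us):
    """Count Guest frames picked up by the Host over num_refreshes repaint points.

    Assumption: the Guest starts each repaint at a Host frame callback and
    submits guest_latency_us later into a fresh buffer.
    """
    shown = 0
    cycle = 0  # Host repaint point whose frame callback starts the Guest repaint
    while True:
        # The submission is picked up ceil(latency / period) repaint points
        # later, and never earlier than the very next one.
        cycle += max(1, -(-guest_latency_us // period_us))
        if cycle > num_refreshes:
            return shown
        shown += 1


# 10-12 ms fits inside one 16.667 ms period, so no Host cycle is missed.
full_rate = frames_from_frame_callback(60, 16667, 12000)
```

In this model the Guest's budget is a whole refresh period rather than the 9 ms left after a release, which is why a 10-12 ms repaint no longer halves the frame-rate.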
Even if not, both this series and making it possible to queue multiple flips require corresponding changes in each compositor as well to have any effect.
[Kasireddy, Vivek] Yes, unfortunately; but the hope is that the Guest KMS can do most of the heavy lifting and keep the changes for the compositors generic enough and minimal.
Thanks, Vivek
-- 
Earthling Michel Dänzer    | https://redhat.com
Libre software enthusiast  | Mesa and X developer