On Thu, Mar 19, 2020 at 06:51, Daniel Vetter <daniel@ffwll.ch> wrote:
On Tue, Mar 17, 2020 at 11:01:57AM +0100, Michel Dänzer wrote:
On 2020-03-16 7:33 p.m., Marek Olšák wrote:
On Mon, Mar 16, 2020 at 5:57 AM Michel Dänzer <michel@daenzer.net> wrote:
On 2020-03-16 4:50 a.m., Marek Olšák wrote:
The synchronization works because the Mesa driver waits for idle (drains the GFX pipeline) at the end of command buffers, and there is only one graphics queue, so everything is ordered.

The GFX pipeline runs asynchronously to the command buffer, meaning the command buffer only starts draws and doesn't wait for their completion. If the Mesa driver didn't wait at the end of the command buffer, the command buffer would finish and a different process could start executing its own command buffer while shaders of the previous process were still running.

If the Mesa driver submits a command buffer internally (because it's full), it doesn't wait, so the GFX pipeline doesn't notice that one command buffer ended and a new one started.

The waiting at the end of command buffers happens only when the flush is external (SwapBuffers, glFlush).
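(For illustration, a minimal sketch of what that end-of-command-buffer drain looks like; this is not Mesa's actual code, and every name below is invented for the example:)

    #include <stdint.h>

    struct cmd_buf {
            uint64_t fence_gpu_va;  /* memory the GPU writes fence values to */
            uint64_t fence_value;   /* last value signalled */
    };

    /* Bottom-of-pipe event: writes 'value' to 'va' only after every
     * prior draw has retired and caches are flushed. */
    void emit_signal_bottom_of_pipe(struct cmd_buf *cs, uint64_t va,
                                    uint64_t value);

    /* Front-end wait: stalls command processing until *va >= value. */
    void emit_wait_ge(struct cmd_buf *cs, uint64_t va, uint64_t value);

    void flush_external(struct cmd_buf *cs)
    {
            uint64_t v = ++cs->fence_value;

            emit_signal_bottom_of_pipe(cs, cs->fence_gpu_va, v);
            /* This wait is the "drain": the GFX queue blocks here until
             * the pipeline is empty, so nothing from the next process
             * can overlap with this one's shaders. */
            emit_wait_ge(cs, cs->fence_gpu_va, v);
    }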
It's a performance problem, because the GFX queue is blocked while the GFX pipeline is drained at the end of every frame, at least. So explicit fences for SwapBuffers would help.
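(As a concrete illustration of what explicit fences for SwapBuffers could look like from the application side, a sketch using the existing EGL_ANDROID_native_fence_sync extension; it assumes the EGL implementation advertises that extension:)

    #include <EGL/egl.h>
    #include <EGL/eglext.h>

    /* Export an explicit fence fd for the commands submitted so far.
     * The fence signals when those commands complete; it does not
     * require the whole GFX pipeline to go idle first. */
    int export_frame_fence(EGLDisplay dpy)
    {
            PFNEGLCREATESYNCKHRPROC create_sync = (PFNEGLCREATESYNCKHRPROC)
                    eglGetProcAddress("eglCreateSyncKHR");
            PFNEGLDUPNATIVEFENCEFDANDROIDPROC dup_fence_fd =
                    (PFNEGLDUPNATIVEFENCEFDANDROIDPROC)
                    eglGetProcAddress("eglDupNativeFenceFDANDROID");

            EGLSyncKHR sync = create_sync(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID,
                                          NULL);
            if (sync == EGL_NO_SYNC_KHR)
                    return -1;

            /* Returns a sync_file fd the compositor can wait on, or
             * EGL_NO_NATIVE_FENCE_FD_ANDROID on failure. */
            return dup_fence_fd(dpy, sync);
    }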
Not sure what difference it would make, since the same thing needs to be done for explicit fences as well, doesn't it?
No. Explicit fences don't require userspace to wait for idle in the command buffer. Fences are signalled when the last draw is complete and caches are flushed. Before that happens, any command buffer that is not dependent on the fence can start execution. There is never a need for the GPU to be idle if there is enough independent work to do.
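(Concretely, a consumer holding such an explicit fence as a sync_file fd can block on, or poll, just that one fence while everything else keeps running; a minimal sketch:)

    #include <poll.h>

    /* Returns >0 once the fence has signalled, 0 on timeout. Only the
     * work behind this one fence is waited on; independent command
     * buffers keep executing. */
    int fence_wait(int sync_file_fd, int timeout_ms)
    {
            struct pollfd p = { .fd = sync_file_fd, .events = POLLIN };

            return poll(&p, 1, timeout_ms);
    }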
I don't think explicit fences in the context of this discussion imply using that different fence signalling mechanism though. My understanding is that the API proposed by Jason allows implicit fences to be used as explicit ones and vice versa, so presumably they have to use the same signalling mechanism.
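(For reference, a sketch of the shape of that conversion API. The ioctl names below match what eventually landed upstream in linux/dma-buf.h, but in the context of this thread it was still a proposal, so treat the details as assumptions:)

    #include <sys/ioctl.h>
    #include <linux/dma-buf.h>

    /* Export the buffer's implicit dma_resv fences as an explicit
     * sync_file fd. */
    int implicit_to_explicit(int dmabuf_fd)
    {
            struct dma_buf_export_sync_file args = {
                    .flags = DMA_BUF_SYNC_RW,
                    .fd = -1,
            };

            if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &args) < 0)
                    return -1;
            return args.fd;
    }

    /* Attach an explicit sync_file fence to the buffer, so that
     * implicit-sync consumers wait on it. */
    int explicit_to_implicit(int dmabuf_fd, int sync_file_fd)
    {
            struct dma_buf_import_sync_file args = {
                    .flags = DMA_BUF_SYNC_WRITE,
                    .fd = sync_file_fd,
            };

            return ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &args);
    }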
Anyway, maybe the different fence signalling mechanism you describe could be used by the amdgpu kernel driver in general, then Mesa could drop the waits for idle and get the benefits with implicit sync as well?
Yeah, this is entirely about the programming model visible to userspace. There shouldn't be any impact on the driver's choice of top vs. bottom of the gpu pipeline for synchronization, that's entirely up to what your hw/driver/scheduler can pull off.
Doing a full gfx pipeline flush for shared buffers, when your hw can do better, sounds like an issue to me that's not related to this here at all. It might be intertwined with amdgpu's special interpretation of dma_resv fences though, no idea. We might need to revamp all that. But for a userspace client that does nothing fancy (no multiple render buffer targets in one bo, or vk style "I write to everything all the time, perhaps" stuff) there should be zero perf difference between implicit sync through dma_resv and explicit sync through sync_file/syncobj/dma_fence directly.
If there is, I'd consider that a bit of a driver bug.
Last time I checked, there was no fence sync in gnome shell and compiz after an app passes a buffer to them, so drivers have to invent hacks to work around that, which decreases performance. It's not a driver bug.

Implicit sync really means that apps and compositors don't sync, so the driver has to guess when it should sync.
Marek
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch