On Thu, Mar 19, 2020 at 06:51, Daniel Vetter <daniel@ffwll.ch> wrote:
On Tue, Mar 17, 2020 at 11:01:57AM +0100, Michel Dänzer wrote:
On 2020-03-16 7:33 p.m., Marek Olšák wrote:
On Mon, Mar 16, 2020 at 5:57 AM Michel Dänzer <michel@daenzer.net> wrote:
On 2020-03-16 4:50 a.m., Marek Olšák wrote:
The synchronization works because the Mesa driver waits for idle (drains the GFX pipeline) at the end of command buffers, and there is only one graphics queue, so everything is ordered.

The GFX pipeline runs asynchronously to the command buffer, meaning the command buffer only starts draws and doesn't wait for their completion. If the Mesa driver didn't wait at the end of the command buffer, the command buffer would finish and a different process could start executing its own command buffer while shaders of the previous process were still running.

If the Mesa driver submits a command buffer internally (because it's full), it doesn't wait, so the GFX pipeline doesn't notice that one command buffer ended and a new one started.

The waiting at the end of command buffers happens only when the flush is external (SwapBuffers, glFlush).
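(For illustration, a minimal sketch of what that end-of-command-buffer drain looks like; this is not Mesa's actual code, and every name below is invented for the example:)

    #include <stdint.h>

    struct cmd_buf {
            uint64_t fence_gpu_va;  /* memory the GPU writes fence values to */
            uint64_t fence_value;   /* last value signalled */
    };

    /* Bottom-of-pipe event: writes 'value' to 'va' only after every
     * prior draw has retired and caches are flushed. */
    void emit_signal_bottom_of_pipe(struct cmd_buf *cs, uint64_t va,
                                    uint64_t value);

    /* Front-end wait: stalls command processing until *va >= value. */
    void emit_wait_ge(struct cmd_buf *cs, uint64_t va, uint64_t value);

    void flush_external(struct cmd_buf *cs)
    {
            uint64_t v = ++cs->fence_value;

            emit_signal_bottom_of_pipe(cs, cs->fence_gpu_va, v);
            /* This wait is the "drain": the GFX queue blocks here until
             * the pipeline is empty, so nothing from the next process
             * can overlap with this one's shaders. */
            emit_wait_ge(cs, cs->fence_gpu_va, v);
    }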
It's a performance problem, because the GFX queue is blocked while the GFX pipeline is drained at the end of every frame, at least. So explicit fences for SwapBuffers would help.
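(As a concrete illustration of what explicit fences for SwapBuffers could look like from the application side, a sketch using the existing EGL_ANDROID_native_fence_sync extension; it assumes the EGL implementation advertises that extension:)

    #include <EGL/egl.h>
    #include <EGL/eglext.h>

    /* Export an explicit fence fd for the commands submitted so far.
     * The fence signals when those commands complete; it does not
     * require the whole GFX pipeline to go idle first. */
    int export_frame_fence(EGLDisplay dpy)
    {
            PFNEGLCREATESYNCKHRPROC create_sync = (PFNEGLCREATESYNCKHRPROC)
                    eglGetProcAddress("eglCreateSyncKHR");
            PFNEGLDUPNATIVEFENCEFDANDROIDPROC dup_fence_fd =
                    (PFNEGLDUPNATIVEFENCEFDANDROIDPROC)
                    eglGetProcAddress("eglDupNativeFenceFDANDROID");

            EGLSyncKHR sync = create_sync(dpy, EGL_SYNC_NATIVE_FENCE_ANDROID,
                                          NULL);
            if (sync == EGL_NO_SYNC_KHR)
                    return -1;

            /* Returns a sync_file fd the compositor can wait on, or
             * EGL_NO_NATIVE_FENCE_FD_ANDROID on failure. */
            return dup_fence_fd(dpy, sync);
    }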
Not sure what difference it would make, since the same thing needs to be done for explicit fences as well, doesn't it?
No. Explicit fences don't require userspace to wait for idle in the command buffer. Fences are signalled when the last draw is complete and caches are flushed. Before that happens, any command buffer that is not dependent on the fence can start execution. There is never a need for the GPU to be idle if there is enough independent work to do.
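(Concretely, a consumer holding such an explicit fence as a sync_file fd can block on, or poll, just that one fence while everything else keeps running; a minimal sketch:)

    #include <poll.h>

    /* Returns >0 once the fence has signalled, 0 on timeout. Only the
     * work behind this one fence is waited on; independent command
     * buffers keep executing. */
    int fence_wait(int sync_file_fd, int timeout_ms)
    {
            struct pollfd p = { .fd = sync_file_fd, .events = POLLIN };

            return poll(&p, 1, timeout_ms);
    }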
I don't think explicit fences in the context of this discussion imply using that different fence signalling mechanism though. My understanding is that the API proposed by Jason allows implicit fences to be used as explicit ones and vice versa, so presumably they have to use the same signalling mechanism.
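(For reference, a sketch of the shape of that conversion API. The ioctl names below match what eventually landed upstream in linux/dma-buf.h, but in the context of this thread it was still a proposal, so treat the details as assumptions:)

    #include <sys/ioctl.h>
    #include <linux/dma-buf.h>

    /* Export the buffer's implicit dma_resv fences as an explicit
     * sync_file fd. */
    int implicit_to_explicit(int dmabuf_fd)
    {
            struct dma_buf_export_sync_file args = {
                    .flags = DMA_BUF_SYNC_RW,
                    .fd = -1,
            };

            if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &args) < 0)
                    return -1;
            return args.fd;
    }

    /* Attach an explicit sync_file fence to the buffer, so that
     * implicit-sync consumers wait on it. */
    int explicit_to_implicit(int dmabuf_fd, int sync_file_fd)
    {
            struct dma_buf_import_sync_file args = {
                    .flags = DMA_BUF_SYNC_WRITE,
                    .fd = sync_file_fd,
            };

            return ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &args);
    }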
Anyway, maybe the different fence signalling mechanism you describe could be used by the amdgpu kernel driver in general, then Mesa could drop the waits for idle and get the benefits with implicit sync as well?
Yeah, this is entirely about the programming model visible to userspace. There shouldn't be any impact on the driver's choice of top vs. bottom of the gpu pipeline for synchronization, that's entirely up to what your hw/driver/scheduler can pull off.
Doing a full gfx pipeline flush for shared buffers, when your hw can do better, sounds like an issue to me that's not related to this here at all. It might be intertwined with amdgpu's special interpretation of dma_resv fences though, no idea. We might need to revamp all that. But for a userspace client that does nothing fancy (no multiple render buffer targets in one bo, or vk style "I write to everything all the time, perhaps" stuff) there should be zero perf difference between implicit sync through dma_resv and explicit sync through sync_file/syncobj/dma_fence directly.
If there is, I'd consider that a bit of a driver bug.
Last time I checked, there was no fence sync in gnome shell and compiz after an app passes a buffer to them, so drivers have to invent hacks to work around that, which decreases performance. It's not a driver bug.

Implicit sync really means that apps and compositors don't sync, so the driver has to guess when it should sync.
Marek
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch