On Wed, Jun 09, 2021 at 04:07:24PM +0200, Christian König wrote:
On 09.06.21 at 15:42, Daniel Vetter wrote:
[SNIP]
That won't work. The problem is that you have only one exclusive slot, but multiple submissions which execute out of order and compose the buffer object together.
That's why I suggested using dma_fence_chain to work around this.
But if you are okay with amdgpu setting the exclusive fence without changing the shared ones, then the solution I've outlined should already work as well.
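Roughly something like this sketch, just to illustrate the idea; the helper name is made up, and the exact dma_resv accessor name differs between kernel versions:

#include <linux/dma-resv.h>
#include <linux/dma-fence-chain.h>
#include <linux/slab.h>

/*
 * Sketch only: wrap the new submission fence and the current exclusive
 * fence into a dma_fence_chain node, so the single exclusive slot can
 * represent multiple out-of-order submissions. Not an existing helper.
 */
static int resv_chain_excl_fence(struct dma_resv *resv,
				 struct dma_fence *fence, u64 seqno)
{
	struct dma_fence_chain *chain;
	struct dma_fence *prev;

	dma_resv_assert_held(resv);

	chain = kzalloc(sizeof(*chain), GFP_KERNEL);
	if (!chain)
		return -ENOMEM;

	/* The current exclusive fence, if any, becomes the previous link. */
	prev = dma_fence_get(dma_resv_excl_fence(resv));

	/* dma_fence_chain_init() takes over both references. */
	dma_fence_chain_init(chain, prev, dma_fence_get(fence), seqno);

	/*
	 * Publish the chain node as the new exclusive fence. Note that
	 * dma_resv_add_excl_fence() also drops the shared fences.
	 */
	dma_resv_add_excl_fence(resv, &chain->base);

	/* add_excl_fence grabbed its own reference, drop ours. */
	dma_fence_put(&chain->base);
	return 0;
}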
Uh that's indeed nasty. Can you give me the details of the exact use-case so I can read the userspace code and come up with an idea? I was assuming that even with parallel processing there's at least one step at the end that unifies it for the next process.
Unfortunately not, with Vulkan that is really in the hands of the application.
Vulkan explicitly says implicit sync isn't a thing, and you need to import/export a syncobj if you e.g. want to share a buffer with GL.
Of course, because amdgpu always syncs, there's a good chance that userspace running on the amdgpu Vulkan driver doesn't get this right and is breaking the Vulkan spec here :-/
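For reference, the explicit path for sharing with GL looks roughly like this on the userspace side, assuming VK_KHR_external_semaphore_fd plus GL_EXT_semaphore_fd; dev, render_done_sem and shared_tex are whatever handles the application already has, and error handling plus creating the semaphore with VkExportSemaphoreCreateInfo are omitted:

/* Export the Vulkan semaphore that signals when rendering is done ... */
PFN_vkGetSemaphoreFdKHR get_fd =
	(PFN_vkGetSemaphoreFdKHR)vkGetDeviceProcAddr(dev, "vkGetSemaphoreFdKHR");
VkSemaphoreGetFdInfoKHR info = {
	.sType = VK_STRUCTURE_TYPE_SEMAPHORE_GET_FD_INFO_KHR,
	.semaphore = render_done_sem,
	.handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_FD_BIT,
};
int fd = -1;
get_fd(dev, &info, &fd);

/* ... and import it on the GL side, then wait before sampling the texture. */
GLuint sem;
GLenum layout = GL_LAYOUT_SHADER_READ_ONLY_EXT;
glGenSemaphoresEXT(1, &sem);
glImportSemaphoreFdEXT(sem, GL_HANDLE_TYPE_OPAQUE_FD_EXT, fd); /* consumes fd */
glWaitSemaphoreEXT(sem, 0, NULL, 1, &shared_tex, &layout);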
But the example we have in the test cases is using 3D+DMA to compose a buffer IIRC.
Yeah, that's the more interesting one, I think. I've heard of some post-processing steps, but those always need to wait for 3D to finish. 3D + copy engine is a separate thing.
If we can't detect this somehow then it means we do indeed have to create a fence_chain for the exclusive slot for everything, which would be nasty.
I've already created a prototype of that and it is not that bad. It does have some noticeable overhead, but I think that's ok.
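Part of the extra cost is that everybody looking at the exclusive slot now has to walk the chain links instead of checking a single fence, roughly like this (illustrative only, not code from the prototype; the helper name and the context check are made up):

/*
 * Illustrative sketch: deciding whether a submission has to sync to the
 * exclusive fence now means unwrapping every chain link.
 */
static bool excl_slot_needs_sync(struct dma_fence *excl, u64 ctx)
{
	struct dma_fence *iter;

	dma_fence_chain_for_each(iter, excl) {
		struct dma_fence_chain *chain = to_dma_fence_chain(iter);
		struct dma_fence *f = chain ? chain->fence : iter;

		/* Unsignaled work from a foreign context still needs a sync. */
		if (!dma_fence_is_signaled(f) && f->context != ctx) {
			dma_fence_put(iter);
			return true;
		}
	}

	return false;
}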
Yup, seen that, I'll hopefully go and review it tomorrow. It's not great, but it's definitely a lot better than forcing a sync every time.
Or a large-scale redo across all drivers, which is probably even more nasty.
Yeah, that is indeed harder to get right.
Yeah, and there's also a bunch of other confusions in that area. -Daniel