On 22.09.2016 at 14:26, Daniel Vetter wrote:
On Thu, Sep 22, 2016 at 12:55 PM, Christian König deathsimple@vodafone.de wrote:
On 22.09.2016 at 08:36, Daniel Vetter wrote:
On Wed, Sep 21, 2016 at 06:23:35PM +0200, Christian König wrote:
As a quick workaround I suggest simply serializing all accesses to BOs shared between different drivers, but fundamentally I think having multiple writers to one BO is a perfectly valid requirement.
It is, but it's not possible with implicit sync. If you want parallel write access to the same shared buffer, you _must_ carry around some explicit fences. Within amdgpu you can use driver-specific cookies; for shared buffers we now have sync_file. But multiple writers with implicit sync fundamentally do not work, because you have no idea which writer is touching the same subrange you want to touch.
You don't need to separate the BO into subranges which are touched by different engines for allowing multiple writers.
AMD hardware, and I'm pretty sure other hardware as well, is perfectly capable of writing to the same memory from multiple engines and even multiple GPUs at the same time.
For a good hint of what is possible see the public AMD ISA documentation about atomic operations, but that is only the start of it.
The crux here is that we need to assume that we will have implicit and explicit sync mixed for backward compatibility.
This implies that we need some mechanism like the one amdgpu uses in its sync implementation, where every fence is associated with an owner which denotes the domain in which implicit sync happens. If you leave this domain you will automatically run into explicit sync.
Currently we define the borders of this domain in amdgpu at the process boundary to keep things like DRI2/DRI3 working as expected.
I really don't see how you want to solve this with a single explicit fence for each reservation object. As long as you have multiple concurrently running operations accessing the same buffer you need to keep one fence for each operation no matter what.
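To illustrate the point about one fence per operation, here is a toy model in Python. It is not kernel code: `Fence`, `SingleSlotBuffer` and `PerOperationBuffer` are hypothetical names made up for this sketch, and a fence is reduced to a flag that is set when its operation completes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fence:
    """Toy stand-in for a fence: 'signaled' is set when the operation is done."""
    name: str
    signaled: bool = False


@dataclass
class SingleSlotBuffer:
    """Hypothetical buffer that remembers only the most recent writer's fence."""
    fence: Optional[Fence] = None

    def add_writer(self, fence: Fence) -> None:
        self.fence = fence  # the previous writer's fence is forgotten here

    def idle(self) -> bool:
        return self.fence is None or self.fence.signaled


@dataclass
class PerOperationBuffer:
    """Hypothetical buffer that keeps one fence for every operation in flight."""
    fences: List[Fence] = field(default_factory=list)

    def add_writer(self, fence: Fence) -> None:
        self.fences.append(fence)

    def idle(self) -> bool:
        return all(f.signaled for f in self.fences)


# Two engines write to the same buffer concurrently.
gfx = Fence("gfx write")
compute = Fence("compute write")

single = SingleSlotBuffer()
per_op = PerOperationBuffer()
for buf in (single, per_op):
    buf.add_writer(gfx)
    buf.add_writer(compute)

# The write that was submitted later happens to finish first.
compute.signaled = True

print(single.idle())  # True, although the gfx write is still running
print(per_op.idle())  # False, the gfx fence is still tracked
```

With a single slot the buffer looks idle as soon as the last-submitted writer finishes, even though an earlier writer on another engine may still be running.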
I can't make sense of what you're saying, and I suspect we attach different meanings to the same words. So let me define them here:
- implicit fencing: Userspace does not track reads/writes to buffers; only the kernel does that. This is the assumption DRI2/3 makes. Since synchronization is by necessity on a per-buffer level, you can only have 1 writer. In the kernel the cross-driver interface for this is struct reservation_object attached to dma-bufs. If you don't fill out/wait for the exclusive fence in there, your driver is _not_ doing (cross-device) implicit fencing.
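As a rough illustration of this definition, here is a toy model in Python of per-buffer implicit fencing with one exclusive (write) fence and a list of shared (read) fences. It is a sketch only: `ToyReservation`, `submit_read` and `submit_write` are hypothetical names, fences are plain strings, and none of this is the real struct reservation_object API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyReservation:
    """Toy per-buffer fence container (assumption: one writer, many readers)."""
    exclusive: Optional[str] = None                   # fence of the last writer
    shared: List[str] = field(default_factory=list)   # fences of readers since then


def submit_read(resv: ToyReservation, fence: str) -> List[str]:
    """A reader waits for the last writer only, then registers itself as shared."""
    deps = [resv.exclusive] if resv.exclusive is not None else []
    resv.shared.append(fence)
    return deps


def submit_write(resv: ToyReservation, fence: str) -> List[str]:
    """A writer waits for the last writer and all readers, then becomes exclusive."""
    deps = [resv.exclusive] if resv.exclusive is not None else []
    deps += resv.shared
    resv.exclusive = fence
    resv.shared = []
    return deps


resv = ToyReservation()
first = submit_write(resv, "render-1")   # nothing to wait for yet
second = submit_read(resv, "scanout")    # waits for the writer
third = submit_write(resv, "render-2")   # waits for the writer and the reader
print(first, second, third)
```

Userspace never sees a fence in this model. The kernel derives every dependency from the buffer alone, which is why there is room for only one writer at a time.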
I can confirm that my understanding of implicit fencing is exactly the same as yours.
- explicit fencing: Userspace passes around distinct fence objects for any work going on on the GPU. The kernel doesn't insert any stalls of its own (except for moving buffer objects around, of course). This is what Android does. This also seems to be what amdgpu is doing within one process/owner.
No, that is clearly not my understanding of explicit fencing.
Userspace doesn't necessarily need to pass around distinct fence objects with all of its protocols, and the kernel is still responsible for inserting stalls whenever a userspace protocol or application requires these semantics.
Otherwise you will never be able to use explicit fencing on the Linux desktop with protocols like DRI2/DRI3.
I would expect that every driver in the system waits for all fences of a reservation object as long as it isn't told otherwise by providing a distinct fence object with the IOCTL in question.
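To make that expectation concrete, here is a minimal sketch in Python. The function name `fences_to_wait_for` and its parameters are hypothetical and fences are plain strings; it only models the rule stated above, not any existing driver code.

```python
from typing import List, Optional


def fences_to_wait_for(resv_fences: List[str],
                       explicit_fence: Optional[str] = None) -> List[str]:
    """Pick the fences a submission has to wait for.

    Default: wait for every fence attached to the buffer's reservation object.
    If userspace passed a distinct fence with the IOCTL, wait only for that one.
    """
    if explicit_fence is not None:
        return [explicit_fence]
    return list(resv_fences)


attached = ["gfx write", "compute write"]
print(fences_to_wait_for(attached))                            # all attached fences
print(fences_to_wait_for(attached, explicit_fence="my fence")) # only the explicit one
```

A client that knows nothing about fences (DRI2/DRI3) gets the safe default, while a client that passes its own fence opts out of the implicit stalls.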
Regards, Christian.