On 08/25/2015 11:02 AM, Daniel Vetter wrote:
On Mon, Aug 24, 2015 at 04:52:13PM +0100, Daniel Stone wrote:
Hi Thomas,
On 24 August 2015 at 10:50, Thomas Hellstrom <thomas@shipmail.org> wrote:
In any case, ideally I'd want struct dma_buf_sync to look something like:

enum dma_buf_sync_flags {
	DMA_BUF_SYNC_READ = (1 << 0),
	DMA_BUF_SYNC_WRITE = (2 << 0),
	DMA_BUF_SYNC_RW = (3 << 0),
	DMA_BUF_SYNC_START = (0 << 2),
	DMA_BUF_SYNC_END = (1 << 2),

	DMA_BUF_SYNC_VALID_FLAGS_MASK = DMA_BUF_SYNC_RW | DMA_BUF_SYNC_END
};

/* begin/end dma-buf functions used for userspace mmap. */
struct dma_buf_sync {
	enum dma_buf_sync_flags flags;
	__u64 x;
	__u64 width;

	/* 2D sync parameters */
	__u64 stride;
	__u64 y;
	__u64 height;
};
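For illustration, here is a minimal userspace sketch of the begin/end flow this struct implies. The DMA_BUF_IOCTL_SYNC request number and the sync_write_rect() wrapper are assumptions made up for this sketch, not existing UAPI; the types are redeclared (minus the unused flags) so it stands alone:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/types.h>

/* Subset of the enum proposed above, plus the struct, redeclared so
 * this sketch is self-contained. */
enum dma_buf_sync_flags {
	DMA_BUF_SYNC_WRITE = (2 << 0),
	DMA_BUF_SYNC_START = (0 << 2),
	DMA_BUF_SYNC_END = (1 << 2),
};

struct dma_buf_sync {
	enum dma_buf_sync_flags flags;
	__u64 x;
	__u64 width;
	__u64 stride;
	__u64 y;
	__u64 height;
};

/* Assumed request number, for illustration only; the real one would
 * live in <linux/dma-buf.h> next to the struct proposed above. */
#define DMA_BUF_IOCTL_SYNC _IOW('b', 0, struct dma_buf_sync)

/* Wrap CPU writes to an mmap()ed dma-buf rectangle in a START/END
 * pair; sync_write_rect() is a hypothetical helper name. */
static int sync_write_rect(int dmabuf_fd, int end, __u64 x, __u64 y,
			   __u64 width, __u64 height, __u64 stride)
{
	struct dma_buf_sync sync;

	memset(&sync, 0, sizeof(sync));
	sync.flags = DMA_BUF_SYNC_WRITE |
		     (end ? DMA_BUF_SYNC_END : DMA_BUF_SYNC_START);
	sync.x = x;
	sync.width = width;
	sync.stride = stride;
	sync.y = y;
	sync.height = height;

	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}

A caller would issue the START sync before touching the mapping and the END sync once the writes are done, so the driver only has to flush caches or schedule a DMA copy for the rectangle that was actually touched.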
So again, why use a 2D sync in the first place when it's probably possible to batch 1D syncs in the driver? It boils down to the number of syscalls required, and to the fact that batching sync calls is probably not trivial in the read-write case. One can, again, argue that there might be 3D cases or other obscure sync cases, but let's implement those in the future if the overhead becomes unbearable; if the sync mechanism gets established as mandatory, the problem will solve itself. At least now we cover the most common case, and we don't need to add a new multiplexing mechanism inside the sync IOCTL since we already have one in the IOCTL mechanism itself: we can simply add a new sync IOCTL for obscure sync cases if desired.
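To make the syscall-count argument concrete, here is a hypothetical sketch of what a rectangle flush looks like through a 1D-only interface; sync_range() is an invented wrapper around a 1D sync ioctl:

#include <linux/types.h>

/* Hypothetical 1D sync wrapper, invented for this sketch. */
int sync_range(int dmabuf_fd, __u64 offset, __u64 length);

/* With a 1D-only sync, flushing a width x height rectangle costs
 * one syscall per scanline; a 2D sync collapses this to a single
 * ioctl. */
static int sync_rect_1d(int dmabuf_fd, __u64 x, __u64 y, __u64 width,
			__u64 height, __u64 stride, __u32 cpp)
{
	__u64 row;
	int ret = 0;

	for (row = y; row < y + height && !ret; row++)
		ret = sync_range(dmabuf_fd, row * stride + x * cpp,
				 width * cpp);	/* height ioctls total */
	return ret;
}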
I still don't think this obviates the need for batching: consider the case where you update two disjoint screen regions and want them both flushed. Either you issue two separate sync calls (which can be disadvantageous performance-wise on some hardware/setups), or you accumulate the regions and only flush later. So either two ioctls (one in the style of dirtyfb and one to perform the sync/flush; you can shortcut to assume the full buffer was damaged if the former is missing), or one like this:
struct dma_buf_sync_2d {
	enum dma_buf_sync_flags flags;

	__u64 stride_bytes;
	__u32 bytes_per_pixel;
	__u32 num_regions;

	struct {
		__u64 x;
		__u64 y;
		__u64 width;
		__u64 height;
	} regions[];
};
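As an illustration of the batching this buys, a hypothetical caller flushing two disjoint damage rectangles with a single syscall might look like the sketch below. It assumes the struct and flag definitions quoted earlier in the thread; DMA_BUF_IOCTL_SYNC_2D is an invented request number (the kernel side would copy the regions[] tail based on num_regions):

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/types.h>

/* Invented for this sketch; _IOW() only covers the fixed-size head
 * of the variable-length struct above. */
#define DMA_BUF_IOCTL_SYNC_2D _IOW('b', 1, struct dma_buf_sync_2d)

/* Flush two disjoint rectangles, e.g. a cursor and a status bar,
 * in one ioctl using struct dma_buf_sync_2d as defined above. */
static int flush_two_regions(int dmabuf_fd, __u64 stride_bytes, __u32 cpp)
{
	struct dma_buf_sync_2d *sync;
	int ret;

	sync = calloc(1, sizeof(*sync) + 2 * sizeof(sync->regions[0]));
	if (!sync)
		return -1;

	sync->flags = DMA_BUF_SYNC_WRITE | DMA_BUF_SYNC_END;
	sync->stride_bytes = stride_bytes;
	sync->bytes_per_pixel = cpp;
	sync->num_regions = 2;

	sync->regions[0].x = 0;		/* 64x64 cursor update */
	sync->regions[0].y = 0;
	sync->regions[0].width = 64;
	sync->regions[0].height = 64;

	sync->regions[1].x = 0;		/* 1920x32 status-bar update */
	sync->regions[1].y = 1000;
	sync->regions[1].width = 1920;
	sync->regions[1].height = 32;

	ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC_2D, sync);
	free(sync);
	return ret;
}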
I really feel like any kind of multi-range flush interface is feature bloat, and if we do it then we should only do it later on when there's a clear need for it.
IMO all the use-cases so far that wanted to do this have been 2D updates, and having only a 1D sync will most probably scare people away from this interface.
Afaiui dma-buf mmap will mostly be used for up/downloads, which means there will be some explicit copy involved somewhere anyway. So it's similar to userptr usage, and for those you most often just slurp in a linear block and then let the blitter rearrange it, at least on i915.
Also, thus far the consensus was that dma-buf doesn't care/know about content layout; adding strides/bytes/whatever bends this quite a bit.
I think with a 2D interface the stride applies only to the sync operation itself: it's a parameter of that one call, not layout metadata that dma-buf would have to track. Whether we should have multiple regions or not is not a big deal for me, but I think at least a 2D sync is crucial.
And finally, if we concentrate on the simple case of one byte range, we can focus the discussion on correctness, which imo is the important bit here really.
If we focus only on correctness, there are other ways to do this without a sync: we can DMA based on accessed and dirty pages, but performance will crawl, so IMO the sync interface is purely about performance...
-Daniel
Thanks, Thomas