-----Original Message-----
From: Lucas Stach [mailto:l.stach@pengutronix.de]
Sent: Thursday, June 20, 2013 7:11 PM
To: Inki Dae
Cc: 'Russell King - ARM Linux'; 'Inki Dae'; 'linux-fbdev'; 'YoungJun Cho'; 'Kyungmin Park'; 'myungjoo.ham'; 'DRI mailing list'; linux-arm-kernel@lists.infradead.org; linux-media@vger.kernel.org
Subject: Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework

On Thursday, 20.06.2013, at 17:24 +0900, Inki Dae wrote: [...]
In addition, please see the more detailed examples below.
The conventional way (without dmabuf-sync) is:

Task A
1. CPU accesses buf
2. Send the buf to Task B
3. Wait for the buf from Task B
4. go to 1

Task B
1. Wait for the buf from Task A
2. qbuf the buf
   2.1 insert the buf to incoming queue
3. stream on
   3.1 dma_map_sg if ready, and move the buf to ready queue
   3.2 get the buf from ready queue, and dma start.
4. dqbuf
   4.1 dma_unmap_sg after dma operation completion
   4.2 move the buf to outgoing queue
5. Send the buf back to Task A
6. go to 1

When two tasks share buffers and data flows from Task A to Task B, we need an IPC operation to send and receive the buffers between the two tasks every time a CPU or DMA access to the buffers starts or completes.

With dmabuf-sync:

Task A
1. dma_buf_sync_lock <- syncpoint (call by user side)
2. CPU accesses buf
3. dma_buf_sync_unlock <- syncpoint (call by user side)
4. Send the buf to Task B (just one time)
5. go to 1

Task B
1. Wait for the buf from Task A (just one time)
2. qbuf the buf
   2.1 insert the buf to incoming queue
3. stream on
   3.1 dma_buf_sync_lock <- syncpoint (call by kernel side)
   3.2 dma_map_sg if ready, and move the buf to ready queue
   3.3 get the buf from ready queue, and dma start.
4. dqbuf
   4.1 dma_buf_sync_unlock <- syncpoint (call by kernel side)
   4.2 dma_unmap_sg after dma operation completion
   4.3 move the buf to outgoing queue
5. go to 1

On the other hand, with dmabuf-sync, as the example above shows, we need the IPC operation just one time. That way, I think we could not only reduce performance overhead but also simplify the user application. Of course, this approach can be used for all DMA device drivers, such as DRM. I'm not a specialist in the v4l2 world, so I may be missing something.

You already need some kind of IPC between the two tasks, as I suspect even in your example it wouldn't make much sense to queue the buffer over and over again in task B without task A writing anything to it. So task A has to signal task B there is new data in the buffer to be processed.

There is no need to share the buffer over and over again just to get the two processes to work together on the same thing. Just share the fd between both and then do out-of-band completion signaling, as you need this anyway. Without this you'll end up with unpredictable behavior. Just because sync allows you to access the buffer doesn't mean it's valid for your use-case. Without completion signaling you could easily end up overwriting your data from task A multiple times before task B even tries to lock the buffer for processing.
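
The overwrite problem described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: a plain `threading.Lock` stands in for the proposed `dma_buf_sync_lock`/`dma_buf_sync_unlock` pair (it is not the real API), and `task_a_write`, `task_b_read` and `lock_only_demo` are hypothetical names. The calls are made sequentially to show one interleaving that a lock alone permits.

```python
import threading

# Assumption: this lock only models mutual exclusion on the buffer;
# it is not the dmabuf-sync API from the patch.
buf_lock = threading.Lock()
buf = bytearray(1)          # stand-in for the shared buffer


def task_a_write(frame):
    """Task A: lock, CPU writes one frame, unlock."""
    with buf_lock:
        buf[0] = frame


def task_b_read():
    """Task B: lock, consume whatever is in the buffer, unlock."""
    with buf_lock:
        return buf[0]


def lock_only_demo():
    """One legal schedule: A gets the lock three times before B runs once."""
    written = [1, 2, 3]
    for frame in written:
        task_a_write(frame)
    seen = [task_b_read()]
    return written, seen
```

Every access here is correctly locked, yet `lock_only_demo()` shows Task B seeing only the last frame; frames 1 and 2 are overwritten before B ever takes the lock. The lock orders individual accesses, but it does not tell B that new data exists or tell A that B is done.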

So the valid flow is (and this already works with the current APIs):

Task A                                   Task B
CPU access buffer
        ----------completion signal--------->
                                         qbuf (dragging buffer into
                                         device domain, flush caches,
                                         reserve buffer etc.)
                                              |
                                         wait for device operation to
                                         complete
                                              |
                                         dqbuf (dragging buffer back
                                         into CPU domain, invalidate
                                         caches, unreserve)
        <---------completion signal------------
CPU access buffer

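
The flow above can be modelled in user space as two threads handing one buffer back and forth. This is a minimal sketch, not kernel or V4L2 code: the two `threading.Event` objects are an assumed stand-in for the out-of-band completion signal, and `device_process` is a hypothetical placeholder for the qbuf / wait / dqbuf sequence.

```python
import threading

FRAMES = 3
buf = bytearray(4)                # stand-in for the shared buffer
data_ready = threading.Event()    # completion signal, Task A -> Task B
buf_free = threading.Event()      # completion signal, Task B -> Task A
processed = []                    # what the "device" saw in each round


def device_process(b):
    # Hypothetical placeholder for qbuf -> device operation -> dqbuf.
    return bytes(b)


def task_a():
    for frame in range(FRAMES):
        buf[:] = bytes([frame]) * len(buf)   # CPU access buffer
        data_ready.set()                     # signal B: new data is in the buffer
        buf_free.wait()                      # wait until B hands the buffer back
        buf_free.clear()


def task_b():
    for _ in range(FRAMES):
        data_ready.wait()                    # wait for A's completion signal
        data_ready.clear()
        processed.append(device_process(buf))
        buf_free.set()                       # signal A: buffer may be rewritten


def run():
    processed.clear()
    threads = [threading.Thread(target=task_a), threading.Thread(target=task_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(processed)
```

Because each side waits for the other's signal before touching the buffer, `run()` returns every frame exactly once and in order, which a lock on the buffer alone would not guarantee.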
Correct. When data flows from A to B, some kind of IPC between the two tasks is needed every time, as you said. Then, without dmabuf-sync, what do you think about the case where two tasks share the same buffer (buf1) and both access it for writing, but the data in buf1 doesn't need to be shared?
Sorry, I don't see the point you are trying to solve here. If you share a buffer and want its content to be clearly defined at every point in time you have to synchronize the tasks working with the buffer, not just the buffer accesses themselves.
Easiest way to do so is doing sync through userspace with out-of-band IPC, like in the example above.
In my opinion, that is definitely not the easiest way. What I am trying to do is avoid *the out-of-band IPC*. As I mentioned in the document file, the conventional mechanism not only complicates the user application (the user process needs to understand how the device driver works) but may also incur performance overhead through the out-of-band IPC. My example above may not be enough for you, but there would be other cases where my approach can be used efficiently.
A more advanced way to achieve this would be using cross-device fences to avoid going through userspace for every syncpoint.
Ok, maybe there is something I missed. So, a question: what are cross-device fences? DMA fences? And how can we achieve synchronization without going through user space for every syncpoint when the CPU and DMA share the same buffer? Could you explain it in as much detail as possible, as I did?
With dmabuf-sync:

Task A
1. dma_buf_sync_lock <- syncpoint (call by user side)
2. CPU writes something to buf1
3. dma_buf_sync_unlock <- syncpoint (call by user side)
4. copy buf1 to buf2
Random contents here? What's in the buffer, content from the CPU write, or from V4L2 device write?
Please assume that buf1 is physically non-contiguous memory and buf2 is physically contiguous memory; device A, which has no IOMMU, sees buf2. We need to copy buf1 to buf2 to send the contents of buf1 to device A because device A's DMA cannot access buf1 directly. The CPU and the V4L2 device don't share the contents of buf1; they share buf1 only as storage.
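
The buf1-to-buf2 copy described above amounts to gathering scattered chunks into one contiguous bounce buffer. The following is only an illustrative user-space sketch: `copy_to_contiguous` is a hypothetical helper, a list of byte chunks stands in for the physically non-contiguous pages of buf1, and a `bytearray` stands in for the contiguous buf2.

```python
def copy_to_contiguous(segments, bounce):
    """Copy scattered segments back to back into a contiguous buffer.

    segments: list of bytes-like chunks (model of non-contiguous buf1)
    bounce:   bytearray large enough to hold them (model of contiguous buf2)
    Returns the number of bytes copied.
    """
    total = sum(len(seg) for seg in segments)
    if total > len(bounce):
        raise ValueError("bounce buffer too small")
    offset = 0
    for seg in segments:
        # Equal-length slice assignment, so the bounce buffer is never resized.
        bounce[offset:offset + len(seg)] = seg
        offset += len(seg)
    return offset


buf1 = [b"ab", b"cde", b"f"]     # three scattered chunks
buf2 = bytearray(8)              # contiguous destination
copied = copy_to_contiguous(buf1, buf2)
```

After the call, the first `copied` bytes of `buf2` hold the chunks of `buf1` in order, and the rest of `buf2` is untouched.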
Thanks, Inki Dae
5. go to 1

Task B
1. dma_buf_sync_lock
2. CPU writes something to buf3
3. dma_buf_sync_unlock
4. qbuf the buf3(src) and buf1(dst)
   4.1 insert buf3,1 to incoming queue
   4.2 dma_buf_sync_lock <- syncpoint (call by kernel side)
5. stream on
   5.1 dma_map_sg if ready, and move the buf to ready queue
   5.2 get the buf from ready queue, and dma start.
6. dqbuf
   6.1 dma_buf_sync_unlock <- syncpoint (call by kernel side)
   6.2 dma_unmap_sg after dma operation completion
   6.3 move the buf3,1 to outgoing queue
7. go to 1

Regards,
Lucas

-- 
Pengutronix e.K.                           | Lucas Stach                 |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |