2013/6/22 Jerome Glisse j.glisse@gmail.com:
On Fri, Jun 21, 2013 at 12:55 PM, Inki Dae daeinki@gmail.com wrote:
2013/6/21 Lucas Stach l.stach@pengutronix.de:
Hi Inki,
please refrain from sending HTML Mails, it makes proper quoting without messing up the layout everywhere pretty hard.
Sorry about that. I should have used text mode.
On Friday, 21.06.2013 at 20:01 +0900, Inki Dae wrote: [...]
Yeah, you'll need some knowledge of and familiarity with the API you are working with to get things right. But I think it's not unreasonable to expect a programmer working directly with kernel interfaces to read up on how things work.
Second thing: I'd rather have *one* consistent API for every subsystem, even if they differ from each other, than have to implement this syncpoint thing in every subsystem. Remember: a single execbuf in DRM might reference both GEM objects backed by dma-buf as well as native SHM- or CMA-backed objects. The dma-buf-mgr proposal already allows you to handle dma-bufs much the same way during validation as native GEM objects.
Actually, at first I implemented a fence helper framework based on reservation and dma-fence to provide an easy-to-use interface for device drivers. However, that implementation was wrong: I had not only customized the dma-fence but had also not considered the deadlock issue. After that, I reimplemented it as dmabuf sync to solve the deadlock issue, and at that point I realized that we first need to concentrate on the most basic facts: that CPU and CPU, CPU and DMA, or DMA and DMA can access the same buffer; that simple is best; and that we need not only kernel-side but also user-side interfaces. I then collected what is common to all subsystems, and devised this dmabuf sync framework from it.

I'm not really a specialist in the desktop world, so a question: isn't execbuf used only for the GPU? The GPU has dedicated video memory (VRAM), so it needs a migration mechanism between system memory and the dedicated video memory, and it also has to consider ordering issues while buffers are being migrated.
Yeah, execbuf is pretty GPU-specific, but I don't see how this matters for this discussion. Also, I don't see a big difference between embedded and desktop GPUs; buffer migration is more of a detail here. Both take command streams that potentially reference other buffers, which might be native GEM or dma-buf backed objects. Both have to make sure the buffers are in the right domain (caches cleaned and address mappings set up) and are available for the desired operation, meaning you have to sync with other DMA engines and maybe also with the CPU.
Yeah, right. Then, in the desktop GPU case, doesn't it need to do something additional when a buffer (or buffers) is migrated from the system memory domain to the video memory domain, or back again? I guess the members below do a similar thing, and all other DMA devices would not need them:

    struct fence {
        ...
        unsigned int context, seqno;
        ...
    };

And,

    struct seqno_fence {
        ...
        uint32_t seqno_ofs;
        ...
    };
The only case where sync isn't clearly defined right now by the current API entry points is when you access memory through the dma-buf fallback mmap support, which might happen with some software processing element in a video pipeline or something. I agree that we will need a userspace interface here, but I think this shouldn't be yet another sync object, but rather a prepare/fini_cpu_access ioctl on the dma-buf which hooks into the existing dma-fence and reservation stuff.
I think we don't need additional ioctl commands for that; I am thinking of using existing resources as much as possible. My idea is similar to yours in using the reservation stuff, because my approach also has to use the dma-buf resource. However, in my idea a user process that wants buffer synchronization with another sees a sync object as a file descriptor, just like dma-buf does. The below shows my simple idea about it:
    ioctl(dmabuf_fd, DMA_BUF_IOC_OPEN_SYNC, &sync);

    flock(sync->fd, LOCK_SH);   <- LOCK_SH means a shared lock
        CPU access for read
    flock(sync->fd, LOCK_UN);

Or

    flock(sync->fd, LOCK_EX);   <- LOCK_EX means an exclusive lock
        CPU access for write
    flock(sync->fd, LOCK_UN);

    close(sync->fd);
As you know, that's similar to dmabuf export feature.
In addition, a simpler idea:

    flock(dmabuf_fd, LOCK_SH/EX);
        CPU access for read/write
    flock(dmabuf_fd, LOCK_UN);
However, I'm not sure that the above examples would work well and be free of problems: actually, I don't fully understand the flock mechanism, so I'm still looking into it.
And to get back to my original point: if you have more than one task operating together on a buffer, you absolutely need some kind of real IPC to sync them up and do something useful. Both your syncpoints and the proposed dma-fences only protect the buffer accesses to make sure different tasks don't stomp on each other. There is nothing in there to make sure that the output of your pipeline is valid. You have to take care of that yourself in userspace. I'll reuse your example to make it clear what I mean:
    Task A                              Task B
    ------                              ------
    dma_buf_sync_lock(buf1)
    CPU write buf1
    dma_buf_sync_unlock(buf1)
        ---------schedule Task A again-------
    dma_buf_sync_lock(buf1)
    CPU write buf1
    dma_buf_sync_unlock(buf1)
        ---------schedule Task B---------
                                        qbuf(buf1)
                                        dma_buf_sync_lock(buf1)
                                        ....
This is what can happen if you don't take care of proper syncing. Task A writes something to the buffer expecting Task B to take care of it, but before Task B even gets scheduled, Task A overwrites the buffer again. Not what you wanted, is it?
That's exactly the wrong example; I had already mentioned this: "In case the data flow goes from A to B, it needs some kind of IPC between the two tasks every time." So again, your example has no problem in the case that *two tasks share the same buffer, both tasks access the buffer (buf1) for writing, and the data of the buffer (buf1) doesn't need to be shared*. They just need to use the buffer as *storage*. So all they want is to avoid stomping on the buffer in this case.
Sorry, but I don't see the point. If no one is interested in the data of the buffer, why are you sharing it in the first place?
Just used as storage. I.e., Task A fills the buffer with "AAAAAA" using the CPU, and Task B fills the buffer with "BBBBBB" using DMA. They don't share the data of the buffer, but they share the *memory region* of the buffer. That would be very useful for embedded systems with a very small amount of system memory.
Just so I understand: you want to share the backing memory, but you don't want to share the content, i.e. you want to do memory management in userspace. This sounds wrong on so many levels (not even considering the security implications).
If Task A needs memory and can then release it for Task B's usage, that should be the role of kernel memory management, which of course needs synchronization between A and B. But in no case should this be done using dma-buf. dma-buf is for sharing content between different devices, not for sharing resources.
Let's simply think of the case that a user process doesn't want anyone else, whether another CPU task or DMA, to access a shared buffer while it is accessing that buffer. In this case, it's useful.
Thanks, Inki Dae
Also, don't over-complicate the VRAM case; just consider desktop GPUs as using system memory directly. They can do it, and they do it. Migration to VRAM is orthogonal to all this; it's an optimization, so to speak.
Cheers, Jerome.