-----Original Message-----
From: Rob Clark [mailto:robdclark@gmail.com]
Sent: Tuesday, May 14, 2013 2:58 AM
To: Inki Dae
Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun Cho; linux-arm-kernel@lists.infradead.org; linux-media@vger.kernel.org
Subject: Re: Introduce a new helper framework for buffer synchronization
On Mon, May 13, 2013 at 1:18 PM, Inki Dae inki.dae@samsung.com wrote:
2013/5/13 Rob Clark robdclark@gmail.com
On Mon, May 13, 2013 at 8:21 AM, Inki Dae inki.dae@samsung.com wrote:
In that case you still wouldn't give userspace control over the fences.
I don't see any way that can end well. What if userspace never signals? What if userspace gets killed by the oom killer? Who keeps track of that?
In all cases, all kernel resources associated with the user fence will be released by the kernel once the fence times out: never signaling and the process being killed by the oom killer both cause the fence to time out. And if we use the mmap mechanism you mentioned before, I think user resources could also be freed properly.
I tend to agree w/ Maarten here.. there is no good reason for userspace to be *signaling* fences. The exception might be some blob gpu drivers which don't have enough knowledge in the kernel to figure out what to do. (In which case you can add driver private ioctls for that.. still not the right thing to do but at least you don't make a public API out of it.)
Please do not worry about whether these are generic or not. Let's look at the following three things. First, cache operations. As you know, an ARM SoC has an ACP (Accelerator Coherency Port) that can be connected to a DMA engine or similar device, and this port is used for cache coherency between the CPU cache and the DMA device. However, most devices on ARM based embedded systems don't use the ACP port, so they need proper cache operations before and after DMA or CPU access when cacheable mappings are used. Actually, I see many Linux based platforms calling cache control interfaces directly for that. I think the reason they do so is that the kernel isn't aware of when and how the CPU accessed the memory.
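For illustration, this is roughly the kind of direct cache maintenance I mean, expressed with the standard streaming DMA API; the device, buffer and length here are placeholders, and the function name is made up for this sketch:

#include <linux/dma-mapping.h>

static void my_non_acp_dma_xfer(struct device *dev, void *buf, size_t len)
{
        dma_addr_t handle;

        /* Mapping cleans the CPU cache so the device sees the CPU's
         * latest writes. */
        handle = dma_map_single(dev, buf, len, DMA_BIDIRECTIONAL);
        if (dma_mapping_error(dev, handle))
                return;

        /* ... start the DMA transfer here and wait for it to finish ... */

        /* Invalidate the CPU cache so the CPU sees what the device wrote. */
        dma_sync_single_for_cpu(dev, handle, len, DMA_BIDIRECTIONAL);

        dma_unmap_single(dev, handle, len, DMA_BIDIRECTIONAL);
}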
I think we had kicked around the idea of giving dmabufs a prepare/finish ioctl quite some time back. This is probably something that should be at least a bit decoupled from fences. (Possibly 'prepare' waits for dma access to complete, but not the other way around.)
And I did implement in omapdrm support for simulating coherency via page fault-in / shoot-down.. It is one option that makes it completely transparent to userspace, although there is some performance cost, so I suppose it depends a bit on your use-case.
And second, a user process has to do so many things when using shared memory with a DMA device. The user process has to understand how the DMA device operates and when the interfaces for controlling the DMA device should be called. Such things make user applications very complicated.
And third, it's a performance optimization for multimedia and graphics devices. As I mentioned already, we should consider the sequential processing involved in buffer sharing between the CPU and a DMA device. This means that the CPU has to stay idle until the DMA device has completed, and vice versa.
That is why I proposed such user interfaces. Of course, these interfaces might still be ugly: Maarten already pointed this out and I agree with him. But there must be a better way. Don't you think we need something similar? With such interfaces, cache control and buffer synchronization can be performed at the kernel level. Moreover, the user application doesn't need to deal with controlling the DMA device anymore. Therefore, one thread can access a shared buffer while another controls the DMA device using that shared buffer, in parallel. We can really make the best use of CPU and DMA idle time. In other words, we can really make the best use of a multitasking OS, Linux.
So could you please tell me whether there is any reason not to use a public API for this? I think we can add and use a public API if NECESSARY.
well, for cache management, I think it is a better idea.. I didn't really catch that this was the motivation from the initial patch, but maybe I read it too quickly. But cache can be decoupled from synchronization, because CPU access is not asynchronous. For userspace/CPU access to buffer, you should:
- wait for buffer
- prepare-access
- ... do whatever cpu access to buffer ...
- finish-access
- submit buffer for new dma-operation
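As a rough userspace sketch of that sequence (every name below -- wait_fence(), submit_job(), DMABUF_IOCTL_PREPARE/FINISH -- is invented purely for illustration; no such ioctls exist today):

#include <linux/ioctl.h>
#include <string.h>
#include <sys/ioctl.h>

/* purely hypothetical placeholders for this sketch */
extern int wait_fence(int dmabuf_fd);
extern int submit_job(int dmabuf_fd);
#define DMABUF_IOCTL_PREPARE    _IO('Z', 0)
#define DMABUF_IOCTL_FINISH     _IO('Z', 1)

static int cpu_fill_then_kick_dma(int dmabuf_fd, void *map, size_t size)
{
        wait_fence(dmabuf_fd);                    /* wait for buffer */
        ioctl(dmabuf_fd, DMABUF_IOCTL_PREPARE);   /* prepare-access */
        memset(map, 0xff, size);                  /* cpu access to buffer */
        ioctl(dmabuf_fd, DMABUF_IOCTL_FINISH);    /* finish-access */
        return submit_job(dmabuf_fd);             /* new dma-operation */
}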
For data flow from the CPU to the DMA device: 1) wait for buffer 2) prepare-access (dma_buf_begin_cpu_access) 3) cpu access to buffer
For data flow from the DMA device to the CPU: 1) wait for buffer 2) finish-access (dma_buf_end_cpu_access) 3) dma access to buffer
Steps 1) and 2) are coupled into one function: we have implemented fence_helper_commit_reserve() for this.
Cache control (cache clean or cache invalidate) is performed correctly by checking the previous access type against the current access type. The actual code for it is below:
static void fence_helper_cache_ops(struct fence_helper *fh)
{
        struct seqno_fence_dmabuf *sfd;

        list_for_each_entry(sfd, &fh->sf.sync_buf_list, list) {
                struct dma_buf *dmabuf = sfd->sync_buf;

                if (WARN_ON(!dmabuf))
                        continue;

                /* first time access. */
                if (!dmabuf->access_type)
                        goto out;

                if (dmabuf->access_type == DMA_BUF_ACCESS_WRITE &&
                    ((fh->type == (DMA_BUF_ACCESS_READ | DMA_BUF_ACCESS_DMA)) ||
                     (fh->type == (DMA_BUF_ACCESS_WRITE | DMA_BUF_ACCESS_DMA))))
                        /* cache clean */
                        dma_buf_end_cpu_access(dmabuf, 0, dmabuf->size,
                                               DMA_TO_DEVICE);
                else if (dmabuf->access_type == (DMA_BUF_ACCESS_WRITE |
                                                 DMA_BUF_ACCESS_DMA) &&
                         (fh->type == DMA_BUF_ACCESS_READ))
                        /* cache invalidate */
                        dma_buf_begin_cpu_access(dmabuf, 0, dmabuf->size,
                                                 DMA_FROM_DEVICE);

out:
                /* Update access type to new one. */
                dmabuf->access_type = fh->type;
        }
}
The above function is called after waiting for the buffer. Thus, with this approach we can check who (CPU or DMA) accessed, and will access, the buffer, and how (READ or WRITE). In other words, the kernel is now aware of buffer accesses by the CPU as well.
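Roughly, a driver would tie these pieces together like this; the surrounding function and the exact return convention of fence_helper_commit_reserve() are guesses for illustration, only the call order comes from the description above:

static int my_drv_begin_access(struct fence_helper *fh)
{
        int ret;

        /* 1) wait for buffer and 2) reserve it, in one step. */
        ret = fence_helper_commit_reserve(fh);
        if (ret)
                return ret;

        /* clean or invalidate based on previous vs. current access type */
        fence_helper_cache_ops(fh);

        /* 3) the actual CPU or DMA access to the buffer follows here */
        return 0;
}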
Thanks, Inki Dae
I suppose you could combine the syscall for #1 and #2.. not sure if that is a good idea or not. But you don't need to. And there is never really any need for userspace to signal a fence.
BR, -R
Thanks, Inki Dae
BR, -R