On Wed, May 8, 2019 at 2:00 PM Christian König ckoenig.leichtzumerken@gmail.com wrote:
Am 08.05.19 um 12:11 schrieb Daniel Vetter:
On Wed, May 08, 2019 at 11:15:33AM +0200, Daniel Vetter wrote:
On Tue, May 07, 2019 at 10:13:30AM +0200, Christian König wrote:
To allow a smooth transition from pinning buffer objects to dynamic invalidation, we first start to cache the sg_table for an attachment.
Signed-off-by: Christian König christian.koenig@amd.com
 drivers/dma-buf/dma-buf.c | 24 ++++++++++++++++++++++++
 include/linux/dma-buf.h   | 14 ++++++++++++++
 2 files changed, 38 insertions(+)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 7c858020d14b..775e13f54083 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -573,6 +573,20 @@ struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
 	list_add(&attach->node, &dmabuf->attachments);
 
 	mutex_unlock(&dmabuf->lock);
+
+	if (!dma_buf_is_dynamic(dmabuf)) {
+		struct sg_table *sgt;
+
+		sgt = dmabuf->ops->map_dma_buf(attach, DMA_BIDIRECTIONAL);
+		if (!sgt)
+			sgt = ERR_PTR(-ENOMEM);
+		if (IS_ERR(sgt)) {
+			dma_buf_detach(dmabuf, attach);
+			return ERR_CAST(sgt);
+		}
+		attach->sgt = sgt;
+	}
+
 	return attach;
 
 err_attach:
@@ -595,6 +609,10 @@ void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach)
 	if (WARN_ON(!dmabuf || !attach))
 		return;
 
+	if (attach->sgt)
+		dmabuf->ops->unmap_dma_buf(attach, attach->sgt,
+					   DMA_BIDIRECTIONAL);
+
 	mutex_lock(&dmabuf->lock);
 	list_del(&attach->node);
 	if (dmabuf->ops->detach)
@@ -630,6 +648,9 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
 	if (WARN_ON(!attach || !attach->dmabuf))
 		return ERR_PTR(-EINVAL);
 
+	if (attach->sgt)
+		return attach->sgt;
+
 	sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction);
 	if (!sg_table)
 		sg_table = ERR_PTR(-ENOMEM);
@@ -657,6 +678,9 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
 	if (WARN_ON(!attach || !attach->dmabuf || !sg_table))
 		return;
 
+	if (attach->sgt == sg_table)
+		return;
+
 	attach->dmabuf->ops->unmap_dma_buf(attach, sg_table, direction);
 }
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 58725f890b5b..52031fdc75bb 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -322,6 +322,7 @@ struct dma_buf_attachment {
 	struct dma_buf *dmabuf;
 	struct device *dev;
 	struct list_head node;
+	struct sg_table *sgt;
 	void *priv;
 };
@@ -373,6 +374,19 @@ static inline void get_dma_buf(struct dma_buf *dmabuf)
 	get_file(dmabuf->file);
 }
 
+/**
+ * dma_buf_is_dynamic - check if a DMA-buf uses dynamic mappings.
+ * @dmabuf: the DMA-buf to check
+ *
+ * Returns true if a DMA-buf exporter wants to create dynamic sg table mappings
+ * for each attachment. False if only a single static sg table should be used.
+ */
+static inline bool dma_buf_is_dynamic(struct dma_buf *dmabuf)
+{
+	/* Always use a static mapping for now */
+	return false;
Hm I still expect that later on we'll want this to be decided by the attachment: It's only dynamic if both the exporter and the importer support dynamic dma-buf management, otherwise we need to pin.
But anyway, I feel like we need to go over the entire thing anyway once more when p2p has landed, on this:
Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
A correction I only just realized: I need to retract that r-b on this and the drm sgt cache removal patch. You now hardcode the direction to DMA_BIDIRECTIONAL, while drm_prime only kept the cache for a given direction.
Now x86 drivers always set DMA_BIDIRECTIONAL, but arm soc drivers (and also the v4l videobuf layer) try to guess whether it should be DMA_TO_DEVICE or DMA_FROM_DEVICE. I have no idea what the implications are; all the cache coherency on dma-bufs is kind of ill-defined anyway.
And we can't throw the sgt cache away at map time when the direction doesn't fit, like drm_prime does, because that reintroduces taking the reservation object lock and defeats the entire purpose of this. We also can't just assume drm_prime works for everyone, since the special cases all roll their own dma-buf import. I have also not checked what exactly exporters do. No idea what to do here now.
/me cries
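To make the concern above concrete, here is a minimal userspace sketch (not kernel code; all type and function names are illustrative stand-ins, not the real dma-buf API) of what the attach-time caching does with the requested direction:

```c
#include <assert.h>

/* Illustrative stand-ins only -- not the real kernel types. */
enum mock_dir { MOCK_BIDIRECTIONAL, MOCK_TO_DEVICE, MOCK_FROM_DEVICE };

struct mock_sgt {
	enum mock_dir mapped_dir;	/* direction baked in at map time */
};

struct mock_attachment {
	struct mock_sgt storage;
	struct mock_sgt *sgt;		/* cached table, as in the patch */
};

/* attach time: eagerly map with a hardcoded direction */
static void mock_attach(struct mock_attachment *a)
{
	a->storage.mapped_dir = MOCK_BIDIRECTIONAL;
	a->sgt = &a->storage;
}

/* map time: once a cache exists, the requested direction is ignored */
static struct mock_sgt *mock_map(struct mock_attachment *a, enum mock_dir dir)
{
	(void)dir;	/* silently dropped -- this is the problem */
	return a->sgt;
}
```

An importer asking for MOCK_FROM_DEVICE still gets back the table that was mapped MOCK_BIDIRECTIONAL at attach time, which is exactly where the arm coherency question comes in.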
Well, it is kind of an abuse to use the DMA direction here in the first place, and I actually considered removing it completely.
The key problem is that the DMA direction defines things from the viewpoint of the CPU, e.g. DMA_TO_DEVICE means the data has been accessed by the CPU and we are now pushing it to a device.
But DMA-buf is essentially about device to device transfers, so DMA_TO_DEVICE as well as DMA_FROM_DEVICE are completely ambiguous because you don't know the point of view to start with.
Additionally, the caching implementation in drm_prime.c didn't handle it fully either:
	/*
	 * two mappings with different directions for the same attachment are
	 * not allowed
	 */
	if (WARN_ON(prime_attach->dir != DMA_NONE))
		return ERR_PTR(-EBUSY);
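For reference, the behavior that guard gives drm_prime can be sketched as a few lines of self-contained userspace C (mock types only; the real code lives in drm_prime.c): the first map records the direction, a repeat map with the same direction is a cache hit, and a map with a different direction is rejected.

```c
#include <assert.h>
#include <errno.h>

/* Mock types for illustration; not the real drm structures. */
enum prime_dir { PRIME_NONE, PRIME_TO_DEVICE, PRIME_FROM_DEVICE };

struct prime_cache {
	enum prime_dir dir;	/* PRIME_NONE until the first map */
	int table;		/* stands in for the cached sg_table */
};

/* Mirrors the WARN_ON/-EBUSY path quoted above. */
static int prime_map(struct prime_cache *c, enum prime_dir dir, int *out)
{
	if (c->dir == dir) {
		*out = c->table;	/* same direction: cache hit */
		return 0;
	}
	if (c->dir != PRIME_NONE)
		return -EBUSY;		/* different direction: rejected */
	c->dir = dir;			/* first map: record the direction */
	*out = c->table;
	return 0;
}
```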
Do the ARM guys and V4L guys actually do anything with the direction except for passing it on to dma_map_sg_attrs()?
I don't think so, but that alone is a big thing because on arm, that actually changes what happens wrt flushing and stuff. On x86 the dma subsystem assumes everything is always coherent, and we have a pile of hacks in drivers to make sure the dma access is coherent enough (since on a gpu this is controlled on a per transaction basis usually, fairly often even under userspace's control).
I have tried to better understand what a better dma api interface would look like for gpus and dma-buf, but every time that comes up the dma maintainers tell me I'm a fool and that this _must_ be abstracted away by the dma api and I give up again. Hence the crying and no idea what to do here :-/
In reality it probably doesn't matter, as long as no one tries to run a gpu driver with dynamic dma-buf export on arm, but I've seen some people trying to get amdgpu at least going in that combo. So not a long-term solution either. And we can't just not cache, because that breaks the world wrt locking ordering I think, or at least a bunch of drm drivers (and not all drm drivers use the drm_prime helpers, so we can't just do this for us and not cache for anyone else and hope that works out).
One option I haven't looked into much, but might work, is that we call the dma api coherency functions if the requested attachment_map/unmap direction doesn't match. But I honestly don't know whether that's allowed and would result in the same thing in the dma-api backends as doing a mapping with the desired direction right away. Also, it would terribly break the layering, so we'd need to add a map_sync callback (there's been patches, but they never landed) and roll them out to all exporters.
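That option could be sketched, again as a userspace mock (the sync_map() and counter here are hypothetical stand-ins; in the real kernel the mismatch path would have to go through a new map_sync callback ending up in dma_sync_sg_for_cpu()/dma_sync_sg_for_device(), and whether the dma-api permits that is exactly the open question):

```c
#include <assert.h>

/* Mock types for illustration only. */
enum sync_dir { SYNC_BIDIRECTIONAL, SYNC_TO_DEVICE, SYNC_FROM_DEVICE };

struct sync_attachment {
	enum sync_dir cached_dir;	/* direction of the cached mapping */
	int sync_calls;			/* how often we had to re-sync */
};

/* Keep returning the cached table, but sync first on a direction mismatch. */
static void sync_map(struct sync_attachment *a, enum sync_dir dir)
{
	if (dir != a->cached_dir)
		a->sync_calls++;	/* would invoke the exporter's map_sync */
}
```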
Or maybe just back to crying :-)
-Daniel
Regards, Christian.
-Daniel
+}
+
 struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
 					  struct device *dev);
 void dma_buf_detach(struct dma_buf *dmabuf,
-- 
2.17.1
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch