RE: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region

6 Oct 2020

      ...
-----Original Message-----
From: Daniel Vetter daniel@ffwll.ch
Sent: Tuesday, October 06, 2020 2:22 AM
To: Xiong, Jianxin jianxin.xiong@intel.com
Cc: Jason Gunthorpe jgg@ziepe.ca; Leon Romanovsky leon@kernel.org; linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org;
Doug Ledford dledford@redhat.com; Vetter, Daniel daniel.vetter@intel.com; Christian Koenig christian.koenig@amd.com
Subject: Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf as user memory region
On Mon, Oct 05, 2020 at 04:18:11PM +0000, Xiong, Jianxin wrote:
...
...
-----Original Message-----
From: Jason Gunthorpe jgg@ziepe.ca
Sent: Monday, October 05, 2020 6:13 AM
To: Xiong, Jianxin jianxin.xiong@intel.com
Cc: linux-rdma@vger.kernel.org; dri-devel@lists.freedesktop.org;
Doug Ledford dledford@redhat.com; Leon Romanovsky
leon@kernel.org; Sumit Semwal sumit.semwal@linaro.org; Christian
Koenig christian.koenig@amd.com; Vetter, Daniel
daniel.vetter@intel.com
Subject: Re: [RFC PATCH v3 1/4] RDMA/umem: Support importing dma-buf
as user memory region
On Sun, Oct 04, 2020 at 12:12:28PM -0700, Jianxin Xiong wrote:
...
Dma-buf is a standard cross-driver buffer sharing mechanism that
can be used to support peer-to-peer access from RDMA devices.
Device memory exported via dma-buf is associated with a file descriptor.
This is passed to the user space as a property associated with the
buffer allocation. When the buffer is registered as a memory
region, the file descriptor is passed to the RDMA driver along
with other parameters.
Implement the common code for importing dma-buf object and mapping
dma-buf pages.
Signed-off-by: Jianxin Xiong jianxin.xiong@intel.com
Reviewed-by: Sean Hefty sean.hefty@intel.com
Acked-by: Michael J. Ruhl michael.j.ruhl@intel.com

 drivers/infiniband/core/Makefile      |   2 +-
 drivers/infiniband/core/umem.c        |   4 +
 drivers/infiniband/core/umem_dmabuf.c | 291 ++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/umem_dmabuf.h |  14 ++
 drivers/infiniband/core/umem_odp.c    |  12 ++
 include/rdma/ib_umem.h                |  19 ++-
 6 files changed, 340 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/core/umem_dmabuf.c
 create mode 100644 drivers/infiniband/core/umem_dmabuf.h
I think this is using ODP too literally, dmabuf isn't going to need
fine grained page faults, and I'm not sure this locking scheme is OK - ODP is horrifically complicated.
...
If this is the approach then I think we should make dmabuf its own
stand alone API, reg_user_mr_dmabuf()
That's the original approach in the first version. We can go back there.
...
The implementation in mlx5 will be much more understandable, it
would just do dma_buf_dynamic_attach() and program the XLT exactly the same as a normal umem.
The move_notify() simply zaps the XLT and triggers a work to reload
it after the move. Locking is provided by the dma_resv_lock. Only a small disruption to the page fault handler is needed.
We considered such a scheme but didn't go that way due to the lack of
notification when the move is done and thus the work wouldn't know
when it can reload.
Now that I think about it again, we could probably signal the reload in the page fault handler.
For reinstating the pages you need:

- dma_resv_lock, this prevents anyone else from issuing new moves or
  anything like that
- dma_resv_get_excl + dma_fence_wait to wait for any pending moves to
  finish. gpus generally don't wait on the cpu, but block the dependent
  dma operations from being scheduled until that fence has fired. But for
  rdma odp I think you need the cpu wait in your worker here.
- get the new sg list, write it into your ptes
- dma_resv_unlock to make sure you're not racing with a concurrent
  move_notify

You can also grab multiple dma_resv_locks atomically, but I think the odp rdma model doesn't require that (gpus need that).
Note that you're allowed to allocate memory with GFP_KERNEL while holding dma_resv_lock, so this shouldn't pose any issues. You are
otoh not allowed to cause userspace faults (so no gup/pup or copy_*_user with faulting enabled). So all in all this shouldn't be any worse than
calling pup for normal umem.
Unlike mmu notifier the caller holds dma_resv_lock already for you around the move_notify callback, so you shouldn't need any additional
locking in there (aside from what you need to zap the ptes and flush hw tlbs).
Cheers, Daniel
Hi Daniel, thanks for providing the details. I would have missed the dma_resv_get_excl + dma_fence_wait part otherwise.
...
...
...
...

dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
sgt = dma_buf_map_attachment(umem_dmabuf->attach,
		     DMA_BIDIRECTIONAL);

dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);

This doesn't look right, this lock has to be held up until the HW is
programmed
The mapping remains valid until being invalidated again. There is a sequence number checking before programming the HW.
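The sequence-number check mentioned here can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every name in it (model_umem, model_read_begin, model_program_hw and so on) is hypothetical and does not come from the patch, and a plain flag stands in for the reservation lock. It shows the general pattern of sampling a counter before mapping and re-checking it, under the same lock the invalidation path holds, before the hardware is programmed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names come from the patch. */
struct model_umem {
	bool locked;		/* stands in for the reservation lock */
	unsigned long seq;	/* bumped on every invalidation */
	bool hw_valid;		/* true while the HW table matches the mapping */
};

static void model_lock(struct model_umem *u)
{
	assert(!u->locked);
	u->locked = true;
}

static void model_unlock(struct model_umem *u)
{
	assert(u->locked);
	u->locked = false;
}

/* Invalidation path: the caller is assumed to hold the lock already. */
static void model_invalidate(struct model_umem *u)
{
	assert(u->locked);
	u->seq++;
	u->hw_valid = false;
}

/* Fault path, step 1: sample the sequence number before mapping. */
static unsigned long model_read_begin(struct model_umem *u)
{
	unsigned long seq;

	model_lock(u);
	seq = u->seq;
	model_unlock(u);
	return seq;
}

/*
 * Fault path, step 2: program the HW only if no invalidation happened
 * since seq was sampled. The comparison and the programming happen
 * under the lock, so an invalidation cannot slip in between them.
 */
static bool model_program_hw(struct model_umem *u, unsigned long seq)
{
	bool ok;

	model_lock(u);
	ok = (u->seq == seq);
	if (ok)
		u->hw_valid = true;
	model_unlock(u);
	return ok;
}
```

In this model a caller that gets false back from model_program_hw() would sample the counter again and redo the mapping. Whether the check in the patch is serialized against move_notify in the same way is exactly what the review comment above is questioning.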
...
The use of atomic looks probably wrong as well.
Do you mean umem_dmabuf->notifier_seq? Could you elaborate on the concern?
...
...

k = 0;
total_pages = ib_umem_odp_num_pages(umem_odp);
for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.nents, j) {
	addr = sg_dma_address(sg);
	pages = sg_dma_len(sg) >> page_shift;
	while (pages > 0 && k < total_pages) {
		umem_odp->dma_list[k++] = addr | access_mask;
		umem_odp->npages++;
		addr += page_size;
		pages--;

This isn't fragmenting the sg into a page list properly, won't work
for unaligned things
I thought the addresses are aligned, but will add explicit alignment here.
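To make the alignment concern concrete, here is a minimal userspace sketch of splitting DMA segments into page-sized entries when a segment may start or end in the middle of a page. It is not the code from the patch or from any later revision: struct dma_seg and frag_segs_to_pages() are hypothetical stand-ins for a mapped scatterlist entry and the loop quoted above. It also assumes only the first and last segments are unaligned; if two adjacent segments shared a page, that page would be emitted twice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA-mapped scatterlist entry. */
struct dma_seg {
	uint64_t addr;	/* DMA address, possibly not page aligned */
	uint64_t len;	/* length in bytes */
};

/*
 * Write the page-aligned address of every page touched by each segment
 * into out[], up to max_out entries. Returns the number of entries
 * written. Partial pages at the head and tail of a segment are kept,
 * unlike a plain "len >> page_shift" count, which drops them.
 */
static size_t frag_segs_to_pages(const struct dma_seg *segs, size_t nsegs,
				 unsigned int page_shift,
				 uint64_t *out, size_t max_out)
{
	uint64_t page_size, page_mask, cur, end;
	size_t i, k = 0;

	assert(page_shift < 64);
	page_size = 1ULL << page_shift;
	page_mask = ~(page_size - 1);

	for (i = 0; i < nsegs; i++) {
		if (segs[i].len == 0)
			continue;
		cur = segs[i].addr & page_mask;		/* round start down */
		end = segs[i].addr + segs[i].len;	/* exclusive end */
		for (; cur < end && k < max_out; cur += page_size)
			out[k++] = cur;
	}
	return k;
}
```

For a segment at 0x1800 of length 0x1000 with 4 KiB pages, this yields the two pages 0x1000 and 0x2000, whereas shifting the length alone would count a single page and store the unaligned address 0x1800. Because every entry is page aligned, OR-ing flag bits into the low bits (as the quoted loop does with access_mask) would not corrupt the address.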
...
And really we don't need the dma_list for this case, with a fixed
whole mapping DMA SGL a normal umem sgl is OK and the normal umem XLT programming in mlx5 is fine.
The dma_list is used by both "populate_mtt()" and "mlx5_ib_invalidate_range", which are used for XLT programming and invalidating
(zapping), respectively.
...
...
Jason

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
