On Tue, Oct 06, 2020 at 08:17:05PM +0200, Daniel Vetter wrote:
> So on the gpu we pipeline this all. So step 4 doesn't happen on the cpu, but instead we queue up a bunch of command buffers so that the gpu writes these pagetables (and then flushes tlbs and then does the actual stuff userspace wants it to do).
mlx5 HW does basically this as well.
We just do the scheduling for this work on the device, not on the CPU.
> just queue it all up and let the gpu scheduler sort out the mess. End result is that you get an sgt that points at stuff which very well might have nothing even remotely resembling your buffer in there at the moment. But all the copy operations are queued up, so real soon now the data will also be there.
The explanation makes sense, thanks.
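To spell out how I read that on the importer side, here is a minimal sketch against the dynamic dma-buf API (the my_* names and the queue-behind-fences helper are hypothetical placeholders, not real driver code): the sgt comes back right away, and the importer orders its device work behind the fences in dmabuf->resv instead of touching the memory now.

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>

struct my_importer {			/* hypothetical importer state */
	struct device *dev;
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;
};

/* hypothetical: make our device work depend on the fences in @resv */
int my_queue_work_behind_fences(struct my_importer *imp, struct sg_table *sgt,
				struct dma_resv *resv);

static void my_move_notify(struct dma_buf_attachment *attach)
{
	/*
	 * Exporter wants to move the buffer: only queue invalidation here.
	 * The existing DMA mapping may stay live until the fences in
	 * attach->dmabuf->resv have signalled.
	 */
}

static const struct dma_buf_attach_ops my_importer_attach_ops = {
	.move_notify = my_move_notify,
};

static int my_import_and_queue(struct my_importer *imp, struct dma_buf *dmabuf)
{
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;
	int ret;

	attach = dma_buf_dynamic_attach(dmabuf, imp->dev,
					&my_importer_attach_ops, imp);
	if (IS_ERR(attach))
		return PTR_ERR(attach);

	dma_resv_lock(dmabuf->resv, NULL);
	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt)) {
		dma_resv_unlock(dmabuf->resv);
		dma_buf_detach(dmabuf, attach);
		return PTR_ERR(sgt);
	}

	/*
	 * The sgt is a "future": the exporter's copies/clears may still be
	 * queued on its scheduler.  So don't touch the memory here, just
	 * order the device work behind the fences in dmabuf->resv.
	 */
	ret = my_queue_work_behind_fences(imp, sgt, dmabuf->resv);
	dma_resv_unlock(dmabuf->resv);

	/* unmap/detach happen on teardown, elided here */
	imp->attach = attach;
	imp->sgt = sgt;
	return ret;
}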
> But rdma doesn't work like that, so it all looks a bit funny.
Well, I guess it could, but how would it make anything better? I can overlap building the SGL and the device PTEs with something else doing 'move', but is that a workload that needs such aggressive optimization?
> This is also why the precise semantics of move_notify for gpu<->gpu sharing took forever to discuss and are still a bit WIP, because you have the inverse problem: the dma api mapping might still be there
Seems like this all makes a graph of operations: you can't start the next one until all its deps are finished. Actually, it sounds a lot like futures.
Would be clearer if this attach API provided some indication that the SGL is actually a future-valid SGL.
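Purely as an illustration of what such an indication could look like (nothing like this exists in the attach API today; future_sgl and use_sgl are made-up names), the primitive that already plays the "future" role is a dma_fence, so the idea would be handing back (sgt, fence) together:

#include <linux/dma-fence.h>

/*
 * Hypothetical "future valid SGL": the sgt can be built and handed out
 * immediately, while the fence signals only once the exporter's queued
 * copies/clears are done and the pages really hold the buffer contents.
 */
struct future_sgl {
	struct sg_table *sgt;
	struct dma_fence *valid;
};

/* made-up consumer of a resolved SGL */
int use_sgl(struct sg_table *sgt);

static int consume_future_sgl(struct future_sgl *f)
{
	long ret;

	/*
	 * CPU-style consumer: block until the future resolves.  A device
	 * consumer would instead add f->valid as a dependency of its own
	 * queued work, which is exactly the graph of operations above.
	 */
	ret = dma_fence_wait(f->valid, true);
	if (ret < 0)
		return ret;

	/* Only now does f->sgt describe data matching the buffer. */
	return use_sgl(f->sgt);
}

That would make the "not valid yet" state explicit to the importer without forcing the exporter to finish the copies up front.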
Jason