On Mon, Oct 5, 2020 at 8:54 PM Daniel Vetter daniel.vetter@ffwll.ch wrote:
On Mon, Oct 5, 2020 at 8:37 PM Jason Gunthorpe jgg@ziepe.ca wrote:
On Mon, Oct 05, 2020 at 08:16:33PM +0200, Daniel Vetter wrote:
kvm is some similar hack added for P2P DMA, see commit add6a0cd1c5ba51b201e1361b05a5df817083618. It might be protected by notifiers..
Yeah my thinking is that kvm (and I think also vfio, which also seems to have an mmu notifier nearby) are ok because of the mmu notifier. Assuming that one works correctly.
vfio doesn't have a notifier, Alex was looking to add a vfio private scheme in the vma->private_data:
https://lore.kernel.org/kvm/159017449210.18853.15037950701494323009.stgit@gi...
Guess it never happened.
I was misled by the mmu notifier in drivers/vfio/vfio.c. But looking closer, that's only used by some drivers, I guess to make sure their device pagetables are kept in sync with reality, and not to make sure the vfio pfn view is kept in sync with reality.
This could get real nasty I think.
So, the answer really is that s390 and media need fixing, and this API should go away (or become kvm specific)
I'm still not clear how you want to fix this, since your vma->dma_buf idea is kinda a decade long plan and so just not going to happen:
Well, it doesn't mean we have to change every part of dma_buf to participate in this. Just the bits media cares about. Or maybe it is some higher level variant on top of dma_buf.
Or don't use dma_buf for this, add a new object that just provides refcounts and P2P DMA connection for IO pfn ranges..
So good news is, I dug some layers deeper in v4l, and there are only 2 users which actually handle pfns and don't immediately convert to a pages array:
- videobuf-dma-contig.c. Luckily videobuf 1 has been deprecated since forever, so I think we might get away with either just breaking this, or at least tainting kernels and hiding it behind a nasty Kconfig. This only uses follow_pfn, which we need to keep anyway for vfio in the unsafe variant :-/
- videobuf2-vmalloc.c. Digging through history, this was added to support import of v4l buffers from drivers that needed contig memory. And way back before CMA, that meant carveout memory not backed by struct page *. That should now all have struct pages and be managed by CMA (since videobuf2-dma-contig.c just uses dma_alloc_coherent underneath), so I think we can just switch to pin_user_pages(FOLL_LONGTERM) here too.
iow I think I can outright delete the frame vector stuff.
Ok this doesn't work, because dma_mmap always uses remap_pfn_range, which creates a VM_IO | VM_PFNMAP vma, and so even if it's cma backed and not a carveout, we can't get the pages. Plus trying to move the cma pages out of cma for FOLL_LONGTERM would be kinda bad when they've been allocated as a contig block by dma_alloc_coherent :-)
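To make the failure mode above concrete, here is a toy userspace model in C. It is not kernel code: every toy_* name and TOY_VM_* value is made up for this sketch. It assumes the page-pinning path rejects any vma that has VM_IO or VM_PFNMAP set, regardless of what backs the mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch only; not the kernel's values. */
#define TOY_VM_IO     0x1u
#define TOY_VM_PFNMAP 0x2u

/* Hypothetical stand-in for a vma: just flags plus a note on the backing. */
struct toy_vma {
    unsigned int flags;
    bool cma_backed; /* true if struct pages exist behind the mapping */
};

/* Models a dma_mmap-style mapping: remap_pfn_range marks the vma as
 * VM_IO | VM_PFNMAP no matter whether the memory is CMA or a carveout. */
static inline struct toy_vma toy_dma_mmap(bool cma_backed)
{
    struct toy_vma vma = {
        .flags = TOY_VM_IO | TOY_VM_PFNMAP,
        .cma_backed = cma_backed,
    };
    return vma;
}

/* Models an ordinary page-backed mapping (e.g. anonymous memory). */
static inline struct toy_vma toy_anon_mmap(void)
{
    struct toy_vma vma = { .flags = 0, .cma_backed = false };
    return vma;
}

/* Models the assumed pinning check: only the vma flags are consulted,
 * so cma_backed cannot rescue a PFNMAP mapping. */
static inline bool toy_can_pin_pages(const struct toy_vma *vma)
{
    assert(vma != NULL);
    return (vma->flags & (TOY_VM_IO | TOY_VM_PFNMAP)) == 0;
}
```

In this model toy_can_pin_pages() returns false for toy_dma_mmap(true) even though the memory is cma backed, and true for toy_anon_mmap(). That is the whole problem in miniature: the decision is made on the vma flags, before anyone looks at what is behind the pfn.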
So this idea of switching over to pup only is going to break zerocopy. I guess I'll need something else for this then.
-Daniel