On Fri, Oct 2, 2020 at 8:06 PM Jason Gunthorpe jgg@ziepe.ca wrote:
On Fri, Oct 02, 2020 at 07:53:03PM +0200, Daniel Vetter wrote:
For $reasons I've stumbled over this code and I'm not sure the change to the new gup functions in 55a650c35fea ("mm/gup: frame_vector: convert get_user_pages() --> pin_user_pages()") was entirely correct.
This here is used for long term buffers (not just quick I/O) like RDMA, and John notes this in his patch. But I thought the rule for these is that they need to add FOLL_LONGTERM, which John's patch didn't do.
There is already a dax specific check (added in b7f0554a56f2 ("mm: fail get_vaddr_frames() for filesystem-dax mappings")), so this seems like the prudent thing to do.
Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Andrew Morton akpm@linux-foundation.org Cc: John Hubbard jhubbard@nvidia.com Cc: Jérôme Glisse jglisse@redhat.com Cc: Jan Kara jack@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: linux-mm@kvack.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Cc: linux-media@vger.kernel.org Hi all,
I stumbled over this and figured typing this patch can't hurt. Really just to maybe learn a few things about how gup/pup is supposed to be used (we have a bit of that in drivers/gpu), this here isn't really ralated to anything I'm doing.
FOLL_FORCE is a pretty big clue it should be FOLL_LONGTERM, IMHO
Since you're here ... I've noticed that ib sets FOLL_FORCE when the ib verb access mode indicates possible writes. I'm not really clear on why FOLL_WRITE isn't enough any why you need to be able to write through a vma that's write protected currently.
I'm also wondering whether the explicit dax check should be removed, since FOLL_LONGTERM should take care of that already.
Yep! Confirms the above!
This get_vaddr_frames() thing looks impossible to use properly. How on earth does a driver guarentee
"If @start belongs to VM_IO | VM_PFNMAP vma, we don't touch page structures and the caller must make sure pfns aren't reused for anything else while he is using them."
The only possible way to do that is if the driver restricts the VMAs to ones it owns and interacts with the vm_private data to refcount something.
Since every driver does this wrong anything that uses this is creating terrifying security issues.
IMHO this whole API should be deleted :(
Yeah that part I just tried to conveniently ignore. I guess this dates back to a time when ioremaps where at best fixed, and there wasn't anything like a gpu driver dynamically managing vram around, resulting in random entirely unrelated things possibly being mapped to that set of pfns.
The underlying follow_pfn is also used in other places within drivers/media, so this doesn't seem to be an accident, but actually intentional.
I guess minimally we'd need a VM_PFNMAP flag for dynamically manged drivers like modern drm gpu drivers, to make sure follow_pfn doesn't follow these? -Daniel