On Wed, Jan 30, 2019 at 09:02:08AM +0100, Christoph Hellwig wrote:
On Tue, Jan 29, 2019 at 08:58:35PM +0000, Jason Gunthorpe wrote:
On Tue, Jan 29, 2019 at 01:39:49PM -0700, Logan Gunthorpe wrote:
implement the mapping. And I don't think we should have 'special' vma's for this (though we may need something to ensure we don't get mapping requests mixed with different types of pages...).
I think Jerome explained the point here is to have a 'special vma' rather than a 'special struct page' as, really, we don't need a struct page at all to make this work.
If I recall, your earlier attempts at adding struct page for BAR memory ran aground on issues related to O_DIRECT, sgls, etc.
Struct page is what makes O_DIRECT, sgls, biovecs, etc. work at all. Without struct page none of the above can work. That is why we use struct page for backing BARs in the existing P2P code. Not that I'm a particular fan of creating struct pages for this device memory, but without major invasive surgery to large parts of the kernel it is the only way to make it work.
I don't think anyone is interested in O_DIRECT/etc for RDMA doorbell pages.
.. and again, I recall Logan already attempted to mix non-CPU memory into sgls and it was a disaster. You pointed out that one cannot just put iomem addresses into an SGL without auditing basically the entire block stack to prove that nothing uses iomem without an iomem accessor.
I recall that proposal veered into a direction where the block layer would just fail very early if there was iomem in the sgl, so generally no O_DIRECT support anyhow.
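To make the audit problem concrete, here is a hedged, userspace-only sketch: once an SGL can carry iomem, every consumer has to pick the right accessor before touching the data. All names and types below are illustrative stand-ins, not real kernel definitions (in the kernel this would be the difference between memcpy_fromio() and a plain memcpy()).

```c
/* Sketch only: stand-in types, not the kernel's struct scatterlist. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sg_entry {
    const void *addr;
    size_t      len;
    int         is_iomem;   /* the flag the whole stack would need to honor */
};

/* stand-in for memcpy_fromio(): BAR memory needs a special accessor
 * (the real accessors differ per-architecture; here it is just memcpy) */
static void fake_memcpy_fromio(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/* a consumer that HAS been audited: it checks before dereferencing */
static void copy_from_sg(void *dst, const struct sg_entry *sg)
{
    if (sg->is_iomem)
        fake_memcpy_fromio(dst, sg->addr, sg->len);
    else
        memcpy(dst, sg->addr, sg->len);  /* plain RAM path */
}
```

An unaudited consumer is any code that dereferences sg->addr unconditionally; proving there is no such code anywhere in the block stack is the audit being referred to above.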
We already accepted the P2P stuff from Logan as essentially a giant special case - it only works with RDMA and only because RDMA MR was hacked up with a special p2p callback.
I don't see why a special case with a VMA is really that different.
If someone figures out the struct page path down the road it can obviously be harmonized with this VMA approach pretty easily.
Jason
On 2019-01-30 10:44 a.m., Jason Gunthorpe wrote:
I don't see why a special case with a VMA is really that different.
Well one *really* big difference is the VMA changes necessarily expose specialized new functionality to userspace which has to be supported forever and may be difficult to change. The p2pdma code is largely in-kernel and we can rework and change the interfaces all we want as we improve our struct page infrastructure.
I'd also argue that p2pdma isn't nearly as specialized as this VMA thing and can be used pretty generically to do other things. Though, the other ideas we've talked about doing are pretty far off and may have other challenges.
Logan
On Wed, Jan 30, 2019 at 11:13:11AM -0700, Logan Gunthorpe wrote:
On 2019-01-30 10:44 a.m., Jason Gunthorpe wrote:
I don't see why a special case with a VMA is really that different.
Well one *really* big difference is the VMA changes necessarily expose specialized new functionality to userspace which has to be supported forever and may be difficult to change. The p2pdma code is largely in-kernel and we can rework and change the interfaces all we want as we improve our struct page infrastructure.
I do not see how the VMA changes are any different from using struct page with respect to userspace exposure. Those vma callbacks do not need to be set by everyone; in fact the expectation is that only a handful of drivers will set them.
How can we do p2p between RDMA and a GPU, for instance, without exposing it to userspace? At some point you need to tell userspace: hey, this kernel does allow you to do that :)
RDMA works on vmas, and a GPU driver can easily set up a vma for an object, hence why the vma sounds like a logical place. In fact a vma (mmap of a device file) is a very common device driver pattern.
In the model I am proposing, the exporting device is in control of policy, i.e. whether or not to allow the peer-to-peer mapping. So each device driver can define a proper device-specific API to enable and expose that feature to userspace.
If they do, the only thing we have to preserve is the end result for the user. Userspace does not care one bit whether we achieve this in the kernel with a set of new callbacks within the vm_operations struct or in some other way. Only the end result matters.
So the question is: do we want to allow RDMA to access GPU driver objects? I believe we do; there are people using non-upstream solutions with open source drivers to do just that, which is testimony that there are users for this. More use cases have been proposed too.
I'd also argue that p2pdma isn't nearly as specialized as this VMA thing and can be used pretty generically to do other things. Though, the other ideas we've talked about doing are pretty far off and may have other challenges.
I believe p2p is highly specialized on non-cache-coherent interconnect platforms like x86 with PCIe. So I do not think that using struct page for this is a good idea; it is not warranted/needed, and it can only be problematic if some random kernel code gets hold of those struct pages without understanding that they are not regular memory.
I believe the vma callbacks are the simplest solution, with the minimum burden for the device driver and for the kernel. If any better solution emerges, there is nothing that would block us from removing this and replacing it with the other solution.
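To show the shape of what is being proposed, here is a hedged sketch of exporter-controlled p2p_map/p2p_unmap callbacks hung off a vma. Every struct, name, and signature below is an illustrative stand-in, not the actual kernel patch; the point is only that the exporting driver's callback gets to decide policy.

```c
/* Sketch only: stand-in types, not the kernel's vm_operations_struct. */
#include <assert.h>
#include <stddef.h>

struct device;                          /* opaque stand-in */
struct p2p_vma;

struct p2p_vm_ops {
    /* exporter decides whether this importer may map [offset, offset+len);
     * returns a bus address on success, negative on refusal */
    long (*p2p_map)(struct p2p_vma *vma, struct device *importer,
                    unsigned long offset, size_t len);
    void (*p2p_unmap)(struct p2p_vma *vma, struct device *importer,
                      unsigned long offset, size_t len);
};

struct p2p_vma {
    const struct p2p_vm_ops *ops;
    unsigned long bar_bus_addr;         /* fake BAR base on the bus */
    size_t        bar_size;
};

/* example exporter policy: allow mappings only inside the BAR window */
static long demo_p2p_map(struct p2p_vma *vma, struct device *importer,
                         unsigned long offset, size_t len)
{
    (void)importer;                     /* a real driver could check the peer */
    if (offset + len > vma->bar_size)
        return -1;                      /* refuse: out of range */
    return (long)(vma->bar_bus_addr + offset); /* bus address for the peer */
}
```

The importer (e.g. the RDMA MR path) would call vma->ops->p2p_map() instead of GUP, and the exporter remains free to refuse or to revoke later via p2p_unmap().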
Cheers, Jérôme
On Wed, Jan 30, 2019 at 11:13:11AM -0700, Logan Gunthorpe wrote:
On 2019-01-30 10:44 a.m., Jason Gunthorpe wrote:
I don't see why a special case with a VMA is really that different.
Well one *really* big difference is the VMA changes necessarily expose specialized new functionality to userspace which has to be supported forever and may be difficult to change.
The only user change here is that more things will succeed when creating RDMA MRs (and vice versa to GPU). I don't think this restricts the kernel implementation at all, unless we intend to remove P2P entirely..
Jason
On 2019-01-30 12:19 p.m., Jason Gunthorpe wrote:
On Wed, Jan 30, 2019 at 11:13:11AM -0700, Logan Gunthorpe wrote:
On 2019-01-30 10:44 a.m., Jason Gunthorpe wrote:
I don't see why a special case with a VMA is really that different.
Well one *really* big difference is the VMA changes necessarily expose specialized new functionality to userspace which has to be supported forever and may be difficult to change.
The only user change here is that more things will succeed when creating RDMA MRs (and vice versa to GPU). I don't think this restricts the kernel implementation at all, unless we intend to remove P2P entirely..
Well for MRs I'd expect you are using struct pages to track the memory somehow.... VMAs that aren't backed by pages and use this special interface must therefore be creating new special interfaces that can call p2p_[un]map...
I'd much rather see special cases around struct page so we can find ways to generalize it in the future than create special cases tied to random userspace interfaces.
Logan
On Wed, Jan 30, 2019 at 12:48:33PM -0700, Logan Gunthorpe wrote:
On 2019-01-30 12:19 p.m., Jason Gunthorpe wrote:
On Wed, Jan 30, 2019 at 11:13:11AM -0700, Logan Gunthorpe wrote:
On 2019-01-30 10:44 a.m., Jason Gunthorpe wrote:
I don't see why a special case with a VMA is really that different.
Well one *really* big difference is the VMA changes necessarily expose specialized new functionality to userspace which has to be supported forever and may be difficult to change.
The only user change here is that more things will succeed when creating RDMA MRs (and vice versa to GPU). I don't think this restricts the kernel implementation at all, unless we intend to remove P2P entirely..
Well for MRs I'd expect you are using struct pages to track the memory somehow....
Not really, for MRs most drivers care about DMA addresses only. The only reason struct page ever gets involved is because it is part of the GUP, SGL and dma_map family of APIs.
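A hedged sketch of that point: by the time an MR reaches the HCA driver, it is just a table of DMA addresses; struct page only appears earlier, in the GUP/dma_map steps. Everything below is an illustrative stand-in, not a real driver.

```c
/* Sketch only: stand-in types, not a real RDMA driver's MR. */
#include <assert.h>
#include <stddef.h>

/* what the device-visible translation table actually holds: no struct page */
typedef unsigned long long dma_addr_sketch_t;

struct mr_sketch {
    dma_addr_sketch_t pas[16];  /* address list programmed into the HCA */
    size_t            npas;
};

/* build the MR from DMA addresses alone; whether they came from RAM pages
 * (GUP + dma_map) or a peer's BAR (some p2p_map) is invisible at this layer */
static int mr_sketch_fill(struct mr_sketch *mr,
                          const dma_addr_sketch_t *dma, size_t n)
{
    if (n > 16)
        return -1;              /* sketch-sized table */
    for (size_t i = 0; i < n; i++)
        mr->pas[i] = dma[i];
    mr->npas = n;
    return 0;
}
```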
VMAs that aren't backed by pages and use this special interface must therefore be creating new special interfaces that can call p2p_[un]map...
Well, those are kernel internal interfaces, so they can be changed.
No matter what we do, code that wants to DMA to user BAR pages must take *some kind* of special care - either it needs to use a special GUP and SGL flow, or a mixed GUP, SGL and p2p_map flow.
I don't really see why one is better than the other at this point, or why doing one means we can't do the other some day later. They are fairly similar.
O_DIRECT seems to be the justification for struct page, but nobody is signing up to make O_DIRECT have the required special GUP/SGL/P2P flow that would be needed to *actually* make that work - so it really isn't a justification today.
Jason