On Tue, Jun 22, 2021 at 6:11 PM Jason Gunthorpe jgg@ziepe.ca wrote:
On Tue, Jun 22, 2021 at 04:12:26PM +0300, Oded Gabbay wrote:
- Setting sg_page to NULL
- 'mapping' pages for P2P DMA without going through the iommu
- Allowing P2P DMA without using the p2p dma API to validate that it can work at all in the first place.
All of these result in functional bugs in certain system configurations.
Jason
Hi Jason, Thanks for the feedback. Regarding point 1, why is that a problem if we disable the option to mmap the dma-buf from user-space ?
Userspace has nothing to do with needing struct pages or not
Point 1 and 2 mostly go together, you supporting the iommu is not nice if you dont have struct pages.
You should study Logan's patches I pointed you at as they are solving exactly this problem.
Yes, I do need to study them. I agree with you here. It appears I have a hole in my understanding. I'm missing the connection between iommu support (which I must have of course) and struct pages.
In addition, I didn't see any problem with sg_page being NULL in the RDMA p2p dma-buf code. Did I miss something here ?
No, the design of the dmabuf requires the exporter to do the dma maps and so it is only the exporter that is wrong to omit all the iommu and p2p logic.
RDMA is OK today only because nobody has implemented dma buf support in rxe/si - mainly because the only implementations of exporters don't
Can you please educate me, what is rxe/si ?
set the struct page and are thus buggy.
ok... so how come that patch-set was merged into 5.12 if it's buggy ? Because the current exporters are buggy ? I probably need a history lesson here. But I understand why you think it's a bad idea to add a new buggy exporter.
I will take two GAUDI devices and use one as an exporter and one as an importer. I want to see that the solution works end-to-end, with real device DMA from importer to exporter.
I can tell you it doesn't. Stuffing physical addresses directly into the sg list doesn't involve any of the IOMMU code so any configuration that requires IOMMU page table setup will not work.
Jason
Yes, that's what I expect to see. But I want to see it with my own eyes and then figure out how to solve this. Maybe the result will be going to Logan's path, maybe something else, but I need to start by seeing the failure in a real system.
Thanks for the information, it is really helpful.
Oded