On Mon, Oct 05, 2020 at 04:18:11PM +0000, Xiong, Jianxin wrote:
The implementation in mlx5 will be much more understandable, it would just do dma_buf_dynamic_attach() and program the XLT exactly the same as a normal umem.
The move_notify() simply zap's the XLT and triggers a work to reload it after the move. Locking is provided by the dma_resv_lock. Only a small disruption to the page fault handler is needed.
We considered such scheme but didn't go that way due to the lack of notification when the move is done and thus the work wouldn't know when it can reload.
Well, the work would block on the reservation lock and that indicates the move is done
It would be nicer if the dma_buf could provide an op that things are ready to go though
Now I think it again, we could probably signal the reload in the page fault handler.
This also works, with a performance cost
- dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
- sgt = dma_buf_map_attachment(umem_dmabuf->attach,
DMA_BIDIRECTIONAL);
- dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
This doesn't look right, this lock has to be held up until the HW is programmed
The mapping remains valid until being invalidated again. There is a sequence number checking before programming the HW.
It races, we could immediately trigger invalidation and then re-program the HW with this stale data.
The use of atomic looks probably wrong as well.
Do you mean umem_dmabuf->notifier_seq? Could you elaborate the concern?
It only increments once per invalidation, that usually is racy.
- total_pages = ib_umem_odp_num_pages(umem_odp);
- for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.nents, j) {
addr = sg_dma_address(sg);
pages = sg_dma_len(sg) >> page_shift;
while (pages > 0 && k < total_pages) {
umem_odp->dma_list[k++] = addr | access_mask;
umem_odp->npages++;
addr += page_size;
pages--;
This isn't fragmenting the sg into a page list properly, won't work for unaligned things
I thought the addresses are aligned, but will add explicit alignment here.
I have no idea what comes out of dma_buf, I wouldn't make too many assumptions since it does have to pass through the IOMMU layer too
And really we don't need the dma_list for this case, with a fixed whole mapping DMA SGL a normal umem sgl is OK and the normal umem XLT programming in mlx5 is fine.
The dma_list is used by both "polulate_mtt()" and "mlx5_ib_invalidate_range", which are used for XLT programming and invalidating (zapping), respectively.
Don't use those functions for the dma_buf flow.
Jason