On Tue, Jan 15, 2019 at 06:03:39PM +0000, Thomas Hellstrom wrote:
In the graphics case, it's probably because it doesn't fit the graphics use-cases:
- Memory typically needs to be mappable by another device (the "dma-buf" interface).
And there is nothing preventing dma-buf sharing of these buffers. Unlike the get_sgtable mess it can actually work reliably on architectures that have virtually tagged caches and/or don't guarantee cache coherency with mixed attribute mappings.
- DMA buffers are exported to user-space and are sub-allocated by it.
Mostly there are no GPU user-space kernel interfaces to sync / flush subregions and these syncs may happen on a smaller-than-cache-line granularity.
I know of no architectures that can do cache maintenance on a less than cache line basis. Either the instructions require you to specify cache lines, or they do sometimes more, sometimes less intelligent rounding up.
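To illustrate the rounding, here is a minimal sketch of what a sync request for a sub-line region ends up covering. It assumes a 64-byte cache line, and round_to_cache_lines() and struct maint_range are hypothetical names made up for this example, not existing kernel interfaces:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed line size for illustration; real hardware varies. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical result type: the range actually touched by maintenance. */
struct maint_range {
	uintptr_t start;
	size_t len;
};

/*
 * Expand [addr, addr + len) outward to whole cache lines: the start is
 * rounded down and the end is rounded up to a line boundary.
 */
static struct maint_range round_to_cache_lines(uintptr_t addr, size_t len)
{
	uintptr_t mask = (uintptr_t)CACHE_LINE_SIZE - 1;
	uintptr_t start = addr & ~mask;
	uintptr_t end = (addr + len + mask) & ~mask;
	struct maint_range r = { start, (size_t)(end - start) };

	return r;
}
```

Under these assumptions an 8-byte request at 0x1010 covers the full 64-byte line at 0x1000, and an 8-byte request at 0x103c straddles a boundary and covers two lines. Any neighbouring sub-allocation sharing those lines is affected by the same operation.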
Note that as long as DMA non-coherent buffers are device owned, it is up to the device and the user space driver to take care of coherency; the kernel is very much out of the picture.