Russell King - ARM Linux <linux@arm.linux.org.uk> writes:
On Thu, Aug 13, 2015 at 01:44:03PM -0700, Eric Anholt wrote:
struct_mutex is here because this code comes from the V3D series, with the in-kernel BO cache ripped out. (It turns out that the CMA allocator is slow, and you can't just cache in userspace, since we have to do allocations within the kernel to the tune of a couple per draw, and that's too much.)
The CMA allocator is fast until you have pinned pages in its region, where it becomes _very_ slow to do allocations, sometimes getting up to the order of seconds.
The main culprits are GFP_HIGHUSER_MOVABLE allocations whose pages then get pinned. It doesn't take many of those to make CMA really inefficient.
The problem is that CMA gets no information back from the internal page migration about which pages couldn't be moved, so it blindly advances the start of the candidate range by one page (subject to alignment constraints) and retries, potentially repeating this across the entire CMA region. The bigger the region, the longer this takes.
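The retry behaviour described above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel's cma_alloc(): the region is an array of pages, a "pinned" flag stands in for a page that migration cannot move, and an attempt succeeds only if its range contains no pinned page. `alloc_blind` advances by one alignment step after each failure, as described; `alloc_informed` is a hypothetical contrast that skips past the page that blocked the attempt, which is only possible if that information were reported back. All function names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the first pinned page in [start, start + count), or SIZE_MAX if
 * none is pinned (i.e. migrating the range would succeed in this model). */
static size_t first_pinned(const bool *pinned, size_t start, size_t count)
{
    for (size_t i = start; i < start + count; i++)
        if (pinned[i])
            return i;
    return SIZE_MAX;
}

/* Modelled behaviour: a failed attempt says nothing about which page blocked
 * it, so the next candidate is simply the next aligned start position.
 * align is in pages and must be non-zero. Returns the start page or -1. */
static long alloc_blind(const bool *pinned, size_t npages, size_t count,
                        size_t align, size_t *attempts)
{
    *attempts = 0;
    for (size_t start = 0; start + count <= npages; start += align) {
        (*attempts)++;
        if (first_pinned(pinned, start, count) == SIZE_MAX)
            return (long)start;
    }
    return -1;
}

/* Hypothetical contrast: if the failing page were reported back, the scan
 * could jump straight to the first aligned start beyond that page. */
static long alloc_informed(const bool *pinned, size_t npages, size_t count,
                           size_t align, size_t *attempts)
{
    *attempts = 0;
    size_t start = 0;
    while (start + count <= npages) {
        (*attempts)++;
        size_t bad = first_pinned(pinned, start, count);
        if (bad == SIZE_MAX)
            return (long)start;
        start = (bad / align + 1) * align; /* next aligned start past 'bad' */
    }
    return -1;
}
```

For example, with a 1024-page region where every 8th page of the first 512 is pinned, a 16-page request with no alignment constraint takes 506 attempts in `alloc_blind` before succeeding at page 505, while `alloc_informed` finds the same range in 65. In the real allocator each failed attempt also involves isolating and trying to migrate pages, so the per-attempt cost is far from trivial.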
Ouch.
Since I can work around the allocation cost, the main problem I have right now is that I've got a set of small allocations for 3D that all need to have the same high 4 bits of paddr, because someone cleverly packed some address bits in a GPU-managed structure. Any recommendations for ways to handle this with CMA?
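To make the address constraint concrete, here is a minimal check, assuming 32-bit physical addresses so that "the same high 4 bits" means bits 31..28, i.e. every buffer must lie inside the same 256 MiB (2^28-byte) window. The helpers `hi4`, `range_in_one_window` and `buffers_share_window` are hypothetical and not taken from the V3D/VC4 driver; they only test whether a given set of placements satisfies the constraint.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-bit physical addresses; the "high 4 bits" are bits 31..28,
 * so each distinct value selects one 256 MiB window. */
#define HI_SHIFT 28u

static uint32_t hi4(uint32_t paddr)
{
    return paddr >> HI_SHIFT;
}

/* True if every byte of [paddr, paddr + size) has the same high 4 bits.
 * Zero-size ranges and ranges that wrap past 4 GiB are rejected. */
static bool range_in_one_window(uint32_t paddr, uint32_t size)
{
    uint32_t last = paddr + size - 1; /* wraps if size == 0 or range overflows */
    return last >= paddr && hi4(paddr) == hi4(last);
}

/* True if all n buffers fit in one window and share the same high 4 bits. */
static bool buffers_share_window(const uint32_t *paddr, const uint32_t *size,
                                 size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!range_in_one_window(paddr[i], size[i]))
            return false;
        if (hi4(paddr[i]) != hi4(paddr[0]))
            return false;
    }
    return true;
}
```

Note that a buffer can violate the constraint on its own by straddling a 256 MiB boundary (e.g. 0x0FFFF000 with size 0x2000), not just by landing in a different window from its siblings.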