On 11/10/2011 07:05 PM, Jerome Glisse wrote:
On Thu, Nov 10, 2011 at 11:27:33AM +0100, Thomas Hellstrom wrote:
On 11/09/2011 09:22 PM, j.glisse@gmail.com wrote:
From: Jerome Glissejglisse@redhat.com
This is an overhaul of the ttm memory accounting. This tries to keep the same global behavior while removing the whole zone concept. It keeps a distrinction for dma32 so that we make sure that ttm don't starve the dma32 zone.
There is 3 threshold for memory allocation :
- max_mem is the maximum memory the whole ttm infrastructure is going to allow allocation for (exception of system process see below)
- emer_mem is the maximum memory allowed for system process, this limit is> to max_mem
- swap_limit is the threshold at which point ttm will start to try to swap object because ttm is getting close the max_mem limit
- swap_dma32_limit is the threshold at which point ttm will start swap object to try to reduce the pressure on the dma32 zone. Note that we don't specificly target object to swap to it might very well free more memory from highmem rather than from dma32
Accounting is done through used_mem& used_dma32_mem, which sum give the total amount of memory actually accounted by ttm.
Idea is that allocation will fail if (used_mem + used_dma32_mem)> max_mem and if swapping fail to make enough room.
The used_dma32_mem can be updated as a later stage, allowing to perform accounting test before allocating a whole batch of pages.
Jerome, you're removing a fair amount of functionality here, without justifying why it could be removed.
All this code was overkill.
[1] I don't agree, and since it's well tested, thought throught and working, I see no obvious reason to alter it, within the context of this patch series unless it's absolutely required for the functionality.
Consider a low-end system with 1G of kernel memory and 10G of highmem. How do we avoid putting stress on the kernel memory? I also wouldn't be too surprised if DMA32 zones appear in HIGHMEM systems in the future making the current zone concept good to keep.
Right now kernel memory is accounted against all zone so it decrease not only the kernel zone but also the dma32& highmem if present.
Do you mean that the code is incorrect? In that case, did you consider the fact that zones may overlap? (Although I admit the name "highmem" might be misleading. Should be "total").
Note also that kernel zone in current code == dma32 zone.
Last time I looked, the highmem split was typically at slightly less than 1GB, depending on the hardware and desired setup. I admit that was some time ago, but has that really changed? On all archs? Furthermore, on !Highmem systems, All pages are in the kernel zone.
When it comes to future it looks a lot simpler, it seems everyone is moving toward more capable and more advanced iommu that can remove all the restriction on memory from the device pov. I mean even arm are getting more and more advance iommu. I don't see any architecture worse supporting not going down that road.
While the proposed change is probably possible, with different low <-> high splits depending on whether HIGHMEM is defined or not, I think [1] is a good reason for not pushing it through.
Also, in effect you move the DOS from *all* zones into the DMA32 zone and create a race in that multiple simultaneous allocators can first pre-allocate out of the global zone, and then update the DMA32 zone without synchronization. In this way you might theoretically end up with more DMA32 pages allocated than present in the zone.
Ok a respin attached with a simple change, things will be accounted against dma32 zone and only when we get page we will decrease the dma32 zone usage that way no DOS on dma32.
It also deals with the case where there is still lot of highmem but no more dma32.
So why not just do a ttm_mem_global_alloc() for the pages you want to allocate, and add a proper adjustment function if memory turns out to be either HIGHMEM or !DMA32
With the proposed code there's also a theoretical problem in that a potentially huge number of pages are unaccounted before they are actually freed.
What you mean unaccounted ? The way it works is : r = global_memory_alloc(size) if (r) fail alloc pages update memory accounting according to what page where allocated
So memory is always accounted before even being allocated (exception are the kernel object for vmwgfx& ttm_bo but we can move accounting there too if you want, those are small allocation and i didn't think it was worse changing that).
No, I mean the sequence
unaccount_page_array() ---Race-- free_page_array()
/Thomas
Cheers, Jerome