On Wed, Dec 19, 2012 at 09:22:26AM +0000, Chris Wilson wrote:
On Wed, 19 Dec 2012 11:56:18 +1000, Dave Airlie <airlied@gmail.com> wrote:
From: Dave Airlie <airlied@redhat.com>
So we have two offset manager implementations for dealing with VMA offsets. GEM had one using a hash table, TTM had one using an rbtree.
I'd rather we just had one to rule them all. Since ttm is using the rbtree variant to allow sub mappings, we need to use that one to keep ABI.
This also adds a bunch of inline helpers to avoid gem/ttm poking around inside the vma_offset objects. TTM needs a reset helper to remove a vma_offset when it copies an object to another one for buffer moves. The helpers also let drivers avoid the map_list.hash_key << PAGE_SHIFT nonsense.
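For illustration, a minimal sketch of the kind of inline helper meant here, assuming the offset node carries a drm_mm node called vm_node as in the patch quoted further down; the helper name and field layout are assumptions, not the proposed API:

	/* Sketch only: hides the page-shift arithmetic so that gem/ttm
	 * code no longer open-codes map_list.hash_key << PAGE_SHIFT. */
	static inline __u64 drm_vma_node_offset_addr(struct drm_vma_offset_node *node)
	{
		return ((__u64)node->vm_node->start) << PAGE_SHIFT;
	}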
Any clue as to the performance difference between the two implementations? What does it add to the cost of a pagefault?
Hmm, don't have an i-g-t handy for scalability testing of the fault handlers.
+int drm_vma_offset_setup(struct drm_vma_offset_manager *man,
+			 struct drm_vma_offset_node *node,
+			 unsigned long num_pages)
+{
+	int ret;
+
+retry_pre_get:
+	ret = drm_mm_pre_get(&man->addr_space_mm);
+	if (unlikely(ret != 0))
+		return ret;
+
+	write_lock(&man->vm_lock);
+	node->vm_node = drm_mm_search_free(&man->addr_space_mm,
+					   num_pages, 0, 0);
+	if (unlikely(node->vm_node == NULL)) {
+		ret = -ENOMEM;
ret = -ENOSPC;
Depended upon by the higher layers for determining when to purge their caches; see i-g-t/gem_mmap_offset_exhaustion.
The larger topic is that drm_mm is only 32-bit on 32-bit systems, and we routinely exhaust that after a few weeks of uptime. Or better: we _did_ exhaust it, until we added tons of checks in both the kernel and libdrm to reap cached objects when allocating an offset fails. Hence it's paramount that our code gets a -ENOSPC so it can engage in mmap offset reaping.
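An illustrative sketch of that reap-on-ENOSPC pattern as a caller would use it, with hypothetical driver helper and field names; only drm_vma_offset_setup() is taken from the patch quoted here:

	ret = drm_vma_offset_setup(&dev->vma_manager, &obj->vma_node, num_pages);
	if (ret == -ENOSPC) {
		/* hypothetical: drop cached/inactive objects to free
		 * their mmap offsets, then retry once */
		driver_purge_cached_objects(dev);
		ret = drm_vma_offset_setup(&dev->vma_manager, &obj->vma_node,
					   num_pages);
	}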
+		goto out_unlock;
+	}
+
+	node->vm_node = drm_mm_get_block_atomic(node->vm_node,
+						num_pages, 0);
I'd feel happier if this tried to take only from the preallocated pool rather than actually trying a GFP_ATOMIC allocation.
Seconded. Imo drm_mm_pre_get should just die - it artificially limits prealloc to 4 nodes, which means you'll fall over if 5 threads enter this. And with the drm_mm rework of about a year ago it's no longer required: you can simply embed the struct drm_mm_node wherever you want, and mm won't allocate anything any more on top of that. Or preallocate the drm_mm_node in the code, outside of the locks. Inserting the node then happens with the drm_mm_insert* functions, which combine search_free + get_block (since really, there's nothing interesting you can do with the hole space you get from search_free). See e.g.
http://cgit.freedesktop.org/~danvet/drm-intel/commit/?id=dc9dd7a20fde95aa81a...
for the conversion. My earlier efforts to convert everyone have stalled a bit in the ttm code, but now that i915 is converted and (hopefully) the mmap offset stuff too, I'll pick this up again. That would allow us to kill quite a bit of cruft from the drm_mm interfaces.
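A rough sketch of how the setup path could look with an embedded node and drm_mm_insert_node(), assuming the four-argument (size, alignment) signature; this is not the actual conversion, just the shape of it:

	/* Sketch only: vm_node embedded instead of pointed to, so no
	 * allocation (and no pre_get/atomic dance) under the lock. */
	struct drm_vma_offset_node {
		struct drm_mm_node vm_node;
		/* ... */
	};

	int drm_vma_offset_setup(struct drm_vma_offset_manager *man,
				 struct drm_vma_offset_node *node,
				 unsigned long num_pages)
	{
		int ret;

		write_lock(&man->vm_lock);
		/* search_free + get_block combined into one call */
		ret = drm_mm_insert_node(&man->addr_space_mm, &node->vm_node,
					 num_pages, 0);
		write_unlock(&man->vm_lock);

		/* -ENOSPC propagates so callers can reap mmap offsets */
		return ret;
	}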
Cheers, Daniel