On 4/28/20 6:18 PM, Chris Wilson wrote:
Quoting Nirmoy Das (2020-04-28 17:04:23)
Userspace can severely fragment rb_hole_addr rbtree by manipulating alignment while allocating buffers. Fragmented rb_hole_addr rbtree would result in large delays while allocating buffer object for a userspace application. It takes long time to find suitable hole because if we fail to find a suitable hole in the first attempt then we look for neighbouring nodes using rb_prev(). Traversing rbtree using rb_prev() can take really long time if the tree is fragmented.
This patch improves searches in fragmented rb_hole_addr rbtree by storing an extra field in drm_mm_node, max_hole_size. Each drm_mm_node now stores maximum hole size for its subtree in drm_mm_node->max_hole_size. Using drm_mm_node->max_hole_size, it is possible to eliminate complete subtree if that subtree is unable to serve a request hence reducing number of rb_prev() used.
Note: Implementation details of rb_hole_addr rbtree updates after a node removal and addition is borrowed from block/bfq-wf2q.c which is trying to achieve a similar goal.
Feels like it is an augmented rbtree?
I just read about augmented rbtree quickly and yes it feels like so. I will try to understand more
about it and will see if I can convert this logic to a augmented rbtree.
+static struct drm_mm_node * +next_hole_high_addr(struct drm_mm_node *entry, u64 size) +{
struct rb_node *rb_node, *left_rb_node, *parent_rb_node;
struct drm_mm_node *left_node;
if (!entry)
return false;
rb_node = &entry->rb_hole_addr;
if (rb_node->rb_left) {
left_rb_node = rb_node->rb_left;
parent_rb_node = rb_parent(rb_node);
left_node = rb_entry(left_rb_node,
struct drm_mm_node, rb_hole_addr);
if ((left_node->max_hole_size < size ||
entry->size == entry->max_hole_size) &&
parent_rb_node && parent_rb_node->rb_left != rb_node)
return rb_hole_addr_to_node(parent_rb_node);
}
return rb_hole_addr_to_node(rb_prev(rb_node));
+}
The max_hole_size would equally apply to next_hole_low_addr(), right?
Sorry, yes we need that too. I will add next_hole_low_addr().
Sadly, I did not explore the fragmented search LOW/HIGH in test-drm_mm.c. That would be a useful addition as well.
Yes that would be useful too.
I used ./tools/testing/selftests/drivers/gpu/drm_mm.sh to verify this patch.
-Chris
Thanks for reviewing,
Nirmoy