From: Rasmus Villemoes [mailto:linux@rasmusvillemoes.dk]
Nice work! A few random comments/questions:
- It does add some complexity, but I think a few comments would make it more digestable.
I'm open to adding some comments ... I need some time between writing the code and writing the comments to be sure what comments are useful.
- Hm, maybe I'm confused, and I certainly don't understand how the whole radix tree works. But do you use every leaf node as an exceptional entry initially, to allocate the first 62 ids from that level? This code
I do! And that question tells me one useful comment to add!
if ((bit + RADIX_TREE_EXCEPTIONAL_SHIFT) < BITS_PER_LONG) { bit += RADIX_TREE_EXCEPTIONAL_SHIFT; radix_tree_iter_replace(root, &iter, slot, (void *)((1UL << bit) | RADIX_TREE_EXCEPTIONAL_ENTRY)); *id = new; return 0; }
operates on bit, which is the offset from index*IDA_BITMAP_BITS, and it seems to create an exceptional entry somewhere down the tree (which may of course be the root).
But we don't seem to allocate another bit from that exceptional entry ever unless it happened to sit at index 0; the code
unsigned long tmp = (unsigned long)bitmap; if (start < BITS_PER_LONG) { unsigned tbit = find_next_zero_bit(&tmp, BITS_PER_LONG, start); if (tbit < BITS_PER_LONG) { tmp |= 1UL << tbit; rcu_assign_pointer(*slot, (void *)tmp); *id = new + tbit - RADIX_TREE_EXCEPTIONAL_SHIFT; return 0; } }
is only used for small values of start (though it does seem to account for a non-zero value of new == iter.index * IDA_BITMAP_BITS).
Ahh. You're reading the code wrong, and I wrote the code wrongly too, so it's clearly overly complex. I was testing with 'start' set to 0, allocating N entries, and then freeing them. If I'd set start to 1024 and allocated two entries, I'd've seen the failure.
Please see the top commit here ("Improve IDA exceptional entry handling"): http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/idr-2...
In the micro-optimization department, I think we should avoid find_next_zero_bit on a single word, and instead use __ffs. Something like (assuming we can use bit instead of start)
if (bit + RADIX_TREE_EXCEPTIONAL_SHIFT < BITS_PER_LONG) { tmp = (~(unsigned long)bitmap) >> (bit + RADIX_TREE_EXCEPTIONAL_SHIFT); if (tmp) { tbit = __ffs(tmp) + bit + RADIX_TREE_EXCEPTIONAL_SHIFT; tmp = (unsigned long)bitmap | (1UL << tbit); rcu_assign_pointer(*slot, (void *)tmp); *id = new + tbit - RADIX_TREE_EXCEPTIONAL_SHIFT; return 0; } }
I'm certainly open to microoptimisations, but I'll have to think about this one for a bit.