From: Rasmus Villemoes [mailto:linux@rasmusvillemoes.dk]
On Fri, Dec 16 2016, Matthew Wilcox mawilcox@microsoft.com wrote:
Thanks for your work on this; you've really put some effort into proving your work has value. My motivation was purely aesthetic, but you've got some genuine savings here (admittedly it's about a quarter of a cent's worth of memory with DRAM selling for $10/GB). Nevertheless, that adds up over a billion devices, and there are still people trying to fit Linux into 4MB embedded devices.
Yeah, my main motivation was embedded devices which don't have the luxury of measuring their RAM in GB. E.g., it's crazy that the watchdog_ida effectively use more memory than the .text of the watchdog subsystem, and similarly for the kthread workers, etc., etc.. I didn't mean for my patches to go in as is, more to provoke some discussion. I wasn't aware of your reimplementation, but it seems that may make the problem go away.
It certainly shrinks the problem down to a size where it may not be worth introducing another implementation.
On a 64-bit machine, your tIDA root is 24 bytes; my new IDA root is 16 bytes. If you allocate only one entry, you'll allocate 8 bytes. Thanks to the slab allocator, that gets rounded up to 32 bytes. I allocate the full 128 byte leaf, but I store the pointer to it in the root (unlike the IDR, the radix tree doesn't need to allocate a layer for a single entry). So tIDA wins on memory consumption between 1 and 511 IDs, and newIDA is slightly ahead between 512 and 1023 IDs.
This sounds good. I think there may still be a lot of users that never allocate more than a handful of IDAs, making a 128 byte allocation still somewhat excessive. One thing I considered was (exactly as it's done for file descriptor tables) to embed a single word in the struct ida and use that initially; I haven't looked closely at newIDA, so I don't know how easy that would be or if its worth the complexity.
Heh, I was thinking about that too. The radix tree supports "exceptional entries" which have the bottom bit set. On a 64-bit machine, we could use 62 of the bits in the radix tree root to store the ID bitmap. I'm a little wary of the potential complexity, but we should try it out.
Did you come up with any fun tests that could be added to the test-suite? It feels a little slender right now.