On Sat, Jun 19, 2021 at 05:24:15PM +0100, Marc Zyngier wrote:
On Fri, 18 Jun 2021 20:57:34 +0100, Yury Norov <yury.norov@gmail.com> wrote:
The macros iterate through all set/clear bits in a bitmap. They search for the first bit using find_first_bit(), and for the rest of the bits using find_next_bit().
Since find_next_bit() is called shortly after find_first_bit(), we can save a few lines of I-cache by not using find_first_bit().
Really?
Signed-off-by: Yury Norov <yury.norov@gmail.com>
 include/linux/find.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/find.h b/include/linux/find.h
index 4500e8ab93e2..ae9ed52b52b8 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -280,7 +280,7 @@ unsigned long find_next_bit_le(const void *addr, unsigned
 #endif
#define for_each_set_bit(bit, addr, size) \
-	for ((bit) = find_first_bit((addr), (size));		\
+	for ((bit) = find_next_bit((addr), (size), 0);		\
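For reference, the hunk only shows the first line of the macro; assuming the rest of the in-tree definition is unchanged, the full macro after this patch would read roughly:

/* Sketch of for_each_set_bit() after the change; the loop-continuation
 * lines are unchanged from the in-tree definition. */
#define for_each_set_bit(bit, addr, size)			\
	for ((bit) = find_next_bit((addr), (size), 0);		\
	     (bit) < (size);					\
	     (bit) = find_next_bit((addr), (size), (bit) + 1))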
On which architecture do you observe a gain? Only 32-bit ARM and m68k implement their own version of find_first_bit(), and everyone else uses the canonical implementation:
And also those that enable GENERIC_FIND_FIRST_BIT: x86, arm64, arc, mips and s390.
#ifndef find_first_bit
#define find_first_bit(addr, size) find_next_bit((addr), (size), 0)
#endif
These architectures explicitly have different implementations for find_first_bit() and find_next_bit() because they can do better (whether that is true or not is another debate). I don't think you should remove this optimisation until it has been measured on these two architectures.
This patch is based on a series that enables a separate implementation of find_first_bit() for all architectures; according to my tests, find_first*() is roughly twice as fast as find_next*() on arm64 and x86.
https://lore.kernel.org/lkml/20210612123639.329047-1-yury.norov@gmail.com/T/...
After applying the series, I noticed that my small kernel module that calls for_each_set_bit() now uses find_first_bit() just to find the first bit, and find_next_bit() for all the others. I think it's better to always use find_next_bit() in this case to minimize the chance of an I-cache miss. But if it's not that obvious, I'll try to write some tests.
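For illustration, a minimal sketch of what such a test might look like as a kernel module (hypothetical: the module name, bitmap size and iteration count are arbitrary, and this is not the benchmark the numbers above came from):

/* find_bit_bench.c - hypothetical micro-benchmark sketch: times
 * find_first_bit() against find_next_bit(..., 0) on a sparse bitmap. */
#include <linux/module.h>
#include <linux/bitmap.h>
#include <linux/bitops.h>
#include <linux/ktime.h>
#include <linux/slab.h>

#define NBITS	(1UL << 20)	/* 1M bits, arbitrary */
#define ITERS	1000		/* arbitrary iteration count */

static int __init find_bit_bench_init(void)
{
	unsigned long *bitmap;
	unsigned long sink = 0;
	ktime_t t0, t1, t2;
	unsigned long i;

	bitmap = bitmap_zalloc(NBITS, GFP_KERNEL);
	if (!bitmap)
		return -ENOMEM;

	/* Make the bitmap sparse so the search does some real work. */
	for (i = 0; i + 500 < NBITS; i += 512)
		__set_bit(i + 500, bitmap);

	t0 = ktime_get();
	for (i = 0; i < ITERS; i++)
		sink += find_first_bit(bitmap, NBITS);
	t1 = ktime_get();
	for (i = 0; i < ITERS; i++)
		sink += find_next_bit(bitmap, NBITS, 0);
	t2 = ktime_get();

	/* 'sink' is printed so the loops are not optimized away. */
	pr_info("find_first_bit: %lld ns, find_next_bit: %lld ns (sink=%lu)\n",
		ktime_to_ns(ktime_sub(t1, t0)),
		ktime_to_ns(ktime_sub(t2, t1)), sink);

	bitmap_free(bitmap);
	return 0;
}

static void __exit find_bit_bench_exit(void)
{
}

module_init(find_bit_bench_init);
module_exit(find_bit_bench_exit);
MODULE_LICENSE("GPL");

Timing only the first-bit search keeps the comparison focused on the find_first_bit() vs find_next_bit(..., 0) entry cost, which is the only thing this patch changes.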