Message ID | 20240620175703.605111-1-yury.norov@gmail.com |
---|---|
Headers | show |
Series | lib/find: add atomic find_bit() primitives | expand |
On Thu, 20 Jun 2024 at 10:57, Yury Norov <yury.norov@gmail.com> wrote: > > > The typical lock-protected bit allocation may look like this: If it looks like this, then nobody cares. Clearly the user in question never actually cared about performance, and you SHOULD NOT then say "let's optimize this that nobody cares about":. Yury, I spend an inordinate amount of time just double-checking your patches. I ended up having to basically undo one of them just days ago. New rule: before you send some optimization, you need to have NUMBERS. Some kind of "look, this code is visible in profiles, so we actually care". Because without numbers, I'm just not going to pull anything from you. These insane inlines for things that don't matter need to stop. And if they *DO* matter, you need to show that they matter. Linus
On Thu, Jun 20, 2024 at 11:00:38AM -0700, Linus Torvalds wrote: > On Thu, 20 Jun 2024 at 10:57, Yury Norov <yury.norov@gmail.com> wrote: > > > > > > The typical lock-protected bit allocation may look like this: > > If it looks like this, then nobody cares. Clearly the user in question > never actually cared about performance, and you SHOULD NOT then say > "let's optimize this that nobody cares about":. > > Yury, I spend an inordinate amount of time just double-checking your > patches. I ended up having to basically undo one of them just days > ago. Is that in master already? I didn't get any email, and I can't find anything related in the master branch. > New rule: before you send some optimization, you need to have NUMBERS. I tried to underline that it's not a performance optimization at my best. People notice some performance differences, but it's ~3%, no more. > Some kind of "look, this code is visible in profiles, so we actually care". The original motivation comes from a KCSAN report, so it's already visible in profiles. See [1] in cover letter. This series doesn't fix that particular issue, but it adds tooling that allow people to search and acquire bits in bitmaps without firing KCSAN warnings. This series fixes one real bug in the codebase - see #33, and simplifies bitmaps usage in many other places. Many people like it, and acked the patches. Again, this is NOT a performance series. Thanks, Yury > Because without numbers, I'm just not going to pull anything from you. > These insane inlines for things that don't matter need to stop. > > And if they *DO* matter, you need to show that they matter. > > Linus
On Thu, 20 Jun 2024 at 11:32, Yury Norov <yury.norov@gmail.com> wrote: > > Is that in master already? I didn't get any email, and I can't find > anything related in the master branch. It's 5d272dd1b343 ("cpumask: limit FORCE_NR_CPUS to just the UP case"). > > New rule: before you send some optimization, you need to have NUMBERS. > > I tried to underline that it's not a performance optimization at my > best. If it's not about performance, then it damn well shouldn't be 90% inline functions in a header file. If it's a helper function, it needs to be a real function elsewhere. Not this: include/linux/find_atomic.h | 324 +++++++++++++++++++ because either performance really matters, in which case you need to show profiles, or performance doesn't matter, in which case it damn well shouldn't have special cases for small bitsets that double the size of the code. Linus
On Thu, Jun 20, 2024 at 12:26:18PM -0700, Linus Torvalds wrote: > On Thu, 20 Jun 2024 at 11:32, Yury Norov <yury.norov@gmail.com> wrote: > > > > Is that in master already? I didn't get any email, and I can't find > > anything related in the master branch. > > It's 5d272dd1b343 ("cpumask: limit FORCE_NR_CPUS to just the UP case"). FORCE_NR_CPUS helped to generate a better code for me back then. I'll check again against the current kernel. The 5d272dd1b343 is wrong. Limiting FORCE_NR_CPUS to UP case makes no sense because in UP case nr_cpu_ids is already a compile-time macro: #if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS) #define nr_cpu_ids ((unsigned int)NR_CPUS) #else extern unsigned int nr_cpu_ids; #endif I use FORCE_NR_CPUS for my Rpi. (used, until I burnt it) > > > New rule: before you send some optimization, you need to have NUMBERS. > > > > I tried to underline that it's not a performance optimization at my > > best. > > If it's not about performance, then it damn well shouldn't be 90% > inline functions in a header file. > > If it's a helper function, it needs to be a real function elsewhere. Not this: > > include/linux/find_atomic.h | 324 +++++++++++++++++++ > > because either performance really matters, in which case you need to > show profiles, or performance doesn't matter, in which case it damn > well shouldn't have special cases for small bitsets that double the > size of the code. This small_const_nbits() thing is a compile-time optimization for a single-word bitmap with a compile-time length. If the bitmap is longer, or nbits is not known at compile time, the inline part goes away entirely at compile time. In the other case, outline part goes away. So those converting from find_bit() + test_and_set_bit() will see no new outline function calls. This inline + outline implementation is traditional for bitmaps, and for some people it's important. For example, Sean Christopherson explicitly asked to add a notice that converting to the new API will still generate inline code. See patch #13.
On Thu, 20 Jun 2024 at 13:20, Yury Norov <yury.norov@gmail.com> wrote: > > FORCE_NR_CPUS helped to generate a better code for me back then. I'll > check again against the current kernel. Of _course_ it generates better code. But when "better code" is a source of bugs, and isn't actually useful in general, it's not better, is it. > The 5d272dd1b343 is wrong. Limiting FORCE_NR_CPUS to UP case makes no > sense because in UP case nr_cpu_ids is already a compile-time macro: Yury, I'm very aware. That was obviously intentional. the whole point of the commit is to just disable the the whole thing as useless and problematic. I could have just ripped it out entirely. I ended up doing a one-liner instead. Linus