diff mbox series

compiler: enable CONFIG_OPTIMIZE_INLINING forcibly

Message ID 20190830034304.24259-1-yamada.masahiro@socionext.com
State Accepted
Commit ac7c3e4ff401b304489a031938dbeaab585bfe0a
Headers show
Series compiler: enable CONFIG_OPTIMIZE_INLINING forcibly | expand

Commit Message

Masahiro Yamada Aug. 30, 2019, 3:43 a.m. UTC
Commit 9012d011660e ("compiler: allow all arches to enable
CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable
this option. A couple of build errors were reported by randconfig,
but all of them have been ironed out.

Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely
(and it will simplify the 'inline' macro in compiler_types.h),
this commit changes it to always-on option. Going forward, the
compiler will always be allowed to not inline functions marked
'inline'.

This is not a problem for x86 since it has been long used by
arch/x86/configs/{x86_64,i386}_defconfig.

I am keeping the config option just in case any problem crops up for
other architectures.

The code clean-up will be done after confirming this is solid.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>

---

 lib/Kconfig.debug | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

-- 
2.17.1

Comments

Nick Desaulniers Aug. 30, 2019, 8:54 p.m. UTC | #1
On Thu, Aug 29, 2019 at 8:43 PM Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:
>

> Commit 9012d011660e ("compiler: allow all arches to enable

> CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable

> this option. A couple of build errors were reported by randconfig,

> but all of them have been ironed out.

>

> Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely

> (and it will simplify the 'inline' macro in compiler_types.h),

> this commit changes it to always-on option. Going forward, the

> compiler will always be allowed to not inline functions marked

> 'inline'.

>

> This is not a problem for x86 since it has been long used by

> arch/x86/configs/{x86_64,i386}_defconfig.

>

> I am keeping the config option just in case any problem crops up for

> other architectures.

>

> The code clean-up will be done after confirming this is solid.

>

> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>


Just saw akpm picked this up, but
Acked-by: Nick Desaulniers <ndesaulniers@google.com>


> ---

>

>  lib/Kconfig.debug | 4 +---

>  1 file changed, 1 insertion(+), 3 deletions(-)

>

> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug

> index 5960e2980a8a..e25493811df8 100644

> --- a/lib/Kconfig.debug

> +++ b/lib/Kconfig.debug

> @@ -327,7 +327,7 @@ config HEADERS_CHECK

>           relevant for userspace, say 'Y'.

>

>  config OPTIMIZE_INLINING

> -       bool "Allow compiler to uninline functions marked 'inline'"

> +       def_bool y

>         help

>           This option determines if the kernel forces gcc to inline the functions

>           developers have marked 'inline'. Doing so takes away freedom from gcc to

> @@ -338,8 +338,6 @@ config OPTIMIZE_INLINING

>           decision will become the default in the future. Until then this option

>           is there to test gcc for this.

>

> -         If unsure, say N.

> -

>  config DEBUG_SECTION_MISMATCH

>         bool "Enable full Section mismatch analysis"

>         help

> --

> 2.17.1

>



-- 
Thanks,
~Nick Desaulniers
Geert Uytterhoeven Sept. 26, 2019, 8:54 a.m. UTC | #2
Hi Yamada-san,

On Fri, Aug 30, 2019 at 5:44 AM Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:
> Commit 9012d011660e ("compiler: allow all arches to enable

> CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable

> this option. A couple of build errors were reported by randconfig,

> but all of them have been ironed out.

>

> Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely

> (and it will simplify the 'inline' macro in compiler_types.h),

> this commit changes it to always-on option. Going forward, the

> compiler will always be allowed to not inline functions marked

> 'inline'.

>

> This is not a problem for x86 since it has been long used by

> arch/x86/configs/{x86_64,i386}_defconfig.

>

> I am keeping the config option just in case any problem crops up for

> other architectures.

>

> The code clean-up will be done after confirming this is solid.

>

> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>


This breaks compiling drivers/video/fbdev/c2p*, as the functions in
drivers/video/fbdev/c2p_core.h are no longer inlined, leading to calls
to the non-existent function c2p_unsupported(), as reported by KISSKB:

On Thu, Sep 26, 2019 at 5:02 AM <noreply@ellerman.id.au> wrote:
> FAILED linux-next/m68k-defconfig/m68k Thu Sep 26, 12:58

>

> http://kisskb.ellerman.id.au/kisskb/buildresult/13973194/

>

> Commit:   Add linux-next specific files for 20190925

>           d47175169c28eedd2cc2ab8c01f38764cb0269cc

> Compiler: m68k-linux-gcc (GCC) 4.6.3 / GNU ld (GNU Binutils) 2.22

>

> Possible errors

> ---------------

>

> c2p_planar.c:(.text+0xd6): undefined reference to `c2p_unsupported'

> c2p_planar.c:(.text+0x1dc): undefined reference to `c2p_unsupported'

> c2p_iplan2.c:(.text+0xc4): undefined reference to `c2p_unsupported'

> c2p_iplan2.c:(.text+0x150): undefined reference to `c2p_unsupported'

> make[1]: *** [Makefile:1074: vmlinux] Error 1

> make: *** [Makefile:179: sub-make] Error 2


I managed to reproduce this with gcc version 8.3.0 (Ubuntu
8.3.0-6ubuntu1~18.04.1) , and bisected the failure to commit
025f072e5823947c ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly") .

Marking the functions __always_inline instead of inline fixes that.
Shall I send a patch to do that?

Thanks!

> --- a/lib/Kconfig.debug

> +++ b/lib/Kconfig.debug

> @@ -327,7 +327,7 @@ config HEADERS_CHECK

>           relevant for userspace, say 'Y'.

>

>  config OPTIMIZE_INLINING

> -       bool "Allow compiler to uninline functions marked 'inline'"

> +       def_bool y

>         help

>           This option determines if the kernel forces gcc to inline the functions

>           developers have marked 'inline'. Doing so takes away freedom from gcc to

> @@ -338,8 +338,6 @@ config OPTIMIZE_INLINING

>           decision will become the default in the future. Until then this option

>           is there to test gcc for this.

>

> -         If unsure, say N.

> -

>  config DEBUG_SECTION_MISMATCH

>         bool "Enable full Section mismatch analysis"

>         help


Gr{oetje,eeting}s,

                        Geert


--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Masahiro Yamada Sept. 26, 2019, 9:02 a.m. UTC | #3
Hi Geert,

On Thu, Sep 26, 2019 at 5:54 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>

> Hi Yamada-san,

>

> On Fri, Aug 30, 2019 at 5:44 AM Masahiro Yamada

> <yamada.masahiro@socionext.com> wrote:

> > Commit 9012d011660e ("compiler: allow all arches to enable

> > CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable

> > this option. A couple of build errors were reported by randconfig,

> > but all of them have been ironed out.

> >

> > Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely

> > (and it will simplify the 'inline' macro in compiler_types.h),

> > this commit changes it to always-on option. Going forward, the

> > compiler will always be allowed to not inline functions marked

> > 'inline'.

> >

> > This is not a problem for x86 since it has been long used by

> > arch/x86/configs/{x86_64,i386}_defconfig.

> >

> > I am keeping the config option just in case any problem crops up for

> > other architectures.

> >

> > The code clean-up will be done after confirming this is solid.

> >

> > Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>

>

> This breaks compiling drivers/video/fbdev/c2p*, as the functions in

> drivers/video/fbdev/c2p_core.h are no longer inlined, leading to calls

> to the non-existent function c2p_unsupported(), as reported by KISSKB:

>

> On Thu, Sep 26, 2019 at 5:02 AM <noreply@ellerman.id.au> wrote:

> > FAILED linux-next/m68k-defconfig/m68k Thu Sep 26, 12:58

> >

> > http://kisskb.ellerman.id.au/kisskb/buildresult/13973194/

> >

> > Commit:   Add linux-next specific files for 20190925

> >           d47175169c28eedd2cc2ab8c01f38764cb0269cc

> > Compiler: m68k-linux-gcc (GCC) 4.6.3 / GNU ld (GNU Binutils) 2.22

> >

> > Possible errors

> > ---------------

> >

> > c2p_planar.c:(.text+0xd6): undefined reference to `c2p_unsupported'

> > c2p_planar.c:(.text+0x1dc): undefined reference to `c2p_unsupported'

> > c2p_iplan2.c:(.text+0xc4): undefined reference to `c2p_unsupported'

> > c2p_iplan2.c:(.text+0x150): undefined reference to `c2p_unsupported'

> > make[1]: *** [Makefile:1074: vmlinux] Error 1

> > make: *** [Makefile:179: sub-make] Error 2

>

> I managed to reproduce this with gcc version 8.3.0 (Ubuntu

> 8.3.0-6ubuntu1~18.04.1) , and bisected the failure to commit

> 025f072e5823947c ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly") .

>

> Marking the functions __always_inline instead of inline fixes that.

> Shall I send a patch to do that?



Yes, please.

But you do not need to touch _transp() or comp().
Only functions that call c2p_unsupported().


BTW, c2p_unsupported() can be replaced with BUILD_BUG().
This will break the build earlier in case
it cannot be optimized out.


-- 
Best Regards
Masahiro Yamada
Geert Uytterhoeven Sept. 26, 2019, 9:26 a.m. UTC | #4
Hi Yamada-san,

On Thu, Sep 26, 2019 at 11:03 AM Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:
> On Thu, Sep 26, 2019 at 5:54 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:

> > On Fri, Aug 30, 2019 at 5:44 AM Masahiro Yamada

> > <yamada.masahiro@socionext.com> wrote:

> > > Commit 9012d011660e ("compiler: allow all arches to enable

> > > CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable

> > > this option. A couple of build errors were reported by randconfig,

> > > but all of them have been ironed out.

> > >

> > > Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely

> > > (and it will simplify the 'inline' macro in compiler_types.h),

> > > this commit changes it to always-on option. Going forward, the

> > > compiler will always be allowed to not inline functions marked

> > > 'inline'.

> > >

> > > This is not a problem for x86 since it has been long used by

> > > arch/x86/configs/{x86_64,i386}_defconfig.

> > >

> > > I am keeping the config option just in case any problem crops up for

> > > other architectures.

> > >

> > > The code clean-up will be done after confirming this is solid.

> > >

> > > Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>

> >

> > This breaks compiling drivers/video/fbdev/c2p*, as the functions in

> > drivers/video/fbdev/c2p_core.h are no longer inlined, leading to calls

> > to the non-existent function c2p_unsupported(), as reported by KISSKB:

> >

> > On Thu, Sep 26, 2019 at 5:02 AM <noreply@ellerman.id.au> wrote:

> > > FAILED linux-next/m68k-defconfig/m68k Thu Sep 26, 12:58

> > >

> > > http://kisskb.ellerman.id.au/kisskb/buildresult/13973194/

> > >

> > > Commit:   Add linux-next specific files for 20190925

> > >           d47175169c28eedd2cc2ab8c01f38764cb0269cc

> > > Compiler: m68k-linux-gcc (GCC) 4.6.3 / GNU ld (GNU Binutils) 2.22

> > >

> > > Possible errors

> > > ---------------

> > >

> > > c2p_planar.c:(.text+0xd6): undefined reference to `c2p_unsupported'

> > > c2p_planar.c:(.text+0x1dc): undefined reference to `c2p_unsupported'

> > > c2p_iplan2.c:(.text+0xc4): undefined reference to `c2p_unsupported'

> > > c2p_iplan2.c:(.text+0x150): undefined reference to `c2p_unsupported'

> > > make[1]: *** [Makefile:1074: vmlinux] Error 1

> > > make: *** [Makefile:179: sub-make] Error 2

> >

> > I managed to reproduce this with gcc version 8.3.0 (Ubuntu

> > 8.3.0-6ubuntu1~18.04.1) , and bisected the failure to commit

> > 025f072e5823947c ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly") .

> >

> > Marking the functions __always_inline instead of inline fixes that.

> > Shall I send a patch to do that?

>

>

> Yes, please.


OK, will do.

> But you do not need to touch _transp() or comp().

> Only functions that call c2p_unsupported().


However, the inlining of these functions is very performance-sensitive too.
So perhaps I should mark all of them __always_inline?

> BTW, c2p_unsupported() can be replaced with BUILD_BUG().

> This will break the build earlier in case

> it cannot be optimized out.


Right. Will do, too.

Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Masahiro Yamada Sept. 26, 2019, 9:46 a.m. UTC | #5
Hi Geert,

On Thu, Sep 26, 2019 at 6:26 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>

> Hi Yamada-san,

>

> On Thu, Sep 26, 2019 at 11:03 AM Masahiro Yamada

> <yamada.masahiro@socionext.com> wrote:

> > On Thu, Sep 26, 2019 at 5:54 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:

> > > On Fri, Aug 30, 2019 at 5:44 AM Masahiro Yamada

> > > <yamada.masahiro@socionext.com> wrote:

> > > > Commit 9012d011660e ("compiler: allow all arches to enable

> > > > CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable

> > > > this option. A couple of build errors were reported by randconfig,

> > > > but all of them have been ironed out.

> > > >

> > > > Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely

> > > > (and it will simplify the 'inline' macro in compiler_types.h),

> > > > this commit changes it to always-on option. Going forward, the

> > > > compiler will always be allowed to not inline functions marked

> > > > 'inline'.

> > > >

> > > > This is not a problem for x86 since it has been long used by

> > > > arch/x86/configs/{x86_64,i386}_defconfig.

> > > >

> > > > I am keeping the config option just in case any problem crops up for

> > > > other architectures.

> > > >

> > > > The code clean-up will be done after confirming this is solid.

> > > >

> > > > Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>

> > >

> > > This breaks compiling drivers/video/fbdev/c2p*, as the functions in

> > > drivers/video/fbdev/c2p_core.h are no longer inlined, leading to calls

> > > to the non-existent function c2p_unsupported(), as reported by KISSKB:

> > >

> > > On Thu, Sep 26, 2019 at 5:02 AM <noreply@ellerman.id.au> wrote:

> > > > FAILED linux-next/m68k-defconfig/m68k Thu Sep 26, 12:58

> > > >

> > > > http://kisskb.ellerman.id.au/kisskb/buildresult/13973194/

> > > >

> > > > Commit:   Add linux-next specific files for 20190925

> > > >           d47175169c28eedd2cc2ab8c01f38764cb0269cc

> > > > Compiler: m68k-linux-gcc (GCC) 4.6.3 / GNU ld (GNU Binutils) 2.22

> > > >

> > > > Possible errors

> > > > ---------------

> > > >

> > > > c2p_planar.c:(.text+0xd6): undefined reference to `c2p_unsupported'

> > > > c2p_planar.c:(.text+0x1dc): undefined reference to `c2p_unsupported'

> > > > c2p_iplan2.c:(.text+0xc4): undefined reference to `c2p_unsupported'

> > > > c2p_iplan2.c:(.text+0x150): undefined reference to `c2p_unsupported'

> > > > make[1]: *** [Makefile:1074: vmlinux] Error 1

> > > > make: *** [Makefile:179: sub-make] Error 2

> > >

> > > I managed to reproduce this with gcc version 8.3.0 (Ubuntu

> > > 8.3.0-6ubuntu1~18.04.1) , and bisected the failure to commit

> > > 025f072e5823947c ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly") .

> > >

> > > Marking the functions __always_inline instead of inline fixes that.

> > > Shall I send a patch to do that?

> >

> >

> > Yes, please.

>

> OK, will do.

>

> > But you do not need to touch _transp() or comp().

> > Only functions that call c2p_unsupported().

>

> However, the inlining of these functions is very performance-sensitive too.

> So perhaps I should mark all of them __always_inline?



I want to leave as much compiler's freedom as possible,
and the optimization criteria depends on CONFIG option.

When CONFIG_CC_OPTIMIZE_FOR_SIZE is chosen,
the optimization should respect the code size over performance,
and the compiler may prefer not inlining it.


So, the basic idea is, use __always_inline
only when otherwise the builds would be broken.


Masahiro Yamada




> > BTW, c2p_unsupported() can be replaced with BUILD_BUG().

> > This will break the build earlier in case

> > it cannot be optimized out.

>

> Right. Will do, too.

>

> Thanks!

>

> Gr{oetje,eeting}s,

>

>                         Geert

>

> --

> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

>

> In personal conversations with technical people, I call myself a hacker. But

> when I'm talking to journalists I just say "programmer" or something like that.

>                                 -- Linus Torvalds




-- 
Best Regards
Masahiro Yamada
Nicolas Saenz Julienne Sept. 27, 2019, 10:43 a.m. UTC | #6
On Fri, 2019-08-30 at 12:43 +0900, Masahiro Yamada wrote:
> Commit 9012d011660e ("compiler: allow all arches to enable

> CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable

> this option. A couple of build errors were reported by randconfig,

> but all of them have been ironed out.

> 

> Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely

> (and it will simplify the 'inline' macro in compiler_types.h),

> this commit changes it to always-on option. Going forward, the

> compiler will always be allowed to not inline functions marked

> 'inline'.

> 

> This is not a problem for x86 since it has been long used by

> arch/x86/configs/{x86_64,i386}_defconfig.

> 

> I am keeping the config option just in case any problem crops up for

> other architectures.

> 

> The code clean-up will be done after confirming this is solid.

> 

> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>


[ Adding some ARM people as they might be able to help ]

This was found to cause a regression on a Raspberry Pi 2 B built with
bcm2835_defconfig which among other things has no SMP support.

The relevant logs (edited to remove the noise) are:

[    5.827333] Run /init as init process
Loading, please wait...
Failed to set SO_PASSCRED: Bad address
Failed to bind netlink socket: Bad address
Failed to create manager: Bad address
Failed to set SO_PASSCRED: Bad address
[    9.021623] systemd[1]: SO_PASSCRED failed: Bad address
[!!!!!!] Failed to start up manager.
[    9.079148] systemd[1]: Freezing execution.

I looked into it, it turns out that the call to get_user() in sock_setsockopt()
is returning -EFAULT. Down the assembly rabbit hole that get_user() is I
found-out that it's the macro 'check_uaccess' who's triggering the error.

I'm clueless at this point, so I hope you can give me some hints on what's
going bad here.

Regards,
Nicolas
Nicolas Saenz Julienne Sept. 27, 2019, 10:58 a.m. UTC | #7
On Fri, 2019-08-30 at 12:43 +0900, Masahiro Yamada wrote:
> Commit 9012d011660e ("compiler: allow all arches to enable

> CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable

> this option. A couple of build errors were reported by randconfig,

> but all of them have been ironed out.

> 

> Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely

> (and it will simplify the 'inline' macro in compiler_types.h),

> this commit changes it to always-on option. Going forward, the

> compiler will always be allowed to not inline functions marked

> 'inline'.

> 

> This is not a problem for x86 since it has been long used by

> arch/x86/configs/{x86_64,i386}_defconfig.

> 

> I am keeping the config option just in case any problem crops up for

> other architectures.

> 

> The code clean-up will be done after confirming this is solid.

> 

> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>


[ Resending as the mail delivery system failed to resolve some the hosts,
namely Masahiro's ]

[ Adding some ARM people as they might be able to help ]

This was found to cause a regression on a Raspberry Pi 2 B built with
bcm2835_defconfig which among other things has no SMP support.

The relevant logs (edited to remove the noise) are:

[    5.827333] Run /init as init process
Loading, please wait...
Failed to set SO_PASSCRED: Bad address
Failed to bind netlink socket: Bad address
Failed to create manager: Bad address
Failed to set SO_PASSCRED: Bad address
[    9.021623] systemd[1]: SO_PASSCRED failed: Bad address
[!!!!!!] Failed to start up manager.
[    9.079148] systemd[1]: Freezing execution.

I looked into it, it turns out that the call to get_user() in sock_setsockopt()
is returning -EFAULT. Down the assembly rabbit hole that get_user() is I
found-out that it's the macro 'check_uaccess' who's triggering the error.

I'm clueless at this point, so I hope you can give me some hints on what's
going bad here.

Regards,
Nicolas
Charles Keepax Sept. 27, 2019, 10:59 a.m. UTC | #8
On Fri, Sep 27, 2019 at 12:43:33PM +0200, Nicolas Saenz Julienne wrote:
> On Fri, 2019-08-30 at 12:43 +0900, Masahiro Yamada wrote:

> > Commit 9012d011660e ("compiler: allow all arches to enable

> > CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable

> > this option. A couple of build errors were reported by randconfig,

> > but all of them have been ironed out.

> > 

> > Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely

> > (and it will simplify the 'inline' macro in compiler_types.h),

> > this commit changes it to always-on option. Going forward, the

> > compiler will always be allowed to not inline functions marked

> > 'inline'.

> > 

> > This is not a problem for x86 since it has been long used by

> > arch/x86/configs/{x86_64,i386}_defconfig.

> > 

> > I am keeping the config option just in case any problem crops up for

> > other architectures.

> > 

> > The code clean-up will be done after confirming this is solid.

> > 

> > Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>

> 

> [ Adding some ARM people as they might be able to help ]

> 

> This was found to cause a regression on a Raspberry Pi 2 B built with

> bcm2835_defconfig which among other things has no SMP support.

> 

> The relevant logs (edited to remove the noise) are:

> 

> [    5.827333] Run /init as init process

> Loading, please wait...

> Failed to set SO_PASSCRED: Bad address

> Failed to bind netlink socket: Bad address

> Failed to create manager: Bad address

> Failed to set SO_PASSCRED: Bad address

> [    9.021623] systemd[1]: SO_PASSCRED failed: Bad address

> [!!!!!!] Failed to start up manager.

> [    9.079148] systemd[1]: Freezing execution.

> 

> I looked into it, it turns out that the call to get_user() in sock_setsockopt()

> is returning -EFAULT. Down the assembly rabbit hole that get_user() is I

> found-out that it's the macro 'check_uaccess' who's triggering the error.

> 

> I'm clueless at this point, so I hope you can give me some hints on what's

> going bad here.

> 


Not sure I can add much in terms of a solution, but I can at
least confirm I am seeing exactly the same issue, on my Zynq
based ARM system.

Thanks,
Charles
Nick Desaulniers Sept. 27, 2019, 10:08 p.m. UTC | #9
On Fri, Sep 27, 2019 at 3:43 AM Nicolas Saenz Julienne
<nsaenzjulienne@suse.de> wrote:
>

> On Fri, 2019-08-30 at 12:43 +0900, Masahiro Yamada wrote:

> > Commit 9012d011660e ("compiler: allow all arches to enable

> > CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable

> > this option. A couple of build errors were reported by randconfig,

> > but all of them have been ironed out.

> >

> > Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely

> > (and it will simplify the 'inline' macro in compiler_types.h),

> > this commit changes it to always-on option. Going forward, the

> > compiler will always be allowed to not inline functions marked

> > 'inline'.

> >

> > This is not a problem for x86 since it has been long used by

> > arch/x86/configs/{x86_64,i386}_defconfig.

> >

> > I am keeping the config option just in case any problem crops up for

> > other architectures.

> >

> > The code clean-up will be done after confirming this is solid.

> >

> > Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>

>

> [ Adding some ARM people as they might be able to help ]

>

> This was found to cause a regression on a Raspberry Pi 2 B built with

> bcm2835_defconfig which among other things has no SMP support.

>

> The relevant logs (edited to remove the noise) are:

>

> [    5.827333] Run /init as init process

> Loading, please wait...

> Failed to set SO_PASSCRED: Bad address

> Failed to bind netlink socket: Bad address

> Failed to create manager: Bad address

> Failed to set SO_PASSCRED: Bad address

> [    9.021623] systemd[1]: SO_PASSCRED failed: Bad address

> [!!!!!!] Failed to start up manager.

> [    9.079148] systemd[1]: Freezing execution.

>

> I looked into it, it turns out that the call to get_user() in sock_setsockopt()

> is returning -EFAULT. Down the assembly rabbit hole that get_user() is I

> found-out that it's the macro 'check_uaccess' who's triggering the error.

>

> I'm clueless at this point, so I hope you can give me some hints on what's

> going bad here.


So get_user() was passed a bad value/pointer from userspace? Do you
know which of the tree calls to get_user() from sock_setsockopt() is
failing?  (It's not immediately clear to me how this patch is at
fault, vs there just being a bug in the source somewhere).
-- 
Thanks,
~Nick Desaulniers
Linus Torvalds Sept. 27, 2019, 10:38 p.m. UTC | #10
On Fri, Sep 27, 2019 at 3:08 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>

> So get_user() was passed a bad value/pointer from userspace? Do you

> know which of the tree calls to get_user() from sock_setsockopt() is

> failing?  (It's not immediately clear to me how this patch is at

> fault, vs there just being a bug in the source somewhere).


Based on the error messages, the SO_PASSCRED ones are almost certainly
from the get_user() in net/core/sock.c: sock_setsockopt(), which just
does

        if (optlen < sizeof(int))
                return -EINVAL;

        if (get_user(val, (int __user *)optval))
                return -EFAULT;

        valbool = val ? 1 : 0;

but it's the other messages imply that a lot of other cases are
failing too (ie the "Failed to bind netlink socket" is, according to
google, a bind() that fails with the same EFAULT error).

There are probably even more failures that happen elsewhere and just
don't even syslog the fact. I'd guess that all get_user() calls just
fail, and those are the ones that happen to get printed out.

Now, _why_ it would fail, I have ni idea. There are several inlines in
the arm uaccess.h file, and it depends on other headers like
<asm/domain.h> with more inlines still - eg get/set_domain() etc.

Soem of that code is pretty subtle. They have fixed register usage
(but the asm macros actually check them). And the inline asms clobber
the link register, but they do seem to clearly _state_ that they
clobber it, so who knows.

Just based on the EFAULT, I'd _guess_ that it's some interaction with
the domain access control register (so that get/set_domain() thing).
But I'm not even sure that code is enabled for the Rpi2, so who
knows..

               Linus
Masahiro Yamada Sept. 30, 2019, 6:04 a.m. UTC | #11
Hi.

On Fri, Sep 27, 2019 at 7:58 PM Nicolas Saenz Julienne
<nsaenzjulienne@suse.de> wrote:
>

> On Fri, 2019-08-30 at 12:43 +0900, Masahiro Yamada wrote:

> > Commit 9012d011660e ("compiler: allow all arches to enable

> > CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable

> > this option. A couple of build errors were reported by randconfig,

> > but all of them have been ironed out.

> >

> > Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely

> > (and it will simplify the 'inline' macro in compiler_types.h),

> > this commit changes it to always-on option. Going forward, the

> > compiler will always be allowed to not inline functions marked

> > 'inline'.

> >

> > This is not a problem for x86 since it has been long used by

> > arch/x86/configs/{x86_64,i386}_defconfig.

> >

> > I am keeping the config option just in case any problem crops up for

> > other architectures.

> >

> > The code clean-up will be done after confirming this is solid.

> >

> > Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>

>

> [ Resending as the mail delivery system failed to resolve some the hosts,

> namely Masahiro's ]

>

> [ Adding some ARM people as they might be able to help ]

>

> This was found to cause a regression on a Raspberry Pi 2 B built with

> bcm2835_defconfig which among other things has no SMP support.

>

> The relevant logs (edited to remove the noise) are:

>

> [    5.827333] Run /init as init process

> Loading, please wait...

> Failed to set SO_PASSCRED: Bad address

> Failed to bind netlink socket: Bad address

> Failed to create manager: Bad address

> Failed to set SO_PASSCRED: Bad address

> [    9.021623] systemd[1]: SO_PASSCRED failed: Bad address

> [!!!!!!] Failed to start up manager.

> [    9.079148] systemd[1]: Freezing execution.

>

> I looked into it, it turns out that the call to get_user() in sock_setsockopt()

> is returning -EFAULT. Down the assembly rabbit hole that get_user() is I

> found-out that it's the macro 'check_uaccess' who's triggering the error.

>

> I'm clueless at this point, so I hope you can give me some hints on what's

> going bad here.

>

> Regards,

> Nicolas

>

>


I posted a fix:
https://lore.kernel.org/patchwork/patch/1132459/

Thanks.



-- 
Best Regards
Masahiro Yamada
Will Deacon Sept. 30, 2019, 11:26 a.m. UTC | #12
On Fri, Sep 27, 2019 at 03:38:44PM -0700, Linus Torvalds wrote:
> On Fri, Sep 27, 2019 at 3:08 PM Nick Desaulniers

> <ndesaulniers@google.com> wrote:

> >

> > So get_user() was passed a bad value/pointer from userspace? Do you

> > know which of the tree calls to get_user() from sock_setsockopt() is

> > failing?  (It's not immediately clear to me how this patch is at

> > fault, vs there just being a bug in the source somewhere).

> 

> Based on the error messages, the SO_PASSCRED ones are almost certainly

> from the get_user() in net/core/sock.c: sock_setsockopt(), which just

> does

> 

>         if (optlen < sizeof(int))

>                 return -EINVAL;

> 

>         if (get_user(val, (int __user *)optval))

>                 return -EFAULT;

> 

>         valbool = val ? 1 : 0;

> 

> but it's the other messages imply that a lot of other cases are

> failing too (ie the "Failed to bind netlink socket" is, according to

> google, a bind() that fails with the same EFAULT error).

> 

> There are probably even more failures that happen elsewhere and just

> don't even syslog the fact. I'd guess that all get_user() calls just

> fail, and those are the ones that happen to get printed out.

> 

> Now, _why_ it would fail, I have ni idea. There are several inlines in

> the arm uaccess.h file, and it depends on other headers like

> <asm/domain.h> with more inlines still - eg get/set_domain() etc.

> 

> Soem of that code is pretty subtle. They have fixed register usage

> (but the asm macros actually check them). And the inline asms clobber

> the link register, but they do seem to clearly _state_ that they

> clobber it, so who knows.

> 

> Just based on the EFAULT, I'd _guess_ that it's some interaction with

> the domain access control register (so that get/set_domain() thing).

> But I'm not even sure that code is enabled for the Rpi2, so who

> knows..


FWIW, we've run into issues with CONFIG_OPTIMIZE_INLINING and local
variables marked as 'register' where GCC would do crazy things and end
up corrupting data, so I suspect the use of fixed registers in the arm
uaccess functions is hitting something similar:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91111

Although this particular case couldn't be reproduced with GCC 9, prior
versions of the compiler get it wrong so I'm very much opposed to enabling
CONFIG_OPTIMIZE_INLINING by default on arm/arm64.

Will
Masahiro Yamada Sept. 30, 2019, 12:05 p.m. UTC | #13
On Mon, Sep 30, 2019 at 8:26 PM Will Deacon <will@kernel.org> wrote:
>

> On Fri, Sep 27, 2019 at 03:38:44PM -0700, Linus Torvalds wrote:

> > On Fri, Sep 27, 2019 at 3:08 PM Nick Desaulniers

> > <ndesaulniers@google.com> wrote:

> > >

> > > So get_user() was passed a bad value/pointer from userspace? Do you

> > > know which of the tree calls to get_user() from sock_setsockopt() is

> > > failing?  (It's not immediately clear to me how this patch is at

> > > fault, vs there just being a bug in the source somewhere).

> >

> > Based on the error messages, the SO_PASSCRED ones are almost certainly

> > from the get_user() in net/core/sock.c: sock_setsockopt(), which just

> > does

> >

> >         if (optlen < sizeof(int))

> >                 return -EINVAL;

> >

> >         if (get_user(val, (int __user *)optval))

> >                 return -EFAULT;

> >

> >         valbool = val ? 1 : 0;

> >

> > but it's the other messages imply that a lot of other cases are

> > failing too (ie the "Failed to bind netlink socket" is, according to

> > google, a bind() that fails with the same EFAULT error).

> >

> > There are probably even more failures that happen elsewhere and just

> > don't even syslog the fact. I'd guess that all get_user() calls just

> > fail, and those are the ones that happen to get printed out.

> >

> > Now, _why_ it would fail, I have ni idea. There are several inlines in

> > the arm uaccess.h file, and it depends on other headers like

> > <asm/domain.h> with more inlines still - eg get/set_domain() etc.

> >

> > Soem of that code is pretty subtle. They have fixed register usage

> > (but the asm macros actually check them). And the inline asms clobber

> > the link register, but they do seem to clearly _state_ that they

> > clobber it, so who knows.

> >

> > Just based on the EFAULT, I'd _guess_ that it's some interaction with

> > the domain access control register (so that get/set_domain() thing).

> > But I'm not even sure that code is enabled for the Rpi2, so who

> > knows..

>

> FWIW, we've run into issues with CONFIG_OPTIMIZE_INLINING and local

> variables marked as 'register' where GCC would do crazy things and end

> up corrupting data, so I suspect the use of fixed registers in the arm

> uaccess functions is hitting something similar:

>

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91111



No. Not similar at all.


I fixed it already. See
https://lore.kernel.org/patchwork/patch/1132459/


The problems are fixable by writing correct code.




I think we discussed this already.

- There is nothing arch-specific in CONFIG_OPTIMIZE_INLINING

- 'inline' is just a hint. It does not guarantee function inlining.
  This is standard.

- The kernel macrofies 'inline' to add __attribute__((__always_inline__))
  This terrible hack must end.

-  __attribute__((__always_inline__)) takes aways compiler's freedom,
   and prevents it from optimizing the code for -O2, -Os, or whatever.





> Although this particular case couldn't be reproduced with GCC 9, prior

> versions of the compiler get it wrong so I'm very much opposed to enabling

> CONFIG_OPTIMIZE_INLINING by default on arm/arm64.

>

> Will



--
Best Regards
Masahiro Yamada
Will Deacon Sept. 30, 2019, 12:18 p.m. UTC | #14
On Mon, Sep 30, 2019 at 09:05:11PM +0900, Masahiro Yamada wrote:
> On Mon, Sep 30, 2019 at 8:26 PM Will Deacon <will@kernel.org> wrote:

> > On Fri, Sep 27, 2019 at 03:38:44PM -0700, Linus Torvalds wrote:

> > > Soem of that code is pretty subtle. They have fixed register usage

> > > (but the asm macros actually check them). And the inline asms clobber

> > > the link register, but they do seem to clearly _state_ that they

> > > clobber it, so who knows.

> > >

> > > Just based on the EFAULT, I'd _guess_ that it's some interaction with

> > > the domain access control register (so that get/set_domain() thing).

> > > But I'm not even sure that code is enabled for the Rpi2, so who

> > > knows..

> >

> > FWIW, we've run into issues with CONFIG_OPTIMIZE_INLINING and local

> > variables marked as 'register' where GCC would do crazy things and end

> > up corrupting data, so I suspect the use of fixed registers in the arm

> > uaccess functions is hitting something similar:

> >

> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91111

> 

> No. Not similar at all.


They're similar in that enabling CONFIG_OPTIMIZE_INLINING causes register
variables to go wrong. I agree that the ARM code looks dodgy with
that call to uaccess_save_and_enable(), but there are __asmeq macros
in there to try to catch that, so it's still very fishy.

> I fixed it already. See

> https://lore.kernel.org/patchwork/patch/1132459/


You fixed the specific case above for 32-bit ARM, but the arm64 case
is due to a compiler bug. As it happens, we've reworked our atomics
in 5.4 so that particular issue no longer triggers, but the fact remains
that GCC has been shown to screw up explicit register allocation for
perfectly legitimate code when giving the flexibility to move code out
of line.

> The problems are fixable by writing correct code.


Right, in the compiler ;)

> I think we discussed this already.


We did?

> - There is nothing arch-specific in CONFIG_OPTIMIZE_INLINING


Apart from the bugs... and even then, that's just based on reports.

> - 'inline' is just a hint. It does not guarantee function inlining.

>   This is standard.

> 

> - The kernel macrofies 'inline' to add __attribute__((__always_inline__))

>   This terrible hack must end.


I'm all for getting rid of hacks, but not at the cost of correctness.

> -  __attribute__((__always_inline__)) takes aways compiler's freedom,

>    and prevents it from optimizing the code for -O2, -Os, or whatever.


s/whatever/miscompiling the code/

If it helps, here is more information about the arm64 failure which
triggered the GCC bugzilla:

https://www.spinics.net/lists/arm-kernel/msg730329.html

Will
Nick Desaulniers Sept. 30, 2019, 9:50 p.m. UTC | #15
On Mon, Sep 30, 2019 at 5:18 AM Will Deacon <will@kernel.org> wrote:
>

> On Mon, Sep 30, 2019 at 09:05:11PM +0900, Masahiro Yamada wrote:

> > On Mon, Sep 30, 2019 at 8:26 PM Will Deacon <will@kernel.org> wrote:

> > > On Fri, Sep 27, 2019 at 03:38:44PM -0700, Linus Torvalds wrote:

> > > > Soem of that code is pretty subtle. They have fixed register usage

> > > > (but the asm macros actually check them). And the inline asms clobber

> > > > the link register, but they do seem to clearly _state_ that they

> > > > clobber it, so who knows.

> > > >

> > > > Just based on the EFAULT, I'd _guess_ that it's some interaction with

> > > > the domain access control register (so that get/set_domain() thing).

> > > > But I'm not even sure that code is enabled for the Rpi2, so who

> > > > knows..

> > >

> > > FWIW, we've run into issues with CONFIG_OPTIMIZE_INLINING and local

> > > variables marked as 'register' where GCC would do crazy things and end

> > > up corrupting data, so I suspect the use of fixed registers in the arm

> > > uaccess functions is hitting something similar:

> > >

> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91111

> >

> > No. Not similar at all.

>

> They're similar in that enabling CONFIG_OPTIMIZE_INLINING causes register

> variables to go wrong. I agree that the ARM code looks dodgy with

> that call to uaccess_save_and_enable(), but there are __asmeq macros

> in there to try to catch that, so it's still very fishy.

>

> > I fixed it already. See

> > https://lore.kernel.org/patchwork/patch/1132459/

>

> You fixed the specific case above for 32-bit ARM, but the arm64 case

> is due to a compiler bug. As it happens, we've reworked our atomics

> in 5.4 so that particular issue no longer triggers, but the fact remains

> that GCC has been shown to screw up explicit register allocation for

> perfectly legitimate code when giving the flexibility to move code out

> of line.


So __attribute__((always_inline)) doesn't guarantee that code will be
inlined.  For instance in LLVM's inliner, it asks/answers "should I
inline" and "can I inline."  "Should" has to do with a cost model, and
is very heuristic-y.  "Can" has more to do with the transforms, and
whether they're all implemented and safe.  If you if you say
__attribute__((always_inline)), the answer to "can I inline this" can
still be *no*.  The only way to guarantee inlining is via the C
preprocessor.  The only way to prevent inlining is via
__attribute__((no_inline)).  inline and __attribute__((always_inline))
are a heuristic laden mess and should not be relied upon.  I would
also look closely at code that *requires* inlining or the lack there
of to be correct.  That the kernel no longer compiles at -O0 is not a
good thing IMO, and hurts developers that want a short
compile/execute/debug cycle.

In this case, if there's a known codegen bug in a particular compiler
or certain versions of it, I recommend the use of either the C
preprocessor or __attribute__((no_inline)) to get the desired behavior
localized to the function in question, and for us to proceed with
Masahiro's cleanup.

The comment above the use of CONFIG_OPTIMIZE_INLINING in
include/linux/compiler_types.h says:
  * Force always-inline if the user requests it so via the .config.
Which makes me grimace (__attribute__((always_inline)) doesn't *force*
anything as per above), and the idea that forcing things marked inline
to also be __attribute__((always_inline)) is an "optimization" (re:
the name of the config; CONFIG_OPTIMIZE_INLINING) is also highly
suspect.  Aggressive inlining leads to image size bloat, instruction
cache and register pressure; it is not exclusively an optimization.

>

> > The problems are fixable by writing correct code.

>

> Right, in the compiler ;)

>

> > I think we discussed this already.

>

> We did?

>

> > - There is nothing arch-specific in CONFIG_OPTIMIZE_INLINING

>

> Apart from the bugs... and even then, that's just based on reports.

>

> > - 'inline' is just a hint. It does not guarantee function inlining.

> >   This is standard.

> >

> > - The kernel macrofies 'inline' to add __attribute__((__always_inline__))

> >   This terrible hack must end.

>

> I'm all for getting rid of hacks, but not at the cost of correctness.

>

> > -  __attribute__((__always_inline__)) takes aways compiler's freedom,

> >    and prevents it from optimizing the code for -O2, -Os, or whatever.

>

> s/whatever/miscompiling the code/

>

> If it helps, here is more information about the arm64 failure which

> triggered the GCC bugzilla:

>

> https://www.spinics.net/lists/arm-kernel/msg730329.html

>

> Will




-- 
Thanks,
~Nick Desaulniers
Miguel Ojeda Sept. 30, 2019, 10:08 p.m. UTC | #16
On Mon, Sep 30, 2019 at 11:50 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>

> So __attribute__((always_inline)) doesn't guarantee that code will be

> inlined.  [...] inline and __attribute__((always_inline))

> are a heuristic laden mess and should not be relied upon.


Small note: in GCC, __attribute__((always_inline)) is documented as
actually guaranteeing to either inline or error otherwise (although
see the remark for indirect calls):

    "Failure to inline such a function is diagnosed as an error. Note
that if such a function is called indirectly the compiler may or may
not inline it depending on optimization level and a failure to inline
an indirect call may or may not be diagnosed."

As for LLVM/Clang, no idea, since it does not say anything about it in
the docs -- but from what you say, it is a weaker guarantee.

Cheers,
Miguel
Nick Desaulniers Sept. 30, 2019, 10:34 p.m. UTC | #17
On Mon, Sep 30, 2019 at 3:08 PM Miguel Ojeda
<miguel.ojeda.sandonis@gmail.com> wrote:
>

> On Mon, Sep 30, 2019 at 11:50 PM Nick Desaulniers

> <ndesaulniers@google.com> wrote:

> >

> > So __attribute__((always_inline)) doesn't guarantee that code will be

> > inlined.  [...] inline and __attribute__((always_inline))

> > are a heuristic laden mess and should not be relied upon.

>

> Small note: in GCC, __attribute__((always_inline)) is documented as

> actually guaranteeing to either inline or error otherwise (although

> see the remark for indirect calls):

>

>     "Failure to inline such a function is diagnosed as an error. Note


Not an error, but a warning at least: https://godbolt.org/z/_V5im1.

That's interesting, so it has multiple semantics, because it's also
documented to inline even when no optimizations are specified.  So
when someone uses __attribute__((always_inline)) without a comment,
it's not clear whether they mean for there to be a warning when this
is not inlined, or for it to be inlined at -O0 (guess not for the
kernel), or both.  If the kernel wants to enforce the former, why not
set `-Werror=attributes`?  Maybe that warning is too broad?  Seems
like a recipe for subtly broken code found at runtime, when we'd
rather have stronger compile time guarantees.

> that if such a function is called indirectly the compiler may or may

> not inline it depending on optimization level and a failure to inline

> an indirect call may or may not be diagnosed."

>

> As for LLVM/Clang, no idea, since it does not say anything about it in

> the docs -- but from what you say, it is a weaker guarantee.


Filed https://bugs.llvm.org/show_bug.cgi?id=43517
-- 
Thanks,
~Nick Desaulniers
Will Deacon Oct. 1, 2019, 9:28 a.m. UTC | #18
Hi Nick,

On Mon, Sep 30, 2019 at 02:50:10PM -0700, Nick Desaulniers wrote:
> On Mon, Sep 30, 2019 at 5:18 AM Will Deacon <will@kernel.org> wrote:

> > On Mon, Sep 30, 2019 at 09:05:11PM +0900, Masahiro Yamada wrote:

> > > On Mon, Sep 30, 2019 at 8:26 PM Will Deacon <will@kernel.org> wrote:

> > > > FWIW, we've run into issues with CONFIG_OPTIMIZE_INLINING and local

> > > > variables marked as 'register' where GCC would do crazy things and end

> > > > up corrupting data, so I suspect the use of fixed registers in the arm

> > > > uaccess functions is hitting something similar:

> > > >

> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91111

> > >

> > > No. Not similar at all.

> >

> > They're similar in that enabling CONFIG_OPTIMIZE_INLINING causes register

> > variables to go wrong. I agree that the ARM code looks dodgy with

> > that call to uaccess_save_and_enable(), but there are __asmeq macros

> > in there to try to catch that, so it's still very fishy.

> >

> > > I fixed it already. See

> > > https://lore.kernel.org/patchwork/patch/1132459/

> >

> > You fixed the specific case above for 32-bit ARM, but the arm64 case

> > is due to a compiler bug. As it happens, we've reworked our atomics

> > in 5.4 so that particular issue no longer triggers, but the fact remains

> > that GCC has been shown to screw up explicit register allocation for

> > perfectly legitimate code when giving the flexibility to move code out

> > of line.

> 

> So __attribute__((always_inline)) doesn't guarantee that code will be

> inlined.  For instance in LLVM's inliner, it asks/answers "should I

> inline" and "can I inline."  "Should" has to do with a cost model, and

> is very heuristic-y.  "Can" has more to do with the transforms, and

> whether they're all implemented and safe.  If you if you say

> __attribute__((always_inline)), the answer to "can I inline this" can

> still be *no*.  The only way to guarantee inlining is via the C

> preprocessor.  The only way to prevent inlining is via

> __attribute__((no_inline)).  inline and __attribute__((always_inline))

> are a heuristic laden mess and should not be relied upon.  I would

> also look closely at code that *requires* inlining or the lack there

> of to be correct.  That the kernel no longer compiles at -O0 is not a

> good thing IMO, and hurts developers that want a short

> compile/execute/debug cycle.

> 

> In this case, if there's a known codegen bug in a particular compiler

> or certain versions of it, I recommend the use of either the C

> preprocessor or __attribute__((no_inline)) to get the desired behavior

> localized to the function in question, and for us to proceed with

> Masahiro's cleanup.


Hmm, I don't see how that would help. The problem occurs when things
are moved out of line by the compiler (see below).

> The comment above the use of CONFIG_OPTIMIZE_INLINING in

> include/linux/compiler_types.h says:

>   * Force always-inline if the user requests it so via the .config.

> Which makes me grimace (__attribute__((always_inline)) doesn't *force*

> anything as per above), and the idea that forcing things marked inline

> to also be __attribute__((always_inline)) is an "optimization" (re:

> the name of the config; CONFIG_OPTIMIZE_INLINING) is also highly

> suspect.  Aggressive inlining leads to image size bloat, instruction

> cache and register pressure; it is not exclusively an optimization.


Agreed on all of this, but the fact remains that GCC has been shown to
*miscompile* the arm64 kernel with CONFIG_OPTIMIZE_INLINING=y. Please,
look at this thread:

	https://www.spinics.net/lists/arm-kernel/msg730329.html
	https://www.spinics.net/lists/arm-kernel/msg730512.html

GCC decides to pull an atomic operation out-of-line and, in doing so,
gets the register allocations subtly wrong when passing a 'register'
variable into an inline asm. I would like to avoid this sort of thing
happening, since it can result in really nasty bugs that manifest at
runtime and are extremely difficult to debug, which is why I would much
prefer not to have this option on by default for arm64. I sent a patch
already:

	https://lkml.kernel.org/r/20190930114540.27498-1-will@kernel.org

and I'm happy to spin a v2 which depends on !CC_IS_CLANG as well.

Reducing the instruction cache footprint is great, but not if the
resulting code is broken!

Will
Masahiro Yamada Oct. 1, 2019, 9:39 a.m. UTC | #19
Hi Will,

On Mon, Sep 30, 2019 at 9:18 PM Will Deacon <will@kernel.org> wrote:
>

> On Mon, Sep 30, 2019 at 09:05:11PM +0900, Masahiro Yamada wrote:

> > On Mon, Sep 30, 2019 at 8:26 PM Will Deacon <will@kernel.org> wrote:

> > > On Fri, Sep 27, 2019 at 03:38:44PM -0700, Linus Torvalds wrote:

> > > > Soem of that code is pretty subtle. They have fixed register usage

> > > > (but the asm macros actually check them). And the inline asms clobber

> > > > the link register, but they do seem to clearly _state_ that they

> > > > clobber it, so who knows.

> > > >

> > > > Just based on the EFAULT, I'd _guess_ that it's some interaction with

> > > > the domain access control register (so that get/set_domain() thing).

> > > > But I'm not even sure that code is enabled for the Rpi2, so who

> > > > knows..

> > >

> > > FWIW, we've run into issues with CONFIG_OPTIMIZE_INLINING and local

> > > variables marked as 'register' where GCC would do crazy things and end

> > > up corrupting data, so I suspect the use of fixed registers in the arm

> > > uaccess functions is hitting something similar:

> > >

> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91111

> >

> > No. Not similar at all.

>

> They're similar in that enabling CONFIG_OPTIMIZE_INLINING causes register

> variables to go wrong.


You are just looking at the result, not the cause.


> I agree that the ARM code looks dodgy with

> that call to uaccess_save_and_enable(), but there are __asmeq macros

> in there to try to catch that, so it's still very fishy.


I am totally fine with double-checking
the output from the compiler.


>

> > I fixed it already. See

> > https://lore.kernel.org/patchwork/patch/1132459/

>

> You fixed the specific case above for 32-bit ARM, but the arm64 case

> is due to a compiler bug. As it happens, we've reworked our atomics

> in 5.4 so that particular issue no longer triggers, but the fact remains

> that GCC has been shown to screw up explicit register allocation for

> perfectly legitimate code when giving the flexibility to move code out

> of line.

>

> > The problems are fixable by writing correct code.

>

> Right, in the compiler ;)


Not always.

You showed a compiler bug case for arm64.
The 32bit ARM is not the case.


>

> > I think we discussed this already.

>

> We did?



https://www.spinics.net/lists/arm-kernel/msg754567.html

This is a bug in the kernel code.



> > - There is nothing arch-specific in CONFIG_OPTIMIZE_INLINING

>

> Apart from the bugs... and even then, that's just based on reports.

>

> > - 'inline' is just a hint. It does not guarantee function inlining.

> >   This is standard.

> >

> > - The kernel macrofies 'inline' to add __attribute__((__always_inline__))

> >   This terrible hack must end.

>

> I'm all for getting rid of hacks, but not at the cost of correctness.

>

> > -  __attribute__((__always_inline__)) takes aways compiler's freedom,

> >    and prevents it from optimizing the code for -O2, -Os, or whatever.

>

> s/whatever/miscompiling the code/

>

> If it helps, here is more information about the arm64 failure which

> triggered the GCC bugzilla:

>

> https://www.spinics.net/lists/arm-kernel/msg730329.html

>

> Will


You put multiple references here and there,
but they are all about arch_atomic64_dec_if_positive().


--
Best Regards
Masahiro Yamada
Will Deacon Oct. 1, 2019, 10:40 a.m. UTC | #20
On Tue, Oct 01, 2019 at 06:39:34PM +0900, Masahiro Yamada wrote:
> On Mon, Sep 30, 2019 at 9:18 PM Will Deacon <will@kernel.org> wrote:

> > I agree that the ARM code looks dodgy with

> > that call to uaccess_save_and_enable(), but there are __asmeq macros

> > in there to try to catch that, so it's still very fishy.

> 

> I am totally fine with double-checking

> the output from the compiler.


Sure, but don't you find it odd that those checks didn't fire, and instead
the failure occurred at runtime?

> > > I fixed it already. See

> > > https://lore.kernel.org/patchwork/patch/1132459/

> >

> > You fixed the specific case above for 32-bit ARM, but the arm64 case

> > is due to a compiler bug. As it happens, we've reworked our atomics

> > in 5.4 so that particular issue no longer triggers, but the fact remains

> > that GCC has been shown to screw up explicit register allocation for

> > perfectly legitimate code when giving the flexibility to move code out

> > of line.

> >

> > > The problems are fixable by writing correct code.

> >

> > Right, in the compiler ;)

> 

> Not always.

> 

> You showed a compiler bug case for arm64.

> The 32bit ARM is not the case.


I would be /very/ surprised if the same compiler bug didn't also apply
to ARM, which is why I'm championing the conservative approach of restoring
the old default behaviour rather than run the risk of runtime failures.

> > If it helps, here is more information about the arm64 failure which

> > triggered the GCC bugzilla:

> >

> > https://www.spinics.net/lists/arm-kernel/msg730329.html

> >

> You put multiple references here and there,

> but they are all about arch_atomic64_dec_if_positive().


Right, but that's because it was the atomic64 selftest code which triggered
the compiler bug in practice. Without a better understanding of why GCC
got this wrong, we can't assume that no other register variables are
affected.

Will
Nick Desaulniers Oct. 1, 2019, 4:32 p.m. UTC | #21
On Tue, Oct 1, 2019 at 2:28 AM Will Deacon <will@kernel.org> wrote:
>

> Hi Nick,

>

> On Mon, Sep 30, 2019 at 02:50:10PM -0700, Nick Desaulniers wrote:

> > So __attribute__((always_inline)) doesn't guarantee that code will be

> > inlined.  For instance in LLVM's inliner, it asks/answers "should I

> > inline" and "can I inline."  "Should" has to do with a cost model, and

> > is very heuristic-y.  "Can" has more to do with the transforms, and

> > whether they're all implemented and safe.  If you if you say

> > __attribute__((always_inline)), the answer to "can I inline this" can

> > still be *no*.  The only way to guarantee inlining is via the C

> > preprocessor.  The only way to prevent inlining is via

> > __attribute__((no_inline)).  inline and __attribute__((always_inline))

> > are a heuristic laden mess and should not be relied upon.  I would

> > also look closely at code that *requires* inlining or the lack there

> > of to be correct.  That the kernel no longer compiles at -O0 is not a

> > good thing IMO, and hurts developers that want a short

> > compile/execute/debug cycle.

> >

> > In this case, if there's a known codegen bug in a particular compiler

> > or certain versions of it, I recommend the use of either the C

> > preprocessor or __attribute__((no_inline)) to get the desired behavior

> > localized to the function in question, and for us to proceed with

> > Masahiro's cleanup.

>

> Hmm, I don't see how that would help. The problem occurs when things

> are moved out of line by the compiler (see below).


It's being moved out of line because __attribute__((always_inline)) or
just inline provide no guarantees that outlining does not occur.  It
would help to make functions that need to be inlined macros, because
the C preprocessor doesn't have that issue.

>

> > The comment above the use of CONFIG_OPTIMIZE_INLINING in

> > include/linux/compiler_types.h says:

> >   * Force always-inline if the user requests it so via the .config.

> > Which makes me grimace (__attribute__((always_inline)) doesn't *force*

> > anything as per above), and the idea that forcing things marked inline

> > to also be __attribute__((always_inline)) is an "optimization" (re:

> > the name of the config; CONFIG_OPTIMIZE_INLINING) is also highly

> > suspect.  Aggressive inlining leads to image size bloat, instruction

> > cache and register pressure; it is not exclusively an optimization.

>

> Agreed on all of this, but the fact remains that GCC has been shown to

> *miscompile* the arm64 kernel with CONFIG_OPTIMIZE_INLINING=y. Please,

> look at this thread:

>

>         https://www.spinics.net/lists/arm-kernel/msg730329.html

>         https://www.spinics.net/lists/arm-kernel/msg730512.html

>

> GCC decides to pull an atomic operation out-of-line and, in doing so,


If the function is incorrect unless inlined, use a macro.

> gets the register allocations subtly wrong when passing a 'register'

> variable into an inline asm. I would like to avoid this sort of thing

> happening, since it can result in really nasty bugs that manifest at

> runtime and are extremely difficult to debug, which is why I would much

> prefer not to have this option on by default for arm64. I sent a patch

> already:

>

>         https://lkml.kernel.org/r/20190930114540.27498-1-will@kernel.org

>

> and I'm happy to spin a v2 which depends on !CC_IS_CLANG as well.


For small things like whether we mark a function always_inline or not,
I think it's simpler to just keep the code consistent between
compilers, even if it's to work around a bug in one compiler.  A
comment in the code would be sufficient.

>

> Reducing the instruction cache footprint is great, but not if the

> resulting code is broken!


You don't have to convince compiler folks about correctness. ;)
Correctness trumps all, especially performance.
-- 
Thanks,
~Nick Desaulniers
Will Deacon Oct. 1, 2019, 5:01 p.m. UTC | #22
On Tue, Oct 01, 2019 at 09:32:25AM -0700, Nick Desaulniers wrote:
> On Tue, Oct 1, 2019 at 2:28 AM Will Deacon <will@kernel.org> wrote:

> > On Mon, Sep 30, 2019 at 02:50:10PM -0700, Nick Desaulniers wrote:

> > > In this case, if there's a known codegen bug in a particular compiler

> > > or certain versions of it, I recommend the use of either the C

> > > preprocessor or __attribute__((no_inline)) to get the desired behavior

> > > localized to the function in question, and for us to proceed with

> > > Masahiro's cleanup.

> >

> > Hmm, I don't see how that would help. The problem occurs when things

> > are moved out of line by the compiler (see below).

> 

> It's being moved out of line because __attribute__((always_inline)) or

> just inline provide no guarantees that outlining does not occur.  It

> would help to make functions that need to be inlined macros, because

> the C preprocessor doesn't have that issue.


Hmm, let me try to put it another way: enabling CONFIG_OPTIMIZE_INLINING
has been shown to cause miscompilation with many versions of GCC on arm64
because the compiler moves things out of line that it otherwise doesn't
appear to do. I don't see how __attribute__((no_inline)) helps with that,
and replacing static functions with macros isn't great for type-checking
and argument evaluation.

> > > The comment above the use of CONFIG_OPTIMIZE_INLINING in

> > > include/linux/compiler_types.h says:

> > >   * Force always-inline if the user requests it so via the .config.

> > > Which makes me grimace (__attribute__((always_inline)) doesn't *force*

> > > anything as per above), and the idea that forcing things marked inline

> > > to also be __attribute__((always_inline)) is an "optimization" (re:

> > > the name of the config; CONFIG_OPTIMIZE_INLINING) is also highly

> > > suspect.  Aggressive inlining leads to image size bloat, instruction

> > > cache and register pressure; it is not exclusively an optimization.

> >

> > Agreed on all of this, but the fact remains that GCC has been shown to

> > *miscompile* the arm64 kernel with CONFIG_OPTIMIZE_INLINING=y. Please,

> > look at this thread:

> >

> >         https://www.spinics.net/lists/arm-kernel/msg730329.html

> >         https://www.spinics.net/lists/arm-kernel/msg730512.html

> >

> > GCC decides to pull an atomic operation out-of-line and, in doing so,

> 

> If the function is incorrect unless inlined, use a macro.


The function is correct per the GCC documentation regarding register
variables and inline asm.

> > Reducing the instruction cache footprint is great, but not if the

> > resulting code is broken!

> 

> You don't have to convince compiler folks about correctness. ;)

> Correctness trumps all, especially performance.


Then why is this conversation so difficult? :(

Will
Nick Desaulniers Oct. 1, 2019, 5:44 p.m. UTC | #23
On Tue, Oct 1, 2019 at 10:01 AM Will Deacon <will@kernel.org> wrote:
>

> On Tue, Oct 01, 2019 at 09:32:25AM -0700, Nick Desaulniers wrote:

> > On Tue, Oct 1, 2019 at 2:28 AM Will Deacon <will@kernel.org> wrote:

> > > On Mon, Sep 30, 2019 at 02:50:10PM -0700, Nick Desaulniers wrote:

> > > > In this case, if there's a known codegen bug in a particular compiler

> > > > or certain versions of it, I recommend the use of either the C

> > > > preprocessor or __attribute__((no_inline)) to get the desired behavior

> > > > localized to the function in question, and for us to proceed with

> > > > Masahiro's cleanup.

> > >

> > > Hmm, I don't see how that would help. The problem occurs when things

> > > are moved out of line by the compiler (see below).

> >

> > It's being moved out of line because __attribute__((always_inline)) or

> > just inline provide no guarantees that outlining does not occur.  It

> > would help to make functions that need to be inlined macros, because

> > the C preprocessor doesn't have that issue.

>

> Hmm, let me try to put it another way: enabling CONFIG_OPTIMIZE_INLINING

> has been shown to cause miscompilation with many versions of GCC on arm64

> because the compiler moves things out of line that it otherwise doesn't

> appear to do.


Yes.  That code should use the C preprocessor to force inlining. Then
you can enable CONFIG_OPTIMIZE_INLINING for arm64.

> I don't see how __attribute__((no_inline)) helps with that,


Because that *wasn't* the point of what I said about
__attribute__((no_inline)). My point is to use the C preprocessor to
guarantee that no outlining occurs.

My point related to __attribute__((no_inline)) was the simple decision
matrix that should be used:

Do I need strong guarantees that my code must be inlined? Use the C
preprocessor.
Do I need strong guarantees that my code is *not* to be inlined? Use
__attribute__((no_inline)).

The *former* is what applies here, not the latter.

inline and __attribute__((always_inline)) don't provide strong
guarantees (despite their fun sounding names;
-funsafe-math-optimizations/-Ofast being the poster child for "that
sounds nice, I would like my math to be fun, safe, and fast, let's use
that, WCGW?").  The more we use them, the more we continuously be
bitten by bugs related to outlining. Like the one you cited and the
arm32 patch Masahiro wrote.

Would using the C preprocessor to force inlining fix the GCC/arm64
miscompilation bug you described above? I suspect it would.

> and replacing static functions with macros isn't great for type-checking

> and argument evaluation.


See typecheck (include/linux/typecheck.h).

> > You don't have to convince compiler folks about correctness. ;)

> > Correctness trumps all, especially performance.

>

> Then why is this conversation so difficult? :(


I apologize; I don't mean to be difficult.  I would just like to avoid
surprises when code written with the assumption that it will be
inlined is not.  It sounds like we found one issue in arm32 and one in
arm64 related to outlining.  If we fix those two cases, I think we're
close to proceeding with Masahiro's cleanup, which I view as a good
thing for the health of the Linux kernel codebase.
-- 
Thanks,
~Nick Desaulniers
Russell King (Oracle) Oct. 1, 2019, 5:55 p.m. UTC | #24
On Tue, Oct 01, 2019 at 10:44:43AM -0700, Nick Desaulniers wrote:
> I apologize; I don't mean to be difficult.  I would just like to avoid

> surprises when code written with the assumption that it will be

> inlined is not.  It sounds like we found one issue in arm32 and one in

> arm64 related to outlining.  If we fix those two cases, I think we're

> close to proceeding with Masahiro's cleanup, which I view as a good

> thing for the health of the Linux kernel codebase.


Except, using the C preprocessor for this turns the arm32 code into
yuck:

1. We'd need to turn get_domain() and set_domain() into multi-line
   preprocessor macro definitions, using the GCC ({ }) extension
   so that get_domain() can return a value.

2. uaccess_save_and_enable() and uaccess_restore() also need to
   become preprocessor macro definitions too.

So, we end up with multiple levels of nested preprocessor macros.
When something goes wrong, the compiler warning/error message is
going to be utterly _horrid_.

Now, as to whether an __attribute__((always_inline)) can or can not
be inlined...

`always_inline'
     Generally, functions are not inlined unless optimization is
     specified.  For functions declared inline, this attribute inlines
     the function even if no optimization level is specified.

Is this another instance of the compiler folk changing the rules of
already documented semantics?  This says nothing about "might not be
inlined if someone passes some random combination of -f flags".

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
Nick Desaulniers Oct. 1, 2019, 6 p.m. UTC | #25
On Tue, Oct 1, 2019 at 10:55 AM Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>

> On Tue, Oct 01, 2019 at 10:44:43AM -0700, Nick Desaulniers wrote:

> > I apologize; I don't mean to be difficult.  I would just like to avoid

> > surprises when code written with the assumption that it will be

> > inlined is not.  It sounds like we found one issue in arm32 and one in

> > arm64 related to outlining.  If we fix those two cases, I think we're

> > close to proceeding with Masahiro's cleanup, which I view as a good

> > thing for the health of the Linux kernel codebase.

>

> Except, using the C preprocessor for this turns the arm32 code into

> yuck:

>

> 1. We'd need to turn get_domain() and set_domain() into multi-line

>    preprocessor macro definitions, using the GCC ({ }) extension

>    so that get_domain() can return a value.

>

> 2. uaccess_save_and_enable() and uaccess_restore() also need to

>    become preprocessor macro definitions too.

>

> So, we end up with multiple levels of nested preprocessor macros.

> When something goes wrong, the compiler warning/error message is

> going to be utterly _horrid_.


That's why I preferred V1 of Masahiro's patch, that fixed the inline
asm not to make use of caller saved registers before calling a
function that might not be inlined.
-- 
Thanks,
~Nick Desaulniers
Russell King (Oracle) Oct. 1, 2019, 6:14 p.m. UTC | #26
On Tue, Oct 01, 2019 at 11:00:11AM -0700, Nick Desaulniers wrote:
> On Tue, Oct 1, 2019 at 10:55 AM Russell King - ARM Linux admin

> <linux@armlinux.org.uk> wrote:

> >

> > On Tue, Oct 01, 2019 at 10:44:43AM -0700, Nick Desaulniers wrote:

> > > I apologize; I don't mean to be difficult.  I would just like to avoid

> > > surprises when code written with the assumption that it will be

> > > inlined is not.  It sounds like we found one issue in arm32 and one in

> > > arm64 related to outlining.  If we fix those two cases, I think we're

> > > close to proceeding with Masahiro's cleanup, which I view as a good

> > > thing for the health of the Linux kernel codebase.

> >

> > Except, using the C preprocessor for this turns the arm32 code into

> > yuck:

> >

> > 1. We'd need to turn get_domain() and set_domain() into multi-line

> >    preprocessor macro definitions, using the GCC ({ }) extension

> >    so that get_domain() can return a value.

> >

> > 2. uaccess_save_and_enable() and uaccess_restore() also need to

> >    become preprocessor macro definitions too.

> >

> > So, we end up with multiple levels of nested preprocessor macros.

> > When something goes wrong, the compiler warning/error message is

> > going to be utterly _horrid_.

> 

> That's why I preferred V1 of Masahiro's patch, that fixed the inline

> asm not to make use of caller saved registers before calling a

> function that might not be inlined.


... which I objected to based on the fact that this uaccess stuff is
supposed to add protection against the kernel being fooled into
accessing userspace when it shouldn't.  The whole intention there is
that [sg]et_domain(), and uaccess_*() are _always_ inlined as close
as possible to the call site of the accessor touching userspace.

Moving it before the assignments mean that the compiler is then free
to issue memory loads/stores to load up those registers, which is
exactly what we want to avoid.


In any case, I violently disagree with the idea that stuff we have
in header files should be permitted not to be inlined because we
have soo much that is marked inline.  Having it moved out of line,
and essentially the same function code appearing in multiple C files
is really not an improvement over the current situation with excessive
use of inlining.  Anyone who has looked at the code resulting from
dma_map_single() will know exactly what I'm talking about, which is
way in excess of the few instructions we have for the uaccess_* stuff
here.

The right approach is to move stuff out of line - and by that, I
mean _actually_ move the damn code, so that different compilation
units can use the same instructions, and thereby gain from the
whole point of an instruction cache.

The whole "let's make inline not really mean inline" is nothing more
than a band-aid to the overuse (and abuse) of "inline".

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
Nick Desaulniers Oct. 1, 2019, 8:21 p.m. UTC | #27
On Tue, Oct 1, 2019 at 11:14 AM Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>

> On Tue, Oct 01, 2019 at 11:00:11AM -0700, Nick Desaulniers wrote:

> > On Tue, Oct 1, 2019 at 10:55 AM Russell King - ARM Linux admin

> > <linux@armlinux.org.uk> wrote:

> > >

> > > On Tue, Oct 01, 2019 at 10:44:43AM -0700, Nick Desaulniers wrote:

> > > > I apologize; I don't mean to be difficult.  I would just like to avoid

> > > > surprises when code written with the assumption that it will be

> > > > inlined is not.  It sounds like we found one issue in arm32 and one in

> > > > arm64 related to outlining.  If we fix those two cases, I think we're

> > > > close to proceeding with Masahiro's cleanup, which I view as a good

> > > > thing for the health of the Linux kernel codebase.

> > >

> > > Except, using the C preprocessor for this turns the arm32 code into

> > > yuck:

> > >

> > > 1. We'd need to turn get_domain() and set_domain() into multi-line

> > >    preprocessor macro definitions, using the GCC ({ }) extension

> > >    so that get_domain() can return a value.

> > >

> > > 2. uaccess_save_and_enable() and uaccess_restore() also need to

> > >    become preprocessor macro definitions too.

> > >

> > > So, we end up with multiple levels of nested preprocessor macros.

> > > When something goes wrong, the compiler warning/error message is

> > > going to be utterly _horrid_.

> >

> > That's why I preferred V1 of Masahiro's patch, that fixed the inline

> > asm not to make use of caller saved registers before calling a

> > function that might not be inlined.

>

> ... which I objected to based on the fact that this uaccess stuff is

> supposed to add protection against the kernel being fooled into

> accessing userspace when it shouldn't.  The whole intention there is

> that [sg]et_domain(), and uaccess_*() are _always_ inlined as close

> as possible to the call site of the accessor touching userspace.


Then use the C preprocessor to force the inlining.  I'm sorry it's not
as pretty as static inline functions.

>

> Moving it before the assignments mean that the compiler is then free

> to issue memory loads/stores to load up those registers, which is

> exactly what we want to avoid.

>

>

> In any case, I violently disagree with the idea that stuff we have

> in header files should be permitted not to be inlined because we

> have soo much that is marked inline.


So there's a very important subtly here.  There's:
1. code that adds `inline` cause "oh maybe it would be nice to inline
this, but if it isn't no big deal"
2. code that if not inlined is somehow not correct.
3. avoid ODR violations via `static inline`

I'll posit that "we have soo much that is marked inline [is
predominantly case 1 or 3, not case 2]."  Case 2 is a code smell, and
requires extra scrutiny.

> Having it moved out of line,

> and essentially the same function code appearing in multiple C files

> is really not an improvement over the current situation with excessive

> use of inlining.  Anyone who has looked at the code resulting from

> dma_map_single() will know exactly what I'm talking about, which is

> way in excess of the few instructions we have for the uaccess_* stuff

> here.

>

> The right approach is to move stuff out of line - and by that, I

> mean _actually_ move the damn code, so that different compilation

> units can use the same instructions, and thereby gain from the

> whole point of an instruction cache.


And be marked __attribute__((noinline)), otherwise might be inlined via LTO.

>

> The whole "let's make inline not really mean inline" is nothing more

> than a band-aid to the overuse (and abuse) of "inline".


Let's triple check the ISO C11 draft spec just to be sure:
§ 6.7.4.6: A function declared with an inline function specifier is an
inline function. Making a
function an inline function suggests that calls to the function be as
fast as possible.
The extent to which such suggestions are effective is
implementation-defined. 139)
139) For example, an implementation might never perform inline
substitution, or might only perform inline
substitutions to calls in the scope of an inline declaration.
§ J.3.8 [Undefined Behavior] Hints: The extent to which suggestions
made by using the inline function specifier are effective (6.7.4).

My translation:
"Please don't assume inline means anything."

For the unspecified GNU C extension __attribute__((always_inline)), it
seems to me like it's meant more for performing inlining (an
optimization) at -O0.  Whether the compiler warns or not seems like a
nice side effect, but provides no strong guarantee otherwise.

I'm sorry that so much code may have been written with that
assumption, and I'm sorry to be the bearer of bad news, but this isn't
a recent change.  If code was written under false assumptions, it
should be rewritten. Sorry.
-- 
Thanks,
~Nick Desaulniers
Arnd Bergmann Oct. 1, 2019, 8:53 p.m. UTC | #28
On Tue, Oct 1, 2019 at 10:21 PM 'Nick Desaulniers' via Clang Built
Linux <clang-built-linux@googlegroups.com> wrote:
> On Tue, Oct 1, 2019 at 11:14 AM Russell King - ARM Linux admin

> <linux@armlinux.org.uk> wrote:

> >

> > On Tue, Oct 01, 2019 at 11:00:11AM -0700, Nick Desaulniers wrote:

> > > On Tue, Oct 1, 2019 at 10:55 AM Russell King - ARM Linux admin

> > > <linux@armlinux.org.uk> wrote:

> > In any case, I violently disagree with the idea that stuff we have

> > in header files should be permitted not to be inlined because we

> > have soo much that is marked inline.

>

> So there's a very important subtly here.  There's:

> 1. code that adds `inline` cause "oh maybe it would be nice to inline

> this, but if it isn't no big deal"

> 2. code that if not inlined is somehow not correct.

> 3. avoid ODR violations via `static inline`

>

> I'll posit that "we have soo much that is marked inline [is

> predominantly case 1 or 3, not case 2]."  Case 2 is a code smell, and

> requires extra scrutiny.


1. is clearly the most common case, but there is also

4. Some compiler version (possibly long gone, possibly still current)
makes bad inlining decisions that result in horrible but functionally
correct object code for a particular function, and forcing a function to
be inlined results in what we had expected the compiler to do already.

The problem with 2. is that they are largely not documented, and often
unintentional. This also affects functions that are simply 'static'
rather than 'static inline' but are broken when not being inlined.
E.g. we've had a number of kernel bugs with 'static __init' functions
that gcc always inlined into a non-__init caller, but that produced an
invalid call into a discarded function when clang decided to leave
them out of line.

      Arnd
Russell King (Oracle) Oct. 1, 2019, 8:59 p.m. UTC | #29
On Tue, Oct 01, 2019 at 01:21:44PM -0700, Nick Desaulniers wrote:
> On Tue, Oct 1, 2019 at 11:14 AM Russell King - ARM Linux admin

> <linux@armlinux.org.uk> wrote:

> >

> > On Tue, Oct 01, 2019 at 11:00:11AM -0700, Nick Desaulniers wrote:

> > > On Tue, Oct 1, 2019 at 10:55 AM Russell King - ARM Linux admin

> > > <linux@armlinux.org.uk> wrote:

> > > >

> > > > On Tue, Oct 01, 2019 at 10:44:43AM -0700, Nick Desaulniers wrote:

> > > > > I apologize; I don't mean to be difficult.  I would just like to avoid

> > > > > surprises when code written with the assumption that it will be

> > > > > inlined is not.  It sounds like we found one issue in arm32 and one in

> > > > > arm64 related to outlining.  If we fix those two cases, I think we're

> > > > > close to proceeding with Masahiro's cleanup, which I view as a good

> > > > > thing for the health of the Linux kernel codebase.

> > > >

> > > > Except, using the C preprocessor for this turns the arm32 code into

> > > > yuck:

> > > >

> > > > 1. We'd need to turn get_domain() and set_domain() into multi-line

> > > >    preprocessor macro definitions, using the GCC ({ }) extension

> > > >    so that get_domain() can return a value.

> > > >

> > > > 2. uaccess_save_and_enable() and uaccess_restore() also need to

> > > >    become preprocessor macro definitions too.

> > > >

> > > > So, we end up with multiple levels of nested preprocessor macros.

> > > > When something goes wrong, the compiler warning/error message is

> > > > going to be utterly _horrid_.

> > >

> > > That's why I preferred V1 of Masahiro's patch, that fixed the inline

> > > asm not to make use of caller saved registers before calling a

> > > function that might not be inlined.

> >

> > ... which I objected to based on the fact that this uaccess stuff is

> > supposed to add protection against the kernel being fooled into

> > accessing userspace when it shouldn't.  The whole intention there is

> > that [sg]et_domain(), and uaccess_*() are _always_ inlined as close

> > as possible to the call site of the accessor touching userspace.

> 

> Then use the C preprocessor to force the inlining.  I'm sorry it's not

> as pretty as static inline functions.

> 

> >

> > Moving it before the assignments mean that the compiler is then free

> > to issue memory loads/stores to load up those registers, which is

> > exactly what we want to avoid.

> >

> >

> > In any case, I violently disagree with the idea that stuff we have

> > in header files should be permitted not to be inlined because we

> > have soo much that is marked inline.

> 

> So there's a very important subtly here.  There's:

> 1. code that adds `inline` cause "oh maybe it would be nice to inline

> this, but if it isn't no big deal"

> 2. code that if not inlined is somehow not correct.

> 3. avoid ODR violations via `static inline`

> 

> I'll posit that "we have soo much that is marked inline [is

> predominantly case 1 or 3, not case 2]."  Case 2 is a code smell, and

> requires extra scrutiny.

> 

> > Having it moved out of line,

> > and essentially the same function code appearing in multiple C files

> > is really not an improvement over the current situation with excessive

> > use of inlining.  Anyone who has looked at the code resulting from

> > dma_map_single() will know exactly what I'm talking about, which is

> > way in excess of the few instructions we have for the uaccess_* stuff

> > here.

> >

> > The right approach is to move stuff out of line - and by that, I

> > mean _actually_ move the damn code, so that different compilation

> > units can use the same instructions, and thereby gain from the

> > whole point of an instruction cache.

> 

> And be marked __attribute__((noinline)), otherwise might be inlined via LTO.

> 

> >

> > The whole "let's make inline not really mean inline" is nothing more

> > than a band-aid to the overuse (and abuse) of "inline".

> 

> Let's triple check the ISO C11 draft spec just to be sure:

> § 6.7.4.6: A function declared with an inline function specifier is an

> inline function. Making a

> function an inline function suggests that calls to the function be as

> fast as possible.

> The extent to which such suggestions are effective is

> implementation-defined. 139)

> 139) For example, an implementation might never perform inline

> substitution, or might only perform inline

> substitutions to calls in the scope of an inline declaration.

> § J.3.8 [Undefined Behavior] Hints: The extent to which suggestions

> made by using the inline function specifier are effective (6.7.4).

> 

> My translation:

> "Please don't assume inline means anything."

> 

> For the unspecified GNU C extension __attribute__((always_inline)), it

> seems to me like it's meant more for performing inlining (an

> optimization) at -O0.  Whether the compiler warns or not seems like a

> nice side effect, but provides no strong guarantee otherwise.

> 

> I'm sorry that so much code may have been written with that

> assumption, and I'm sorry to be the bearer of bad news, but this isn't

> a recent change.  If code was written under false assumptions, it

> should be rewritten. Sorry.


You may quote C11, but that is not relevent.  The kernel is coded to
gnu89 standard - see the -std=gnu89 flag.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
Miguel Ojeda Oct. 1, 2019, 9:06 p.m. UTC | #30
On Tue, Oct 1, 2019 at 10:53 PM Arnd Bergmann <arnd@arndb.de> wrote:
>

> 1. is clearly the most common case, but there is also

>

> 4. Some compiler version (possibly long gone, possibly still current)

> makes bad inlining decisions that result in horrible but functionally

> correct object code for a particular function, and forcing a function to

> be inlined results in what we had expected the compiler to do already.


There is also 5. code that does not even compile without it, e.g.
_static_cpu_has() in x86_64 which requires
__attribute__((always_inline)), at least on GCC 9.2.

For x64_64 it is the only one case I found, though. If you disable
__always_inline everything else compiles and links (in a defconfig).

Cheers,
Miguel
Nick Desaulniers Oct. 1, 2019, 9:14 p.m. UTC | #31
On Tue, Oct 1, 2019 at 2:06 PM Miguel Ojeda
<miguel.ojeda.sandonis@gmail.com> wrote:
>

> On Tue, Oct 1, 2019 at 10:53 PM Arnd Bergmann <arnd@arndb.de> wrote:

> >

> > 1. is clearly the most common case, but there is also

> >

> > 4. Some compiler version (possibly long gone, possibly still current)

> > makes bad inlining decisions that result in horrible but functionally

> > correct object code for a particular function, and forcing a function to

> > be inlined results in what we had expected the compiler to do already.

>

> There is also 5. code that does not even compile without it, e.g.

> _static_cpu_has() in x86_64 which requires

> __attribute__((always_inline)), at least on GCC 9.2.


I assert that's just another case of 2, and should be investigated. (I
think I remember that from when I had to teach LLVM how to inline asm
goto; since the compiler can reject inlining if it doesn't know how to
do such a transform).

>

> For x64_64 it is the only one case I found, though. If you disable

> __always_inline everything else compiles and links (in a defconfig).


Cool, so one bug in arm32, one bug in arm64, one bug in x86_64.
Doesn't sound like too much work to fix.
-- 
Thanks,
~Nick Desaulniers
Russell King (Oracle) Oct. 1, 2019, 9:26 p.m. UTC | #32
On Tue, Oct 01, 2019 at 09:59:38PM +0100, Russell King - ARM Linux admin wrote:
> On Tue, Oct 01, 2019 at 01:21:44PM -0700, Nick Desaulniers wrote:

> > On Tue, Oct 1, 2019 at 11:14 AM Russell King - ARM Linux admin

> > <linux@armlinux.org.uk> wrote:

> > >

> > > On Tue, Oct 01, 2019 at 11:00:11AM -0700, Nick Desaulniers wrote:

> > > > On Tue, Oct 1, 2019 at 10:55 AM Russell King - ARM Linux admin

> > > > <linux@armlinux.org.uk> wrote:

> > > > >

> > > > > On Tue, Oct 01, 2019 at 10:44:43AM -0700, Nick Desaulniers wrote:

> > > > > > I apologize; I don't mean to be difficult.  I would just like to avoid

> > > > > > surprises when code written with the assumption that it will be

> > > > > > inlined is not.  It sounds like we found one issue in arm32 and one in

> > > > > > arm64 related to outlining.  If we fix those two cases, I think we're

> > > > > > close to proceeding with Masahiro's cleanup, which I view as a good

> > > > > > thing for the health of the Linux kernel codebase.

> > > > >

> > > > > Except, using the C preprocessor for this turns the arm32 code into

> > > > > yuck:

> > > > >

> > > > > 1. We'd need to turn get_domain() and set_domain() into multi-line

> > > > >    preprocessor macro definitions, using the GCC ({ }) extension

> > > > >    so that get_domain() can return a value.

> > > > >

> > > > > 2. uaccess_save_and_enable() and uaccess_restore() also need to

> > > > >    become preprocessor macro definitions too.

> > > > >

> > > > > So, we end up with multiple levels of nested preprocessor macros.

> > > > > When something goes wrong, the compiler warning/error message is

> > > > > going to be utterly _horrid_.

> > > >

> > > > That's why I preferred V1 of Masahiro's patch, that fixed the inline

> > > > asm not to make use of caller saved registers before calling a

> > > > function that might not be inlined.

> > >

> > > ... which I objected to based on the fact that this uaccess stuff is

> > > supposed to add protection against the kernel being fooled into

> > > accessing userspace when it shouldn't.  The whole intention there is

> > > that [sg]et_domain(), and uaccess_*() are _always_ inlined as close

> > > as possible to the call site of the accessor touching userspace.

> > 

> > Then use the C preprocessor to force the inlining.  I'm sorry it's not

> > as pretty as static inline functions.

> > 

> > >

> > > Moving it before the assignments mean that the compiler is then free

> > > to issue memory loads/stores to load up those registers, which is

> > > exactly what we want to avoid.

> > >

> > >

> > > In any case, I violently disagree with the idea that stuff we have

> > > in header files should be permitted not to be inlined because we

> > > have soo much that is marked inline.

> > 

> > So there's a very important subtly here.  There's:

> > 1. code that adds `inline` cause "oh maybe it would be nice to inline

> > this, but if it isn't no big deal"

> > 2. code that if not inlined is somehow not correct.

> > 3. avoid ODR violations via `static inline`

> > 

> > I'll posit that "we have soo much that is marked inline [is

> > predominantly case 1 or 3, not case 2]."  Case 2 is a code smell, and

> > requires extra scrutiny.

> > 

> > > Having it moved out of line,

> > > and essentially the same function code appearing in multiple C files

> > > is really not an improvement over the current situation with excessive

> > > use of inlining.  Anyone who has looked at the code resulting from

> > > dma_map_single() will know exactly what I'm talking about, which is

> > > way in excess of the few instructions we have for the uaccess_* stuff

> > > here.

> > >

> > > The right approach is to move stuff out of line - and by that, I

> > > mean _actually_ move the damn code, so that different compilation

> > > units can use the same instructions, and thereby gain from the

> > > whole point of an instruction cache.

> > 

> > And be marked __attribute__((noinline)), otherwise might be inlined via LTO.

> > 

> > >

> > > The whole "let's make inline not really mean inline" is nothing more

> > > than a band-aid to the overuse (and abuse) of "inline".

> > 

> > Let's triple check the ISO C11 draft spec just to be sure:

> > § 6.7.4.6: A function declared with an inline function specifier is an

> > inline function. Making a

> > function an inline function suggests that calls to the function be as

> > fast as possible.

> > The extent to which such suggestions are effective is

> > implementation-defined. 139)

> > 139) For example, an implementation might never perform inline

> > substitution, or might only perform inline

> > substitutions to calls in the scope of an inline declaration.

> > § J.3.8 [Undefined Behavior] Hints: The extent to which suggestions

> > made by using the inline function specifier are effective (6.7.4).

> > 

> > My translation:

> > "Please don't assume inline means anything."

> > 

> > For the unspecified GNU C extension __attribute__((always_inline)), it

> > seems to me like it's meant more for performing inlining (an

> > optimization) at -O0.  Whether the compiler warns or not seems like a

> > nice side effect, but provides no strong guarantee otherwise.

> > 

> > I'm sorry that so much code may have been written with that

> > assumption, and I'm sorry to be the bearer of bad news, but this isn't

> > a recent change.  If code was written under false assumptions, it

> > should be rewritten. Sorry.

> 

> You may quote C11, but that is not relevent.  The kernel is coded to

> gnu89 standard - see the -std=gnu89 flag.


There's more to this and why C11 is entirely irrelevant.  The "inline"
you see in our headers is not the compiler keyword that you find in
various C standards, it is a macro that gets expanded to either:

#define inline inline __attribute__((__always_inline__)) __gnu_inline \
        __maybe_unused notrace

or

#define inline inline                                    __gnu_inline \
        __maybe_unused notrace

__gnu_inline is defined as:

#define __gnu_inline                    __attribute__((__gnu_inline__))

So this attaches the gnu_inline attribute to the function:

`gnu_inline'
     This attribute should be used with a function that is also declared
     with the `inline' keyword.  It directs GCC to treat the function
     as if it were defined in gnu90 mode even when compiling in C99 or
     gnu99 mode.
...
     Since ISO C99 specifies a different semantics for `inline', this
     function attribute is provided as a transition measure and as a
     useful feature in its own right.  This attribute is available in
     GCC 4.1.3 and later.  It is available if either of the
     preprocessor macros `__GNUC_GNU_INLINE__' or
     `__GNUC_STDC_INLINE__' are defined.  *Note An Inline Function is
     As Fast As a Macro: Inline.

which is quite clear that C99 semantics do not apply to _this_ inline.
The manual goes on to explain:

 GCC implements three different semantics of declaring a function
inline.  One is available with `-std=gnu89' or `-fgnu89-inline' or when
`gnu_inline' attribute is present on all inline declarations, another
when `-std=c99', `-std=c11', `-std=gnu99' or `-std=gnu11' (without
`-fgnu89-inline'), and the third is used when compiling C++.

I'd suggest gnu90 mode is pretty similar to gnu89 mode, and as we build
the kernel in gnu89 mode, that is the inlining definition that is
appropriate.

When it comes to __always_inline, the GCC manual is the definitive
reference, since we use the GCC attribute for that:

#define __always_inline                 inline __attribute__((__always_inline__))

and I've already quoted what the GCC manual says for always_inline.

Arguing about what the C11 spec says about inlining when we aren't
using C11 dialect in the kernel, but are using GCC features, does
not move the discussion on.

Thanks anyway, maybe it will become relevent in the future if we
decide to move to C11.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
Nick Desaulniers Oct. 1, 2019, 9:32 p.m. UTC | #33
On Tue, Oct 1, 2019 at 2:26 PM Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>

> On Tue, Oct 01, 2019 at 09:59:38PM +0100, Russell King - ARM Linux admin wrote:

> > On Tue, Oct 01, 2019 at 01:21:44PM -0700, Nick Desaulniers wrote:

> > > On Tue, Oct 1, 2019 at 11:14 AM Russell King - ARM Linux admin

> > > <linux@armlinux.org.uk> wrote:

> > > >

> > > > The whole "let's make inline not really mean inline" is nothing more

> > > > than a band-aid to the overuse (and abuse) of "inline".

> > >

> > > Let's triple check the ISO C11 draft spec just to be sure:

> > > § 6.7.4.6: A function declared with an inline function specifier is an

> > > inline function. Making a

> > > function an inline function suggests that calls to the function be as

> > > fast as possible.

> > > The extent to which such suggestions are effective is

> > > implementation-defined. 139)

> > > 139) For example, an implementation might never perform inline

> > > substitution, or might only perform inline

> > > substitutions to calls in the scope of an inline declaration.

> > > § J.3.8 [Undefined Behavior] Hints: The extent to which suggestions

> > > made by using the inline function specifier are effective (6.7.4).

> > >

> > > My translation:

> > > "Please don't assume inline means anything."

> > >

> > > For the unspecified GNU C extension __attribute__((always_inline)), it

> > > seems to me like it's meant more for performing inlining (an

> > > optimization) at -O0.  Whether the compiler warns or not seems like a

> > > nice side effect, but provides no strong guarantee otherwise.

> > >

> > > I'm sorry that so much code may have been written with that

> > > assumption, and I'm sorry to be the bearer of bad news, but this isn't

> > > a recent change.  If code was written under false assumptions, it

> > > should be rewritten. Sorry.

> >

> > You may quote C11, but that is not relevent.  The kernel is coded to

> > gnu89 standard - see the -std=gnu89 flag.

>

> There's more to this and why C11 is entirely irrelevant.  The "inline"

> you see in our headers is not the compiler keyword that you find in

> various C standards, it is a macro that gets expanded to either:

>

> #define inline inline __attribute__((__always_inline__)) __gnu_inline \

>         __maybe_unused notrace

>

> or

>

> #define inline inline                                    __gnu_inline \

>         __maybe_unused notrace

>

> __gnu_inline is defined as:

>

> #define __gnu_inline                    __attribute__((__gnu_inline__))

>

> So this attaches the gnu_inline attribute to the function:

>

> `gnu_inline'

>      This attribute should be used with a function that is also declared

>      with the `inline' keyword.  It directs GCC to treat the function

>      as if it were defined in gnu90 mode even when compiling in C99 or

>      gnu99 mode.

> ...

>      Since ISO C99 specifies a different semantics for `inline', this

>      function attribute is provided as a transition measure and as a

>      useful feature in its own right.  This attribute is available in

>      GCC 4.1.3 and later.  It is available if either of the

>      preprocessor macros `__GNUC_GNU_INLINE__' or

>      `__GNUC_STDC_INLINE__' are defined.  *Note An Inline Function is

>      As Fast As a Macro: Inline.

>

> which is quite clear that C99 semantics do not apply to _this_ inline.

> The manual goes on to explain:

>

>  GCC implements three different semantics of declaring a function

> inline.  One is available with `-std=gnu89' or `-fgnu89-inline' or when

> `gnu_inline' attribute is present on all inline declarations, another

> when `-std=c99', `-std=c11', `-std=gnu99' or `-std=gnu11' (without

> `-fgnu89-inline'), and the third is used when compiling C++.


(I wrote the kernel patch for gnu_inline; it only comes into play when
`inline` appears on a function *also defined as `extern`*).

>

> I'd suggest gnu90 mode is pretty similar to gnu89 mode, and as we build

> the kernel in gnu89 mode, that is the inlining definition that is

> appropriate.

>

> When it comes to __always_inline, the GCC manual is the definitive

> reference, since we use the GCC attribute for that:

>

> #define __always_inline                 inline __attribute__((__always_inline__))

>

> and I've already quoted what the GCC manual says for always_inline.

>

> Arguing about what the C11 spec says about inlining when we aren't

> using C11 dialect in the kernel, but are using GCC features, does

> not move the discussion on.

>

> Thanks anyway, maybe it will become relevent in the future if we

> decide to move to C11.


It's not like the semantics of inline are better specified by an older
standard, or changed (The only real semantic change involving `inline`
between ISO C90 and ISO C99 has to do with whether `extern inline`
emits the function with external linkage as you noted).  But that's
irrelevant to the discussion.).  I quoted C11 because ctrl+f doesn't
work for the C90 ISO spec pdf.  C90 spec doesn't even have a section
on Function Specifiers.  From what I can tell, `inline` wasn't
specified until ISO C99.

GNU modes are often modifiers off of ISO C bases; gnu89 corresponds to
ISO C90.  They may permit the use of features from newer ISO C specs
and GNU C extensions without warning under -Wpedantic.  There aren't a
whole lot of semantic differences, at least that I'm aware of.

Please don't assume inline means anything.
-- 
Thanks,
~Nick Desaulniers
Russell King (Oracle) Oct. 1, 2019, 9:58 p.m. UTC | #34
On Tue, Oct 01, 2019 at 02:32:54PM -0700, Nick Desaulniers wrote:
> On Tue, Oct 1, 2019 at 2:26 PM Russell King - ARM Linux admin

> <linux@armlinux.org.uk> wrote:

> >

> > On Tue, Oct 01, 2019 at 09:59:38PM +0100, Russell King - ARM Linux admin wrote:

> > > On Tue, Oct 01, 2019 at 01:21:44PM -0700, Nick Desaulniers wrote:

> > > > On Tue, Oct 1, 2019 at 11:14 AM Russell King - ARM Linux admin

> > > > <linux@armlinux.org.uk> wrote:

> > > > >

> > > > > The whole "let's make inline not really mean inline" is nothing more

> > > > > than a band-aid to the overuse (and abuse) of "inline".

> > > >

> > > > Let's triple check the ISO C11 draft spec just to be sure:

> > > > § 6.7.4.6: A function declared with an inline function specifier is an

> > > > inline function. Making a

> > > > function an inline function suggests that calls to the function be as

> > > > fast as possible.

> > > > The extent to which such suggestions are effective is

> > > > implementation-defined. 139)

> > > > 139) For example, an implementation might never perform inline

> > > > substitution, or might only perform inline

> > > > substitutions to calls in the scope of an inline declaration.

> > > > § J.3.8 [Undefined Behavior] Hints: The extent to which suggestions

> > > > made by using the inline function specifier are effective (6.7.4).

> > > >

> > > > My translation:

> > > > "Please don't assume inline means anything."

> > > >

> > > > For the unspecified GNU C extension __attribute__((always_inline)), it

> > > > seems to me like it's meant more for performing inlining (an

> > > > optimization) at -O0.  Whether the compiler warns or not seems like a

> > > > nice side effect, but provides no strong guarantee otherwise.

> > > >

> > > > I'm sorry that so much code may have been written with that

> > > > assumption, and I'm sorry to be the bearer of bad news, but this isn't

> > > > a recent change.  If code was written under false assumptions, it

> > > > should be rewritten. Sorry.

> > >

> > > You may quote C11, but that is not relevent.  The kernel is coded to

> > > gnu89 standard - see the -std=gnu89 flag.

> >

> > There's more to this and why C11 is entirely irrelevant.  The "inline"

> > you see in our headers is not the compiler keyword that you find in

> > various C standards, it is a macro that gets expanded to either:

> >

> > #define inline inline __attribute__((__always_inline__)) __gnu_inline \

> >         __maybe_unused notrace

> >

> > or

> >

> > #define inline inline                                    __gnu_inline \

> >         __maybe_unused notrace

> >

> > __gnu_inline is defined as:

> >

> > #define __gnu_inline                    __attribute__((__gnu_inline__))

> >

> > So this attaches the gnu_inline attribute to the function:

> >

> > `gnu_inline'

> >      This attribute should be used with a function that is also declared

> >      with the `inline' keyword.  It directs GCC to treat the function

> >      as if it were defined in gnu90 mode even when compiling in C99 or

> >      gnu99 mode.

> > ...

> >      Since ISO C99 specifies a different semantics for `inline', this

> >      function attribute is provided as a transition measure and as a

> >      useful feature in its own right.  This attribute is available in

> >      GCC 4.1.3 and later.  It is available if either of the

> >      preprocessor macros `__GNUC_GNU_INLINE__' or

> >      `__GNUC_STDC_INLINE__' are defined.  *Note An Inline Function is

> >      As Fast As a Macro: Inline.

> >

> > which is quite clear that C99 semantics do not apply to _this_ inline.

> > The manual goes on to explain:

> >

> >  GCC implements three different semantics of declaring a function

> > inline.  One is available with `-std=gnu89' or `-fgnu89-inline' or when

> > `gnu_inline' attribute is present on all inline declarations, another

> > when `-std=c99', `-std=c11', `-std=gnu99' or `-std=gnu11' (without

> > `-fgnu89-inline'), and the third is used when compiling C++.

> 

> (I wrote the kernel patch for gnu_inline; it only comes into play when

> `inline` appears on a function *also defined as `extern`*).


From what I can tell reading the GCC manual, the patch adding
gnu_inline should have no effect.  Maybe it was written before
-std=gnu89 was in use by the kernel makefiles?

> > I'd suggest gnu90 mode is pretty similar to gnu89 mode, and as we build

> > the kernel in gnu89 mode, that is the inlining definition that is

> > appropriate.

> >

> > When it comes to __always_inline, the GCC manual is the definitive

> > reference, since we use the GCC attribute for that:

> >

> > #define __always_inline                 inline __attribute__((__always_inline__))

> >

> > and I've already quoted what the GCC manual says for always_inline.

> >

> > Arguing about what the C11 spec says about inlining when we aren't

> > using C11 dialect in the kernel, but are using GCC features, does

> > not move the discussion on.

> >

> > Thanks anyway, maybe it will become relevent in the future if we

> > decide to move to C11.

> 

> It's not like the semantics of inline are better specified by an older

> standard, or changed (The only real semantic change involving `inline`

> between ISO C90 and ISO C99 has to do with whether `extern inline`

> emits the function with external linkage as you noted).  But that's

> irrelevant to the discussion.).  I quoted C11 because ctrl+f doesn't

> work for the C90 ISO spec pdf.  C90 spec doesn't even have a section

> on Function Specifiers.  From what I can tell, `inline` wasn't

> specified until ISO C99.

> 

> GNU modes are often modifiers off of ISO C bases; gnu89 corresponds to

> ISO C90.  They may permit the use of features from newer ISO C specs

> and GNU C extensions without warning under -Wpedantic.  There aren't a

> whole lot of semantic differences, at least that I'm aware of.


Right, so GCC had inlining support before ISO C added it (which I
distinctly remember, I've been involved in Linux since 1994.)

Unless ISO C based their definition in some way off GCC's
implementation, I still don't see how quoting ISO C documents
helps this discussion when it's the GCC feature that we're using.

And none of this is relevent anyway if we use the GCC
always_inline extension *which is obviously the right way to
resolve this instance*.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
Geert Uytterhoeven Oct. 2, 2019, 12:55 p.m. UTC | #35
Hi Nick,

On Wed, Oct 2, 2019 at 6:33 AM Nick Desaulniers <ndesaulniers@google.com> wrote:
> On Tue, Oct 1, 2019 at 11:14 AM Russell King - ARM Linux admin

> <linux@armlinux.org.uk> wrote:

> > On Tue, Oct 01, 2019 at 11:00:11AM -0700, Nick Desaulniers wrote:

> > > On Tue, Oct 1, 2019 at 10:55 AM Russell King - ARM Linux admin

> > > <linux@armlinux.org.uk> wrote:

> > > > On Tue, Oct 01, 2019 at 10:44:43AM -0700, Nick Desaulniers wrote:

> > > > > I apologize; I don't mean to be difficult.  I would just like to avoid

> > > > > surprises when code written with the assumption that it will be

> > > > > inlined is not.  It sounds like we found one issue in arm32 and one in

> > > > > arm64 related to outlining.  If we fix those two cases, I think we're

> > > > > close to proceeding with Masahiro's cleanup, which I view as a good

> > > > > thing for the health of the Linux kernel codebase.

> > > >

> > > > Except, using the C preprocessor for this turns the arm32 code into

> > > > yuck:

> > > >

> > > > 1. We'd need to turn get_domain() and set_domain() into multi-line

> > > >    preprocessor macro definitions, using the GCC ({ }) extension

> > > >    so that get_domain() can return a value.

> > > >

> > > > 2. uaccess_save_and_enable() and uaccess_restore() also need to

> > > >    become preprocessor macro definitions too.

> > > >

> > > > So, we end up with multiple levels of nested preprocessor macros.

> > > > When something goes wrong, the compiler warning/error message is

> > > > going to be utterly _horrid_.

> > >

> > > That's why I preferred V1 of Masahiro's patch, that fixed the inline

> > > asm not to make use of caller saved registers before calling a

> > > function that might not be inlined.

> >

> > ... which I objected to based on the fact that this uaccess stuff is

> > supposed to add protection against the kernel being fooled into

> > accessing userspace when it shouldn't.  The whole intention there is

> > that [sg]et_domain(), and uaccess_*() are _always_ inlined as close

> > as possible to the call site of the accessor touching userspace.

>

> Then use the C preprocessor to force the inlining.  I'm sorry it's not

> as pretty as static inline functions.


Which makes us lose the baby^H^H^H^Htype checking performed
on function parameters, requiring to add more ugly checks.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Will Deacon Oct. 2, 2019, 6:51 p.m. UTC | #36
On Wed, Oct 02, 2019 at 02:55:50PM +0200, Geert Uytterhoeven wrote:
> On Wed, Oct 2, 2019 at 6:33 AM Nick Desaulniers <ndesaulniers@google.com> wrote:

> > On Tue, Oct 1, 2019 at 11:14 AM Russell King - ARM Linux admin

> > <linux@armlinux.org.uk> wrote:

> > > On Tue, Oct 01, 2019 at 11:00:11AM -0700, Nick Desaulniers wrote:

> > > > On Tue, Oct 1, 2019 at 10:55 AM Russell King - ARM Linux admin

> > > > <linux@armlinux.org.uk> wrote:

> > > > > On Tue, Oct 01, 2019 at 10:44:43AM -0700, Nick Desaulniers wrote:

> > > > > > I apologize; I don't mean to be difficult.  I would just like to avoid

> > > > > > surprises when code written with the assumption that it will be

> > > > > > inlined is not.  It sounds like we found one issue in arm32 and one in

> > > > > > arm64 related to outlining.  If we fix those two cases, I think we're

> > > > > > close to proceeding with Masahiro's cleanup, which I view as a good

> > > > > > thing for the health of the Linux kernel codebase.

> > > > >

> > > > > Except, using the C preprocessor for this turns the arm32 code into

> > > > > yuck:

> > > > >

> > > > > 1. We'd need to turn get_domain() and set_domain() into multi-line

> > > > >    preprocessor macro definitions, using the GCC ({ }) extension

> > > > >    so that get_domain() can return a value.

> > > > >

> > > > > 2. uaccess_save_and_enable() and uaccess_restore() also need to

> > > > >    become preprocessor macro definitions too.

> > > > >

> > > > > So, we end up with multiple levels of nested preprocessor macros.

> > > > > When something goes wrong, the compiler warning/error message is

> > > > > going to be utterly _horrid_.

> > > >

> > > > That's why I preferred V1 of Masahiro's patch, that fixed the inline

> > > > asm not to make use of caller saved registers before calling a

> > > > function that might not be inlined.

> > >

> > > ... which I objected to based on the fact that this uaccess stuff is

> > > supposed to add protection against the kernel being fooled into

> > > accessing userspace when it shouldn't.  The whole intention there is

> > > that [sg]et_domain(), and uaccess_*() are _always_ inlined as close

> > > as possible to the call site of the accessor touching userspace.

> >

> > Then use the C preprocessor to force the inlining.  I'm sorry it's not

> > as pretty as static inline functions.

> 

> Which makes us lose the baby^H^H^H^Htype checking performed

> on function parameters, requiring to add more ugly checks.


Indeed, and the resulting mess is (at least in my opinion) considerably
worse than where we were in 5.3 and earlier kernels with 'inline' defined
as '__always_inline'.

Will
Linus Torvalds Oct. 2, 2019, 8:39 p.m. UTC | #37
On Wed, Oct 2, 2019 at 5:56 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>

> >

> > Then use the C preprocessor to force the inlining.  I'm sorry it's not

> > as pretty as static inline functions.

>

> Which makes us lose the baby^H^H^H^Htype checking performed

> on function parameters, requiring to add more ugly checks.


I'm 100% agreed on this.

If the inline change is being pushed by people who say "you should
have used macros instead if you wanted inlining", then I will just
revert that stupid commit that is causing problems.

No, the preprocessor is not the answer.

That said, code that relies on inlining for _correctness_ should use
"__always_inline" and possibly even have a comment about why.

But I am considering just undoing commit 9012d011660e ("compiler:
allow all arches to enable CONFIG_OPTIMIZE_INLINING") entirely. The
advantages are questionable, and when the advantages are balanced
against actual regressions and the arguments are "use macros", that
just shows how badly thought out this was.

                Linus
Masahiro Yamada Oct. 3, 2019, 2:10 a.m. UTC | #38
On Thu, Oct 3, 2019 at 5:46 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>

> On Wed, Oct 2, 2019 at 5:56 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:

> >

> > >

> > > Then use the C preprocessor to force the inlining.  I'm sorry it's not

> > > as pretty as static inline functions.

> >

> > Which makes us lose the baby^H^H^H^Htype checking performed

> > on function parameters, requiring to add more ugly checks.

>

> I'm 100% agreed on this.

>

> If the inline change is being pushed by people who say "you should

> have used macros instead if you wanted inlining", then I will just

> revert that stupid commit that is causing problems.

>

> No, the preprocessor is not the answer.

>

> That said, code that relies on inlining for _correctness_ should use

> "__always_inline" and possibly even have a comment about why.

>

> But I am considering just undoing commit 9012d011660e ("compiler:

> allow all arches to enable CONFIG_OPTIMIZE_INLINING") entirely.


No, please do not.

Macrofying the 'inline' is a horrid mistake that makes incorrect code work.
It would eternally prevent people from writing portable, correct code.
Please do not encourage to hide problems.


> The

> advantages are questionable, and when the advantages are balanced

> against actual regressions and the arguments are "use macros", that

> just shows how badly thought out this was.

>

>                 Linus




-- 
Best Regards
Masahiro Yamada
Will Deacon Oct. 3, 2019, 4:36 p.m. UTC | #39
On Wed, Oct 02, 2019 at 01:39:40PM -0700, Linus Torvalds wrote:
> On Wed, Oct 2, 2019 at 5:56 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:

> >

> > >

> > > Then use the C preprocessor to force the inlining.  I'm sorry it's not

> > > as pretty as static inline functions.

> >

> > Which makes us lose the baby^H^H^H^Htype checking performed

> > on function parameters, requiring to add more ugly checks.

> 

> I'm 100% agreed on this.

> 

> If the inline change is being pushed by people who say "you should

> have used macros instead if you wanted inlining", then I will just

> revert that stupid commit that is causing problems.

> 

> No, the preprocessor is not the answer.

> 

> That said, code that relies on inlining for _correctness_ should use

> "__always_inline" and possibly even have a comment about why.

> 

> But I am considering just undoing commit 9012d011660e ("compiler:

> allow all arches to enable CONFIG_OPTIMIZE_INLINING") entirely. The

> advantages are questionable, and when the advantages are balanced

> against actual regressions and the arguments are "use macros", that

> just shows how badly thought out this was.


It's clear that opinions are divided on this issue, but you can add
an enthusiastic:

Acked-by: Will Deacon <will@kernel.org>


if you go ahead with the revert. I'm all for allowing the compiler to
make its own inlining decisions, but not when the potential for
miscompilation isn't fully understood and the proposed alternatives turn
the source into an unreadable mess. Perhaps we can do something different
for 5.5 (arch opt-in? clang only? invert the logic? work to move functions
over to __always_inline /before/ flipping the CONFIG option? ...?)

Will
Linus Torvalds Oct. 3, 2019, 5:01 p.m. UTC | #40
On Wed, Oct 2, 2019 at 7:11 PM Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:
>

> Macrofying the 'inline' is a horrid mistake that makes incorrect code work.

> It would eternally prevent people from writing portable, correct code.

> Please do not encourage to hide problems.


Honestly, if the alternative to hiding problems is "use a macro", then
I'd rather hide the problems and just make "inline" means "inline".

If "inline" means "it's just a hint, use macros", then inline is useless.

If "inline" means "using this means that there are known compiler
bugs, but we don't know where they trigger", then inline is _worse_
than useless.

I do not see the big advantage of letting the compiler say "yeah, I'm
not going to do that, Dave".

And I see a *huge* disadvantage when people are ignoring compiler
bugs, and are saying "use a macro". Seriously.

Right now we see the obvious compiler bugs that cause build breakages.
How many non-obvious compiler bugs do we have? And how sure are you
that our code is "correct" after fixing a couple of obvious cases?

As to "portable", nobody cares. We're a kernel. We aren't portable,
and never were.

If this is purely about the fact that x86 is different from other
architectures, then let's remove the "compiler can do stupid things"
option on x86 too. It was never clear that it was a huge advantage.

               Linus
Linus Torvalds Oct. 3, 2019, 5:08 p.m. UTC | #41
On Thu, Oct 3, 2019 at 10:01 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>

> If this is purely about the fact that x86 is different from other

> architectures, then let's remove the "compiler can do stupid things"

> option on x86 too. It was never clear that it was a huge advantage.


Side note: what might be an actual advantage would be to have a mode
where you see when the compiler would choose to not inline things.

Maybe we have things that are marked "inline" that simply shouldn't
be. We do tend to have this history of adding small helper functions
to header files, and then they grow and grow and grow. And now they
shouldn't be in a header file any more, but they still are.

So having a "compiler doesn't want to inline this" flag might be
useful for finding those, but it might also be useful for people
looking for what triggers bugs.

But honestly, the last time I saw somebody argue for "I don't want
inlining", it was the broken "I want to compiler the kernel with no
optimizations so that it's easier to look at the resulting code". That
was just a _bad_ bad patch. And if it is concerns like that that are
driving this "compiler doesn't have to inline", then we should stop it
immediately.

             Linus
Masahiro Yamada Oct. 3, 2019, 5:23 p.m. UTC | #42
On Fri, Oct 4, 2019 at 2:02 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>

> On Wed, Oct 2, 2019 at 7:11 PM Masahiro Yamada

> <yamada.masahiro@socionext.com> wrote:

> >

> > Macrofying the 'inline' is a horrid mistake that makes incorrect code work.

> > It would eternally prevent people from writing portable, correct code.

> > Please do not encourage to hide problems.

>

> Honestly, if the alternative to hiding problems is "use a macro", then

> I'd rather hide the problems and just make "inline" means "inline".

>

> If "inline" means "it's just a hint, use macros", then inline is useless.


For clarification,
I am not saying "use macros" at all.


I just want to annotate __always_inline for the case
"2. code that if not inlined is somehow not correct."



> If "inline" means "using this means that there are known compiler

> bugs, but we don't know where they trigger", then inline is _worse_

> than useless.

>

> I do not see the big advantage of letting the compiler say "yeah, I'm

> not going to do that, Dave".

>

> And I see a *huge* disadvantage when people are ignoring compiler

> bugs, and are saying "use a macro". Seriously.



Again, not saying "use a macro".



>

> Right now we see the obvious compiler bugs that cause build breakages.

> How many non-obvious compiler bugs do we have? And how sure are you

> that our code is "correct" after fixing a couple of obvious cases?

>

> As to "portable", nobody cares. We're a kernel. We aren't portable,

> and never were.

>

> If this is purely about the fact that x86 is different from other

> architectures, then let's remove the "compiler can do stupid things"

> option on x86 too. It was never clear that it was a huge advantage.

>

>                Linus




-- 
Best Regards
Masahiro Yamada
Linus Torvalds Oct. 3, 2019, 5:29 p.m. UTC | #43
On Thu, Oct 3, 2019 at 10:24 AM Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:
>

> I just want to annotate __always_inline for the case

> "2. code that if not inlined is somehow not correct."


Oh, I support that entirely - if only for documentation.

But I do *not* support the dismissal of the architecture maintainers
concerns about "does it work?" and apparently known compiler bugs.

> Again, not saying "use a macro".


Other people did, though.

And there seemed to be little balancing of the pain vs the gain. The
gain really isn't that obvious. If the code shrinks by a couple of kB,
is that good or bad? Maybe it is smaller, but is it _better_?

            Linus
Miguel Ojeda Oct. 3, 2019, 8:21 p.m. UTC | #44
On Thu, Oct 3, 2019 at 7:29 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>

> On Thu, Oct 3, 2019 at 10:24 AM Masahiro Yamada

> <yamada.masahiro@socionext.com> wrote:

> >

> > I just want to annotate __always_inline for the case

> > "2. code that if not inlined is somehow not correct."

>

> Oh, I support that entirely - if only for documentation.

>

> But I do *not* support the dismissal of the architecture maintainers

> concerns about "does it work?" and apparently known compiler bugs.

>

> > Again, not saying "use a macro".

>

> Other people did, though.

>

> And there seemed to be little balancing of the pain vs the gain. The

> gain really isn't that obvious. If the code shrinks by a couple of kB,

> is that good or bad? Maybe it is smaller, but is it _better_?


I think both positions that people have shown are important to take
into account.

We should minimize our usage of macros wherever possible and certainly
not write new ones when another solution is available. But we should
*also* minimize our dependence on code that "must-be-inlined" to work
as much as possible.

In particular, I think we should allow to use __always_inline only if
it doesn't work otherwise, as an alternative before trying the next
worst solution (macros). And avoid using only "inline" when we
actually require inlining, of course.

And the reasoning for each usage of __always_inline should have a
comment (be it "bad codegen", "performance tanks without it",
"compiler X <= 4.2 refuses to compile"...). Which is also useful for
compiler folks to grep for cases to improve/fix in their compiler!

Cheers,
Miguel
Geert Uytterhoeven Oct. 4, 2019, 7:37 a.m. UTC | #45
Hi Miguel,

On Thu, Oct 3, 2019 at 10:21 PM Miguel Ojeda
<miguel.ojeda.sandonis@gmail.com> wrote:
> On Thu, Oct 3, 2019 at 7:29 PM Linus Torvalds

> <torvalds@linux-foundation.org> wrote:

> > On Thu, Oct 3, 2019 at 10:24 AM Masahiro Yamada

> > <yamada.masahiro@socionext.com> wrote:

> > >

> > > I just want to annotate __always_inline for the case

> > > "2. code that if not inlined is somehow not correct."

> >

> > Oh, I support that entirely - if only for documentation.

> >

> > But I do *not* support the dismissal of the architecture maintainers

> > concerns about "does it work?" and apparently known compiler bugs.

> >

> > > Again, not saying "use a macro".

> >

> > Other people did, though.

> >

> > And there seemed to be little balancing of the pain vs the gain. The

> > gain really isn't that obvious. If the code shrinks by a couple of kB,

> > is that good or bad? Maybe it is smaller, but is it _better_?

>

> I think both positions that people have shown are important to take

> into account.

>

> We should minimize our usage of macros wherever possible and certainly

> not write new ones when another solution is available. But we should

> *also* minimize our dependence on code that "must-be-inlined" to work

> as much as possible.

>

> In particular, I think we should allow to use __always_inline only if

> it doesn't work otherwise, as an alternative before trying the next

> worst solution (macros). And avoid using only "inline" when we

> actually require inlining, of course.

>

> And the reasoning for each usage of __always_inline should have a

> comment (be it "bad codegen", "performance tanks without it",

> "compiler X <= 4.2 refuses to compile"...). Which is also useful for

> compiler folks to grep for cases to improve/fix in their compiler!


First, we had "inline" and normal functions, where "inline" was used to
make sure a function was inlined (e.g. because it contained code paths
that were intended to be optimized away[*]).
Then, the compiler started not honoring the "inline" keyword, so we got
"always_inline", "inline", and normal functions.  With a hack to #define
"inline" to "always_inline" for some compiler versions.

What's next? There should be a way for the programmer to indicate a
function must be inlined.

[*] Some unused  code paths may contain references to symbols that may
    not exist for the current configuration, or not exist at all.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Stefan Wahren Oct. 12, 2019, 10:15 a.m. UTC | #46
Hi,

Am 03.10.19 um 18:36 schrieb Will Deacon:
> On Wed, Oct 02, 2019 at 01:39:40PM -0700, Linus Torvalds wrote:

>> On Wed, Oct 2, 2019 at 5:56 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:

>>>> Then use the C preprocessor to force the inlining.  I'm sorry it's not

>>>> as pretty as static inline functions.

>>> Which makes us lose the baby^H^H^H^Htype checking performed

>>> on function parameters, requiring to add more ugly checks.

>> I'm 100% agreed on this.

>>

>> If the inline change is being pushed by people who say "you should

>> have used macros instead if you wanted inlining", then I will just

>> revert that stupid commit that is causing problems.

>>

>> No, the preprocessor is not the answer.

>>

>> That said, code that relies on inlining for _correctness_ should use

>> "__always_inline" and possibly even have a comment about why.

>>

>> But I am considering just undoing commit 9012d011660e ("compiler:

>> allow all arches to enable CONFIG_OPTIMIZE_INLINING") entirely. The

>> advantages are questionable, and when the advantages are balanced

>> against actual regressions and the arguments are "use macros", that

>> just shows how badly thought out this was.

> It's clear that opinions are divided on this issue, but you can add

> an enthusiastic:

>

> Acked-by: Will Deacon <will@kernel.org>

>

> if you go ahead with the revert. I'm all for allowing the compiler to

> make its own inlining decisions, but not when the potential for

> miscompilation isn't fully understood and the proposed alternatives turn

> the source into an unreadable mess. Perhaps we can do something different

> for 5.5 (arch opt-in? clang only? invert the logic? work to move functions

> over to __always_inline /before/ flipping the CONFIG option? ...?)


what's the status on this?

In need to prepare my pull requests for 5.5 and all recent kernelci
targets (including linux-next) with bcm2835_defconfig are still broken.

Stefan

>

> Will
Masahiro Yamada Oct. 12, 2019, 11:12 a.m. UTC | #47
On Sat, Oct 12, 2019 at 7:21 PM Stefan Wahren <wahrenst@gmx.net> wrote:
>

> Hi,

>

> Am 03.10.19 um 18:36 schrieb Will Deacon:

> > On Wed, Oct 02, 2019 at 01:39:40PM -0700, Linus Torvalds wrote:

> >> On Wed, Oct 2, 2019 at 5:56 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:

> >>>> Then use the C preprocessor to force the inlining.  I'm sorry it's not

> >>>> as pretty as static inline functions.

> >>> Which makes us lose the baby^H^H^H^Htype checking performed

> >>> on function parameters, requiring to add more ugly checks.

> >> I'm 100% agreed on this.

> >>

> >> If the inline change is being pushed by people who say "you should

> >> have used macros instead if you wanted inlining", then I will just

> >> revert that stupid commit that is causing problems.

> >>

> >> No, the preprocessor is not the answer.

> >>

> >> That said, code that relies on inlining for _correctness_ should use

> >> "__always_inline" and possibly even have a comment about why.

> >>

> >> But I am considering just undoing commit 9012d011660e ("compiler:

> >> allow all arches to enable CONFIG_OPTIMIZE_INLINING") entirely. The

> >> advantages are questionable, and when the advantages are balanced

> >> against actual regressions and the arguments are "use macros", that

> >> just shows how badly thought out this was.

> > It's clear that opinions are divided on this issue, but you can add

> > an enthusiastic:

> >

> > Acked-by: Will Deacon <will@kernel.org>

> >

> > if you go ahead with the revert. I'm all for allowing the compiler to

> > make its own inlining decisions, but not when the potential for

> > miscompilation isn't fully understood and the proposed alternatives turn

> > the source into an unreadable mess. Perhaps we can do something different

> > for 5.5 (arch opt-in? clang only? invert the logic? work to move functions

> > over to __always_inline /before/ flipping the CONFIG option? ...?)

>

> what's the status on this?

>

> In need to prepare my pull requests for 5.5 and all recent kernelci

> targets (including linux-next) with bcm2835_defconfig are still broken.

>

> Stefan



Russell King already applied the fix-up patch.

https://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8908/1

So, (I hope) the breakage of bcm2835 will be solved in mainline soon.





-- 
Best Regards
Masahiro Yamada
Russell King (Oracle) Oct. 12, 2019, 2:45 p.m. UTC | #48
On Sat, Oct 12, 2019 at 12:15:42PM +0200, Stefan Wahren wrote:
> Hi,

> 

> Am 03.10.19 um 18:36 schrieb Will Deacon:

> > On Wed, Oct 02, 2019 at 01:39:40PM -0700, Linus Torvalds wrote:

> >> On Wed, Oct 2, 2019 at 5:56 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:

> >>>> Then use the C preprocessor to force the inlining.  I'm sorry it's not

> >>>> as pretty as static inline functions.

> >>> Which makes us lose the baby^H^H^H^Htype checking performed

> >>> on function parameters, requiring to add more ugly checks.

> >> I'm 100% agreed on this.

> >>

> >> If the inline change is being pushed by people who say "you should

> >> have used macros instead if you wanted inlining", then I will just

> >> revert that stupid commit that is causing problems.

> >>

> >> No, the preprocessor is not the answer.

> >>

> >> That said, code that relies on inlining for _correctness_ should use

> >> "__always_inline" and possibly even have a comment about why.

> >>

> >> But I am considering just undoing commit 9012d011660e ("compiler:

> >> allow all arches to enable CONFIG_OPTIMIZE_INLINING") entirely. The

> >> advantages are questionable, and when the advantages are balanced

> >> against actual regressions and the arguments are "use macros", that

> >> just shows how badly thought out this was.

> > It's clear that opinions are divided on this issue, but you can add

> > an enthusiastic:

> >

> > Acked-by: Will Deacon <will@kernel.org>

> >

> > if you go ahead with the revert. I'm all for allowing the compiler to

> > make its own inlining decisions, but not when the potential for

> > miscompilation isn't fully understood and the proposed alternatives turn

> > the source into an unreadable mess. Perhaps we can do something different

> > for 5.5 (arch opt-in? clang only? invert the logic? work to move functions

> > over to __always_inline /before/ flipping the CONFIG option? ...?)

> 

> what's the status on this?

> 

> In need to prepare my pull requests for 5.5 and all recent kernelci

> targets (including linux-next) with bcm2835_defconfig are still broken.


I merged the patches late on Thursday, it may have been too late for
linux-next to pick them up - and because of the time difference between
UK and Australia, it means they won't be in linux-next until next week
(basically, tomorrow).  linux-next is basically a Sunday to Thursday
operation from my point of view.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
diff mbox series

Patch

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 5960e2980a8a..e25493811df8 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -327,7 +327,7 @@  config HEADERS_CHECK
 	  relevant for userspace, say 'Y'.
 
 config OPTIMIZE_INLINING
-	bool "Allow compiler to uninline functions marked 'inline'"
+	def_bool y
 	help
 	  This option determines if the kernel forces gcc to inline the functions
 	  developers have marked 'inline'. Doing so takes away freedom from gcc to
@@ -338,8 +338,6 @@  config OPTIMIZE_INLINING
 	  decision will become the default in the future. Until then this option
 	  is there to test gcc for this.
 
-	  If unsure, say N.
-
 config DEBUG_SECTION_MISMATCH
 	bool "Enable full Section mismatch analysis"
 	help