Message ID | 20190830034304.24259-1-yamada.masahiro@socionext.com |
---|---|
State | Accepted |
Commit | ac7c3e4ff401b304489a031938dbeaab585bfe0a |
Headers | show |
Series | compiler: enable CONFIG_OPTIMIZE_INLINING forcibly | expand |
On Thu, Aug 29, 2019 at 8:43 PM Masahiro Yamada <yamada.masahiro@socionext.com> wrote: > > Commit 9012d011660e ("compiler: allow all arches to enable > CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable > this option. A couple of build errors were reported by randconfig, > but all of them have been ironed out. > > Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely > (and it will simplify the 'inline' macro in compiler_types.h), > this commit changes it to always-on option. Going forward, the > compiler will always be allowed to not inline functions marked > 'inline'. > > This is not a problem for x86 since it has been long used by > arch/x86/configs/{x86_64,i386}_defconfig. > > I am keeping the config option just in case any problem crops up for > other architectures. > > The code clean-up will be done after confirming this is solid. > > Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Just saw akpm picked this up, but Acked-by: Nick Desaulniers <ndesaulniers@google.com> > --- > > lib/Kconfig.debug | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index 5960e2980a8a..e25493811df8 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -327,7 +327,7 @@ config HEADERS_CHECK > relevant for userspace, say 'Y'. > > config OPTIMIZE_INLINING > - bool "Allow compiler to uninline functions marked 'inline'" > + def_bool y > help > This option determines if the kernel forces gcc to inline the functions > developers have marked 'inline'. Doing so takes away freedom from gcc to > @@ -338,8 +338,6 @@ config OPTIMIZE_INLINING > decision will become the default in the future. Until then this option > is there to test gcc for this. > > - If unsure, say N. > - > config DEBUG_SECTION_MISMATCH > bool "Enable full Section mismatch analysis" > help > -- > 2.17.1 > -- Thanks, ~Nick Desaulniers
Hi Yamada-san, On Fri, Aug 30, 2019 at 5:44 AM Masahiro Yamada <yamada.masahiro@socionext.com> wrote: > Commit 9012d011660e ("compiler: allow all arches to enable > CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable > this option. A couple of build errors were reported by randconfig, > but all of them have been ironed out. > > Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely > (and it will simplify the 'inline' macro in compiler_types.h), > this commit changes it to always-on option. Going forward, the > compiler will always be allowed to not inline functions marked > 'inline'. > > This is not a problem for x86 since it has been long used by > arch/x86/configs/{x86_64,i386}_defconfig. > > I am keeping the config option just in case any problem crops up for > other architectures. > > The code clean-up will be done after confirming this is solid. > > Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> This breaks compiling drivers/video/fbdev/c2p*, as the functions in drivers/video/fbdev/c2p_core.h are no longer inlined, leading to calls to the non-existent function c2p_unsupported(), as reported by KISSKB: On Thu, Sep 26, 2019 at 5:02 AM <noreply@ellerman.id.au> wrote: > FAILED linux-next/m68k-defconfig/m68k Thu Sep 26, 12:58 > > http://kisskb.ellerman.id.au/kisskb/buildresult/13973194/ > > Commit: Add linux-next specific files for 20190925 > d47175169c28eedd2cc2ab8c01f38764cb0269cc > Compiler: m68k-linux-gcc (GCC) 4.6.3 / GNU ld (GNU Binutils) 2.22 > > Possible errors > --------------- > > c2p_planar.c:(.text+0xd6): undefined reference to `c2p_unsupported' > c2p_planar.c:(.text+0x1dc): undefined reference to `c2p_unsupported' > c2p_iplan2.c:(.text+0xc4): undefined reference to `c2p_unsupported' > c2p_iplan2.c:(.text+0x150): undefined reference to `c2p_unsupported' > make[1]: *** [Makefile:1074: vmlinux] Error 1 > make: *** [Makefile:179: sub-make] Error 2 I managed to reproduce this with gcc version 8.3.0 (Ubuntu 8.3.0-6ubuntu1~18.04.1) , and bisected the failure to commit 025f072e5823947c ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly") . Marking the functions __always_inline instead of inline fixes that. Shall I send a patch to do that? Thanks! > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -327,7 +327,7 @@ config HEADERS_CHECK > relevant for userspace, say 'Y'. > > config OPTIMIZE_INLINING > - bool "Allow compiler to uninline functions marked 'inline'" > + def_bool y > help > This option determines if the kernel forces gcc to inline the functions > developers have marked 'inline'. Doing so takes away freedom from gcc to > @@ -338,8 +338,6 @@ config OPTIMIZE_INLINING > decision will become the default in the future. Until then this option > is there to test gcc for this. > > - If unsure, say N. > - > config DEBUG_SECTION_MISMATCH > bool "Enable full Section mismatch analysis" > help Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Hi Geert, On Thu, Sep 26, 2019 at 5:54 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > Hi Yamada-san, > > On Fri, Aug 30, 2019 at 5:44 AM Masahiro Yamada > <yamada.masahiro@socionext.com> wrote: > > Commit 9012d011660e ("compiler: allow all arches to enable > > CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable > > this option. A couple of build errors were reported by randconfig, > > but all of them have been ironed out. > > > > Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely > > (and it will simplify the 'inline' macro in compiler_types.h), > > this commit changes it to always-on option. Going forward, the > > compiler will always be allowed to not inline functions marked > > 'inline'. > > > > This is not a problem for x86 since it has been long used by > > arch/x86/configs/{x86_64,i386}_defconfig. > > > > I am keeping the config option just in case any problem crops up for > > other architectures. > > > > The code clean-up will be done after confirming this is solid. > > > > Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> > > This breaks compiling drivers/video/fbdev/c2p*, as the functions in > drivers/video/fbdev/c2p_core.h are no longer inlined, leading to calls > to the non-existent function c2p_unsupported(), as reported by KISSKB: > > On Thu, Sep 26, 2019 at 5:02 AM <noreply@ellerman.id.au> wrote: > > FAILED linux-next/m68k-defconfig/m68k Thu Sep 26, 12:58 > > > > http://kisskb.ellerman.id.au/kisskb/buildresult/13973194/ > > > > Commit: Add linux-next specific files for 20190925 > > d47175169c28eedd2cc2ab8c01f38764cb0269cc > > Compiler: m68k-linux-gcc (GCC) 4.6.3 / GNU ld (GNU Binutils) 2.22 > > > > Possible errors > > --------------- > > > > c2p_planar.c:(.text+0xd6): undefined reference to `c2p_unsupported' > > c2p_planar.c:(.text+0x1dc): undefined reference to `c2p_unsupported' > > c2p_iplan2.c:(.text+0xc4): undefined reference to `c2p_unsupported' > > c2p_iplan2.c:(.text+0x150): undefined reference to `c2p_unsupported' > > make[1]: *** [Makefile:1074: vmlinux] Error 1 > > make: *** [Makefile:179: sub-make] Error 2 > > I managed to reproduce this with gcc version 8.3.0 (Ubuntu > 8.3.0-6ubuntu1~18.04.1) , and bisected the failure to commit > 025f072e5823947c ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly") . > > Marking the functions __always_inline instead of inline fixes that. > Shall I send a patch to do that? Yes, please. But you do not need to touch _transp() or comp(). Only functions that call c2p_unsupported(). BTW, c2p_unsupported() can be replaced with BUILD_BUG(). This will break the build earlier in case it cannot be optimized out. -- Best Regards Masahiro Yamada
Hi Yamada-san, On Thu, Sep 26, 2019 at 11:03 AM Masahiro Yamada <yamada.masahiro@socionext.com> wrote: > On Thu, Sep 26, 2019 at 5:54 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > On Fri, Aug 30, 2019 at 5:44 AM Masahiro Yamada > > <yamada.masahiro@socionext.com> wrote: > > > Commit 9012d011660e ("compiler: allow all arches to enable > > > CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable > > > this option. A couple of build errors were reported by randconfig, > > > but all of them have been ironed out. > > > > > > Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely > > > (and it will simplify the 'inline' macro in compiler_types.h), > > > this commit changes it to always-on option. Going forward, the > > > compiler will always be allowed to not inline functions marked > > > 'inline'. > > > > > > This is not a problem for x86 since it has been long used by > > > arch/x86/configs/{x86_64,i386}_defconfig. > > > > > > I am keeping the config option just in case any problem crops up for > > > other architectures. > > > > > > The code clean-up will be done after confirming this is solid. > > > > > > Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> > > > > This breaks compiling drivers/video/fbdev/c2p*, as the functions in > > drivers/video/fbdev/c2p_core.h are no longer inlined, leading to calls > > to the non-existent function c2p_unsupported(), as reported by KISSKB: > > > > On Thu, Sep 26, 2019 at 5:02 AM <noreply@ellerman.id.au> wrote: > > > FAILED linux-next/m68k-defconfig/m68k Thu Sep 26, 12:58 > > > > > > http://kisskb.ellerman.id.au/kisskb/buildresult/13973194/ > > > > > > Commit: Add linux-next specific files for 20190925 > > > d47175169c28eedd2cc2ab8c01f38764cb0269cc > > > Compiler: m68k-linux-gcc (GCC) 4.6.3 / GNU ld (GNU Binutils) 2.22 > > > > > > Possible errors > > > --------------- > > > > > > c2p_planar.c:(.text+0xd6): undefined reference to `c2p_unsupported' > > > c2p_planar.c:(.text+0x1dc): undefined reference to `c2p_unsupported' > > > c2p_iplan2.c:(.text+0xc4): undefined reference to `c2p_unsupported' > > > c2p_iplan2.c:(.text+0x150): undefined reference to `c2p_unsupported' > > > make[1]: *** [Makefile:1074: vmlinux] Error 1 > > > make: *** [Makefile:179: sub-make] Error 2 > > > > I managed to reproduce this with gcc version 8.3.0 (Ubuntu > > 8.3.0-6ubuntu1~18.04.1) , and bisected the failure to commit > > 025f072e5823947c ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly") . > > > > Marking the functions __always_inline instead of inline fixes that. > > Shall I send a patch to do that? > > > Yes, please. OK, will do. > But you do not need to touch _transp() or comp(). > Only functions that call c2p_unsupported(). However, the inlining of these functions is very performance-sensitive too. So perhaps I should mark all of them __always_inline? > BTW, c2p_unsupported() can be replaced with BUILD_BUG(). > This will break the build earlier in case > it cannot be optimized out. Right. Will do, too. Thanks! Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Hi Geert, On Thu, Sep 26, 2019 at 6:26 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > Hi Yamada-san, > > On Thu, Sep 26, 2019 at 11:03 AM Masahiro Yamada > <yamada.masahiro@socionext.com> wrote: > > On Thu, Sep 26, 2019 at 5:54 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > > On Fri, Aug 30, 2019 at 5:44 AM Masahiro Yamada > > > <yamada.masahiro@socionext.com> wrote: > > > > Commit 9012d011660e ("compiler: allow all arches to enable > > > > CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable > > > > this option. A couple of build errors were reported by randconfig, > > > > but all of them have been ironed out. > > > > > > > > Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely > > > > (and it will simplify the 'inline' macro in compiler_types.h), > > > > this commit changes it to always-on option. Going forward, the > > > > compiler will always be allowed to not inline functions marked > > > > 'inline'. > > > > > > > > This is not a problem for x86 since it has been long used by > > > > arch/x86/configs/{x86_64,i386}_defconfig. > > > > > > > > I am keeping the config option just in case any problem crops up for > > > > other architectures. > > > > > > > > The code clean-up will be done after confirming this is solid. > > > > > > > > Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> > > > > > > This breaks compiling drivers/video/fbdev/c2p*, as the functions in > > > drivers/video/fbdev/c2p_core.h are no longer inlined, leading to calls > > > to the non-existent function c2p_unsupported(), as reported by KISSKB: > > > > > > On Thu, Sep 26, 2019 at 5:02 AM <noreply@ellerman.id.au> wrote: > > > > FAILED linux-next/m68k-defconfig/m68k Thu Sep 26, 12:58 > > > > > > > > http://kisskb.ellerman.id.au/kisskb/buildresult/13973194/ > > > > > > > > Commit: Add linux-next specific files for 20190925 > > > > d47175169c28eedd2cc2ab8c01f38764cb0269cc > > > > Compiler: m68k-linux-gcc (GCC) 4.6.3 / GNU ld (GNU Binutils) 2.22 > > > > > > > > Possible errors > > > > --------------- > > > > > > > > c2p_planar.c:(.text+0xd6): undefined reference to `c2p_unsupported' > > > > c2p_planar.c:(.text+0x1dc): undefined reference to `c2p_unsupported' > > > > c2p_iplan2.c:(.text+0xc4): undefined reference to `c2p_unsupported' > > > > c2p_iplan2.c:(.text+0x150): undefined reference to `c2p_unsupported' > > > > make[1]: *** [Makefile:1074: vmlinux] Error 1 > > > > make: *** [Makefile:179: sub-make] Error 2 > > > > > > I managed to reproduce this with gcc version 8.3.0 (Ubuntu > > > 8.3.0-6ubuntu1~18.04.1) , and bisected the failure to commit > > > 025f072e5823947c ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly") . > > > > > > Marking the functions __always_inline instead of inline fixes that. > > > Shall I send a patch to do that? > > > > > > Yes, please. > > OK, will do. > > > But you do not need to touch _transp() or comp(). > > Only functions that call c2p_unsupported(). > > However, the inlining of these functions is very performance-sensitive too. > So perhaps I should mark all of them __always_inline? I want to leave as much compiler's freedom as possible, and the optimization criteria depends on CONFIG option. When CONFIG_CC_OPTIMIZE_FOR_SIZE is chosen, the optimization should respect the code size over performance, and the compiler may prefer not inlining it. So, the basic idea is, use __always_inline only when otherwise the builds would be broken. Masahiro Yamada > > BTW, c2p_unsupported() can be replaced with BUILD_BUG(). > > This will break the build earlier in case > > it cannot be optimized out. > > Right. Will do, too. > > Thanks! > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds -- Best Regards Masahiro Yamada
On Fri, 2019-08-30 at 12:43 +0900, Masahiro Yamada wrote: > Commit 9012d011660e ("compiler: allow all arches to enable > CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable > this option. A couple of build errors were reported by randconfig, > but all of them have been ironed out. > > Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely > (and it will simplify the 'inline' macro in compiler_types.h), > this commit changes it to always-on option. Going forward, the > compiler will always be allowed to not inline functions marked > 'inline'. > > This is not a problem for x86 since it has been long used by > arch/x86/configs/{x86_64,i386}_defconfig. > > I am keeping the config option just in case any problem crops up for > other architectures. > > The code clean-up will be done after confirming this is solid. > > Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> [ Adding some ARM people as they might be able to help ] This was found to cause a regression on a Raspberry Pi 2 B built with bcm2835_defconfig which among other things has no SMP support. The relevant logs (edited to remove the noise) are: [ 5.827333] Run /init as init process Loading, please wait... Failed to set SO_PASSCRED: Bad address Failed to bind netlink socket: Bad address Failed to create manager: Bad address Failed to set SO_PASSCRED: Bad address [ 9.021623] systemd[1]: SO_PASSCRED failed: Bad address [!!!!!!] Failed to start up manager. [ 9.079148] systemd[1]: Freezing execution. I looked into it, it turns out that the call to get_user() in sock_setsockopt() is returning -EFAULT. Down the assembly rabbit hole that get_user() is I found-out that it's the macro 'check_uaccess' who's triggering the error. I'm clueless at this point, so I hope you can give me some hints on what's going bad here. Regards, Nicolas
On Fri, 2019-08-30 at 12:43 +0900, Masahiro Yamada wrote: > Commit 9012d011660e ("compiler: allow all arches to enable > CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable > this option. A couple of build errors were reported by randconfig, > but all of them have been ironed out. > > Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely > (and it will simplify the 'inline' macro in compiler_types.h), > this commit changes it to always-on option. Going forward, the > compiler will always be allowed to not inline functions marked > 'inline'. > > This is not a problem for x86 since it has been long used by > arch/x86/configs/{x86_64,i386}_defconfig. > > I am keeping the config option just in case any problem crops up for > other architectures. > > The code clean-up will be done after confirming this is solid. > > Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> [ Resending as the mail delivery system failed to resolve some the hosts, namely Masahiro's ] [ Adding some ARM people as they might be able to help ] This was found to cause a regression on a Raspberry Pi 2 B built with bcm2835_defconfig which among other things has no SMP support. The relevant logs (edited to remove the noise) are: [ 5.827333] Run /init as init process Loading, please wait... Failed to set SO_PASSCRED: Bad address Failed to bind netlink socket: Bad address Failed to create manager: Bad address Failed to set SO_PASSCRED: Bad address [ 9.021623] systemd[1]: SO_PASSCRED failed: Bad address [!!!!!!] Failed to start up manager. [ 9.079148] systemd[1]: Freezing execution. I looked into it, it turns out that the call to get_user() in sock_setsockopt() is returning -EFAULT. Down the assembly rabbit hole that get_user() is I found-out that it's the macro 'check_uaccess' who's triggering the error. I'm clueless at this point, so I hope you can give me some hints on what's going bad here. Regards, Nicolas
On Fri, Sep 27, 2019 at 12:43:33PM +0200, Nicolas Saenz Julienne wrote: > On Fri, 2019-08-30 at 12:43 +0900, Masahiro Yamada wrote: > > Commit 9012d011660e ("compiler: allow all arches to enable > > CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable > > this option. A couple of build errors were reported by randconfig, > > but all of them have been ironed out. > > > > Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely > > (and it will simplify the 'inline' macro in compiler_types.h), > > this commit changes it to always-on option. Going forward, the > > compiler will always be allowed to not inline functions marked > > 'inline'. > > > > This is not a problem for x86 since it has been long used by > > arch/x86/configs/{x86_64,i386}_defconfig. > > > > I am keeping the config option just in case any problem crops up for > > other architectures. > > > > The code clean-up will be done after confirming this is solid. > > > > Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> > > [ Adding some ARM people as they might be able to help ] > > This was found to cause a regression on a Raspberry Pi 2 B built with > bcm2835_defconfig which among other things has no SMP support. > > The relevant logs (edited to remove the noise) are: > > [ 5.827333] Run /init as init process > Loading, please wait... > Failed to set SO_PASSCRED: Bad address > Failed to bind netlink socket: Bad address > Failed to create manager: Bad address > Failed to set SO_PASSCRED: Bad address > [ 9.021623] systemd[1]: SO_PASSCRED failed: Bad address > [!!!!!!] Failed to start up manager. > [ 9.079148] systemd[1]: Freezing execution. > > I looked into it, it turns out that the call to get_user() in sock_setsockopt() > is returning -EFAULT. Down the assembly rabbit hole that get_user() is I > found-out that it's the macro 'check_uaccess' who's triggering the error. > > I'm clueless at this point, so I hope you can give me some hints on what's > going bad here. > Not sure I can add much in terms of a solution, but I can at least confirm I am seeing exactly the same issue, on my Zynq based ARM system. Thanks, Charles
On Fri, Sep 27, 2019 at 3:43 AM Nicolas Saenz Julienne <nsaenzjulienne@suse.de> wrote: > > On Fri, 2019-08-30 at 12:43 +0900, Masahiro Yamada wrote: > > Commit 9012d011660e ("compiler: allow all arches to enable > > CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable > > this option. A couple of build errors were reported by randconfig, > > but all of them have been ironed out. > > > > Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely > > (and it will simplify the 'inline' macro in compiler_types.h), > > this commit changes it to always-on option. Going forward, the > > compiler will always be allowed to not inline functions marked > > 'inline'. > > > > This is not a problem for x86 since it has been long used by > > arch/x86/configs/{x86_64,i386}_defconfig. > > > > I am keeping the config option just in case any problem crops up for > > other architectures. > > > > The code clean-up will be done after confirming this is solid. > > > > Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> > > [ Adding some ARM people as they might be able to help ] > > This was found to cause a regression on a Raspberry Pi 2 B built with > bcm2835_defconfig which among other things has no SMP support. > > The relevant logs (edited to remove the noise) are: > > [ 5.827333] Run /init as init process > Loading, please wait... > Failed to set SO_PASSCRED: Bad address > Failed to bind netlink socket: Bad address > Failed to create manager: Bad address > Failed to set SO_PASSCRED: Bad address > [ 9.021623] systemd[1]: SO_PASSCRED failed: Bad address > [!!!!!!] Failed to start up manager. > [ 9.079148] systemd[1]: Freezing execution. > > I looked into it, it turns out that the call to get_user() in sock_setsockopt() > is returning -EFAULT. Down the assembly rabbit hole that get_user() is I > found-out that it's the macro 'check_uaccess' who's triggering the error. > > I'm clueless at this point, so I hope you can give me some hints on what's > going bad here. So get_user() was passed a bad value/pointer from userspace? Do you know which of the tree calls to get_user() from sock_setsockopt() is failing? (It's not immediately clear to me how this patch is at fault, vs there just being a bug in the source somewhere). -- Thanks, ~Nick Desaulniers
On Fri, Sep 27, 2019 at 3:08 PM Nick Desaulniers <ndesaulniers@google.com> wrote: > > So get_user() was passed a bad value/pointer from userspace? Do you > know which of the tree calls to get_user() from sock_setsockopt() is > failing? (It's not immediately clear to me how this patch is at > fault, vs there just being a bug in the source somewhere). Based on the error messages, the SO_PASSCRED ones are almost certainly from the get_user() in net/core/sock.c: sock_setsockopt(), which just does if (optlen < sizeof(int)) return -EINVAL; if (get_user(val, (int __user *)optval)) return -EFAULT; valbool = val ? 1 : 0; but it's the other messages imply that a lot of other cases are failing too (ie the "Failed to bind netlink socket" is, according to google, a bind() that fails with the same EFAULT error). There are probably even more failures that happen elsewhere and just don't even syslog the fact. I'd guess that all get_user() calls just fail, and those are the ones that happen to get printed out. Now, _why_ it would fail, I have ni idea. There are several inlines in the arm uaccess.h file, and it depends on other headers like <asm/domain.h> with more inlines still - eg get/set_domain() etc. Soem of that code is pretty subtle. They have fixed register usage (but the asm macros actually check them). And the inline asms clobber the link register, but they do seem to clearly _state_ that they clobber it, so who knows. Just based on the EFAULT, I'd _guess_ that it's some interaction with the domain access control register (so that get/set_domain() thing). But I'm not even sure that code is enabled for the Rpi2, so who knows.. Linus
Hi. On Fri, Sep 27, 2019 at 7:58 PM Nicolas Saenz Julienne <nsaenzjulienne@suse.de> wrote: > > On Fri, 2019-08-30 at 12:43 +0900, Masahiro Yamada wrote: > > Commit 9012d011660e ("compiler: allow all arches to enable > > CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable > > this option. A couple of build errors were reported by randconfig, > > but all of them have been ironed out. > > > > Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely > > (and it will simplify the 'inline' macro in compiler_types.h), > > this commit changes it to always-on option. Going forward, the > > compiler will always be allowed to not inline functions marked > > 'inline'. > > > > This is not a problem for x86 since it has been long used by > > arch/x86/configs/{x86_64,i386}_defconfig. > > > > I am keeping the config option just in case any problem crops up for > > other architectures. > > > > The code clean-up will be done after confirming this is solid. > > > > Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> > > [ Resending as the mail delivery system failed to resolve some the hosts, > namely Masahiro's ] > > [ Adding some ARM people as they might be able to help ] > > This was found to cause a regression on a Raspberry Pi 2 B built with > bcm2835_defconfig which among other things has no SMP support. > > The relevant logs (edited to remove the noise) are: > > [ 5.827333] Run /init as init process > Loading, please wait... > Failed to set SO_PASSCRED: Bad address > Failed to bind netlink socket: Bad address > Failed to create manager: Bad address > Failed to set SO_PASSCRED: Bad address > [ 9.021623] systemd[1]: SO_PASSCRED failed: Bad address > [!!!!!!] Failed to start up manager. > [ 9.079148] systemd[1]: Freezing execution. > > I looked into it, it turns out that the call to get_user() in sock_setsockopt() > is returning -EFAULT. Down the assembly rabbit hole that get_user() is I > found-out that it's the macro 'check_uaccess' who's triggering the error. > > I'm clueless at this point, so I hope you can give me some hints on what's > going bad here. > > Regards, > Nicolas > > I posted a fix: https://lore.kernel.org/patchwork/patch/1132459/ Thanks. -- Best Regards Masahiro Yamada
On Fri, Sep 27, 2019 at 03:38:44PM -0700, Linus Torvalds wrote: > On Fri, Sep 27, 2019 at 3:08 PM Nick Desaulniers > <ndesaulniers@google.com> wrote: > > > > So get_user() was passed a bad value/pointer from userspace? Do you > > know which of the tree calls to get_user() from sock_setsockopt() is > > failing? (It's not immediately clear to me how this patch is at > > fault, vs there just being a bug in the source somewhere). > > Based on the error messages, the SO_PASSCRED ones are almost certainly > from the get_user() in net/core/sock.c: sock_setsockopt(), which just > does > > if (optlen < sizeof(int)) > return -EINVAL; > > if (get_user(val, (int __user *)optval)) > return -EFAULT; > > valbool = val ? 1 : 0; > > but it's the other messages imply that a lot of other cases are > failing too (ie the "Failed to bind netlink socket" is, according to > google, a bind() that fails with the same EFAULT error). > > There are probably even more failures that happen elsewhere and just > don't even syslog the fact. I'd guess that all get_user() calls just > fail, and those are the ones that happen to get printed out. > > Now, _why_ it would fail, I have ni idea. There are several inlines in > the arm uaccess.h file, and it depends on other headers like > <asm/domain.h> with more inlines still - eg get/set_domain() etc. > > Soem of that code is pretty subtle. They have fixed register usage > (but the asm macros actually check them). And the inline asms clobber > the link register, but they do seem to clearly _state_ that they > clobber it, so who knows. > > Just based on the EFAULT, I'd _guess_ that it's some interaction with > the domain access control register (so that get/set_domain() thing). > But I'm not even sure that code is enabled for the Rpi2, so who > knows.. FWIW, we've run into issues with CONFIG_OPTIMIZE_INLINING and local variables marked as 'register' where GCC would do crazy things and end up corrupting data, so I suspect the use of fixed registers in the arm uaccess functions is hitting something similar: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91111 Although this particular case couldn't be reproduced with GCC 9, prior versions of the compiler get it wrong so I'm very much opposed to enabling CONFIG_OPTIMIZE_INLINING by default on arm/arm64. Will
On Mon, Sep 30, 2019 at 8:26 PM Will Deacon <will@kernel.org> wrote: > > On Fri, Sep 27, 2019 at 03:38:44PM -0700, Linus Torvalds wrote: > > On Fri, Sep 27, 2019 at 3:08 PM Nick Desaulniers > > <ndesaulniers@google.com> wrote: > > > > > > So get_user() was passed a bad value/pointer from userspace? Do you > > > know which of the tree calls to get_user() from sock_setsockopt() is > > > failing? (It's not immediately clear to me how this patch is at > > > fault, vs there just being a bug in the source somewhere). > > > > Based on the error messages, the SO_PASSCRED ones are almost certainly > > from the get_user() in net/core/sock.c: sock_setsockopt(), which just > > does > > > > if (optlen < sizeof(int)) > > return -EINVAL; > > > > if (get_user(val, (int __user *)optval)) > > return -EFAULT; > > > > valbool = val ? 1 : 0; > > > > but it's the other messages imply that a lot of other cases are > > failing too (ie the "Failed to bind netlink socket" is, according to > > google, a bind() that fails with the same EFAULT error). > > > > There are probably even more failures that happen elsewhere and just > > don't even syslog the fact. I'd guess that all get_user() calls just > > fail, and those are the ones that happen to get printed out. > > > > Now, _why_ it would fail, I have ni idea. There are several inlines in > > the arm uaccess.h file, and it depends on other headers like > > <asm/domain.h> with more inlines still - eg get/set_domain() etc. > > > > Soem of that code is pretty subtle. They have fixed register usage > > (but the asm macros actually check them). And the inline asms clobber > > the link register, but they do seem to clearly _state_ that they > > clobber it, so who knows. > > > > Just based on the EFAULT, I'd _guess_ that it's some interaction with > > the domain access control register (so that get/set_domain() thing). > > But I'm not even sure that code is enabled for the Rpi2, so who > > knows.. > > FWIW, we've run into issues with CONFIG_OPTIMIZE_INLINING and local > variables marked as 'register' where GCC would do crazy things and end > up corrupting data, so I suspect the use of fixed registers in the arm > uaccess functions is hitting something similar: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91111 No. Not similar at all. I fixed it already. See https://lore.kernel.org/patchwork/patch/1132459/ The problems are fixable by writing correct code. I think we discussed this already. - There is nothing arch-specific in CONFIG_OPTIMIZE_INLINING - 'inline' is just a hint. It does not guarantee function inlining. This is standard. - The kernel macrofies 'inline' to add __attribute__((__always_inline__)) This terrible hack must end. - __attribute__((__always_inline__)) takes aways compiler's freedom, and prevents it from optimizing the code for -O2, -Os, or whatever. > Although this particular case couldn't be reproduced with GCC 9, prior > versions of the compiler get it wrong so I'm very much opposed to enabling > CONFIG_OPTIMIZE_INLINING by default on arm/arm64. > > Will -- Best Regards Masahiro Yamada
On Mon, Sep 30, 2019 at 09:05:11PM +0900, Masahiro Yamada wrote: > On Mon, Sep 30, 2019 at 8:26 PM Will Deacon <will@kernel.org> wrote: > > On Fri, Sep 27, 2019 at 03:38:44PM -0700, Linus Torvalds wrote: > > > Soem of that code is pretty subtle. They have fixed register usage > > > (but the asm macros actually check them). And the inline asms clobber > > > the link register, but they do seem to clearly _state_ that they > > > clobber it, so who knows. > > > > > > Just based on the EFAULT, I'd _guess_ that it's some interaction with > > > the domain access control register (so that get/set_domain() thing). > > > But I'm not even sure that code is enabled for the Rpi2, so who > > > knows.. > > > > FWIW, we've run into issues with CONFIG_OPTIMIZE_INLINING and local > > variables marked as 'register' where GCC would do crazy things and end > > up corrupting data, so I suspect the use of fixed registers in the arm > > uaccess functions is hitting something similar: > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91111 > > No. Not similar at all. They're similar in that enabling CONFIG_OPTIMIZE_INLINING causes register variables to go wrong. I agree that the ARM code looks dodgy with that call to uaccess_save_and_enable(), but there are __asmeq macros in there to try to catch that, so it's still very fishy. > I fixed it already. See > https://lore.kernel.org/patchwork/patch/1132459/ You fixed the specific case above for 32-bit ARM, but the arm64 case is due to a compiler bug. As it happens, we've reworked our atomics in 5.4 so that particular issue no longer triggers, but the fact remains that GCC has been shown to screw up explicit register allocation for perfectly legitimate code when giving the flexibility to move code out of line. > The problems are fixable by writing correct code. Right, in the compiler ;) > I think we discussed this already. We did? > - There is nothing arch-specific in CONFIG_OPTIMIZE_INLINING Apart from the bugs... and even then, that's just based on reports. > - 'inline' is just a hint. It does not guarantee function inlining. > This is standard. > > - The kernel macrofies 'inline' to add __attribute__((__always_inline__)) > This terrible hack must end. I'm all for getting rid of hacks, but not at the cost of correctness. > - __attribute__((__always_inline__)) takes aways compiler's freedom, > and prevents it from optimizing the code for -O2, -Os, or whatever. s/whatever/miscompiling the code/ If it helps, here is more information about the arm64 failure which triggered the GCC bugzilla: https://www.spinics.net/lists/arm-kernel/msg730329.html Will
On Mon, Sep 30, 2019 at 5:18 AM Will Deacon <will@kernel.org> wrote: > > On Mon, Sep 30, 2019 at 09:05:11PM +0900, Masahiro Yamada wrote: > > On Mon, Sep 30, 2019 at 8:26 PM Will Deacon <will@kernel.org> wrote: > > > On Fri, Sep 27, 2019 at 03:38:44PM -0700, Linus Torvalds wrote: > > > > Soem of that code is pretty subtle. They have fixed register usage > > > > (but the asm macros actually check them). And the inline asms clobber > > > > the link register, but they do seem to clearly _state_ that they > > > > clobber it, so who knows. > > > > > > > > Just based on the EFAULT, I'd _guess_ that it's some interaction with > > > > the domain access control register (so that get/set_domain() thing). > > > > But I'm not even sure that code is enabled for the Rpi2, so who > > > > knows.. > > > > > > FWIW, we've run into issues with CONFIG_OPTIMIZE_INLINING and local > > > variables marked as 'register' where GCC would do crazy things and end > > > up corrupting data, so I suspect the use of fixed registers in the arm > > > uaccess functions is hitting something similar: > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91111 > > > > No. Not similar at all. > > They're similar in that enabling CONFIG_OPTIMIZE_INLINING causes register > variables to go wrong. I agree that the ARM code looks dodgy with > that call to uaccess_save_and_enable(), but there are __asmeq macros > in there to try to catch that, so it's still very fishy. > > > I fixed it already. See > > https://lore.kernel.org/patchwork/patch/1132459/ > > You fixed the specific case above for 32-bit ARM, but the arm64 case > is due to a compiler bug. As it happens, we've reworked our atomics > in 5.4 so that particular issue no longer triggers, but the fact remains > that GCC has been shown to screw up explicit register allocation for > perfectly legitimate code when giving the flexibility to move code out > of line. So __attribute__((always_inline)) doesn't guarantee that code will be inlined. For instance in LLVM's inliner, it asks/answers "should I inline" and "can I inline." "Should" has to do with a cost model, and is very heuristic-y. "Can" has more to do with the transforms, and whether they're all implemented and safe. If you if you say __attribute__((always_inline)), the answer to "can I inline this" can still be *no*. The only way to guarantee inlining is via the C preprocessor. The only way to prevent inlining is via __attribute__((no_inline)). inline and __attribute__((always_inline)) are a heuristic laden mess and should not be relied upon. I would also look closely at code that *requires* inlining or the lack there of to be correct. That the kernel no longer compiles at -O0 is not a good thing IMO, and hurts developers that want a short compile/execute/debug cycle. In this case, if there's a known codegen bug in a particular compiler or certain versions of it, I recommend the use of either the C preprocessor or __attribute__((no_inline)) to get the desired behavior localized to the function in question, and for us to proceed with Masahiro's cleanup. The comment above the use of CONFIG_OPTIMIZE_INLINING in include/linux/compiler_types.h says: * Force always-inline if the user requests it so via the .config. Which makes me grimace (__attribute__((always_inline)) doesn't *force* anything as per above), and the idea that forcing things marked inline to also be __attribute__((always_inline)) is an "optimization" (re: the name of the config; CONFIG_OPTIMIZE_INLINING) is also highly suspect. Aggressive inlining leads to image size bloat, instruction cache and register pressure; it is not exclusively an optimization. > > > The problems are fixable by writing correct code. > > Right, in the compiler ;) > > > I think we discussed this already. > > We did? > > > - There is nothing arch-specific in CONFIG_OPTIMIZE_INLINING > > Apart from the bugs... and even then, that's just based on reports. > > > - 'inline' is just a hint. It does not guarantee function inlining. > > This is standard. > > > > - The kernel macrofies 'inline' to add __attribute__((__always_inline__)) > > This terrible hack must end. > > I'm all for getting rid of hacks, but not at the cost of correctness. > > > - __attribute__((__always_inline__)) takes aways compiler's freedom, > > and prevents it from optimizing the code for -O2, -Os, or whatever. > > s/whatever/miscompiling the code/ > > If it helps, here is more information about the arm64 failure which > triggered the GCC bugzilla: > > https://www.spinics.net/lists/arm-kernel/msg730329.html > > Will -- Thanks, ~Nick Desaulniers
On Mon, Sep 30, 2019 at 11:50 PM Nick Desaulniers <ndesaulniers@google.com> wrote: > > So __attribute__((always_inline)) doesn't guarantee that code will be > inlined. [...] inline and __attribute__((always_inline)) > are a heuristic laden mess and should not be relied upon. Small note: in GCC, __attribute__((always_inline)) is documented as actually guaranteeing to either inline or error otherwise (although see the remark for indirect calls): "Failure to inline such a function is diagnosed as an error. Note that if such a function is called indirectly the compiler may or may not inline it depending on optimization level and a failure to inline an indirect call may or may not be diagnosed." As for LLVM/Clang, no idea, since it does not say anything about it in the docs -- but from what you say, it is a weaker guarantee. Cheers, Miguel
On Mon, Sep 30, 2019 at 3:08 PM Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> wrote: > > On Mon, Sep 30, 2019 at 11:50 PM Nick Desaulniers > <ndesaulniers@google.com> wrote: > > > > So __attribute__((always_inline)) doesn't guarantee that code will be > > inlined. [...] inline and __attribute__((always_inline)) > > are a heuristic laden mess and should not be relied upon. > > Small note: in GCC, __attribute__((always_inline)) is documented as > actually guaranteeing to either inline or error otherwise (although > see the remark for indirect calls): > > "Failure to inline such a function is diagnosed as an error. Note Not an error, but a warning at least: https://godbolt.org/z/_V5im1. That's interesting, so it has multiple semantics, because it's also documented to inline even when no optimizations are specified. So when someone uses __attribute__((always_inline)) without a comment, it's not clear whether they mean for there to be a warning when this is not inlined, or for it to be inlined at -O0 (guess not for the kernel), or both. If the kernel wants to enforce the former, why not set `-Werror=attributes`? Maybe that warning is too broad? Seems like a recipe for subtly broken code found at runtime, when we'd rather have stronger compile time guarantees. > that if such a function is called indirectly the compiler may or may > not inline it depending on optimization level and a failure to inline > an indirect call may or may not be diagnosed." > > As for LLVM/Clang, no idea, since it does not say anything about it in > the docs -- but from what you say, it is a weaker guarantee. Filed https://bugs.llvm.org/show_bug.cgi?id=43517 -- Thanks, ~Nick Desaulniers
Hi Nick, On Mon, Sep 30, 2019 at 02:50:10PM -0700, Nick Desaulniers wrote: > On Mon, Sep 30, 2019 at 5:18 AM Will Deacon <will@kernel.org> wrote: > > On Mon, Sep 30, 2019 at 09:05:11PM +0900, Masahiro Yamada wrote: > > > On Mon, Sep 30, 2019 at 8:26 PM Will Deacon <will@kernel.org> wrote: > > > > FWIW, we've run into issues with CONFIG_OPTIMIZE_INLINING and local > > > > variables marked as 'register' where GCC would do crazy things and end > > > > up corrupting data, so I suspect the use of fixed registers in the arm > > > > uaccess functions is hitting something similar: > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91111 > > > > > > No. Not similar at all. > > > > They're similar in that enabling CONFIG_OPTIMIZE_INLINING causes register > > variables to go wrong. I agree that the ARM code looks dodgy with > > that call to uaccess_save_and_enable(), but there are __asmeq macros > > in there to try to catch that, so it's still very fishy. > > > > > I fixed it already. See > > > https://lore.kernel.org/patchwork/patch/1132459/ > > > > You fixed the specific case above for 32-bit ARM, but the arm64 case > > is due to a compiler bug. As it happens, we've reworked our atomics > > in 5.4 so that particular issue no longer triggers, but the fact remains > > that GCC has been shown to screw up explicit register allocation for > > perfectly legitimate code when giving the flexibility to move code out > > of line. > > So __attribute__((always_inline)) doesn't guarantee that code will be > inlined. For instance in LLVM's inliner, it asks/answers "should I > inline" and "can I inline." "Should" has to do with a cost model, and > is very heuristic-y. "Can" has more to do with the transforms, and > whether they're all implemented and safe. If you if you say > __attribute__((always_inline)), the answer to "can I inline this" can > still be *no*. The only way to guarantee inlining is via the C > preprocessor. The only way to prevent inlining is via > __attribute__((no_inline)). inline and __attribute__((always_inline)) > are a heuristic laden mess and should not be relied upon. I would > also look closely at code that *requires* inlining or the lack there > of to be correct. That the kernel no longer compiles at -O0 is not a > good thing IMO, and hurts developers that want a short > compile/execute/debug cycle. > > In this case, if there's a known codegen bug in a particular compiler > or certain versions of it, I recommend the use of either the C > preprocessor or __attribute__((no_inline)) to get the desired behavior > localized to the function in question, and for us to proceed with > Masahiro's cleanup. Hmm, I don't see how that would help. The problem occurs when things are moved out of line by the compiler (see below). > The comment above the use of CONFIG_OPTIMIZE_INLINING in > include/linux/compiler_types.h says: > * Force always-inline if the user requests it so via the .config. > Which makes me grimace (__attribute__((always_inline)) doesn't *force* > anything as per above), and the idea that forcing things marked inline > to also be __attribute__((always_inline)) is an "optimization" (re: > the name of the config; CONFIG_OPTIMIZE_INLINING) is also highly > suspect. Aggressive inlining leads to image size bloat, instruction > cache and register pressure; it is not exclusively an optimization. Agreed on all of this, but the fact remains that GCC has been shown to *miscompile* the arm64 kernel with CONFIG_OPTIMIZE_INLINING=y. Please, look at this thread: https://www.spinics.net/lists/arm-kernel/msg730329.html https://www.spinics.net/lists/arm-kernel/msg730512.html GCC decides to pull an atomic operation out-of-line and, in doing so, gets the register allocations subtly wrong when passing a 'register' variable into an inline asm. I would like to avoid this sort of thing happening, since it can result in really nasty bugs that manifest at runtime and are extremely difficult to debug, which is why I would much prefer not to have this option on by default for arm64. I sent a patch already: https://lkml.kernel.org/r/20190930114540.27498-1-will@kernel.org and I'm happy to spin a v2 which depends on !CC_IS_CLANG as well. Reducing the instruction cache footprint is great, but not if the resulting code is broken! Will
Hi Will, On Mon, Sep 30, 2019 at 9:18 PM Will Deacon <will@kernel.org> wrote: > > On Mon, Sep 30, 2019 at 09:05:11PM +0900, Masahiro Yamada wrote: > > On Mon, Sep 30, 2019 at 8:26 PM Will Deacon <will@kernel.org> wrote: > > > On Fri, Sep 27, 2019 at 03:38:44PM -0700, Linus Torvalds wrote: > > > > Soem of that code is pretty subtle. They have fixed register usage > > > > (but the asm macros actually check them). And the inline asms clobber > > > > the link register, but they do seem to clearly _state_ that they > > > > clobber it, so who knows. > > > > > > > > Just based on the EFAULT, I'd _guess_ that it's some interaction with > > > > the domain access control register (so that get/set_domain() thing). > > > > But I'm not even sure that code is enabled for the Rpi2, so who > > > > knows.. > > > > > > FWIW, we've run into issues with CONFIG_OPTIMIZE_INLINING and local > > > variables marked as 'register' where GCC would do crazy things and end > > > up corrupting data, so I suspect the use of fixed registers in the arm > > > uaccess functions is hitting something similar: > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91111 > > > > No. Not similar at all. > > They're similar in that enabling CONFIG_OPTIMIZE_INLINING causes register > variables to go wrong. You are just looking at the result, not the cause. > I agree that the ARM code looks dodgy with > that call to uaccess_save_and_enable(), but there are __asmeq macros > in there to try to catch that, so it's still very fishy. I am totally fine with double-checking the output from the compiler. > > > I fixed it already. See > > https://lore.kernel.org/patchwork/patch/1132459/ > > You fixed the specific case above for 32-bit ARM, but the arm64 case > is due to a compiler bug. As it happens, we've reworked our atomics > in 5.4 so that particular issue no longer triggers, but the fact remains > that GCC has been shown to screw up explicit register allocation for > perfectly legitimate code when giving the flexibility to move code out > of line. > > > The problems are fixable by writing correct code. > > Right, in the compiler ;) Not always. You showed a compiler bug case for arm64. The 32bit ARM is not the case. > > > I think we discussed this already. > > We did? https://www.spinics.net/lists/arm-kernel/msg754567.html This is a bug in the kernel code. > > - There is nothing arch-specific in CONFIG_OPTIMIZE_INLINING > > Apart from the bugs... and even then, that's just based on reports. > > > - 'inline' is just a hint. It does not guarantee function inlining. > > This is standard. > > > > - The kernel macrofies 'inline' to add __attribute__((__always_inline__)) > > This terrible hack must end. > > I'm all for getting rid of hacks, but not at the cost of correctness. > > > - __attribute__((__always_inline__)) takes aways compiler's freedom, > > and prevents it from optimizing the code for -O2, -Os, or whatever. > > s/whatever/miscompiling the code/ > > If it helps, here is more information about the arm64 failure which > triggered the GCC bugzilla: > > https://www.spinics.net/lists/arm-kernel/msg730329.html > > Will You put multiple references here and there, but they are all about arch_atomic64_dec_if_positive(). -- Best Regards Masahiro Yamada
On Tue, Oct 01, 2019 at 06:39:34PM +0900, Masahiro Yamada wrote: > On Mon, Sep 30, 2019 at 9:18 PM Will Deacon <will@kernel.org> wrote: > > I agree that the ARM code looks dodgy with > > that call to uaccess_save_and_enable(), but there are __asmeq macros > > in there to try to catch that, so it's still very fishy. > > I am totally fine with double-checking > the output from the compiler. Sure, but don't you find it odd that those checks didn't fire, and instead the failure occurred at runtime? > > > I fixed it already. See > > > https://lore.kernel.org/patchwork/patch/1132459/ > > > > You fixed the specific case above for 32-bit ARM, but the arm64 case > > is due to a compiler bug. As it happens, we've reworked our atomics > > in 5.4 so that particular issue no longer triggers, but the fact remains > > that GCC has been shown to screw up explicit register allocation for > > perfectly legitimate code when giving the flexibility to move code out > > of line. > > > > > The problems are fixable by writing correct code. > > > > Right, in the compiler ;) > > Not always. > > You showed a compiler bug case for arm64. > The 32bit ARM is not the case. I would be /very/ surprised if the same compiler bug didn't also apply to ARM, which is why I'm championing the conservative approach of restoring the old default behaviour rather than run the risk of runtime failures. > > If it helps, here is more information about the arm64 failure which > > triggered the GCC bugzilla: > > > > https://www.spinics.net/lists/arm-kernel/msg730329.html > > > You put multiple references here and there, > but they are all about arch_atomic64_dec_if_positive(). Right, but that's because it was the atomic64 selftest code which triggered the compiler bug in practice. Without a better understanding of why GCC got this wrong, we can't assume that no other register variables are affected. Will
On Tue, Oct 1, 2019 at 2:28 AM Will Deacon <will@kernel.org> wrote: > > Hi Nick, > > On Mon, Sep 30, 2019 at 02:50:10PM -0700, Nick Desaulniers wrote: > > So __attribute__((always_inline)) doesn't guarantee that code will be > > inlined. For instance in LLVM's inliner, it asks/answers "should I > > inline" and "can I inline." "Should" has to do with a cost model, and > > is very heuristic-y. "Can" has more to do with the transforms, and > > whether they're all implemented and safe. If you if you say > > __attribute__((always_inline)), the answer to "can I inline this" can > > still be *no*. The only way to guarantee inlining is via the C > > preprocessor. The only way to prevent inlining is via > > __attribute__((no_inline)). inline and __attribute__((always_inline)) > > are a heuristic laden mess and should not be relied upon. I would > > also look closely at code that *requires* inlining or the lack there > > of to be correct. That the kernel no longer compiles at -O0 is not a > > good thing IMO, and hurts developers that want a short > > compile/execute/debug cycle. > > > > In this case, if there's a known codegen bug in a particular compiler > > or certain versions of it, I recommend the use of either the C > > preprocessor or __attribute__((no_inline)) to get the desired behavior > > localized to the function in question, and for us to proceed with > > Masahiro's cleanup. > > Hmm, I don't see how that would help. The problem occurs when things > are moved out of line by the compiler (see below). It's being moved out of line because __attribute__((always_inline)) or just inline provide no guarantees that outlining does not occur. It would help to make functions that need to be inlined macros, because the C preprocessor doesn't have that issue. > > > The comment above the use of CONFIG_OPTIMIZE_INLINING in > > include/linux/compiler_types.h says: > > * Force always-inline if the user requests it so via the .config. > > Which makes me grimace (__attribute__((always_inline)) doesn't *force* > > anything as per above), and the idea that forcing things marked inline > > to also be __attribute__((always_inline)) is an "optimization" (re: > > the name of the config; CONFIG_OPTIMIZE_INLINING) is also highly > > suspect. Aggressive inlining leads to image size bloat, instruction > > cache and register pressure; it is not exclusively an optimization. > > Agreed on all of this, but the fact remains that GCC has been shown to > *miscompile* the arm64 kernel with CONFIG_OPTIMIZE_INLINING=y. Please, > look at this thread: > > https://www.spinics.net/lists/arm-kernel/msg730329.html > https://www.spinics.net/lists/arm-kernel/msg730512.html > > GCC decides to pull an atomic operation out-of-line and, in doing so, If the function is incorrect unless inlined, use a macro. > gets the register allocations subtly wrong when passing a 'register' > variable into an inline asm. I would like to avoid this sort of thing > happening, since it can result in really nasty bugs that manifest at > runtime and are extremely difficult to debug, which is why I would much > prefer not to have this option on by default for arm64. I sent a patch > already: > > https://lkml.kernel.org/r/20190930114540.27498-1-will@kernel.org > > and I'm happy to spin a v2 which depends on !CC_IS_CLANG as well. For small things like whether we mark a function always_inline or not, I think it's simpler to just keep the code consistent between compilers, even if it's to work around a bug in one compiler. A comment in the code would be sufficient. > > Reducing the instruction cache footprint is great, but not if the > resulting code is broken! You don't have to convince compiler folks about correctness. ;) Correctness trumps all, especially performance. -- Thanks, ~Nick Desaulniers
On Tue, Oct 01, 2019 at 09:32:25AM -0700, Nick Desaulniers wrote: > On Tue, Oct 1, 2019 at 2:28 AM Will Deacon <will@kernel.org> wrote: > > On Mon, Sep 30, 2019 at 02:50:10PM -0700, Nick Desaulniers wrote: > > > In this case, if there's a known codegen bug in a particular compiler > > > or certain versions of it, I recommend the use of either the C > > > preprocessor or __attribute__((no_inline)) to get the desired behavior > > > localized to the function in question, and for us to proceed with > > > Masahiro's cleanup. > > > > Hmm, I don't see how that would help. The problem occurs when things > > are moved out of line by the compiler (see below). > > It's being moved out of line because __attribute__((always_inline)) or > just inline provide no guarantees that outlining does not occur. It > would help to make functions that need to be inlined macros, because > the C preprocessor doesn't have that issue. Hmm, let me try to put it another way: enabling CONFIG_OPTIMIZE_INLINING has been shown to cause miscompilation with many versions of GCC on arm64 because the compiler moves things out of line that it otherwise doesn't appear to do. I don't see how __attribute__((no_inline)) helps with that, and replacing static functions with macros isn't great for type-checking and argument evaluation. > > > The comment above the use of CONFIG_OPTIMIZE_INLINING in > > > include/linux/compiler_types.h says: > > > * Force always-inline if the user requests it so via the .config. > > > Which makes me grimace (__attribute__((always_inline)) doesn't *force* > > > anything as per above), and the idea that forcing things marked inline > > > to also be __attribute__((always_inline)) is an "optimization" (re: > > > the name of the config; CONFIG_OPTIMIZE_INLINING) is also highly > > > suspect. Aggressive inlining leads to image size bloat, instruction > > > cache and register pressure; it is not exclusively an optimization. > > > > Agreed on all of this, but the fact remains that GCC has been shown to > > *miscompile* the arm64 kernel with CONFIG_OPTIMIZE_INLINING=y. Please, > > look at this thread: > > > > https://www.spinics.net/lists/arm-kernel/msg730329.html > > https://www.spinics.net/lists/arm-kernel/msg730512.html > > > > GCC decides to pull an atomic operation out-of-line and, in doing so, > > If the function is incorrect unless inlined, use a macro. The function is correct per the GCC documentation regarding register variables and inline asm. > > Reducing the instruction cache footprint is great, but not if the > > resulting code is broken! > > You don't have to convince compiler folks about correctness. ;) > Correctness trumps all, especially performance. Then why is this conversation so difficult? :( Will
On Tue, Oct 1, 2019 at 10:01 AM Will Deacon <will@kernel.org> wrote: > > On Tue, Oct 01, 2019 at 09:32:25AM -0700, Nick Desaulniers wrote: > > On Tue, Oct 1, 2019 at 2:28 AM Will Deacon <will@kernel.org> wrote: > > > On Mon, Sep 30, 2019 at 02:50:10PM -0700, Nick Desaulniers wrote: > > > > In this case, if there's a known codegen bug in a particular compiler > > > > or certain versions of it, I recommend the use of either the C > > > > preprocessor or __attribute__((no_inline)) to get the desired behavior > > > > localized to the function in question, and for us to proceed with > > > > Masahiro's cleanup. > > > > > > Hmm, I don't see how that would help. The problem occurs when things > > > are moved out of line by the compiler (see below). > > > > It's being moved out of line because __attribute__((always_inline)) or > > just inline provide no guarantees that outlining does not occur. It > > would help to make functions that need to be inlined macros, because > > the C preprocessor doesn't have that issue. > > Hmm, let me try to put it another way: enabling CONFIG_OPTIMIZE_INLINING > has been shown to cause miscompilation with many versions of GCC on arm64 > because the compiler moves things out of line that it otherwise doesn't > appear to do. Yes. That code should use the C preprocessor to force inlining. Then you can enable CONFIG_OPTIMIZE_INLINING for arm64. > I don't see how __attribute__((no_inline)) helps with that, Because that *wasn't* the point of what I said about __attribute__((no_inline)). My point is to use the C preprocessor to guarantee that no outlining occurs. My point related to __attribute__((no_inline)) was the simple decision matrix that should be used: Do I need strong guarantees that my code must be inlined? Use the C preprocessor. Do I need strong guarantees that my code is *not* to be inlined? Use __attribute__((no_inline)). The *former* is what applies here, not the latter. inline and __attribute__((always_inline)) don't provide strong guarantees (despite their fun sounding names; -funsafe-math-optimizations/-Ofast being the poster child for "that sounds nice, I would like my math to be fun, safe, and fast, let's use that, WCGW?"). The more we use them, the more we continuously be bitten by bugs related to outlining. Like the one you cited and the arm32 patch Masahiro wrote. Would using the C preprocessor to force inlining fix the GCC/arm64 miscompilation bug you described above? I suspect it would. > and replacing static functions with macros isn't great for type-checking > and argument evaluation. See typecheck (include/linux/typecheck.h). > > You don't have to convince compiler folks about correctness. ;) > > Correctness trumps all, especially performance. > > Then why is this conversation so difficult? :( I apologize; I don't mean to be difficult. I would just like to avoid surprises when code written with the assumption that it will be inlined is not. It sounds like we found one issue in arm32 and one in arm64 related to outlining. If we fix those two cases, I think we're close to proceeding with Masahiro's cleanup, which I view as a good thing for the health of the Linux kernel codebase. -- Thanks, ~Nick Desaulniers
On Tue, Oct 01, 2019 at 10:44:43AM -0700, Nick Desaulniers wrote: > I apologize; I don't mean to be difficult. I would just like to avoid > surprises when code written with the assumption that it will be > inlined is not. It sounds like we found one issue in arm32 and one in > arm64 related to outlining. If we fix those two cases, I think we're > close to proceeding with Masahiro's cleanup, which I view as a good > thing for the health of the Linux kernel codebase. Except, using the C preprocessor for this turns the arm32 code into yuck: 1. We'd need to turn get_domain() and set_domain() into multi-line preprocessor macro definitions, using the GCC ({ }) extension so that get_domain() can return a value. 2. uaccess_save_and_enable() and uaccess_restore() also need to become preprocessor macro definitions too. So, we end up with multiple levels of nested preprocessor macros. When something goes wrong, the compiler warning/error message is going to be utterly _horrid_. Now, as to whether an __attribute__((always_inline)) can or can not be inlined... `always_inline' Generally, functions are not inlined unless optimization is specified. For functions declared inline, this attribute inlines the function even if no optimization level is specified. Is this another instance of the compiler folk changing the rules of already documented semantics? This says nothing about "might not be inlined if someone passes some random combination of -f flags". -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up
On Tue, Oct 1, 2019 at 10:55 AM Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > On Tue, Oct 01, 2019 at 10:44:43AM -0700, Nick Desaulniers wrote: > > I apologize; I don't mean to be difficult. I would just like to avoid > > surprises when code written with the assumption that it will be > > inlined is not. It sounds like we found one issue in arm32 and one in > > arm64 related to outlining. If we fix those two cases, I think we're > > close to proceeding with Masahiro's cleanup, which I view as a good > > thing for the health of the Linux kernel codebase. > > Except, using the C preprocessor for this turns the arm32 code into > yuck: > > 1. We'd need to turn get_domain() and set_domain() into multi-line > preprocessor macro definitions, using the GCC ({ }) extension > so that get_domain() can return a value. > > 2. uaccess_save_and_enable() and uaccess_restore() also need to > become preprocessor macro definitions too. > > So, we end up with multiple levels of nested preprocessor macros. > When something goes wrong, the compiler warning/error message is > going to be utterly _horrid_. That's why I preferred V1 of Masahiro's patch, that fixed the inline asm not to make use of caller saved registers before calling a function that might not be inlined. -- Thanks, ~Nick Desaulniers
On Tue, Oct 01, 2019 at 11:00:11AM -0700, Nick Desaulniers wrote: > On Tue, Oct 1, 2019 at 10:55 AM Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: > > > > On Tue, Oct 01, 2019 at 10:44:43AM -0700, Nick Desaulniers wrote: > > > I apologize; I don't mean to be difficult. I would just like to avoid > > > surprises when code written with the assumption that it will be > > > inlined is not. It sounds like we found one issue in arm32 and one in > > > arm64 related to outlining. If we fix those two cases, I think we're > > > close to proceeding with Masahiro's cleanup, which I view as a good > > > thing for the health of the Linux kernel codebase. > > > > Except, using the C preprocessor for this turns the arm32 code into > > yuck: > > > > 1. We'd need to turn get_domain() and set_domain() into multi-line > > preprocessor macro definitions, using the GCC ({ }) extension > > so that get_domain() can return a value. > > > > 2. uaccess_save_and_enable() and uaccess_restore() also need to > > become preprocessor macro definitions too. > > > > So, we end up with multiple levels of nested preprocessor macros. > > When something goes wrong, the compiler warning/error message is > > going to be utterly _horrid_. > > That's why I preferred V1 of Masahiro's patch, that fixed the inline > asm not to make use of caller saved registers before calling a > function that might not be inlined. ... which I objected to based on the fact that this uaccess stuff is supposed to add protection against the kernel being fooled into accessing userspace when it shouldn't. The whole intention there is that [sg]et_domain(), and uaccess_*() are _always_ inlined as close as possible to the call site of the accessor touching userspace. Moving it before the assignments mean that the compiler is then free to issue memory loads/stores to load up those registers, which is exactly what we want to avoid. In any case, I violently disagree with the idea that stuff we have in header files should be permitted not to be inlined because we have soo much that is marked inline. Having it moved out of line, and essentially the same function code appearing in multiple C files is really not an improvement over the current situation with excessive use of inlining. Anyone who has looked at the code resulting from dma_map_single() will know exactly what I'm talking about, which is way in excess of the few instructions we have for the uaccess_* stuff here. The right approach is to move stuff out of line - and by that, I mean _actually_ move the damn code, so that different compilation units can use the same instructions, and thereby gain from the whole point of an instruction cache. The whole "let's make inline not really mean inline" is nothing more than a band-aid to the overuse (and abuse) of "inline". -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up
On Tue, Oct 1, 2019 at 11:14 AM Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > On Tue, Oct 01, 2019 at 11:00:11AM -0700, Nick Desaulniers wrote: > > On Tue, Oct 1, 2019 at 10:55 AM Russell King - ARM Linux admin > > <linux@armlinux.org.uk> wrote: > > > > > > On Tue, Oct 01, 2019 at 10:44:43AM -0700, Nick Desaulniers wrote: > > > > I apologize; I don't mean to be difficult. I would just like to avoid > > > > surprises when code written with the assumption that it will be > > > > inlined is not. It sounds like we found one issue in arm32 and one in > > > > arm64 related to outlining. If we fix those two cases, I think we're > > > > close to proceeding with Masahiro's cleanup, which I view as a good > > > > thing for the health of the Linux kernel codebase. > > > > > > Except, using the C preprocessor for this turns the arm32 code into > > > yuck: > > > > > > 1. We'd need to turn get_domain() and set_domain() into multi-line > > > preprocessor macro definitions, using the GCC ({ }) extension > > > so that get_domain() can return a value. > > > > > > 2. uaccess_save_and_enable() and uaccess_restore() also need to > > > become preprocessor macro definitions too. > > > > > > So, we end up with multiple levels of nested preprocessor macros. > > > When something goes wrong, the compiler warning/error message is > > > going to be utterly _horrid_. > > > > That's why I preferred V1 of Masahiro's patch, that fixed the inline > > asm not to make use of caller saved registers before calling a > > function that might not be inlined. > > ... which I objected to based on the fact that this uaccess stuff is > supposed to add protection against the kernel being fooled into > accessing userspace when it shouldn't. The whole intention there is > that [sg]et_domain(), and uaccess_*() are _always_ inlined as close > as possible to the call site of the accessor touching userspace. Then use the C preprocessor to force the inlining. I'm sorry it's not as pretty as static inline functions. > > Moving it before the assignments mean that the compiler is then free > to issue memory loads/stores to load up those registers, which is > exactly what we want to avoid. > > > In any case, I violently disagree with the idea that stuff we have > in header files should be permitted not to be inlined because we > have soo much that is marked inline. So there's a very important subtly here. There's: 1. code that adds `inline` cause "oh maybe it would be nice to inline this, but if it isn't no big deal" 2. code that if not inlined is somehow not correct. 3. avoid ODR violations via `static inline` I'll posit that "we have soo much that is marked inline [is predominantly case 1 or 3, not case 2]." Case 2 is a code smell, and requires extra scrutiny. > Having it moved out of line, > and essentially the same function code appearing in multiple C files > is really not an improvement over the current situation with excessive > use of inlining. Anyone who has looked at the code resulting from > dma_map_single() will know exactly what I'm talking about, which is > way in excess of the few instructions we have for the uaccess_* stuff > here. > > The right approach is to move stuff out of line - and by that, I > mean _actually_ move the damn code, so that different compilation > units can use the same instructions, and thereby gain from the > whole point of an instruction cache. And be marked __attribute__((noinline)), otherwise might be inlined via LTO. > > The whole "let's make inline not really mean inline" is nothing more > than a band-aid to the overuse (and abuse) of "inline". Let's triple check the ISO C11 draft spec just to be sure: § 6.7.4.6: A function declared with an inline function specifier is an inline function. Making a function an inline function suggests that calls to the function be as fast as possible. The extent to which such suggestions are effective is implementation-defined. 139) 139) For example, an implementation might never perform inline substitution, or might only perform inline substitutions to calls in the scope of an inline declaration. § J.3.8 [Undefined Behavior] Hints: The extent to which suggestions made by using the inline function specifier are effective (6.7.4). My translation: "Please don't assume inline means anything." For the unspecified GNU C extension __attribute__((always_inline)), it seems to me like it's meant more for performing inlining (an optimization) at -O0. Whether the compiler warns or not seems like a nice side effect, but provides no strong guarantee otherwise. I'm sorry that so much code may have been written with that assumption, and I'm sorry to be the bearer of bad news, but this isn't a recent change. If code was written under false assumptions, it should be rewritten. Sorry. -- Thanks, ~Nick Desaulniers
On Tue, Oct 1, 2019 at 10:21 PM 'Nick Desaulniers' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote: > On Tue, Oct 1, 2019 at 11:14 AM Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: > > > > On Tue, Oct 01, 2019 at 11:00:11AM -0700, Nick Desaulniers wrote: > > > On Tue, Oct 1, 2019 at 10:55 AM Russell King - ARM Linux admin > > > <linux@armlinux.org.uk> wrote: > > In any case, I violently disagree with the idea that stuff we have > > in header files should be permitted not to be inlined because we > > have soo much that is marked inline. > > So there's a very important subtly here. There's: > 1. code that adds `inline` cause "oh maybe it would be nice to inline > this, but if it isn't no big deal" > 2. code that if not inlined is somehow not correct. > 3. avoid ODR violations via `static inline` > > I'll posit that "we have soo much that is marked inline [is > predominantly case 1 or 3, not case 2]." Case 2 is a code smell, and > requires extra scrutiny. 1. is clearly the most common case, but there is also 4. Some compiler version (possibly long gone, possibly still current) makes bad inlining decisions that result in horrible but functionally correct object code for a particular function, and forcing a function to be inlined results in what we had expected the compiler to do already. The problem with 2. is that they are largely not documented, and often unintentional. This also affects functions that are simply 'static' rather than 'static inline' but are broken when not being inlined. E.g. we've had a number of kernel bugs with 'static __init' functions that gcc always inlined into a non-__init caller, but that produced an invalid call into a discarded function when clang decided to leave them out of line. Arnd
On Tue, Oct 01, 2019 at 01:21:44PM -0700, Nick Desaulniers wrote: > On Tue, Oct 1, 2019 at 11:14 AM Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: > > > > On Tue, Oct 01, 2019 at 11:00:11AM -0700, Nick Desaulniers wrote: > > > On Tue, Oct 1, 2019 at 10:55 AM Russell King - ARM Linux admin > > > <linux@armlinux.org.uk> wrote: > > > > > > > > On Tue, Oct 01, 2019 at 10:44:43AM -0700, Nick Desaulniers wrote: > > > > > I apologize; I don't mean to be difficult. I would just like to avoid > > > > > surprises when code written with the assumption that it will be > > > > > inlined is not. It sounds like we found one issue in arm32 and one in > > > > > arm64 related to outlining. If we fix those two cases, I think we're > > > > > close to proceeding with Masahiro's cleanup, which I view as a good > > > > > thing for the health of the Linux kernel codebase. > > > > > > > > Except, using the C preprocessor for this turns the arm32 code into > > > > yuck: > > > > > > > > 1. We'd need to turn get_domain() and set_domain() into multi-line > > > > preprocessor macro definitions, using the GCC ({ }) extension > > > > so that get_domain() can return a value. > > > > > > > > 2. uaccess_save_and_enable() and uaccess_restore() also need to > > > > become preprocessor macro definitions too. > > > > > > > > So, we end up with multiple levels of nested preprocessor macros. > > > > When something goes wrong, the compiler warning/error message is > > > > going to be utterly _horrid_. > > > > > > That's why I preferred V1 of Masahiro's patch, that fixed the inline > > > asm not to make use of caller saved registers before calling a > > > function that might not be inlined. > > > > ... which I objected to based on the fact that this uaccess stuff is > > supposed to add protection against the kernel being fooled into > > accessing userspace when it shouldn't. The whole intention there is > > that [sg]et_domain(), and uaccess_*() are _always_ inlined as close > > as possible to the call site of the accessor touching userspace. > > Then use the C preprocessor to force the inlining. I'm sorry it's not > as pretty as static inline functions. > > > > > Moving it before the assignments mean that the compiler is then free > > to issue memory loads/stores to load up those registers, which is > > exactly what we want to avoid. > > > > > > In any case, I violently disagree with the idea that stuff we have > > in header files should be permitted not to be inlined because we > > have soo much that is marked inline. > > So there's a very important subtly here. There's: > 1. code that adds `inline` cause "oh maybe it would be nice to inline > this, but if it isn't no big deal" > 2. code that if not inlined is somehow not correct. > 3. avoid ODR violations via `static inline` > > I'll posit that "we have soo much that is marked inline [is > predominantly case 1 or 3, not case 2]." Case 2 is a code smell, and > requires extra scrutiny. > > > Having it moved out of line, > > and essentially the same function code appearing in multiple C files > > is really not an improvement over the current situation with excessive > > use of inlining. Anyone who has looked at the code resulting from > > dma_map_single() will know exactly what I'm talking about, which is > > way in excess of the few instructions we have for the uaccess_* stuff > > here. > > > > The right approach is to move stuff out of line - and by that, I > > mean _actually_ move the damn code, so that different compilation > > units can use the same instructions, and thereby gain from the > > whole point of an instruction cache. > > And be marked __attribute__((noinline)), otherwise might be inlined via LTO. > > > > > The whole "let's make inline not really mean inline" is nothing more > > than a band-aid to the overuse (and abuse) of "inline". > > Let's triple check the ISO C11 draft spec just to be sure: > § 6.7.4.6: A function declared with an inline function specifier is an > inline function. Making a > function an inline function suggests that calls to the function be as > fast as possible. > The extent to which such suggestions are effective is > implementation-defined. 139) > 139) For example, an implementation might never perform inline > substitution, or might only perform inline > substitutions to calls in the scope of an inline declaration. > § J.3.8 [Undefined Behavior] Hints: The extent to which suggestions > made by using the inline function specifier are effective (6.7.4). > > My translation: > "Please don't assume inline means anything." > > For the unspecified GNU C extension __attribute__((always_inline)), it > seems to me like it's meant more for performing inlining (an > optimization) at -O0. Whether the compiler warns or not seems like a > nice side effect, but provides no strong guarantee otherwise. > > I'm sorry that so much code may have been written with that > assumption, and I'm sorry to be the bearer of bad news, but this isn't > a recent change. If code was written under false assumptions, it > should be rewritten. Sorry. You may quote C11, but that is not relevent. The kernel is coded to gnu89 standard - see the -std=gnu89 flag. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up
On Tue, Oct 1, 2019 at 10:53 PM Arnd Bergmann <arnd@arndb.de> wrote: > > 1. is clearly the most common case, but there is also > > 4. Some compiler version (possibly long gone, possibly still current) > makes bad inlining decisions that result in horrible but functionally > correct object code for a particular function, and forcing a function to > be inlined results in what we had expected the compiler to do already. There is also 5. code that does not even compile without it, e.g. _static_cpu_has() in x86_64 which requires __attribute__((always_inline)), at least on GCC 9.2. For x64_64 it is the only one case I found, though. If you disable __always_inline everything else compiles and links (in a defconfig). Cheers, Miguel
On Tue, Oct 1, 2019 at 2:06 PM Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> wrote: > > On Tue, Oct 1, 2019 at 10:53 PM Arnd Bergmann <arnd@arndb.de> wrote: > > > > 1. is clearly the most common case, but there is also > > > > 4. Some compiler version (possibly long gone, possibly still current) > > makes bad inlining decisions that result in horrible but functionally > > correct object code for a particular function, and forcing a function to > > be inlined results in what we had expected the compiler to do already. > > There is also 5. code that does not even compile without it, e.g. > _static_cpu_has() in x86_64 which requires > __attribute__((always_inline)), at least on GCC 9.2. I assert that's just another case of 2, and should be investigated. (I think I remember that from when I had to teach LLVM how to inline asm goto; since the compiler can reject inlining if it doesn't know how to do such a transform). > > For x64_64 it is the only one case I found, though. If you disable > __always_inline everything else compiles and links (in a defconfig). Cool, so one bug in arm32, one bug in arm64, one bug in x86_64. Doesn't sound like too much work to fix. -- Thanks, ~Nick Desaulniers
On Tue, Oct 01, 2019 at 09:59:38PM +0100, Russell King - ARM Linux admin wrote: > On Tue, Oct 01, 2019 at 01:21:44PM -0700, Nick Desaulniers wrote: > > On Tue, Oct 1, 2019 at 11:14 AM Russell King - ARM Linux admin > > <linux@armlinux.org.uk> wrote: > > > > > > On Tue, Oct 01, 2019 at 11:00:11AM -0700, Nick Desaulniers wrote: > > > > On Tue, Oct 1, 2019 at 10:55 AM Russell King - ARM Linux admin > > > > <linux@armlinux.org.uk> wrote: > > > > > > > > > > On Tue, Oct 01, 2019 at 10:44:43AM -0700, Nick Desaulniers wrote: > > > > > > I apologize; I don't mean to be difficult. I would just like to avoid > > > > > > surprises when code written with the assumption that it will be > > > > > > inlined is not. It sounds like we found one issue in arm32 and one in > > > > > > arm64 related to outlining. If we fix those two cases, I think we're > > > > > > close to proceeding with Masahiro's cleanup, which I view as a good > > > > > > thing for the health of the Linux kernel codebase. > > > > > > > > > > Except, using the C preprocessor for this turns the arm32 code into > > > > > yuck: > > > > > > > > > > 1. We'd need to turn get_domain() and set_domain() into multi-line > > > > > preprocessor macro definitions, using the GCC ({ }) extension > > > > > so that get_domain() can return a value. > > > > > > > > > > 2. uaccess_save_and_enable() and uaccess_restore() also need to > > > > > become preprocessor macro definitions too. > > > > > > > > > > So, we end up with multiple levels of nested preprocessor macros. > > > > > When something goes wrong, the compiler warning/error message is > > > > > going to be utterly _horrid_. > > > > > > > > That's why I preferred V1 of Masahiro's patch, that fixed the inline > > > > asm not to make use of caller saved registers before calling a > > > > function that might not be inlined. > > > > > > ... which I objected to based on the fact that this uaccess stuff is > > > supposed to add protection against the kernel being fooled into > > > accessing userspace when it shouldn't. The whole intention there is > > > that [sg]et_domain(), and uaccess_*() are _always_ inlined as close > > > as possible to the call site of the accessor touching userspace. > > > > Then use the C preprocessor to force the inlining. I'm sorry it's not > > as pretty as static inline functions. > > > > > > > > Moving it before the assignments mean that the compiler is then free > > > to issue memory loads/stores to load up those registers, which is > > > exactly what we want to avoid. > > > > > > > > > In any case, I violently disagree with the idea that stuff we have > > > in header files should be permitted not to be inlined because we > > > have soo much that is marked inline. > > > > So there's a very important subtly here. There's: > > 1. code that adds `inline` cause "oh maybe it would be nice to inline > > this, but if it isn't no big deal" > > 2. code that if not inlined is somehow not correct. > > 3. avoid ODR violations via `static inline` > > > > I'll posit that "we have soo much that is marked inline [is > > predominantly case 1 or 3, not case 2]." Case 2 is a code smell, and > > requires extra scrutiny. > > > > > Having it moved out of line, > > > and essentially the same function code appearing in multiple C files > > > is really not an improvement over the current situation with excessive > > > use of inlining. Anyone who has looked at the code resulting from > > > dma_map_single() will know exactly what I'm talking about, which is > > > way in excess of the few instructions we have for the uaccess_* stuff > > > here. > > > > > > The right approach is to move stuff out of line - and by that, I > > > mean _actually_ move the damn code, so that different compilation > > > units can use the same instructions, and thereby gain from the > > > whole point of an instruction cache. > > > > And be marked __attribute__((noinline)), otherwise might be inlined via LTO. > > > > > > > > The whole "let's make inline not really mean inline" is nothing more > > > than a band-aid to the overuse (and abuse) of "inline". > > > > Let's triple check the ISO C11 draft spec just to be sure: > > § 6.7.4.6: A function declared with an inline function specifier is an > > inline function. Making a > > function an inline function suggests that calls to the function be as > > fast as possible. > > The extent to which such suggestions are effective is > > implementation-defined. 139) > > 139) For example, an implementation might never perform inline > > substitution, or might only perform inline > > substitutions to calls in the scope of an inline declaration. > > § J.3.8 [Undefined Behavior] Hints: The extent to which suggestions > > made by using the inline function specifier are effective (6.7.4). > > > > My translation: > > "Please don't assume inline means anything." > > > > For the unspecified GNU C extension __attribute__((always_inline)), it > > seems to me like it's meant more for performing inlining (an > > optimization) at -O0. Whether the compiler warns or not seems like a > > nice side effect, but provides no strong guarantee otherwise. > > > > I'm sorry that so much code may have been written with that > > assumption, and I'm sorry to be the bearer of bad news, but this isn't > > a recent change. If code was written under false assumptions, it > > should be rewritten. Sorry. > > You may quote C11, but that is not relevent. The kernel is coded to > gnu89 standard - see the -std=gnu89 flag. There's more to this and why C11 is entirely irrelevant. The "inline" you see in our headers is not the compiler keyword that you find in various C standards, it is a macro that gets expanded to either: #define inline inline __attribute__((__always_inline__)) __gnu_inline \ __maybe_unused notrace or #define inline inline __gnu_inline \ __maybe_unused notrace __gnu_inline is defined as: #define __gnu_inline __attribute__((__gnu_inline__)) So this attaches the gnu_inline attribute to the function: `gnu_inline' This attribute should be used with a function that is also declared with the `inline' keyword. It directs GCC to treat the function as if it were defined in gnu90 mode even when compiling in C99 or gnu99 mode. ... Since ISO C99 specifies a different semantics for `inline', this function attribute is provided as a transition measure and as a useful feature in its own right. This attribute is available in GCC 4.1.3 and later. It is available if either of the preprocessor macros `__GNUC_GNU_INLINE__' or `__GNUC_STDC_INLINE__' are defined. *Note An Inline Function is As Fast As a Macro: Inline. which is quite clear that C99 semantics do not apply to _this_ inline. The manual goes on to explain: GCC implements three different semantics of declaring a function inline. One is available with `-std=gnu89' or `-fgnu89-inline' or when `gnu_inline' attribute is present on all inline declarations, another when `-std=c99', `-std=c11', `-std=gnu99' or `-std=gnu11' (without `-fgnu89-inline'), and the third is used when compiling C++. I'd suggest gnu90 mode is pretty similar to gnu89 mode, and as we build the kernel in gnu89 mode, that is the inlining definition that is appropriate. When it comes to __always_inline, the GCC manual is the definitive reference, since we use the GCC attribute for that: #define __always_inline inline __attribute__((__always_inline__)) and I've already quoted what the GCC manual says for always_inline. Arguing about what the C11 spec says about inlining when we aren't using C11 dialect in the kernel, but are using GCC features, does not move the discussion on. Thanks anyway, maybe it will become relevent in the future if we decide to move to C11. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up
On Tue, Oct 1, 2019 at 2:26 PM Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > On Tue, Oct 01, 2019 at 09:59:38PM +0100, Russell King - ARM Linux admin wrote: > > On Tue, Oct 01, 2019 at 01:21:44PM -0700, Nick Desaulniers wrote: > > > On Tue, Oct 1, 2019 at 11:14 AM Russell King - ARM Linux admin > > > <linux@armlinux.org.uk> wrote: > > > > > > > > The whole "let's make inline not really mean inline" is nothing more > > > > than a band-aid to the overuse (and abuse) of "inline". > > > > > > Let's triple check the ISO C11 draft spec just to be sure: > > > § 6.7.4.6: A function declared with an inline function specifier is an > > > inline function. Making a > > > function an inline function suggests that calls to the function be as > > > fast as possible. > > > The extent to which such suggestions are effective is > > > implementation-defined. 139) > > > 139) For example, an implementation might never perform inline > > > substitution, or might only perform inline > > > substitutions to calls in the scope of an inline declaration. > > > § J.3.8 [Undefined Behavior] Hints: The extent to which suggestions > > > made by using the inline function specifier are effective (6.7.4). > > > > > > My translation: > > > "Please don't assume inline means anything." > > > > > > For the unspecified GNU C extension __attribute__((always_inline)), it > > > seems to me like it's meant more for performing inlining (an > > > optimization) at -O0. Whether the compiler warns or not seems like a > > > nice side effect, but provides no strong guarantee otherwise. > > > > > > I'm sorry that so much code may have been written with that > > > assumption, and I'm sorry to be the bearer of bad news, but this isn't > > > a recent change. If code was written under false assumptions, it > > > should be rewritten. Sorry. > > > > You may quote C11, but that is not relevent. The kernel is coded to > > gnu89 standard - see the -std=gnu89 flag. > > There's more to this and why C11 is entirely irrelevant. The "inline" > you see in our headers is not the compiler keyword that you find in > various C standards, it is a macro that gets expanded to either: > > #define inline inline __attribute__((__always_inline__)) __gnu_inline \ > __maybe_unused notrace > > or > > #define inline inline __gnu_inline \ > __maybe_unused notrace > > __gnu_inline is defined as: > > #define __gnu_inline __attribute__((__gnu_inline__)) > > So this attaches the gnu_inline attribute to the function: > > `gnu_inline' > This attribute should be used with a function that is also declared > with the `inline' keyword. It directs GCC to treat the function > as if it were defined in gnu90 mode even when compiling in C99 or > gnu99 mode. > ... > Since ISO C99 specifies a different semantics for `inline', this > function attribute is provided as a transition measure and as a > useful feature in its own right. This attribute is available in > GCC 4.1.3 and later. It is available if either of the > preprocessor macros `__GNUC_GNU_INLINE__' or > `__GNUC_STDC_INLINE__' are defined. *Note An Inline Function is > As Fast As a Macro: Inline. > > which is quite clear that C99 semantics do not apply to _this_ inline. > The manual goes on to explain: > > GCC implements three different semantics of declaring a function > inline. One is available with `-std=gnu89' or `-fgnu89-inline' or when > `gnu_inline' attribute is present on all inline declarations, another > when `-std=c99', `-std=c11', `-std=gnu99' or `-std=gnu11' (without > `-fgnu89-inline'), and the third is used when compiling C++. (I wrote the kernel patch for gnu_inline; it only comes into play when `inline` appears on a function *also defined as `extern`*). > > I'd suggest gnu90 mode is pretty similar to gnu89 mode, and as we build > the kernel in gnu89 mode, that is the inlining definition that is > appropriate. > > When it comes to __always_inline, the GCC manual is the definitive > reference, since we use the GCC attribute for that: > > #define __always_inline inline __attribute__((__always_inline__)) > > and I've already quoted what the GCC manual says for always_inline. > > Arguing about what the C11 spec says about inlining when we aren't > using C11 dialect in the kernel, but are using GCC features, does > not move the discussion on. > > Thanks anyway, maybe it will become relevent in the future if we > decide to move to C11. It's not like the semantics of inline are better specified by an older standard, or changed (The only real semantic change involving `inline` between ISO C90 and ISO C99 has to do with whether `extern inline` emits the function with external linkage as you noted). But that's irrelevant to the discussion.). I quoted C11 because ctrl+f doesn't work for the C90 ISO spec pdf. C90 spec doesn't even have a section on Function Specifiers. From what I can tell, `inline` wasn't specified until ISO C99. GNU modes are often modifiers off of ISO C bases; gnu89 corresponds to ISO C90. They may permit the use of features from newer ISO C specs and GNU C extensions without warning under -Wpedantic. There aren't a whole lot of semantic differences, at least that I'm aware of. Please don't assume inline means anything. -- Thanks, ~Nick Desaulniers
On Tue, Oct 01, 2019 at 02:32:54PM -0700, Nick Desaulniers wrote: > On Tue, Oct 1, 2019 at 2:26 PM Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: > > > > On Tue, Oct 01, 2019 at 09:59:38PM +0100, Russell King - ARM Linux admin wrote: > > > On Tue, Oct 01, 2019 at 01:21:44PM -0700, Nick Desaulniers wrote: > > > > On Tue, Oct 1, 2019 at 11:14 AM Russell King - ARM Linux admin > > > > <linux@armlinux.org.uk> wrote: > > > > > > > > > > The whole "let's make inline not really mean inline" is nothing more > > > > > than a band-aid to the overuse (and abuse) of "inline". > > > > > > > > Let's triple check the ISO C11 draft spec just to be sure: > > > > § 6.7.4.6: A function declared with an inline function specifier is an > > > > inline function. Making a > > > > function an inline function suggests that calls to the function be as > > > > fast as possible. > > > > The extent to which such suggestions are effective is > > > > implementation-defined. 139) > > > > 139) For example, an implementation might never perform inline > > > > substitution, or might only perform inline > > > > substitutions to calls in the scope of an inline declaration. > > > > § J.3.8 [Undefined Behavior] Hints: The extent to which suggestions > > > > made by using the inline function specifier are effective (6.7.4). > > > > > > > > My translation: > > > > "Please don't assume inline means anything." > > > > > > > > For the unspecified GNU C extension __attribute__((always_inline)), it > > > > seems to me like it's meant more for performing inlining (an > > > > optimization) at -O0. Whether the compiler warns or not seems like a > > > > nice side effect, but provides no strong guarantee otherwise. > > > > > > > > I'm sorry that so much code may have been written with that > > > > assumption, and I'm sorry to be the bearer of bad news, but this isn't > > > > a recent change. If code was written under false assumptions, it > > > > should be rewritten. Sorry. > > > > > > You may quote C11, but that is not relevent. The kernel is coded to > > > gnu89 standard - see the -std=gnu89 flag. > > > > There's more to this and why C11 is entirely irrelevant. The "inline" > > you see in our headers is not the compiler keyword that you find in > > various C standards, it is a macro that gets expanded to either: > > > > #define inline inline __attribute__((__always_inline__)) __gnu_inline \ > > __maybe_unused notrace > > > > or > > > > #define inline inline __gnu_inline \ > > __maybe_unused notrace > > > > __gnu_inline is defined as: > > > > #define __gnu_inline __attribute__((__gnu_inline__)) > > > > So this attaches the gnu_inline attribute to the function: > > > > `gnu_inline' > > This attribute should be used with a function that is also declared > > with the `inline' keyword. It directs GCC to treat the function > > as if it were defined in gnu90 mode even when compiling in C99 or > > gnu99 mode. > > ... > > Since ISO C99 specifies a different semantics for `inline', this > > function attribute is provided as a transition measure and as a > > useful feature in its own right. This attribute is available in > > GCC 4.1.3 and later. It is available if either of the > > preprocessor macros `__GNUC_GNU_INLINE__' or > > `__GNUC_STDC_INLINE__' are defined. *Note An Inline Function is > > As Fast As a Macro: Inline. > > > > which is quite clear that C99 semantics do not apply to _this_ inline. > > The manual goes on to explain: > > > > GCC implements three different semantics of declaring a function > > inline. One is available with `-std=gnu89' or `-fgnu89-inline' or when > > `gnu_inline' attribute is present on all inline declarations, another > > when `-std=c99', `-std=c11', `-std=gnu99' or `-std=gnu11' (without > > `-fgnu89-inline'), and the third is used when compiling C++. > > (I wrote the kernel patch for gnu_inline; it only comes into play when > `inline` appears on a function *also defined as `extern`*). From what I can tell reading the GCC manual, the patch adding gnu_inline should have no effect. Maybe it was written before -std=gnu89 was in use by the kernel makefiles? > > I'd suggest gnu90 mode is pretty similar to gnu89 mode, and as we build > > the kernel in gnu89 mode, that is the inlining definition that is > > appropriate. > > > > When it comes to __always_inline, the GCC manual is the definitive > > reference, since we use the GCC attribute for that: > > > > #define __always_inline inline __attribute__((__always_inline__)) > > > > and I've already quoted what the GCC manual says for always_inline. > > > > Arguing about what the C11 spec says about inlining when we aren't > > using C11 dialect in the kernel, but are using GCC features, does > > not move the discussion on. > > > > Thanks anyway, maybe it will become relevent in the future if we > > decide to move to C11. > > It's not like the semantics of inline are better specified by an older > standard, or changed (The only real semantic change involving `inline` > between ISO C90 and ISO C99 has to do with whether `extern inline` > emits the function with external linkage as you noted). But that's > irrelevant to the discussion.). I quoted C11 because ctrl+f doesn't > work for the C90 ISO spec pdf. C90 spec doesn't even have a section > on Function Specifiers. From what I can tell, `inline` wasn't > specified until ISO C99. > > GNU modes are often modifiers off of ISO C bases; gnu89 corresponds to > ISO C90. They may permit the use of features from newer ISO C specs > and GNU C extensions without warning under -Wpedantic. There aren't a > whole lot of semantic differences, at least that I'm aware of. Right, so GCC had inlining support before ISO C added it (which I distinctly remember, I've been involved in Linux since 1994.) Unless ISO C based their definition in some way off GCC's implementation, I still don't see how quoting ISO C documents helps this discussion when it's the GCC feature that we're using. And none of this is relevent anyway if we use the GCC always_inline extension *which is obviously the right way to resolve this instance*. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up
Hi Nick, On Wed, Oct 2, 2019 at 6:33 AM Nick Desaulniers <ndesaulniers@google.com> wrote: > On Tue, Oct 1, 2019 at 11:14 AM Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: > > On Tue, Oct 01, 2019 at 11:00:11AM -0700, Nick Desaulniers wrote: > > > On Tue, Oct 1, 2019 at 10:55 AM Russell King - ARM Linux admin > > > <linux@armlinux.org.uk> wrote: > > > > On Tue, Oct 01, 2019 at 10:44:43AM -0700, Nick Desaulniers wrote: > > > > > I apologize; I don't mean to be difficult. I would just like to avoid > > > > > surprises when code written with the assumption that it will be > > > > > inlined is not. It sounds like we found one issue in arm32 and one in > > > > > arm64 related to outlining. If we fix those two cases, I think we're > > > > > close to proceeding with Masahiro's cleanup, which I view as a good > > > > > thing for the health of the Linux kernel codebase. > > > > > > > > Except, using the C preprocessor for this turns the arm32 code into > > > > yuck: > > > > > > > > 1. We'd need to turn get_domain() and set_domain() into multi-line > > > > preprocessor macro definitions, using the GCC ({ }) extension > > > > so that get_domain() can return a value. > > > > > > > > 2. uaccess_save_and_enable() and uaccess_restore() also need to > > > > become preprocessor macro definitions too. > > > > > > > > So, we end up with multiple levels of nested preprocessor macros. > > > > When something goes wrong, the compiler warning/error message is > > > > going to be utterly _horrid_. > > > > > > That's why I preferred V1 of Masahiro's patch, that fixed the inline > > > asm not to make use of caller saved registers before calling a > > > function that might not be inlined. > > > > ... which I objected to based on the fact that this uaccess stuff is > > supposed to add protection against the kernel being fooled into > > accessing userspace when it shouldn't. The whole intention there is > > that [sg]et_domain(), and uaccess_*() are _always_ inlined as close > > as possible to the call site of the accessor touching userspace. > > Then use the C preprocessor to force the inlining. I'm sorry it's not > as pretty as static inline functions. Which makes us lose the baby^H^H^H^Htype checking performed on function parameters, requiring to add more ugly checks. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
On Wed, Oct 02, 2019 at 02:55:50PM +0200, Geert Uytterhoeven wrote: > On Wed, Oct 2, 2019 at 6:33 AM Nick Desaulniers <ndesaulniers@google.com> wrote: > > On Tue, Oct 1, 2019 at 11:14 AM Russell King - ARM Linux admin > > <linux@armlinux.org.uk> wrote: > > > On Tue, Oct 01, 2019 at 11:00:11AM -0700, Nick Desaulniers wrote: > > > > On Tue, Oct 1, 2019 at 10:55 AM Russell King - ARM Linux admin > > > > <linux@armlinux.org.uk> wrote: > > > > > On Tue, Oct 01, 2019 at 10:44:43AM -0700, Nick Desaulniers wrote: > > > > > > I apologize; I don't mean to be difficult. I would just like to avoid > > > > > > surprises when code written with the assumption that it will be > > > > > > inlined is not. It sounds like we found one issue in arm32 and one in > > > > > > arm64 related to outlining. If we fix those two cases, I think we're > > > > > > close to proceeding with Masahiro's cleanup, which I view as a good > > > > > > thing for the health of the Linux kernel codebase. > > > > > > > > > > Except, using the C preprocessor for this turns the arm32 code into > > > > > yuck: > > > > > > > > > > 1. We'd need to turn get_domain() and set_domain() into multi-line > > > > > preprocessor macro definitions, using the GCC ({ }) extension > > > > > so that get_domain() can return a value. > > > > > > > > > > 2. uaccess_save_and_enable() and uaccess_restore() also need to > > > > > become preprocessor macro definitions too. > > > > > > > > > > So, we end up with multiple levels of nested preprocessor macros. > > > > > When something goes wrong, the compiler warning/error message is > > > > > going to be utterly _horrid_. > > > > > > > > That's why I preferred V1 of Masahiro's patch, that fixed the inline > > > > asm not to make use of caller saved registers before calling a > > > > function that might not be inlined. > > > > > > ... which I objected to based on the fact that this uaccess stuff is > > > supposed to add protection against the kernel being fooled into > > > accessing userspace when it shouldn't. The whole intention there is > > > that [sg]et_domain(), and uaccess_*() are _always_ inlined as close > > > as possible to the call site of the accessor touching userspace. > > > > Then use the C preprocessor to force the inlining. I'm sorry it's not > > as pretty as static inline functions. > > Which makes us lose the baby^H^H^H^Htype checking performed > on function parameters, requiring to add more ugly checks. Indeed, and the resulting mess is (at least in my opinion) considerably worse than where we were in 5.3 and earlier kernels with 'inline' defined as '__always_inline'. Will
On Wed, Oct 2, 2019 at 5:56 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > > > > Then use the C preprocessor to force the inlining. I'm sorry it's not > > as pretty as static inline functions. > > Which makes us lose the baby^H^H^H^Htype checking performed > on function parameters, requiring to add more ugly checks. I'm 100% agreed on this. If the inline change is being pushed by people who say "you should have used macros instead if you wanted inlining", then I will just revert that stupid commit that is causing problems. No, the preprocessor is not the answer. That said, code that relies on inlining for _correctness_ should use "__always_inline" and possibly even have a comment about why. But I am considering just undoing commit 9012d011660e ("compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING") entirely. The advantages are questionable, and when the advantages are balanced against actual regressions and the arguments are "use macros", that just shows how badly thought out this was. Linus
On Thu, Oct 3, 2019 at 5:46 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Wed, Oct 2, 2019 at 5:56 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > > > > > > > Then use the C preprocessor to force the inlining. I'm sorry it's not > > > as pretty as static inline functions. > > > > Which makes us lose the baby^H^H^H^Htype checking performed > > on function parameters, requiring to add more ugly checks. > > I'm 100% agreed on this. > > If the inline change is being pushed by people who say "you should > have used macros instead if you wanted inlining", then I will just > revert that stupid commit that is causing problems. > > No, the preprocessor is not the answer. > > That said, code that relies on inlining for _correctness_ should use > "__always_inline" and possibly even have a comment about why. > > But I am considering just undoing commit 9012d011660e ("compiler: > allow all arches to enable CONFIG_OPTIMIZE_INLINING") entirely. No, please do not. Macrofying the 'inline' is a horrid mistake that makes incorrect code work. It would eternally prevent people from writing portable, correct code. Please do not encourage to hide problems. > The > advantages are questionable, and when the advantages are balanced > against actual regressions and the arguments are "use macros", that > just shows how badly thought out this was. > > Linus -- Best Regards Masahiro Yamada
On Wed, Oct 02, 2019 at 01:39:40PM -0700, Linus Torvalds wrote: > On Wed, Oct 2, 2019 at 5:56 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > > > > > > > Then use the C preprocessor to force the inlining. I'm sorry it's not > > > as pretty as static inline functions. > > > > Which makes us lose the baby^H^H^H^Htype checking performed > > on function parameters, requiring to add more ugly checks. > > I'm 100% agreed on this. > > If the inline change is being pushed by people who say "you should > have used macros instead if you wanted inlining", then I will just > revert that stupid commit that is causing problems. > > No, the preprocessor is not the answer. > > That said, code that relies on inlining for _correctness_ should use > "__always_inline" and possibly even have a comment about why. > > But I am considering just undoing commit 9012d011660e ("compiler: > allow all arches to enable CONFIG_OPTIMIZE_INLINING") entirely. The > advantages are questionable, and when the advantages are balanced > against actual regressions and the arguments are "use macros", that > just shows how badly thought out this was. It's clear that opinions are divided on this issue, but you can add an enthusiastic: Acked-by: Will Deacon <will@kernel.org> if you go ahead with the revert. I'm all for allowing the compiler to make its own inlining decisions, but not when the potential for miscompilation isn't fully understood and the proposed alternatives turn the source into an unreadable mess. Perhaps we can do something different for 5.5 (arch opt-in? clang only? invert the logic? work to move functions over to __always_inline /before/ flipping the CONFIG option? ...?) Will
On Wed, Oct 2, 2019 at 7:11 PM Masahiro Yamada <yamada.masahiro@socionext.com> wrote: > > Macrofying the 'inline' is a horrid mistake that makes incorrect code work. > It would eternally prevent people from writing portable, correct code. > Please do not encourage to hide problems. Honestly, if the alternative to hiding problems is "use a macro", then I'd rather hide the problems and just make "inline" means "inline". If "inline" means "it's just a hint, use macros", then inline is useless. If "inline" means "using this means that there are known compiler bugs, but we don't know where they trigger", then inline is _worse_ than useless. I do not see the big advantage of letting the compiler say "yeah, I'm not going to do that, Dave". And I see a *huge* disadvantage when people are ignoring compiler bugs, and are saying "use a macro". Seriously. Right now we see the obvious compiler bugs that cause build breakages. How many non-obvious compiler bugs do we have? And how sure are you that our code is "correct" after fixing a couple of obvious cases? As to "portable", nobody cares. We're a kernel. We aren't portable, and never were. If this is purely about the fact that x86 is different from other architectures, then let's remove the "compiler can do stupid things" option on x86 too. It was never clear that it was a huge advantage. Linus
On Thu, Oct 3, 2019 at 10:01 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > If this is purely about the fact that x86 is different from other > architectures, then let's remove the "compiler can do stupid things" > option on x86 too. It was never clear that it was a huge advantage. Side note: what might be an actual advantage would be to have a mode where you see when the compiler would choose to not inline things. Maybe we have things that are marked "inline" that simply shouldn't be. We do tend to have this history of adding small helper functions to header files, and then they grow and grow and grow. And now they shouldn't be in a header file any more, but they still are. So having a "compiler doesn't want to inline this" flag might be useful for finding those, but it might also be useful for people looking for what triggers bugs. But honestly, the last time I saw somebody argue for "I don't want inlining", it was the broken "I want to compiler the kernel with no optimizations so that it's easier to look at the resulting code". That was just a _bad_ bad patch. And if it is concerns like that that are driving this "compiler doesn't have to inline", then we should stop it immediately. Linus
On Fri, Oct 4, 2019 at 2:02 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Wed, Oct 2, 2019 at 7:11 PM Masahiro Yamada > <yamada.masahiro@socionext.com> wrote: > > > > Macrofying the 'inline' is a horrid mistake that makes incorrect code work. > > It would eternally prevent people from writing portable, correct code. > > Please do not encourage to hide problems. > > Honestly, if the alternative to hiding problems is "use a macro", then > I'd rather hide the problems and just make "inline" means "inline". > > If "inline" means "it's just a hint, use macros", then inline is useless. For clarification, I am not saying "use macros" at all. I just want to annotate __always_inline for the case "2. code that if not inlined is somehow not correct." > If "inline" means "using this means that there are known compiler > bugs, but we don't know where they trigger", then inline is _worse_ > than useless. > > I do not see the big advantage of letting the compiler say "yeah, I'm > not going to do that, Dave". > > And I see a *huge* disadvantage when people are ignoring compiler > bugs, and are saying "use a macro". Seriously. Again, not saying "use a macro". > > Right now we see the obvious compiler bugs that cause build breakages. > How many non-obvious compiler bugs do we have? And how sure are you > that our code is "correct" after fixing a couple of obvious cases? > > As to "portable", nobody cares. We're a kernel. We aren't portable, > and never were. > > If this is purely about the fact that x86 is different from other > architectures, then let's remove the "compiler can do stupid things" > option on x86 too. It was never clear that it was a huge advantage. > > Linus -- Best Regards Masahiro Yamada
On Thu, Oct 3, 2019 at 10:24 AM Masahiro Yamada <yamada.masahiro@socionext.com> wrote: > > I just want to annotate __always_inline for the case > "2. code that if not inlined is somehow not correct." Oh, I support that entirely - if only for documentation. But I do *not* support the dismissal of the architecture maintainers concerns about "does it work?" and apparently known compiler bugs. > Again, not saying "use a macro". Other people did, though. And there seemed to be little balancing of the pain vs the gain. The gain really isn't that obvious. If the code shrinks by a couple of kB, is that good or bad? Maybe it is smaller, but is it _better_? Linus
On Thu, Oct 3, 2019 at 7:29 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Thu, Oct 3, 2019 at 10:24 AM Masahiro Yamada > <yamada.masahiro@socionext.com> wrote: > > > > I just want to annotate __always_inline for the case > > "2. code that if not inlined is somehow not correct." > > Oh, I support that entirely - if only for documentation. > > But I do *not* support the dismissal of the architecture maintainers > concerns about "does it work?" and apparently known compiler bugs. > > > Again, not saying "use a macro". > > Other people did, though. > > And there seemed to be little balancing of the pain vs the gain. The > gain really isn't that obvious. If the code shrinks by a couple of kB, > is that good or bad? Maybe it is smaller, but is it _better_? I think both positions that people have shown are important to take into account. We should minimize our usage of macros wherever possible and certainly not write new ones when another solution is available. But we should *also* minimize our dependence on code that "must-be-inlined" to work as much as possible. In particular, I think we should allow to use __always_inline only if it doesn't work otherwise, as an alternative before trying the next worst solution (macros). And avoid using only "inline" when we actually require inlining, of course. And the reasoning for each usage of __always_inline should have a comment (be it "bad codegen", "performance tanks without it", "compiler X <= 4.2 refuses to compile"...). Which is also useful for compiler folks to grep for cases to improve/fix in their compiler! Cheers, Miguel
Hi Miguel, On Thu, Oct 3, 2019 at 10:21 PM Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> wrote: > On Thu, Oct 3, 2019 at 7:29 PM Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > On Thu, Oct 3, 2019 at 10:24 AM Masahiro Yamada > > <yamada.masahiro@socionext.com> wrote: > > > > > > I just want to annotate __always_inline for the case > > > "2. code that if not inlined is somehow not correct." > > > > Oh, I support that entirely - if only for documentation. > > > > But I do *not* support the dismissal of the architecture maintainers > > concerns about "does it work?" and apparently known compiler bugs. > > > > > Again, not saying "use a macro". > > > > Other people did, though. > > > > And there seemed to be little balancing of the pain vs the gain. The > > gain really isn't that obvious. If the code shrinks by a couple of kB, > > is that good or bad? Maybe it is smaller, but is it _better_? > > I think both positions that people have shown are important to take > into account. > > We should minimize our usage of macros wherever possible and certainly > not write new ones when another solution is available. But we should > *also* minimize our dependence on code that "must-be-inlined" to work > as much as possible. > > In particular, I think we should allow to use __always_inline only if > it doesn't work otherwise, as an alternative before trying the next > worst solution (macros). And avoid using only "inline" when we > actually require inlining, of course. > > And the reasoning for each usage of __always_inline should have a > comment (be it "bad codegen", "performance tanks without it", > "compiler X <= 4.2 refuses to compile"...). Which is also useful for > compiler folks to grep for cases to improve/fix in their compiler! First, we had "inline" and normal functions, where "inline" was used to make sure a function was inlined (e.g. because it contained code paths that were intended to be optimized away[*]). Then, the compiler started not honoring the "inline" keyword, so we got "always_inline", "inline", and normal functions. With a hack to #define "inline" to "always_inline" for some compiler versions. What's next? There should be a way for the programmer to indicate a function must be inlined. [*] Some unused code paths may contain references to symbols that may not exist for the current configuration, or not exist at all. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Hi, Am 03.10.19 um 18:36 schrieb Will Deacon: > On Wed, Oct 02, 2019 at 01:39:40PM -0700, Linus Torvalds wrote: >> On Wed, Oct 2, 2019 at 5:56 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: >>>> Then use the C preprocessor to force the inlining. I'm sorry it's not >>>> as pretty as static inline functions. >>> Which makes us lose the baby^H^H^H^Htype checking performed >>> on function parameters, requiring to add more ugly checks. >> I'm 100% agreed on this. >> >> If the inline change is being pushed by people who say "you should >> have used macros instead if you wanted inlining", then I will just >> revert that stupid commit that is causing problems. >> >> No, the preprocessor is not the answer. >> >> That said, code that relies on inlining for _correctness_ should use >> "__always_inline" and possibly even have a comment about why. >> >> But I am considering just undoing commit 9012d011660e ("compiler: >> allow all arches to enable CONFIG_OPTIMIZE_INLINING") entirely. The >> advantages are questionable, and when the advantages are balanced >> against actual regressions and the arguments are "use macros", that >> just shows how badly thought out this was. > It's clear that opinions are divided on this issue, but you can add > an enthusiastic: > > Acked-by: Will Deacon <will@kernel.org> > > if you go ahead with the revert. I'm all for allowing the compiler to > make its own inlining decisions, but not when the potential for > miscompilation isn't fully understood and the proposed alternatives turn > the source into an unreadable mess. Perhaps we can do something different > for 5.5 (arch opt-in? clang only? invert the logic? work to move functions > over to __always_inline /before/ flipping the CONFIG option? ...?) what's the status on this? In need to prepare my pull requests for 5.5 and all recent kernelci targets (including linux-next) with bcm2835_defconfig are still broken. Stefan > > Will
On Sat, Oct 12, 2019 at 7:21 PM Stefan Wahren <wahrenst@gmx.net> wrote: > > Hi, > > Am 03.10.19 um 18:36 schrieb Will Deacon: > > On Wed, Oct 02, 2019 at 01:39:40PM -0700, Linus Torvalds wrote: > >> On Wed, Oct 2, 2019 at 5:56 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > >>>> Then use the C preprocessor to force the inlining. I'm sorry it's not > >>>> as pretty as static inline functions. > >>> Which makes us lose the baby^H^H^H^Htype checking performed > >>> on function parameters, requiring to add more ugly checks. > >> I'm 100% agreed on this. > >> > >> If the inline change is being pushed by people who say "you should > >> have used macros instead if you wanted inlining", then I will just > >> revert that stupid commit that is causing problems. > >> > >> No, the preprocessor is not the answer. > >> > >> That said, code that relies on inlining for _correctness_ should use > >> "__always_inline" and possibly even have a comment about why. > >> > >> But I am considering just undoing commit 9012d011660e ("compiler: > >> allow all arches to enable CONFIG_OPTIMIZE_INLINING") entirely. The > >> advantages are questionable, and when the advantages are balanced > >> against actual regressions and the arguments are "use macros", that > >> just shows how badly thought out this was. > > It's clear that opinions are divided on this issue, but you can add > > an enthusiastic: > > > > Acked-by: Will Deacon <will@kernel.org> > > > > if you go ahead with the revert. I'm all for allowing the compiler to > > make its own inlining decisions, but not when the potential for > > miscompilation isn't fully understood and the proposed alternatives turn > > the source into an unreadable mess. Perhaps we can do something different > > for 5.5 (arch opt-in? clang only? invert the logic? work to move functions > > over to __always_inline /before/ flipping the CONFIG option? ...?) > > what's the status on this? > > In need to prepare my pull requests for 5.5 and all recent kernelci > targets (including linux-next) with bcm2835_defconfig are still broken. > > Stefan Russell King already applied the fix-up patch. https://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8908/1 So, (I hope) the breakage of bcm2835 will be solved in mainline soon. -- Best Regards Masahiro Yamada
On Sat, Oct 12, 2019 at 12:15:42PM +0200, Stefan Wahren wrote: > Hi, > > Am 03.10.19 um 18:36 schrieb Will Deacon: > > On Wed, Oct 02, 2019 at 01:39:40PM -0700, Linus Torvalds wrote: > >> On Wed, Oct 2, 2019 at 5:56 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > >>>> Then use the C preprocessor to force the inlining. I'm sorry it's not > >>>> as pretty as static inline functions. > >>> Which makes us lose the baby^H^H^H^Htype checking performed > >>> on function parameters, requiring to add more ugly checks. > >> I'm 100% agreed on this. > >> > >> If the inline change is being pushed by people who say "you should > >> have used macros instead if you wanted inlining", then I will just > >> revert that stupid commit that is causing problems. > >> > >> No, the preprocessor is not the answer. > >> > >> That said, code that relies on inlining for _correctness_ should use > >> "__always_inline" and possibly even have a comment about why. > >> > >> But I am considering just undoing commit 9012d011660e ("compiler: > >> allow all arches to enable CONFIG_OPTIMIZE_INLINING") entirely. The > >> advantages are questionable, and when the advantages are balanced > >> against actual regressions and the arguments are "use macros", that > >> just shows how badly thought out this was. > > It's clear that opinions are divided on this issue, but you can add > > an enthusiastic: > > > > Acked-by: Will Deacon <will@kernel.org> > > > > if you go ahead with the revert. I'm all for allowing the compiler to > > make its own inlining decisions, but not when the potential for > > miscompilation isn't fully understood and the proposed alternatives turn > > the source into an unreadable mess. Perhaps we can do something different > > for 5.5 (arch opt-in? clang only? invert the logic? work to move functions > > over to __always_inline /before/ flipping the CONFIG option? ...?) > > what's the status on this? > > In need to prepare my pull requests for 5.5 and all recent kernelci > targets (including linux-next) with bcm2835_defconfig are still broken. I merged the patches late on Thursday, it may have been too late for linux-next to pick them up - and because of the time difference between UK and Australia, it means they won't be in linux-next until next week (basically, tomorrow). linux-next is basically a Sunday to Thursday operation from my point of view. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 5960e2980a8a..e25493811df8 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -327,7 +327,7 @@ config HEADERS_CHECK relevant for userspace, say 'Y'. config OPTIMIZE_INLINING - bool "Allow compiler to uninline functions marked 'inline'" + def_bool y help This option determines if the kernel forces gcc to inline the functions developers have marked 'inline'. Doing so takes away freedom from gcc to @@ -338,8 +338,6 @@ config OPTIMIZE_INLINING decision will become the default in the future. Until then this option is there to test gcc for this. - If unsure, say N. - config DEBUG_SECTION_MISMATCH bool "Enable full Section mismatch analysis" help
Commit 9012d011660e ("compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable this option. A couple of build errors were reported by randconfig, but all of them have been ironed out. Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely (and it will simplify the 'inline' macro in compiler_types.h), this commit changes it to always-on option. Going forward, the compiler will always be allowed to not inline functions marked 'inline'. This is not a problem for x86 since it has been long used by arch/x86/configs/{x86_64,i386}_defconfig. I am keeping the config option just in case any problem crops up for other architectures. The code clean-up will be done after confirming this is solid. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> --- lib/Kconfig.debug | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) -- 2.17.1