Message ID | 20190809182106.62130-1-ndesaulniers@google.com |
---|---|
State | New |
Headers | show |
Series | powerpc: fix inline asm constraints for dcbz | expand |
On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote: > static inline void dcbz(void *addr) > { > - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); > + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); > } > > static inline void dcbi(void *addr) > { > - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); > + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); > } I think the result of the discussion was that an output argument only kind-of makes sense for dcbz, but for the others it's really an input, and clang is wrong in the way it handles the "Z" constraint by making a copy, which it doesn't do for "m". I'm not sure whether it's correct to use "m" instead of "Z" here, which would be a better workaround if that works. More importantly though, clang really needs to be fixed to handle "Z" correctly. Arnd
Arnd Bergmann <arnd@arndb.de> a écrit : > On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built > Linux <clang-built-linux@googlegroups.com> wrote: > >> static inline void dcbz(void *addr) >> { >> - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); >> + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); >> } >> >> static inline void dcbi(void *addr) >> { >> - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); >> + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); >> } > > I think the result of the discussion was that an output argument only kind-of > makes sense for dcbz, but for the others it's really an input, and clang is > wrong in the way it handles the "Z" constraint by making a copy, which it > doesn't do for "m". > > I'm not sure whether it's correct to use "m" instead of "Z" here, which > would be a better workaround if that works. More importantly though, > clang really needs to be fixed to handle "Z" correctly. As the benefit is null, I think the best is probably to reverse my original commit until at least CLang is fixed, as initialy suggested by mpe Christophe
On Fri, Aug 9, 2019 at 10:02 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote: > > Arnd Bergmann <arnd@arndb.de> a écrit : > > On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built > > Linux <clang-built-linux@googlegroups.com> wrote: > > > >> static inline void dcbz(void *addr) > >> { > >> - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); > >> + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); > >> } > >> > >> static inline void dcbi(void *addr) > >> { > >> - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); > >> + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); > >> } > > > > I think the result of the discussion was that an output argument only kind-of > > makes sense for dcbz, but for the others it's really an input, and clang is > > wrong in the way it handles the "Z" constraint by making a copy, which it > > doesn't do for "m". > > > > I'm not sure whether it's correct to use "m" instead of "Z" here, which > > would be a better workaround if that works. More importantly though, > > clang really needs to be fixed to handle "Z" correctly. > > As the benefit is null, I think the best is probably to reverse my > original commit until at least CLang is fixed, as initialy suggested > by mpe Yes, makes sense. There is one other use of the "Z" constraint, so on top of the revert, I think it might be helpful if Nick could check if the patch below makes any difference with clang and, if it does, whether the current version is broken. Arnd diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h index 23e5d5d16c7e..28b467779328 100644 --- a/arch/powerpc/include/asm/io.h +++ b/arch/powerpc/include/asm/io.h @@ -106,7 +106,7 @@ static inline u##size name(const volatile u##size __iomem *addr) \ { \ u##size ret; \ __asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync" \ - : "=r" (ret) : "Z" (*addr) : "memory"); \ + : "=r" (ret) : "m" (*addr) : "memory"); \ return ret; \ } @@ -114,7 +114,7 @@ static inline u##size name(const volatile u##size __iomem *addr) \ static inline void name(volatile u##size __iomem *addr, u##size val) \ { \ __asm__ __volatile__("sync;"#insn" %1,%y0" \ - : "=Z" (*addr) : "r" (val) : "memory"); \ + : "=m" (*addr) : "r" (val) : "memory"); \ mmiowb_set_pending(); \ }
On Fri, Aug 09, 2019 at 11:21:05AM -0700, Nick Desaulniers wrote: > The input parameter is modified, so it should be an output parameter > with "=" to make it so that a copy of the input is not made by Clang. > > Link: https://bugs.llvm.org/show_bug.cgi?id=42762 > Link: https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html#Modifiers > Link: https://github.com/ClangBuiltLinux/linux/issues/593 > Link: https://godbolt.org/z/QwhZXi > Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/ > Fixes: 6c5875843b87 ("powerpc: slightly improve cache helpers") > Debugged-by: Nathan Chancellor <natechancellor@gmail.com> > Reported-by: Nathan Chancellor <natechancellor@gmail.com> > Reported-by: kbuild test robot <lkp@intel.com> > Suggested-by: Arnd Bergmann <arnd@arndb.de> > Suggested-by: Nathan Chancellor <natechancellor@gmail.com> > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> I applied this patch as well as a revert of the original patch and both clang and GCC appear to generate the same code; I think a straight revert would be better. Crude testing script and the generated files attached. Cheers, Nathan
On Fri, Aug 09, 2019 at 08:28:19PM +0200, Arnd Bergmann wrote: > On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built > Linux <clang-built-linux@googlegroups.com> wrote: > > > static inline void dcbz(void *addr) > > { > > - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); > > + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); > > } > > > > static inline void dcbi(void *addr) > > { > > - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); > > + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); > > } > > I think the result of the discussion was that an output argument only kind-of > makes sense for dcbz, but for the others it's really an input, and clang is > wrong in the way it handles the "Z" constraint by making a copy, which it > doesn't do for "m". Yes. And clang has probably miscompiled this in all kernels since we have used "Z" for the first time, in 2008 (0f3d6bcd391b). It is not necessarily fatal or at least not easily visible for the I/O accessors: it "just" gets memory ordering wrong slightly (it looks like it does the sync;tw;isync thing around an extra stack access, after it has performed the actual I/O as any other memory load, without any synchronisation). > I'm not sure whether it's correct to use "m" instead of "Z" here, which > would be a better workaround if that works. More importantly though, > clang really needs to be fixed to handle "Z" correctly. "m" allows offset addressing, which these insns do not. That is the same reason you need the "y" output modifier. "m" is wrong here. We have other memory constraints, but do those work with LLVM? Segher
On Fri, Aug 09, 2019 at 10:03:01PM +0200, Christophe Leroy wrote: > Arnd Bergmann <arnd@arndb.de> a écrit : > > >On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built > >Linux <clang-built-linux@googlegroups.com> wrote: > > > >> static inline void dcbz(void *addr) > >> { > >>- __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); > >>+ __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); > >> } > >> > >> static inline void dcbi(void *addr) > >> { > >>- __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); > >>+ __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); > >> } > > > >I think the result of the discussion was that an output argument only > >kind-of > >makes sense for dcbz, but for the others it's really an input, and clang is > >wrong in the way it handles the "Z" constraint by making a copy, which it > >doesn't do for "m". > > > >I'm not sure whether it's correct to use "m" instead of "Z" here, which > >would be a better workaround if that works. More importantly though, > >clang really needs to be fixed to handle "Z" correctly. > > As the benefit is null, I think the best is probably to reverse my > original commit until at least CLang is fixed, as initialy suggested > by mpe And what about the other uses of "Z"? Also, if you use C routines (instead of assembler code) for the basic "clear a block" and the like routines, as there have been patches for recently, the benefit is not zero. Segher
On Fri, Aug 9, 2019 at 1:13 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Fri, Aug 9, 2019 at 10:02 PM Christophe Leroy > <christophe.leroy@c-s.fr> wrote: > > > > Arnd Bergmann <arnd@arndb.de> a écrit : > > > On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built > > > Linux <clang-built-linux@googlegroups.com> wrote: > > > > > >> static inline void dcbz(void *addr) > > >> { > > >> - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); > > >> + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); > > >> } > > >> > > >> static inline void dcbi(void *addr) > > >> { > > >> - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); > > >> + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); > > >> } > > > > > > I think the result of the discussion was that an output argument only kind-of > > > makes sense for dcbz, but for the others it's really an input, and clang is > > > wrong in the way it handles the "Z" constraint by making a copy, which it > > > doesn't do for "m". > > > > > > I'm not sure whether it's correct to use "m" instead of "Z" here, which > > > would be a better workaround if that works. More importantly though, > > > clang really needs to be fixed to handle "Z" correctly. > > > > As the benefit is null, I think the best is probably to reverse my > > original commit until at least CLang is fixed, as initialy suggested > > by mpe > > Yes, makes sense. > > There is one other use of the "Z" constraint, so on top of the revert, I > think it might be helpful if Nick could check if the patch below makes > any difference with clang and, if it does, whether the current version > is broken. > > Arnd > > diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h > index 23e5d5d16c7e..28b467779328 100644 > --- a/arch/powerpc/include/asm/io.h > +++ b/arch/powerpc/include/asm/io.h > @@ -106,7 +106,7 @@ static inline u##size name(const volatile u##size > __iomem *addr) \ > { \ > u##size ret; \ > __asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync" \ > - : "=r" (ret) : "Z" (*addr) : "memory"); \ > + : "=r" (ret) : "m" (*addr) : "memory"); \ > return ret; \ > } > > @@ -114,7 +114,7 @@ static inline u##size name(const volatile u##size > __iomem *addr) \ > static inline void name(volatile u##size __iomem *addr, u##size val) \ > { \ > __asm__ __volatile__("sync;"#insn" %1,%y0" \ > - : "=Z" (*addr) : "r" (val) : "memory"); \ > + : "=m" (*addr) : "r" (val) : "memory"); \ > mmiowb_set_pending(); \ > } Does not work: https://travis-ci.com/ClangBuiltLinux/continuous-integration/builds/122654899 https://github.com/ClangBuiltLinux/continuous-integration/pull/197/files#diff-40bd16e3188587e4d648c30e0c2d6d37 -- Thanks, ~Nick Desaulniers
On Fri, Aug 09, 2019 at 10:12:56PM +0200, Arnd Bergmann wrote: > @@ -106,7 +106,7 @@ static inline u##size name(const volatile u##size > __iomem *addr) \ > { \ > u##size ret; \ > __asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync" \ > - : "=r" (ret) : "Z" (*addr) : "memory"); \ > + : "=r" (ret) : "m" (*addr) : "memory"); \ > return ret; \ > } That will no longer compile something like u8 *p; u16 x = in_le16(p + 12); (you'll get something like "invalid %y value, try using the 'Z' constraint"). So then you remove the %y, but that makes you get something like sync;lhbrx 3,12(3);twi 0,3,0;isync which is completely wrong. Segher
diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h index b3388d95f451..5a0df6a1b9dc 100644 --- a/arch/powerpc/include/asm/cache.h +++ b/arch/powerpc/include/asm/cache.h @@ -107,22 +107,22 @@ extern void _set_L3CR(unsigned long); static inline void dcbz(void *addr) { - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); } static inline void dcbi(void *addr) { - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); } static inline void dcbf(void *addr) { - __asm__ __volatile__ ("dcbf %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbf %y0" : "=Z"(*(u8 *)addr) :: "memory"); } static inline void dcbst(void *addr) { - __asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbst %y0" : "=Z"(*(u8 *)addr) :: "memory"); } #endif /* !__ASSEMBLY__ */ #endif /* __KERNEL__ */
The input parameter is modified, so it should be an output parameter with "=" to make it so that a copy of the input is not made by Clang. Link: https://bugs.llvm.org/show_bug.cgi?id=42762 Link: https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html#Modifiers Link: https://github.com/ClangBuiltLinux/linux/issues/593 Link: https://godbolt.org/z/QwhZXi Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/ Fixes: 6c5875843b87 ("powerpc: slightly improve cache helpers") Debugged-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: kbuild test robot <lkp@intel.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> --- Green CI run: https://travis-ci.com/ClangBuiltLinux/continuous-integration/builds/122521976 https://github.com/ClangBuiltLinux/continuous-integration/pull/197/files#diff-40bd16e3188587e4d648c30e0c2d6d37 arch/powerpc/include/asm/cache.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) -- 2.23.0.rc1.153.gdeed80330f-goog