diff mbox series

powerpc: fix inline asm constraints for dcbz

Message ID 20190809182106.62130-1-ndesaulniers@google.com
State New
Headers show
Series powerpc: fix inline asm constraints for dcbz | expand

Commit Message

Nick Desaulniers Aug. 9, 2019, 6:21 p.m. UTC
The input parameter is modified, so it should be an output parameter
with "=" to make it so that a copy of the input is not made by Clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=42762
Link: https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html#Modifiers
Link: https://github.com/ClangBuiltLinux/linux/issues/593
Link: https://godbolt.org/z/QwhZXi
Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/
Fixes: 6c5875843b87 ("powerpc: slightly improve cache helpers")
Debugged-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: kbuild test robot <lkp@intel.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

---
Green CI run:
https://travis-ci.com/ClangBuiltLinux/continuous-integration/builds/122521976
https://github.com/ClangBuiltLinux/continuous-integration/pull/197/files#diff-40bd16e3188587e4d648c30e0c2d6d37

 arch/powerpc/include/asm/cache.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

-- 
2.23.0.rc1.153.gdeed80330f-goog

Comments

Arnd Bergmann Aug. 9, 2019, 6:28 p.m. UTC | #1
On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built
Linux <clang-built-linux@googlegroups.com> wrote:

>  static inline void dcbz(void *addr)

>  {

> -       __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");

> +       __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");

>  }

>

>  static inline void dcbi(void *addr)

>  {

> -       __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");

> +       __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");

>  }


I think the result of the discussion was that an output argument only kind-of
makes sense for dcbz, but for the others it's really an input, and clang is
wrong in the way it handles the "Z" constraint by making a copy, which it
doesn't do for "m".

I'm not sure whether it's correct to use "m" instead of "Z" here, which
would be a better workaround if that works. More importantly though,
clang really needs to be fixed to handle "Z" correctly.

        Arnd
Christophe Leroy Aug. 9, 2019, 8:03 p.m. UTC | #2
Arnd Bergmann <arnd@arndb.de> a écrit :

> On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built

> Linux <clang-built-linux@googlegroups.com> wrote:

>

>>  static inline void dcbz(void *addr)

>>  {

>> -       __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");

>> +       __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");

>>  }

>>

>>  static inline void dcbi(void *addr)

>>  {

>> -       __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");

>> +       __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");

>>  }

>

> I think the result of the discussion was that an output argument only kind-of

> makes sense for dcbz, but for the others it's really an input, and clang is

> wrong in the way it handles the "Z" constraint by making a copy, which it

> doesn't do for "m".

>

> I'm not sure whether it's correct to use "m" instead of "Z" here, which

> would be a better workaround if that works. More importantly though,

> clang really needs to be fixed to handle "Z" correctly.


As the benefit is null, I think the best is probably to reverse my  
original commit until at least CLang is fixed, as initialy suggested  
by mpe

Christophe
Arnd Bergmann Aug. 9, 2019, 8:12 p.m. UTC | #3
On Fri, Aug 9, 2019 at 10:02 PM Christophe Leroy
<christophe.leroy@c-s.fr> wrote:
>

> Arnd Bergmann <arnd@arndb.de> a écrit :

> > On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built

> > Linux <clang-built-linux@googlegroups.com> wrote:

> >

> >>  static inline void dcbz(void *addr)

> >>  {

> >> -       __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");

> >> +       __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");

> >>  }

> >>

> >>  static inline void dcbi(void *addr)

> >>  {

> >> -       __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");

> >> +       __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");

> >>  }

> >

> > I think the result of the discussion was that an output argument only kind-of

> > makes sense for dcbz, but for the others it's really an input, and clang is

> > wrong in the way it handles the "Z" constraint by making a copy, which it

> > doesn't do for "m".

> >

> > I'm not sure whether it's correct to use "m" instead of "Z" here, which

> > would be a better workaround if that works. More importantly though,

> > clang really needs to be fixed to handle "Z" correctly.

>

> As the benefit is null, I think the best is probably to reverse my

> original commit until at least CLang is fixed, as initialy suggested

> by mpe


Yes, makes sense.

There is one other use of the "Z" constraint, so on top of the revert, I
think it might be helpful if Nick could check if the patch below makes
any difference with clang and, if it does, whether the current version
is broken.

       Arnd

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 23e5d5d16c7e..28b467779328 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -106,7 +106,7 @@ static inline u##size name(const volatile u##size
__iomem *addr)    \
 {                                                                      \
        u##size ret;                                                    \
        __asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync"     \
-               : "=r" (ret) : "Z" (*addr) : "memory");                 \
+               : "=r" (ret) : "m" (*addr) : "memory");                 \
        return ret;                                                     \
 }

@@ -114,7 +114,7 @@ static inline u##size name(const volatile u##size
__iomem *addr)    \
 static inline void name(volatile u##size __iomem *addr, u##size val)   \
 {                                                                      \
        __asm__ __volatile__("sync;"#insn" %1,%y0"                      \
-               : "=Z" (*addr) : "r" (val) : "memory");                 \
+               : "=m" (*addr) : "r" (val) : "memory");                 \
        mmiowb_set_pending();                                           \
 }
Nathan Chancellor Aug. 9, 2019, 8:36 p.m. UTC | #4
On Fri, Aug 09, 2019 at 11:21:05AM -0700, Nick Desaulniers wrote:
> The input parameter is modified, so it should be an output parameter

> with "=" to make it so that a copy of the input is not made by Clang.

> 

> Link: https://bugs.llvm.org/show_bug.cgi?id=42762

> Link: https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html#Modifiers

> Link: https://github.com/ClangBuiltLinux/linux/issues/593

> Link: https://godbolt.org/z/QwhZXi

> Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/

> Fixes: 6c5875843b87 ("powerpc: slightly improve cache helpers")

> Debugged-by: Nathan Chancellor <natechancellor@gmail.com>

> Reported-by: Nathan Chancellor <natechancellor@gmail.com>

> Reported-by: kbuild test robot <lkp@intel.com>

> Suggested-by: Arnd Bergmann <arnd@arndb.de>

> Suggested-by: Nathan Chancellor <natechancellor@gmail.com>

> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>


I applied this patch as well as a revert of the original patch and both
clang and GCC appear to generate the same code; I think a straight
revert would be better.

Crude testing script and the generated files attached.

Cheers,
Nathan
Segher Boessenkool Aug. 9, 2019, 9:55 p.m. UTC | #5
On Fri, Aug 09, 2019 at 08:28:19PM +0200, Arnd Bergmann wrote:
> On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built

> Linux <clang-built-linux@googlegroups.com> wrote:

> 

> >  static inline void dcbz(void *addr)

> >  {

> > -       __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");

> > +       __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");

> >  }

> >

> >  static inline void dcbi(void *addr)

> >  {

> > -       __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");

> > +       __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");

> >  }

> 

> I think the result of the discussion was that an output argument only kind-of

> makes sense for dcbz, but for the others it's really an input, and clang is

> wrong in the way it handles the "Z" constraint by making a copy, which it

> doesn't do for "m".


Yes.  And clang has probably miscompiled this in all kernels since we
have used "Z" for the first time, in 2008 (0f3d6bcd391b).

It is not necessarily fatal or at least not easily visible for the I/O
accessors: it "just" gets memory ordering wrong slightly (it looks like
it does the sync;tw;isync thing around an extra stack access, after it
has performed the actual I/O as any other memory load, without any
synchronisation).

> I'm not sure whether it's correct to use "m" instead of "Z" here, which

> would be a better workaround if that works. More importantly though,

> clang really needs to be fixed to handle "Z" correctly.


"m" allows offset addressing, which these insns do not.  That is the
same reason you need the "y" output modifier.  "m" is wrong here.

We have other memory constraints, but do those work with LLVM?


Segher
Segher Boessenkool Aug. 9, 2019, 10 p.m. UTC | #6
On Fri, Aug 09, 2019 at 10:03:01PM +0200, Christophe Leroy wrote:
> Arnd Bergmann <arnd@arndb.de> a écrit :

> 

> >On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built

> >Linux <clang-built-linux@googlegroups.com> wrote:

> >

> >> static inline void dcbz(void *addr)

> >> {

> >>-       __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");

> >>+       __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");

> >> }

> >>

> >> static inline void dcbi(void *addr)

> >> {

> >>-       __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");

> >>+       __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");

> >> }

> >

> >I think the result of the discussion was that an output argument only 

> >kind-of

> >makes sense for dcbz, but for the others it's really an input, and clang is

> >wrong in the way it handles the "Z" constraint by making a copy, which it

> >doesn't do for "m".

> >

> >I'm not sure whether it's correct to use "m" instead of "Z" here, which

> >would be a better workaround if that works. More importantly though,

> >clang really needs to be fixed to handle "Z" correctly.

> 

> As the benefit is null, I think the best is probably to reverse my  

> original commit until at least CLang is fixed, as initialy suggested  

> by mpe


And what about the other uses of "Z"?


Also, if you use C routines (instead of assembler code) for the basic
"clear a block" and the like routines, as there have been patches for
recently, the benefit is not zero.


Segher
Nick Desaulniers Aug. 9, 2019, 10:03 p.m. UTC | #7
On Fri, Aug 9, 2019 at 1:13 PM Arnd Bergmann <arnd@arndb.de> wrote:
>

> On Fri, Aug 9, 2019 at 10:02 PM Christophe Leroy

> <christophe.leroy@c-s.fr> wrote:

> >

> > Arnd Bergmann <arnd@arndb.de> a écrit :

> > > On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built

> > > Linux <clang-built-linux@googlegroups.com> wrote:

> > >

> > >>  static inline void dcbz(void *addr)

> > >>  {

> > >> -       __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");

> > >> +       __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");

> > >>  }

> > >>

> > >>  static inline void dcbi(void *addr)

> > >>  {

> > >> -       __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");

> > >> +       __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");

> > >>  }

> > >

> > > I think the result of the discussion was that an output argument only kind-of

> > > makes sense for dcbz, but for the others it's really an input, and clang is

> > > wrong in the way it handles the "Z" constraint by making a copy, which it

> > > doesn't do for "m".

> > >

> > > I'm not sure whether it's correct to use "m" instead of "Z" here, which

> > > would be a better workaround if that works. More importantly though,

> > > clang really needs to be fixed to handle "Z" correctly.

> >

> > As the benefit is null, I think the best is probably to reverse my

> > original commit until at least CLang is fixed, as initialy suggested

> > by mpe

>

> Yes, makes sense.

>

> There is one other use of the "Z" constraint, so on top of the revert, I

> think it might be helpful if Nick could check if the patch below makes

> any difference with clang and, if it does, whether the current version

> is broken.

>

>        Arnd

>

> diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h

> index 23e5d5d16c7e..28b467779328 100644

> --- a/arch/powerpc/include/asm/io.h

> +++ b/arch/powerpc/include/asm/io.h

> @@ -106,7 +106,7 @@ static inline u##size name(const volatile u##size

> __iomem *addr)    \

>  {                                                                      \

>         u##size ret;                                                    \

>         __asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync"     \

> -               : "=r" (ret) : "Z" (*addr) : "memory");                 \

> +               : "=r" (ret) : "m" (*addr) : "memory");                 \

>         return ret;                                                     \

>  }

>

> @@ -114,7 +114,7 @@ static inline u##size name(const volatile u##size

> __iomem *addr)    \

>  static inline void name(volatile u##size __iomem *addr, u##size val)   \

>  {                                                                      \

>         __asm__ __volatile__("sync;"#insn" %1,%y0"                      \

> -               : "=Z" (*addr) : "r" (val) : "memory");                 \

> +               : "=m" (*addr) : "r" (val) : "memory");                 \

>         mmiowb_set_pending();                                           \

>  }


Does not work:
https://travis-ci.com/ClangBuiltLinux/continuous-integration/builds/122654899
https://github.com/ClangBuiltLinux/continuous-integration/pull/197/files#diff-40bd16e3188587e4d648c30e0c2d6d37

-- 
Thanks,
~Nick Desaulniers
Segher Boessenkool Aug. 9, 2019, 10:10 p.m. UTC | #8
On Fri, Aug 09, 2019 at 10:12:56PM +0200, Arnd Bergmann wrote:
> @@ -106,7 +106,7 @@ static inline u##size name(const volatile u##size

> __iomem *addr)    \

>  {                                                                      \

>         u##size ret;                                                    \

>         __asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync"     \

> -               : "=r" (ret) : "Z" (*addr) : "memory");                 \

> +               : "=r" (ret) : "m" (*addr) : "memory");                 \

>         return ret;                                                     \

>  }


That will no longer compile something like
  u8 *p;
  u16 x = in_le16(p + 12);
(you'll get something like "invalid %y value, try using the 'Z' constraint").

So then you remove the %y, but that makes you get something like
  sync;lhbrx 3,12(3);twi 0,3,0;isync
which is completely wrong.


Segher
diff mbox series

Patch

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index b3388d95f451..5a0df6a1b9dc 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -107,22 +107,22 @@  extern void _set_L3CR(unsigned long);
 
 static inline void dcbz(void *addr)
 {
-	__asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");
+	__asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");
 }
 
 static inline void dcbi(void *addr)
 {
-	__asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");
+	__asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");
 }
 
 static inline void dcbf(void *addr)
 {
-	__asm__ __volatile__ ("dcbf %y0" : : "Z"(*(u8 *)addr) : "memory");
+	__asm__ __volatile__ ("dcbf %y0" : "=Z"(*(u8 *)addr) :: "memory");
 }
 
 static inline void dcbst(void *addr)
 {
-	__asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory");
+	__asm__ __volatile__ ("dcbst %y0" : "=Z"(*(u8 *)addr) :: "memory");
 }
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */