Message ID | 20190307091514.2489338-2-arnd@arndb.de |
---|---|
State | New |
Headers | show |
Series | [1/2] futex: mark futex_detect_cmpxchg() as 'noinline' | expand |
On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote: > > Passing registers containing zero as both the address (NULL pointer) > and data into cmpxchg_futex_value_locked() leads clang to assign > the same register for both inputs on ARM, which triggers a warning > explaining that this instruction has unpredictable behavior on ARMv5. > > /tmp/futex-7e740e.s: Assembler messages: > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4, > as Mikael wrote: > "One way of fixing this is to make uaddr an input/output register, since > "that prevents it from overlapping any other input or output." > > but then withdrawn as the warning was determined to be harmless, and it > apparently never showed up again with later gcc versions. > > Now the same problem is back when compiling with clang, and we are trying > to get clang to build the kernel without warnings, as gcc normally does. > > Cc: Mikael Pettersson <mikpe@it.uu.se> > Cc: Mikael Pettersson <mikpelinux@gmail.com> > Cc: Dave Martin <Dave.Martin@arm.com> > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/ > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > --- > arch/arm/include/asm/futex.h | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h > index 0a46676b4245..79790912974e 100644 > --- a/arch/arm/include/asm/futex.h > +++ b/arch/arm/include/asm/futex.h > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, > preempt_disable(); > __ua_flags = uaccess_save_and_enable(); > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n" > - "1: " TUSER(ldr) " %1, [%4]\n" > - " teq %1, %2\n" > + "1: " TUSER(ldr) " %1, [%2]\n" > + " teq %1, %3\n" > " it eq @ explicit IT needed for the 2b label\n" > - "2: " TUSER(streq) " %3, [%4]\n" > + "2: " TUSER(streq) " %4, [%2]\n" > __futex_atomic_ex_table("%5") > - : "+r" (ret), "=&r" (val) > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT) > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr) > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT) > : "cc", "memory"); > uaccess_restore(__ua_flags); Underspecification of constraints to extended inline assembly is a common issue exposed by other compilers (and possibly but in-effect infrequently compiler upgrades). So the reordering of the constraints means the in the assembly (notes for other reviewers): %2 -> %3 %3 -> %4 %4 -> %2 Yep, looks good to me, thanks for finding this old patch and resending, Arnd! Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> I think it would be good to further credit Mikael with reported by and suggested by tags, but not sure which email address is preferred? -- Thanks, ~Nick Desaulniers
On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote: > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote: > > > > Passing registers containing zero as both the address (NULL pointer) > > and data into cmpxchg_futex_value_locked() leads clang to assign > > the same register for both inputs on ARM, which triggers a warning > > explaining that this instruction has unpredictable behavior on ARMv5. > > > > /tmp/futex-7e740e.s: Assembler messages: > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4, > > as Mikael wrote: > > "One way of fixing this is to make uaddr an input/output register, since > > "that prevents it from overlapping any other input or output." > > > > but then withdrawn as the warning was determined to be harmless, and it > > apparently never showed up again with later gcc versions. > > > > Now the same problem is back when compiling with clang, and we are trying > > to get clang to build the kernel without warnings, as gcc normally does. > > > > Cc: Mikael Pettersson <mikpe@it.uu.se> > > Cc: Mikael Pettersson <mikpelinux@gmail.com> > > Cc: Dave Martin <Dave.Martin@arm.com> > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/ > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > > --- > > arch/arm/include/asm/futex.h | 10 +++++----- > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h > > index 0a46676b4245..79790912974e 100644 > > --- a/arch/arm/include/asm/futex.h > > +++ b/arch/arm/include/asm/futex.h > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, > > preempt_disable(); > > __ua_flags = uaccess_save_and_enable(); > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n" > > - "1: " TUSER(ldr) " %1, [%4]\n" > > - " teq %1, %2\n" > > + "1: " TUSER(ldr) " %1, [%2]\n" > > + " teq %1, %3\n" > > " it eq @ explicit IT needed for the 2b label\n" > > - "2: " TUSER(streq) " %3, [%4]\n" > > + "2: " TUSER(streq) " %4, [%2]\n" > > __futex_atomic_ex_table("%5") > > - : "+r" (ret), "=&r" (val) > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT) > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr) > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT) > > : "cc", "memory"); > > uaccess_restore(__ua_flags); > > Underspecification of constraints to extended inline assembly is a > common issue exposed by other compilers (and possibly but in-effect > infrequently compiler upgrades). > So the reordering of the constraints means the in the assembly (notes > for other reviewers): > %2 -> %3 > %3 -> %4 > %4 -> %2 > Yep, looks good to me, thanks for finding this old patch and resending, Arnd! I don't see what is "underspecified" in the original constraints. Please explain. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up
On Thu, Mar 7, 2019 at 3:49 PM Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote: > > Underspecification of constraints to extended inline assembly is a > > common issue exposed by other compilers (and possibly but in-effect > > infrequently compiler upgrades). > > I don't see what is "underspecified" in the original constraints. > Please explain. From the link: The problem is that in the T(streq) insn, %3 and %4 MUST be different registers, but nothing in the asm() constrains them to be different. > > > "One way of fixing this is to make uaddr an input/output register, since > > > "that prevents it from overlapping any other input or output." -- Thanks, ~Nick Desaulniers
On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote: > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote: > > > > > > Passing registers containing zero as both the address (NULL pointer) > > > and data into cmpxchg_futex_value_locked() leads clang to assign > > > the same register for both inputs on ARM, which triggers a warning > > > explaining that this instruction has unpredictable behavior on ARMv5. > > > > > > /tmp/futex-7e740e.s: Assembler messages: > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base > > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4, > > > as Mikael wrote: > > > "One way of fixing this is to make uaddr an input/output register, since > > > "that prevents it from overlapping any other input or output." > > > > > > but then withdrawn as the warning was determined to be harmless, and it > > > apparently never showed up again with later gcc versions. > > > > > > Now the same problem is back when compiling with clang, and we are trying > > > to get clang to build the kernel without warnings, as gcc normally does. > > > > > > Cc: Mikael Pettersson <mikpe@it.uu.se> > > > Cc: Mikael Pettersson <mikpelinux@gmail.com> > > > Cc: Dave Martin <Dave.Martin@arm.com> > > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/ > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > > > --- > > > arch/arm/include/asm/futex.h | 10 +++++----- > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h > > > index 0a46676b4245..79790912974e 100644 > > > --- a/arch/arm/include/asm/futex.h > > > +++ b/arch/arm/include/asm/futex.h > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, > > > preempt_disable(); > > > __ua_flags = uaccess_save_and_enable(); > > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n" > > > - "1: " TUSER(ldr) " %1, [%4]\n" > > > - " teq %1, %2\n" > > > + "1: " TUSER(ldr) " %1, [%2]\n" > > > + " teq %1, %3\n" > > > " it eq @ explicit IT needed for the 2b label\n" > > > - "2: " TUSER(streq) " %3, [%4]\n" > > > + "2: " TUSER(streq) " %4, [%2]\n" > > > __futex_atomic_ex_table("%5") > > > - : "+r" (ret), "=&r" (val) > > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT) > > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr) > > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT) > > > : "cc", "memory"); > > > uaccess_restore(__ua_flags); > > > > Underspecification of constraints to extended inline assembly is a > > common issue exposed by other compilers (and possibly but in-effect > > infrequently compiler upgrades). > > So the reordering of the constraints means the in the assembly (notes > > for other reviewers): > > %2 -> %3 > > %3 -> %4 > > %4 -> %2 > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd! > > I don't see what is "underspecified" in the original constraints. > Please explain. > I agree that that statement makes little sense. As Russell points out in the referenced thread, there is nothing wrong with the generated assembly, given that the UNPREDICTABLE opcode is unreachable in practice. Unfortunately, we have no way to flag this diagnostic as a known false positive, and AFAICT, there is no reason we couldn't end up with the same diagnostic popping up for GCC builds in the future, considering that the register assignment matches the constraints. (We have seen somewhat similar issues where constant folded function clones are emitted with a constant argument that could never occur in reality [0]) Given the above, the only meaningful way to invoke this function is with different registers assigned to %3 and %4, and so tightening the constraints to guarantee that does not actually result in worse code (except maybe for the instantiations that we won't ever call in the first place). So I think we should fix this. I wonder if just adding BUG_ON(__builtin_constant_p(uaddr)); at the beginning makes any difference - this shouldn't result in any object code differences since the conditional will always evaluate to false at build time for instantiations we care about. [0] https://lore.kernel.org/lkml/9c74d635-d0d1-0893-8093-ce20b0933fc7@redhat.com/
On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote: > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: > > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote: > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote: > > > > > > > > Passing registers containing zero as both the address (NULL pointer) > > > > and data into cmpxchg_futex_value_locked() leads clang to assign > > > > the same register for both inputs on ARM, which triggers a warning > > > > explaining that this instruction has unpredictable behavior on ARMv5. > > > > > > > > /tmp/futex-7e740e.s: Assembler messages: > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base > > > > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4, > > > > as Mikael wrote: > > > > "One way of fixing this is to make uaddr an input/output register, since > > > > "that prevents it from overlapping any other input or output." > > > > > > > > but then withdrawn as the warning was determined to be harmless, and it > > > > apparently never showed up again with later gcc versions. > > > > > > > > Now the same problem is back when compiling with clang, and we are trying > > > > to get clang to build the kernel without warnings, as gcc normally does. > > > > > > > > Cc: Mikael Pettersson <mikpe@it.uu.se> > > > > Cc: Mikael Pettersson <mikpelinux@gmail.com> > > > > Cc: Dave Martin <Dave.Martin@arm.com> > > > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/ > > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > > > > --- > > > > arch/arm/include/asm/futex.h | 10 +++++----- > > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h > > > > index 0a46676b4245..79790912974e 100644 > > > > --- a/arch/arm/include/asm/futex.h > > > > +++ b/arch/arm/include/asm/futex.h > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, > > > > preempt_disable(); > > > > __ua_flags = uaccess_save_and_enable(); > > > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n" > > > > - "1: " TUSER(ldr) " %1, [%4]\n" > > > > - " teq %1, %2\n" > > > > + "1: " TUSER(ldr) " %1, [%2]\n" > > > > + " teq %1, %3\n" > > > > " it eq @ explicit IT needed for the 2b label\n" > > > > - "2: " TUSER(streq) " %3, [%4]\n" > > > > + "2: " TUSER(streq) " %4, [%2]\n" > > > > __futex_atomic_ex_table("%5") > > > > - : "+r" (ret), "=&r" (val) > > > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT) > > > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr) > > > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT) > > > > : "cc", "memory"); > > > > uaccess_restore(__ua_flags); > > > > > > Underspecification of constraints to extended inline assembly is a > > > common issue exposed by other compilers (and possibly but in-effect > > > infrequently compiler upgrades). > > > So the reordering of the constraints means the in the assembly (notes > > > for other reviewers): > > > %2 -> %3 > > > %3 -> %4 > > > %4 -> %2 > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd! > > > > I don't see what is "underspecified" in the original constraints. > > Please explain. > > > > I agree that that statement makes little sense. > > As Russell points out in the referenced thread, there is nothing wrong > with the generated assembly, given that the UNPREDICTABLE opcode is > unreachable in practice. Unfortunately, we have no way to flag this > diagnostic as a known false positive, and AFAICT, there is no reason > we couldn't end up with the same diagnostic popping up for GCC builds > in the future, considering that the register assignment matches the > constraints. (We have seen somewhat similar issues where constant > folded function clones are emitted with a constant argument that could > never occur in reality [0]) > > Given the above, the only meaningful way to invoke this function is > with different registers assigned to %3 and %4, and so tightening the > constraints to guarantee that does not actually result in worse code > (except maybe for the instantiations that we won't ever call in the > first place). So I think we should fix this. > > I wonder if just adding > > BUG_ON(__builtin_constant_p(uaddr)); > > at the beginning makes any difference - this shouldn't result in any > object code differences since the conditional will always evaluate to > false at build time for instantiations we care about. > > > [0] https://lore.kernel.org/lkml/9c74d635-d0d1-0893-8093-ce20b0933fc7@redhat.com/ What I'm actually asking is: The GCC manual says that input operands _may_ overlap output operands since GCC assumes that input operands are consumed before output operands are written. This is an explicit statement. The GCC manual does not say that input operands may overlap with each other, and the behaviour of GCC thus far (apart from one version, presumably caused by a bug) has been that input operands are unique. Clang appears to be different: it allows input operands that are registers, and contain the same constant value to be the same physical register. The assertion is that the constraints are under-specified. I am questioning that assertion. If the constraints are under-specified, I would have expected gcc-4.4's behaviour to have persisted, and we would've been told by gcc's developers to fix our code. That didn't happen, and instead gcc seems to have been fixed. So, my conclusion is that it is intentional that input operands to asm() do not overlap with themselves. It seems to me that the work-around for clang is to change every input operand to be an output operand with a "+&r" contraint - an operand that is both read and written by the "instruction", and that the operand is "earlyclobber". For something that is really only read, that seems strange. Also, reading GCC's manual, it would appear that "+&" is wrong. `+' Means that this operand is both read and written by the instruction. When the compiler fixes up the operands to satisfy the constraints, it needs to know which operands are inputs to the instruction and which are outputs from it. `=' identifies an output; `+' identifies an operand that is both input and output; all other ^^^^^^^^^^^^^^^^^^^^^ operands are assumed to be input only. `&' Means (in a particular alternative) that this operand is an "earlyclobber" operand, which is modified before the instruction is finished using the input operands. Therefore, this operand may not lie in a register that is used as an input operand or as part ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ of any memory address. So "+" says that this operand is an input but "&" says that it must not be in a register that is used as an input. That's contradictory, and I think we can expect GCC to barf or at least end up doing strange stuff, if not with existing versions, then with future versions. Hence, I'm asking for clarification why it is thought that the existing code underspecifies the asm constraints, and I'm trying to get some more thought about what the constraints should be, in case there is a need to use "better" constraints. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up
On Thu, Mar 07, 2019 at 04:04:23PM -0800, Nick Desaulniers wrote: > On Thu, Mar 7, 2019 at 3:49 PM Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: > > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote: > > > Underspecification of constraints to extended inline assembly is a > > > common issue exposed by other compilers (and possibly but in-effect > > > infrequently compiler upgrades). > > > > I don't see what is "underspecified" in the original constraints. > > Please explain. > > From the link: > > The problem is that in the T(streq) insn, %3 and %4 MUST be different registers, > but nothing in the asm() constrains them to be different. Thanks, but that is not what I'm asking. See my reply to Ard. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up
On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote: > > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin > > <linux@armlinux.org.uk> wrote: > > > > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote: > > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote: > > > > > > > > > > Passing registers containing zero as both the address (NULL pointer) > > > > > and data into cmpxchg_futex_value_locked() leads clang to assign > > > > > the same register for both inputs on ARM, which triggers a warning > > > > > explaining that this instruction has unpredictable behavior on ARMv5. > > > > > > > > > > /tmp/futex-7e740e.s: Assembler messages: > > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base > > > > > > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4, > > > > > as Mikael wrote: > > > > > "One way of fixing this is to make uaddr an input/output register, since > > > > > "that prevents it from overlapping any other input or output." > > > > > > > > > > but then withdrawn as the warning was determined to be harmless, and it > > > > > apparently never showed up again with later gcc versions. > > > > > > > > > > Now the same problem is back when compiling with clang, and we are trying > > > > > to get clang to build the kernel without warnings, as gcc normally does. > > > > > > > > > > Cc: Mikael Pettersson <mikpe@it.uu.se> > > > > > Cc: Mikael Pettersson <mikpelinux@gmail.com> > > > > > Cc: Dave Martin <Dave.Martin@arm.com> > > > > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/ > > > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > > > > > --- > > > > > arch/arm/include/asm/futex.h | 10 +++++----- > > > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h > > > > > index 0a46676b4245..79790912974e 100644 > > > > > --- a/arch/arm/include/asm/futex.h > > > > > +++ b/arch/arm/include/asm/futex.h > > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, > > > > > preempt_disable(); > > > > > __ua_flags = uaccess_save_and_enable(); > > > > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n" > > > > > - "1: " TUSER(ldr) " %1, [%4]\n" > > > > > - " teq %1, %2\n" > > > > > + "1: " TUSER(ldr) " %1, [%2]\n" > > > > > + " teq %1, %3\n" > > > > > " it eq @ explicit IT needed for the 2b label\n" > > > > > - "2: " TUSER(streq) " %3, [%4]\n" > > > > > + "2: " TUSER(streq) " %4, [%2]\n" > > > > > __futex_atomic_ex_table("%5") > > > > > - : "+r" (ret), "=&r" (val) > > > > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT) > > > > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr) > > > > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT) > > > > > : "cc", "memory"); > > > > > uaccess_restore(__ua_flags); > > > > > > > > Underspecification of constraints to extended inline assembly is a > > > > common issue exposed by other compilers (and possibly but in-effect > > > > infrequently compiler upgrades). > > > > So the reordering of the constraints means the in the assembly (notes > > > > for other reviewers): > > > > %2 -> %3 > > > > %3 -> %4 > > > > %4 -> %2 > > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd! > > > > > > I don't see what is "underspecified" in the original constraints. > > > Please explain. > > > > > > > I agree that that statement makes little sense. > > > > As Russell points out in the referenced thread, there is nothing wrong > > with the generated assembly, given that the UNPREDICTABLE opcode is > > unreachable in practice. Unfortunately, we have no way to flag this > > diagnostic as a known false positive, and AFAICT, there is no reason > > we couldn't end up with the same diagnostic popping up for GCC builds > > in the future, considering that the register assignment matches the > > constraints. (We have seen somewhat similar issues where constant > > folded function clones are emitted with a constant argument that could > > never occur in reality [0]) > > > > Given the above, the only meaningful way to invoke this function is > > with different registers assigned to %3 and %4, and so tightening the > > constraints to guarantee that does not actually result in worse code > > (except maybe for the instantiations that we won't ever call in the > > first place). So I think we should fix this. > > > > I wonder if just adding > > > > BUG_ON(__builtin_constant_p(uaddr)); > > > > at the beginning makes any difference - this shouldn't result in any > > object code differences since the conditional will always evaluate to > > false at build time for instantiations we care about. > > > > > > [0] https://lore.kernel.org/lkml/9c74d635-d0d1-0893-8093-ce20b0933fc7@redhat.com/ > > What I'm actually asking is: > > The GCC manual says that input operands _may_ overlap output operands > since GCC assumes that input operands are consumed before output > operands are written. This is an explicit statement. > > The GCC manual does not say that input operands may overlap with each > other, and the behaviour of GCC thus far (apart from one version, > presumably caused by a bug) has been that input operands are unique. > Not entirely. I have run into issues where GCC assumes that registers that are only used for input operands are left untouched by the asm code. I.e., if you put an asm() block in a loop and modify an input register, your code may break on the next pass, even if the input register does not overlap with an output register. To me, that seems to suggest that whether or not inputs may overlap is irrelevant, since they are not expected to be modified. > Clang appears to be different: it allows input operands that are > registers, and contain the same constant value to be the same physical > register. > > The assertion is that the constraints are under-specified. I am > questioning that assertion. > > If the constraints are under-specified, I would have expected gcc-4.4's > behaviour to have persisted, and we would've been told by gcc's > developers to fix our code. That didn't happen, and instead gcc seems > to have been fixed. So, my conclusion is that it is intentional that > input operands to asm() do not overlap with themselves. > Whether we hit the error or not is not deterministic. Like in the ilog2() case I quoted, GCC may decide to instantiate a constant folded ['curried', if you will] clone of a function, and so even if any calls to futex_atomic_cmpxchg_inatomic() with constant NULL args for newval and uaddr are compiled, it does not mean they occur like that in the C code. > It seems to me that the work-around for clang is to change every input > operand to be an output operand with a "+&r" contraint - an operand > that is both read and written by the "instruction", and that the operand > is "earlyclobber". For something that is really only read, that seems > strange. > > Also, reading GCC's manual, it would appear that "+&" is wrong. > > `+' > Means that this operand is both read and written by the > instruction. > > When the compiler fixes up the operands to satisfy the constraints, > it needs to know which operands are inputs to the instruction and > which are outputs from it. `=' identifies an output; `+' > identifies an operand that is both input and output; all other > ^^^^^^^^^^^^^^^^^^^^^ > operands are assumed to be input only. > > `&' > Means (in a particular alternative) that this operand is an > "earlyclobber" operand, which is modified before the instruction is > finished using the input operands. Therefore, this operand may > not lie in a register that is used as an input operand or as part > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > of any memory address. > > So "+" says that this operand is an input but "&" says that it must not > be in a register that is used as an input. That's contradictory, and I > think we can expect GCC to barf or at least end up doing strange stuff, > if not with existing versions, then with future versions. > I wondered about the same thing: given that the asm itself is a black box to the compiler, it can never reuse an in/output register for output, so when it is clobbered is irrelevant. > Hence, I'm asking for clarification why it is thought that the existing > code underspecifies the asm constraints, and I'm trying to get some more > thought about what the constraints should be, in case there is a need to > use "better" constraints. > I think the constraints are correct, but as I argued before, tightening the constraints to ensure that uaddr and newval are not mapped onto the same register should not result in any object code changes, except for the case where the compiler instantiated a constprop clone that is bogus to begin with.
On Fri, 8 Mar 2019 at 11:08, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > > On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: > > > > On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote: > > > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin > > > <linux@armlinux.org.uk> wrote: > > > > > > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote: > > > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote: > > > > > > > > > > > > Passing registers containing zero as both the address (NULL pointer) > > > > > > and data into cmpxchg_futex_value_locked() leads clang to assign > > > > > > the same register for both inputs on ARM, which triggers a warning > > > > > > explaining that this instruction has unpredictable behavior on ARMv5. > > > > > > > > > > > > /tmp/futex-7e740e.s: Assembler messages: > > > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base > > > > > > > > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4, > > > > > > as Mikael wrote: > > > > > > "One way of fixing this is to make uaddr an input/output register, since > > > > > > "that prevents it from overlapping any other input or output." > > > > > > > > > > > > but then withdrawn as the warning was determined to be harmless, and it > > > > > > apparently never showed up again with later gcc versions. > > > > > > > > > > > > Now the same problem is back when compiling with clang, and we are trying > > > > > > to get clang to build the kernel without warnings, as gcc normally does. > > > > > > > > > > > > Cc: Mikael Pettersson <mikpe@it.uu.se> > > > > > > Cc: Mikael Pettersson <mikpelinux@gmail.com> > > > > > > Cc: Dave Martin <Dave.Martin@arm.com> > > > > > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/ > > > > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > > > > > > --- > > > > > > arch/arm/include/asm/futex.h | 10 +++++----- > > > > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > > > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h > > > > > > index 0a46676b4245..79790912974e 100644 > > > > > > --- a/arch/arm/include/asm/futex.h > > > > > > +++ b/arch/arm/include/asm/futex.h > > > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, > > > > > > preempt_disable(); > > > > > > __ua_flags = uaccess_save_and_enable(); > > > > > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n" > > > > > > - "1: " TUSER(ldr) " %1, [%4]\n" > > > > > > - " teq %1, %2\n" > > > > > > + "1: " TUSER(ldr) " %1, [%2]\n" > > > > > > + " teq %1, %3\n" > > > > > > " it eq @ explicit IT needed for the 2b label\n" > > > > > > - "2: " TUSER(streq) " %3, [%4]\n" > > > > > > + "2: " TUSER(streq) " %4, [%2]\n" > > > > > > __futex_atomic_ex_table("%5") > > > > > > - : "+r" (ret), "=&r" (val) > > > > > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT) > > > > > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr) > > > > > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT) > > > > > > : "cc", "memory"); > > > > > > uaccess_restore(__ua_flags); > > > > > > > > > > Underspecification of constraints to extended inline assembly is a > > > > > common issue exposed by other compilers (and possibly but in-effect > > > > > infrequently compiler upgrades). > > > > > So the reordering of the constraints means the in the assembly (notes > > > > > for other reviewers): > > > > > %2 -> %3 > > > > > %3 -> %4 > > > > > %4 -> %2 > > > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd! > > > > > > > > I don't see what is "underspecified" in the original constraints. > > > > Please explain. > > > > > > > > > > I agree that that statement makes little sense. > > > > > > As Russell points out in the referenced thread, there is nothing wrong > > > with the generated assembly, given that the UNPREDICTABLE opcode is > > > unreachable in practice. Unfortunately, we have no way to flag this > > > diagnostic as a known false positive, and AFAICT, there is no reason > > > we couldn't end up with the same diagnostic popping up for GCC builds > > > in the future, considering that the register assignment matches the > > > constraints. (We have seen somewhat similar issues where constant > > > folded function clones are emitted with a constant argument that could > > > never occur in reality [0]) > > > > > > Given the above, the only meaningful way to invoke this function is > > > with different registers assigned to %3 and %4, and so tightening the > > > constraints to guarantee that does not actually result in worse code > > > (except maybe for the instantiations that we won't ever call in the > > > first place). So I think we should fix this. > > > > > > I wonder if just adding > > > > > > BUG_ON(__builtin_constant_p(uaddr)); > > > > > > at the beginning makes any difference - this shouldn't result in any > > > object code differences since the conditional will always evaluate to > > > false at build time for instantiations we care about. > > > > > > > > > [0] https://lore.kernel.org/lkml/9c74d635-d0d1-0893-8093-ce20b0933fc7@redhat.com/ > > > > What I'm actually asking is: > > > > The GCC manual says that input operands _may_ overlap output operands > > since GCC assumes that input operands are consumed before output > > operands are written. This is an explicit statement. > > > > The GCC manual does not say that input operands may overlap with each > > other, and the behaviour of GCC thus far (apart from one version, > > presumably caused by a bug) has been that input operands are unique. > > > > Not entirely. I have run into issues where GCC assumes that registers > that are only used for input operands are left untouched by the asm > code. I.e., if you put an asm() block in a loop and modify an input > register, your code may break on the next pass, even if the input > register does not overlap with an output register. > > To me, that seems to suggest that whether or not inputs may overlap is > irrelevant, since they are not expected to be modified. > > > Clang appears to be different: it allows input operands that are > > registers, and contain the same constant value to be the same physical > > register. > > > > The assertion is that the constraints are under-specified. I am > > questioning that assertion. > > > > If the constraints are under-specified, I would have expected gcc-4.4's > > behaviour to have persisted, and we would've been told by gcc's > > developers to fix our code. That didn't happen, and instead gcc seems > > to have been fixed. So, my conclusion is that it is intentional that > > input operands to asm() do not overlap with themselves. > > > > Whether we hit the error or not is not deterministic. Like in the > ilog2() case I quoted, GCC may decide to instantiate a constant folded > ['curried', if you will] clone of a function, and so even if any calls > to futex_atomic_cmpxchg_inatomic() with constant NULL args for newval > and uaddr are compiled, it does not mean they occur like that in the C > code. > > > It seems to me that the work-around for clang is to change every input > > operand to be an output operand with a "+&r" contraint - an operand > > that is both read and written by the "instruction", and that the operand > > is "earlyclobber". For something that is really only read, that seems > > strange. > > > > Also, reading GCC's manual, it would appear that "+&" is wrong. > > > > `+' > > Means that this operand is both read and written by the > > instruction. > > > > When the compiler fixes up the operands to satisfy the constraints, > > it needs to know which operands are inputs to the instruction and > > which are outputs from it. `=' identifies an output; `+' > > identifies an operand that is both input and output; all other > > ^^^^^^^^^^^^^^^^^^^^^ > > operands are assumed to be input only. > > > > `&' > > Means (in a particular alternative) that this operand is an > > "earlyclobber" operand, which is modified before the instruction is > > finished using the input operands. Therefore, this operand may > > not lie in a register that is used as an input operand or as part > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > of any memory address. > > > > So "+" says that this operand is an input but "&" says that it must not > > be in a register that is used as an input. That's contradictory, and I > > think we can expect GCC to barf or at least end up doing strange stuff, > > if not with existing versions, then with future versions. > > > > I wondered about the same thing: given that the asm itself is a black > box to the compiler, it can never reuse an in/output register for > output, so when it is clobbered is irrelevant. > > > Hence, I'm asking for clarification why it is thought that the existing > > code underspecifies the asm constraints, and I'm trying to get some more > > thought about what the constraints should be, in case there is a need to > > use "better" constraints. > > > > I think the constraints are correct, but as I argued before, > tightening the constraints to ensure that uaddr and newval are not > mapped onto the same register should not result in any object code > changes, except for the case where the compiler instantiated a > constprop clone that is bogus to begin with. Compiling the following code """ #include <stdio.h> static void foo(void *a, int b) { asm("str %0, [%1]" :: "r"(a), "r"(b)); } int main(void) { foo(NULL, 0); } """ with GCC 6.3 (at -O2) gives me .arch armv7-a .eabi_attribute 28, 1 .eabi_attribute 20, 1 .eabi_attribute 21, 1 .eabi_attribute 23, 3 .eabi_attribute 24, 1 .eabi_attribute 25, 1 .eabi_attribute 26, 2 .eabi_attribute 30, 2 .eabi_attribute 34, 1 .eabi_attribute 18, 4 .file "futex.c" .section .text.startup,"ax",%progbits .align 1 .p2align 2,,3 .global main .syntax unified .thumb .thumb_func .fpu vfpv3-d16 .type main, %function main: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. movs r0, #0 .syntax unified @ 6 "/tmp/futex.c" 1 str r0, [r0] @ 0 "" 2 .thumb .syntax unified bx lr .size main, .-main .ident "GCC: (Debian 6.3.0-18) 6.3.0 20170516" .section .note.GNU-stack,"",%progbits and so GCC definitely behaves similar in this regard.
On Fri, Mar 08, 2019 at 11:08:40AM +0100, Ard Biesheuvel wrote: > On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: > > > > On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote: > > > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin > > > <linux@armlinux.org.uk> wrote: > > > > > > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote: > > > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote: > > > > > > > > > > > > Passing registers containing zero as both the address (NULL pointer) > > > > > > and data into cmpxchg_futex_value_locked() leads clang to assign > > > > > > the same register for both inputs on ARM, which triggers a warning > > > > > > explaining that this instruction has unpredictable behavior on ARMv5. > > > > > > > > > > > > /tmp/futex-7e740e.s: Assembler messages: > > > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base > > > > > > > > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4, > > > > > > as Mikael wrote: > > > > > > "One way of fixing this is to make uaddr an input/output register, since > > > > > > "that prevents it from overlapping any other input or output." > > > > > > > > > > > > but then withdrawn as the warning was determined to be harmless, and it > > > > > > apparently never showed up again with later gcc versions. > > > > > > > > > > > > Now the same problem is back when compiling with clang, and we are trying > > > > > > to get clang to build the kernel without warnings, as gcc normally does. > > > > > > > > > > > > Cc: Mikael Pettersson <mikpe@it.uu.se> > > > > > > Cc: Mikael Pettersson <mikpelinux@gmail.com> > > > > > > Cc: Dave Martin <Dave.Martin@arm.com> > > > > > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/ > > > > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > > > > > > --- > > > > > > arch/arm/include/asm/futex.h | 10 +++++----- > > > > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > > > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h > > > > > > index 0a46676b4245..79790912974e 100644 > > > > > > --- a/arch/arm/include/asm/futex.h > > > > > > +++ b/arch/arm/include/asm/futex.h > > > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, > > > > > > preempt_disable(); > > > > > > __ua_flags = uaccess_save_and_enable(); > > > > > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n" > > > > > > - "1: " TUSER(ldr) " %1, [%4]\n" > > > > > > - " teq %1, %2\n" > > > > > > + "1: " TUSER(ldr) " %1, [%2]\n" > > > > > > + " teq %1, %3\n" > > > > > > " it eq @ explicit IT needed for the 2b label\n" > > > > > > - "2: " TUSER(streq) " %3, [%4]\n" > > > > > > + "2: " TUSER(streq) " %4, [%2]\n" > > > > > > __futex_atomic_ex_table("%5") > > > > > > - : "+r" (ret), "=&r" (val) > > > > > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT) > > > > > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr) > > > > > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT) > > > > > > : "cc", "memory"); > > > > > > uaccess_restore(__ua_flags); > > > > > > > > > > Underspecification of constraints to extended inline assembly is a > > > > > common issue exposed by other compilers (and possibly but in-effect > > > > > infrequently compiler upgrades). > > > > > So the reordering of the constraints means the in the assembly (notes > > > > > for other reviewers): > > > > > %2 -> %3 > > > > > %3 -> %4 > > > > > %4 -> %2 > > > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd! > > > > > > > > I don't see what is "underspecified" in the original constraints. > > > > Please explain. > > > > > > > > > > I agree that that statement makes little sense. > > > > > > As Russell points out in the referenced thread, there is nothing wrong > > > with the generated assembly, given that the UNPREDICTABLE opcode is > > > unreachable in practice. Unfortunately, we have no way to flag this > > > diagnostic as a known false positive, and AFAICT, there is no reason > > > we couldn't end up with the same diagnostic popping up for GCC builds > > > in the future, considering that the register assignment matches the > > > constraints. (We have seen somewhat similar issues where constant > > > folded function clones are emitted with a constant argument that could > > > never occur in reality [0]) > > > > > > Given the above, the only meaningful way to invoke this function is > > > with different registers assigned to %3 and %4, and so tightening the > > > constraints to guarantee that does not actually result in worse code > > > (except maybe for the instantiations that we won't ever call in the > > > first place). So I think we should fix this. > > > > > > I wonder if just adding > > > > > > BUG_ON(__builtin_constant_p(uaddr)); > > > > > > at the beginning makes any difference - this shouldn't result in any > > > object code differences since the conditional will always evaluate to > > > false at build time for instantiations we care about. > > > > > > > > > [0] https://lore.kernel.org/lkml/9c74d635-d0d1-0893-8093-ce20b0933fc7@redhat.com/ > > > > What I'm actually asking is: > > > > The GCC manual says that input operands _may_ overlap output operands > > since GCC assumes that input operands are consumed before output > > operands are written. This is an explicit statement. > > > > The GCC manual does not say that input operands may overlap with each > > other, and the behaviour of GCC thus far (apart from one version, > > presumably caused by a bug) has been that input operands are unique. > > > > Not entirely. I have run into issues where GCC assumes that registers > that are only used for input operands are left untouched by the asm > code. I.e., if you put an asm() block in a loop and modify an input > register, your code may break on the next pass, even if the input > register does not overlap with an output register. GCC has had the expectation for decades that _input_ operands are not changed in value by the code in the assembly. This isn't quite the same thing as the uniqueness of the register allocation for input operands. > To me, that seems to suggest that whether or not inputs may overlap is > irrelevant, since they are not expected to be modified. How is: stmfd sp!, {r0-r3, ip, lr} bl foo ldmfd sp!, {r0-r3, ip, lr} where r1 may be an input operand (to pass an argument to foo) any different from: ldrt r0, [r1] as far as whether r1 is modified in both cases? In both cases, the value of r1 is read and written by both instructions, but in both cases the value of r1 remains the same no matter what the value of r1 was. The "input operands should not be modified" is entirely orthogonal to the input operand register allocation. > > Clang appears to be different: it allows input operands that are > > registers, and contain the same constant value to be the same physical > > register. > > > > The assertion is that the constraints are under-specified. I am > > questioning that assertion. > > > > If the constraints are under-specified, I would have expected gcc-4.4's > > behaviour to have persisted, and we would've been told by gcc's > > developers to fix our code. That didn't happen, and instead gcc seems > > to have been fixed. So, my conclusion is that it is intentional that > > input operands to asm() do not overlap with themselves. > > > > Whether we hit the error or not is not deterministic. Like in the > ilog2() case I quoted, GCC may decide to instantiate a constant folded > ['curried', if you will] clone of a function, and so even if any calls > to futex_atomic_cmpxchg_inatomic() with constant NULL args for newval > and uaddr are compiled, it does not mean they occur like that in the C > code. Again, I think this is different: gcc knows what the C code is doing and can optimise it. GCC doesn't have any idea what the code in an asm() is doing beyond what the constraints are telling it, and the rules for those constraints set out in the GCC manual. Given that we are explicitly talking about the register allocation for input operands, I'm not sure how the ilog2() case you mention applies. > > It seems to me that the work-around for clang is to change every input > > operand to be an output operand with a "+&r" contraint - an operand > > that is both read and written by the "instruction", and that the operand > > is "earlyclobber". For something that is really only read, that seems > > strange. > > > > Also, reading GCC's manual, it would appear that "+&" is wrong. > > > > `+' > > Means that this operand is both read and written by the > > instruction. > > > > When the compiler fixes up the operands to satisfy the constraints, > > it needs to know which operands are inputs to the instruction and > > which are outputs from it. `=' identifies an output; `+' > > identifies an operand that is both input and output; all other > > ^^^^^^^^^^^^^^^^^^^^^ > > operands are assumed to be input only. > > > > `&' > > Means (in a particular alternative) that this operand is an > > "earlyclobber" operand, which is modified before the instruction is > > finished using the input operands. Therefore, this operand may > > not lie in a register that is used as an input operand or as part > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > of any memory address. > > > > So "+" says that this operand is an input but "&" says that it must not > > be in a register that is used as an input. That's contradictory, and I > > think we can expect GCC to barf or at least end up doing strange stuff, > > if not with existing versions, then with future versions. > > > > I wondered about the same thing: given that the asm itself is a black > box to the compiler, it can never reuse an in/output register for > output, so when it is clobbered is irrelevant. Let me try again - you seem to have completely missed my point. + specifies that the operand is an input. & specifies that the operand is not an input. + and & are contradictory. GCC is at liberty to not assign a value to an operand with a +& modifier, or error out such a construction. > > > Hence, I'm asking for clarification why it is thought that the existing > > code underspecifies the asm constraints, and I'm trying to get some more > > thought about what the constraints should be, in case there is a need to > > use "better" constraints. > > I think the constraints are correct, but as I argued before, > tightening the constraints to ensure that uaddr and newval are not > mapped onto the same register should not result in any object code > changes, except for the case where the compiler instantiated a > constprop clone that is bogus to begin with. ... by tightening it to an undefined combination of constraint modifiers that just happens to seem to do the right thing. No, this is not proper "engineering". This is bodging. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up
On Fri, 8 Mar 2019 at 11:34, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > On Fri, Mar 08, 2019 at 11:08:40AM +0100, Ard Biesheuvel wrote: > > On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin > > <linux@armlinux.org.uk> wrote: > > > > > > On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote: > > > > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin > > > > <linux@armlinux.org.uk> wrote: > > > > > > > > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote: > > > > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote: > > > > > > > > > > > > > > Passing registers containing zero as both the address (NULL pointer) > > > > > > > and data into cmpxchg_futex_value_locked() leads clang to assign > > > > > > > the same register for both inputs on ARM, which triggers a warning > > > > > > > explaining that this instruction has unpredictable behavior on ARMv5. > > > > > > > > > > > > > > /tmp/futex-7e740e.s: Assembler messages: > > > > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base > > > > > > > > > > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4, > > > > > > > as Mikael wrote: > > > > > > > "One way of fixing this is to make uaddr an input/output register, since > > > > > > > "that prevents it from overlapping any other input or output." > > > > > > > > > > > > > > but then withdrawn as the warning was determined to be harmless, and it > > > > > > > apparently never showed up again with later gcc versions. > > > > > > > > > > > > > > Now the same problem is back when compiling with clang, and we are trying > > > > > > > to get clang to build the kernel without warnings, as gcc normally does. > > > > > > > > > > > > > > Cc: Mikael Pettersson <mikpe@it.uu.se> > > > > > > > Cc: Mikael Pettersson <mikpelinux@gmail.com> > > > > > > > Cc: Dave Martin <Dave.Martin@arm.com> > > > > > > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/ > > > > > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > > > > > > > --- > > > > > > > arch/arm/include/asm/futex.h | 10 +++++----- > > > > > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > > > > > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h > > > > > > > index 0a46676b4245..79790912974e 100644 > > > > > > > --- a/arch/arm/include/asm/futex.h > > > > > > > +++ b/arch/arm/include/asm/futex.h > > > > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, > > > > > > > preempt_disable(); > > > > > > > __ua_flags = uaccess_save_and_enable(); > > > > > > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n" > > > > > > > - "1: " TUSER(ldr) " %1, [%4]\n" > > > > > > > - " teq %1, %2\n" > > > > > > > + "1: " TUSER(ldr) " %1, [%2]\n" > > > > > > > + " teq %1, %3\n" > > > > > > > " it eq @ explicit IT needed for the 2b label\n" > > > > > > > - "2: " TUSER(streq) " %3, [%4]\n" > > > > > > > + "2: " TUSER(streq) " %4, [%2]\n" > > > > > > > __futex_atomic_ex_table("%5") > > > > > > > - : "+r" (ret), "=&r" (val) > > > > > > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT) > > > > > > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr) > > > > > > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT) > > > > > > > : "cc", "memory"); > > > > > > > uaccess_restore(__ua_flags); > > > > > > > > > > > > Underspecification of constraints to extended inline assembly is a > > > > > > common issue exposed by other compilers (and possibly but in-effect > > > > > > infrequently compiler upgrades). > > > > > > So the reordering of the constraints means the in the assembly (notes > > > > > > for other reviewers): > > > > > > %2 -> %3 > > > > > > %3 -> %4 > > > > > > %4 -> %2 > > > > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd! > > > > > > > > > > I don't see what is "underspecified" in the original constraints. > > > > > Please explain. > > > > > > > > > > > > > I agree that that statement makes little sense. > > > > > > > > As Russell points out in the referenced thread, there is nothing wrong > > > > with the generated assembly, given that the UNPREDICTABLE opcode is > > > > unreachable in practice. Unfortunately, we have no way to flag this > > > > diagnostic as a known false positive, and AFAICT, there is no reason > > > > we couldn't end up with the same diagnostic popping up for GCC builds > > > > in the future, considering that the register assignment matches the > > > > constraints. (We have seen somewhat similar issues where constant > > > > folded function clones are emitted with a constant argument that could > > > > never occur in reality [0]) > > > > > > > > Given the above, the only meaningful way to invoke this function is > > > > with different registers assigned to %3 and %4, and so tightening the > > > > constraints to guarantee that does not actually result in worse code > > > > (except maybe for the instantiations that we won't ever call in the > > > > first place). So I think we should fix this. > > > > > > > > I wonder if just adding > > > > > > > > BUG_ON(__builtin_constant_p(uaddr)); > > > > > > > > at the beginning makes any difference - this shouldn't result in any > > > > object code differences since the conditional will always evaluate to > > > > false at build time for instantiations we care about. > > > > > > > > > > > > [0] https://lore.kernel.org/lkml/9c74d635-d0d1-0893-8093-ce20b0933fc7@redhat.com/ > > > > > > What I'm actually asking is: > > > > > > The GCC manual says that input operands _may_ overlap output operands > > > since GCC assumes that input operands are consumed before output > > > operands are written. This is an explicit statement. > > > > > > The GCC manual does not say that input operands may overlap with each > > > other, and the behaviour of GCC thus far (apart from one version, > > > presumably caused by a bug) has been that input operands are unique. > > > > > > > Not entirely. I have run into issues where GCC assumes that registers > > that are only used for input operands are left untouched by the asm > > code. I.e., if you put an asm() block in a loop and modify an input > > register, your code may break on the next pass, even if the input > > register does not overlap with an output register. > > GCC has had the expectation for decades that _input_ operands are not > changed in value by the code in the assembly. This isn't quite the > same thing as the uniqueness of the register allocation for input > operands. > > > To me, that seems to suggest that whether or not inputs may overlap is > > irrelevant, since they are not expected to be modified. > > How is: > > stmfd sp!, {r0-r3, ip, lr} > bl foo > ldmfd sp!, {r0-r3, ip, lr} > > where r1 may be an input operand (to pass an argument to foo) any > different from: > > ldrt r0, [r1] > > as far as whether r1 is modified in both cases? In both cases, the > value of r1 is read and written by both instructions, but in both > cases the value of r1 remains the same no matter what the value of r1 > was. > > The "input operands should not be modified" is entirely orthogonal to > the input operand register allocation. > The question is whether it is reasonable for GCC to use the same register for input operands that have the same value. From the assumption that GCC makes that the asm will not modified follows directly that we can use the same register for different operands. And in fact, since that asm code (when built in ARM mode) does modify the register, uaddr should not be an input operand to begin with. In other words, there is an actual bug here, and this patch fixes it. > > > Clang appears to be different: it allows input operands that are > > > registers, and contain the same constant value to be the same physical > > > register. > > > > > > The assertion is that the constraints are under-specified. I am > > > questioning that assertion. > > > > > > If the constraints are under-specified, I would have expected gcc-4.4's > > > behaviour to have persisted, and we would've been told by gcc's > > > developers to fix our code. That didn't happen, and instead gcc seems > > > to have been fixed. So, my conclusion is that it is intentional that > > > input operands to asm() do not overlap with themselves. > > > > > > > Whether we hit the error or not is not deterministic. Like in the > > ilog2() case I quoted, GCC may decide to instantiate a constant folded > > ['curried', if you will] clone of a function, and so even if any calls > > to futex_atomic_cmpxchg_inatomic() with constant NULL args for newval > > and uaddr are compiled, it does not mean they occur like that in the C > > code. > > Again, I think this is different: gcc knows what the C code is doing and > can optimise it. GCC doesn't have any idea what the code in an asm() is > doing beyond what the constraints are telling it, and the rules for > those constraints set out in the GCC manual. > > Given that we are explicitly talking about the register allocation for > input operands, I'm not sure how the ilog2() case you mention applies. > The relevance of the ilog2() case is that we are dealing with an invocation of the function that never actually occurs in the code. The compiler emits it as part of an optimization step, and this is how we end up with constant operands for newval and uaddr. > > > It seems to me that the work-around for clang is to change every input > > > operand to be an output operand with a "+&r" contraint - an operand > > > that is both read and written by the "instruction", and that the operand > > > is "earlyclobber". For something that is really only read, that seems > > > strange. > > > > > > Also, reading GCC's manual, it would appear that "+&" is wrong. > > > > > > `+' > > > Means that this operand is both read and written by the > > > instruction. > > > > > > When the compiler fixes up the operands to satisfy the constraints, > > > it needs to know which operands are inputs to the instruction and > > > which are outputs from it. `=' identifies an output; `+' > > > identifies an operand that is both input and output; all other > > > ^^^^^^^^^^^^^^^^^^^^^ > > > operands are assumed to be input only. > > > > > > `&' > > > Means (in a particular alternative) that this operand is an > > > "earlyclobber" operand, which is modified before the instruction is > > > finished using the input operands. Therefore, this operand may > > > not lie in a register that is used as an input operand or as part > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > of any memory address. > > > > > > So "+" says that this operand is an input but "&" says that it must not > > > be in a register that is used as an input. That's contradictory, and I > > > think we can expect GCC to barf or at least end up doing strange stuff, > > > if not with existing versions, then with future versions. > > > > > > > I wondered about the same thing: given that the asm itself is a black > > box to the compiler, it can never reuse an in/output register for > > output, so when it is clobbered is irrelevant. > > Let me try again - you seem to have completely missed my point. > > + specifies that the operand is an input. > & specifies that the operand is not an input. > > + and & are contradictory. > > GCC is at liberty to not assign a value to an operand with a +& > modifier, or error out such a construction. > I agree that the +& does not make sense. > > > > > Hence, I'm asking for clarification why it is thought that the existing > > > code underspecifies the asm constraints, and I'm trying to get some more > > > thought about what the constraints should be, in case there is a need to > > > use "better" constraints. > > > > I think the constraints are correct, but as I argued before, > > tightening the constraints to ensure that uaddr and newval are not > > mapped onto the same register should not result in any object code > > changes, except for the case where the compiler instantiated a > > constprop clone that is bogus to begin with. > > ... by tightening it to an undefined combination of constraint modifiers > that just happens to seem to do the right thing. No, this is not proper > "engineering". This is bodging. > As I argued above, using an input operand for uaddr is incorrect (in ARM mode) since the instruction does modify the register. So modulo the +&, I think the patch is an improvement.
On Fri, Mar 08, 2019 at 11:16:47AM +0100, Ard Biesheuvel wrote: > Compiling the following code > > """ > #include <stdio.h> > > static void foo(void *a, int b) > { > asm("str %0, [%1]" :: "r"(a), "r"(b)); > } > > int main(void) > { > foo(NULL, 0); > } > """ > > with GCC 6.3 (at -O2) gives me > > .arch armv7-a > .eabi_attribute 28, 1 > .eabi_attribute 20, 1 > .eabi_attribute 21, 1 > .eabi_attribute 23, 3 > .eabi_attribute 24, 1 > .eabi_attribute 25, 1 > .eabi_attribute 26, 2 > .eabi_attribute 30, 2 > .eabi_attribute 34, 1 > .eabi_attribute 18, 4 > .file "futex.c" > .section .text.startup,"ax",%progbits > .align 1 > .p2align 2,,3 > .global main > .syntax unified > .thumb > .thumb_func > .fpu vfpv3-d16 > .type main, %function > main: > @ args = 0, pretend = 0, frame = 0 > @ frame_needed = 0, uses_anonymous_args = 0 > @ link register save eliminated. > movs r0, #0 > .syntax unified > @ 6 "/tmp/futex.c" 1 > str r0, [r0] > @ 0 "" 2 > .thumb > .syntax unified > bx lr > .size main, .-main > .ident "GCC: (Debian 6.3.0-18) 6.3.0 20170516" > .section .note.GNU-stack,"",%progbits > > and so GCC definitely behaves similar in this regard. Let's take this further - a volatile is required for these cases to avoid gcc eliminating the asm() due to the output not being used: #define NULL ((void *)0) static void foo(void *a, int b) { asm volatile("str %1, [%0]" : "=&r" (a) : "0" (a), "r" (b)); } int main(void) { foo(NULL, 0); } produces: mov r3, #0 mov r2, r3 str r2, [r2] which looks to me to be incorrect to the GCC manual - the '&' on the output operand should mean that it does not conflict with other input operands, but clearly 'r2' has ended up being 'b' as well. I suspect this is a bug, or if not, is completely counter-intuitive from the description in the GCC manual. Using "+r" (a) : "r" (b) also results in: mov r3, #0 str r3, [r3] It seems that only using "+&r" (a) : "r" (b) avoids a and b being in the same register, but I question whether we are stepping into undefined compiler behaviour with that. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up
On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote: > On Fri, 8 Mar 2019 at 11:34, Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: > > > > On Fri, Mar 08, 2019 at 11:08:40AM +0100, Ard Biesheuvel wrote: > > > On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin > > > <linux@armlinux.org.uk> wrote: > > > > > > > > On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote: > > > > > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin > > > > > <linux@armlinux.org.uk> wrote: > > > > > > > > > > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote: > > > > > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote: > > > > > > > > > > > > > > > > Passing registers containing zero as both the address (NULL pointer) > > > > > > > > and data into cmpxchg_futex_value_locked() leads clang to assign > > > > > > > > the same register for both inputs on ARM, which triggers a warning > > > > > > > > explaining that this instruction has unpredictable behavior on ARMv5. > > > > > > > > > > > > > > > > /tmp/futex-7e740e.s: Assembler messages: > > > > > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base > > > > > > > > > > > > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4, > > > > > > > > as Mikael wrote: > > > > > > > > "One way of fixing this is to make uaddr an input/output register, since > > > > > > > > "that prevents it from overlapping any other input or output." > > > > > > > > > > > > > > > > but then withdrawn as the warning was determined to be harmless, and it > > > > > > > > apparently never showed up again with later gcc versions. > > > > > > > > > > > > > > > > Now the same problem is back when compiling with clang, and we are trying > > > > > > > > to get clang to build the kernel without warnings, as gcc normally does. > > > > > > > > > > > > > > > > Cc: Mikael Pettersson <mikpe@it.uu.se> > > > > > > > > Cc: Mikael Pettersson <mikpelinux@gmail.com> > > > > > > > > Cc: Dave Martin <Dave.Martin@arm.com> > > > > > > > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/ > > > > > > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > > > > > > > > --- > > > > > > > > arch/arm/include/asm/futex.h | 10 +++++----- > > > > > > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > > > > > > > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h > > > > > > > > index 0a46676b4245..79790912974e 100644 > > > > > > > > --- a/arch/arm/include/asm/futex.h > > > > > > > > +++ b/arch/arm/include/asm/futex.h > > > > > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, > > > > > > > > preempt_disable(); > > > > > > > > __ua_flags = uaccess_save_and_enable(); > > > > > > > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n" > > > > > > > > - "1: " TUSER(ldr) " %1, [%4]\n" > > > > > > > > - " teq %1, %2\n" > > > > > > > > + "1: " TUSER(ldr) " %1, [%2]\n" > > > > > > > > + " teq %1, %3\n" > > > > > > > > " it eq @ explicit IT needed for the 2b label\n" > > > > > > > > - "2: " TUSER(streq) " %3, [%4]\n" > > > > > > > > + "2: " TUSER(streq) " %4, [%2]\n" > > > > > > > > __futex_atomic_ex_table("%5") > > > > > > > > - : "+r" (ret), "=&r" (val) > > > > > > > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT) > > > > > > > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr) > > > > > > > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT) > > > > > > > > : "cc", "memory"); > > > > > > > > uaccess_restore(__ua_flags); > > > > > > > > > > > > > > Underspecification of constraints to extended inline assembly is a > > > > > > > common issue exposed by other compilers (and possibly but in-effect > > > > > > > infrequently compiler upgrades). > > > > > > > So the reordering of the constraints means the in the assembly (notes > > > > > > > for other reviewers): > > > > > > > %2 -> %3 > > > > > > > %3 -> %4 > > > > > > > %4 -> %2 > > > > > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd! > > > > > > > > > > > > I don't see what is "underspecified" in the original constraints. > > > > > > Please explain. > > > > > > > > > > > > > > > > I agree that that statement makes little sense. > > > > > > > > > > As Russell points out in the referenced thread, there is nothing wrong > > > > > with the generated assembly, given that the UNPREDICTABLE opcode is > > > > > unreachable in practice. Unfortunately, we have no way to flag this > > > > > diagnostic as a known false positive, and AFAICT, there is no reason > > > > > we couldn't end up with the same diagnostic popping up for GCC builds > > > > > in the future, considering that the register assignment matches the > > > > > constraints. (We have seen somewhat similar issues where constant > > > > > folded function clones are emitted with a constant argument that could > > > > > never occur in reality [0]) > > > > > > > > > > Given the above, the only meaningful way to invoke this function is > > > > > with different registers assigned to %3 and %4, and so tightening the > > > > > constraints to guarantee that does not actually result in worse code > > > > > (except maybe for the instantiations that we won't ever call in the > > > > > first place). So I think we should fix this. > > > > > > > > > > I wonder if just adding > > > > > > > > > > BUG_ON(__builtin_constant_p(uaddr)); > > > > > > > > > > at the beginning makes any difference - this shouldn't result in any > > > > > object code differences since the conditional will always evaluate to > > > > > false at build time for instantiations we care about. > > > > > > > > > > > > > > > [0] https://lore.kernel.org/lkml/9c74d635-d0d1-0893-8093-ce20b0933fc7@redhat.com/ > > > > > > > > What I'm actually asking is: > > > > > > > > The GCC manual says that input operands _may_ overlap output operands > > > > since GCC assumes that input operands are consumed before output > > > > operands are written. This is an explicit statement. > > > > > > > > The GCC manual does not say that input operands may overlap with each > > > > other, and the behaviour of GCC thus far (apart from one version, > > > > presumably caused by a bug) has been that input operands are unique. > > > > > > > > > > Not entirely. I have run into issues where GCC assumes that registers > > > that are only used for input operands are left untouched by the asm > > > code. I.e., if you put an asm() block in a loop and modify an input > > > register, your code may break on the next pass, even if the input > > > register does not overlap with an output register. > > > > GCC has had the expectation for decades that _input_ operands are not > > changed in value by the code in the assembly. This isn't quite the > > same thing as the uniqueness of the register allocation for input > > operands. > > > > > To me, that seems to suggest that whether or not inputs may overlap is > > > irrelevant, since they are not expected to be modified. > > > > How is: > > > > stmfd sp!, {r0-r3, ip, lr} > > bl foo > > ldmfd sp!, {r0-r3, ip, lr} > > > > where r1 may be an input operand (to pass an argument to foo) any > > different from: > > > > ldrt r0, [r1] > > > > as far as whether r1 is modified in both cases? In both cases, the > > value of r1 is read and written by both instructions, but in both > > cases the value of r1 remains the same no matter what the value of r1 > > was. > > > > The "input operands should not be modified" is entirely orthogonal to > > the input operand register allocation. > > > > The question is whether it is reasonable for GCC to use the same > register for input operands that have the same value. From the > assumption that GCC makes that the asm will not modified follows > directly that we can use the same register for different operands. > > And in fact, since that asm code (when built in ARM mode) does modify > the register, uaddr should not be an input operand to begin with. In > other words, there is an actual bug here, and this patch fixes it. Again, you miss my point. > > > > Clang appears to be different: it allows input operands that are > > > > registers, and contain the same constant value to be the same physical > > > > register. > > > > > > > > The assertion is that the constraints are under-specified. I am > > > > questioning that assertion. > > > > > > > > If the constraints are under-specified, I would have expected gcc-4.4's > > > > behaviour to have persisted, and we would've been told by gcc's > > > > developers to fix our code. That didn't happen, and instead gcc seems > > > > to have been fixed. So, my conclusion is that it is intentional that > > > > input operands to asm() do not overlap with themselves. > > > > > > > > > > Whether we hit the error or not is not deterministic. Like in the > > > ilog2() case I quoted, GCC may decide to instantiate a constant folded > > > ['curried', if you will] clone of a function, and so even if any calls > > > to futex_atomic_cmpxchg_inatomic() with constant NULL args for newval > > > and uaddr are compiled, it does not mean they occur like that in the C > > > code. > > > > Again, I think this is different: gcc knows what the C code is doing and > > can optimise it. GCC doesn't have any idea what the code in an asm() is > > doing beyond what the constraints are telling it, and the rules for > > those constraints set out in the GCC manual. > > > > Given that we are explicitly talking about the register allocation for > > input operands, I'm not sure how the ilog2() case you mention applies. > > > > The relevance of the ilog2() case is that we are dealing with an > invocation of the function that never actually occurs in the code. The > compiler emits it as part of an optimization step, and this is how we > end up with constant operands for newval and uaddr. > > > > > It seems to me that the work-around for clang is to change every input > > > > operand to be an output operand with a "+&r" contraint - an operand > > > > that is both read and written by the "instruction", and that the operand > > > > is "earlyclobber". For something that is really only read, that seems > > > > strange. > > > > > > > > Also, reading GCC's manual, it would appear that "+&" is wrong. > > > > > > > > `+' > > > > Means that this operand is both read and written by the > > > > instruction. > > > > > > > > When the compiler fixes up the operands to satisfy the constraints, > > > > it needs to know which operands are inputs to the instruction and > > > > which are outputs from it. `=' identifies an output; `+' > > > > identifies an operand that is both input and output; all other > > > > ^^^^^^^^^^^^^^^^^^^^^ > > > > operands are assumed to be input only. > > > > > > > > `&' > > > > Means (in a particular alternative) that this operand is an > > > > "earlyclobber" operand, which is modified before the instruction is > > > > finished using the input operands. Therefore, this operand may > > > > not lie in a register that is used as an input operand or as part > > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > > of any memory address. > > > > > > > > So "+" says that this operand is an input but "&" says that it must not > > > > be in a register that is used as an input. That's contradictory, and I > > > > think we can expect GCC to barf or at least end up doing strange stuff, > > > > if not with existing versions, then with future versions. > > > > > > > > > > I wondered about the same thing: given that the asm itself is a black > > > box to the compiler, it can never reuse an in/output register for > > > output, so when it is clobbered is irrelevant. > > > > Let me try again - you seem to have completely missed my point. > > > > + specifies that the operand is an input. > > & specifies that the operand is not an input. > > > > + and & are contradictory. > > > > GCC is at liberty to not assign a value to an operand with a +& > > modifier, or error out such a construction. > > > > I agree that the +& does not make sense. > > > > > > > > Hence, I'm asking for clarification why it is thought that the existing > > > > code underspecifies the asm constraints, and I'm trying to get some more > > > > thought about what the constraints should be, in case there is a need to > > > > use "better" constraints. > > > > > > I think the constraints are correct, but as I argued before, > > > tightening the constraints to ensure that uaddr and newval are not > > > mapped onto the same register should not result in any object code > > > changes, except for the case where the compiler instantiated a > > > constprop clone that is bogus to begin with. > > > > ... by tightening it to an undefined combination of constraint modifiers > > that just happens to seem to do the right thing. No, this is not proper > > "engineering". This is bodging. > > > > As I argued above, using an input operand for uaddr is incorrect (in > ARM mode) since the instruction does modify the register. So modulo > the +&, I think the patch is an improvement. > -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up According to speedtest.net: 11.9Mbps down 500kbps up
oN fRI, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote: > On Fri, 8 Mar 2019 at 11:34, Russell King - ARM Linux admin > <linux@armlinux.org.uk> wrote: > > > > On Fri, Mar 08, 2019 at 11:08:40AM +0100, Ard Biesheuvel wrote: > > > On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin > > > <linux@armlinux.org.uk> wrote: > > > > > > > > On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote: > > > > > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin > > > > > <linux@armlinux.org.uk> wrote: > > > > > > > > > > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote: > > > > > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote: > > > > > > > > > > > > > > > > Passing registers containing zero as both the address (NULL pointer) > > > > > > > > and data into cmpxchg_futex_value_locked() leads clang to assign > > > > > > > > the same register for both inputs on ARM, which triggers a warning > > > > > > > > explaining that this instruction has unpredictable behavior on ARMv5. > > > > > > > > > > > > > > > > /tmp/futex-7e740e.s: Assembler messages: > > > > > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base > > > > > > > > > > > > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4, > > > > > > > > as Mikael wrote: > > > > > > > > "One way of fixing this is to make uaddr an input/output register, since > > > > > > > > "that prevents it from overlapping any other input or output." > > > > > > > > > > > > > > > > but then withdrawn as the warning was determined to be harmless, and it > > > > > > > > apparently never showed up again with later gcc versions. > > > > > > > > > > > > > > > > Now the same problem is back when compiling with clang, and we are trying > > > > > > > > to get clang to build the kernel without warnings, as gcc normally does. > > > > > > > > > > > > > > > > Cc: Mikael Pettersson <mikpe@it.uu.se> > > > > > > > > Cc: Mikael Pettersson <mikpelinux@gmail.com> > > > > > > > > Cc: Dave Martin <Dave.Martin@arm.com> > > > > > > > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/ > > > > > > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > > > > > > > > --- > > > > > > > > arch/arm/include/asm/futex.h | 10 +++++----- > > > > > > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > > > > > > > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h > > > > > > > > index 0a46676b4245..79790912974e 100644 > > > > > > > > --- a/arch/arm/include/asm/futex.h > > > > > > > > +++ b/arch/arm/include/asm/futex.h > > > > > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, > > > > > > > > preempt_disable(); > > > > > > > > __ua_flags = uaccess_save_and_enable(); > > > > > > > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n" > > > > > > > > - "1: " TUSER(ldr) " %1, [%4]\n" > > > > > > > > - " teq %1, %2\n" > > > > > > > > + "1: " TUSER(ldr) " %1, [%2]\n" > > > > > > > > + " teq %1, %3\n" > > > > > > > > " it eq @ explicit IT needed for the 2b label\n" > > > > > > > > - "2: " TUSER(streq) " %3, [%4]\n" > > > > > > > > + "2: " TUSER(streq) " %4, [%2]\n" > > > > > > > > __futex_atomic_ex_table("%5") > > > > > > > > - : "+r" (ret), "=&r" (val) > > > > > > > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT) > > > > > > > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr) > > > > > > > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT) > > > > > > > > : "cc", "memory"); > > > > > > > > uaccess_restore(__ua_flags); > > > > > > > > > > > > > > Underspecification of constraints to extended inline assembly is a > > > > > > > common issue exposed by other compilers (and possibly but in-effect > > > > > > > infrequently compiler upgrades). > > > > > > > So the reordering of the constraints means the in the assembly (notes > > > > > > > for other reviewers): > > > > > > > %2 -> %3 > > > > > > > %3 -> %4 > > > > > > > %4 -> %2 > > > > > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd! > > > > > > > > > > > > I don't see what is "underspecified" in the original constraints. > > > > > > Please explain. > > > > > > > > > > > > > > > > I agree that that statement makes little sense. > > > > > > > > > > As Russell points out in the referenced thread, there is nothing wrong > > > > > with the generated assembly, given that the UNPREDICTABLE opcode is > > > > > unreachable in practice. Unfortunately, we have no way to flag this > > > > > diagnostic as a known false positive, and AFAICT, there is no reason > > > > > we couldn't end up with the same diagnostic popping up for GCC builds > > > > > in the future, considering that the register assignment matches the > > > > > constraints. (We have seen somewhat similar issues where constant > > > > > folded function clones are emitted with a constant argument that could > > > > > never occur in reality [0]) > > > > > > > > > > Given the above, the only meaningful way to invoke this function is > > > > > with different registers assigned to %3 and %4, and so tightening the > > > > > constraints to guarantee that does not actually result in worse code > > > > > (except maybe for the instantiations that we won't ever call in the > > > > > first place). So I think we should fix this. > > > > > > > > > > I wonder if just adding > > > > > > > > > > BUG_ON(__builtin_constant_p(uaddr)); > > > > > > > > > > at the beginning makes any difference - this shouldn't result in any > > > > > object code differences since the conditional will always evaluate to > > > > > false at build time for instantiations we care about. > > > > > > > > > > > > > > > [0] https://lore.kernel.org/lkml/9c74d635-d0d1-0893-8093-ce20b0933fc7@redhat.com/ > > > > > > > > What I'm actually asking is: > > > > > > > > The GCC manual says that input operands _may_ overlap output operands > > > > since GCC assumes that input operands are consumed before output > > > > operands are written. This is an explicit statement. > > > > > > > > The GCC manual does not say that input operands may overlap with each > > > > other, and the behaviour of GCC thus far (apart from one version, > > > > presumably caused by a bug) has been that input operands are unique. > > > > > > > > > > Not entirely. I have run into issues where GCC assumes that registers > > > that are only used for input operands are left untouched by the asm > > > code. I.e., if you put an asm() block in a loop and modify an input > > > register, your code may break on the next pass, even if the input > > > register does not overlap with an output register. > > > > GCC has had the expectation for decades that _input_ operands are not > > changed in value by the code in the assembly. This isn't quite the > > same thing as the uniqueness of the register allocation for input > > operands. > > > > > To me, that seems to suggest that whether or not inputs may overlap is > > > irrelevant, since they are not expected to be modified. > > > > How is: > > > > stmfd sp!, {r0-r3, ip, lr} > > bl foo > > ldmfd sp!, {r0-r3, ip, lr} > > > > where r1 may be an input operand (to pass an argument to foo) any > > different from: > > > > ldrt r0, [r1] > > > > as far as whether r1 is modified in both cases? In both cases, the > > value of r1 is read and written by both instructions, but in both > > cases the value of r1 remains the same no matter what the value of r1 > > was. > > > > The "input operands should not be modified" is entirely orthogonal to > > the input operand register allocation. > > > > The question is whether it is reasonable for GCC to use the same > register for input operands that have the same value. From the > assumption that GCC makes that the asm will not modified follows > directly that we can use the same register for different operands. Whether "reasonable" or not, GCC does it. And I don't think this is new behaviour... int f(void) { int res; asm ("ADD %0, %0, %0" : "=r" (res) : "r" (77), "r" (77)); return res; } -> 00000000 <f>: 0: e3a0004d mov r0, #77 ; 0x4d 4: e0800000 add r0, r0, r0 8: e12fff1e bx lr > And in fact, since that asm code (when built in ARM mode) does modify > the register, uaddr should not be an input operand to begin with. In > other words, there is an actual bug here, and this patch fixes it. Does the old code modify the register? As I read it, the register is written (in the ARM case) by the underlying STRT instruction, but since the post-index offset it 0, the value written back is the same as the value originally read. In ARMv7-A, strt r0, [r0], #imm will store the original (unmodified) value of r0 if I read the pseudocode correctly. I can't remember the history, but I think that older architecture versions provided a choice about whether the unmodified or modified value was stored. So gas probably just checks whether the registers are the same and emits a warning to be on the safe side. If #imm is 0 (as in the existing futex code here) then it may make no difference in practice though. So, I'm not absolutely convinced there's a bug here, unless this is truly specified as UNPREDICATABLE in older arch versions. But the warning it at least annoying and use of "&" to prevent gas allocating things to the same register is already widespread for "+" asm arguments, even for arm. If the _value_ of the affected operand is not changed by the asm, I think we don't strictly need "+", but we are using "&" here for its register allocation side-effects, and "&" (for its original purpose at least) is only applicable to output ("=" or "+") operands. So I think the patch probably makes sense. IMHO the gas documentation is misleading (or at least unhelpful). Cheers ---Dave
On Fri, 8 Mar 2019 at 11:58, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote: > > On Fri, 8 Mar 2019 at 11:34, Russell King - ARM Linux admin > > <linux@armlinux.org.uk> wrote: > > > > > > On Fri, Mar 08, 2019 at 11:08:40AM +0100, Ard Biesheuvel wrote: > > > > On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin > > > > <linux@armlinux.org.uk> wrote: > > > > > > > > > > On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote: > > > > > > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin > > > > > > <linux@armlinux.org.uk> wrote: > > > > > > > > > > > > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote: > > > > > > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote: > > > > > > > > > > > > > > > > > > Passing registers containing zero as both the address (NULL pointer) > > > > > > > > > and data into cmpxchg_futex_value_locked() leads clang to assign > > > > > > > > > the same register for both inputs on ARM, which triggers a warning > > > > > > > > > explaining that this instruction has unpredictable behavior on ARMv5. > > > > > > > > > > > > > > > > > > /tmp/futex-7e740e.s: Assembler messages: > > > > > > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base > > > > > > > > > > > > > > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4, > > > > > > > > > as Mikael wrote: > > > > > > > > > "One way of fixing this is to make uaddr an input/output register, since > > > > > > > > > "that prevents it from overlapping any other input or output." > > > > > > > > > > > > > > > > > > but then withdrawn as the warning was determined to be harmless, and it > > > > > > > > > apparently never showed up again with later gcc versions. > > > > > > > > > > > > > > > > > > Now the same problem is back when compiling with clang, and we are trying > > > > > > > > > to get clang to build the kernel without warnings, as gcc normally does. > > > > > > > > > > > > > > > > > > Cc: Mikael Pettersson <mikpe@it.uu.se> > > > > > > > > > Cc: Mikael Pettersson <mikpelinux@gmail.com> > > > > > > > > > Cc: Dave Martin <Dave.Martin@arm.com> > > > > > > > > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/ > > > > > > > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de> > > > > > > > > > --- > > > > > > > > > arch/arm/include/asm/futex.h | 10 +++++----- > > > > > > > > > 1 file changed, 5 insertions(+), 5 deletions(-) > > > > > > > > > > > > > > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h > > > > > > > > > index 0a46676b4245..79790912974e 100644 > > > > > > > > > --- a/arch/arm/include/asm/futex.h > > > > > > > > > +++ b/arch/arm/include/asm/futex.h > > > > > > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, > > > > > > > > > preempt_disable(); > > > > > > > > > __ua_flags = uaccess_save_and_enable(); > > > > > > > > > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n" > > > > > > > > > - "1: " TUSER(ldr) " %1, [%4]\n" > > > > > > > > > - " teq %1, %2\n" > > > > > > > > > + "1: " TUSER(ldr) " %1, [%2]\n" > > > > > > > > > + " teq %1, %3\n" > > > > > > > > > " it eq @ explicit IT needed for the 2b label\n" > > > > > > > > > - "2: " TUSER(streq) " %3, [%4]\n" > > > > > > > > > + "2: " TUSER(streq) " %4, [%2]\n" > > > > > > > > > __futex_atomic_ex_table("%5") > > > > > > > > > - : "+r" (ret), "=&r" (val) > > > > > > > > > - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT) > > > > > > > > > + : "+&r" (ret), "=&r" (val), "+&r" (uaddr) > > > > > > > > > + : "r" (oldval), "r" (newval), "Ir" (-EFAULT) > > > > > > > > > : "cc", "memory"); > > > > > > > > > uaccess_restore(__ua_flags); > > > > > > > > > > > > > > > > Underspecification of constraints to extended inline assembly is a > > > > > > > > common issue exposed by other compilers (and possibly but in-effect > > > > > > > > infrequently compiler upgrades). > > > > > > > > So the reordering of the constraints means the in the assembly (notes > > > > > > > > for other reviewers): > > > > > > > > %2 -> %3 > > > > > > > > %3 -> %4 > > > > > > > > %4 -> %2 > > > > > > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd! > > > > > > > > > > > > > > I don't see what is "underspecified" in the original constraints. > > > > > > > Please explain. > > > > > > > > > > > > > > > > > > > I agree that that statement makes little sense. > > > > > > > > > > > > As Russell points out in the referenced thread, there is nothing wrong > > > > > > with the generated assembly, given that the UNPREDICTABLE opcode is > > > > > > unreachable in practice. Unfortunately, we have no way to flag this > > > > > > diagnostic as a known false positive, and AFAICT, there is no reason > > > > > > we couldn't end up with the same diagnostic popping up for GCC builds > > > > > > in the future, considering that the register assignment matches the > > > > > > constraints. (We have seen somewhat similar issues where constant > > > > > > folded function clones are emitted with a constant argument that could > > > > > > never occur in reality [0]) > > > > > > > > > > > > Given the above, the only meaningful way to invoke this function is > > > > > > with different registers assigned to %3 and %4, and so tightening the > > > > > > constraints to guarantee that does not actually result in worse code > > > > > > (except maybe for the instantiations that we won't ever call in the > > > > > > first place). So I think we should fix this. > > > > > > > > > > > > I wonder if just adding > > > > > > > > > > > > BUG_ON(__builtin_constant_p(uaddr)); > > > > > > > > > > > > at the beginning makes any difference - this shouldn't result in any > > > > > > object code differences since the conditional will always evaluate to > > > > > > false at build time for instantiations we care about. > > > > > > > > > > > > > > > > > > [0] https://lore.kernel.org/lkml/9c74d635-d0d1-0893-8093-ce20b0933fc7@redhat.com/ > > > > > > > > > > What I'm actually asking is: > > > > > > > > > > The GCC manual says that input operands _may_ overlap output operands > > > > > since GCC assumes that input operands are consumed before output > > > > > operands are written. This is an explicit statement. > > > > > > > > > > The GCC manual does not say that input operands may overlap with each > > > > > other, and the behaviour of GCC thus far (apart from one version, > > > > > presumably caused by a bug) has been that input operands are unique. > > > > > > > > > > > > > Not entirely. I have run into issues where GCC assumes that registers > > > > that are only used for input operands are left untouched by the asm > > > > code. I.e., if you put an asm() block in a loop and modify an input > > > > register, your code may break on the next pass, even if the input > > > > register does not overlap with an output register. > > > > > > GCC has had the expectation for decades that _input_ operands are not > > > changed in value by the code in the assembly. This isn't quite the > > > same thing as the uniqueness of the register allocation for input > > > operands. > > > > > > > To me, that seems to suggest that whether or not inputs may overlap is > > > > irrelevant, since they are not expected to be modified. > > > > > > How is: > > > > > > stmfd sp!, {r0-r3, ip, lr} > > > bl foo > > > ldmfd sp!, {r0-r3, ip, lr} > > > > > > where r1 may be an input operand (to pass an argument to foo) any > > > different from: > > > > > > ldrt r0, [r1] > > > > > > as far as whether r1 is modified in both cases? In both cases, the > > > value of r1 is read and written by both instructions, but in both > > > cases the value of r1 remains the same no matter what the value of r1 > > > was. > > > > > > The "input operands should not be modified" is entirely orthogonal to > > > the input operand register allocation. > > > > > > > The question is whether it is reasonable for GCC to use the same > > register for input operands that have the same value. From the > > assumption that GCC makes that the asm will not modified follows > > directly that we can use the same register for different operands. > > > > And in fact, since that asm code (when built in ARM mode) does modify > > the register, uaddr should not be an input operand to begin with. In > > other words, there is an actual bug here, and this patch fixes it. > > Again, you miss my point. > Perhaps. So let me summarize what I do understand. 1) if futex_atomic_cmpxchg_inatomic() is instantiated *and executed* with the same compile time constant value of 0x0 for newval and uaddr, we end up with an opcode for the STRT instruction that is CONSTRAINED UNPREDICTABLE, but we will never execute it since the preceding load will fault and enter the fixup handler. 2) such occurrences of futex_atomic_cmpxchg_inatomic() are unlikely to occur in the code, but may be instantiated by the compiler when doing constant propagation (like in the ilog2() case I quoted), but these instantiations will never be called 3) both clang and gcc may map different inline asm input operands onto the same register if the value is guaranteed to be the same (i.e., they are both compile time constants) My statement about uaddr was slightly misguided, in the sense that our invocation of STRT does use the post-index variant, but with an increment of zero, so the value written back to the register equals the original value. But it does explain why this opcode is CONSTRAINED UNPREDICTABLE in the first place. Given 2) above, I wonder if anyone could confirm whether adding 'BUG_ON(__builtin_constant_p(uaddr))' silences the warning as well.
On Fri, Mar 8, 2019 at 12:56 PM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On Fri, 8 Mar 2019 at 11:58, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote: > > Perhaps. So let me summarize what I do understand. > > 1) if futex_atomic_cmpxchg_inatomic() is instantiated *and executed* > with the same compile time constant value of 0x0 for newval and uaddr, > we end up with an opcode for the STRT instruction that is CONSTRAINED > UNPREDICTABLE, but we will never execute it since the preceding load > will fault and enter the fixup handler. > 2) such occurrences of futex_atomic_cmpxchg_inatomic() are unlikely to > occur in the code, but may be instantiated by the compiler when doing > constant propagation (like in the ilog2() case I quoted), but these > instantiations will never be called > 3) both clang and gcc may map different inline asm input operands onto > the same register if the value is guaranteed to be the same (i.e., > they are both compile time constants) > > My statement about uaddr was slightly misguided, in the sense that our > invocation of STRT does use the post-index variant, but with an > increment of zero, so the value written back to the register equals > the original value. But it does explain why this opcode is CONSTRAINED > UNPREDICTABLE in the first place. > > Given 2) above, I wonder if anyone could confirm whether adding > 'BUG_ON(__builtin_constant_p(uaddr))' silences the warning as well. Like this? diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h index 0a46676b4245..e6e9b403cd61 100644 --- a/arch/arm/include/asm/futex.h +++ b/arch/arm/include/asm/futex.h @@ -57,6 +57,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, /* Prefetching cannot fault */ prefetchw(uaddr); __ua_flags = uaccess_save_and_enable(); + BUG_ON(__builtin_constant_p(uaddr) || !uaddr); __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n" "1: ldrex %1, [%4]\n" " teq %1, %2\n" This had no effect here. My first attempt (before finding the original patch from Mikael Pettersson) was to change the probe to pass '1' as the value instead of '0', that worked fine. Arnd
On Mon, 11 Mar 2019 at 15:34, Arnd Bergmann <arnd@arndb.de> wrote: > > On Fri, Mar 8, 2019 at 12:56 PM Ard Biesheuvel > <ard.biesheuvel@linaro.org> wrote: > > On Fri, 8 Mar 2019 at 11:58, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > > On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote: > > > > > Perhaps. So let me summarize what I do understand. > > > > 1) if futex_atomic_cmpxchg_inatomic() is instantiated *and executed* > > with the same compile time constant value of 0x0 for newval and uaddr, > > we end up with an opcode for the STRT instruction that is CONSTRAINED > > UNPREDICTABLE, but we will never execute it since the preceding load > > will fault and enter the fixup handler. > > 2) such occurrences of futex_atomic_cmpxchg_inatomic() are unlikely to > > occur in the code, but may be instantiated by the compiler when doing > > constant propagation (like in the ilog2() case I quoted), but these > > instantiations will never be called > > 3) both clang and gcc may map different inline asm input operands onto > > the same register if the value is guaranteed to be the same (i.e., > > they are both compile time constants) > > > > My statement about uaddr was slightly misguided, in the sense that our > > invocation of STRT does use the post-index variant, but with an > > increment of zero, so the value written back to the register equals > > the original value. But it does explain why this opcode is CONSTRAINED > > UNPREDICTABLE in the first place. > > > > Given 2) above, I wonder if anyone could confirm whether adding > > 'BUG_ON(__builtin_constant_p(uaddr))' silences the warning as well. > > Like this? > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h > index 0a46676b4245..e6e9b403cd61 100644 > --- a/arch/arm/include/asm/futex.h > +++ b/arch/arm/include/asm/futex.h > @@ -57,6 +57,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, > /* Prefetching cannot fault */ > prefetchw(uaddr); > __ua_flags = uaccess_save_and_enable(); > + BUG_ON(__builtin_constant_p(uaddr) || !uaddr); > __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n" > "1: ldrex %1, [%4]\n" > " teq %1, %2\n" > > This had no effect here. > > My first attempt (before finding the original patch from Mikael Pettersson) > was to change the probe to pass '1' as the value instead of '0', that > worked fine. > Which probe is that?
On Mon, Mar 11, 2019 at 3:36 PM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On Mon, 11 Mar 2019 at 15:34, Arnd Bergmann <arnd@arndb.de> wrote: > > On Fri, Mar 8, 2019 at 12:56 PM Ard Biesheuvel > > <ard.biesheuvel@linaro.org> wrote: > > > On Fri, 8 Mar 2019 at 11:58, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > > > On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote: > > > > My first attempt (before finding the original patch from Mikael Pettersson) > > was to change the probe to pass '1' as the value instead of '0', that > > worked fine. > > > > Which probe is that? diff --git a/kernel/futex.c b/kernel/futex.c index c3b73b0311bc..19615ad3c4f7 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -3864,7 +3864,7 @@ static void __init futex_detect_cmpxchg(void) * implementation, the non-functional ones will return * -ENOSYS. */ - if (cmpxchg_futex_value_locked(&curval, NULL, 0, 0) == -EFAULT) + if (cmpxchg_futex_value_locked(&curval, NULL, 1, 1) == -EFAULT) futex_cmpxchg_enabled = 1; #endif } Arnd
On Mon, 11 Mar 2019 at 17:30, Arnd Bergmann <arnd@arndb.de> wrote: > > On Mon, Mar 11, 2019 at 3:36 PM Ard Biesheuvel > <ard.biesheuvel@linaro.org> wrote: > > On Mon, 11 Mar 2019 at 15:34, Arnd Bergmann <arnd@arndb.de> wrote: > > > On Fri, Mar 8, 2019 at 12:56 PM Ard Biesheuvel > > > <ard.biesheuvel@linaro.org> wrote: > > > > On Fri, 8 Mar 2019 at 11:58, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > > > > On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote: > > > > > > My first attempt (before finding the original patch from Mikael Pettersson) > > > was to change the probe to pass '1' as the value instead of '0', that > > > worked fine. > > > > > > > Which probe is that? > > diff --git a/kernel/futex.c b/kernel/futex.c > index c3b73b0311bc..19615ad3c4f7 100644 > --- a/kernel/futex.c > +++ b/kernel/futex.c > @@ -3864,7 +3864,7 @@ static void __init futex_detect_cmpxchg(void) > * implementation, the non-functional ones will return > * -ENOSYS. > */ > - if (cmpxchg_futex_value_locked(&curval, NULL, 0, 0) == -EFAULT) > + if (cmpxchg_futex_value_locked(&curval, NULL, 1, 1) == -EFAULT) > futex_cmpxchg_enabled = 1; > #endif > } > Ah ok. That explains a lot. Can't we just return -EFAULT if uaddr is NULL? Or does that defeat this check? Note that PA-RISC has /* futex.c wants to do a cmpxchg_inatomic on kernel NULL, which is * our gateway page, and causes no end of trouble... */ if (uaccess_kernel() && !uaddr) return -EFAULT;
On Mon, Mar 11, 2019 at 5:36 PM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > > On Mon, 11 Mar 2019 at 17:30, Arnd Bergmann <arnd@arndb.de> wrote: > > > > On Mon, Mar 11, 2019 at 3:36 PM Ard Biesheuvel > > <ard.biesheuvel@linaro.org> wrote: > > > On Mon, 11 Mar 2019 at 15:34, Arnd Bergmann <arnd@arndb.de> wrote: > > > > On Fri, Mar 8, 2019 at 12:56 PM Ard Biesheuvel > > > > <ard.biesheuvel@linaro.org> wrote: > > > > > On Fri, 8 Mar 2019 at 11:58, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > > > > > On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote: > > > > > > > > My first attempt (before finding the original patch from Mikael Pettersson) > > > > was to change the probe to pass '1' as the value instead of '0', that > > > > worked fine. > > > > > > > > > > Which probe is that? > > > > diff --git a/kernel/futex.c b/kernel/futex.c > > index c3b73b0311bc..19615ad3c4f7 100644 > > --- a/kernel/futex.c > > +++ b/kernel/futex.c > > @@ -3864,7 +3864,7 @@ static void __init futex_detect_cmpxchg(void) > > * implementation, the non-functional ones will return > > * -ENOSYS. > > */ > > - if (cmpxchg_futex_value_locked(&curval, NULL, 0, 0) == -EFAULT) > > + if (cmpxchg_futex_value_locked(&curval, NULL, 1, 1) == -EFAULT) > > futex_cmpxchg_enabled = 1; > > #endif > > } > > > > Ah ok. > > That explains a lot. > > Can't we just return -EFAULT if uaddr is NULL? Or does that defeat this check? I think that would work here, it would just create a tiny overhead for each call to futex_atomic_cmpxchg_inatomic(). Semi-related side note: After I looked at access_ok() for a bit too long, I tried replacing it with #define access_ok(addr, size) \ (((u64)(uintptr_t)addr + (u64)(size_t)size) >= current_thread_info()->addr_limit) which interestingly seemed to improve the output with clang (it lets it combine multiple access_ok() checks and schedule the instructions better, compared to our inline asm implementation), but it unfortunately creates horrible code with gcc. Arnd
diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h index 0a46676b4245..79790912974e 100644 --- a/arch/arm/include/asm/futex.h +++ b/arch/arm/include/asm/futex.h @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr, preempt_disable(); __ua_flags = uaccess_save_and_enable(); __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n" - "1: " TUSER(ldr) " %1, [%4]\n" - " teq %1, %2\n" + "1: " TUSER(ldr) " %1, [%2]\n" + " teq %1, %3\n" " it eq @ explicit IT needed for the 2b label\n" - "2: " TUSER(streq) " %3, [%4]\n" + "2: " TUSER(streq) " %4, [%2]\n" __futex_atomic_ex_table("%5") - : "+r" (ret), "=&r" (val) - : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT) + : "+&r" (ret), "=&r" (val), "+&r" (uaddr) + : "r" (oldval), "r" (newval), "Ir" (-EFAULT) : "cc", "memory"); uaccess_restore(__ua_flags);
Passing registers containing zero as both the address (NULL pointer) and data into cmpxchg_futex_value_locked() leads clang to assign the same register for both inputs on ARM, which triggers a warning explaining that this instruction has unpredictable behavior on ARMv5. /tmp/futex-7e740e.s: Assembler messages: /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4, as Mikael wrote: "One way of fixing this is to make uaddr an input/output register, since "that prevents it from overlapping any other input or output." but then withdrawn as the warning was determined to be harmless, and it apparently never showed up again with later gcc versions. Now the same problem is back when compiling with clang, and we are trying to get clang to build the kernel without warnings, as gcc normally does. Cc: Mikael Pettersson <mikpe@it.uu.se> Cc: Mikael Pettersson <mikpelinux@gmail.com> Cc: Dave Martin <Dave.Martin@arm.com> Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- arch/arm/include/asm/futex.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) -- 2.20.0