[00/12] ARM: use adr_l/ldr_l macros for PC-relative references

Message ID	20200914095706.3985-1-ardb@kernel.org
Headers	show Return-Path: <SRS0=ddwy=CX=vger.kernel.org=linux-efi-owner@kernel.org> From: Ard Biesheuvel <ardb@kernel.org> To: linux-efi@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org, Ard Biesheuvel <ardb@kernel.org>, Russell King <linux@armlinux.org.uk>, Linus Walleij <linus.walleij@linaro.org>, Nicolas Pitre <nico@fluxnic.net>, Nick Desaulniers <ndesaulniers@google.com>, Stefan Agner <stefan@agner.ch>, Peter Smith <Peter.Smith@arm.com>, Marc Zyngier <maz@kernel.org>, Will Deacon <will@kernel.org> Subject: [PATCH 00/12] ARM: use adr_l/ldr_l macros for PC-relative references Date: Mon, 14 Sep 2020 12:56:54 +0300 Message-Id: <20200914095706.3985-1-ardb@kernel.org> Sender: linux-efi-owner@vger.kernel.org Precedence: bulk
Series	ARM: use adr_l/ldr_l macros for PC-relative references \| expand [00/12] ARM: use adr_l/ldr_l macros for PC-relative references [02/12] ARM: efistub: replace adrl pseudo-op with adr_l macro invocation [04/12] ARM: head-common.S: use PC-relative insn sequence for __proc_info [06/12] ARM: head.S: use PC-relative insn sequence for secondary_data [08/12] ARM: head: use PC-relative insn sequence for __smp_alt [10/12] ARM: head.S: use PC-relative insn sequences for __fixup_pv_table [12/12] ARM: kvm: replace open coded VA->PA calculations with adr_l call

Ard Biesheuvel Sept. 14, 2020, 9:56 a.m. UTC

This is a respin of the adr_l/ldr_l code I wrote some years ago in the
context of my KASLR proof of concept for 32-bit ARM.

A new use case came up, in the form of Clang, which does not implement
the 'adrl' pseudo-instruction in its assembler, and so for PC-relative
references that don't fit into a ARM adr instruction, we need something
else. Patch #2 addresses an actual Clang build issue of this nature, by
replacing an occurrence of adrl with adr_l.

I have included my existing cleanup patches that were built on top of the
adr_l macro, which replace several occurrences of open coded arithmetic to
calculate runtime addresses based on link time virtual addresses stored
in literals.

Note that all of these patches with the exception of #2 were reviewed or
acked by Nico before, but given that this was a while ago (and the fact
that neither of us work for Linaro anymore), I have dropped these. Note
that only patch #1 deviates significantly from the last version that I
sent out, the remaining ones were just freshened up (and their commit
logs slightly expanded).

Cc: Russell King <linux@armlinux.org.uk>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Stefan Agner <stefan@agner.ch>
Cc: Peter Smith <Peter.Smith@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>

Ard Biesheuvel (12):
  ARM: assembler: introduce adr_l, ldr_l and str_l macros
  ARM: efistub: replace adrl pseudo-op with adr_l macro invocation
  ARM: module: add support for place relative relocations
  ARM: head-common.S: use PC-relative insn sequence for __proc_info
  ARM: head-common.S: use PC-relative insn sequence for idmap creation
  ARM: head.S: use PC-relative insn sequence for secondary_data
  ARM: kernel: use relative references for UP/SMP alternatives
  ARM: head: use PC-relative insn sequence for __smp_alt
  ARM: sleep.S: use PC-relative insn sequence for
    sleep_save_sp/mpidr_hash
  ARM: head.S: use PC-relative insn sequences for __fixup_pv_table
  ARM: head.S: use PC relative insn sequence to calculate PHYS_OFFSET
  ARM: kvm: replace open coded VA->PA calculations with adr_l call

 arch/arm/boot/compressed/head.S  | 18 +---
 arch/arm/include/asm/assembler.h | 88 ++++++++++++++++++-
 arch/arm/include/asm/elf.h       |  5 ++
 arch/arm/include/asm/processor.h |  2 +-
 arch/arm/kernel/head-common.S    | 22 ++---
 arch/arm/kernel/head.S           | 90 +++++---------------
 arch/arm/kernel/hyp-stub.S       | 27 +++---
 arch/arm/kernel/module.c         | 20 ++++-
 arch/arm/kernel/sleep.S          | 19 ++---
 9 files changed, 159 insertions(+), 132 deletions(-)

Nicolas Pitre Sept. 14, 2020, 2:06 p.m. UTC | #1

On Mon, 14 Sep 2020, Ard Biesheuvel wrote:

> This is a respin of the adr_l/ldr_l code I wrote some years ago in the
> context of my KASLR proof of concept for 32-bit ARM.
> 
> A new use case came up, in the form of Clang, which does not implement
> the 'adrl' pseudo-instruction in its assembler, and so for PC-relative
> references that don't fit into a ARM adr instruction, we need something
> else. Patch #2 addresses an actual Clang build issue of this nature, by
> replacing an occurrence of adrl with adr_l.
> 
> I have included my existing cleanup patches that were built on top of the
> adr_l macro, which replace several occurrences of open coded arithmetic to
> calculate runtime addresses based on link time virtual addresses stored
> in literals.
> 
> Note that all of these patches with the exception of #2 were reviewed or
> acked by Nico before, but given that this was a while ago (and the fact

Certainly it must have been, as I didn't remember much of it.

> that neither of us work for Linaro anymore), I have dropped these. Note
> that only patch #1 deviates significantly from the last version that I
> sent out, the remaining ones were just freshened up (and their commit
> logs slightly expanded).

Reviewed-by: Nicolas Pitre <nico@fluxnic.net>

> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Linus Walleij <linus.walleij@linaro.org>
> Cc: Nicolas Pitre <nico@fluxnic.net>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Cc: Stefan Agner <stefan@agner.ch>
> Cc: Peter Smith <Peter.Smith@arm.com>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Will Deacon <will@kernel.org>
> 
> Ard Biesheuvel (12):
>   ARM: assembler: introduce adr_l, ldr_l and str_l macros
>   ARM: efistub: replace adrl pseudo-op with adr_l macro invocation
>   ARM: module: add support for place relative relocations
>   ARM: head-common.S: use PC-relative insn sequence for __proc_info
>   ARM: head-common.S: use PC-relative insn sequence for idmap creation
>   ARM: head.S: use PC-relative insn sequence for secondary_data
>   ARM: kernel: use relative references for UP/SMP alternatives
>   ARM: head: use PC-relative insn sequence for __smp_alt
>   ARM: sleep.S: use PC-relative insn sequence for
>     sleep_save_sp/mpidr_hash
>   ARM: head.S: use PC-relative insn sequences for __fixup_pv_table
>   ARM: head.S: use PC relative insn sequence to calculate PHYS_OFFSET
>   ARM: kvm: replace open coded VA->PA calculations with adr_l call
> 
>  arch/arm/boot/compressed/head.S  | 18 +---
>  arch/arm/include/asm/assembler.h | 88 ++++++++++++++++++-
>  arch/arm/include/asm/elf.h       |  5 ++
>  arch/arm/include/asm/processor.h |  2 +-
>  arch/arm/kernel/head-common.S    | 22 ++---
>  arch/arm/kernel/head.S           | 90 +++++---------------
>  arch/arm/kernel/hyp-stub.S       | 27 +++---
>  arch/arm/kernel/module.c         | 20 ++++-
>  arch/arm/kernel/sleep.S          | 19 ++---
>  9 files changed, 159 insertions(+), 132 deletions(-)
> 
> -- 
> 2.17.1
> 
>

Russell King (Oracle) Sept. 14, 2020, 4:11 p.m. UTC | #2

On Mon, Sep 14, 2020 at 09:35:41AM -0400, Nicolas Pitre wrote:
> On Mon, 14 Sep 2020, Ard Biesheuvel wrote:
> 
> > When using the new adr_l/ldr_l/str_l macros to refer to external symbols
> > from modules, the linker may emit place relative ELF relocations that
> > need to be fixed up by the module loader. So add support for these.
> 
> Just wondering if that capability requirement should be added to the 
> module signature somehow...?
> 
> Maybe not. The MODULE_ARCH_VERMAGIC definition only contains things that 
> are configurable for a given build.

It doesn't need to be. If a module contains a relocation we don't know
how to handle, it will fail to load.

Ard Biesheuvel Sept. 15, 2020, 7:35 a.m. UTC | #3

On Mon, 14 Sep 2020 at 12:57, Ard Biesheuvel <ardb@kernel.org> wrote:
>

> Like arm64, ARM supports position independent code sequences that

> produce symbol references with a greater reach than the ordinary

> adr/ldr instructions. Since on ARM, the adrl pseudo-instruction is

> only supported in ARM mode (and not at all when using Clang), having

> a adr_l macro like we do on arm64 is useful, and increases symmetry

> as well.

>

> Currently, we use open coded instruction sequences involving literals

> and arithmetic operations. Instead, we can use movw/movt pairs on v7

> CPUs, circumventing the D-cache entirely.

>

> E.g., on v7+ CPUs, we can emit a PC-relative reference as follows:

>

>        movw         <reg>, #:lower16:<sym> - (1f + 8)

>        movt         <reg>, #:upper16:<sym> - (1f + 8)

>   1:   add          <reg>, <reg>, pc

>

> For older CPUs, we can emit the literal into a subsection, allowing it

> to be emitted out of line while retaining the ability to perform

> arithmetic on label offsets.

>

> E.g., on pre-v7 CPUs, we can emit a PC-relative reference as follows:

>

>        ldr          <reg>, 2f

>   1:   add          <reg>, <reg>, pc

>        .subsection  1

>   2:   .long        <sym> - (1b + 8)

>        .previous

>

> This is allowed by the assembler because, unlike ordinary sections,

> subsections are combined into a single section in the object file, and

> so the label references are not true cross-section references that are

> visible as relocations. (Subsections have been available in binutils

> since 2004 at least, so they should not cause any issues with older

> toolchains.)

>

> So use the above to implement the macros mov_l, adr_l, ldr_l and str_l,

> all of which will use movw/movt pairs on v7 and later CPUs, and use

> PC-relative literals otherwise.

>

> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

> ---

>  arch/arm/include/asm/assembler.h | 84 ++++++++++++++++++++

>  1 file changed, 84 insertions(+)

>

> diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h

> index feac2c8b86f2..39e972eaaa3f 100644

> --- a/arch/arm/include/asm/assembler.h

> +++ b/arch/arm/include/asm/assembler.h

> @@ -494,4 +494,88 @@ THUMB(     orr     \reg , \reg , #PSR_T_BIT        )

>  #define _ASM_NOKPROBE(entry)

>  #endif

>

> +       .macro          __adldst_l, op, reg, sym, tmp, c

> +       .if             __LINUX_ARM_ARCH__ < 7

> +       ldr\c           \tmp, .La\@

> +       .subsection     1

> +       .align          2

> +.La\@: .long           \sym - .Lpc\@

> +       .previous

> +       .else

> +       .ifnb           \c

> + THUMB(        ittt            \c                      )

> +       .endif

> +       movw\c          \tmp, #:lower16:\sym - .Lpc\@

> +       movt\c          \tmp, #:upper16:\sym - .Lpc\@

> +       .endif

> +

> +#ifndef CONFIG_THUMB2_KERNEL

> +       .set            .Lpc\@, . + 8                   // PC bias

> +       .ifc            \op, add

> +       add\c           \reg, \tmp, pc

> +       .else

> +       \op\c           \reg, [\tmp, pc]


This should be

       \op\c           \reg, [pc, \tmp]

> +       .endif

> +#else

> +.Lb\@: add\c           \tmp, \tmp, pc

> +       /*

> +        * In Thumb-2 builds, the PC bias depends on whether we are currently

> +        * emitting into a .arm or a .thumb section. The size of the add opcode

> +        * above will be 2 bytes when emitting in Thumb mode and 4 bytes when

> +        * emitting in ARM mode, so let's use this to account for the bias.

> +        */

> +       .set            .Lpc\@, . + (. - .Lb\@)

> +

> +       .ifnc           \op, add

> +       \op\c           \reg, [\tmp]

> +       .endif

> +#endif

> +       .endm

> +

> +       /*

> +        * mov_l - move a constant value or [relocated] address into a register

> +        */

> +       .macro          mov_l, dst:req, imm:req

> +       .if             __LINUX_ARM_ARCH__ < 7

> +       ldr             \dst, =\imm

> +       .else

> +       movw            \dst, #:lower16:\imm

> +       movt            \dst, #:upper16:\imm

> +       .endif

> +       .endm

> +

> +       /*

> +        * adr_l - adr pseudo-op with unlimited range

> +        *

> +        * @dst: destination register

> +        * @sym: name of the symbol

> +        * @cond: conditional opcode suffix

> +        */

> +       .macro          adr_l, dst:req, sym:req, cond

> +       __adldst_l      add, \dst, \sym, \dst, \cond

> +       .endm

> +

> +       /*

> +        * ldr_l - ldr <literal> pseudo-op with unlimited range

> +        *

> +        * @dst: destination register

> +        * @sym: name of the symbol

> +        * @cond: conditional opcode suffix

> +        */

> +       .macro          ldr_l, dst:req, sym:req, cond

> +       __adldst_l      ldr, \dst, \sym, \dst, \cond

> +       .endm

> +

> +       /*

> +        * str_l - str <literal> pseudo-op with unlimited range

> +        *

> +        * @src: source register

> +        * @sym: name of the symbol

> +        * @tmp: mandatory scratch register

> +        * @cond: conditional opcode suffix

> +        */

> +       .macro          str_l, src:req, sym:req, tmp:req, cond

> +       __adldst_l      str, \src, \sym, \tmp, \cond

> +       .endm

> +

>  #endif /* __ASM_ASSEMBLER_H__ */

> --

> 2.17.1

>

Nick Desaulniers Sept. 15, 2020, 5:51 p.m. UTC | #4

On Tue, Sep 15, 2020 at 12:35 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 14 Sep 2020 at 12:57, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
> > index feac2c8b86f2..39e972eaaa3f 100644
> > --- a/arch/arm/include/asm/assembler.h
> > +++ b/arch/arm/include/asm/assembler.h
> > @@ -494,4 +494,88 @@ THUMB(     orr     \reg , \reg , #PSR_T_BIT        )
> >  #define _ASM_NOKPROBE(entry)
> >  #endif
> >
> > +       .macro          __adldst_l, op, reg, sym, tmp, c
> > +       .if             __LINUX_ARM_ARCH__ < 7
> > +       ldr\c           \tmp, .La\@
> > +       .subsection     1
> > +       .align          2
> > +.La\@: .long           \sym - .Lpc\@
> > +       .previous
> > +       .else
> > +       .ifnb           \c
> > + THUMB(        ittt            \c                      )
> > +       .endif
> > +       movw\c          \tmp, #:lower16:\sym - .Lpc\@
> > +       movt\c          \tmp, #:upper16:\sym - .Lpc\@
> > +       .endif
> > +
> > +#ifndef CONFIG_THUMB2_KERNEL
> > +       .set            .Lpc\@, . + 8                   // PC bias
> > +       .ifc            \op, add
> > +       add\c           \reg, \tmp, pc
> > +       .else
> > +       \op\c           \reg, [\tmp, pc]
>
> This should be
>
>        \op\c           \reg, [pc, \tmp]

That looks better.  I had tested this series yesterday and wasn't able
to build with either toolchains.  Was going to ask if I was using the
wrong base or if there was an issue with my version binutils.  All
good now, I'll try to hammer on this a bit.

Nick Desaulniers Sept. 15, 2020, 7:32 p.m. UTC | #5

On Mon, Sep 14, 2020 at 2:57 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>

> This is a respin of the adr_l/ldr_l code I wrote some years ago in the

> context of my KASLR proof of concept for 32-bit ARM.

>

> A new use case came up, in the form of Clang, which does not implement

> the 'adrl' pseudo-instruction in its assembler, and so for PC-relative

> references that don't fit into a ARM adr instruction, we need something

> else. Patch #2 addresses an actual Clang build issue of this nature, by

> replacing an occurrence of adrl with adr_l.

>

> I have included my existing cleanup patches that were built on top of the

> adr_l macro, which replace several occurrences of open coded arithmetic to

> calculate runtime addresses based on link time virtual addresses stored

> in literals.

>

> Note that all of these patches with the exception of #2 were reviewed or

> acked by Nico before, but given that this was a while ago (and the fact

> that neither of us work for Linaro anymore), I have dropped these. Note

> that only patch #1 deviates significantly from the last version that I

> sent out, the remaining ones were just freshened up (and their commit

> logs slightly expanded).


Tested-by: Nick Desaulniers <ndesaulniers@google.com>


Thanks for the series, Ard.  I was able to compile and boot the
following with this series (and the fixup to 01/12 applied):

$ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make defconfig
$ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LLVM=1 defconfig
$ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LLVM=1 aspeed_g5_defconfig
$ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LLVM=1 multi_v5_defconfig

(So ARM v7/GCC, ARM v5,6,7/LLVM).  (Technically, the v6 is not
booting, but it's not related to this series and fails to boot without
the series as well.  Our CI on -next is red for an unrelated issue
which is masking the regression).

I was also able to build+boot the defconfig with v5 and v7 with some
configs disabled and a few hacks with LLVM_IAS=1.  This series allowed
me to get further in the build/testing, and I have a few new bugs to
go chase.  If anyone's interested:
https://github.com/ClangBuiltLinux/linux/issues/1154
https://github.com/ClangBuiltLinux/linux/issues/1155
so we're still a handful of bugs away from LLVM_IAS=1 with ARCH=arm,
but we're steadily chipping away at it, and this series is a big help.
Lest it look like there's only kernel fixes in this area, Jian's
https://reviews.llvm.org/D69411 recently was a big help, specifically
for ARCH=arm LLVM_IAS=1.
-- 
Thanks,
~Nick Desaulniers

Ard Biesheuvel Sept. 15, 2020, 9:29 p.m. UTC | #6

On Tue, 15 Sep 2020 at 22:32, Nick Desaulniers <ndesaulniers@google.com> wrote:
>

> On Mon, Sep 14, 2020 at 2:57 AM Ard Biesheuvel <ardb@kernel.org> wrote:

> >

> > This is a respin of the adr_l/ldr_l code I wrote some years ago in the

> > context of my KASLR proof of concept for 32-bit ARM.

> >

> > A new use case came up, in the form of Clang, which does not implement

> > the 'adrl' pseudo-instruction in its assembler, and so for PC-relative

> > references that don't fit into a ARM adr instruction, we need something

> > else. Patch #2 addresses an actual Clang build issue of this nature, by

> > replacing an occurrence of adrl with adr_l.

> >

> > I have included my existing cleanup patches that were built on top of the

> > adr_l macro, which replace several occurrences of open coded arithmetic to

> > calculate runtime addresses based on link time virtual addresses stored

> > in literals.

> >

> > Note that all of these patches with the exception of #2 were reviewed or

> > acked by Nico before, but given that this was a while ago (and the fact

> > that neither of us work for Linaro anymore), I have dropped these. Note

> > that only patch #1 deviates significantly from the last version that I

> > sent out, the remaining ones were just freshened up (and their commit

> > logs slightly expanded).

>

> Tested-by: Nick Desaulniers <ndesaulniers@google.com>

>

> Thanks for the series, Ard.  I was able to compile and boot the

> following with this series (and the fixup to 01/12 applied):

>

> $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make defconfig

> $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LLVM=1 defconfig

> $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LLVM=1 aspeed_g5_defconfig

> $ ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- make LLVM=1 multi_v5_defconfig

>

> (So ARM v7/GCC, ARM v5,6,7/LLVM).  (Technically, the v6 is not

> booting, but it's not related to this series and fails to boot without

> the series as well.  Our CI on -next is red for an unrelated issue

> which is masking the regression).

>


Excellent, thanks for testing. Do you have any coverage for Thumb2
builds as well? (CONFIG_THUMB2_KERNEL=y)

> I was also able to build+boot the defconfig with v5 and v7 with some

> configs disabled and a few hacks with LLVM_IAS=1.  This series allowed

> me to get further in the build/testing, and I have a few new bugs to

> go chase.  If anyone's interested:

> https://github.com/ClangBuiltLinux/linux/issues/1154

> https://github.com/ClangBuiltLinux/linux/issues/1155

> so we're still a handful of bugs away from LLVM_IAS=1 with ARCH=arm,

> but we're steadily chipping away at it, and this series is a big help.

> Lest it look like there's only kernel fixes in this area, Jian's

> https://reviews.llvm.org/D69411 recently was a big help, specifically

> for ARCH=arm LLVM_IAS=1.

> --

> Thanks,

> ~Nick Desaulniers

Nick Desaulniers Sept. 15, 2020, 11:31 p.m. UTC | #7

On Tue, Sep 15, 2020 at 2:30 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>

> Excellent, thanks for testing. Do you have any coverage for Thumb2

> builds as well? (CONFIG_THUMB2_KERNEL=y)

Ah, right, manually testing ARCH=arm defconfig with
CONFIG_THUMB2_KERNEL enabled via menuconfig:

Builds and boots for clang.

(Also needed https://lore.kernel.org/lkml/20200915225751.274531-1-ndesaulniers@google.com/T/#u
for an unrelated issue).

For GCC, I observe:

arch/arm/vfp/vfphw.o: in function `vfp_support_entry':
(.text+0xa): relocation truncated to fit: R_ARM_THM_JUMP19 against
symbol `vfp_kmode_exception' defined in .text.unlikely section in
arch/arm/vfp/vfpmodule.o

There doesn't seem to be a config option to work around that for now.
Though it's not caused by your series; if I drop your series, I can
still reproduce.

Is there a different config I should be testing rather than
defconfig+manual testing, like one of the existing configs? Looks like
only milbeaut_m10v_defconfig enable THUMB2 by default.  I should add
that to my CI and kernelCI for coverage with clang.
https://github.com/ClangBuiltLinux/continuous-integration/issues/94#issuecomment-693030376

milbeaut_m10v_defconfig
builds with your series for clang.  Looks like that config may be
currently broken for GCC?
Looks like it doesn't boot with Clang, so I'll need to debug that.
Again, both of the two past sentences are regardless of your series.
So your series still LGTM.
-- 
Thanks,
~Nick Desaulniers

Ard Biesheuvel Sept. 16, 2020, 5:54 a.m. UTC | #8

On Wed, 16 Sep 2020 at 02:31, Nick Desaulniers <ndesaulniers@google.com> wrote:
>

> On Tue, Sep 15, 2020 at 2:30 PM Ard Biesheuvel <ardb@kernel.org> wrote:

> >

> > Excellent, thanks for testing. Do you have any coverage for Thumb2

> > builds as well? (CONFIG_THUMB2_KERNEL=y)

>

> Ah, right, manually testing ARCH=arm defconfig with

> CONFIG_THUMB2_KERNEL enabled via menuconfig:

>

> Builds and boots for clang.

>

> (Also needed https://lore.kernel.org/lkml/20200915225751.274531-1-ndesaulniers@google.com/T/#u

> for an unrelated issue).

>

> For GCC, I observe:

>

> arch/arm/vfp/vfphw.o: in function `vfp_support_entry':

> (.text+0xa): relocation truncated to fit: R_ARM_THM_JUMP19 against

> symbol `vfp_kmode_exception' defined in .text.unlikely section in

> arch/arm/vfp/vfpmodule.o

>


Interesting. And this is using ld.bfd ?

> There doesn't seem to be a config option to work around that for now.

> Though it's not caused by your series; if I drop your series, I can

> still reproduce.

>

> Is there a different config I should be testing rather than

> defconfig+manual testing, like one of the existing configs? Looks like

> only milbeaut_m10v_defconfig enable THUMB2 by default.  I should add

> that to my CI and kernelCI for coverage with clang.

> https://github.com/ClangBuiltLinux/continuous-integration/issues/94#issuecomment-693030376

>

> milbeaut_m10v_defconfig

> builds with your series for clang.  Looks like that config may be

> currently broken for GCC?

> Looks like it doesn't boot with Clang, so I'll need to debug that.

> Again, both of the two past sentences are regardless of your series.

> So your series still LGTM.


Cheers.

I usually build multi_v7_defconfig with/wihout CONFIG_LPAE x
with/without CONFIG_THUMB2_KERNEL (LPAE affects the size of
dma_addr_t, and uses a different page table format, so it is a useful
bit to flick in testing)

Nick Desaulniers Sept. 16, 2020, 7:53 p.m. UTC | #9

On Tue, Sep 15, 2020 at 10:55 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Wed, 16 Sep 2020 at 02:31, Nick Desaulniers <ndesaulniers@google.com> wrote:
> >
> > On Tue, Sep 15, 2020 at 2:30 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > Excellent, thanks for testing. Do you have any coverage for Thumb2
> > > builds as well? (CONFIG_THUMB2_KERNEL=y)
> >
> > Ah, right, manually testing ARCH=arm defconfig with
> > CONFIG_THUMB2_KERNEL enabled via menuconfig:
> >
> > Builds and boots for clang.
> >
> > (Also needed https://lore.kernel.org/lkml/20200915225751.274531-1-ndesaulniers@google.com/T/#u
> > for an unrelated issue).
> >
> > For GCC, I observe:
> >
> > arch/arm/vfp/vfphw.o: in function `vfp_support_entry':
> > (.text+0xa): relocation truncated to fit: R_ARM_THM_JUMP19 against
> > symbol `vfp_kmode_exception' defined in .text.unlikely section in
> > arch/arm/vfp/vfpmodule.o
> >
>
> Interesting. And this is using ld.bfd ?

$ arm-linux-gnueabihf-ld --version
GNU ld (GNU Binutils for Debian) 2.34

.text.unlikely reminds me of the sections created when profiling data
is present.  That's obviously not the case here, so maybe there's
other ways this section can be created by GCC?  I may have added a
patch recently for placing this section explicitly, which was a
prerequisite for Kees' series explicitly placing all sections.
https://lore.kernel.org/lkml/159896087937.20229.4955362311782724603.tip-bot2@tip-bot2/
Maybe that can be improved?

IIRC, LLD is able to emit range extension thunks for these cases, but
BFD does not.

Ard Biesheuvel Sept. 16, 2020, 8:45 p.m. UTC | #10

On Wed, 16 Sep 2020 at 22:53, Nick Desaulniers <ndesaulniers@google.com> wrote:
>

> On Tue, Sep 15, 2020 at 10:55 PM Ard Biesheuvel <ardb@kernel.org> wrote:

> >

> > On Wed, 16 Sep 2020 at 02:31, Nick Desaulniers <ndesaulniers@google.com> wrote:

> > >

> > > On Tue, Sep 15, 2020 at 2:30 PM Ard Biesheuvel <ardb@kernel.org> wrote:

> > > >

> > > > Excellent, thanks for testing. Do you have any coverage for Thumb2

> > > > builds as well? (CONFIG_THUMB2_KERNEL=y)

> > >

> > > Ah, right, manually testing ARCH=arm defconfig with

> > > CONFIG_THUMB2_KERNEL enabled via menuconfig:

> > >

> > > Builds and boots for clang.

> > >

> > > (Also needed https://lore.kernel.org/lkml/20200915225751.274531-1-ndesaulniers@google.com/T/#u

> > > for an unrelated issue).

> > >

> > > For GCC, I observe:

> > >

> > > arch/arm/vfp/vfphw.o: in function `vfp_support_entry':

> > > (.text+0xa): relocation truncated to fit: R_ARM_THM_JUMP19 against

> > > symbol `vfp_kmode_exception' defined in .text.unlikely section in

> > > arch/arm/vfp/vfpmodule.o

> > >

> >

> > Interesting. And this is using ld.bfd ?

>

> $ arm-linux-gnueabihf-ld --version

> GNU ld (GNU Binutils for Debian) 2.34

>

> .text.unlikely reminds me of the sections created when profiling data

> is present.  That's obviously not the case here, so maybe there's

> other ways this section can be created by GCC?  I may have added a

> patch recently for placing this section explicitly, which was a

> prerequisite for Kees' series explicitly placing all sections.

> https://lore.kernel.org/lkml/159896087937.20229.4955362311782724603.tip-bot2@tip-bot2/

> Maybe that can be improved?

>

> IIRC, LLD is able to emit range extension thunks for these cases, but

> BFD does not.


ld.bfd will usually emit veneers for branches that turn out to be out
of range in the final link stage.

Does the following help?

diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index 4fcff9f59947..f1468702fbc9 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -82,6 +82,7 @@ ENTRY(vfp_support_entry)
        ldr     r3, [sp, #S_PSR]        @ Neither lazy restore nor FP exceptions
        and     r3, r3, #MODE_MASK      @ are supported in kernel mode
        teq     r3, #USR_MODE
+THUMB( it      ne                      )
        bne     vfp_kmode_exception     @ Returns through lr

        VFPFMRX r1, FPEXC               @ Is the VFP enabled?

Nick Desaulniers Sept. 16, 2020, 9:25 p.m. UTC | #11

On Wed, Sep 16, 2020 at 1:45 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Wed, 16 Sep 2020 at 22:53, Nick Desaulniers <ndesaulniers@google.com> wrote:
> >
> > On Tue, Sep 15, 2020 at 10:55 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > On Wed, 16 Sep 2020 at 02:31, Nick Desaulniers <ndesaulniers@google.com> wrote:
> > > >
> > > > On Tue, Sep 15, 2020 at 2:30 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > >
> > > > > Excellent, thanks for testing. Do you have any coverage for Thumb2
> > > > > builds as well? (CONFIG_THUMB2_KERNEL=y)
> > > >
> > > > Ah, right, manually testing ARCH=arm defconfig with
> > > > CONFIG_THUMB2_KERNEL enabled via menuconfig:
> > > >
> > > > Builds and boots for clang.
> > > >
> > > > (Also needed https://lore.kernel.org/lkml/20200915225751.274531-1-ndesaulniers@google.com/T/#u
> > > > for an unrelated issue).
> > > >
> > > > For GCC, I observe:
> > > >
> > > > arch/arm/vfp/vfphw.o: in function `vfp_support_entry':
> > > > (.text+0xa): relocation truncated to fit: R_ARM_THM_JUMP19 against
> > > > symbol `vfp_kmode_exception' defined in .text.unlikely section in
> > > > arch/arm/vfp/vfpmodule.o
> > > >
> > >
> > > Interesting. And this is using ld.bfd ?
> >
> > $ arm-linux-gnueabihf-ld --version
> > GNU ld (GNU Binutils for Debian) 2.34
> >
> > .text.unlikely reminds me of the sections created when profiling data
> > is present.  That's obviously not the case here, so maybe there's
> > other ways this section can be created by GCC?  I may have added a
> > patch recently for placing this section explicitly, which was a
> > prerequisite for Kees' series explicitly placing all sections.
> > https://lore.kernel.org/lkml/159896087937.20229.4955362311782724603.tip-bot2@tip-bot2/
> > Maybe that can be improved?
> >
> > IIRC, LLD is able to emit range extension thunks for these cases, but
> > BFD does not.
>
> ld.bfd will usually emit veneers for branches that turn out to be out
> of range in the final link stage.
>
> Does the following help?
>
> diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
> index 4fcff9f59947..f1468702fbc9 100644
> --- a/arch/arm/vfp/vfphw.S
> +++ b/arch/arm/vfp/vfphw.S
> @@ -82,6 +82,7 @@ ENTRY(vfp_support_entry)
>         ldr     r3, [sp, #S_PSR]        @ Neither lazy restore nor FP exceptions
>         and     r3, r3, #MODE_MASK      @ are supported in kernel mode
>         teq     r3, #USR_MODE
> +THUMB( it      ne                      )
>         bne     vfp_kmode_exception     @ Returns through lr
>
>         VFPFMRX r1, FPEXC               @ Is the VFP enabled?

Yes, it does!  I'm curious why though?  I've seen the "it prefixes"
before (is that what they're called?), but I don't recall what they're
for.  How come that solves this issue?

(Feel free to use my tested by tag on a subsequent resulting patch
that uses that).  That fix is irrespective of this series, so you
should send it maybe separately?

Also, it looks like the GCC build of milbeaut_m10v_defconfig fails to
boot for me in QEMU; so maybe not just a Clang bug (or maybe, more
than one bug).  (or maybe one of my command line params to QEMU is
wrong).

Stepping through start_kernel(), the call to setup_arch() seems to
hang in qemu.  For both GCC and Clang builds. A breakpoint in panic()
never gets hit.  Looks like the deepest I can get is:

Looks like:
#0  memblock_alloc_try_nid (size=108, align=4, min_addr=0, max_addr=0,
nid=-1) at mm/memblock.c:1601
#1  0xc060f0b4 in memblock_alloc (size=<optimized out>,
align=<optimized out>) at ./include/linux/memblock.h:409
#2  cma_init_reserved_mem (base=1543503872, size=67108864,
order_per_bit=0, name=0xc04b9240 "reserved", res_cma=0xc07ccbdc
<dma_contiguous_default_area>) at mm/cma.c:190
#3  0xc060f2c0 in cma_declare_contiguous_nid (base=1543503872,
size=67108864, limit=1610612736, alignment=8388608, order_per_bit=0,
fixed=false, name=0xc04b9240 "reserved",
    res_cma=0xc07ccbdc <dma_contiguous_default_area>, nid=-1) at mm/cma.c:352
#4  0xc0608cb6 in cma_declare_contiguous (alignment=<optimized out>,
order_per_bit=<optimized out>, name=<optimized out>,
res_cma=<optimized out>, fixed=<optimized out>,
    limit=<optimized out>, size=<optimized out>, base=<optimized out>)
at ./include/linux/cma.h:28
#5  dma_contiguous_reserve_area (size=<optimized out>, base=<optimized
out>, limit=<optimized out>, res_cma=0xc07ccbdc
<dma_contiguous_default_area>, fixed=false)
    at kernel/dma/contiguous.c:201
#6  0xc0608d22 in dma_contiguous_reserve (limit=<optimized out>) at
kernel/dma/contiguous.c:171
#7  0xc0604584 in arm_memblock_init (mdesc=0xc061bfe8
<__mach_desc_GENERIC_DT.35641>) at arch/arm/mm/init.c:230
#8  0xc060302c in setup_arch (cmdline_p=<optimized out>) at
arch/arm/kernel/setup.c:1132
#9  0xc06007d2 in start_kernel () at init/main.c:857

there's a call to memset that seems to hang.  I wonder if memset() was
defined in terms of __builtin_memset, which oft can result in infinite
loops?  (IIRC there's an AEABI related memset; this kind of thing has
hit android userspace before).

(gdb) layout asm

shows that the source call to memset is transformed into a call to mmioset.

(gdb) bt
#0  mmioset () at arch/arm/lib/memset.S:19
#1  0xc060e2dc in memblock_alloc_try_nid (size=108, align=<optimized
out>, min_addr=0, max_addr=0, nid=-1) at mm/memblock.c:1602
#2  0xc060f0b4 in memblock_alloc (size=<optimized out>,
align=<optimized out>) at ./include/linux/memblock.h:409
#3  cma_init_reserved_mem (base=1543503872, size=67108864,
order_per_bit=0, name=0xc04b9240 "reserved", res_cma=0xc07ccbdc
<dma_contiguous_default_area>) at mm/cma.c:190
#4  0xc060f2c0 in cma_declare_contiguous_nid (base=1543503872,
size=67108864, limit=1610612736, alignment=8388608, order_per_bit=0,
fixed=false, name=0xc04b9240 "reserved",
    res_cma=0xc07ccbdc <dma_contiguous_default_area>, nid=-1) at mm/cma.c:352
#5  0xc0608cb6 in cma_declare_contiguous (alignment=<optimized out>,
order_per_bit=<optimized out>, name=<optimized out>,
res_cma=<optimized out>, fixed=<optimized out>,
    limit=<optimized out>, size=<optimized out>, base=<optimized out>)
at ./include/linux/cma.h:28
#6  dma_contiguous_reserve_area (size=<optimized out>, base=<optimized
out>, limit=<optimized out>, res_cma=0xc07ccbdc
<dma_contiguous_default_area>, fixed=false)
    at kernel/dma/contiguous.c:201
#7  0xc0608d22 in dma_contiguous_reserve (limit=<optimized out>) at
kernel/dma/contiguous.c:171
#8  0xc0604584 in arm_memblock_init (mdesc=0xc061bfe8
<__mach_desc_GENERIC_DT.35641>) at arch/arm/mm/init.c:230
#9  0xc060302c in setup_arch (cmdline_p=<optimized out>) at
arch/arm/kernel/setup.c:1132
#10 0xc06007d2 in start_kernel () at init/main.c:857

Using `si` in gdb, it looks like we maybe hit an exception vector?
x   1202                .section .vectors, "ax", %progbits

                                            x
x   1203        .L__vectors_start:

                                            x
x   1204                W(b)    vector_rst

                                            x
x   1205                W(b)    vector_und

                                            x
x   1206                W(ldr)  pc, .L__vectors_start + 0x1000

                                            x
x  >1207                W(b)    vector_pabt

Is the last thing I see, then `si` stops working.  Not sure if `pabt`
is a recognizable acronym to anyone more familiar with ARM?

Happens regardless of your series, FWIW.

Nick Desaulniers Sept. 17, 2020, 12:16 a.m. UTC | #12

On Wed, Sep 16, 2020 at 2:25 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>

> Also, it looks like the GCC build of milbeaut_m10v_defconfig fails to

> boot for me in QEMU; so maybe not just a Clang bug (or maybe, more

> than one bug).  (or maybe one of my command line params to QEMU is

> wrong).

>

> Stepping through start_kernel(), the call to setup_arch() seems to

> hang in qemu.  For both GCC and Clang builds. A breakpoint in panic()

> never gets hit.  Looks like the deepest I can get is:

>

> Looks like:

> #0  memblock_alloc_try_nid (size=108, align=4, min_addr=0, max_addr=0,

> nid=-1) at mm/memblock.c:1601

> #1  0xc060f0b4 in memblock_alloc (size=<optimized out>,

> align=<optimized out>) at ./include/linux/memblock.h:409

> #2  cma_init_reserved_mem (base=1543503872, size=67108864,

> order_per_bit=0, name=0xc04b9240 "reserved", res_cma=0xc07ccbdc

> <dma_contiguous_default_area>) at mm/cma.c:190

> #3  0xc060f2c0 in cma_declare_contiguous_nid (base=1543503872,

> size=67108864, limit=1610612736, alignment=8388608, order_per_bit=0,

> fixed=false, name=0xc04b9240 "reserved",

>     res_cma=0xc07ccbdc <dma_contiguous_default_area>, nid=-1) at mm/cma.c:352

> #4  0xc0608cb6 in cma_declare_contiguous (alignment=<optimized out>,

> order_per_bit=<optimized out>, name=<optimized out>,

> res_cma=<optimized out>, fixed=<optimized out>,

>     limit=<optimized out>, size=<optimized out>, base=<optimized out>)

> at ./include/linux/cma.h:28

> #5  dma_contiguous_reserve_area (size=<optimized out>, base=<optimized

> out>, limit=<optimized out>, res_cma=0xc07ccbdc

> <dma_contiguous_default_area>, fixed=false)

>     at kernel/dma/contiguous.c:201

> #6  0xc0608d22 in dma_contiguous_reserve (limit=<optimized out>) at

> kernel/dma/contiguous.c:171

> #7  0xc0604584 in arm_memblock_init (mdesc=0xc061bfe8

> <__mach_desc_GENERIC_DT.35641>) at arch/arm/mm/init.c:230

> #8  0xc060302c in setup_arch (cmdline_p=<optimized out>) at

> arch/arm/kernel/setup.c:1132

> #9  0xc06007d2 in start_kernel () at init/main.c:857

>

> there's a call to memset that seems to hang.  I wonder if memset() was

> defined in terms of __builtin_memset, which oft can result in infinite

> loops?  (IIRC there's an AEABI related memset; this kind of thing has

> hit android userspace before).

>

> (gdb) layout asm

>

> shows that the source call to memset is transformed into a call to mmioset.

>

> (gdb) bt

> #0  mmioset () at arch/arm/lib/memset.S:19

> #1  0xc060e2dc in memblock_alloc_try_nid (size=108, align=<optimized

> out>, min_addr=0, max_addr=0, nid=-1) at mm/memblock.c:1602

> #2  0xc060f0b4 in memblock_alloc (size=<optimized out>,

> align=<optimized out>) at ./include/linux/memblock.h:409

> #3  cma_init_reserved_mem (base=1543503872, size=67108864,

> order_per_bit=0, name=0xc04b9240 "reserved", res_cma=0xc07ccbdc

> <dma_contiguous_default_area>) at mm/cma.c:190

> #4  0xc060f2c0 in cma_declare_contiguous_nid (base=1543503872,

> size=67108864, limit=1610612736, alignment=8388608, order_per_bit=0,

> fixed=false, name=0xc04b9240 "reserved",

>     res_cma=0xc07ccbdc <dma_contiguous_default_area>, nid=-1) at mm/cma.c:352

> #5  0xc0608cb6 in cma_declare_contiguous (alignment=<optimized out>,

> order_per_bit=<optimized out>, name=<optimized out>,

> res_cma=<optimized out>, fixed=<optimized out>,

>     limit=<optimized out>, size=<optimized out>, base=<optimized out>)

> at ./include/linux/cma.h:28

> #6  dma_contiguous_reserve_area (size=<optimized out>, base=<optimized

> out>, limit=<optimized out>, res_cma=0xc07ccbdc

> <dma_contiguous_default_area>, fixed=false)

>     at kernel/dma/contiguous.c:201

> #7  0xc0608d22 in dma_contiguous_reserve (limit=<optimized out>) at

> kernel/dma/contiguous.c:171

> #8  0xc0604584 in arm_memblock_init (mdesc=0xc061bfe8

> <__mach_desc_GENERIC_DT.35641>) at arch/arm/mm/init.c:230

> #9  0xc060302c in setup_arch (cmdline_p=<optimized out>) at

> arch/arm/kernel/setup.c:1132

> #10 0xc06007d2 in start_kernel () at init/main.c:857

>

> Using `si` in gdb, it looks like we maybe hit an exception vector?

> x   1202                .section .vectors, "ax", %progbits

>

>                                             x

> x   1203        .L__vectors_start:

>

>                                             x

> x   1204                W(b)    vector_rst

>

>                                             x

> x   1205                W(b)    vector_und

>

>                                             x

> x   1206                W(ldr)  pc, .L__vectors_start + 0x1000

>

>                                             x

> x  >1207                W(b)    vector_pabt

>

> Is the last thing I see, then `si` stops working.  Not sure if `pabt`

> is a recognizable acronym to anyone more familiar with ARM?

>

> Happens regardless of your series, FWIW.


It seems this is affecting the ARCH=arm defconfig (regardless of
toolchain) of linux-next today.  I know you've warned me about testing
on -next before...

Maybe this is: https://lore.kernel.org/linux-next/20200916140437.GL2142832@kernel.org/
? That looks arm64 specific though.  Maybe 32b ARM needs the same or a
similar fix?  Oh man, this boots, total shot in the dark:

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 45f9d5ec2360..7118b98c1f5f 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -226,9 +226,6 @@ void __init arm_memblock_init(const struct
machine_desc *mdesc)
        early_init_fdt_reserve_self();
        early_init_fdt_scan_reserved_mem();

-       /* reserve memory for DMA contiguous allocations */
-       dma_contiguous_reserve(arm_dma_limit);
-
        arm_memblock_steal_permitted = false;
        memblock_dump_all();
 }
@@ -248,6 +245,9 @@ void __init bootmem_init(void)
         */
        sparse_init();

+       /* reserve memory for DMA contiguous allocations */
+       dma_contiguous_reserve(arm_dma_limit);
+
        /*
         * Now free the memory - free_area_init needs
         * the sparse mem_map arrays initialized by sparse_init()
-- 
Thanks,
~Nick Desaulniers

Mike Rapoport Sept. 17, 2020, 5:19 a.m. UTC | #13

(added Mike K.)

On Wed, Sep 16, 2020 at 05:16:32PM -0700, Nick Desaulniers wrote:
> On Wed, Sep 16, 2020 at 2:25 PM Nick Desaulniers
> <ndesaulniers@google.com> wrote:

...
 
> Maybe this is: https://lore.kernel.org/linux-next/20200916140437.GL2142832@kernel.org/
> ? That looks arm64 specific though.  Maybe 32b ARM needs the same or a
> similar fix?  Oh man, this boots, total shot in the dark:

The CMA change is the problem IMO and it's now removed from -mm and
-next trees. 

> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index 45f9d5ec2360..7118b98c1f5f 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -226,9 +226,6 @@ void __init arm_memblock_init(const struct
> machine_desc *mdesc)
>         early_init_fdt_reserve_self();
>         early_init_fdt_scan_reserved_mem();
> 
> -       /* reserve memory for DMA contiguous allocations */
> -       dma_contiguous_reserve(arm_dma_limit);
> -
>         arm_memblock_steal_permitted = false;
>         memblock_dump_all();
>  }
> @@ -248,6 +245,9 @@ void __init bootmem_init(void)
>          */
>         sparse_init();
> 
> +       /* reserve memory for DMA contiguous allocations */
> +       dma_contiguous_reserve(arm_dma_limit);
> +

This might be too late for ARM because in paging_init() it calls
dma_contiguous_remap() which presumes that the CMA area is already
reserved.

dma_contiguous_remap() might be NOP, so your fix will boot until it
fails eventually :) 

>         /*
>          * Now free the memory - free_area_init needs
>          * the sparse mem_map arrays initialized by sparse_init()

Ard Biesheuvel Sept. 17, 2020, 6:01 a.m. UTC | #14

On Thu, 17 Sep 2020 at 00:25, Nick Desaulniers <ndesaulniers@google.com> wrote:
>

> On Wed, Sep 16, 2020 at 1:45 PM Ard Biesheuvel <ardb@kernel.org> wrote:

> >

> > On Wed, 16 Sep 2020 at 22:53, Nick Desaulniers <ndesaulniers@google.com> wrote:

> > >

> > > On Tue, Sep 15, 2020 at 10:55 PM Ard Biesheuvel <ardb@kernel.org> wrote:

> > > >

> > > > On Wed, 16 Sep 2020 at 02:31, Nick Desaulniers <ndesaulniers@google.com> wrote:

> > > > >

> > > > > On Tue, Sep 15, 2020 at 2:30 PM Ard Biesheuvel <ardb@kernel.org> wrote:

> > > > > >

> > > > > > Excellent, thanks for testing. Do you have any coverage for Thumb2

> > > > > > builds as well? (CONFIG_THUMB2_KERNEL=y)

> > > > >

> > > > > Ah, right, manually testing ARCH=arm defconfig with

> > > > > CONFIG_THUMB2_KERNEL enabled via menuconfig:

> > > > >

> > > > > Builds and boots for clang.

> > > > >

> > > > > (Also needed https://lore.kernel.org/lkml/20200915225751.274531-1-ndesaulniers@google.com/T/#u

> > > > > for an unrelated issue).

> > > > >

> > > > > For GCC, I observe:

> > > > >

> > > > > arch/arm/vfp/vfphw.o: in function `vfp_support_entry':

> > > > > (.text+0xa): relocation truncated to fit: R_ARM_THM_JUMP19 against

> > > > > symbol `vfp_kmode_exception' defined in .text.unlikely section in

> > > > > arch/arm/vfp/vfpmodule.o

> > > > >

> > > >

> > > > Interesting. And this is using ld.bfd ?

> > >

> > > $ arm-linux-gnueabihf-ld --version

> > > GNU ld (GNU Binutils for Debian) 2.34

> > >

> > > .text.unlikely reminds me of the sections created when profiling data

> > > is present.  That's obviously not the case here, so maybe there's

> > > other ways this section can be created by GCC?  I may have added a

> > > patch recently for placing this section explicitly, which was a

> > > prerequisite for Kees' series explicitly placing all sections.

> > > https://lore.kernel.org/lkml/159896087937.20229.4955362311782724603.tip-bot2@tip-bot2/

> > > Maybe that can be improved?

> > >

> > > IIRC, LLD is able to emit range extension thunks for these cases, but

> > > BFD does not.

> >

> > ld.bfd will usually emit veneers for branches that turn out to be out

> > of range in the final link stage.

> >

> > Does the following help?

> >

> > diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S

> > index 4fcff9f59947..f1468702fbc9 100644

> > --- a/arch/arm/vfp/vfphw.S

> > +++ b/arch/arm/vfp/vfphw.S

> > @@ -82,6 +82,7 @@ ENTRY(vfp_support_entry)

> >         ldr     r3, [sp, #S_PSR]        @ Neither lazy restore nor FP exceptions

> >         and     r3, r3, #MODE_MASK      @ are supported in kernel mode

> >         teq     r3, #USR_MODE

> > +THUMB( it      ne                      )

> >         bne     vfp_kmode_exception     @ Returns through lr

> >

> >         VFPFMRX r1, FPEXC               @ Is the VFP enabled?

>

> Yes, it does!  I'm curious why though?  I've seen the "it prefixes"

> before (is that what they're called?), but I don't recall what they're

> for.  How come that solves this issue?

>


It forces the use of an instruction encoding that has more space for
an immediate.

> (Feel free to use my tested by tag on a subsequent resulting patch

> that uses that).  That fix is irrespective of this series, so you

> should send it maybe separately?

>


I will, thanks.

> Also, it looks like the GCC build of milbeaut_m10v_defconfig fails to

> boot for me in QEMU; so maybe not just a Clang bug (or maybe, more

> than one bug).  (or maybe one of my command line params to QEMU is

> wrong).

>


I understand that this is actually an existing issue in -next, but in
general, why would you expect to be able to boot
milbeaut_m10v_defconfig on anything other than a Milbeaut MV10
machine? (whatever it is) Or does QEMU emulate a milbeaut machine? If
not, better to stick with configs that are intended to boot on the
QEMU machine emulation that you are using.

Also, while I see the point of regression testing of -next, using it
as a base to test arbitrary series and then report failures against it
produces a lot of noise. -next is *not* a good base for development,
because you get everybody else's half baked crap as well. When you
test my stuff, please use a known good base so we're not off on a
goose chase every time.



> Stepping through start_kernel(), the call to setup_arch() seems to

> hang in qemu.  For both GCC and Clang builds. A breakpoint in panic()

> never gets hit.  Looks like the deepest I can get is:

>

> Looks like:

> #0  memblock_alloc_try_nid (size=108, align=4, min_addr=0, max_addr=0,

> nid=-1) at mm/memblock.c:1601

> #1  0xc060f0b4 in memblock_alloc (size=<optimized out>,

> align=<optimized out>) at ./include/linux/memblock.h:409

> #2  cma_init_reserved_mem (base=1543503872, size=67108864,

> order_per_bit=0, name=0xc04b9240 "reserved", res_cma=0xc07ccbdc

> <dma_contiguous_default_area>) at mm/cma.c:190

> #3  0xc060f2c0 in cma_declare_contiguous_nid (base=1543503872,

> size=67108864, limit=1610612736, alignment=8388608, order_per_bit=0,

> fixed=false, name=0xc04b9240 "reserved",

>     res_cma=0xc07ccbdc <dma_contiguous_default_area>, nid=-1) at mm/cma.c:352

> #4  0xc0608cb6 in cma_declare_contiguous (alignment=<optimized out>,

> order_per_bit=<optimized out>, name=<optimized out>,

> res_cma=<optimized out>, fixed=<optimized out>,

>     limit=<optimized out>, size=<optimized out>, base=<optimized out>)

> at ./include/linux/cma.h:28

> #5  dma_contiguous_reserve_area (size=<optimized out>, base=<optimized

> out>, limit=<optimized out>, res_cma=0xc07ccbdc

> <dma_contiguous_default_area>, fixed=false)

>     at kernel/dma/contiguous.c:201

> #6  0xc0608d22 in dma_contiguous_reserve (limit=<optimized out>) at

> kernel/dma/contiguous.c:171

> #7  0xc0604584 in arm_memblock_init (mdesc=0xc061bfe8

> <__mach_desc_GENERIC_DT.35641>) at arch/arm/mm/init.c:230

> #8  0xc060302c in setup_arch (cmdline_p=<optimized out>) at

> arch/arm/kernel/setup.c:1132

> #9  0xc06007d2 in start_kernel () at init/main.c:857

>

> there's a call to memset that seems to hang.  I wonder if memset() was

> defined in terms of __builtin_memset, which oft can result in infinite

> loops?  (IIRC there's an AEABI related memset; this kind of thing has

> hit android userspace before).

>

> (gdb) layout asm

>

> shows that the source call to memset is transformed into a call to mmioset.

>

> (gdb) bt

> #0  mmioset () at arch/arm/lib/memset.S:19

> #1  0xc060e2dc in memblock_alloc_try_nid (size=108, align=<optimized

> out>, min_addr=0, max_addr=0, nid=-1) at mm/memblock.c:1602

> #2  0xc060f0b4 in memblock_alloc (size=<optimized out>,

> align=<optimized out>) at ./include/linux/memblock.h:409

> #3  cma_init_reserved_mem (base=1543503872, size=67108864,

> order_per_bit=0, name=0xc04b9240 "reserved", res_cma=0xc07ccbdc

> <dma_contiguous_default_area>) at mm/cma.c:190

> #4  0xc060f2c0 in cma_declare_contiguous_nid (base=1543503872,

> size=67108864, limit=1610612736, alignment=8388608, order_per_bit=0,

> fixed=false, name=0xc04b9240 "reserved",

>     res_cma=0xc07ccbdc <dma_contiguous_default_area>, nid=-1) at mm/cma.c:352

> #5  0xc0608cb6 in cma_declare_contiguous (alignment=<optimized out>,

> order_per_bit=<optimized out>, name=<optimized out>,

> res_cma=<optimized out>, fixed=<optimized out>,

>     limit=<optimized out>, size=<optimized out>, base=<optimized out>)

> at ./include/linux/cma.h:28

> #6  dma_contiguous_reserve_area (size=<optimized out>, base=<optimized

> out>, limit=<optimized out>, res_cma=0xc07ccbdc

> <dma_contiguous_default_area>, fixed=false)

>     at kernel/dma/contiguous.c:201

> #7  0xc0608d22 in dma_contiguous_reserve (limit=<optimized out>) at

> kernel/dma/contiguous.c:171

> #8  0xc0604584 in arm_memblock_init (mdesc=0xc061bfe8

> <__mach_desc_GENERIC_DT.35641>) at arch/arm/mm/init.c:230

> #9  0xc060302c in setup_arch (cmdline_p=<optimized out>) at

> arch/arm/kernel/setup.c:1132

> #10 0xc06007d2 in start_kernel () at init/main.c:857

>

> Using `si` in gdb, it looks like we maybe hit an exception vector?

> x   1202                .section .vectors, "ax", %progbits

>

>                                             x

> x   1203        .L__vectors_start:

>

>                                             x

> x   1204                W(b)    vector_rst

>

>                                             x

> x   1205                W(b)    vector_und

>

>                                             x

> x   1206                W(ldr)  pc, .L__vectors_start + 0x1000

>

>                                             x

> x  >1207                W(b)    vector_pabt

>

> Is the last thing I see, then `si` stops working.  Not sure if `pabt`

> is a recognizable acronym to anyone more familiar with ARM?

>

> Happens regardless of your series, FWIW.

> --

> Thanks,

> ~Nick Desaulniers

Nick Desaulniers Sept. 18, 2020, 8:03 p.m. UTC | #15

On Wed, Sep 16, 2020 at 11:01 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>

> On Thu, 17 Sep 2020 at 00:25, Nick Desaulniers <ndesaulniers@google.com> wrote:

> > Also, it looks like the GCC build of milbeaut_m10v_defconfig fails to

> > boot for me in QEMU; so maybe not just a Clang bug (or maybe, more

> > than one bug).  (or maybe one of my command line params to QEMU is

> > wrong).

> >

>

> I understand that this is actually an existing issue in -next, but in

> general, why would you expect to be able to boot

> milbeaut_m10v_defconfig on anything other than a Milbeaut MV10

> machine?

We've been booting other configs in QEMU for a few years now, so I
don't see why yet another config would hurt.  Maybe there's some
hardware dependency, but I guess we'd find that out trying to boot it
in QEMU.  If it boots in QEMU, I guess not booting on metal wasn't so
bad?  Maybe this is considered an antipattern, but you can see how if
we've been getting away with it for years then that would lead to such
expectations.

> (whatever it is) Or does QEMU emulate a milbeaut machine?

$ qemu-system-arm -machine help

doesn't print anything that looks like it, on initial glance.  Looks
like a socionext part:
https://www.socionext.com/en/pr/sn_pr20170105_01e.pdf

> If

> not, better to stick with configs that are intended to boot on the

> QEMU machine emulation that you are using.

I can see in our CI that we've been building+boot testing
multi_v5_defconfig, aspeed_g5_defconfig, and multi_v7_defconfig for a
while now without specifying any machine.  Is there a preferred
machine we should be using for those?  (It looks like qemu supports
ast2500-evb and ast2600-evb; is ARM1176 and ARMv6 core? Is that what
aspeed_g5 uses? Why is virt versioned? Ahhhh!!!!)

> Also, while I see the point of regression testing of -next, using it

> as a base to test arbitrary series and then report failures against it

> produces a lot of noise. -next is *not* a good base for development,

> because you get everybody else's half baked crap as well.

Ack.

> When you

> test my stuff, please use a known good base so we're not off on a

> goose chase every time.

Goose Chase?! gOoSe ChAsE?! *gestures broadly at...everything*
Monsieur, here at the zoo, chasing the geese is not out of our
purview.  It's preferable to cleaning up after the elephants.
-- 
Thanks,
~Nick Desaulniers

Ard Biesheuvel Sept. 18, 2020, 8:44 p.m. UTC | #16

On Fri, 18 Sep 2020 at 22:03, Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> On Wed, Sep 16, 2020 at 11:01 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Thu, 17 Sep 2020 at 00:25, Nick Desaulniers <ndesaulniers@google.com> wrote:
> > > Also, it looks like the GCC build of milbeaut_m10v_defconfig fails to
> > > boot for me in QEMU; so maybe not just a Clang bug (or maybe, more
> > > than one bug).  (or maybe one of my command line params to QEMU is
> > > wrong).
> > >
> >
> > I understand that this is actually an existing issue in -next, but in
> > general, why would you expect to be able to boot
> > milbeaut_m10v_defconfig on anything other than a Milbeaut MV10
> > machine?
>
> We've been booting other configs in QEMU for a few years now, so I
> don't see why yet another config would hurt.  Maybe there's some
> hardware dependency, but I guess we'd find that out trying to boot it
> in QEMU.  If it boots in QEMU, I guess not booting on metal wasn't so
> bad?  Maybe this is considered an antipattern, but you can see how if
> we've been getting away with it for years then that would lead to such
> expectations.
>
> > (whatever it is) Or does QEMU emulate a milbeaut machine?
>
> $ qemu-system-arm -machine help
>
> doesn't print anything that looks like it, on initial glance.  Looks
> like a socionext part:
> https://www.socionext.com/en/pr/sn_pr20170105_01e.pdf
>
> > If
> > not, better to stick with configs that are intended to boot on the
> > QEMU machine emulation that you are using.
>
> I can see in our CI that we've been building+boot testing
> multi_v5_defconfig, aspeed_g5_defconfig, and multi_v7_defconfig for a
> while now without specifying any machine.  Is there a preferred
> machine we should be using for those?  (It looks like qemu supports
> ast2500-evb and ast2600-evb; is ARM1176 and ARMv6 core? Is that what
> aspeed_g5 uses? Why is virt versioned? Ahhhh!!!!)
>

Milbeaut's serial output is on a "socionext,milbeaut-usio-uart" UART,
and its defconfig does not include drivers for the PL011 or 8250/16550
UARTs that you typically find on other boards. So how on earth would
you expect to get any output at all if QEMU does not emulate this
exact machine?

In general, if you use QEMU/mach-virt, the only defconfigs you should
reasonably be testing are the ones that contain CONFIG_ARCH_VIRT=y.


> > Also, while I see the point of regression testing of -next, using it
> > as a base to test arbitrary series and then report failures against it
> > produces a lot of noise. -next is *not* a good base for development,
> > because you get everybody else's half baked crap as well.
>
> Ack.
>
> > When you
> > test my stuff, please use a known good base so we're not off on a
> > goose chase every time.
>
> Goose Chase?! gOoSe ChAsE?! *gestures broadly at...everything*
> Monsieur, here at the zoo, chasing the geese is not out of our
> purview.  It's preferable to cleaning up after the elephants.

:-)

Nick Desaulniers Sept. 18, 2020, 9:06 p.m. UTC | #17

On Fri, Sep 18, 2020 at 1:44 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>

> On Fri, 18 Sep 2020 at 22:03, Nick Desaulniers <ndesaulniers@google.com> wrote:

> >

> > On Wed, Sep 16, 2020 at 11:01 PM Ard Biesheuvel <ardb@kernel.org> wrote:

> > >

> > > On Thu, 17 Sep 2020 at 00:25, Nick Desaulniers <ndesaulniers@google.com> wrote:

> > > > Also, it looks like the GCC build of milbeaut_m10v_defconfig fails to

> > > > boot for me in QEMU; so maybe not just a Clang bug (or maybe, more

> > > > than one bug).  (or maybe one of my command line params to QEMU is

> > > > wrong).

> > > >

> > >

> > > I understand that this is actually an existing issue in -next, but in

> > > general, why would you expect to be able to boot

> > > milbeaut_m10v_defconfig on anything other than a Milbeaut MV10

> > > machine?

> >

> > We've been booting other configs in QEMU for a few years now, so I

> > don't see why yet another config would hurt.  Maybe there's some

> > hardware dependency, but I guess we'd find that out trying to boot it

> > in QEMU.  If it boots in QEMU, I guess not booting on metal wasn't so

> > bad?  Maybe this is considered an antipattern, but you can see how if

> > we've been getting away with it for years then that would lead to such

> > expectations.

> >

> > > (whatever it is) Or does QEMU emulate a milbeaut machine?

> >

> > $ qemu-system-arm -machine help

> >

> > doesn't print anything that looks like it, on initial glance.  Looks

> > like a socionext part:

> > https://www.socionext.com/en/pr/sn_pr20170105_01e.pdf

> >

> > > If

> > > not, better to stick with configs that are intended to boot on the

> > > QEMU machine emulation that you are using.

> >

> > I can see in our CI that we've been building+boot testing

> > multi_v5_defconfig, aspeed_g5_defconfig, and multi_v7_defconfig for a

> > while now without specifying any machine.  Is there a preferred

> > machine we should be using for those?  (It looks like qemu supports

> > ast2500-evb and ast2600-evb; is ARM1176 and ARMv6 core? Is that what

> > aspeed_g5 uses? Why is virt versioned? Ahhhh!!!!)

> >

>

> Milbeaut's serial output is on a "socionext,milbeaut-usio-uart" UART,

> and its defconfig does not include drivers for the PL011 or 8250/16550

> UARTs that you typically find on other boards. So how on earth would

> you expect to get any output at all if QEMU does not emulate this

> exact machine?


breakpoints in panic()/printk(), lx-dmesg in GDB via
CONFIG_GDB_SCRIPTS=y generally works. :^)

I guess one thing I don't understand is how to check what UART or what
the name of the tty device would be.  I can grep for
"socionext,milbeaut-usio-uart" and see where it's defined, but I never
would have/still don't know how to find that. Please teach me how to
fish.  I understand the point of DT, and see

arch/arm/boot/dts/milbeaut-m10v.dtsi
76:                     compatible = "socionext,milbeaut-usio-uart";

but the comment about ttyUSI0 is confusing to me; that identifier
doesn't appear anywhere else in the kernel sources.

I generally have the same problem trying to run Pixel kernels in QEMU;
I don't understand how to determine which serial driver is being used,
and what the tty device would be named.  So I just enable the PLO11
driver.

Only last week I found that trying to use a shell as init without any
serial output can result in the shell exiting, panicking the kernel,
which doesn't print over serial...goose chase...

> In general, if you use QEMU/mach-virt, the only defconfigs you should

> reasonably be testing are the ones that contain CONFIG_ARCH_VIRT=y.


Looks like then that would only be multi_v7_defconfig.  How do we
continue to verify we can boot ARMv6 or ARMv5 under virtualization?
There are such machines in QEMU, but then no defconfigs in the kernel.
-- 
Thanks,
~Nick Desaulniers

[00/12] ARM: use adr_l/ldr_l macros for PC-relative references

Message

Comments