Message ID | 20240925150059.3955569-57-ardb+git@google.com |
---|---|
State | New |
Headers | show |
Series | x86: Rely on toolchain for relocatable code | expand |
On Wed, 25 Sept 2024 at 20:54, Uros Bizjak <ubizjak@gmail.com> wrote: > > On Wed, Sep 25, 2024 at 5:02 PM Ard Biesheuvel <ardb+git@google.com> wrote: > > > > From: Ard Biesheuvel <ardb@kernel.org> > > > > Build the kernel as a Position Independent Executable (PIE). This > > results in more efficient relocation processing for the virtual > > displacement of the kernel (for KASLR). More importantly, it instructs > > the linker to generate what is actually needed (a program that can be > > moved around in memory before execution), which is better than having to > > rely on the linker to create a position dependent binary that happens to > > tolerate being moved around after poking it in exactly the right manner. > > > > Note that this means that all codegen should be compatible with PIE, > > including Rust objects, so this needs to switch to the small code model > > with the PIE relocation model as well. > > I think that related to this work is the patch series [1] that > introduces the changes necessary to build the kernel as Position > Independent Executable (PIE) on x86_64 [1]. There are some more places > that need to be adapted for PIE. The patch series also introduces > objtool functionality to add validation for x86 PIE. > > [1] "[PATCH RFC 00/43] x86/pie: Make kernel image's virtual address flexible" > https://lore.kernel.org/lkml/cover.1682673542.git.houwenlong.hwl@antgroup.com/ > Hi Uros, I am aware of that discussion, as I took part in it as well. I don't think any of those changes are actually needed now - did you notice anything in particular that is missing?
On Wed, 25 Sept 2024 at 21:39, Uros Bizjak <ubizjak@gmail.com> wrote: > > On Wed, Sep 25, 2024 at 9:14 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > On Wed, 25 Sept 2024 at 20:54, Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > On Wed, Sep 25, 2024 at 5:02 PM Ard Biesheuvel <ardb+git@google.com> wrote: > > > > > > > > From: Ard Biesheuvel <ardb@kernel.org> > > > > > > > > Build the kernel as a Position Independent Executable (PIE). This > > > > results in more efficient relocation processing for the virtual > > > > displacement of the kernel (for KASLR). More importantly, it instructs > > > > the linker to generate what is actually needed (a program that can be > > > > moved around in memory before execution), which is better than having to > > > > rely on the linker to create a position dependent binary that happens to > > > > tolerate being moved around after poking it in exactly the right manner. > > > > > > > > Note that this means that all codegen should be compatible with PIE, > > > > including Rust objects, so this needs to switch to the small code model > > > > with the PIE relocation model as well. > > > > > > I think that related to this work is the patch series [1] that > > > introduces the changes necessary to build the kernel as Position > > > Independent Executable (PIE) on x86_64 [1]. There are some more places > > > that need to be adapted for PIE. The patch series also introduces > > > objtool functionality to add validation for x86 PIE. > > > > > > [1] "[PATCH RFC 00/43] x86/pie: Make kernel image's virtual address flexible" > > > https://lore.kernel.org/lkml/cover.1682673542.git.houwenlong.hwl@antgroup.com/ > > > > > > > Hi Uros, > > > > I am aware of that discussion, as I took part in it as well. > > > > I don't think any of those changes are actually needed now - did you > > notice anything in particular that is missing? > > Some time ago I went through the kernel sources and proposed several > patches that changed all trivial occurrences of non-RIP addresses to > RIP ones. The work was partially based on the mentioned patch series, > and I remember, I left some of them out [e.g. 1], because they > required a temporary variable. I have a similar patch in my series, but the DEBUG_ENTRY code just uses pushf 1f@GOTPCREL(%rip) so no temporaries are needed. > Also, there was discussion about ftrace > [2], where no solution was found. > When linking with -z call-nop=suffix-nop, the __fentry__ call via the GOT will be relaxed by the linker into a 5 byte call followed by a 1 byte NOP, so I don't think we need to do anything special here. It might mean we currently lose -mnop-mcount until we find a solution for that in the compiler. In case you remember, I contributed and you merged a GCC patch that makes the __fentry__ emission logic honour -fdirect-access-external-data which should help here. This landed in GCC 14. > Looking through your series, I didn't find some of the non-RIP -> RIP > changes proposed by the original series (especially the ftrace part), > and noticed that there is no objtool validator proposed to ensure that > all generated code is indeed PIE compatible. > What would be the point of that? The linker will complain and throw an error if the code cannot be converted into a PIE executable, so I don't think we need objtool's help for that. > Speaking of non-RIP -> RIP changes that require a temporary - would it > be beneficial to make a macro that would use the RIP form only when > #ifdef CONFIG_X86_PIE? That would avoid code size increase when PIE is > not needed. > This series does not make the PIE support configurable. Do you think the code size increase is a concern if all GOT based symbol references are elided, e.g, via -fdirect-access-external-data?
On 25/09/2024 17:01, Ard Biesheuvel wrote: > From: Ard Biesheuvel <ardb@kernel.org> > > Build the kernel as a Position Independent Executable (PIE). This > results in more efficient relocation processing for the virtual > displacement of the kernel (for KASLR). More importantly, it instructs > the linker to generate what is actually needed (a program that can be > moved around in memory before execution), which is better than having to > rely on the linker to create a position dependent binary that happens to > tolerate being moved around after poking it in exactly the right manner. > > Note that this means that all codegen should be compatible with PIE, > including Rust objects, so this needs to switch to the small code model > with the PIE relocation model as well. > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > --- > arch/x86/Kconfig | 2 +- > arch/x86/Makefile | 11 +++++++---- > arch/x86/boot/compressed/misc.c | 2 ++ > arch/x86/kernel/vmlinux.lds.S | 5 +++++ > drivers/firmware/efi/libstub/x86-stub.c | 2 ++ > 5 files changed, 17 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 54cb1f14218b..dbb4d284b0e1 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -2187,7 +2187,7 @@ config RANDOMIZE_BASE > # Relocation on x86 needs some additional build support > config X86_NEED_RELOCS > def_bool y > - depends on RANDOMIZE_BASE || (X86_32 && RELOCATABLE) > + depends on X86_32 && RELOCATABLE > > config PHYSICAL_ALIGN > hex "Alignment value to which kernel should be aligned" > diff --git a/arch/x86/Makefile b/arch/x86/Makefile > index 83d20f402535..c1dcff444bc8 100644 > --- a/arch/x86/Makefile > +++ b/arch/x86/Makefile > @@ -206,9 +206,8 @@ else > PIE_CFLAGS-$(CONFIG_SMP) += -mstack-protector-guard-reg=gs > endif > > - # Don't emit relaxable GOTPCREL relocations > - KBUILD_AFLAGS_KERNEL += -Wa,-mrelax-relocations=no > - KBUILD_CFLAGS_KERNEL += -Wa,-mrelax-relocations=no $(PIE_CFLAGS-y) > + KBUILD_CFLAGS_KERNEL += $(PIE_CFLAGS-y) > + KBUILD_RUSTFLAGS_KERNEL += -Ccode-model=small -Crelocation-model=pie > endif > > # > @@ -264,12 +263,16 @@ else > LDFLAGS_vmlinux := > endif > > +ifdef CONFIG_X86_64 > +ldflags-pie-$(CONFIG_LD_IS_LLD) := --apply-dynamic-relocs > +ldflags-pie-$(CONFIG_LD_IS_BFD) := -z call-nop=suffix-nop > +LDFLAGS_vmlinux += --pie -z text $(ldflags-pie-y) > + > # > # The 64-bit kernel must be aligned to 2MB. Pass -z max-page-size=0x200000 to > # the linker to force 2MB page size regardless of the default page size used > # by the linker. > # > -ifdef CONFIG_X86_64 > LDFLAGS_vmlinux += -z max-page-size=0x200000 > endif > > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c > index 89f01375cdb7..79e3ffe16f61 100644 > --- a/arch/x86/boot/compressed/misc.c > +++ b/arch/x86/boot/compressed/misc.c > @@ -495,6 +495,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output) > error("Destination virtual address changed when not relocatable"); > #endif > > + boot_params_ptr->kaslr_va_shift = virt_addr - LOAD_PHYSICAL_ADDR; > + > debug_putstr("\nDecompressing Linux... "); > > if (init_unaccepted_memory()) { > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S > index f7e832c2ac61..d172e6e8eaaf 100644 > --- a/arch/x86/kernel/vmlinux.lds.S > +++ b/arch/x86/kernel/vmlinux.lds.S > @@ -459,6 +459,11 @@ xen_elfnote_phys32_entry_offset = > > DISCARDS > > + /DISCARD/ : { > + *(.dynsym .gnu.hash .hash .dynamic .dynstr) > + *(.interp .dynbss .eh_frame .sframe) > + } > + > /* > * Make sure that the .got.plt is either completely empty or it > * contains only the lazy dispatch entries. > diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c > index f8e465da344d..5c03954924fe 100644 > --- a/drivers/firmware/efi/libstub/x86-stub.c > +++ b/drivers/firmware/efi/libstub/x86-stub.c > @@ -912,6 +912,8 @@ static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry) > if (status != EFI_SUCCESS) > return status; > > + boot_params_ptr->kaslr_va_shift = virt_addr - LOAD_PHYSICAL_ADDR; > + > entry = decompress_kernel((void *)addr, virt_addr, error); > if (entry == ULONG_MAX) { > efi_free(alloc_size, addr); This patch causes a build failure here (on 64-bit): LD .tmp_vmlinux2 NM .tmp_vmlinux2.syms KSYMS .tmp_vmlinux2.kallsyms.S AS .tmp_vmlinux2.kallsyms.o LD vmlinux BTFIDS vmlinux WARN: resolve_btfids: unresolved symbol bpf_lsm_key_free FAILED elf_update(WRITE): invalid section entry size make[5]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 255 make[5]: *** Deleting file 'vmlinux' make[4]: *** [Makefile:1153: vmlinux] Error 2 make[3]: *** [debian/rules:74: build-arch] Error 2 dpkg-buildpackage: error: make -f debian/rules binary subprocess returned exit status 2 make[2]: *** [scripts/Makefile.package:121: bindeb-pkg] Error 2 make[1]: *** [/home/opc/linux-mainline-worktree2/Makefile:1544: bindeb-pkg] Error 2 make: *** [Makefile:224: __sub-make] Error 2 The parent commit builds fine. With V=1: + ldflags='-m elf_x86_64 -z noexecstack --pie -z text -z call-nop=suffix-nop -z max-page-size=0x200000 --build-id=sha1 --orphan-handling=warn --script=./arch/x86/kernel/vmlinux.lds -Map=vmlinux.map' + ld -m elf_x86_64 -z noexecstack --pie -z text -z call-nop=suffix-nop -z max-page-size=0x200000 --build-id=sha1 --orphan-handling=warn --script=./arch/x86/kernel/vmlinux.lds -Map=vmlinux.map -o vmlinux --whole-archive vmlinux.a .vmlinux.export.o init/version-timestamp.o --no-whole-archive --start-group --end-group .tmp_vmlinux2.kallsyms.o .tmp_vmlinux1.btf.o + is_enabled CONFIG_DEBUG_INFO_BTF + grep -q '^CONFIG_DEBUG_INFO_BTF=y' include/config/auto.conf + info BTFIDS vmlinux + printf ' %-7s %s\n' BTFIDS vmlinux BTFIDS vmlinux + ./tools/bpf/resolve_btfids/resolve_btfids vmlinux WARN: resolve_btfids: unresolved symbol bpf_lsm_key_free FAILED elf_update(WRITE): invalid section entry size I can send the full config off-list if necessary, but looks like it might be enough to set CONFIG_DEBUG_INFO_BTF=y. Vegard
On Wed, 25 Sept 2024 at 22:25, Vegard Nossum <vegard.nossum@oracle.com> wrote: > > > On 25/09/2024 17:01, Ard Biesheuvel wrote: > > From: Ard Biesheuvel <ardb@kernel.org> > > > > Build the kernel as a Position Independent Executable (PIE). This > > results in more efficient relocation processing for the virtual > > displacement of the kernel (for KASLR). More importantly, it instructs > > the linker to generate what is actually needed (a program that can be > > moved around in memory before execution), which is better than having to > > rely on the linker to create a position dependent binary that happens to > > tolerate being moved around after poking it in exactly the right manner. > > > > Note that this means that all codegen should be compatible with PIE, > > including Rust objects, so this needs to switch to the small code model > > with the PIE relocation model as well. > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > > --- > > arch/x86/Kconfig | 2 +- > > arch/x86/Makefile | 11 +++++++---- > > arch/x86/boot/compressed/misc.c | 2 ++ > > arch/x86/kernel/vmlinux.lds.S | 5 +++++ > > drivers/firmware/efi/libstub/x86-stub.c | 2 ++ > > 5 files changed, 17 insertions(+), 5 deletions(-) > > ... > > This patch causes a build failure here (on 64-bit): > > LD .tmp_vmlinux2 > NM .tmp_vmlinux2.syms > KSYMS .tmp_vmlinux2.kallsyms.S > AS .tmp_vmlinux2.kallsyms.o > LD vmlinux > BTFIDS vmlinux > WARN: resolve_btfids: unresolved symbol bpf_lsm_key_free > FAILED elf_update(WRITE): invalid section entry size > make[5]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 255 > make[5]: *** Deleting file 'vmlinux' > make[4]: *** [Makefile:1153: vmlinux] Error 2 > make[3]: *** [debian/rules:74: build-arch] Error 2 > dpkg-buildpackage: error: make -f debian/rules binary subprocess > returned exit status 2 > make[2]: *** [scripts/Makefile.package:121: bindeb-pkg] Error 2 > make[1]: *** [/home/opc/linux-mainline-worktree2/Makefile:1544: > bindeb-pkg] Error 2 > make: *** [Makefile:224: __sub-make] Error 2 > > The parent commit builds fine. With V=1: > > + ldflags='-m elf_x86_64 -z noexecstack --pie -z text -z > call-nop=suffix-nop -z max-page-size=0x200000 --build-id=sha1 > --orphan-handling=warn --script=./arch/x86/kernel/vmlinux.lds > -Map=vmlinux.map' > + ld -m elf_x86_64 -z noexecstack --pie -z text -z call-nop=suffix-nop > -z max-page-size=0x200000 --build-id=sha1 --orphan-handling=warn > --script=./arch/x86/kernel/vmlinux.lds -Map=vmlinux.map -o vmlinux > --whole-archive vmlinux.a .vmlinux.export.o init/version-timestamp.o > --no-whole-archive --start-group --end-group .tmp_vmlinux2.kallsyms.o > .tmp_vmlinux1.btf.o > + is_enabled CONFIG_DEBUG_INFO_BTF > + grep -q '^CONFIG_DEBUG_INFO_BTF=y' include/config/auto.conf > + info BTFIDS vmlinux > + printf ' %-7s %s\n' BTFIDS vmlinux > BTFIDS vmlinux > + ./tools/bpf/resolve_btfids/resolve_btfids vmlinux > WARN: resolve_btfids: unresolved symbol bpf_lsm_key_free > FAILED elf_update(WRITE): invalid section entry size > > I can send the full config off-list if necessary, but looks like it > might be enough to set CONFIG_DEBUG_INFO_BTF=y. > Thanks for the report. Turns out that adding the GOT to .rodata bumps the section's sh_entsize to 8, and libelf complains if the section size is not a multiple of the entry size. I'll include a fix in the next revision.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 54cb1f14218b..dbb4d284b0e1 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2187,7 +2187,7 @@ config RANDOMIZE_BASE # Relocation on x86 needs some additional build support config X86_NEED_RELOCS def_bool y - depends on RANDOMIZE_BASE || (X86_32 && RELOCATABLE) + depends on X86_32 && RELOCATABLE config PHYSICAL_ALIGN hex "Alignment value to which kernel should be aligned" diff --git a/arch/x86/Makefile b/arch/x86/Makefile index 83d20f402535..c1dcff444bc8 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -206,9 +206,8 @@ else PIE_CFLAGS-$(CONFIG_SMP) += -mstack-protector-guard-reg=gs endif - # Don't emit relaxable GOTPCREL relocations - KBUILD_AFLAGS_KERNEL += -Wa,-mrelax-relocations=no - KBUILD_CFLAGS_KERNEL += -Wa,-mrelax-relocations=no $(PIE_CFLAGS-y) + KBUILD_CFLAGS_KERNEL += $(PIE_CFLAGS-y) + KBUILD_RUSTFLAGS_KERNEL += -Ccode-model=small -Crelocation-model=pie endif # @@ -264,12 +263,16 @@ else LDFLAGS_vmlinux := endif +ifdef CONFIG_X86_64 +ldflags-pie-$(CONFIG_LD_IS_LLD) := --apply-dynamic-relocs +ldflags-pie-$(CONFIG_LD_IS_BFD) := -z call-nop=suffix-nop +LDFLAGS_vmlinux += --pie -z text $(ldflags-pie-y) + # # The 64-bit kernel must be aligned to 2MB. Pass -z max-page-size=0x200000 to # the linker to force 2MB page size regardless of the default page size used # by the linker. # -ifdef CONFIG_X86_64 LDFLAGS_vmlinux += -z max-page-size=0x200000 endif diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c index 89f01375cdb7..79e3ffe16f61 100644 --- a/arch/x86/boot/compressed/misc.c +++ b/arch/x86/boot/compressed/misc.c @@ -495,6 +495,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output) error("Destination virtual address changed when not relocatable"); #endif + boot_params_ptr->kaslr_va_shift = virt_addr - LOAD_PHYSICAL_ADDR; + debug_putstr("\nDecompressing Linux... "); if (init_unaccepted_memory()) { diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S index f7e832c2ac61..d172e6e8eaaf 100644 --- a/arch/x86/kernel/vmlinux.lds.S +++ b/arch/x86/kernel/vmlinux.lds.S @@ -459,6 +459,11 @@ xen_elfnote_phys32_entry_offset = DISCARDS + /DISCARD/ : { + *(.dynsym .gnu.hash .hash .dynamic .dynstr) + *(.interp .dynbss .eh_frame .sframe) + } + /* * Make sure that the .got.plt is either completely empty or it * contains only the lazy dispatch entries. diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c index f8e465da344d..5c03954924fe 100644 --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -912,6 +912,8 @@ static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry) if (status != EFI_SUCCESS) return status; + boot_params_ptr->kaslr_va_shift = virt_addr - LOAD_PHYSICAL_ADDR; + entry = decompress_kernel((void *)addr, virt_addr, error); if (entry == ULONG_MAX) { efi_free(alloc_size, addr);