Message ID | 581C57FF.2090901@foss.arm.com |
---|---|
State | New |
On many or perhaps all machines, elf_machine_load_address could now be implemented purely in C by using a link-time trick.

In C, just:

    static inline ElfW(Addr) __attribute__ ((unused))
    elf_machine_load_address (void)
    {
      extern const char _BASE[] __attribute__ ((visibility ("hidden")));
      return (ElfW(Addr)) _BASE;
    }

Then add a trivial input linker script to the ld.so link:

    PROVIDE_HIDDEN(_BASE = 0);

I know this works for x86_64 and aarch64, and it does not require a load. (On x86_64 it's a single lea; on aarch64 it's a single adrp+add pair.)
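A minimal sketch of the same idea as an ordinary program, assuming a GNU/Linux toolchain whose linker provides the hidden symbol `__ehdr_start` at the address of the ELF header (GNU ld and lld both do). In a PIE, which is linked at address 0, `__ehdr_start` has link-time value 0 and so plays the role of `_BASE` without a linker script. The result is cross-checked against glibc's `dl_iterate_phdr`, which reports the main program first. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>     /* ElfW, ET_DYN, dl_iterate_phdr, struct dl_phdr_info */
#include <stddef.h>

/* Stand-in for _BASE (assumption: the linker defines __ehdr_start).
   Hidden visibility lets the compiler form its address PC-relatively,
   with no load.  */
extern const ElfW(Ehdr) __ehdr_start __attribute__ ((visibility ("hidden")));

/* Run-time address of the ELF header.  Equals the load address only when
   the object was linked at address 0 (ET_DYN: PIE or shared object).  */
static inline ElfW(Addr)
base_trick_load_address (void)
{
  return (ElfW(Addr)) &__ehdr_start;
}

/* Nonzero if this program was linked as ET_DYN.  */
static int
linked_as_et_dyn (void)
{
  return __ehdr_start.e_type == ET_DYN;
}

/* Records dlpi_addr (the load bias) of the first object, which glibc
   documents as the main program, then stops the iteration.  */
static int
first_object_cb (struct dl_phdr_info *info, size_t size, void *data)
{
  (void) size;
  *(ElfW(Addr) *) data = info->dlpi_addr;
  return 1;
}

/* Reference value to compare the trick against.  */
static ElfW(Addr)
reference_load_address (void)
{
  ElfW(Addr) bias = 0;
  dl_iterate_phdr (first_object_cb, &bias);
  return bias;
}
```

The `e_type` check matters: in a non-PIE (ET_EXEC) executable the ELF header sits at a nonzero link-time address, so its run-time address is not the load bias.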
On 04/11/16 21:24, Roland McGrath wrote:
> On many or perhaps all machines, elf_machine_load_address could now be
> implemented purely in C by using a link-time trick.
> [...]

This is less maintenance work, because the code can be shared, but it is not a portable solution: it relies on linker scripts and on the compiler not doing anything silly. I think asm is preferable, unless we know that all supported linkers handle this on all targets.
On 07/11/16 15:15, Szabolcs Nagy wrote:
> On 04/11/16 21:24, Roland McGrath wrote:
>> Then add a trivial input linker script to the ld.so link:
>>
>> PROVIDE_HIDDEN(_BASE = 0);

On second thought: why is it not OK to use _DYNAMIC instead of _BASE? Then no linker script is needed (_DYNAMIC is in the ELF spec).
On 07/11/16 15:23, Szabolcs Nagy wrote:
> why is it not ok to use _DYNAMIC instead of _BASE?
>
> then no linker script is needed (_DYNAMIC is in the elf spec).

Hidden symbols are not accessed with direct PC-relative addressing on MIPS, so this approach does not work in general.
We already rely on the compiler not doing the wrong thing in plenty of other places. I don't see any new issue there.

The use of a linker script here does not concern me either. It only affects building ld.so itself, so there is no general linker-compatibility issue. We already make plenty of use of fancy linker features, and only a few linkers are capable of building libc anyway.

I never said I was sure this technique works on all machines. It certainly works on aarch64.

Show me the code you have in mind using _DYNAMIC. The scheme using a linker-defined symbol with value 0 is the only one I'm aware of that reduces to the minimal number of assembly instructions, with none of them being a load.
On 08/11/16 21:28, Roland McGrath wrote:
> Show me the code you have in mind using _DYNAMIC. [...]

Well, the current x86_64 code is already doing what I had in mind. I assumed GOT[0] is used elsewhere, so it has to be computed anyway, and then computing (_DYNAMIC - GOT[0]) should give the same result as _BASE at the cost of one extra sub.
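A sketch of the `_DYNAMIC` approach for an ordinary dynamically linked program: load address = run-time address of `_DYNAMIC` minus its link-time address. In ld.so the link-time value comes from GOT[0]; because what a linker stores in an executable's GOT[0] is toolchain-specific, this sketch swaps in the `p_vaddr` of the `PT_DYNAMIC` program header as the link-time address. It assumes glibc's `dl_iterate_phdr`, and the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>     /* ElfW, PT_DYNAMIC, dl_iterate_phdr */
#include <stddef.h>

/* The linker defines _DYNAMIC whenever the output has a .dynamic section.
   Declared weak so a fully static non-PIE link still succeeds; its
   address is then null.  */
extern ElfW(Dyn) _DYNAMIC[] __attribute__ ((weak));

struct main_info
{
  ElfW(Addr) bias;           /* dlpi_addr: reference load address.  */
  ElfW(Addr) dynamic_vaddr;  /* Link-time address of _DYNAMIC.  */
  int have_dynamic;
};

/* glibc reports the main program first; record its bias and the
   p_vaddr of its PT_DYNAMIC segment, then stop.  */
static int
main_program_cb (struct dl_phdr_info *info, size_t size, void *data)
{
  struct main_info *mi = data;
  (void) size;
  mi->bias = info->dlpi_addr;
  for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++)
    if (info->dlpi_phdr[i].p_type == PT_DYNAMIC)
      {
        mi->dynamic_vaddr = info->dlpi_phdr[i].p_vaddr;
        mi->have_dynamic = 1;
      }
  return 1;
}

/* dynamic_addr(_DYNAMIC) - static_addr(_DYNAMIC) = load address.
   Returns 0 if the program has no dynamic section.  */
static int
dynamic_trick_load_address (ElfW(Addr) *load, ElfW(Addr) *reference)
{
  struct main_info mi = { 0, 0, 0 };
  dl_iterate_phdr (main_program_cb, &mi);
  if (!mi.have_dynamic || _DYNAMIC == NULL)
    return 0;
  *load = (ElfW(Addr)) _DYNAMIC - mi.dynamic_vaddr;
  *reference = mi.bias;
  return 1;
}
```

The link-time value has to be fetched from memory (in ld.so, by loading GOT[0]), which is the cost the `_BASE` scheme avoids, since there the linker resolves that value to 0.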
On Mon, 7 Nov 2016, Szabolcs Nagy wrote:
> hidden symbol is not accessed with direct pc relative addressing on mips
Well, the regular MIPS ISA has no PC-relative addressing mode (apart
from branch instructions), so this can't be done with that instruction set
(the MIPS16 and microMIPS ISAs do have some forms of PC-relative
addressing, which can be used to access hidden and internal symbols
bypassing GOT in PIC code if the compiler is smart enough).
Maciej
> On 08/11/16 21:28, Roland McGrath wrote:
> > Show me the code you have in mind using _DYNAMIC. [...]
>
> well the current x86_64 code is already doing what i had in mind.

And it is more costly than using _BASE.

> i assumed GOT[0] is used elsewhere so it has to be computed anyway
> and then doing (_DYNAMIC-GOT[0]) should be the same as _BASE using
> an extra sub.

Of course all the methods that work get the same result! The point is that the _BASE method does it most efficiently.
On 04/11/16 09:42, Renlin Li wrote:
> Hi all,
>
> This patch rewrites aarch64 elf_machine_load_address to use the special
> _DYNAMIC symbol instead of _dl_start.
>
> The static address of the _DYNAMIC symbol is stored in the first GOT entry.
> Here is the change which makes this solution work:
> https://sourceware.org/ml/binutils/2013-06/msg00248.html
>
> The i386 and x86_64 targets use the same method as well.
>
> The original implementation relies on the R_AARCH64_ABS32 relocation
> being resolved at link time and on the static address fitting in 32 bits.
> However, in LP64 the address is normally 64 bits wide.
>
> Additionally, the original inline assembly is not optimal. It uses 4
> instructions, including a branch.
>
> The new implementation here is just two instructions:
>     adr %0, _DYNAMIC
>     ldr %1, _GLOBAL_OFFSET_TABLE_
>
> ld.so is around 130K, so it's safe to use adr and ldr to get the
> addresses; the range of those two instructions is +/-1MB.
>
> This method is also ILP32 safe.
> aarch64 linux toolchain regression test OK. OK to commit?
>
> Regards,
> Renlin Li
>
>
> ChangeLog:
>
> 2016-11-04  Renlin Li  <renlin.li@arm.com>
>
> 	* sysdeps/aarch64/dl-machine.h (elf_machine_load_address): Use
> 	_DYNAMIC symbol to calculate load address.

This is OK.

(Roland notes that introducing a _BASE symbol with a linker script would avoid even the load of GOT[0], but that can be done separately across targets.)
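The +/-1MB figure follows from the instruction encodings: ADR carries a signed 21-bit byte offset, and LDR (literal) a signed 19-bit offset counted in 4-byte words. A small sketch (the helper names are hypothetical) that checks whether a PC-relative distance, such as the roughly 130K span of ld.so, is reachable by each instruction:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ADR: signed 21-bit byte offset, i.e. [-1MB, +1MB).  */
static bool
adr_reaches (int64_t offset)
{
  return offset >= -(INT64_C (1) << 20) && offset < (INT64_C (1) << 20);
}

/* LDR (literal): signed 19-bit offset scaled by 4, so the byte offset
   must be a multiple of 4 in [-1MB, +1MB - 4].  */
static bool
ldr_literal_reaches (int64_t offset)
{
  return offset % 4 == 0
         && offset >= -(INT64_C (1) << 20)
         && offset <= (INT64_C (1) << 20) - 4;
}
```

Larger objects, or code that cannot assume the tiny code model, need an adrp-based sequence instead, which reaches +/-4GB.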
On 17/10/17 16:41, Szabolcs Nagy wrote:
> On 04/11/16 09:42, Renlin Li wrote:
>> Hi all,
>>
>> This patch rewrites aarch64 elf_machine_load_address to use special _DYNAMIC
>> symbol instead of _dl_start.
>> [...]
>
> This is OK.
>
> (Roland notes that introducing a BASE symbol with a
> linker script would even avoid loading GOT[0], but
> that can be done separately across targets)

Please hold off on this.

Looking at the static PIE patches, it seems that also needs to compute the base address, and that cannot assume -mcmodel=tiny. I don't remember if there was a particular reason -mcmodel=large would be problematic. If the inline asm was only used to save a few instructions, then please resend the patch using C code (like what x86_64 is doing); that's less fragile.
Hi Szabolcs,

Here is the C version, which should be portable in all cases.
The aarch64 native glibc regression tests pass.

Regards,
Renlin

ChangeLog:

2017-10-18  Renlin Li  <renlin.li@arm.com>

	* sysdeps/aarch64/dl-machine.h (elf_machine_load_address): Use
	_DYNAMIC symbol to calculate load address.

On 17/10/17 17:28, Szabolcs Nagy wrote:
> On 17/10/17 16:41, Szabolcs Nagy wrote:
>> [...]
>
> please wait with this.
>
> looking at the static pie patches, it seems that also needs
> to compute the base address and that cannot assume -mcmodel=tiny,
> i don't remember if there was a particular reason -mcmodel=large
> would be problematic, if inline asm was only used to save a
> few instructions then please resend the patch but using c code
> (like what x86_64 is doing), that's less fragile.

    diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
    index b124547..e765612 100644
    --- a/sysdeps/aarch64/dl-machine.h
    +++ b/sysdeps/aarch64/dl-machine.h
    @@ -51,40 +51,11 @@ elf_machine_load_address (void)
       /* To figure out the load address we use the definition that for any symbol:
          dynamic_addr(symbol) = static_addr(symbol) + load_addr
     
    -     The choice of symbol is arbitrary. The static address we obtain
    -     by constructing a non GOT reference to the symbol, the dynamic
    -     address of the symbol we compute using adrp/add to compute the
    -     symbol's address relative to the PC.
    -     This depends on 32/16bit relocations being resolved at link time
    -     and that the static address fits in the 32/16 bits.  */
    -
    -  ElfW(Addr) static_addr;
    -  ElfW(Addr) dynamic_addr;
    -
    -  asm ("					\n"
    -"	adrp	%1, _dl_start;			\n"
    -#ifdef __LP64__
    -"	add	%1, %1, #:lo12:_dl_start	\n"
    -#else
    -"	add	%w1, %w1, #:lo12:_dl_start	\n"
    -#endif
    -"	ldr	%w0, 1f				\n"
    -"	b	2f				\n"
    -"1:					\n"
    -#ifdef __LP64__
    -"	.word	_dl_start			\n"
    -#else
    -# ifdef __AARCH64EB__
    -"	.short	0				\n"
    -# endif
    -"	.short	_dl_start			\n"
    -# ifndef __AARCH64EB__
    -"	.short	0				\n"
    -# endif
    -#endif
    -"2:					\n"
    -	: "=r" (static_addr),  "=r" (dynamic_addr));
    -  return dynamic_addr - static_addr;
    +     _DYNAMIC symbol is used here as its link-time address is stored in
    +     the special unrelocated first GOT entry.  */
    +
    +  extern ElfW(Dyn) _DYNAMIC[] attribute_hidden;
    +  return (ElfW(Addr)) &_DYNAMIC - elf_machine_dynamic ();
     }
     
     /* Set up the loaded object described by L so its unrelocated PLT
On 18/10/17 11:32, Renlin Li wrote:
> Hi Szabolcs,
>
> Here is the C version one which should be portable in all cases.
> aarch64 native glibc regression test checked Okay.
> [...]

This is OK to commit.
    diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
    index 217e179..b2f6618 100644
    --- a/sysdeps/aarch64/dl-machine.h
    +++ b/sysdeps/aarch64/dl-machine.h
    @@ -49,25 +49,19 @@ elf_machine_load_address (void)
       /* To figure out the load address we use the definition that for any symbol:
          dynamic_addr(symbol) = static_addr(symbol) + load_addr
     
    -     The choice of symbol is arbitrary. The static address we obtain
    -     by constructing a non GOT reference to the symbol, the dynamic
    -     address of the symbol we compute using adrp/add to compute the
    -     symbol's address relative to the PC.
    -     This depends on 32bit relocations being resolved at link time
    -     and that the static address fits in the 32bits.  */
    -
    -  ElfW(Addr) static_addr;
    -  ElfW(Addr) dynamic_addr;
    +     _DYNAMIC symbol is used here as its static address is stored in
    +     the special unrelocated first GOT entry.  */
     
    +  ElfW(Addr) static_addr, dynamic_addr;
       asm ("					\n"
    -"	adrp	%1, _dl_start;			\n"
    -"	add	%1, %1, #:lo12:_dl_start	\n"
    -"	ldr	%w0, 1f				\n"
    -"	b	2f				\n"
    -"1:					\n"
    -"	.word	_dl_start			\n"
    -"2:					\n"
    -	: "=r" (static_addr),  "=r" (dynamic_addr));
    +       "adr %0, _DYNAMIC			\n"
    +#ifdef __LP64__
    +       "ldr %1, _GLOBAL_OFFSET_TABLE_		\n"
    +#else
    +       "ldr %w1, _GLOBAL_OFFSET_TABLE_		\n"
    +#endif
    +       : "=r" (dynamic_addr), "=r" (static_addr));
    +
       return dynamic_addr - static_addr;
     }
     