Message ID | 20180612152306.25998-1-ard.biesheuvel@linaro.org |
---|---|
Series | GCC/X64: use hidden visibility for LTO PIE code |
Some super-naive questions, which are supposed to educate me, and not to
question the series:

On 06/12/18 17:22, Ard Biesheuvel wrote:
> The GCC toolchain uses PIE mode when building code for X64, because it
> is the most efficient in size: it uses relative references where
> possible, but still uses 64-bit quantities for absolute symbol
> references,

Absolute symbol references such as? References to fixed (constant)
addresses?

> which is optimal for executables that need to be converted
> to PE/COFF using GenFw.

Why is that approach optimal? As few relocation records are required as
possible?

> Enabling PIE mode has a couple of side effects though, primarily caused
> by the fact that the primary application area of GCC is to build programs
> for userland. GCC will assume that ELF symbols should be preemptible (which
> makes sense for PIC but not for PIE,

Why don't preemptible symbols make sense for PIE?

For example, if a userspace program loads a plugin with dlopen(), and
the plugin (.so) uses helper functions from the main executable, then
the main executable has to be (well, had to be, earlier?) built with
"-rdynamic". Wouldn't this mean the main executable could both be PIE
and sensibly have preemptible symbols?

(My apologies if I'm disturbingly ignorant about this and the question
doesn't even make sense.)

> but this simply seems to be the result
> of code being shared between the two modes), and it will attempt to keep
> absolute references close to each other so that dynamic relocations that
> trigger CoW for text pages have the smallest possible footprint.

So... Given this behavior, why is it a problem for us? What are the bad
symptoms? What is currently broken?

Sorry about my naivety here.

Thanks,
Laszlo

> These side effects can be mitigated by overriding the visibility of all
> symbol definitions *and* symbol references, using a special #pragma.
> This will inform the compiler that symbol preemption and dynamic relocations
> are not a concern, and that all symbol references can be emitted as direct
> relative references rather than relative references to a GOT entry containing
> the absolute address. Unsurprisingly, this leads to better and smaller code.
>
> Unfortunately, we have not been able to set this override when LTO is in
> effect, because the LTO code generator infers from the hidden visibility
> of all symbols that none of the code is reachable, and discards it all,
> leading to corrupt, empty binaries.
>
> We can work around this by overriding the visibility for symbols that are
> module entry points. So implement this for all occurrences of the symbol
> '_ModuleEntryPoint', and enable 'hidden' visibility in LTO builds as well.
>
> Note that all the changes in this series resolve to no-ops if USING_LTO
> is not #defined.
>
> Code can be found here:
> https://github.com/ardbiesheuvel/edk2/tree/x64-lto-visibility
>
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Cc: Liming Gao <liming.gao@intel.com>
> Cc: Ruiyu Ni <ruiyu.ni@intel.com>
> Cc: Hao Wu <hao.a.wu@intel.com>
> Cc: Leif Lindholm <leif.lindholm@linaro.org>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Andrew Fish <afish@apple.com>
> Cc: Star Zeng <star.zeng@intel.com>
> Cc: Eric Dong <eric.dong@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Zenith432 <zenith432@users.sourceforge.net>
> Cc: "Shi, Steven" <steven.shi@intel.com>
>
> Ard Biesheuvel (11):
>   MdePkg/ProcessorBind.h: define macro to decorate module entry points
>   DuetPkg: annotate module entry points with EFI_ENTRYPOINT
>   EdkCompatibilityPkg: annotate module entry points with EFI_ENTRYPOINT
>   EmbeddedPkg: annotate module entry points with EFI_ENTRYPOINT
>   EmulatorPkg: annotate module entry points with EFI_ENTRYPOINT
>   IntelFrameWorkPkg: annotate module entry points with EFI_ENTRYPOINT
>   MdeModulePkg: annotate module entry points with EFI_ENTRYPOINT
>   MdePkg: annotate module entry points with EFI_ENTRYPOINT
>   Nt32Pkg: annotate module entry points with EFI_ENTRYPOINT
>   UefiCpuPkg: annotate module entry points with EFI_ENTRYPOINT
>   MdePkg/ProcessorBind.h X64: drop non-LTO limitation on visiblity
>     override
>
>  DuetPkg/DxeIpl/DxeInit.c                         |  1 +
>  DuetPkg/EfiLdr/EfiLoader.c                       |  1 +
>  .../EntryPoints/EdkIIGlueDxeDriverEntryPoint.c   |  1 +
>  .../EntryPoints/EdkIIGluePeimEntryPoint.c        |  1 +
>  .../EntryPoints/EdkIIGlueSmmDriverEntryPoint.c   |  1 +
>  .../Library/EdkIIGlueDxeSmmDriverEntryPoint.h    |  1 +
>  .../Include/Library/EdkIIGluePeimEntryPoint.h    |  1 +
>  .../Library/EdkIIGlueUefiDriverEntryPoint.h      |  1 +
>  EmbeddedPkg/TemplateSec/TemplateSec.c            |  1 +
>  EmulatorPkg/Sec/Sec.c                            |  1 +
>  .../DxeSmmDriverEntryPoint/DriverEntryPoint.c    |  1 +
>  MdeModulePkg/Universal/CapsulePei/X64/X64Entry.c |  1 +
>  MdePkg/Include/Base.h                            |  7 +++++++
>  MdePkg/Include/Library/DxeCoreEntryPoint.h       |  1 +
>  MdePkg/Include/Library/PeiCoreEntryPoint.h       |  1 +
>  MdePkg/Include/Library/PeimEntryPoint.h          |  1 +
>  .../Include/Library/UefiApplicationEntryPoint.h  |  1 +
>  MdePkg/Include/Library/UefiDriverEntryPoint.h    |  1 +
>  MdePkg/Include/X64/ProcessorBind.h               | 16 +++++++++++-----
>  .../DxeCoreEntryPoint/DxeCoreEntryPoint.c        |  1 +
>  .../PeiCoreEntryPoint/PeiCoreEntryPoint.c        |  1 +
>  MdePkg/Library/PeimEntryPoint/PeimEntryPoint.c   |  1 +
>  .../ApplicationEntryPoint.c                      |  1 +
>  .../UefiDriverEntryPoint/DriverEntryPoint.c      |  1 +
>  Nt32Pkg/Sec/SecMain.c                            |  1 +
>  .../PlatformSecLibNull/PlatformSecLibNull.c      |  1 +
>  26 files changed, 42 insertions(+), 5 deletions(-)
>
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
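For readers unfamiliar with the mechanism the cover letter describes, here is a minimal sketch, not the actual patch. It assumes a GCC-compatible compiler. `EFI_ENTRYPOINT` is the macro name used in the series, but its expansion below (`default` visibility), the `int (void)` signature given to `_ModuleEntryPoint`, and the helper `InternalHelper` are assumptions made for illustration. The pragma is popped at the end only so the snippet can be pasted into an ordinary hosted program.

```c
/* Sketch only: hide every symbol by default, but keep the entry point visible. */
#if defined(__GNUC__)
/* From here on, declarations and definitions default to hidden visibility,
   so the compiler need not assume that they can be preempted. */
#pragma GCC visibility push(hidden)
/* Assumed expansion: an explicit attribute overrides the pragma. */
#define EFI_ENTRYPOINT __attribute__((visibility("default")))
#else
#define EFI_ENTRYPOINT
#endif

/* Hypothetical helper: hidden, so references to it can be direct. */
int InternalHelper(int Value)
{
  return Value + 1;
}

/* Hypothetical entry point: it stays visible, which gives an LTO code
   generator a root from which the rest of the module is reachable. */
EFI_ENTRYPOINT int _ModuleEntryPoint(void)
{
  return InternalHelper(41);
}

#if defined(__GNUC__)
#pragma GCC visibility pop
#endif
```

Whether a given build emits GOT-based references without the pragma depends on the compiler version and flags, so comparing the generated assembly with and without it is the reliable way to see the effect.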
On 12 June 2018 at 20:33, Laszlo Ersek <lersek@redhat.com> wrote:
> Some super-naive questions, which are supposed to educate me, and not to
> question the series:
>
> On 06/12/18 17:22, Ard Biesheuvel wrote:
>> The GCC toolchain uses PIE mode when building code for X64, because it
>> is the most efficient in size: it uses relative references where
>> possible, but still uses 64-bit quantities for absolute symbol
>> references,
>
> Absolute symbol references such as? References to fixed (constant)
> addresses?
>

I should have been clearer here: from the GCC man page (apologies for the
whitespace soup)

"""
-mcmodel=small
    Generate code for the small code model: the program and its symbols
    must be linked in the lower 2 GB of the address space. Pointers are
    64 bits. Programs can be statically or dynamically linked. This is
    the default code model.

-mcmodel=kernel
    Generate code for the kernel code model. The kernel runs in the
    negative 2 GB of the address space. This model has to be used for
    Linux kernel code.

-mcmodel=medium
    Generate code for the medium model: the program is linked in the
    lower 2 GB of the address space. Small symbols are also placed
    there. Symbols with sizes larger than -mlarge-data-threshold are put
    into large data or BSS sections and can be located above 2GB.
    Programs can be statically or dynamically linked.

-mcmodel=large
    Generate code for the large model. This model makes no assumptions
    about addresses and sizes of sections.
"""

Formerly, we used the large model because UEFI can load PE/COFF
executables anywhere in the lower address space, not only in the first 2
GB. The small PIE model is the best fit for UEFI because it does not
have this limitation, but [unlike the large model] only uses absolute
references when necessary, and will use relative references when it can.
(I.e., it assumes the program will fit in 4 GB of memory, which the
large model does not)

Absolute symbol references are things like statically initialized
function pointer variables or other quantities whose value cannot be
obtained programmatically at runtime using a relative reference.

>> which is optimal for executables that need to be converted
>> to PE/COFF using GenFw.
>
> Why is that approach optimal? As few relocations records are required as
> possible?
>

Because GenFw translates ELF relocations into PE/COFF relocations, but
only for the subset that requires fixing up at runtime. Relative
references do not require such fixups, so a code model that minimizes
the number of absolute relocations is therefore optimal. Note that
absolute references typically require twice the space as well.

>> Enabling PIE mode has a couple of side effects though, primarily caused
>> by the fact that the primary application area of GCC is to build programs
>> for userland. GCC will assume that ELF symbols should be preemptible (which
>> makes sense for PIC but not for PIE,
>
> Why don't preemptible symbols make sense for PIE?
>
> For example, if a userspace program loads a plugin with dlopen(), and
> the plugin (.so) uses helper functions from the main executable, then
> the main executable has to be (well, had to be, earlier?) built with
> "-rdynamic". Wouldn't this mean the main executable could both be PIE
> and sensibly have preemptible symbols?
>
> (My apologies if I'm disturbingly ignorant about this and the question
> doesn't even make sense.)
>

I mean that the symbols defined by the PIE executable [i.e., not shared
library] can never be preempted. Only symbols in shared libraries can be
preempted by the symbols in the main executable, not the other way
around.
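The filtering Ard describes can be sketched as follows. This is a simplified model and not GenFw's source: the function name `BaseRelocFor`, the `K_` prefixes, the two `RELOC_*` return codes, and `Helper`/`gHelperPtr` are invented for illustration. The numeric relocation types are the values from the x86-64 ELF psABI and the PE/COFF specification.

```c
#include <stdint.h>

/* x86-64 ELF relocation types (psABI values), prefixed to avoid
   clashing with <elf.h>. */
enum {
  K_R_X86_64_64       = 1,  /* absolute 64-bit address */
  K_R_X86_64_PC32     = 2,  /* 32-bit PC-relative */
  K_R_X86_64_PLT32    = 4,  /* 32-bit PC-relative, via PLT */
  K_R_X86_64_GOTPCREL = 9   /* PC-relative reference to a GOT entry */
};

/* PE/COFF base relocation type for a 64-bit absolute field. */
enum { K_IMAGE_REL_BASED_DIR64 = 10 };

#define RELOC_NONE_NEEDED   0   /* hypothetical: nothing to emit */
#define RELOC_UNSUPPORTED (-1)  /* hypothetical: converter gives up */

/* Decide which PE/COFF base relocation, if any, an ELF relocation needs. */
int BaseRelocFor(uint32_t ElfType)
{
  switch (ElfType) {
  case K_R_X86_64_PC32:
  case K_R_X86_64_PLT32:
    /* The distance between site and target is fixed at link time, so
       moving the whole image does not change it. */
    return RELOC_NONE_NEEDED;
  case K_R_X86_64_64:
    /* The field holds an absolute address and must be adjusted when
       the image is loaded somewhere else. */
    return K_IMAGE_REL_BASED_DIR64;
  default:
    /* For example GOT-based relocations, which this sketch, like the
       GenFw discussed in the thread, does not handle. */
    return RELOC_UNSUPPORTED;
  }
}

/* An absolute reference in C: a statically initialized function pointer.
   Its initial value is an address, typically recorded as R_X86_64_64. */
static int Helper(void) { return 7; }
int (*const gHelperPtr)(void) = Helper;
```

Under this model, every reference the compiler can express relative to the instruction pointer costs nothing in the converted image, which is why a code model that minimizes absolute references suits the conversion.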
>> but this simply seems to be the result
>> of code being shared between the two modes), and it will attempt to keep
>> absolute references close to each other so that dynamic relocations that
>> trigger CoW for text pages have the smallest possible footprint.
>
> So... Given this behavior, why is it a problem for us? What are the bad
> symptoms? What is currently broken?
>

The bad symptoms are that PIC code will use GOT entries for all symbol
references, meaning that instead of a direct relative reference from the
code, it will emit a relative reference to the GOT entry containing the
absolute address of the symbol. This involves an additional memory
reference, and it requires the GOT entry (which by definition contains
an absolute address) to be fixed up at load time.

What is broken [as reported by Zenith432] is that GCC in LTO mode may in
some cases still emit GOT based relocations that GenFw currently cannot
handle. If the address of a symbol is used in a calculation, or when the
address of a symbol is taken but not dereferenced (but only passed to a
function, for instance), GCC in -Os mode will optimize this into a
GOTPCREL reference.

Quoting from a private email from Zenith432 (who has already proposed
GenFw changes to handle these relocations):

"""
I figured out what's going on with LTO build in GCC5 that is compiled
with -Os -flto -DUSING_LTO and does not use visibility #pragma.

When compiling with LTO enabled, what happens is that all C source files
are transformed during compilation stage to LTO intermediate bytecode
(gimple in GCC). Then when static link (ld) takes place, all LTO
intermediate bytecode is sent back to compiler code-generation backend
to have machine code generated for it as if all the source code is one
big C source file ("whole program optimization"). As a result of this,
all the extern symbols become local symbols, like file-level static.
Because it's as if all the code is in one big source file.
Since there is no dynamic linking, there are no more "extern", and all
symbols are like file-level static and treated the same. This is why the
LTO build stops emitting GOT loads for size-optimization purposes. GCC
doesn't emit GOT loads for file-level static, and in LTO build they're
all like that - so no GOT loads.

But there is still something that fouls this up... If an extern symbol
is defined in assembly source file. Because assembly source files don't
participate in LTO. They are transformed by assembler into X64 machine
code. During ld, any extern symbol that is defined in an assembly source
file and declared and used by C source file is treated as before like
external symbol. Which means code generator can go back to its practice
of emitting GOT loads if they reduce code size.
"""

Instead of 'fixing' GenFw, I attempted to go back to the original
changes Steven and I did for LTO, to try and remember why we could not
use the GCC visibility #pragma when enabling LTO. That is the issue this
series aims to fix (but it is an RFC, so comments welcome)

--
Ard.
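The failure mode the series works around, where the LTO code generator discards everything once all symbols are hidden, can be pictured as a reachability pass. The sketch below is a toy model and says nothing about GCC's internals: the `SYMBOL` type, `CountKept`, `KeptInExampleModule`, and the three-symbol module (including the names `DriverInit` and `Helper`) are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS    4
#define MAX_SYMBOLS 16   /* toy limit */

typedef struct {
  const char *Name;
  bool        Visible;         /* false models hidden visibility */
  int         Refs[MAX_REFS];  /* indices of referenced symbols, -1 = unused */
} SYMBOL;

/* Mark Index and everything reachable from it. */
static void Mark(const SYMBOL *Syms, size_t Count, size_t Index, bool *Kept)
{
  if (Index >= Count || Kept[Index]) {
    return;
  }
  Kept[Index] = true;
  for (int i = 0; i < MAX_REFS; i++) {
    if (Syms[Index].Refs[i] >= 0) {
      Mark(Syms, Count, (size_t)Syms[Index].Refs[i], Kept);
    }
  }
}

/* Roots are the visible symbols; returns how many symbols survive. */
size_t CountKept(const SYMBOL *Syms, size_t Count)
{
  bool   Kept[MAX_SYMBOLS] = { false };
  size_t Total = 0;

  if (Count > MAX_SYMBOLS) {
    return 0;
  }
  for (size_t i = 0; i < Count; i++) {
    if (Syms[i].Visible) {
      Mark(Syms, Count, i, Kept);
    }
  }
  for (size_t i = 0; i < Count; i++) {
    if (Kept[i]) {
      Total++;
    }
  }
  return Total;
}

/* Entry point -> DriverInit -> Helper; only the entry point may be visible. */
size_t KeptInExampleModule(bool EntryPointVisible)
{
  const SYMBOL Module[] = {
    { "_ModuleEntryPoint", EntryPointVisible, {  1, -1, -1, -1 } },
    { "DriverInit",        false,             {  2, -1, -1, -1 } },
    { "Helper",            false,             { -1, -1, -1, -1 } },
  };
  return CountKept(Module, sizeof(Module) / sizeof(Module[0]));
}
```

In this model `KeptInExampleModule(false)` keeps nothing, matching the empty binaries described in the cover letter, while `KeptInExampleModule(true)` keeps all three symbols because the visible entry point acts as a root.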
Hi Ard, Zenith,

Thank you both for explaining ELF GOT, LTO, PIC/PIE, the machine code
models and the GCC visibility #pragma so completely. It is very good to
read it all in one place, and I believe copying these explanations to an
edk2 wiki page on GitHub could be very useful for other edk2 developers.

From a code change impact point of view, I see that using hidden
visibility for LTO, which really means removing the !defined(USING_LTO)
in X64/ProcessorBind.h, requires changing 20+ other files across edk2.
The cost does not look small, so we might need more justification to
accept such a change. Does hidden visibility improve the code size of
the LTO build? Is there any other benefit?

Steven Shi
Intel\SSG\STO\UEFI Firmware

Tel: +86 021-61166522
iNet: 821-6522

> -----Original Message-----
> From: Ard Biesheuvel [mailto:ard.biesheuvel@linaro.org]
> Sent: Tuesday, June 12, 2018 11:23 PM
> To: edk2-devel@lists.01.org
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>; Kinney, Michael D
> <michael.d.kinney@intel.com>; Gao, Liming <liming.gao@intel.com>; Ni,
> Ruiyu <ruiyu.ni@intel.com>; Wu, Hao A <hao.a.wu@intel.com>; Leif
> Lindholm <leif.lindholm@linaro.org>; Justen, Jordan L
> <jordan.l.justen@intel.com>; Andrew Fish <afish@apple.com>; Zeng, Star
> <star.zeng@intel.com>; Dong, Eric <eric.dong@intel.com>; Laszlo Ersek
> <lersek@redhat.com>; Zenith432 <zenith432@users.sourceforge.net>; Shi,
> Steven <steven.shi@intel.com>
> Subject: [RFC PATCH 00/11] GCC/X64: use hidden visibility for LTO PIE code
>
> The GCC toolchain uses PIE mode when building code for X64, because it
> is the most efficient in size: it uses relative references where
> possible, but still uses 64-bit quantities for absolute symbol
> references, which is optimal for executables that need to be converted
> to PE/COFF using GenFw.
>
> Enabling PIE mode has a couple of side effects though, primarily caused
> by the fact that the primary application area of GCC is to build programs
> for userland.