Message ID | 20240819145417.23367-1-piliu@redhat.com |
---|---|
Series | UEFI emulator for kexec |
On Tue, Aug 20, 2024 at 2:00 AM Jarkko Sakkinen <jarkko@kernel.org> wrote:
>
> On Mon Aug 19, 2024 at 5:53 PM EEST, Pingfan Liu wrote:
> > efi_random_alloc() demands EFI_ALLOCATE_ADDRESS when allocate_pages(),
> > but the current implement can not ensure the selected target locates
> > inside free area, that is to exclude EFI_BOOT_SERVICES_*,
> > EFI_RUNTIME_SERVICES_* etc.
> >
> > Fix the issue by checking md->type.
>
> If it is a fix shouldn't this have a fixes tag?
>

Yes, I will supplement the following in the next version
Fixes: 2ddbfc81eac8 ("efi: stub: add implementation of efi_random_alloc()")

> >
> > Signed-off-by: Pingfan Liu <piliu@redhat.com>
> > Cc: Ard Biesheuvel <ardb@kernel.org>
> > To: linux-efi@vger.kernel.org
> > ---
> >  drivers/firmware/efi/libstub/randomalloc.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/firmware/efi/libstub/randomalloc.c b/drivers/firmware/efi/libstub/randomalloc.c
> > index c41e7b2091cdd..7304e767688f2 100644
> > --- a/drivers/firmware/efi/libstub/randomalloc.c
> > +++ b/drivers/firmware/efi/libstub/randomalloc.c
> > @@ -79,6 +79,8 @@ efi_status_t efi_random_alloc(unsigned long size,
> >  		efi_memory_desc_t *md = (void *)map->map + map_offset;
> >  		unsigned long slots;
> >
>
> I'd add this inline comment:
>
> /* Skip "unconventional" memory: */
>

Adopt. Thanks for your kind review.

Best Regards,

Pingfan

> > +		if (!(md->type & (EFI_CONVENTIONAL_MEMORY || EFI_PERSISTENT_MEMORY)))
> > +			continue;
> >  		slots = get_entry_num_slots(md, size, ilog2(align), alloc_min,
> >  					    alloc_max);
> >  		MD_NUM_SLOTS(md) = slots;
> > @@ -111,6 +113,9 @@ efi_status_t efi_random_alloc(unsigned long size,
> >  		efi_physical_addr_t target;
> >  		unsigned long pages;
> >
> > +		if (!(md->type & (EFI_CONVENTIONAL_MEMORY || EFI_PERSISTENT_MEMORY)))
> > +			continue;
> > +
> >  		if (total_mirrored_slots > 0 &&
> >  		    !(md->attribute & EFI_MEMORY_MORE_RELIABLE))
> >  			continue;
>
> BR, Jarkko
>
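A note on the `md->type` test in the quoted hunk: in the UEFI specification the memory descriptor type is an enumerated value rather than a bit mask, so an expression built from `&` and `||` does not select conventional and persistent memory. The sketch below is a standalone illustration, not the kernel code. The constants follow the UEFI memory type numbering, and the two helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A few memory types, numbered as in the UEFI specification. */
enum {
	EFI_LOADER_CODE         = 1,
	EFI_BOOT_SERVICES_CODE  = 3,
	EFI_BOOT_SERVICES_DATA  = 4,
	EFI_CONVENTIONAL_MEMORY = 7,
	EFI_PERSISTENT_MEMORY   = 14,
};

/*
 * The expression as posted (hypothetical helper name): (7 || 14) is a
 * logical OR and evaluates to 1, so this only tests bit 0 of the type.
 */
bool is_free_type_as_posted(uint32_t type)
{
	return (type & (EFI_CONVENTIONAL_MEMORY || EFI_PERSISTENT_MEMORY)) != 0;
}

/* Comparing against the enumerated values (hypothetical helper name). */
bool is_free_type_by_value(uint32_t type)
{
	return type == EFI_CONVENTIONAL_MEMORY ||
	       type == EFI_PERSISTENT_MEMORY;
}
```

With the posted form, every odd-numbered type passes (including `EFI_BOOT_SERVICES_CODE`), while `EFI_PERSISTENT_MEMORY` is rejected. Comparing by value gives the behaviour the commit message describes.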
On Mo, 19.08.24 22:53, Pingfan Liu (piliu@redhat.com) wrote:

> *** Background ***
>
> As more PE format kernel images are introduced, it post challenge to kexec to
> cope with the new format.
>
> In my attempt to add support for arm64 zboot image in the kernel [1],
> Ard suggested using an emulator to tackle this issue. Last year, when
> Jan tried to introduce UKI support in the kernel [2], Ard mentioned the
> emulator approach again [3]

Hmm, systemd's systemd-stub code tries to load certain "side-car"
files placed next to the UKI, via the UEFI file system APIs. What's
your intention with the UEFI emulator regarding that? The sidecars are
somewhat important, because that's how we parameterize otherwise
strictly sealed, immutable UKIs.

Hence, what's the story there? implement some form of fs driver (for
what fs precisely?) in the emulator too?

And regarding tpm? tpms require drivers and i guess at the moment uefi
emulator would run those aren't available anymore? but we really
should do a separator measurement then. (also there needs to be some
way to pass over measurement log of that measurement?)

Lennart

--
Lennart Poettering, Berlin
On Wed, Aug 21, 2024 at 10:27 PM Lennart Poettering <mzxreary@0pointer.de> wrote:
>
> On Mo, 19.08.24 22:53, Pingfan Liu (piliu@redhat.com) wrote:
>
> > *** Background ***
> >
> > As more PE format kernel images are introduced, it post challenge to kexec to
> > cope with the new format.
> >
> > In my attempt to add support for arm64 zboot image in the kernel [1],
> > Ard suggested using an emulator to tackle this issue. Last year, when
> > Jan tried to introduce UKI support in the kernel [2], Ard mentioned the
> > emulator approach again [3]
>
> Hmm, systemd's systemd-stub code tries to load certain "side-car"
> files placed next to the UKI, via the UEFI file system APIs. What's
> your intention with the UEFI emulator regarding that? The sidecars are
> somewhat important, because that's how we parameterize otherwise
> strictly sealed, immutable UKIs.
>

IIUC, you are referring to UKI addons.

> Hence, what's the story there? implement some form of fs driver (for
> what fs precisely?) in the emulator too?
>

As for addon, that is a missing part in this series. I have overlooked
this issue. Originally, I thought that there was no need to implement
a disk driver and vfat file system, just preload them into memory, and
finally present them through the uefi API. I will take a closer look
at it and chew on it.

> And regarding tpm? tpms require drivers and i guess at the moment uefi
> emulator would run those aren't available anymore? but we really
> should do a separator measurement then. (also there needs to be some
> way to pass over measurement log of that measurement?)
>

It is a pity that it is a common issue persistent with kexec-reboot
kernel nowadays.
I am not familiar with TPM and have no clear idea for the time being.
(emulating Platform Configuration Registers ?). But since this
emulator is held inside a linux kernel image, and the UKI's signature
is checked during kexec_file_load. All of them are safe from
modification, this security is not an urgent issue.

Thanks for sharing your thoughts and insights.

Best Regards,

Pingfan

> Lennart
>
> --
> Lennart Poettering, Berlin
>
On Thu, 22 Aug 2024 at 13:42, Pingfan Liu <piliu@redhat.com> wrote: > > On Wed, Aug 21, 2024 at 10:27 PM Lennart Poettering > <mzxreary@0pointer.de> wrote: > > > > On Mo, 19.08.24 22:53, Pingfan Liu (piliu@redhat.com) wrote: > > > > > *** Background *** > > > > > > As more PE format kernel images are introduced, it post challenge to kexec to > > > cope with the new format. > > > > > > In my attempt to add support for arm64 zboot image in the kernel [1], > > > Ard suggested using an emulator to tackle this issue. Last year, when > > > Jan tried to introduce UKI support in the kernel [2], Ard mentioned the > > > emulator approach again [3] > > > > Hmm, systemd's systemd-stub code tries to load certain "side-car" > > files placed next to the UKI, via the UEFI file system APIs. What's > > your intention with the UEFI emulator regarding that? The sidecars are > > somewhat important, because that's how we parameterize otherwise > > strictly sealed, immutable UKIs. > > > IIUC, you are referring to UKI addons. > > > Hence, what's the story there? implement some form of fs driver (for > > what fs precisely?) in the emulator too? > > > As for addon, that is a missing part in this series. I have overlooked > this issue. Originally, I thought that there was no need to implement > a disk driver and vfat file system, just preload them into memory, and > finally present them through the uefi API. I will take a closer look > at it and chew on it. > Hi Pingfan, If more and more stuff needs coming in, not only the limited boot services then it will be way too complicated and hard to maintain and debug, also the two kexec code paths are duplicated somehow. It is really bad.. I forgot why we can not just extract the kernel from UKI and then load it directly, if the embedded kernel is also signed it should be good? Thanks Dave
On Do, 22.08.24 13:42, Pingfan Liu (piliu@redhat.com) wrote: > On Wed, Aug 21, 2024 at 10:27 PM Lennart Poettering > <mzxreary@0pointer.de> wrote: > > > > On Mo, 19.08.24 22:53, Pingfan Liu (piliu@redhat.com) wrote: > > > > > *** Background *** > > > > > > As more PE format kernel images are introduced, it post challenge to kexec to > > > cope with the new format. > > > > > > In my attempt to add support for arm64 zboot image in the kernel [1], > > > Ard suggested using an emulator to tackle this issue. Last year, when > > > Jan tried to introduce UKI support in the kernel [2], Ard mentioned the > > > emulator approach again [3] > > > > Hmm, systemd's systemd-stub code tries to load certain "side-car" > > files placed next to the UKI, via the UEFI file system APIs. What's > > your intention with the UEFI emulator regarding that? The sidecars are > > somewhat important, because that's how we parameterize otherwise > > strictly sealed, immutable UKIs. > > > IIUC, you are referring to UKI addons. Yeah, UKI addons, as well as credential files, and sysext/confext DDIs. The addons are the most interesting btw, because we load them into memory as PE files, and ask the UEFI to authenticate them. > > Hence, what's the story there? implement some form of fs driver (for > > what fs precisely?) in the emulator too? > > > As for addon, that is a missing part in this series. I have overlooked > this issue. Originally, I thought that there was no need to implement > a disk driver and vfat file system, just preload them into memory, and > finally present them through the uefi API. I will take a closer look > at it and chew on it. It doesn't have to be VFAT btw. It just has to be something. For example, it might suffice to take these files, pack them up as cpio or so and pass them along with the UEFI execution. The UEFI emulator would then have to expose them as a file system then. 

We are not talking of a bazillion of files here, it's mostly a
smallish number of sidecar files I'd expect.

> > And regarding tpm? tpms require drivers and i guess at the moment uefi
> > emulator would run those aren't available anymore? but we really
> > should do a separator measurement then. (also there needs to be some
> > way to pass over measurement log of that measurement?)
>
> It is a pity that it is a common issue persistent with kexec-reboot
> kernel nowadays.
> I am not familiar with TPM and have no clear idea for the time being.
> (emulating Platform Configuration Registers ?). But since this
> emulator is held inside a linux kernel image, and the UKI's signature
> is checked during kexec_file_load. All of them are safe from
> modification, this security is not an urgent issue.

Hmm, I'd really think about this with some priority. The measurement
stuff should not be an afterthought, it typically has major
implications on how you design your transitions, because measurements
of some component always need to happen *before* you pass control to
it, otherwise they are pointless.

Lennart

--
Lennart Poettering, Berlin
On Thu, Aug 22, 2024 at 4:23 PM Lennart Poettering <mzxreary@0pointer.de> wrote: > > On Do, 22.08.24 13:42, Pingfan Liu (piliu@redhat.com) wrote: > > > On Wed, Aug 21, 2024 at 10:27 PM Lennart Poettering > > <mzxreary@0pointer.de> wrote: > > > > > > On Mo, 19.08.24 22:53, Pingfan Liu (piliu@redhat.com) wrote: > > > > > > > *** Background *** > > > > > > > > As more PE format kernel images are introduced, it post challenge to kexec to > > > > cope with the new format. > > > > > > > > In my attempt to add support for arm64 zboot image in the kernel [1], > > > > Ard suggested using an emulator to tackle this issue. Last year, when > > > > Jan tried to introduce UKI support in the kernel [2], Ard mentioned the > > > > emulator approach again [3] > > > > > > Hmm, systemd's systemd-stub code tries to load certain "side-car" > > > files placed next to the UKI, via the UEFI file system APIs. What's > > > your intention with the UEFI emulator regarding that? The sidecars are > > > somewhat important, because that's how we parameterize otherwise > > > strictly sealed, immutable UKIs. > > > > > IIUC, you are referring to UKI addons. > > Yeah, UKI addons, as well as credential files, and sysext/confext > DDIs. > > The addons are the most interesting btw, because we load them into > memory as PE files, and ask the UEFI to authenticate them. > > > > Hence, what's the story there? implement some form of fs driver (for > > > what fs precisely?) in the emulator too? > > > > > As for addon, that is a missing part in this series. I have overlooked > > this issue. Originally, I thought that there was no need to implement > > a disk driver and vfat file system, just preload them into memory, and > > finally present them through the uefi API. I will take a closer look > > at it and chew on it. > > It doesn't have to be VFAT btw. It just has to be something. For > example, it might suffice to take these files, pack them up as cpio or > so and pass them along with the UEFI execution. 
The UEFI emulator
> would then have to expose them as a file system then.
>
> We are not talking of a bazillion of files here, it's mostly a
> smallish number of sidecar files I'd expect.
>

Yes, I think about using <key, value>, where key is the file path,
value is the file content.

> > > And regarding tpm? tpms require drivers and i guess at the moment uefi
> > > emulator would run those aren't available anymore? but we really
> > > should do a separator measurement then. (also there needs to be some
> > > way to pass over measurement log of that measurement?)
> >
> > It is a pity that it is a common issue persistent with kexec-reboot
> > kernel nowadays.
> > I am not familiar with TPM and have no clear idea for the time being.
> > (emulating Platform Configuration Registers ?). But since this
> > emulator is held inside a linux kernel image, and the UKI's signature
> > is checked during kexec_file_load. All of them are safe from
> > modification, this security is not an urgent issue.
>
> Hmm, I'd really think about this with some priority. The measurement
> stuff should not be an afterthought, it typically has major
> implications on how you design your transitions, because measurements
> of some component always need to happen *before* you pass control to
> it, otherwise they are pointless.
>

OK, I will look into the details of TPM to see how to bail out.

Thanks,

Pingfan
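The `<key, value>` idea mentioned in this reply can be pictured as a flat table that maps a file path to a preloaded buffer, which an emulated file protocol could consult instead of a real disk and filesystem driver. The following is only a sketch under assumptions: `struct sidecar_entry` and `sidecar_lookup()` are hypothetical names that do not come from the series, and paths are assumed to be already normalized to plain ASCII strings.

```c
#include <stddef.h>
#include <string.h>

/* One preloaded sidecar file: the key is the path, the value is the content. */
struct sidecar_entry {
	const char *path;
	const void *data;
	size_t size;
};

/*
 * Look up a path in the table (hypothetical helper). A linear search is
 * adequate for the small number of sidecar files expected in the thread.
 * Returns NULL when the path is not present.
 */
const struct sidecar_entry *sidecar_lookup(const struct sidecar_entry *tbl,
					   size_t count, const char *path)
{
	for (size_t i = 0; i < count; i++) {
		if (strcmp(tbl[i].path, path) == 0)
			return &tbl[i];
	}
	return NULL;
}
```

A file-open request from the loaded image would then reduce to one `sidecar_lookup()` call, with a NULL result reported as "not found". How the table is filled and handed over from the first kernel is left open here, as it is in the thread.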
On Thu, Aug 22, 2024 at 2:17 PM Dave Young <dyoung@redhat.com> wrote: > > On Thu, 22 Aug 2024 at 13:42, Pingfan Liu <piliu@redhat.com> wrote: > > > > On Wed, Aug 21, 2024 at 10:27 PM Lennart Poettering > > <mzxreary@0pointer.de> wrote: > > > > > > On Mo, 19.08.24 22:53, Pingfan Liu (piliu@redhat.com) wrote: > > > > > > > *** Background *** > > > > > > > > As more PE format kernel images are introduced, it post challenge to kexec to > > > > cope with the new format. > > > > > > > > In my attempt to add support for arm64 zboot image in the kernel [1], > > > > Ard suggested using an emulator to tackle this issue. Last year, when > > > > Jan tried to introduce UKI support in the kernel [2], Ard mentioned the > > > > emulator approach again [3] > > > > > > Hmm, systemd's systemd-stub code tries to load certain "side-car" > > > files placed next to the UKI, via the UEFI file system APIs. What's > > > your intention with the UEFI emulator regarding that? The sidecars are > > > somewhat important, because that's how we parameterize otherwise > > > strictly sealed, immutable UKIs. > > > > > IIUC, you are referring to UKI addons. > > > > > Hence, what's the story there? implement some form of fs driver (for > > > what fs precisely?) in the emulator too? > > > > > As for addon, that is a missing part in this series. I have overlooked > > this issue. Originally, I thought that there was no need to implement > > a disk driver and vfat file system, just preload them into memory, and > > finally present them through the uefi API. I will take a closer look > > at it and chew on it. > > > > Hi Pingfan, > > If more and more stuff needs coming in, not only the limited boot > services then it will be way too complicated and hard to maintain and > debug, also the two kexec code paths are duplicated somehow. It is > really bad.. > OK, I will try to keep things easier. And what do you mean about " two kexec code paths"? 
> I forgot why we can not just extract the kernel from UKI and then load
> it directly, if the embedded kernel is also signed it should be good?
>

I think the main concern is about the signature.

Thanks,

Pingfan
Hi Dave,

> I forgot why we can not just extract the kernel from UKI and then load
> it directly, if the embedded kernel is also signed it should be good?

The problem is that in the basic usecase for UKI you only sign the entire
UKI PE file and not the included kernel, because you only want that kernel
to be run with that one initrd and that one kernel cmdline.

So at a minimum you have to have the signature on the whole UKI checked by
the kernel and then have the kernel extract UKI into its parts unless you
somehow want to extend trust into userspace to have a helper program do that.

That's what my UKI support implementation from last year did.

v1: https://lore.kernel.org/lkml/20230909161851.223627-1-kernel@jfarr.cc/
v2: https://lore.kernel.org/lkml/20230911052535.335770-1-kernel@jfarr.cc/
v3-wip: https://github.com/Cydox/linux/blob/2908db6d8556fa617298cfb713355edaa9e4b095/arch/x86/kernel/kexec-uki.c

It however also lacks support for the "side-car" files. One option to add
them would be to load them using subsequent calls to kexec_file_load with
a special flag maybe.

TPM measurements are also not done although they are way easier to
implement with this approach as we still have the rest of the kernel around.

However TPM measurements in this case would be implemented by the kexec
loader in the kernel not by the UKI deciding what to measure. So we would
have to have a very firm agreement on what to measure.

Going the UEFI emulator route gives the UKI format (and other (future)
formats) way more flexibility. The cost is potentially implementing a
large portion of the UEFI spec, especially if the goal is to support future
unknown formats which IIRC was one of the reasons this approach was suggested.

Kind regards,
Jan
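For readers unfamiliar with what "extract UKI into its parts" involves: a UKI is a PE file whose payloads live in named sections such as `.linux`, `.initrd` and `.cmdline`, so extraction comes down to walking the PE section table. The sketch below illustrates that walk on an in-memory image. It is not taken from the kexec-uki.c implementation linked above, and `struct pe_section` and `pe_find_section()` are hypothetical names. It also assumes the signature over the whole file has already been verified.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct pe_section {
	uint32_t offset;	/* PointerToRawData: file offset of the data */
	uint32_t size;		/* SizeOfRawData: may include alignment padding */
};

/* PE headers are little-endian; read them byte by byte. */
static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Find a section by name (at most 8 characters). Returns 0 on success. */
int pe_find_section(const uint8_t *img, size_t len, const char *name,
		    struct pe_section *out)
{
	char padded[8] = { 0 };		/* section names are NUL-padded */
	size_t nlen = strlen(name);

	if (nlen > sizeof(padded))
		return -1;
	memcpy(padded, name, nlen);

	if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
		return -1;
	uint32_t pe = rd32(img + 0x3c);	/* e_lfanew: offset of "PE\0\0" */
	if (pe > len || len - pe < 24 || memcmp(img + pe, "PE\0\0", 4) != 0)
		return -1;

	uint16_t nsec = rd16(img + pe + 6);	/* NumberOfSections */
	uint16_t optsz = rd16(img + pe + 20);	/* SizeOfOptionalHeader */
	size_t tbl = (size_t)pe + 24 + optsz;	/* start of section table */

	for (uint16_t i = 0; i < nsec; i++) {
		size_t hdr = tbl + (size_t)i * 40;	/* 40-byte headers */

		if (hdr > len || len - hdr < 40)
			return -1;
		if (memcmp(img + hdr, padded, 8) != 0)
			continue;
		out->size = rd32(img + hdr + 16);
		out->offset = rd32(img + hdr + 20);
		if (out->offset > len || len - out->offset < out->size)
			return -1;
		return 0;
	}
	return -1;
}
```

A loader along these lines would call `pe_find_section()` once per payload it needs and hand each range on to the regular kexec segment setup.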
On 22 18:45:38, Pingfan Liu wrote: > On Thu, Aug 22, 2024 at 4:23 PM Lennart Poettering <mzxreary@0pointer.de> wrote: > > > > On Do, 22.08.24 13:42, Pingfan Liu (piliu@redhat.com) wrote: > > > > > On Wed, Aug 21, 2024 at 10:27 PM Lennart Poettering > > > <mzxreary@0pointer.de> wrote: > > > > > > > > On Mo, 19.08.24 22:53, Pingfan Liu (piliu@redhat.com) wrote: > > > > > > > > > *** Background *** > > > > > > > > > > As more PE format kernel images are introduced, it post challenge to kexec to > > > > > cope with the new format. > > > > > > > > > > In my attempt to add support for arm64 zboot image in the kernel [1], > > > > > Ard suggested using an emulator to tackle this issue. Last year, when > > > > > Jan tried to introduce UKI support in the kernel [2], Ard mentioned the > > > > > emulator approach again [3] > > > > > > > > Hmm, systemd's systemd-stub code tries to load certain "side-car" > > > > files placed next to the UKI, via the UEFI file system APIs. What's > > > > your intention with the UEFI emulator regarding that? The sidecars are > > > > somewhat important, because that's how we parameterize otherwise > > > > strictly sealed, immutable UKIs. > > > > > > > IIUC, you are referring to UKI addons. > > > > Yeah, UKI addons, as well as credential files, and sysext/confext > > DDIs. > > > > The addons are the most interesting btw, because we load them into > > memory as PE files, and ask the UEFI to authenticate them. > > > > > > Hence, what's the story there? implement some form of fs driver (for > > > > what fs precisely?) in the emulator too? > > > > > > > As for addon, that is a missing part in this series. I have overlooked > > > this issue. Originally, I thought that there was no need to implement > > > a disk driver and vfat file system, just preload them into memory, and > > > finally present them through the uefi API. I will take a closer look > > > at it and chew on it. > > > > It doesn't have to be VFAT btw. It just has to be something. 
For > > example, it might suffice to take these files, pack them up as cpio or > > so and pass them along with the UEFI execution. The UEFI emulator > > would then have to expose them as a file system then. > > > > We are not talking of a bazillion of files here, it's mostly a > > smallish number of sidecar files I'd expect. > > > Yes, I think about using <key, value>, where key is the file path, > value is the file content. > > > > > And regarding tpm? tpms require drivers and i guess at the moment uefi > > > > emulator would run those aren't available anymore? but we really > > > > should do a separator measurement then. (also there needs to be some > > > > way to pass over measurement log of that measurement?) > > > > > > It is a pity that it is a common issue persistent with kexec-reboot > > > kernel nowadays. > > > I am not familiar with TPM and have no clear idea for the time being. > > > (emulating Platform Configuration Registers ?). But since this > > > emulator is held inside a linux kernel image, and the UKI's signature > > > is checked during kexec_file_load. All of them are safe from > > > modification, this security is not an urgent issue. > > > > Hmm, I'd really think about this with some priority. The measurement > > stuff should not be an afterthought, it typically has major > > implications on how you design your transitions, because measurements > > of some component always need to happen *before* you pass control to > > it, otherwise they are pointless. > > > > OK, I will look into the details of TPM to see how to bail out. This issue is why I thought a different approach to the UEFI emulator might be useful: (1) On "kexec -l" execute the EFI binary inside the kernel as a kthread until it exits boot services and record all TPM measurements into a buffer (2) On "kexec -e" use the kernels tpm driver to actually perform all the prerecorded measurements. 
(3) Transition into a "purgatory" that will clean up the address space
to make sure we get to an identity mapping.
(4) Return control to the EFI app at the point it exited boot services

Additional advantage is that we have filesystem access during (1) so
it's simple to load additional sidecar files for the UKI.

I have two questions:

1: Does the systemd-stub only perform measurements before exiting boot
services or also afterwards?

2: Is it okay to just go to an identity mapping when boot services are
exited or is the identity mapping actually required for the entire
execution of the EFI app (I know the UEFI spec calls for this, but I think it
should be possible to clean up the address space in a purgatory)?

I played around with this approach last year and got the start of the
kernel's EFI stub executing in a kthread and calling into provided boot
services, but the difficult part is memory allocation and cleaning up
the address space in a purgatory.

>
> Thanks,
>
> Pingfan
>

Kind regards,
Jan
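Steps (1) and (2) of this proposal amount to a record-and-replay log: while the EFI binary runs, each requested measurement is appended to a buffer, and at "kexec -e" time the entries are handed, in order, to whatever actually extends the PCRs. Below is a minimal sketch of that idea, assuming a fixed-size log and a caller-supplied extend callback. All names and sizes (`struct meas_log`, `meas_record()`, `meas_replay()`, `meas_trace_extend()`, the digest length) are hypothetical, and no real TPM driver API is used.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEAS_DIGEST_LEN  32	/* assumed: SHA-256 digests only */
#define MEAS_MAX_ENTRIES 64	/* assumed upper bound for this sketch */

struct meas_entry {
	uint32_t pcr;
	uint8_t digest[MEAS_DIGEST_LEN];
};

struct meas_log {
	size_t count;
	struct meas_entry entries[MEAS_MAX_ENTRIES];
};

/* Step (1): remember the request while the EFI binary runs; no hardware access. */
int meas_record(struct meas_log *log, uint32_t pcr, const uint8_t *digest)
{
	if (log->count >= MEAS_MAX_ENTRIES)
		return -1;
	log->entries[log->count].pcr = pcr;
	memcpy(log->entries[log->count].digest, digest, MEAS_DIGEST_LEN);
	log->count++;
	return 0;
}

typedef int (*meas_extend_fn)(uint32_t pcr, const uint8_t *digest, void *ctx);

/* Step (2): replay in recorded order, stopping at the first failure. */
int meas_replay(const struct meas_log *log, meas_extend_fn extend, void *ctx)
{
	for (size_t i = 0; i < log->count; i++) {
		int ret = extend(log->entries[i].pcr,
				 log->entries[i].digest, ctx);
		if (ret)
			return ret;
	}
	return 0;
}

/*
 * Stand-in callback for demonstration only: folds the PCR indices into a
 * counter so the replay order can be observed without a TPM.
 */
int meas_trace_extend(uint32_t pcr, const uint8_t *digest, void *ctx)
{
	uint32_t *trace = ctx;

	(void)digest;
	*trace = *trace * 16 + pcr;
	return 0;
}
```

Because nothing touches the TPM until replay, all measurements still land before control passes to the next kernel, which is the ordering requirement raised earlier in the thread. Handing the resulting measurement log over to the next kernel is not covered by this sketch.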
On Do, 22.08.24 13:42, Jan Hendrik Farr (kernel@jfarr.cc) wrote:

> I have two questions:
>
> 1: Does the systemd-stub only perform measurements before exiting boot
> services or also afterwards?

Nope. we pass control to the kernel's own stub, and that calls EBS(),
not systemd-stub. Hence, no, we are just measuring things before
EBS(), not after.

Lennart

--
Lennart Poettering, Berlin
On Thu, 22 Aug 2024 at 18:51, Pingfan Liu <piliu@redhat.com> wrote: > > On Thu, Aug 22, 2024 at 2:17 PM Dave Young <dyoung@redhat.com> wrote: > > > > On Thu, 22 Aug 2024 at 13:42, Pingfan Liu <piliu@redhat.com> wrote: > > > > > > On Wed, Aug 21, 2024 at 10:27 PM Lennart Poettering > > > <mzxreary@0pointer.de> wrote: > > > > > > > > On Mo, 19.08.24 22:53, Pingfan Liu (piliu@redhat.com) wrote: > > > > > > > > > *** Background *** > > > > > > > > > > As more PE format kernel images are introduced, it post challenge to kexec to > > > > > cope with the new format. > > > > > > > > > > In my attempt to add support for arm64 zboot image in the kernel [1], > > > > > Ard suggested using an emulator to tackle this issue. Last year, when > > > > > Jan tried to introduce UKI support in the kernel [2], Ard mentioned the > > > > > emulator approach again [3] > > > > > > > > Hmm, systemd's systemd-stub code tries to load certain "side-car" > > > > files placed next to the UKI, via the UEFI file system APIs. What's > > > > your intention with the UEFI emulator regarding that? The sidecars are > > > > somewhat important, because that's how we parameterize otherwise > > > > strictly sealed, immutable UKIs. > > > > > > > IIUC, you are referring to UKI addons. > > > > > > > Hence, what's the story there? implement some form of fs driver (for > > > > what fs precisely?) in the emulator too? > > > > > > > As for addon, that is a missing part in this series. I have overlooked > > > this issue. Originally, I thought that there was no need to implement > > > a disk driver and vfat file system, just preload them into memory, and > > > finally present them through the uefi API. I will take a closer look > > > at it and chew on it. > > > > > > > Hi Pingfan, > > > > If more and more stuff needs coming in, not only the limited boot > > services then it will be way too complicated and hard to maintain and > > debug, also the two kexec code paths are duplicated somehow. It is > > really bad.. 
> >
> OK, I will try to keep things easier. And what do you mean about " two
> kexec code paths"?

I mean we have the EFI and non-EFI kexec implementation. Also for the
EFI kexec code for X86 there is something we passed from 1st kernel to
2nd kernel due to no EFI firmware phase, anyway this part can be
cleaned up if the emulator can be done gracefully.

>
> > I forgot why we can not just extract the kernel from UKI and then load
> > it directly, if the embedded kernel is also signed it should be good?
> >
>
> I think the main concern is about the signature.

I thought for the minimum case of kdump, we may just live with kernel
signed only and leave the initrd/cmdline unsigned. Anyway for kexec
reboot it is a problem.

>
> Thanks,
>
> Pingfan
>
On Thu, 22 Aug 2024 at 18:56, Jan Hendrik Farr <kernel@jfarr.cc> wrote:
>
> Hi Dave,
>
> > I forgot why we can not just extract the kernel from UKI and then load
> > it directly, if the embedded kernel is also signed it should be good?
>
> The problem is that in the basic usecase for UKI you only sign the entire
> UKI PE file and not the included kernel, because you only want that kernel
> to be run with that one initrd and that one kernel cmdline.

Hmm, as replied to Pingfan I thought that both the included kernel and
UKI can be signed, and for kdump case kexec_file_load can be used
simply.

>
> So at a minimum you have to have the signature on the whole UKI checked by
> the kernel and than have the kernel extract UKI into its parts unless you
> somehow want to extent trust into userspace to have a helper program do that.

Extending trust into userspace is hard, previously when Vivek created the
kexec_file_load this has been explored and he gave up this option. :(

Pingfan, nice to see you have something done as POC at least, and good
to see this topic is live. I just have some worries about the
complexity of the emulator though.

Thanks
Dave
On Thu, Aug 22, 2024 at 4:23 PM Lennart Poettering <mzxreary@0pointer.de> wrote: > > On Do, 22.08.24 13:42, Pingfan Liu (piliu@redhat.com) wrote: > > > On Wed, Aug 21, 2024 at 10:27 PM Lennart Poettering > > <mzxreary@0pointer.de> wrote: > > > > > > On Mo, 19.08.24 22:53, Pingfan Liu (piliu@redhat.com) wrote: > > > > > > > *** Background *** > > > > > > > > As more PE format kernel images are introduced, it post challenge to kexec to > > > > cope with the new format. > > > > > > > > In my attempt to add support for arm64 zboot image in the kernel [1], > > > > Ard suggested using an emulator to tackle this issue. Last year, when > > > > Jan tried to introduce UKI support in the kernel [2], Ard mentioned the > > > > emulator approach again [3] > > > > > > Hmm, systemd's systemd-stub code tries to load certain "side-car" > > > files placed next to the UKI, via the UEFI file system APIs. What's > > > your intention with the UEFI emulator regarding that? The sidecars are > > > somewhat important, because that's how we parameterize otherwise > > > strictly sealed, immutable UKIs. > > > > > IIUC, you are referring to UKI addons. > > Yeah, UKI addons, as well as credential files, and sysext/confext > DDIs. > > The addons are the most interesting btw, because we load them into > memory as PE files, and ask the UEFI to authenticate them. > > > > Hence, what's the story there? implement some form of fs driver (for > > > what fs precisely?) in the emulator too? > > > > > As for addon, that is a missing part in this series. I have overlooked > > this issue. Originally, I thought that there was no need to implement > > a disk driver and vfat file system, just preload them into memory, and > > finally present them through the uefi API. I will take a closer look > > at it and chew on it. > > It doesn't have to be VFAT btw. It just has to be something. For > example, it might suffice to take these files, pack them up as cpio or > so and pass them along with the UEFI execution. 
The UEFI emulator
> would then have to expose them as a file system then.
>
> We are not talking of a bazillion of files here, it's mostly a
> smallish number of sidecar files I'd expect.
>
> > > And regarding tpm? tpms require drivers and i guess at the moment uefi
> > > emulator would run those aren't available anymore? but we really
> > > should do a separator measurement then. (also there needs to be some
> > > way to pass over measurement log of that measurement?)
> >
> > It is a pity that it is a common issue persistent with kexec-reboot
> > kernel nowadays.
> > I am not familiar with TPM and have no clear idea for the time being.
> > (emulating Platform Configuration Registers ?). But since this
> > emulator is held inside a linux kernel image, and the UKI's signature
> > is checked during kexec_file_load. All of them are safe from
> > modification, this security is not an urgent issue.
>
> Hmm, I'd really think about this with some priority. The measurement
> stuff should not be an afterthought, it typically has major
> implications on how you design your transitions, because measurements
> of some component always need to happen *before* you pass control to
> it, otherwise they are pointless.
>

At present, my emulator returns false to is_efi_secure_boot(), so
systemd-stub does not care about the measurement, and moves on.

Could you enlighten me about how systemd utilizes the measurement? I
grepped 'TPM2_PCR_KERNEL_CONFIG', and saw the systemd-stub asks to
extend PCR. But where is the value checked? I guess the systemd will
hang if the check fails.

Thanks,

Pingfan
On Do, 22.08.24 22:29, Pingfan Liu (piliu@redhat.com) wrote:

> > Hmm, I'd really think about this with some priority. The measurement
> > stuff should not be an afterthought, it typically has major
> > implications on how you design your transitions, because measurements
> > of some component always need to happen *before* you pass control to
> > it, otherwise they are pointless.
> >
>
> At present, my emulator returns false to is_efi_secure_boot(), so
> systemd-stub does not care about the measurement, and moves on.
>
> Could you enlighten me about how systemd utilizes the measurement? I
> grepped 'TPM2_PCR_KERNEL_CONFIG', and saw the systemd-stub asks to
> extend PCR. But where is the value checked? I guess the systemd will
> hang if the check fails.

systemd's "systemd-pcrlock" tool will look for measurements like that
and generate disk encryption TPM policies from that.

Lennart

--
Lennart Poettering, Berlin
On Mon, 19 Aug 2024 at 16:55, Pingfan Liu <piliu@redhat.com> wrote:
>
> *** Background ***
>
> As more PE format kernel images are introduced, it poses a challenge to kexec to
> cope with the new format.
>
> In my attempt to add support for arm64 zboot image in the kernel [1],
> Ard suggested using an emulator to tackle this issue. Last year, when
> Jan tried to introduce UKI support in the kernel [2], Ard mentioned the
> emulator approach again [3]
>
> After discussion, Ard's approach seems to be a more promising solution
> to handle PE format kernels once and for all. This series follows that
> approach and implements an emulator to emulate EFI boot time services,
> allowing the efistub kernel to self-extract and boot.
>
> Another year has passed, and UKI kernels are used more and more frequently
> in products. I think it is time to put effort into resolving this issue.
>
>
> *** Overview of the implementation ***
> The whole model consists of three parts:
>
> -1. The emulator
> It is self-relocatable PIC code, which is finally linked into the kernel, but does not
> export any internal symbol to the kernel. It mainly contains: a PE file parser,
> which loads the PE format kernel, and a group of functions to emulate EFI boot services.
>
> -2. inside kernel, PE-format loader
> Its main task is to set up two extra kexec_segments, one for the emulator, the other
> for passing information from the first kernel to the emulator.
>
> -3. set up identity mapping only for the memory used by the emulator.
> Here it relies on kimage_alloc_control_pages() to get pages, which will not be
> overwritten during kexec relocation (the copy from src to dst). And since the
> mapping only covers a small range of memory, it costs only a small amount of memory.
>
>
> *** To do ***
>
> Currently, it only works on the arm64 virt machine. For x86, it needs some slight
> changes. (I plan to do it in the next version)
>
> Also, this series does not implement a memory allocator, which I plan to
> implement with the help of a bitmap.
>
> About the console, currently it is hard-coded for the arm64 virt machine; later it should
> extract the information from the ACPI table.
>
> The kdump code is not implemented yet. But it should share the majority of
> this series.
>
>
> *** Test of this series ***
> I have tested this series on the arm64 virt machine. There I booted vmlinuz.efi,
> loaded a UKI image with kexec_file_load, then switched to the second kernel.
>
> I used a modified kexec-tools [4], which just skips the check of the file format and passes the file directly to the kernel.
>
> [1]: https://lore.kernel.org/linux-arm-kernel/ZBvKSis+dfnqa+Vz@piliu.users.ipa.redhat.com/T/#m42abb0ad3c10126b8b3bfae8a596deb707d6f76e
> [2]: https://lore.kernel.org/lkml/20230918173607.421d2616@rotkaeppchen/T/
> [3]: https://lore.kernel.org/lkml/20230918173607.421d2616@rotkaeppchen/T/#mc60aa591cb7616ceb39e1c98f352383f9ba6e985
> [4]: https://github.com/pfliu/kexec-tools.git branch: kexec_uefi_emulator
>
>
> RFCv1 -> RFCv2:
> -1.Support running a UKI kernel by: adding LoadImage() and StartImage(), adding
>    PE file relocation support, adding InstallMultiProtocol()
> -2.Also set up idmap for EFI runtime memory descriptors since UKI's
>    systemd-stub calls runtime services
> -3.Move kexec_pe_image.c from arch/arm64/kernel to kernel/, since it
>    aims to provide more general architecture support.
> > RFCv1: https://lore.kernel.org/linux-efi/20240718085759.13247-1-piliu@redhat.com/ > RFCv2: https://github.com/pfliu/linux.git branch kexec_uefi_emulator_RFCv2 > > Cc: Ard Biesheuvel <ardb@kernel.org> > Cc: Jan Hendrik Farr <kernel@jfarr.cc> > Cc: Philipp Rudo <prudo@redhat.com> > Cc: Lennart Poettering <mzxreary@0pointer.de> > Cc: Jarkko Sakkinen <jarkko@kernel.org> > Cc: Eric Biederman <ebiederm@xmission.com> > Cc: Baoquan He <bhe@redhat.com> > Cc: Dave Young <dyoung@redhat.com> > Cc: Mark Rutland <mark.rutland@arm.com> > Cc: Will Deacon <will@kernel.org> > Cc: Catalin Marinas <catalin.marinas@arm.com> > Cc: kexec@lists.infradead.org > Cc: linux-efi@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > > > > Pingfan Liu (9): > efi/libstub: Ask efi_random_alloc() to skip unusable memory > efi/libstub: Complete efi_simple_text_output_protocol > efi/emulator: Initial rountines to emulate EFI boot time service > efi/emulator: Turn on mmu for arm64 > kexec: Introduce kexec_pe_image to parse and load PE file > arm64: kexec: Introduce a new member param_mem to kimage_arch > arm64: mm: Change to prototype of > arm64: kexec: Prepare page table for emulator > arm64: kexec: Enable kexec_pe_image > Thanks for putting this RFC together. This is useful work, and gives us food for thought and discussion. There are a few problems that become apparent when going through these changes. 1. Implementing UEFI entirely is intractable, and unnecessary. Implementing the subset of UEFI that is actually needed to boot Linux *is* tractable, though, but we need to work together to write this down somewhere. - the EFI stub needs the boot services for the EFI memory map and the allocation routines - GRUB needs block I/O - systemd-stub/UKI needs file I/O to look for sidecars - etc etc I implemented a Rust 'efiloader' crate a while ago that encapsulates most of this (it can boot Linux/arm64 on QEMU and boot x86 via GRUB in user space **). 
Adding file I/O to this should be straight-forward - as Lennart points out, we only need the protocol, it doesn't need to be backed by an actual file system, it just needs to be able to expose other files in the right way. 2. Running the UEFI emulator on bare metal is not going to scale. Cloning UART driver code and MMU code etc is a can of worms that you want to leave closed. And as Lennart points out, there is other hardware (TPM) that needs to be accessible as well. Providing a separate set of drivers for all hardware that the EFI emulator may need to access is not a tractable problem either. The fix for this, as I see it, is to run the EFI emulator in user space, to the point where the payload calls ExitBootServices(). This will allow all I/O and memory protocol to be implemented trivially, using C library routines. I have a crude prototype** of this running to the point where ExitBootServices() is called (and then it crashes). The tricky yet interesting bit here is how we migrate a chunk of user space memory to the bare metal context that will be created by the kexec syscall later (in which the call to ExitBootServices() would return and proceed with the boot). But the principle is rather straight-forward, and would permit us, e.g., to kexec an OS installer too. 3. We need to figure out how to support TPM and PCRs in the context of kexec. This is a fundamental issue with verified boot, given that the kexec PCR state is necessarily different from the boot state, and so we cannot reuse the TPM directly if we want to pretend that we are doing an ordinary boot in kexec. The alternative is to leave the TPM in a state where the kexec kernel can access its sealed secrets, and mock up the TCG2 EFI protocols using a shim that sits between the TPM hardware (as the real TCG2 protocols will be long gone) and the EFI payload. But as I said, this is a fundamental issue, as the ability to pretend that a kexec boot is a pristine boot would mean that verified boot is broken. 
As future work, I'd like to propose to collaborate on some alignment
regarding a UEFI baseline for Linux, i.e., the parts that we actually
need to boot Linux.

For this series in particular, I don't see a way forward where we
adopt this approach, and carry all this code inside the kernel.

Thanks.
Ard.
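The bitmap-backed page allocator that the cover letter lists as future work, and that would sit behind the "allocation routines" Ard names as part of the required UEFI subset, could look roughly like the sketch below. Nothing here is taken from the series: the names `emu_pool`, `emu_alloc_pages()` and `emu_free_pages()`, the 4 KiB page size and the 256-page pool are assumptions made for illustration only. The sketch keeps one bit per page and does a first-fit search for a run of contiguous free pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EMU_PAGE_SHIFT 12          /* assumed 4 KiB pages */
#define EMU_POOL_PAGES 256         /* assumed pool size: 1 MiB */

/* Hypothetical pool descriptor; base must be non-zero because 0 means failure. */
struct emu_pool {
	uint64_t base;                          /* physical base of the pool */
	uint8_t  bitmap[EMU_POOL_PAGES / 8];    /* one bit per page, 1 = in use */
};

static int page_used(const struct emu_pool *p, size_t i)
{
	return (p->bitmap[i / 8] >> (i % 8)) & 1;
}

static void page_mark(struct emu_pool *p, size_t i, int used)
{
	if (used)
		p->bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
	else
		p->bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
}

/* First-fit search for `count` contiguous free pages; returns 0 on failure. */
uint64_t emu_alloc_pages(struct emu_pool *p, size_t count)
{
	size_t run = 0;

	if (count == 0 || count > EMU_POOL_PAGES)
		return 0;

	for (size_t i = 0; i < EMU_POOL_PAGES; i++) {
		run = page_used(p, i) ? 0 : run + 1;
		if (run == count) {
			size_t first = i + 1 - count;

			for (size_t j = first; j <= i; j++)
				page_mark(p, j, 1);
			return p->base + ((uint64_t)first << EMU_PAGE_SHIFT);
		}
	}
	return 0;
}

/* Return `count` pages starting at `addr` to the pool. */
void emu_free_pages(struct emu_pool *p, uint64_t addr, size_t count)
{
	size_t first = (size_t)((addr - p->base) >> EMU_PAGE_SHIFT);

	assert(addr >= p->base && first + count <= EMU_POOL_PAGES);
	for (size_t j = first; j < first + count; j++)
		page_mark(p, j, 0);
}
```

A caller would zero-initialise an `emu_pool`, set `base` to the start of the reserved region, and route an emulated AllocatePages()/FreePages() pair to these two helpers. Alignment constraints and EFI memory-type bookkeeping are deliberately left out.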
On Thu, Aug 29, 2024 at 1:08 AM Ard Biesheuvel <ardb@kernel.org> wrote: > [...] Hi Ard, Thanks for sharing your insight and thoughts. > > Thanks for putting this RFC together. This is useful work, and gives > us food for thought and discussion. > > There are a few problems that become apparent when going through these changes. > > 1. Implementing UEFI entirely is intractable, and unnecessary. > Implementing the subset of UEFI that is actually needed to boot Linux > *is* tractable, though, but we need to work together to write this > down somewhere. > - the EFI stub needs the boot services for the EFI memory map and > the allocation routines > - GRUB needs block I/O > - systemd-stub/UKI needs file I/O to look for sidecars > - etc etc > > I implemented a Rust 'efiloader' crate a while ago that encapsulates > most of this (it can boot Linux/arm64 on QEMU and boot x86 via GRUB in > user space **). Adding file I/O to this should be straight-forward - > as Lennart points out, we only need the protocol, it doesn't need to > be backed by an actual file system, it just needs to be able to expose > other files in the right way. > > 2. Running the UEFI emulator on bare metal is not going to scale. > Cloning UART driver code and MMU code etc is a can of worms that you > want to leave closed. And as Lennart points out, there is other As for MMU code, if the 1st kernel does not turn it off, it can be eliminated from the emulator code, which should not be hard to implement on arm64. And already done in x86. > hardware (TPM) that needs to be accessible as well. Providing a > separate set of drivers for all hardware that the EFI emulator may > need to access is not a tractable problem either. > > The fix for this, as I see it, is to run the EFI emulator in user > space, to the point where the payload calls ExitBootServices(). This > will allow all I/O and memory protocol to be implemented trivially, > using C library routines. 
I have a crude prototype** of this running

Yes, that is definitely a promising method. This way, we can handle
device operations more elegantly. In fact, I used it to develop and
debug part of my emulator service code. But when debugging the x86
efi-stub, I encountered a problem with a privileged instruction, which
causes a segmentation fault. It originates from kaslr_get_random_long().
I think it can be worked around by emulating the instruction if the
instruction only reads the system state. But if the instruction tries to
update the system state, it cannot be fixed, since the system is still
owned by the kernel rather than exclusively by the emulator. So here we
need another agreement on the stub's behavior before ExitBootServices().

> to the point where ExitBootServices() is called (and then it crashes).
> The tricky yet interesting bit here is how we migrate a chunk of user
> space memory to the bare metal context that will be created by the
> kexec syscall later (in which the call to ExitBootServices() would
> return and proceed with the boot). But the principle is rather
> straight-forward, and would permit us, e.g., to kexec an OS installer
> too.
>
> 3. We need to figure out how to support TPM and PCRs in the context of
> kexec. This is a fundamental issue with verified boot, given that the
> kexec PCR state is necessarily different from the boot state, and so
> we cannot reuse the TPM directly if we want to pretend that we are
> doing an ordinary boot in kexec. The alternative is to leave the TPM

Here, I am missing the big picture. Could you enlighten me more about this?

As I understand it, the Linux kernel does not lock itself down to a
specific firmware. So the trust goes in one direction, i.e. from the
bootloader to the kernel. In the UKI case, systemd-stub takes the
measurements and extends PCRs 11/12/13 as described in
https://uapi-group.org/specifications/specs/linux_tpm_pcr_registry/
Later systemd-pcrlock appraises the values in those registers.
If the sections in the UKI are intact, the kexec reboot will go smoothly.

> in a state where the kexec kernel can access its sealed secrets, and
> mock up the TCG2 EFI protocols using a shim that sits between the TPM
> hardware (as the real TCG2 protocols will be long gone) and the EFI
> payload. But as I said, this is a fundamental issue, as the ability to
> pretend that a kexec boot is a pristine boot would mean that verified
> boot is broken.
>
>
> As future work, I'd like to propose to collaborate on some alignment
> regarding a UEFI baseline for Linux, i.e., the parts that we actually
> need to boot Linux.
>

Do you mean both the user space code and the kernel code? For the user
space code, I think it would be better to integrate it into kexec-tools
so that we have a uniform interface for kexec boot.

Looking forward to the collaboration to make kexec able to boot UKI soon.

Thanks,

Pingfan
Hi Ard,
Hi Jan,

On Wed, 28 Aug 2024 19:08:14 +0200
Ard Biesheuvel <ardb@kernel.org> wrote:

[...]

> Thanks for putting this RFC together. This is useful work, and gives
> us food for thought and discussion.
>
> There are a few problems that become apparent when going through these changes.
>
> 1. Implementing UEFI entirely is intractable, and unnecessary.
> Implementing the subset of UEFI that is actually needed to boot Linux
> *is* tractable, though, but we need to work together to write this
> down somewhere.
>   - the EFI stub needs the boot services for the EFI memory map and
>     the allocation routines
>   - GRUB needs block I/O
>   - systemd-stub/UKI needs file I/O to look for sidecars
>   - etc etc
>
> I implemented a Rust 'efiloader' crate a while ago that encapsulates
> most of this (it can boot Linux/arm64 on QEMU and boot x86 via GRUB in
> user space **). Adding file I/O to this should be straight-forward -
> as Lennart points out, we only need the protocol, it doesn't need to
> be backed by an actual file system, it just needs to be able to expose
> other files in the right way.
>
> 2. Running the UEFI emulator on bare metal is not going to scale.
> Cloning UART driver code and MMU code etc is a can of worms that you
> want to leave closed. And as Lennart points out, there is other
> hardware (TPM) that needs to be accessible as well. Providing a
> separate set of drivers for all hardware that the EFI emulator may
> need to access is not a tractable problem either.
>
> The fix for this, as I see it, is to run the EFI emulator in user
> space, to the point where the payload calls ExitBootServices(). This
> will allow all I/O and memory protocol to be implemented trivially,
> using C library routines. I have a crude prototype** of this running
> to the point where ExitBootServices() is called (and then it crashes).
> The tricky yet interesting bit here is how we migrate a chunk of user
> space memory to the bare metal context that will be created by the
> kexec syscall later (in which the call to ExitBootServices() would
> return and proceed with the boot). But the principle is rather
> straight-forward, and would permit us, e.g., to kexec an OS installer
> too.

I mostly agree with what you have written. But I see a big problem in
running the EFI emulator in user space when it comes to secure boot.
The chain of trust ends in the kernel. So it's the kernel that needs to
verify that the image to be loaded can be trusted. But when the EFI
runtime is in user space the kernel simply cannot do that. Which means,
if we want to go this way, we would need to extend the chain of trust
to user space. Which will be a whole bucket of worms, not just a
can.

That's why I tend more towards Jan's suggestion to include the EFI
runtime in the kernel. Alas, that comes with its own problem, as it
requires running code in the kernel that was never intended to run in
kernel context. So even when we can trust the code not to be malicious,
we cannot trust it not to accidentally change the system state in a way
the kernel doesn't expect...

Let me throw another wild idea in the ring. Instead of implementing
an EFI runtime we could also include an eBPF version of the stub in the
images. kexec could then extract the eBPF program and let it run just
like any other eBPF program with all the pros (and cons) that come with
it. That won't be as generic as the EFI runtime, e.g. you couldn't
simply kexec any OS installer. On the other hand it would make it
easier to port UKIs et al. to non-EFI systems. What do you think?

Thanks
Philipp

> 3. We need to figure out how to support TPM and PCRs in the context of
> kexec.
This is a fundamental issue with verified boot, given that the > kexec PCR state is necessarily different from the boot state, and so > we cannot reuse the TPM directly if we want to pretend that we are > doing an ordinary boot in kexec. The alternative is to leave the TPM > in a state where the kexec kernel can access its sealed secrets, and > mock up the TCG2 EFI protocols using a shim that sits between the TPM > hardware (as the real TCG2 protocols will be long gone) and the EFI > payload. But as I said, this is a fundamental issue, as the ability to > pretend that a kexec boot is a pristine boot would mean that verified > boot is broken. > > > As future work, I'd like to propose to collaborate on some alignment > regarding a UEFI baseline for Linux, i.e., the parts that we actually > need to boot Linux. > > For this series in particular, I don't see a way forward where we > adopt this approach, and carry all this code inside the kernel. > > Thanks. > Ard. >
On Fri Sep 6, 2024 at 1:54 PM EEST, Philipp Rudo wrote:
> Let me throw an other wild idea in the ring. Instead of implementing
> a EFI runtime we could also include a eBPF version of the stub into the
> images. kexec could then extract the eBPF program and let it run just
> like any other eBPF program with all the pros (and cons) that come with
> it. That won't be as generic as the EFI runtime, e.g. you couldn't
> simply kexec any OS installer. On the other hand it would make it
> easier to port UKIs et al. to non-EFI systems. What do you think?

BPF would have some favorable guarantees, such as that programs
always terminate, even faulty ones. It always has an implicit
"ExitBootServices".

Just a remark.

BR, Jarkko
On Sat Sep 7, 2024 at 2:27 PM EEST, Jarkko Sakkinen wrote:
> On Fri Sep 6, 2024 at 1:54 PM EEST, Philipp Rudo wrote:
> > Let me throw an other wild idea in the ring. Instead of implementing
> > a EFI runtime we could also include a eBPF version of the stub into the
> > images. kexec could then extract the eBPF program and let it run just
> > like any other eBPF program with all the pros (and cons) that come with
> > it. That won't be as generic as the EFI runtime, e.g. you couldn't
> > simply kexec any OS installer. On the other hand it would make it
> > easier to port UKIs et al. to non-EFI systems. What do you think?
>
> BPF would have some guarantees that are favorable such as programs
> always end, even faulty ones. It always has implicit "ExitBootServices".
>
> Just a remark.

Some days ago I was wondering whether some kernel functionality could
be eBPF, at least in formal theory, because most of it is amortized,
i.e. does a fixed chunk of work. I'm not going down that rabbit hole,
but I really like this idea, and it could be a good experimentation
ground for such innovation.

BR, Jarkko
On Sat Sep 7, 2024 at 2:31 PM EEST, Jarkko Sakkinen wrote:
> On Sat Sep 7, 2024 at 2:27 PM EEST, Jarkko Sakkinen wrote:
> > On Fri Sep 6, 2024 at 1:54 PM EEST, Philipp Rudo wrote:
> > > Let me throw an other wild idea in the ring. Instead of implementing
> > > a EFI runtime we could also include a eBPF version of the stub into the
> > > images. kexec could then extract the eBPF program and let it run just
> > > like any other eBPF program with all the pros (and cons) that come with
> > > it. That won't be as generic as the EFI runtime, e.g. you couldn't
> > > simply kexec any OS installer. On the other hand it would make it
> > > easier to port UKIs et al. to non-EFI systems. What do you think?
> >
> > BPF would have some guarantees that are favorable such as programs
> > always end, even faulty ones. It always has implicit "ExitBootServices".
> >
> > Just a remark.
>
> Some days ago I was thinking could some of the kernel functionality be
> eBPF at least like in formal theory because most of it is amortized,
> i.e. does a fixed chunk of work. Not going into that rabbit hole but
> I really like this idea and could be good experimentation ground for
> such innovation.

E.g. let's imagine there were an eBPF-TPM driver framework.

The way I would go about it would be to take the existing TPM driver
functionality, make extra functions and resources available to a
subsystem-specific BPF environment, and have the orchestration code as
eBPF. I pretty much concluded that there is a chance that this could
work out.

It is not something on my immediate table, but it is still a really
interesting idea: instead of using a language to separate "safe" and
"unsafe" regions, you would use "VM" environments to create the walls.
At the end of the day that would also be a great venture for Rust in
the kernel, i.e. compile that BPF from Rust.

Sorry for going off the hook, the comment triggered me ;-)

BR, Jarkko
On Fr, 06.09.24 12:54, Philipp Rudo (prudo@redhat.com) wrote:

> I mostly agree on what you have wrote. But I see a big problem in
> running the EFI emulator in user space when it comes to secure boot.
> The chain of trust ends in the kernel. So it's the kernel that needs to
> verify that the image to be loaded can be trusted. But when the EFI
> runtime is in user space the kernel simply cannot do that. Which means,
> if we want to go this way, we would need to extend the chain of trust
> to user space. Which will be a whole bucket of worms, not just a
> can.

Maybe it would be nice to have a way to "zap" userspace away, i.e. allow
the kernel to get rid of all processes in some way, reliably. And then
simply start a new userspace, from a trusted definition. Or in other
words: if you don't want to trust the usual userspace, then let's
maybe just terminate it, and create it anew, with a clean, pristine
definition the old userspace cannot get access to.

> Let me throw an other wild idea in the ring. Instead of implementing
> a EFI runtime we could also include a eBPF version of the stub into the
> images. kexec could then extract the eBPF program and let it run just
> like any other eBPF program with all the pros (and cons) that come with
> it. That won't be as generic as the EFI runtime, e.g. you couldn't
> simply kexec any OS installer. On the other hand it would make it
> easier to port UKIs et al. to non-EFI systems. What do you think?

eBPF is not Turing complete, so I am not sure how far you will get
with this. In the various implementations of EFI payloads there are
plenty of loops, sometimes I/O loops, sometimes hash loops over huge
data (for measurements). As I understand it, eBPF is not really
compatible with such code.

Lennart

--
Lennart Poettering, Berlin
On 09 11:48:30, Lennart Poettering wrote:
> On Fr, 06.09.24 12:54, Philipp Rudo (prudo@redhat.com) wrote:
>
> > I mostly agree on what you have wrote. But I see a big problem in
> > running the EFI emulator in user space when it comes to secure boot.
> > The chain of trust ends in the kernel. So it's the kernel that needs to
> > verify that the image to be loaded can be trusted. But when the EFI
> > runtime is in user space the kernel simply cannot do that. Which means,
> > if we want to go this way, we would need to extend the chain of trust
> > to user space. Which will be a whole bucket of worms, not just a
> > can.
>
> May it would be nice to have a way to "zap" userspace away, i.e. allow
> the kernel to get rid of all processes in some way, reliable. And then
> simply start a new userspace, from a trusted definition. Or in other
> words: if you don't want to trust the usual userspace, then let's
> maybe just terminate it, and create it anew, with a clean, pristine
> definition the old userspace cannot get access to.

Well, this is an interesting idea!

However, I'm sceptical whether this could be done in a secure way. How
do we ensure that nothing the old userspace did with the various
interfaces to the kernel has any impact on the new userspace? Maybe
others can chime in on this? Does kernel_lockdown give more guarantees
related to this?

Even if this is possible in a secure way, there is a problem with doing
this for kernels that are to be kexec'd on kernel panic. In this
approach we can't pre-run them until EBS(), so we would rely on the old
kernel to still be intact when we want to kexec reboot.

You could do a system where you kexec into an intermediate kernel. That
kernel gets kexec'd with a signed initrd that can do any kind of
preparation in userspace and then load the target via the normal
kexec_load syscall. Problem: for that intermediate environment we
already need a format that combines kernel image, initrd and cmdline,
all signed in one package, aka UKI. Was it the chicken or the egg?

But this shows that if we implemented UKIs the easy way (the kernel
simply checks the signature, extracts the pieces, and kexecs them like
normal), this approach could always be used to support kexec for other
future formats. They could use the kernel's UKI support to boot into an
intermediate kernel with UEFI implemented in userspace in the initrd.

So basically support UKIs the easy way and use them to be able to
securely zap away userspace and start with a fresh kernel and signed
userspace as a way to support other UEFI formats that are not UKI.

>
> > Let me throw an other wild idea in the ring. Instead of implementing
> > a EFI runtime we could also include a eBPF version of the stub into the
> > images. kexec could then extract the eBPF program and let it run just
> > like any other eBPF program with all the pros (and cons) that come with
> > it. That won't be as generic as the EFI runtime, e.g. you couldn't
> > simply kexec any OS installer. On the other hand it would make it
> > easier to port UKIs et al. to non-EFI systems. What do you think?
>
> ebpf is not turing complete, I am not sure how far you will make it
> with this, in the various implementations of EFI payloads there are
> plenty of loops, sometimes IO loops, sometimes hash loops of huge data
> (for measurements). As I understand ebpf is not really compatible such
> code.
>
> Lennart
>
> --
> Lennart Poettering, Berlin
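The "extract the pieces" step of the easy UKI path Jan describes amounts to walking the PE section table and picking out sections such as `.linux`, `.initrd` and `.cmdline`. The sketch below shows that lookup under stated assumptions: `pe_find_section()` and `struct pe_section` are hypothetical names, not functions from the series or from any kernel API, the image is assumed to be fully loaded into a buffer, and signature verification is not covered at all. The header offsets follow the published PE/COFF layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type: where a section's raw data lives in the file. */
struct pe_section {
	uint32_t file_offset;   /* PointerToRawData */
	uint32_t file_size;     /* SizeOfRawData */
};

/* PE headers are little-endian; read fields byte by byte to stay portable. */
static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}

static uint32_t rd32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Look up a section (e.g. ".linux") in a PE image held in memory.
 * Returns 0 on success, -1 if the image is malformed or the section is absent.
 */
int pe_find_section(const uint8_t *img, size_t len, const char *name,
		    struct pe_section *out)
{
	size_t pe_off, tbl, name_len;
	uint16_t nsec, opt_size;
	char padded[8] = { 0 };

	assert(img && name && out);
	name_len = strlen(name);
	if (name_len > 8 || len < 0x40 || img[0] != 'M' || img[1] != 'Z')
		return -1;
	memcpy(padded, name, name_len);         /* names are NUL-padded to 8 bytes */

	pe_off = rd32(img + 0x3c);              /* e_lfanew in the DOS header */
	if (pe_off > len || len - pe_off < 24 ||
	    memcmp(img + pe_off, "PE\0\0", 4) != 0)
		return -1;

	nsec     = rd16(img + pe_off + 4 + 2);  /* COFF NumberOfSections */
	opt_size = rd16(img + pe_off + 4 + 16); /* COFF SizeOfOptionalHeader */
	tbl      = pe_off + 24 + opt_size;      /* start of the section table */

	for (uint16_t i = 0; i < nsec; i++, tbl += 40) {
		const uint8_t *sh;

		if (tbl > len || len - tbl < 40)    /* each header is 40 bytes */
			return -1;
		sh = img + tbl;
		if (memcmp(sh, padded, 8) != 0)
			continue;
		out->file_size   = rd32(sh + 16);
		out->file_offset = rd32(sh + 20);
		if (out->file_offset > len ||
		    len - out->file_offset < out->file_size)
			return -1;              /* raw data runs past the file */
		return 0;
	}
	return -1;
}
```

A loader following this path would call the helper once per expected section and hand the resulting ranges to the existing kexec segment setup. Handling relocations or running the stub itself, which the emulator approach needs, is exactly what this path skips.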
Hi Lennart,

I spent some time understanding the systemd-pcrlock and TPM stuff, and
got some idea about it. Could you correct me if I'm wrong? Please see
the following comments inlined.

On Mon, Aug 26, 2024 at 9:40 PM Lennart Poettering <mzxreary@0pointer.de> wrote:
>
> On Do, 22.08.24 22:29, Pingfan Liu (piliu@redhat.com) wrote:
>
> > > Hmm, I'd really think about this with some priority. The measurement
> > > stuff should not be an afterthought, it typically has major
> > > implications on how you design your transitions, because measurements
> > > of some component always need to happen *before* you pass control to
> > > it, otherwise they are pointless.
> > >
> >
> > At present, my emulator returns false to is_efi_secure_boot(), so
> > systemd-stub does not care about the measurement, and moves on.
> >
> > Could you enlighten me about how systemd utilizes the measurement? I
> > grepped 'TPM2_PCR_KERNEL_CONFIG', and saw the systemd-stub asks to
> > extend PCR. But where is the value checked? I guess the systemd will
> > hang if the check fails.
>
> systemd's "systemd-pcrlock" tool will look for measurements like that
> and generate disk encryption TPM policies from that.
>

Before kexec reboots into the new kernel, systemd-pcrlock can predict
the expected PCR value and store it in the file system. One thing to
note is that the PCR value itself is not affected at this point.

Then the kexec reboot happens, and systemd-stub extends the PCR.

When the system is up, systemd checks the real PCR value against the
expected value produced by systemd-pcrlock. If they match, all related
policies succeed.

Do I understand correctly?

Thanks,

Pingfan
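The predict-then-compare flow Pingfan describes rests on how a PCR is extended: the new value is the hash of the old value concatenated with the digest of the measured component, so a prediction is just a replay of the expected digests from the reset value. The sketch below illustrates only that chaining. It is not how systemd-pcrlock or a TPM is implemented: a 64-bit FNV-1a hash stands in for the bank's real algorithm (SHA-256 in practice), the reset value is assumed to be zero, and every `toy_*` name is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FNV_OFFSET 0xcbf29ce484222325ULL
#define TOY_FNV_PRIME  0x100000001b3ULL

/*
 * Toy stand-in for the PCR bank's hash: 64-bit FNV-1a. A real TPM uses
 * SHA-256 or another bank algorithm; this only illustrates the chaining.
 */
static uint64_t toy_hash(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint64_t h = TOY_FNV_OFFSET;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= TOY_FNV_PRIME;
	}
	return h;
}

/* Digest of one measured component (for example a UKI section). */
uint64_t toy_measure(const void *data, size_t len)
{
	assert(data != NULL);
	return toy_hash(data, len);
}

/* Extend: new_pcr = H(old_pcr || digest). A PCR cannot be written directly. */
uint64_t toy_pcr_extend(uint64_t pcr, uint64_t digest)
{
	uint64_t buf[2] = { pcr, digest };

	return toy_hash(buf, sizeof(buf));
}

/* Predict a PCR by replaying the expected digests from the assumed reset value 0. */
uint64_t toy_pcr_predict(const uint64_t *digests, size_t n)
{
	uint64_t pcr = 0;

	for (size_t i = 0; i < n; i++)
		pcr = toy_pcr_extend(pcr, digests[i]);
	return pcr;
}
```

Because each step feeds the previous value back into the hash, the result depends on both the content and the order of the measurements. That is why, as Lennart stresses, a component has to be measured before control is passed to it: a component that measures itself after it runs can extend whatever digest it likes.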
Hi Lennart, Hi Jan, On Mon, 9 Sep 2024 12:42:45 +0200 Jan Hendrik Farr <kernel@jfarr.cc> wrote: > On 09 11:48:30, Lennart Poettering wrote: > > On Fr, 06.09.24 12:54, Philipp Rudo (prudo@redhat.com) wrote: > > > > > I mostly agree on what you have wrote. But I see a big problem in > > > running the EFI emulator in user space when it comes to secure boot. > > > The chain of trust ends in the kernel. So it's the kernel that needs to > > > verify that the image to be loaded can be trusted. But when the EFI > > > runtime is in user space the kernel simply cannot do that. Which means, > > > if we want to go this way, we would need to extend the chain of trust > > > to user space. Which will be a whole bucket of worms, not just a > > > can. > > > > May it would be nice to have a way to "zap" userspace away, i.e. allow > > the kernel to get rid of all processes in some way, reliable. And then > > simply start a new userspace, from a trusted definition. Or in other > > words: if you don't want to trust the usual userspace, then let's > > maybe just terminate it, and create it anew, with a clean, pristine > > definition the old userspace cannot get access to. > > Well, this is an interesting idea! > > However, I'm sceptical if this could be done in a secure way. How do we > ensure that nothing the old userspace did with the various interfaces to > the kernel has no impact on the new userspace? Maybe others can chime in > on this? Does kernel_lockdown give more guarantees related to this? > > Even if this is possible in a secure way, there is a problem with doing > this for kernels that are to be kexec'd on kernel panic. In this > approach we can't pre-run them until EBS(), so we would rely on the old > kernel to still be intact when we want to kexec reboot. I don't believe there's a way to do that on running kernels. As Jan pointed out, this cannot be done during reboot, as for kdump that would mean to run after a panic. So it would need to run when the new image is loaded. 
But at that time your user space is running. Plus you also always have a
user space component that triggers kexec. So you cannot simply "zap"
user space but have to somehow stash it away, run your trusted user
space and then restore the old user space again. That sounds pretty
error-prone to me. Plus it will tank your performance every time you do
a kexec, which for kdump is every boot...

> You could do a system where you kexec into an intermediate kernel. That
> kernel get's kexec'd with a signed initrd that can use the normal
> kexec_load syscall to load do any kind of preparation in userspace.
> Problem: For that intermediate enviroment we already need a format
> that combines kernel image, initrd, cmdline all signed in one package
> aka UKI. Was it the chicken or the egg?
>
> But this shows that if we implemented UKIs the easy way (kernel simply
> checks signature, extracts the pieces, and kexecs them like normal),
> this approach could always be used to support kexec for other future
> formats. They could use the kernels UKI support to boot into an
> intermediate kernel with UEFI implemented in userspace in the initrd.
>
> So basically support UKIs the easy way and use them to be able to
> securely zap away userspace and start with a fresh kernel and signed
> userspace as a way to support other UEFI formats that are not UKI.

Well, in theory that should work. But I see several problems:

1) How does the first kernel tell the intermediate kernel which file(s)
   to load with which command line? In fact, how does the first kernel
   get the information itself? For that you would need a new system
   call that takes two kernel images, one for the intermediate kernel
   and one for the kernel to load. Of course you could also build the
   intermediate UKI during kernel build and include it in the image,
   similar to what is done with the purgatory. But that would totally
   bloat the kernel image.
2) I expect that to be extremely painful to debug if the intermediate
   kernel runs into a panic. For sure kdump won't work in that case...

3) Distros would need to maintain and test the additional UKI.

4) This approach basically needs to boot twice. But there are people
   out there who fight extremely hard to reduce boot times. For them
   every millisecond counts. Telling them that they will need to wait
   twice as long will be very hard to sell.

> >
> > > Let me throw an other wild idea in the ring. Instead of implementing
> > > a EFI runtime we could also include a eBPF version of the stub into the
> > > images. kexec could then extract the eBPF program and let it run just
> > > like any other eBPF program with all the pros (and cons) that come with
> > > it. That won't be as generic as the EFI runtime, e.g. you couldn't
> > > simply kexec any OS installer. On the other hand it would make it
> > > easier to port UKIs et al. to non-EFI systems. What do you think?
> >
> > ebpf is not turing complete, I am not sure how far you will make it
> > with this, in the various implementations of EFI payloads there are
> > plenty of loops, sometimes IO loops, sometimes hash loops of huge data
> > (for measurements). As I understand ebpf is not really compatible such
> > code.

I don't believe we can simply take all those payloads and recompile
them to eBPF. Some refactoring definitely needs to be done first.
For example, the IO loops you can drop for eBPF and simply map to the
corresponding kernel function, letting it do the full IO in one go.
There will be cases where that is more difficult, like hash loops
where you have to arrive at the same hash at the end. But I believe
that even for those, ways could be found to get it to work.

Anyway, I'm sure that the picture I have in my head is way
oversimplified. There will be many pitfalls to handle for sure. Still I
believe it would be a nice experiment.

Thanks
Philipp
Hi Jarkko, On Sat, 07 Sep 2024 14:41:38 +0300 "Jarkko Sakkinen" <jarkko@kernel.org> wrote: > On Sat Sep 7, 2024 at 2:31 PM EEST, Jarkko Sakkinen wrote: > > On Sat Sep 7, 2024 at 2:27 PM EEST, Jarkko Sakkinen wrote: > > > On Fri Sep 6, 2024 at 1:54 PM EEST, Philipp Rudo wrote: > > > > Let me throw an other wild idea in the ring. Instead of implementing > > > > a EFI runtime we could also include a eBPF version of the stub into the > > > > images. kexec could then extract the eBPF program and let it run just > > > > like any other eBPF program with all the pros (and cons) that come with > > > > it. That won't be as generic as the EFI runtime, e.g. you couldn't > > > > simply kexec any OS installer. On the other hand it would make it > > > > easier to port UKIs et al. to non-EFI systems. What do you think? > > > > > > BPF would have some guarantees that are favorable such as programs > > > always end, even faulty ones. It always has implicit "ExitBootServices". > > > > > > Just a remark. > > > > Some days ago I was thinking could some of the kernel functionality be > > eBPF at least like in formal theory because most of it is amortized, > > i.e. does a fixed chunk of work. Not going into that rabbit hole but > > I really like this idea and could be good experimentation ground for > > such innovation. > > E.g. let's imagine there would imaginary eBPF-TPM driver framework. > > How I would go doing that would be to take the existing TPM driver > functionality and provide extra functions and resources available for > subsystem specific BPF environment, and have the orhestration code as > eBPF. I pretty much concluded that there is a chance that such could > work out. > > Not something in my immediate table but it is still really interesting > idea, as instead of using language to separate "safe" and unsafe" > regions you would use "VM" environments to create the walls. In the > end of the day that would also great venture for Rust in kernel, i.e. 
> compile that BPF from Rust. > > Sorry going of the hook the comment triggered me ;-) I'm glad you like the idea :-) Sounds like an interesting idea you are having there! Thanks Philipp
On Mon, 9 Sept 2024 at 15:49, Philipp Rudo <prudo@redhat.com> wrote: > > Hi Lennart, > Hi Jan, > > On Mon, 9 Sep 2024 12:42:45 +0200 > Jan Hendrik Farr <kernel@jfarr.cc> wrote: > > > On 09 11:48:30, Lennart Poettering wrote: > > > On Fr, 06.09.24 12:54, Philipp Rudo (prudo@redhat.com) wrote: > > > > > > > I mostly agree on what you have wrote. But I see a big problem in > > > > running the EFI emulator in user space when it comes to secure boot. > > > > The chain of trust ends in the kernel. So it's the kernel that needs to > > > > verify that the image to be loaded can be trusted. But when the EFI > > > > runtime is in user space the kernel simply cannot do that. Which means, > > > > if we want to go this way, we would need to extend the chain of trust > > > > to user space. Which will be a whole bucket of worms, not just a > > > > can. > > > > > > May it would be nice to have a way to "zap" userspace away, i.e. allow > > > the kernel to get rid of all processes in some way, reliable. And then > > > simply start a new userspace, from a trusted definition. Or in other > > > words: if you don't want to trust the usual userspace, then let's > > > maybe just terminate it, and create it anew, with a clean, pristine > > > definition the old userspace cannot get access to. > > > > Well, this is an interesting idea! > > > > However, I'm sceptical if this could be done in a secure way. How do we > > ensure that nothing the old userspace did with the various interfaces to > > the kernel has no impact on the new userspace? Maybe others can chime in > > on this? Does kernel_lockdown give more guarantees related to this? > > > > Even if this is possible in a secure way, there is a problem with doing > > this for kernels that are to be kexec'd on kernel panic. In this > > approach we can't pre-run them until EBS(), so we would rely on the old > > kernel to still be intact when we want to kexec reboot. > > I don't believe there's a way to do that on running kernels. 
As Jan > pointed out, this cannot be done during reboot, as for kdump that would > mean to run after a panic. So it would need to run when the new image > is loaded. But at that time your user space is running. Plus you also > always have a user space component that triggers kexec. So you cannot > simply "zap" user space but have to somehow stash it away, run your > trusted user space and, then restore the old user space again. That > sounds pretty error prone to me. Plus it will tank your performance > every time you do a kexec, which for kdump is every boot... > kdump has a kexec kernel 'standby' to launch when the kernel panics. So for the UKI/EFI payload case, this would imply that the load involves running the payload until EBS() and freezing the state. Whether execution occurs in true user space or in a deprivileged kernel context is an implementation detail, imho. We don't want to run external code in privileged mode inside the kernel in any case, as this would violate lockdown already. But it should be feasible to have an EFI-compatible layer in the kernel that invokes the EFI entrypoint of an image in a way that protects the host kernel. This could be user mode on the CPU or perhaps a minimal KVM virtual machine. The advantage of this approach is that the whole concept of purgatory can be avoided - the EFI boot phase runs in parallel with the previous kernel, which has full control over authentication and [emulated] PCR extension, and has ultimate control over whether the kexec reboot is permitted. > > You could do a system where you kexec into an intermediate kernel. That > > kernel get's kexec'd with a signed initrd that can use the normal > > kexec_load syscall to load do any kind of preparation in userspace. > > Problem: For that intermediate enviroment we already need a format > > that combines kernel image, initrd, cmdline all signed in one package > > aka UKI. Was it the chicken or the egg? 
> > > > But this shows that if we implemented UKIs the easy way (kernel simply > > checks signature, extracts the pieces, and kexecs them like normal), > > this approach could always be used to support kexec for other future > > formats. They could use the kernels UKI support to boot into an > > intermediate kernel with UEFI implemented in userspace in the initrd. > > > > So basically support UKIs the easy way and use them to be able to > > securely zap away userspace and start with a fresh kernel and signed > > userspace as a way to support other UEFI formats that are not UKI. > > Well, in theory that should work. But I see several problems: > > 1) How does the first kernel tell the intermediate kernel which > file(s) with wich command line to load? In fact, how does the first > kernel get the information itself? You would need a new system call > that takes two kernel images, one for the intermediate and one for the > kernel to load,for that. > > Of course you could also build the intermediate UKI during kernel build > and include it into the image. Similar to what is done with the > purgatory. But that would totally bloat the kernel image. > > 2) I expect that to be extremely painful to debug, if the intermediate > kernel runs into a panic. For sure kdump won't work in that case... > > 3) Distros would need maintain and test the additional UKI. > > 4) This approach basically needs to boot twice. But there are people > out there who fight to reduce boot times extremely hard. For them every > millisecond counts. Telling them that they will need to wait twice as > long will be very hard to sell. > I don't think intermediate kernels are the solution here. We need to run as much as possible under the control of the preceding kernel, and minimize the bare metal handover that occurs after EBS(). 
Adding more code to the purgatory (as this series does) is not acceptable to me, as it is extremely difficult to debug, and duplicates drivers and other logic (making it an 'intermediate kernel' of sorts already) > > > > > > > Let me throw an other wild idea in the ring. Instead of implementing > > > > a EFI runtime we could also include a eBPF version of the stub into the > > > > images. kexec could then extract the eBPF program and let it run just > > > > like any other eBPF program with all the pros (and cons) that come with > > > > it. That won't be as generic as the EFI runtime, e.g. you couldn't > > > > simply kexec any OS installer. On the other hand it would make it > > > > easier to port UKIs et al. to non-EFI systems. What do you think? > > > > > > ebpf is not turing complete, I am not sure how far you will make it > > > with this, in the various implementations of EFI payloads there are > > > plenty of loops, sometimes IO loops, sometimes hash loops of huge data > > > (for measurements). As I understand ebpf is not really compatible such > > > code. > > I don't believe we can simply take all those payloads and recompile > them to eBPF. There definitely needs to be some refactoring done first. > For example the IO loops you can drop for eBPF and simply map to the > corresponding kernel function, letting them do the full IO in one go. > There will be cases where that will be more difficult like for hash > loops when you have to have the same hash at the end. But I believe > even for that ways could be found to get it to work. > > Anyway, I'm sure that the picture I have in my head is way > oversimplified. There will be many pitfalls to handle for sure. Still I > believe it would be a nice experiment. > Today, UKI functionality is implemented in terms of EFI API calls. 
Any solution that needs either a parallel implementation (eBPF vs EFI) or needs to unpack the UKI in order to perform the steps that the UKI would perform itself if it were executed in an EFI environment is a no-go in my opinion. So either we provide some EFI compatible runtime sufficient to run a UKI, or we re-engineer UKI to be built on top of an abstraction that can be implemented straight-forwardly both on system firmware and in the EFI context.
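To make the second option above more concrete, here is a minimal sketch of what "built on top of an abstraction" could look like. Everything in it is an assumption made for illustration: `struct boot_env_ops`, `payload_boot()` and the mock backend are hypothetical names that exist neither in the kernel nor in systemd-stub nor in the UEFI specification. The only point is that the payload logic is written once and the environment (system firmware or a kexec-time runtime) supplies the callbacks.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical abstraction; none of these names exist in the kernel,
 * systemd-stub, or the UEFI specification. */
struct boot_env_ops {
	/* Record a measurement of data into the given PCR index. */
	int (*measure)(void *ctx, unsigned int pcr, const void *data, size_t len);
	/* Hand control to the kernel with the given command line. */
	int (*handover)(void *ctx, const char *cmdline);
};

/* Payload logic written once against the abstraction: measure first,
 * then hand over, and refuse to hand over if measuring failed. */
int payload_boot(const struct boot_env_ops *ops, void *ctx,
		 unsigned int pcr, const char *cmdline)
{
	int ret = ops->measure(ctx, pcr, cmdline, strlen(cmdline));

	if (ret != 0)
		return ret;
	return ops->handover(ctx, cmdline);
}

/* Mock backend that only logs the order of calls. A firmware backend
 * would call the EFI protocols instead; a kexec-time backend would call
 * into the running kernel. */
struct mock_env {
	char log[8];		/* 'M' = measure, 'H' = handover */
	size_t n;
	int fail_measure;	/* force measure() to fail */
};

static int mock_measure(void *ctx, unsigned int pcr, const void *data, size_t len)
{
	struct mock_env *env = ctx;

	(void)pcr; (void)data; (void)len;
	if (env->fail_measure)
		return -1;
	if (env->n < sizeof(env->log) - 1)
		env->log[env->n++] = 'M';
	return 0;
}

static int mock_handover(void *ctx, const char *cmdline)
{
	struct mock_env *env = ctx;

	(void)cmdline;
	if (env->n < sizeof(env->log) - 1)
		env->log[env->n++] = 'H';
	return 0;
}

const struct boot_env_ops mock_ops = {
	.measure = mock_measure,
	.handover = mock_handover,
};
```

With a zero-initialized `struct mock_env`, calling `payload_boot(&mock_ops, &env, 12, "console=ttyS0")` leaves "MH" in `env.log`, i.e. the measurement is recorded before control is passed on. Whether a real UKI could be restructured this way is exactly the open question raised above.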
On 09 16:04:50, Ard Biesheuvel wrote: > > [...] > > kdump has a kexec kernel 'standby' to launch when the kernel panics. > So for the UKI/EFI payload case, this would imply that the load > involves running the payload until EBS() and freezing the state. > > Whether execution occurs in true user space or in a deprivileged > kernel context is an implementation detail, imho. We don't want to run > external code in privileged mode inside the kernel in any case, as > this would violate lockdown already. But it should be feasible to have > a EFI compatible layer in the kernel that invokes the EFI entrypoint > of an image in a way that protects the host kernel. This could be user > mode on the CPU or perhaps a minimal KVM virtual machine. This solution is what I'm currently in favor of (besides my original approach), see: https://lore.kernel.org/kexec/Zt7EbvWjF9WPCYfn@gardel-login/T/#md4f02b7cb6c694cb28aa8d36fe47a02bd4dc17a4
On Mon Sep 9, 2024 at 4:55 PM EEST, Philipp Rudo wrote: > Hi Jarkko, > > On Sat, 07 Sep 2024 14:41:38 +0300 > "Jarkko Sakkinen" <jarkko@kernel.org> wrote: > > > On Sat Sep 7, 2024 at 2:31 PM EEST, Jarkko Sakkinen wrote: > > > On Sat Sep 7, 2024 at 2:27 PM EEST, Jarkko Sakkinen wrote: > > > > On Fri Sep 6, 2024 at 1:54 PM EEST, Philipp Rudo wrote: > > > > > Let me throw an other wild idea in the ring. Instead of implementing > > > > > a EFI runtime we could also include a eBPF version of the stub into the > > > > > images. kexec could then extract the eBPF program and let it run just > > > > > like any other eBPF program with all the pros (and cons) that come with > > > > > it. That won't be as generic as the EFI runtime, e.g. you couldn't > > > > > simply kexec any OS installer. On the other hand it would make it > > > > > easier to port UKIs et al. to non-EFI systems. What do you think? > > > > > > > > BPF would have some guarantees that are favorable such as programs > > > > always end, even faulty ones. It always has implicit "ExitBootServices". > > > > > > > > Just a remark. > > > > > > Some days ago I was thinking could some of the kernel functionality be > > > eBPF at least like in formal theory because most of it is amortized, > > > i.e. does a fixed chunk of work. Not going into that rabbit hole but > > > I really like this idea and could be good experimentation ground for > > > such innovation. > > > > E.g. let's imagine there would imaginary eBPF-TPM driver framework. > > > > How I would go doing that would be to take the existing TPM driver > > functionality and provide extra functions and resources available for > > subsystem specific BPF environment, and have the orhestration code as > > eBPF. I pretty much concluded that there is a chance that such could > > work out. 
> > > > Not something in my immediate table but it is still really interesting > > idea, as instead of using language to separate "safe" and unsafe" > > regions you would use "VM" environments to create the walls. In the > > end of the day that would also great venture for Rust in kernel, i.e. > > compile that BPF from Rust. > > > > Sorry going of the hook the comment triggered me ;-) > > I'm glad you like the idea :-) > > Sounds like an interesting idea you are having there! Yeah, if you go forward with this please CC to me any possible follow-ups :-) BR, Jarkko
On Mo, 09.09.24 21:38, Pingfan Liu (piliu@redhat.com) wrote: > Hi Lennart, > > I spent some time understanding the systemd-pcrlock and TPM stuff, and > got some idea about it. Could you correct me if I'm wrong? Please see > the following comments inlined. > > On Mon, Aug 26, 2024 at 9:40 PM Lennart Poettering <mzxreary@0pointer.de> wrote: > > > > On Do, 22.08.24 22:29, Pingfan Liu (piliu@redhat.com) wrote: > > > > > > Hmm, I'd really think about this with some priority. The measurement > > > > stuff should not be an afterthought, it typically has major > > > > implications on how you design your transitions, because measurements > > > > of some component always need to happen *before* you pass control to > > > > it, otherwise they are pointless. > > > > > > > > > > At present, my emulator returns false to is_efi_secure_boot(), so > > > systemd-stub does not care about the measurement, and moves on. > > > > > > Could you enlighten me about how systemd utilizes the measurement? I > > > grepped 'TPM2_PCR_KERNEL_CONFIG', and saw the systemd-stub asks to > > > extend PCR. But where is the value checked? I guess the systemd will > > > hang if the check fails. > > > > systemd's "systemd-pcrlock" tool will look for measurements like that > > and generate disk encryption TPM policies from that. > > > > Before kexec reboots to the new kernel > systemd-pcrlock can predict the expected PCR value and store it in the > file system. It's a set of PCR values pcrlock predicts, one or more for each PCR. It then compiles a TPM "policy" from that, which is identified by a hash, and that hash is then stored in a TPM "nvindex" (which is a bit of memory a TPM provides). > One thing should be noticed is that PCR value can not be affected. Well, a kexec *should* affect some PCRs. Replacement of the kernel *must* be visible in the measurement logs somehow, in a predictable fashion. > And kexec rebooting happens. systemd-stub extends the PCR value. 
When > the system is up, systemd checks the real PCR value against the > expected value rendered by systemd-pcrlock? If matching, all related > policies succeed. Well, it's not systemd that checks that, but the TPM, i.e. not the untrusted OS but the supposedly more trusted TPM. So, the key is that we want measurements to take place: the kexec operation *must* be made visible in the measurement logs. But it must be in a well-defined way, and ideally as an extension of the measurements sd-stub currently makes. (BTW, I personally don't think emulating EFI is really that important. As long as we get the key functionality that sd-stub provides also when doing kexec I am happy. i.e. whether it is sd-stub that does this or some other piece of code doesn't really matter to me. What I do care about is that we can parameterize the invoked kernel in a similar fashion as we can parameterize sd-stub, and that the measurements applied are also equivalent.) Lennart -- Lennart Poettering, Berlin
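For readers less familiar with why a kexec has to show up "in a predictable fashion", a toy model of the extend-and-predict mechanism may help. A PCR can only be changed by extending it, new = H(old || digest), so anyone who knows the expected event log can replay it and predict the final value. The sketch below is an illustration under loud assumptions: it uses 64-bit FNV-1a as a stand-in hash (not cryptographic; a real TPM uses the algorithm of the PCR bank, e.g. SHA-256), the event strings are made up, and `pcr_extend()` / `pcr_replay()` are hypothetical helper names, not systemd-pcrlock or TPM APIs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV_OFFSET 0xcbf29ce484222325ULL
#define FNV_PRIME  0x100000001b3ULL

/* Stand-in hash: 64-bit FNV-1a, continuing from state h.
 * NOT cryptographic; used only to keep the example short. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= FNV_PRIME;
	}
	return h;
}

/* Toy PCR extend: new = H(old || H(event)). There is no "set"
 * operation; the value can only be extended. */
uint64_t pcr_extend(uint64_t pcr, const char *event)
{
	uint64_t digest = fnv1a(FNV_OFFSET, event, strlen(event));
	uint64_t h = fnv1a(FNV_OFFSET, &pcr, sizeof(pcr));

	return fnv1a(h, &digest, sizeof(digest));
}

/* Predict a PCR value by replaying an expected event log, starting
 * from the reset value of zero. */
uint64_t pcr_replay(const char *const *events, size_t n)
{
	uint64_t pcr = 0;

	for (size_t i = 0; i < n; i++)
		pcr = pcr_extend(pcr, events[i]);
	return pcr;
}
```

Replaying a log of { "firmware", "kernel-A", "kernel-B" } yields the same value as taking the PCR after { "firmware", "kernel-A" } and extending it once with "kernel-B", and the same events in a different order give a different value. That is the property a prediction made before the kexec relies on: the measurement of the new kernel is appended to the existing chain rather than replacing it.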
On Mo, 09.09.24 12:42, Jan Hendrik Farr (kernel@jfarr.cc) wrote: > On 09 11:48:30, Lennart Poettering wrote: > > On Fr, 06.09.24 12:54, Philipp Rudo (prudo@redhat.com) wrote: > > > > > I mostly agree on what you have wrote. But I see a big problem in > > > running the EFI emulator in user space when it comes to secure boot. > > > The chain of trust ends in the kernel. So it's the kernel that needs to > > > verify that the image to be loaded can be trusted. But when the EFI > > > runtime is in user space the kernel simply cannot do that. Which means, > > > if we want to go this way, we would need to extend the chain of trust > > > to user space. Which will be a whole bucket of worms, not just a > > > can. > > > > May it would be nice to have a way to "zap" userspace away, i.e. allow > > the kernel to get rid of all processes in some way, reliable. And then > > simply start a new userspace, from a trusted definition. Or in other > > words: if you don't want to trust the usual userspace, then let's > > maybe just terminate it, and create it anew, with a clean, pristine > > definition the old userspace cannot get access to. > > Well, this is an interesting idea! > > However, I'm sceptical if this could be done in a secure way. How do we > ensure that nothing the old userspace did with the various interfaces to > the kernel has no impact on the new userspace? Maybe others can chime in > on this? Does kernel_lockdown give more guarantees related to this? Yeah, it's not a trivial thing. I.e. I guess things like sysfs and procfs will retain ownership/access mode. sysctls and sysfs attrs are going to retain their most recently written contents and things like that. Synthetic network interfaces, DM devices, loopback devices all would survive this. 
So, no idea how realistic this is, but I would *love* it, not only for this purpose here, but also for the "soft-reboot" logic we have in systemd these days, which shuts down userspace and starts it up again, as a form of super-fast reboot that doesn't replace the kernel. If we could reliably reset sysfs/sysctl/procfs/… during this, this would be really lovely. Lennart -- Lennart Poettering, Berlin
On Thu, Aug 29, 2024 at 1:08 AM Ard Biesheuvel <ardb@kernel.org> wrote: > [...] > > Thanks for putting this RFC together. This is useful work, and gives > us food for thought and discussion. > > There are a few problems that become apparent when going through these changes. > > 1. Implementing UEFI entirely is intractable, and unnecessary. > Implementing the subset of UEFI that is actually needed to boot Linux > *is* tractable, though, but we need to work together to write this > down somewhere. > - the EFI stub needs the boot services for the EFI memory map and > the allocation routines > - GRUB needs block I/O > - systemd-stub/UKI needs file I/O to look for sidecars > - etc etc > I have created a git repo to record the current status [https://github.com/rhkdump/kexec_uefi.git]. uefi_subset.md in that repo records the minimal UEFI requirements. But I have a question about "GRUB needs block I/O": is it required? As far as I know, kernel images such as UKI and zboot will be supported. But why should GRUB be supported too? Thanks, Pingfan > I implemented a Rust 'efiloader' crate a while ago that encapsulates > most of this (it can boot Linux/arm64 on QEMU and boot x86 via GRUB in > user space **). Adding file I/O to this should be straight-forward - > as Lennart points out, we only need the protocol, it doesn't need to > be backed by an actual file system, it just needs to be able to expose > other files in the right way. > > 2. Running the UEFI emulator on bare metal is not going to scale. > Cloning UART driver code and MMU code etc is a can of worms that you > want to leave closed. And as Lennart points out, there is other > hardware (TPM) that needs to be accessible as well. Providing a > separate set of drivers for all hardware that the EFI emulator may > need to access is not a tractable problem either. > > The fix for this, as I see it, is to run the EFI emulator in user > space, to the point where the payload calls ExitBootServices(). 
This > will allow all I/O and memory protocol to be implemented trivially, > using C library routines. I have a crude prototype** of this running > to the point where ExitBootServices() is called (and then it crashes). > The tricky yet interesting bit here is how we migrate a chunk of user > space memory to the bare metal context that will be created by the > kexec syscall later (in which the call to ExitBootServices() would > return and proceed with the boot). But the principle is rather > straight-forward, and would permit us, e.g., to kexec an OS installer > too. > > 3. We need to figure out how to support TPM and PCRs in the context of > kexec. This is a fundamental issue with verified boot, given that the > kexec PCR state is necessarily different from the boot state, and so > we cannot reuse the TPM directly if we want to pretend that we are > doing an ordinary boot in kexec. The alternative is to leave the TPM > in a state where the kexec kernel can access its sealed secrets, and > mock up the TCG2 EFI protocols using a shim that sits between the TPM > hardware (as the real TCG2 protocols will be long gone) and the EFI > payload. But as I said, this is a fundamental issue, as the ability to > pretend that a kexec boot is a pristine boot would mean that verified > boot is broken. > > > As future work, I'd like to propose to collaborate on some alignment > regarding a UEFI baseline for Linux, i.e., the parts that we actually > need to boot Linux. > > For this series in particular, I don't see a way forward where we > adopt this approach, and carry all this code inside the kernel. > > Thanks. > Ard. >
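As a rough illustration of point 2 in the quoted mail (memory services implemented "using C library routines" until the payload calls ExitBootServices()), here is a small user-space sketch. It is not taken from the efiloader crate or from this series; `struct emu_state`, the `emu_*` functions and the status codes are hypothetical names, and the real AllocatePages()/ExitBootServices() calls have different signatures and semantics (allocation types, memory types, map keys). The hard part named in the mail, migrating the allocated memory into the bare-metal kexec context, is deliberately not attempted here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EMU_PAGE_SIZE 4096u

/* Hypothetical status codes and emulator state, for illustration only. */
enum emu_status { EMU_SUCCESS = 0, EMU_OUT_OF_RESOURCES, EMU_UNSUPPORTED };

struct emu_state {
	size_t pages_allocated;		/* pages handed out so far */
	int boot_services_exited;	/* set once the payload is done */
};

/* AllocatePages()-like call backed by the C library: page-aligned,
 * zeroed memory from the process heap. */
enum emu_status emu_allocate_pages(struct emu_state *st, size_t pages, void **out)
{
	void *mem;

	if (st->boot_services_exited)
		return EMU_UNSUPPORTED;	/* boot services are gone */
	if (pages == 0 || pages > SIZE_MAX / EMU_PAGE_SIZE)
		return EMU_OUT_OF_RESOURCES;

	mem = aligned_alloc(EMU_PAGE_SIZE, pages * EMU_PAGE_SIZE);
	if (!mem)
		return EMU_OUT_OF_RESOURCES;

	memset(mem, 0, pages * EMU_PAGE_SIZE);
	st->pages_allocated += pages;
	*out = mem;
	return EMU_SUCCESS;
}

/* ExitBootServices()-like call: from here on the emulator refuses
 * further allocations. In the scheme described above, this is the point
 * where the state would be frozen until the actual kexec. */
enum emu_status emu_exit_boot_services(struct emu_state *st)
{
	st->boot_services_exited = 1;
	return EMU_SUCCESS;
}
```

A caller would start from a zero-initialized `struct emu_state`, let the payload allocate through `emu_allocate_pages()`, and treat `emu_exit_boot_services()` as the cut-off after which any further allocation request returns `EMU_UNSUPPORTED`.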