
[v9,00/19] x86: Trenchboot secure dynamic launch Linux kernel support

Message ID 20240531010331.134441-1-ross.philipson@oracle.com
Series x86: Trenchboot secure dynamic launch Linux kernel support

Message

Ross Philipson May 31, 2024, 1:03 a.m. UTC
The larger focus of the TrenchBoot project (https://github.com/TrenchBoot) is to
enhance the boot security and integrity in a unified manner. The first area of
focus has been on the Trusted Computing Group's Dynamic Launch for establishing
a hardware Root of Trust for Measurement, also known as DRTM (Dynamic Root of
Trust for Measurement). The project has been, and continues to be, working on
a unified means of Dynamic Launch that is cross-platform (Intel and AMD) and
cross-architecture (x86 and Arm), including our recent involvement in the upcoming
Arm DRTM specification. The order of introducing DRTM to the Linux kernel
follows the maturity of DRTM in the architectures. Intel's Trusted eXecution
Technology (TXT) is present today and only requires a preamble loader, e.g. a
boot loader, and an OS kernel that is TXT-aware. AMD's DRTM implementation has
been present since the introduction of AMD-V but requires an additional
component that is AMD specific and referred to in the specification as the
Secure Loader, for which the TrenchBoot project has an active prototype in
development. Finally, Arm's implementation is in the specification development stage
and the project is looking to support it when it becomes available.

This patchset provides detailed documentation of DRTM, the approach used for
adding the capability, and relevant API/ABI documentation. In addition to the
documentation, the patch set introduces Intel TXT support as the first platform
for Linux Secure Launch.

A quick note on terminology. The larger open source project itself is called
TrenchBoot, which is hosted on GitHub (links below). The kernel feature enabling
the use of Dynamic Launch technology is referred to as "Secure Launch" within
the kernel code. As such, the prefixes sl_/SL_ or slaunch/SLAUNCH will be seen
in the code. The early kernel entry code that the Dynamic Launch vectors to is
referred to as the SL stub.

The Secure Launch feature starts with patch #2. Patch #1 was authored by Arvind
Sankar. There is no further status on this patch at this point, but
Secure Launch depends on it, so it is included with the set.

Links:

The TrenchBoot project including documentation:

https://trenchboot.org

The TrenchBoot project on GitHub:

https://github.com/trenchboot

Intel TXT is documented in its own specification and in the SDM Instruction Set volume:

https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
https://software.intel.com/en-us/articles/intel-sdm

AMD SKINIT is documented in the System Programming manual:

https://www.amd.com/system/files/TechDocs/24593.pdf

The TrenchBoot project provides a quick start guide to help get a system
up and running with Secure Launch for Linux:

https://github.com/TrenchBoot/documentation/blob/master/QUICKSTART.md

Patch set based on commit:

torvalds/master/ea5f6ad9ad9645733b72ab53a98e719b460d36a6

Thanks
Ross Philipson and Daniel P. Smith

Changes in v2:

 - Modified 32b entry code to prevent causing relocations in the compressed
   kernel.
 - Dropped patches for compressed kernel TPM PCR extender.
 - Modified event log code to insert log delimiter events and not rely
   on TPM access.
 - Stop extending PCRs in the early Secure Launch stub code.
 - Removed Kconfig options for hash algorithms and use the algorithms the
   ACM used.
 - Match Secure Launch measurement algorithm use to those reported in the
   TPM 2.0 event log.
 - Read the TPM events out of the TPM and extend them into the PCRs using
   the mainline TPM driver. This is done in the late initcall module.
 - Allow use of alternate PCRs 19 and 20 for post-ACM measurements.
 - Add Kconfig constraints needed by Secure Launch (disable KASLR
   and add x2apic dependency).
 - Fix testing of SL_FLAGS when determining if Secure Launch is active
   and the architecture is TXT.
 - Use SYM_DATA_START_LOCAL macros in early entry point code.
 - Security audit changes:
   - Validate buffers passed to MLE do not overlap the MLE and are
     properly laid out.
   - Validate buffers and memory regions used by the MLE are
     protected by IOMMU PMRs.
 - Force IOMMU to not use passthrough mode during a Secure Launch.
 - Prevent KASLR use during a Secure Launch.

Changes in v3:

 - Introduce x86 documentation patch to provide background, overview
   and configuration/ABI information for the Secure Launch kernel
   feature.
 - Remove the IOMMU patch with special cases for disabling IOMMU
   passthrough. Configuring the IOMMU is now a documentation matter
   in the previously mentioned new patch.
 - Remove special case KASLR disabling code. Configuring KASLR is now
   a documentation matter in the previously mentioned new patch.
 - Fix incorrect panic on TXT public register read.
 - Properly handle and measure setup_indirect bootparams in the early
   launch code.
 - Use correct compressed kernel image base address when testing buffers
   in the early launch stub code. This bug was introduced by the changes
   to avoid relocation in the compressed kernel.
 - Use CPUID feature bits instead of CPUID vendor strings to determine
   if SMX mode is supported and the system is Intel.
 - Remove early NMI re-enable on the BSP. This can be safely done later
   on the BSP after an IDT is set up.

Changes in v4:
 - Expand the cover letter to provide more context on the order in which DRTM
   support will be added.
 - Removed debug tracing in TPM request locality function and fixed
   local variable declarations.
 - Fixed missing break in default case in slmodule.c.
 - Reworded commit messages in patches 1 and 2 per suggestions.

Changes in v5:
 - Comprehensive documentation rewrite.
 - Use boot param loadflags to communicate Secure Launch status to
   kernel proper.
 - Fix incorrect check of X86_FEATURE_BIT_SMX bit.
 - Rename the alternate details and authorities PCR support.
 - Refactor the securityfs directory and file setup in slmodule.c.
 - Misc. cleanup from internal code reviews.
 - Use reverse fir tree ordering for variable declarations.

Changes in v6:
 - Support for the new Secure Launch Resource Table that standardizes
   the information passed and forms the ABI between the pre and post
   launch code.
 - Support for booting Linux through the EFI stub entry point and
   then being able to do a Secure Launch once the EFI stub is done and
   ExitBootServices (EBS) is called.
 - Updates to the documentation to reflect the previous two items listed.

Changes in v7:
 - Switch to using MONITOR/MWAIT instead of NMIs to park the APs for
   later bringup by the SMP code.
 - Use static inline dummy functions instead of macros when the Secure
   Launch feature is disabled.
 - Move early SHA1 code to lib/crypto and pull it in from there.
 - Numerous formatting fixes from comments on LKML.
 - Remove efi-stub/DL stub patch temporarily for redesign/rework.

Changes in v8:
 - Reintroduce efi-stub Linux kernel booting through the dynamic launch
   stub (DL stub).
 - Add new approach to setting localities > 0 through kernel and sysfs
   interfaces in the TPM mainline driver.
 - General code cleanup from v7 post comments.

Changes in v9:
 - Updated DL stub support for recent changes to EFI stub in the kernel.
 - Added patches to fix locality changing support in the TPM driver
   (these patches originally were posted as a separate set).
 - Enhanced Secure Launch TPM locality 2 setting in the TPM driver.
 - Added locality setting support through sysfs for userspace access.
 - Split up SHA1 and SHA256 changes into separate patches and updated
   the commit messages to be more clear (per request from upstream
   review).
 - Fix Clang compile issues detected by kernel test robot.
 - Modifications to the Secure Launch Resource Table ABI:
   . Use flex arrays in table structures.
   . Update and move fields in tables to make everything 8-byte aligned.
   . Add 2 new DLME fields and a txt_heap address field.
   . Remove platform specific tables that are not defined yet (AMD/ARM).
 - Update Kconfig dependencies for Secure Launch with SHA1/SHA256/TPM.
 - Remove push/pop of rsi since boot params is now stored in r15.
 - Update outdated kernel documentation.
 - Misc. comment fixes for typos and misspellings.
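As a rough illustration of the flex-array, 8-byte-aligned table layout mentioned in the v9 ABI changes, the sketch below walks a tag/size list of entries that follows a table header. All struct, field, and function names here are hypothetical; the actual Secure Launch Resource Table ABI is defined in include/linux/slr_table.h in this series and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry header: each entry starts with a tag and its total size. */
struct demo_entry_hdr {
	uint32_t tag;
	uint32_t size;	/* bytes, including this header; multiple of 8 */
};

/* Hypothetical table header followed by a flexible array of entry data. */
struct demo_table {
	uint32_t magic;
	uint32_t size;		/* bytes in use, including this header */
	uint64_t entries[];	/* flex array; uint64_t keeps entries 8-byte aligned */
};

#define DEMO_TAG_END	0

_Static_assert(sizeof(struct demo_entry_hdr) % 8 == 0, "8-byte aligned entry header");
_Static_assert(offsetof(struct demo_table, entries) % 8 == 0, "8-byte aligned entries");

/*
 * Return the first entry carrying @tag, or NULL. The walk stops at an END
 * tag, at a malformed (too small) entry, or at the end of the used table.
 */
static struct demo_entry_hdr *demo_find_entry(struct demo_table *t, uint32_t tag)
{
	uint8_t *p = (uint8_t *)t->entries;
	uint8_t *end = (uint8_t *)t + t->size;

	while (p + sizeof(struct demo_entry_hdr) <= end) {
		struct demo_entry_hdr *e = (struct demo_entry_hdr *)p;

		if (e->tag == DEMO_TAG_END || e->size < sizeof(*e))
			break;
		if (e->tag == tag)
			return e;
		p += e->size;
	}
	return NULL;
}
```

Keeping every header and entry size a multiple of 8 means an entry pointer computed by adding sizes stays naturally aligned for 64-bit fields, which is one plausible motivation for the alignment change.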

Arvind Sankar (1):
  x86/boot: Place kernel_info at a fixed offset

Daniel P. Smith (6):
  x86: Add early SHA-1 support for Secure Launch early measurements
  x86: Add early SHA-256 support for Secure Launch early measurements
  tpm: Protect against locality counter underflow
  tpm: Ensure tpm is in known state at startup
  tpm: Make locality requests return consistent values
  x86: Secure Launch late initcall platform module

Ross Philipson (12):
  Documentation/x86: Secure Launch kernel documentation
  x86: Secure Launch Kconfig
  x86: Secure Launch Resource Table header file
  x86: Secure Launch main header file
  x86: Secure Launch kernel early boot stub
  x86: Secure Launch kernel late boot stub
  x86: Secure Launch SMP bringup support
  kexec: Secure Launch kexec SEXIT support
  reboot: Secure Launch SEXIT support on reboot paths
  tpm: Add ability to set the preferred locality the TPM chip uses
  tpm: Add sysfs interface to allow setting and querying the preferred
    locality
  x86: EFI stub DRTM launch support for Secure Launch

 Documentation/arch/x86/boot.rst               |  21 +
 Documentation/security/index.rst              |   1 +
 .../security/launch-integrity/index.rst       |  11 +
 .../security/launch-integrity/principles.rst  | 320 ++++++++
 .../secure_launch_details.rst                 | 587 ++++++++++++++
 .../secure_launch_overview.rst                | 227 ++++++
 arch/x86/Kconfig                              |  11 +
 arch/x86/boot/compressed/Makefile             |   3 +
 arch/x86/boot/compressed/early_sha1.c         |  12 +
 arch/x86/boot/compressed/early_sha256.c       |   6 +
 arch/x86/boot/compressed/head_64.S            |  30 +
 arch/x86/boot/compressed/kernel_info.S        |  53 +-
 arch/x86/boot/compressed/kernel_info.h        |  12 +
 arch/x86/boot/compressed/sl_main.c            | 577 ++++++++++++++
 arch/x86/boot/compressed/sl_stub.S            | 725 ++++++++++++++++++
 arch/x86/boot/compressed/vmlinux.lds.S        |   6 +
 arch/x86/include/asm/msr-index.h              |   5 +
 arch/x86/include/asm/realmode.h               |   3 +
 arch/x86/include/uapi/asm/bootparam.h         |   1 +
 arch/x86/kernel/Makefile                      |   2 +
 arch/x86/kernel/asm-offsets.c                 |  20 +
 arch/x86/kernel/reboot.c                      |  10 +
 arch/x86/kernel/setup.c                       |   3 +
 arch/x86/kernel/slaunch.c                     | 598 +++++++++++++++
 arch/x86/kernel/slmodule.c                    | 513 +++++++++++++
 arch/x86/kernel/smpboot.c                     |  58 +-
 arch/x86/realmode/init.c                      |   3 +
 arch/x86/realmode/rm/header.S                 |   3 +
 arch/x86/realmode/rm/trampoline_64.S          |  32 +
 drivers/char/tpm/tpm-chip.c                   |  24 +-
 drivers/char/tpm/tpm-interface.c              |  15 +
 drivers/char/tpm/tpm-sysfs.c                  |  30 +
 drivers/char/tpm/tpm.h                        |   1 +
 drivers/char/tpm/tpm_tis_core.c               |  25 +-
 drivers/firmware/efi/libstub/x86-stub.c       |  98 +++
 drivers/iommu/intel/dmar.c                    |   4 +
 include/crypto/sha1.h                         |   1 +
 include/linux/slaunch.h                       | 542 +++++++++++++
 include/linux/slr_table.h                     | 271 +++++++
 include/linux/tpm.h                           |  10 +
 kernel/kexec_core.c                           |   4 +
 lib/crypto/sha1.c                             |  81 ++
 42 files changed, 4946 insertions(+), 13 deletions(-)
 create mode 100644 Documentation/security/launch-integrity/index.rst
 create mode 100644 Documentation/security/launch-integrity/principles.rst
 create mode 100644 Documentation/security/launch-integrity/secure_launch_details.rst
 create mode 100644 Documentation/security/launch-integrity/secure_launch_overview.rst
 create mode 100644 arch/x86/boot/compressed/early_sha1.c
 create mode 100644 arch/x86/boot/compressed/early_sha256.c
 create mode 100644 arch/x86/boot/compressed/kernel_info.h
 create mode 100644 arch/x86/boot/compressed/sl_main.c
 create mode 100644 arch/x86/boot/compressed/sl_stub.S
 create mode 100644 arch/x86/kernel/slaunch.c
 create mode 100644 arch/x86/kernel/slmodule.c
 create mode 100644 include/linux/slaunch.h
 create mode 100644 include/linux/slr_table.h

Comments

Ard Biesheuvel May 31, 2024, 11 a.m. UTC | #1
Hello Ross,

On Fri, 31 May 2024 at 03:32, Ross Philipson <ross.philipson@oracle.com> wrote:
>
> The Secure Launch (SL) stub provides the entry point for Intel TXT (and
> later AMD SKINIT) to vector to during the late launch. The symbol
> sl_stub_entry is that entry point and its offset into the kernel is
> conveyed to the launching code using the MLE (Measured Launch
> Environment) header in the structure named mle_header. The offset of the
> MLE header is set in the kernel_info. The routine sl_stub contains the
> very early late launch setup code responsible for setting up the basic
> environment to allow the normal kernel startup_32 code to proceed. It is
> also responsible for properly waking and handling the APs on Intel
> platforms. The routine sl_main which runs after entering 64b mode is
> responsible for measuring configuration and module information before
> it is used like the boot params, the kernel command line, the TXT heap,
> an external initramfs, etc.
>
> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
> ---
>  Documentation/arch/x86/boot.rst        |  21 +
>  arch/x86/boot/compressed/Makefile      |   3 +-
>  arch/x86/boot/compressed/head_64.S     |  30 +
>  arch/x86/boot/compressed/kernel_info.S |  34 ++
>  arch/x86/boot/compressed/sl_main.c     | 577 ++++++++++++++++++++
>  arch/x86/boot/compressed/sl_stub.S     | 725 +++++++++++++++++++++++++
>  arch/x86/include/asm/msr-index.h       |   5 +
>  arch/x86/include/uapi/asm/bootparam.h  |   1 +
>  arch/x86/kernel/asm-offsets.c          |  20 +
>  9 files changed, 1415 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/boot/compressed/sl_main.c
>  create mode 100644 arch/x86/boot/compressed/sl_stub.S
>
> diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst
> index 4fd492cb4970..295cdf9bcbdb 100644
> --- a/Documentation/arch/x86/boot.rst
> +++ b/Documentation/arch/x86/boot.rst
> @@ -482,6 +482,14 @@ Protocol:  2.00+
>             - If 1, KASLR enabled.
>             - If 0, KASLR disabled.
>
> +  Bit 2 (kernel internal): SLAUNCH_FLAG
> +
> +       - Used internally by the setup kernel to communicate
> +         Secure Launch status to kernel proper.
> +
> +           - If 1, Secure Launch enabled.
> +           - If 0, Secure Launch disabled.
> +
>    Bit 5 (write): QUIET_FLAG
>
>         - If 0, print early messages.
> @@ -1028,6 +1036,19 @@ Offset/size:     0x000c/4
>
>    This field contains maximal allowed type for setup_data and setup_indirect structs.
>
> +============   =================
> +Field name:    mle_header_offset
> +Offset/size:   0x0010/4
> +============   =================
> +
> +  This field contains the offset to the Secure Launch Measured Launch Environment
> +  (MLE) header. This offset is used to locate information needed during a secure
> +  late launch using Intel TXT. If the offset is zero, the kernel does not have
> +  Secure Launch capabilities. The MLE entry point is called from TXT on the BSP
> +  following a success measured launch. The specific state of the processors is
> +  outlined in the TXT Software Development Guide, the latest can be found here:
> +  https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
> +
>

Could we just repaint this field as the offset relative to the start
of kernel_info rather than relative to the start of the image? That
way, there is no need for patch #1, and given that the consumer of
this field accesses it via kernel_info, I wouldn't expect any issues
in applying this offset to obtain the actual address.
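To make the suggestion concrete, here is a minimal userspace sketch (not code from the series) of how a loader could resolve the MLE header if mle_header_offset were stored relative to the start of kernel_info. The 0x0010 field offset comes from the boot.rst hunk above; `find_mle_header()` and `KI_MLE_HEADER_OFFSET` are hypothetical names, and the sketch assumes the loader already holds a pointer to kernel_info.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the proposed mle_header_offset field inside kernel_info. */
#define KI_MLE_HEADER_OFFSET	0x0010

/* Read a little-endian u32 without assuming alignment. */
static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Resolve the MLE header when mle_header_offset is relative to kernel_info
 * rather than to the start of the image. Returns NULL when the field is
 * zero, i.e. the kernel has no Secure Launch support. No bounds checking
 * against the loaded image is done here; a real loader would need it.
 */
static const uint8_t *find_mle_header(const uint8_t *kernel_info)
{
	uint32_t off = read_le32(kernel_info + KI_MLE_HEADER_OFFSET);

	if (!off)
		return NULL;
	return kernel_info + off;
}
```

Because the consumer only ever adds the field to its kernel_info pointer, the value no longer depends on where kernel_info sits in the image, which is what would remove the need for the fixed-offset patch.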


>  The Image Checksum
>  ==================
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 9189a0e28686..9076a248d4b4 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -118,7 +118,8 @@ vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
>  vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_mixed.o
>  vmlinux-objs-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
>
> -vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o
> +vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o \
> +       $(obj)/sl_main.o $(obj)/sl_stub.o
>
>  $(obj)/vmlinux: $(vmlinux-objs-y) FORCE
>         $(call if_changed,ld)
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 1dcb794c5479..803c9e2e6d85 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -420,6 +420,13 @@ SYM_CODE_START(startup_64)
>         pushq   $0
>         popfq
>
> +#ifdef CONFIG_SECURE_LAUNCH
> +       /* Ensure the relocation region is coverd by a PMR */

covered

> +       movq    %rbx, %rdi
> +       movl    $(_bss - startup_32), %esi
> +       callq   sl_check_region
> +#endif
> +
>  /*
>   * Copy the compressed kernel to the end of our buffer
>   * where decompression in place becomes safe.
> @@ -462,6 +469,29 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
>         shrq    $3, %rcx
>         rep     stosq
>
> +#ifdef CONFIG_SECURE_LAUNCH
> +       /*
> +        * Have to do the final early sl stub work in 64b area.
> +        *
> +        * *********** NOTE ***********
> +        *
> +        * Several boot params get used before we get a chance to measure
> +        * them in this call. This is a known issue and we currently don't
> +        * have a solution. The scratch field doesn't matter. There is no
> +        * obvious way to do anything about the use of kernel_alignment or
> +        * init_size though these seem low risk with all the PMR and overlap
> +        * checks in place.
> +        */
> +       movq    %r15, %rdi
> +       callq   sl_main
> +
> +       /* Ensure the decompression location is covered by a PMR */
> +       movq    %rbp, %rdi
> +       movq    output_len(%rip), %rsi
> +       callq   sl_check_region
> +#endif
> +
> +       pushq   %rsi

This looks like a rebase error.

>         call    load_stage2_idt
>
>         /* Pass boot_params to initialize_identity_maps() */
> diff --git a/arch/x86/boot/compressed/kernel_info.S b/arch/x86/boot/compressed/kernel_info.S
> index c18f07181dd5..e199b87764e9 100644
> --- a/arch/x86/boot/compressed/kernel_info.S
> +++ b/arch/x86/boot/compressed/kernel_info.S
> @@ -28,6 +28,40 @@ SYM_DATA_START(kernel_info)
>         /* Maximal allowed type for setup_data and setup_indirect structs. */
>         .long   SETUP_TYPE_MAX
>
> +       /* Offset to the MLE header structure */
> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
> +       .long   rva(mle_header)

... so this could just be mle_header - kernel_info, and the consumer
can do the math instead.

> +#else
> +       .long   0
> +#endif
> +
>  kernel_info_var_len_data:
>         /* Empty for time being... */
>  SYM_DATA_END_LABEL(kernel_info, SYM_L_LOCAL, kernel_info_end)
> +
> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
> +       /*
> +        * The MLE Header per the TXT Specification, section 2.1
> +        * MLE capabilities, see table 4. Capabilities set:
> +        * bit 0: Support for GETSEC[WAKEUP] for RLP wakeup
> +        * bit 1: Support for RLP wakeup using MONITOR address
> +        * bit 2: The ECX register will contain the pointer to the MLE page table
> +        * bit 5: TPM 1.2 family: Details/authorities PCR usage support
> +        * bit 9: Supported format of TPM 2.0 event log - TCG compliant
> +        */
> +SYM_DATA_START(mle_header)
> +       .long   0x9082ac5a  /* UUID0 */
> +       .long   0x74a7476f  /* UUID1 */
> +       .long   0xa2555c0f  /* UUID2 */
> +       .long   0x42b651cb  /* UUID3 */
> +       .long   0x00000034  /* MLE header size */
> +       .long   0x00020002  /* MLE version 2.2 */
> +       .long   rva(sl_stub_entry) /* Linear entry point of MLE (virt. address) */

and these should perhaps be relative to mle_header?

> +       .long   0x00000000  /* First valid page of MLE */
> +       .long   0x00000000  /* Offset within binary of first byte of MLE */
> +       .long   rva(_edata) /* Offset within binary of last byte + 1 of MLE */

and here
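For reference while weighing which fields could become mle_header-relative, this sketch mirrors the 13 `.long` fields of the MLE header in the hunk above and checks its hard-coded constants: the header size 0x34, the version 0x00020002 (assuming major in the upper 16 bits and minor in the lower 16, as the "2.2" comment implies), and the capability mask 0x227 built from the five bits listed in the comment. Field and macro names are hypothetical, not kernel definitions; how the entry point and MLE range fields are interpreted is defined by the TXT Software Development Guide and is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the MLE header fields emitted in kernel_info.S. */
struct mle_header_sketch {
	uint32_t uuid[4];
	uint32_t header_len;		/* 0x34 */
	uint32_t version;		/* major << 16 | minor */
	uint32_t entry_point;		/* rva(sl_stub_entry) in the patch */
	uint32_t first_valid_page;
	uint32_t mle_start;
	uint32_t mle_end;		/* rva(_edata) in the patch */
	uint32_t capabilities;
	uint32_t cmdline_start;		/* unused */
	uint32_t cmdline_end;		/* unused */
};

/* Capability bits named in the patch comment. */
#define MLE_CAP_GETSEC_WAKEUP	(1u << 0)
#define MLE_CAP_MONITOR_WAKEUP	(1u << 1)
#define MLE_CAP_ECX_PGTBL	(1u << 2)
#define MLE_CAP_TPM12_PCR_USAGE	(1u << 5)
#define MLE_CAP_TPM20_TCG_LOG	(1u << 9)

#define MLE_CAPS	(MLE_CAP_GETSEC_WAKEUP | MLE_CAP_MONITOR_WAKEUP | \
			 MLE_CAP_ECX_PGTBL | MLE_CAP_TPM12_PCR_USAGE | \
			 MLE_CAP_TPM20_TCG_LOG)

#define MLE_VERSION(maj, min)	(((uint32_t)(maj) << 16) | (uint32_t)(min))

_Static_assert(sizeof(struct mle_header_sketch) == 0x34, "13 x 4 bytes");
_Static_assert(MLE_CAPS == 0x227u, "matches .long 0x00000227");
_Static_assert(MLE_VERSION(2, 2) == 0x00020002u, "matches .long 0x00020002");

/* True if every bit in @cap is advertised by the header. */
static int mle_has_cap(const struct mle_header_sketch *h, uint32_t cap)
{
	return (h->capabilities & cap) == cap;
}
```

Spelling the capability word as named bits, rather than a bare 0x227, would also make it easier to review against table 4 of the TXT guide.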

> +       .long   0x00000227  /* Bit vector of MLE-supported capabilities */
> +       .long   0x00000000  /* Starting linear address of command line (unused) */
> +       .long   0x00000000  /* Ending linear address of command line (unused) */
> +SYM_DATA_END(mle_header)
> +#endif
> diff --git a/arch/x86/boot/compressed/sl_main.c b/arch/x86/boot/compressed/sl_main.c
> new file mode 100644
> index 000000000000..61e9baf410fd
> --- /dev/null
> +++ b/arch/x86/boot/compressed/sl_main.c
> @@ -0,0 +1,577 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Secure Launch early measurement and validation routines.
> + *
> + * Copyright (c) 2024, Oracle and/or its affiliates.
> + */
> +
> +#include <linux/init.h>
> +#include <linux/string.h>
> +#include <linux/linkage.h>
> +#include <asm/segment.h>
> +#include <asm/boot.h>
> +#include <asm/msr.h>
> +#include <asm/mtrr.h>
> +#include <asm/processor-flags.h>
> +#include <asm/asm-offsets.h>
> +#include <asm/bootparam.h>
> +#include <asm/bootparam_utils.h>
> +#include <linux/slr_table.h>
> +#include <linux/slaunch.h>
> +#include <crypto/sha1.h>
> +#include <crypto/sha2.h>
> +
> +#define CAPS_VARIABLE_MTRR_COUNT_MASK  0xff
> +
> +#define SL_TPM12_LOG           1
> +#define SL_TPM20_LOG           2
> +
> +#define SL_TPM20_MAX_ALGS      2
> +
> +#define SL_MAX_EVENT_DATA      64
> +#define SL_TPM12_LOG_SIZE      (sizeof(struct tcg_pcr_event) + \
> +                               SL_MAX_EVENT_DATA)
> +#define SL_TPM20_LOG_SIZE      (sizeof(struct tcg_pcr_event2_head) + \
> +                               SHA1_DIGEST_SIZE + SHA256_DIGEST_SIZE + \
> +                               sizeof(struct tcg_event_field) + \
> +                               SL_MAX_EVENT_DATA)
> +
> +static void *evtlog_base;
> +static u32 evtlog_size;
> +static struct txt_heap_event_log_pointer2_1_element *log20_elem;
> +static u32 tpm_log_ver = SL_TPM12_LOG;
> +static struct tcg_efi_specid_event_algs tpm_algs[SL_TPM20_MAX_ALGS] = {0};
> +
> +extern u32 sl_cpu_type;
> +extern u32 sl_mle_start;
> +
> +static u64 sl_txt_read(u32 reg)
> +{
> +       return readq((void *)(u64)(TXT_PRIV_CONFIG_REGS_BASE + reg));
> +}
> +
> +static void sl_txt_write(u32 reg, u64 val)
> +{
> +       writeq(val, (void *)(u64)(TXT_PRIV_CONFIG_REGS_BASE + reg));
> +}
> +
> +static void __noreturn sl_txt_reset(u64 error)
> +{
> +       /* Reading the E2STS register acts as a barrier for TXT registers */
> +       sl_txt_write(TXT_CR_ERRORCODE, error);
> +       sl_txt_read(TXT_CR_E2STS);
> +       sl_txt_write(TXT_CR_CMD_UNLOCK_MEM_CONFIG, 1);
> +       sl_txt_read(TXT_CR_E2STS);
> +       sl_txt_write(TXT_CR_CMD_RESET, 1);
> +
> +       for ( ; ; )
> +               asm volatile ("hlt");
> +
> +       unreachable();
> +}
> +
> +static u64 sl_rdmsr(u32 reg)
> +{
> +       u64 lo, hi;
> +
> +       asm volatile ("rdmsr" : "=a" (lo), "=d" (hi) : "c" (reg));
> +
> +       return (hi << 32) | lo;
> +}
> +
> +static struct slr_table *sl_locate_and_validate_slrt(void)
> +{
> +       struct txt_os_mle_data *os_mle_data;
> +       struct slr_table *slrt;
> +       void *txt_heap;
> +
> +       txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
> +       os_mle_data = txt_os_mle_data_start(txt_heap);
> +
> +       if (!os_mle_data->slrt)
> +               sl_txt_reset(SL_ERROR_INVALID_SLRT);
> +
> +       slrt = (struct slr_table *)os_mle_data->slrt;
> +
> +       if (slrt->magic != SLR_TABLE_MAGIC)
> +               sl_txt_reset(SL_ERROR_INVALID_SLRT);
> +
> +       if (slrt->architecture != SLR_INTEL_TXT)
> +               sl_txt_reset(SL_ERROR_INVALID_SLRT);
> +
> +       return slrt;
> +}
> +
> +static void sl_check_pmr_coverage(void *base, u32 size, bool allow_hi)
> +{
> +       struct txt_os_sinit_data *os_sinit_data;
> +       void *end = base + size;
> +       void *txt_heap;
> +
> +       if (!(sl_cpu_type & SL_CPU_INTEL))
> +               return;
> +
> +       txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
> +       os_sinit_data = txt_os_sinit_data_start(txt_heap);
> +
> +       if ((end >= (void *)0x100000000ULL) && (base < (void *)0x100000000ULL))
> +               sl_txt_reset(SL_ERROR_REGION_STRADDLE_4GB);
> +
> +       /*
> +        * Note that the late stub code validates that the hi PMR covers
> +        * all memory above 4G. At this point the code can only check that
> +        * regions are within the hi PMR but that is sufficient.
> +        */
> +       if ((end > (void *)0x100000000ULL) && (base >= (void *)0x100000000ULL)) {

Better to put the cast on the pointers, given that we are doing
arithmetic, and use SZ_4G instead of open coding this constant 4
times.
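A minimal sketch of the first two checks in sl_check_pmr_coverage() rewritten along those lines: the arithmetic is done on u64 values and the constant is named SZ_4G (defined locally with the same value as <linux/sizes.h> so the example builds in userspace). The PMR base/size comparisons are omitted, and `classify_region()` and its enum are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4G	0x0000000100000000ULL	/* same value as <linux/sizes.h> */

enum region_check {
	REGION_OK,
	REGION_STRADDLES_4G,
	REGION_ABOVE_4G_DENIED,
};

/*
 * Classify [base, base + size) using integer arithmetic instead of
 * comparing void pointers against an open-coded 4G constant.
 */
static enum region_check classify_region(uint64_t base, uint32_t size,
					 bool allow_hi)
{
	uint64_t end = base + size;

	/* Same condition as the quoted code: crossing (or touching) 4G. */
	if (end >= SZ_4G && base < SZ_4G)
		return REGION_STRADDLES_4G;

	/* Entirely above 4G is only acceptable when the hi PMR may be used. */
	if (end > SZ_4G && base >= SZ_4G && !allow_hi)
		return REGION_ABOVE_4G_DENIED;

	return REGION_OK;
}
```

As in the quoted code, the `end >= SZ_4G` test also treats a region that ends exactly at 4 GiB as straddling the boundary.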



> +               if (allow_hi) {
> +                       if (end >= (void *)(os_sinit_data->vtd_pmr_hi_base +
> +                                          os_sinit_data->vtd_pmr_hi_size))
> +                               sl_txt_reset(SL_ERROR_BUFFER_BEYOND_PMR);
> +               } else {
> +                       sl_txt_reset(SL_ERROR_REGION_ABOVE_4GB);
> +               }
> +       }
> +
> +       if (end >= (void *)os_sinit_data->vtd_pmr_lo_size)
> +               sl_txt_reset(SL_ERROR_BUFFER_BEYOND_PMR);
> +}
> +
> +/*
> + * Some MSRs are modified by the pre-launch code including the MTRRs.
> + * The early MLE code has to restore these values. This code validates
> + * the values after they are measured.
> + */
> +static void sl_txt_validate_msrs(struct txt_os_mle_data *os_mle_data)
> +{
> +       struct slr_txt_mtrr_state *saved_bsp_mtrrs;
> +       u64 mtrr_caps, mtrr_def_type, mtrr_var;
> +       struct slr_entry_intel_info *txt_info;
> +       u64 misc_en_msr;
> +       u32 vcnt, i;
> +
> +       txt_info = (struct slr_entry_intel_info *)os_mle_data->txt_info;
> +       saved_bsp_mtrrs = &txt_info->saved_bsp_mtrrs;
> +
> +       mtrr_caps = sl_rdmsr(MSR_MTRRcap);
> +       vcnt = (u32)(mtrr_caps & CAPS_VARIABLE_MTRR_COUNT_MASK);
> +
> +       if (saved_bsp_mtrrs->mtrr_vcnt > vcnt)
> +               sl_txt_reset(SL_ERROR_MTRR_INV_VCNT);
> +       if (saved_bsp_mtrrs->mtrr_vcnt > TXT_OS_MLE_MAX_VARIABLE_MTRRS)
> +               sl_txt_reset(SL_ERROR_MTRR_INV_VCNT);
> +
> +       mtrr_def_type = sl_rdmsr(MSR_MTRRdefType);
> +       if (saved_bsp_mtrrs->default_mem_type != mtrr_def_type)
> +               sl_txt_reset(SL_ERROR_MTRR_INV_DEF_TYPE);
> +
> +       for (i = 0; i < saved_bsp_mtrrs->mtrr_vcnt; i++) {
> +               mtrr_var = sl_rdmsr(MTRRphysBase_MSR(i));
> +               if (saved_bsp_mtrrs->mtrr_pair[i].mtrr_physbase != mtrr_var)
> +                       sl_txt_reset(SL_ERROR_MTRR_INV_BASE);
> +               mtrr_var = sl_rdmsr(MTRRphysMask_MSR(i));
> +               if (saved_bsp_mtrrs->mtrr_pair[i].mtrr_physmask != mtrr_var)
> +                       sl_txt_reset(SL_ERROR_MTRR_INV_MASK);
> +       }
> +
> +       misc_en_msr = sl_rdmsr(MSR_IA32_MISC_ENABLE);
> +       if (txt_info->saved_misc_enable_msr != misc_en_msr)
> +               sl_txt_reset(SL_ERROR_MSR_INV_MISC_EN);
> +}
> +
> +static void sl_find_drtm_event_log(struct slr_table *slrt)
> +{
> +       struct txt_os_sinit_data *os_sinit_data;
> +       struct slr_entry_log_info *log_info;
> +       void *txt_heap;
> +
> +       log_info = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_LOG_INFO);
> +       if (!log_info)
> +               sl_txt_reset(SL_ERROR_SLRT_MISSING_ENTRY);
> +
> +       evtlog_base = (void *)log_info->addr;
> +       evtlog_size = log_info->size;
> +
> +       txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
> +
> +       /*
> +        * For TPM 2.0, the event log 2.1 extended data structure has to also
> +        * be located and fixed up.
> +        */
> +       os_sinit_data = txt_os_sinit_data_start(txt_heap);
> +
> +       /*
> +        * Only support version 6 and later that properly handle the
> +        * list of ExtDataElements in the OS-SINIT structure.
> +        */
> +       if (os_sinit_data->version < 6)
> +               sl_txt_reset(SL_ERROR_OS_SINIT_BAD_VERSION);
> +
> +       /* Find the TPM2.0 logging extended heap element */
> +       log20_elem = tpm20_find_log2_1_element(os_sinit_data);
> +
> +       /* If found, this implies TPM20 log and family */
> +       if (log20_elem)
> +               tpm_log_ver = SL_TPM20_LOG;
> +}
> +
> +static void sl_validate_event_log_buffer(void)
> +{
> +       struct txt_os_sinit_data *os_sinit_data;
> +       void *txt_heap, *txt_end;
> +       void *mle_base, *mle_end;
> +       void *evtlog_end;
> +
> +       if ((u64)evtlog_size > (LLONG_MAX - (u64)evtlog_base))
> +               sl_txt_reset(SL_ERROR_INTEGER_OVERFLOW);
> +       evtlog_end = evtlog_base + evtlog_size;
> +
> +       txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
> +       txt_end = txt_heap + sl_txt_read(TXT_CR_HEAP_SIZE);
> +       os_sinit_data = txt_os_sinit_data_start(txt_heap);
> +
> +       mle_base = (void *)(u64)sl_mle_start;
> +       mle_end = mle_base + os_sinit_data->mle_size;
> +
> +       /*
> +        * This check is to ensure the event log buffer does not overlap with
> +        * the MLE image.
> +        */
> +       if (evtlog_base >= mle_end && evtlog_end > mle_end)
> +               goto pmr_check; /* above */
> +
> +       if (evtlog_end <= mle_base && evtlog_base < mle_base)
> +               goto pmr_check; /* below */
> +
> +       sl_txt_reset(SL_ERROR_MLE_BUFFER_OVERLAP);
> +
> +pmr_check:
> +       /*
> +        * The TXT heap is protected by the DPR. If the TPM event log is
> +        * inside the TXT heap, there is no need for a PMR check.
> +        */
> +       if (evtlog_base > txt_heap && evtlog_end < txt_end)
> +               return;
> +
> +       sl_check_pmr_coverage(evtlog_base, evtlog_size, true);
> +}
> +
> +static void sl_find_event_log_algorithms(void)
> +{
> +       struct tcg_efi_specid_event_head *efi_head =
> +               (struct tcg_efi_specid_event_head *)(evtlog_base +
> +                                       log20_elem->first_record_offset +
> +                                       sizeof(struct tcg_pcr_event));
> +
> +       if (efi_head->num_algs == 0 || efi_head->num_algs > 2)
> +               sl_txt_reset(SL_ERROR_TPM_NUMBER_ALGS);
> +
> +       memcpy(&tpm_algs[0], &efi_head->digest_sizes[0],
> +              sizeof(struct tcg_efi_specid_event_algs) * efi_head->num_algs);
> +}
> +
> +static void sl_tpm12_log_event(u32 pcr, u32 event_type,
> +                              const u8 *data, u32 length,
> +                              const u8 *event_data, u32 event_size)
> +{
> +       u8 sha1_hash[SHA1_DIGEST_SIZE] = {0};
> +       u8 log_buf[SL_TPM12_LOG_SIZE] = {0};
> +       struct tcg_pcr_event *pcr_event;
> +       u32 total_size;
> +
> +       pcr_event = (struct tcg_pcr_event *)log_buf;
> +       pcr_event->pcr_idx = pcr;
> +       pcr_event->event_type = event_type;
> +       if (length > 0) {
> +               sha1(data, length, &sha1_hash[0]);
> +               memcpy(&pcr_event->digest[0], &sha1_hash[0], SHA1_DIGEST_SIZE);
> +       }
> +       pcr_event->event_size = event_size;
> +       if (event_size > 0)
> +               memcpy((u8 *)pcr_event + sizeof(struct tcg_pcr_event),
> +                      event_data, event_size);
> +
> +       total_size = sizeof(struct tcg_pcr_event) + event_size;
> +
> +       if (tpm12_log_event(evtlog_base, evtlog_size, total_size, pcr_event))
> +               sl_txt_reset(SL_ERROR_TPM_LOGGING_FAILED);
> +}
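
A TPM 1.2 record, as built above, is the fixed-size TCG_PCR_EVENT header followed by the event data. The sketch below reproduces that layout byte by byte. It assumes little-endian fields as in the TCG PC Client spec. tpm12_serialize_event() is a hypothetical helper that takes an already computed SHA-1 digest. It also makes the buffer bound check explicit; the patch instead relies on SL_TPM12_LOG_SIZE being large enough.

```c
/* Sketch only: all names below are hypothetical, not kernel API. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SHA1_LEN      20
#define SKETCH_PCR_EVENT_HDR 32  /* pcrIndex(4) + eventType(4) + digest(20) + eventSize(4) */

static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/*
 * Serialize one TCG_PCR_EVENT into buf. Returns the total record size,
 * or 0 if the record would not fit in buf_len bytes.
 */
static size_t tpm12_serialize_event(uint8_t *buf, size_t buf_len,
				    uint32_t pcr, uint32_t type,
				    const uint8_t digest[SKETCH_SHA1_LEN],
				    const uint8_t *ev, uint32_t ev_len)
{
	size_t total = SKETCH_PCR_EVENT_HDR + (size_t)ev_len;

	if (total > buf_len)
		return 0;

	put_le32(buf, pcr);
	put_le32(buf + 4, type);
	memcpy(buf + 8, digest, SKETCH_SHA1_LEN);
	put_le32(buf + 28, ev_len);
	if (ev_len)
		memcpy(buf + SKETCH_PCR_EVENT_HDR, ev, ev_len);
	return total;
}
```
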
> +
> +static void sl_tpm20_log_event(u32 pcr, u32 event_type,
> +                              const u8 *data, u32 length,
> +                              const u8 *event_data, u32 event_size)
> +{
> +       u8 sha256_hash[SHA256_DIGEST_SIZE] = {0};
> +       u8 sha1_hash[SHA1_DIGEST_SIZE] = {0};
> +       u8 log_buf[SL_TPM20_LOG_SIZE] = {0};
> +       struct sha256_state sctx256 = {0};
> +       struct tcg_pcr_event2_head *head;
> +       struct tcg_event_field *event;
> +       u32 total_size;
> +       u16 *alg_ptr;
> +       u8 *dgst_ptr;
> +
> +       head = (struct tcg_pcr_event2_head *)log_buf;
> +       head->pcr_idx = pcr;
> +       head->event_type = event_type;
> +       total_size = sizeof(struct tcg_pcr_event2_head);
> +       alg_ptr = (u16 *)(log_buf + sizeof(struct tcg_pcr_event2_head));
> +
> +       for ( ; head->count < 2; head->count++) {
> +               if (!tpm_algs[head->count].alg_id)
> +                       break;
> +
> +               *alg_ptr = tpm_algs[head->count].alg_id;
> +               dgst_ptr = (u8 *)alg_ptr + sizeof(u16);
> +
> +               if (tpm_algs[head->count].alg_id == TPM_ALG_SHA256 &&
> +                   length) {
> +                       sha256_init(&sctx256);
> +                       sha256_update(&sctx256, data, length);
> +                       sha256_final(&sctx256, &sha256_hash[0]);
> +               } else if (tpm_algs[head->count].alg_id == TPM_ALG_SHA1 &&
> +                          length) {
> +                       sha1(data, length, &sha1_hash[0]);
> +               }
> +
> +               if (tpm_algs[head->count].alg_id == TPM_ALG_SHA256) {
> +                       memcpy(dgst_ptr, &sha256_hash[0], SHA256_DIGEST_SIZE);
> +                       total_size += SHA256_DIGEST_SIZE + sizeof(u16);
> +                       alg_ptr = (u16 *)((u8 *)alg_ptr + SHA256_DIGEST_SIZE + sizeof(u16));
> +               } else if (tpm_algs[head->count].alg_id == TPM_ALG_SHA1) {
> +                       memcpy(dgst_ptr, &sha1_hash[0], SHA1_DIGEST_SIZE);
> +                       total_size += SHA1_DIGEST_SIZE + sizeof(u16);
> +                       alg_ptr = (u16 *)((u8 *)alg_ptr + SHA1_DIGEST_SIZE + sizeof(u16));
> +               } else {
> +                       sl_txt_reset(SL_ERROR_TPM_UNKNOWN_DIGEST);
> +               }
> +       }
> +
> +       event = (struct tcg_event_field *)(log_buf + total_size);
> +       event->event_size = event_size;
> +       if (event_size > 0)
> +               memcpy((u8 *)event + sizeof(struct tcg_event_field), event_data, event_size);
> +       total_size += sizeof(struct tcg_event_field) + event_size;
> +
> +       if (tpm20_log_event(log20_elem, evtlog_base, evtlog_size, total_size, &log_buf[0]))
> +               sl_txt_reset(SL_ERROR_TPM_LOGGING_FAILED);
> +}
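
The total_size bookkeeping in sl_tpm20_log_event() follows the crypto-agile TCG_PCR_EVENT2 layout. The sketch below performs the same arithmetic. It assumes the TCG algorithm IDs 0x0004 (SHA-1) and 0x000B (SHA-256). tpm20_event_size() and digest_len() are hypothetical names. Like the patch, the sketch treats every other algorithm as unsupported.

```c
/* Sketch only: all names below are hypothetical, not kernel API. */
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TPM_ALG_SHA1   0x0004
#define SKETCH_TPM_ALG_SHA256 0x000B

static size_t digest_len(uint16_t alg_id)
{
	switch (alg_id) {
	case SKETCH_TPM_ALG_SHA1:
		return 20;
	case SKETCH_TPM_ALG_SHA256:
		return 32;
	default:
		return 0;   /* not handled, like the patch */
	}
}

/*
 * Size of a TCG_PCR_EVENT2 record: pcrIndex, eventType and the digest
 * count (3 x u32), one {u16 algId, digest} per bank, then the u32
 * eventSize and the event data. Returns 0 for an unsupported algorithm.
 */
static size_t tpm20_event_size(const uint16_t *alg_ids, size_t n_algs,
			       uint32_t event_size)
{
	size_t total = 3 * sizeof(uint32_t);
	size_t i;

	for (i = 0; i < n_algs; i++) {
		size_t len = digest_len(alg_ids[i]);

		if (!len)
			return 0;
		total += sizeof(uint16_t) + len;
	}
	return total + sizeof(uint32_t) + event_size;
}
```
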
> +
> +static void sl_tpm_extend_evtlog(u32 pcr, u32 type,
> +                                const u8 *data, u32 length, const char *desc)
> +{
> +       if (tpm_log_ver == SL_TPM20_LOG)
> +               sl_tpm20_log_event(pcr, type, data, length,
> +                                  (const u8 *)desc, strlen(desc));
> +       else
> +               sl_tpm12_log_event(pcr, type, data, length,
> +                                  (const u8 *)desc, strlen(desc));
> +}
> +
> +static struct setup_data *sl_handle_setup_data(struct setup_data *curr,
> +                                              struct slr_policy_entry *entry)
> +{
> +       struct setup_indirect *ind;
> +       struct setup_data *next;
> +
> +       if (!curr)
> +               return NULL;
> +
> +       next = (struct setup_data *)(unsigned long)curr->next;
> +
> +       /* SETUP_INDIRECT instances have to be handled differently */
> +       if (curr->type == SETUP_INDIRECT) {
> +               ind = (struct setup_indirect *)((u8 *)curr + offsetof(struct setup_data, data));
> +
> +               sl_check_pmr_coverage((void *)ind->addr, ind->len, true);
> +
> +               sl_tpm_extend_evtlog(entry->pcr, TXT_EVTYPE_SLAUNCH,
> +                                    (void *)ind->addr, ind->len,
> +                                    entry->evt_info);
> +
> +               return next;
> +       }
> +
> +       sl_check_pmr_coverage(((u8 *)curr) + sizeof(struct setup_data),
> +                             curr->len, true);
> +
> +       sl_tpm_extend_evtlog(entry->pcr, TXT_EVTYPE_SLAUNCH,
> +                            ((u8 *)curr) + sizeof(struct setup_data),
> +                            curr->len,
> +                            entry->evt_info);
> +
> +       return next;
> +}
> +
> +static void sl_extend_setup_data(struct slr_policy_entry *entry)
> +{
> +       struct setup_data *data;
> +
> +       /*
> +        * Measuring the boot params already covered the fixed e820 memory map.
> +        * Measure any setup_data entries including e820 extended entries.
> +        */
> +       data = (struct setup_data *)(unsigned long)entry->entity;
> +       while (data)
> +               data = sl_handle_setup_data(data, entry);
> +}
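
The sketch below is a minimal host-side model of the list walk in sl_extend_setup_data() and sl_handle_setup_data(), with the PMR check and the TPM extend replaced by a callback. The sk_* structures are hypothetical local copies of struct setup_data and struct setup_indirect, and are assumed to match the x86 boot protocol layout. The callback sum_len() is an illustrative stand-in for the real measurement.

```c
/* Sketch only: all names below are hypothetical, not kernel API. */
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SETUP_INDIRECT (1u << 31)  /* SETUP_INDIRECT in asm/bootparam.h */

struct sk_setup_data {
	uint64_t next;
	uint32_t type;
	uint32_t len;
	uint8_t  data[];
};

struct sk_setup_indirect {
	uint32_t type;
	uint32_t reserved;
	uint64_t len;
	uint64_t addr;
};

typedef void (*measure_fn)(const void *buf, uint64_t len, void *ctx);

/* Example callback: accumulate the number of bytes that would be hashed. */
static void sum_len(const void *buf, uint64_t len, void *ctx)
{
	(void)buf;
	*(uint64_t *)ctx += len;
}

/* Walk the list, measuring inline payloads or the indirect target. */
static unsigned int walk_setup_data(const struct sk_setup_data *sd,
				    measure_fn measure, void *ctx)
{
	unsigned int n = 0;

	while (sd) {
		if (sd->type == SKETCH_SETUP_INDIRECT) {
			const struct sk_setup_indirect *ind =
				(const struct sk_setup_indirect *)sd->data;

			measure((const void *)(uintptr_t)ind->addr, ind->len, ctx);
		} else {
			measure(sd->data, sd->len, ctx);
		}
		n++;
		sd = (const struct sk_setup_data *)(uintptr_t)sd->next;
	}
	return n;
}
```
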
> +
> +static void sl_extend_slrt(struct slr_policy_entry *entry)
> +{
> +       struct slr_table *slrt = (struct slr_table *)entry->entity;
> +       struct slr_entry_intel_info *intel_info;
> +
> +       /*
> +        * In revision one of the SLRT, the only table that needs to be
> +        * measured is the Intel info table. Everything else is meta-data,
> +        * addresses and sizes. Note the size of what to measure is not set.
> +        * The flag SLR_POLICY_IMPLICIT_SIZE leaves it to the measuring code
> +        * to sort out.
> +        */
> +       if (slrt->revision == 1) {
> +               intel_info = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_INTEL_INFO);
> +               if (!intel_info)
> +                       sl_txt_reset(SL_ERROR_SLRT_MISSING_ENTRY);
> +
> +               sl_tpm_extend_evtlog(entry->pcr, TXT_EVTYPE_SLAUNCH,
> +                                    (void *)intel_info, sizeof(struct slr_entry_intel_info),
> +                                    entry->evt_info);
> +       }
> +}
> +
> +static void sl_extend_txt_os2mle(struct slr_policy_entry *entry)
> +{
> +       struct txt_os_mle_data *os_mle_data;
> +       void *txt_heap;
> +
> +       txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
> +       os_mle_data = txt_os_mle_data_start(txt_heap);
> +
> +       /*
> +        * Version 1 of the OS-MLE heap structure has no fields to measure. It just
> +        * has addresses and sizes and a scratch buffer.
> +        */
> +       if (os_mle_data->version == 1)
> +               return;
> +}
> +
> +static void sl_process_extend_policy(struct slr_table *slrt)
> +{
> +       struct slr_entry_policy *policy;
> +       u16 i;
> +
> +       policy = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_ENTRY_POLICY);
> +       if (!policy)
> +               sl_txt_reset(SL_ERROR_SLRT_MISSING_ENTRY);
> +
> +       for (i = 0; i < policy->nr_entries; i++) {
> +               switch (policy->policy_entries[i].entity_type) {
> +               case SLR_ET_SETUP_DATA:
> +                       sl_extend_setup_data(&policy->policy_entries[i]);
> +                       break;
> +               case SLR_ET_SLRT:
> +                       sl_extend_slrt(&policy->policy_entries[i]);
> +                       break;
> +               case SLR_ET_TXT_OS2MLE:
> +                       sl_extend_txt_os2mle(&policy->policy_entries[i]);
> +                       break;
> +               case SLR_ET_UNUSED:
> +                       continue;
> +               default:
> +                       sl_tpm_extend_evtlog(policy->policy_entries[i].pcr, TXT_EVTYPE_SLAUNCH,
> +                                            (void *)policy->policy_entries[i].entity,
> +                                            policy->policy_entries[i].size,
> +                                            policy->policy_entries[i].evt_info);
> +               }
> +       }
> +}
> +
> +static void sl_process_extend_uefi_config(struct slr_table *slrt)
> +{
> +       struct slr_entry_uefi_config *uefi_config;
> +       u16 i;
> +
> +       uefi_config = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_UEFI_CONFIG);
> +
> +       /* Optional: only present depending on how the SL kernel was booted */
> +       if (!uefi_config)
> +               return;
> +
> +       for (i = 0; i < uefi_config->nr_entries; i++) {
> +               sl_tpm_extend_evtlog(uefi_config->uefi_cfg_entries[i].pcr, TXT_EVTYPE_SLAUNCH,
> +                                    (void *)uefi_config->uefi_cfg_entries[i].cfg,
> +                                    uefi_config->uefi_cfg_entries[i].size,
> +                                    uefi_config->uefi_cfg_entries[i].evt_info);
> +       }
> +}
> +
> +asmlinkage __visible void sl_check_region(void *base, u32 size)
> +{
> +       sl_check_pmr_coverage(base, size, false);
> +}
> +
> +asmlinkage __visible void sl_main(void *bootparams)
> +{
> +       struct boot_params *bp = (struct boot_params *)bootparams;
> +       struct txt_os_mle_data *os_mle_data;
> +       struct slr_table *slrt;
> +       void *txt_heap;
> +
> +       /*
> +        * Ensure loadflags do not indicate a secure launch was done
> +        * unless it really was.
> +        */
> +       bp->hdr.loadflags &= ~SLAUNCH_FLAG;
> +
> +       /*
> +        * Currently only Intel TXT is supported for Secure Launch. Testing
> +        * this value also indicates that the kernel was booted successfully
> +        * through the Secure Launch entry point and is in SMX mode.
> +        */
> +       if (!(sl_cpu_type & SL_CPU_INTEL))
> +               return;
> +
> +       slrt = sl_locate_and_validate_slrt();
> +
> +       /* Locate the TPM event log. */
> +       sl_find_drtm_event_log(slrt);
> +
> +       /* Validate the location of the event log buffer before using it */
> +       sl_validate_event_log_buffer();
> +
> +       /*
> +        * Find the TPM hash algorithms used by the ACM and recorded in the
> +        * event log.
> +        */
> +       if (tpm_log_ver == SL_TPM20_LOG)
> +               sl_find_event_log_algorithms();
> +
> +       /*
> +        * Sanitize the boot params before measuring them. Set the SLAUNCH_FLAG
> +        * early since, if anything fails, the system will reset anyway.
> +        */
> +       sanitize_boot_params(bp);
> +       bp->hdr.loadflags |= SLAUNCH_FLAG;
> +
> +       sl_check_pmr_coverage(bootparams, PAGE_SIZE, false);
> +
> +       /* Place SL-specific event log tags before and after the measurements */
> +       sl_tpm_extend_evtlog(17, TXT_EVTYPE_SLAUNCH_START, NULL, 0, "");
> +
> +       /* Process all policy entries and extend the measurements to the evtlog */
> +       sl_process_extend_policy(slrt);
> +
> +       /* Process all EFI config entries and extend the measurements to the evtlog */
> +       sl_process_extend_uefi_config(slrt);
> +
> +       sl_tpm_extend_evtlog(17, TXT_EVTYPE_SLAUNCH_END, NULL, 0, "");
> +
> +       /* No PMR check is needed, the TXT heap is covered by the DPR */
> +       txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
> +       os_mle_data = txt_os_mle_data_start(txt_heap);
> +
> +       /*
> +        * Now that the OS-MLE data is measured, ensure the MTRR and
> +        * misc enable MSRs are what we expect.
> +        */
> +       sl_txt_validate_msrs(os_mle_data);
> +}
> diff --git a/arch/x86/boot/compressed/sl_stub.S b/arch/x86/boot/compressed/sl_stub.S
> new file mode 100644
> index 000000000000..24b8f23d5dcc
> --- /dev/null
> +++ b/arch/x86/boot/compressed/sl_stub.S
> @@ -0,0 +1,725 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * Secure Launch protected mode entry point.
> + *
> + * Copyright (c) 2024, Oracle and/or its affiliates.
> + */
> +       .code32
> +       .text
> +#include <linux/linkage.h>
> +#include <asm/segment.h>
> +#include <asm/msr.h>
> +#include <asm/apicdef.h>
> +#include <asm/trapnr.h>
> +#include <asm/processor-flags.h>
> +#include <asm/asm-offsets.h>
> +#include <asm/bootparam.h>
> +#include <asm/page_types.h>
> +#include <asm/irq_vectors.h>
> +#include <linux/slr_table.h>
> +#include <linux/slaunch.h>
> +
> +/* CPUID: leaf 1, ECX, SMX feature bit */
> +#define X86_FEATURE_BIT_SMX    (1 << 6)
> +
> +#define IDT_VECTOR_LO_BITS     0
> +#define IDT_VECTOR_HI_BITS     6
> +
> +/*
> + * See the comment in head_64.S for detailed information on what this macro
> + * and others like it are used for. The comment appears right at the top of
> + * the file.
> + */
> +#define rva(X) ((X) - sl_stub_entry)
> +
> +/*
> + * The GETSEC opcode is open-coded because older versions of
> + * GCC do not support the getsec mnemonic.
> + */
> +.macro GETSEC leaf
> +       pushl   %ebx
> +       xorl    %ebx, %ebx      /* Must be zero for SMCTRL */
> +       movl    \leaf, %eax     /* Leaf function */
> +       .byte   0x0f, 0x37      /* GETSEC opcode */
> +       popl    %ebx
> +.endm
> +
> +.macro TXT_RESET error
> +       /*
> +        * Set a sticky error value and reset. Note the movs to %eax act as
> +        * TXT register barriers.
> +        */
> +       movl    \error, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_ERRORCODE)
> +       movl    (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_E2STS), %eax
> +       movl    $1, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_CMD_NO_SECRETS)
> +       movl    (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_E2STS), %eax
> +       movl    $1, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_CMD_UNLOCK_MEM_CONFIG)
> +       movl    (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_E2STS), %eax
> +       movl    $1, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_CMD_RESET)
> +1:
> +       hlt
> +       jmp     1b
> +.endm
> +
> +       .code32
> +SYM_FUNC_START(sl_stub_entry)
> +       cli
> +       cld
> +
> +       /*
> +        * On entry, %ebx has the entry abs offset to sl_stub_entry. This
> +        * will be correctly scaled using the rva macro and avoid causing
> +        * relocations. Only %cs and %ds segments are known good.

Could you please clarify this? 'scaling' is unidiomatic in this
context, and actually means something different in my book. AIUI, %ebx
is guaranteed to carry the actual address of sl_stub_entry(), and
rva() is used to generate relative references using %ebx as a base, so
as to avoid /absolute/ relocations, which would require fixups at
runtime.


> +        */
> +
> +       /* Load GDT, set segment regs and lret to __SL32_CS */
> +       leal    rva(sl_gdt_desc)(%ebx), %eax
> +       addl    %eax, 2(%eax)
> +       lgdt    (%eax)
> +
> +       movl    $(__SL32_DS), %eax
> +       movw    %ax, %ds
> +       movw    %ax, %es
> +       movw    %ax, %fs
> +       movw    %ax, %gs
> +       movw    %ax, %ss
> +
> +       /*
> +        * Now that %ss is known good, take the first stack for the BSP. The
> +        * AP stacks are only used on Intel.
> +        */
> +       leal    rva(sl_stacks_end)(%ebx), %esp
> +
> +       leal    rva(.Lsl_cs)(%ebx), %eax
> +       pushl   $(__SL32_CS)
> +       pushl   %eax
> +       lret
> +
> +.Lsl_cs:
> +       /* Save our base pointer reg and page table for MLE */
> +       pushl   %ebx
> +       pushl   %ecx
> +
> +       /* See if SMX feature is supported. */
> +       movl    $1, %eax
> +       cpuid
> +       testl   $(X86_FEATURE_BIT_SMX), %ecx
> +       jz      .Ldo_unknown_cpu
> +
> +       popl    %ecx
> +       popl    %ebx
> +
> +       /* Known to be Intel */
> +       movl    $(SL_CPU_INTEL), rva(sl_cpu_type)(%ebx)
> +
> +       /* Locate the base of the MLE using the page tables in %ecx */
> +       call    sl_find_mle_base
> +
> +       /* Increment CPU count for BSP */
> +       incl    rva(sl_txt_cpu_count)(%ebx)
> +
> +       /*
> +        * Enable SMIs with GETSEC[SMCTRL]; they were disabled by SENTER.
> +        * NMIs were also disabled by SENTER. Since there is no IDT for the BSP,
> +        * allow the mainline kernel to re-enable them in the normal course of
> +        * booting.
> +        */
> +       GETSEC  $(SMX_X86_GETSEC_SMCTRL)
> +
> +       /* Clear the TXT error registers for a clean start of day */
> +       movl    $0, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_ERRORCODE)
> +       movl    $0xffffffff, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_ESTS)
> +
> +       /* On Intel, the zero page address is passed in the TXT heap */
> +       /* Read physical base of heap into EAX */
> +       movl    (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_HEAP_BASE), %eax
> +       /* Read the size of the BIOS data into ECX (first 8 bytes) */
> +       movl    (%eax), %ecx
> +       /* Skip over BIOS data and size of OS to MLE data section */
> +       leal    8(%eax, %ecx), %eax
> +
> +       /* Need to verify the values in the OS-MLE struct passed in */
> +       call    sl_txt_verify_os_mle_struct
> +
> +       /*
> +        * Get the boot params address from the heap. Note %esi and %ebx MUST
> +        * be preserved across calls and operations.
> +        */
> +       movl    SL_boot_params_addr(%eax), %esi
> +
> +       /* Save %ebx so the APs can find their way home */
> +       movl    %ebx, (SL_mle_scratch + SL_SCRATCH_AP_EBX)(%eax)
> +
> +       /* Fetch the AP wake code block address from the heap */
> +       movl    SL_ap_wake_block(%eax), %edi
> +       movl    %edi, rva(sl_txt_ap_wake_block)(%ebx)
> +
> +       /* Store the offset in the AP wake block to the jmp address */
> +       movl    $(sl_ap_jmp_offset - sl_txt_ap_wake_begin), \
> +               (SL_mle_scratch + SL_SCRATCH_AP_JMP_OFFSET)(%eax)
> +
> +       /* Store the offset in the AP wake block to the AP stacks block */
> +       movl    $(sl_stacks - sl_txt_ap_wake_begin), \
> +               (SL_mle_scratch + SL_SCRATCH_AP_STACKS_OFFSET)(%eax)
> +
> +       /* %eax is still the base of the OS-MLE block, save it */
> +       pushl   %eax
> +
> +       /* Relocate the AP wake code to the safe block */
> +       call    sl_txt_reloc_ap_wake
> +
> +       /*
> +        * Wake up all APs that are blocked in the ACM and wait for them to
> +        * halt. This should be done before restoring the MTRRs so the ACM is
> +        * still properly in WB memory.
> +        */
> +       call    sl_txt_wake_aps
> +
> +       /* Restore OS-MLE in %eax */
> +       popl    %eax
> +
> +       /*
> +        * %edi is used by this routine to find the MTRRs which are in the SLRT
> +        * in the Intel info.
> +        */
> +       movl    SL_txt_info(%eax), %edi
> +       call    sl_txt_load_regs
> +
> +       jmp     .Lcpu_setup_done
> +
> +.Ldo_unknown_cpu:
> +       /* Non-Intel CPUs are not yet supported */
> +       ud2
> +
> +.Lcpu_setup_done:
> +       /*
> +        * Don't enable MCE at this point. The kernel will enable
> +        * it on the BSP later when it is ready.
> +        */
> +
> +       /* Done, jump to normal 32b pm entry */
> +       jmp     startup_32
> +SYM_FUNC_END(sl_stub_entry)
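
The sequence "leal rva(sl_gdt_desc)(%ebx), %eax; addl %eax, 2(%eax)" at the top of sl_stub_entry is the relative-reference technique the review comment describes. The base field of the 6-byte pseudo-descriptor is assembled as an offset relative to the descriptor itself. At runtime, adding the descriptor's real address turns that offset into an absolute base, so no absolute relocation is needed. The sketch below is a hypothetical model of that step. The assumption that sl_gdt_desc stores sl_gdt's offset relative to itself is inferred from the addl and is not visible in this excerpt.

```c
/* Sketch only: fixup_desc_base() is hypothetical, not kernel API. */
#include <stdint.h>

/*
 * desc is a pseudo-descriptor (u16 limit, u32 base, little-endian) whose
 * base field holds an offset relative to the descriptor; desc_addr is the
 * descriptor's runtime address. Adds desc_addr to the base in place.
 */
static void fixup_desc_base(uint8_t desc[6], uint32_t desc_addr)
{
	uint32_t base = (uint32_t)desc[2] | ((uint32_t)desc[3] << 8) |
			((uint32_t)desc[4] << 16) | ((uint32_t)desc[5] << 24);

	base += desc_addr;
	desc[2] = (uint8_t)base;
	desc[3] = (uint8_t)(base >> 8);
	desc[4] = (uint8_t)(base >> 16);
	desc[5] = (uint8_t)(base >> 24);
}
```
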
> +
> +SYM_FUNC_START(sl_find_mle_base)
> +       /* %ecx has PDPT, get first PD */
> +       movl    (%ecx), %eax
> +       andl    $(PAGE_MASK), %eax
> +       /* Get first PT from first PDE */
> +       movl    (%eax), %eax
> +       andl    $(PAGE_MASK), %eax
> +       /* Get MLE base from first PTE */
> +       movl    (%eax), %eax
> +       andl    $(PAGE_MASK), %eax
> +
> +       movl    %eax, rva(sl_mle_start)(%ebx)
> +       ret
> +SYM_FUNC_END(sl_find_mle_base)
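
sl_find_mle_base only follows entry 0 at each level. This assumes that the MLE page tables built by the loader map the start of the MLE through the first PTE. The sketch below is a host-side model of the walk over a simulated physical memory. find_mle_base() and rd_entry() are hypothetical names. Like the asm, the sketch reads only the low 32 bits of each 64-bit PAE entry.

```c
/* Sketch only: all names below are hypothetical, not kernel API. */
#include <stdint.h>

#define SKETCH_PAGE_MASK (~(uint32_t)0xfff)

/* Low 32 bits of the little-endian entry at phys in a simulated memory. */
static uint32_t rd_entry(const uint8_t *mem, uint32_t phys)
{
	const uint8_t *p = mem + phys;

	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* PDPTE 0 -> PD, PDE 0 -> PT, PTE 0 -> first page of the MLE. */
static uint32_t find_mle_base(const uint8_t *mem, uint32_t pdpt)
{
	uint32_t pd = rd_entry(mem, pdpt) & SKETCH_PAGE_MASK;
	uint32_t pt = rd_entry(mem, pd) & SKETCH_PAGE_MASK;

	return rd_entry(mem, pt) & SKETCH_PAGE_MASK;
}
```
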
> +
> +SYM_FUNC_START(sl_check_buffer_mle_overlap)
> +       /* %ecx: buffer begin %edx: buffer end */
> +       /* %ebx: MLE begin %edi: MLE end */
> +       /* %eax: region may be inside MLE */
> +
> +       cmpl    %edi, %ecx
> +       jb      .Lnext_check
> +       cmpl    %edi, %edx
> +       jbe     .Lnext_check
> +       jmp     .Lvalid /* Buffer above MLE */
> +
> +.Lnext_check:
> +       cmpl    %ebx, %edx
> +       ja      .Linside_check
> +       cmpl    %ebx, %ecx
> +       jae     .Linside_check
> +       jmp     .Lvalid /* Buffer below MLE */
> +
> +.Linside_check:
> +       cmpl    $0, %eax
> +       jz      .Linvalid
> +       cmpl    %ebx, %ecx
> +       jb      .Linvalid
> +       cmpl    %edi, %edx
> +       ja      .Linvalid
> +       jmp     .Lvalid /* Buffer in MLE */
> +
> +.Linvalid:
> +       TXT_RESET $(SL_ERROR_MLE_BUFFER_OVERLAP)
> +
> +.Lvalid:
> +       ret
> +SYM_FUNC_END(sl_check_buffer_mle_overlap)
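
The unsigned jb/jbe/ja/jae chains in sl_check_buffer_mle_overlap are easier to review when written as C. The sketch below expresses the same decision. buffer_mle_ok() is a hypothetical name. "end" is treated as exclusive, matching how the callers compute it as base + size.

```c
/* Sketch only: buffer_mle_ok() is hypothetical, not kernel API. */
#include <stdbool.h>
#include <stdint.h>

/*
 * [begin, end) is the buffer, [mle_begin, mle_end) the MLE image and
 * allow_inside mirrors %eax. Returns false where the asm would TXT_RESET.
 */
static bool buffer_mle_ok(uint32_t begin, uint32_t end,
			  uint32_t mle_begin, uint32_t mle_end,
			  bool allow_inside)
{
	if (begin >= mle_end && end > mle_end)
		return true;                    /* buffer above MLE */
	if (end <= mle_begin && begin < mle_begin)
		return true;                    /* buffer below MLE */
	if (!allow_inside)
		return false;
	return begin >= mle_begin && end <= mle_end;  /* buffer in MLE */
}
```
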
> +
> +SYM_FUNC_START(sl_txt_verify_os_mle_struct)
> +       pushl   %ebx
> +       /*
> +        * %eax points to the base of the OS-MLE struct. Need to also
> +        * read some values from the OS-SINIT struct too.
> +        */
> +       movl    -8(%eax), %ecx
> +       /* Skip over OS to MLE data section and size of OS-SINIT structure */
> +       leal    (%eax, %ecx), %edx
> +
> +       /* Load MLE image base absolute offset */
> +       movl    rva(sl_mle_start)(%ebx), %ebx
> +
> +       /* Verify the value of the low PMR base. It should always be 0. */
> +       movl    SL_vtd_pmr_lo_base(%edx), %esi
> +       cmpl    $0, %esi
> +       jz      .Lvalid_pmr_base
> +       TXT_RESET $(SL_ERROR_LO_PMR_BASE)
> +
> +.Lvalid_pmr_base:
> +       /* Grab some values from OS-SINIT structure */
> +       movl    SL_mle_size(%edx), %edi
> +       addl    %ebx, %edi
> +       jc      .Loverflow_detected
> +       movl    SL_vtd_pmr_lo_size(%edx), %esi
> +
> +       /* Check the AP wake block */
> +       movl    SL_ap_wake_block(%eax), %ecx
> +       movl    SL_ap_wake_block_size(%eax), %edx
> +       addl    %ecx, %edx
> +       jc      .Loverflow_detected
> +       pushl   %eax
> +       xorl    %eax, %eax
> +       call    sl_check_buffer_mle_overlap
> +       popl    %eax
> +       cmpl    %esi, %edx
> +       ja      .Lbuffer_beyond_pmr
> +
> +       /*
> +        * Check the boot params. Note during a UEFI boot, the boot
> +        * params will be inside the MLE image, so the overlap check
> +        * is called with %eax = 1 to allow that case.
> +        */
> +       movl    SL_boot_params_addr(%eax), %ecx
> +       movl    $(PAGE_SIZE), %edx
> +       addl    %ecx, %edx
> +       jc      .Loverflow_detected
> +       pushl   %eax
> +       movl    $1, %eax
> +       call    sl_check_buffer_mle_overlap
> +       popl    %eax
> +       cmpl    %esi, %edx
> +       ja      .Lbuffer_beyond_pmr
> +
> +       /* Check that the AP wake block is big enough */
> +       cmpl    $(sl_txt_ap_wake_end - sl_txt_ap_wake_begin), \
> +               SL_ap_wake_block_size(%eax)
> +       jae     .Lwake_block_ok
> +       TXT_RESET $(SL_ERROR_WAKE_BLOCK_TOO_SMALL)
> +
> +.Lwake_block_ok:
> +       popl    %ebx
> +       ret
> +
> +.Loverflow_detected:
> +       TXT_RESET $(SL_ERROR_INTEGER_OVERFLOW)
> +
> +.Lbuffer_beyond_pmr:
> +       TXT_RESET $(SL_ERROR_BUFFER_BEYOND_PMR)
> +SYM_FUNC_END(sl_txt_verify_os_mle_struct)
> +
> +SYM_FUNC_START(sl_txt_ap_entry)
> +       cli
> +       cld
> +       /*
> +        * The %cs and %ds segments are known good after waking the AP.
> +        * First order of business is to find where we are and
> +        * save it in %ebx.
> +        */
> +
> +       /* Read physical base of heap into EAX */
> +       movl    (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_HEAP_BASE), %eax
> +       /* Read the size of the BIOS data into ECX (first 8 bytes) */
> +       movl    (%eax), %ecx
> +       /* Skip over BIOS data and size of OS to MLE data section */
> +       leal    8(%eax, %ecx), %eax
> +
> +       /* Restore the %ebx value the BSP stashed in the OS-MLE scratch area */
> +       movl    (SL_mle_scratch + SL_SCRATCH_AP_EBX)(%eax), %ebx
> +
> +       /* Save TXT info ptr in %edi for call to sl_txt_load_regs */
> +       movl    SL_txt_info(%eax), %edi
> +
> +       /* Lock and get our stack index */
> +       movl    $1, %ecx
> +.Lspin:
> +       xorl    %eax, %eax
> +       lock cmpxchgl   %ecx, rva(sl_txt_spin_lock)(%ebx)
> +       pause
> +       jnz     .Lspin
> +
> +       /* Increment the stack index and use the next value inside lock */
> +       incl    rva(sl_txt_stack_index)(%ebx)
> +       movl    rva(sl_txt_stack_index)(%ebx), %eax
> +
> +       /* Unlock */
> +       movl    $0, rva(sl_txt_spin_lock)(%ebx)
> +
> +       /* Location of the relocated AP wake block */
> +       movl    rva(sl_txt_ap_wake_block)(%ebx), %ecx
> +
> +       /* Load reloc GDT, set segment regs and lret to __SL32_CS */
> +       lgdt    (sl_ap_gdt_desc - sl_txt_ap_wake_begin)(%ecx)
> +
> +       movl    $(__SL32_DS), %edx
> +       movw    %dx, %ds
> +       movw    %dx, %es
> +       movw    %dx, %fs
> +       movw    %dx, %gs
> +       movw    %dx, %ss
> +
> +       /* Load our reloc AP stack */
> +       movl    $(TXT_BOOT_STACK_SIZE), %edx
> +       mull    %edx
> +       leal    (sl_stacks_end - sl_txt_ap_wake_begin)(%ecx), %esp
> +       subl    %eax, %esp
> +
> +       /* Switch to AP code segment */
> +       leal    rva(.Lsl_ap_cs)(%ebx), %eax
> +       pushl   $(__SL32_CS)
> +       pushl   %eax
> +       lret
> +
> +.Lsl_ap_cs:
> +       /* Load the relocated AP IDT */
> +       lidt    (sl_ap_idt_desc - sl_txt_ap_wake_begin)(%ecx)
> +
> +       /* Fixup MTRRs and misc enable MSR on APs too */
> +       call    sl_txt_load_regs
> +
> +       /* Enable SMI with GETSEC[SMCTRL] */
> +       GETSEC $(SMX_X86_GETSEC_SMCTRL)
> +
> +       /* IRET-to-self can be used to enable NMIs which SENTER disabled */
> +       leal    rva(.Lnmi_enabled_ap)(%ebx), %eax
> +       pushfl
> +       pushl   $(__SL32_CS)
> +       pushl   %eax
> +       iret
> +
> +.Lnmi_enabled_ap:
> +       /* Put APs in X2APIC mode like the BSP */
> +       movl    $(MSR_IA32_APICBASE), %ecx
> +       rdmsr
> +       orl     $(XAPIC_ENABLE | X2APIC_ENABLE), %eax
> +       wrmsr
> +
> +       /*
> +        * Basically done, increment the CPU count and jump off to the AP
> +        * wake block to wait.
> +        */
> +       lock incl       rva(sl_txt_cpu_count)(%ebx)
> +
> +       movl    rva(sl_txt_ap_wake_block)(%ebx), %eax
> +       jmp     *%eax
> +SYM_FUNC_END(sl_txt_ap_entry)
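
The AP stack selection above has two parts: a compare-and-swap spin lock around an index increment, and a stack pointer computed downward from sl_stacks_end. The sketch below models both with C11 atomics. The names are hypothetical, and the pause hint is omitted. The asm unlocks with a plain movl, which is sufficient on x86 because ordinary stores already have release ordering. Because the index is incremented before it is read, the first AP gets slot 1; slot 0 corresponds to the top used by the BSP.

```c
/* Sketch only: all names below are hypothetical, not kernel API. */
#include <stdatomic.h>
#include <stdint.h>

static atomic_int sk_spin_lock;       /* models sl_txt_spin_lock */
static unsigned int sk_stack_index;   /* models sl_txt_stack_index */

/* Take the lock with compare-and-swap, bump the index, drop the lock. */
static unsigned int take_stack_index(void)
{
	unsigned int idx;
	int expected;

	do {
		expected = 0;
	} while (!atomic_compare_exchange_weak(&sk_spin_lock, &expected, 1));

	idx = ++sk_stack_index;
	atomic_store(&sk_spin_lock, 0);
	return idx;
}

/* Stacks grow down from stacks_end; AP n gets the n-th slot below it. */
static uintptr_t ap_stack_top(uintptr_t stacks_end, unsigned int idx,
			      uintptr_t stack_size)
{
	return stacks_end - (uintptr_t)idx * stack_size;
}
```
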
> +
> +SYM_FUNC_START(sl_txt_reloc_ap_wake)
> +       /* Save boot params register */
> +       pushl   %esi
> +
> +       movl    rva(sl_txt_ap_wake_block)(%ebx), %edi
> +
> +       /* Fixup AP IDT and GDT descriptor before relocating */
> +       leal    rva(sl_ap_idt_desc)(%ebx), %eax
> +       addl    %edi, 2(%eax)
> +       leal    rva(sl_ap_gdt_desc)(%ebx), %eax
> +       addl    %edi, 2(%eax)
> +
> +       /*
> +        * Copy the AP wake code and AP GDT/IDT to the protected wake block
> +        * provided by the loader. Destination already in %edi.
> +        */
> +       movl    $(sl_txt_ap_wake_end - sl_txt_ap_wake_begin), %ecx
> +       leal    rva(sl_txt_ap_wake_begin)(%ebx), %esi
> +       rep movsb
> +
> +       /* Setup the IDT for the APs to use in the relocation block */
> +       movl    rva(sl_txt_ap_wake_block)(%ebx), %ecx
> +       addl    $(sl_ap_idt - sl_txt_ap_wake_begin), %ecx
> +       xorl    %edx, %edx
> +
> +       /* Form the default reset vector relocation address */
> +       movl    rva(sl_txt_ap_wake_block)(%ebx), %esi
> +       addl    $(sl_txt_int_reset - sl_txt_ap_wake_begin), %esi
> +
> +1:
> +       cmpw    $(NR_VECTORS), %dx
> +       jz      .Lap_idt_done
> +
> +       cmpw    $(X86_TRAP_NMI), %dx
> +       jz      2f
> +
> +       /* Load all other fixed vectors with reset handler */
> +       movl    %esi, %eax
> +       movw    %ax, (IDT_VECTOR_LO_BITS)(%ecx)
> +       shrl    $16, %eax
> +       movw    %ax, (IDT_VECTOR_HI_BITS)(%ecx)
> +       jmp     3f
> +
> +2:
> +       /* Load single wake NMI IPI vector at the relocation address */
> +       movl    rva(sl_txt_ap_wake_block)(%ebx), %eax
> +       addl    $(sl_txt_int_nmi - sl_txt_ap_wake_begin), %eax
> +       movw    %ax, (IDT_VECTOR_LO_BITS)(%ecx)
> +       shrl    $16, %eax
> +       movw    %ax, (IDT_VECTOR_HI_BITS)(%ecx)
> +
> +3:
> +       incw    %dx
> +       addl    $8, %ecx
> +       jmp     1b
> +
> +.Lap_idt_done:
> +       popl    %esi
> +       ret
> +SYM_FUNC_END(sl_txt_reloc_ap_wake)
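
The loop in sl_txt_reloc_ap_wake patches only the split handler offset of each 8-byte 32-bit interrupt gate: bytes 0-1 (IDT_VECTOR_LO_BITS) and bytes 6-7 (IDT_VECTOR_HI_BITS). The sketch below models the same fill on the host. It assumes that the selector, type and DPL bytes (2-5) are already set in sl_ap_idt, which is not shown in this excerpt. The function names are hypothetical.

```c
/* Sketch only: all names below are hypothetical, not kernel API. */
#include <stdint.h>

#define SKETCH_NR_VECTORS 256  /* NR_VECTORS on x86 */
#define SKETCH_TRAP_NMI   2    /* X86_TRAP_NMI */

/* Write a 32-bit handler offset into an 8-byte 32-bit interrupt gate. */
static void set_gate_offset(uint8_t gate[8], uint32_t handler)
{
	gate[0] = (uint8_t)handler;            /* IDT_VECTOR_LO_BITS = 0 */
	gate[1] = (uint8_t)(handler >> 8);
	gate[6] = (uint8_t)(handler >> 16);    /* IDT_VECTOR_HI_BITS = 6 */
	gate[7] = (uint8_t)(handler >> 24);
}

static uint32_t get_gate_offset(const uint8_t gate[8])
{
	return (uint32_t)gate[0] | ((uint32_t)gate[1] << 8) |
	       ((uint32_t)gate[6] << 16) | ((uint32_t)gate[7] << 24);
}

/* The NMI vector gets the NMI stub, every other vector the reset stub. */
static void fill_ap_idt(uint8_t idt[SKETCH_NR_VECTORS][8],
			uint32_t reset_handler, uint32_t nmi_handler)
{
	unsigned int v;

	for (v = 0; v < SKETCH_NR_VECTORS; v++)
		set_gate_offset(idt[v], v == SKETCH_TRAP_NMI ? nmi_handler
							      : reset_handler);
}
```
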
> +
> +SYM_FUNC_START(sl_txt_load_regs)
> +       /* Save base pointer register */
> +       pushl   %ebx
> +
> +       /*
> +        * On Intel, the original variable MTRRs and Misc Enable MSR are
> +        * restored on the BSP at early boot. Each AP will also restore
> +        * its MTRRs and Misc Enable MSR.
> +        */
> +       pushl   %edi
> +       addl    $(SL_saved_bsp_mtrrs), %edi
> +       movl    (%edi), %ebx
> +       pushl   %ebx /* default_mem_type lo */
> +       addl    $4, %edi
> +       movl    (%edi), %ebx
> +       pushl   %ebx /* default_mem_type hi */
> +       addl    $4, %edi
> +       movl    (%edi), %ebx /* mtrr_vcnt lo, don't care about hi part */
> +       addl    $8, %edi /* now at MTRR pair array */
> +       /* Write the variable MTRRs */
> +       movl    $(MSR_MTRRphysBase0), %ecx
> +1:
> +       cmpl    $0, %ebx
> +       jz      2f
> +
> +       movl    (%edi), %eax /* MTRRphysBaseX lo */
> +       addl    $4, %edi
> +       movl    (%edi), %edx /* MTRRphysBaseX hi */
> +       wrmsr
> +       addl    $4, %edi
> +       incl    %ecx
> +       movl    (%edi), %eax /* MTRRphysMaskX lo */
> +       addl    $4, %edi
> +       movl    (%edi), %edx /* MTRRphysMaskX hi */
> +       wrmsr
> +       addl    $4, %edi
> +       incl    %ecx
> +
> +       decl    %ebx
> +       jmp     1b
> +2:
> +       /* Write the default MTRR register */
> +       popl    %edx
> +       popl    %eax
> +       movl    $(MSR_MTRRdefType), %ecx
> +       wrmsr
> +
> +       /* Return to the beginning and write the Misc Enable MSR */
> +       popl    %edi
> +       addl    $(SL_saved_misc_enable_msr), %edi
> +       movl    (%edi), %eax /* saved_misc_enable_msr lo */
> +       addl    $4, %edi
> +       movl    (%edi), %edx /* saved_misc_enable_msr hi */
> +       movl    $(MSR_IA32_MISC_ENABLE), %ecx
> +       wrmsr
> +
> +       popl    %ebx
> +       ret
> +SYM_FUNC_END(sl_txt_load_regs)
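
sl_txt_load_regs walks the saved MTRR state at fixed offsets: default_mem_type at +0, mtrr_vcnt at +8, and base/mask pairs from +16. It then issues WRMSRs in a fixed order. The sketch below reproduces that order as a list of (MSR, value) writes. The struct mirrors those offsets but is hypothetical, and its array size of 8 is an assumption. The clamp on vcnt is an addition; the asm shown here trusts mtrr_vcnt as-is.

```c
/* Sketch only: all names below are hypothetical, not kernel API. */
#include <stdint.h>

#define SKETCH_MSR_MTRRphysBase0     0x200
#define SKETCH_MSR_MTRRdefType       0x2ff
#define SKETCH_MSR_IA32_MISC_ENABLE  0x1a0
#define SKETCH_MAX_VAR_MTRRS         8    /* assumed array size */

struct sk_mtrr_state {
	uint64_t default_mem_type;   /* +0 */
	uint64_t mtrr_vcnt;          /* +8 */
	struct {
		uint64_t base;
		uint64_t mask;
	} pair[SKETCH_MAX_VAR_MTRRS]; /* +16 */
};

struct sk_msr_write {
	uint32_t msr;
	uint64_t val;
};

/*
 * WRMSR order of sl_txt_load_regs: variable MTRR base/mask pairs, then
 * MTRRdefType, then IA32_MISC_ENABLE. out[] needs room for
 * 2 * SKETCH_MAX_VAR_MTRRS + 2 entries.
 */
static unsigned int plan_mtrr_writes(const struct sk_mtrr_state *s,
				     uint64_t misc_enable,
				     struct sk_msr_write *out)
{
	uint32_t msr = SKETCH_MSR_MTRRphysBase0;
	uint32_t vcnt = (uint32_t)s->mtrr_vcnt;  /* asm uses the low half only */
	unsigned int n = 0, i;

	if (vcnt > SKETCH_MAX_VAR_MTRRS)
		vcnt = SKETCH_MAX_VAR_MTRRS;

	for (i = 0; i < vcnt; i++) {
		out[n++] = (struct sk_msr_write){ msr++, s->pair[i].base };
		out[n++] = (struct sk_msr_write){ msr++, s->pair[i].mask };
	}
	out[n++] = (struct sk_msr_write){ SKETCH_MSR_MTRRdefType, s->default_mem_type };
	out[n++] = (struct sk_msr_write){ SKETCH_MSR_IA32_MISC_ENABLE, misc_enable };
	return n;
}
```
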
> +
> +SYM_FUNC_START(sl_txt_wake_aps)
> +       /* Save boot params register */
> +       pushl   %esi
> +
> +       /* First setup the MLE join structure and load it into TXT reg */
> +       leal    rva(sl_gdt)(%ebx), %eax
> +       leal    rva(sl_txt_ap_entry)(%ebx), %ecx
> +       leal    rva(sl_smx_rlp_mle_join)(%ebx), %edx
> +       movl    %eax, SL_rlp_gdt_base(%edx)
> +       movl    %ecx, SL_rlp_entry_point(%edx)
> +       movl    %edx, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_MLE_JOIN)
> +
> +       /* Another TXT heap walk to find various values needed to wake APs */
> +       movl    (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_HEAP_BASE), %eax
> +       /* At BIOS data size, find the number of logical processors */
> +       movl    (SL_num_logical_procs + 8)(%eax), %edx
> +       /* Skip over BIOS data */
> +       movl    (%eax), %ecx
> +       addl    %ecx, %eax
> +       /* Skip over OS to MLE */
> +       movl    (%eax), %ecx
> +       addl    %ecx, %eax
> +       /* At OS-SINIT size, get capabilities to know how to wake up the APs */
> +       movl    (SL_capabilities + 8)(%eax), %esi
> +       /* Skip over OS to SINIT */
> +       movl    (%eax), %ecx
> +       addl    %ecx, %eax
> +       /* At SINIT-MLE size, get the AP wake MONITOR address */
> +       movl    (SL_rlp_wakeup_addr + 8)(%eax), %edi
> +
> +       /* Determine how to wake up the APs */
> +       testl   $(1 << TXT_SINIT_MLE_CAP_WAKE_MONITOR), %esi
> +       jz      .Lwake_getsec
> +
> +       /* Wake using MWAIT MONITOR */
> +       movl    $1, (%edi)
> +       jmp     .Laps_awake
> +
> +.Lwake_getsec:
> +       /* Wake using GETSEC[WAKEUP] */
> +       GETSEC  $(SMX_X86_GETSEC_WAKEUP)
> +
> +.Laps_awake:
> +       /*
> +        * All of the APs are woken up and rendezvous in the relocated wake
> +        * block starting at sl_txt_ap_wake_begin. Wait for all of them to
> +        * halt.
> +        */
> +       pause
> +       cmpl    rva(sl_txt_cpu_count)(%ebx), %edx
> +       jne     .Laps_awake
> +
> +       popl    %esi
> +       ret
> +SYM_FUNC_END(sl_txt_wake_aps)
> +
> +/* This is the beginning of the relocated AP wake code block */
> +       .global sl_txt_ap_wake_begin
> +sl_txt_ap_wake_begin:
> +
> +       /* Get the LAPIC ID for each AP and stash it on the stack */
> +       movl    $(MSR_IA32_X2APIC_APICID), %ecx
> +       rdmsr
> +       pushl   %eax
> +
> +       /*
> +        * Get a pointer to the monitor location on this AP's stack to test below
> +        * after mwait returns. Currently %esp points to just past the pushed APIC
> +        * ID value.
> +        */
> +       movl    %esp, %eax
> +       subl    $(TXT_BOOT_STACK_SIZE - 4), %eax
> +       movl    $0, (%eax)
> +
> +       /* Clear ecx/edx so no invalid extensions or hints are passed to monitor */
> +       xorl    %ecx, %ecx
> +       xorl    %edx, %edx
> +
> +       /*
> +        * Arm the monitor and wait for it to be poked by the SMP bringup code. The mwait
> +        * instruction can return for a number of reasons. Test to see if it returned
> +        * because the monitor was written to.
> +        */
> +       monitor
> +
> +1:
> +       mfence
> +       mwait
> +       movl    (%eax), %edx
> +       testl   %edx, %edx
> +       jz      1b
> +
> +       /*
> +        * This is the long absolute jump to the 32b Secure Launch protected mode stub
> +        * code in sl_trampoline_start32() in the rmpiggy. The jump address will be
> +        * fixed in the SMP boot code when the first AP is brought up. This whole area
> +        * is provided and protected in the memory map by the prelaunch code.
> +        */
> +       .byte   0xea
> +sl_ap_jmp_offset:
> +       .long   0x00000000
> +       .word   __SL32_CS
> +
> +SYM_FUNC_START(sl_txt_int_nmi)
> +       /* NMI context, just IRET */
> +       iret
> +SYM_FUNC_END(sl_txt_int_nmi)
> +
> +SYM_FUNC_START(sl_txt_int_reset)
> +       TXT_RESET $(SL_ERROR_INV_AP_INTERRUPT)
> +SYM_FUNC_END(sl_txt_int_reset)
> +
> +       .balign 8
> +SYM_DATA_START_LOCAL(sl_ap_idt_desc)
> +       .word   sl_ap_idt_end - sl_ap_idt - 1           /* Limit */
> +       .long   sl_ap_idt - sl_txt_ap_wake_begin        /* Base */
> +SYM_DATA_END_LABEL(sl_ap_idt_desc, SYM_L_LOCAL, sl_ap_idt_desc_end)
> +
> +       .balign 8
> +SYM_DATA_START_LOCAL(sl_ap_idt)
> +       .rept   NR_VECTORS
> +       .word   0x0000          /* Offset 15 to 0 */
> +       .word   __SL32_CS       /* Segment selector */
> +       .word   0x8e00          /* Present, DPL=0, 32b Vector, Interrupt */
> +       .word   0x0000          /* Offset 31 to 16 */
> +       .endr
> +SYM_DATA_END_LABEL(sl_ap_idt, SYM_L_LOCAL, sl_ap_idt_end)
> +
> +       .balign 8
> +SYM_DATA_START_LOCAL(sl_ap_gdt_desc)
> +       .word   sl_ap_gdt_end - sl_ap_gdt - 1
> +       .long   sl_ap_gdt - sl_txt_ap_wake_begin
> +SYM_DATA_END_LABEL(sl_ap_gdt_desc, SYM_L_LOCAL, sl_ap_gdt_desc_end)
> +
> +       .balign 8
> +SYM_DATA_START_LOCAL(sl_ap_gdt)
> +       .quad   0x0000000000000000      /* NULL */
> +       .quad   0x00cf9a000000ffff      /* __SL32_CS */
> +       .quad   0x00cf92000000ffff      /* __SL32_DS */
> +SYM_DATA_END_LABEL(sl_ap_gdt, SYM_L_LOCAL, sl_ap_gdt_end)
> +
> +       /* Small stacks for BSP and APs to work with */
> +       .balign 64
> +SYM_DATA_START_LOCAL(sl_stacks)
> +       .fill (TXT_MAX_CPUS * TXT_BOOT_STACK_SIZE), 1, 0
> +SYM_DATA_END_LABEL(sl_stacks, SYM_L_LOCAL, sl_stacks_end)
> +
> +/* This is the end of the relocated AP wake code block */
> +       .global sl_txt_ap_wake_end
> +sl_txt_ap_wake_end:
> +
> +       .data
> +       .balign 8
> +SYM_DATA_START_LOCAL(sl_gdt_desc)
> +       .word   sl_gdt_end - sl_gdt - 1
> +       .long   sl_gdt - sl_gdt_desc
> +SYM_DATA_END_LABEL(sl_gdt_desc, SYM_L_LOCAL, sl_gdt_desc_end)
> +
> +       .balign 8
> +SYM_DATA_START_LOCAL(sl_gdt)
> +       .quad   0x0000000000000000      /* NULL */
> +       .quad   0x00cf9a000000ffff      /* __SL32_CS */
> +       .quad   0x00cf92000000ffff      /* __SL32_DS */
> +SYM_DATA_END_LABEL(sl_gdt, SYM_L_LOCAL, sl_gdt_end)
> +
> +       .balign 8
> +SYM_DATA_START_LOCAL(sl_smx_rlp_mle_join)
> +       .long   sl_gdt_end - sl_gdt - 1 /* GDT limit */
> +       .long   0x00000000              /* GDT base */
> +       .long   __SL32_CS       /* Seg Sel - CS (DS, ES, SS = seg_sel+8) */
> +       .long   0x00000000      /* Entry point physical address */
> +SYM_DATA_END(sl_smx_rlp_mle_join)
> +
> +SYM_DATA(sl_cpu_type, .long 0x00000000)
> +
> +SYM_DATA(sl_mle_start, .long 0x00000000)
> +
> +SYM_DATA_LOCAL(sl_txt_spin_lock, .long 0x00000000)
> +
> +SYM_DATA_LOCAL(sl_txt_stack_index, .long 0x00000000)
> +
> +SYM_DATA_LOCAL(sl_txt_cpu_count, .long 0x00000000)
> +
> +SYM_DATA_LOCAL(sl_txt_ap_wake_block, .long 0x00000000)
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index e022e6eb766c..37f6167f28ba 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -348,6 +348,9 @@
>  #define MSR_IA32_RTIT_OUTPUT_BASE      0x00000560
>  #define MSR_IA32_RTIT_OUTPUT_MASK      0x00000561
>
> +#define MSR_MTRRphysBase0              0x00000200
> +#define MSR_MTRRphysMask0              0x00000201
> +
>  #define MSR_MTRRfix64K_00000           0x00000250
>  #define MSR_MTRRfix16K_80000           0x00000258
>  #define MSR_MTRRfix16K_A0000           0x00000259
> @@ -849,6 +852,8 @@
>  #define MSR_IA32_APICBASE_ENABLE       (1<<11)
>  #define MSR_IA32_APICBASE_BASE         (0xfffff<<12)
>
> +#define MSR_IA32_X2APIC_APICID         0x00000802
> +
>  #define MSR_IA32_UCODE_WRITE           0x00000079
>  #define MSR_IA32_UCODE_REV             0x0000008b
>
> diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
> index 9b82eebd7add..7ce283a22d6b 100644
> --- a/arch/x86/include/uapi/asm/bootparam.h
> +++ b/arch/x86/include/uapi/asm/bootparam.h
> @@ -12,6 +12,7 @@
>  /* loadflags */
>  #define LOADED_HIGH    (1<<0)
>  #define KASLR_FLAG     (1<<1)
> +#define SLAUNCH_FLAG   (1<<2)
>  #define QUIET_FLAG     (1<<5)
>  #define KEEP_SEGMENTS  (1<<6)
>  #define CAN_USE_HEAP   (1<<7)
> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
> index a98020bf31bb..925adce6e2c7 100644
> --- a/arch/x86/kernel/asm-offsets.c
> +++ b/arch/x86/kernel/asm-offsets.c
> @@ -13,6 +13,8 @@
>  #include <linux/hardirq.h>
>  #include <linux/suspend.h>
>  #include <linux/kbuild.h>
> +#include <linux/slr_table.h>
> +#include <linux/slaunch.h>
>  #include <asm/processor.h>
>  #include <asm/thread_info.h>
>  #include <asm/sigframe.h>
> @@ -120,4 +122,22 @@ static void __used common(void)
>         OFFSET(ARIA_CTX_rounds, aria_ctx, rounds);
>  #endif
>
> +#ifdef CONFIG_SECURE_LAUNCH
> +       BLANK();
> +       OFFSET(SL_txt_info, txt_os_mle_data, txt_info);
> +       OFFSET(SL_mle_scratch, txt_os_mle_data, mle_scratch);
> +       OFFSET(SL_boot_params_addr, txt_os_mle_data, boot_params_addr);
> +       OFFSET(SL_ap_wake_block, txt_os_mle_data, ap_wake_block);
> +       OFFSET(SL_ap_wake_block_size, txt_os_mle_data, ap_wake_block_size);
> +       OFFSET(SL_saved_misc_enable_msr, slr_entry_intel_info, saved_misc_enable_msr);
> +       OFFSET(SL_saved_bsp_mtrrs, slr_entry_intel_info, saved_bsp_mtrrs);
> +       OFFSET(SL_num_logical_procs, txt_bios_data, num_logical_procs);
> +       OFFSET(SL_capabilities, txt_os_sinit_data, capabilities);
> +       OFFSET(SL_mle_size, txt_os_sinit_data, mle_size);
> +       OFFSET(SL_vtd_pmr_lo_base, txt_os_sinit_data, vtd_pmr_lo_base);
> +       OFFSET(SL_vtd_pmr_lo_size, txt_os_sinit_data, vtd_pmr_lo_size);
> +       OFFSET(SL_rlp_wakeup_addr, txt_sinit_mle_data, rlp_wakeup_addr);
> +       OFFSET(SL_rlp_gdt_base, smx_rlp_mle_join, rlp_gdt_base);
> +       OFFSET(SL_rlp_entry_point, smx_rlp_mle_join, rlp_entry_point);
> +#endif
>  }
> --
> 2.39.3
>
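For readers following the heap walk in sl_txt_wake_aps: each TXT heap region (BIOS data, OS-MLE, OS-SINIT, SINIT-MLE) begins with a 64-bit size field that counts the field itself. That is why the assembly adds each size to reach the next region, and adds 8 to the asm-offsets constants (SL_num_logical_procs + 8 and so on) to reach a field. The C sketch below mirrors that walk under those assumptions. The enum and helper names are hypothetical and are not the kernel's txt_*_start() helpers; also, unlike the assembly (which loads only the low 32 bits with movl), it reads the full 64-bit size.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names for the four TXT heap regions, in heap order. */
enum txt_heap_region {
	TXT_BIOS_DATA = 0,
	TXT_OS_MLE_DATA,
	TXT_OS_SINIT_DATA,
	TXT_SINIT_MLE_DATA,
};

/* Read the 64-bit size field at the start of a region (it includes itself). */
static uint64_t txt_region_size(const uint8_t *region)
{
	uint64_t size;

	memcpy(&size, region, sizeof(size));
	return size;
}

/*
 * Return a pointer to the data of region idx, i.e. just past its size
 * field. This corresponds to the "+ 8" the assembly applies to the
 * asm-offsets values.
 */
static const uint8_t *txt_heap_region_data(const uint8_t *heap,
					   enum txt_heap_region idx)
{
	const uint8_t *p = heap;

	/* Skip each preceding region by adding its size, as the addl does. */
	for (int i = 0; i < (int)idx; i++)
		p += txt_region_size(p);

	return p + sizeof(uint64_t);
}
```

With this helper, txt_heap_region_data(heap, TXT_OS_SINIT_DATA) + offsetof(struct txt_os_sinit_data, capabilities) addresses the same field as the (SL_capabilities + 8)(%eax) load, since SL_capabilities is generated from that offsetof in asm-offsets.c.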
Ross Philipson June 4, 2024, 5:14 p.m. UTC | #2
On 5/31/24 4:00 AM, Ard Biesheuvel wrote:
> Hello Ross,

Hi Ard,

> 
> On Fri, 31 May 2024 at 03:32, Ross Philipson <ross.philipson@oracle.com> wrote:
>>
>> The Secure Launch (SL) stub provides the entry point for Intel TXT (and
>> later AMD SKINIT) to vector to during the late launch. The symbol
>> sl_stub_entry is that entry point and its offset into the kernel is
>> conveyed to the launching code using the MLE (Measured Launch
>> Environment) header in the structure named mle_header. The offset of the
>> MLE header is set in the kernel_info. The routine sl_stub contains the
>> very early late launch setup code responsible for setting up the basic
>> environment to allow the normal kernel startup_32 code to proceed. It is
>> also responsible for properly waking and handling the APs on Intel
>> platforms. The routine sl_main, which runs after entering 64b mode, is
>> responsible for measuring configuration and module information (the boot
>> params, the kernel command line, the TXT heap, an external initramfs,
>> etc.) before it is used.
>>
>> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
>> ---
>>   Documentation/arch/x86/boot.rst        |  21 +
>>   arch/x86/boot/compressed/Makefile      |   3 +-
>>   arch/x86/boot/compressed/head_64.S     |  30 +
>>   arch/x86/boot/compressed/kernel_info.S |  34 ++
>>   arch/x86/boot/compressed/sl_main.c     | 577 ++++++++++++++++++++
>>   arch/x86/boot/compressed/sl_stub.S     | 725 +++++++++++++++++++++++++
>>   arch/x86/include/asm/msr-index.h       |   5 +
>>   arch/x86/include/uapi/asm/bootparam.h  |   1 +
>>   arch/x86/kernel/asm-offsets.c          |  20 +
>>   9 files changed, 1415 insertions(+), 1 deletion(-)
>>   create mode 100644 arch/x86/boot/compressed/sl_main.c
>>   create mode 100644 arch/x86/boot/compressed/sl_stub.S
>>
>> diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst
>> index 4fd492cb4970..295cdf9bcbdb 100644
>> --- a/Documentation/arch/x86/boot.rst
>> +++ b/Documentation/arch/x86/boot.rst
>> @@ -482,6 +482,14 @@ Protocol:  2.00+
>>              - If 1, KASLR enabled.
>>              - If 0, KASLR disabled.
>>
>> +  Bit 2 (kernel internal): SLAUNCH_FLAG
>> +
>> +       - Used internally by the setup kernel to communicate
>> +         Secure Launch status to kernel proper.
>> +
>> +           - If 1, Secure Launch enabled.
>> +           - If 0, Secure Launch disabled.
>> +
>>     Bit 5 (write): QUIET_FLAG
>>
>>          - If 0, print early messages.
>> @@ -1028,6 +1036,19 @@ Offset/size:     0x000c/4
>>
>>     This field contains maximal allowed type for setup_data and setup_indirect structs.
>>
>> +============   =================
>> +Field name:    mle_header_offset
>> +Offset/size:   0x0010/4
>> +============   =================
>> +
>> +  This field contains the offset to the Secure Launch Measured Launch Environment
>> +  (MLE) header. This offset is used to locate information needed during a secure
>> +  late launch using Intel TXT. If the offset is zero, the kernel does not have
>> +  Secure Launch capabilities. The MLE entry point is called from TXT on the BSP
>> +  following a successful measured launch. The specific state of the processors is
>> +  outlined in the TXT Software Development Guide; the latest version can be found here:
>> +  https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
>> +
>>
> 
> Could we just repaint this field as the offset relative to the start
> of kernel_info rather than relative to the start of the image? That
> way, there is no need for patch #1, and given that the consumer of
> this field accesses it via kernel_info, I wouldn't expect any issues
> in applying this offset to obtain the actual address.

What you suggest here may be possible with respect to the location of
the MLE header itself; we need to give that more thought. The real issue
though is covered in my response below concerning the fields in the MLE 
header.
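To make Ard's suggestion concrete, here is a minimal sketch of the two ways a loader could turn mle_header_offset into a pointer: the v9 image-relative form (rva(mle_header)) and the kernel_info-relative form (mle_header - kernel_info). It assumes the loader has already located both the image base and kernel_info; the helper names are hypothetical. In both forms a zero offset still means the kernel has no Secure Launch support, as the boot.rst hunk documents.

```c
#include <stddef.h>
#include <stdint.h>

/* v9 semantics: the offset is relative to the start of the image (rva()). */
static const void *mle_header_from_image(const void *image_base,
					 uint32_t mle_header_offset)
{
	if (mle_header_offset == 0)
		return NULL; /* kernel built without Secure Launch */
	return (const uint8_t *)image_base + mle_header_offset;
}

/* Suggested semantics: the offset is relative to kernel_info itself. */
static const void *mle_header_from_kernel_info(const void *kernel_info,
					       uint32_t mle_header_offset)
{
	if (mle_header_offset == 0)
		return NULL; /* kernel built without Secure Launch */
	return (const uint8_t *)kernel_info + mle_header_offset;
}
```

The consumer-side cost of the change is just which base pointer gets added, which is why it only affects where the header is found, not the header contents discussed below.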

> 
> 
>>   The Image Checksum
>>   ==================
>> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
>> index 9189a0e28686..9076a248d4b4 100644
>> --- a/arch/x86/boot/compressed/Makefile
>> +++ b/arch/x86/boot/compressed/Makefile
>> @@ -118,7 +118,8 @@ vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
>>   vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_mixed.o
>>   vmlinux-objs-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
>>
>> -vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o
>> +vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o \
>> +       $(obj)/sl_main.o $(obj)/sl_stub.o
>>
>>   $(obj)/vmlinux: $(vmlinux-objs-y) FORCE
>>          $(call if_changed,ld)
>> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
>> index 1dcb794c5479..803c9e2e6d85 100644
>> --- a/arch/x86/boot/compressed/head_64.S
>> +++ b/arch/x86/boot/compressed/head_64.S
>> @@ -420,6 +420,13 @@ SYM_CODE_START(startup_64)
>>          pushq   $0
>>          popfq
>>
>> +#ifdef CONFIG_SECURE_LAUNCH
>> +       /* Ensure the relocation region is coverd by a PMR */
> 
> covered

Ack

> 
>> +       movq    %rbx, %rdi
>> +       movl    $(_bss - startup_32), %esi
>> +       callq   sl_check_region
>> +#endif
>> +
>>   /*
>>    * Copy the compressed kernel to the end of our buffer
>>    * where decompression in place becomes safe.
>> @@ -462,6 +469,29 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
>>          shrq    $3, %rcx
>>          rep     stosq
>>
>> +#ifdef CONFIG_SECURE_LAUNCH
>> +       /*
>> +        * Have to do the final early sl stub work in 64b area.
>> +        *
>> +        * *********** NOTE ***********
>> +        *
>> +        * Several boot params get used before we get a chance to measure
>> +        * them in this call. This is a known issue and we currently don't
>> +        * have a solution. The scratch field doesn't matter. There is no
>> +        * obvious way to do anything about the use of kernel_alignment or
>> +        * init_size though these seem low risk with all the PMR and overlap
>> +        * checks in place.
>> +        */
>> +       movq    %r15, %rdi
>> +       callq   sl_main
>> +
>> +       /* Ensure the decompression location is covered by a PMR */
>> +       movq    %rbp, %rdi
>> +       movq    output_len(%rip), %rsi
>> +       callq   sl_check_region
>> +#endif
>> +
>> +       pushq   %rsi
> 
> This looks like a rebase error.

I will need to look closely at this. I remember removing the push/pop we 
added but this does seem wrong.

> 
>>          call    load_stage2_idt
>>
>>          /* Pass boot_params to initialize_identity_maps() */
>> diff --git a/arch/x86/boot/compressed/kernel_info.S b/arch/x86/boot/compressed/kernel_info.S
>> index c18f07181dd5..e199b87764e9 100644
>> --- a/arch/x86/boot/compressed/kernel_info.S
>> +++ b/arch/x86/boot/compressed/kernel_info.S
>> @@ -28,6 +28,40 @@ SYM_DATA_START(kernel_info)
>>          /* Maximal allowed type for setup_data and setup_indirect structs. */
>>          .long   SETUP_TYPE_MAX
>>
>> +       /* Offset to the MLE header structure */
>> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
>> +       .long   rva(mle_header)
> 
> ... so this could just be mle_header - kernel_info, and the consumer
> can do the math instead.

While we might be able to do this...

> 
>> +#else
>> +       .long   0
>> +#endif
>> +
>>   kernel_info_var_len_data:
>>          /* Empty for time being... */
>>   SYM_DATA_END_LABEL(kernel_info, SYM_L_LOCAL, kernel_info_end)
>> +
>> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
>> +       /*
>> +        * The MLE Header per the TXT Specification, section 2.1
>> +        * MLE capabilities, see table 4. Capabilities set:
>> +        * bit 0: Support for GETSEC[WAKEUP] for RLP wakeup
>> +        * bit 1: Support for RLP wakeup using MONITOR address
>> +        * bit 2: The ECX register will contain the pointer to the MLE page table
>> +        * bit 5: TPM 1.2 family: Details/authorities PCR usage support
>> +        * bit 9: Supported format of TPM 2.0 event log - TCG compliant
>> +        */
>> +SYM_DATA_START(mle_header)
>> +       .long   0x9082ac5a  /* UUID0 */
>> +       .long   0x74a7476f  /* UUID1 */
>> +       .long   0xa2555c0f  /* UUID2 */
>> +       .long   0x42b651cb  /* UUID3 */
>> +       .long   0x00000034  /* MLE header size */
>> +       .long   0x00020002  /* MLE version 2.2 */
>> +       .long   rva(sl_stub_entry) /* Linear entry point of MLE (virt. address) */
> 
> and these should perhaps be relative to mle_header?

... this would be much more problematic. For TXT, the SINIT ACM is the 
consumer of this field and it expects it to be relative to the beginning 
of the MLE image (i.e. the setup kernel and piggy in this case) and we 
have no control over that. We used to fix this value up in the prologue 
code for the launch but this was not a clean approach. The same issue 
applies to the next comment too. We will have to see what other options 
we may be able to adopt here.
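For reference, the MLE header fields under discussion can be written as a C struct. This is a sketch: the field names are hypothetical, and only the field order, widths and constants come from the quoted .long list. The static assertions check that the 13 32-bit fields add up to the 0x34 header size and that capability bits 0, 1, 2, 5 and 9 from the comment produce 0x227. The last helper shows the adjustment that mle_header-relative fields would need, because per the reply above SINIT interprets them relative to the start of the MLE image; this is presumably similar in spirit to the fix-up the prologue code used to perform.

```c
#include <stdint.h>

/* Hypothetical field names; order and widths follow the quoted .long list. */
struct mle_header_sketch {
	uint32_t uuid[4];
	uint32_t header_len;       /* 0x00000034 */
	uint32_t version;          /* 0x00020002, MLE header version 2.2 */
	uint32_t entry_point;      /* rva(sl_stub_entry) */
	uint32_t first_valid_page;
	uint32_t mle_start;        /* offset of first byte of MLE */
	uint32_t mle_end;          /* rva(_edata): offset of last byte + 1 */
	uint32_t capabilities;     /* 0x00000227 */
	uint32_t cmdline_start;    /* unused */
	uint32_t cmdline_end;      /* unused */
};

/* 13 x 4 bytes = 52 = 0x34, matching the header size field. */
_Static_assert(sizeof(struct mle_header_sketch) == 0x34,
	       "MLE header must be 0x34 bytes");

/* Capability bits as listed in the comment above mle_header. */
#define MLE_CAP_GETSEC_WAKEUP      (1u << 0)
#define MLE_CAP_MONITOR_WAKEUP     (1u << 1)
#define MLE_CAP_ECX_PAGE_TABLE     (1u << 2)
#define MLE_CAP_TPM12_PCR_DETAILS  (1u << 5)
#define MLE_CAP_TPM20_TCG_LOG      (1u << 9)

#define MLE_CAPS (MLE_CAP_GETSEC_WAKEUP | MLE_CAP_MONITOR_WAKEUP | \
		  MLE_CAP_ECX_PAGE_TABLE | MLE_CAP_TPM12_PCR_DETAILS | \
		  MLE_CAP_TPM20_TCG_LOG)

_Static_assert(MLE_CAPS == 0x227, "capability bits must add up to 0x227");

/*
 * Convert a value stored relative to mle_header into the image-relative
 * form SINIT expects, given where mle_header sits inside the image.
 */
static uint32_t hdr_rel_to_image_rel(uint32_t mle_header_image_off,
				     uint32_t hdr_rel)
{
	return mle_header_image_off + hdr_rel;
}
```

Since SINIT reads the header directly, nothing in the kernel gets a chance to run this conversion before the values are consumed, which is the core of the objection above.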

> 
>> +       .long   0x00000000  /* First valid page of MLE */
>> +       .long   0x00000000  /* Offset within binary of first byte of MLE */
>> +       .long   rva(_edata) /* Offset within binary of last byte + 1 of MLE */
> 
> and here

See last comment.

> 
>> +       .long   0x00000227  /* Bit vector of MLE-supported capabilities */
>> +       .long   0x00000000  /* Starting linear address of command line (unused) */
>> +       .long   0x00000000  /* Ending linear address of command line (unused) */
>> +SYM_DATA_END(mle_header)
>> +#endif
>> diff --git a/arch/x86/boot/compressed/sl_main.c b/arch/x86/boot/compressed/sl_main.c
>> new file mode 100644
>> index 000000000000..61e9baf410fd
>> --- /dev/null
>> +++ b/arch/x86/boot/compressed/sl_main.c
>> @@ -0,0 +1,577 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Secure Launch early measurement and validation routines.
>> + *
>> + * Copyright (c) 2024, Oracle and/or its affiliates.
>> + */
>> +
>> +#include <linux/init.h>
>> +#include <linux/string.h>
>> +#include <linux/linkage.h>
>> +#include <asm/segment.h>
>> +#include <asm/boot.h>
>> +#include <asm/msr.h>
>> +#include <asm/mtrr.h>
>> +#include <asm/processor-flags.h>
>> +#include <asm/asm-offsets.h>
>> +#include <asm/bootparam.h>
>> +#include <asm/bootparam_utils.h>
>> +#include <linux/slr_table.h>
>> +#include <linux/slaunch.h>
>> +#include <crypto/sha1.h>
>> +#include <crypto/sha2.h>
>> +
>> +#define CAPS_VARIABLE_MTRR_COUNT_MASK  0xff
>> +
>> +#define SL_TPM12_LOG           1
>> +#define SL_TPM20_LOG           2
>> +
>> +#define SL_TPM20_MAX_ALGS      2
>> +
>> +#define SL_MAX_EVENT_DATA      64
>> +#define SL_TPM12_LOG_SIZE      (sizeof(struct tcg_pcr_event) + \
>> +                               SL_MAX_EVENT_DATA)
>> +#define SL_TPM20_LOG_SIZE      (sizeof(struct tcg_pcr_event2_head) + \
>> +                               SHA1_DIGEST_SIZE + SHA256_DIGEST_SIZE + \
>> +                               sizeof(struct tcg_event_field) + \
>> +                               SL_MAX_EVENT_DATA)
>> +
>> +static void *evtlog_base;
>> +static u32 evtlog_size;
>> +static struct txt_heap_event_log_pointer2_1_element *log20_elem;
>> +static u32 tpm_log_ver = SL_TPM12_LOG;
>> +static struct tcg_efi_specid_event_algs tpm_algs[SL_TPM20_MAX_ALGS] = {0};
>> +
>> +extern u32 sl_cpu_type;
>> +extern u32 sl_mle_start;
>> +
>> +static u64 sl_txt_read(u32 reg)
>> +{
>> +       return readq((void *)(u64)(TXT_PRIV_CONFIG_REGS_BASE + reg));
>> +}
>> +
>> +static void sl_txt_write(u32 reg, u64 val)
>> +{
>> +       writeq(val, (void *)(u64)(TXT_PRIV_CONFIG_REGS_BASE + reg));
>> +}
>> +
>> +static void __noreturn sl_txt_reset(u64 error)
>> +{
>> +       /* Reading the E2STS register acts as a barrier for TXT registers */
>> +       sl_txt_write(TXT_CR_ERRORCODE, error);
>> +       sl_txt_read(TXT_CR_E2STS);
>> +       sl_txt_write(TXT_CR_CMD_UNLOCK_MEM_CONFIG, 1);
>> +       sl_txt_read(TXT_CR_E2STS);
>> +       sl_txt_write(TXT_CR_CMD_RESET, 1);
>> +
>> +       for ( ; ; )
>> +               asm volatile ("hlt");
>> +
>> +       unreachable();
>> +}
>> +
>> +static u64 sl_rdmsr(u32 reg)
>> +{
>> +       u64 lo, hi;
>> +
>> +       asm volatile ("rdmsr" : "=a" (lo), "=d" (hi) : "c" (reg));
>> +
>> +       return (hi << 32) | lo;
>> +}
>> +
>> +static struct slr_table *sl_locate_and_validate_slrt(void)
>> +{
>> +       struct txt_os_mle_data *os_mle_data;
>> +       struct slr_table *slrt;
>> +       void *txt_heap;
>> +
>> +       txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
>> +       os_mle_data = txt_os_mle_data_start(txt_heap);
>> +
>> +       if (!os_mle_data->slrt)
>> +               sl_txt_reset(SL_ERROR_INVALID_SLRT);
>> +
>> +       slrt = (struct slr_table *)os_mle_data->slrt;
>> +
>> +       if (slrt->magic != SLR_TABLE_MAGIC)
>> +               sl_txt_reset(SL_ERROR_INVALID_SLRT);
>> +
>> +       if (slrt->architecture != SLR_INTEL_TXT)
>> +               sl_txt_reset(SL_ERROR_INVALID_SLRT);
>> +
>> +       return slrt;
>> +}
>> +
>> +static void sl_check_pmr_coverage(void *base, u32 size, bool allow_hi)
>> +{
>> +       struct txt_os_sinit_data *os_sinit_data;
>> +       void *end = base + size;
>> +       void *txt_heap;
>> +
>> +       if (!(sl_cpu_type & SL_CPU_INTEL))
>> +               return;
>> +
>> +       txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
>> +       os_sinit_data = txt_os_sinit_data_start(txt_heap);
>> +
>> +       if ((end >= (void *)0x100000000ULL) && (base < (void *)0x100000000ULL))
>> +               sl_txt_reset(SL_ERROR_REGION_STRADDLE_4GB);
>> +
>> +       /*
>> +        * Note that the late stub code validates that the hi PMR covers
>> +        * all memory above 4G. At this point the code can only check that
>> +        * regions are within the hi PMR but that is sufficient.
>> +        */
>> +       if ((end > (void *)0x100000000ULL) && (base >= (void *)0x100000000ULL)) {
> 
> Better to put the cast on the pointers, given that we are doing
> arithmetic, and use SZ_4G instead of open coding this constant 4
> times.

Good idea, we will do that.
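A sketch of sl_check_pmr_coverage() along the lines Ard suggests: addresses are converted to integers once (for example (uint64_t)(uintptr_t)ptr at the call site) and compared against SZ_4G, defined locally here with the same value as in <linux/sizes.h> so it builds in userspace. The result codes are hypothetical stand-ins for the SL_ERROR_* resets, and the comparisons mirror the quoted hunk. One difference is deliberate: the sketch returns once a region above 4G has been validated against the hi PMR. As posted, such a region appears to fall through to the end >= vtd_pmr_lo_size test, which it would always fail because the lo PMR lies below 4G; that may be worth a second look.

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef SZ_4G
#define SZ_4G 0x100000000ULL /* same value as <linux/sizes.h> */
#endif

/* Hypothetical result codes standing in for the SL_ERROR_* resets. */
enum pmr_result {
	PMR_OK,
	PMR_STRADDLE_4G,          /* SL_ERROR_REGION_STRADDLE_4GB */
	PMR_ABOVE_4G_NOT_ALLOWED, /* SL_ERROR_REGION_ABOVE_4GB */
	PMR_BEYOND_PMR,           /* SL_ERROR_BUFFER_BEYOND_PMR */
};

/*
 * base is an integer address, so all arithmetic and comparisons are done
 * on integers rather than void pointers. end is exclusive (base + size);
 * the >= comparisons are kept as in the quoted hunk.
 */
static enum pmr_result check_pmr_coverage(uint64_t base, uint32_t size,
					  bool allow_hi, uint64_t pmr_lo_size,
					  uint64_t pmr_hi_base,
					  uint64_t pmr_hi_size)
{
	uint64_t end = base + size;

	if (end >= SZ_4G && base < SZ_4G)
		return PMR_STRADDLE_4G;

	if (end > SZ_4G && base >= SZ_4G) {
		if (!allow_hi)
			return PMR_ABOVE_4G_NOT_ALLOWED;
		if (end >= pmr_hi_base + pmr_hi_size)
			return PMR_BEYOND_PMR;
		/* Validated against the hi PMR; skip the lo PMR test. */
		return PMR_OK;
	}

	/* Like the quoted code, assumes the lo PMR starts at address 0. */
	if (end >= pmr_lo_size)
		return PMR_BEYOND_PMR;

	return PMR_OK;
}
```

Keeping the single SZ_4G constant and integer addresses also makes each comparison read the same way, which was the point of the review comment.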

> 
> 
> 
>> +               if (allow_hi) {
>> +                       if (end >= (void *)(os_sinit_data->vtd_pmr_hi_base +
>> +                                          os_sinit_data->vtd_pmr_hi_size))
>> +                               sl_txt_reset(SL_ERROR_BUFFER_BEYOND_PMR);
>> +               } else {
>> +                       sl_txt_reset(SL_ERROR_REGION_ABOVE_4GB);
>> +               }
>> +       }
>> +
>> +       if (end >= (void *)os_sinit_data->vtd_pmr_lo_size)
>> +               sl_txt_reset(SL_ERROR_BUFFER_BEYOND_PMR);
>> +}
>> +
>> +/*
>> + * Some MSRs are modified by the pre-launch code including the MTRRs.
>> + * The early MLE code has to restore these values. This code validates
>> + * the values after they are measured.
>> + */
>> +static void sl_txt_validate_msrs(struct txt_os_mle_data *os_mle_data)
>> +{
>> +       struct slr_txt_mtrr_state *saved_bsp_mtrrs;
>> +       u64 mtrr_caps, mtrr_def_type, mtrr_var;
>> +       struct slr_entry_intel_info *txt_info;
>> +       u64 misc_en_msr;
>> +       u32 vcnt, i;
>> +
>> +       txt_info = (struct slr_entry_intel_info *)os_mle_data->txt_info;
>> +       saved_bsp_mtrrs = &txt_info->saved_bsp_mtrrs;
>> +
>> +       mtrr_caps = sl_rdmsr(MSR_MTRRcap);
>> +       vcnt = (u32)(mtrr_caps & CAPS_VARIABLE_MTRR_COUNT_MASK);
>> +
>> +       if (saved_bsp_mtrrs->mtrr_vcnt > vcnt)
>> +               sl_txt_reset(SL_ERROR_MTRR_INV_VCNT);
>> +       if (saved_bsp_mtrrs->mtrr_vcnt > TXT_OS_MLE_MAX_VARIABLE_MTRRS)
>> +               sl_txt_reset(SL_ERROR_MTRR_INV_VCNT);
>> +
>> +       mtrr_def_type = sl_rdmsr(MSR_MTRRdefType);
>> +       if (saved_bsp_mtrrs->default_mem_type != mtrr_def_type)
>> +               sl_txt_reset(SL_ERROR_MTRR_INV_DEF_TYPE);
>> +
>> +       for (i = 0; i < saved_bsp_mtrrs->mtrr_vcnt; i++) {
>> +               mtrr_var = sl_rdmsr(MTRRphysBase_MSR(i));
>> +               if (saved_bsp_mtrrs->mtrr_pair[i].mtrr_physbase != mtrr_var)
>> +                       sl_txt_reset(SL_ERROR_MTRR_INV_BASE);
>> +               mtrr_var = sl_rdmsr(MTRRphysMask_MSR(i));
>> +               if (saved_bsp_mtrrs->mtrr_pair[i].mtrr_physmask != mtrr_var)
>> +                       sl_txt_reset(SL_ERROR_MTRR_INV_MASK);
>> +       }
>> +
>> +       misc_en_msr = sl_rdmsr(MSR_IA32_MISC_ENABLE);
>> +       if (txt_info->saved_misc_enable_msr != misc_en_msr)
>> +               sl_txt_reset(SL_ERROR_MSR_INV_MISC_EN);
>> +}
>> +
>> +static void sl_find_drtm_event_log(struct slr_table *slrt)
>> +{
>> +       struct txt_os_sinit_data *os_sinit_data;
>> +       struct slr_entry_log_info *log_info;
>> +       void *txt_heap;
>> +
>> +       log_info = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_LOG_INFO);
>> +       if (!log_info)
>> +               sl_txt_reset(SL_ERROR_SLRT_MISSING_ENTRY);
>> +
>> +       evtlog_base = (void *)log_info->addr;
>> +       evtlog_size = log_info->size;
>> +
>> +       txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
>> +
>> +       /*
>> +        * For TPM 2.0, the event log 2.1 extended data structure has to also
>> +        * be located and fixed up.
>> +        */
>> +       os_sinit_data = txt_os_sinit_data_start(txt_heap);
>> +
>> +       /*
>> +        * Only support version 6 and later that properly handle the
>> +        * list of ExtDataElements in the OS-SINIT structure.
>> +        */
>> +       if (os_sinit_data->version < 6)
>> +               sl_txt_reset(SL_ERROR_OS_SINIT_BAD_VERSION);
>> +
>> +       /* Find the TPM2.0 logging extended heap element */
>> +       log20_elem = tpm20_find_log2_1_element(os_sinit_data);
>> +
>> +       /* If found, this implies TPM20 log and family */
>> +       if (log20_elem)
>> +               tpm_log_ver = SL_TPM20_LOG;
>> +}
>> +
>> +static void sl_validate_event_log_buffer(void)
>> +{
>> +       struct txt_os_sinit_data *os_sinit_data;
>> +       void *txt_heap, *txt_end;
>> +       void *mle_base, *mle_end;
>> +       void *evtlog_end;
>> +
>> +       if ((u64)evtlog_size > (LLONG_MAX - (u64)evtlog_base))
>> +               sl_txt_reset(SL_ERROR_INTEGER_OVERFLOW);
>> +       evtlog_end = evtlog_base + evtlog_size;
>> +
>> +       txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
>> +       txt_end = txt_heap + sl_txt_read(TXT_CR_HEAP_SIZE);
>> +       os_sinit_data = txt_os_sinit_data_start(txt_heap);
>> +
>> +       mle_base = (void *)(u64)sl_mle_start;
>> +       mle_end = mle_base + os_sinit_data->mle_size;
>> +
>> +       /*
>> +        * This check is to ensure the event log buffer does not overlap with
>> +        * the MLE image.
>> +        */
>> +       if (evtlog_base >= mle_end && evtlog_end > mle_end)
>> +               goto pmr_check; /* above */
>> +
>> +       if (evtlog_end <= mle_base && evtlog_base < mle_base)
>> +               goto pmr_check; /* below */
>> +
>> +       sl_txt_reset(SL_ERROR_MLE_BUFFER_OVERLAP);
>> +
>> +pmr_check:
>> +       /*
>> +        * The TXT heap is protected by the DPR. If the TPM event log is
>> +        * inside the TXT heap, there is no need for a PMR check.
>> +        */
>> +       if (evtlog_base > txt_heap && evtlog_end < txt_end)
>> +               return;
>> +
>> +       sl_check_pmr_coverage(evtlog_base, evtlog_size, true);
>> +}
>> +
>> +static void sl_find_event_log_algorithms(void)
>> +{
>> +       struct tcg_efi_specid_event_head *efi_head =
>> +               (struct tcg_efi_specid_event_head *)(evtlog_base +
>> +                                       log20_elem->first_record_offset +
>> +                                       sizeof(struct tcg_pcr_event));
>> +
>> +       if (efi_head->num_algs == 0 || efi_head->num_algs > 2)
>> +               sl_txt_reset(SL_ERROR_TPM_NUMBER_ALGS);
>> +
>> +       memcpy(&tpm_algs[0], &efi_head->digest_sizes[0],
>> +              sizeof(struct tcg_efi_specid_event_algs) * efi_head->num_algs);
>> +}
>> +
>> +static void sl_tpm12_log_event(u32 pcr, u32 event_type,
>> +                              const u8 *data, u32 length,
>> +                              const u8 *event_data, u32 event_size)
>> +{
>> +       u8 sha1_hash[SHA1_DIGEST_SIZE] = {0};
>> +       u8 log_buf[SL_TPM12_LOG_SIZE] = {0};
>> +       struct tcg_pcr_event *pcr_event;
>> +       u32 total_size;
>> +
>> +       pcr_event = (struct tcg_pcr_event *)log_buf;
>> +       pcr_event->pcr_idx = pcr;
>> +       pcr_event->event_type = event_type;
>> +       if (length > 0) {
>> +               sha1(data, length, &sha1_hash[0]);
>> +               memcpy(&pcr_event->digest[0], &sha1_hash[0], SHA1_DIGEST_SIZE);
>> +       }
>> +       pcr_event->event_size = event_size;
>> +       if (event_size > 0)
>> +               memcpy((u8 *)pcr_event + sizeof(struct tcg_pcr_event),
>> +                      event_data, event_size);
>> +
>> +       total_size = sizeof(struct tcg_pcr_event) + event_size;
>> +
>> +       if (tpm12_log_event(evtlog_base, evtlog_size, total_size, pcr_event))
>> +               sl_txt_reset(SL_ERROR_TPM_LOGGING_FAILED);
>> +}
>> +
>> +static void sl_tpm20_log_event(u32 pcr, u32 event_type,
>> +                              const u8 *data, u32 length,
>> +                              const u8 *event_data, u32 event_size)
>> +{
>> +       u8 sha256_hash[SHA256_DIGEST_SIZE] = {0};
>> +       u8 sha1_hash[SHA1_DIGEST_SIZE] = {0};
>> +       u8 log_buf[SL_TPM20_LOG_SIZE] = {0};
>> +       struct sha256_state sctx256 = {0};
>> +       struct tcg_pcr_event2_head *head;
>> +       struct tcg_event_field *event;
>> +       u32 total_size;
>> +       u16 *alg_ptr;
>> +       u8 *dgst_ptr;
>> +
>> +       head = (struct tcg_pcr_event2_head *)log_buf;
>> +       head->pcr_idx = pcr;
>> +       head->event_type = event_type;
>> +       total_size = sizeof(struct tcg_pcr_event2_head);
>> +       alg_ptr = (u16 *)(log_buf + sizeof(struct tcg_pcr_event2_head));
>> +
>> +       for ( ; head->count < 2; head->count++) {
>> +               if (!tpm_algs[head->count].alg_id)
>> +                       break;
>> +
>> +               *alg_ptr = tpm_algs[head->count].alg_id;
>> +               dgst_ptr = (u8 *)alg_ptr + sizeof(u16);
>> +
>> +               if (tpm_algs[head->count].alg_id == TPM_ALG_SHA256 &&
>> +                   length) {
>> +                       sha256_init(&sctx256);
>> +                       sha256_update(&sctx256, data, length);
>> +                       sha256_final(&sctx256, &sha256_hash[0]);
>> +               } else if (tpm_algs[head->count].alg_id == TPM_ALG_SHA1 &&
>> +                          length) {
>> +                       sha1(data, length, &sha1_hash[0]);
>> +               }
>> +
>> +               if (tpm_algs[head->count].alg_id == TPM_ALG_SHA256) {
>> +                       memcpy(dgst_ptr, &sha256_hash[0], SHA256_DIGEST_SIZE);
>> +                       total_size += SHA256_DIGEST_SIZE + sizeof(u16);
>> +                       alg_ptr = (u16 *)((u8 *)alg_ptr + SHA256_DIGEST_SIZE + sizeof(u16));
>> +               } else if (tpm_algs[head->count].alg_id == TPM_ALG_SHA1) {
>> +                       memcpy(dgst_ptr, &sha1_hash[0], SHA1_DIGEST_SIZE);
>> +                       total_size += SHA1_DIGEST_SIZE + sizeof(u16);
>> +                       alg_ptr = (u16 *)((u8 *)alg_ptr + SHA1_DIGEST_SIZE + sizeof(u16));
>> +               } else {
>> +                       sl_txt_reset(SL_ERROR_TPM_UNKNOWN_DIGEST);
>> +               }
>> +       }
>> +
>> +       event = (struct tcg_event_field *)(log_buf + total_size);
>> +       event->event_size = event_size;
>> +       if (event_size > 0)
>> +               memcpy((u8 *)event + sizeof(struct tcg_event_field), event_data, event_size);
>> +       total_size += sizeof(struct tcg_event_field) + event_size;
>> +
>> +       if (tpm20_log_event(log20_elem, evtlog_base, evtlog_size, total_size, &log_buf[0]))
>> +               sl_txt_reset(SL_ERROR_TPM_LOGGING_FAILED);
>> +}
>> +
>> +static void sl_tpm_extend_evtlog(u32 pcr, u32 type,
>> +                                const u8 *data, u32 length, const char *desc)
>> +{
>> +       if (tpm_log_ver == SL_TPM20_LOG)
>> +               sl_tpm20_log_event(pcr, type, data, length,
>> +                                  (const u8 *)desc, strlen(desc));
>> +       else
>> +               sl_tpm12_log_event(pcr, type, data, length,
>> +                                  (const u8 *)desc, strlen(desc));
>> +}
>> +
>> +static struct setup_data *sl_handle_setup_data(struct setup_data *curr,
>> +                                              struct slr_policy_entry *entry)
>> +{
>> +       struct setup_indirect *ind;
>> +       struct setup_data *next;
>> +
>> +       if (!curr)
>> +               return NULL;
>> +
>> +       next = (struct setup_data *)(unsigned long)curr->next;
>> +
>> +       /* SETUP_INDIRECT instances have to be handled differently */
>> +       if (curr->type == SETUP_INDIRECT) {
>> +               ind = (struct setup_indirect *)((u8 *)curr + offsetof(struct setup_data, data));
>> +
>> +               sl_check_pmr_coverage((void *)ind->addr, ind->len, true);
>> +
>> +               sl_tpm_extend_evtlog(entry->pcr, TXT_EVTYPE_SLAUNCH,
>> +                                    (void *)ind->addr, ind->len,
>> +                                    entry->evt_info);
>> +
>> +               return next;
>> +       }
>> +
>> +       sl_check_pmr_coverage(((u8 *)curr) + sizeof(struct setup_data),
>> +                             curr->len, true);
>> +
>> +       sl_tpm_extend_evtlog(entry->pcr, TXT_EVTYPE_SLAUNCH,
>> +                            ((u8 *)curr) + sizeof(struct setup_data),
>> +                            curr->len,
>> +                            entry->evt_info);
>> +
>> +       return next;
>> +}
>> +
>> +static void sl_extend_setup_data(struct slr_policy_entry *entry)
>> +{
>> +       struct setup_data *data;
>> +
>> +       /*
>> +        * Measuring the boot params measured the fixed e820 memory map.
>> +        * Measure any setup_data entries including e820 extended entries.
>> +        */
>> +       data = (struct setup_data *)(unsigned long)entry->entity;
>> +       while (data)
>> +               data = sl_handle_setup_data(data, entry);
>> +}
>> +
>> +static void sl_extend_slrt(struct slr_policy_entry *entry)
>> +{
>> +       struct slr_table *slrt = (struct slr_table *)entry->entity;
>> +       struct slr_entry_intel_info *intel_info;
>> +
>> +       /*
>> +        * In revision one of the SLRT, the only table that needs to be
>> +        * measured is the Intel info table. Everything else is meta-data,
>> +        * addresses and sizes. Note the size of what to measure is not set.
>> +        * The flag SLR_POLICY_IMPLICIT_SIZE leaves it to the measuring code
>> +        * to sort out.
>> +        */
>> +       if (slrt->revision == 1) {
>> +               intel_info = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_INTEL_INFO);
>> +               if (!intel_info)
>> +                       sl_txt_reset(SL_ERROR_SLRT_MISSING_ENTRY);
>> +
>> +               sl_tpm_extend_evtlog(entry->pcr, TXT_EVTYPE_SLAUNCH,
>> +                                    (void *)intel_info, sizeof(struct slr_entry_intel_info),
>> +                                    entry->evt_info);
>> +       }
>> +}
>> +
>> +static void sl_extend_txt_os2mle(struct slr_policy_entry *entry)
>> +{
>> +       struct txt_os_mle_data *os_mle_data;
>> +       void *txt_heap;
>> +
>> +       txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
>> +       os_mle_data = txt_os_mle_data_start(txt_heap);
>> +
>> +       /*
>> +        * Version 1 of the OS-MLE heap structure has no fields to measure. It just
>> +        * has addresses and sizes and a scratch buffer.
>> +        */
>> +       if (os_mle_data->version == 1)
>> +               return;
>> +}
>> +
>> +static void sl_process_extend_policy(struct slr_table *slrt)
>> +{
>> +       struct slr_entry_policy *policy;
>> +       u16 i;
>> +
>> +       policy = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_ENTRY_POLICY);
>> +       if (!policy)
>> +               sl_txt_reset(SL_ERROR_SLRT_MISSING_ENTRY);
>> +
>> +       for (i = 0; i < policy->nr_entries; i++) {
>> +               switch (policy->policy_entries[i].entity_type) {
>> +               case SLR_ET_SETUP_DATA:
>> +                       sl_extend_setup_data(&policy->policy_entries[i]);
>> +                       break;
>> +               case SLR_ET_SLRT:
>> +                       sl_extend_slrt(&policy->policy_entries[i]);
>> +                       break;
>> +               case SLR_ET_TXT_OS2MLE:
>> +                       sl_extend_txt_os2mle(&policy->policy_entries[i]);
>> +                       break;
>> +               case SLR_ET_UNUSED:
>> +                       continue;
>> +               default:
>> +                       sl_tpm_extend_evtlog(policy->policy_entries[i].pcr, TXT_EVTYPE_SLAUNCH,
>> +                                            (void *)policy->policy_entries[i].entity,
>> +                                            policy->policy_entries[i].size,
>> +                                            policy->policy_entries[i].evt_info);
>> +               }
>> +       }
>> +}
>> +
>> +static void sl_process_extend_uefi_config(struct slr_table *slrt)
>> +{
>> +       struct slr_entry_uefi_config *uefi_config;
>> +       u16 i;
>> +
>> +       uefi_config = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_UEFI_CONFIG);
>> +
>> +       /* The UEFI config entry is optional, depending on how the SL kernel was booted */
>> +       if (!uefi_config)
>> +               return;
>> +
>> +       for (i = 0; i < uefi_config->nr_entries; i++) {
>> +               sl_tpm_extend_evtlog(uefi_config->uefi_cfg_entries[i].pcr, TXT_EVTYPE_SLAUNCH,
>> +                                    (void *)uefi_config->uefi_cfg_entries[i].cfg,
>> +                                    uefi_config->uefi_cfg_entries[i].size,
>> +                                    uefi_config->uefi_cfg_entries[i].evt_info);
>> +       }
>> +}
>> +
>> +asmlinkage __visible void sl_check_region(void *base, u32 size)
>> +{
>> +       sl_check_pmr_coverage(base, size, false);
>> +}
>> +
>> +asmlinkage __visible void sl_main(void *bootparams)
>> +{
>> +       struct boot_params *bp  = (struct boot_params *)bootparams;
>> +       struct txt_os_mle_data *os_mle_data;
>> +       struct slr_table *slrt;
>> +       void *txt_heap;
>> +
>> +       /*
>> +        * Ensure loadflags do not indicate a secure launch was done
>> +        * unless it really was.
>> +        */
>> +       bp->hdr.loadflags &= ~SLAUNCH_FLAG;
>> +
>> +       /*
>> +        * Currently only Intel TXT is supported for Secure Launch. Testing
>> +        * this value also indicates that the kernel was booted successfully
>> +        * through the Secure Launch entry point and is in SMX mode.
>> +        */
>> +       if (!(sl_cpu_type & SL_CPU_INTEL))
>> +               return;
>> +
>> +       slrt = sl_locate_and_validate_slrt();
>> +
>> +       /* Locate the TPM event log. */
>> +       sl_find_drtm_event_log(slrt);
>> +
>> +       /* Validate the location of the event log buffer before using it */
>> +       sl_validate_event_log_buffer();
>> +
>> +       /*
>> +        * Find the TPM hash algorithms used by the ACM and recorded in the
>> +        * event log.
>> +        */
>> +       if (tpm_log_ver == SL_TPM20_LOG)
>> +               sl_find_event_log_algorithms();
>> +
>> +       /*
>> +        * Sanitize the boot params before measuring them. Set the SLAUNCH_FLAG
>> +        * early since, if anything fails, the system will reset anyway.
>> +        */
>> +       sanitize_boot_params(bp);
>> +       bp->hdr.loadflags |= SLAUNCH_FLAG;
>> +
>> +       sl_check_pmr_coverage(bootparams, PAGE_SIZE, false);
>> +
>> +       /* Place event log SL specific tags before and after measurements */
>> +       sl_tpm_extend_evtlog(17, TXT_EVTYPE_SLAUNCH_START, NULL, 0, "");
>> +
>> +       /* Process all policy entries and extend the measurements to the evtlog */
>> +       sl_process_extend_policy(slrt);
>> +
>> +       /* Process all EFI config entries and extend the measurements to the evtlog */
>> +       sl_process_extend_uefi_config(slrt);
>> +
>> +       sl_tpm_extend_evtlog(17, TXT_EVTYPE_SLAUNCH_END, NULL, 0, "");
>> +
>> +       /* No PMR check is needed; the TXT heap is covered by the DPR */
>> +       txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
>> +       os_mle_data = txt_os_mle_data_start(txt_heap);
>> +
>> +       /*
>> +        * Now that the OS-MLE data is measured, ensure the MTRR and
>> +        * misc enable MSRs are what we expect.
>> +        */
>> +       sl_txt_validate_msrs(os_mle_data);
>> +}
>> diff --git a/arch/x86/boot/compressed/sl_stub.S b/arch/x86/boot/compressed/sl_stub.S
>> new file mode 100644
>> index 000000000000..24b8f23d5dcc
>> --- /dev/null
>> +++ b/arch/x86/boot/compressed/sl_stub.S
>> @@ -0,0 +1,725 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +/*
>> + * Secure Launch protected mode entry point.
>> + *
>> + * Copyright (c) 2024, Oracle and/or its affiliates.
>> + */
>> +       .code32
>> +       .text
>> +#include <linux/linkage.h>
>> +#include <asm/segment.h>
>> +#include <asm/msr.h>
>> +#include <asm/apicdef.h>
>> +#include <asm/trapnr.h>
>> +#include <asm/processor-flags.h>
>> +#include <asm/asm-offsets.h>
>> +#include <asm/bootparam.h>
>> +#include <asm/page_types.h>
>> +#include <asm/irq_vectors.h>
>> +#include <linux/slr_table.h>
>> +#include <linux/slaunch.h>
>> +
>> +/* CPUID: leaf 1, ECX, SMX feature bit */
>> +#define X86_FEATURE_BIT_SMX    (1 << 6)
>> +
>> +#define IDT_VECTOR_LO_BITS     0
>> +#define IDT_VECTOR_HI_BITS     6
>> +
>> +/*
>> + * See the comment in head_64.S for detailed information on what this macro
>> + * and others like it are used for. The comment appears right at the top of
>> + * the file.
>> + */
>> +#define rva(X) ((X) - sl_stub_entry)
>> +
>> +/*
>> + * The GETSEC op code is open coded because older versions of
>> + * GCC do not support the getsec mnemonic.
>> + */
>> +.macro GETSEC leaf
>> +       pushl   %ebx
>> +       xorl    %ebx, %ebx      /* Must be zero for SMCTRL */
>> +       movl    \leaf, %eax     /* Leaf function */
>> +       .byte   0x0f, 0x37      /* GETSEC opcode */
>> +       popl    %ebx
>> +.endm
>> +
>> +.macro TXT_RESET error
>> +       /*
>> +        * Set a sticky error value and reset. Note the movs to %eax act as
>> +        * TXT register barriers.
>> +        */
>> +       movl    \error, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_ERRORCODE)
>> +       movl    (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_E2STS), %eax
>> +       movl    $1, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_CMD_NO_SECRETS)
>> +       movl    (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_E2STS), %eax
>> +       movl    $1, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_CMD_UNLOCK_MEM_CONFIG)
>> +       movl    (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_E2STS), %eax
>> +       movl    $1, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_CMD_RESET)
>> +1:
>> +       hlt
>> +       jmp     1b
>> +.endm
>> +
>> +       .code32
>> +SYM_FUNC_START(sl_stub_entry)
>> +       cli
>> +       cld
>> +
>> +       /*
>> +        * On entry, %ebx has the entry abs offset to sl_stub_entry. This
>> +        * will be correctly scaled using the rva macro and avoid causing
>> +        * relocations. Only %cs and %ds segments are known good.
> 
> Could you please clarify this? 'scaling' is unidiomatic in this
> context, and actually means something different in my book. AIUI, %ebx
> is guaranteed to carry the actual address of sl_stub_entry(), and
> rva() is used to generate relative references using %ebx as a base, as
> to avoid /absolute/ relocations, which would require fixups at
> runtime.

Yes, we will reword this to be clearer.

> 
> 
>> +        */
>> +
>> +       /* Load GDT, set segment regs and lret to __SL32_CS */
>> +       leal    rva(sl_gdt_desc)(%ebx), %eax
>> +       addl    %eax, 2(%eax)
>> +       lgdt    (%eax)
>> +
>> +       movl    $(__SL32_DS), %eax
>> +       movw    %ax, %ds
>> +       movw    %ax, %es
>> +       movw    %ax, %fs
>> +       movw    %ax, %gs
>> +       movw    %ax, %ss
>> +
>> +       /*
>> +        * Now that %ss is known good, take the first stack for the BSP. The
>> +        * AP stacks are only used on Intel.
>> +        */
>> +       leal    rva(sl_stacks_end)(%ebx), %esp
>> +
>> +       leal    rva(.Lsl_cs)(%ebx), %eax
>> +       pushl   $(__SL32_CS)
>> +       pushl   %eax
>> +       lret
>> +
>> +.Lsl_cs:
>> +       /* Save our base pointer reg and page table for MLE */
>> +       pushl   %ebx
>> +       pushl   %ecx
>> +
>> +       /* See if SMX feature is supported. */
>> +       movl    $1, %eax
>> +       cpuid
>> +       testl   $(X86_FEATURE_BIT_SMX), %ecx
>> +       jz      .Ldo_unknown_cpu
>> +
>> +       popl    %ecx
>> +       popl    %ebx
>> +
>> +       /* Now known to be Intel */
>> +       movl    $(SL_CPU_INTEL), rva(sl_cpu_type)(%ebx)
>> +
>> +       /* Locate the base of the MLE using the page tables in %ecx */
>> +       call    sl_find_mle_base
>> +
>> +       /* Increment CPU count for BSP */
>> +       incl    rva(sl_txt_cpu_count)(%ebx)
>> +
>> +       /*
>> +        * Enable SMIs, which were disabled by SENTER, with GETSEC[SMCTRL].
>> +        * NMIs were also disabled by SENTER. Since there is no IDT for the BSP,
>> +        * let the mainline kernel re-enable them in the normal course of
>> +        * booting.
>> +        */
>> +       GETSEC  $(SMX_X86_GETSEC_SMCTRL)
>> +
>> +       /* Clear the TXT error registers for a clean start of day */
>> +       movl    $0, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_ERRORCODE)
>> +       movl    $0xffffffff, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_ESTS)
>> +
>> +       /* On Intel, the zero page address is passed in the TXT heap */
>> +       /* Read physical base of heap into EAX */
>> +       movl    (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_HEAP_BASE), %eax
>> +       /* Read the size of the BIOS data into ECX (first 8 bytes) */
>> +       movl    (%eax), %ecx
>> +       /* Skip over BIOS data and size of OS to MLE data section */
>> +       leal    8(%eax, %ecx), %eax
>> +
>> +       /* Need to verify the values in the OS-MLE struct passed in */
>> +       call    sl_txt_verify_os_mle_struct
>> +
>> +       /*
>> +        * Get the boot params address from the heap. Note %esi and %ebx MUST
>> +        * be preserved across calls and operations.
>> +        */
>> +       movl    SL_boot_params_addr(%eax), %esi
>> +
>> +       /* Save %ebx so the APs can find their way home */
>> +       movl    %ebx, (SL_mle_scratch + SL_SCRATCH_AP_EBX)(%eax)
>> +
>> +       /* Fetch the AP wake code block address from the heap */
>> +       movl    SL_ap_wake_block(%eax), %edi
>> +       movl    %edi, rva(sl_txt_ap_wake_block)(%ebx)
>> +
>> +       /* Store the offset in the AP wake block to the jmp address */
>> +       movl    $(sl_ap_jmp_offset - sl_txt_ap_wake_begin), \
>> +               (SL_mle_scratch + SL_SCRATCH_AP_JMP_OFFSET)(%eax)
>> +
>> +       /* Store the offset in the AP wake block to the AP stacks block */
>> +       movl    $(sl_stacks - sl_txt_ap_wake_begin), \
>> +               (SL_mle_scratch + SL_SCRATCH_AP_STACKS_OFFSET)(%eax)
>> +
>> +       /* %eax is still the base of the OS-MLE block, save it */
>> +       pushl   %eax
>> +
>> +       /* Relocate the AP wake code to the safe block */
>> +       call    sl_txt_reloc_ap_wake
>> +
>> +       /*
>> +        * Wake up all APs that are blocked in the ACM and wait for them to
>> +        * halt. This should be done before restoring the MTRRs so the ACM is
>> +        * still properly in WB memory.
>> +        */
>> +       call    sl_txt_wake_aps
>> +
>> +       /* Restore OS-MLE in %eax */
>> +       popl    %eax
>> +
>> +       /*
>> +        * %edi is used by this routine to find the MTRRs which are in the SLRT
>> +        * in the Intel info.
>> +        */
>> +       movl    SL_txt_info(%eax), %edi
>> +       call    sl_txt_load_regs
>> +
>> +       jmp     .Lcpu_setup_done
>> +
>> +.Ldo_unknown_cpu:
>> +       /* Non-Intel CPUs are not yet supported */
>> +       ud2
>> +
>> +.Lcpu_setup_done:
>> +       /*
>> +        * Don't enable MCE at this point. The kernel will enable
>> +        * it on the BSP later when it is ready.
>> +        */
>> +
>> +       /* Done, jump to normal 32b pm entry */
>> +       jmp     startup_32
>> +SYM_FUNC_END(sl_stub_entry)
>> +
>> +SYM_FUNC_START(sl_find_mle_base)
>> +       /* %ecx has PDPT, get first PD */
>> +       movl    (%ecx), %eax
>> +       andl    $(PAGE_MASK), %eax
>> +       /* Get first PT from first PDE */
>> +       movl    (%eax), %eax
>> +       andl    $(PAGE_MASK), %eax
>> +       /* Get MLE base from first PTE */
>> +       movl    (%eax), %eax
>> +       andl    $(PAGE_MASK), %eax
>> +
>> +       movl    %eax, rva(sl_mle_start)(%ebx)
>> +       ret
>> +SYM_FUNC_END(sl_find_mle_base)
>> +
>> +SYM_FUNC_START(sl_check_buffer_mle_overlap)
>> +       /* %ecx: buffer begin %edx: buffer end */
>> +       /* %ebx: MLE begin %edi: MLE end */
>> +       /* %eax: region may be inside MLE */
>> +
>> +       cmpl    %edi, %ecx
>> +       jb      .Lnext_check
>> +       cmpl    %edi, %edx
>> +       jbe     .Lnext_check
>> +       jmp     .Lvalid /* Buffer above MLE */
>> +
>> +.Lnext_check:
>> +       cmpl    %ebx, %edx
>> +       ja      .Linside_check
>> +       cmpl    %ebx, %ecx
>> +       jae     .Linside_check
>> +       jmp     .Lvalid /* Buffer below MLE */
>> +
>> +.Linside_check:
>> +       cmpl    $0, %eax
>> +       jz      .Linvalid
>> +       cmpl    %ebx, %ecx
>> +       jb      .Linvalid
>> +       cmpl    %edi, %edx
>> +       ja      .Linvalid
>> +       jmp     .Lvalid /* Buffer in MLE */
>> +
>> +.Linvalid:
>> +       TXT_RESET $(SL_ERROR_MLE_BUFFER_OVERLAP)
>> +
>> +.Lvalid:
>> +       ret
>> +SYM_FUNC_END(sl_check_buffer_mle_overlap)
>> +
>> +SYM_FUNC_START(sl_txt_verify_os_mle_struct)
>> +       pushl   %ebx
>> +       /*
>> +        * %eax points to the base of the OS-MLE struct. Need to also
>> +        * read some values from the OS-SINIT struct too.
>> +        */
>> +       movl    -8(%eax), %ecx
>> +       /* Skip over OS to MLE data section and size of OS-SINIT structure */
>> +       leal    (%eax, %ecx), %edx
>> +
>> +       /* Load MLE image base absolute offset */
>> +       movl    rva(sl_mle_start)(%ebx), %ebx
>> +
>> +       /* Verify the value of the low PMR base. It should always be 0. */
>> +       movl    SL_vtd_pmr_lo_base(%edx), %esi
>> +       cmpl    $0, %esi
>> +       jz      .Lvalid_pmr_base
>> +       TXT_RESET $(SL_ERROR_LO_PMR_BASE)
>> +
>> +.Lvalid_pmr_base:
>> +       /* Grab some values from OS-SINIT structure */
>> +       movl    SL_mle_size(%edx), %edi
>> +       addl    %ebx, %edi
>> +       jc      .Loverflow_detected
>> +       movl    SL_vtd_pmr_lo_size(%edx), %esi
>> +
>> +       /* Check the AP wake block */
>> +       movl    SL_ap_wake_block(%eax), %ecx
>> +       movl    SL_ap_wake_block_size(%eax), %edx
>> +       addl    %ecx, %edx
>> +       jc      .Loverflow_detected
>> +       pushl   %eax
>> +       xorl    %eax, %eax
>> +       call    sl_check_buffer_mle_overlap
>> +       popl    %eax
>> +       cmpl    %esi, %edx
>> +       ja      .Lbuffer_beyond_pmr
>> +
>> +       /*
>> +        * Check the boot params. Note during a UEFI boot, the boot
>> +        * params will be inside the MLE image. Test for this case
>> +        * in the overlap case.
>> +        */
>> +       movl    SL_boot_params_addr(%eax), %ecx
>> +       movl    $(PAGE_SIZE), %edx
>> +       addl    %ecx, %edx
>> +       jc      .Loverflow_detected
>> +       pushl   %eax
>> +       movl    $1, %eax
>> +       call    sl_check_buffer_mle_overlap
>> +       popl    %eax
>> +       cmpl    %esi, %edx
>> +       ja      .Lbuffer_beyond_pmr
>> +
>> +       /* Check that the AP wake block is big enough */
>> +       cmpl    $(sl_txt_ap_wake_end - sl_txt_ap_wake_begin), \
>> +               SL_ap_wake_block_size(%eax)
>> +       jae     .Lwake_block_ok
>> +       TXT_RESET $(SL_ERROR_WAKE_BLOCK_TOO_SMALL)
>> +
>> +.Lwake_block_ok:
>> +       popl    %ebx
>> +       ret
>> +
>> +.Loverflow_detected:
>> +       TXT_RESET $(SL_ERROR_INTEGER_OVERFLOW)
>> +
>> +.Lbuffer_beyond_pmr:
>> +       TXT_RESET $(SL_ERROR_BUFFER_BEYOND_PMR)
>> +SYM_FUNC_END(sl_txt_verify_os_mle_struct)
>> +
>> +SYM_FUNC_START(sl_txt_ap_entry)
>> +       cli
>> +       cld
>> +       /*
>> +        * The %cs and %ds segments are known good after waking the AP.
>> +        * First order of business is to find where we are and
>> +        * save it in %ebx.
>> +        */
>> +
>> +       /* Read physical base of heap into EAX */
>> +       movl    (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_HEAP_BASE), %eax
>> +       /* Read the size of the BIOS data into ECX (first 8 bytes) */
>> +       movl    (%eax), %ecx
>> +       /* Skip over BIOS data and size of OS to MLE data section */
>> +       leal    8(%eax, %ecx), %eax
>> +
>> +       /* Load the %ebx value the BSP saved in the OS-MLE scratch area */
>> +       movl    (SL_mle_scratch + SL_SCRATCH_AP_EBX)(%eax), %ebx
>> +
>> +       /* Save TXT info ptr in %edi for call to sl_txt_load_regs */
>> +       movl    SL_txt_info(%eax), %edi
>> +
>> +       /* Lock and get our stack index */
>> +       movl    $1, %ecx
>> +.Lspin:
>> +       xorl    %eax, %eax
>> +       lock cmpxchgl   %ecx, rva(sl_txt_spin_lock)(%ebx)
>> +       pause
>> +       jnz     .Lspin
>> +
>> +       /* Increment the stack index and use the next value inside lock */
>> +       incl    rva(sl_txt_stack_index)(%ebx)
>> +       movl    rva(sl_txt_stack_index)(%ebx), %eax
>> +
>> +       /* Unlock */
>> +       movl    $0, rva(sl_txt_spin_lock)(%ebx)
>> +
>> +       /* Location of the relocated AP wake block */
>> +       movl    rva(sl_txt_ap_wake_block)(%ebx), %ecx
>> +
>> +       /* Load reloc GDT, set segment regs and lret to __SL32_CS */
>> +       lgdt    (sl_ap_gdt_desc - sl_txt_ap_wake_begin)(%ecx)
>> +
>> +       movl    $(__SL32_DS), %edx
>> +       movw    %dx, %ds
>> +       movw    %dx, %es
>> +       movw    %dx, %fs
>> +       movw    %dx, %gs
>> +       movw    %dx, %ss
>> +
>> +       /* Load our reloc AP stack */
>> +       movl    $(TXT_BOOT_STACK_SIZE), %edx
>> +       mull    %edx
>> +       leal    (sl_stacks_end - sl_txt_ap_wake_begin)(%ecx), %esp
>> +       subl    %eax, %esp
>> +
>> +       /* Switch to AP code segment */
>> +       leal    rva(.Lsl_ap_cs)(%ebx), %eax
>> +       pushl   $(__SL32_CS)
>> +       pushl   %eax
>> +       lret
>> +
>> +.Lsl_ap_cs:
>> +       /* Load the relocated AP IDT */
>> +       lidt    (sl_ap_idt_desc - sl_txt_ap_wake_begin)(%ecx)
>> +
>> +       /* Fixup MTRRs and misc enable MSR on APs too */
>> +       call    sl_txt_load_regs
>> +
>> +       /* Enable SMI with GETSEC[SMCTRL] */
>> +       GETSEC $(SMX_X86_GETSEC_SMCTRL)
>> +
>> +       /* IRET-to-self can be used to enable NMIs which SENTER disabled */
>> +       leal    rva(.Lnmi_enabled_ap)(%ebx), %eax
>> +       pushfl
>> +       pushl   $(__SL32_CS)
>> +       pushl   %eax
>> +       iret
>> +
>> +.Lnmi_enabled_ap:
>> +       /* Put APs in X2APIC mode like the BSP */
>> +       movl    $(MSR_IA32_APICBASE), %ecx
>> +       rdmsr
>> +       orl     $(XAPIC_ENABLE | X2APIC_ENABLE), %eax
>> +       wrmsr
>> +
>> +       /*
>> +        * Basically done, increment the CPU count and jump off to the AP
>> +        * wake block to wait.
>> +        */
>> +       lock incl       rva(sl_txt_cpu_count)(%ebx)
>> +
>> +       movl    rva(sl_txt_ap_wake_block)(%ebx), %eax
>> +       jmp     *%eax
>> +SYM_FUNC_END(sl_txt_ap_entry)
>> +
>> +SYM_FUNC_START(sl_txt_reloc_ap_wake)
>> +       /* Save boot params register */
>> +       pushl   %esi
>> +
>> +       movl    rva(sl_txt_ap_wake_block)(%ebx), %edi
>> +
>> +       /* Fixup AP IDT and GDT descriptor before relocating */
>> +       leal    rva(sl_ap_idt_desc)(%ebx), %eax
>> +       addl    %edi, 2(%eax)
>> +       leal    rva(sl_ap_gdt_desc)(%ebx), %eax
>> +       addl    %edi, 2(%eax)
>> +
>> +       /*
>> +        * Copy the AP wake code and AP GDT/IDT to the protected wake block
>> +        * provided by the loader. Destination already in %edi.
>> +        */
>> +       movl    $(sl_txt_ap_wake_end - sl_txt_ap_wake_begin), %ecx
>> +       leal    rva(sl_txt_ap_wake_begin)(%ebx), %esi
>> +       rep movsb
>> +
>> +       /* Setup the IDT for the APs to use in the relocation block */
>> +       movl    rva(sl_txt_ap_wake_block)(%ebx), %ecx
>> +       addl    $(sl_ap_idt - sl_txt_ap_wake_begin), %ecx
>> +       xorl    %edx, %edx
>> +
>> +       /* Form the default reset vector relocation address */
>> +       movl    rva(sl_txt_ap_wake_block)(%ebx), %esi
>> +       addl    $(sl_txt_int_reset - sl_txt_ap_wake_begin), %esi
>> +
>> +1:
>> +       cmpw    $(NR_VECTORS), %dx
>> +       jz      .Lap_idt_done
>> +
>> +       cmpw    $(X86_TRAP_NMI), %dx
>> +       jz      2f
>> +
>> +       /* Load all other fixed vectors with reset handler */
>> +       movl    %esi, %eax
>> +       movw    %ax, (IDT_VECTOR_LO_BITS)(%ecx)
>> +       shrl    $16, %eax
>> +       movw    %ax, (IDT_VECTOR_HI_BITS)(%ecx)
>> +       jmp     3f
>> +
>> +2:
>> +       /* Load single wake NMI IPI vector at the relocation address */
>> +       movl    rva(sl_txt_ap_wake_block)(%ebx), %eax
>> +       addl    $(sl_txt_int_nmi - sl_txt_ap_wake_begin), %eax
>> +       movw    %ax, (IDT_VECTOR_LO_BITS)(%ecx)
>> +       shrl    $16, %eax
>> +       movw    %ax, (IDT_VECTOR_HI_BITS)(%ecx)
>> +
>> +3:
>> +       incw    %dx
>> +       addl    $8, %ecx
>> +       jmp     1b
>> +
>> +.Lap_idt_done:
>> +       popl    %esi
>> +       ret
>> +SYM_FUNC_END(sl_txt_reloc_ap_wake)
>> +
>> +SYM_FUNC_START(sl_txt_load_regs)
>> +       /* Save base pointer register */
>> +       pushl   %ebx
>> +
>> +       /*
>> +        * On Intel, the original variable MTRRs and Misc Enable MSR are
>> +        * restored on the BSP at early boot. Each AP will also restore
>> +        * its MTRRs and Misc Enable MSR.
>> +        */
>> +       pushl   %edi
>> +       addl    $(SL_saved_bsp_mtrrs), %edi
>> +       movl    (%edi), %ebx
>> +       pushl   %ebx /* default_mem_type lo */
>> +       addl    $4, %edi
>> +       movl    (%edi), %ebx
>> +       pushl   %ebx /* default_mem_type hi */
>> +       addl    $4, %edi
>> +       movl    (%edi), %ebx /* mtrr_vcnt lo, don't care about hi part */
>> +       addl    $8, %edi /* now at MTRR pair array */
>> +       /* Write the variable MTRRs */
>> +       movl    $(MSR_MTRRphysBase0), %ecx
>> +1:
>> +       cmpl    $0, %ebx
>> +       jz      2f
>> +
>> +       movl    (%edi), %eax /* MTRRphysBaseX lo */
>> +       addl    $4, %edi
>> +       movl    (%edi), %edx /* MTRRphysBaseX hi */
>> +       wrmsr
>> +       addl    $4, %edi
>> +       incl    %ecx
>> +       movl    (%edi), %eax /* MTRRphysMaskX lo */
>> +       addl    $4, %edi
>> +       movl    (%edi), %edx /* MTRRphysMaskX hi */
>> +       wrmsr
>> +       addl    $4, %edi
>> +       incl    %ecx
>> +
>> +       decl    %ebx
>> +       jmp     1b
>> +2:
>> +       /* Write the default MTRR register */
>> +       popl    %edx
>> +       popl    %eax
>> +       movl    $(MSR_MTRRdefType), %ecx
>> +       wrmsr
>> +
>> +       /* Go back to the start of the TXT info and write the Misc Enable MSR */
>> +       popl    %edi
>> +       addl    $(SL_saved_misc_enable_msr), %edi
>> +       movl    (%edi), %eax /* saved_misc_enable_msr lo */
>> +       addl    $4, %edi
>> +       movl    (%edi), %edx /* saved_misc_enable_msr hi */
>> +       movl    $(MSR_IA32_MISC_ENABLE), %ecx
>> +       wrmsr
>> +
>> +       popl    %ebx
>> +       ret
>> +SYM_FUNC_END(sl_txt_load_regs)
>> +
>> +SYM_FUNC_START(sl_txt_wake_aps)
>> +       /* Save boot params register */
>> +       pushl   %esi
>> +
>> +       /* First setup the MLE join structure and load it into TXT reg */
>> +       leal    rva(sl_gdt)(%ebx), %eax
>> +       leal    rva(sl_txt_ap_entry)(%ebx), %ecx
>> +       leal    rva(sl_smx_rlp_mle_join)(%ebx), %edx
>> +       movl    %eax, SL_rlp_gdt_base(%edx)
>> +       movl    %ecx, SL_rlp_entry_point(%edx)
>> +       movl    %edx, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_MLE_JOIN)
>> +
>> +       /* Another TXT heap walk to find various values needed to wake APs */
>> +       movl    (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_HEAP_BASE), %eax
>> +       /* At BIOS data size, find the number of logical processors */
>> +       movl    (SL_num_logical_procs + 8)(%eax), %edx
>> +       /* Skip over BIOS data */
>> +       movl    (%eax), %ecx
>> +       addl    %ecx, %eax
>> +       /* Skip over OS to MLE */
>> +       movl    (%eax), %ecx
>> +       addl    %ecx, %eax
>> +       /* At OS-SINIT size, get capabilities to know how to wake up the APs */
>> +       movl    (SL_capabilities + 8)(%eax), %esi
>> +       /* Skip over OS to SINIT */
>> +       movl    (%eax), %ecx
>> +       addl    %ecx, %eax
>> +       /* At SINIT-MLE size, get the AP wake MONITOR address */
>> +       movl    (SL_rlp_wakeup_addr + 8)(%eax), %edi
>> +
>> +       /* Determine how to wake up the APs */
>> +       testl   $(1 << TXT_SINIT_MLE_CAP_WAKE_MONITOR), %esi
>> +       jz      .Lwake_getsec
>> +
>> +       /* Wake using MWAIT MONITOR */
>> +       movl    $1, (%edi)
>> +       jmp     .Laps_awake
>> +
>> +.Lwake_getsec:
>> +       /* Wake using GETSEC(WAKEUP) */
>> +       GETSEC  $(SMX_X86_GETSEC_WAKEUP)
>> +
>> +.Laps_awake:
>> +       /*
>> +        * All of the APs are woken up and rendezvous in the relocated wake
>> +        * block starting at sl_txt_ap_wake_begin. Wait for all of them to
>> +        * halt.
>> +        */
>> +       pause
>> +       cmpl    rva(sl_txt_cpu_count)(%ebx), %edx
>> +       jne     .Laps_awake
>> +
>> +       popl    %esi
>> +       ret
>> +SYM_FUNC_END(sl_txt_wake_aps)
>> +
>> +/* This is the beginning of the relocated AP wake code block */
>> +       .global sl_txt_ap_wake_begin
>> +sl_txt_ap_wake_begin:
>> +
>> +       /* Get the LAPIC ID for each AP and stash it on the stack */
>> +       movl    $(MSR_IA32_X2APIC_APICID), %ecx
>> +       rdmsr
>> +       pushl   %eax
>> +
>> +       /*
>> +        * Get a pointer to the monitor location on this AP's stack to test below
>> +        * after mwait returns. Currently %esp points to the pushed APIC ID value,
>> +        * 4 bytes below the top of the stack, so this is the stack's lowest dword.
>> +        */
>> +       movl    %esp, %eax
>> +       subl    $(TXT_BOOT_STACK_SIZE - 4), %eax
>> +       movl    $0, (%eax)
>> +
>> +       /* Clear ecx/edx so no invalid extensions or hints are passed to monitor */
>> +       xorl    %ecx, %ecx
>> +       xorl    %edx, %edx
>> +
>> +       /*
>> +        * Arm the monitor and wait for it to be poked by the SMP bring-up code. The mwait
>> +        * instruction can return for a number of reasons. Test to see if it returned
>> +        * because the monitor was written to.
>> +        */
>> +       monitor
>> +
>> +1:
>> +       mfence
>> +       mwait
>> +       movl    (%eax), %edx
>> +       testl   %edx, %edx
>> +       jz      1b
>> +
>> +       /*
>> +        * This is the long absolute jump to the 32b Secure Launch protected mode stub
>> +        * code in sl_trampoline_start32() in the rmpiggy. The jump address will be
>> +        * fixed in the SMP boot code when the first AP is brought up. This whole area
>> +        * is provided and protected in the memory map by the prelaunch code.
>> +        */
>> +       .byte   0xea
>> +sl_ap_jmp_offset:
>> +       .long   0x00000000
>> +       .word   __SL32_CS
>> +
>> +SYM_FUNC_START(sl_txt_int_nmi)
>> +       /* NMI context, just IRET */
>> +       iret
>> +SYM_FUNC_END(sl_txt_int_nmi)
>> +
>> +SYM_FUNC_START(sl_txt_int_reset)
>> +       TXT_RESET $(SL_ERROR_INV_AP_INTERRUPT)
>> +SYM_FUNC_END(sl_txt_int_reset)
>> +
>> +       .balign 8
>> +SYM_DATA_START_LOCAL(sl_ap_idt_desc)
>> +       .word   sl_ap_idt_end - sl_ap_idt - 1           /* Limit */
>> +       .long   sl_ap_idt - sl_txt_ap_wake_begin        /* Base */
>> +SYM_DATA_END_LABEL(sl_ap_idt_desc, SYM_L_LOCAL, sl_ap_idt_desc_end)
>> +
>> +       .balign 8
>> +SYM_DATA_START_LOCAL(sl_ap_idt)
>> +       .rept   NR_VECTORS
>> +       .word   0x0000          /* Offset 15 to 0 */
>> +       .word   __SL32_CS       /* Segment selector */
>> +       .word   0x8e00          /* Present, DPL=0, 32b Vector, Interrupt */
>> +       .word   0x0000          /* Offset 31 to 16 */
>> +       .endr
>> +SYM_DATA_END_LABEL(sl_ap_idt, SYM_L_LOCAL, sl_ap_idt_end)
>> +
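Each sl_ap_idt entry is a 32-bit interrupt gate whose handler offset is left as zero in the listing. Presumably the offsets are filled in later, for example with sl_txt_int_nmi or sl_txt_int_reset, but that code is not in this excerpt. With NR_VECTORS = 256 the IDT limit is 256 * 8 - 1 = 0x7ff. A sketch of how one gate packs together, in the same word order as the .rept block; idt_gate32() is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define NR_VECTORS	256	/* x86 interrupt vector count */
#define IDT_ENTRY_SIZE	8	/* 32-bit gate descriptor */
#define IDT_LIMIT	(NR_VECTORS * IDT_ENTRY_SIZE - 1)

static_assert(IDT_LIMIT == 0x7ff, "limit is size - 1");

/* Build one 32-bit interrupt gate: offset 15:0, selector, 0x8e00, offset 31:16. */
static uint64_t idt_gate32(uint32_t handler, uint16_t selector)
{
	return (uint64_t)(handler & 0xffffu)		/* offset 15:0 */
	     | (uint64_t)selector << 16		/* code segment selector */
	     | (uint64_t)0x8e00u << 32		/* P=1, DPL=0, type 0xE */
	     | (uint64_t)(handler >> 16) << 48;	/* offset 31:16 */
}
```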
>> +       .balign 8
>> +SYM_DATA_START_LOCAL(sl_ap_gdt_desc)
>> +       .word   sl_ap_gdt_end - sl_ap_gdt - 1
>> +       .long   sl_ap_gdt - sl_txt_ap_wake_begin
>> +SYM_DATA_END_LABEL(sl_ap_gdt_desc, SYM_L_LOCAL, sl_ap_gdt_desc_end)
>> +
>> +       .balign 8
>> +SYM_DATA_START_LOCAL(sl_ap_gdt)
>> +       .quad   0x0000000000000000      /* NULL */
>> +       .quad   0x00cf9a000000ffff      /* __SL32_CS */
>> +       .quad   0x00cf92000000ffff      /* __SL32_DS */
>> +SYM_DATA_END_LABEL(sl_ap_gdt, SYM_L_LOCAL, sl_ap_gdt_end)
>> +
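Both GDTs use the same flat descriptors. Decoding 0x00cf9a000000ffff gives base 0 and limit 0xfffff with G=1 and D/B=1 (flags 0xc), i.e. a 4 GiB 32-bit segment, plus access byte 0x9a (present, DPL 0, execute/read code). 0x00cf92000000ffff differs only in its access byte, 0x92 (read/write data). Entries 1 and 2 correspond to selectors 0x08 and 0x10, which is presumably what __SL32_CS and __SL32_DS are defined as; their definitions are not in this excerpt. A decoding sketch with illustrative names:

```c
#include <assert.h>
#include <stdint.h>

#define GDT_SEL(index)	((index) << 3)	/* TI=0, RPL=0 */

struct seg_desc {
	uint32_t base;
	uint32_t limit;		/* raw 20-bit limit */
	uint8_t access;		/* P, DPL, S, type */
	uint8_t flags;		/* G, D/B, L, AVL */
};

/* Unpack a legacy 8-byte code/data segment descriptor. */
static struct seg_desc decode_seg_desc(uint64_t d)
{
	struct seg_desc s;

	s.limit = (uint32_t)(d & 0xffff) | (uint32_t)((d >> 48) & 0xf) << 16;
	s.base = (uint32_t)((d >> 16) & 0xffffff) | (uint32_t)(d >> 56) << 24;
	s.access = (uint8_t)(d >> 40);
	s.flags = (uint8_t)((d >> 52) & 0xf);
	return s;
}

/* Segment size in bytes; G=1 scales the limit by 4 KiB pages. */
static uint64_t seg_size(struct seg_desc s)
{
	uint64_t units = (uint64_t)s.limit + 1;

	return (s.flags & 0x8) ? units << 12 : units;
}
```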
>> +       /* Small stacks for BSP and APs to work with */
>> +       .balign 64
>> +SYM_DATA_START_LOCAL(sl_stacks)
>> +       .fill (TXT_MAX_CPUS * TXT_BOOT_STACK_SIZE), 1, 0
>> +SYM_DATA_END_LABEL(sl_stacks, SYM_L_LOCAL, sl_stacks_end)
>> +
>> +/* This is the end of the relocated AP wake code block */
>> +       .global sl_txt_ap_wake_end
>> +sl_txt_ap_wake_end:
>> +
>> +       .data
>> +       .balign 8
>> +SYM_DATA_START_LOCAL(sl_gdt_desc)
>> +       .word   sl_gdt_end - sl_gdt - 1
>> +       .long   sl_gdt - sl_gdt_desc
>> +SYM_DATA_END_LABEL(sl_gdt_desc, SYM_L_LOCAL, sl_gdt_desc_end)
>> +
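sl_ap_idt_desc, sl_ap_gdt_desc and sl_gdt_desc store their base as an offset (from sl_txt_ap_wake_begin, or from sl_gdt_desc itself) rather than as an absolute address. This suggests the bases are fixed up at run time once the real load or relocation address is known; the fixup code is not part of this excerpt. A sketch of that step under that assumption, with desc_ptr_relocate() as a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* 6-byte operand of lgdt/lidt in 32-bit mode: 16-bit limit, 32-bit base. */
struct desc_ptr32 {
	uint16_t limit;
	uint32_t base;
} __attribute__((packed));

static_assert(sizeof(struct desc_ptr32) == 6, "lgdt/lidt operand is 6 bytes");

/* Hypothetical fixup: turn a link-time relative base into an absolute one
 * by adding the runtime address it was measured from. */
static void desc_ptr_relocate(struct desc_ptr32 *d, uint32_t runtime_anchor)
{
	d->base += runtime_anchor;
}
```

For sl_gdt_desc the anchor would be the runtime address of sl_gdt_desc itself; for the AP descriptors it would be where the wake block was copied.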
>> +       .balign 8
>> +SYM_DATA_START_LOCAL(sl_gdt)
>> +       .quad   0x0000000000000000      /* NULL */
>> +       .quad   0x00cf9a000000ffff      /* __SL32_CS */
>> +       .quad   0x00cf92000000ffff      /* __SL32_DS */
>> +SYM_DATA_END_LABEL(sl_gdt, SYM_L_LOCAL, sl_gdt_end)
>> +
>> +       .balign 8
>> +SYM_DATA_START_LOCAL(sl_smx_rlp_mle_join)
>> +       .long   sl_gdt_end - sl_gdt - 1 /* GDT limit */
>> +       .long   0x00000000              /* GDT base */
>> +       .long   __SL32_CS       /* Seg Sel - CS (DS, ES, SS = seg_sel+8) */
>> +       .long   0x00000000      /* Entry point physical address */
>> +SYM_DATA_END(sl_smx_rlp_mle_join)
>> +
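The four dwords of sl_smx_rlp_mle_join are consumed by position, and asm-offsets.c exports SL_rlp_gdt_base and SL_rlp_entry_point so the assembly can fill in the GDT base and entry point. A C view of the layout the listing implies. The struct name and the rlp_gdt_limit/rlp_seg_sel field names are hypothetical; the real definition lives in a header not shown here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct smx_rlp_mle_join_sketch {
	uint32_t rlp_gdt_limit;		/* hypothetical name */
	uint32_t rlp_gdt_base;
	uint32_t rlp_seg_sel;		/* hypothetical name; CS selector */
	uint32_t rlp_entry_point;	/* physical address */
};

static_assert(offsetof(struct smx_rlp_mle_join_sketch, rlp_gdt_base) == 4,
	      "GDT base at +4");
static_assert(offsetof(struct smx_rlp_mle_join_sketch, rlp_entry_point) == 12,
	      "entry point at +12");

/* Per the listing comment, DS, ES and SS use seg_sel + 8. */
static uint32_t rlp_data_sel(const struct smx_rlp_mle_join_sketch *j)
{
	return j->rlp_seg_sel + 8;
}
```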
>> +SYM_DATA(sl_cpu_type, .long 0x00000000)
>> +
>> +SYM_DATA(sl_mle_start, .long 0x00000000)
>> +
>> +SYM_DATA_LOCAL(sl_txt_spin_lock, .long 0x00000000)
>> +
>> +SYM_DATA_LOCAL(sl_txt_stack_index, .long 0x00000000)
>> +
>> +SYM_DATA_LOCAL(sl_txt_cpu_count, .long 0x00000000)
>> +
>> +SYM_DATA_LOCAL(sl_txt_ap_wake_block, .long 0x00000000)
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index e022e6eb766c..37f6167f28ba 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -348,6 +348,9 @@
>>   #define MSR_IA32_RTIT_OUTPUT_BASE      0x00000560
>>   #define MSR_IA32_RTIT_OUTPUT_MASK      0x00000561
>>
>> +#define MSR_MTRRphysBase0              0x00000200
>> +#define MSR_MTRRphysMask0              0x00000201
>> +
>>   #define MSR_MTRRfix64K_00000           0x00000250
>>   #define MSR_MTRRfix16K_80000           0x00000258
>>   #define MSR_MTRRfix16K_A0000           0x00000259
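MSR_MTRRphysBase0/MSR_MTRRphysMask0 are only the first pair of the variable-range MTRRs. Pair n sits at 0x200 + 2n and 0x201 + 2n, and IA32_MTRRCAP bits 7:0 report how many pairs exist. Presumably that is how the saved_bsp_mtrrs state exported through asm-offsets.c gets walked, although the walking code is not in this excerpt. A sketch of the addressing and of the range size a valid mask describes (macro and function names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define MSR_MTRRphysBase0	0x00000200
#define MSR_MTRRphysMask0	0x00000201

/* Variable-range MTRRs come in base/mask pairs at a stride of 2. */
#define MTRR_PHYS_BASE(n)	(MSR_MTRRphysBase0 + 2 * (n))
#define MTRR_PHYS_MASK(n)	(MSR_MTRRphysMask0 + 2 * (n))
#define MTRR_PHYS_MASK_VALID	(1ULL << 11)

/* Bytes covered by a valid variable range, given MAXPHYADDR bits. */
static uint64_t mtrr_range_size(uint64_t mask, unsigned int phys_bits)
{
	uint64_t addr_mask;

	assert(phys_bits > 12 && phys_bits < 64);
	addr_mask = ((1ULL << phys_bits) - 1) & ~0xfffULL;
	return (~mask & addr_mask) + 0x1000;
}
```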
>> @@ -849,6 +852,8 @@
>>   #define MSR_IA32_APICBASE_ENABLE       (1<<11)
>>   #define MSR_IA32_APICBASE_BASE         (0xfffff<<12)
>>
>> +#define MSR_IA32_X2APIC_APICID         0x00000802
>> +
>>   #define MSR_IA32_UCODE_WRITE           0x00000079
>>   #define MSR_IA32_UCODE_REV             0x0000008b
>>
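MSR_IA32_X2APIC_APICID follows the usual x2APIC mapping, MSR 0x800 + (xAPIC MMIO offset >> 4), with the APIC ID register at MMIO offset 0x20. One point worth keeping in mind for sl_txt_ap_wake_begin is that, per the Intel SDM, x2APIC MSRs are only accessible with x2APIC mode enabled (IA32_APIC_BASE bit 10); otherwise rdmsr raises #GP. A sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X2APIC_MSR_BASE		0x800u
#define APIC_ID_MMIO_OFFSET	0x20u		/* xAPIC ID register */
#define APICBASE_X2APIC_ENABLE	(1u << 10)	/* IA32_APIC_BASE.EXTD */

static_assert(X2APIC_MSR_BASE + (APIC_ID_MMIO_OFFSET >> 4) == 0x802,
	      "matches MSR_IA32_X2APIC_APICID");

/* x2APIC MSR index for a given xAPIC MMIO register offset. */
static uint32_t x2apic_msr(uint32_t mmio_offset)
{
	return X2APIC_MSR_BASE + (mmio_offset >> 4);
}

/* Reading MSR 0x802 is only valid when this returns true. */
static bool x2apic_mode(uint64_t apicbase)
{
	return (apicbase & APICBASE_X2APIC_ENABLE) != 0;
}

/* In xAPIC mode the 8-bit ID sits in bits 31:24 of the MMIO register. */
static uint32_t xapic_id(uint32_t id_reg)
{
	return id_reg >> 24;
}
```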
>> diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
>> index 9b82eebd7add..7ce283a22d6b 100644
>> --- a/arch/x86/include/uapi/asm/bootparam.h
>> +++ b/arch/x86/include/uapi/asm/bootparam.h
>> @@ -12,6 +12,7 @@
>>   /* loadflags */
>>   #define LOADED_HIGH    (1<<0)
>>   #define KASLR_FLAG     (1<<1)
>> +#define SLAUNCH_FLAG   (1<<2)
>>   #define QUIET_FLAG     (1<<5)
>>   #define KEEP_SEGMENTS  (1<<6)
>>   #define CAN_USE_HEAP   (1<<7)
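SLAUNCH_FLAG takes bit 2 of loadflags, which the neighbouring definitions leave unused. Per the boot.rst text it is kernel internal: the setup kernel sets it and kernel proper reads it, rather than the boot loader. A minimal sketch of both sides, using a one-field stand-in for struct setup_header (the struct and function names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOADED_HIGH	(1 << 0)
#define KASLR_FLAG	(1 << 1)
#define SLAUNCH_FLAG	(1 << 2)
#define QUIET_FLAG	(1 << 5)

/* Minimal stand-in for the one setup_header field used here. */
struct setup_header_sketch {
	uint8_t loadflags;
};

/* Setup kernel side: record that this boot came through Secure Launch. */
static void mark_secure_launch(struct setup_header_sketch *hdr)
{
	hdr->loadflags |= SLAUNCH_FLAG;
}

/* Kernel proper side: test the flag. */
static bool secure_launch_enabled(const struct setup_header_sketch *hdr)
{
	return (hdr->loadflags & SLAUNCH_FLAG) != 0;
}
```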
>> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
>> index a98020bf31bb..925adce6e2c7 100644
>> --- a/arch/x86/kernel/asm-offsets.c
>> +++ b/arch/x86/kernel/asm-offsets.c
>> @@ -13,6 +13,8 @@
>>   #include <linux/hardirq.h>
>>   #include <linux/suspend.h>
>>   #include <linux/kbuild.h>
>> +#include <linux/slr_table.h>
>> +#include <linux/slaunch.h>
>>   #include <asm/processor.h>
>>   #include <asm/thread_info.h>
>>   #include <asm/sigframe.h>
>> @@ -120,4 +122,22 @@ static void __used common(void)
>>          OFFSET(ARIA_CTX_rounds, aria_ctx, rounds);
>>   #endif
>>
>> +#ifdef CONFIG_SECURE_LAUNCH
>> +       BLANK();
>> +       OFFSET(SL_txt_info, txt_os_mle_data, txt_info);
>> +       OFFSET(SL_mle_scratch, txt_os_mle_data, mle_scratch);
>> +       OFFSET(SL_boot_params_addr, txt_os_mle_data, boot_params_addr);
>> +       OFFSET(SL_ap_wake_block, txt_os_mle_data, ap_wake_block);
>> +       OFFSET(SL_ap_wake_block_size, txt_os_mle_data, ap_wake_block_size);
>> +       OFFSET(SL_saved_misc_enable_msr, slr_entry_intel_info, saved_misc_enable_msr);
>> +       OFFSET(SL_saved_bsp_mtrrs, slr_entry_intel_info, saved_bsp_mtrrs);
>> +       OFFSET(SL_num_logical_procs, txt_bios_data, num_logical_procs);
>> +       OFFSET(SL_capabilities, txt_os_sinit_data, capabilities);
>> +       OFFSET(SL_mle_size, txt_os_sinit_data, mle_size);
>> +       OFFSET(SL_vtd_pmr_lo_base, txt_os_sinit_data, vtd_pmr_lo_base);
>> +       OFFSET(SL_vtd_pmr_lo_size, txt_os_sinit_data, vtd_pmr_lo_size);
>> +       OFFSET(SL_rlp_wakeup_addr, txt_sinit_mle_data, rlp_wakeup_addr);
>> +       OFFSET(SL_rlp_gdt_base, smx_rlp_mle_join, rlp_gdt_base);
>> +       OFFSET(SL_rlp_entry_point, smx_rlp_mle_join, rlp_entry_point);
>> +#endif
>>   }
>> --
>> 2.39.3
>>
Ross Philipson June 4, 2024, 5:24 p.m. UTC | #3
On 5/31/24 6:33 AM, Ard Biesheuvel wrote:
> On Fri, 31 May 2024 at 13:00, Ard Biesheuvel <ardb@kernel.org> wrote:
>>
>> Hello Ross,
>>
>> On Fri, 31 May 2024 at 03:32, Ross Philipson <ross.philipson@oracle.com> wrote:
>>>
>>> The Secure Launch (SL) stub provides the entry point for Intel TXT (and
>>> later AMD SKINIT) to vector to during the late launch. The symbol
>>> sl_stub_entry is that entry point and its offset into the kernel is
>>> conveyed to the launching code using the MLE (Measured Launch
>>> Environment) header in the structure named mle_header. The offset of the
>>> MLE header is set in the kernel_info. The routine sl_stub contains the
>>> very early late launch setup code responsible for setting up the basic
>>> environment to allow the normal kernel startup_32 code to proceed. It is
>>> also responsible for properly waking and handling the APs on Intel
>>> platforms. The routine sl_main, which runs after entering 64b mode, is
>>> responsible for measuring configuration and module information, such as
>>> the boot params, the kernel command line, the TXT heap, an external
>>> initramfs, etc., before it is used.
>>>
>>> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
>>> ---
>>>   Documentation/arch/x86/boot.rst        |  21 +
>>>   arch/x86/boot/compressed/Makefile      |   3 +-
>>>   arch/x86/boot/compressed/head_64.S     |  30 +
>>>   arch/x86/boot/compressed/kernel_info.S |  34 ++
>>>   arch/x86/boot/compressed/sl_main.c     | 577 ++++++++++++++++++++
>>>   arch/x86/boot/compressed/sl_stub.S     | 725 +++++++++++++++++++++++++
>>>   arch/x86/include/asm/msr-index.h       |   5 +
>>>   arch/x86/include/uapi/asm/bootparam.h  |   1 +
>>>   arch/x86/kernel/asm-offsets.c          |  20 +
>>>   9 files changed, 1415 insertions(+), 1 deletion(-)
>>>   create mode 100644 arch/x86/boot/compressed/sl_main.c
>>>   create mode 100644 arch/x86/boot/compressed/sl_stub.S
>>>
>>> diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst
>>> index 4fd492cb4970..295cdf9bcbdb 100644
>>> --- a/Documentation/arch/x86/boot.rst
>>> +++ b/Documentation/arch/x86/boot.rst
>>> @@ -482,6 +482,14 @@ Protocol:  2.00+
>>>              - If 1, KASLR enabled.
>>>              - If 0, KASLR disabled.
>>>
>>> +  Bit 2 (kernel internal): SLAUNCH_FLAG
>>> +
>>> +       - Used internally by the setup kernel to communicate
>>> +         Secure Launch status to kernel proper.
>>> +
>>> +           - If 1, Secure Launch enabled.
>>> +           - If 0, Secure Launch disabled.
>>> +
>>>     Bit 5 (write): QUIET_FLAG
>>>
>>>          - If 0, print early messages.
>>> @@ -1028,6 +1036,19 @@ Offset/size:     0x000c/4
>>>
>>>     This field contains maximal allowed type for setup_data and setup_indirect structs.
>>>
>>> +============   =================
>>> +Field name:    mle_header_offset
>>> +Offset/size:   0x0010/4
>>> +============   =================
>>> +
>>> +  This field contains the offset to the Secure Launch Measured Launch Environment
>>> +  (MLE) header. This offset is used to locate information needed during a secure
>>> +  late launch using Intel TXT. If the offset is zero, the kernel does not have
>>> +  Secure Launch capabilities. The MLE entry point is called from TXT on the BSP
>>> +  following a successful measured launch. The specific state of the processors is
>>> +  outlined in the TXT Software Development Guide; the latest version can be found here:
>>> +  https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
>>> +
>>>
>>
>> Could we just repaint this field as the offset relative to the start
>> of kernel_info rather than relative to the start of the image? That
>> way, there is no need for patch #1, and given that the consumer of
>> this field accesses it via kernel_info, I wouldn't expect any issues
>> in applying this offset to obtain the actual address.
>>
>>
>>>   The Image Checksum
>>>   ==================
>>> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
>>> index 9189a0e28686..9076a248d4b4 100644
>>> --- a/arch/x86/boot/compressed/Makefile
>>> +++ b/arch/x86/boot/compressed/Makefile
>>> @@ -118,7 +118,8 @@ vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
>>>   vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_mixed.o
>>>   vmlinux-objs-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
>>>
>>> -vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o
>>> +vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o \
>>> +       $(obj)/sl_main.o $(obj)/sl_stub.o
>>>
>>>   $(obj)/vmlinux: $(vmlinux-objs-y) FORCE
>>>          $(call if_changed,ld)
>>> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
>>> index 1dcb794c5479..803c9e2e6d85 100644
>>> --- a/arch/x86/boot/compressed/head_64.S
>>> +++ b/arch/x86/boot/compressed/head_64.S
>>> @@ -420,6 +420,13 @@ SYM_CODE_START(startup_64)
>>>          pushq   $0
>>>          popfq
>>>
>>> +#ifdef CONFIG_SECURE_LAUNCH
>>> +       /* Ensure the relocation region is coverd by a PMR */
>>
>> covered
>>
>>> +       movq    %rbx, %rdi
>>> +       movl    $(_bss - startup_32), %esi
>>> +       callq   sl_check_region
>>> +#endif
>>> +
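sl_check_region() is called with a base and a length to ensure a region is covered by a PMR. Its body is not in this excerpt, so the following is only a sketch of the kind of containment test it implies. It assumes a single low PMR described by a base and size (as vtd_pmr_lo_base/vtd_pmr_lo_size in asm-offsets.c suggest) and writes the comparison so it cannot overflow; region_in_pmr() is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [base, base + size) lies inside [pmr_base, pmr_base + pmr_size).
 * No sum is ever formed, so values near UINT64_MAX cannot wrap. */
static bool region_in_pmr(uint64_t base, uint64_t size,
			  uint64_t pmr_base, uint64_t pmr_size)
{
	if (size == 0)
		return false;		/* treat an empty region as invalid */
	if (base < pmr_base)
		return false;		/* starts below the PMR */
	if (base - pmr_base > pmr_size)
		return false;		/* starts beyond the PMR */
	return size <= pmr_size - (base - pmr_base);
}
```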
>>>   /*
>>>    * Copy the compressed kernel to the end of our buffer
>>>    * where decompression in place becomes safe.
>>> @@ -462,6 +469,29 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
>>>          shrq    $3, %rcx
>>>          rep     stosq
>>>
>>> +#ifdef CONFIG_SECURE_LAUNCH
>>> +       /*
>>> +        * Have to do the final early sl stub work in 64b area.
>>> +        *
>>> +        * *********** NOTE ***********
>>> +        *
>>> +        * Several boot params get used before we get a chance to measure
>>> +        * them in this call. This is a known issue and we currently don't
>>> +        * have a solution. The scratch field doesn't matter. There is no
>>> +        * obvious way to do anything about the use of kernel_alignment or
>>> +        * init_size though these seem low risk with all the PMR and overlap
>>> +        * checks in place.
>>> +        */
>>> +       movq    %r15, %rdi
>>> +       callq   sl_main
>>> +
>>> +       /* Ensure the decompression location is covered by a PMR */
>>> +       movq    %rbp, %rdi
>>> +       movq    output_len(%rip), %rsi
>>> +       callq   sl_check_region
>>> +#endif
>>> +
>>> +       pushq   %rsi
>>
>> This looks like a rebase error.
>>
>>>          call    load_stage2_idt
>>>
>>>          /* Pass boot_params to initialize_identity_maps() */
>>> diff --git a/arch/x86/boot/compressed/kernel_info.S b/arch/x86/boot/compressed/kernel_info.S
>>> index c18f07181dd5..e199b87764e9 100644
>>> --- a/arch/x86/boot/compressed/kernel_info.S
>>> +++ b/arch/x86/boot/compressed/kernel_info.S
>>> @@ -28,6 +28,40 @@ SYM_DATA_START(kernel_info)
>>>          /* Maximal allowed type for setup_data and setup_indirect structs. */
>>>          .long   SETUP_TYPE_MAX
>>>
>>> +       /* Offset to the MLE header structure */
>>> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
>>> +       .long   rva(mle_header)
>>
>> ... so this could just be mle_header - kernel_info, and the consumer
>> can do the math instead.
>>
>>> +#else
>>> +       .long   0
>>> +#endif
>>> +
>>>   kernel_info_var_len_data:
>>>          /* Empty for time being... */
>>>   SYM_DATA_END_LABEL(kernel_info, SYM_L_LOCAL, kernel_info_end)
>>> +
>>> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
>>> +       /*
>>> +        * The MLE Header per the TXT Specification, section 2.1
>>> +        * MLE capabilities, see table 4. Capabilities set:
>>> +        * bit 0: Support for GETSEC[WAKEUP] for RLP wakeup
>>> +        * bit 1: Support for RLP wakeup using MONITOR address
>>> +        * bit 2: The ECX register will contain the pointer to the MLE page table
>>> +        * bit 5: TPM 1.2 family: Details/authorities PCR usage support
>>> +        * bit 9: Supported format of TPM 2.0 event log - TCG compliant
>>> +        */
>>> +SYM_DATA_START(mle_header)
>>> +       .long   0x9082ac5a  /* UUID0 */
>>> +       .long   0x74a7476f  /* UUID1 */
>>> +       .long   0xa2555c0f  /* UUID2 */
>>> +       .long   0x42b651cb  /* UUID3 */
>>> +       .long   0x00000034  /* MLE header size */
>>> +       .long   0x00020002  /* MLE version 2.2 */
>>> +       .long   rva(sl_stub_entry) /* Linear entry point of MLE (virt. address) */
>>
>> and these should perhaps be relative to mle_header?
>>
>>> +       .long   0x00000000  /* First valid page of MLE */
>>> +       .long   0x00000000  /* Offset within binary of first byte of MLE */
>>> +       .long   rva(_edata) /* Offset within binary of last byte + 1 of MLE */
>>
>> and here
>>
> 
> Ugh never mind - these are specified externally.

Can you clarify your follow-on comment here?

Thank you,
Ross
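For reference, the externally specified MLE header fields quoted above have fixed positions. 0x34 is 13 dwords (a 16-byte UUID followed by nine 32-bit fields), 0x00020002 encodes version 2.2 as major << 16 | minor, and the capability bits listed in the comment add up to 0x227. The capabilities dword itself is not visible in the quoted hunk, so that value is only what the comment implies. A C view of the layout, with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Field order follows the TXT MLE header; struct and field names are
 * hypothetical. */
struct mle_header_sketch {
	uint32_t uuid[4];
	uint32_t header_len;		/* 0x34 */
	uint32_t version;		/* major << 16 | minor */
	uint32_t entry_point;		/* rva(sl_stub_entry) */
	uint32_t first_valid_page;
	uint32_t mle_start;
	uint32_t mle_end;		/* rva(_edata) */
	uint32_t capabilities;
	uint32_t cmdline_start;
	uint32_t cmdline_end;
};

static_assert(sizeof(struct mle_header_sketch) == 0x34,
	      "13 dwords, matching the 0x34 header size in mle_header");

/* Capability bits listed in the mle_header comment. */
#define MLE_CAP_GETSEC_WAKEUP	(1u << 0)
#define MLE_CAP_MONITOR_WAKEUP	(1u << 1)
#define MLE_CAP_ECX_PGTBL	(1u << 2)
#define MLE_CAP_TPM12_PCR	(1u << 5)
#define MLE_CAP_TPM20_TCG_LOG	(1u << 9)

#define MLE_CAPS (MLE_CAP_GETSEC_WAKEUP | MLE_CAP_MONITOR_WAKEUP | \
		  MLE_CAP_ECX_PGTBL | MLE_CAP_TPM12_PCR | MLE_CAP_TPM20_TCG_LOG)

static unsigned int mle_version_major(uint32_t v) { return v >> 16; }
static unsigned int mle_version_minor(uint32_t v) { return v & 0xffff; }
```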
Ard Biesheuvel June 4, 2024, 5:27 p.m. UTC | #4
On Tue, 4 Jun 2024 at 19:24, <ross.philipson@oracle.com> wrote:
>
> On 5/31/24 6:33 AM, Ard Biesheuvel wrote:
> > On Fri, 31 May 2024 at 13:00, Ard Biesheuvel <ardb@kernel.org> wrote:
> >>
> >> Hello Ross,
> >>
> >> On Fri, 31 May 2024 at 03:32, Ross Philipson <ross.philipson@oracle.com> wrote:
> >>>
> >>> diff --git a/arch/x86/boot/compressed/kernel_info.S b/arch/x86/boot/compressed/kernel_info.S
> >>> index c18f07181dd5..e199b87764e9 100644
> >>> --- a/arch/x86/boot/compressed/kernel_info.S
> >>> +++ b/arch/x86/boot/compressed/kernel_info.S
> >>> @@ -28,6 +28,40 @@ SYM_DATA_START(kernel_info)
> >>>          /* Maximal allowed type for setup_data and setup_indirect structs. */
> >>>          .long   SETUP_TYPE_MAX
> >>>
> >>> +       /* Offset to the MLE header structure */
> >>> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
> >>> +       .long   rva(mle_header)
> >>
> >> ... so this could just be mle_header - kernel_info, and the consumer
> >> can do the math instead.
> >>
> >>> +#else
> >>> +       .long   0
> >>> +#endif
> >>> +
> >>>   kernel_info_var_len_data:
> >>>          /* Empty for time being... */
> >>>   SYM_DATA_END_LABEL(kernel_info, SYM_L_LOCAL, kernel_info_end)
> >>> +
> >>> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
> >>> +       /*
> >>> +        * The MLE Header per the TXT Specification, section 2.1
> >>> +        * MLE capabilities, see table 4. Capabilities set:
> >>> +        * bit 0: Support for GETSEC[WAKEUP] for RLP wakeup
> >>> +        * bit 1: Support for RLP wakeup using MONITOR address
> >>> +        * bit 2: The ECX register will contain the pointer to the MLE page table
> >>> +        * bit 5: TPM 1.2 family: Details/authorities PCR usage support
> >>> +        * bit 9: Supported format of TPM 2.0 event log - TCG compliant
> >>> +        */
> >>> +SYM_DATA_START(mle_header)
> >>> +       .long   0x9082ac5a  /* UUID0 */
> >>> +       .long   0x74a7476f  /* UUID1 */
> >>> +       .long   0xa2555c0f  /* UUID2 */
> >>> +       .long   0x42b651cb  /* UUID3 */
> >>> +       .long   0x00000034  /* MLE header size */
> >>> +       .long   0x00020002  /* MLE version 2.2 */
> >>> +       .long   rva(sl_stub_entry) /* Linear entry point of MLE (virt. address) */
> >>
> >> and these should perhaps be relative to mle_header?
> >>
> >>> +       .long   0x00000000  /* First valid page of MLE */
> >>> +       .long   0x00000000  /* Offset within binary of first byte of MLE */
> >>> +       .long   rva(_edata) /* Offset within binary of last byte + 1 of MLE */
> >>
> >> and here
> >>
> >
> > Ugh never mind - these are specified externally.
>
> Can you clarify your follow on comment here?
>

I noticed that, as you pointed out in your previous reply, these
fields cannot be repainted at will, as they are defined by an external
specification.

I'll play a bit more with this code tomorrow - I would *really* like
to avoid the need for patch #1, as it adds another constraint on how
we construct the boot image, and this is already riddled with legacy
and other complications.
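Ard's suggestion changes only the base a consumer adds mle_header_offset to. A sketch of a loader-side lookup supporting both readings. It assumes `image` points at the protected-mode image that rva() offsets are relative to, that kernel_info starts with its "LToP" magic, and that mle_header_offset sits at offset 0x10 as the proposed boot.rst entry says. find_mle_header() is hypothetical, and the UUID check only compares the first dword (0x9082ac5a):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KI_MAGIC		"LToP"
#define KI_MLE_HDR_OFF_POS	0x10	/* mle_header_offset field */
#define MLE_UUID0		0x9082ac5au

static uint32_t rd32le(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * ki_off: offset of kernel_info within the image.
 * rel_to_ki == 0: v9 semantics, offset relative to the image start (rva()).
 * rel_to_ki != 0: proposed semantics, offset relative to kernel_info.
 */
static const uint8_t *find_mle_header(const uint8_t *image, size_t len,
				      uint32_t ki_off, int rel_to_ki)
{
	const uint8_t *ki;
	uint64_t pos;
	uint32_t off;

	if (ki_off > len || len - ki_off < KI_MLE_HDR_OFF_POS + 4)
		return NULL;
	ki = image + ki_off;
	if (memcmp(ki, KI_MAGIC, 4) != 0)
		return NULL;

	off = rd32le(ki + KI_MLE_HDR_OFF_POS);
	if (off == 0)
		return NULL;	/* kernel built without Secure Launch */

	pos = (uint64_t)(rel_to_ki ? ki_off : 0) + off;
	if (pos > len || len - pos < 4)
		return NULL;
	if (rd32le(image + pos) != MLE_UUID0)
		return NULL;
	return image + pos;
}
```

Either way the consumer ends up at the same header; the kernel_info-relative form simply avoids tying the value to the image start.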
Ross Philipson June 4, 2024, 5:31 p.m. UTC | #5
On 5/31/24 7:04 AM, Ard Biesheuvel wrote:
> On Fri, 31 May 2024 at 15:33, Ard Biesheuvel <ardb@kernel.org> wrote:
>>
>> On Fri, 31 May 2024 at 13:00, Ard Biesheuvel <ardb@kernel.org> wrote:
>>>
>>> Hello Ross,
>>>
>>> On Fri, 31 May 2024 at 03:32, Ross Philipson <ross.philipson@oracle.com> wrote:
>>>>
>>>> diff --git a/arch/x86/boot/compressed/kernel_info.S b/arch/x86/boot/compressed/kernel_info.S
>>>> index c18f07181dd5..e199b87764e9 100644
>>>> --- a/arch/x86/boot/compressed/kernel_info.S
>>>> +++ b/arch/x86/boot/compressed/kernel_info.S
>>>> @@ -28,6 +28,40 @@ SYM_DATA_START(kernel_info)
>>>>          /* Maximal allowed type for setup_data and setup_indirect structs. */
>>>>          .long   SETUP_TYPE_MAX
>>>>
>>>> +       /* Offset to the MLE header structure */
>>>> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
>>>> +       .long   rva(mle_header)
>>>
>>> ... so this could just be mle_header - kernel_info, and the consumer
>>> can do the math instead.
>>>
>>>> +#else
>>>> +       .long   0
>>>> +#endif
>>>> +
>>>>   kernel_info_var_len_data:
>>>>          /* Empty for time being... */
>>>>   SYM_DATA_END_LABEL(kernel_info, SYM_L_LOCAL, kernel_info_end)
>>>> +
>>>> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
>>>> +       /*
>>>> +        * The MLE Header per the TXT Specification, section 2.1
>>>> +        * MLE capabilities, see table 4. Capabilities set:
>>>> +        * bit 0: Support for GETSEC[WAKEUP] for RLP wakeup
>>>> +        * bit 1: Support for RLP wakeup using MONITOR address
>>>> +        * bit 2: The ECX register will contain the pointer to the MLE page table
>>>> +        * bit 5: TPM 1.2 family: Details/authorities PCR usage support
>>>> +        * bit 9: Supported format of TPM 2.0 event log - TCG compliant
>>>> +        */
>>>> +SYM_DATA_START(mle_header)
>>>> +       .long   0x9082ac5a  /* UUID0 */
>>>> +       .long   0x74a7476f  /* UUID1 */
>>>> +       .long   0xa2555c0f  /* UUID2 */
>>>> +       .long   0x42b651cb  /* UUID3 */
>>>> +       .long   0x00000034  /* MLE header size */
>>>> +       .long   0x00020002  /* MLE version 2.2 */
>>>> +       .long   rva(sl_stub_entry) /* Linear entry point of MLE (virt. address) */
>>>
>>> and these should perhaps be relative to mle_header?
>>>
>>>> +       .long   0x00000000  /* First valid page of MLE */
>>>> +       .long   0x00000000  /* Offset within binary of first byte of MLE */
>>>> +       .long   rva(_edata) /* Offset within binary of last byte + 1 of MLE */
>>>
>>> and here
>>>
>>
>> Ugh never mind - these are specified externally.
> 
> OK, so instead of patch #1, please use the linker script to generate
> these constants.
> 
> I.e., add this to arch/x86/boot/compressed/vmlinux.lds.S
> 
> #ifdef CONFIG_SECURE_LAUNCH
> PROVIDE(mle_header_offset       = mle_header - startup_32);
> PROVIDE(sl_stub_entry_offset    = sl_stub_entry - startup_32);
> PROVIDE(_edata_offset           = _edata - startup_32);
> #endif
> 
> and use the symbols on the left hand side in the code.
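To make the consumer side concrete, here is a minimal C sketch of how a loader could turn either kind of offset into a pointer. The first helper takes an image-relative value such as the proposed mle_header_offset (mle_header - startup_32). The second takes the kernel_info-relative value suggested earlier in the thread. The function names and the flat in-memory image are hypothetical and are not taken from any existing loader. A value of zero is treated as "no Secure Launch support", as the boot.rst text in the patch describes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: offset measured from the start of the image (startup_32). */
static const uint8_t *mle_from_image_offset(const uint8_t *image_base,
					    uint32_t offset)
{
	/* Zero means the kernel was built without Secure Launch. */
	return offset ? image_base + offset : NULL;
}

/* Hypothetical helper: offset measured from the start of kernel_info itself. */
static const uint8_t *mle_from_kernel_info_offset(const uint8_t *kernel_info,
						  uint32_t offset)
{
	/* Same zero convention; only the reference point differs. */
	return offset ? kernel_info + offset : NULL;
}
```

Both conventions locate the same header. The kernel_info-relative one needs only the address the loader already used to read kernel_info. The image-relative one also needs the image base.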

Hmm, that is an interesting approach we had not considered, but we surely
will now. We are by no means wedded to keeping patch #1. Thank you
for your suggestions.

Ross
Ross Philipson June 4, 2024, 5:33 p.m. UTC | #6
On 6/4/24 10:27 AM, Ard Biesheuvel wrote:
> On Tue, 4 Jun 2024 at 19:24, <ross.philipson@oracle.com> wrote:
>>
>> On 5/31/24 6:33 AM, Ard Biesheuvel wrote:
>>> On Fri, 31 May 2024 at 13:00, Ard Biesheuvel <ardb@kernel.org> wrote:
>>>>
>>>> Hello Ross,
>>>>
>>>> On Fri, 31 May 2024 at 03:32, Ross Philipson <ross.philipson@oracle.com> wrote:
>>>>>
>>>>> The Secure Launch (SL) stub provides the entry point for Intel TXT (and
>>>>> later AMD SKINIT) to vector to during the late launch. The symbol
>>>>> sl_stub_entry is that entry point and its offset into the kernel is
>>>>> conveyed to the launching code using the MLE (Measured Launch
>>>>> Environment) header in the structure named mle_header. The offset of the
>>>>> MLE header is set in the kernel_info. The routine sl_stub contains the
>>>>> very early late launch setup code responsible for setting up the basic
>>>>> environment to allow the normal kernel startup_32 code to proceed. It is
>>>>> also responsible for properly waking and handling the APs on Intel
>>>>> platforms. The routine sl_main which runs after entering 64b mode is
>>>>> responsible for measuring configuration and module information before
>>>>> it is used like the boot params, the kernel command line, the TXT heap,
>>>>> an external initramfs, etc.
>>>>>
>>>>> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
>>>>> ---
>>>>>    Documentation/arch/x86/boot.rst        |  21 +
>>>>>    arch/x86/boot/compressed/Makefile      |   3 +-
>>>>>    arch/x86/boot/compressed/head_64.S     |  30 +
>>>>>    arch/x86/boot/compressed/kernel_info.S |  34 ++
>>>>>    arch/x86/boot/compressed/sl_main.c     | 577 ++++++++++++++++++++
>>>>>    arch/x86/boot/compressed/sl_stub.S     | 725 +++++++++++++++++++++++++
>>>>>    arch/x86/include/asm/msr-index.h       |   5 +
>>>>>    arch/x86/include/uapi/asm/bootparam.h  |   1 +
>>>>>    arch/x86/kernel/asm-offsets.c          |  20 +
>>>>>    9 files changed, 1415 insertions(+), 1 deletion(-)
>>>>>    create mode 100644 arch/x86/boot/compressed/sl_main.c
>>>>>    create mode 100644 arch/x86/boot/compressed/sl_stub.S
>>>>>
>>>>> diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst
>>>>> index 4fd492cb4970..295cdf9bcbdb 100644
>>>>> --- a/Documentation/arch/x86/boot.rst
>>>>> +++ b/Documentation/arch/x86/boot.rst
>>>>> @@ -482,6 +482,14 @@ Protocol:  2.00+
>>>>>               - If 1, KASLR enabled.
>>>>>               - If 0, KASLR disabled.
>>>>>
>>>>> +  Bit 2 (kernel internal): SLAUNCH_FLAG
>>>>> +
>>>>> +       - Used internally by the setup kernel to communicate
>>>>> +         Secure Launch status to kernel proper.
>>>>> +
>>>>> +           - If 1, Secure Launch enabled.
>>>>> +           - If 0, Secure Launch disabled.
>>>>> +
>>>>>      Bit 5 (write): QUIET_FLAG
>>>>>
>>>>>           - If 0, print early messages.
>>>>> @@ -1028,6 +1036,19 @@ Offset/size:     0x000c/4
>>>>>
>>>>>      This field contains maximal allowed type for setup_data and setup_indirect structs.
>>>>>
>>>>> +============   =================
>>>>> +Field name:    mle_header_offset
>>>>> +Offset/size:   0x0010/4
>>>>> +============   =================
>>>>> +
>>>>> +  This field contains the offset to the Secure Launch Measured Launch Environment
>>>>> +  (MLE) header. This offset is used to locate information needed during a secure
>>>>> +  late launch using Intel TXT. If the offset is zero, the kernel does not have
>>>>> +  Secure Launch capabilities. The MLE entry point is called from TXT on the BSP
>>>>> +  following a success measured launch. The specific state of the processors is
>>>>> +  outlined in the TXT Software Development Guide, the latest can be found here:
>>>>> +  https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
>>>>> +
>>>>>
>>>>
>>>> Could we just repaint this field as the offset relative to the start
>>>> of kernel_info rather than relative to the start of the image? That
>>>> way, there is no need for patch #1, and given that the consumer of
>>>> this field accesses it via kernel_info, I wouldn't expect any issues
>>>> in applying this offset to obtain the actual address.
>>>>
>>>>
>>>>>    The Image Checksum
>>>>>    ==================
>>>>> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
>>>>> index 9189a0e28686..9076a248d4b4 100644
>>>>> --- a/arch/x86/boot/compressed/Makefile
>>>>> +++ b/arch/x86/boot/compressed/Makefile
>>>>> @@ -118,7 +118,8 @@ vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
>>>>>    vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_mixed.o
>>>>>    vmlinux-objs-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
>>>>>
>>>>> -vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o
>>>>> +vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o \
>>>>> +       $(obj)/sl_main.o $(obj)/sl_stub.o
>>>>>
>>>>>    $(obj)/vmlinux: $(vmlinux-objs-y) FORCE
>>>>>           $(call if_changed,ld)
>>>>> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
>>>>> index 1dcb794c5479..803c9e2e6d85 100644
>>>>> --- a/arch/x86/boot/compressed/head_64.S
>>>>> +++ b/arch/x86/boot/compressed/head_64.S
>>>>> @@ -420,6 +420,13 @@ SYM_CODE_START(startup_64)
>>>>>           pushq   $0
>>>>>           popfq
>>>>>
>>>>> +#ifdef CONFIG_SECURE_LAUNCH
>>>>> +       /* Ensure the relocation region is coverd by a PMR */
>>>>
>>>> covered
>>>>
>>>>> +       movq    %rbx, %rdi
>>>>> +       movl    $(_bss - startup_32), %esi
>>>>> +       callq   sl_check_region
>>>>> +#endif
>>>>> +
>>>>>    /*
>>>>>     * Copy the compressed kernel to the end of our buffer
>>>>>     * where decompression in place becomes safe.
>>>>> @@ -462,6 +469,29 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
>>>>>           shrq    $3, %rcx
>>>>>           rep     stosq
>>>>>
>>>>> +#ifdef CONFIG_SECURE_LAUNCH
>>>>> +       /*
>>>>> +        * Have to do the final early sl stub work in 64b area.
>>>>> +        *
>>>>> +        * *********** NOTE ***********
>>>>> +        *
>>>>> +        * Several boot params get used before we get a chance to measure
>>>>> +        * them in this call. This is a known issue and we currently don't
>>>>> +        * have a solution. The scratch field doesn't matter. There is no
>>>>> +        * obvious way to do anything about the use of kernel_alignment or
>>>>> +        * init_size though these seem low risk with all the PMR and overlap
>>>>> +        * checks in place.
>>>>> +        */
>>>>> +       movq    %r15, %rdi
>>>>> +       callq   sl_main
>>>>> +
>>>>> +       /* Ensure the decompression location is covered by a PMR */
>>>>> +       movq    %rbp, %rdi
>>>>> +       movq    output_len(%rip), %rsi
>>>>> +       callq   sl_check_region
>>>>> +#endif
>>>>> +
>>>>> +       pushq   %rsi
>>>>
>>>> This looks like a rebase error.
>>>>
>>>>>           call    load_stage2_idt
>>>>>
>>>>>           /* Pass boot_params to initialize_identity_maps() */
>>>>> diff --git a/arch/x86/boot/compressed/kernel_info.S b/arch/x86/boot/compressed/kernel_info.S
>>>>> index c18f07181dd5..e199b87764e9 100644
>>>>> --- a/arch/x86/boot/compressed/kernel_info.S
>>>>> +++ b/arch/x86/boot/compressed/kernel_info.S
>>>>> @@ -28,6 +28,40 @@ SYM_DATA_START(kernel_info)
>>>>>           /* Maximal allowed type for setup_data and setup_indirect structs. */
>>>>>           .long   SETUP_TYPE_MAX
>>>>>
>>>>> +       /* Offset to the MLE header structure */
>>>>> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
>>>>> +       .long   rva(mle_header)
>>>>
>>>> ... so this could just be mle_header - kernel_info, and the consumer
>>>> can do the math instead.
>>>>
>>>>> +#else
>>>>> +       .long   0
>>>>> +#endif
>>>>> +
>>>>>    kernel_info_var_len_data:
>>>>>           /* Empty for time being... */
>>>>>    SYM_DATA_END_LABEL(kernel_info, SYM_L_LOCAL, kernel_info_end)
>>>>> +
>>>>> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
>>>>> +       /*
>>>>> +        * The MLE Header per the TXT Specification, section 2.1
>>>>> +        * MLE capabilities, see table 4. Capabilities set:
>>>>> +        * bit 0: Support for GETSEC[WAKEUP] for RLP wakeup
>>>>> +        * bit 1: Support for RLP wakeup using MONITOR address
>>>>> +        * bit 2: The ECX register will contain the pointer to the MLE page table
>>>>> +        * bit 5: TPM 1.2 family: Details/authorities PCR usage support
>>>>> +        * bit 9: Supported format of TPM 2.0 event log - TCG compliant
>>>>> +        */
>>>>> +SYM_DATA_START(mle_header)
>>>>> +       .long   0x9082ac5a  /* UUID0 */
>>>>> +       .long   0x74a7476f  /* UUID1 */
>>>>> +       .long   0xa2555c0f  /* UUID2 */
>>>>> +       .long   0x42b651cb  /* UUID3 */
>>>>> +       .long   0x00000034  /* MLE header size */
>>>>> +       .long   0x00020002  /* MLE version 2.2 */
>>>>> +       .long   rva(sl_stub_entry) /* Linear entry point of MLE (virt. address) */
>>>>
>>>> and these should perhaps be relative to mle_header?
>>>>
>>>>> +       .long   0x00000000  /* First valid page of MLE */
>>>>> +       .long   0x00000000  /* Offset within binary of first byte of MLE */
>>>>> +       .long   rva(_edata) /* Offset within binary of last byte + 1 of MLE */
>>>>
>>>> and here
>>>>
>>>
>>> Ugh never mind - these are specified externally.
>>
>> Can you clarify your follow-on comment here?
>>
> 
> I noticed that, as you pointed out in your previous reply, these
> fields cannot be repainted at will, since they are defined by an external
> specification.
> 
> I'll play a bit more with this code tomorrow - I would *really* like
> to avoid the need for patch #1, as it adds another constraint on how
> we construct the boot image, and this is already riddled with legacy
> and other complications.

Yeah, I should have read through all your replies before responding
to the first one, but I think it clarified things, as you point out
here. We appreciate your help and suggestions.

Ross
Jarkko Sakkinen June 4, 2024, 6:18 p.m. UTC | #7
On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
> From: Arvind Sankar <nivedita@alum.mit.edu>
>
> There are use cases for storing the offset of a symbol in kernel_info.
> For example, the trenchboot series [0] needs to store the offset of the
> Measured Launch Environment header in kernel_info.

So either there are other use cases that you should enumerate, or just
be straight and state that this is done for Trenchboot.

I believe the latter is the case, and there is no reason to speculate
further. If it does not otherwise interfere with the kernel, that alone
should be fine.

Also, I believe it is written as Trenchboot, without "series" ;-)
When writing a commit message, keep in mind that it will some day be
part of the commit log, not a series flying in the air.

Sorry for the nitpicks, but it is better to be precise and, that way,
also as transparent as possible, right?

>
> Since commit (note: commit ID from tip/master)
>
> commit 527afc212231 ("x86/boot: Check that there are no run-time relocations")
>
> run-time relocations are not allowed in the compressed kernel, so simply
> using the symbol in kernel_info, as
>
> 	.long	symbol
>
> will cause a linker error because this is not position-independent.
>
> With kernel_info being a separate object file and in a different section
> from startup_32, there is no way to calculate the offset of a symbol
> from the start of the image in a position-independent way.
>
> To enable such use cases, put kernel_info into its own section which is

"To allow Trenchboot to access the fields of kernel_info..."

Much more understandable.

> placed at a predetermined offset (KERNEL_INFO_OFFSET) via the linker
> script. This will allow calculating the symbol offset in a
> position-independent way, by adding the offset from the start of
> kernel_info to KERNEL_INFO_OFFSET.
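The arithmetic described in this paragraph can be modelled in C, where offsetof() plays the role of the assembler-time difference "symbol - kernel_info". Both terms are build-time constants, so their sum needs no run-time relocation. The section layout, the DEMO_ names and the 0x1000 value are hypothetical. The real KERNEL_INFO_OFFSET would come from the linker script.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of a section that begins with kernel_info. */
struct demo_section {
	uint8_t kernel_info[0x40];	/* stand-in for the kernel_info struct */
	uint8_t mle_header[0x34];	/* symbol whose image offset we want */
};

/* Hypothetical fixed placement of kernel_info from the image start. */
#define DEMO_KERNEL_INFO_OFFSET 0x1000u

/*
 * Image-relative offset = fixed section placement + distance from
 * kernel_info. offsetof() is a compile-time constant, just as
 * "mle_header - kernel_info" is an assembler-time constant.
 */
static const uint32_t mle_header_image_offset =
	DEMO_KERNEL_INFO_OFFSET + offsetof(struct demo_section, mle_header);
```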
>
> Ensure that kernel_info is aligned, and use the SYM_DATA.* macros
> instead of bare labels. This stores the size of the kernel_info
> structure in the ELF symbol table.

Aligned to which boundary? Add a short explanation of why that boundary,
i.e. state the obvious if you bring it up here anyway.

This seems to be progressing pretty well, so I am taking out my magnifying
glass and looking into the nitty-gritty details...

BR, Jarkko
Jarkko Sakkinen June 4, 2024, 6:21 p.m. UTC | #8
On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
> Introduce the Secure Launch Resource Table which forms the formal
> interface between the pre and post launch code.
>
> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>

If this is uarch-specific, I'd appreciate an Intel SDM reference here,
at section granularity, so that I can look it up and compare.

BR, Jarkko
Jarkko Sakkinen June 4, 2024, 7:56 p.m. UTC | #9
On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
> The Secure Launch (SL) stub provides the entry point for Intel TXT (and
> later AMD SKINIT) to vector to during the late launch. The symbol
> sl_stub_entry is that entry point and its offset into the kernel is
> conveyed to the launching code using the MLE (Measured Launch
> Environment) header in the structure named mle_header. The offset of the
> MLE header is set in the kernel_info. The routine sl_stub contains the
> very early late launch setup code responsible for setting up the basic
> environment to allow the normal kernel startup_32 code to proceed. It is
> also responsible for properly waking and handling the APs on Intel
> platforms. The routine sl_main which runs after entering 64b mode is
> responsible for measuring configuration and module information before
> it is used like the boot params, the kernel command line, the TXT heap,
> an external initramfs, etc.
>
> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
> ---
>  Documentation/arch/x86/boot.rst        |  21 +
>  arch/x86/boot/compressed/Makefile      |   3 +-
>  arch/x86/boot/compressed/head_64.S     |  30 +
>  arch/x86/boot/compressed/kernel_info.S |  34 ++
>  arch/x86/boot/compressed/sl_main.c     | 577 ++++++++++++++++++++
>  arch/x86/boot/compressed/sl_stub.S     | 725 +++++++++++++++++++++++++
>  arch/x86/include/asm/msr-index.h       |   5 +
>  arch/x86/include/uapi/asm/bootparam.h  |   1 +
>  arch/x86/kernel/asm-offsets.c          |  20 +
>  9 files changed, 1415 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/boot/compressed/sl_main.c
>  create mode 100644 arch/x86/boot/compressed/sl_stub.S
>
> diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst
> index 4fd492cb4970..295cdf9bcbdb 100644
> --- a/Documentation/arch/x86/boot.rst
> +++ b/Documentation/arch/x86/boot.rst
> @@ -482,6 +482,14 @@ Protocol:	2.00+
>  	    - If 1, KASLR enabled.
>  	    - If 0, KASLR disabled.
>  
> +  Bit 2 (kernel internal): SLAUNCH_FLAG
> +
> +	- Used internally by the setup kernel to communicate
> +	  Secure Launch status to kernel proper.
> +
> +	    - If 1, Secure Launch enabled.
> +	    - If 0, Secure Launch disabled.
> +
>    Bit 5 (write): QUIET_FLAG
>  
>  	- If 0, print early messages.
> @@ -1028,6 +1036,19 @@ Offset/size:	0x000c/4
>  
>    This field contains maximal allowed type for setup_data and setup_indirect structs.
>  
> +============	=================
> +Field name:	mle_header_offset
> +Offset/size:	0x0010/4
> +============	=================
> +
> +  This field contains the offset to the Secure Launch Measured Launch Environment
> +  (MLE) header. This offset is used to locate information needed during a secure
> +  late launch using Intel TXT. If the offset is zero, the kernel does not have
> +  Secure Launch capabilities. The MLE entry point is called from TXT on the BSP
> +  following a success measured launch. The specific state of the processors is
> +  outlined in the TXT Software Development Guide, the latest can be found here:
> +  https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
> +
>  
>  The Image Checksum
>  ==================
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 9189a0e28686..9076a248d4b4 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -118,7 +118,8 @@ vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
>  vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_mixed.o
>  vmlinux-objs-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
>  
> -vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o
> +vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o \
> +	$(obj)/sl_main.o $(obj)/sl_stub.o
>  
>  $(obj)/vmlinux: $(vmlinux-objs-y) FORCE
>  	$(call if_changed,ld)
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 1dcb794c5479..803c9e2e6d85 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -420,6 +420,13 @@ SYM_CODE_START(startup_64)
>  	pushq	$0
>  	popfq
>  
> +#ifdef CONFIG_SECURE_LAUNCH
> +	/* Ensure the relocation region is coverd by a PMR */
> +	movq	%rbx, %rdi
> +	movl	$(_bss - startup_32), %esi
> +	callq	sl_check_region
> +#endif
> +
>  /*
>   * Copy the compressed kernel to the end of our buffer
>   * where decompression in place becomes safe.
> @@ -462,6 +469,29 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
>  	shrq	$3, %rcx
>  	rep	stosq
>  
> +#ifdef CONFIG_SECURE_LAUNCH
> +	/*
> +	 * Have to do the final early sl stub work in 64b area.
> +	 *
> +	 * *********** NOTE ***********
> +	 *
> +	 * Several boot params get used before we get a chance to measure
> +	 * them in this call. This is a known issue and we currently don't
> +	 * have a solution. The scratch field doesn't matter. There is no
> +	 * obvious way to do anything about the use of kernel_alignment or
> +	 * init_size though these seem low risk with all the PMR and overlap
> +	 * checks in place.
> +	 */
> +	movq	%r15, %rdi
> +	callq	sl_main
> +
> +	/* Ensure the decompression location is covered by a PMR */
> +	movq	%rbp, %rdi
> +	movq	output_len(%rip), %rsi
> +	callq	sl_check_region
> +#endif
> +
> +	pushq	%rsi
>  	call	load_stage2_idt
>  
>  	/* Pass boot_params to initialize_identity_maps() */
> diff --git a/arch/x86/boot/compressed/kernel_info.S b/arch/x86/boot/compressed/kernel_info.S
> index c18f07181dd5..e199b87764e9 100644
> --- a/arch/x86/boot/compressed/kernel_info.S
> +++ b/arch/x86/boot/compressed/kernel_info.S
> @@ -28,6 +28,40 @@ SYM_DATA_START(kernel_info)
>  	/* Maximal allowed type for setup_data and setup_indirect structs. */
>  	.long	SETUP_TYPE_MAX
>  
> +	/* Offset to the MLE header structure */
> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
> +	.long	rva(mle_header)
> +#else
> +	.long	0
> +#endif
> +
>  kernel_info_var_len_data:
>  	/* Empty for time being... */
>  SYM_DATA_END_LABEL(kernel_info, SYM_L_LOCAL, kernel_info_end)
> +
> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
> +	/*
> +	 * The MLE Header per the TXT Specification, section 2.1
> +	 * MLE capabilities, see table 4. Capabilities set:
> +	 * bit 0: Support for GETSEC[WAKEUP] for RLP wakeup
> +	 * bit 1: Support for RLP wakeup using MONITOR address
> +	 * bit 2: The ECX register will contain the pointer to the MLE page table
> +	 * bit 5: TPM 1.2 family: Details/authorities PCR usage support
> +	 * bit 9: Supported format of TPM 2.0 event log - TCG compliant
> +	 */
> +SYM_DATA_START(mle_header)
> +	.long	0x9082ac5a  /* UUID0 */
> +	.long	0x74a7476f  /* UUID1 */
> +	.long	0xa2555c0f  /* UUID2 */
> +	.long	0x42b651cb  /* UUID3 */
> +	.long	0x00000034  /* MLE header size */
> +	.long	0x00020002  /* MLE version 2.2 */
> +	.long	rva(sl_stub_entry) /* Linear entry point of MLE (virt. address) */
> +	.long	0x00000000  /* First valid page of MLE */
> +	.long	0x00000000  /* Offset within binary of first byte of MLE */
> +	.long	rva(_edata) /* Offset within binary of last byte + 1 of MLE */
> +	.long	0x00000227  /* Bit vector of MLE-supported capabilities */
> +	.long	0x00000000  /* Starting linear address of command line (unused) */
> +	.long	0x00000000  /* Ending linear address of command line (unused) */
> +SYM_DATA_END(mle_header)
> +#endif
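As a cross-check on the mle_header block above, this C sketch mirrors its 13 32-bit fields and rebuilds the capability word from the bits listed in the comment. The struct and macro names are hypothetical. Field order and meanings are taken only from the quoted comments and have not been checked independently against the TXT specification.

```c
#include <stdint.h>

/* Field order follows the quoted mle_header; names are hypothetical. */
struct demo_mle_header {
	uint32_t uuid[4];
	uint32_t header_len;		/* 0x34 */
	uint32_t version;		/* 0x00020002 */
	uint32_t entry_point;		/* rva(sl_stub_entry) */
	uint32_t first_valid_page;
	uint32_t mle_start;
	uint32_t mle_end;		/* rva(_edata) */
	uint32_t capabilities;		/* 0x227 */
	uint32_t cmdline_start;		/* unused */
	uint32_t cmdline_end;		/* unused */
};

/* 13 fields x 4 bytes = 52 = 0x34, matching the "MLE header size" field. */
_Static_assert(sizeof(struct demo_mle_header) == 0x34, "MLE header must be 0x34 bytes");

/* Capability bits named in the comment above the header. */
#define DEMO_MLE_CAP_GETSEC_WAKEUP	(1u << 0)
#define DEMO_MLE_CAP_MONITOR_WAKEUP	(1u << 1)
#define DEMO_MLE_CAP_ECX_PGTBL		(1u << 2)
#define DEMO_MLE_CAP_TPM12_DA_PCRS	(1u << 5)
#define DEMO_MLE_CAP_TPM20_TCG_LOG	(1u << 9)

#define DEMO_MLE_CAPS (DEMO_MLE_CAP_GETSEC_WAKEUP | DEMO_MLE_CAP_MONITOR_WAKEUP | \
		       DEMO_MLE_CAP_ECX_PGTBL | DEMO_MLE_CAP_TPM12_DA_PCRS | \
		       DEMO_MLE_CAP_TPM20_TCG_LOG)

/* 0x00020002 read as major.minor = 2.2 (upper and lower 16 bits). */
static const uint32_t demo_mle_version = 0x00020002u;
```

The OR of the five listed bits is 0x1 | 0x2 | 0x4 | 0x20 | 0x200 = 0x227, which matches the capabilities word in the header.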
> diff --git a/arch/x86/boot/compressed/sl_main.c b/arch/x86/boot/compressed/sl_main.c
> new file mode 100644
> index 000000000000..61e9baf410fd
> --- /dev/null
> +++ b/arch/x86/boot/compressed/sl_main.c
> @@ -0,0 +1,577 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Secure Launch early measurement and validation routines.
> + *
> + * Copyright (c) 2024, Oracle and/or its affiliates.
> + */
> +
> +#include <linux/init.h>
> +#include <linux/string.h>
> +#include <linux/linkage.h>
> +#include <asm/segment.h>
> +#include <asm/boot.h>
> +#include <asm/msr.h>
> +#include <asm/mtrr.h>
> +#include <asm/processor-flags.h>
> +#include <asm/asm-offsets.h>
> +#include <asm/bootparam.h>
> +#include <asm/bootparam_utils.h>
> +#include <linux/slr_table.h>
> +#include <linux/slaunch.h>
> +#include <crypto/sha1.h>
> +#include <crypto/sha2.h>
> +
> +#define CAPS_VARIABLE_MTRR_COUNT_MASK	0xff
> +
> +#define SL_TPM12_LOG		1
> +#define SL_TPM20_LOG		2
> +
> +#define SL_TPM20_MAX_ALGS	2
> +
> +#define SL_MAX_EVENT_DATA	64
> +#define SL_TPM12_LOG_SIZE	(sizeof(struct tcg_pcr_event) + \
> +				SL_MAX_EVENT_DATA)
> +#define SL_TPM20_LOG_SIZE	(sizeof(struct tcg_pcr_event2_head) + \
> +				SHA1_DIGEST_SIZE + SHA256_DIGEST_SIZE + \
> +				sizeof(struct tcg_event_field) + \
> +				SL_MAX_EVENT_DATA)
> +
> +static void *evtlog_base;
> +static u32 evtlog_size;
> +static struct txt_heap_event_log_pointer2_1_element *log20_elem;
> +static u32 tpm_log_ver = SL_TPM12_LOG;
> +static struct tcg_efi_specid_event_algs tpm_algs[SL_TPM20_MAX_ALGS] = {0};
> +
> +extern u32 sl_cpu_type;
> +extern u32 sl_mle_start;
> +
> +static u64 sl_txt_read(u32 reg)
> +{
> +	return readq((void *)(u64)(TXT_PRIV_CONFIG_REGS_BASE + reg));
> +}
> +
> +static void sl_txt_write(u32 reg, u64 val)
> +{
> +	writeq(val, (void *)(u64)(TXT_PRIV_CONFIG_REGS_BASE + reg));
> +}
> +
> +static void __noreturn sl_txt_reset(u64 error)
> +{
> +	/* Reading the E2STS register acts as a barrier for TXT registers */
> +	sl_txt_write(TXT_CR_ERRORCODE, error);
> +	sl_txt_read(TXT_CR_E2STS);
> +	sl_txt_write(TXT_CR_CMD_UNLOCK_MEM_CONFIG, 1);
> +	sl_txt_read(TXT_CR_E2STS);
> +	sl_txt_write(TXT_CR_CMD_RESET, 1);
> +
> +	for ( ; ; )
> +		asm volatile ("hlt");
> +
> +	unreachable();
> +}
> +
> +static u64 sl_rdmsr(u32 reg)
> +{
> +	u64 lo, hi;
> +
> +	asm volatile ("rdmsr" : "=a" (lo), "=d" (hi) : "c" (reg));
> +
> +	return (hi << 32) | lo;
> +}
> +
> +static struct slr_table *sl_locate_and_validate_slrt(void)
> +{
> +	struct txt_os_mle_data *os_mle_data;
> +	struct slr_table *slrt;
> +	void *txt_heap;
> +
> +	txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
> +	os_mle_data = txt_os_mle_data_start(txt_heap);
> +
> +	if (!os_mle_data->slrt)
> +		sl_txt_reset(SL_ERROR_INVALID_SLRT);
> +
> +	slrt = (struct slr_table *)os_mle_data->slrt;
> +
> +	if (slrt->magic != SLR_TABLE_MAGIC)
> +		sl_txt_reset(SL_ERROR_INVALID_SLRT);
> +
> +	if (slrt->architecture != SLR_INTEL_TXT)
> +		sl_txt_reset(SL_ERROR_INVALID_SLRT);
> +
> +	return slrt;
> +}
> +
> +static void sl_check_pmr_coverage(void *base, u32 size, bool allow_hi)
> +{
> +	struct txt_os_sinit_data *os_sinit_data;
> +	void *end = base + size;
> +	void *txt_heap;
> +
> +	if (!(sl_cpu_type & SL_CPU_INTEL))
> +		return;
> +
> +	txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
> +	os_sinit_data = txt_os_sinit_data_start(txt_heap);
> +
> +	if ((end >= (void *)0x100000000ULL) && (base < (void *)0x100000000ULL))
> +		sl_txt_reset(SL_ERROR_REGION_STRADDLE_4GB);
> +
> +	/*
> +	 * Note that the late stub code validates that the hi PMR covers
> +	 * all memory above 4G. At this point the code can only check that
> +	 * regions are within the hi PMR but that is sufficient.
> +	 */
> +	if ((end > (void *)0x100000000ULL) && (base >= (void *)0x100000000ULL)) {
> +		if (allow_hi) {
> +			if (end >= (void *)(os_sinit_data->vtd_pmr_hi_base +
> +					   os_sinit_data->vtd_pmr_hi_size))
> +				sl_txt_reset(SL_ERROR_BUFFER_BEYOND_PMR);
> +		} else {
> +			sl_txt_reset(SL_ERROR_REGION_ABOVE_4GB);
> +		}
> +	}
> +
> +	if (end >= (void *)os_sinit_data->vtd_pmr_lo_size)
> +		sl_txt_reset(SL_ERROR_BUFFER_BEYOND_PMR);
> +}
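For readers following the PMR logic, here is a standalone C model of the policy that the comments describe. A region must either lie entirely below 4G and end within the low PMR, or, when allow_hi is set, lie entirely above 4G and end within the high PMR. This is a hypothetical sketch and is not the patch's code. It uses half-open [base, base + size) ranges, so it accepts a region ending exactly at a limit, while the quoted `end >= ...` comparisons reject it. It also returns as soon as it has classified a region.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_4G 0x100000000ULL

enum demo_pmr_result {
	DEMO_PMR_OK,
	DEMO_PMR_STRADDLE_4G,
	DEMO_PMR_ABOVE_4G,
	DEMO_PMR_BEYOND_PMR,
};

/*
 * Hypothetical model of the coverage policy for [base, base + size):
 * wholly below 4G and ending inside the low PMR [0, lo_size), or
 * (only if allow_hi) wholly above 4G and ending inside the high PMR.
 */
static enum demo_pmr_result demo_check_pmr(uint64_t base, uint32_t size,
					   bool allow_hi, uint64_t lo_size,
					   uint64_t hi_base, uint64_t hi_size)
{
	uint64_t end;

	if (size > UINT64_MAX - base)	/* base + size would wrap */
		return DEMO_PMR_BEYOND_PMR;
	end = base + size;		/* exclusive end */

	if (base < DEMO_4G && end > DEMO_4G)
		return DEMO_PMR_STRADDLE_4G;

	if (base >= DEMO_4G) {
		if (!allow_hi)
			return DEMO_PMR_ABOVE_4G;
		return end <= hi_base + hi_size ? DEMO_PMR_OK : DEMO_PMR_BEYOND_PMR;
	}

	return end <= lo_size ? DEMO_PMR_OK : DEMO_PMR_BEYOND_PMR;
}
```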
> +
> +/*
> + * Some MSRs are modified by the pre-launch code including the MTRRs.
> + * The early MLE code has to restore these values. This code validates
> + * the values after they are measured.
> + */
> +static void sl_txt_validate_msrs(struct txt_os_mle_data *os_mle_data)
> +{
> +	struct slr_txt_mtrr_state *saved_bsp_mtrrs;
> +	u64 mtrr_caps, mtrr_def_type, mtrr_var;
> +	struct slr_entry_intel_info *txt_info;
> +	u64 misc_en_msr;
> +	u32 vcnt, i;
> +
> +	txt_info = (struct slr_entry_intel_info *)os_mle_data->txt_info;
> +	saved_bsp_mtrrs = &txt_info->saved_bsp_mtrrs;
> +
> +	mtrr_caps = sl_rdmsr(MSR_MTRRcap);
> +	vcnt = (u32)(mtrr_caps & CAPS_VARIABLE_MTRR_COUNT_MASK);
> +
> +	if (saved_bsp_mtrrs->mtrr_vcnt > vcnt)
> +		sl_txt_reset(SL_ERROR_MTRR_INV_VCNT);
> +	if (saved_bsp_mtrrs->mtrr_vcnt > TXT_OS_MLE_MAX_VARIABLE_MTRRS)
> +		sl_txt_reset(SL_ERROR_MTRR_INV_VCNT);
> +
> +	mtrr_def_type = sl_rdmsr(MSR_MTRRdefType);
> +	if (saved_bsp_mtrrs->default_mem_type != mtrr_def_type)
> +		sl_txt_reset(SL_ERROR_MTRR_INV_DEF_TYPE);
> +
> +	for (i = 0; i < saved_bsp_mtrrs->mtrr_vcnt; i++) {
> +		mtrr_var = sl_rdmsr(MTRRphysBase_MSR(i));
> +		if (saved_bsp_mtrrs->mtrr_pair[i].mtrr_physbase != mtrr_var)
> +			sl_txt_reset(SL_ERROR_MTRR_INV_BASE);
> +		mtrr_var = sl_rdmsr(MTRRphysMask_MSR(i));
> +		if (saved_bsp_mtrrs->mtrr_pair[i].mtrr_physmask != mtrr_var)
> +			sl_txt_reset(SL_ERROR_MTRR_INV_MASK);
> +	}
> +
> +	misc_en_msr = sl_rdmsr(MSR_IA32_MISC_ENABLE);
> +	if (txt_info->saved_misc_enable_msr != misc_en_msr)
> +		sl_txt_reset(SL_ERROR_MSR_INV_MISC_EN);
> +}
> +
> +static void sl_find_drtm_event_log(struct slr_table *slrt)
> +{
> +	struct txt_os_sinit_data *os_sinit_data;
> +	struct slr_entry_log_info *log_info;
> +	void *txt_heap;
> +
> +	log_info = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_LOG_INFO);
> +	if (!log_info)
> +		sl_txt_reset(SL_ERROR_SLRT_MISSING_ENTRY);
> +
> +	evtlog_base = (void *)log_info->addr;
> +	evtlog_size = log_info->size;
> +
> +	txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
> +
> +	/*
> +	 * For TPM 2.0, the event log 2.1 extended data structure has to also
> +	 * be located and fixed up.
> +	 */
> +	os_sinit_data = txt_os_sinit_data_start(txt_heap);
> +
> +	/*
> +	 * Only support version 6 and later that properly handle the
> +	 * list of ExtDataElements in the OS-SINIT structure.
> +	 */
> +	if (os_sinit_data->version < 6)
> +		sl_txt_reset(SL_ERROR_OS_SINIT_BAD_VERSION);
> +
> +	/* Find the TPM2.0 logging extended heap element */
> +	log20_elem = tpm20_find_log2_1_element(os_sinit_data);

s/tpm20/tpm2/

> +
> +	/* If found, this implies TPM20 log and family */
> +	if (log20_elem)
> +		tpm_log_ver = SL_TPM20_LOG;
> +}
> +
> +static void sl_validate_event_log_buffer(void)
> +{
> +	struct txt_os_sinit_data *os_sinit_data;
> +	void *txt_heap, *txt_end;
> +	void *mle_base, *mle_end;
> +	void *evtlog_end;
> +
> +	if ((u64)evtlog_size > (LLONG_MAX - (u64)evtlog_base))
> +		sl_txt_reset(SL_ERROR_INTEGER_OVERFLOW);
> +	evtlog_end = evtlog_base + evtlog_size;
> +
> +	txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
> +	txt_end = txt_heap + sl_txt_read(TXT_CR_HEAP_SIZE);
> +	os_sinit_data = txt_os_sinit_data_start(txt_heap);
> +
> +	mle_base = (void *)(u64)sl_mle_start;
> +	mle_end = mle_base + os_sinit_data->mle_size;
> +
> +	/*
> +	 * This check is to ensure the event log buffer does not overlap with
> +	 * the MLE image.
> +	 */
> +	if (evtlog_base >= mle_end && evtlog_end > mle_end)
> +		goto pmr_check; /* above */
> +
> +	if (evtlog_end <= mle_base && evtlog_base < mle_base)
> +		goto pmr_check; /* below */
> +
> +	sl_txt_reset(SL_ERROR_MLE_BUFFER_OVERLAP);
> +
> +pmr_check:
> +	/*
> +	 * The TXT heap is protected by the DPR. If the TPM event log is
> +	 * inside the TXT heap, there is no need for a PMR check.
> +	 */
> +	if (evtlog_base > txt_heap && evtlog_end < txt_end)
> +		return;
> +
> +	sl_check_pmr_coverage(evtlog_base, evtlog_size, true);
> +}
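The overlap test above reduces to a standard disjointness check on half-open ranges. This C sketch, which uses hypothetical helper names, shows that check together with a wrap-around guard for computing the end address. For a non-empty log, it accepts exactly the "above" and "below" cases that the quoted code handles. The guard uses UINT64_MAX as the unsigned bound, whereas the quoted code compares against LLONG_MAX.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: true when [a, a_end) and [b, b_end) share no byte. */
static bool demo_ranges_disjoint(uint64_t a, uint64_t a_end,
				 uint64_t b, uint64_t b_end)
{
	return a >= b_end || a_end <= b;	/* wholly above or wholly below */
}

/* Hypothetical: compute base + size, failing instead of wrapping. */
static bool demo_range_end(uint64_t base, uint32_t size, uint64_t *end)
{
	if (size > UINT64_MAX - base)
		return false;
	*end = base + size;
	return true;
}
```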
> +
> +static void sl_find_event_log_algorithms(void)
> +{
> +	struct tcg_efi_specid_event_head *efi_head =
> +		(struct tcg_efi_specid_event_head *)(evtlog_base +
> +					log20_elem->first_record_offset +
> +					sizeof(struct tcg_pcr_event));
> +
> +	if (efi_head->num_algs == 0 || efi_head->num_algs > 2)
> +		sl_txt_reset(SL_ERROR_TPM_NUMBER_ALGS);
> +
> +	memcpy(&tpm_algs[0], &efi_head->digest_sizes[0],
> +	       sizeof(struct tcg_efi_specid_event_algs) * efi_head->num_algs);
> +}
> +
> +static void sl_tpm12_log_event(u32 pcr, u32 event_type,
> +			       const u8 *data, u32 length,
> +			       const u8 *event_data, u32 event_size)
> +{
> +	u8 sha1_hash[SHA1_DIGEST_SIZE] = {0};
> +	u8 log_buf[SL_TPM12_LOG_SIZE] = {0};
> +	struct tcg_pcr_event *pcr_event;
> +	u32 total_size;
> +
> +	pcr_event = (struct tcg_pcr_event *)log_buf;
> +	pcr_event->pcr_idx = pcr;
> +	pcr_event->event_type = event_type;
> +	if (length > 0) {
> +		sha1(data, length, &sha1_hash[0]);
> +		memcpy(&pcr_event->digest[0], &sha1_hash[0], SHA1_DIGEST_SIZE);
> +	}
> +	pcr_event->event_size = event_size;
> +	if (event_size > 0)
> +		memcpy((u8 *)pcr_event + sizeof(struct tcg_pcr_event),
> +		       event_data, event_size);
> +
> +	total_size = sizeof(struct tcg_pcr_event) + event_size;
> +
> +	if (tpm12_log_event(evtlog_base, evtlog_size, total_size, pcr_event))
> +		sl_txt_reset(SL_ERROR_TPM_LOGGING_FAILED);
> +}
> +
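
The following sketches the TCG_PCR_EVENT record (TPM 1.2 log format) that sl_tpm12_log_event() assembles. It consists of pcrIndex, eventType, a 20-byte SHA-1 digest and eventDataSize, followed by the event data, for 32 bytes of header in total. The helper name is hypothetical. The caller supplies the digest, so there is no SHA-1 code here. Fields are written little-endian, as they are on x86. The sketch also refuses an event that would not fit in the destination buffer. The visible code, by contrast, copies event_size bytes into the fixed log_buf[SL_TPM12_LOG_SIZE] without such a check.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA1_LEN		20
#define TPM12_EVENT_HDR_SIZE	(4 + 4 + SHA1_LEN + 4)	/* 32 bytes */

static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = (v >> 16) & 0xff;
	p[3] = (v >> 24) & 0xff;
}

/*
 * Serialize a TCG_PCR_EVENT into buf. Returns the record size, or 0 if
 * the record does not fit in buf_len bytes.
 */
static size_t tpm12_build_event(uint8_t *buf, size_t buf_len, uint32_t pcr,
				uint32_t type, const uint8_t digest[SHA1_LEN],
				const uint8_t *data, uint32_t data_len)
{
	size_t total = TPM12_EVENT_HDR_SIZE + (size_t)data_len;

	if (total > buf_len)
		return 0;
	put_le32(buf, pcr);
	put_le32(buf + 4, type);
	memcpy(buf + 8, digest, SHA1_LEN);
	put_le32(buf + 8 + SHA1_LEN, data_len);
	if (data_len)
		memcpy(buf + TPM12_EVENT_HDR_SIZE, data, data_len);
	return total;
}
```
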
> +static void sl_tpm20_log_event(u32 pcr, u32 event_type,
> +			       const u8 *data, u32 length,
> +			       const u8 *event_data, u32 event_size)
> +{
> +	u8 sha256_hash[SHA256_DIGEST_SIZE] = {0};
> +	u8 sha1_hash[SHA1_DIGEST_SIZE] = {0};
> +	u8 log_buf[SL_TPM20_LOG_SIZE] = {0};
> +	struct sha256_state sctx256 = {0};
> +	struct tcg_pcr_event2_head *head;
> +	struct tcg_event_field *event;
> +	u32 total_size;
> +	u16 *alg_ptr;
> +	u8 *dgst_ptr;
> +
> +	head = (struct tcg_pcr_event2_head *)log_buf;
> +	head->pcr_idx = pcr;
> +	head->event_type = event_type;
> +	total_size = sizeof(struct tcg_pcr_event2_head);
> +	alg_ptr = (u16 *)(log_buf + sizeof(struct tcg_pcr_event2_head));
> +
> +	for ( ; head->count < 2; head->count++) {
> +		if (!tpm_algs[head->count].alg_id)
> +			break;
> +
> +		*alg_ptr = tpm_algs[head->count].alg_id;
> +		dgst_ptr = (u8 *)alg_ptr + sizeof(u16);
> +
> +		if (tpm_algs[head->count].alg_id == TPM_ALG_SHA256 &&
> +		    length) {
> +			sha256_init(&sctx256);
> +			sha256_update(&sctx256, data, length);
> +			sha256_final(&sctx256, &sha256_hash[0]);
> +		} else if (tpm_algs[head->count].alg_id == TPM_ALG_SHA1 &&
> +			   length) {
> +			sha1(data, length, &sha1_hash[0]);
> +		}
> +
> +		if (tpm_algs[head->count].alg_id == TPM_ALG_SHA256) {
> +			memcpy(dgst_ptr, &sha256_hash[0], SHA256_DIGEST_SIZE);
> +			total_size += SHA256_DIGEST_SIZE + sizeof(u16);
> +			alg_ptr = (u16 *)((u8 *)alg_ptr + SHA256_DIGEST_SIZE + sizeof(u16));
> +		} else if (tpm_algs[head->count].alg_id == TPM_ALG_SHA1) {
> +			memcpy(dgst_ptr, &sha1_hash[0], SHA1_DIGEST_SIZE);
> +			total_size += SHA1_DIGEST_SIZE + sizeof(u16);
> +			alg_ptr = (u16 *)((u8 *)alg_ptr + SHA1_DIGEST_SIZE + sizeof(u16));
> +		} else {
> +			sl_txt_reset(SL_ERROR_TPM_UNKNOWN_DIGEST);
> +		}
> +	}
> +
> +	event = (struct tcg_event_field *)(log_buf + total_size);
> +	event->event_size = event_size;
> +	if (event_size > 0)
> +		memcpy((u8 *)event + sizeof(struct tcg_event_field), event_data, event_size);
> +	total_size += sizeof(struct tcg_event_field) + event_size;
> +
> +	if (tpm20_log_event(log20_elem, evtlog_base, evtlog_size, total_size, &log_buf[0]))
> +		sl_txt_reset(SL_ERROR_TPM_LOGGING_FAILED);
> +}
> +
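
For the crypto-agile TPM 2.0 format, the record size that sl_tpm20_log_event() accumulates in total_size can be computed up front. That would also let a caller bounds-check log_buf before copying. The sketch assumes, as in the patch, that only SHA-1 (TPM_ALG_SHA1, 0x0004) and SHA-256 (TPM_ALG_SHA256, 0x000B) are supported. tpm20_event_size() is a hypothetical name. It returns 0 in the case where the patch calls sl_txt_reset(SL_ERROR_TPM_UNKNOWN_DIGEST).

```c
#include <stddef.h>
#include <stdint.h>

#define TPM_ALG_SHA1	0x0004
#define TPM_ALG_SHA256	0x000B

/* Digest size for a TPM algorithm ID, or 0 if it is not handled. */
static size_t tpm_alg_digest_size(uint16_t alg)
{
	switch (alg) {
	case TPM_ALG_SHA1:
		return 20;
	case TPM_ALG_SHA256:
		return 32;
	default:
		return 0;
	}
}

/*
 * Size of a TCG_PCR_EVENT2 record: pcrIndex, eventType and digest count
 * (3 x u32), one {u16 hashAlg, digest} pair per algorithm, then eventSize
 * (u32) and the event data. Returns 0 for an unsupported algorithm.
 */
static size_t tpm20_event_size(const uint16_t *algs, size_t nr_algs,
			       uint32_t event_size)
{
	size_t total = 3 * sizeof(uint32_t);
	size_t i, dsz;

	for (i = 0; i < nr_algs; i++) {
		dsz = tpm_alg_digest_size(algs[i]);
		if (!dsz)
			return 0;
		total += sizeof(uint16_t) + dsz;
	}
	return total + sizeof(uint32_t) + event_size;
}
```

With both banks active and an empty description, this gives 12 + (2 + 20) + (2 + 32) + 4 = 72 bytes.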
> +static void sl_tpm_extend_evtlog(u32 pcr, u32 type,
> +				 const u8 *data, u32 length, const char *desc)
> +{
> +	if (tpm_log_ver == SL_TPM20_LOG)
> +		sl_tpm20_log_event(pcr, type, data, length,
> +				   (const u8 *)desc, strlen(desc));
> +	else
> +		sl_tpm12_log_event(pcr, type, data, length,
> +				   (const u8 *)desc, strlen(desc));
> +}
> +
> +static struct setup_data *sl_handle_setup_data(struct setup_data *curr,
> +					       struct slr_policy_entry *entry)
> +{
> +	struct setup_indirect *ind;
> +	struct setup_data *next;
> +
> +	if (!curr)
> +		return NULL;
> +
> +	next = (struct setup_data *)(unsigned long)curr->next;
> +
> +	/* SETUP_INDIRECT instances have to be handled differently */
> +	if (curr->type == SETUP_INDIRECT) {
> +		ind = (struct setup_indirect *)((u8 *)curr + offsetof(struct setup_data, data));
> +
> +		sl_check_pmr_coverage((void *)ind->addr, ind->len, true);
> +
> +		sl_tpm_extend_evtlog(entry->pcr, TXT_EVTYPE_SLAUNCH,
> +				     (void *)ind->addr, ind->len,
> +				     entry->evt_info);
> +
> +		return next;
> +	}
> +
> +	sl_check_pmr_coverage(((u8 *)curr) + sizeof(struct setup_data),
> +			      curr->len, true);
> +
> +	sl_tpm_extend_evtlog(entry->pcr, TXT_EVTYPE_SLAUNCH,
> +			     ((u8 *)curr) + sizeof(struct setup_data),
> +			     curr->len,
> +			     entry->evt_info);
> +
> +	return next;
> +}
> +
> +static void sl_extend_setup_data(struct slr_policy_entry *entry)
> +{
> +	struct setup_data *data;
> +
> +	/*
> +	 * Measuring the boot params measured the fixed e820 memory map.
> +	 * Measure any setup_data entries including e820 extended entries.
> +	 */
> +	data = (struct setup_data *)(unsigned long)entry->entity;
> +	while (data)
> +		data = sl_handle_setup_data(data, entry);
> +}
> +
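
The setup_data walk above can be modeled in user space as follows. The struct layouts follow struct setup_data and struct setup_indirect from arch/x86/include/uapi/asm/bootparam.h, where SETUP_INDIRECT is (1 << 31). The hypothetical measure() hook stands in for the sl_check_pmr_coverage() plus sl_tpm_extend_evtlog() pair; here it only adds up how many bytes would be measured. For an indirect entry, the buffer described by addr/len is what gets measured, not the entry itself.

```c
#include <stddef.h>
#include <stdint.h>

#define SETUP_INDIRECT	(1U << 31)

struct setup_data {
	uint64_t next;		/* address of the next entry, 0 ends the list */
	uint32_t type;
	uint32_t len;
	uint8_t data[];
};

struct setup_indirect {
	uint32_t type;
	uint32_t reserved;
	uint64_t len;
	uint64_t addr;
};

/* Hypothetical measurement hook: only totals the measured length. */
static void measure(const void *buf, uint64_t len, uint64_t *total)
{
	(void)buf;
	*total += len;
}

/* Walk the list and return the total number of bytes measured. */
static uint64_t walk_setup_data(const struct setup_data *curr)
{
	uint64_t total = 0;

	while (curr) {
		if (curr->type == SETUP_INDIRECT) {
			/* The payload describes another buffer; measure that. */
			const struct setup_indirect *ind =
				(const struct setup_indirect *)curr->data;

			measure((const void *)(uintptr_t)ind->addr, ind->len, &total);
		} else {
			measure(curr->data, curr->len, &total);
		}
		curr = (const struct setup_data *)(uintptr_t)curr->next;
	}
	return total;
}
```
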
> +static void sl_extend_slrt(struct slr_policy_entry *entry)
> +{
> +	struct slr_table *slrt = (struct slr_table *)entry->entity;
> +	struct slr_entry_intel_info *intel_info;
> +
> +	/*
> +	 * In revision one of the SLRT, the only table that needs to be
> +	 * measured is the Intel info table. Everything else is meta-data,
> +	 * addresses and sizes. Note the size of what to measure is not set.
> +	 * The flag SLR_POLICY_IMPLICIT_SIZE leaves it to the measuring code
> +	 * to sort out.
> +	 */
> +	if (slrt->revision == 1) {
> +		intel_info = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_INTEL_INFO);
> +		if (!intel_info)
> +			sl_txt_reset(SL_ERROR_SLRT_MISSING_ENTRY);
> +
> +		sl_tpm_extend_evtlog(entry->pcr, TXT_EVTYPE_SLAUNCH,
> +				     (void *)entry->entity, sizeof(struct slr_entry_intel_info),
> +				     entry->evt_info);
> +	}
> +}
> +
> +static void sl_extend_txt_os2mle(struct slr_policy_entry *entry)
> +{
> +	struct txt_os_mle_data *os_mle_data;
> +	void *txt_heap;
> +
> +	txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
> +	os_mle_data = txt_os_mle_data_start(txt_heap);
> +
> +	/*
> +	 * Version 1 of the OS-MLE heap structure has no fields to measure. It just
> +	 * has addresses and sizes and a scratch buffer.
> +	 */
> +	if (os_mle_data->version == 1)
> +		return;
> +}
> +
> +static void sl_process_extend_policy(struct slr_table *slrt)
> +{
> +	struct slr_entry_policy *policy;
> +	u16 i;
> +
> +	policy = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_ENTRY_POLICY);
> +	if (!policy)
> +		sl_txt_reset(SL_ERROR_SLRT_MISSING_ENTRY);
> +
> +	for (i = 0; i < policy->nr_entries; i++) {
> +		switch (policy->policy_entries[i].entity_type) {
> +		case SLR_ET_SETUP_DATA:
> +			sl_extend_setup_data(&policy->policy_entries[i]);
> +			break;
> +		case SLR_ET_SLRT:
> +			sl_extend_slrt(&policy->policy_entries[i]);
> +			break;
> +		case SLR_ET_TXT_OS2MLE:
> +			sl_extend_txt_os2mle(&policy->policy_entries[i]);
> +			break;
> +		case SLR_ET_UNUSED:
> +			continue;
> +		default:
> +			sl_tpm_extend_evtlog(policy->policy_entries[i].pcr, TXT_EVTYPE_SLAUNCH,
> +					     (void *)policy->policy_entries[i].entity,
> +					     policy->policy_entries[i].size,
> +					     policy->policy_entries[i].evt_info);
> +		}
> +	}
> +}
> +
> +static void sl_process_extend_uefi_config(struct slr_table *slrt)
> +{
> +	struct slr_entry_uefi_config *uefi_config;
> +	u16 i;
> +
> +	uefi_config = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_UEFI_CONFIG);
> +
> +	/* Optionally present, depending on how the SL kernel was booted */
> +	if (!uefi_config)
> +		return;
> +
> +	for (i = 0; i < uefi_config->nr_entries; i++) {
> +		sl_tpm_extend_evtlog(uefi_config->uefi_cfg_entries[i].pcr, TXT_EVTYPE_SLAUNCH,
> +				     (void *)uefi_config->uefi_cfg_entries[i].cfg,
> +				     uefi_config->uefi_cfg_entries[i].size,
> +				     uefi_config->uefi_cfg_entries[i].evt_info);
> +	}
> +}
> +
> +asmlinkage __visible void sl_check_region(void *base, u32 size)
> +{
> +	sl_check_pmr_coverage(base, size, false);
> +}
> +
> +asmlinkage __visible void sl_main(void *bootparams)
> +{
> +	struct boot_params *bp  = (struct boot_params *)bootparams;
> +	struct txt_os_mle_data *os_mle_data;
> +	struct slr_table *slrt;
> +	void *txt_heap;
> +
> +	/*
> +	 * Ensure loadflags do not indicate a secure launch was done
> +	 * unless it really was.
> +	 */
> +	bp->hdr.loadflags &= ~SLAUNCH_FLAG;
> +
> +	/*
> +	 * Currently only Intel TXT is supported for Secure Launch. Testing
> +	 * this value also indicates that the kernel was booted successfully
> +	 * through the Secure Launch entry point and is in SMX mode.
> +	 */
> +	if (!(sl_cpu_type & SL_CPU_INTEL))
> +		return;
> +
> +	slrt = sl_locate_and_validate_slrt();
> +
> +	/* Locate the TPM event log. */
> +	sl_find_drtm_event_log(slrt);
> +
> +	/* Validate the location of the event log buffer before using it */
> +	sl_validate_event_log_buffer();
> +
> +	/*
> +	 * Find the TPM hash algorithms used by the ACM and recorded in the
> +	 * event log.
> +	 */
> +	if (tpm_log_ver == SL_TPM20_LOG)
> +		sl_find_event_log_algorithms();
> +
> +	/*
> +	 * Sanitize the boot params before measuring. Set the SLAUNCH_FLAG early since if
> +	 * anything fails, the system will reset anyway.
> +	 */
> +	sanitize_boot_params(bp);
> +	bp->hdr.loadflags |= SLAUNCH_FLAG;
> +
> +	sl_check_pmr_coverage(bootparams, PAGE_SIZE, false);
> +
> +	/* Place event log SL specific tags before and after measurements */
> +	sl_tpm_extend_evtlog(17, TXT_EVTYPE_SLAUNCH_START, NULL, 0, "");
> +
> +	/* Process all policy entries and extend the measurements to the evtlog */

These comments clutter the code here; they would make a lot more sense
at the beginning of each corresponding function.

/*
 * Process all policy entries and extend the measurements to the evtlog
 */
static void sl_process_extend_policy(struct slr_table *slrt)
{
	/* ... */
}

BTW, what does "process" add to the name here? Why not just sl_extend_policy()?


> +	sl_process_extend_policy(slrt);
> +
> +	/* Process all EFI config entries and extend the measurements to the evtlog */
> +	sl_process_extend_uefi_config(slrt);

Ditto.

> +
> +	sl_tpm_extend_evtlog(17, TXT_EVTYPE_SLAUNCH_END, NULL, 0, "");
> +
> +	/* No PMR check is needed, the TXT heap is covered by the DPR */
> +	txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
> +	os_mle_data = txt_os_mle_data_start(txt_heap);
> +
> +	/*
> +	 * Now that the OS-MLE data is measured, ensure the MTRR and
> +	 * misc enable MSRs are what we expect.
> +	 */
> +	sl_txt_validate_msrs(os_mle_data);
> +}
> diff --git a/arch/x86/boot/compressed/sl_stub.S b/arch/x86/boot/compressed/sl_stub.S
> new file mode 100644
> index 000000000000..24b8f23d5dcc
> --- /dev/null
> +++ b/arch/x86/boot/compressed/sl_stub.S
> @@ -0,0 +1,725 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * Secure Launch protected mode entry point.
> + *
> + * Copyright (c) 2024, Oracle and/or its affiliates.
> + */
> +	.code32
> +	.text
> +#include <linux/linkage.h>
> +#include <asm/segment.h>
> +#include <asm/msr.h>
> +#include <asm/apicdef.h>
> +#include <asm/trapnr.h>
> +#include <asm/processor-flags.h>
> +#include <asm/asm-offsets.h>
> +#include <asm/bootparam.h>
> +#include <asm/page_types.h>
> +#include <asm/irq_vectors.h>
> +#include <linux/slr_table.h>
> +#include <linux/slaunch.h>
> +
> +/* CPUID: leaf 1, ECX, SMX feature bit */
> +#define X86_FEATURE_BIT_SMX	(1 << 6)
> +
> +#define IDT_VECTOR_LO_BITS	0
> +#define IDT_VECTOR_HI_BITS	6
> +
> +/*
> + * See the comment in head_64.S for detailed information on what this macro
> + * and others like it are used for. The comment appears right at the top of
> + * the file.
> + */
> +#define rva(X) ((X) - sl_stub_entry)
> +
> +/*
> + * The GETSEC op code is open coded because older versions of
> + * GCC do not support the getsec mnemonic.
> + */
> +.macro GETSEC leaf
> +	pushl	%ebx
> +	xorl	%ebx, %ebx	/* Must be zero for SMCTRL */
> +	movl	\leaf, %eax	/* Leaf function */
> +	.byte 	0x0f, 0x37	/* GETSEC opcode */
> +	popl	%ebx
> +.endm
> +
> +.macro TXT_RESET error
> +	/*
> +	 * Set a sticky error value and reset. Note the movs to %eax act as
> +	 * TXT register barriers.
> +	 */
> +	movl	\error, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_ERRORCODE)
> +	movl	(TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_E2STS), %eax
> +	movl	$1, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_CMD_NO_SECRETS)
> +	movl	(TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_E2STS), %eax
> +	movl	$1, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_CMD_UNLOCK_MEM_CONFIG)
> +	movl	(TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_E2STS), %eax
> +	movl	$1, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_CMD_RESET)
> +1:
> +	hlt
> +	jmp	1b
> +.endm
> +
> +	.code32
> +SYM_FUNC_START(sl_stub_entry)
> +	cli
> +	cld
> +
> +	/*
> +	 * On entry, %ebx has the entry abs offset to sl_stub_entry. This
> +	 * will be correctly scaled using the rva macro and avoid causing
> +	 * relocations. Only %cs and %ds segments are known good.
> +	 */
> +
> +	/* Load GDT, set segment regs and lret to __SL32_CS */
> +	leal	rva(sl_gdt_desc)(%ebx), %eax
> +	addl	%eax, 2(%eax)
> +	lgdt	(%eax)
> +
> +	movl	$(__SL32_DS), %eax
> +	movw	%ax, %ds
> +	movw	%ax, %es
> +	movw	%ax, %fs
> +	movw	%ax, %gs
> +	movw	%ax, %ss
> +
> +	/*
> +	 * Now that %ss is known good, take the first stack for the BSP. The
> +	 * AP stacks are only used on Intel.
> +	 */
> +	leal	rva(sl_stacks_end)(%ebx), %esp
> +
> +	leal	rva(.Lsl_cs)(%ebx), %eax
> +	pushl	$(__SL32_CS)
> +	pushl	%eax
> +	lret
> +
> +.Lsl_cs:
> +	/* Save our base pointer reg and page table for MLE */
> +	pushl	%ebx
> +	pushl	%ecx
> +
> +	/* See if SMX feature is supported. */
> +	movl	$1, %eax
> +	cpuid
> +	testl	$(X86_FEATURE_BIT_SMX), %ecx
> +	jz	.Ldo_unknown_cpu
> +
> +	popl	%ecx
> +	popl	%ebx
> +
> +	/* Known to be Intel */
> +	movl	$(SL_CPU_INTEL), rva(sl_cpu_type)(%ebx)
> +
> +	/* Locate the base of the MLE using the page tables in %ecx */
> +	call	sl_find_mle_base
> +
> +	/* Increment CPU count for BSP */
> +	incl	rva(sl_txt_cpu_count)(%ebx)
> +
> +	/*
> +	 * Enable SMIs with GETSEC[SMCTRL]; they were disabled by SENTER.
> +	 * NMIs were also disabled by SENTER. Since there is no IDT for the BSP,
> +	 * allow the mainline kernel to re-enable them in the normal course of
> +	 * booting.
> +	 */
> +	GETSEC	$(SMX_X86_GETSEC_SMCTRL)
> +
> +	/* Clear the TXT error registers for a clean start of day */
> +	movl	$0, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_ERRORCODE)
> +	movl	$0xffffffff, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_ESTS)
> +
> +	/* On Intel, the zero page address is passed in the TXT heap */
> +	/* Read physical base of heap into EAX */
> +	movl	(TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_HEAP_BASE), %eax
> +	/* Read the size of the BIOS data into ECX (first 8 bytes) */
> +	movl	(%eax), %ecx
> +	/* Skip over BIOS data and size of OS to MLE data section */
> +	leal	8(%eax, %ecx), %eax
> +
> +	/* Need to verify the values in the OS-MLE struct passed in */
> +	call	sl_txt_verify_os_mle_struct
> +
> +	/*
> +	 * Get the boot params address from the heap. Note %esi and %ebx MUST
> +	 * be preserved across calls and operations.
> +	 */
> +	movl	SL_boot_params_addr(%eax), %esi
> +
> +	/* Save %ebx so the APs can find their way home */
> +	movl	%ebx, (SL_mle_scratch + SL_SCRATCH_AP_EBX)(%eax)
> +
> +	/* Fetch the AP wake code block address from the heap */
> +	movl	SL_ap_wake_block(%eax), %edi
> +	movl	%edi, rva(sl_txt_ap_wake_block)(%ebx)
> +
> +	/* Store the offset in the AP wake block to the jmp address */
> +	movl	$(sl_ap_jmp_offset - sl_txt_ap_wake_begin), \
> +		(SL_mle_scratch + SL_SCRATCH_AP_JMP_OFFSET)(%eax)
> +
> +	/* Store the offset in the AP wake block to the AP stacks block */
> +	movl	$(sl_stacks - sl_txt_ap_wake_begin), \
> +		(SL_mle_scratch + SL_SCRATCH_AP_STACKS_OFFSET)(%eax)
> +
> +	/* %eax still is the base of the OS-MLE block, save it */
> +	pushl	%eax
> +
> +	/* Relocate the AP wake code to the safe block */
> +	call	sl_txt_reloc_ap_wake
> +
> +	/*
> +	 * Wake up all APs that are blocked in the ACM and wait for them to
> +	 * halt. This should be done before restoring the MTRRs so the ACM is
> +	 * still properly in WB memory.
> +	 */
> +	call	sl_txt_wake_aps
> +
> +	/* Restore OS-MLE in %eax */
> +	popl	%eax
> +
> +	/*
> +	 * %edi is used by this routine to find the MTRRs which are in the SLRT
> +	 * in the Intel info.
> +	 */
> +	movl	SL_txt_info(%eax), %edi
> +	call	sl_txt_load_regs
> +
> +	jmp	.Lcpu_setup_done
> +
> +.Ldo_unknown_cpu:
> +	/* Non-Intel CPUs are not yet supported */
> +	ud2
> +
> +.Lcpu_setup_done:
> +	/*
> +	 * Don't enable MCE at this point. The kernel will enable
> +	 * it on the BSP later when it is ready.
> +	 */
> +
> +	/* Done, jump to normal 32b pm entry */
> +	jmp	startup_32
> +SYM_FUNC_END(sl_stub_entry)
> +
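
The TXT heap walk at the top of sl_stub_entry, and again in sl_txt_wake_aps, relies on every heap section starting with a u64 size that includes the size field itself. Thus leal 8(%eax, %ecx), %eax computes heap + BIOS data size + 8, which is the first byte of the OS-MLE data. Below is a bounds-checked C sketch of the same walk, with a hypothetical helper name and the section order BIOS data, OS-MLE, OS-SINIT, SINIT-MLE. By contrast, the stub reads only the low 32 bits of each size and does not check them against the heap size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* TXT heap sections, in order. */
enum txt_heap_section {
	TXT_BIOS_DATA = 0,
	TXT_OS_MLE_DATA,
	TXT_OS_SINIT_DATA,
	TXT_SINIT_MLE_DATA,
};

static uint64_t read_u64(const uint8_t *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));	/* host byte order, as on x86 */
	return v;
}

/*
 * Return a pointer to the data of section idx (just past its size field),
 * or NULL if a size field is too small or runs past heap_size.
 */
static const uint8_t *txt_heap_section_data(const uint8_t *heap,
					    uint64_t heap_size,
					    enum txt_heap_section idx)
{
	uint64_t off = 0, sz;
	int i;

	for (i = 0; i < (int)idx; i++) {
		if (heap_size - off < sizeof(uint64_t))
			return NULL;
		sz = read_u64(heap + off);
		/* The size includes its own 8-byte field. */
		if (sz < sizeof(uint64_t) || sz > heap_size - off)
			return NULL;
		off += sz;
	}
	if (heap_size - off < sizeof(uint64_t))
		return NULL;
	return heap + off + sizeof(uint64_t);
}
```
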
> +SYM_FUNC_START(sl_find_mle_base)
> +	/* %ecx has PDPT, get first PD */
> +	movl	(%ecx), %eax
> +	andl	$(PAGE_MASK), %eax
> +	/* Get first PT from first PDE */
> +	movl	(%eax), %eax
> +	andl	$(PAGE_MASK), %eax
> +	/* Get MLE base from first PTE */
> +	movl	(%eax), %eax
> +	andl	$(PAGE_MASK), %eax
> +
> +	movl	%eax, rva(sl_mle_start)(%ebx)
> +	ret
> +SYM_FUNC_END(sl_find_mle_base)
> +
> +SYM_FUNC_START(sl_check_buffer_mle_overlap)
> +	/* %ecx: buffer begin %edx: buffer end */
> +	/* %ebx: MLE begin %edi: MLE end */
> +	/* %eax: region may be inside MLE */
> +
> +	cmpl	%edi, %ecx
> +	jb	.Lnext_check
> +	cmpl	%edi, %edx
> +	jbe	.Lnext_check
> +	jmp	.Lvalid /* Buffer above MLE */
> +
> +.Lnext_check:
> +	cmpl	%ebx, %edx
> +	ja	.Linside_check
> +	cmpl	%ebx, %ecx
> +	jae	.Linside_check
> +	jmp	.Lvalid /* Buffer below MLE */
> +
> +.Linside_check:
> +	cmpl	$0, %eax
> +	jz	.Linvalid
> +	cmpl	%ebx, %ecx
> +	jb	.Linvalid
> +	cmpl	%edi, %edx
> +	ja	.Linvalid
> +	jmp	.Lvalid /* Buffer in MLE */
> +
> +.Linvalid:
> +	TXT_RESET $(SL_ERROR_MLE_BUFFER_OVERLAP)
> +
> +.Lvalid:
> +	ret
> +SYM_FUNC_END(sl_check_buffer_mle_overlap)
> +
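
The branch structure of sl_check_buffer_mle_overlap is easier to review in C. In the direct transcription below, the buffer is [begin, end) and the MLE is [mle_begin, mle_end). allow_inside plays the role of a non-zero %eax on entry, which is set only for the boot params because they can sit inside the MLE image on a UEFI boot. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

static bool buffer_mle_placement_ok(uint32_t begin, uint32_t end,
				    uint32_t mle_begin, uint32_t mle_end,
				    bool allow_inside)
{
	/* Buffer above the MLE (falls through to .Lvalid) */
	if (begin >= mle_end && end > mle_end)
		return true;

	/* Buffer below the MLE */
	if (end <= mle_begin && begin < mle_begin)
		return true;

	/* .Linside_check: wholly inside the MLE, only if the caller allows it */
	return allow_inside && begin >= mle_begin && end <= mle_end;
}
```
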
> +SYM_FUNC_START(sl_txt_verify_os_mle_struct)
> +	pushl	%ebx
> +	/*
> +	 * %eax points to the base of the OS-MLE struct. Need to also
> +	 * read some values from the OS-SINIT struct too.
> +	 */
> +	movl	-8(%eax), %ecx
> +	/* Skip over OS to MLE data section and size of OS-SINIT structure */
> +	leal	(%eax, %ecx), %edx
> +
> +	/* Load MLE image base absolute offset */
> +	movl	rva(sl_mle_start)(%ebx), %ebx
> +
> +	/* Verify the value of the low PMR base. It should always be 0. */
> +	movl	SL_vtd_pmr_lo_base(%edx), %esi
> +	cmpl	$0, %esi
> +	jz	.Lvalid_pmr_base
> +	TXT_RESET $(SL_ERROR_LO_PMR_BASE)
> +
> +.Lvalid_pmr_base:
> +	/* Grab some values from OS-SINIT structure */
> +	movl	SL_mle_size(%edx), %edi
> +	addl	%ebx, %edi
> +	jc	.Loverflow_detected
> +	movl	SL_vtd_pmr_lo_size(%edx), %esi
> +
> +	/* Check the AP wake block */
> +	movl	SL_ap_wake_block(%eax), %ecx
> +	movl	SL_ap_wake_block_size(%eax), %edx
> +	addl	%ecx, %edx
> +	jc	.Loverflow_detected
> +	pushl	%eax
> +	xorl	%eax, %eax
> +	call	sl_check_buffer_mle_overlap
> +	popl	%eax
> +	cmpl	%esi, %edx
> +	ja	.Lbuffer_beyond_pmr
> +
> +	/*
> +	 * Check the boot params. Note during a UEFI boot, the boot
> +	 * params will be inside the MLE image. Test for this case
> +	 * in the overlap case.
> +	 */
> +	movl	SL_boot_params_addr(%eax), %ecx
> +	movl	$(PAGE_SIZE), %edx
> +	addl	%ecx, %edx
> +	jc	.Loverflow_detected
> +	pushl	%eax
> +	movl	$1, %eax
> +	call	sl_check_buffer_mle_overlap
> +	popl	%eax
> +	cmpl	%esi, %edx
> +	ja	.Lbuffer_beyond_pmr
> +
> +	/* Check that the AP wake block is big enough */
> +	cmpl	$(sl_txt_ap_wake_end - sl_txt_ap_wake_begin), \
> +		SL_ap_wake_block_size(%eax)
> +	jae	.Lwake_block_ok
> +	TXT_RESET $(SL_ERROR_WAKE_BLOCK_TOO_SMALL)
> +
> +.Lwake_block_ok:
> +	popl	%ebx
> +	ret
> +
> +.Loverflow_detected:
> +	TXT_RESET $(SL_ERROR_INTEGER_OVERFLOW)
> +
> +.Lbuffer_beyond_pmr:
> +	TXT_RESET $(SL_ERROR_BUFFER_BEYOND_PMR)
> +SYM_FUNC_END(sl_txt_verify_os_mle_struct)
> +
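
Each region check in sl_txt_verify_os_mle_struct follows the same pattern. It computes the 32-bit end address and takes the overflow path on carry (jc). It then resets if the end lies above the low PMR size, the low PMR base having already been verified to be 0. The sketch below shows that pattern, using the GCC/Clang __builtin_add_overflow() in place of the carry flag. The enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum region_status {
	REGION_OK,
	REGION_OVERFLOW,	/* base + size wrapped: the jc paths */
	REGION_BEYOND_PMR,	/* end above the low PMR limit */
};

static enum region_status check_region_in_lo_pmr(uint32_t base, uint32_t size,
						 uint32_t pmr_lo_size)
{
	uint32_t end;

	/* 32-bit add; a carry out means the region wraps. */
	if (__builtin_add_overflow(base, size, &end))
		return REGION_OVERFLOW;
	/* Matches "cmpl %esi, %edx; ja .Lbuffer_beyond_pmr". */
	if (end > pmr_lo_size)
		return REGION_BEYOND_PMR;
	return REGION_OK;
}
```
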
> +SYM_FUNC_START(sl_txt_ap_entry)
> +	cli
> +	cld
> +	/*
> +	 * The %cs and %ds segments are known good after waking the AP.
> +	 * First order of business is to find where we are and
> +	 * save it in %ebx.
> +	 */
> +
> +	/* Read physical base of heap into EAX */
> +	movl	(TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_HEAP_BASE), %eax
> +	/* Read the size of the BIOS data into ECX (first 8 bytes) */
> +	movl	(%eax), %ecx
> +	/* Skip over BIOS data and size of OS to MLE data section */
> +	leal	8(%eax, %ecx), %eax
> +
> +	/* Saved %ebx from the BSP and stash OS-MLE pointer */
> +	movl	(SL_mle_scratch + SL_SCRATCH_AP_EBX)(%eax), %ebx
> +
> +	/* Save TXT info ptr in %edi for call to sl_txt_load_regs */
> +	movl	SL_txt_info(%eax), %edi
> +
> +	/* Lock and get our stack index */
> +	movl	$1, %ecx
> +.Lspin:
> +	xorl	%eax, %eax
> +	lock cmpxchgl	%ecx, rva(sl_txt_spin_lock)(%ebx)
> +	pause
> +	jnz	.Lspin
> +
> +	/* Increment the stack index and use the next value inside lock */
> +	incl	rva(sl_txt_stack_index)(%ebx)
> +	movl	rva(sl_txt_stack_index)(%ebx), %eax
> +
> +	/* Unlock */
> +	movl	$0, rva(sl_txt_spin_lock)(%ebx)
> +
> +	/* Location of the relocated AP wake block */
> +	movl	rva(sl_txt_ap_wake_block)(%ebx), %ecx
> +
> +	/* Load reloc GDT, set segment regs and lret to __SL32_CS */
> +	lgdt	(sl_ap_gdt_desc - sl_txt_ap_wake_begin)(%ecx)
> +
> +	movl	$(__SL32_DS), %edx
> +	movw	%dx, %ds
> +	movw	%dx, %es
> +	movw	%dx, %fs
> +	movw	%dx, %gs
> +	movw	%dx, %ss
> +
> +	/* Load our reloc AP stack */
> +	movl	$(TXT_BOOT_STACK_SIZE), %edx
> +	mull	%edx
> +	leal	(sl_stacks_end - sl_txt_ap_wake_begin)(%ecx), %esp
> +	subl	%eax, %esp
> +
> +	/* Switch to AP code segment */
> +	leal	rva(.Lsl_ap_cs)(%ebx), %eax
> +	pushl	$(__SL32_CS)
> +	pushl	%eax
> +	lret
> +
> +.Lsl_ap_cs:
> +	/* Load the relocated AP IDT */
> +	lidt	(sl_ap_idt_desc - sl_txt_ap_wake_begin)(%ecx)
> +
> +	/* Fixup MTRRs and misc enable MSR on APs too */
> +	call	sl_txt_load_regs
> +
> +	/* Enable SMI with GETSEC[SMCTRL] */
> +	GETSEC $(SMX_X86_GETSEC_SMCTRL)
> +
> +	/* IRET-to-self can be used to enable NMIs which SENTER disabled */
> +	leal	rva(.Lnmi_enabled_ap)(%ebx), %eax
> +	pushfl
> +	pushl	$(__SL32_CS)
> +	pushl	%eax
> +	iret
> +
> +.Lnmi_enabled_ap:
> +	/* Put APs in X2APIC mode like the BSP */
> +	movl	$(MSR_IA32_APICBASE), %ecx
> +	rdmsr
> +	orl	$(XAPIC_ENABLE | X2APIC_ENABLE), %eax
> +	wrmsr
> +
> +	/*
> +	 * Basically done, increment the CPU count and jump off to the AP
> +	 * wake block to wait.
> +	 */
> +	lock incl	rva(sl_txt_cpu_count)(%ebx)
> +
> +	movl	rva(sl_txt_ap_wake_block)(%ebx), %eax
> +	jmp	*%eax
> +SYM_FUNC_END(sl_txt_ap_entry)
> +
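
The AP stack selection above proceeds in four steps:

1. Take a spinlock with lock cmpxchgl.
2. Increment sl_txt_stack_index.
3. Release the lock.
4. Set %esp to the relocated sl_stacks_end minus index * TXT_BOOT_STACK_SIZE.

As a result, the BSP keeps the topmost stack (index 0). The C11 sketch below follows the same sequence, assuming a placeholder stack size of 0x100 and hypothetical names. A single atomic_fetch_add() would also hand out unique indexes, but the lock form is kept here to match the stub.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Placeholder; the stub uses TXT_BOOT_STACK_SIZE. */
#define BOOT_STACK_SIZE	0x100

static atomic_uint spin_lock;
static unsigned int stack_index;	/* 0 is the BSP's stack */

/* Claim the next AP stack and return its top (initial stack pointer). */
static uintptr_t ap_claim_stack(uintptr_t stacks_end)
{
	unsigned int expected, idx;

	for (;;) {
		expected = 0;
		if (atomic_compare_exchange_weak(&spin_lock, &expected, 1))
			break;
		/* The stub executes "pause" while spinning here. */
	}
	idx = ++stack_index;		/* protected by the lock */
	atomic_store(&spin_lock, 0);	/* unlock */

	return stacks_end - (uintptr_t)idx * BOOT_STACK_SIZE;
}
```
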
> +SYM_FUNC_START(sl_txt_reloc_ap_wake)
> +	/* Save boot params register */
> +	pushl	%esi
> +
> +	movl	rva(sl_txt_ap_wake_block)(%ebx), %edi
> +
> +	/* Fixup AP IDT and GDT descriptor before relocating */
> +	leal	rva(sl_ap_idt_desc)(%ebx), %eax
> +	addl	%edi, 2(%eax)
> +	leal	rva(sl_ap_gdt_desc)(%ebx), %eax
> +	addl	%edi, 2(%eax)
> +
> +	/*
> +	 * Copy the AP wake code and AP GDT/IDT to the protected wake block
> +	 * provided by the loader. Destination already in %edi.
> +	 */
> +	movl	$(sl_txt_ap_wake_end - sl_txt_ap_wake_begin), %ecx
> +	leal	rva(sl_txt_ap_wake_begin)(%ebx), %esi
> +	rep movsb
> +
> +	/* Setup the IDT for the APs to use in the relocation block */
> +	movl	rva(sl_txt_ap_wake_block)(%ebx), %ecx
> +	addl	$(sl_ap_idt - sl_txt_ap_wake_begin), %ecx
> +	xorl	%edx, %edx
> +
> +	/* Form the default reset vector relocation address */
> +	movl	rva(sl_txt_ap_wake_block)(%ebx), %esi
> +	addl	$(sl_txt_int_reset - sl_txt_ap_wake_begin), %esi
> +
> +1:
> +	cmpw	$(NR_VECTORS), %dx
> +	jz	.Lap_idt_done
> +
> +	cmpw	$(X86_TRAP_NMI), %dx
> +	jz	2f
> +
> +	/* Load all other fixed vectors with reset handler */
> +	movl	%esi, %eax
> +	movw	%ax, (IDT_VECTOR_LO_BITS)(%ecx)
> +	shrl	$16, %eax
> +	movw	%ax, (IDT_VECTOR_HI_BITS)(%ecx)
> +	jmp	3f
> +
> +2:
> +	/* Load single wake NMI IPI vector at the relocation address */
> +	movl	rva(sl_txt_ap_wake_block)(%ebx), %eax
> +	addl	$(sl_txt_int_nmi - sl_txt_ap_wake_begin), %eax
> +	movw	%ax, (IDT_VECTOR_LO_BITS)(%ecx)
> +	shrl	$16, %eax
> +	movw	%ax, (IDT_VECTOR_HI_BITS)(%ecx)
> +
> +3:
> +	incw	%dx
> +	addl	$8, %ecx
> +	jmp	1b
> +
> +.Lap_idt_done:
> +	popl	%esi
> +	ret
> +SYM_FUNC_END(sl_txt_reloc_ap_wake)
> +
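
sl_txt_reloc_ap_wake only patches the handler address in each 8-byte gate, because the selector (__SL32_CS) and the 0x8e00 attributes are already in sl_ap_idt. The C sketch below models that fill loop. Vector 2 (X86_TRAP_NMI) gets the NMI handler and every other vector gets the reset handler. The struct and function names are hypothetical. Field offsets 0 and 6 correspond to IDT_VECTOR_LO_BITS and IDT_VECTOR_HI_BITS.

```c
#include <stdint.h>

/* 32-bit interrupt gate as laid out in sl_ap_idt. */
struct idt_gate32 {
	uint16_t offset_lo;	/* handler bits 15:0, byte offset 0 */
	uint16_t selector;	/* __SL32_CS, preset */
	uint16_t type_attr;	/* 0x8e00, preset */
	uint16_t offset_hi;	/* handler bits 31:16, byte offset 6 */
};

static void idt_set_handler(struct idt_gate32 *gate, uint32_t handler)
{
	gate->offset_lo = (uint16_t)(handler & 0xffff);
	gate->offset_hi = (uint16_t)(handler >> 16);
}

/* Point every vector at reset_handler except nmi_vector. */
static void idt_fill(struct idt_gate32 *idt, unsigned int nr,
		     unsigned int nmi_vector, uint32_t reset_handler,
		     uint32_t nmi_handler)
{
	unsigned int v;

	for (v = 0; v < nr; v++)
		idt_set_handler(&idt[v],
				v == nmi_vector ? nmi_handler : reset_handler);
}
```
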
> +SYM_FUNC_START(sl_txt_load_regs)
> +	/* Save base pointer register */
> +	pushl	%ebx
> +
> +	/*
> +	 * On Intel, the original variable MTRRs and Misc Enable MSR are
> +	 * restored on the BSP at early boot. Each AP will also restore
> +	 * its MTRRs and Misc Enable MSR.
> +	 */
> +	pushl	%edi
> +	addl	$(SL_saved_bsp_mtrrs), %edi
> +	movl	(%edi), %ebx
> +	pushl	%ebx /* default_mem_type lo */
> +	addl	$4, %edi
> +	movl	(%edi), %ebx
> +	pushl	%ebx /* default_mem_type hi */
> +	addl	$4, %edi
> +	movl	(%edi), %ebx /* mtrr_vcnt lo, don't care about hi part */
> +	addl	$8, %edi /* now at MTRR pair array */
> +	/* Write the variable MTRRs */
> +	movl	$(MSR_MTRRphysBase0), %ecx
> +1:
> +	cmpl	$0, %ebx
> +	jz	2f
> +
> +	movl	(%edi), %eax /* MTRRphysBaseX lo */
> +	addl	$4, %edi
> +	movl	(%edi), %edx /* MTRRphysBaseX hi */
> +	wrmsr
> +	addl	$4, %edi
> +	incl	%ecx
> +	movl	(%edi), %eax /* MTRRphysMaskX lo */
> +	addl	$4, %edi
> +	movl	(%edi), %edx /* MTRRphysMaskX hi */
> +	wrmsr
> +	addl	$4, %edi
> +	incl	%ecx
> +
> +	decl	%ebx
> +	jmp	1b
> +2:
> +	/* Write the default MTRR register */
> +	popl	%edx
> +	popl	%eax
> +	movl	$(MSR_MTRRdefType), %ecx
> +	wrmsr
> +
> +	/* Return to beginning and write the misc enable msr */
> +	popl	%edi
> +	addl	$(SL_saved_misc_enable_msr), %edi
> +	movl	(%edi), %eax /* saved_misc_enable_msr lo */
> +	addl	$4, %edi
> +	movl	(%edi), %edx /* saved_misc_enable_msr hi */
> +	movl	$(MSR_IA32_MISC_ENABLE), %ecx
> +	wrmsr
> +
> +	popl	%ebx
> +	ret
> +SYM_FUNC_END(sl_txt_load_regs)
> +
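
sl_txt_load_regs walks the saved MTRR state in this order: default_mem_type, mtrr_vcnt, then {base, mask} pairs. It writes MTRRphysBaseN/MTRRphysMaskN from MSR 0x200 upward and writes MSR_MTRRdefType (0x2ff) last. So that it runs in user space, the sketch below records the writes instead of executing wrmsr. MAX_VAR_MTRRS and the names are placeholders. Rejecting an out-of-range count is a choice of the sketch; the stub uses only the low 32 bits of mtrr_vcnt and does not bound it.

```c
#include <stddef.h>
#include <stdint.h>

#define MSR_MTRRphysBase0	0x200
#define MSR_MTRRdefType		0x2ff
#define MAX_VAR_MTRRS		8	/* placeholder bound */

/* Layout as walked by sl_txt_load_regs. */
struct saved_mtrrs {
	uint64_t default_mem_type;
	uint64_t mtrr_vcnt;
	struct {
		uint64_t base;
		uint64_t mask;
	} pair[MAX_VAR_MTRRS];
};

/* One recorded MSR write. */
struct msr_write {
	uint32_t msr;
	uint64_t val;
};

/*
 * Record the writes in order; out must hold 2 * MAX_VAR_MTRRS + 1 entries.
 * Returns -1 without recording anything if the count is out of range.
 */
static int restore_mtrrs(const struct saved_mtrrs *s, struct msr_write *out,
			 size_t *nr_out)
{
	uint32_t msr = MSR_MTRRphysBase0;
	size_t n = 0;
	uint64_t i;

	if (s->mtrr_vcnt > MAX_VAR_MTRRS)
		return -1;

	for (i = 0; i < s->mtrr_vcnt; i++) {
		out[n++] = (struct msr_write){ msr++, s->pair[i].base };
		out[n++] = (struct msr_write){ msr++, s->pair[i].mask };
	}
	/* The default type register is written last, as in the stub. */
	out[n++] = (struct msr_write){ MSR_MTRRdefType, s->default_mem_type };
	*nr_out = n;
	return 0;
}
```
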
> +SYM_FUNC_START(sl_txt_wake_aps)
> +	/* Save boot params register */
> +	pushl	%esi
> +
> +	/* First setup the MLE join structure and load it into TXT reg */
> +	leal	rva(sl_gdt)(%ebx), %eax
> +	leal	rva(sl_txt_ap_entry)(%ebx), %ecx
> +	leal	rva(sl_smx_rlp_mle_join)(%ebx), %edx
> +	movl	%eax, SL_rlp_gdt_base(%edx)
> +	movl	%ecx, SL_rlp_entry_point(%edx)
> +	movl	%edx, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_MLE_JOIN)
> +
> +	/* Another TXT heap walk to find various values needed to wake APs */
> +	movl	(TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_HEAP_BASE), %eax
> +	/* At BIOS data size, find the number of logical processors */
> +	movl	(SL_num_logical_procs + 8)(%eax), %edx
> +	/* Skip over BIOS data */
> +	movl	(%eax), %ecx
> +	addl	%ecx, %eax
> +	/* Skip over OS to MLE */
> +	movl	(%eax), %ecx
> +	addl	%ecx, %eax
> +	/* At OS-SINIT size, get capabilities to know how to wake up the APs */
> +	movl	(SL_capabilities + 8)(%eax), %esi
> +	/* Skip over OS to SINIT */
> +	movl	(%eax), %ecx
> +	addl	%ecx, %eax
> +	/* At SINIT-MLE size, get the AP wake MONITOR address */
> +	movl	(SL_rlp_wakeup_addr + 8)(%eax), %edi
> +
> +	/* Determine how to wake up the APs */
> +	testl	$(1 << TXT_SINIT_MLE_CAP_WAKE_MONITOR), %esi
> +	jz	.Lwake_getsec
> +
> +	/* Wake using MWAIT MONITOR */
> +	movl	$1, (%edi)
> +	jmp	.Laps_awake
> +
> +.Lwake_getsec:
> +	/* Wake using GETSEC(WAKEUP) */
> +	GETSEC	$(SMX_X86_GETSEC_WAKEUP)
> +
> +.Laps_awake:
> +	/*
> +	 * All of the APs are woken up and rendezvous in the relocated wake
> +	 * block starting at sl_txt_ap_wake_begin. Wait for all of them to
> +	 * halt.
> +	 */
> +	pause
> +	cmpl	rva(sl_txt_cpu_count)(%ebx), %edx
> +	jne	.Laps_awake
> +
> +	popl	%esi
> +	ret
> +SYM_FUNC_END(sl_txt_wake_aps)
> +
> +/* This is the beginning of the relocated AP wake code block */
> +	.global sl_txt_ap_wake_begin
> +sl_txt_ap_wake_begin:
> +
> +	/* Get the LAPIC ID for each AP and stash it on the stack */
> +	movl	$(MSR_IA32_X2APIC_APICID), %ecx
> +	rdmsr
> +	pushl	%eax
> +
> +	/*
> +	 * Get a pointer to the monitor location on this APs stack to test below
> +	 * after mwait returns. Currently %esp points to just past the pushed APIC
> +	 * ID value.
> +	 */
> +	movl	%esp, %eax
> +	subl	$(TXT_BOOT_STACK_SIZE - 4), %eax
> +	movl	$0, (%eax)
> +
> +	/* Clear ecx/edx so no invalid extensions or hints are passed to monitor */
> +	xorl	%ecx, %ecx
> +	xorl	%edx, %edx
> +
> +	/*
> +	 * Arm the monitor and wait for it to be poked by the SMP bringup code. The mwait
> +	 * instruction can return for a number of reasons. Test to see if it returned
> +	 * because the monitor was written to.
> +	 */
> +	monitor
> +
> +1:
> +	mfence
> +	mwait
> +	movl	(%eax), %edx
> +	testl	%edx, %edx
> +	jz	1b
> +
> +	/*
> +	 * This is the long absolute jump to the 32b Secure Launch protected mode stub
> +	 * code in sl_trampoline_start32() in the rmpiggy. The jump address will be
> +	 * fixed in the SMP boot code when the first AP is brought up. This whole area
> +	 * is provided and protected in the memory map by the prelaunch code.
> +	 */
> +	.byte	0xea
> +sl_ap_jmp_offset:
> +	.long	0x00000000
> +	.word	__SL32_CS
> +
> +SYM_FUNC_START(sl_txt_int_nmi)
> +	/* NMI context, just IRET */
> +	iret
> +SYM_FUNC_END(sl_txt_int_nmi)
> +
> +SYM_FUNC_START(sl_txt_int_reset)
> +	TXT_RESET $(SL_ERROR_INV_AP_INTERRUPT)
> +SYM_FUNC_END(sl_txt_int_reset)
> +
> +	.balign 8
> +SYM_DATA_START_LOCAL(sl_ap_idt_desc)
> +	.word	sl_ap_idt_end - sl_ap_idt - 1		/* Limit */
> +	.long	sl_ap_idt - sl_txt_ap_wake_begin	/* Base */
> +SYM_DATA_END_LABEL(sl_ap_idt_desc, SYM_L_LOCAL, sl_ap_idt_desc_end)
> +
> +	.balign 8
> +SYM_DATA_START_LOCAL(sl_ap_idt)
> +	.rept	NR_VECTORS
> +	.word	0x0000		/* Offset 15 to 0 */
> +	.word	__SL32_CS	/* Segment selector */
> +	.word	0x8e00		/* Present, DPL=0, 32b Vector, Interrupt */
> +	.word	0x0000		/* Offset 31 to 16 */
> +	.endr
> +SYM_DATA_END_LABEL(sl_ap_idt, SYM_L_LOCAL, sl_ap_idt_end)
> +
> +	.balign 8
> +SYM_DATA_START_LOCAL(sl_ap_gdt_desc)
> +	.word	sl_ap_gdt_end - sl_ap_gdt - 1
> +	.long	sl_ap_gdt - sl_txt_ap_wake_begin
> +SYM_DATA_END_LABEL(sl_ap_gdt_desc, SYM_L_LOCAL, sl_ap_gdt_desc_end)
> +
> +	.balign	8
> +SYM_DATA_START_LOCAL(sl_ap_gdt)
> +	.quad	0x0000000000000000	/* NULL */
> +	.quad	0x00cf9a000000ffff	/* __SL32_CS */
> +	.quad	0x00cf92000000ffff	/* __SL32_DS */
> +SYM_DATA_END_LABEL(sl_ap_gdt, SYM_L_LOCAL, sl_ap_gdt_end)
> +
> +	/* Small stacks for BSP and APs to work with */
> +	.balign 64
> +SYM_DATA_START_LOCAL(sl_stacks)
> +	.fill (TXT_MAX_CPUS * TXT_BOOT_STACK_SIZE), 1, 0
> +SYM_DATA_END_LABEL(sl_stacks, SYM_L_LOCAL, sl_stacks_end)
> +
> +/* This is the end of the relocated AP wake code block */
> +	.global sl_txt_ap_wake_end
> +sl_txt_ap_wake_end:
> +
> +	.data
> +	.balign 8
> +SYM_DATA_START_LOCAL(sl_gdt_desc)
> +	.word	sl_gdt_end - sl_gdt - 1
> +	.long	sl_gdt - sl_gdt_desc
> +SYM_DATA_END_LABEL(sl_gdt_desc, SYM_L_LOCAL, sl_gdt_desc_end)
> +
> +	.balign	8
> +SYM_DATA_START_LOCAL(sl_gdt)
> +	.quad	0x0000000000000000	/* NULL */
> +	.quad	0x00cf9a000000ffff	/* __SL32_CS */
> +	.quad	0x00cf92000000ffff	/* __SL32_DS */
> +SYM_DATA_END_LABEL(sl_gdt, SYM_L_LOCAL, sl_gdt_end)
> +
> +	.balign 8
> +SYM_DATA_START_LOCAL(sl_smx_rlp_mle_join)
> +	.long	sl_gdt_end - sl_gdt - 1	/* GDT limit */
> +	.long	0x00000000		/* GDT base */
> +	.long	__SL32_CS	/* Seg Sel - CS (DS, ES, SS = seg_sel+8) */
> +	.long	0x00000000	/* Entry point physical address */
> +SYM_DATA_END(sl_smx_rlp_mle_join)
> +
> +SYM_DATA(sl_cpu_type, .long 0x00000000)
> +
> +SYM_DATA(sl_mle_start, .long 0x00000000)
> +
> +SYM_DATA_LOCAL(sl_txt_spin_lock, .long 0x00000000)
> +
> +SYM_DATA_LOCAL(sl_txt_stack_index, .long 0x00000000)
> +
> +SYM_DATA_LOCAL(sl_txt_cpu_count, .long 0x00000000)
> +
> +SYM_DATA_LOCAL(sl_txt_ap_wake_block, .long 0x00000000)
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index e022e6eb766c..37f6167f28ba 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -348,6 +348,9 @@
>  #define MSR_IA32_RTIT_OUTPUT_BASE	0x00000560
>  #define MSR_IA32_RTIT_OUTPUT_MASK	0x00000561
>  
> +#define MSR_MTRRphysBase0		0x00000200
> +#define MSR_MTRRphysMask0		0x00000201
> +
>  #define MSR_MTRRfix64K_00000		0x00000250
>  #define MSR_MTRRfix16K_80000		0x00000258
>  #define MSR_MTRRfix16K_A0000		0x00000259
> @@ -849,6 +852,8 @@
>  #define MSR_IA32_APICBASE_ENABLE	(1<<11)
>  #define MSR_IA32_APICBASE_BASE		(0xfffff<<12)
>  
> +#define MSR_IA32_X2APIC_APICID		0x00000802
> +
>  #define MSR_IA32_UCODE_WRITE		0x00000079
>  #define MSR_IA32_UCODE_REV		0x0000008b
>  

The MSR additions would be better split into their own patch.

> diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
> index 9b82eebd7add..7ce283a22d6b 100644
> --- a/arch/x86/include/uapi/asm/bootparam.h
> +++ b/arch/x86/include/uapi/asm/bootparam.h
> @@ -12,6 +12,7 @@
>  /* loadflags */
>  #define LOADED_HIGH	(1<<0)
>  #define KASLR_FLAG	(1<<1)
> +#define SLAUNCH_FLAG	(1<<2)
>  #define QUIET_FLAG	(1<<5)
>  #define KEEP_SEGMENTS	(1<<6)
>  #define CAN_USE_HEAP	(1<<7)
> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
> index a98020bf31bb..925adce6e2c7 100644
> --- a/arch/x86/kernel/asm-offsets.c
> +++ b/arch/x86/kernel/asm-offsets.c
> @@ -13,6 +13,8 @@
>  #include <linux/hardirq.h>
>  #include <linux/suspend.h>
>  #include <linux/kbuild.h>
> +#include <linux/slr_table.h>
> +#include <linux/slaunch.h>
>  #include <asm/processor.h>
>  #include <asm/thread_info.h>
>  #include <asm/sigframe.h>
> @@ -120,4 +122,22 @@ static void __used common(void)
>  	OFFSET(ARIA_CTX_rounds, aria_ctx, rounds);
>  #endif
>  
> +#ifdef CONFIG_SECURE_LAUNCH
> +	BLANK();
> +	OFFSET(SL_txt_info, txt_os_mle_data, txt_info);
> +	OFFSET(SL_mle_scratch, txt_os_mle_data, mle_scratch);
> +	OFFSET(SL_boot_params_addr, txt_os_mle_data, boot_params_addr);
> +	OFFSET(SL_ap_wake_block, txt_os_mle_data, ap_wake_block);
> +	OFFSET(SL_ap_wake_block_size, txt_os_mle_data, ap_wake_block_size);
> +	OFFSET(SL_saved_misc_enable_msr, slr_entry_intel_info, saved_misc_enable_msr);
> +	OFFSET(SL_saved_bsp_mtrrs, slr_entry_intel_info, saved_bsp_mtrrs);
> +	OFFSET(SL_num_logical_procs, txt_bios_data, num_logical_procs);
> +	OFFSET(SL_capabilities, txt_os_sinit_data, capabilities);
> +	OFFSET(SL_mle_size, txt_os_sinit_data, mle_size);
> +	OFFSET(SL_vtd_pmr_lo_base, txt_os_sinit_data, vtd_pmr_lo_base);
> +	OFFSET(SL_vtd_pmr_lo_size, txt_os_sinit_data, vtd_pmr_lo_size);
> +	OFFSET(SL_rlp_wakeup_addr, txt_sinit_mle_data, rlp_wakeup_addr);
> +	OFFSET(SL_rlp_gdt_base, smx_rlp_mle_join, rlp_gdt_base);
> +	OFFSET(SL_rlp_entry_point, smx_rlp_mle_join, rlp_entry_point);
> +#endif
>  }
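Each OFFSET() entry above turns a C structure member offset into a constant that assembly can use. In include/linux/kbuild.h, OFFSET(sym, str, mem) expands to DEFINE(sym, offsetof(struct str, mem)).

The following is a minimal userspace sketch of that idea. It assumes the four 32-bit fields of the RLP MLE join structure appear in the order of the sl_smx_rlp_mle_join data in sl_stub.S: GDT limit, GDT base, segment selector, entry point. The struct name and the member names other than rlp_gdt_base and rlp_entry_point are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout assumed from the sl_smx_rlp_mle_join data in sl_stub.S */
struct smx_rlp_mle_join_sketch {
	uint32_t rlp_gdt_limit;   /* GDT limit */
	uint32_t rlp_gdt_base;    /* GDT base */
	uint32_t rlp_seg_sel;     /* CS selector; DS, ES, SS = selector + 8 */
	uint32_t rlp_entry_point; /* entry point physical address */
};

/* Roughly what OFFSET(SL_rlp_gdt_base, ...) and OFFSET(SL_rlp_entry_point, ...) produce */
enum {
	SL_rlp_gdt_base_sketch    = offsetof(struct smx_rlp_mle_join_sketch, rlp_gdt_base),
	SL_rlp_entry_point_sketch = offsetof(struct smx_rlp_mle_join_sketch, rlp_entry_point),
};
```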

BR, Jarkko
Jarkko Sakkinen June 4, 2024, 8:05 p.m. UTC | #10
On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
> On Intel, the APs are left in a well documented state after TXT performs
> the late launch. Specifically they cannot have #INIT asserted on them so
> a standard startup via INIT/SIPI/SIPI cannot be performed. Instead the
> early SL stub code uses MONITOR and MWAIT to park the APs. The realmode/init.c
> code updates the jump address for the waiting APs with the location of the
> Secure Launch entry point in the RM piggy after it is loaded and fixed up.
> As the APs are woken up by writing the monitor, the APs jump to the Secure
> Launch entry point in the RM piggy which mimics what the real mode code would
> do then jumps to the standard RM piggy protected mode entry point.
>
> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
> ---
>  arch/x86/include/asm/realmode.h      |  3 ++
>  arch/x86/kernel/smpboot.c            | 58 +++++++++++++++++++++++++++-
>  arch/x86/realmode/init.c             |  3 ++
>  arch/x86/realmode/rm/header.S        |  3 ++
>  arch/x86/realmode/rm/trampoline_64.S | 32 +++++++++++++++
>  5 files changed, 97 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
> index 87e5482acd0d..339b48e2543d 100644
> --- a/arch/x86/include/asm/realmode.h
> +++ b/arch/x86/include/asm/realmode.h
> @@ -38,6 +38,9 @@ struct real_mode_header {
>  #ifdef CONFIG_X86_64
>  	u32	machine_real_restart_seg;
>  #endif
> +#ifdef CONFIG_SECURE_LAUNCH
> +	u32	sl_trampoline_start32;
> +#endif
>  };
>  
>  /* This must match data at realmode/rm/trampoline_{32,64}.S */
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 0c35207320cb..adb521221d6c 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -60,6 +60,7 @@
>  #include <linux/stackprotector.h>
>  #include <linux/cpuhotplug.h>
>  #include <linux/mc146818rtc.h>
> +#include <linux/slaunch.h>
>  
>  #include <asm/acpi.h>
>  #include <asm/cacheinfo.h>
> @@ -868,6 +869,56 @@ int common_cpu_up(unsigned int cpu, struct task_struct *idle)
>  	return 0;
>  }
>  
> +#ifdef CONFIG_SECURE_LAUNCH
> +
> +static bool slaunch_is_txt_launch(void)
> +{
> +	if ((slaunch_get_flags() & (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) ==
> +	    (SL_FLAG_ACTIVE | SL_FLAG_ARCH_TXT))
> +		return true;
> +
> +	return false;
> +}

static inline bool slaunch_is_txt_launch(void)
{
	u32 mask = SL_FLAG_ACTIVE | SL_FLAG_ARCH_TXT;

	return (slaunch_get_flags() & mask) == mask;
}
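In C, == binds more tightly than &, so a two-bit mask test in a helper like slaunch_is_txt_launch() needs explicit parentheses around the AND. The following is a standalone sketch. It uses placeholder values for SL_FLAG_ACTIVE and SL_FLAG_ARCH_TXT; the real values live in linux/slaunch.h and are not shown in this patch. The flags are also passed in as an argument instead of being read via slaunch_get_flags().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only */
#define SL_FLAG_ACTIVE_SKETCH    0x00000001u
#define SL_FLAG_ARCH_TXT_SKETCH  0x00000004u

static bool is_txt_launch(uint32_t flags)
{
	const uint32_t mask = SL_FLAG_ACTIVE_SKETCH | SL_FLAG_ARCH_TXT_SKETCH;

	/*
	 * Both bits must be set. Without the parentheses this would parse
	 * as flags & (mask == mask), i.e. flags & 1.
	 */
	return (flags & mask) == mask;
}
```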


> +
> +/*
> + * TXT AP startup is quite different than normal. The APs cannot have #INIT
> + * asserted on them or receive SIPIs. The early Secure Launch code has parked
> + * the APs using monitor/mwait. This will wake the APs by writing the monitor
> + * and have them jump to the protected mode code in the rmpiggy where the rest
> + * of the SMP boot of the AP will proceed normally.
> + */
> +static void slaunch_wakeup_cpu_from_txt(int cpu, int apicid)
> +{
> +	struct sl_ap_wake_info *ap_wake_info;
> +	struct sl_ap_stack_and_monitor *stack_monitor = NULL;

struct sl_ap_stack_and_monitor *stack_monitor; /* note: no initialization */
struct sl_ap_wake_info *ap_wake_info;


> +
> +	ap_wake_info = slaunch_get_ap_wake_info();
> +
> +	stack_monitor = (struct sl_ap_stack_and_monitor *)__va(ap_wake_info->ap_wake_block +
> +							       ap_wake_info->ap_stacks_offset);
> +
> +	for (unsigned int i = TXT_MAX_CPUS - 1; i >= 0; i--) {
> +		if (stack_monitor[i].apicid == apicid) {
> +			/* Write the monitor */

I'd remove this comment.
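There is a separate problem in this loop. The index i is declared unsigned int, so the condition i >= 0 is always true. When no slot matches, the index wraps past zero and reads outside stack_monitor.

The following is a minimal sketch of the same reverse scan with a signed index. It assumes a simplified stand-in struct that carries only the apicid and monitor members used here, plus a placeholder value for TXT_MAX_CPUS. Both are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TXT_MAX_CPUS_SKETCH 4  /* placeholder value for illustration */

/* Simplified stand-in for struct sl_ap_stack_and_monitor */
struct ap_slot_sketch {
	uint32_t apicid;
	volatile uint32_t monitor;  /* written to wake the parked AP */
};

/* Wake the AP whose APIC ID matches; returns false if none does. */
static bool wake_ap_sketch(struct ap_slot_sketch *slots, uint32_t apicid)
{
	/* Signed index, so the loop terminates after slot 0 */
	for (int i = TXT_MAX_CPUS_SKETCH - 1; i >= 0; i--) {
		if (slots[i].apicid == apicid) {
			slots[i].monitor = 1;
			return true;
		}
	}
	return false;
}
```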

> +			stack_monitor[i].monitor = 1;
> +			break;
> +		}
> +	}
> +}
> +
> +#else
> +
> +static inline bool slaunch_is_txt_launch(void)
> +{
> +	return false;
> +}
> +
> +static inline void slaunch_wakeup_cpu_from_txt(int cpu, int apicid)
> +{
> +}
> +
> +#endif  /* !CONFIG_SECURE_LAUNCH */
> +
>  /*
>   * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
>   * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
> @@ -877,7 +928,7 @@ int common_cpu_up(unsigned int cpu, struct task_struct *idle)
>  static int do_boot_cpu(u32 apicid, int cpu, struct task_struct *idle)
>  {
>  	unsigned long start_ip = real_mode_header->trampoline_start;
> -	int ret;
> +	int ret = 0;
>  
>  #ifdef CONFIG_X86_64
>  	/* If 64-bit wakeup method exists, use the 64-bit mode trampoline IP */
> @@ -922,12 +973,15 @@ static int do_boot_cpu(u32 apicid, int cpu, struct task_struct *idle)
>  
>  	/*
>  	 * Wake up a CPU in difference cases:
> +	 * - Intel TXT DRTM launch uses its own method to wake the APs
>  	 * - Use a method from the APIC driver if one defined, with wakeup
>  	 *   straight to 64-bit mode preferred over wakeup to RM.
>  	 * Otherwise,
>  	 * - Use an INIT boot APIC message
>  	 */
> -	if (apic->wakeup_secondary_cpu_64)
> +	if (slaunch_is_txt_launch())
> +		slaunch_wakeup_cpu_from_txt(cpu, apicid);
> +	else if (apic->wakeup_secondary_cpu_64)
>  		ret = apic->wakeup_secondary_cpu_64(apicid, start_ip);
>  	else if (apic->wakeup_secondary_cpu)
>  		ret = apic->wakeup_secondary_cpu(apicid, start_ip);
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index f9bc444a3064..d95776cb30d3 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -4,6 +4,7 @@
>  #include <linux/memblock.h>
>  #include <linux/cc_platform.h>
>  #include <linux/pgtable.h>
> +#include <linux/slaunch.h>
>  
>  #include <asm/set_memory.h>
>  #include <asm/realmode.h>
> @@ -210,6 +211,8 @@ void __init init_real_mode(void)
>  
>  	setup_real_mode();
>  	set_real_mode_permissions();
> +
> +	slaunch_fixup_jump_vector();
>  }
>  
>  static int __init do_init_real_mode(void)
> diff --git a/arch/x86/realmode/rm/header.S b/arch/x86/realmode/rm/header.S
> index 2eb62be6d256..3b5cbcbbfc90 100644
> --- a/arch/x86/realmode/rm/header.S
> +++ b/arch/x86/realmode/rm/header.S
> @@ -37,6 +37,9 @@ SYM_DATA_START(real_mode_header)
>  #ifdef CONFIG_X86_64
>  	.long	__KERNEL32_CS
>  #endif
> +#ifdef CONFIG_SECURE_LAUNCH
> +	.long	pa_sl_trampoline_start32
> +#endif
>  SYM_DATA_END(real_mode_header)
>  
>  	/* End signature, used to verify integrity */
> diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
> index 14d9c7daf90f..b0ce6205d7ea 100644
> --- a/arch/x86/realmode/rm/trampoline_64.S
> +++ b/arch/x86/realmode/rm/trampoline_64.S
> @@ -122,6 +122,38 @@ SYM_CODE_END(sev_es_trampoline_start)
>  
>  	.section ".text32","ax"
>  	.code32
> +#ifdef CONFIG_SECURE_LAUNCH
> +	.balign 4
> +SYM_CODE_START(sl_trampoline_start32)
> +	/*
> +	 * The early secure launch stub AP wakeup code has taken care of all
> +	 * the vagaries of launching out of TXT. This bit just mimics what the
> +	 * 16b entry code does and jumps off to the real startup_32.
> +	 */
> +	cli
> +	wbinvd
> +
> +	/*
> +	 * The %ebx provided is not terribly useful since it is the physical
> +	 * address of tb_trampoline_start and not the base of the image.
> +	 * Use pa_real_mode_base, which is fixed up, to get a run time
> +	 * base register to use for offsets to location that do not have
> +	 * pa_ symbols.
> +	 */
> +	movl    $pa_real_mode_base, %ebx
> +
> +	LOCK_AND_LOAD_REALMODE_ESP lock_pa=1
> +
> +	lgdt    tr_gdt(%ebx)
> +	lidt    tr_idt(%ebx)
> +
> +	movw	$__KERNEL_DS, %dx	# Data segment descriptor
> +
> +	/* Jump to where the 16b code would have jumped */
> +	ljmpl	$__KERNEL32_CS, $pa_startup_32
> +SYM_CODE_END(sl_trampoline_start32)
> +#endif
> +
>  	.balign 4
>  SYM_CODE_START(startup_32)
>  	movl	%edx, %ss

BR, Jarkko
Jarkko Sakkinen June 4, 2024, 8:12 p.m. UTC | #11
On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
> From: "Daniel P. Smith" <dpsmith@apertussolutions.com>
>
> Commit 933bfc5ad213 introduced the use of a locality counter to control when a
> locality request is allowed to be sent to the TPM. In the commit, the counter
> is indiscriminately decremented. Thus creating a situation for an integer
> underflow of the counter.
>
> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
> Reported-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
> Fixes: 933bfc5ad213 ("tpm, tpm: Implement usage counter for locality")

I'm not sure we have a practical use for a Fixes tag here, but I'm open
to argument, of course. I.e. I'm not sure what practical scenario there
would be to worry about if Trenchboot did not exist.

> ---
>  drivers/char/tpm/tpm_tis_core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 176cd8dbf1db..7c1761bd6000 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -180,7 +180,8 @@ static int tpm_tis_relinquish_locality(struct tpm_chip *chip, int l)
>  	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>  
>  	mutex_lock(&priv->locality_count_mutex);
> -	priv->locality_count--;
> +	if (priv->locality_count > 0)
> +		priv->locality_count--;

I'd signal the situation with pr_info() in the else branch.
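The following is a userspace sketch of the guarded decrement with the suggested else-branch message. In it, fprintf() stands in for pr_info() and the mutex is omitted. The struct and function names are hypothetical, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct locality_state_sketch {
	int locality_count;  /* number of outstanding locality requests */
};

/*
 * Drop one reference. Returns true when the count is zero afterwards,
 * i.e. when the caller should relinquish the locality.
 */
static bool locality_put_sketch(struct locality_state_sketch *s, int l)
{
	if (s->locality_count > 0)
		s->locality_count--;
	else  /* pr_info() in the driver */
		fprintf(stderr, "tpm: unbalanced release of locality %d\n", l);

	return s->locality_count == 0;
}
```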

>  	if (priv->locality_count == 0)
>  		__tpm_tis_relinquish_locality(priv, l);
>  	mutex_unlock(&priv->locality_count_mutex);

BR, Jarkko
Jarkko Sakkinen June 4, 2024, 8:14 p.m. UTC | #12
On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
> From: "Daniel P. Smith" <dpsmith@apertussolutions.com>
>
> When tis core initializes, it assumes all localities are closed. There

s/tis_core/tpm_tis_core/

> are cases when this may not be the case. This commit addresses this by
> ensuring all localities are closed before initializing begins.
>
> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
> ---
>  drivers/char/tpm/tpm_tis_core.c | 11 ++++++++++-
>  include/linux/tpm.h             |  6 ++++++
>  2 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 7c1761bd6000..9fb53bb3e73f 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -1104,7 +1104,7 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>  	u32 intmask;
>  	u32 clkrun_val;
>  	u8 rid;
> -	int rc, probe;
> +	int rc, probe, i;
>  	struct tpm_chip *chip;
>  
>  	chip = tpmm_chip_alloc(dev, &tpm_tis);
> @@ -1166,6 +1166,15 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>  		goto out_err;
>  	}
>  
> +	/*
> +	 * There are environments, like Intel TXT, that may leave a TPM

Given the current state of mainline, what else is there besides Intel
TXT at this point?

> +	 * locality open. Close all localities to start from a known state.
> +	 */
> +	for (i = 0; i <= TPM_MAX_LOCALITY; i++) {
> +		if (check_locality(chip, i))
> +			tpm_tis_relinquish_locality(chip, i);
> +	}

To be strict, this should be enabled only for x86 platforms, i.e. it
should be flagged.
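The following sketch makes the close-all loop conditional, as suggested. The open-locality bitmap and the function name are hypothetical. So is the close_stale switch, which stands in for whatever platform or Kconfig flag ends up gating this, for example an x86-only option. The loop bound is inclusive, so all five localities, 0 through TPM_MAX_LOCALITY, are checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TPM_MAX_LOCALITY 4  /* highest locality, per the tpm.h hunk */

/*
 * Clear every open locality recorded in *open_mask (bit n = locality n)
 * when close_stale is set; returns how many were closed.
 */
static int close_stale_localities_sketch(uint8_t *open_mask, bool close_stale)
{
	int closed = 0;

	if (!close_stale)
		return 0;

	for (int i = 0; i <= TPM_MAX_LOCALITY; i++) {
		if (*open_mask & (1u << i)) {
			*open_mask &= (uint8_t)~(1u << i);  /* "relinquish" locality i */
			closed++;
		}
	}
	return closed;
}
```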

> +
>  	/* Take control of the TPM's interrupt hardware and shut it off */
>  	rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
>  	if (rc < 0)
> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
> index c17e4efbb2e5..363f7078c3a9 100644
> --- a/include/linux/tpm.h
> +++ b/include/linux/tpm.h
> @@ -147,6 +147,12 @@ struct tpm_chip_seqops {
>   */
>  #define TPM2_MAX_CONTEXT_SIZE 4096
>  
> +/*
> + * The maximum locality (0 - 4) for a TPM, as defined in section 3.2 of the
> + * Client Platform Profile Specification.
> + */
> +#define TPM_MAX_LOCALITY		4
> +
>  struct tpm_chip {
>  	struct device dev;
>  	struct device devs;


BR, Jarkko
Jarkko Sakkinen June 4, 2024, 8:27 p.m. UTC | #13
On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
> Curently the locality is hard coded to 0 but for DRTM support, access
> is needed to localities 1 through 4.
>
> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
> ---
>  drivers/char/tpm/tpm-chip.c      | 24 +++++++++++++++++++++++-
>  drivers/char/tpm/tpm-interface.c | 15 +++++++++++++++
>  drivers/char/tpm/tpm.h           |  1 +
>  include/linux/tpm.h              |  4 ++++
>  4 files changed, 43 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
> index 854546000c92..73eac54d61fb 100644
> --- a/drivers/char/tpm/tpm-chip.c
> +++ b/drivers/char/tpm/tpm-chip.c
> @@ -44,7 +44,7 @@ static int tpm_request_locality(struct tpm_chip *chip)
>  	if (!chip->ops->request_locality)
>  		return 0;
>  
> -	rc = chip->ops->request_locality(chip, 0);
> +	rc = chip->ops->request_locality(chip, chip->pref_locality);
>  	if (rc < 0)
>  		return rc;
>  
> @@ -143,6 +143,27 @@ void tpm_chip_stop(struct tpm_chip *chip)
>  }
>  EXPORT_SYMBOL_GPL(tpm_chip_stop);
>  
> +/**
> + * tpm_chip_preferred_locality() - set the TPM chip preferred locality to open
> + * @chip:	a TPM chip to use
> + * @locality:   the preferred locality
> + *
> + * Return:
> + * * true      - Preferred locality set
> + * * false     - Invalid locality specified
> + */
> +bool tpm_chip_preferred_locality(struct tpm_chip *chip, int locality)
> +{
> +	if (locality < 0 || locality >=TPM_MAX_LOCALITY)
> +		return false;
> +
> +	mutex_lock(&chip->tpm_mutex);
> +	chip->pref_locality = locality;
> +	mutex_unlock(&chip->tpm_mutex);
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(tpm_chip_preferred_locality);
> +
>  /**
>   * tpm_try_get_ops() - Get a ref to the tpm_chip
>   * @chip: Chip to ref
> @@ -374,6 +395,7 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,
>  	}
>  
>  	chip->locality = -1;
> +	chip->pref_locality = 0;
>  	return chip;
>  
>  out:
> diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
> index 5da134f12c9a..35f14ccecf0e 100644
> --- a/drivers/char/tpm/tpm-interface.c
> +++ b/drivers/char/tpm/tpm-interface.c
> @@ -274,6 +274,21 @@ int tpm_is_tpm2(struct tpm_chip *chip)
>  }
>  EXPORT_SYMBOL_GPL(tpm_is_tpm2);
>  
> +/**
> + * tpm_preferred_locality() - set the TPM chip preferred locality to open
> + * @chip:	a TPM chip to use
> + * @locality:   the preferred locality
> + *
> + * Return:
> + * * true      - Preferred locality set
> + * * false     - Invalid locality specified
> + */
> +bool tpm_preferred_locality(struct tpm_chip *chip, int locality)
> +{
> +	return tpm_chip_preferred_locality(chip, locality);
> +}
> +EXPORT_SYMBOL_GPL(tpm_preferred_locality);

What good does this extra wrapping do?

tpm_set_default_locality() and default_locality would make so much more
sense in any case.
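The range check in tpm_chip_preferred_locality() has an off-by-one. TPM_MAX_LOCALITY is defined as 4, the highest valid locality, so `locality >= TPM_MAX_LOCALITY` rejects locality 4. The commit message, however, says localities 1 through 4 are needed.

The following sketch uses the suggested tpm_set_default_locality()/default_locality naming, which is hypothetical here. The mutex is omitted, and a minimal stand-in struct replaces struct tpm_chip.

```c
#include <assert.h>
#include <stdbool.h>

#define TPM_MAX_LOCALITY 4  /* localities 0 - 4 */

struct tpm_chip_sketch {
	int default_locality;
};

static bool tpm_set_default_locality_sketch(struct tpm_chip_sketch *chip, int locality)
{
	/* Inclusive upper bound: locality 4 is valid */
	if (locality < 0 || locality > TPM_MAX_LOCALITY)
		return false;

	chip->default_locality = locality;  /* the driver would hold tpm_mutex here */
	return true;
}
```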

BR, Jarkko
Jarkko Sakkinen June 4, 2024, 8:27 p.m. UTC | #14
On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
> Expose a sysfs interface to allow user mode to set and query the preferred
> locality for the TPM chip.
>
> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
> ---
>  drivers/char/tpm/tpm-sysfs.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
>
> diff --git a/drivers/char/tpm/tpm-sysfs.c b/drivers/char/tpm/tpm-sysfs.c
> index 94231f052ea7..5f4a966a4599 100644
> --- a/drivers/char/tpm/tpm-sysfs.c
> +++ b/drivers/char/tpm/tpm-sysfs.c
> @@ -324,6 +324,34 @@ static ssize_t null_name_show(struct device *dev, struct device_attribute *attr,
>  static DEVICE_ATTR_RO(null_name);
>  #endif
>  
> +static ssize_t preferred_locality_show(struct device *dev,
> +				       struct device_attribute *attr, char *buf)
> +{
> +	struct tpm_chip *chip = to_tpm_chip(dev);
> +
> +	return sprintf(buf, "%d\n", chip->pref_locality);
> +}

I disagree with the naming.

BR, Jarkko
Ross Philipson June 4, 2024, 8:28 p.m. UTC | #15
On 6/4/24 11:18 AM, Jarkko Sakkinen wrote:
> On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
>> From: Arvind Sankar <nivedita@alum.mit.edu>
>>
>> There are use cases for storing the offset of a symbol in kernel_info.
>> For example, the trenchboot series [0] needs to store the offset of the
>> Measured Launch Environment header in kernel_info.
> 
> So either there are other use cases that you should enumerate, or just
> be straight and state that this is done for Trenchboot.

The kernel_info concept came about because of the work we were doing on
TrenchBoot, but it was not done for TrenchBoot. It was a collaborative
effort between the TrenchBoot team and H. Peter Anvin at Intel, who
actually envisioned it being useful elsewhere. The original commits for
it from Daniel Kiper (which went in stand-alone) contain a fair amount of
detail on what kernel_info is and what it should be used for.

> 
> I believe latter is the case, and there is no reason to project further.
> If it does not interfere kernel otherwise, it should be fine just by
> that.
> 
> Also I believe that it is written as Trenchboot, without "series" ;-)
> Think when writing commit message that it will some day be part of the
> commit log, not a series flying in the air.
> 
> Sorry for the nitpicks but better to be punctual and that way also
> transparent as possible, right?

No problem. We submit the patch sets to get feedback :)

Thanks for the feedback.

> 
>>
>> Since commit (note: commit ID from tip/master)
>>
>> commit 527afc212231 ("x86/boot: Check that there are no run-time relocations")
>>
>> run-time relocations are not allowed in the compressed kernel, so simply
>> using the symbol in kernel_info, as
>>
>> 	.long	symbol
>>
>> will cause a linker error because this is not position-independent.
>>
>> With kernel_info being a separate object file and in a different section
>> from startup_32, there is no way to calculate the offset of a symbol
>> from the start of the image in a position-independent way.
>>
>> To enable such use cases, put kernel_info into its own section which is
> 
> "To allow Trenchboot to access the fields of kernel_info..."
> 
> Much more understandable.
> 
>> placed at a predetermined offset (KERNEL_INFO_OFFSET) via the linker
>> script. This will allow calculating the symbol offset in a
>> position-independent way, by adding the offset from the start of
>> kernel_info to KERNEL_INFO_OFFSET.
>>
>> Ensure that kernel_info is aligned, and use the SYM_DATA.* macros
>> instead of bare labels. This stores the size of the kernel_info
>> structure in the ELF symbol table.
> 
> Aligned to which boundary and short explanation why to that boundary,
> i.e. state the obvious if you bring it up anyway here.
> 
> Just seems to be progressing pretty well so taking my eye glass and
> looking into nitty gritty details...

So a lot of this is up in the air, as you can see from the responses
between us and Ard Biesheuvel. It would be nice to get rid of the part
where kernel_info is forced to a fixed offset in the setup kernel.

Thanks
Ross

> 
> BR, Jarkko
Ross Philipson June 4, 2024, 8:31 p.m. UTC | #16
On 6/4/24 11:21 AM, Jarkko Sakkinen wrote:
> On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
>> Introduce the Secure Launch Resource Table which forms the formal
>> interface between the pre and post launch code.
>>
>> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
> 
> If a uarch specific, I'd appreciate Intel SDM reference here so that I
> can look it up and compare. Like in section granularity.

This table is not meant to be architecture specific, though it can
contain architecture-specific sub-entities. E.g. there is a TXT-specific
table, and in the future there will be AMD and Arm ones (and hopefully
some others). I hope that addresses your point; if not, maybe I don't
fully understand what you mean here...

Thanks
Ross

> 
> BR, Jarkko
Ard Biesheuvel June 4, 2024, 8:54 p.m. UTC | #17
On Tue, 4 Jun 2024 at 19:34, <ross.philipson@oracle.com> wrote:
>
> On 6/4/24 10:27 AM, Ard Biesheuvel wrote:
> > On Tue, 4 Jun 2024 at 19:24, <ross.philipson@oracle.com> wrote:
> >>
> >> On 5/31/24 6:33 AM, Ard Biesheuvel wrote:
> >>> On Fri, 31 May 2024 at 13:00, Ard Biesheuvel <ardb@kernel.org> wrote:
> >>>>
> >>>> Hello Ross,
> >>>>
> >>>> On Fri, 31 May 2024 at 03:32, Ross Philipson <ross.philipson@oracle.com> wrote:
> >>>>>
> >>>>> The Secure Launch (SL) stub provides the entry point for Intel TXT (and
> >>>>> later AMD SKINIT) to vector to during the late launch. The symbol
> >>>>> sl_stub_entry is that entry point and its offset into the kernel is
> >>>>> conveyed to the launching code using the MLE (Measured Launch
> >>>>> Environment) header in the structure named mle_header. The offset of the
> >>>>> MLE header is set in the kernel_info. The routine sl_stub contains the
> >>>>> very early late launch setup code responsible for setting up the basic
> >>>>> environment to allow the normal kernel startup_32 code to proceed. It is
> >>>>> also responsible for properly waking and handling the APs on Intel
> >>>>> platforms. The routine sl_main which runs after entering 64b mode is
> >>>>> responsible for measuring configuration and module information before
> >>>>> it is used like the boot params, the kernel command line, the TXT heap,
> >>>>> an external initramfs, etc.
> >>>>>
> >>>>> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
> >>>>> ---
> >>>>>    Documentation/arch/x86/boot.rst        |  21 +
> >>>>>    arch/x86/boot/compressed/Makefile      |   3 +-
> >>>>>    arch/x86/boot/compressed/head_64.S     |  30 +
> >>>>>    arch/x86/boot/compressed/kernel_info.S |  34 ++
> >>>>>    arch/x86/boot/compressed/sl_main.c     | 577 ++++++++++++++++++++
> >>>>>    arch/x86/boot/compressed/sl_stub.S     | 725 +++++++++++++++++++++++++
> >>>>>    arch/x86/include/asm/msr-index.h       |   5 +
> >>>>>    arch/x86/include/uapi/asm/bootparam.h  |   1 +
> >>>>>    arch/x86/kernel/asm-offsets.c          |  20 +
> >>>>>    9 files changed, 1415 insertions(+), 1 deletion(-)
> >>>>>    create mode 100644 arch/x86/boot/compressed/sl_main.c
> >>>>>    create mode 100644 arch/x86/boot/compressed/sl_stub.S
> >>>>>
> >>>>> diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst
> >>>>> index 4fd492cb4970..295cdf9bcbdb 100644
> >>>>> --- a/Documentation/arch/x86/boot.rst
> >>>>> +++ b/Documentation/arch/x86/boot.rst
> >>>>> @@ -482,6 +482,14 @@ Protocol:  2.00+
> >>>>>               - If 1, KASLR enabled.
> >>>>>               - If 0, KASLR disabled.
> >>>>>
> >>>>> +  Bit 2 (kernel internal): SLAUNCH_FLAG
> >>>>> +
> >>>>> +       - Used internally by the setup kernel to communicate
> >>>>> +         Secure Launch status to kernel proper.
> >>>>> +
> >>>>> +           - If 1, Secure Launch enabled.
> >>>>> +           - If 0, Secure Launch disabled.
> >>>>> +
> >>>>>      Bit 5 (write): QUIET_FLAG
> >>>>>
> >>>>>           - If 0, print early messages.
> >>>>> @@ -1028,6 +1036,19 @@ Offset/size:     0x000c/4
> >>>>>
> >>>>>      This field contains maximal allowed type for setup_data and setup_indirect structs.
> >>>>>
> >>>>> +============   =================
> >>>>> +Field name:    mle_header_offset
> >>>>> +Offset/size:   0x0010/4
> >>>>> +============   =================
> >>>>> +
> >>>>> +  This field contains the offset to the Secure Launch Measured Launch Environment
> >>>>> +  (MLE) header. This offset is used to locate information needed during a secure
> >>>>> +  late launch using Intel TXT. If the offset is zero, the kernel does not have
> >>>>> +  Secure Launch capabilities. The MLE entry point is called from TXT on the BSP
> >>>>> +  following a success measured launch. The specific state of the processors is
> >>>>> +  outlined in the TXT Software Development Guide, the latest can be found here:
> >>>>> +  https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
> >>>>> +
> >>>>>
> >>>>
> >>>> Could we just repaint this field as the offset relative to the start
> >>>> of kernel_info rather than relative to the start of the image? That
> >>>> way, there is no need for patch #1, and given that the consumer of
> >>>> this field accesses it via kernel_info, I wouldn't expect any issues
> >>>> in applying this offset to obtain the actual address.
> >>>>
> >>>>
> >>>>>    The Image Checksum
> >>>>>    ==================
> >>>>> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> >>>>> index 9189a0e28686..9076a248d4b4 100644
> >>>>> --- a/arch/x86/boot/compressed/Makefile
> >>>>> +++ b/arch/x86/boot/compressed/Makefile
> >>>>> @@ -118,7 +118,8 @@ vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
> >>>>>    vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_mixed.o
> >>>>>    vmlinux-objs-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
> >>>>>
> >>>>> -vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o
> >>>>> +vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o \
> >>>>> +       $(obj)/sl_main.o $(obj)/sl_stub.o
> >>>>>
> >>>>>    $(obj)/vmlinux: $(vmlinux-objs-y) FORCE
> >>>>>           $(call if_changed,ld)
> >>>>> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> >>>>> index 1dcb794c5479..803c9e2e6d85 100644
> >>>>> --- a/arch/x86/boot/compressed/head_64.S
> >>>>> +++ b/arch/x86/boot/compressed/head_64.S
> >>>>> @@ -420,6 +420,13 @@ SYM_CODE_START(startup_64)
> >>>>>           pushq   $0
> >>>>>           popfq
> >>>>>
> >>>>> +#ifdef CONFIG_SECURE_LAUNCH
> >>>>> +       /* Ensure the relocation region is coverd by a PMR */
> >>>>
> >>>> covered
> >>>>
> >>>>> +       movq    %rbx, %rdi
> >>>>> +       movl    $(_bss - startup_32), %esi
> >>>>> +       callq   sl_check_region
> >>>>> +#endif
> >>>>> +
> >>>>>    /*
> >>>>>     * Copy the compressed kernel to the end of our buffer
> >>>>>     * where decompression in place becomes safe.
> >>>>> @@ -462,6 +469,29 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
> >>>>>           shrq    $3, %rcx
> >>>>>           rep     stosq
> >>>>>
> >>>>> +#ifdef CONFIG_SECURE_LAUNCH
> >>>>> +       /*
> >>>>> +        * Have to do the final early sl stub work in 64b area.
> >>>>> +        *
> >>>>> +        * *********** NOTE ***********
> >>>>> +        *
> >>>>> +        * Several boot params get used before we get a chance to measure
> >>>>> +        * them in this call. This is a known issue and we currently don't
> >>>>> +        * have a solution. The scratch field doesn't matter. There is no
> >>>>> +        * obvious way to do anything about the use of kernel_alignment or
> >>>>> +        * init_size though these seem low risk with all the PMR and overlap
> >>>>> +        * checks in place.
> >>>>> +        */
> >>>>> +       movq    %r15, %rdi
> >>>>> +       callq   sl_main
> >>>>> +
> >>>>> +       /* Ensure the decompression location is covered by a PMR */
> >>>>> +       movq    %rbp, %rdi
> >>>>> +       movq    output_len(%rip), %rsi
> >>>>> +       callq   sl_check_region
> >>>>> +#endif
> >>>>> +
> >>>>> +       pushq   %rsi
> >>>>
> >>>> This looks like a rebase error.
> >>>>
> >>>>>           call    load_stage2_idt
> >>>>>
> >>>>>           /* Pass boot_params to initialize_identity_maps() */
> >>>>> diff --git a/arch/x86/boot/compressed/kernel_info.S b/arch/x86/boot/compressed/kernel_info.S
> >>>>> index c18f07181dd5..e199b87764e9 100644
> >>>>> --- a/arch/x86/boot/compressed/kernel_info.S
> >>>>> +++ b/arch/x86/boot/compressed/kernel_info.S
> >>>>> @@ -28,6 +28,40 @@ SYM_DATA_START(kernel_info)
> >>>>>           /* Maximal allowed type for setup_data and setup_indirect structs. */
> >>>>>           .long   SETUP_TYPE_MAX
> >>>>>
> >>>>> +       /* Offset to the MLE header structure */
> >>>>> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
> >>>>> +       .long   rva(mle_header)
> >>>>
> >>>> ... so this could just be mle_header - kernel_info, and the consumer
> >>>> can do the math instead.
> >>>>
> >>>>> +#else
> >>>>> +       .long   0
> >>>>> +#endif
> >>>>> +
> >>>>>    kernel_info_var_len_data:
> >>>>>           /* Empty for time being... */
> >>>>>    SYM_DATA_END_LABEL(kernel_info, SYM_L_LOCAL, kernel_info_end)
> >>>>> +
> >>>>> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
> >>>>> +       /*
> >>>>> +        * The MLE Header per the TXT Specification, section 2.1
> >>>>> +        * MLE capabilities, see table 4. Capabilities set:
> >>>>> +        * bit 0: Support for GETSEC[WAKEUP] for RLP wakeup
> >>>>> +        * bit 1: Support for RLP wakeup using MONITOR address
> >>>>> +        * bit 2: The ECX register will contain the pointer to the MLE page table
> >>>>> +        * bit 5: TPM 1.2 family: Details/authorities PCR usage support
> >>>>> +        * bit 9: Supported format of TPM 2.0 event log - TCG compliant
> >>>>> +        */
> >>>>> +SYM_DATA_START(mle_header)
> >>>>> +       .long   0x9082ac5a  /* UUID0 */
> >>>>> +       .long   0x74a7476f  /* UUID1 */
> >>>>> +       .long   0xa2555c0f  /* UUID2 */
> >>>>> +       .long   0x42b651cb  /* UUID3 */
> >>>>> +       .long   0x00000034  /* MLE header size */
> >>>>> +       .long   0x00020002  /* MLE version 2.2 */
> >>>>> +       .long   rva(sl_stub_entry) /* Linear entry point of MLE (virt. address) */
> >>>>
> >>>> and these should perhaps be relative to mle_header?
> >>>>
> >>>>> +       .long   0x00000000  /* First valid page of MLE */
> >>>>> +       .long   0x00000000  /* Offset within binary of first byte of MLE */
> >>>>> +       .long   rva(_edata) /* Offset within binary of last byte + 1 of MLE */
> >>>>
> >>>> and here
> >>>>
> >>>
> >>> Ugh never mind - these are specified externally.
> >>
> >> Can you clarify your follow on comment here?
> >>
> >
> > I noticed that -as you pointed out in your previous reply- these
> > fields cannot be repainted at will, as they are defined by an external
> > specification.
> >
> > I'll play a bit more with this code tomorrow - I would *really* like
> > to avoid the need for patch #1, as it adds another constraint on how
> > we construct the boot image, and this is already riddled with legacy
> > and other complications.
>
> Yeah, I should have read forward through all your replies before
> responding to the first one, but I think it clarified things, as you point
> out here. We appreciate your help and suggestions.
>

OK, so I have a solution that does not require kernel_info at a fixed offset:

- put this at the end of arch/x86/boot/compressed/vmlinux.lds.S

#ifdef CONFIG_SECURE_LAUNCH
PROVIDE(kernel_info_offset      = ABSOLUTE(kernel_info - startup_32));
PROVIDE(mle_header_offset       = kernel_info_offset + ABSOLUTE(mle_header - startup_32));
PROVIDE(sl_stub_entry_offset    = kernel_info_offset + ABSOLUTE(sl_stub_entry - startup_32));
PROVIDE(_edata_offset           = kernel_info_offset + ABSOLUTE(_edata - startup_32));
#endif


and use this for the header fields:

        /* Offset to the MLE header structure */
#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
        .long   mle_header_offset - kernel_info
#else
        .long   0
#endif



SYM_DATA_START(mle_header)
        .long   0x9082ac5a  /* UUID0 */
        .long   0x74a7476f  /* UUID1 */
        .long   0xa2555c0f  /* UUID2 */
        .long   0x42b651cb  /* UUID3 */
        .long   0x00000034  /* MLE header size */
        .long   0x00020002  /* MLE version 2.2 */
        .long   sl_stub_entry_offset - kernel_info /* Linear entry point of MLE (virt. address) */
        .long   0x00000000  /* First valid page of MLE */
        .long   0x00000000  /* Offset within binary of first byte of MLE */
        .long   _edata_offset - kernel_info /* Offset within binary of last byte + 1 of MLE */
        .long   0x00000227  /* Bit vector of MLE-supported capabilities */
        .long   0x00000000  /* Starting linear address of command line (unused) */
        .long   0x00000000  /* Ending linear address of command line (unused) */
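For reference, the 13 dwords above (0x34 bytes, matching the "MLE header size" field) can be described by a C struct. This is a sketch only: the struct and helper names are hypothetical, and splitting the version field into a major number in the upper 16 bits and a minor number in the lower 16 bits is an assumption consistent with 0x00020002 meaning version 2.2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the MLE header; field names are illustrative */
struct mle_header_sketch {
	uint32_t uuid[4];          /* 0x9082ac5a 0x74a7476f 0xa2555c0f 0x42b651cb */
	uint32_t size;             /* header size in bytes: 0x34 */
	uint32_t version;          /* 0x00020002 == version 2.2 (assumed split) */
	uint32_t entry_point;      /* offset of sl_stub_entry */
	uint32_t first_valid_page;
	uint32_t mle_start;        /* offset of first byte of MLE */
	uint32_t mle_end;          /* offset of last byte + 1 of MLE (_edata) */
	uint32_t capabilities;     /* 0x227 */
	uint32_t cmdline_start;    /* unused */
	uint32_t cmdline_end;      /* unused */
};

_Static_assert(sizeof(struct mle_header_sketch) == 0x34,
	       "MLE header is 13 dwords (0x34 bytes)");
_Static_assert(offsetof(struct mle_header_sketch, entry_point) == 0x18,
	       "entry point follows UUID, size and version");

static const uint32_t mle_uuid[4] = {
	0x9082ac5a, 0x74a7476f, 0xa2555c0f, 0x42b651cb
};

/* Assumed encoding: major version in bits 31:16, minor in bits 15:0 */
static unsigned int mle_version_major(uint32_t v) { return v >> 16; }
static unsigned int mle_version_minor(uint32_t v) { return v & 0xffff; }

/* Checks the UUID and that the size field matches the struct size */
static int mle_header_looks_valid(const struct mle_header_sketch *h)
{
	for (int i = 0; i < 4; i++)
		if (h->uuid[i] != mle_uuid[i])
			return 0;
	return h->size == sizeof(*h);
}
```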
Ross Philipson June 4, 2024, 9:09 p.m. UTC
On 6/4/24 12:56 PM, Jarkko Sakkinen wrote:
> On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
>> The Secure Launch (SL) stub provides the entry point for Intel TXT (and
>> later AMD SKINIT) to vector to during the late launch. The symbol
>> sl_stub_entry is that entry point and its offset into the kernel is
>> conveyed to the launching code using the MLE (Measured Launch
>> Environment) header in the structure named mle_header. The offset of the
>> MLE header is set in the kernel_info. The routine sl_stub contains the
>> very early late launch setup code responsible for setting up the basic
>> environment to allow the normal kernel startup_32 code to proceed. It is
>> also responsible for properly waking and handling the APs on Intel
>> platforms. The routine sl_main, which runs after entering 64-bit mode, is
>> responsible for measuring configuration and module information (the boot
>> params, the kernel command line, the TXT heap, an external initramfs,
>> etc.) before it is used.
>>
>> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
>> ---
>>   Documentation/arch/x86/boot.rst        |  21 +
>>   arch/x86/boot/compressed/Makefile      |   3 +-
>>   arch/x86/boot/compressed/head_64.S     |  30 +
>>   arch/x86/boot/compressed/kernel_info.S |  34 ++
>>   arch/x86/boot/compressed/sl_main.c     | 577 ++++++++++++++++++++
>>   arch/x86/boot/compressed/sl_stub.S     | 725 +++++++++++++++++++++++++
>>   arch/x86/include/asm/msr-index.h       |   5 +
>>   arch/x86/include/uapi/asm/bootparam.h  |   1 +
>>   arch/x86/kernel/asm-offsets.c          |  20 +
>>   9 files changed, 1415 insertions(+), 1 deletion(-)
>>   create mode 100644 arch/x86/boot/compressed/sl_main.c
>>   create mode 100644 arch/x86/boot/compressed/sl_stub.S
>>
>> diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst
>> index 4fd492cb4970..295cdf9bcbdb 100644
>> --- a/Documentation/arch/x86/boot.rst
>> +++ b/Documentation/arch/x86/boot.rst
>> @@ -482,6 +482,14 @@ Protocol:	2.00+
>>   	    - If 1, KASLR enabled.
>>   	    - If 0, KASLR disabled.
>>   
>> +  Bit 2 (kernel internal): SLAUNCH_FLAG
>> +
>> +	- Used internally by the setup kernel to communicate
>> +	  Secure Launch status to kernel proper.
>> +
>> +	    - If 1, Secure Launch enabled.
>> +	    - If 0, Secure Launch disabled.
>> +
>>     Bit 5 (write): QUIET_FLAG
>>   
>>   	- If 0, print early messages.
>> @@ -1028,6 +1036,19 @@ Offset/size:	0x000c/4
>>   
>>     This field contains maximal allowed type for setup_data and setup_indirect structs.
>>   
>> +============	=================
>> +Field name:	mle_header_offset
>> +Offset/size:	0x0010/4
>> +============	=================
>> +
>> +  This field contains the offset to the Secure Launch Measured Launch Environment
>> +  (MLE) header. This offset is used to locate information needed during a secure
>> +  late launch using Intel TXT. If the offset is zero, the kernel does not have
>> +  Secure Launch capabilities. The MLE entry point is called from TXT on the BSP
>> +  following a successful measured launch. The specific state of the processors is
>> +  outlined in the TXT Software Development Guide, the latest version of which can be found here:
>> +  https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
>> +
>>   
>>   The Image Checksum
>>   ==================
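A boot loader looking for the MLE header might read the new kernel_info field as sketched below. This assumes, per the existing kernel_info description in boot.rst, that kernel_info starts with the "LToP" magic and that the dword at offset 0x4 is the size of its fixed part, which tells the loader whether offset 0x10 exists at all before reading it. The helper and macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KI_SIZE_OFFSET  0x04 /* size of the fixed part of kernel_info */
#define KI_MLE_HDR_OFF  0x10 /* mle_header_offset, per the table above */

/* kernel_info fields are little-endian dwords */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns the MLE header offset, or 0 if Secure Launch is not available */
static uint32_t ki_mle_header_offset(const uint8_t *ki, size_t len)
{
	uint32_t fixed_size;

	/* Need at least the magic and the size field */
	if (len < KI_SIZE_OFFSET + 4 || memcmp(ki, "LToP", 4) != 0)
		return 0;

	fixed_size = get_le32(ki + KI_SIZE_OFFSET);

	/* A fixed part ending before 0x14 means the field is not present */
	if (fixed_size < KI_MLE_HDR_OFF + 4 || len < KI_MLE_HDR_OFF + 4)
		return 0;

	return get_le32(ki + KI_MLE_HDR_OFF);
}
```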
>> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
>> index 9189a0e28686..9076a248d4b4 100644
>> --- a/arch/x86/boot/compressed/Makefile
>> +++ b/arch/x86/boot/compressed/Makefile
>> @@ -118,7 +118,8 @@ vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
>>   vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_mixed.o
>>   vmlinux-objs-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
>>   
>> -vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o
>> +vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o \
>> +	$(obj)/sl_main.o $(obj)/sl_stub.o
>>   
>>   $(obj)/vmlinux: $(vmlinux-objs-y) FORCE
>>   	$(call if_changed,ld)
>> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
>> index 1dcb794c5479..803c9e2e6d85 100644
>> --- a/arch/x86/boot/compressed/head_64.S
>> +++ b/arch/x86/boot/compressed/head_64.S
>> @@ -420,6 +420,13 @@ SYM_CODE_START(startup_64)
>>   	pushq	$0
>>   	popfq
>>   
>> +#ifdef CONFIG_SECURE_LAUNCH
>> +	/* Ensure the relocation region is covered by a PMR */
>> +	movq	%rbx, %rdi
>> +	movl	$(_bss - startup_32), %esi
>> +	callq	sl_check_region
>> +#endif
>> +
>>   /*
>>    * Copy the compressed kernel to the end of our buffer
>>    * where decompression in place becomes safe.
>> @@ -462,6 +469,29 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
>>   	shrq	$3, %rcx
>>   	rep	stosq
>>   
>> +#ifdef CONFIG_SECURE_LAUNCH
>> +	/*
>> +	 * The final early SL stub work has to be done in 64-bit mode.
>> +	 *
>> +	 * *********** NOTE ***********
>> +	 *
>> +	 * Several boot params get used before we get a chance to measure
>> +	 * them in this call. This is a known issue and we currently don't
>> +	 * have a solution. The scratch field doesn't matter. There is no
>> +	 * obvious way to do anything about the use of kernel_alignment or
>> +	 * init_size though these seem low risk with all the PMR and overlap
>> +	 * checks in place.
>> +	 */
>> +	movq	%r15, %rdi
>> +	callq	sl_main
>> +
>> +	/* Ensure the decompression location is covered by a PMR */
>> +	movq	%rbp, %rdi
>> +	movq	output_len(%rip), %rsi
>> +	callq	sl_check_region
>> +#endif
>> +
>> +	pushq	%rsi
>>   	call	load_stage2_idt
>>   
>>   	/* Pass boot_params to initialize_identity_maps() */
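sl_main(), called from the hunk above, reaches the OS-MLE data through txt_os_mle_data_start(), and sl_stub.S performs the same walk in assembly (read the heap base, then `leal 8(%eax, %ecx), %eax`). A sketch of that pointer arithmetic follows. It assumes, as described in the TXT Software Development Guide, that each heap section is preceded by a 64-bit size that includes the size field itself, with the BIOS data section first. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reads a 64-bit value in host byte order (little-endian on x86) */
static uint64_t read_u64(const uint8_t *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * heap + bios_data_size lands on the OS-MLE section's size field,
 * because the BIOS data size already covers its own 8-byte size field.
 * Skipping one more u64 yields the start of the OS-MLE data.
 */
static const uint8_t *os_mle_data_start(const uint8_t *heap)
{
	uint64_t bios_data_size = read_u64(heap);

	return heap + bios_data_size + sizeof(uint64_t);
}
```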
>> diff --git a/arch/x86/boot/compressed/kernel_info.S b/arch/x86/boot/compressed/kernel_info.S
>> index c18f07181dd5..e199b87764e9 100644
>> --- a/arch/x86/boot/compressed/kernel_info.S
>> +++ b/arch/x86/boot/compressed/kernel_info.S
>> @@ -28,6 +28,40 @@ SYM_DATA_START(kernel_info)
>>   	/* Maximal allowed type for setup_data and setup_indirect structs. */
>>   	.long	SETUP_TYPE_MAX
>>   
>> +	/* Offset to the MLE header structure */
>> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
>> +	.long	rva(mle_header)
>> +#else
>> +	.long	0
>> +#endif
>> +
>>   kernel_info_var_len_data:
>>   	/* Empty for time being... */
>>   SYM_DATA_END_LABEL(kernel_info, SYM_L_LOCAL, kernel_info_end)
>> +
>> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
>> +	/*
>> +	 * The MLE Header per the TXT Specification, section 2.1
>> +	 * MLE capabilities, see table 4. Capabilities set:
>> +	 * bit 0: Support for GETSEC[WAKEUP] for RLP wakeup
>> +	 * bit 1: Support for RLP wakeup using MONITOR address
>> +	 * bit 2: The ECX register will contain the pointer to the MLE page table
>> +	 * bit 5: TPM 1.2 family: Details/authorities PCR usage support
>> +	 * bit 9: Supported format of TPM 2.0 event log - TCG compliant
>> +	 */
>> +SYM_DATA_START(mle_header)
>> +	.long	0x9082ac5a  /* UUID0 */
>> +	.long	0x74a7476f  /* UUID1 */
>> +	.long	0xa2555c0f  /* UUID2 */
>> +	.long	0x42b651cb  /* UUID3 */
>> +	.long	0x00000034  /* MLE header size */
>> +	.long	0x00020002  /* MLE version 2.2 */
>> +	.long	rva(sl_stub_entry) /* Linear entry point of MLE (virt. address) */
>> +	.long	0x00000000  /* First valid page of MLE */
>> +	.long	0x00000000  /* Offset within binary of first byte of MLE */
>> +	.long	rva(_edata) /* Offset within binary of last byte + 1 of MLE */
>> +	.long	0x00000227  /* Bit vector of MLE-supported capabilities */
>> +	.long	0x00000000  /* Starting linear address of command line (unused) */
>> +	.long	0x00000000  /* Ending linear address of command line (unused) */
>> +SYM_DATA_END(mle_header)
>> +#endif
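The capability value 0x227 is the OR of the bits listed in the comment above the MLE header (bits 0, 1, 2, 5 and 9). The macro names below are hypothetical; only the bit positions and their descriptions come from the patch.

```c
#include <stdint.h>

/* Hypothetical names for the MLE capability bits used by the patch */
#define MLE_CAP_GETSEC_WAKEUP   (1u << 0) /* RLP wakeup via GETSEC[WAKEUP] */
#define MLE_CAP_MONITOR_WAKEUP  (1u << 1) /* RLP wakeup via MONITOR address */
#define MLE_CAP_ECX_PGTBL       (1u << 2) /* ECX holds MLE page table pointer */
#define MLE_CAP_TPM12_PCR_USAGE (1u << 5) /* TPM 1.2 details/authorities PCRs */
#define MLE_CAP_TPM20_TCG_LOG   (1u << 9) /* TCG-compliant TPM 2.0 event log */

static const uint32_t sl_mle_caps =
	MLE_CAP_GETSEC_WAKEUP | MLE_CAP_MONITOR_WAKEUP | MLE_CAP_ECX_PGTBL |
	MLE_CAP_TPM12_PCR_USAGE | MLE_CAP_TPM20_TCG_LOG;
```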
>> diff --git a/arch/x86/boot/compressed/sl_main.c b/arch/x86/boot/compressed/sl_main.c
>> new file mode 100644
>> index 000000000000..61e9baf410fd
>> --- /dev/null
>> +++ b/arch/x86/boot/compressed/sl_main.c
>> @@ -0,0 +1,577 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Secure Launch early measurement and validation routines.
>> + *
>> + * Copyright (c) 2024, Oracle and/or its affiliates.
>> + */
>> +
>> +#include <linux/init.h>
>> +#include <linux/string.h>
>> +#include <linux/linkage.h>
>> +#include <asm/segment.h>
>> +#include <asm/boot.h>
>> +#include <asm/msr.h>
>> +#include <asm/mtrr.h>
>> +#include <asm/processor-flags.h>
>> +#include <asm/asm-offsets.h>
>> +#include <asm/bootparam.h>
>> +#include <asm/bootparam_utils.h>
>> +#include <linux/slr_table.h>
>> +#include <linux/slaunch.h>
>> +#include <crypto/sha1.h>
>> +#include <crypto/sha2.h>
>> +
>> +#define CAPS_VARIABLE_MTRR_COUNT_MASK	0xff
>> +
>> +#define SL_TPM12_LOG		1
>> +#define SL_TPM20_LOG		2
>> +
>> +#define SL_TPM20_MAX_ALGS	2
>> +
>> +#define SL_MAX_EVENT_DATA	64
>> +#define SL_TPM12_LOG_SIZE	(sizeof(struct tcg_pcr_event) + \
>> +				SL_MAX_EVENT_DATA)
>> +#define SL_TPM20_LOG_SIZE	(sizeof(struct tcg_pcr_event2_head) + \
>> +				SHA1_DIGEST_SIZE + SHA256_DIGEST_SIZE + \
>> +				sizeof(struct tcg_event_field) + \
>> +				SL_MAX_EVENT_DATA)
>> +
>> +static void *evtlog_base;
>> +static u32 evtlog_size;
>> +static struct txt_heap_event_log_pointer2_1_element *log20_elem;
>> +static u32 tpm_log_ver = SL_TPM12_LOG;
>> +static struct tcg_efi_specid_event_algs tpm_algs[SL_TPM20_MAX_ALGS] = {0};
>> +
>> +extern u32 sl_cpu_type;
>> +extern u32 sl_mle_start;
>> +
>> +static u64 sl_txt_read(u32 reg)
>> +{
>> +	return readq((void *)(u64)(TXT_PRIV_CONFIG_REGS_BASE + reg));
>> +}
>> +
>> +static void sl_txt_write(u32 reg, u64 val)
>> +{
>> +	writeq(val, (void *)(u64)(TXT_PRIV_CONFIG_REGS_BASE + reg));
>> +}
>> +
>> +static void __noreturn sl_txt_reset(u64 error)
>> +{
>> +	/* Reading the E2STS register acts as a barrier for TXT registers */
>> +	sl_txt_write(TXT_CR_ERRORCODE, error);
>> +	sl_txt_read(TXT_CR_E2STS);
>> +	sl_txt_write(TXT_CR_CMD_UNLOCK_MEM_CONFIG, 1);
>> +	sl_txt_read(TXT_CR_E2STS);
>> +	sl_txt_write(TXT_CR_CMD_RESET, 1);
>> +
>> +	for ( ; ; )
>> +		asm volatile ("hlt");
>> +
>> +	unreachable();
>> +}
>> +
>> +static u64 sl_rdmsr(u32 reg)
>> +{
>> +	u64 lo, hi;
>> +
>> +	asm volatile ("rdmsr" : "=a" (lo), "=d" (hi) : "c" (reg));
>> +
>> +	return (hi << 32) | lo;
>> +}
>> +
>> +static struct slr_table *sl_locate_and_validate_slrt(void)
>> +{
>> +	struct txt_os_mle_data *os_mle_data;
>> +	struct slr_table *slrt;
>> +	void *txt_heap;
>> +
>> +	txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
>> +	os_mle_data = txt_os_mle_data_start(txt_heap);
>> +
>> +	if (!os_mle_data->slrt)
>> +		sl_txt_reset(SL_ERROR_INVALID_SLRT);
>> +
>> +	slrt = (struct slr_table *)os_mle_data->slrt;
>> +
>> +	if (slrt->magic != SLR_TABLE_MAGIC)
>> +		sl_txt_reset(SL_ERROR_INVALID_SLRT);
>> +
>> +	if (slrt->architecture != SLR_INTEL_TXT)
>> +		sl_txt_reset(SL_ERROR_INVALID_SLRT);
>> +
>> +	return slrt;
>> +}
>> +
>> +static void sl_check_pmr_coverage(void *base, u32 size, bool allow_hi)
>> +{
>> +	struct txt_os_sinit_data *os_sinit_data;
>> +	void *end = base + size;
>> +	void *txt_heap;
>> +
>> +	if (!(sl_cpu_type & SL_CPU_INTEL))
>> +		return;
>> +
>> +	txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
>> +	os_sinit_data = txt_os_sinit_data_start(txt_heap);
>> +
>> +	if ((end >= (void *)0x100000000ULL) && (base < (void *)0x100000000ULL))
>> +		sl_txt_reset(SL_ERROR_REGION_STRADDLE_4GB);
>> +
>> +	/*
>> +	 * Note that the late stub code validates that the hi PMR covers
>> +	 * all memory above 4G. At this point the code can only check that
>> +	 * regions are within the hi PMR but that is sufficient.
>> +	 */
>> +	if ((end > (void *)0x100000000ULL) && (base >= (void *)0x100000000ULL)) {
>> +		if (allow_hi) {
>> +			if (end >= (void *)(os_sinit_data->vtd_pmr_hi_base +
>> +					   os_sinit_data->vtd_pmr_hi_size))
>> +				sl_txt_reset(SL_ERROR_BUFFER_BEYOND_PMR);
>> +		} else {
>> +			sl_txt_reset(SL_ERROR_REGION_ABOVE_4GB);
>> +		}
>> +	}
>> +
>> +	if (end >= (void *)os_sinit_data->vtd_pmr_lo_size)
>> +		sl_txt_reset(SL_ERROR_BUFFER_BEYOND_PMR);
>> +}
>> +
>> +/*
>> + * Some MSRs are modified by the pre-launch code including the MTRRs.
>> + * The early MLE code has to restore these values. This code validates
>> + * the values after they are measured.
>> + */
>> +static void sl_txt_validate_msrs(struct txt_os_mle_data *os_mle_data)
>> +{
>> +	struct slr_txt_mtrr_state *saved_bsp_mtrrs;
>> +	u64 mtrr_caps, mtrr_def_type, mtrr_var;
>> +	struct slr_entry_intel_info *txt_info;
>> +	u64 misc_en_msr;
>> +	u32 vcnt, i;
>> +
>> +	txt_info = (struct slr_entry_intel_info *)os_mle_data->txt_info;
>> +	saved_bsp_mtrrs = &txt_info->saved_bsp_mtrrs;
>> +
>> +	mtrr_caps = sl_rdmsr(MSR_MTRRcap);
>> +	vcnt = (u32)(mtrr_caps & CAPS_VARIABLE_MTRR_COUNT_MASK);
>> +
>> +	if (saved_bsp_mtrrs->mtrr_vcnt > vcnt)
>> +		sl_txt_reset(SL_ERROR_MTRR_INV_VCNT);
>> +	if (saved_bsp_mtrrs->mtrr_vcnt > TXT_OS_MLE_MAX_VARIABLE_MTRRS)
>> +		sl_txt_reset(SL_ERROR_MTRR_INV_VCNT);
>> +
>> +	mtrr_def_type = sl_rdmsr(MSR_MTRRdefType);
>> +	if (saved_bsp_mtrrs->default_mem_type != mtrr_def_type)
>> +		sl_txt_reset(SL_ERROR_MTRR_INV_DEF_TYPE);
>> +
>> +	for (i = 0; i < saved_bsp_mtrrs->mtrr_vcnt; i++) {
>> +		mtrr_var = sl_rdmsr(MTRRphysBase_MSR(i));
>> +		if (saved_bsp_mtrrs->mtrr_pair[i].mtrr_physbase != mtrr_var)
>> +			sl_txt_reset(SL_ERROR_MTRR_INV_BASE);
>> +		mtrr_var = sl_rdmsr(MTRRphysMask_MSR(i));
>> +		if (saved_bsp_mtrrs->mtrr_pair[i].mtrr_physmask != mtrr_var)
>> +			sl_txt_reset(SL_ERROR_MTRR_INV_MASK);
>> +	}
>> +
>> +	misc_en_msr = sl_rdmsr(MSR_IA32_MISC_ENABLE);
>> +	if (txt_info->saved_misc_enable_msr != misc_en_msr)
>> +		sl_txt_reset(SL_ERROR_MSR_INV_MISC_EN);
>> +}
>> +
>> +static void sl_find_drtm_event_log(struct slr_table *slrt)
>> +{
>> +	struct txt_os_sinit_data *os_sinit_data;
>> +	struct slr_entry_log_info *log_info;
>> +	void *txt_heap;
>> +
>> +	log_info = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_LOG_INFO);
>> +	if (!log_info)
>> +		sl_txt_reset(SL_ERROR_SLRT_MISSING_ENTRY);
>> +
>> +	evtlog_base = (void *)log_info->addr;
>> +	evtlog_size = log_info->size;
>> +
>> +	txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
>> +
>> +	/*
>> +	 * For TPM 2.0, the event log 2.1 extended data structure has to also
>> +	 * be located and fixed up.
>> +	 */
>> +	os_sinit_data = txt_os_sinit_data_start(txt_heap);
>> +
>> +	/*
>> +	 * Only support version 6 and later that properly handle the
>> +	 * list of ExtDataElements in the OS-SINIT structure.
>> +	 */
>> +	if (os_sinit_data->version < 6)
>> +		sl_txt_reset(SL_ERROR_OS_SINIT_BAD_VERSION);
>> +
>> +	/* Find the TPM2.0 logging extended heap element */
>> +	log20_elem = tpm20_find_log2_1_element(os_sinit_data);
> 
> s/tpm20/tpm2/

Reasonable. We can change it.
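Independent of the naming, the size accounting in sl_tpm20_log_event() follows the TCG_PCR_EVENT2 record layout from the TCG PC Client Platform Firmware Profile: PCR index, event type and digest count, then one (algorithm ID, digest) pair per active bank, then the event size and the event data. A minimal sketch of that arithmetic, with hypothetical helper names and only the two algorithms the patch handles:

```c
#include <stddef.h>
#include <stdint.h>

#define TPM_ALG_SHA1   0x0004
#define TPM_ALG_SHA256 0x000B

/* Digest length for a TPM algorithm ID, or 0 if unsupported */
static size_t digest_size(uint16_t alg_id)
{
	switch (alg_id) {
	case TPM_ALG_SHA1:
		return 20;
	case TPM_ALG_SHA256:
		return 32;
	default:
		return 0;
	}
}

/*
 * Serialized size of one TCG_PCR_EVENT2 record:
 *   PCRIndex(4) + EventType(4) + digest count(4)
 *   + per digest: AlgorithmId(2) + digest bytes
 *   + EventSize(4) + event data
 * Returns 0 if any algorithm is unsupported.
 */
static size_t pcr_event2_size(const uint16_t *algs, size_t nr_algs,
			      size_t event_size)
{
	size_t total = 3 * sizeof(uint32_t);

	for (size_t i = 0; i < nr_algs; i++) {
		size_t d = digest_size(algs[i]);

		if (!d)
			return 0;
		total += sizeof(uint16_t) + d;
	}
	return total + sizeof(uint32_t) + event_size;
}
```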

> 
>> +
>> +	/* If found, this implies TPM20 log and family */
>> +	if (log20_elem)
>> +		tpm_log_ver = SL_TPM20_LOG;
>> +}
>> +
>> +static void sl_validate_event_log_buffer(void)
>> +{
>> +	struct txt_os_sinit_data *os_sinit_data;
>> +	void *txt_heap, *txt_end;
>> +	void *mle_base, *mle_end;
>> +	void *evtlog_end;
>> +
>> +	if ((u64)evtlog_size > (LLONG_MAX - (u64)evtlog_base))
>> +		sl_txt_reset(SL_ERROR_INTEGER_OVERFLOW);
>> +	evtlog_end = evtlog_base + evtlog_size;
>> +
>> +	txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
>> +	txt_end = txt_heap + sl_txt_read(TXT_CR_HEAP_SIZE);
>> +	os_sinit_data = txt_os_sinit_data_start(txt_heap);
>> +
>> +	mle_base = (void *)(u64)sl_mle_start;
>> +	mle_end = mle_base + os_sinit_data->mle_size;
>> +
>> +	/*
>> +	 * This check is to ensure the event log buffer does not overlap with
>> +	 * the MLE image.
>> +	 */
>> +	if (evtlog_base >= mle_end && evtlog_end > mle_end)
>> +		goto pmr_check; /* above */
>> +
>> +	if (evtlog_end <= mle_base && evtlog_base < mle_base)
>> +		goto pmr_check; /* below */
>> +
>> +	sl_txt_reset(SL_ERROR_MLE_BUFFER_OVERLAP);
>> +
>> +pmr_check:
>> +	/*
>> +	 * The TXT heap is protected by the DPR. If the TPM event log is
>> +	 * inside the TXT heap, there is no need for a PMR check.
>> +	 */
>> +	if (evtlog_base > txt_heap && evtlog_end < txt_end)
>> +		return;
>> +
>> +	sl_check_pmr_coverage(evtlog_base, evtlog_size, true);
>> +}
>> +
>> +static void sl_find_event_log_algorithms(void)
>> +{
>> +	struct tcg_efi_specid_event_head *efi_head =
>> +		(struct tcg_efi_specid_event_head *)(evtlog_base +
>> +					log20_elem->first_record_offset +
>> +					sizeof(struct tcg_pcr_event));
>> +
>> +	if (efi_head->num_algs == 0 || efi_head->num_algs > 2)
>> +		sl_txt_reset(SL_ERROR_TPM_NUMBER_ALGS);
>> +
>> +	memcpy(&tpm_algs[0], &efi_head->digest_sizes[0],
>> +	       sizeof(struct tcg_efi_specid_event_algs) * efi_head->num_algs);
>> +}
>> +
>> +static void sl_tpm12_log_event(u32 pcr, u32 event_type,
>> +			       const u8 *data, u32 length,
>> +			       const u8 *event_data, u32 event_size)
>> +{
>> +	u8 sha1_hash[SHA1_DIGEST_SIZE] = {0};
>> +	u8 log_buf[SL_TPM12_LOG_SIZE] = {0};
>> +	struct tcg_pcr_event *pcr_event;
>> +	u32 total_size;
>> +
>> +	pcr_event = (struct tcg_pcr_event *)log_buf;
>> +	pcr_event->pcr_idx = pcr;
>> +	pcr_event->event_type = event_type;
>> +	if (length > 0) {
>> +		sha1(data, length, &sha1_hash[0]);
>> +		memcpy(&pcr_event->digest[0], &sha1_hash[0], SHA1_DIGEST_SIZE);
>> +	}
>> +	pcr_event->event_size = event_size;
>> +	if (event_size > 0)
>> +		memcpy((u8 *)pcr_event + sizeof(struct tcg_pcr_event),
>> +		       event_data, event_size);
>> +
>> +	total_size = sizeof(struct tcg_pcr_event) + event_size;
>> +
>> +	if (tpm12_log_event(evtlog_base, evtlog_size, total_size, pcr_event))
>> +		sl_txt_reset(SL_ERROR_TPM_LOGGING_FAILED);
>> +}
>> +
>> +static void sl_tpm20_log_event(u32 pcr, u32 event_type,
>> +			       const u8 *data, u32 length,
>> +			       const u8 *event_data, u32 event_size)
>> +{
>> +	u8 sha256_hash[SHA256_DIGEST_SIZE] = {0};
>> +	u8 sha1_hash[SHA1_DIGEST_SIZE] = {0};
>> +	u8 log_buf[SL_TPM20_LOG_SIZE] = {0};
>> +	struct sha256_state sctx256 = {0};
>> +	struct tcg_pcr_event2_head *head;
>> +	struct tcg_event_field *event;
>> +	u32 total_size;
>> +	u16 *alg_ptr;
>> +	u8 *dgst_ptr;
>> +
>> +	head = (struct tcg_pcr_event2_head *)log_buf;
>> +	head->pcr_idx = pcr;
>> +	head->event_type = event_type;
>> +	total_size = sizeof(struct tcg_pcr_event2_head);
>> +	alg_ptr = (u16 *)(log_buf + sizeof(struct tcg_pcr_event2_head));
>> +
>> +	for ( ; head->count < 2; head->count++) {
>> +		if (!tpm_algs[head->count].alg_id)
>> +			break;
>> +
>> +		*alg_ptr = tpm_algs[head->count].alg_id;
>> +		dgst_ptr = (u8 *)alg_ptr + sizeof(u16);
>> +
>> +		if (tpm_algs[head->count].alg_id == TPM_ALG_SHA256 &&
>> +		    length) {
>> +			sha256_init(&sctx256);
>> +			sha256_update(&sctx256, data, length);
>> +			sha256_final(&sctx256, &sha256_hash[0]);
>> +		} else if (tpm_algs[head->count].alg_id == TPM_ALG_SHA1 &&
>> +			   length) {
>> +			sha1(data, length, &sha1_hash[0]);
>> +		}
>> +
>> +		if (tpm_algs[head->count].alg_id == TPM_ALG_SHA256) {
>> +			memcpy(dgst_ptr, &sha256_hash[0], SHA256_DIGEST_SIZE);
>> +			total_size += SHA256_DIGEST_SIZE + sizeof(u16);
>> +			alg_ptr = (u16 *)((u8 *)alg_ptr + SHA256_DIGEST_SIZE + sizeof(u16));
>> +		} else if (tpm_algs[head->count].alg_id == TPM_ALG_SHA1) {
>> +			memcpy(dgst_ptr, &sha1_hash[0], SHA1_DIGEST_SIZE);
>> +			total_size += SHA1_DIGEST_SIZE + sizeof(u16);
>> +			alg_ptr = (u16 *)((u8 *)alg_ptr + SHA1_DIGEST_SIZE + sizeof(u16));
>> +		} else {
>> +			sl_txt_reset(SL_ERROR_TPM_UNKNOWN_DIGEST);
>> +		}
>> +	}
>> +
>> +	event = (struct tcg_event_field *)(log_buf + total_size);
>> +	event->event_size = event_size;
>> +	if (event_size > 0)
>> +		memcpy((u8 *)event + sizeof(struct tcg_event_field), event_data, event_size);
>> +	total_size += sizeof(struct tcg_event_field) + event_size;
>> +
>> +	if (tpm20_log_event(log20_elem, evtlog_base, evtlog_size, total_size, &log_buf[0]))
>> +		sl_txt_reset(SL_ERROR_TPM_LOGGING_FAILED);
>> +}
>> +
>> +static void sl_tpm_extend_evtlog(u32 pcr, u32 type,
>> +				 const u8 *data, u32 length, const char *desc)
>> +{
>> +	if (tpm_log_ver == SL_TPM20_LOG)
>> +		sl_tpm20_log_event(pcr, type, data, length,
>> +				   (const u8 *)desc, strlen(desc));
>> +	else
>> +		sl_tpm12_log_event(pcr, type, data, length,
>> +				   (const u8 *)desc, strlen(desc));
>> +}
>> +
>> +static struct setup_data *sl_handle_setup_data(struct setup_data *curr,
>> +					       struct slr_policy_entry *entry)
>> +{
>> +	struct setup_indirect *ind;
>> +	struct setup_data *next;
>> +
>> +	if (!curr)
>> +		return NULL;
>> +
>> +	next = (struct setup_data *)(unsigned long)curr->next;
>> +
>> +	/* SETUP_INDIRECT instances have to be handled differently */
>> +	if (curr->type == SETUP_INDIRECT) {
>> +		ind = (struct setup_indirect *)((u8 *)curr + offsetof(struct setup_data, data));
>> +
>> +		sl_check_pmr_coverage((void *)ind->addr, ind->len, true);
>> +
>> +		sl_tpm_extend_evtlog(entry->pcr, TXT_EVTYPE_SLAUNCH,
>> +				     (void *)ind->addr, ind->len,
>> +				     entry->evt_info);
>> +
>> +		return next;
>> +	}
>> +
>> +	sl_check_pmr_coverage(((u8 *)curr) + sizeof(struct setup_data),
>> +			      curr->len, true);
>> +
>> +	sl_tpm_extend_evtlog(entry->pcr, TXT_EVTYPE_SLAUNCH,
>> +			     ((u8 *)curr) + sizeof(struct setup_data),
>> +			     curr->len,
>> +			     entry->evt_info);
>> +
>> +	return next;
>> +}
>> +
>> +static void sl_extend_setup_data(struct slr_policy_entry *entry)
>> +{
>> +	struct setup_data *data;
>> +
>> +	/*
>> +	 * Measuring the boot params measured the fixed e820 memory map.
>> +	 * Measure any setup_data entries including e820 extended entries.
>> +	 */
>> +	data = (struct setup_data *)(unsigned long)entry->entity;
>> +	while (data)
>> +		data = sl_handle_setup_data(data, entry);
>> +}
>> +
>> +static void sl_extend_slrt(struct slr_policy_entry *entry)
>> +{
>> +	struct slr_table *slrt = (struct slr_table *)entry->entity;
>> +	struct slr_entry_intel_info *intel_info;
>> +
>> +	/*
>> +	 * In revision one of the SLRT, the only table that needs to be
>> +	 * measured is the Intel info table. Everything else is meta-data,
>> +	 * addresses and sizes. Note the size of what to measure is not set.
>> +	 * The flag SLR_POLICY_IMPLICIT_SIZE leaves it to the measuring code
>> +	 * to sort out.
>> +	 */
>> +	if (slrt->revision == 1) {
>> +		intel_info = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_INTEL_INFO);
>> +		if (!intel_info)
>> +			sl_txt_reset(SL_ERROR_SLRT_MISSING_ENTRY);
>> +
>> +		sl_tpm_extend_evtlog(entry->pcr, TXT_EVTYPE_SLAUNCH,
>> +				     (void *)entry->entity, sizeof(struct slr_entry_intel_info),
>> +				     entry->evt_info);
>> +	}
>> +}
>> +
>> +static void sl_extend_txt_os2mle(struct slr_policy_entry *entry)
>> +{
>> +	struct txt_os_mle_data *os_mle_data;
>> +	void *txt_heap;
>> +
>> +	txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
>> +	os_mle_data = txt_os_mle_data_start(txt_heap);
>> +
>> +	/*
>> +	 * Version 1 of the OS-MLE heap structure has no fields to measure. It just
>> +	 * has addresses and sizes and a scratch buffer.
>> +	 */
>> +	if (os_mle_data->version == 1)
>> +		return;
>> +}
>> +
>> +static void sl_process_extend_policy(struct slr_table *slrt)
>> +{
>> +	struct slr_entry_policy *policy;
>> +	u16 i;
>> +
>> +	policy = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_ENTRY_POLICY);
>> +	if (!policy)
>> +		sl_txt_reset(SL_ERROR_SLRT_MISSING_ENTRY);
>> +
>> +	for (i = 0; i < policy->nr_entries; i++) {
>> +		switch (policy->policy_entries[i].entity_type) {
>> +		case SLR_ET_SETUP_DATA:
>> +			sl_extend_setup_data(&policy->policy_entries[i]);
>> +			break;
>> +		case SLR_ET_SLRT:
>> +			sl_extend_slrt(&policy->policy_entries[i]);
>> +			break;
>> +		case SLR_ET_TXT_OS2MLE:
>> +			sl_extend_txt_os2mle(&policy->policy_entries[i]);
>> +			break;
>> +		case SLR_ET_UNUSED:
>> +			continue;
>> +		default:
>> +			sl_tpm_extend_evtlog(policy->policy_entries[i].pcr, TXT_EVTYPE_SLAUNCH,
>> +					     (void *)policy->policy_entries[i].entity,
>> +					     policy->policy_entries[i].size,
>> +					     policy->policy_entries[i].evt_info);
>> +		}
>> +	}
>> +}
>> +
>> +static void sl_process_extend_uefi_config(struct slr_table *slrt)
>> +{
>> +	struct slr_entry_uefi_config *uefi_config;
>> +	u16 i;
>> +
>> +	uefi_config = slr_next_entry_by_tag(slrt, NULL, SLR_ENTRY_UEFI_CONFIG);
>> +
>> +	/* Optional, depending on how the SL kernel was booted */
>> +	if (!uefi_config)
>> +		return;
>> +
>> +	for (i = 0; i < uefi_config->nr_entries; i++) {
>> +		sl_tpm_extend_evtlog(uefi_config->uefi_cfg_entries[i].pcr, TXT_EVTYPE_SLAUNCH,
>> +				     (void *)uefi_config->uefi_cfg_entries[i].cfg,
>> +				     uefi_config->uefi_cfg_entries[i].size,
>> +				     uefi_config->uefi_cfg_entries[i].evt_info);
>> +	}
>> +}
>> +
>> +asmlinkage __visible void sl_check_region(void *base, u32 size)
>> +{
>> +	sl_check_pmr_coverage(base, size, false);
>> +}
>> +
>> +asmlinkage __visible void sl_main(void *bootparams)
>> +{
>> +	struct boot_params *bp  = (struct boot_params *)bootparams;
>> +	struct txt_os_mle_data *os_mle_data;
>> +	struct slr_table *slrt;
>> +	void *txt_heap;
>> +
>> +	/*
>> +	 * Ensure loadflags do not indicate a secure launch was done
>> +	 * unless it really was.
>> +	 */
>> +	bp->hdr.loadflags &= ~SLAUNCH_FLAG;
>> +
>> +	/*
>> +	 * Currently only Intel TXT is supported for Secure Launch. Testing
>> +	 * this value also indicates that the kernel was booted successfully
>> +	 * through the Secure Launch entry point and is in SMX mode.
>> +	 */
>> +	if (!(sl_cpu_type & SL_CPU_INTEL))
>> +		return;
>> +
>> +	slrt = sl_locate_and_validate_slrt();
>> +
>> +	/* Locate the TPM event log. */
>> +	sl_find_drtm_event_log(slrt);
>> +
>> +	/* Validate the location of the event log buffer before using it */
>> +	sl_validate_event_log_buffer();
>> +
>> +	/*
>> +	 * Find the TPM hash algorithms used by the ACM and recorded in the
>> +	 * event log.
>> +	 */
>> +	if (tpm_log_ver == SL_TPM20_LOG)
>> +		sl_find_event_log_algorithms();
>> +
>> +	/*
>> +	 * Sanitize them before measuring. Set the SLAUNCH_FLAG early since if
>> +	 * anything fails, the system will reset anyway.
>> +	 */
>> +	sanitize_boot_params(bp);
>> +	bp->hdr.loadflags |= SLAUNCH_FLAG;
>> +
>> +	sl_check_pmr_coverage(bootparams, PAGE_SIZE, false);
>> +
>> +	/* Place event log SL specific tags before and after measurements */
>> +	sl_tpm_extend_evtlog(17, TXT_EVTYPE_SLAUNCH_START, NULL, 0, "");
>> +
>> +	/* Process all policy entries and extend the measurements to the evtlog */
> 
> These comments obfuscate the code here but would make a lot more sense
> at the beginning of each corresponding function.
> 
> /*
>   * Process all policy entries and extend the measurements to the evtlog
>   */
> static void sl_process_extend_policy(struct slr_table *slrt)
> {
> 	/* ... */
> }

Sure that sounds like a good idea.

> 
> BTW, what good does "process" do here? Why not just sl_extend_policy()?

Because the entities in the SLR table have to be processed then 
extended. They are not just fed into the extend routine as they are when 
fetched from the SLR table.

> 
> 
>> +	sl_process_extend_policy(slrt);
>> +
>> +	/* Process all EFI config entries and extend the measurements to the evtlog */
>> +	sl_process_extend_uefi_config(slrt);
> 
> Ditto.
> 
>> +
>> +	sl_tpm_extend_evtlog(17, TXT_EVTYPE_SLAUNCH_END, NULL, 0, "");
>> +
>> +	/* No PMR check is needed, the TXT heap is covered by the DPR */
>> +	txt_heap = (void *)sl_txt_read(TXT_CR_HEAP_BASE);
>> +	os_mle_data = txt_os_mle_data_start(txt_heap);
>> +
>> +	/*
>> +	 * Now that the OS-MLE data is measured, ensure the MTRR and
>> +	 * misc enable MSRs are what we expect.
>> +	 */
>> +	sl_txt_validate_msrs(os_mle_data);
>> +}
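The placement checks in sl_check_pmr_coverage() and sl_validate_event_log_buffer() reduce to a few range predicates. The sketch below restates them over plain integers with hypothetical names. Note that the patch is slightly more conservative than straddles_4gb() here: it also rejects a region that ends exactly at 4 GiB (`end >= 4G`).

```c
#include <stdbool.h>
#include <stdint.h>

#define FOUR_GB 0x100000000ULL

/* Computes end = base + size, failing on unsigned wrap-around */
static bool range_end(uint64_t base, uint64_t size, uint64_t *end)
{
	if (size > UINT64_MAX - base)
		return false;
	*end = base + size;
	return true;
}

/* Half-open ranges [a_base, a_end) and [b_base, b_end) overlap */
static bool ranges_overlap(uint64_t a_base, uint64_t a_end,
			   uint64_t b_base, uint64_t b_end)
{
	return a_base < b_end && b_base < a_end;
}

/* True if [base, base + size) has bytes both below and above 4 GiB */
static bool straddles_4gb(uint64_t base, uint64_t size)
{
	return base < FOUR_GB && base + size > FOUR_GB;
}
```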
>> diff --git a/arch/x86/boot/compressed/sl_stub.S b/arch/x86/boot/compressed/sl_stub.S
>> new file mode 100644
>> index 000000000000..24b8f23d5dcc
>> --- /dev/null
>> +++ b/arch/x86/boot/compressed/sl_stub.S
>> @@ -0,0 +1,725 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +
>> +/*
>> + * Secure Launch protected mode entry point.
>> + *
>> + * Copyright (c) 2024, Oracle and/or its affiliates.
>> + */
>> +	.code32
>> +	.text
>> +#include <linux/linkage.h>
>> +#include <asm/segment.h>
>> +#include <asm/msr.h>
>> +#include <asm/apicdef.h>
>> +#include <asm/trapnr.h>
>> +#include <asm/processor-flags.h>
>> +#include <asm/asm-offsets.h>
>> +#include <asm/bootparam.h>
>> +#include <asm/page_types.h>
>> +#include <asm/irq_vectors.h>
>> +#include <linux/slr_table.h>
>> +#include <linux/slaunch.h>
>> +
>> +/* CPUID: leaf 1, ECX, SMX feature bit */
>> +#define X86_FEATURE_BIT_SMX	(1 << 6)
>> +
>> +#define IDT_VECTOR_LO_BITS	0
>> +#define IDT_VECTOR_HI_BITS	6
>> +
>> +/*
>> + * See the comment in head_64.S for detailed information on what this macro
>> + * and others like it are used for. The comment appears right at the top of
>> + * the file.
>> + */
>> +#define rva(X) ((X) - sl_stub_entry)
>> +
>> +/*
>> + * The GETSEC op code is open coded because older versions of
>> + * GCC do not support the getsec mnemonic.
>> + */
>> +.macro GETSEC leaf
>> +	pushl	%ebx
>> +	xorl	%ebx, %ebx	/* Must be zero for SMCTRL */
>> +	movl	\leaf, %eax	/* Leaf function */
>> +	.byte 	0x0f, 0x37	/* GETSEC opcode */
>> +	popl	%ebx
>> +.endm
>> +
>> +.macro TXT_RESET error
>> +	/*
>> +	 * Set a sticky error value and reset. Note the movs to %eax act as
>> +	 * TXT register barriers.
>> +	 */
>> +	movl	\error, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_ERRORCODE)
>> +	movl	(TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_E2STS), %eax
>> +	movl	$1, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_CMD_NO_SECRETS)
>> +	movl	(TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_E2STS), %eax
>> +	movl	$1, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_CMD_UNLOCK_MEM_CONFIG)
>> +	movl	(TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_E2STS), %eax
>> +	movl	$1, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_CMD_RESET)
>> +1:
>> +	hlt
>> +	jmp	1b
>> +.endm
>> +
>> +	.code32
>> +SYM_FUNC_START(sl_stub_entry)
>> +	cli
>> +	cld
>> +
>> +	/*
>> +	 * On entry, %ebx has the entry abs offset to sl_stub_entry. This
>> +	 * will be correctly scaled using the rva macro and avoid causing
>> +	 * relocations. Only %cs and %ds segments are known good.
>> +	 */
>> +
>> +	/* Load GDT, set segment regs and lret to __SL32_CS */
>> +	leal	rva(sl_gdt_desc)(%ebx), %eax
>> +	addl	%eax, 2(%eax)
>> +	lgdt	(%eax)
>> +
>> +	movl	$(__SL32_DS), %eax
>> +	movw	%ax, %ds
>> +	movw	%ax, %es
>> +	movw	%ax, %fs
>> +	movw	%ax, %gs
>> +	movw	%ax, %ss
>> +
>> +	/*
>> +	 * Now that %ss is known good, take the first stack for the BSP. The
>> +	 * AP stacks are only used on Intel.
>> +	 */
>> +	leal	rva(sl_stacks_end)(%ebx), %esp
>> +
>> +	leal	rva(.Lsl_cs)(%ebx), %eax
>> +	pushl	$(__SL32_CS)
>> +	pushl	%eax
>> +	lret
>> +
>> +.Lsl_cs:
>> +	/* Save our base pointer reg and page table for MLE */
>> +	pushl	%ebx
>> +	pushl	%ecx
>> +
>> +	/* See if SMX feature is supported. */
>> +	movl	$1, %eax
>> +	cpuid
>> +	testl	$(X86_FEATURE_BIT_SMX), %ecx
>> +	jz	.Ldo_unknown_cpu
>> +
>> +	popl	%ecx
>> +	popl	%ebx
>> +
>> +	/* Known to be Intel */
>> +	movl	$(SL_CPU_INTEL), rva(sl_cpu_type)(%ebx)
>> +
>> +	/* Locate the base of the MLE using the page tables in %ecx */
>> +	call	sl_find_mle_base
>> +
>> +	/* Increment CPU count for BSP */
>> +	incl	rva(sl_txt_cpu_count)(%ebx)
>> +
>> +	/*
>> +	 * Enable SMI with GETSEC[SMCTRL] which were disabled by SENTER.
>> +	 * NMIs were also disabled by SENTER. Since there is no IDT for the BSP,
>> +	 * allow the mainline kernel to re-enable them in the normal course of
>> +	 * booting.
>> +	 */
>> +	GETSEC	$(SMX_X86_GETSEC_SMCTRL)
>> +
>> +	/* Clear the TXT error registers for a clean start of day */
>> +	movl	$0, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_ERRORCODE)
>> +	movl	$0xffffffff, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_ESTS)
>> +
>> +	/* On Intel, the zero page address is passed in the TXT heap */
>> +	/* Read physical base of heap into EAX */
>> +	movl	(TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_HEAP_BASE), %eax
>> +	/* Read the size of the BIOS data into ECX (first 8 bytes) */
>> +	movl	(%eax), %ecx
>> +	/* Skip over BIOS data and size of OS to MLE data section */
>> +	leal	8(%eax, %ecx), %eax
>> +
>> +	/* Need to verify the values in the OS-MLE struct passed in */
>> +	call	sl_txt_verify_os_mle_struct
>> +
>> +	/*
>> +	 * Get the boot params address from the heap. Note %esi and %ebx MUST
>> +	 * be preserved across calls and operations.
>> +	 */
>> +	movl	SL_boot_params_addr(%eax), %esi
>> +
>> +	/* Save %ebx so the APs can find their way home */
>> +	movl	%ebx, (SL_mle_scratch + SL_SCRATCH_AP_EBX)(%eax)
>> +
>> +	/* Fetch the AP wake code block address from the heap */
>> +	movl	SL_ap_wake_block(%eax), %edi
>> +	movl	%edi, rva(sl_txt_ap_wake_block)(%ebx)
>> +
>> +	/* Store the offset in the AP wake block to the jmp address */
>> +	movl	$(sl_ap_jmp_offset - sl_txt_ap_wake_begin), \
>> +		(SL_mle_scratch + SL_SCRATCH_AP_JMP_OFFSET)(%eax)
>> +
>> +	/* Store the offset in the AP wake block to the AP stacks block */
>> +	movl	$(sl_stacks - sl_txt_ap_wake_begin), \
>> +		(SL_mle_scratch + SL_SCRATCH_AP_STACKS_OFFSET)(%eax)
>> +
>> +	/* %eax still is the base of the OS-MLE block, save it */
>> +	pushl	%eax
>> +
>> +	/* Relocate the AP wake code to the safe block */
>> +	call	sl_txt_reloc_ap_wake
>> +
>> +	/*
>> +	 * Wake up all APs that are blocked in the ACM and wait for them to
>> +	 * halt. This should be done before restoring the MTRRs so the ACM is
>> +	 * still properly in WB memory.
>> +	 */
>> +	call	sl_txt_wake_aps
>> +
>> +	/* Restore OS-MLE in %eax */
>> +	popl	%eax
>> +
>> +	/*
>> +	 * %edi is used by this routine to find the MTRRs which are in the SLRT
>> +	 * in the Intel info.
>> +	 */
>> +	movl	SL_txt_info(%eax), %edi
>> +	call	sl_txt_load_regs
>> +
>> +	jmp	.Lcpu_setup_done
>> +
>> +.Ldo_unknown_cpu:
>> +	/* Non-Intel CPUs are not yet supported */
>> +	ud2
>> +
>> +.Lcpu_setup_done:
>> +	/*
>> +	 * Don't enable MCE at this point. The kernel will enable
>> +	 * it on the BSP later when it is ready.
>> +	 */
>> +
>> +	/* Done, jump to normal 32b pm entry */
>> +	jmp	startup_32
>> +SYM_FUNC_END(sl_stub_entry)
>> +
>> +SYM_FUNC_START(sl_find_mle_base)
>> +	/* %ecx has PDPT, get first PD */
>> +	movl	(%ecx), %eax
>> +	andl	$(PAGE_MASK), %eax
>> +	/* Get first PT from first PDE */
>> +	movl	(%eax), %eax
>> +	andl	$(PAGE_MASK), %eax
>> +	/* Get MLE base from first PTE */
>> +	movl	(%eax), %eax
>> +	andl	$(PAGE_MASK), %eax
>> +
>> +	movl	%eax, rva(sl_mle_start)(%ebx)
>> +	ret
>> +SYM_FUNC_END(sl_find_mle_base)
>> +
>> +SYM_FUNC_START(sl_check_buffer_mle_overlap)
>> +	/* %ecx: buffer begin %edx: buffer end */
>> +	/* %ebx: MLE begin %edi: MLE end */
>> +	/* %eax: region may be inside MLE */
>> +
>> +	cmpl	%edi, %ecx
>> +	jb	.Lnext_check
>> +	cmpl	%edi, %edx
>> +	jbe	.Lnext_check
>> +	jmp	.Lvalid /* Buffer above MLE */
>> +
>> +.Lnext_check:
>> +	cmpl	%ebx, %edx
>> +	ja	.Linside_check
>> +	cmpl	%ebx, %ecx
>> +	jae	.Linside_check
>> +	jmp	.Lvalid /* Buffer below MLE */
>> +
>> +.Linside_check:
>> +	cmpl	$0, %eax
>> +	jz	.Linvalid
>> +	cmpl	%ebx, %ecx
>> +	jb	.Linvalid
>> +	cmpl	%edi, %edx
>> +	ja	.Linvalid
>> +	jmp	.Lvalid /* Buffer in MLE */
>> +
>> +.Linvalid:
>> +	TXT_RESET $(SL_ERROR_MLE_BUFFER_OVERLAP)
>> +
>> +.Lvalid:
>> +	ret
>> +SYM_FUNC_END(sl_check_buffer_mle_overlap)
>> +
>> +SYM_FUNC_START(sl_txt_verify_os_mle_struct)
>> +	pushl	%ebx
>> +	/*
>> +	 * %eax points to the base of the OS-MLE struct. Need to also
>> +	 * read some values from the OS-SINIT struct too.
>> +	 */
>> +	movl	-8(%eax), %ecx
>> +	/* Skip over OS to MLE data section and size of OS-SINIT structure */
>> +	leal	(%eax, %ecx), %edx
>> +
>> +	/* Load MLE image base absolute offset */
>> +	movl	rva(sl_mle_start)(%ebx), %ebx
>> +
>> +	/* Verify the value of the low PMR base. It should always be 0. */
>> +	movl	SL_vtd_pmr_lo_base(%edx), %esi
>> +	cmpl	$0, %esi
>> +	jz	.Lvalid_pmr_base
>> +	TXT_RESET $(SL_ERROR_LO_PMR_BASE)
>> +
>> +.Lvalid_pmr_base:
>> +	/* Grab some values from OS-SINIT structure */
>> +	movl	SL_mle_size(%edx), %edi
>> +	addl	%ebx, %edi
>> +	jc	.Loverflow_detected
>> +	movl	SL_vtd_pmr_lo_size(%edx), %esi
>> +
>> +	/* Check the AP wake block */
>> +	movl	SL_ap_wake_block(%eax), %ecx
>> +	movl	SL_ap_wake_block_size(%eax), %edx
>> +	addl	%ecx, %edx
>> +	jc	.Loverflow_detected
>> +	pushl	%eax
>> +	xorl	%eax, %eax
>> +	call	sl_check_buffer_mle_overlap
>> +	popl	%eax
>> +	cmpl	%esi, %edx
>> +	ja	.Lbuffer_beyond_pmr
>> +
>> +	/*
>> +	 * Check the boot params. Note during a UEFI boot, the boot
>> +	 * params will be inside the MLE image. Test for this case
>> +	 * in the overlap case.
>> +	 */
>> +	movl	SL_boot_params_addr(%eax), %ecx
>> +	movl	$(PAGE_SIZE), %edx
>> +	addl	%ecx, %edx
>> +	jc	.Loverflow_detected
>> +	pushl	%eax
>> +	movl	$1, %eax
>> +	call	sl_check_buffer_mle_overlap
>> +	popl	%eax
>> +	cmpl	%esi, %edx
>> +	ja	.Lbuffer_beyond_pmr
>> +
>> +	/* Check that the AP wake block is big enough */
>> +	cmpl	$(sl_txt_ap_wake_end - sl_txt_ap_wake_begin), \
>> +		SL_ap_wake_block_size(%eax)
>> +	jae	.Lwake_block_ok
>> +	TXT_RESET $(SL_ERROR_WAKE_BLOCK_TOO_SMALL)
>> +
>> +.Lwake_block_ok:
>> +	popl	%ebx
>> +	ret
>> +
>> +.Loverflow_detected:
>> +	TXT_RESET $(SL_ERROR_INTEGER_OVERFLOW)
>> +
>> +.Lbuffer_beyond_pmr:
>> +	TXT_RESET $(SL_ERROR_BUFFER_BEYOND_PMR)
>> +SYM_FUNC_END(sl_txt_verify_os_mle_struct)
>> +
>> +SYM_FUNC_START(sl_txt_ap_entry)
>> +	cli
>> +	cld
>> +	/*
>> +	 * The %cs and %ds segments are known good after waking the AP.
>> +	 * First order of business is to find where we are and
>> +	 * save it in %ebx.
>> +	 */
>> +
>> +	/* Read physical base of heap into EAX */
>> +	movl	(TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_HEAP_BASE), %eax
>> +	/* Read the size of the BIOS data into ECX (first 8 bytes) */
>> +	movl	(%eax), %ecx
>> +	/* Skip over BIOS data and size of OS to MLE data section */
>> +	leal	8(%eax, %ecx), %eax
>> +
>> +	/* Saved %ebx from the BSP and stash OS-MLE pointer */
>> +	movl	(SL_mle_scratch + SL_SCRATCH_AP_EBX)(%eax), %ebx
>> +
>> +	/* Save TXT info ptr in %edi for call to sl_txt_load_regs */
>> +	movl	SL_txt_info(%eax), %edi
>> +
>> +	/* Lock and get our stack index */
>> +	movl	$1, %ecx
>> +.Lspin:
>> +	xorl	%eax, %eax
>> +	lock cmpxchgl	%ecx, rva(sl_txt_spin_lock)(%ebx)
>> +	pause
>> +	jnz	.Lspin
>> +
>> +	/* Increment the stack index and use the next value inside lock */
>> +	incl	rva(sl_txt_stack_index)(%ebx)
>> +	movl	rva(sl_txt_stack_index)(%ebx), %eax
>> +
>> +	/* Unlock */
>> +	movl	$0, rva(sl_txt_spin_lock)(%ebx)
>> +
>> +	/* Location of the relocated AP wake block */
>> +	movl	rva(sl_txt_ap_wake_block)(%ebx), %ecx
>> +
>> +	/* Load reloc GDT, set segment regs and lret to __SL32_CS */
>> +	lgdt	(sl_ap_gdt_desc - sl_txt_ap_wake_begin)(%ecx)
>> +
>> +	movl	$(__SL32_DS), %edx
>> +	movw	%dx, %ds
>> +	movw	%dx, %es
>> +	movw	%dx, %fs
>> +	movw	%dx, %gs
>> +	movw	%dx, %ss
>> +
>> +	/* Load our reloc AP stack */
>> +	movl	$(TXT_BOOT_STACK_SIZE), %edx
>> +	mull	%edx
>> +	leal	(sl_stacks_end - sl_txt_ap_wake_begin)(%ecx), %esp
>> +	subl	%eax, %esp
>> +
>> +	/* Switch to AP code segment */
>> +	leal	rva(.Lsl_ap_cs)(%ebx), %eax
>> +	pushl	$(__SL32_CS)
>> +	pushl	%eax
>> +	lret
>> +
>> +.Lsl_ap_cs:
>> +	/* Load the relocated AP IDT */
>> +	lidt	(sl_ap_idt_desc - sl_txt_ap_wake_begin)(%ecx)
>> +
>> +	/* Fixup MTRRs and misc enable MSR on APs too */
>> +	call	sl_txt_load_regs
>> +
>> +	/* Enable SMI with GETSEC[SMCTRL] */
>> +	GETSEC $(SMX_X86_GETSEC_SMCTRL)
>> +
>> +	/* IRET-to-self can be used to enable NMIs which SENTER disabled */
>> +	leal	rva(.Lnmi_enabled_ap)(%ebx), %eax
>> +	pushfl
>> +	pushl	$(__SL32_CS)
>> +	pushl	%eax
>> +	iret
>> +
>> +.Lnmi_enabled_ap:
>> +	/* Put APs in X2APIC mode like the BSP */
>> +	movl	$(MSR_IA32_APICBASE), %ecx
>> +	rdmsr
>> +	orl	$(XAPIC_ENABLE | X2APIC_ENABLE), %eax
>> +	wrmsr
>> +
>> +	/*
>> +	 * Basically done, increment the CPU count and jump off to the AP
>> +	 * wake block to wait.
>> +	 */
>> +	lock incl	rva(sl_txt_cpu_count)(%ebx)
>> +
>> +	movl	rva(sl_txt_ap_wake_block)(%ebx), %eax
>> +	jmp	*%eax
>> +SYM_FUNC_END(sl_txt_ap_entry)
>> +
>> +SYM_FUNC_START(sl_txt_reloc_ap_wake)
>> +	/* Save boot params register */
>> +	pushl	%esi
>> +
>> +	movl	rva(sl_txt_ap_wake_block)(%ebx), %edi
>> +
>> +	/* Fixup AP IDT and GDT descriptor before relocating */
>> +	leal	rva(sl_ap_idt_desc)(%ebx), %eax
>> +	addl	%edi, 2(%eax)
>> +	leal	rva(sl_ap_gdt_desc)(%ebx), %eax
>> +	addl	%edi, 2(%eax)
>> +
>> +	/*
>> +	 * Copy the AP wake code and AP GDT/IDT to the protected wake block
>> +	 * provided by the loader. Destination already in %edi.
>> +	 */
>> +	movl	$(sl_txt_ap_wake_end - sl_txt_ap_wake_begin), %ecx
>> +	leal	rva(sl_txt_ap_wake_begin)(%ebx), %esi
>> +	rep movsb
>> +
>> +	/* Setup the IDT for the APs to use in the relocation block */
>> +	movl	rva(sl_txt_ap_wake_block)(%ebx), %ecx
>> +	addl	$(sl_ap_idt - sl_txt_ap_wake_begin), %ecx
>> +	xorl	%edx, %edx
>> +
>> +	/* Form the default reset vector relocation address */
>> +	movl	rva(sl_txt_ap_wake_block)(%ebx), %esi
>> +	addl	$(sl_txt_int_reset - sl_txt_ap_wake_begin), %esi
>> +
>> +1:
>> +	cmpw	$(NR_VECTORS), %dx
>> +	jz	.Lap_idt_done
>> +
>> +	cmpw	$(X86_TRAP_NMI), %dx
>> +	jz	2f
>> +
>> +	/* Load all other fixed vectors with reset handler */
>> +	movl	%esi, %eax
>> +	movw	%ax, (IDT_VECTOR_LO_BITS)(%ecx)
>> +	shrl	$16, %eax
>> +	movw	%ax, (IDT_VECTOR_HI_BITS)(%ecx)
>> +	jmp	3f
>> +
>> +2:
>> +	/* Load single wake NMI IPI vector at the relocation address */
>> +	movl	rva(sl_txt_ap_wake_block)(%ebx), %eax
>> +	addl	$(sl_txt_int_nmi - sl_txt_ap_wake_begin), %eax
>> +	movw	%ax, (IDT_VECTOR_LO_BITS)(%ecx)
>> +	shrl	$16, %eax
>> +	movw	%ax, (IDT_VECTOR_HI_BITS)(%ecx)
>> +
>> +3:
>> +	incw	%dx
>> +	addl	$8, %ecx
>> +	jmp	1b
>> +
>> +.Lap_idt_done:
>> +	popl	%esi
>> +	ret
>> +SYM_FUNC_END(sl_txt_reloc_ap_wake)
>> +
>> +SYM_FUNC_START(sl_txt_load_regs)
>> +	/* Save base pointer register */
>> +	pushl	%ebx
>> +
>> +	/*
>> +	 * On Intel, the original variable MTRRs and Misc Enable MSR are
>> +	 * restored on the BSP at early boot. Each AP will also restore
>> +	 * its MTRRs and Misc Enable MSR.
>> +	 */
>> +	pushl	%edi
>> +	addl	$(SL_saved_bsp_mtrrs), %edi
>> +	movl	(%edi), %ebx
>> +	pushl	%ebx /* default_mem_type lo */
>> +	addl	$4, %edi
>> +	movl	(%edi), %ebx
>> +	pushl	%ebx /* default_mem_type hi */
>> +	addl	$4, %edi
>> +	movl	(%edi), %ebx /* mtrr_vcnt lo, don't care about hi part */
>> +	addl	$8, %edi /* now at MTRR pair array */
>> +	/* Write the variable MTRRs */
>> +	movl	$(MSR_MTRRphysBase0), %ecx
>> +1:
>> +	cmpl	$0, %ebx
>> +	jz	2f
>> +
>> +	movl	(%edi), %eax /* MTRRphysBaseX lo */
>> +	addl	$4, %edi
>> +	movl	(%edi), %edx /* MTRRphysBaseX hi */
>> +	wrmsr
>> +	addl	$4, %edi
>> +	incl	%ecx
>> +	movl	(%edi), %eax /* MTRRphysMaskX lo */
>> +	addl	$4, %edi
>> +	movl	(%edi), %edx /* MTRRphysMaskX hi */
>> +	wrmsr
>> +	addl	$4, %edi
>> +	incl	%ecx
>> +
>> +	decl	%ebx
>> +	jmp	1b
>> +2:
>> +	/* Write the default MTRR register */
>> +	popl	%edx
>> +	popl	%eax
>> +	movl	$(MSR_MTRRdefType), %ecx
>> +	wrmsr
>> +
>> +	/* Return to beginning and write the misc enable msr */
>> +	popl	%edi
>> +	addl	$(SL_saved_misc_enable_msr), %edi
>> +	movl	(%edi), %eax /* saved_misc_enable_msr lo */
>> +	addl	$4, %edi
>> +	movl	(%edi), %edx /* saved_misc_enable_msr hi */
>> +	movl	$(MSR_IA32_MISC_ENABLE), %ecx
>> +	wrmsr
>> +
>> +	popl	%ebx
>> +	ret
>> +SYM_FUNC_END(sl_txt_load_regs)
>> +
>> +SYM_FUNC_START(sl_txt_wake_aps)
>> +	/* Save boot params register */
>> +	pushl	%esi
>> +
>> +	/* First setup the MLE join structure and load it into TXT reg */
>> +	leal	rva(sl_gdt)(%ebx), %eax
>> +	leal	rva(sl_txt_ap_entry)(%ebx), %ecx
>> +	leal	rva(sl_smx_rlp_mle_join)(%ebx), %edx
>> +	movl	%eax, SL_rlp_gdt_base(%edx)
>> +	movl	%ecx, SL_rlp_entry_point(%edx)
>> +	movl	%edx, (TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_MLE_JOIN)
>> +
>> +	/* Another TXT heap walk to find various values needed to wake APs */
>> +	movl	(TXT_PRIV_CONFIG_REGS_BASE + TXT_CR_HEAP_BASE), %eax
>> +	/* At BIOS data size, find the number of logical processors */
>> +	movl	(SL_num_logical_procs + 8)(%eax), %edx
>> +	/* Skip over BIOS data */
>> +	movl	(%eax), %ecx
>> +	addl	%ecx, %eax
>> +	/* Skip over OS to MLE */
>> +	movl	(%eax), %ecx
>> +	addl	%ecx, %eax
>> +	/* At OS-SINIT size, get capabilities to know how to wake up the APs */
>> +	movl	(SL_capabilities + 8)(%eax), %esi
>> +	/* Skip over OS to SINIT */
>> +	movl	(%eax), %ecx
>> +	addl	%ecx, %eax
>> +	/* At SINIT-MLE size, get the AP wake MONITOR address */
>> +	movl	(SL_rlp_wakeup_addr + 8)(%eax), %edi
>> +
>> +	/* Determine how to wake up the APs */
>> +	testl	$(1 << TXT_SINIT_MLE_CAP_WAKE_MONITOR), %esi
>> +	jz	.Lwake_getsec
>> +
>> +	/* Wake using MWAIT MONITOR */
>> +	movl	$1, (%edi)
>> +	jmp	.Laps_awake
>> +
>> +.Lwake_getsec:
>> +	/* Wake using GETSEC(WAKEUP) */
>> +	GETSEC	$(SMX_X86_GETSEC_WAKEUP)
>> +
>> +.Laps_awake:
>> +	/*
>> +	 * All of the APs are woken up and rendezvous in the relocated wake
>> +	 * block starting at sl_txt_ap_wake_begin. Wait for all of them to
>> +	 * halt.
>> +	 */
>> +	pause
>> +	cmpl	rva(sl_txt_cpu_count)(%ebx), %edx
>> +	jne	.Laps_awake
>> +
>> +	popl	%esi
>> +	ret
>> +SYM_FUNC_END(sl_txt_wake_aps)
>> +
>> +/* This is the beginning of the relocated AP wake code block */
>> +	.global sl_txt_ap_wake_begin
>> +sl_txt_ap_wake_begin:
>> +
>> +	/* Get the LAPIC ID for each AP and stash it on the stack */
>> +	movl	$(MSR_IA32_X2APIC_APICID), %ecx
>> +	rdmsr
>> +	pushl	%eax
>> +
>> +	/*
>> +	 * Get a pointer to the monitor location on this AP's stack to test below
>> +	 * after mwait returns. Currently %esp points to just past the pushed APIC
>> +	 * ID value.
>> +	 */
>> +	movl	%esp, %eax
>> +	subl	$(TXT_BOOT_STACK_SIZE - 4), %eax
>> +	movl	$0, (%eax)
>> +
>> +	/* Clear ecx/edx so no invalid extensions or hints are passed to monitor */
>> +	xorl	%ecx, %ecx
>> +	xorl	%edx, %edx
>> +
>> +	/*
>> +	 * Arm the monitor and wait for it to be poked by the SMP bringup code. The mwait
>> +	 * instruction can return for a number of reasons. Test to see if it returned
>> +	 * because the monitor was written to.
>> +	 */
>> +	monitor
>> +
>> +1:
>> +	mfence
>> +	mwait
>> +	movl	(%eax), %edx
>> +	testl	%edx, %edx
>> +	jz	1b
>> +
>> +	/*
>> +	 * This is the long absolute jump to the 32b Secure Launch protected mode stub
>> +	 * code in sl_trampoline_start32() in the rmpiggy. The jump address will be
>> +	 * fixed in the SMP boot code when the first AP is brought up. This whole area
>> +	 * is provided and protected in the memory map by the prelaunch code.
>> +	 */
>> +	.byte	0xea
>> +sl_ap_jmp_offset:
>> +	.long	0x00000000
>> +	.word	__SL32_CS
>> +
>> +SYM_FUNC_START(sl_txt_int_nmi)
>> +	/* NMI context, just IRET */
>> +	iret
>> +SYM_FUNC_END(sl_txt_int_nmi)
>> +
>> +SYM_FUNC_START(sl_txt_int_reset)
>> +	TXT_RESET $(SL_ERROR_INV_AP_INTERRUPT)
>> +SYM_FUNC_END(sl_txt_int_reset)
>> +
>> +	.balign 8
>> +SYM_DATA_START_LOCAL(sl_ap_idt_desc)
>> +	.word	sl_ap_idt_end - sl_ap_idt - 1		/* Limit */
>> +	.long	sl_ap_idt - sl_txt_ap_wake_begin	/* Base */
>> +SYM_DATA_END_LABEL(sl_ap_idt_desc, SYM_L_LOCAL, sl_ap_idt_desc_end)
>> +
>> +	.balign 8
>> +SYM_DATA_START_LOCAL(sl_ap_idt)
>> +	.rept	NR_VECTORS
>> +	.word	0x0000		/* Offset 15 to 0 */
>> +	.word	__SL32_CS	/* Segment selector */
>> +	.word	0x8e00		/* Present, DPL=0, 32b Vector, Interrupt */
>> +	.word	0x0000		/* Offset 31 to 16 */
>> +	.endr
>> +SYM_DATA_END_LABEL(sl_ap_idt, SYM_L_LOCAL, sl_ap_idt_end)
>> +
>> +	.balign 8
>> +SYM_DATA_START_LOCAL(sl_ap_gdt_desc)
>> +	.word	sl_ap_gdt_end - sl_ap_gdt - 1
>> +	.long	sl_ap_gdt - sl_txt_ap_wake_begin
>> +SYM_DATA_END_LABEL(sl_ap_gdt_desc, SYM_L_LOCAL, sl_ap_gdt_desc_end)
>> +
>> +	.balign	8
>> +SYM_DATA_START_LOCAL(sl_ap_gdt)
>> +	.quad	0x0000000000000000	/* NULL */
>> +	.quad	0x00cf9a000000ffff	/* __SL32_CS */
>> +	.quad	0x00cf92000000ffff	/* __SL32_DS */
>> +SYM_DATA_END_LABEL(sl_ap_gdt, SYM_L_LOCAL, sl_ap_gdt_end)
>> +
>> +	/* Small stacks for BSP and APs to work with */
>> +	.balign 64
>> +SYM_DATA_START_LOCAL(sl_stacks)
>> +	.fill (TXT_MAX_CPUS * TXT_BOOT_STACK_SIZE), 1, 0
>> +SYM_DATA_END_LABEL(sl_stacks, SYM_L_LOCAL, sl_stacks_end)
>> +
>> +/* This is the end of the relocated AP wake code block */
>> +	.global sl_txt_ap_wake_end
>> +sl_txt_ap_wake_end:
>> +
>> +	.data
>> +	.balign 8
>> +SYM_DATA_START_LOCAL(sl_gdt_desc)
>> +	.word	sl_gdt_end - sl_gdt - 1
>> +	.long	sl_gdt - sl_gdt_desc
>> +SYM_DATA_END_LABEL(sl_gdt_desc, SYM_L_LOCAL, sl_gdt_desc_end)
>> +
>> +	.balign	8
>> +SYM_DATA_START_LOCAL(sl_gdt)
>> +	.quad	0x0000000000000000	/* NULL */
>> +	.quad	0x00cf9a000000ffff	/* __SL32_CS */
>> +	.quad	0x00cf92000000ffff	/* __SL32_DS */
>> +SYM_DATA_END_LABEL(sl_gdt, SYM_L_LOCAL, sl_gdt_end)
>> +
>> +	.balign 8
>> +SYM_DATA_START_LOCAL(sl_smx_rlp_mle_join)
>> +	.long	sl_gdt_end - sl_gdt - 1	/* GDT limit */
>> +	.long	0x00000000		/* GDT base */
>> +	.long	__SL32_CS	/* Seg Sel - CS (DS, ES, SS = seg_sel+8) */
>> +	.long	0x00000000	/* Entry point physical address */
>> +SYM_DATA_END(sl_smx_rlp_mle_join)
>> +
>> +SYM_DATA(sl_cpu_type, .long 0x00000000)
>> +
>> +SYM_DATA(sl_mle_start, .long 0x00000000)
>> +
>> +SYM_DATA_LOCAL(sl_txt_spin_lock, .long 0x00000000)
>> +
>> +SYM_DATA_LOCAL(sl_txt_stack_index, .long 0x00000000)
>> +
>> +SYM_DATA_LOCAL(sl_txt_cpu_count, .long 0x00000000)
>> +
>> +SYM_DATA_LOCAL(sl_txt_ap_wake_block, .long 0x00000000)
>> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
>> index e022e6eb766c..37f6167f28ba 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -348,6 +348,9 @@
>>   #define MSR_IA32_RTIT_OUTPUT_BASE	0x00000560
>>   #define MSR_IA32_RTIT_OUTPUT_MASK	0x00000561
>>   
>> +#define MSR_MTRRphysBase0		0x00000200
>> +#define MSR_MTRRphysMask0		0x00000201
>> +
>>   #define MSR_MTRRfix64K_00000		0x00000250
>>   #define MSR_MTRRfix16K_80000		0x00000258
>>   #define MSR_MTRRfix16K_A0000		0x00000259
>> @@ -849,6 +852,8 @@
>>   #define MSR_IA32_APICBASE_ENABLE	(1<<11)
>>   #define MSR_IA32_APICBASE_BASE		(0xfffff<<12)
>>   
>> +#define MSR_IA32_X2APIC_APICID		0x00000802
>> +
>>   #define MSR_IA32_UCODE_WRITE		0x00000079
>>   #define MSR_IA32_UCODE_REV		0x0000008b
>>   
> 
> MSR updates are better split into their own patch.

Yes we can do that, it makes sense.
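
For readers following the assembly above, the TXT heap walk done in sl_stub_entry and sl_txt_wake_aps can be sketched in C. This is an illustrative sketch, not code from the patch. It assumes the heap layout the assembly relies on: four consecutive regions (BIOS data, OS to MLE, OS to SINIT, SINIT to MLE), each prefixed with a 64-bit size that includes the size field itself. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for the four TXT heap regions, in heap order. */
enum txt_heap_region {
	TXT_BIOS_DATA,
	TXT_OS_MLE_DATA,
	TXT_OS_SINIT_DATA,
	TXT_SINIT_MLE_DATA,
};

/* Read a 64-bit size field (host byte order) without assuming alignment. */
static uint64_t txt_read_size(const uint8_t *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Return a pointer to the data of region @idx, i.e. just past its 8-byte
 * size field. Each size includes the size field itself, so adding it to
 * the start of a region lands on the size field of the next region. For
 * the BIOS data this is what "movl (%eax), %ecx; leal 8(%eax, %ecx), %eax"
 * computes. No bounds checking is done in this sketch.
 */
static const uint8_t *txt_heap_region_data(const uint8_t *heap,
					   enum txt_heap_region idx)
{
	const uint8_t *p = heap;

	assert(idx <= TXT_SINIT_MLE_DATA);
	for (int i = 0; i < (int)idx; i++)
		p += txt_read_size(p);
	return p + sizeof(uint64_t);
}
```

A real implementation would also bound each size against the TXT heap size before following it; the sketch leaves that out to keep the pointer arithmetic visible.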

Thanks

> 
>> diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
>> index 9b82eebd7add..7ce283a22d6b 100644
>> --- a/arch/x86/include/uapi/asm/bootparam.h
>> +++ b/arch/x86/include/uapi/asm/bootparam.h
>> @@ -12,6 +12,7 @@
>>   /* loadflags */
>>   #define LOADED_HIGH	(1<<0)
>>   #define KASLR_FLAG	(1<<1)
>> +#define SLAUNCH_FLAG	(1<<2)
>>   #define QUIET_FLAG	(1<<5)
>>   #define KEEP_SEGMENTS	(1<<6)
>>   #define CAN_USE_HEAP	(1<<7)
>> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
>> index a98020bf31bb..925adce6e2c7 100644
>> --- a/arch/x86/kernel/asm-offsets.c
>> +++ b/arch/x86/kernel/asm-offsets.c
>> @@ -13,6 +13,8 @@
>>   #include <linux/hardirq.h>
>>   #include <linux/suspend.h>
>>   #include <linux/kbuild.h>
>> +#include <linux/slr_table.h>
>> +#include <linux/slaunch.h>
>>   #include <asm/processor.h>
>>   #include <asm/thread_info.h>
>>   #include <asm/sigframe.h>
>> @@ -120,4 +122,22 @@ static void __used common(void)
>>   	OFFSET(ARIA_CTX_rounds, aria_ctx, rounds);
>>   #endif
>>   
>> +#ifdef CONFIG_SECURE_LAUNCH
>> +	BLANK();
>> +	OFFSET(SL_txt_info, txt_os_mle_data, txt_info);
>> +	OFFSET(SL_mle_scratch, txt_os_mle_data, mle_scratch);
>> +	OFFSET(SL_boot_params_addr, txt_os_mle_data, boot_params_addr);
>> +	OFFSET(SL_ap_wake_block, txt_os_mle_data, ap_wake_block);
>> +	OFFSET(SL_ap_wake_block_size, txt_os_mle_data, ap_wake_block_size);
>> +	OFFSET(SL_saved_misc_enable_msr, slr_entry_intel_info, saved_misc_enable_msr);
>> +	OFFSET(SL_saved_bsp_mtrrs, slr_entry_intel_info, saved_bsp_mtrrs);
>> +	OFFSET(SL_num_logical_procs, txt_bios_data, num_logical_procs);
>> +	OFFSET(SL_capabilities, txt_os_sinit_data, capabilities);
>> +	OFFSET(SL_mle_size, txt_os_sinit_data, mle_size);
>> +	OFFSET(SL_vtd_pmr_lo_base, txt_os_sinit_data, vtd_pmr_lo_base);
>> +	OFFSET(SL_vtd_pmr_lo_size, txt_os_sinit_data, vtd_pmr_lo_size);
>> +	OFFSET(SL_rlp_wakeup_addr, txt_sinit_mle_data, rlp_wakeup_addr);
>> +	OFFSET(SL_rlp_gdt_base, smx_rlp_mle_join, rlp_gdt_base);
>> +	OFFSET(SL_rlp_entry_point, smx_rlp_mle_join, rlp_entry_point);
>> +#endif
>>   }
> 
> BR, Jarkko
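
The bounds checks in sl_check_buffer_mle_overlap and sl_txt_verify_os_mle_struct may be easier to review in C. The sketch below is a hypothetical rendering of that logic, not part of the patch: it returns false wherever the assembly would invoke TXT_RESET, and the function names are made up for illustration. Like the assembly, it assumes the low PMR base has already been verified to be 0, so vtd_pmr_lo_size is also the end address of the low PMR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirrors sl_check_buffer_mle_overlap: a buffer [buf_begin, buf_end) is
 * acceptable if it lies entirely below or entirely above the MLE
 * [mle_begin, mle_end). A buffer fully inside the MLE is acceptable only
 * when allow_inside is set (the %eax argument, used for the boot params
 * during a UEFI boot). Any partial overlap is rejected.
 */
static bool sl_buffer_ok(uint32_t buf_begin, uint32_t buf_end,
			 uint32_t mle_begin, uint32_t mle_end,
			 bool allow_inside)
{
	assert(buf_begin <= buf_end); /* wrap-around is rejected by the caller */

	if (buf_begin >= mle_end && buf_end > mle_end)
		return true; /* buffer above MLE */
	if (buf_end <= mle_begin && buf_begin < mle_begin)
		return true; /* buffer below MLE */
	if (!allow_inside)
		return false;
	return buf_begin >= mle_begin && buf_end <= mle_end; /* inside MLE */
}

/*
 * Hypothetical wrapper for the per-buffer steps in
 * sl_txt_verify_os_mle_struct: detect 32-bit overflow of begin + size
 * (the "jc .Loverflow_detected" checks), check MLE overlap, then require
 * the buffer to end within the low PMR.
 */
static bool sl_check_buffer(uint32_t begin, uint32_t size,
			    uint32_t mle_begin, uint32_t mle_end,
			    uint32_t pmr_lo_size, bool allow_inside)
{
	uint32_t end = begin + size;

	if (end < begin)
		return false; /* integer overflow */
	if (!sl_buffer_ok(begin, end, mle_begin, mle_end, allow_inside))
		return false; /* overlaps the MLE */
	return end <= pmr_lo_size; /* otherwise beyond the low PMR */
}
```

With the MLE at [0x100000, 0x200000), a page at 0x150000 passes only when allow_inside is true, while a buffer straddling 0x100000 fails in either case.
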
Ross Philipson June 4, 2024, 9:12 p.m. UTC | #19
On 6/4/24 1:54 PM, Ard Biesheuvel wrote:
> On Tue, 4 Jun 2024 at 19:34, <ross.philipson@oracle.com> wrote:
>>
>> On 6/4/24 10:27 AM, Ard Biesheuvel wrote:
>>> On Tue, 4 Jun 2024 at 19:24, <ross.philipson@oracle.com> wrote:
>>>>
>>>> On 5/31/24 6:33 AM, Ard Biesheuvel wrote:
>>>>> On Fri, 31 May 2024 at 13:00, Ard Biesheuvel <ardb@kernel.org> wrote:
>>>>>>
>>>>>> Hello Ross,
>>>>>>
>>>>>> On Fri, 31 May 2024 at 03:32, Ross Philipson <ross.philipson@oracle.com> wrote:
>>>>>>>
>>>>>>> The Secure Launch (SL) stub provides the entry point for Intel TXT (and
>>>>>>> later AMD SKINIT) to vector to during the late launch. The symbol
>>>>>>> sl_stub_entry is that entry point and its offset into the kernel is
>>>>>>> conveyed to the launching code using the MLE (Measured Launch
>>>>>>> Environment) header in the structure named mle_header. The offset of the
>>>>>>> MLE header is set in the kernel_info. The routine sl_stub contains the
>>>>>>> very early late launch setup code responsible for setting up the basic
>>>>>>> environment to allow the normal kernel startup_32 code to proceed. It is
>>>>>>> also responsible for properly waking and handling the APs on Intel
>>>>>>> platforms. The routine sl_main which runs after entering 64b mode is
>>>>>>> responsible for measuring configuration and module information before
>>>>>>> it is used like the boot params, the kernel command line, the TXT heap,
>>>>>>> an external initramfs, etc.
>>>>>>>
>>>>>>> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
>>>>>>> ---
>>>>>>>     Documentation/arch/x86/boot.rst        |  21 +
>>>>>>>     arch/x86/boot/compressed/Makefile      |   3 +-
>>>>>>>     arch/x86/boot/compressed/head_64.S     |  30 +
>>>>>>>     arch/x86/boot/compressed/kernel_info.S |  34 ++
>>>>>>>     arch/x86/boot/compressed/sl_main.c     | 577 ++++++++++++++++++++
>>>>>>>     arch/x86/boot/compressed/sl_stub.S     | 725 +++++++++++++++++++++++++
>>>>>>>     arch/x86/include/asm/msr-index.h       |   5 +
>>>>>>>     arch/x86/include/uapi/asm/bootparam.h  |   1 +
>>>>>>>     arch/x86/kernel/asm-offsets.c          |  20 +
>>>>>>>     9 files changed, 1415 insertions(+), 1 deletion(-)
>>>>>>>     create mode 100644 arch/x86/boot/compressed/sl_main.c
>>>>>>>     create mode 100644 arch/x86/boot/compressed/sl_stub.S
>>>>>>>
>>>>>>> diff --git a/Documentation/arch/x86/boot.rst b/Documentation/arch/x86/boot.rst
>>>>>>> index 4fd492cb4970..295cdf9bcbdb 100644
>>>>>>> --- a/Documentation/arch/x86/boot.rst
>>>>>>> +++ b/Documentation/arch/x86/boot.rst
>>>>>>> @@ -482,6 +482,14 @@ Protocol:  2.00+
>>>>>>>                - If 1, KASLR enabled.
>>>>>>>                - If 0, KASLR disabled.
>>>>>>>
>>>>>>> +  Bit 2 (kernel internal): SLAUNCH_FLAG
>>>>>>> +
>>>>>>> +       - Used internally by the setup kernel to communicate
>>>>>>> +         Secure Launch status to kernel proper.
>>>>>>> +
>>>>>>> +           - If 1, Secure Launch enabled.
>>>>>>> +           - If 0, Secure Launch disabled.
>>>>>>> +
>>>>>>>       Bit 5 (write): QUIET_FLAG
>>>>>>>
>>>>>>>            - If 0, print early messages.
>>>>>>> @@ -1028,6 +1036,19 @@ Offset/size:     0x000c/4
>>>>>>>
>>>>>>>       This field contains maximal allowed type for setup_data and setup_indirect structs.
>>>>>>>
>>>>>>> +============   =================
>>>>>>> +Field name:    mle_header_offset
>>>>>>> +Offset/size:   0x0010/4
>>>>>>> +============   =================
>>>>>>> +
>>>>>>> +  This field contains the offset to the Secure Launch Measured Launch Environment
>>>>>>> +  (MLE) header. This offset is used to locate information needed during a secure
>>>>>>> +  late launch using Intel TXT. If the offset is zero, the kernel does not have
>>>>>>> +  Secure Launch capabilities. The MLE entry point is called from TXT on the BSP
>>>>>>> +  following a successful measured launch. The specific state of the processors is
>>>>>>> +  outlined in the TXT Software Development Guide, the latest can be found here:
>>>>>>> +  https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
>>>>>>> +
>>>>>>>
>>>>>>
>>>>>> Could we just repaint this field as the offset relative to the start
>>>>>> of kernel_info rather than relative to the start of the image? That
>>>>>> way, there is no need for patch #1, and given that the consumer of
>>>>>> this field accesses it via kernel_info, I wouldn't expect any issues
>>>>>> in applying this offset to obtain the actual address.
>>>>>>
>>>>>>
>>>>>>>     The Image Checksum
>>>>>>>     ==================
>>>>>>> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
>>>>>>> index 9189a0e28686..9076a248d4b4 100644
>>>>>>> --- a/arch/x86/boot/compressed/Makefile
>>>>>>> +++ b/arch/x86/boot/compressed/Makefile
>>>>>>> @@ -118,7 +118,8 @@ vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
>>>>>>>     vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_mixed.o
>>>>>>>     vmlinux-objs-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
>>>>>>>
>>>>>>> -vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o
>>>>>>> +vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o \
>>>>>>> +       $(obj)/sl_main.o $(obj)/sl_stub.o
>>>>>>>
>>>>>>>     $(obj)/vmlinux: $(vmlinux-objs-y) FORCE
>>>>>>>            $(call if_changed,ld)
>>>>>>> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
>>>>>>> index 1dcb794c5479..803c9e2e6d85 100644
>>>>>>> --- a/arch/x86/boot/compressed/head_64.S
>>>>>>> +++ b/arch/x86/boot/compressed/head_64.S
>>>>>>> @@ -420,6 +420,13 @@ SYM_CODE_START(startup_64)
>>>>>>>            pushq   $0
>>>>>>>            popfq
>>>>>>>
>>>>>>> +#ifdef CONFIG_SECURE_LAUNCH
>>>>>>> +       /* Ensure the relocation region is coverd by a PMR */
>>>>>>
>>>>>> covered
>>>>>>
>>>>>>> +       movq    %rbx, %rdi
>>>>>>> +       movl    $(_bss - startup_32), %esi
>>>>>>> +       callq   sl_check_region
>>>>>>> +#endif
>>>>>>> +
>>>>>>>     /*
>>>>>>>      * Copy the compressed kernel to the end of our buffer
>>>>>>>      * where decompression in place becomes safe.
>>>>>>> @@ -462,6 +469,29 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
>>>>>>>            shrq    $3, %rcx
>>>>>>>            rep     stosq
>>>>>>>
>>>>>>> +#ifdef CONFIG_SECURE_LAUNCH
>>>>>>> +       /*
>>>>>>> +        * Have to do the final early sl stub work in 64b area.
>>>>>>> +        *
>>>>>>> +        * *********** NOTE ***********
>>>>>>> +        *
>>>>>>> +        * Several boot params get used before we get a chance to measure
>>>>>>> +        * them in this call. This is a known issue and we currently don't
>>>>>>> +        * have a solution. The scratch field doesn't matter. There is no
>>>>>>> +        * obvious way to do anything about the use of kernel_alignment or
>>>>>>> +        * init_size though these seem low risk with all the PMR and overlap
>>>>>>> +        * checks in place.
>>>>>>> +        */
>>>>>>> +       movq    %r15, %rdi
>>>>>>> +       callq   sl_main
>>>>>>> +
>>>>>>> +       /* Ensure the decompression location is covered by a PMR */
>>>>>>> +       movq    %rbp, %rdi
>>>>>>> +       movq    output_len(%rip), %rsi
>>>>>>> +       callq   sl_check_region
>>>>>>> +#endif
>>>>>>> +
>>>>>>> +       pushq   %rsi
>>>>>>
>>>>>> This looks like a rebase error.
>>>>>>
>>>>>>>            call    load_stage2_idt
>>>>>>>
>>>>>>>            /* Pass boot_params to initialize_identity_maps() */
>>>>>>> diff --git a/arch/x86/boot/compressed/kernel_info.S b/arch/x86/boot/compressed/kernel_info.S
>>>>>>> index c18f07181dd5..e199b87764e9 100644
>>>>>>> --- a/arch/x86/boot/compressed/kernel_info.S
>>>>>>> +++ b/arch/x86/boot/compressed/kernel_info.S
>>>>>>> @@ -28,6 +28,40 @@ SYM_DATA_START(kernel_info)
>>>>>>>            /* Maximal allowed type for setup_data and setup_indirect structs. */
>>>>>>>            .long   SETUP_TYPE_MAX
>>>>>>>
>>>>>>> +       /* Offset to the MLE header structure */
>>>>>>> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
>>>>>>> +       .long   rva(mle_header)
>>>>>>
>>>>>> ... so this could just be mle_header - kernel_info, and the consumer
>>>>>> can do the math instead.
>>>>>>
>>>>>>> +#else
>>>>>>> +       .long   0
>>>>>>> +#endif
>>>>>>> +
>>>>>>>     kernel_info_var_len_data:
>>>>>>>            /* Empty for time being... */
>>>>>>>     SYM_DATA_END_LABEL(kernel_info, SYM_L_LOCAL, kernel_info_end)
>>>>>>> +
>>>>>>> +#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
>>>>>>> +       /*
>>>>>>> +        * The MLE Header per the TXT Specification, section 2.1
>>>>>>> +        * MLE capabilities, see table 4. Capabilities set:
>>>>>>> +        * bit 0: Support for GETSEC[WAKEUP] for RLP wakeup
>>>>>>> +        * bit 1: Support for RLP wakeup using MONITOR address
>>>>>>> +        * bit 2: The ECX register will contain the pointer to the MLE page table
>>>>>>> +        * bit 5: TPM 1.2 family: Details/authorities PCR usage support
>>>>>>> +        * bit 9: Supported format of TPM 2.0 event log - TCG compliant
>>>>>>> +        */
>>>>>>> +SYM_DATA_START(mle_header)
>>>>>>> +       .long   0x9082ac5a  /* UUID0 */
>>>>>>> +       .long   0x74a7476f  /* UUID1 */
>>>>>>> +       .long   0xa2555c0f  /* UUID2 */
>>>>>>> +       .long   0x42b651cb  /* UUID3 */
>>>>>>> +       .long   0x00000034  /* MLE header size */
>>>>>>> +       .long   0x00020002  /* MLE version 2.2 */
>>>>>>> +       .long   rva(sl_stub_entry) /* Linear entry point of MLE (virt. address) */
>>>>>>
>>>>>> and these should perhaps be relative to mle_header?
>>>>>>
>>>>>>> +       .long   0x00000000  /* First valid page of MLE */
>>>>>>> +       .long   0x00000000  /* Offset within binary of first byte of MLE */
>>>>>>> +       .long   rva(_edata) /* Offset within binary of last byte + 1 of MLE */
>>>>>>
>>>>>> and here
>>>>>>
>>>>>
>>>>> Ugh never mind - these are specified externally.
>>>>
>>>> Can you clarify your follow on comment here?
>>>>
>>>
>>> I noticed that -as you pointed out in your previous reply- these
>>> fields cannot be repainted at will, as they are defined by an external
>>> specification.
>>>
>>> I'll play a bit more with this code tomorrow - I would *really* like
>>> to avoid the need for patch #1, as it adds another constraint on how
>>> we construct the boot image, and this is already riddled with legacy
>>> and other complications.
>>
>> Yeah, I should have read forward through all your replies before
>> responding to the first one, but I think it clarified things, as you point
>> out here. We appreciate your help and suggestions.
>>
> 
> OK, so I have a solution that does not require kernel_info at a fixed offset:
> 
> - put this at the end of arch/x86/boot/compressed/vmlinux.lds.S
> 
> #ifdef CONFIG_SECURE_LAUNCH
> PROVIDE(kernel_info_offset      = ABSOLUTE(kernel_info - startup_32));
> PROVIDE(mle_header_offset       = kernel_info_offset + ABSOLUTE(mle_header - startup_32));
> PROVIDE(sl_stub_entry_offset    = kernel_info_offset + ABSOLUTE(sl_stub_entry - startup_32));
> PROVIDE(_edata_offset           = kernel_info_offset + ABSOLUTE(_edata - startup_32));
> #endif
> #endif
> 
> 
> and use this for the header fields:
> 
>          /* Offset to the MLE header structure */
> #if IS_ENABLED(CONFIG_SECURE_LAUNCH)
>          .long   mle_header_offset - kernel_info
> #else
>          .long   0
> #endif
> 
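On the consumer side, "doing the math" amounts to adding a stored offset to the location of kernel_info in the loaded image. The following is a minimal sketch, assuming the field holds an offset relative to the start of kernel_info and sits at a hypothetical byte offset `KI_MLE_HEADER_FIELD` (the real position depends on the kernel_info layout). `find_mle_header()` and `get_le32()` are illustrative names, not boot-loader or kernel APIs. A value of 0 is treated as "Secure Launch not configured", matching the `.long 0` branch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical: byte offset, within kernel_info, of the 32-bit
 * "offset to MLE header" field. The real position is an assumption.
 */
#define KI_MLE_HEADER_FIELD 12u

/* Read a little-endian u32 from a possibly unaligned buffer position. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Resolve the MLE header from the image and the offset of kernel_info
 * within it. Returns NULL if the field is 0 (Secure Launch not built in)
 * or if the result would fall outside the image.
 */
static const uint8_t *find_mle_header(const uint8_t *image, size_t image_len,
				      size_t kernel_info_off)
{
	uint32_t rel;

	if (kernel_info_off > image_len ||
	    image_len - kernel_info_off < KI_MLE_HEADER_FIELD + 4)
		return NULL;

	rel = get_le32(image + kernel_info_off + KI_MLE_HEADER_FIELD);
	if (rel == 0)
		return NULL;

	/* The stored offset is relative to the start of kernel_info. */
	if (rel >= image_len - kernel_info_off)
		return NULL;

	return image + kernel_info_off + rel;
}
```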

Awesome, thank you! We will work on incorporating this unless someone 
else sees a problem with it (or we run into problems and need to revisit).

Ross

> 
> 
> SYM_DATA_START(mle_header)
>          .long   0x9082ac5a  /* UUID0 */
>          .long   0x74a7476f  /* UUID1 */
>          .long   0xa2555c0f  /* UUID2 */
>          .long   0x42b651cb  /* UUID3 */
>          .long   0x00000034  /* MLE header size */
>          .long   0x00020002  /* MLE version 2.2 */
>          .long   sl_stub_entry_offset - kernel_info /* Linear entry point of MLE (virt. address) */
>          .long   0x00000000  /* First valid page of MLE */
>          .long   0x00000000  /* Offset within binary of first byte of MLE */
>          .long   _edata_offset - kernel_info /* Offset within binary of last byte + 1 of MLE */
>          .long   0x00000227  /* Bit vector of MLE-supported capabilities */
>          .long   0x00000000  /* Starting linear address of command line (unused) */
>          .long   0x00000000  /* Ending linear address of command line (unused) */
>
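For anyone cross-checking the `.long` sequence, the MLE header can be mirrored as a C struct. This is a sketch: the struct, field and macro names are hypothetical, the field order follows the comments in the header above, and the capability bits follow the list in the patch's comment (TXT specification, section 2.1, table 4). It confirms that the 0x34 size field equals 13 four-byte fields and that 0x227 encodes bits 0, 1, 2, 5 and 9. Treating the version as major in bits 31:16 and minor in bits 15:0 is an assumption; 0x00020002 reads as 2.2 either way.

```c
#include <stddef.h>
#include <stdint.h>

/* Capability bits set in the header (hypothetical macro names). */
#define MLE_CAP_GETSEC_WAKEUP	(1u << 0)	/* RLP wakeup via GETSEC[WAKEUP] */
#define MLE_CAP_MONITOR_WAKEUP	(1u << 1)	/* RLP wakeup via MONITOR address */
#define MLE_CAP_ECX_PGTBL	(1u << 2)	/* ECX holds MLE page table pointer */
#define MLE_CAP_TPM12_PCR	(1u << 5)	/* TPM 1.2 details/authorities PCRs */
#define MLE_CAP_TPM2_LOG_TCG	(1u << 9)	/* TCG-compliant TPM 2.0 event log */

#define MLE_CAPS (MLE_CAP_GETSEC_WAKEUP | MLE_CAP_MONITOR_WAKEUP | \
		  MLE_CAP_ECX_PGTBL | MLE_CAP_TPM12_PCR | MLE_CAP_TPM2_LOG_TCG)

/* One 32-bit field per .long in the header, in the same order. */
struct mle_header {
	uint32_t uuid[4];
	uint32_t size;			/* header size in bytes */
	uint32_t version;		/* assumed: major 31:16, minor 15:0 */
	uint32_t entry_point;		/* linear entry point of the MLE */
	uint32_t first_valid_page;
	uint32_t mle_start;		/* offset of first byte of the MLE */
	uint32_t mle_end;		/* offset of last byte + 1 */
	uint32_t capabilities;
	uint32_t cmdline_start;		/* unused */
	uint32_t cmdline_end;		/* unused */
};

/* 13 fields x 4 bytes = 52 = 0x34, matching the size field. */
_Static_assert(sizeof(struct mle_header) == 0x34, "MLE header must be 0x34 bytes");

static inline unsigned int mle_version_major(uint32_t v) { return v >> 16; }
static inline unsigned int mle_version_minor(uint32_t v) { return v & 0xffff; }
```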
Ross Philipson June 4, 2024, 9:47 p.m. UTC | #20
On 6/4/24 1:05 PM, Jarkko Sakkinen wrote:
> On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
>> On Intel, the APs are left in a well documented state after TXT performs
>> the late launch. Specifically they cannot have #INIT asserted on them so
>> a standard startup via INIT/SIPI/SIPI cannot be performed. Instead the
>> early SL stub code uses MONITOR and MWAIT to park the APs. The realmode/init.c
>> code updates the jump address for the waiting APs with the location of the
>> Secure Launch entry point in the RM piggy after it is loaded and fixed up.
>> As the APs are woken up by writing the monitor, the APs jump to the Secure
>> Launch entry point in the RM piggy which mimics what the real mode code would
>> do then jumps to the standard RM piggy protected mode entry point.
>>
>> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
>> ---
>>   arch/x86/include/asm/realmode.h      |  3 ++
>>   arch/x86/kernel/smpboot.c            | 58 +++++++++++++++++++++++++++-
>>   arch/x86/realmode/init.c             |  3 ++
>>   arch/x86/realmode/rm/header.S        |  3 ++
>>   arch/x86/realmode/rm/trampoline_64.S | 32 +++++++++++++++
>>   5 files changed, 97 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
>> index 87e5482acd0d..339b48e2543d 100644
>> --- a/arch/x86/include/asm/realmode.h
>> +++ b/arch/x86/include/asm/realmode.h
>> @@ -38,6 +38,9 @@ struct real_mode_header {
>>   #ifdef CONFIG_X86_64
>>   	u32	machine_real_restart_seg;
>>   #endif
>> +#ifdef CONFIG_SECURE_LAUNCH
>> +	u32	sl_trampoline_start32;
>> +#endif
>>   };
>>   
>>   /* This must match data at realmode/rm/trampoline_{32,64}.S */
>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>> index 0c35207320cb..adb521221d6c 100644
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -60,6 +60,7 @@
>>   #include <linux/stackprotector.h>
>>   #include <linux/cpuhotplug.h>
>>   #include <linux/mc146818rtc.h>
>> +#include <linux/slaunch.h>
>>   
>>   #include <asm/acpi.h>
>>   #include <asm/cacheinfo.h>
>> @@ -868,6 +869,56 @@ int common_cpu_up(unsigned int cpu, struct task_struct *idle)
>>   	return 0;
>>   }
>>   
>> +#ifdef CONFIG_SECURE_LAUNCH
>> +
>> +static bool slaunch_is_txt_launch(void)
>> +{
>> +	if ((slaunch_get_flags() & (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) ==
>> +	    (SL_FLAG_ACTIVE | SL_FLAG_ARCH_TXT))
>> +		return true;
>> +
>> +	return false;
>> +}
> 
> static inline bool slaunch_is_txt_launch(void)
> {
> 	u32 mask = SL_FLAG_ACTIVE | SL_FLAG_ARCH_TXT;
> 
> 	return (slaunch_get_flags() & mask) == mask;
> }

Actually I think I can take your suggested change and move this function 
to the main header files since this check is done elsewhere. And later I 
can make others like slaunch_is_skinit_launch(). Thanks.
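As a shared-header helper, the check might look like the sketch below. The flag values are hypothetical, and the helper takes the flags as a parameter rather than calling slaunch_get_flags(), so it can be exercised on its own. Note the parentheses: in C, `==` has higher precedence than `&`, so `flags & mask == mask` parses as `flags & (mask == mask)`, that is `flags & 1`, which would report a TXT launch whenever only the lowest flag bit is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define SL_FLAG_ACTIVE		0x00000001u
#define SL_FLAG_ARCH_TXT	0x00000002u

/* True only if every bit in the mask is set in flags. */
static inline bool slaunch_is_txt_launch_flags(uint32_t flags)
{
	const uint32_t mask = SL_FLAG_ACTIVE | SL_FLAG_ARCH_TXT;

	/* Parentheses required: == binds more tightly than & in C. */
	return (flags & mask) == mask;
}
```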

> 
> 
>> +
>> +/*
>> + * TXT AP startup is quite different than normal. The APs cannot have #INIT
>> + * asserted on them or receive SIPIs. The early Secure Launch code has parked
>> + * the APs using monitor/mwait. This will wake the APs by writing the monitor
>> + * and have them jump to the protected mode code in the rmpiggy where the rest
>> + * of the SMP boot of the AP will proceed normally.
>> + */
>> +static void slaunch_wakeup_cpu_from_txt(int cpu, int apicid)
>> +{
>> +	struct sl_ap_wake_info *ap_wake_info;
>> +	struct sl_ap_stack_and_monitor *stack_monitor = NULL;
> 
> struct sl_ap_stack_and_monitor *stack_monitor; /* note: no initialization */
> struct sl_ap_wake_info *ap_wake_info;

Will fix.

> 
> 
>> +
>> +	ap_wake_info = slaunch_get_ap_wake_info();
>> +
>> +	stack_monitor = (struct sl_ap_stack_and_monitor *)__va(ap_wake_info->ap_wake_block +
>> +							       ap_wake_info->ap_stacks_offset);
>> +
>> +	for (int i = TXT_MAX_CPUS - 1; i >= 0; i--) {
>> +		if (stack_monitor[i].apicid == apicid) {
>> +			/* Write the monitor */
> 
> I'd remove this comment.

Sure.

Ross
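The lookup in slaunch_wakeup_cpu_from_txt() can be exercised in isolation with a userspace sketch like the one below. The table size, struct layout and the wake_ap() name are hypothetical; only the apicid and monitor field names come from the patch. A signed index keeps the downward loop well defined: with an `unsigned int` counter, `i >= 0` is always true, so a miss would wrap the index around and read past the table (GCC's -Wtype-limits warns about this comparison).

```c
#include <stdint.h>

/* Hypothetical table size; the patch takes TXT_MAX_CPUS from its headers. */
#define TXT_MAX_CPUS 8

/* Hypothetical layout; only the field names come from the patch. */
struct ap_stack_and_monitor {
	uint32_t monitor;	/* written to wake the AP parked in MWAIT */
	uint32_t apicid;
};

/*
 * Find the parked AP whose APIC ID matches and write its monitor word.
 * Returns the index that was woken, or -1 if no entry matched. The index
 * is a signed int so the downward loop stops after slot 0.
 */
static int wake_ap(struct ap_stack_and_monitor *tbl, uint32_t apicid)
{
	for (int i = TXT_MAX_CPUS - 1; i >= 0; i--) {
		if (tbl[i].apicid == apicid) {
			tbl[i].monitor = 1;
			return i;
		}
	}
	return -1;
}
```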

> 
>> +			stack_monitor[i].monitor = 1;
>> +			break;
>> +		}
>> +	}
>> +}
>> +
>> +#else
>> +
>> +static inline bool slaunch_is_txt_launch(void)
>> +{
>> +	return false;
>> +}
>> +
>> +static inline void slaunch_wakeup_cpu_from_txt(int cpu, int apicid)
>> +{
>> +}
>> +
>> +#endif  /* !CONFIG_SECURE_LAUNCH */
>> +
>>   /*
>>    * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
>>    * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
>> @@ -877,7 +928,7 @@ int common_cpu_up(unsigned int cpu, struct task_struct *idle)
>>   static int do_boot_cpu(u32 apicid, int cpu, struct task_struct *idle)
>>   {
>>   	unsigned long start_ip = real_mode_header->trampoline_start;
>> -	int ret;
>> +	int ret = 0;
>>   
>>   #ifdef CONFIG_X86_64
>>   	/* If 64-bit wakeup method exists, use the 64-bit mode trampoline IP */
>> @@ -922,12 +973,15 @@ static int do_boot_cpu(u32 apicid, int cpu, struct task_struct *idle)
>>   
>>   	/*
>>   	 * Wake up a CPU in difference cases:
>> +	 * - Intel TXT DRTM launch uses its own method to wake the APs
>>   	 * - Use a method from the APIC driver if one defined, with wakeup
>>   	 *   straight to 64-bit mode preferred over wakeup to RM.
>>   	 * Otherwise,
>>   	 * - Use an INIT boot APIC message
>>   	 */
>> -	if (apic->wakeup_secondary_cpu_64)
>> +	if (slaunch_is_txt_launch())
>> +		slaunch_wakeup_cpu_from_txt(cpu, apicid);
>> +	else if (apic->wakeup_secondary_cpu_64)
>>   		ret = apic->wakeup_secondary_cpu_64(apicid, start_ip);
>>   	else if (apic->wakeup_secondary_cpu)
>>   		ret = apic->wakeup_secondary_cpu(apicid, start_ip);
>> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
>> index f9bc444a3064..d95776cb30d3 100644
>> --- a/arch/x86/realmode/init.c
>> +++ b/arch/x86/realmode/init.c
>> @@ -4,6 +4,7 @@
>>   #include <linux/memblock.h>
>>   #include <linux/cc_platform.h>
>>   #include <linux/pgtable.h>
>> +#include <linux/slaunch.h>
>>   
>>   #include <asm/set_memory.h>
>>   #include <asm/realmode.h>
>> @@ -210,6 +211,8 @@ void __init init_real_mode(void)
>>   
>>   	setup_real_mode();
>>   	set_real_mode_permissions();
>> +
>> +	slaunch_fixup_jump_vector();
>>   }
>>   
>>   static int __init do_init_real_mode(void)
>> diff --git a/arch/x86/realmode/rm/header.S b/arch/x86/realmode/rm/header.S
>> index 2eb62be6d256..3b5cbcbbfc90 100644
>> --- a/arch/x86/realmode/rm/header.S
>> +++ b/arch/x86/realmode/rm/header.S
>> @@ -37,6 +37,9 @@ SYM_DATA_START(real_mode_header)
>>   #ifdef CONFIG_X86_64
>>   	.long	__KERNEL32_CS
>>   #endif
>> +#ifdef CONFIG_SECURE_LAUNCH
>> +	.long	pa_sl_trampoline_start32
>> +#endif
>>   SYM_DATA_END(real_mode_header)
>>   
>>   	/* End signature, used to verify integrity */
>> diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
>> index 14d9c7daf90f..b0ce6205d7ea 100644
>> --- a/arch/x86/realmode/rm/trampoline_64.S
>> +++ b/arch/x86/realmode/rm/trampoline_64.S
>> @@ -122,6 +122,38 @@ SYM_CODE_END(sev_es_trampoline_start)
>>   
>>   	.section ".text32","ax"
>>   	.code32
>> +#ifdef CONFIG_SECURE_LAUNCH
>> +	.balign 4
>> +SYM_CODE_START(sl_trampoline_start32)
>> +	/*
>> +	 * The early secure launch stub AP wakeup code has taken care of all
>> +	 * the vagaries of launching out of TXT. This bit just mimics what the
>> +	 * 16b entry code does and jumps off to the real startup_32.
>> +	 */
>> +	cli
>> +	wbinvd
>> +
>> +	/*
>> +	 * The %ebx provided is not terribly useful since it is the physical
>> +	 * address of tb_trampoline_start and not the base of the image.
>> +	 * Use pa_real_mode_base, which is fixed up, to get a run time
>> +	 * base register to use for offsets to location that do not have
>> +	 * pa_ symbols.
>> +	 */
>> +	movl    $pa_real_mode_base, %ebx
>> +
>> +	LOCK_AND_LOAD_REALMODE_ESP lock_pa=1
>> +
>> +	lgdt    tr_gdt(%ebx)
>> +	lidt    tr_idt(%ebx)
>> +
>> +	movw	$__KERNEL_DS, %dx	# Data segment descriptor
>> +
>> +	/* Jump to where the 16b code would have jumped */
>> +	ljmpl	$__KERNEL32_CS, $pa_startup_32
>> +SYM_CODE_END(sl_trampoline_start32)
>> +#endif
>> +
>>   	.balign 4
>>   SYM_CODE_START(startup_32)
>>   	movl	%edx, %ss
> 
> BR, Jarkko
>
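The wakeup selection order in the patched do_boot_cpu() can be modelled on its own. This sketch uses a hypothetical `apic_ops` struct and `pick_wake_path()` helper rather than the kernel's `struct apic`. It shows the precedence of the four paths and why `ret` now needs an initial value: the TXT branch only writes the AP's monitor word and never assigns a return code.

```c
#include <stddef.h>

/* Hypothetical model of the wakeup callbacks consulted by do_boot_cpu(). */
struct apic_ops {
	int (*wakeup_secondary_cpu_64)(unsigned int apicid, unsigned long start_ip);
	int (*wakeup_secondary_cpu)(unsigned int apicid, unsigned long start_ip);
};

enum wake_path { WAKE_TXT, WAKE_APIC_64, WAKE_APIC, WAKE_INIT };

/*
 * Same order of checks as the patch: a TXT launch wins, then the 64-bit
 * APIC callback, then the real-mode one, with INIT as the fallback.
 * *ret starts at 0 because the TXT path produces no return code.
 */
static enum wake_path pick_wake_path(int txt_launch, const struct apic_ops *apic,
				     unsigned int apicid, unsigned long start_ip,
				     int *ret)
{
	*ret = 0;
	if (txt_launch)
		return WAKE_TXT;	/* monitor write only; *ret stays 0 */
	if (apic->wakeup_secondary_cpu_64) {
		*ret = apic->wakeup_secondary_cpu_64(apicid, start_ip);
		return WAKE_APIC_64;
	}
	if (apic->wakeup_secondary_cpu) {
		*ret = apic->wakeup_secondary_cpu(apicid, start_ip);
		return WAKE_APIC;
	}
	return WAKE_INIT;		/* the kernel sends INIT/SIPI here */
}
```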
Ross Philipson June 4, 2024, 10:14 p.m. UTC | #21
On 6/4/24 1:27 PM, Jarkko Sakkinen wrote:
> On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
>> Currently the locality is hard coded to 0, but for DRTM support, access
>> is needed to localities 1 through 4.
>>
>> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
>> ---
>>   drivers/char/tpm/tpm-chip.c      | 24 +++++++++++++++++++++++-
>>   drivers/char/tpm/tpm-interface.c | 15 +++++++++++++++
>>   drivers/char/tpm/tpm.h           |  1 +
>>   include/linux/tpm.h              |  4 ++++
>>   4 files changed, 43 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
>> index 854546000c92..73eac54d61fb 100644
>> --- a/drivers/char/tpm/tpm-chip.c
>> +++ b/drivers/char/tpm/tpm-chip.c
>> @@ -44,7 +44,7 @@ static int tpm_request_locality(struct tpm_chip *chip)
>>   	if (!chip->ops->request_locality)
>>   		return 0;
>>   
>> -	rc = chip->ops->request_locality(chip, 0);
>> +	rc = chip->ops->request_locality(chip, chip->pref_locality);
>>   	if (rc < 0)
>>   		return rc;
>>   
>> @@ -143,6 +143,27 @@ void tpm_chip_stop(struct tpm_chip *chip)
>>   }
>>   EXPORT_SYMBOL_GPL(tpm_chip_stop);
>>   
>> +/**
>> + * tpm_chip_preferred_locality() - set the TPM chip preferred locality to open
>> + * @chip:	a TPM chip to use
>> + * @locality:   the preferred locality
>> + *
>> + * Return:
>> + * * true      - Preferred locality set
>> + * * false     - Invalid locality specified
>> + */
>> +bool tpm_chip_preferred_locality(struct tpm_chip *chip, int locality)
>> +{
>> +	if (locality < 0 || locality >=TPM_MAX_LOCALITY)
>> +		return false;
>> +
>> +	mutex_lock(&chip->tpm_mutex);
>> +	chip->pref_locality = locality;
>> +	mutex_unlock(&chip->tpm_mutex);
>> +	return true;
>> +}
>> +EXPORT_SYMBOL_GPL(tpm_chip_preferred_locality);
>> +
>>   /**
>>    * tpm_try_get_ops() - Get a ref to the tpm_chip
>>    * @chip: Chip to ref
>> @@ -374,6 +395,7 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,
>>   	}
>>   
>>   	chip->locality = -1;
>> +	chip->pref_locality = 0;
>>   	return chip;
>>   
>>   out:
>> diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
>> index 5da134f12c9a..35f14ccecf0e 100644
>> --- a/drivers/char/tpm/tpm-interface.c
>> +++ b/drivers/char/tpm/tpm-interface.c
>> @@ -274,6 +274,21 @@ int tpm_is_tpm2(struct tpm_chip *chip)
>>   }
>>   EXPORT_SYMBOL_GPL(tpm_is_tpm2);
>>   
>> +/**
>> + * tpm_preferred_locality() - set the TPM chip preferred locality to open
>> + * @chip:	a TPM chip to use
>> + * @locality:   the preferred locality
>> + *
>> + * Return:
>> + * * true      - Preferred locality set
>> + * * false     - Invalid locality specified
>> + */
>> +bool tpm_preferred_locality(struct tpm_chip *chip, int locality)
>> +{
>> +	return tpm_chip_preferred_locality(chip, locality);
>> +}
>> +EXPORT_SYMBOL_GPL(tpm_preferred_locality);
> 
>   What good does this extra wrapping do?
> 
>   tpm_set_default_locality() and default_locality would make so much more
>   sense in any case.

Are you mainly just talking about my naming choices here and in the 
follow-on response? Can you clarify what you are requesting?

Thanks
Ross

> 
>   BR, Jarkko
Jarkko Sakkinen June 4, 2024, 10:36 p.m. UTC | #22
On Tue Jun 4, 2024 at 11:31 PM EEST,  wrote:
> On 6/4/24 11:21 AM, Jarkko Sakkinen wrote:
> > On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
> >> Introduce the Secure Launch Resource Table which forms the formal
> >> interface between the pre and post launch code.
> >>
> >> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
> > 
> > If this is uarch specific, I'd appreciate an Intel SDM reference here so
> > that I can look it up and compare, at section granularity.
>
> This table is meant to not be architecture specific though it can 
> contain architecture specific sub-entities. E.g. there is a TXT specific 
> table and in the future there will be an AMD and ARM one (and hopefully 
> some others). I hope that addresses what you are pointing out or maybe I 
> don't fully understand what you mean here...

At least Intel SDM has a definition of any possible architecture
specific data structure. It is handy to also have this available
in inline comment for any possible such structure pointing out the
section where it is defined.

BR, Jarkko
Jarkko Sakkinen June 4, 2024, 10:43 p.m. UTC | #23
> > s/tpm20/tpm2/
>
> Reasonable. We can change it.

For the sake of consistency: everywhere else in code using the TPM,
either "tpm_" or "tpm2_" is used.

BR, Jarkko
Jarkko Sakkinen June 4, 2024, 10:46 p.m. UTC | #24
On Wed Jun 5, 2024 at 12:47 AM EEST,  wrote:
> > static inline bool slaunch_is_txt_launch(void)
> > {
> > 	u32 mask = SL_FLAG_ACTIVE | SL_FLAG_ARCH_TXT;
> > 
> > 	return (slaunch_get_flags() & mask) == mask;
> > }
>
> Actually I think I can take your suggested change and move this function 
> to the main header files since this check is done elsewhere. And later I 
> can make others like slaunch_is_skinit_launch(). Thanks.

Yep, makes sense to me.

BR, Jarkko
Jarkko Sakkinen June 4, 2024, 10:50 p.m. UTC | #25
On Wed Jun 5, 2024 at 1:14 AM EEST,  wrote:
> On 6/4/24 1:27 PM, Jarkko Sakkinen wrote:
> > On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
> >> Currently the locality is hard coded to 0, but for DRTM support, access
> >> is needed to localities 1 through 4.
> >>
> >> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
> >> ---
> >>   drivers/char/tpm/tpm-chip.c      | 24 +++++++++++++++++++++++-
> >>   drivers/char/tpm/tpm-interface.c | 15 +++++++++++++++
> >>   drivers/char/tpm/tpm.h           |  1 +
> >>   include/linux/tpm.h              |  4 ++++
> >>   4 files changed, 43 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
> >> index 854546000c92..73eac54d61fb 100644
> >> --- a/drivers/char/tpm/tpm-chip.c
> >> +++ b/drivers/char/tpm/tpm-chip.c
> >> @@ -44,7 +44,7 @@ static int tpm_request_locality(struct tpm_chip *chip)
> >>   	if (!chip->ops->request_locality)
> >>   		return 0;
> >>   
> >> -	rc = chip->ops->request_locality(chip, 0);
> >> +	rc = chip->ops->request_locality(chip, chip->pref_locality);
> >>   	if (rc < 0)
> >>   		return rc;
> >>   
> >> @@ -143,6 +143,27 @@ void tpm_chip_stop(struct tpm_chip *chip)
> >>   }
> >>   EXPORT_SYMBOL_GPL(tpm_chip_stop);
> >>   
> >> +/**
> >> + * tpm_chip_preferred_locality() - set the TPM chip preferred locality to open
> >> + * @chip:	a TPM chip to use
> >> + * @locality:   the preferred locality
> >> + *
> >> + * Return:
> >> + * * true      - Preferred locality set
> >> + * * false     - Invalid locality specified
> >> + */
> >> +bool tpm_chip_preferred_locality(struct tpm_chip *chip, int locality)
> >> +{
> >> +	if (locality < 0 || locality >=TPM_MAX_LOCALITY)
> >> +		return false;
> >> +
> >> +	mutex_lock(&chip->tpm_mutex);
> >> +	chip->pref_locality = locality;
> >> +	mutex_unlock(&chip->tpm_mutex);
> >> +	return true;
> >> +}
> >> +EXPORT_SYMBOL_GPL(tpm_chip_preferred_locality);
> >> +
> >>   /**
> >>    * tpm_try_get_ops() - Get a ref to the tpm_chip
> >>    * @chip: Chip to ref
> >> @@ -374,6 +395,7 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,
> >>   	}
> >>   
> >>   	chip->locality = -1;
> >> +	chip->pref_locality = 0;
> >>   	return chip;
> >>   
> >>   out:
> >> diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
> >> index 5da134f12c9a..35f14ccecf0e 100644
> >> --- a/drivers/char/tpm/tpm-interface.c
> >> +++ b/drivers/char/tpm/tpm-interface.c
> >> @@ -274,6 +274,21 @@ int tpm_is_tpm2(struct tpm_chip *chip)
> >>   }
> >>   EXPORT_SYMBOL_GPL(tpm_is_tpm2);
> >>   
> >> +/**
> >> + * tpm_preferred_locality() - set the TPM chip preferred locality to open
> >> + * @chip:	a TPM chip to use
> >> + * @locality:   the preferred locality
> >> + *
> >> + * Return:
> >> + * * true      - Preferred locality set
> >> + * * false     - Invalid locality specified
> >> + */
> >> +bool tpm_preferred_locality(struct tpm_chip *chip, int locality)
> >> +{
> >> +	return tpm_chip_preferred_locality(chip, locality);
> >> +}
> >> +EXPORT_SYMBOL_GPL(tpm_preferred_locality);
> > 
> >   What good does this extra wrapping do?
> > 
> >   tpm_set_default_locality() and default_locality would make so much more
> >   sense in any case.
>
> Are you mainly just talking about my naming choices here and in the 
> follow-on response? Can you clarify what you are requesting?

I'd prefer:

1. Name the variable as default_locality.
2. Create only a single exported function in tpm-chip.c:
   tpm_chip_set_default_locality().
3. Call this function in all call sites.

"tpm_preferred_locality" should be just removed, as tpm_chip_*
is exported anyway.

BR, Jarkko
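Following points 1 to 3 above, the single exported setter could look roughly like this sketch, using the tpm_chip_set_default_locality() name proposed here. Assumptions: `TPM_MAX_LOCALITY` is 5 (locality 0 plus the 1 to 4 that DRTM needs), `struct tpm_chip` is a pared-down stand-in, and the `chip->tpm_mutex` locking from the patch is left out so the example runs in userspace.

```c
#include <stdbool.h>

/* Assumption: localities 0 through 4. */
#define TPM_MAX_LOCALITY 5

/* Hypothetical, pared-down stand-in for struct tpm_chip. */
struct tpm_chip {
	int locality;		/* currently open locality, -1 if none */
	int default_locality;	/* locality requested when the chip starts */
};

/*
 * Validate and store the default locality. The kernel version would hold
 * chip->tpm_mutex around the store; locking is omitted here.
 */
bool tpm_chip_set_default_locality(struct tpm_chip *chip, int locality)
{
	if (locality < 0 || locality >= TPM_MAX_LOCALITY)
		return false;

	chip->default_locality = locality;
	return true;
}

/* What tpm_request_locality() would pass to ->request_locality(). */
static int locality_to_request(const struct tpm_chip *chip)
{
	return chip->default_locality;
}
```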
Ross Philipson June 4, 2024, 11 p.m. UTC | #26
On 6/4/24 3:36 PM, Jarkko Sakkinen wrote:
> On Tue Jun 4, 2024 at 11:31 PM EEST,  wrote:
>> On 6/4/24 11:21 AM, Jarkko Sakkinen wrote:
>>> On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
>>>> Introduce the Secure Launch Resource Table which forms the formal
>>>> interface between the pre and post launch code.
>>>>
>>>> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
>>>
>>> If this is uarch specific, I'd appreciate an Intel SDM reference here so
>>> that I can look it up and compare, at section granularity.
>>
>> This table is meant to not be architecture specific though it can
>> contain architecture specific sub-entities. E.g. there is a TXT specific
>> table and in the future there will be an AMD and ARM one (and hopefully
>> some others). I hope that addresses what you are pointing out or maybe I
>> don't fully understand what you mean here...
> 
> At least Intel SDM has a definition of any possible architecture
> specific data structure. It is handy to also have this available
> in inline comment for any possible such structure pointing out the
> section where it is defined.

The TXT specific structure is not defined in the SDM or the TXT dev 
guide. Part of it is driven by requirements in the TXT dev guide but 
that guide does not contain implementation details.

That said, if you would like links to relevant documents in the comments 
before arch specific structures, I can add them.

Ross

> 
> BR, Jarkko
Ross Philipson June 4, 2024, 11:04 p.m. UTC | #27
On 6/4/24 3:50 PM, Jarkko Sakkinen wrote:
> On Wed Jun 5, 2024 at 1:14 AM EEST,  wrote:
>> On 6/4/24 1:27 PM, Jarkko Sakkinen wrote:
>>> On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
>>>> Currently the locality is hard coded to 0, but for DRTM support, access
>>>> is needed to localities 1 through 4.
>>>>
>>>> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
>>>> ---
>>>>    drivers/char/tpm/tpm-chip.c      | 24 +++++++++++++++++++++++-
>>>>    drivers/char/tpm/tpm-interface.c | 15 +++++++++++++++
>>>>    drivers/char/tpm/tpm.h           |  1 +
>>>>    include/linux/tpm.h              |  4 ++++
>>>>    4 files changed, 43 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
>>>> index 854546000c92..73eac54d61fb 100644
>>>> --- a/drivers/char/tpm/tpm-chip.c
>>>> +++ b/drivers/char/tpm/tpm-chip.c
>>>> @@ -44,7 +44,7 @@ static int tpm_request_locality(struct tpm_chip *chip)
>>>>    	if (!chip->ops->request_locality)
>>>>    		return 0;
>>>>    
>>>> -	rc = chip->ops->request_locality(chip, 0);
>>>> +	rc = chip->ops->request_locality(chip, chip->pref_locality);
>>>>    	if (rc < 0)
>>>>    		return rc;
>>>>    
>>>> @@ -143,6 +143,27 @@ void tpm_chip_stop(struct tpm_chip *chip)
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(tpm_chip_stop);
>>>>    
>>>> +/**
>>>> + * tpm_chip_preferred_locality() - set the TPM chip preferred locality to open
>>>> + * @chip:	a TPM chip to use
>>>> + * @locality:   the preferred locality
>>>> + *
>>>> + * Return:
>>>> + * * true      - Preferred locality set
>>>> + * * false     - Invalid locality specified
>>>> + */
>>>> +bool tpm_chip_preferred_locality(struct tpm_chip *chip, int locality)
>>>> +{
>>>> +	if (locality < 0 || locality >=TPM_MAX_LOCALITY)
>>>> +		return false;
>>>> +
>>>> +	mutex_lock(&chip->tpm_mutex);
>>>> +	chip->pref_locality = locality;
>>>> +	mutex_unlock(&chip->tpm_mutex);
>>>> +	return true;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(tpm_chip_preferred_locality);
>>>> +
>>>>    /**
>>>>     * tpm_try_get_ops() - Get a ref to the tpm_chip
>>>>     * @chip: Chip to ref
>>>> @@ -374,6 +395,7 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,
>>>>    	}
>>>>    
>>>>    	chip->locality = -1;
>>>> +	chip->pref_locality = 0;
>>>>    	return chip;
>>>>    
>>>>    out:
>>>> diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
>>>> index 5da134f12c9a..35f14ccecf0e 100644
>>>> --- a/drivers/char/tpm/tpm-interface.c
>>>> +++ b/drivers/char/tpm/tpm-interface.c
>>>> @@ -274,6 +274,21 @@ int tpm_is_tpm2(struct tpm_chip *chip)
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(tpm_is_tpm2);
>>>>    
>>>> +/**
>>>> + * tpm_preferred_locality() - set the TPM chip preferred locality to open
>>>> + * @chip:	a TPM chip to use
>>>> + * @locality:   the preferred locality
>>>> + *
>>>> + * Return:
>>>> + * * true      - Preferred locality set
>>>> + * * false     - Invalid locality specified
>>>> + */
>>>> +bool tpm_preferred_locality(struct tpm_chip *chip, int locality)
>>>> +{
>>>> +	return tpm_chip_preferred_locality(chip, locality);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(tpm_preferred_locality);
>>>
>>>    What good does this extra wrapping do?
>>>
>>>    tpm_set_default_locality() and default_locality would make so much more
>>>    sense in any case.
>>
>> Are you mainly just talking about my naming choices here and in the
>> follow-on response? Can you clarify what you are requesting?
> 
> I'd prefer:
> 
> 1. Name the variable as default_locality.
> 2. Create only a single exported function in tpm-chip.c:
>     tpm_chip_set_default_locality().
> 3. Call this function in all call sites.
> 
> "tpm_preferred_locality" should be just removed, as tpm_chip_*
> is exported anyway.

Ok got it, thanks.

> 
> BR, Jarkko
>
Jarkko Sakkinen June 5, 2024, 12:22 a.m. UTC | #28
On Wed Jun 5, 2024 at 2:00 AM EEST,  wrote:
> On 6/4/24 3:36 PM, Jarkko Sakkinen wrote:
> > On Tue Jun 4, 2024 at 11:31 PM EEST,  wrote:
> >> On 6/4/24 11:21 AM, Jarkko Sakkinen wrote:
> >>> On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
> >>>> Introduce the Secure Launch Resource Table which forms the formal
> >>>> interface between the pre and post launch code.
> >>>>
> >>>> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
> >>>
> >>> If this is uarch specific, I'd appreciate an Intel SDM reference here so
> >>> that I can look it up and compare, at section granularity.
> >>
> >> This table is meant to not be architecture specific though it can
> >> contain architecture specific sub-entities. E.g. there is a TXT specific
> >> table and in the future there will be an AMD and ARM one (and hopefully
> >> some others). I hope that addresses what you are pointing out or maybe I
> >> don't fully understand what you mean here...
> > 
> > At least Intel SDM has a definition of any possible architecture
> > specific data structure. It is handy to also have this available
> > in inline comment for any possible such structure pointing out the
> > section where it is defined.
>
> The TXT specific structure is not defined in the SDM or the TXT dev 
> guide. Part of it is driven by requirements in the TXT dev guide but 
> that guide does not contain implementation details.
>
> That said, if you would like links to relevant documents in the comments 
> before arch specific structures, I can add them.

Vol. 2D 7-40, in the description of GETSEC[WAKEUP] there is in fact a
description of the MLE JOIN structure at least:

1. GDT limit (offset 0)
2. GDT base (offset 4)
3. Segment selector initializer (offset 8)
4. EIP (offset 12)

So is this only exercised in protected mode, and not in long mode? Just
wondering whether I should make a bug report on this for SDM or not.

This especially puzzles me, given that x86s won't have protected
mode in the first place...

BR, Jarkko
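The four JOIN fields listed above map onto a 16-byte layout. As a sketch (field names hypothetical; each field assumed to be 32 bits wide, as the 4-byte spacing of the offsets suggests), the offsets can be pinned with static assertions so a layout mistake fails at compile time. A 32-bit EIP field fits with the RLPs resuming in protected mode.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * MLE JOIN structure used by GETSEC[WAKEUP], following the field list
 * from Intel SDM Vol. 2D. Field names are hypothetical.
 */
struct mle_join {
	uint32_t gdt_limit;	/* offset 0 */
	uint32_t gdt_base;	/* offset 4 */
	uint32_t seg_sel;	/* offset 8: segment selector initializer */
	uint32_t entry_point;	/* offset 12: EIP at which the RLP starts */
};

_Static_assert(offsetof(struct mle_join, gdt_limit) == 0, "gdt_limit");
_Static_assert(offsetof(struct mle_join, gdt_base) == 4, "gdt_base");
_Static_assert(offsetof(struct mle_join, seg_sel) == 8, "seg_sel");
_Static_assert(offsetof(struct mle_join, entry_point) == 12, "entry_point");
```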
Ross Philipson June 5, 2024, 2:33 a.m. UTC | #29
On 6/4/24 5:22 PM, Jarkko Sakkinen wrote:
> On Wed Jun 5, 2024 at 2:00 AM EEST,  wrote:
>> On 6/4/24 3:36 PM, Jarkko Sakkinen wrote:
>>> On Tue Jun 4, 2024 at 11:31 PM EEST,  wrote:
>>>> On 6/4/24 11:21 AM, Jarkko Sakkinen wrote:
>>>>> On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
>>>>>> Introduce the Secure Launch Resource Table which forms the formal
>>>>>> interface between the pre and post launch code.
>>>>>>
>>>>>> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
>>>>>
>>>>> If this is uarch specific, I'd appreciate an Intel SDM reference here so
>>>>> that I can look it up and compare, at section granularity.
>>>>
>>>> This table is meant to not be architecture specific though it can
>>>> contain architecture specific sub-entities. E.g. there is a TXT specific
>>>> table and in the future there will be an AMD and ARM one (and hopefully
>>>> some others). I hope that addresses what you are pointing out or maybe I
>>>> don't fully understand what you mean here...
>>>
>>> At least Intel SDM has a definition of any possible architecture
>>> specific data structure. It is handy to also have this available
>>> in inline comment for any possible such structure pointing out the
>>> section where it is defined.
>>
>> The TXT specific structure is not defined in the SDM or the TXT dev
>> guide. Part of it is driven by requirements in the TXT dev guide but
>> that guide does not contain implementation details.
>>
>> That said, if you would like links to relevant documents in the comments
>> before arch specific structures, I can add them.
> 
> Vol. 2D 7-40, in the description of GETSEC[WAKEUP] there is in fact a
> description of the MLE JOIN structure at least:
> 
> 1. GDT limit (offset 0)
> 2. GDT base (offset 4)
> 3. Segment selector initializer (offset 8)
> 4. EIP (offset 12)
> 
> So is this only exercised in protected mode, and not in long mode? Just
> wondering whether I should make a bug report on this for SDM or not.

I believe you can issue the SENTER instruction in long mode, compat mode 
or protected mode. On the other side, though, you will pop out of the 
TXT initialization in protected mode. The SDM outlines what registers 
will hold what values and what is valid and not valid. The APs will also 
vector through the join structure mentioned above to the location 
specified in protected mode using the GDT information you provide.

> 
> This especially puzzles me, given that x86s won't have protected
> mode in the first place...

My guess is the simplified x86 architecture will not support TXT. It is 
not supported on a number of CPUs/chipsets as it stands today. Just a 
guess, but we know that only vPro systems support TXT today.

Thanks
Ross

> 
> BR, Jarkko
>
Jarkko Sakkinen June 5, 2024, 4:04 a.m. UTC | #30
On Wed Jun 5, 2024 at 5:33 AM EEST,  wrote:
> On 6/4/24 5:22 PM, Jarkko Sakkinen wrote:
> > On Wed Jun 5, 2024 at 2:00 AM EEST,  wrote:
> >> On 6/4/24 3:36 PM, Jarkko Sakkinen wrote:
> >>> On Tue Jun 4, 2024 at 11:31 PM EEST,  wrote:
> >>>> On 6/4/24 11:21 AM, Jarkko Sakkinen wrote:
> >>>>> On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
> >>>>>> Introduce the Secure Launch Resource Table which forms the formal
> >>>>>> interface between the pre and post launch code.
> >>>>>>
> >>>>>> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
> >>>>>
> >>>>> If this is uarch specific, I'd appreciate an Intel SDM reference here so
> >>>>> that I can look it up and compare, at section granularity.
> >>>>
> >>>> This table is meant to not be architecture specific though it can
> >>>> contain architecture specific sub-entities. E.g. there is a TXT specific
> >>>> table and in the future there will be an AMD and ARM one (and hopefully
> >>>> some others). I hope that addresses what you are pointing out or maybe I
> >>>> don't fully understand what you mean here...
> >>>
> >>> At least Intel SDM has a definition of any possible architecture
> >>> specific data structure. It is handy to also have this available
> >>> in inline comment for any possible such structure pointing out the
> >>> section where it is defined.
> >>
> >> The TXT specific structure is not defined in the SDM or the TXT dev
> >> guide. Part of it is driven by requirements in the TXT dev guide but
> >> that guide does not contain implementation details.
> >>
> >> That said, if you would like links to relevant documents in the comments
> >> before arch specific structures, I can add them.
> > 
> > Vol. 2D 7-40, in the description of GETSEC[WAKEUP] there is in fact a
> > description of the MLE JOIN structure at least:
> > 
> > 1. GDT limit (offset 0)
> > 2. GDT base (offset 4)
> > 3. Segment selector initializer (offset 8)
> > 4. EIP (offset 12)
> > 
> > So is this only exercised in protected mode, and not in long mode? Just
> > wondering whether I should make a bug report on this for SDM or not.
>
> I believe you can issue the SENTER instruction in long mode, compat mode 
> or protected mode. On the other side, though, you will pop out of the
> TXT initialization in protected mode. The SDM outlines which registers
> will hold which values and what is and is not valid. The APs will also
> vector through the join structure mentioned above to the location 
> specified in protected mode using the GDT information you provide.
>
> > 
> > This especially puzzles me, given that X86S won't have protected
> > mode in the first place...
>
> My guess is the simplified x86 architecture will not support TXT. It is 
> not supported on a number of CPUs/chipsets as it stands today. Just a 
> guess, but we know only vPro systems support TXT today.
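For reference, the four MLE JOIN fields listed above can be written down as a C struct with compile-time offset checks. This is only a sketch: the struct and field names are hypothetical (not taken from the SDM or from this patch set), and the only layout facts used are the offsets quoted from Vol. 2D. All four slots are 32 bits wide, which is consistent with the APs vectoring through the join structure in protected mode, as described above.

```c
#include <assert.h> /* static_assert (C11) */
#include <stddef.h> /* offsetof */
#include <stdint.h> /* uint32_t */

/*
 * Sketch of the MLE JOIN data structure read by GETSEC[WAKEUP], using the
 * offsets quoted above. Field names are hypothetical. Every slot is 32 bits.
 */
struct mle_join {
	uint32_t gdt_limit;    /* offset 0: GDT limit */
	uint32_t gdt_base;     /* offset 4: GDT base address (32-bit) */
	uint32_t seg_sel_init; /* offset 8: segment selector initializer */
	uint32_t entry_eip;    /* offset 12: EIP the AP vectors to */
};

/* Fail the build if the compiler lays the struct out differently. */
static_assert(offsetof(struct mle_join, gdt_limit) == 0, "GDT limit at 0");
static_assert(offsetof(struct mle_join, gdt_base) == 4, "GDT base at 4");
static_assert(offsetof(struct mle_join, seg_sel_init) == 8, "selector at 8");
static_assert(offsetof(struct mle_join, entry_eip) == 12, "EIP at 12");
static_assert(sizeof(struct mle_join) == 16, "four 32-bit slots");
```

Because there is no 64-bit slot for the entry point or GDT base, an inline comment pointing at the SDM section would also document why the AP path starts in protected mode.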

I'm wondering whether this could bootstrap itself inside TDX or SNP, and in
that way provide a path forward? AFAIK, TDX can be nested straight off the
bat, and SNP from 2nd generation EPYCs, which contain the feature.

I do buy the idea of attesting the host, not just the guests, even in
the "confidential world". That said, I'm not sure it makes sense
to add all this infrastructure for a technology with such a short
expiration date.

I would not want to say this at v9, and it is not really your fault
either, but for me this would make a lot more sense if the core of
Trenchboot was redesigned around these newer technologies with a
long-term future.

The idea itself is great!

BR, Jarkko
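To illustrate the design Ross describes above (an architecture-neutral table that can carry architecture-specific sub-entities, such as a TXT one), here is a minimal tag/size walker. Everything in it is hypothetical: the tag values, struct entry_hdr and find_entry() are assumptions made for illustration and are not the actual Secure Launch Resource Table layout from this series.

```c
#include <assert.h> /* assert */
#include <stddef.h> /* size_t, NULL */
#include <stdint.h> /* uint8_t, uint16_t */
#include <string.h> /* memcpy */

/* Hypothetical tags, NOT the values used by the real SLRT. */
enum { TAG_END = 0, TAG_GENERIC_INFO = 1, TAG_TXT_INFO = 2 };

/* Hypothetical entry header: each entry begins with a tag and a size. */
struct entry_hdr {
	uint16_t tag;
	uint16_t size; /* size of the whole entry, header included */
};

/*
 * Walk back-to-back entries in buf and return a pointer to the first one
 * whose tag matches, or NULL. Unknown tags are simply skipped, which lets
 * architecture-specific entries share one architecture-neutral table.
 */
static const uint8_t *find_entry(const uint8_t *buf, size_t len, uint16_t tag)
{
	size_t off = 0;

	assert(buf != NULL);
	while (off + sizeof(struct entry_hdr) <= len) {
		struct entry_hdr hdr;

		/* Copy the header out to avoid unaligned access. */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.tag == TAG_END)
			break;
		/* Stop on a malformed size instead of looping or overrunning. */
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			break;
		if (hdr.tag == tag)
			return buf + off;
		off += hdr.size;
	}
	return NULL;
}
```

Since a consumer skips any tag it does not recognize, AMD or Arm entries could be added later without changing the walker; that is the main reason this sketch uses a tag/size format.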
Jarkko Sakkinen June 20, 2024, 12:18 a.m. UTC | #31
On Thu Jun 6, 2024 at 7:49 PM EEST,  wrote:
> > For each architecture, dig up similar facts showing that it:
> > 
> > 1. Is not dead.
> > 2. Will also be there in the future.
> > 
> > Make the existential relevance of each architecture clear, without too
> > much coloring, in text that is easy to check.
> > 
> > It is nearing 5k lines, so you should be really good with measured
> > facts too (not just launch) :-)
>
> ... but overall I get your meaning. We will spend time on this sort of 
> documentation for the v10 release.

Yeah, I mean we live in the universe of three-letter acronyms, so
it is better to summarize the existential part, especially
in a ~5 KSLOC patch set ;-)

BR, Jarkko
Ross Philipson June 20, 2024, 4:55 p.m. UTC | #32
On 6/19/24 5:18 PM, Jarkko Sakkinen wrote:
> On Thu Jun 6, 2024 at 7:49 PM EEST,  wrote:
>>> For each architecture, dig up similar facts showing that it:
>>>
>>> 1. Is not dead.
>>> 2. Will also be there in the future.
>>>
>>> Make the existential relevance of each architecture clear, without too
>>> much coloring, in text that is easy to check.
>>>
>>> It is nearing 5k lines, so you should be really good with measured
>>> facts too (not just launch) :-)
>>
>> ... but overall I get your meaning. We will spend time on this sort of
>> documentation for the v10 release.
> 
> Yeah, I mean we live in the universe of three-letter acronyms, so
> it is better to summarize the existential part, especially
> in a ~5 KSLOC patch set ;-)

Indeed, thanks.

Ross

> 
> BR, Jarkko
>
Daniel P. Smith Aug. 15, 2024, 6:52 p.m. UTC | #33
On 6/4/24 16:12, Jarkko Sakkinen wrote:
> On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
>> From: "Daniel P. Smith" <dpsmith@apertussolutions.com>
>>
>> Commit 933bfc5ad213 introduced the use of a locality counter to control when a
>> locality request is allowed to be sent to the TPM. In that commit, the counter
>> is decremented unconditionally, which creates the possibility of an integer
>> underflow of the counter.
>>
>> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
>> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
>> Reported-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
>> Fixes: 933bfc5ad213 ("tpm, tpm_tis: Implement usage counter for locality")
> 
> Not sure if we have a practical use for the Fixes tag here, but I'm open
> to argument, of course. I.e., I'm not sure what the practical scenario to
> worry about would be if Trenchboot did not exist.

We can drop the Fixes line.

>> ---
>>   drivers/char/tpm/tpm_tis_core.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>> index 176cd8dbf1db..7c1761bd6000 100644
>> --- a/drivers/char/tpm/tpm_tis_core.c
>> +++ b/drivers/char/tpm/tpm_tis_core.c
>> @@ -180,7 +180,8 @@ static int tpm_tis_relinquish_locality(struct tpm_chip *chip, int l)
>>   	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>>   
>>   	mutex_lock(&priv->locality_count_mutex);
>> -	priv->locality_count--;
>> +	if (priv->locality_count > 0)
>> +		priv->locality_count--;
> 
> I'd signal the situation with pr_info() in the else branch.

Ack.

>>   	if (priv->locality_count == 0)
>>   		__tpm_tis_relinquish_locality(priv, l);
>>   	mutex_unlock(&priv->locality_count_mutex);
> 
> BR, Jarkko
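A user-space model of the patched relinquish path, including the pr_info() suggestion above, may help when reasoning about the underflow. This is a sketch under assumptions: struct model_priv and the model_*() helpers are hypothetical stand-ins for struct tpm_tis_data and the driver functions, the mutex is omitted because the model is single-threaded, and fprintf() replaces pr_info().

```c
#include <assert.h> /* assert */
#include <stdio.h>  /* fprintf, stderr */

/* Hypothetical model of the driver state touched by the patch. */
struct model_priv {
	int locality_count; /* outstanding requests for the locality */
	int released;       /* times the hardware locality was released */
};

/* Stand-in for __tpm_tis_relinquish_locality(): just record the release. */
static void model_hw_relinquish(struct model_priv *priv)
{
	priv->released++;
}

/*
 * Mirrors the patched tpm_tis_relinquish_locality() logic: decrement only
 * when the count is positive, and report an unbalanced call in the else
 * branch (pr_info() in the kernel, fprintf() here).
 */
static void model_relinquish(struct model_priv *priv)
{
	if (priv->locality_count > 0)
		priv->locality_count--;
	else
		fprintf(stderr, "relinquish without a matching request\n");

	if (priv->locality_count == 0)
		model_hw_relinquish(priv);

	assert(priv->locality_count >= 0); /* the underflow the patch prevents */
}
```

Note that, exactly as in the diff, an unbalanced call still reaches the hardware release, because the count is 0 once the guarded branch has run.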
Daniel P. Smith Aug. 15, 2024, 7:24 p.m. UTC | #34
On 6/4/24 16:14, Jarkko Sakkinen wrote:
> On Fri May 31, 2024 at 4:03 AM EEST, Ross Philipson wrote:
>> From: "Daniel P. Smith" <dpsmith@apertussolutions.com>
>>
>> When tis core initializes, it assumes all localities are closed. There
> 
> s/tis_core/tpm_tis_core/

Ack.

>> are cases where this assumption does not hold. This commit addresses this by
>> ensuring all localities are closed before initialization begins.
>>
>> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
>> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
>> ---
>>   drivers/char/tpm/tpm_tis_core.c | 11 ++++++++++-
>>   include/linux/tpm.h             |  6 ++++++
>>   2 files changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>> index 7c1761bd6000..9fb53bb3e73f 100644
>> --- a/drivers/char/tpm/tpm_tis_core.c
>> +++ b/drivers/char/tpm/tpm_tis_core.c
>> @@ -1104,7 +1104,7 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>>   	u32 intmask;
>>   	u32 clkrun_val;
>>   	u8 rid;
>> -	int rc, probe;
>> +	int rc, probe, i;
>>   	struct tpm_chip *chip;
>>   
>>   	chip = tpmm_chip_alloc(dev, &tpm_tis);
>> @@ -1166,6 +1166,15 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>>   		goto out_err;
>>   	}
>>   
>> +	/*
>> +	 * There are environments, like Intel TXT, that may leave a TPM
> 
> What else is there at this point besides Intel TXT, reflecting the
> state of mainline?

Leaving the TPM in Locality 2 is a requirement of the TCG D-RTM 
specification. This will be the situation for AMD and Arm as well. The 
comment can be updated to reference the TCG spec instead of a specific
implementation.

>> +	 * locality open. Close all localities to start from a known state.
>> +	 */
>> +	for (i = 0; i <= TPM_MAX_LOCALITY; i++) {
>> +		if (check_locality(chip, i))
>> +			tpm_tis_relinquish_locality(chip, i);
>> +	}
> 
> To be strict, this should be enabled only for x86 platforms.
> 
> I.e., it should be flagged.

As mentioned above, this will also affect Arm.

>> +
>>   	/* Take control of the TPM's interrupt hardware and shut it off */
>>   	rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
>>   	if (rc < 0)
>> diff --git a/include/linux/tpm.h b/include/linux/tpm.h
>> index c17e4efbb2e5..363f7078c3a9 100644
>> --- a/include/linux/tpm.h
>> +++ b/include/linux/tpm.h
>> @@ -147,6 +147,12 @@ struct tpm_chip_seqops {
>>    */
>>   #define TPM2_MAX_CONTEXT_SIZE 4096
>>   
>> +/*
>> + * The maximum locality (0 - 4) for a TPM, as defined in section 3.2 of the
>> + * Client Platform Profile Specification.
>> + */
>> +#define TPM_MAX_LOCALITY		4
>> +
>>   struct tpm_chip {
>>   	struct device dev;
>>   	struct device devs;
> 
> 
> BR, Jarkko

v/r,
dps
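The start-up loop added to tpm_tis_core_init() can be modeled the same way, e.g. for a TPM that a D-RTM launch left in Locality 2, as described above. In this sketch, struct model_tpm, its bitmask and the model_*() helpers are hypothetical stand-ins for check_locality() and tpm_tis_relinquish_locality(); only TPM_MAX_LOCALITY (4) and the shape of the loop come from the patch.

```c
#include <assert.h>  /* assert */
#include <stdbool.h> /* bool */

/* Localities 0-4, matching TPM_MAX_LOCALITY in the patch. */
#define TPM_MAX_LOCALITY 4
/* Bits 0..TPM_MAX_LOCALITY of the model's mask. */
#define LOCALITY_MASK ((1u << (TPM_MAX_LOCALITY + 1)) - 1)

/* Hypothetical model of the TPM: bit i set means locality i is active. */
struct model_tpm {
	unsigned int active_mask;
};

/* Stand-in for check_locality(): is locality l currently active? */
static bool model_check_locality(const struct model_tpm *tpm, int l)
{
	return (tpm->active_mask & (1u << l)) != 0;
}

/* Stand-in for tpm_tis_relinquish_locality(): release locality l. */
static void model_relinquish_locality(struct model_tpm *tpm, int l)
{
	tpm->active_mask &= ~(1u << l);
}

/*
 * Same shape as the loop added to tpm_tis_core_init(): release any
 * locality left open by an earlier environment so the driver starts
 * from a known state.
 */
static void model_close_all_localities(struct model_tpm *tpm)
{
	for (int i = 0; i <= TPM_MAX_LOCALITY; i++) {
		if (model_check_locality(tpm, i))
			model_relinquish_locality(tpm, i);
	}
	assert((tpm->active_mask & LOCALITY_MASK) == 0);
}
```

The model's relinquish only clears a bit, so it deliberately leaves out the locality_count bookkeeping that the real tpm_tis_relinquish_locality() performs.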