[RFC,v0,0/6] Minimal Linux/arm64 VM firmware (written in Rust)

Message ID 20220314082644.3436071-1-ardb@kernel.org

Ard Biesheuvel March 14, 2022, 8:26 a.m. UTC
From: Ard Biesheuvel <ardb@google.com>

One of the tedious bits of booting a virtual machine under KVM on ARM is
dealing with guest memory coherency. This is due to the fact that running
with the MMU off is problematic, as manipulations of memory by the guest
are incoherent with the host's cached view of memory. For this reason,
KVM needs to keep track of the MMU state of the guest, and perform cache
maintenance to the point of coherency (PoC) on all memory that is
exposed to the guest (and populated at stage 2) at that point.

Existing VM firmware is often based on bare metal firmware, which sets
up page tables with the MMU and caches off, and does the necessary (as
well as unnecessary *) cache maintenance to ensure that all
manipulations of memory performed with the MMU off are coherent, and not
covered by stale cachelines (either clean or dirty) that either obstruct
the view of the real memory contents, or are at risk of corrupting them
if such dirty cachelines are evicted and written back inadvertently.

As firmware is usually intimately tied to the memory topology of the
platform, we can do much better than this. Instead of setting up the
initial page tables at runtime, we can bake them into the boot image,
provided that it runs at an a priori known address. This means we can
enable the MMU and caches straight out of reset, and defer all memory
accesses that go via the D side until afterwards.
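
To illustrate what baking the page tables into the image could look
like, here is a minimal sketch that builds an identity-mapped level-1
table at compile time, so that it ends up in the binary as plain data.
This is not the code from the series (which carries its initial table in
src/ttable.S). It assumes a 4 KiB granule, 1 GiB block descriptors, and
that attribute index 0 in MAIR_EL1 describes normal write-back memory.
All names are made up for the example.

```rust
// Hypothetical sketch, not the series' code. Assumes a 4 KiB granule and
// a level-1 table of 1 GiB block descriptors.

const DESC_BLOCK: u64 = 0b01;        // bits [1:0]: valid block descriptor
const ATTR_IDX_NORMAL: u64 = 0 << 2; // AttrIndx 0, assumed to select normal
                                     // write-back memory in MAIR_EL1
const SH_INNER: u64 = 0b11 << 8;     // inner shareable
const AF: u64 = 1 << 10;             // access flag already set
const PXN: u64 = 1 << 53;            // privileged execute-never
const UXN: u64 = 1 << 54;            // unprivileged execute-never

const GIB: u64 = 1 << 30;
const DRAM_BASE: u64 = 0x4000_0000;  // start of DRAM on QEMU's mach-virt

/// A translation table must be aligned to its own size.
#[repr(C, align(4096))]
pub struct PageTable(pub [u64; 512]);

const fn build_idmap() -> [u64; 512] {
    let mut table = [0u64; 512];
    // Identity map: the index is the physical address divided by 1 GiB.
    // The first GiB of DRAM is mapped read-write and non-executable.
    table[(DRAM_BASE / GIB) as usize] =
        DRAM_BASE | DESC_BLOCK | ATTR_IDX_NORMAL | SH_INNER | AF | PXN | UXN;
    table
}

/// Evaluated by the compiler, so no stores are needed at boot to create it.
pub static IDMAP: PageTable = PageTable(build_idmap());
```

A real image would also need executable read-only mappings for its own
code and device mappings for the UART and fwcfg, which are left out
here. The point is only that the table contents are known at build time
once the load address is fixed.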

This is the approach taken by this series: it implements a minimal
firmware/bootloader for booting a Linux arm64 kernel on QEMU's
mach-virt, which does minimal code execution and no memory access (other
than instruction fetching) with the MMU disabled. Combined with the
series that I sent out recently [0] for Linux, which implements
something similar for the kernel itself, virtually all cache maintenance
to the PoC can be dropped from the boot flow (with the exception of the
.idmap page in the kernel itself). Given that no stores to memory occur
at all with the MMU off, KVM should be able to detect that the PoC
maintenance is no longer necessary when the MMU is turned on.

This is not only a simplification in itself, it also means that minimal
code execution occurs while restricted memory permissions are being
honoured: the firmware boots with WXN protections enabled, and the Rust
code itself as well as the text section of the loaded kernel Image need
to be mapped with read-only permissions in order to execute them.
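
The effect of WXN on a mapping can be summarised in a few lines. The
sketch below only models the rule (writable implies execute-never) and
is not taken from the series. The Perms type and the function name are
made up for illustration.

```rust
/// Hypothetical model of the permissions requested for a mapping.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Perms {
    pub write: bool,
    pub exec: bool,
}

/// With SCTLR_EL1.WXN set, a writable mapping is treated as
/// execute-never, regardless of its execute permission bits.
pub const fn can_execute_with_wxn(p: Perms) -> bool {
    p.exec && !p.write
}
```

So a region such as the kernel's text section has to be remapped
without write permission before control can be transferred to it.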

This prototype is presented as v0, as it cuts some corners, while the
intent is to make this an implementation of EFI that provides all that
Linux needs to boot. Most notably,

- only ~900 MiB of DRAM is supported, due to the fact that the page
  table code I nicked greedily maps down to pages, and the heap is only
  around 2 MiB, so we run out of memory if we try to map more.

- it boots via the kernel's 'bare metal' entrypoint as EFI features are
  entirely missing for the moment.

- only uncompressed kernels are supported
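
The DRAM limit in the first item follows from simple arithmetic. The
sketch below assumes a 4 KiB granule (not stated above), where each
level-3 table holds 512 entries and therefore covers 2 MiB while
occupying one 4 KiB page. The function name is made up for the example.

```rust
const PAGE_SIZE: u64 = 4096;        // assumed 4 KiB granule
const ENTRIES_PER_TABLE: u64 = 512; // 8-byte descriptors per 4 KiB table
pub const MIB: u64 = 1 << 20;

/// Bytes of level-3 tables needed to map `bytes` of memory down to pages.
/// Higher-level tables are ignored; they add only a few more pages.
pub const fn l3_table_bytes(bytes: u64) -> u64 {
    let span = PAGE_SIZE * ENTRIES_PER_TABLE; // 2 MiB covered per table
    let tables = (bytes + span - 1) / span;   // round up
    tables * PAGE_SIZE
}
```

Under these assumptions, l3_table_bytes(900 * MIB) is 450 tables or
1800 KiB, which fits in a heap of around 2 MiB, whereas mapping a full
1 GiB needs 512 tables or exactly 2 MiB for the level-3 tables alone.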

How to build and run:

(first, build a kernel with [0] applied, so the image tolerates being
booted with MMU and caches enabled)

$ cargo build  # using a nightly Rust compiler

$ objcopy -O binary target/aarch64-unknown-linux-gnu/debug/efilite efilite.bin

$ qemu-system-aarch64 \
    -M virt,gic-version=host -cpu host -enable-kvm -smp 4 \
    -net none -nographic -m 900m -bios efilite.bin -kernel path/to/Image \
    -drive if=virtio,file=path/to/hda.xxx,format=xxx -append root=/dev/vda2

* U-Boot in particular carries a lot of set/way cache maintenance that
  was cargo-culted from the v7 days, and should never be needed in a VM

[0] https://lore.kernel.org/all/20220304175657.2744400-1-ardb@kernel.org/

Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Cc: David Brazdil <dbrazdil@google.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Kees Cook <keescook@chromium.org>

Ard Biesheuvel (6):
  Implement a bare metal Rust runtime on top of QEMU's mach-virt
  Add DTB processing
  Add paging code to manage the full ID map
  Discover QEMU fwcfg device and use it to load the kernel
  Remap code section of loaded kernel and boot it
  Temporarily pass the kaslr seed via register X1

 .cargo/config    |   5 +
 .gitignore       |   2 +
 Cargo.lock       |  87 ++++
 Cargo.toml       |  12 +
 efilite.lds      |  62 +++
 src/cmo.rs       |  37 ++
 src/console.rs   |  57 +++
 src/cstring.rs   |   9 +
 src/fwcfg.rs     |  85 ++++
 src/head.S       | 121 +++++
 src/main.rs      | 155 +++++-
 src/pagealloc.rs |  44 ++
 src/paging.rs    | 499 ++++++++++++++++++++
 src/pecoff.rs    |  23 +
 src/ttable.S     |  37 ++
 15 files changed, 1233 insertions(+), 2 deletions(-)
 create mode 100644 .cargo/config
 create mode 100644 Cargo.lock
 create mode 100644 efilite.lds
 create mode 100644 src/cmo.rs
 create mode 100644 src/console.rs
 create mode 100644 src/cstring.rs
 create mode 100644 src/fwcfg.rs
 create mode 100644 src/head.S
 create mode 100644 src/pagealloc.rs
 create mode 100644 src/paging.rs
 create mode 100644 src/pecoff.rs
 create mode 100644 src/ttable.S