[v5,00/16] Enable Linear Address Space Separation support

Message ID

20241028160917.1380714-1-alexander.shishkin@linux.intel.com

Headers

From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
To: Andy Lutomirski <luto@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org,
	"H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ard Biesheuvel <ardb@kernel.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Josh Poimboeuf <jpoimboe@kernel.org>,
	Xiongwei Song <xiongwei.song@windriver.com>,
	Xin Li <xin3.li@intel.com>,
	"Mike Rapoport (IBM)" <rppt@kernel.org>,
	Brijesh Singh <brijesh.singh@amd.com>,
	Michael Roth <michael.roth@amd.com>,
	Tony Luck <tony.luck@intel.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Alexey Kardashevskiy <aik@amd.com>
Cc: Jonathan Corbet <corbet@lwn.net>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Sohil Mehta <sohil.mehta@intel.com>,
	Ingo Molnar <mingo@kernel.org>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Daniel Sneddon <daniel.sneddon@linux.intel.com>,
	Kai Huang <kai.huang@intel.com>,
	Sandipan Das <sandipan.das@amd.com>,
	Breno Leitao <leitao@debian.org>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Hou Tao <houtao1@huawei.com>,
	Juergen Gross <jgross@suse.com>,
	Vegard Nossum <vegard.nossum@oracle.com>,
	Kees Cook <kees@kernel.org>,
	Eric Biggers <ebiggers@google.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	"Masami Hiramatsu (Google)" <mhiramat@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Yuntao Wang <ytcoode@gmail.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Tejun Heo <tj@kernel.org>,
	Changbin Du <changbin.du@huawei.com>,
	Huang Shijie <shijie@os.amperecomputing.com>,
	Geert Uytterhoeven <geert+renesas@glider.be>,
	Namhyung Kim <namhyung@kernel.org>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-efi@vger.kernel.org
Subject: [PATCH v5 00/16] Enable Linear Address Space Separation support
Date: Mon, 28 Oct 2024 18:07:48 +0200
Message-ID: <20241028160917.1380714-1-alexander.shishkin@linux.intel.com>
Precedence: bulk
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Message

Alexander Shishkin Oct. 28, 2024, 4:07 p.m. UTC

Changes from v4[8]:
- Added PeterZ's Originally-by and SoB to 2/16
- Added lass_clac()/lass_stac() to differentiate from SMAP necessitated
  clac()/stac() and to be NOPs on CPUs that don't support LASS
- Moved the LASS enabling patch to the end to avoid rendering machines
  unbootable between it and the patch that disables LASS around EFI
  initialization
- Reverted Pawan's LAM disabling commit

Changes from v3[6]:
- Made LAM dependent on LASS
- Moved EFI runtime initialization to x86 side of things
- Suspended LASS validation around EFI set_virtual_address_map call
- Added a message for the case of kernel side LASS violation
- Moved inline memset/memcpy versions to the common string.h

Changes from v2[5]:
- Added myself to the SoB chain

Changes from v1[1]:
- Emulate vsyscall violations in execute mode in the #GP fault handler
- Use inline memcpy and memset while patching alternatives
- Remove CONFIG_X86_LASS
- Make LASS depend on SMAP
- Dropped the minimal KVM enabling patch

Linear Address Space Separation (LASS) is a security feature that intends to
prevent malicious virtual address space accesses across user/kernel mode.

Such mode based access protection already exists today with paging and features
such as SMEP and SMAP. However, to enforce these protections, the processor
must traverse the paging structures in memory.  Malicious software can use
timing information resulting from this traversal to determine details about the
paging structures, and these details may also be used to determine the layout
of the kernel memory.

The LASS mechanism provides the same mode-based protections as paging but
without traversing the paging structures. Because the protections enforced by
LASS are applied before paging, software will not be able to derive
paging-based timing information from the various caching structures such as the
TLBs, mid-level caches, page walker, data caches, etc. LASS can avoid probing
using double page faults, TLB flush and reload, and SW prefetch instructions.
See [2], [3] and [4] for some research on the related attack vectors.

In addition, LASS prevents an attack vector described in a Spectre LAM (SLAM)
whitepaper [7].

LASS enforcement relies on the typical kernel implementation to divide the
64-bit virtual address space into two halves:
  Addr[63]=0 -> User address space
  Addr[63]=1 -> Kernel address space
Any data access or code execution across address spaces typically results in a
#GP fault.
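
As a rough user-space illustration of the split described above (not code from
this series), the address half can be read straight off bit 63. The helper
names below are hypothetical, and the model deliberately ignores the SMAP-based
suspension discussed further down.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 63 selects the half: 0 = user address space, 1 = kernel address space. */
static bool addr_in_kernel_half(uint64_t addr)
{
	return (addr >> 63) != 0;
}

/*
 * Hypothetical model of the basic LASS rule: an access is a violation
 * when the privilege mode and the address half do not match, i.e. user
 * mode touching the kernel half or kernel mode touching the user half.
 * The EFLAGS.AC suspension is intentionally left out of this sketch.
 */
static bool lass_basic_violation(uint64_t addr, bool user_mode)
{
	return user_mode == addr_in_kernel_half(addr);
}
```

For example, lass_basic_violation(0xffffffffff600000ULL, true) reports a
violation because user mode is reaching into the kernel half.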

Kernel accesses usually only happen to the kernel address space. However, there
are valid reasons for the kernel to access memory in the user half. For these
cases (such as text poking and EFI runtime accesses), the kernel can temporarily
suspend the enforcement of LASS by toggling SMAP (Supervisor Mode Access
Prevention) using the stac()/clac() instructions and, in one instance, by
disabling LASS outright for an EFI runtime call.
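
The suspension window can be pictured with a small user-space model. This is
only a sketch under stated assumptions: struct cpu_model, model_stac(),
model_clac() and kernel_data_access_faults() are made-up names, the model
covers kernel-mode data accesses only, and it is not how the series implements
lass_stac()/lass_clac().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU state, reduced to the two bits this sketch needs. */
struct cpu_model {
	bool lass_enabled; /* stands in for the LASS bit in CR4 */
	bool ac_flag;      /* stands in for EFLAGS.AC (STAC sets, CLAC clears) */
};

static void model_stac(struct cpu_model *cpu) { cpu->ac_flag = true; }
static void model_clac(struct cpu_model *cpu) { cpu->ac_flag = false; }

/*
 * Returns true if a kernel-mode data access to addr would fault in this
 * model: LASS is on, the address is in the user half (bit 63 clear), and
 * enforcement has not been suspended via the AC flag.
 */
static bool kernel_data_access_faults(const struct cpu_model *cpu, uint64_t addr)
{
	if (!cpu->lass_enabled)
		return false;
	if (addr >> 63)
		return false; /* kernel half: always allowed for the kernel */
	return !cpu->ac_flag;
}
```

In this model, an access to a user-half address faults, succeeds between
model_stac() and model_clac(), and faults again afterwards.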

User space cannot access any kernel address while LASS is enabled.
Unfortunately, legacy vsyscall functions are located in the address range
0xffffffffff600000 - 0xffffffffff601000 and emulated in kernel.  To avoid
breaking user applications when LASS is enabled, extend the vsyscall emulation
in execute (XONLY) mode to the #GP fault handler.
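
A minimal sketch of the decision the #GP path has to make for a user-mode
fault, assuming the upper bound of the range quoted above is exclusive (one
4 KiB page). The enum and function names are illustrative and do not come from
the patches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Legacy vsyscall page, as quoted in the text; END is assumed exclusive. */
#define VSYSCALL_START 0xffffffffff600000ULL
#define VSYSCALL_END   0xffffffffff601000ULL

static bool ip_in_vsyscall_page(uint64_t ip)
{
	return ip >= VSYSCALL_START && ip < VSYSCALL_END;
}

/* Hypothetical outcome of a user-mode #GP in this sketch. */
enum gp_action { GP_EMULATE_VSYSCALL, GP_DELIVER_SIGNAL };

/*
 * With LASS enabled, a user-mode jump into the vsyscall page raises #GP
 * rather than #PF, so in XONLY mode the handler has to recognise the
 * faulting instruction pointer and emulate the call.
 */
static enum gp_action user_gp_action(uint64_t fault_ip, bool xonly_mode)
{
	if (xonly_mode && ip_in_vsyscall_page(fault_ip))
		return GP_EMULATE_VSYSCALL;
	return GP_DELIVER_SIGNAL;
}
```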

In contrast, the vsyscall EMULATE mode is deprecated and not expected to be
used by anyone.  Supporting EMULATE mode with LASS would need complex
instruction decoding in the #GP fault handler and is probably not worth the
hassle. Disable LASS in the rare case where someone absolutely needs
vsyscall=emulate and enables it via the command line.

[1] https://lore.kernel.org/lkml/20230110055204.3227669-1-yian.chen@intel.com/
[2] “Practical Timing Side Channel Attacks against Kernel Space ASLR”,
https://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[3] “Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR”, http://doi.acm.org/10.1145/2976749.2978356
[4] “Harmful prefetch on Intel”, https://ioactive.com/harmful-prefetch-on-intel/ (H/T Anders)
[5] https://lore.kernel.org/all/20230530114247.21821-1-alexander.shishkin@linux.intel.com/
[6] https://lore.kernel.org/all/20230609183632.48706-1-alexander.shishkin@linux.intel.com/
[7] https://download.vusec.net/papers/slam_sp24.pdf
[8] https://lore.kernel.org/all/20240710160655.3402786-1-alexander.shishkin@linux.intel.com/

Alexander Shishkin (7):
  init/main.c: Move EFI runtime service initialization to x86/cpu
  x86/cpu: Defer CR pinning setup until after EFI initialization
  efi: Disable LASS around set_virtual_address_map call
  x86/vsyscall: Document the fact that vsyscall=emulate disables LASS
  x86/traps: Communicate a LASS violation in #GP message
  x86/cpu: Make LAM depend on LASS
  Revert "x86/lam: Disable ADDRESS_MASKING in most cases"

Peter Zijlstra (1):
  x86/asm: Introduce inline memcpy and memset

Sohil Mehta (7):
  x86/cpu: Enumerate the LASS feature bits
  x86/alternatives: Disable LASS when patching kernel alternatives
  x86/vsyscall: Reorganize the #PF emulation code
  x86/traps: Consolidate user fixups in exc_general_protection()
  x86/vsyscall: Add vsyscall emulation for #GP
  x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE
  x86/cpu: Enable LASS during CPU initialization

Yian Chen (1):
  x86/cpu: Set LASS CR4 bit as pinning sensitive

 .../admin-guide/kernel-parameters.txt         |  4 +-
 arch/x86/Kconfig                              |  1 -
 arch/x86/entry/vsyscall/vsyscall_64.c         | 61 +++++++++++++------
 arch/x86/include/asm/cpufeatures.h            |  1 +
 arch/x86/include/asm/disabled-features.h      |  4 +-
 arch/x86/include/asm/smap.h                   | 18 ++++++
 arch/x86/include/asm/string.h                 | 26 ++++++++
 arch/x86/include/asm/vsyscall.h               | 14 +++--
 arch/x86/include/uapi/asm/processor-flags.h   |  2 +
 arch/x86/kernel/alternative.c                 | 12 +++-
 arch/x86/kernel/cpu/common.c                  | 25 +++++++-
 arch/x86/kernel/cpu/cpuid-deps.c              |  2 +
 arch/x86/kernel/traps.c                       | 26 +++++---
 arch/x86/mm/fault.c                           |  2 +-
 arch/x86/platform/efi/efi.c                   | 13 ++++
 init/main.c                                   |  5 --
 tools/arch/x86/include/asm/cpufeatures.h      |  1 +
 17 files changed, 171 insertions(+), 46 deletions(-)

Comments

Matthew Wilcox Oct. 29, 2024, 5:14 p.m. UTC | #1

On Mon, Oct 28, 2024 at 06:07:48PM +0200, Alexander Shishkin wrote:
> Linear Address Space Separation (LASS) is a security feature that intends to
> prevent malicious virtual address space accesses across user/kernel mode.
> 
> Such mode based access protection already exists today with paging and features
> such as SMEP and SMAP. However, to enforce these protections, the processor
> must traverse the paging structures in memory.  Malicious software can use
> timing information resulting from this traversal to determine details about the
> paging structures, and these details may also be used to determine the layout
> of the kernel memory.
> 
> The LASS mechanism provides the same mode-based protections as paging but
> without traversing the paging structures. Because the protections enforced by
> LASS are applied before paging, software will not be able to derive
> paging-based timing information from the various caching structures such as the
> TLBs, mid-level caches, page walker, data caches, etc. LASS can avoid probing
> using double page faults, TLB flush and reload, and SW prefetch instructions.
> See [2], [3] and [4] for some research on the related attack vectors.
> 
> In addition, LASS prevents an attack vector described in a Spectre LAM (SLAM)
> whitepaper [7].
> 
> LASS enforcement relies on the typical kernel implementation to divide the
> 64-bit virtual address space into two halves:
>   Addr[63]=0 -> User address space
>   Addr[63]=1 -> Kernel address space
> Any data access or code execution across address spaces typically results in a
> #GP fault.
> 
> Kernel accesses usually only happen to the kernel address space. However, there
> are valid reasons for the kernel to access memory in the user half. For these
> cases (such as text poking and EFI runtime accesses), the kernel can temporarily
> suspend the enforcement of LASS by toggling SMAP (Supervisor Mode Access
> Prevention) using the stac()/clac() instructions and, in one instance, by
> disabling LASS outright for an EFI runtime call.
> 
> User space cannot access any kernel address while LASS is enabled.
> Unfortunately, legacy vsyscall functions are located in the address range
> 0xffffffffff600000 - 0xffffffffff601000 and emulated in kernel.  To avoid
> breaking user applications when LASS is enabled, extend the vsyscall emulation
> in execute (XONLY) mode to the #GP fault handler.
> 
> In contrast, the vsyscall EMULATE mode is deprecated and not expected to be
> used by anyone.  Supporting EMULATE mode with LASS would need complex
> instruction decoding in the #GP fault handler and is probably not worth the
> hassle. Disable LASS in the rare case where someone absolutely needs
> vsyscall=emulate and enables it via the command line.

I lack the wit to read & understand these patches to answer this
question, so I'll just ask it:

What happens when the kernel does a NULL pointer dereference (due to a
bug)?  It's not an attempt to access userspace, but it should result in
a good bug report.  Normally this would be outside a STAC/CLAC region,
but I suppose technically it could be within one.