From patchwork Fri Jan 10 18:40:28 2025
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 856335
Date: Fri, 10 Jan 2025 18:40:28 +0000
In-Reply-To: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
References: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
Message-ID: <20250110-asi-rfc-v2-v2-2-8419288bc805@google.com>
X-Mailer: b4 0.15-dev
Subject: [PATCH RFC v2 02/29] x86: Create CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
From: Brendan Jackman
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra, Richard Henderson, Matt Turner, Vineet Gupta, Russell King, Catalin Marinas, Will Deacon, Guo Ren, Brian Cain, Huacai Chen, WANG Xuerui, Geert Uytterhoeven, Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Jonas Bonn, Stefan Kristiansson, Stafford Horne, "James E.J. Bottomley", Helge Deller, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao, Madhavan Srinivasan, Paul Walmsley, Palmer Dabbelt, Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger, Sven Schnelle, Yoshinori Sato, Rich Felker, John Paul Adrian Glaubitz, "David S. Miller", Andreas Larsson, Richard Weinberger, Anton Ivanov, Johannes Berg, Chris Zankel, Max Filippov, Arnd Bergmann, Andrew Morton, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Uladzislau Rezki, Christoph Hellwig, Masami Hiramatsu, Mathieu Desnoyers, Mike Rapoport, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter, Dennis Zhou, Tejun Heo, Christoph Lameter, Sean Christopherson, Paolo Bonzini, Ard Biesheuvel, Josh Poimboeuf, Pawan Gupta
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, loongarch@lists.linux.dev, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-um@lists.infradead.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, kvm@vger.kernel.org, linux-efi@vger.kernel.org, Brendan Jackman, Junaid Shahid

Currently a nop config. Keeping as a separate commit for easy review of
the boring bits. Later commits will use and enable this new config.

This config is only added for non-UML x86_64 as other architectures do
not yet have pending implementations. It also has somewhat artificial
dependencies on !PARAVIRT and !KASAN which are explained in the Kconfig
file.
Co-developed-by: Junaid Shahid
Signed-off-by: Junaid Shahid
Signed-off-by: Brendan Jackman
---
 arch/alpha/include/asm/Kbuild      |  1 +
 arch/arc/include/asm/Kbuild        |  1 +
 arch/arm/include/asm/Kbuild        |  1 +
 arch/arm64/include/asm/Kbuild      |  1 +
 arch/csky/include/asm/Kbuild       |  1 +
 arch/hexagon/include/asm/Kbuild    |  1 +
 arch/loongarch/include/asm/Kbuild  |  3 +++
 arch/m68k/include/asm/Kbuild       |  1 +
 arch/microblaze/include/asm/Kbuild |  1 +
 arch/mips/include/asm/Kbuild       |  1 +
 arch/nios2/include/asm/Kbuild      |  1 +
 arch/openrisc/include/asm/Kbuild   |  1 +
 arch/parisc/include/asm/Kbuild     |  1 +
 arch/powerpc/include/asm/Kbuild    |  1 +
 arch/riscv/include/asm/Kbuild      |  1 +
 arch/s390/include/asm/Kbuild       |  1 +
 arch/sh/include/asm/Kbuild         |  1 +
 arch/sparc/include/asm/Kbuild      |  1 +
 arch/um/include/asm/Kbuild         |  2 +-
 arch/x86/Kconfig                   | 14 ++++++++++++++
 arch/xtensa/include/asm/Kbuild     |  1 +
 include/asm-generic/asi.h          |  5 +++++
 22 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index 396caece6d6d99c7a428f439322a0a18452e1a42..ca72ce3baca13a32913ac9e01a8f86ef42180b1c 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -5,3 +5,4 @@ generic-y += agp.h
 generic-y += asm-offsets.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
+generic-y += asi.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index 49285a3ce2398cc7442bc44172de76367dc33dda..68604480864bbcb58d896da6bdf71591006ab2f6 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -6,3 +6,4 @@ generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += parport.h
 generic-y += user.h
+generic-y += asi.h
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 03657ff8fbe3d202563184b8902aa181e7474a5e..1e2c3d8dbbd99bdf95dbc6b47c2c78092c68b808 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -6,3 +6,4 @@ generic-y += parport.h
 generated-y += mach-types.h
 generated-y += unistd-nr.h
+generic-y += asi.h
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 4e350df9a02dd8de387b912740af69035da93e34..15f8aaaa96b80b5657b789ecf3529b1f18d16d80 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -14,6 +14,7 @@ generic-y += qrwlock.h
 generic-y += qspinlock.h
 generic-y += parport.h
 generic-y += user.h
+generic-y += asi.h
 generated-y += cpucap-defs.h
 generated-y += sysreg-defs.h
diff --git a/arch/csky/include/asm/Kbuild b/arch/csky/include/asm/Kbuild
index 9a9bc65b57a9d73dadc9d597700d7229f8554ddf..4f497118fb172d1f2bf0f9e472479f24227f42f4 100644
--- a/arch/csky/include/asm/Kbuild
+++ b/arch/csky/include/asm/Kbuild
@@ -11,3 +11,4 @@ generic-y += qspinlock.h
 generic-y += parport.h
 generic-y += user.h
 generic-y += vmlinux.lds.h
+generic-y += asi.h
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index 8c1a78c8f5271ebd47f1baad7b85e87220d1bbe8..b26f186bc03c2e135f8d125a4805b95a41513655 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -5,3 +5,4 @@ generic-y += extable.h
 generic-y += iomap.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
+generic-y += asi.h
diff --git a/arch/loongarch/include/asm/Kbuild b/arch/loongarch/include/asm/Kbuild
index 5b5a6c90e6e20771b1074a6262230861cc51bcb4..dd3d0c6891369a9dfa35ccfb8b81c8697c2a3e90 100644
--- a/arch/loongarch/include/asm/Kbuild
+++ b/arch/loongarch/include/asm/Kbuild
@@ -11,3 +11,6 @@ generic-y += ioctl.h
 generic-y += mmzone.h
 generic-y += statfs.h
 generic-y += param.h
+generic-y += asi.h
+generic-y += posix_types.h
+generic-y += resource.h
diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild
index 0dbf9c5c6faeb30eeb38bea52ab7fade99bbd44a..faf0f135df4ab946ef115f3a2fc363f370fc7491 100644
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -4,3 +4,4 @@ generic-y += extable.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += spinlock.h
+generic-y += asi.h
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index a055f5dbe00a31616592c3a848b49bbf9ead5d17..012e4bf83c13497dc296b66cd5e0fd519274306b 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -8,3 +8,4 @@ generic-y += parport.h
 generic-y += syscalls.h
 generic-y += tlb.h
 generic-y += user.h
+generic-y += asi.h
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 7ba67a0d6c97b2879fb710aca05ae1e2d47c8ce2..3191699298d80735920481eecc64dd2d1dbd2e54 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -13,3 +13,4 @@ generic-y += parport.h
 generic-y += qrwlock.h
 generic-y += qspinlock.h
 generic-y += user.h
+generic-y += asi.h
diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild
index 0d09829ed14454f2f15a32bf713fa1eb213e85ea..03a5ec74e28b3679a5ef7271606af3c07bb7a198 100644
--- a/arch/nios2/include/asm/Kbuild
+++ b/arch/nios2/include/asm/Kbuild
@@ -7,3 +7,4 @@ generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += spinlock.h
 generic-y += user.h
+generic-y += asi.h
diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index cef49d60d74c0f46f01cf46cc35e1e52404185f3..6a81a58bf59e20cafa563c422df4dfa6f9f791ec 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -9,3 +9,4 @@ generic-y += spinlock.h
 generic-y += qrwlock_types.h
 generic-y += qrwlock.h
 generic-y += user.h
+generic-y += asi.h
diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index 4fb596d94c8932dd1e12a765a21af5b5099fbafd..3cbb4eb14712c7bd6c248dd26ab91cc41da01825 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -5,3 +5,4 @@ generic-y += agp.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += user.h
+generic-y += asi.h
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index e5fdc336c9b22527f824ed30d06b5e8c0fa8a1ef..e86cc027f35564c7b301c283043bde0e5d2d3b6a 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -7,3 +7,4 @@ generic-y += kvm_types.h
 generic-y += mcs_spinlock.h
 generic-y += qrwlock.h
 generic-y += early_ioremap.h
+generic-y += asi.h
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index 1461af12da6e2bbbff6cf737a7babf33bd298cdd..82060ed50d9beb1ea72d3570ad236d1e08d9d8c6 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -13,3 +13,4 @@ generic-y += qrwlock.h
 generic-y += qrwlock_types.h
 generic-y += user.h
 generic-y += vmlinux.lds.h
+generic-y += asi.h
diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild
index 297bf7157968907d6e4c4ff8b65deeef02dbd630..e15c2a138392b57b186633738ddda913474aa8c4 100644
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -8,3 +8,4 @@ generic-y += asm-offsets.h
 generic-y += kvm_types.h
 generic-y += mcs_spinlock.h
 generic-y += mmzone.h
+generic-y += asi.h
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index fc44d9c88b41915a7021042eb8b462517cfdbd2c..ea19e4515828552f436d67f764607dd5d15cb19f 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -3,3 +3,4 @@ generated-y += syscall_table.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += parport.h
+generic-y += asi.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 43b0ae4c2c2112d4d4d3cb3c60e787b175172dea..cb9062c9be17fe276cc92d2ac99d8b165f6297bf 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -4,3 +4,4 @@ generated-y += syscall_table_64.h
 generic-y += agp.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
+generic-y += asi.h
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index 18f902da8e99769da857d34af43141ea97a0ca63..6054972f1babdaebae64040b05ab48893915cb04 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -27,4 +27,4 @@ generic-y += trace_clock.h
 generic-y += kprobes.h
 generic-y += mm_hooks.h
 generic-y += vga.h
-generic-y += video.h
+generic-y += asi.h
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7b9a7e8f39acc8e9aeb7d4213e87d71047865f5c..5a50582eb210e9d1309856a737d32b76fa1bfc85 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2519,6 +2519,20 @@ config MITIGATION_PAGE_TABLE_ISOLATION

          See Documentation/arch/x86/pti.rst for more details.

+config MITIGATION_ADDRESS_SPACE_ISOLATION
+       bool "Allow code to run with a reduced kernel address space"
+       default n
+       depends on X86_64 && !PARAVIRT && !UML
+       help
+         This feature provides the ability to run some kernel code
+         with a reduced kernel address space. This can be used to
+         mitigate some speculative execution attacks.
+
+         The !PARAVIRT dependency is only because of lack of testing; in theory
+         the code is written to work under paravirtualization. In practice
+         there are likely to be unhandled cases, in particular concerning TLB
+         flushes.
+
 config MITIGATION_RETPOLINE
        bool "Avoid speculative indirect branches in kernel"
        select OBJTOOL if HAVE_OBJTOOL
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index fa07c686cbcc2153776a478ac4093846f01eddab..07cea6902f98053be244d026ed594fe7246755a6 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -8,3 +8,4 @@ generic-y += parport.h
 generic-y += qrwlock.h
 generic-y += qspinlock.h
 generic-y += user.h
+generic-y += asi.h
diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h
new file mode 100644
index 0000000000000000000000000000000000000000..c4d9a5ff860a96428422a15000c622aeecc2d664
--- /dev/null
+++ b/include/asm-generic/asi.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_GENERIC_ASI_H
+#define __ASM_GENERIC_ASI_H
+
+#endif

From patchwork Fri Jan 10 18:40:30 2025
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 856334
Date: Fri, 10 Jan 2025 18:40:30 +0000
In-Reply-To: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
References: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
Message-ID: <20250110-asi-rfc-v2-v2-4-8419288bc805@google.com>
X-Mailer: b4 0.15-dev
Subject: [PATCH RFC v2 04/29] mm: asi: Add infrastructure for boot-time enablement
From: Brendan Jackman
To: [same recipient list as patch 02/29]
Cc: [same mailing lists as patch 02/29], Brendan Jackman, Junaid Shahid, Yosry Ahmed

Add a boot-time parameter to control the newly added X86_FEATURE_ASI.
"asi=on" or "asi=off" can be used in the kernel command line to enable
or disable ASI at boot time. If not specified, ASI enablement depends
on CONFIG_ADDRESS_SPACE_ISOLATION_DEFAULT_ON, which is off by default.
asi_check_boottime_disable() is modeled after
pti_check_boottime_disable().

The boot parameter is currently ignored until ASI is fully functional.
Once we have a set of ASI features checked in that we have actually
tested, we will stop ignoring the flag. But for now let's just add the
infrastructure so we can implement the usage code.

Ignoring checkpatch.pl CONFIG_DESCRIPTION because the _DEFAULT_ON
Kconfig is trivial to explain.

Checkpatch-args: --ignore CONFIG_DESCRIPTION
Co-developed-by: Junaid Shahid
Signed-off-by: Junaid Shahid
Co-developed-by: Yosry Ahmed
Signed-off-by: Yosry Ahmed
Signed-off-by: Brendan Jackman
---
 arch/x86/Kconfig                         |  9 +++++
 arch/x86/include/asm/asi.h               | 19 ++++++++--
 arch/x86/include/asm/cpufeatures.h       |  1 +
 arch/x86/include/asm/disabled-features.h |  8 ++++-
 arch/x86/mm/asi.c                        | 61 +++++++++++++++++++++++++++-----
 arch/x86/mm/init.c                       |  4 ++-
 include/asm-generic/asi.h                |  4 +++
 7 files changed, 92 insertions(+), 14 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5a50582eb210e9d1309856a737d32b76fa1bfc85..1fcb52cb8cd5084ac3cef04af61b7d1653162bdb 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2533,6 +2533,15 @@ config MITIGATION_ADDRESS_SPACE_ISOLATION
          there are likely to be unhandled cases, in particular concerning TLB
          flushes.

+
+config ADDRESS_SPACE_ISOLATION_DEFAULT_ON
+       bool "Enable address space isolation by default"
+       default n
+       depends on MITIGATION_ADDRESS_SPACE_ISOLATION
+       help
+         If selected, ASI is enabled by default at boot if the asi=on or
+         asi=off are not specified.
+
 config MITIGATION_RETPOLINE
        bool "Avoid speculative indirect branches in kernel"
        select OBJTOOL if HAVE_OBJTOOL
diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
index 7cc635b6653a3970ba9dbfdc9c828a470e27bd44..b9671ef2dd3278adceed18507fd260e21954d574 100644
--- a/arch/x86/include/asm/asi.h
+++ b/arch/x86/include/asm/asi.h
@@ -8,6 +8,7 @@
 #include
 #include
+#include
 #include

 #ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
@@ -66,6 +67,8 @@
  * the N ASI classes.
  */

+#define static_asi_enabled() cpu_feature_enabled(X86_FEATURE_ASI)
+
 /*
  * ASI uses a per-CPU tainting model to track what mitigation actions are
  * required on domain transitions. Taints exist along two dimensions:
@@ -131,6 +134,8 @@ struct asi {

 DECLARE_PER_CPU_ALIGNED(struct asi *, curr_asi);

+void asi_check_boottime_disable(void);
+
 void asi_init_mm_state(struct mm_struct *mm);

 int asi_init_class(enum asi_class_id class_id, struct asi_taint_policy *taint_policy);
@@ -155,7 +160,9 @@ void asi_exit(void);
 /* The target is the domain we'll enter when returning to process context. */
 static __always_inline struct asi *asi_get_target(struct task_struct *p)
 {
-       return p->thread.asi_state.target;
+       return static_asi_enabled()
+               ? p->thread.asi_state.target
+               : NULL;
 }

 static __always_inline void asi_set_target(struct task_struct *p,
@@ -166,7 +173,9 @@ static __always_inline void asi_set_target(struct task_struct *p,

 static __always_inline struct asi *asi_get_current(void)
 {
-       return this_cpu_read(curr_asi);
+       return static_asi_enabled()
+               ? this_cpu_read(curr_asi)
+               : NULL;
 }

 /* Are we currently in a restricted address space? */
@@ -175,7 +184,11 @@ static __always_inline bool asi_is_restricted(void)
        return (bool)asi_get_current();
 }

-/* If we exit/have exited, can we stay that way until the next asi_enter? */
+/*
+ * If we exit/have exited, can we stay that way until the next asi_enter?
+ *
+ * When ASI is disabled, this returns true.
+ */
 static __always_inline bool asi_is_relaxed(void)
 {
        return !asi_get_target(current);
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 913fd3a7bac6506141de65f33b9ee61c615c7d7d..d6a808d10c3b8900d190ea01c66fc248863f05e2 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -474,6 +474,7 @@
 #define X86_FEATURE_CLEAR_BHB_HW       (21*32+ 3) /* BHI_DIS_S HW control enabled */
 #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* Clear branch history at vmexit using SW loop */
 #define X86_FEATURE_FAST_CPPC          (21*32 + 5) /* AMD Fast CPPC */
+#define X86_FEATURE_ASI                (21*32+6) /* Kernel Address Space Isolation */

 /*
  * BUG word(s)
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index c492bdc97b0595ec77f89dc9b0cefe5e3e64be41..c7964ed4fef8b9441e1c0453da587787d8008d9d 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -50,6 +50,12 @@
 # define DISABLE_PTI           (1 << (X86_FEATURE_PTI & 31))
 #endif

+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+# define DISABLE_ASI           0
+#else
+# define DISABLE_ASI           (1 << (X86_FEATURE_ASI & 31))
+#endif
+
 #ifdef CONFIG_MITIGATION_RETPOLINE
 # define DISABLE_RETPOLINE     0
 #else
@@ -154,7 +160,7 @@
 #define DISABLED_MASK17        0
 #define DISABLED_MASK18        (DISABLE_IBT)
 #define DISABLED_MASK19        (DISABLE_SEV_SNP)
-#define DISABLED_MASK20        0
+#define DISABLED_MASK20        (DISABLE_ASI)
 #define DISABLED_MASK21        0
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 22)
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index 105cd8b43eaf5c20acc80d4916b761559fb95d74..5baf563a078f5b3a6cd4b9f5e92baaf81b0774c4 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -4,6 +4,7 @@
 #include
 #include
+#include
 #include
 #include
 #include
@@ -29,6 +30,9 @@ static inline bool asi_class_id_valid(enum asi_class_id class_id)

 static inline bool asi_class_initialized(enum asi_class_id class_id)
 {
+       if (!boot_cpu_has(X86_FEATURE_ASI))
+               return 0;
+
        if (WARN_ON(!asi_class_id_valid(class_id)))
                return false;
@@ -51,6 +55,9 @@ EXPORT_SYMBOL_GPL(asi_init_class);

 void asi_uninit_class(enum asi_class_id class_id)
 {
+       if (!boot_cpu_has(X86_FEATURE_ASI))
+               return;
+
        if (!asi_class_initialized(class_id))
                return;
@@ -66,10 +73,36 @@ const char *asi_class_name(enum asi_class_id class_id)
        return asi_class_names[class_id];
 }

+void __init asi_check_boottime_disable(void)
+{
+       bool enabled = IS_ENABLED(CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION_DEFAULT_ON);
+       char arg[4];
+       int ret;
+
+       ret = cmdline_find_option(boot_command_line, "asi", arg, sizeof(arg));
+       if (ret == 3 && !strncmp(arg, "off", 3)) {
+               enabled = false;
+               pr_info("ASI disabled through kernel command line.\n");
+       } else if (ret == 2 && !strncmp(arg, "on", 2)) {
+               enabled = true;
+               pr_info("Ignoring asi=on param while ASI implementation is incomplete.\n");
+       } else {
+               pr_info("ASI %s by default.\n",
+                       enabled ? "enabled" : "disabled");
+       }
+
+       if (enabled)
+               pr_info("ASI enablement ignored due to incomplete implementation.\n");
+}
+
 static void __asi_destroy(struct asi *asi)
 {
-       lockdep_assert_held(&asi->mm->asi_init_lock);
+       WARN_ON_ONCE(asi->ref_count <= 0);
+       if (--(asi->ref_count) > 0)
+               return;
+
+       free_pages((ulong)asi->pgd, PGD_ALLOCATION_ORDER);
+       memset(asi, 0, sizeof(struct asi));
 }

 int asi_init(struct mm_struct *mm, enum asi_class_id class_id, struct asi **out_asi)
@@ -79,6 +112,9 @@ int asi_init(struct mm_struct *mm, enum asi_class_id class_id, struct asi **out_
        *out_asi = NULL;

+       if (!boot_cpu_has(X86_FEATURE_ASI))
+               return 0;
+
        if (WARN_ON(!asi_class_initialized(class_id)))
                return -EINVAL;
@@ -122,7 +158,7 @@ void asi_destroy(struct asi *asi)
 {
        struct mm_struct *mm;

-       if (!asi)
+       if (!boot_cpu_has(X86_FEATURE_ASI) || !asi)
                return;

        if (WARN_ON(!asi_class_initialized(asi->class_id)))
@@ -134,11 +170,7 @@ void asi_destroy(struct asi *asi)
         * to block concurrent asi_init calls.
         */
        mutex_lock(&mm->asi_init_lock);
-       WARN_ON_ONCE(asi->ref_count <= 0);
-       if (--(asi->ref_count) == 0) {
-               free_pages((ulong)asi->pgd, PGD_ALLOCATION_ORDER);
-               memset(asi, 0, sizeof(struct asi));
-       }
+       __asi_destroy(asi);
        mutex_unlock(&mm->asi_init_lock);
 }
 EXPORT_SYMBOL_GPL(asi_destroy);
@@ -255,6 +287,9 @@ static noinstr void __asi_enter(void)

 noinstr void asi_enter(struct asi *asi)
 {
+       if (!static_asi_enabled())
+               return;
+
        VM_WARN_ON_ONCE(!asi);

        /* Should not have an asi_enter() without a prior asi_relax(). */
@@ -269,8 +304,10 @@ EXPORT_SYMBOL_GPL(asi_enter);

 noinstr void asi_relax(void)
 {
-       barrier();
-       asi_set_target(current, NULL);
+       if (static_asi_enabled()) {
+               barrier();
+               asi_set_target(current, NULL);
+       }
 }
 EXPORT_SYMBOL_GPL(asi_relax);
@@ -279,6 +316,9 @@ noinstr void asi_exit(void)
        u64 unrestricted_cr3;
        struct asi *asi;

+       if (!static_asi_enabled())
+               return;
+
        preempt_disable_notrace();

        VM_BUG_ON(this_cpu_read(cpu_tlbstate.loaded_mm) ==
@@ -310,6 +350,9 @@ EXPORT_SYMBOL_GPL(asi_exit);

 void asi_init_mm_state(struct mm_struct *mm)
 {
+       if (!boot_cpu_has(X86_FEATURE_ASI))
+               return;
+
        memset(mm->asi, 0, sizeof(mm->asi));
        mutex_init(&mm->asi_init_lock);
 }
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index de4227ed5169ff84d0ce80b677caffc475198fa6..ded3a47f2a9c1f554824d4ad19f3b48bce271274 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include

 /*
  * We need to define the tracepoints somewhere, and tlb.c
@@ -251,7 +252,7 @@ static void __init probe_page_size_mask(void)
        __default_kernel_pte_mask = __supported_pte_mask;
        /* Except when with PTI where the kernel is mostly non-Global: */
        if (cpu_feature_enabled(X86_FEATURE_PTI) ||
-           IS_ENABLED(CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION))
+           cpu_feature_enabled(X86_FEATURE_ASI))
                __default_kernel_pte_mask &= ~_PAGE_GLOBAL;

        /* Enable 1 GB linear kernel mappings if available: */
@@ -754,6 +755,7 @@ void __init init_mem_mapping(void)
        unsigned long end;

        pti_check_boottime_disable();
+       asi_check_boottime_disable();

        probe_page_size_mask();
        setup_pcid();
diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h
index 6b84202837605fa57e4a910318c8353b3f816f06..eedc961ee916a9e1da631ca489ea4a7bc9e6089f 100644
--- a/include/asm-generic/asi.h
+++ b/include/asm-generic/asi.h
@@ -65,6 +65,10 @@ static inline pgd_t *asi_pgd(struct asi *asi) { return NULL; }

 static inline void asi_handle_switch_mm(void) { }

+#define static_asi_enabled() false
+
+static inline void asi_check_boottime_disable(void) { }
+
 #endif /* !CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */

 #endif /* !_ASSEMBLY_ */

From patchwork Fri Jan 10 18:40:32 2025
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 856333
header.from=google.com; spf=pass smtp.mailfrom=flex--jackmanb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=MwGFD9OK; arc=none smtp.client-ip=209.85.128.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jackmanb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="MwGFD9OK" Received: by mail-wm1-f74.google.com with SMTP id 5b1f17b1804b1-43646b453bcso12405035e9.3 for ; Fri, 10 Jan 2025 10:41:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1736534459; x=1737139259; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=kYL7XeCfcudeBLlUDEGoh0o6eIbGWBwJtwhvHOdo8c0=; b=MwGFD9OKwwBWPJmYN7PHKR2peoxxK5IY840imBFglsc81KeOezd/D6v+lEfvCa90ex 67ZLw+nWPdbx+5QvceOBeAhFPg5FbFe9oLO62doYkEiC+eC46xuCxlMxAVlUg55mCYiz E7uFxgfNlhs2zFawZcDe2zOUmj2p19oW3uL1oyvL6i3Ia+kuaEj/+D7VGp50GMGrG+Uq i2+MglutLWhMNIZ87EdT6P91s5kh46w1AmqL7rYybbk+FlE6r1rR+nUHgUBcao/CbhAI /D21HDdzNAiZDHYI1fMF5AEpmBBGJSWaMCZF2f+vgEB3SBG0+KuoKj8UPl/2QnV8O3An nyjw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736534459; x=1737139259; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=kYL7XeCfcudeBLlUDEGoh0o6eIbGWBwJtwhvHOdo8c0=; b=wvxs61OThoXXxcqXNSWbJOa6hk+QkWKboPYK4mS362bVWUL83QXis9T2JbSg14oUGL 4afGgacFjkvU0WKdfmhNIcuUP1WyNKgX2acG2nZZgwsEptgThapqMgYmsC6B9BrxTrqN O9XoN0wtZVm4CsBBPnz7pQESogqaQPia6Z7T6AjhpYvD7vvns//XQzn43x+Nq7iSavIg qoU2V/Q924cC6fktIgS5YUSwFV7gUNGxKyK00tD3joBsJDDhRMyGwrZDg48fjKgIOgxG fELwL57q5tg5XpqQN1l9V1AsAdUsB5kvlKseIEKP0DvQKqeS+j2+gbZdj359JZYOHSE/ 4Vaw== 
X-Forwarded-Encrypted: i=1; AJvYcCWSHc9JH/61MnJDSA2qAAgTffe2iAcWBp3Hg6zaI9aFKxXNJvIg6JJnQtn7zahoQXM8wIJrZtYU6uM=@vger.kernel.org X-Gm-Message-State: AOJu0YxLPM3+uR7Py/Wrww28qIH+Cw0G24luXc9ge6VUwtlFl7QIQFgZ VS0vzwIHO2mo9co4eOyC7hNMbgRJ6My5EzRZM/J0LAzU95+NaZm0iG4Jqm9+A5wJMUu4Dkj3KVL 5vEbX471j0Q== X-Google-Smtp-Source: AGHT+IH0DyTQGUBeXAkb8pkdNHIPyF1JMCOQ9V4x2A26uY/IXi2bUp3/F2mGP9HMTIoPxxyBiM7AkavMfpYEag== X-Received: from wmdn10.prod.google.com ([2002:a05:600c:294a:b0:436:d819:e4eb]) (user=jackmanb job=prod-delivery.src-stubby-dispatcher) by 2002:a05:600c:4f09:b0:434:fafe:edb with SMTP id 5b1f17b1804b1-436e26f00fdmr96576145e9.24.1736534459144; Fri, 10 Jan 2025 10:40:59 -0800 (PST) Date: Fri, 10 Jan 2025 18:40:32 +0000 In-Reply-To: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com> Precedence: bulk X-Mailing-List: linux-efi@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com> X-Mailer: b4 0.15-dev Message-ID: <20250110-asi-rfc-v2-v2-6-8419288bc805@google.com> Subject: [PATCH RFC v2 06/29] mm: asi: Use separate PCIDs for restricted address spaces From: Brendan Jackman To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Andy Lutomirski , Peter Zijlstra , Richard Henderson , Matt Turner , Vineet Gupta , Russell King , Catalin Marinas , Will Deacon , Guo Ren , Brian Cain , Huacai Chen , WANG Xuerui , Geert Uytterhoeven , Michal Simek , Thomas Bogendoerfer , Dinh Nguyen , Jonas Bonn , Stefan Kristiansson , Stafford Horne , "James E.J. Bottomley" , Helge Deller , Michael Ellerman , Nicholas Piggin , Christophe Leroy , Naveen N Rao , Madhavan Srinivasan , Paul Walmsley , Palmer Dabbelt , Albert Ou , Heiko Carstens , Vasily Gorbik , Alexander Gordeev , Christian Borntraeger , Sven Schnelle , Yoshinori Sato , Rich Felker , John Paul Adrian Glaubitz , "David S. 
Miller" , Andreas Larsson , Richard Weinberger , Anton Ivanov , Johannes Berg , Chris Zankel , Max Filippov , Arnd Bergmann , Andrew Morton , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Uladzislau Rezki , Christoph Hellwig , Masami Hiramatsu , Mathieu Desnoyers , Mike Rapoport , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Dennis Zhou , Tejun Heo , Christoph Lameter , Sean Christopherson , Paolo Bonzini , Ard Biesheuvel , Josh Poimboeuf , Pawan Gupta Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, loongarch@lists.linux.dev, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-um@lists.infradead.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, kvm@vger.kernel.org, linux-efi@vger.kernel.org, Brendan Jackman , Yosry Ahmed , Junaid Shahid From: Yosry Ahmed Each restricted address space is assigned a separate PCID. Since currently only one ASI instance per-class exists for a given process, the PCID is just derived from the class index. This commit only sets the appropriate PCID when switching CR3, but does not actually use the NOFLUSH bit. That will be done by later patches. 
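The PCID derivation this patch describes can be sketched as a small userspace model (not the kernel code: `kern_pcid()` here is a simplified stand-in, and the constants assume CONFIG_MITIGATION_PAGE_TABLE_ISOLATION=y so the two ASI bits sit directly below the PTI user bit, mirroring the processor-flags.h definitions in the diff below):

```c
/*
 * Userspace sketch of the per-class PCID layout. Assumption: PTI is
 * enabled, so X86_CR3_ASI_PCID_END_BIT is X86_CR3_PTI_PCID_USER_BIT - 1.
 */
#include <stdint.h>

#define X86_CR3_PTI_PCID_USER_BIT	11
#define X86_CR3_ASI_PCID_BITS		2
#define X86_CR3_ASI_PCID_END_BIT	(X86_CR3_PTI_PCID_USER_BIT - 1)
#define X86_CR3_ASI_PCID_BITS_SHIFT \
	(X86_CR3_ASI_PCID_END_BIT - X86_CR3_ASI_PCID_BITS + 1)
#define X86_CR3_ASI_PCID_MASK \
	(((1UL << X86_CR3_ASI_PCID_BITS) - 1) << X86_CR3_ASI_PCID_BITS_SHIFT)

/* Simplified stand-in: the dynamically-assigned ASIDs are small. */
static uint16_t kern_pcid(uint16_t asid)
{
	return asid + 1;
}

/* Illustrative model of asi_pcid(): class N uses the (N + 1)th slot. */
static uint16_t asi_pcid(uint16_t class_id, uint16_t asid)
{
	return kern_pcid(asid) | ((class_id + 1) << X86_CR3_ASI_PCID_BITS_SHIFT);
}
```

With two ASI bits at shift 9, class 0 and class 1 of the same dynamic ASID land on distinct PCIDs (0x201 and 0x401), and neither collides with the unrestricted kernel PCIDs, which keep the ASI bits clear.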
Co-developed-by: Junaid Shahid Signed-off-by: Junaid Shahid Signed-off-by: Yosry Ahmed Signed-off-by: Brendan Jackman --- arch/x86/include/asm/asi.h | 4 +-- arch/x86/include/asm/processor-flags.h | 24 +++++++++++++++++ arch/x86/include/asm/tlbflush.h | 3 +++ arch/x86/mm/asi.c | 10 +++---- arch/x86/mm/tlb.c | 49 +++++++++++++++++++++++++++++++--- include/asm-generic/asi.h | 2 ++ 6 files changed, 81 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 9a9a139518289fc65f26a4d1cd311aa52cc5357f..a55e73f1b2bc84c41b9ab25f642a4d5f1aa6ba90 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -4,13 +4,13 @@ #include -#include - #include #include #include #include +#include + #ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION /* diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h index e5f204b9b33dfaa92ed1b05faa6b604e50d5f2f3..42c5acb67c2d2a6b03deb548fe3dd088baa88842 100644 --- a/arch/x86/include/asm/processor-flags.h +++ b/arch/x86/include/asm/processor-flags.h @@ -55,4 +55,28 @@ # define X86_CR3_PTI_PCID_USER_BIT 11 #endif +/* + * An ASI identifier is included in the higher bits of PCID to use a different + * PCID for each restricted address space, different from the PCID of the + * unrestricted address space (see asi_pcid()). We use the bits directly after + * the bit used by PTI (if any). 
+ */ +#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION + +#define X86_CR3_ASI_PCID_BITS 2 + +/* Use the highest available PCID bits after the PTI bit (if any) */ +#ifdef CONFIG_MITIGATION_PAGE_TABLE_ISOLATION +#define X86_CR3_ASI_PCID_END_BIT (X86_CR3_PTI_PCID_USER_BIT - 1) +#else +#define X86_CR3_ASI_PCID_END_BIT (X86_CR3_PCID_BITS - 1) +#endif + +#define X86_CR3_ASI_PCID_BITS_SHIFT (X86_CR3_ASI_PCID_END_BIT - X86_CR3_ASI_PCID_BITS + 1) +#define X86_CR3_ASI_PCID_MASK (((1UL << X86_CR3_ASI_PCID_BITS) - 1) << X86_CR3_ASI_PCID_BITS_SHIFT) + +#else +#define X86_CR3_ASI_PCID_BITS 0 +#endif + #endif /* _ASM_X86_PROCESSOR_FLAGS_H */ diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index c884174a44e119a3c027c44ada6c5cdba14d1282..f167feb5ebdfc7faba26b8b18ac65888cd9b0494 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -425,5 +425,8 @@ static inline void __native_tlb_flush_global(unsigned long cr4) } unsigned long build_cr3_noinstr(pgd_t *pgd, u16 asid, unsigned long lam); +unsigned long build_cr3_pcid_noinstr(pgd_t *pgd, u16 pcid, unsigned long lam, bool noflush); + +u16 asi_pcid(struct asi *asi, u16 asid); #endif /* _ASM_X86_TLBFLUSH_H */ diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 054315d566c082c0925a00ce3a0877624c8b9957..8d060c633be68b508847e2c1c111761df1da92af 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -238,6 +238,7 @@ static __always_inline void maybe_flush_data(struct asi *next_asi) noinstr void __asi_enter(void) { u64 asi_cr3; + u16 pcid; struct asi *target = asi_get_target(current); /* @@ -266,9 +267,8 @@ noinstr void __asi_enter(void) this_cpu_write(curr_asi, target); maybe_flush_control(target); - asi_cr3 = build_cr3_noinstr(target->pgd, - this_cpu_read(cpu_tlbstate.loaded_mm_asid), - tlbstate_lam_cr3_mask()); + pcid = asi_pcid(target, this_cpu_read(cpu_tlbstate.loaded_mm_asid)); + asi_cr3 = build_cr3_pcid_noinstr(target->pgd, pcid, tlbstate_lam_cr3_mask(), false); 
write_cr3(asi_cr3); maybe_flush_data(target); @@ -335,8 +335,8 @@ noinstr void asi_exit(void) unrestricted_cr3 = build_cr3_noinstr(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd, - this_cpu_read(cpu_tlbstate.loaded_mm_asid), - tlbstate_lam_cr3_mask()); + this_cpu_read(cpu_tlbstate.loaded_mm_asid), + tlbstate_lam_cr3_mask()); /* Tainting first makes reentrancy easier to reason about. */ this_cpu_or(asi_taints, ASI_TAINT_KERNEL_DATA); diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 7c2309996d1d5a7cac23bd122f7b56a869d67d6a..2601beed83aef182d88800c09d70e4c5e95e7ed0 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -13,6 +13,7 @@ #include #include +#include #include #include #include @@ -96,7 +97,10 @@ # define PTI_CONSUMED_PCID_BITS 0 #endif -#define CR3_AVAIL_PCID_BITS (X86_CR3_PCID_BITS - PTI_CONSUMED_PCID_BITS) +#define CR3_AVAIL_PCID_BITS (X86_CR3_PCID_BITS - PTI_CONSUMED_PCID_BITS - \ + X86_CR3_ASI_PCID_BITS) + +static_assert(BIT(CR3_AVAIL_PCID_BITS) > TLB_NR_DYN_ASIDS); /* * ASIDs are zero-based: 0->MAX_AVAIL_ASID are valid. 
-1 below to account @@ -125,6 +129,11 @@ static __always_inline u16 kern_pcid(u16 asid) */ VM_WARN_ON_ONCE(asid & (1 << X86_CR3_PTI_PCID_USER_BIT)); #endif + +#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION + BUILD_BUG_ON(TLB_NR_DYN_ASIDS >= (1 << X86_CR3_ASI_PCID_BITS_SHIFT)); + VM_WARN_ON_ONCE(asid & X86_CR3_ASI_PCID_MASK); +#endif /* * The dynamically-assigned ASIDs that get passed in are small * (class_id + 1) << X86_CR3_ASI_PCID_BITS_SHIFT); + // return kern_pcid(asid) | ((asi->index + 1) << X86_CR3_ASI_PCID_BITS_SHIFT); +} + +#else /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */ + +u16 asi_pcid(struct asi *asi, u16 asid) { return kern_pcid(asid); } + +#endif /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */ + void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start, unsigned long end, unsigned int stride_shift, bool freed_tables) diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index 7f542c59c2b8a2b74432e4edb7199f9171db8a84..f777a6cf604b0656fb39087f6eba08f980b2cb6f 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -2,6 +2,7 @@ #ifndef __ASM_GENERIC_ASI_H #define __ASM_GENERIC_ASI_H +#include #include #ifndef _ASSEMBLY_ @@ -16,6 +17,7 @@ enum asi_class_id { #endif ASI_MAX_NUM_CLASSES, }; +static_assert(order_base_2(X86_CR3_ASI_PCID_BITS) <= ASI_MAX_NUM_CLASSES); typedef u8 asi_taints_t; From patchwork Fri Jan 10 18:40:34 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brendan Jackman X-Patchwork-Id: 856332 Received: from mail-wm1-f73.google.com (mail-wm1-f73.google.com [209.85.128.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1684621CA14 for ; Fri, 10 Jan 2025 18:41:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; 
s=arc-20240116; t=1736534481; cv=none; b=daRPC3VWhft3bqfOiiN/PLmFtEqDR1kk69i9rpFvRSVoxVznDvMJm+mw8kGwxxc/8MJikNxKHu2+SV+FUk2NCQ4u3y9S2g/aZ1AyFPQotJVIA9MqA8qDmb+VIWnL6M4eJ+uIhM+yW0UMP7zXpOwWDyrVE2iOsWBQKVsyimRMlX4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736534481; c=relaxed/simple; bh=/mfc3OXZKkMfjutCKVHn2IBTQNtJN2eL6dFZpIMjD6c=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=EZTtISjLDHF7GFQ221GIC+kOvz6gV3spFQilwjQj864/wXSdOTHOSCsZWl0lPAJSfVIMRf8i3JKzXNwxqaiaA9Cf9ElifXRBOp8+3KXlkOd9VESdciE6Uxnf7Sazwjltg0Z0QXPpccWXCh/RuPTA8G5H5MD+zCThgbuzRjlWP/Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jackmanb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=daSPyeIy; arc=none smtp.client-ip=209.85.128.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jackmanb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="daSPyeIy" Received: by mail-wm1-f73.google.com with SMTP id 5b1f17b1804b1-436219070b4so11487275e9.1 for ; Fri, 10 Jan 2025 10:41:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1736534464; x=1737139264; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=NdZ1cQRVb5G6t/AzbRzOJmiK1PmEXn2j4H5pJNPFFDE=; b=daSPyeIyBPIsSgowHXYJtgHj6XtYA3XMkLR6T0xN8aDtbXLKHuzkLA5toAKySqpg3Y fVylXdVKxN72v3IuLI7U+KDKYYJ+rHhV0G/eXzLoM+KWXRma6vBd7W3ziBv2xzluXsRq IbXLyeJxNI19q9sQFxZMFfhNJSZmkx8xYskG33g/Uj4Bg5SSV9cRx9+0JZGgPv4s2220 s06GRgWySEj4pt4EnIV9RQfyKRGq0pOChMy1Y0DgqjK3Yk3LDIH4A3SJ0kJDyWfvUMR1 
B1lE1NsZHvfcL7Y3uJK6ymU/vRhfMUICIgPFZNJw2Rab8spYVyhykzjxMXNdBoq0vtQc ihpg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736534464; x=1737139264; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=NdZ1cQRVb5G6t/AzbRzOJmiK1PmEXn2j4H5pJNPFFDE=; b=fNg7a/VPJDE6RWoEyizCTS9ZdThX2/I9Ep+EfWp++ktgnX1bJuqZhqB7gvG7NWW81x VfezLwzUFBhUPh33C4GFznsJjsRESDc4P4yXYS2t/b+rTLN1rONxSj8DVrgeOiXPQKGl tP+mOLMv5UOhcb7ZjP5mcW2mLaFIYY+DF+11/lhtNpolp3zCmcKhGNkFL/8YpY2T6spJ a42/6gmzFr3EuXg0/olDLqDUsu6RgplBTkReX9JP5Y9E6BIuwNZ/X4dW8nBo8aGtuKo4 2h+IcZzZPQhEHvjlEDLcnluuO3aMI14Ky7l5lu7jBigryx9M6fuPIHoLeerckqd87sXz jcsA== X-Forwarded-Encrypted: i=1; AJvYcCUrdw8hItu/SAXJfab0kORohfoqJB9ycZBRApxp2xbtSLt9wScWiRUsVvrrttSGmVPmy3u1JtC5rPg=@vger.kernel.org X-Gm-Message-State: AOJu0YzXzn+kgLapw9QH8nN8Ktaf6glvoKMHshpCxYMrJY/tbk+M9bsJ CceSDyCeuRd+btQ321QbHUVcpHBLbtOUfHrGmZAeaID++h0QVtbGelPlL9YO0s2xHIsPJEwECmp rMK3DFFVGMQ== X-Google-Smtp-Source: AGHT+IHOqPqFuZt9UmOoUeA2JDIFLsGLn/JF5TCYjoXI0XK4RQVEWWzyMiwW2MB3CIHY3dySJS8wD0E+4CgsAA== X-Received: from wmgg11.prod.google.com ([2002:a05:600d:b:b0:434:ff52:1c7]) (user=jackmanb job=prod-delivery.src-stubby-dispatcher) by 2002:a05:600c:4ed3:b0:434:f7e3:bfbd with SMTP id 5b1f17b1804b1-436e26dda8cmr98320145e9.23.1736534463780; Fri, 10 Jan 2025 10:41:03 -0800 (PST) Date: Fri, 10 Jan 2025 18:40:34 +0000 In-Reply-To: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com> Precedence: bulk X-Mailing-List: linux-efi@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com> X-Mailer: b4 0.15-dev Message-ID: <20250110-asi-rfc-v2-v2-8-8419288bc805@google.com> Subject: [PATCH RFC v2 08/29] mm: asi: Avoid warning from NMI userspace accesses in ASI context From: Brendan Jackman To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. 
Peter Anvin" , Andy Lutomirski , Peter Zijlstra , Richard Henderson , Matt Turner , Vineet Gupta , Russell King , Catalin Marinas , Will Deacon , Guo Ren , Brian Cain , Huacai Chen , WANG Xuerui , Geert Uytterhoeven , Michal Simek , Thomas Bogendoerfer , Dinh Nguyen , Jonas Bonn , Stefan Kristiansson , Stafford Horne , "James E.J. Bottomley" , Helge Deller , Michael Ellerman , Nicholas Piggin , Christophe Leroy , Naveen N Rao , Madhavan Srinivasan , Paul Walmsley , Palmer Dabbelt , Albert Ou , Heiko Carstens , Vasily Gorbik , Alexander Gordeev , Christian Borntraeger , Sven Schnelle , Yoshinori Sato , Rich Felker , John Paul Adrian Glaubitz , "David S. Miller" , Andreas Larsson , Richard Weinberger , Anton Ivanov , Johannes Berg , Chris Zankel , Max Filippov , Arnd Bergmann , Andrew Morton , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Uladzislau Rezki , Christoph Hellwig , Masami Hiramatsu , Mathieu Desnoyers , Mike Rapoport , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Dennis Zhou , Tejun Heo , Christoph Lameter , Sean Christopherson , Paolo Bonzini , Ard Biesheuvel , Josh Poimboeuf , Pawan Gupta Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, loongarch@lists.linux.dev, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-um@lists.infradead.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, kvm@vger.kernel.org, linux-efi@vger.kernel.org, Brendan Jackman , Junaid Shahid , 
Yosry Ahmed nmi_uaccess_okay() emits a warning if current CR3 != mm->pgd. Limit the warning to only when ASI is not active. Co-developed-by: Junaid Shahid Signed-off-by: Junaid Shahid Co-developed-by: Yosry Ahmed Signed-off-by: Yosry Ahmed Signed-off-by: Brendan Jackman --- arch/x86/mm/tlb.c | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index b2a13fdab0c6454c1d9d4e3338801f3402da4191..c41e083c5b5281684be79ad0391c1a5fc7b0c493 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1340,6 +1340,22 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) put_cpu(); } +static inline bool cr3_matches_current_mm(void) +{ + struct asi *asi = asi_get_current(); + pgd_t *pgd_asi = asi_pgd(asi); + pgd_t *pgd_cr3; + + /* + * Prevent read_cr3_pa -> [NMI, asi_exit] -> asi_get_current, + * otherwise we might find CR3 pointing to the ASI PGD but not + * find a current ASI domain. + */ + barrier(); + pgd_cr3 = __va(read_cr3_pa()); + return pgd_cr3 == current->mm->pgd || pgd_cr3 == pgd_asi; +} + /* * Blindly accessing user memory from NMI context can be dangerous * if we're in the middle of switching the current user task or @@ -1355,10 +1371,10 @@ bool nmi_uaccess_okay(void) VM_WARN_ON_ONCE(!loaded_mm); /* - * The condition we want to check is - * current_mm->pgd == __va(read_cr3_pa()). This may be slow, though, - * if we're running in a VM with shadow paging, and nmi_uaccess_okay() - * is supposed to be reasonably fast. + * The condition we want to check is that CR3 points to either + * current_mm->pgd or an appropriate ASI PGD. Reading CR3 may be slow, + * though, if we're running in a VM with shadow paging, and + * nmi_uaccess_okay() is supposed to be reasonably fast.
* * Instead, we check the almost equivalent but somewhat conservative * condition below, and we rely on the fact that switch_mm_irqs_off() @@ -1367,7 +1383,7 @@ bool nmi_uaccess_okay(void) if (loaded_mm != current_mm) return false; - VM_WARN_ON_ONCE(current_mm->pgd != __va(read_cr3_pa())); + VM_WARN_ON_ONCE(!cr3_matches_current_mm()); return true; } From patchwork Fri Jan 10 18:40:35 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brendan Jackman X-Patchwork-Id: 856330 Received: from mail-wm1-f73.google.com (mail-wm1-f73.google.com [209.85.128.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EEB012206A2 for ; Fri, 10 Jan 2025 18:41:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736534493; cv=none; b=s2ZkcwB+Cei8b7F8GeeQtJdcYzzxGhZRpmf7vgydJEdEfFLIkhR4R8HCCSeeC5BTat7/2JBvXdDaapUbTcenSGYeSY9glp/+EBZYuFeEfPtkfRnT2P13vGHFvKwAnvYujDjZ+qDftpe9k/X3bHjSn87CwkMuWGo4+0BYMvgslgw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736534493; c=relaxed/simple; bh=XnAw5nMnRVqakR2t02DFaQRnPs8banykJ7ILMz9ZBlM=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=H9fE136vkp1JulPTck8x8xCLSwIGNyiMjA+S1Wq/G0f4tgmF0eDPfoafGYvvxdo6bZ1eDKXA9F8oq5n5KVeIn+0t5tPhfguUOC+0D1TI1Sx1tyh5ZO0Fl6nb+GGFXV8eoSwrM6D1j8vOQMGYGuKeJlmAdX5xbVq1Wjg1gC5mnOg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jackmanb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=G1T3ZOcw; arc=none smtp.client-ip=209.85.128.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) 
header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jackmanb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="G1T3ZOcw" Received: by mail-wm1-f73.google.com with SMTP id 5b1f17b1804b1-4359206e1e4so19288785e9.2 for ; Fri, 10 Jan 2025 10:41:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1736534466; x=1737139266; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=WaFp0TOYaWSuGPhKgfBnenMecO/UsWYRxmgXZ26ETaU=; b=G1T3ZOcwHTnab8YlPK6O9gOURuFau0S6MmThrjvXzqfR2WA3xrOQzRp4j+AkWXuaco EA6mcKAhzDdYPUK4CJMRphQzjnG25y5NEQRBe/300nVD6UeJ9V/M7K0oWS/Bnw7eFq/4 uTacpc46ZOJdPtA8yS16LGBxBmfu1mL38N4FxeaK5onfNijZ32LpEk80g5WzUfep2vWU adfGCrFJvJL+2SHF6BNs8iCCrEWIKDxColBTeznrquPHenMvsifoK3PlW/gjUQ1XYPEQ JDzcviI/JFHzUZjDHxGlPhrQlDbC/wNEmBWmiRLOu67Mix0VaV3GmmsECOPb4jyJFS3B IHUw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736534466; x=1737139266; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=WaFp0TOYaWSuGPhKgfBnenMecO/UsWYRxmgXZ26ETaU=; b=iWIZsb3ibASqzK4ezt+lLoidtSmd8XG+v33wFNSlwVzGMzKEv4rf/sC+QwEJTutBoV 9bxbYSIzzAtYnIiQFw5y46dC7q830s1GjffgvualKtWlpPAEjWgFkmLeUWiLDr09CJ0A DHjr3MnnU81HZExpPEf99fCbgBE2EiI5BTrpraiuQj/cJK5fEsdmh1Vv0eAN0im0pxkO vbjnyIJJYaWgxYTbbWFn91v9bSl78O7FLYCXZaaDfcUiw3q2OOQJJUKbyYbWfCXfeDtY TgpPVkFeamMHs9I3cYItU0M5pkoVZEnlJJVtnEv04ap7CBt0fyiMb+y3vzTG5x6R5B6C RKmQ== X-Forwarded-Encrypted: i=1; AJvYcCWTbaAsjR78Qrw8aiE1pISlz6UwpcLTvejmGNdXKkruSs6XtSXviIet7nRK2V2EoQEQek2l16AmIF8=@vger.kernel.org X-Gm-Message-State: AOJu0YwiPZmWfP5CGUVw4B5ejc8AfDJo6tovpLNXVbLTmWX1Wf9QBK3V w4YwsWpbuFJTzZ5HijFeEPzDu8wUZld0kx/YZxKCji1p4SxweM5JlG5iexzKcX2Te6ZSD01KuFY 
LYDs5sf8dTg== X-Google-Smtp-Source: AGHT+IEE4az+Oz6XpM1h3immNPii0BZQy1AjHZ6Oe5B/in2qx8QsaCvn5PLTDmCUbo/W1PshFIv+Soo2Sqz9XA== X-Received: from wmrn35.prod.google.com ([2002:a05:600c:5023:b0:434:f2eb:aa72]) (user=jackmanb job=prod-delivery.src-stubby-dispatcher) by 2002:a05:600c:3ca4:b0:434:a26c:8291 with SMTP id 5b1f17b1804b1-436e26e203emr101768035e9.24.1736534465947; Fri, 10 Jan 2025 10:41:05 -0800 (PST) Date: Fri, 10 Jan 2025 18:40:35 +0000 In-Reply-To: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com> Precedence: bulk X-Mailing-List: linux-efi@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com> X-Mailer: b4 0.15-dev Message-ID: <20250110-asi-rfc-v2-v2-9-8419288bc805@google.com> Subject: [PATCH RFC v2 09/29] mm: asi: ASI page table allocation functions From: Brendan Jackman To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Andy Lutomirski , Peter Zijlstra , Richard Henderson , Matt Turner , Vineet Gupta , Russell King , Catalin Marinas , Will Deacon , Guo Ren , Brian Cain , Huacai Chen , WANG Xuerui , Geert Uytterhoeven , Michal Simek , Thomas Bogendoerfer , Dinh Nguyen , Jonas Bonn , Stefan Kristiansson , Stafford Horne , "James E.J. Bottomley" , Helge Deller , Michael Ellerman , Nicholas Piggin , Christophe Leroy , Naveen N Rao , Madhavan Srinivasan , Paul Walmsley , Palmer Dabbelt , Albert Ou , Heiko Carstens , Vasily Gorbik , Alexander Gordeev , Christian Borntraeger , Sven Schnelle , Yoshinori Sato , Rich Felker , John Paul Adrian Glaubitz , "David S. 
Miller" , Andreas Larsson , Richard Weinberger , Anton Ivanov , Johannes Berg , Chris Zankel , Max Filippov , Arnd Bergmann , Andrew Morton , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Uladzislau Rezki , Christoph Hellwig , Masami Hiramatsu , Mathieu Desnoyers , Mike Rapoport , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Dennis Zhou , Tejun Heo , Christoph Lameter , Sean Christopherson , Paolo Bonzini , Ard Biesheuvel , Josh Poimboeuf , Pawan Gupta Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, loongarch@lists.linux.dev, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-um@lists.infradead.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, kvm@vger.kernel.org, linux-efi@vger.kernel.org, Brendan Jackman , Junaid Shahid From: Junaid Shahid This adds custom allocation and free functions for ASI page tables. The alloc functions support allocating memory using different GFP reclaim flags, in order to be able to support non-sensitive allocations from both standard and atomic contexts. They also install the page tables locklessly, which makes it slightly simpler to handle non-sensitive allocations from interrupts/exceptions. checkpatch.pl MACRO_ARG_UNUSED,SPACING is false positive. COMPLEX_MACRO - I dunno, suggestions welcome. 
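The lockless install pattern this commit message describes (atomically publish a freshly zeroed table into an empty entry; if another CPU won the race, discard ours and reuse the winner's) can be modelled in plain C11. This is a userspace sketch, not the kernel's API: the helper name `pgtbl_get_or_alloc` and the 512-entry table size are illustrative assumptions.

```c
/*
 * Userspace model of the cmpxchg-based page-table install used by
 * DEFINE_ASI_PGTBL_ALLOC: no spinlock, safe to race from "interrupt"
 * context, loser of the race frees its allocation.
 */
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

typedef _Atomic uintptr_t pgtbl_entry_t;

/* Hypothetical helper: return the table published at *entry, allocating
 * and installing a zeroed one if the entry was empty. */
static void *pgtbl_get_or_alloc(pgtbl_entry_t *entry)
{
	uintptr_t cur = atomic_load(entry);

	if (!cur) {
		/* One zeroed 512-entry table (illustrative size). */
		void *fresh = calloc(512, sizeof(uintptr_t));
		uintptr_t expected = 0;

		if (!fresh)
			return NULL;
		if (atomic_compare_exchange_strong(entry, &expected,
						   (uintptr_t)fresh))
			return fresh;	/* we won the race */
		free(fresh);		/* somebody beat us to it */
		cur = expected;		/* CAS stored the winner's value */
	}
	return (void *)cur;
}
```

Because the compare-and-swap only succeeds against an empty entry, concurrent callers always converge on a single table, which is what lets the kernel version run from interrupts and exceptions without taking the page-table spinlock.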
Checkpatch-args: --ignore=MACRO_ARG_UNUSED,SPACING,COMPLEX_MACRO Signed-off-by: Junaid Shahid Signed-off-by: Brendan Jackman --- arch/x86/mm/asi.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 59 insertions(+) diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 8d060c633be68b508847e2c1c111761df1da92af..b15d043acedc9f459f17e86564a15061650afc3a 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -73,6 +73,65 @@ const char *asi_class_name(enum asi_class_id class_id) return asi_class_names[class_id]; } +#ifndef mm_inc_nr_p4ds +#define mm_inc_nr_p4ds(mm) do {} while (false) +#endif + +#ifndef mm_dec_nr_p4ds +#define mm_dec_nr_p4ds(mm) do {} while (false) +#endif + +#define pte_offset pte_offset_kernel + +/* + * asi_p4d_alloc, asi_pud_alloc, asi_pmd_alloc, asi_pte_alloc. + * + * These are like the normal xxx_alloc functions, but: + * + * - They use atomic operations instead of taking a spinlock; this allows them + * to be used from interrupts. This is necessary because we use the page + * allocator from interrupts and the page allocator ultimately calls this + * code. + * - They support customizing the allocation flags. + * + * On the other hand, they do not use the normal page allocation infrastructure, + * that means that PTE pages do not have the PageTable type nor the PagePgtable + * flag and we don't increment the meminfo stat (NR_PAGETABLE) as they do. 
+ */ +static_assert(!IS_ENABLED(CONFIG_PARAVIRT)); +#define DEFINE_ASI_PGTBL_ALLOC(base, level) \ +__maybe_unused \ +static level##_t * asi_##level##_alloc(struct asi *asi, \ + base##_t *base, ulong addr, \ + gfp_t flags) \ +{ \ + if (unlikely(base##_none(*base))) { \ + ulong pgtbl = get_zeroed_page(flags); \ + phys_addr_t pgtbl_pa; \ + \ + if (!pgtbl) \ + return NULL; \ + \ + pgtbl_pa = __pa(pgtbl); \ + \ + if (cmpxchg((ulong *)base, 0, \ + pgtbl_pa | _PAGE_TABLE) != 0) { \ + free_page(pgtbl); \ + goto out; \ + } \ + \ + mm_inc_nr_##level##s(asi->mm); \ + } \ +out: \ + VM_BUG_ON(base##_leaf(*base)); \ + return level##_offset(base, addr); \ +} + +DEFINE_ASI_PGTBL_ALLOC(pgd, p4d) +DEFINE_ASI_PGTBL_ALLOC(p4d, pud) +DEFINE_ASI_PGTBL_ALLOC(pud, pmd) +DEFINE_ASI_PGTBL_ALLOC(pmd, pte) + void __init asi_check_boottime_disable(void) { bool enabled = IS_ENABLED(CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION_DEFAULT_ON); From patchwork Fri Jan 10 18:40:37 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brendan Jackman X-Patchwork-Id: 856331 Received: from mail-wm1-f74.google.com (mail-wm1-f74.google.com [209.85.128.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3197F21858C for ; Fri, 10 Jan 2025 18:41:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736534489; cv=none; b=sbcgHiRPm1f8EFWUqifzwuV7gtIAGXsozVN/z0xzCJpRFuXi/Vo79SVyDErv+WDIxTrdgqBgLZPhgNhSebsPGqCOhWVFdmCtbV0KcYBkDh4wTN0R9x8L2jm/ZGl7cMX5Q79eFj1wyu6XInsziDmuZr4Z0e+D2L3d9FFa40a33Ss= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736534489; c=relaxed/simple; bh=7B5/myGehuz/40qZtIhxDkAaMjV6lpikJmkMMEkOi+o=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: 
Date: Fri, 10 Jan 2025 18:40:37 +0000
In-Reply-To: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
References: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
Message-ID: <20250110-asi-rfc-v2-v2-11-8419288bc805@google.com>
Subject: [PATCH RFC v2 11/29] mm: asi: Functions to map/unmap a memory range into ASI page tables
From: Brendan Jackman
To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Andy Lutomirski , Peter Zijlstra , Richard Henderson , Matt Turner , Vineet Gupta , Russell King , Catalin Marinas , Will Deacon , Guo Ren , Brian Cain , Huacai Chen , WANG Xuerui , Geert Uytterhoeven , Michal Simek , Thomas Bogendoerfer , Dinh Nguyen , Jonas Bonn , Stefan Kristiansson , Stafford Horne , "James E.J.
Bottomley" , Helge Deller , Michael Ellerman , Nicholas Piggin , Christophe Leroy , Naveen N Rao , Madhavan Srinivasan , Paul Walmsley , Palmer Dabbelt , Albert Ou , Heiko Carstens , Vasily Gorbik , Alexander Gordeev , Christian Borntraeger , Sven Schnelle , Yoshinori Sato , Rich Felker , John Paul Adrian Glaubitz , "David S. Miller" , Andreas Larsson , Richard Weinberger , Anton Ivanov , Johannes Berg , Chris Zankel , Max Filippov , Arnd Bergmann , Andrew Morton , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Uladzislau Rezki , Christoph Hellwig , Masami Hiramatsu , Mathieu Desnoyers , Mike Rapoport , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Dennis Zhou , Tejun Heo , Christoph Lameter , Sean Christopherson , Paolo Bonzini , Ard Biesheuvel , Josh Poimboeuf , Pawan Gupta Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, loongarch@lists.linux.dev, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-um@lists.infradead.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, kvm@vger.kernel.org, linux-efi@vger.kernel.org, Brendan Jackman , Junaid Shahid , Kevin Cheng From: Junaid Shahid Two functions, asi_map() and asi_map_gfp(), are added to allow mapping memory into ASI page tables. The mapping will be identical to the one for the same virtual address in the unrestricted page tables. This is necessary to allow switching between the page tables at any arbitrary point in the kernel. 
Another function, asi_unmap(), is added to allow unmapping memory that was
mapped via asi_map*.

RFC Notes: Don't read too much into the implementation of this; lots of it
should probably be rewritten. It also needs to gain support for partial
unmappings.

Checkpatch-args: --ignore=MACRO_ARG_UNUSED
Signed-off-by: Junaid Shahid
Signed-off-by: Brendan Jackman
Signed-off-by: Kevin Cheng
---
 arch/x86/include/asm/asi.h |   5 +
 arch/x86/mm/asi.c          | 236 ++++++++++++++++++++++++++++++++++++++++++++-
 arch/x86/mm/tlb.c          |   5 +
 include/asm-generic/asi.h  |  11 +++
 include/linux/pgtable.h    |   3 +
 mm/internal.h              |   2 +
 mm/vmalloc.c               |  32 +++---
 7 files changed, 280 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
index a55e73f1b2bc84c41b9ab25f642a4d5f1aa6ba90..33f18be0e268b3a6725196619cbb8d847c21e197 100644
--- a/arch/x86/include/asm/asi.h
+++ b/arch/x86/include/asm/asi.h
@@ -157,6 +157,11 @@ void asi_relax(void);
 /* Immediately exit the restricted address space if in it */
 void asi_exit(void);

+int asi_map_gfp(struct asi *asi, void *addr, size_t len, gfp_t gfp_flags);
+int asi_map(struct asi *asi, void *addr, size_t len);
+void asi_unmap(struct asi *asi, void *addr, size_t len);
+void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len);
+
 static inline void asi_init_thread_state(struct thread_struct *thread)
 {
 	thread->asi_state.intr_nest_depth = 0;
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index b15d043acedc9f459f17e86564a15061650afc3a..f2d8fbc0366c289891903e1c2ac6c59b9476d95f 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -11,6 +11,9 @@
 #include
 #include
 #include
+#include
+
+#include "../../../mm/internal.h"

 static struct asi_taint_policy *taint_policies[ASI_MAX_NUM_CLASSES];

@@ -100,7 +103,6 @@ const char *asi_class_name(enum asi_class_id class_id)
  */
 static_assert(!IS_ENABLED(CONFIG_PARAVIRT));
 #define DEFINE_ASI_PGTBL_ALLOC(base, level)				\
-__maybe_unused								\
 static level##_t * asi_##level##_alloc(struct asi *asi,			\
base##_t *base, ulong addr, \ gfp_t flags) \ @@ -455,3 +457,235 @@ void asi_handle_switch_mm(void) this_cpu_or(asi_taints, new_taints); this_cpu_and(asi_taints, ~(ASI_TAINTS_GUEST_MASK | ASI_TAINTS_USER_MASK)); } + +static bool is_page_within_range(unsigned long addr, unsigned long page_size, + unsigned long range_start, unsigned long range_end) +{ + unsigned long page_start = ALIGN_DOWN(addr, page_size); + unsigned long page_end = page_start + page_size; + + return page_start >= range_start && page_end <= range_end; +} + +static bool follow_physaddr( + pgd_t *pgd_table, unsigned long virt, + phys_addr_t *phys, unsigned long *page_size, ulong *flags) +{ + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + + /* RFC: This should be rewritten with lookup_address_in_*. */ + + *page_size = PGDIR_SIZE; + pgd = pgd_offset_pgd(pgd_table, virt); + if (!pgd_present(*pgd)) + return false; + if (pgd_leaf(*pgd)) { + *phys = PFN_PHYS(pgd_pfn(*pgd)) | (virt & ~PGDIR_MASK); + *flags = pgd_flags(*pgd); + return true; + } + + *page_size = P4D_SIZE; + p4d = p4d_offset(pgd, virt); + if (!p4d_present(*p4d)) + return false; + if (p4d_leaf(*p4d)) { + *phys = PFN_PHYS(p4d_pfn(*p4d)) | (virt & ~P4D_MASK); + *flags = p4d_flags(*p4d); + return true; + } + + *page_size = PUD_SIZE; + pud = pud_offset(p4d, virt); + if (!pud_present(*pud)) + return false; + if (pud_leaf(*pud)) { + *phys = PFN_PHYS(pud_pfn(*pud)) | (virt & ~PUD_MASK); + *flags = pud_flags(*pud); + return true; + } + + *page_size = PMD_SIZE; + pmd = pmd_offset(pud, virt); + if (!pmd_present(*pmd)) + return false; + if (pmd_leaf(*pmd)) { + *phys = PFN_PHYS(pmd_pfn(*pmd)) | (virt & ~PMD_MASK); + *flags = pmd_flags(*pmd); + return true; + } + + *page_size = PAGE_SIZE; + pte = pte_offset_map(pmd, virt); + if (!pte) + return false; + + if (!pte_present(*pte)) { + pte_unmap(pte); + return false; + } + + *phys = PFN_PHYS(pte_pfn(*pte)) | (virt & ~PAGE_MASK); + *flags = pte_flags(*pte); + + pte_unmap(pte); + return 
true; +} + +/* + * Map the given range into the ASI page tables. The source of the mapping is + * the regular unrestricted page tables. Can be used to map any kernel memory. + * + * The caller MUST ensure that the source mapping will not change during this + * function. For dynamic kernel memory, this is generally ensured by mapping the + * memory within the allocator. + * + * If this fails, it may leave partial mappings behind. You must asi_unmap them, + * bearing in mind asi_unmap's requirements on the calling context. Part of the + * reason for this is that we don't want to unexpectedly undo mappings that + * weren't created by the present caller. + * + * If the source mapping is a large page and the range being mapped spans the + * entire large page, then it will be mapped as a large page in the ASI page + * tables too. If the range does not span the entire huge page, then it will be + * mapped as smaller pages. In that case, the implementation is slightly + * inefficient, as it will walk the source page tables again for each small + * destination page, but that should be ok for now, as usually in such cases, + * the range would consist of a small-ish number of pages. + * + * RFC: * vmap_p4d_range supports huge mappings, we can probably use that now. + */ +int __must_check asi_map_gfp(struct asi *asi, void *addr, unsigned long len, gfp_t gfp_flags) +{ + unsigned long virt; + unsigned long start = (size_t)addr; + unsigned long end = start + len; + unsigned long page_size; + + if (!static_asi_enabled()) + return 0; + + VM_BUG_ON(!IS_ALIGNED(start, PAGE_SIZE)); + VM_BUG_ON(!IS_ALIGNED(len, PAGE_SIZE)); + /* RFC: fault_in_kernel_space should be renamed. 
*/ + VM_BUG_ON(!fault_in_kernel_space(start)); + + gfp_flags &= GFP_RECLAIM_MASK; + + if (asi->mm != &init_mm) + gfp_flags |= __GFP_ACCOUNT; + + for (virt = start; virt < end; virt = ALIGN(virt + 1, page_size)) { + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + phys_addr_t phys; + ulong flags; + + if (!follow_physaddr(asi->mm->pgd, virt, &phys, &page_size, &flags)) + continue; + +#define MAP_AT_LEVEL(base, BASE, level, LEVEL) { \ + if (base##_leaf(*base)) { \ + if (WARN_ON_ONCE(PHYS_PFN(phys & BASE##_MASK) !=\ + base##_pfn(*base))) \ + return -EBUSY; \ + continue; \ + } \ + \ + level = asi_##level##_alloc(asi, base, virt, gfp_flags);\ + if (!level) \ + return -ENOMEM; \ + \ + if (page_size >= LEVEL##_SIZE && \ + (level##_none(*level) || level##_leaf(*level)) && \ + is_page_within_range(virt, LEVEL##_SIZE, \ + start, end)) { \ + page_size = LEVEL##_SIZE; \ + phys &= LEVEL##_MASK; \ + \ + if (!level##_none(*level)) { \ + if (WARN_ON_ONCE(level##_pfn(*level) != \ + PHYS_PFN(phys))) { \ + return -EBUSY; \ + } \ + } else { \ + set_##level(level, \ + __##level(phys | flags)); \ + } \ + continue; \ + } \ + } + + pgd = pgd_offset_pgd(asi->pgd, virt); + + MAP_AT_LEVEL(pgd, PGDIR, p4d, P4D); + MAP_AT_LEVEL(p4d, P4D, pud, PUD); + MAP_AT_LEVEL(pud, PUD, pmd, PMD); + /* + * If a large page is going to be partially mapped + * in 4k pages, convert the PSE/PAT bits. + */ + if (page_size >= PMD_SIZE) + flags = protval_large_2_4k(flags); + MAP_AT_LEVEL(pmd, PMD, pte, PAGE); + + VM_BUG_ON(true); /* Should never reach here. */ + } + + return 0; +#undef MAP_AT_LEVEL +} + +int __must_check asi_map(struct asi *asi, void *addr, unsigned long len) +{ + return asi_map_gfp(asi, addr, len, GFP_KERNEL); +} + +/* + * Unmap a kernel address range previously mapped into the ASI page tables. + * + * The area being unmapped must be a whole previously mapped region (or regions) + * Unmapping a partial subset of a previously mapped region is not supported. 
+ * That will work, but may end up unmapping more than what was asked for, if + * the mapping contained huge pages. A later patch will remove this limitation + * by splitting the huge mapping in the ASI page table in such a case. For now, + * vunmap_pgd_range() will just emit a warning if this situation is detected. + * + * This might sleep, and cannot be called with interrupts disabled. + */ +void asi_unmap(struct asi *asi, void *addr, size_t len) +{ + size_t start = (size_t)addr; + size_t end = start + len; + pgtbl_mod_mask mask = 0; + + if (!static_asi_enabled() || !len) + return; + + VM_BUG_ON(start & ~PAGE_MASK); + VM_BUG_ON(len & ~PAGE_MASK); + VM_BUG_ON(!fault_in_kernel_space(start)); /* Misnamed, ignore "fault_" */ + + vunmap_pgd_range(asi->pgd, start, end, &mask); + + /* We don't support partial unmappings. */ + if (mask & PGTBL_P4D_MODIFIED) { + VM_WARN_ON(!IS_ALIGNED((ulong)addr, P4D_SIZE)); + VM_WARN_ON(!IS_ALIGNED((ulong)len, P4D_SIZE)); + } else if (mask & PGTBL_PUD_MODIFIED) { + VM_WARN_ON(!IS_ALIGNED((ulong)addr, PUD_SIZE)); + VM_WARN_ON(!IS_ALIGNED((ulong)len, PUD_SIZE)); + } else if (mask & PGTBL_PMD_MODIFIED) { + VM_WARN_ON(!IS_ALIGNED((ulong)addr, PMD_SIZE)); + VM_WARN_ON(!IS_ALIGNED((ulong)len, PMD_SIZE)); + } + + asi_flush_tlb_range(asi, addr, len); +} diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index c41e083c5b5281684be79ad0391c1a5fc7b0c493..c55733e144c7538ce7f97b74ea2b1b9c22497c32 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1040,6 +1040,11 @@ noinstr u16 asi_pcid(struct asi *asi, u16 asid) // return kern_pcid(asid) | ((asi->index + 1) << X86_CR3_ASI_PCID_BITS_SHIFT); } +void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) +{ + flush_tlb_kernel_range((ulong)addr, (ulong)addr + len); +} + #else /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */ u16 asi_pcid(struct asi *asi, u16 asid) { return kern_pcid(asid); } diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index 
f777a6cf604b0656fb39087f6eba08f980b2cb6f..5be8f7d657ba0bc2196e333f62b084d0c9eef7b6 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -77,6 +77,17 @@ static inline int asi_intr_nest_depth(void) { return 0; } static inline void asi_intr_exit(void) { } +static inline int asi_map(struct asi *asi, void *addr, size_t len) +{ + return 0; +} + +static inline +void asi_unmap(struct asi *asi, void *addr, size_t len) { } + +static inline +void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) { } + #define static_asi_enabled() false static inline void asi_check_boottime_disable(void) { } diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index e8b2ac6bd2ae3b0a768734c8411f45a7d162e12d..492a9cdee7ff3d4e562c4bf508dc14fd7fa67e36 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1900,6 +1900,9 @@ typedef unsigned int pgtbl_mod_mask; #ifndef pmd_leaf #define pmd_leaf(x) false #endif +#ifndef pte_leaf +#define pte_leaf(x) 1 +#endif #ifndef pgd_leaf_size #define pgd_leaf_size(x) (1ULL << PGDIR_SHIFT) diff --git a/mm/internal.h b/mm/internal.h index 64c2eb0b160e169ab9134e3ab618d8a1d552d92c..c0454fe019b9078a963b1ab3685bf31ccfd768b7 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -395,6 +395,8 @@ void unmap_page_range(struct mmu_gather *tlb, void page_cache_ra_order(struct readahead_control *, struct file_ra_state *, unsigned int order); void force_page_cache_ra(struct readahead_control *, unsigned long nr); +void vunmap_pgd_range(pgd_t *pgd_table, unsigned long addr, unsigned long end, + pgtbl_mod_mask *mask); static inline void force_page_cache_readahead(struct address_space *mapping, struct file *file, pgoff_t index, unsigned long nr_to_read) { diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 634162271c0045965eabd9bfe8b64f4a1135576c..8d260f2174fe664b54dcda054cb9759ae282bf03 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -427,6 +427,24 @@ static void vunmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end, 
 } while (p4d++, addr = next, addr != end);
 }

+void vunmap_pgd_range(pgd_t *pgd_table, unsigned long addr, unsigned long end,
+		      pgtbl_mod_mask *mask)
+{
+	unsigned long next;
+	pgd_t *pgd = pgd_offset_pgd(pgd_table, addr);
+
+	BUG_ON(addr >= end);
+
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_bad(*pgd))
+			*mask |= PGTBL_PGD_MODIFIED;
+		if (pgd_none_or_clear_bad(pgd))
+			continue;
+		vunmap_p4d_range(pgd, addr, next, mask);
+	} while (pgd++, addr = next, addr != end);
+}
+
 /*
  * vunmap_range_noflush is similar to vunmap_range, but does not
  * flush caches or TLBs.
@@ -441,21 +459,9 @@ static void vunmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
  */
 void __vunmap_range_noflush(unsigned long start, unsigned long end)
 {
-	unsigned long next;
-	pgd_t *pgd;
-	unsigned long addr = start;
 	pgtbl_mod_mask mask = 0;

-	BUG_ON(addr >= end);
-	pgd = pgd_offset_k(addr);
-	do {
-		next = pgd_addr_end(addr, end);
-		if (pgd_bad(*pgd))
-			mask |= PGTBL_PGD_MODIFIED;
-		if (pgd_none_or_clear_bad(pgd))
-			continue;
-		vunmap_p4d_range(pgd, addr, next, &mask);
-	} while (pgd++, addr = next, addr != end);
+	vunmap_pgd_range(init_mm.pgd, start, end, &mask);

 	if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
 		arch_sync_kernel_mappings(start, end);

From patchwork Fri Jan 10 18:40:39 2025
Date: Fri, 10 Jan 2025 18:40:39 +0000
In-Reply-To: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
References: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
Message-ID: <20250110-asi-rfc-v2-v2-13-8419288bc805@google.com>
Subject: [PATCH RFC v2 13/29] mm: Add __PAGEFLAG_FALSE
From: Brendan Jackman
To: (recipient list trimmed; same as the rest of the series)

__PAGEFLAG_FALSE is
a non-atomic equivalent of PAGEFLAG_FALSE.

Checkpatch-args: --ignore=COMPLEX_MACRO
Signed-off-by: Brendan Jackman
---
 include/linux/page-flags.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index cc839e4365c18223e68c35efd0f67e7650708e8b..7ee9a0edc6d21708fc93dfa8913dc1ae9478dee3 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -484,6 +484,10 @@ static inline int Page##uname(const struct page *page) { return 0; }
 	FOLIO_SET_FLAG_NOOP(lname)					\
 static inline void SetPage##uname(struct page *page) { }

+#define __SETPAGEFLAG_NOOP(uname, lname)				\
+static inline void __folio_set_##lname(struct folio *folio) { }	\
+static inline void __SetPage##uname(struct page *page) { }
+
 #define CLEARPAGEFLAG_NOOP(uname, lname)				\
 	FOLIO_CLEAR_FLAG_NOOP(lname)					\
 static inline void ClearPage##uname(struct page *page) { }
@@ -506,6 +510,9 @@ static inline int TestClearPage##uname(struct page *page) { return 0; }
 #define TESTSCFLAG_FALSE(uname, lname)					\
 	TESTSETFLAG_FALSE(uname, lname) TESTCLEARFLAG_FALSE(uname, lname)

+#define __PAGEFLAG_FALSE(uname, lname) TESTPAGEFLAG_FALSE(uname, lname)	\
+	__SETPAGEFLAG_NOOP(uname, lname) __CLEARPAGEFLAG_NOOP(uname, lname)
+
 __PAGEFLAG(Locked, locked, PF_NO_TAIL)
 FOLIO_FLAG(waiters, FOLIO_HEAD_PAGE)
 FOLIO_FLAG(referenced, FOLIO_HEAD_PAGE)

From patchwork Fri Jan 10 18:40:41 2025
Date: Fri, 10 Jan 2025 18:40:41 +0000
In-Reply-To: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
References: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
Message-ID: <20250110-asi-rfc-v2-v2-15-8419288bc805@google.com>
Subject: [PATCH TEMP WORKAROUND RFC v2 15/29] mm: asi: Workaround missing partial-unmap support
From: Brendan Jackman
To: (recipient list trimmed; same as the rest of the series)

This is a hack, no
need to review it carefully.

asi_unmap() doesn't currently work unless it corresponds exactly to an asi_map() of the exact same region. This is mostly harmless (it's only a functional problem if you want to touch those pages from the ASI critical section) but it's messy. For now, work around the only practical case that appears, by moving the asi_map() call up the call stack in the page allocator to the place where we know the actual size the mapping is supposed to end up at. This just removes the main case where mismatched unmaps happen; later, a proper solution for partial unmaps will be needed.

Signed-off-by: Brendan Jackman
---
 mm/page_alloc.c | 40 ++++++++++++++++++++++++++--------------
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e98fdfbadddb1f7d71e9e050b63255b2008d167..f96e95032450be90b6567f67915b0b941fc431d8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4604,22 +4604,20 @@ void __init page_alloc_init_asi(void)
 	}
 }

-static int asi_map_alloced_pages(struct page *page, uint order, gfp_t gfp_mask)
+static int asi_map_alloced_pages(struct page *page, size_t size, gfp_t gfp_mask)
 {
 	if (!static_asi_enabled())
 		return 0;

 	if (!(gfp_mask & __GFP_SENSITIVE)) {
-		int err = asi_map_gfp(
-			ASI_GLOBAL_NONSENSITIVE, page_to_virt(page),
-			PAGE_SIZE * (1 << order), gfp_mask);
+		int err = asi_map_gfp(ASI_GLOBAL_NONSENSITIVE, page_to_virt(page), size, gfp_mask);
 		uint i;

 		if (err)
 			return err;

-		for (i = 0; i < (1 << order); i++)
+		for (i = 0; i < (size >> PAGE_SHIFT); i++)
 			__SetPageGlobalNonSensitive(page + i);
 	}

@@ -4629,7 +4627,7 @@ static int asi_map_alloced_pages(struct page *page, uint order, gfp_t gfp_mask)
 #else /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */

 static inline
-int asi_map_alloced_pages(struct page *pages, uint order, gfp_t gfp_mask)
+int asi_map_alloced_pages(struct page *pages, size_t size, gfp_t gfp_mask)
 {
 	return 0;
 }
@@ -4896,7 +4894,7 @@ struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int
order,
 	trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
 	kmsan_alloc_page(page, order, alloc_gfp);

-	if (page && unlikely(asi_map_alloced_pages(page, order, gfp))) {
+	if (page && unlikely(asi_map_alloced_pages(page, PAGE_SIZE << order, gfp))) {
 		__free_pages(page, order);
 		page = NULL;
 	}
@@ -5118,12 +5116,13 @@ void page_frag_free(void *addr)
 }
 EXPORT_SYMBOL(page_frag_free);

-static void *make_alloc_exact(unsigned long addr, unsigned int order,
-		size_t size)
+static void *finish_exact_alloc(unsigned long addr, unsigned int order,
+		size_t size, gfp_t gfp_mask)
 {
 	if (addr) {
 		unsigned long nr = DIV_ROUND_UP(size, PAGE_SIZE);
 		struct page *page = virt_to_page((void *)addr);
+		struct page *first = page;
 		struct page *last = page + nr;

 		split_page_owner(page, order, 0);
@@ -5132,9 +5131,22 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
 		while (page < --last)
 			set_page_refcounted(last);

-		last = page + (1UL << order);
+		last = page + (1 << order);
 		for (page += nr; page < last; page++)
 			__free_pages_ok(page, 0, FPI_TO_TAIL);
+
+		/*
+		 * ASI doesn't support partially undoing calls to asi_map, so
+		 * we can only safely free sub-allocations if they were made
+		 * with __GFP_SENSITIVE in the first place. Users of this need
+		 * to map with forced __GFP_SENSITIVE and then here we'll make a
+		 * second asi_map_alloced_pages() call to do any mapping that's
+		 * necessary, but with the exact size.
+		 */
+		if (unlikely(asi_map_alloced_pages(first, nr << PAGE_SHIFT, gfp_mask))) {
+			free_pages_exact(first, size);
+			return NULL;
+		}
 	}
 	return (void *)addr;
 }
@@ -5162,8 +5174,8 @@ void *alloc_pages_exact_noprof(size_t size, gfp_t gfp_mask)
 	if (WARN_ON_ONCE(gfp_mask & (__GFP_COMP | __GFP_HIGHMEM)))
 		gfp_mask &= ~(__GFP_COMP | __GFP_HIGHMEM);

-	addr = get_free_pages_noprof(gfp_mask, order);
-	return make_alloc_exact(addr, order, size);
+	addr = get_free_pages_noprof(gfp_mask | __GFP_SENSITIVE, order);
+	return finish_exact_alloc(addr, order, size, gfp_mask);
 }
 EXPORT_SYMBOL(alloc_pages_exact_noprof);
@@ -5187,10 +5199,10 @@ void * __meminit alloc_pages_exact_nid_noprof(int nid, size_t size, gfp_t gfp_ma
 	if (WARN_ON_ONCE(gfp_mask & (__GFP_COMP | __GFP_HIGHMEM)))
 		gfp_mask &= ~(__GFP_COMP | __GFP_HIGHMEM);

-	p = alloc_pages_node_noprof(nid, gfp_mask, order);
+	p = alloc_pages_node_noprof(nid, gfp_mask | __GFP_SENSITIVE, order);
 	if (!p)
 		return NULL;
-	return make_alloc_exact((unsigned long)page_address(p), order, size);
+	return finish_exact_alloc((unsigned long)page_address(p), order, size, gfp_mask);
 }

 /**

From patchwork Fri Jan 10 18:40:46 2025
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 856327
Date: Fri, 10 Jan 2025 18:40:46 +0000
In-Reply-To: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
References: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
Message-ID: <20250110-asi-rfc-v2-v2-20-8419288bc805@google.com>
Subject: [PATCH RFC v2 20/29] mm: asi: Make TLB flushing correct under ASI
From: Brendan Jackman

This is the
absolute minimum change for TLB flushing to be correct under ASI. There are two arguably orthogonal changes in here, but they feel small enough for a single commit.

.:: CR3 stabilization

As noted in the comment, ASI can destabilize CR3, but we can stabilize it again by calling asi_exit(); this makes it safe to read CR3 and write it back. This is enough to be correct - we don't have to worry about invalidating the other ASI address space (i.e. we don't need to invalidate the restricted address space if we are currently unrestricted / vice versa) because we currently never set the noflush bit in CR3 for ASI transitions.

Even without using CR3's noflush bit there are trivial optimizations still on the table here: where invpcid_flush_single_context() is available (i.e. with the INVPCID_SINGLE feature) we can use it in lieu of the CR3 read/write, and avoid the extremely costly asi_exit().

.:: Invalidating kernel mappings

Before ASI, with KPTI off we always either disable PCID or use global mappings for kernel memory. However, ASI disables global kernel mappings regardless of these factors, so we need to invalidate other address spaces to trigger a flush when we switch into them.

Note that there is currently a pointless write of cpu_tlbstate.invalidate_other in the case of KPTI and !PCID. We've added another case of that (ASI, !KPTI and !PCID). I think that's preferable to expanding the conditional in flush_tlb_one_kernel().

Signed-off-by: Brendan Jackman
---
 arch/x86/mm/tlb.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index ce5598f96ea7a84dc0e8623022ab5bfbba401b48..07b1657bee8e4cf17452ea57c838823e76f482c0 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -231,7 +231,7 @@ static void clear_asid_other(void)
 	 * This is only expected to be set if we have disabled
 	 * kernel _PAGE_GLOBAL pages.
	 */
-	if (!static_cpu_has(X86_FEATURE_PTI)) {
+	if (!static_cpu_has(X86_FEATURE_PTI) && !static_asi_enabled()) {
 		WARN_ON_ONCE(1);
 		return;
 	}
@@ -1040,7 +1040,6 @@ static void put_flush_tlb_info(void)
 noinstr u16 asi_pcid(struct asi *asi, u16 asid)
 {
 	return kern_pcid(asid) | ((asi->class_id + 1) << X86_CR3_ASI_PCID_BITS_SHIFT);
-	// return kern_pcid(asid) | ((asi->index + 1) << X86_CR3_ASI_PCID_BITS_SHIFT);
 }

 void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len)
@@ -1192,15 +1191,19 @@ void flush_tlb_one_kernel(unsigned long addr)
 	 * use PCID if we also use global PTEs for the kernel mapping, and
 	 * INVLPG flushes global translations across all address spaces.
 	 *
-	 * If PTI is on, then the kernel is mapped with non-global PTEs, and
-	 * __flush_tlb_one_user() will flush the given address for the current
-	 * kernel address space and for its usermode counterpart, but it does
-	 * not flush it for other address spaces.
+	 * If PTI or ASI is on, then the kernel is mapped with non-global PTEs,
+	 * and __flush_tlb_one_user() will flush the given address for the
+	 * current kernel address space and, if PTI is on, for its usermode
+	 * counterpart, but it does not flush it for other address spaces.
 	 */
 	flush_tlb_one_user(addr);

-	if (!static_cpu_has(X86_FEATURE_PTI))
+	/* Nothing more to do if PTI and ASI are completely off. */
+	if (!static_cpu_has(X86_FEATURE_PTI) && !static_asi_enabled()) {
+		VM_WARN_ON_ONCE(static_cpu_has(X86_FEATURE_PCID) &&
+				!(__default_kernel_pte_mask & _PAGE_GLOBAL));
 		return;
+	}

 	/*
 	 * See above. We need to propagate the flush to all other address
@@ -1289,6 +1292,16 @@ STATIC_NOPV void native_flush_tlb_local(void)
 	invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));

+	/*
+	 * Restricted ASI CR3 is unstable outside of critical section, so we
+	 * couldn't flush via a CR3 read/write. asi_exit() stabilizes it.
+	 * We don't expect any flushes in a critical section.
+	 */
+	if (WARN_ON(asi_in_critical_section()))
+		native_flush_tlb_global();
+	else
+		asi_exit();
+
 	/* If current->mm == NULL then the read_cr3() "borrows" an mm */
 	native_write_cr3(__native_read_cr3());
 }

From patchwork Fri Jan 10 18:40:48 2025
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 856324
Date: Fri, 10 Jan 2025 18:40:48 +0000
In-Reply-To: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
References: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
Message-ID: <20250110-asi-rfc-v2-v2-22-8419288bc805@google.com>
Subject: [PATCH RFC v2 22/29] mm: asi: exit ASI before accessing CR3 from C code where appropriate
From: Brendan Jackman
Cc: Yosry Ahmed

Because asi_exit()s can be triggered by NMIs, CR3 is unstable when in the ASI restricted address space. (Exception: code in the ASI critical section can treat it as stable, because if an asi_exit() occurs during an exception it will be undone before the critical section resumes). Code that accesses CR3 needs to become aware of this. Most importantly: if code reads CR3 and then writes a derived value back, and a concurrent asi_exit() occurred in between, the address space switch would be undone, which would totally break ASI. So, make sure that an asi_exit() is performed before accessing CR3.
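The read-then-write-back hazard described above can be illustrated with a tiny user-space model. This is not kernel code: the `cr3` variable and the two constants are invented stand-ins, and `asi_exit()` here only models an NMI handler switching back to the unrestricted address space.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: "cr3" stands in for the real control register. */
static const unsigned long UNRESTRICTED = 0x1000;
static const unsigned long RESTRICTED   = 0x2000;
static unsigned long cr3;

/* Models an NMI handler exiting ASI: load the unrestricted root. */
static void asi_exit(void) { cr3 = UNRESTRICTED; }

/* Buggy pattern: read CR3 and write the value back. If an NMI runs
 * asi_exit() between the read and the write, the write re-installs
 * the restricted root, silently undoing the exit. */
static unsigned long flush_tlb_buggy(bool nmi_in_between)
{
    unsigned long val = cr3;      /* may observe the restricted CR3 */
    if (nmi_in_between)
        asi_exit();               /* NMI handler exits ASI here */
    cr3 = val;                    /* undoes the asi_exit()! */
    return cr3;
}

/* Pattern after this patch: stabilize CR3 with asi_exit() first, so
 * the value read (and later written back) is always unrestricted. */
static unsigned long flush_tlb_fixed(bool nmi_in_between)
{
    asi_exit();                   /* CR3 is now stable */
    unsigned long val = cr3;
    if (nmi_in_between)
        asi_exit();               /* idempotent, nothing to undo */
    cr3 = val;
    return cr3;
}
```

Under this model, the buggy pattern ends with the restricted root loaded even though an exit happened in between, which is exactly the breakage the commit message warns about.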
Exceptions are made for cases that need to access the current CR3 value, restricted or not, without exiting ASI. (An alternative approach would be to enter an ASI critical section when a stable CR3 is needed. This would be worth exploring if the ASI exits introduced by this patch turned out to cause performance issues).

Add calls to asi_exit() to __native_read_cr3() and native_write_cr3(), and introduce 'raw' variants that do not perform an ASI exit. Introduce similar variants for the wrappers: __read_cr3(), read_cr3_pa(), and write_cr3(). A forward declaration of asi_exit() is added, because the one in asm-generic/asi.h is only declared when !CONFIG_ADDRESS_SPACE_ISOLATION, and arch/x86/asm/asi.h cannot be included either, as that would cause a circular dependency.

The 'raw' variants are used in the following cases:

- In __show_regs(), where the actual values of registers are dumped for debugging.

- In dump_pagetable() and show_fault_oops(), where the active page tables during a page fault are dumped for debugging.

- In switch_mm_verify_cr3() and cr3_matches_current_mm(), where the actual value of CR3 is needed for a debug check, and the code explicitly checks for an ASI-restricted CR3.

- In exc_page_fault() for ASI faults. The code is ASI-aware and explicitly performs an ASI exit before reading CR3.

- In load_new_mm_cr3(), where a new CR3 is loaded during context switching. At this point, it is guaranteed that ASI already exited. Calling asi_exit() at that point, where loaded_mm == LOADED_MM_SWITCHING, would cause the VM_BUG_ON in asi_exit() to fire mistakenly even though loaded_mm is never accessed.

- In __get_current_cr3_fast(), as it is called from an ASI critical section and the value is only used for debug checking. In nested_vmx_check_vmentry_hw(), do an explicit asi_exit() before calling __get_current_cr3_fast(), since in that case we are not in a critical section and do need a stable CR3 value.

- In __asi_enter() and asi_exit(), for obvious reasons.

- In vmx_set_constant_host_state(), when CR3 is initialized in the VMCS with the most likely value. Preemption is enabled, so once ASI supports context switching, exiting ASI here will not be reliable, as rescheduling may re-enter ASI. It doesn't matter if the wrong value of CR3 is used in this context: before entering the guest, ASI is either explicitly entered or exited, and CR3 is updated again in the VMCS if needed.

- In efi_5level_switch(), as it is called from startup_64_mixed_mode() during boot before ASI can be entered. startup_64_mixed_mode() is under arch/x86/boot/compressed/* and it cannot call asi_exit() anyway (see below).

Finally, code in arch/x86/boot/compressed/ident_map_64.c and arch/x86/boot/compressed/pgtable_64.c extensively accesses CR3 during boot. This code under arch/x86/boot/compressed/* cannot call asi_exit() due to restrictions on its compilation (it cannot use functions defined in .c files outside the directory). Instead of changing all CR3 accesses to use 'raw' variants, undefine CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION in these files. This makes the asi_exit() calls in the CR3 helpers use the noop variant defined in include/asm-generic/asi.h. This is fine because the code is executed early in boot, where asi_exit() would be a noop anyway.

With this change, the number of existing *_cr3() calls is 44, and the number of *_cr3_raw() calls is 22. The choice was made to have the existing functions exit ASI by default and to add new variants that do not exit ASI, because most call sites that use the new *_cr3_raw() variants are either ASI-aware code or debugging code. By contrast, code that uses the existing variants is usually in important code paths (e.g. TLB flushes) and is ignorant of ASI. Hence, new code is most likely to be correct and less risky by using the variants that exit ASI by default.
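The accessor split described above can be sketched with another small user-space toy. Again, this is not the actual kernel implementation (that lives in arch/x86/include/asm/special_insns.h); the `cr3` variable and constants are invented for illustration:

```c
#include <assert.h>

/* Toy model of the accessor split: read_cr3() exits ASI before
 * reading (stable result), read_cr3_raw() reports whatever root is
 * currently loaded (possibly the restricted one). */
static const unsigned long UNRESTRICTED = 0x1000;
static const unsigned long RESTRICTED   = 0x2000;
static unsigned long cr3;

/* Models exiting ASI: load the unrestricted root. */
static void asi_exit(void) { cr3 = UNRESTRICTED; }

/* Default variant: safe for ASI-ignorant callers, at the cost of a
 * (potentially expensive) ASI exit. */
static unsigned long read_cr3(void)
{
    asi_exit();
    return cr3;
}

/* Raw variant: no side effects; reserved for debug dumps and code
 * that is explicitly ASI-aware. */
static unsigned long read_cr3_raw(void)
{
    return cr3;
}
```

This mirrors the design argument in the commit message: the safe behaviour is the default, and the side-effect-free variant is the one that has to be asked for by name.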
Signed-off-by: Yosry Ahmed Signed-off-by: Brendan Jackman --- arch/x86/Kconfig | 2 +- arch/x86/boot/compressed/ident_map_64.c | 10 ++++++++ arch/x86/boot/compressed/pgtable_64.c | 11 +++++++++ arch/x86/include/asm/processor.h | 5 ++++ arch/x86/include/asm/special_insns.h | 41 +++++++++++++++++++++++++++++++-- arch/x86/kernel/process_32.c | 2 +- arch/x86/kernel/process_64.c | 2 +- arch/x86/kvm/vmx/nested.c | 6 +++++ arch/x86/kvm/vmx/vmx.c | 8 ++++++- arch/x86/mm/asi.c | 4 ++-- arch/x86/mm/fault.c | 8 +++---- arch/x86/mm/tlb.c | 16 +++++++++---- arch/x86/virt/svm/sev.c | 2 +- drivers/firmware/efi/libstub/x86-5lvl.c | 2 +- include/asm-generic/asi.h | 1 + 15 files changed, 101 insertions(+), 19 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 1fcb52cb8cd5084ac3cef04af61b7d1653162bdb..ae31f36ce23d7c29d1e90b726c5a2e6ea5a63c8d 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2531,7 +2531,7 @@ config MITIGATION_ADDRESS_SPACE_ISOLATION The !PARAVIRT dependency is only because of lack of testing; in theory the code is written to work under paravirtualization. In practice there are likely to be unhandled cases, in particular concerning TLB - flushes. + flushes and CR3 manipulation. config ADDRESS_SPACE_ISOLATION_DEFAULT_ON diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c index dfb9c2deb77cfc4e9986976bf2fd1652666f8f15..957b6f818aec361191432b420b61ba6ae017cf6c 100644 --- a/arch/x86/boot/compressed/ident_map_64.c +++ b/arch/x86/boot/compressed/ident_map_64.c @@ -11,6 +11,16 @@ /* No MITIGATION_PAGE_TABLE_ISOLATION support needed either: */ #undef CONFIG_MITIGATION_PAGE_TABLE_ISOLATION +/* + * CR3 access helpers (e.g. write_cr3()) will call asi_exit() to exit the + * restricted address space first. We cannot call the version defined in + * arch/x86/mm/asi.c here, so make sure we always call the noop version in + * asm-generic/asi.h. 
It does not matter because early during boot asi_exit() + * would be a noop anyway. The alternative is spamming the code with *_raw() + * variants of the CR3 helpers. + */ +#undef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION + #include "error.h" #include "misc.h" diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c index c882e1f67af01c50a20bfe00a32138dc771ee88c..034ad7101126c19864cfacc7c363fd31fedecd2b 100644 --- a/arch/x86/boot/compressed/pgtable_64.c +++ b/arch/x86/boot/compressed/pgtable_64.c @@ -1,4 +1,15 @@ // SPDX-License-Identifier: GPL-2.0 + +/* + * CR3 access helpers (e.g. write_cr3()) will call asi_exit() to exit the + * restricted address space first. We cannot call the version defined in + * arch/x86/mm/asi.c here, so make sure we always call the noop version in + * asm-generic/asi.h. It does not matter because early during boot asi_exit() + * would be a noop anyway. The alternative is spamming the code with *_raw() + * variants of the CR3 helpers. 
+ */ +#undef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION + #include "misc.h" #include #include diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index a32a53405f45e4c0473fe081e216029cf5bd0cdd..9375a7f877d60e8f556dedefbe74593c1a5a6e10 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -226,6 +226,11 @@ static __always_inline unsigned long read_cr3_pa(void) return __read_cr3() & CR3_ADDR_MASK; } +static __always_inline unsigned long read_cr3_pa_raw(void) +{ + return __read_cr3_raw() & CR3_ADDR_MASK; +} + static inline unsigned long native_read_cr3_pa(void) { return __native_read_cr3() & CR3_ADDR_MASK; diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h index 6e103358966f6f1333aa07be97aec5f8af794120..1c886b3f04a56893b7408466a2c94d23f5d11857 100644 --- a/arch/x86/include/asm/special_insns.h +++ b/arch/x86/include/asm/special_insns.h @@ -5,6 +5,7 @@ #ifdef __KERNEL__ #include #include +#include #include #include @@ -42,18 +43,32 @@ static __always_inline void native_write_cr2(unsigned long val) asm volatile("mov %0,%%cr2": : "r" (val) : "memory"); } -static __always_inline unsigned long __native_read_cr3(void) +void asi_exit(void); + +static __always_inline unsigned long __native_read_cr3_raw(void) { unsigned long val; asm volatile("mov %%cr3,%0\n\t" : "=r" (val) : __FORCE_ORDER); return val; } -static __always_inline void native_write_cr3(unsigned long val) +static __always_inline unsigned long __native_read_cr3(void) +{ + asi_exit(); + return __native_read_cr3_raw(); +} + +static __always_inline void native_write_cr3_raw(unsigned long val) { asm volatile("mov %0,%%cr3": : "r" (val) : "memory"); } +static __always_inline void native_write_cr3(unsigned long val) +{ + asi_exit(); + native_write_cr3_raw(val); +} + static inline unsigned long native_read_cr4(void) { unsigned long val; @@ -152,17 +167,39 @@ static __always_inline void write_cr2(unsigned long x) /* * 
Careful! CR3 contains more than just an address. You probably want * read_cr3_pa() instead. + * + * The implementation interacts with ASI to ensure that the returned value is + * stable as long as preemption is disabled. */ static __always_inline unsigned long __read_cr3(void) { return __native_read_cr3(); } +/* + * The return value of this is unstable under ASI, even with preemption off. + * Call __read_cr3 instead unless you have a good reason not to. + */ +static __always_inline unsigned long __read_cr3_raw(void) +{ + return __native_read_cr3_raw(); +} + +/* This interacts with ASI like __read_cr3. */ static __always_inline void write_cr3(unsigned long x) { native_write_cr3(x); } +/* + * Like __read_cr3_raw, this doesn't interact with ASI. It's very unlikely that + * this should be called from outside ASI-specific code. + */ +static __always_inline void write_cr3_raw(unsigned long x) +{ + native_write_cr3_raw(x); +} + static inline void __write_cr4(unsigned long x) { native_write_cr4(x); diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c index 0917c7f25720be91372bacddb1a3032323b8996f..14828a867b713a50297953c5a0ccfd03d83debc0 100644 --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -79,7 +79,7 @@ void __show_regs(struct pt_regs *regs, enum show_regs_mode mode, cr0 = read_cr0(); cr2 = read_cr2(); - cr3 = __read_cr3(); + cr3 = __read_cr3_raw(); cr4 = __read_cr4(); printk("%sCR0: %08lx CR2: %08lx CR3: %08lx CR4: %08lx\n", log_lvl, cr0, cr2, cr3, cr4); diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 226472332a70dd02902f81c21207d770e698aeed..ca135731b54b7f5f1123c2b8b27afdca7b868bcc 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -113,7 +113,7 @@ void __show_regs(struct pt_regs *regs, enum show_regs_mode mode, cr0 = read_cr0(); cr2 = read_cr2(); - cr3 = __read_cr3(); + cr3 = __read_cr3_raw(); cr4 = __read_cr4(); printk("%sFS: %016lx(%04x) GS:%016lx(%04x) 
knlGS:%016lx\n", diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 931a7361c30f2da28073eb832efce0b378e3b325..7eb033dabe4a606947c4d7e5b96be2e42d8f2478 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -3214,6 +3214,12 @@ static int nested_vmx_check_vmentry_hw(struct kvm_vcpu *vcpu) */ vmcs_writel(GUEST_RFLAGS, 0); + /* + * Stabilize CR3 to ensure the VM Exit returns to the correct address + * space. This is costly, so we wouldn't do this in hot-path code. + */ + asi_exit(); + cr3 = __get_current_cr3_fast(); if (unlikely(cr3 != vmx->loaded_vmcs->host_state.cr3)) { vmcs_writel(HOST_CR3, cr3); diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 181d230b1c057fed33f7b29b7b0e378dbdfeb174..0e90463f1f2183b8d716f85d5c8a8af8958fef0b 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -4323,8 +4323,14 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx) /* * Save the most likely value for this task's CR3 in the VMCS. * We can't use __get_current_cr3_fast() because we're not atomic. + * + * Use __read_cr3_raw() to avoid exiting ASI if we are in the restricted + * address space. Preemption is enabled, so rescheduling could make us + * re-enter ASI anyway. It's okay to avoid exiting ASI here because + * vmx_vcpu_enter_exit() and nested_vmx_check_vmentry_hw() will + * explicitly enter or exit ASI and update CR3 in the VMCS if needed.
*/ - cr3 = __read_cr3(); + cr3 = __read_cr3_raw(); vmcs_writel(HOST_CR3, cr3); /* 22.2.3 FIXME: shadow tables */ vmx->loaded_vmcs->host_state.cr3 = cr3; diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index bc2cf0475a0e7344a66d81453f55034b2fc77eef..a9f9bfbf85eb47d16ef8d0bfbc7713f07052d3ed 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -488,7 +488,7 @@ noinstr void __asi_enter(void) pcid = asi_pcid(target, this_cpu_read(cpu_tlbstate.loaded_mm_asid)); asi_cr3 = build_cr3_pcid_noinstr(target->pgd, pcid, tlbstate_lam_cr3_mask(), false); - write_cr3(asi_cr3); + write_cr3_raw(asi_cr3); maybe_flush_data(target); /* @@ -559,7 +559,7 @@ noinstr void asi_exit(void) /* Tainting first makes reentrancy easier to reason about. */ this_cpu_or(asi_taints, ASI_TAINT_KERNEL_DATA); - write_cr3(unrestricted_cr3); + write_cr3_raw(unrestricted_cr3); /* * Must not update curr_asi until after CR3 write, otherwise a * re-entrant call might not enter this branch. (This means we diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index ee8f5417174e2956391d538f41e2475553ca4972..ca48e4f5a27be30ff93d1c3d194aad23d99ae43c 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -295,7 +295,7 @@ static bool low_pfn(unsigned long pfn) static void dump_pagetable(unsigned long address) { - pgd_t *base = __va(read_cr3_pa()); + pgd_t *base = __va(read_cr3_pa_raw()); pgd_t *pgd = &base[pgd_index(address)]; p4d_t *p4d; pud_t *pud; @@ -351,7 +351,7 @@ static int bad_address(void *p) static void dump_pagetable(unsigned long address) { - pgd_t *base = __va(read_cr3_pa()); + pgd_t *base = __va(read_cr3_pa_raw()); pgd_t *pgd = base + pgd_index(address); p4d_t *p4d; pud_t *pud; @@ -519,7 +519,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad pgd_t *pgd; pte_t *pte; - pgd = __va(read_cr3_pa()); + pgd = __va(read_cr3_pa_raw()); pgd += pgd_index(address); pte = lookup_address_in_pgd_attr(pgd, address, &level, &nx, &rw); @@ -1578,7 +1578,7 @@ 
DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault) * be losing some stats here. However, for now this keeps ASI * page faults nice and fast. */ - pgd = (pgd_t *)__va(read_cr3_pa()) + pgd_index(address); + pgd = (pgd_t *)__va(read_cr3_pa_raw()) + pgd_index(address); if (!user_mode(regs) && kernel_access_ok(error_code, address, pgd)) { warn_if_bad_asi_pf(error_code, address); return; diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 07b1657bee8e4cf17452ea57c838823e76f482c0..0c9f477a44a4da971cb7744d01d9101900ead1a5 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -331,8 +331,14 @@ static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, unsigned long lam, * Caution: many callers of this function expect * that load_cr3() is serializing and orders TLB * fills with respect to the mm_cpumask writes. + * + * The context switching code will explicitly exit ASI when needed, so do + not use write_cr3(), which has an implicit ASI exit. Calling + asi_exit() here, where loaded_mm == LOADED_MM_SWITCHING, will cause + the VM_BUG_ON() in asi_exit() to fire mistakenly even though + loaded_mm is never accessed. */ - write_cr3(new_mm_cr3); + write_cr3_raw(new_mm_cr3); } void leave_mm(void) @@ -559,11 +565,11 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next, * without going through leave_mm() / switch_mm_irqs_off() or that * does something like write_cr3(read_cr3_pa()). * - * Only do this check if CONFIG_DEBUG_VM=y because __read_cr3() + * Only do this check if CONFIG_DEBUG_VM=y because __read_cr3_raw() * isn't free.
*/ #ifdef CONFIG_DEBUG_VM - if (WARN_ON_ONCE(__read_cr3() != build_cr3(prev->pgd, prev_asid, + if (WARN_ON_ONCE(__read_cr3_raw() != build_cr3(prev->pgd, prev_asid, tlbstate_lam_cr3_mask()))) { /* * If we were to BUG here, we'd be very likely to kill @@ -1173,7 +1179,7 @@ noinstr unsigned long __get_current_cr3_fast(void) */ VM_WARN_ON_ONCE(asi && asi_in_critical_section()); - VM_BUG_ON(cr3 != __read_cr3()); + VM_BUG_ON(cr3 != __read_cr3_raw()); return cr3; } EXPORT_SYMBOL_GPL(__get_current_cr3_fast); @@ -1373,7 +1379,7 @@ static inline bool cr3_matches_current_mm(void) * find a current ASI domain. */ barrier(); - pgd_cr3 = __va(read_cr3_pa()); + pgd_cr3 = __va(read_cr3_pa_raw()); return pgd_cr3 == current->mm->pgd || pgd_cr3 == pgd_asi; } diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c index 9a6a943d8e410c0289200adb9deafe8e45d33a4b..63d391395a5c7f4ddec28116814ccd6c52bbb428 100644 --- a/arch/x86/virt/svm/sev.c +++ b/arch/x86/virt/svm/sev.c @@ -379,7 +379,7 @@ void snp_dump_hva_rmpentry(unsigned long hva) pgd_t *pgd; pte_t *pte; - pgd = __va(read_cr3_pa()); + pgd = __va(read_cr3_pa_raw()); pgd += pgd_index(hva); pte = lookup_address_in_pgd(pgd, hva, &level); diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c index 77359e802181fd82b6a624cf74183e6a318cec9b..3b97a5aea983a109fbdc6d23a219e4a04024c512 100644 --- a/drivers/firmware/efi/libstub/x86-5lvl.c +++ b/drivers/firmware/efi/libstub/x86-5lvl.c @@ -66,7 +66,7 @@ void efi_5level_switch(void) bool have_la57 = native_read_cr4() & X86_CR4_LA57; bool need_toggle = want_la57 ^ have_la57; u64 *pgt = (void *)la57_toggle + PAGE_SIZE; - u64 *cr3 = (u64 *)__native_read_cr3(); + u64 *cr3 = (u64 *)__native_read_cr3_raw(); u64 *new_cr3; if (!la57_toggle || !need_toggle) diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index 7867b8c23449058a1dd06308ab5351e0d210a489..4f033d3ef5929707fd280f74fc800193e45143c1 100644 --- a/include/asm-generic/asi.h +++ 
b/include/asm-generic/asi.h @@ -71,6 +71,7 @@ static inline pgd_t *asi_pgd(struct asi *asi) { return NULL; } static inline void asi_handle_switch_mm(void) { } +struct thread_struct; static inline void asi_init_thread_state(struct thread_struct *thread) { } static inline void asi_intr_enter(void) { } From patchwork Fri Jan 10 18:40:51 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brendan Jackman X-Patchwork-Id: 856325
Date: Fri, 10 Jan 2025 18:40:51 +0000 In-Reply-To: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com> Message-ID: <20250110-asi-rfc-v2-v2-25-8419288bc805@google.com> Subject: [PATCH RFC v2 25/29] mm: asi: Restricted execution for bare-metal processes From: Brendan Jackman To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Andy Lutomirski , Peter Zijlstra , Richard Henderson , Matt Turner , Vineet Gupta , Russell King , Catalin Marinas , Will Deacon , Guo Ren , Brian Cain , Huacai Chen , WANG Xuerui , Geert Uytterhoeven , Michal Simek , Thomas Bogendoerfer , Dinh Nguyen , Jonas Bonn , Stefan Kristiansson , Stafford Horne , "James E.J. Bottomley" , Helge Deller , Michael Ellerman , Nicholas Piggin , Christophe Leroy , Naveen N Rao , Madhavan Srinivasan , Paul Walmsley , Palmer Dabbelt , Albert Ou , Heiko Carstens , Vasily Gorbik , Alexander Gordeev , Christian Borntraeger , Sven Schnelle , Yoshinori Sato , Rich Felker , John Paul Adrian Glaubitz , "David S.
Miller" , Andreas Larsson , Richard Weinberger , Anton Ivanov , Johannes Berg , Chris Zankel , Max Filippov , Arnd Bergmann , Andrew Morton , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Uladzislau Rezki , Christoph Hellwig , Masami Hiramatsu , Mathieu Desnoyers , Mike Rapoport , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Dennis Zhou , Tejun Heo , Christoph Lameter , Sean Christopherson , Paolo Bonzini , Ard Biesheuvel , Josh Poimboeuf , Pawan Gupta Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, loongarch@lists.linux.dev, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-um@lists.infradead.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, kvm@vger.kernel.org, linux-efi@vger.kernel.org, Brendan Jackman Now userspace gets a restricted address space too. The critical section begins on exit to userspace and ends when it makes a system call. Other entries from userspace just interrupt the critical section via asi_intr_enter(). The reason why system calls have to actually asi_relax() (i.e. fully terminate the critical section instead of just interrupting it) is that system calls are the type of kernel entry that can lead to transition into a _different_ ASI domain, namely the KVM one: it is not supported to transition into a different domain while a critical section exists (i.e. while asi_state.target is not NULL), even if it has been paused by asi_intr_enter() (i.e. 
even if asi_state.intr_nest_depth is nonzero) - there must be an asi_relax() between any two asi_enter()s. The restricted address space for bare-metal tasks naturally contains the entire userspace address region, although the task's own memory is still missing from the direct map. This implementation creates new userspace-specific APIs for asi_init(), asi_destroy() and asi_enter(), which seems a little ugly; maybe this suggests a general rework of these APIs, given that the "generic" version only has one caller. For RFC code this seems good enough though. Signed-off-by: Brendan Jackman --- arch/x86/include/asm/asi.h | 8 ++++++-- arch/x86/mm/asi.c | 49 ++++++++++++++++++++++++++++++++++++++++---- include/asm-generic/asi.h | 9 +++++++- include/linux/entry-common.h | 11 ++++++++++ init/main.c | 2 ++ kernel/entry/common.c | 1 + kernel/fork.c | 4 +++- 7 files changed, 76 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index e925d7d2cfc85bca8480c837548654e7a5a7009e..c3c1a57f0147ae9bd11d89c8bf7c8a4477728f51 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -140,19 +140,23 @@ DECLARE_PER_CPU_ALIGNED(struct asi *, curr_asi); void asi_check_boottime_disable(void); -void asi_init_mm_state(struct mm_struct *mm); +int asi_init_mm_state(struct mm_struct *mm); int asi_init_class(enum asi_class_id class_id, struct asi_taint_policy *taint_policy); +void asi_init_userspace_class(void); void asi_uninit_class(enum asi_class_id class_id); const char *asi_class_name(enum asi_class_id class_id); int asi_init(struct mm_struct *mm, enum asi_class_id class_id, struct asi **out_asi); void asi_destroy(struct asi *asi); +void asi_destroy_userspace(struct mm_struct *mm); void asi_clone_user_pgtbl(struct mm_struct *mm, pgd_t *pgdp); /* Enter an ASI domain (restricted address space) and begin the critical section.
*/ void asi_enter(struct asi *asi); +void asi_enter_userspace(void); + /* * Leave the "tense" state if we are in it, i.e. end the critical section. We * will stay relaxed until the next asi_enter. @@ -294,7 +298,7 @@ void asi_handle_switch_mm(void); */ static inline bool asi_maps_user_addr(enum asi_class_id class_id) { - return false; + return class_id == ASI_CLASS_USERSPACE; } #endif /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */ diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 093103c1bc2677c81d68008aca064fab53b73a62..1e9dc568e79e8686a4dbf47f765f2c2535d025ec 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -25,6 +25,7 @@ const char *asi_class_names[] = { #if IS_ENABLED(CONFIG_KVM) [ASI_CLASS_KVM] = "KVM", #endif + [ASI_CLASS_USERSPACE] = "userspace", }; DEFINE_PER_CPU_ALIGNED(struct asi *, curr_asi); @@ -67,6 +68,32 @@ int asi_init_class(enum asi_class_id class_id, struct asi_taint_policy *taint_po } EXPORT_SYMBOL_GPL(asi_init_class); +void __init asi_init_userspace_class(void) +{ + static struct asi_taint_policy policy = { + /* + * Prevent going to userspace with sensitive data potentially + * left in sidechannels by code running in the unrestricted + * address space, or another MM. Note we don't check for guest + * data here. This reflects the assumption that the guest trusts + * its VMM (absent fancy HW features, which are orthogonal). + */ + .protect_data = ASI_TAINT_KERNEL_DATA | ASI_TAINT_OTHER_MM_DATA, + /* + * Don't go into userspace with control flow state controlled by + * other processes, or any KVM guest the process is running. + * Note this bit is about protecting userspace from other parts + * of the system, while data_taints is about protecting other + * parts of the system from the guest. 
+ */ + .prevent_control = ASI_TAINT_GUEST_CONTROL | ASI_TAINT_OTHER_MM_CONTROL, + .set = ASI_TAINT_USER_CONTROL | ASI_TAINT_USER_DATA, + }; + int err = asi_init_class(ASI_CLASS_USERSPACE, &policy); + + WARN_ON(err); +} + void asi_uninit_class(enum asi_class_id class_id) { if (!boot_cpu_has(X86_FEATURE_ASI)) @@ -385,7 +412,8 @@ int asi_init(struct mm_struct *mm, enum asi_class_id class_id, struct asi **out_ int err = 0; uint i; - *out_asi = NULL; + if (out_asi) + *out_asi = NULL; if (!boot_cpu_has(X86_FEATURE_ASI)) return 0; @@ -424,7 +452,7 @@ int asi_init(struct mm_struct *mm, enum asi_class_id class_id, struct asi **out_ exit_unlock: if (err) __asi_destroy(asi); - else + else if (out_asi) *out_asi = asi; __asi_init_user_pgds(mm, asi); @@ -515,6 +543,12 @@ static __always_inline void maybe_flush_data(struct asi *next_asi) this_cpu_and(asi_taints, ~ASI_TAINTS_DATA_MASK); } +void asi_destroy_userspace(struct mm_struct *mm) +{ + VM_BUG_ON(!asi_class_initialized(ASI_CLASS_USERSPACE)); + asi_destroy(&mm->asi[ASI_CLASS_USERSPACE]); +} + noinstr void __asi_enter(void) { u64 asi_cr3; @@ -584,6 +618,11 @@ noinstr void asi_enter(struct asi *asi) } EXPORT_SYMBOL_GPL(asi_enter); +noinstr void asi_enter_userspace(void) +{ + asi_enter(¤t->mm->asi[ASI_CLASS_USERSPACE]); +} + noinstr void asi_relax(void) { if (static_asi_enabled()) { @@ -633,13 +672,15 @@ noinstr void asi_exit(void) } EXPORT_SYMBOL_GPL(asi_exit); -void asi_init_mm_state(struct mm_struct *mm) +int asi_init_mm_state(struct mm_struct *mm) { if (!boot_cpu_has(X86_FEATURE_ASI)) - return; + return 0; memset(mm->asi, 0, sizeof(mm->asi)); mutex_init(&mm->asi_init_lock); + + return asi_init(mm, ASI_CLASS_USERSPACE, NULL); } void asi_handle_switch_mm(void) diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index d103343292fad567dcd73e45e986fb3974e59898..c93f9e779ce1fa61e3df7835f5ab744cce7d667b 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -15,6 +15,7 @@ enum asi_class_id { #if 
IS_ENABLED(CONFIG_KVM) ASI_CLASS_KVM, #endif + ASI_CLASS_USERSPACE, ASI_MAX_NUM_CLASSES, }; static_assert(order_base_2(X86_CR3_ASI_PCID_BITS) <= ASI_MAX_NUM_CLASSES); @@ -37,8 +38,10 @@ int asi_init_class(enum asi_class_id class_id, static inline void asi_uninit_class(enum asi_class_id class_id) { } +static inline void asi_init_userspace_class(void) { } + struct mm_struct; -static inline void asi_init_mm_state(struct mm_struct *mm) { } +static inline int asi_init_mm_state(struct mm_struct *mm) { return 0; } static inline int asi_init(struct mm_struct *mm, enum asi_class_id class_id, struct asi **out_asi) @@ -48,8 +51,12 @@ static inline int asi_init(struct mm_struct *mm, enum asi_class_id class_id, static inline void asi_destroy(struct asi *asi) { } +static inline void asi_destroy_userspace(struct mm_struct *mm) { } + static inline void asi_enter(struct asi *asi) { } +static inline void asi_enter_userspace(void) { } + static inline void asi_relax(void) { } static inline bool asi_is_relaxed(void) { return true; } diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index 1e50cdb83ae501467ecc30ee52f1379d409f962e..f04c4c038556f84ddf3bc09b6c1dd22a9dbd2f6b 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -191,6 +191,16 @@ static __always_inline long syscall_enter_from_user_mode(struct pt_regs *regs, l { long ret; + /* + * End the ASI critical section for userspace. Syscalls are the only + * place this happens - all other entry from userspace is handled via + * ASI's interrupt-tracking. The reason syscalls are special is that they + * are where it's possible to switch to another ASI domain within the same + * task (i.e. KVM_RUN); an asi_relax() is required here in case of an + * upcoming asi_enter().
+ */ + asi_relax(); + enter_from_user_mode(regs); instrumentation_begin(); @@ -355,6 +365,7 @@ static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs) */ static __always_inline void exit_to_user_mode(void) { + instrumentation_begin(); trace_hardirqs_on_prepare(); lockdep_hardirqs_on_prepare(); diff --git a/init/main.c b/init/main.c index c4778edae7972f512d5eefe8400075ac35a70d1c..d19e149d385e8321d2f3e7c28aa75802af62d09c 100644 --- a/init/main.c +++ b/init/main.c @@ -953,6 +953,8 @@ void start_kernel(void) /* Architectural and non-timekeeping rng init, before allocator init */ random_init_early(command_line); + asi_init_userspace_class(); + /* * These use large bootmem allocations and must precede * initalization of page allocator diff --git a/kernel/entry/common.c b/kernel/entry/common.c index 5b6934e23c21d36a3238dc03e391eb9e3beb4cfb..874254ed5958d62eaeaef4fe3e8c02e56deaf5ed 100644 --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -218,6 +218,7 @@ __visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs) __syscall_exit_to_user_mode_work(regs); instrumentation_end(); exit_to_user_mode(); + asi_enter_userspace(); } noinstr void irqentry_enter_from_user_mode(struct pt_regs *regs) diff --git a/kernel/fork.c b/kernel/fork.c index bb73758790d08112265d398b16902ff9a4c2b8fe..54068d2415939b92409ca8a45111176783c6acbd 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -917,6 +917,7 @@ void __mmdrop(struct mm_struct *mm) /* Ensure no CPUs are using this as their lazy tlb mm */ cleanup_lazy_tlbs(mm); + asi_destroy_userspace(mm); WARN_ON_ONCE(mm == current->active_mm); mm_free_pgd(mm); destroy_context(mm); @@ -1297,7 +1298,8 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, if (mm_alloc_pgd(mm)) goto fail_nopgd; - asi_init_mm_state(mm); + if (asi_init_mm_state(mm)) + goto fail_nocontext; if (init_new_context(p, mm)) goto fail_nocontext; From patchwork Fri Jan 10 18:40:52 2025 Content-Type: text/plain; 
charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brendan Jackman X-Patchwork-Id: 856326 Date: Fri, 10 Jan 2025
18:40:52 +0000 In-Reply-To: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com> Message-ID: <20250110-asi-rfc-v2-v2-26-8419288bc805@google.com> Subject: [PATCH RFC v2 26/29] x86: Create library for flushing L1D for L1TF From: Brendan Jackman ASI will need to use this L1D flushing logic so put it in a library where it can be used independently of KVM. Since we're creating this library, it starts to look messy if we don't also use it in the double-opt-in (both kernel cmdline and prctl) mm-switching flush logic which is there for mitigating Snoop-Assisted L1 Data Sampling ("SAL1DS"). However, that logic doesn't use any software-based fallback for flushing on CPUs without the L1D_FLUSH command. In that case the prctl opt-in will fail. One option would be to just start using the software fallback sequence currently done by VMX code, but Linus didn't seem happy with a similar sequence being used here [1].
CPUs affected by SAL1DS are a subset of those affected by L1TF, so it wouldn't be completely insane to assume that the same sequence works for both cases, but I'll err on the side of caution and avoid risk of giving users a false impression that the kernel has really flushed L1D for them. [1] https://lore.kernel.org/linux-kernel/CAHk-=whC4PUhErcoDhCbTOdmPPy-Pj8j9ytsdcyz9TorOb4KUw@mail.gmail.com/ Instead, create this awkward library that is scoped specifically to L1TF, which will be used only by VMX and ASI, and has an annoying "only sometimes works" doc-comment. Users of the library can then infer from that comment whether they have flushed L1D. No functional change intended. Checkpatch-args: --ignore=COMMIT_LOG_LONG_LINE Signed-off-by: Brendan Jackman --- arch/x86/Kconfig | 4 ++ arch/x86/include/asm/l1tf.h | 11 ++++++ arch/x86/kvm/Kconfig | 1 + arch/x86/kvm/vmx/vmx.c | 66 +++---------------------------- arch/x86/lib/Makefile | 1 + arch/x86/lib/l1tf.c | 94 +++++++++++++++++++++++++++++++++++++++++++++ 6 files changed, 117 insertions(+), 60 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index ae31f36ce23d7c29d1e90b726c5a2e6ea5a63c8d..ca984dc7ee2f2b68c3ce1bcb5055047ca4f2a65d 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2523,6 +2523,7 @@ config MITIGATION_ADDRESS_SPACE_ISOLATION bool "Allow code to run with a reduced kernel address space" default n depends on X86_64 && !PARAVIRT && !UML + select X86_L1TF_FLUSH_LIB help This feature provides the ability to run some kernel code with a reduced kernel address space. 
 	  This can be used to
@@ -3201,6 +3202,9 @@ config HAVE_ATOMIC_IOMAP
 	def_bool y
 	depends on X86_32

+config X86_L1TF_FLUSH_LIB
+	def_bool n
+
 source "arch/x86/kvm/Kconfig"

 source "arch/x86/Kconfig.assembler"
diff --git a/arch/x86/include/asm/l1tf.h b/arch/x86/include/asm/l1tf.h
new file mode 100644
index 0000000000000000000000000000000000000000..e0be19c588bb5ec5c76a1861492e48b88615b4b8
--- /dev/null
+++ b/arch/x86/include/asm/l1tf.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_L1TF_FLUSH_H
+#define _ASM_L1TF_FLUSH_H
+
+#ifdef CONFIG_X86_L1TF_FLUSH_LIB
+int l1tf_flush_setup(void);
+void l1tf_flush(void);
+#endif /* CONFIG_X86_L1TF_FLUSH_LIB */
+
+#endif
+
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index f09f13c01c6bbd28fa37fdf50547abf4403658c9..81c71510e33e52447882ab7b22682199c57b492e 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -92,6 +92,7 @@ config KVM_SW_PROTECTED_VM
 config KVM_INTEL
 	tristate "KVM for Intel (and compatible) processors support"
 	depends on KVM && IA32_FEAT_CTL
+	select X86_L1TF_FLUSH_LIB
 	help
 	  Provides support for KVM on processors equipped with Intel's VT
 	  extensions, a.k.a. Virtual Machine Extensions (VMX).
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0e90463f1f2183b8d716f85d5c8a8af8958fef0b..b1a02f27b3abce0ef6ac448b66bef2c653a52eef 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -42,6 +42,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -250,9 +251,6 @@ static void *vmx_l1d_flush_pages;

 static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
 {
-	struct page *page;
-	unsigned int i;
-
 	if (!boot_cpu_has_bug(X86_BUG_L1TF)) {
 		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
 		return 0;
@@ -288,26 +286,11 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
 		l1tf = VMENTER_L1D_FLUSH_ALWAYS;
 	}

-	if (l1tf != VMENTER_L1D_FLUSH_NEVER && !vmx_l1d_flush_pages &&
-	    !boot_cpu_has(X86_FEATURE_FLUSH_L1D)) {
-		/*
-		 * This allocation for vmx_l1d_flush_pages is not tied to a VM
-		 * lifetime and so should not be charged to a memcg.
-		 */
-		page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER);
-		if (!page)
-			return -ENOMEM;
-		vmx_l1d_flush_pages = page_address(page);
+	if (l1tf != VMENTER_L1D_FLUSH_NEVER) {
+		int err = l1tf_flush_setup();

-		/*
-		 * Initialize each page with a different pattern in
-		 * order to protect against KSM in the nested
-		 * virtualization case.
-		 */
-		for (i = 0; i < 1u << L1D_CACHE_ORDER; ++i) {
-			memset(vmx_l1d_flush_pages + i * PAGE_SIZE, i + 1,
-			       PAGE_SIZE);
-		}
+		if (err)
+			return err;
 	}

 	l1tf_vmx_mitigation = l1tf;
@@ -6652,20 +6635,8 @@ int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 	return ret;
 }

-/*
- * Software based L1D cache flush which is used when microcode providing
- * the cache control MSR is not loaded.
- *
- * The L1D cache is 32 KiB on Nehalem and later microarchitectures, but to
- * flush it is required to read in 64 KiB because the replacement algorithm
- * is not exactly LRU. This could be sized at runtime via topology
- * information but as all relevant affected CPUs have 32KiB L1D cache size
- * there is no point in doing so.
- */
 static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
 {
-	int size = PAGE_SIZE << L1D_CACHE_ORDER;
-
 	/*
 	 * This code is only executed when the flush mode is 'cond' or
 	 * 'always'
@@ -6695,32 +6666,7 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)

 	vcpu->stat.l1d_flush++;

-	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
-		native_wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
-		return;
-	}
-
-	asm volatile(
-		/* First ensure the pages are in the TLB */
-		"xorl %%eax, %%eax\n"
-		".Lpopulate_tlb:\n\t"
-		"movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
-		"addl $4096, %%eax\n\t"
-		"cmpl %%eax, %[size]\n\t"
-		"jne .Lpopulate_tlb\n\t"
-		"xorl %%eax, %%eax\n\t"
-		"cpuid\n\t"
-		/* Now fill the cache */
-		"xorl %%eax, %%eax\n"
-		".Lfill_cache:\n"
-		"movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
-		"addl $64, %%eax\n\t"
-		"cmpl %%eax, %[size]\n\t"
-		"jne .Lfill_cache\n\t"
-		"lfence\n"
-		:: [flush_pages] "r" (vmx_l1d_flush_pages),
-		   [size] "r" (size)
-		: "eax", "ebx", "ecx", "edx");
+	l1tf_flush();
 }

 void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 98583a9dbab337e09a2e58905e5200499a496a07..b0a45bd70b40743a3fccb352b9641caacac83275 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -37,6 +37,7 @@ lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o
 lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
 lib-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
 lib-$(CONFIG_MITIGATION_RETPOLINE) += retpoline.o
+lib-$(CONFIG_X86_L1TF_FLUSH_LIB) += l1tf.o

 obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
 obj-y += iomem.o
diff --git a/arch/x86/lib/l1tf.c b/arch/x86/lib/l1tf.c
new file mode 100644
index 0000000000000000000000000000000000000000..c474f18ae331c8dfa7a029c457dd3cf75bebf808
--- /dev/null
+++ b/arch/x86/lib/l1tf.c
@@ -0,0 +1,94 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+#define L1D_CACHE_ORDER 4
+static void *l1tf_flush_pages;
+
+int l1tf_flush_setup(void)
+{
+	struct page *page;
+	unsigned int i;
+
+	if (l1tf_flush_pages || boot_cpu_has(X86_FEATURE_FLUSH_L1D))
+		return 0;
+
+	page = alloc_pages(GFP_KERNEL, L1D_CACHE_ORDER);
+	if (!page)
+		return -ENOMEM;
+	l1tf_flush_pages = page_address(page);
+
+	/*
+	 * Initialize each page with a different pattern in
+	 * order to protect against KSM in the nested
+	 * virtualization case.
+	 */
+	for (i = 0; i < 1u << L1D_CACHE_ORDER; ++i) {
+		memset(l1tf_flush_pages + i * PAGE_SIZE, i + 1,
+		       PAGE_SIZE);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(l1tf_flush_setup);
+
+/*
+ * Flush L1D in a way that:
+ *
+ * - definitely works on CPUs with X86_FEATURE_FLUSH_L1D (because the SDM
+ *   says so).
+ * - almost definitely works on other CPUs with L1TF (because someone on LKML
+ *   said someone from Intel said so).
+ * - may or may not work on other CPUs.
+ *
+ * Don't call unless l1tf_flush_setup() has returned successfully.
+ */
+noinstr void l1tf_flush(void)
+{
+	int size = PAGE_SIZE << L1D_CACHE_ORDER;
+
+	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
+		native_wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
+		return;
+	}
+
+	if (WARN_ON(!l1tf_flush_pages))
+		return;
+
+	/*
+	 * This sequence was provided by Intel for the purpose of mitigating
+	 * L1TF on VMX.
+	 *
+	 * The L1D cache is 32 KiB on Nehalem and some later microarchitectures,
+	 * but to flush it is required to read in 64 KiB because the replacement
+	 * algorithm is not exactly LRU. This could be sized at runtime via
+	 * topology information but as all relevant affected CPUs have 32KiB L1D
+	 * cache size there is no point in doing so.
+	 */
+	asm volatile(
+		/* First ensure the pages are in the TLB */
+		"xorl %%eax, %%eax\n"
+		".Lpopulate_tlb:\n\t"
+		"movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
+		"addl $4096, %%eax\n\t"
+		"cmpl %%eax, %[size]\n\t"
+		"jne .Lpopulate_tlb\n\t"
+		"xorl %%eax, %%eax\n\t"
+		"cpuid\n\t"
+		/* Now fill the cache */
+		"xorl %%eax, %%eax\n"
+		".Lfill_cache:\n"
+		"movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
+		"addl $64, %%eax\n\t"
+		"cmpl %%eax, %[size]\n\t"
+		"jne .Lfill_cache\n\t"
+		"lfence\n"
+		:: [flush_pages] "r" (l1tf_flush_pages),
+		   [size] "r" (size)
+		: "eax", "ebx", "ecx", "edx");
+}
+EXPORT_SYMBOL(l1tf_flush);

From patchwork Fri Jan 10 18:40:54 2025
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 856322
Date: Fri, 10 Jan 2025 18:40:54 +0000
In-Reply-To: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
Precedence: bulk
X-Mailing-List: linux-efi@vger.kernel.org
References: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
X-Mailer: b4 0.15-dev
Message-ID: <20250110-asi-rfc-v2-v2-28-8419288bc805@google.com>
Subject: [PATCH RFC v2 28/29] x86/pti: Disable PTI when ASI is on
From: Brendan Jackman
To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Andy Lutomirski , Peter Zijlstra , Richard Henderson , Matt Turner , Vineet Gupta , Russell King , Catalin Marinas , Will Deacon , Guo Ren , Brian Cain , Huacai Chen , WANG Xuerui , Geert Uytterhoeven , Michal Simek , Thomas Bogendoerfer , Dinh Nguyen , Jonas Bonn , Stefan Kristiansson , Stafford Horne , "James E.J. Bottomley" , Helge Deller , Michael Ellerman , Nicholas Piggin , Christophe Leroy , Naveen N Rao , Madhavan Srinivasan , Paul Walmsley , Palmer Dabbelt , Albert Ou , Heiko Carstens , Vasily Gorbik , Alexander Gordeev , Christian Borntraeger , Sven Schnelle , Yoshinori Sato , Rich Felker , John Paul Adrian Glaubitz , "David S.
Miller" , Andreas Larsson , Richard Weinberger , Anton Ivanov , Johannes Berg , Chris Zankel , Max Filippov , Arnd Bergmann , Andrew Morton , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Uladzislau Rezki , Christoph Hellwig , Masami Hiramatsu , Mathieu Desnoyers , Mike Rapoport , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Dennis Zhou , Tejun Heo , Christoph Lameter , Sean Christopherson , Paolo Bonzini , Ard Biesheuvel , Josh Poimboeuf , Pawan Gupta
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, loongarch@lists.linux.dev, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-um@lists.infradead.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, kvm@vger.kernel.org, linux-efi@vger.kernel.org, Brendan Jackman

Now that ASI has support for sandboxing userspace, PTI is no longer needed when ASI is on: although userspace now has much more mapped than it would under KPTI, in theory none of that data is important to protect.

Note that one particular impact of this is that it makes locally defeating KASLR easier. I don't think this is a great loss given [1] etc.

Why do we pass in an argument instead of just having pti_check_boottime_disable() check boot_cpu_has(X86_FEATURE_ASI)? Just for clarity: I wanted it to be at least _sort of_ visible that it would break if you reordered asi_check_boottime_disable() afterwards.
[1]: https://gruss.cc/files/prefetch.pdf and https://dl.acm.org/doi/pdf/10.1145/3623652.3623669

Signed-off-by: Brendan Jackman
---
 arch/x86/include/asm/pti.h |  6 ++++--
 arch/x86/mm/init.c         |  2 +-
 arch/x86/mm/pti.c          | 14 +++++++++++++-
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/pti.h b/arch/x86/include/asm/pti.h
index ab167c96b9ab474b33d778453db0bb550f42b0ac..79b9ba927db9b76ac3cc72cdda6f8b5fc413d352 100644
--- a/arch/x86/include/asm/pti.h
+++ b/arch/x86/include/asm/pti.h
@@ -3,12 +3,14 @@
 #define _ASM_X86_PTI_H
 #ifndef __ASSEMBLY__
+#include
+
 #ifdef CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
 extern void pti_init(void);
-extern void pti_check_boottime_disable(void);
+extern void pti_check_boottime_disable(bool asi_enabled);
 extern void pti_finalize(void);
 #else
-static inline void pti_check_boottime_disable(void) { }
+static inline void pti_check_boottime_disable(bool asi_enabled) { }
 #endif
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index ded3a47f2a9c1f554824d4ad19f3b48bce271274..4ccf6d60705652805342abefc5e71cd00c563207 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -754,8 +754,8 @@ void __init init_mem_mapping(void)
 {
 	unsigned long end;

-	pti_check_boottime_disable();
 	asi_check_boottime_disable();
+	pti_check_boottime_disable(boot_cpu_has(X86_FEATURE_ASI));

 	probe_page_size_mask();
 	setup_pcid();
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 851ec8f1363a8b389ea4579cc68bf3300a4df27c..b7132080d3c9b6962a0252383190335e171bafa6 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -76,7 +76,7 @@ static enum pti_mode {
 	PTI_FORCE_ON
 } pti_mode;

-void __init pti_check_boottime_disable(void)
+void __init pti_check_boottime_disable(bool asi_enabled)
 {
 	if (hypervisor_is_type(X86_HYPER_XEN_PV)) {
 		pti_mode = PTI_FORCE_OFF;
@@ -91,6 +91,18 @@ void __init pti_check_boottime_disable(void)
 		return;
 	}

+	if (asi_enabled) {
+		/*
+		 * Having both ASI and PTI enabled is not a totally ridiculous
+		 * thing to do; if you want ASI but you are not confident in the
+		 * sensitivity annotations then it provides useful
+		 * defence-in-depth. But, the implementation doesn't support it.
+		 */
+		if (pti_mode != PTI_FORCE_OFF)
+			pti_print_if_insecure("disabled by ASI");
+		return;
+	}
+
 	if (pti_mode == PTI_FORCE_ON)
 		pti_print_if_secure("force enabled on command line.");

From patchwork Fri Jan 10 18:40:55 2025
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 856323
Date: Fri, 10 Jan 2025 18:40:55 +0000
In-Reply-To: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
Precedence: bulk
X-Mailing-List: linux-efi@vger.kernel.org
References: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
X-Mailer: b4 0.15-dev
Message-ID: <20250110-asi-rfc-v2-v2-29-8419288bc805@google.com>
Subject: [PATCH RFC v2 29/29] mm: asi: Stop ignoring asi=on cmdline flag
From: Brendan Jackman
To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Andy Lutomirski , Peter Zijlstra , Richard Henderson , Matt Turner , Vineet Gupta , Russell King , Catalin Marinas , Will Deacon , Guo Ren , Brian Cain , Huacai Chen , WANG Xuerui , Geert Uytterhoeven , Michal Simek , Thomas Bogendoerfer , Dinh Nguyen , Jonas Bonn , Stefan Kristiansson , Stafford Horne , "James E.J. Bottomley" , Helge Deller , Michael Ellerman , Nicholas Piggin , Christophe Leroy , Naveen N Rao , Madhavan Srinivasan , Paul Walmsley , Palmer Dabbelt , Albert Ou , Heiko Carstens , Vasily Gorbik , Alexander Gordeev , Christian Borntraeger , Sven Schnelle , Yoshinori Sato , Rich Felker , John Paul Adrian Glaubitz , "David S.
Miller" , Andreas Larsson , Richard Weinberger , Anton Ivanov , Johannes Berg , Chris Zankel , Max Filippov , Arnd Bergmann , Andrew Morton , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Uladzislau Rezki , Christoph Hellwig , Masami Hiramatsu , Mathieu Desnoyers , Mike Rapoport , Arnaldo Carvalho de Melo , Namhyung Kim , Mark Rutland , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Dennis Zhou , Tejun Heo , Christoph Lameter , Sean Christopherson , Paolo Bonzini , Ard Biesheuvel , Josh Poimboeuf , Pawan Gupta Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org, linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org, loongarch@lists.linux.dev, linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-um@lists.infradead.org, linux-arch@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, kvm@vger.kernel.org, linux-efi@vger.kernel.org, Brendan Jackman At this point the minimum requirements are in place for the kernel to operate correctly with ASI enabled. 
Signed-off-by: Brendan Jackman
---
 arch/x86/mm/asi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index f10f6614b26148e5ba423d8a44f640674573ee40..3e3956326936ea8550308ad004dbbb3738546f9f 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -207,14 +207,14 @@ void __init asi_check_boottime_disable(void)
 		pr_info("ASI disabled through kernel command line.\n");
 	} else if (ret == 2 && !strncmp(arg, "on", 2)) {
 		enabled = true;
-		pr_info("Ignoring asi=on param while ASI implementation is incomplete.\n");
+		pr_info("ASI enabled through kernel command line.\n");
 	} else {
 		pr_info("ASI %s by default.\n",
 			enabled ? "enabled" : "disabled");
 	}

 	if (enabled)
-		pr_info("ASI enablement ignored due to incomplete implementation.\n");
+		setup_force_cpu_cap(X86_FEATURE_ASI);
 }

 /*