From patchwork Tue Dec 12 23:16:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jeff Xu X-Patchwork-Id: 754133 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b="lX1FHusX" Received: from mail-ot1-x336.google.com (mail-ot1-x336.google.com [IPv6:2607:f8b0:4864:20::336]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8FD5FB2 for ; Tue, 12 Dec 2023 15:17:12 -0800 (PST) Received: by mail-ot1-x336.google.com with SMTP id 46e09a7af769-6d9f879f784so3384896a34.2 for ; Tue, 12 Dec 2023 15:17:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1702423032; x=1703027832; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=DmouMMuL78gKzcDFfbVzkzXE8UDTo3Q4rwqrYea/myQ=; b=lX1FHusXWCgacqtjr4amN5wcHfC3jsH9j08xBbFV+d/o/JAuCEJymeuO3F+i6aVb6u 2aH9ytcBN6Dr1ePRno5TSIhR62rt0pjMlPLjRdO7I1Lq/D35erfiH5gLJBF+MwP9LPf5 Z3vhD8hrveQ7OybEF7hZ94uP9vVNzyH98m614= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702423032; x=1703027832; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=DmouMMuL78gKzcDFfbVzkzXE8UDTo3Q4rwqrYea/myQ=; b=wvt2CsCdDbgon5at8xBYjXRqbP0378w71s1aSSbM7onlI+n1UwkM+oBN+wpVcvmpVC wOfBWrrI2XnydaNJ47czKzY4VPb7dILeCZI0o6CrSo7rVGoP27wGbgGUGNzrOQTI5VzA jjcpJTiAFYRsofyAHxx6DHx50bB3R0IVXihck+1EsD99w1rQ+Rerw6N49S10uwRAWhSc dgZUJMnDRM/bW/6kpdtSkrKFs7LYnBk0r/LckW+fdIBllHGdC4qdMrZUtjHufpsvc10B FAxVBCwLi6TVgG3OXrL2At/4A4Lh/FFtUFw867rk8q393ng128X3EfhAP+Mz0WoTOfWX Ef5Q== X-Gm-Message-State: AOJu0YzTGTqiczrhjh0dx60FpSN/03/eaxWTqAy7BidzfBEQdjN53BIM R2MdEPJosK5bukjJkkSsnktitw== X-Google-Smtp-Source: 
AGHT+IHWRbxCpkSD3Z0OA1+KEQM7Nb43rRei8W5sZbD56ePXA6pfN1fHTdnYRm+1Er/WpLM8j3G00Q== X-Received: by 2002:a05:6871:b06:b0:1fb:337c:402c with SMTP id fq6-20020a0568710b0600b001fb337c402cmr8802265oab.37.1702423031712; Tue, 12 Dec 2023 15:17:11 -0800 (PST) Received: from localhost (34.133.83.34.bc.googleusercontent.com. [34.83.133.34]) by smtp.gmail.com with UTF8SMTPSA id p2-20020aa78602000000b006ce691a1419sm8646308pfn.186.2023.12.12.15.17.11 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 12 Dec 2023 15:17:11 -0800 (PST) From: jeffxu@chromium.org To: akpm@linux-foundation.org, keescook@chromium.org, jannh@google.com, sroettger@google.com, willy@infradead.org, gregkh@linuxfoundation.org, torvalds@linux-foundation.org Cc: jeffxu@google.com, jorgelo@chromium.org, groeck@chromium.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, pedro.falcato@gmail.com, dave.hansen@intel.com, linux-hardening@vger.kernel.org, deraadt@openbsd.org, Jeff Xu Subject: [RFC PATCH v3 01/11] mseal: Add mseal syscall. Date: Tue, 12 Dec 2023 23:16:55 +0000 Message-ID: <20231212231706.2680890-2-jeffxu@chromium.org> X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog In-Reply-To: <20231212231706.2680890-1-jeffxu@chromium.org> References: <20231212231706.2680890-1-jeffxu@chromium.org> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Jeff Xu

The new mseal() is an architecture-independent syscall with the following signature:

mseal(void *addr, size_t len, unsigned long types, unsigned long flags)

addr/len: memory range. It must be contiguous, allocated memory, or else mseal() will fail and no VMA is updated. For details on acceptable arguments, please refer to the comments in mseal.c. Those are also covered by the selftest.

This CL adds three sealing types.
MM_SEAL_BASE
MM_SEAL_PROT_PKEY
MM_SEAL_SEAL

The MM_SEAL_BASE:
The base package includes the features common to all VMA sealing types. It prevents sealed VMAs from:
1> Unmapping, moving to another location, or shrinking in size, via munmap() and mremap(). Each of these can leave an empty space that can then be filled with a new VMA carrying a new set of attributes.
2> Having a different VMA moved or expanded into the current location, via mremap().
3> Being modified via mmap(MAP_FIXED).
4> Size expansion, via mremap(). This does not appear to pose any specific risk to sealed VMAs. It is included anyway because the use case is unclear. In any case, users can rely on merging to expand a sealed VMA.

We consider MM_SEAL_BASE the base feature, on which other sealing features will depend. For instance, it probably does not make sense to seal PROT_PKEY without sealing the BASE, so the kernel will implicitly add SEAL_BASE for SEAL_PROT_PKEY. (If an application wants to relax this in the future, we could use the “flags” field in mseal() to override the behavior of implicitly adding SEAL_BASE.)

The MM_SEAL_PROT_PKEY:
Seal PROT and PKEY of the address range. In other words, mprotect() and pkey_mprotect() will be denied if the memory is sealed with MM_SEAL_PROT_PKEY.

The MM_SEAL_SEAL:
MM_SEAL_SEAL denies adding a new seal to a VMA.

The kernel will remember which seal types are applied, and the application doesn’t need to repeat all existing seal types in the next mseal(). Once a seal type is applied, it can’t be unsealed. Calling mseal() with an already-applied seal type is a no-op, not a failure.

Data structure:
Internally, the vm_area_struct adds a new field, vm_seals, to store the bit masks. The vm_seals field is added because the existing vm_flags field is full on 32-bit CPUs. The vm_seals field can be merged into vm_flags in the future if the size of vm_flags is ever expanded.

TODO:
Sealed VMAs won't merge with other VMAs in this patch; merging support will be added in a later patch.
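
Illustration (not part of the patch): the seal-type rules described above can be sketched as a small userspace model. This is only a sketch under stated assumptions. The names model_vma and model_mseal are hypothetical and do not exist in the kernel, the bit values are copied from the uapi header in this patch, and address-range checking and VMA splitting are left out entirely.

```c
#include <errno.h>

/* Bit values mirror the uapi definitions added by this patch. */
#define MM_SEAL_SEAL      (1UL << 0)
#define MM_SEAL_BASE      (1UL << 1)
#define MM_SEAL_PROT_PKEY (1UL << 2)
#define MM_SEAL_ALL (MM_SEAL_SEAL | MM_SEAL_BASE | MM_SEAL_PROT_PKEY)

/* Hypothetical stand-in for one VMA; only the seal bits are modeled. */
struct model_vma {
	unsigned long vm_seals;
};

/*
 * Hypothetical userspace model of the seal-type handling for one VMA.
 * Returns 0 on success, -EINVAL for unknown types or non-zero flags,
 * and -EACCES when MM_SEAL_SEAL already forbids adding a new type.
 */
static int model_mseal(struct model_vma *vma, unsigned long types,
		       unsigned long flags)
{
	/* MM_SEAL_PROT_PKEY implicitly adds MM_SEAL_BASE. */
	if (types & MM_SEAL_PROT_PKEY)
		types |= MM_SEAL_BASE;

	/* Unknown seal types are rejected; flags is reserved and must be 0. */
	if ((types & ~MM_SEAL_ALL) || flags)
		return -EINVAL;

	/* After MM_SEAL_SEAL, only already-applied types are accepted. */
	if ((vma->vm_seals & MM_SEAL_SEAL) && (types & ~vma->vm_seals))
		return -EACCES;

	/* Seals only accumulate; there is no way to clear a bit. */
	vma->vm_seals |= types;
	return 0;
}
```

In this model, repeating an already-applied type succeeds without changing anything, while a new type requested after MM_SEAL_SEAL fails with -EACCES, which matches the behavior described in the text above.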
Signed-off-by: Jeff Xu --- include/linux/mm.h | 45 ++++++- include/linux/mm_types.h | 7 ++ include/linux/syscalls.h | 2 + include/uapi/linux/mman.h | 4 + kernel/sys_ni.c | 1 + mm/Kconfig | 9 ++ mm/Makefile | 1 + mm/mmap.c | 3 + mm/mseal.c | 257 ++++++++++++++++++++++++++++++++++++++ 9 files changed, 328 insertions(+), 1 deletion(-) create mode 100644 mm/mseal.c diff --git a/include/linux/mm.h b/include/linux/mm.h index 19fc73b02c9f..3d1120570de5 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -30,6 +30,7 @@ #include #include #include +#include struct mempolicy; struct anon_vma; @@ -257,9 +258,17 @@ extern struct rw_semaphore nommu_region_sem; extern unsigned int kobjsize(const void *objp); #endif +/* + * MM_SEAL_ALL is all supported flags in mseal(). + */ +#define MM_SEAL_ALL ( \ + MM_SEAL_SEAL | \ + MM_SEAL_BASE | \ + MM_SEAL_PROT_PKEY) + /* * vm_flags in vm_area_struct, see mm_types.h. - * When changing, update also include/trace/events/mmflags.h + * When changing, update also include/trace/events/mmflags.h. */ #define VM_NONE 0x00000000 @@ -3308,6 +3317,40 @@ static inline void mm_populate(unsigned long addr, unsigned long len) static inline void mm_populate(unsigned long addr, unsigned long len) {} #endif +#ifdef CONFIG_MSEAL +static inline bool check_vma_seals_mergeable(unsigned long vm_seals) +{ + /* + * Set sealed VMA not mergeable with another VMA for now. + * This will be changed in later commit to make sealed + * VMA also mergeable. + */ + if (vm_seals & MM_SEAL_ALL) + return false; + + return true; +} + +/* + * return the valid sealing (after mask). 
+ */ +static inline unsigned long vma_seals(struct vm_area_struct *vma) +{ + return (vma->vm_seals & MM_SEAL_ALL); +} + +#else +static inline bool check_vma_seals_mergeable(unsigned long vm_seals1) +{ + return true; +} + +static inline unsigned long vma_seals(struct vm_area_struct *vma) +{ + return 0; +} +#endif + /* These take the mm semaphore themselves */ extern int __must_check vm_brk(unsigned long, unsigned long); extern int __must_check vm_brk_flags(unsigned long, unsigned long, unsigned long); diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 589f31ef2e84..052799173c86 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -687,6 +687,13 @@ struct vm_area_struct { struct vma_numab_state *numab_state; /* NUMA Balancing state */ #endif struct vm_userfaultfd_ctx vm_userfaultfd_ctx; +#ifdef CONFIG_MSEAL + /* + * bit masks for seal. + * need this since vm_flags is full. + */ + unsigned long vm_seals; /* seal flags, see mm.h. */ +#endif } __randomize_layout; #ifdef CONFIG_SCHED_MM_CID diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 0901af60d971..b1c766b74765 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -812,6 +812,8 @@ asmlinkage long sys_process_mrelease(int pidfd, unsigned int flags); asmlinkage long sys_remap_file_pages(unsigned long start, unsigned long size, unsigned long prot, unsigned long pgoff, unsigned long flags); +asmlinkage long sys_mseal(unsigned long start, size_t len, unsigned long types, + unsigned long flags); asmlinkage long sys_mbind(unsigned long start, unsigned long len, unsigned long mode, const unsigned long __user *nmask, diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h index a246e11988d5..f561652886c4 100644 --- a/include/uapi/linux/mman.h +++ b/include/uapi/linux/mman.h @@ -55,4 +55,8 @@ struct cachestat { __u64 nr_recently_evicted; }; +#define MM_SEAL_SEAL _BITUL(0) +#define MM_SEAL_BASE _BITUL(1) +#define MM_SEAL_PROT_PKEY 
_BITUL(2) + #endif /* _UAPI_LINUX_MMAN_H */ diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 9db51ea373b0..716d64df522d 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -195,6 +195,7 @@ COND_SYSCALL(migrate_pages); COND_SYSCALL(move_pages); COND_SYSCALL(set_mempolicy_home_node); COND_SYSCALL(cachestat); +COND_SYSCALL(mseal); COND_SYSCALL(perf_event_open); COND_SYSCALL(accept4); diff --git a/mm/Kconfig b/mm/Kconfig index 264a2df5ecf5..63972d476d19 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1258,6 +1258,15 @@ config LOCK_MM_AND_FIND_VMA bool depends on !STACK_GROWSUP +config MSEAL + default n + bool "Enable mseal() system call" + depends on MMU + help + Enable the virtual memory sealing. + This feature allows sealing each virtual memory area separately with + multiple sealing types. + source "mm/damon/Kconfig" endmenu diff --git a/mm/Makefile b/mm/Makefile index ec65984e2ade..643d8518dac0 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -120,6 +120,7 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o obj-$(CONFIG_PAGE_TABLE_CHECK) += page_table_check.o obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o obj-$(CONFIG_SECRETMEM) += secretmem.o +obj-$(CONFIG_MSEAL) += mseal.o obj-$(CONFIG_CMA_SYSFS) += cma_sysfs.o obj-$(CONFIG_USERFAULTFD) += userfaultfd.o obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o diff --git a/mm/mmap.c b/mm/mmap.c index 9e018d8dd7d6..42462c2a0c35 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -740,6 +740,9 @@ static inline bool is_mergeable_vma(struct vm_area_struct *vma, return false; if (!anon_vma_name_eq(anon_vma_name(vma), anon_name)) return false; + if (!check_vma_seals_mergeable(vma_seals(vma))) + return false; + return true; } diff --git a/mm/mseal.c b/mm/mseal.c new file mode 100644 index 000000000000..13bbe9ef5883 --- /dev/null +++ b/mm/mseal.c @@ -0,0 +1,257 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Implement mseal() syscall. + * + * Copyright (c) 2023 Google, Inc. 
+ * + * Author: Jeff Xu + */ + +#include +#include +#include +#include +#include "internal.h" + +static bool can_do_mseal(unsigned long types, unsigned long flags) +{ + /* check types is a valid bitmap. */ + if (types & ~MM_SEAL_ALL) + return false; + + /* flags isn't used for now. */ + if (flags) + return false; + + return true; +} + +/* + * Check if a seal type can be added to VMA. + */ +static bool can_add_vma_seals(struct vm_area_struct *vma, unsigned long newSeals) +{ + /* When MM_SEAL_SEAL is set, reject if a new type of seal is added. */ + if ((vma->vm_seals & MM_SEAL_SEAL) && + (newSeals & ~(vma_seals(vma)))) + return false; + + return true; +} + +static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma, + struct vm_area_struct **prev, unsigned long start, + unsigned long end, unsigned long addtypes) +{ + int ret = 0; + + if (addtypes & ~(vma_seals(vma))) { + /* + * Handle split at start and end. + * For now sealed VMA doesn't merge with other VMAs. + * This will be updated in later commit to make + * sealed VMA also mergeable. + */ + if (start != vma->vm_start) { + ret = split_vma(vmi, vma, start, 1); + if (ret) + goto out; + } + + if (end != vma->vm_end) { + ret = split_vma(vmi, vma, end, 0); + if (ret) + goto out; + } + + vma->vm_seals |= addtypes; + } + +out: + *prev = vma; + return ret; +} + +/* + * Check for do_mseal: + * 1> start is part of a valid vma. + * 2> end is part of a valid vma. + * 3> No gap (unallocated address) between start and end. + * 4> requested seal type can be added in given address range. + */ +static int check_mm_seal(unsigned long start, unsigned long end, + unsigned long newtypes) +{ + struct vm_area_struct *vma; + unsigned long nstart = start; + + VMA_ITERATOR(vmi, current->mm, start); + + /* going through each vma to check. */ + for_each_vma_range(vmi, vma, end) { + if (vma->vm_start > nstart) + /* unallocated memory found.
*/ + return -ENOMEM; + + if (!can_add_vma_seals(vma, newtypes)) + return -EACCES; + + if (vma->vm_end >= end) + return 0; + + nstart = vma->vm_end; + } + + return -ENOMEM; +} + +/* + * Apply sealing. + */ +static int apply_mm_seal(unsigned long start, unsigned long end, + unsigned long newtypes) +{ + unsigned long nstart, nend; + struct vm_area_struct *vma, *prev = NULL; + struct vma_iterator vmi; + int error = 0; + + vma_iter_init(&vmi, current->mm, start); + vma = vma_find(&vmi, end); + + prev = vma_prev(&vmi); + if (start > vma->vm_start) + prev = vma; + + nstart = start; + + /* going through each vma to update. */ + for_each_vma_range(vmi, vma, end) { + nend = vma->vm_end; + if (nend > end) + nend = end; + + error = mseal_fixup(&vmi, vma, &prev, nstart, nend, newtypes); + if (error) + break; + + nstart = vma->vm_end; + } + + return error; +} + +/* + * mseal(2) seals the VM's meta data from + * selected syscalls. + * + * addr/len: VM address range. + * + * The address range by addr/len must meet: + * start (addr) must be in a valid VMA. + * end (addr + len) must be in a valid VMA. + * no gap (unallocated memory) between start and end. + * start (addr) must be page aligned. + * + * len: len will be page aligned implicitly. + * + * types: bit mask for sealed syscalls. + * MM_SEAL_BASE: prevent VMA from: + * 1> Unmapping, moving to another location, and shrinking + * the size, via munmap() and mremap(), can leave an empty + * space, therefore can be replaced with a VMA with a new + * set of attributes. + * 2> Move or expand a different vma into the current location, + * via mremap(). + * 3> Modifying sealed VMA via mmap(MAP_FIXED). + * 4> Size expansion, via mremap(), does not appear to pose any + * specific risks to sealed VMAs. It is included anyway because + * the use case is unclear. In any case, users can rely on + * merging to expand a sealed VMA. 
+ * + * The MM_SEAL_PROT_PKEY: + * Seal PROT and PKEY of the address range, in other words, + * mprotect() and pkey_mprotect() will be denied if the memory is + * sealed with MM_SEAL_PROT_PKEY. + * + * The MM_SEAL_SEAL + * MM_SEAL_SEAL denies adding a new seal to a VMA. + * + * The kernel will remember which seal types are applied, and the + * application doesn’t need to repeat all existing seal types in + * the next mseal(). Once a seal type is applied, it can’t be + * unsealed. Calling mseal() on an existing seal type is a no-action, + * not a failure. + * + * flags: reserved. + * + * return values: + * zero: success. + * -EINVAL: + * invalid seal type. + * invalid input flags. + * addr is not page aligned. + * addr + len overflow. + * -ENOMEM: + * addr is not a valid address (not allocated). + * end (addr + len) is not a valid address. + * a gap (unallocated memory) between start and end. + * -EACCES: + * MM_SEAL_SEAL is set, adding a new seal is rejected. + * + * Note: + * user can call mseal(2) multiple times to add new seal types. + * adding an already added seal type is a no-action (no error). + * adding a new seal type after MM_SEAL_SEAL will be rejected. + * unseal() or removing a seal type is not supported. + */ +static int do_mseal(unsigned long start, size_t len_in, unsigned long types, + unsigned long flags) +{ + int ret = 0; + unsigned long end; + struct mm_struct *mm = current->mm; + size_t len; + + /* MM_SEAL_BASE is set when other seal types are set. */ + if (types & MM_SEAL_PROT_PKEY) + types |= MM_SEAL_BASE; + + if (!can_do_mseal(types, flags)) + return -EINVAL; + + start = untagged_addr(start); + if (!PAGE_ALIGNED(start)) + return -EINVAL; + + len = PAGE_ALIGN(len_in); + /* Check to see whether len was rounded up from small -ve to zero.
*/ + if (len_in && !len) + return -EINVAL; + + end = start + len; + if (end < start) + return -EINVAL; + + if (end == start) + return 0; + + if (mmap_write_lock_killable(mm)) + return -EINTR; + + ret = check_mm_seal(start, end, types); + if (ret) + goto out; + + ret = apply_mm_seal(start, end, types); + +out: + mmap_write_unlock(current->mm); + return ret; +} + +SYSCALL_DEFINE4(mseal, unsigned long, start, size_t, len, unsigned long, types, unsigned long, + flags) +{ + return do_mseal(start, len, types, flags); +} From patchwork Tue Dec 12 23:16:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Xu X-Patchwork-Id: 754132 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b="JveXj9O7" Received: from mail-pf1-x431.google.com (mail-pf1-x431.google.com [IPv6:2607:f8b0:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 78C72B7 for ; Tue, 12 Dec 2023 15:17:13 -0800 (PST) Received: by mail-pf1-x431.google.com with SMTP id d2e1a72fcca58-6cea2a38b48so5569935b3a.3 for ; Tue, 12 Dec 2023 15:17:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1702423033; x=1703027833; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=9YF8gInYyKnjLyYVifyCViAm+ORlRHRAbg0UC91NVL0=; b=JveXj9O7DrMtVI6EPYNPmCyQRpnbxLDOhltqcF7+JB1Mj/2fPZVbqn3rHV/mDTKqto oq9A6Jrp3IpCXOYSFlMzA+YHd6jLv7dXk5x3n6Qri6rtHqhjf3nihi1/0kjx3w0qZdjK B9bMPguq9e8wEUUPgwmqrD0kB82X4Bo8Agkuo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702423033; x=1703027833; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; 
bh=9YF8gInYyKnjLyYVifyCViAm+ORlRHRAbg0UC91NVL0=; b=Z4z0bg26TdVC7srNt0Ok+qCQf3V6xVKK73HUzvIPbdRiBvXL8crLSPpa5q9BjDY6DT J6dYfULIaBx/9aegPGRHOPcDP9aam0miZu3KrKIRXkAIPgtBRkGOIejUk1/X+MZUgwrZ +jegaklTOTpQiQ4CvDEgymUJe2AFk1Z358qQjHPMDrFsB/uGNmbdAX87JqfSRGMuPlGe opEUi44lHRfZs+B/5mymvazT2GAtVsyEdPpbHVtS2AEuTgs8A3TE0Qq+Ij8so4o9YKbL pq/9/BHhvJr+zHDLTZt1jetgHrBwQ9wl4RGRaGnGCmbW0cjnWN+YqJw/aH/ao9wONhKE lMhg== X-Gm-Message-State: AOJu0Yytpo4p//SQCbK2UHVZtYG7fhSmNttVAWBqWnaVYGbJNCX5uvlI UWsHgiC5/Gcpb+ydZCH7Svq+/g== X-Google-Smtp-Source: AGHT+IGEdvNtjL7vVDPGkLMDHm2ntWIQ9rfmcT11m+WtWcnAGDujtZ9MEI7HDfwGIcXn8coPiQFtiA== X-Received: by 2002:a05:6a00:2d1b:b0:6cb:bc92:c73f with SMTP id fa27-20020a056a002d1b00b006cbbc92c73fmr7494441pfb.2.1702423032923; Tue, 12 Dec 2023 15:17:12 -0800 (PST) Received: from localhost (34.133.83.34.bc.googleusercontent.com. [34.83.133.34]) by smtp.gmail.com with UTF8SMTPSA id m26-20020aa78a1a000000b006c988fda657sm8975614pfa.177.2023.12.12.15.17.12 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 12 Dec 2023 15:17:12 -0800 (PST) From: jeffxu@chromium.org To: akpm@linux-foundation.org, keescook@chromium.org, jannh@google.com, sroettger@google.com, willy@infradead.org, gregkh@linuxfoundation.org, torvalds@linux-foundation.org Cc: jeffxu@google.com, jorgelo@chromium.org, groeck@chromium.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, pedro.falcato@gmail.com, dave.hansen@intel.com, linux-hardening@vger.kernel.org, deraadt@openbsd.org, Jeff Xu Subject: [RFC PATCH v3 02/11] mseal: Wire up mseal syscall Date: Tue, 12 Dec 2023 23:16:56 +0000 Message-ID: <20231212231706.2680890-3-jeffxu@chromium.org> X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog In-Reply-To: <20231212231706.2680890-1-jeffxu@chromium.org> References: <20231212231706.2680890-1-jeffxu@chromium.org> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Jeff Xu Wire 
up mseal syscall for all architectures. Signed-off-by: Jeff Xu --- arch/alpha/kernel/syscalls/syscall.tbl | 1 + arch/arm/tools/syscall.tbl | 1 + arch/arm64/include/asm/unistd.h | 2 +- arch/arm64/include/asm/unistd32.h | 2 ++ arch/ia64/kernel/syscalls/syscall.tbl | 1 + arch/m68k/kernel/syscalls/syscall.tbl | 1 + arch/microblaze/kernel/syscalls/syscall.tbl | 1 + arch/mips/kernel/syscalls/syscall_n32.tbl | 1 + arch/mips/kernel/syscalls/syscall_n64.tbl | 1 + arch/mips/kernel/syscalls/syscall_o32.tbl | 1 + arch/parisc/kernel/syscalls/syscall.tbl | 1 + arch/powerpc/kernel/syscalls/syscall.tbl | 1 + arch/s390/kernel/syscalls/syscall.tbl | 1 + arch/sh/kernel/syscalls/syscall.tbl | 1 + arch/sparc/kernel/syscalls/syscall.tbl | 1 + arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/xtensa/kernel/syscalls/syscall.tbl | 1 + include/uapi/asm-generic/unistd.h | 5 ++++- 19 files changed, 23 insertions(+), 2 deletions(-) diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index b68f1f56b836..4de33b969009 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -496,3 +496,4 @@ 564 common futex_wake sys_futex_wake 565 common futex_wait sys_futex_wait 566 common futex_requeue sys_futex_requeue +567 common mseal sys_mseal diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index 93d0d46cbb15..dacea023bb88 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -469,3 +469,4 @@ 454 common futex_wake sys_futex_wake 455 common futex_wait sys_futex_wait 456 common futex_requeue sys_futex_requeue +457 common mseal sys_mseal diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h index 531effca5f1f..298313d2e0af 100644 --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -39,7 +39,7 @@ #define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5) #define __ARM_NR_COMPAT_END 
(__ARM_NR_COMPAT_BASE + 0x800) -#define __NR_compat_syscalls 457 +#define __NR_compat_syscalls 458 #endif #define __ARCH_WANT_SYS_CLONE diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index c453291154fd..015c80b14206 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -917,6 +917,8 @@ __SYSCALL(__NR_futex_wake, sys_futex_wake) __SYSCALL(__NR_futex_wait, sys_futex_wait) #define __NR_futex_requeue 456 __SYSCALL(__NR_futex_requeue, sys_futex_requeue) +#define __NR_mseal 457 +__SYSCALL(__NR_mseal, sys_mseal) /* * Please add new compat syscalls above this comment and update diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl index 81375ea78288..e8b40451693d 100644 --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -376,3 +376,4 @@ 454 common futex_wake sys_futex_wake 455 common futex_wait sys_futex_wait 456 common futex_requeue sys_futex_requeue +457 common mseal sys_mseal diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl index f7f997a88bab..0da4a4dc1737 100644 --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -455,3 +455,4 @@ 454 common futex_wake sys_futex_wake 455 common futex_wait sys_futex_wait 456 common futex_requeue sys_futex_requeue +457 common mseal sys_mseal diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl index 2967ec26b978..ca8572222783 100644 --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -461,3 +461,4 @@ 454 common futex_wake sys_futex_wake 455 common futex_wait sys_futex_wait 456 common futex_requeue sys_futex_requeue +457 common mseal sys_mseal diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index 383abb1713f4..4fd33623b7e8 100644 --- 
a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -394,3 +394,4 @@ 454 n32 futex_wake sys_futex_wake 455 n32 futex_wait sys_futex_wait 456 n32 futex_requeue sys_futex_requeue +457 n32 mseal sys_mseal diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl index c9bd09ba905f..aaa6382781e0 100644 --- a/arch/mips/kernel/syscalls/syscall_n64.tbl +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl @@ -370,3 +370,4 @@ 454 n64 futex_wake sys_futex_wake 455 n64 futex_wait sys_futex_wait 456 n64 futex_requeue sys_futex_requeue +457 n64 mseal sys_mseal diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index ba5ef6cea97a..bbdd6f151224 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -443,3 +443,4 @@ 454 o32 futex_wake sys_futex_wake 455 o32 futex_wait sys_futex_wait 456 o32 futex_requeue sys_futex_requeue +457 o32 mseal sys_mseal diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index 9f0f6df55361..8dda80555c7c 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -454,3 +454,4 @@ 454 common futex_wake sys_futex_wake 455 common futex_wait sys_futex_wait 456 common futex_requeue sys_futex_requeue +457 common mseal sys_mseal diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 26fc41904266..d0aa97a669bc 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -542,3 +542,4 @@ 454 common futex_wake sys_futex_wake 455 common futex_wait sys_futex_wait 456 common futex_requeue sys_futex_requeue +457 common mseal sys_mseal diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index 31be90b241f7..228f100f8565 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl +++ 
b/arch/s390/kernel/syscalls/syscall.tbl @@ -458,3 +458,4 @@ 454 common futex_wake sys_futex_wake sys_futex_wake 455 common futex_wait sys_futex_wait sys_futex_wait 456 common futex_requeue sys_futex_requeue sys_futex_requeue +457 common mseal sys_mseal sys_mseal diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl index 4bc5d488ab17..cf08ea4a7539 100644 --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -458,3 +458,4 @@ 454 common futex_wake sys_futex_wake 455 common futex_wait sys_futex_wait 456 common futex_requeue sys_futex_requeue +457 common mseal sys_mseal diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index 8404c8e50394..30796f78bdc2 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -501,3 +501,4 @@ 454 common futex_wake sys_futex_wake 455 common futex_wait sys_futex_wait 456 common futex_requeue sys_futex_requeue +457 common mseal sys_mseal diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 31c48bc2c3d8..c4163b904714 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -460,3 +460,4 @@ 454 i386 futex_wake sys_futex_wake 455 i386 futex_wait sys_futex_wait 456 i386 futex_requeue sys_futex_requeue +457 i386 mseal sys_mseal diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index a577bb27c16d..47fbc6ac0267 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -378,6 +378,7 @@ 454 common futex_wake sys_futex_wake 455 common futex_wait sys_futex_wait 456 common futex_requeue sys_futex_requeue +457 common mseal sys_mseal # # Due to a historical design error, certain syscalls are numbered differently diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl index 
dd71ecce8b86..fe5f562f6493 100644 --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -426,3 +426,4 @@ 454 common futex_wake sys_futex_wake 455 common futex_wait sys_futex_wait 456 common futex_requeue sys_futex_requeue +457 common mseal sys_mseal diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index d9e9cd13e577..1678245d8a2b 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -829,8 +829,11 @@ __SYSCALL(__NR_futex_wait, sys_futex_wait) #define __NR_futex_requeue 456 __SYSCALL(__NR_futex_requeue, sys_futex_requeue) +#define __NR_mseal 457 +__SYSCALL(__NR_mseal, sys_mseal) + #undef __NR_syscalls -#define __NR_syscalls 457 +#define __NR_syscalls 458 /* * 32 bit systems traditionally used different From patchwork Tue Dec 12 23:16:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Xu X-Patchwork-Id: 754134 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b="T6wsIoUG" Received: from mail-pf1-x432.google.com (mail-pf1-x432.google.com [IPv6:2607:f8b0:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 43187CF for ; Tue, 12 Dec 2023 15:17:14 -0800 (PST) Received: by mail-pf1-x432.google.com with SMTP id d2e1a72fcca58-6ce9e897aeaso5571220b3a.2 for ; Tue, 12 Dec 2023 15:17:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1702423034; x=1703027834; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Vgg0J3qxY24SnHD9sqtPXJJls+X5lh0PMbmZ3jrDVoQ=; b=T6wsIoUGX3fIwDf01FpNscoSVu4ubDm1E9zTDGFrSasB/SBA+rC/I52alm6lkdDh7m PqA/teR0nmJ7Hdgl+e/ylfl9d3bs1oKwrnNortNZZ67G6sZrRO+zazp3yadNcdjwxQtP BJ5VJ7ZYbEzS/lulvk5t77B4RRA68VnFkTfWM= 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702423034; x=1703027834; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Vgg0J3qxY24SnHD9sqtPXJJls+X5lh0PMbmZ3jrDVoQ=; b=RQt0jNoN/+Ijz8vwrQ1FKhOcOVXCtZZakPXG68FTbLaIz6e3Dd8y0aKzb+IRTWH1P9 bwqYJrL4EhSqbmFqzFYphxm6wXk57QkRTWRQN+Bb8jsZ3udk72zR4QDzDr6EP8ETKRZz EPYQgV7Ghffzb2ibbISNYU1Hjg/HjL6ERt4ztwIFyMxHlaI34K5LW3RpBZb7YFnf5GSG baEQZHKWPTnIAHqsZqhPU25OYGRBVra0uE9YiycTgep2aMRX/IYjNlylw7DpYzde4AmK 4JpfvGOJv/OHAQPPn79OMzbf5Ei0OM3Sz1JKSZZcbElJztm09OLBZ6ILITCnlT/aBNIe mh/g== X-Gm-Message-State: AOJu0YwF+QbnID/7WUR4TmyEIMIaMJEg4fgN6SSJCIXcspCAsZBHq/Hi sQAdUXFCOO+oD3+rZLRC0nm8ZA== X-Google-Smtp-Source: AGHT+IH0w5ya9vjJqVfLSwVuEpvbnpor4YM2jb0u3s8vpT0Fs9e2ntlip5J/9UPEmwW3uq6XLY2yzA== X-Received: by 2002:a05:6a20:429a:b0:190:50ec:e2e4 with SMTP id o26-20020a056a20429a00b0019050ece2e4mr9563392pzj.45.1702423033741; Tue, 12 Dec 2023 15:17:13 -0800 (PST) Received: from localhost (34.133.83.34.bc.googleusercontent.com. 
[34.83.133.34]) by smtp.gmail.com with UTF8SMTPSA id b4-20020aa78704000000b006ce41b1ba8csm8575780pfo.131.2023.12.12.15.17.13 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 12 Dec 2023 15:17:13 -0800 (PST) From: jeffxu@chromium.org To: akpm@linux-foundation.org, keescook@chromium.org, jannh@google.com, sroettger@google.com, willy@infradead.org, gregkh@linuxfoundation.org, torvalds@linux-foundation.org Cc: jeffxu@google.com, jorgelo@chromium.org, groeck@chromium.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, pedro.falcato@gmail.com, dave.hansen@intel.com, linux-hardening@vger.kernel.org, deraadt@openbsd.org, Jeff Xu Subject: [RFC PATCH v3 03/11] mseal: add can_modify_mm and can_modify_vma Date: Tue, 12 Dec 2023 23:16:57 +0000 Message-ID: <20231212231706.2680890-4-jeffxu@chromium.org> X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog In-Reply-To: <20231212231706.2680890-1-jeffxu@chromium.org> References: <20231212231706.2680890-1-jeffxu@chromium.org> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Jeff Xu Two utilities to be used later. can_modify_mm: checks sealing flags for given memory range. can_modify_vma: checks sealing flags for given vma. 
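
Illustration (not part of the patch): the difference between the two helpers can be sketched in userspace. This is a minimal model under assumptions. The names range_vma, model_can_modify_vma and model_can_modify_mm are hypothetical, and a plain array of [vm_start, vm_end) ranges stands in for the kernel's VMA iterator. It shows only that the range check visits every VMA overlapping [start, end), tolerates gaps, and allows the modification by default.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bit values mirror the uapi definitions from patch 01/11. */
#define MM_SEAL_SEAL      (1UL << 0)
#define MM_SEAL_BASE      (1UL << 1)
#define MM_SEAL_PROT_PKEY (1UL << 2)
#define MM_SEAL_ALL (MM_SEAL_SEAL | MM_SEAL_BASE | MM_SEAL_PROT_PKEY)

/* Hypothetical stand-in for a VMA covering [vm_start, vm_end). */
struct range_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_seals;
};

/* Modification is allowed unless one of the checked seals is set. */
static bool model_can_modify_vma(const struct range_vma *vma,
				 unsigned long check_seals)
{
	return !(check_seals & vma->vm_seals & MM_SEAL_ALL);
}

/*
 * Check every modeled VMA that overlaps [start, end). Gaps in the range
 * are not an error here; only a sealed VMA makes the check fail.
 */
static bool model_can_modify_mm(const struct range_vma *vmas, size_t n,
				unsigned long start, unsigned long end,
				unsigned long check_seals)
{
	size_t i;

	if (!check_seals)
		return true;

	for (i = 0; i < n; i++) {
		/* Skip VMAs entirely outside the requested range. */
		if (vmas[i].vm_end <= start || vmas[i].vm_start >= end)
			continue;
		if (!model_can_modify_vma(&vmas[i], check_seals))
			return false;
	}

	/* Allow by default. */
	return true;
}
```

A caller that wants to protect, say, an unmap path would presumably pass MM_SEAL_BASE as the seals to check and refuse the operation when the function returns false. That call site is an assumption here, since this patch only adds the helpers.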
Signed-off-by: Jeff Xu --- include/linux/mm.h | 18 ++++++++++++++++++ mm/mseal.c | 38 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 56 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index 3d1120570de5..2435acc1f44f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3339,6 +3339,12 @@ static inline unsigned long vma_seals(struct vm_area_struct *vma) return (vma->vm_seals & MM_SEAL_ALL); } +extern bool can_modify_mm(struct mm_struct *mm, unsigned long start, + unsigned long end, unsigned long checkSeals); + +extern bool can_modify_vma(struct vm_area_struct *vma, + unsigned long checkSeals); + #else static inline bool check_vma_seals_mergeable(unsigned long vm_seals1) { @@ -3349,6 +3355,18 @@ static inline unsigned long vma_seals(struct vm_area_struct *vma) { return 0; } + +static inline bool can_modify_mm(struct mm_struct *mm, unsigned long start, + unsigned long end, unsigned long checkSeals) +{ + return true; +} + +static inline bool can_modify_vma(struct vm_area_struct *vma, + unsigned long checkSeals) +{ + return true; +} #endif /* These take the mm semaphore themselves */ diff --git a/mm/mseal.c b/mm/mseal.c index 13bbe9ef5883..d12aa628ebdc 100644 --- a/mm/mseal.c +++ b/mm/mseal.c @@ -26,6 +26,44 @@ static bool can_do_mseal(unsigned long types, unsigned long flags) return true; } +/* + * check if a vma is sealed for modification. + * return true, if modification is allowed. + */ +bool can_modify_vma(struct vm_area_struct *vma, + unsigned long checkSeals) +{ + if (checkSeals & vma_seals(vma)) + return false; + + return true; +} + +/* + * Check if the vmas of a memory range are allowed to be modified. + * the memory ranger can have a gap (unallocated memory). + * return true, if it is allowed. 
+ */ +bool can_modify_mm(struct mm_struct *mm, unsigned long start, unsigned long end, + unsigned long checkSeals) +{ + struct vm_area_struct *vma; + + VMA_ITERATOR(vmi, mm, start); + + if (!checkSeals) + return true; + + /* going through each vma to check. */ + for_each_vma_range(vmi, vma, end) { + if (!can_modify_vma(vma, checkSeals)) + return false; + } + + /* Allow by default. */ + return true; +} + /* * Check if a seal type can be added to VMA. */ From patchwork Tue Dec 12 23:16:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Xu X-Patchwork-Id: 753266 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b="VwpHN4+I" Received: from mail-ot1-x336.google.com (mail-ot1-x336.google.com [IPv6:2607:f8b0:4864:20::336]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 740FCD3 for ; Tue, 12 Dec 2023 15:17:15 -0800 (PST) Received: by mail-ot1-x336.google.com with SMTP id 46e09a7af769-6d9f4eed60eso3600149a34.1 for ; Tue, 12 Dec 2023 15:17:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1702423035; x=1703027835; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=1D1PlLUfxR4CYnPMS2OzloWMrMloIYQmh/cjfDWMVaY=; b=VwpHN4+IJZhT1v7aZl1ccbIWgf9dK4aE019rwpcXSpH2ps8fUikdnlYv2UJ8l+nt9/ FlA/9gvk04NWswkP53hK603251om+FfWYZrjDdQ6LIcZkq3TZtcg+4a0GG+loqPhWvQD nXlonMf2JgHsnjKGXif9SSk/qGoZ1/LzWhhTQ= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702423035; x=1703027835; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1D1PlLUfxR4CYnPMS2OzloWMrMloIYQmh/cjfDWMVaY=; 
b=ecBRzGyHFsCIUaysnLuo8/WQinhfksjomLSgUhpMT7S/1EfkGIcmiJBCR1/j+a9um0 enwoUTnVboPINzmzYTVSlM+jDLkwa8ru1hUT88cNK+6HlL3nxmiqzY+WLsI90Jp1g7uX 9FvcIsRLAG8n89gKIFPvMmMp6IKm4QnEQ9EeADHt0fBqBM/X81R33M5EhOTNZzfD9lEb aZdlU6f5iIQPpHC5vihDZzsb6C4JuuSRaGGBQ++rqSzErQHYGCZLGoBpFkxxEfzmZAGm D1SU+Br7O1qw0djXAnoIsQnDTS2t80pCWl5avnrGbYEIS6pMn0f+B12QZG7NqdHDan+o WcjQ== X-Gm-Message-State: AOJu0YzwafQDZlfqARB4ElolxRaWj5e/98H6IF8fVVGaNFMwyGhbyKmT umkYSkhHhmblXiHcbVfeKAx+kA== X-Google-Smtp-Source: AGHT+IELl9vxFDBtqlLqQjHJMgXEg7iTyGXK7EqaCyV/trQ5sRcstZZrUugD0WkeVeiGw0jvX4mjLw== X-Received: by 2002:a05:6358:262a:b0:16e:2898:5e02 with SMTP id l42-20020a056358262a00b0016e28985e02mr9003519rwc.32.1702423034702; Tue, 12 Dec 2023 15:17:14 -0800 (PST) Received: from localhost (34.133.83.34.bc.googleusercontent.com. [34.83.133.34]) by smtp.gmail.com with UTF8SMTPSA id s188-20020a635ec5000000b005c6617b52e6sm8763314pgb.5.2023.12.12.15.17.14 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 12 Dec 2023 15:17:14 -0800 (PST) From: jeffxu@chromium.org To: akpm@linux-foundation.org, keescook@chromium.org, jannh@google.com, sroettger@google.com, willy@infradead.org, gregkh@linuxfoundation.org, torvalds@linux-foundation.org Cc: jeffxu@google.com, jorgelo@chromium.org, groeck@chromium.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, pedro.falcato@gmail.com, dave.hansen@intel.com, linux-hardening@vger.kernel.org, deraadt@openbsd.org, Jeff Xu Subject: [RFC PATCH v3 04/11] mseal: add MM_SEAL_BASE Date: Tue, 12 Dec 2023 23:16:58 +0000 Message-ID: <20231212231706.2680890-5-jeffxu@chromium.org> X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog In-Reply-To: <20231212231706.2680890-1-jeffxu@chromium.org> References: <20231212231706.2680890-1-jeffxu@chromium.org> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Jeff Xu The base package includes the features common to all VMA 
sealing types. It prevents sealed VMAs from:

1> Unmapping, moving to another location, or shrinking, via munmap() and mremap(). Each of these can leave an empty space that could then be filled by a VMA with a new set of attributes.
2> Having a different VMA moved or expanded into the current location, via mremap().
3> Being modified via mmap(MAP_FIXED).
4> Size expansion, via mremap(). This does not appear to pose any specific risk to sealed VMAs. It is included anyway because the use case is unclear. In any case, users can rely on merging to expand a sealed VMA.

We consider MM_SEAL_BASE the feature on which other sealing features will depend. For instance, it probably does not make sense to seal PROT_PKEY without sealing the BASE, and the kernel will implicitly add SEAL_BASE for SEAL_PROT_PKEY. (If an application wants to relax this in the future, we could use the flags field in mseal() to override the behavior of implicitly adding SEAL_BASE.)

Signed-off-by: Jeff Xu --- mm/mmap.c | 23 +++++++++++++++++++++++ mm/mremap.c | 42 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 65 insertions(+) diff --git a/mm/mmap.c b/mm/mmap.c index 42462c2a0c35..dbc557bd460c 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1259,6 +1259,13 @@ unsigned long do_mmap(struct file *file, unsigned long addr, return -EEXIST; } + /* + * Check if the address range is sealed for do_mmap(). + * can_modify_mm assumes we have acquired the lock on MM. + */ + if (!can_modify_mm(mm, addr, addr + len, MM_SEAL_BASE)) + return -EACCES; + if (prot == PROT_EXEC) { pkey = execute_only_pkey(mm); if (pkey < 0) @@ -2632,6 +2639,14 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm, if (end == start) return -EINVAL; + /* + * Check if memory is sealed before arch_unmap. + * Prevent unmapping a sealed VMA. + * can_modify_mm assumes we have acquired the lock on MM. + */ + if (!can_modify_mm(mm, start, end, MM_SEAL_BASE)) + return -EACCES; + /* arch_unmap() might do unmaps itself. 
*/ arch_unmap(mm, start, end); @@ -3053,6 +3068,14 @@ int do_vma_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma, { struct mm_struct *mm = vma->vm_mm; + /* + * Check if memory is sealed before arch_unmap. + * Prevent unmapping a sealed VMA. + * can_modify_mm assumes we have acquired the lock on MM. + */ + if (!can_modify_mm(mm, start, end, MM_SEAL_BASE)) + return -EACCES; + arch_unmap(mm, start, end); return do_vmi_align_munmap(vmi, vma, mm, start, end, uf, unlock); } diff --git a/mm/mremap.c b/mm/mremap.c index 382e81c33fc4..ff7429bfbbe1 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -835,7 +835,35 @@ static unsigned long mremap_to(unsigned long addr, unsigned long old_len, if ((mm->map_count + 2) >= sysctl_max_map_count - 3) return -ENOMEM; + /* + * In mremap_to() which moves a VMA to another address. + * Check if src address is sealed, if so, reject. + * In other words, prevent a sealed VMA being moved to + * another address. + * + * Place can_modify_mm here because mremap_to() + * does its own checking for address range, and we only + * check the sealing after passing those checks. + * can_modify_mm assumes we have acquired the lock on MM. + */ + if (!can_modify_mm(mm, addr, addr + old_len, MM_SEAL_BASE)) + return -EACCES; + if (flags & MREMAP_FIXED) { + /* + * In mremap_to() which moves a VMA to another address. + * Check if dst address is sealed, if so, reject. + * In other words, prevent moving a vma to a sealed VMA. + * + * Place can_modify_mm here because mremap_to() does its + * own checking for address, and we only check the sealing + * after passing those checks. + * can_modify_mm assumes we have acquired the lock on MM. 
+ */ + if (!can_modify_mm(mm, new_addr, new_addr + new_len, + MM_SEAL_BASE)) + return -EACCES; + ret = do_munmap(mm, new_addr, new_len, uf_unmap_early); if (ret) goto out; @@ -994,6 +1022,20 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len, goto out; } + /* + * This is shrink/expand case (not mremap_to()) + * Check if src address is sealed, if so, reject. + * In other words, prevent shrinking or expanding a sealed VMA. + * + * Place can_modify_mm here so we can keep the logic related to + * shrink/expand together. Perhaps we can extract below to be its + * own function in future. + */ + if (!can_modify_mm(mm, addr, addr + old_len, MM_SEAL_BASE)) { + ret = -EACCES; + goto out; + } + /* * Always allow a shrinking remap: that just unmaps * the unnecessary pages.. From patchwork Tue Dec 12 23:16:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Xu X-Patchwork-Id: 753267 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b="TyBjuWlz" Received: from mail-pl1-x62b.google.com (mail-pl1-x62b.google.com [IPv6:2607:f8b0:4864:20::62b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0FDFCAB for ; Tue, 12 Dec 2023 15:17:16 -0800 (PST) Received: by mail-pl1-x62b.google.com with SMTP id d9443c01a7336-1d0ccda19eeso39413995ad.1 for ; Tue, 12 Dec 2023 15:17:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1702423035; x=1703027835; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=nL+BTz2K9JmfgWW40zZKnYHQ0fFwxEyXyS5r4SraM1c=; b=TyBjuWlz2QWPbohiHkWTUANQfk4hCqqFjbNhgTsqEpaS3knwjlCsXJVWKWwzvBHIyV iew2Ew/6Ai7U/QGnuhP/wf45FSx+1sw8siUl2EpanVEvmZsyDwijfgEO/P33jP7y+b/S 4g95Imryk63/kRG8ozB3rZyt10UIxvTHCM6ak= X-Google-DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702423035; x=1703027835; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=nL+BTz2K9JmfgWW40zZKnYHQ0fFwxEyXyS5r4SraM1c=; b=c84679I3iPyb1W9pvwBPaCU7tkltwExh8IQIhWhA286BbQ3+guJ8GaP97XCVJTNJJD pEEMF5Wyv1SGsuexuBMY+CLj7wQmSrjrF1Wctb7nhw1u2m44zicH426Mtj7Ev/MsrHC9 SWCfhx8JGTlhFwVH7maph++N/hZnJa3HUnmFgMSicjUywsbBdmWl9fmIiIVi1u6HctNa 2W5fWOm/sZ7dWvdMkywca669T3P2IKJ6dZnBdn+XBsO+D5h8/elBu50Q4GH94OTYlMcn PCEk3zz/GLWsBne+4WGNth0RGt0ateB6FLvuA0L/cUPm/eWsTuqY0A7kLCnpwmnb8YEt Dnnw== X-Gm-Message-State: AOJu0YzdGpjDiOfBvXBnAhRneGEkSKrSU8PeAsi/Kdz4FmS78sAp0dtt VyPg1BTGqSUrTYXCEso1jGcWJA== X-Google-Smtp-Source: AGHT+IGFBfEL94OuzDJXPeAX8v2gc5zss9OFnE/JNle1zzc7v4J2psWLGmJv7VyTMrs5itl5BiG+DQ== X-Received: by 2002:a17:902:b702:b0:1d0:7d83:fdd9 with SMTP id d2-20020a170902b70200b001d07d83fdd9mr3543832pls.122.1702423035559; Tue, 12 Dec 2023 15:17:15 -0800 (PST) Received: from localhost (34.133.83.34.bc.googleusercontent.com. 
[34.83.133.34]) by smtp.gmail.com with UTF8SMTPSA id h2-20020a170902f54200b001cfc67d46efsm9074320plf.191.2023.12.12.15.17.15 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 12 Dec 2023 15:17:15 -0800 (PST) From: jeffxu@chromium.org To: akpm@linux-foundation.org, keescook@chromium.org, jannh@google.com, sroettger@google.com, willy@infradead.org, gregkh@linuxfoundation.org, torvalds@linux-foundation.org Cc: jeffxu@google.com, jorgelo@chromium.org, groeck@chromium.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, pedro.falcato@gmail.com, dave.hansen@intel.com, linux-hardening@vger.kernel.org, deraadt@openbsd.org, Jeff Xu Subject: [RFC PATCH v3 05/11] mseal: add MM_SEAL_PROT_PKEY Date: Tue, 12 Dec 2023 23:16:59 +0000 Message-ID: <20231212231706.2680890-6-jeffxu@chromium.org> X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog In-Reply-To: <20231212231706.2680890-1-jeffxu@chromium.org> References: <20231212231706.2680890-1-jeffxu@chromium.org> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Jeff Xu Seal PROT and PKEY of the address range, in other words, mprotect() and pkey_mprotect() will be denied if the memory is sealed with MM_SEAL_PROT_PKEY. Signed-off-by: Jeff Xu --- mm/mprotect.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/mm/mprotect.c b/mm/mprotect.c index b94fbb45d5c7..1527188b1e92 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -32,6 +32,7 @@ #include #include #include +#include #include #include #include @@ -753,6 +754,15 @@ static int do_mprotect_pkey(unsigned long start, size_t len, } } + /* + * checking if PROT and PKEY is sealed. + * can_modify_mm assumes we have acquired the lock on MM. 
+ */ + if (!can_modify_mm(current->mm, start, end, MM_SEAL_PROT_PKEY)) { + error = -EACCES; + goto out; + } + prev = vma_prev(&vmi); if (start > vma->vm_start) prev = vma; From patchwork Tue Dec 12 23:17:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Xu X-Patchwork-Id: 754131 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b="UVi8MmUb" Received: from mail-pl1-x62f.google.com (mail-pl1-x62f.google.com [IPv6:2607:f8b0:4864:20::62f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 43991CE for ; Tue, 12 Dec 2023 15:17:17 -0800 (PST) Received: by mail-pl1-x62f.google.com with SMTP id d9443c01a7336-1d0bcc0c313so35636675ad.3 for ; Tue, 12 Dec 2023 15:17:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1702423037; x=1703027837; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=D0KleV6o3T8VwIpW2LkN+EL0QO50pQ5bkLoTJyBRonk=; b=UVi8MmUbHi0kNmSZ3GTDUM9D8DkaJTGkCEL2dbPQ0c2m9U/F/dZ2Sd89oztK1KW+OS SmE11kWukVcJXlgVH1mBBwmpD/8kX8qTm4YzCAl427EblT8kfxN4p3Uv3JVke1Ygzvr0 y9MnTZdXv61odrB09tA71ICC7wjyNSI2bDRq0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702423037; x=1703027837; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=D0KleV6o3T8VwIpW2LkN+EL0QO50pQ5bkLoTJyBRonk=; b=Ts/yAN4YxA9JdnCUskgJVgvevlUJoFlcpfw6DmZXnWRUGHnqHSkfoGOngMK0pXhn85 IExcFPwKitkn2kzkQusH2/YXEo/WD30UY/peA+OslEGAHbdBEV5FJnUoyTlm1kKs6XwM kpOsu5qB4cP9t49Y/GmybFfA8f7XZAoIIPlduegclhrFQxIwlYB6O1AGr5BScmsWtHsJ Nk/Bb5dxT3hV8PIUxWqRRP2gW+AVWYXs0csdMWbXCg0vSKXGBSDioxwoyvY/P9MJvWKf 
zp25jw3Bd7L/81bdT52FJX7cZUuoUi3sP9WnkFG0OlFRo3IrO0TgPhbdr39lcylsc/0O 0jjA== X-Gm-Message-State: AOJu0Yw6oSYtL3+3CUPfR6pIlfw/7gCGFKPbj3a2Z5ShlUZiZKhP8+5U uygzdgBZPzrK22uMX8GD3tqdEOJ74lw3WSo9nRk= X-Google-Smtp-Source: AGHT+IGxR0oJJr0EjsRByUJbUwGXJ3kDSLTggQ7aP2NMz7xXn0QhhIQ1l2eHY3b5MujUVGQNKRdT6g== X-Received: by 2002:a17:902:d4d2:b0:1d3:4aab:194d with SMTP id o18-20020a170902d4d200b001d34aab194dmr281091plg.72.1702423036672; Tue, 12 Dec 2023 15:17:16 -0800 (PST) Received: from localhost (34.133.83.34.bc.googleusercontent.com. [34.83.133.34]) by smtp.gmail.com with UTF8SMTPSA id e1-20020a17090301c100b001d33e65b3cdsm1661489plh.112.2023.12.12.15.17.15 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 12 Dec 2023 15:17:16 -0800 (PST) From: jeffxu@chromium.org To: akpm@linux-foundation.org, keescook@chromium.org, jannh@google.com, sroettger@google.com, willy@infradead.org, gregkh@linuxfoundation.org, torvalds@linux-foundation.org Cc: jeffxu@google.com, jorgelo@chromium.org, groeck@chromium.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, pedro.falcato@gmail.com, dave.hansen@intel.com, linux-hardening@vger.kernel.org, deraadt@openbsd.org, Jeff Xu Subject: [RFC PATCH v3 06/11] mseal: add sealing support for mmap Date: Tue, 12 Dec 2023 23:17:00 +0000 Message-ID: <20231212231706.2680890-7-jeffxu@chromium.org> X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog In-Reply-To: <20231212231706.2680890-1-jeffxu@chromium.org> References: <20231212231706.2680890-1-jeffxu@chromium.org> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Jeff Xu Allow mmap() to set the sealing type when creating a mapping. This is useful for optimization because it avoids having to make two system calls: one for mmap() and one for mseal(). With this change, mmap() can take an input that specifies the sealing type, so only one system call is needed. 
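As a rough illustration of how a single mmap() call can carry the sealing request, the sketch below models the prot-to-seal conversion in userspace C. The bit positions (starting at bit 26) and the rule that PROT_PKEY sealing implies BASE are taken from the diff in this patch; the TOY_ prefix and the function name toy_convert_mmap_seals are hypothetical, chosen so the snippet does not clash with real headers.

```c
/* Seal bits occupy consecutive bits of "prot", starting at bit 26 (per this patch). */
#define TOY_PROT_SEAL_BIT_BEGIN  26
#define TOY_PROT_SEAL_SEAL       (1UL << TOY_PROT_SEAL_BIT_BEGIN)
#define TOY_PROT_SEAL_BASE       (1UL << (TOY_PROT_SEAL_BIT_BEGIN + 1))
#define TOY_PROT_SEAL_PROT_PKEY  (1UL << (TOY_PROT_SEAL_BIT_BEGIN + 2))
#define TOY_PROT_SEAL_ALL        (TOY_PROT_SEAL_SEAL | \
				  TOY_PROT_SEAL_BASE | \
				  TOY_PROT_SEAL_PROT_PKEY)

/*
 * Extract the seal request from a prot value.
 * Requesting the PROT_PKEY seal implicitly requests the BASE seal as well.
 * The result has the seal bits shifted down so they start at bit 0.
 */
static unsigned long toy_convert_mmap_seals(unsigned long prot)
{
	if (prot & TOY_PROT_SEAL_PROT_PKEY)
		prot |= TOY_PROT_SEAL_BASE;

	return (prot & TOY_PROT_SEAL_ALL) >> TOY_PROT_SEAL_BIT_BEGIN;
}
```

Ordinary protection bits (the low bits of prot) are masked out by the conversion, so a prot value without any seal bit yields 0 and the mapping is created unsealed.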
This patch uses the "prot" field of mmap() to set the sealing. Three sealing types are added to match with MM_SEAL_xyz in mseal(). PROT_SEAL_SEAL PROT_SEAL_BASE PROT_SEAL_PROT_PKEY We also thought about using MAP_SEAL_xyz, which is a field in the mmap() function called "flags". However, this field is more about the type of mapping, such as MAP_FIXED_NOREPLACE. The "prot" field seems more appropriate for our case. It's worth noting that even though the sealing type is set via the "prot" field in mmap(), we don't require it to be set in the "prot" field in later mprotect() call. This is unlike the PROT_READ, PROT_WRITE, PROT_EXEC bits, e.g. if PROT_WRITE is not set in mprotect(), it means that the region is not writable. In other words, if you set PROT_SEAL_PROT_PKEY in mmap(), you don't need to set it in mprotect(). In fact, with the current approach, mseal() is used to set sealing on existing VMA. Signed-off-by: Jeff Xu Suggested-by: Pedro Falcato --- arch/mips/kernel/vdso.c | 10 +++- include/linux/mm.h | 63 +++++++++++++++++++++++++- include/uapi/asm-generic/mman-common.h | 13 ++++++ mm/mmap.c | 25 ++++++++-- 4 files changed, 105 insertions(+), 6 deletions(-) diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c index f6d40e43f108..6d1103d36af1 100644 --- a/arch/mips/kernel/vdso.c +++ b/arch/mips/kernel/vdso.c @@ -98,11 +98,17 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp) return -EINTR; if (IS_ENABLED(CONFIG_MIPS_FP_SUPPORT)) { - /* Map delay slot emulation page */ + /* + * Map delay slot emulation page. + * + * Note: passing vm_seals = 0 + * Don't support sealing for vdso for now. + * This might change when we add sealing support for vdso. 
+ */ base = mmap_region(NULL, STACK_TOP, PAGE_SIZE, VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC, - 0, NULL); + 0, NULL, 0); if (IS_ERR_VALUE(base)) { ret = base; goto out; diff --git a/include/linux/mm.h b/include/linux/mm.h index 2435acc1f44f..5d3ee79f1438 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -266,6 +266,15 @@ extern unsigned int kobjsize(const void *objp); MM_SEAL_BASE | \ MM_SEAL_PROT_PKEY) +/* + * PROT_SEAL_ALL is all supported flags in mmap(). + * See include/uapi/asm-generic/mman-common.h. + */ +#define PROT_SEAL_ALL ( \ + PROT_SEAL_SEAL | \ + PROT_SEAL_BASE | \ + PROT_SEAL_PROT_PKEY) + /* * vm_flags in vm_area_struct, see mm_types.h. * When changing, update also include/trace/events/mmflags.h. @@ -3290,7 +3299,7 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo extern unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, - struct list_head *uf); + struct list_head *uf, unsigned long vm_seals); extern unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, @@ -3339,12 +3348,47 @@ static inline unsigned long vma_seals(struct vm_area_struct *vma) return (vma->vm_seals & MM_SEAL_ALL); } +static inline void update_vma_seals(struct vm_area_struct *vma, unsigned long vm_seals) +{ + vma->vm_seals |= vm_seals; +} + extern bool can_modify_mm(struct mm_struct *mm, unsigned long start, unsigned long end, unsigned long checkSeals); extern bool can_modify_vma(struct vm_area_struct *vma, unsigned long checkSeals); +/* + * Convert prot field of mmap to vm_seals type. + */ +static inline unsigned long convert_mmap_seals(unsigned long prot) +{ + unsigned long seals = 0; + + /* + * set SEAL_PROT_PKEY implies SEAL_BASE. 
+ */ + if (prot & PROT_SEAL_PROT_PKEY) + prot |= PROT_SEAL_BASE; + + /* + * The seal bits start from bit 26 of the "prot" field of mmap. + * see comments in include/uapi/asm-generic/mman-common.h. + */ + seals = (prot & PROT_SEAL_ALL) >> PROT_SEAL_BIT_BEGIN; + return seals; +} + +/* + * check input sealing type from the "prot" field of mmap(). + * for CONFIG_MSEAL case, this always return 0 (successful). + */ +static inline int check_mmap_seals(unsigned long prot, unsigned long *vm_seals) +{ + *vm_seals = convert_mmap_seals(prot); + return 0; +} #else static inline bool check_vma_seals_mergeable(unsigned long vm_seals1) { @@ -3367,6 +3411,23 @@ static inline bool can_modify_vma(struct vm_area_struct *vma, { return true; } + +static inline void update_vma_seals(struct vm_area_struct *vma, unsigned long vm_seals) +{ +} + +/* + * check input sealing type from the "prot" field of mmap(). + * For not CONFIG_MSEAL, if SEAL flag is set, it will return failure. + */ +static inline int check_mmap_seals(unsigned long prot, unsigned long *vm_seals) +{ + if (prot & PROT_SEAL_ALL) + return -EINVAL; + + return 0; +} + #endif /* These take the mm semaphore themselves */ diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index 6ce1f1ceb432..f07ad9e70b3a 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -17,6 +17,19 @@ #define PROT_GROWSDOWN 0x01000000 /* mprotect flag: extend change to start of growsdown vma */ #define PROT_GROWSUP 0x02000000 /* mprotect flag: extend change to end of growsup vma */ +/* + * The PROT_SEAL_XX defines memory sealings flags in the prot argument + * of mmap(). The bits currently take consecutive bits and match + * the same sequence as MM_SEAL_XX type, this allows convert_mmap_seals() + * to convert prot to MM_SEAL_XX type using bit operations. + * The include/uapi/linux/mman.h header file defines the MM_SEAL_XX type, + * which is used by the mseal() system call. 
+ */ +#define PROT_SEAL_BIT_BEGIN 26 +#define PROT_SEAL_SEAL _BITUL(PROT_SEAL_BIT_BEGIN) /* 0x04000000 seal seal */ +#define PROT_SEAL_BASE _BITUL(PROT_SEAL_BIT_BEGIN + 1) /* 0x08000000 base for all sealing types */ +#define PROT_SEAL_PROT_PKEY _BITUL(PROT_SEAL_BIT_BEGIN + 2) /* 0x10000000 seal prot and pkey */ + /* 0x01 - 0x03 are defined in linux/mman.h */ #define MAP_TYPE 0x0f /* Mask for type of mapping */ #define MAP_FIXED 0x10 /* Interpret addr exactly */ diff --git a/mm/mmap.c b/mm/mmap.c index dbc557bd460c..3e1bf5a131b0 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1211,6 +1211,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr, { struct mm_struct *mm = current->mm; int pkey = 0; + unsigned long vm_seals = 0; *populate = 0; @@ -1231,6 +1232,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr, if (flags & MAP_FIXED_NOREPLACE) flags |= MAP_FIXED; + if (check_mmap_seals(prot, &vm_seals) < 0) + return -EINVAL; + if (!(flags & MAP_FIXED)) addr = round_hint_to_min(addr); @@ -1381,7 +1385,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr, vm_flags |= VM_NORESERVE; } - addr = mmap_region(file, addr, len, vm_flags, pgoff, uf); + addr = mmap_region(file, addr, len, vm_flags, pgoff, uf, vm_seals); if (!IS_ERR_VALUE(addr) && ((vm_flags & VM_LOCKED) || (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE)) @@ -2679,7 +2683,7 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len, unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, - struct list_head *uf) + struct list_head *uf, unsigned long vm_seals) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma = NULL; @@ -2723,7 +2727,13 @@ unsigned long mmap_region(struct file *file, unsigned long addr, next = vma_next(&vmi); prev = vma_prev(&vmi); - if (vm_flags & VM_SPECIAL) { + /* + * For now, sealed VMA doesn't merge with other VMA, + * Will change this in later commit when we 
make sealed VMA + * also mergeable. + */ + if ((vm_flags & VM_SPECIAL) || + (vm_seals & MM_SEAL_ALL)) { if (prev) vma_iter_next_range(&vmi); goto cannot_expand; @@ -2781,6 +2791,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, vma->vm_page_prot = vm_get_page_prot(vm_flags); vma->vm_pgoff = pgoff; + update_vma_seals(vma, vm_seals); + if (file) { if (vm_flags & VM_SHARED) { error = mapping_map_writable(file->f_mapping); @@ -2992,6 +3004,13 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, if (pgoff + (size >> PAGE_SHIFT) < pgoff) return ret; + /* + * Do not support sealing in remap_file_page. + * sealing is set via mmap() and mseal(). + */ + if (prot & PROT_SEAL_ALL) + return ret; + if (mmap_write_lock_killable(mm)) return -EINTR; From patchwork Tue Dec 12 23:17:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Xu X-Patchwork-Id: 753264 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b="SVnvx4Kv" Received: from mail-oo1-xc2f.google.com (mail-oo1-xc2f.google.com [IPv6:2607:f8b0:4864:20::c2f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 94F71EA for ; Tue, 12 Dec 2023 15:17:18 -0800 (PST) Received: by mail-oo1-xc2f.google.com with SMTP id 006d021491bc7-5908a63a83fso3281132eaf.1 for ; Tue, 12 Dec 2023 15:17:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1702423038; x=1703027838; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=9hi31AjxJtNZjlJknX0J3BHnj/VEX2GCu7NwkPG/5ME=; b=SVnvx4Kv7VfUbtzFewZaPKducGIyowYGkR4SRcsDB29q2C800rgrYLWJMuNJmy1VGP qtFqp6amVjfGe8Q0UKjLhnvZPIrJ7BGyRFe9K+J4uAc7y3+KhuzC/qushJhxYiFiChP5 mDu6LxxYTM7gBfvxZFkXJU7A8YZm8+3Z3TKQE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702423038; x=1703027838; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=9hi31AjxJtNZjlJknX0J3BHnj/VEX2GCu7NwkPG/5ME=; b=G6IxLUfGZSrSIwcnffo516XqNWMWjPHwKqEdb78xaEF06+flwf3K67E1EbgFxSoCR3 WXLnh/z5ViVzoaQPYENAQeKXPFMqluhlji409XedHRA+qN3r1q+re5wDfrEltELO0WI8 6PUhqzCwZGh5JrIzZJ75zvgdJFPiBsufqye9eg6fBpy5qYXe193U37NblxhdF+6Ckq6P 5ZdndQ9GiVeuVXjFC4KyyEMVQ7QaN3NGdCL81QS1JsET1MiGSsUuPzDbeImojxIYR+Kb Ag1OHf59d/X432276dKy/GxQxqPV1q9rtYuyZh7lq3E8oivHceuXo/Kubj4zvmLciWUU ptAQ== X-Gm-Message-State: AOJu0Yz0Ude7JlbsOvrdIiS1O5G+gT9cZ7uYjV42Q2LtJ0mRPtCMzYQS 7eQ8Nc6Qt9WTYLzZ23+F42PYvw== X-Google-Smtp-Source: AGHT+IHQHtOqaKUE81zeaJCn2q6SFy9/JngG9UEh+z5j67n5lmwKX/FrsIt4S5/AHC8z+ShhO7V8gw== X-Received: by 2002:a05:6358:9394:b0:170:5200:e1b2 with SMTP id h20-20020a056358939400b001705200e1b2mr7564052rwb.4.1702423037735; Tue, 12 Dec 2023 15:17:17 -0800 (PST) Received: from localhost (34.133.83.34.bc.googleusercontent.com. [34.83.133.34]) by smtp.gmail.com with UTF8SMTPSA id w5-20020a056a0014c500b006ce2e77ec4csm8687959pfu.193.2023.12.12.15.17.17 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 12 Dec 2023 15:17:17 -0800 (PST) From: jeffxu@chromium.org To: akpm@linux-foundation.org, keescook@chromium.org, jannh@google.com, sroettger@google.com, willy@infradead.org, gregkh@linuxfoundation.org, torvalds@linux-foundation.org Cc: jeffxu@google.com, jorgelo@chromium.org, groeck@chromium.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, pedro.falcato@gmail.com, dave.hansen@intel.com, linux-hardening@vger.kernel.org, deraadt@openbsd.org, Jeff Xu Subject: [RFC PATCH v3 07/11] mseal: make sealed VMA mergeable. 
Date: Tue, 12 Dec 2023 23:17:01 +0000 Message-ID: <20231212231706.2680890-8-jeffxu@chromium.org> X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog In-Reply-To: <20231212231706.2680890-1-jeffxu@chromium.org> References: <20231212231706.2680890-1-jeffxu@chromium.org> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Jeff Xu Add merge/split handling for the mlock/madvise/mprotect/mmap cases. Make sealed VMAs mergeable with adjacent VMAs that carry the same seals, so that we don't run out of VMAs: there is a maximum number of VMAs per process. Signed-off-by: Jeff Xu Suggested-by: Jann Horn --- fs/userfaultfd.c | 8 +++++--- include/linux/mm.h | 31 +++++++++++++------------------ mm/madvise.c | 2 +- mm/mempolicy.c | 2 +- mm/mlock.c | 2 +- mm/mmap.c | 44 +++++++++++++++++++++----------------------- mm/mprotect.c | 2 +- mm/mremap.c | 2 +- mm/mseal.c | 23 ++++++++++++++++++----- 9 files changed, 62 insertions(+), 54 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 56eaae9dac1a..8ebee7c1c6cf 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -926,7 +926,8 @@ static int userfaultfd_release(struct inode *inode, struct file *file) new_flags, vma->anon_vma, vma->vm_file, vma->vm_pgoff, vma_policy(vma), - NULL_VM_UFFD_CTX, anon_vma_name(vma)); + NULL_VM_UFFD_CTX, anon_vma_name(vma), + vma_seals(vma)); if (prev) { vma = prev; } else { @@ -1483,7 +1484,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma), ((struct vm_userfaultfd_ctx){ ctx }), - anon_vma_name(vma)); + anon_vma_name(vma), vma_seals(vma)); if (prev) { /* vma_merge() invalidated the mas */ vma = prev; @@ -1668,7 +1669,8 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, prev = vma_merge(&vmi, mm, prev, start, vma_end, new_flags, vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma), - NULL_VM_UFFD_CTX, anon_vma_name(vma)); + NULL_VM_UFFD_CTX, 
anon_vma_name(vma), + vma_seals(vma)); if (prev) { vma = prev; goto next; diff --git a/include/linux/mm.h b/include/linux/mm.h index 5d3ee79f1438..1f162bb5b38d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3243,7 +3243,7 @@ extern struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *, struct vm_area_struct *prev, unsigned long addr, unsigned long end, unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t, struct mempolicy *, struct vm_userfaultfd_ctx, - struct anon_vma_name *); + struct anon_vma_name *, unsigned long vm_seals); extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *); extern int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *, unsigned long addr, int new_below); @@ -3327,19 +3327,6 @@ static inline void mm_populate(unsigned long addr, unsigned long len) {} #endif #ifdef CONFIG_MSEAL -static inline bool check_vma_seals_mergeable(unsigned long vm_seals) -{ - /* - * Set sealed VMA not mergeable with another VMA for now. - * This will be changed in later commit to make sealed - * VMA also mergeable. - */ - if (vm_seals & MM_SEAL_ALL) - return false; - - return true; -} - /* * return the valid sealing (after mask). 
*/ @@ -3353,6 +3340,14 @@ static inline void update_vma_seals(struct vm_area_struct *vma, unsigned long vm vma->vm_seals |= vm_seals; } +static inline bool check_vma_seals_mergeable(unsigned long vm_seals1, unsigned long vm_seals2) +{ + if ((vm_seals1 & MM_SEAL_ALL) != (vm_seals2 & MM_SEAL_ALL)) + return false; + + return true; +} + extern bool can_modify_mm(struct mm_struct *mm, unsigned long start, unsigned long end, unsigned long checkSeals); @@ -3390,14 +3385,14 @@ static inline int check_mmap_seals(unsigned long prot, unsigned long *vm_seals) return 0; } #else -static inline bool check_vma_seals_mergeable(unsigned long vm_seals1) +static inline unsigned long vma_seals(struct vm_area_struct *vma) { - return true; + return 0; } -static inline unsigned long vma_seals(struct vm_area_struct *vma) +static inline bool check_vma_seals_mergeable(unsigned long vm_seals1, unsigned long vm_seals2) { - return 0; + return true; } static inline bool can_modify_mm(struct mm_struct *mm, unsigned long start, diff --git a/mm/madvise.c b/mm/madvise.c index 4dded5d27e7e..e2d219a4b6ef 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -152,7 +152,7 @@ static int madvise_update_vma(struct vm_area_struct *vma, pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT); *prev = vma_merge(&vmi, mm, *prev, start, end, new_flags, vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma), - vma->vm_userfaultfd_ctx, anon_name); + vma->vm_userfaultfd_ctx, anon_name, vma_seals(vma)); if (*prev) { vma = *prev; goto success; diff --git a/mm/mempolicy.c b/mm/mempolicy.c index e52e3a0b8f2e..e70b69c64564 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -836,7 +836,7 @@ static int mbind_range(struct vma_iterator *vmi, struct vm_area_struct *vma, pgoff = vma->vm_pgoff + ((vmstart - vma->vm_start) >> PAGE_SHIFT); merged = vma_merge(vmi, vma->vm_mm, *prev, vmstart, vmend, vma->vm_flags, vma->anon_vma, vma->vm_file, pgoff, new_pol, - vma->vm_userfaultfd_ctx, anon_vma_name(vma)); + 
vma->vm_userfaultfd_ctx, anon_vma_name(vma), vma_seals(vma)); if (merged) { *prev = merged; return vma_replace_policy(merged, new_pol); diff --git a/mm/mlock.c b/mm/mlock.c index 06bdfab83b58..b537a2cbd337 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -428,7 +428,7 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma, pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT); *prev = vma_merge(vmi, mm, *prev, start, end, newflags, vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma), - vma->vm_userfaultfd_ctx, anon_vma_name(vma)); + vma->vm_userfaultfd_ctx, anon_vma_name(vma), vma_seals(vma)); if (*prev) { vma = *prev; goto success; diff --git a/mm/mmap.c b/mm/mmap.c index 3e1bf5a131b0..6da8d83f2e66 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -720,7 +720,8 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma, static inline bool is_mergeable_vma(struct vm_area_struct *vma, struct file *file, unsigned long vm_flags, struct vm_userfaultfd_ctx vm_userfaultfd_ctx, - struct anon_vma_name *anon_name, bool may_remove_vma) + struct anon_vma_name *anon_name, bool may_remove_vma, + unsigned long vm_seals) { /* * VM_SOFTDIRTY should not prevent from VMA merging, if we @@ -740,7 +741,7 @@ static inline bool is_mergeable_vma(struct vm_area_struct *vma, return false; if (!anon_vma_name_eq(anon_vma_name(vma), anon_name)) return false; - if (!check_vma_seals_mergeable(vma_seals(vma))) + if (!check_vma_seals_mergeable(vma_seals(vma), vm_seals)) return false; return true; @@ -776,9 +777,10 @@ static bool can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags, struct anon_vma *anon_vma, struct file *file, pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx, - struct anon_vma_name *anon_name) + struct anon_vma_name *anon_name, unsigned long vm_seals) { - if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name, true) && + if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, + anon_name, 
true, vm_seals) && is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) { if (vma->vm_pgoff == vm_pgoff) return true; @@ -799,9 +801,10 @@ static bool can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags, struct anon_vma *anon_vma, struct file *file, pgoff_t vm_pgoff, struct vm_userfaultfd_ctx vm_userfaultfd_ctx, - struct anon_vma_name *anon_name) + struct anon_vma_name *anon_name, unsigned long vm_seals) { - if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name, false) && + if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, + anon_name, false, vm_seals) && is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) { pgoff_t vm_pglen; vm_pglen = vma_pages(vma); @@ -869,7 +872,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, struct anon_vma *anon_vma, struct file *file, pgoff_t pgoff, struct mempolicy *policy, struct vm_userfaultfd_ctx vm_userfaultfd_ctx, - struct anon_vma_name *anon_name) + struct anon_vma_name *anon_name, unsigned long vm_seals) { struct vm_area_struct *curr, *next, *res; struct vm_area_struct *vma, *adjust, *remove, *remove2; @@ -908,7 +911,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, /* Can we merge the predecessor? */ if (addr == prev->vm_end && mpol_equal(vma_policy(prev), policy) && can_vma_merge_after(prev, vm_flags, anon_vma, file, - pgoff, vm_userfaultfd_ctx, anon_name)) { + pgoff, vm_userfaultfd_ctx, anon_name, vm_seals)) { merge_prev = true; vma_prev(vmi); } @@ -917,7 +920,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, /* Can we merge the successor? 
*/ if (next && mpol_equal(policy, vma_policy(next)) && can_vma_merge_before(next, vm_flags, anon_vma, file, pgoff+pglen, - vm_userfaultfd_ctx, anon_name)) { + vm_userfaultfd_ctx, anon_name, vm_seals)) { merge_next = true; } @@ -2727,13 +2730,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr, next = vma_next(&vmi); prev = vma_prev(&vmi); - /* - * For now, sealed VMA doesn't merge with other VMA, - * Will change this in later commit when we make sealed VMA - * also mergeable. - */ - if ((vm_flags & VM_SPECIAL) || - (vm_seals & MM_SEAL_ALL)) { + + if (vm_flags & VM_SPECIAL) { if (prev) vma_iter_next_range(&vmi); goto cannot_expand; @@ -2743,7 +2741,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr, /* Check next */ if (next && next->vm_start == end && !vma_policy(next) && can_vma_merge_before(next, vm_flags, NULL, file, pgoff+pglen, - NULL_VM_UFFD_CTX, NULL)) { + NULL_VM_UFFD_CTX, NULL, vm_seals)) { merge_end = next->vm_end; vma = next; vm_pgoff = next->vm_pgoff - pglen; @@ -2752,9 +2750,9 @@ unsigned long mmap_region(struct file *file, unsigned long addr, /* Check prev */ if (prev && prev->vm_end == addr && !vma_policy(prev) && (vma ? 
can_vma_merge_after(prev, vm_flags, vma->anon_vma, file, - pgoff, vma->vm_userfaultfd_ctx, NULL) : + pgoff, vma->vm_userfaultfd_ctx, NULL, vm_seals) : can_vma_merge_after(prev, vm_flags, NULL, file, pgoff, - NULL_VM_UFFD_CTX, NULL))) { + NULL_VM_UFFD_CTX, NULL, vm_seals))) { merge_start = prev->vm_start; vma = prev; vm_pgoff = prev->vm_pgoff; @@ -2822,7 +2820,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr, merge = vma_merge(&vmi, mm, prev, vma->vm_start, vma->vm_end, vma->vm_flags, NULL, vma->vm_file, vma->vm_pgoff, NULL, - NULL_VM_UFFD_CTX, NULL); + NULL_VM_UFFD_CTX, NULL, vma_seals(vma)); if (merge) { /* * ->mmap() can change vma->vm_file and fput @@ -3130,14 +3128,14 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma, if (security_vm_enough_memory_mm(mm, len >> PAGE_SHIFT)) return -ENOMEM; - /* * Expand the existing vma if possible; Note that singular lists do not * occur after forking, so the expand will only happen on new VMAs. */ if (vma && vma->vm_end == addr && !vma_policy(vma) && can_vma_merge_after(vma, flags, NULL, NULL, - addr >> PAGE_SHIFT, NULL_VM_UFFD_CTX, NULL)) { + addr >> PAGE_SHIFT, NULL_VM_UFFD_CTX, NULL, + vma_seals(vma))) { vma_iter_config(vmi, vma->vm_start, addr + len); if (vma_iter_prealloc(vmi, vma)) goto unacct_fail; @@ -3380,7 +3378,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, new_vma = vma_merge(&vmi, mm, prev, addr, addr + len, vma->vm_flags, vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma), - vma->vm_userfaultfd_ctx, anon_vma_name(vma)); + vma->vm_userfaultfd_ctx, anon_vma_name(vma), vma_seals(vma)); if (new_vma) { /* * Source vma may have been merged into new_vma diff --git a/mm/mprotect.c b/mm/mprotect.c index 1527188b1e92..a4c90e71607b 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -632,7 +632,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb, pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT); *pprev = vma_merge(vmi, mm, 
*pprev, start, end, newflags, vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma), - vma->vm_userfaultfd_ctx, anon_vma_name(vma)); + vma->vm_userfaultfd_ctx, anon_vma_name(vma), vma_seals(vma)); if (*pprev) { vma = *pprev; VM_WARN_ON((vma->vm_flags ^ newflags) & ~VM_SOFTDIRTY); diff --git a/mm/mremap.c b/mm/mremap.c index ff7429bfbbe1..357efd6b48b9 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -1098,7 +1098,7 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len, vma = vma_merge(&vmi, mm, vma, extension_start, extension_end, vma->vm_flags, vma->anon_vma, vma->vm_file, extension_pgoff, vma_policy(vma), - vma->vm_userfaultfd_ctx, anon_vma_name(vma)); + vma->vm_userfaultfd_ctx, anon_vma_name(vma), vma_seals(vma)); if (!vma) { vm_unacct_memory(pages); ret = -ENOMEM; diff --git a/mm/mseal.c b/mm/mseal.c index d12aa628ebdc..3b90dce7d20e 100644 --- a/mm/mseal.c +++ b/mm/mseal.c @@ -7,8 +7,10 @@ * Author: Jeff Xu */ +#include #include #include +#include #include #include #include "internal.h" @@ -81,14 +83,25 @@ static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma, struct vm_area_struct **prev, unsigned long start, unsigned long end, unsigned long addtypes) { + pgoff_t pgoff; int ret = 0; + unsigned long newtypes = vma_seals(vma) | addtypes; + + if (newtypes != vma_seals(vma)) { + /* + * Attempt to merge with prev and next vma. + */ + pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT); + *prev = vma_merge(vmi, vma->vm_mm, *prev, start, end, vma->vm_flags, + vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma), + vma->vm_userfaultfd_ctx, anon_vma_name(vma), newtypes); + if (*prev) { + vma = *prev; + goto out; + } - if (addtypes & ~(vma_seals(vma))) { /* * Handle split at start and end. - * For now sealed VMA doesn't merge with other VMAs. - * This will be updated in later commit to make - * sealed VMA also mergeable. 
*/ if (start != vma->vm_start) { ret = split_vma(vmi, vma, start, 1); @@ -102,7 +115,7 @@ static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma, goto out; } - vma->vm_seals |= addtypes; + vma->vm_seals = newtypes; } out: From patchwork Tue Dec 12 23:17:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jeff Xu X-Patchwork-Id: 753263 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b="B+rra7Ql" Received: from mail-pj1-x1034.google.com (mail-pj1-x1034.google.com [IPv6:2607:f8b0:4864:20::1034]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 62101F3 for ; Tue, 12 Dec 2023 15:17:19 -0800 (PST) Received: by mail-pj1-x1034.google.com with SMTP id 98e67ed59e1d1-28659348677so4893262a91.0 for ; Tue, 12 Dec 2023 15:17:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1702423039; x=1703027839; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=oTvy78zmaIXZsTIPkmebRTxUrUOjQ/CWIjSmsuuygoc=; b=B+rra7Qlh5GXmswpkI4XOv34lb9U47sKHZldy63FrdMK39oQ+BpBMOsxphFQ35KlmF oO2swczMZNXZQk/piSbpkEIYmyPCRauAnkACK4EYdpbB+/H8e+OpVc5Let8uvrc7CU6W Wdg1NmQMl4rIAmcdksYTCMfqgmL6UQBxH6ezM= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702423039; x=1703027839; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=oTvy78zmaIXZsTIPkmebRTxUrUOjQ/CWIjSmsuuygoc=; b=Ak7VB+qfm+dB1RbYeLtE9pTNGu1QAaG9/oW1/MRYnwiqGs4CvI9cCUpvtxZ/lyGHXK RKvX0oa2xlBUEAdr+WbhxwzxGsdYn2v2cdg2Kd6UxYjTYAGpot5hcMagbIpBs0bVg6Tq ikMhAFNEWA0SGiKNW2yivwaRPcyaYYh8ednfu9pwm1dGY5F581Udldu344J2InGLwfGR 
wgVmC7/GlnUAZ3omzCLMIbrp5xxxD4E0CfnRibuCm/AY5kiRuVgD1UprKnu701R4hTiN hAeAKSBpdqWkl6p5fnR8qrd7N/RroChXgwXYW6AfKXl7aemEiDA0TltswkyRXgKyqc5y 1uwQ== X-Gm-Message-State: AOJu0YxrXoYFW+f0kT4Q91K/0/EMHZVA6pYLo+hDoDi4eBgJidSSkil8 eGmTyZqPeqTNcBkcBOSEQw6ixA== X-Google-Smtp-Source: AGHT+IGWlFc4mkpXoFC0PEpzgfDHSLkC9hOefaTM1E9H1ZoIhebEpqs8jP0iWKY6qbWGYpKoD0aROg== X-Received: by 2002:a17:90a:7562:b0:28a:79b0:afc1 with SMTP id q89-20020a17090a756200b0028a79b0afc1mr5931086pjk.6.1702423038769; Tue, 12 Dec 2023 15:17:18 -0800 (PST) Received: from localhost (34.133.83.34.bc.googleusercontent.com. [34.83.133.34]) by smtp.gmail.com with UTF8SMTPSA id n20-20020a17090ade9400b00286a275d65asm11093878pjv.41.2023.12.12.15.17.18 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 12 Dec 2023 15:17:18 -0800 (PST) From: jeffxu@chromium.org To: akpm@linux-foundation.org, keescook@chromium.org, jannh@google.com, sroettger@google.com, willy@infradead.org, gregkh@linuxfoundation.org, torvalds@linux-foundation.org Cc: jeffxu@google.com, jorgelo@chromium.org, groeck@chromium.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, pedro.falcato@gmail.com, dave.hansen@intel.com, linux-hardening@vger.kernel.org, deraadt@openbsd.org, Jeff Xu Subject: [RFC PATCH v3 08/11] mseal: add MM_SEAL_DISCARD_RO_ANON Date: Tue, 12 Dec 2023 23:17:02 +0000 Message-ID: <20231212231706.2680890-9-jeffxu@chromium.org> X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog In-Reply-To: <20231212231706.2680890-1-jeffxu@chromium.org> References: <20231212231706.2680890-1-jeffxu@chromium.org> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Jeff Xu Certain types of madvise() operations are destructive, such as MADV_DONTNEED, which can effectively alter region contents by discarding pages, especially when memory is anonymous. 
This patch blocks such operations for anonymous memory that is not writable by the user. The MM_SEAL_DISCARD_RO_ANON seal blocks such operations if the user doesn't have write access to the memory and the memory is anonymous. We do not think this seal is useful for file-backed mappings, because the memory contents are repopulated from the underlying mapped file. We also do not think it is useful if the user can write to the memory, because then an attacker can also write to it. Signed-off-by: Jeff Xu Suggested-by: Jann Horn Suggested-by: Stephen Röttger --- include/linux/mm.h | 19 +++++-- include/uapi/asm-generic/mman-common.h | 2 + include/uapi/linux/mman.h | 1 + mm/madvise.c | 12 +++++ mm/mseal.c | 73 ++++++++++++++++++++++++-- 5 files changed, 98 insertions(+), 9 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 1f162bb5b38d..50dda474acc2 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -264,7 +264,8 @@ extern unsigned int kobjsize(const void *objp); #define MM_SEAL_ALL ( \ MM_SEAL_SEAL | \ MM_SEAL_BASE | \ - MM_SEAL_PROT_PKEY) + MM_SEAL_PROT_PKEY | \ + MM_SEAL_DISCARD_RO_ANON) /* * PROT_SEAL_ALL is all supported flags in mmap(). @@ -273,7 +274,8 @@ extern unsigned int kobjsize(const void *objp); #define PROT_SEAL_ALL ( \ PROT_SEAL_SEAL | \ PROT_SEAL_BASE | \ - PROT_SEAL_PROT_PKEY) + PROT_SEAL_PROT_PKEY | \ + PROT_SEAL_DISCARD_RO_ANON) /* * vm_flags in vm_area_struct, see mm_types.h. @@ -3354,6 +3356,9 @@ extern bool can_modify_mm(struct mm_struct *mm, unsigned long start, extern bool can_modify_vma(struct vm_area_struct *vma, unsigned long checkSeals); +extern bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start, + unsigned long end, int behavior); + /* * Convert prot field of mmap to vm_seals type. */ @@ -3362,9 +3367,9 @@ static inline unsigned long convert_mmap_seals(unsigned long prot) unsigned long seals = 0; /* - * set SEAL_PROT_PKEY implies SEAL_BASE. + * set SEAL_PROT_PKEY or SEAL_DISCARD_RO_ANON implies SEAL_BASE. 
*/ - if (prot & PROT_SEAL_PROT_PKEY) + if (prot & (PROT_SEAL_PROT_PKEY | PROT_SEAL_DISCARD_RO_ANON)) prot |= PROT_SEAL_BASE; /* @@ -3407,6 +3412,12 @@ static inline bool can_modify_vma(struct vm_area_struct *vma, return true; } +static inline bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start, + unsigned long end, int behavior) +{ + return true; +} + static inline void update_vma_seals(struct vm_area_struct *vma, unsigned long vm_seals) { } diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index f07ad9e70b3a..bf503962409a 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -29,6 +29,8 @@ #define PROT_SEAL_SEAL _BITUL(PROT_SEAL_BIT_BEGIN) /* 0x04000000 seal seal */ #define PROT_SEAL_BASE _BITUL(PROT_SEAL_BIT_BEGIN + 1) /* 0x08000000 base for all sealing types */ #define PROT_SEAL_PROT_PKEY _BITUL(PROT_SEAL_BIT_BEGIN + 2) /* 0x10000000 seal prot and pkey */ +/* seal destructive madvise for non-writeable anonymous memory. */ +#define PROT_SEAL_DISCARD_RO_ANON _BITUL(PROT_SEAL_BIT_BEGIN + 3) /* 0x20000000 */ /* 0x01 - 0x03 are defined in linux/mman.h */ #define MAP_TYPE 0x0f /* Mask for type of mapping */ diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h index f561652886c4..3872cc118c8a 100644 --- a/include/uapi/linux/mman.h +++ b/include/uapi/linux/mman.h @@ -58,5 +58,6 @@ struct cachestat { #define MM_SEAL_SEAL _BITUL(0) #define MM_SEAL_BASE _BITUL(1) #define MM_SEAL_PROT_PKEY _BITUL(2) +#define MM_SEAL_DISCARD_RO_ANON _BITUL(3) #endif /* _UAPI_LINUX_MMAN_H */ diff --git a/mm/madvise.c b/mm/madvise.c index e2d219a4b6ef..ff038e323779 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -1403,6 +1403,7 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, * -EIO - an I/O error occurred while paging in data. * -EBADF - map exists, but area maps something that isn't a file. * -EAGAIN - a kernel resource was temporarily unavailable. 
+ * -EACCES - memory is sealed. */ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior) { @@ -1446,10 +1447,21 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh start = untagged_addr_remote(mm, start); end = start + len; + /* + * Check if the address range is sealed for do_madvise(). + * can_modify_mm_madv assumes we have acquired the lock on MM. + */ + if (!can_modify_mm_madv(mm, start, end, behavior)) { + error = -EACCES; + goto out; + } + blk_start_plug(&plug); error = madvise_walk_vmas(mm, start, end, behavior, madvise_vma_behavior); blk_finish_plug(&plug); + +out: if (write) mmap_write_unlock(mm); else diff --git a/mm/mseal.c b/mm/mseal.c index 3b90dce7d20e..294f48d33db6 100644 --- a/mm/mseal.c +++ b/mm/mseal.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include "internal.h" @@ -66,6 +67,55 @@ bool can_modify_mm(struct mm_struct *mm, unsigned long start, unsigned long end, return true; } +static bool is_madv_discard(int behavior) +{ + return behavior == MADV_FREE || behavior == MADV_DONTNEED || + behavior == MADV_DONTNEED_LOCKED || behavior == MADV_REMOVE || + behavior == MADV_DONTFORK || behavior == MADV_WIPEONFORK; +} + +static bool is_ro_anon(struct vm_area_struct *vma) +{ + /* check anonymous mapping. */ + if (vma->vm_file || vma->vm_flags & VM_SHARED) + return false; + + /* + * check for non-writable: + * PROT=RO or PKRU is not writeable. + */ + if (!(vma->vm_flags & VM_WRITE) || + !arch_vma_access_permitted(vma, true, false, false)) + return true; + + return false; +} + +/* + * Check if the vmas of a memory range are allowed to be modified by madvise. + * the memory range can have a gap (unallocated memory). + * return true, if it is allowed. + */ +bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start, unsigned long end, + int behavior) +{ + struct vm_area_struct *vma; + + VMA_ITERATOR(vmi, mm, start); + + if (!is_madv_discard(behavior)) + return true; + + /* going through each vma to check. 
*/ + for_each_vma_range(vmi, vma, end) + if (is_ro_anon(vma) && !can_modify_vma( + vma, MM_SEAL_DISCARD_RO_ANON)) + return false; + + /* Allow by default. */ + return true; +} + /* * Check if a seal type can be added to VMA. */ @@ -76,6 +126,12 @@ static bool can_add_vma_seals(struct vm_area_struct *vma, unsigned long newSeals (newSeals & ~(vma_seals(vma)))) return false; + /* + * For simplicity, we allow adding all sealing types during mmap or mseal. + * The actual sealing check will happen later during the particular action. + * E.g. for MM_SEAL_DISCARD_RO_ANON, we always allow adding it; at the + * time of the madvise() call, we check whether the sealing condition is met. + */ return true; } @@ -225,15 +281,22 @@ static int apply_mm_seal(unsigned long start, unsigned long end, * mprotect() and pkey_mprotect() will be denied if the memory is * sealed with MM_SEAL_PROT_PKEY. * - * The MM_SEAL_SEAL - * MM_SEAL_SEAL denies adding a new seal for an VMA. - * * The kernel will remember which seal types are applied, and the * application doesn’t need to repeat all existing seal types in * the next mseal(). Once a seal type is applied, it can’t be * unsealed. Call mseal() on an existing seal type is a no-action, * not a failure. * + * MM_SEAL_DISCARD_RO_ANON: block some destructive madvise() + * behavior, such as MADV_DONTNEED, which can effectively + * alter region contents by discarding pages. Such operations + * are blocked if the user doesn't have write access to the + * memory and the memory is anonymous. + * Setting this implies MM_SEAL_BASE is also set. + * + * The MM_SEAL_SEAL + * MM_SEAL_SEAL denies adding a new seal for a VMA. + * * flags: reserved. * * return values: @@ -264,8 +327,8 @@ static int do_mseal(unsigned long start, size_t len_in, unsigned long types, struct mm_struct *mm = current->mm; size_t len; - /* MM_SEAL_BASE is set when other seal types are set. 
*/ - if (types & MM_SEAL_PROT_PKEY) + /* MM_SEAL_BASE is set when other seal types are set */ + if (types & (MM_SEAL_PROT_PKEY | MM_SEAL_DISCARD_RO_ANON)) types |= MM_SEAL_BASE; if (!can_do_mseal(types, flags)) From patchwork Tue Dec 12 23:17:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Xu X-Patchwork-Id: 754130 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b="AlrxvD/t" Received: from mail-oo1-xc2c.google.com (mail-oo1-xc2c.google.com [IPv6:2607:f8b0:4864:20::c2c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CB688F4 for ; Tue, 12 Dec 2023 15:17:20 -0800 (PST) Received: by mail-oo1-xc2c.google.com with SMTP id 006d021491bc7-591487a1941so292632eaf.3 for ; Tue, 12 Dec 2023 15:17:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1702423040; x=1703027840; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=cZjoMoZfbLgYK7cJaZ0EBpgFOBwo/12832X2C7MVeuk=; b=AlrxvD/tb29EN8sl2PtfN9xifrh6UOw7clLmq9lNivZ0Dqs8vq6wKTn2zeYOFUlILc S2dBS4mxoSJBA6Mnery4qzDfMCUCRxK7Lq3zZ5kcRuALSUm2rvHG++KypBTac2ckuM29 WDhFPLbF8fT6+rp94pGR1suK8Ep6uwTY1MrPE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702423040; x=1703027840; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=cZjoMoZfbLgYK7cJaZ0EBpgFOBwo/12832X2C7MVeuk=; b=IzJM6v6qiGcZfKxkljgqYqh3ObjmfG2aHn1+OTzELTiKE9rjh3iCZU81ySnjeWX1RD LajLV6Ra9NQzj17FCVcLIEncN0NT63itZZourJyCfzC/iNCAd00hU4vQmW2LoNo9WEeU ExeSx1HtgAAeuWi0AxUSk3Uw1I+6fwnM7zS+XaO/3rD7lheDekiBtbK31B6CAAvbWNaU bHplrFabK6zNOanlpSvdNNs2j1VKfoztXFNoXn3yTfYoGa8SsSxuM71S0OhX2gDNdyxj 
AxvVHZGCUl8i34C2V//e1sCtPn8avGk8TWLqeWvnzjMgaSNkG8rmec5uVw7eXBIbUDYs Ut5w== X-Gm-Message-State: AOJu0Yyow8aFLI0ePA8FSSsjQ0sntNkK+MKQGNutDRNlQzIXmsYQp9d8 l28kf1lD9/D8wioqB958SqvikQ== X-Google-Smtp-Source: AGHT+IGRI96+S8G+Wu1lZ2r1WtnFErIwHP6EFRT0QHSV6FRdjptIOjAkrTcn5HdK1GH4IA0qfPmw5g== X-Received: by 2002:a05:6358:7296:b0:170:17eb:2039 with SMTP id w22-20020a056358729600b0017017eb2039mr9338350rwf.34.1702423039955; Tue, 12 Dec 2023 15:17:19 -0800 (PST) Received: from localhost (34.133.83.34.bc.googleusercontent.com. [34.83.133.34]) by smtp.gmail.com with UTF8SMTPSA id z17-20020aa785d1000000b006ce5bb61a5fsm8749920pfn.3.2023.12.12.15.17.19 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 12 Dec 2023 15:17:19 -0800 (PST) From: jeffxu@chromium.org To: akpm@linux-foundation.org, keescook@chromium.org, jannh@google.com, sroettger@google.com, willy@infradead.org, gregkh@linuxfoundation.org, torvalds@linux-foundation.org Cc: jeffxu@google.com, jorgelo@chromium.org, groeck@chromium.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, pedro.falcato@gmail.com, dave.hansen@intel.com, linux-hardening@vger.kernel.org, deraadt@openbsd.org, Jeff Xu Subject: [RFC PATCH v3 09/11] mseal: add MAP_SEALABLE to mmap() Date: Tue, 12 Dec 2023 23:17:03 +0000 Message-ID: <20231212231706.2680890-10-jeffxu@chromium.org> X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog In-Reply-To: <20231212231706.2680890-1-jeffxu@chromium.org> References: <20231212231706.2680890-1-jeffxu@chromium.org> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Jeff Xu The MAP_SEALABLE flag is added to the flags field of mmap(). When present, it marks the map as sealable. A map created without MAP_SEALABLE will not support sealing; In other words, mseal() will fail for such a map. Applications that don't care about sealing will expect their behavior unchanged. 
For those that need sealing support, opt-in by adding MAP_SEALABLE when creating the map. Signed-off-by: Jeff Xu --- include/linux/mm.h | 52 ++++++++++++++++++++++++-- include/linux/mm_types.h | 1 + include/uapi/asm-generic/mman-common.h | 1 + mm/mmap.c | 2 +- mm/mseal.c | 7 +++- 5 files changed, 57 insertions(+), 6 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 50dda474acc2..6f5dba9fbe21 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -267,6 +267,17 @@ extern unsigned int kobjsize(const void *objp); MM_SEAL_PROT_PKEY | \ MM_SEAL_DISCARD_RO_ANON) +/* define VM_SEALABLE in vm_seals of vm_area_struct. */ +#define VM_SEALABLE _BITUL(31) + +/* + * VM_SEALS_BITS_ALL marks the bits used for + * sealing in vm_seals of vm_area_structure. + */ +#define VM_SEALS_BITS_ALL ( \ + MM_SEAL_ALL | \ + VM_SEALABLE) + /* * PROT_SEAL_ALL is all supported flags in mmap(). * See include/uapi/asm-generic/mman-common.h. @@ -3330,9 +3341,17 @@ static inline void mm_populate(unsigned long addr, unsigned long len) {} #ifdef CONFIG_MSEAL /* - * return the valid sealing (after mask). + * return the valid sealing (after mask), this includes sealable bit. */ static inline unsigned long vma_seals(struct vm_area_struct *vma) +{ + return (vma->vm_seals & VM_SEALS_BITS_ALL); +} + +/* + * return the enabled sealing type (after mask), without sealable bit. 
+ */ +static inline unsigned long vma_enabled_seals(struct vm_area_struct *vma) { return (vma->vm_seals & MM_SEAL_ALL); } @@ -3342,9 +3361,14 @@ static inline void update_vma_seals(struct vm_area_struct *vma, unsigned long vm vma->vm_seals |= vm_seals; } +static inline bool is_vma_sealable(struct vm_area_struct *vma) +{ + return vma->vm_seals & VM_SEALABLE; +} + static inline bool check_vma_seals_mergeable(unsigned long vm_seals1, unsigned long vm_seals2) { - if ((vm_seals1 & MM_SEAL_ALL) != (vm_seals2 & MM_SEAL_ALL)) + if ((vm_seals1 & VM_SEALS_BITS_ALL) != (vm_seals2 & VM_SEALS_BITS_ALL)) return false; return true; @@ -3384,9 +3408,15 @@ static inline unsigned long convert_mmap_seals(unsigned long prot) * check input sealing type from the "prot" field of mmap(). * for CONFIG_MSEAL case, this always return 0 (successful). */ -static inline int check_mmap_seals(unsigned long prot, unsigned long *vm_seals) +static inline int check_mmap_seals(unsigned long prot, unsigned long *vm_seals, + unsigned long flags) { *vm_seals = convert_mmap_seals(prot); + if (*vm_seals) + /* setting one of MM_SEAL_XX means the map is sealable. */ + *vm_seals |= VM_SEALABLE; + else + *vm_seals |= (flags & MAP_SEALABLE) ? VM_SEALABLE:0; return 0; } #else @@ -3395,6 +3425,16 @@ static inline unsigned long vma_seals(struct vm_area_struct *vma) return 0; } +static inline unsigned long vma_enabled_seals(struct vm_area_struct *vma) +{ + return 0; +} + +static inline bool is_vma_sealable(struct vm_area_struct *vma) +{ + return false; +} + static inline bool check_vma_seals_mergeable(unsigned long vm_seals1, unsigned long vm_seals2) { return true; @@ -3426,11 +3466,15 @@ static inline void update_vma_seals(struct vm_area_struct *vma, unsigned long vm * check input sealing type from the "prot" field of mmap(). * For not CONFIG_MSEAL, if SEAL flag is set, it will return failure. 
*/ -static inline int check_mmap_seals(unsigned long prot, unsigned long *vm_seals) +static inline int check_mmap_seals(unsigned long prot, unsigned long *vm_seals, + unsigned long flags) { if (prot & PROT_SEAL_ALL) return -EINVAL; + if (flags & MAP_SEALABLE) + return -EINVAL; + return 0; } diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 052799173c86..c9b04c545f39 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -691,6 +691,7 @@ struct vm_area_struct { /* * bit masks for seal. * need this since vm_flags is full. + * We could merge this into vm_flags if vm_flags ever get expanded. */ unsigned long vm_seals; /* seal flags, see mm.h. */ #endif diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index bf503962409a..57ef4507c00b 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -47,6 +47,7 @@ #define MAP_UNINITIALIZED 0x4000000 /* For anonymous mmap, memory could be * uninitialized */ +#define MAP_SEALABLE 0x8000000 /* map is sealable. */ /* * Flags for mlock diff --git a/mm/mmap.c b/mm/mmap.c index 6da8d83f2e66..6e35e2070060 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1235,7 +1235,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr, if (flags & MAP_FIXED_NOREPLACE) flags |= MAP_FIXED; - if (check_mmap_seals(prot, &vm_seals) < 0) + if (check_mmap_seals(prot, &vm_seals, flags) < 0) return -EINVAL; if (!(flags & MAP_FIXED)) diff --git a/mm/mseal.c b/mm/mseal.c index 294f48d33db6..5d4cf71b497e 100644 --- a/mm/mseal.c +++ b/mm/mseal.c @@ -121,9 +121,13 @@ bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start, unsigned long */ static bool can_add_vma_seals(struct vm_area_struct *vma, unsigned long newSeals) { + /* if map is not sealable, reject. */ + if (!is_vma_sealable(vma)) + return false; + /* When SEAL_MSEAL is set, reject if a new type of seal is added. 
*/ if ((vma->vm_seals & MM_SEAL_SEAL) && - (newSeals & ~(vma_seals(vma)))) + (newSeals & ~(vma_enabled_seals(vma)))) return false; /* @@ -185,6 +189,7 @@ static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma, * 2> end is part of a valid vma. * 3> No gap (unallocated address) between start and end. * 4> requested seal type can be added in given address range. + * 5> map is sealable. */ static int check_mm_seal(unsigned long start, unsigned long end, unsigned long newtypes) From patchwork Tue Dec 12 23:17:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Xu X-Patchwork-Id: 753262 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b="I5GnwQeh" Received: from mail-oo1-xc35.google.com (mail-oo1-xc35.google.com [IPv6:2607:f8b0:4864:20::c35]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3B91010E for ; Tue, 12 Dec 2023 15:17:22 -0800 (PST) Received: by mail-oo1-xc35.google.com with SMTP id 006d021491bc7-59067f03282so3642679eaf.0 for ; Tue, 12 Dec 2023 15:17:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1702423041; x=1703027841; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=4ks0JloPGamaa61d4Hb0TBtMWFANdRTfuPlMlHWvZS0=; b=I5GnwQeh50tWvbB+BJ/bBm/w1ci8AMXDbCBFFTvoQNQJG42WZtTkSklbQG0eXIHW2h xn7DVR1UV3EXN+Xvhk3C4QNLqmA62/A6mBQ91I6C15/A83FkLXGcnhg0FnRrb3jH/TAI U2gpAeuWHKoWb53A4o7/m/erXG1peS3/LX74M= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702423041; x=1703027841; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=4ks0JloPGamaa61d4Hb0TBtMWFANdRTfuPlMlHWvZS0=; 
b=OAd7oMebBWaKGAgEoC4zULlAvslK0ibJWuUT+rxE5Ff2F/xU6TZT+CS7m2bfXW9Poh UTIFxL6WeLB0mPQyHgY2mKzpCDThbgXsUe+Pfz+JufUPsnL7Uuudrj9oHfbwOx8uB1VW 3JAxf8w1SBsd2adwM0hM2hso5mI9NzYFMpjizOD11K/BhM1XaliLCDr5VfOJwfUHIp1A k9rGqt63KfGiKQAwdDEQgm+aBqiS5vps2NZmeQiwQdH0Vyau4nVPUs5OKibGoCwoA+89 xXPI6BBPZ5Pr7EJU9Ec1N+WEs3KDvZrSyLupEoMvfz0JRGL5w/gMx0DhmN9cgX5sjBMV Jgog== X-Gm-Message-State: AOJu0Yxe6qT9Vt+T93ST6PhuYcgFpwL6hOj+LfGA6OnWkHeMW3IkdfuJ QPLammrwGf6qak6N2HkptOoShw== X-Google-Smtp-Source: AGHT+IF3Oq1fsZYeLBgFCye07IpKCl2Qm0GQHZhunt6nPEVTIAhM5EO+9T1cWMOp9tEYYtdQCNcv8A== X-Received: by 2002:a05:6359:223:b0:170:ad0e:c224 with SMTP id ej35-20020a056359022300b00170ad0ec224mr8667929rwb.23.1702423041112; Tue, 12 Dec 2023 15:17:21 -0800 (PST) Received: from localhost (34.133.83.34.bc.googleusercontent.com. [34.83.133.34]) by smtp.gmail.com with UTF8SMTPSA id r3-20020aa79883000000b006cbb65edcbfsm8922291pfl.12.2023.12.12.15.17.20 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 12 Dec 2023 15:17:20 -0800 (PST) From: jeffxu@chromium.org To: akpm@linux-foundation.org, keescook@chromium.org, jannh@google.com, sroettger@google.com, willy@infradead.org, gregkh@linuxfoundation.org, torvalds@linux-foundation.org Cc: jeffxu@google.com, jorgelo@chromium.org, groeck@chromium.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, pedro.falcato@gmail.com, dave.hansen@intel.com, linux-hardening@vger.kernel.org, deraadt@openbsd.org, Jeff Xu Subject: [RFC PATCH v3 10/11] selftest mm/mseal memory sealing Date: Tue, 12 Dec 2023 23:17:04 +0000 Message-ID: <20231212231706.2680890-11-jeffxu@chromium.org> X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog In-Reply-To: <20231212231706.2680890-1-jeffxu@chromium.org> References: <20231212231706.2680890-1-jeffxu@chromium.org> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Jeff Xu selftest for memory sealing change in mmap() and 
mseal(). Signed-off-by: Jeff Xu --- tools/testing/selftests/mm/.gitignore | 1 + tools/testing/selftests/mm/Makefile | 1 + tools/testing/selftests/mm/config | 1 + tools/testing/selftests/mm/mseal_test.c | 2141 +++++++++++++++++++++++ 4 files changed, 2144 insertions(+) create mode 100644 tools/testing/selftests/mm/mseal_test.c diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore index cdc9ce4426b9..f0f22a649985 100644 --- a/tools/testing/selftests/mm/.gitignore +++ b/tools/testing/selftests/mm/.gitignore @@ -43,3 +43,4 @@ mdwe_test gup_longterm mkdirty va_high_addr_switch +mseal_test diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile index 6a9fc5693145..0c086cecc093 100644 --- a/tools/testing/selftests/mm/Makefile +++ b/tools/testing/selftests/mm/Makefile @@ -59,6 +59,7 @@ TEST_GEN_FILES += mlock2-tests TEST_GEN_FILES += mrelease_test TEST_GEN_FILES += mremap_dontunmap TEST_GEN_FILES += mremap_test +TEST_GEN_FILES += mseal_test TEST_GEN_FILES += on-fault-limit TEST_GEN_FILES += thuge-gen TEST_GEN_FILES += transhuge-stress diff --git a/tools/testing/selftests/mm/config b/tools/testing/selftests/mm/config index be087c4bc396..cf2b8780e9b1 100644 --- a/tools/testing/selftests/mm/config +++ b/tools/testing/selftests/mm/config @@ -6,3 +6,4 @@ CONFIG_TEST_HMM=m CONFIG_GUP_TEST=y CONFIG_TRANSPARENT_HUGEPAGE=y CONFIG_MEM_SOFT_DIRTY=y +CONFIG_MSEAL=y diff --git a/tools/testing/selftests/mm/mseal_test.c b/tools/testing/selftests/mm/mseal_test.c new file mode 100644 index 000000000000..0692485d8b3c --- /dev/null +++ b/tools/testing/selftests/mm/mseal_test.c @@ -0,0 +1,2141 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include "../kselftest.h" +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * need those definition for manually build using gcc. 
+ * gcc -I ../../../../usr/include -DDEBUG -O3 mseal_test.c -o mseal_test + */ +#ifndef MM_SEAL_SEAL +#define MM_SEAL_SEAL 0x1 +#endif + +#ifndef MM_SEAL_BASE +#define MM_SEAL_BASE 0x2 +#endif + +#ifndef MM_SEAL_PROT_PKEY +#define MM_SEAL_PROT_PKEY 0x4 +#endif + +#ifndef MM_SEAL_DISCARD_RO_ANON +#define MM_SEAL_DISCARD_RO_ANON 0x8 +#endif + +#ifndef MAP_SEALABLE +#define MAP_SEALABLE 0x8000000 +#endif + +#ifndef PROT_SEAL_SEAL +#define PROT_SEAL_SEAL 0x04000000 +#endif + +#ifndef PROT_SEAL_BASE +#define PROT_SEAL_BASE 0x08000000 +#endif + +#ifndef PROT_SEAL_PROT_PKEY +#define PROT_SEAL_PROT_PKEY 0x10000000 +#endif + +#ifndef PROT_SEAL_DISCARD_RO_ANON +#define PROT_SEAL_DISCARD_RO_ANON 0x20000000 +#endif + +#ifndef PKEY_DISABLE_ACCESS +# define PKEY_DISABLE_ACCESS 0x1 +#endif + +#ifndef PKEY_DISABLE_WRITE +# define PKEY_DISABLE_WRITE 0x2 +#endif + +#ifndef PKEY_BITS_PER_PKEY +#define PKEY_BITS_PER_PKEY 2 +#endif + +#ifndef PKEY_MASK +#define PKEY_MASK (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE) +#endif + +#ifndef DEBUG +#define LOG_TEST_ENTER() {} +#else +#define LOG_TEST_ENTER() {ksft_print_msg("%s\n", __func__); } +#endif + +#ifndef u64 +#define u64 unsigned long long +#endif + +/* + * Define sys_xyz wrappers that invoke the syscalls directly. 
+ */ +static int sys_mseal(void *start, size_t len, int types) +{ + int sret; + + errno = 0; + sret = syscall(__NR_mseal, start, len, types, 0); + return sret; +} + +int sys_mprotect(void *ptr, size_t size, unsigned long prot) +{ + int sret; + + errno = 0; + sret = syscall(SYS_mprotect, ptr, size, prot); + return sret; +} + +int sys_mprotect_pkey(void *ptr, size_t size, unsigned long orig_prot, + unsigned long pkey) +{ + int sret; + + errno = 0; + sret = syscall(__NR_pkey_mprotect, ptr, size, orig_prot, pkey); + return sret; +} + +int sys_munmap(void *ptr, size_t size) +{ + int sret; + + errno = 0; + sret = syscall(SYS_munmap, ptr, size); + return sret; +} + +static int sys_madvise(void *start, size_t len, int types) +{ + int sret; + + errno = 0; + sret = syscall(__NR_madvise, start, len, types); + return sret; +} + +int sys_pkey_alloc(unsigned long flags, unsigned long init_val) +{ + int ret = syscall(SYS_pkey_alloc, flags, init_val); + return ret; +} + +static inline unsigned int __read_pkey_reg(void) +{ + unsigned int eax, edx; + unsigned int ecx = 0; + unsigned int pkey_reg; + + asm volatile(".byte 0x0f,0x01,0xee\n\t" + : "=a" (eax), "=d" (edx) + : "c" (ecx)); + pkey_reg = eax; + return pkey_reg; +} + +static inline void __write_pkey_reg(u64 pkey_reg) +{ + unsigned int eax = pkey_reg; + unsigned int ecx = 0; + unsigned int edx = 0; + + asm volatile(".byte 0x0f,0x01,0xef\n\t" + : : "a" (eax), "c" (ecx), "d" (edx)); + assert(pkey_reg == __read_pkey_reg()); +} + +static inline unsigned long pkey_bit_position(int pkey) +{ + return pkey * PKEY_BITS_PER_PKEY; +} + +static inline u64 set_pkey_bits(u64 reg, int pkey, u64 flags) +{ + unsigned long shift = pkey_bit_position(pkey); + /* mask out bits from pkey in old value */ + reg &= ~((u64)PKEY_MASK << shift); + /* OR in new bits for pkey */ + reg |= (flags & PKEY_MASK) << shift; + return reg; +} + +static inline void set_pkey(int pkey, unsigned long pkey_value) +{ + unsigned long mask = (PKEY_DISABLE_ACCESS | 
PKEY_DISABLE_WRITE); + u64 new_pkey_reg; + + assert(!(pkey_value & ~mask)); + new_pkey_reg = set_pkey_bits(__read_pkey_reg(), pkey, pkey_value); + __write_pkey_reg(new_pkey_reg); +} + +void setup_single_address(int size, void **ptrOut) +{ + void *ptr; + + ptr = mmap(NULL, size, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE | MAP_SEALABLE, -1, 0); + assert(ptr != (void *)-1); + *ptrOut = ptr; +} + +void setup_single_address_sealable(int size, void **ptrOut, bool sealable) +{ + void *ptr; + unsigned long mapflags = MAP_ANONYMOUS | MAP_PRIVATE; + + if (sealable) + mapflags |= MAP_SEALABLE; + + ptr = mmap(NULL, size, PROT_READ, mapflags, -1, 0); + assert(ptr != (void *)-1); + *ptrOut = ptr; +} + +void setup_single_address_rw_sealable(int size, void **ptrOut, bool sealable) +{ + void *ptr; + unsigned long mapflags = MAP_ANONYMOUS | MAP_PRIVATE; + + if (sealable) + mapflags |= MAP_SEALABLE; + + ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, mapflags, -1, 0); + assert(ptr != (void *)-1); + *ptrOut = ptr; +} + +void clean_single_address(void *ptr, int size) +{ + int ret; + + ret = munmap(ptr, size); + assert(!ret); +} + +void seal_mprotect_single_address(void *ptr, int size) +{ + int ret; + + ret = sys_mseal(ptr, size, MM_SEAL_PROT_PKEY); + assert(!ret); +} + +void seal_discard_ro_anon_single_address(void *ptr, int size) +{ + int ret; + + ret = sys_mseal(ptr, size, MM_SEAL_DISCARD_RO_ANON); + assert(!ret); +} + +static void test_seal_addseals(void) +{ + LOG_TEST_ENTER(); + int ret; + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + + setup_single_address(size, &ptr); + + /* adding seal one by one */ + + ret = sys_mseal(ptr, size, MM_SEAL_BASE); + assert(!ret); + ret = sys_mseal(ptr, size, MM_SEAL_PROT_PKEY); + assert(!ret); + ret = sys_mseal(ptr, size, MM_SEAL_SEAL); + assert(!ret); +} + +static void test_seal_addseals_combined(void) +{ + LOG_TEST_ENTER(); + int ret; + void *ptr; + unsigned long page_size = getpagesize(); + unsigned 
long size = 4 * page_size; + + setup_single_address(size, &ptr); + + ret = sys_mseal(ptr, size, MM_SEAL_PROT_PKEY); + assert(!ret); + + /* adding multiple seals */ + ret = sys_mseal(ptr, size, + MM_SEAL_PROT_PKEY | MM_SEAL_BASE| + MM_SEAL_SEAL); + assert(!ret); + + /* not adding more seal type, so ok. */ + ret = sys_mseal(ptr, size, MM_SEAL_BASE); + assert(!ret); + + /* not adding more seal type, so ok. */ + ret = sys_mseal(ptr, size, MM_SEAL_SEAL); + assert(!ret); +} + +static void test_seal_addseals_reject(void) +{ + LOG_TEST_ENTER(); + int ret; + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + + setup_single_address(size, &ptr); + + ret = sys_mseal(ptr, size, MM_SEAL_BASE | MM_SEAL_SEAL); + assert(!ret); + + /* MM_SEAL_SEAL is set, so not allow new seal type . */ + ret = sys_mseal(ptr, size, + MM_SEAL_PROT_PKEY | MM_SEAL_BASE | MM_SEAL_SEAL); + assert(ret < 0); +} + +static void test_seal_unmapped_start(void) +{ + LOG_TEST_ENTER(); + int ret; + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + + setup_single_address(size, &ptr); + + // munmap 2 pages from ptr. + ret = sys_munmap(ptr, 2 * page_size); + assert(!ret); + + // mprotect will fail because 2 pages from ptr are unmapped. + ret = sys_mprotect(ptr, size, PROT_READ | PROT_WRITE); + assert(ret < 0); + + // mseal will fail because 2 pages from ptr are unmapped. + ret = sys_mseal(ptr, size, MM_SEAL_SEAL); + assert(ret < 0); + + ret = sys_mseal(ptr + 2 * page_size, 2 * page_size, MM_SEAL_SEAL); + assert(!ret); +} + +static void test_seal_unmapped_middle(void) +{ + LOG_TEST_ENTER(); + int ret; + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + + setup_single_address(size, &ptr); + + // munmap 2 pages from ptr + page. + ret = sys_munmap(ptr + page_size, 2 * page_size); + assert(!ret); + + // mprotect will fail, since size is 4 pages. 
+ ret = sys_mprotect(ptr, size, PROT_READ | PROT_WRITE); + assert(ret < 0); + + // mseal will fail as well. + ret = sys_mseal(ptr, size, MM_SEAL_SEAL); + assert(ret < 0); + + /* we still can add seal to the first page and last page*/ + ret = sys_mseal(ptr, page_size, MM_SEAL_SEAL | MM_SEAL_PROT_PKEY); + assert(!ret); + + ret = sys_mseal(ptr + 3 * page_size, page_size, + MM_SEAL_SEAL | MM_SEAL_PROT_PKEY); + assert(!ret); +} + +static void test_seal_unmapped_end(void) +{ + LOG_TEST_ENTER(); + int ret; + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + + setup_single_address(size, &ptr); + + // unmap last 2 pages. + ret = sys_munmap(ptr + 2 * page_size, 2 * page_size); + assert(!ret); + + //mprotect will fail since last 2 pages are unmapped. + ret = sys_mprotect(ptr, size, PROT_READ | PROT_WRITE); + assert(ret < 0); + + //mseal will fail as well. + ret = sys_mseal(ptr, size, MM_SEAL_SEAL); + assert(ret < 0); + + /* The first 2 pages is not sealed, and can add seals */ + ret = sys_mseal(ptr, 2 * page_size, MM_SEAL_SEAL | MM_SEAL_PROT_PKEY); + assert(!ret); +} + +static void test_seal_multiple_vmas(void) +{ + LOG_TEST_ENTER(); + int ret; + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + + setup_single_address(size, &ptr); + + // use mprotect to split the vma into 3. + ret = sys_mprotect(ptr + page_size, 2 * page_size, + PROT_READ | PROT_WRITE); + assert(!ret); + + // mprotect will get applied to all 4 pages - 3 VMAs. + ret = sys_mprotect(ptr, size, PROT_READ); + assert(!ret); + + // use mprotect to split the vma into 3. + ret = sys_mprotect(ptr + page_size, 2 * page_size, + PROT_READ | PROT_WRITE); + assert(!ret); + + // mseal get applied to all 4 pages - 3 VMAs. + ret = sys_mseal(ptr, size, MM_SEAL_SEAL); + assert(!ret); + + // verify additional seal type will fail after MM_SEAL_SEAL set. 
+ ret = sys_mseal(ptr, page_size, MM_SEAL_SEAL | MM_SEAL_PROT_PKEY); + assert(ret < 0); + + ret = sys_mseal(ptr + page_size, 2 * page_size, + MM_SEAL_SEAL | MM_SEAL_PROT_PKEY); + assert(ret < 0); + + ret = sys_mseal(ptr + 3 * page_size, page_size, + MM_SEAL_SEAL | MM_SEAL_PROT_PKEY); + assert(ret < 0); +} + +static void test_seal_split_start(void) +{ + LOG_TEST_ENTER(); + int ret; + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + + setup_single_address(size, &ptr); + + /* use mprotect to split at middle */ + ret = sys_mprotect(ptr, 2 * page_size, PROT_READ | PROT_WRITE); + assert(!ret); + + /* seal the first page, this will split the VMA */ + ret = sys_mseal(ptr, page_size, MM_SEAL_SEAL); + assert(!ret); + + /* can't add seal to the first page */ + ret = sys_mseal(ptr, page_size, MM_SEAL_SEAL | MM_SEAL_PROT_PKEY); + assert(ret < 0); + + /* add seal to the remain 3 pages */ + ret = sys_mseal(ptr + page_size, 3 * page_size, + MM_SEAL_SEAL | MM_SEAL_PROT_PKEY); + assert(!ret); +} + +static void test_seal_split_end(void) +{ + LOG_TEST_ENTER(); + int ret; + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + + setup_single_address(size, &ptr); + + /* use mprotect to split at middle */ + ret = sys_mprotect(ptr, 2 * page_size, PROT_READ | PROT_WRITE); + assert(!ret); + + /* seal the last page */ + ret = sys_mseal(ptr + 3 * page_size, page_size, MM_SEAL_SEAL); + assert(!ret); + + /* adding seal to the last page is rejected. 
*/ + ret = sys_mseal(ptr + 3 * page_size, page_size, + MM_SEAL_SEAL | MM_SEAL_PROT_PKEY); + assert(ret < 0); + + /* Adding seals to the first 3 pages */ + ret = sys_mseal(ptr, 3 * page_size, MM_SEAL_SEAL | MM_SEAL_PROT_PKEY); + assert(!ret); +} + +static void test_seal_invalid_input(void) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(8 * page_size, &ptr); + clean_single_address(ptr + 4 * page_size, 4 * page_size); + + /* invalid flag */ + ret = sys_mseal(ptr, size, 0x20); + assert(ret < 0); + + ret = sys_mseal(ptr, size, 0x31); + assert(ret < 0); + + ret = sys_mseal(ptr, size, 0x3F); + assert(ret < 0); + + /* unaligned address */ + ret = sys_mseal(ptr + 1, 2 * page_size, MM_SEAL_SEAL); + assert(ret < 0); + + /* length too big */ + ret = sys_mseal(ptr, 5 * page_size, MM_SEAL_SEAL); + assert(ret < 0); + + /* start is not in a valid VMA */ + ret = sys_mseal(ptr - page_size, 5 * page_size, MM_SEAL_SEAL); + assert(ret < 0); +} + +static void test_seal_zero_length(void) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(size, &ptr); + + ret = sys_mprotect(ptr, 0, PROT_READ | PROT_WRITE); + assert(!ret); + + /* seal 0 length will be OK, same as mprotect */ + ret = sys_mseal(ptr, 0, MM_SEAL_PROT_PKEY); + assert(!ret); + + // verify the 4 pages are not sealed by previous call. + ret = sys_mprotect(ptr, size, PROT_READ | PROT_WRITE); + assert(!ret); +} + +static void test_seal_twice(void) +{ + LOG_TEST_ENTER(); + int ret; + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + + setup_single_address(size, &ptr); + + ret = sys_mseal(ptr, size, MM_SEAL_PROT_PKEY); + assert(!ret); + + // apply the same seal will be OK. idempotent. 
+ ret = sys_mseal(ptr, size, MM_SEAL_PROT_PKEY); + assert(!ret); + + ret = sys_mseal(ptr, size, + MM_SEAL_PROT_PKEY | MM_SEAL_BASE | + MM_SEAL_SEAL); + assert(!ret); + + ret = sys_mseal(ptr, size, + MM_SEAL_PROT_PKEY | MM_SEAL_BASE | + MM_SEAL_SEAL); + assert(!ret); + + ret = sys_mseal(ptr, size, MM_SEAL_SEAL); + assert(!ret); +} + +static void test_seal_mprotect(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(size, &ptr); + + if (seal) + seal_mprotect_single_address(ptr, size); + + ret = sys_mprotect(ptr, size, PROT_READ | PROT_WRITE); + if (seal) + assert(ret < 0); + else + assert(!ret); +} + +static void test_seal_start_mprotect(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(size, &ptr); + + if (seal) + seal_mprotect_single_address(ptr, page_size); + + // the first page is sealed. + ret = sys_mprotect(ptr, page_size, PROT_READ | PROT_WRITE); + if (seal) + assert(ret < 0); + else + assert(!ret); + + // pages after the first page is not sealed. 
+ ret = sys_mprotect(ptr + page_size, page_size * 3, + PROT_READ | PROT_WRITE); + assert(!ret); +} + +static void test_seal_end_mprotect(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(size, &ptr); + + if (seal) + seal_mprotect_single_address(ptr + page_size, 3 * page_size); + + /* first page is not sealed */ + ret = sys_mprotect(ptr, page_size, PROT_READ | PROT_WRITE); + assert(!ret); + + /* last 3 page are sealed */ + ret = sys_mprotect(ptr + page_size, page_size * 3, + PROT_READ | PROT_WRITE); + if (seal) + assert(ret < 0); + else + assert(!ret); +} + +static void test_seal_mprotect_unalign_len(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(size, &ptr); + + if (seal) + seal_mprotect_single_address(ptr, page_size * 2 - 1); + + // 2 pages are sealed. + ret = sys_mprotect(ptr, page_size * 2, PROT_READ | PROT_WRITE); + if (seal) + assert(ret < 0); + else + assert(!ret); + + ret = sys_mprotect(ptr + page_size * 2, page_size, + PROT_READ | PROT_WRITE); + assert(!ret); +} + +static void test_seal_mprotect_unalign_len_variant_2(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(size, &ptr); + if (seal) + seal_mprotect_single_address(ptr, page_size * 2 + 1); + + // 3 pages are sealed. 
+ ret = sys_mprotect(ptr, page_size * 3, PROT_READ | PROT_WRITE); + if (seal) + assert(ret < 0); + else + assert(!ret); + + ret = sys_mprotect(ptr + page_size * 3, page_size, + PROT_READ | PROT_WRITE); + assert(!ret); +} + +static void test_seal_mprotect_two_vma(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(size, &ptr); + + /* use mprotect to split */ + ret = sys_mprotect(ptr, page_size * 2, PROT_READ | PROT_WRITE); + assert(!ret); + + if (seal) + seal_mprotect_single_address(ptr, page_size * 4); + + ret = sys_mprotect(ptr, page_size * 2, PROT_READ | PROT_WRITE); + if (seal) + assert(ret < 0); + else + assert(!ret); + + ret = sys_mprotect(ptr + page_size * 2, page_size * 2, + PROT_READ | PROT_WRITE); + if (seal) + assert(ret < 0); + else + assert(!ret); +} + +static void test_seal_mprotect_two_vma_with_split(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(size, &ptr); + + // use mprotect to split into two VMAs. + ret = sys_mprotect(ptr, page_size * 2, PROT_READ | PROT_WRITE); + assert(!ret); + + // mseal can apply across 2 VMAs and also splits them. + if (seal) + seal_mprotect_single_address(ptr + page_size, page_size * 2); + + // the first page is not sealed. + ret = sys_mprotect(ptr, page_size, PROT_READ | PROT_WRITE); + assert(!ret); + + // the second page is sealed. + ret = sys_mprotect(ptr + page_size, page_size, PROT_READ | PROT_WRITE); + if (seal) + assert(ret < 0); + else + assert(!ret); + + // the third page is sealed. + ret = sys_mprotect(ptr + 2 * page_size, page_size, + PROT_READ | PROT_WRITE); + if (seal) + assert(ret < 0); + else + assert(!ret); + + // the fourth page is not sealed. 
+ ret = sys_mprotect(ptr + 3 * page_size, page_size, + PROT_READ | PROT_WRITE); + assert(!ret); +} + +static void test_seal_mprotect_partial_mprotect(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(size, &ptr); + + // seal one page. + if (seal) + seal_mprotect_single_address(ptr, page_size); + + // mprotect first 2 page will fail, since the first page are sealed. + ret = sys_mprotect(ptr, 2 * page_size, PROT_READ | PROT_WRITE); + if (seal) + assert(ret < 0); + else + assert(!ret); +} + +static void test_seal_mprotect_two_vma_with_gap(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(size, &ptr); + + // use mprotect to split. + ret = sys_mprotect(ptr, page_size, PROT_READ | PROT_WRITE); + assert(!ret); + + // use mprotect to split. + ret = sys_mprotect(ptr + 3 * page_size, page_size, + PROT_READ | PROT_WRITE); + assert(!ret); + + // use munmap to free two pages in the middle + ret = sys_munmap(ptr + page_size, 2 * page_size); + assert(!ret); + + // mprotect will fail, because there is a gap in the address. + // notes, internally mprotect still updated the first page. + ret = sys_mprotect(ptr, 4 * page_size, PROT_READ); + assert(ret < 0); + + // mseal will fail as well. + ret = sys_mseal(ptr, 4 * page_size, MM_SEAL_PROT_PKEY); + assert(ret < 0); + + // unlike mprotect, the first page is not sealed. + ret = sys_mprotect(ptr, page_size, PROT_READ); + assert(ret == 0); + + // the last page is not sealed. + ret = sys_mprotect(ptr + 3 * page_size, page_size, PROT_READ); + assert(ret == 0); +} + +static void test_seal_mprotect_split(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(size, &ptr); + + //use mprotect to split. 
+ ret = sys_mprotect(ptr, page_size, PROT_READ | PROT_WRITE); + assert(!ret); + + // seal all 4 pages. + if (seal) { + ret = sys_mseal(ptr, 4 * page_size, MM_SEAL_PROT_PKEY); + assert(!ret); + } + + // madvise is OK. + ret = sys_madvise(ptr, page_size * 2, MADV_WILLNEED); + assert(!ret); + + // mprotect is rejected when sealed. + ret = sys_mprotect(ptr, 2 * page_size, PROT_READ); + if (seal) + assert(ret < 0); + else + assert(!ret); + + + ret = sys_mprotect(ptr + 2 * page_size, 2 * page_size, PROT_READ); + if (seal) + assert(ret < 0); + else + assert(!ret); +} + +static void test_seal_mprotect_merge(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(size, &ptr); + + // use mprotect to split one page. + ret = sys_mprotect(ptr, page_size, PROT_READ | PROT_WRITE); + assert(!ret); + + // seal first two pages. + if (seal) { + ret = sys_mseal(ptr, 2 * page_size, MM_SEAL_PROT_PKEY); + assert(!ret); + } + + ret = sys_madvise(ptr, page_size, MADV_WILLNEED); + assert(!ret); + + // 2 pages are sealed. + ret = sys_mprotect(ptr, 2 * page_size, PROT_READ); + if (seal) + assert(ret < 0); + else + assert(!ret); + + // last 2 pages are not sealed. + ret = sys_mprotect(ptr + 2 * page_size, 2 * page_size, PROT_READ); + assert(ret == 0); +} + +static void test_seal_munmap(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(size, &ptr); + + if (seal) { + ret = sys_mseal(ptr, size, MM_SEAL_BASE); + assert(!ret); + } + + // 4 pages are sealed. 
+ ret = sys_munmap(ptr, size); + if (seal) + assert(ret < 0); + else + assert(!ret); +} + +/* + * allocate 4 pages, + * use mprotect to split it as two VMAs + * seal the whole range + * munmap will fail on both + */ +static void test_seal_munmap_two_vma(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(size, &ptr); + + /* use mprotect to split */ + ret = sys_mprotect(ptr, page_size * 2, PROT_READ | PROT_WRITE); + assert(!ret); + + if (seal) { + ret = sys_mseal(ptr, size, MM_SEAL_BASE); + assert(!ret); + } + + ret = sys_munmap(ptr, page_size * 2); + if (seal) + assert(ret < 0); + else + assert(!ret); + + ret = sys_munmap(ptr + page_size, page_size * 2); + if (seal) + assert(ret < 0); + else + assert(!ret); +} + +/* + * allocate a VMA with 4 pages. + * munmap the middle 2 pages. + * seal the whole 4 pages, will fail. + * note: one of the pages are sealed + * munmap the first page will be OK. + * munmap the last page will be OK. + */ +static void test_seal_munmap_vma_with_gap(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(size, &ptr); + + ret = sys_munmap(ptr + page_size, page_size * 2); + assert(!ret); + + if (seal) { + // can't have gap in the middle. + ret = sys_mseal(ptr, size, MM_SEAL_BASE); + assert(ret < 0); + } + + ret = sys_munmap(ptr, page_size); + assert(!ret); + + ret = sys_munmap(ptr + page_size * 2, page_size); + assert(!ret); + + ret = sys_munmap(ptr, size); + assert(!ret); +} + +static void test_munmap_start_freed(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(size, &ptr); + + // unmap the first page. + ret = sys_munmap(ptr, page_size); + assert(!ret); + + // seal the last 3 pages. 
+ if (seal) { + ret = sys_mseal(ptr + page_size, 3 * page_size, MM_SEAL_BASE); + assert(!ret); + } + + // unmap from the first page. + ret = sys_munmap(ptr, size); + if (seal) { + assert(ret < 0); + + // use mprotect to verify page is not unmapped. + ret = sys_mprotect(ptr + page_size, 3 * page_size, PROT_READ); + assert(!ret); + } else + // note: this will be OK, even the first page is + // already unmapped. + assert(!ret); +} + +static void test_munmap_end_freed(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(size, &ptr); + // unmap last page. + ret = sys_munmap(ptr + page_size * 3, page_size); + assert(!ret); + + // seal the first 3 pages. + if (seal) { + ret = sys_mseal(ptr, 3 * page_size, MM_SEAL_BASE); + assert(!ret); + } + + // unmap all pages. + ret = sys_munmap(ptr, size); + if (seal) { + assert(ret < 0); + + // use mprotect to verify page is not unmapped. + ret = sys_mprotect(ptr, 3 * page_size, PROT_READ); + assert(!ret); + } else + assert(!ret); +} + +static void test_munmap_middle_freed(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(size, &ptr); + // unmap 2 pages in the middle. + ret = sys_munmap(ptr + page_size, page_size * 2); + assert(!ret); + + // seal the first page. + if (seal) { + ret = sys_mseal(ptr, page_size, MM_SEAL_BASE); + assert(!ret); + } + + // munmap all 4 pages. + ret = sys_munmap(ptr, size); + if (seal) { + assert(ret < 0); + + // use mprotect to verify page is not unmapped. 
+ ret = sys_mprotect(ptr, page_size, PROT_READ); + assert(!ret); + } else + assert(!ret); +} + +void test_seal_mremap_shrink(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + void *ret2; + + setup_single_address(size, &ptr); + + if (seal) { + ret = sys_mseal(ptr, size, MM_SEAL_BASE); + assert(!ret); + } + + // shrink from 4 pages to 2 pages. + ret2 = mremap(ptr, size, 2 * page_size, 0, 0); + if (seal) { + assert(ret2 == MAP_FAILED); + assert(errno == EACCES); + } else { + assert(ret2 != MAP_FAILED); + + } +} + +void test_seal_mremap_expand(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + void *ret2; + + setup_single_address(size, &ptr); + // unmap last 2 pages. + ret = sys_munmap(ptr + 2 * page_size, 2 * page_size); + assert(!ret); + + if (seal) { + ret = sys_mseal(ptr, 2 * page_size, MM_SEAL_BASE); + assert(!ret); + } + + // expand from 2 pages to 4 pages. + ret2 = mremap(ptr, 2 * page_size, 4 * page_size, 0, 0); + if (seal) { + assert(ret2 == MAP_FAILED); + assert(errno == EACCES); + } else { + assert(ret2 == ptr); + + } +} + +void test_seal_mremap_move(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr, *newPtr; + unsigned long page_size = getpagesize(); + unsigned long size = page_size; + int ret; + void *ret2; + + setup_single_address(size, &ptr); + setup_single_address(size, &newPtr); + clean_single_address(newPtr, size); + + if (seal) { + ret = sys_mseal(ptr, size, MM_SEAL_BASE); + assert(!ret); + } + + // move from ptr to fixed address. 
+ ret2 = mremap(ptr, size, size, MREMAP_MAYMOVE | MREMAP_FIXED, newPtr); + if (seal) { + assert(ret2 == MAP_FAILED); + assert(errno == EACCES); + } else { + assert(ret2 != MAP_FAILED); + + } +} + +void test_seal_mmap_overwrite_prot(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = page_size; + int ret; + void *ret2; + + setup_single_address(size, &ptr); + + if (seal) { + ret = sys_mseal(ptr, size, MM_SEAL_BASE); + assert(!ret); + } + + // use mmap to change protection. + ret2 = mmap(ptr, size, PROT_NONE, + MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0); + if (seal) { + assert(ret2 == MAP_FAILED); + assert(errno == EACCES); + } else + assert(ret2 == ptr); +} + +void test_seal_mmap_expand(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 12 * page_size; + int ret; + void *ret2; + + setup_single_address(size, &ptr); + // unmap last 4 pages. + ret = sys_munmap(ptr + 8 * page_size, 4 * page_size); + assert(!ret); + + if (seal) { + ret = sys_mseal(ptr, 8 * page_size, MM_SEAL_BASE); + assert(!ret); + } + + // use mmap to expand. + ret2 = mmap(ptr, size, PROT_READ, + MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0); + if (seal) { + assert(ret2 == MAP_FAILED); + assert(errno == EACCES); + } else + assert(ret2 == ptr); +} + +void test_seal_mmap_shrink(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 12 * page_size; + int ret; + void *ret2; + + setup_single_address(size, &ptr); + + if (seal) { + ret = sys_mseal(ptr, size, MM_SEAL_BASE); + assert(!ret); + } + + // use mmap to shrink. 
+ ret2 = mmap(ptr, 8 * page_size, PROT_READ, + MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0); + if (seal) { + assert(ret2 == MAP_FAILED); + assert(errno == EACCES); + } else + assert(ret2 == ptr); +} + +void test_seal_mremap_shrink_fixed(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + void *newAddr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + void *ret2; + + setup_single_address(size, &ptr); + setup_single_address(size, &newAddr); + + if (seal) { + ret = sys_mseal(ptr, size, MM_SEAL_BASE); + assert(!ret); + } + + // mremap to move and shrink to fixed address + ret2 = mremap(ptr, size, 2 * page_size, MREMAP_MAYMOVE | MREMAP_FIXED, + newAddr); + if (seal) { + assert(ret2 == MAP_FAILED); + assert(errno == EACCES); + } else + assert(ret2 == newAddr); +} + +void test_seal_mremap_expand_fixed(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + void *newAddr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + void *ret2; + + setup_single_address(page_size, &ptr); + setup_single_address(size, &newAddr); + + if (seal) { + ret = sys_mseal(newAddr, size, MM_SEAL_BASE); + assert(!ret); + } + + // mremap to move and expand to fixed address + ret2 = mremap(ptr, page_size, size, MREMAP_MAYMOVE | MREMAP_FIXED, + newAddr); + if (seal) { + assert(ret2 == MAP_FAILED); + assert(errno == EACCES); + } else + assert(ret2 == newAddr); +} + +void test_seal_mremap_move_fixed(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + void *newAddr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + void *ret2; + + setup_single_address(size, &ptr); + setup_single_address(size, &newAddr); + + if (seal) { + ret = sys_mseal(newAddr, size, MM_SEAL_BASE); + assert(!ret); + } + + // mremap to move to fixed address + ret2 = mremap(ptr, size, size, MREMAP_MAYMOVE | MREMAP_FIXED, newAddr); + if (seal) { + assert(ret2 == MAP_FAILED); + assert(errno == EACCES); + } else + 
assert(ret2 == newAddr); +} + +void test_seal_mremap_move_fixed_zero(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + void *newAddr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + void *ret2; + + setup_single_address(size, &ptr); + + if (seal) { + ret = sys_mseal(ptr, size, MM_SEAL_BASE); + assert(!ret); + } + + /* + * MREMAP_FIXED can move the mapping to zero address + */ + ret2 = mremap(ptr, size, 2 * page_size, MREMAP_MAYMOVE | MREMAP_FIXED, + 0); + if (seal) { + assert(ret2 == MAP_FAILED); + assert(errno == EACCES); + } else { + assert(ret2 == 0); + + } +} + +void test_seal_mremap_move_dontunmap(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + void *newAddr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + void *ret2; + + setup_single_address(size, &ptr); + + if (seal) { + ret = sys_mseal(ptr, size, MM_SEAL_BASE); + assert(!ret); + } + + // mremap to move, and don't unmap src addr. + ret2 = mremap(ptr, size, size, MREMAP_MAYMOVE | MREMAP_DONTUNMAP, 0); + if (seal) { + assert(ret2 == MAP_FAILED); + assert(errno == EACCES); + } else { + assert(ret2 != MAP_FAILED); + + } +} + +void test_seal_mremap_move_dontunmap_anyaddr(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + void *newAddr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + void *ret2; + + setup_single_address(size, &ptr); + + if (seal) { + ret = sys_mseal(ptr, size, MM_SEAL_BASE); + assert(!ret); + } + + /* + * The 0xdeaddead should not have effect on dest addr + * when MREMAP_DONTUNMAP is set. 
+ */ + ret2 = mremap(ptr, size, size, MREMAP_MAYMOVE | MREMAP_DONTUNMAP, + 0xdeaddead); + if (seal) { + assert(ret2 == MAP_FAILED); + assert(errno == EACCES); + } else { + assert(ret2 != MAP_FAILED); + assert((long)ret2 != 0xdeaddead); + + } +} + +unsigned long get_vma_size(void *addr) +{ + FILE *maps; + char line[256]; + int size = 0; + uintptr_t addr_start, addr_end; + + maps = fopen("/proc/self/maps", "r"); + if (!maps) + return 0; + + while (fgets(line, sizeof(line), maps)) { + if (sscanf(line, "%lx-%lx", &addr_start, &addr_end) == 2) { + if (addr_start == (uintptr_t) addr) { + size = addr_end - addr_start; + break; + } + } + } + fclose(maps); + return size; +} + +void test_seal_mmap_seal_base(void) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + void *ret2; + + ptr = mmap(NULL, size, PROT_READ | PROT_SEAL_BASE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); + assert(ptr != (void *)-1); + + ret = sys_munmap(ptr, size); + assert(ret < 0); + + ret = sys_mprotect(ptr, size, PROT_READ | PROT_WRITE); + assert(!ret); + + ret = sys_mseal(ptr, size, MM_SEAL_PROT_PKEY); + assert(!ret); + + ret = sys_mprotect(ptr, size, PROT_READ); + assert(ret < 0); +} + +void test_seal_mmap_seal_mprotect(void) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + void *ret2; + + ptr = mmap(NULL, size, PROT_READ | PROT_SEAL_PROT_PKEY, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); + assert(ptr != (void *)-1); + + ret = sys_munmap(ptr, size); + assert(ret < 0); + + ret = sys_mprotect(ptr, size, PROT_READ | PROT_WRITE); + assert(ret < 0); +} + +void test_seal_mmap_seal_mseal(void) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + void *ret2; + + ptr = mmap(NULL, size, PROT_READ | PROT_SEAL_SEAL, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); + assert(ptr != (void *)-1); + + ret = 
sys_mseal(ptr, size, MM_SEAL_BASE); + assert(ret < 0); + + ret = sys_mprotect(ptr, size, PROT_READ | PROT_WRITE); + assert(!ret); + + ret = sys_munmap(ptr, size); + assert(!ret); +} + +void test_seal_merge_and_split(void) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size; + int ret; + void *ret2; + + // (24 RO) + setup_single_address(24 * page_size, &ptr); + + // use mprotect(NONE) to set out boundary + // (1 NONE) (22 RO) (1 NONE) + ret = sys_mprotect(ptr, page_size, PROT_NONE); + assert(!ret); + ret = sys_mprotect(ptr + 23 * page_size, page_size, PROT_NONE); + assert(!ret); + size = get_vma_size(ptr + page_size); + assert(size == 22 * page_size); + + // use mseal to split from beginning + // (1 NONE) (1 RO_SBASE) (21 RO) (1 NONE) + ret = sys_mseal(ptr + page_size, page_size, MM_SEAL_BASE); + assert(!ret); + size = get_vma_size(ptr + page_size); + assert(size == page_size); + size = get_vma_size(ptr + 2 * page_size); + assert(size == 21 * page_size); + + // use mseal to split from the end. + // (1 NONE) (1 RO_SBASE) (20 RO) (1 RO_SBASE) (1 NONE) + ret = sys_mseal(ptr + 22 * page_size, page_size, MM_SEAL_BASE); + assert(!ret); + size = get_vma_size(ptr + 22 * page_size); + assert(size == page_size); + size = get_vma_size(ptr + 2 * page_size); + assert(size == 20 * page_size); + + // merge with prev. + // (1 NONE) (2 RO_SBASE) (19 RO) (1 RO_SBASE) (1 NONE) + ret = sys_mseal(ptr + 2 * page_size, page_size, MM_SEAL_BASE); + assert(!ret); + size = get_vma_size(ptr + page_size); + assert(size == 2 * page_size); + + // merge with after. 
+ // (1 NONE) (2 RO_SBASE) (18 RO) (2 RO_SBASES) (1 NONE) + ret = sys_mseal(ptr + 21 * page_size, page_size, MM_SEAL_BASE); + assert(!ret); + size = get_vma_size(ptr + 21 * page_size); + assert(size == 2 * page_size); + + // split from prev + // (1 NONE) (1 RO_SBASE) (2RO_SPROT) (17 RO) (2 RO_SBASES) (1 NONE) + ret = sys_mseal(ptr + 2 * page_size, 2 * page_size, MM_SEAL_PROT_PKEY); + assert(!ret); + size = get_vma_size(ptr + 2 * page_size); + assert(size == 2 * page_size); + ret = sys_munmap(ptr + page_size, page_size); + assert(ret < 0); + ret = sys_mprotect(ptr + 2 * page_size, page_size, PROT_NONE); + assert(ret < 0); + + // split from next + // (1 NONE) (1 RO_SBASE) (2 RO_SPROT) (16 RO) (2 RO_SPROT) (1 RO_SBASES) (1 NONE) + ret = sys_mseal(ptr + 20 * page_size, 2 * page_size, MM_SEAL_PROT_PKEY); + assert(!ret); + size = get_vma_size(ptr + 20 * page_size); + assert(size == 2 * page_size); + + // merge from middle of prev and middle of next. + // (1 NONE) (1 RW_SBASE) (20 RO_SPROT) (1 RW_SBASES) (1 NONE) + ret = sys_mseal(ptr + 3 * page_size, 18 * page_size, MM_SEAL_PROT_PKEY); + assert(!ret); + size = get_vma_size(ptr + 2 * page_size); + assert(size == 20 * page_size); + + size = get_vma_size(ptr + 22 * page_size); + assert(size == page_size); + + size = get_vma_size(ptr + 23 * page_size); + assert(size == page_size); + + // Add split using SEAL_ALL + // (1 NONE) (1 RW_SBASE) (1 RO_SALL) (18 RO_SPROT) (1 RO_SALL) (1 RW_SBASES) (1 NONE) + ret = sys_mseal(ptr + 2 * page_size, page_size, + MM_SEAL_PROT_PKEY | MM_SEAL_DISCARD_RO_ANON); + assert(!ret); + size = get_vma_size(ptr + 2 * page_size); + assert(size == 1 * page_size); + + ret = sys_mseal(ptr + 21 * page_size, page_size, + MM_SEAL_PROT_PKEY | MM_SEAL_DISCARD_RO_ANON); + assert(!ret); + size = get_vma_size(ptr + 21 * page_size); + assert(size == 1 * page_size); + + // add a new seal type, and merge with next + // (1 NONE) (2 RO_SALL) (18 RO_SPROT) (2 RO_SALL) (1 NONE) + ret = sys_mprotect(ptr + page_size, 
page_size, PROT_READ); + assert(!ret); + ret = sys_mseal(ptr + page_size, page_size, MM_SEAL_PROT_PKEY); + assert(!ret); + ret = sys_mseal(ptr + page_size, page_size, MM_SEAL_DISCARD_RO_ANON); + assert(!ret); + size = get_vma_size(ptr + page_size); + assert(size == 2 * page_size); + + ret = sys_mprotect(ptr + 22 * page_size, page_size, PROT_READ); + assert(!ret); + ret = sys_mseal(ptr + 22 * page_size, page_size, MM_SEAL_PROT_PKEY); + assert(!ret); + ret = sys_mseal(ptr + 22 * page_size, page_size, MM_SEAL_DISCARD_RO_ANON); + assert(!ret); + size = get_vma_size(ptr + page_size); + assert(size == 2 * page_size); +} + +void test_seal_mmap_merge(void) +{ + LOG_TEST_ENTER(); + + void *ptr, *ptr2; + unsigned long page_size = getpagesize(); + unsigned long size; + int ret; + void *ret2; + + // (24 RO) + setup_single_address(24 * page_size, &ptr); + + // use mprotect(NONE) to set out boundary + // (1 NONE) (22 RO) (1 NONE) + ret = sys_mprotect(ptr, page_size, PROT_NONE); + assert(!ret); + ret = sys_mprotect(ptr + 23 * page_size, page_size, PROT_NONE); + assert(!ret); + size = get_vma_size(ptr + page_size); + assert(size == 22 * page_size); + + // use munmap to free 2 segment of memory. + // (1 NONE) (1 free) (20 RO) (1 free) (1 NONE) + ret = sys_munmap(ptr + page_size, page_size); + assert(!ret); + + ret = sys_munmap(ptr + 22 * page_size, page_size); + assert(!ret); + + // apply seal to the middle + // (1 NONE) (1 free) (20 RO_SBASE) (1 free) (1 NONE) + ret = sys_mseal(ptr + 2 * page_size, 20 * page_size, MM_SEAL_BASE); + assert(!ret); + size = get_vma_size(ptr + 2 * page_size); + assert(size == 20 * page_size); + + // allocate a mapping at beginning, and make sure it merges. 
+ // (1 NONE) (21 RO_SBASE) (1 free) (1 NONE) + ptr2 = mmap(ptr + page_size, page_size, PROT_READ | PROT_SEAL_BASE, + MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + assert(ptr2 != (void *)-1); + size = get_vma_size(ptr + page_size); + assert(size == 21 * page_size); + + // allocate a mapping at end, and make sure it merges. + // (1 NONE) (22 RO_SBASE) (1 NONE) + ptr2 = mmap(ptr + 22 * page_size, page_size, PROT_READ | PROT_SEAL_BASE, + MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + assert(ptr2 != (void *)-1); + size = get_vma_size(ptr + page_size); + assert(size == 22 * page_size); +} + +static void test_not_sealable(void) +{ + int ret; + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + + ptr = mmap(NULL, size, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); + assert(ptr != (void *)-1); + + ret = sys_mseal(ptr, size, MM_SEAL_SEAL); + assert(ret < 0); +} + +static void test_merge_sealable(void) +{ + int ret; + void *ptr, *ptr2; + unsigned long page_size = getpagesize(); + unsigned long size; + + // (24 RO) + setup_single_address(24 * page_size, &ptr); + + // use mprotect(NONE) to set out boundary + // (1 NONE) (22 RO) (1 NONE) + ret = sys_mprotect(ptr, page_size, PROT_NONE); + assert(!ret); + ret = sys_mprotect(ptr + 23 * page_size, page_size, PROT_NONE); + assert(!ret); + size = get_vma_size(ptr + page_size); + assert(size == 22 * page_size); + + // (1 NONE) (RO) (4 free) (17 RO) (1 NONE) + ret = sys_munmap(ptr + 2 * page_size, 4 * page_size); + assert(!ret); + size = get_vma_size(ptr + page_size); + assert(size == 1 * page_size); + size = get_vma_size(ptr + 6 * page_size); + assert(size == 17 * page_size); + + // (1 NONE) (RO) (1 free) (2 RO) (1 free) (17 RO) (1 NONE) + ptr2 = mmap(ptr + 3 * page_size, 2 * page_size, PROT_READ, + MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE | MAP_SEALABLE, -1, 0); + size = get_vma_size(ptr + 3 * page_size); + assert(size == 2 * page_size); + + // (1 NONE) (RO) (1 free) (20 RO) (1 NONE)
+ ptr2 = mmap(ptr + 5 * page_size, 1 * page_size, PROT_READ, + MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE | MAP_SEALABLE, -1, 0); + assert(ptr2 != (void *)-1); + size = get_vma_size(ptr + 3 * page_size); + assert(size == 20 * page_size); + + // (1 NONE) (RO) (1 free) (19 RO) (1 RO_SB) (1 NONE) + ret = sys_mseal(ptr + 22 * page_size, page_size, MM_SEAL_BASE); + assert(!ret); + + // (1 NONE) (RO) (not sealable) (19 RO) (1 RO_SB) (1 NONE) + ptr2 = mmap(ptr + 2 * page_size, page_size, PROT_READ, + MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); + assert(ptr2 != (void *)-1); + size = get_vma_size(ptr + page_size); + assert(size == page_size); + size = get_vma_size(ptr + 2 * page_size); + assert(size == page_size); +} + +static void test_seal_discard_ro_anon_on_rw(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address_rw_sealable(size, &ptr, seal); + assert(ptr != (void *)-1); + + if (seal) { + ret = sys_mseal(ptr, size, MM_SEAL_DISCARD_RO_ANON); + assert(!ret); + } + + // sealing doesn't take effect on RW memory. + ret = sys_madvise(ptr, size, MADV_DONTNEED); + assert(!ret); + + // base seal still apply. + ret = sys_munmap(ptr, size); + if (seal) + assert(ret < 0); + else + assert(!ret); +} + +static void test_seal_discard_ro_anon_on_pkey(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + int pkey; + + setup_single_address_rw_sealable(size, &ptr, seal); + assert(ptr != (void *)-1); + + pkey = sys_pkey_alloc(0, 0); + assert(pkey > 0); + + ret = sys_mprotect_pkey((void *)ptr, size, PROT_READ | PROT_WRITE, pkey); + assert(!ret); + + if (seal) { + ret = sys_mseal(ptr, size, MM_SEAL_DISCARD_RO_ANON); + assert(!ret); + } + + // sealing doesn't take effect if PKRU allow write. 
+ set_pkey(pkey, 0); + ret = sys_madvise(ptr, size, MADV_DONTNEED); + assert(!ret); + + // sealing will take effect if PKRU deny write. + set_pkey(pkey, PKEY_DISABLE_WRITE); + ret = sys_madvise(ptr, size, MADV_DONTNEED); + if (seal) + assert(ret < 0); + else + assert(!ret); + + // base seal still apply. + ret = sys_munmap(ptr, size); + if (seal) + assert(ret < 0); + else + assert(!ret); +} + +static void test_seal_discard_ro_anon_on_filebacked(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + int fd; + unsigned long mapflags = MAP_PRIVATE; + + if (seal) + mapflags |= MAP_SEALABLE; + + fd = memfd_create("test", 0); + assert(fd > 0); + + ret = fallocate(fd, 0, 0, size); + assert(!ret); + + ptr = mmap(NULL, size, PROT_READ, mapflags, fd, 0); + assert(ptr != MAP_FAILED); + + if (seal) { + ret = sys_mseal(ptr, size, MM_SEAL_DISCARD_RO_ANON); + assert(!ret); + } + + // sealing doesn't apply for file backed mapping. + ret = sys_madvise(ptr, size, MADV_DONTNEED); + assert(!ret); + + ret = sys_munmap(ptr, size); + if (seal) + assert(ret < 0); + else + assert(!ret); + close(fd); +} + +static void test_seal_discard_ro_anon_on_shared(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + unsigned long mapflags = MAP_ANONYMOUS | MAP_SHARED; + + if (seal) + mapflags |= MAP_SEALABLE; + + ptr = mmap(NULL, size, PROT_READ, mapflags, -1, 0); + assert(ptr != (void *)-1); + + if (seal) { + ret = sys_mseal(ptr, size, MM_SEAL_DISCARD_RO_ANON); + assert(!ret); + } + + // sealing doesn't apply for shared mapping. 
+ ret = sys_madvise(ptr, size, MADV_DONTNEED); + assert(!ret); + + ret = sys_munmap(ptr, size); + if (seal) + assert(ret < 0); + else + assert(!ret); +} + +static void test_seal_discard_ro_anon_invalid_shared(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + int fd; + + fd = open("/proc/self/maps", O_RDONLY); + ptr = mmap(NULL, size, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, fd, 0); + assert(ptr != (void *)-1); + + if (seal) { + ret = sys_mseal(ptr, size, MM_SEAL_DISCARD_RO_ANON); + assert(!ret); + } + + ret = sys_madvise(ptr, size, MADV_DONTNEED); + assert(!ret); + + ret = sys_munmap(ptr, size); + assert(ret < 0); + close(fd); +} + +static void test_seal_discard_ro_anon(bool seal) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + setup_single_address(size, &ptr); + + if (seal) + seal_discard_ro_anon_single_address(ptr, size); + + ret = sys_madvise(ptr, size, MADV_DONTNEED); + if (seal) + assert(ret < 0); + else + assert(!ret); + + ret = sys_munmap(ptr, size); + if (seal) + assert(ret < 0); + else + assert(!ret); +} + +static void test_mmap_seal_discard_ro_anon(void) +{ + LOG_TEST_ENTER(); + void *ptr; + unsigned long page_size = getpagesize(); + unsigned long size = 4 * page_size; + int ret; + + ptr = mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_SEAL_DISCARD_RO_ANON, + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); + assert(ptr != (void *)-1); + + ret = sys_mprotect(ptr, size, PROT_READ); + assert(!ret); + + ret = sys_madvise(ptr, size, MADV_DONTNEED); + assert(ret < 0); + + ret = sys_munmap(ptr, size); + assert(ret < 0); +} + +bool seal_support(void) +{ + void *ptr; + unsigned long page_size = getpagesize(); + + ptr = mmap(NULL, page_size, PROT_READ | PROT_SEAL_BASE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); + if (ptr == (void *) -1) + return false; + return true; +} + +bool pkey_supported(void) +{ + int 
pkey = sys_pkey_alloc(0, 0); + + if (pkey > 0) + return true; + return false; +} + +int main(int argc, char **argv) +{ + bool test_seal = seal_support(); + + if (!test_seal) { + ksft_print_msg("%s CONFIG_MSEAL might be disabled, skip test\n", __func__); + return 0; + } + + test_seal_invalid_input(); + test_seal_addseals(); + test_seal_addseals_combined(); + test_seal_addseals_reject(); + test_seal_unmapped_start(); + test_seal_unmapped_middle(); + test_seal_unmapped_end(); + test_seal_multiple_vmas(); + test_seal_split_start(); + test_seal_split_end(); + + test_seal_zero_length(); + test_seal_twice(); + + test_seal_mprotect(false); + test_seal_mprotect(true); + + test_seal_start_mprotect(false); + test_seal_start_mprotect(true); + + test_seal_end_mprotect(false); + test_seal_end_mprotect(true); + + test_seal_mprotect_unalign_len(false); + test_seal_mprotect_unalign_len(true); + + test_seal_mprotect_unalign_len_variant_2(false); + test_seal_mprotect_unalign_len_variant_2(true); + + test_seal_mprotect_two_vma(false); + test_seal_mprotect_two_vma(true); + + test_seal_mprotect_two_vma_with_split(false); + test_seal_mprotect_two_vma_with_split(true); + + test_seal_mprotect_partial_mprotect(false); + test_seal_mprotect_partial_mprotect(true); + + test_seal_mprotect_two_vma_with_gap(false); + test_seal_mprotect_two_vma_with_gap(true); + + test_seal_mprotect_merge(false); + test_seal_mprotect_merge(true); + + test_seal_mprotect_split(false); + test_seal_mprotect_split(true); + + test_seal_munmap(false); + test_seal_munmap(true); + test_seal_munmap_two_vma(false); + test_seal_munmap_two_vma(true); + test_seal_munmap_vma_with_gap(false); + test_seal_munmap_vma_with_gap(true); + + test_munmap_start_freed(false); + test_munmap_start_freed(true); + test_munmap_middle_freed(false); + test_munmap_middle_freed(true); + test_munmap_end_freed(false); + test_munmap_end_freed(true); + + test_seal_mremap_shrink(false); + test_seal_mremap_shrink(true); + test_seal_mremap_expand(false); 
+ test_seal_mremap_expand(true); + test_seal_mremap_move(false); + test_seal_mremap_move(true); + + test_seal_mremap_shrink_fixed(false); + test_seal_mremap_shrink_fixed(true); + test_seal_mremap_expand_fixed(false); + test_seal_mremap_expand_fixed(true); + test_seal_mremap_move_fixed(false); + test_seal_mremap_move_fixed(true); + test_seal_mremap_move_dontunmap(false); + test_seal_mremap_move_dontunmap(true); + test_seal_mremap_move_fixed_zero(false); + test_seal_mremap_move_fixed_zero(true); + test_seal_mremap_move_dontunmap_anyaddr(false); + test_seal_mremap_move_dontunmap_anyaddr(true); + test_seal_discard_ro_anon(false); + test_seal_discard_ro_anon(true); + test_seal_discard_ro_anon_on_rw(false); + test_seal_discard_ro_anon_on_rw(true); + test_seal_discard_ro_anon_on_shared(false); + test_seal_discard_ro_anon_on_shared(true); + test_seal_discard_ro_anon_on_filebacked(false); + test_seal_discard_ro_anon_on_filebacked(true); + test_seal_mmap_overwrite_prot(false); + test_seal_mmap_overwrite_prot(true); + test_seal_mmap_expand(false); + test_seal_mmap_expand(true); + test_seal_mmap_shrink(false); + test_seal_mmap_shrink(true); + + test_seal_mmap_seal_base(); + test_seal_mmap_seal_mprotect(); + test_seal_mmap_seal_mseal(); + test_mmap_seal_discard_ro_anon(); + test_seal_merge_and_split(); + test_seal_mmap_merge(); + + test_not_sealable(); + test_merge_sealable(); + + if (pkey_supported()) { + test_seal_discard_ro_anon_on_pkey(false); + test_seal_discard_ro_anon_on_pkey(true); + } + + ksft_print_msg("Done\n"); + return 0; +} From patchwork Tue Dec 12 23:17:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Jeff Xu X-Patchwork-Id: 754129 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b="Q6rto8wf" Received: from mail-ot1-x332.google.com (mail-ot1-x332.google.com [IPv6:2607:f8b0:4864:20::332]) by 
lindbergh.monkeyblade.net (Postfix) with ESMTPS id 101D9113 for ; Tue, 12 Dec 2023 15:17:23 -0800 (PST) Received: by mail-ot1-x332.google.com with SMTP id 46e09a7af769-6d9d4193d94so4738715a34.3 for ; Tue, 12 Dec 2023 15:17:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1702423042; x=1703027842; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=uU36FjOIbdJKQPRiJg4nrZQU0fb/SKB6wgyMQ4t+i44=; b=Q6rto8wfztnbMECUp7AIeqx0uu86PE9aG6ZHMHfsjUoqaLNMUDRhZk/SE9OjXCUtFl tyNx8suTCa8z103x5Vnhmo1qX9D6NaCaZE+t4uFYEu19z4tSFEZdam0z9d4iVNwCSfCg LwwbGS3JrdK+T+sSR4ad7VVWE1Nq2hRlTCaGo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702423042; x=1703027842; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=uU36FjOIbdJKQPRiJg4nrZQU0fb/SKB6wgyMQ4t+i44=; b=hroGy8FyPHAZhtaFIHdv5+niJ8sSDq/kO/Xw7p2V0gE/ah227Gzk9WzQIPVvubSbeq 3fSAGZQghH866wc6RgA1YWk+qJ0F36Lx3ZV6B6R+yPN/kJxnMEbjrMBddGw08UoVg5QA hqQnCw3aqv7mf8jUFYws/7nmJdo4wo7w9d7sxYQloHq0Rsq8F+p5OVCUplGAIeJw5Dnk 3AsN2Ip8atpxJk+8UQDyow6bHQr8MQJcRBjsfADvoSDwwh3mOQvdzxTXiAgeE6im+j5a KVvvnf1o5s3nIUq3FL0zjl/BMLVhSBIOxDgGhWBCyUGLSOKQyoa1/uSyIdVKEw/4xykl tGBg== X-Gm-Message-State: AOJu0YyMVtls9v1nlE5Pq0+pNL6wzl1OevcvhgvO9/4aL8zdytTt//4q nL4UgyPT+BIa4hU1B/iJFvaF6g== X-Google-Smtp-Source: AGHT+IESnkqDjGhkYEemYfFxmrgB/rWYhFKb6e57jqVPg8rTGimOrJk6IlvAbIgCglInHK6pIykjCw== X-Received: by 2002:a05:6870:ec8b:b0:1fb:75c:400e with SMTP id eo11-20020a056870ec8b00b001fb075c400emr8193744oab.110.1702423042075; Tue, 12 Dec 2023 15:17:22 -0800 (PST) Received: from localhost (34.133.83.34.bc.googleusercontent.com. 
[34.83.133.34]) by smtp.gmail.com with UTF8SMTPSA id v29-20020a63481d000000b005c19c586cb7sm8685104pga.33.2023.12.12.15.17.21 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 12 Dec 2023 15:17:21 -0800 (PST) From: jeffxu@chromium.org To: akpm@linux-foundation.org, keescook@chromium.org, jannh@google.com, sroettger@google.com, willy@infradead.org, gregkh@linuxfoundation.org, torvalds@linux-foundation.org Cc: jeffxu@google.com, jorgelo@chromium.org, groeck@chromium.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, pedro.falcato@gmail.com, dave.hansen@intel.com, linux-hardening@vger.kernel.org, deraadt@openbsd.org, Jeff Xu Subject: [RFC PATCH v3 11/11] mseal:add documentation Date: Tue, 12 Dec 2023 23:17:05 +0000 Message-ID: <20231212231706.2680890-12-jeffxu@chromium.org> X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog In-Reply-To: <20231212231706.2680890-1-jeffxu@chromium.org> References: <20231212231706.2680890-1-jeffxu@chromium.org> Precedence: bulk X-Mailing-List: linux-kselftest@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Jeff Xu Add documentation for mseal(). Signed-off-by: Jeff Xu --- Documentation/userspace-api/mseal.rst | 189 ++++++++++++++++++++++++++ 1 file changed, 189 insertions(+) create mode 100644 Documentation/userspace-api/mseal.rst diff --git a/Documentation/userspace-api/mseal.rst b/Documentation/userspace-api/mseal.rst new file mode 100644 index 000000000000..651c618d0664 --- /dev/null +++ b/Documentation/userspace-api/mseal.rst @@ -0,0 +1,189 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===================== +Introduction of mseal +===================== + +:Author: Jeff Xu + +Modern CPUs support memory permissions such as RW and NX bits. The memory +permission feature improves security stance on memory corruption bugs, i.e. 
+the attacker can’t just write to arbitrary memory and point the code to it; +the memory has to be marked with the X bit, or else an exception will happen. + +Memory sealing additionally protects the mapping itself against +modifications. This is useful to mitigate memory corruption issues where a +corrupted pointer is passed to a memory management system. For example, +such an attacker primitive can break control-flow integrity guarantees +since read-only memory that is supposed to be trusted can become writable +or .text pages can get remapped. Memory sealing can automatically be +applied by the runtime loader to seal .text and .rodata pages and +applications can additionally seal security critical data at runtime. + +A similar feature already exists in the XNU kernel with the +VM_FLAGS_PERMANENT flag [1] and on OpenBSD with the mimmutable syscall [2]. + +User API +======== +Two system calls are involved in virtual memory sealing, ``mseal()`` and ``mmap()``. + +``mseal()`` +----------- + +``mseal()`` is an architecture-independent syscall with the following +signature: + +``int mseal(void *addr, size_t len, unsigned long types, unsigned long flags)`` + +**addr/len**: virtual memory address range. + +The address range set by ``addr``/``len`` must meet the following conditions: + - start (addr) must be in a valid VMA. + - end (addr + len) must be in a valid VMA. + - no gap (unallocated memory) between start and end. + - start (addr) must be page aligned. + +The ``len`` will be page aligned implicitly by the kernel. + +**types**: bit mask to specify the sealing types, they are: + +- The ``MM_SEAL_BASE``: Prevent VMA from: + + Unmapping, moving to another location, or shrinking the size via + munmap() and mremap(). Each of these can leave an empty space that + can then be filled by a VMA with a new set of attributes. + + Moving or expanding a different VMA into the current location, + via mremap(). + + Modifying sealed VMA via mmap(MAP_FIXED).
+ + Size expansion, via mremap(), does not appear to pose any + specific risks to sealed VMAs. It is sealed anyway because + the use case for allowing it is unclear. In any case, users can rely on + merging to expand a sealed VMA. + + We consider MM_SEAL_BASE the base feature, on which other sealing + features depend. For instance, it probably does not make sense + to seal PROT_PKEY without sealing the BASE, and the kernel will + implicitly add SEAL_BASE for SEAL_PROT_PKEY. (If the application + wants to relax this in future, we could use the “flags” field in + mseal() to override this behavior.) + +- The ``MM_SEAL_PROT_PKEY``: + + Seal PROT and PKEY of the address range; in other words, + mprotect() and pkey_mprotect() will be denied if the memory is + sealed with MM_SEAL_PROT_PKEY. + +- The ``MM_SEAL_DISCARD_RO_ANON``: + + Certain types of madvise() operations are destructive [3], such + as MADV_DONTNEED, which can effectively alter region contents by + discarding pages, especially when memory is anonymous. This seal blocks + such operations for anonymous memory which is not writable to the + user. + +- The ``MM_SEAL_SEAL``: + Denies adding a new seal. + +**flags**: reserved for future use. + +**return values**: + +- ``0``: + - Success. + +- ``-EINVAL``: + - Invalid seal type. + - Invalid input flags. + - Start address is not page aligned. + - Address range (``addr`` + ``len``) overflow. + +- ``-ENOMEM``: + - ``addr`` is not a valid address (not allocated). + - End address (``addr`` + ``len``) is not a valid address. + - A gap (unallocated memory) between start and end. + +- ``-EACCES``: + - ``MM_SEAL_SEAL`` is set, adding a new seal is not allowed. + - Address range is not sealable, e.g. ``MAP_SEALABLE`` is not + set during ``mmap()``. + +**Note**: + +- Users can call mseal(2) multiple times to add new seal types. +- Adding an already added seal type is a no-op (no error). +- unseal() or removing a seal type is not supported.
+- On an error return, the memory range is left unchanged. + +``mmap()`` +---------- +``void *mmap(void* addr, size_t length, int prot, int flags, int fd, +off_t offset);`` + +We made two changes (``prot`` and ``flags``) to ``mmap()`` related to +memory sealing. + +**prot**: + +- ``PROT_SEAL_SEAL`` +- ``PROT_SEAL_BASE`` +- ``PROT_SEAL_PROT_PKEY`` +- ``PROT_SEAL_DISCARD_RO_ANON`` + +These bits allow ``mmap()`` to set the sealing type when creating a mapping. This is +useful for optimization because it avoids having to make two system +calls: one for ``mmap()`` and one for ``mseal()``. + +It's worth noting that even though the sealing type is set via the +``prot`` field in ``mmap()``, we don't require it to be set in the ``prot`` +field of a later ``mprotect()`` call. This is unlike the ``PROT_READ``, +``PROT_WRITE``, ``PROT_EXEC`` bits, e.g. if ``PROT_WRITE`` is not set in +``mprotect()``, it means that the region is not writable. + +**flags**: +The ``MAP_SEALABLE`` flag is added to the ``flags`` field of ``mmap()``. +When present, it marks the map as sealable. A map created +without ``MAP_SEALABLE`` will not support sealing; in other words, +``mseal()`` will fail for such a map. + +Applications that don't care about sealing can expect their +behavior to be unchanged. Those that need sealing support opt in +by adding ``MAP_SEALABLE`` when creating the map. + +Use Case: +========= +- glibc: + The dynamic linker, while loading ELF executables, can apply sealing + to non-writeable memory segments. + +- Chrome browser: protect some security sensitive data-structures. + +Additional notes: +================= +As Jann Horn pointed out in [3], there are still a few ways to write +to RO memory, which is, in a way, by design. Those are not covered by +``mseal()``. If applications want to block such cases, a sandboxing mechanism +(such as seccomp, LSM, etc.) might be considered. + +Those cases are: + +- Write to read-only memory through ``/proc/self/mem`` interface.
+ +- Write to read-only memory through ``ptrace`` (such as ``PTRACE_POKETEXT``). + +- ``userfaultfd()``. + +The idea that inspired this patch comes from Stephen Röttger’s work in V8 +CFI [4]. Chrome browser in ChromeOS will be the first user of this API. + +Reference: +========== +[1] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b9f69dbd3c8c3fd30a/osfmk/mach/vm_statistics.h#L274 + +[2] https://man.openbsd.org/mimmutable.2 + +[3] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426FkcgnfUGLvA@mail.gmail.com + +[4] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXgeaRHo/edit#heading=h.bvaojj9fu6hc
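As an illustration of the ``addr``/``len`` rules in the documentation above (page-aligned start, ``len`` implicitly page aligned, ``-EINVAL`` on overflow), the following is a minimal user-space sketch. It is not part of the patch: the helper name ``mseal_range_precheck`` and its parameters are hypothetical, it assumes the page size is a power of two, and it only mirrors the documented ``-EINVAL`` cases. The VMA checks that yield ``-ENOMEM`` or ``-EACCES`` can only be done by the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only, not from the patch).
 * Mirrors the documented -EINVAL conditions for the addr/len pair:
 *   - start address must be page aligned
 *   - len is rounded up to a page multiple
 *   - addr + len must not overflow
 * On success, stores the exclusive end address in *end_out and returns 0.
 */
static int mseal_range_precheck(uintptr_t addr, size_t len, size_t page_size,
				uintptr_t *end_out)
{
	size_t mask;
	size_t aligned_len;

	assert(end_out != NULL);

	/* Assumption: page_size is a non-zero power of two. */
	if (page_size == 0 || (page_size & (page_size - 1)))
		return -EINVAL;
	mask = page_size - 1;

	/* Start address must be page aligned. */
	if (addr & mask)
		return -EINVAL;

	/* Round len up to the next page boundary. */
	aligned_len = (len + mask) & ~mask;
	if (aligned_len < len)
		return -EINVAL;	/* rounding wrapped around */

	/* The end of the range must not wrap past the address space. */
	if (addr + aligned_len < addr)
		return -EINVAL;

	*end_out = addr + aligned_len;
	return 0;
}
```

With a 4096-byte page, a call such as ``mseal_range_precheck(4096, 100, 4096, &end)`` succeeds and covers one full page, while an unaligned start such as 4097 is rejected.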