From patchwork Wed Mar 3 16:22:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 393081 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D7CA9C43381 for ; Thu, 4 Mar 2021 00:58:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A96B864F3E for ; Thu, 4 Mar 2021 00:58:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235183AbhCDA5t (ORCPT ); Wed, 3 Mar 2021 19:57:49 -0500 Received: from mail.kernel.org ([198.145.29.99]:46330 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237517AbhCCQXe (ORCPT ); Wed, 3 Mar 2021 11:23:34 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 1C28564EFC; Wed, 3 Mar 2021 16:22:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1614788571; bh=X/1ny8uglVuhfpYmrp6Fc32vjqohkdMv8EZUozNZySc=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=kyJa4OCSK43iTrFITQPoqKtpiDK0T+REEpq4p1Mh6+pUce21iER5SwDElH+oFQbIh Z6zBY5QKkhea+XXH73zsfScmO8VM7FbDyudJ3bj+l9ox8botngiAbrronlv1T9OB6G ZZ56/Tp908vg6XBZpeVQshQOfMW88Emr4d3r3zplUGA2EBK6QYSnLKVDU5gY6CE65E VAO5DVx37EtLg7/37j64q3v4HKizGtazm6D5IbYz65fNOQJxYIb/cKaWCPrO8Hu8Aj eCyWRhBjGp4EUR3A67hU7MOjKgs2Tm7uL6FLZdArMbZCUDdtwvjecGvhwDFwKzSmH6 0fRpSofH+sEZQ== From: Mike Rapoport To: Andrew Morton Cc: Alexander Viro , Andy Lutomirski , Arnd Bergmann , Borislav Petkov , Catalin Marinas , Christopher Lameter , Dan Williams , Dave Hansen , David Hildenbrand , Elena Reshetova , "H. Peter Anvin" , Ingo Molnar , James Bottomley , "Kirill A. Shutemov" , Matthew Wilcox , Matthew Garrett , Mark Rutland , Michal Hocko , Mike Rapoport , Mike Rapoport , Michael Kerrisk , Palmer Dabbelt , Paul Walmsley , Peter Zijlstra , "Rafael J. Wysocki" , Rick Edgecombe , Roman Gushchin , Shakeel Butt , Shuah Khan , Thomas Gleixner , Tycho Andersen , Will Deacon , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org, x86@kernel.org, Hagen Paul Pfeifer , Palmer Dabbelt Subject: [PATCH v18 2/9] mmap: make mlock_future_check() global Date: Wed, 3 Mar 2021 18:22:02 +0200 Message-Id: <20210303162209.8609-3-rppt@kernel.org> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20210303162209.8609-1-rppt@kernel.org> References: <20210303162209.8609-1-rppt@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Mike Rapoport It will be used by the upcoming secret memory implementation. Signed-off-by: Mike Rapoport Cc: Alexander Viro Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Catalin Marinas Cc: Christopher Lameter Cc: Dan Williams Cc: Dave Hansen Cc: David Hildenbrand Cc: Elena Reshetova Cc: Hagen Paul Pfeifer Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: James Bottomley Cc: "Kirill A. Shutemov" Cc: Mark Rutland Cc: Matthew Wilcox Cc: Michael Kerrisk Cc: Palmer Dabbelt Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Peter Zijlstra Cc: Rick Edgecombe Cc: Roman Gushchin Cc: Shakeel Butt Cc: Shuah Khan Cc: Thomas Gleixner Cc: Tycho Andersen Cc: Will Deacon --- mm/internal.h | 3 +++ mm/mmap.c | 5 ++--- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index 9902648f2206..8e9c660f33ca 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -353,6 +353,9 @@ static inline void munlock_vma_pages_all(struct vm_area_struct *vma) extern void mlock_vma_page(struct page *page); extern unsigned int munlock_vma_page(struct page *page); +extern int mlock_future_check(struct mm_struct *mm, unsigned long flags, + unsigned long len); + /* * Clear the page's PageMlocked(). This can be useful in a situation where * we want to unconditionally remove a page from the pagecache -- e.g., diff --git a/mm/mmap.c b/mm/mmap.c index 3f287599a7a3..f989aa170de4 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1346,9 +1346,8 @@ static inline unsigned long round_hint_to_min(unsigned long hint) return hint; } -static inline int mlock_future_check(struct mm_struct *mm, - unsigned long flags, - unsigned long len) +int mlock_future_check(struct mm_struct *mm, unsigned long flags, + unsigned long len) { unsigned long locked, lock_limit; From patchwork Wed Mar 3 16:22:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 393080 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B463CC433E0 for ; Thu, 4 Mar 2021 00:58:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 892F864EDB for ; Thu, 4 Mar 2021 00:58:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235340AbhCDA5v (ORCPT ); Wed, 3 Mar 2021 19:57:51 -0500 Received: from mail.kernel.org ([198.145.29.99]:46646 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238792AbhCCQYu (ORCPT ); Wed, 3 Mar 2021 11:24:50 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 699C564EF1; Wed, 3 Mar 2021 16:22:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1614788583; bh=O4nT5/YvVTgsA8Uq9YZRjJp/C7wJQhyeo4C8+vcD1Lk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=BuZEIk0kxIY9JM/Lt3HqGzCl2ST20PiYNnDTT/3KQ0jKlvbOcfwGBcXhbOC9nFLbg 07LruRFjETGAcxz+mbVDAvMtfW28FHCnFJyK1vIr5btMjDa0WMzFo5y/wRGkrwMg3m W97qlbfvSQXHkf/tZ1HK0QCitJ4kfkSVgCMGHVt3/uobur4RBlb4ZgJzkG5CERLFcF 5pdsXG9JyUXeXGLKEL6NkjGyTKtWr1z6qNzmQ3N0env8/KbJ9bga17Mj+T0MpKA/km cyvGQmss+vaKLSbh6SE4N5aCVu92v5P4Pt/PD1TuBO7BVV+b8aw1pOC5Mmb5W4nRAs OX+RXYMHtAzog== From: Mike Rapoport To: Andrew Morton Cc: Alexander Viro , Andy Lutomirski , Arnd Bergmann , Borislav Petkov , Catalin Marinas , Christopher Lameter , Dan Williams , Dave Hansen , David Hildenbrand , Elena Reshetova , "H. Peter Anvin" , Ingo Molnar , James Bottomley , "Kirill A. Shutemov" , Matthew Wilcox , Matthew Garrett , Mark Rutland , Michal Hocko , Mike Rapoport , Mike Rapoport , Michael Kerrisk , Palmer Dabbelt , Paul Walmsley , Peter Zijlstra , "Rafael J. Wysocki" , Rick Edgecombe , Roman Gushchin , Shakeel Butt , Shuah Khan , Thomas Gleixner , Tycho Andersen , Will Deacon , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org, x86@kernel.org, kernel test robot Subject: [PATCH v18 3/9] riscv/Kconfig: make direct map manipulation options depend on MMU Date: Wed, 3 Mar 2021 18:22:03 +0200 Message-Id: <20210303162209.8609-4-rppt@kernel.org> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20210303162209.8609-1-rppt@kernel.org> References: <20210303162209.8609-1-rppt@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Mike Rapoport ARCH_HAS_SET_DIRECT_MAP and ARCH_HAS_SET_MEMORY configuration options have no meaning when CONFIG_MMU is disabled and there is no point to enable them for the nommu case. Add an explicit dependency on MMU for these options. Signed-off-by: Mike Rapoport Reported-by: kernel test robot --- arch/riscv/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 85d626b8ce5e..87ccc29fc31d 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -25,8 +25,8 @@ config RISCV select ARCH_HAS_KCOV select ARCH_HAS_MMIOWB select ARCH_HAS_PTE_SPECIAL - select ARCH_HAS_SET_DIRECT_MAP - select ARCH_HAS_SET_MEMORY + select ARCH_HAS_SET_DIRECT_MAP if MMU + select ARCH_HAS_SET_MEMORY if MMU select ARCH_HAS_STRICT_KERNEL_RWX if MMU select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT From patchwork Wed Mar 3 16:22:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 393079 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.2 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNWANTED_LANGUAGE_BODY, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0EFF7C433E6 for ; Thu, 4 Mar 2021 00:58:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E14C264E76 for ; Thu, 4 Mar 2021 00:58:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235499AbhCDA5w (ORCPT ); Wed, 3 Mar 2021 19:57:52 -0500 Received: from mail.kernel.org ([198.145.29.99]:46922 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238886AbhCCQYv (ORCPT ); Wed, 3 Mar 2021 11:24:51 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 8C7D764EEB; Wed, 3 Mar 2021 16:23:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1614788594; bh=3L5LfgX8YQZHs8RhJs3aUV3Lp9pUuH2HcdGSCfQz7w8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ZGfvsZGZQAM599zQUIVcrTqth2yu+GJIaidUQ1VxQKtyB4kVeRNubHlrvDGzmHjrb P8e1H/vG0/zaY55hqTarALIx9hq7kKeE175hk0vnoCLpEej7OzEfhmvY+mOUHDc2o3 CW4yk4ob8BGnQ2Ij/D4eL5k36utJUK/ySgdRHlEI9JI6MXgiH8/XquGySEjL2b651m +5gHVUeyrsTGyJC7csSVRUFbpDqELxLTTVNNoC91CQ6p1sASqfFEM9qXWN9RKGa6Bo b1cgGZPzWU9ZoNbd5PxwYd0Caw1zQMCebLmUxFfqKSh3htleCbNw/Y4+zDUNrWJe9a SfFXg++8K8MSg== From: Mike Rapoport To: Andrew Morton Cc: Alexander Viro , Andy Lutomirski , Arnd Bergmann , Borislav Petkov , Catalin Marinas , Christopher Lameter , Dan Williams , Dave Hansen , David Hildenbrand , Elena Reshetova , "H. Peter Anvin" , Ingo Molnar , James Bottomley , "Kirill A. Shutemov" , Matthew Wilcox , Matthew Garrett , Mark Rutland , Michal Hocko , Mike Rapoport , Mike Rapoport , Michael Kerrisk , Palmer Dabbelt , Paul Walmsley , Peter Zijlstra , "Rafael J. Wysocki" , Rick Edgecombe , Roman Gushchin , Shakeel Butt , Shuah Khan , Thomas Gleixner , Tycho Andersen , Will Deacon , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org, x86@kernel.org, Hagen Paul Pfeifer , Palmer Dabbelt Subject: [PATCH v18 4/9] set_memory: allow set_direct_map_*_noflush() for multiple pages Date: Wed, 3 Mar 2021 18:22:04 +0200 Message-Id: <20210303162209.8609-5-rppt@kernel.org> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20210303162209.8609-1-rppt@kernel.org> References: <20210303162209.8609-1-rppt@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Mike Rapoport The underlying implementations of set_direct_map_invalid_noflush() and set_direct_map_default_noflush() allow updating multiple contiguous pages at once. Add numpages parameter to set_direct_map_*_noflush() to expose this ability with these APIs. Signed-off-by: Mike Rapoport Acked-by: Catalin Marinas [arm64] Cc: Alexander Viro Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Christopher Lameter Cc: Dan Williams Cc: Dave Hansen Cc: David Hildenbrand Cc: Elena Reshetova Cc: Hagen Paul Pfeifer Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: James Bottomley Cc: "Kirill A. Shutemov" Cc: Mark Rutland Cc: Matthew Wilcox Cc: Michael Kerrisk Cc: Palmer Dabbelt Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Peter Zijlstra Cc: Rick Edgecombe Cc: Roman Gushchin Cc: Shakeel Butt Cc: Shuah Khan Cc: Thomas Gleixner Cc: Tycho Andersen Cc: Will Deacon --- arch/arm64/include/asm/cacheflush.h | 4 ++-- arch/arm64/mm/pageattr.c | 10 ++++++---- arch/riscv/include/asm/set_memory.h | 4 ++-- arch/riscv/mm/pageattr.c | 8 ++++---- arch/x86/include/asm/set_memory.h | 4 ++-- arch/x86/mm/pat/set_memory.c | 8 ++++---- include/linux/set_memory.h | 4 ++-- kernel/power/snapshot.c | 4 ++-- mm/vmalloc.c | 5 +++-- 9 files changed, 27 insertions(+), 24 deletions(-) diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h index 52e5c1623224..ace2c3d7ae7e 100644 --- a/arch/arm64/include/asm/cacheflush.h +++ b/arch/arm64/include/asm/cacheflush.h @@ -133,8 +133,8 @@ static __always_inline void __flush_icache_all(void) int set_memory_valid(unsigned long addr, int numpages, int enable); -int set_direct_map_invalid_noflush(struct page *page); -int set_direct_map_default_noflush(struct page *page); +int set_direct_map_invalid_noflush(struct page *page, int numpages); +int set_direct_map_default_noflush(struct page *page, int numpages); bool kernel_page_present(struct page *page); #include diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c index 92eccaf595c8..b53ef37bf95a 100644 --- a/arch/arm64/mm/pageattr.c +++ b/arch/arm64/mm/pageattr.c @@ -148,34 +148,36 @@ int set_memory_valid(unsigned long addr, int numpages, int enable) __pgprot(PTE_VALID)); } -int set_direct_map_invalid_noflush(struct page *page) +int set_direct_map_invalid_noflush(struct page *page, int numpages) { struct page_change_data data = { .set_mask = __pgprot(0), .clear_mask = __pgprot(PTE_VALID), }; + unsigned long size = PAGE_SIZE * numpages; if (!debug_pagealloc_enabled() && !rodata_full) return 0; return apply_to_page_range(&init_mm, (unsigned long)page_address(page), - PAGE_SIZE, change_page_range, &data); + size, change_page_range, &data); } -int set_direct_map_default_noflush(struct page *page) +int set_direct_map_default_noflush(struct page *page, int numpages) { struct page_change_data data = { .set_mask = __pgprot(PTE_VALID | PTE_WRITE), .clear_mask = __pgprot(PTE_RDONLY), }; + unsigned long size = PAGE_SIZE * numpages; if (!debug_pagealloc_enabled() && !rodata_full) return 0; return apply_to_page_range(&init_mm, (unsigned long)page_address(page), - PAGE_SIZE, change_page_range, &data); + size, change_page_range, &data); } #ifdef CONFIG_DEBUG_PAGEALLOC diff --git a/arch/riscv/include/asm/set_memory.h b/arch/riscv/include/asm/set_memory.h index 6887b3d9f371..018e26732940 100644 --- a/arch/riscv/include/asm/set_memory.h +++ b/arch/riscv/include/asm/set_memory.h @@ -26,8 +26,8 @@ static inline void protect_kernel_text_data(void) {} static inline int set_memory_rw_nx(unsigned long addr, int numpages) { return 0; } #endif -int set_direct_map_invalid_noflush(struct page *page); -int set_direct_map_default_noflush(struct page *page); +int set_direct_map_invalid_noflush(struct page *page, int numpages); +int set_direct_map_default_noflush(struct page *page, int numpages); bool kernel_page_present(struct page *page); #endif /* __ASSEMBLY__ */ diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c index 5e49e4b4a4cc..9618181b70be 100644 --- a/arch/riscv/mm/pageattr.c +++ b/arch/riscv/mm/pageattr.c @@ -156,11 +156,11 @@ int set_memory_nx(unsigned long addr, int numpages) return __set_memory(addr, numpages, __pgprot(0), __pgprot(_PAGE_EXEC)); } -int set_direct_map_invalid_noflush(struct page *page) +int set_direct_map_invalid_noflush(struct page *page, int numpages) { int ret; unsigned long start = (unsigned long)page_address(page); - unsigned long end = start + PAGE_SIZE; + unsigned long end = start + PAGE_SIZE * numpages; struct pageattr_masks masks = { .set_mask = __pgprot(0), .clear_mask = __pgprot(_PAGE_PRESENT) @@ -173,11 +173,11 @@ int set_direct_map_invalid_noflush(struct page *page) return ret; } -int set_direct_map_default_noflush(struct page *page) +int set_direct_map_default_noflush(struct page *page, int numpages) { int ret; unsigned long start = (unsigned long)page_address(page); - unsigned long end = start + PAGE_SIZE; + unsigned long end = start + PAGE_SIZE * numpages; struct pageattr_masks masks = { .set_mask = PAGE_KERNEL, .clear_mask = __pgprot(0) diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h index 4352f08bfbb5..6224cb291f6c 100644 --- a/arch/x86/include/asm/set_memory.h +++ b/arch/x86/include/asm/set_memory.h @@ -80,8 +80,8 @@ int set_pages_wb(struct page *page, int numpages); int set_pages_ro(struct page *page, int numpages); int set_pages_rw(struct page *page, int numpages); -int set_direct_map_invalid_noflush(struct page *page); -int set_direct_map_default_noflush(struct page *page); +int set_direct_map_invalid_noflush(struct page *page, int numpages); +int set_direct_map_default_noflush(struct page *page, int numpages); bool kernel_page_present(struct page *page); extern int kernel_set_to_readonly; diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index 16f878c26667..d157fd617c99 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -2184,14 +2184,14 @@ static int __set_pages_np(struct page *page, int numpages) return __change_page_attr_set_clr(&cpa, 0); } -int set_direct_map_invalid_noflush(struct page *page) +int set_direct_map_invalid_noflush(struct page *page, int numpages) { - return __set_pages_np(page, 1); + return __set_pages_np(page, numpages); } -int set_direct_map_default_noflush(struct page *page) +int set_direct_map_default_noflush(struct page *page, int numpages) { - return __set_pages_p(page, 1); + return __set_pages_p(page, numpages); } #ifdef CONFIG_DEBUG_PAGEALLOC diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h index fe1aa4e54680..c650f82db813 100644 --- a/include/linux/set_memory.h +++ b/include/linux/set_memory.h @@ -15,11 +15,11 @@ static inline int set_memory_nx(unsigned long addr, int numpages) { return 0; } #endif #ifndef CONFIG_ARCH_HAS_SET_DIRECT_MAP -static inline int set_direct_map_invalid_noflush(struct page *page) +static inline int set_direct_map_invalid_noflush(struct page *page, int numpages) { return 0; } -static inline int set_direct_map_default_noflush(struct page *page) +static inline int set_direct_map_default_noflush(struct page *page, int numpages) { return 0; } diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c index d63560e1cf87..64b7aab9aee4 100644 --- a/kernel/power/snapshot.c +++ b/kernel/power/snapshot.c @@ -86,7 +86,7 @@ static inline void hibernate_restore_unprotect_page(void *page_address) {} static inline void hibernate_map_page(struct page *page) { if (IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) { - int ret = set_direct_map_default_noflush(page); + int ret = set_direct_map_default_noflush(page, 1); if (ret) pr_warn_once("Failed to remap page\n"); @@ -99,7 +99,7 @@ static inline void hibernate_unmap_page(struct page *page) { if (IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) { unsigned long addr = (unsigned long)page_address(page); - int ret = set_direct_map_invalid_noflush(page); + int ret = set_direct_map_invalid_noflush(page, 1); if (ret) pr_warn_once("Failed to remap page\n"); diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 4f5f8c907897..8ab83fbecadd 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2195,13 +2195,14 @@ struct vm_struct *remove_vm_area(const void *addr) } static inline void set_area_direct_map(const struct vm_struct *area, - int (*set_direct_map)(struct page *page)) + int (*set_direct_map)(struct page *page, + int numpages)) { int i; for (i = 0; i < area->nr_pages; i++) if (page_address(area->pages[i])) - set_direct_map(area->pages[i]); + set_direct_map(area->pages[i], 1); } /* Handle removing and resetting vm mappings related to the vm_struct. */ From patchwork Wed Mar 3 16:22:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 393075 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C13A6C433E9 for ; Thu, 4 Mar 2021 00:58:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 945C264E76 for ; Thu, 4 Mar 2021 00:58:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235779AbhCDA5x (ORCPT ); Wed, 3 Mar 2021 19:57:53 -0500 Received: from mail.kernel.org ([198.145.29.99]:50052 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346415AbhCCQ0Q (ORCPT ); Wed, 3 Mar 2021 11:26:16 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 8C5A364F07; Wed, 3 Mar 2021 16:23:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1614788617; bh=WdLa9ZZxuBADbujeOJu0bHGHXitkk2itr8spRdpuWI0=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=P6UUIOPV22xfLrEOpt2xzhSw2pBYX/MTwSR02KL1KZxSKAdLTws3cJJfK3tjOv2Z3 J+nCzLR68Z01Hr8nuBKdWXN0s8Z2L17LN9mKeUyj3D7PTZgsL5EGPHq/eUV13tLU81 Mm9izr6C8t9K9McyBqXxdYRjuwCv6MixmsYZf/VqX3j86IvWiSmVWp5S1v0l8Nl+yo HYepR7KuWlGjv37kGdhOIEhxfuk9Iz2hX9LC777+unKK2UAUIW8b5KGPt25s3+v1Xm THCiq70xRX7f3GQeGuk6Exg9OckMLXjV5qCus1KWUKUH/usZ3KizczFODWOSxR2VPN Mn12rHvUvVMmg== From: Mike Rapoport To: Andrew Morton Cc: Alexander Viro , Andy Lutomirski , Arnd Bergmann , Borislav Petkov , Catalin Marinas , Christopher Lameter , Dan Williams , Dave Hansen , David Hildenbrand , Elena Reshetova , "H. Peter Anvin" , Ingo Molnar , James Bottomley , "Kirill A. Shutemov" , Matthew Wilcox , Matthew Garrett , Mark Rutland , Michal Hocko , Mike Rapoport , Mike Rapoport , Michael Kerrisk , Palmer Dabbelt , Paul Walmsley , Peter Zijlstra , "Rafael J. Wysocki" , Rick Edgecombe , Roman Gushchin , Shakeel Butt , Shuah Khan , Thomas Gleixner , Tycho Andersen , Will Deacon , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org, x86@kernel.org, Hagen Paul Pfeifer , Palmer Dabbelt Subject: [PATCH v18 6/9] mm: introduce memfd_secret system call to create "secret" memory areas Date: Wed, 3 Mar 2021 18:22:06 +0200 Message-Id: <20210303162209.8609-7-rppt@kernel.org> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20210303162209.8609-1-rppt@kernel.org> References: <20210303162209.8609-1-rppt@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Mike Rapoport Introduce "memfd_secret" system call with the ability to create memory areas visible only in the context of the owning process and not mapped not only to other processes but in the kernel page tables as well. The secretmem feature is off by default and the user must explicitly enable it at the boot time. Once secretmem is enabled, the user will be able to create a file descriptor using the memfd_secret() system call. The memory areas created by mmap() calls from this file descriptor will be unmapped from the kernel direct map and they will be only mapped in the page table of the processes that have access to the file descriptor. The file descriptor based memory has several advantages over the "traditional" mm interfaces, such as mlock(), mprotect(), madvise(). File descriptor approach allows explict and controlled sharing of the memory areas, it allows to seal the operations. Besides, file descriptor based memory paves the way for VMMs to remove the secret memory range from the userpace hipervisor process, for instance QEMU. Andy Lutomirski says: "Getting fd-backed memory into a guest will take some possibly major work in the kernel, but getting vma-backed memory into a guest without mapping it in the host user address space seems much, much worse." memfd_secret() is made a dedicated system call rather than an extention to memfd_create() because it's purpose is to allow the user to create more secure memory mappings rather than to simply allow file based access to the memory. Nowadays a new system call cost is negligible while it is way simpler for userspace to deal with a clear-cut system calls than with a multiplexer or an overloaded syscall. Moreover, the initial implementation of memfd_secret() is completely distinct from memfd_create() so there is no much sense in overloading memfd_create() to begin with. If there will be a need for code sharing between these implementation it can be easily achieved without a need to adjust user visible APIs. The secret memory remains accessible in the process context using uaccess primitives, but it is not exposed to the kernel otherwise; secret memory areas are removed from the direct map and functions in the follow_page()/get_user_page() family will refuse to return a page that belongs to the secret memory area. Once there will be a use case that will require exposing secretmem to the kernel it will be an opt-in request in the system call flags so that user would have to decide what data can be exposed to the kernel. Removing of the pages from the direct map may cause its fragmentation on architectures that use large pages to map the physical memory which affects the system performance. However, the original Kconfig text for CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "... can improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736 ("x86: add gbpages switches")) and the recent report [1] showed that "... although 1G mappings are a good default choice, there is no compelling evidence that it must be the only choice". Hence, it is sufficient to have secretmem disabled by default with the ability of a system administrator to enable it at boot time. Pages in the secretmem regions are unevictable and unmovable to avoid accidental exposure of the sensitive data via swap or during page migration. Since the secretmem mappings are locked in memory they cannot exceed RLIMIT_MEMLOCK. Since these mappings are already locked independently from mlock(), an attempt to mlock()/munlock() secretmem range would fail and mlockall()/munlockall() will ignore secretmem mappings. However, unlike mlock()ed memory, secretmem currently behaves more like long-term GUP: secretmem mappings are unmovable mappings directly consumed by user space. With default limits, there is no excessive use of secretmem and it poses no real problem in combination with ZONE_MOVABLE/CMA, but in the future this should be addressed to allow balanced use of large amounts of secretmem along with ZONE_MOVABLE/CMA. A page that was a part of the secret memory area is cleared when it is freed to ensure the data is not exposed to the next user of that page. The following example demonstrates creation of a secret mapping (error handling is omitted): fd = memfd_secret(0); ftruncate(fd, MAP_SIZE); ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); [1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/ Signed-off-by: Mike Rapoport Acked-by: Hagen Paul Pfeifer Cc: Alexander Viro Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Catalin Marinas Cc: Christopher Lameter Cc: Dan Williams Cc: Dave Hansen Cc: Elena Reshetova Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: James Bottomley Cc: "Kirill A. Shutemov" Cc: Matthew Wilcox Cc: Mark Rutland Cc: Michael Kerrisk Cc: Palmer Dabbelt Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Peter Zijlstra Cc: Rick Edgecombe Cc: Roman Gushchin Cc: Shakeel Butt Cc: Shuah Khan Cc: Thomas Gleixner Cc: Tycho Andersen Cc: Will Deacon --- include/linux/secretmem.h | 24 ++++ include/uapi/linux/magic.h | 1 + kernel/sys_ni.c | 2 + mm/Kconfig | 3 + mm/Makefile | 1 + mm/gup.c | 10 ++ mm/mlock.c | 3 +- mm/secretmem.c | 246 +++++++++++++++++++++++++++++++++++++ 8 files changed, 289 insertions(+), 1 deletion(-) create mode 100644 include/linux/secretmem.h create mode 100644 mm/secretmem.c diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h new file mode 100644 index 000000000000..70e7db9f94fe --- /dev/null +++ b/include/linux/secretmem.h @@ -0,0 +1,24 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +#ifndef _LINUX_SECRETMEM_H +#define _LINUX_SECRETMEM_H + +#ifdef CONFIG_SECRETMEM + +bool vma_is_secretmem(struct vm_area_struct *vma); +bool page_is_secretmem(struct page *page); + +#else + +static inline bool vma_is_secretmem(struct vm_area_struct *vma) +{ + return false; +} + +static inline bool page_is_secretmem(struct page *page) +{ + return false; +} + +#endif /* CONFIG_SECRETMEM */ + +#endif /* _LINUX_SECRETMEM_H */ diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h index f3956fc11de6..35687dcb1a42 100644 --- a/include/uapi/linux/magic.h +++ b/include/uapi/linux/magic.h @@ -97,5 +97,6 @@ #define DEVMEM_MAGIC 0x454d444d /* "DMEM" */ #define Z3FOLD_MAGIC 0x33 #define PPC_CMM_MAGIC 0xc7571590 +#define SECRETMEM_MAGIC 0x5345434d /* "SECM" */ #endif /* __LINUX_MAGIC_H__ */ diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 19aa806890d5..e9a2011ee4a2 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -352,6 +352,8 @@ COND_SYSCALL(pkey_mprotect); COND_SYSCALL(pkey_alloc); COND_SYSCALL(pkey_free); +/* memfd_secret */ +COND_SYSCALL(memfd_secret); /* * Architecture specific weak syscall entries. diff --git a/mm/Kconfig b/mm/Kconfig index 24c045b24b95..5f8243442f66 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -872,4 +872,7 @@ config MAPPING_DIRTY_HELPERS config KMAP_LOCAL bool +config SECRETMEM + def_bool ARCH_HAS_SET_DIRECT_MAP && !EMBEDDED + endmenu diff --git a/mm/Makefile b/mm/Makefile index 72227b24a616..b2a564eec27f 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -120,3 +120,4 @@ obj-$(CONFIG_MEMFD_CREATE) += memfd.o obj-$(CONFIG_MAPPING_DIRTY_HELPERS) += mapping_dirty_helpers.o obj-$(CONFIG_PTDUMP_CORE) += ptdump.o obj-$(CONFIG_PAGE_REPORTING) += page_reporting.o +obj-$(CONFIG_SECRETMEM) += secretmem.o diff --git a/mm/gup.c b/mm/gup.c index e40579624f10..ecadc80934b2 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include @@ -758,6 +759,9 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, struct follow_page_context ctx = { NULL }; struct page *page; + if (vma_is_secretmem(vma)) + return NULL; + page = follow_page_mask(vma, address, foll_flags, &ctx); if (ctx.pgmap) put_dev_pagemap(ctx.pgmap); @@ -891,6 +895,9 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma)) return -EOPNOTSUPP; + if (vma_is_secretmem(vma)) + return -EFAULT; + if (write) { if (!(vm_flags & VM_WRITE)) { if (!(gup_flags & FOLL_FORCE)) @@ -2030,6 +2037,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, VM_BUG_ON(!pfn_valid(pte_pfn(pte))); page = pte_page(pte); + if (page_is_secretmem(page)) + goto pte_unmap; + head = try_grab_compound_head(page, 1, flags); if (!head) goto pte_unmap; diff --git a/mm/mlock.c b/mm/mlock.c index f8f8cc32d03d..188711c72b67 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -23,6 +23,7 @@ #include #include #include +#include #include "internal.h" @@ -503,7 +504,7 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev, if (newflags == vma->vm_flags || (vma->vm_flags & VM_SPECIAL) || is_vm_hugetlb_page(vma) || vma == get_gate_vma(current->mm) || - vma_is_dax(vma)) + vma_is_dax(vma) || vma_is_secretmem(vma)) /* don't set VM_LOCKED or VM_LOCKONFAULT and don't count */ goto out; diff --git a/mm/secretmem.c b/mm/secretmem.c new file mode 100644 index 000000000000..fa6738e860c2 --- /dev/null +++ b/mm/secretmem.c @@ -0,0 +1,246 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright IBM Corporation, 2021 + * + * Author: Mike Rapoport + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include + +#include "internal.h" + +#undef pr_fmt +#define pr_fmt(fmt) "secretmem: " fmt + +/* + * Define mode and flag masks to allow validation of the system call + * parameters. + */ +#define SECRETMEM_MODE_MASK (0x0) +#define SECRETMEM_FLAGS_MASK SECRETMEM_MODE_MASK + +static bool secretmem_enable __ro_after_init; +module_param_named(enable, secretmem_enable, bool, 0400); +MODULE_PARM_DESC(secretmem_enable, + "Enable secretmem and memfd_secret(2) system call"); + +static vm_fault_t secretmem_fault(struct vm_fault *vmf) +{ + struct address_space *mapping = vmf->vma->vm_file->f_mapping; + struct inode *inode = file_inode(vmf->vma->vm_file); + pgoff_t offset = vmf->pgoff; + gfp_t gfp = vmf->gfp_mask; + unsigned long addr; + struct page *page; + int err; + + if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode)) + return vmf_error(-EINVAL); + +retry: + page = find_lock_page(mapping, offset); + if (!page) { + page = alloc_page(gfp | __GFP_ZERO); + if (!page) + return VM_FAULT_OOM; + + err = set_direct_map_invalid_noflush(page, 1); + if (err) { + put_page(page); + return vmf_error(err); + } + + __SetPageUptodate(page); + err = add_to_page_cache_lru(page, mapping, offset, gfp); + if (unlikely(err)) { + put_page(page); + /* + * If a split of large page was required, it + * already happened when we marked the page invalid + * which guarantees that this call won't fail + */ + set_direct_map_default_noflush(page, 1); + if (err == -EEXIST) + goto retry; + + return vmf_error(err); + } + + addr = (unsigned long)page_address(page); + flush_tlb_kernel_range(addr, addr + PAGE_SIZE); + } + + vmf->page = page; + return VM_FAULT_LOCKED; +} + +static const struct vm_operations_struct secretmem_vm_ops = { + .fault = secretmem_fault, +}; + +static int secretmem_mmap(struct file *file, struct vm_area_struct *vma) +{ + unsigned long len = vma->vm_end - vma->vm_start; + + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0) + return -EINVAL; + + if (mlock_future_check(vma->vm_mm, vma->vm_flags | VM_LOCKED, len)) + return -EAGAIN; + + vma->vm_flags |= VM_LOCKED | VM_DONTDUMP; + vma->vm_ops = &secretmem_vm_ops; + + return 0; +} + +bool vma_is_secretmem(struct vm_area_struct *vma) +{ + return vma->vm_ops == &secretmem_vm_ops; +} + +static const struct file_operations secretmem_fops = { + .mmap = secretmem_mmap, +}; + +static bool secretmem_isolate_page(struct page *page, isolate_mode_t mode) +{ + return false; +} + +static int secretmem_migratepage(struct address_space *mapping, + struct page *newpage, struct page *page, + enum migrate_mode mode) +{ + return -EBUSY; +} + +static void secretmem_freepage(struct page *page) +{ + set_direct_map_default_noflush(page, 1); + clear_highpage(page); +} + +static const struct address_space_operations secretmem_aops = { + .freepage = secretmem_freepage, + .migratepage = secretmem_migratepage, + .isolate_page = secretmem_isolate_page, +}; + +bool page_is_secretmem(struct page *page) +{ + struct address_space *mapping = page_mapping(page); + + if (!mapping) + return false; + + return mapping->a_ops == &secretmem_aops; +} + +static struct vfsmount *secretmem_mnt; + +static struct file *secretmem_file_create(unsigned long flags) +{ + struct file *file = ERR_PTR(-ENOMEM); + struct inode *inode; + + inode = alloc_anon_inode(secretmem_mnt->mnt_sb); + if (IS_ERR(inode)) + return ERR_CAST(inode); + + file = alloc_file_pseudo(inode, secretmem_mnt, "secretmem", + O_RDWR, &secretmem_fops); + if (IS_ERR(file)) + goto err_free_inode; + + mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER); + mapping_set_unevictable(inode->i_mapping); + + inode->i_mapping->a_ops = &secretmem_aops; + + /* pretend we are a normal file with zero size */ + inode->i_mode |= S_IFREG; + inode->i_size = 0; + + return file; + +err_free_inode: + iput(inode); + return file; +} + +SYSCALL_DEFINE1(memfd_secret, unsigned long, flags) +{ + struct file *file; + int fd, err; + + /* make sure local flags do not confict with global fcntl.h */ + BUILD_BUG_ON(SECRETMEM_FLAGS_MASK & O_CLOEXEC); + + if (!secretmem_enable) + return -ENOSYS; + + if (flags & ~(SECRETMEM_FLAGS_MASK | O_CLOEXEC)) + return -EINVAL; + + fd = get_unused_fd_flags(flags & O_CLOEXEC); + if (fd < 0) + return fd; + + file = secretmem_file_create(flags); + if (IS_ERR(file)) { + err = PTR_ERR(file); + goto err_put_fd; + } + + file->f_flags |= O_LARGEFILE; + + fd_install(fd, file); + return fd; + +err_put_fd: + put_unused_fd(fd); + return err; +} + +static int secretmem_init_fs_context(struct fs_context *fc) +{ + return init_pseudo(fc, SECRETMEM_MAGIC) ? 0 : -ENOMEM; +} + +static struct file_system_type secretmem_fs = { + .name = "secretmem", + .init_fs_context = secretmem_init_fs_context, + .kill_sb = kill_anon_super, +}; + +static int secretmem_init(void) +{ + int ret = 0; + + if (!secretmem_enable) + return ret; + + secretmem_mnt = kern_mount(&secretmem_fs); + if (IS_ERR(secretmem_mnt)) + ret = PTR_ERR(secretmem_mnt); + + return ret; +} +fs_initcall(secretmem_init); From patchwork Wed Mar 3 16:22:09 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 393078 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E827EC4332E for ; Thu, 4 Mar 2021 00:58:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BFEAA64EDB for ; Thu, 4 Mar 2021 00:58:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232565AbhCDA5z (ORCPT ); Wed, 3 Mar 2021 19:57:55 -0500 Received: from mail.kernel.org ([198.145.29.99]:50512 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346460AbhCCQ0n (ORCPT ); Wed, 3 Mar 2021 11:26:43 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 8804664EFD; Wed, 3 Mar 2021 16:24:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1614788651; bh=4leNaCIZ1xcR/mkizNz8otv7a82Efwmo44tjJNIZIk4=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=JX+/9tLtUvIiDi7SFCqpWGhmAdreFxjmliUzXlVRwklzZTk4DPUwOSPZK9pwvWYD5 Ca308y9JA8UPmjRGSejqUIC8kF5wNup7qArHASl6egrzZtmLdpGXV/IY8XCH1jd/Uv nKJMwPe04B4fCU+0jwUhaZtSAnKLIBUhF77TydDo8AtOLgchSuMhe5GzsFY0dzL5oT sM0rWnwCWoGXKirD+9vHn1jE2/h2x7ew8QbbdZYzKOxMMWUROMkU0CRyKBx3/v04kq 9vg4tuPv2qegjZoZr9epw0w3+AFpfG3YdN5BeLeS2z2/mb0T3AWHqyIMb27IZdbBfo DFuxxEQOzpUyA== From: Mike Rapoport To: Andrew Morton Cc: Alexander Viro , Andy Lutomirski , Arnd Bergmann , Borislav Petkov , Catalin Marinas , Christopher Lameter , Dan Williams , Dave Hansen , David Hildenbrand , Elena Reshetova , "H. Peter Anvin" , Ingo Molnar , James Bottomley , "Kirill A. Shutemov" , Matthew Wilcox , Matthew Garrett , Mark Rutland , Michal Hocko , Mike Rapoport , Mike Rapoport , Michael Kerrisk , Palmer Dabbelt , Paul Walmsley , Peter Zijlstra , "Rafael J. Wysocki" , Rick Edgecombe , Roman Gushchin , Shakeel Butt , Shuah Khan , Thomas Gleixner , Tycho Andersen , Will Deacon , linux-api@vger.kernel.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org, x86@kernel.org, Hagen Paul Pfeifer , Palmer Dabbelt Subject: [PATCH v18 9/9] secretmem: test: add basic selftest for memfd_secret(2) Date: Wed, 3 Mar 2021 18:22:09 +0200 Message-Id: <20210303162209.8609-10-rppt@kernel.org> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20210303162209.8609-1-rppt@kernel.org> References: <20210303162209.8609-1-rppt@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Mike Rapoport The test verifies that file descriptor created with memfd_secret does not allow read/write operations, that secret memory mappings respect RLIMIT_MEMLOCK and that remote accesses with process_vm_read() and ptrace() to the secret memory fail. Signed-off-by: Mike Rapoport Cc: Alexander Viro Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Catalin Marinas Cc: Christopher Lameter Cc: Dan Williams Cc: Dave Hansen Cc: David Hildenbrand Cc: Elena Reshetova Cc: Hagen Paul Pfeifer Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: James Bottomley Cc: "Kirill A. Shutemov" Cc: Mark Rutland Cc: Matthew Wilcox Cc: Michael Kerrisk Cc: Palmer Dabbelt Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Peter Zijlstra Cc: Rick Edgecombe Cc: Roman Gushchin Cc: Shakeel Butt Cc: Shuah Khan Cc: Thomas Gleixner Cc: Tycho Andersen Cc: Will Deacon --- tools/testing/selftests/vm/.gitignore | 1 + tools/testing/selftests/vm/Makefile | 3 +- tools/testing/selftests/vm/memfd_secret.c | 296 ++++++++++++++++++++++ tools/testing/selftests/vm/run_vmtests.sh | 17 ++ 4 files changed, 316 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/vm/memfd_secret.c diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore index 9a35c3f6a557..c8deddc81e7a 100644 --- a/tools/testing/selftests/vm/.gitignore +++ b/tools/testing/selftests/vm/.gitignore @@ -21,4 +21,5 @@ va_128TBswitch map_fixed_noreplace write_to_hugetlbfs hmm-tests +memfd_secret local_config.* diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index d42115e4284d..0200fb61646c 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -34,6 +34,7 @@ TEST_GEN_FILES += khugepaged TEST_GEN_FILES += map_fixed_noreplace TEST_GEN_FILES += map_hugetlb TEST_GEN_FILES += map_populate +TEST_GEN_FILES += memfd_secret TEST_GEN_FILES += mlock-random-test TEST_GEN_FILES += mlock2-tests TEST_GEN_FILES += mremap_dontunmap @@ -133,7 +134,7 @@ warn_32bit_failure: endif endif -$(OUTPUT)/mlock-random-test: LDLIBS += -lcap +$(OUTPUT)/mlock-random-test $(OUTPUT)/memfd_secret: LDLIBS += -lcap $(OUTPUT)/gup_test: ../../../../mm/gup_test.h diff --git a/tools/testing/selftests/vm/memfd_secret.c b/tools/testing/selftests/vm/memfd_secret.c new file mode 100644 index 000000000000..c878c2b841fc --- /dev/null +++ b/tools/testing/selftests/vm/memfd_secret.c @@ -0,0 +1,296 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright IBM Corporation, 2020 + * + * Author: Mike Rapoport + */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include "../kselftest.h" + +#define fail(fmt, ...) ksft_test_result_fail(fmt, ##__VA_ARGS__) +#define pass(fmt, ...) ksft_test_result_pass(fmt, ##__VA_ARGS__) +#define skip(fmt, ...) ksft_test_result_skip(fmt, ##__VA_ARGS__) + +#ifdef __NR_memfd_secret + +#define PATTERN 0x55 + +static const int prot = PROT_READ | PROT_WRITE; +static const int mode = MAP_SHARED; + +static unsigned long page_size; +static unsigned long mlock_limit_cur; +static unsigned long mlock_limit_max; + +static int memfd_secret(unsigned long flags) +{ + return syscall(__NR_memfd_secret, flags); +} + +static void test_file_apis(int fd) +{ + char buf[64]; + + if ((read(fd, buf, sizeof(buf)) >= 0) || + (write(fd, buf, sizeof(buf)) >= 0) || + (pread(fd, buf, sizeof(buf), 0) >= 0) || + (pwrite(fd, buf, sizeof(buf), 0) >= 0)) + fail("unexpected file IO\n"); + else + pass("file IO is blocked as expected\n"); +} + +static void test_mlock_limit(int fd) +{ + size_t len; + char *mem; + + len = mlock_limit_cur; + mem = mmap(NULL, len, prot, mode, fd, 0); + if (mem == MAP_FAILED) { + fail("unable to mmap secret memory\n"); + return; + } + munmap(mem, len); + + len = mlock_limit_max * 2; + mem = mmap(NULL, len, prot, mode, fd, 0); + if (mem != MAP_FAILED) { + fail("unexpected mlock limit violation\n"); + munmap(mem, len); + return; + } + + pass("mlock limit is respected\n"); +} + +static void try_process_vm_read(int fd, int pipefd[2]) +{ + struct iovec liov, riov; + char buf[64]; + char *mem; + + if (read(pipefd[0], &mem, sizeof(mem)) < 0) { + fail("pipe write: %s\n", strerror(errno)); + exit(KSFT_FAIL); + } + + liov.iov_len = riov.iov_len = sizeof(buf); + liov.iov_base = buf; + riov.iov_base = mem; + + if (process_vm_readv(getppid(), &liov, 1, &riov, 1, 0) < 0) { + if (errno == ENOSYS) + exit(KSFT_SKIP); + exit(KSFT_PASS); + } + + exit(KSFT_FAIL); +} + +static void try_ptrace(int fd, int pipefd[2]) +{ + pid_t ppid = getppid(); + int status; + char *mem; + long ret; + + if (read(pipefd[0], &mem, sizeof(mem)) < 0) { + perror("pipe write"); + exit(KSFT_FAIL); + } + + ret = ptrace(PTRACE_ATTACH, ppid, 0, 0); + if (ret) { + perror("ptrace_attach"); + exit(KSFT_FAIL); + } + + ret = waitpid(ppid, &status, WUNTRACED); + if ((ret != ppid) || !(WIFSTOPPED(status))) { + fprintf(stderr, "weird waitppid result %ld stat %x\n", + ret, status); + exit(KSFT_FAIL); + } + + if (ptrace(PTRACE_PEEKDATA, ppid, mem, 0)) + exit(KSFT_PASS); + + exit(KSFT_FAIL); +} + +static void check_child_status(pid_t pid, const char *name) +{ + int status; + + waitpid(pid, &status, 0); + + if (WIFEXITED(status) && WEXITSTATUS(status) == KSFT_SKIP) { + skip("%s is not supported\n", name); + return; + } + + if ((WIFEXITED(status) && WEXITSTATUS(status) == KSFT_PASS) || + WIFSIGNALED(status)) { + pass("%s is blocked as expected\n", name); + return; + } + + fail("%s: unexpected memory access\n", name); +} + +static void test_remote_access(int fd, const char *name, + void (*func)(int fd, int pipefd[2])) +{ + int pipefd[2]; + pid_t pid; + char *mem; + + if (pipe(pipefd)) { + fail("pipe failed: %s\n", strerror(errno)); + return; + } + + pid = fork(); + if (pid < 0) { + fail("fork failed: %s\n", strerror(errno)); + return; + } + + if (pid == 0) { + func(fd, pipefd); + return; + } + + mem = mmap(NULL, page_size, prot, mode, fd, 0); + if (mem == MAP_FAILED) { + fail("Unable to mmap secret memory\n"); + return; + } + + ftruncate(fd, page_size); + memset(mem, PATTERN, page_size); + + if (write(pipefd[1], &mem, sizeof(mem)) < 0) { + fail("pipe write: %s\n", strerror(errno)); + return; + } + + check_child_status(pid, name); +} + +static void test_process_vm_read(int fd) +{ + test_remote_access(fd, "process_vm_read", try_process_vm_read); +} + +static void test_ptrace(int fd) +{ + test_remote_access(fd, "ptrace", try_ptrace); +} + +static int set_cap_limits(rlim_t max) +{ + struct rlimit new; + cap_t cap = cap_init(); + + new.rlim_cur = max; + new.rlim_max = max; + if (setrlimit(RLIMIT_MEMLOCK, &new)) { + perror("setrlimit() returns error"); + return -1; + } + + /* drop capabilities including CAP_IPC_LOCK */ + if (cap_set_proc(cap)) { + perror("cap_set_proc() returns error"); + return -2; + } + + return 0; +} + +static void prepare(void) +{ + struct rlimit rlim; + + page_size = sysconf(_SC_PAGE_SIZE); + if (!page_size) + ksft_exit_fail_msg("Failed to get page size %s\n", + strerror(errno)); + + if (getrlimit(RLIMIT_MEMLOCK, &rlim)) + ksft_exit_fail_msg("Unable to detect mlock limit: %s\n", + strerror(errno)); + + mlock_limit_cur = rlim.rlim_cur; + mlock_limit_max = rlim.rlim_max; + + printf("page_size: %ld, mlock.soft: %ld, mlock.hard: %ld\n", + page_size, mlock_limit_cur, mlock_limit_max); + + if (page_size > mlock_limit_cur) + mlock_limit_cur = page_size; + if (page_size > mlock_limit_max) + mlock_limit_max = page_size; + + if (set_cap_limits(mlock_limit_max)) + ksft_exit_fail_msg("Unable to set mlock limit: %s\n", + strerror(errno)); +} + +#define NUM_TESTS 4 + +int main(int argc, char *argv[]) +{ + int fd; + + prepare(); + + ksft_print_header(); + ksft_set_plan(NUM_TESTS); + + fd = memfd_secret(0); + if (fd < 0) { + if (errno == ENOSYS) + ksft_exit_skip("memfd_secret is not supported\n"); + else + ksft_exit_fail_msg("memfd_secret failed: %s\n", + strerror(errno)); + } + + test_mlock_limit(fd); + test_file_apis(fd); + test_process_vm_read(fd); + test_ptrace(fd); + + close(fd); + + ksft_exit(!ksft_get_fail_cnt()); +} + +#else /* __NR_memfd_secret */ + +int main(int argc, char *argv[]) +{ + printf("skip: skipping memfd_secret test (missing __NR_memfd_secret)\n"); + return KSFT_SKIP; +} + +#endif /* __NR_memfd_secret */ diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh index e953f3cd9664..95a67382f132 100755 --- a/tools/testing/selftests/vm/run_vmtests.sh +++ b/tools/testing/selftests/vm/run_vmtests.sh @@ -346,4 +346,21 @@ else exitcode=1 fi +echo "running memfd_secret test" +echo "------------------------------------" +./memfd_secret +ret_val=$? + +if [ $ret_val -eq 0 ]; then + echo "[PASS]" +elif [ $ret_val -eq $ksft_skip ]; then + echo "[SKIP]" + exitcode=$ksft_skip +else + echo "[FAIL]" + exitcode=1 +fi + +exit $exitcode + exit $exitcode