From patchwork Fri Mar 5 01:49:24 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 394509
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS,
 INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE,
 SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no
 version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 6F59BC433E6
 for ; Fri, 5 Mar 2021 01:49:27 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by mail.kernel.org (Postfix) with ESMTP id 44BD665010
 for ; Fri, 5 Mar 2021 01:49:27 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S229500AbhCEBt0 (ORCPT ); Thu, 4 Mar 2021 20:49:26 -0500
Received: from mail.kernel.org ([198.145.29.99]:53894 "EHLO mail.kernel.org"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S229436AbhCEBt0 (ORCPT ); Thu, 4 Mar 2021 20:49:26 -0500
Received: by mail.kernel.org (Postfix) with ESMTPSA id 90C896500C;
 Fri, 5 Mar 2021 01:49:25 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
 d=linux-foundation.org; s=korg; t=1614908966;
 bh=bzPBkcGBE+ojVBJFF9dsRIZwceAzEcsRQBFXrWNndxI=;
 h=Date:From:To:Subject:From;
 b=i0Eq3RYBAH6QlSisXJDceJYgF1GWVtf59xmATEfmBtHTS2L0x/fusPnevEvgpiNjs
 pEWr8Lah4VlJ1peIgVfl/nJQIsML3GqXUkKhTwZepGx2CbxfIRBlkSYbSWrYFyeHD9
 qS+wbuQjSe62w0MwLq/1oijXC65jYOylKoE8PcRM=
Date: Thu, 04 Mar 2021 17:49:24 -0800
From: akpm@linux-foundation.org
To: aarcange@redhat.com, luto@kernel.org, mike.kravetz@oracle.com,
 minchan@kernel.org, mm-commits@vger.kernel.org, namit@vmware.com,
 peterx@redhat.com, peterz@infradead.org, rppt@linux.vnet.ibm.com,
 stable@vger.kernel.org, will@kernel.org, xemul@openvz.org, yuzhao@google.com
Subject: + mm-userfaultfd-fix-memory-corruption-due-to-writeprotect.patch added to -mm tree
Message-ID: <20210305014924.v3_zhhr2b%akpm@linux-foundation.org>
User-Agent: s-nail v14.8.16
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: stable@vger.kernel.org

The patch titled
     Subject: mm/userfaultfd: fix memory corruption due to writeprotect
has been added to the -mm tree.  Its filename is
     mm-userfaultfd-fix-memory-corruption-due-to-writeprotect.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-userfaultfd-fix-memory-corruption-due-to-writeprotect.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-userfaultfd-fix-memory-corruption-due-to-writeprotect.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Nadav Amit
Subject: mm/userfaultfd: fix memory corruption due to writeprotect

Userfaultfd self-test fails occasionally, indicating a memory corruption.

Analyzing this problem indicates that there is a real bug since mmap_lock
is only taken for read in mwriteprotect_range() and defers flushes, and
since there is insufficient consideration of concurrent deferred TLB
flushes in wp_page_copy().  Although the PTE is flushed from the TLBs in
wp_page_copy(), this flush takes place after the copy has already been
performed, and therefore changes of the page are possible between the time
of the copy and the time in which the PTE is flushed.

To make matters worse, memory-unprotection using userfaultfd also poses a
problem.  Although memory unprotection is logically a promotion of PTE
permissions, and therefore should not require a TLB flush, the current
userfaultfd code might actually cause a demotion of the architectural PTE
permission: when userfaultfd_writeprotect() unprotects a memory region, it
unintentionally *clears* the RW-bit if it was already set.  Note that
unprotecting a PTE that is not write-protected is a valid use-case: the
userfaultfd monitor might ask to unprotect a region that holds both
write-protected and write-unprotected PTEs.

The scenario that happens in selftests/vm/userfaultfd is as follows:

cpu0                            cpu1                    cpu2
----                            ----                    ----
                                                        [ Writable PTE
                                                          cached in TLB ]
userfaultfd_writeprotect()
[ write-*unprotect* ]
mwriteprotect_range()
mmap_read_lock()
change_protection()

change_protection_range()
...
change_pte_range()
[ *clear* "write"-bit ]
[ defer TLB flushes ]
                                [ page-fault ]
                                ...
                                wp_page_copy()
                                 cow_user_page()
                                  [ copy page ]
                                                        [ write to old
                                                          page ]
                                ...
                                 set_pte_at_notify()

A similar scenario can happen:

cpu0                        cpu1                        cpu2                cpu3
----                        ----                        ----                ----
                                                                            [ Writable PTE
                                                                              cached in TLB ]
userfaultfd_writeprotect()
[ write-protect ]
[ deferred TLB flush ]
                            userfaultfd_writeprotect()
                            [ write-unprotect ]
                            [ deferred TLB flush ]
                                                        [ page-fault ]
                                                        wp_page_copy()
                                                         cow_user_page()
                                                         [ copy page ]
                                                         ...                [ write to page ]
                                                        set_pte_at_notify()

This race exists since commit 292924b26024 ("userfaultfd: wp: apply
_PAGE_UFFD_WP bit").  Yet, as Yu Zhao pointed out, these races became
apparent since commit 09854ba94c6a ("mm: do_wp_page() simplification"),
which made wp_page_copy() more likely to take place, specifically if
page_count(page) > 1.

To resolve the aforementioned races, check whether there are pending
flushes on uffd-write-protected VMAs, and if there are, perform a flush
before doing the COW.

Further optimizations will follow to avoid unnecessary PTE
write-protection and TLB flushes during uffd-write-unprotect.
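
As an illustration only, the first scenario above can be replayed in a
single-threaded userspace model.  This is a sketch under stated
assumptions, not kernel code: every toy_* name is hypothetical, the TLB is
reduced to one cached entry for cpu2, and the interleaving is fixed by
program order rather than by real concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model only: none of these types or helpers are kernel APIs. */
#define TOY_PAGE_SIZE 8

struct toy_page { char data[TOY_PAGE_SIZE]; };

struct toy_mm {
	bool uffd_wp;              /* stands in for userfaultfd_wp(vma) */
	int tlb_flush_pending;     /* stands in for mm_tlb_flush_pending(mm) */
	bool pte_writable;         /* write bit of the single modelled PTE */
	struct toy_page *pte_page; /* page the PTE currently maps */
};

struct toy_tlb {
	bool valid;
	bool writable;
	struct toy_page *page;
};

/* A store through the TLB: a cached writable entry bypasses the PTE. */
static bool toy_store(struct toy_mm *mm, struct toy_tlb *tlb, char value)
{
	if (!tlb->valid) {
		/* TLB miss: reload the translation from the page table. */
		tlb->valid = true;
		tlb->writable = mm->pte_writable;
		tlb->page = mm->pte_page;
	}
	if (!tlb->writable)
		return false;      /* a real CPU would fault here */
	tlb->page->data[0] = value;
	return true;
}

/*
 * Replays the three-CPU scenario in program order.  Returns true if
 * cpu2's store succeeded but is absent from the page that ends up mapped.
 */
bool toy_write_is_lost(bool flush_before_copy)
{
	struct toy_page old_page, new_page;
	memset(&old_page, 'a', sizeof(old_page));

	struct toy_mm mm = { .uffd_wp = true, .tlb_flush_pending = 0,
			     .pte_writable = true, .pte_page = &old_page };
	/* cpu2: writable PTE cached in TLB. */
	struct toy_tlb cpu2_tlb = { .valid = true, .writable = true,
				    .page = &old_page };

	/* cpu0: write-unprotect clears the write bit and defers the flush. */
	mm.pte_writable = false;
	mm.tlb_flush_pending++;

	/* cpu1: page fault; the proposed check runs before the copy. */
	if (flush_before_copy && mm.uffd_wp && mm.tlb_flush_pending > 0)
		cpu2_tlb.valid = false;    /* models flush_tlb_page() */
	new_page = old_page;               /* models cow_user_page() */

	/* cpu2: store that races with the copy. */
	bool stored = toy_store(&mm, &cpu2_tlb, 'b');

	/* cpu1: install the copy, as set_pte_at_notify() would. */
	mm.pte_page = &new_page;
	mm.pte_writable = true;

	/* cpu0: the deferred flush finally happens. */
	cpu2_tlb.valid = false;
	mm.tlb_flush_pending--;
	assert(mm.tlb_flush_pending == 0);

	return stored && mm.pte_page->data[0] != 'b';
}
```

In this model, toy_write_is_lost(false) reports the store as lost because
it lands on the old page after the copy was taken.  With
toy_write_is_lost(true) the stale entry is dropped first, the store would
fault instead of writing, and nothing is lost.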

Link: https://lkml.kernel.org/r/20210304095423.3825684-1-namit@vmware.com
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Signed-off-by: Nadav Amit
Suggested-by: Yu Zhao
Reviewed-by: Peter Xu
Tested-by: Peter Xu
Cc: Andrea Arcangeli
Cc: Andy Lutomirski
Cc: Pavel Emelyanov
Cc: Mike Kravetz
Cc: Mike Rapoport
Cc: Minchan Kim
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: [5.9+]
Signed-off-by: Andrew Morton
---

 mm/memory.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/mm/memory.c~mm-userfaultfd-fix-memory-corruption-due-to-writeprotect
+++ a/mm/memory.c
@@ -3103,6 +3103,14 @@ static vm_fault_t do_wp_page(struct vm_f
 		return handle_userfault(vmf, VM_UFFD_WP);
 	}
 
+	/*
+	 * Userfaultfd write-protect can defer flushes. Ensure the TLB
+	 * is flushed in this case before copying.
+	 */
+	if (unlikely(userfaultfd_wp(vmf->vma) &&
+		     mm_tlb_flush_pending(vmf->vma->vm_mm)))
+		flush_tlb_page(vmf->vma, vmf->address);
+
 	vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
 	if (!vmf->page) {
 		/*