From patchwork Thu Apr 1 18:21:22 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 414274
Date: Thu, 1 Apr 2021 11:21:22 -0700
In-Reply-To: <20210401182125.171484-1-surenb@google.com>
Message-Id: <20210401182125.171484-3-surenb@google.com>
References: <20210401182125.171484-1-surenb@google.com>
X-Mailer: git-send-email 2.31.0.291.g576ba9dcdaf-goog
Subject: [PATCH 2/5] mm: do_wp_page() simplification
From: Suren Baghdasaryan
To: stable@vger.kernel.org
Cc: gregkh@linuxfoundation.org, jannh@google.com, ktkhai@virtuozzo.com,
    torvalds@linux-foundation.org, shli@fb.com, namit@vmware.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    kernel-team@android.com, surenb@google.com, Peter Xu
X-Mailing-List:
 stable@vger.kernel.org

From: Linus Torvalds

How about we just make sure we're the only possible valid user of the page
before we bother to reuse it?

Simplify, simplify, simplify.

And get rid of the nasty serialization on the page lock at the same time.

[peterx: add subject prefix]

Signed-off-by: Linus Torvalds
Signed-off-by: Peter Xu
Signed-off-by: Linus Torvalds
---
 mm/memory.c | 58 ++++++++++++++++-------------------------------------
 1 file changed, 17 insertions(+), 41 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 3874acce1472..d95a4573a273 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2847,49 +2847,25 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 	 * not dirty accountable.
 	 */
 	if (PageAnon(vmf->page)) {
-		int total_map_swapcount;
-		if (PageKsm(vmf->page) && (PageSwapCache(vmf->page) ||
-					   page_count(vmf->page) != 1))
+		struct page *page = vmf->page;
+
+		/* PageKsm() doesn't necessarily raise the page refcount */
+		if (PageKsm(page) || page_count(page) != 1)
+			goto copy;
+		if (!trylock_page(page))
+			goto copy;
+		if (PageKsm(page) || page_mapcount(page) != 1 ||
+		    page_count(page) != 1) {
+			unlock_page(page);
 			goto copy;
-		if (!trylock_page(vmf->page)) {
-			get_page(vmf->page);
-			pte_unmap_unlock(vmf->pte, vmf->ptl);
-			lock_page(vmf->page);
-			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
-					vmf->address, &vmf->ptl);
-			if (!pte_same(*vmf->pte, vmf->orig_pte)) {
-				unlock_page(vmf->page);
-				pte_unmap_unlock(vmf->pte, vmf->ptl);
-				put_page(vmf->page);
-				return 0;
-			}
-			put_page(vmf->page);
-		}
-		if (PageKsm(vmf->page)) {
-			bool reused = reuse_ksm_page(vmf->page, vmf->vma,
-						     vmf->address);
-			unlock_page(vmf->page);
-			if (!reused)
-				goto copy;
-			wp_page_reuse(vmf);
-			return VM_FAULT_WRITE;
-		}
-		if (reuse_swap_page(vmf->page, &total_map_swapcount)) {
-			if (total_map_swapcount == 1) {
-				/*
-				 * The page is all ours. Move it to
-				 * our anon_vma so the rmap code will
-				 * not search our parent or siblings.
-				 * Protected against the rmap code by
-				 * the page lock.
-				 */
-				page_move_anon_rmap(vmf->page, vma);
-			}
-			unlock_page(vmf->page);
-			wp_page_reuse(vmf);
-			return VM_FAULT_WRITE;
 		}
-		unlock_page(vmf->page);
+		/*
+		 * Ok, we've got the only map reference, and the only
+		 * page count reference, and the page is locked,
+		 * it's dark out, and we're wearing sunglasses. Hit it.
+		 */
+		wp_page_reuse(vmf);
+		unlock_page(page);
+		return VM_FAULT_WRITE;
 	} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 					(VM_WRITE|VM_SHARED))) {
 		return wp_page_shared(vmf);

From patchwork Thu Apr 1 18:21:24 2021
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 414272
Date: Thu, 1 Apr 2021 11:21:24 -0700
In-Reply-To:
 <20210401182125.171484-1-surenb@google.com>
Message-Id: <20210401182125.171484-5-surenb@google.com>
References: <20210401182125.171484-1-surenb@google.com>
Subject: [PATCH 4/5] userfaultfd: wp: add helper for writeprotect check
From: Suren Baghdasaryan
To: stable@vger.kernel.org
Cc: gregkh@linuxfoundation.org, jannh@google.com, ktkhai@virtuozzo.com,
    torvalds@linux-foundation.org, shli@fb.com, namit@vmware.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    kernel-team@android.com, surenb@google.com, Andrea Arcangeli,
    Peter Xu, Andrew Morton, Jerome Glisse, Mike Rapoport,
    Rik van Riel, "Kirill A . Shutemov", Mel Gorman, Hugh Dickins,
    Johannes Weiner, Bobby Powers, Brian Geffon, David Hildenbrand,
    Denis Plotnikov, "Dr . David Alan Gilbert", Martin Cracauer,
    Marty McFadden, Maya Gokhale, Mike Kravetz, Pavel Emelyanov

From: Shaohua Li

Patch series "userfaultfd: write protection support", v6.

Overview
========

The uffd-wp work was initiated by Shaohua Li [1], and later continued by
Andrea [2]. This series is based upon Andrea's latest userfaultfd tree,
and it is a continuation of the work from both Shaohua and Andrea. Many of
the follow-up ideas come from Andrea too.

Besides the old MISSING register mode of userfaultfd, the new uffd-wp
support provides an alternative register mode called
UFFDIO_REGISTER_MODE_WP that can be used to listen not only to missing
page faults but also to write-protection page faults, or the two modes can
even be registered together. At the same time, the new feature also
provides a new userfaultfd ioctl called UFFDIO_WRITEPROTECT which allows
userspace to write-protect a range of memory or fix up the write
permission of faulted pages.

Please refer to the document patch "userfaultfd: wp:
UFFDIO_REGISTER_MODE_WP documentation update" for more information on the
new interface and what it can do.
The major workflow of an uffd-wp program should be:

1. Register a memory region with WP mode using UFFDIO_REGISTER_MODE_WP.

2. Write-protect part of the registered region using UFFDIO_WRITEPROTECT,
   passing in UFFDIO_WRITEPROTECT_MODE_WP to show that we want to
   write-protect the range.

3. Start a working thread that modifies the protected pages, while
   listening for UFFD messages.

4. When a write is detected on the protected range, a page fault happens;
   a UFFD message will be generated and reported to the page-fault
   handling thread.

5. The page-fault handler thread resolves the page fault using the new
   UFFDIO_WRITEPROTECT ioctl, this time passing in
   !UFFDIO_WRITEPROTECT_MODE_WP to show that we want to restore the write
   permission. Before this operation, the fault handler thread can do
   anything it wants, e.g., dump the page to persistent storage.

6. The worker thread continues running with the write permission correctly
   applied in step 5.

Currently there are already two projects that are based on this new
userfaultfd feature.

QEMU Live Snapshot: The project provides a way to allow the QEMU
hypervisor to take snapshots of VMs without stopping the VM [3].

LLNL umap library: The project provides a mmap-like interface and "allow
to have an application specific buffer of pages cached from a large file,
i.e. out-of-core execution using memory map" [4][5].

Before posting the patchset, this series was smoke-tested against the QEMU
live snapshot and the LLNL umap library (by doing a parallel quicksort
using 128 sorting threads + 80 uffd servicing threads). My sincere thanks
to Marty McFadden and Denis Plotnikov for the help along the way.

TODO
====

- hugetlbfs/shmem support
- performance
- more architectures
- cooperate with mprotect()-allowed processes (???)
- ...
References
==========

[1] https://lwn.net/Articles/666187/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/log/?h=userfault
[3] https://github.com/denis-plotnikov/qemu/commits/background-snapshot-kvm
[4] https://github.com/LLNL/umap
[5] https://llnl-umap.readthedocs.io/en/develop/
[6] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?h=userfault&id=b245ecf6cf59156966f3da6e6b674f6695a5ffa5
[7] https://lkml.org/lkml/2018/11/21/370
[8] https://lkml.org/lkml/2018/12/30/64

This patch (of 19): Add helper for writeprotect check. Will use it later.

Signed-off-by: Shaohua Li
Signed-off-by: Andrea Arcangeli
Signed-off-by: Peter Xu
Signed-off-by: Andrew Morton
Reviewed-by: Jerome Glisse
Reviewed-by: Mike Rapoport
Cc: Rik van Riel
Cc: Kirill A. Shutemov
Cc: Mel Gorman
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Bobby Powers
Cc: Brian Geffon
Cc: David Hildenbrand
Cc: Denis Plotnikov
Cc: "Dr. David Alan Gilbert"
Cc: Martin Cracauer
Cc: Marty McFadden
Cc: Maya Gokhale
Cc: Mike Kravetz
Cc: Pavel Emelyanov
Link: http://lkml.kernel.org/r/20200220163112.11409-2-peterx@redhat.com
Signed-off-by: Linus Torvalds
---
 include/linux/userfaultfd_k.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 37c9eba75c98..38f748e7186e 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -50,6 +50,11 @@ static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_UFFD_MISSING;
 }
 
+static inline bool userfaultfd_wp(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & VM_UFFD_WP;
+}
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
 	return vma->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP);
@@ -94,6 +99,11 @@ static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 	return false;
 }
 
+static inline bool userfaultfd_wp(struct vm_area_struct *vma)
+{
+	return false;
+}
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
 	return false;

From patchwork Thu Apr 1 18:21:25 2021
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 414273
Date: Thu, 1 Apr 2021 11:21:25 -0700
In-Reply-To: <20210401182125.171484-1-surenb@google.com>
Message-Id: <20210401182125.171484-6-surenb@google.com>
References: <20210401182125.171484-1-surenb@google.com>
Subject: [PATCH 5/5] mm/userfaultfd: fix memory corruption due to writeprotect
From: Suren Baghdasaryan
To: stable@vger.kernel.org
Cc: gregkh@linuxfoundation.org, jannh@google.com, ktkhai@virtuozzo.com,
    torvalds@linux-foundation.org, shli@fb.com, namit@vmware.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    kernel-team@android.com, surenb@google.com, Yu Zhao, Peter Xu,
    Andrea Arcangeli, Andy Lutomirski, Pavel Emelyanov, Mike Kravetz,
    Mike Rapoport, Minchan Kim, Will Deacon, Peter Zijlstra,
    Andrew Morton

From: Nadav Amit

Userfaultfd self-test fails occasionally, indicating a memory corruption.

Analyzing this problem indicates that there is a real bug since mmap_lock
is only taken for read in mwriteprotect_range() and defers flushes, and
since there is insufficient consideration of concurrent deferred TLB
flushes in wp_page_copy(). Although the PTE is flushed from the TLBs in
wp_page_copy(), this flush takes place after the copy has already been
performed, and therefore changes of the page are possible between the time
of the copy and the time in which the PTE is flushed.

To make matters worse, memory-unprotection using userfaultfd also poses a
problem. Although memory unprotection is logically a promotion of PTE
permissions, and therefore should not require a TLB flush, the current
userfaultfd code might actually cause a demotion of the architectural PTE
permission: when userfaultfd_writeprotect() unprotects a memory region, it
unintentionally *clears* the RW-bit if it was already set. Note that
unprotecting a PTE that is not write-protected is a valid use-case: the
userfaultfd monitor might ask to unprotect a region that holds both
write-protected and write-unprotected PTEs.

The scenario that happens in selftests/vm/userfaultfd is as follows:

	cpu0				cpu1			cpu2
	----				----			----
							[ Writable PTE
							  cached in TLB ]
	userfaultfd_writeprotect()
	[ write-*unprotect* ]
	mwriteprotect_range()
	mmap_read_lock()
	change_protection()

	change_protection_range()
	...
	change_pte_range()
	[ *clear* "write"-bit ]
	[ defer TLB flushes ]
				[ page-fault ]
				...
				wp_page_copy()
				 cow_user_page()
				  [ copy page ]
							[ write to old
							  page ]
				...
				 set_pte_at_notify()

A similar scenario can happen:

	cpu0		cpu1		cpu2		cpu3
	----		----		----		----
						[ Writable PTE
						  cached in TLB ]
	userfaultfd_writeprotect()
	[ write-protect ]
	[ deferred TLB flush ]
			userfaultfd_writeprotect()
			[ write-unprotect ]
			[ deferred TLB flush ]
					[ page-fault ]
					wp_page_copy()
					 cow_user_page()
					  [ copy page ]
					...	[ write to page ]
					set_pte_at_notify()

This race exists since commit 292924b26024 ("userfaultfd: wp: apply
_PAGE_UFFD_WP bit"). Yet, as Yu Zhao pointed out, these races became
apparent since commit 09854ba94c6a ("mm: do_wp_page() simplification")
which made wp_page_copy() more likely to take place, specifically if
page_count(page) > 1.

To resolve the aforementioned races, check whether there are pending
flushes on uffd-write-protected VMAs, and if there are, perform a flush
before doing the COW. Further optimizations will follow to avoid
unnecessary PTE write-protection and TLB flushes during
uffd-write-unprotect.

Link: https://lkml.kernel.org/r/20210304095423.3825684-1-namit@vmware.com
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Signed-off-by: Nadav Amit
Suggested-by: Yu Zhao
Reviewed-by: Peter Xu
Tested-by: Peter Xu
Cc: Andrea Arcangeli
Cc: Andy Lutomirski
Cc: Pavel Emelyanov
Cc: Mike Kravetz
Cc: Mike Rapoport
Cc: Minchan Kim
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: [5.9+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 mm/memory.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index 656d90a75cf8..fe6e92de9bec 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2825,6 +2825,14 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 
+	/*
+	 * Userfaultfd write-protect can defer flushes. Ensure the TLB
+	 * is flushed in this case before copying.
+	 */
+	if (unlikely(userfaultfd_wp(vmf->vma) &&
+		     mm_tlb_flush_pending(vmf->vma->vm_mm)))
+		flush_tlb_page(vmf->vma, vmf->address);
+
 	vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
 	if (!vmf->page) {
 		/*