From patchwork Thu Apr 1 18:17:38 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 414277
Date: Thu, 1 Apr 2021 11:17:38 -0700
In-Reply-To: <20210401181741.168763-1-surenb@google.com>
Message-Id: <20210401181741.168763-3-surenb@google.com>
References: <20210401181741.168763-1-surenb@google.com>
X-Mailer: git-send-email 2.31.0.291.g576ba9dcdaf-goog
Subject: [PATCH 2/5] mm: do_wp_page() simplification
From: Suren Baghdasaryan
To: stable@vger.kernel.org
Cc: gregkh@linuxfoundation.org, jannh@google.com, ktkhai@virtuozzo.com,
 torvalds@linux-foundation.org, shli@fb.com, namit@vmware.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 kernel-team@android.com, Peter Xu
Precedence: bulk
X-Mailing-List: stable@vger.kernel.org

From: Linus Torvalds

How about we just make sure we're the only possible valid user of the
page before we bother to reuse it?

Simplify, simplify, simplify.

And get rid of the nasty serialization on the page lock at the same time.

[peterx: add subject prefix]
Signed-off-by: Linus Torvalds
Signed-off-by: Peter Xu
Signed-off-by: Linus Torvalds
---
 mm/memory.c | 58 ++++++++++++++++--------------------------------------
 1 file changed, 17 insertions(+), 41 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 6920bfb3f89c..e84648d81d6d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2832,49 +2832,25 @@ static int do_wp_page(struct vm_fault *vmf)
 	 * not dirty accountable.
 	 */
 	if (PageAnon(vmf->page)) {
-		int total_map_swapcount;
-		if (PageKsm(vmf->page) && (PageSwapCache(vmf->page) ||
-					   page_count(vmf->page) != 1))
+		struct page *page = vmf->page;
+
+		/* PageKsm() doesn't necessarily raise the page refcount */
+		if (PageKsm(page) || page_count(page) != 1)
+			goto copy;
+		if (!trylock_page(page))
+			goto copy;
+		if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1) {
+			unlock_page(page);
 			goto copy;
-		if (!trylock_page(vmf->page)) {
-			get_page(vmf->page);
-			pte_unmap_unlock(vmf->pte, vmf->ptl);
-			lock_page(vmf->page);
-			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
-					vmf->address, &vmf->ptl);
-			if (!pte_same(*vmf->pte, vmf->orig_pte)) {
-				unlock_page(vmf->page);
-				pte_unmap_unlock(vmf->pte, vmf->ptl);
-				put_page(vmf->page);
-				return 0;
-			}
-			put_page(vmf->page);
-		}
-		if (PageKsm(vmf->page)) {
-			bool reused = reuse_ksm_page(vmf->page, vmf->vma,
-						     vmf->address);
-			unlock_page(vmf->page);
-			if (!reused)
-				goto copy;
-			wp_page_reuse(vmf);
-			return VM_FAULT_WRITE;
-		}
-		if (reuse_swap_page(vmf->page, &total_map_swapcount)) {
-			if (total_map_swapcount == 1) {
-				/*
-				 * The page is all ours. Move it to
-				 * our anon_vma so the rmap code will
-				 * not search our parent or siblings.
-				 * Protected against the rmap code by
-				 * the page lock.
-				 */
-				page_move_anon_rmap(vmf->page, vma);
-			}
-			unlock_page(vmf->page);
-			wp_page_reuse(vmf);
-			return VM_FAULT_WRITE;
 		}
-		unlock_page(vmf->page);
+		/*
+		 * Ok, we've got the only map reference, and the only
+		 * page count reference, and the page is locked,
+		 * it's dark out, and we're wearing sunglasses. Hit it.
+		 */
+		wp_page_reuse(vmf);
+		unlock_page(page);
+		return VM_FAULT_WRITE;
 	} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 					(VM_WRITE|VM_SHARED))) {
 		return wp_page_shared(vmf);

From patchwork Thu Apr 1 18:17:39 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 414276
Date: Thu, 1 Apr 2021 11:17:39 -0700
In-Reply-To: <20210401181741.168763-1-surenb@google.com>
Message-Id: <20210401181741.168763-4-surenb@google.com>
References: <20210401181741.168763-1-surenb@google.com>
X-Mailer: git-send-email 2.31.0.291.g576ba9dcdaf-goog
Subject: [PATCH 3/5] mm: fix misplaced unlock_page in do_wp_page()
From: Suren Baghdasaryan
To: stable@vger.kernel.org
Cc: gregkh@linuxfoundation.org, jannh@google.com, ktkhai@virtuozzo.com,
 torvalds@linux-foundation.org, shli@fb.com, namit@vmware.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 kernel-team@android.com, Qian Cai, Alex Shi, Gerald Schaefer
Precedence: bulk
X-Mailing-List: stable@vger.kernel.org

From: Linus Torvalds

Commit 09854ba94c6a ("mm: do_wp_page() simplification") reorganized all
the code around the page re-use vs copy, but in the process also moved
the final unlock_page() around to after the wp_page_reuse() call.

That normally doesn't matter - but it means that the unlock_page() is
now done after releasing the page table lock. Again, not a big deal,
you'd think.

But it turns out that it's very wrong indeed, because once we've
released the page table lock, we've basically lost our only reference to
the page - the page tables - and it could now be free'd at any time. We
do hold the mmap_sem, so no actual unmap() can happen, but madvise can
come in and a MADV_DONTNEED will zap the page range - and free the page.

So now the page may be free'd just as we're unlocking it, which in turn
will usually trigger a "Bad page state" error in the freeing path. To
make matters more confusing, by the time the debug code prints out the
page state, the unlock has typically completed and everything looks fine
again.

This all doesn't happen in any normal situations, but it does trigger
with the dirtyc0w_child LTP test. And it seems to trigger much more
easily (but not exclusively) on s390 than elsewhere, probably because
s390 doesn't do the "batch pages up for freeing after the TLB flush"
that gives the unlock_page() more time to complete and makes the race
harder to hit.

Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Link: https://lore.kernel.org/lkml/a46e9bbef2ed4e17778f5615e818526ef848d791.camel@redhat.com/
Link: https://lore.kernel.org/linux-mm/c41149a8-211e-390b-af1d-d5eee690fecb@linux.alibaba.com/
Reported-by: Qian Cai
Reported-by: Alex Shi
Bisected-and-analyzed-by: Gerald Schaefer
Tested-by: Gerald Schaefer
Signed-off-by: Linus Torvalds
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index e84648d81d6d..14470ceaf3f2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2848,8 +2848,8 @@ static int do_wp_page(struct vm_fault *vmf)
 		 * page count reference, and the page is locked,
 		 * it's dark out, and we're wearing sunglasses. Hit it.
 		 */
-		wp_page_reuse(vmf);
 		unlock_page(page);
+		wp_page_reuse(vmf);
 		return VM_FAULT_WRITE;
 	} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 					(VM_WRITE|VM_SHARED))) {

From patchwork Thu Apr 1 18:17:40 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 414275
Date: Thu, 1 Apr 2021 11:17:40 -0700
In-Reply-To: <20210401181741.168763-1-surenb@google.com>
Message-Id: <20210401181741.168763-5-surenb@google.com>
References: <20210401181741.168763-1-surenb@google.com>
X-Mailer: git-send-email 2.31.0.291.g576ba9dcdaf-goog
Subject: [PATCH 4/5] userfaultfd: wp: add helper for writeprotect check
From: Suren Baghdasaryan
To: stable@vger.kernel.org
Cc: gregkh@linuxfoundation.org, jannh@google.com, ktkhai@virtuozzo.com,
 torvalds@linux-foundation.org, shli@fb.com, namit@vmware.com,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 kernel-team@android.com, Andrea Arcangeli, Peter Xu, Andrew Morton,
 Jerome Glisse, Mike Rapoport, Rik van Riel, "Kirill A . Shutemov",
 Mel Gorman, Hugh Dickins, Johannes Weiner, Bobby Powers, Brian Geffon,
 David Hildenbrand, Denis Plotnikov, "Dr . David Alan Gilbert",
 Martin Cracauer, Marty McFadden, Maya Gokhale, Mike Kravetz,
 Pavel Emelyanov
Precedence: bulk
X-Mailing-List: stable@vger.kernel.org

From: Shaohua Li

Patch series "userfaultfd: write protection support", v6.

Overview
========

The uffd-wp work was initiated by Shaohua Li [1] and later continued by
Andrea [2]. This series is based upon Andrea's latest userfaultfd tree
and continues the work of both Shaohua and Andrea. Many of the follow-up
ideas come from Andrea too.

Besides the old MISSING register mode of userfaultfd, the new uffd-wp
support provides an alternative register mode called
UFFDIO_REGISTER_MODE_WP that can be used to listen for write protection
page faults. The two modes can also be registered together, to listen
for both missing and write protection page faults. At the same time, the
new feature also provides a new userfaultfd ioctl called
UFFDIO_WRITEPROTECT which allows userspace to write protect a range of
memory or to fix up the write permission of faulted pages.

Please refer to the document patch "userfaultfd: wp:
UFFDIO_REGISTER_MODE_WP documentation update" for more information on
the new interface and what it can do.

The major workflow of an uffd-wp program should be:

  1. Register a memory region with WP mode using UFFDIO_REGISTER_MODE_WP

  2. Write protect part of the whole registered region using
     UFFDIO_WRITEPROTECT, passing in UFFDIO_WRITEPROTECT_MODE_WP to
     show that we want to write protect the range.

  3. Start a working thread that modifies the protected pages,
     meanwhile listening to UFFD messages.

  4. When a write is detected upon the protected range, a page fault
     happens, and a UFFD message is generated and reported to the
     page fault handling thread.

  5. The page fault handler thread resolves the page fault using the
     new UFFDIO_WRITEPROTECT ioctl, but this time passing in
     !UFFDIO_WRITEPROTECT_MODE_WP instead, to show that we want to
     recover the write permission. Before this operation, the fault
     handler thread can do anything it wants, e.g., dump the page to
     persistent storage.

  6. The worker thread will continue running with the correctly
     applied write permission from step 5.

Currently there are already two projects that are based on this new
userfaultfd feature.

QEMU Live Snapshot: The project provides a way to allow the QEMU
                    hypervisor to take snapshots of VMs without
                    stopping the VM [3].

LLNL umap library:  The project provides a mmap-like interface and
                    "allow to have an application specific buffer of
                    pages cached from a large file, i.e. out-of-core
                    execution using memory map" [4][5].

Before posting the patchset, this series was smoke tested against QEMU
live snapshot and the LLNL umap library (by doing parallel quicksort
using 128 sorting threads + 80 uffd servicing threads). My sincere
thanks to Marty McFadden and Denis Plotnikov for the help along the way.

TODO
====

- hugetlbfs/shmem support
- performance
- more architectures
- cooperate with mprotect()-allowed processes (???)
- ...

References
==========

[1] https://lwn.net/Articles/666187/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/log/?h=userfault
[3] https://github.com/denis-plotnikov/qemu/commits/background-snapshot-kvm
[4] https://github.com/LLNL/umap
[5] https://llnl-umap.readthedocs.io/en/develop/
[6] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?h=userfault&id=b245ecf6cf59156966f3da6e6b674f6695a5ffa5
[7] https://lkml.org/lkml/2018/11/21/370
[8] https://lkml.org/lkml/2018/12/30/64

This patch (of 19):

Add helper for writeprotect check. Will use it later.

Signed-off-by: Shaohua Li
Signed-off-by: Andrea Arcangeli
Signed-off-by: Peter Xu
Signed-off-by: Andrew Morton
Reviewed-by: Jerome Glisse
Reviewed-by: Mike Rapoport
Cc: Rik van Riel
Cc: Kirill A. Shutemov
Cc: Mel Gorman
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Bobby Powers
Cc: Brian Geffon
Cc: David Hildenbrand
Cc: Denis Plotnikov
Cc: "Dr . David Alan Gilbert"
Cc: Martin Cracauer
Cc: Marty McFadden
Cc: Maya Gokhale
Cc: Mike Kravetz
Cc: Pavel Emelyanov
Link: http://lkml.kernel.org/r/20200220163112.11409-2-peterx@redhat.com
Signed-off-by: Linus Torvalds
---
 include/linux/userfaultfd_k.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index f2f3b68ba910..07878cd475f2 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -48,6 +48,11 @@ static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_UFFD_MISSING;
 }
 
+static inline bool userfaultfd_wp(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & VM_UFFD_WP;
+}
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
 	return vma->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP);
@@ -91,6 +96,11 @@ static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 	return false;
 }
 
+static inline bool userfaultfd_wp(struct vm_area_struct *vma)
+{
+	return false;
+}
+
 static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 {
 	return false;
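
For readers following the series, here is a small userspace sketch of how
the new helper sits next to the two existing ones. It is not kernel code
and is not part of the patch: struct vm_area_struct is reduced to a single
field, and the VM_UFFD_* values are illustrative stand-ins picked for this
example, so treat both as assumptions. Only the three helper bodies mirror
the first hunk above (the variant built with userfaultfd support).

```c
#include <stdbool.h>

/* Illustrative stand-in flag values (assumption, not taken from the patch). */
#define VM_UFFD_MISSING	0x00000200UL
#define VM_UFFD_WP	0x00001000UL

/* Reduced stand-in for the kernel structure: only vm_flags is modelled. */
struct vm_area_struct {
	unsigned long vm_flags;
};

/* True when the VMA is registered for missing-page faults. */
static inline bool userfaultfd_missing(struct vm_area_struct *vma)
{
	return vma->vm_flags & VM_UFFD_MISSING;
}

/*
 * True when the VMA is registered for write-protect faults.
 * The masked value is converted to bool, so any non-zero bit yields true.
 */
static inline bool userfaultfd_wp(struct vm_area_struct *vma)
{
	return vma->vm_flags & VM_UFFD_WP;
}

/* True when either registration mode is set on the VMA. */
static inline bool userfaultfd_armed(struct vm_area_struct *vma)
{
	return vma->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP);
}
```

In this model a VMA with only the WP flag set reports true from
userfaultfd_wp() and userfaultfd_armed() but false from
userfaultfd_missing(), which is the distinction later patches in the
upstream series would presumably rely on.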