From patchwork Fri Aug 20 02:04:21 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 501393
Date: Thu, 19 Aug 2021 19:04:21 -0700
From: Andrew Morton
To: akpm@linux-foundation.org, chris@chrisdown.name, guro@fb.com,
    hannes@cmpxchg.org, linux-mm@kvack.org, lnyng@fb.com, mhocko@suse.com,
    mm-commits@vger.kernel.org, riel@surriel.com, shakeelb@google.com,
    stable@vger.kernel.org, torvalds@linux-foundation.org
Subject: [patch 06/10] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
Message-ID: <20210820020421.yuaeOQMtB%akpm@linux-foundation.org>
In-Reply-To: <20210819190327.14fc4e97102e1af7929e30af@linux-foundation.org>

From: Johannes Weiner
Subject: mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim

We've noticed occasional OOM killing when memory.low settings are in
effect for cgroups.  This is unexpected and undesirable as memory.low is
supposed to express non-OOMing memory priorities between cgroups.

The reason for this is proportional memory.low reclaim.  When cgroups are
below their memory.low threshold, reclaim passes them over in the first
round, and then retries if it couldn't find pages anywhere else.  But when
cgroups are slightly above their memory.low setting, page scan force is
scaled down and diminished in proportion to the overage, to the point
where it can cause reclaim to fail as well - only in that case we
currently don't retry, and instead trigger OOM.

To fix this, hook proportional reclaim into the same retry logic we have
in place for when cgroups are skipped entirely.  This way if reclaim
fails and some cgroups were scanned with diminished pressure, we'll try
another full-force cycle before giving up and OOMing.
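To make the interaction concrete, here is a minimal, self-contained userspace
sketch of the decision this patch adds -- it is not kernel code.  The names
(struct memcg_model, struct sc_model, pick_protection) are made up for
illustration; only the branch logic mirrors the hunk added to mm/vmscan.c in
the diff below: use memory.low as the scaling protection on the first pass and
record memcg_low_skipped, so that a fruitless pass can be retried with
memcg_low_reclaim set and only memory.min enforced.

/* Illustrative userspace model of the memory.low retry logic; not kernel code. */
#include <stdbool.h>
#include <stdio.h>

struct memcg_model {
	unsigned long emin;	/* effective memory.min */
	unsigned long elow;	/* effective memory.low */
};

struct sc_model {
	bool memcg_low_reclaim;	/* retry pass: allowed to reclaim protected memory */
	bool memcg_low_skipped;	/* some cgroup was scanned at reduced force */
};

/* Mirrors the added hunk: choose which protection to scale pressure by. */
static unsigned long pick_protection(struct sc_model *sc, const struct memcg_model *m)
{
	if (!sc->memcg_low_reclaim && m->elow > m->emin) {
		/* memory.low scaling; remember it so we retry before OOM */
		sc->memcg_low_skipped = true;
		return m->elow;
	}
	return m->emin;
}

int main(void)
{
	struct memcg_model m = { .emin = 0, .elow = 100 };
	struct sc_model sc = { 0 };
	bool reclaimed = false;	/* assume the diminished-pressure pass found nothing */

	/* First pass: pressure is scaled down by memory.low. */
	printf("pass 1 protection: %lu\n", pick_protection(&sc, &m));

	/* Instead of OOMing, retry once at full force if a cgroup was skipped. */
	if (!reclaimed && sc.memcg_low_skipped) {
		sc.memcg_low_reclaim = true;
		sc.memcg_low_skipped = false;
		printf("pass 2 protection: %lu\n", pick_protection(&sc, &m));
	}
	return 0;
}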
[akpm@linux-foundation.org: coding-style fixes]
Link: https://lkml.kernel.org/r/20210817180506.220056-1-hannes@cmpxchg.org
Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")
Signed-off-by: Johannes Weiner
Reported-by: Leon Yang
Reviewed-by: Rik van Riel
Reviewed-by: Shakeel Butt
Acked-by: Roman Gushchin
Acked-by: Chris Down
Acked-by: Michal Hocko
Cc: [5.4+]
Signed-off-by: Andrew Morton
---

 include/linux/memcontrol.h |   29 +++++++++++++++--------------
 mm/vmscan.c                |   27 +++++++++++++++++++--------
 2 files changed, 34 insertions(+), 22 deletions(-)

--- a/include/linux/memcontrol.h~mm-memcontrol-fix-occasional-ooms-due-to-proportional-memorylow-reclaim
+++ a/include/linux/memcontrol.h
@@ -612,12 +612,15 @@ static inline bool mem_cgroup_disabled(v
 	return !cgroup_subsys_enabled(memory_cgrp_subsys);
 }
 
-static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
-						  struct mem_cgroup *memcg,
-						  bool in_low_reclaim)
+static inline void mem_cgroup_protection(struct mem_cgroup *root,
+					 struct mem_cgroup *memcg,
+					 unsigned long *min,
+					 unsigned long *low)
 {
+	*min = *low = 0;
+
 	if (mem_cgroup_disabled())
-		return 0;
+		return;
 
 	/*
 	 * There is no reclaim protection applied to a targeted reclaim.
@@ -653,13 +656,10 @@ static inline unsigned long mem_cgroup_p
 	 *
 	 */
 	if (root == memcg)
-		return 0;
-
-	if (in_low_reclaim)
-		return READ_ONCE(memcg->memory.emin);
+		return;
 
-	return max(READ_ONCE(memcg->memory.emin),
-		   READ_ONCE(memcg->memory.elow));
+	*min = READ_ONCE(memcg->memory.emin);
+	*low = READ_ONCE(memcg->memory.elow);
 }
 
 void mem_cgroup_calculate_protection(struct mem_cgroup *root,
@@ -1147,11 +1147,12 @@ static inline void memcg_memory_event_mm
 {
 }
 
-static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
-						  struct mem_cgroup *memcg,
-						  bool in_low_reclaim)
+static inline void mem_cgroup_protection(struct mem_cgroup *root,
+					 struct mem_cgroup *memcg,
+					 unsigned long *min,
+					 unsigned long *low)
 {
-	return 0;
+	*min = *low = 0;
 }
 
 static inline void mem_cgroup_calculate_protection(struct mem_cgroup *root,
--- a/mm/vmscan.c~mm-memcontrol-fix-occasional-ooms-due-to-proportional-memorylow-reclaim
+++ a/mm/vmscan.c
@@ -100,9 +100,12 @@ struct scan_control {
 	unsigned int may_swap:1;
 
 	/*
-	 * Cgroups are not reclaimed below their configured memory.low,
-	 * unless we threaten to OOM. If any cgroups are skipped due to
-	 * memory.low and nothing was reclaimed, go back for memory.low.
+	 * Cgroup memory below memory.low is protected as long as we
+	 * don't threaten to OOM. If any cgroup is reclaimed at
+	 * reduced force or passed over entirely due to its memory.low
+	 * setting (memcg_low_skipped), and nothing is reclaimed as a
+	 * result, then go back for one more cycle that reclaims the protected
+	 * memory (memcg_low_reclaim) to avert OOM.
	 */
 	unsigned int memcg_low_reclaim:1;
 	unsigned int memcg_low_skipped:1;
@@ -2537,15 +2540,14 @@ out:
 	for_each_evictable_lru(lru) {
 		int file = is_file_lru(lru);
 		unsigned long lruvec_size;
+		unsigned long low, min;
 		unsigned long scan;
-		unsigned long protection;
 
 		lruvec_size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
-		protection = mem_cgroup_protection(sc->target_mem_cgroup,
-						   memcg,
-						   sc->memcg_low_reclaim);
+		mem_cgroup_protection(sc->target_mem_cgroup, memcg,
+				      &min, &low);
 
-		if (protection) {
+		if (min || low) {
 			/*
 			 * Scale a cgroup's reclaim pressure by proportioning
 			 * its current usage to its memory.low or memory.min
@@ -2576,6 +2578,15 @@ out:
			 * hard protection.
			 */
			unsigned long cgroup_size = mem_cgroup_size(memcg);
+			unsigned long protection;
+
+			/* memory.low scaling, make sure we retry before OOM */
+			if (!sc->memcg_low_reclaim && low > min) {
+				protection = low;
+				sc->memcg_low_skipped = 1;
+			} else {
+				protection = min;
+			}
 
 			/* Avoid TOCTOU with earlier protection check */
 			cgroup_size = max(cgroup_size, protection);

From patchwork Fri Aug 20 02:04:24 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 501008
Date: Thu, 19 Aug 2021 19:04:24 -0700
From: Andrew Morton
To: akpm@linux-foundation.org, linux-mm@kvack.org, mhocko@suse.com,
    mike.kravetz@oracle.com, mm-commits@vger.kernel.org,
    naoya.horiguchi@nec.com, osalvador@suse.de, shy828301@gmail.com,
    songmuchun@bytedance.com, stable@vger.kernel.org, tony.luck@intel.com,
    torvalds@linux-foundation.org
Subject: [patch 07/10] mm/hwpoison: retry with shake_page() for unhandlable pages
Message-ID: <20210820020424.MPo-2MFQJ%akpm@linux-foundation.org>
In-Reply-To: <20210819190327.14fc4e97102e1af7929e30af@linux-foundation.org>

From: Naoya Horiguchi
Subject: mm/hwpoison: retry with shake_page() for unhandlable pages

HWPoisonHandlable() sometimes returns false for typical user pages due to
races with average memory events like transfers over LRU lists.  This
causes failures in hwpoison handling.

There is retry code for such a case, but it does not work because the
retry loop reaches the retry limit too quickly, before the page settles
down to a handlable state.  Let get_any_page() call shake_page() to fix
it.
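Conceptually this is a bounded retry loop with a "let the page settle" step
between attempts.  The sketch below is a self-contained userspace model of
that pattern, not the real mm/memory-failure.c code: try_get_page_model() and
settle_page() are invented stand-ins for __get_hwpoison_page() and
shake_page(), and the errno values are reused only for flavor.

/* Userspace sketch of the bounded retry-with-settle pattern; not kernel code. */
#include <errno.h>
#include <stdio.h>

/* Stand-in for __get_hwpoison_page(): stays "unhandlable" for a few calls. */
static int try_get_page_model(int *busy_left)
{
	if (*busy_left > 0) {
		(*busy_left)--;
		return -EBUSY;		/* page not yet in a handlable state */
	}
	return 1;			/* got a reference */
}

/* Stand-in for shake_page(): give the page a chance to settle (drain LRU, etc.). */
static void settle_page(int *busy_left)
{
	if (*busy_left > 0)
		(*busy_left)--;
}

/* Mirrors the retry logic in get_any_page() after this patch. */
static int get_any_page_model(int *busy_left)
{
	int pass = 0, ret;

try_again:
	ret = try_get_page_model(busy_left);
	if (ret == -EBUSY) {
		/* Raced with a (possibly temporary) unhandlable page, retry. */
		if (pass++ < 3) {
			settle_page(busy_left);
			goto try_again;
		}
		ret = -EIO;		/* retry limit reached */
	}
	return ret;
}

int main(void)
{
	int busy_left = 3;	/* settles within the retry budget */

	printf("result: %d\n", get_any_page_model(&busy_left));
	return 0;
}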
[naoya.horiguchi@nec.com: get_any_page(): return -EIO when retry limit reached]
Link: https://lkml.kernel.org/r/20210819001958.2365157-1-naoya.horiguchi@linux.dev
Link: https://lkml.kernel.org/r/20210817053703.2267588-1-naoya.horiguchi@linux.dev
Fixes: 25182f05ffed ("mm,hwpoison: fix race with hugetlb page allocation")
Signed-off-by: Naoya Horiguchi
Reported-by: Tony Luck
Reviewed-by: Yang Shi
Cc: Oscar Salvador
Cc: Muchun Song
Cc: Mike Kravetz
Cc: Michal Hocko
Cc: [5.13+]
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c |   12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

--- a/mm/memory-failure.c~mm-hwpoison-retry-with-shake_page-for-unhandlable-pages
+++ a/mm/memory-failure.c
@@ -1146,7 +1146,7 @@ static int __get_hwpoison_page(struct pa
 	 * unexpected races caused by taking a page refcount.
 	 */
 	if (!HWPoisonHandlable(head))
-		return 0;
+		return -EBUSY;
 
 	if (PageTransHuge(head)) {
 		/*
@@ -1199,9 +1199,15 @@ try_again:
 			}
 			goto out;
 		} else if (ret == -EBUSY) {
-			/* We raced with freeing huge page to buddy, retry. */
-			if (pass++ < 3)
+			/*
+			 * We raced with (possibly temporary) unhandlable
+			 * page, retry.
+			 */
+			if (pass++ < 3) {
+				shake_page(p, 1);
 				goto try_again;
+			}
+			ret = -EIO;
 			goto out;
 		}
 	}

From patchwork Fri Aug 20 02:04:30 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 501392
Date: Thu, 19 Aug 2021 19:04:30 -0700
From: Andrew Morton
To: akpm@linux-foundation.org, dvyukov@google.com, elver@google.com,
    glider@google.com, Kuan-Ying.Lee@mediatek.com, linux-mm@kvack.org,
    mm-commits@vger.kernel.org, stable@vger.kernel.org,
    torvalds@linux-foundation.org
Subject: [patch 09/10] kfence: fix is_kfence_address() for addresses below KFENCE_POOL_SIZE
Message-ID: <20210820020430.mQDxQa8Jr%akpm@linux-foundation.org>
In-Reply-To: <20210819190327.14fc4e97102e1af7929e30af@linux-foundation.org>
From: Marco Elver
Subject: kfence: fix is_kfence_address() for addresses below KFENCE_POOL_SIZE

Originally the addr != NULL check was meant to take care of the case
where __kfence_pool == NULL (KFENCE is disabled).  However, this does not
work for addresses where addr > 0 && addr < KFENCE_POOL_SIZE.  This can
be the case on a NULL-deref where addr > 0 && addr < PAGE_SIZE, or on any
other faulting access with addr < KFENCE_POOL_SIZE.  While the kernel
would likely crash, the stack traces and report might be confusing due to
double faults upon KFENCE's attempt to unprotect such an address.

Fix it by just checking that __kfence_pool != NULL instead.

Link: https://lkml.kernel.org/r/20210818130300.2482437-1-elver@google.com
Fixes: 0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure")
Signed-off-by: Marco Elver
Reported-by: Kuan-Ying Lee
Acked-by: Alexander Potapenko
Cc: Dmitry Vyukov
Cc: [5.12+]
Signed-off-by: Andrew Morton
---

 include/linux/kfence.h |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/include/linux/kfence.h~kfence-fix-is_kfence_address-for-addresses-below-kfence_pool_size
+++ a/include/linux/kfence.h
@@ -51,10 +51,11 @@ extern atomic_t kfence_allocation_gate;
 static __always_inline bool is_kfence_address(const void *addr)
 {
 	/*
-	 * The non-NULL check is required in case the __kfence_pool pointer was
-	 * never initialized; keep it in the slow-path after the range-check.
+	 * The __kfence_pool != NULL check is required to deal with the case
+	 * where __kfence_pool == NULL && addr < KFENCE_POOL_SIZE. Keep it in
+	 * the slow-path after the range-check!
	 */
-	return unlikely((unsigned long)((char *)addr - __kfence_pool) < KFENCE_POOL_SIZE && addr);
+	return unlikely((unsigned long)((char *)addr - __kfence_pool) < KFENCE_POOL_SIZE && __kfence_pool);
 }
 
 /**
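The bug is plain unsigned pointer arithmetic: with __kfence_pool == NULL,
(char *)addr - __kfence_pool is simply addr, so any address below
KFENCE_POOL_SIZE passed the old range check even though no pool exists.  Below
is a small userspace demonstration of the before/after predicate; the pool
size and the kfence_pool_demo global are made up for illustration and only
model the kernel's __kfence_pool.

/* Userspace demo of the is_kfence_address() predicate change; not kernel code. */
#include <stdbool.h>
#include <stdio.h>

#define KFENCE_POOL_SIZE_DEMO (2 * 1024 * 1024UL)	/* made-up size */

static char *kfence_pool_demo;	/* NULL: models __kfence_pool with KFENCE disabled */

/* Old check: "&& addr" only rejects an exactly-NULL address. */
static bool is_kfence_address_old(const void *addr)
{
	return (unsigned long)((const char *)addr - kfence_pool_demo) < KFENCE_POOL_SIZE_DEMO && addr;
}

/* New check: "&& kfence_pool_demo" rejects everything while the pool is unset. */
static bool is_kfence_address_new(const void *addr)
{
	return (unsigned long)((const char *)addr - kfence_pool_demo) < KFENCE_POOL_SIZE_DEMO && kfence_pool_demo;
}

int main(void)
{
	const void *near_null = (const void *)0x10;	/* e.g. offset off a NULL deref */

	/* With the pool unset, the old check wrongly claims the address. */
	printf("old: %d, new: %d\n",
	       is_kfence_address_old(near_null),
	       is_kfence_address_new(near_null));
	return 0;
}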
From patchwork Fri Aug 20 02:04:33 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 501007
Date: Thu, 19 Aug 2021 19:04:33 -0700
From: Andrew Morton
To: akpm@linux-foundation.org, almasrymina@google.com,
    axelrasmussen@google.com, linux-mm@kvack.org, mhocko@suse.com,
    mike.kravetz@oracle.com, mm-commits@vger.kernel.org,
    naoya.horiguchi@linux.dev, peterx@redhat.com, songmuchun@bytedance.com,
    stable@vger.kernel.org, syzbot+67654e51e54455f1c585@syzkaller.appspotmail.com,
    torvalds@linux-foundation.org
Subject: [patch 10/10] hugetlb: don't pass page cache pages to restore_reserve_on_error
Message-ID: <20210820020433.cv6IN-Xde%akpm@linux-foundation.org>
In-Reply-To: <20210819190327.14fc4e97102e1af7929e30af@linux-foundation.org>

From: Mike Kravetz
Subject: hugetlb: don't pass page cache pages to restore_reserve_on_error

syzbot hit kernel BUG at fs/hugetlbfs/inode.c:532 as described in [1].
This BUG triggers if the HPageRestoreReserve flag is set on a page in the
page cache.  It should never be set, as the routine
huge_add_to_page_cache explicitly clears the flag after adding a page to
the cache.

The only code other than huge page allocation which sets the flag is
restore_reserve_on_error.  It will potentially set the flag in rare out
of memory conditions.  syzbot was injecting errors to cause memory
allocation errors which exercised this specific path.

The code in restore_reserve_on_error is doing the right thing.  However,
there are instances where pages in the page cache were being passed to
restore_reserve_on_error.  This is incorrect, as once a page goes into
the cache, reservation information will not be modified for the page
until it is removed from the cache.  Error paths do not remove pages from
the cache, so even in the case of error, the page will remain in the
cache and no reservation adjustment is needed.

Modify routines that potentially call restore_reserve_on_error with a
page cache page to no longer do so.

Note on fixes tag: Prior to commit 846be08578ed ("mm/hugetlb: expand
restore_reserve_on_error functionality") the routine would not process
page cache pages because the HPageRestoreReserve flag is not set on such
pages.  Therefore, this issue could not be triggered.  The code added by
commit 846be08578ed ("mm/hugetlb: expand restore_reserve_on_error
functionality") is needed and correct.  It exposed incorrect calls to
restore_reserve_on_error, which are the root cause addressed by this
commit.

[1] https://lore.kernel.org/linux-mm/00000000000050776d05c9b7c7f0@google.com/

Link: https://lkml.kernel.org/r/20210818213304.37038-1-mike.kravetz@oracle.com
Fixes: 846be08578ed ("mm/hugetlb: expand restore_reserve_on_error functionality")
Signed-off-by: Mike Kravetz
Reported-by: syzbot+67654e51e54455f1c585@syzkaller.appspotmail.com
Cc: Mina Almasry
Cc: Axel Rasmussen
Cc: Peter Xu
Cc: Muchun Song
Cc: Michal Hocko
Cc: Naoya Horiguchi
Cc:
Signed-off-by: Andrew Morton
---

 mm/hugetlb.c |   19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

--- a/mm/hugetlb.c~hugetlb-dont-pass-page-cache-pages-to-restore_reserve_on_error
+++ a/mm/hugetlb.c
@@ -2476,7 +2476,7 @@ void restore_reserve_on_error(struct hst
 	if (!rc) {
 		/*
 		 * This indicates there is an entry in the reserve map
-		 * added by alloc_huge_page. We know it was added
+		 * not added by alloc_huge_page. We know it was added
 		 * before the alloc_huge_page call, otherwise
 		 * HPageRestoreReserve would be set on the page.
		 * Remove the entry so that a subsequent allocation
@@ -4660,7 +4660,9 @@ retry_avoidcopy:
 	spin_unlock(ptl);
 	mmu_notifier_invalidate_range_end(&range);
 out_release_all:
-	restore_reserve_on_error(h, vma, haddr, new_page);
+	/* No restore in case of successful pagetable update (Break COW) */
+	if (new_page != old_page)
+		restore_reserve_on_error(h, vma, haddr, new_page);
 	put_page(new_page);
 out_release_old:
 	put_page(old_page);
@@ -4776,7 +4778,7 @@ static vm_fault_t hugetlb_no_page(struct
 	pte_t new_pte;
 	spinlock_t *ptl;
 	unsigned long haddr = address & huge_page_mask(h);
-	bool new_page = false;
+	bool new_page, new_pagecache_page = false;
 
 	/*
 	 * Currently, we are forced to kill the process in the event the
@@ -4799,6 +4801,7 @@ static vm_fault_t hugetlb_no_page(struct
 		goto out;
 
 retry:
+	new_page = false;
 	page = find_lock_page(mapping, idx);
 	if (!page) {
 		/* Check for page in userfault range */
@@ -4842,6 +4845,7 @@ retry:
 				goto retry;
 			goto out;
 		}
+		new_pagecache_page = true;
 	} else {
 		lock_page(page);
 		if (unlikely(anon_vma_prepare(vma))) {
@@ -4926,7 +4930,9 @@ backout:
 	spin_unlock(ptl);
backout_unlocked:
 	unlock_page(page);
-	restore_reserve_on_error(h, vma, haddr, page);
+	/* restore reserve for newly allocated pages not in page cache */
+	if (new_page && !new_pagecache_page)
+		restore_reserve_on_error(h, vma, haddr, page);
 	put_page(page);
 	goto out;
 }
@@ -5135,6 +5141,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_s
 	int ret = -ENOMEM;
 	struct page *page;
 	int writable;
+	bool new_pagecache_page = false;
 
 	if (is_continue) {
 		ret = -EFAULT;
@@ -5228,6 +5235,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_s
 		ret = huge_add_to_page_cache(page, mapping, idx);
 		if (ret)
 			goto out_release_nounlock;
+		new_pagecache_page = true;
 	}
 
 	ptl = huge_pte_lockptr(h, dst_mm, dst_pte);
@@ -5291,7 +5299,8 @@ out_release_unlock:
 	if (vm_shared || is_continue)
 		unlock_page(page);
out_release_nounlock:
-	restore_reserve_on_error(h, dst_vma, dst_addr, page);
+	if (!new_pagecache_page)
+		restore_reserve_on_error(h, dst_vma, dst_addr, page);
 	put_page(page);
 	goto out;
 }
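The rule the fix enforces condenses into one predicate: restore the
reservation on error only for a page that was newly allocated and never
entered the page cache.  The sketch below is an illustrative userspace model,
not the hugetlb code itself; should_restore_reserve() is a made-up helper
whose two flags simply mirror the new_page / new_pagecache_page bookkeeping
added above.

/* Userspace model of the error-path decision added by this patch; not kernel code. */
#include <stdbool.h>
#include <stdio.h>

/*
 * Restore the reservation only for pages that were newly allocated and were
 * NOT added to the page cache: page cache pages keep their reservation state
 * until they are removed from the cache, even on error paths.
 */
static bool should_restore_reserve(bool new_page, bool new_pagecache_page)
{
	return new_page && !new_pagecache_page;
}

int main(void)
{
	printf("existing cache page:      %d\n", should_restore_reserve(false, false));
	printf("new page, added to cache: %d\n", should_restore_reserve(true, true));
	printf("new page, not in cache:   %d\n", should_restore_reserve(true, false));
	return 0;
}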