From patchwork Tue Nov 3 00:28:39 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 317296
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Hugh Dickins , Naoya Horiguchi , Michal Hocko , "Aneesh Kumar K . V" , Andrea Arcangeli , "Kirill A .
 Shutemov" , Davidlohr Bueso , Prakash Sangappa , Andrew Morton , Mike Kravetz , stable@vger.kernel.org
Subject: [PATCH 2/4] hugetlbfs: add hinode_rwsem to hugetlb specific inode
Date: Mon, 2 Nov 2020 16:28:39 -0800
Message-Id: <20201103002841.273161-3-mike.kravetz@oracle.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20201103002841.273161-1-mike.kravetz@oracle.com>
References: <20201026233150.371577-1-mike.kravetz@oracle.com> <20201103002841.273161-1-mike.kravetz@oracle.com>
Precedence: bulk
X-Mailing-List: stable@vger.kernel.org

The hugetlb pmd sharing code needs additional synchronization. This is
because sharing established via a call to huge_pte_alloc could be undone
before control is returned to the caller. As a result, the returned
value may be invalid.

Ideally, i_mmap_rwsem would be used for this type of synchronization.
However, previous attempts at using i_mmap_rwsem have failed. This is
partly due to conflicts with the existing uses of i_mmap_rwsem that
force a locking order not compatible with its use for pmd sharing.

Introduce a rwsem (hinode_rwsem) that resides in the hugetlb specific
inode for the purpose of pmd sharing synchronization.
This patch adds the semaphore to the inode and also provides routines
for using the semaphore. To minimize performance impacts, the routines
only acquire the semaphore if pmd sharing is possible. In addition,
routines which can be used with lockdep to help ensure proper locking
are also added.

Use of the new semaphore and supporting routines will be provided in a
later patch.

Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Cc:
Signed-off-by: Mike Kravetz
---
 fs/hugetlbfs/inode.c    |  12 ++++
 include/linux/hugetlb.h | 121 ++++++++++++++++++++++++++++++++++++++++
 mm/hugetlb.c            |  13 -----
 3 files changed, 133 insertions(+), 13 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index c1057378dbf4..4f1404b9f354 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -85,6 +85,17 @@ static const struct fs_parameter_spec hugetlb_fs_parameters[] = {
 	{}
 };
 
+#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
+static inline void init_hinode_rwsem(struct hugetlbfs_inode_info *info)
+{
+	init_rwsem(&info->hinode_rwsem);
+}
+#else
+static inline void init_hinode_rwsem(struct hugetlbfs_inode_info *info)
+{
+}
+#endif
+
 #ifdef CONFIG_NUMA
 static inline void hugetlb_set_vma_policy(struct vm_area_struct *vma,
 					struct inode *inode, pgoff_t index)
@@ -831,6 +842,7 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
 		inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
 		inode->i_mapping->private_data = resv_map;
 		info->seals = F_SEAL_SEAL;
+		init_hinode_rwsem(info);
 		switch (mode & S_IFMT) {
 		default:
 			init_special_inode(inode, mode, dev);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ebca2ef02212..c6a59c2dbc30 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -424,6 +424,9 @@ struct hugetlbfs_inode_info {
 	struct shared_policy policy;
 	struct inode vfs_inode;
 	unsigned int seals;
+#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
+	struct rw_semaphore hinode_rwsem;
+#endif
 };
 
 static inline struct hugetlbfs_inode_info *HUGETLBFS_I(struct inode *inode)
@@ -449,6 +452,101 @@ static inline struct hstate *hstate_inode(struct inode *i)
 {
 	return HUGETLBFS_SB(i->i_sb)->hstate;
 }
+
+#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
+static inline bool vma_shareable(struct vm_area_struct *vma, unsigned long addr)
+{
+	unsigned long base = addr & PUD_MASK;
+	unsigned long end = base + PUD_SIZE;
+
+	/* check on proper vm_flags and page table alignment */
+	if (vma->vm_flags & VM_MAYSHARE && range_in_vma(vma, base, end))
+		return true;
+	return false;
+}
+
+/*
+ * hugetlb specific hinode_rwsem is used for pmd sharing synchronization.
+ * This routine will take the semaphore in read mode if necessary. If vma
+ * and addr are NULL, the routine will always acquire the semaphore. If
+ * values are supplied for vma and addr, they are used to determine if pmd
+ * sharing is actually possible, and only acquire the semaphore if possible.
+ * Returns true if lock was acquired, otherwise false.
+ */
+static inline bool hinode_lock_read(struct inode *inode,
+					struct vm_area_struct *vma,
+					unsigned long addr)
+{
+	if (vma && !addr)
+		addr = round_up(vma->vm_start, PUD_SIZE);
+	if (vma && !vma_shareable(vma, addr))
+		return false;
+
+	down_read(&HUGETLBFS_I(inode)->hinode_rwsem);
+	return true;
+}
+
+static inline void hinode_unlock_read(struct inode *inode)
+{
+	up_read(&HUGETLBFS_I(inode)->hinode_rwsem);
+}
+
+/*
+ * Take hinode_rwsem semaphore in write mode if necessary. See
+ * hinode_lock_read for details.
+ * Returns true if lock was acquired, otherwise false.
+ */
+static inline bool hinode_lock_write(struct inode *inode,
+					struct vm_area_struct *vma,
+					unsigned long addr)
+{
+	if (vma && !addr)
+		addr = round_up(vma->vm_start, PUD_SIZE);
+	if (vma && !vma_shareable(vma, addr))
+		return false;
+
+	down_write(&HUGETLBFS_I(inode)->hinode_rwsem);
+	return true;
+}
+
+static inline void hinode_unlock_write(struct inode *inode)
+{
+	up_write(&HUGETLBFS_I(inode)->hinode_rwsem);
+}
+
+static inline void hinode_assert_locked(struct address_space *mapping)
+{
+	lockdep_assert_held(&HUGETLBFS_I(mapping->host)->hinode_rwsem);
+}
+
+static inline void hinode_assert_write_locked(struct address_space *mapping)
+{
+	lockdep_assert_held_write(&HUGETLBFS_I(mapping->host)->hinode_rwsem);
+}
+#else
+static inline bool hinode_lock_read(struct inode *inode,
+					struct vm_area_struct *vma,
+					unsigned long addr)
+{
+	return false;
+}
+
+static inline void hinode_unlock_read(struct inode *inode)
+{
+}
+
+static inline bool hinode_lock_write(struct inode *inode,
+					struct vm_area_struct *vma,
+					unsigned long addr)
+{
+	return false;
+}
+
+static inline void hinode_unlock_write(struct inode *inode)
+{
+}
+#endif
+
 #else /* !CONFIG_HUGETLBFS */
 
 #define is_file_hugepages(file)			false
@@ -923,6 +1021,29 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
 					pte_t *ptep, pte_t pte, unsigned long sz)
 {
 }
+
+static inline bool hinode_lock_read(struct inode *inode,
+					struct vm_area_struct *vma,
+					unsigned long addr)
+{
+	return false;
+}
+
+static inline void hinode_unlock_read(struct inode *inode)
+{
+}
+
+static inline bool hinode_lock_write(struct inode *inode,
+					struct vm_area_struct *vma,
+					unsigned long addr)
+{
+	return false;
+}
+
+static inline void hinode_unlock_write(struct inode *inode)
+{
+}
+
 #endif	/* CONFIG_HUGETLB_PAGE */
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8a82b90ca3ee..da57018926e4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5296,19 +5296,6 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma,
 	return saddr;
 }
 
-static bool vma_shareable(struct vm_area_struct *vma, unsigned long addr)
-{
-	unsigned long base = addr & PUD_MASK;
-	unsigned long end = base + PUD_SIZE;
-
-	/*
-	 * check on proper vm_flags and page table alignment
-	 */
-	if (vma->vm_flags & VM_MAYSHARE && range_in_vma(vma, base, end))
-		return true;
-	return false;
-}
-
 /*
  * Determine if start,end range within vma could be mapped by shared pmd.
  * If yes, adjust start and end to cover range associated with possible

From patchwork Tue Nov 3 00:28:41 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mike Kravetz
X-Patchwork-Id: 317297
From: Mike Kravetz
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Hugh Dickins , Naoya Horiguchi , Michal Hocko , "Aneesh Kumar K . V" , Andrea Arcangeli , "Kirill A .
 Shutemov" , Davidlohr Bueso , Prakash Sangappa , Andrew Morton , Mike Kravetz , stable@vger.kernel.org
Subject: [PATCH 4/4] hugetlbfs: handle page fault/truncate races
Date: Mon, 2 Nov 2020 16:28:41 -0800
Message-Id: <20201103002841.273161-5-mike.kravetz@oracle.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20201103002841.273161-1-mike.kravetz@oracle.com>
References: <20201026233150.371577-1-mike.kravetz@oracle.com> <20201103002841.273161-1-mike.kravetz@oracle.com>
Precedence: bulk
X-Mailing-List: stable@vger.kernel.org

A hugetlb page fault can race with page truncation. Make the code
identifying and handling these races more robust.

Page fault handling needs to back out pages added to page cache beyond
file size (i_size). When backing out the page, take care to restore
reserve map entries and counts as necessary.

File truncation (remove_inode_hugepages) needs to handle page mapping
changes before locking the page. This could happen if the page was added
to page cache and later backed out in fault processing.
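
As a reading aid, the backout rules described above can be modelled as a
pure function. This is only a sketch under stated assumptions, not kernel
code: struct backout_state, struct backout_actions and plan_backout() are
hypothetical names introduced here, and the four flags stand in for the
new_page, page_cache, reserve_alloc and beyond_i_size booleans used by
hugetlb_no_page() in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the fault backout state; not a kernel type. */
struct backout_state {
	bool new_page;      /* page was allocated by this fault */
	bool page_cache;    /* page was added to the page cache */
	bool reserve_alloc; /* allocation consumed a reservation */
	bool beyond_i_size; /* fault index is at or past i_size */
};

/* Hypothetical summary of what the backout path would do. */
struct backout_actions {
	bool delete_from_page_cache; /* remove the page from the page cache */
	bool set_page_private;       /* let free_huge_page() restore the reserve */
	bool restore_reserve;        /* call restore_reserve_on_error() */
};

static struct backout_actions plan_backout(struct backout_state s)
{
	struct backout_actions a = { false, false, false };

	/* In the patch these two flags are only set for a new page. */
	assert(!s.page_cache || s.new_page);
	assert(!s.reserve_alloc || s.new_page);

	if (!s.new_page)
		return a;

	/* Pages left in the page cache beyond i_size would linger there. */
	a.delete_from_page_cache = s.page_cache && s.beyond_i_size;
	/* A consumed reservation is handed back when the page is freed. */
	a.set_page_private = s.reserve_alloc;
	/* Reserve map entries beyond i_size must not be restored. */
	a.restore_reserve = !s.beyond_i_size;
	return a;
}
```

For a newly allocated page that was added to the page cache and then found
to lie beyond i_size, the model selects page cache removal and
SetPagePrivate, but not restore_reserve_on_error.
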

Fixes: 7bf91d39bb5 ("hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race")
Cc:
Signed-off-by: Mike Kravetz
---
 fs/hugetlbfs/inode.c | 34 ++++++++++++++++++++--------------
 mm/hugetlb.c         | 40 ++++++++++++++++++++++++++++++++++++++--
 2 files changed, 58 insertions(+), 16 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index bc9979382a1e..6b975377558e 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -534,23 +534,29 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 
 			lock_page(page);
 			/*
-			 * We must free the huge page and remove from page
-			 * cache (remove_huge_page) BEFORE removing the
-			 * region/reserve map (hugetlb_unreserve_pages). In
-			 * rare out of memory conditions, removal of the
-			 * region/reserve map could fail. Correspondingly,
-			 * the subpool and global reserve usage count can need
-			 * to be adjusted.
+			 * After locking page, make sure mapping is the same.
+			 * We could have raced with page fault populate and
+			 * backout code.
 			 */
-			VM_BUG_ON(PagePrivate(page));
-			remove_huge_page(page);
-			freed++;
-			if (!truncate_op) {
-				if (unlikely(hugetlb_unreserve_pages(inode,
+			if (page_mapping(page) == mapping) {
+				/*
+				 * We must free the huge page and remove from
+				 * page cache (remove_huge_page) BEFORE
+				 * removing the region/reserve map. In rare
+				 * out of memory conditions, removal of the
+				 * region/reserve map could fail and the
+				 * subpool and global reserve usage count
+				 * will need to be adjusted.
+				 */
+				VM_BUG_ON(PagePrivate(page));
+				remove_huge_page(page);
+				freed++;
+				if (!truncate_op) {
+					if (unlikely(hugetlb_unreserve_pages(inode,
 							index, index + 1, 1)))
-					hugetlb_fix_reserve_counts(inode);
+						hugetlb_fix_reserve_counts(inode);
+				}
 			}
-
 			unlock_page(page);
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 957abc2d02ff..6b348d344f23 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4224,6 +4224,9 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	spinlock_t *ptl;
 	unsigned long haddr = address & huge_page_mask(h);
 	bool new_page = false;
+	bool page_cache = false;
+	bool reserve_alloc = false;
+	bool beyond_i_size = false;
 
 	/*
 	 * Currently, we are forced to kill the process in the event the
@@ -4311,6 +4314,8 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		clear_huge_page(page, address, pages_per_huge_page(h));
 		__SetPageUptodate(page);
 		new_page = true;
+		if (PagePrivate(page))
+			reserve_alloc = true;
 
 		if (vma->vm_flags & VM_MAYSHARE) {
 			int err = huge_add_to_page_cache(page, mapping, idx);
@@ -4320,6 +4325,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 					goto retry;
 				goto out;
 			}
+			page_cache = true;
 		} else {
 			lock_page(page);
 			if (unlikely(anon_vma_prepare(vma))) {
@@ -4358,8 +4364,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 
 	ptl = huge_pte_lock(h, mm, ptep);
 	size = i_size_read(mapping->host) >> huge_page_shift(h);
-	if (idx >= size)
+	if (idx >= size) {
+		beyond_i_size = true;
 		goto backout;
+	}
 
 	ret = 0;
 	if (!huge_pte_none(huge_ptep_get(ptep)))
@@ -4397,8 +4405,36 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 backout:
 	spin_unlock(ptl);
 backout_unlocked:
+	if (new_page) {
+		if (page_cache && beyond_i_size) {
+			/*
+			 * Back out pages added to page cache beyond i_size.
+			 * Otherwise, they will 'sit' there until the file
+			 * is removed.
+			 */
+			ClearPageDirty(page);
+			ClearPageUptodate(page);
+			delete_from_page_cache(page);
+		}
+
+		if (reserve_alloc) {
+			/*
+			 * If reserve was consumed, set PagePrivate so that
+			 * it will be restored in free_huge_page().
+			 */
+			SetPagePrivate(page);
+		}
+
+		if (!beyond_i_size) {
+			/*
+			 * Do not restore reserve map entries beyond i_size;
+			 * otherwise they will leak when the file is removed.
+			 */
+			restore_reserve_on_error(h, vma, haddr, page);
+		}
+
+	}
 	unlock_page(page);
-	restore_reserve_on_error(h, vma, haddr, page);
 	put_page(page);
 	goto out;
 }
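
For reference, the conditional locking convention from patch 2/4
(hinode_lock_read returns true only when the semaphore was actually taken)
can be modelled in user space. The sketch below is an illustration under
stated assumptions, not the kernel implementation: every model_* name is
hypothetical, a plain counter stands in for hinode_rwsem, a bool stands in
for VM_MAYSHARE, and MODEL_PUD_SIZE is assumed to be 1 GiB (the real
PUD_SIZE is architecture dependent).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a PUD entry spans 1 GiB; the real value depends on the arch. */
#define MODEL_PUD_SIZE (UINT64_C(1) << 30)
#define MODEL_PUD_MASK (~(MODEL_PUD_SIZE - 1))

/* Hypothetical stand-in for struct vm_area_struct. */
struct model_vma {
	uint64_t vm_start;
	uint64_t vm_end;   /* exclusive end address */
	bool may_share;    /* stands in for VM_MAYSHARE */
};

/* Hypothetical stand-in for the hugetlbfs inode; counts read holders. */
struct model_inode {
	int readers;
};

/* Sharing needs a shared mapping covering one whole PUD-aligned range. */
static bool model_vma_shareable(const struct model_vma *vma, uint64_t addr)
{
	uint64_t base = addr & MODEL_PUD_MASK;
	uint64_t end = base + MODEL_PUD_SIZE;

	return vma->may_share && vma->vm_start <= base && end <= vma->vm_end;
}

/* Take the lock only if pmd sharing is possible; report whether it was taken. */
static bool model_lock_read(struct model_inode *inode,
			    const struct model_vma *vma, uint64_t addr)
{
	if (vma && !addr)
		addr = (vma->vm_start + MODEL_PUD_SIZE - 1) & MODEL_PUD_MASK;
	if (vma && !model_vma_shareable(vma, addr))
		return false;

	inode->readers++;
	return true;
}

/* Must only be called when model_lock_read() returned true. */
static void model_unlock_read(struct model_inode *inode)
{
	assert(inode->readers > 0);
	inode->readers--;
}
```

A caller in this model is expected to remember the return value of
model_lock_read() and to call model_unlock_read() only when it was true.
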