From patchwork Fri Feb 7 03:37:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 208810 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B58E8C352A3 for ; Fri, 7 Feb 2020 03:37:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 8A59B22464 for ; Fri, 7 Feb 2020 03:37:38 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="ejg6TBDM" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727195AbgBGDhi (ORCPT ); Thu, 6 Feb 2020 22:37:38 -0500 Received: from hqnvemgate26.nvidia.com ([216.228.121.65]:2970 "EHLO hqnvemgate26.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726709AbgBGDhh (ORCPT ); Thu, 6 Feb 2020 22:37:37 -0500 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqnvemgate26.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Thu, 06 Feb 2020 19:37:22 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Thu, 06 Feb 2020 19:37:37 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Thu, 06 Feb 2020 19:37:37 -0800 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Fri, 7 Feb 2020 03:37:36 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Fri, 7 Feb 2020 03:37:36 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7, 5, 8, 10121) id ; Thu, 06 Feb 2020 19:37:36 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Christoph Hellwig , Dan Williams , Dave Chinner , Ira Weiny , Jan Kara , Jason Gunthorpe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , "Kirill A . Shutemov" , Michal Hocko , Mike Kravetz , Shuah Khan , Vlastimil Babka , Matthew Wilcox , , , , , , LKML , John Hubbard , "Kirill A . Shutemov" Subject: [PATCH v5 01/12] mm: dump_page(): better diagnostics for compound pages Date: Thu, 6 Feb 2020 19:37:24 -0800 Message-ID: <20200207033735.308000-2-jhubbard@nvidia.com> X-Mailer: git-send-email 2.25.0 In-Reply-To: <20200207033735.308000-1-jhubbard@nvidia.com> References: <20200207033735.308000-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1581046642; bh=JHbFoVlby1O2Fi0YWtwHfwC9UdnPi6RlItOvHznb5y0=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=ejg6TBDMQadq3V8m4zpmpBWDvyla1Cwb+Pz7xP8Sy2K+XjBjA9AL5+bGG3OlOUGYc 6vCxNALEG82PF88QxORGXRoa9GKsm+Pv0WCv2l57EGB1Cdf4r4W5ErzwGKvw16H0hj JF98EdTNU/tVRevEDqZgBeD0h1Tzz51zyM+zD2+tf+9uSPSEiWjT281tJ21tcZ62xW fvBTfsDCics41w6Y8J1oOEJiOGsCSe3spIH8o6LSVtrn5GpI3x9eKgNpVf3azkJzXk uCNgz1Pn1Qaz0tLcWL3r0WTWW5ioZi19i7XFLTbVHgSwL1wCR5HLWzBRIYzWd8AwKK UpXznRe1xXEzA== Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org A compound page collects the refcount in the head page, while leaving the refcount of each tail page at zero. Therefore, when debugging a problem that involves compound pages, it's best to have diagnostics that reflect that situation. However, dump_page() is oblivious to these points. Change dump_page() as follows: 1) For tail pages, print relevant head page information: refcount, in particular. But only do this if the page is not corrupted so badly that the pointer to the head page is all wrong. 2) Do a separate check to catch any (rare) cases of the tail page's refcount being non-zero, and issue a separate, clear pr_warn() if that ever happens. Suggested-by: Matthew Wilcox Suggested-by: Kirill A. Shutemov Acked-by: Kirill A. Shutemov Signed-off-by: John Hubbard --- mm/debug.c | 35 +++++++++++++++++++++++++++++------ 1 file changed, 29 insertions(+), 6 deletions(-) diff --git a/mm/debug.c b/mm/debug.c index ecccd9f17801..f074077eee11 100644 --- a/mm/debug.c +++ b/mm/debug.c @@ -42,6 +42,33 @@ const struct trace_print_flags vmaflag_names[] = { {0, NULL} }; +static void __dump_tail_page(struct page *page, int mapcount) +{ + struct page *head = compound_head(page); + + if ((page < head) || (page >= head + MAX_ORDER_NR_PAGES)) { + /* + * Page is hopelessly corrupted, so limit any reporting to + * information about the page itself. Do not attempt to look at + * the head page. + */ + pr_warn("page:%px refcount:%d mapcount:%d mapping:%px " + "index:%#lx (corrupted tail page case)\n", + page, page_ref_count(page), mapcount, page->mapping, + page_to_pgoff(page)); + } else { + pr_warn("page:%px compound refcount:%d mapcount:%d mapping:%px " + "index:%#lx compound_mapcount:%d\n", + page, page_ref_count(head), mapcount, head->mapping, + page_to_pgoff(head), compound_mapcount(page)); + } + + if (page_ref_count(page) != 0) { + pr_warn("page:%px PROBLEM: non-zero refcount (==%d) on this " + "tail page\n", page, page_ref_count(page)); + } +} + void __dump_page(struct page *page, const char *reason) { struct address_space *mapping; @@ -75,12 +102,8 @@ void __dump_page(struct page *page, const char *reason) */ mapcount = PageSlab(page) ? 0 : page_mapcount(page); - if (PageCompound(page)) - pr_warn("page:%px refcount:%d mapcount:%d mapping:%px " - "index:%#lx compound_mapcount: %d\n", - page, page_ref_count(page), mapcount, - page->mapping, page_to_pgoff(page), - compound_mapcount(page)); + if (PageTail(page)) + __dump_tail_page(page, mapcount); else pr_warn("page:%px refcount:%d mapcount:%d mapping:%px index:%#lx\n", page, page_ref_count(page), mapcount, From patchwork Fri Feb 7 03:37:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 208808 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AE4FDC3B185 for ; Fri, 7 Feb 2020 03:38:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 865B122522 for ; Fri, 7 Feb 2020 03:38:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="BZ7IOW08" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727608AbgBGDhn (ORCPT ); Thu, 6 Feb 2020 22:37:43 -0500 Received: from hqnvemgate25.nvidia.com ([216.228.121.64]:13366 "EHLO hqnvemgate25.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727587AbgBGDhm (ORCPT ); Thu, 6 Feb 2020 22:37:42 -0500 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqnvemgate25.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Thu, 06 Feb 2020 19:37:12 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Thu, 06 Feb 2020 19:37:37 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Thu, 06 Feb 2020 19:37:37 -0800 Received: from HQMAIL109.nvidia.com (172.20.187.15) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Fri, 7 Feb 2020 03:37:36 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Fri, 7 Feb 2020 03:37:36 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7, 5, 8, 10121) id ; Thu, 06 Feb 2020 19:37:36 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Christoph Hellwig , Dan Williams , Dave Chinner , Ira Weiny , Jan Kara , Jason Gunthorpe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , "Kirill A . Shutemov" , Michal Hocko , Mike Kravetz , Shuah Khan , Vlastimil Babka , Matthew Wilcox , , , , , , LKML , John Hubbard , "Kirill A . Shutemov" Subject: [PATCH v5 03/12] mm/gup: pass a flags arg to __gup_device_* functions Date: Thu, 6 Feb 2020 19:37:26 -0800 Message-ID: <20200207033735.308000-4-jhubbard@nvidia.com> X-Mailer: git-send-email 2.25.0 In-Reply-To: <20200207033735.308000-1-jhubbard@nvidia.com> References: <20200207033735.308000-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1581046632; bh=9vUvR7MWlrDY2h3QHPZ14a3zjGRWS++CvrjGW0mNfqY=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Type:Content-Transfer-Encoding; b=BZ7IOW08VMig3ZvuZbdHCqlBvHE0hCtsmfDYe2g93wWP+bbQFFpCRQ+vUaDzoDKKB pgoPhdWlYsNWQU9PDRa2sfRqJKAdqP/BNoj9a44gEmGvgKVBPUDHCE4cnBUvGjXX1/ RgJ3BonJAU7r7tin+WU36Wql+OWIaQkLcooSuVeJYqwffC/SQfKrdYmErOy3ZXInvM WhzLelaWiNMjTOsJaAKSW7NpL8Jks+o8qhA+8MpS+SmW6PINvR6s2/ULrOsfb44zZ0 LD1G4w/MDUkOTnJlN3krNs9gFo7oHw9r5I9Xj7Co7UIYQWyXcjtdQIou6HiCnpkmo6 E85LH6wPJzAxQ== Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org A subsequent patch requires access to gup flags, so pass the flags argument through to the __gup_device_* functions. Also placate checkpatch.pl by shortening a nearby line. Reviewed-by: Jan Kara Reviewed-by: Jérôme Glisse Reviewed-by: Ira Weiny Acked-by: Kirill A. Shutemov Signed-off-by: John Hubbard --- mm/gup.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index b699500da077..9e117998274c 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1963,7 +1963,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, #if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && defined(CONFIG_TRANSPARENT_HUGEPAGE) static int __gup_device_huge(unsigned long pfn, unsigned long addr, - unsigned long end, struct page **pages, int *nr) + unsigned long end, unsigned int flags, + struct page **pages, int *nr) { int nr_start = *nr; struct dev_pagemap *pgmap = NULL; @@ -1989,13 +1990,14 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr, } static int __gup_device_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, - unsigned long end, struct page **pages, int *nr) + unsigned long end, unsigned int flags, + struct page **pages, int *nr) { unsigned long fault_pfn; int nr_start = *nr; fault_pfn = pmd_pfn(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); - if (!__gup_device_huge(fault_pfn, addr, end, pages, nr)) + if (!__gup_device_huge(fault_pfn, addr, end, flags, pages, nr)) return 0; if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) { @@ -2006,13 +2008,14 @@ static int __gup_device_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, } static int __gup_device_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, - unsigned long end, struct page **pages, int *nr) + unsigned long end, unsigned int flags, + struct page **pages, int *nr) { unsigned long fault_pfn; int nr_start = *nr; fault_pfn = pud_pfn(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); - if (!__gup_device_huge(fault_pfn, addr, end, pages, nr)) + if (!__gup_device_huge(fault_pfn, addr, end, flags, pages, nr)) return 0; if (unlikely(pud_val(orig) != pud_val(*pudp))) { @@ -2023,14 +2026,16 @@ static int __gup_device_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, } #else static int __gup_device_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, - unsigned long end, struct page **pages, int *nr) + unsigned long end, unsigned int flags, + struct page **pages, int *nr) { BUILD_BUG(); return 0; } static int __gup_device_huge_pud(pud_t pud, pud_t *pudp, unsigned long addr, - unsigned long end, struct page **pages, int *nr) + unsigned long end, unsigned int flags, + struct page **pages, int *nr) { BUILD_BUG(); return 0; @@ -2146,7 +2151,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, if (pmd_devmap(orig)) { if (unlikely(flags & FOLL_LONGTERM)) return 0; - return __gup_device_huge_pmd(orig, pmdp, addr, end, pages, nr); + return __gup_device_huge_pmd(orig, pmdp, addr, end, flags, + pages, nr); } page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); @@ -2167,7 +2173,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, } static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, - unsigned long end, unsigned int flags, struct page **pages, int *nr) + unsigned long end, unsigned int flags, + struct page **pages, int *nr) { struct page *head, *page; int refs; @@ -2178,7 +2185,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, if (pud_devmap(orig)) { if (unlikely(flags & FOLL_LONGTERM)) return 0; - return __gup_device_huge_pud(orig, pudp, addr, end, pages, nr); + return __gup_device_huge_pud(orig, pudp, addr, end, flags, + pages, nr); } page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); From patchwork Fri Feb 7 03:37:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 208805 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E7F78C3B189 for ; Fri, 7 Feb 2020 03:38:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id BE96721927 for ; Fri, 7 Feb 2020 03:38:35 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="qCL7wJvy" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727551AbgBGDhk (ORCPT ); Thu, 6 Feb 2020 22:37:40 -0500 Received: from hqnvemgate25.nvidia.com ([216.228.121.64]:13343 "EHLO hqnvemgate25.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727524AbgBGDhj (ORCPT ); Thu, 6 Feb 2020 22:37:39 -0500 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqnvemgate25.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Thu, 06 Feb 2020 19:37:12 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Thu, 06 Feb 2020 19:37:37 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Thu, 06 Feb 2020 19:37:37 -0800 Received: from HQMAIL105.nvidia.com (172.20.187.12) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Fri, 7 Feb 2020 03:37:36 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Fri, 7 Feb 2020 03:37:36 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7, 5, 8, 10121) id ; Thu, 06 Feb 2020 19:37:36 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Christoph Hellwig , Dan Williams , Dave Chinner , Ira Weiny , Jan Kara , Jason Gunthorpe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , "Kirill A . Shutemov" , Michal Hocko , "Mike Kravetz" , Shuah Khan , "Vlastimil Babka" , Matthew Wilcox , , , , , , LKML , John Hubbard Subject: [PATCH v5 04/12] mm: introduce page_ref_sub_return() Date: Thu, 6 Feb 2020 19:37:27 -0800 Message-ID: <20200207033735.308000-5-jhubbard@nvidia.com> X-Mailer: git-send-email 2.25.0 In-Reply-To: <20200207033735.308000-1-jhubbard@nvidia.com> References: <20200207033735.308000-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1581046632; bh=DhMMWR0L+su36XhWVkaUTT8AIpyvoaCJ7tCBpQiPc3M=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=qCL7wJvyuAIwVgPSCBkxszuO0dlj80eWg/EDbClYxweQTRNHL+GTW+zK5ITf9LNig uRb9r+NQmOvLYXjHpxOGosNBSj7qAFzD1Gp/gXOyn3l/7Eqt4KN7cqUq1efOgufStl 7Bea1KDIgNa9pQWBFdc/LdEPKGUGmPWbDZMoxCXH+5lY5tU5TD64FzekAlyXNuUnXd Myo9xS52/un25I1eAeSv6b3cg81SSkeMiPJec39TyWWzQLsEdozMVBZqphrxjT7MNM jkW3BmWsPJ7q587TT4lIOdnmBBkxgFCDUucq88TsdGmsVVI9cKN0B+bb69YA/DJtdV 8Nhet+MSTrAig== Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org An upcoming patch requires subtracting a large chunk of refcounts from a page, and checking what the resulting refcount is. This is a little different than the usual "check for zero refcount" that many of the page ref functions already do. However, it is similar to a few other routines that (like this one) are generally useful for things such as 1-based refcounting. Add page_ref_sub_return(), that subtracts a chunk of refcounts atomically, and returns an atomic snapshot of the result. Reviewed-by: Jan Kara Signed-off-by: John Hubbard --- include/linux/page_ref.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h index 14d14beb1f7f..d27701199a4d 100644 --- a/include/linux/page_ref.h +++ b/include/linux/page_ref.h @@ -102,6 +102,15 @@ static inline void page_ref_sub(struct page *page, int nr) __page_ref_mod(page, -nr); } +static inline int page_ref_sub_return(struct page *page, int nr) +{ + int ret = atomic_sub_return(nr, &page->_refcount); + + if (page_ref_tracepoint_active(__tracepoint_page_ref_mod_and_return)) + __page_ref_mod_and_return(page, -nr, ret); + return ret; +} + static inline void page_ref_inc(struct page *page) { atomic_inc(&page->_refcount); From patchwork Fri Feb 7 03:37:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 208807 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1086EC47409 for ; Fri, 7 Feb 2020 03:38:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id DCBEA22522 for ; Fri, 7 Feb 2020 03:38:14 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="LUxasBJo" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727778AbgBGDiO (ORCPT ); Thu, 6 Feb 2020 22:38:14 -0500 Received: from hqnvemgate24.nvidia.com ([216.228.121.143]:4794 "EHLO hqnvemgate24.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727571AbgBGDhl (ORCPT ); Thu, 6 Feb 2020 22:37:41 -0500 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqnvemgate24.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Thu, 06 Feb 2020 19:36:38 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Thu, 06 Feb 2020 19:37:37 -0800 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Thu, 06 Feb 2020 19:37:37 -0800 Received: from HQMAIL111.nvidia.com (172.20.187.18) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Fri, 7 Feb 2020 03:37:37 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Fri, 7 Feb 2020 03:37:36 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7, 5, 8, 10121) id ; Thu, 06 Feb 2020 19:37:36 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Christoph Hellwig , Dan Williams , Dave Chinner , Ira Weiny , Jan Kara , Jason Gunthorpe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , "Kirill A . Shutemov" , Michal Hocko , Mike Kravetz , Shuah Khan , Vlastimil Babka , Matthew Wilcox , , , , , , LKML , John Hubbard , "Kirill A . Shutemov" Subject: [PATCH v5 05/12] mm/gup: pass gup flags to two more routines Date: Thu, 6 Feb 2020 19:37:28 -0800 Message-ID: <20200207033735.308000-6-jhubbard@nvidia.com> X-Mailer: git-send-email 2.25.0 In-Reply-To: <20200207033735.308000-1-jhubbard@nvidia.com> References: <20200207033735.308000-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1581046598; bh=+TuU8OOcGV+ejGVFCndVV6S1iiJ5m64AX+o6oBv/wIw=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=LUxasBJoN7P30/xAEnxWVKJg6G6A361EKR/cysoA4VHWOyt9C/Pc0l/xBI0VHnJPm 3rfu7+Rw25fCotPP2iyg5IkmU1y/7CyHH7WIRmIPAcyY2h1deS4FsJpZtZJTjdm1Sd WHewDvSSvtqkNyPqBhU9GJMLe3hEz0uKqJa5USBT42Lpu9JmUrq1Seg4KLgnrrDWXY hbA2TdjTcYFIcDT5VfNi0KJb+Y3Lxa7a8IZH+W86x9QPFzcddFI5AFIPovKdYnYUas kYPv4lMZoStKZkZNc6P2NTBoIRshoitEDkeeAKr8g0mSmVuSGNyWO7w68EA900sIQO OsUJZcIVSt9vw== Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org In preparation for an upcoming patch, send gup flags args to two more routines: put_compound_head(), and undo_dev_pagemap(). Acked-by: Kirill A. Shutemov Reviewed-by: Jan Kara Signed-off-by: John Hubbard --- mm/gup.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 9e117998274c..e5f75e886663 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1870,6 +1870,7 @@ static inline pte_t gup_get_pte(pte_t *ptep) #endif /* CONFIG_GUP_GET_PTE_LOW_HIGH */ static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start, + unsigned int flags, struct page **pages) { while ((*nr) - nr_start) { @@ -1909,7 +1910,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, pgmap = get_dev_pagemap(pte_pfn(pte), pgmap); if (unlikely(!pgmap)) { - undo_dev_pagemap(nr, nr_start, pages); + undo_dev_pagemap(nr, nr_start, flags, pages); goto pte_unmap; } } else if (pte_special(pte)) @@ -1974,7 +1975,7 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr, pgmap = get_dev_pagemap(pfn, pgmap); if (unlikely(!pgmap)) { - undo_dev_pagemap(nr, nr_start, pages); + undo_dev_pagemap(nr, nr_start, flags, pages); return 0; } SetPageReferenced(page); @@ -2001,7 +2002,7 @@ static int __gup_device_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, return 0; if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) { - undo_dev_pagemap(nr, nr_start, pages); + undo_dev_pagemap(nr, nr_start, flags, pages); return 0; } return 1; @@ -2019,7 +2020,7 @@ static int __gup_device_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, return 0; if (unlikely(pud_val(orig) != pud_val(*pudp))) { - undo_dev_pagemap(nr, nr_start, pages); + undo_dev_pagemap(nr, nr_start, flags, pages); return 0; } return 1; @@ -2053,7 +2054,7 @@ static int record_subpages(struct page *page, unsigned long addr, return nr; } -static void put_compound_head(struct page *page, int refs) +static void put_compound_head(struct page *page, int refs, unsigned int flags) { VM_BUG_ON_PAGE(page_ref_count(page) < refs, page); /* @@ -2103,7 +2104,7 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, return 0; if (unlikely(pte_val(pte) != pte_val(*ptep))) { - put_compound_head(head, refs); + put_compound_head(head, refs, flags); return 0; } @@ -2163,7 +2164,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, return 0; if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) { - put_compound_head(head, refs); + put_compound_head(head, refs, flags); return 0; } @@ -2197,7 +2198,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, return 0; if (unlikely(pud_val(orig) != pud_val(*pudp))) { - put_compound_head(head, refs); + put_compound_head(head, refs, flags); return 0; } @@ -2226,7 +2227,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr, return 0; if (unlikely(pgd_val(orig) != pgd_val(*pgdp))) { - put_compound_head(head, refs); + put_compound_head(head, refs, flags); return 0; } From patchwork Fri Feb 7 03:37:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 208806 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CC3F4C3B188 for ; Fri, 7 Feb 2020 03:38:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A271921927 for ; Fri, 7 Feb 2020 03:38:28 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="BCVsa9Vg" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727561AbgBGDhk (ORCPT ); Thu, 6 Feb 2020 22:37:40 -0500 Received: from hqnvemgate25.nvidia.com ([216.228.121.64]:13345 "EHLO hqnvemgate25.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727527AbgBGDhj (ORCPT ); Thu, 6 Feb 2020 22:37:39 -0500 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqnvemgate25.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Thu, 06 Feb 2020 19:37:12 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Thu, 06 Feb 2020 19:37:37 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Thu, 06 Feb 2020 19:37:37 -0800 Received: from HQMAIL107.nvidia.com (172.20.187.13) by HQMAIL109.nvidia.com (172.20.187.15) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Fri, 7 Feb 2020 03:37:37 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Fri, 7 Feb 2020 03:37:37 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7, 5, 8, 10121) id ; Thu, 06 Feb 2020 19:37:36 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Christoph Hellwig , Dan Williams , Dave Chinner , Ira Weiny , Jan Kara , Jason Gunthorpe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , "Kirill A . Shutemov" , Michal Hocko , "Mike Kravetz" , Shuah Khan , "Vlastimil Babka" , Matthew Wilcox , , , , , , LKML , John Hubbard , "Kirill A . Shutemov" Subject: [PATCH v5 06/12] mm/gup: require FOLL_GET for get_user_pages_fast() Date: Thu, 6 Feb 2020 19:37:29 -0800 Message-ID: <20200207033735.308000-7-jhubbard@nvidia.com> X-Mailer: git-send-email 2.25.0 In-Reply-To: <20200207033735.308000-1-jhubbard@nvidia.com> References: <20200207033735.308000-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1581046633; bh=zdKJN+80LnhrM42H0YcEzysuucpfBmwOwmXaeST0eWY=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=BCVsa9Vg/FLx131VkBwVusgApBWD3rXCMU4HwFYgqK+JHpePFTfpxlV7oG9uqvwoD eFInmWS/uaWR2xl9VFdCBWhGrmcdqU43SUDY+MjXrnqfaP1+yCtpfUdgqU2hzEGK3j OMpaRT/Yiw0G3m/vV1RCf5Xl1oK5n4Q587r5K2jjOC9vErW93juZY6sOFBaqP+MZok 7EdA7Ui/FblkaCwg16abHqUYqltQJA7CKenmPCbrJteRfvVY/d+b5anez4HgL/09d9 yLGxVuVVc7GjMkopOFdmOp6vT2zV+F4i1Lx9L2gWpSQfJxkLl+Xmb7ViCvcDclCSnF xK8ZrUQ/PlMtQ== Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org Internal to mm/gup.c, require that get_user_pages_fast() and __get_user_pages_fast() identify themselves, by setting FOLL_GET. This is required in order to be able to make decisions based on "FOLL_PIN, or FOLL_GET, or both or neither are set", in upcoming patches. Acked-by: Kirill A. Shutemov Reviewed-by: Jan Kara Signed-off-by: John Hubbard --- mm/gup.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index e5f75e886663..c8affbea2019 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2390,6 +2390,14 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write, unsigned long len, end; unsigned long flags; int nr = 0; + /* + * Internally (within mm/gup.c), gup fast variants must set FOLL_GET, + * because gup fast is always a "pin with a +1 page refcount" request. + */ + unsigned int gup_flags = FOLL_GET; + + if (write) + gup_flags |= FOLL_WRITE; start = untagged_addr(start) & PAGE_MASK; len = (unsigned long) nr_pages << PAGE_SHIFT; @@ -2415,7 +2423,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write, if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) && gup_fast_permitted(start, end)) { local_irq_save(flags); - gup_pgd_range(start, end, write ? FOLL_WRITE : 0, pages, &nr); + gup_pgd_range(start, end, gup_flags, pages, &nr); local_irq_restore(flags); } @@ -2454,7 +2462,7 @@ static int internal_get_user_pages_fast(unsigned long start, int nr_pages, int nr = 0, ret = 0; if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM | - FOLL_FORCE | FOLL_PIN))) + FOLL_FORCE | FOLL_PIN | FOLL_GET))) return -EINVAL; start = untagged_addr(start) & PAGE_MASK; @@ -2521,6 +2529,13 @@ int get_user_pages_fast(unsigned long start, int nr_pages, if (WARN_ON_ONCE(gup_flags & FOLL_PIN)) return -EINVAL; + /* + * The caller may or may not have explicitly set FOLL_GET; either way is + * OK. However, internally (within mm/gup.c), gup fast variants must set + * FOLL_GET, because gup fast is always a "pin with a +1 page refcount" + * request. + */ + gup_flags |= FOLL_GET; return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages); } EXPORT_SYMBOL_GPL(get_user_pages_fast); From patchwork Fri Feb 7 03:37:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 208809 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5FD72C35247 for ; Fri, 7 Feb 2020 03:37:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 29EF522B48 for ; Fri, 7 Feb 2020 03:37:43 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="ht4ic/A/" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727599AbgBGDhm (ORCPT ); Thu, 6 Feb 2020 22:37:42 -0500 Received: from hqnvemgate24.nvidia.com ([216.228.121.143]:4797 "EHLO hqnvemgate24.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727579AbgBGDhl (ORCPT ); Thu, 6 Feb 2020 22:37:41 -0500 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqnvemgate24.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Thu, 06 Feb 2020 19:36:38 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Thu, 06 Feb 2020 19:37:37 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Thu, 06 Feb 2020 19:37:37 -0800 Received: from HQMAIL107.nvidia.com (172.20.187.13) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Fri, 7 Feb 2020 03:37:37 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Fri, 7 Feb 2020 03:37:37 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7, 5, 8, 10121) id ; Thu, 06 Feb 2020 19:37:37 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Christoph Hellwig , Dan Williams , Dave Chinner , Ira Weiny , Jan Kara , Jason Gunthorpe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , "Kirill A . Shutemov" , Michal Hocko , Mike Kravetz , Shuah Khan , Vlastimil Babka , Matthew Wilcox , , , , , , LKML , John Hubbard , "Kirill A . Shutemov" Subject: [PATCH v5 08/12] mm/gup: page->hpage_pinned_refcount: exact pin counts for huge pages Date: Thu, 6 Feb 2020 19:37:31 -0800 Message-ID: <20200207033735.308000-9-jhubbard@nvidia.com> X-Mailer: git-send-email 2.25.0 In-Reply-To: <20200207033735.308000-1-jhubbard@nvidia.com> References: <20200207033735.308000-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1581046598; bh=OT3RBMIOTYuZ2ESmgRWT99LM4mnzHujzb1/CcKG658o=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=ht4ic/A/e4J6w/cRlZWMIxKMARsRDdIyh21phulzTKPskRX3oak75qEZ7sR2zLEUd gGQUwLpOsYd8wPGhdTjL09EoTJD/FscKvCovOT4gSPicwCVhX+We73dMF/27d1L14R 5MUebqyGLOv1yqkLc2KVfp/uTkzdU6AgPgT4JPppggmecW8dZc218wK73reD4gRyF3 mS0mxmp118ba0pQobWP8DQamwXN8LRrsAigG56udLaIhbhC8h+WDFuYJMc142pYf0u KwhJ8y0X1kp8Jkc+Lqkq7qm8xYhThXujDhKBaxXBK+XVGYOIQmAPjYA2fQrL9fZXOL bJY37B1EtB1xQ== Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org For huge pages (and in fact, any compound page), the GUP_PIN_COUNTING_BIAS scheme tends to overflow too easily, each tail page increments the head page->_refcount by GUP_PIN_COUNTING_BIAS (1024). That limits the number of huge pages that can be pinned. This patch removes that limitation, by using an exact form of pin counting for compound pages of order > 1. The "order > 1" is required because this approach uses the 3rd struct page in the compound page, and order 1 compound pages only have two pages, so that won't work there. A new struct page field, hpage_pinned_refcount, has been added, replacing a padding field in the union (so no new space is used). This enhancement also has a useful side effect: huge pages and compound pages (of order > 1) do not suffer from the "potential false positives" problem that is discussed in the page_dma_pinned() comment block. That is because these compound pages have extra space for tracking things, so they get exact pin counts instead of overloading page->_refcount. Documentation/core-api/pin_user_pages.rst is updated accordingly. Acked-by: Kirill A. Shutemov Reviewed-by: Jan Kara Suggested-by: Jan Kara Signed-off-by: John Hubbard --- Documentation/core-api/pin_user_pages.rst | 40 +++++------- include/linux/mm.h | 26 ++++++++ include/linux/mm_types.h | 7 +- mm/gup.c | 78 ++++++++++++++++++++--- mm/hugetlb.c | 6 ++ mm/page_alloc.c | 2 + mm/rmap.c | 6 ++ 7 files changed, 133 insertions(+), 32 deletions(-) diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst index 9829345428f8..7e5dd8b1b3f2 100644 --- a/Documentation/core-api/pin_user_pages.rst +++ b/Documentation/core-api/pin_user_pages.rst @@ -52,8 +52,22 @@ Which flags are set by each wrapper For these pin_user_pages*() functions, FOLL_PIN is OR'd in with whatever gup flags the caller provides. The caller is required to pass in a non-null struct -pages* array, and the function then pin pages by incrementing each by a special -value. For now, that value is +1, just like get_user_pages*().:: +pages* array, and the function then pins pages by incrementing each by a special +value: GUP_PIN_COUNTING_BIAS. + +For huge pages (and in fact, any compound page of more than 2 pages), the +GUP_PIN_COUNTING_BIAS scheme is not used. Instead, an exact form of pin counting +is achieved, by using the 3rd struct page in the compound page. A new struct +page field, hpage_pinned_refcount, has been added in order to support this. + +This approach for compound pages avoids the counting upper limit problems that +are discussed below. Those limitations would have been aggravated severely by +huge pages, because each tail page adds a refcount to the head page. And in +fact, testing revealed that, without a separate hpage_pinned_refcount field, +page overflows were seen in some huge page stress tests. + +This also means that huge pages and compound pages (of order > 1) do not suffer +from the false positives problem that is mentioned below.:: Function -------- @@ -99,27 +113,6 @@ pages: This also leads to limitations: there are only 31-10==21 bits available for a counter that increments 10 bits at a time. -TODO: for 1GB and larger huge pages, this is cutting it close. That's because -when pin_user_pages() follows such pages, it increments the head page by "1" -(where "1" used to mean "+1" for get_user_pages(), but now means "+1024" for -pin_user_pages()) for each tail page. So if you have a 1GB huge page: - -* There are 256K (18 bits) worth of 4 KB tail pages. -* There are 21 bits available to count up via GUP_PIN_COUNTING_BIAS (that is, - 10 bits at a time) -* There are 21 - 18 == 3 bits available to count. Except that there aren't, - because you need to allow for a few normal get_page() calls on the head page, - as well. Fortunately, the approach of using addition, rather than "hard" - bitfields, within page->_refcount, allows for sharing these bits gracefully. - But we're still looking at about 8 references. - -This, however, is a missing feature more than anything else, because it's easily -solved by addressing an obvious inefficiency in the original get_user_pages() -approach of retrieving pages: stop treating all the pages as if they were -PAGE_SIZE. Retrieve huge pages as huge pages. The callers need to be aware of -this, so some work is required. Once that's in place, this limitation mostly -disappears from view, because there will be ample refcounting range available. - * Callers must specifically request "dma-pinned tracking of pages". In other words, just calling get_user_pages() will not suffice; a new set of functions, pin_user_page() and related, must be used. @@ -228,5 +221,6 @@ References * `Some slow progress on get_user_pages() (Apr 2, 2019) `_ * `DMA and get_user_pages() (LPC: Dec 12, 2018) `_ * `The trouble with get_user_pages() (Apr 30, 2018) `_ +* `LWN kernel index: get_user_pages() `_ John Hubbard, October, 2019 diff --git a/include/linux/mm.h b/include/linux/mm.h index 8d4f9f4094f4..2f9ca976402b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -770,6 +770,24 @@ static inline unsigned int compound_order(struct page *page) return page[1].compound_order; } +static inline bool hpage_pincount_available(struct page *page) +{ + /* + * Can the page->hpage_pinned_refcount field be used? That field is in + * the 3rd page of the compound page, so the smallest (2-page) compound + * pages cannot support it. + */ + page = compound_head(page); + return PageCompound(page) && compound_order(page) > 1; +} + +static inline int compound_pincount(struct page *page) +{ + VM_BUG_ON_PAGE(!hpage_pincount_available(page), page); + page = compound_head(page); + return atomic_read(compound_pincount_ptr(page)); +} + static inline void set_compound_order(struct page *page, unsigned int order) { page[1].compound_order = order; @@ -1084,6 +1102,11 @@ void unpin_user_pages(struct page **pages, unsigned long npages); * refcounts, and b) all the callers of this routine are expected to be able to * deal gracefully with a false positive. * + * For huge pages, the result will be exactly correct. That's because we have + * more tracking data available: the 3rd struct page in the compound page is + * used to track the pincount (instead using of the GUP_PIN_COUNTING_BIAS + * scheme). + * * For more information, please see Documentation/vm/pin_user_pages.rst. * * @page: pointer to page to be queried. @@ -1092,6 +1115,9 @@ void unpin_user_pages(struct page **pages, unsigned long npages); */ static inline bool page_maybe_dma_pinned(struct page *page) { + if (hpage_pincount_available(page)) + return compound_pincount(page) > 0; + /* * page_ref_count() is signed. If that refcount overflows, then * page_ref_count() returns a negative value, and callers will avoid diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index c28911c3afa8..dd555e6d23f3 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -137,7 +137,7 @@ struct page { }; struct { /* Second tail page of compound page */ unsigned long _compound_pad_1; /* compound_head */ - unsigned long _compound_pad_2; + atomic_t hpage_pinned_refcount; /* For both global and memcg */ struct list_head deferred_list; }; @@ -226,6 +226,11 @@ static inline atomic_t *compound_mapcount_ptr(struct page *page) return &page[1].compound_mapcount; } +static inline atomic_t *compound_pincount_ptr(struct page *page) +{ + return &page[2].hpage_pinned_refcount; +} + /* * Used for sizing the vmemmap region on some architectures */ diff --git a/mm/gup.c b/mm/gup.c index a2356482e1ea..4d0d94405639 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -29,6 +29,22 @@ struct follow_page_context { unsigned int page_mask; }; +static void hpage_pincount_add(struct page *page, int refs) +{ + VM_BUG_ON_PAGE(!hpage_pincount_available(page), page); + VM_BUG_ON_PAGE(page != compound_head(page), page); + + atomic_add(refs, compound_pincount_ptr(page)); +} + +static void hpage_pincount_sub(struct page *page, int refs) +{ + VM_BUG_ON_PAGE(!hpage_pincount_available(page), page); + VM_BUG_ON_PAGE(page != compound_head(page), page); + + atomic_sub(refs, compound_pincount_ptr(page)); +} + /* * Return the compound head page with ref appropriately incremented, * or NULL if that failed. @@ -70,8 +86,25 @@ static __maybe_unused struct page *try_grab_compound_head(struct page *page, if (flags & FOLL_GET) return try_get_compound_head(page, refs); else if (flags & FOLL_PIN) { - refs *= GUP_PIN_COUNTING_BIAS; - return try_get_compound_head(page, refs); + /* + * When pinning a compound page of order > 1 (which is what + * hpage_pincount_available() checks for), use an exact count to + * track it, via hpage_pincount_add/_sub(). + * + * However, be sure to *also* increment the normal page refcount + * field at least once, so that the page really is pinned. + */ + if (!hpage_pincount_available(page)) + refs *= GUP_PIN_COUNTING_BIAS; + + page = try_get_compound_head(page, refs); + if (!page) + return NULL; + + if (hpage_pincount_available(page)) + hpage_pincount_add(page, refs); + + return page; } WARN_ON_ONCE(1); @@ -106,12 +139,25 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags) if (flags & FOLL_GET) return try_get_page(page); else if (flags & FOLL_PIN) { + int refs = 1; + page = compound_head(page); if (WARN_ON_ONCE(page_ref_count(page) <= 0)) return false; - page_ref_add(page, GUP_PIN_COUNTING_BIAS); + if (hpage_pincount_available(page)) + hpage_pincount_add(page, 1); + else + refs = GUP_PIN_COUNTING_BIAS; + + /* + * Similar to try_grab_compound_head(): even if using the + * hpage_pincount_add/_sub() routines, be sure to + * *also* increment the normal page refcount field at least + * once, so that the page really is pinned. + */ + page_ref_add(page, refs); } return true; @@ -120,12 +166,17 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags) #ifdef CONFIG_DEV_PAGEMAP_OPS static bool __unpin_devmap_managed_user_page(struct page *page) { - int count; + int count, refs = 1; if (!page_is_devmap_managed(page)) return false; - count = page_ref_sub_return(page, GUP_PIN_COUNTING_BIAS); + if (hpage_pincount_available(page)) + hpage_pincount_sub(page, 1); + else + refs = GUP_PIN_COUNTING_BIAS; + + count = page_ref_sub_return(page, refs); /* * devmap page refcounts are 1-based, rather than 0-based: if @@ -157,6 +208,8 @@ static bool __unpin_devmap_managed_user_page(struct page *page) */ void unpin_user_page(struct page *page) { + int refs = 1; + page = compound_head(page); /* @@ -168,7 +221,12 @@ void unpin_user_page(struct page *page) if (__unpin_devmap_managed_user_page(page)) return; - if (page_ref_sub_and_test(page, GUP_PIN_COUNTING_BIAS)) + if (hpage_pincount_available(page)) + hpage_pincount_sub(page, 1); + else + refs = GUP_PIN_COUNTING_BIAS; + + if (page_ref_sub_and_test(page, refs)) __put_page(page); } EXPORT_SYMBOL(unpin_user_page); @@ -2200,8 +2258,12 @@ static int record_subpages(struct page *page, unsigned long addr, static void put_compound_head(struct page *page, int refs, unsigned int flags) { - if (flags & FOLL_PIN) - refs *= GUP_PIN_COUNTING_BIAS; + if (flags & FOLL_PIN) { + if (hpage_pincount_available(page)) + hpage_pincount_sub(page, refs); + else + refs *= GUP_PIN_COUNTING_BIAS; + } VM_BUG_ON_PAGE(page_ref_count(page) < refs, page); /* diff --git a/mm/hugetlb.c b/mm/hugetlb.c index ba1de6bc1402..3d31a235b53d 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1009,6 +1009,9 @@ static void destroy_compound_gigantic_page(struct page *page, struct page *p = page + 1; atomic_set(compound_mapcount_ptr(page), 0); + if (hpage_pincount_available(page)) + atomic_set(compound_pincount_ptr(page), 0); + for (i = 1; i < nr_pages; i++, p = mem_map_next(p, page, i)) { clear_compound_head(p); set_page_refcounted(p); @@ -1287,6 +1290,9 @@ static void prep_compound_gigantic_page(struct page *page, unsigned int order) set_compound_head(p, page); } atomic_set(compound_mapcount_ptr(page), -1); + + if (hpage_pincount_available(page)) + atomic_set(compound_pincount_ptr(page), 0); } /* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3c4eb750a199..b2fe61035b7a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -689,6 +689,8 @@ void prep_compound_page(struct page *page, unsigned int order) set_compound_head(p, page); } atomic_set(compound_mapcount_ptr(page), -1); + if (hpage_pincount_available(page)) + atomic_set(compound_pincount_ptr(page), 0); } #ifdef CONFIG_DEBUG_PAGEALLOC diff --git a/mm/rmap.c b/mm/rmap.c index b3e381919835..e45b9b991e2f 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1178,6 +1178,9 @@ void page_add_new_anon_rmap(struct page *page, VM_BUG_ON_PAGE(!PageTransHuge(page), page); /* increment count (starts at -1) */ atomic_set(compound_mapcount_ptr(page), 0); + if (hpage_pincount_available(page)) + atomic_set(compound_pincount_ptr(page), 0); + __inc_node_page_state(page, NR_ANON_THPS); } else { /* Anon THP always mapped first with PMD */ @@ -1974,6 +1977,9 @@ void hugepage_add_new_anon_rmap(struct page *page, { BUG_ON(address < vma->vm_start || address >= vma->vm_end); atomic_set(compound_mapcount_ptr(page), 0); + if (hpage_pincount_available(page)) + atomic_set(compound_pincount_ptr(page), 0); + __page_set_anon_rmap(page, vma, address, 1); } #endif /* CONFIG_HUGETLB_PAGE */ From patchwork Fri Feb 7 03:37:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: John Hubbard X-Patchwork-Id: 208804 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86990C35247 for ; Fri, 7 Feb 2020 03:38:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 5CA8221927 for ; Fri, 7 Feb 2020 03:38:47 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="ef/4udRo" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727845AbgBGDil (ORCPT ); Thu, 6 Feb 2020 22:38:41 -0500 Received: from hqnvemgate24.nvidia.com ([216.228.121.143]:4751 "EHLO hqnvemgate24.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726709AbgBGDhj (ORCPT ); Thu, 6 Feb 2020 22:37:39 -0500 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqnvemgate24.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Thu, 06 Feb 2020 19:36:38 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Thu, 06 Feb 2020 19:37:38 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Thu, 06 Feb 2020 19:37:38 -0800 Received: from HQMAIL111.nvidia.com (172.20.187.18) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Fri, 7 Feb 2020 03:37:37 +0000 Received: from hqnvemgw03.nvidia.com (10.124.88.68) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Fri, 7 Feb 2020 03:37:37 +0000 Received: from blueforge.nvidia.com (Not Verified[10.110.48.28]) by hqnvemgw03.nvidia.com with Trustwave SEG (v7, 5, 8, 10121) id ; Thu, 06 Feb 2020 19:37:37 -0800 From: John Hubbard To: Andrew Morton CC: Al Viro , Christoph Hellwig , Dan Williams , Dave Chinner , Ira Weiny , Jan Kara , Jason Gunthorpe , Jonathan Corbet , =?utf-8?b?SsOpcsO0bWUgR2xp?= =?utf-8?q?sse?= , "Kirill A . Shutemov" , Michal Hocko , Mike Kravetz , Shuah Khan , Vlastimil Babka , Matthew Wilcox , , , , , , LKML , John Hubbard , "Kirill A . Shutemov" Subject: [PATCH v5 09/12] mm: dump_page(): better diagnostics for huge pinned pages Date: Thu, 6 Feb 2020 19:37:32 -0800 Message-ID: <20200207033735.308000-10-jhubbard@nvidia.com> X-Mailer: git-send-email 2.25.0 In-Reply-To: <20200207033735.308000-1-jhubbard@nvidia.com> References: <20200207033735.308000-1-jhubbard@nvidia.com> MIME-Version: 1.0 X-NVConfidentiality: public DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1581046599; bh=jmOUjzJ549EfFTXaNo+IqA1m0L9Edt0hTMtlZOuZ9oI=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:MIME-Version:X-NVConfidentiality: Content-Transfer-Encoding:Content-Type; b=ef/4udRoouVf31SNt8dxH8rGemdY2KQ3FUhL1foXTZbDgzP4IyhLK4YjOkoH8c4J7 vZIBhMUtt/Q4ked+2xEBrgqdgHxMVPEsMO+ZlJHNnQBUA0b/dN7rEQuPkTfxV5Iz+m YwMe+ClfugPn1gCR754PWzgyqnw3AwyYFOcqRCqkmf5o2pBbdcO5A8xjcfl4on2s17 zxAj6h1KNcrH4VfNdeKRUNav6fFaca+k9pEYjlc5vne5iyY9kswBj9/R7pYLkzXyQ4 Vp5jnRVUcRp0yCDbFslLV8IXO7B+toifjZo7qM2L5LOadtrhiHvkBEE9Lq9NvmG1A3 7txFYKNBc41Kg== Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org As part of pin_user_pages() and related API calls, pages are "dma-pinned". For the case of compound pages of order > 1, the per-page accounting of dma pins is accomplished via the 3rd struct page in the compound page. In order to support debugging of any pin_user_pages()- related problems, enhance dump_page() so as to report the pin count in that case. Documentation/core-api/pin_user_pages.rst is also updated accordingly. Acked-by: Kirill A. Shutemov Reviewed-by: Jan Kara Signed-off-by: John Hubbard --- Documentation/core-api/pin_user_pages.rst | 7 +++++ mm/debug.c | 34 +++++++++++++++++------ 2 files changed, 33 insertions(+), 8 deletions(-) diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst index 7e5dd8b1b3f2..24641f1a1eba 100644 --- a/Documentation/core-api/pin_user_pages.rst +++ b/Documentation/core-api/pin_user_pages.rst @@ -215,6 +215,13 @@ Those are both going to show zero, unless CONFIG_DEBUG_VM is set. This is because there is a noticeable performance drop in unpin_user_page(), when they are activated. +Other diagnostics +================= + +dump_page() has been enhanced slightly, to handle these new counting fields, and +to better report on compound pages in general. Specifically, for compound pages +with order > 1, the exact (hpage_pinned_refcount) pincount is reported. + References ========== diff --git a/mm/debug.c b/mm/debug.c index f074077eee11..e82c878c27df 100644 --- a/mm/debug.c +++ b/mm/debug.c @@ -57,10 +57,20 @@ static void __dump_tail_page(struct page *page, int mapcount) page, page_ref_count(page), mapcount, page->mapping, page_to_pgoff(page)); } else { - pr_warn("page:%px compound refcount:%d mapcount:%d mapping:%px " - "index:%#lx compound_mapcount:%d\n", - page, page_ref_count(head), mapcount, head->mapping, - page_to_pgoff(head), compound_mapcount(page)); + if (hpage_pincount_available(page)) + pr_warn("page:%px compound refcount:%d mapcount:%d " + "mapping:%px index:%#lx compound_mapcount:%d " + "compound_pincount:%d\n", + page, page_ref_count(head), mapcount, + head->mapping, page_to_pgoff(head), + compound_mapcount(page), + compound_pincount(page)); + else + pr_warn("page:%px compound refcount:%d mapcount:%d " + "mapping:%px index:%#lx compound_mapcount:%d\n", + page, page_ref_count(head), mapcount, + head->mapping, page_to_pgoff(head), + compound_mapcount(page)); } if (page_ref_count(page) != 0) { @@ -104,10 +114,18 @@ void __dump_page(struct page *page, const char *reason) if (PageTail(page)) __dump_tail_page(page, mapcount); - else - pr_warn("page:%px refcount:%d mapcount:%d mapping:%px index:%#lx\n", - page, page_ref_count(page), mapcount, - page->mapping, page_to_pgoff(page)); + else { + if (hpage_pincount_available(page)) + pr_warn("page:%px refcount:%d mapcount:%d mapping:%px " + "index:%#lx compound pincount: %d\n", + page, page_ref_count(page), mapcount, + page->mapping, page_to_pgoff(page), + compound_pincount(page)); + else + pr_warn("page:%px refcount:%d mapcount:%d mapping:%px " + "index:%#lx\n", page, page_ref_count(page), + mapcount, page->mapping, page_to_pgoff(page)); + } if (PageKsm(page)) type = "ksm "; else if (PageAnon(page))