From patchwork Fri Jun 19 13:24:19 2020
X-Patchwork-Submitter: Pasha Tatashin
X-Patchwork-Id: 224278
From: Pavel Tatashin
To: stable@vger.kernel.org, akpm@linux-foundation.org, mhocko@suse.com,
    dan.j.williams@intel.com, shile.zhang@linux.alibaba.com,
    daniel.m.jordan@oracle.com, pasha.tatashin@soleen.com,
    ktkhai@virtuozzo.com, david@redhat.com, jmorris@namei.org,
    sashal@kernel.org, vbabka@suse.cz, gregkh@linuxfoundation.org,
    torvalds@linux-foundation.org
Subject: [PATCH 4.19 1/7] mm: drop meminit_pfn_in_nid as it is redundant
Date: Fri, 19 Jun 2020 09:24:19 -0400
Message-Id: <20200619132425.425063-1-pasha.tatashin@soleen.com>

From: Alexander Duyck

commit 56ec43d8b02719402c9fcf984feb52ec2300f8a5 upstream.

As best as I can tell, the meminit_pfn_in_nid call is completely
redundant.  The deferred memory initialization is already making use of
for_each_free_mem_range, which in turn calls into __next_mem_range,
which will only return a memory range if it matches the node ID
provided, assuming it is not NUMA_NO_NODE.

I am operating on the assumption that there are no zones or pg_data_t
structures that have a NUMA node of NUMA_NO_NODE associated with them.
If that is the case, then __next_mem_range will never return a memory
range that doesn't match the zone's node ID, and as such the check is
redundant.

One piece I would like to verify is whether this works for ia64.
Technically it was using a different approach to get the node ID, but it
seems to have the node ID also encoded into the memblock.  So I am
assuming this is okay, but would like to get confirmation on that.

On my x86_64 test system with 384GB of memory per node I saw a reduction
in initialization time from 2.80s to 1.85s as a result of this patch.
Link: http://lkml.kernel.org/r/20190405221219.12227.93957.stgit@localhost.localdomain
Signed-off-by: Alexander Duyck
Reviewed-by: Pavel Tatashin
Acked-by: Michal Hocko
Cc: Mike Rapoport
Cc: Dan Williams
Cc: Dave Jiang
Cc: David S. Miller
Cc: Ingo Molnar
Cc: Khalid Aziz
Cc: "Kirill A. Shutemov"
Cc: Laurent Dufour
Cc: Matthew Wilcox
Cc: Mel Gorman
Cc: Mike Rapoport
Cc: Pavel Tatashin
Cc: Vlastimil Babka
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Pavel Tatashin
---
 mm/page_alloc.c | 51 ++++++++++++++-----------------------------
 1 file changed, 14 insertions(+), 37 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d8c3051387d1..c86a117acb5b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1321,36 +1321,22 @@ int __meminit early_pfn_to_nid(unsigned long pfn)
 #endif

 #ifdef CONFIG_NODES_SPAN_OTHER_NODES
-static inline bool __meminit __maybe_unused
-meminit_pfn_in_nid(unsigned long pfn, int node,
-		   struct mminit_pfnnid_cache *state)
+/* Only safe to use early in boot when initialisation is single-threaded */
+static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
 {
 	int nid;

-	nid = __early_pfn_to_nid(pfn, state);
+	nid = __early_pfn_to_nid(pfn, &early_pfnnid_cache);
 	if (nid >= 0 && nid != node)
 		return false;
 	return true;
 }

-/* Only safe to use early in boot when initialisation is single-threaded */
-static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
-{
-	return meminit_pfn_in_nid(pfn, node, &early_pfnnid_cache);
-}
-
 #else
-
 static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
 {
 	return true;
 }
-static inline bool __meminit __maybe_unused
-meminit_pfn_in_nid(unsigned long pfn, int node,
-		   struct mminit_pfnnid_cache *state)
-{
-	return true;
-}
 #endif

@@ -1480,21 +1466,13 @@ static inline void __init pgdat_init_report_one_done(void)
  *
  * Then, we check if a current large page is valid by only checking the validity
  * of the head pfn.
- *
- * Finally, meminit_pfn_in_nid is checked on systems where pfns can interleave
- * within a node: a pfn is between start and end of a node, but does not belong
- * to this memory node.
  */
-static inline bool __init
-deferred_pfn_valid(int nid, unsigned long pfn,
-		   struct mminit_pfnnid_cache *nid_init_state)
+static inline bool __init deferred_pfn_valid(unsigned long pfn)
 {
 	if (!pfn_valid_within(pfn))
 		return false;
 	if (!(pfn & (pageblock_nr_pages - 1)) && !pfn_valid(pfn))
 		return false;
-	if (!meminit_pfn_in_nid(pfn, nid, nid_init_state))
-		return false;
 	return true;
 }

@@ -1502,15 +1480,14 @@ deferred_pfn_valid(int nid, unsigned long pfn,
  * Free pages to buddy allocator. Try to free aligned pages in
  * pageblock_nr_pages sizes.
  */
-static void __init deferred_free_pages(int nid, int zid, unsigned long pfn,
+static void __init deferred_free_pages(unsigned long pfn,
 				       unsigned long end_pfn)
 {
-	struct mminit_pfnnid_cache nid_init_state = { };
 	unsigned long nr_pgmask = pageblock_nr_pages - 1;
 	unsigned long nr_free = 0;

 	for (; pfn < end_pfn; pfn++) {
-		if (!deferred_pfn_valid(nid, pfn, &nid_init_state)) {
+		if (!deferred_pfn_valid(pfn)) {
 			deferred_free_range(pfn - nr_free, nr_free);
 			nr_free = 0;
 		} else if (!(pfn & nr_pgmask)) {
@@ -1530,17 +1507,18 @@ static void __init deferred_free_pages(int nid, int zid, unsigned long pfn,
  * by performing it only once every pageblock_nr_pages.
  * Return number of pages initialized.
  */
-static unsigned long __init deferred_init_pages(int nid, int zid,
+static unsigned long __init deferred_init_pages(struct zone *zone,
 						unsigned long pfn,
 						unsigned long end_pfn)
 {
-	struct mminit_pfnnid_cache nid_init_state = { };
 	unsigned long nr_pgmask = pageblock_nr_pages - 1;
+	int nid = zone_to_nid(zone);
 	unsigned long nr_pages = 0;
+	int zid = zone_idx(zone);
 	struct page *page = NULL;

 	for (; pfn < end_pfn; pfn++) {
-		if (!deferred_pfn_valid(nid, pfn, &nid_init_state)) {
+		if (!deferred_pfn_valid(pfn)) {
 			page = NULL;
 			continue;
 		} else if (!page || !(pfn & nr_pgmask)) {
@@ -1603,12 +1581,12 @@ static int __init deferred_init_memmap(void *data)
 	for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
 		spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
 		epfn = min_t(unsigned long, zone_end_pfn(zone), PFN_DOWN(epa));
-		nr_pages += deferred_init_pages(nid, zid, spfn, epfn);
+		nr_pages += deferred_init_pages(zone, spfn, epfn);
 	}
 	for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
 		spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
 		epfn = min_t(unsigned long, zone_end_pfn(zone), PFN_DOWN(epa));
-		deferred_free_pages(nid, zid, spfn, epfn);
+		deferred_free_pages(spfn, epfn);
 	}
 	pgdat_resize_unlock(pgdat, &flags);
@@ -1640,7 +1618,6 @@ static int __init deferred_init_memmap(void *data)
 static noinline bool __init
 deferred_grow_zone(struct zone *zone, unsigned int order)
 {
-	int zid = zone_idx(zone);
 	int nid = zone_to_nid(zone);
 	pg_data_t *pgdat = NODE_DATA(nid);
 	unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
@@ -1690,7 +1667,7 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
 	while (spfn < epfn && nr_pages < nr_pages_needed) {
 		t = ALIGN(spfn + PAGES_PER_SECTION, PAGES_PER_SECTION);
 		first_deferred_pfn = min(t, epfn);
-		nr_pages += deferred_init_pages(nid, zid, spfn,
+		nr_pages += deferred_init_pages(zone, spfn,
 						first_deferred_pfn);
 		spfn = first_deferred_pfn;
 	}
@@ -1702,7 +1679,7 @@
deferred_grow_zone(struct zone *zone, unsigned int order)
 	for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
 		spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
 		epfn = min_t(unsigned long, first_deferred_pfn, PFN_DOWN(epa));
-		deferred_free_pages(nid, zid, spfn, epfn);
+		deferred_free_pages(spfn, epfn);

 		if (first_deferred_pfn == epfn)
 			break;

From patchwork Fri Jun 19 13:24:23 2020
X-Patchwork-Submitter: Pasha Tatashin
X-Patchwork-Id: 224277
From: Pavel Tatashin
To: stable@vger.kernel.org, akpm@linux-foundation.org, mhocko@suse.com,
    dan.j.williams@intel.com, shile.zhang@linux.alibaba.com,
    daniel.m.jordan@oracle.com, pasha.tatashin@soleen.com,
    ktkhai@virtuozzo.com, david@redhat.com, jmorris@namei.org,
    sashal@kernel.org, vbabka@suse.cz, gregkh@linuxfoundation.org,
    torvalds@linux-foundation.org
Subject: [PATCH 4.19 5/7] mm/pagealloc.c: call touch_nmi_watchdog() on max order boundaries in deferred init
Date: Fri, 19 Jun 2020 09:24:23 -0400
Message-Id: <20200619132425.425063-5-pasha.tatashin@soleen.com>
In-Reply-To: <20200619132425.425063-1-pasha.tatashin@soleen.com>
References: <20200619132425.425063-1-pasha.tatashin@soleen.com>

From: Daniel Jordan

commit 117003c32771df617acf66e140fbdbdeb0ac71f5 upstream.

Patch series "initialize deferred pages with interrupts enabled", v4.

Keep interrupts enabled during deferred page initialization in order to
make the code more modular and allow jiffies to update.

The original approach and discussion can be found here:
http://lkml.kernel.org/r/20200311123848.118638-1-shile.zhang@linux.alibaba.com

This patch (of 3):

deferred_init_memmap() disables interrupts the entire time, so it calls
touch_nmi_watchdog() periodically to avoid soft lockup splats.  Soon it
will run with interrupts enabled, at which point cond_resched() should
be used instead.

deferred_grow_zone() makes the same watchdog calls through code shared
with deferred init, but will continue to run with interrupts disabled,
so it can't call cond_resched().
Pull the watchdog calls up to these two places to allow the first to be
changed later, independently of the second.  The frequency reduces from
twice per pageblock (init and free) to once per max order block.

Fixes: 3a2d7fa8a3d5 ("mm: disable interrupts while initializing deferred pages")
Signed-off-by: Daniel Jordan
Signed-off-by: Pavel Tatashin
Signed-off-by: Andrew Morton
Reviewed-by: David Hildenbrand
Acked-by: Michal Hocko
Acked-by: Vlastimil Babka
Cc: Dan Williams
Cc: Shile Zhang
Cc: Kirill Tkhai
Cc: James Morris
Cc: Sasha Levin
Cc: Yiqian Wei
Cc: [4.17+]
Link: http://lkml.kernel.org/r/20200403140952.17177-2-pasha.tatashin@soleen.com
Signed-off-by: Linus Torvalds
Signed-off-by: Pavel Tatashin
---
 mm/page_alloc.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2821e9824831..182f1198a406 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1493,7 +1493,6 @@ static void __init deferred_free_pages(unsigned long pfn,
 		} else if (!(pfn & nr_pgmask)) {
 			deferred_free_range(pfn - nr_free, nr_free);
 			nr_free = 1;
-			touch_nmi_watchdog();
 		} else {
 			nr_free++;
 		}
@@ -1523,7 +1522,6 @@ static unsigned long __init deferred_init_pages(struct zone *zone,
 			continue;
 		} else if (!page || !(pfn & nr_pgmask)) {
 			page = pfn_to_page(pfn);
-			touch_nmi_watchdog();
 		} else {
 			page++;
 		}
@@ -1663,8 +1661,10 @@ static int __init deferred_init_memmap(void *data)
 	 * that we can avoid introducing any issues with the buddy
 	 * allocator.
 	 */
-	while (spfn < epfn)
+	while (spfn < epfn) {
 		nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
+		touch_nmi_watchdog();
+	}
 zone_empty:
 	pgdat_resize_unlock(pgdat, &flags);
@@ -1748,6 +1748,7 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
 		first_deferred_pfn = spfn;
 		nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
+		touch_nmi_watchdog();

 		/* We should only stop along section boundaries */
 		if ((first_deferred_pfn ^ spfn) < PAGES_PER_SECTION)

From patchwork Fri Jun 19 13:24:24 2020
X-Patchwork-Submitter: Pasha Tatashin
X-Patchwork-Id: 224275
From: Pavel Tatashin
To: stable@vger.kernel.org, akpm@linux-foundation.org, mhocko@suse.com,
    dan.j.williams@intel.com, shile.zhang@linux.alibaba.com,
    daniel.m.jordan@oracle.com, pasha.tatashin@soleen.com,
    ktkhai@virtuozzo.com, david@redhat.com, jmorris@namei.org,
    sashal@kernel.org, vbabka@suse.cz, gregkh@linuxfoundation.org,
    torvalds@linux-foundation.org
Subject: [PATCH 4.19 6/7] mm: initialize deferred pages with interrupts enabled
Date: Fri, 19 Jun 2020 09:24:24 -0400
Message-Id: <20200619132425.425063-6-pasha.tatashin@soleen.com>
In-Reply-To: <20200619132425.425063-1-pasha.tatashin@soleen.com>
References: <20200619132425.425063-1-pasha.tatashin@soleen.com>

From: Pavel Tatashin

commit da97f2d56bbd880b4138916a7ef96f9881a551b2 upstream.

Initializing struct pages is a long task, and keeping interrupts
disabled for its duration introduces a number of problems.

1. jiffies is not updated for a long period of time, and thus incorrect
   time is reported.  See the proposed solution and discussion here:
   lkml/20200311123848.118638-1-shile.zhang@linux.alibaba.com
2. It prevents further improving deferred page initialization by
   allowing intra-node multi-threading.

We are keeping interrupts disabled to solve a rather theoretical problem
that was never observed in the real world (see 3a2d7fa8a3d5).

Let's keep interrupts enabled.  In case we ever encounter a scenario
where an interrupt thread wants to allocate a large amount of memory
this early in boot, we can deal with that by growing the zone (see
deferred_grow_zone()) by the needed amount before starting the
deferred_init_memmap() threads.
Before:
[    1.232459] node 0 initialised, 12058412 pages in 1ms

After:
[    1.632580] node 0 initialised, 12051227 pages in 436ms

Fixes: 3a2d7fa8a3d5 ("mm: disable interrupts while initializing deferred pages")
Reported-by: Shile Zhang
Signed-off-by: Pavel Tatashin
Signed-off-by: Andrew Morton
Reviewed-by: Daniel Jordan
Reviewed-by: David Hildenbrand
Acked-by: Michal Hocko
Acked-by: Vlastimil Babka
Cc: Dan Williams
Cc: James Morris
Cc: Kirill Tkhai
Cc: Sasha Levin
Cc: Yiqian Wei
Cc: [4.17+]
Link: http://lkml.kernel.org/r/20200403140952.17177-3-pasha.tatashin@soleen.com
Signed-off-by: Linus Torvalds
---
 include/linux/mmzone.h |  6 ++++--
 mm/page_alloc.c        | 20 +++++++-------------
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d6791e2df30a..fba0eee85392 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -636,8 +636,10 @@ typedef struct pglist_data {
 #endif
 #if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT)
 	/*
-	 * Must be held any time you expect node_start_pfn, node_present_pages
-	 * or node_spanned_pages stay constant.
+	 * Must be held any time you expect node_start_pfn,
+	 * node_present_pages, node_spanned_pages or nr_zones to stay constant.
+	 * Also synchronizes pgdat->first_deferred_pfn during deferred page
+	 * init.
 	 *
 	 * pgdat_resize_lock() and pgdat_resize_unlock() are provided to
 	 * manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 182f1198a406..05c27edbe076 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1644,6 +1644,13 @@ static int __init deferred_init_memmap(void *data)
 	BUG_ON(pgdat->first_deferred_pfn > pgdat_end_pfn(pgdat));
 	pgdat->first_deferred_pfn = ULONG_MAX;

+	/*
+	 * Once we unlock here, the zone cannot be grown anymore, thus if an
+	 * interrupt thread must allocate this early in boot, zone must be
+	 * pre-grown prior to start of deferred page initialization.
+	 */
+	pgdat_resize_unlock(pgdat, &flags);
+
 	/* Only the highest zone is deferred so find it */
 	for (zid = 0; zid < MAX_NR_ZONES; zid++) {
 		zone = pgdat->node_zones + zid;
@@ -1666,8 +1673,6 @@ static int __init deferred_init_memmap(void *data)
 		touch_nmi_watchdog();
 	}
 zone_empty:
-	pgdat_resize_unlock(pgdat, &flags);
-
 	/* Sanity check that the next zone really is unpopulated */
 	WARN_ON(++zid < MAX_NR_ZONES && populated_zone(++zone));
@@ -1709,17 +1714,6 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
 	pgdat_resize_lock(pgdat, &flags);

-	/*
-	 * If deferred pages have been initialized while we were waiting for
-	 * the lock, return true, as the zone was grown.  The caller will retry
-	 * this zone.  We won't return to this function since the caller also
-	 * has this static branch.
-	 */
-	if (!static_branch_unlikely(&deferred_pages)) {
-		pgdat_resize_unlock(pgdat, &flags);
-		return true;
-	}
-
 	/*
 	 * If someone grew this zone while we were waiting for spinlock, return
 	 * true, as there might be enough pages already.
From patchwork Fri Jun 19 13:24:25 2020
X-Patchwork-Submitter: Pasha Tatashin
X-Patchwork-Id: 224276
From: Pavel Tatashin
To: stable@vger.kernel.org, akpm@linux-foundation.org, mhocko@suse.com,
    dan.j.williams@intel.com, shile.zhang@linux.alibaba.com,
    daniel.m.jordan@oracle.com, pasha.tatashin@soleen.com,
    ktkhai@virtuozzo.com, david@redhat.com, jmorris@namei.org,
    sashal@kernel.org, vbabka@suse.cz, gregkh@linuxfoundation.org,
    torvalds@linux-foundation.org
Subject: [PATCH 4.19 7/7] mm: call cond_resched() from deferred_init_memmap()
Date: Fri, 19 Jun 2020 09:24:25 -0400
Message-Id: <20200619132425.425063-7-pasha.tatashin@soleen.com>
In-Reply-To: <20200619132425.425063-1-pasha.tatashin@soleen.com>
References: <20200619132425.425063-1-pasha.tatashin@soleen.com>

From: Pavel Tatashin

commit 3d060856adfc59afb9d029c233141334cfaba418 upstream.

Now that deferred pages are initialized with interrupts enabled, we can
replace touch_nmi_watchdog() with cond_resched(), as it was before
3a2d7fa8a3d5.

For now, we cannot do the same in deferred_grow_zone(), as it still
initializes pages with interrupts disabled.
This change fixes the RCU problem described in
https://lkml.kernel.org/r/20200401104156.11564-2-david@redhat.com

[   60.474005] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   60.475000] rcu: 	1-...0: (0 ticks this GP) idle=02a/1/0x4000000000000000 softirq=1/1 fqs=15000
[   60.475000] rcu: (detected by 0, t=60002 jiffies, g=-1199, q=1)
[   60.475000] Sending NMI from CPU 0 to CPUs 1:
[    1.760091] NMI backtrace for cpu 1
[    1.760091] CPU: 1 PID: 20 Comm: pgdatinit0 Not tainted 4.18.0-147.9.1.el8_1.x86_64 #1
[    1.760091] Hardware name: Red Hat KVM, BIOS 1.13.0-1.module+el8.2.0+5520+4e5817f3 04/01/2014
[    1.760091] RIP: 0010:__init_single_page.isra.65+0x10/0x4f
[    1.760091] Code: 48 83 cf 63 48 89 f8 0f 1f 40 00 48 89 c6 48 89 d7 e8 6b 18 80 ff 66 90 5b c3 31 c0 b9 10 00 00 00 49 89 f8 48 c1 e6 33 f3 ab 07 00 00 00 48 c1 e2 36 41 c7 40 34 01 00 00 00 48 c1 e0 33 41
[    1.760091] RSP: 0000:ffffba783123be40 EFLAGS: 00000006
[    1.760091] RAX: 0000000000000000 RBX: fffffad34405e300 RCX: 0000000000000000
[    1.760091] RDX: 0000000000000000 RSI: 0010000000000000 RDI: fffffad34405e340
[    1.760091] RBP: 0000000033f3177e R08: fffffad34405e300 R09: 0000000000000002
[    1.760091] R10: 000000000000002b R11: ffff98afb691a500 R12: 0000000000000002
[    1.760091] R13: 0000000000000000 R14: 000000003f03ea00 R15: 000000003e10178c
[    1.760091] FS:  0000000000000000(0000) GS:ffff9c9ebeb00000(0000) knlGS:0000000000000000
[    1.760091] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.760091] CR2: 00000000ffffffff CR3: 000000a1cf20a001 CR4: 00000000003606e0
[    1.760091] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    1.760091] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    1.760091] Call Trace:
[    1.760091]  deferred_init_pages+0x8f/0xbf
[    1.760091]  deferred_init_memmap+0x184/0x29d
[    1.760091]  ? deferred_free_pages.isra.97+0xba/0xba
[    1.760091]  kthread+0x112/0x130
[    1.760091]  ? kthread_flush_work_fn+0x10/0x10
[    1.760091]  ret_from_fork+0x35/0x40
[   89.123011] node 0 initialised, 1055935372 pages in 88650ms

Fixes: 3a2d7fa8a3d5 ("mm: disable interrupts while initializing deferred pages")
Reported-by: Yiqian Wei
Signed-off-by: Pavel Tatashin
Signed-off-by: Andrew Morton
Tested-by: David Hildenbrand
Reviewed-by: Daniel Jordan
Reviewed-by: David Hildenbrand
Reviewed-by: Pankaj Gupta
Acked-by: Michal Hocko
Cc: Dan Williams
Cc: James Morris
Cc: Kirill Tkhai
Cc: Sasha Levin
Cc: Shile Zhang
Cc: Vlastimil Babka
Cc: [4.17+]
Link: http://lkml.kernel.org/r/20200403140952.17177-4-pasha.tatashin@soleen.com
Signed-off-by: Linus Torvalds
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 05c27edbe076..96b8f5e8a008 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1670,7 +1670,7 @@ static int __init deferred_init_memmap(void *data)
 	 */
 	while (spfn < epfn) {
 		nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
-		touch_nmi_watchdog();
+		cond_resched();
 	}
 zone_empty:
 	/* Sanity check that the next zone really is unpopulated */