From patchwork Tue Jan 5 19:59:01 2016
X-Patchwork-Submitter: Steve Capper
X-Patchwork-Id: 59208
Date: Tue, 5 Jan 2016 19:59:01 +0000
From: Steve Capper
To: Sudeep Holla
Subject: Re: Widespread boot failures on ARM due to "mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()"
In-Reply-To: <568BB55F.2020709@arm.com>
Cc: Matt Fleming, Stephen Rothwell, Tony Luck, Russell King, Kernel Build Reports Mailman List, Mel Gorman, Kamezawa Hiroyuki, Tyler Baker, Dave Hansen, Kevin.Hilman@linaro.org, Mark Brown, linux-next@vger.kernel.org, Taku Izumi, Xishi Qiu, Andrew Morton, "linux-arm-kernel@lists.infradead.org"

On 5 January 2016 at 12:21, Sudeep Holla wrote:
>
>
> On 05/01/16 11:45, Mark Brown wrote:
>>
>> On Mon, Jan 04, 2016 at 04:35:28PM -0800, Andrew Morton wrote:
>>>
>>> On Mon, 4 Jan 2016 23:55:12 +0000 Mark Brown wrote:
>>>>
>>>> On Mon, Jan 04, 2016 at 03:09:46PM -0800, Andrew Morton wrote:
>>
>>
>>>>> Thanks. That patch has rather a blooper if
>>>>> CONFIG_HAVE_MEMBLOCK_NODE_MAP=n. Is that the case in your testing?
>>
>>
>>>> Seems to be what's making a difference from a quick run through, yes.
>>
>>
>>> OK, thanks.
>>
>>
>> Seems like I was mistaken here somehow or there's some other problem -
>> I've kicked off another bisect for today's -next:
>>
>>
>> https://ci.linaro.org/view/people/job/tbaker-boot-bisect-bot/137/console
>>
>> and will follow up with any results.
>>
>
> With both patches applied(one already in today's -next), I am able to
> boot on ARM64 platform but I get huge load(for each pfn) of below warning:
>
> -->8
>
> BUG: Bad page state in process swapper pfn:900000
> page:ffffffbde4000000 count:0 mapcount:1 mapping: (null) index:0x0
> flags: 0x0()
> page dumped because: nonzero mapcount
> Modules linked in:
> Hardware name: ARM Juno development board (r0) (DT)
> Call trace:
> [] dump_backtrace+0x0/0x180
> [] show_stack+0x14/0x20
> [] dump_stack+0x90/0xc8
> [] bad_page+0xd8/0x138
> [] free_pages_prepare+0x218/0x290
> [] __free_pages_ok+0x1c/0xb8
> [] __free_pages+0x30/0x50
> [] __free_pages_bootmem+0xa0/0xa8
> [] free_all_bootmem+0x11c/0x184
> [] mem_init+0x48/0x1b4
> [] start_kernel+0x224/0x3b4
> [<0000000080663000>] 0x80663000
> Disabling lock debugging due to kernel taint
>
> --

I managed to get 904769ac82ebf60cb54f225f59ae7c064772a4d7 booting on an arm64 machine without errors with the following changes:

@@ -5328,6 +5337,8 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
 	pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid,
 		(u64)start_pfn << PAGE_SHIFT,
 		end_pfn ? ((u64)end_pfn << PAGE_SHIFT) - 1 : 0);
+#else
+	start_pfn = node_start_pfn;
 #endif
 	calculate_node_totalpages(pgdat, start_pfn, end_pfn,
 				  zones_size, zholes_size);

=====================================

My understanding is that 904769a ("mm/page_alloc.c: calculate zone_start_pfn at zone_spanned_pages_in_node()") inadvertently discards information: it removes the use of pgdat->node_start_pfn from free_area_init_core(), and zone_start_pfn is no longer advanced by "size" in the loop inside free_area_init_core(). This isn't an issue on systems with CONFIG_HAVE_MEMBLOCK_NODE_MAP enabled, because zone_start_pfn is set correctly there. On systems without CONFIG_HAVE_MEMBLOCK_NODE_MAP, zone_start_pfn is always 0.
When I ported the above fix to linux-next (8ef79cd05e6894c01ab9b41aa918a402fa8022a7), I was able to boot in a VM but not on my actual machine; I'll investigate that tomorrow.

Cheers,
--
Steve

=====================================

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a8bb70d..0edb608 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5013,6 +5013,15 @@ static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
 					unsigned long *zone_end_pfn,
 					unsigned long *zones_size)
 {
+	unsigned int zone;
+
+	*zone_start_pfn = node_start_pfn;
+	for (zone = 0; zone < zone_type; zone++) {
+		*zone_start_pfn += zones_size[zone];
+	}
+
+	*zone_end_pfn = *zone_start_pfn + zones_size[zone_type];
+
 	return zones_size[zone_type];
 }