From patchwork Fri Jan 14 22:05:23 2022
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 532229
Date: Fri, 14 Jan 2022 14:05:23 -0800
From: Andrew Morton
To: akpm@linux-foundation.org, hughd@google.com, kirill.shutemov@linux.intel.com, ligang.bdlg@bytedance.com, linux-mm@kvack.org, mm-commits@vger.kernel.org, songmuchun@bytedance.com, stable@vger.kernel.org, torvalds@linux-foundation.org
Subject: [patch 046/146] shmem: fix a race between shmem_unused_huge_shrink and shmem_evict_inode
Message-ID: <20220114220523.T4gcXYcCD%akpm@linux-foundation.org>
In-Reply-To: <20220114140222.6b14f0061194d3200000c52d@linux-foundation.org>

From: Gang Li
Subject: shmem: fix a race between shmem_unused_huge_shrink and shmem_evict_inode

Fix a data race in commit 779750d20b93 ("shmem: split huge pages beyond
i_size under memory pressure").

Here are the call traces causing the race:

Call Trace 1:
 shmem_unused_huge_shrink+0x3ae/0x410
 ? __list_lru_walk_one.isra.5+0x33/0x160
 super_cache_scan+0x17c/0x190
 shrink_slab.part.55+0x1ef/0x3f0
 shrink_node+0x10e/0x330
 kswapd+0x380/0x740
 kthread+0xfc/0x130
 ? mem_cgroup_shrink_node+0x170/0x170
 ? kthread_create_on_node+0x70/0x70
 ret_from_fork+0x1f/0x30

Call Trace 2:
 shmem_evict_inode+0xd8/0x190
 evict+0xbe/0x1c0
 do_unlinkat+0x137/0x330
 do_syscall_64+0x76/0x120
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2

A simple explanation: imagine there are three items on the local list
(@list).  In the first traversal, A is not deleted from @list.

 1)     A->B->C
        ^
        |
        pos (leave)

In the second traversal, B is deleted from @list.  Concurrently, A is
deleted from @list through shmem_evict_inode(), because the last
reference count of the inode was dropped by another thread.  Then @list
is corrupted.

 2)     A->B->C
        ^  ^
        |  |
    evict  pos (drop)

We should make sure the inode is either on the global list or deleted
from any local list before iput().  Fix this by moving inodes back to
the global list before we put them.
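For illustration, here is a minimal userspace sketch of the pattern the fix enforces. It is NOT the kernel code: struct entry, put_entry() and shrink_some() are stand-ins for the inode/shrinklist machinery. The point is that an entry is moved back to the lock-protected global list BEFORE its reference is dropped, so a concurrent final put ("evict") never unlinks it from a private local list someone else is still walking.

	#include <pthread.h>
	#include <stdlib.h>

	struct entry {
		struct entry *prev, *next;  /* circular doubly linked, like list_head */
		int refcount;
	};

	static struct entry global_head = { &global_head, &global_head, 0 };
	static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

	static void list_del(struct entry *e)
	{
		e->prev->next = e->next;
		e->next->prev = e->prev;
		e->next = e->prev = e;
	}

	static void list_add_tail(struct entry *e, struct entry *head)
	{
		e->prev = head->prev;
		e->next = head;
		head->prev->next = e;
		head->prev = e;
	}

	/* Like iput()/evict(): the final put unlinks the entry and frees it. */
	static void put_entry(struct entry *e)
	{
		if (__atomic_sub_fetch(&e->refcount, 1, __ATOMIC_ACQ_REL) == 0) {
			pthread_mutex_lock(&global_lock);
			list_del(e);            /* safe: e is on the global list */
			pthread_mutex_unlock(&global_lock);
			free(e);
		}
	}

	static void shrink_some(void)
	{
		struct entry local = { &local, &local, 0 };
		struct entry *e;

		/* Grab everything onto a private local list, as the shrinker does. */
		pthread_mutex_lock(&global_lock);
		while (global_head.next != &global_head) {
			e = global_head.next;
			list_del(e);
			list_add_tail(e, &local);
		}
		pthread_mutex_unlock(&global_lock);

		while (local.next != &local) {
			e = local.next;
			/* ... try to split/shrink e here ... */

			/* Move e back to the global list BEFORE dropping the ref. */
			pthread_mutex_lock(&global_lock);
			list_del(e);
			list_add_tail(e, &global_head);
			pthread_mutex_unlock(&global_lock);

			put_entry(e);   /* a final put now cannot corrupt `local` */
		}
	}

	int main(void)
	{
		for (int i = 0; i < 3; i++) {
			struct entry *e = malloc(sizeof(*e));
			e->refcount = 1;  /* the reference shrink_some() will drop */
			list_add_tail(e, &global_head);
		}
		shrink_some();
		return 0;
	}

The buggy ordering would call put_entry() while e still sat on `local`: the free (or a concurrent eviction's list_del) would then leave `local` pointing at freed memory, which is exactly the corruption described above.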
[akpm@linux-foundation.org: coding style fixes]
Link: https://lkml.kernel.org/r/20211125064502.99983-1-ligang.bdlg@bytedance.com
Fixes: 779750d20b93 ("shmem: split huge pages beyond i_size under memory pressure")
Signed-off-by: Gang Li
Reviewed-by: Muchun Song
Acked-by: Kirill A. Shutemov
Cc: Hugh Dickins
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
---

 mm/shmem.c |   37 +++++++++++++++++++++----------------
 1 file changed, 21 insertions(+), 16 deletions(-)

--- a/mm/shmem.c~shmem-fix-a-race-between-shmem_unused_huge_shrink-and-shmem_evict_inode
+++ a/mm/shmem.c
@@ -554,7 +554,7 @@ static unsigned long shmem_unused_huge_s
 	struct shmem_inode_info *info;
 	struct page *page;
 	unsigned long batch = sc ? sc->nr_to_scan : 128;
-	int removed = 0, split = 0;
+	int split = 0;
 
 	if (list_empty(&sbinfo->shrinklist))
 		return SHRINK_STOP;
@@ -569,7 +569,6 @@ static unsigned long shmem_unused_huge_s
 		/* inode is about to be evicted */
 		if (!inode) {
 			list_del_init(&info->shrinklist);
-			removed++;
 			goto next;
 		}
@@ -577,12 +576,12 @@ static unsigned long shmem_unused_huge_s
 		if (round_up(inode->i_size, PAGE_SIZE) ==
 				round_up(inode->i_size, HPAGE_PMD_SIZE)) {
 			list_move(&info->shrinklist, &to_remove);
-			removed++;
 			goto next;
 		}
 
 		list_move(&info->shrinklist, &list);
 next:
+		sbinfo->shrinklist_len--;
 		if (!--batch)
 			break;
 	}
@@ -602,7 +601,7 @@ next:
 		inode = &info->vfs_inode;
 
 		if (nr_to_split && split >= nr_to_split)
-			goto leave;
+			goto move_back;
 
 		page = find_get_page(inode->i_mapping,
 				(inode->i_size & HPAGE_PMD_MASK) >> PAGE_SHIFT);
@@ -616,38 +615,44 @@ next:
 		}
 
 		/*
-		 * Leave the inode on the list if we failed to lock
-		 * the page at this time.
+		 * Move the inode on the list back to shrinklist if we failed
+		 * to lock the page at this time.
 		 *
 		 * Waiting for the lock may lead to deadlock in the
 		 * reclaim path.
 		 */
 		if (!trylock_page(page)) {
 			put_page(page);
-			goto leave;
+			goto move_back;
 		}
 
 		ret = split_huge_page(page);
 		unlock_page(page);
 		put_page(page);
 
-		/* If split failed leave the inode on the list */
+		/* If split failed move the inode on the list back to shrinklist */
 		if (ret)
-			goto leave;
+			goto move_back;
 
 		split++;
drop:
 		list_del_init(&info->shrinklist);
-		removed++;
-leave:
+		goto put;
+move_back:
+		/*
+		 * Make sure the inode is either on the global list or deleted
+		 * from any local list before iput() since it could be deleted
+		 * in another thread once we put the inode (then the local list
+		 * is corrupted).
+		 */
+		spin_lock(&sbinfo->shrinklist_lock);
+		list_move(&info->shrinklist, &sbinfo->shrinklist);
+		sbinfo->shrinklist_len++;
+		spin_unlock(&sbinfo->shrinklist_lock);
+put:
 		iput(inode);
 	}
 
-	spin_lock(&sbinfo->shrinklist_lock);
-	list_splice_tail(&list, &sbinfo->shrinklist);
-	sbinfo->shrinklist_len -= removed;
-	spin_unlock(&sbinfo->shrinklist_lock);
-
 	return split;
 }

From patchwork Fri Jan 14 22:07:37 2022
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 532567
Date: Fri, 14 Jan 2022 14:07:37 -0800
From: Andrew Morton
To: 42.hyeyoo@gmail.com, akpm@linux-foundation.org, bhe@redhat.com, bp@alien8.de, cl@linux.com, David.Laight@ACULAB.COM, david@redhat.com, hch@lst.de, iamjoonsoo.kim@lge.com, john.p.donnelly@oracle.com, linux-mm@kvack.org, m.szyprowski@samsung.com, mm-commits@vger.kernel.org, penberg@kernel.org, rientjes@google.com, robin.murphy@arm.com, stable@vger.kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz
Subject: [patch 085/146] mm_zone: add function to check if managed dma zone exists
Message-ID: <20220114220737.Yf78KyApy%akpm@linux-foundation.org>
In-Reply-To: <20220114140222.6b14f0061194d3200000c52d@linux-foundation.org>

From: Baoquan He
Subject: mm_zone: add function to check if managed dma zone exists

Patch series "Handle warning of allocation failure on DMA zone w/o managed pages", v4.

***Problem observed:
On x86_64, when a crash is triggered and we enter the kdump kernel, a
page allocation failure can always be seen:

 ---------------------------------
 DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
 swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
 CPU: 0 PID: 1 Comm: swapper/0
 Call Trace:
  dump_stack+0x7f/0xa1
  warn_alloc.cold+0x72/0xd6
  ......
  __alloc_pages+0x24d/0x2c0
  ......
  dma_atomic_pool_init+0xdb/0x176
  do_one_initcall+0x67/0x320
  ? rcu_read_lock_sched_held+0x3f/0x80
  kernel_init_freeable+0x290/0x2dc
  ? rest_init+0x24f/0x24f
  kernel_init+0xa/0x111
  ret_from_fork+0x22/0x30
 Mem-Info:
 ------------------------------------

***Root cause:
The current kernel assumes that the DMA zone must have managed pages,
and tries to request pages from it, if CONFIG_ZONE_DMA is enabled.  But
this is not always true.  E.g. in the kdump kernel of x86_64, only the
low 1M is present and locked down at a very early stage of boot, so the
low 1M is never added to the buddy allocator and never becomes managed
pages of the DMA zone.  This exception will always cause a page
allocation failure when pages are requested from the DMA zone.

***Investigation:
This failure happens since the below commits were merged into Linus's
tree:

  1a6a9044b967 x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
  23721c8e92f7 x86/crash: Remove crash_reserve_low_1M()
  f1d4d47c5851 x86/setup: Always reserve the first 1M of RAM
  7c321eb2b843 x86/kdump: Remove the backup region handling
  6f599d84231f x86/kdump: Always reserve the low 1M when the crashkernel option is specified

Before them, on x86_64, the low 640K area was reused by the kdump
kernel: the content of the low 640K area was copied into a backup
region for dumping before jumping into the kdump kernel.  Then, except
for the firmware-reserved regions in [0, 640K], the remaining area was
added to the buddy allocator and became available managed pages of the
DMA zone.

However, after the above commits, in the kdump kernel of x86_64 the low
1M is reserved by memblock but never released to the buddy allocator,
so any later page allocation requested from the DMA zone will fail.

Initially, the low 1M had to be locked down when crashkernel is
reserved because AMD SME encrypts memory, which makes the old
backup-region mechanism impossible when switching into the kdump
kernel.  Later, it was also observed that some BIOSes corrupt memory
under 1M.  To solve this, commit f1d4d47c5851 always reserves the
entire low 1M region after the real mode trampoline is allocated.
Besides, Intel engineers recently mentioned that TDX (Trust Domain
Extensions), which is under development in the kernel, also needs to
lock down the low 1M.  So we can't simply revert the above commits to
fix the page allocation failure from the DMA zone, as some have
suggested.

***Solution:
Currently, only the DMA atomic pool and dma-kmalloc initialize and
request page allocations with GFP_DMA during bootup.  So only
initialize the DMA atomic pool when the DMA zone has available managed
pages; otherwise just skip the initialization.

For dma-kmalloc(), for the time being, mute the warning of allocation
failure when requesting pages from the DMA zone while it has no managed
pages.  Meanwhile, change code to use the dma_alloc_*()/dma_map_*()
APIs instead of kmalloc(GFP_DMA), or drop GFP_DMA from kmalloc() calls
where it is not necessary.  Christoph is posting patches to fix those
under drivers/scsi/.  Finally, we can remove the need for dma-kmalloc()
as people have suggested.


This patch (of 3):

In some places, the current kernel assumes that the DMA zone must have
managed pages if CONFIG_ZONE_DMA is enabled.  But this is not always
true.  E.g. in the kdump kernel of x86_64, only the low 1M is present
and locked down at a very early stage of boot, so there are no managed
pages at all in the DMA zone.  This exception will always cause a page
allocation failure when pages are requested from the DMA zone.
Add a function, has_managed_dma(), and the relevant helpers, to check
whether there is a DMA zone with managed pages.  It will be used in
later patches.

Link: https://lkml.kernel.org/r/20211223094435.248523-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20211223094435.248523-2-bhe@redhat.com
Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Signed-off-by: Baoquan He
Reviewed-by: David Hildenbrand
Acked-by: John Donnelly
Cc: Christoph Hellwig
Cc: Christoph Lameter
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Vlastimil Babka
Cc: David Laight
Cc: Borislav Petkov
Cc: Marek Szyprowski
Cc: Robin Murphy
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
---

 include/linux/mmzone.h |    9 +++++++++
 mm/page_alloc.c        |   15 +++++++++++++++
 2 files changed, 24 insertions(+)

--- a/include/linux/mmzone.h~mm_zone-add-function-to-check-if-managed-dma-zone-exists
+++ a/include/linux/mmzone.h
@@ -1047,6 +1047,15 @@ static inline int is_highmem_idx(enum zo
 #endif
 }
 
+#ifdef CONFIG_ZONE_DMA
+bool has_managed_dma(void);
+#else
+static inline bool has_managed_dma(void)
+{
+	return false;
+}
+#endif
+
 /**
  * is_highmem - helper function to quickly check if a struct zone is a
  *              highmem zone or not.  This is an attempt to keep references
--- a/mm/page_alloc.c~mm_zone-add-function-to-check-if-managed-dma-zone-exists
+++ a/mm/page_alloc.c
@@ -9518,3 +9518,18 @@ bool take_page_off_buddy(struct page *pa
 	return ret;
 }
 #endif
+
+#ifdef CONFIG_ZONE_DMA
+bool has_managed_dma(void)
+{
+	struct pglist_data *pgdat;
+
+	for_each_online_pgdat(pgdat) {
+		struct zone *zone = &pgdat->node_zones[ZONE_DMA];
+
+		if (managed_zone(zone))
+			return true;
+	}
+	return false;
+}
+#endif /* CONFIG_ZONE_DMA */
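For illustration only (not part of this patch), a sketch of the intended usage with a hypothetical caller name: boot-time code that would otherwise blindly allocate with GFP_DMA can first consult has_managed_dma() and skip quietly instead of failing with a warning.

	/* Hypothetical init-time caller, purely illustrative. */
	static int __init example_dma_buffer_init(void)
	{
		struct page *page;

		if (!has_managed_dma())
			return 0;	/* no managed DMA pages: quietly skip */

		/* order-5 = 128 KiB, matching the failing pool allocation */
		page = alloc_pages(GFP_KERNEL | GFP_DMA, 5);
		if (!page)
			return -ENOMEM;

		/* ... hand the block to whatever needs low, DMA-able memory ... */
		return 0;
	}
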
From patchwork Fri Jan 14 22:07:41 2022
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 532228
Date: Fri, 14 Jan 2022 14:07:41 -0800
From: Andrew Morton
To: 42.hyeyoo@gmail.com, akpm@linux-foundation.org, bhe@redhat.com, bp@alien8.de, cl@linux.com, David.Laight@ACULAB.COM, david@redhat.com, hch@lst.de, iamjoonsoo.kim@lge.com, john.p.donnelly@oracle.com, linux-mm@kvack.org, m.szyprowski@samsung.com, mm-commits@vger.kernel.org, penberg@kernel.org, rientjes@google.com, robin.murphy@arm.com, stable@vger.kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz
Subject: [patch 086/146] dma/pool: create dma atomic pool only if dma zone has managed pages
Message-ID: <20220114220741.hCFHv9Bem%akpm@linux-foundation.org>
In-Reply-To: <20220114140222.6b14f0061194d3200000c52d@linux-foundation.org>

From: Baoquan He
Subject: dma/pool: create dma atomic pool only if dma zone has managed pages

Currently, the three DMA atomic pools are initialized as long as the
relevant kernel code is built in.  In the kdump kernel of x86_64 this
is not right when trying to create atomic_pool_dma, because there are
no managed pages in the DMA zone.  In that case, the DMA zone only has
the low 1M memory present and locked down by the memblock allocator, so
no pages are added to the buddy allocator of the DMA zone.  Please
check commit f1d4d47c5851 ("x86/setup: Always reserve the first 1M of
RAM").

The kdump kernel of x86_64 then always prints the failure message
below:

 DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
 swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-0.rc5.20210611git929d931f2b40.42.fc35.x86_64 #1
 Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.12.0 06/04/2018
 Call Trace:
  dump_stack+0x7f/0xa1
  warn_alloc.cold+0x72/0xd6
  ? _raw_spin_unlock_irq+0x24/0x40
  ? __alloc_pages_direct_compact+0x90/0x1b0
  __alloc_pages_slowpath.constprop.0+0xf29/0xf50
  ? __cond_resched+0x16/0x50
  ? prepare_alloc_pages.constprop.0+0x19d/0x1b0
  __alloc_pages+0x24d/0x2c0
  ? __dma_atomic_pool_init+0x93/0x93
  alloc_page_interleave+0x13/0xb0
  atomic_pool_expand+0x118/0x210
  ? __dma_atomic_pool_init+0x93/0x93
  __dma_atomic_pool_init+0x45/0x93
  dma_atomic_pool_init+0xdb/0x176
  do_one_initcall+0x67/0x320
  ? rcu_read_lock_sched_held+0x3f/0x80
  kernel_init_freeable+0x290/0x2dc
  ? rest_init+0x24f/0x24f
  kernel_init+0xa/0x111
  ret_from_fork+0x22/0x30
 Mem-Info:
 ......
 DMA: failed to allocate 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation
 DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations

So check whether the DMA zone has managed pages, and create
atomic_pool_dma only if it does; otherwise just skip it.
Link: https://lkml.kernel.org/r/20211223094435.248523-3-bhe@redhat.com
Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Signed-off-by: Baoquan He
Reviewed-by: Christoph Hellwig
Acked-by: John Donnelly
Reviewed-by: David Hildenbrand
Cc: Marek Szyprowski
Cc: Robin Murphy
Cc: Borislav Petkov
Cc: Christoph Lameter
Cc: David Laight
Cc: David Rientjes
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim
Cc: Pekka Enberg
Cc: Vlastimil Babka
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
---

 kernel/dma/pool.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/kernel/dma/pool.c~dma-pool-create-dma-atomic-pool-only-if-dma-zone-has-managed-pages
+++ a/kernel/dma/pool.c
@@ -203,7 +203,7 @@ static int __init dma_atomic_pool_init(v
 						    GFP_KERNEL);
 	if (!atomic_pool_kernel)
 		ret = -ENOMEM;
-	if (IS_ENABLED(CONFIG_ZONE_DMA)) {
+	if (has_managed_dma()) {
 		atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
 						GFP_KERNEL | GFP_DMA);
 		if (!atomic_pool_dma)
@@ -226,7 +226,7 @@ static inline struct gen_pool *dma_guess
 	if (prev == NULL) {
 		if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
 			return atomic_pool_dma32;
-		if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
+		if (atomic_pool_dma && (gfp & GFP_DMA))
 			return atomic_pool_dma;
 		return atomic_pool_kernel;
 	}
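A note on the second hunk: with this change, atomic_pool_dma can legitimately be NULL even when CONFIG_ZONE_DMA is enabled, so pool selection must test the pool pointer rather than the config option. A simplified stand-in for that first-try selection logic (pools passed as parameters here purely for illustration; the real code uses the file-static pool variables):

	#include <linux/genalloc.h>
	#include <linux/gfp.h>

	static struct gen_pool *pick_pool(struct gen_pool *pool_dma32,
					  struct gen_pool *pool_dma,
					  struct gen_pool *pool_kernel, gfp_t gfp)
	{
		if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
			return pool_dma32;
		/* Pointer check, not IS_ENABLED(): the pool may not exist. */
		if (pool_dma && (gfp & GFP_DMA))
			return pool_dma;
		return pool_kernel;
	}

This way a GFP_DMA request on a system where the DMA pool was never created at least falls back to the kernel pool rather than dereferencing a NULL pool.
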
From patchwork Fri Jan 14 22:07:44 2022
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 532566
Date: Fri, 14 Jan 2022 14:07:44 -0800
From: Andrew Morton
To: 42.hyeyoo@gmail.com, akpm@linux-foundation.org, bhe@redhat.com, bp@alien8.de, cl@linux.com, David.Laight@ACULAB.COM, david@redhat.com, hch@lst.de, iamjoonsoo.kim@lge.com, john.p.donnelly@oracle.com, linux-mm@kvack.org, m.szyprowski@samsung.com, mm-commits@vger.kernel.org, penberg@kernel.org, rientjes@google.com, robin.murphy@arm.com, stable@vger.kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz
Subject: [patch 087/146] mm/page_alloc.c: do not warn allocation failure on zone DMA if no managed pages
Message-ID: <20220114220744.5KK-0rDF5%akpm@linux-foundation.org>
In-Reply-To: <20220114140222.6b14f0061194d3200000c52d@linux-foundation.org>

From: Baoquan He
Subject: mm/page_alloc.c: do not warn allocation failure on zone DMA if no managed pages

In the kdump kernel of x86_64, a page allocation failure is observed:

 kworker/u2:2: page allocation failure: order:0, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
 CPU: 0 PID: 55 Comm: kworker/u2:2 Not tainted 5.16.0-rc4+ #5
 Hardware name: AMD Dinar/Dinar, BIOS RDN1505B 06/05/2013
 Workqueue: events_unbound async_run_entry_fn
 Call Trace:
  dump_stack_lvl+0x48/0x5e
  warn_alloc.cold+0x72/0xd6
  __alloc_pages_slowpath.constprop.0+0xc69/0xcd0
  __alloc_pages+0x1df/0x210
  new_slab+0x389/0x4d0
  ___slab_alloc+0x58f/0x770
  __slab_alloc.constprop.0+0x4a/0x80
  kmem_cache_alloc_trace+0x24b/0x2c0
  sr_probe+0x1db/0x620
  ......
  device_add+0x405/0x920
  ......
  __scsi_add_device+0xe5/0x100
  ata_scsi_scan_host+0x97/0x1d0
  async_run_entry_fn+0x30/0x130
  process_one_work+0x1e8/0x3c0
  worker_thread+0x50/0x3b0
  ? rescuer_thread+0x350/0x350
  kthread+0x16b/0x190
  ? set_kthread_struct+0x40/0x40
  ret_from_fork+0x22/0x30
 Mem-Info:
 ......

The above failure happened when calling kmalloc() to allocate a buffer
with GFP_DMA, which requests a slab page from the DMA zone while there
are no managed pages there at all:

 sr_probe()
 --> get_capabilities()
     --> buffer = kmalloc(512, GFP_KERNEL | GFP_DMA);

In the current kernel, dma-kmalloc caches are created as long as
CONFIG_ZONE_DMA is enabled.  However, the kdump kernel of x86_64 has no
managed pages in the DMA zone since commit 6f599d84231f ("x86/kdump:
Always reserve the low 1M when the crashkernel option is specified"),
so the failure can always be reproduced.

For now, let's mute the warning of allocation failure when requesting
pages from the DMA zone while it has no managed pages.

[akpm@linux-foundation.org: fix warning]
Link: https://lkml.kernel.org/r/20211223094435.248523-4-bhe@redhat.com
Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Signed-off-by: Baoquan He
Acked-by: John Donnelly
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Vlastimil Babka
Cc: Borislav Petkov
Cc: Christoph Hellwig
Cc: David Hildenbrand
Cc: David Laight
Cc: Marek Szyprowski
Cc: Robin Murphy
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
---

 mm/page_alloc.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/mm/page_alloc.c~mm-page_allocc-do-not-warn-allocation-failure-on-zone-dma-if-no-managed-pages
+++ a/mm/page_alloc.c
@@ -4218,7 +4218,9 @@ void warn_alloc(gfp_t gfp_mask, nodemask
 	va_list args;
 	static DEFINE_RATELIMIT_STATE(nopage_rs, 10*HZ, 1);
 
-	if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
+	if ((gfp_mask & __GFP_NOWARN) ||
+	    !__ratelimit(&nopage_rs) ||
+	    ((gfp_mask & __GFP_DMA) && !has_managed_dma()))
 		return;
 
 	va_start(args, fmt);
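
The series' cover text recommends converting callers like the sr_probe() chain above away from kmalloc(GFP_DMA) altogether. A hedged sketch of that conversion (illustrative only; example_get_dma_buffer is a hypothetical name, and `dev` is the device doing DMA):

	#include <linux/dma-mapping.h>

	static void *example_get_dma_buffer(struct device *dev,
					    dma_addr_t *dma_handle)
	{
		/*
		 * dma_alloc_coherent() honors the device's DMA mask, so the
		 * device's real addressing limit -- not the zone layout --
		 * decides where the 512-byte buffer is placed.
		 */
		return dma_alloc_coherent(dev, 512, dma_handle, GFP_KERNEL);
	}

Unlike kmalloc(GFP_DMA), this does not depend on dma-kmalloc caches backed by DMA-zone pages, so it keeps working in a kdump kernel where the DMA zone has no managed pages.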