From patchwork Tue Jan 28 09:50:13 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 232769
From: David Hildenbrand
To: stable@vger.kernel.org
Cc: linux-mm@kvack.org, Michal Hocko, Greg Kroah-Hartman, Andrew Morton,
 "Aneesh Kumar K . V", Baoquan He, Dan Williams, Oscar Salvador, Wei Yang,
 David Hildenbrand
Subject: [PATCH for 4.19-stable v3 16/24] mm/memory_hotplug: create memory
 block devices after arch_add_memory()
Date: Tue, 28 Jan 2020 10:50:13 +0100
Message-Id: <20200128095021.8076-17-david@redhat.com>
In-Reply-To: <20200128095021.8076-1-david@redhat.com>
References: <20200128095021.8076-1-david@redhat.com>
Sender: stable-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: stable@vger.kernel.org

commit db051a0dac13db24d58470d75cee0ce7c6b031a1 upstream.

Only memory to be added to the buddy and to be onlined/offlined by user
space using /sys/devices/system/memory/... needs (and should have!) memory
block devices.

Factor out creation of memory block devices.  Create all devices after
arch_add_memory() succeeded.  We can later drop the want_memblock
parameter, because it is now effectively stale.

Only after memory block devices have been added, memory can be onlined by
user space.  This implies, that memory is not visible to user space at all
before arch_add_memory() succeeded.

While at it
- use WARN_ON_ONCE instead of BUG_ON in moved unregister_memory()
- introduce find_memory_block_by_id() to search via block id
- Use find_memory_block_by_id() in init_memory_block() to catch
  duplicates

Link: http://lkml.kernel.org/r/20190527111152.16324-8-david@redhat.com
Signed-off-by: David Hildenbrand
Reviewed-by: Pavel Tatashin
Acked-by: Michal Hocko
Cc: Greg Kroah-Hartman
Cc: "Rafael J. Wysocki"
Cc: David Hildenbrand
Cc: "mike.travis@hpe.com"
Cc: Ingo Molnar
Cc: Andrew Banman
Cc: Oscar Salvador
Cc: Qian Cai
Cc: Wei Yang
Cc: Arun KS
Cc: Mathieu Malaterre
Cc: Alex Deucher
Cc: Andy Lutomirski
Cc: Anshuman Khandual
Cc: Ard Biesheuvel
Cc: Baoquan He
Cc: Benjamin Herrenschmidt
Cc: Borislav Petkov
Cc: Catalin Marinas
Cc: Chintan Pandya
Cc: Christophe Leroy
Cc: Chris Wilson
Cc: Dan Williams
Cc: Dave Hansen
Cc: "David S. Miller"
Cc: Fenghua Yu
Cc: Heiko Carstens
Cc: "H. Peter Anvin"
Cc: Jonathan Cameron
Cc: Joonsoo Kim
Cc: Jun Yao
Cc: "Kirill A. Shutemov"
Cc: Logan Gunthorpe
Cc: Mark Brown
Cc: Mark Rutland
Cc: Masahiro Yamada
Cc: Michael Ellerman
Cc: Mike Rapoport
Cc: Nicholas Piggin
Cc: Oscar Salvador
Cc: Paul Mackerras
Cc: Peter Zijlstra
Cc: Rich Felker
Cc: Rob Herring
Cc: Robin Murphy
Cc: Thomas Gleixner
Cc: Tony Luck
Cc: Vasily Gorbik
Cc: Will Deacon
Cc: Yoshinori Sato
Cc: Yu Zhao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: David Hildenbrand
---
 drivers/base/memory.c  | 82 +++++++++++++++++++++++++++---------------
 include/linux/memory.h |  2 +-
 mm/memory_hotplug.c    | 15 ++++----
 3 files changed, 63 insertions(+), 36 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index f9818d75ac43..b89b9c3efa59 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -39,6 +39,11 @@ static inline int base_memory_block_id(int section_nr)
 	return section_nr / sections_per_block;
 }
 
+static inline int pfn_to_block_id(unsigned long pfn)
+{
+	return base_memory_block_id(pfn_to_section_nr(pfn));
+}
+
 static int memory_subsys_online(struct device *dev);
 static int memory_subsys_offline(struct device *dev);
 
@@ -591,10 +596,9 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
  * A reference for the returned object is held and the reference for the
  * hinted object is released.
  */
-struct memory_block *find_memory_block_hinted(struct mem_section *section,
-					      struct memory_block *hint)
+static struct memory_block *find_memory_block_by_id(int block_id,
+						    struct memory_block *hint)
 {
-	int block_id = base_memory_block_id(__section_nr(section));
 	struct device *hintdev = hint ? &hint->dev : NULL;
 	struct device *dev;
 
@@ -606,6 +610,14 @@ struct memory_block *find_memory_block_hinted(struct mem_section *section,
 	return to_memory_block(dev);
 }
 
+struct memory_block *find_memory_block_hinted(struct mem_section *section,
+					      struct memory_block *hint)
+{
+	int block_id = base_memory_block_id(__section_nr(section));
+
+	return find_memory_block_by_id(block_id, hint);
+}
+
 /*
  * For now, we have a linear search to go find the appropriate
  * memory_block corresponding to a particular phys_index. If
@@ -667,6 +679,11 @@ static int init_memory_block(struct memory_block **memory, int block_id,
 	unsigned long start_pfn;
 	int ret = 0;
 
+	mem = find_memory_block_by_id(block_id, NULL);
+	if (mem) {
+		put_device(&mem->dev);
+		return -EEXIST;
+	}
 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
 	if (!mem)
 		return -ENOMEM;
@@ -704,44 +721,53 @@ static int add_memory_block(int base_section_nr)
 	return 0;
 }
 
+static void unregister_memory(struct memory_block *memory)
+{
+	if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
+		return;
+
+	/* drop the ref. we got via find_memory_block() */
+	put_device(&memory->dev);
+	device_unregister(&memory->dev);
+}
+
 /*
- * need an interface for the VM to add new memory regions,
- * but without onlining it.
+ * Create memory block devices for the given memory area. Start and size
+ * have to be aligned to memory block granularity. Memory block devices
+ * will be initialized as offline.
  */
-int hotplug_memory_register(int nid, struct mem_section *section)
+int create_memory_block_devices(unsigned long start, unsigned long size)
 {
-	int block_id = base_memory_block_id(__section_nr(section));
-	int ret = 0;
+	const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
+	int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
 	struct memory_block *mem;
+	unsigned long block_id;
+	int ret = 0;
 
-	mutex_lock(&mem_sysfs_mutex);
+	if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
+			 !IS_ALIGNED(size, memory_block_size_bytes())))
+		return -EINVAL;
 
-	mem = find_memory_block(section);
-	if (mem) {
-		mem->section_count++;
-		put_device(&mem->dev);
-	} else {
+	mutex_lock(&mem_sysfs_mutex);
+	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
 		ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
 		if (ret)
-			goto out;
-		mem->section_count++;
+			break;
+		mem->section_count = sections_per_block;
+	}
+	if (ret) {
+		end_block_id = block_id;
+		for (block_id = start_block_id; block_id != end_block_id;
+		     block_id++) {
+			mem = find_memory_block_by_id(block_id, NULL);
+			mem->section_count = 0;
+			unregister_memory(mem);
+		}
 	}
-
-out:
 	mutex_unlock(&mem_sysfs_mutex);
 	return ret;
 }
 
-static void
-unregister_memory(struct memory_block *memory)
-{
-	BUG_ON(memory->dev.bus != &memory_subsys);
-
-	/* drop the ref. we got via find_memory_block() */
-	put_device(&memory->dev);
-	device_unregister(&memory->dev);
-}
-
 void unregister_memory_section(struct mem_section *section)
 {
 	struct memory_block *mem;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 474c7c60c8f2..db3e8567f900 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -111,7 +111,7 @@ extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
 extern int register_memory_isolate_notifier(struct notifier_block *nb);
 extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
-int hotplug_memory_register(int nid, struct mem_section *section);
+int create_memory_block_devices(unsigned long start, unsigned long size);
 extern void unregister_memory_section(struct mem_section *);
 extern int memory_dev_init(void);
 extern int memory_notify(unsigned long val, void *v);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6b5ce0bd907f..414771114685 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -256,13 +256,7 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
 		return -EEXIST;
 
 	ret = sparse_add_one_section(nid, phys_start_pfn, altmap);
-	if (ret < 0)
-		return ret;
-
-	if (!want_memblock)
-		return 0;
-
-	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
+	return ret < 0 ? ret : 0;
 }
 
 /*
@@ -1096,6 +1090,13 @@ int __ref add_memory_resource(int nid, struct resource *res, bool online)
 	if (ret < 0)
 		goto error;
 
+	/* create memory block devices after memory was added */
+	ret = create_memory_block_devices(start, size);
+	if (ret) {
+		arch_remove_memory(nid, start, size, NULL);
+		goto error;
+	}
+
 	if (new_node) {
 		/* If sysfs file of new node can't be created, cpu on the node
 		 * can't be hot-added. There is no rollback way now.
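
Illustrative note (not part of the patch): the create-all-or-roll-back
flow of create_memory_block_devices() can be sketched in plain userspace
C as below. This is only a rough model under stated assumptions: the
names block_present, init_block(), remove_block() and create_blocks(),
and the constants MAX_BLOCKS and BLOCK_SIZE, are hypothetical stand-ins
and do not exist in the kernel. There is no locking and no device model;
only the alignment check, the duplicate check and the rollback range
[first, failing id) are modelled.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_BLOCKS 64			/* hypothetical table size for this sketch */
#define BLOCK_SIZE 0x8000000UL		/* hypothetical block size (128 MiB) */

/* Hypothetical registry standing in for the memory block devices. */
static bool block_present[MAX_BLOCKS];

/* Stand-in for init_memory_block(): refuse duplicates with -EEXIST. */
static int init_block(unsigned long id)
{
	if (id >= MAX_BLOCKS)
		return -ENOMEM;
	if (block_present[id])
		return -EEXIST;
	block_present[id] = true;
	return 0;
}

/* Stand-in for unregister_memory(). */
static void remove_block(unsigned long id)
{
	block_present[id] = false;
}

/* Create every block in [start, start + size) or none of them. */
int create_blocks(unsigned long start, unsigned long size)
{
	unsigned long first = start / BLOCK_SIZE;
	unsigned long end = (start + size) / BLOCK_SIZE;
	unsigned long id;
	int ret = 0;

	/* Start and size must be aligned to the block granularity. */
	if (start % BLOCK_SIZE || size % BLOCK_SIZE)
		return -EINVAL;

	for (id = first; id != end; id++) {
		ret = init_block(id);
		if (ret)
			break;
	}
	if (ret) {
		/* Undo only what this call created: blocks [first, id). */
		end = id;
		for (id = first; id != end; id++)
			remove_block(id);
	}
	return ret;
}
```

In this sketch a block that existed before the call is left in place on
failure, because the rollback loop stops at the id that failed.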