From patchwork Thu May 21 23:37:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 225469 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1B124C433E0 for ; Thu, 21 May 2020 23:54:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DDC7E2078B for ; Thu, 21 May 2020 23:54:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729581AbgEUXyB (ORCPT ); Thu, 21 May 2020 19:54:01 -0400 Received: from mga17.intel.com ([192.55.52.151]:39714 "EHLO mga17.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728635AbgEUXyB (ORCPT ); Thu, 21 May 2020 19:54:01 -0400 IronPort-SDR: 1205/Z00YhtV4tB9QGsx/uOe2HwC+V7F6S0OO6/FnupkuP2IbZKSPiENyocuJLgtZzRqjTc1rl PieRSJnsp16Q== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2020 16:54:00 -0700 IronPort-SDR: uN2JO7VRFCA+BU+ZiUlQtX0aFqBz4P05q3uuqVEjHelJfxM6w40okjP2byV05yOD9ToPqPTWjj ++x+te6Tpbsg== X-IronPort-AV: E=Sophos;i="5.73,419,1583222400"; d="scan'208";a="309216914" Received: from dwillia2-desk3.jf.intel.com (HELO dwillia2-desk3.amr.corp.intel.com) ([10.54.39.16]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2020 16:54:00 -0700 Subject: [5.4-stable PATCH 1/7] mm/memremap_pages: Kill unused __devm_memremap_pages() 
From: Dan Williams To: stable@vger.kernel.org Cc: Christoph Hellwig , "Aneesh Kumar K.V" , Christoph Hellwig , vishal.l.verma@intel.com, linux-nvdimm@lists.01.org Date: Thu, 21 May 2020 16:37:48 -0700 Message-ID: <159010426892.1062454.14033665233091025420.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <159010426294.1062454.8853083370975871627.stgit@dwillia2-desk3.amr.corp.intel.com> References: <159010426294.1062454.8853083370975871627.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Sender: stable-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org Commit 1d0827b75ee7df497f611a2ac412a88135fb0ef5 upstream. Kill this definition that was introduced in commit 41e94a851304 ("add devm_memremap_pages") and never used. Cc: Christoph Hellwig Reviewed-by: Aneesh Kumar K.V Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/158041476158.3889308.4221100673554151124.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams --- include/linux/io.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/include/linux/io.h b/include/linux/io.h index a59834bc0a11..35e8d84935e0 100644 --- a/include/linux/io.h +++ b/include/linux/io.h @@ -79,8 +79,6 @@ void *devm_memremap(struct device *dev, resource_size_t offset, size_t size, unsigned long flags); void devm_memunmap(struct device *dev, void *addr); -void *__devm_memremap_pages(struct device *dev, struct resource *res); - #ifdef CONFIG_PCI /* * The PCI specifications (Rev 3.0, 3.2.5 "Transaction Ordering and 
From patchwork Thu May 21 23:37:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 225468 Subject: [5.4-stable PATCH 3/7] libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid From: Dan Williams To: stable@vger.kernel.org Cc: "Aneesh Kumar K.V" , Jeff Moyer , "Aneesh Kumar K.V" , vishal.l.verma@intel.com, hch@lst.de, linux-nvdimm@lists.01.org Date: Thu, 21 May 2020 16:37:59 -0700 Message-ID: <159010427971.1062454.10990062889525749549.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <159010426294.1062454.8853083370975871627.stgit@dwillia2-desk3.amr.corp.intel.com> References: 
<159010426294.1062454.8853083370975871627.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Sender: stable-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org Commit b2ba7e91fa81bec9b64c47ab852145559cad2b68 upstream. The EOPNOTSUPP return code from the pmem driver indicates that the namespace has a configuration that may be valid, but the current kernel does not support it. Expand this to all of the nd_pfn_validate() error conditions after the infoblock has been verified as self consistent. This prevents exposing the namespace to I/O when the infoblock needs to be corrected, or the system needs to be put into a different configuration (like changing the page size on PowerPC). Cc: Aneesh Kumar K.V Cc: Jeff Moyer Reviewed-by: Aneesh Kumar K.V Signed-off-by: Dan Williams --- drivers/nvdimm/pfn_devs.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index aa144c8a4ee6..8c5b13567f55 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -560,14 +560,14 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) dev_dbg(&nd_pfn->dev, "align: %lx:%lx mode: %d:%d\n", nd_pfn->align, align, nd_pfn->mode, mode); - return -EINVAL; + return -EOPNOTSUPP; } } if (align > nvdimm_namespace_capacity(ndns)) { dev_err(&nd_pfn->dev, "alignment: %lx exceeds capacity %llx\n", align, nvdimm_namespace_capacity(ndns)); - return -EINVAL; + return -EOPNOTSUPP; } /* @@ -580,7 +580,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) if (offset >= resource_size(&nsio->res)) { dev_err(&nd_pfn->dev, "pfn array size exceeds capacity of %s\n", dev_name(&ndns->dev)); - return -EBUSY; + return -EOPNOTSUPP; } if ((align && !IS_ALIGNED(nsio->res.start + offset + start_pad, align)) @@ -588,7 +588,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) dev_err(&nd_pfn->dev, "bad offset: %#llx dax disabled 
align: %#lx\n", offset, align); - return -ENXIO; + return -EOPNOTSUPP; } return nd_pfn_clear_memmap_errors(nd_pfn); 
From patchwork Thu May 21 23:38:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 225467 Subject: [5.4-stable PATCH 5/7] libnvdimm/namespace: Enforce memremap_compat_align() From: Dan Williams To: stable@vger.kernel.org Cc: "Aneesh Kumar K.V" , Jeff Moyer , "Aneesh Kumar K.V" , vishal.l.verma@intel.com, hch@lst.de, linux-nvdimm@lists.01.org Date: Thu, 21 May 2020 16:38:10 -0700 Message-ID: <159010429083.1062454.2466252359997112899.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <159010426294.1062454.8853083370975871627.stgit@dwillia2-desk3.amr.corp.intel.com> References: <159010426294.1062454.8853083370975871627.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Sender: stable-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org Commit 6acd7d5ef264d8e9a8988cebf6eeb3567eaf60c6 upstream. The pmem driver on PowerPC crashes with the following signature when instantiating misaligned namespaces that map their capacity via memremap_pages(). BUG: Unable to handle kernel data access at 0xc001000406000000 Faulting instruction address: 0xc000000000090790 NIP [c000000000090790] arch_add_memory+0xc0/0x130 LR [c000000000090744] arch_add_memory+0x74/0x130 Call Trace: arch_add_memory+0x74/0x130 (unreliable) memremap_pages+0x74c/0xa30 devm_memremap_pages+0x3c/0xa0 pmem_attach_disk+0x188/0x770 nvdimm_bus_probe+0xd8/0x470 With the assumption that only memremap_pages() has alignment constraints, enforce memremap_compat_align() for pmem_should_map_pages(), nd_pfn, and nd_dax cases. This includes preventing the creation of namespaces where the base address is misaligned and cases where infoblock padding parameters are invalid. 
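As an illustration only (not part of the upstream commit), the boundary check this patch adds to nvdimm_namespace_common_probe() can be sketched in userspace C. The helper names is_aligned() and range_is_mappable() are hypothetical stand-ins for the kernel's IS_ALIGNED() and the inline test in the diff; the alignment value is passed in by the caller rather than read from memremap_compat_align().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's IS_ALIGNED().
 * The mask trick only works when align is a non-zero power of two.
 */
static bool is_aligned(uint64_t x, uint64_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (x & (align - 1)) == 0;
}

/*
 * Hypothetical helper mirroring the test in the diff:
 *   IS_ALIGNED(res->start | (res->end + 1), memremap_compat_align())
 * OR-ing the start with the exclusive end lets a single mask test
 * reject a range when either boundary is misaligned.
 */
static bool range_is_mappable(uint64_t start, uint64_t end_inclusive,
			      uint64_t align)
{
	return is_aligned(start | (end_inclusive + 1), align);
}
```

For example, a range starting at 0x1200000 and ending at 0x21fffff passes with a 2MiB alignment but fails with the 16MiB alignment that a later patch in this series attributes to PowerPC, which is the class of namespace the probe path now refuses to map.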
Reported-by: Aneesh Kumar K.V Cc: Jeff Moyer Fixes: a3619190d62e ("libnvdimm/pfn: stop padding pmem namespaces to section alignment") Reviewed-by: Aneesh Kumar K.V Signed-off-by: Dan Williams --- drivers/nvdimm/namespace_devs.c | 17 +++++++++++++++++ drivers/nvdimm/pfn.h | 12 ++++++++++++ drivers/nvdimm/pfn_devs.c | 32 +++++++++++++++++++++++++++++--- 3 files changed, 58 insertions(+), 3 deletions(-) diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c index cca0a3ba1d2c..b2db15250e0d 100644 --- a/drivers/nvdimm/namespace_devs.c +++ b/drivers/nvdimm/namespace_devs.c @@ -10,6 +10,7 @@ #include #include "nd-core.h" #include "pmem.h" +#include "pfn.h" #include "nd.h" static void namespace_io_release(struct device *dev) @@ -1735,6 +1736,22 @@ struct nd_namespace_common *nvdimm_namespace_common_probe(struct device *dev) return ERR_PTR(-ENODEV); } + /* + * Note, alignment validation for fsdax and devdax mode + * namespaces happens in nd_pfn_validate() where infoblock + * padding parameters can be applied. + */ + if (pmem_should_map_pages(dev)) { + struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev); + struct resource *res = &nsio->res; + + if (!IS_ALIGNED(res->start | (res->end + 1), + memremap_compat_align())) { + dev_err(&ndns->dev, "%pr misaligned, unable to map\n", res); + return ERR_PTR(-EOPNOTSUPP); + } + } + if (is_namespace_pmem(&ndns->dev)) { struct nd_namespace_pmem *nspm; diff --git a/drivers/nvdimm/pfn.h b/drivers/nvdimm/pfn.h index acb19517f678..37cb1b8a2a39 100644 --- a/drivers/nvdimm/pfn.h +++ b/drivers/nvdimm/pfn.h @@ -24,6 +24,18 @@ struct nd_pfn_sb { __le64 npfns; __le32 mode; /* minor-version-1 additions for section alignment */ + /** + * @start_pad: Deprecated attribute to pad start-misaligned namespaces + * + * start_pad is deprecated because the original definition did + * not comprehend that dataoff is relative to the base address + * of the namespace not the start_pad adjusted base. 
The result + * is that the dax path is broken, but the block-I/O path is + * not. The kernel will no longer create namespaces using start + * padding, but it still supports block-I/O for legacy + * configurations mainly to allow a backup, reconfigure the + * namespace, and restore flow to repair dax operation. + */ __le32 start_pad; __le32 end_trunc; /* minor-version-2 record the base alignment of the mapping */ diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index 6e5b042f453e..18a2602d93e7 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -445,6 +445,7 @@ static bool nd_supported_alignment(unsigned long align) int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) { u64 checksum, offset; + struct resource *res; enum nd_pfn_mode mode; struct nd_namespace_io *nsio; unsigned long align, start_pad; @@ -577,13 +578,14 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) * established. */ nsio = to_nd_namespace_io(&ndns->dev); - if (offset >= resource_size(&nsio->res)) { + res = &nsio->res; + if (offset >= resource_size(res)) { dev_err(&nd_pfn->dev, "pfn array size exceeds capacity of %s\n", dev_name(&ndns->dev)); return -EOPNOTSUPP; } - if ((align && !IS_ALIGNED(nsio->res.start + offset + start_pad, align)) + if ((align && !IS_ALIGNED(res->start + offset + start_pad, align)) || !IS_ALIGNED(offset, PAGE_SIZE)) { dev_err(&nd_pfn->dev, "bad offset: %#llx dax disabled align: %#lx\n", @@ -591,6 +593,18 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) return -EOPNOTSUPP; } + if (!IS_ALIGNED(res->start + le32_to_cpu(pfn_sb->start_pad), + memremap_compat_align())) { + dev_err(&nd_pfn->dev, "resource start misaligned\n"); + return -EOPNOTSUPP; + } + + if (!IS_ALIGNED(res->end + 1 - le32_to_cpu(pfn_sb->end_trunc), + memremap_compat_align())) { + dev_err(&nd_pfn->dev, "resource end misaligned\n"); + return -EOPNOTSUPP; + } + return 0; } EXPORT_SYMBOL(nd_pfn_validate); @@ -754,7 +768,19 @@ static int 
nd_pfn_init(struct nd_pfn *nd_pfn) start = nsio->res.start; size = resource_size(&nsio->res); npfns = PHYS_PFN(size - SZ_8K); - align = max(nd_pfn->align, SUBSECTION_SIZE); + align = max(nd_pfn->align, memremap_compat_align()); + + /* + * When @start is misaligned fail namespace creation. See + * the 'struct nd_pfn_sb' commentary on why ->start_pad is not + * an option. + */ + if (!IS_ALIGNED(start, memremap_compat_align())) { + dev_err(&nd_pfn->dev, "%s: start %pa misaligned to %#lx\n", + dev_name(&ndns->dev), &start, + memremap_compat_align()); + return -EINVAL; + } end_trunc = start + size - ALIGN_DOWN(start + size, align); if (nd_pfn->mode == PFN_MODE_PMEM) { /* 
From patchwork Thu May 21 23:38:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dan Williams X-Patchwork-Id: 225466 Subject: [5.4-stable PATCH 7/7] libnvdimm/region: Introduce an 'align' attribute From: Dan Williams To: stable@vger.kernel.org Cc: "Aneesh Kumar K.V" , Jeff Moyer , "Aneesh Kumar K.V" , Jeff Moyer , vishal.l.verma@intel.com, hch@lst.de, linux-nvdimm@lists.01.org Date: Thu, 21 May 2020 16:38:22 -0700 Message-ID: <159010430200.1062454.8543650231999164978.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <159010426294.1062454.8853083370975871627.stgit@dwillia2-desk3.amr.corp.intel.com> References: <159010426294.1062454.8853083370975871627.stgit@dwillia2-desk3.amr.corp.intel.com> User-Agent: StGit/0.18-3-g996c MIME-Version: 1.0 Sender: stable-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org Commit 2522afb86a8cceba0f67dbf05772d21b76d79f06 upstream. The region 'align' attribute applies an alignment constraint for namespace creation in a region. Whereas the 'align' attribute of a namespace applies alignment padding via an info block, the region 'align' attribute applies alignment constraints to the free space allocation. The default for the region 'align' is the maximum known memremap_compat_align() across all archs (16MiB from PowerPC at time of writing) multiplied by the number of interleave ways if there is blk-aliasing. 
The minimum is PAGE_SIZE and allows for the creation of cross-arch incompatible namespaces, just as previous kernels allowed, but the expectation is cross-arch and mode-independent compatibility by default. The regression risk with this change is limited to cases that were dependent on the ability to create unaligned namespaces, *and* for some reason are unable to opt-out of aligned namespaces by writing to 'regionX/align'. If such a scenario arises the default can be flipped from opt-out to opt-in of compat-aligned namespace creation, but that is a last resort. The kernel will otherwise continue to support existing defined misaligned namespaces. Unfortunately this change needs to touch several parts of the implementation at once: - region/available_size: expand busy extents to current align - region/max_available_extent: expand busy extents to current align - namespace/size: trim free space to current align ...to keep the free space accounting conforming to the dynamic align setting. Reported-by: Aneesh Kumar K.V Reported-by: Jeff Moyer Reviewed-by: Aneesh Kumar K.V Reviewed-by: Jeff Moyer Link: https://lore.kernel.org/r/158041478371.3889308.14542630147672668068.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams --- drivers/nvdimm/dimm_devs.c | 86 +++++++++++++++++++++++----- drivers/nvdimm/namespace_devs.c | 9 ++- drivers/nvdimm/nd.h | 1 drivers/nvdimm/region_devs.c | 120 ++++++++++++++++++++++++++++++++++++--- 4 files changed, 190 insertions(+), 26 deletions(-) diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c index def5c2846bea..8c773fb60296 100644 --- a/drivers/nvdimm/dimm_devs.c +++ b/drivers/nvdimm/dimm_devs.c @@ -557,6 +557,21 @@ int nvdimm_security_freeze(struct nvdimm *nvdimm) return rc; } +static unsigned long dpa_align(struct nd_region *nd_region) +{ + struct device *dev = &nd_region->dev; + + if (dev_WARN_ONCE(dev, !is_nvdimm_bus_locked(dev), + "bus lock required for capacity provision\n")) + return 0; + if 
(dev_WARN_ONCE(dev, !nd_region->ndr_mappings || nd_region->align + % nd_region->ndr_mappings, + "invalid region align %#lx mappings: %d\n", + nd_region->align, nd_region->ndr_mappings)) + return 0; + return nd_region->align / nd_region->ndr_mappings; +} + int alias_dpa_busy(struct device *dev, void *data) { resource_size_t map_end, blk_start, new; @@ -565,6 +580,7 @@ int alias_dpa_busy(struct device *dev, void *data) struct nd_region *nd_region; struct nvdimm_drvdata *ndd; struct resource *res; + unsigned long align; int i; if (!is_memory(dev)) @@ -602,13 +618,21 @@ int alias_dpa_busy(struct device *dev, void *data) * Find the free dpa from the end of the last pmem allocation to * the end of the interleave-set mapping. */ + align = dpa_align(nd_region); + if (!align) + return 0; + for_each_dpa_resource(ndd, res) { + resource_size_t start, end; + if (strncmp(res->name, "pmem", 4) != 0) continue; - if ((res->start >= blk_start && res->start < map_end) - || (res->end >= blk_start - && res->end <= map_end)) { - new = max(blk_start, min(map_end + 1, res->end + 1)); + + start = ALIGN_DOWN(res->start, align); + end = ALIGN(res->end + 1, align) - 1; + if ((start >= blk_start && start < map_end) + || (end >= blk_start && end <= map_end)) { + new = max(blk_start, min(map_end, end) + 1); if (new != blk_start) { blk_start = new; goto retry; @@ -648,6 +672,7 @@ resource_size_t nd_blk_available_dpa(struct nd_region *nd_region) .res = NULL, }; struct resource *res; + unsigned long align; if (!ndd) return 0; @@ -655,10 +680,20 @@ resource_size_t nd_blk_available_dpa(struct nd_region *nd_region) device_for_each_child(&nvdimm_bus->dev, &info, alias_dpa_busy); /* now account for busy blk allocations in unaliased dpa */ + align = dpa_align(nd_region); + if (!align) + return 0; for_each_dpa_resource(ndd, res) { + resource_size_t start, end, size; + if (strncmp(res->name, "blk", 3) != 0) continue; - info.available -= resource_size(res); + start = ALIGN_DOWN(res->start, align); + end = 
ALIGN(res->end + 1, align) - 1; + size = end - start + 1; + if (size >= info.available) + return 0; + info.available -= size; } return info.available; @@ -677,19 +712,31 @@ resource_size_t nd_pmem_max_contiguous_dpa(struct nd_region *nd_region, struct nvdimm_bus *nvdimm_bus; resource_size_t max = 0; struct resource *res; + unsigned long align; /* if a dimm is disabled the available capacity is zero */ if (!ndd) return 0; + align = dpa_align(nd_region); + if (!align) + return 0; + nvdimm_bus = walk_to_nvdimm_bus(ndd->dev); if (__reserve_free_pmem(&nd_region->dev, nd_mapping->nvdimm)) return 0; for_each_dpa_resource(ndd, res) { + resource_size_t start, end; + if (strcmp(res->name, "pmem-reserve") != 0) continue; - if (resource_size(res) > max) - max = resource_size(res); + /* trim free space relative to current alignment setting */ + start = ALIGN(res->start, align); + end = ALIGN_DOWN(res->end + 1, align) - 1; + if (end < start) + continue; + if (end - start + 1 > max) + max = end - start + 1; } release_free_pmem(nvdimm_bus, nd_mapping); return max; @@ -717,24 +764,33 @@ resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region, struct nvdimm_drvdata *ndd = to_ndd(nd_mapping); struct resource *res; const char *reason; + unsigned long align; if (!ndd) return 0; + align = dpa_align(nd_region); + if (!align) + return 0; + map_start = nd_mapping->start; map_end = map_start + nd_mapping->size - 1; blk_start = max(map_start, map_end + 1 - *overlap); for_each_dpa_resource(ndd, res) { - if (res->start >= map_start && res->start < map_end) { + resource_size_t start, end; + + start = ALIGN_DOWN(res->start, align); + end = ALIGN(res->end + 1, align) - 1; + if (start >= map_start && start < map_end) { if (strncmp(res->name, "blk", 3) == 0) blk_start = min(blk_start, - max(map_start, res->start)); - else if (res->end > map_end) { + max(map_start, start)); + else if (end > map_end) { reason = "misaligned to iset"; goto err; } else - busy += resource_size(res); - } else if 
(res->end >= map_start && res->end <= map_end) { + busy += end - start + 1; + } else if (end >= map_start && end <= map_end) { if (strncmp(res->name, "blk", 3) == 0) { /* * If a BLK allocation overlaps the start of @@ -743,8 +799,8 @@ resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region, */ blk_start = map_start; } else - busy += resource_size(res); - } else if (map_start > res->start && map_start < res->end) { + busy += end - start + 1; + } else if (map_start > start && map_start < end) { /* total eclipse of the mapping */ busy += nd_mapping->size; blk_start = map_start; @@ -754,7 +810,7 @@ resource_size_t nd_pmem_available_dpa(struct nd_region *nd_region, *overlap = map_end + 1 - blk_start; available = blk_start - map_start; if (busy < available) - return available - busy; + return ALIGN_DOWN(available - busy, align); return 0; err: diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c index 8cf1c8932a3b..e80de9b9ccce 100644 --- a/drivers/nvdimm/namespace_devs.c +++ b/drivers/nvdimm/namespace_devs.c @@ -568,6 +568,11 @@ static void space_valid(struct nd_region *nd_region, struct nvdimm_drvdata *ndd, { bool is_reserve = strcmp(label_id->id, "pmem-reserve") == 0; bool is_pmem = strncmp(label_id->id, "pmem", 4) == 0; + unsigned long align; + + align = nd_region->align / nd_region->ndr_mappings; + valid->start = ALIGN(valid->start, align); + valid->end = ALIGN_DOWN(valid->end + 1, align) - 1; if (valid->start >= valid->end) goto invalid; @@ -1007,10 +1012,10 @@ static ssize_t __size_store(struct device *dev, unsigned long long val) return -ENXIO; } - div_u64_rem(val, PAGE_SIZE * nd_region->ndr_mappings, &remainder); + div_u64_rem(val, nd_region->align, &remainder); if (remainder) { dev_dbg(dev, "%llu is not %ldK aligned\n", val, - (PAGE_SIZE * nd_region->ndr_mappings) / SZ_1K); + nd_region->align / SZ_1K); return -EINVAL; } diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h index 4a946c019e0f..e37c79c340d6 100644 --- 
a/drivers/nvdimm/nd.h +++ b/drivers/nvdimm/nd.h @@ -146,6 +146,7 @@ struct nd_region { struct device *btt_seed; struct device *pfn_seed; struct device *dax_seed; + unsigned long align; u16 ndr_mappings; u64 ndr_size; u64 ndr_start; diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c index b1b13debffbc..7d5ab00c7b45 100644 --- a/drivers/nvdimm/region_devs.c +++ b/drivers/nvdimm/region_devs.c @@ -246,21 +246,25 @@ int nd_region_to_nstype(struct nd_region *nd_region) } EXPORT_SYMBOL(nd_region_to_nstype); -static ssize_t size_show(struct device *dev, - struct device_attribute *attr, char *buf) +static unsigned long long region_size(struct nd_region *nd_region) { - struct nd_region *nd_region = to_nd_region(dev); - unsigned long long size = 0; - - if (is_memory(dev)) { - size = nd_region->ndr_size; + if (is_memory(&nd_region->dev)) { + return nd_region->ndr_size; } else if (nd_region->ndr_mappings == 1) { struct nd_mapping *nd_mapping = &nd_region->mapping[0]; - size = nd_mapping->size; + return nd_mapping->size; } - return sprintf(buf, "%llu\n", size); + return 0; +} + +static ssize_t size_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct nd_region *nd_region = to_nd_region(dev); + + return sprintf(buf, "%llu\n", region_size(nd_region)); } static DEVICE_ATTR_RO(size); @@ -559,6 +563,54 @@ static ssize_t read_only_store(struct device *dev, } static DEVICE_ATTR_RW(read_only); +static ssize_t align_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct nd_region *nd_region = to_nd_region(dev); + + return sprintf(buf, "%#lx\n", nd_region->align); +} + +static ssize_t align_store(struct device *dev, + struct device_attribute *attr, const char *buf, size_t len) +{ + struct nd_region *nd_region = to_nd_region(dev); + unsigned long val, dpa; + u32 remainder; + int rc; + + rc = kstrtoul(buf, 0, &val); + if (rc) + return rc; + + if (!nd_region->ndr_mappings) + return -ENXIO; + + /* + * Ensure 
space-align is evenly divisible by the region + * interleave-width because the kernel typically has no facility + * to determine which DIMM(s), dimm-physical-addresses, would + * contribute to the tail capacity in system-physical-address + * space for the namespace. + */ + dpa = div_u64_rem(val, nd_region->ndr_mappings, &remainder); + if (!is_power_of_2(dpa) || dpa < PAGE_SIZE + || val > region_size(nd_region) || remainder) + return -EINVAL; + + /* + * Given that space allocation consults this value multiple + * times ensure it does not change for the duration of the + * allocation. + */ + nvdimm_bus_lock(dev); + nd_region->align = val; + nvdimm_bus_unlock(dev); + + return len; +} +static DEVICE_ATTR_RW(align); + static ssize_t region_badblocks_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -601,6 +653,7 @@ static DEVICE_ATTR_RO(persistence_domain); static struct attribute *nd_region_attributes[] = { &dev_attr_size.attr, + &dev_attr_align.attr, &dev_attr_nstype.attr, &dev_attr_mappings.attr, &dev_attr_btt_seed.attr, @@ -660,6 +713,19 @@ static umode_t region_visible(struct kobject *kobj, struct attribute *a, int n) return a->mode; } + if (a == &dev_attr_align.attr) { + int i; + + for (i = 0; i < nd_region->ndr_mappings; i++) { + struct nd_mapping *nd_mapping = &nd_region->mapping[i]; + struct nvdimm *nvdimm = nd_mapping->nvdimm; + + if (test_bit(NDD_LABELING, &nvdimm->flags)) + return a->mode; + } + return 0; + } + if (a != &dev_attr_set_cookie.attr && a != &dev_attr_available_size.attr) return a->mode; @@ -930,6 +996,41 @@ void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane) } EXPORT_SYMBOL(nd_region_release_lane); +/* + * PowerPC requires this alignment for memremap_pages(). All other archs + * should be ok with SUBSECTION_SIZE (see memremap_compat_align()). 
+ */ +#define MEMREMAP_COMPAT_ALIGN_MAX SZ_16M + +static unsigned long default_align(struct nd_region *nd_region) +{ + unsigned long align; + int i, mappings; + u32 remainder; + + if (is_nd_blk(&nd_region->dev)) + align = PAGE_SIZE; + else + align = MEMREMAP_COMPAT_ALIGN_MAX; + + for (i = 0; i < nd_region->ndr_mappings; i++) { + struct nd_mapping *nd_mapping = &nd_region->mapping[i]; + struct nvdimm *nvdimm = nd_mapping->nvdimm; + + if (test_bit(NDD_ALIASING, &nvdimm->flags)) { + align = MEMREMAP_COMPAT_ALIGN_MAX; + break; + } + } + + mappings = max_t(u16, 1, nd_region->ndr_mappings); + div_u64_rem(align, mappings, &remainder); + if (remainder) + align *= mappings; + + return align; +} + static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus, struct nd_region_desc *ndr_desc, struct device_type *dev_type, const char *caller) @@ -1034,6 +1135,7 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus, dev->of_node = ndr_desc->of_node; nd_region->ndr_size = resource_size(ndr_desc->res); nd_region->ndr_start = ndr_desc->res->start; + nd_region->align = default_align(nd_region); if (ndr_desc->flush) nd_region->flush = ndr_desc->flush; else