From patchwork Fri Nov 6 23:42:57 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 322412 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.5 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8CA56C4741F for ; Fri, 6 Nov 2020 23:50:24 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DB1B82087E for ; Fri, 6 Nov 2020 23:50:23 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="hDXKIku1" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DB1B82087E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:46598 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kbBUs-0005Tk-TM for qemu-devel@archiver.kernel.org; Fri, 06 Nov 2020 18:50:22 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:34234) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kbBOF-0005Ll-Mq; Fri, 06 Nov 2020 18:43:31 -0500 Received: from esa6.hgst.iphmx.com ([216.71.154.45]:57360) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kbBOB-0003Tq-Uq; Fri, 06 Nov 2020 18:43:31 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1604706208; x=1636242208; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=G3kD64FZFQBNwCAKQ6g+YVeLzgd0PXwIYGnld+o7ZrQ=; b=hDXKIku1ytFXPVMEpKNccP0xxJQCd8rzV5Jcb3F+Zf6/SrZmTPT3nDbD 5N8HemvSPOSAcq/GhXGnvjXFvWoU1DykCDvHLRRhYzIful7NmxWKPBx4j kbFaX2FPpdT5dNamcfzjHZlpaY4FV8Bz09YjfRx/GHacK9I/YuuUDUOME foIrym1UIwzZY8Ni1dQ3pVTxubFt3PhKfVVSHv2R8ostmi9yPSkeQOStI /AiAWC/vJ1YT7AZKMgDPVJZKGt4AeyI17ZahKonMOtrxMMUec+DHXppfn jzQZkHYRBYcJt4L0CekrthZkeJtFCA4AF+Qo0sDzqA9NjuWfPkgFNJQbO A==; IronPort-SDR: MIbK2XfzX3Aj9jr3cKRczTgNhxYNgzjUEin1UxmU4ok8qnSqfV4Y9EWEcf9QGFJEaw4s+uI/hA 2iDIrYXBmSdz6q2MiJ6kCyT1wfU98F3b9Anmj2cMIBd1oVAuDYJ1SumB0m2toCmLJadT4TAit4 I+thKvI19AVcFF+2++DnoO6uJie64PEe6SO6jA2QwSVnOLUUiCh8N84X62LyT8k+EPQjwzAM+I ubTjWlwCZpafiuOSNl4xqWqSRozOiKOf/lNgMbY5cKOIAVZAOI39EnoJhK0agShonBmB6j3Jvm wIc= X-IronPort-AV: E=Sophos;i="5.77,457,1596470400"; d="scan'208";a="153267061" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 07 Nov 2020 07:43:23 +0800 IronPort-SDR: gxAxOcDNIxokSlWuSZ6a2m/QAIdrDTOIlLXYiyfER92JDrcD9nXxyiJMwA9jfFO365Chp88IOy KvRmEzD+4tEqgfNeYaQ1Ss3midfs8iK80sAxQOiZxkn+Vd8Fx7r00edunO+gz8W/rwx0fqFgVN EHCdAUye2+sHnw/+5x9gGS6yI3TSQelb/XKb3FcKgMyUSwuo6ZZ9QQ+FxlY6MEgiYYgo2HyNUC 3x0FPMlCT+0G+kbUzdWPIBa8sNKQzQUFPIkQQPh9ul7ZCmywTnYcLcHWcCUlxE5fpgqPcMl11e wHB2B1R9KNMyRr1rXORV/bv7 Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Nov 2020 15:28:14 -0800 IronPort-SDR: 3bfwp0CI24LlCyo+hLjKBDRQ9mJhajMPgm+NF5ZDmO1Atutzmgo5eZ2qFzBFSOLH+mbtZXct5y PiDqJWrJRPho08x0H8Qzxb+OR6YPL2bQ455+JWqtyiu/LHThJtKnRTYw2cYaKxbBSUvsT9GOeJ BlopNpWnOfBMVoLJR326JoXQ919aq3pO+d+6kLoVNuJCLpWz1O+/LNn+f3vQiQfbEQDIDzKgU4 +PrFJIuhHVlcrCspaIOJALKw++SHyjSdsU4EMB7nPTuTGfekcxT5FmoM1KsSrW1apkgRhxVLFJ 53U= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip01.wdc.com with ESMTP; 06 Nov 2020 15:43:20 -0800 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Max Reitz , Maxim Levitsky , Fam Zheng Subject: [PATCH v10 04/12] hw/block/nvme: Merge nvme_write_zeroes() with nvme_write() Date: Sat, 7 Nov 2020 08:42:57 +0900 Message-Id: <20201106234305.21339-5-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201106234305.21339-1-dmitry.fomichev@wdc.com> References: <20201106234305.21339-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.154.45; envelope-from=prvs=572b21b8d=dmitry.fomichev@wdc.com; helo=esa6.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/11/06 18:43:12 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" nvme_write() now handles WRITE, WRITE ZEROES and ZONE_APPEND. Signed-off-by: Dmitry Fomichev Reviewed-by: Niklas Cassel Acked-by: Klaus Jensen --- hw/block/nvme.c | 72 +++++++++++++++++-------------------------- hw/block/trace-events | 1 - 2 files changed, 28 insertions(+), 45 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 770e42a066..48adbe84f5 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1001,32 +1001,7 @@ invalid: return status | NVME_DNR; } -static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) -{ - NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; - NvmeNamespace *ns = req->ns; - uint64_t slba = le64_to_cpu(rw->slba); - uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; - uint64_t offset = nvme_l2b(ns, slba); - uint32_t count = nvme_l2b(ns, nlb); - uint16_t status; - - trace_pci_nvme_write_zeroes(nvme_cid(req), nvme_nsid(ns), slba, nlb); - - status = nvme_check_bounds(n, ns, slba, nlb); - if (status) { - trace_pci_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze); - return status; - } - - block_acct_start(blk_get_stats(req->ns->blkconf.blk), &req->acct, 0, - BLOCK_ACCT_WRITE); - req->aiocb = blk_aio_pwrite_zeroes(req->ns->blkconf.blk, offset, count, - BDRV_REQ_MAY_UNMAP, nvme_rw_cb, req); - return NVME_NO_COMPLETE; -} - -static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool wrz) { NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; NvmeNamespace *ns = req->ns; @@ -1040,10 +1015,12 @@ static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req) trace_pci_nvme_write(nvme_cid(req), nvme_io_opc_str(rw->opcode), nvme_nsid(ns), nlb, data_size, slba); - status = nvme_check_mdts(n, data_size); - if (status) { - trace_pci_nvme_err_mdts(nvme_cid(req), data_size); - goto invalid; + if (!wrz) { + status = nvme_check_mdts(n, data_size); + if (status) { + trace_pci_nvme_err_mdts(nvme_cid(req), data_size); + goto invalid; + } } status = nvme_check_bounds(n, ns, slba, nlb); @@ -1052,21 +1029,28 @@ static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req) goto invalid; } - status = nvme_map_dptr(n, data_size, req); - if (status) { - goto invalid; - } - data_offset = nvme_l2b(ns, slba); - block_acct_start(blk_get_stats(blk), &req->acct, data_size, - BLOCK_ACCT_WRITE); - if (req->qsg.sg) { - req->aiocb = dma_blk_write(blk, &req->qsg, data_offset, - BDRV_SECTOR_SIZE, nvme_rw_cb, req); + if (!wrz) { + status = nvme_map_dptr(n, data_size, req); + if (status) { + goto invalid; + } + + block_acct_start(blk_get_stats(blk), &req->acct, data_size, + BLOCK_ACCT_WRITE); + if (req->qsg.sg) { + req->aiocb = dma_blk_write(blk, &req->qsg, data_offset, + BDRV_SECTOR_SIZE, nvme_rw_cb, req); + } else { + req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0, + nvme_rw_cb, req); + } } else { - req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0, - nvme_rw_cb, req); + block_acct_start(blk_get_stats(blk), &req->acct, 0, BLOCK_ACCT_WRITE); + req->aiocb = blk_aio_pwrite_zeroes(blk, data_offset, data_size, + BDRV_REQ_MAY_UNMAP, nvme_rw_cb, + req); } return NVME_NO_COMPLETE; @@ -1100,9 +1084,9 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) case NVME_CMD_FLUSH: return nvme_flush(n, req); case NVME_CMD_WRITE_ZEROES: - return nvme_write_zeroes(n, req); + return nvme_write(n, req, true); case NVME_CMD_WRITE: - return nvme_write(n, req); + return nvme_write(n, req, false); case NVME_CMD_READ: return nvme_read(n, req); default: diff --git a/hw/block/trace-events b/hw/block/trace-events index 540c600931..e67e96c2b5 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -43,7 +43,6 @@ pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t opcode, const char *opna pci_nvme_read(uint16_t cid, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64"" pci_nvme_write(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" opname '%s' nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64"" pci_nvme_rw_cb(uint16_t cid, const char *blkname) "cid %"PRIu16" blk '%s'" -pci_nvme_write_zeroes(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32" slba %"PRIu64" nlb %"PRIu32"" pci_nvme_create_sq(uint64_t addr, uint16_t sqid, uint16_t cqid, uint16_t qsize, uint16_t qflags) "create submission queue, addr=0x%"PRIx64", sqid=%"PRIu16", cqid=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16"" pci_nvme_create_cq(uint64_t addr, uint16_t cqid, uint16_t vector, uint16_t size, uint16_t qflags, int ien) "create completion queue, addr=0x%"PRIx64", cqid=%"PRIu16", vector=%"PRIu16", qsize=%"PRIu16", qflags=%"PRIu16", ien=%d" pci_nvme_del_sq(uint16_t qid) "deleting submission queue sqid=%"PRIu16"" From patchwork Fri Nov 6 23:42:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 322415 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.5 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02855C4741F for ; Fri, 6 Nov 2020 23:44:57 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5EE8920702 for ; Fri, 6 Nov 2020 23:44:56 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="gkXD4a7D" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5EE8920702 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:59042 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kbBPb-0007En-CX for qemu-devel@archiver.kernel.org; Fri, 06 Nov 2020 18:44:55 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:34270) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kbBOH-0005Q4-Dw; Fri, 06 Nov 2020 18:43:33 -0500 Received: from esa6.hgst.iphmx.com ([216.71.154.45]:57347) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kbBOE-0003Sk-Bs; Fri, 06 Nov 2020 18:43:33 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1604706211; x=1636242211; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ioG8CtacGXPm8N0F0AXEBY1dx1hdRzuXtXfPGQ3KeOw=; b=gkXD4a7DRB/of4YkkwVYfZ7NZKailH+LF0XJ4WT2TwbXDG0eyjClXv2c QmKUx3oSmSXB1J+syKD5N7M9GdEzmQnWTra8zsH1oTCSNm7zmggJzbx7x +hWNH4wf6omdw0v6y8nLF5HPV8WbtO2JK7FVs5lM+af6YG59xx6iDw3Qq g5sDc58Ux9gPeFmGm0jldHa/DgkUlvB3v1c7l0ZE5P7myhVeIzdfzbw79 j8zbxwMKIN6MtHVi1gsGAUBAhAydCRJDq9UcBcF/7n8yUHmLdxoukDPmi 4ujAB1DySSGJqIhpHGkPRRjzSuzWsyfDgGbiH5jAeNxj4dWnh6ZbwvXJV Q==; IronPort-SDR: u6QFlePsS1HYcpY3gbs2kHZ2NhedEfaY/ZjIskY5quZuJcPJvRk/HibqYLWnZKjGbJQKCHLgYB 1V8vpifQ5x6kMNMPAwJqzkLhAnGDUXR6TRCNco7wI/tUqf269TKsCkhjGQj7tEPx3fJf/VSe8h glNZmVTmTXeeVmqyni/KC1y4Qpx0Psygz/2zVHLISjAAG4O5f/7Rh/scUhVBH+oaC9Q/dquraq IP+Cvzm3ZT5GiStod0tBktzuCmlV7wV8l0Le33cFv3o0ryAgAF9Ls8lNC8AQIcXqDeI3/jUbvn x8I= X-IronPort-AV: E=Sophos;i="5.77,457,1596470400"; d="scan'208";a="153267067" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 07 Nov 2020 07:43:29 +0800 IronPort-SDR: 37pMl1OvNSBMOm/mn8fcUDGhBYb9Rz6AIDmSqR6ItMvCyJStdcFyf0J/c52YWKnOc2bZYvsfYm bvZ8dOh/g+nn2/K0INqdhj0jfdOxmJHsOoZj1RD7x9TqJa+/bPD6akioxnYaJ6sZVTerdq8KIY YsJya7/TI7+WbSV5zm6NjLpOjoemusaPkRu2Pde46Uw/AB06nRxNjwA4UsWT8wuNy1FAdKL8vp be3EWLWiIjDhZiTDGcFhSSEW26F+LRdlq17PGIAIwxJzpCCMxjj26kbXx8ydZko1xIg17xJn3t wxu8M6DVYTY7GVyrVeeLseH4 Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Nov 2020 15:28:20 -0800 IronPort-SDR: o+e0cQrqoKkmr/p3NqPK4aMCseyJo4zcUdCzJ6MQus6K22rZa9tBMf8egN9GbxtfE83ZyoC9QG eWSp41xvpj4DrfumZj3qU8JVNW7xyJpJvhE7jQJkhLUCVrHn/0w0vn3IUx04O7lWYRA0uKnUUZ dkLm8CmjcfMgACde3P7FBl+8hFcwclAa+EZ+4GHeGlQg9Ety3pZYtE7jV0Umu0wsHcjFo/Kobo vMlkk7DLJ3/zyqtwzDxkC+S3kpSTqOUWyD+zdl5oKWGKY9vjElSGDIWErqzP6NOepZ+sEHG8Wx euk= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip01.wdc.com with ESMTP; 06 Nov 2020 15:43:25 -0800 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Max Reitz , Maxim Levitsky , Fam Zheng Subject: [PATCH v10 06/12] hw/block/nvme: Support allocated CNS command variants Date: Sat, 7 Nov 2020 08:42:59 +0900 Message-Id: <20201106234305.21339-7-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201106234305.21339-1-dmitry.fomichev@wdc.com> References: <20201106234305.21339-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.154.45; envelope-from=prvs=572b21b8d=dmitry.fomichev@wdc.com; helo=esa6.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/11/06 18:43:12 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: Niklas Cassel Many CNS commands have "allocated" command variants. These include a namespace as long as it is allocated, that is a namespace is included regardless if it is active (attached) or not. While these commands are optional (they are mandatory for controllers supporting the namespace attachment command), our QEMU implementation is more complete by actually providing support for these CNS values. However, since our QEMU model currently does not support the namespace attachment command, these new allocated CNS commands will return the same result as the active CNS command variants. In NVMe, a namespace is active if it exists and is attached to the controller. Add a new Boolean namespace flag, "attached", to provide the most basic namespace attachment support. The default value for this new flag is true. Also, implement the logic in the new CNS values to include/exclude namespaces based on this new property. The only thing missing is hooking up the actual Namespace Attachment command opcode, which will allow a user to toggle the "attached" flag per namespace. The reason for not hooking up this command completely is because the NVMe specification requires the namespace management command to be supported if the namespace attachment command is supported. Signed-off-by: Niklas Cassel Signed-off-by: Dmitry Fomichev Reviewed-by: Keith Busch --- hw/block/nvme-ns.h | 1 + include/block/nvme.h | 20 +++++++++++-------- hw/block/nvme-ns.c | 1 + hw/block/nvme.c | 46 +++++++++++++++++++++++++++++++++----------- 4 files changed, 49 insertions(+), 19 deletions(-) diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index d795e44bab..2d9cd29d07 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -31,6 +31,7 @@ typedef struct NvmeNamespace { int64_t size; NvmeIdNs id_ns; const uint32_t *iocs; + bool attached; uint8_t csi; NvmeNamespaceParams params; diff --git a/include/block/nvme.h b/include/block/nvme.h index af23514713..394db19022 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -806,14 +806,18 @@ typedef struct QEMU_PACKED NvmePSD { #define NVME_IDENTIFY_DATA_SIZE 4096 enum NvmeIdCns { - NVME_ID_CNS_NS = 0x00, - NVME_ID_CNS_CTRL = 0x01, - NVME_ID_CNS_NS_ACTIVE_LIST = 0x02, - NVME_ID_CNS_NS_DESCR_LIST = 0x03, - NVME_ID_CNS_CS_NS = 0x05, - NVME_ID_CNS_CS_CTRL = 0x06, - NVME_ID_CNS_CS_NS_ACTIVE_LIST = 0x07, - NVME_ID_CNS_IO_COMMAND_SET = 0x1c, + NVME_ID_CNS_NS = 0x00, + NVME_ID_CNS_CTRL = 0x01, + NVME_ID_CNS_NS_ACTIVE_LIST = 0x02, + NVME_ID_CNS_NS_DESCR_LIST = 0x03, + NVME_ID_CNS_CS_NS = 0x05, + NVME_ID_CNS_CS_CTRL = 0x06, + NVME_ID_CNS_CS_NS_ACTIVE_LIST = 0x07, + NVME_ID_CNS_NS_PRESENT_LIST = 0x10, + NVME_ID_CNS_NS_PRESENT = 0x11, + NVME_ID_CNS_CS_NS_PRESENT_LIST = 0x1a, + NVME_ID_CNS_CS_NS_PRESENT = 0x1b, + NVME_ID_CNS_IO_COMMAND_SET = 0x1c, }; typedef struct QEMU_PACKED NvmeIdCtrl { diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index c0362426cc..e191ef9be0 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -42,6 +42,7 @@ static void nvme_ns_init(NvmeNamespace *ns) id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(ns)); ns->csi = NVME_CSI_NVM; + ns->attached = true; /* no thin provisioning */ id_ns->ncap = id_ns->nsze; diff --git a/hw/block/nvme.c b/hw/block/nvme.c index bb82dd9975..7495cdb5ef 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1236,6 +1236,7 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len, uint32_t trans_len; NvmeNamespace *ns; time_t current_ms; + int i; if (off >= sizeof(smart)) { return NVME_INVALID_FIELD | NVME_DNR; @@ -1246,10 +1247,7 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len, if (!ns) { return NVME_INVALID_NSID | NVME_DNR; } - nvme_set_blk_stats(ns, &stats); } else { - int i; - for (i = 1; i <= n->num_namespaces; i++) { ns = nvme_ns(n, i); if (!ns) { @@ -1552,7 +1550,8 @@ static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } -static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req, + bool only_active) { NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; @@ -1569,11 +1568,16 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req) return nvme_rpt_empty_id_struct(n, req); } + if (only_active && !ns->attached) { + return nvme_rpt_empty_id_struct(n, req); + } + return nvme_dma(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), DMA_DIRECTION_FROM_DEVICE, req); } -static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req, + bool only_active) { NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; @@ -1590,6 +1594,10 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req) return nvme_rpt_empty_id_struct(n, req); } + if (only_active && !ns->attached) { + return nvme_rpt_empty_id_struct(n, req); + } + if (c->csi == NVME_CSI_NVM) { return nvme_rpt_empty_id_struct(n, req); } @@ -1597,7 +1605,8 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } -static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req, + bool only_active) { NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; @@ -1627,6 +1636,9 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) if (ns->params.nsid <= min_nsid) { continue; } + if (only_active && !ns->attached) { + continue; + } list_ptr[j++] = cpu_to_le32(ns->params.nsid); if (j == data_len / sizeof(uint32_t)) { break; @@ -1636,7 +1648,8 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req); } -static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req, + bool only_active) { NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; @@ -1667,6 +1680,9 @@ static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req) if (ns->params.nsid <= min_nsid) { continue; } + if (only_active && !ns->attached) { + continue; + } list_ptr[j++] = cpu_to_le32(ns->params.nsid); if (j == data_len / sizeof(uint32_t)) { break; @@ -1740,17 +1756,25 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req) switch (le32_to_cpu(c->cns)) { case NVME_ID_CNS_NS: - return nvme_identify_ns(n, req); + /* fall through */ + case NVME_ID_CNS_NS_PRESENT: + return nvme_identify_ns(n, req, true); case NVME_ID_CNS_CS_NS: - return nvme_identify_ns_csi(n, req); + /* fall through */ + case NVME_ID_CNS_CS_NS_PRESENT: + return nvme_identify_ns_csi(n, req, true); case NVME_ID_CNS_CTRL: return nvme_identify_ctrl(n, req); case NVME_ID_CNS_CS_CTRL: return nvme_identify_ctrl_csi(n, req); case NVME_ID_CNS_NS_ACTIVE_LIST: - return nvme_identify_nslist(n, req); + /* fall through */ + case NVME_ID_CNS_NS_PRESENT_LIST: + return nvme_identify_nslist(n, req, true); case NVME_ID_CNS_CS_NS_ACTIVE_LIST: - return nvme_identify_nslist_csi(n, req); + /* fall through */ + case NVME_ID_CNS_CS_NS_PRESENT_LIST: + return nvme_identify_nslist_csi(n, req, true); case NVME_ID_CNS_NS_DESCR_LIST: return nvme_identify_ns_descr_list(n, req); case NVME_ID_CNS_IO_COMMAND_SET: From patchwork Fri Nov 6 23:43:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 322410 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.5 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CA674C2D0A3 for ; Fri, 6 Nov 2020 23:54:53 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 361F1206FC for ; Fri, 6 Nov 2020 23:54:53 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="RiAy3J0c" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 361F1206FC Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:56556 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kbBZE-0001M4-9H for qemu-devel@archiver.kernel.org; Fri, 06 Nov 2020 18:54:52 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:34282) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kbBOI-0005SA-CI; Fri, 06 Nov 2020 18:43:34 -0500 Received: from esa6.hgst.iphmx.com ([216.71.154.45]:57360) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kbBOG-0003Tq-BK; Fri, 06 Nov 2020 18:43:34 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1604706213; x=1636242213; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=g063222WlgUGzbpsQdE8ESMz45RVE42CPVFav5X88Us=; b=RiAy3J0c0dC7az2FYMAGsZTX/OKGaV0IrvONIGAsMtfJTbNFtvOd0uPR 8VRNl2DKzi0ND3cGxVZvStcwRANMGoPV2DmZeNPOb0LSkMtOdGMKsEWVj HI3QJXBQtlJ7krddekixobnCHWv25A18KPIMN9y70nMw6XJZN9ZpWNcCN 6lxNohEeP+xp+84b1W5LVt2zsSji+uEp+C2F09MJJ/VErBju5E/J+Ploy ohIz5Z7E+qRBlyRyXmXB4p7RpgCcWAsEtfb1MaiRuT98sml9kqn1qvMWf 1fsQeodnVsRblf2ziqgPde2VUfzLuPlsUd/hsF8FZzimMQhGxzP7MjzXg A==; IronPort-SDR: ewtCoAUuunRnAMMQ2l9Ozj8HLFi3ZuScEqUWOXZcQO3Mh/pddoae8w00SDad3UC6j/sTbwL+gJ 4jtyUOsyaMXw1rk+9d79h3uO2bezqSi7FI2p3OUClZO338NiMJdmE9bOfahInF1Ij4A77Jmgpk NjLD2M0ACe1T82F4B9c5qS1LqhN4lgLHUUfTNjHdMoRQsQW67YPIp8EwaqbAlHt7etmuFH1mNi JvfxRVVdlE3E3dfUKYRDJymumnRzX5Pkdf/L616Gky9jMV4feSx6wZ+6Hsy+Y0Q+WFTZ7o1qqC aQY= X-IronPort-AV: E=Sophos;i="5.77,457,1596470400"; d="scan'208";a="153267071" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 07 Nov 2020 07:43:31 +0800 IronPort-SDR: 55o6HdOnW4Vv/pQrCMmRc0LSt0bLVUfySDVvDgOtT9UjCjNaBvIZDdA0Ij/1QbCRuvxsWJI+Et Yj7Z4vPFRUXwq39iY4tNQVqvetlEM/y+v6fqvhu9oZFUCZMULQEm4DkvnFyi6OXPN7fkN1D+fp K2O4sG/vcHrSPybIowgxR+ONfvulnDc8v/ZNMDpHbXXRRKd9dkZcO5843vbLw3U1TmdZI6qspg f1Aiobw9+3Mupsw5mFKsj+ErDdebDiis3XvmnPfuddP47CBG+qsCYrN69/sZpO7m+sS8+aB+3R Ymg4EqwTM9dN7rwZWR+jt1gK Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Nov 2020 15:28:22 -0800 IronPort-SDR: wxDKU7jQOtvn0suT9acN7pJP29WOvGxiLY7MrNbCLrmM1WzN72iJAlL/qs2IpuAIIbKxOiyULo TB5PSdpM/4JkcQvt4sbLn3tUaYimpVZPkp8rmPSJEuv25yTS3V/NrC2goEpXr9XtjrkT6PuNWo l0szem1Dl8kSNR9ECIIcRFq932ce6DtEdWarIuv1zIdOVBj4pPV1wmnsyrMdyN6n2srs3jWgTU VUovQpzraIsvvRSYXFnkOpZ8iqwG648MYsAlUkIpmwoPq8eGO91uERpKWtI1hpbciBYFH1vFAz cEQ= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip01.wdc.com with ESMTP; 06 Nov 2020 15:43:28 -0800 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Max Reitz , Maxim Levitsky , Fam Zheng Subject: [PATCH v10 07/12] block/nvme: Make ZNS-related definitions Date: Sat, 7 Nov 2020 08:43:00 +0900 Message-Id: <20201106234305.21339-8-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201106234305.21339-1-dmitry.fomichev@wdc.com> References: <20201106234305.21339-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.154.45; envelope-from=prvs=572b21b8d=dmitry.fomichev@wdc.com; helo=esa6.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/11/06 18:43:12 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Define values and structures that are needed to support Zoned Namespace Command Set (NVMe TP 4053). Signed-off-by: Dmitry Fomichev --- include/block/nvme.h | 114 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 113 insertions(+), 1 deletion(-) diff --git a/include/block/nvme.h b/include/block/nvme.h index 394db19022..752623b4f9 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -489,6 +489,9 @@ enum NvmeIoCommands { NVME_CMD_COMPARE = 0x05, NVME_CMD_WRITE_ZEROES = 0x08, NVME_CMD_DSM = 0x09, + NVME_CMD_ZONE_MGMT_SEND = 0x79, + NVME_CMD_ZONE_MGMT_RECV = 0x7a, + NVME_CMD_ZONE_APPEND = 0x7d, }; typedef struct QEMU_PACKED NvmeDeleteQ { @@ -648,9 +651,13 @@ typedef struct QEMU_PACKED NvmeAerResult { uint8_t resv; } NvmeAerResult; +typedef struct QEMU_PACKED NvmeZonedResult { + uint64_t slba; +} NvmeZonedResult; + typedef struct QEMU_PACKED NvmeCqe { uint32_t result; - uint32_t rsvd; + uint32_t dw1; uint16_t sq_head; uint16_t sq_id; uint16_t cid; @@ -679,6 +686,7 @@ enum NvmeStatusCodes { NVME_INVALID_USE_OF_CMB = 0x0012, NVME_INVALID_PRP_OFFSET = 0x0013, NVME_CMD_SET_CMB_REJECTED = 0x002b, + NVME_INVALID_CMD_SET = 0x002c, NVME_LBA_RANGE = 0x0080, NVME_CAP_EXCEEDED = 0x0081, NVME_NS_NOT_READY = 0x0082, @@ -703,6 +711,14 @@ enum NvmeStatusCodes { NVME_CONFLICTING_ATTRS = 0x0180, NVME_INVALID_PROT_INFO = 0x0181, NVME_WRITE_TO_RO = 0x0182, + NVME_ZONE_BOUNDARY_ERROR = 0x01b8, + NVME_ZONE_FULL = 0x01b9, + NVME_ZONE_READ_ONLY = 0x01ba, + NVME_ZONE_OFFLINE = 0x01bb, + NVME_ZONE_INVALID_WRITE = 0x01bc, + NVME_ZONE_TOO_MANY_ACTIVE = 0x01bd, + NVME_ZONE_TOO_MANY_OPEN = 0x01be, + NVME_ZONE_INVAL_TRANSITION = 0x01bf, NVME_WRITE_FAULT = 0x0280, NVME_UNRECOVERED_READ = 0x0281, NVME_E2E_GUARD_ERROR = 0x0282, @@ -887,6 +903,11 @@ typedef struct QEMU_PACKED NvmeIdCtrl { uint8_t vs[1024]; } NvmeIdCtrl; +typedef struct NvmeIdCtrlZoned { + uint8_t zasl; + uint8_t rsvd1[4095]; +} NvmeIdCtrlZoned; + enum NvmeIdCtrlOacs { NVME_OACS_SECURITY = 1 << 0, NVME_OACS_FORMAT = 1 << 1, @@ -1012,6 +1033,12 @@ typedef struct QEMU_PACKED NvmeLBAF { uint8_t rp; } NvmeLBAF; +typedef struct QEMU_PACKED NvmeLBAFE { + uint64_t zsze; + uint8_t zdes; + uint8_t rsvd9[7]; +} NvmeLBAFE; + #define NVME_NSID_BROADCAST 0xffffffff typedef struct QEMU_PACKED NvmeIdNs { @@ -1066,10 +1093,24 @@ enum NvmeNsIdentifierType { enum NvmeCsi { NVME_CSI_NVM = 0x00, + NVME_CSI_ZONED = 0x02, }; #define NVME_SET_CSI(vec, csi) (vec |= (uint8_t)(1 << (csi))) +typedef struct QEMU_PACKED NvmeIdNsZoned { + uint16_t zoc; + uint16_t ozcs; + uint32_t mar; + uint32_t mor; + uint32_t rrl; + uint32_t frl; + uint8_t rsvd20[2796]; + NvmeLBAFE lbafe[16]; + uint8_t rsvd3072[768]; + uint8_t vs[256]; +} NvmeIdNsZoned; + /*Deallocate Logical Block Features*/ #define NVME_ID_NS_DLFEAT_GUARD_CRC(dlfeat) ((dlfeat) & 0x10) #define NVME_ID_NS_DLFEAT_WRITE_ZEROES(dlfeat) ((dlfeat) & 0x08) @@ -1101,10 +1142,76 @@ enum NvmeIdNsDps { DPS_FIRST_EIGHT = 8, }; +enum NvmeZoneAttr { + NVME_ZA_FINISHED_BY_CTLR = 1 << 0, + NVME_ZA_FINISH_RECOMMENDED = 1 << 1, + NVME_ZA_RESET_RECOMMENDED = 1 << 2, + NVME_ZA_ZD_EXT_VALID = 1 << 7, +}; + +typedef struct QEMU_PACKED NvmeZoneReportHeader { + uint64_t nr_zones; + uint8_t rsvd[56]; +} NvmeZoneReportHeader; + +enum NvmeZoneReceiveAction { + NVME_ZONE_REPORT = 0, + NVME_ZONE_REPORT_EXTENDED = 1, +}; + +enum NvmeZoneReportType { + NVME_ZONE_REPORT_ALL = 0, + NVME_ZONE_REPORT_EMPTY = 1, + NVME_ZONE_REPORT_IMPLICITLY_OPEN = 2, + NVME_ZONE_REPORT_EXPLICITLY_OPEN = 3, + NVME_ZONE_REPORT_CLOSED = 4, + NVME_ZONE_REPORT_FULL = 5, + NVME_ZONE_REPORT_READ_ONLY = 6, + NVME_ZONE_REPORT_OFFLINE = 7, +}; + +enum NvmeZoneType { + NVME_ZONE_TYPE_RESERVED = 0x00, + NVME_ZONE_TYPE_SEQ_WRITE = 0x02, +}; + +enum NvmeZoneSendAction { + NVME_ZONE_ACTION_RSD = 0x00, + NVME_ZONE_ACTION_CLOSE = 0x01, + NVME_ZONE_ACTION_FINISH = 0x02, + NVME_ZONE_ACTION_OPEN = 0x03, + NVME_ZONE_ACTION_RESET = 0x04, + NVME_ZONE_ACTION_OFFLINE = 0x05, + NVME_ZONE_ACTION_SET_ZD_EXT = 0x10, +}; + +typedef struct QEMU_PACKED NvmeZoneDescr { + uint8_t zt; + uint8_t zs; + uint8_t za; + uint8_t rsvd3[5]; + uint64_t zcap; + uint64_t zslba; + uint64_t wp; + uint8_t rsvd32[32]; +} NvmeZoneDescr; + +enum NvmeZoneState { + NVME_ZONE_STATE_RESERVED = 0x00, + NVME_ZONE_STATE_EMPTY = 0x01, + NVME_ZONE_STATE_IMPLICITLY_OPEN = 0x02, + NVME_ZONE_STATE_EXPLICITLY_OPEN = 0x03, + NVME_ZONE_STATE_CLOSED = 0x04, + NVME_ZONE_STATE_READ_ONLY = 0x0D, + NVME_ZONE_STATE_FULL = 0x0E, + NVME_ZONE_STATE_OFFLINE = 0x0F, +}; + static inline void _nvme_check_size(void) { QEMU_BUILD_BUG_ON(sizeof(NvmeBar) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeAerResult) != 4); + QEMU_BUILD_BUG_ON(sizeof(NvmeZonedResult) != 8); QEMU_BUILD_BUG_ON(sizeof(NvmeCqe) != 16); QEMU_BUILD_BUG_ON(sizeof(NvmeDsmRange) != 16); QEMU_BUILD_BUG_ON(sizeof(NvmeCmd) != 64); @@ -1120,8 +1227,13 @@ static inline void _nvme_check_size(void) QEMU_BUILD_BUG_ON(sizeof(NvmeSmartLog) != 512); QEMU_BUILD_BUG_ON(sizeof(NvmeEffectsLog) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrl) != 4096); + QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrlZoned) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsDescr) != 4); + QEMU_BUILD_BUG_ON(sizeof(NvmeLBAF) != 4); + QEMU_BUILD_BUG_ON(sizeof(NvmeLBAFE) != 16); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNs) != 4096); + QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsZoned) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeSglDescriptor) != 16); + QEMU_BUILD_BUG_ON(sizeof(NvmeZoneDescr) != 64); } #endif From patchwork Fri Nov 6 23:43:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 322409 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.5 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E7C4FC2D0A3 for ; Fri, 6 Nov 2020 23:56:59 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0BCD9206FC for ; Fri, 6 Nov 2020 23:56:58 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="o73Cmhcm" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0BCD9206FC Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:60166 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kbBbF-00030Z-Td for qemu-devel@archiver.kernel.org; Fri, 06 Nov 2020 18:56:57 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:34320) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kbBON-0005hX-Ku; Fri, 06 Nov 2020 18:43:39 -0500 Received: from esa6.hgst.iphmx.com ([216.71.154.45]:57360) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kbBOJ-0003Tq-1M; Fri, 06 Nov 2020 18:43:39 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1604706215; x=1636242215; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=hXh5KueJNF5HWuYGtdYQ9Q1DbHpFoGs6JEU/YA2VpWA=; b=o73CmhcmxpAjP1mLpKik42P1jM9VM0+V7yU+KPMwL4JMqNlFNeE7Fah8 OOYQeXC1oljGY3qi31VLTyBH/74etqkArQ2z7pPPUIQS4MrKIXIOPp6h5 l34pIwCEItbCwHJ02XWr1XSWaFbiXPkjLmfJufc+d6QolY/iLvkdM+8ef b4zz4JZqvuxlCkCRFyHTOI3bVZfyQDvziXMPVZ33mluqPPwbirZkAISw7 nAxlr2jb3KYHHfdFQ+uo0FAeuKmOQ+afmySKqLcVO35b4xGjTWC69a+T7 QTFIZINiGPuutLU2m9PPtbzZJcTLzQL+Bg8OX9TXG6F/HNnIARaiVLDbh w==; IronPort-SDR: zy5FxdO+H+XnbUd+1fLMemHwGF8ULWO8DHHlAXVXPRfuscbxBraTDR0Dq/9rcDnUt0LWgj0pAq d9RQ94GIJJ26oBlpu0WnQKLSrV0p507rVPbm6bUZ0FySXWSN4sSEIjJBAK+OD9gRsrjqXDgoZ4 yHdbU8zMXe+elf5BIMcrpnsu6BabEZ1ddQ21yLa5Y+FasqCBHjZvIFo2XTBjSjWFsvdJxBYy3O D4tFxct4GNWKIg+ms0zf5UdfoZSXbeQ0uaASZexuGw3NTIxJJqRu+wvDpgCyIYTOp3mAsE5nJR xFM= X-IronPort-AV: E=Sophos;i="5.77,457,1596470400"; d="scan'208";a="153267072" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 07 Nov 2020 07:43:34 +0800 IronPort-SDR: aqaozhyn+1/wqTGYHx3SewvILlznwswBoM8hg6rKBCGEvkESor5ltOyzuYDesKxO1GgYbCMsXG vvUeG/1n3GTr5FEgY7y8JRX6l19bRQXEFeZM+aHa7rTPyg6vqykMH8yfg5JRpnDNVUDeH7NKuY E44V6pg5ncZiIBquPnIFivevNsA8jDoMVAFYUdpA3XbgnAQiWDGARRZiqwW+3aP1DTRbnuoIGE ++v2ced/ffoZCOgltoK2QPdQVal+KRUXLU0jP1AQmQEYmsrC8WXzH6TTYB4N71oSVL7Xv6Bq27 s3G8Pd6pmv3DbyliFRSWCZSn Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Nov 2020 15:28:25 -0800 IronPort-SDR: JjaGHv/ASmDRjvL1kowI9f060/+AJ7j5Y+QQqplKvWahclXq9Aw+ojhjHw6uma86WBXoRt+gxt uTPTXAM1sfK3TErocBKDbUQG4e+cCif85Z6nQkw9hjSsSeRv6plW9c6A7kJpINHal7vFxEd2Kn GhnNrdkQPJpSxhT1D/qgCBU1UbD92qz4mZJCsoIFhgH9H+Slmco9ZFhKnWvwBW1jWmubwODdW3 e3bkRA7orEK8IFPfwGBrvykxBrmDu0IYIT5cASgRb98EP0ZMgwOJqP8+HqVg+zfIQxazHAo99I /Ls= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip01.wdc.com with ESMTP; 06 Nov 2020 15:43:31 -0800 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Max Reitz , Maxim Levitsky , Fam Zheng Subject: [PATCH v10 08/12] hw/block/nvme: Support Zoned Namespace Command Set Date: Sat, 7 Nov 2020 08:43:01 +0900 Message-Id: <20201106234305.21339-9-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201106234305.21339-1-dmitry.fomichev@wdc.com> References: <20201106234305.21339-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.154.45; envelope-from=prvs=572b21b8d=dmitry.fomichev@wdc.com; helo=esa6.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/11/06 18:43:12 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" The emulation code has been changed to advertise NVM Command Set when "zoned" device property is not set (default) and Zoned Namespace Command Set otherwise. Define values and structures that are needed to support Zoned Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator. Define trace events where needed in newly introduced code. In order to improve scalability, all open, closed and full zones are organized in separate linked lists. Consequently, almost all zone operations don't require scanning of the entire zone array (which potentially can be quite large) - it is only necessary to enumerate one or more zone lists. Handlers for three new NVMe commands introduced in Zoned Namespace Command Set specification are added, namely for Zone Management Receive, Zone Management Send and Zone Append. Device initialization code has been extended to create a proper configuration for zoned operation using device properties. Read/Write command handler is modified to only allow writes at the write pointer if the namespace is zoned. For Zone Append command, writes implicitly happen at the write pointer and the starting write pointer value is returned as the result of the command. Write Zeroes handler is modified to add zoned checks that are identical to those done as a part of Write flow. Subsequent commits in this series add ZDE support and checks for active and open zone limits. Signed-off-by: Niklas Cassel Signed-off-by: Hans Holmberg Signed-off-by: Ajay Joshi Signed-off-by: Chaitanya Kulkarni Signed-off-by: Matias Bjorling Signed-off-by: Aravind Ramesh Signed-off-by: Shin'ichiro Kawasaki Signed-off-by: Adam Manzanares Signed-off-by: Dmitry Fomichev Reviewed-by: Niklas Cassel --- hw/block/nvme-ns.h | 54 +++ hw/block/nvme.h | 8 + hw/block/nvme-ns.c | 173 ++++++++ hw/block/nvme.c | 972 +++++++++++++++++++++++++++++++++++++++++- hw/block/trace-events | 18 +- 5 files changed, 1210 insertions(+), 15 deletions(-) diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 2d9cd29d07..d2631ff5a3 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -19,9 +19,20 @@ #define NVME_NS(obj) \ OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS) +typedef struct NvmeZone { + NvmeZoneDescr d; + uint64_t w_ptr; + QTAILQ_ENTRY(NvmeZone) entry; +} NvmeZone; + typedef struct NvmeNamespaceParams { uint32_t nsid; QemuUUID uuid; + + bool zoned; + bool cross_zone_read; + uint64_t zone_size_bs; + uint64_t zone_cap_bs; } NvmeNamespaceParams; typedef struct NvmeNamespace { @@ -34,6 +45,18 @@ typedef struct NvmeNamespace { bool attached; uint8_t csi; + NvmeIdNsZoned *id_ns_zoned; + NvmeZone *zone_array; + QTAILQ_HEAD(, NvmeZone) exp_open_zones; + QTAILQ_HEAD(, NvmeZone) imp_open_zones; + QTAILQ_HEAD(, NvmeZone) closed_zones; + QTAILQ_HEAD(, NvmeZone) full_zones; + uint32_t num_zones; + uint64_t zone_size; + uint64_t zone_capacity; + uint64_t zone_array_size; + uint32_t zone_size_log2; + NvmeNamespaceParams params; } NvmeNamespace; @@ -71,8 +94,39 @@ static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t lba) typedef struct NvmeCtrl NvmeCtrl; +static inline uint8_t nvme_get_zone_state(NvmeZone *zone) +{ + return zone->d.zs >> 4; +} + +static inline void nvme_set_zone_state(NvmeZone *zone, enum NvmeZoneState state) +{ + zone->d.zs = state << 4; +} + +static inline uint64_t nvme_zone_rd_boundary(NvmeNamespace *ns, NvmeZone *zone) +{ + return zone->d.zslba + ns->zone_size; +} + +static inline uint64_t nvme_zone_wr_boundary(NvmeZone *zone) +{ + return zone->d.zslba + zone->d.zcap; +} + +static inline bool nvme_wp_is_valid(NvmeZone *zone) +{ + uint8_t st = nvme_get_zone_state(zone); + + return st != NVME_ZONE_STATE_FULL && + st != NVME_ZONE_STATE_READ_ONLY && + st != NVME_ZONE_STATE_OFFLINE; +} + int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp); void nvme_ns_drain(NvmeNamespace *ns); void nvme_ns_flush(NvmeNamespace *ns); +void nvme_ns_shutdown(NvmeNamespace *ns); +void nvme_ns_cleanup(NvmeNamespace *ns); #endif /* NVME_NS_H */ diff --git a/hw/block/nvme.h b/hw/block/nvme.h index e080a2318a..4cb0615128 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -6,6 +6,9 @@ #define NVME_MAX_NAMESPACES 256 +#define NVME_DEFAULT_ZONE_SIZE (128 * MiB) +#define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB) + typedef struct NvmeParams { char *serial; uint32_t num_queues; /* deprecated since 5.1 */ @@ -16,6 +19,7 @@ typedef struct NvmeParams { uint32_t aer_max_queued; uint8_t mdts; bool use_intel_id; + uint32_t zasl_bs; } NvmeParams; typedef struct NvmeAsyncEvent { @@ -28,6 +32,8 @@ typedef struct NvmeRequest { struct NvmeNamespace *ns; BlockAIOCB *aiocb; uint16_t status; + uint64_t fill_off; + uint32_t fill_len; NvmeCqe cqe; NvmeCmd cmd; BlockAcctCookie acct; @@ -147,6 +153,8 @@ typedef struct NvmeCtrl { QTAILQ_HEAD(, NvmeAsyncEvent) aer_queue; int aer_queued; + uint8_t zasl; + NvmeNamespace namespace; NvmeNamespace *namespaces[NVME_MAX_NAMESPACES]; NvmeSQueue **sq; diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index e191ef9be0..e6db7f7d3b 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -25,6 +25,7 @@ #include "hw/qdev-properties.h" #include "hw/qdev-core.h" +#include "trace.h" #include "nvme.h" #include "nvme-ns.h" @@ -77,6 +78,151 @@ static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) return 0; } +static int nvme_calc_zone_geometry(NvmeNamespace *ns, Error **errp) +{ + uint64_t zone_size, zone_cap; + uint32_t nz, lbasz = ns->blkconf.logical_block_size; + + if (ns->params.zone_size_bs) { + zone_size = ns->params.zone_size_bs; + } else { + zone_size = NVME_DEFAULT_ZONE_SIZE; + } + if (ns->params.zone_cap_bs) { + zone_cap = ns->params.zone_cap_bs; + } else { + zone_cap = zone_size; + } + if (zone_cap > zone_size) { + error_setg(errp, "zone capacity %luB exceeds zone size %luB", + zone_cap, zone_size); + return -1; + } + if (zone_size < lbasz) { + error_setg(errp, "zone size %luB too small, must be at least %uB", + zone_size, lbasz); + return -1; + } + if (zone_cap < lbasz) { + error_setg(errp, "zone capacity %luB too small, must be at least %uB", + zone_cap, lbasz); + return -1; + } + ns->zone_size = zone_size / lbasz; + ns->zone_capacity = zone_cap / lbasz; + + nz = DIV_ROUND_UP(ns->size / lbasz, ns->zone_size); + ns->num_zones = nz; + ns->zone_array_size = sizeof(NvmeZone) * nz; + ns->zone_size_log2 = 0; + if (is_power_of_2(ns->zone_size)) { + ns->zone_size_log2 = 63 - clz64(ns->zone_size); + } + + return 0; +} + +static void nvme_init_zone_state(NvmeNamespace *ns) +{ + uint64_t start = 0, zone_size = ns->zone_size; + uint64_t capacity = ns->num_zones * zone_size; + NvmeZone *zone; + int i; + + ns->zone_array = g_malloc0(ns->zone_array_size); + + QTAILQ_INIT(&ns->exp_open_zones); + QTAILQ_INIT(&ns->imp_open_zones); + QTAILQ_INIT(&ns->closed_zones); + QTAILQ_INIT(&ns->full_zones); + + zone = ns->zone_array; + for (i = 0; i < ns->num_zones; i++, zone++) { + if (start + zone_size > capacity) { + zone_size = capacity - start; + } + zone->d.zt = NVME_ZONE_TYPE_SEQ_WRITE; + nvme_set_zone_state(zone, NVME_ZONE_STATE_EMPTY); + zone->d.za = 0; + zone->d.zcap = ns->zone_capacity; + zone->d.zslba = start; + zone->d.wp = start; + zone->w_ptr = start; + start += zone_size; + } +} + +static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace *ns, int lba_index, + Error **errp) +{ + NvmeIdNsZoned *id_ns_z; + + if (nvme_calc_zone_geometry(ns, errp) != 0) { + return -1; + } + + nvme_init_zone_state(ns); + + id_ns_z = g_malloc0(sizeof(NvmeIdNsZoned)); + + /* MAR/MOR are zeroes-based, 0xffffffff means no limit */ + id_ns_z->mar = 0xffffffff; + id_ns_z->mor = 0xffffffff; + id_ns_z->zoc = 0; + id_ns_z->ozcs = ns->params.cross_zone_read ? 0x01 : 0x00; + + id_ns_z->lbafe[lba_index].zsze = cpu_to_le64(ns->zone_size); + id_ns_z->lbafe[lba_index].zdes = 0; + + ns->csi = NVME_CSI_ZONED; + ns->id_ns.nsze = cpu_to_le64(ns->zone_size * ns->num_zones); + ns->id_ns.ncap = cpu_to_le64(ns->zone_capacity * ns->num_zones); + ns->id_ns.nuse = ns->id_ns.ncap; + + ns->id_ns_zoned = id_ns_z; + + return 0; +} + +static void nvme_clear_zone(NvmeNamespace *ns, NvmeZone *zone) +{ + uint8_t state; + + zone->w_ptr = zone->d.wp; + state = nvme_get_zone_state(zone); + if (zone->d.wp != zone->d.zslba) { + if (state != NVME_ZONE_STATE_CLOSED) { + trace_pci_nvme_clear_ns_close(state, zone->d.zslba); + nvme_set_zone_state(zone, NVME_ZONE_STATE_CLOSED); + } + QTAILQ_INSERT_HEAD(&ns->closed_zones, zone, entry); + } else { + trace_pci_nvme_clear_ns_reset(state, zone->d.zslba); + nvme_set_zone_state(zone, NVME_ZONE_STATE_EMPTY); + } +} + +/* + * Close all the zones that are currently open. + */ +static void nvme_zoned_ns_shutdown(NvmeNamespace *ns) +{ + NvmeZone *zone, *next; + + QTAILQ_FOREACH_SAFE(zone, &ns->closed_zones, entry, next) { + QTAILQ_REMOVE(&ns->closed_zones, zone, entry); + nvme_clear_zone(ns, zone); + } + QTAILQ_FOREACH_SAFE(zone, &ns->imp_open_zones, entry, next) { + QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry); + nvme_clear_zone(ns, zone); + } + QTAILQ_FOREACH_SAFE(zone, &ns->exp_open_zones, entry, next) { + QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry); + nvme_clear_zone(ns, zone); + } +} + static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp) { if (!ns->blkconf.blk) { @@ -98,6 +244,12 @@ int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp) } nvme_ns_init(ns); + if (ns->params.zoned) { + if (nvme_zoned_init_ns(n, ns, 0, errp) != 0) { + return -1; + } + } + if (nvme_register_namespace(n, ns, errp)) { return -1; } @@ -115,6 +267,21 @@ void nvme_ns_flush(NvmeNamespace *ns) blk_flush(ns->blkconf.blk); } +void nvme_ns_shutdown(NvmeNamespace *ns) +{ + if (ns->params.zoned) { + nvme_zoned_ns_shutdown(ns); + } +} + +void nvme_ns_cleanup(NvmeNamespace *ns) +{ + if (ns->params.zoned) { + g_free(ns->id_ns_zoned); + g_free(ns->zone_array); + } +} + static void nvme_ns_realize(DeviceState *dev, Error **errp) { NvmeNamespace *ns = NVME_NS(dev); @@ -133,6 +300,12 @@ static Property nvme_ns_props[] = { DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf), DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0), DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid), + DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false), + DEFINE_PROP_SIZE("zoned.zsze", NvmeNamespace, params.zone_size_bs, + NVME_DEFAULT_ZONE_SIZE), + DEFINE_PROP_SIZE("zoned.zcap", NvmeNamespace, params.zone_cap_bs, 0), + DEFINE_PROP_BOOL("zoned.cross_read", NvmeNamespace, + params.cross_zone_read, false), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 7495cdb5ef..f5390e6863 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -133,6 +133,16 @@ static const uint32_t nvme_cse_iocs_nvm[256] = { [NVME_CMD_READ] = NVME_CMD_EFF_CSUPP, }; +static const uint32_t nvme_cse_iocs_zoned[256] = { + [NVME_CMD_FLUSH] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, + [NVME_CMD_WRITE_ZEROES] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, + [NVME_CMD_WRITE] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, + [NVME_CMD_READ] = NVME_CMD_EFF_CSUPP, + [NVME_CMD_ZONE_APPEND] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, + [NVME_CMD_ZONE_MGMT_SEND] = NVME_CMD_EFF_CSUPP, + [NVME_CMD_ZONE_MGMT_RECV] = NVME_CMD_EFF_CSUPP, +}; + static void nvme_process_sq(void *opaque); static uint16_t nvme_cid(NvmeRequest *req) @@ -149,6 +159,46 @@ static uint16_t nvme_sqid(NvmeRequest *req) return le16_to_cpu(req->sq->sqid); } +static void nvme_assign_zone_state(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + if (QTAILQ_IN_USE(zone, entry)) { + switch (nvme_get_zone_state(zone)) { + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry); + break; + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry); + break; + case NVME_ZONE_STATE_CLOSED: + QTAILQ_REMOVE(&ns->closed_zones, zone, entry); + break; + case NVME_ZONE_STATE_FULL: + QTAILQ_REMOVE(&ns->full_zones, zone, entry); + } + } + + nvme_set_zone_state(zone, state); + + switch (state) { + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + QTAILQ_INSERT_TAIL(&ns->exp_open_zones, zone, entry); + break; + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + QTAILQ_INSERT_TAIL(&ns->imp_open_zones, zone, entry); + break; + case NVME_ZONE_STATE_CLOSED: + QTAILQ_INSERT_TAIL(&ns->closed_zones, zone, entry); + break; + case NVME_ZONE_STATE_FULL: + QTAILQ_INSERT_TAIL(&ns->full_zones, zone, entry); + case NVME_ZONE_STATE_READ_ONLY: + break; + default: + zone->d.za = 0; + } +} + static bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr) { hwaddr low = n->ctrl_mem.addr; @@ -900,6 +950,319 @@ static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns, return NVME_SUCCESS; } +static void nvme_fill_read_data(NvmeRequest *req, uint64_t offset, + uint32_t max_len) +{ + QEMUSGList *qsg = &req->qsg; + QEMUIOVector *iov = &req->iov; + ScatterGatherEntry *entry; + uint32_t len, ent_len; + + if (qsg->nsg > 0) { + entry = qsg->sg; + len = qsg->size; + if (max_len) { + len = MIN(len, max_len); + } + for (; len > 0; len -= ent_len) { + ent_len = MIN(len, entry->len); + if (offset > ent_len) { + offset -= ent_len; + } else if (offset != 0) { + dma_memory_set(qsg->as, entry->base + offset, + 0, ent_len - offset); + offset = 0; + } else { + dma_memory_set(qsg->as, entry->base, 0, ent_len); + } + entry++; + } + } else if (iov->iov) { + len = iov_size(iov->iov, iov->niov); + if (max_len) { + len = MIN(len, max_len); + } + qemu_iovec_memset(iov, offset, 0, len - offset); + } +} + +static inline uint32_t nvme_zone_idx(NvmeNamespace *ns, uint64_t slba) +{ + return ns->zone_size_log2 > 0 ? slba >> ns->zone_size_log2 : + slba / ns->zone_size; +} + +static inline NvmeZone *nvme_get_zone_by_slba(NvmeNamespace *ns, uint64_t slba) +{ + uint32_t zone_idx = nvme_zone_idx(ns, slba); + + assert(zone_idx < ns->num_zones); + return &ns->zone_array[zone_idx]; +} + +static uint16_t nvme_zone_state_ok_to_write(NvmeZone *zone) +{ + uint16_t status; + + switch (nvme_get_zone_state(zone)) { + case NVME_ZONE_STATE_EMPTY: + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + case NVME_ZONE_STATE_CLOSED: + status = NVME_SUCCESS; + break; + case NVME_ZONE_STATE_FULL: + status = NVME_ZONE_FULL; + break; + case NVME_ZONE_STATE_OFFLINE: + status = NVME_ZONE_OFFLINE; + break; + case NVME_ZONE_STATE_READ_ONLY: + status = NVME_ZONE_READ_ONLY; + break; + default: + assert(false); + } + + return status; +} + +static uint16_t nvme_check_zone_write(NvmeCtrl *n, NvmeNamespace *ns, + NvmeZone *zone, uint64_t slba, + uint32_t nlb, bool append) +{ + uint16_t status; + + if (unlikely((slba + nlb) > nvme_zone_wr_boundary(zone))) { + status = NVME_ZONE_BOUNDARY_ERROR; + } else { + status = nvme_zone_state_ok_to_write(zone); + } + + if (status != NVME_SUCCESS) { + trace_pci_nvme_err_zone_write_not_ok(slba, nlb, status); + } else { + assert(nvme_wp_is_valid(zone)); + if (append) { + if (unlikely(slba != zone->d.zslba)) { + trace_pci_nvme_err_append_not_at_start(slba, zone->d.zslba); + status = NVME_ZONE_INVALID_WRITE; + } + if (nvme_l2b(ns, nlb) > (n->page_size << n->zasl)) { + trace_pci_nvme_err_append_too_large(slba, nlb, n->zasl); + status = NVME_INVALID_FIELD; + } + } else if (unlikely(slba != zone->w_ptr)) { + trace_pci_nvme_err_write_not_at_wp(slba, zone->d.zslba, + zone->w_ptr); + status = NVME_ZONE_INVALID_WRITE; + } + } + + return status; +} + +static uint16_t nvme_zone_state_ok_to_read(NvmeZone *zone) +{ + uint16_t status; + + switch (nvme_get_zone_state(zone)) { + case NVME_ZONE_STATE_EMPTY: + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + case NVME_ZONE_STATE_FULL: + case NVME_ZONE_STATE_CLOSED: + case NVME_ZONE_STATE_READ_ONLY: + status = NVME_SUCCESS; + break; + case NVME_ZONE_STATE_OFFLINE: + status = NVME_ZONE_OFFLINE | NVME_DNR; + break; + default: + assert(false); + } + + return status; +} + +typedef struct NvmeReadFillCtx { + uint64_t pre_rd_fill_slba; + uint64_t read_slba; + uint64_t post_rd_fill_slba; + + uint32_t pre_rd_fill_nlb; + uint32_t read_nlb; + uint32_t post_rd_fill_nlb; +} NvmeReadFillCtx; + +static uint16_t nvme_check_zone_read(NvmeNamespace *ns, uint64_t slba, + uint32_t nlb, NvmeReadFillCtx *rfc) +{ + NvmeZone *zone = nvme_get_zone_by_slba(ns, slba); + NvmeZone *next_zone; + uint64_t bndry = nvme_zone_rd_boundary(ns, zone); + uint64_t end = slba + nlb, wp1, wp2; + uint16_t status; + + rfc->read_slba = slba; + rfc->read_nlb = nlb; + + status = nvme_zone_state_ok_to_read(zone); + if (status != NVME_SUCCESS) { + ; + } else if (likely(end <= bndry)) { + if (end > zone->w_ptr) { + wp1 = zone->w_ptr; + if (slba >= wp1) { + /* No i/o necessary, just fill */ + rfc->pre_rd_fill_slba = slba; + rfc->pre_rd_fill_nlb = nlb; + rfc->read_nlb = 0; + } else { + rfc->read_nlb = wp1 - slba; + rfc->post_rd_fill_slba = wp1; + rfc->post_rd_fill_nlb = nlb - rfc->read_nlb; + } + } + } else if (!ns->params.cross_zone_read) { + status = NVME_ZONE_BOUNDARY_ERROR; + } else { + /* + * Read across zone boundary - look at the next zone. + * Earlier bounds checks ensure that the current zone + * is not the last one. + */ + next_zone = zone + 1; + status = nvme_zone_state_ok_to_read(next_zone); + if (status != NVME_SUCCESS) { + ; + } else if (end > nvme_zone_rd_boundary(ns, next_zone)) { + /* + * As zone size is much larger than a typical maximum + * i/o size in real hardware, only allow the i/o range + * to span no more than one pair of zones. + */ + status = NVME_ZONE_BOUNDARY_ERROR; + } else { + wp1 = zone->w_ptr; + wp2 = next_zone->w_ptr; + if (wp2 == bndry) { + if (slba >= wp1) { + /* Again, no i/o necessary, just fill */ + rfc->pre_rd_fill_slba = slba; + rfc->pre_rd_fill_nlb = nlb; + rfc->read_nlb = 0; + } else { + rfc->read_nlb = wp1 - slba; + rfc->post_rd_fill_slba = wp1; + rfc->post_rd_fill_nlb = nlb - rfc->read_nlb; + } + } else if (slba < wp1) { + if (end > wp2) { + if (wp1 == bndry) { + rfc->post_rd_fill_slba = wp2; + rfc->post_rd_fill_nlb = end - wp2; + rfc->read_nlb = wp2 - slba; + } else { + rfc->pre_rd_fill_slba = wp2; + rfc->pre_rd_fill_nlb = end - wp2; + rfc->read_nlb = wp2 - slba; + rfc->post_rd_fill_slba = wp1; + rfc->post_rd_fill_nlb = bndry - wp1; + } + } else { + rfc->post_rd_fill_slba = wp1; + rfc->post_rd_fill_nlb = bndry - wp1; + } + } else { + if (end > wp2) { + rfc->pre_rd_fill_slba = slba; + rfc->pre_rd_fill_nlb = end - slba; + rfc->read_slba = bndry; + rfc->read_nlb = wp2 - bndry; + } else { + rfc->read_slba = bndry; + rfc->read_nlb = end - bndry; + rfc->post_rd_fill_slba = slba; + rfc->post_rd_fill_nlb = bndry - slba; + } + } + } + } + + return status; +} + +static bool nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req, + bool failed) +{ + NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; + NvmeZone *zone; + NvmeZonedResult *res = (NvmeZonedResult *)&req->cqe; + uint64_t slba, start_wp = res->slba; + uint32_t nlb; + + if (rw->opcode != NVME_CMD_WRITE && + rw->opcode != NVME_CMD_ZONE_APPEND && + rw->opcode != NVME_CMD_WRITE_ZEROES) { + return false; + } + + slba = le64_to_cpu(rw->slba); + nlb = le16_to_cpu(rw->nlb) + 1; + zone = nvme_get_zone_by_slba(ns, slba); + + if (!failed && zone->w_ptr < start_wp + nlb) { + /* + * A preceding queued write to the zone has failed, + * now this write is not at the WP, fail it too. + */ + failed = true; + } + + if (failed) { + res->slba = 0; + } else if (zone->w_ptr == nvme_zone_wr_boundary(zone)) { + switch (nvme_get_zone_state(zone)) { + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + case NVME_ZONE_STATE_CLOSED: + case NVME_ZONE_STATE_EMPTY: + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_FULL); + /* fall through */ + case NVME_ZONE_STATE_FULL: + break; + default: + assert(false); + } + zone->d.wp = zone->w_ptr; + } else { + zone->d.wp += nlb; + } + + return failed; +} + +static uint64_t nvme_advance_zone_wp(NvmeNamespace *ns, NvmeZone *zone, + uint32_t nlb) +{ + uint64_t result = zone->w_ptr; + uint8_t zs; + + zone->w_ptr += nlb; + + if (zone->w_ptr < nvme_zone_wr_boundary(zone)) { + zs = nvme_get_zone_state(zone); + switch (zs) { + case NVME_ZONE_STATE_EMPTY: + case NVME_ZONE_STATE_CLOSED: + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_IMPLICITLY_OPEN); + } + } + + return result; +} + static void nvme_rw_cb(void *opaque, int ret) { NvmeRequest *req = opaque; @@ -914,10 +1277,26 @@ static void nvme_rw_cb(void *opaque, int ret) trace_pci_nvme_rw_cb(nvme_cid(req), blk_name(blk)); if (!ret) { - block_acct_done(stats, acct); + if (ns->params.zoned) { + if (nvme_finalize_zoned_write(ns, req, false)) { + ret = EIO; + block_acct_failed(stats, acct); + req->status = NVME_ZONE_INVALID_WRITE; + } else if (req->fill_len) { + nvme_fill_read_data(req, req->fill_off, req->fill_len); + req->fill_len = 0; + } + } + if (!ret) { + block_acct_done(stats, acct); + } } else { uint16_t status; + if (ns->params.zoned) { + nvme_finalize_zoned_write(ns, req, true); + } + block_acct_failed(stats, acct); switch (req->cmd.opcode) { @@ -960,7 +1339,9 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req) uint64_t slba = le64_to_cpu(rw->slba); uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; uint64_t data_size = nvme_l2b(ns, nlb); - uint64_t data_offset; + uint64_t data_offset, fill_off; + uint32_t fill_len; + NvmeReadFillCtx rfc = {}; BlockBackend *blk = ns->blkconf.blk; uint16_t status; @@ -978,11 +1359,40 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req) goto invalid; } + if (ns->params.zoned) { + status = nvme_check_zone_read(ns, slba, nlb, &rfc); + if (status != NVME_SUCCESS) { + trace_pci_nvme_err_zone_read_not_ok(slba, nlb, status); + goto invalid; + } + } + status = nvme_map_dptr(n, data_size, req); if (status) { goto invalid; } + if (ns->params.zoned) { + if (rfc.pre_rd_fill_nlb) { + fill_off = nvme_l2b(ns, rfc.pre_rd_fill_slba - slba); + fill_len = nvme_l2b(ns, rfc.pre_rd_fill_nlb); + nvme_fill_read_data(req, fill_off, fill_len); + } + if (!rfc.read_nlb) { + /* No backend I/O necessary, only needed to fill the buffer */ + req->status = NVME_SUCCESS; + return NVME_SUCCESS; + } + if (rfc.post_rd_fill_nlb) { + req->fill_off = nvme_l2b(ns, rfc.post_rd_fill_slba - slba); + req->fill_len = nvme_l2b(ns, rfc.post_rd_fill_nlb); + } else { + req->fill_len = 0; + } + slba = rfc.read_slba; + data_size = nvme_l2b(ns, rfc.read_nlb); + } + data_offset = nvme_l2b(ns, slba); block_acct_start(blk_get_stats(blk), &req->acct, data_size, @@ -1001,7 +1411,7 @@ invalid: return status | NVME_DNR; } -static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool wrz) +static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool append, bool wrz) { NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; NvmeNamespace *ns = req->ns; @@ -1009,6 +1419,8 @@ static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool wrz) uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; uint64_t data_size = nvme_l2b(ns, nlb); uint64_t data_offset; + NvmeZone *zone; + NvmeZonedResult *res = (NvmeZonedResult *)&req->cqe; BlockBackend *blk = ns->blkconf.blk; uint16_t status; @@ -1029,6 +1441,25 @@ static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool wrz) goto invalid; } + if (ns->params.zoned) { + zone = nvme_get_zone_by_slba(ns, slba); + + status = nvme_check_zone_write(n, ns, zone, slba, nlb, append); + if (status != NVME_SUCCESS) { + goto invalid; + } + + if (append) { + slba = zone->w_ptr; + } + + res->slba = nvme_advance_zone_wp(ns, zone, nlb); + } else if (append) { + trace_pci_nvme_err_invalid_opc(rw->opcode); + status = NVME_INVALID_OPCODE; + goto invalid; + } + data_offset = nvme_l2b(ns, slba); if (!wrz) { @@ -1059,6 +1490,436 @@ invalid: return status | NVME_DNR; } +static uint16_t nvme_get_mgmt_zone_slba_idx(NvmeNamespace *ns, NvmeCmd *c, + uint64_t *slba, uint32_t *zone_idx) +{ + uint32_t dw10 = le32_to_cpu(c->cdw10); + uint32_t dw11 = le32_to_cpu(c->cdw11); + + if (!ns->params.zoned) { + trace_pci_nvme_err_invalid_opc(c->opcode); + return NVME_INVALID_OPCODE | NVME_DNR; + } + + *slba = ((uint64_t)dw11) << 32 | dw10; + if (unlikely(*slba >= ns->id_ns.nsze)) { + trace_pci_nvme_err_invalid_lba_range(*slba, 0, ns->id_ns.nsze); + *slba = 0; + return NVME_LBA_RANGE | NVME_DNR; + } + + *zone_idx = nvme_zone_idx(ns, *slba); + assert(*zone_idx < ns->num_zones); + + return NVME_SUCCESS; +} + +typedef uint16_t (*op_handler_t)(NvmeNamespace *, NvmeZone *, + uint8_t); + +enum NvmeZoneProcessingMask { + NVME_PROC_CURRENT_ZONE = 0, + NVME_PROC_IMP_OPEN_ZONES = 1 << 0, + NVME_PROC_EXP_OPEN_ZONES = 1 << 1, + NVME_PROC_CLOSED_ZONES = 1 << 2, + NVME_PROC_READ_ONLY_ZONES = 1 << 3, + NVME_PROC_FULL_ZONES = 1 << 4, +}; + +static uint16_t nvme_open_zone(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + switch (state) { + case NVME_ZONE_STATE_EMPTY: + case NVME_ZONE_STATE_CLOSED: + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_EXPLICITLY_OPEN); + /* fall through */ + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + return NVME_SUCCESS; + } + + return NVME_ZONE_INVAL_TRANSITION; +} + +static uint16_t nvme_close_zone(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + switch (state) { + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED); + /* fall through */ + case NVME_ZONE_STATE_CLOSED: + return NVME_SUCCESS; + } + + return NVME_ZONE_INVAL_TRANSITION; +} + +static uint16_t nvme_finish_zone(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + switch (state) { + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + case NVME_ZONE_STATE_CLOSED: + case NVME_ZONE_STATE_EMPTY: + zone->w_ptr = nvme_zone_wr_boundary(zone); + zone->d.wp = zone->w_ptr; + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_FULL); + /* fall through */ + case NVME_ZONE_STATE_FULL: + return NVME_SUCCESS; + } + + return NVME_ZONE_INVAL_TRANSITION; +} + +static uint16_t nvme_reset_zone(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + switch (state) { + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + case NVME_ZONE_STATE_CLOSED: + case NVME_ZONE_STATE_FULL: + zone->w_ptr = zone->d.zslba; + zone->d.wp = zone->w_ptr; + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_EMPTY); + /* fall through */ + case NVME_ZONE_STATE_EMPTY: + return NVME_SUCCESS; + } + + return NVME_ZONE_INVAL_TRANSITION; +} + +static uint16_t nvme_offline_zone(NvmeNamespace *ns, NvmeZone *zone, + uint8_t state) +{ + switch (state) { + case NVME_ZONE_STATE_READ_ONLY: + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_OFFLINE); + /* fall through */ + case NVME_ZONE_STATE_OFFLINE: + return NVME_SUCCESS; + } + + return NVME_ZONE_INVAL_TRANSITION; +} + +static uint16_t nvme_bulk_proc_zone(NvmeNamespace *ns, NvmeZone *zone, + enum NvmeZoneProcessingMask proc_mask, + op_handler_t op_hndlr) +{ + uint16_t status = NVME_SUCCESS; + uint8_t zs = nvme_get_zone_state(zone); + bool proc_zone = false; + + switch (zs) { + case NVME_ZONE_STATE_IMPLICITLY_OPEN: + proc_zone = proc_mask & NVME_PROC_IMP_OPEN_ZONES; + break; + case NVME_ZONE_STATE_EXPLICITLY_OPEN: + proc_zone = proc_mask & NVME_PROC_EXP_OPEN_ZONES; + break; + case NVME_ZONE_STATE_CLOSED: + proc_zone = proc_mask & NVME_PROC_CLOSED_ZONES; + break; + case NVME_ZONE_STATE_READ_ONLY: + proc_zone = proc_mask & NVME_PROC_READ_ONLY_ZONES; + break; + case NVME_ZONE_STATE_FULL: + proc_zone = proc_mask & NVME_PROC_FULL_ZONES; + } + + if (proc_zone) { + status = op_hndlr(ns, zone, zs); + } + + return status; +} + +static uint16_t nvme_do_zone_op(NvmeNamespace *ns, NvmeZone *zone, + enum NvmeZoneProcessingMask proc_mask, + op_handler_t op_hndlr) +{ + NvmeZone *next; + uint16_t status = NVME_SUCCESS; + int i; + + if (!proc_mask) { + status = op_hndlr(ns, zone, nvme_get_zone_state(zone)); + } else { + if (proc_mask & NVME_PROC_CLOSED_ZONES) { + QTAILQ_FOREACH_SAFE(zone, &ns->closed_zones, entry, next) { + status = nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr); + if (status != NVME_SUCCESS) { + goto out; + } + } + } + if (proc_mask & NVME_PROC_IMP_OPEN_ZONES) { + QTAILQ_FOREACH_SAFE(zone, &ns->imp_open_zones, entry, next) { + status = nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr); + if (status != NVME_SUCCESS) { + goto out; + } + } + } + if (proc_mask & NVME_PROC_EXP_OPEN_ZONES) { + QTAILQ_FOREACH_SAFE(zone, &ns->exp_open_zones, entry, next) { + status = nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr); + if (status != NVME_SUCCESS) { + goto out; + } + } + } + if (proc_mask & NVME_PROC_FULL_ZONES) { + QTAILQ_FOREACH_SAFE(zone, &ns->full_zones, entry, next) { + status = nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr); + if (status != NVME_SUCCESS) { + goto out; + } + } + } + + if (proc_mask & NVME_PROC_READ_ONLY_ZONES) { + for (i = 0; i < ns->num_zones; i++, zone++) { + status = nvme_bulk_proc_zone(ns, zone, proc_mask, op_hndlr); + if (status != NVME_SUCCESS) { + goto out; + } + } + } + } + +out: + return status; +} + +static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeCmd *cmd = (NvmeCmd *)&req->cmd; + NvmeNamespace *ns = req->ns; + NvmeZone *zone; + uint32_t dw13 = le32_to_cpu(cmd->cdw13); + uint64_t slba = 0; + uint32_t zone_idx = 0; + uint16_t status; + uint8_t action; + bool all; + enum NvmeZoneProcessingMask proc_mask = NVME_PROC_CURRENT_ZONE; + + action = dw13 & 0xff; + all = dw13 & 0x100; + + req->status = NVME_SUCCESS; + + if (!all) { + status = nvme_get_mgmt_zone_slba_idx(ns, cmd, &slba, &zone_idx); + if (status) { + return status; + } + } + + zone = &ns->zone_array[zone_idx]; + if (slba != zone->d.zslba) { + trace_pci_nvme_err_unaligned_zone_cmd(action, slba, zone->d.zslba); + return NVME_INVALID_FIELD | NVME_DNR; + } + + switch (action) { + + case NVME_ZONE_ACTION_OPEN: + if (all) { + proc_mask = NVME_PROC_CLOSED_ZONES; + } + trace_pci_nvme_open_zone(slba, zone_idx, all); + status = nvme_do_zone_op(ns, zone, proc_mask, nvme_open_zone); + break; + + case NVME_ZONE_ACTION_CLOSE: + if (all) { + proc_mask = NVME_PROC_IMP_OPEN_ZONES | NVME_PROC_EXP_OPEN_ZONES; + } + trace_pci_nvme_close_zone(slba, zone_idx, all); + status = nvme_do_zone_op(ns, zone, proc_mask, nvme_close_zone); + break; + + case NVME_ZONE_ACTION_FINISH: + if (all) { + proc_mask = NVME_PROC_IMP_OPEN_ZONES | NVME_PROC_EXP_OPEN_ZONES | + NVME_PROC_CLOSED_ZONES; + } + trace_pci_nvme_finish_zone(slba, zone_idx, all); + status = nvme_do_zone_op(ns, zone, proc_mask, nvme_finish_zone); + break; + + case NVME_ZONE_ACTION_RESET: + if (all) { + proc_mask = NVME_PROC_IMP_OPEN_ZONES | NVME_PROC_EXP_OPEN_ZONES | + NVME_PROC_CLOSED_ZONES | NVME_PROC_FULL_ZONES; + } + trace_pci_nvme_reset_zone(slba, zone_idx, all); + status = nvme_do_zone_op(ns, zone, proc_mask, nvme_reset_zone); + break; + + case NVME_ZONE_ACTION_OFFLINE: + if (all) { + proc_mask = NVME_PROC_READ_ONLY_ZONES; + } + trace_pci_nvme_offline_zone(slba, zone_idx, all); + status = nvme_do_zone_op(ns, zone, proc_mask, nvme_offline_zone); + break; + + case NVME_ZONE_ACTION_SET_ZD_EXT: + trace_pci_nvme_set_descriptor_extension(slba, zone_idx); + return NVME_INVALID_FIELD | NVME_DNR; + break; + + default: + trace_pci_nvme_err_invalid_mgmt_action(action); + status = NVME_INVALID_FIELD; + } + + if (status == NVME_ZONE_INVAL_TRANSITION) { + trace_pci_nvme_err_invalid_zone_state_transition(action, slba, + zone->d.za); + } + if (status) { + status |= NVME_DNR; + } + + return status; +} + +static bool nvme_zone_matches_filter(uint32_t zafs, NvmeZone *zl) +{ + int zs = nvme_get_zone_state(zl); + + switch (zafs) { + case NVME_ZONE_REPORT_ALL: + return true; + case NVME_ZONE_REPORT_EMPTY: + return zs == NVME_ZONE_STATE_EMPTY; + case NVME_ZONE_REPORT_IMPLICITLY_OPEN: + return zs == NVME_ZONE_STATE_IMPLICITLY_OPEN; + case NVME_ZONE_REPORT_EXPLICITLY_OPEN: + return zs == NVME_ZONE_STATE_EXPLICITLY_OPEN; + case NVME_ZONE_REPORT_CLOSED: + return zs == NVME_ZONE_STATE_CLOSED; + case NVME_ZONE_REPORT_FULL: + return zs == NVME_ZONE_STATE_FULL; + case NVME_ZONE_REPORT_READ_ONLY: + return zs == NVME_ZONE_STATE_READ_ONLY; + case NVME_ZONE_REPORT_OFFLINE: + return zs == NVME_ZONE_STATE_OFFLINE; + default: + return false; + } +} + +static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeCmd *cmd = (NvmeCmd *)&req->cmd; + NvmeNamespace *ns = req->ns; + /* cdw12 is zero-based number of dwords to return. Convert to bytes */ + uint32_t data_size = (le32_to_cpu(cmd->cdw12) + 1) << 2; + uint32_t dw13 = le32_to_cpu(cmd->cdw13); + uint32_t zone_idx, zra, zrasf, partial; + uint64_t max_zones, nr_zones = 0; + uint16_t status; + uint64_t slba; + NvmeZoneDescr *z; + NvmeZone *zs; + NvmeZoneReportHeader *header; + void *buf, *buf_p; + size_t zone_entry_sz; + + req->status = NVME_SUCCESS; + + status = nvme_get_mgmt_zone_slba_idx(ns, cmd, &slba, &zone_idx); + if (status) { + return status; + } + + zra = dw13 & 0xff; + if (zra != NVME_ZONE_REPORT) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + zrasf = (dw13 >> 8) & 0xff; + if (zrasf > NVME_ZONE_REPORT_OFFLINE) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + if (data_size < sizeof(NvmeZoneReportHeader)) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + status = nvme_check_mdts(n, data_size); + if (status) { + trace_pci_nvme_err_mdts(nvme_cid(req), data_size); + return status; + } + + partial = (dw13 >> 16) & 0x01; + + zone_entry_sz = sizeof(NvmeZoneDescr); + + max_zones = (data_size - sizeof(NvmeZoneReportHeader)) / zone_entry_sz; + buf = g_malloc0(data_size); + + header = (NvmeZoneReportHeader *)buf; + buf_p = buf + sizeof(NvmeZoneReportHeader); + + while (zone_idx < ns->num_zones && nr_zones < max_zones) { + zs = &ns->zone_array[zone_idx]; + + if (!nvme_zone_matches_filter(zrasf, zs)) { + zone_idx++; + continue; + } + + z = (NvmeZoneDescr *)buf_p; + buf_p += sizeof(NvmeZoneDescr); + nr_zones++; + + z->zt = zs->d.zt; + z->zs = zs->d.zs; + z->zcap = cpu_to_le64(zs->d.zcap); + z->zslba = cpu_to_le64(zs->d.zslba); + z->za = zs->d.za; + + if (nvme_wp_is_valid(zs)) { + z->wp = cpu_to_le64(zs->d.wp); + } else { + z->wp = cpu_to_le64(~0ULL); + } + + zone_idx++; + } + + if (!partial) { + for (; zone_idx < ns->num_zones; zone_idx++) { + zs = &ns->zone_array[zone_idx]; + if (nvme_zone_matches_filter(zrasf, zs)) { + nr_zones++; + } + } + } + header->nr_zones = cpu_to_le64(nr_zones); + + status = nvme_dma(n, (uint8_t *)buf, data_size, + DMA_DIRECTION_FROM_DEVICE, req); + + g_free(buf); + + return status; +} + static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) { uint32_t nsid = le32_to_cpu(req->cmd.nsid); @@ -1084,11 +1945,17 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) case NVME_CMD_FLUSH: return nvme_flush(n, req); case NVME_CMD_WRITE_ZEROES: - return nvme_write(n, req, true); + return nvme_write(n, req, false, true); + case NVME_CMD_ZONE_APPEND: + return nvme_write(n, req, true, false); case NVME_CMD_WRITE: - return nvme_write(n, req, false); + return nvme_write(n, req, false, false); case NVME_CMD_READ: return nvme_read(n, req); + case NVME_CMD_ZONE_MGMT_SEND: + return nvme_zone_mgmt_send(n, req); + case NVME_CMD_ZONE_MGMT_RECV: + return nvme_zone_mgmt_recv(n, req); default: assert(false); } @@ -1348,6 +2215,9 @@ static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint8_t csi, uint32_t buf_len, case NVME_CSI_NVM: src_iocs = nvme_cse_iocs_nvm; break; + case NVME_CSI_ZONED: + src_iocs = nvme_cse_iocs_zoned; + break; } } @@ -1529,6 +2399,16 @@ static uint16_t nvme_rpt_empty_id_struct(NvmeCtrl *n, NvmeRequest *req) return nvme_dma(n, id, sizeof(id), DMA_DIRECTION_FROM_DEVICE, req); } +static inline bool nvme_csi_has_nvm_support(NvmeNamespace *ns) +{ + switch (ns->csi) { + case NVME_CSI_NVM: + case NVME_CSI_ZONED: + return true; + } + return false; +} + static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeRequest *req) { trace_pci_nvme_identify_ctrl(); @@ -1540,11 +2420,16 @@ static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeRequest *req) static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, NvmeRequest *req) { NvmeIdentify *c = (NvmeIdentify *)&req->cmd; + NvmeIdCtrlZoned id = {}; trace_pci_nvme_identify_ctrl_csi(c->csi); if (c->csi == NVME_CSI_NVM) { return nvme_rpt_empty_id_struct(n, req); + } else if (c->csi == NVME_CSI_ZONED) { + id.zasl = n->zasl; + return nvme_dma(n, (uint8_t *)&id, sizeof(id), + DMA_DIRECTION_FROM_DEVICE, req); } return NVME_INVALID_FIELD | NVME_DNR; @@ -1572,8 +2457,12 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req, return nvme_rpt_empty_id_struct(n, req); } - return nvme_dma(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), - DMA_DIRECTION_FROM_DEVICE, req); + if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) { + return nvme_dma(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), + DMA_DIRECTION_FROM_DEVICE, req); + } + + return NVME_INVALID_CMD_SET | NVME_DNR; } static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req, @@ -1598,8 +2487,11 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req, return nvme_rpt_empty_id_struct(n, req); } - if (c->csi == NVME_CSI_NVM) { + if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) { return nvme_rpt_empty_id_struct(n, req); + } else if (c->csi == NVME_CSI_ZONED && ns->csi == NVME_CSI_ZONED) { + return nvme_dma(n, (uint8_t *)ns->id_ns_zoned, sizeof(NvmeIdNsZoned), + DMA_DIRECTION_FROM_DEVICE, req); } return NVME_INVALID_FIELD | NVME_DNR; @@ -1668,7 +2560,7 @@ static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req, return NVME_INVALID_NSID | NVME_DNR; } - if (c->csi != NVME_CSI_NVM) { + if (c->csi != NVME_CSI_NVM && c->csi != NVME_CSI_ZONED) { return NVME_INVALID_FIELD | NVME_DNR; } @@ -1677,7 +2569,7 @@ static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, NvmeRequest *req, if (!ns) { continue; } - if (ns->params.nsid <= min_nsid) { + if (ns->params.nsid <= min_nsid || c->csi != ns->csi) { continue; } if (only_active && !ns->attached) { @@ -1747,6 +2639,8 @@ static uint16_t nvme_identify_cmd_set(NvmeCtrl *n, NvmeRequest *req) trace_pci_nvme_identify_cmd_set(); NVME_SET_CSI(*list, NVME_CSI_NVM); + NVME_SET_CSI(*list, NVME_CSI_ZONED); + return nvme_dma(n, list, data_len, DMA_DIRECTION_FROM_DEVICE, req); } @@ -2206,7 +3100,7 @@ static void nvme_process_sq(void *opaque) } } -static void nvme_clear_ctrl(NvmeCtrl *n) +static void nvme_clear_ctrl(NvmeCtrl *n, bool shutdown) { NvmeNamespace *ns; int i; @@ -2250,6 +3144,17 @@ static void nvme_clear_ctrl(NvmeCtrl *n) nvme_ns_flush(ns); } + if (shutdown) { + for (i = 1; i <= n->num_namespaces; i++) { + ns = nvme_ns(n, i); + if (!ns) { + continue; + } + + nvme_ns_shutdown(ns); + } + } + n->bar.cc = 0; } @@ -2270,6 +3175,13 @@ static void nvme_select_ns_iocs(NvmeCtrl *n) ns->iocs = nvme_cse_iocs_nvm; } break; + case NVME_CSI_ZONED: + if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) { + ns->iocs = nvme_cse_iocs_zoned; + } else if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_NVM) { + ns->iocs = nvme_cse_iocs_nvm; + } + break; } } } @@ -2368,6 +3280,17 @@ static int nvme_start_ctrl(NvmeCtrl *n) nvme_init_sq(&n->admin_sq, n, n->bar.asq, 0, 0, NVME_AQA_ASQS(n->bar.aqa) + 1); + if (!n->params.zasl_bs) { + n->zasl = n->params.mdts; + } else { + if (n->params.zasl_bs < n->page_size) { + trace_pci_nvme_err_startfail_zasl_too_small(n->params.zasl_bs, + n->page_size); + return -1; + } + n->zasl = 31 - clz32(n->params.zasl_bs / n->page_size); + } + nvme_set_timestamp(n, 0ULL); QTAILQ_INIT(&n->aer_queue); @@ -2440,12 +3363,12 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data, } } else if (!NVME_CC_EN(data) && NVME_CC_EN(n->bar.cc)) { trace_pci_nvme_mmio_stopped(); - nvme_clear_ctrl(n); + nvme_clear_ctrl(n, false); n->bar.csts &= ~NVME_CSTS_READY; } if (NVME_CC_SHN(data) && !(NVME_CC_SHN(n->bar.cc))) { trace_pci_nvme_mmio_shutdown_set(); - nvme_clear_ctrl(n); + nvme_clear_ctrl(n, true); n->bar.cc = data; n->bar.csts |= NVME_CSTS_SHST_COMPLETE; } else if (!NVME_CC_SHN(data) && NVME_CC_SHN(n->bar.cc)) { @@ -2792,6 +3715,13 @@ static void nvme_check_constraints(NvmeCtrl *n, Error **errp) host_memory_backend_set_mapped(n->pmrdev, true); } + + if (n->params.zasl_bs) { + if (!is_power_of_2(n->params.zasl_bs)) { + error_setg(errp, "zone append size limit has to be a power of 2"); + return; + } + } } static void nvme_init_state(NvmeCtrl *n) @@ -3056,8 +3986,20 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp) static void nvme_exit(PCIDevice *pci_dev) { NvmeCtrl *n = NVME(pci_dev); + NvmeNamespace *ns; + int i; + + nvme_clear_ctrl(n, true); + + for (i = 1; i <= n->num_namespaces; i++) { + ns = nvme_ns(n, i); + if (!ns) { + continue; + } + + nvme_ns_cleanup(ns); + } - nvme_clear_ctrl(n); g_free(n->cq); g_free(n->sq); g_free(n->aer_reqs); @@ -3085,6 +4027,8 @@ static Property nvme_props[] = { DEFINE_PROP_UINT32("aer_max_queued", NvmeCtrl, params.aer_max_queued, 64), DEFINE_PROP_UINT8("mdts", NvmeCtrl, params.mdts, 7), DEFINE_PROP_BOOL("use-intel-id", NvmeCtrl, params.use_intel_id, false), + DEFINE_PROP_SIZE32("zoned.append_size_limit", NvmeCtrl, params.zasl_bs, + NVME_DEFAULT_MAX_ZA_SIZE), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/trace-events b/hw/block/trace-events index 8b29423132..4d910bb942 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -89,6 +89,14 @@ pci_nvme_mmio_start_success(void) "setting controller enable bit succeeded" pci_nvme_mmio_stopped(void) "cleared controller enable bit" pci_nvme_mmio_shutdown_set(void) "shutdown bit set" pci_nvme_mmio_shutdown_cleared(void) "shutdown bit cleared" +pci_nvme_open_zone(uint64_t slba, uint32_t zone_idx, int all) "open zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" +pci_nvme_close_zone(uint64_t slba, uint32_t zone_idx, int all) "close zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" +pci_nvme_finish_zone(uint64_t slba, uint32_t zone_idx, int all) "finish zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" +pci_nvme_reset_zone(uint64_t slba, uint32_t zone_idx, int all) "reset zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" +pci_nvme_offline_zone(uint64_t slba, uint32_t zone_idx, int all) "offline zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" +pci_nvme_set_descriptor_extension(uint64_t slba, uint32_t zone_idx) "set zone descriptor extension, slba=%"PRIu64", idx=%"PRIu32"" +pci_nvme_clear_ns_close(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Closed state" +pci_nvme_clear_ns_reset(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Empty state" # nvme traces for error conditions pci_nvme_err_mdts(uint16_t cid, size_t len) "cid %"PRIu16" len %zu" @@ -107,7 +115,13 @@ pci_nvme_err_invalid_opc(uint8_t opc) "invalid opcode 0x%"PRIx8"" pci_nvme_err_invalid_admin_opc(uint8_t opc) "invalid admin opcode 0x%"PRIx8"" pci_nvme_err_invalid_lba_range(uint64_t start, uint64_t len, uint64_t limit) "Invalid LBA start=%"PRIu64" len=%"PRIu64" limit=%"PRIu64"" pci_nvme_err_invalid_log_page_offset(uint64_t ofs, uint64_t size) "must be <= %"PRIu64", got %"PRIu64"" -pci_nvme_err_only_nvm_cmd_set_avail(void) "setting 110b CC.CSS, but only NVM command set is enabled" +pci_nvme_err_unaligned_zone_cmd(uint8_t action, uint64_t slba, uint64_t zslba) "unaligned zone op 0x%"PRIx32", got slba=%"PRIu64", zslba=%"PRIu64"" +pci_nvme_err_invalid_zone_state_transition(uint8_t action, uint64_t slba, uint8_t attrs) "action=0x%"PRIx8", slba=%"PRIu64", attrs=0x%"PRIx32"" +pci_nvme_err_write_not_at_wp(uint64_t slba, uint64_t zone, uint64_t wp) "writing at slba=%"PRIu64", zone=%"PRIu64", but wp=%"PRIu64"" +pci_nvme_err_append_not_at_start(uint64_t slba, uint64_t zone) "appending at slba=%"PRIu64", but zone=%"PRIu64"" +pci_nvme_err_zone_write_not_ok(uint64_t slba, uint32_t nlb, uint32_t status) "slba=%"PRIu64", nlb=%"PRIu32", status=0x%"PRIx16"" +pci_nvme_err_zone_read_not_ok(uint64_t slba, uint32_t nlb, uint32_t status) "slba=%"PRIu64", nlb=%"PRIu32", status=0x%"PRIx16"" +pci_nvme_err_append_too_large(uint64_t slba, uint32_t nlb, uint8_t zasl) "slba=%"PRIu64", nlb=%"PRIu32", zasl=%"PRIu8"" pci_nvme_err_invalid_iocsci(uint32_t idx) "unsupported command set combination index %"PRIu32"" pci_nvme_err_invalid_del_sq(uint16_t qid) "invalid submission queue deletion, sid=%"PRIu16"" pci_nvme_err_invalid_create_sq_cqid(uint16_t cqid) "failed creating submission queue, invalid cqid=%"PRIu16"" @@ -141,7 +155,9 @@ pci_nvme_err_startfail_sqent_too_large(uint8_t log2ps, uint8_t maxlog2ps) "nvme_ pci_nvme_err_startfail_css(uint8_t css) "nvme_start_ctrl failed because invalid command set selected:%u" pci_nvme_err_startfail_asqent_sz_zero(void) "nvme_start_ctrl failed because the admin submission queue size is zero" pci_nvme_err_startfail_acqent_sz_zero(void) "nvme_start_ctrl failed because the admin completion queue size is zero" +pci_nvme_err_startfail_zasl_too_small(uint32_t zasl, uint32_t pagesz) "nvme_start_ctrl failed because zone append size limit %"PRIu32" is too small, needs to be >= %"PRIu32"" pci_nvme_err_startfail(void) "setting controller enable bit failed" +pci_nvme_err_invalid_mgmt_action(int action) "action=0x%"PRIx8"" # Traces for undefined behavior pci_nvme_ub_mmiowr_misaligned32(uint64_t offset) "MMIO write not 32-bit aligned, offset=0x%"PRIx64"" From patchwork Fri Nov 6 23:43:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 322408 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.5 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 25747C2D0A3 for ; Fri, 6 Nov 2020 23:59:20 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7120D206FC for ; Fri, 6 Nov 2020 23:59:19 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="F6/6rAXS" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7120D206FC Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:36200 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kbBdW-0004tC-CE for qemu-devel@archiver.kernel.org; Fri, 06 Nov 2020 18:59:18 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:34336) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kbBOQ-0005qX-L5; Fri, 06 Nov 2020 18:43:42 -0500 Received: from esa6.hgst.iphmx.com ([216.71.154.45]:57360) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kbBOO-0003Tq-Bj; Fri, 06 Nov 2020 18:43:42 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1604706221; x=1636242221; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qPsMt6Py9z6j9y71gIC3n0L3IIyZmGFlYik7fCYTXcw=; b=F6/6rAXSFjkfD5TyhoR9oyp7A9a99DVy1PZVGwhb495nhJgz2bGReNPE V5g6412xdCtsU8WgSmZ8y9pzRgamgY2LP2CyKnbtPh2Ra53vKrmBLW+8u 0ZIkf9IOBm5G2dRlgQDrVQYlg4UkGOqt1mQv0YTIb8+ymU5FN5W6IFx4w f1xqroMGZUDQ/k/IUmoVhRxdk4A12k2aC+fkjqJmuVmgKUs5Xbdjw4fmJ 2vQWYbfswgNcndZEWhIzapWKKaSMReHx1ECuvtCzTj6nzTDaM94Lcu+9z 1BlnC3fji1fQdveu80xXoRNwYpwGymjCJ/46GN8IWnbESaEiFbEE3lA7O A==; IronPort-SDR: cThB30qgDrYNtxCuLvLkcCJAix5F58kmjeiXV/WvQHWQd2vHDygFgKT0q7qkXxe9HjjK71oxeC F4O3Mo5VeBZuJGMbFj1jPt3jjaLHo3D6IvbvYkVPtyaAb4LDptpx8yLYDs3EWGWowgZFVgMfX1 edy7Dz2TUDiPKHSiADkGTyu9XXV1ehdt300S6gzRXte+Afe/t5UTSQ6cXMlzhjGmUUjkLkBVHl jpplMde75xD3OD9vHE0JdwhIFRzbuTnrqg4+V55/F19GltvGGKATPYYXlI4EGVKewVWRW48o5v BN4= X-IronPort-AV: E=Sophos;i="5.77,457,1596470400"; d="scan'208";a="153267076" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 07 Nov 2020 07:43:39 +0800 IronPort-SDR: k2JnEs3xjvEeSWr5uPezZxRfSO3JgEfWNF5rlyDvOhCp4V0nqiJoPqzQ63TYarTN3GHUyZcySA PwfpHqRBpExltMBpJI/IPwD92EH42A77ENU1iqmu0J9SA9lyU0IZgRDuS/93aujyS2e/8ySaef 8f7veeCNEw3EuFgmE/sLRAJPfd2E/EZ/dWmTO+DvS0+gN9X/cB108TRlIVc5DA5FgQgGJWk97I NF2tq5lYDKuwy0GN1IWHpXOC132IZuvO4/ZVN/TwsNIIbtwyxW2MHwvStMg04CPqf4Qg/VnlO0 ziYaVlihhO/YtNYFzJcDRWE6 Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Nov 2020 15:28:30 -0800 IronPort-SDR: UwHjA+46+wJCteUNTumATB4AKoh5RIXR7tvY+9pNQo1VYpSM0QjXm9gaMS4z6VBRm2f+D9B99C 9I8matuxoFHnwWRgnAL57bFErEXVeiKOkLXfrHyWOr93qV2NtVICVIgc8fZXp1NMiCkKoWiaNB OHyko58UxeMi63rTaKuZjZGzXvJFxNcOVhy4Ou8fFYVlg3uanrRZ8Gpio2HvMq5m4PeNTh/pOS S2+38aUepr8pboiezQ3FkxDyBvrFMlffCWhzuxCsiBAgTUKdLRtRQ15ITthBsrkdGYeqL+PYtP XkE= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip01.wdc.com with ESMTP; 06 Nov 2020 15:43:36 -0800 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Max Reitz , Maxim Levitsky , Fam Zheng Subject: [PATCH v10 10/12] hw/block/nvme: Support Zone Descriptor Extensions Date: Sat, 7 Nov 2020 08:43:03 +0900 Message-Id: <20201106234305.21339-11-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201106234305.21339-1-dmitry.fomichev@wdc.com> References: <20201106234305.21339-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.154.45; envelope-from=prvs=572b21b8d=dmitry.fomichev@wdc.com; helo=esa6.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/11/06 18:43:12 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Zone Descriptor Extension is a label that can be assigned to a zone. It can be set to an Empty zone and it stays assigned until the zone is reset. This commit adds a new optional module property, "zoned.descr_ext_size". Its value must be a multiple of 64 bytes. If this value is non-zero, it becomes possible to assign extensions of that size to any Empty zones. The default value for this property is 0, therefore setting extensions is disabled by default. Signed-off-by: Hans Holmberg Signed-off-by: Dmitry Fomichev Reviewed-by: Klaus Jensen Reviewed-by: Niklas Cassel --- hw/block/nvme-ns.h | 8 +++++++ hw/block/nvme-ns.c | 25 ++++++++++++++++++-- hw/block/nvme.c | 54 +++++++++++++++++++++++++++++++++++++++++-- hw/block/trace-events | 2 ++ 4 files changed, 85 insertions(+), 4 deletions(-) diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 421bab0a57..50a6a0e1ac 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -35,6 +35,7 @@ typedef struct NvmeNamespaceParams { uint64_t zone_cap_bs; uint32_t max_active_zones; uint32_t max_open_zones; + uint32_t zd_extension_size; } NvmeNamespaceParams; typedef struct NvmeNamespace { @@ -58,6 +59,7 @@ typedef struct NvmeNamespace { uint64_t zone_capacity; uint64_t zone_array_size; uint32_t zone_size_log2; + uint8_t *zd_extensions; int32_t nr_open_zones; int32_t nr_active_zones; @@ -127,6 +129,12 @@ static inline bool nvme_wp_is_valid(NvmeZone *zone) st != NVME_ZONE_STATE_OFFLINE; } +static inline uint8_t *nvme_get_zd_extension(NvmeNamespace *ns, + uint32_t zone_idx) +{ + return &ns->zd_extensions[zone_idx * ns->params.zd_extension_size]; +} + static inline void nvme_aor_inc_open(NvmeNamespace *ns) { assert(ns->nr_open_zones >= 0); diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 2e45838c15..85dc73cf06 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -133,6 +133,18 @@ static int nvme_calc_zone_geometry(NvmeNamespace *ns, Error **errp) return -1; } + if (ns->params.zd_extension_size) { + if (ns->params.zd_extension_size & 0x3f) { + error_setg(errp, + "zone descriptor extension size must be a multiple of 64B"); + return -1; + } + if ((ns->params.zd_extension_size >> 6) > 0xff) { + error_setg(errp, "zone descriptor extension size is too large"); + return -1; + } + } + return 0; } @@ -144,6 +156,10 @@ static void nvme_init_zone_state(NvmeNamespace *ns) int i; ns->zone_array = g_malloc0(ns->zone_array_size); + if (ns->params.zd_extension_size) { + ns->zd_extensions = g_malloc0(ns->params.zd_extension_size * + ns->num_zones); + } QTAILQ_INIT(&ns->exp_open_zones); QTAILQ_INIT(&ns->imp_open_zones); @@ -186,7 +202,8 @@ static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace *ns, int lba_index, id_ns_z->ozcs = ns->params.cross_zone_read ? 0x01 : 0x00; id_ns_z->lbafe[lba_index].zsze = cpu_to_le64(ns->zone_size); - id_ns_z->lbafe[lba_index].zdes = 0; + id_ns_z->lbafe[lba_index].zdes = + ns->params.zd_extension_size >> 6; /* Units of 64B */ ns->csi = NVME_CSI_ZONED; ns->id_ns.nsze = cpu_to_le64(ns->zone_size * ns->num_zones); @@ -204,7 +221,8 @@ static void nvme_clear_zone(NvmeNamespace *ns, NvmeZone *zone) zone->w_ptr = zone->d.wp; state = nvme_get_zone_state(zone); - if (zone->d.wp != zone->d.zslba) { + if (zone->d.wp != zone->d.zslba || + (zone->d.za & NVME_ZA_ZD_EXT_VALID)) { if (state != NVME_ZONE_STATE_CLOSED) { trace_pci_nvme_clear_ns_close(state, zone->d.zslba); nvme_set_zone_state(zone, NVME_ZONE_STATE_CLOSED); @@ -301,6 +319,7 @@ void nvme_ns_cleanup(NvmeNamespace *ns) if (ns->params.zoned) { g_free(ns->id_ns_zoned); g_free(ns->zone_array); + g_free(ns->zd_extensions); } } @@ -332,6 +351,8 @@ static Property nvme_ns_props[] = { params.max_active_zones, 0), DEFINE_PROP_UINT32("zoned.max_open", NvmeNamespace, params.max_open_zones, 0), + DEFINE_PROP_UINT32("zoned.descr_ext_size", NvmeNamespace, + params.zd_extension_size, 0), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme.c b/hw/block/nvme.c index c6b3a5dcf7..e82e3be821 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1703,6 +1703,26 @@ static uint16_t nvme_offline_zone(NvmeNamespace *ns, NvmeZone *zone, return NVME_ZONE_INVAL_TRANSITION; } +static uint16_t nvme_set_zd_ext(NvmeNamespace *ns, NvmeZone *zone) +{ + uint16_t status; + uint8_t state = nvme_get_zone_state(zone); + + if (state == NVME_ZONE_STATE_EMPTY) { + nvme_auto_transition_zone(ns, false, true); + status = nvme_aor_check(ns, 1, 0); + if (status != NVME_SUCCESS) { + return status; + } + nvme_aor_inc_active(ns); + zone->d.za |= NVME_ZA_ZD_EXT_VALID; + nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_CLOSED); + return NVME_SUCCESS; + } + + return NVME_ZONE_INVAL_TRANSITION; +} + static uint16_t nvme_bulk_proc_zone(NvmeNamespace *ns, NvmeZone *zone, enum NvmeZoneProcessingMask proc_mask, op_handler_t op_hndlr) @@ -1798,6 +1818,7 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) NvmeCmd *cmd = (NvmeCmd *)&req->cmd; NvmeNamespace *ns = req->ns; NvmeZone *zone; + uint8_t *zd_ext; uint32_t dw13 = le32_to_cpu(cmd->cdw13); uint64_t slba = 0; uint32_t zone_idx = 0; @@ -1870,7 +1891,22 @@ static uint16_t nvme_zone_mgmt_send(NvmeCtrl *n, NvmeRequest *req) case NVME_ZONE_ACTION_SET_ZD_EXT: trace_pci_nvme_set_descriptor_extension(slba, zone_idx); - return NVME_INVALID_FIELD | NVME_DNR; + if (all || !ns->params.zd_extension_size) { + return NVME_INVALID_FIELD | NVME_DNR; + } + zd_ext = nvme_get_zd_extension(ns, zone_idx); + status = nvme_dma(n, zd_ext, ns->params.zd_extension_size, + DMA_DIRECTION_TO_DEVICE, req); + if (status) { + trace_pci_nvme_err_zd_extension_map_error(zone_idx); + return status; + } + + status = nvme_set_zd_ext(ns, zone); + if (status == NVME_SUCCESS) { + trace_pci_nvme_zd_extension_set(zone_idx); + return status; + } break; default: @@ -1940,7 +1976,10 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) } zra = dw13 & 0xff; - if (zra != NVME_ZONE_REPORT) { + if (zra != NVME_ZONE_REPORT && zra != NVME_ZONE_REPORT_EXTENDED) { + return NVME_INVALID_FIELD | NVME_DNR; + } + if (zra == NVME_ZONE_REPORT_EXTENDED && !ns->params.zd_extension_size) { return NVME_INVALID_FIELD | NVME_DNR; } @@ -1962,6 +2001,9 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) partial = (dw13 >> 16) & 0x01; zone_entry_sz = sizeof(NvmeZoneDescr); + if (zra == NVME_ZONE_REPORT_EXTENDED) { + zone_entry_sz += ns->params.zd_extension_size; + } max_zones = (data_size - sizeof(NvmeZoneReportHeader)) / zone_entry_sz; buf = g_malloc0(data_size); @@ -1993,6 +2035,14 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) z->wp = cpu_to_le64(~0ULL); } + if (zra == NVME_ZONE_REPORT_EXTENDED) { + if (zs->d.za & NVME_ZA_ZD_EXT_VALID) { + memcpy(buf_p, nvme_get_zd_extension(ns, zone_idx), + ns->params.zd_extension_size); + } + buf_p += ns->params.zd_extension_size; + } + zone_idx++; } diff --git a/hw/block/trace-events b/hw/block/trace-events index e674522883..d42d2c8d61 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -95,6 +95,7 @@ pci_nvme_finish_zone(uint64_t slba, uint32_t zone_idx, int all) "finish zone, sl pci_nvme_reset_zone(uint64_t slba, uint32_t zone_idx, int all) "reset zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" pci_nvme_offline_zone(uint64_t slba, uint32_t zone_idx, int all) "offline zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32"" pci_nvme_set_descriptor_extension(uint64_t slba, uint32_t zone_idx) "set zone descriptor extension, slba=%"PRIu64", idx=%"PRIu32"" +pci_nvme_zd_extension_set(uint32_t zone_idx) "set descriptor extension for zone_idx=%"PRIu32"" pci_nvme_clear_ns_close(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Closed state" pci_nvme_clear_ns_reset(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Empty state" @@ -124,6 +125,7 @@ pci_nvme_err_zone_read_not_ok(uint64_t slba, uint32_t nlb, uint32_t status) "slb pci_nvme_err_append_too_large(uint64_t slba, uint32_t nlb, uint8_t zasl) "slba=%"PRIu64", nlb=%"PRIu32", zasl=%"PRIu8"" pci_nvme_err_insuff_active_res(uint32_t max_active) "max_active=%"PRIu32" zone limit exceeded" pci_nvme_err_insuff_open_res(uint32_t max_open) "max_open=%"PRIu32" zone limit exceeded" +pci_nvme_err_zd_extension_map_error(uint32_t zone_idx) "can't map descriptor extension for zone_idx=%"PRIu32"" pci_nvme_err_invalid_iocsci(uint32_t idx) "unsupported command set combination index %"PRIu32"" pci_nvme_err_invalid_del_sq(uint16_t qid) "invalid submission queue deletion, sid=%"PRIu16"" pci_nvme_err_invalid_create_sq_cqid(uint16_t cqid) "failed creating submission queue, invalid cqid=%"PRIu16"" From patchwork Fri Nov 6 23:43:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 322413 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.5 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 715D0C2D0A3 for ; Fri, 6 Nov 2020 23:47:58 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DCEAE20704 for ; Fri, 6 Nov 2020 23:47:57 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="RM/LIEx0" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DCEAE20704 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:40606 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kbBSW-0002uG-Ou for qemu-devel@archiver.kernel.org; Fri, 06 Nov 2020 18:47:56 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:34360) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kbBOT-0005xm-81; Fri, 06 Nov 2020 18:43:45 -0500 Received: from esa6.hgst.iphmx.com ([216.71.154.45]:57360) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kbBOR-0003Tq-C9; Fri, 06 Nov 2020 18:43:44 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1604706224; x=1636242224; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Y+4Hkf4JH02Acl3UmgoBUtRDqe6KVXNmEtKOCZ+YtyY=; b=RM/LIEx0Yqji8g14RRsY2j7+NVTLqwgLoa+Sf5ZBQ8i1bX+Xp/UmFYoB gMvkxZX+HG5hiwZjduuaqxDif+atu7HTMiZNHOVwquuKy/9ksoLLFM8yE JoSJHiCNEnIuD21AbAdYvNOckI/GdUnc/sqf3yr0uZGPIUBGTs3SBYeOn YYcDo40XIuwQ52Z/HH0ql0WmJB/Kvqi/hRzqScX1U1BGiT/TTZJEZMSkb 0fZdv9FlH/Dnfk6R560XBHPUxjDpw87AUaBSNjS/6JjUMf4+vb+x4wWhb cN+hAsYKo4dRf4+lTq33mHVY8lIZxIPbrnIXXU8w8XeGRK/uD23O1KqrZ A==; IronPort-SDR: x/32uDi2W330jtptJ0cubV2YquQntA3pKj1GEZN1Ziob7ujTGezMRCCnpPykg99Bl1jnwvGYiw jRgHPn6j4DwOFyP4NC+VD8+HTH9mqCucnAefzfVCIOh6h3FVa2ygxMwjQlaPtnCxA24ZB1hCOE yCi0BfJRw5FtUbcU1DTktxyNEIlmJWzJMjtXWcauINqT2/czIO6YFW4Pgoaa7exXGOnH5nWxtg ax68dY+JFx3TgeamLaISAL2Zquj+RlFvzJyY6FBOt5g0EzArysomwGBkvpuyiWwMCshp8w6JxI 3FE= X-IronPort-AV: E=Sophos;i="5.77,457,1596470400"; d="scan'208";a="153267078" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 07 Nov 2020 07:43:43 +0800 IronPort-SDR: bZc8qEBcqsLLt8u96Gg8bJg1RiHf+Scgxe/PbGGWNMfd2e5R1xBILCq7we0iVlPEEWr+Ifkekd AVr9XxHst8QBGxNX8P9cvGTdkKDlhOpTHZ5ZHZN6bdzZuArEhKldhND3UfSB0uo1Cjm2XBOMVU WCBgMuveTTIOkjiNcINCvrBeeBdVzGQebpO2bhbfYM5kj0L1fIrGUwcg4znZ/FvMIgfiAYxdF9 x05m+Ctr2cKp72ry0ztaWF837zqmk4HmsITwXp6+28m8jVdQs+/Hin4poerYcGtSou1FJ77Tfh MJtx0YWOOlZiH3JglQSsXa44 Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Nov 2020 15:28:33 -0800 IronPort-SDR: WRG7ra1xj1qi5gbCGA/GawMVoRZ4iI3YvcLNL1tR1xwi69cqbCG8dcaP+Uh1DjBXVLHr2pHdzE VTL4L1Suc8/W99zmYkInJdYbRkPJBT9vNkYHG1C+Gz88Pvo+imJipZN+ZKXL/3dZm+WKF534X/ 7/hfOnSuIM2j+xDNouI+Ii7VTXJjOoBNguwOWoo/VA5P2PnwIJZrV8SDhjL/54XXCaE4qtcuPa ZUTT3F8E1KWtA2lBngk+SnMxoUErmxIvRYMa4R5a0ZnjkOLhEFkyRvyWKzJ8k8+1LaMRZlhsNV 09c= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip01.wdc.com with ESMTP; 06 Nov 2020 15:43:39 -0800 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Max Reitz , Maxim Levitsky , Fam Zheng Subject: [PATCH v10 11/12] hw/block/nvme: Add injection of Offline/Read-Only zones Date: Sat, 7 Nov 2020 08:43:04 +0900 Message-Id: <20201106234305.21339-12-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201106234305.21339-1-dmitry.fomichev@wdc.com> References: <20201106234305.21339-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.154.45; envelope-from=prvs=572b21b8d=dmitry.fomichev@wdc.com; helo=esa6.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/11/06 18:43:12 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" ZNS specification defines two zone conditions for the zones that no longer can function properly, possibly because of flash wear or other internal fault. It is useful to be able to "inject" a small number of such zones for testing purposes. This commit defines two optional device properties, "offline_zones" and "rdonly_zones". Users can assign non-zero values to these variables to specify the number of zones to be initialized as Offline or Read-Only. The actual number of injected zones may be smaller than the requested amount - Read-Only and Offline counts are expected to be much smaller than the total number of zones on a drive. Signed-off-by: Dmitry Fomichev Reviewed-by: Niklas Cassel --- hw/block/nvme-ns.h | 2 ++ hw/block/nvme-ns.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 54 insertions(+) diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 50a6a0e1ac..b30478e5d7 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -36,6 +36,8 @@ typedef struct NvmeNamespaceParams { uint32_t max_active_zones; uint32_t max_open_zones; uint32_t zd_extension_size; + uint32_t nr_offline_zones; + uint32_t nr_rdonly_zones; } NvmeNamespaceParams; typedef struct NvmeNamespace { diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 85dc73cf06..5e4a6705cd 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -21,6 +21,7 @@ #include "sysemu/sysemu.h" #include "sysemu/block-backend.h" #include "qapi/error.h" +#include "crypto/random.h" #include "hw/qdev-properties.h" #include "hw/qdev-core.h" @@ -145,6 +146,20 @@ static int nvme_calc_zone_geometry(NvmeNamespace *ns, Error **errp) } } + if (ns->params.max_open_zones < nz) { + if (ns->params.nr_offline_zones > nz - ns->params.max_open_zones) { + error_setg(errp, "offline_zones value %u is too large", + ns->params.nr_offline_zones); + return -1; + } + if (ns->params.nr_rdonly_zones > + nz - ns->params.max_open_zones - ns->params.nr_offline_zones) { + error_setg(errp, "rdonly_zones value %u is too large", + ns->params.nr_rdonly_zones); + return -1; + } + } + return 0; } @@ -153,7 +168,9 @@ static void nvme_init_zone_state(NvmeNamespace *ns) uint64_t start = 0, zone_size = ns->zone_size; uint64_t capacity = ns->num_zones * zone_size; NvmeZone *zone; + uint32_t rnd; int i; + uint16_t zs; ns->zone_array = g_malloc0(ns->zone_array_size); if (ns->params.zd_extension_size) { @@ -180,6 +197,37 @@ static void nvme_init_zone_state(NvmeNamespace *ns) zone->w_ptr = start; start += zone_size; } + + /* If required, make some zones Offline or Read Only */ + + for (i = 0; i < ns->params.nr_offline_zones; i++) { + do { + qcrypto_random_bytes(&rnd, sizeof(rnd), NULL); + rnd %= ns->num_zones; + } while (rnd < ns->params.max_open_zones); + zone = &ns->zone_array[rnd]; + zs = nvme_get_zone_state(zone); + if (zs != NVME_ZONE_STATE_OFFLINE) { + nvme_set_zone_state(zone, NVME_ZONE_STATE_OFFLINE); + } else { + i--; + } + } + + for (i = 0; i < ns->params.nr_rdonly_zones; i++) { + do { + qcrypto_random_bytes(&rnd, sizeof(rnd), NULL); + rnd %= ns->num_zones; + } while (rnd < ns->params.max_open_zones); + zone = &ns->zone_array[rnd]; + zs = nvme_get_zone_state(zone); + if (zs != NVME_ZONE_STATE_OFFLINE && + zs != NVME_ZONE_STATE_READ_ONLY) { + nvme_set_zone_state(zone, NVME_ZONE_STATE_READ_ONLY); + } else { + i--; + } + } } static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace *ns, int lba_index, @@ -353,6 +401,10 @@ static Property nvme_ns_props[] = { params.max_open_zones, 0), DEFINE_PROP_UINT32("zoned.descr_ext_size", NvmeNamespace, params.zd_extension_size, 0), + DEFINE_PROP_UINT32("zoned.offline_zones", NvmeNamespace, + params.nr_offline_zones, 0), + DEFINE_PROP_UINT32("zoned.rdonly_zones", NvmeNamespace, + params.nr_rdonly_zones, 0), DEFINE_PROP_END_OF_LIST(), }; From patchwork Fri Nov 6 23:43:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dmitry Fomichev X-Patchwork-Id: 322414 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.5 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12B02C2D0A3 for ; Fri, 6 Nov 2020 23:45:34 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7435B20702 for ; Fri, 6 Nov 2020 23:45:33 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=wdc.com header.i=@wdc.com header.b="HCQpyZ27" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7435B20702 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=wdc.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:34336 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kbBQC-0000FS-DH for qemu-devel@archiver.kernel.org; Fri, 06 Nov 2020 18:45:32 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:34394) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kbBOW-00065z-7w; Fri, 06 Nov 2020 18:43:48 -0500 Received: from esa6.hgst.iphmx.com ([216.71.154.45]:57360) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kbBOT-0003Tq-Sl; Fri, 06 Nov 2020 18:43:47 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=wdc.com; i=@wdc.com; q=dns/txt; s=dkim.wdc.com; t=1604706226; x=1636242226; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=fYRGLGEOoxEBnGfF2tyEuYZSUoEdxDjp8wNk3+vvDGE=; b=HCQpyZ27xbVRX4rzWs+g6Oe0loLfjrPILTkps/JusEkw4h8UXVuWIzsX wOyED/iejWUHuKyFxB3AEoCz0HTNhdhx69KcRWo6fWnYpTFN1TKoNEQGB eMUewM/8fKeCxr5Ib2lR0T7KQG/sHYEeIL1C1EPvcaXFPNZeGgAZEsdD9 W51e17cYDGORAN6gbBXcj6gtJqUUG/wGthQGb6PgYXIsm8IzPIgAfsRYb iOqycUZ1+Mt3qGAdfzkwbA3MYgzO3s9qYFmmuwlePkpiSk4fBIxCpNlv7 NksUJ1wEQ9J8i3ldrLRoWNh3KXSgWONr3JbCUinM+/F5sXi7TTaxPqz5G g==; IronPort-SDR: 2NW0z+4hkOzEFlzaCSQPo2kmAm6/pL/LqWnhbyu5Gx99iT+LKG+drUZPA4dW9OrNn5Ep4yc6GK LrdJQ0c3F16DD1wPmdVz/cTDSj9xuY9J38el7o3I0EnAORw6S88owo3OF1IRo1SaLFy/TPFIoZ Lh2n1vPZ6jCO6hisaXZe9XyzyAyf/O/7u1hcCLFHAm2Vf5RW7ubRai1t81PwAkj+MzLowWUHak ZzDvJTCir+RzwzxkNubVD0JoUC+eOypNjy3YtwU5hH7Oewbb8sLlLylPziOilvngeublKN60hr SIE= X-IronPort-AV: E=Sophos;i="5.77,457,1596470400"; d="scan'208";a="153267080" Received: from h199-255-45-15.hgst.com (HELO uls-op-cesaep02.wdc.com) ([199.255.45.15]) by ob1.hgst.iphmx.com with ESMTP; 07 Nov 2020 07:43:45 +0800 IronPort-SDR: F/6FZxJQr88nyu/0O6RWpt4/wSn1FPxf8yyxdfkqEqSa9ns96RjkF6ZYK8FJzr9J4y5TlujZnY 4+YFyfOTTfij5MAbr64Z44dnwwsZ7cybvNMgcmzFmBaAlJzNKcFy4+HrGr+Mc7xsV+OivKNcRr 6WJOhlnkINvA4X4cx5m97VwvDCeG2mAV4hRSo1XVWLVPajzJGUo4X7fMArOwnrAhYOfA9m52dr IIgZ/VWw7G/DsO+MVRBmKhPsNeaE4iGPhLRCZBBUL8coK6OGdPaRkQG00zI12PXjukXi+90jVU q7symgHTHBLB/czHwYt+7RPW Received: from uls-op-cesaip01.wdc.com ([10.248.3.36]) by uls-op-cesaep02.wdc.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Nov 2020 15:28:36 -0800 IronPort-SDR: 1tqXDhHyaKLSVGTR1PEtg1K6N3iblJH1/I+w4I93OFJLIsq6wLzQSSwOlMZDmINfHpeL4aUUXc Tc3YFDzGBK6+IQpC0iE1eqNv0F3BpX9ijAu4qjxP9oeaKVe1KT4P4y69tWrB2wzgtCoYbB1KIL gwcnyLrYp506zpA2afOnYQTOp7AzA3NcSrDDCaJMi5TkFULUxjD6RIeqQqbdQDrCtmj89y9tV0 133z2/sXgp1lmF5uFKLBvQWk+R/qGx3N0TnfM/RtqbRTBbBIY7q82F+/94jD9qpJmKiiovZmQ7 XQM= WDCIronportException: Internal Received: from unknown (HELO redsun50.ssa.fujisawa.hgst.com) ([10.149.66.24]) by uls-op-cesaip01.wdc.com with ESMTP; 06 Nov 2020 15:43:42 -0800 From: Dmitry Fomichev To: Keith Busch , Klaus Jensen , Kevin Wolf , =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , Max Reitz , Maxim Levitsky , Fam Zheng Subject: [PATCH v10 12/12] hw/block/nvme: Document zoned parameters in usage text Date: Sat, 7 Nov 2020 08:43:05 +0900 Message-Id: <20201106234305.21339-13-dmitry.fomichev@wdc.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20201106234305.21339-1-dmitry.fomichev@wdc.com> References: <20201106234305.21339-1-dmitry.fomichev@wdc.com> MIME-Version: 1.0 Received-SPF: pass client-ip=216.71.154.45; envelope-from=prvs=572b21b8d=dmitry.fomichev@wdc.com; helo=esa6.hgst.iphmx.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/11/06 18:43:12 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Niklas Cassel , Damien Le Moal , qemu-block@nongnu.org, Dmitry Fomichev , qemu-devel@nongnu.org, Alistair Francis , Matias Bjorling Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Added brief descriptions of the new device properties that are now available to users to configure features of Zoned Namespace Command Set in the emulator. This patch is for documentation only, no functionality change. Signed-off-by: Dmitry Fomichev Reviewed-by: Niklas Cassel --- hw/block/nvme.c | 47 ++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 42 insertions(+), 5 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index e82e3be821..6043f95116 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -9,7 +9,7 @@ */ /** - * Reference Specs: http://www.nvmexpress.org, 1.2, 1.1, 1.0e + * Reference Specs: http://www.nvmexpress.org, 1.4, 1.3, 1.2, 1.1, 1.0e * * https://nvmexpress.org/developers/nvme-specification/ */ @@ -22,8 +22,9 @@ * [pmrdev=,] \ * max_ioqpairs=, \ * aerl=, aer_max_queued=, \ - * mdts= - * -device nvme-ns,drive=,bus=bus_name,nsid= + * mdts=,zoned.append_size_limit= \ + * -device nvme-ns,drive=,bus=,nsid=,\ + * zoned= * * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at * offset 0 in BAR2 and supports only WDS, RDS and SQS for now. @@ -41,14 +42,50 @@ * ~~~~~~~~~~~~~~~~~~~~~~ * - `aerl` * The Asynchronous Event Request Limit (AERL). Indicates the maximum number - * of concurrently outstanding Asynchronous Event Request commands suppoert + * of concurrently outstanding Asynchronous Event Request commands support * by the controller. This is a 0's based value. * * - `aer_max_queued` * This is the maximum number of events that the device will enqueue for - * completion when there are no oustanding AERs. When the maximum number of + * completion when there are no outstanding AERs. When the maximum number of * enqueued events are reached, subsequent events will be dropped. * + * - `zoned.append_size_limit` + * The maximum I/O size in bytes that is allowed in Zone Append command. + * The default is 128KiB. Since internally this this value is maintained as + * ZASL = log2( / ), some values assigned + * to this property may be rounded down and result in a lower maximum ZA + * data size being in effect. By setting this property to 0, users can make + * ZASL to be equal to MDTS. This property only affects zoned namespaces. + * + * Setting `zoned` to true selects Zoned Command Set at the namespace. + * In this case, the following namespace properties are available to configure + * zoned operation: + * zoned.zsze= + * The number may be followed by K, M, G as in kilo-, mega- or giga-. + * + * zoned.zcap= + * The value 0 (default) forces zone capacity to be the same as zone + * size. The value of this property may not exceed zone size. + * + * zoned.descr_ext_size= + * This value needs to be specified in 64B units. If it is zero, + * namespace(s) will not support zone descriptor extensions. + * + * zoned.max_active= + * The default value means there is no limit to the number of + * concurrently active zones. + * + * zoned.max_open= + * The default value means there is no limit to the number of + * concurrently open zones. + * + * zoned.offline_zones= + * + * zoned.rdonly_zones= + * + * zoned.cross_zone_read= + * Setting this property to true enables Read Across Zone Boundaries. */ #include "qemu/osdep.h"