[v5,13/14] hw/block/nvme: Use zone metadata file for persistence

Message ID 20200928023528.15260-14-dmitry.fomichev@wdc.com
State New
Series hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set

Commit Message

Dmitry Fomichev Sept. 28, 2020, 2:35 a.m. UTC
A ZNS drive that is emulated by this module is currently initialized
with all zones Empty upon startup. However, actual ZNS SSDs save the
state and condition of all zones in their internal NVRAM in the event
of power loss. When such a drive is powered up again, it closes or
finishes all zones that were open at the moment of shutdown. Besides
that, the write pointer position as well as the state and condition
of all zones is preserved across power-downs.

This commit adds the capability for the device to keep persistent zone
metadata. A new optional module property, "zone_file", is introduced.
If added to the command line, this property specifies the name of the
file that stores the zone metadata. If "zone_file" is omitted, the
device will be initialized with all zones empty, the same as before.

If zone metadata is configured to be persistent, then zone descriptor
extensions also persist across controller shutdowns.

Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
---
 hw/block/nvme-ns.c    | 341 ++++++++++++++++++++++++++++++++++++++++--
 hw/block/nvme-ns.h    |  33 ++++
 hw/block/nvme.c       |   2 +
 hw/block/trace-events |   1 +
 4 files changed, 362 insertions(+), 15 deletions(-)
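
The power-up behavior described in the commit message (open zones are
closed or finished, write pointers are kept) can be sketched in isolation.
The following is a simplified, standalone illustration, not the code from
this patch: the enum, the struct and recover_zone() are hypothetical names,
the zone descriptor extension case is left out, and "finish" is assumed to
mean moving the write pointer to zslba + zcap.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative zone states; not the NVMe-defined encodings. */
enum zone_state {
    ZS_EMPTY,
    ZS_IMPLICITLY_OPEN,
    ZS_EXPLICITLY_OPEN,
    ZS_CLOSED,
    ZS_FULL,
};

struct zone {
    enum zone_state state;
    uint64_t zslba;  /* first LBA of the zone */
    uint64_t zcap;   /* writable capacity in LBAs */
    uint64_t wp;     /* write pointer */
};

/*
 * Decide what one zone becomes after power-up.
 * max_active == 0 is taken to mean "no limit on active zones".
 */
static void recover_zone(struct zone *z, uint32_t *nr_active,
                         uint32_t max_active)
{
    assert(z->wp >= z->zslba);

    switch (z->state) {
    case ZS_CLOSED:
        (*nr_active)++;      /* closed zones still count as active */
        return;
    case ZS_IMPLICITLY_OPEN:
    case ZS_EXPLICITLY_OPEN:
        break;               /* open zones need a new state */
    default:
        return;              /* Empty and Full zones are left alone */
    }

    if (z->wp == z->zslba) {
        z->state = ZS_EMPTY;             /* nothing was written */
    } else if (max_active == 0 || *nr_active < max_active) {
        z->state = ZS_CLOSED;            /* an active slot is available */
        (*nr_active)++;
    } else {
        z->state = ZS_FULL;              /* no slot left: finish the zone */
        z->wp = z->zslba + z->zcap;
    }
}
```

Calling recover_zone() once per zone, in array order, with a shared
nr_active counter gives the same kind of single pass that the patch
performs over its zone array.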

Comments

Klaus Jensen Sept. 28, 2020, 7:51 a.m. UTC | #1
On Sep 28 11:35, Dmitry Fomichev wrote:
> [...]
> +static int nvme_map_zone_file(NvmeNamespace *ns, bool *init_meta)
> +{
> +    ns->zone_meta = (void *)memory_region_get_ram_ptr(ns->zone_mr);

I forgot that the HostMemoryBackend doesn't magically make the memory
available to the device, so of course this is still needed.

Anyway.

No reason for me to keep complaining about this. I do not like it, I
will not ACK it and I think I made my reasons pretty clear.
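
For readers outside QEMU: the mapping step quoted above amounts to getting
a host pointer into a file-backed shared mapping, which is later flushed
explicitly. A rough standalone POSIX sketch of that mechanism follows. It
does not use the hostmem or memory region APIs, and map_meta_file() and
sync_meta() are hypothetical helper names introduced only for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `size` bytes of the file at `path` as shared, writable memory.
 * The file is created if missing and resized to `size`.
 * Returns NULL on failure.
 */
static void *map_meta_file(const char *path, size_t size)
{
    void *p;
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0) {
        return NULL;
    }
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */

    return p == MAP_FAILED ? NULL : p;
}

/*
 * Write the mapped range back to the file and wait for completion.
 * `base` must be page-aligned, which holds for a pointer from mmap().
 */
static int sync_meta(void *base, size_t size)
{
    assert(base != NULL);
    return msync(base, size, MS_SYNC);
}
```

Stores through the returned pointer land in the page cache; only the
msync() call gives a point after which the data is known to be on its way
to stable storage.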
Dmitry Fomichev Sept. 29, 2020, 3:43 p.m. UTC | #2
> -----Original Message-----
> From: Klaus Jensen <its@irrelevant.dk>
> Sent: Monday, September 28, 2020 3:52 AM
> To: Dmitry Fomichev <Dmitry.Fomichev@wdc.com>
> Cc: Keith Busch <kbusch@kernel.org>; Klaus Jensen
> <k.jensen@samsung.com>; Kevin Wolf <kwolf@redhat.com>; Philippe
> Mathieu-Daudé <philmd@redhat.com>; Maxim Levitsky
> <mlevitsk@redhat.com>; Fam Zheng <fam@euphon.net>; Niklas Cassel
> <Niklas.Cassel@wdc.com>; Damien Le Moal <Damien.LeMoal@wdc.com>;
> qemu-block@nongnu.org; qemu-devel@nongnu.org; Alistair Francis
> <Alistair.Francis@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
> Subject: Re: [PATCH v5 13/14] hw/block/nvme: Use zone metadata file for
> persistence
>
> On Sep 28 11:35, Dmitry Fomichev wrote:
> > [...]
> > +static int nvme_map_zone_file(NvmeNamespace *ns, bool *init_meta)
> > +{
> > +    ns->zone_meta = (void *)memory_region_get_ram_ptr(ns->zone_mr);
>
> I forgot that the HostMemoryBackend doesn't magically make the memory
> available to the device, so of course this is still needed.
>
> Anyway.
>
> No reason for me to keep complaining about this. I do not like it, I
> will not ACK it and I think I made my reasons pretty clear.

So, memory_region_msync() is ok, but memory_region_get_ram_ptr() is not??
This is the same API! You are really splitting hairs here to suit your agenda.
Moving goal posts again....

The "I do not like it" part is priceless. It is great that we have mail archives available.
Klaus Jensen Sept. 29, 2020, 4:46 p.m. UTC | #3
On Sep 29 15:43, Dmitry Fomichev wrote:
> [...]
> So, memory_region_msync() is ok, but memory_region_get_ram_ptr() is not??
> This is the same API! You are really splitting hairs here to suit your agenda.
> Moving goal posts again....
>
> The "I do not like it" part is priceless. It is great that we have mail archives available.

If you read my review again, it's pretty clear that I am calling out the
abstraction. I was clear that if it *really* had to be mmap based, then
it should use hostmem. Sorry for moving your patchset forward by
suggesting an improvement.

But again, I also made it pretty clear that I did not agree with the
abstraction. And that I very much disliked that it was non-portable. And
had endianness issues. I made it SUPER clear that that was why I "did not
like it".
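
To make the endianness concern concrete: a struct written to a file through
a memory mapping is stored in host byte order, so a file produced on a
little-endian host does not read back correctly on a big-endian one. One
common way around this is to encode each field in a fixed byte order. The
sketch below is an illustration only. It is not what the patch does and not
necessarily what the review proposed; struct zone_hdr, HDR_SIZE and the
helper functions are hypothetical, and the header is reduced to three
fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical on-disk header: magic, version, zone_size, little-endian. */
#define HDR_SIZE 16

struct zone_hdr {
    uint32_t magic;
    uint32_t version;
    uint64_t zone_size;
};

/* Store a 32-bit value least significant byte first. */
static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Store a 64-bit value least significant byte first. */
static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

static uint32_t get_le32(const uint8_t *p)
{
    uint32_t v = 0;

    for (int i = 0; i < 4; i++) {
        v |= (uint32_t)p[i] << (8 * i);
    }
    return v;
}

static uint64_t get_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

/* Encode the header into a byte buffer that is identical on any host. */
static void hdr_encode(uint8_t buf[HDR_SIZE], const struct zone_hdr *h)
{
    assert(buf && h);
    put_le32(buf + 0, h->magic);
    put_le32(buf + 4, h->version);
    put_le64(buf + 8, h->zone_size);
}

/* Decode the header from its on-disk byte layout. */
static void hdr_decode(struct zone_hdr *h, const uint8_t buf[HDR_SIZE])
{
    assert(buf && h);
    h->magic = get_le32(buf + 0);
    h->version = get_le32(buf + 4);
    h->zone_size = get_le64(buf + 8);
}
```

The cost of this approach is an explicit encode/decode step on every
update, instead of updating the mapped structure in place.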
Patch

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 47751f2d54..a94021da81 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -20,12 +20,15 @@ 
 #include "hw/pci/pci.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/block-backend.h"
+#include "sysemu/hostmem.h"
+#include "qom/object_interfaces.h"
 #include "qapi/error.h"
 #include "crypto/random.h"
 
 #include "hw/qdev-properties.h"
 #include "hw/qdev-core.h"
 
+#include "trace.h"
 #include "nvme.h"
 #include "nvme-ns.h"
 
@@ -98,6 +101,7 @@  void nvme_add_zone_tail(NvmeNamespace *ns, NvmeZoneList *zl, NvmeZone *zone)
         zl->tail = idx;
     }
     zl->size++;
+    nvme_set_zone_meta_dirty(ns);
 }
 
 /*
@@ -113,12 +117,15 @@  void nvme_remove_zone(NvmeNamespace *ns, NvmeZoneList *zl, NvmeZone *zone)
     if (zl->size == 0) {
         zl->head = NVME_ZONE_LIST_NIL;
         zl->tail = NVME_ZONE_LIST_NIL;
+        nvme_set_zone_meta_dirty(ns);
     } else if (idx == zl->head) {
         zl->head = zone->next;
         ns->zone_array[zl->head].prev = NVME_ZONE_LIST_NIL;
+        nvme_set_zone_meta_dirty(ns);
     } else if (idx == zl->tail) {
         zl->tail = zone->prev;
         ns->zone_array[zl->tail].next = NVME_ZONE_LIST_NIL;
+        nvme_set_zone_meta_dirty(ns);
     } else {
         ns->zone_array[zone->next].prev = zone->prev;
         ns->zone_array[zone->prev].next = zone->next;
@@ -144,6 +151,7 @@  NvmeZone *nvme_remove_zone_head(NvmeNamespace *ns, NvmeZoneList *zl)
             ns->zone_array[zl->head].prev = NVME_ZONE_LIST_NIL;
         }
         zone->prev = zone->next = 0;
+        nvme_set_zone_meta_dirty(ns);
     }
 
     return zone;
@@ -219,11 +227,119 @@  static int nvme_calc_zone_geometry(NvmeNamespace *ns, Error **errp)
         }
     }
 
+    ns->meta_size = sizeof(NvmeZoneMeta) + ns->zone_array_size +
+                          nz * ns->params.zd_extension_size;
+    ns->meta_size = ROUND_UP(ns->meta_size, qemu_real_host_page_size);
+
+    return 0;
+}
+
+static int nvme_validate_zone_file(NvmeNamespace *ns, uint64_t capacity)
+{
+    NvmeZoneMeta *meta = ns->zone_meta;
+    NvmeZone *zone = ns->zone_array;
+    uint64_t start = 0, zone_size = ns->zone_size;
+    int i, n_imp_open = 0, n_exp_open = 0, n_closed = 0, n_full = 0;
+
+    if (meta->magic != NVME_ZONE_META_MAGIC) {
+        return 1;
+    }
+    if (meta->version != NVME_ZONE_META_VER) {
+        return 2;
+    }
+    if (meta->zone_size != zone_size) {
+        return 3;
+    }
+    if (meta->zone_capacity != ns->zone_capacity) {
+        return 4;
+    }
+    if (meta->nr_offline_zones != ns->params.nr_offline_zones) {
+        return 5;
+    }
+    if (meta->nr_rdonly_zones != ns->params.nr_rdonly_zones) {
+        return 6;
+    }
+    if (meta->lba_size != ns->blkconf.logical_block_size) {
+        return 7;
+    }
+    if (meta->zd_extension_size != ns->params.zd_extension_size) {
+        return 8;
+    }
+
+    for (i = 0; i < ns->num_zones; i++, zone++) {
+        if (start + zone_size > capacity) {
+            zone_size = capacity - start;
+        }
+        if (zone->d.zt != NVME_ZONE_TYPE_SEQ_WRITE) {
+            return 9;
+        }
+        if (zone->d.zcap != ns->zone_capacity) {
+            return 10;
+        }
+        if (zone->d.zslba != start) {
+            return 11;
+        }
+        switch (nvme_get_zone_state(zone)) {
+        case NVME_ZONE_STATE_EMPTY:
+        case NVME_ZONE_STATE_OFFLINE:
+        case NVME_ZONE_STATE_READ_ONLY:
+            if (zone->d.wp != start) {
+                return 12;
+            }
+            break;
+        case NVME_ZONE_STATE_IMPLICITLY_OPEN:
+            if (zone->d.wp < start ||
+                zone->d.wp >= zone->d.zslba + zone->d.zcap) {
+                return 13;
+            }
+            n_imp_open++;
+            break;
+        case NVME_ZONE_STATE_EXPLICITLY_OPEN:
+            if (zone->d.wp < start ||
+                zone->d.wp >= zone->d.zslba + zone->d.zcap) {
+                return 13;
+            }
+            n_exp_open++;
+            break;
+        case NVME_ZONE_STATE_CLOSED:
+            if (zone->d.wp < start ||
+                zone->d.wp >= zone->d.zslba + zone->d.zcap) {
+                return 13;
+            }
+            n_closed++;
+            break;
+        case NVME_ZONE_STATE_FULL:
+            if (zone->d.wp != zone->d.zslba + zone->d.zcap) {
+                return 14;
+            }
+            n_full++;
+            break;
+        default:
+            return 15;
+        }
+
+        start += zone_size;
+    }
+
+    if (n_exp_open != nvme_zone_list_size(ns->exp_open_zones)) {
+        return 16;
+    }
+    if (n_imp_open != nvme_zone_list_size(ns->imp_open_zones)) {
+        return 17;
+    }
+    if (n_closed != nvme_zone_list_size(ns->closed_zones)) {
+        return 18;
+    }
+    if (n_full != nvme_zone_list_size(ns->full_zones)) {
+        return 19;
+    }
+
     return 0;
 }
 
 static void nvme_init_zone_meta(NvmeNamespace *ns)
 {
+    NvmeZoneMeta *meta = ns->zone_meta;
     uint64_t start = 0, zone_size = ns->zone_size;
     uint64_t capacity = ns->num_zones * zone_size;
     NvmeZone *zone;
@@ -231,14 +347,26 @@  static void nvme_init_zone_meta(NvmeNamespace *ns)
     int i;
     uint16_t zs;
 
-    ns->zone_array = g_malloc0(ns->zone_array_size);
-    ns->exp_open_zones = g_malloc0(sizeof(NvmeZoneList));
-    ns->imp_open_zones = g_malloc0(sizeof(NvmeZoneList));
-    ns->closed_zones = g_malloc0(sizeof(NvmeZoneList));
-    ns->full_zones = g_malloc0(sizeof(NvmeZoneList));
-    if (ns->params.zd_extension_size) {
-        ns->zd_extensions = g_malloc0(ns->params.zd_extension_size *
-                                      ns->num_zones);
+    if (ns->params.zone_file) {
+        meta->magic = NVME_ZONE_META_MAGIC;
+        meta->version = NVME_ZONE_META_VER;
+        meta->zone_size = zone_size;
+        meta->zone_capacity = ns->zone_capacity;
+        meta->lba_size = ns->blkconf.logical_block_size;
+        meta->nr_offline_zones = ns->params.nr_offline_zones;
+        meta->nr_rdonly_zones = ns->params.nr_rdonly_zones;
+        meta->zd_extension_size = ns->params.zd_extension_size;
+    } else {
+        assert(!ns->zone_meta);
+        ns->zone_array = g_malloc0(ns->zone_array_size);
+        ns->exp_open_zones = g_malloc0(sizeof(NvmeZoneList));
+        ns->imp_open_zones = g_malloc0(sizeof(NvmeZoneList));
+        ns->closed_zones = g_malloc0(sizeof(NvmeZoneList));
+        ns->full_zones = g_malloc0(sizeof(NvmeZoneList));
+        if (ns->params.zd_extension_size) {
+            ns->zd_extensions = g_malloc0(ns->params.zd_extension_size *
+                                          ns->num_zones);
+        }
     }
 
     nvme_init_zone_list(ns->exp_open_zones);
@@ -293,12 +421,180 @@  static void nvme_init_zone_meta(NvmeNamespace *ns)
             i--;
         }
     }
+
+    if (ns->params.zone_file) {
+        nvme_set_zone_meta_dirty(ns);
+    }
+}
+
+static int nvme_open_zone_file(NvmeNamespace *ns, bool *init_meta,
+                               Error **errp)
+{
+    Object *file_be;
+    HostMemoryBackend *fb;
+    struct stat statbuf;
+    int ret;
+
+    ret = stat(ns->params.zone_file, &statbuf);
+    if (ret && errno == ENOENT) {
+        *init_meta = true;
+    } else if (ret || !S_ISREG(statbuf.st_mode)) {
+        error_setg(errp, "\"%s\" is not a regular file",
+                   ns->params.zone_file);
+        return -1;
+    }
+
+    file_be = object_new(TYPE_MEMORY_BACKEND_FILE);
+    object_property_set_str(file_be, "mem-path", ns->params.zone_file,
+                            &error_abort);
+    object_property_set_int(file_be, "size", ns->meta_size, &error_abort);
+    object_property_set_bool(file_be, "share", true, &error_abort);
+    object_property_set_bool(file_be, "discard-data", false, &error_abort);
+    if (!user_creatable_complete(USER_CREATABLE(file_be), errp)) {
+        object_unref(file_be);
+        return -1;
+    }
+    object_property_add_child(OBJECT(ns), "_fb", file_be);
+    object_unref(file_be);
+
+    fb = MEMORY_BACKEND(file_be);
+    ns->zone_mr = host_memory_backend_get_memory(fb);
+
+    return 0;
+}
+
+static int nvme_map_zone_file(NvmeNamespace *ns, bool *init_meta)
+{
+    ns->zone_meta = (void *)memory_region_get_ram_ptr(ns->zone_mr);
+    ns->zone_array = (NvmeZone *)(ns->zone_meta + 1);
+    ns->exp_open_zones = &ns->zone_meta->exp_open_zones;
+    ns->imp_open_zones = &ns->zone_meta->imp_open_zones;
+    ns->closed_zones = &ns->zone_meta->closed_zones;
+    ns->full_zones = &ns->zone_meta->full_zones;
+
+    if (ns->params.zd_extension_size) {
+        ns->zd_extensions = (uint8_t *)(ns->zone_meta + 1);
+        ns->zd_extensions += ns->zone_array_size;
+    }
+
+    return 0;
+}
+
+void nvme_sync_zone_file(NvmeNamespace *ns, NvmeZone *zone, int len)
+{
+    uintptr_t z = (uintptr_t)zone, off = z - (uintptr_t)ns->zone_meta;
+
+    if (ns->zone_meta) {
+        memory_region_msync(ns->zone_mr, off, len);
+
+        if (ns->zone_meta->dirty) {
+            ns->zone_meta->dirty = false;
+            memory_region_msync(ns->zone_mr, 0, sizeof(NvmeZoneMeta));
+        }
+    }
+}
+
+/*
+ * Close or finish all the zones that might be still open after power-down.
+ */
+static void nvme_prepare_zones(NvmeNamespace *ns)
+{
+    NvmeZone *zone;
+    uint32_t set_state;
+    int i;
+
+    assert(!ns->nr_active_zones);
+    assert(!ns->nr_open_zones);
+
+    zone = ns->zone_array;
+    for (i = 0; i < ns->num_zones; i++, zone++) {
+        switch (nvme_get_zone_state(zone)) {
+        case NVME_ZONE_STATE_IMPLICITLY_OPEN:
+            nvme_remove_zone(ns, ns->imp_open_zones, zone);
+            break;
+        case NVME_ZONE_STATE_EXPLICITLY_OPEN:
+            nvme_remove_zone(ns, ns->exp_open_zones, zone);
+            break;
+        case NVME_ZONE_STATE_CLOSED:
+            nvme_aor_inc_active(ns);
+            /* fall through */
+        default:
+            continue;
+        }
+
+        if (zone->d.za & NVME_ZA_ZD_EXT_VALID) {
+            set_state = NVME_ZONE_STATE_CLOSED;
+        } else if (zone->d.wp == zone->d.zslba) {
+            set_state = NVME_ZONE_STATE_EMPTY;
+        } else if (ns->params.max_active_zones == 0 ||
+                   ns->nr_active_zones < ns->params.max_active_zones) {
+            set_state = NVME_ZONE_STATE_CLOSED;
+        } else {
+            set_state = NVME_ZONE_STATE_FULL;
+        }
+
+        switch (set_state) {
+        case NVME_ZONE_STATE_CLOSED:
+            trace_pci_nvme_power_on_close(nvme_get_zone_state(zone),
+                                          zone->d.zslba);
+            nvme_aor_inc_active(ns);
+            nvme_add_zone_tail(ns, ns->closed_zones, zone);
+            break;
+        case NVME_ZONE_STATE_EMPTY:
+            trace_pci_nvme_power_on_reset(nvme_get_zone_state(zone),
+                                          zone->d.zslba);
+            break;
+        case NVME_ZONE_STATE_FULL:
+            trace_pci_nvme_power_on_full(nvme_get_zone_state(zone),
+                                         zone->d.zslba);
+            zone->d.wp = nvme_zone_wr_boundary(zone);
+        }
+
+        zone->w_ptr = zone->d.wp;
+        nvme_set_zone_state(zone, set_state);
+    }
+}
+
+static int nvme_load_zone_meta(NvmeNamespace *ns, bool *init_meta)
+{
+    uint64_t capacity = ns->num_zones * ns->zone_size;
+    int ret = 0;
+
+    if (ns->params.zone_file) {
+        ret = nvme_map_zone_file(ns, init_meta);
+        trace_pci_nvme_mapped_zone_file(ns->params.zone_file, ret);
+        if (ret < 0) {
+            return ret;
+        }
+
+        if (!*init_meta) {
+            ret = nvme_validate_zone_file(ns, capacity);
+            if (ret) {
+                trace_pci_nvme_err_zone_file_invalid(ret);
+                *init_meta = true;
+            }
+        }
+    } else {
+        *init_meta = true;
+    }
+
+    if (*init_meta) {
+        nvme_init_zone_meta(ns);
+        trace_pci_nvme_initialized_zone_file(ns->params.zone_file);
+    } else {
+        nvme_prepare_zones(ns);
+    }
+    nvme_sync_zone_file(ns, ns->zone_array, ns->zone_array_size);
+
+    return 0;
 }
 
 static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace *ns, int lba_index,
                               Error **errp)
 {
     NvmeIdNsZoned *id_ns_z;
+    int ret;
+    bool init_meta = false;
 
     if (n->params.fill_pattern == 0) {
         ns->id_ns.dlfeat |= 0x01;
@@ -310,7 +606,17 @@  static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace *ns, int lba_index,
         return -1;
     }
 
-    nvme_init_zone_meta(ns);
+    if (ns->params.zone_file) {
+        if (nvme_open_zone_file(ns, &init_meta, errp) != 0) {
+            return -1;
+        }
+    }
+
+    ret = nvme_load_zone_meta(ns, &init_meta);
+    if (ret) {
+        error_setg(errp, "could not load/init zone metadata, err=%d", ret);
+        return -1;
+    }
 
     id_ns_z = g_malloc0(sizeof(NvmeIdNsZoned));
 
@@ -376,17 +682,21 @@  void nvme_ns_drain(NvmeNamespace *ns)
 void nvme_ns_flush(NvmeNamespace *ns)
 {
     blk_flush(ns->blkconf.blk);
+
+    nvme_sync_zone_file(ns, ns->zone_array, ns->zone_array_size);
 }
 
 void nvme_ns_cleanup(NvmeNamespace *ns)
 {
+    if (!ns->params.zone_file) {
+        g_free(ns->zone_array);
+        g_free(ns->exp_open_zones);
+        g_free(ns->imp_open_zones);
+        g_free(ns->closed_zones);
+        g_free(ns->full_zones);
+        g_free(ns->zd_extensions);
+    }
     g_free(ns->id_ns_zoned);
-    g_free(ns->zone_array);
-    g_free(ns->exp_open_zones);
-    g_free(ns->imp_open_zones);
-    g_free(ns->closed_zones);
-    g_free(ns->full_zones);
-    g_free(ns->zd_extensions);
 }
 
 static void nvme_ns_realize(DeviceState *dev, Error **errp)
@@ -422,6 +732,7 @@  static Property nvme_ns_props[] = {
                        params.nr_offline_zones, 0),
     DEFINE_PROP_UINT32("rdonly_zones", NvmeNamespace,
                        params.nr_rdonly_zones, 0),
+    DEFINE_PROP_STRING("zone_file", NvmeNamespace, params.zone_file),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index e9b90f9677..4ff0955f91 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -36,6 +36,27 @@  typedef struct NvmeZoneList {
     uint8_t         rsvd12[4];
 } NvmeZoneList;
 
+#define NVME_ZONE_META_MAGIC 0x3aebaa70
+#define NVME_ZONE_META_VER  1
+
+typedef struct NvmeZoneMeta {
+    uint32_t        magic;
+    uint32_t        version;
+    uint64_t        zone_size;
+    uint64_t        zone_capacity;
+    uint32_t        nr_offline_zones;
+    uint32_t        nr_rdonly_zones;
+    uint32_t        lba_size;
+    uint32_t        rsvd40;
+    NvmeZoneList    exp_open_zones;
+    NvmeZoneList    imp_open_zones;
+    NvmeZoneList    closed_zones;
+    NvmeZoneList    full_zones;
+    uint8_t         zd_extension_size;
+    uint8_t         dirty;
+    uint8_t         rsvd594[3990];
+} NvmeZoneMeta;
+
 typedef struct NvmeNamespaceParams {
     uint32_t nsid;
     bool     attached;
@@ -50,6 +71,7 @@  typedef struct NvmeNamespaceParams {
     uint32_t zd_extension_size;
     uint32_t nr_offline_zones;
     uint32_t nr_rdonly_zones;
+    char     *zone_file;
 } NvmeNamespaceParams;
 
 typedef struct NvmeNamespace {
@@ -62,6 +84,7 @@  typedef struct NvmeNamespace {
 
     NvmeIdNsZoned   *id_ns_zoned;
     NvmeZone        *zone_array;
+    NvmeZoneMeta    *zone_meta;
     NvmeZoneList    *exp_open_zones;
     NvmeZoneList    *imp_open_zones;
     NvmeZoneList    *closed_zones;
@@ -74,6 +97,8 @@  typedef struct NvmeNamespace {
     uint8_t         *zd_extensions;
     int32_t         nr_open_zones;
     int32_t         nr_active_zones;
+    MemoryRegion    *zone_mr;
+    size_t          meta_size;
 
     NvmeNamespaceParams params;
 } NvmeNamespace;
@@ -110,6 +135,13 @@  static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t lba)
     return lba << nvme_ns_lbads(ns);
 }
 
+static inline void nvme_set_zone_meta_dirty(NvmeNamespace *ns)
+{
+    if (ns->params.zone_file) {
+        ns->zone_meta->dirty = true;
+    }
+}
+
 typedef struct NvmeCtrl NvmeCtrl;
 
 int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
@@ -243,5 +275,6 @@  static inline void nvme_aor_dec_active(NvmeNamespace *ns)
 void nvme_add_zone_tail(NvmeNamespace *ns, NvmeZoneList *zl, NvmeZone *zone);
 void nvme_remove_zone(NvmeNamespace *ns, NvmeZoneList *zl, NvmeZone *zone);
 NvmeZone *nvme_remove_zone_head(NvmeNamespace *ns, NvmeZoneList *zl);
+void nvme_sync_zone_file(NvmeNamespace *ns, NvmeZone *zone, int len);
 
 #endif /* NVME_NS_H */
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 80973f3ff6..ff7d43d38f 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -165,6 +165,8 @@  static void nvme_assign_zone_state(NvmeNamespace *ns, NvmeZone *zone,
     default:
         zone->d.za = 0;
     }
+
+    nvme_sync_zone_file(ns, zone, sizeof(NvmeZone));
 }
 
 /*
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 386f28e457..1ea4846443 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -103,6 +103,7 @@  pci_nvme_zd_extension_set(uint32_t zone_idx) "set descriptor extension for zone_
 pci_nvme_power_on_close(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Closed state"
 pci_nvme_power_on_reset(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Empty state"
 pci_nvme_power_on_full(uint32_t state, uint64_t slba) "zone state=%"PRIu32", slba=%"PRIu64" transitioned to Full state"
+pci_nvme_initialized_zone_file(char *zfile_name) "initialized zone file %s"
 pci_nvme_mapped_zone_file(char *zfile_name, int ret) "mapped zone file %s, error %d"
 
 # nvme traces for error conditions