Message ID | 20220711230051.15372-1-bvanassche@acm.org |
---|---|
Headers | show |
Series | Improve READ CAPACITY reporting and handling | expand |
On 7/12/22 08:00, Bart Van Assche wrote: > Provide information to the SCSI initiator about whether or not all examined > zones have the same zone type and zone length. From the description of the > SAME field in ZBC-1: > * 0: The zone type and zone length in each zone descriptor may be different. > * 1: The zone type and zone length in each zone descriptor are equal to the > zone type and zone length indicated in the first zone descriptor in the > zone descriptor list. > > Cc: Douglas Gilbert <dgilbert@interlog.com> > Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> > Signed-off-by: Bart Van Assche <bvanassche@acm.org> > --- > drivers/scsi/scsi_debug.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c > index b8a76b89f85a..5a8efc328fb5 100644 > --- a/drivers/scsi/scsi_debug.c > +++ b/drivers/scsi/scsi_debug.c > @@ -4476,8 +4476,10 @@ static int resp_report_zones(struct scsi_cmnd *scp, > u64 lba, zs_lba; > u8 *arr = NULL, *desc; > u8 *cmd = scp->cmnd; > - struct sdeb_zone_state *zsp = NULL; > + struct sdeb_zone_state *zsp = NULL, *first_reported_zone = NULL; > struct sdeb_store_info *sip = devip2sip(devip, false); > + /* 1: all zones in the response have the same type and length. */ > + u8 same = 1; > > if (!sdebug_dev_is_zoned(devip)) { > mk_sense_invalid_opcode(scp); > @@ -4571,6 +4573,13 @@ static int resp_report_zones(struct scsi_cmnd *scp, > goto fini; > } > > + if (first_reported_zone) { > + if (zsp->z_type != first_reported_zone->z_type || > + zsp->z_size != first_reported_zone->z_size) > + same = 0; > + } else { > + first_reported_zone = zsp; > + } > if (nrz < rep_max_zones) { > /* Fill zone descriptor */ > desc[0] = zsp->z_type; > @@ -4592,6 +4601,8 @@ static int resp_report_zones(struct scsi_cmnd *scp, > /* Report header */ > /* Zone list length. */ > put_unaligned_be32(nrz * RZONES_DESC_HD, arr + 0); > + /* SAME field. */ > + arr[4] = same; > /* Maximum LBA */ > put_unaligned_be64(sdebug_capacity - 1, arr + 8); > /* Zone starting LBA granularity. */ Looks good. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
On 7/12/22 08:00, Bart Van Assche wrote: > From ZBC-1: > * RC BASIS = 0: The RETURNED LOGICAL BLOCK ADDRESS field indicates the > highest LBA of a contiguous range of zones that are not sequential write > required zones starting with the first zone. > * RC BASIS = 1: The RETURNED LOGICAL BLOCK ADDRESS field indicates the LBA > of the last logical block on the logical unit. > > The current scsi_debug READ CAPACITY response does not comply with the above > if there are one or more sequential write required zones. Hence this patch. > > Cc: Douglas Gilbert <dgilbert@interlog.com> > Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> > Fixes: 64e14ece0700 ("scsi: scsi_debug: Implement ZBC host-aware emulation") > Signed-off-by: Bart Van Assche <bvanassche@acm.org> > --- > drivers/scsi/scsi_debug.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c > index 5a8efc328fb5..c5f4af774078 100644 > --- a/drivers/scsi/scsi_debug.c > +++ b/drivers/scsi/scsi_debug.c > @@ -304,6 +304,7 @@ struct sdebug_dev_info { > unsigned int nr_exp_open; > unsigned int nr_closed; > unsigned int max_open; > + unsigned int rc_basis:2; > ktime_t create_ts; /* time since bootup that this device was created */ > struct sdeb_zone_state *zstate; > }; > @@ -1901,6 +1902,8 @@ static int resp_readcap16(struct scsi_cmnd *scp, > > arr[15] = sdebug_lowest_aligned & 0xff; > > + arr[12] |= devip->rc_basis << 4; > + > if (have_dif_prot) { > arr[12] = (sdebug_dif - 1) << 1; /* P_TYPE */ > arr[12] |= 1; /* PROT_EN */ > @@ -5107,10 +5110,12 @@ static int sdebug_device_create_zones(struct sdebug_dev_info *devip) > zsp->z_size = > min_t(u64, devip->zsize, capacity - zstart); > } else if ((zstart & (devip->zsize - 1)) == 0) { > - if (devip->zmodel == BLK_ZONED_HM) > + if (devip->zmodel == BLK_ZONED_HM) { > zsp->z_type = ZBC_ZTYPE_SWR; > - else > + devip->rc_basis = 1; > + } else { > zsp->z_type = ZBC_ZTYPE_SWP; > + } > zsp->z_cond = ZC1_EMPTY; > zsp->z_wp = zsp->z_start; > zsp->z_size = Looks good. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
On 7/12/22 08:00, Bart Van Assche wrote: > From ZBC-1: > * RC BASIS = 0: The RETURNED LOGICAL BLOCK ADDRESS field indicates the > highest LBA of a contiguous range of zones that are not sequential write > required zones starting with the first zone. > * RC BASIS = 1: The RETURNED LOGICAL BLOCK ADDRESS field indicates the LBA > of the last logical block on the logical unit. > > The current scsi_debug READ CAPACITY response does not comply with the above > if there are one or more sequential write required zones. Hence this patch. > > Cc: Douglas Gilbert <dgilbert@interlog.com> > Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> > Fixes: 64e14ece0700 ("scsi: scsi_debug: Implement ZBC host-aware emulation") > Signed-off-by: Bart Van Assche <bvanassche@acm.org> > --- > drivers/scsi/scsi_debug.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c > index 5a8efc328fb5..c5f4af774078 100644 > --- a/drivers/scsi/scsi_debug.c > +++ b/drivers/scsi/scsi_debug.c > @@ -304,6 +304,7 @@ struct sdebug_dev_info { > unsigned int nr_exp_open; > unsigned int nr_closed; > unsigned int max_open; > + unsigned int rc_basis:2; > ktime_t create_ts; /* time since bootup that this device was created */ > struct sdeb_zone_state *zstate; > }; > @@ -1901,6 +1902,8 @@ static int resp_readcap16(struct scsi_cmnd *scp, > > arr[15] = sdebug_lowest_aligned & 0xff; > > + arr[12] |= devip->rc_basis << 4; Note that I think this entire patch could be reduced to just doing this: if (devip->zmodel == BLK_ZONED_HM) { /* Set RC BASIS = 1 */ arr[12] |= 0x01 << 4; } No ? > + > if (have_dif_prot) { > arr[12] = (sdebug_dif - 1) << 1; /* P_TYPE */ > arr[12] |= 1; /* PROT_EN */ > @@ -5107,10 +5110,12 @@ static int sdebug_device_create_zones(struct sdebug_dev_info *devip) > zsp->z_size = > min_t(u64, devip->zsize, capacity - zstart); > } else if ((zstart & (devip->zsize - 1)) == 0) { > - if (devip->zmodel == BLK_ZONED_HM) > + if (devip->zmodel == BLK_ZONED_HM) { > zsp->z_type = ZBC_ZTYPE_SWR; > - else > + devip->rc_basis = 1; > + } else { > zsp->z_type = ZBC_ZTYPE_SWP; > + } > zsp->z_cond = ZC1_EMPTY; > zsp->z_wp = zsp->z_start; > zsp->z_size =
On 7/11/22 16:19, Damien Le Moal wrote: > Note that I think this entire patch could be reduced to just doing this: > > if (devip->zmodel == BLK_ZONED_HM) { > /* Set RC BASIS = 1 */ > arr[12] |= 0x01 << 4; > } > > No ? Hi Damien, The above looks valid to me from the point-of-view of the ZBC-spec. However, reducing patch 2/3 to the above would reduce the number of code paths in the sd_zbc.c code that can be triggered with the scsi_debug driver. Thanks, Bart.
On 7/12/22 09:39, Bart Van Assche wrote: > On 7/11/22 16:11, Damien Le Moal wrote: >> Do you have these cases: >> 1) host-managed disks: >> SWR zones are *mandatory* so there is at least one. Thus read capacity >> will return either 0 if there are no conventional zones (they are >> optional) or the total capacity of the set of contiguous conventional >> zones starting at lba 0. In either case, read capacity does not give you >> the actual total capacity and you have to look at the report zones reply >> max lba field. > > Does the scsi_debug driver allow to create a host-managed disk with one > or more conventional zones and one or more sequential write required zones? As per specs, conventional zones are optional. SWR zones are mandatory. See this code in sdebug_device_create_zones(): conv_capacity = (sector_t)sdeb_zbc_nr_conv << devip->zsize_shift; if (conv_capacity >= capacity) { pr_err("Number of conventional zones too large\n"); return -EINVAL; } devip->nr_conv_zones = sdeb_zbc_nr_conv; devip->nr_seq_zones = ALIGN(capacity - conv_capacity, devip->zsize) >> devip->zsize_shift; devip->nr_zones = devip->nr_conv_zones + devip->nr_seq_zones; > >> Note that anyway, there are no drives out there that use RC BASIS = 0. I >> had to hack a drive FW to implement it to test this code... > > A JEDEC member is telling me that I should use RC BASIS = 0 for > host-managed zoned storage devices that only have sequential write > required zones. That sounds weird to me so I decided to take a look at > how the sd_zbc.c code handles RC BASIS. RC BASIS = 0 makes sense only for host-managed drives that have conventional zones starting at LBA 0 *AND* want to have some sort of backward compatibility with a regular disk by advertizing only the capacity of these conventional zones (all together, the set of conventional zones starting at LBA 0 can be used like a regular disk). That is a fairly narrow use case that is in my opinion not useful at all given that the device type differs from regular disks anyway. For host-managed disks, RC BASIS = 1 makes more sense. All SMR disks I know of on the market use that. > > Bart. >
On 7/12/22 09:33, Bart Van Assche wrote: > On 7/11/22 16:19, Damien Le Moal wrote: >> Note that I think this entire patch could be reduced to just doing this: >> >> if (devip->zmodel == BLK_ZONED_HM) { >> /* Set RC BASIS = 1 */ >> arr[12] |= 0x01 << 4; >> } >> >> No ? > > Hi Damien, > > The above looks valid to me from the point-of-view of the ZBC-spec. > However, reducing patch 2/3 to the above would reduce the number of code > paths in the sd_zbc.c code that can be triggered with the scsi_debug driver. You lost me... I do not understand what you are trying to say here. You patch as is repeatedly sets devid->rc_basis to one for every zone in the zone initialization loop. Not a big problem, but not necessary either. > > Thanks, > > Bart.
On 2022-07-11 20:39, Bart Van Assche wrote: > On 7/11/22 16:11, Damien Le Moal wrote: >> Do you have these cases: >> 1) host-managed disks: >> SWR zones are *mandatory* so there is at least one. Thus read capacity >> will return either 0 if there are no conventional zones (they are >> optional) or the total capacity of the set of contiguous conventional >> zones starting at lba 0. In either case, read capacity does not give you >> the actual total capacity and you have to look at the report zones reply >> max lba field. > > Does the scsi_debug driver allow to create a host-managed disk with one or more > conventional zones and one or more sequential write required zones? Yes. See https://doug-gilbert.github.io/scsi_debug.html Example from that page: # modprobe scsi_debug max_luns=1 sector_size=512 zbc=managed zone_size_mb=8 zone_nr_conv=3 # sg_rep_zones -S /dev/sda Number of conventional type zones: 3 Number of sequential write required type zones: 13 Lowest starting LBA: 0xc000 Number of 'not write pointer' condition zones: 3 Lowest starting LBA: 0x0 Number of empty condition zones: 13 Lowest starting LBA: 0xc000 Number of used blocks in write pointer zones: 0x0 Size of all conventional zones: 25165824 bytes, 24.0 MiB, 0.03 GB All zones are the same size. So if you rmmod that instance and do: # modprobe scsi_debug max_luns=1 sector_size=512 zbc=managed zone_size_mb=8 zone_nr_conv=3 dev_size_mb=40 # sg_rep_zones /dev/sda -S Number of conventional type zones: 3 Number of sequential write required type zones: 2 Lowest starting LBA: 0xc000 Number of 'not write pointer' condition zones: 3 Lowest starting LBA: 0x0 Number of empty condition zones: 2 Lowest starting LBA: 0xc000 Number of used blocks in write pointer zones: 0x0 Size of all conventional zones: 25165824 bytes, 24.0 MiB, 0.03 GB So the 40 MiB was divided up into 3 conventional zones followed by 2 SWR zones, each of 8 MiB. So you have all the control needed to specify 1 (zero ?) or more conventional zones and 1 or more SWR zones. Doug Gilbert >> Note that anyway, there are no drives out there that use RC BASIS = 0. I >> had to hack a drive FW to implement it to test this code... > > A JEDEC member is telling me that I should use RC BASIS = 0 for host-managed > zoned storage devices that only have sequential write required zones. That > sounds weird to me so I decided to take a look at how the sd_zbc.c code handles > RC BASIS. > > Bart. >
On 7/11/22 17:53, Damien Le Moal wrote: > On 7/12/22 09:33, Bart Van Assche wrote: >> The above looks valid to me from the point-of-view of the ZBC-spec. >> However, reducing patch 2/3 to the above would reduce the number of code >> paths in the sd_zbc.c code that can be triggered with the scsi_debug driver. > > You lost me... I do not understand what you are trying to say here. > You patch as is repeatedly sets devid->rc_basis to one for every zone in > the zone initialization loop. Not a big problem, but not necessary either. No worries - I plan to use your implementation because it is simpler. Bart.
On 7/13/22 02:14, Bart Van Assche wrote: > On 7/11/22 16:28, Damien Le Moal wrote: >> On 7/12/22 08:11, Damien Le Moal wrote: >>> On 7/12/22 08:00, Bart Van Assche wrote: >>>> Using the RETURNED LOGICAL BLOCK ADDRESS field + 1 as the capacity (largest >>>> addressable LBA) if RC BASIS = 0 is wrong if there are sequential write >>>> required zones. Hence only use the RC BASIS = 0 capacity if there are no >>>> sequential write required zones. >>> >>> This does not make sense to me: RC BASIS == 0 is defined like this: >>> >>> The RETURNED LOGICAL BLOCK ADDRESS field indicates the highest LBA of a >>> contiguous range of zones that are not sequential write required zones >>> starting with the first zone. >>> >>> Do you have these cases: >>> 1) host-managed disks: >>> SWR zones are *mandatory* so there is at least one. Thus read capacity >>> will return either 0 if there are no conventional zones (they are >>> optional) or the total capacity of the set of contiguous conventional >>> zones starting at lba 0. In either case, read capacity does not give you >>> the actual total capacity and you have to look at the report zones reply >>> max lba field. >>> 2) host-aware disks: >>> There are no SWR zones, there cannot be any. You can only have >>> conventional zones (optionally) and sequential write preferred zones. So >>> "the highest LBA of a contiguous range of zones that are not sequential >>> write required zones starting with the first zone" necessarily is always >>> the total capacity. RC BASIS = 0 is non-sensical for host-aware drives. >> >> What I meant to say here for the host-aware case is that RC BASIS is >> irrelevant. Both RC BASIS = 0 and = 1 end up with read capacity giving the >> actual total capacity. ANd for good reasons: the host-aware model is >> designed to be backward compatible with regular disks, which do not have >> RC BASIS, so RC BASIS = 0, always. > Hi Damien, > > I agree that for host-aware block devices (conventional + sequential write > preferred zones) RC BASIS is irrelevant. > > What I'm concerned about is host-managed block devices (conventional + sequential > write required zones) that report an incorrect RC BASIS value (0 instead of 1). > Shouldn't sd_zbc_check_capacity() skip the capacity reported by host-managed block > devices that report RC BASIS = 0? If that is so much a concern, we could do something like this: diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c index 6acc4f406eb8..32da54e7b68a 100644 --- a/drivers/scsi/sd_zbc.c +++ b/drivers/scsi/sd_zbc.c @@ -716,17 +716,15 @@ static int sd_zbc_check_capacity(struct scsi_disk *sdkp, unsigned char *buf, if (ret) return ret; - if (sdkp->rc_basis == 0) { - /* The max_lba field is the capacity of this device */ - max_lba = get_unaligned_be64(&buf[8]); - if (sdkp->capacity != max_lba + 1) { - if (sdkp->first_scan) - sd_printk(KERN_WARNING, sdkp, - "Changing capacity from %llu to max LBA+1 %llu\n", - (unsigned long long)sdkp->capacity, - (unsigned long long)max_lba + 1); - sdkp->capacity = max_lba + 1; - } + /* The max_lba field is the capacity of this device */ + max_lba = get_unaligned_be64(&buf[8]); + if (sdkp->capacity != max_lba + 1) { + if (sdkp->first_scan) + sd_printk(KERN_WARNING, sdkp, + "Changing capacity from %llu to max LBA+1 %llu\n", + (unsigned long long)sdkp->capacity, + (unsigned long long)max_lba + 1); + sdkp->capacity = max_lba + 1; } if (sdkp->zone_starting_lba_gran == 0) { That is, always check the reported capacity against max_lba of report zones reply, regardless of rc_basis (and we can even then drop the rc_basis field from struct scsi_disk). But I would argue that any problem with the current code would mean a buggy device firmware. For such case, the device FW should be fixed or we should add a quirk for it. > > Thanks, > > Bart.
On 7/12/22 15:08, Damien Le Moal wrote: > diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c > index 6acc4f406eb8..32da54e7b68a 100644 > --- a/drivers/scsi/sd_zbc.c > +++ b/drivers/scsi/sd_zbc.c > @@ -716,17 +716,15 @@ static int sd_zbc_check_capacity(struct scsi_disk > *sdkp, unsigned char *buf, > if (ret) > return ret; > > - if (sdkp->rc_basis == 0) { > - /* The max_lba field is the capacity of this device */ > - max_lba = get_unaligned_be64(&buf[8]); > - if (sdkp->capacity != max_lba + 1) { > - if (sdkp->first_scan) > - sd_printk(KERN_WARNING, sdkp, > - "Changing capacity from %llu to > max LBA+1 %llu\n", > - (unsigned long long)sdkp->capacity, > - (unsigned long long)max_lba + 1); > - sdkp->capacity = max_lba + 1; > - } > + /* The max_lba field is the capacity of this device */ > + max_lba = get_unaligned_be64(&buf[8]); > + if (sdkp->capacity != max_lba + 1) { > + if (sdkp->first_scan) > + sd_printk(KERN_WARNING, sdkp, > + "Changing capacity from %llu to max LBA+1 %llu\n", > + (unsigned long long)sdkp->capacity, > + (unsigned long long)max_lba + 1); > + sdkp->capacity = max_lba + 1; > } > > if (sdkp->zone_starting_lba_gran == 0) { > > That is, always check the reported capacity against max_lba of report > zones reply, regardless of rc_basis (and we can even then drop the > rc_basis field from struct scsi_disk). I like the above patch because it simplifies the existing code. > But I would argue that any problem with the current code would mean a > buggy device firmware. For such case, the device FW should be fixed or we > should add a quirk for it. My question was which approach should be followed for devices with a buggy firmware? Use the zones up to the LBA reported in the READ CAPACITY response or reject these devices entirely? Thanks, Bart.
On 7/18/22 03:38, Damien Le Moal wrote: > Even the above code (removing the "sdkp->rc_basis == 0" test) is > borderline in my opinion. The code with the test is per specs, so correct. Hi Damien, I do not agree that the current code is compliant with the ZBC specification. My interpretation of the ZBC specification is that the RC BASIS field influences the meaning of the RETURNED LOGICAL BLOCK ADDRESS field in the READ CAPACITY response only. The max_lba variable in sd_zbc_check_capacity() represents the MAXIMUM LBA field from the REPORT ZONES response. All I found in ZBC-2 about that field is the following: "The MAXIMUM LBA field contains the LBA of the last logical block on the logical unit." I haven't found any reference to RC BASIS in the description of the REPORT ZONES command - neither in ZBC-1 nor in ZBC-2. Thanks, Bart.
On 7/18/22 06:59, Bart Van Assche wrote: > On 7/18/22 03:38, Damien Le Moal wrote: >> Even the above code (removing the "sdkp->rc_basis == 0" test) is >> borderline in my opinion. The code with the test is per specs, so >> correct. > > I do not agree that the current code is compliant with the ZBC > specification. My interpretation of the ZBC specification is that the RC > BASIS field influences the meaning of the RETURNED LOGICAL BLOCK ADDRESS > field in the READ CAPACITY response only. The max_lba variable in > sd_zbc_check_capacity() represents the MAXIMUM LBA field from the REPORT > ZONES response. All I found in ZBC-2 about that field is the following: > "The MAXIMUM LBA field contains the LBA of the last logical block on the > logical unit." I haven't found any reference to RC BASIS in the > description of the REPORT ZONES command - neither in ZBC-1 nor in ZBC-2. Hi Damien, I want to take the above back. After having taken a closer look at the READ CAPACITY implementation I think the current implementation of sd_zbc_check_capacity() is fine. Thanks, Bart.