mbox series

[0/4] Initial support for multi-actuator HDDs

Message ID 20210721104205.885115-1-damien.lemoal@wdc.com
Headers show
Series Initial support for multi-actuator HDDs | expand

Message

Damien Le Moal July 21, 2021, 10:42 a.m. UTC
Single LUN multi-actuator hard-disks are cappable to seek and execute
multiple commands in parallel. This capability is exposed to the host
using the Concurrent Positioning Ranges VPD page (SCSI) and Log (ATA).
Each positioning range describes the contiguous set of LBAs that an
actuator serves.

This series adds support the scsi disk driver to retreive this
information and advertize it to user space through sysfs. libata is also
modified to handle ATA drives.

The first patch adds the block layer plumbing to expose concurrent
sector ranges of the device through sysfs as a sub-directory of the
device sysfs queue directory. Patch 2 and 3 add support to sd and
libata. Finally patch 4 documents the sysfs queue attributed changes.

This series does not attempt in any way to optimize accesses to
multi-actuator devices (e.g. block IO scheduler or filesystems). This
initial support only exposes the actuators information to user space
through sysfs.

Damien Le Moal (4):
  block: Add concurrent positioning ranges support
  scsi: sd: add concurrent positioning ranges support
  libata: support concurrent positioning ranges log
  doc: document sysfs queue/cranges attributes

 Documentation/block/queue-sysfs.rst |  27 ++-
 block/Makefile                      |   2 +-
 block/blk-cranges.c                 | 286 ++++++++++++++++++++++++++++
 block/blk-sysfs.c                   |  13 ++
 block/blk.h                         |   3 +
 drivers/ata/libata-core.c           |  57 ++++++
 drivers/ata/libata-scsi.c           |  47 ++++-
 drivers/scsi/sd.c                   |  80 ++++++++
 drivers/scsi/sd.h                   |   1 +
 include/linux/ata.h                 |   1 +
 include/linux/blkdev.h              |  29 +++
 include/linux/libata.h              |  11 ++
 12 files changed, 546 insertions(+), 11 deletions(-)
 create mode 100644 block/blk-cranges.c

Comments

Hannes Reinecke July 22, 2021, 3:58 p.m. UTC | #1
On 7/21/21 12:42 PM, Damien Le Moal wrote:
> Single LUN multi-actuator hard-disks are cappable to seek and execute

> multiple commands in parallel. This capability is exposed to the host

> using the Concurrent Positioning Ranges VPD page (SCSI) and Log (ATA).

> Each positioning range describes the contiguous set of LBAs that an

> actuator serves.

> 

> This series adds support the scsi disk driver to retreive this

> information and advertize it to user space through sysfs. libata is also

> modified to handle ATA drives.

> 

> The first patch adds the block layer plumbing to expose concurrent

> sector ranges of the device through sysfs as a sub-directory of the

> device sysfs queue directory. Patch 2 and 3 add support to sd and

> libata. Finally patch 4 documents the sysfs queue attributed changes.

> 

> This series does not attempt in any way to optimize accesses to

> multi-actuator devices (e.g. block IO scheduler or filesystems). This

> initial support only exposes the actuators information to user space

> through sysfs.

> 

> Damien Le Moal (4):

>    block: Add concurrent positioning ranges support

>    scsi: sd: add concurrent positioning ranges support

>    libata: support concurrent positioning ranges log

>    doc: document sysfs queue/cranges attributes

> 

>   Documentation/block/queue-sysfs.rst |  27 ++-

>   block/Makefile                      |   2 +-

>   block/blk-cranges.c                 | 286 ++++++++++++++++++++++++++++

>   block/blk-sysfs.c                   |  13 ++

>   block/blk.h                         |   3 +

>   drivers/ata/libata-core.c           |  57 ++++++

>   drivers/ata/libata-scsi.c           |  47 ++++-

>   drivers/scsi/sd.c                   |  80 ++++++++

>   drivers/scsi/sd.h                   |   1 +

>   include/linux/ata.h                 |   1 +

>   include/linux/blkdev.h              |  29 +++

>   include/linux/libata.h              |  11 ++

>   12 files changed, 546 insertions(+), 11 deletions(-)

>   create mode 100644 block/blk-cranges.c

> 

Oh well, you beat me to it.
Actually I have patches here locally, which just had been waiting for 
the missing SBC definitions, which apparently are present now.

Too bad; should have been more aggressive posting them :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke July 22, 2021, 4:14 p.m. UTC | #2
On 7/21/21 12:42 PM, Damien Le Moal wrote:
> Add the sd_read_cpr() function to the sd scsi disk driver to discover

> if a device has multiple concurrent positioning ranges (i.e. multiple

> actuators on an HDD). This new function is called from

> sd_revalidate_disk() and uses the block layer functions

> blk_alloc_cranges() and blk_queue_set_cranges() to set a device

> cranges according to the information retrieved from log page B9h,

> if the device supports it.

> 

> The format of the Concurrent Positioning Ranges VPD page B9h is defined

> in section 6.6.6 of SBC-5.

> 

> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>

> ---

>   drivers/scsi/sd.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++

>   drivers/scsi/sd.h |  1 +

>   2 files changed, 81 insertions(+)

> 

> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c

> index b8d55af763f9..b1e767a01b9f 100644

> --- a/drivers/scsi/sd.c

> +++ b/drivers/scsi/sd.c

> @@ -3125,6 +3125,85 @@ static void sd_read_security(struct scsi_disk *sdkp, unsigned char *buffer)

>   		sdkp->security = 1;

>   }

>   

> +static inline sector_t sd64_to_sectors(struct scsi_disk *sdkp, u8 *buf)

> +{

> +	return logical_to_sectors(sdkp->device, get_unaligned_be64(buf));

> +}

> +

> +/**

> + * sd_read_cpr - Query concurrent positioning ranges

> + * @sdkp:	disk to query

> + */

> +static void sd_read_cpr(struct scsi_disk *sdkp)

> +{

> +	unsigned char *buffer, *desc;

> +	struct blk_cranges *cr = NULL;

> +	unsigned int nr_cpr = 0;

> +	int i, vpd_len, buf_len = SD_BUF_SIZE;

> +

> +	/*

> +	 * We need to have the capacity set first for the block layer to be

> +	 * able to check the ranges.

> +	 */

> +	if (sdkp->first_scan)

> +		return;

> +

> +	if (!sdkp->capacity)

> +		goto out;

> +

> +	/*

> +	 * Concurrent Positioning Ranges VPD: there can be at most 256 ranges,

> +	 * leading to a maximum page size of 64 + 256*32 bytes.

> +	 */

> +	buf_len = 64 + 256*32;

> +	buffer = kmalloc(buf_len, GFP_KERNEL);

> +	if (!buffer || scsi_get_vpd_page(sdkp->device, 0xb9, buffer, buf_len))

> +		goto out;

> +

> +	/* We must have at least a 64B header and one 32B range descriptor */

> +	vpd_len = get_unaligned_be16(&buffer[2]) + 3;

> +	if (vpd_len > buf_len || vpd_len < 64 + 32 || (vpd_len & 31)) {

> +		sd_printk(KERN_ERR, sdkp,

> +			  "Invalid Concurrent Positioning Ranges VPD page\n");

> +		goto out;

> +	}

> +

> +	nr_cpr = (vpd_len - 64) / 32;

> +	if (nr_cpr == 1) {

> +		nr_cpr = 0;

> +		goto out;

> +	}

> +

> +	cr = blk_alloc_cranges(sdkp->disk, nr_cpr);

> +	if (!cr) {

> +		nr_cpr = 0;

> +		goto out;

> +	}

> +

> +	desc = &buffer[64];

> +	for (i = 0; i < nr_cpr; i++, desc += 32) {

> +		if (desc[0] != i) {

> +			sd_printk(KERN_ERR, sdkp,

> +				"Invalid Concurrent Positioning Range number\n");

> +			nr_cpr = 0;

> +			break;

> +		}

> +

> +		cr->ranges[i].sector = sd64_to_sectors(sdkp, desc + 8);

> +		cr->ranges[i].nr_sectors = sd64_to_sectors(sdkp, desc + 16);

> +	}

> +

> +out:

> +	blk_queue_set_cranges(sdkp->disk, cr);


See? We are _are_ creating a new set of ranges.
So why bother updating the old ones?

> +	if (nr_cpr && sdkp->nr_actuators != nr_cpr) {

> +		sd_printk(KERN_NOTICE, sdkp,

> +			  "%u concurrent positioning ranges\n", nr_cpr);

> +		sdkp->nr_actuators = nr_cpr;

> +	}

> +

> +	kfree(buffer);

> +}

> +

>   /*

>    * Determine the device's preferred I/O size for reads and writes

>    * unless the reported value is unreasonably small, large, not a

> @@ -3240,6 +3319,7 @@ static int sd_revalidate_disk(struct gendisk *disk)

>   		sd_read_app_tag_own(sdkp, buffer);

>   		sd_read_write_same(sdkp, buffer);

>   		sd_read_security(sdkp, buffer);

> +		sd_read_cpr(sdkp);

>   	}

>   

>   	/*

> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h

> index b59136c4125b..2e5932bde43d 100644

> --- a/drivers/scsi/sd.h

> +++ b/drivers/scsi/sd.h

> @@ -106,6 +106,7 @@ struct scsi_disk {

>   	u8		protection_type;/* Data Integrity Field */

>   	u8		provisioning_mode;

>   	u8		zeroing_mode;

> +	u8		nr_actuators;		/* Number of actuators */

>   	unsigned	ATO : 1;	/* state of disk ATO bit */

>   	unsigned	cache_override : 1; /* temp override of WCE,RCD */

>   	unsigned	WCE : 1;	/* state of disk WCE bit */

> 

Otherwise:

Reviewed-by: Hannes Reinecke <hare@suse.de>


Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Damien Le Moal July 22, 2021, 11:26 p.m. UTC | #3
On 2021/07/23 1:14, Hannes Reinecke wrote:
> On 7/21/21 12:42 PM, Damien Le Moal wrote:
>> Add the sd_read_cpr() function to the sd scsi disk driver to discover
>> if a device has multiple concurrent positioning ranges (i.e. multiple
>> actuators on an HDD). This new function is called from
>> sd_revalidate_disk() and uses the block layer functions
>> blk_alloc_cranges() and blk_queue_set_cranges() to set a device
>> cranges according to the information retrieved from log page B9h,
>> if the device supports it.
>>
>> The format of the Concurrent Positioning Ranges VPD page B9h is defined
>> in section 6.6.6 of SBC-5.
>>
>> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
>> ---
>>   drivers/scsi/sd.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++
>>   drivers/scsi/sd.h |  1 +
>>   2 files changed, 81 insertions(+)
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index b8d55af763f9..b1e767a01b9f 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -3125,6 +3125,85 @@ static void sd_read_security(struct scsi_disk *sdkp, unsigned char *buffer)
>>   		sdkp->security = 1;
>>   }
>>   
>> +static inline sector_t sd64_to_sectors(struct scsi_disk *sdkp, u8 *buf)
>> +{
>> +	return logical_to_sectors(sdkp->device, get_unaligned_be64(buf));
>> +}
>> +
>> +/**
>> + * sd_read_cpr - Query concurrent positioning ranges
>> + * @sdkp:	disk to query
>> + */
>> +static void sd_read_cpr(struct scsi_disk *sdkp)
>> +{
>> +	unsigned char *buffer, *desc;
>> +	struct blk_cranges *cr = NULL;
>> +	unsigned int nr_cpr = 0;
>> +	int i, vpd_len, buf_len = SD_BUF_SIZE;
>> +
>> +	/*
>> +	 * We need to have the capacity set first for the block layer to be
>> +	 * able to check the ranges.
>> +	 */
>> +	if (sdkp->first_scan)
>> +		return;
>> +
>> +	if (!sdkp->capacity)
>> +		goto out;
>> +
>> +	/*
>> +	 * Concurrent Positioning Ranges VPD: there can be at most 256 ranges,
>> +	 * leading to a maximum page size of 64 + 256*32 bytes.
>> +	 */
>> +	buf_len = 64 + 256*32;
>> +	buffer = kmalloc(buf_len, GFP_KERNEL);
>> +	if (!buffer || scsi_get_vpd_page(sdkp->device, 0xb9, buffer, buf_len))
>> +		goto out;
>> +
>> +	/* We must have at least a 64B header and one 32B range descriptor */
>> +	vpd_len = get_unaligned_be16(&buffer[2]) + 3;
>> +	if (vpd_len > buf_len || vpd_len < 64 + 32 || (vpd_len & 31)) {
>> +		sd_printk(KERN_ERR, sdkp,
>> +			  "Invalid Concurrent Positioning Ranges VPD page\n");
>> +		goto out;
>> +	}
>> +
>> +	nr_cpr = (vpd_len - 64) / 32;
>> +	if (nr_cpr == 1) {
>> +		nr_cpr = 0;
>> +		goto out;
>> +	}
>> +
>> +	cr = blk_alloc_cranges(sdkp->disk, nr_cpr);
>> +	if (!cr) {
>> +		nr_cpr = 0;
>> +		goto out;
>> +	}
>> +
>> +	desc = &buffer[64];
>> +	for (i = 0; i < nr_cpr; i++, desc += 32) {
>> +		if (desc[0] != i) {
>> +			sd_printk(KERN_ERR, sdkp,
>> +				"Invalid Concurrent Positioning Range number\n");
>> +			nr_cpr = 0;
>> +			break;
>> +		}
>> +
>> +		cr->ranges[i].sector = sd64_to_sectors(sdkp, desc + 8);
>> +		cr->ranges[i].nr_sectors = sd64_to_sectors(sdkp, desc + 16);
>> +	}
>> +
>> +out:
>> +	blk_queue_set_cranges(sdkp->disk, cr);
> 
> See? We are _are_ creating a new set of ranges.
> So why bother updating the old ones?

See my reply to my comments on patch 1. We do not update the old ones. The cases
are:
1) This is the first call, q->cranges is NULL: cr will become q->cranges.
2) q->cranges is already set and cr has the same info as q->cranges (no change):
cr will be freed by blk_queue_set_cranges().
3) q->cranges is already set and cr is different from q-cranges: q->cranges will
be completely dropped and cr become the new q->cranges.

So we never directly update anything inside q->cranges. We just swap in a new
full structure in case of change.

>> +	if (nr_cpr && sdkp->nr_actuators != nr_cpr) {
>> +		sd_printk(KERN_NOTICE, sdkp,
>> +			  "%u concurrent positioning ranges\n", nr_cpr);
>> +		sdkp->nr_actuators = nr_cpr;
>> +	}
>> +
>> +	kfree(buffer);
>> +}
>> +
>>   /*
>>    * Determine the device's preferred I/O size for reads and writes
>>    * unless the reported value is unreasonably small, large, not a
>> @@ -3240,6 +3319,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
>>   		sd_read_app_tag_own(sdkp, buffer);
>>   		sd_read_write_same(sdkp, buffer);
>>   		sd_read_security(sdkp, buffer);
>> +		sd_read_cpr(sdkp);
>>   	}
>>   
>>   	/*
>> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
>> index b59136c4125b..2e5932bde43d 100644
>> --- a/drivers/scsi/sd.h
>> +++ b/drivers/scsi/sd.h
>> @@ -106,6 +106,7 @@ struct scsi_disk {
>>   	u8		protection_type;/* Data Integrity Field */
>>   	u8		provisioning_mode;
>>   	u8		zeroing_mode;
>> +	u8		nr_actuators;		/* Number of actuators */
>>   	unsigned	ATO : 1;	/* state of disk ATO bit */
>>   	unsigned	cache_override : 1; /* temp override of WCE,RCD */
>>   	unsigned	WCE : 1;	/* state of disk WCE bit */
>>
> Otherwise:
> 
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> 
> Cheers,
> 
> Hannes
>