mbox series

[v14,00/19] Improve write performance for zoned UFS devices​

Message ID 20231023215638.3405959-1-bvanassche@acm.org
Headers show
Series Improve write performance for zoned UFS devices​ | expand

Message

Bart Van Assche Oct. 23, 2023, 9:53 p.m. UTC
Hi Jens,

This patch series improves small write IOPS by a factor of four (+300%) for
zoned UFS devices on my test setup with an UFSHCI 3.0 controller. Please
consider the block layer patches of this series for the next merge window.

Thank you,

Bart.

Changes compared to v13:
 - Reworked patch "block: Preserve the order of requeued zoned writes".
 - Addressed a performance concern by removing the eh_needs_prepare_resubmit
   SCSI driver callback and by introducing the SCSI host template flag
   .needs_prepare_resubmit instead.
 - Added a patch that adds a 'host' argument to scsi_eh_flush_done_q().
 - Made the code in unit tests less repetitive.

Changes compared to v12:
 - Added two new patches: "block: Preserve the order of requeued zoned writes"
   and "scsi: sd: Add a unit test for sd_cmp_sector()"
 - Restricted the number of zoned write retries. To my surprise I had to add
   "&& scmd->retries <= scmd->allowed" in the SCSI error handler to limit the
   number of retries.
 - In patch "scsi: ufs: Inform the block layer about write ordering", only set
   ELEVATOR_F_ZBD_SEQ_WRITE for zoned block devices.

Changes compared to v11:
 - Fixed a NULL pointer dereference that happened when booting from an ATA
   device by adding an scmd->device != NULL check in scsi_needs_preparation().
 - Updated Reviewed-by tags.

Changes compared to v10:
 - Dropped the UFS MediaTek and HiSilicon patches because these are not correct
   and because it is safe to drop these patches.
 - Updated Acked-by / Reviewed-by tags.

Changes compared to v9:
 - Introduced an additional scsi_driver callback: .eh_needs_prepare_resubmit().
 - Renamed the scsi_debug kernel module parameter 'no_zone_write_lock' into
   'preserves_write_order'.
 - Fixed an out-of-bounds access in the unit scsi_call_prepare_resubmit() unit
   test.
 - Wrapped ufshcd_auto_hibern8_update() calls in UFS host drivers with
   WARN_ON_ONCE() such that a kernel stack appears in case an error code is
   returned.
 - Elaborated a comment in the UFSHCI driver.

Changes compared to v8:
 - Fixed handling of 'driver_preserves_write_order' and 'use_zone_write_lock'
   in blk_stack_limits().
 - Added a comment in disk_set_zoned().
 - Modified blk_req_needs_zone_write_lock() such that it returns false if
   q->limits.use_zone_write_lock is false.
 - Modified disk_clear_zone_settings() such that it clears
   q->limits.use_zone_write_lock.
 - Left out one change from the mq-deadline patch that became superfluous due to
   the blk_req_needs_zone_write_lock() change.
 - Modified scsi_call_prepare_resubmit() such that it only calls list_sort() if
   zoned writes have to be resubmitted for which zone write locking is disabled.
 - Added an additional unit test for scsi_call_prepare_resubmit().
 - Modified the sorting code in the sd driver such that only those SCSI commands
   are sorted for which write locking is disabled.
 - Modified sd_zbc.c such that ELEVATOR_F_ZBD_SEQ_WRITE is only set if the
   write order is not preserved.
 - Included three patches for UFS host drivers that rework code that wrote
   directly to the auto-hibernation controller register.
 - Modified the UFS driver such that enabling auto-hibernation is not allowed
   if a zoned logical unit is present and if the controller operates in legacy
   mode.
 - Also in the UFS driver, simplified ufshcd_auto_hibern8_update().

Changes compared to v7:
 - Split the queue_limits member variable `use_zone_write_lock' into two member
   variables: `use_zone_write_lock' (set by disk_set_zoned()) and
   `driver_preserves_write_order' (set by the block driver or SCSI LLD). This
   should clear up the confusion about the purpose of this variable.
 - Moved the code for sorting SCSI commands by LBA from the SCSI error handler
   into the SCSI disk (sd) driver as requested by Christoph.
   
Changes compared to v6:
 - Removed QUEUE_FLAG_NO_ZONE_WRITE_LOCK and instead introduced a flag in
   the request queue limits data structure.

Changes compared to v5:
 - Renamed scsi_cmp_lba() into scsi_cmp_sector().
 - Improved several source code comments.

Changes compared to v4:
 - Dropped the patch that introduces the REQ_NO_ZONE_WRITE_LOCK flag.
 - Dropped the null_blk patch and added two scsi_debug patches instead.
 - Dropped the f2fs patch.
 - Split the patch for the UFS driver into two patches.
 - Modified several patch descriptions and source code comments.
 - Renamed dd_use_write_locking() into dd_use_zone_write_locking().
 - Moved the list_sort() call from scsi_unjam_host() into scsi_eh_flush_done_q()
   such that sorting happens just before reinserting.
 - Removed the scsi_cmd_retry_allowed() call from scsi_check_sense() to make
   sure that the retry counter is adjusted once per retry instead of twice.

Changes compared to v3:
 - Restored the patch that introduces QUEUE_FLAG_NO_ZONE_WRITE_LOCK. That patch
   had accidentally been left out from v2.
 - In patch "block: Introduce the flag REQ_NO_ZONE_WRITE_LOCK", improved the
   patch description and added the function blk_no_zone_write_lock().
 - In patch "block/mq-deadline: Only use zone locking if necessary", moved the
   blk_queue_is_zoned() call into dd_use_write_locking().
 - In patch "fs/f2fs: Disable zone write locking", set REQ_NO_ZONE_WRITE_LOCK
   from inside __bio_alloc() instead of in f2fs_submit_write_bio().

Changes compared to v2:
 - Renamed the request queue flag for disabling zone write locking.
 - Introduced a new request flag for disabling zone write locking.
 - Modified the mq-deadline scheduler such that zone write locking is only
   disabled if both flags are set.
 - Added an F2FS patch that sets the request flag for disabling zone write
   locking.
 - Only disable zone write locking in the UFS driver if auto-hibernation is
   disabled.

Changes compared to v1:
 - Left out the patches that are already upstream.
 - Switched the approach in patch "scsi: Retry unaligned zoned writes" from
   retrying immediately to sending unaligned write commands to the SCSI error
   handler.

Bart Van Assche (19):
  block: Introduce more member variables related to zone write locking
  block: Only use write locking if necessary
  block: Preserve the order of requeued zoned writes
  block/mq-deadline: Only use zone locking if necessary
  scsi: Add an argument to scsi_eh_flush_done_q()
  scsi: core: Introduce a mechanism for reordering requests in the error
    handler
  scsi: core: Add unit tests for scsi_call_prepare_resubmit()
  scsi: sd: Sort commands by LBA before resubmitting
  scsi: sd: Add a unit test for sd_cmp_sector()
  scsi: core: Retry unaligned zoned writes
  scsi: sd_zbc: Only require an I/O scheduler if needed
  scsi: scsi_debug: Add the preserves_write_order module parameter
  scsi: scsi_debug: Support injecting unaligned write errors
  scsi: ufs: hisi: Rework the code that disables auto-hibernation
  scsi: ufs: Rename ufshcd_auto_hibern8_enable() and make it static
  scsi: ufs: Change the return type of ufshcd_auto_hibern8_update()
  scsi: ufs: Simplify ufshcd_auto_hibern8_update()
  scsi: ufs: Forbid auto-hibernation without I/O scheduler
  scsi: ufs: Inform the block layer about write ordering

 block/blk-mq.c                      |   6 +-
 block/blk-settings.c                |  15 ++
 block/blk-zoned.c                   |  10 +-
 block/mq-deadline.c                 |  11 +-
 drivers/ata/libata-eh.c             |   2 +-
 drivers/scsi/Kconfig                |   2 +
 drivers/scsi/Kconfig.kunit          |   9 ++
 drivers/scsi/Makefile               |   2 +
 drivers/scsi/Makefile.kunit         |   2 +
 drivers/scsi/libsas/sas_scsi_host.c |   2 +-
 drivers/scsi/scsi_debug.c           |  21 ++-
 drivers/scsi/scsi_error.c           |  75 ++++++++-
 drivers/scsi/scsi_error_test.c      | 227 ++++++++++++++++++++++++++++
 drivers/scsi/scsi_lib.c             |   1 +
 drivers/scsi/scsi_priv.h            |   2 +
 drivers/scsi/sd.c                   |  43 ++++++
 drivers/scsi/sd.h                   |   2 +
 drivers/scsi/sd_test.c              |  86 +++++++++++
 drivers/scsi/sd_zbc.c               |   4 +-
 drivers/ufs/core/ufs-sysfs.c        |   2 +-
 drivers/ufs/core/ufshcd-priv.h      |   1 -
 drivers/ufs/core/ufshcd.c           | 119 ++++++++++++---
 drivers/ufs/host/ufs-hisi.c         |   5 +-
 include/linux/blkdev.h              |  10 ++
 include/scsi/scsi.h                 |   1 +
 include/scsi/scsi_driver.h          |   1 +
 include/scsi/scsi_eh.h              |   3 +-
 include/scsi/scsi_host.h            |   6 +
 include/ufs/ufshcd.h                |   3 +-
 29 files changed, 627 insertions(+), 46 deletions(-)
 create mode 100644 drivers/scsi/Kconfig.kunit
 create mode 100644 drivers/scsi/Makefile.kunit
 create mode 100644 drivers/scsi/scsi_error_test.c
 create mode 100644 drivers/scsi/sd_test.c

Comments

Bart Van Assche Oct. 23, 2023, 11:43 p.m. UTC | #1
On 10/23/23 14:53, Bart Van Assche wrote:
> This patch series improves small write IOPS by a factor of four (+300%) for
> zoned UFS devices on my test setup with an UFSHCI 3.0 controller. Please
> consider the block layer patches of this series for the next merge window.

(replying to my own email)

Hi Jens,

Are you OK with queuing the first four patches of this series for the
upcoming merge window (one week from now)? If so, I can send the
remaining patches to Martin after the merge window has closed.

Thanks,

Bart.
Damien Le Moal Oct. 24, 2023, 12:13 a.m. UTC | #2
On 10/24/23 06:54, Bart Van Assche wrote:
> If zoned writes (REQ_OP_WRITE) for a sequential write required zone have
> a starting LBA that differs from the write pointer, e.g. because zoned
> writes have been reordered, then the storage device will respond with an
> UNALIGNED WRITE COMMAND error. Send commands that failed with an
> unaligned write error to the SCSI error handler if zone write locking is
> disabled. The SCSI error handler will sort SCSI commands per LBA before
> resubmitting these.
> 
> If zone write locking is disabled, increase the number of retries for
> write commands sent to a sequential zone to the maximum number of
> outstanding commands because in the worst case the number of times
> reordered zoned writes have to be retried is (number of outstanding
> writes per sequential zone) - 1.
> 
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/scsi_error.c | 16 ++++++++++++++++
>  drivers/scsi/scsi_lib.c   |  1 +
>  drivers/scsi/sd.c         |  6 ++++++
>  include/scsi/scsi.h       |  1 +
>  4 files changed, 24 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 8b1eb637ffa8..9a54856fa03b 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -699,6 +699,22 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>  		fallthrough;
>  
>  	case ILLEGAL_REQUEST:
> +		/*
> +		 * Unaligned write command. This may indicate that zoned writes
> +		 * have been received by the device in the wrong order. If zone
> +		 * write locking is disabled, retry after all pending commands
> +		 * have completed.
> +		 */
> +		if (sshdr.asc == 0x21 && sshdr.ascq == 0x04 &&
> +		    !req->q->limits.use_zone_write_lock &&
> +		    blk_rq_is_seq_zoned_write(req) &&
> +		    scmd->retries <= scmd->allowed) {
> +			sdev_printk(KERN_INFO, scmd->device,
> +				    "Retrying unaligned write at LBA %#llx.\n",
> +				    scsi_get_lba(scmd));

KERN_INFO ? Did you perhaps mean KERN_DEBUG ? An info message for this will be
way too noisy.

> +			return NEEDS_DELAYED_RETRY;
> +		}
> +
>  		if (sshdr.asc == 0x20 || /* Invalid command operation code */
>  		    sshdr.asc == 0x21 || /* Logical block address out of range */
>  		    sshdr.asc == 0x22 || /* Invalid function */
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index c2f647a7c1b0..33a34693c8a2 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1443,6 +1443,7 @@ static void scsi_complete(struct request *rq)
>  	case ADD_TO_MLQUEUE:
>  		scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
>  		break;
> +	case NEEDS_DELAYED_RETRY:
>  	default:
>  		scsi_eh_scmd_add(cmd);
>  		break;
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 82abc721b543..4e6b77f5854f 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1197,6 +1197,12 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
>  	cmd->transfersize = sdp->sector_size;
>  	cmd->underflow = nr_blocks << 9;
>  	cmd->allowed = sdkp->max_retries;
> +	/*
> +	 * Increase the number of allowed retries for zoned writes if zone
> +	 * write locking is disabled.
> +	 */
> +	if (!rq->q->limits.use_zone_write_lock && blk_rq_is_seq_zoned_write(rq))
> +		cmd->allowed += rq->q->nr_requests;
>  	cmd->sdb.length = nr_blocks * sdp->sector_size;
>  
>  	SCSI_LOG_HLQUEUE(1,
> diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
> index ec093594ba53..6600db046227 100644
> --- a/include/scsi/scsi.h
> +++ b/include/scsi/scsi.h
> @@ -93,6 +93,7 @@ static inline int scsi_status_is_check_condition(int status)
>   * Internal return values.
>   */
>  enum scsi_disposition {
> +	NEEDS_DELAYED_RETRY	= 0x2000,
>  	NEEDS_RETRY		= 0x2001,
>  	SUCCESS			= 0x2002,
>  	FAILED			= 0x2003,
Bart Van Assche Oct. 24, 2023, 5:17 p.m. UTC | #3
On 10/24/23 02:26, John Garry wrote:
> On 23/10/2023 22:53, Bart Van Assche wrote:
> "Add an argument to scsi_eh_flush_done_q()" - that's too vague. I'd have 
> "Pass shost pointer to scsi_eh_flush_done_q()".

Thanks for having taken a look. I will change the subject.

> And I have to admit that it is painful to help review when only cc'ed 
> explicitly on a single patch.

The linux-scsi mailing list has been Cc-ed. I assume that you are
subscribed to that mailing list.

BTW, I'm using "scripts/get_maintainer.pl" to build a Cc-list. 
Unfortunately git send-email does not make it easy to combine the
cc-lists of individual patches into one cc-list for all patches :-(

Thanks,

Bart.
John Garry Oct. 24, 2023, 6:20 p.m. UTC | #4
On 24/10/2023 18:17, Bart Van Assche wrote:
> 
> The linux-scsi mailing list has been Cc-ed. I assume that you are
> subscribed to that mailing list.

I am subscribed. However I have a rule set up which filters anything 
linux-scsi to a dedicated subfolder unless I am explicitly included in 
the mail "to" or "cc" list. I guess that many contributors do something 
similar. So this means that for this series patches end up in different 
folders and it becomes a bit harder to track.

> 
> BTW, I'm using "scripts/get_maintainer.pl" to build a Cc-list. 
> Unfortunately git send-email does not make it easy to combine the
> cc-lists of individual patches into one cc-list for all patches 🙁

I just normally run scripts/get_maintainer.pl on *.patch and use that 
result as the git send-email address list and never bother with "Cc:" in 
any individual patch.

Thanks,
John
Bart Van Assche Oct. 25, 2023, 7:28 p.m. UTC | #5
On 10/25/23 00:25, Damien Le Moal wrote:
> On 10/25/23 02:22, Bart Van Assche wrote:
>> On 10/23/23 17:13, Damien Le Moal wrote:
>>> On 10/24/23 06:54, Bart Van Assche wrote:
>>>>    	case ILLEGAL_REQUEST:
>>>> +		/*
>>>> +		 * Unaligned write command. This may indicate that zoned writes
>>>> +		 * have been received by the device in the wrong order. If zone
>>>> +		 * write locking is disabled, retry after all pending commands
>>>> +		 * have completed.
>>>> +		 */
>>>> +		if (sshdr.asc == 0x21 && sshdr.ascq == 0x04 &&
>>>> +		    !req->q->limits.use_zone_write_lock &&
>>>> +		    blk_rq_is_seq_zoned_write(req) &&
>>>> +		    scmd->retries <= scmd->allowed) {
>>>> +			sdev_printk(KERN_INFO, scmd->device,
>>>> +				    "Retrying unaligned write at LBA %#llx.\n",
>>>> +				    scsi_get_lba(scmd));
>>>
>>> KERN_INFO ? Did you perhaps mean KERN_DEBUG ? An info message for this will be
>>> way too noisy.
>>
>> Hi Damien,
>>
>> Are you sure that KERN_INFO will be too noisy? On our test setups we see
>> this message less than once a day. Anyway, I will change the severity level.
> 
> I am not sure. But better safe than sorry :)
> 
> So given that we should not scare the user with errors that are not errors (as
> the next tries will succeed), we should be silent and log a message only if the
> retry count is exhausted and we still see a failure.

Hi Damien,

I will wrap SCSI_LOG_ERROR_RECOVERY(1, ...) around the above
sdev_printk() call. As you probably know SCSI logging is disabled by
default.

Thanks,

Bart.