mbox series

[v2,00/20] UFS patches for kernel v5.17

Message ID 20211119195743.2817-1-bvanassche@acm.org
Headers show
Series UFS patches for kernel v5.17 | expand

Message

Bart Van Assche Nov. 19, 2021, 7:57 p.m. UTC
Hi Martin,

This patch series includes the following changes:
- Introduce internal command and reserved tag support in the SCSI core.
- Fix a deadlock in the UFS error handler.
- Add polling support in the UFS driver.
- Several smaller fixes for the UFS driver.

Please consider these UFS driver kernel patches for kernel v5.17.

Thanks,

Bart.

Changes compared to v1:
- Introduced the symbolic constant UFSHCD_NUM_RESERVED.
- Changed the behavior of the patch that removes the clock scaling lock from
  ufshcd_queuecommand() such that it again waits until all pending commands
  have finished before changing the clock frequency.
- Split the patch that optimizes ufshcd_queuecommand().
- Added patches that introduce support for internal command management in the
  SCSI core.
- Introduced a new function ufshcd_release_scsi_cmd().

Bart Van Assche (18):
  scsi: core: Unexport scsi_track_queue_full()
  scsi: core: Fix scsi_device_max_queue_depth()
  scsi: core: Fix a race between scsi_done() and scsi_times_out()
  scsi: core: Add support for reserved tags
  scsi: ufs: Rename a function argument
  scsi: ufs: Remove is_rpmb_wlun()
  scsi: ufs: Remove the sdev_rpmb member
  scsi: ufs: Remove dead code
  scsi: ufs: Switch to scsi_(get|put)_internal_cmd()
  scsi: ufs: Rework ufshcd_change_queue_depth()
  scsi: ufs: Fix a deadlock in the error handler
  scsi: ufs: Introduce ufshcd_release_scsi_cmd()
  scsi: ufs: Improve SCSI abort handling
  scsi: ufs: Fix a kernel crash during shutdown
  scsi: ufs: Stop using the clock scaling lock in the error handler
  scsi: ufs: Optimize the command queueing code
  scsi: ufs: Implement polling support
  scsi: ufs: Fix race conditions related to driver data

Hannes Reinecke (2):
  block: Add a flag for internal commands
  scsi: core: Add support for internal commands

 block/blk-exec.c                   |   5 +
 drivers/scsi/scsi.c                |   5 +-
 drivers/scsi/scsi_error.c          |  29 +--
 drivers/scsi/scsi_lib.c            |  50 +++-
 drivers/scsi/ufs/tc-dwc-g210-pci.c |   1 -
 drivers/scsi/ufs/ufs-exynos.c      |   4 +-
 drivers/scsi/ufs/ufshcd-pci.c      |   2 -
 drivers/scsi/ufs/ufshcd-pltfrm.c   |   2 -
 drivers/scsi/ufs/ufshcd.c          | 375 ++++++++++++++++-------------
 drivers/scsi/ufs/ufshcd.h          |   5 +-
 include/linux/blk-mq.h             |   5 +
 include/linux/blk_types.h          |   2 +
 include/scsi/scsi_device.h         |   4 +
 include/scsi/scsi_host.h           |   9 +-
 14 files changed, 301 insertions(+), 197 deletions(-)

Comments

Bart Van Assche Nov. 30, 2021, 5:34 p.m. UTC | #1
On 11/30/21 7:40 AM, Bean Huo wrote:
> It is the test case in your commit message. If iodepth=1, there is no
> performance improvement. Increase the io-depth to multiple, for
> example, let iodepth= CPU core counter. I see that the interrupt
> overhead is significantly increased when the request is completed, so
> IO_polling will win compared to the interrupt mode.

Hi Bean,

It is not guaranteed that polling will result in a significant performance
improvement. I assume that on my test setup the improvement is so
significant because the interrupt coalescing delay. Maybe the interrupt
coalescing delay is much smaller on your test setup.

Bart.
Bart Van Assche Nov. 30, 2021, 5:37 p.m. UTC | #2
On 11/30/21 12:43 AM, Bean Huo wrote:
> On Fri, 2021-11-19 at 11:57 -0800, Bart Van Assche wrote:
>> +	return IRQ_HANDLED;
>>   }
>>   
> 
> ufshcd_transfer_req_compl() will never return IRQ_NONE,
> but ufshcd_intr() needs this return to check the fake interrupt.

I don't think that it is possible to implement polling and detect
spurious interrupts without affecting performance negatively. Hence
the choice to never return IRQ_NONE. Is spurious interrupt detection
that important or is this only required for debugging host controllers?
I consider helping with hardware debugging as out of scope for the UFS
driver.

>> @@ -9437,6 +9481,7 @@ int ufshcd_alloc_host(struct device *dev,
>> struct ufs_hba **hba_handle)
>>   		err = -ENOMEM;
>>   		goto out_error;
>>   	}
>> +	host->nr_maps = 3;
> 
> I don't understand why not directly uses HCTX_MAX_TYPES, 3 is already
> maximum.

If new map types would be introduced in the future, we still need 3 maps.
Hence the choice for '3' instead of HCTX_MAX_TYPES.

Thanks,

Bart.
Adrian Hunter Nov. 30, 2021, 7:15 p.m. UTC | #3
On 30/11/2021 19:51, Bart Van Assche wrote:
> On 11/29/21 10:41 PM, Adrian Hunter wrote:
>> On 29/11/2021 21:32, Bart Van Assche wrote:
>>> * The code in blk_cleanup_queue() that waits for pending requests to finish
>>>    before resources associated with the request queue are freed.
>>>    ufshcd_remove() calls blk_cleanup_queue(hba->cmd_queue) and hence waits until
>>>    pending device management commands have finished. That would no longer be the
>>>    case if the block layer is bypassed to submit device management commands.
>>
>> cmd_queue is used only by the UFS driver, so if the driver is racing with
>> itself at "remove", then that should be fixed. The risk is not that the UFS
>> driver might use requests, but that it might still be operating when hba or
>> other resources get freed.
>>
>> So the question remains, for device commands, we do not need the block
>> layer, nor SCSI commands which still begs the question, why involve them
>> at all?
> 
> By using the block layer request allocation functions the block layer guarantees
> that each tag is in use in only one context. When bypassing the block layer code
> would have to be inserted in ufshcd_exec_dev_cmd() and ufshcd_issue_devman_upiu_cmd()
> to serialize these functions.

They already are serialized, but you are essentially saying the functionality
being duplicated is just a lock.  What you are proposing seems awfully
complicated just to get the functionality of a lock.

> In other words, we would be duplicating existing
> functionality if we bypass the block layer. The recommended approach in the Linux
> kernel is not to duplicate existing functionality.

More accurately, the functionality would not be being used at all, so not
really any duplication.
Bart Van Assche Nov. 30, 2021, 7:21 p.m. UTC | #4
On 11/30/21 11:15 AM, Adrian Hunter wrote:
> On 30/11/2021 19:51, Bart Van Assche wrote:
>> By using the block layer request allocation functions the block layer guarantees
>> that each tag is in use in only one context. When bypassing the block layer code
>> would have to be inserted in ufshcd_exec_dev_cmd() and ufshcd_issue_devman_upiu_cmd()
>> to serialize these functions.
> 
> They already are serialized, but you are essentially saying the functionality
> being duplicated is just a lock.  What you are proposing seems awfully
> complicated just to get the functionality of a lock.

Are you perhaps referring to hba->dev_cmd.lock? I had overlooked that lock
when I wrote my previous email. I will take a closer look at the reserved
slot approach.

Thanks,

Bart.