Message ID | 1666693096-180008-1-git-send-email-john.garry@huawei.com |
---|---|
Series | blk-mq/libata/scsi: SCSI driver tagging improvements Part I |
On 25/10/2022 11:17, John Garry wrote:

Hi all,

I meant to say that this is just an update for where I got to here. I am actually changing employer soon, but will continue in upstream linux storage domain. So I don't want people to think that I am just throwing some stuff over the wall for the community to deal with. I would still like people to check this.

Thanks,
John

> Currently SCSI low-level drivers are required to manage tags for commands
> which do not come via the block layer - libata internal commands would be
> an example of one of these. We want to make blk-mq manage these tags also.
>
> There was some work to provide "reserved commands" support in such series
> as https://lore.kernel.org/linux-scsi/20211125151048.103910-1-hare@suse.de/
>
> This was based on allocating a request for the lifetime of the "internal"
> command.
>
> This series tries to solve that problem by not just allocating the request
> but also sending it as a request through the block layer. Reasons to do
> this:
> - Normal flow of a request and also commonality for regular scsi command
>   lifetime
> - We don't leave request and scsi_cmnd fields dangling as when we just
>   allocate and free the request for the lifetime of the "internal" command
> - For poll mode support we can only poll in block layer, so could not send
>   internal commands on poll mode queues if we did not do this, which is a
>   problem
> - Can get rid of duplicated code like libsas internal command timeout
>   handling
>
> Series part I contains core SCSI midlayer, libata, and libsas changes to
> queue libsas "slow" tasks as requests.
>
> Series part II of this series focused on changing libata to queue internal
> commands as requests.
>
> Testing:
> QEMU with AHCI with disk and cdrom attached, hisi_sas, pm8001.
>
> Branch containing all patches is at:
> https://github.com/hisilicon/kernel-dev/commits/private-topic-sas-6.1-block
>
> v2 was here:
> https://lore.kernel.org/linux-scsi/1654770559-101375-1-git-send-email-john.garry@huawei.com/
>
> Hannes Reinecke (1):
>   scsi: core: Implement reserved command handling
>
> John Garry (21):
>   blk-mq: Don't get budget for reserved requests
>   scsi: core: Add scsi_get_dev()
>   scsi: core: Add support to send reserved commands
>   scsi: core: Add support for reserved command timeout handling
>   scsi: libsas: Improve sas_ex_discover_expander() error handling
>   scsi: libsas: Notify LLDD expander found before calling sas_rphy_add()
>   scsi: scsi_transport_sas: Alloc sdev for expander
>   scsi: libsas: Add sas_alloc_slow_task_rq()
>   scsi: libsas: Add sas_queuecommand_internal()
>   scsi: libsas: Add sas_internal_timeout()
>   scsi: core: Use SCSI_SCAN_RESCAN in __scsi_add_device()
>   scsi: scsi_transport_sas: Allocate end device target id in the rphy
>     alloc
>   ata: libata-scsi: Add ata_scsi_setup_sdev()
>   scsi: libsas: Add sas_ata_setup_device()
>   ata: libata-scsi: Allocate sdev early in port probe
>   scsi: libsas drivers: Reserve tags
>   scsi: libsas: Queue SMP commands as requests
>   scsi: libsas: Queue TMF commands as requests
>   scsi: core: Add scsi_alloc_request_hwq()
>   scsi: libsas: Queue internal abort commands as requests
>   scsi: libsas: Delete sas_task_slow.timer
>
>  block/blk-mq.c                         |   4 +-
>  drivers/ata/libata-eh.c                |   1 +
>  drivers/ata/libata-scsi.c              |  49 ++++++++----
>  drivers/ata/libata.h                   |   1 +
>  drivers/scsi/aic94xx/aic94xx_init.c    |   3 +
>  drivers/scsi/hisi_sas/hisi_sas_main.c  |  40 +++++-----
>  drivers/scsi/hisi_sas/hisi_sas_v1_hw.c |   3 +
>  drivers/scsi/hisi_sas/hisi_sas_v2_hw.c |   3 +
>  drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |   7 ++
>  drivers/scsi/hosts.c                   |  16 ++++
>  drivers/scsi/isci/init.c               |   3 +
>  drivers/scsi/libsas/sas_ata.c          |  20 +++++
>  drivers/scsi/libsas/sas_expander.c     | 101 ++++++++++++++-----------
>  drivers/scsi/libsas/sas_init.c         |  61 ++++++++++++++-
>  drivers/scsi/libsas/sas_internal.h     |   5 ++
>  drivers/scsi/libsas/sas_scsi_host.c    |  93 ++++++++++++-----------
>  drivers/scsi/mvsas/mv_init.c           |   7 ++
>  drivers/scsi/pm8001/pm8001_init.c      |   8 +-
>  drivers/scsi/scsi_error.c              |   3 +
>  drivers/scsi/scsi_lib.c                |  42 +++++++++-
>  drivers/scsi/scsi_scan.c               |  29 ++++++-
>  drivers/scsi/scsi_transport_sas.c      |  34 ++++++---
>  include/linux/libata.h                 |   2 +
>  include/scsi/libsas.h                  |   8 +-
>  include/scsi/scsi_cmnd.h               |   3 +
>  include/scsi/scsi_host.h               |  21 ++++-
>  26 files changed, 424 insertions(+), 143 deletions(-)
>
On 10/25/22 19:17, John Garry wrote:
> From: Hannes Reinecke <hare@suse.de>
>
> Quite some drivers are using management commands internally, which
> typically use the same hardware tag pool (ie they are being allocated
> from the same hardware resources) as the 'normal' I/O commands.
> These commands are set aside before allocating the block-mq tag bitmap,
> so they'll never show up as busy in the tag map.
> The block-layer, OTOH, already has 'reserved_tags' to handle precisely
> this situation.
> So this patch adds a new field 'nr_reserved_cmds' to the SCSI host
> template to instruct the block layer to set aside a tag space for these
> management commands by using reserved tags.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> #jpg: Set tag_set->queue_depth = shost->can_queue, and not
> = shost->can_queue + shost->nr_reserved_cmds;
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>  drivers/scsi/hosts.c     |  3 +++
>  drivers/scsi/scsi_lib.c  |  2 ++
>  include/scsi/scsi_host.h | 15 ++++++++++++++-
>  3 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index 12346e2297fd..db89afc37bc9 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -489,6 +489,9 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
>  	if (sht->virt_boundary_mask)
>  		shost->virt_boundary_mask = sht->virt_boundary_mask;
>
> +	if (sht->nr_reserved_cmds)
> +		shost->nr_reserved_cmds = sht->nr_reserved_cmds;
> +

Nit: the if is not really necessary I think. But it does not hurt.

>  	device_initialize(&shost->shost_gendev);
>  	dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
>  	shost->shost_gendev.bus = &scsi_bus_type;
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 39d4fd124375..a8c4e7c037ae 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1978,6 +1978,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
>  	tag_set->nr_hw_queues = shost->nr_hw_queues ? : 1;
>  	tag_set->nr_maps = shost->nr_maps ? : 1;
>  	tag_set->queue_depth = shost->can_queue;
> +	tag_set->reserved_tags = shost->nr_reserved_cmds;
> +

Why the blank line ?

>  	tag_set->cmd_size = cmd_size;
>  	tag_set->numa_node = dev_to_node(shost->dma_dev);
>  	tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index 750ccf126377..91678c77398e 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -360,10 +360,17 @@ struct scsi_host_template {
>  	/*
>  	 * This determines if we will use a non-interrupt driven
>  	 * or an interrupt driven scheme. It is set to the maximum number
> -	 * of simultaneous commands a single hw queue in HBA will accept.
> +	 * of simultaneous commands a single hw queue in HBA will accept
> +	 * including reserved commands.
>  	 */
>  	int can_queue;
>
> +	/*
> +	 * This determines how many commands the HBA will set aside
> +	 * for reserved commands.
> +	 */
> +	int nr_reserved_cmds;
> +
>  	/*
>  	 * In many instances, especially where disconnect / reconnect are
>  	 * supported, our host also has an ID on the SCSI bus. If this is
> @@ -611,6 +618,12 @@ struct Scsi_Host {
>  	 */
>  	unsigned nr_hw_queues;
>  	unsigned nr_maps;
> +
> +	/*
> +	 * Number of reserved commands to allocate, if any.
> +	 */
> +	unsigned int nr_reserved_cmds;
> +
>  	unsigned active_mode:2;
>
>  	/*
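
Editorial sketch (not part of the thread): the patch description says `can_queue` includes the reserved commands and that `queue_depth` is set to `can_queue` rather than `can_queue + nr_reserved_cmds`. Under that reading, the number of tags left for normal I/O is the difference of the two. The struct and function names below (`host_model`, `tag_set_model`, `setup_tags`, `normal_tags`) are hypothetical stand-ins for illustration, not kernel types.

```c
/*
 * Hypothetical stand-ins for struct Scsi_Host and struct blk_mq_tag_set,
 * reduced to the fields touched by the quoted patch.
 */
struct host_model {
	int can_queue;			/* total commands, reserved included */
	unsigned int nr_reserved_cmds;	/* commands set aside for internal use */
};

struct tag_set_model {
	unsigned int queue_depth;
	unsigned int reserved_tags;
};

/* Mirrors the two assignments shown in the scsi_mq_setup_tags() hunk. */
static void setup_tags(const struct host_model *shost,
		       struct tag_set_model *set)
{
	set->queue_depth = shost->can_queue;
	set->reserved_tags = shost->nr_reserved_cmds;
}

/* Tags available to normal I/O, assuming depth already counts reserved tags. */
static unsigned int normal_tags(const struct tag_set_model *set)
{
	return set->queue_depth - set->reserved_tags;
}
```

With `can_queue = 32` and `nr_reserved_cmds = 2`, this model leaves 30 tags for normal I/O, which is the point of the `#jpg` note in the commit message.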
On 10/27/22 03:18, Damien Le Moal wrote:
> On 10/25/22 19:17, John Garry wrote:
>> From: Hannes Reinecke <hare@suse.de>
>>
>> Quite some drivers are using management commands internally, which
>> typically use the same hardware tag pool (ie they are being allocated
>> from the same hardware resources) as the 'normal' I/O commands.
>> These commands are set aside before allocating the block-mq tag bitmap,
>> so they'll never show up as busy in the tag map.
>> The block-layer, OTOH, already has 'reserved_tags' to handle precisely
>> this situation.
>> So this patch adds a new field 'nr_reserved_cmds' to the SCSI host
>> template to instruct the block layer to set aside a tag space for these
>> management commands by using reserved tags.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> #jpg: Set tag_set->queue_depth = shost->can_queue, and not
>> = shost->can_queue + shost->nr_reserved_cmds;
>> Signed-off-by: John Garry <john.garry@huawei.com>
>> ---
>>  drivers/scsi/hosts.c     |  3 +++
>>  drivers/scsi/scsi_lib.c  |  2 ++
>>  include/scsi/scsi_host.h | 15 ++++++++++++++-
>>  3 files changed, 19 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
>> index 12346e2297fd..db89afc37bc9 100644
>> --- a/drivers/scsi/hosts.c
>> +++ b/drivers/scsi/hosts.c
>> @@ -489,6 +489,9 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
>>  	if (sht->virt_boundary_mask)
>>  		shost->virt_boundary_mask = sht->virt_boundary_mask;
>>
>> +	if (sht->nr_reserved_cmds)
>> +		shost->nr_reserved_cmds = sht->nr_reserved_cmds;
>> +
>
> Nit: the if is not really necessary I think. But it does not hurt.
>

Yes, we do.
Not all HBAs are able to figure out the number of reserved commands upfront; some modify that based on the PCI device used etc.
So I'd keep it for now.

Cheers,

Hannes
On 27/10/2022 08:51, Hannes Reinecke wrote:
>>>
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>> #jpg: Set tag_set->queue_depth = shost->can_queue, and not
>>> = shost->can_queue + shost->nr_reserved_cmds;
>>> Signed-off-by: John Garry <john.garry@huawei.com>
>>> ---
>>>  drivers/scsi/hosts.c     |  3 +++
>>>  drivers/scsi/scsi_lib.c  |  2 ++
>>>  include/scsi/scsi_host.h | 15 ++++++++++++++-
>>>  3 files changed, 19 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
>>> index 12346e2297fd..db89afc37bc9 100644
>>> --- a/drivers/scsi/hosts.c
>>> +++ b/drivers/scsi/hosts.c
>>> @@ -489,6 +489,9 @@ struct Scsi_Host *scsi_host_alloc(struct
>>> scsi_host_template *sht, int privsize)
>>>  	if (sht->virt_boundary_mask)
>>>  		shost->virt_boundary_mask = sht->virt_boundary_mask;
>>> +	if (sht->nr_reserved_cmds)
>>> +		shost->nr_reserved_cmds = sht->nr_reserved_cmds;
>>> +
>>
>> Nit: the if is not really necessary I think. But it does not hurt.
>>
> Yes, we do.
> Not all HBAs are able to figure out the number of reserved commands
> upfront; some modify that based on the PCI device used etc.
> So I'd keep it for now.

I think logically Damien is right as in the shost alloc shost->nr_reserved_cmds is initially zero, so:

	if (sht->nr_reserved_cmds)
		shost->nr_reserved_cmds = sht->nr_reserved_cmds;

is same as simply:

	shost->nr_reserved_cmds = sht->nr_reserved_cmds;

However I am just copying the coding style.

Thanks,
John
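
Editorial sketch (not part of the thread): John's equivalence argument rests on the destination field being zero before the copy. A minimal model of the two forms, assuming a zero-initialized host structure, is below. The types `sht_model` and `shost_model` and both function names are hypothetical.

```c
/* Hypothetical reduced models of the template and the allocated host. */
struct sht_model {
	int nr_reserved_cmds;
};

struct shost_model {
	unsigned int nr_reserved_cmds;
};

/* The form used in the patch: copy only when the template value is non-zero. */
static unsigned int copy_conditional(const struct sht_model *sht)
{
	struct shost_model shost = { 0 };	/* models a zeroed allocation */

	if (sht->nr_reserved_cmds)
		shost.nr_reserved_cmds = sht->nr_reserved_cmds;
	return shost.nr_reserved_cmds;
}

/* The simplified form: copy unconditionally. */
static unsigned int copy_unconditional(const struct sht_model *sht)
{
	struct shost_model shost = { 0 };	/* models a zeroed allocation */

	shost.nr_reserved_cmds = sht->nr_reserved_cmds;
	return shost.nr_reserved_cmds;
}
```

Both functions return the same value for any template value, including zero, as long as the destination starts at zero. A driver that adjusts `shost->nr_reserved_cmds` after allocation, as Hannes describes, is unaffected by which form is used at allocation time.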
On 27/10/2022 02:16, Damien Le Moal wrote:
>> Signed-off-by: John Garry <john.garry@huawei.com>
>> ---
>>  block/blk-mq.c          | 4 +++-
>>  drivers/scsi/scsi_lib.c | 3 ++-
>>  2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 260adeb2e455..d8baabb32ea4 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -1955,11 +1955,13 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
>>  	errors = queued = 0;
>>  	do {
>>  		struct blk_mq_queue_data bd;
>> +		bool need_budget;
>>
>>  		rq = list_first_entry(list, struct request, queuelist);
>>
>>  		WARN_ON_ONCE(hctx != rq->mq_hctx);
>> -		prep = blk_mq_prep_dispatch_rq(rq, !nr_budgets);
>> +		need_budget = !nr_budgets && !blk_mq_is_reserved_rq(rq);
>> +		prep = blk_mq_prep_dispatch_rq(rq, need_budget);
>>  		if (prep != PREP_DISPATCH_OK)
>>  			break;
> Below this code, there is:
>
> 	if (nr_budgets)
> 		nr_budgets--;
>
> Don't you need to change that to:
>
> 	if (need_budget && nr_budgets)
> 		nr_budgets--;
>
> ? Otherwise, the accounting will be off.
>

Ah, yes, I think that you are right. I actually need to check nr_budgets usage further as nr_budgets initial value would be dependent on any reserved request requiring a budget (which we don't get).

Thanks,
John
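
Editorial sketch (not part of the thread): one way to see the accounting concern is a small model of the dispatch loop. It assumes, which the thread leaves open, that `nr_budgets` counts budgets already held for the non-reserved requests on the list only. The function `extra_budget_gets` and its parameters are hypothetical, and the guard used here is "skip the decrement for reserved requests", which is an interpretation of the intent rather than the literal expression proposed above.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not kernel code. Walks a dispatch list and counts how
 * many times a fresh budget would be requested (need_budget == true) even
 * though budgets were acquired up front for every non-reserved request.
 *
 * is_reserved[i]  - whether request i is a reserved request
 * nr_budgets      - budgets already held when the loop starts
 * skip_reserved   - if true, reserved requests do not consume a held budget
 */
static unsigned int extra_budget_gets(const bool *is_reserved, size_t nr_rqs,
				      unsigned int nr_budgets,
				      bool skip_reserved)
{
	unsigned int extra = 0;

	for (size_t i = 0; i < nr_rqs; i++) {
		/* Same expression as in the quoted hunk. */
		bool need_budget = !nr_budgets && !is_reserved[i];

		if (need_budget)
			extra++;

		/* The decrement that follows the quoted hunk in the loop. */
		if (nr_budgets && !(skip_reserved && is_reserved[i]))
			nr_budgets--;
	}
	return extra;
}
```

For a list of one reserved request followed by one normal request, with one budget held, the unguarded decrement lets the reserved request use up the held budget, so the normal request asks for another one. With the guard, the held budget is still there for the normal request. Whether the initial `nr_budgets` really excludes reserved requests is exactly what John says he needs to check.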
>
>>  	device_initialize(&shost->shost_gendev);
>>  	dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
>>  	shost->shost_gendev.bus = &scsi_bus_type;
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index 39d4fd124375..a8c4e7c037ae 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -1978,6 +1978,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
>>  	tag_set->nr_hw_queues = shost->nr_hw_queues ? : 1;
>>  	tag_set->nr_maps = shost->nr_maps ? : 1;
>>  	tag_set->queue_depth = shost->can_queue;
>> +	tag_set->reserved_tags = shost->nr_reserved_cmds;
>> +
> Why the blank line ?
>

I don't think that it is required, I can remedy.

>>  	tag_set->cmd_size = cmd_size;

Thanks,
John