Message ID | 1621434662-173079-1-git-send-email-john.garry@huawei.com |
---|---|
State | New |
Headers | show |
Series | scsi: core: Cap shost cmd_per_lun at can_queue | expand |
On Wed, May 19, 2021 at 10:31:02PM +0800, John Garry wrote: > Function sdev_store_queue_depth() enforces that the sdev queue depth cannot > exceed Shost.can_queue. > > The sdev initial value comes from shost cmd_per_lun. > > However, the LLDD may still set cmd_per_lun > can_queue, which leads to an > initial sdev queue depth greater than can_queue. > > Such an issue was reported in [0], which caused a hang. That has since > been fixed in commit fc09acb7de31 ("scsi: scsi_debug: Fix cmd_per_lun, > set to max_queue"). > > Stop this possibly happening for other drivers by capping > shost.cmd_per_lun at shost.can_queue. > > [0] https://lore.kernel.org/linux-scsi/YHaez6iN2HHYxYOh@T590/ > > Signed-off-by: John Garry <john.garry@huawei.com> > --- > Earlier patch was in https://lore.kernel.org/linux-scsi/1618848384-204144-1-git-send-email-john.garry@huawei.com/ > > diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c > index ba72bd4202a2..624e2582c3df 100644 > --- a/drivers/scsi/hosts.c > +++ b/drivers/scsi/hosts.c > @@ -220,6 +220,9 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, > goto fail; > } > > + shost->cmd_per_lun = min_t(short, shost->cmd_per_lun, > + shost->can_queue); > + > error = scsi_init_sense_cache(shost); > if (error) > goto fail; > -- > 2.26.2 > Reviewed-by: Ming Lei <ming.lei@redhat.com> -- Ming
On 5/19/21 7:31 AM, John Garry wrote: > Function sdev_store_queue_depth() enforces that the sdev queue depth cannot > exceed Shost.can_queue. > > The sdev initial value comes from shost cmd_per_lun. > > However, the LLDD may still set cmd_per_lun > can_queue, which leads to an > initial sdev queue depth greater than can_queue. > > Such an issue was reported in [0], which caused a hang. That has since > been fixed in commit fc09acb7de31 ("scsi: scsi_debug: Fix cmd_per_lun, > set to max_queue"). > > Stop this possibly happening for other drivers by capping > shost.cmd_per_lun at shost.can_queue. > > [0] https://lore.kernel.org/linux-scsi/YHaez6iN2HHYxYOh@T590/ > > Signed-off-by: John Garry <john.garry@huawei.com> > --- > Earlier patch was in https://lore.kernel.org/linux-scsi/1618848384-204144-1-git-send-email-john.garry@huawei.com/ > > diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c > index ba72bd4202a2..624e2582c3df 100644 > --- a/drivers/scsi/hosts.c > +++ b/drivers/scsi/hosts.c > @@ -220,6 +220,9 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, > goto fail; > } > > + shost->cmd_per_lun = min_t(short, shost->cmd_per_lun, > + shost->can_queue); > + > error = scsi_init_sense_cache(shost); > if (error) > goto fail; In SCSI header files a mix of int, short and unsigned int is used for cmd_per_lun and can_queue. How about changing the types of these two member variables in include/scsi/*h into u16? Anyway: Reviewed-by: Bart Van Assche <bvanassche@acm.org>
On 20/05/2021 16:57, Bart Van Assche wrote: > On 5/19/21 7:31 AM, John Garry wrote: >> Function sdev_store_queue_depth() enforces that the sdev queue depth cannot >> exceed Shost.can_queue. >> >> The sdev initial value comes from shost cmd_per_lun. >> >> However, the LLDD may still set cmd_per_lun > can_queue, which leads to an >> initial sdev queue depth greater than can_queue. >> >> Such an issue was reported in [0], which caused a hang. That has since >> been fixed in commit fc09acb7de31 ("scsi: scsi_debug: Fix cmd_per_lun, >> set to max_queue"). >> >> Stop this possibly happening for other drivers by capping >> shost.cmd_per_lun at shost.can_queue. >> >> [0] https://lore.kernel.org/linux-scsi/YHaez6iN2HHYxYOh@T590/ >> >> Signed-off-by: John Garry <john.garry@huawei.com> >> --- >> Earlier patch was in https://lore.kernel.org/linux-scsi/1618848384-204144-1-git-send-email-john.garry@huawei.com/ >> >> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c >> index ba72bd4202a2..624e2582c3df 100644 >> --- a/drivers/scsi/hosts.c >> +++ b/drivers/scsi/hosts.c >> @@ -220,6 +220,9 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, >> goto fail; >> } >> >> + shost->cmd_per_lun = min_t(short, shost->cmd_per_lun, >> + shost->can_queue); >> + >> error = scsi_init_sense_cache(shost); >> if (error) >> goto fail; > > Hi Bart, > In SCSI header files a mix of int, short and unsigned int is used for > cmd_per_lun and can_queue. How about changing the types of these two > member variables in include/scsi/*h into u16? I don't mind doing that, but is there any requirement for can_queue to not be limited to 16b? It seems intentional that can_queue is int and cmd_per_lun is short. As an aside, if short is 16b, it does not even seem to have efficient packing on Scsi_host today (although we can move things around): int can_queue; short cmd_per_lun; short unsigned int sg_tablesize; short unsigned int sg_prot_tablesize; /* 16b hole */ unsigned int max_sectors; > > Anyway: > > Reviewed-by: Bart Van Assche <bvanassche@acm.org> > thanks! > > > . >
On 5/20/21 9:41 AM, John Garry wrote: > On 20/05/2021 16:57, Bart Van Assche wrote: >> In SCSI header files a mix of int, short and unsigned int is used for >> cmd_per_lun and can_queue. How about changing the types of these two >> member variables in include/scsi/*h into u16? > I don't mind doing that, but is there any requirement for can_queue to > not be limited to 16b? Maybe I'm missing something but it is not clear to me why different structures in the SCSI headers use different data types for can_queue and cmd_per_lun? $ git grep -nHEw '(cmd_per_lun|can_queue);' include/scsi include/scsi/scsi_device.h:318: unsigned int can_queue; include/scsi/scsi_host.h:372: int can_queue; include/scsi/scsi_host.h:425: short cmd_per_lun; include/scsi/scsi_host.h:612: int can_queue; include/scsi/scsi_host.h:613: short cmd_per_lun; > It seems intentional that can_queue is int and cmd_per_lun is short. Intentional? It is not clear to me why? Even high-performance drivers like iSER and SRP set can_queue by default to a value that fits well in a 16-bit variable (512 and 64 respectively). The highest value that I found after a quick search is the following: #define ISCSI_TOTAL_CMDS_MAX 4096 Thanks, Bart.
On 20/05/2021 17:57, Bart Van Assche wrote: >> not be limited to 16b? > Maybe I'm missing something but it is not clear to me why different > structures in the SCSI headers use different data types for can_queue > and cmd_per_lun? For cmd_per_lun, is it related to SCSI task tag limit? SAM-3 says upto 64b for task tag, but then SAS uses 16b for TMF tag, so not sure. Someone with more SCSI spec knowledge than we can clarify this. > > $ git grep -nHEw '(cmd_per_lun|can_queue);' include/scsi > include/scsi/scsi_device.h:318: unsigned int can_queue; > include/scsi/scsi_host.h:372: int can_queue; > include/scsi/scsi_host.h:425: short cmd_per_lun; > include/scsi/scsi_host.h:612: int can_queue; > include/scsi/scsi_host.h:613: short cmd_per_lun; > >> It seems intentional that can_queue is int and cmd_per_lun is short. > Intentional? It is not clear to me why? Even high-performance drivers > like iSER and SRP set can_queue by default to a value that fits well in > a 16-bit variable (512 and 64 respectively). The highest value that I > found after a quick search is the following: > > #define ISCSI_TOTAL_CMDS_MAX 4096 I guess int was used for can_queue as an arbitrarily big number. And if we try to use 16b for can_queue, reducing size of variables/structure members sometimes breaks things, from my experience. Thanks, John
John, > I don't mind doing that, but is there any requirement for can_queue to > not be limited to 16b? > > It seems intentional that can_queue is int and cmd_per_lun is short. I suspect cmd_per_lun was originally chosen to be 16 bits because of FC (SPI was 8 bits). And that it seemed unreasonable that the initiator should be limited to what a single LUN could express. But this is all guesswork. This code was obviously written a very, very long time ago... -- Martin K. Petersen Oracle Linux Engineering
John, > Function sdev_store_queue_depth() enforces that the sdev queue depth > cannot exceed Shost.can_queue. Applied to 5.14/scsi-staging, thanks! -- Martin K. Petersen Oracle Linux Engineering
On Wed, 19 May 2021 22:31:02 +0800, John Garry wrote: > Function sdev_store_queue_depth() enforces that the sdev queue depth cannot > exceed Shost.can_queue. > > The sdev initial value comes from shost cmd_per_lun. > > However, the LLDD may still set cmd_per_lun > can_queue, which leads to an > initial sdev queue depth greater than can_queue. > > [...] Applied to 5.14/scsi-queue, thanks! [1/1] scsi: core: Cap shost cmd_per_lun at can_queue https://git.kernel.org/mkp/scsi/c/ea2f0f77538c -- Martin K. Petersen Oracle Linux Engineering
diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index ba72bd4202a2..624e2582c3df 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -220,6 +220,9 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev, goto fail; } + shost->cmd_per_lun = min_t(short, shost->cmd_per_lun, + shost->can_queue); + error = scsi_init_sense_cache(shost); if (error) goto fail;
Function sdev_store_queue_depth() enforces that the sdev queue depth cannot exceed Shost.can_queue. The sdev initial value comes from shost cmd_per_lun. However, the LLDD may still set cmd_per_lun > can_queue, which leads to an initial sdev queue depth greater than can_queue. Such an issue was reported in [0], which caused a hang. That has since been fixed in commit fc09acb7de31 ("scsi: scsi_debug: Fix cmd_per_lun, set to max_queue"). Stop this possibly happening for other drivers by capping shost.cmd_per_lun at shost.can_queue. [0] https://lore.kernel.org/linux-scsi/YHaez6iN2HHYxYOh@T590/ Signed-off-by: John Garry <john.garry@huawei.com> --- Earlier patch was in https://lore.kernel.org/linux-scsi/1618848384-204144-1-git-send-email-john.garry@huawei.com/ -- 2.26.2