diff mbox series

scsi: core: Cap shost cmd_per_lun at can_queue

Message ID 1621434662-173079-1-git-send-email-john.garry@huawei.com
State New
Headers show
Series scsi: core: Cap shost cmd_per_lun at can_queue | expand

Commit Message

John Garry May 19, 2021, 2:31 p.m. UTC
Function sdev_store_queue_depth() enforces that the sdev queue depth cannot
exceed Shost.can_queue.

The sdev initial value comes from shost cmd_per_lun.

However, the LLDD may still set cmd_per_lun > can_queue, which leads to an
initial sdev queue depth greater than can_queue.

Such an issue was reported in [0], which caused a hang. That has since
been fixed in commit fc09acb7de31 ("scsi: scsi_debug: Fix cmd_per_lun,
set to max_queue").

Stop this possibly happening for other drivers by capping
shost.cmd_per_lun at shost.can_queue.

[0] https://lore.kernel.org/linux-scsi/YHaez6iN2HHYxYOh@T590/

Signed-off-by: John Garry <john.garry@huawei.com>

---
Earlier patch was in https://lore.kernel.org/linux-scsi/1618848384-204144-1-git-send-email-john.garry@huawei.com/

-- 
2.26.2

Comments

Ming Lei May 20, 2021, 1:24 a.m. UTC | #1
On Wed, May 19, 2021 at 10:31:02PM +0800, John Garry wrote:
> Function sdev_store_queue_depth() enforces that the sdev queue depth cannot

> exceed Shost.can_queue.

> 

> The sdev initial value comes from shost cmd_per_lun.

> 

> However, the LLDD may still set cmd_per_lun > can_queue, which leads to an

> initial sdev queue depth greater than can_queue.

> 

> Such an issue was reported in [0], which caused a hang. That has since

> been fixed in commit fc09acb7de31 ("scsi: scsi_debug: Fix cmd_per_lun,

> set to max_queue").

> 

> Stop this possibly happening for other drivers by capping

> shost.cmd_per_lun at shost.can_queue.

> 

> [0] https://lore.kernel.org/linux-scsi/YHaez6iN2HHYxYOh@T590/

> 

> Signed-off-by: John Garry <john.garry@huawei.com>

> ---

> Earlier patch was in https://lore.kernel.org/linux-scsi/1618848384-204144-1-git-send-email-john.garry@huawei.com/

> 

> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c

> index ba72bd4202a2..624e2582c3df 100644

> --- a/drivers/scsi/hosts.c

> +++ b/drivers/scsi/hosts.c

> @@ -220,6 +220,9 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,

>  		goto fail;

>  	}

>  

> +	shost->cmd_per_lun = min_t(short, shost->cmd_per_lun,

> +				   shost->can_queue);

> +

>  	error = scsi_init_sense_cache(shost);

>  	if (error)

>  		goto fail;

> -- 

> 2.26.2

> 


Reviewed-by: Ming Lei <ming.lei@redhat.com>


-- 
Ming
Bart Van Assche May 20, 2021, 3:57 p.m. UTC | #2
On 5/19/21 7:31 AM, John Garry wrote:
> Function sdev_store_queue_depth() enforces that the sdev queue depth cannot

> exceed Shost.can_queue.

> 

> The sdev initial value comes from shost cmd_per_lun.

> 

> However, the LLDD may still set cmd_per_lun > can_queue, which leads to an

> initial sdev queue depth greater than can_queue.

> 

> Such an issue was reported in [0], which caused a hang. That has since

> been fixed in commit fc09acb7de31 ("scsi: scsi_debug: Fix cmd_per_lun,

> set to max_queue").

> 

> Stop this possibly happening for other drivers by capping

> shost.cmd_per_lun at shost.can_queue.

> 

> [0] https://lore.kernel.org/linux-scsi/YHaez6iN2HHYxYOh@T590/

> 

> Signed-off-by: John Garry <john.garry@huawei.com>

> ---

> Earlier patch was in https://lore.kernel.org/linux-scsi/1618848384-204144-1-git-send-email-john.garry@huawei.com/

> 

> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c

> index ba72bd4202a2..624e2582c3df 100644

> --- a/drivers/scsi/hosts.c

> +++ b/drivers/scsi/hosts.c

> @@ -220,6 +220,9 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,

>  		goto fail;

>  	}

>  

> +	shost->cmd_per_lun = min_t(short, shost->cmd_per_lun,

> +				   shost->can_queue);

> +

>  	error = scsi_init_sense_cache(shost);

>  	if (error)

>  		goto fail;



In SCSI header files a mix of int, short and unsigned int is used for
cmd_per_lun and can_queue. How about changing the types of these two
member variables in include/scsi/*h into u16?

Anyway:

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
John Garry May 20, 2021, 4:41 p.m. UTC | #3
On 20/05/2021 16:57, Bart Van Assche wrote:
> On 5/19/21 7:31 AM, John Garry wrote:

>> Function sdev_store_queue_depth() enforces that the sdev queue depth cannot

>> exceed Shost.can_queue.

>>

>> The sdev initial value comes from shost cmd_per_lun.

>>

>> However, the LLDD may still set cmd_per_lun > can_queue, which leads to an

>> initial sdev queue depth greater than can_queue.

>>

>> Such an issue was reported in [0], which caused a hang. That has since

>> been fixed in commit fc09acb7de31 ("scsi: scsi_debug: Fix cmd_per_lun,

>> set to max_queue").

>>

>> Stop this possibly happening for other drivers by capping

>> shost.cmd_per_lun at shost.can_queue.

>>

>> [0] https://lore.kernel.org/linux-scsi/YHaez6iN2HHYxYOh@T590/

>>

>> Signed-off-by: John Garry <john.garry@huawei.com>

>> ---

>> Earlier patch was in https://lore.kernel.org/linux-scsi/1618848384-204144-1-git-send-email-john.garry@huawei.com/

>>

>> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c

>> index ba72bd4202a2..624e2582c3df 100644

>> --- a/drivers/scsi/hosts.c

>> +++ b/drivers/scsi/hosts.c

>> @@ -220,6 +220,9 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,

>>   		goto fail;

>>   	}

>>   

>> +	shost->cmd_per_lun = min_t(short, shost->cmd_per_lun,

>> +				   shost->can_queue);

>> +

>>   	error = scsi_init_sense_cache(shost);

>>   	if (error)

>>   		goto fail;

> 

> 


Hi Bart,

> In SCSI header files a mix of int, short and unsigned int is used for

> cmd_per_lun and can_queue. How about changing the types of these two

> member variables in include/scsi/*h into u16?

I don't mind doing that, but is there any requirement for can_queue to 
not be limited to 16b?

It seems intentional that can_queue is int and cmd_per_lun is short.

As an aside, if short is 16b, it does not even seem to have efficient 
packing on Scsi_host today (although we can move things around):

int can_queue;
short cmd_per_lun;
short unsigned int sg_tablesize;
short unsigned int sg_prot_tablesize;
/* 16b hole */
unsigned int max_sectors;

> 

> Anyway:

> 

> Reviewed-by: Bart Van Assche <bvanassche@acm.org>

> 


thanks!

> 

> 

> .

>
Bart Van Assche May 20, 2021, 4:57 p.m. UTC | #4
On 5/20/21 9:41 AM, John Garry wrote:
> On 20/05/2021 16:57, Bart Van Assche wrote:

>> In SCSI header files a mix of int, short and unsigned int is used for

>> cmd_per_lun and can_queue. How about changing the types of these two

>> member variables in include/scsi/*h into u16?

> I don't mind doing that, but is there any requirement for can_queue to

> not be limited to 16b?


Maybe I'm missing something but it is not clear to me why different
structures in the SCSI headers use different data types for can_queue
and cmd_per_lun?

$ git grep -nHEw '(cmd_per_lun|can_queue);' include/scsi
include/scsi/scsi_device.h:318:	unsigned int		can_queue;
include/scsi/scsi_host.h:372:	int can_queue;
include/scsi/scsi_host.h:425:	short cmd_per_lun;
include/scsi/scsi_host.h:612:	int can_queue;
include/scsi/scsi_host.h:613:	short cmd_per_lun;

> It seems intentional that can_queue is int and cmd_per_lun is short.


Intentional? It is not clear to me why? Even high-performance drivers
like iSER and SRP set can_queue by default to a value that fits well in
a 16-bit variable (512 and 64 respectively). The highest value that I
found after a quick search is the following:

 #define ISCSI_TOTAL_CMDS_MAX		4096

Thanks,

Bart.
John Garry May 20, 2021, 5:24 p.m. UTC | #5
On 20/05/2021 17:57, Bart Van Assche wrote:
>> not be limited to 16b?

> Maybe I'm missing something but it is not clear to me why different

> structures in the SCSI headers use different data types for can_queue

> and cmd_per_lun?


For cmd_per_lun, is it related to SCSI task tag limit? SAM-3 says upto 
64b for task tag, but then SAS uses 16b for TMF tag, so not sure.

Someone with more SCSI spec knowledge than we can clarify this.

> 

> $ git grep -nHEw '(cmd_per_lun|can_queue);' include/scsi

> include/scsi/scsi_device.h:318:	unsigned int		can_queue;

> include/scsi/scsi_host.h:372:	int can_queue;

> include/scsi/scsi_host.h:425:	short cmd_per_lun;

> include/scsi/scsi_host.h:612:	int can_queue;

> include/scsi/scsi_host.h:613:	short cmd_per_lun;

> 

>> It seems intentional that can_queue is int and cmd_per_lun is short.

> Intentional? It is not clear to me why? Even high-performance drivers

> like iSER and SRP set can_queue by default to a value that fits well in

> a 16-bit variable (512 and 64 respectively). The highest value that I

> found after a quick search is the following:

> 

>   #define ISCSI_TOTAL_CMDS_MAX		4096


I guess int was used for can_queue as an arbitrarily big number.

And if we try to use 16b for can_queue, reducing size of 
variables/structure members sometimes breaks things, from my experience.

Thanks,
John
Martin K. Petersen May 22, 2021, 4:28 a.m. UTC | #6
John,

> I don't mind doing that, but is there any requirement for can_queue to

> not be limited to 16b?

>

> It seems intentional that can_queue is int and cmd_per_lun is short.


I suspect cmd_per_lun was originally chosen to be 16 bits because of FC
(SPI was 8 bits). And that it seemed unreasonable that the initiator
should be limited to what a single LUN could express. But this is all
guesswork. This code was obviously written a very, very long time ago...

-- 
Martin K. Petersen	Oracle Linux Engineering
Martin K. Petersen May 22, 2021, 4:35 a.m. UTC | #7
John,

> Function sdev_store_queue_depth() enforces that the sdev queue depth

> cannot exceed Shost.can_queue.


Applied to 5.14/scsi-staging, thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering
Martin K. Petersen May 26, 2021, 4:07 a.m. UTC | #8
On Wed, 19 May 2021 22:31:02 +0800, John Garry wrote:

> Function sdev_store_queue_depth() enforces that the sdev queue depth cannot

> exceed Shost.can_queue.

> 

> The sdev initial value comes from shost cmd_per_lun.

> 

> However, the LLDD may still set cmd_per_lun > can_queue, which leads to an

> initial sdev queue depth greater than can_queue.

> 

> [...]


Applied to 5.14/scsi-queue, thanks!

[1/1] scsi: core: Cap shost cmd_per_lun at can_queue
      https://git.kernel.org/mkp/scsi/c/ea2f0f77538c

-- 
Martin K. Petersen	Oracle Linux Engineering
diff mbox series

Patch

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index ba72bd4202a2..624e2582c3df 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -220,6 +220,9 @@  int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
 		goto fail;
 	}
 
+	shost->cmd_per_lun = min_t(short, shost->cmd_per_lun,
+				   shost->can_queue);
+
 	error = scsi_init_sense_cache(shost);
 	if (error)
 		goto fail;