diff mbox series

scsi: sd: Keep the discard mode stable

Message ID 20240614160350.180490-1-fengli@smartx.com
State New
Headers show
Series scsi: sd: Keep the discard mode stable | expand

Commit Message

Li Feng June 14, 2024, 4:03 p.m. UTC
There is a scenario where a large number of discard commands
are issued when the iscsi initiator connects to the target
and then performs a session rescan operation. There is a time
window, most of the commands are in UNMAP mode, and some
discard commands become WRITE SAME with UNMAP.

The discard mode has been negotiated during the SCSI probe. If
the mode is temporarily changed from UNMAP to WRITE SAME with
UNMAP, IO ERROR may occur because the target may not implement
WRITE SAME with UNMAP. Keep the discard mode stable to fix this
issue.

Signed-off-by: Li Feng <fengli@smartx.com>
---
 drivers/scsi/sd.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

'Christoph Hellwig' June 17, 2024, 6:17 a.m. UTC | #1
On Sat, Jun 15, 2024 at 12:03:47AM +0800, Li Feng wrote:
> +		/*
> +		 * When the discard mode has been set to UNMAP, it should not be set to

Overly long line here.

> +		 * WRITE SAME with UNMAP.
> +		 */
> +		if (!sdkp->max_unmap_blocks)
> +			sd_config_discard(sdkp, SD_LBP_WS16);

But more importantly this doesn't really scale to all the variations
of reported / guessed at probe time vs overriden.  I think you just
need an explicit override flag that skips the discard settings.
Li Feng June 17, 2024, 9:03 a.m. UTC | #2
> 2024年6月17日 14:17,Christoph Hellwig <hch@infradead.org> 写道:
> 
> On Sat, Jun 15, 2024 at 12:03:47AM +0800, Li Feng wrote:
>> + /*
>> + * When the discard mode has been set to UNMAP, it should not be set to
> 
> Overly long line here.
OK.

> 
>> + * WRITE SAME with UNMAP.
>> + */
>> + if (!sdkp->max_unmap_blocks)
>> + sd_config_discard(sdkp, SD_LBP_WS16);
> 
> But more importantly this doesn't really scale to all the variations
> of reported / guessed at probe time vs overriden.  I think you just
> need an explicit override flag that skips the discard settings.
> 
I think we only need to prevent the temporary change of discard mode 
from UNMAP to WS16, and this patch should be enough.

Maybe it is a good idea to remove the call to sd_config_discard 
from read_capacity_16 . Because the unmap_alignment/ unmap_granularity
used by sd_config_discard are assigned in sd_read_block_limits. 

sd_read_block_limits is enough to negotiate the discard parameter. 
It is redundant for read_capacity to modify the discard parameter.  In this way, 
when the SCSI probe sends read_capacity first and then read block limits, 
it avoids the change of discard from DISABLE to WS16 to UNMAP.

Thanks,
Li
Benjamin Block June 17, 2024, 4:26 p.m. UTC | #3
Hey,

On Sat, Jun 15, 2024 at 12:03:47AM +0800, Li Feng wrote:
> There is a scenario where a large number of discard commands
> are issued when the iscsi initiator connects to the target
> and then performs a session rescan operation. 

Is this with just one specific target implementation? This sounds like a
broken/buggy target, or is there a reason why this happens in general?

And broken target sounds like device quirk, rather than impacting every
possible target.

> There is a time
> window, most of the commands are in UNMAP mode, and some
> discard commands become WRITE SAME with UNMAP.
> 
> The discard mode has been negotiated during the SCSI probe. If
> the mode is temporarily changed from UNMAP to WRITE SAME with
> UNMAP, IO ERROR may occur because the target may not implement
> WRITE SAME with UNMAP. Keep the discard mode stable to fix this
> issue.
> 
> Signed-off-by: Li Feng <fengli@smartx.com>
> ---
>  drivers/scsi/sd.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index f6c822c9cbd2..0165dc70a99b 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2598,7 +2598,12 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>  		if (buffer[14] & 0x40) /* LBPRZ */
>  			sdkp->lbprz = 1;
>  
> -		sd_config_discard(sdkp, SD_LBP_WS16);
> +		/*
> +		 * When the discard mode has been set to UNMAP, it should not be set to
> +		 * WRITE SAME with UNMAP.
> +		 */
> +		if (!sdkp->max_unmap_blocks)
> +			sd_config_discard(sdkp, SD_LBP_WS16);
>  	}
>  
>  	sdkp->capacity = lba + 1;
Li Feng June 18, 2024, 3:06 a.m. UTC | #4
> 2024年6月18日 00:26,Benjamin Block <bblock@linux.ibm.com> 写道:
> 
> Hey,
> 
> On Sat, Jun 15, 2024 at 12:03:47AM +0800, Li Feng wrote:
>> There is a scenario where a large number of discard commands
>> are issued when the iscsi initiator connects to the target
>> and then performs a session rescan operation. 
> 
> Is this with just one specific target implementation? This sounds like a
> broken/buggy target, or is there a reason why this happens in general?
> 
> And broken target sounds like device quirk, rather than impacting every
> possible target.

(resend due to GMail HTML bounce)

This is a common problem. Before sending a rescan, discard has been 
negotiated to UNMAP. After the rescan, there will be a short window for 
it to become WS16, and then it will immediately become UNMAP. 
However, during this period, a small amount of discard commands 
may become WS16, resulting in a strange problem.

> 
>> There is a time
>> window, most of the commands are in UNMAP mode, and some
>> discard commands become WRITE SAME with UNMAP.
>> 
>> The discard mode has been negotiated during the SCSI probe. If
>> the mode is temporarily changed from UNMAP to WRITE SAME with
>> UNMAP, IO ERROR may occur because the target may not implement
>> WRITE SAME with UNMAP. Keep the discard mode stable to fix this
>> issue.
>> 
>> Signed-off-by: Li Feng <fengli@smartx.com>
>> ---
>> drivers/scsi/sd.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index f6c822c9cbd2..0165dc70a99b 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -2598,7 +2598,12 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
>> if (buffer[14] & 0x40) /* LBPRZ */
>> sdkp->lbprz = 1;
>> 
>> - sd_config_discard(sdkp, SD_LBP_WS16);
>> + /*
>> +  * When the discard mode has been set to UNMAP, it should not be set to
>> +  * WRITE SAME with UNMAP.
>> +  */
>> + if (!sdkp->max_unmap_blocks)
>> + sd_config_discard(sdkp, SD_LBP_WS16);
>> }
>> 
>> sdkp->capacity = lba + 1;
> 
> -- 
> Best Regards, Benjamin Block        /        Linux on IBM Z Kernel Development
> IBM Deutschland Research & Development GmbH    /   https://www.ibm.com/privacy
> Vors. Aufs.-R.: Wolfgang Wendt         /        Gesch?ftsf?hrung: David Faller
> Sitz der Ges.: B?blingen     /    Registergericht: AmtsG Stuttgart, HRB 243294
'Christoph Hellwig' June 18, 2024, 6:41 a.m. UTC | #5
On Mon, Jun 17, 2024 at 05:03:03PM +0800, Li Feng wrote:
> > But more importantly this doesn't really scale to all the variations
> > of reported / guessed at probe time vs overriden.  I think you just
> > need an explicit override flag that skips the discard settings.
> > 
> I think we only need to prevent the temporary change of discard mode 
> from UNMAP to WS16, and this patch should be enough.
> 
> Maybe it is a good idea to remove the call to sd_config_discard 
> from read_capacity_16 . Because the unmap_alignment/ unmap_granularity
> used by sd_config_discard are assigned in sd_read_block_limits. 
> 
> sd_read_block_limits is enough to negotiate the discard parameter. 
> It is redundant for read_capacity to modify the discard parameter.  In this way, 
> when the SCSI probe sends read_capacity first and then read block limits, 
> it avoids the change of discard from DISABLE to WS16 to UNMAP.

Note that in the linux-next tree for 6.11 we're not only applying
the discard choice to the queue_limits structure and not commiting
it in read_capacity_16.  So it will be overriden before it gets
actually applied.  Can you check that your issue doesn't show up in
linux-next?
Benjamin Block June 18, 2024, 8:47 a.m. UTC | #6
On Tue, Jun 18, 2024 at 11:06:13AM +0800, Li Feng wrote:
> > 2024年6月18日 00:26,Benjamin Block <bblock@linux.ibm.com> 写道:
> > On Sat, Jun 15, 2024 at 12:03:47AM +0800, Li Feng wrote:
> >> There is a scenario where a large number of discard commands
> >> are issued when the iscsi initiator connects to the target
> >> and then performs a session rescan operation. 
> > 
> > Is this with just one specific target implementation? This sounds like a
> > broken/buggy target, or is there a reason why this happens in general?
> > 
> > And broken target sounds like device quirk, rather than impacting every
> > possible target.
> 
> This is a common problem. Before sending a rescan, discard has been 
> negotiated to UNMAP. After the rescan, there will be a short window for 
> it to become WS16, and then it will immediately become UNMAP. 
> However, during this period, a small amount of discard commands 
> may become WS16, resulting in a strange problem.

Ok, interesting. Do you know why this short window happens? 

> >> There is a time
> >> window, most of the commands are in UNMAP mode, and some
> >> discard commands become WRITE SAME with UNMAP.
> >> 
> >> The discard mode has been negotiated during the SCSI probe. If
> >> the mode is temporarily changed from UNMAP to WRITE SAME with
> >> UNMAP, IO ERROR may occur because the target may not implement
> >> WRITE SAME with UNMAP. Keep the discard mode stable to fix this
> >> issue.
> >> 
> >> Signed-off-by: Li Feng <fengli@smartx.com>
> >> ---
Li Feng June 19, 2024, 6:56 a.m. UTC | #7
> 2024年6月18日 14:41,Christoph Hellwig <hch@infradead.org> 写道:
> 
> On Mon, Jun 17, 2024 at 05:03:03PM +0800, Li Feng wrote:
>>> But more importantly this doesn't really scale to all the variations
>>> of reported / guessed at probe time vs overriden.  I think you just
>>> need an explicit override flag that skips the discard settings.
>>> 
>> I think we only need to prevent the temporary change of discard mode 
>> from UNMAP to WS16, and this patch should be enough.
>> 
>> Maybe it is a good idea to remove the call to sd_config_discard 
>> from read_capacity_16 . Because the unmap_alignment/ unmap_granularity
>> used by sd_config_discard are assigned in sd_read_block_limits. 
>> 
>> sd_read_block_limits is enough to negotiate the discard parameter. 
>> It is redundant for read_capacity to modify the discard parameter.  In this way, 
>> when the SCSI probe sends read_capacity first and then read block limits, 
>> it avoids the change of discard from DISABLE to WS16 to UNMAP.
> 
> Note that in the linux-next tree for 6.11 we're not only applying
> the discard choice to the queue_limits structure and not commixing
> it in read_capacity_16.  So it will be overriden before it gets
> actually applied.  Can you check that your issue doesn't show up in
> linux-next?
> 

I pulled the latest linux-next and the problem still appeared.

[ 302.773386] sd 0:0:0:3: [sdc] tag#104 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[ 302.774839] sd 0:0:0:3: [sdc] tag#104 Sense Key : Illegal Request [current]
[ 302.775638] sd 0:0:0:3: [sdc] tag#104 Add. Sense: Invalid field in cdb
[ 302.776383] sd 0:0:0:3: [sdc] tag#104 CDB: Write same(16) 93 08 00 00 00 00 00 17 58 00 00 00 08 00 00 00
[ 302.777443] critical target error, dev sdc, sector 1529856 op 0x3:(DISCARD) flags 0x4000 phys_seg 1 prio class 0

The discard mode will undergo a transition from UNMAP->WS16->UNMAP during the rescan process.

This results in an unexpected WS16 discard command.

We can see that the SCSI probe stage calls the following in sequence:
sd_read_capacity
sd_read_block_provisioning(sdkp);
sd_read_block_limits(sdkp, &lim);

We only need to negotiate the discard mode after these function calls are completed.
In this way, there will be no temporary unexpected changes to DISCARD.

Also fixed a problem where the old code could to WS16 in set read_capacity_16 when sdkp->lbpws = 0.
For my SCSI disk, lbpws is equal to 0.

How does this patch?

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index e01393ed4207..4c0962ebe7d5 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2622,7 +2622,6 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
              if (buffer[14] & 0x40) /* LBPRZ */
                      sdkp->lbprz = 1;

-               sd_config_discard(sdkp, lim, SD_LBP_WS16);
      }

      sdkp->capacity = lba + 1;
@@ -3271,8 +3270,6 @@ static void sd_read_block_limits(struct scsi_disk *sdkp,
              if (vpd->data[32] & 0x80)
                      sdkp->unmap_alignment =
                              get_unaligned_be32(&vpd->data[32]) & ~(1 << 31);
-
-               sd_config_discard(sdkp, lim, sd_discard_mode(sdkp));
      }

out:
@@ -3670,6 +3667,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
                      sd_zbc_read_zones(sdkp, &lim, buffer);
                      sd_read_cpr(sdkp);
              }
+               sd_config_discard(sdkp, &lim, sd_discard_mode(sdkp));

              sd_print_capacity(sdkp, old_capacity);


If it's OK, I'll submit v2.

Thanks,
Li
Li Feng June 19, 2024, 6:58 a.m. UTC | #8
> 2024年6月18日 16:47,Benjamin Block <bblock@linux.ibm.com> 写道:
> 
> On Tue, Jun 18, 2024 at 11:06:13AM +0800, Li Feng wrote:
>>> 2024年6月18日 00:26,Benjamin Block <bblock@linux.ibm.com> 写道:
>>> On Sat, Jun 15, 2024 at 12:03:47AM +0800, Li Feng wrote:
>>>> There is a scenario where a large number of discard commands
>>>> are issued when the iscsi initiator connects to the target
>>>> and then performs a session rescan operation. 
>>> 
>>> Is this with just one specific target implementation? This sounds like a
>>> broken/buggy target, or is there a reason why this happens in general?
>>> 
>>> And broken target sounds like device quirk, rather than impacting every
>>> possible target.
>> 
>> This is a common problem. Before sending a rescan, discard has been 
>> negotiated to UNMAP. After the rescan, there will be a short window for 
>> it to become WS16, and then it will immediately become UNMAP. 
>> However, during this period, a small amount of discard commands 
>> may become WS16, resulting in a strange problem.
> 
> Ok, interesting. Do you know why this short window happens? 

I have explained it in other emails, you can read them.

Thanks,
Li

> 
>>>> There is a time
>>>> window, most of the commands are in UNMAP mode, and some
>>>> discard commands become WRITE SAME with UNMAP.
>>>> 
>>>> The discard mode has been negotiated during the SCSI probe. If
>>>> the mode is temporarily changed from UNMAP to WRITE SAME with
>>>> UNMAP, IO ERROR may occur because the target may not implement
>>>> WRITE SAME with UNMAP. Keep the discard mode stable to fix this
>>>> issue.
>>>> 
>>>> Signed-off-by: Li Feng <fengli@smartx.com>
>>>> ---
> 
> -- 
> Best Regards, Benjamin Block        /        Linux on IBM Z Kernel Development
> IBM Deutschland Research & Development GmbH    /   https://www.ibm.com/privacy
> Vors. Aufs.-R.: Wolfgang Wendt         /        Geschäftsführung: David Faller
> Sitz der Ges.: Böblingen     /    Registergericht: AmtsG Stuttgart, HRB 243294
'Christoph Hellwig' June 19, 2024, 7:02 a.m. UTC | #9
On Wed, Jun 19, 2024 at 02:56:18PM +0800, Li Feng wrote:
> +               sd_config_discard(sdkp, &lim, sd_discard_mode(sdkp));
> 
>               sd_print_capacity(sdkp, old_capacity);
> 
> 
> If it's OK, I'll submit v2.

Except for the whitespace damage in the moved call this looks good
to me.
diff mbox series

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index f6c822c9cbd2..0165dc70a99b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2598,7 +2598,12 @@  static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		if (buffer[14] & 0x40) /* LBPRZ */
 			sdkp->lbprz = 1;
 
-		sd_config_discard(sdkp, SD_LBP_WS16);
+		/*
+		 * When the discard mode has been set to UNMAP, it should not be set to
+		 * WRITE SAME with UNMAP.
+		 */
+		if (!sdkp->max_unmap_blocks)
+			sd_config_discard(sdkp, SD_LBP_WS16);
 	}
 
 	sdkp->capacity = lba + 1;