diff mbox series

[v5,4/5] sd: limit to use write life hints

Message ID 20240910150200.6589-5-joshi.k@samsung.com
State New
Headers show
Series data placement hints and FDP | expand

Commit Message

Kanchan Joshi Sept. 10, 2024, 3:01 p.m. UTC
From: Nitesh Shetty <nj.shetty@samsung.com>

The incoming hint value maybe either lifetime hint or placement hint.
Make SCSI interpret only temperature-based write lifetime hints.

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
 drivers/scsi/sd.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Christoph Hellwig Sept. 12, 2024, 1:02 p.m. UTC | #1
On Tue, Sep 10, 2024 at 08:31:59PM +0530, Kanchan Joshi wrote:
> From: Nitesh Shetty <nj.shetty@samsung.com>
> 
> The incoming hint value maybe either lifetime hint or placement hint.

.. may either be .. ?

> Make SCSI interpret only temperature-based write lifetime hints.
> 
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
> ---
>  drivers/scsi/sd.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index dad3991397cf..82bd4b07314e 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1191,8 +1191,8 @@ static u8 sd_group_number(struct scsi_cmnd *cmd)
>  	if (!sdkp->rscs)
>  		return 0;
>  
> -	return min3((u32)rq->write_hint, (u32)sdkp->permanent_stream_count,
> -		    0x3fu);
> +	return min3((u32)WRITE_LIFETIME_HINT(rq->write_hint),

No fan of the screaming WRITE_LIFETIME_HINT.    Or the fact that multiple
things are multiplexed into the single rq->write_hint field to
start with.

This code could also use a bit of documentation already in the existing
version, but even more so now.
Kanchan Joshi Sept. 12, 2024, 4:31 p.m. UTC | #2
On 9/12/2024 6:32 PM, Christoph Hellwig wrote:
> On Tue, Sep 10, 2024 at 08:31:59PM +0530, Kanchan Joshi wrote:
>> From: Nitesh Shetty <nj.shetty@samsung.com>
>>
>> The incoming hint value maybe either lifetime hint or placement hint.
> 
> .. may either be .. ?

Sure.

>> Make SCSI interpret only temperature-based write lifetime hints.
>>
>> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
>> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
>> ---
>>   drivers/scsi/sd.c | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index dad3991397cf..82bd4b07314e 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -1191,8 +1191,8 @@ static u8 sd_group_number(struct scsi_cmnd *cmd)
>>   	if (!sdkp->rscs)
>>   		return 0;
>>   
>> -	return min3((u32)rq->write_hint, (u32)sdkp->permanent_stream_count,
>> -		    0x3fu);
>> +	return min3((u32)WRITE_LIFETIME_HINT(rq->write_hint),
> 
> No fan of the screaming WRITE_LIFETIME_HINT. 

Macros tend to. Once it becomes lowercase (inline function), it will 
stop screaming.

    Or the fact that multiple
> things are multiplexed into the single rq->write_hint field to
> start with.

Please see the response in patch #1. My worries were:
(a) adding a new field and propagating it across the stack will cause 
code duplication.
(b) to add a new field we need to carve space within inode, bio and 
request.
We had a hole in request, but it is set to vanish after ongoing 
integrity refactoring patch of Keith [1]. For inode also, there is no 
liberty at this point [2].

I think current multiplexing approach is similar to ioprio where 
multiple io priority classes/values are expressed within an int type. 
And few kernel components choose to interpret certain ioprio values at will.

And all this is still in-kernel details. Which can be changed if/when 
other factors start helping.

[1] 
https://lore.kernel.org/linux-nvme/20240911201240.3982856-2-kbusch@meta.com/
[2] 
https://lore.kernel.org/linux-nvme/20240903-erfassen-bandmitglieder-32dfaeee66b2@brauner/
Christoph Hellwig Sept. 13, 2024, 8:06 a.m. UTC | #3
On Thu, Sep 12, 2024 at 10:01:00PM +0530, Kanchan Joshi wrote:
> Please see the response in patch #1. My worries were:
> (a) adding a new field and propagating it across the stack will cause 
> code duplication.
> (b) to add a new field we need to carve space within inode, bio and 
> request.
> We had a hole in request, but it is set to vanish after ongoing 
> integrity refactoring patch of Keith [1]. For inode also, there is no 
> liberty at this point [2].
> 
> I think current multiplexing approach is similar to ioprio where 
> multiple io priority classes/values are expressed within an int type. 
> And few kernel components choose to interpret certain ioprio values at will.
> 
> And all this is still in-kernel details. Which can be changed if/when 
> other factors start helping.

Maybe part of the problem is that the API is very confusing.  A smal
part of that is of course that the existing temperature hints already
have some issues, but this seems to be taking them make it significantly
worse.

Note: this tries to include highlevel comments from the discussion of
the previous patches instead of splitting them over multiple threads.

F_{S,G}ET_RW_HINT works on arbitrary file descriptors with absolutely no
check for support by the device or file system and not check for the
file type.  That's not exactly good API design, but not really a major
because they are clearly designed as hints with a fixed number of
values, allowing the implementation to map them if not enough are
supported.

But if we increase this to a variable number of hints that don't have
any meaning (and even if that is just the rough order of the temperature
hints assigned to them), that doesn't really work.  We'll need an API
to check if these stream hints are supported and how many of them,
otherwise the applications can't make any sensible use of them.

If these aren't just stream hints of the file system but you actually
want them as an abstract API for FDP you'll also need to actually
expose even more information like the reclaim unit size, but let's
ignore that for this part of the discssion.

Back the the API: the existing lifetime hints have basically three
layers:

 1) syscall ABI
 2) the hint stored in the inode
 3) the hint passed in the bio

1) is very much fixed for the temperature API, we just need to think if
   we want to support it at the same time as a more general hints API.
   Or if we can map one into another.  Or if we can't support them at
   the same time how that is communicated.

For 2) and 3) we can use an actual union if we decide to not support
both at the same time, keyed off a flag outside the field, but if not
we simply need space for both.
diff mbox series

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index dad3991397cf..82bd4b07314e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1191,8 +1191,8 @@  static u8 sd_group_number(struct scsi_cmnd *cmd)
 	if (!sdkp->rscs)
 		return 0;
 
-	return min3((u32)rq->write_hint, (u32)sdkp->permanent_stream_count,
-		    0x3fu);
+	return min3((u32)WRITE_LIFETIME_HINT(rq->write_hint),
+			(u32)sdkp->permanent_stream_count, 0x3fu);
 }
 
 static blk_status_t sd_setup_rw32_cmnd(struct scsi_cmnd *cmd, bool write,
@@ -1390,7 +1390,8 @@  static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
 		ret = sd_setup_rw16_cmnd(cmd, write, lba, nr_blocks,
 					 protect | fua, dld);
 	} else if ((nr_blocks > 0xff) || (lba > 0x1fffff) ||
-		   sdp->use_10_for_rw || protect || rq->write_hint) {
+		   sdp->use_10_for_rw || protect ||
+		   WRITE_LIFETIME_HINT(rq->write_hint)) {
 		ret = sd_setup_rw10_cmnd(cmd, write, lba, nr_blocks,
 					 protect | fua);
 	} else {