mbox series

[v4,0/5] Write-placement hints and FDP

Message ID 20240826170606.255718-1-joshi.k@samsung.com
Headers show
Series Write-placement hints and FDP | expand

Message

Kanchan Joshi Aug. 26, 2024, 5:06 p.m. UTC
Current write-hint infrastructure supports 6 temperature-based data life
hints.
The series extends the infrastructure with a new temperature-agnostic
placement-type hint. New fcntl codes F_{SET/GET}_RW_HINT_EX allow to
send the hint type/value on file. See patch #3 commit description for
the details.

Overall this creates 128 placement hint values [*] that users can pass.
Patch #5 adds the ability to map these new hint values to nvme-specific
placement-identifiers.
Patch #4 restricts SCSI to use only life hint values.
Patch #1 and #2 are simple prep patches.

[*] While the user-interface can support more, this limit is due to the
in-kernel plumbing consideration of the inode size. Pahole showed 32-bit
hole in the inode, but the code had this comment too:

/* 32-bit hole reserved for expanding i_fsnotify_mask */

Not must, but it will be good to know if a byte (or two) can be used
here.

Changes since v3:
- 4 new patches to introduce write-placement hints
- Make nvme patch use the placement hints rather than write-life hints

Changes since v2:
- Base it on nvme-6.11 and resolve a merge conflict

Changes since v1:
- Reduce the fetched plids from 128 to 6 (Keith)
- Use struct_size for a calculation (Keith)
- Handle robot/sparse warning

Kanchan Joshi (4):
  fs, block: refactor enum rw_hint
  fcntl: rename rw_hint_* to rw_life_hint_*
  fcntl: add F_{SET/GET}_RW_HINT_EX
  nvme: enable FDP support

Nitesh Shetty (1):
  sd: limit to use write life hints

 drivers/nvme/host/core.c   | 81 ++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h   |  4 ++
 drivers/scsi/sd.c          |  7 ++--
 fs/buffer.c                |  4 +-
 fs/f2fs/f2fs.h             |  4 +-
 fs/f2fs/segment.c          |  4 +-
 fs/fcntl.c                 | 79 ++++++++++++++++++++++++++++++++++---
 include/linux/blk-mq.h     |  2 +-
 include/linux/blk_types.h  |  2 +-
 include/linux/fs.h         |  2 +-
 include/linux/nvme.h       | 19 +++++++++
 include/linux/rw_hint.h    | 20 +++++++---
 include/uapi/linux/fcntl.h | 14 +++++++
 13 files changed, 218 insertions(+), 24 deletions(-)

Comments

Kanchan Joshi Aug. 27, 2024, 5:12 a.m. UTC | #1
On 8/26/2024 11:14 PM, Bart Van Assche wrote:
> On 8/26/24 10:06 AM, Kanchan Joshi wrote:
>> Change i_write_hint (in inode), bi_write_hint (in bio) and write_hint
>> (in request) to use u8 data-type rather than this enum.
> 
> That sounds fishy to me. Why to increase the size of this enum? Why to
> reduce the ability of the compiler to perform type checking? I think
> this needs to be motivated clearly in the patch description.

Since inode/bio/request stopped using this, the __packed annotation did 
not seem to serve much purpose. But sure, I can retain the size/checks 
on the renamed enum (rw_life_hint) too.

Motivation for keeping u8 in inode/bio/request is to represent another 
hint type. This is similar to ioprio where multiple io priority 
classes/values are expressed within an int type.
Javier Gonzalez Aug. 30, 2024, 11:59 a.m. UTC | #2
On 26.08.2024 22:36, Kanchan Joshi wrote:
>Current write-hint infrastructure supports 6 temperature-based data life
>hints.
>The series extends the infrastructure with a new temperature-agnostic
>placement-type hint. New fcntl codes F_{SET/GET}_RW_HINT_EX allow to
>send the hint type/value on file. See patch #3 commit description for
>the details.
>
>Overall this creates 128 placement hint values [*] that users can pass.
>Patch #5 adds the ability to map these new hint values to nvme-specific
>placement-identifiers.
>Patch #4 restricts SCSI to use only life hint values.
>Patch #1 and #2 are simple prep patches.
>
>[*] While the user-interface can support more, this limit is due to the
>in-kernel plumbing consideration of the inode size. Pahole showed 32-bit
>hole in the inode, but the code had this comment too:
>
>/* 32-bit hole reserved for expanding i_fsnotify_mask */
>
>Not must, but it will be good to know if a byte (or two) can be used
>here.
>
>Changes since v3:
>- 4 new patches to introduce write-placement hints
>- Make nvme patch use the placement hints rather than write-life hints
>
>Changes since v2:
>- Base it on nvme-6.11 and resolve a merge conflict
>
>Changes since v1:
>- Reduce the fetched plids from 128 to 6 (Keith)
>- Use struct_size for a calculation (Keith)
>- Handle robot/sparse warning
>
>Kanchan Joshi (4):
>  fs, block: refactor enum rw_hint
>  fcntl: rename rw_hint_* to rw_life_hint_*
>  fcntl: add F_{SET/GET}_RW_HINT_EX
>  nvme: enable FDP support
>
>Nitesh Shetty (1):
>  sd: limit to use write life hints
>
> drivers/nvme/host/core.c   | 81 ++++++++++++++++++++++++++++++++++++++
> drivers/nvme/host/nvme.h   |  4 ++
> drivers/scsi/sd.c          |  7 ++--
> fs/buffer.c                |  4 +-
> fs/f2fs/f2fs.h             |  4 +-
> fs/f2fs/segment.c          |  4 +-
> fs/fcntl.c                 | 79 ++++++++++++++++++++++++++++++++++---
> include/linux/blk-mq.h     |  2 +-
> include/linux/blk_types.h  |  2 +-
> include/linux/fs.h         |  2 +-
> include/linux/nvme.h       | 19 +++++++++
> include/linux/rw_hint.h    | 20 +++++++---
> include/uapi/linux/fcntl.h | 14 +++++++
> 13 files changed, 218 insertions(+), 24 deletions(-)
>
>-- 
>2.25.1
>

Keith, Christoph, Martin

Does this approach align with the offline conversation we had arund FMS?
Comments on the list would help us move forward with this series.

We would like to move the folks that are using off-tree patches for FDP
to mainline support.

Thanks,
Javier
Kanchan Joshi Sept. 3, 2024, 2:28 p.m. UTC | #3
Hi Amir,


On 8/26/2024 10:36 PM, Kanchan Joshi wrote:
> Current write-hint infrastructure supports 6 temperature-based data life
> hints.
> The series extends the infrastructure with a new temperature-agnostic
> placement-type hint. New fcntl codes F_{SET/GET}_RW_HINT_EX allow to
> send the hint type/value on file. See patch #3 commit description for
> the details.
> 
> Overall this creates 128 placement hint values [*] that users can pass.
> Patch #5 adds the ability to map these new hint values to nvme-specific
> placement-identifiers.
> Patch #4 restricts SCSI to use only life hint values.
> Patch #1 and #2 are simple prep patches.
> 
> [*] While the user-interface can support more, this limit is due to the
> in-kernel plumbing consideration of the inode size. Pahole showed 32-bit
> hole in the inode, but the code had this comment too:
> 
> /* 32-bit hole reserved for expanding i_fsnotify_mask */
> 
> Not must, but it will be good to know if a byte (or two) can be used
> here.

Since having one extra byte will simplify things, I can't help but ask - 
do you still have the plans to use this space (in entirety) within inode?
Christian Brauner Sept. 3, 2024, 2:35 p.m. UTC | #4
On Tue, Sep 03, 2024 at 07:58:46PM GMT, Kanchan Joshi wrote:
> Hi Amir,
> 
> 
> On 8/26/2024 10:36 PM, Kanchan Joshi wrote:
> > Current write-hint infrastructure supports 6 temperature-based data life
> > hints.
> > The series extends the infrastructure with a new temperature-agnostic
> > placement-type hint. New fcntl codes F_{SET/GET}_RW_HINT_EX allow to
> > send the hint type/value on file. See patch #3 commit description for
> > the details.
> > 
> > Overall this creates 128 placement hint values [*] that users can pass.
> > Patch #5 adds the ability to map these new hint values to nvme-specific
> > placement-identifiers.
> > Patch #4 restricts SCSI to use only life hint values.
> > Patch #1 and #2 are simple prep patches.
> > 
> > [*] While the user-interface can support more, this limit is due to the
> > in-kernel plumbing consideration of the inode size. Pahole showed 32-bit
> > hole in the inode, but the code had this comment too:
> > 
> > /* 32-bit hole reserved for expanding i_fsnotify_mask */
> > 
> > Not must, but it will be good to know if a byte (or two) can be used
> > here.
> 
> Since having one extra byte will simplify things, I can't help but ask - 
> do you still have the plans to use this space (in entirety) within inode?

I just freed up 8 bytes in struct inode with what's currently in -next.
There will be no using up those 8 bytes unless it's for a good reason 
and something that is very widely useful.
Kanchan Joshi Sept. 4, 2024, 2:57 p.m. UTC | #5
On 9/3/2024 8:05 PM, Christian Brauner wrote:
> On Tue, Sep 03, 2024 at 07:58:46PM GMT, Kanchan Joshi wrote:
>> Hi Amir,
>>
>>
>> On 8/26/2024 10:36 PM, Kanchan Joshi wrote:
>>> Current write-hint infrastructure supports 6 temperature-based data life
>>> hints.
>>> The series extends the infrastructure with a new temperature-agnostic
>>> placement-type hint. New fcntl codes F_{SET/GET}_RW_HINT_EX allow to
>>> send the hint type/value on file. See patch #3 commit description for
>>> the details.
>>>
>>> Overall this creates 128 placement hint values [*] that users can pass.
>>> Patch #5 adds the ability to map these new hint values to nvme-specific
>>> placement-identifiers.
>>> Patch #4 restricts SCSI to use only life hint values.
>>> Patch #1 and #2 are simple prep patches.
>>>
>>> [*] While the user-interface can support more, this limit is due to the
>>> in-kernel plumbing consideration of the inode size. Pahole showed 32-bit
>>> hole in the inode, but the code had this comment too:
>>>
>>> /* 32-bit hole reserved for expanding i_fsnotify_mask */
>>>
>>> Not must, but it will be good to know if a byte (or two) can be used
>>> here.
>>
>> Since having one extra byte will simplify things, I can't help but ask -
>> do you still have the plans to use this space (in entirety) within inode?
> 
> I just freed up 8 bytes in struct inode with what's currently in -next.
> There will be no using up those 8 bytes unless it's for a good reason
> and something that is very widely useful.

I see, so now there are two holes. Seems the plan is to co-locate these 
and reduce the size by 8 bytes. Thanks for the pointer. Primary reason 
is a bit cleaner plumbing, but I'll manage without extra space.