mbox series

[v2,0/3] Fix dm-crypt zoned block device support

Message ID 20210417023323.852530-1-damien.lemoal@wdc.com
Headers show
Series Fix dm-crypt zoned block device support | expand

Message

Damien Le Moal April 17, 2021, 2:33 a.m. UTC
Mike,

Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector
of the zone to be written instead of the actual location sector to
write. The write location is determined by the device and returned to
the host upon completion of the operation.

This interface, while simple and efficient for writing into sequential
zones of a zoned block device, is incompatible with the use of sector
values to calculate a cypher block IV. All data written in a zone is
encrypted using an IV calculated from the first sectors of the zone,
but read operation will specify any sector within the zone, resulting
in an IV mismatch between encryption and decryption. Reads fail in that
case.

Using a single sector value (e.g. the zone start sector) for all read
and writes into a zone can solve this problem, but at the cost of
weakening the cypher chosen by the user. Emulating zone append using
regular writes would be another potential solution, but it is complex
and would add a lot of overhead.

Instead, to solve this problem, explicitly disable support for zone
append operations in dm-crypt if the target was setup using a cypher IV
mode using sector values. The null and random IV modes can still be used
with zone append operations. This lack of support for zone append is
exposed to the user by setting the dm-crypt target queue limit
max_zone_append_sectors to 0. This change is done in patches 1 and 2.

Patch 3 fixes zonefs to fall back to using regular write when
max_zone_append_sectors is 0 (Note: I can take this patch through the
zonefs tree. But since I have nothing else for an eventual rc8 and next
cycle, you can take it too. No chance of conflict).

Overall, these changes do not break user space:
1) There is no interface allowing a user to use zone append write
without a file system. So applications using directly a raw dm-crypt
device will continue working using regular write operations.
2) btrfs zoned support was added in 5.12. Anybody trying btrfs-zoned on
top of dm-crypt would have faced the read failures already. So there
are no existing deployments to preserve. Same for zonefs.

For file systems, using zone append with encryption will need to be
supported within the file system (e.g. fscrypt). In this case, cypher IV
calculation can rely for instance on file block offsets as these are
known before a zone append operation write these blocks to disk at
unknown locations.

Reviews and comments are very much welcome.

Changes from v1:
* Addressed Johannes comments by renaming the CRYPT_IV_NO_SECTORS flag
  to CRYPT_IV_ZONE_APPEND to avoid a double negation with !test_bit().
  This also clarifies that the flag is used solely to check zone append
  support.
* Removed btrfs patch (former patch 3) as David is taking that patch
  through the btrfs tree misc-next branch.
* Added reviewed-by, Fixes and Cc tags.

Damien Le Moal (3):
  dm: Introduce zone append support control
  dm crypt: Fix zoned block device support
  zonefs: fix synchronous write to sequential zone files

 drivers/md/dm-crypt.c         | 49 ++++++++++++++++++++++++++++-------
 drivers/md/dm-table.c         | 41 +++++++++++++++++++++++++++++
 fs/zonefs/super.c             | 16 +++++++++---
 fs/zonefs/zonefs.h            |  2 ++
 include/linux/device-mapper.h |  6 +++++
 5 files changed, 101 insertions(+), 13 deletions(-)

Comments

Christoph Hellwig April 19, 2021, 6:45 a.m. UTC | #1
On Sat, Apr 17, 2021 at 11:33:23AM +0900, Damien Le Moal wrote:
> Synchronous writes to sequential zone files cannot use zone append

> operations if the underlying zoned device queue limit

> max_zone_append_sectors is 0, indicating that the device does not

> support this operation. In this case, fall back to using regular write

> operations.


Zone append is a mandatory feature of the zoned device API.
Damien Le Moal April 19, 2021, 7:08 a.m. UTC | #2
On 2021/04/19 15:45, Christoph Hellwig wrote:
> On Sat, Apr 17, 2021 at 11:33:23AM +0900, Damien Le Moal wrote:
>> Synchronous writes to sequential zone files cannot use zone append
>> operations if the underlying zoned device queue limit
>> max_zone_append_sectors is 0, indicating that the device does not
>> support this operation. In this case, fall back to using regular write
>> operations.
> 
> Zone append is a mandatory feature of the zoned device API.

Yes, I am well aware of that. All physical zoned devices and null blk do support
zone append, but the logical device created by dm-crypt is out. And we cannot
simply disable zone support in dm-crypt as there are use cases out there in the
field that I am aware of, in SMR space.

So this series is a compromise: preserve dm-crypt zone support for SMR (no one
uses the zone append emulation yet, as far as I know) by disabling zone append.

For zonefs, we can:
1) refuse to mount if ZA is disabled, same as btrfs
2) Do as I did in the patch, fallback to regular writes since that is easy to do
(zonefs file size tracks the WP position already).

I chose option (2) to allow for SMR+dm-crypt to still work with zonefs.
Christoph Hellwig April 19, 2021, 7:10 a.m. UTC | #3
On Mon, Apr 19, 2021 at 07:08:46AM +0000, Damien Le Moal wrote:
> 1) refuse to mount if ZA is disabled, same as btrfs


Yes, please do that.
Mikulas Patocka April 19, 2021, 12:52 p.m. UTC | #4
On Sat, 17 Apr 2021, Damien Le Moal wrote:

> Mike,

> 

> Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector

> of the zone to be written instead of the actual location sector to

> write. The write location is determined by the device and returned to

> the host upon completion of the operation.


I'd like to ask what's the reason for this semantics? Why can't users of 
the zoned device supply real sector numbers?

> This interface, while simple and efficient for writing into sequential

> zones of a zoned block device, is incompatible with the use of sector

> values to calculate a cypher block IV. All data written in a zone is

> encrypted using an IV calculated from the first sectors of the zone,

> but read operation will specify any sector within the zone, resulting

> in an IV mismatch between encryption and decryption. Reads fail in that

> case.


I would say that it is incompatible with all dm targets - even the linear 
target is changing the sector number and so it may redirect the write 
outside of the range specified in dm-table and cause corruption.

Instead of complicating device mapper with imperfect support, I would just 
disable REQ_OP_ZONE_APPEND on device mapper at all.

Mikulas
Damien Le Moal April 19, 2021, 1:02 p.m. UTC | #5
On 2021/04/19 21:52, Mikulas Patocka wrote:
> 
> 
> On Sat, 17 Apr 2021, Damien Le Moal wrote:
> 
>> Mike,
>>
>> Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector
>> of the zone to be written instead of the actual location sector to
>> write. The write location is determined by the device and returned to
>> the host upon completion of the operation.
> 
> I'd like to ask what's the reason for this semantics? Why can't users of 
> the zoned device supply real sector numbers?

They can, if they use regular write commands :)

Zone append was designed to address sequential write ordering difficulties on
the host. Unlike regular writes which must be delivered to the device in
sequential order, zone append commands can be sent to the device in any order.
The device will process the write at the WP position when the command is
executed and return the first sector written. This command makes it easy on the
host in multi-queue environment and avoids the need for serializing commands &
locking zones for writing. So very efficient performance.

>> This interface, while simple and efficient for writing into sequential
>> zones of a zoned block device, is incompatible with the use of sector
>> values to calculate a cypher block IV. All data written in a zone is
>> encrypted using an IV calculated from the first sectors of the zone,
>> but read operation will specify any sector within the zone, resulting
>> in an IV mismatch between encryption and decryption. Reads fail in that
>> case.
> 
> I would say that it is incompatible with all dm targets - even the linear 
> target is changing the sector number and so it may redirect the write 
> outside of the range specified in dm-table and cause corruption.

DM remapping of BIO sectors is zone compatible because target entries must be
zone aligned. In the case of zone append, the BIO sector always point to the
start sector of the target zone. DM sector remapping will remap that to another
zone start as all zones are the same size. No issue here. We extensively use
dm-linear for various test environment to reduce the size of the device tested
(to speed up tests). I am confident there are no problems there.

> Instead of complicating device mapper with imperfect support, I would just 
> disable REQ_OP_ZONE_APPEND on device mapper at all.

That was my initial approach, but for dm-crypt only since other targets that
support zoned devices are fine. However, this breaks zoned block device
requirement that zone append be supported so that users are presented with a
uniform interface for different devices. So while simple to do, disabling zone
append is far from ideal.

> 
> Mikulas
> 
>
Mikulas Patocka April 19, 2021, 1:55 p.m. UTC | #6
On Mon, 19 Apr 2021, Damien Le Moal wrote:

> > I would say that it is incompatible with all dm targets - even the linear 

> > target is changing the sector number and so it may redirect the write 

> > outside of the range specified in dm-table and cause corruption.

> 

> DM remapping of BIO sectors is zone compatible because target entries must be

> zone aligned. In the case of zone append, the BIO sector always point to the

> start sector of the target zone. DM sector remapping will remap that to another

> zone start as all zones are the same size. No issue here. We extensively use

> dm-linear for various test environment to reduce the size of the device tested

> (to speed up tests). I am confident there are no problems there.

> 

> > Instead of complicating device mapper with imperfect support, I would just 

> > disable REQ_OP_ZONE_APPEND on device mapper at all.

> 

> That was my initial approach, but for dm-crypt only since other targets that

> support zoned devices are fine. However, this breaks zoned block device

> requirement that zone append be supported so that users are presented with a

> uniform interface for different devices. So while simple to do, disabling zone

> append is far from ideal.


So, we could enable it for the linear target and disable for all other 
targets?

I talked with Milan about it and he doesn't want to add more bloat to the 
crypt target. I agree with him.

Mikulas
Milan Broz April 19, 2021, 2:31 p.m. UTC | #7
On 19/04/2021 15:55, Mikulas Patocka wrote:
> 

> 

> On Mon, 19 Apr 2021, Damien Le Moal wrote:

> 

>>> I would say that it is incompatible with all dm targets - even the linear 

>>> target is changing the sector number and so it may redirect the write 

>>> outside of the range specified in dm-table and cause corruption.

>>

>> DM remapping of BIO sectors is zone compatible because target entries must be

>> zone aligned. In the case of zone append, the BIO sector always point to the

>> start sector of the target zone. DM sector remapping will remap that to another

>> zone start as all zones are the same size. No issue here. We extensively use

>> dm-linear for various test environment to reduce the size of the device tested

>> (to speed up tests). I am confident there are no problems there.

>>

>>> Instead of complicating device mapper with imperfect support, I would just 

>>> disable REQ_OP_ZONE_APPEND on device mapper at all.

>>

>> That was my initial approach, but for dm-crypt only since other targets that

>> support zoned devices are fine. However, this breaks zoned block device

>> requirement that zone append be supported so that users are presented with a

>> uniform interface for different devices. So while simple to do, disabling zone

>> append is far from ideal.

> 

> So, we could enable it for the linear target and disable for all other 

> targets?

> 

> I talked with Milan about it and he doesn't want to add more bloat to the 

> crypt target. I agree with him.


This is all fine even for dm-crypt IF the tweaking is unique for the sector position
(it can be something just derived from the sector offset in principle).

For FDE, we must never allow writing sectors to different positions with the same
tweak (IV) and key - there are real attacks based on this issue.

So zones can do any recalculation and reshuffling it wants if sector tweak
in dm-crypt is unique.
(Another solution would be to use different keys for different areas, but that
is not possible with dm-crypt or FDE in general, but fs encryption can do that.)

If you want dm-crypt to support zones properly, there is a need for emulation
of the real sector offset - because that is what IV expects now.

And I think such emulation should be in DM core, not in dm-crypt itself, because other
targets can need the same functionality (I guess that dm-integrity journal
has a problem with that already, Mikulas will know more).

For online reencryption we also use multiple targets in the table that dynamically moves
(linear combined with dm-crypt), so dm-crypt must support all commands as
dm-linear to make this work.

I hope I understand the problem correctly; all I want is to so avoid patching
the wrong place (dmcrypt crypto) because that problem will appear elsewhere later.
Also for security it would be nice to not add exceptions to encryption
code - it is always recipe for disaster.

Milan
Douglas Gilbert April 20, 2021, 1:20 a.m. UTC | #8
On 2021-04-19 2:45 a.m., Christoph Hellwig wrote:
> On Sat, Apr 17, 2021 at 11:33:23AM +0900, Damien Le Moal wrote:

>> Synchronous writes to sequential zone files cannot use zone append

>> operations if the underlying zoned device queue limit

>> max_zone_append_sectors is 0, indicating that the device does not

>> support this operation. In this case, fall back to using regular write

>> operations.

> 

> Zone append is a mandatory feature of the zoned device API.


So a hack required for ZNS and not needed by ZBC and ZAC becomes
a "mandatory feature" in a Linux API. Like many hacks, that one might
come back to bite you :-)

Doug Gilbert
Damien Le Moal April 20, 2021, 1:35 a.m. UTC | #9
On 2021/04/20 10:20, Douglas Gilbert wrote:
> On 2021-04-19 2:45 a.m., Christoph Hellwig wrote:
>> On Sat, Apr 17, 2021 at 11:33:23AM +0900, Damien Le Moal wrote:
>>> Synchronous writes to sequential zone files cannot use zone append
>>> operations if the underlying zoned device queue limit
>>> max_zone_append_sectors is 0, indicating that the device does not
>>> support this operation. In this case, fall back to using regular write
>>> operations.
>>
>> Zone append is a mandatory feature of the zoned device API.
> 
> So a hack required for ZNS and not needed by ZBC and ZAC becomes
> a "mandatory feature" in a Linux API. Like many hacks, that one might
> come back to bite you :-)

Zone append is not a hack in ZNS. It is a write interface that fits very well
with the multi-queue nature of NVMe. The "hack" is the emulation in scsi.

We decided on having this mandatory for zoned devices (all types) to make sure
that file systems do not have to implement different IO paths for sequential
writing to zones. Zone append does simplify a lot of things and allows to get
the best performance from ZNS drives. Zone write locking/serialization of writes
per zones using regular writes is much harder to implement, make a mess of the
file system code, and would kill write performance on ZNS.