[REGRESSION] dm_crypt essiv ciphers do not use async driver mv-aes-cbc anymore

Message ID 53f57de2-ef58-4855-bb3c-f0d54472dc4d@yuka.dev
State New

Commit Message

Yureka Sept. 29, 2023, 9:08 p.m. UTC
#regzbot introduced: 7bcb2c99f8ed

I am running the NixOS distribution cross-compiled from x86_64 to a Marvell Armada 388 armv7 SoC.

I am not getting expected speeds when reading/writing on my encrypted hard drive with 6.5.5, while it is fast on 5.4.257. Volume is formatted like this: `cryptsetup luksFormat -c aes-cbc-essiv:sha256 /dev/sda`.

Specifically, I tracked this down to the changes to crypto/essiv.c from 7bcb2c99f8ed mentioned above. Reverting those changes on top of a 6.5.5 kernel restores the expected speeds (see the applicable diff further below).

I'm *guessing* that this is related to the mv-cbc-aes crypto driver (from the marvell-cesa module) being registered as async (according to /proc/crypto), and I *suspect* that async drivers are not being used anymore by essiv or dm_crypt. Going by the commit description, which sounds more like a refactor, this does not seem intentional.

I would appreciate it a lot if someone more experienced with the crypto subsystem could have a look at this.


Greetings,

Yureka


---
Some more detailed information


Excerpt from /proc/crypto in 5.4.257:

name         : essiv(cbc(aes),sha256)
driver       : essiv(mv-cbc-aes,sha256-generic)
async        : yes

Speeds of 130 MB/s can be observed.

In 5.10.197 (the first branch that has this issue and is still receiving updates):

name         : essiv(cbc(aes),sha256)
driver       : essiv(cbc(aes-arm),sha256-generic)
async        : no

Speeds are less than half, 55 MB/s.
The other listings in /proc/crypto seem to be unchanged. mv-cbc-aes is still the highest-priority driver providing cbc(aes).

Comments

Linux regression tracking (Thorsten Leemhuis) Nov. 1, 2023, 12:04 p.m. UTC | #1
On 30.09.23 00:43, Eric Biggers wrote:
> On Fri, Sep 29, 2023 at 11:08:55PM +0200, Yureka wrote:
>>
>> I am running the NixOS distribution cross-compiled from x86_64 to a Marvell
>> Armada 388 armv7 SoC.
>>
>> I am not getting expected speeds when reading/writing on my encrypted hard
>> drive with 6.5.5, while it is fast on 5.4.257. Volume is formatted like this:
>> `cryptsetup luksFormat -c aes-cbc-essiv:sha256 /dev/sda`.
>>
>> Specifically, I tracked this down to the changes to crypto/essiv.c from
>> 7bcb2c99f8ed mentioned above. Reverting those changes on top of a 6.5.5 kernel
>> provides working (see applicable diff further below).
>>
>> I'm *guessing* that this is related to the mv-aes-cbc crypto driver (from the
>> marvell-cesa module) being registered as async (according to /proc/crypto),
>> and I *suspect* that async drivers are not being used anymore by essiv or
>> dm_crypt. Going by the commit description, which sounds more like a refactor,
>> this does not seem intentional.
> 
> This is actually from commit b8aa7dc5c753 ("crypto: drivers - set the flag
> CRYPTO_ALG_ALLOCATES_MEMORY"), which set CRYPTO_ALG_ALLOCATES_MEMORY in
> marvell-cesa.  7bcb2c99f8ed is just one of the prerequisite commits.
> 
> I understand that the dm-crypt developers did this as an intentional bug fix in
> order to prevent dm-crypt from using crypto drivers that are known to cause
> deadlocks due to allocating memory during requests.
> 
> If you are interested in still being able to use marvell-cesa with dm-crypt, I
> believe it would need to be fixed to meet the requirements for not needing
> CRYPTO_ALG_ALLOCATES_MEMORY.  I've Cc'ed the maintainers of that driver.
> 
> #regzbot introduced: b8aa7dc5c753

BTW: Eric, thx for this.

Boris, Arnaud, Srujana, and Mikulas, could you maybe comment on this? I
understand that this is not some everyday regression due to deadlock
risk, but it nevertheless would be good to get this resolved somehow to
stay in line with our "no regressions" rule.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke
Linux regression tracking (Thorsten Leemhuis) Nov. 11, 2023, 11:58 a.m. UTC | #2
On 01.11.23 13:47, Mikulas Patocka wrote:
> On Wed, 1 Nov 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
> 
>>> #regzbot introduced: b8aa7dc5c753
>>
>> BTW: Eric, thx for this.
>>
>> Boris, Arnaud, Srujana, and Mikulas, could you maybe comment on this? I
>> understand that this is not some everyday regression due to deadlock
>> risk, but it nevertheless would be good to get this resolved somehow to
>> stay in line with our "no regressions" rule.
>>
> 
> The driver drivers/crypto/marvell/cesa/cipher.c uses GFP_ATOMIC 
> allocations (see mv_cesa_skcipher_dma_req_init). So, it is not really safe 
> to use it for dm-crypt.
> 
> GFP_ATOMIC allocations may fail anytime (for example, they will fail if 
> the machine receives too many network packets in a short timeframe and 
> runs temporarily out of memory). And when the GFP_ATOMIC allocation fails, 
> you get a write I/O error and data corruption.
> 
> It could be possible to change it to use GFP_NOIO allocations, then we 
> would risk deadlock instead of data corruption. The best thing would be to 
> convert the driver to use mempools.

Thx, now I understand things better. I also had a small hope that my
prodding here might motivate someone to look into this, but that didn't
happen. Well, that's how it is.

I'm not totally sure if this regression was handled like Linus would
have wanted it to be handled. But I guess it's not worth bringing him in
-- among others because it took so long for somebody to complain. I'll
thus stop tracking this now.

#regzbot resolve: tricky situation that remains unresolved for now

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
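
Mikulas's point above (a GFP_ATOMIC allocation can fail at any time, while a mempool keeps a reserve) can be illustrated with a small user-space sketch. This is not the marvell-cesa code and not the kernel mempool API; `struct pool`, `pool_init`, `pool_alloc`, `pool_free`, `pool_destroy` and `failing_alloc` are hypothetical names made up for the illustration. The only idea carried over is that elements are preallocated, so a request can still be served when the normal allocator fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for a mempool: a fixed reserve of
 * equally sized elements that is allocated up front. */
struct pool {
    void **reserve;    /* stack of preallocated elements */
    size_t nr_free;    /* elements currently held in the reserve */
    size_t min_nr;     /* capacity of the reserve */
    size_t elem_size;  /* size of each element in bytes */
};

/* Release everything still held in the reserve. */
void pool_destroy(struct pool *p)
{
    while (p->nr_free > 0)
        free(p->reserve[--p->nr_free]);
    free(p->reserve);
    p->reserve = NULL;
}

/* Preallocate min_nr elements; returns 0 on success, -1 on failure. */
int pool_init(struct pool *p, size_t min_nr, size_t elem_size)
{
    assert(min_nr > 0 && elem_size > 0);
    p->nr_free = 0;
    p->min_nr = min_nr;
    p->elem_size = elem_size;
    p->reserve = calloc(min_nr, sizeof(*p->reserve));
    if (!p->reserve)
        return -1;
    for (size_t i = 0; i < min_nr; i++) {
        void *e = malloc(elem_size);
        if (!e) {
            pool_destroy(p);
            return -1;
        }
        p->reserve[p->nr_free++] = e;
    }
    return 0;
}

/* Try the normal allocator first and fall back to the reserve.
 * Returning NULL when the reserve is empty is a simplification of this
 * sketch; a real mempool can instead wait for an element to come back. */
void *pool_alloc(struct pool *p, void *(*try_alloc)(size_t))
{
    void *e = try_alloc(p->elem_size);
    if (e)
        return e;
    if (p->nr_free > 0)
        return p->reserve[--p->nr_free];
    return NULL;
}

/* Refill the reserve before handing memory back to the allocator. */
void pool_free(struct pool *p, void *e)
{
    if (!e)
        return;
    if (p->nr_free < p->min_nr)
        p->reserve[p->nr_free++] = e;
    else
        free(e);
}

/* Simulates memory pressure: the normal allocation path always fails. */
void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```

With a reserve of two elements and `failing_alloc` as the allocator, two requests still succeed and a third does not until one element is returned with `pool_free`. As I understand it, the kernel's mempool gives the same forward-progress guarantee by letting a caller that is allowed to sleep wait for a returned element, which is why it is suggested over both GFP_ATOMIC and GFP_NOIO.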
Patch

diff that restores the working state on recent kernels (tested with 6.5.5):


diff --git a/crypto/essiv.c b/crypto/essiv.c
index 85bb624e32b9..8d57245add54 100644
--- a/crypto/essiv.c
+++ b/crypto/essiv.c
@@ -471,7 +471,7 @@  static int essiv_create(struct crypto_template *tmpl, struct rtattr **tb)
                return PTR_ERR(shash_name);
 
        type = algt->type & algt->mask;
-       mask = crypto_algt_inherited_mask(algt);
+       mask = (algt->type ^ CRYPTO_ALG_ASYNC) & algt->mask & CRYPTO_ALG_ASYNC;
 
        switch (type) {
        case CRYPTO_ALG_TYPE_SKCIPHER:
@@ -530,7 +530,7 @@  static int essiv_create(struct crypto_template *tmpl, struct rtattr **tb)
        /* Synchronous hash, e.g., "sha256" */
        _hash_alg = crypto_alg_mod_lookup(shash_name,
                                          CRYPTO_ALG_TYPE_SHASH,
-                                         CRYPTO_ALG_TYPE_MASK | mask);
+                                         CRYPTO_ALG_TYPE_MASK);
        if (IS_ERR(_hash_alg)) {
                err = PTR_ERR(_hash_alg);
                goto out_drop_skcipher;
@@ -562,12 +562,7 @@  static int essiv_create(struct crypto_template *tmpl, struct rtattr **tb)
                     hash_alg->base.cra_driver_name) >= CRYPTO_MAX_ALG_NAME)
                goto out_free_hash;
 
-       /*
-        * hash_alg wasn't gotten via crypto_grab*(), so we need to inherit its
-        * flags manually.
-        */
-       base->cra_flags        |= (hash_alg->base.cra_flags &
-                                  CRYPTO_ALG_INHERITED_FLAGS);
+       base->cra_flags         = block_base->cra_flags & CRYPTO_ALG_ASYNC;
        base->cra_blocksize     = block_base->cra_blocksize;
        base->cra_ctxsize       = sizeof(struct essiv_tfm_ctx);
        base->cra_alignmask     = block_base->cra_alignmask;
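
The effect of the two mask expressions in the diff can be reproduced outside the kernel. The following is a minimal user-space sketch, not kernel code. It assumes that an implementation matches a lookup when it agrees with the requested type on every bit of the mask, that crypto_algt_inherited_mask() expands to the expression in `mask_inherited`, and that dm-crypt asks for algorithms with CRYPTO_ALG_ALLOCATES_MEMORY in the mask and cleared in the type. The flag values are written from memory of include/linux/crypto.h, and the priorities and the helper names (`struct alg`, `pick`, `candidates`, `mask_async_only`, `mask_inherited`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as recalled from include/linux/crypto.h; treat them as
 * illustrative rather than authoritative. */
#define ALG_ASYNC            0x00000080u
#define ALG_NEED_FALLBACK    0x00000100u
#define ALG_ALLOCATES_MEMORY 0x00010000u
#define ALG_INHERITED_FLAGS \
    (ALG_ASYNC | ALG_NEED_FALLBACK | ALG_ALLOCATES_MEMORY)

struct alg {
    const char *driver;
    uint32_t flags;
    int priority;
};

/* The two cbc(aes) providers from the report; priorities are made up,
 * only their order matters here. */
const struct alg candidates[] = {
    { "mv-cbc-aes",   ALG_ASYNC | ALG_ALLOCATES_MEMORY, 300 },
    { "cbc(aes-arm)", 0,                                200 },
};

/* Highest-priority entry that agrees with `type` on every bit in `mask`. */
const struct alg *pick(const struct alg *algs, size_t n,
                       uint32_t type, uint32_t mask)
{
    const struct alg *best = NULL;

    assert(algs != NULL);
    for (size_t i = 0; i < n; i++) {
        if ((algs[i].flags ^ type) & mask)
            continue; /* differs on a bit the caller cares about */
        if (!best || algs[i].priority > best->priority)
            best = &algs[i];
    }
    return best;
}

/* Mask restored by the revert: only the ASYNC requirement is forwarded. */
uint32_t mask_async_only(uint32_t type, uint32_t mask)
{
    return (type ^ ALG_ASYNC) & mask & ALG_ASYNC;
}

/* Assumed expansion of crypto_algt_inherited_mask(): every inherited flag
 * the caller requires to be off is forwarded to the inner lookup. */
uint32_t mask_inherited(uint32_t type, uint32_t mask)
{
    return (type ^ ALG_INHERITED_FLAGS) & mask & ALG_INHERITED_FLAGS;
}
```

For a request with type 0 and mask `ALG_ALLOCATES_MEMORY`, `mask_async_only` yields 0, so `pick(candidates, 2, 0, 0)` returns mv-cbc-aes, as in the 5.4.257 listing. `mask_inherited` yields `ALG_ALLOCATES_MEMORY`, which filters out mv-cbc-aes and leaves cbc(aes-arm), as in the 5.10.197 listing. That is consistent with Eric's explanation that the behaviour change comes from the driver being flagged, with the essiv change only forwarding the restriction.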