Message ID | 20240429134942.2873253-1-aaron.toponce@gmail.com |
---|---|
State | New |
Headers | show |
Series | random: add chacha8_block and swtich the rng to it | expand |
On Mon, Apr 29, 2024 at 07:48:49AM -0600, Aaron Toponce wrote: > According to Jean-Philippe Aumasson in his paper "Too Much Crypto" [1]: > > > "The best result on ChaCha is a key recovery attack on the 7-round version > > with 2^237.7 time complexity using output data from 2^96 instances of ChaCha, > > that is, 2^105 bytes of data." > > He then proposes that ChaCha use 8 rounds instead of 20, providing a 2.5x > speed-up. As such, this patch adds chacha8_block and chacha12_block and switches > the RNG from ChaCha20 to ChaCha8 to take advantage of that efficiency without > sacrificing security. > I don't think there is consensus on ChaCha8 being recommended. Adiantum uses ChaCha12, but even that received some pushback. The Linux RNG is also usually used only for small amounts of data, and its security (and the perception of its security) is extremely important. So just staying with ChaCha20 seems appropriate. Note also that currently the Linux RNG is using a portable C implementation of ChaCha20. If there is actually a desire to accelerate large reads (which again, aren't the main use case of the Linux RNG), it would be possible to use a SIMD implementation of ChaCha20, which already exists in the kernel. That would speed up ChaCha20 by roughly 2-5x depending on the CPU. - Eric
On Mon, Apr 29, 2024 at 08:11:05PM -0700, Eric Biggers wrote: > I don't think there is consensus on ChaCha8 being recommended. Adiantum uses > ChaCha12, but even that received some pushback. > > The Linux RNG is also usually used only for small amounts of data, and its > security (and the perception of its security) is extremely important. > > So just staying with ChaCha20 seems appropriate. The 7-round attack does indeed fall short of the required 256-bits of security per the stated goals of the ChaCha stream cipher, coming in at ~237 bits. However, this attack is not catastrophic and of theoretical interest only. It's well outside of practical reach. The 8-round version however reaches our required security goal and is currently unbroken. An interesting note in that paper is how we got ChaCha20 to begin with. ChaCha is an evolution of Salsa20 which was included in the final eSTREAM portfolio [1] The final eSTREAM portfolio recommends Salsa20/12, which is the 12 round Salsa20. but with better diffusion [2]. In the "Too Much Crypto" paper, it states [3]: > "Regarding ChaCha, the eSTREAM actually recommended Salsa20/12, or ChaCha's > predecessor with 12 rounds instead of 20, but ChaCha was de facto standardized > with 20 rounds." ChaCha20 is 13 additional rounds for extra security margin, more than have been demonstrated for ChaCha to be secure. [1]: https://www.ecrypt.eu.org/stream/ [2]: https://cr.yp.to/chacha/chacha-20080128.pdf [3]: https://eprint.iacr.org/2019/1492 The reduced-round analysis of ChaCha is actually *better* than Salsa20. Salsa20/8 has a known attack complexity with ~249 bits and Salsa20/7 has a known attack complexity of ~153 bits. No known attacks exist against ChaCha8, and the complexity against ChaCha7 is ~237 bits. This demonstrates to me that ChaCha's security is very robust, and ChaCha8 is solid choice for a CSPRNG. > Note also that currently the Linux RNG is using a portable C implementation of > ChaCha20. If there is actually a desire to accelerate large reads (which > again, aren't the main use case of the Linux RNG), it would be possible to use > a SIMD implementation of ChaCha20, which already exists in the kernel. That > would speed up ChaCha20 by roughly 2-5x depending on the CPU. If ChaCha8 makes us uncomfortable, even though defensible, ChaCha12 is a good compromise. As you mentioned, Google implemented ChaCha12 in Adiantum. It offers a 1.67x speedup over ChaCha20 while still providing 5 additional rounds of security over the best known attack.
On Mon, Apr 29, 2024 at 10:41:10PM -0600, Aaron Toponce wrote: > > Note also that currently the Linux RNG is using a portable C > > implementation of ChaCha20. If there is actually a desire to > > accelerate large reads (which again, aren't the main use case of > > the Linux RNG), it would be possible to use a SIMD implementation > > of ChaCha20, which already exists in the kernel. That would speed > > up ChaCha20 by roughly 2-5x depending on the CPU. > > If ChaCha8 makes us uncomfortable, even though defensible, ChaCha12 > is a good compromise. As you mentioned, Google implemented ChaCha12 > in Adiantum. It offers a 1.67x speedup over ChaCha20 while still > providing 5 additional rounds of security over the best known > attack. I'm not sure I see the point of trying to accelerate the Linux RNG. Sure, doing "dd if=/dev/urandom" is *fun*, but what's the real world use case where this actually matters? The kernel RNG is meant for key generation, where a much larger safety margin is a good thing, and where absolute performance is generally not a big deal. After all, with key generation generally you are also performing some kind of assymetric key crypto as part of the IKE or TLS setup, and the time to generate the session key is in the noise. And if you are trying to wipe a disk, using something like shred(1) is really the right tool. Ultimately this boils down to a risk/benefit tradeoff. I judge the risk that you are a shill sent by a nation-state security agency ala Jia Tan of xz infamy, trying to weaken Linux's RNG to be very low; but what's the benefit? If the risk is low, but the benefit is also low, maybe it's not worth it? Cheers, - Ted
On Tue, Apr 30, 2024 at 12:26:32PM -0400, Theodore Ts'o wrote: > I'm not sure I see the point of trying to accelerate the Linux RNG. > Sure, doing "dd if=/dev/urandom" is *fun*, but what's the real world > use case where this actually matters? The kernel RNG is meant for key > generation, where a much larger safety margin is a good thing, and > where absolute performance is generally not a big deal. The goal is just to make the CSPRNG more efficient without sacrificing security. Of course most reads will be small for cryptographic keys. ChaCha8 means even those small reads will be 2.5x more efficient than ChaCha20. The dd(1) example was just to demonstrate the efficiency, not to be "fun". > I judge the risk that you are a shill sent by a nation-state security agency > ala Jia Tan of xz infamy, trying to weaken Linux's RNG to be very low; Unlike Jia Tan, my name is not anonymous. I've been very public and transparent about who I am, the software I work on, the security research I've participated in, and the communities I involve myself in. I don't work for a nation state nor am I interested in compromising the kernel RNG. In fact, I work for a local ISP out of Salt Lake City, Utah where we provide a web hosting product with KVM. We are very interested in a secure Linux stack as our business depends on it. You and I have also had email communication about the kernel RNG in the paste. I've also interacted with Jason Donenfeld about the RNG and putting together a document on the evolution of the RNG from 1.3.30 to current. I'll ignore the attempeted ad hominem. I understand the uneasy feeling due to the xz(1) backdoor and the kneejerk reactions to not trust anyone with proposals that might seem radical.
So first of all, my apologies for giving you offense. I really didn't think you were a shill for the NSA or the MSS, but I have to admit that when I get large set of patches which removes "unnecessary" code, which is _technically_ safe, but which reduces the safety margin, I find myself wondering whether it's part of a binary payload. (This is especially when I get patches from someone that I don't normally receive patches from.) Unfortunately, in the wake of the xz hack, we're just all going to have to be a lot more careful. On Tue, Apr 30, 2024 at 10:44:09AM -0600, Aaron Toponce wrote: > > The goal is just to make the CSPRNG more efficient without sacrificing security. > Of course most reads will be small for cryptographic keys. ChaCha8 means even > those small reads will be 2.5x more efficient than ChaCha20. The dd(1) example > was just to demonstrate the efficiency, not to be "fun". This is a philosophical question; are we going for maximum efficiency, or maximum safety so long as it meets the performance requirements for the intended use case? From an academic perspective, or if a cryptographer is designing cipher for a NIST competition, there's a strong desire for maximum efficiency, since that's one of the metrics used in the competition. But for the Linux RNG, my bias is to go for safety, since we're not competing on who can do the fast bulk encryption, but "sufficiently fast for keygen". People of good will can disagree on what the approach should be. I tend to have much of a pragmatic engineer's perspective. It's been said that the Empire State Building is overbuilt by a factor of 10, but that doesn't bother me. People are now saying that perhaps the Francis Scott Key bridge, when it is rebuilt, should have more safety margin, since container ships have gotten so much bigger. (And apparently, cheap sh*t diesel fuel that is contaminated and the ship owners buy fuel from the lowest bidder.) Or we can talk about how Boeing has been trying to cheap-out on plane manufacturing to save $$$; but I think you get the point of where I'm coming from. I'm not a big fan of trimming safety margins and making things more efficient for it's own sake. (At least in the case of Boeing, the CEO at least got paid $22/million a year, so at least there's that. :-) Now, if this is actually impacting the TLS connection termination for a Facebook or Bing or Google's front end web server, then great, we can try to optimize it. But if it's not a bottleneck, what's the point? Making change for change's sake, especially when it's reducing safety margins, is just one of those things that I find really hard to get excited about. Cheers, - Ted
Hey Aaron, There are probably better ways of speeding this up (e.g. my vDSO work, which should be coming back soon) than just removing rounds and hoping for the best. The problem is that there's extremely broad consensus that ChaCha20 is good at what it does. There's much less so for ChaCha8. JP's _probably_ right, and it all seems like a sensible risk analysis...maybe...but also, why play with fire? Is it really worth it? I don't think there's much harm done in being really conservative about all this. Another consideration with the RNG is that most everybody else's crypto relies on the RNG being good. If some consumer of the RNG wants to use single DES, so be it. If another consumer wants to use a cascade of ChaCha20 and AES and Serpent and Keccak for something, okay. Those aren't our choices. But we shouldn't prevent those choices by weakening the RNG. So while it *might* be kinda overkill, there's also broad consensus that what we've got is *definitely* sufficient for all uses. At the same time, it's still pretty darn fast, there exist other ways to make it faster, and I don't think it's /overly/ much. Jason
My 2 cents: As a cryptanalyst, having discovered the 2008 attack on ChaCha that's only been slightly improved in 16 years: the 20-round ChaCha20is a clear waste of CPU cycles, but ChaCha8 is admittedly risky, though more in terms of PR than pure crypto merits (plus, afaiu the threat model of ChaCha in the Linux PRNG doesnt allow the kind of chosen-IV "attack" known to work on reduced-round versions). Switching from ChaCha20 to ChaCha12 might still raise eyebrows but I dont think any respectable crypto/security expert will suspect a JiaTan situation. On Wed, May 1, 2024 at 2:28 PM Theodore Ts'o <tytso@mit.edu> wrote: > > So first of all, my apologies for giving you offense. I really didn't > think you were a shill for the NSA or the MSS, but I have to admit > that when I get large set of patches which removes "unnecessary" code, > which is _technically_ safe, but which reduces the safety margin, I > find myself wondering whether it's part of a binary payload. (This is > especially when I get patches from someone that I don't normally > receive patches from.) Unfortunately, in the wake of the xz hack, > we're just all going to have to be a lot more careful. > > On Tue, Apr 30, 2024 at 10:44:09AM -0600, Aaron Toponce wrote: > > > > The goal is just to make the CSPRNG more efficient without sacrificing security. > > Of course most reads will be small for cryptographic keys. ChaCha8 means even > > those small reads will be 2.5x more efficient than ChaCha20. The dd(1) example > > was just to demonstrate the efficiency, not to be "fun". > > This is a philosophical question; are we going for maximum efficiency, > or maximum safety so long as it meets the performance requirements for > the intended use case? From an academic perspective, or if a > cryptographer is designing cipher for a NIST competition, there's a > strong desire for maximum efficiency, since that's one of the metrics > used in the competition. But for the Linux RNG, my bias is to go for > safety, since we're not competing on who can do the fast bulk > encryption, but "sufficiently fast for keygen". > > People of good will can disagree on what the approach should be. I > tend to have much of a pragmatic engineer's perspective. It's been > said that the Empire State Building is overbuilt by a factor of 10, > but that doesn't bother me. People are now saying that perhaps the > Francis Scott Key bridge, when it is rebuilt, should have more safety > margin, since container ships have gotten so much bigger. (And > apparently, cheap sh*t diesel fuel that is contaminated and the ship > owners buy fuel from the lowest bidder.) > > Or we can talk about how Boeing has been trying to cheap-out on plane > manufacturing to save $$$; but I think you get the point of where I'm > coming from. I'm not a big fan of trimming safety margins and making > things more efficient for it's own sake. (At least in the case of > Boeing, the CEO at least got paid $22/million a year, so at least > there's that. :-) > > Now, if this is actually impacting the TLS connection termination for > a Facebook or Bing or Google's front end web server, then great, we > can try to optimize it. But if it's not a bottleneck, what's the > point? Making change for change's sake, especially when it's reducing > safety margins, is just one of those things that I find really hard to > get excited about. > > Cheers, > > - Ted
On Wed, May 01, 2024 at 02:38:52PM +0200, Jean-Philippe Aumasson wrote: > Switching from ChaCha20 to ChaCha12 might still raise eyebrows but I > dont think any respectable crypto/security expert will suspect a > JiaTan situation. I also mentioned this earlier in the thread; that is, to switch to ChaCha12 if ChaCha8 makes us uncomfortable. It's not without precedent also: - eSTREAM recommends Salsa20/12 in their final portfolio - Adiantum uses XChaCha12 - Rust uses ChaCha12 rand::rngs::StdRng There may be other precedent of ChaCha12 with from non-trivial projects I'm unfamiliar with.
On Wed, May 01, 2024 at 02:21:35PM +0200, Jason A. Donenfeld wrote: > There are probably better ways of speeding this up (e.g. my vDSO work, > which should be coming back soon) than just removing rounds and hoping > for the best. > > The problem is that there's extremely broad consensus that ChaCha20 is > good at what it does. There's much less so for ChaCha8. JP's _probably_ > right, and it all seems like a sensible risk analysis...maybe...but > also, why play with fire? Is it really worth it? I don't think there's > much harm done in being really conservative about all this. > > Another consideration with the RNG is that most everybody else's crypto > relies on the RNG being good. If some consumer of the RNG wants to use > single DES, so be it. If another consumer wants to use a cascade of > ChaCha20 and AES and Serpent and Keccak for something, okay. Those > aren't our choices. But we shouldn't prevent those choices by weakening > the RNG. > > So while it *might* be kinda overkill, there's also broad consensus that > what we've got is *definitely* sufficient for all uses. At the same > time, it's still pretty darn fast, there exist other ways to make it > faster, and I don't think it's /overly/ much. ChaCha20 reminds me of cascading encryption actually. That's a good analogy. VeraCrypt offers several cascading options choices, such as AES(Twofish), AES(Twofish(Serpent)), Kuzneychik(Serpent(Camellia)), etc. While there isn't anything technically wrong with the approach, most security-minded folks would agree it's overkill. Using just AES, or just Twofish, or just Serpent is more than sufficent. ChaCha20 is kind of like that. It's extra security because "just in case". ChaCha8 is certainly aggressive. As another analogy, it's a 10 character random password. While a 10 character password hashed with MD5 is *probably* fine at 65 bits, 13 random characters (80 bits) would definitely be safer. But 20 random characters (128 bits) is certainly overkill to protect against even the most well-funded orgs with distributed GPU resources cracking password hashes. ChaCha12 seems like a good compromise. It's 5 rounds of security away from the latest known attack while also providing a solid performance improvement. Cheers,
diff --git a/drivers/char/random.c b/drivers/char/random.c index 2597cb43f438..2e14a30b795f 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -302,7 +302,7 @@ static void crng_fast_key_erasure(u8 key[CHACHA_KEY_SIZE], chacha_init_consts(chacha_state); memcpy(&chacha_state[4], key, CHACHA_KEY_SIZE); memset(&chacha_state[12], 0, sizeof(u32) * 4); - chacha20_block(chacha_state, first_block); + chacha8_block(chacha_state, first_block); memcpy(key, first_block, CHACHA_KEY_SIZE); memcpy(random_data, first_block + CHACHA_KEY_SIZE, random_data_len); @@ -388,13 +388,13 @@ static void _get_random_bytes(void *buf, size_t len) while (len) { if (len < CHACHA_BLOCK_SIZE) { - chacha20_block(chacha_state, tmp); + chacha8_block(chacha_state, tmp); memcpy(buf, tmp, len); memzero_explicit(tmp, sizeof(tmp)); break; } - chacha20_block(chacha_state, buf); + chacha8_block(chacha_state, buf); if (unlikely(chacha_state[12] == 0)) ++chacha_state[13]; len -= CHACHA_BLOCK_SIZE; @@ -444,7 +444,7 @@ static ssize_t get_random_bytes_user(struct iov_iter *iter) } for (;;) { - chacha20_block(chacha_state, block); + chacha8_block(chacha_state, block); if (unlikely(chacha_state[12] == 0)) ++chacha_state[13]; diff --git a/include/crypto/chacha.h b/include/crypto/chacha.h index b3ea73b81944..64c45121c69a 100644 --- a/include/crypto/chacha.h +++ b/include/crypto/chacha.h @@ -8,8 +8,7 @@ * * The ChaCha paper specifies 20, 12, and 8-round variants. In general, it is * recommended to use the 20-round variant ChaCha20. However, the other - * variants can be needed in some performance-sensitive scenarios. The generic - * ChaCha code currently allows only the 20 and 12-round variants. + * variants can be needed in some performance-sensitive scenarios. */ #ifndef _CRYPTO_CHACHA_H @@ -31,11 +30,22 @@ #define XCHACHA_IV_SIZE 32 void chacha_block_generic(u32 *state, u8 *stream, int nrounds); + static inline void chacha20_block(u32 *state, u8 *stream) { chacha_block_generic(state, stream, 20); } +static inline void chacha12_block(u32 *state, u8 *stream) +{ + chacha_block_generic(state, stream, 12); +} + +static inline void chacha8_block(u32 *state, u8 *stream) +{ + chacha_block_generic(state, stream, 8); +} + void hchacha_block_arch(const u32 *state, u32 *out, int nrounds); void hchacha_block_generic(const u32 *state, u32 *out, int nrounds); diff --git a/lib/crypto/chacha.c b/lib/crypto/chacha.c index b748fd3d256e..15e773629f1d 100644 --- a/lib/crypto/chacha.c +++ b/lib/crypto/chacha.c @@ -18,7 +18,7 @@ static void chacha_permute(u32 *x, int nrounds) int i; /* whitelist the allowed round counts */ - WARN_ON_ONCE(nrounds != 20 && nrounds != 12); + WARN_ON_ONCE(nrounds != 20 && nrounds != 12 && nrounds != 8); for (i = 0; i < nrounds; i += 2) { x[0] += x[4]; x[12] = rol32(x[12] ^ x[0], 16); @@ -67,7 +67,7 @@ static void chacha_permute(u32 *x, int nrounds) * chacha_block_generic - generate one keystream block and increment block counter * @state: input state matrix (16 32-bit words) * @stream: output keystream block (64 bytes) - * @nrounds: number of rounds (20 or 12; 20 is recommended) + * @nrounds: number of rounds (20, 12, or 8; 20 is recommended) * * This is the ChaCha core, a function from 64-byte strings to 64-byte strings. * The caller has already converted the endianness of the input. This function @@ -93,7 +93,7 @@ EXPORT_SYMBOL(chacha_block_generic); * hchacha_block_generic - abbreviated ChaCha core, for XChaCha * @state: input state matrix (16 32-bit words) * @stream: output (8 32-bit words) - * @nrounds: number of rounds (20 or 12; 20 is recommended) + * @nrounds: number of rounds (20, 12, or 8; 20 is recommended) * * HChaCha is the ChaCha equivalent of HSalsa and is an intermediate step * towards XChaCha (see https://cr.yp.to/snuffle/xsalsa-20081128.pdf). HChaCha