Message ID | 20201020203957.3512851-2-nivedita@alum.mit.edu |
---|---|
State | Superseded |
Headers | show |
Series | crypto: lib/sha256 - cleanup/optimization | expand |
On Tue, Oct 20, 2020 at 04:39:52PM -0400, Arvind Sankar wrote: > Without the barrier_data() inside memzero_explicit(), the compiler may > optimize away the state-clearing if it can tell that the state is not > used afterwards. At least in lib/crypto/sha256.c:__sha256_final(), the > function can get inlined into sha256(), in which case the memset is > optimized away. > > Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Reviewed-by: Eric Biggers <ebiggers@google.com> Maybe get the one in arch/arm64/crypto/sha3-ce-glue.c too? - Eric
On Wed, Oct 21, 2020 at 09:36:33PM -0700, Eric Biggers wrote: > On Tue, Oct 20, 2020 at 04:39:52PM -0400, Arvind Sankar wrote: > > Without the barrier_data() inside memzero_explicit(), the compiler may > > optimize away the state-clearing if it can tell that the state is not > > used afterwards. At least in lib/crypto/sha256.c:__sha256_final(), the > > function can get inlined into sha256(), in which case the memset is > > optimized away. > > > > Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> > > Reviewed-by: Eric Biggers <ebiggers@google.com> > > Maybe get the one in arch/arm64/crypto/sha3-ce-glue.c too? > > - Eric Hm, there are a few more as well like that. But now I'm thinking it's only the generic sha256.c that may be problematic. The rest of them are in _final() functions which will be stored as function pointers in a structure, so there should be no risk of them getting optimized away?
On Fri, Oct 23, 2020 at 11:39:27AM -0400, Arvind Sankar wrote: > On Wed, Oct 21, 2020 at 09:36:33PM -0700, Eric Biggers wrote: > > On Tue, Oct 20, 2020 at 04:39:52PM -0400, Arvind Sankar wrote: > > > Without the barrier_data() inside memzero_explicit(), the compiler may > > > optimize away the state-clearing if it can tell that the state is not > > > used afterwards. At least in lib/crypto/sha256.c:__sha256_final(), the > > > function can get inlined into sha256(), in which case the memset is > > > optimized away. > > > > > > Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> > > > > Reviewed-by: Eric Biggers <ebiggers@google.com> > > > > Maybe get the one in arch/arm64/crypto/sha3-ce-glue.c too? > > > > - Eric > > Hm, there are a few more as well like that. But now I'm thinking it's > only the generic sha256.c that may be problematic. The rest of them are > in _final() functions which will be stored as function pointers in a > structure, so there should be no risk of them getting optimized away? When clearing memory because "it may be sensitive" rather than "it's needed for the code to behave correctly", I think it's best to use memzero_explicit() to make the intent clear, even if it seems that memset() is sufficient. Also keep in mind that support for compiling the kernel with LTO (link-time optimization) is being worked on (and some people already do it), which results in more code being optimized out. - Eric
On Fri, Oct 23, 2020 at 08:56:04AM -0700, Eric Biggers wrote: > > When clearing memory because "it may be sensitive" rather than "it's needed for > the code to behave correctly", I think it's best to use memzero_explicit() to > make the intent clear, even if it seems that memset() is sufficient. Also keep > in mind that support for compiling the kernel with LTO (link-time optimization) > is being worked on (and some people already do it), which results in more code > being optimized out. The rule up until now has been that we only use memzero_explicit for stack variables. At this point please don't convert anything else as it will cause too much churn. If LTO did arrive we should do a global conversion. Cheers, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Sat, Oct 24, 2020 at 07:45:36AM +1100, Herbert Xu wrote: > On Fri, Oct 23, 2020 at 08:56:04AM -0700, Eric Biggers wrote: > > > > When clearing memory because "it may be sensitive" rather than "it's needed for > > the code to behave correctly", I think it's best to use memzero_explicit() to > > make the intent clear, even if it seems that memset() is sufficient. Also keep > > in mind that support for compiling the kernel with LTO (link-time optimization) > > is being worked on (and some people already do it), which results in more code > > being optimized out. > > The rule up until now has been that we only use memzero_explicit for > stack variables. At this point please don't convert anything else > as it will cause too much churn. > > If LTO did arrive we should do a global conversion. > LTO is actively being worked on, now up to v6: https://lkml.kernel.org/lkml/20201013003203.4168817-1-samitolvanen@google.com/ And in the real world it's already being used; the Android Compatibility Definition Document strongly recommends enabling CFI, which depends on LTO. It's doubtful that anyone will do a global conversion from memset() to memzero_explicit(), as it's too hard to find all the places that should be converted. They are in lots of different subsystems; the crypto subsystem will have the most, but not all. We just need to fix as many as we can. If you'd like to do something more comprehensive than this patch, that would be great, but I hope we don't wait forever for a global conversion that never happens. FWIW, kfree_sensitive() (formerly kzfree) already got converted by https://git.kernel.org/linus/8982ae527fbef170, and it wasn't really controversial. Some people even wanted Cc stable (which I disagreed with, but it apparently made the final version). - Eric
On Fri, Oct 23, 2020 at 02:53:29PM -0700, Eric Biggers wrote: > > It's doubtful that anyone will do a global conversion from memset() to > memzero_explicit(), as it's too hard to find all the places that should be > converted. They are in lots of different subsystems; the crypto subsystem will > have the most, but not all. We just need to fix as many as we can. If you'd > like to do something more comprehensive than this patch, that would be great, > but I hope we don't wait forever for a global conversion that never happens. I had thought that there were more instances of memset in crypto/ that needed converting, but a quick grep seems to indicate that most are not sensitive. So I'm OK with this patch. Thanks, -- Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
diff --git a/include/crypto/sha1_base.h b/include/crypto/sha1_base.h index 20fd1f7468af..a5d6033efef7 100644 --- a/include/crypto/sha1_base.h +++ b/include/crypto/sha1_base.h @@ -12,6 +12,7 @@ #include <crypto/sha.h> #include <linux/crypto.h> #include <linux/module.h> +#include <linux/string.h> #include <asm/unaligned.h> @@ -101,7 +102,7 @@ static inline int sha1_base_finish(struct shash_desc *desc, u8 *out) for (i = 0; i < SHA1_DIGEST_SIZE / sizeof(__be32); i++) put_unaligned_be32(sctx->state[i], digest++); - *sctx = (struct sha1_state){}; + memzero_explicit(sctx, sizeof(*sctx)); return 0; } diff --git a/include/crypto/sha256_base.h b/include/crypto/sha256_base.h index 6ded110783ae..93f9fd21cc06 100644 --- a/include/crypto/sha256_base.h +++ b/include/crypto/sha256_base.h @@ -12,6 +12,7 @@ #include <crypto/sha.h> #include <linux/crypto.h> #include <linux/module.h> +#include <linux/string.h> #include <asm/unaligned.h> @@ -105,7 +106,7 @@ static inline int sha256_base_finish(struct shash_desc *desc, u8 *out) for (i = 0; digest_size > 0; i++, digest_size -= sizeof(__be32)) put_unaligned_be32(sctx->state[i], digest++); - *sctx = (struct sha256_state){}; + memzero_explicit(sctx, sizeof(*sctx)); return 0; } diff --git a/include/crypto/sha512_base.h b/include/crypto/sha512_base.h index fb19c77494dc..93ab73baa38e 100644 --- a/include/crypto/sha512_base.h +++ b/include/crypto/sha512_base.h @@ -12,6 +12,7 @@ #include <crypto/sha.h> #include <linux/crypto.h> #include <linux/module.h> +#include <linux/string.h> #include <asm/unaligned.h> @@ -126,7 +127,7 @@ static inline int sha512_base_finish(struct shash_desc *desc, u8 *out) for (i = 0; digest_size > 0; i++, digest_size -= sizeof(__be64)) put_unaligned_be64(sctx->state[i], digest++); - *sctx = (struct sha512_state){}; + memzero_explicit(sctx, sizeof(*sctx)); return 0; } diff --git a/include/crypto/sm3_base.h b/include/crypto/sm3_base.h index 1cbf9aa1fe52..2f3a32ab97bb 100644 --- a/include/crypto/sm3_base.h +++ b/include/crypto/sm3_base.h @@ -13,6 +13,7 @@ #include <crypto/sm3.h> #include <linux/crypto.h> #include <linux/module.h> +#include <linux/string.h> #include <asm/unaligned.h> typedef void (sm3_block_fn)(struct sm3_state *sst, u8 const *src, int blocks); @@ -104,7 +105,7 @@ static inline int sm3_base_finish(struct shash_desc *desc, u8 *out) for (i = 0; i < SM3_DIGEST_SIZE / sizeof(__be32); i++) put_unaligned_be32(sctx->state[i], digest++); - *sctx = (struct sm3_state){}; + memzero_explicit(sctx, sizeof(*sctx)); return 0; } diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c index 2321f6cb322f..d43bc39ab05e 100644 --- a/lib/crypto/sha256.c +++ b/lib/crypto/sha256.c @@ -265,7 +265,7 @@ static void __sha256_final(struct sha256_state *sctx, u8 *out, int digest_words) put_unaligned_be32(sctx->state[i], &dst[i]); /* Zeroize sensitive information. */ - memset(sctx, 0, sizeof(*sctx)); + memzero_explicit(sctx, sizeof(*sctx)); } void sha256_final(struct sha256_state *sctx, u8 *out)
Without the barrier_data() inside memzero_explicit(), the compiler may optimize away the state-clearing if it can tell that the state is not used afterwards. At least in lib/crypto/sha256.c:__sha256_final(), the function can get inlined into sha256(), in which case the memset is optimized away. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> --- include/crypto/sha1_base.h | 3 ++- include/crypto/sha256_base.h | 3 ++- include/crypto/sha512_base.h | 3 ++- include/crypto/sm3_base.h | 3 ++- lib/crypto/sha256.c | 2 +- 5 files changed, 9 insertions(+), 5 deletions(-)