Message ID | 20170721154238.21697-4-ard.biesheuvel@linaro.org |
---|---|
State | Accepted |
Commit | 6a8487a1f29fa2801a530200c6cdf81592b1a4d7 |
Headers | show |
Series | crypto: scompress - defer allocation of percpu scratch buffers | expand |
Hi Ard, On Fri, Jul 21, 2017 at 04:42:38PM +0100, Ard Biesheuvel wrote: > +static int crypto_scomp_init_tfm(struct crypto_tfm *tfm) > +{ > + int ret; > + > + mutex_lock(&scomp_lock); > + ret = crypto_scomp_alloc_all_scratches(); > + mutex_unlock(&scomp_lock); > + > + return ret; > +} If you allocate the scratch buffers at init_tfm, don't you end up with a situation where if you allocate two tfms of the same algo then you get twice the number of scratches? If that is the case, we should implement a reference count mechanism. Am I missing something? Regards, -- Giovanni
> On 26 Jul 2017, at 00:36, Giovanni Cabiddu <giovanni.cabiddu@intel.com> wrote: > > Hi Ard, > >> On Fri, Jul 21, 2017 at 04:42:38PM +0100, Ard Biesheuvel wrote: >> +static int crypto_scomp_init_tfm(struct crypto_tfm *tfm) >> +{ >> + int ret; >> + >> + mutex_lock(&scomp_lock); >> + ret = crypto_scomp_alloc_all_scratches(); >> + mutex_unlock(&scomp_lock); >> + >> + return ret; >> +} > If you allocate the scratch buffers at init_tfm, don't you end > up with a situation where if you allocate two tfms of the same algo > then you get twice the number of scratches? > If that is the case, we should implement a reference count > mechanism. > Am I missing something? > Isn't the mutex supposed to take care of that?
On Wed, Jul 26, 2017 at 01:07:34AM +0100, Ard Biesheuvel wrote: > > > On 26 Jul 2017, at 00:36, Giovanni Cabiddu <giovanni.cabiddu@intel.com> wrote: > > > > Hi Ard, > > > >> On Fri, Jul 21, 2017 at 04:42:38PM +0100, Ard Biesheuvel wrote: > >> +static int crypto_scomp_init_tfm(struct crypto_tfm *tfm) > >> +{ > >> + int ret; > >> + > >> + mutex_lock(&scomp_lock); > >> + ret = crypto_scomp_alloc_all_scratches(); > >> + mutex_unlock(&scomp_lock); > >> + > >> + return ret; > >> +} > > If you allocate the scratch buffers at init_tfm, don't you end > > up with a situation where if you allocate two tfms of the same algo > > then you get twice the number of scratches? > > If that is the case, we should implement a reference count > > mechanism. > > Am I missing something? > > > > Isn't the mutex supposed to take care of that? Ignore my previous question. There is already a reference count mechanism that prevents that to happen. Your change sounds correct to me. Regards, -- Giovanni
diff --git a/crypto/scompress.c b/crypto/scompress.c index 2c07648305ad..2075e2c4e7df 100644 --- a/crypto/scompress.c +++ b/crypto/scompress.c @@ -65,11 +65,6 @@ static void crypto_scomp_show(struct seq_file *m, struct crypto_alg *alg) seq_puts(m, "type : scomp\n"); } -static int crypto_scomp_init_tfm(struct crypto_tfm *tfm) -{ - return 0; -} - static void crypto_scomp_free_scratches(void * __percpu *scratches) { int i; @@ -134,6 +129,17 @@ static int crypto_scomp_alloc_all_scratches(void) return 0; } +static int crypto_scomp_init_tfm(struct crypto_tfm *tfm) +{ + int ret; + + mutex_lock(&scomp_lock); + ret = crypto_scomp_alloc_all_scratches(); + mutex_unlock(&scomp_lock); + + return ret; +} + static void crypto_scomp_sg_free(struct scatterlist *sgl) { int i, n; @@ -241,6 +247,10 @@ static void crypto_exit_scomp_ops_async(struct crypto_tfm *tfm) struct crypto_scomp **ctx = crypto_tfm_ctx(tfm); crypto_free_scomp(*ctx); + + mutex_lock(&scomp_lock); + crypto_scomp_free_all_scratches(); + mutex_unlock(&scomp_lock); } int crypto_init_scomp_ops_async(struct crypto_tfm *tfm) @@ -317,40 +327,18 @@ static const struct crypto_type crypto_scomp_type = { int crypto_register_scomp(struct scomp_alg *alg) { struct crypto_alg *base = &alg->base; - int ret = -ENOMEM; - - mutex_lock(&scomp_lock); - if (crypto_scomp_alloc_all_scratches()) - goto error; base->cra_type = &crypto_scomp_type; base->cra_flags &= ~CRYPTO_ALG_TYPE_MASK; base->cra_flags |= CRYPTO_ALG_TYPE_SCOMPRESS; - ret = crypto_register_alg(base); - if (ret) - goto error; - - mutex_unlock(&scomp_lock); - return ret; - -error: - crypto_scomp_free_all_scratches(); - mutex_unlock(&scomp_lock); - return ret; + return crypto_register_alg(base); } EXPORT_SYMBOL_GPL(crypto_register_scomp); int crypto_unregister_scomp(struct scomp_alg *alg) { - int ret; - - mutex_lock(&scomp_lock); - ret = crypto_unregister_alg(&alg->base); - crypto_scomp_free_all_scratches(); - mutex_unlock(&scomp_lock); - - return ret; + return crypto_unregister_alg(&alg->base); } EXPORT_SYMBOL_GPL(crypto_unregister_scomp);
The scompress code allocates 2 x 128 KB of scratch buffers for each CPU, so that clients of the async API can use synchronous implementations even from atomic context. However, on systems such as Cavium Thunderx (which has 96 cores), this adds up to a non-negligible 24 MB. Also, 32-bit systems may prefer to use their precious vmalloc space for other things,especially since there don't appear to be any clients for the async compression API yet. So let's defer allocation of the scratch buffers until the first time we allocate an acompress cipher based on an scompress implementation. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> --- crypto/scompress.c | 46 ++++++++------------ 1 file changed, 17 insertions(+), 29 deletions(-) -- 2.11.0