Message ID | 20241123070127.332773-2-kanchana.p.sridhar@intel.com |
---|---|
State | New |
Headers | show |
Series | zswap IAA compress batching | expand |
On Fri, Nov 22, 2024 at 11:01:18PM -0800, Kanchana P Sridhar wrote: > This commit adds batch_compress() and batch_decompress() interfaces to: > > struct acomp_alg > struct crypto_acomp > > This allows the iaa_crypto Intel IAA driver to register implementations for > the batch_compress() and batch_decompress() API, that can subsequently be > invoked from the kernel zswap/zram swap modules to compress/decompress > up to CRYPTO_BATCH_SIZE (i.e. 8) pages in parallel in the IAA hardware > accelerator to improve swapout/swapin performance. > > A new helper function acomp_has_async_batching() can be invoked to query > if a crypto_acomp has registered these batch_compress and batch_decompress > interfaces. > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> > --- > crypto/acompress.c | 2 + > include/crypto/acompress.h | 91 +++++++++++++++++++++++++++++ > include/crypto/internal/acompress.h | 16 +++++ > 3 files changed, 109 insertions(+) This should be rebased on top of my request chaining patch: https://lore.kernel.org/linux-crypto/677614fbdc70b31df2e26483c8d2cd1510c8af91.1730021644.git.herbert@gondor.apana.org.au/ Request chaining provides a perfect fit for batching. Cheers,
> -----Original Message----- > From: Herbert Xu <herbert@gondor.apana.org.au> > Sent: Monday, November 25, 2024 1:35 AM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com; > chengming.zhou@linux.dev; usamaarif642@gmail.com; > ryan.roberts@arm.com; ying.huang@intel.com; 21cnbao@gmail.com; > akpm@linux-foundation.org; linux-crypto@vger.kernel.org; > davem@davemloft.net; clabbe@baylibre.com; ardb@kernel.org; > ebiggers@google.com; surenb@google.com; Accardi, Kristen C > <kristen.c.accardi@intel.com>; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; > Gopal, Vinodh <vinodh.gopal@intel.com> > Subject: Re: [PATCH v4 01/10] crypto: acomp - Define two new interfaces for > compress/decompress batching. > > On Fri, Nov 22, 2024 at 11:01:18PM -0800, Kanchana P Sridhar wrote: > > This commit adds batch_compress() and batch_decompress() interfaces to: > > > > struct acomp_alg > > struct crypto_acomp > > > > This allows the iaa_crypto Intel IAA driver to register implementations for > > the batch_compress() and batch_decompress() API, that can subsequently > be > > invoked from the kernel zswap/zram swap modules to > compress/decompress > > up to CRYPTO_BATCH_SIZE (i.e. 8) pages in parallel in the IAA hardware > > accelerator to improve swapout/swapin performance. > > > > A new helper function acomp_has_async_batching() can be invoked to > query > > if a crypto_acomp has registered these batch_compress and > batch_decompress > > interfaces. > > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> > > --- > > crypto/acompress.c | 2 + > > include/crypto/acompress.h | 91 +++++++++++++++++++++++++++++ > > include/crypto/internal/acompress.h | 16 +++++ > > 3 files changed, 109 insertions(+) > > This should be rebased on top of my request chaining patch: > > https://lore.kernel.org/linux- > crypto/677614fbdc70b31df2e26483c8d2cd1510c8af91.1730021644.git.herb > ert@gondor.apana.org.au/ > > Request chaining provides a perfect fit for batching. Thanks Herbert. I am working on integrating the request chaining with the iaa_crypto driver, expecting to have this ready for v5. Thanks, Kanchana > > Cheers, > -- > Email: Herbert Xu <herbert@gondor.apana.org.au> > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Hi Herbert, > -----Original Message----- > From: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Sent: Monday, November 25, 2024 12:03 PM > To: Herbert Xu <herbert@gondor.apana.org.au> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com; > chengming.zhou@linux.dev; usamaarif642@gmail.com; > ryan.roberts@arm.com; ying.huang@intel.com; 21cnbao@gmail.com; > akpm@linux-foundation.org; linux-crypto@vger.kernel.org; > davem@davemloft.net; clabbe@baylibre.com; ardb@kernel.org; > ebiggers@google.com; surenb@google.com; Accardi, Kristen C > <kristen.c.accardi@intel.com>; Feghali, Wajdi K <wajdi.k.feghali@intel.com>; > Gopal, Vinodh <vinodh.gopal@intel.com>; Sridhar, Kanchana P > <kanchana.p.sridhar@intel.com> > Subject: RE: [PATCH v4 01/10] crypto: acomp - Define two new interfaces for > compress/decompress batching. > > > > -----Original Message----- > > From: Herbert Xu <herbert@gondor.apana.org.au> > > Sent: Monday, November 25, 2024 1:35 AM > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > > hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com; > > chengming.zhou@linux.dev; usamaarif642@gmail.com; > > ryan.roberts@arm.com; ying.huang@intel.com; 21cnbao@gmail.com; > > akpm@linux-foundation.org; linux-crypto@vger.kernel.org; > > davem@davemloft.net; clabbe@baylibre.com; ardb@kernel.org; > > ebiggers@google.com; surenb@google.com; Accardi, Kristen C > > <kristen.c.accardi@intel.com>; Feghali, Wajdi K > <wajdi.k.feghali@intel.com>; > > Gopal, Vinodh <vinodh.gopal@intel.com> > > Subject: Re: [PATCH v4 01/10] crypto: acomp - Define two new interfaces > for > > compress/decompress batching. > > > > On Fri, Nov 22, 2024 at 11:01:18PM -0800, Kanchana P Sridhar wrote: > > > This commit adds batch_compress() and batch_decompress() interfaces > to: > > > > > > struct acomp_alg > > > struct crypto_acomp > > > > > > This allows the iaa_crypto Intel IAA driver to register implementations for > > > the batch_compress() and batch_decompress() API, that can subsequently > > be > > > invoked from the kernel zswap/zram swap modules to > > compress/decompress > > > up to CRYPTO_BATCH_SIZE (i.e. 8) pages in parallel in the IAA hardware > > > accelerator to improve swapout/swapin performance. > > > > > > A new helper function acomp_has_async_batching() can be invoked to > > query > > > if a crypto_acomp has registered these batch_compress and > > batch_decompress > > > interfaces. > > > > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> > > > --- > > > crypto/acompress.c | 2 + > > > include/crypto/acompress.h | 91 > +++++++++++++++++++++++++++++ > > > include/crypto/internal/acompress.h | 16 +++++ > > > 3 files changed, 109 insertions(+) > > > > This should be rebased on top of my request chaining patch: > > > > https://lore.kernel.org/linux- > > > crypto/677614fbdc70b31df2e26483c8d2cd1510c8af91.1730021644.git.herb > > ert@gondor.apana.org.au/ > > > > Request chaining provides a perfect fit for batching. I wanted to make sure I understand your suggestion: Are you suggesting we implement request chaining for "struct acomp_req" similar to how this is being done for "struct ahash_request" in your patch? I guess I was a bit confused by your comment about rebasing, which would imply a direct use of the request chaining API you've provided for "crypto hash". I would appreciate it if you could clarify. Thanks, Kanchana > > Thanks Herbert. I am working on integrating the request chaining with > the iaa_crypto driver, expecting to have this ready for v5. > > Thanks, > Kanchana > > > > > Cheers, > > -- > > Email: Herbert Xu <herbert@gondor.apana.org.au> > > Home Page: http://gondor.apana.org.au/~herbert/ > > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Tue, Nov 26, 2024 at 02:13:00AM +0000, Sridhar, Kanchana P wrote: > > I wanted to make sure I understand your suggestion: Are you suggesting we > implement request chaining for "struct acomp_req" similar to how this is being > done for "struct ahash_request" in your patch? > > I guess I was a bit confused by your comment about rebasing, which would > imply a direct use of the request chaining API you've provided for "crypto hash". > I would appreciate it if you could clarify. Yes I was referring to the generic part of request chaining, and not rebasing acomp on top of ahash. Cheers,
> -----Original Message----- > From: Herbert Xu <herbert@gondor.apana.org.au> > Sent: Monday, November 25, 2024 6:14 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com; > chengming.zhou@linux.dev; usamaarif642@gmail.com; > ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org; > linux-crypto@vger.kernel.org; davem@davemloft.net; clabbe@baylibre.com; > ardb@kernel.org; ebiggers@google.com; surenb@google.com; Accardi, > Kristen C <kristen.c.accardi@intel.com>; Feghali, Wajdi K > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > Subject: Re: [PATCH v4 01/10] crypto: acomp - Define two new interfaces for > compress/decompress batching. > > On Tue, Nov 26, 2024 at 02:13:00AM +0000, Sridhar, Kanchana P wrote: > > > > I wanted to make sure I understand your suggestion: Are you suggesting we > > implement request chaining for "struct acomp_req" similar to how this is > being > > done for "struct ahash_request" in your patch? > > > > I guess I was a bit confused by your comment about rebasing, which would > > imply a direct use of the request chaining API you've provided for "crypto > hash". > > I would appreciate it if you could clarify. > > Yes I was referring to the generic part of request chaining, > and not rebasing acomp on top of ahash. Ok, thanks for the clarification! Would it be simpler if you could submit a crypto_acomp request chaining patch that I can then use in iaa_crypto? I would greatly appreciate this. Thanks, Kanchana > > Cheers, > -- > Email: Herbert Xu <herbert@gondor.apana.org.au> > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> -----Original Message----- > From: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Sent: Monday, November 25, 2024 6:37 PM > To: Herbert Xu <herbert@gondor.apana.org.au> > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com; > chengming.zhou@linux.dev; usamaarif642@gmail.com; > ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org; > linux-crypto@vger.kernel.org; davem@davemloft.net; clabbe@baylibre.com; > ardb@kernel.org; ebiggers@google.com; surenb@google.com; Accardi, > Kristen C <kristen.c.accardi@intel.com>; Feghali, Wajdi K > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com>; > Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > Subject: RE: [PATCH v4 01/10] crypto: acomp - Define two new interfaces for > compress/decompress batching. > > > > -----Original Message----- > > From: Herbert Xu <herbert@gondor.apana.org.au> > > Sent: Monday, November 25, 2024 6:14 PM > > To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com> > > Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; > > hannes@cmpxchg.org; yosryahmed@google.com; nphamcs@gmail.com; > > chengming.zhou@linux.dev; usamaarif642@gmail.com; > > ryan.roberts@arm.com; 21cnbao@gmail.com; akpm@linux-foundation.org; > > linux-crypto@vger.kernel.org; davem@davemloft.net; > clabbe@baylibre.com; > > ardb@kernel.org; ebiggers@google.com; surenb@google.com; Accardi, > > Kristen C <kristen.c.accardi@intel.com>; Feghali, Wajdi K > > <wajdi.k.feghali@intel.com>; Gopal, Vinodh <vinodh.gopal@intel.com> > > Subject: Re: [PATCH v4 01/10] crypto: acomp - Define two new interfaces > for > > compress/decompress batching. > > > > On Tue, Nov 26, 2024 at 02:13:00AM +0000, Sridhar, Kanchana P wrote: > > > > > > I wanted to make sure I understand your suggestion: Are you suggesting > we > > > implement request chaining for "struct acomp_req" similar to how this is > > being > > > done for "struct ahash_request" in your patch? > > > > > > I guess I was a bit confused by your comment about rebasing, which > would > > > imply a direct use of the request chaining API you've provided for "crypto > > hash". > > > I would appreciate it if you could clarify. > > > > Yes I was referring to the generic part of request chaining, > > and not rebasing acomp on top of ahash. > > Ok, thanks for the clarification! Would it be simpler if you could submit a > crypto_acomp request chaining patch that I can then use in iaa_crypto? > I would greatly appreciate this. Hi Herbert, I was able to take a more in-depth look at the request chaining you have implemented in crypto ahash, and I think I have a good understanding of what needs to be done in crypto acomp for request chaining. I will go ahead and try to implement this and reach out if I have any questions. I would be interested to know the performance impact if we kept the crypto_wait based chaining of the requests (which makes the request chaining synchronous IIUC), wrt the asynchronous polling that's currently used for batching in the iaa_crypto driver. If you have any ideas on introducing polling to the chaining concept, please do share, I would greatly appreciate it. Besides this, some questions that came up as far as applying request chaining to crypto_acomp_batch_[de]compress were: 1) It appears a calling module like zswap would only be able to get 1 error status for all the requests that are chained, as against individual error statuses for each of the [de]compress jobs. Is this understanding correct? 2) The request chaining makes use of the req->base.complete and req->base.data, which are also used for internal data by the iaa_crypto driver. I can add another "void *data1" member to struct crypto_async_request to work around this, such that iaa_crypto uses "data1" instead of "data". Please let me know if you have any suggestions. Also, if you have already begun working on acomp request chaining, just let me know. I will wait for your patch in this case (rather than implementing it myself). Thanks, Kanchana > > Thanks, > Kanchana > > > > > > Cheers, > > -- > > Email: Herbert Xu <herbert@gondor.apana.org.au> > > Home Page: http://gondor.apana.org.au/~herbert/ > > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
On Wed, Nov 27, 2024 at 01:22:40AM +0000, Sridhar, Kanchana P wrote: > > 1) It appears a calling module like zswap would only be able to get 1 error > status for all the requests that are chained, as against individual error > statuses for each of the [de]compress jobs. Is this understanding correct? No, each request gets its own error in req->base.err. > 2) The request chaining makes use of the req->base.complete and req->base.data, > which are also used for internal data by the iaa_crypto driver. I can add another > "void *data1" member to struct crypto_async_request to work around this, > such that iaa_crypto uses "data1" instead of "data". These fields are meant for the user. It's best not to use them to store driver data, but if you really wanted to, then the API ahash code provides an example of doing it (please only do this as a last resort as it's rather fragile). Cheers,
diff --git a/crypto/acompress.c b/crypto/acompress.c index 6fdf0ff9f3c0..a506db499a37 100644 --- a/crypto/acompress.c +++ b/crypto/acompress.c @@ -71,6 +71,8 @@ static int crypto_acomp_init_tfm(struct crypto_tfm *tfm) acomp->compress = alg->compress; acomp->decompress = alg->decompress; + acomp->batch_compress = alg->batch_compress; + acomp->batch_decompress = alg->batch_decompress; acomp->dst_free = alg->dst_free; acomp->reqsize = alg->reqsize; diff --git a/include/crypto/acompress.h b/include/crypto/acompress.h index 54937b615239..4252bab3d0e1 100644 --- a/include/crypto/acompress.h +++ b/include/crypto/acompress.h @@ -37,12 +37,20 @@ struct acomp_req { void *__ctx[] CRYPTO_MINALIGN_ATTR; }; +/* + * The max compress/decompress batch size, for crypto algorithms + * that support batch_compress and batch_decompress API. + */ +#define CRYPTO_BATCH_SIZE 8UL + /** * struct crypto_acomp - user-instantiated objects which encapsulate * algorithms and core processing logic * * @compress: Function performs a compress operation * @decompress: Function performs a de-compress operation + * @batch_compress: Function performs a batch compress operation + * @batch_decompress: Function performs a batch decompress operation * @dst_free: Frees destination buffer if allocated inside the * algorithm * @reqsize: Context size for (de)compression requests @@ -51,6 +59,20 @@ struct acomp_req { struct crypto_acomp { int (*compress)(struct acomp_req *req); int (*decompress)(struct acomp_req *req); + void (*batch_compress)(struct acomp_req *reqs[], + struct crypto_wait *wait, + struct page *pages[], + u8 *dsts[], + unsigned int dlens[], + int errors[], + int nr_pages); + void (*batch_decompress)(struct acomp_req *reqs[], + struct crypto_wait *wait, + u8 *srcs[], + struct page *pages[], + unsigned int slens[], + int errors[], + int nr_pages); void (*dst_free)(struct scatterlist *dst); unsigned int reqsize; struct crypto_tfm base; @@ -142,6 +164,13 @@ static inline bool acomp_is_async(struct crypto_acomp *tfm) CRYPTO_ALG_ASYNC; } +static inline bool acomp_has_async_batching(struct crypto_acomp *tfm) +{ + return (acomp_is_async(tfm) && + (crypto_comp_alg_common(tfm)->base.cra_flags & CRYPTO_ALG_TYPE_ACOMPRESS) && + tfm->batch_compress && tfm->batch_decompress); +} + static inline struct crypto_acomp *crypto_acomp_reqtfm(struct acomp_req *req) { return __crypto_acomp_tfm(req->base.tfm); @@ -265,4 +294,66 @@ static inline int crypto_acomp_decompress(struct acomp_req *req) return crypto_acomp_reqtfm(req)->decompress(req); } +/** + * crypto_acomp_batch_compress() -- Invoke asynchronous compress of + * a batch of requests + * + * Function invokes the asynchronous batch compress operation + * + * @reqs: @nr_pages asynchronous compress requests. + * @wait: crypto_wait for synchronous acomp batch compress. If NULL, the + * driver must provide a way to process completions asynchronously. + * @pages: Pages to be compressed. + * @dsts: Pre-allocated destination buffers to store results of compression. + * @dlens: Will contain the compressed lengths. + * @errors: zero on successful compression of the corresponding + * req, or error code in case of error. + * @nr_pages: The number of pages, up to CRYPTO_BATCH_SIZE, + * to be compressed. + */ +static inline void crypto_acomp_batch_compress(struct acomp_req *reqs[], + struct crypto_wait *wait, + struct page *pages[], + u8 *dsts[], + unsigned int dlens[], + int errors[], + int nr_pages) +{ + struct crypto_acomp *tfm = crypto_acomp_reqtfm(reqs[0]); + + return tfm->batch_compress(reqs, wait, pages, dsts, + dlens, errors, nr_pages); +} + +/** + * crypto_acomp_batch_decompress() -- Invoke asynchronous decompress of + * a batch of requests + * + * Function invokes the asynchronous batch decompress operation + * + * @reqs: @nr_pages asynchronous decompress requests. + * @wait: crypto_wait for synchronous acomp batch decompress. If NULL, the + * driver must provide a way to process completions asynchronously. + * @srcs: The src buffers to be decompressed. + * @pages: The pages to store the decompressed buffers. + * @slens: Compressed lengths of @srcs. + * @errors: zero on successful compression of the corresponding + * req, or error code in case of error. + * @nr_pages: The number of pages, up to CRYPTO_BATCH_SIZE, + * to be decompressed. + */ +static inline void crypto_acomp_batch_decompress(struct acomp_req *reqs[], + struct crypto_wait *wait, + u8 *srcs[], + struct page *pages[], + unsigned int slens[], + int errors[], + int nr_pages) +{ + struct crypto_acomp *tfm = crypto_acomp_reqtfm(reqs[0]); + + return tfm->batch_decompress(reqs, wait, srcs, pages, + slens, errors, nr_pages); +} + #endif diff --git a/include/crypto/internal/acompress.h b/include/crypto/internal/acompress.h index 8831edaafc05..acfe2d9d5a83 100644 --- a/include/crypto/internal/acompress.h +++ b/include/crypto/internal/acompress.h @@ -17,6 +17,8 @@ * * @compress: Function performs a compress operation * @decompress: Function performs a de-compress operation + * @batch_compress: Function performs a batch compress operation + * @batch_decompress: Function performs a batch decompress operation * @dst_free: Frees destination buffer if allocated inside the algorithm * @init: Initialize the cryptographic transformation object. * This function is used to initialize the cryptographic @@ -37,6 +39,20 @@ struct acomp_alg { int (*compress)(struct acomp_req *req); int (*decompress)(struct acomp_req *req); + void (*batch_compress)(struct acomp_req *reqs[], + struct crypto_wait *wait, + struct page *pages[], + u8 *dsts[], + unsigned int dlens[], + int errors[], + int nr_pages); + void (*batch_decompress)(struct acomp_req *reqs[], + struct crypto_wait *wait, + u8 *srcs[], + struct page *pages[], + unsigned int slens[], + int errors[], + int nr_pages); void (*dst_free)(struct scatterlist *dst); int (*init)(struct crypto_acomp *tfm); void (*exit)(struct crypto_acomp *tfm);
This commit adds batch_compress() and batch_decompress() interfaces to: struct acomp_alg struct crypto_acomp This allows the iaa_crypto Intel IAA driver to register implementations for the batch_compress() and batch_decompress() API, that can subsequently be invoked from the kernel zswap/zram swap modules to compress/decompress up to CRYPTO_BATCH_SIZE (i.e. 8) pages in parallel in the IAA hardware accelerator to improve swapout/swapin performance. A new helper function acomp_has_async_batching() can be invoked to query if a crypto_acomp has registered these batch_compress and batch_decompress interfaces. Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> --- crypto/acompress.c | 2 + include/crypto/acompress.h | 91 +++++++++++++++++++++++++++++ include/crypto/internal/acompress.h | 16 +++++ 3 files changed, 109 insertions(+)