Message ID | 20240220064414.262582-1-21cnbao@gmail.com |
---|---|
Series | mm/zswap & crypto/compress: remove a couple of memcpy |
On Tue, Feb 20, 2024 at 07:44:12PM +1300, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
>
> acomp's users might want to know if acomp is really async to
> optimize themselves. One typical user which can benefit from
> exposed async stat is zswap.
>
> In zswap, zsmalloc is the most commonly used allocator for
> (and perhaps the only one). For zsmalloc, we cannot sleep
> while we map the compressed memory, so we copy it to a
> temporary buffer. By knowing the alg won't sleep can help
> zswap to avoid the need for a buffer. This shows noticeable
> improvement in load/store latency of zswap.
>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
>  include/crypto/acompress.h | 6 ++++++
>  1 file changed, 6 insertions(+)

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
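The commit message above can be illustrated with a small userspace sketch. It is not the kernel patch: `struct sketch_acomp`, `SKETCH_ALG_ASYNC`, `sketch_acomp_is_async()` and `sketch_needs_bounce_buffer()` are hypothetical names, and the flag value is an arbitrary placeholder. It only models the idea that a caller tests an "async" flag on the algorithm and keeps the temporary buffer copy only when the algorithm might sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an "algorithm may complete asynchronously"
 * flag; the bit value is a placeholder, not the kernel's. */
#define SKETCH_ALG_ASYNC 0x1u

/* Hypothetical, heavily simplified model of a compression transform. */
struct sketch_acomp {
	unsigned int cra_flags;
};

/* Report whether the modelled algorithm is flagged as async,
 * i.e. whether a request on it might sleep. */
static inline bool sketch_acomp_is_async(const struct sketch_acomp *tfm)
{
	assert(tfm != NULL);
	return (tfm->cra_flags & SKETCH_ALG_ASYNC) != 0;
}

/* A caller that holds a mapping it must not sleep under needs to copy
 * the data to a temporary buffer only when the algorithm may sleep. */
static inline bool sketch_needs_bounce_buffer(const struct sketch_acomp *tfm)
{
	return sketch_acomp_is_async(tfm);
}
```

With a check of this shape, a synchronous algorithm can read the mapped data in place, which is the copy the commit message says zswap can avoid.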
On Wed, Feb 21, 2024 at 6:35 PM Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Tue, Feb 20, 2024 at 07:44:14PM +1300, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > while sg_nents is 1 which is always true for the current kernel
> > as the only user - zswap is the case, we should remove two big
> > memcpy.
> >
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
> > ---
> >  crypto/scompress.c | 36 +++++++++++++++++++++++++++++-------
> >  1 file changed, 29 insertions(+), 7 deletions(-)
>
> This patch is independent of the other two. Please split it
> out so I can apply it directly.

Ok. OTOH, patch 3/3 has no dependency with other patches. so patch 3/3
should be perfectly applicable to crypto :-)

Hi Andrew,
Would you please handle patch 1/3 and 2/3 in mm-tree given Herbert's ack
on 1/3?

>
> > @@ -134,13 +135,25 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
> >       scratch = raw_cpu_ptr(&scomp_scratch);
> >       spin_lock(&scratch->lock);
> >
> > -     scatterwalk_map_and_copy(scratch->src, req->src, 0, req->slen, 0);
> > +     if (sg_nents(req->src) == 1) {
> > +             src = kmap_local_page(sg_page(req->src)) + req->src->offset;
>
> What if the SG entry is longer than PAGE_SIZE (or indeed crosses a
> page boundary)? I think the test needs to be strengthened.

I don't understand what is the problem for a nents to cross two pages
as anyway they are contiguous in both physical and virtual addresses.
if they are not contiguous, they will be two nents.

>
> Thanks,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Thanks
Barry
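One possible reading of the "strengthened" test Herbert asks for is sketched below in userspace C. This is an assumption, not the check that was eventually merged: `struct sketch_sg_entry`, `SKETCH_PAGE_SIZE` and `sketch_can_map_in_place()` are hypothetical, and the 4096-byte page size is illustrative. The point it models is that a mapping of one page only covers that page, so the in-place path is safe only if the requested bytes start and end inside the entry's first page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative page size; real systems may differ. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical model of a single scatterlist entry. */
struct sketch_sg_entry {
	unsigned int offset;	/* start of the data within the first page */
	unsigned int length;	/* bytes described by this entry */
};

/* Return true only if there is exactly one entry and the first `len`
 * bytes lie within the entry and within its first page, so mapping
 * that one page would expose all of them. */
static bool sketch_can_map_in_place(const struct sketch_sg_entry *sg,
				    unsigned int nents, unsigned int len)
{
	assert(sg != NULL);
	if (nents != 1 || len > sg->length)
		return false;
	if (sg->offset >= SKETCH_PAGE_SIZE)
		return false;
	/* Written as a subtraction to avoid overflow in offset + len. */
	return len <= SKETCH_PAGE_SIZE - sg->offset;
}
```

When the check fails, the caller would fall back to the existing copy into the scratch buffer.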
From: Barry Song <v-songbaohua@oppo.com>

The patchset removes a couple of memcpy in zswap and crypto to improve
zswap's performance.

Thanks for Chengming Zhou's test and perf data.
Quote from Chengming,
 I just tested these three patches on my server, found improvement in the
 kernel build testcase on a tmpfs with zswap (lz4 + zsmalloc) enabled.

         mm-stable 501a06fe8e4c    patched
 real    1m38.028s                 1m32.317s
 user    19m11.482s                18m39.439s
 sys     19m26.445s                17m5.646s

This patchset applies to mm-unstable as recently zswap has lots of change.

-v5:
 * remove the helper of exposing algorithm flags, alternative directly
   expose acomp_is_async() by test ASYNC flag according to Herbert;
 * remove the fixes of cra_flags for intel and hisilicon async drivers,
   they are separated patches[1] according to Herbert

[1] https://lore.kernel.org/linux-crypto/20240220044222.197614-1-v-songbaohua@oppo.com/

Barry Song (3):
  crypto: introduce: acomp_is_async to expose if comp drivers might sleep
  mm/zswap: remove the memcpy if acomp is not sleepable
  crypto: scompress: remove memcpy if sg_nents is 1

 crypto/scompress.c         | 36 +++++++++++++++++++++++++++++-------
 include/crypto/acompress.h |  6 ++++++
 mm/zswap.c                 |  6 ++++--
 3 files changed, 39 insertions(+), 9 deletions(-)