Message ID | 20240301192745.14987-1-21cnbao@gmail.com |
---|---|
State | New |
Headers | show |
Series | [v7] crypto: scompress: remove memcpy if sg_nents is 1 and pages are lowmem | expand |
On Sat, Mar 02, 2024 at 08:27:45AM +1300, Barry Song wrote: > From: Barry Song <v-songbaohua@oppo.com> > > while sg_nents is 1, which is always true for the current kernel > as the only user - zswap is this case, we might have a chance to > remove memcpy, thus improve the performance. > Though sg_nents is 1, its buffer might cross two pages. If those > pages are highmem, we have no cheap way to map them to contiguous > virtual address because kmap doesn't support more than one page > (kmap single higmem page could be still expensive for tlb) and > vmap is expensive. > So we also test and enure page is not highmem in order to safely > use page_to_virt before removing the memcpy. The good news is > that in the most majority of cases, we are lowmem, and we are > always lowmem in those modern and popular hardware. > > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: Nhat Pham <nphamcs@gmail.com> > Cc: Yosry Ahmed <yosryahmed@google.com> > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > Tested-by: Chengming Zhou <zhouchengming@bytedance.com> > --- > -v7: > * fix the problem pointed out by Herbert - flush all pages if dst > is longer than one page. > > crypto/scompress.c | 36 +++++++++++++++++++++++++++++------- > 1 file changed, 29 insertions(+), 7 deletions(-) Patch applied. Thanks.
Hi, On Sat, Mar 02, 2024 at 08:27:45AM +1300, Barry Song wrote: [ ... ] > @@ -152,8 +165,17 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir) > ret = -ENOSPC; > goto out; > } > - scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen, > - 1); > + if (dst == scratch->dst) { > + scatterwalk_map_and_copy(scratch->dst, req->dst, 0, > + req->dlen, 1); > + } else { > + int nr_pages = DIV_ROUND_UP(req->dst->offset + req->dlen, PAGE_SIZE); > + int i; > + struct page *dst_page = sg_page(req->dst); > + > + for (i = 0; i < nr_pages; i++) > + flush_dcache_page(dst_page + i); flush_dcache_page() is an empty macro on some architectures such as xtensa. This results in Building xtensa:allmodconfig ... failed -------------- Error log: crypto/scompress.c: In function 'scomp_acomp_comp_decomp': crypto/scompress.c:174:38: error: unused variable 'dst_page' on the affected architectures. Guenter
On Mon, Mar 18, 2024 at 7:13 AM Guenter Roeck <linux@roeck-us.net> wrote: > > Hi, > > On Sat, Mar 02, 2024 at 08:27:45AM +1300, Barry Song wrote: > [ ... ] > > @@ -152,8 +165,17 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir) > > ret = -ENOSPC; > > goto out; > > } > > - scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen, > > - 1); > > + if (dst == scratch->dst) { > > + scatterwalk_map_and_copy(scratch->dst, req->dst, 0, > > + req->dlen, 1); > > + } else { > > + int nr_pages = DIV_ROUND_UP(req->dst->offset + req->dlen, PAGE_SIZE); > > + int i; > > + struct page *dst_page = sg_page(req->dst); > > + > > + for (i = 0; i < nr_pages; i++) > > + flush_dcache_page(dst_page + i); > > flush_dcache_page() is an empty macro on some architectures > such as xtensa. This results in Hi Guenter, this is a bug of xtensa, could you test the patch: https://lore.kernel.org/all/20240313045036.51065-1-21cnbao@gmail.com/ > > Building xtensa:allmodconfig ... failed > -------------- > Error log: > crypto/scompress.c: In function 'scomp_acomp_comp_decomp': > crypto/scompress.c:174:38: error: unused variable 'dst_page' > > on the affected architectures. > > Guenter Thanks Barry
On 3/17/24 16:48, Barry Song wrote: > On Mon, Mar 18, 2024 at 7:13 AM Guenter Roeck <linux@roeck-us.net> wrote: >> >> Hi, >> >> On Sat, Mar 02, 2024 at 08:27:45AM +1300, Barry Song wrote: >> [ ... ] >>> @@ -152,8 +165,17 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir) >>> ret = -ENOSPC; >>> goto out; >>> } >>> - scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen, >>> - 1); >>> + if (dst == scratch->dst) { >>> + scatterwalk_map_and_copy(scratch->dst, req->dst, 0, >>> + req->dlen, 1); >>> + } else { >>> + int nr_pages = DIV_ROUND_UP(req->dst->offset + req->dlen, PAGE_SIZE); >>> + int i; >>> + struct page *dst_page = sg_page(req->dst); >>> + >>> + for (i = 0; i < nr_pages; i++) >>> + flush_dcache_page(dst_page + i); >> >> flush_dcache_page() is an empty macro on some architectures >> such as xtensa. This results in > > Hi Guenter, > > this is a bug of xtensa, could you test the patch: Thanks for the update. > https://lore.kernel.org/all/20240313045036.51065-1-21cnbao@gmail.com/ > That doesn't build for me. CC arch/xtensa/kernel/asm-offsets.s In file included from ./include/linux/highmem.h:8, from ./include/linux/bvec.h:10, from ./include/linux/blk_types.h:10, from ./include/linux/writeback.h:13, from ./include/linux/memcontrol.h:23, from ./include/linux/swap.h:9, from ./include/linux/suspend.h:5, from arch/xtensa/kernel/asm-offsets.c:24: ./include/linux/cacheflush.h:9:5: error: "ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE" is not defined, evaluates to 0 [-Werror=undef] 9 | #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE A similar works for loongarch, so I don't know what is wrong. Maybe some context patch is missing. Guenter
On 3/17/24 17:21, Barry Song wrote: > On Mon, Mar 18, 2024 at 8:14 AM Guenter Roeck <linux@roeck-us.net> wrote: >> >> On 3/17/24 16:48, Barry Song wrote: >>> On Mon, Mar 18, 2024 at 7:13 AM Guenter Roeck <linux@roeck-us.net> wrote: >>>> >>>> Hi, >>>> >>>> On Sat, Mar 02, 2024 at 08:27:45AM +1300, Barry Song wrote: >>>> [ ... ] >>>>> @@ -152,8 +165,17 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir) >>>>> ret = -ENOSPC; >>>>> goto out; >>>>> } >>>>> - scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen, >>>>> - 1); >>>>> + if (dst == scratch->dst) { >>>>> + scatterwalk_map_and_copy(scratch->dst, req->dst, 0, >>>>> + req->dlen, 1); >>>>> + } else { >>>>> + int nr_pages = DIV_ROUND_UP(req->dst->offset + req->dlen, PAGE_SIZE); >>>>> + int i; >>>>> + struct page *dst_page = sg_page(req->dst); >>>>> + >>>>> + for (i = 0; i < nr_pages; i++) >>>>> + flush_dcache_page(dst_page + i); >>>> >>>> flush_dcache_page() is an empty macro on some architectures >>>> such as xtensa. This results in >>> >>> Hi Guenter, >>> >>> this is a bug of xtensa, could you test the patch: >> >> Thanks for the update. >> >>> https://lore.kernel.org/all/20240313045036.51065-1-21cnbao@gmail.com/ >>> >> >> That doesn't build for me. >> >> CC arch/xtensa/kernel/asm-offsets.s >> In file included from ./include/linux/highmem.h:8, >> from ./include/linux/bvec.h:10, >> from ./include/linux/blk_types.h:10, >> from ./include/linux/writeback.h:13, >> from ./include/linux/memcontrol.h:23, >> from ./include/linux/swap.h:9, >> from ./include/linux/suspend.h:5, >> from arch/xtensa/kernel/asm-offsets.c:24: >> ./include/linux/cacheflush.h:9:5: error: "ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE" is not defined, evaluates to 0 [-Werror=undef] >> 9 | #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE >> >> A similar works for loongarch, so I don't know what is wrong. >> Maybe some context patch is missing. > > this is weird as include/asm-generic/cacheflush.h will define it to 0 > > #ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE > static inline void flush_dcache_page(struct page *page) > { > } > > #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 0 > #endif > > Maybe somewhere else also needs to be fixed. > Can you tell me your toolchain version and toolchain name? and defconfig name? > config is allmodconfig - GCC_PLUGINS. Toolchains are self-built from gcc source using the buildall system. I tried gcc 11.4, 12.3, and 13.2. I don't think that matters, though, since "asm-generic/cacheflush.h" isn't included from arch/xtensa. The diff below seems to fix the problem, but I have not fully tested it. Guenter --- diff --git a/arch/xtensa/include/asm/cacheflush.h b/arch/xtensa/include/asm/cacheflush.h index 11efdc8eb262..62662919cbbc 100644 --- a/arch/xtensa/include/asm/cacheflush.h +++ b/arch/xtensa/include/asm/cacheflush.h @@ -183,4 +183,6 @@ extern void copy_from_user_page(struct vm_area_struct*, struct page*, #endif +#include <asm-generic/cacheflush.h> + #endif /* _XTENSA_CACHEFLUSH_H */
diff --git a/crypto/scompress.c b/crypto/scompress.c index b108a30a7600..60bbb7ea4060 100644 --- a/crypto/scompress.c +++ b/crypto/scompress.c @@ -117,6 +117,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir) struct crypto_scomp *scomp = *tfm_ctx; void **ctx = acomp_request_ctx(req); struct scomp_scratch *scratch; + void *src, *dst; unsigned int dlen; int ret; @@ -134,13 +135,25 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir) scratch = raw_cpu_ptr(&scomp_scratch); spin_lock(&scratch->lock); - scatterwalk_map_and_copy(scratch->src, req->src, 0, req->slen, 0); + if (sg_nents(req->src) == 1 && !PageHighMem(sg_page(req->src))) { + src = page_to_virt(sg_page(req->src)) + req->src->offset; + } else { + scatterwalk_map_and_copy(scratch->src, req->src, 0, + req->slen, 0); + src = scratch->src; + } + + if (req->dst && sg_nents(req->dst) == 1 && !PageHighMem(sg_page(req->dst))) + dst = page_to_virt(sg_page(req->dst)) + req->dst->offset; + else + dst = scratch->dst; + if (dir) - ret = crypto_scomp_compress(scomp, scratch->src, req->slen, - scratch->dst, &req->dlen, *ctx); + ret = crypto_scomp_compress(scomp, src, req->slen, + dst, &req->dlen, *ctx); else - ret = crypto_scomp_decompress(scomp, scratch->src, req->slen, - scratch->dst, &req->dlen, *ctx); + ret = crypto_scomp_decompress(scomp, src, req->slen, + dst, &req->dlen, *ctx); if (!ret) { if (!req->dst) { req->dst = sgl_alloc(req->dlen, GFP_ATOMIC, NULL); @@ -152,8 +165,17 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir) ret = -ENOSPC; goto out; } - scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen, - 1); + if (dst == scratch->dst) { + scatterwalk_map_and_copy(scratch->dst, req->dst, 0, + req->dlen, 1); + } else { + int nr_pages = DIV_ROUND_UP(req->dst->offset + req->dlen, PAGE_SIZE); + int i; + struct page *dst_page = sg_page(req->dst); + + for (i = 0; i < nr_pages; i++) + flush_dcache_page(dst_page + i); + } } out: spin_unlock(&scratch->lock);