[v3,0/14] Zbb string optimizations and call support in alternatives

Message ID 20221130225614.1594256-1-heiko@sntech.de

Message

Heiko Stuebner Nov. 30, 2022, 10:56 p.m. UTC
From: Heiko Stuebner <heiko.stuebner@vrull.eu>

The Zbb extension can be used to make string functions run a lot
faster.

There are essentially two problems to solve:
- making it possible to swap the implementation behind the str*
  functions in a performant way

  This is done by inlining the core functions and then
  using alternatives to call the actual variant.

  This of course will need a more intelligent selection mechanism
  down the road when more variants may exist using different
  available extensions.

- actually allowing calls in alternatives
  Function calls use auipc + jalr to reach those 32bit relative
  addresses but when they're compiled the offset will be wrong
  as alternatives live in a different section. So when the patch
  gets applied the address will point to the wrong location.

  So similar to arm64 the target addresses need to be updated.

  This is probably also helpful for other things needing more
  complex code in alternatives.
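The offset problem described above can be sketched in plain C. This is a
minimal userspace illustration, not the code from the series: the helper
names (pair_get_offset, pair_set_offset, fixup_call) are hypothetical, and
only the auipc/jalr immediate layout is taken from the RISC-V base ISA.
It assumes the call was assembled for old_pc and then copied to new_pc.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode values from the RISC-V base ISA. */
#define OPC_AUIPC 0x17u
#define OPC_JALR  0x67u

/* Decode the pc-relative offset encoded by an auipc + jalr pair. */
static int32_t pair_get_offset(uint32_t auipc, uint32_t jalr)
{
	/* auipc carries imm[31:12] in bits 31:12 */
	uint32_t hi = auipc & 0xfffff000u;
	/* jalr carries a signed imm[11:0] in bits 31:20 */
	int32_t lo = (int32_t)(jalr >> 20);

	if (lo & 0x800)
		lo -= 0x1000;	/* sign-extend the 12-bit value */

	return (int32_t)(hi + (uint32_t)lo);
}

/* Re-encode a pc-relative offset into an existing auipc + jalr pair. */
static void pair_set_offset(uint32_t *auipc, uint32_t *jalr, int32_t offset)
{
	uint32_t off = (uint32_t)offset;
	/* round the upper part so the remainder fits a signed 12-bit value */
	uint32_t hi = (off + 0x800u) & 0xfffff000u;
	uint32_t lo = off & 0xfffu;

	*auipc = (*auipc & 0x00000fffu) | hi;		/* keep rd + opcode */
	*jalr = (*jalr & 0x000fffffu) | (lo << 20);	/* keep rs1, funct3, rd, opcode */
}

/*
 * Hypothetical fixup: the pair was assembled for old_pc but now runs at
 * new_pc, so shrink the offset by the distance the code was moved.
 */
static void fixup_call(uint32_t *auipc, uint32_t *jalr,
		       uintptr_t old_pc, uintptr_t new_pc)
{
	int32_t offset;

	assert((*auipc & 0x7fu) == OPC_AUIPC && (*jalr & 0x7fu) == OPC_JALR);

	offset = pair_get_offset(*auipc, *jalr);
	offset -= (int32_t)(new_pc - old_pc);
	pair_set_offset(auipc, jalr, offset);
}
```

The rounding by 0x800 in pair_set_offset is needed because jalr
sign-extends its 12-bit immediate, so the upper part has to compensate
whenever bit 11 of the offset is set.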


In my half-scientific test-case of running the functions in question
on a 95 character string in a loop of 10000 iterations, the Zbb
variants shave off around 2/3 of the original runtime.
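For reference, a test of that shape could look roughly like the sketch
below. It is a userspace approximation, not the test that produced the
number above; bench_strlen and the two constants are hypothetical names,
and clock() only gives coarse CPU time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define TEST_LEN   95		/* characters in the test string */
#define TEST_ITERS 10000	/* loop iterations */

/*
 * Call fn on a TEST_LEN-character string TEST_ITERS times.
 * Returns the sum of all results and stores the CPU seconds in *secs.
 */
static size_t bench_strlen(size_t (*fn)(const char *), double *secs)
{
	char buf[TEST_LEN + 1];
	size_t total = 0;
	clock_t start;
	/* volatile pointer keeps the compiler from folding the calls away */
	size_t (*volatile call)(const char *) = fn;

	assert(fn != NULL && secs != NULL);

	memset(buf, 'a', TEST_LEN);
	buf[TEST_LEN] = '\0';

	start = clock();
	for (int i = 0; i < TEST_ITERS; i++)
		total += call(buf);
	*secs = (double)(clock() - start) / CLOCKS_PER_SEC;

	return total;
}
```

Running it once per variant and comparing the two *secs values gives the
kind of relative figure quoted above; the returned sum doubles as a sanity
check that every call saw the full string.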


For v2 I got into some sort of cleanup spree for the general instruction
parsing that already existed. A number of places do their own
instruction parsing and I tried consolidating some of them.

Notably, the kvm parts still do their own parsing, but I had to stop
somewhere :-)

The series is based on v6.1-rc7 right now.

changes since v2:
- add patch fixing the c.jalr funct4 value
- reword some commit messages
- fix position of auipc addition patch (earlier)
- fix compile errors from patch-reordering gone wrong
  (worked at the end of v2, but compiling individual patches
   caused issues) - patches are now tested individually
- limit Zbb variants for GNU as for now
  (LLVM support for .option arch is still under review)
- prevent str-functions from getting optimized to builtin-variants

changes since v1:
- a number of generalizations/cleanups for instruction parsing
- use accessor function to access instructions (Emil)
- actually patch the correct location when having more than one
  instruction in an alternative block
- string function cleanups (comments etc) (Conor)
- move zbb extension above s* extensions in cpu.c lists

changes since rfc:
- make Zbb code actually work
- drop some unneeded patches
- a lot of cleanups

Heiko Stuebner (14):
  RISC-V: fix funct4 definition for c.jalr in parse_asm.h
  RISC-V: add prefix to all constants/macros in parse_asm.h
  RISC-V: detach funct-values from their offset
  RISC-V: add ebreak instructions to definitions
  RISC-V: add auipc elements to parse_asm header
  RISC-V: Move riscv_insn_is_* macros into a common header
  RISC-V: rename parse_asm.h to insn.h
  RISC-V: kprobes: use central defined funct3 constants
  RISC-V: add U-type imm parsing to insn.h header
  RISC-V: add rd reg parsing to insn.h header
  RISC-V: fix auipc-jalr addresses in patched alternatives
  efi/riscv: libstub: mark when compiling libstub
  RISC-V: add infrastructure to allow different str* implementations
  RISC-V: add zbb support to string functions

 arch/riscv/Kconfig                       |  24 ++
 arch/riscv/Makefile                      |   3 +
 arch/riscv/include/asm/alternative.h     |   3 +
 arch/riscv/include/asm/errata_list.h     |   3 +-
 arch/riscv/include/asm/hwcap.h           |   1 +
 arch/riscv/include/asm/insn.h            | 292 +++++++++++++++++++++++
 arch/riscv/include/asm/parse_asm.h       | 219 -----------------
 arch/riscv/include/asm/string.h          |  83 +++++++
 arch/riscv/kernel/alternative.c          |  72 ++++++
 arch/riscv/kernel/cpu.c                  |   1 +
 arch/riscv/kernel/cpufeature.c           |  29 ++-
 arch/riscv/kernel/image-vars.h           |   6 +-
 arch/riscv/kernel/kgdb.c                 |  63 ++---
 arch/riscv/kernel/probes/simulate-insn.c |  19 +-
 arch/riscv/kernel/probes/simulate-insn.h |  26 +-
 arch/riscv/lib/Makefile                  |   6 +
 arch/riscv/lib/strcmp.S                  |  38 +++
 arch/riscv/lib/strcmp_zbb.S              |  96 ++++++++
 arch/riscv/lib/strlen.S                  |  29 +++
 arch/riscv/lib/strlen_zbb.S              | 115 +++++++++
 arch/riscv/lib/strncmp.S                 |  41 ++++
 arch/riscv/lib/strncmp_zbb.S             | 112 +++++++++
 drivers/firmware/efi/libstub/Makefile    |   2 +-
 23 files changed, 982 insertions(+), 301 deletions(-)
 create mode 100644 arch/riscv/include/asm/insn.h
 delete mode 100644 arch/riscv/include/asm/parse_asm.h
 create mode 100644 arch/riscv/lib/strcmp.S
 create mode 100644 arch/riscv/lib/strcmp_zbb.S
 create mode 100644 arch/riscv/lib/strlen.S
 create mode 100644 arch/riscv/lib/strlen_zbb.S
 create mode 100644 arch/riscv/lib/strncmp.S
 create mode 100644 arch/riscv/lib/strncmp_zbb.S

Comments

Heiko Stuebner Dec. 1, 2022, 11:42 a.m. UTC | #1
On Thursday, 1 December 2022, 01:02:08 CET, Conor Dooley wrote:
> On 30/11/2022 22:56, Heiko Stuebner wrote:

> > changes since v2:
> > - add patch fixing the c.jalr funct4 value
> > - reword some commit messages
> > - fix position of auipc addition patch (earlier)
> > - fix compile errors from patch-reordering gone wrong
> >   (worked at the end of v2, but compiling individual patches
> >    caused issues) - patches are now tested individually
> > - limit Zbb variants for GNU as for now
> >   (LLVM support for .option arch is still under review)
> 
> Still no good on that front chief:
> ld.lld: error: undefined symbol: __strlen_generic
> >>> referenced by ctype.c
> >>>               arch/riscv/purgatory/purgatory.ro:(strlcpy)
> >>> referenced by ctype.c
> >>>               arch/riscv/purgatory/purgatory.ro:(strlcat)
> >>> referenced by ctype.c
> >>>               arch/riscv/purgatory/purgatory.ro:(strlcat)
> >>> referenced 3 more times
> make[5]: *** [/stuff/linux/arch/riscv/purgatory/Makefile:85: arch/riscv/purgatory/purgatory.chk] Error 1
> make[5]: Target 'arch/riscv/purgatory/' not remade because of errors.
> make[4]: *** [/stuff/linux/scripts/Makefile.build:500: arch/riscv/purgatory] Error 2

Oh interesting, there is another efistub-like thingy hidden in the tree
(and CRYPTO_SHA256 needs to be built-in, not a module, to allow the
kexec-purgatory to be built).

The following should do the trick:

---------------- 8< --------------
diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
index 806c402c874e..b99698983045 100644
--- a/arch/riscv/include/asm/string.h
+++ b/arch/riscv/include/asm/string.h
@@ -27,7 +27,7 @@ extern asmlinkage int __strcmp_zbb(const char *cs, const char *ct);
 
 static inline int strcmp(const char *cs, const char *ct)
 {
-#ifdef RISCV_EFISTUB
+#if defined(RISCV_EFISTUB) || defined(RISCV_PURGATORY)
 	return __strcmp_generic(cs, ct);
 #else
 	register const char *a0 asm("a0") = cs;
@@ -55,7 +55,7 @@ extern asmlinkage int __strncmp_zbb(const char *cs,
 
 static inline int strncmp(const char *cs, const char *ct, size_t count)
 {
-#ifdef RISCV_EFISTUB
+#if defined(RISCV_EFISTUB) || defined(RISCV_PURGATORY)
 	return __strncmp_generic(cs, ct, count);
 #else
 	register const char *a0 asm("a0") = cs;
@@ -82,7 +82,7 @@ extern asmlinkage __kernel_size_t __strlen_zbb(const char *);
 
 static inline __kernel_size_t strlen(const char *s)
 {
-#ifdef RISCV_EFISTUB
+#if defined(RISCV_EFISTUB) || defined(RISCV_PURGATORY)
 	return __strlen_generic(s);
 #else
 	register const char *a0 asm("a0") = s;
diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
index dd58e1d99397..1d0969722875 100644
--- a/arch/riscv/purgatory/Makefile
+++ b/arch/riscv/purgatory/Makefile
@@ -2,6 +2,7 @@
 OBJECT_FILES_NON_STANDARD := y
 
 purgatory-y := purgatory.o sha256.o entry.o string.o ctype.o memcpy.o memset.o
+purgatory-y += strcmp.o strlen.o strncmp.o
 
 targets += $(purgatory-y)
 PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
@@ -18,6 +19,15 @@ $(obj)/memcpy.o: $(srctree)/arch/riscv/lib/memcpy.S FORCE
 $(obj)/memset.o: $(srctree)/arch/riscv/lib/memset.S FORCE
 	$(call if_changed_rule,as_o_S)
 
+$(obj)/strcmp.o: $(srctree)/arch/riscv/lib/strcmp.S FORCE
+	$(call if_changed_rule,as_o_S)
+
+$(obj)/strlen.o: $(srctree)/arch/riscv/lib/strlen.S FORCE
+	$(call if_changed_rule,as_o_S)
+
+$(obj)/strncmp.o: $(srctree)/arch/riscv/lib/strncmp.S FORCE
+	$(call if_changed_rule,as_o_S)
+
 $(obj)/sha256.o: $(srctree)/lib/crypto/sha256.c FORCE
 	$(call if_changed_rule,cc_o_c)
 
@@ -46,6 +56,7 @@ PURGATORY_CFLAGS_REMOVE := -mcmodel=kernel
 PURGATORY_CFLAGS := -mcmodel=medany -ffreestanding -fno-zero-initialized-in-bss
 PURGATORY_CFLAGS += $(DISABLE_STACKLEAK_PLUGIN) -DDISABLE_BRANCH_PROFILING
 PURGATORY_CFLAGS += -fno-stack-protector -g0
+PURGATORY_CFLAGS += -DRISCV_PURGATORY
 
 # Default KBUILD_CFLAGS can have -pg option set when FTRACE is enabled. That
 # in turn leaves some undefined symbols like __fentry__ in purgatory and not
@@ -77,6 +88,9 @@ CFLAGS_ctype.o			+= $(PURGATORY_CFLAGS)
 AFLAGS_REMOVE_entry.o		+= -Wa,-gdwarf-2
 AFLAGS_REMOVE_memcpy.o		+= -Wa,-gdwarf-2
 AFLAGS_REMOVE_memset.o		+= -Wa,-gdwarf-2
+AFLAGS_REMOVE_strcmp.o		+= -Wa,-gdwarf-2
+AFLAGS_REMOVE_strlen.o		+= -Wa,-gdwarf-2
+AFLAGS_REMOVE_strncmp.o		+= -Wa,-gdwarf-2
 
 $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
 		$(call if_changed,ld)
Heiko Stuebner Dec. 1, 2022, 10:39 p.m. UTC | #2
Hi Ard,

On Thursday, 1 December 2022, 21:57:00 CET, Ard Biesheuvel wrote:
> On Thu, 1 Dec 2022 at 20:35, Andrew Jones <ajones@ventanamicro.com> wrote:
> >
> > On Wed, Nov 30, 2022 at 11:56:12PM +0100, Heiko Stuebner wrote:
> > > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > >
> > > We may want to runtime-optimize some core functions (str*, mem*),
> > > but not have this leak into libstub and cause build issues.
> > > Instead libstub, for the short while it's running, should just use
> > > the generic implementation.
> > >
> > > So, to be able to determine whether functions, that are used both in
> > > libstub and the main kernel, are getting compiled as part of libstub or
> > > not, add a compile-flag we can check via #ifdef.
> > >
> > > Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
> > > Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> 
> I think it would be better to update arch/riscv/kernel/image-vars.h so
> that only these generic implementations are exposed to the stub in the
> first place.

The relevant code is in patch13 + patch14.

To provide more context, the actual str* function we want to run is
determined at runtime. This is due to all the possible extensions
(present and future) a riscv core can or cannot support, which in turn
blooms into a plethora of possible implementations for them.

Of course we want to have a unified kernel image, so we check on boot
for available extensions and patch the call to the actual best function.

The introduction in the mentioned patches is still simple with a
generic + bitmanipulation variant, but that is more to keep the changes
somewhat manageable and there are already more variants on the horizon.

So strlen and friends are each just an inline function with a call to
the actual implementation, which gets patched via alternatives.

This then looks like:

--------- 8< ---------
static inline int strcmp(const char *cs, const char *ct)
{
#if defined(RISCV_EFISTUB) || defined(RISCV_PURGATORY)
        return __strcmp_generic(cs, ct);
#else
        register const char *a0 asm("a0") = cs;
        register const char *a1 asm("a1") = ct;
        register int a0_out asm("a0");

        asm volatile(
                ALTERNATIVE(
                        "call __strcmp_generic\n\t",
                        "call __strcmp_zbb\n\t",
                        0, CPUFEATURE_ZBB, CONFIG_RISCV_ISA_ZBB)
                : "=r"(a0_out)
                : "r"(a0), "r"(a1)
                : "ra", "t0", "t1", "t2", "t3", "t4", "t5");

        return a0_out;
#endif
}
--------- 8< ---------

When that gets pulled into libstub without that separation, libstub ends up
with references to __strcmp_generic and __strcmp_zbb.

Of course the zbb variant would never get used, but still would also need
to be present in libstub and image-vars.h just to make the build happy.

And for every additional variant this would also mean adding more
unused code to libstub and image-vars.h, hence I came up with the flag
to mark when code gets built as part of libstub.
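
As a rough userspace analogy of the selection and of the libstub split,
consider the sketch below. It swaps a function pointer once at startup
instead of patching the call instruction, so it is not the alternatives
mechanism itself. All names (strlen_impl, select_strlen, my_strlen,
BUILD_FOR_STUB) are hypothetical, and the "fast" variant is just libc's
strlen standing in for an extension-specific implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Plain byte loop, standing in for the generic variant. */
static size_t strlen_generic(const char *s)
{
	const char *p = s;

	while (*p)
		p++;
	return (size_t)(p - s);
}

/* Stand-in for an optimized variant; here it simply defers to libc. */
static size_t strlen_fast(const char *s)
{
	return strlen(s);
}

/* Starts out at the generic variant, like an unpatched call site. */
static size_t (*strlen_impl)(const char *) = strlen_generic;

/* One-time selection, loosely analogous to applying alternatives at boot. */
static void select_strlen(bool cpu_has_fast_ext)
{
	strlen_impl = cpu_has_fast_ext ? strlen_fast : strlen_generic;
}

static inline size_t my_strlen(const char *s)
{
#ifdef BUILD_FOR_STUB
	/* Stub-like builds reference only the generic symbol. */
	return strlen_generic(s);
#else
	return strlen_impl(s);
#endif
}
```

With BUILD_FOR_STUB defined, my_strlen no longer refers to any variant
other than the generic one, which is the same effect the compile flag
has on the inline str* wrappers.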


Heiko



> 
> > > ---
> > >  drivers/firmware/efi/libstub/Makefile | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> > > index ef5045a53ce0..777d1ab059e3 100644
> > > --- a/drivers/firmware/efi/libstub/Makefile
> > > +++ b/drivers/firmware/efi/libstub/Makefile
> > > @@ -25,7 +25,7 @@ cflags-$(CONFIG_ARM)                := $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
> > >                                  -fno-builtin -fpic \
> > >                                  $(call cc-option,-mno-single-pic-base)
> > >  cflags-$(CONFIG_RISCV)               := $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
> > > -                                -fpic
> > > +                                -fpic -DRISCV_EFISTUB
> > >  cflags-$(CONFIG_LOONGARCH)   := $(subst $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
> > >                                  -fpie
> > >
> > > --
> > > 2.35.1
> > >
> >
> > Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
>
Ard Biesheuvel Dec. 2, 2022, 4:37 p.m. UTC | #3
On Thu, 1 Dec 2022 at 23:39, Heiko Stübner <heiko@sntech.de> wrote:
>
> Hi Ard,
>
> On Thursday, 1 December 2022, 21:57:00 CET, Ard Biesheuvel wrote:
> > On Thu, 1 Dec 2022 at 20:35, Andrew Jones <ajones@ventanamicro.com> wrote:
> > >
> > > On Wed, Nov 30, 2022 at 11:56:12PM +0100, Heiko Stuebner wrote:
> > > > From: Heiko Stuebner <heiko.stuebner@vrull.eu>
> > > >
> > > > We may want to runtime-optimize some core functions (str*, mem*),
> > > > but not have this leak into libstub and cause build issues.
> > > > Instead libstub, for the short while it's running, should just use
> > > > the generic implementation.
> > > >
> > > > So, to be able to determine whether functions, that are used both in
> > > > libstub and the main kernel, are getting compiled as part of libstub or
> > > > not, add a compile-flag we can check via #ifdef.
> > > >
> > > > Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
> > > > Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
> >
> > I think it would be better to update arch/riscv/kernel/image-vars.h so
> > that only these generic implementations are exposed to the stub in the
> > first place.
>

Actually, all references to string and memory functions are going away
from the stub. This is already in -next.

EFI now has zboot support, which means you can create an EFI bootable
kernel image that carries the actual kernel in compressed form rather
than as a hybrid EFI/bare metal image.