[v2,0/2] ARM: allow kernel mode NEON in softirq context

Message ID 20221207103936.2198407-1-ardb@kernel.org

Message

Ard Biesheuvel Dec. 7, 2022, 10:39 a.m. UTC
Currently on ARM, we only permit kernel mode NEON in task context, and
NEON based processing triggered from softirq context is queued for
asynchronous completion via the crypto API's cryptd layer.

For IPsec packet encryption involving highly performant crypto
implementations, this results in a substantial performance hit, and so
it would be desirable to permit those crypto operations to complete
synchronously even when invoked from softirq context.

For example, on a 1 GHz Cortex-A53 machine (SynQuacer), AES-256-GCM
executes in 7.2 cycles per byte, putting an upper bound of ~140 MB/s
on the achievable throughput of a single CPU.
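
As a rough check of that bound, dividing the clock rate by the per-byte
cycle cost gives the ceiling. The sketch below is illustrative only: the
function name is made up for this example, it assumes 1 MB = 10^6 bytes,
and it ignores every cost outside the cipher itself.

```c
#include <assert.h>

/*
 * Upper bound on single-CPU throughput in MB/s (1 MB = 10^6 bytes),
 * given the clock frequency in Hz and the cipher cost in cycles per byte.
 * Hypothetical helper, not part of the series.
 */
double max_throughput_mbps(double clock_hz, double cycles_per_byte)
{
	assert(cycles_per_byte > 0.0);
	return clock_hz / cycles_per_byte / 1e6;
}
```

With a 1 GHz clock and 7.2 cycles per byte, max_throughput_mbps(1e9, 7.2)
comes out at roughly 138.9, which rounds to the ~140 MB/s quoted above.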

Without these changes, an IPsec tunnel from a 32-bit VM to the 64-bit
host can achieve a throughput of 9.5 MB/s TX and 11.9 MB/s RX.

When the crypto algorithm is permitted to execute in softirq context,
the throughput increases to 16.5 MB/s TX and 41 MB/s RX.

(This is measured using Debian's iperf3 3.11 with the default options.)

So let's reorganize the VFP state handling so that its critical
handling of the FPU registers runs with softirqs disabled. Then, update
the kernel_neon_begin()/end() logic to keep softirq processing disabled
as long as the NEON is being used in kernel mode.
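
The intended behaviour can be sketched as a small user-space model. This
is not the code from the patches: struct cpu_model and the model_*()
functions are hypothetical names invented for this illustration, and the
rule that NEON stays off limits in hardirq context is an assumption of
the model. It only shows the idea that softirq processing stays disabled
between the begin and end calls, so a softirq can never interrupt a
kernel mode NEON section and nest inside it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU; none of this is the kernel's real state. */
struct cpu_model {
	int  softirq_disable_depth; /* stands in for softirq disabling */
	bool in_hardirq;            /* true while handling a hard interrupt */
	bool neon_in_use;           /* a kernel mode NEON section is active */
};

/* Assumed policy: usable from task or softirq context, not from hardirq. */
bool model_may_use_neon(const struct cpu_model *cpu)
{
	return !cpu->in_hardirq;
}

/* Softirqs may only run on this CPU while they are not disabled. */
bool model_softirq_can_run(const struct cpu_model *cpu)
{
	return cpu->softirq_disable_depth == 0;
}

/* Models kernel_neon_begin(): disable softirqs, then claim the unit. */
void model_neon_begin(struct cpu_model *cpu)
{
	assert(model_may_use_neon(cpu));
	assert(!cpu->neon_in_use); /* nesting is not permitted */
	cpu->softirq_disable_depth++;
	cpu->neon_in_use = true;
}

/* Models kernel_neon_end(): release the unit, then re-enable softirqs. */
void model_neon_end(struct cpu_model *cpu)
{
	assert(cpu->neon_in_use);
	cpu->neon_in_use = false;
	cpu->softirq_disable_depth--;
}
```

In this model, model_softirq_can_run() returns false for the whole span
between model_neon_begin() and model_neon_end(), which is what rules out
a nested NEON user arriving via softirq.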

Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>

Ard Biesheuvel (2):
  ARM: vfp: Manipulate VFP state with softirqs disabled
  ARM: permit non-nested kernel mode NEON in softirq context

 arch/arm/include/asm/assembler.h | 19 ++++++++++++-------
 arch/arm/include/asm/simd.h      |  8 ++++++++
 arch/arm/kernel/asm-offsets.c    |  1 +
 arch/arm/vfp/entry.S             |  4 ++--
 arch/arm/vfp/vfphw.S             |  4 ++--
 arch/arm/vfp/vfpmodule.c         | 19 ++++++++++++-------
 6 files changed, 37 insertions(+), 18 deletions(-)
 create mode 100644 arch/arm/include/asm/simd.h

Comments

Martin Willi Dec. 12, 2022, 2:37 p.m. UTC | #1
Hi Ard,

> Currently on ARM, we only permit kernel mode NEON in task context [...]
> For IPsec packet encryption involving highly performant crypto
> implementations, this results in a substantial performance hit [...]

Thanks for your continued work on this.

> Without these changes, an IPsec tunnel from a 32-bit VM to the 64-bit
> host can achieve a throughput of 9.5 MB/s TX and 11.9 MB/s RX.
> 
> When the crypto algorithm is permitted to execute in softirq context,
> the throughput increases to 16.5 MB/s TX and 41 MB/s RX.

In my tests on an Armada 385, I could increase IPsec throughput with
ChaCha20/Poly1305 on RX from ~230 to ~260 MBit/s when using the NEON
code path. So you may add my:

Tested-by: Martin Willi <martin@strongswan.org>

Thanks,
Martin

Ard Biesheuvel Dec. 13, 2022, 4:56 p.m. UTC | #2
On Mon, 12 Dec 2022 at 15:38, Martin Willi <martin@strongswan.org> wrote:
>
> Hi Ard,
>
> > Currently on ARM, we only permit kernel mode NEON in task context [...]
> > For IPsec packet encryption involving highly performant crypto
> > implementations, this results in a substantial performance hit [...]
>
> Thanks for your continued work on this.
>
> > Without these changes, an IPsec tunnel from a 32-bit VM to the 64-bit
> > host can achieve a throughput of 9.5 MB/s TX and 11.9 MB/s RX.
> >
> > When the crypto algorithm is permitted to execute in softirq context,
> > the throughput increases to 16.5 MB/s TX and 41 MB/s RX.
>
> In my tests on an Armada 385, I could increase IPsec throughput with
> ChaCha20/Poly1305 on RX from ~230 to ~260 MBit/s when using the NEON
> code path. So you may add my:
>
> Tested-by: Martin Willi <martin@strongswan.org>
>

Thanks!