Message ID | 20230905210621.1711859-1-puranjay12@gmail.com |
---|---|
Headers | show |
Series | arm32, bpf: add support for cpuv4 insns | expand |
On Tue, Sep 05, 2023 at 09:06:15PM +0000, Puranjay Mohan wrote: > The cpuv4 added the support of an instruction that is similar to load > but also sign-extends the result after the load. > > BPF_MEMSX | <size> | BPF_LDX means dst = *(signed size *) (src + offset) > here <size> can be one of BPF_B, BPF_H, BPF_W. > > ARM32 has instructions to load a byte or a half word with sign > extension into a 32bit register. As the JIT uses two 32 bit registers > to simulate a 64-bit BPF register, an extra instruction is emitted to > sign-extend the result up to the second register. > > Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> > --- > arch/arm/net/bpf_jit_32.c | 69 ++++++++++++++++++++++++++++++++++++++- > arch/arm/net/bpf_jit_32.h | 2 ++ > 2 files changed, 70 insertions(+), 1 deletion(-) > > diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c > index b26579da770e..f7c162479cf2 100644 > --- a/arch/arm/net/bpf_jit_32.c > +++ b/arch/arm/net/bpf_jit_32.c > @@ -333,6 +333,9 @@ static u32 arm_bpf_ldst_imm8(u32 op, u8 rt, u8 rn, s16 imm8) > #define ARM_LDRD_I(rt, rn, off) arm_bpf_ldst_imm8(ARM_INST_LDRD_I, rt, rn, off) > #define ARM_LDRH_I(rt, rn, off) arm_bpf_ldst_imm8(ARM_INST_LDRH_I, rt, rn, off) > > +#define ARM_LDRSH_I(rt, rn, off) arm_bpf_ldst_imm8(ARM_INST_LDRSH_I, rt, rn, off) > +#define ARM_LDRSB_I(rt, rn, off) arm_bpf_ldst_imm8(ARM_INST_LDRSB_I, rt, rn, off) > + > #define ARM_STR_I(rt, rn, off) arm_bpf_ldst_imm12(ARM_INST_STR_I, rt, rn, off) > #define ARM_STRB_I(rt, rn, off) arm_bpf_ldst_imm12(ARM_INST_STRB_I, rt, rn, off) > #define ARM_STRD_I(rt, rn, off) arm_bpf_ldst_imm8(ARM_INST_STRD_I, rt, rn, off) > @@ -1026,6 +1029,24 @@ static bool is_ldst_imm(s16 off, const u8 size) > return -off_max <= off && off <= off_max; > } > > +static bool is_ldst_imm8(s16 off, const u8 size) > +{ > + s16 off_max = 0; > + > + switch (size) { > + case BPF_B: > + off_max = 0xff; > + break; > + case BPF_W: > + off_max = 0xfff; > + break; > + case BPF_H: > + off_max = 
0xff; > + break; > + } > + return -off_max <= off && off <= off_max; > +} > + > /* *(size *)(dst + off) = src */ > static inline void emit_str_r(const s8 dst, const s8 src[], > s16 off, struct jit_ctx *ctx, const u8 sz){ > @@ -1105,6 +1126,45 @@ static inline void emit_ldx_r(const s8 dst[], const s8 src, > arm_bpf_put_reg64(dst, rd, ctx); > } > > +/* dst = *(signed size*)(src + off) */ > +static inline void emit_ldsx_r(const s8 dst[], const s8 src, > + s16 off, struct jit_ctx *ctx, const u8 sz){ > + const s8 *tmp = bpf2a32[TMP_REG_1]; > + const s8 *rd = is_stacked(dst_lo) ? tmp : dst; > + s8 rm = src; > + > + if (!is_ldst_imm8(off, sz)) { > + emit_a32_mov_i(tmp[0], off, ctx); > + emit(ARM_ADD_R(tmp[0], tmp[0], src), ctx); Hmm. This looks inefficient when "off" is able to fit in an immediate. Please try: int add_off; if (!is_ldst_imm8(off, sz)) { add_off = imm8m(off); if (add_off > 0) { emit(ARM_ADD_I(tmp[0], src, add_off), ctx); rm = tmp[0]; } else { emit_a32_mov_i(tmp[0], off, ctx); emit(ARM_ADD_R(tmp[0], tmp[0], src), ctx); rm = tmp[0]; } off = 0; > + } else if (rd[1] == rm) { > + emit(ARM_MOV_R(tmp[0], rm), ctx); > + rm = tmp[0]; Why do you need this? rd and rm can be the same for LDRS[BH]. > + } > + switch (sz) { > + case BPF_B: > + /* Load a Byte with sign extension*/ > + emit(ARM_LDRSB_I(rd[1], rm, off), ctx); > + /* Carry the sign extension to upper 32 bits */ > + emit(ARM_ASR_I(rd[0], rd[1], 31), ctx); > + break; > + case BPF_H: > + /* Load a HalfWord with sign extension*/ > + emit(ARM_LDRSH_I(rd[1], rm, off), ctx); > + /* Carry the sign extension to upper 32 bits */ > + emit(ARM_ASR_I(rd[0], rd[1], 31), ctx); > + break; > + case BPF_W: > + /* Load a Word*/ > + emit(ARM_LDR_I(rd[1], rm, off), ctx); > + /* Carry the sign extension to upper 32 bits */ > + emit(ARM_ASR_I(rd[0], rd[1], 31), ctx); The last instruction extending to the upper 32 bits is the same in each of these cases, so is there any reason not to do it outside the switch statement?
On Tue, Sep 05, 2023 at 09:06:18PM +0000, Puranjay Mohan wrote: > The cpuv4 added a new BPF_SDIV instruction that does signed division. > The encoding is similar to BPF_DIV but BPF_SDIV sets offset=1. > > ARM32 already supports 32-bit BPF_DIV which can be easily extended to > support BPF_SDIV as ARM32 has the SDIV instruction. When the CPU is not > ARM-v7, we implement that SDIV/SMOD with the function call similar to > the implementation of DIV/MOD. > > Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> > --- > arch/arm/net/bpf_jit_32.c | 26 ++++++++++++++++++++------ > arch/arm/net/bpf_jit_32.h | 2 ++ > 2 files changed, 22 insertions(+), 6 deletions(-) > > diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c > index 09496203f13e..f580ecf75710 100644 > --- a/arch/arm/net/bpf_jit_32.c > +++ b/arch/arm/net/bpf_jit_32.c > @@ -228,6 +228,16 @@ static u32 jit_mod32(u32 dividend, u32 divisor) > return dividend % divisor; > } > > +static s32 jit_sdiv32(s32 dividend, s32 divisor) > +{ > + return dividend / divisor; > +} > + > +static s32 jit_smod32(s32 dividend, s32 divisor) > +{ > + return dividend % divisor; > +} > + > static inline void _emit(int cond, u32 inst, struct jit_ctx *ctx) > { > inst |= (cond << 28); > @@ -477,7 +487,7 @@ static inline int epilogue_offset(const struct jit_ctx *ctx) > return to - from - 2; > } > > -static inline void emit_udivmod(u8 rd, u8 rm, u8 rn, struct jit_ctx *ctx, u8 op) > +static inline void emit_udivmod(u8 rd, u8 rm, u8 rn, struct jit_ctx *ctx, u8 op, u8 sign) > { > const int exclude_mask = BIT(ARM_R0) | BIT(ARM_R1); > const s8 *tmp = bpf2a32[TMP_REG_1]; > @@ -485,9 +495,10 @@ static inline void emit_udivmod(u8 rd, u8 rm, u8 rn, struct jit_ctx *ctx, u8 op) > #if __LINUX_ARM_ARCH__ == 7 > if (elf_hwcap & HWCAP_IDIVA) { > if (op == BPF_DIV) > - emit(ARM_UDIV(rd, rm, rn), ctx); > + sign ? emit(ARM_SDIV(rd, rm, rn), ctx) : emit(ARM_UDIV(rd, rm, rn), ctx); Oh no, let's not go using the ternary operator like that. 
If we want to use the ternary operator, then: emit(sign ? ARM_SDIV(rd, rm, rn) : ARM_UDIV(rd, rm, rn), ctx); would be _much_ better, since what is actually conditional is the value passed to emit(). If we want to avoid the ternary operator altogether, then obviously if() emit() else emit(), but I'd prefer my suggestion above. > /* Call appropriate function */ > - emit_mov_i(ARM_IP, op == BPF_DIV ? > - (u32)jit_udiv32 : (u32)jit_mod32, ctx); > + if (sign) > + emit_mov_i(ARM_IP, op == BPF_DIV ? (u32)jit_sdiv32 : (u32)jit_smod32, ctx); > + else > + emit_mov_i(ARM_IP, op == BPF_DIV ? (u32)jit_udiv32 : (u32)jit_mod32, ctx); u32 dst; if (sign) { if (op == BPF_DIV) dst = (u32)jit_sdiv32; else dst = (u32)jit_smod32; } else { if (op == BPF_DIV) dst = (u32)jit_udiv32; else dst = (u32)jit_mod32; } emit_mov_i(ARM_IP, dst, ctx); > emit_blx_r(ARM_IP, ctx); > > /* Restore caller-saved registers from stack */
On Tue, Sep 05 2023, Russell King (Oracle) wrote: > On Tue, Sep 05, 2023 at 09:06:19PM +0000, Puranjay Mohan wrote: >> +cont: >> + >> + /* Call appropriate function */ >> + if (sign) >> + emit_mov_i(ARM_IP, op == BPF_DIV ? (u32)jit_sdiv64 : (u32)jit_smod64, ctx); >> + else >> + emit_mov_i(ARM_IP, op == BPF_DIV ? (u32)jit_udiv64 : (u32)jit_mod64, ctx); > > Same comment as the previous patch here. Will fix both in next version. > >> + >> + emit_blx_r(ARM_IP, ctx); >> + >> + /* Save return value */ >> + if (rd[1] != ARM_R0) { >> + emit(ARM_MOV_R(rd[0], ARM_R1), ctx); >> + emit(ARM_MOV_R(rd[1], ARM_R0), ctx); >> + } >> + >> + /* Recover {R1, R0} from stack if it is not Rd */ >> + if (rd[1] != ARM_R0) >> + emit(ARM_POP(BIT(ARM_R0) | BIT(ARM_R1)), ctx); >> + else >> + emit(ARM_ADD_I(ARM_SP, ARM_SP, 8), ctx); >> + >> + /* Recover {R3, R2} from stack if it is not Rd */ >> + if (rd[1] != ARM_R2) >> + emit(ARM_POP(BIT(ARM_R2) | BIT(ARM_R3)), ctx); >> + else >> + emit(ARM_ADD_I(ARM_SP, ARM_SP, 8), ctx); > > if (rd[1] != ARM_R0) { > emit(ARM_POP(BIT(ARM_R0) | BIT(ARM_R1)), ctx); > emit(ARM_ADD_I(ARM_SP, ARM_SP, 8), ctx); > } else if (rd[1] != ARM_R2) { > emit(ARM_ADD_I(ARM_SP, ARM_SP, 8), ctx); > emit(ARM_POP(BIT(ARM_R2) | BIT(ARM_R3)), ctx); > } else { > emit(ARM_ADD_I(ARM_SP, ARM_SP, 16), ctx); > } > > Hmm? Actually, there can also be a situation where rd[1] != ARM_R0 && rd[1] != ARM_R2, so should I do it like: if (rd[1] != ARM_R0 && rd[1] != ARM_R2) { emit(ARM_POP(BIT(ARM_R0) | BIT(ARM_R1)), ctx); emit(ARM_POP(BIT(ARM_R2) | BIT(ARM_R3)), ctx); } else if (rd[1] != ARM_R0) { emit(ARM_POP(BIT(ARM_R0) | BIT(ARM_R1)), ctx); emit(ARM_ADD_I(ARM_SP, ARM_SP, 8), ctx); } else if (rd[1] != ARM_R2) { emit(ARM_ADD_I(ARM_SP, ARM_SP, 8), ctx); emit(ARM_POP(BIT(ARM_R2) | BIT(ARM_R3)), ctx); } else { emit(ARM_ADD_I(ARM_SP, ARM_SP, 16), ctx); } Thanks, Puranjay
On Tue, Sep 05 2023, Russell King (Oracle) wrote: [...] >> +/* dst = *(signed size*)(src + off) */ >> +static inline void emit_ldsx_r(const s8 dst[], const s8 src, >> + s16 off, struct jit_ctx *ctx, const u8 sz){ >> + const s8 *tmp = bpf2a32[TMP_REG_1]; >> + const s8 *rd = is_stacked(dst_lo) ? tmp : dst; >> + s8 rm = src; >> + >> + if (!is_ldst_imm8(off, sz)) { >> + emit_a32_mov_i(tmp[0], off, ctx); >> + emit(ARM_ADD_R(tmp[0], tmp[0], src), ctx); > > Hmm. This looks inefficient when "off" is able to fit in an immediate. > Please try: > > int add_off; > > if (!is_ldst_imm8(off, sz)) { > add_off = imm8m(off); > if (add_off > 0) { > emit(ARM_ADD_I(tmp[0], src, add_off), ctx); > rm = tmp[0]; > } else { > emit_a32_mov_i(tmp[0], off, ctx); > emit(ARM_ADD_R(tmp[0], tmp[0], src), ctx); > rm = tmp[0]; > } > off = 0; >> + } else if (rd[1] == rm) { >> + emit(ARM_MOV_R(tmp[0], rm), ctx); >> + rm = tmp[0]; > > Why do you need this? rd and rm can be the same for LDRS[BH]. I agree that this is not required, will remove in the next version. Will also use the suggested optimization for immediate. >> + } >> + switch (sz) { >> + case BPF_B: >> + /* Load a Byte with sign extension*/ >> + emit(ARM_LDRSB_I(rd[1], rm, off), ctx); >> + /* Carry the sign extension to upper 32 bits */ >> + emit(ARM_ASR_I(rd[0], rd[1], 31), ctx); >> + break; >> + case BPF_H: >> + /* Load a HalfWord with sign extension*/ >> + emit(ARM_LDRSH_I(rd[1], rm, off), ctx); >> + /* Carry the sign extension to upper 32 bits */ >> + emit(ARM_ASR_I(rd[0], rd[1], 31), ctx); >> + break; >> + case BPF_W: >> + /* Load a Word*/ >> + emit(ARM_LDR_I(rd[1], rm, off), ctx); >> + /* Carry the sign extension to upper 32 bits */ >> + emit(ARM_ASR_I(rd[0], rd[1], 31), ctx); > > The last instruction extending to the upper 32 bits is the same in each > of these cases, so is there any reason not to do it outside the switch > statement? Will move it outside in the next version. Thanks, Puranjay
On Wed, Sep 06, 2023 at 09:29:19AM +0000, Puranjay Mohan wrote: > On Tue, Sep 05 2023, Russell King (Oracle) wrote: > > > On Tue, Sep 05, 2023 at 09:06:19PM +0000, Puranjay Mohan wrote: > Actually, there can also be a situation where rd[1] != ARM_R0 && rd[1] != ARM_R2, > so should I do it like: > > if (rd[1] != ARM_R0 && rd[1] != ARM_R2) { > emit(ARM_POP(BIT(ARM_R0) | BIT(ARM_R1)), ctx); > emit(ARM_POP(BIT(ARM_R2) | BIT(ARM_R3)), ctx); > } else if (rd[1] != ARM_R0) { > emit(ARM_POP(BIT(ARM_R0) | BIT(ARM_R1)), ctx); > emit(ARM_ADD_I(ARM_SP, ARM_SP, 8), ctx); > } else if (rd[1] != ARM_R2) { > emit(ARM_ADD_I(ARM_SP, ARM_SP, 8), ctx); > emit(ARM_POP(BIT(ARM_R2) | BIT(ARM_R3)), ctx); > } else { > emit(ARM_ADD_I(ARM_SP, ARM_SP, 16), ctx); > } Are you sure all four states are possible?
On Wed, Sep 06 2023, Russell King (Oracle) wrote: > On Wed, Sep 06, 2023 at 09:29:19AM +0000, Puranjay Mohan wrote: >> On Tue, Sep 05 2023, Russell King (Oracle) wrote: >> >> > On Tue, Sep 05, 2023 at 09:06:19PM +0000, Puranjay Mohan wrote: >> Actually, there can also be a situation where rd[1] != ARM_R0 && rd[1] != ARM_R2, >> so should I do it like: >> >> if (rd[1] != ARM_R0 && rd[1] != ARM_R2) { >> emit(ARM_POP(BIT(ARM_R0) | BIT(ARM_R1)), ctx); >> emit(ARM_POP(BIT(ARM_R2) | BIT(ARM_R3)), ctx); >> } else if (rd[1] != ARM_R0) { >> emit(ARM_POP(BIT(ARM_R0) | BIT(ARM_R1)), ctx); >> emit(ARM_ADD_I(ARM_SP, ARM_SP, 8), ctx); >> } else if (rd[1] != ARM_R2) { >> emit(ARM_ADD_I(ARM_SP, ARM_SP, 8), ctx); >> emit(ARM_POP(BIT(ARM_R2) | BIT(ARM_R3)), ctx); >> } else { >> emit(ARM_ADD_I(ARM_SP, ARM_SP, 16), ctx); >> } > > Are you sure all four states are possible? ohh! I just realized that the last else will never run. rd[1] can never be equal to both ARM_R0 and ARM_R2. Will fix it in V3 as I already sent out the V2. I need to learn to leave patches on the list for few days before re-spinning. Thanks, Puranjay
On Wed, Sep 06, 2023 at 07:19:50PM +0000, Puranjay Mohan wrote: > On Wed, Sep 06 2023, Russell King (Oracle) wrote: > > > On Wed, Sep 06, 2023 at 09:29:19AM +0000, Puranjay Mohan wrote: > >> On Tue, Sep 05 2023, Russell King (Oracle) wrote: > >> > >> > On Tue, Sep 05, 2023 at 09:06:19PM +0000, Puranjay Mohan wrote: > >> Actually, there can also be a situation where rd[1] != ARM_R0 && rd[1] != ARM_R2, > >> so should I do it like: > >> > >> if (rd[1] != ARM_R0 && rd[1] != ARM_R2) { > >> emit(ARM_POP(BIT(ARM_R0) | BIT(ARM_R1)), ctx); > >> emit(ARM_POP(BIT(ARM_R2) | BIT(ARM_R3)), ctx); > >> } else if (rd[1] != ARM_R0) { > >> emit(ARM_POP(BIT(ARM_R0) | BIT(ARM_R1)), ctx); > >> emit(ARM_ADD_I(ARM_SP, ARM_SP, 8), ctx); > >> } else if (rd[1] != ARM_R2) { > >> emit(ARM_ADD_I(ARM_SP, ARM_SP, 8), ctx); > >> emit(ARM_POP(BIT(ARM_R2) | BIT(ARM_R3)), ctx); > >> } else { > >> emit(ARM_ADD_I(ARM_SP, ARM_SP, 16), ctx); > >> } > > > > Are you sure all four states are possible? > > ohh! > > I just realized that the last else will never run. > rd[1] can never be equal to both ARM_R0 and ARM_R2. > Will fix it in V3 as I already sent out the V2. > > I need to learn to leave patches on the list for few days before re-spinning. The last comment on that is you can pop r0-r3 in one go, rather than using two instructions.