[00/20] tcg: vector improvements

Message ID 20211218194250.247633-1-richard.henderson@linaro.org

Message

Richard Henderson Dec. 18, 2021, 7:42 p.m. UTC
Add some opcodes for compound logic operations that were so
far marked as TODO.  Implement those for PPC and S390X.
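
For reference, the three compound operations have the usual bitwise meaning. The following is only a minimal sketch of the per-lane semantics on a 64-bit value, not code from the series; the helper names lane_nand, lane_nor and lane_eqv are hypothetical and chosen for illustration.

```c
#include <stdint.h>

/*
 * Illustrative per-lane semantics only. The helper names are
 * hypothetical and do not appear in the series. A vector op
 * applies the same expression to every lane.
 */
static inline uint64_t lane_nand(uint64_t a, uint64_t b)
{
    return ~(a & b);    /* NAND: complement of AND */
}

static inline uint64_t lane_nor(uint64_t a, uint64_t b)
{
    return ~(a | b);    /* NOR: complement of OR */
}

static inline uint64_t lane_eqv(uint64_t a, uint64_t b)
{
    return ~(a ^ b);    /* EQV: complement of XOR, set where bits agree */
}
```

A host without a native instruction for one of these can still get the same result from the base operation followed by a NOT, which is what the expressions above spell out.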

We do not want to implement 512-bit width operations, because
those trigger a cluster clock slowdown on the current set of
Intel cpus.  But there are new operations in avx512 that apply
to 128 and 256-bit vectors, which do not trigger the slowdown,
and those are very interesting.


r~


Richard Henderson (20):
  tcg/optimize: Fix folding of vector ops
  tcg: Add opcodes for vector nand, nor, eqv
  tcg/ppc: Implement vector NAND, NOR, EQV
  tcg/s390x: Implement vector NAND, NOR, EQV
  tcg/i386: Detect AVX512
  tcg/i386: Add tcg_out_evex_opc
  tcg/i386: Use tcg_can_emit_vec_op in expand_vec_cmp_noinv
  tcg/i386: Implement avx512 variable shifts
  tcg/i386: Implement avx512 scalar shift
  tcg/i386: Implement avx512 immediate sari shift
  tcg/i386: Implement avx512 immediate rotate
  tcg/i386: Implement avx512 variable rotate
  tcg/i386: Support avx512vbmi2 vector shift-double instructions
  tcg/i386: Expand vector word rotate as avx512vbmi2 shift-double
  tcg/i386: Remove rotls_vec from tcg_target_op_def
  tcg/i386: Expand scalar rotate with avx512 insns
  tcg/i386: Implement avx512 min/max/abs
  tcg/i386: Implement avx512 multiply
  tcg/i386: Implement more logical operations for avx512
  tcg/i386: Implement bitsel for avx512
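
As background for the last two entries: AVX512 provides a three-input ternary logic instruction (VPTERNLOG) whose 8-bit immediate is a truth table indexed by one bit from each input. The sketch below shows how such immediates can be derived and checked in plain C. It is an assumption that the series uses this instruction with these particular immediates; the names TL_A, TL_B, TL_C and ternlog64 are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/*
 * Truth-table columns for the three inputs. Table index is
 * (a << 2) | (b << 1) | c, so each constant has a 1 in every
 * row where that input is 1. Names are hypothetical.
 */
enum { TL_A = 0xF0, TL_B = 0xCC, TL_C = 0xAA };

/* Immediates obtained by applying the wanted logic to the columns. */
enum {
    TL_IMM_NAND   = (uint8_t)~(TL_B & TL_C),                 /* 0x77 */
    TL_IMM_NOR    = (uint8_t)~(TL_B | TL_C),                 /* 0x11 */
    TL_IMM_EQV    = (uint8_t)~(TL_B ^ TL_C),                 /* 0x99 */
    TL_IMM_BITSEL = (uint8_t)((TL_A & TL_B) | (~TL_A & TL_C)) /* 0xCA */
};

/*
 * Software model of a ternary-logic op on one 64-bit lane:
 * for each bit position, look up the result in the imm8 table.
 */
static inline uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c,
                                 uint8_t imm)
{
    uint64_t r = 0;

    for (int i = 0; i < 64; i++) {
        unsigned idx = (unsigned)(((a >> i) & 1) << 2)
                     | (unsigned)(((b >> i) & 1) << 1)
                     | (unsigned)((c >> i) & 1);
        r |= (uint64_t)((imm >> idx) & 1) << i;
    }
    return r;
}
```

With this model, ternlog64(a, b, c, TL_IMM_BITSEL) gives (a & b) | (~a & c), i.e. a bitwise select of b or c under the mask a, and the two-input immediates ignore the first operand.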

 include/qemu/cpuid.h          |  20 +-
 include/tcg/tcg-opc.h         |   3 +
 include/tcg/tcg.h             |   3 +
 tcg/aarch64/tcg-target.h      |   3 +
 tcg/arm/tcg-target.h          |   3 +
 tcg/i386/tcg-target-con-set.h |   1 +
 tcg/i386/tcg-target.h         |  17 +-
 tcg/i386/tcg-target.opc.h     |   3 +
 tcg/ppc/tcg-target.h          |   3 +
 tcg/s390x/tcg-target.h        |   3 +
 tcg/optimize.c                |  61 ++++--
 tcg/tcg-op-vec.c              |  27 ++-
 tcg/tcg.c                     |   6 +
 tcg/i386/tcg-target.c.inc     | 386 ++++++++++++++++++++++++++++------
 tcg/ppc/tcg-target.c.inc      |  15 ++
 tcg/s390x/tcg-target.c.inc    |  17 ++
 16 files changed, 472 insertions(+), 99 deletions(-)

Comments

Richard Henderson Jan. 29, 2022, 9:28 a.m. UTC | #1
Ping?

Patch 1 is now upstream, but only patches 2-4 have reviews.
It applies cleanly to master...


r~

Alex Bennée Feb. 3, 2022, 10:25 a.m. UTC | #2
Richard Henderson <richard.henderson@linaro.org> writes:

> Add some opcodes for compound logic operations that were so
> far marked as TODO.  Implement those for PPC and S390X.
>
> We do not want to implement 512-bit width operations, because
> those trigger a cluster clock slowdown on the current set of
> Intel cpus.  But there are new operations in avx512 that apply
> to 128 and 256-bit vectors, which do not trigger the slowdown,
> and those are very interesting.

So with a tweak to the vector test patches I sent yesterday, and running
on hackbox (which has AVX on it), I got coverage in tcg/i386 going from:

            Hit    Total   Coverage
Lines:      839    1768    47.5 %
Functions:   56      81    69.1 %
Branches:   336     864    38.9 %

to:

            Hit    Total   Coverage
Lines:     1077    1668    64.6 %
Functions:   68      77    88.3 %
Branches:   504     852    59.2 %

which I think warrants a:

Tested-by: Alex Bennée <alex.bennee@linaro.org>

for the series.