Message ID | 20211218194250.247633-1-richard.henderson@linaro.org |
---|---|
Series | tcg: vector improvements |
Ping?

Patch 1 is now upstream, but only patches 2-4 have reviews.
It applies cleanly to master...

r~

On 12/19/21 06:42, Richard Henderson wrote:
> Add some opcodes for compound logic operations that were so
> far marked as TODO. Implement those for PPC and S390X.
>
> We do not want to implement 512-bit width operations, because
> those trigger a cluster clock slowdown on the current set of
> Intel cpus. But there are new operations in avx512 that apply
> to 128 and 256-bit vectors, which do not trigger the slowdown,
> and those are very interesting.
>
>
> r~
>
>
> Richard Henderson (20):
>   tcg/optimize: Fix folding of vector ops
>   tcg: Add opcodes for vector nand, nor, eqv
>   tcg/ppc: Implement vector NAND, NOR, EQV
>   tcg/s390x: Implement vector NAND, NOR, EQV
>   tcg/i386: Detect AVX512
>   tcg/i386: Add tcg_out_evex_opc
>   tcg/i386: Use tcg_can_emit_vec_op in expand_vec_cmp_noinv
>   tcg/i386: Implement avx512 variable shifts
>   tcg/i386: Implement avx512 scalar shift
>   tcg/i386: Implement avx512 immediate sari shift
>   tcg/i386: Implement avx512 immediate rotate
>   tcg/i386: Implement avx512 variable rotate
>   tcg/i386: Support avx512vbmi2 vector shift-double instructions
>   tcg/i386: Expand vector word rotate as avx512vbmi2 shift-double
>   tcg/i386: Remove rotls_vec from tcg_target_op_def
>   tcg/i386: Expand scalar rotate with avx512 insns
>   tcg/i386: Implement avx512 min/max/abs
>   tcg/i386: Implement avx512 multiply
>   tcg/i386: Implement more logical operations for avx512
>   tcg/i386: Implement bitsel for avx512
>
>  include/qemu/cpuid.h | 20 +-
>  include/tcg/tcg-opc.h | 3 +
>  include/tcg/tcg.h | 3 +
>  tcg/aarch64/tcg-target.h | 3 +
>  tcg/arm/tcg-target.h | 3 +
>  tcg/i386/tcg-target-con-set.h | 1 +
>  tcg/i386/tcg-target.h | 17 +-
>  tcg/i386/tcg-target.opc.h | 3 +
>  tcg/ppc/tcg-target.h | 3 +
>  tcg/s390x/tcg-target.h | 3 +
>  tcg/optimize.c | 61 ++++--
>  tcg/tcg-op-vec.c | 27 ++-
>  tcg/tcg.c | 6 +
>  tcg/i386/tcg-target.c.inc | 386 ++++++++++++++++++++++++++++------
>  tcg/ppc/tcg-target.c.inc | 15 ++
>  tcg/s390x/tcg-target.c.inc | 17 ++
>  16 files changed, 472 insertions(+), 99 deletions(-)
>
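The "compound logic operations" in the cover letter are simple bitwise functions. The following is a plain-C model, not TCG or QEMU code, with hypothetical helper names: it defines nand, nor and eqv, shows that a host without a dedicated NAND instruction needs two operations (AND, then NOT) for the same result, and models AVX-512's VPTERNLOG three-input logic instruction, whose 8-bit truth-table immediate can encode each of these ops (and bitsel) in a single instruction. Whether, and how, the avx512 logical and bitsel patches in this series use VPTERNLOG is not stated in this thread; treat that link as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise models of the compound logic ops (illustrative, not TCG code).
 * Logic ops act bit by bit, so a 64-bit chunk stands in for any lane size. */
uint64_t op_nand(uint64_t a, uint64_t b) { return ~(a & b); }
uint64_t op_nor(uint64_t a, uint64_t b)  { return ~(a | b); }
uint64_t op_eqv(uint64_t a, uint64_t b)  { return ~(a ^ b); }

/* bitsel: take bits of b where a is 1 and bits of c where a is 0. */
uint64_t op_bitsel(uint64_t a, uint64_t b, uint64_t c)
{
    return (a & b) | (~a & c);
}

/* Hypothetical fallback for a host without a NAND instruction:
 * the same result costs two host operations instead of one. */
uint64_t nand_fallback(uint64_t a, uint64_t b)
{
    uint64_t t = a & b;   /* host op 1: AND */
    return ~t;            /* host op 2: NOT */
}

/* Software model of AVX-512 VPTERNLOG on one 64-bit chunk: for each bit,
 * the index (a << 2) | (b << 1) | c selects one bit of the 8-bit imm. */
uint64_t ternlog(uint64_t a, uint64_t b, uint64_t c, uint8_t imm)
{
    uint64_t r = 0;
    for (int bit = 0; bit < 64; bit++) {
        unsigned idx = (unsigned)((a >> bit & 1) << 2 |
                                  (b >> bit & 1) << 1 |
                                  (c >> bit & 1));
        r |= (uint64_t)(imm >> idx & 1) << bit;
    }
    return r;
}

/* Truth-table inputs: bit i of each pattern is that input's value at
 * index i, so evaluating a function on them yields its imm8. */
#define TERN_A 0xF0u
#define TERN_B 0xCCu
#define TERN_C 0xAAu

typedef uint64_t (*binop_fn)(uint64_t, uint64_t);

/* imm8 for a two-input op; the third input c is a don't-care. */
uint8_t tern_imm2(binop_fn op)
{
    return (uint8_t)op(TERN_A, TERN_B);
}

/* imm8 for bitsel, which uses all three inputs. */
uint8_t tern_imm_bitsel(void)
{
    return (uint8_t)op_bitsel(TERN_A, TERN_B, TERN_C);
}
```

In this model, `tern_imm2(op_nand)` yields 0x3F, `op_nor` yields 0x03, `op_eqv` yields 0xC3, and `tern_imm_bitsel()` yields 0xCA; feeding those immediates to `ternlog` reproduces the corresponding two-input or bitsel result.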
Richard Henderson <richard.henderson@linaro.org> writes:

> Add some opcodes for compound logic operations that were so
> far marked as TODO. Implement those for PPC and S390X.
>
> We do not want to implement 512-bit width operations, because
> those trigger a cluster clock slowdown on the current set of
> Intel cpus. But there are new operations in avx512 that apply
> to 128 and 256-bit vectors, which do not trigger the slowdown,
> and those are very interesting.

So, with a tweak to the vector test patches I sent yesterday, and running on hackbox (which has AVX on it), coverage of tcg/i386 went from:

| | Hit | Total | Coverage |
|---|---|---|---|
| Lines | 839 | 1768 | 47.5 % |
| Functions | 56 | 81 | 69.1 % |
| Branches | 336 | 864 | 38.9 % |

to:

| | Hit | Total | Coverage |
|---|---|---|---|
| Lines | 1077 | 1668 | 64.6 % |
| Functions | 68 | 77 | 88.3 % |
| Branches | 504 | 852 | 59.2 % |

which I think warrants a:

Tested-by: Alex Bennée <alex.bennee@linaro.org>

for the series.
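Each coverage figure above is a plain hit/total ratio. As a quick cross-check, here is a small C sketch (the struct and function names are hypothetical and not part of lcov or gcov) that recomputes the percentages from the quoted counts and prints the percentage-point change per metric.

```c
#include <stdio.h>

/* One row of an lcov-style summary: hits out of total instrumented items. */
struct cov_row {
    const char *metric;
    unsigned hit, total;
};

/* Coverage as a percentage; 0 when nothing is instrumented. */
double cov_percent(struct cov_row r)
{
    return r.total ? 100.0 * r.hit / r.total : 0.0;
}

/* Counts quoted in the reply above: before and after the series. */
const struct cov_row cov_before[3] = {
    { "Lines", 839, 1768 }, { "Functions", 56, 81 }, { "Branches", 336, 864 },
};
const struct cov_row cov_after[3] = {
    { "Lines", 1077, 1668 }, { "Functions", 68, 77 }, { "Branches", 504, 852 },
};

/* Print each metric's coverage before and after, plus the change in points. */
void print_cov_delta(void)
{
    for (int i = 0; i < 3; i++) {
        double b = cov_percent(cov_before[i]);
        double a = cov_percent(cov_after[i]);
        printf("%-9s %5.1f%% -> %5.1f%% (%+.1f pp)\n",
               cov_before[i].metric, b, a, a - b);
    }
}
```

Rounded to one decimal, this reproduces the quoted percentages and gives gains of about 17.1, 19.2 and 20.3 percentage points for lines, functions and branches respectively.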