[00/20] tcg: vector improvements


Message ID

20211218194250.247633-1-richard.henderson@linaro.org

Headers

Received-SPF: pass (google.com: domain of
 qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as
 permitted sender) client-ip=209.51.188.17;
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Subject: [PATCH 00/20] tcg: vector improvements
Date: Sat, 18 Dec 2021 11:42:30 -0800
Message-Id: <20211218194250.247633-1-richard.henderson@linaro.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Received-SPF: pass client-ip=2607:f8b0:4864:20::1031;
 envelope-from=richard.henderson@linaro.org; helo=mail-pj1-x1031.google.com
X-Spam_score_int: -12
X-Spam_score: -1.3
X-Spam_bar: -
X-Spam_report: (-1.3 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
 RCVD_IN_DNSWL_NONE=-0.0001, RDNS_NONE=0.793, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=no autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org
Sender: "Qemu-devel" <qemu-devel-bounces+patch=linaro.org@nongnu.org>


Message

Richard Henderson Dec. 18, 2021, 7:42 p.m. UTC

Add some opcodes for compound logic operations that were so
far marked as TODO.  Implement those for PPC and S390X.
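
For reference, the three compound operations have the usual bitwise meaning. The following is only a scalar sketch of that meaning, not code from the series; the ref_* names are made up for illustration, and a vector opcode would apply the same expression to every element.

```c
#include <stdint.h>

/*
 * Scalar reference semantics for the compound logic operations.
 * The ref_* names are hypothetical and do not appear in the series.
 */
static inline uint64_t ref_nand(uint64_t a, uint64_t b)
{
    return ~(a & b);    /* NOT (a AND b) */
}

static inline uint64_t ref_nor(uint64_t a, uint64_t b)
{
    return ~(a | b);    /* NOT (a OR b) */
}

static inline uint64_t ref_eqv(uint64_t a, uint64_t b)
{
    return ~(a ^ b);    /* NOT (a XOR b): bits set where a and b agree */
}
```

A host with a single instruction for one of these can emit it directly; otherwise the operation can always be built from the base operation followed by a NOT.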

We do not want to implement 512-bit width operations, because
those trigger a cluster clock slowdown on the current set of
Intel cpus.  But there are new operations in avx512 that apply
to 128 and 256-bit vectors, which do not trigger the slowdown,
and those are very interesting.
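
The 128 and 256-bit forms of AVX512 instructions are gated by the AVX512VL feature bit, which sits next to AVX512F in CPUID leaf 7, subleaf 0. The following is a rough sketch of decoding those bits from an already-fetched EBX value. The macro and function names are hypothetical (they are not the ones used in include/qemu/cpuid.h), and requiring exactly F and VL is an assumption made for illustration; a real check must also confirm via XGETBV that the OS has enabled the AVX512 register state, which this sketch omits.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EBX feature bits; macro names are illustrative. */
#define LEAF7_EBX_AVX512F    (UINT32_C(1) << 16)
#define LEAF7_EBX_AVX512DQ   (UINT32_C(1) << 17)
#define LEAF7_EBX_AVX512BW   (UINT32_C(1) << 30)
#define LEAF7_EBX_AVX512VL   (UINT32_C(1) << 31)

/*
 * Return true if the 128/256-bit AVX512 encodings are available,
 * given the EBX output of CPUID leaf 7, subleaf 0.
 * Assumption: F and VL together are the minimum requirement.
 */
static inline bool leaf7_has_avx512_vl(uint32_t ebx)
{
    const uint32_t need = LEAF7_EBX_AVX512F | LEAF7_EBX_AVX512VL;

    return (ebx & need) == need;
}
```

Taking the register value as a parameter keeps the decoding separate from the CPUID instruction itself, so the bit logic can be exercised on any host.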


r~


Richard Henderson (20):
  tcg/optimize: Fix folding of vector ops
  tcg: Add opcodes for vector nand, nor, eqv
  tcg/ppc: Implement vector NAND, NOR, EQV
  tcg/s390x: Implement vector NAND, NOR, EQV
  tcg/i386: Detect AVX512
  tcg/i386: Add tcg_out_evex_opc
  tcg/i386: Use tcg_can_emit_vec_op in expand_vec_cmp_noinv
  tcg/i386: Implement avx512 variable shifts
  tcg/i386: Implement avx512 scalar shift
  tcg/i386: Implement avx512 immediate sari shift
  tcg/i386: Implement avx512 immediate rotate
  tcg/i386: Implement avx512 variable rotate
  tcg/i386: Support avx512vbmi2 vector shift-double instructions
  tcg/i386: Expand vector word rotate as avx512vbmi2 shift-double
  tcg/i386: Remove rotls_vec from tcg_target_op_def
  tcg/i386: Expand scalar rotate with avx512 insns
  tcg/i386: Implement avx512 min/max/abs
  tcg/i386: Implement avx512 multiply
  tcg/i386: Implement more logical operations for avx512
  tcg/i386: Implement bitsel for avx512

 include/qemu/cpuid.h          |  20 +-
 include/tcg/tcg-opc.h         |   3 +
 include/tcg/tcg.h             |   3 +
 tcg/aarch64/tcg-target.h      |   3 +
 tcg/arm/tcg-target.h          |   3 +
 tcg/i386/tcg-target-con-set.h |   1 +
 tcg/i386/tcg-target.h         |  17 +-
 tcg/i386/tcg-target.opc.h     |   3 +
 tcg/ppc/tcg-target.h          |   3 +
 tcg/s390x/tcg-target.h        |   3 +
 tcg/optimize.c                |  61 ++++--
 tcg/tcg-op-vec.c              |  27 ++-
 tcg/tcg.c                     |   6 +
 tcg/i386/tcg-target.c.inc     | 386 ++++++++++++++++++++++++++++------
 tcg/ppc/tcg-target.c.inc      |  15 ++
 tcg/s390x/tcg-target.c.inc    |  17 ++
 16 files changed, 472 insertions(+), 99 deletions(-)
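
As background for the "word rotate as shift-double" patch: AVX512 has rotate instructions for 32 and 64-bit elements but not for 16-bit ones, and a rotate is the special case of a shift-double (funnel shift) in which both inputs are the same value. The following is a scalar sketch of that identity only; the function names are hypothetical and the code is not taken from the series.

```c
#include <stdint.h>

/*
 * Funnel shift left on 16-bit elements: concatenate hi:lo into 32 bits,
 * shift left by n (mod 16), and keep the upper 16 bits.
 * Hypothetical helper name, for illustration.
 */
static inline uint16_t fshl16(uint16_t hi, uint16_t lo, unsigned n)
{
    uint32_t cat = ((uint32_t)hi << 16) | lo;

    n &= 15;
    return (uint16_t)((cat << n) >> 16);
}

/* Rotate left falls out when both funnel inputs are the same value. */
static inline uint16_t rotl16(uint16_t x, unsigned n)
{
    return fshl16(x, x, n);
}
```

For example, rotl16(0x1234, 4) yields 0x2341, and a count of 16 wraps to 0 and returns the input unchanged.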

Comments

Richard Henderson Jan. 29, 2022, 9:28 a.m. UTC | #1

Ping?

Patch 1 is now upstream, but only patches 2-4 have reviews.
It applies cleanly to master...


r~

On 12/19/21 06:42, Richard Henderson wrote:
> Add some opcodes for compound logic operations that were so
> far marked as TODO.  Implement those for PPC and S390X.
> 
> We do not want to implement 512-bit width operations, because
> those trigger a cluster clock slowdown on the current set of
> Intel cpus.  But there are new operations in avx512 that apply
> to 128 and 256-bit vectors, which do not trigger the slowdown,
> and those are very interesting.
> 
> 
> r~
> 
> 
> Richard Henderson (20):
>    tcg/optimize: Fix folding of vector ops
>    tcg: Add opcodes for vector nand, nor, eqv
>    tcg/ppc: Implement vector NAND, NOR, EQV
>    tcg/s390x: Implement vector NAND, NOR, EQV
>    tcg/i386: Detect AVX512
>    tcg/i386: Add tcg_out_evex_opc
>    tcg/i386: Use tcg_can_emit_vec_op in expand_vec_cmp_noinv
>    tcg/i386: Implement avx512 variable shifts
>    tcg/i386: Implement avx512 scalar shift
>    tcg/i386: Implement avx512 immediate sari shift
>    tcg/i386: Implement avx512 immediate rotate
>    tcg/i386: Implement avx512 variable rotate
>    tcg/i386: Support avx512vbmi2 vector shift-double instructions
>    tcg/i386: Expand vector word rotate as avx512vbmi2 shift-double
>    tcg/i386: Remove rotls_vec from tcg_target_op_def
>    tcg/i386: Expand scalar rotate with avx512 insns
>    tcg/i386: Implement avx512 min/max/abs
>    tcg/i386: Implement avx512 multiply
>    tcg/i386: Implement more logical operations for avx512
>    tcg/i386: Implement bitsel for avx512
> 
>   include/qemu/cpuid.h          |  20 +-
>   include/tcg/tcg-opc.h         |   3 +
>   include/tcg/tcg.h             |   3 +
>   tcg/aarch64/tcg-target.h      |   3 +
>   tcg/arm/tcg-target.h          |   3 +
>   tcg/i386/tcg-target-con-set.h |   1 +
>   tcg/i386/tcg-target.h         |  17 +-
>   tcg/i386/tcg-target.opc.h     |   3 +
>   tcg/ppc/tcg-target.h          |   3 +
>   tcg/s390x/tcg-target.h        |   3 +
>   tcg/optimize.c                |  61 ++++--
>   tcg/tcg-op-vec.c              |  27 ++-
>   tcg/tcg.c                     |   6 +
>   tcg/i386/tcg-target.c.inc     | 386 ++++++++++++++++++++++++++++------
>   tcg/ppc/tcg-target.c.inc      |  15 ++
>   tcg/s390x/tcg-target.c.inc    |  17 ++
>   16 files changed, 472 insertions(+), 99 deletions(-)
>

Alex Bennée Feb. 3, 2022, 10:25 a.m. UTC | #2

Richard Henderson <richard.henderson@linaro.org> writes:

> Add some opcodes for compound logic operations that were so
> far marked as TODO.  Implement those for PPC and S390X.
>
> We do not want to implement 512-bit width operations, because
> those trigger a cluster clock slowdown on the current set of
> Intel cpus.  But there are new operations in avx512 that apply
> to 128 and 256-bit vectors, which do not trigger the slowdown,
> and those are very interesting.

So with a tweak to the vector test patches I sent yesterday, and running
on hackbox (which has AVX on it), I got coverage in tcg/i386 going from:

            Hit    Total  Coverage
Lines:      839    1768   47.5 %
Functions:  56     81     69.1 %
Branches:   336    864    38.9 %

to:

            Hit    Total  Coverage
Lines:      1077   1668   64.6 %
Functions:  68     77     88.3 %
Branches:   504    852    59.2 %

which I think warrants a:

Tested-by: Alex Bennée <alex.bennee@linaro.org>

for the series.