[27/46] tcg/optimize: Use fold_masks_zs in fold_qemu_ld

Message ID	20241210152401.1823648-28-richard.henderson@linaro.org
State	Superseded
Headers

Delivered-To: patch@linaro.org
Received-SPF: pass (google.com: domain of
 qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as
 permitted sender) client-ip=209.51.188.17;
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Subject: [PATCH 27/46] tcg/optimize: Use fold_masks_zs in fold_qemu_ld
Date: Tue, 10 Dec 2024 09:23:42 -0600
Message-ID: <20241210152401.1823648-28-richard.henderson@linaro.org>
In-Reply-To: <20241210152401.1823648-1-richard.henderson@linaro.org>
References: <20241210152401.1823648-1-richard.henderson@linaro.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Received-SPF: pass client-ip=2607:f8b0:4864:20::234;
 envelope-from=richard.henderson@linaro.org; helo=mail-oi1-x234.google.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
 RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
Precedence: list
Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org
Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org
Series	tcg: Remove in-flight mask data from OptContext

[00/46] tcg: Remove in-flight mask data from OptContext
[01/46] tcg/optimize: Split out finish_bb, finish_ebb
[02/46] tcg/optimize: Copy mask writeback to fold_masks
[03/46] tcg/optimize: Add fold_masks_zsa, fold_masks_zs, fold_masks_z
[04/46] tcg/optimize: Use finish_folding in fold_add, fold_add_vec
[05/46] tcg/optimize: Use finish_folding in fold_addsub2
[06/46] tcg/optimize: Use fold_masks_zsa in fold_and
[07/46] tcg/optimize: Use fold_masks_zsa in fold_andc
[08/46] tcg/optimize: Use fold_masks_zs in fold_bswap
[09/46] tcg/optimize: Use fold_masks_z in fold_count_zeros
[10/46] tcg/optimize: Use fold_masks_z in fold_ctpop
[11/46] tcg/optimize: Use fold_and and fold_masks_z in fold_deposit
[12/46] tcg/optimize: Use finish_folding in fold_divide
[13/46] tcg/optimize: Use finish_folding in fold_dup, fold_dup2
[14/46] tcg/optimize: Use fold_masks_zs in fold_eqv
[15/46] tcg/optimize: Use fold_masks_zsa in fold_extract
[16/46] tcg/optimize: Use finish_folding in fold_extract2
[17/46] tcg/optimize: Use fold_masks_zsa in fold_exts
[18/46] tcg/optimize: Use fold_masks_zsa in fold_extu
[19/46] tcg/optimize: Use fold_masks_zs in fold_movcond
[20/46] tcg/optimize: Use finish_folding in fold_mul*
[21/46] tcg/optimize: Use fold_masks_zs in fold_nand
[22/46] tcg/optimize: Use fold_masks_z in fold_neg_no_const
[23/46] tcg/optimize: Use fold_masks_zs in fold_nor
[24/46] tcg/optimize: Use fold_masks_zs in fold_not
[25/46] tcg/optimize: Use fold_masks_zs in fold_or
[26/46] tcg/optimize: Use fold_masks_zs in fold_orc
[27/46] tcg/optimize: Use fold_masks_zs in fold_qemu_ld
[28/46] tcg/optimize: Return true from fold_qemu_st, fold_tcg_st
[29/46] tcg/optimize: Use finish_folding in fold_remainder
[30/46] tcg/optimize: Distinguish simplification in fold_setcond_zmask
[31/46] tcg/optimize: Use fold_masks_z in fold_setcond
[32/46] tcg/optimize: Use fold_masks_zs in fold_negsetcond
[33/46] tcg/optimize: Use fold_masks_z in fold_setcond2
[34/46] tcg/optimize: Use finish_folding in fold_cmp_vec
[35/46] tcg/optimize: Use finish_folding in fold_cmpsel_vec
[36/46] tcg/optimize: Use fold_masks_zsa in fold_sextract
[37/46] tcg/optimize: Use fold_masks_zs in fold_shift
[38/46] tcg/optimize: Use finish_folding in fold_sub, fold_sub_vec
[39/46] tcg/optimize: Use fold_masks_zs in fold_tcg_ld
[40/46] tcg/optimize: Use finish_folding in fold_tcg_ld_memcopy
[41/46] tcg/optimize: Use fold_masks_zs in fold_xor
[42/46] tcg/optimize: Use finish_folding in fold_bitsel_vec
[43/46] tcg/optimize: Use finish_folding as default in tcg_optimize
[44/46] tcg/optimize: Remove [zsa]_mask from OptContext
[45/46] tcg/optimize: Move fold_bitsel_vec into alphabetic sort
[46/46] tcg/optimize: Move fold_cmp_vec, fold_cmpsel_vec into alphabetic sort

Commit Message

Richard Henderson Dec. 10, 2024, 3:23 p.m. UTC

Be careful not to call fold_masks_zs when the memory operation
is wide enough to require multiple outputs, so split into two
functions: fold_qemu_ld_1reg and fold_qemu_ld_2reg.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

Comments

Pierrick Bouvier Dec. 17, 2024, 8:35 p.m. UTC | #1

On 12/10/24 07:23, Richard Henderson wrote:
> Be careful not to call fold_masks_zs when the memory operation
> is wide enough to require multiple outputs, so split into two
> functions: fold_qemu_ld_1reg and fold_qemu_ld_2reg.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   tcg/optimize.c | 28 ++++++++++++++++++++++------
>   1 file changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 76ad02d73b..6f41ef5adb 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -2092,24 +2092,33 @@ static bool fold_orc(OptContext *ctx, TCGOp *op)
>       return fold_masks_zs(ctx, op, -1, s_mask);
>   }
>   
> -static bool fold_qemu_ld(OptContext *ctx, TCGOp *op)
> +static bool fold_qemu_ld_1reg(OptContext *ctx, TCGOp *op)
>   {
>       const TCGOpDef *def = &tcg_op_defs[op->opc];
>       MemOpIdx oi = op->args[def->nb_oargs + def->nb_iargs];
>       MemOp mop = get_memop(oi);
>       int width = 8 * memop_size(mop);
> +    uint64_t z_mask = -1, s_mask = 0;
>   
>       if (width < 64) {
> -        ctx->s_mask = MAKE_64BIT_MASK(width, 64 - width);
> +        s_mask = MAKE_64BIT_MASK(width, 64 - width);
>           if (!(mop & MO_SIGN)) {
> -            ctx->z_mask = MAKE_64BIT_MASK(0, width);
> -            ctx->s_mask <<= 1;
> +            z_mask = MAKE_64BIT_MASK(0, width);
> +            s_mask <<= 1;
>           }
>       }
>   
>       /* Opcodes that touch guest memory stop the mb optimization.  */
>       ctx->prev_mb = NULL;
> -    return false;
> +
> +    return fold_masks_zs(ctx, op, z_mask, s_mask);
> +}
> +
> +static bool fold_qemu_ld_2reg(OptContext *ctx, TCGOp *op)
> +{
> +    /* Opcodes that touch guest memory stop the mb optimization.  */
> +    ctx->prev_mb = NULL;
> +    return finish_folding(ctx, op);
>   }
>   
>   static bool fold_qemu_st(OptContext *ctx, TCGOp *op)
> @@ -3001,11 +3010,18 @@ void tcg_optimize(TCGContext *s)
>               break;
>           case INDEX_op_qemu_ld_a32_i32:
>           case INDEX_op_qemu_ld_a64_i32:
> +            done = fold_qemu_ld_1reg(&ctx, op);
> +            break;
>           case INDEX_op_qemu_ld_a32_i64:
>           case INDEX_op_qemu_ld_a64_i64:
> +            if (TCG_TARGET_REG_BITS == 64) {
> +                done = fold_qemu_ld_1reg(&ctx, op);
> +                break;
> +            }
> +            QEMU_FALLTHROUGH;
>           case INDEX_op_qemu_ld_a32_i128:
>           case INDEX_op_qemu_ld_a64_i128:
> -            done = fold_qemu_ld(&ctx, op);
> +            done = fold_qemu_ld_2reg(&ctx, op);
>               break;
>           case INDEX_op_qemu_st8_a32_i32:
>           case INDEX_op_qemu_st8_a64_i32:

Couldn't we handle this case in fold_masks instead (at least the 64 bits 
store on 32 bits guest case)?

Richard Henderson Dec. 18, 2024, 3:26 a.m. UTC | #2

On 12/17/24 14:35, Pierrick Bouvier wrote:
>> @@ -3001,11 +3010,18 @@ void tcg_optimize(TCGContext *s)
>>               break;
>>           case INDEX_op_qemu_ld_a32_i32:
>>           case INDEX_op_qemu_ld_a64_i32:
>> +            done = fold_qemu_ld_1reg(&ctx, op);
>> +            break;
>>           case INDEX_op_qemu_ld_a32_i64:
>>           case INDEX_op_qemu_ld_a64_i64:
>> +            if (TCG_TARGET_REG_BITS == 64) {
>> +                done = fold_qemu_ld_1reg(&ctx, op);
>> +                break;
>> +            }
>> +            QEMU_FALLTHROUGH;
>>           case INDEX_op_qemu_ld_a32_i128:
>>           case INDEX_op_qemu_ld_a64_i128:
>> -            done = fold_qemu_ld(&ctx, op);
>> +            done = fold_qemu_ld_2reg(&ctx, op);
>>               break;
>>           case INDEX_op_qemu_st8_a32_i32:
>>           case INDEX_op_qemu_st8_a64_i32:
> 
> Couldn't we handle this case in fold_masks instead (at least the 64 bits store on 32 bits 
> guest case)?

No, not with the assertion that the TCGOp passed to fold_masks have a single output.


r~

Patch

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 76ad02d73b..6f41ef5adb 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -2092,24 +2092,33 @@ static bool fold_orc(OptContext *ctx, TCGOp *op)
     return fold_masks_zs(ctx, op, -1, s_mask);
 }
 
-static bool fold_qemu_ld(OptContext *ctx, TCGOp *op)
+static bool fold_qemu_ld_1reg(OptContext *ctx, TCGOp *op)
 {
     const TCGOpDef *def = &tcg_op_defs[op->opc];
     MemOpIdx oi = op->args[def->nb_oargs + def->nb_iargs];
     MemOp mop = get_memop(oi);
     int width = 8 * memop_size(mop);
+    uint64_t z_mask = -1, s_mask = 0;
 
     if (width < 64) {
-        ctx->s_mask = MAKE_64BIT_MASK(width, 64 - width);
+        s_mask = MAKE_64BIT_MASK(width, 64 - width);
         if (!(mop & MO_SIGN)) {
-            ctx->z_mask = MAKE_64BIT_MASK(0, width);
-            ctx->s_mask <<= 1;
+            z_mask = MAKE_64BIT_MASK(0, width);
+            s_mask <<= 1;
         }
     }
 
     /* Opcodes that touch guest memory stop the mb optimization.  */
     ctx->prev_mb = NULL;
-    return false;
+
+    return fold_masks_zs(ctx, op, z_mask, s_mask);
+}
+
+static bool fold_qemu_ld_2reg(OptContext *ctx, TCGOp *op)
+{
+    /* Opcodes that touch guest memory stop the mb optimization.  */
+    ctx->prev_mb = NULL;
+    return finish_folding(ctx, op);
 }
 
 static bool fold_qemu_st(OptContext *ctx, TCGOp *op)
@@ -3001,11 +3010,18 @@ void tcg_optimize(TCGContext *s)
             break;
         case INDEX_op_qemu_ld_a32_i32:
         case INDEX_op_qemu_ld_a64_i32:
+            done = fold_qemu_ld_1reg(&ctx, op);
+            break;
         case INDEX_op_qemu_ld_a32_i64:
         case INDEX_op_qemu_ld_a64_i64:
+            if (TCG_TARGET_REG_BITS == 64) {
+                done = fold_qemu_ld_1reg(&ctx, op);
+                break;
+            }
+            QEMU_FALLTHROUGH;
         case INDEX_op_qemu_ld_a32_i128:
         case INDEX_op_qemu_ld_a64_i128:
-            done = fold_qemu_ld(&ctx, op);
+            done = fold_qemu_ld_2reg(&ctx, op);
             break;
         case INDEX_op_qemu_st8_a32_i32:
         case INDEX_op_qemu_st8_a64_i32:
