[0/2] tcg: Fix branch/label link during plugin expansion

Message ID 20240910212351.977753-1-richard.henderson@linaro.org

Message

Richard Henderson Sept. 10, 2024, 9:23 p.m. UTC
With tcg_last_op(), we always get the last op of the stream.
With TCGContext.emit_before_op, the most recently emitted op
is no longer the last op.

Instead, pass the op being emitted back from the allocator so
that we can link it to the label without needing to look it up.
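
To make the failure mode concrete, here is a minimal toy model of an op stream with an insertion point. It is a sketch only: ToyOp, ToyStream, toy_emit() and toy_last_op() are hypothetical names invented for this illustration and are not the TCG data structures or functions. It shows that once ops are inserted before an existing op, the tail of the stream is not the op that was just emitted, whereas the pointer returned by the allocator always is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: these types and helpers are hypothetical, not QEMU's. */
typedef struct ToyOp {
    int opc;
    struct ToyOp *prev, *next;
} ToyOp;

typedef struct ToyStream {
    ToyOp *head, *tail;
    ToyOp *emit_before;   /* when non-NULL, new ops go in front of this op */
} ToyStream;

/* Allocate an op, link it into the stream, and hand it back to the caller. */
static ToyOp *toy_emit(ToyStream *s, int opc)
{
    ToyOp *op = calloc(1, sizeof(*op));
    assert(op != NULL);
    op->opc = opc;

    if (s->emit_before) {
        /* Insert in front of the marked op; the tail does not change. */
        ToyOp *at = s->emit_before;
        op->next = at;
        op->prev = at->prev;
        if (at->prev) {
            at->prev->next = op;
        } else {
            s->head = op;
        }
        at->prev = op;
    } else {
        /* Plain append: the new op becomes the tail. */
        op->prev = s->tail;
        if (s->tail) {
            s->tail->next = op;
        } else {
            s->head = op;
        }
        s->tail = op;
    }
    return op;
}

/* The lookup that goes wrong: it always reports the tail of the stream. */
static ToyOp *toy_last_op(const ToyStream *s)
{
    return s->tail;
}
```

In this model, emitting one op, setting emit_before to it, and then emitting a "branch" leaves toy_last_op() pointing at the first op rather than at the branch. Linking a label use through the value returned by toy_emit() avoids the lookup entirely, which is the shape of the change described above.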


r~


Richard Henderson (2):
  tcg: Return TCGOp from tcg_gen_op[1-6]
  tcg: Propagate new TCGOp to add_as_label_use

 tcg/tcg-internal.h | 12 +++----
 tcg/tcg-op.c       | 86 +++++++++++++++++++++++++---------------------
 2 files changed, 53 insertions(+), 45 deletions(-)

Comments

Richard Henderson Sept. 10, 2024, 9:28 p.m. UTC | #1
On 9/10/24 14:23, Richard Henderson wrote:
> With tcg_last_op(), we always get the last op of the stream.
> With TCGContext.emit_before_op, the most recently emitted op
> is no longer the last op.
> 
> Instead, pass the op being emitted back from the allocator so
> that we can link it to the label without needing to look it up.

Oh, I meant to point out from whence this comes.
The plugin uses a conditional

  ld_i32 tmp18,env,$0xffffffffffffdb10
  mul_i32 tmp18,tmp18,$0x18
  ext_i32_i64 tmp17,tmp18
  add_i64 tmp17,tmp17,$0x575410edadc8
  ld_i64 tmp21,tmp17,$0x0
  brcond_i64 tmp21,$0x0,ltu,$L1
  ld_i32 tmp18,env,$0xffffffffffffdb10
  call plugin(0x79a2abfde66a),$0x1,$0,tmp18,$0x0
  set_label $L1

Note that the branch is X < 0 (unsigned), which is always false, and thus the branch is 
optimized away.


r~
Alex Bennée Sept. 13, 2024, 10:23 a.m. UTC | #2
Richard Henderson <richard.henderson@linaro.org> writes:

> On 9/10/24 14:23, Richard Henderson wrote:
>> With tcg_last_op(), we always get the last op of the stream.
>> With TCGContext.emit_before_op, the most recently emitted op
>> is no longer the last op.
>> Instead, pass the op being emitted back from the allocator so
>> that we can link it to the label without needing to look it up.
>
> Oh, I meant to point out from whence this comes.
> The plugin uses a conditional

    size_t n_insns = qemu_plugin_tb_n_insns(tb);
    qemu_plugin_u64 quantum_insn =
        qemu_plugin_scoreboard_u64_in_struct(vcpus, vCPUTime, quantum_insn);
    /* count (and eventually trap) once per tb */
    qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
        tb, QEMU_PLUGIN_INLINE_ADD_U64, quantum_insn, n_insns);

>  ld_i32 tmp18,env,$0xffffffffffffdb10
>  mul_i32 tmp18,tmp18,$0x18
>  ext_i32_i64 tmp17,tmp18
>  add_i64 tmp17,tmp17,$0x575410edadc8

    qemu_plugin_register_vcpu_tb_exec_cond_cb(
        tb, every_quantum_insn,
        QEMU_PLUGIN_CB_NO_REGS, QEMU_PLUGIN_COND_GE,
        quantum_insn, max_insn_per_quantum, NULL);

?

>  ld_i64 tmp21,tmp17,$0x0
>  brcond_i64 tmp21,$0x0,ltu,$L1
>  ld_i32 tmp18,env,$0xffffffffffffdb10
>  call plugin(0x79a2abfde66a),$0x1,$0,tmp18,$0x0
>  set_label $L1
>
> Note that the branch is X < 0 (unsigned), which is always false, and
> thus the branch is optimized away.

I'm obviously missing something reading this. How can TCG know the state
of the scoreboard variables and optimise away the branch?

>
>
> r~
Richard Henderson Sept. 13, 2024, 4:27 p.m. UTC | #3
On 9/13/24 03:23, Alex Bennée wrote:
>> Note that the branch is X < 0 (unsigned), which is always false, and
>> thus the branch is optimized away.
> 
> I'm obviously missing something reading this. How can TCG know the state
> of the scoreboard variables and optimise away the branch?

0 < 0 is of course false.

r~
Pierrick Bouvier Sept. 18, 2024, 6:43 p.m. UTC | #4
On 9/13/24 03:23, Alex Bennée wrote:
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> On 9/10/24 14:23, Richard Henderson wrote:
>>> With tcg_last_op(), we always get the last op of the stream.
>>> With TCGContext.emit_before_op, the most recently emitted op
>>> is no longer the last op.
>>> Instead, pass the op being emitted back from the allocator so
>>> that we can link it to the label without needing to look it up.
>>
>> Oh, I meant to point out from whence this comes.
>> The plugin uses a conditional
> 
>      size_t n_insns = qemu_plugin_tb_n_insns(tb);
>      qemu_plugin_u64 quantum_insn =
>          qemu_plugin_scoreboard_u64_in_struct(vcpus, vCPUTime, quantum_insn);
>      /* count (and eventually trap) once per tb */
>      qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
>          tb, QEMU_PLUGIN_INLINE_ADD_U64, quantum_insn, n_insns);
> 
>>   ld_i32 tmp18,env,$0xffffffffffffdb10
>>   mul_i32 tmp18,tmp18,$0x18
>>   ext_i32_i64 tmp17,tmp18
>>   add_i64 tmp17,tmp17,$0x575410edadc8
> 
>      qemu_plugin_register_vcpu_tb_exec_cond_cb(
>          tb, every_quantum_insn,
>          QEMU_PLUGIN_CB_NO_REGS, QEMU_PLUGIN_COND_GE,
>          quantum_insn, max_insn_per_quantum, NULL);
> 
> ?
> 
>>   ld_i64 tmp21,tmp17,$0x0
>>   brcond_i64 tmp21,$0x0,ltu,$L1
>>   ld_i32 tmp18,env,$0xffffffffffffdb10
>>   call plugin(0x79a2abfde66a),$0x1,$0,tmp18,$0x0
>>   set_label $L1
>>
>> Note that the branch is X < 0 (unsigned), which is always false, and
>> thus the branch is optimized away.
> 
> I'm obviously missing something reading this. How can TCG know the state
> of the scoreboard variables and optimise away the branch?
> 

The constant against which we compare the scoreboard entry value is known at
translation time.
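
As a rough illustration of why no runtime state is needed, the sketch below folds an unsigned compare against a constant operand. It is not the TCG optimizer: ToyCond, ToyFold and toy_fold_brcond() are made-up names, and only the two conditions relevant to this thread are modelled. The point is that with a constant of 0, "x <u 0" is false for every x, so the outcome of the branch is known without knowing x.

```c
#include <assert.h>   /* used only by the checks described in the note below */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; this is not TCG's optimizer. */
typedef enum { TOY_COND_LTU, TOY_COND_GEU } ToyCond;
typedef enum { TOY_NEVER_TAKEN, TOY_ALWAYS_TAKEN, TOY_UNKNOWN } ToyFold;

/*
 * Decide the outcome of "brcond x, c, cond" when x is unknown and c is a
 * constant. Only the c == 0 case can be decided without knowing x.
 */
static ToyFold toy_fold_brcond(ToyCond cond, uint64_t c)
{
    if (c == 0) {
        /* No unsigned x is below 0; every unsigned x is at or above 0. */
        return cond == TOY_COND_LTU ? TOY_NEVER_TAKEN : TOY_ALWAYS_TAKEN;
    }
    return TOY_UNKNOWN;
}

/* The runtime comparison that the fold above replaces. */
static bool toy_ltu(uint64_t x, uint64_t c)
{
    return x < c;
}
```

With these definitions, toy_fold_brcond(TOY_COND_LTU, 0) reports the branch as never taken, matching toy_ltu(x, 0) being false for any x, including UINT64_MAX. A nonzero constant yields TOY_UNKNOWN in this sketch because the result then depends on the loaded value.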

>>
>>
>> r~
>