| Message ID | 20211123205729.2205806-3-alex.bennee@linaro.org |
|---|---|
| State | New |
| Series | more tcg, plugin, test and build fixes |
On 11/23/21 9:57 PM, Alex Bennée wrote:
> Generally when we set cpu->cflags_next_tb it is because we want to
> carefully control the execution of the next TB. Currently there is a
> race that causes cflags_next_tb to get ignored if an IRQ is processed
> before we execute any actual instructions.
>
> To avoid this we introduce a new compiler flag: CF_NOIRQ to suppress
> this check in the generated code so we know we will definitely execute
> the next block.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Cc: Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru>
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/245
> ---
>  include/exec/exec-all.h   |  1 +
>  include/exec/gen-icount.h | 21 +++++++++++++++++----
>  accel/tcg/cpu-exec.c      | 14 ++++++++++++++
>  3 files changed, 32 insertions(+), 4 deletions(-)
>
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 6bb2a0f7ec..35d8e93976 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -503,6 +503,7 @@ struct TranslationBlock {
>  #define CF_USE_ICOUNT    0x00020000
>  #define CF_INVALID       0x00040000 /* TB is stale. Set with @jmp_lock held */
>  #define CF_PARALLEL      0x00080000 /* Generate code for a parallel context */
> +#define CF_NOIRQ         0x00100000 /* Generate an uninterruptible TB */
>  #define CF_CLUSTER_MASK  0xff000000 /* Top 8 bits are cluster ID */
>  #define CF_CLUSTER_SHIFT 24
>
> diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
> index 610cba58fe..c57204ddad 100644
> --- a/include/exec/gen-icount.h
> +++ b/include/exec/gen-icount.h
> @@ -21,7 +21,6 @@ static inline void gen_tb_start(const TranslationBlock *tb)
>  {
>      TCGv_i32 count;
>
> -    tcg_ctx->exitreq_label = gen_new_label();
>      if (tb_cflags(tb) & CF_USE_ICOUNT) {
>          count = tcg_temp_local_new_i32();
>      } else {
> @@ -42,7 +41,19 @@ static inline void gen_tb_start(const TranslationBlock *tb)
>          icount_start_insn = tcg_last_op();
>      }
>
> -    tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, tcg_ctx->exitreq_label);
> +    /*
> +     * Emit the check against icount_decr.u32 to see if we should exit
> +     * unless we suppress the check with CF_NOIRQ. If we are using
> +     * icount and have suppressed interruption the higher level code
> +     * should have ensured we don't run more instructions than the
> +     * budget.
> +     */
> +    if (tb_cflags(tb) & CF_NOIRQ) {
> +        tcg_ctx->exitreq_label = NULL;
> +    } else {
> +        tcg_ctx->exitreq_label = gen_new_label();
> +        tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, tcg_ctx->exitreq_label);
> +    }
>
>      if (tb_cflags(tb) & CF_USE_ICOUNT) {
>          tcg_gen_st16_i32(count, cpu_env,
> @@ -74,8 +85,10 @@ static inline void gen_tb_end(const TranslationBlock *tb, int num_insns)
>                            tcgv_i32_arg(tcg_constant_i32(num_insns)));
>      }
>
> -    gen_set_label(tcg_ctx->exitreq_label);
> -    tcg_gen_exit_tb(tb, TB_EXIT_REQUESTED);
> +    if (tcg_ctx->exitreq_label) {
> +        gen_set_label(tcg_ctx->exitreq_label);
> +        tcg_gen_exit_tb(tb, TB_EXIT_REQUESTED);
> +    }
>  }
>
>  #endif

Split patch here, I think.
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 9cb892e326..9e3ed42ceb 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -721,6 +721,15 @@ static inline bool need_replay_interrupt(int interrupt_request)
>  static inline bool cpu_handle_interrupt(CPUState *cpu,
>                                          TranslationBlock **last_tb)
>  {
> +    /*
> +     * If we have special cflags lets not get distracted with IRQs. We
> +     * shall exit the loop as soon as the next TB completes what it
> +     * needs to do.
> +     */

We will *probably* exit the loop, I think.

With watchpoints, we expect to perform the same memory operation, which
is expected to hit the watchpoint_hit check in cpu_check_watchpoint,
which will raise CPU_INTERRUPT_DEBUG.

With SMC, we flush all TBs from the current page, re-execute one insn,
and then (probably) have to exit to generate the next tb.

With icount, in cpu_loop_exec_tb, we have no idea what's coming; we only
know that we want no more than N insns to execute.

None of which directly exit the loop -- we need the IRQ check at the
beginning of the *next* TB to exit the loop.

If we want to force an exit from the loop, we need CF_NO_GOTO_TB |
CF_NO_GOTO_PTR. Which is probably a good idea, at least for watchpoints;
exit_tb is the fastest way out of the TB to recognize the debug
interrupt.

The icount usage I find a bit odd. I don't think that we should suppress
an IRQ in that case -- we can have up to 510 insns outstanding on
icount_budget, which is clearly far too many to have IRQs disabled. I
think we should not use cflags_next_tb for this at all, but should apply
the icount limit later somehow, because an IRQ *could* be recognized
immediately, with the first TB of the interrupt handler running with
limited budget, and the icount tick being recognized at that point.

> +     * As we don't want this special TB being interrupted by
> +     * some sort of asynchronous event we apply CF_NOIRQ to
> +     * disable the usual event checking.
> +     */
>      cflags = cpu->cflags_next_tb;
>      if (cflags == -1) {
>          cflags = curr_cflags(cpu);
>      } else {
> +        cflags |= CF_NOIRQ;
>          cpu->cflags_next_tb = -1;
>      }

Is it clearer to add NOIRQ here, or back where we set cflags_next_tb in
the first place?

r~
Richard Henderson <richard.henderson@linaro.org> writes:

> On 11/23/21 9:57 PM, Alex Bennée wrote:
>> Generally when we set cpu->cflags_next_tb it is because we want to
>> carefully control the execution of the next TB. Currently there is a
>> race that causes cflags_next_tb to get ignored if an IRQ is processed
>> before we execute any actual instructions.
>> To avoid this we introduce a new compiler flag: CF_NOIRQ to suppress
>> this check in the generated code so we know we will definitely execute
>> the next block.

<snip>

>> -    gen_set_label(tcg_ctx->exitreq_label);
>> -    tcg_gen_exit_tb(tb, TB_EXIT_REQUESTED);
>> +    if (tcg_ctx->exitreq_label) {
>> +        gen_set_label(tcg_ctx->exitreq_label);
>> +        tcg_gen_exit_tb(tb, TB_EXIT_REQUESTED);
>> +    }
>>  }
>>  #endif
>
> Split patch here, I think.

Not including the header to cpu_handle_interrupt?
>> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
>> index 9cb892e326..9e3ed42ceb 100644
>> --- a/accel/tcg/cpu-exec.c
>> +++ b/accel/tcg/cpu-exec.c
>> @@ -721,6 +721,15 @@ static inline bool need_replay_interrupt(int interrupt_request)
>>  static inline bool cpu_handle_interrupt(CPUState *cpu,
>>                                          TranslationBlock **last_tb)
>>  {
>> +    /*
>> +     * If we have special cflags lets not get distracted with IRQs. We
>> +     * shall exit the loop as soon as the next TB completes what it
>> +     * needs to do.
>> +     */
>
> We will *probably* exit the loop, I think.
>
> With watchpoints, we expect to perform the same memory operation,
> which is expected to hit the watchpoint_hit check in
> cpu_check_watchpoint, which will raise CPU_INTERRUPT_DEBUG.
>
> With SMC, we flush all TBs from the current page, re-execute one insn,
> and then (probably) have to exit to generate the next tb.
>
> With icount, in cpu_loop_exec_tb, we have no idea what's coming; we
> only know that we want no more than N insns to execute.

I think technically we do, because all asynchronous interrupts are tied
to the icount (which is part of the budget calculation -
icount_get_limit).

In theory we could drop the interrupt check altogether in icount mode
because we should always end and exit to the outer loop when a timer is
going to expire.

> None of which directly exit the loop -- we need the IRQ check at the
> beginning of the *next* TB to exit the loop.
>
> If we want to force an exit from the loop, we need CF_NO_GOTO_TB |
> CF_NO_GOTO_PTR. Which is probably a good idea, at least for
> watchpoints; exit_tb is the fastest way out of the TB to recognize the
> debug interrupt.
>
> The icount usage I find a bit odd. I don't think that we should
> suppress an IRQ in that case -- we can have up to 510 insns
> outstanding on icount_budget, which is clearly far too many to have
> IRQs disabled. I think we should not use cflags_next_tb for this at
> all, but should apply the icount limit later somehow, because an IRQ
> *could* be recognized immediately, with the first TB of the interrupt
> handler running with limited budget, and the icount tick being
> recognized at that point.

I wonder what would happen if we checked u16.high in icount mode? No
timer should ever set it - although I guess it could get set during an
IO operation.

Perhaps really all icount exit checks should be done at the end of
blocks? I suspect that breaks too many assumptions in the rest of the
code.

>> +     * As we don't want this special TB being interrupted by
>> +     * some sort of asynchronous event we apply CF_NOIRQ to
>> +     * disable the usual event checking.
>> +     */
>>      cflags = cpu->cflags_next_tb;
>>      if (cflags == -1) {
>>          cflags = curr_cflags(cpu);
>>      } else {
>> +        cflags |= CF_NOIRQ;
>>          cpu->cflags_next_tb = -1;
>>      }
>
> Is it clearer to add NOIRQ here, or back where we set cflags_next_tb
> in the first place?

Are there cases of setting cpu->cflags_next_tb which we are happy to get
preempted by asynchronous events? I guess in the SMC case it wouldn't
matter because when we get back from the IRQ things get reset?

> r~
On 11/24/21 11:24 AM, Alex Bennée wrote:
> Richard Henderson <richard.henderson@linaro.org> writes:
>
>> On 11/23/21 9:57 PM, Alex Bennée wrote:
>>> Generally when we set cpu->cflags_next_tb it is because we want to
>>> carefully control the execution of the next TB. Currently there is a
>>> race that causes cflags_next_tb to get ignored if an IRQ is processed
>>> before we execute any actual instructions.
>>> To avoid this we introduce a new compiler flag: CF_NOIRQ to suppress
>>> this check in the generated code so we know we will definitely execute
>>> the next block.

<snip>

>> Split patch here, I think.
>
> Not including the header to cpu_handle_interrupt?

Correct. Introduce CF_NOIRQ without using it yet.

>>> With icount, in cpu_loop_exec_tb, we have no idea what's coming; we
>>> only know that we want no more than N insns to execute.
>
> I think technically we do because all asynchronous interrupts are tied
> to the icount (which is part of the budget calculation -
> icount_get_limit).

Are you sure that's plain icount and not replay?

In icount_get_limit we talk about timers, not any other asynchronous
interrupt, like a keyboard press.

> In theory we could drop the interrupt check altogether in icount mode
> because we should always end and exit to the outer loop when a timer is
> going to expire.

But we know nothing about synchronous exceptions or anything else.

> I wonder what would happen if we checked u16.high in icount mode? No
> timer should ever set it - although I guess it could get set during an
> IO operation.

Uh, we do, via u32? I'm not sure what you're saying here.

> Perhaps really all icount exit checks should be done at the end of
> blocks? I suspect that breaks too many assumptions in the rest of the
> code.

There are multiple exits at the end of the block, which is why we do the
check at the beginning of the next block. Besides, we have to check that
the block we're about to execute is within budget.

> Are there cases of setting cpu->cflags_next_tb which we are happy to get
> preempted by asynchronous events?

Well, icount.

> I guess in the SMC case it wouldn't
> matter because when we get back from the IRQ things get reset?

SMC probably would work with an interrupt, but we'd wind up having to
repeat the process of flushing all TBs on the page, so we might as well
perform the one store and get it over with.

r~
Richard Henderson <richard.henderson@linaro.org> writes:

> On 11/24/21 11:24 AM, Alex Bennée wrote:
>> Richard Henderson <richard.henderson@linaro.org> writes:
>>
>>> On 11/23/21 9:57 PM, Alex Bennée wrote:
>>>> Generally when we set cpu->cflags_next_tb it is because we want to
>>>> carefully control the execution of the next TB. Currently there is a
>>>> race that causes cflags_next_tb to get ignored if an IRQ is processed
>>>> before we execute any actual instructions.
>>>> To avoid this we introduce a new compiler flag: CF_NOIRQ to suppress
>>>> this check in the generated code so we know we will definitely execute
>>>> the next block.

<snip>

>>> Split patch here, I think.
>> Not including the header to cpu_handle_interrupt?
>
> Correct. Introduce CF_NOIRQ without using it yet.
>
>>> With icount, in cpu_loop_exec_tb, we have no idea what's coming; we
>>> only know that we want no more than N insns to execute.
>>
>> I think technically we do because all asynchronous interrupts are
>> tied to the icount (which is part of the budget calculation -
>> icount_get_limit).
>
> Are you sure that's plain icount and not replay?
> In icount_get_limit we talk about timers, not any other asynchronous
> interrupt, like a keyboard press.

Hmm right - and I guess other I/O during record of icount. I guess it's
only fully deterministic (outside of replay) if it's totally "isolated"
from external system events.

>> In theory we could drop the interrupt check altogether in icount mode
>> because we should always end and exit to the outer loop when a timer
>> is going to expire.
>
> But we know nothing about synchronous exceptions or anything else.

Hmm I didn't think we needed to care about synchronous events but now
you have me wondering what happens in icount mode when an exception
happens mid-block?

>> I wonder what would happen if we checked u16.high in icount mode? No
>> timer should ever set it - although I guess it could get set during an
>> IO operation.
>
> Uh, we do, via u32? I'm not sure what you're saying here.

I mean should we detect if something has called cpu_exit() mid block
rather than just icount expiring.

<snip>

>> Are there cases of setting cpu->cflags_next_tb which we are happy to
>> get preempted by asynchronous events?
>
> Well, icount.
>
>> I guess in the SMC case it wouldn't
>> matter because when we get back from the IRQ things get reset?
>
> SMC probably would work with an interrupt, but we'd wind up having to
> repeat the process of flushing all TBs on the page, so we might as
> well perform the one store and get it over with.

I guess. Makes the patch a bit noisier though...
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 6bb2a0f7ec..35d8e93976 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -503,6 +503,7 @@ struct TranslationBlock {
 #define CF_USE_ICOUNT    0x00020000
 #define CF_INVALID       0x00040000 /* TB is stale. Set with @jmp_lock held */
 #define CF_PARALLEL      0x00080000 /* Generate code for a parallel context */
+#define CF_NOIRQ         0x00100000 /* Generate an uninterruptible TB */
 #define CF_CLUSTER_MASK  0xff000000 /* Top 8 bits are cluster ID */
 #define CF_CLUSTER_SHIFT 24
 
diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 610cba58fe..c57204ddad 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -21,7 +21,6 @@ static inline void gen_tb_start(const TranslationBlock *tb)
 {
     TCGv_i32 count;
 
-    tcg_ctx->exitreq_label = gen_new_label();
     if (tb_cflags(tb) & CF_USE_ICOUNT) {
         count = tcg_temp_local_new_i32();
     } else {
@@ -42,7 +41,19 @@ static inline void gen_tb_start(const TranslationBlock *tb)
         icount_start_insn = tcg_last_op();
     }
 
-    tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, tcg_ctx->exitreq_label);
+    /*
+     * Emit the check against icount_decr.u32 to see if we should exit
+     * unless we suppress the check with CF_NOIRQ. If we are using
+     * icount and have suppressed interruption the higher level code
+     * should have ensured we don't run more instructions than the
+     * budget.
+     */
+    if (tb_cflags(tb) & CF_NOIRQ) {
+        tcg_ctx->exitreq_label = NULL;
+    } else {
+        tcg_ctx->exitreq_label = gen_new_label();
+        tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, tcg_ctx->exitreq_label);
+    }
 
     if (tb_cflags(tb) & CF_USE_ICOUNT) {
         tcg_gen_st16_i32(count, cpu_env,
@@ -74,8 +85,10 @@ static inline void gen_tb_end(const TranslationBlock *tb, int num_insns)
                           tcgv_i32_arg(tcg_constant_i32(num_insns)));
     }
 
-    gen_set_label(tcg_ctx->exitreq_label);
-    tcg_gen_exit_tb(tb, TB_EXIT_REQUESTED);
+    if (tcg_ctx->exitreq_label) {
+        gen_set_label(tcg_ctx->exitreq_label);
+        tcg_gen_exit_tb(tb, TB_EXIT_REQUESTED);
+    }
 }
 
 #endif
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 9cb892e326..9e3ed42ceb 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -721,6 +721,15 @@ static inline bool need_replay_interrupt(int interrupt_request)
 static inline bool cpu_handle_interrupt(CPUState *cpu,
                                         TranslationBlock **last_tb)
 {
+    /*
+     * If we have special cflags lets not get distracted with IRQs. We
+     * shall exit the loop as soon as the next TB completes what it
+     * needs to do.
+     */
+    if (cpu->cflags_next_tb != -1) {
+        return false;
+    }
+
     /* Clear the interrupt flag now since we're processing
      * cpu->interrupt_request and cpu->exit_request.
      * Ensure zeroing happens before reading cpu->exit_request or
@@ -954,11 +963,16 @@ int cpu_exec(CPUState *cpu)
          * after-access watchpoints. Since this request should never
          * have CF_INVALID set, -1 is a convenient invalid value that
          * does not require tcg headers for cpu_common_reset.
+         *
+         * As we don't want this special TB being interrupted by
+         * some sort of asynchronous event we apply CF_NOIRQ to
+         * disable the usual event checking.
          */
         cflags = cpu->cflags_next_tb;
         if (cflags == -1) {
             cflags = curr_cflags(cpu);
         } else {
+            cflags |= CF_NOIRQ;
             cpu->cflags_next_tb = -1;
         }
Generally when we set cpu->cflags_next_tb it is because we want to
carefully control the execution of the next TB. Currently there is a
race that causes cflags_next_tb to get ignored if an IRQ is processed
before we execute any actual instructions.

To avoid this we introduce a new compiler flag: CF_NOIRQ to suppress
this check in the generated code so we know we will definitely execute
the next block.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru>
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/245
---
 include/exec/exec-all.h   |  1 +
 include/exec/gen-icount.h | 21 +++++++++++++++++----
 accel/tcg/cpu-exec.c      | 14 ++++++++++++++
 3 files changed, 32 insertions(+), 4 deletions(-)