Message ID | 20241128213843.1023080-1-pierrick.bouvier@linaro.org |
---|---|
State | New |
Series | [v2] plugins: optimize cpu_index code generation |
On 11/28/24 15:38, Pierrick Bouvier wrote:
> When running with a single vcpu, we can return a constant instead of a
> load when accessing cpu_index.
> A side effect is that all tcg operations using it are optimized, most
> notably scoreboard access.
> When running a simple loop in user-mode, the speedup is around 20%.
>
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>
> ---
>
> v2:
> - no need to do a flush, as user-mode already does it when spawning a
>   second cpu (to honor CF_PARALLEL flags).
> - change condition detection to use CF_PARALLEL instead
> ---
>  accel/tcg/plugin-gen.c | 9 +++++++++
>  1 file changed, 9 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~
Hi Richard,

On 11/29/24 05:29, Richard Henderson wrote:
> On 11/28/24 15:38, Pierrick Bouvier wrote:
>> When running with a single vcpu, we can return a constant instead of a
>> load when accessing cpu_index.
>> A side effect is that all tcg operations using it are optimized, most
>> notably scoreboard access.
>> When running a simple loop in user-mode, the speedup is around 20%.
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>
>> ---
>>
>> v2:
>> - no need to do a flush, as user-mode already does it when spawning a
>>   second cpu (to honor CF_PARALLEL flags).
>> - change condition detection to use CF_PARALLEL instead
>> ---
>>  accel/tcg/plugin-gen.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>
> r~

any chance you could pull that as part of one of your series?

Thanks,
Pierrick
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index 0f47bfbb489..961d9305913 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -102,6 +102,15 @@ static void gen_disable_mem_helper(void)
 
 static TCGv_i32 gen_cpu_index(void)
 {
+    /*
+     * Optimize when we run with a single vcpu. All values using cpu_index,
+     * including scoreboard index, will be optimized out.
+     * User-mode calls tb_flush when setting this flag. In system-mode, all
+     * vcpus are created before generating code.
+     */
+    if (!tcg_cflags_has(current_cpu, CF_PARALLEL)) {
+        return tcg_constant_i32(current_cpu->cpu_index);
+    }
     TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();
     tcg_gen_ld_i32(cpu_index, tcg_env,
                    -offsetof(ArchCPU, env) + offsetof(CPUState, cpu_index));
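
To illustrate why returning a constant helps everything downstream, here is a minimal sketch in plain C. It is a toy model, not QEMU's TCG: `ToyValue`, `ToyGen`, `toy_gen_cpu_index` and `toy_gen_scoreboard_offset` are hypothetical names, and the "multiply then add" shape of the scoreboard offset is an assumption made only for the example. The point is that once the index is known at translation time, the dependent arithmetic can be folded and no run-time operation needs to be emitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy IR value (not a TCG type): either known while generating
 * code, or only available at run time. */
typedef struct {
    int is_const;       /* 1 if the value is known at translation time */
    uint32_t value;     /* meaningful only when is_const is set */
} ToyValue;

/* Hypothetical generator state. */
typedef struct {
    int parallel;       /* stands in for the CF_PARALLEL condition */
    uint32_t cpu_index; /* index of the vcpu we are translating for */
    int ops_emitted;    /* run-time operations generated so far */
} ToyGen;

/* Same shape as the patched function: a constant when only one vcpu can
 * run, otherwise a load emitted into the generated code. */
static ToyValue toy_gen_cpu_index(ToyGen *g)
{
    ToyValue v = { 0, 0 };

    if (!g->parallel) {
        v.is_const = 1;
        v.value = g->cpu_index;
        return v;
    }
    g->ops_emitted++;   /* the load of cpu_index */
    return v;
}

/* Assumed layout for the example: offset = cpu_index * entry_size + field. */
static ToyValue toy_gen_scoreboard_offset(ToyGen *g, uint32_t entry_size,
                                          uint32_t field_offset)
{
    ToyValue idx = toy_gen_cpu_index(g);
    ToyValue out = { 0, 0 };

    assert(entry_size > 0);
    if (idx.is_const) {
        /* Folded while generating code: nothing is emitted. */
        out.is_const = 1;
        out.value = idx.value * entry_size + field_offset;
        return out;
    }
    g->ops_emitted += 2; /* multiply and add performed at run time */
    return out;
}
```

In this model, a generator with `parallel == 0` produces a constant offset and emits zero operations, while one with `parallel == 1` emits three (load, multiply, add) for every access.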
When running with a single vcpu, we can return a constant instead of a
load when accessing cpu_index.
A side effect is that all tcg operations using it are optimized, most
notably scoreboard access.
When running a simple loop in user-mode, the speedup is around 20%.

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>

---

v2:
- no need to do a flush, as user-mode already does it when spawning a
  second cpu (to honor CF_PARALLEL flags).
- change condition detection to use CF_PARALLEL instead
---
 accel/tcg/plugin-gen.c | 9 +++++++++
 1 file changed, 9 insertions(+)
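
The v2 notes depend on an ordering argument: code that baked in a constant index must be discarded before a second vcpu can execute it. The sketch below models that in plain C under stated assumptions. `ToyProcess`, `ToyBlock`, `toy_translate`, `toy_spawn_cpu` and `toy_count_stale` are hypothetical names, and the fixed-size cache is a simplification, not QEMU's translation-block cache. It only shows that if spawning the second cpu both sets the parallel flag and flushes, no separate flush is needed in the plugin code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHE_SIZE 4

/* Hypothetical translated block: remembers whether an index was baked in. */
typedef struct {
    int valid;
    int baked_cpu_index; /* -1 when the block loads the index at run time */
} ToyBlock;

/* Hypothetical process state. */
typedef struct {
    int parallel;        /* stands in for the CF_PARALLEL condition */
    int n_cpus;
    ToyBlock cache[TOY_CACHE_SIZE];
} ToyProcess;

/* Drop every translated block. */
static void toy_flush(ToyProcess *p)
{
    for (size_t i = 0; i < TOY_CACHE_SIZE; i++) {
        p->cache[i].valid = 0;
    }
}

/* Translate into a cache slot, baking the index only when not parallel. */
static void toy_translate(ToyProcess *p, size_t slot, int cpu_index)
{
    assert(slot < TOY_CACHE_SIZE);
    p->cache[slot].valid = 1;
    p->cache[slot].baked_cpu_index = p->parallel ? -1 : cpu_index;
}

/* Spawning a cpu: the first transition to parallel also flushes, so blocks
 * generated under the single-vcpu assumption cannot survive. */
static void toy_spawn_cpu(ToyProcess *p)
{
    p->n_cpus++;
    if (!p->parallel) {
        p->parallel = 1;
        toy_flush(p);
    }
}

/* Count blocks that would be wrong: still valid, index baked, now parallel. */
static int toy_count_stale(const ToyProcess *p)
{
    int stale = 0;

    for (size_t i = 0; i < TOY_CACHE_SIZE; i++) {
        if (p->parallel && p->cache[i].valid &&
            p->cache[i].baked_cpu_index != -1) {
            stale++;
        }
    }
    return stale;
}
```

With this ordering, any block translated before the spawn is invalidated by the flush, and any block translated afterwards sees the parallel flag and uses the run-time load, so `toy_count_stale` stays at zero.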