[PULL,00/46] tcg patch queue

Message ID	20210205225650.1330794-1-richard.henderson@linaro.org
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org
Subject: [PULL 00/46] tcg patch queue
Date: Fri, 5 Feb 2021 12:56:04 -1000

Richard Henderson Feb. 5, 2021, 10:56 p.m. UTC

The following changes since commit d0dddab40e472ba62b5f43f11cc7dba085dabe71:

  Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging (2021-02-05 15:27:02 +0000)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20210205

for you to fetch changes up to fb6916dd6ca8bb4b42d44baba9c67ecaf2279577:

  accel: introduce AccelCPUClass extending CPUClass (2021-02-05 10:24:15 -1000)

----------------------------------------------------------------
TCGCPUOps cleanups (claudio)
tcg/s390 compare fix (phil)
tcg/aarch64 rotli_vec fix
tcg/tci cleanups and fixes

----------------------------------------------------------------
Claudio Fontana (13):
      target/riscv: remove CONFIG_TCG, as it is always TCG
      accel/tcg: split TCG-only code from cpu_exec_realizefn
      target/arm: do not use cc->do_interrupt for KVM directly
      cpu: move cc->do_interrupt to tcg_ops
      cpu: move cc->transaction_failed to tcg_ops
      cpu: move do_unaligned_access to tcg_ops
      physmem: make watchpoint checking code TCG-only
      cpu: move adjust_watchpoint_address to tcg_ops
      cpu: move debug_check_watchpoint to tcg_ops
      cpu: tcg_ops: move to tcg-cpu-ops.h, keep a pointer in CPUClass
      accel: extend AccelState and AccelClass to user-mode
      accel: replace struct CpusAccel with AccelOpsClass
      accel: introduce AccelCPUClass extending CPUClass

Eduardo Habkost (5):
      cpu: Introduce TCGCpuOperations struct
      cpu: Move synchronize_from_tb() to tcg_ops
      cpu: Move cpu_exec_* to tcg_ops
      cpu: Move tlb_fill to tcg_ops
      cpu: Move debug_excp_handler to tcg_ops

Philippe Mathieu-Daudé (2):
      tcg/s390: Fix compare instruction from extended-immediate facility
      exec/cpu-defs: Remove TCG backends dependency

Richard Henderson (24):
      tcg/aarch64: Do not convert TCGArg to temps that are not temps
      configure: Fix --enable-tcg-interpreter
      tcg/tci: Make tci_tb_ptr thread-local
      tcg/tci: Inline tci_write_reg32s into the only caller
      tcg/tci: Inline tci_write_reg8 into its callers
      tcg/tci: Inline tci_write_reg16 into the only caller
      tcg/tci: Inline tci_write_reg32 into all callers
      tcg/tci: Inline tci_write_reg64 into 64-bit callers
      tcg/tci: Merge INDEX_op_ld8u_{i32,i64}
      tcg/tci: Merge INDEX_op_ld8s_{i32,i64}
      tcg/tci: Merge INDEX_op_ld16u_{i32,i64}
      tcg/tci: Merge INDEX_op_ld16s_{i32,i64}
      tcg/tci: Merge INDEX_op_{ld_i32,ld32u_i64}
      tcg/tci: Merge INDEX_op_st8_{i32,i64}
      tcg/tci: Merge INDEX_op_st16_{i32,i64}
      tcg/tci: Move stack bounds check to compile-time
      tcg/tci: Merge INDEX_op_{st_i32,st32_i64}
      tcg/tci: Use g_assert_not_reached
      tcg/tci: Remove dead code for TCG_TARGET_HAS_div2_*
      tcg/tci: Implement 64-bit division
      tcg/tci: Remove TODO as unused
      tcg/tci: Restrict TCG_TARGET_NB_REGS to 16
      tcg/tci: Fix TCG_REG_R4 misusage
      tcg/tci: Remove TCG_CONST

Stefan Weil (2):
      tcg/tci: Implement INDEX_op_ld16s_i32
      tcg/tci: Implement INDEX_op_ld8s_i64

 configure                                          |   5 +-
 accel/accel-softmmu.h                              |  15 +
 accel/kvm/kvm-cpus.h                               |   2 -
 .../{tcg-cpus-icount.h => tcg-accel-ops-icount.h}  |   2 +
 accel/tcg/tcg-accel-ops-mttcg.h                    |  19 +
 accel/tcg/{tcg-cpus-rr.h => tcg-accel-ops-rr.h}    |   0
 accel/tcg/{tcg-cpus.h => tcg-accel-ops.h}          |   6 +-
 include/exec/cpu-all.h                             |  11 +-
 include/exec/cpu-defs.h                            |   3 -
 include/exec/exec-all.h                            |   2 +-
 include/hw/boards.h                                |   2 +-
 include/hw/core/accel-cpu.h                        |  38 ++
 include/hw/core/cpu.h                              |  86 +---
 include/hw/core/tcg-cpu-ops.h                      |  97 +++++
 include/{sysemu => qemu}/accel.h                   |  16 +-
 include/sysemu/accel-ops.h                         |  45 ++
 include/sysemu/cpus.h                              |  26 +-
 include/sysemu/hvf.h                               |   2 +-
 include/sysemu/kvm.h                               |   2 +-
 include/sysemu/kvm_int.h                           |   2 +-
 target/arm/internals.h                             |   6 +
 target/i386/hax/{hax-cpus.h => hax-accel-ops.h}    |   2 -
 target/i386/hax/hax-windows.h                      |   2 +-
 target/i386/hvf/{hvf-cpus.h => hvf-accel-ops.h}    |   2 -
 target/i386/hvf/hvf-i386.h                         |   2 +-
 target/i386/whpx/{whpx-cpus.h => whpx-accel-ops.h} |   2 -
 tcg/tci/tcg-target-con-set.h                       |   6 +-
 tcg/tci/tcg-target.h                               |  37 +-
 accel/accel-common.c                               | 105 +++++
 accel/{accel.c => accel-softmmu.c}                 |  61 ++-
 accel/accel-user.c                                 |  24 ++
 accel/kvm/{kvm-cpus.c => kvm-accel-ops.c}          |  28 +-
 accel/kvm/kvm-all.c                                |   2 -
 accel/qtest/qtest.c                                |  25 +-
 accel/tcg/cpu-exec.c                               |  53 ++-
 accel/tcg/cputlb.c                                 |  34 +-
 .../{tcg-cpus-icount.c => tcg-accel-ops-icount.c}  |  21 +-
 .../{tcg-cpus-mttcg.c => tcg-accel-ops-mttcg.c}    |  14 +-
 accel/tcg/{tcg-cpus-rr.c => tcg-accel-ops-rr.c}    |  13 +-
 accel/tcg/{tcg-cpus.c => tcg-accel-ops.c}          |  47 +-
 accel/tcg/tcg-all.c                                |  19 +-
 accel/tcg/user-exec.c                              |   8 +-
 accel/xen/xen-all.c                                |  26 +-
 bsd-user/main.c                                    |  11 +-
 cpu.c                                              |  66 +--
 hw/core/cpu.c                                      |  21 +-
 hw/mips/jazz.c                                     |  12 +-
 linux-user/main.c                                  |   7 +-
 softmmu/cpus.c                                     |  12 +-
 softmmu/memory.c                                   |   2 +-
 softmmu/physmem.c                                  | 149 ++++---
 softmmu/qtest.c                                    |   2 +-
 softmmu/vl.c                                       |   9 +-
 target/alpha/cpu.c                                 |  21 +-
 target/arm/cpu.c                                   |  45 +-
 target/arm/cpu64.c                                 |   4 +-
 target/arm/cpu_tcg.c                               |  32 +-
 target/arm/helper.c                                |   4 +
 target/arm/kvm64.c                                 |   6 +-
 target/avr/cpu.c                                   |  19 +-
 target/avr/helper.c                                |   5 +-
 target/cris/cpu.c                                  |  43 +-
 target/cris/helper.c                               |   5 +-
 target/hppa/cpu.c                                  |  24 +-
 target/i386/hax/{hax-cpus.c => hax-accel-ops.c}    |  33 +-
 target/i386/hax/hax-all.c                          |   7 +-
 target/i386/hax/hax-mem.c                          |   2 +-
 target/i386/hax/hax-posix.c                        |   2 +-
 target/i386/hax/hax-windows.c                      |   2 +-
 target/i386/hvf/{hvf-cpus.c => hvf-accel-ops.c}    |  29 +-
 target/i386/hvf/hvf.c                              |   5 +-
 target/i386/hvf/x86_task.c                         |   2 +-
 target/i386/hvf/x86hvf.c                           |   2 +-
 target/i386/tcg/tcg-cpu.c                          |  26 +-
 target/i386/whpx/{whpx-cpus.c => whpx-accel-ops.c} |  33 +-
 target/i386/whpx/whpx-all.c                        |   9 +-
 target/lm32/cpu.c                                  |  19 +-
 target/m68k/cpu.c                                  |  19 +-
 target/microblaze/cpu.c                            |  25 +-
 target/mips/cpu.c                                  |  35 +-
 target/moxie/cpu.c                                 |  15 +-
 target/nios2/cpu.c                                 |  18 +-
 target/openrisc/cpu.c                              |  17 +-
 target/riscv/cpu.c                                 |  26 +-
 target/riscv/cpu_helper.c                          |   2 +-
 target/rx/cpu.c                                    |  20 +-
 target/s390x/cpu.c                                 |  33 +-
 target/s390x/excp_helper.c                         |   2 +-
 target/sh4/cpu.c                                   |  21 +-
 target/sparc/cpu.c                                 |  25 +-
 target/tilegx/cpu.c                                |  17 +-
 target/tricore/cpu.c                               |  12 +-
 target/unicore32/cpu.c                             |  17 +-
 target/xtensa/cpu.c                                |  23 +-
 target/xtensa/helper.c                             |   4 +-
 tcg/tcg-common.c                                   |   4 -
 tcg/tci.c                                          | 479 ++++++++-------------
 target/ppc/translate_init.c.inc                    |  39 +-
 tcg/aarch64/tcg-target.c.inc                       |   7 +-
 tcg/s390/tcg-target.c.inc                          |   2 +-
 tcg/tci/tcg-target.c.inc                           | 149 ++-----
 MAINTAINERS                                        |   7 +-
 accel/kvm/meson.build                              |   2 +-
 accel/meson.build                                  |   4 +-
 accel/tcg/meson.build                              |  10 +-
 target/i386/hax/meson.build                        |   2 +-
 target/i386/hvf/meson.build                        |   2 +-
 target/i386/whpx/meson.build                       |   2 +-
 108 files changed, 1565 insertions(+), 1065 deletions(-)
 create mode 100644 accel/accel-softmmu.h
 rename accel/tcg/{tcg-cpus-icount.h => tcg-accel-ops-icount.h} (88%)
 create mode 100644 accel/tcg/tcg-accel-ops-mttcg.h
 rename accel/tcg/{tcg-cpus-rr.h => tcg-accel-ops-rr.h} (100%)
 rename accel/tcg/{tcg-cpus.h => tcg-accel-ops.h} (72%)
 create mode 100644 include/hw/core/accel-cpu.h
 create mode 100644 include/hw/core/tcg-cpu-ops.h
 rename include/{sysemu => qemu}/accel.h (94%)
 create mode 100644 include/sysemu/accel-ops.h
 rename target/i386/hax/{hax-cpus.h => hax-accel-ops.h} (95%)
 rename target/i386/hvf/{hvf-cpus.h => hvf-accel-ops.h} (94%)
 rename target/i386/whpx/{whpx-cpus.h => whpx-accel-ops.h} (96%)
 create mode 100644 accel/accel-common.c
 rename accel/{accel.c => accel-softmmu.c} (64%)
 create mode 100644 accel/accel-user.c
 rename accel/kvm/{kvm-cpus.c => kvm-accel-ops.c} (72%)
 rename accel/tcg/{tcg-cpus-icount.c => tcg-accel-ops-icount.c} (89%)
 rename accel/tcg/{tcg-cpus-mttcg.c => tcg-accel-ops-mttcg.c} (92%)
 rename accel/tcg/{tcg-cpus-rr.c => tcg-accel-ops-rr.c} (97%)
 rename accel/tcg/{tcg-cpus.c => tcg-accel-ops.c} (63%)
 rename target/i386/hax/{hax-cpus.c => hax-accel-ops.c} (69%)
 rename target/i386/hvf/{hvf-cpus.c => hvf-accel-ops.c} (84%)
 rename target/i386/whpx/{whpx-cpus.c => whpx-accel-ops.c} (71%)

Peter Maydell Feb. 6, 2021, 2:28 p.m. UTC | #1

On Fri, 5 Feb 2021 at 22:56, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The following changes since commit d0dddab40e472ba62b5f43f11cc7dba085dabe71:
>
>   Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging (2021-02-05 15:27:02 +0000)
>
> are available in the Git repository at:
>
>   https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20210205
>
> for you to fetch changes up to fb6916dd6ca8bb4b42d44baba9c67ecaf2279577:
>
>   accel: introduce AccelCPUClass extending CPUClass (2021-02-05 10:24:15 -1000)
>
> ----------------------------------------------------------------
> TCGCPUOps cleanups (claudio)
> tcg/s390 compare fix (phil)
> tcg/aarch64 rotli_vec fix
> tcg/tci cleanups and fixes
>
> ----------------------------------------------------------------

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.0
for any user-visible changes.

-- PMM

Philippe Mathieu-Daudé Feb. 6, 2021, 7:14 p.m. UTC | #2

On 2/6/21 3:28 PM, Peter Maydell wrote:
> On Fri, 5 Feb 2021 at 22:56, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> The following changes since commit d0dddab40e472ba62b5f43f11cc7dba085dabe71:
>>
>>   Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging (2021-02-05 15:27:02 +0000)
>>
>> are available in the Git repository at:
>>
>>   https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20210205
>>
>> for you to fetch changes up to fb6916dd6ca8bb4b42d44baba9c67ecaf2279577:
>>
>>   accel: introduce AccelCPUClass extending CPUClass (2021-02-05 10:24:15 -1000)
>>
>> ----------------------------------------------------------------
>> TCGCPUOps cleanups (claudio)
>> tcg/s390 compare fix (phil)
>> tcg/aarch64 rotli_vec fix
>> tcg/tci cleanups and fixes
>>
>> ----------------------------------------------------------------
>
> Applied, thanks.
>
> Please update the changelog at https://wiki.qemu.org/ChangeLog/6.0
> for any user-visible changes.

FYI I couldn't do an incremental build in my TCI-configured directory,
but it works again after blowing away the whole directory. Not a
big deal, as there are mostly 3 TCI users and 1.5 testers.

[I had scheduled testing of this series for the weekend, as I had no time
during the week. Otherwise I would have reported this issue earlier.]

Richard Henderson Feb. 7, 2021, 3:45 a.m. UTC | #3

On 2/6/21 11:38 AM, Stefan Weil wrote:
> I am still searching for what caused this deterioration. My first suspect was thread
> local storage, but that wasn't it. Do you have any idea?

No, but since it's 1/3 of a complete patch set, I don't care to investigate the
intermediate result either.


r~

Stefan Weil Feb. 7, 2021, 10:50 a.m. UTC | #4

On 07.02.21 at 04:45, Richard Henderson wrote:

> On 2/6/21 11:38 AM, Stefan Weil wrote:
>> I am still searching for what caused this deterioration. My first suspect was thread
>> local storage, but that wasn't it. Do you have any idea?
> No, but since it's 1/3 of a complete patch set, I don't care to investigate the
> intermediate result either.

Your latest code from the rth7680/tci-next branch is twice as fast as my 
code with BIOS boot and qemu-x86_64 on sparc64. That's great.

With that code I don't get any BIOS output at all when running 
qemu-i386. That's not so good.

Did I test the correct branch? If yes, I could try the same test on 
amd64 and arm64 hosts.

Stefan

Richard Henderson Feb. 7, 2021, 6:37 p.m. UTC | #5

On 2/7/21 2:50 AM, Stefan Weil wrote:
> Your latest code from the rth7680/tci-next branch is twice as fast as my code
> with BIOS boot and qemu-x86_64 on sparc64. That's great.
>
> With that code I don't get any BIOS output at all when running qemu-i386.
> That's not so good.
>
> Did I test the correct branch? If yes, I could try the same test on amd64 and
> arm64 hosts.

Yes, tci-next is the correct branch.  I've just rebased it against master,
which includes the first 30-odd patches.

What host do you not see bios output from qemu-system-i386 (I assume that's a
typo above)?  I see correct output on x86_64, sparc64, ppc64le, and aarch64 hosts.

r~

Stefan Weil Feb. 7, 2021, 10 p.m. UTC | #6

On 07.02.21 19:37, Richard Henderson wrote:
> On 2/7/21 2:50 AM, Stefan Weil wrote:
>> Your latest code from the rth7680/tci-next branch is twice as fast as my code
>> with BIOS boot and qemu-x86_64 on sparc64. That's great.
>>
>> With that code I don't get any BIOS output at all when running qemu-i386.
>> That's not so good.
>>
>> Did I test the correct branch? If yes, I could try the same test on amd64 and
>> arm64 hosts.
>
> Yes, tci-next is the correct branch.  I've just rebased it against master,
> which includes the first 30-odd patches.
>
> What host do you not see bios output from qemu-system-i386 (I assume that's a
> typo above)?  I see correct output on x86_64, sparc64, ppc64le, and aarch64 hosts.

Right, the TCI test was done with qemu-system-i386 of course.

I repeated the TCI test with qemu-system-i386 and qemu-system-x86_64 and
the rebased branch.

The system emulation for a BIOS boot works on Apple M1 arm64 with less
than 5 s user time (about as fast as before the latest TCI changes):

./qemu-system-i386 --nographic
  4,28s user 0,03s system 37% cpu 11,398 total
./qemu-system-x86_64 --nographic
  4,39s user 0,03s system 34% cpu 12,982 total

The same test shows similar timings on an AMD64 server:

./qemu-system-i386 --nographic
 user 0m4,958s before tci-next, 0m5,115s after tci-next

./qemu-system-x86_64 --nographic
 user 0m4,967s before tci-next, 0m5,263s after tci-next

Here tci-next is slightly slower than the old code.
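
As a rough check of "slightly slower", the AMD64 user times quoted above can be turned into a relative slowdown. The following is only a sketch: the helper names `parse_time` and `slowdown_percent` are made up for this illustration, and it assumes the values use the `<minutes>m<seconds>s` layout with a decimal comma, as printed above.

```python
import re


def parse_time(text):
    """Convert a value such as '0m4,958s' to seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)m(\d+)[,.](\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes, whole, frac = match.groups()
    return int(minutes) * 60 + float(f"{whole}.{frac}")


def slowdown_percent(before, after):
    """Relative change of `after` against `before`, in percent."""
    return (after - before) / before * 100.0


# User times copied from the AMD64 measurements in this message.
timings = {
    "qemu-system-i386": ("0m4,958s", "0m5,115s"),
    "qemu-system-x86_64": ("0m4,967s", "0m5,263s"),
}

for binary, (before, after) in timings.items():
    pct = slowdown_percent(parse_time(before), parse_time(after))
    print(f"{binary}: {pct:+.1f}% user time after tci-next")
```

With these inputs the difference comes out at a few percent for both binaries. Each figure is a single run, so this says nothing about measurement noise.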

The results on sparc64 did not change with the rebased tci-next:
qemu-system-i386 still fails to run, and qemu-system-x86_64 takes about
20 s user time.

Stefan
