[v6,00/50] tcg tb_lock removal

Message ID	20171016172609.23422-1-richard.henderson@linaro.org
Headers	show Delivered-To: patch@linaro.org Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; From: Richard Henderson <richard.henderson@linaro.org> To: qemu-devel@nongnu.org Date: Mon, 16 Oct 2017 10:25:19 -0700 Message-Id: <20171016172609.23422-1-richard.henderson@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Subject: [Qemu-devel] [PATCH v6 00/50] tcg tb_lock removal Precedence: list Cc: cota@braap.org Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patch=linaro.org@nongnu.org>
Series	tcg tb_lock removal \| expand [v6,00/50] tcg tb_lock removal [v6,01/50] tcg: Merge opcode arguments into TCGOp [v6,02/50] tcg: Propagate args to op->args in optimizer [v6,03/50] tcg: Propagate args to op->args in tcg.c [v6,04/50] tcg: Propagate TCGOp down to allocators [v6,05/50] tcg: Introduce arg_temp [v6,06/50] tcg: Add temp_global bit to TCGTemp [v6,07/50] tcg: Return NULL temp for TCG_CALL_DUMMY_ARG [v6,08/50] tcg: Introduce temp_arg [v6,09/50] tcg: Use per-temp state data in liveness [v6,10/50] tcg: Avoid loops against variable bounds [v6,11/50] tcg: Change temp_allocate_frame arg to TCGTemp [v6,12/50] tcg: Remove unused TCG_CALL_DUMMY_TCGV [v6,13/50] tcg: Export temp_idx [v6,14/50] tcg: Use per-temp state data in optimize [v6,15/50] tcg: Push tcg_ctx into generator functions [v6,16/50] tcg: Push tcg_ctx into tcg_gen_callN [v6,17/50] tcg: Introduce index_arg [v6,18/50] tcg: Reserve temporary index 0 [v6,19/50] target/alpha: Avoid translate_init unless tcg_enabled [v6,20/50] qom: Introduce CPUClass.tcg_initialize [v6,21/50] tcg: Use pointers in TCGOp->args [v6,22/50] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK [v6,23/50] hack dump tb->flags and tb->cflags [v6,24/50] tcg: Add CPUState step_next_tb [v6,25/50] tcg: Include CF_COUNT_MASK in CF_HASH_MASK [v6,26/50] tcg: convert tb->cflags reads to tb_cflags(tb) [v6,27/50] target/arm: check CF_PARALLEL instead of parallel_cpus [v6,28/50] target/hppa: check CF_PARALLEL instead of parallel_cpus [v6,29/50] target/i386: check CF_PARALLEL instead of parallel_cpus [v6,30/50] target/m68k: check CF_PARALLEL instead of parallel_cpus [v6,31/50] target/s390x: check CF_PARALLEL instead of parallel_cpus [v6,32/50] target/sh4: check CF_PARALLEL instead of parallel_cpus [v6,33/50] target/sparc: check CF_PARALLEL instead of parallel_cpus [v6,34/50] tcg: check CF_PARALLEL instead of parallel_cpus [v6,35/50] cpu-exec: lookup/generate TB outside exclusive region during step_atomic [v6,36/50] tcg: Add CF_LAST_IO + CF_USE_ICOUNT to CF_HASH_MASK [v6,37/50] tcg: Remove CF_IGNORE_ICOUNT [v6,38/50] translate-all: use a binary search tree to track TBs in TBContext [v6,39/50] exec-all: rename tb_free to tb_remove [v6,40/50] translate-all: report correct avg host TB size [v6,41/50] tcg: take tb_ctx out of TCGContext [v6,42/50] tcg: define tcg_init_ctx and make tcg_ctx a pointer [v6,43/50] gen-icount: fold exitreq_label into TCGContext [v6,44/50] tcg: introduce **tcg_ctxs to keep track of all TCGContext's [v6,45/50] tcg: distribute profiling counters across TCGContext's [v6,46/50] tcg: allocate optimizer temps with tcg_malloc [v6,47/50] osdep: introduce qemu_mprotect_rwx/none [v6,48/50] translate-all: use qemu_protect_rwx/none helpers [v6,49/50] tcg: introduce regions to split code_gen_buffer [v6,50/50] tcg: enable multiple TCG contexts in softmmu

Richard Henderson Oct. 16, 2017, 5:25 p.m. UTC

I've fixed two bugs within v5 of Emilio's patch set:

 - The step_next_tb patch fixes the "rep movsb" bug that appeared
   when we included CF_COUNT_MASK into CF_HASH_MASK.  We had been
   relying on magic to single-step the next guest insn.

 - The original "allocate optimizer temps with tcg_malloc" patch
   failed testing on arm32 host.  I didn't really look into exactly
   what was wrong because I had an older patch set that touched the
   same portion of the optimizer.

   Thus, an extra 20 patches in the patch set rearranging how temps
   are referenced within the TCG backend.  Some of them have appeared
   on the list before, but it would have been last November.


r~


Emilio G. Cota (24):
  tcg: define CF_PARALLEL and use it for TB hashing along with
    CF_COUNT_MASK
  tcg: convert tb->cflags reads to tb_cflags(tb)
  target/arm: check CF_PARALLEL instead of parallel_cpus
  target/hppa: check CF_PARALLEL instead of parallel_cpus
  target/i386: check CF_PARALLEL instead of parallel_cpus
  target/m68k: check CF_PARALLEL instead of parallel_cpus
  target/s390x: check CF_PARALLEL instead of parallel_cpus
  target/sh4: check CF_PARALLEL instead of parallel_cpus
  target/sparc: check CF_PARALLEL instead of parallel_cpus
  tcg: check CF_PARALLEL instead of parallel_cpus
  cpu-exec: lookup/generate TB outside exclusive region during
    step_atomic
  translate-all: use a binary search tree to track TBs in TBContext
  exec-all: rename tb_free to tb_remove
  translate-all: report correct avg host TB size
  tcg: take tb_ctx out of TCGContext
  tcg: define tcg_init_ctx and make tcg_ctx a pointer
  gen-icount: fold exitreq_label into TCGContext
  tcg: introduce **tcg_ctxs to keep track of all TCGContext's
  tcg: distribute profiling counters across TCGContext's
  tcg: allocate optimizer temps with tcg_malloc
  osdep: introduce qemu_mprotect_rwx/none
  translate-all: use qemu_protect_rwx/none helpers
  tcg: introduce regions to split code_gen_buffer
  tcg: enable multiple TCG contexts in softmmu

Richard Henderson (26):
  tcg: Merge opcode arguments into TCGOp
  tcg: Propagate args to op->args in optimizer
  tcg: Propagate args to op->args in tcg.c
  tcg: Propagate TCGOp down to allocators
  tcg: Introduce arg_temp
  tcg: Add temp_global bit to TCGTemp
  tcg: Return NULL temp for TCG_CALL_DUMMY_ARG
  tcg: Introduce temp_arg
  tcg: Use per-temp state data in liveness
  tcg: Avoid loops against variable bounds
  tcg: Change temp_allocate_frame arg to TCGTemp
  tcg: Remove unused TCG_CALL_DUMMY_TCGV
  tcg: Export temp_idx
  tcg: Use per-temp state data in optimize
  tcg: Push tcg_ctx into generator functions
  tcg: Push tcg_ctx into tcg_gen_callN
  tcg: Introduce index_arg
  tcg: Reserve temporary index 0
  target/alpha: Avoid translate_init unless tcg_enabled
  qom: Introduce CPUClass.tcg_initialize
  tcg: Use pointers in TCGOp->args
  hack dump tb->flags and tb->cflags
  tcg: Add CPUState step_next_tb
  tcg: Include CF_COUNT_MASK in CF_HASH_MASK
  tcg: Add CF_LAST_IO + CF_USE_ICOUNT to CF_HASH_MASK
  tcg: Remove CF_IGNORE_ICOUNT

 include/exec/exec-all.h       |   41 +-
 include/exec/gen-icount.h     |   25 +-
 include/exec/helper-gen.h     |   12 +-
 include/exec/tb-context.h     |    6 +-
 include/exec/tb-hash-xx.h     |    9 +-
 include/exec/tb-hash.h        |    4 +-
 include/exec/tb-lookup.h      |    6 +-
 include/qemu/osdep.h          |    2 +
 include/qom/cpu.h             |    9 +-
 target/arm/helper-a64.h       |    4 +
 target/hppa/helper.h          |    2 +
 target/m68k/helper.h          |    1 +
 target/s390x/helper.h         |    4 +
 target/sparc/cpu.h            |    2 +-
 tcg/tcg-op.h                  |  132 +++---
 tcg/tcg.h                     |  184 +++++---
 accel/tcg/cpu-exec.c          |  102 ++--
 accel/tcg/tcg-runtime.c       |    4 +-
 accel/tcg/translate-all.c     |  511 ++++++++++----------
 accel/tcg/translator.c        |    4 +-
 bsd-user/main.c               |    3 +-
 cpus.c                        |   14 +
 exec.c                        |   13 +-
 linux-user/main.c             |    9 +-
 linux-user/syscall.c          |    1 +
 target/alpha/cpu.c            |    3 +-
 target/alpha/translate.c      |   12 +-
 target/arm/cpu.c              |    6 +-
 target/arm/helper-a64.c       |   38 +-
 target/arm/op_helper.c        |    7 -
 target/arm/translate-a64.c    |   38 +-
 target/arm/translate.c        |   17 +-
 target/cris/cpu.c             |   16 +-
 target/cris/translate.c       |    8 +-
 target/cris/translate_v10.c   |    2 +-
 target/hppa/cpu.c             |    3 +-
 target/hppa/op_helper.c       |   32 +-
 target/hppa/translate.c       |   22 +-
 target/i386/cpu.c             |    5 +-
 target/i386/translate.c       |   60 ++-
 target/lm32/cpu.c             |    7 +-
 target/lm32/translate.c       |   16 +-
 target/m68k/cpu.c             |    7 +-
 target/m68k/op_helper.c       |   33 +-
 target/m68k/translate.c       |   20 +-
 target/microblaze/cpu.c       |    7 +-
 target/microblaze/translate.c |    8 +-
 target/mips/cpu.c             |    5 +-
 target/mips/translate.c       |   35 +-
 target/moxie/cpu.c            |    7 +-
 target/moxie/translate.c      |   10 +-
 target/nios2/cpu.c            |    7 +-
 target/nios2/translate.c      |    6 +-
 target/openrisc/cpu.c         |    7 +-
 target/openrisc/translate.c   |    8 +-
 target/ppc/translate.c        |   14 +-
 target/ppc/translate_init.c   |   37 +-
 target/s390x/cpu.c            |    7 +-
 target/s390x/mem_helper.c     |   80 +++-
 target/s390x/translate.c      |   36 +-
 target/sh4/cpu.c              |    5 +-
 target/sh4/translate.c        |   17 +-
 target/sparc/cpu.c            |    5 +-
 target/sparc/translate.c      |   19 +-
 target/tilegx/cpu.c           |    7 +-
 target/tilegx/translate.c     |    4 +-
 target/tricore/cpu.c          |    5 +-
 target/tricore/translate.c    |    9 +-
 target/unicore32/cpu.c        |    7 +-
 target/unicore32/translate.c  |    8 +-
 target/xtensa/cpu.c           |    7 +-
 target/xtensa/translate.c     |   30 +-
 tcg/optimize.c                |  659 ++++++++++++++------------
 tcg/tcg-op.c                  |  186 ++++----
 tcg/tcg.c                     | 1030 ++++++++++++++++++++++++++++-------------
 tests/qht-bench.c             |    2 +-
 util/osdep.c                  |   41 ++
 77 files changed, 2194 insertions(+), 1577 deletions(-)

-- 
2.13.6

no-reply@patchew.org Oct. 16, 2017, 6:45 p.m. UTC | #1

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20171016172609.23422-1-richard.henderson@linaro.org
Subject: [Qemu-devel] [PATCH v6 00/50] tcg tb_lock removal

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]               patchew/1508176489-9002-1-git-send-email-sundeep.lkml@gmail.com -> patchew/1508176489-9002-1-git-send-email-sundeep.lkml@gmail.com
Switched to a new branch 'test'
a7a80a1761 tcg: enable multiple TCG contexts in softmmu
45b2d9096d tcg: introduce regions to split code_gen_buffer
cc99683553 translate-all: use qemu_protect_rwx/none helpers
98da101651 osdep: introduce qemu_mprotect_rwx/none
15f78c7e81 tcg: allocate optimizer temps with tcg_malloc
eef9b06948 tcg: distribute profiling counters across TCGContext's
0407851577 tcg: introduce **tcg_ctxs to keep track of all TCGContext's
e77580ec19 gen-icount: fold exitreq_label into TCGContext
a42271eb2a tcg: define tcg_init_ctx and make tcg_ctx a pointer
e90bedf9c9 tcg: take tb_ctx out of TCGContext
aa32e035a4 translate-all: report correct avg host TB size
75b3259505 exec-all: rename tb_free to tb_remove
146f8837fa translate-all: use a binary search tree to track TBs in TBContext
aa80b9aab0 tcg: Remove CF_IGNORE_ICOUNT
a666027577 tcg: Add CF_LAST_IO + CF_USE_ICOUNT to CF_HASH_MASK
ff25a4de38 cpu-exec: lookup/generate TB outside exclusive region during step_atomic
fabf792f62 tcg: check CF_PARALLEL instead of parallel_cpus
a420a291f1 target/sparc: check CF_PARALLEL instead of parallel_cpus
590acd9365 target/sh4: check CF_PARALLEL instead of parallel_cpus
65f026c3d9 target/s390x: check CF_PARALLEL instead of parallel_cpus
04ed7cef2f target/m68k: check CF_PARALLEL instead of parallel_cpus
bc5868d322 target/i386: check CF_PARALLEL instead of parallel_cpus
80c379d85a target/hppa: check CF_PARALLEL instead of parallel_cpus
b81fc08b40 target/arm: check CF_PARALLEL instead of parallel_cpus
8a82d2d8c5 tcg: convert tb->cflags reads to tb_cflags(tb)
e72fce1f10 tcg: Include CF_COUNT_MASK in CF_HASH_MASK
4b93cb71c0 tcg: Add CPUState step_next_tb
f8ad2e06db hack dump tb->flags and tb->cflags
fb77430355 tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK
4015ca53aa tcg: Use pointers in TCGOp->args
fb2e994a9e qom: Introduce CPUClass.tcg_initialize
d38b346bc8 target/alpha: Avoid translate_init unless tcg_enabled
4dc76582d0 tcg: Reserve temporary index 0
0b0397a863 tcg: Introduce index_arg
5aedea1076 tcg: Push tcg_ctx into tcg_gen_callN
c70aed7e09 tcg: Push tcg_ctx into generator functions
e086b91868 tcg: Use per-temp state data in optimize
180e36d249 tcg: Export temp_idx
0367cc88b5 tcg: Remove unused TCG_CALL_DUMMY_TCGV
a84f9e98c9 tcg: Change temp_allocate_frame arg to TCGTemp
ff6a503bf4 tcg: Avoid loops against variable bounds
66ce744e70 tcg: Use per-temp state data in liveness
6fded83e78 tcg: Introduce temp_arg
6ca9011a8f tcg: Return NULL temp for TCG_CALL_DUMMY_ARG
1a61746e2c tcg: Add temp_global bit to TCGTemp
838e08d353 tcg: Introduce arg_temp
7458bf1177 tcg: Propagate TCGOp down to allocators
41bd25c6cf tcg: Propagate args to op->args in tcg.c
9622f2d4f7 tcg: Propagate args to op->args in optimizer
bb243a992e tcg: Merge opcode arguments into TCGOp

=== OUTPUT BEGIN ===
Checking PATCH 1/50: tcg: Merge opcode arguments into TCGOp...
ERROR: spaces prohibited around that ':' (ctx:WxW)
#481: FILE: tcg/tcg.h:613:
+    unsigned calli  : 4;        /* 12 */
                     ^

ERROR: spaces prohibited around that ':' (ctx:WxW)
#482: FILE: tcg/tcg.h:614:
+    unsigned callo  : 2;        /* 14 */
                     ^

ERROR: space prohibited before that ':' (ctx:WxW)
#483: FILE: tcg/tcg.h:615:
+    unsigned        : 2;        /* 16 */
                     ^

ERROR: spaces prohibited around that ':' (ctx:WxW)
#488: FILE: tcg/tcg.h:618:
+    unsigned prev   : 16;       /* 32 */
                     ^

ERROR: spaces prohibited around that ':' (ctx:WxW)
#489: FILE: tcg/tcg.h:619:
+    unsigned next   : 16;       /* 48 */
                     ^

total: 5 errors, 0 warnings, 485 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/50: tcg: Propagate args to op->args in optimizer...
ERROR: spaces required around that '-' (ctx:VxV)
#648: FILE: tcg/optimize.c:1165:
+                tcg_opt_gen_mov(s, op, op->args[0], op->args[4-tmp]);
                                                               ^

total: 1 errors, 0 warnings, 912 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 3/50: tcg: Propagate args to op->args in tcg.c...
Checking PATCH 4/50: tcg: Propagate TCGOp down to allocators...
Checking PATCH 5/50: tcg: Introduce arg_temp...
Checking PATCH 6/50: tcg: Add temp_global bit to TCGTemp...
Checking PATCH 7/50: tcg: Return NULL temp for TCG_CALL_DUMMY_ARG...
Checking PATCH 8/50: tcg: Introduce temp_arg...
Checking PATCH 9/50: tcg: Use per-temp state data in liveness...
WARNING: line over 80 characters
#186: FILE: tcg/tcg.c:1815:
+            } else if (arg_temp(op->args[0])->state == TS_DEAD && have_opc_new2) {

total: 0 errors, 1 warnings, 441 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 10/50: tcg: Avoid loops against variable bounds...
Checking PATCH 11/50: tcg: Change temp_allocate_frame arg to TCGTemp...
Checking PATCH 12/50: tcg: Remove unused TCG_CALL_DUMMY_TCGV...
Checking PATCH 13/50: tcg: Export temp_idx...
Checking PATCH 14/50: tcg: Use per-temp state data in optimize...
Checking PATCH 15/50: tcg: Push tcg_ctx into generator functions...
Checking PATCH 16/50: tcg: Push tcg_ctx into tcg_gen_callN...
Checking PATCH 17/50: tcg: Introduce index_arg...
Checking PATCH 18/50: tcg: Reserve temporary index 0...
Checking PATCH 19/50: target/alpha: Avoid translate_init unless tcg_enabled...
Checking PATCH 20/50: qom: Introduce CPUClass.tcg_initialize...
Checking PATCH 21/50: tcg: Use pointers in TCGOp->args...
Checking PATCH 22/50: tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK...
Checking PATCH 23/50: hack dump tb->flags and tb->cflags...
Checking PATCH 24/50: tcg: Add CPUState step_next_tb...
Checking PATCH 25/50: tcg: Include CF_COUNT_MASK in CF_HASH_MASK...
Checking PATCH 26/50: tcg: convert tb->cflags reads to tb_cflags(tb)...
ERROR: return is not a function, parentheses are not required
#102: FILE: target/alpha/translate.c:458:
+    return ((tb_cflags(ctx->base.tb) & CF_LAST_IO)

WARNING: line over 80 characters
#218: FILE: target/hppa/translate.c:472:
+    if ((tb_cflags(ctx->base.tb) & CF_LAST_IO) || ctx->base.singlestep_enabled) {

total: 1 errors, 1 warnings, 924 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 27/50: target/arm: check CF_PARALLEL instead of parallel_cpus...
Checking PATCH 28/50: target/hppa: check CF_PARALLEL instead of parallel_cpus...
Checking PATCH 29/50: target/i386: check CF_PARALLEL instead of parallel_cpus...
WARNING: line over 80 characters
#22: FILE: target/i386/translate.c:5268:
+            if ((s->prefix & PREFIX_LOCK) && (tb_cflags(s->base.tb) & CF_PARALLEL)) {

WARNING: line over 80 characters
#31: FILE: target/i386/translate.c:5279:
+            if ((s->prefix & PREFIX_LOCK) && (tb_cflags(s->base.tb) & CF_PARALLEL)) {

total: 0 errors, 2 warnings, 16 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 30/50: target/m68k: check CF_PARALLEL instead of parallel_cpus...
Checking PATCH 31/50: target/s390x: check CF_PARALLEL instead of parallel_cpus...
Checking PATCH 32/50: target/sh4: check CF_PARALLEL instead of parallel_cpus...
Checking PATCH 33/50: target/sparc: check CF_PARALLEL instead of parallel_cpus...
Checking PATCH 34/50: tcg: check CF_PARALLEL instead of parallel_cpus...
Checking PATCH 35/50: cpu-exec: lookup/generate TB outside exclusive region during step_atomic...
Checking PATCH 36/50: tcg: Add CF_LAST_IO + CF_USE_ICOUNT to CF_HASH_MASK...
Checking PATCH 37/50: tcg: Remove CF_IGNORE_ICOUNT...
Checking PATCH 38/50: translate-all: use a binary search tree to track TBs in TBContext...
Checking PATCH 39/50: exec-all: rename tb_free to tb_remove...
Checking PATCH 40/50: translate-all: report correct avg host TB size...
Checking PATCH 41/50: tcg: take tb_ctx out of TCGContext...
Checking PATCH 42/50: tcg: define tcg_init_ctx and make tcg_ctx a pointer...
Checking PATCH 43/50: gen-icount: fold exitreq_label into TCGContext...
Checking PATCH 44/50: tcg: introduce **tcg_ctxs to keep track of all TCGContext's...
Checking PATCH 45/50: tcg: distribute profiling counters across TCGContext's...
Checking PATCH 46/50: tcg: allocate optimizer temps with tcg_malloc...
Checking PATCH 47/50: osdep: introduce qemu_mprotect_rwx/none...
Checking PATCH 48/50: translate-all: use qemu_protect_rwx/none helpers...
Checking PATCH 49/50: tcg: introduce regions to split code_gen_buffer...
Checking PATCH 50/50: tcg: enable multiple TCG contexts in softmmu...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

Emilio Cota Oct. 18, 2017, 4:04 a.m. UTC | #2

On Mon, Oct 16, 2017 at 10:25:19 -0700, Richard Henderson wrote:
>    Thus, an extra 20 patches in the patch set rearranging how temps

>    are referenced within the TCG backend.  Some of them have appeared

>    on the list before, but it would have been last November.


I remember seeing these on the list before -- in June'17:
  https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg04596.html

FWIW with these 20 patches we no longer get a code reduction:

   text    data     bss     dec     hex filename
6850654 2144032 4420896 13415582         ccb49e alpha-softmmu/qemu-system-alpha (before)
6874086 2144032 4437248 13455366         cd5006 alpha-softmmu/qemu-system-alpha (after)

but we've got to keep in mind that the patches aren't exactly the same.

		E.

Emilio Cota Oct. 18, 2017, 10:45 p.m. UTC | #3

On Mon, Oct 16, 2017 at 10:25:19 -0700, Richard Henderson wrote:
> I've fixed two bugs within v5 of Emilio's patch set:

> 

>  - The step_next_tb patch fixes the "rep movsb" bug that appeared

>    when we included CF_COUNT_MASK into CF_HASH_MASK.  We had been

>    relying on magic to single-step the next guest insn.

> 

>  - The original "allocate optimizer temps with tcg_malloc" patch

>    failed testing on arm32 host.  I didn't really look into exactly

>    what was wrong because I had an older patch set that touched the

>    same portion of the optimizer.

Thanks a lot for fixing these issues and respinning the series.

I have just pushed a branch on top of this series that includes
10 patches that further pave the way for the removal of tb_lock:

  https://github.com/cota/qemu/tree/multi-tcg-v6-plus

These patches are a subset of the ones that I posted on the
tb_lock removal patchset [1]. In particular, these patches are
groundwork that doesn't change anything fundamental wrt locking,
which does get tricky.

Given how close we are to the soft freeze for 2.11 [2], do you want
me to post these patches on the list for review? Otherwise I can wait
for the 2.12 dev cycle to post them with the complete tb_lock removal
work.

That said, I think we should at least cherry-pick "translate-all: exit
from tb_phys_invalidate if qht_remove fails" for 2.11, since it
fixes a real bug. Stable should also get it.

Thanks,

		Emilio

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg01199.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg02217.html

Paolo Bonzini Oct. 19, 2017, 1:05 p.m. UTC | #4

On 19/10/2017 00:45, Emilio G. Cota wrote:
> On Mon, Oct 16, 2017 at 10:25:19 -0700, Richard Henderson wrote:

>> I've fixed two bugs within v5 of Emilio's patch set:

>>

>>  - The step_next_tb patch fixes the "rep movsb" bug that appeared

>>    when we included CF_COUNT_MASK into CF_HASH_MASK.  We had been

>>    relying on magic to single-step the next guest insn.

>>

>>  - The original "allocate optimizer temps with tcg_malloc" patch

>>    failed testing on arm32 host.  I didn't really look into exactly

>>    what was wrong because I had an older patch set that touched the

>>    same portion of the optimizer.

> 

> Thanks a lot for fixing these issues and respinning the series.

> 

> I have just pushed a branch on top of this series that includes

> 10 patches that further pave the way for the removal of tb_lock:

> 

>   https://github.com/cota/qemu/tree/multi-tcg-v6-plus


I started reviewing those, I have a few questions:

1) why is tcg_region_tree separate from tcg_region_state?  Would it make
sense to prepare a linked list of tcg_region_state structs, and reuse
the region lock for the region tree?

2) in tb_for_each_tagged_safe, could the "prev" argument instead be
"next", like


+    for (n = (head) & 1,                                        \
+             tb = (TranslationBlock *)((head) & ~1);            \
+         tb && ((next = (TranslationBlock *)tb->field[n]), 1);  \
+             n = (uintptr_t)next & 1,                           \
+             tb = (TranslationBlock *)((uintptr_t)next & ~1))

(also please make the iterator macros UPPERCASE)

3) "translate-all: exit from tb_phys_invalidate if qht_remove fails" may
be worth posting now?

Paolo

> These patches are a subset of the ones that I posted on the

> tb_lock removal patchset [1]. In particular, these patches are

> groundwork that doesn't change anything fundamental wrt locking,

> which does get tricky.

> 

> Given how close we are to the soft freeze for 2.11 [2], do you want

> me to post these patches on the list for review? Otherwise I can wait

> for the 2.12 dev cycle to post them with the complete tb_lock removal

> work.

> 

> That said, I think we should at least cherry-pick "translate-all: exit

> from tb_phys_invalidate if qht_remove fails" for 2.11, since it

> fixes a real bug. Stable should also get it.

> 

> Thanks,

> 

> 		Emilio

> 

> [1] https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg01199.html

> [2] https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg02217.html

> 

>

Emilio Cota Oct. 19, 2017, 8:11 p.m. UTC | #5

On Thu, Oct 19, 2017 at 15:05:17 +0200, Paolo Bonzini wrote:
> On 19/10/2017 00:45, Emilio G. Cota wrote:

> > I have just pushed a branch on top of this series that includes

> > 10 patches that further pave the way for the removal of tb_lock:

> > 

> >   https://github.com/cota/qemu/tree/multi-tcg-v6-plus

> 

> I started reviewing those,

Nice, thanks!

> I have a few questions:

> 

> 1) why is tcg_region_tree separate from tcg_region_state?  Would it make

> sense to prepare a linked list of tcg_region_state structs, and reuse

> the region lock for the region tree?

I think the naming here might be confusing; "tcg_region_state" should be
understood as "tcg_region_global_state". IOW, there is no per-region struct.

That said, the array of per-region trees could be embedded in this global
struct. I was hesitant to do so because then one could think that
region_state.lock and rt.lock are somehow related; they are not.

> 2) in tb_for_each_tagged_safe, could the "prev" argument instead be

> "next", like

> 

> 

> +    for (n = (head) & 1,                                        \

> +             tb = (TranslationBlock *)((head) & ~1);            \

> +         tb && ((next = (TranslationBlock *)tb->field[n]), 1);  \

> +             n = (uintptr_t)next & 1,                           \

> +             tb = (TranslationBlock *)((uintptr_t)next & ~1))

Is this just to make them closer to the macros in queue.h?

In this case tracking *prev in the loop (rather than next) is
useful because it makes removing the "current" element very simple:

static inline void tb_page_remove(PageDesc *pd, TranslationBlock *tb)
{
    TranslationBlock *tb1;
    uintptr_t *prev;
    unsigned int n1;

    page_for_each_tb_safe(pd, tb1, n1, prev) {
        if (tb1 == tb) {
            *prev = tb1->page_next[n1];
            return;
        }
    }
    g_assert_not_reached();
}

If we wanted to use something similar to QSLIST_REMOVE_AFTER, we'd
have to track three pointers instead of two: prev (tracked by the caller),
current and next (these two as part of the for loop).

> (also please make the iterator macros UPPERCASE)

Will do.

> 3) "translate-all: exit from tb_phys_invalidate if qht_remove fails" may

> be worth posting now?

I'll post it to be included in the next iteration of this series.

Thanks,

		Emilio

Paolo Bonzini Oct. 20, 2017, 7:10 a.m. UTC | #6

On 19/10/2017 22:11, Emilio G. Cota wrote:
> On Thu, Oct 19, 2017 at 15:05:17 +0200, Paolo Bonzini wrote:

>> On 19/10/2017 00:45, Emilio G. Cota wrote:

>>> I have just pushed a branch on top of this series that includes

>>> 10 patches that further pave the way for the removal of tb_lock:

>>>

>>>   https://github.com/cota/qemu/tree/multi-tcg-v6-plus

>>

>> I started reviewing those,

> 

> Nice, thanks!

> 

>> I have a few questions:

>>

>> 1) why is tcg_region_tree separate from tcg_region_state?  Would it make

>> sense to prepare a linked list of tcg_region_state structs, and reuse

>> the region lock for the region tree?

> 

> I think the naming here might be confusing; "tcg_region_state" should be

> understood as "tcg_region_global_state". IOW, there is no per-region struct.

> 

> That said, the array of per-region trees could be embedded in this global

> struct. I was hesitant to do so because then one could think that

> region_state.lock and rt.lock are somehow related; they are not.


Ok, this is clearer now.

>> 2) in tb_for_each_tagged_safe, could the "prev" argument instead be

>> "next", like

> 

> Is this just to make them closer to the macros in queue.h?

> 

> In this case tracking *prev in the loop (rather than next) is

> useful because it makes removing the "current" element very simple:


This actually makes a lot of sense.  Maybe we should change queue.h the
other way. ;)

Can you rename "prev" to "pprev" though?

Paolo

Emilio Cota Oct. 21, 2017, 2:34 a.m. UTC | #7

On Fri, Oct 20, 2017 at 09:10:38 +0200, Paolo Bonzini wrote:
> >> 2) in tb_for_each_tagged_safe, could the "prev" argument instead be

> >> "next", like

> > 

> > Is this just to make them closer to the macros in queue.h?

> > 

> > In this case tracking *prev in the loop (rather than next) is

> > useful because it makes removing the "current" element very simple:

> 

> This actually makes a lot of sense.  Maybe we should change queue.h the

> other way. ;)

Turns out this works here but it isn't as general-purpose as it
might look. In this case it works because we don't free the tb. If
we did, then we'd either need a branch in the iterator or a third
*next pointer.

In fact, the macro wouldn't be safe even if it tracked *next, since
on a removal *pprev must not be updated to the removed item.
IOW, the caller must be the one keeping track of *pprev, for otherwise
things break after the first removal. (Again, this doesn't affect this
particular instance, because its only caller stops iterating after
a removal.)

I have therefore given up on the macro and changed its only caller to
update *pprev. IMO it's less pretty, but more robust.

The updated branch with the changes you suggested (plus the above) is
available at:
  https://github.com/cota/qemu/tree/multi-tcg-v6-plus2

Thanks,

		Emilio

Emilio Cota Oct. 26, 2017, 1:47 a.m. UTC | #8

I have given a respin to the tb_lock branch on top of master.
Turns out the freezes I was getting were due to the magic we used
to rely on (e.g. for icount). Works now!

The branch is here:
  https://github.com/cota/qemu/tree/tb-lock

Boot-tested on ppc64, aarch64 and alpha smp guests (<=64) with MTTCG.

A couple of questions:

- Does anyone remember what work remains to be done to safely enable
  MTTCG for i386? I just forced it on: without the tb lock removal
  I booted an 8-core guest, but without tb_lock the guest kernel
  dies pretty quickly. So I guess there's still work to do.

- In user-mode without tb_lock, should I worry about fork happening
  while any of the newly-introduced locks are held? For instance,
  tb->jmp_lock (introduced in "translate-all: protect TB jumps with
  a per-destination-TB lock"), or tcg_region_tree's lock.

Thanks,

		Emilio

[v6,00/50] tcg tb_lock removal

Message

Comments