Message ID | 20171016172609.23422-1-richard.henderson@linaro.org |
---|---|
Headers | show |
Series | tcg tb_lock removal | expand |
Hi, This series seems to have some coding style problems. See output below for more information: Type: series Message-id: 20171016172609.23422-1-richard.henderson@linaro.org Subject: [Qemu-devel] [PATCH v6 00/50] tcg tb_lock removal === TEST SCRIPT BEGIN === #!/bin/bash BASE=base n=1 total=$(git log --oneline $BASE.. | wc -l) failed=0 git config --local diff.renamelimit 0 git config --local diff.renames True commits="$(git log --format=%H --reverse $BASE..)" for c in $commits; do echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..." if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then failed=1 echo fi n=$((n+1)) done exit $failed === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 From https://github.com/patchew-project/qemu * [new tag] patchew/1508176489-9002-1-git-send-email-sundeep.lkml@gmail.com -> patchew/1508176489-9002-1-git-send-email-sundeep.lkml@gmail.com Switched to a new branch 'test' a7a80a1761 tcg: enable multiple TCG contexts in softmmu 45b2d9096d tcg: introduce regions to split code_gen_buffer cc99683553 translate-all: use qemu_protect_rwx/none helpers 98da101651 osdep: introduce qemu_mprotect_rwx/none 15f78c7e81 tcg: allocate optimizer temps with tcg_malloc eef9b06948 tcg: distribute profiling counters across TCGContext's 0407851577 tcg: introduce **tcg_ctxs to keep track of all TCGContext's e77580ec19 gen-icount: fold exitreq_label into TCGContext a42271eb2a tcg: define tcg_init_ctx and make tcg_ctx a pointer e90bedf9c9 tcg: take tb_ctx out of TCGContext aa32e035a4 translate-all: report correct avg host TB size 75b3259505 exec-all: rename tb_free to tb_remove 146f8837fa translate-all: use a binary search tree to track TBs in TBContext aa80b9aab0 tcg: Remove CF_IGNORE_ICOUNT a666027577 tcg: Add CF_LAST_IO + CF_USE_ICOUNT to CF_HASH_MASK ff25a4de38 cpu-exec: lookup/generate TB outside exclusive region during step_atomic fabf792f62 tcg: check CF_PARALLEL instead of parallel_cpus a420a291f1 target/sparc: check CF_PARALLEL instead of parallel_cpus 590acd9365 target/sh4: check CF_PARALLEL instead of parallel_cpus 65f026c3d9 target/s390x: check CF_PARALLEL instead of parallel_cpus 04ed7cef2f target/m68k: check CF_PARALLEL instead of parallel_cpus bc5868d322 target/i386: check CF_PARALLEL instead of parallel_cpus 80c379d85a target/hppa: check CF_PARALLEL instead of parallel_cpus b81fc08b40 target/arm: check CF_PARALLEL instead of parallel_cpus 8a82d2d8c5 tcg: convert tb->cflags reads to tb_cflags(tb) e72fce1f10 tcg: Include CF_COUNT_MASK in CF_HASH_MASK 4b93cb71c0 tcg: Add CPUState step_next_tb f8ad2e06db hack dump tb->flags and tb->cflags fb77430355 tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK 4015ca53aa tcg: Use pointers in TCGOp->args fb2e994a9e qom: Introduce CPUClass.tcg_initialize d38b346bc8 target/alpha: Avoid translate_init unless tcg_enabled 4dc76582d0 tcg: Reserve temporary index 0 0b0397a863 tcg: Introduce index_arg 5aedea1076 tcg: Push tcg_ctx into tcg_gen_callN c70aed7e09 tcg: Push tcg_ctx into generator functions e086b91868 tcg: Use per-temp state data in optimize 180e36d249 tcg: Export temp_idx 0367cc88b5 tcg: Remove unused TCG_CALL_DUMMY_TCGV a84f9e98c9 tcg: Change temp_allocate_frame arg to TCGTemp ff6a503bf4 tcg: Avoid loops against variable bounds 66ce744e70 tcg: Use per-temp state data in liveness 6fded83e78 tcg: Introduce temp_arg 6ca9011a8f tcg: Return NULL temp for TCG_CALL_DUMMY_ARG 1a61746e2c tcg: Add temp_global bit to TCGTemp 838e08d353 tcg: Introduce arg_temp 7458bf1177 tcg: Propagate TCGOp down to allocators 41bd25c6cf tcg: Propagate args to op->args in tcg.c 9622f2d4f7 tcg: Propagate args to op->args in optimizer bb243a992e tcg: Merge opcode arguments into TCGOp === OUTPUT BEGIN === Checking PATCH 1/50: tcg: Merge opcode arguments into TCGOp... ERROR: spaces prohibited around that ':' (ctx:WxW) #481: FILE: tcg/tcg.h:613: + unsigned calli : 4; /* 12 */ ^ ERROR: spaces prohibited around that ':' (ctx:WxW) #482: FILE: tcg/tcg.h:614: + unsigned callo : 2; /* 14 */ ^ ERROR: space prohibited before that ':' (ctx:WxW) #483: FILE: tcg/tcg.h:615: + unsigned : 2; /* 16 */ ^ ERROR: spaces prohibited around that ':' (ctx:WxW) #488: FILE: tcg/tcg.h:618: + unsigned prev : 16; /* 32 */ ^ ERROR: spaces prohibited around that ':' (ctx:WxW) #489: FILE: tcg/tcg.h:619: + unsigned next : 16; /* 48 */ ^ total: 5 errors, 0 warnings, 485 lines checked Your patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. Checking PATCH 2/50: tcg: Propagate args to op->args in optimizer... ERROR: spaces required around that '-' (ctx:VxV) #648: FILE: tcg/optimize.c:1165: + tcg_opt_gen_mov(s, op, op->args[0], op->args[4-tmp]); ^ total: 1 errors, 0 warnings, 912 lines checked Your patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. Checking PATCH 3/50: tcg: Propagate args to op->args in tcg.c... Checking PATCH 4/50: tcg: Propagate TCGOp down to allocators... Checking PATCH 5/50: tcg: Introduce arg_temp... Checking PATCH 6/50: tcg: Add temp_global bit to TCGTemp... Checking PATCH 7/50: tcg: Return NULL temp for TCG_CALL_DUMMY_ARG... Checking PATCH 8/50: tcg: Introduce temp_arg... Checking PATCH 9/50: tcg: Use per-temp state data in liveness... WARNING: line over 80 characters #186: FILE: tcg/tcg.c:1815: + } else if (arg_temp(op->args[0])->state == TS_DEAD && have_opc_new2) { total: 0 errors, 1 warnings, 441 lines checked Your patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. Checking PATCH 10/50: tcg: Avoid loops against variable bounds... Checking PATCH 11/50: tcg: Change temp_allocate_frame arg to TCGTemp... Checking PATCH 12/50: tcg: Remove unused TCG_CALL_DUMMY_TCGV... Checking PATCH 13/50: tcg: Export temp_idx... Checking PATCH 14/50: tcg: Use per-temp state data in optimize... Checking PATCH 15/50: tcg: Push tcg_ctx into generator functions... Checking PATCH 16/50: tcg: Push tcg_ctx into tcg_gen_callN... Checking PATCH 17/50: tcg: Introduce index_arg... Checking PATCH 18/50: tcg: Reserve temporary index 0... Checking PATCH 19/50: target/alpha: Avoid translate_init unless tcg_enabled... Checking PATCH 20/50: qom: Introduce CPUClass.tcg_initialize... Checking PATCH 21/50: tcg: Use pointers in TCGOp->args... Checking PATCH 22/50: tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK... Checking PATCH 23/50: hack dump tb->flags and tb->cflags... Checking PATCH 24/50: tcg: Add CPUState step_next_tb... Checking PATCH 25/50: tcg: Include CF_COUNT_MASK in CF_HASH_MASK... Checking PATCH 26/50: tcg: convert tb->cflags reads to tb_cflags(tb)... ERROR: return is not a function, parentheses are not required #102: FILE: target/alpha/translate.c:458: + return ((tb_cflags(ctx->base.tb) & CF_LAST_IO) WARNING: line over 80 characters #218: FILE: target/hppa/translate.c:472: + if ((tb_cflags(ctx->base.tb) & CF_LAST_IO) || ctx->base.singlestep_enabled) { total: 1 errors, 1 warnings, 924 lines checked Your patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. Checking PATCH 27/50: target/arm: check CF_PARALLEL instead of parallel_cpus... Checking PATCH 28/50: target/hppa: check CF_PARALLEL instead of parallel_cpus... Checking PATCH 29/50: target/i386: check CF_PARALLEL instead of parallel_cpus... WARNING: line over 80 characters #22: FILE: target/i386/translate.c:5268: + if ((s->prefix & PREFIX_LOCK) && (tb_cflags(s->base.tb) & CF_PARALLEL)) { WARNING: line over 80 characters #31: FILE: target/i386/translate.c:5279: + if ((s->prefix & PREFIX_LOCK) && (tb_cflags(s->base.tb) & CF_PARALLEL)) { total: 0 errors, 2 warnings, 16 lines checked Your patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. Checking PATCH 30/50: target/m68k: check CF_PARALLEL instead of parallel_cpus... Checking PATCH 31/50: target/s390x: check CF_PARALLEL instead of parallel_cpus... Checking PATCH 32/50: target/sh4: check CF_PARALLEL instead of parallel_cpus... Checking PATCH 33/50: target/sparc: check CF_PARALLEL instead of parallel_cpus... Checking PATCH 34/50: tcg: check CF_PARALLEL instead of parallel_cpus... Checking PATCH 35/50: cpu-exec: lookup/generate TB outside exclusive region during step_atomic... Checking PATCH 36/50: tcg: Add CF_LAST_IO + CF_USE_ICOUNT to CF_HASH_MASK... Checking PATCH 37/50: tcg: Remove CF_IGNORE_ICOUNT... Checking PATCH 38/50: translate-all: use a binary search tree to track TBs in TBContext... Checking PATCH 39/50: exec-all: rename tb_free to tb_remove... Checking PATCH 40/50: translate-all: report correct avg host TB size... Checking PATCH 41/50: tcg: take tb_ctx out of TCGContext... Checking PATCH 42/50: tcg: define tcg_init_ctx and make tcg_ctx a pointer... Checking PATCH 43/50: gen-icount: fold exitreq_label into TCGContext... Checking PATCH 44/50: tcg: introduce **tcg_ctxs to keep track of all TCGContext's... Checking PATCH 45/50: tcg: distribute profiling counters across TCGContext's... Checking PATCH 46/50: tcg: allocate optimizer temps with tcg_malloc... Checking PATCH 47/50: osdep: introduce qemu_mprotect_rwx/none... Checking PATCH 48/50: translate-all: use qemu_protect_rwx/none helpers... Checking PATCH 49/50: tcg: introduce regions to split code_gen_buffer... Checking PATCH 50/50: tcg: enable multiple TCG contexts in softmmu... === OUTPUT END === Test command exited with code: 1 --- Email generated automatically by Patchew [http://patchew.org/]. Please send your feedback to patchew-devel@freelists.org
On Mon, Oct 16, 2017 at 10:25:19 -0700, Richard Henderson wrote: > Thus, an extra 20 patches in the patch set rearranging how temps > are referenced within the TCG backend. Some of them have appeared > on the list before, but it would have been last November. I remember seeing these on the list before -- in June'17: https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg04596.html FWIW with these 20 patches we no longer get a code reduction: text data bss dec hex filename 6850654 2144032 4420896 13415582 ccb49e alpha-softmmu/qemu-system-alpha (before) 6874086 2144032 4437248 13455366 cd5006 alpha-softmmu/qemu-system-alpha (after) but we've got to keep in mind that the patches aren't exactly the same. E.
On Mon, Oct 16, 2017 at 10:25:19 -0700, Richard Henderson wrote: > I've fixed two bugs within v5 of Emilio's patch set: > > - The step_next_tb patch fixes the "rep movsb" bug that appeared > when we included CF_COUNT_MASK into CF_HASH_MASK. We had been > relying on magic to single-step the next guest insn. > > - The original "allocate optimizer temps with tcg_malloc" patch > failed testing on arm32 host. I didn't really look into exactly > what was wrong because I had an older patch set that touched the > same portion of the optimizer. Thanks a lot for fixing these issues and respinning the series. I have just pushed a branch on top of this series that includes 10 patches that further pave the way for the removal of tb_lock: https://github.com/cota/qemu/tree/multi-tcg-v6-plus These patches are a subset of the ones that I posted on the tb_lock removal patchset [1]. In particular, these patches are groundwork that doesn't change anything fundamental wrt locking, which does get tricky. Given how close we are to the soft freeze for 2.11 [2], do you want me to post these patches on the list for review? Otherwise I can wait for the 2.12 dev cycle to post them with the complete tb_lock removal work. That said, I think we should at least cherry-pick "translate-all: exit from tb_phys_invalidate if qht_remove fails" for 2.11, since it fixes a real bug. Stable should also get it. Thanks, Emilio [1] https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg01199.html [2] https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg02217.html
On 19/10/2017 00:45, Emilio G. Cota wrote: > On Mon, Oct 16, 2017 at 10:25:19 -0700, Richard Henderson wrote: >> I've fixed two bugs within v5 of Emilio's patch set: >> >> - The step_next_tb patch fixes the "rep movsb" bug that appeared >> when we included CF_COUNT_MASK into CF_HASH_MASK. We had been >> relying on magic to single-step the next guest insn. >> >> - The original "allocate optimizer temps with tcg_malloc" patch >> failed testing on arm32 host. I didn't really look into exactly >> what was wrong because I had an older patch set that touched the >> same portion of the optimizer. > > Thanks a lot for fixing these issues and respinning the series. > > I have just pushed a branch on top of this series that includes > 10 patches that further pave the way for the removal of tb_lock: > > https://github.com/cota/qemu/tree/multi-tcg-v6-plus I started reviewing those, I have a few questions: 1) why is tcg_region_tree separate from tcg_region_state? Would it make sense to prepare a linked list of tcg_region_state structs, and reuse the region lock for the region tree? 2) in tb_for_each_tagged_safe, could the "prev" argument instead be "next", like + for (n = (head) & 1, \ + tb = (TranslationBlock *)((head) & ~1); \ + tb && ((next = (TranslationBlock *)tb->field[n]), 1); \ + n = (uintptr_t)next & 1, \ + tb = (TranslationBlock *)((uintptr_t)next & ~1)) (also please make the iterator macros UPPERCASE) 3) "translate-all: exit from tb_phys_invalidate if qht_remove fails" may be worth posting now? Paolo > These patches are a subset of the ones that I posted on the > tb_lock removal patchset [1]. In particular, these patches are > groundwork that doesn't change anything fundamental wrt locking, > which does get tricky. > > Given how close we are to the soft freeze for 2.11 [2], do you want > me to post these patches on the list for review? Otherwise I can wait > for the 2.12 dev cycle to post them with the complete tb_lock removal > work. > > That said, I think we should at least cherry-pick "translate-all: exit > from tb_phys_invalidate if qht_remove fails" for 2.11, since it > fixes a real bug. Stable should also get it. > > Thanks, > > Emilio > > [1] https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg01199.html > [2] https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg02217.html > >
On Thu, Oct 19, 2017 at 15:05:17 +0200, Paolo Bonzini wrote: > On 19/10/2017 00:45, Emilio G. Cota wrote: > > I have just pushed a branch on top of this series that includes > > 10 patches that further pave the way for the removal of tb_lock: > > > > https://github.com/cota/qemu/tree/multi-tcg-v6-plus > > I started reviewing those, Nice, thanks! > I have a few questions: > > 1) why is tcg_region_tree separate from tcg_region_state? Would it make > sense to prepare a linked list of tcg_region_state structs, and reuse > the region lock for the region tree? I think the naming here might be confusing; "tcg_region_state" should be understood as "tcg_region_global_state". IOW, there is no per-region struct. That said, the array of per-region trees could be embedded in this global struct. I was hesitant to do so because then one could think that region_state.lock and rt.lock are somehow related; they are not. > 2) in tb_for_each_tagged_safe, could the "prev" argument instead be > "next", like > > > + for (n = (head) & 1, \ > + tb = (TranslationBlock *)((head) & ~1); \ > + tb && ((next = (TranslationBlock *)tb->field[n]), 1); \ > + n = (uintptr_t)next & 1, \ > + tb = (TranslationBlock *)((uintptr_t)next & ~1)) Is this just to make them closer to the macros in queue.h? In this case tracking *prev in the loop (rather than next) is useful because it makes removing the "current" element very simple: static inline void tb_page_remove(PageDesc *pd, TranslationBlock *tb) { TranslationBlock *tb1; uintptr_t *prev; unsigned int n1; page_for_each_tb_safe(pd, tb1, n1, prev) { if (tb1 == tb) { *prev = tb1->page_next[n1]; return; } } g_assert_not_reached(); } If we wanted to use something similar to QSLIST_REMOVE_AFTER, we'd have to track three pointers instead of two: prev (tracked by the caller), current and next (these two as part of the for loop). > (also please make the iterator macros UPPERCASE) Will do. > 3) "translate-all: exit from tb_phys_invalidate if qht_remove fails" may > be worth posting now? I'll post it to be included in the next iteration of this series. Thanks, Emilio
On 19/10/2017 22:11, Emilio G. Cota wrote: > On Thu, Oct 19, 2017 at 15:05:17 +0200, Paolo Bonzini wrote: >> On 19/10/2017 00:45, Emilio G. Cota wrote: >>> I have just pushed a branch on top of this series that includes >>> 10 patches that further pave the way for the removal of tb_lock: >>> >>> https://github.com/cota/qemu/tree/multi-tcg-v6-plus >> >> I started reviewing those, > > Nice, thanks! > >> I have a few questions: >> >> 1) why is tcg_region_tree separate from tcg_region_state? Would it make >> sense to prepare a linked list of tcg_region_state structs, and reuse >> the region lock for the region tree? > > I think the naming here might be confusing; "tcg_region_state" should be > understood as "tcg_region_global_state". IOW, there is no per-region struct. > > That said, the array of per-region trees could be embedded in this global > struct. I was hesitant to do so because then one could think that > region_state.lock and rt.lock are somehow related; they are not. Ok, this is clearer now. >> 2) in tb_for_each_tagged_safe, could the "prev" argument instead be >> "next", like > > Is this just to make them closer to the macros in queue.h? > > In this case tracking *prev in the loop (rather than next) is > useful because it makes removing the "current" element very simple: This actually makes a lot of sense. Maybe we should change queue.h the other way. ;) Can you rename "prev" to "pprev" though? Paolo
On Fri, Oct 20, 2017 at 09:10:38 +0200, Paolo Bonzini wrote: > >> 2) in tb_for_each_tagged_safe, could the "prev" argument instead be > >> "next", like > > > > Is this just to make them closer to the macros in queue.h? > > > > In this case tracking *prev in the loop (rather than next) is > > useful because it makes removing the "current" element very simple: > > This actually makes a lot of sense. Maybe we should change queue.h the > other way. ;) Turns out this works here but it isn't as general-purpose as it might look. In this case it works because we don't free the tb. If we did, then we'd either need a branch in the iterator or a third *next pointer. In fact, the macro wouldn't be safe even if it tracked *next, since on a removal *pprev must not be updated to the removed item. IOW, the caller must be the one keeping track of *pprev, for otherwise things break after the first removal. (Again, this doesn't affect this particular instance, because its only caller stops iterating after a removal.) I have therefore given up on the macro and changed its only caller to update *pprev. IMO it's less pretty, but more robust. The updated branch with the changes you suggested (plus the above) is available at: https://github.com/cota/qemu/tree/multi-tcg-v6-plus2 Thanks, Emilio
I have given a respin to the tb_lock branch on top of master. Turns out the freezes I was getting were due to the magic we used to rely on (e.g. for icount). Works now! The branch is here: https://github.com/cota/qemu/tree/tb-lock Boot-tested on ppc64, aarch64 and alpha smp guests (<=64) with MTTCG. A couple of questions: - Does anyone remember what work remains to be done to safely enable MTTCG for i386? I just forced it on: without the tb lock removal I booted an 8-core guest, but without tb_lock the guest kernel dies pretty quickly. So I guess there's still work to do. - In user-mode without tb_lock, should I worry about fork happening while any of the newly-introduced locks are held? For instance, tb->jmp_lock (introduced in "translate-all: protect TB jumps with a per-destination-TB lock"), or tcg_region_tree's lock. Thanks, Emilio