Message ID | 20230806033715.244648-1-richard.henderson@linaro.org |
---|---|
State | New |
On 8/5/23 20:36, Richard Henderson wrote:
> The following changes since commit 6db03ccc7f4ca33c99debaac290066f4500a2dfb:
>
>   Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2023-08-04 14:47:00 -0700)
>
> are available in the Git repository at:
>
>   https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230805
>
> for you to fetch changes up to 843246699425adfb6b81f927c16c9c6249b51e1d:
>
>   linux-user/elfload: Set V in ELF_HWCAP for RISC-V (2023-08-05 18:17:20 +0000)
>
> ----------------------------------------------------------------
> accel/tcg: Do not issue misaligned i/o
> accel/tcg: Call save_iotlb_data from io_readx
> gdbstub: use 0 ("any process") on packets with no PID
> linux-user: Fixes for MAP_FIXED_NOREPLACE
> linux-user: Fixes for brk
> linux-user: Adjust task_unmapped_base for reserved_va
> linux-user: Use ELF_ET_DYN_BASE for ET_DYN with interpreter
> linux-user: Remove host != guest page size workarounds in brk and image load
> linux-user: Set V in ELF_HWCAP for RISC-V
> *-user: Remove last_brk as unused
>
> ----------------------------------------------------------------
> Akihiko Odaki (6):
>       linux-user: Unset MAP_FIXED_NOREPLACE for host
>       linux-user: Fix MAP_FIXED_NOREPLACE on old kernels
>       linux-user: Do not call get_errno() in do_brk()
>       linux-user: Use MAP_FIXED_NOREPLACE for do_brk()
>       linux-user: Do nothing if too small brk is specified
>       linux-user: Do not align brk with host page size
>
> Helge Deller (1):
>       linux-user: Adjust initial brk when interpreter is close to executable
>
> Matheus Tavares Bernardino (1):
>       gdbstub: use 0 ("any process") on packets with no PID
>
> Mikhail Tyutin (1):
>       accel/tcg: Call save_iotlb_data from io_readx as well.
>
> Nathan Egge (1):
>       linux-user/elfload: Set V in ELF_HWCAP for RISC-V
>
> Richard Henderson (14):
>       accel/tcg: Adjust parameters and locking with do_{ld,st}_mmio_*
>       accel/tcg: Issue wider aligned i/o in do_{ld,st}_mmio_*
>       accel/tcg: Do not issue misaligned i/o
>       linux-user: Remove last_brk
>       bsd-user: Remove last_brk
>       linux-user: Adjust task_unmapped_base for reserved_va
>       linux-user: Define TASK_UNMAPPED_BASE in $guest/target_mman.h
>       linux-user: Define ELF_ET_DYN_BASE in $guest/target_mman.h
>       linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap
>       linux-user: Use elf_et_dyn_base for ET_DYN with interpreter
>       linux-user: Properly set image_info.brk in flatload
>       linux-user: Do not adjust image mapping for host page size
>       linux-user: Do not adjust zero_bss for host page size
>       linux-user: Use zero_bss for PT_LOAD with no file contents too

Applied a truncated version of this PR:

3c4a8a8fda bsd-user: Remove last_brk
62cbf08150 linux-user: Remove last_brk
0662a626a7 linux-user: Properly set image_info.brk in flatload
2aea137a42 linux-user: Do not align brk with host page size
cb9d5d1fda linux-user: Do nothing if too small brk is specified
e69e032d1a linux-user: Use MAP_FIXED_NOREPLACE for do_brk()
c6cc059eca linux-user: Do not call get_errno() in do_brk()
ddcdd8c48f linux-user: Fix MAP_FIXED_NOREPLACE on old kernels
c3dd50da0f linux-user: Unset MAP_FIXED_NOREPLACE for host
4333f0924c linux-user/elfload: Set V in ELF_HWCAP for RISC-V
89e5b7935e configure: Fix linux-user host detection for riscv64
6c78de6eb6 gdbstub: use 0 ("any process") on packets with no PID
c30d0b861c accel/tcg: Call save_iotlb_data from io_readx as well
f7eaf9d702 accel/tcg: Do not issue misaligned i/o
190aba803f accel/tcg: Issue wider aligned i/o in do_{ld,st}_mmio_*
1966855e56 accel/tcg: Adjust parameters and locking with do_{ld,st}_mmio_*

The "Use MAP_FIXED_NOREPLACE for initial image mmap" patch tickles a latent bug in probe_guest_base, which affects our s390x host. I omitted all of the task_unmapped_base and elf_et_dyn_base patches as well, since they also affect layout.

r~
On 8/23/23 06:04, Thomas Huth wrote:
> On 06/08/2023 05.36, Richard Henderson wrote:
>> The following changes since commit 6db03ccc7f4ca33c99debaac290066f4500a2dfb:
>>
>>   Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2023-08-04 14:47:00 -0700)
>>
>> are available in the Git repository at:
>>
>>   https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230805
>>
>> for you to fetch changes up to 843246699425adfb6b81f927c16c9c6249b51e1d:
>>
>>   linux-user/elfload: Set V in ELF_HWCAP for RISC-V (2023-08-05 18:17:20 +0000)
>>
>> ----------------------------------------------------------------
>> accel/tcg: Do not issue misaligned i/o
>> accel/tcg: Call save_iotlb_data from io_readx
>> gdbstub: use 0 ("any process") on packets with no PID
>> linux-user: Fixes for MAP_FIXED_NOREPLACE
>> linux-user: Fixes for brk
>> linux-user: Adjust task_unmapped_base for reserved_va
>> linux-user: Use ELF_ET_DYN_BASE for ET_DYN with interpreter
>> linux-user: Remove host != guest page size workarounds in brk and image load
>> linux-user: Set V in ELF_HWCAP for RISC-V
>> *-user: Remove last_brk as unused
>
> Hi Richard,
>
> I noticed that we currently have two failing Avocado jobs in our CI, avocado-system-centos
> and avocado-system-opensuse, where the boot_linux.py:BootLinuxX8664.test_pc_i440fx_tcg and
> the boot_linux.py:BootLinuxX8664.test_pc_q35_tcg are now apparently crashing. If I've got
> the history right, it started with your pull request here, in the preceding one from
> Paolo, everything is still green:
>
>   https://gitlab.com/qemu-project/qemu/-/pipelines/956543770
>
> But here the jobs started failing:
>
>   https://gitlab.com/qemu-project/qemu/-/pipelines/957458385
>
> Could you please have a look?

It's some sort of timing issue, which sometimes goes away when re-run.

I was re-running tests *a lot* in order to get them to go green while running the 8.1 release.

For instance, with very little added except for your s390x pull, the same BootLinuxX8664.test_pc_i440fx_tcg test passes:

https://gitlab.com/qemu-project/qemu/-/jobs/4931341744#L136

In the failing i440fx_tcg test, you can even see it's a timing issue:

https://qemu-project.gitlab.io/-/qemu/-/jobs/4813804725/artifacts/build/tests/results/latest/test-results/02-tests_avocado_boot_linux.py_BootLinuxX8664.test_pc_i440fx_tcg/debug.log

23:42:30 DEBUG| [ 61.003328] Sending NMI from CPU 0 to CPUs 1:
23:42:30 DEBUG| [ 61.007829] INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 2.622 msecs
23:42:30 DEBUG| [ 61.003328] NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0xe/0x10
23:42:30 DEBUG| [ 61.003328] rcu: rcu_sched kthread starved for 60002 jiffies! g-963 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
23:42:30 DEBUG| [ 61.003328] rcu: RCU grace-period kthread stack dump:
23:42:30 DEBUG| [ 61.003328] rcu_sched I 0 10 2 0x80004000
23:42:30 DEBUG| [ 61.003328] Call Trace:
23:42:30 DEBUG| [ 61.003328] ? __schedule+0x29f/0x680
...

r~
Richard Henderson <richard.henderson@linaro.org> writes:

> On 8/23/23 06:04, Thomas Huth wrote:
>> On 06/08/2023 05.36, Richard Henderson wrote:
>>> The following changes since commit 6db03ccc7f4ca33c99debaac290066f4500a2dfb:
>>>
>>>   Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into
>>> staging (2023-08-04 14:47:00 -0700)
>>>
>>> are available in the Git repository at:
>>>
>>>   https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230805
>>>
>>> for you to fetch changes up to 843246699425adfb6b81f927c16c9c6249b51e1d:
>>>
>>>   linux-user/elfload: Set V in ELF_HWCAP for RISC-V (2023-08-05 18:17:20 +0000)
>>>
>>> ----------------------------------------------------------------
>>> accel/tcg: Do not issue misaligned i/o
>>> accel/tcg: Call save_iotlb_data from io_readx
>>> gdbstub: use 0 ("any process") on packets with no PID
>>> linux-user: Fixes for MAP_FIXED_NOREPLACE
>>> linux-user: Fixes for brk
>>> linux-user: Adjust task_unmapped_base for reserved_va
>>> linux-user: Use ELF_ET_DYN_BASE for ET_DYN with interpreter
>>> linux-user: Remove host != guest page size workarounds in brk and image load
>>> linux-user: Set V in ELF_HWCAP for RISC-V
>>> *-user: Remove last_brk as unused
>> Hi Richard,
>> I noticed that we currently have two failing Avocado jobs in our CI,
>> avocado-system-centos and avocado-system-opensuse, where the
>> boot_linux.py:BootLinuxX8664.test_pc_i440fx_tcg and the
>> boot_linux.py:BootLinuxX8664.test_pc_q35_tcg are now apparently
>> crashing. If I've got the history right, it started with your pull
>> request here, in the preceding one from Paolo, everything is still
>> green:
>>   https://gitlab.com/qemu-project/qemu/-/pipelines/956543770
>> But here the jobs started failing:
>>   https://gitlab.com/qemu-project/qemu/-/pipelines/957458385
>> Could you please have a look?
>
> It's some sort of timing issue, which sometimes goes away when re-run.
> I was re-running tests *a lot* in order to get them to go green while
> running the 8.1 release.

There is a definite regression point for the test_pc_q35 case:

  ./tests/venv/bin/avocado run ./tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_q35_tcg
  JOB ID : b8ea329d3353db7a47eb955fcad2f26b2dbe9f29
  JOB LOG : /home/alex.bennee/avocado/job-results/job-2023-08-24T15.27-b8ea329/job.log
   (1/1) ./tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_q35_tcg: PASS (110.70 s)
  RESULTS : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
  JOB TIME : 111.22 s
  🕙15:29:06 alex.bennee@hackbox2:qemu.git/builds/bisect (190aba8) (BISECTING) [$!?] took 1m51s
  ➜ make -j30
  [1/8] Generating qemu-version.h with a custom command (wrapped by meson to capture output)
  [2/8] Compiling C object qga/qemu-ga.p/main.c.o
  [3/8] Compiling C object libqmp.fa.p/monitor_qmp-cmds-control.c.o
  [4/8] Compiling C object libqemu-x86_64-softmmu.fa.p/accel_tcg_cputlb.c.o
  [5/8] Compiling C object libcommon.fa.p/softmmu_vl.c.o
  [6/8] Linking static target libqmp.fa
  [7/8] Linking target qga/qemu-ga
  [8/8] Linking target qemu-system-x86_64
  🕙15:30:12 alex.bennee@hackbox2:qemu.git/builds/bisect (f7eaf9d) (BISECTING) [$!?] took 5s
  ➜ ./tests/venv/bin/avocado run ./tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_q35_tcg
  JOB ID : 56768272dee373062792251ee3445cc81092634e
  JOB LOG : /home/alex.bennee/avocado/job-results/job-2023-08-24T15.30-5676827/job.log
   (1/1) ./tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_q35_tcg: INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '1-./tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_q35_tcg', 'logdir': '/home/alex.bennee/avocado/job-results/job-2023-08-24T15.30-5676827/test-results... (480.28 s)
  RESULTS : PASS 0 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 1 | CANCEL 0
  JOB TIME : 480.80 s

which bisects to:

  commit f7eaf9d702efdd02481d5f1c25f7d8e0ffb64c6e (HEAD, refs/bisect/bad)
  Author: Richard Henderson <richard.henderson@linaro.org>
  Date:   Tue Aug 1 10:46:03 2023 -0700

      accel/tcg: Do not issue misaligned i/o

      In the single-page case we were issuing misaligned i/o to the
      memory subsystem, which does not handle it properly. Split
      such accesses via do_{ld,st}_mmio_*.

      Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1800
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

>
> For instance, with very little added except for your s390x pull, the
> same BootLinuxX8664.test_pc_i440fx_tcg test passes:
>
> https://gitlab.com/qemu-project/qemu/-/jobs/4931341744#L136
>
> In the failing i440fx_tcg test, you can even see it's a timing issue:
>
> https://qemu-project.gitlab.io/-/qemu/-/jobs/4813804725/artifacts/build/tests/results/latest/test-results/02-tests_avocado_boot_linux.py_BootLinuxX8664.test_pc_i440fx_tcg/debug.log
>
> 23:42:30 DEBUG| [ 61.003328] Sending NMI from CPU 0 to CPUs 1:
> 23:42:30 DEBUG| [ 61.007829] INFO: NMI handler
> (nmi_cpu_backtrace_handler) took too long to run: 2.622 msecs
> 23:42:30 DEBUG| [ 61.003328] NMI backtrace for cpu 1 skipped: idling
> at native_safe_halt+0xe/0x10
> 23:42:30 DEBUG| [ 61.003328] rcu: rcu_sched kthread starved for
> 60002 jiffies! g-963 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
> 23:42:30 DEBUG| [ 61.003328] rcu: RCU grace-period kthread stack dump:
> 23:42:30 DEBUG| [ 61.003328] rcu_sched I 0 10 2 0x80004000
> 23:42:30 DEBUG| [ 61.003328] Call Trace:
> 23:42:30 DEBUG| [ 61.003328] ? __schedule+0x29f/0x680
> ...
>
>
> r~
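For readers unfamiliar with the idea in the bisected commit message ("Split such accesses"), here is a minimal, generic sketch of how a misaligned access can be broken into naturally aligned power-of-two pieces. This is an illustration only and is not QEMU's do_{ld,st}_mmio_* code: the names AccessChunk and split_aligned are hypothetical, and the 8-byte maximum piece size is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for this sketch; not a QEMU structure. */
typedef struct {
    uint64_t addr;   /* start address of one piece */
    unsigned size;   /* piece size in bytes: 1, 2, 4 or 8 */
} AccessChunk;

/*
 * Split the access [addr, addr + size) into pieces that are each
 * naturally aligned (addr % piece_size == 0) and a power of two.
 * Returns the number of pieces written to 'out'.
 * Assumption for the sketch: pieces are capped at 8 bytes.
 */
static size_t split_aligned(uint64_t addr, unsigned size,
                            AccessChunk *out, size_t max_out)
{
    size_t n = 0;

    assert(size > 0);
    while (size > 0) {
        unsigned chunk = 8;

        /* Shrink until the piece is aligned at addr and fits in what is left. */
        while (chunk > 1 && ((addr & (chunk - 1)) != 0 || chunk > size)) {
            chunk >>= 1;
        }
        assert(n < max_out);
        out[n].addr = addr;
        out[n].size = chunk;
        n++;
        addr += chunk;
        size -= chunk;
    }
    return n;
}
```

With this sketch, a 4-byte access at 0x1001 becomes three pieces: 1 byte at 0x1001, 2 bytes at 0x1002 and 1 byte at 0x1004. An already aligned 4-byte access at 0x1000 stays a single piece. More pieces mean more individual device accesses, which is one plausible way such a change could alter guest-visible timing, though nothing in this thread confirms that as the cause.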
24.08.2023 18:31, Alex Bennée wrote:
..
> which bisects to:
>
>   commit f7eaf9d702efdd02481d5f1c25f7d8e0ffb64c6e (HEAD, refs/bisect/bad)
>   Author: Richard Henderson <richard.henderson@linaro.org>
>   Date:   Tue Aug 1 10:46:03 2023 -0700
>
>       accel/tcg: Do not issue misaligned i/o

It's not the first time something bisects to this commit.
But I can't find other relevant cases right now..

/mjt
On 8/24/23 08:31, Alex Bennée wrote:
>> It's some sort of timing issue, which sometimes goes away when re-run.
>> I was re-running tests *a lot* in order to get them to go green while
>> running the 8.1 release.
>
> There is a definite regression point for the test_pc_q35 case:

Not exactly "definite" because it does vanish.

> which bisects to:
>
>   commit f7eaf9d702efdd02481d5f1c25f7d8e0ffb64c6e (HEAD, refs/bisect/bad)
>   Author: Richard Henderson <richard.henderson@linaro.org>
>   Date:   Tue Aug 1 10:46:03 2023 -0700
>
>       accel/tcg: Do not issue misaligned i/o

Well, since you can reproduce it, would you please debug it.

r~
On 24/8/23 18:23, Michael Tokarev wrote:
> 24.08.2023 18:31, Alex Bennée wrote:
> ..
>> which bisects to:
>>
>>   commit f7eaf9d702efdd02481d5f1c25f7d8e0ffb64c6e (HEAD,
>> refs/bisect/bad)
>>   Author: Richard Henderson <richard.henderson@linaro.org>
>>   Date:   Tue Aug 1 10:46:03 2023 -0700
>>
>>       accel/tcg: Do not issue misaligned i/o
>
> It's not the first time something bisects to this commit.
> But I can't find other relevant cases right now..

This seems to be our "we don't model the ISA bus" friend again.

TCG i/o DTRT for me.
On 24/8/23 20:31, Richard Henderson wrote:
> On 8/24/23 08:31, Alex Bennée wrote:
>>> It's some sort of timing issue, which sometimes goes away when re-run.
>>> I was re-running tests *a lot* in order to get them to go green while
>>> running the 8.1 release.
>>
>> There is a definite regression point for the test_pc_q35 case:
>
> Not exactly "definite" because it does vanish.
>
>> which bisects to:
>>
>>   commit f7eaf9d702efdd02481d5f1c25f7d8e0ffb64c6e (HEAD,
>> refs/bisect/bad)
>>   Author: Richard Henderson <richard.henderson@linaro.org>
>>   Date:   Tue Aug 1 10:46:03 2023 -0700
>>
>>       accel/tcg: Do not issue misaligned i/o
>
> Well, since you can reproduce it, would you please debug it.

Not sure if that helps, but there is no failure when using -icount auto.