[PULL,00/24] tcg + linux-user queue for 8.1-rc3

Message ID 20230806033715.244648-1-richard.henderson@linaro.org
State New

Pull-request

https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230805

Message

Richard Henderson Aug. 6, 2023, 3:36 a.m. UTC
The following changes since commit 6db03ccc7f4ca33c99debaac290066f4500a2dfb:

  Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2023-08-04 14:47:00 -0700)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230805

for you to fetch changes up to 843246699425adfb6b81f927c16c9c6249b51e1d:

  linux-user/elfload: Set V in ELF_HWCAP for RISC-V (2023-08-05 18:17:20 +0000)

----------------------------------------------------------------
accel/tcg: Do not issue misaligned i/o
accel/tcg: Call save_iotlb_data from io_readx
gdbstub: use 0 ("any process") on packets with no PID
linux-user: Fixes for MAP_FIXED_NOREPLACE
linux-user: Fixes for brk
linux-user: Adjust task_unmapped_base for reserved_va
linux-user: Use ELF_ET_DYN_BASE for ET_DYN with interpreter
linux-user: Remove host != guest page size workarounds in brk and image load
linux-user: Set V in ELF_HWCAP for RISC-V
*-user: Remove last_brk as unused

----------------------------------------------------------------
Akihiko Odaki (6):
      linux-user: Unset MAP_FIXED_NOREPLACE for host
      linux-user: Fix MAP_FIXED_NOREPLACE on old kernels
      linux-user: Do not call get_errno() in do_brk()
      linux-user: Use MAP_FIXED_NOREPLACE for do_brk()
      linux-user: Do nothing if too small brk is specified
      linux-user: Do not align brk with host page size

Helge Deller (1):
      linux-user: Adjust initial brk when interpreter is close to executable

Matheus Tavares Bernardino (1):
      gdbstub: use 0 ("any process") on packets with no PID

Mikhail Tyutin (1):
      accel/tcg: Call save_iotlb_data from io_readx as well.

Nathan Egge (1):
      linux-user/elfload: Set V in ELF_HWCAP for RISC-V

Richard Henderson (14):
      accel/tcg: Adjust parameters and locking with do_{ld,st}_mmio_*
      accel/tcg: Issue wider aligned i/o in do_{ld,st}_mmio_*
      accel/tcg: Do not issue misaligned i/o
      linux-user: Remove last_brk
      bsd-user: Remove last_brk
      linux-user: Adjust task_unmapped_base for reserved_va
      linux-user: Define TASK_UNMAPPED_BASE in $guest/target_mman.h
      linux-user: Define ELF_ET_DYN_BASE in $guest/target_mman.h
      linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap
      linux-user: Use elf_et_dyn_base for ET_DYN with interpreter
      linux-user: Properly set image_info.brk in flatload
      linux-user: Do not adjust image mapping for host page size
      linux-user: Do not adjust zero_bss for host page size
      linux-user: Use zero_bss for PT_LOAD with no file contents too

 bsd-user/qemu.h                      |   1 -
 linux-user/aarch64/target_mman.h     |  13 ++
 linux-user/alpha/target_mman.h       |  11 ++
 linux-user/arm/target_mman.h         |  11 ++
 linux-user/cris/target_mman.h        |  12 ++
 linux-user/hexagon/target_mman.h     |  13 ++
 linux-user/hppa/target_mman.h        |   6 +
 linux-user/i386/target_mman.h        |  16 ++
 linux-user/loongarch64/target_mman.h |  11 ++
 linux-user/m68k/target_mman.h        |   5 +
 linux-user/microblaze/target_mman.h  |  11 ++
 linux-user/mips/target_mman.h        |  10 ++
 linux-user/nios2/target_mman.h       |  10 ++
 linux-user/openrisc/target_mman.h    |  10 ++
 linux-user/ppc/target_mman.h         |  20 +++
 linux-user/qemu.h                    |   2 -
 linux-user/riscv/target_mman.h       |  10 ++
 linux-user/s390x/target_mman.h       |  20 +++
 linux-user/sh4/target_mman.h         |   7 +
 linux-user/sparc/target_mman.h       |  25 +++
 linux-user/user-mmap.h               |   6 +-
 linux-user/x86_64/target_mman.h      |  15 ++
 linux-user/xtensa/target_mman.h      |  10 ++
 accel/tcg/cputlb.c                   | 289 +++++++++++++++++++++++------------
 bsd-user/mmap.c                      |   2 -
 gdbstub/gdbstub.c                    |   2 +-
 linux-user/elfload.c                 | 184 ++++++++++------------
 linux-user/flatload.c                |   2 +-
 linux-user/main.c                    |  45 +++++-
 linux-user/mmap.c                    |  68 +++++----
 linux-user/syscall.c                 |  69 ++-------
 31 files changed, 622 insertions(+), 294 deletions(-)

Comments

Richard Henderson Aug. 7, 2023, 1:22 a.m. UTC | #1
On 8/5/23 20:36, Richard Henderson wrote:
> The following changes since commit 6db03ccc7f4ca33c99debaac290066f4500a2dfb:
> 
>    Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2023-08-04 14:47:00 -0700)
> 
> are available in the Git repository at:
> 
>    https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230805
> 
> for you to fetch changes up to 843246699425adfb6b81f927c16c9c6249b51e1d:
> 
>    linux-user/elfload: Set V in ELF_HWCAP for RISC-V (2023-08-05 18:17:20 +0000)
> 
> ----------------------------------------------------------------
> accel/tcg: Do not issue misaligned i/o
> accel/tcg: Call save_iotlb_data from io_readx
> gdbstub: use 0 ("any process") on packets with no PID
> linux-user: Fixes for MAP_FIXED_NOREPLACE
> linux-user: Fixes for brk
> linux-user: Adjust task_unmapped_base for reserved_va
> linux-user: Use ELF_ET_DYN_BASE for ET_DYN with interpreter
> linux-user: Remove host != guest page size workarounds in brk and image load
> linux-user: Set V in ELF_HWCAP for RISC-V
> *-user: Remove last_brk as unused
> 
> ----------------------------------------------------------------
> Akihiko Odaki (6):
>        linux-user: Unset MAP_FIXED_NOREPLACE for host
>        linux-user: Fix MAP_FIXED_NOREPLACE on old kernels
>        linux-user: Do not call get_errno() in do_brk()
>        linux-user: Use MAP_FIXED_NOREPLACE for do_brk()
>        linux-user: Do nothing if too small brk is specified
>        linux-user: Do not align brk with host page size
> 
> Helge Deller (1):
>        linux-user: Adjust initial brk when interpreter is close to executable
> 
> Matheus Tavares Bernardino (1):
>        gdbstub: use 0 ("any process") on packets with no PID
> 
> Mikhail Tyutin (1):
>        accel/tcg: Call save_iotlb_data from io_readx as well.
> 
> Nathan Egge (1):
>        linux-user/elfload: Set V in ELF_HWCAP for RISC-V
> 
> Richard Henderson (14):
>        accel/tcg: Adjust parameters and locking with do_{ld,st}_mmio_*
>        accel/tcg: Issue wider aligned i/o in do_{ld,st}_mmio_*
>        accel/tcg: Do not issue misaligned i/o
>        linux-user: Remove last_brk
>        bsd-user: Remove last_brk
>        linux-user: Adjust task_unmapped_base for reserved_va
>        linux-user: Define TASK_UNMAPPED_BASE in $guest/target_mman.h
>        linux-user: Define ELF_ET_DYN_BASE in $guest/target_mman.h
>        linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap
>        linux-user: Use elf_et_dyn_base for ET_DYN with interpreter
>        linux-user: Properly set image_info.brk in flatload
>        linux-user: Do not adjust image mapping for host page size
>        linux-user: Do not adjust zero_bss for host page size
>        linux-user: Use zero_bss for PT_LOAD with no file contents too

Applied a truncated version of this PR:

3c4a8a8fda bsd-user: Remove last_brk
62cbf08150 linux-user: Remove last_brk
0662a626a7 linux-user: Properly set image_info.brk in flatload
2aea137a42 linux-user: Do not align brk with host page size
cb9d5d1fda linux-user: Do nothing if too small brk is specified
e69e032d1a linux-user: Use MAP_FIXED_NOREPLACE for do_brk()
c6cc059eca linux-user: Do not call get_errno() in do_brk()
ddcdd8c48f linux-user: Fix MAP_FIXED_NOREPLACE on old kernels
c3dd50da0f linux-user: Unset MAP_FIXED_NOREPLACE for host
4333f0924c linux-user/elfload: Set V in ELF_HWCAP for RISC-V
89e5b7935e configure: Fix linux-user host detection for riscv64
6c78de6eb6 gdbstub: use 0 ("any process") on packets with no PID
c30d0b861c accel/tcg: Call save_iotlb_data from io_readx as well
f7eaf9d702 accel/tcg: Do not issue misaligned i/o
190aba803f accel/tcg: Issue wider aligned i/o in do_{ld,st}_mmio_*
1966855e56 accel/tcg: Adjust parameters and locking with do_{ld,st}_mmio_*


The "Use MAP_FIXED_NOREPLACE for initial image mmap" patch tickles a latent bug in 
probe_guest_base, which affects our s390x host.  I omitted all of the task_unmapped_base 
and elf_et_dyn_base patches as well, since they also affect layout.
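
For context on the MAP_FIXED_NOREPLACE patches: the flag asks the kernel to fail with EEXIST rather than replace an existing mapping, but Linux kernels older than 4.17 do not know the flag and treat the address as a hint, so a caller has to compare the returned address with the requested one. The following is a minimal sketch of that check, not the QEMU code; the helper name map_noreplace is hypothetical, and the fallback flag value is the asm-generic one, assumed here only for old headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_FIXED_NOREPLACE
/* Assumption: asm-generic value, used only if the headers lack the flag. */
#define MAP_FIXED_NOREPLACE 0x100000
#endif

/*
 * Hypothetical helper: map len bytes of anonymous memory at exactly addr
 * without clobbering an existing mapping.  Returns addr on success, or
 * MAP_FAILED with errno set on failure.
 */
static void *map_noreplace(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);

    if (p != MAP_FAILED && p != addr) {
        /* An old kernel ignored the flag and used addr as a hint. */
        munmap(p, len);
        errno = EEXIST;
        return MAP_FAILED;
    }
    return p;
}
```

With this check, a request that lands on an occupied range is reported as EEXIST on both old and new kernels, so the caller sees one consistent failure mode.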


r~
Richard Henderson Aug. 23, 2023, 4:27 p.m. UTC | #2
On 8/23/23 06:04, Thomas Huth wrote:
> On 06/08/2023 05.36, Richard Henderson wrote:
>> The following changes since commit 6db03ccc7f4ca33c99debaac290066f4500a2dfb:
>>
>>    Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2023-08-04 
>> 14:47:00 -0700)
>>
>> are available in the Git repository at:
>>
>>    https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230805
>>
>> for you to fetch changes up to 843246699425adfb6b81f927c16c9c6249b51e1d:
>>
>>    linux-user/elfload: Set V in ELF_HWCAP for RISC-V (2023-08-05 18:17:20 +0000)
>>
>> ----------------------------------------------------------------
>> accel/tcg: Do not issue misaligned i/o
>> accel/tcg: Call save_iotlb_data from io_readx
>> gdbstub: use 0 ("any process") on packets with no PID
>> linux-user: Fixes for MAP_FIXED_NOREPLACE
>> linux-user: Fixes for brk
>> linux-user: Adjust task_unmapped_base for reserved_va
>> linux-user: Use ELF_ET_DYN_BASE for ET_DYN with interpreter
>> linux-user: Remove host != guest page size workarounds in brk and image load
>> linux-user: Set V in ELF_HWCAP for RISC-V
>> *-user: Remove last_brk as unused
> 
>   Hi Richard,
> 
> I noticed that we currently have two failing Avocado jobs in our CI, avocado-system-centos 
> and avocado-system-opensuse, where the boot_linux.py:BootLinuxX8664.test_pc_i440fx_tcg and 
> the boot_linux.py:BootLinuxX8664.test_pc_q35_tcg are now apparently crashing. If I've got 
> the history right, it started with your pull request here; in the preceding one from
> Paolo, everything is still green:
> 
>   https://gitlab.com/qemu-project/qemu/-/pipelines/956543770
> 
> But here the jobs started failing:
> 
>   https://gitlab.com/qemu-project/qemu/-/pipelines/957458385
> 
> Could you please have a look?

It's some sort of timing issue, which sometimes goes away when re-run.  I was re-running 
tests *a lot* in order to get them to go green while running the 8.1 release.

For instance, with very little added except for your s390x pull, the same 
BootLinuxX8664.test_pc_i440fx_tcg test passes:

https://gitlab.com/qemu-project/qemu/-/jobs/4931341744#L136

In the failing i440fx_tcg test, you can even see it's a timing issue:

https://qemu-project.gitlab.io/-/qemu/-/jobs/4813804725/artifacts/build/tests/results/latest/test-results/02-tests_avocado_boot_linux.py_BootLinuxX8664.test_pc_i440fx_tcg/debug.log

23:42:30 DEBUG| [   61.003328] Sending NMI from CPU 0 to CPUs 1:
23:42:30 DEBUG| [   61.007829] INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 2.622 msecs
23:42:30 DEBUG| [   61.003328] NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0xe/0x10
23:42:30 DEBUG| [   61.003328] rcu: rcu_sched kthread starved for 60002 jiffies! g-963 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
23:42:30 DEBUG| [   61.003328] rcu: RCU grace-period kthread stack dump:
23:42:30 DEBUG| [   61.003328] rcu_sched       I    0    10      2 0x80004000
23:42:30 DEBUG| [   61.003328] Call Trace:
23:42:30 DEBUG| [   61.003328]  ? __schedule+0x29f/0x680
...


r~
Alex Bennée Aug. 24, 2023, 3:31 p.m. UTC | #3
Richard Henderson <richard.henderson@linaro.org> writes:

> On 8/23/23 06:04, Thomas Huth wrote:
>> On 06/08/2023 05.36, Richard Henderson wrote:
>>> The following changes since commit 6db03ccc7f4ca33c99debaac290066f4500a2dfb:
>>>
>>>    Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into
>>> staging (2023-08-04 14:47:00 -0700)
>>>
>>> are available in the Git repository at:
>>>
>>>    https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230805
>>>
>>> for you to fetch changes up to 843246699425adfb6b81f927c16c9c6249b51e1d:
>>>
>>>    linux-user/elfload: Set V in ELF_HWCAP for RISC-V (2023-08-05 18:17:20 +0000)
>>>
>>> ----------------------------------------------------------------
>>> accel/tcg: Do not issue misaligned i/o
>>> accel/tcg: Call save_iotlb_data from io_readx
>>> gdbstub: use 0 ("any process") on packets with no PID
>>> linux-user: Fixes for MAP_FIXED_NOREPLACE
>>> linux-user: Fixes for brk
>>> linux-user: Adjust task_unmapped_base for reserved_va
>>> linux-user: Use ELF_ET_DYN_BASE for ET_DYN with interpreter
>>> linux-user: Remove host != guest page size workarounds in brk and image load
>>> linux-user: Set V in ELF_HWCAP for RISC-V
>>> *-user: Remove last_brk as unused
>>   Hi Richard,
>> I noticed that we currently have two failing Avocado jobs in our CI,
>> avocado-system-centos and avocado-system-opensuse, where the
>> boot_linux.py:BootLinuxX8664.test_pc_i440fx_tcg and the
>> boot_linux.py:BootLinuxX8664.test_pc_q35_tcg are now apparently
>> crashing. If I've got the history right, it started with your pull
>> request here; in the preceding one from Paolo, everything is still
>> green:
>>   https://gitlab.com/qemu-project/qemu/-/pipelines/956543770
>> But here the jobs started failing:
>>   https://gitlab.com/qemu-project/qemu/-/pipelines/957458385
>> Could you please have a look?
>
> It's some sort of timing issue, which sometimes goes away when re-run.
> I was re-running tests *a lot* in order to get them to go green while
> running the 8.1 release.

There is a definite regression point for the test_pc_q35 case:

  ./tests/venv/bin/avocado run ./tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_q35_tcg
  JOB ID     : b8ea329d3353db7a47eb955fcad2f26b2dbe9f29
  JOB LOG    : /home/alex.bennee/avocado/job-results/job-2023-08-24T15.27-b8ea329/job.log
   (1/1) ./tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_q35_tcg: PASS (110.70 s)
  RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
  JOB TIME   : 111.22 s
  🕙15:29:06 alex.bennee@hackbox2:qemu.git/builds/bisect  (190aba8) (BISECTING) [$!?] took 1m51s 
  ➜  make -j30
  [1/8] Generating qemu-version.h with a custom command (wrapped by meson to capture output)
  [2/8] Compiling C object qga/qemu-ga.p/main.c.o
  [3/8] Compiling C object libqmp.fa.p/monitor_qmp-cmds-control.c.o
  [4/8] Compiling C object libqemu-x86_64-softmmu.fa.p/accel_tcg_cputlb.c.o
  [5/8] Compiling C object libcommon.fa.p/softmmu_vl.c.o
  [6/8] Linking static target libqmp.fa
  [7/8] Linking target qga/qemu-ga
  [8/8] Linking target qemu-system-x86_64
  🕙15:30:12 alex.bennee@hackbox2:qemu.git/builds/bisect  (f7eaf9d) (BISECTING) [$!?] took 5s 
  ➜  ./tests/venv/bin/avocado run ./tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_q35_tcg
  JOB ID     : 56768272dee373062792251ee3445cc81092634e
  JOB LOG    : /home/alex.bennee/avocado/job-results/job-2023-08-24T15.30-5676827/job.log
   (1/1) ./tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_q35_tcg: INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '1-./tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_q35_tcg', 'logdir': '/home/alex.bennee/avocado/job-results/job-2023-08-24T15.30-5676827/test-results... (480.28 s)
  RESULTS    : PASS 0 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 1 | CANCEL 0
  JOB TIME   : 480.80 s

which bisects to:

  commit f7eaf9d702efdd02481d5f1c25f7d8e0ffb64c6e (HEAD, refs/bisect/bad)
  Author: Richard Henderson <richard.henderson@linaro.org>
  Date:   Tue Aug 1 10:46:03 2023 -0700

      accel/tcg: Do not issue misaligned i/o

      In the single-page case we were issuing misaligned i/o to
      the memory subsystem, which does not handle it properly.
      Split such accesses via do_{ld,st}_mmio_*.

      Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1800
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
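
The commit message above describes splitting a misaligned access into pieces the memory subsystem can handle. A rough sketch of that idea follows, assuming each piece must be a naturally aligned power-of-two size; it is not the do_{ld,st}_mmio_* code from accel/tcg/cputlb.c, and the names aligned_chunk and split_access are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: size of the largest naturally aligned power-of-two
 * access that can be issued at addr without exceeding len bytes.
 * max_size must itself be a power of two.
 */
static unsigned aligned_chunk(uint64_t addr, unsigned len, unsigned max_size)
{
    unsigned size = max_size;

    /* Halve the size until it is aligned at addr and fits in len. */
    while (size > 1 && ((addr & (size - 1)) != 0 || size > len)) {
        size >>= 1;
    }
    return size;
}

/*
 * Split [addr, addr + len) into aligned pieces.  Each piece size is stored
 * in out[]; the return value is the number of pieces written (at most
 * out_cap).
 */
static size_t split_access(uint64_t addr, unsigned len, unsigned max_size,
                           unsigned *out, size_t out_cap)
{
    size_t n = 0;

    while (len > 0 && n < out_cap) {
        unsigned size = aligned_chunk(addr, len, max_size);

        out[n++] = size;
        addr += size;
        len -= size;
    }
    return n;
}
```

For example, a 4-byte access at 0x1001 with an 8-byte maximum is split into accesses of 1, 2, and 1 bytes, while the same access at 0x1000 stays a single 4-byte access.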

>
> For instance, with very little added except for your s390x pull, the
> same BootLinuxX8664.test_pc_i440fx_tcg test passes:
>
> https://gitlab.com/qemu-project/qemu/-/jobs/4931341744#L136
>
> In the failing i440fx_tcg test, you can even see it's a timing issue:
>
> https://qemu-project.gitlab.io/-/qemu/-/jobs/4813804725/artifacts/build/tests/results/latest/test-results/02-tests_avocado_boot_linux.py_BootLinuxX8664.test_pc_i440fx_tcg/debug.log
>
> 23:42:30 DEBUG| [   61.003328] Sending NMI from CPU 0 to CPUs 1:
> 23:42:30 DEBUG| [   61.007829] INFO: NMI handler
> (nmi_cpu_backtrace_handler) took too long to run: 2.622 msecs
> 23:42:30 DEBUG| [   61.003328] NMI backtrace for cpu 1 skipped: idling
> at native_safe_halt+0xe/0x10
> 23:42:30 DEBUG| [   61.003328] rcu: rcu_sched kthread starved for
> 60002 jiffies! g-963 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
> 23:42:30 DEBUG| [   61.003328] rcu: RCU grace-period kthread stack dump:
> 23:42:30 DEBUG| [   61.003328] rcu_sched       I    0    10      2 0x80004000
> 23:42:30 DEBUG| [   61.003328] Call Trace:
> 23:42:30 DEBUG| [   61.003328]  ? __schedule+0x29f/0x680
> ...
>
>
> r~
Michael Tokarev Aug. 24, 2023, 4:23 p.m. UTC | #4
24.08.2023 18:31, Alex Bennée wrote:
..
> which bisects to:
> 
>    commit f7eaf9d702efdd02481d5f1c25f7d8e0ffb64c6e (HEAD, refs/bisect/bad)
>    Author: Richard Henderson <richard.henderson@linaro.org>
>    Date:   Tue Aug 1 10:46:03 2023 -0700
> 
>        accel/tcg: Do not issue misaligned i/o

It's not the first time something bisects to this commit.
But I can't find other relevant cases right now..

/mjt
Richard Henderson Aug. 24, 2023, 6:31 p.m. UTC | #5
On 8/24/23 08:31, Alex Bennée wrote:
>> It's some sort of timing issue, which sometimes goes away when re-run.
>> I was re-running tests *a lot* in order to get them to go green while
>> running the 8.1 release.
> 
> There is a definite regression point for the test_pc_q35 case:

Not exactly "definite" because it does vanish.

> which bisects to:
> 
>    commit f7eaf9d702efdd02481d5f1c25f7d8e0ffb64c6e (HEAD, refs/bisect/bad)
>    Author: Richard Henderson <richard.henderson@linaro.org>
>    Date:   Tue Aug 1 10:46:03 2023 -0700
> 
>        accel/tcg: Do not issue misaligned i/o

Well, since you can reproduce it, would you please debug it?


r~
Philippe Mathieu-Daudé Aug. 25, 2023, 11:05 a.m. UTC | #6
On 24/8/23 18:23, Michael Tokarev wrote:
> 24.08.2023 18:31, Alex Bennée wrote:
> ..
>> which bisects to:
>>
>>    commit f7eaf9d702efdd02481d5f1c25f7d8e0ffb64c6e (HEAD, 
>> refs/bisect/bad)
>>    Author: Richard Henderson <richard.henderson@linaro.org>
>>    Date:   Tue Aug 1 10:46:03 2023 -0700
>>
>>        accel/tcg: Do not issue misaligned i/o
> 
> It's not the first time something bisects to this commit.
> But I can't find other relevant cases right now..

This seems to be our "we don't model the ISA bus" friend again.

TCG i/o DTRT for me.
Philippe Mathieu-Daudé Aug. 25, 2023, 11:36 a.m. UTC | #7
On 24/8/23 20:31, Richard Henderson wrote:
> On 8/24/23 08:31, Alex Bennée wrote:
>>> It's some sort of timing issue, which sometimes goes away when re-run.
>>> I was re-running tests *a lot* in order to get them to go green while
>>> running the 8.1 release.
>>
>> There is a definite regression point for the test_pc_q35 case:
> 
> Not exactly "definite" because it does vanish.
> 
>> which bisects to:
>>
>>    commit f7eaf9d702efdd02481d5f1c25f7d8e0ffb64c6e (HEAD, 
>> refs/bisect/bad)
>>    Author: Richard Henderson <richard.henderson@linaro.org>
>>    Date:   Tue Aug 1 10:46:03 2023 -0700
>>
>>        accel/tcg: Do not issue misaligned i/o
> 
> Well, since you can reproduce it, would you please debug it.

Not sure if that helps, but there is no failure when using -icount auto.