Acceptance Test BootLinuxAarch64.test_virt_tcg execution times

Message ID

20200806193553.GA1463846@localhost.localdomain

State

New

Headers

Date: Thu, 6 Aug 2020 15:35:53 -0400
From: Cleber Rosa <crosa@redhat.com>
To: Peter Maydell <peter.maydell@linaro.org>
Subject: Acceptance Test BootLinuxAarch64.test_virt_tcg execution times
Message-ID: <20200806193553.GA1463846@localhost.localdomain>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature"; boundary="nFreZHaLTZJo0R7j"
Content-Disposition: inline
Cc: Alex Bennée <alex.bennee@linaro.org>,
	qemu-arm@nongnu.org, Philippe Mathieu-Daudé
	<philmd@redhat.com>, QEMU Developers <qemu-devel@nongnu.org>

Series

Acceptance Test BootLinuxAarch64.test_virt_tcg execution times

Commit Message

Cleber Rosa Aug. 6, 2020, 7:35 p.m. UTC

TL;DR: This is a follow-up to an IRC chat about the
tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg test
taking many orders of magnitude longer than other acceptance (and even
similar boot) tests. I could not find an easy way to significantly
improve the execution time of this specific test (aarch64+tcg). The
best solution may be to filter out tests that are known to be slow,
and create a specific "test job" that includes them.
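A minimal sketch of the "separate job for slow tests" idea, assuming a hypothetical `slow` tag attached to tests in the same way the existing `arch:` and `accel:` tags are. The `TestEntry` class, the `split_jobs()` helper and the second test name are illustrative only. This is not how Avocado resolves tags internally.

```python
from dataclasses import dataclass, field


@dataclass
class TestEntry:
    """A test reference plus its tags (hypothetical stand-in for Avocado's metadata)."""
    name: str
    tags: set = field(default_factory=set)


def split_jobs(tests, slow_tag="slow"):
    """Split tests into a default job and a separate job for known-slow tests.

    'slow' is a hypothetical tag name, not one the current tests define.
    """
    default_job = [t.name for t in tests if slow_tag not in t.tags]
    slow_job = [t.name for t in tests if slow_tag in t.tags]
    return default_job, slow_job


tests = [
    TestEntry("boot_linux.py:BootLinuxAarch64.test_virt_tcg",
              {"arch:aarch64", "accel:tcg", "slow"}),
    # Hypothetical quick test, for illustration only.
    TestEntry("example.py:QuickTest.test_something", {"arch:aarch64"}),
]
default_job, slow_job = split_jobs(tests)
print("default job:", default_job)
print("slow job:", slow_job)
```

The default job could then skip the aarch64+tcg boot test, and the slow job could run it on its own schedule.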

First, in case it's not clear: this test runs QEMU with TCG, boots a
Fedora 31 "cloud image", and waits until the cloud-init agent notifies
the test that the boot is over.
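The "notify the test" step can be pictured as a small HTTP listener on the host. The guest's cloud-init contacts it once boot finishes; cloud-init ships a `phone_home` module that POSTs to a configured URL. The sketch below uses only the Python standard library and is not Avocado's implementation. `start_listener()` and `wait_for_boot()` are hypothetical names.

```python
import http.server
import threading


class PhoneHomeHandler(http.server.BaseHTTPRequestHandler):
    """Records the body of a POST, e.g. one sent by cloud-init's phone_home module."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.server.payload = self.rfile.read(length)
        self.send_response(200)
        self.end_headers()
        self.server.booted.set()  # wake up the waiting test

    def log_message(self, fmt, *args):
        pass  # silence per-request logging


def start_listener(host="127.0.0.1", port=0):
    """Start the listener in a background thread; port 0 picks a free port."""
    server = http.server.HTTPServer((host, port), PhoneHomeHandler)
    server.booted = threading.Event()
    server.payload = None
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def wait_for_boot(server, timeout):
    """Block until the guest phones home or `timeout` seconds expire."""
    try:
        return server.booted.wait(timeout)
    finally:
        server.shutdown()
        server.server_close()
```

With QEMU's user-mode (slirp) networking, the host is typically reachable from the guest at 10.0.2.2. In that setup, the cloud-init configuration would point at that address and the listener's port.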

Out of the four architectures tested with the same approach under
"tests/acceptance/boot_linux.py", aarch64 was special, in the sense
that many Linux "cloud images" got stuck very late in the boot
process. If my memory serves me right, disk activity within the guest
seemed to make the kernel drain its random number sources. Giving the
machine an RNG device fixed it. This can still be verified today by
commenting out the virtio-rng lines in the aarch64 test.
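For reference, a virtio-rng device backed by the host's /dev/urandom is added with an `-object rng-random` backend plus a `-device virtio-rng-pci` frontend. The helper below is a hypothetical sketch of such arguments, not the test's actual code. The machine options are illustrative, taken from the `gic-version=2` tag in the test's docstring.

```python
def aarch64_tcg_args(with_rng=True):
    """Build illustrative QEMU arguments for an aarch64 TCG guest.

    Passing with_rng=False mimics commenting out the virtio-rng lines.
    """
    args = ["-machine", "virt,gic-version=2", "-accel", "tcg"]
    if with_rng:
        # Backend: reads entropy from the host's /dev/urandom.
        args += ["-object", "rng-random,id=rng0,filename=/dev/urandom"]
        # Frontend: exposes that backend to the guest as a virtio-rng PCI device.
        args += ["-device", "virtio-rng-pci,rng=rng0"]
    return args


print(" ".join(["qemu-system-aarch64"] + aarch64_tcg_args()))
```

In an Avocado test, a list like this would be handed to `self.vm.add_args(...)`, the same method the test already uses for `-bios`.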

So, even with the RNG device in place and the boot process no longer
getting stuck, a lot of the test time is spent with QEMU actively
consuming CPU time to execute the guest boot process. This may or may
not be the cause of the slowness.
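One way to check how much of a run is QEMU consuming CPU, as opposed to waiting, is to compare wall-clock time with the CPU time used by the child process. Below is a minimal standard-library sketch. `run_and_measure()` is a hypothetical helper. On Windows, `os.times()` reports zero for child times.

```python
import os
import subprocess
import sys
import time


def run_and_measure(cmd):
    """Run cmd to completion and return (wall_seconds, child_cpu_seconds)."""
    before = os.times()
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    wall = time.monotonic() - start
    after = os.times()
    # Child CPU times are only accounted once the child has been waited for,
    # which subprocess.run() does before returning.
    cpu = ((after.children_user - before.children_user)
           + (after.children_system - before.children_system))
    return wall, cpu


if __name__ == "__main__":
    # Replace with the real QEMU command line; a busy loop stands in here.
    wall, cpu = run_and_measure([sys.executable, "-c", "sum(range(10**7))"])
    print(f"wall {wall:.2f} s, cpu {cpu:.2f} s, ratio {cpu / wall:.2f}")
```

A ratio close to 1 would point to CPU-bound execution of guest code rather than waiting on I/O or entropy. The ratio can exceed 1 when QEMU runs several vCPU threads.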

One approach to a shorter test time would be to reduce the amount of
work done during the guest boot process. Choosing a minimal guest,
such as CirrOS, would be an example of such a solution, but:

* With fewer things happening during the guest boot, fewer things
get tested within QEMU;

* CirrOS cannot make use of the same cloud-init boot configuration
and boot verification the test currently uses.

So that leaves other non-minimal Linux "cloud images" as options.
Still, the following things are required or nice to have:

* Support for cloud-init;

* Support for as many architectures as possible;

* A wide user base;

* Already thoroughly tested with this "boot_linux.py" test.

So in the end, I picked Fedora 31, which was available and behaved
well on four different architectures, with and without KVM. Today, I
checked whether switching distros would provide an "easy fix", but the
results were negative. Any ideas on how to improve the test execution
times are appreciated.
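Trying a different distro image, as in the diffs below, also means updating the test's `chksum` attribute, which is checked with `algorithm='sha256'`. Here is a standard-library sketch for computing that value from a downloaded image. `sha256sum` from coreutils gives the same hex digest. `sha256_of()` is a hypothetical helper name.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    Cloud images are large, so the file is streamed rather than
    loaded into memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Pointing it at the cached image path printed by `avocado vmimage get` yields the string to paste into `chksum`.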

For the record, one of the ways we're trying to improve the overall
test experience is to allow tests to run in parallel (expected to be
fully supported in the upcoming Avocado version 81.0).
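Apart from Avocado's own support, the effect of running tests in parallel can be sketched with the standard library. The job's wall time then tends toward that of the slowest test rather than the sum of all tests, provided the host has spare CPUs. TCG guests are CPU-hungry, so oversubscribing the host would erode the gain. `run_all()` is a hypothetical helper, not an Avocado API.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, max_workers=4):
    """Run independent test commands concurrently.

    Returns the exit codes in the same order as `commands`. Threads are
    enough here because each worker just waits on a child process.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))


if __name__ == "__main__":
    # Stand-in commands; real use would list one test invocation per entry.
    codes = run_all([[sys.executable, "-c", "pass"]] * 3)
    print(codes)  # [0, 0, 0]
```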

For those interested, these are the numbers I got, and how I tested
with other distros. I'm using QEMU e1d322c405 with a vanilla
configure on an x86_64 Fedora 32 host.

Fedora 31 (baseline):
=====================

  $ make check-venv
  $ ./tests/venv/bin/avocado run -t arch:aarch64,accel:tcg --keep-tmp on -- tests/acceptance/boot_linux.py{,,,,}
  JOB ID     : 14802f9d5016a44d2937ed7b1fec63b2eaa06e89
  JOB LOG    : /home/cleber/avocado/job-results/job-2020-08-06T13.19-14802f9/job.log
   (1/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (153.12 s)
   (2/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (149.57 s)
   (3/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (154.45 s)
   (4/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (148.97 s)
   (5/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (150.70 s)
  RESULTS    : PASS 5 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
  JOB TIME   : 757.50 s

Fedora 32:
==========

1. Tweak version and image hash:

---
diff --git a/tests/acceptance/boot_linux.py b/tests/acceptance/boot_linux.py
index 0055dc7cee..44c62bd4a2 100644
--- a/tests/acceptance/boot_linux.py
+++ b/tests/acceptance/boot_linux.py
@@ -48,7 +48,7 @@ class BootLinuxBase(Test):
             image_arch = 'ppc64le'
         try:
             boot = vmimage.get(
-                'fedora', arch=image_arch, version='31',
+                'fedora', arch=image_arch, version='32',
                 checksum=self.chksum,
                 algorithm='sha256',
                 cache_dir=self.cache_dirs[0],
@@ -160,7 +160,7 @@ class BootLinuxAarch64(BootLinux):
     :avocado: tags=machine:gic-version=2
     """
 
-    chksum = '1e18d9c0cf734940c4b5d5ec592facaed2af0ad0329383d5639c997fdf16fe49'
+    chksum = 'b367755c664a2d7a26955bbfff985855adfa2ca15e908baf15b4b176d68d3967'
 
     def add_common_args(self):
         self.vm.add_args('-bios',
---

2. Download the image before the test:

  $ ./tests/venv/bin/avocado vmimage get --distro=fedora --arch aarch64 --distro-version=32
  The image was downloaded:
  Provider Version Architecture File
  fedora   32      aarch64      /tmp/data/cache/by_location/7049001631a4b2eabf5766cc110e66d486e09821/Fedora-Cloud-Base-32-1.6.aarch64.qcow2

3. Run the tests:

  $ ./tests/venv/bin/avocado run -t arch:aarch64,accel:tcg --keep-tmp on -- tests/acceptance/boot_linux.py{,,,,}
  JOB ID     : 09e740a41dc400f9fcbb9253f613734597fe0efc
  JOB LOG    : /home/cleber/avocado/job-results/job-2020-08-06T13.53-09e740a/job.log
   (1/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (162.06 s)
   (2/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (167.78 s)
   (3/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (166.98 s)
   (4/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (171.13 s)
   (5/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (167.43 s)
  RESULTS    : PASS 5 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
  JOB TIME   : 836.05 s

Ubuntu 20.04:
=============

1. Tweak version and image hash:

---
diff --git a/tests/acceptance/boot_linux.py b/tests/acceptance/boot_linux.py
index 0055dc7cee..03c0e1bee9 100644
--- a/tests/acceptance/boot_linux.py
+++ b/tests/acceptance/boot_linux.py
@@ -48,7 +48,7 @@ class BootLinuxBase(Test):
             image_arch = 'ppc64le'
         try:
             boot = vmimage.get(
-                'fedora', arch=image_arch, version='31',
+                'ubuntu', arch=image_arch, version='20.04',
                 checksum=self.chksum,
                 algorithm='sha256',
                 cache_dir=self.cache_dirs[0],
@@ -160,7 +160,7 @@ class BootLinuxAarch64(BootLinux):
     :avocado: tags=machine:gic-version=2
     """
 
-    chksum = '1e18d9c0cf734940c4b5d5ec592facaed2af0ad0329383d5639c997fdf16fe49'
+    chksum = '1d9e50f3381145835b11911adf611f455d674a570814086b7d6581ecc0718770'
 
     def add_common_args(self):
         self.vm.add_args('-bios',
---

2. Download the image before the test:

  $ ./tests/venv/bin/avocado vmimage get --distro=ubuntu --arch aarch64 --distro-version=20.04
  The image was downloaded:
  Provider Version Architecture File
  ubuntu   20.04   arm64        /tmp/data/cache/by_location/19db8c6d910a3f2660c4109ffb85d73d43e5cdf2/ubuntu-20.04-server-cloudimg-arm64.img

3. Run the tests:

  $ ./tests/venv/bin/avocado run -t arch:aarch64,accel:tcg --keep-tmp on -- tests/acceptance/boot_linux.py{,,,,}
  JOB ID     : 92a1bdbb5e933e6dff8b882808a191f1de3c2600
  JOB LOG    : /home/cleber/avocado/job-results/job-2020-08-06T12.13-92a1bdb/job.log
   (1/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (341.40 s)
   (2/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (345.82 s)
   (3/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (335.91 s)
   (4/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (320.32 s)
   (5/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (319.79 s)
  RESULTS    : PASS 5 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
  JOB TIME   : 1663.92 s

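The per-run times reported above can be condensed into a mean per image and a ratio against the Fedora 31 baseline. The numbers below are copied from those runs; the script itself is only an illustrative sketch.

```python
from statistics import mean

# Per-run test times in seconds, copied from the avocado output above.
RUNS = {
    "Fedora 31 (baseline)": [153.12, 149.57, 154.45, 148.97, 150.70],
    "Fedora 32": [162.06, 167.78, 166.98, 171.13, 167.43],
    "Ubuntu 20.04": [341.40, 345.82, 335.91, 320.32, 319.79],
}


def summarize(runs, baseline="Fedora 31 (baseline)"):
    """Map each image to (mean seconds, mean relative to the baseline mean)."""
    base = mean(runs[baseline])
    return {name: (mean(times), mean(times) / base) for name, times in runs.items()}


for name, (avg, ratio) in summarize(RUNS).items():
    print(f"{name:22} {avg:7.2f} s  {ratio:.2f}x")
```

The means work out to roughly 151 s for Fedora 31, 167 s (about 1.10x) for Fedora 32 and 333 s (about 2.20x) for Ubuntu 20.04. Neither alternative image beats the baseline.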