[RFC,v1,0/7] Bootstage reports for CI

Message ID 20250411153040.1772000-1-jerome.forissier@linaro.org

Message

Jerome Forissier April 11, 2025, 3:29 p.m. UTC
The BOOTSTAGE Kconfig symbol allows recording boot time information
which can be consumed in several ways:

1) Printed to the console just before the OS is booted (when
   BOOTSTAGE_REPORT=y)
2) Printed to the console by the "bootstage report" command (when
   CMD_BOOTSTAGE=y)
3) Passed to the OS in the Device Tree (when BOOTSTAGE_FDT=y)
4) Written to some memory location in binary format before the OS is
   booted (when BOOTSTAGE_STASH=y)

None of these options are convenient for use in CI. Suppose we want to
monitor a set of boards for boot time regressions -- in other words,
make sure the boot time does not degrade unexpectedly as the code
evolves. For that, we'd like to be able to record the bootstage data in
some kind of database or persistent storage and possibly draw graphs
showing trends over time.

This RFC is a step in that direction. It introduces two new output
formats for the bootstage data. The two are independent; they are simply
two options I considered:

1) JSON
2) InfluxDB v2 line protocol [1]
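
For readers unfamiliar with the two formats, here is a minimal Python
sketch that renders one record both ways. It only illustrates the
syntax: the measurement name ("bootstage"), the tag and field names and
the values are hypothetical and are not taken from the series, whose
actual output may be laid out differently. The sketch also assumes that
all field values are integers.

```python
import json


def _escape(value, chars):
    """Backslash-escape the characters that are special in line protocol."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Render one point as: measurement,tag=v field=v [timestamp]."""
    head = _escape(measurement, ", ")
    for key, val in sorted(tags.items()):
        head += "," + _escape(key, ",= ") + "=" + _escape(str(val), ",= ")
    body = ",".join(
        # Integer field values carry an "i" suffix in line protocol.
        _escape(key, ",= ") + "=" + str(val) + "i"
        for key, val in fields.items()
    )
    line = head + " " + body
    if timestamp_ns is not None:
        line += " " + str(timestamp_ns)
    return line


# Hypothetical record: names and values are invented for illustration.
record = {"board": "sandbox", "stage": "board_init_f", "time_us": 1234}

as_json = json.dumps(record, sort_keys=True)
as_influx = to_line_protocol(
    "bootstage",
    {"board": record["board"], "stage": record["stage"]},
    {"time_us": record["time_us"]},
)
```

JSON is easy to post-process with generic tools, while line protocol can
be sent to an InfluxDB v2 write endpoint without further conversion.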

Both depend on BOOTSTAGE_REPORT and are enabled by BOOTSTAGE_REPORT_JSON
and BOOTSTAGE_REPORT_INFLUXDB respectively. Each format comes with its
own test in test/py/tests. The InfluxDB test is special in that it is
able to upload the data to a cloud database, provided the environment
variables BOOTSTAGE_INFLUXDB_URI and BOOTSTAGE_INFLUXDB_TOKEN are set
properly.
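
As a rough illustration of how a test could gate the upload on those two
variables, here is a sketch using only the Python standard library. It
is not the code from the series, which may use a client library instead.
It assumes that BOOTSTAGE_INFLUXDB_URI holds the complete write endpoint
URL (including any org and bucket query parameters); the helper name
build_upload_request is hypothetical.

```python
import os
import urllib.request


def build_upload_request(lines, env=os.environ):
    """Return a POST request carrying line-protocol data, or None.

    None is returned when either environment variable is missing, so
    the caller can skip the upload instead of failing.
    """
    uri = env.get("BOOTSTAGE_INFLUXDB_URI")
    token = env.get("BOOTSTAGE_INFLUXDB_TOKEN")
    if not uri or not token:
        return None
    return urllib.request.Request(
        uri,  # assumed to be the full write endpoint URL
        data="\n".join(lines).encode("utf-8"),
        headers={
            # InfluxDB v2 uses the "Token" authorization scheme.
            "Authorization": "Token " + token,
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen() would perform
the upload; when the function returns None the test can simply skip that
step.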

I have been able to run a boot test on rpi4 in the sjg-lab with
BOOTSTAGE_REPORT_INFLUXDB enabled. The CI log [2] shows that the data
were indeed uploaded to my InfluxDB Cloud test account.

This is published as an RFC since it is just an investigation. If
someone finds this useful I may follow up with a non-RFC series.

Comments are welcome.

[1] https://docs.influxdata.com/influxdb/v2/reference/syntax/line-protocol/
[2] https://source.denx.de/u-boot/custodians/u-boot-net/-/jobs/1097039#L116


Jerome Forissier (7):
  efi_loader: make efi_exit_boot_services() call bootstage_report()
  bootstage: add support for reporting in JSON format
  test/py/tests/test_bootstage.py: add test for JSON report
  bootstage: add support for reporting in InfluxDB v2 line format
  test/py/tests/test_bootstage.py: add test for InfluxDB report
  sandbox64_defconfig: enable bootstage report in JSON and InfluxDB
    formats
  test/py/tests/test_bootstage.py: upload bootstage data to InfluxDB
    cloud

 boot/Kconfig                    |  17 ++
 common/bootstage.c              | 289 +++++++++++++++++++++++++++++++-
 configs/sandbox64_defconfig     |   2 +
 lib/efi_loader/efi_boottime.c   |   7 +
 test/py/requirements.txt        |   1 +
 test/py/tests/test_bootstage.py |  57 +++++++
 6 files changed, 372 insertions(+), 1 deletion(-)

Comments

Tom Rini April 11, 2025, 5:37 p.m. UTC | #1
On Fri, Apr 11, 2025 at 05:29:26PM +0200, Jerome Forissier wrote:

> The BOOTSTAGE Kconfig symbol allows recording boot time information
> which can be consumed in several ways:
> 
> 1) Printed to the console just before the OS is booted (when
>    BOOTSTAGE_REPORT=y)
> 2) Printed to the console by the "bootstage report" command (when
>    CMD_BOOTSTAGE=y)
> 3) Passed to the OS in the Device Tree (when BOOTSTAGE_FDT=y)
> 4) Written to some memory location in binary format before the OS is
>    booted (when BOOTSTAGE_STASH=y)
> 
> None of these options are convenient for use in CI. Suppose we want to
> monitor a set of boards for boot time regressions -- in other words,
> make sure the boot time does not degrade unexpectedly as the code
> evolves. For that, we'd like to be able to record the bootstage data in
> some kind of database or persistent storage and possibly draw graphs
> showing trends over time.
> 
> This RFC is a step in that direction. It introduces two new output
> formats for the bootstage data. The two are independent; they are simply
> two options I considered:
> 
> 1) JSON
> 2) InfluxDB v2 line protocol [1]
> 
> Both depend on BOOTSTAGE_REPORT and are enabled by BOOTSTAGE_REPORT_JSON
> and BOOTSTAGE_REPORT_INFLUXDB respectively. Each format comes with its
> own test in test/py/tests. The InfluxDB test is special in that it is
> able to upload the data to a cloud database, provided the environment
> variables BOOTSTAGE_INFLUXDB_URI and BOOTSTAGE_INFLUXDB_TOKEN are set
> properly.
> 
> I have been able to run a boot test on rpi4 in the sjg-lab with
> BOOTSTAGE_REPORT_INFLUXDB enabled. The CI log [2] shows that the data
> were indeed uploaded to my InfluxDB Cloud test account.
> 
> This is published as an RFC since it is just an investigation. If
> someone finds this useful I may follow up with a non-RFC series.
> 
> Comments are welcome.
> 
> [1] https://docs.influxdata.com/influxdb/v2/reference/syntax/line-protocol/
> [2] https://source.denx.de/u-boot/custodians/u-boot-net/-/jobs/1097039#L116

This is very interesting. One thing I wonder about wrt reporting is
whether what is already in the JUnit XML report is enough, or whether we
need more granularity. Something like the following might work to get
those reports saved for Simon's lab like they are for the emulated
targets:

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index e54bdd6c4bec..8ec5dfbb2528 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -558,6 +558,7 @@ coreboot test.py:
     - export USE_LABGRID_SJG=1
     # export verbose="-v"
     - ${SRC}/test/py/test.py --role ${ROLE} --build-dir "${OUT}"
+        --junitxml="${OUT}"/results.xml
         --capture=tee-sys -k "not bootstd ${TEST_PY_TEST_SPEC}" || ret=$?
     - U_BOOT_BOARD_IDENTITY="${ROLE}" u-boot-test-release || true
     - if [[ $ret -ne 0 ]]; then
@@ -568,7 +569,10 @@ coreboot test.py:
     paths:
       - "build/${BOARD}/test-log.html"
       - "build/${BOARD}/multiplexed_log.css"
+      - "build/${BOARD}/results.xml"
     expire_in: 1 week
+    reports:
+      junit: build/${BOARD}/results.xml
 
 rpi3:
   variables: