[RFC,0/2] Coroutines

Message ID cover.1737380886.git.jerome.forissier@linaro.org

Jerome Forissier Jan. 20, 2025, 1:50 p.m. UTC
This series introduces a simple coroutines framework and uses it to
speed up the efi_init_obj_list() function. It is just an example to
trigger discussions; hopefully other places in U-Boot can benefit from
a similar treatment. Suggestions are welcome.

I came up with this idea after analyzing some profiling data (ftrace)
taken during the execution of the "printenv -e" command on a Kria KV260
board. When the command is first invoked, efi_init_obj_list() is called,
which takes a significant amount of time (2.872 seconds). This time can
be split into 0.570 s for efi_disks_register(), 0.811 s for
efi_tcg2_register() and 1.418 s for efi_tcg2_do_initial_measurement().
All the other child functions are much quicker. Another interesting
observation is that a large part of the time is actually spent waiting
in __udelay(). More precisely:
- For efi_disks_register(): 421 ms / 570 ms = 73.8 % spent in __udelay()
- For efi_tcg2_register(): 805 ms / 811 ms = 99.1 % spent in __udelay()
- For efi_tcg2_do_initial_measurement(): 1395.025 ms / 1418.372 ms =
  98.3 % spent in __udelay()

Given the above data, it was reasonable to think that a nice speedup
could be obtained if these functions could somehow be run in parallel.
efi_tcg2_do_initial_measurement() unfortunately needs to be excluded
because it depends on the first two. But efi_disks_register() and
efi_tcg2_register() are clearly independent and are therefore good
candidates for concurrent execution.
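
As a rough back-of-the-envelope check of the possible gain, the sketch
below assumes that time spent waiting in __udelay() can be fully
overlapped with the other function, that the remaining CPU time cannot
(single core), and that switching between the two costs nothing. The
struct and function names are hypothetical, and the inputs are the
rounded millisecond figures quoted above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type: total duration and time spent waiting, in ms */
struct phase {
	unsigned int total_ms;
	unsigned int wait_ms;
};

/* Duration when the two phases run back to back */
static unsigned int sequential_ms(struct phase a, struct phase b)
{
	return a.total_ms + b.total_ms;
}

/*
 * Idealized lower bound on one core: waits overlap, CPU work does not.
 * The result cannot be shorter than the longest phase, nor shorter than
 * the sum of the non-waiting (busy) time of both phases.
 */
static unsigned int overlapped_ms(struct phase a, struct phase b)
{
	unsigned int busy, longest;

	assert(a.wait_ms <= a.total_ms && b.wait_ms <= b.total_ms);
	busy = (a.total_ms - a.wait_ms) + (b.total_ms - b.wait_ms);
	longest = a.total_ms > b.total_ms ? a.total_ms : b.total_ms;
	return busy > longest ? busy : longest;
}

/* Print the estimate for the two candidate functions */
static void print_estimate(void)
{
	struct phase disks = { 570, 421 };	/* efi_disks_register() */
	struct phase tcg2 = { 811, 805 };	/* efi_tcg2_register() */
	unsigned int seq = sequential_ms(disks, tcg2);
	unsigned int ovl = overlapped_ms(disks, tcg2);

	printf("sequential: %u ms, overlapped: >= %u ms, saving: <= %u ms\n",
	       seq, ovl, seq - ovl);
}
```

Under these assumptions the two functions take 1381 ms back to back and
at least 811 ms when overlapped, so the best case saves about 570 ms of
the 2.872 s measured for efi_init_obj_list().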

So, I considered two options:
- Spin up a secondary core to take care of one function while the other
  one runs on the main core
- Introduce some kind of software scheduling

I quickly ruled out the first one for several reasons: initializing a
secondary core is typically quite hardware-specific, it would not scale
well if more functions than available cores needed to run in parallel,
it would make debugging harder, etc.
Software scheduling, however, can be accomplished quite easily,
especially since we don't need to consider preemptive multitasking.
Coroutines [1], for example, are well suited to the job. They provide a
way to save and restore the execution context (registers and stack).
Here is what it looks like:

static void do_some_work(int v)
{
	int i;
	
	for (i = 0; i < 5; i++) {
		printf("%d", v);
		/* Save context and resume main thread */
		co_yield();
	}
}

static void co1(void *arg)
{
	do_some_work(1);
	/* Mark coroutine as "done" and resume main thread */
	co_exit();
}

static void co2(void *arg)
{
	do_some_work(2);
	co_exit();
}

void main_thread(void)
{
	struct co *c1, *c2;

	c1 = co_create(co1, ...);
	c2 = co_create(co2, ...);

	do {
		printf("A");
		if (!c1->done) {
			/* Invoke or resume first coroutine */
			co_resume(c1);
		}
		printf("B");
		if (!c2->done) {
			/* Invoke or resume second coroutine */
			co_resume(c2);
		}
	} while (!(c1->done && c2->done));

	/* At this point, co1 and co2 have both called co_exit() */
}

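The above example would print: A1B2A1B2A1B2A1B2A1B2AB. The trailing AB
comes from the last pass of the loop, in which each coroutine returns
from its fifth co_yield() and calls co_exit().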
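
To experiment with the example outside U-Boot, the sketch below emulates
the four calls on a POSIX host with <ucontext.h>. It is not the
framework from this series: the layout of struct co, the simplified
co_create() signature (no argument for the entry function), emit() and
run_demo() are all assumptions made for illustration, and output is
collected in a buffer instead of being printed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <ucontext.h>

/* Hypothetical stand-in for the series' struct co (layout is assumed) */
struct co {
	ucontext_t ctx;
	char stack[64 * 1024];
	int done;
};

static ucontext_t main_ctx;	/* saved context of the main thread */
static struct co *current;	/* coroutine that is currently running */
static char out[64];		/* collects what the example would print */
static size_t out_len;

static void emit(char c)
{
	assert(out_len < sizeof(out) - 1);
	out[out_len++] = c;
}

/* Save the coroutine context and resume the main thread */
static void co_yield(void)
{
	swapcontext(&current->ctx, &main_ctx);
}

/* Mark the coroutine as done and resume the main thread for good */
static void co_exit(void)
{
	current->done = 1;
	swapcontext(&current->ctx, &main_ctx);
}

/* Start or continue a coroutine until it yields or exits */
static void co_resume(struct co *co)
{
	current = co;
	swapcontext(&main_ctx, &co->ctx);
	current = NULL;
}

/* Simplified signature: no argument is passed to the entry function */
static struct co *co_create(void (*fn)(void))
{
	struct co *co = calloc(1, sizeof(*co));
	int rc;

	assert(co);
	rc = getcontext(&co->ctx);
	assert(rc == 0);
	co->ctx.uc_stack.ss_sp = co->stack;
	co->ctx.uc_stack.ss_size = sizeof(co->stack);
	co->ctx.uc_link = &main_ctx;
	makecontext(&co->ctx, fn, 0);
	return co;
}

static void do_some_work(int v)
{
	for (int i = 0; i < 5; i++) {
		emit('0' + v);
		co_yield();
	}
}

static void co1(void) { do_some_work(1); co_exit(); }
static void co2(void) { do_some_work(2); co_exit(); }

/* Same loop as main_thread() above; returns the collected output */
static const char *run_demo(void)
{
	struct co *c1, *c2;

	memset(out, 0, sizeof(out));
	out_len = 0;
	c1 = co_create(co1);
	c2 = co_create(co2);
	do {
		emit('A');
		if (!c1->done)
			co_resume(c1);
		emit('B');
		if (!c2->done)
			co_resume(c2);
	} while (!(c1->done && c2->done));
	free(c1);
	free(c2);
	return out;
}
```

Calling run_demo() returns the string "A1B2A1B2A1B2A1B2A1B2AB". Each
coroutine needs a sixth co_resume() to get from its last co_yield() to
co_exit(), which is where the final AB comes from.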

- The first commit introduces the coroutine framework, loosely based on
  libaco [2]. The code was simplified and reformatted to better suit
  U-Boot.
- The second commit modifies efi_init_obj_list() in order to turn
  efi_disks_register() and efi_tcg2_register() into coroutines when
  COROUTINES is enabled.
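
One way the waiting time of two coroutines can overlap is for the delay
routine to yield instead of spinning. The toy model below illustrates
that idea on a POSIX host with <ucontext.h> and a simulated clock. It is
an assumption about the mechanism, not a description of the patches:
sim_delay_ms(), the fake_*() functions and the scheduler loop are
hypothetical, and only the rounded waiting times (421 ms and 805 ms) are
modeled, not the CPU work.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

/* Hypothetical miniature coroutine: context, private stack, done flag */
struct co {
	ucontext_t ctx;
	char stack[64 * 1024];
	int done;
};

static ucontext_t sched_ctx;	/* scheduler (main thread) context */
static struct co *current;	/* NULL when not inside a coroutine */
static unsigned long now_ms;	/* simulated clock */

/* Simulated delay: yields to the scheduler when called from a coroutine */
static void sim_delay_ms(unsigned long ms)
{
	unsigned long deadline = now_ms + ms;

	if (!current) {
		now_ms = deadline;	/* plain busy-wait: nothing else runs */
		return;
	}
	while (now_ms < deadline)
		swapcontext(&current->ctx, &sched_ctx);
}

/* Stand-ins that only model the time spent waiting */
static void fake_disks_register(void) { sim_delay_ms(421); }
static void fake_tcg2_register(void) { sim_delay_ms(805); }

/* Returning from an entry function switches back through uc_link */
static void disks_co(void) { fake_disks_register(); current->done = 1; }
static void tcg2_co(void) { fake_tcg2_register(); current->done = 1; }

static void start(struct co *co, void (*fn)(void))
{
	int rc = getcontext(&co->ctx);

	assert(rc == 0);
	co->done = 0;
	co->ctx.uc_stack.ss_sp = co->stack;
	co->ctx.uc_stack.ss_size = sizeof(co->stack);
	co->ctx.uc_link = &sched_ctx;
	makecontext(&co->ctx, fn, 0);
}

static void resume(struct co *co)
{
	current = co;
	swapcontext(&sched_ctx, &co->ctx);
	current = NULL;
}

/* Both functions called back to back, as without coroutines */
static unsigned long run_sequential(void)
{
	now_ms = 0;
	fake_disks_register();
	fake_tcg2_register();
	return now_ms;
}

/* Both functions run as coroutines; the clock ticks once per pass */
static unsigned long run_concurrent(void)
{
	static struct co a, b;

	now_ms = 0;
	start(&a, disks_co);
	start(&b, tcg2_co);
	for (;;) {
		if (!a.done)
			resume(&a);
		if (!b.done)
			resume(&b);
		if (a.done && b.done)
			break;
		now_ms++;	/* one simulated millisecond passes */
	}
	return now_ms;
}
```

In this model run_sequential() returns 1226 and run_concurrent()
returns 805: the simulated elapsed time drops from the sum of the two
waits to the longer of the two.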

[1] https://en.wikipedia.org/wiki/Coroutine
[2] https://github.com/hnes/libaco/

Jerome Forissier (2):
  Introduce coroutines framework
  efi_loader: optimize efi_init_obj_list() with coroutines

 arch/arm/cpu/armv8/Makefile     |   1 +
 arch/arm/cpu/armv8/co_switch.S  |  35 +++++++
 arch/x86/cpu/i386/Makefile      |   1 +
 arch/x86/cpu/i386/co_switch.S   |  26 +++++
 arch/x86/cpu/x86_64/Makefile    |   2 +
 arch/x86/cpu/x86_64/co_switch.S |  26 +++++
 include/coroutines.h            | 151 +++++++++++++++++++++++++++
 lib/Kconfig                     |  10 ++
 lib/Makefile                    |   2 +
 lib/coroutines.c                | 176 ++++++++++++++++++++++++++++++++
 lib/efi_loader/efi_setup.c      | 113 +++++++++++++++++++-
 lib/time.c                      |  14 ++-
 12 files changed, 552 insertions(+), 5 deletions(-)
 create mode 100644 arch/arm/cpu/armv8/co_switch.S
 create mode 100644 arch/x86/cpu/i386/co_switch.S
 create mode 100644 arch/x86/cpu/x86_64/co_switch.S
 create mode 100644 include/coroutines.h
 create mode 100644 lib/coroutines.c