diff mbox series

[v2] tests/functional/aarch64: add tests for FEAT_RME

Message ID 20241128213729.1021961-1-pierrick.bouvier@linaro.org
State New
Headers show
Series [v2] tests/functional/aarch64: add tests for FEAT_RME | expand

Commit Message

Pierrick Bouvier Nov. 28, 2024, 9:37 p.m. UTC
This boot an OP-TEE environment, and launch a nested guest VM inside it
using the Realms feature. We do it for virt and sbsa-ref platforms.

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>

-----

v2:
- move test to its own file
- add sbsa test
- check output of `cca-workload-attestation report`

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 tests/functional/meson.build                 |  4 +
 tests/functional/test_aarch64_rme_sbsaref.py | 96 +++++++++++++++++++
 tests/functional/test_aarch64_rme_virt.py    | 98 ++++++++++++++++++++
 3 files changed, 198 insertions(+)
 create mode 100755 tests/functional/test_aarch64_rme_sbsaref.py
 create mode 100755 tests/functional/test_aarch64_rme_virt.py

Comments

Marcin Juszkiewicz Dec. 1, 2024, 1:34 p.m. UTC | #1
W dniu 28.11.2024 o 22:37, Pierrick Bouvier pisze:
> This boot an OP-TEE environment, and launch a nested guest VM inside it
> using the Realms feature. We do it for virt and sbsa-ref platforms.
> 
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>

> diff --git a/tests/functional/meson.build b/tests/functional/meson.build
> index 5c048cfac6d..b975a1560df 100644
> --- a/tests/functional/meson.build
> +++ b/tests/functional/meson.build
> @@ -13,6 +13,8 @@ endif
>   test_timeouts = {
>     'aarch64_aspeed' : 600,
>     'aarch64_raspi4' : 480,

> +  'aarch64_rme_virt' : 720,

Took 2974.95s on M1 Pro macbook.

> +  'aarch64_rme_sbsaref' : 720,

This one needed 2288.29s.

>     'aarch64_sbsaref_alpine' : 720,

Have to check cause timed out.

>     'aarch64_sbsaref_freebsd' : 720,

331.65s

So RME tests probably need longer timeouts or would not run at all.


> +++ b/tests/functional/test_aarch64_rme_sbsaref.py

> +        self.vm.add_args('-accel', 'tcg')

That's default value so can be skipped.

> +        self.vm.add_args('-cpu', 'max,x-rme=on')

> +        self.vm.add_args('-m', '2G')

I sent patch to bump default memsize to 2G recently.

> +        self.vm.add_args('-M', 'sbsa-ref')
> +        self.vm.add_args('-drive', f'file={pflash0},format=raw,if=pflash')
> +        self.vm.add_args('-drive', f'file={pflash1},format=raw,if=pflash')
> +        self.vm.add_args('-drive', f'file=fat:rw:{virtual},format=raw')

> +        self.vm.add_args('-drive', f'format=raw,if=none,file={drive},id=hd0')
> +        self.vm.add_args('-device', 'virtio-blk-pci,drive=hd0')

sbsa-ref is fully emulated target. There is AHCI controller built-in so 
only "-drive" argument should be needed (no "-device" one).

> +        self.vm.add_args('-device', 'virtio-9p-pci,fsdev=shr0,mount_tag=shr0')
> +        self.vm.add_args('-fsdev', f'local,security_model=none,path={rme_stack},id=shr0')

> +        self.vm.add_args('-device', 'virtio-net-pci,netdev=net0')
> +        self.vm.add_args('-netdev', 'user,id=net0')

e1000e is built-in already


As both virt and sbsa-ref tests do "more or less" the same stuff then it 
would be good to make common file/class and reuse it both tests by 
adding hardware differences.
Pierrick Bouvier Dec. 1, 2024, 6:09 p.m. UTC | #2
Hi Marcin,

On 12/1/24 05:34, Marcin Juszkiewicz wrote:
> W dniu 28.11.2024 o 22:37, Pierrick Bouvier pisze:
>> This boot an OP-TEE environment, and launch a nested guest VM inside it
>> using the Realms feature. We do it for virt and sbsa-ref platforms.
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> 
>> diff --git a/tests/functional/meson.build b/tests/functional/meson.build
>> index 5c048cfac6d..b975a1560df 100644
>> --- a/tests/functional/meson.build
>> +++ b/tests/functional/meson.build
>> @@ -13,6 +13,8 @@ endif
>>    test_timeouts = {
>>      'aarch64_aspeed' : 600,
>>      'aarch64_raspi4' : 480,
> 
>> +  'aarch64_rme_virt' : 720,
> 
> Took 2974.95s on M1 Pro macbook.
> 
>> +  'aarch64_rme_sbsaref' : 720,
> 
> This one needed 2288.29s.
> 
>>      'aarch64_sbsaref_alpine' : 720,
> 
> Have to check cause timed out.
> 
>>      'aarch64_sbsaref_freebsd' : 720,
> 
> 331.65s
> 
> So RME tests probably need longer timeouts or would not run at all.
> 

By any chance, are you running those tests in debug mode?
It seems to me that CI is running functional tests with optimized 
builds, so I'm not sure we want to support debug "times" here.

> 
>> +++ b/tests/functional/test_aarch64_rme_sbsaref.py
> 
>> +        self.vm.add_args('-accel', 'tcg')
> 
> That's default value so can be skipped.
> 

Ok.

>> +        self.vm.add_args('-cpu', 'max,x-rme=on')
> 
>> +        self.vm.add_args('-m', '2G')
> 
> I sent patch to bump default memsize to 2G recently.
> 

Is that merged already, or will be later?

>> +        self.vm.add_args('-M', 'sbsa-ref')
>> +        self.vm.add_args('-drive', f'file={pflash0},format=raw,if=pflash')
>> +        self.vm.add_args('-drive', f'file={pflash1},format=raw,if=pflash')
>> +        self.vm.add_args('-drive', f'file=fat:rw:{virtual},format=raw')
> 
>> +        self.vm.add_args('-drive', f'format=raw,if=none,file={drive},id=hd0')
>> +        self.vm.add_args('-device', 'virtio-blk-pci,drive=hd0')
> 
> sbsa-ref is fully emulated target. There is AHCI controller built-in so
> only "-drive" argument should be needed (no "-device" one).
> 

I followed official instructions from Jean Philippe to build RME stack, 
and I think it's better to keep them in sync.

>> +        self.vm.add_args('-device', 'virtio-9p-pci,fsdev=shr0,mount_tag=shr0')
>> +        self.vm.add_args('-fsdev', f'local,security_model=none,path={rme_stack},id=shr0')
> 
>> +        self.vm.add_args('-device', 'virtio-net-pci,netdev=net0')
>> +        self.vm.add_args('-netdev', 'user,id=net0')
> 
> e1000e is built-in already
> 

This is needed, because without this, there is an explicit wait for a 
network interface when booting. Adding this device allows to skip it.

> 
> As both virt and sbsa-ref tests do "more or less" the same stuff then it
> would be good to make common file/class and reuse it both tests by
> adding hardware differences.

I was thinking that at the beginning, but most of the config is 
different. The only common part is the nested guest test.

However, I didn't see any other tests that were importing functions from 
other files, and since we want to keep those two tests in separate files 
(to allow parallelism), the most pragmatic solution was to duplicate.
Overall, tests files should be as simple as possible, even if the price 
is to repeat a few lines.
If you have a cleaner solution, I'm open to implement it.

Thanks,
Pierrick
Peter Maydell Dec. 2, 2024, 10:57 a.m. UTC | #3
On Sun, 1 Dec 2024 at 18:09, Pierrick Bouvier
<pierrick.bouvier@linaro.org> wrote:
>
> Hi Marcin,
>
> On 12/1/24 05:34, Marcin Juszkiewicz wrote:
> > W dniu 28.11.2024 o 22:37, Pierrick Bouvier pisze:
> >> This boot an OP-TEE environment, and launch a nested guest VM inside it
> >> using the Realms feature. We do it for virt and sbsa-ref platforms.
> >>
> >> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> >
> >> diff --git a/tests/functional/meson.build b/tests/functional/meson.build
> >> index 5c048cfac6d..b975a1560df 100644
> >> --- a/tests/functional/meson.build
> >> +++ b/tests/functional/meson.build
> >> @@ -13,6 +13,8 @@ endif
> >>    test_timeouts = {
> >>      'aarch64_aspeed' : 600,
> >>      'aarch64_raspi4' : 480,
> >
> >> +  'aarch64_rme_virt' : 720,
> >
> > Took 2974.95s on M1 Pro macbook.
> >
> >> +  'aarch64_rme_sbsaref' : 720,
> >
> > This one needed 2288.29s.
> >
> >>      'aarch64_sbsaref_alpine' : 720,
> >
> > Have to check cause timed out.
> >
> >>      'aarch64_sbsaref_freebsd' : 720,
> >
> > 331.65s
> >
> > So RME tests probably need longer timeouts or would not run at all.
> >
>
> By any chance, are you running those tests in debug mode?
> It seems to me that CI is running functional tests with optimized
> builds, so I'm not sure we want to support debug "times" here.

We do need to support debug times, because a common developer
use case is "doing a debug build, run 'make check-functional'
to check whether anything is broken. The debug times also
are useful because the CI runners can have highly variable
performance -- if a test is slow enough to hit the timeout
for a debug build locally, it's probably going to also hit
the timeout at least sometimes in CI.

thanks
-- PMM
Marcin Juszkiewicz Dec. 2, 2024, 10:58 a.m. UTC | #4
W dniu 1.12.2024 o 19:09, Pierrick Bouvier pisze:
> Hi Marcin,
> 
> On 12/1/24 05:34, Marcin Juszkiewicz wrote:

>> So RME tests probably need longer timeouts or would not run at all.
>>
> 
> By any chance, are you running those tests in debug mode?

zOMG, thanks! Turned out that I left "--enable-debug" in my local build 
and forgot to remove it.

90s vs 2974s is a difference ;D

>>> +++ b/tests/functional/test_aarch64_rme_sbsaref.py
>>
>>> +        self.vm.add_args('-accel', 'tcg')
>>
>> That's default value so can be skipped.
>>> +        self.vm.add_args('-m', '2G')
>>
>> I sent patch to bump default memsize to 2G recently.
>>
> 
> Is that merged already, or will be later?

Still in review queue.

>>> +        self.vm.add_args('-M', 'sbsa-ref')
>>> +        self.vm.add_args('-drive', 
>>> f'file={pflash0},format=raw,if=pflash')
>>> +        self.vm.add_args('-drive', 
>>> f'file={pflash1},format=raw,if=pflash')
>>> +        self.vm.add_args('-drive', f'file=fat:rw:{virtual},format=raw')
>>
>>> +        self.vm.add_args('-drive', 
>>> f'format=raw,if=none,file={drive},id=hd0')
>>> +        self.vm.add_args('-device', 'virtio-blk-pci,drive=hd0')
>>
>> sbsa-ref is fully emulated target. There is AHCI controller built-in so
>> only "-drive" argument should be needed (no "-device" one).
>>
> 
> I followed official instructions from Jean Philippe to build RME stack, 
> and I think it's better to keep them in sync.

OK.

>>> +        self.vm.add_args('-device', 'virtio-net-pci,netdev=net0')
>>> +        self.vm.add_args('-netdev', 'user,id=net0')
>>
>> e1000e is built-in already
>>
> 
> This is needed, because without this, there is an explicit wait for a 
> network interface when booting. Adding this device allows to skip it.

Thanks.

>> As both virt and sbsa-ref tests do "more or less" the same stuff then it
>> would be good to make common file/class and reuse it both tests by
>> adding hardware differences.
> 
> I was thinking that at the beginning, but most of the config is 
> different. The only common part is the nested guest test.
> 
> However, I didn't see any other tests that were importing functions from 
> other files, and since we want to keep those two tests in separate files 
> (to allow parallelism), the most pragmatic solution was to duplicate.

Fetching firmware on sbsa-ref is done in separate to not duplicate it. 
Which gave me idea.

> Overall, tests files should be as simple as possible, even if the price 
> is to repeat a few lines.
> If you have a cleaner solution, I'm open to implement it.
Peter Maydell Dec. 2, 2024, 11:04 a.m. UTC | #5
On Mon, 2 Dec 2024 at 10:58, Marcin Juszkiewicz
<marcin.juszkiewicz@linaro.org> wrote:
>
> W dniu 1.12.2024 o 19:09, Pierrick Bouvier pisze:
> > Hi Marcin,
> >
> > On 12/1/24 05:34, Marcin Juszkiewicz wrote:
>
> >> So RME tests probably need longer timeouts or would not run at all.
> >>
> >
> > By any chance, are you running those tests in debug mode?
>
> zOMG, thanks! Turned out that I left "--enable-debug" in my local build
> and forgot to remove it.
>
> 90s vs 2974s is a difference ;D

That is a particularly awful debug-to-non-debug performance
difference; it might be worth looking into what exactly
is causing it.

-- PMM
Pierrick Bouvier Dec. 2, 2024, 5:31 p.m. UTC | #6
On 12/2/24 02:57, Peter Maydell wrote:
> On Sun, 1 Dec 2024 at 18:09, Pierrick Bouvier
> <pierrick.bouvier@linaro.org> wrote:
>>
>> Hi Marcin,
>>
>> On 12/1/24 05:34, Marcin Juszkiewicz wrote:
>>> W dniu 28.11.2024 o 22:37, Pierrick Bouvier pisze:
>>>> This boot an OP-TEE environment, and launch a nested guest VM inside it
>>>> using the Realms feature. We do it for virt and sbsa-ref platforms.
>>>>
>>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>>
>>>> diff --git a/tests/functional/meson.build b/tests/functional/meson.build
>>>> index 5c048cfac6d..b975a1560df 100644
>>>> --- a/tests/functional/meson.build
>>>> +++ b/tests/functional/meson.build
>>>> @@ -13,6 +13,8 @@ endif
>>>>     test_timeouts = {
>>>>       'aarch64_aspeed' : 600,
>>>>       'aarch64_raspi4' : 480,
>>>
>>>> +  'aarch64_rme_virt' : 720,
>>>
>>> Took 2974.95s on M1 Pro macbook.
>>>
>>>> +  'aarch64_rme_sbsaref' : 720,
>>>
>>> This one needed 2288.29s.
>>>
>>>>       'aarch64_sbsaref_alpine' : 720,
>>>
>>> Have to check cause timed out.
>>>
>>>>       'aarch64_sbsaref_freebsd' : 720,
>>>
>>> 331.65s
>>>
>>> So RME tests probably need longer timeouts or would not run at all.
>>>
>>
>> By any chance, are you running those tests in debug mode?
>> It seems to me that CI is running functional tests with optimized
>> builds, so I'm not sure we want to support debug "times" here.
> 
> We do need to support debug times, because a common developer
> use case is "doing a debug build, run 'make check-functional'
> to check whether anything is broken. The debug times also
> are useful because the CI runners can have highly variable
> performance -- if a test is slow enough to hit the timeout
> for a debug build locally, it's probably going to also hit
> the timeout at least sometimes in CI.
> 

I understand the scenario, but given how slow debug builds are, it would 
probably be faster to advise developer to recompile in release mode.
The overall time of compile + test is slower than waiting for debug.

Beyond using a debugger, what is the advantage to compile with -O0?

> thanks
> -- PMM
Pierrick Bouvier Dec. 2, 2024, 5:33 p.m. UTC | #7
On 12/2/24 03:04, Peter Maydell wrote:
> On Mon, 2 Dec 2024 at 10:58, Marcin Juszkiewicz
> <marcin.juszkiewicz@linaro.org> wrote:
>>
>> W dniu 1.12.2024 o 19:09, Pierrick Bouvier pisze:
>>> Hi Marcin,
>>>
>>> On 12/1/24 05:34, Marcin Juszkiewicz wrote:
>>
>>>> So RME tests probably need longer timeouts or would not run at all.
>>>>
>>>
>>> By any chance, are you running those tests in debug mode?
>>
>> zOMG, thanks! Turned out that I left "--enable-debug" in my local build
>> and forgot to remove it.
>>
>> 90s vs 2974s is a difference ;D
> 
> That is a particularly awful debug-to-non-debug performance
> difference; it might be worth looking into what exactly
> is causing it.
> 

Indeed.

This test boots a nested VM (using kvm, not tcg), do you any idea why 
such use case could be so slow with QEMU?

> -- PMM
Alex Bennée Dec. 2, 2024, 6:23 p.m. UTC | #8
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:

> On 12/2/24 02:57, Peter Maydell wrote:
>> On Sun, 1 Dec 2024 at 18:09, Pierrick Bouvier
>> <pierrick.bouvier@linaro.org> wrote:
>>>
>>> Hi Marcin,
>>>
>>> On 12/1/24 05:34, Marcin Juszkiewicz wrote:
>>>> W dniu 28.11.2024 o 22:37, Pierrick Bouvier pisze:
>>>>> This boot an OP-TEE environment, and launch a nested guest VM inside it
>>>>> using the Realms feature. We do it for virt and sbsa-ref platforms.
>>>>>
>>>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>>>
>>>>> diff --git a/tests/functional/meson.build b/tests/functional/meson.build
>>>>> index 5c048cfac6d..b975a1560df 100644
>>>>> --- a/tests/functional/meson.build
>>>>> +++ b/tests/functional/meson.build
>>>>> @@ -13,6 +13,8 @@ endif
>>>>>     test_timeouts = {
>>>>>       'aarch64_aspeed' : 600,
>>>>>       'aarch64_raspi4' : 480,
>>>>
>>>>> +  'aarch64_rme_virt' : 720,
>>>>
>>>> Took 2974.95s on M1 Pro macbook.
>>>>
>>>>> +  'aarch64_rme_sbsaref' : 720,
>>>>
>>>> This one needed 2288.29s.
>>>>
>>>>>       'aarch64_sbsaref_alpine' : 720,
>>>>
>>>> Have to check cause timed out.
>>>>
>>>>>       'aarch64_sbsaref_freebsd' : 720,
>>>>
>>>> 331.65s
>>>>
>>>> So RME tests probably need longer timeouts or would not run at all.
>>>>
>>>
>>> By any chance, are you running those tests in debug mode?
>>> It seems to me that CI is running functional tests with optimized
>>> builds, so I'm not sure we want to support debug "times" here.
>> We do need to support debug times, because a common developer
>> use case is "doing a debug build, run 'make check-functional'
>> to check whether anything is broken. The debug times also
>> are useful because the CI runners can have highly variable
>> performance -- if a test is slow enough to hit the timeout
>> for a debug build locally, it's probably going to also hit
>> the timeout at least sometimes in CI.
>> 
>
> I understand the scenario, but given how slow debug builds are, it
> would probably be faster to advise developer to recompile in release
> mode.
> The overall time of compile + test is slower than waiting for debug.
>
> Beyond using a debugger, what is the advantage to compile with -O0?

--enable-debug

  - enables -00 with -g3 for symbols
  - and enables additional checks to validate TCG

You can use --enable-debug-info for just debug info without the overhead.

>
>> thanks
>> -- PMM
Pierrick Bouvier Dec. 2, 2024, 6:36 p.m. UTC | #9
On 12/2/24 10:23, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
> 
>> On 12/2/24 02:57, Peter Maydell wrote:
>>> On Sun, 1 Dec 2024 at 18:09, Pierrick Bouvier
>>> <pierrick.bouvier@linaro.org> wrote:
>>>>
>>>> Hi Marcin,
>>>>
>>>> On 12/1/24 05:34, Marcin Juszkiewicz wrote:
>>>>> W dniu 28.11.2024 o 22:37, Pierrick Bouvier pisze:
>>>>>> This boot an OP-TEE environment, and launch a nested guest VM inside it
>>>>>> using the Realms feature. We do it for virt and sbsa-ref platforms.
>>>>>>
>>>>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>>>>
>>>>>> diff --git a/tests/functional/meson.build b/tests/functional/meson.build
>>>>>> index 5c048cfac6d..b975a1560df 100644
>>>>>> --- a/tests/functional/meson.build
>>>>>> +++ b/tests/functional/meson.build
>>>>>> @@ -13,6 +13,8 @@ endif
>>>>>>      test_timeouts = {
>>>>>>        'aarch64_aspeed' : 600,
>>>>>>        'aarch64_raspi4' : 480,
>>>>>
>>>>>> +  'aarch64_rme_virt' : 720,
>>>>>
>>>>> Took 2974.95s on M1 Pro macbook.
>>>>>
>>>>>> +  'aarch64_rme_sbsaref' : 720,
>>>>>
>>>>> This one needed 2288.29s.
>>>>>
>>>>>>        'aarch64_sbsaref_alpine' : 720,
>>>>>
>>>>> Have to check cause timed out.
>>>>>
>>>>>>        'aarch64_sbsaref_freebsd' : 720,
>>>>>
>>>>> 331.65s
>>>>>
>>>>> So RME tests probably need longer timeouts or would not run at all.
>>>>>
>>>>
>>>> By any chance, are you running those tests in debug mode?
>>>> It seems to me that CI is running functional tests with optimized
>>>> builds, so I'm not sure we want to support debug "times" here.
>>> We do need to support debug times, because a common developer
>>> use case is "doing a debug build, run 'make check-functional'
>>> to check whether anything is broken. The debug times also
>>> are useful because the CI runners can have highly variable
>>> performance -- if a test is slow enough to hit the timeout
>>> for a debug build locally, it's probably going to also hit
>>> the timeout at least sometimes in CI.
>>>
>>
>> I understand the scenario, but given how slow debug builds are, it
>> would probably be faster to advise developer to recompile in release
>> mode.
>> The overall time of compile + test is slower than waiting for debug.
>>
>> Beyond using a debugger, what is the advantage to compile with -O0?
> 
> --enable-debug
> 
>    - enables -00 with -g3 for symbols
>    - and enables additional checks to validate TCG
> 

I'm aware this is the combination of both, and indeed, runtime checks 
can be done on optimized builds as well (it's the combination I use 
personally).

Maybe our enable-debug should produced optimized builds by default, and 
we could have a new --enable-debug-unopt for the "I need to use a 
debugger" use case. Would save a lot of time for devs, and in CI where 
minutes are precious.

> You can use --enable-debug-info for just debug info without the overhead.
> 
>>
>>> thanks
>>> -- PMM
>
Peter Maydell Dec. 2, 2024, 7:56 p.m. UTC | #10
On Mon, 2 Dec 2024 at 18:36, Pierrick Bouvier
<pierrick.bouvier@linaro.org> wrote:
> Maybe our enable-debug should produced optimized builds by default, and
> we could have a new --enable-debug-unopt for the "I need to use a
> debugger" use case. Would save a lot of time for devs, and in CI where
> minutes are precious.

The whole point of --enable-debug is "I might need to use a debugger"
(or a sanitizer, or anything else where you care about debug info).
If you want the optimized builds, that's the default.

thanks
-- PMM
Pierrick Bouvier Dec. 2, 2024, 8:09 p.m. UTC | #11
On 12/2/24 11:56, Peter Maydell wrote:
> On Mon, 2 Dec 2024 at 18:36, Pierrick Bouvier
> <pierrick.bouvier@linaro.org> wrote:
>> Maybe our enable-debug should produced optimized builds by default, and
>> we could have a new --enable-debug-unopt for the "I need to use a
>> debugger" use case. Would save a lot of time for devs, and in CI where
>> minutes are precious.
> 
> The whole point of --enable-debug is "I might need to use a debugger"
> (or a sanitizer, or anything else where you care about debug info).
> If you want the optimized builds, that's the default.
> 

It seems like we associate debug info to "I might need to use a 
debugger". But, it's not the only use case.
Sanitizers for example, are usable with optimizations enabled as well 
(with some caveats, as some errors might be optimized out).

I don't have anything against what --enable-debug does, but supporting 
this for tests (and CI) because people *might* use it is a mistake IMHO.
It's an opinion beyond the current series use case.

It takes more time for devs and CI to run tests with debug build, 
compared to recompile and run test suites. And we hit timeouts and lose 
minutes because of that.

> thanks
> -- PMM
Pierrick Bouvier Dec. 2, 2024, 8:09 p.m. UTC | #12
On 12/2/24 11:56, Peter Maydell wrote:
> On Mon, 2 Dec 2024 at 18:36, Pierrick Bouvier
> <pierrick.bouvier@linaro.org> wrote:
>> Maybe our enable-debug should produced optimized builds by default, and
>> we could have a new --enable-debug-unopt for the "I need to use a
>> debugger" use case. Would save a lot of time for devs, and in CI where
>> minutes are precious.
> 
> The whole point of --enable-debug is "I might need to use a debugger"
> (or a sanitizer, or anything else where you care about debug info).
> If you want the optimized builds, that's the default.
> 

It seems like we associate debug info to "I might need to use a 
debugger". But, it's not the only use case.
Sanitizers for example, are usable with optimizations enabled as well 
(with some caveats, as some errors might be optimized out).

I don't have anything against what --enable-debug does, but supporting 
this for tests (and CI) because people *might* use it is a mistake IMHO.
It's an opinion beyond the current series use case.

It takes more time for devs and CI to run tests with debug build, 
compared to recompile and run test suites. And we hit timeouts and lose 
minutes because of that.

> thanks
> -- PMM
Peter Maydell Dec. 3, 2024, 10:29 a.m. UTC | #13
On Mon, 2 Dec 2024 at 20:09, Pierrick Bouvier
<pierrick.bouvier@linaro.org> wrote:
>
> On 12/2/24 11:56, Peter Maydell wrote:
> > On Mon, 2 Dec 2024 at 18:36, Pierrick Bouvier
> > <pierrick.bouvier@linaro.org> wrote:
> >> Maybe our enable-debug should produced optimized builds by default, and
> >> we could have a new --enable-debug-unopt for the "I need to use a
> >> debugger" use case. Would save a lot of time for devs, and in CI where
> >> minutes are precious.
> >
> > The whole point of --enable-debug is "I might need to use a debugger"
> > (or a sanitizer, or anything else where you care about debug info).
> > If you want the optimized builds, that's the default.
> >
>
> It seems like we associate debug info to "I might need to use a
> debugger". But, it's not the only use case.
> Sanitizers for example, are usable with optimizations enabled as well
> (with some caveats, as some errors might be optimized out).

Yes, it's the caveats that are the problem. If compilers
supported an optimization mode that guaranteed not to break
the debug illusion for backtracing, interactive debug, etc,
then we could use it. But the best they do is -O0, so that's
what we use. (-Og sounds like it ought to be what we want,
but unfortunately it still leaves you with "value optimized
out" errors in debuggers, so it's no good.)

> I don't have anything against what --enable-debug does, but supporting
> this for tests (and CI) because people *might* use it is a mistake IMHO.
> It's an opinion beyond the current series use case.

If your test is hitting the timeouts for --enable-debug on
a local dev machine then it has a high chance of hitting the
timeouts in CI in a non-debug build because of the "noisy
neighbour" problem where a CI job runs massively slower than
usual. The fix is to make sure the timeout is high enough not
to be hit in a debug build, and wherever possible to have tests
that are cut down so they have short runtimes.

thanks
-- PMM
Gustavo Romero Dec. 3, 2024, 2:55 p.m. UTC | #14
Hi Pierrick,

On 12/2/24 14:33, Pierrick Bouvier wrote:
> On 12/2/24 03:04, Peter Maydell wrote:
>> On Mon, 2 Dec 2024 at 10:58, Marcin Juszkiewicz
>> <marcin.juszkiewicz@linaro.org> wrote:
>>>
>>> W dniu 1.12.2024 o 19:09, Pierrick Bouvier pisze:
>>>> Hi Marcin,
>>>>
>>>> On 12/1/24 05:34, Marcin Juszkiewicz wrote:
>>>
>>>>> So RME tests probably need longer timeouts or would not run at all.
>>>>>
>>>>
>>>> By any chance, are you running those tests in debug mode?
>>>
>>> zOMG, thanks! Turned out that I left "--enable-debug" in my local build
>>> and forgot to remove it.
>>>
>>> 90s vs 2974s is a difference ;D
>>
>> That is a particularly awful debug-to-non-debug performance
>> difference; it might be worth looking into what exactly
>> is causing it.
>>
> 
> Indeed.
> 
> This test boots a nested VM (using kvm, not tcg), do you any idea why such use case could be so slow with QEMU?

It's not "really" using kvm acceleration right?

It's using kvm but the kvm module in the host kernel is actually running
fully emulated in QEMU system mode (-accel tcg is being passed to it).

So I understand it's "all TCG" in the end and no HW acceleration is ever used in the tests.


Cheers,
Gustavo
Alex Bennée Dec. 3, 2024, 2:56 p.m. UTC | #15
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:

> This boot an OP-TEE environment, and launch a nested guest VM inside it
> using the Realms feature. We do it for virt and sbsa-ref platforms.
>
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>
<snip>
> +
> +        self.vm.add_args('-accel', 'tcg')
> +        self.vm.add_args('-cpu', 'max,x-rme=on')

With debug on the PAC function are certainly very high in the perf
report. So pauth-impdef=on seems worthwhile here.

> +        self.vm.add_args('-m', '2G')
> +        self.vm.add_args('-M', 'sbsa-ref')
> +        self.vm.add_args('-drive', f'file={pflash0},format=raw,if=pflash')
> +        self.vm.add_args('-drive', f'file={pflash1},format=raw,if=pflash')
> +        self.vm.add_args('-drive', f'file=fat:rw:{virtual},format=raw')
> +        self.vm.add_args('-drive', f'format=raw,if=none,file={drive},id=hd0')
> +        self.vm.add_args('-device', 'virtio-blk-pci,drive=hd0')
> +        self.vm.add_args('-device', 'virtio-9p-pci,fsdev=shr0,mount_tag=shr0')
> +        self.vm.add_args('-fsdev', f'local,security_model=none,path={rme_stack},id=shr0')
> +        self.vm.add_args('-device', 'virtio-net-pci,netdev=net0')
> +        self.vm.add_args('-netdev', 'user,id=net0')
<snip>
> +
> +        self.vm.add_args('-accel', 'tcg')
> +        self.vm.add_args('-cpu', 'max,x-rme=on')

And here.

<snip>

With that the tests both pass with --enable-debug (312s, 352s) and the
profile looks like:

   6.33%  qemu-system-aar  qemu-system-aarch64                       [.] arm_feature
   5.66%  qemu-system-aar  qemu-system-aarch64                       [.] tcg_flush_jmp_cache
   3.44%  qemu-system-aar  qemu-system-aarch64                       [.] rebuild_hflags_a64

This I suspect is triggered by assert_hflags_rebuild_correctly() which
is validating we've not skipped rebuilding the flags when we need to.
It's a lot easier than debugging why your execution trace looks weird.

   2.95%  qemu-system-aar  qemu-system-aarch64                       [.] extract64
   2.52%  qemu-system-aar  qemu-system-aarch64                       [.] extract64

This is usually triggered by translation code which uses extract64
heavily during instruction decode.

It might be useful to see if we can get functional tests run under TCG
to dump "info jit" at the end and ensure we are not over generating code
and exhausting the translation cache.

   2.12%  qemu-system-aar  qemu-system-aarch64                       [.] arm_el_is_aa64
   2.11%  qemu-system-aar  qemu-system-aarch64                       [.] arm_security_space_below_el3
   2.11%  qemu-system-aar  qemu-system-aarch64                       [.] deposit64
   1.49%  qemu-system-aar  qemu-system-aarch64                       [.] arm_hcr_el2_eff_secstate
   1.46%  qemu-system-aar  qemu-system-aarch64                       [.] arm_is_el2_enabled_secstate
   1.38%  qemu-system-aar  qemu-system-aarch64                       [.] extract32
   1.34%  qemu-system-aar  qemu-system-aarch64                       [.] extract64
   1.30%  qemu-system-aar  qemu-system-aarch64                       [.] get_phys_addr_lpae
   1.23%  qemu-system-aar  qemu-system-aarch64                       [.] aa64_va_parameters
   1.09%  qemu-system-aar  qemu-system-aarch64                       [.] rol32
   1.07%  qemu-system-aar  qemu-system-aarch64                       [.] probe_access_internal
   1.02%  qemu-system-aar  qemu-system-aarch64                       [.] deposit32
Gustavo Romero Dec. 3, 2024, 3:02 p.m. UTC | #16
Hi Alex,

On 12/2/24 15:23, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
> 
>> On 12/2/24 02:57, Peter Maydell wrote:
>>> On Sun, 1 Dec 2024 at 18:09, Pierrick Bouvier
>>> <pierrick.bouvier@linaro.org> wrote:
>>>>
>>>> Hi Marcin,
>>>>
>>>> On 12/1/24 05:34, Marcin Juszkiewicz wrote:
>>>>> W dniu 28.11.2024 o 22:37, Pierrick Bouvier pisze:
>>>>>> This boot an OP-TEE environment, and launch a nested guest VM inside it
>>>>>> using the Realms feature. We do it for virt and sbsa-ref platforms.
>>>>>>
>>>>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>>>>
>>>>>> diff --git a/tests/functional/meson.build b/tests/functional/meson.build
>>>>>> index 5c048cfac6d..b975a1560df 100644
>>>>>> --- a/tests/functional/meson.build
>>>>>> +++ b/tests/functional/meson.build
>>>>>> @@ -13,6 +13,8 @@ endif
>>>>>>      test_timeouts = {
>>>>>>        'aarch64_aspeed' : 600,
>>>>>>        'aarch64_raspi4' : 480,
>>>>>
>>>>>> +  'aarch64_rme_virt' : 720,
>>>>>
>>>>> Took 2974.95s on M1 Pro macbook.
>>>>>
>>>>>> +  'aarch64_rme_sbsaref' : 720,
>>>>>
>>>>> This one needed 2288.29s.
>>>>>
>>>>>>        'aarch64_sbsaref_alpine' : 720,
>>>>>
>>>>> Have to check cause timed out.
>>>>>
>>>>>>        'aarch64_sbsaref_freebsd' : 720,
>>>>>
>>>>> 331.65s
>>>>>
>>>>> So RME tests probably need longer timeouts or would not run at all.
>>>>>
>>>>
>>>> By any chance, are you running those tests in debug mode?
>>>> It seems to me that CI is running functional tests with optimized
>>>> builds, so I'm not sure we want to support debug "times" here.
>>> We do need to support debug times, because a common developer
>>> use case is "doing a debug build, run 'make check-functional'
>>> to check whether anything is broken. The debug times also
>>> are useful because the CI runners can have highly variable
>>> performance -- if a test is slow enough to hit the timeout
>>> for a debug build locally, it's probably going to also hit
>>> the timeout at least sometimes in CI.
>>>
>>
>> I understand the scenario, but given how slow debug builds are, it
>> would probably be faster to advise developer to recompile in release
>> mode.
>> The overall time of compile + test is slower than waiting for debug.
>>
>> Beyond using a debugger, what is the advantage to compile with -O0?
> 
> --enable-debug
> 
>    - enables -00 with -g3 for symbols
>    - and enables additional checks to validate TCG

hm, do we ever used -g3 for --enable-debug?

https://gitlab.com/qemu-project/qemu/-/blob/master/configure?ref_type=heads#L749

I'd love to use -g3 instead of only -g for having the macro symbols.

In my builds I use --extra-cflags="-g3" to have it but would like to drop it.

unless I'm missing some other change in the flags down the lane...


Cheers,
Gustavo

> You can use --enable-debug-info for just debug info without the overhead.
> 
>>
>>> thanks
>>> -- PMM
>
Richard Henderson Dec. 3, 2024, 3:12 p.m. UTC | #17
On 12/3/24 08:55, Gustavo Romero wrote:
> It's not "really" using kvm acceleration right?
> 
> It's using kvm but the kvm module in the host kernel is actually running
> fully emulated in QEMU system mode (-accel tcg is being passed to it).
> 
> So I understand it's "all TCG" in the end and no HW acceleration is ever used in the tests.

Ehh.. yes and no.

Yes, the system as a whole is running on a host under TCG.

No, the nested guest is itself using KVM interfaces to the guest kernel.  In this case, 
it's what we must to do in order to exercise the FEAT_RME code within TCG.

But contrast this with another scenario in which the nested guest is *also* running under 
TCG.  Double jit translation is, as you might imagine, a bit slow.  But it can be useful 
for smoke testing code running in the outermost guest.


r~
Alex Bennée Dec. 3, 2024, 3:30 p.m. UTC | #18
Gustavo Romero <gustavo.romero@linaro.org> writes:

> Hi Pierrick,
>
> On 12/2/24 14:33, Pierrick Bouvier wrote:
>> On 12/2/24 03:04, Peter Maydell wrote:
>>> On Mon, 2 Dec 2024 at 10:58, Marcin Juszkiewicz
>>> <marcin.juszkiewicz@linaro.org> wrote:
>>>>
>>>> W dniu 1.12.2024 o 19:09, Pierrick Bouvier pisze:
>>>>> Hi Marcin,
>>>>>
>>>>> On 12/1/24 05:34, Marcin Juszkiewicz wrote:
>>>>
>>>>>> So RME tests probably need longer timeouts or would not run at all.
>>>>>>
>>>>>
>>>>> By any chance, are you running those tests in debug mode?
>>>>
>>>> zOMG, thanks! Turned out that I left "--enable-debug" in my local build
>>>> and forgot to remove it.
>>>>
>>>> 90s vs 2974s is a difference ;D
>>>
>>> That is a particularly awful debug-to-non-debug performance
>>> difference; it might be worth looking into what exactly
>>> is causing it.
>>>
>> Indeed.
>> This test boots a nested VM (using kvm, not tcg), do you any idea
>> why such use case could be so slow with QEMU?
>
> It's not "really" using kvm acceleration right?
>
> It's using kvm but the kvm module in the host kernel is actually running
> fully emulated in QEMU system mode (-accel tcg is being passed to it).
>
> So I understand it's "all TCG" in the end and no HW acceleration is
> ever used in the tests.

Right, but from the internal QEMU's point of view its just real hardware
so we don't have nested translation. All the overhead of that is in the
"host" QEMU.

In theory we could also use kvmtool to launch a realm but I doubt in
this case the internal QEMU is actually taking up that much of the
runtime.

>
>
> Cheers,
> Gustavo
Alex Bennée Dec. 3, 2024, 3:31 p.m. UTC | #19
Gustavo Romero <gustavo.romero@linaro.org> writes:

> Hi Alex,
>
> On 12/2/24 15:23, Alex Bennée wrote:
>> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>> 
>>> On 12/2/24 02:57, Peter Maydell wrote:
>>>> On Sun, 1 Dec 2024 at 18:09, Pierrick Bouvier
>>>> <pierrick.bouvier@linaro.org> wrote:
>>>>>
>>>>> Hi Marcin,
>>>>>
>>>>> On 12/1/24 05:34, Marcin Juszkiewicz wrote:
>>>>>> W dniu 28.11.2024 o 22:37, Pierrick Bouvier pisze:
>>>>>>> This boot an OP-TEE environment, and launch a nested guest VM inside it
>>>>>>> using the Realms feature. We do it for virt and sbsa-ref platforms.
>>>>>>>
>>>>>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>>>>>
>>>>>>> diff --git a/tests/functional/meson.build b/tests/functional/meson.build
>>>>>>> index 5c048cfac6d..b975a1560df 100644
>>>>>>> --- a/tests/functional/meson.build
>>>>>>> +++ b/tests/functional/meson.build
>>>>>>> @@ -13,6 +13,8 @@ endif
>>>>>>>      test_timeouts = {
>>>>>>>        'aarch64_aspeed' : 600,
>>>>>>>        'aarch64_raspi4' : 480,
>>>>>>
>>>>>>> +  'aarch64_rme_virt' : 720,
>>>>>>
>>>>>> Took 2974.95s on M1 Pro macbook.
>>>>>>
>>>>>>> +  'aarch64_rme_sbsaref' : 720,
>>>>>>
>>>>>> This one needed 2288.29s.
>>>>>>
>>>>>>>        'aarch64_sbsaref_alpine' : 720,
>>>>>>
>>>>>> Have to check cause timed out.
>>>>>>
>>>>>>>        'aarch64_sbsaref_freebsd' : 720,
>>>>>>
>>>>>> 331.65s
>>>>>>
>>>>>> So RME tests probably need longer timeouts or would not run at all.
>>>>>>
>>>>>
>>>>> By any chance, are you running those tests in debug mode?
>>>>> It seems to me that CI is running functional tests with optimized
>>>>> builds, so I'm not sure we want to support debug "times" here.
>>>> We do need to support debug times, because a common developer
>>>> use case is "doing a debug build, run 'make check-functional'
>>>> to check whether anything is broken. The debug times also
>>>> are useful because the CI runners can have highly variable
>>>> performance -- if a test is slow enough to hit the timeout
>>>> for a debug build locally, it's probably going to also hit
>>>> the timeout at least sometimes in CI.
>>>>
>>>
>>> I understand the scenario, but given how slow debug builds are, it
>>> would probably be faster to advise developer to recompile in release
>>> mode.
>>> The overall time of compile + test is slower than waiting for debug.
>>>
>>> Beyond using a debugger, what is the advantage to compile with -O0?
>> --enable-debug
>>    - enables -00 with -g3 for symbols
>>    - and enables additional checks to validate TCG
>
> hm, do we ever used -g3 for --enable-debug?
>
> https://gitlab.com/qemu-project/qemu/-/blob/master/configure?ref_type=heads#L749
>
> I'd love to use -g3 instead of only -g for having the macro symbols.

Hmm yeah, I tend to use extra-cflags with it as well.

>
> In my builds I use --extra-cflags="-g3" to have it but would like to drop it.
>
> unless I'm missing some other change in the flags down the lane...
>
>
> Cheers,
> Gustavo
>
>> You can use --enable-debug-info for just debug info without the overhead.
>> 
>>>
>>>> thanks
>>>> -- PMM
>>
Pierrick Bouvier Dec. 3, 2024, 6:24 p.m. UTC | #20
On 12/3/24 06:56, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
> 
>> This boot an OP-TEE environment, and launch a nested guest VM inside it
>> using the Realms feature. We do it for virt and sbsa-ref platforms.
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>
> <snip>
>> +
>> +        self.vm.add_args('-accel', 'tcg')
>> +        self.vm.add_args('-cpu', 'max,x-rme=on')
> 
> With debug on the PAC function are certainly very high in the perf
> report. So pauth-impdef=on seems worthwhile here.
> 
>> +        self.vm.add_args('-m', '2G')
>> +        self.vm.add_args('-M', 'sbsa-ref')
>> +        self.vm.add_args('-drive', f'file={pflash0},format=raw,if=pflash')
>> +        self.vm.add_args('-drive', f'file={pflash1},format=raw,if=pflash')
>> +        self.vm.add_args('-drive', f'file=fat:rw:{virtual},format=raw')
>> +        self.vm.add_args('-drive', f'format=raw,if=none,file={drive},id=hd0')
>> +        self.vm.add_args('-device', 'virtio-blk-pci,drive=hd0')
>> +        self.vm.add_args('-device', 'virtio-9p-pci,fsdev=shr0,mount_tag=shr0')
>> +        self.vm.add_args('-fsdev', f'local,security_model=none,path={rme_stack},id=shr0')
>> +        self.vm.add_args('-device', 'virtio-net-pci,netdev=net0')
>> +        self.vm.add_args('-netdev', 'user,id=net0')
> <snip>
>> +
>> +        self.vm.add_args('-accel', 'tcg')
>> +        self.vm.add_args('-cpu', 'max,x-rme=on')
> 
> And here.
> 
> <snip>
> 
> With that the tests both pass with --enable-debug (312s, 352s) and the
> profile looks like:
> 
>     6.33%  qemu-system-aar  qemu-system-aarch64                       [.] arm_feature
>     5.66%  qemu-system-aar  qemu-system-aarch64                       [.] tcg_flush_jmp_cache
>     3.44%  qemu-system-aar  qemu-system-aarch64                       [.] rebuild_hflags_a64
> 
> This I suspect is triggered by assert_hflags_rebuild_correctly() which
> is validating we've not skipped rebuilding the flags when we need to.
> It's a lot easier than debugging why your execution trace looks weird.
> 
>     2.95%  qemu-system-aar  qemu-system-aarch64                       [.] extract64
>     2.52%  qemu-system-aar  qemu-system-aarch64                       [.] extract64
> 
> This is usually triggered by translation code which uses extract64
> heavily during instruction decode.
> 
> It might be useful to see if we can get functional tests run under TCG
> to dump "info jit" at the end and ensure we are not over generating code
> and exhausting the translation cache.
> 
>     2.12%  qemu-system-aar  qemu-system-aarch64                       [.] arm_el_is_aa64
>     2.11%  qemu-system-aar  qemu-system-aarch64                       [.] arm_security_space_below_el3
>     2.11%  qemu-system-aar  qemu-system-aarch64                       [.] deposit64
>     1.49%  qemu-system-aar  qemu-system-aarch64                       [.] arm_hcr_el2_eff_secstate
>     1.46%  qemu-system-aar  qemu-system-aarch64                       [.] arm_is_el2_enabled_secstate
>     1.38%  qemu-system-aar  qemu-system-aarch64                       [.] extract32
>     1.34%  qemu-system-aar  qemu-system-aarch64                       [.] extract64
>     1.30%  qemu-system-aar  qemu-system-aarch64                       [.] get_phys_addr_lpae
>     1.23%  qemu-system-aar  qemu-system-aarch64                       [.] aa64_va_parameters
>     1.09%  qemu-system-aar  qemu-system-aarch64                       [.] rol32
>     1.07%  qemu-system-aar  qemu-system-aarch64                       [.] probe_access_internal
>     1.02%  qemu-system-aar  qemu-system-aarch64                       [.] deposit32
> 
> 

Thanks Alex.

I did the same investigation, and switching to pauth-impdef brings down 
time from 1500s to a more "acceptable" 450s on my machine.
In my profile (using call graphs, which I'm not sure you used), I 
observe that 26% of the time is spent in 
assert_hflags_rebuild_correctly, which is enabled by --enable-debug-tcg.

I'll send a v3 switching to impdef and increasing the timeout, should be 
enough for this time.

Pierrick
diff mbox series

Patch

diff --git a/tests/functional/meson.build b/tests/functional/meson.build
index 5c048cfac6d..b975a1560df 100644
--- a/tests/functional/meson.build
+++ b/tests/functional/meson.build
@@ -13,6 +13,8 @@  endif
 test_timeouts = {
   'aarch64_aspeed' : 600,
   'aarch64_raspi4' : 480,
+  'aarch64_rme_virt' : 720,
+  'aarch64_rme_sbsaref' : 720,
   'aarch64_sbsaref_alpine' : 720,
   'aarch64_sbsaref_freebsd' : 720,
   'aarch64_tuxrun' : 240,
@@ -52,6 +54,8 @@  tests_aarch64_system_thorough = [
   'aarch64_aspeed',
   'aarch64_raspi3',
   'aarch64_raspi4',
+  'aarch64_rme_virt',
+  'aarch64_rme_sbsaref',
   'aarch64_sbsaref',
   'aarch64_sbsaref_alpine',
   'aarch64_sbsaref_freebsd',
diff --git a/tests/functional/test_aarch64_rme_sbsaref.py b/tests/functional/test_aarch64_rme_sbsaref.py
new file mode 100755
index 00000000000..af3f9fb2d64
--- /dev/null
+++ b/tests/functional/test_aarch64_rme_sbsaref.py
@@ -0,0 +1,96 @@ 
+#!/usr/bin/env python3
+#
+# Functional test that boots a Realms environment on sbsa-ref machine and a
+# nested guest VM using it.
+#
+# Copyright (c) 2024 Linaro Ltd.
+#
+# Author: Pierrick Bouvier <pierrick.bouvier@linaro.org>
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+import time
+import os
+import logging
+
+from qemu_test import QemuSystemTest, Asset
+from qemu_test import exec_command, wait_for_console_pattern
+from qemu_test import exec_command_and_wait_for_pattern
+from qemu_test.utils import archive_extract
+
+
+class Aarch64RMESbsaRefMachine(QemuSystemTest):
+    def wait_for_console_pattern(self, success_message, vm=None):
+        wait_for_console_pattern(self, success_message,
+                                 failure_message='Kernel panic - not syncing',
+                                 vm=vm)
+
+    # Stack is built with OP-TEE build environment from those instructions:
+    # https://linaro.atlassian.net/wiki/spaces/QEMU/pages/29051027459/
+    # https://github.com/pbo-linaro/qemu-rme-stack
+    ASSET_RME_STACK_SBSA = Asset(
+        ('https://fileserver.linaro.org/s/pBCeJktME2T5ikj/'
+         'download/rme-stack-op-tee-4.2.0-cca-v3-sbsa.tar.gz'),
+         '67542c80e5cfa84494f757bbee80b3535946c74666b8b82681f8eceb0e91d4ef')
+
+    # This tests the FEAT_RME cpu implementation, by booting a VM supporting it,
+    # and launching a nested VM using it.
+    def test_aarch64_rme_sbsaref(self):
+        stack_path_tar_gz = self.ASSET_RME_STACK_SBSA.fetch()
+        archive_extract(stack_path_tar_gz, self.workdir)
+
+        self.set_machine('sbsa-ref')
+        self.vm.set_console()
+        self.require_accelerator('tcg')
+
+        rme_stack = os.path.join(self.workdir,
+                                 'rme-stack-op-tee-4.2.0-cca-v3-sbsa')
+        pflash0 = os.path.join(rme_stack, 'images', 'SBSA_FLASH0.fd')
+        pflash1 = os.path.join(rme_stack, 'images', 'SBSA_FLASH1.fd')
+        virtual = os.path.join(rme_stack, 'images', 'disks', 'virtual')
+        drive = os.path.join(rme_stack, 'out-br', 'images', 'rootfs.ext4')
+
+        self.vm.add_args('-accel', 'tcg')
+        self.vm.add_args('-cpu', 'max,x-rme=on')
+        self.vm.add_args('-m', '2G')
+        self.vm.add_args('-M', 'sbsa-ref')
+        self.vm.add_args('-drive', f'file={pflash0},format=raw,if=pflash')
+        self.vm.add_args('-drive', f'file={pflash1},format=raw,if=pflash')
+        self.vm.add_args('-drive', f'file=fat:rw:{virtual},format=raw')
+        self.vm.add_args('-drive', f'format=raw,if=none,file={drive},id=hd0')
+        self.vm.add_args('-device', 'virtio-blk-pci,drive=hd0')
+        self.vm.add_args('-device', 'virtio-9p-pci,fsdev=shr0,mount_tag=shr0')
+        self.vm.add_args('-fsdev', f'local,security_model=none,path={rme_stack},id=shr0')
+        self.vm.add_args('-device', 'virtio-net-pci,netdev=net0')
+        self.vm.add_args('-netdev', 'user,id=net0')
+
+        self.vm.launch()
+        # Wait for host VM boot to complete.
+        self.wait_for_console_pattern('Welcome to Buildroot')
+        exec_command_and_wait_for_pattern(self, 'root', '#')
+
+        # We now boot the (nested) guest VM
+        exec_command(self,
+                     'qemu-system-aarch64 -M virt,gic-version=3 '
+                     '-cpu host -enable-kvm -m 512M '
+                     '-M confidential-guest-support=rme0 '
+                     '-object rme-guest,id=rme0,measurement-algo=sha256 '
+                     '-device virtio-net-pci,netdev=net0,romfile= '
+                     '-netdev user,id=net0 '
+                     '-kernel /mnt/out/bin/Image '
+                     '-initrd /mnt/out-br/images/rootfs.cpio '
+                     '-serial stdio')
+        # Detect Realm activation during (nested) guest boot.
+        self.wait_for_console_pattern('SMC_RMI_REALM_ACTIVATE')
+        # Wait for (nested) guest boot to complete.
+        self.wait_for_console_pattern('Welcome to Buildroot')
+        exec_command_and_wait_for_pattern(self, 'root', '#')
+
+        # query (nested) guest cca report
+        exec_command(self, 'cca-workload-attestation report')
+        self.wait_for_console_pattern('"cca-platform-hash-algo-id": "sha-256"')
+        self.wait_for_console_pattern('"cca-realm-hash-algo-id": "sha-256"')
+        self.wait_for_console_pattern('"cca-realm-public-key-hash-algo-id": "sha-256"')
+
+if __name__ == '__main__':
+    QemuSystemTest.main()
diff --git a/tests/functional/test_aarch64_rme_virt.py b/tests/functional/test_aarch64_rme_virt.py
new file mode 100755
index 00000000000..cd895da46b6
--- /dev/null
+++ b/tests/functional/test_aarch64_rme_virt.py
@@ -0,0 +1,98 @@ 
+#!/usr/bin/env python3
+#
+# Functional test that boots a Realms environment on virt machine and a nested
+# guest VM using it.
+#
+# Copyright (c) 2024 Linaro Ltd.
+#
+# Author: Pierrick Bouvier <pierrick.bouvier@linaro.org>
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+import time
+import os
+import logging
+
+from qemu_test import QemuSystemTest, Asset
+from qemu_test import exec_command, wait_for_console_pattern
+from qemu_test import exec_command_and_wait_for_pattern
+from qemu_test.utils import archive_extract
+
+
+class Aarch64RMEVirtMachine(QemuSystemTest):
+    def wait_for_console_pattern(self, success_message, vm=None):
+        wait_for_console_pattern(self, success_message,
+                                 failure_message='Kernel panic - not syncing',
+                                 vm=vm)
+
+    # Stack is built with OP-TEE build environment from those instructions:
+    # https://linaro.atlassian.net/wiki/spaces/QEMU/pages/29051027459/
+    # https://github.com/pbo-linaro/qemu-rme-stack
+    ASSET_RME_STACK_VIRT = Asset(
+        ('https://fileserver.linaro.org/s/7dsXdtbDfBZbCTC/'
+         'download/rme-stack-op-tee-4.2.0-cca-v3-qemu_v8.tar.gz'),
+         '0b49f2652a7201d100365b8ccc6c317f1f8f2f4de6feed084080f52fe1b5fadc')
+
+    # This tests the FEAT_RME cpu implementation, by booting a VM supporting it,
+    # and launching a nested VM using it.
+    def test_aarch64_rme_virt(self):
+        stack_path_tar_gz = self.ASSET_RME_STACK_VIRT.fetch()
+        archive_extract(stack_path_tar_gz, self.workdir)
+
+        self.set_machine('virt')
+        self.vm.set_console()
+        self.require_accelerator('tcg')
+
+        rme_stack = os.path.join(self.workdir,
+                                 'rme-stack-op-tee-4.2.0-cca-v3-qemu_v8')
+        kernel = os.path.join(rme_stack, 'out', 'bin', 'Image')
+        bios = os.path.join(rme_stack, 'out', 'bin', 'flash.bin')
+        drive = os.path.join(rme_stack, 'out-br', 'images', 'rootfs.ext4')
+
+        self.vm.add_args('-accel', 'tcg')
+        self.vm.add_args('-cpu', 'max,x-rme=on')
+        self.vm.add_args('-m', '2G')
+        self.vm.add_args('-M', 'virt,acpi=off,'
+                         'virtualization=on,'
+                         'secure=on,'
+                         'gic-version=3')
+        self.vm.add_args('-bios', bios)
+        self.vm.add_args('-kernel', kernel)
+        self.vm.add_args('-drive', f'format=raw,if=none,file={drive},id=hd0')
+        self.vm.add_args('-device', 'virtio-blk-pci,drive=hd0')
+        self.vm.add_args('-device', 'virtio-9p-device,fsdev=shr0,mount_tag=shr0')
+        self.vm.add_args('-fsdev', f'local,security_model=none,path={rme_stack},id=shr0')
+        self.vm.add_args('-device', 'virtio-net-pci,netdev=net0')
+        self.vm.add_args('-netdev', 'user,id=net0')
+        self.vm.add_args('-append', 'root=/dev/vda')
+
+        self.vm.launch()
+        # Wait for host VM boot to complete.
+        self.wait_for_console_pattern('Welcome to Buildroot')
+        exec_command_and_wait_for_pattern(self, 'root', '#')
+
+        # We now boot the (nested) guest VM
+        exec_command(self,
+                     'qemu-system-aarch64 -M virt,gic-version=3 '
+                     '-cpu host -enable-kvm -m 512M '
+                     '-M confidential-guest-support=rme0 '
+                     '-object rme-guest,id=rme0,measurement-algo=sha512 '
+                     '-device virtio-net-pci,netdev=net0,romfile= '
+                     '-netdev user,id=net0 '
+                     '-kernel /mnt/out/bin/Image '
+                     '-initrd /mnt/out-br/images/rootfs.cpio '
+                     '-serial stdio')
+        # Detect Realm activation during (nested) guest boot.
+        self.wait_for_console_pattern('SMC_RMI_REALM_ACTIVATE')
+        # Wait for (nested) guest boot to complete.
+        self.wait_for_console_pattern('Welcome to Buildroot')
+        exec_command_and_wait_for_pattern(self, 'root', '#')
+
+        # query (nested) guest cca report
+        exec_command(self, 'cca-workload-attestation report')
+        self.wait_for_console_pattern('"cca-platform-hash-algo-id": "sha-256"')
+        self.wait_for_console_pattern('"cca-realm-hash-algo-id": "sha-512"')
+        self.wait_for_console_pattern('"cca-realm-public-key-hash-algo-id": "sha-256"')
+
+if __name__ == '__main__':
+    QemuSystemTest.main()