[RFC,00/40] Toward class init of cpu features

Message ID	20230103181646.55711-1-richard.henderson@linaro.org
Headers	show Delivered-To: patch@linaro.org Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; From: Richard Henderson <richard.henderson@linaro.org> To: qemu-devel@nongnu.org Cc: pbonzini@redhat.com, berrange@redhat.com, eduardo@habkost.net, armbru@redhat.com, ajones@ventanamicro.com, alex.bennee@linaro.org Subject: [RFC PATCH 00/40] Toward class init of cpu features Date: Tue, 3 Jan 2023 10:16:06 -0800 Message-Id: <20230103181646.55711-1-richard.henderson@linaro.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Received-SPF: pass client-ip=2607:f8b0:4864:20::b33; envelope-from=richard.henderson@linaro.org; helo=mail-yb1-xb33.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action Precedence: list Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org
Series	Toward class init of cpu features \| expand [RFC,00/40] Toward class init of cpu features [RFC,01/40] qdev: Don't always force the global property array non-null [RFC,02/40] qom: Introduce class_late_init [RFC,03/40] qom: Create class properties [RFC,04/40] target/arm: Remove aarch64_cpu_finalizefn [RFC,05/40] target/arm: Create arm_cpu_register_parent [RFC,06/40] target/arm: Remove AArch64CPUClass [RFC,07/40] target/arm: Create TYPE_ARM_V7M_CPU [RFC,08/40] target/arm: Pass ARMCPUClass to ARMCPUInfo.class_init [RFC,09/40] target/arm: Utilize arm-cpu instance_post_init hook [RFC,10/40] target/arm: Copy dtb_compatible from ARMCPUClass [RFC,11/40] target/arm: Copy features from ARMCPUClass [RFC,12/40] target/arm: Copy isar and friends from ARMCPUClass [RFC,13/40] hw/arm/bcm2836: Set mp-affinity property in realize [RFC,14/40] target/arm: Rename arm_cpu_mp_affinity [RFC,15/40] target/arm: Create arm_cpu_mp_affinity [RFC,16/40] target/arm: Represent the entire MPIDR_EL1 [RFC,17/40] target/arm: Copy cp_regs from ARMCPUClass [RFC,18/40] target/arm: Create cpreg definition functions with GHashTable arg [RFC,19/40] target/arm: Move most cpu initialization to the class [RFC,20/40] target/arm: Merge kvm64.c with kvm.c [RFC,21/40] target/arm: Remove aarch64 check from aarch64_host_object_init [RFC,22/40] target/arm: Hoist feature and dtb_compatible from KVM, HVF [RFC,23/40] target/arm: Probe KVM host into ARMCPUClass [RFC,24/40] target/arm/hvf: Probe host into ARMCPUClass [RFC,25/40] target/arm/hvf: Use offsetof in hvf_arm_get_host_cpu_features [RFC,26/40] target/arm: Rename 'cpu' to 'acc' in class init functions [RFC,27/40] target/arm: Split out strongarm_class_init [RFC,28/40] target/arm: Split out xscale*_class_init [RFC,29/40] target/arm: Remove m-profile has_vfp and has_dsp properties [RFC,30/40] target/arm: Move feature bit propagation to class init [RFC,31/40] target/arm: Get and set class properties in the monitor [RFC,32/40] target/arm: Move "midr" to class property [RFC,33/40] target/arm: Move "cntfrq" to class property [RFC,34/40] target/arm: Move "reset-hivecs" to class property [RFC,35/40] target/arm: Move "has_el2" to class property [RFC,36/40] target/arm: Move "has_el3" to class property [RFC,37/40] target/arm: Move "cfgend" to class property [RFC,38/40] target/arm: Move "vfp" and "neon" to class properties [RFC,39/40] target/arm: Move "has-mpu" and "pmsav7-dregion" to class properties [RFC,40/40] target/arm: Move "pmu" to class property

Richard Henderson Jan. 3, 2023, 6:16 p.m. UTC

The specific problem I'm trying to solve is the location and
representation of the coprocessor register hash table for ARM cpus,
but in the process affects how cpu initialization might be done for
all targets.

At present, each cpu (for all targets) is initialized separately.
For some, like ARM, we apply QOM properties and then build a hash
table of all coprocessor regs that are valid for the cpu.  For others,
like m68k and ppc, we build tables of all valid instructions
(ppc is slowly moving away from constructed tables to decodetree,
but the migration is not complete).

Importantly, this happens N times for smp system emulation, for each
cpu instance, meaning we do N times the work on startup as necessary.
For system emulation this isn't so bad, as N is generally small-ish,
but it certainly could be improved.

More importantly, this happens for each thread in user-only emulation.
This is much more significant because threads can be short-lived, and
there can be many hundreds of them, each one performing what amounts
to redundant initialization.

The "obvious" solution is to make better use of the cpu class object.
Construct data structures once to be shared with all instances.

The immediate problem encountered here is that of QOM properties,
which are used to control how each object is configured.

The first couple of patches here create a new QOM "class property",
which is sufficient to allow command-line parameters to find their
way to the correct bit of code to interpret them.  (Pardon the
roughness in these patches, RFC and all.)  The restriction I add 
here is that all class properties must be applied before the first
object is created, which gives for a definite point at which the
properties must have be applied and we can construct data structures
based on those values.

However, I wind up banging my head against the wall later, when it
comes to the first qmp_query_cpu_model_expansion properties.  This
wants to create an object, apply properties, and see what stuck.
Adding a query of the class properties is fine, but I can't set any
of them because of the object creation.  And if I tweak the location
of the class property set vs the object create, that only allows the
first such probe to succeed, because the second will still have had
an object created.

I can imagine a way out of this, e.g. by keeping a ref count of the
number of instantiated objects, and allowing class properties to change 
when that count is zero.  But it feels like I really should know what
QMP is attempting to do here, rather than work around it blindly.


r~


Richard Henderson (40):
  qdev: Don't always force the global property array non-null
  qom: Introduce class_late_init
  qom: Create class properties
  target/arm: Remove aarch64_cpu_finalizefn
  target/arm: Create arm_cpu_register_parent
  target/arm: Remove AArch64CPUClass
  target/arm: Create TYPE_ARM_V7M_CPU
  target/arm: Pass ARMCPUClass to ARMCPUInfo.class_init
  target/arm: Utilize arm-cpu instance_post_init hook
  target/arm: Copy dtb_compatible from ARMCPUClass
  target/arm: Copy features from ARMCPUClass
  target/arm: Copy isar and friends from ARMCPUClass
  hw/arm/bcm2836: Set mp-affinity property in realize
  target/arm: Rename arm_cpu_mp_affinity
  target/arm: Create arm_cpu_mp_affinity
  target/arm: Represent the entire MPIDR_EL1
  target/arm: Copy cp_regs from ARMCPUClass
  target/arm: Create cpreg definition functions with GHashTable arg
  target/arm: Move most cpu initialization to the class
  target/arm: Merge kvm64.c with kvm.c
  target/arm: Remove aarch64 check from aarch64_host_object_init
  target/arm: Hoist feature and dtb_compatible from KVM, HVF
  target/arm: Probe KVM host into ARMCPUClass
  target/arm/hvf: Probe host into ARMCPUClass
  target/arm/hvf: Use offsetof in hvf_arm_get_host_cpu_features
  target/arm: Rename 'cpu' to 'acc' in class init functions
  target/arm: Split out strongarm_class_init
  target/arm: Split out xscale*_class_init
  target/arm: Remove m-profile has_vfp and has_dsp properties
  target/arm: Move feature bit propagation to class init
  target/arm: Get and set class properties in the monitor
  target/arm: Move "midr" to class property
  target/arm: Move "cntfrq" to class property
  target/arm: Move "reset-hivecs" to class property
  target/arm: Move "has_el2" to class property
  target/arm: Move "has_el3" to class property
  target/arm: Move "cfgend" to class property
  target/arm: Move "vfp" and "neon" to class properties
  target/arm: Move "has-mpu" and "pmsav7-dregion" to class properties
  target/arm: Move "pmu" to class property

 hw/core/qdev-prop-internal.h |    2 +
 include/hw/arm/armsse.h      |    3 +-
 include/hw/arm/armv7m.h      |    2 -
 include/qom/object.h         |   67 ++
 include/qom/qom-qobject.h    |   28 +
 target/arm/cpregs.h          |   33 +-
 target/arm/cpu-qom.h         |  171 ++-
 target/arm/cpu.h             |   89 +-
 target/arm/hvf_arm.h         |    2 +-
 target/arm/internals.h       |    6 +-
 target/arm/kvm_arm.h         |  217 +---
 hw/arm/allwinner-h3.c        |   14 +-
 hw/arm/armsse.c              |   53 +-
 hw/arm/armv7m.c              |   12 -
 hw/arm/aspeed_ast2600.c      |    9 +-
 hw/arm/bcm2836.c             |    7 +-
 hw/arm/digic.c               |   11 +-
 hw/arm/exynos4210.c          |   18 +-
 hw/arm/integratorcp.c        |   12 +-
 hw/arm/musca.c               |   14 +-
 hw/arm/npcm7xx.c             |   16 +-
 hw/arm/realview.c            |   20 +-
 hw/arm/sbsa-ref.c            |    2 +-
 hw/arm/versatilepb.c         |   12 +-
 hw/arm/vexpress.c            |   19 +-
 hw/arm/virt-acpi-build.c     |    2 +-
 hw/arm/virt.c                |   47 +-
 hw/arm/xilinx_zynq.c         |   15 +-
 hw/arm/xlnx-versal-virt.c    |    3 +-
 hw/arm/xlnx-zynqmp.c         |   17 +-
 hw/core/cpu-common.c         |   61 +-
 hw/core/qdev-properties.c    |   79 +-
 hw/core/qdev.c               |    2 +
 hw/cpu/a15mpcore.c           |   14 +-
 hw/cpu/a9mpcore.c            |    6 +-
 hw/intc/armv7m_nvic.c        |    2 +-
 hw/misc/xlnx-versal-crl.c    |    4 +-
 qom/object.c                 |  249 ++++-
 qom/object_interfaces.c      |   13 +
 qom/qom-qmp-cmds.c           |   37 +
 target/arm/arm-powerctl.c    |    2 +-
 target/arm/cpu.c             | 1195 +++++++++++---------
 target/arm/cpu64.c           | 1014 +++++++++--------
 target/arm/cpu_tcg.c         | 1486 +++++++++++++------------
 target/arm/helper.c          |   96 +-
 target/arm/hvf/hvf.c         |   94 +-
 target/arm/kvm.c             | 2006 +++++++++++++++++++++++++++++++---
 target/arm/kvm64.c           | 1632 ---------------------------
 target/arm/monitor.c         |   21 +-
 target/arm/psci.c            |    2 +-
 target/arm/meson.build       |    2 +-
 51 files changed, 4765 insertions(+), 4175 deletions(-)
 delete mode 100644 target/arm/kvm64.c

Richard Henderson Jan. 4, 2023, 12:01 a.m. UTC | #1

On 1/3/23 10:16, Richard Henderson wrote:
> Richard Henderson (40):
>    target/arm: Remove aarch64_cpu_finalizefn
>    target/arm: Create arm_cpu_register_parent
>    target/arm: Remove AArch64CPUClass
>    target/arm: Create TYPE_ARM_V7M_CPU
...
>    target/arm: Utilize arm-cpu instance_post_init hook
...
>    hw/arm/bcm2836: Set mp-affinity property in realize
>    target/arm: Rename arm_cpu_mp_affinity
>    target/arm: Create arm_cpu_mp_affinity
>    target/arm: Represent the entire MPIDR_EL1

I meant to say: Peter, at least this handfull of patches ought to be useful cleanup 
regardless of the rest of the series.


r~

Peter Maydell Jan. 6, 2023, 7:12 p.m. UTC | #2

On Tue, 3 Jan 2023 at 18:27, Richard Henderson
<richard.henderson@linaro.org> wrote:
> The specific problem I'm trying to solve is the location and
> representation of the coprocessor register hash table for ARM cpus,
> but in the process affects how cpu initialization might be done for
> all targets.
>
> At present, each cpu (for all targets) is initialized separately.
> For some, like ARM, we apply QOM properties and then build a hash
> table of all coprocessor regs that are valid for the cpu.  For others,
> like m68k and ppc, we build tables of all valid instructions
> (ppc is slowly moving away from constructed tables to decodetree,
> but the migration is not complete).
>
> Importantly, this happens N times for smp system emulation, for each
> cpu instance, meaning we do N times the work on startup as necessary.
> For system emulation this isn't so bad, as N is generally small-ish,
> but it certainly could be improved.
>
> More importantly, this happens for each thread in user-only emulation.
> This is much more significant because threads can be short-lived, and
> there can be many hundreds of them, each one performing what amounts
> to redundant initialization.
>
> The "obvious" solution is to make better use of the cpu class object.
> Construct data structures once to be shared with all instances.

The trouble with this idea is that not all instances of the same
class are actually necessarily the same. For instance, if you
have a system with both (a) a Cortex-A53 with a PMU, and
(b) a Cortex-A53 without a PMU, then they're both instances of
the same class, but they shouldn't be sharing the coprocessor
register hashtable because they don't have an identical set of
system registers.

This kind of same-CPU-type-heterogenous-configuration system is
not something we're currently using on A-profile, but we do have
it for M-profile (the sse200 has a dual-core setup where only one
of the CPUs has an FPU), so it's not totally outlandish.

thanks
-- PMM

Richard Henderson Jan. 6, 2023, 7:29 p.m. UTC | #3

On 1/6/23 11:12, Peter Maydell wrote:
> The trouble with this idea is that not all instances of the same
> class are actually necessarily the same. For instance, if you
> have a system with both (a) a Cortex-A53 with a PMU, and
> (b) a Cortex-A53 without a PMU, then they're both instances of
> the same class, but they shouldn't be sharing the coprocessor
> register hashtable because they don't have an identical set of
> system registers.
> 
> This kind of same-CPU-type-heterogenous-configuration system is
> not something we're currently using on A-profile, but we do have
> it for M-profile (the sse200 has a dual-core setup where only one
> of the CPUs has an FPU), so it's not totally outlandish.

Yes, I know.  See patch 29 where I moved the vfp and dsp properties off of the m-profile 
cpus and created new cpu classes instead, specifically for the sse220.

It's not scalable, I'll grant you, but it's hard to design for something we're not using. 
What we use now, apart from the sse200, are common properties set on the command-line.

If we were presented with the class properties early enough, we could create sub-classes 
with the desired properties before instantiating the objects that go with.

Anyway, it seems like we ought to have some solution that does not involve repeating the 
same id register finalization + cpregs hash table construction per cpu -- especially for 
user-only.

r~

Peter Maydell Jan. 6, 2023, 9:59 p.m. UTC | #4

On Fri, 6 Jan 2023 at 19:29, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 1/6/23 11:12, Peter Maydell wrote:
> > The trouble with this idea is that not all instances of the same
> > class are actually necessarily the same. For instance, if you
> > have a system with both (a) a Cortex-A53 with a PMU, and
> > (b) a Cortex-A53 without a PMU, then they're both instances of
> > the same class, but they shouldn't be sharing the coprocessor
> > register hashtable because they don't have an identical set of
> > system registers.
> >
> > This kind of same-CPU-type-heterogenous-configuration system is
> > not something we're currently using on A-profile, but we do have
> > it for M-profile (the sse200 has a dual-core setup where only one
> > of the CPUs has an FPU), so it's not totally outlandish.
>
> Yes, I know.  See patch 29 where I moved the vfp and dsp properties off of the m-profile
> cpus and created new cpu classes instead, specifically for the sse220.
>
> It's not scalable, I'll grant you, but it's hard to design for something we're not using.
> What we use now, apart from the sse200, are common properties set on the command-line.

We also set some properties in code -- eg aspeed_ast2600.c clears
the 'neon' property on its CPUs, lots of the boards clear
has_el3 and has_el2, etc. I hadn't got as far as patch 29, but
looking at it now that looks like a pretty strong indication
that this is the wrong way to go. It creates 3 extra
cortex-m33 CPU classes, and if we find another thing that
ought to be a CPU property then we'll be up to 8; and the
mess propagates up into the SSE-200 class, which is also
no longer able to be configured in the normal way by setting
properties on it, and it becomes visible in user-facing
command line stuff.

I think our object model pretty strongly wants "create object;
set properties on it that only affect this object you created;
realize it", and having one particular subset of objects that
doesn't work the same way is going to be very confusing.

thanks
-- PMM

Richard Henderson Jan. 6, 2023, 10:28 p.m. UTC | #5

On 1/6/23 13:59, Peter Maydell wrote:
> We also set some properties in code -- eg aspeed_ast2600.c clears
> the 'neon' property on its CPUs, lots of the boards clear
> has_el3 and has_el2, etc.

Yes indeed, but in all of those cases we want all of the cpus to act identically.  Those 
are all easily handled (patches 35, 36, 38).

> I hadn't got as far as patch 29, but
> looking at it now that looks like a pretty strong indication
> that this is the wrong way to go. It creates 3 extra
> cortex-m33 CPU classes, and if we find another thing that
> ought to be a CPU property then we'll be up to 8;

If we find another thing that needs to be different between cpus, you mean?

> and it becomes visible in user-facing command line stuff.

No it doesn't -- command line is *not* affected, because both before and after, all 
properties are applied identically to all objects.

QMP is affected, which is where I stopped and started asking questions about what QMP is 
actually trying to do.

> I think our object model pretty strongly wants "create object;
> set properties on it that only affect this object you created;
> realize it", and having one particular subset of objects that
> doesn't work the same way is going to be very confusing.

Eh, I didn't think it's particularly confusing as a concept.
The code is rough, buy what one might expect from an RFC.

We really ought to have *some* solution to not repeating property + feature + isar 
interpretation on a per-cpu basis.  I'd be delighted to hear alternatives.

r~

Alex Bennée Jan. 6, 2023, 11:43 p.m. UTC | #6

Peter Maydell <peter.maydell@linaro.org> writes:

> On Fri, 6 Jan 2023 at 19:29, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> On 1/6/23 11:12, Peter Maydell wrote:
>> > The trouble with this idea is that not all instances of the same
>> > class are actually necessarily the same. For instance, if you
>> > have a system with both (a) a Cortex-A53 with a PMU, and
>> > (b) a Cortex-A53 without a PMU, then they're both instances of
>> > the same class, but they shouldn't be sharing the coprocessor
>> > register hashtable because they don't have an identical set of
>> > system registers.
>> >
>> > This kind of same-CPU-type-heterogenous-configuration system is
>> > not something we're currently using on A-profile, but we do have
>> > it for M-profile (the sse200 has a dual-core setup where only one
>> > of the CPUs has an FPU), so it's not totally outlandish.
>>
>> Yes, I know.  See patch 29 where I moved the vfp and dsp properties off of the m-profile
>> cpus and created new cpu classes instead, specifically for the sse220.
>>
>> It's not scalable, I'll grant you, but it's hard to design for something we're not using.
>> What we use now, apart from the sse200, are common properties set on the command-line.
>
> We also set some properties in code -- eg aspeed_ast2600.c clears
> the 'neon' property on its CPUs, lots of the boards clear
> has_el3 and has_el2, etc. I hadn't got as far as patch 29, but
> looking at it now that looks like a pretty strong indication
> that this is the wrong way to go. It creates 3 extra
> cortex-m33 CPU classes, and if we find another thing that
> ought to be a CPU property then we'll be up to 8; and the
> mess propagates up into the SSE-200 class, which is also
> no longer able to be configured in the normal way by setting
> properties on it, and it becomes visible in user-facing
> command line stuff.
>
> I think our object model pretty strongly wants "create object;
> set properties on it that only affect this object you created;
> realize it", and having one particular subset of objects that
> doesn't work the same way is going to be very confusing.

What about cloning objects after they are realised? After all that is
what we do for the core CPUClass in user-mode.

>
> thanks
> -- PMM

Richard Henderson Jan. 6, 2023, 11:57 p.m. UTC | #7

On 1/6/23 15:43, Alex Bennée wrote:
> What about cloning objects after they are realised? After all that is
> what we do for the core CPUClass in user-mode.

No we don't.  Where do you get that idea?


r~

Alex Bennée Jan. 7, 2023, 10:19 a.m. UTC | #8

Richard Henderson <richard.henderson@linaro.org> writes:

> On 1/6/23 15:43, Alex Bennée wrote:
>> What about cloning objects after they are realised? After all that is
>> what we do for the core CPUClass in user-mode.
>
> No we don't.  Where do you get that idea?

Well linux-user does cpu_copy which involves a create step followed by a
reset and then a bunch of copying state across. Can we assume all CPUs
get reset before they are actively used?

Would it be too hacky to defer the creation of those hash tables to the
reset phase and skip it if already defined?

Peter Maydell Jan. 7, 2023, 1:14 p.m. UTC | #9

On Fri, 6 Jan 2023 at 22:28, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 1/6/23 13:59, Peter Maydell wrote:
> > We also set some properties in code -- eg aspeed_ast2600.c clears
> > the 'neon' property on its CPUs, lots of the boards clear
> > has_el3 and has_el2, etc.
>
> Yes indeed, but in all of those cases we want all of the cpus to act identically.  Those
> are all easily handled (patches 35, 36, 38).

That's just a happenstance of how the boards create them.
There's no inherent reason that every CPU of a particular
type in the system has to have identical properties.

> > I hadn't got as far as patch 29, but
> > looking at it now that looks like a pretty strong indication
> > that this is the wrong way to go. It creates 3 extra
> > cortex-m33 CPU classes, and if we find another thing that
> > ought to be a CPU property then we'll be up to 8;
>
> If we find another thing that needs to be different between cpus, you mean?

Yes. But conceptually we already have lots of those, we just
happen not to be using them right this instant.

> > I think our object model pretty strongly wants "create object;
> > set properties on it that only affect this object you created;
> > realize it", and having one particular subset of objects that
> > doesn't work the same way is going to be very confusing.
>
> Eh, I didn't think it's particularly confusing as a concept.
> The code is rough, buy what one might expect from an RFC.
>
> We really ought to have *some* solution to not repeating property + feature + isar
> interpretation on a per-cpu basis.  I'd be delighted to hear alternatives.

Hash "cpu type plus property settings plus ID registers",
and look them up to see if we've already created the
cpregs hashtable for an existing CPU?

-- PMM

Richard Henderson Jan. 7, 2023, 5:53 p.m. UTC | #10

On 1/7/23 02:19, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> On 1/6/23 15:43, Alex Bennée wrote:
>>> What about cloning objects after they are realised? After all that is
>>> what we do for the core CPUClass in user-mode.
>>
>> No we don't.  Where do you get that idea?
> 
> Well linux-user does cpu_copy which involves a create step followed by a
> reset and then a bunch of copying state across. Can we assume all CPUs
> get reset before they are actively used?

The hash table creation happens during qdev_realize, there in cpu_create.

> Would it be too hacky to defer the creation of those hash tables to the
> reset phase and skip it if already defined?

Even then you have the copy after the reset, so no, that won't work.


r~

[RFC,00/40] Toward class init of cpu features

Message

Comments