
[RFC,00/12] hw/arm/virt: Introduce cpu and cache topology support

Message ID 20200917032033.2020-1-fangying1@huawei.com

Message

Ying Fang Sept. 17, 2020, 3:20 a.m. UTC
An accurate cpu topology may help improve the cpu scheduler's decision
making when dealing with multi-core systems. So a cpu topology description
is helpful to provide the guest with the right view. Cpu cache information may
also have a slight impact on the sched domain, and even userspace software
may check the cpu cache information to do some optimizations. Thus this patch
series is posted to provide cpu and cache topology support for arm.

To make the cpu topology consistent with MPIDR, a vcpu ioctl
KVM_ARM_SET_MP_AFFINITY is introduced so that userspace can set MPIDR
according to the topology specified [1]. To describe the cpu topology,
both fdt and ACPI are supported. To describe the cpu cache information,
a default cache hierarchy is given and can be made configurable later.
The cpu topology is built according to the processor hierarchy node structure.
The cpu cache information is built according to the cache type structure.

This patch series is partially based on the patches posted by Andrew Jones
years ago [2]. I jumped in on it since some OS vendor cooperative partners
are eager for it. Thanks for Andrew's contribution. Please feel free to reply
to me if there is anything improper.

[1] https://patchwork.kernel.org/cover/11781317
[2] https://patchwork.ozlabs.org/project/qemu-devel/cover/20180704124923.32483-1-drjones@redhat.com

Andrew Jones (2):
  device_tree: add qemu_fdt_add_path
  hw/arm/virt: DT: add cpu-map

Ying Fang (10):
  linux headers: Update linux header with KVM_ARM_SET_MP_AFFINITY
  target/arm/kvm64: make MPIDR consistent with CPU Topology
  target/arm/kvm32: make MPIDR consistent with CPU Topology
  hw/arm/virt-acpi-build: distinguish possible and present cpus
  hw/acpi/aml-build: add processor hierarchy node structure
  hw/arm/virt-acpi-build: add PPTT table
  target/arm/cpu: Add CPU cache description for arm
  hw/arm/virt: add fdt cache information
  hw/acpi/aml-build: build ACPI CPU cache topology information
  hw/arm/virt-acpi-build: Enable CPU cache topology

 device_tree.c                |  24 +++++++
 hw/acpi/aml-build.c          |  68 +++++++++++++++++++
 hw/arm/virt-acpi-build.c     |  99 +++++++++++++++++++++++++--
 hw/arm/virt.c                | 128 ++++++++++++++++++++++++++++++++++-
 include/hw/acpi/acpi-defs.h  |  14 ++++
 include/hw/acpi/aml-build.h  |  11 +++
 include/hw/arm/virt.h        |   1 +
 include/sysemu/device_tree.h |   1 +
 linux-headers/linux/kvm.h    |   3 +
 target/arm/cpu.c             |  42 ++++++++++++
 target/arm/cpu.h             |  27 ++++++++
 target/arm/kvm32.c           |  46 ++++++++++---
 target/arm/kvm64.c           |  46 ++++++++++---
 13 files changed, 488 insertions(+), 22 deletions(-)

Comments

Andrew Jones Sept. 17, 2020, 8:07 a.m. UTC | #1
On Thu, Sep 17, 2020 at 11:20:24AM +0800, Ying Fang wrote:
> MPIDR helps to provide an additional PE identification in a multiprocessor
> system. This patch adds support for setting MPIDR from userspace, so that
> MPIDR is consistent with CPU topology configured.
> 
> Signed-off-by: Ying Fang <fangying1@huawei.com>
> ---
>  target/arm/kvm32.c | 46 ++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 38 insertions(+), 8 deletions(-)
> 
> diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
> index 0af46b41c8..85694dc8bf 100644
> --- a/target/arm/kvm32.c
> +++ b/target/arm/kvm32.c

This file no longer exists in mainline. Please rebase the whole series.

Thanks,
drew
Andrew Jones Sept. 17, 2020, 8:27 a.m. UTC | #2
On Thu, Sep 17, 2020 at 11:20:28AM +0800, Ying Fang wrote:
> Add the processor hierarchy node structures to build ACPI information
> for CPU topology. Three helpers are introduced:
> 
> (1) build_socket_hierarchy for socket description structure
> (2) build_processor_hierarchy for processor description structure
> (3) build_smt_hierarchy for thread (logic processor) description structure
> 
> Signed-off-by: Ying Fang <fangying1@huawei.com>
> Signed-off-by: Henglong Fan <fanhenglong@huawei.com>
> ---
>  hw/acpi/aml-build.c         | 37 +++++++++++++++++++++++++++++++++++++
>  include/hw/acpi/aml-build.h |  7 +++++++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index f6fbc9b95d..13eb6e1345 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -1754,6 +1754,43 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms)
>                   table_data->len - slit_start, 1, NULL, NULL);
>  }
>  
> +/*
> + * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0)
> + */
> +void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
> +{
> +    build_append_byte(tbl, 0);          /* Type 0 - processor */
> +    build_append_byte(tbl, 20);         /* Length, no private resources */
> +    build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
> +    build_append_int_noprefix(tbl, 1, 4);  /* Flags: Physical package */
> +    build_append_int_noprefix(tbl, parent, 4);  /* Parent */
> +    build_append_int_noprefix(tbl, id, 4);     /* ACPI processor ID */
> +    build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
> +}
> +
> +void build_processor_hierarchy(GArray *tbl, uint32_t flags,
> +                               uint32_t parent, uint32_t id)
> +{
> +    build_append_byte(tbl, 0);          /* Type 0 - processor */
> +    build_append_byte(tbl, 20);         /* Length, no private resources */
> +    build_append_int_noprefix(tbl, 0, 2);      /* Reserved */
> +    build_append_int_noprefix(tbl, flags, 4);  /* Flags */
> +    build_append_int_noprefix(tbl, parent, 4); /* Parent */
> +    build_append_int_noprefix(tbl, id, 4);     /* ACPI processor ID */
> +    build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
> +}

I see you took this from
https://patchwork.ozlabs.org/project/qemu-devel/patch/20180704124923.32483-6-drjones@redhat.com/
(even though you neglected to mention that)

I've tweaked my implementation of it slightly per Igor's comments for the
refresh. See build_processor_hierarchy_node() in
https://github.com/rhdrjones/qemu/commit/439b38d67ca1f2cbfa5b9892a822b651ebd05c11

> +
> +void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
> +{
> +    build_append_byte(tbl, 0);            /* Type 0 - processor */
> +    build_append_byte(tbl, 20);           /* Length, add private resources */
> +    build_append_int_noprefix(tbl, 0, 2); /* Reserved */
> +    build_append_int_noprefix(tbl, 0x0e, 4);    /* Processor is a thread */
> +    build_append_int_noprefix(tbl, parent , 4); /* parent */
> +    build_append_int_noprefix(tbl, id, 4);      /* ACPI processor ID */
> +    build_append_int_noprefix(tbl, 0, 4);       /* Num of private resources */
> +}
> +
>  /* build rev1/rev3/rev5.1 FADT */
>  void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
>                  const char *oem_id, const char *oem_table_id)
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index d27da03d64..ff4c6a38f3 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -435,6 +435,13 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
>  
>  void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms);
>  
> +void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
> +
> +void build_processor_hierarchy(GArray *tbl, uint32_t flags,
> +                               uint32_t parent, uint32_t id);
> +
> +void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);

Why add build_socket_hierarchy() and build_smt_hierarchy() ?

> +
>  void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
>                  const char *oem_id, const char *oem_table_id);
>  
> -- 
> 2.23.0
> 

Thanks,
drew
Andrew Jones Sept. 17, 2020, 8:39 a.m. UTC | #3
On Thu, Sep 17, 2020 at 11:20:30AM +0800, Ying Fang wrote:
> Add the CPUCacheInfo structure to hold CPU cache information for ARM cpus.
> A classic three level cache topology is used here. The default cache
> capacity is given and userspace can overwrite these values.

Doesn't TCG already have some sort of fake cache hierarchy? If so, then
we shouldn't be adding another one, we should be simply describing the
one we already have. For KVM, we shouldn't describe anything other than
what is actually on the host.

drew
Ying Fang Sept. 17, 2020, 1:26 p.m. UTC | #4
On 9/17/2020 4:07 PM, Andrew Jones wrote:
> On Thu, Sep 17, 2020 at 11:20:24AM +0800, Ying Fang wrote:
>> MPIDR helps to provide an additional PE identification in a multiprocessor
>> system. This patch adds support for setting MPIDR from userspace, so that
>> MPIDR is consistent with CPU topology configured.
>>
>> Signed-off-by: Ying Fang <fangying1@huawei.com>
>> ---
>>   target/arm/kvm32.c | 46 ++++++++++++++++++++++++++++++++++++++--------
>>   1 file changed, 38 insertions(+), 8 deletions(-)
>>
>> diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
>> index 0af46b41c8..85694dc8bf 100644
>> --- a/target/arm/kvm32.c
>> +++ b/target/arm/kvm32.c
>
> This file no longer exists in mainline. Please rebase the whole series.

Thanks, it is gone. Will rebase it.

> Thanks,
> drew
Ying Fang Sept. 17, 2020, 2:03 p.m. UTC | #5
On 9/17/2020 4:27 PM, Andrew Jones wrote:
> On Thu, Sep 17, 2020 at 11:20:28AM +0800, Ying Fang wrote:
>> Add the processor hierarchy node structures to build ACPI information
>> for CPU topology. Three helpers are introduced:
>>
>> (1) build_socket_hierarchy for socket description structure
>> (2) build_processor_hierarchy for processor description structure
>> (3) build_smt_hierarchy for thread (logic processor) description structure
>>
>> Signed-off-by: Ying Fang <fangying1@huawei.com>
>> Signed-off-by: Henglong Fan <fanhenglong@huawei.com>
>> ---
>>   hw/acpi/aml-build.c         | 37 +++++++++++++++++++++++++++++++++++++
>>   include/hw/acpi/aml-build.h |  7 +++++++
>>   2 files changed, 44 insertions(+)
>>
>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>> index f6fbc9b95d..13eb6e1345 100644
>> --- a/hw/acpi/aml-build.c
>> +++ b/hw/acpi/aml-build.c
>> @@ -1754,6 +1754,43 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms)
>>                    table_data->len - slit_start, 1, NULL, NULL);
>>   }
>>
>> +/*
>> + * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0)
>> + */
>> +void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
>> +{
>> +    build_append_byte(tbl, 0);          /* Type 0 - processor */
>> +    build_append_byte(tbl, 20);         /* Length, no private resources */
>> +    build_append_int_noprefix(tbl, 0, 2);  /* Reserved */
>> +    build_append_int_noprefix(tbl, 1, 4);  /* Flags: Physical package */
>> +    build_append_int_noprefix(tbl, parent, 4);  /* Parent */
>> +    build_append_int_noprefix(tbl, id, 4);     /* ACPI processor ID */
>> +    build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
>> +}
>> +
>> +void build_processor_hierarchy(GArray *tbl, uint32_t flags,
>> +                               uint32_t parent, uint32_t id)
>> +{
>> +    build_append_byte(tbl, 0);          /* Type 0 - processor */
>> +    build_append_byte(tbl, 20);         /* Length, no private resources */
>> +    build_append_int_noprefix(tbl, 0, 2);      /* Reserved */
>> +    build_append_int_noprefix(tbl, flags, 4);  /* Flags */
>> +    build_append_int_noprefix(tbl, parent, 4); /* Parent */
>> +    build_append_int_noprefix(tbl, id, 4);     /* ACPI processor ID */
>> +    build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
>> +}
>
> I see you took this from
> https://patchwork.ozlabs.org/project/qemu-devel/patch/20180704124923.32483-6-drjones@redhat.com/
> (even though you neglected to mention that)
>
> I've tweaked my implementation of it slightly per Igor's comments for the
> refresh. See build_processor_hierarchy_node() in
> https://github.com/rhdrjones/qemu/commit/439b38d67ca1f2cbfa5b9892a822b651ebd05c11

Ok, I will sync with your work and test it.

>> +
>> +void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
>> +{
>> +    build_append_byte(tbl, 0);            /* Type 0 - processor */
>> +    build_append_byte(tbl, 20);           /* Length, add private resources */
>> +    build_append_int_noprefix(tbl, 0, 2); /* Reserved */
>> +    build_append_int_noprefix(tbl, 0x0e, 4);    /* Processor is a thread */
>> +    build_append_int_noprefix(tbl, parent , 4); /* parent */
>> +    build_append_int_noprefix(tbl, id, 4);      /* ACPI processor ID */
>> +    build_append_int_noprefix(tbl, 0, 4);       /* Num of private resources */
>> +}
>> +
>>   /* build rev1/rev3/rev5.1 FADT */
>>   void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
>>                   const char *oem_id, const char *oem_table_id)
>> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
>> index d27da03d64..ff4c6a38f3 100644
>> --- a/include/hw/acpi/aml-build.h
>> +++ b/include/hw/acpi/aml-build.h
>> @@ -435,6 +435,13 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
>>
>>   void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms);
>>
>> +void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
>> +
>> +void build_processor_hierarchy(GArray *tbl, uint32_t flags,
>> +                               uint32_t parent, uint32_t id);
>> +
>> +void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
>
> Why add build_socket_hierarchy() and build_smt_hierarchy() ?

To distinguish between the socket, core and thread topology levels,
build_socket_hierarchy and build_smt_hierarchy are introduced.
They make the logic in build_pptt much more straightforward, I think.

>> +
>>   void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
>>                   const char *oem_id, const char *oem_table_id);
>>
>> --
>> 2.23.0
>
> Thanks,
> drew
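Andrew's refresh collapses the three helpers into a single parameterized build_processor_hierarchy_node(). A rough sketch of that idea is below, using a plain byte buffer instead of QEMU's GArray-based builders; the TableBuf type and helper names here are illustrative stand-ins, while the node layout follows ACPI 6.3, 5.2.29.1:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for QEMU's GArray-backed table buffer; the real code uses
 * build_append_byte()/build_append_int_noprefix(). */
typedef struct { uint8_t data[64]; size_t len; } TableBuf;

static void append_byte(TableBuf *t, uint8_t v) { t->data[t->len++] = v; }

static void append_int(TableBuf *t, uint64_t v, int size)
{
    for (int i = 0; i < size; i++) {
        append_byte(t, (uint8_t)(v >> (8 * i)));  /* little-endian */
    }
}

/* PPTT processor hierarchy node flags (ACPI 6.3) */
#define PPTT_PHYSICAL_PACKAGE        (1 << 0)
#define PPTT_ACPI_PROCESSOR_ID_VALID (1 << 1)
#define PPTT_PROCESSOR_IS_THREAD     (1 << 2)
#define PPTT_NODE_IS_LEAF            (1 << 3)

/* One helper serves every topology level: the caller picks the flags,
 * e.g. PPTT_PHYSICAL_PACKAGE for a socket, or 0x0e (ID valid | thread
 * | leaf) for a thread node, matching the 0x0e used in the patch. */
static void build_processor_hierarchy_node(TableBuf *tbl, uint32_t flags,
                                           uint32_t parent, uint32_t id)
{
    append_byte(tbl, 0);         /* Type 0 - processor hierarchy node */
    append_byte(tbl, 20);        /* Length, no private resources */
    append_int(tbl, 0, 2);       /* Reserved */
    append_int(tbl, flags, 4);   /* Flags */
    append_int(tbl, parent, 4);  /* Parent */
    append_int(tbl, id, 4);      /* ACPI processor ID */
    append_int(tbl, 0, 4);       /* Number of private resources */
}
```

With this shape, build_pptt() distinguishes socket/core/thread purely by the flags argument rather than by calling three different helpers.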
Ying Fang Sept. 17, 2020, 2:11 p.m. UTC | #6
On 9/17/2020 4:39 PM, Andrew Jones wrote:
> On Thu, Sep 17, 2020 at 11:20:30AM +0800, Ying Fang wrote:
>> Add the CPUCacheInfo structure to hold CPU cache information for ARM cpus.
>> A classic three level cache topology is used here. The default cache
>> capacity is given and userspace can overwrite these values.
>
> Doesn't TCG already have some sort of fake cache hierarchy? If so, then

TCG may have some sort of fake cache hierarchy via CCSIDR.

> we shouldn't be adding another one, we should be simply describing the
> one we already have. For KVM, we shouldn't describe anything other than
> what is actually on the host.

Yes, I agree. The cache capacity should be the same as the host's,
otherwise it may have a bad impact on guest performance. We can do that
by querying it from the host and making the cache capacity configurable
from userspace.

Dario Faggioli is going to give a talk about it at KVM Forum [1].

[1] https://kvmforum2020.sched.com/event/eE1y/virtual-topology-for-virtual-machines-friend-or-foe-dario-faggioli-suse?iframe=no&w=100%&sidebar=yes&bg=no

Thanks.

> drew
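On Linux hosts, the cache properties discussed here are exposed under /sys/devices/system/cpu/cpu*/cache/index*/ in files such as `size`, `level` and `type`, so "querying from the host" largely means parsing those strings. A hedged sketch of turning a sysfs `size` string into a byte count; the helper name is made up for illustration and is not a QEMU or kernel API:

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a sysfs cache "size" string ("32K", "1M", "512") into bytes.
 * Returns 0 on malformed input. Illustrative helper, not a QEMU API. */
static uint64_t cache_size_to_bytes(const char *s)
{
    char *end;
    uint64_t v = strtoull(s, &end, 10);

    if (end == s) {
        return 0;               /* no digits at all */
    }
    switch (toupper((unsigned char)*end)) {
    case '\0':
    case '\n':
        return v;               /* plain byte count */
    case 'K':
        return v * 1024;
    case 'M':
        return v * 1024 * 1024;
    default:
        return 0;               /* unknown suffix */
    }
}
```

The parsed values could then seed CPUCacheInfo defaults instead of hard-coded capacities, with a user-visible property as the override path.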
Salil Mehta Sept. 18, 2020, 12:25 a.m. UTC | #7
> From: Qemu-arm [mailto:qemu-arm-bounces+salil.mehta=huawei.com@nongnu.org]
> On Behalf Of Andrew Jones
> Sent: Thursday, September 17, 2020 9:13 AM
> To: fangying <fangying1@huawei.com>
> Cc: peter.maydell@linaro.org; Zhanghailiang <zhang.zhanghailiang@huawei.com>;
> Alexander Graf <agraf@suse.de>; qemu-devel@nongnu.org; Chenzhendong (alex)
> <alex.chen@huawei.com>; shannon.zhaosl@gmail.com; qemu-arm@nongnu.org;
> alistair.francis@wdc.com; imammedo@redhat.com
> Subject: Re: [RFC PATCH 04/12] device_tree: add qemu_fdt_add_path
> 
> On Thu, Sep 17, 2020 at 11:20:25AM +0800, Ying Fang wrote:
> > From: Andrew Jones <drjones@redhat.com>
> >
> > qemu_fdt_add_path works like qemu_fdt_add_subnode, except it
> > also recursively adds any missing parent nodes.
> >
> > Cc: Peter Crosthwaite <crosthwaite.peter@gmail.com>
> > Cc: Alexander Graf <agraf@suse.de>
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > ---
> >  device_tree.c                | 24 ++++++++++++++++++++++++
> >  include/sysemu/device_tree.h |  1 +
> >  2 files changed, 25 insertions(+)
> >
> > diff --git a/device_tree.c b/device_tree.c
> > index b335dae707..1854be3a02 100644
> > --- a/device_tree.c
> > +++ b/device_tree.c
> > @@ -524,6 +524,30 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
> >      return retval;
> >  }
> >
> > +int qemu_fdt_add_path(void *fdt, const char *path)
> > +{
> > +    char *parent;
> > +    int offset;
> > +
> > +    offset = fdt_path_offset(fdt, path);
> > +    if (offset < 0 && offset != -FDT_ERR_NOTFOUND) {
> > +        error_report("%s Couldn't find node %s: %s", __func__, path,
> > +                     fdt_strerror(offset));
> > +        exit(1);
> > +    }
> > +
> > +    if (offset != -FDT_ERR_NOTFOUND) {
> > +        return offset;
> > +    }
> > +
> > +    parent = g_strdup(path);
> > +    strrchr(parent, '/')[0] = '\0';
> > +    qemu_fdt_add_path(fdt, parent);
> > +    g_free(parent);
> > +
> > +    return qemu_fdt_add_subnode(fdt, path);
> > +}
> 
> Igor didn't like the recursion when I posted this before so I changed
> it when doing the refresh[*] that I gave to Salil Mehta. Salil also
> works for Huawei, are you guys not working together?
> 
> [*] https://github.com/rhdrjones/qemu/commits/virt-cpu-topology-refresh
> 


I was not aware of this work going on. I am based in Cambridge and Fangying
is in Hangzhou, and we work for different teams.

Yes, I have it and have integrated it with the Virtual CPU hotplug branch
and I am testing them.

I have also rebased the virtual CPU hotplug patches and integrated the PMU
related changes recently discussed in another mail chain. Plus, I am
also dealing with the MPIDR/affinity part in the kernel, which was
discussed earlier with Marc Zyngier.

It looks like some of the changes are common here, like the ability to set
the MPIDR of the vcpus at the time of their creation inside KVM, and the
PPTT table changes which were present in your refresh branch as well.
Changes related to the possible and present vcpus might need a sync as well.

@Andrew, should I take out the parts which are common and affect the
virtual CPU hotplug and push them separately? That way I can have part
of the changes related to vcpu hotplug done, which will also cover
this patch set's requirements as well.

@Fangying, will this work for you?


Salil.
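For reference, the non-recursive shape Igor asked for (mentioned in the quoted reply above) can be built around a loop that yields each ancestor prefix of the path in turn; the real qemu_fdt_add_path() would then call fdt_path_offset()/qemu_fdt_add_subnode() on each prefix. The helper below shows only the path walk itself and is an illustrative sketch, not the code from the refresh branch:

```c
#include <stddef.h>
#include <string.h>

/* Return the length of the next ancestor prefix of an absolute path,
 * given the length of the previous one (start with 0).  For
 * "/cpus/cpu-map/socket0" it yields 5 ("/cpus"), 13 ("/cpus/cpu-map"),
 * 21 (the whole path), then 0 to signal the end. */
static size_t next_prefix_len(const char *path, size_t prev)
{
    const char *p;

    if (prev >= strlen(path)) {
        return 0;                      /* all components consumed */
    }
    p = strchr(path + prev + 1, '/');  /* find the next separator */
    return p ? (size_t)(p - path) : strlen(path);
}
```

Each returned length can NUL-terminate a scratch copy of the path, giving "/cpus", then "/cpus/cpu-map", and so on, so missing parents are added top-down in a simple while loop instead of by recursion.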
Andrew Jones Sept. 18, 2020, 6:06 a.m. UTC | #8
On Fri, Sep 18, 2020 at 12:25:19AM +0000, Salil Mehta wrote:
> > From: Qemu-arm [mailto:qemu-arm-bounces+salil.mehta=huawei.com@nongnu.org]
> > On Behalf Of Andrew Jones
> > Sent: Thursday, September 17, 2020 9:13 AM
> > To: fangying <fangying1@huawei.com>
> > Cc: peter.maydell@linaro.org; Zhanghailiang <zhang.zhanghailiang@huawei.com>;
> > Alexander Graf <agraf@suse.de>; qemu-devel@nongnu.org; Chenzhendong (alex)
> > <alex.chen@huawei.com>; shannon.zhaosl@gmail.com; qemu-arm@nongnu.org;
> > alistair.francis@wdc.com; imammedo@redhat.com
> > Subject: Re: [RFC PATCH 04/12] device_tree: add qemu_fdt_add_path
> > 
> > On Thu, Sep 17, 2020 at 11:20:25AM +0800, Ying Fang wrote:
> > > From: Andrew Jones <drjones@redhat.com>
> > >
> > > qemu_fdt_add_path works like qemu_fdt_add_subnode, except it
> > > also recursively adds any missing parent nodes.
> > >
> > > Cc: Peter Crosthwaite <crosthwaite.peter@gmail.com>
> > > Cc: Alexander Graf <agraf@suse.de>
> > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > ---
> > >  device_tree.c                | 24 ++++++++++++++++++++++++
> > >  include/sysemu/device_tree.h |  1 +
> > >  2 files changed, 25 insertions(+)
> > >
> > > diff --git a/device_tree.c b/device_tree.c
> > > index b335dae707..1854be3a02 100644
> > > --- a/device_tree.c
> > > +++ b/device_tree.c
> > > @@ -524,6 +524,30 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
> > >      return retval;
> > >  }
> > >
> > > +int qemu_fdt_add_path(void *fdt, const char *path)
> > > +{
> > > +    char *parent;
> > > +    int offset;
> > > +
> > > +    offset = fdt_path_offset(fdt, path);
> > > +    if (offset < 0 && offset != -FDT_ERR_NOTFOUND) {
> > > +        error_report("%s Couldn't find node %s: %s", __func__, path,
> > > +                     fdt_strerror(offset));
> > > +        exit(1);
> > > +    }
> > > +
> > > +    if (offset != -FDT_ERR_NOTFOUND) {
> > > +        return offset;
> > > +    }
> > > +
> > > +    parent = g_strdup(path);
> > > +    strrchr(parent, '/')[0] = '\0';
> > > +    qemu_fdt_add_path(fdt, parent);
> > > +    g_free(parent);
> > > +
> > > +    return qemu_fdt_add_subnode(fdt, path);
> > > +}
> > 
> > Igor didn't like the recursion when I posted this before so I changed
> > it when doing the refresh[*] that I gave to Salil Mehta. Salil also
> > works for Huawei, are you guys not working together?
> > 
> > [*] https://github.com/rhdrjones/qemu/commits/virt-cpu-topology-refresh
> 
> I was not aware of this work going on. I am based at Cambridge and Fangying
> in Hangzhou and work for different teams.
> 
> Yes, I have it and have integrated it with the Virtual CPU hotplug branch
> and I am testing them.
> 
> I have also rebased virtual cpu hotplug patches and integrated the PMU
> related changes recently been discussed in other mail-chain. Plus, I am
> also dealing with the MPIDR/affinity part from the Kernel which has been
> discussed earlier with the Marc Zyngier.
> 
> It looks some of the changes are common here like ability to set MPIDR
> of the vcpus at the time of their creation inside KVM. And the PPTT
> table changes which were present in your refresh branch as well. Changes
> related to the possible and present vcpus might need a sync as well.
> 
> @Andrew, should I take out the part which is common and affects the
> virtual cpu hotplug and push them separately. This way I can have part
> of the changes related to the vcpu hotplug done which will also cover
> this patch-set requirements as well?

Whatever works best for you and Ying Fang. It looks like this series
only focuses on topology. It's not considering present and possible
cpus, but it probably should. It also adds the cache hierarchy stuff,
but I'm not sure it's approaching that in the right way. I think it
may make sense to put this series on hold and take another look at
your hotplug series when it's reposted before deciding what to do.

Thanks,
drew

> @Fangying, will this work for you?
> 
> Salil.
Salil Mehta Sept. 18, 2020, 4:58 p.m. UTC | #9
> From: Andrew Jones [mailto:drjones@redhat.com]
> Sent: Friday, September 18, 2020 7:07 AM
> 
> On Fri, Sep 18, 2020 at 12:25:19AM +0000, Salil Mehta wrote:
> >
> > > From: Qemu-arm
> [mailto:qemu-arm-bounces+salil.mehta=huawei.com@nongnu.org]
> > > On Behalf Of Andrew Jones
> > > Sent: Thursday, September 17, 2020 9:13 AM
> > > To: fangying <fangying1@huawei.com>
> > > Cc: peter.maydell@linaro.org; Zhanghailiang
> <zhang.zhanghailiang@huawei.com>;
> > > Alexander Graf <agraf@suse.de>; qemu-devel@nongnu.org; Chenzhendong (alex)
> > > <alex.chen@huawei.com>; shannon.zhaosl@gmail.com; qemu-arm@nongnu.org;
> > > alistair.francis@wdc.com; imammedo@redhat.com
> > > Subject: Re: [RFC PATCH 04/12] device_tree: add qemu_fdt_add_path
> > >
> > > On Thu, Sep 17, 2020 at 11:20:25AM +0800, Ying Fang wrote:
> > > > From: Andrew Jones <drjones@redhat.com>
> > > >
> > > > qemu_fdt_add_path works like qemu_fdt_add_subnode, except it
> > > > also recursively adds any missing parent nodes.
> > > >
> > > > Cc: Peter Crosthwaite <crosthwaite.peter@gmail.com>
> > > > Cc: Alexander Graf <agraf@suse.de>
> > > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > > ---
> > > >  device_tree.c                | 24 ++++++++++++++++++++++++
> > > >  include/sysemu/device_tree.h |  1 +
> > > >  2 files changed, 25 insertions(+)
> > > >
> > > > diff --git a/device_tree.c b/device_tree.c
> > > > index b335dae707..1854be3a02 100644
> > > > --- a/device_tree.c
> > > > +++ b/device_tree.c
> > > > @@ -524,6 +524,30 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
> > > >      return retval;
> > > >  }
> > > >
> > > > +int qemu_fdt_add_path(void *fdt, const char *path)
> > > > +{
> > > > +    char *parent;
> > > > +    int offset;
> > > > +
> > > > +    offset = fdt_path_offset(fdt, path);
> > > > +    if (offset < 0 && offset != -FDT_ERR_NOTFOUND) {
> > > > +        error_report("%s Couldn't find node %s: %s", __func__, path,
> > > > +                     fdt_strerror(offset));
> > > > +        exit(1);
> > > > +    }
> > > > +
> > > > +    if (offset != -FDT_ERR_NOTFOUND) {
> > > > +        return offset;
> > > > +    }
> > > > +
> > > > +    parent = g_strdup(path);
> > > > +    strrchr(parent, '/')[0] = '\0';
> > > > +    qemu_fdt_add_path(fdt, parent);
> > > > +    g_free(parent);
> > > > +
> > > > +    return qemu_fdt_add_subnode(fdt, path);
> > > > +}
> > >
> > > Igor didn't like the recursion when I posted this before so I changed
> > > it when doing the refresh[*] that I gave to Salil Mehta. Salil also
> > > works for Huawei, are you guys not working together?
> > >
> > > [*] https://github.com/rhdrjones/qemu/commits/virt-cpu-topology-refresh
> > >
> >
> >
> > I was not aware of this work going on. I am based at Cambridge and Fangying
> > in Hangzhou and work for different teams.
> >
> > Yes, I have it and have integrated it with the Virtual CPU hotplug branch
> > and I am testing them.
> >
> > I have also rebased virtual cpu hotplug patches and integrated the PMU
> > related changes recently been discussed in other mail-chain. Plus, I am
> > also dealing with the MPIDR/affinity part from the Kernel which has been
> > discussed earlier with the Marc Zyngier.
> >
> > It looks some of the changes are common here like ability to set MPIDR
> > of the vcpus at the time of their creation inside KVM. And the PPTT
> > table changes which were present in your refresh branch as well. Changes
> > related to the possible and present vcpus might need a sync as well.
> >
> > @Andrew, should I take out the part which is common and affects the
> > virtual cpu hotplug and push them separately. This way I can have part
> > of the changes related to the vcpu hotplug done which will also cover
> > this patch-set requirements as well?
> 
> Whatever works best for you and Ying Fang. It looks like this series
> only focuses on topology. It's not considering present and possible
> cpus, but it probably should. It also adds the cache hierarchy stuff,
> but I'm not sure it's approaching that in the right way. I think it
> may make sense to put this series on hold and take another look at
> your hotplug series when it's reposted before deciding what to do.


Ok fine. Let me collaborate with him internally first. Either of us
will have to rebase our patches on the other's code.


thanks


> 
> Thanks,
> drew
> 
> >
> > @Fangying, will this work for you?
> >
> >
> > Salil.
> >
> >
Zeng Tao Oct. 13, 2020, 12:11 p.m. UTC | #10
Cc valentin

> -----Original Message-----
> From: Qemu-devel
> [mailto:qemu-devel-bounces+prime.zeng=hisilicon.com@nongnu.org]
> On Behalf Of Ying Fang
> Sent: Thursday, September 17, 2020 11:20 AM
> To: qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; drjones@redhat.com; Zhanghailiang;
> Chenzhendong (alex); shannon.zhaosl@gmail.com;
> qemu-arm@nongnu.org; alistair.francis@wdc.com; fangying;
> imammedo@redhat.com
> Subject: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache topology support
> 
> An accurate cpu topology may help improve the cpu scheduler's decision
> making when dealing with multi-core system. So cpu topology description
> is helpful to provide guest with the right view. Cpu cache information may
> also have slight impact on the sched domain, and even userspace software
> may check the cpu cache information to do some optimizations. Thus this patch
> series is posted to provide cpu and cache topology support for arm.
> 
> To make the cpu topology consistent with MPIDR, an vcpu ioctl

For aarch64, the cpu topology doesn't depend on the MPIDR.
See https://patchwork.kernel.org/patch/11744387/ 

> KVM_ARM_SET_MP_AFFINITY is introduced so that userspace can set MPIDR
> according to the topology specified [1]. To describe the cpu topology
> both fdt and ACPI are supported. To describe the cpu cache information,
> a default cache hierarchy is given and can be made configurable later.
> The cpu topology is built according to processor hierarchy node structure.
> The cpu cache information is built according to cache type structure.
> 
> This patch series is partially based on the patches posted by Andrew Jones
> years ago [2], I jumped in on it since some OS vendor cooperative partners
> are eager for it. Thanks for Andrew's contribution. Please feel free to reply
> to me if there is anything improper.
> 
> [1] https://patchwork.kernel.org/cover/11781317
> [2] https://patchwork.ozlabs.org/project/qemu-devel/cover/20180704124923.32483-1-drjones@redhat.com
> 
> Andrew Jones (2):
>   device_tree: add qemu_fdt_add_path
>   hw/arm/virt: DT: add cpu-map
> 
> Ying Fang (10):
>   linux headers: Update linux header with KVM_ARM_SET_MP_AFFINITY
>   target/arm/kvm64: make MPIDR consistent with CPU Topology
>   target/arm/kvm32: make MPIDR consistent with CPU Topology
>   hw/arm/virt-acpi-build: distinguish possible and present cpus
>   hw/acpi/aml-build: add processor hierarchy node structure
>   hw/arm/virt-acpi-build: add PPTT table
>   target/arm/cpu: Add CPU cache description for arm
>   hw/arm/virt: add fdt cache information
>   hw/acpi/aml-build: build ACPI CPU cache topology information
>   hw/arm/virt-acpi-build: Enable CPU cache topology
> 
>  device_tree.c                |  24 +++++++
>  hw/acpi/aml-build.c          |  68 +++++++++++++++++++
>  hw/arm/virt-acpi-build.c     |  99 +++++++++++++++++++++++++--
>  hw/arm/virt.c                | 128 ++++++++++++++++++++++++++++++++++-
>  include/hw/acpi/acpi-defs.h  |  14 ++++
>  include/hw/acpi/aml-build.h  |  11 +++
>  include/hw/arm/virt.h        |   1 +
>  include/sysemu/device_tree.h |   1 +
>  linux-headers/linux/kvm.h    |   3 +
>  target/arm/cpu.c             |  42 ++++++++++++
>  target/arm/cpu.h             |  27 ++++++++
>  target/arm/kvm32.c           |  46 ++++++++++---
>  target/arm/kvm64.c           |  46 ++++++++++---
>  13 files changed, 488 insertions(+), 22 deletions(-)
> 
> --
> 2.23.0
Andrew Jones Oct. 13, 2020, 6:08 p.m. UTC | #11
On Tue, Oct 13, 2020 at 12:11:20PM +0000, Zengtao (B) wrote:
> Cc valentin
> 
> > -----Original Message-----
> > From: Qemu-devel
> > [mailto:qemu-devel-bounces+prime.zeng=hisilicon.com@nongnu.org]
> > On Behalf Of Ying Fang
> > Sent: Thursday, September 17, 2020 11:20 AM
> > To: qemu-devel@nongnu.org
> > Cc: peter.maydell@linaro.org; drjones@redhat.com; Zhanghailiang;
> > Chenzhendong (alex); shannon.zhaosl@gmail.com;
> > qemu-arm@nongnu.org; alistair.francis@wdc.com; fangying;
> > imammedo@redhat.com
> > Subject: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache topology support
> > 
> > An accurate cpu topology may help improve the cpu scheduler's decision
> > making when dealing with multi-core system. So cpu topology description
> > is helpful to provide guest with the right view. Cpu cache information may
> > also have slight impact on the sched domain, and even userspace software
> > may check the cpu cache information to do some optimizations. Thus this patch
> > series is posted to provide cpu and cache topology support for arm.
> > 
> > To make the cpu topology consistent with MPIDR, an vcpu ioctl
> 
> For aarch64, the cpu topology don't depends on the MPDIR.
> See https://patchwork.kernel.org/patch/11744387/ 
>

The topology should not be inferred from the MPIDR Aff fields,
but MPIDR is the CPU identifier. When describing a topology
with ACPI or DT the CPU elements in the topology description
must map to actual CPUs. MPIDR is that mapping link. KVM
currently determines what the MPIDR of a VCPU is. If KVM
userspace is going to determine the VCPU topology, then it
also needs control over the MPIDR values, otherwise it
becomes quite messy trying to get the mapping right.

Thanks,
drew
Ying Fang Oct. 15, 2020, 2:07 a.m. UTC | #12
On 10/14/2020 2:08 AM, Andrew Jones wrote:
> On Tue, Oct 13, 2020 at 12:11:20PM +0000, Zengtao (B) wrote:
>> Cc valentin
>>
>>> -----Original Message-----
>>> From: Qemu-devel
>>> [mailto:qemu-devel-bounces+prime.zeng=hisilicon.com@nongnu.org]
>>> On Behalf Of Ying Fang
>>> Sent: Thursday, September 17, 2020 11:20 AM
>>> To: qemu-devel@nongnu.org
>>> Cc: peter.maydell@linaro.org; drjones@redhat.com; Zhanghailiang;
>>> Chenzhendong (alex); shannon.zhaosl@gmail.com;
>>> qemu-arm@nongnu.org; alistair.francis@wdc.com; fangying;
>>> imammedo@redhat.com
>>> Subject: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache
>>> topology support
>>>
>>> An accurate cpu topology may help improve the cpu scheduler's
>>> decision
>>> making when dealing with multi-core system. So cpu topology
>>> description
>>> is helpful to provide guest with the right view. Cpu cache information
>>> may
>>> also have slight impact on the sched domain, and even userspace
>>> software
>>> may check the cpu cache information to do some optimizations. Thus
>>> this patch
>>> series is posted to provide cpu and cache topology support for arm.
>>>
>>> To make the cpu topology consistent with MPIDR, an vcpu ioctl
>>
>> For aarch64, the cpu topology don't depends on the MPDIR.
>> See https://patchwork.kernel.org/patch/11744387/
>>
> 
> The topology should not be inferred from the MPIDR Aff fields,

MPIDR is abused by ARM OEM manufacturers. It is only used as an
identifier for a specific cpu, not a representation of the topology.

> but MPIDR is the CPU identifier. When describing a topology
> with ACPI or DT the CPU elements in the topology description
> must map to actual CPUs. MPIDR is that mapping link. KVM
> currently determines what the MPIDR of a VCPU is. If KVM

KVM currently assigns the MPIDR from vcpu->vcpu_id, which is
mapped into affinity levels. See reset_mpidr in sys_regs.c.
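For reference, that scheme can be sketched as follows. This is an illustrative model of the mapping described above, not the kernel source; the helper name is invented:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative sketch (not the kernel source) of how reset_mpidr
 * derives MPIDR_EL1 from the vcpu_id: the id is sliced into the
 * first three affinity fields, with Aff0 limited to 0..15 so that
 * a GICv3 SGI TargetList can address every peer in a cluster.
 */
static uint64_t mpidr_from_vcpu_id(uint32_t vcpu_id)
{
    uint64_t mpidr;

    mpidr  = (uint64_t)(vcpu_id & 0x0f);                /* Aff0 */
    mpidr |= (uint64_t)((vcpu_id >> 4) & 0xff) << 8;    /* Aff1 */
    mpidr |= (uint64_t)((vcpu_id >> 12) & 0xff) << 16;  /* Aff2 */
    return (1ULL << 31) | mpidr;                        /* bit 31 is RES1 */
}
```

So vcpu_id 17, for example, lands in the second cluster slot of the second Aff1 group (Aff0 = 1, Aff1 = 1).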

> userspace is going to determine the VCPU topology, then it
> also needs control over the MPIDR values, otherwise it
> becomes quite messy trying to get the mapping right.
If we are going to control MPIDR, shall we assign MPIDR from
vcpu_id, map the topology hierarchy into affinity levels, or
use some other mapping scheme?

> 
> Thanks,
> drew
> 
> .
> 
Thanks,
Ying.
Andrew Jones Oct. 15, 2020, 7:59 a.m. UTC | #13
On Thu, Oct 15, 2020 at 10:07:16AM +0800, Ying Fang wrote:
> 
> 
> On 10/14/2020 2:08 AM, Andrew Jones wrote:
> > On Tue, Oct 13, 2020 at 12:11:20PM +0000, Zengtao (B) wrote:
> > > Cc valentin
> > > 
> > > > -----Original Message-----
> > > > From: Qemu-devel
> > > > [mailto:qemu-devel-bounces+prime.zeng=hisilicon.com@nongnu.org]
> > > > On Behalf Of Ying Fang
> > > > Sent: Thursday, September 17, 2020 11:20 AM
> > > > To: qemu-devel@nongnu.org
> > > > Cc: peter.maydell@linaro.org; drjones@redhat.com; Zhanghailiang;
> > > > Chenzhendong (alex); shannon.zhaosl@gmail.com;
> > > > qemu-arm@nongnu.org; alistair.francis@wdc.com; fangying;
> > > > imammedo@redhat.com
> > > > Subject: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache
> > > > topology support
> > > > 
> > > > An accurate cpu topology may help improve the cpu scheduler's
> > > > decision
> > > > making when dealing with multi-core system. So cpu topology
> > > > description
> > > > is helpful to provide guest with the right view. Cpu cache information
> > > > may
> > > > also have slight impact on the sched domain, and even userspace
> > > > software
> > > > may check the cpu cache information to do some optimizations. Thus
> > > > this patch
> > > > series is posted to provide cpu and cache topology support for arm.
> > > > 
> > > > To make the cpu topology consistent with MPIDR, an vcpu ioctl
> > > 
> > > For aarch64, the cpu topology don't depends on the MPDIR.
> > > See https://patchwork.kernel.org/patch/11744387/
> > > 
> > 
> > The topology should not be inferred from the MPIDR Aff fields,
> 
> MPIDR is abused by ARM OEM manufactures. It is only used as a
> identifer for a specific cpu, not representation of the topology.

Right, which is why I stated topology should not be inferred from
it.

> 
> > but MPIDR is the CPU identifier. When describing a topology
> > with ACPI or DT the CPU elements in the topology description
> > must map to actual CPUs. MPIDR is that mapping link. KVM
> > currently determines what the MPIDR of a VCPU is. If KVM
> 
> KVM currently assigns MPIDR with vcpu->vcpu_id which mapped
> into affinity levels. See reset_mpidr in sys_regs.c

I know, but how KVM assigns MPIDRs today is not really important
to KVM userspace. KVM userspace shouldn't depend on a KVM
algorithm, as it could change.

> 
> > userspace is going to determine the VCPU topology, then it
> > also needs control over the MPIDR values, otherwise it
> > becomes quite messy trying to get the mapping right.
> If we are going to control MPIDR, shall we assign MPIDR with
> vcpu_id or map topology hierarchy into affinity levels or any
> other link schema ?
> 

We can assign them to whatever we want, as long as they're
unique and as long as Aff0 is assigned per the GIC requirements,
e.g. GICv3 requires that Aff0 be from 0 to 0xf. Also, when
pinning VCPUs to PCPUs we should ensure that MPIDRs with matching
Aff3,Aff2,Aff1 fields should actually be peers with respect to
the GIC.

We shouldn't try to encode topology in the MPIDR in any way,
so we might as well simply increment a counter to assign them,
which could possibly be the same as the VCPU ID.

Thanks,
drew
Ying Fang Oct. 16, 2020, 9:40 a.m. UTC | #14
On 10/15/2020 3:59 PM, Andrew Jones wrote:
> On Thu, Oct 15, 2020 at 10:07:16AM +0800, Ying Fang wrote:
>>
>>
>> On 10/14/2020 2:08 AM, Andrew Jones wrote:
>>> On Tue, Oct 13, 2020 at 12:11:20PM +0000, Zengtao (B) wrote:
>>>> Cc valentin
>>>>
>>>>> -----Original Message-----
>>>>> From: Qemu-devel
>>>>> [mailto:qemu-devel-bounces+prime.zeng=hisilicon.com@nongnu.org]
>>>>> On Behalf Of Ying Fang
>>>>> Sent: Thursday, September 17, 2020 11:20 AM
>>>>> To: qemu-devel@nongnu.org
>>>>> Cc: peter.maydell@linaro.org; drjones@redhat.com; Zhanghailiang;
>>>>> Chenzhendong (alex); shannon.zhaosl@gmail.com;
>>>>> qemu-arm@nongnu.org; alistair.francis@wdc.com; fangying;
>>>>> imammedo@redhat.com
>>>>> Subject: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache
>>>>> topology support
>>>>>
>>>>> An accurate cpu topology may help improve the cpu scheduler's
>>>>> decision
>>>>> making when dealing with multi-core system. So cpu topology
>>>>> description
>>>>> is helpful to provide guest with the right view. Cpu cache information
>>>>> may
>>>>> also have slight impact on the sched domain, and even userspace
>>>>> software
>>>>> may check the cpu cache information to do some optimizations. Thus
>>>>> this patch
>>>>> series is posted to provide cpu and cache topology support for arm.
>>>>>
>>>>> To make the cpu topology consistent with MPIDR, an vcpu ioctl
>>>>
>>>> For aarch64, the cpu topology don't depends on the MPDIR.
>>>> See https://patchwork.kernel.org/patch/11744387/
>>>>
>>>
>>> The topology should not be inferred from the MPIDR Aff fields,
>>
>> MPIDR is abused by ARM OEM manufactures. It is only used as a
>> identifer for a specific cpu, not representation of the topology.
> 
> Right, which is why I stated topology should not be inferred from
> it.
> 
>>
>>> but MPIDR is the CPU identifier. When describing a topology
>>> with ACPI or DT the CPU elements in the topology description
>>> must map to actual CPUs. MPIDR is that mapping link. KVM
>>> currently determines what the MPIDR of a VCPU is. If KVM
>>
>> KVM currently assigns MPIDR with vcpu->vcpu_id which mapped
>> into affinity levels. See reset_mpidr in sys_regs.c
> 
> I know, but how KVM assigns MPIDRs today is not really important
> to KVM userspace. KVM userspace shouldn't depend on a KVM
> algorithm, as it could change.
> 
>>
>>> userspace is going to determine the VCPU topology, then it
>>> also needs control over the MPIDR values, otherwise it
>>> becomes quite messy trying to get the mapping right.
>> If we are going to control MPIDR, shall we assign MPIDR with
>> vcpu_id or map topology hierarchy into affinity levels or any
>> other link schema ?
>>
> 
> We can assign them to whatever we want, as long as they're
> unique and as long as Aff0 is assigned per the GIC requirements,
> e.g. GICv3 requires that Aff0 be from 0 to 0xf. Also, when
> pinning VCPUs to PCPUs we should ensure that MPIDRs with matching
> Aff3,Aff2,Aff1 fields should actually be peers with respect to
> the GIC.

Still not clear why the vCPU's MPIDR needs to match the pCPU's
GIC affinity. Maybe I should read the GICv3 spec.

> 
> We shouldn't try to encode topology in the MPIDR in any way,
> so we might as well simply increment a counter to assign them,
> which could possibly be the same as the VCPU ID.

Hmm, then we can leave it as it is.

> 
> Thanks,
> drew
> 
> .
>
Andrew Jones Oct. 16, 2020, 10:07 a.m. UTC | #15
On Fri, Oct 16, 2020 at 05:40:02PM +0800, Ying Fang wrote:
> 
> 
> On 10/15/2020 3:59 PM, Andrew Jones wrote:
> > On Thu, Oct 15, 2020 at 10:07:16AM +0800, Ying Fang wrote:
> > > 
> > > 
> > > On 10/14/2020 2:08 AM, Andrew Jones wrote:
> > > > On Tue, Oct 13, 2020 at 12:11:20PM +0000, Zengtao (B) wrote:
> > > > > Cc valentin
> > > > > 
> > > > > > -----Original Message-----
> > > > > > From: Qemu-devel
> > > > > > [mailto:qemu-devel-bounces+prime.zeng=hisilicon.com@nongnu.org]
> > > > > > On Behalf Of Ying Fang
> > > > > > Sent: Thursday, September 17, 2020 11:20 AM
> > > > > > To: qemu-devel@nongnu.org
> > > > > > Cc: peter.maydell@linaro.org; drjones@redhat.com; Zhanghailiang;
> > > > > > Chenzhendong (alex); shannon.zhaosl@gmail.com;
> > > > > > qemu-arm@nongnu.org; alistair.francis@wdc.com; fangying;
> > > > > > imammedo@redhat.com
> > > > > > Subject: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache
> > > > > > topology support
> > > > > > 
> > > > > > An accurate cpu topology may help improve the cpu scheduler's
> > > > > > decision
> > > > > > making when dealing with multi-core system. So cpu topology
> > > > > > description
> > > > > > is helpful to provide guest with the right view. Cpu cache information
> > > > > > may
> > > > > > also have slight impact on the sched domain, and even userspace
> > > > > > software
> > > > > > may check the cpu cache information to do some optimizations. Thus
> > > > > > this patch
> > > > > > series is posted to provide cpu and cache topology support for arm.
> > > > > > 
> > > > > > To make the cpu topology consistent with MPIDR, an vcpu ioctl
> > > > > 
> > > > > For aarch64, the cpu topology don't depends on the MPDIR.
> > > > > See https://patchwork.kernel.org/patch/11744387/
> > > > > 
> > > > 
> > > > The topology should not be inferred from the MPIDR Aff fields,
> > > 
> > > MPIDR is abused by ARM OEM manufactures. It is only used as a
> > > identifer for a specific cpu, not representation of the topology.
> > 
> > Right, which is why I stated topology should not be inferred from
> > it.
> > 
> > > 
> > > > but MPIDR is the CPU identifier. When describing a topology
> > > > with ACPI or DT the CPU elements in the topology description
> > > > must map to actual CPUs. MPIDR is that mapping link. KVM
> > > > currently determines what the MPIDR of a VCPU is. If KVM
> > > 
> > > KVM currently assigns MPIDR with vcpu->vcpu_id which mapped
> > > into affinity levels. See reset_mpidr in sys_regs.c
> > 
> > I know, but how KVM assigns MPIDRs today is not really important
> > to KVM userspace. KVM userspace shouldn't depend on a KVM
> > algorithm, as it could change.
> > 
> > > 
> > > > userspace is going to determine the VCPU topology, then it
> > > > also needs control over the MPIDR values, otherwise it
> > > > becomes quite messy trying to get the mapping right.
> > > If we are going to control MPIDR, shall we assign MPIDR with
> > > vcpu_id or map topology hierarchy into affinity levels or any
> > > other link schema ?
> > > 
> > 
> > We can assign them to whatever we want, as long as they're
> > unique and as long as Aff0 is assigned per the GIC requirements,
> > e.g. GICv3 requires that Aff0 be from 0 to 0xf. Also, when
> > pinning VCPUs to PCPUs we should ensure that MPIDRs with matching
> > Aff3,Aff2,Aff1 fields should actually be peers with respect to
> > the GIC.
> 
> Still not clear why vCPU's MPIDR need to match pPCPU's GIC affinity.
> Maybe I should read spec for GICv3.

Look at how IPIs are efficiently sent to "peers", where the definition
of a peer is that only Aff0 differs in its MPIDR. But GICv3's
optimizations can only handle 16 peers. If we want pinned VCPUs to
have the same performance as PCPUs, then we should maintain this
Aff0 limit.
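Concretely, the register format is where the limit comes from. A sketch of the GICv3 SGI target encoding (field positions per ICC_SGI1R_EL1; the helper name is invented and the INTID/IRM fields are omitted for brevity):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of GICv3's ICC_SGI1R_EL1 target encoding (INTID/IRM fields
 * omitted): SGI targets are named by the upper affinity fields plus
 * a 16-bit TargetList bitmap of Aff0 values. One register write can
 * therefore reach at most 16 peers, which is why Aff0 must stay in
 * 0..15 for CPUs that should be addressable together.
 */
static uint64_t sgi1r_targets(uint8_t aff3, uint8_t aff2, uint8_t aff1,
                              uint16_t targetlist)
{
    return ((uint64_t)aff3 << 48) |   /* Aff3 */
           ((uint64_t)aff2 << 32) |   /* Aff2 */
           ((uint64_t)aff1 << 16) |   /* Aff1 */
           targetlist;                /* bit n = peer with Aff0 == n */
}
```

One write addresses all peers sharing Aff3.Aff2.Aff1; a 17th "peer" would need a different Aff1 and a second register write.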

Thanks,
drew

> 
> > 
> > We shouldn't try to encode topology in the MPIDR in any way,
> > so we might as well simply increment a counter to assign them,
> > which could possibly be the same as the VCPU ID.
> 
> Hmm, then we can leave it as it is.
> 
> > 
> > Thanks,
> > drew
> > 
> > .
> > 
> 
Ying Fang Oct. 20, 2020, 2:52 a.m. UTC | #16
On 10/16/2020 6:07 PM, Andrew Jones wrote:
> On Fri, Oct 16, 2020 at 05:40:02PM +0800, Ying Fang wrote:
>>
>>
>> On 10/15/2020 3:59 PM, Andrew Jones wrote:
>>> On Thu, Oct 15, 2020 at 10:07:16AM +0800, Ying Fang wrote:
>>>>
>>>>
>>>> On 10/14/2020 2:08 AM, Andrew Jones wrote:
>>>>> On Tue, Oct 13, 2020 at 12:11:20PM +0000, Zengtao (B) wrote:
>>>>>> Cc valentin
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Qemu-devel
>>>>>>> [mailto:qemu-devel-bounces+prime.zeng=hisilicon.com@nongnu.org]
>>>>>>> On Behalf Of Ying Fang
>>>>>>> Sent: Thursday, September 17, 2020 11:20 AM
>>>>>>> To: qemu-devel@nongnu.org
>>>>>>> Cc: peter.maydell@linaro.org; drjones@redhat.com; Zhanghailiang;
>>>>>>> Chenzhendong (alex); shannon.zhaosl@gmail.com;
>>>>>>> qemu-arm@nongnu.org; alistair.francis@wdc.com; fangying;
>>>>>>> imammedo@redhat.com
>>>>>>> Subject: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache
>>>>>>> topology support
>>>>>>>
>>>>>>> An accurate cpu topology may help improve the cpu scheduler's
>>>>>>> decision
>>>>>>> making when dealing with multi-core system. So cpu topology
>>>>>>> description
>>>>>>> is helpful to provide guest with the right view. Cpu cache information
>>>>>>> may
>>>>>>> also have slight impact on the sched domain, and even userspace
>>>>>>> software
>>>>>>> may check the cpu cache information to do some optimizations. Thus
>>>>>>> this patch
>>>>>>> series is posted to provide cpu and cache topology support for arm.
>>>>>>>
>>>>>>> To make the cpu topology consistent with MPIDR, an vcpu ioctl
>>>>>>
>>>>>> For aarch64, the cpu topology don't depends on the MPDIR.
>>>>>> See https://patchwork.kernel.org/patch/11744387/
>>>>>>
>>>>>
>>>>> The topology should not be inferred from the MPIDR Aff fields,
>>>>
>>>> MPIDR is abused by ARM OEM manufactures. It is only used as a
>>>> identifer for a specific cpu, not representation of the topology.
>>>
>>> Right, which is why I stated topology should not be inferred from
>>> it.
>>>
>>>>
>>>>> but MPIDR is the CPU identifier. When describing a topology
>>>>> with ACPI or DT the CPU elements in the topology description
>>>>> must map to actual CPUs. MPIDR is that mapping link. KVM
>>>>> currently determines what the MPIDR of a VCPU is. If KVM
>>>>
>>>> KVM currently assigns MPIDR with vcpu->vcpu_id which mapped
>>>> into affinity levels. See reset_mpidr in sys_regs.c
>>>
>>> I know, but how KVM assigns MPIDRs today is not really important
>>> to KVM userspace. KVM userspace shouldn't depend on a KVM
>>> algorithm, as it could change.
>>>
>>>>
>>>>> userspace is going to determine the VCPU topology, then it
>>>>> also needs control over the MPIDR values, otherwise it
>>>>> becomes quite messy trying to get the mapping right.
>>>> If we are going to control MPIDR, shall we assign MPIDR with
>>>> vcpu_id or map topology hierarchy into affinity levels or any
>>>> other link schema ?
>>>>
>>>
>>> We can assign them to whatever we want, as long as they're
>>> unique and as long as Aff0 is assigned per the GIC requirements,
>>> e.g. GICv3 requires that Aff0 be from 0 to 0xf. Also, when
>>> pinning VCPUs to PCPUs we should ensure that MPIDRs with matching
>>> Aff3,Aff2,Aff1 fields should actually be peers with respect to
>>> the GIC.
>>
>> Still not clear why vCPU's MPIDR need to match pPCPU's GIC affinity.
>> Maybe I should read spec for GICv3.
> 
> Look at how IPIs are efficiently sent to "peers", where the definition
> of a peer is that only Aff0 differs in its MPIDR. But, gicv3's
> optimizations can only handle 16 peers. If we want pinned VCPUs to
> have the same performance as PCPUS, then we should maintain this
> Aff0 limit.

Yes, I see. I think *virt_cpu_mp_affinity* in QEMU has a limit
on the cluster size. It groups every 16 vCPUs into a cluster
and then maps them into the first two affinity levels.
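That grouping can be sketched like this (modeled on virt_cpu_mp_affinity in QEMU's hw/arm/virt.c; the cluster size of 16 is the GICv3 case, and the helper name here is invented):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the clustering described above (modeled on QEMU's
 * virt_cpu_mp_affinity): group vCPUs into clusters of 16 -- the
 * GICv3 SGI TargetList width -- so the index within the cluster
 * becomes Aff0 and the cluster number becomes Aff1.
 */
#define CLUSTER_SIZE 16

static uint64_t mp_affinity(int idx)
{
    return (uint64_t)(idx % CLUSTER_SIZE) |        /* Aff0 */
           ((uint64_t)(idx / CLUSTER_SIZE) << 8);  /* Aff1 */
}
```

vCPU 17, for instance, becomes Aff1 = 1, Aff0 = 1, matching what KVM's own counter-based assignment would produce today.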

Thanks.
Ying.

> 
> Thanks,
> drew
> 
>>
>>>
>>> We shouldn't try to encode topology in the MPIDR in any way,
>>> so we might as well simply increment a counter to assign them,
>>> which could possibly be the same as the VCPU ID.
>>
>> Hmm, then we can leave it as it is.
>>
>>>
>>> Thanks,
>>> drew
>>>
>>> .
>>>
>>
> 
> .
>
Andrew Jones Oct. 20, 2020, 8:20 a.m. UTC | #17
On Tue, Oct 20, 2020 at 10:52:11AM +0800, Ying Fang wrote:
> 
> 
> On 10/16/2020 6:07 PM, Andrew Jones wrote:
> > On Fri, Oct 16, 2020 at 05:40:02PM +0800, Ying Fang wrote:
> > > 
> > > 
> > > On 10/15/2020 3:59 PM, Andrew Jones wrote:
> > > > On Thu, Oct 15, 2020 at 10:07:16AM +0800, Ying Fang wrote:
> > > > > 
> > > > > 
> > > > > On 10/14/2020 2:08 AM, Andrew Jones wrote:
> > > > > > On Tue, Oct 13, 2020 at 12:11:20PM +0000, Zengtao (B) wrote:
> > > > > > > Cc valentin
> > > > > > > 
> > > > > > > > -----Original Message-----
> > > > > > > > From: Qemu-devel
> > > > > > > > [mailto:qemu-devel-bounces+prime.zeng=hisilicon.com@nongnu.org]
> > > > > > > > On Behalf Of Ying Fang
> > > > > > > > Sent: Thursday, September 17, 2020 11:20 AM
> > > > > > > > To: qemu-devel@nongnu.org
> > > > > > > > Cc: peter.maydell@linaro.org; drjones@redhat.com; Zhanghailiang;
> > > > > > > > Chenzhendong (alex); shannon.zhaosl@gmail.com;
> > > > > > > > qemu-arm@nongnu.org; alistair.francis@wdc.com; fangying;
> > > > > > > > imammedo@redhat.com
> > > > > > > > Subject: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache
> > > > > > > > topology support
> > > > > > > > 
> > > > > > > > An accurate cpu topology may help improve the cpu scheduler's
> > > > > > > > decision
> > > > > > > > making when dealing with multi-core system. So cpu topology
> > > > > > > > description
> > > > > > > > is helpful to provide guest with the right view. Cpu cache information
> > > > > > > > may
> > > > > > > > also have slight impact on the sched domain, and even userspace
> > > > > > > > software
> > > > > > > > may check the cpu cache information to do some optimizations. Thus
> > > > > > > > this patch
> > > > > > > > series is posted to provide cpu and cache topology support for arm.
> > > > > > > > 
> > > > > > > > To make the cpu topology consistent with MPIDR, an vcpu ioctl
> > > > > > > 
> > > > > > > For aarch64, the cpu topology don't depends on the MPDIR.
> > > > > > > See https://patchwork.kernel.org/patch/11744387/
> > > > > > > 
> > > > > > 
> > > > > > The topology should not be inferred from the MPIDR Aff fields,
> > > > > 
> > > > > MPIDR is abused by ARM OEM manufactures. It is only used as a
> > > > > identifer for a specific cpu, not representation of the topology.
> > > > 
> > > > Right, which is why I stated topology should not be inferred from
> > > > it.
> > > > 
> > > > > 
> > > > > > but MPIDR is the CPU identifier. When describing a topology
> > > > > > with ACPI or DT the CPU elements in the topology description
> > > > > > must map to actual CPUs. MPIDR is that mapping link. KVM
> > > > > > currently determines what the MPIDR of a VCPU is. If KVM
> > > > > 
> > > > > KVM currently assigns MPIDR with vcpu->vcpu_id which mapped
> > > > > into affinity levels. See reset_mpidr in sys_regs.c
> > > > 
> > > > I know, but how KVM assigns MPIDRs today is not really important
> > > > to KVM userspace. KVM userspace shouldn't depend on a KVM
> > > > algorithm, as it could change.
> > > > 
> > > > > 
> > > > > > userspace is going to determine the VCPU topology, then it
> > > > > > also needs control over the MPIDR values, otherwise it
> > > > > > becomes quite messy trying to get the mapping right.
> > > > > If we are going to control MPIDR, shall we assign MPIDR with
> > > > > vcpu_id or map topology hierarchy into affinity levels or any
> > > > > other link schema ?
> > > > > 
> > > > 
> > > > We can assign them to whatever we want, as long as they're
> > > > unique and as long as Aff0 is assigned per the GIC requirements,
> > > > e.g. GICv3 requires that Aff0 be from 0 to 0xf. Also, when
> > > > pinning VCPUs to PCPUs we should ensure that MPIDRs with matching
> > > > Aff3,Aff2,Aff1 fields should actually be peers with respect to
> > > > the GIC.
> > > 
> > > Still not clear why vCPU's MPIDR need to match pPCPU's GIC affinity.
> > > Maybe I should read spec for GICv3.
> > 
> > Look at how IPIs are efficiently sent to "peers", where the definition
> > of a peer is that only Aff0 differs in its MPIDR. But, gicv3's
> > optimizations can only handle 16 peers. If we want pinned VCPUs to
> > have the same performance as PCPUS, then we should maintain this
> > Aff0 limit.
> 
> Yes I see. I think *virt_cpu_mp_affinity* in qemu has limit
> on the clustersz. It groups every 16 vCPUs into a cluster
> and then mapped into the first two affinity levels.
> 

Right, and it's probably sufficient to just switch to this function
for assigning affinity fields of MPIDRs for both TCG and KVM. Currently
it's only for TCG, as the comment in it explains, but it does the same
thing as KVM anyway. So, while nothing should change from the view of
the guest, userspace gains control over the MPIDRs, and that allows it
to build VCPU topologies more smoothly and in advance of VCPU creation.

Thanks,
drew