mbox series

[v4,00/11] Add per-core RAPL energy counter support for AMD CPUs

Message ID 20240711102436.4432-1-Dhananjay.Ugwekar@amd.com
Headers show
Series Add per-core RAPL energy counter support for AMD CPUs | expand

Message

Dhananjay Ugwekar July 11, 2024, 10:24 a.m. UTC
Currently the energy-cores event in the power PMU aggregates energy
consumption data at a package level. On the other hand the core energy
RAPL counter in AMD CPUs has a core scope (which means the energy 
consumption is recorded separately for each core). Earlier efforts to add
the core event in the power PMU had failed [1], due to the difference in 
the scope of these two events. Hence, there is a need for a new core scope
PMU.

This patchset adds a new "power_per_core" PMU alongside the existing
"power" PMU, which will be responsible for collecting the new
"energy-per-core" event.

Tested the package level and core level PMU counters with workloads
pinned to different CPUs.

Results with workload pinned to CPU 1 in Core 1 on an AMD Zen4 Genoa 
machine:

$ perf stat -a --per-core -e power_per_core/energy-per-core/ -- sleep 1

 Performance counter stats for 'system wide':

S0-D0-C0         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C1         1          5.72 Joules power_per_core/energy-per-core/
S0-D0-C2         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C3         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C4         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C5         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C6         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C7         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C8         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C9         1          0.02 Joules power_per_core/energy-per-core/
S0-D0-C10        1          0.02 Joules power_per_core/energy-per-core/

[1]: https://lore.kernel.org/lkml/3e766f0e-37d4-0f82-3868-31b14228868d@linux.intel.com/

This patchset applies cleanly on top of v6.10-rc7 as well as latest 
tip/master.

v4 changes:
* Add patch 11 which removes the unused function cpu_to_rapl_pmu()
* Add Rui's rb tag for patch 1
* Invert the pmu scope check logic in patch 2 (Peter)
* Add comments explaining the scope check in patch 2 (Peter)
* Use cpumask_var_t instead of cpumask_t in patch 5 (Peter)
* Move renaming code to patch 8 (Rui)
* Reorder the cleanup order of per-core and per-pkg PMU in patch 10 (Rui)
* Add rapl_core_hw_unit variable to store the per-core PMU unit in patch
  10 (Rui)

PS: Scope check logic is still kept the same (i.e., all Intel systems being 
considered as die scope), Rui will be modifying it to limit the die-scope 
only to Cascadelake-AP in a future patch on top of this patchset.

v3 changes:
* Patch 1 added to introduce the logical_core_id which is unique across
  the system (Prateek)
* Use the unique topology_logical_core_id() instead of
  topology_core_id() (which is only unique within a package on tested
  AMD and Intel systems) in Patch 10

v2 changes:
* Patches 6,7,8 added to split some changes out of the last patch
* Use container_of to get the rapl_pmus from event variable (Rui)
* Set PERF_EV_CAP_READ_ACTIVE_PKG flag only for pkg scope PMU (Rui)
* Use event id 0x1 for energy-per-core event (Rui)
* Use PERF_RAPL_PER_CORE bit instead of adding a new flag to check for
  per-core counter hw support (Rui)

Dhananjay Ugwekar (10):
  perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
  perf/x86/rapl: Rename rapl_pmu variables
  perf/x86/rapl: Make rapl_model struct global
  perf/x86/rapl: Move cpumask variable to rapl_pmus struct
  perf/x86/rapl: Add wrapper for online/offline functions
  perf/x86/rapl: Add an argument to the cleanup and init functions
  perf/x86/rapl: Modify the generic variable names to *_pkg*
  perf/x86/rapl: Remove the global variable rapl_msrs
  perf/x86/rapl: Add per-core energy counter support for AMD CPUs
  perf/x86/rapl: Remove the unused function cpu_to_rapl_pmu

K Prateek Nayak (1):
  x86/topology: Introduce topology_logical_core_id()

 Documentation/arch/x86/topology.rst   |   4 +
 arch/x86/events/rapl.c                | 454 ++++++++++++++++++--------
 arch/x86/include/asm/processor.h      |   1 +
 arch/x86/include/asm/topology.h       |   1 +
 arch/x86/kernel/cpu/debugfs.c         |   1 +
 arch/x86/kernel/cpu/topology_common.c |   1 +
 6 files changed, 328 insertions(+), 134 deletions(-)

Comments

Dhananjay Ugwekar July 15, 2024, 9:35 a.m. UTC | #1
Hello Ian,

On 7/12/2024 3:53 AM, Ian Rogers wrote:
> On Thu, Jul 11, 2024 at 3:25 AM Dhananjay Ugwekar
> <Dhananjay.Ugwekar@amd.com> wrote:
>>
>> Currently the energy-cores event in the power PMU aggregates energy
>> consumption data at a package level. On the other hand the core energy
>> RAPL counter in AMD CPUs has a core scope (which means the energy
>> consumption is recorded separately for each core). Earlier efforts to add
>> the core event in the power PMU had failed [1], due to the difference in
>> the scope of these two events. Hence, there is a need for a new core scope
>> PMU.
>>
>> This patchset adds a new "power_per_core" PMU alongside the existing
>> "power" PMU, which will be responsible for collecting the new
>> "energy-per-core" event.
> 
> Sorry for being naive, is the only reason for adding the new PMU for
> the sake of the cpumask? Perhaps we can add per event cpumasks like
> say `/sys/devices/power/events/energy-per-core.cpumask` which contains
> the CPUs of the different cores in this case. There's supporting
> hotplugging CPUs as an issue. Adding the tool support for this
> wouldn't be hard and it may be less messy (although old perf tools on
> new kernels wouldn't know about these files).

I went over the two approaches and below are my thoughts,
 
New PMU approach:
Pros
* It will work with older perf tools, hence these patches can be backported to an older kernel and the new per-core event will work there as well.
Cons
* More code changes in rapl.c

Event specific cpumask approach:
Pros
* It might be easier to add diff scope events within the same PMU in future(although currently I'm not able to find such a usecase, apart from the RAPL pkg and core energy counters)
Cons
* Both new kernel and perf tool will be required to use the new per-core event.
  
I feel that while the event-specific cpumask is a viable alternative to the new PMU addition approach, I dont see any clear pros to select that over the current approach. Please let me know if you have any design related concerns to the addition of new PMU or your concern is mostly about the amount of code changes in this approach.

Thanks,
Dhananjay

> 
> Thanks,
> Ian
> 
> 
>> Tested the package level and core level PMU counters with workloads
>> pinned to different CPUs.
>>
>> Results with workload pinned to CPU 1 in Core 1 on an AMD Zen4 Genoa
>> machine:
>>
>> $ perf stat -a --per-core -e power_per_core/energy-per-core/ -- sleep 1
>>
>>  Performance counter stats for 'system wide':
>>
>> S0-D0-C0         1          0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C1         1          5.72 Joules power_per_core/energy-per-core/
>> S0-D0-C2         1          0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C3         1          0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C4         1          0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C5         1          0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C6         1          0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C7         1          0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C8         1          0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C9         1          0.02 Joules power_per_core/energy-per-core/
>> S0-D0-C10        1          0.02 Joules power_per_core/energy-per-core/
>>
>> [1]: https://lore.kernel.org/lkml/3e766f0e-37d4-0f82-3868-31b14228868d@linux.intel.com/
>>
>> This patchset applies cleanly on top of v6.10-rc7 as well as latest
>> tip/master.
>>
>> v4 changes:
>> * Add patch 11 which removes the unused function cpu_to_rapl_pmu()
>> * Add Rui's rb tag for patch 1
>> * Invert the pmu scope check logic in patch 2 (Peter)
>> * Add comments explaining the scope check in patch 2 (Peter)
>> * Use cpumask_var_t instead of cpumask_t in patch 5 (Peter)
>> * Move renaming code to patch 8 (Rui)
>> * Reorder the cleanup order of per-core and per-pkg PMU in patch 10 (Rui)
>> * Add rapl_core_hw_unit variable to store the per-core PMU unit in patch
>>   10 (Rui)
>>
>> PS: Scope check logic is still kept the same (i.e., all Intel systems being
>> considered as die scope), Rui will be modifying it to limit the die-scope
>> only to Cascadelake-AP in a future patch on top of this patchset.
>>
>> v3 changes:
>> * Patch 1 added to introduce the logical_core_id which is unique across
>>   the system (Prateek)
>> * Use the unique topology_logical_core_id() instead of
>>   topology_core_id() (which is only unique within a package on tested
>>   AMD and Intel systems) in Patch 10
>>
>> v2 changes:
>> * Patches 6,7,8 added to split some changes out of the last patch
>> * Use container_of to get the rapl_pmus from event variable (Rui)
>> * Set PERF_EV_CAP_READ_ACTIVE_PKG flag only for pkg scope PMU (Rui)
>> * Use event id 0x1 for energy-per-core event (Rui)
>> * Use PERF_RAPL_PER_CORE bit instead of adding a new flag to check for
>>   per-core counter hw support (Rui)
>>
>> Dhananjay Ugwekar (10):
>>   perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
>>   perf/x86/rapl: Rename rapl_pmu variables
>>   perf/x86/rapl: Make rapl_model struct global
>>   perf/x86/rapl: Move cpumask variable to rapl_pmus struct
>>   perf/x86/rapl: Add wrapper for online/offline functions
>>   perf/x86/rapl: Add an argument to the cleanup and init functions
>>   perf/x86/rapl: Modify the generic variable names to *_pkg*
>>   perf/x86/rapl: Remove the global variable rapl_msrs
>>   perf/x86/rapl: Add per-core energy counter support for AMD CPUs
>>   perf/x86/rapl: Remove the unused function cpu_to_rapl_pmu
>>
>> K Prateek Nayak (1):
>>   x86/topology: Introduce topology_logical_core_id()
>>
>>  Documentation/arch/x86/topology.rst   |   4 +
>>  arch/x86/events/rapl.c                | 454 ++++++++++++++++++--------
>>  arch/x86/include/asm/processor.h      |   1 +
>>  arch/x86/include/asm/topology.h       |   1 +
>>  arch/x86/kernel/cpu/debugfs.c         |   1 +
>>  arch/x86/kernel/cpu/topology_common.c |   1 +
>>  6 files changed, 328 insertions(+), 134 deletions(-)
>>
>> --
>> 2.34.1
>>
Ian Rogers July 15, 2024, 3:22 p.m. UTC | #2
On Mon, Jul 15, 2024 at 2:36 AM Dhananjay Ugwekar
<Dhananjay.Ugwekar@amd.com> wrote:
>
> Hello Ian,
>
> On 7/12/2024 3:53 AM, Ian Rogers wrote:
> > On Thu, Jul 11, 2024 at 3:25 AM Dhananjay Ugwekar
> > <Dhananjay.Ugwekar@amd.com> wrote:
> >>
> >> Currently the energy-cores event in the power PMU aggregates energy
> >> consumption data at a package level. On the other hand the core energy
> >> RAPL counter in AMD CPUs has a core scope (which means the energy
> >> consumption is recorded separately for each core). Earlier efforts to add
> >> the core event in the power PMU had failed [1], due to the difference in
> >> the scope of these two events. Hence, there is a need for a new core scope
> >> PMU.
> >>
> >> This patchset adds a new "power_per_core" PMU alongside the existing
> >> "power" PMU, which will be responsible for collecting the new
> >> "energy-per-core" event.
> >
> > Sorry for being naive, is the only reason for adding the new PMU for
> > the sake of the cpumask? Perhaps we can add per event cpumasks like
> > say `/sys/devices/power/events/energy-per-core.cpumask` which contains
> > the CPUs of the different cores in this case. There's supporting
> > hotplugging CPUs as an issue. Adding the tool support for this
> > wouldn't be hard and it may be less messy (although old perf tools on
> > new kernels wouldn't know about these files).
>
> I went over the two approaches and below are my thoughts,
>
> New PMU approach:
> Pros
> * It will work with older perf tools, hence these patches can be backported to an older kernel and the new per-core event will work there as well.
> Cons
> * More code changes in rapl.c
>
> Event specific cpumask approach:
> Pros
> * It might be easier to add diff scope events within the same PMU in future(although currently I'm not able to find such a usecase, apart from the RAPL pkg and core energy counters)
> Cons
> * Both new kernel and perf tool will be required to use the new per-core event.
>
> I feel that while the event-specific cpumask is a viable alternative to the new PMU addition approach, I dont see any clear pros to select that over the current approach. Please let me know if you have any design related concerns to the addition of new PMU or your concern is mostly about the amount of code changes in this approach.

Thanks Dhananjay, and thanks for taking the time for an objective
discussion on the mailing list. I'm very supportive of seeing the work
you are enabling land.

My concern comes from the tool side. If every PMU starts to have
variants for the sake of the cpumask what does this mean for
aggregation in the perf tool? There is another issue around event
grouping, you can't group events across PMUs, but my feeling is that
event grouping needs to be rethought. By default the power_per_core
events are going to be aggregated together by the perf tool, which
then loses their per_core-ness.

I was trying to think of the same problem but in other PMUs. One
thought I had was the difference between hyperthread and core events.
At least on Intel, some events can only count for the whole core not
per hyperthread. The events don't have a cpu_per_core PMU, they just
use the regular cpu one, and so the cpumask is set to all online
hyperthreads. When a per-core event is programmed it will get
programmed on every hyperthread and so counted twice for the core.
This at the least wastes a counter, but it probably also yields twice
the expected count as every event is counted twice then aggregated. So
this is just wrong and the user is expected to be smart and fix it
(checking the x86 events there is a convention to use a ".ALL" or
".ANY" suffix for core wide events iirc). If we had a cpumask for
these events then we could avoid the double setting, free up a counter
and avoid double counting. Were we to fix things the way it is done in
this patch series we'd add another PMU.

My feeling is that in the longer term a per event cpumask looks
cleaner. I think either way you need a new kernel for the new RAPL
events. The problem with an old perf tool and a new kernel, this
doesn't normally happen with distributions as they match the perf tool
to the kernel version needlessly losing features and fixes along the
way. If the new PMU is going to get backported through fixes.. then we
can do similar for reading the per event cpumask. I'd be tempted not
to do this and focus on the next LTS kernel, getting the kernel and
tool fixes in as necessary.

Thanks,
Ian


> Thanks,
> Dhananjay
>
> >
> > Thanks,
> > Ian
> >
> >
> >> Tested the package level and core level PMU counters with workloads
> >> pinned to different CPUs.
> >>
> >> Results with workload pinned to CPU 1 in Core 1 on an AMD Zen4 Genoa
> >> machine:
> >>
> >> $ perf stat -a --per-core -e power_per_core/energy-per-core/ -- sleep 1
> >>
> >>  Performance counter stats for 'system wide':
> >>
> >> S0-D0-C0         1          0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C1         1          5.72 Joules power_per_core/energy-per-core/
> >> S0-D0-C2         1          0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C3         1          0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C4         1          0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C5         1          0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C6         1          0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C7         1          0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C8         1          0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C9         1          0.02 Joules power_per_core/energy-per-core/
> >> S0-D0-C10        1          0.02 Joules power_per_core/energy-per-core/
> >>
> >> [1]: https://lore.kernel.org/lkml/3e766f0e-37d4-0f82-3868-31b14228868d@linux.intel.com/
> >>
> >> This patchset applies cleanly on top of v6.10-rc7 as well as latest
> >> tip/master.
> >>
> >> v4 changes:
> >> * Add patch 11 which removes the unused function cpu_to_rapl_pmu()
> >> * Add Rui's rb tag for patch 1
> >> * Invert the pmu scope check logic in patch 2 (Peter)
> >> * Add comments explaining the scope check in patch 2 (Peter)
> >> * Use cpumask_var_t instead of cpumask_t in patch 5 (Peter)
> >> * Move renaming code to patch 8 (Rui)
> >> * Reorder the cleanup order of per-core and per-pkg PMU in patch 10 (Rui)
> >> * Add rapl_core_hw_unit variable to store the per-core PMU unit in patch
> >>   10 (Rui)
> >>
> >> PS: Scope check logic is still kept the same (i.e., all Intel systems being
> >> considered as die scope), Rui will be modifying it to limit the die-scope
> >> only to Cascadelake-AP in a future patch on top of this patchset.
> >>
> >> v3 changes:
> >> * Patch 1 added to introduce the logical_core_id which is unique across
> >>   the system (Prateek)
> >> * Use the unique topology_logical_core_id() instead of
> >>   topology_core_id() (which is only unique within a package on tested
> >>   AMD and Intel systems) in Patch 10
> >>
> >> v2 changes:
> >> * Patches 6,7,8 added to split some changes out of the last patch
> >> * Use container_of to get the rapl_pmus from event variable (Rui)
> >> * Set PERF_EV_CAP_READ_ACTIVE_PKG flag only for pkg scope PMU (Rui)
> >> * Use event id 0x1 for energy-per-core event (Rui)
> >> * Use PERF_RAPL_PER_CORE bit instead of adding a new flag to check for
> >>   per-core counter hw support (Rui)
> >>
> >> Dhananjay Ugwekar (10):
> >>   perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
> >>   perf/x86/rapl: Rename rapl_pmu variables
> >>   perf/x86/rapl: Make rapl_model struct global
> >>   perf/x86/rapl: Move cpumask variable to rapl_pmus struct
> >>   perf/x86/rapl: Add wrapper for online/offline functions
> >>   perf/x86/rapl: Add an argument to the cleanup and init functions
> >>   perf/x86/rapl: Modify the generic variable names to *_pkg*
> >>   perf/x86/rapl: Remove the global variable rapl_msrs
> >>   perf/x86/rapl: Add per-core energy counter support for AMD CPUs
> >>   perf/x86/rapl: Remove the unused function cpu_to_rapl_pmu
> >>
> >> K Prateek Nayak (1):
> >>   x86/topology: Introduce topology_logical_core_id()
> >>
> >>  Documentation/arch/x86/topology.rst   |   4 +
> >>  arch/x86/events/rapl.c                | 454 ++++++++++++++++++--------
> >>  arch/x86/include/asm/processor.h      |   1 +
> >>  arch/x86/include/asm/topology.h       |   1 +
> >>  arch/x86/kernel/cpu/debugfs.c         |   1 +
> >>  arch/x86/kernel/cpu/topology_common.c |   1 +
> >>  6 files changed, 328 insertions(+), 134 deletions(-)
> >>
> >> --
> >> 2.34.1
> >>
Dhananjay Ugwekar July 16, 2024, 8:42 a.m. UTC | #3
Hello Ian,

On 7/15/2024 8:52 PM, Ian Rogers wrote:
> On Mon, Jul 15, 2024 at 2:36 AM Dhananjay Ugwekar
> <Dhananjay.Ugwekar@amd.com> wrote:
>>
>> Hello Ian,
>>
>> On 7/12/2024 3:53 AM, Ian Rogers wrote:
>>> On Thu, Jul 11, 2024 at 3:25 AM Dhananjay Ugwekar
>>> <Dhananjay.Ugwekar@amd.com> wrote:
>>>>
>>>> Currently the energy-cores event in the power PMU aggregates energy
>>>> consumption data at a package level. On the other hand the core energy
>>>> RAPL counter in AMD CPUs has a core scope (which means the energy
>>>> consumption is recorded separately for each core). Earlier efforts to add
>>>> the core event in the power PMU had failed [1], due to the difference in
>>>> the scope of these two events. Hence, there is a need for a new core scope
>>>> PMU.
>>>>
>>>> This patchset adds a new "power_per_core" PMU alongside the existing
>>>> "power" PMU, which will be responsible for collecting the new
>>>> "energy-per-core" event.
>>>
>>> Sorry for being naive, is the only reason for adding the new PMU for
>>> the sake of the cpumask? Perhaps we can add per event cpumasks like
>>> say `/sys/devices/power/events/energy-per-core.cpumask` which contains
>>> the CPUs of the different cores in this case. There's supporting
>>> hotplugging CPUs as an issue. Adding the tool support for this
>>> wouldn't be hard and it may be less messy (although old perf tools on
>>> new kernels wouldn't know about these files).
>>
>> I went over the two approaches and below are my thoughts,
>>
>> New PMU approach:
>> Pros
>> * It will work with older perf tools, hence these patches can be backported to an older kernel and the new per-core event will work there as well.
>> Cons
>> * More code changes in rapl.c
>>
>> Event specific cpumask approach:
>> Pros
>> * It might be easier to add diff scope events within the same PMU in future(although currently I'm not able to find such a usecase, apart from the RAPL pkg and core energy counters)
>> Cons
>> * Both new kernel and perf tool will be required to use the new per-core event.
>>
>> I feel that while the event-specific cpumask is a viable alternative to the new PMU addition approach, I dont see any clear pros to select that over the current approach. Please let me know if you have any design related concerns to the addition of new PMU or your concern is mostly about the amount of code changes in this approach.
> 
> Thanks Dhananjay, and thanks for taking the time for an objective
> discussion on the mailing list. I'm very supportive of seeing the work
> you are enabling land.
> 
> My concern comes from the tool side. If every PMU starts to have
> variants for the sake of the cpumask what does this mean for
> aggregation in the perf tool? There is another issue around event
> grouping, you can't group events across PMUs, but my feeling is that
> event grouping needs to be rethought. By default the power_per_core
> events are going to be aggregated together by the perf tool, which
> then loses their per_core-ness.

Yea right, maybe we need to fix this behavior.

> 
> I was trying to think of the same problem but in other PMUs. One
> thought I had was the difference between hyperthread and core events.
> At least on Intel, some events can only count for the whole core not
> per hyperthread. The events don't have a cpu_per_core PMU, they just
> use the regular cpu one, and so the cpumask is set to all online
> hyperthreads. When a per-core event is programmed it will get
> programmed on every hyperthread and so counted twice for the core.
> This at the least wastes a counter, but it probably also yields twice
> the expected count as every event is counted twice then aggregated. So
> this is just wrong and the user is expected to be smart and fix it
> (checking the x86 events there is a convention to use a ".ALL" or
> ".ANY" suffix for core wide events iirc). If we had a cpumask for
> these events then we could avoid the double setting, free up a counter
> and avoid double counting. Were we to fix things the way it is done in
> this patch series we'd add another PMU.

Yes, this seems like a valid usecase for event-specific cpumasks.

> 
> My feeling is that in the longer term a per event cpumask looks
> cleaner. I think either way you need a new kernel for the new RAPL
> events. The problem with an old perf tool and a new kernel, this
> doesn't normally happen with distributions as they match the perf tool
> to the kernel version needlessly losing features and fixes along the
> way. If the new PMU is going to get backported through fixes.. then we
> can do similar for reading the per event cpumask. I'd be tempted not
> to do this and focus on the next LTS kernel, getting the kernel and
> tool fixes in as necessary.

Makes sense, even though this approach will require more effort but it seems 
to be worthwhile as it would help things down the line (make it easier to have 
heterogenous-scope events within a PMU). I'll need to go through the perf tool 
to see how we can design this. I'll get back with an RFC series probably once 
I have an initial design in mind.

Thanks,
Dhananjay

> 
> Thanks,
> Ian
> 
> 
>> Thanks,
>> Dhananjay
>>
>>>
>>> Thanks,
>>> Ian
>>>
>>>
>>>> Tested the package level and core level PMU counters with workloads
>>>> pinned to different CPUs.
>>>>
>>>> Results with workload pinned to CPU 1 in Core 1 on an AMD Zen4 Genoa
>>>> machine:
>>>>
>>>> $ perf stat -a --per-core -e power_per_core/energy-per-core/ -- sleep 1
>>>>
>>>>  Performance counter stats for 'system wide':
>>>>
>>>> S0-D0-C0         1          0.02 Joules power_per_core/energy-per-core/
>>>> S0-D0-C1         1          5.72 Joules power_per_core/energy-per-core/
>>>> S0-D0-C2         1          0.02 Joules power_per_core/energy-per-core/
>>>> S0-D0-C3         1          0.02 Joules power_per_core/energy-per-core/
>>>> S0-D0-C4         1          0.02 Joules power_per_core/energy-per-core/
>>>> S0-D0-C5         1          0.02 Joules power_per_core/energy-per-core/
>>>> S0-D0-C6         1          0.02 Joules power_per_core/energy-per-core/
>>>> S0-D0-C7         1          0.02 Joules power_per_core/energy-per-core/
>>>> S0-D0-C8         1          0.02 Joules power_per_core/energy-per-core/
>>>> S0-D0-C9         1          0.02 Joules power_per_core/energy-per-core/
>>>> S0-D0-C10        1          0.02 Joules power_per_core/energy-per-core/
>>>>
>>>> [1]: https://lore.kernel.org/lkml/3e766f0e-37d4-0f82-3868-31b14228868d@linux.intel.com/
>>>>
>>>> This patchset applies cleanly on top of v6.10-rc7 as well as latest
>>>> tip/master.
>>>>
>>>> v4 changes:
>>>> * Add patch 11 which removes the unused function cpu_to_rapl_pmu()
>>>> * Add Rui's rb tag for patch 1
>>>> * Invert the pmu scope check logic in patch 2 (Peter)
>>>> * Add comments explaining the scope check in patch 2 (Peter)
>>>> * Use cpumask_var_t instead of cpumask_t in patch 5 (Peter)
>>>> * Move renaming code to patch 8 (Rui)
>>>> * Reorder the cleanup order of per-core and per-pkg PMU in patch 10 (Rui)
>>>> * Add rapl_core_hw_unit variable to store the per-core PMU unit in patch
>>>>   10 (Rui)
>>>>
>>>> PS: Scope check logic is still kept the same (i.e., all Intel systems being
>>>> considered as die scope), Rui will be modifying it to limit the die-scope
>>>> only to Cascadelake-AP in a future patch on top of this patchset.
>>>>
>>>> v3 changes:
>>>> * Patch 1 added to introduce the logical_core_id which is unique across
>>>>   the system (Prateek)
>>>> * Use the unique topology_logical_core_id() instead of
>>>>   topology_core_id() (which is only unique within a package on tested
>>>>   AMD and Intel systems) in Patch 10
>>>>
>>>> v2 changes:
>>>> * Patches 6,7,8 added to split some changes out of the last patch
>>>> * Use container_of to get the rapl_pmus from event variable (Rui)
>>>> * Set PERF_EV_CAP_READ_ACTIVE_PKG flag only for pkg scope PMU (Rui)
>>>> * Use event id 0x1 for energy-per-core event (Rui)
>>>> * Use PERF_RAPL_PER_CORE bit instead of adding a new flag to check for
>>>>   per-core counter hw support (Rui)
>>>>
>>>> Dhananjay Ugwekar (10):
>>>>   perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
>>>>   perf/x86/rapl: Rename rapl_pmu variables
>>>>   perf/x86/rapl: Make rapl_model struct global
>>>>   perf/x86/rapl: Move cpumask variable to rapl_pmus struct
>>>>   perf/x86/rapl: Add wrapper for online/offline functions
>>>>   perf/x86/rapl: Add an argument to the cleanup and init functions
>>>>   perf/x86/rapl: Modify the generic variable names to *_pkg*
>>>>   perf/x86/rapl: Remove the global variable rapl_msrs
>>>>   perf/x86/rapl: Add per-core energy counter support for AMD CPUs
>>>>   perf/x86/rapl: Remove the unused function cpu_to_rapl_pmu
>>>>
>>>> K Prateek Nayak (1):
>>>>   x86/topology: Introduce topology_logical_core_id()
>>>>
>>>>  Documentation/arch/x86/topology.rst   |   4 +
>>>>  arch/x86/events/rapl.c                | 454 ++++++++++++++++++--------
>>>>  arch/x86/include/asm/processor.h      |   1 +
>>>>  arch/x86/include/asm/topology.h       |   1 +
>>>>  arch/x86/kernel/cpu/debugfs.c         |   1 +
>>>>  arch/x86/kernel/cpu/topology_common.c |   1 +
>>>>  6 files changed, 328 insertions(+), 134 deletions(-)
>>>>
>>>> --
>>>> 2.34.1
>>>>
Ian Rogers July 16, 2024, 10:47 p.m. UTC | #4
On Tue, Jul 16, 2024 at 1:42 AM Dhananjay Ugwekar
<Dhananjay.Ugwekar@amd.com> wrote:
>
> Hello Ian,
>
> On 7/15/2024 8:52 PM, Ian Rogers wrote:
> > On Mon, Jul 15, 2024 at 2:36 AM Dhananjay Ugwekar
> > <Dhananjay.Ugwekar@amd.com> wrote:
> >>
> >> Hello Ian,
> >>
> >> On 7/12/2024 3:53 AM, Ian Rogers wrote:
> >>> On Thu, Jul 11, 2024 at 3:25 AM Dhananjay Ugwekar
> >>> <Dhananjay.Ugwekar@amd.com> wrote:
> >>>>
> >>>> Currently the energy-cores event in the power PMU aggregates energy
> >>>> consumption data at a package level. On the other hand the core energy
> >>>> RAPL counter in AMD CPUs has a core scope (which means the energy
> >>>> consumption is recorded separately for each core). Earlier efforts to add
> >>>> the core event in the power PMU had failed [1], due to the difference in
> >>>> the scope of these two events. Hence, there is a need for a new core scope
> >>>> PMU.
> >>>>
> >>>> This patchset adds a new "power_per_core" PMU alongside the existing
> >>>> "power" PMU, which will be responsible for collecting the new
> >>>> "energy-per-core" event.
> >>>
> >>> Sorry for being naive, is the only reason for adding the new PMU for
> >>> the sake of the cpumask? Perhaps we can add per event cpumasks like
> >>> say `/sys/devices/power/events/energy-per-core.cpumask` which contains
> >>> the CPUs of the different cores in this case. There's supporting
> >>> hotplugging CPUs as an issue. Adding the tool support for this
> >>> wouldn't be hard and it may be less messy (although old perf tools on
> >>> new kernels wouldn't know about these files).
> >>
> >> I went over the two approaches and below are my thoughts,
> >>
> >> New PMU approach:
> >> Pros
> >> * It will work with older perf tools, hence these patches can be backported to an older kernel and the new per-core event will work there as well.
> >> Cons
> >> * More code changes in rapl.c
> >>
> >> Event specific cpumask approach:
> >> Pros
> >> * It might be easier to add diff scope events within the same PMU in future(although currently I'm not able to find such a usecase, apart from the RAPL pkg and core energy counters)
> >> Cons
> >> * Both new kernel and perf tool will be required to use the new per-core event.
> >>
> >> I feel that while the event-specific cpumask is a viable alternative to the new PMU addition approach, I dont see any clear pros to select that over the current approach. Please let me know if you have any design related concerns to the addition of new PMU or your concern is mostly about the amount of code changes in this approach.
> >
> > Thanks Dhananjay, and thanks for taking the time for an objective
> > discussion on the mailing list. I'm very supportive of seeing the work
> > you are enabling land.
> >
> > My concern comes from the tool side. If every PMU starts to have
> > variants for the sake of the cpumask what does this mean for
> > aggregation in the perf tool? There is another issue around event
> > grouping, you can't group events across PMUs, but my feeling is that
> > event grouping needs to be rethought. By default the power_per_core
> > events are going to be aggregated together by the perf tool, which
> > then loses their per_core-ness.
>
> Yea right, maybe we need to fix this behavior.
>
> >
> > I was trying to think of the same problem but in other PMUs. One
> > thought I had was the difference between hyperthread and core events.
> > At least on Intel, some events can only count for the whole core not
> > per hyperthread. The events don't have a cpu_per_core PMU, they just
> > use the regular cpu one, and so the cpumask is set to all online
> > hyperthreads. When a per-core event is programmed it will get
> > programmed on every hyperthread and so counted twice for the core.
> > This at the least wastes a counter, but it probably also yields twice
> > the expected count as every event is counted twice then aggregated. So
> > this is just wrong and the user is expected to be smart and fix it
> > (checking the x86 events there is a convention to use a ".ALL" or
> > ".ANY" suffix for core wide events iirc). If we had a cpumask for
> > these events then we could avoid the double setting, free up a counter
> > and avoid double counting. Were we to fix things the way it is done in
> > this patch series we'd add another PMU.
>
> Yes, this seems like a valid usecase for event-specific cpumasks.
>
> >
> > My feeling is that in the longer term a per event cpumask looks
> > cleaner. I think either way you need a new kernel for the new RAPL
> > events. The problem with an old perf tool and a new kernel, this
> > doesn't normally happen with distributions as they match the perf tool
> > to the kernel version needlessly losing features and fixes along the
> > way. If the new PMU is going to get backported through fixes.. then we
> > can do similar for reading the per event cpumask. I'd be tempted not
> > to do this and focus on the next LTS kernel, getting the kernel and
> > tool fixes in as necessary.
>
> Makes sense, even though this approach will require more effort but it seems
> to be worthwhile as it would help things down the line (make it easier to have
> heterogenous-scope events within a PMU). I'll need to go through the perf tool
> to see how we can design this. I'll get back with an RFC series probably once
> I have an initial design in mind.

Hi Dhananjay,

I can have a go at the perf tool side of this - I probably know the
way around the code best. Basically we need to do something similar to
how other "<event>.<setting>" values are parsed:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n335
The cpumask handling in the perf tool is a little weird as there is a
summary value in the evlist. Anyway, I can do that if you want to spin
the RAPL/power PMU side of things.

Thanks,
Ian

> Thanks,
> Dhananjay
>
> >
> > Thanks,
> > Ian
> >
> >
> >> Thanks,
> >> Dhananjay
> >>
> >>>
> >>> Thanks,
> >>> Ian
> >>>
> >>>
> >>>> Tested the package level and core level PMU counters with workloads
> >>>> pinned to different CPUs.
> >>>>
> >>>> Results with workload pinned to CPU 1 in Core 1 on an AMD Zen4 Genoa
> >>>> machine:
> >>>>
> >>>> $ perf stat -a --per-core -e power_per_core/energy-per-core/ -- sleep 1
> >>>>
> >>>>  Performance counter stats for 'system wide':
> >>>>
> >>>> S0-D0-C0         1          0.02 Joules power_per_core/energy-per-core/
> >>>> S0-D0-C1         1          5.72 Joules power_per_core/energy-per-core/
> >>>> S0-D0-C2         1          0.02 Joules power_per_core/energy-per-core/
> >>>> S0-D0-C3         1          0.02 Joules power_per_core/energy-per-core/
> >>>> S0-D0-C4         1          0.02 Joules power_per_core/energy-per-core/
> >>>> S0-D0-C5         1          0.02 Joules power_per_core/energy-per-core/
> >>>> S0-D0-C6         1          0.02 Joules power_per_core/energy-per-core/
> >>>> S0-D0-C7         1          0.02 Joules power_per_core/energy-per-core/
> >>>> S0-D0-C8         1          0.02 Joules power_per_core/energy-per-core/
> >>>> S0-D0-C9         1          0.02 Joules power_per_core/energy-per-core/
> >>>> S0-D0-C10        1          0.02 Joules power_per_core/energy-per-core/
> >>>>
> >>>> [1]: https://lore.kernel.org/lkml/3e766f0e-37d4-0f82-3868-31b14228868d@linux.intel.com/
> >>>>
> >>>> This patchset applies cleanly on top of v6.10-rc7 as well as latest
> >>>> tip/master.
> >>>>
> >>>> v4 changes:
> >>>> * Add patch 11 which removes the unused function cpu_to_rapl_pmu()
> >>>> * Add Rui's rb tag for patch 1
> >>>> * Invert the pmu scope check logic in patch 2 (Peter)
> >>>> * Add comments explaining the scope check in patch 2 (Peter)
> >>>> * Use cpumask_var_t instead of cpumask_t in patch 5 (Peter)
> >>>> * Move renaming code to patch 8 (Rui)
> >>>> * Reorder the cleanup order of per-core and per-pkg PMU in patch 10 (Rui)
> >>>> * Add rapl_core_hw_unit variable to store the per-core PMU unit in patch
> >>>>   10 (Rui)
> >>>>
> >>>> PS: Scope check logic is still kept the same (i.e., all Intel systems being
> >>>> considered as die scope), Rui will be modifying it to limit the die-scope
> >>>> only to Cascadelake-AP in a future patch on top of this patchset.
> >>>>
> >>>> v3 changes:
> >>>> * Patch 1 added to introduce the logical_core_id which is unique across
> >>>>   the system (Prateek)
> >>>> * Use the unique topology_logical_core_id() instead of
> >>>>   topology_core_id() (which is only unique within a package on tested
> >>>>   AMD and Intel systems) in Patch 10
> >>>>
> >>>> v2 changes:
> >>>> * Patches 6,7,8 added to split some changes out of the last patch
> >>>> * Use container_of to get the rapl_pmus from event variable (Rui)
> >>>> * Set PERF_EV_CAP_READ_ACTIVE_PKG flag only for pkg scope PMU (Rui)
> >>>> * Use event id 0x1 for energy-per-core event (Rui)
> >>>> * Use PERF_RAPL_PER_CORE bit instead of adding a new flag to check for
> >>>>   per-core counter hw support (Rui)
> >>>>
> >>>> Dhananjay Ugwekar (10):
> >>>>   perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
> >>>>   perf/x86/rapl: Rename rapl_pmu variables
> >>>>   perf/x86/rapl: Make rapl_model struct global
> >>>>   perf/x86/rapl: Move cpumask variable to rapl_pmus struct
> >>>>   perf/x86/rapl: Add wrapper for online/offline functions
> >>>>   perf/x86/rapl: Add an argument to the cleanup and init functions
> >>>>   perf/x86/rapl: Modify the generic variable names to *_pkg*
> >>>>   perf/x86/rapl: Remove the global variable rapl_msrs
> >>>>   perf/x86/rapl: Add per-core energy counter support for AMD CPUs
> >>>>   perf/x86/rapl: Remove the unused function cpu_to_rapl_pmu
> >>>>
> >>>> K Prateek Nayak (1):
> >>>>   x86/topology: Introduce topology_logical_core_id()
> >>>>
> >>>>  Documentation/arch/x86/topology.rst   |   4 +
> >>>>  arch/x86/events/rapl.c                | 454 ++++++++++++++++++--------
> >>>>  arch/x86/include/asm/processor.h      |   1 +
> >>>>  arch/x86/include/asm/topology.h       |   1 +
> >>>>  arch/x86/kernel/cpu/debugfs.c         |   1 +
> >>>>  arch/x86/kernel/cpu/topology_common.c |   1 +
> >>>>  6 files changed, 328 insertions(+), 134 deletions(-)
> >>>>
> >>>> --
> >>>> 2.34.1
> >>>>
Dhananjay Ugwekar July 17, 2024, 8:04 a.m. UTC | #5
On 7/17/2024 4:17 AM, Ian Rogers wrote:
> On Tue, Jul 16, 2024 at 1:42 AM Dhananjay Ugwekar
> <Dhananjay.Ugwekar@amd.com> wrote:
>>
>> Hello Ian,
>>
>> On 7/15/2024 8:52 PM, Ian Rogers wrote:
>>> On Mon, Jul 15, 2024 at 2:36 AM Dhananjay Ugwekar
>>> <Dhananjay.Ugwekar@amd.com> wrote:
>>>>
>>>> Hello Ian,
>>>>
>>>> On 7/12/2024 3:53 AM, Ian Rogers wrote:
>>>>> On Thu, Jul 11, 2024 at 3:25 AM Dhananjay Ugwekar
>>>>> <Dhananjay.Ugwekar@amd.com> wrote:
>>>>>>
>>>>>> Currently the energy-cores event in the power PMU aggregates energy
>>>>>> consumption data at a package level. On the other hand the core energy
>>>>>> RAPL counter in AMD CPUs has a core scope (which means the energy
>>>>>> consumption is recorded separately for each core). Earlier efforts to add
>>>>>> the core event in the power PMU had failed [1], due to the difference in
>>>>>> the scope of these two events. Hence, there is a need for a new core scope
>>>>>> PMU.
>>>>>>
>>>>>> This patchset adds a new "power_per_core" PMU alongside the existing
>>>>>> "power" PMU, which will be responsible for collecting the new
>>>>>> "energy-per-core" event.
>>>>>
>>>>> Sorry for being naive, is the only reason for adding the new PMU for
>>>>> the sake of the cpumask? Perhaps we can add per event cpumasks like
>>>>> say `/sys/devices/power/events/energy-per-core.cpumask` which contains
>>>>> the CPUs of the different cores in this case. There's supporting
>>>>> hotplugging CPUs as an issue. Adding the tool support for this
>>>>> wouldn't be hard and it may be less messy (although old perf tools on
>>>>> new kernels wouldn't know about these files).
>>>>
>>>> I went over the two approaches and below are my thoughts,
>>>>
>>>> New PMU approach:
>>>> Pros
>>>> * It will work with older perf tools, hence these patches can be backported to an older kernel and the new per-core event will work there as well.
>>>> Cons
>>>> * More code changes in rapl.c
>>>>
>>>> Event specific cpumask approach:
>>>> Pros
>>>> * It might be easier to add diff scope events within the same PMU in future(although currently I'm not able to find such a usecase, apart from the RAPL pkg and core energy counters)
>>>> Cons
>>>> * Both new kernel and perf tool will be required to use the new per-core event.
>>>>
>>>> I feel that while the event-specific cpumask is a viable alternative to the new PMU addition approach, I dont see any clear pros to select that over the current approach. Please let me know if you have any design related concerns to the addition of new PMU or your concern is mostly about the amount of code changes in this approach.
>>>
>>> Thanks Dhananjay, and thanks for taking the time for an objective
>>> discussion on the mailing list. I'm very supportive of seeing the work
>>> you are enabling land.
>>>
>>> My concern comes from the tool side. If every PMU starts to have
>>> variants for the sake of the cpumask what does this mean for
>>> aggregation in the perf tool? There is another issue around event
>>> grouping, you can't group events across PMUs, but my feeling is that
>>> event grouping needs to be rethought. By default the power_per_core
>>> events are going to be aggregated together by the perf tool, which
>>> then loses their per_core-ness.
>>
>> Yea right, maybe we need to fix this behavior.
>>
>>>
>>> I was trying to think of the same problem but in other PMUs. One
>>> thought I had was the difference between hyperthread and core events.
>>> At least on Intel, some events can only count for the whole core not
>>> per hyperthread. The events don't have a cpu_per_core PMU, they just
>>> use the regular cpu one, and so the cpumask is set to all online
>>> hyperthreads. When a per-core event is programmed it will get
>>> programmed on every hyperthread and so counted twice for the core.
>>> This at the least wastes a counter, but it probably also yields twice
>>> the expected count as every event is counted twice then aggregated. So
>>> this is just wrong and the user is expected to be smart and fix it
>>> (checking the x86 events there is a convention to use a ".ALL" or
>>> ".ANY" suffix for core wide events iirc). If we had a cpumask for
>>> these events then we could avoid the double setting, free up a counter
>>> and avoid double counting. Were we to fix things the way it is done in
>>> this patch series we'd add another PMU.
>>
>> Yes, this seems like a valid usecase for event-specific cpumasks.
>>
>>>
>>> My feeling is that in the longer term a per event cpumask looks
>>> cleaner. I think either way you need a new kernel for the new RAPL
>>> events. The problem with an old perf tool and a new kernel, this
>>> doesn't normally happen with distributions as they match the perf tool
>>> to the kernel version needlessly losing features and fixes along the
>>> way. If the new PMU is going to get backported through fixes.. then we
>>> can do similar for reading the per event cpumask. I'd be tempted not
>>> to do this and focus on the next LTS kernel, getting the kernel and
>>> tool fixes in as necessary.
>>
>> Makes sense, even though this approach will require more effort but it seems
>> to be worthwhile as it would help things down the line (make it easier to have
>> heterogenous-scope events within a PMU). I'll need to go through the perf tool
>> to see how we can design this. I'll get back with an RFC series probably once
>> I have an initial design in mind.
> 
> Hi Dhananjay,
> 
> I can have a go at the perf tool side of this - I probably know the
> way around the code best. Basically we need to do something similar to
> how other "<event>.<setting>" values are parsed:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n335
Yea, right.

> The cpumask handling in the perf tool is a little weird as there is a
> summary value in the evlist. Anyway, I can do that if you want to spin
> the RAPL/power PMU side of things.

Sounds great!, I'll be happy to refactor the RAPL code to use the event.cpumask 
feature to add the per-core energy counter. Also, please let me know if you need 
any help from me on the perf tool side as well.

Regards,
Dhananjay

> 
> Thanks,
> Ian
> 
>> Thanks,
>> Dhananjay
>>
>>>
>>> Thanks,
>>> Ian
>>>
>>>
>>>> Thanks,
>>>> Dhananjay
>>>>
>>>>>
>>>>> Thanks,
>>>>> Ian
>>>>>
>>>>>
>>>>>> Tested the package level and core level PMU counters with workloads
>>>>>> pinned to different CPUs.
>>>>>>
>>>>>> Results with workload pinned to CPU 1 in Core 1 on an AMD Zen4 Genoa
>>>>>> machine:
>>>>>>
>>>>>> $ perf stat -a --per-core -e power_per_core/energy-per-core/ -- sleep 1
>>>>>>
>>>>>>  Performance counter stats for 'system wide':
>>>>>>
>>>>>> S0-D0-C0         1          0.02 Joules power_per_core/energy-per-core/
>>>>>> S0-D0-C1         1          5.72 Joules power_per_core/energy-per-core/
>>>>>> S0-D0-C2         1          0.02 Joules power_per_core/energy-per-core/
>>>>>> S0-D0-C3         1          0.02 Joules power_per_core/energy-per-core/
>>>>>> S0-D0-C4         1          0.02 Joules power_per_core/energy-per-core/
>>>>>> S0-D0-C5         1          0.02 Joules power_per_core/energy-per-core/
>>>>>> S0-D0-C6         1          0.02 Joules power_per_core/energy-per-core/
>>>>>> S0-D0-C7         1          0.02 Joules power_per_core/energy-per-core/
>>>>>> S0-D0-C8         1          0.02 Joules power_per_core/energy-per-core/
>>>>>> S0-D0-C9         1          0.02 Joules power_per_core/energy-per-core/
>>>>>> S0-D0-C10        1          0.02 Joules power_per_core/energy-per-core/
>>>>>>
>>>>>> [1]: https://lore.kernel.org/lkml/3e766f0e-37d4-0f82-3868-31b14228868d@linux.intel.com/
>>>>>>
>>>>>> This patchset applies cleanly on top of v6.10-rc7 as well as latest
>>>>>> tip/master.
>>>>>>
>>>>>> v4 changes:
>>>>>> * Add patch 11 which removes the unused function cpu_to_rapl_pmu()
>>>>>> * Add Rui's rb tag for patch 1
>>>>>> * Invert the pmu scope check logic in patch 2 (Peter)
>>>>>> * Add comments explaining the scope check in patch 2 (Peter)
>>>>>> * Use cpumask_var_t instead of cpumask_t in patch 5 (Peter)
>>>>>> * Move renaming code to patch 8 (Rui)
>>>>>> * Reorder the cleanup order of per-core and per-pkg PMU in patch 10 (Rui)
>>>>>> * Add rapl_core_hw_unit variable to store the per-core PMU unit in patch
>>>>>>   10 (Rui)
>>>>>>
>>>>>> PS: Scope check logic is still kept the same (i.e., all Intel systems being
>>>>>> considered as die scope), Rui will be modifying it to limit the die-scope
>>>>>> only to Cascadelake-AP in a future patch on top of this patchset.
>>>>>>
>>>>>> v3 changes:
>>>>>> * Patch 1 added to introduce the logical_core_id which is unique across
>>>>>>   the system (Prateek)
>>>>>> * Use the unique topology_logical_core_id() instead of
>>>>>>   topology_core_id() (which is only unique within a package on tested
>>>>>>   AMD and Intel systems) in Patch 10
>>>>>>
>>>>>> v2 changes:
>>>>>> * Patches 6,7,8 added to split some changes out of the last patch
>>>>>> * Use container_of to get the rapl_pmus from event variable (Rui)
>>>>>> * Set PERF_EV_CAP_READ_ACTIVE_PKG flag only for pkg scope PMU (Rui)
>>>>>> * Use event id 0x1 for energy-per-core event (Rui)
>>>>>> * Use PERF_RAPL_PER_CORE bit instead of adding a new flag to check for
>>>>>>   per-core counter hw support (Rui)
>>>>>>
>>>>>> Dhananjay Ugwekar (10):
>>>>>>   perf/x86/rapl: Fix the energy-pkg event for AMD CPUs
>>>>>>   perf/x86/rapl: Rename rapl_pmu variables
>>>>>>   perf/x86/rapl: Make rapl_model struct global
>>>>>>   perf/x86/rapl: Move cpumask variable to rapl_pmus struct
>>>>>>   perf/x86/rapl: Add wrapper for online/offline functions
>>>>>>   perf/x86/rapl: Add an argument to the cleanup and init functions
>>>>>>   perf/x86/rapl: Modify the generic variable names to *_pkg*
>>>>>>   perf/x86/rapl: Remove the global variable rapl_msrs
>>>>>>   perf/x86/rapl: Add per-core energy counter support for AMD CPUs
>>>>>>   perf/x86/rapl: Remove the unused function cpu_to_rapl_pmu
>>>>>>
>>>>>> K Prateek Nayak (1):
>>>>>>   x86/topology: Introduce topology_logical_core_id()
>>>>>>
>>>>>>  Documentation/arch/x86/topology.rst   |   4 +
>>>>>>  arch/x86/events/rapl.c                | 454 ++++++++++++++++++--------
>>>>>>  arch/x86/include/asm/processor.h      |   1 +
>>>>>>  arch/x86/include/asm/topology.h       |   1 +
>>>>>>  arch/x86/kernel/cpu/debugfs.c         |   1 +
>>>>>>  arch/x86/kernel/cpu/topology_common.c |   1 +
>>>>>>  6 files changed, 328 insertions(+), 134 deletions(-)
>>>>>>
>>>>>> --
>>>>>> 2.34.1
>>>>>>
Ian Rogers July 17, 2024, 3:36 p.m. UTC | #6
On Wed, Jul 17, 2024 at 1:05 AM Dhananjay Ugwekar
<Dhananjay.Ugwekar@amd.com> wrote:
...
> Sounds great!, I'll be happy to refactor the RAPL code to use the event.cpumask
> feature to add the per-core energy counter. Also, please let me know if you need
> any help from me on the perf tool side as well.

I hope to send out some patches soon. One thought I have is that
"event.cpumask" is probably less intention revealing the "event.cpus"
as a name, so I'm going to code assuming the suffix is ".cpus" but
obviously we can change it to ".cpumask". To me the "mask" makes it
sound like we "and" the values with something, hence wanting to avoid
the confusion. There's much confusion in the area already, such as the
term "cpu map" that does not implement some kind of map.

Thanks,
Ian