mbox series

[v3,0/2] arm: remove cpu_efficiency

Message ID 20171024102718.16113-1-dietmar.eggemann@arm.com
Headers show
Series arm: remove cpu_efficiency | expand

Message

Dietmar Eggemann Oct. 24, 2017, 10:27 a.m. UTC
v2: review results [1]:

Vincent pointed out that there are missing of_node_put() calls for cpu
device node in parse_dt_topology(). Added them with an extra patch.

v1: review results [2]:

Vincent pointed out that there is a superfluous continue statement in
parse_dt_topology(). Got rid of it.
Krzysztof Kozlowski took the exynos and Simon Horman the renesas related
dt bits into their for-next (v4.15) branches.

***

For Cortex-A15/A7 arm big.LITTLE systems there are currently two ways to
set the cpu capacity.

The first one (commit 06073ee26775 "ARM: 8621/3: parse cpu
capacity-dmips-mhz from DT") is based on dt 'cpu capacity-dmips-mhz'
bindings and the appropriate dt parsing code in
drivers/base/arch_topology.c. It further takes differences in maximum
cpu frequency values into consideration, normalizes the maximum cpu
capacity to SCHED_CAPACITY_SCALE (1024) and scales all the cpus
accordingly.

  cpu capacity = (capacity-dmips-mhz * max cpu frequency) / 
                 (max capacity-dmips-mhz * max (max cpu frequency)

This solution is shared between arm and arm64 and works for other
combinations of big and little cpus (besides Cortex-A15/A7) as well.

The second one (commit 339ca09d7ada "ARM: 7463/1: topology: Update
cpu_power according to DT information" is based on the 'struct
cpu_efficiency table_efficiency[]' and the dt parsing code in
arch/arm/kernel/topology.c. It further requires a clock-frequency
property per cpu node, calculates a so called middle frequency for an
average cpu in the system which is as close as possible to
SCHED_CAPACITY_SCALE (1024) and uses this to compute the cpu capacity
values.

  cpu capacity = (cpu efficiency * clock frequency) / middle capacity

This solution only works for Cortex-A15/A7 arm big.LITTLE systems.

The aim of this patch is to have only one solution for all arm and arm64
big.LITTLE platforms.

(1) Therefore, it removes the code for the 'cpu_efficiency/
    clock-frequency dt property' (second) solution.

(2) Moreover, it will also assure that the highest original cpu capacity
    (rq->cpu_capacity_orig) in a non-smt system is SCHED_CAPACITY_SCALE
    (1024).

(3) And finally, another advantage is the dynamic detection of the max
    cpu frequency which comes with the first solution instead of the
    static clock-frequency dt property value.

Currently, the arm dt parsing code in parse_dt_topology() checks if the
dt uses the capacity-dmips-mhz property. If this is the case it uses
the first, otherwise the second solution. This patch removes the code
for the second solution from arch/arm/kernel/topology.c.

With the dt related patches for exynos and renesas now in the
appropriated for-next branches for v4.15 there are no Cortex-A15/A7 arm
big.LITTLE systems left relying on the 'cpu_efficiency/clock-frequency
dt property' based solution anymore.

This patch has been tested on TC2 and Samsung Chromebook 2 13"
(peach-pi, Exynos 5800).

[1] https://marc.info/?l=linux-kernel&m=150781686132670&w=2
[2] https://marc.info/?l=linux-kernel&m=150410411807050&w=2

Changes v2->v3:

  - Rebase on top of v4.14-rc6
  - Added missing of_node_put() for cpu device node [02/02]

Changes v1->v2:

  - Rebase on top of v4.14-rc4
  - Remove superfluous continue statement in parse_dt_topology()
    [01/04]
  - Remove 'cpu capacity scale management' and 'cpu capacity table'
    related comments [01/04]
  - Remove dt related patches [02-04/04]

Dietmar Eggemann (2):
  arm: topology: remove cpu_efficiency
  arm: topology: add missing of_node_put() for cpu device node

 arch/arm/kernel/topology.c | 135 ++-------------------------------------------
 1 file changed, 6 insertions(+), 129 deletions(-)

-- 
2.11.0

Comments

Russell King (Oracle) Oct. 24, 2017, 10:52 a.m. UTC | #1
On Tue, Oct 24, 2017 at 11:27:16AM +0100, Dietmar Eggemann wrote:
> With the dt related patches for exynos and renesas now in the

> appropriated for-next branches for v4.15 there are no Cortex-A15/A7 arm

> big.LITTLE systems left relying on the 'cpu_efficiency/clock-frequency

> dt property' based solution anymore.


This is way too early to remove support for something that has been
in place since 2012.  As you've just shown, people are using it with
DT files today.  We don't know how long people will persist using
older files, and we don't know whether there are DT files out in the
wild that we don't know about.

Our general rule is that we maintain compatibility for older DT.

NAK.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up
Dietmar Eggemann Oct. 24, 2017, 2:31 p.m. UTC | #2
Hi Russel,

Thanks for the review!

On 24/10/17 11:52, Russell King - ARM Linux wrote:
> On Tue, Oct 24, 2017 at 11:27:16AM +0100, Dietmar Eggemann wrote:

>> With the dt related patches for exynos and renesas now in the

>> appropriated for-next branches for v4.15 there are no Cortex-A15/A7 arm

>> big.LITTLE systems left relying on the 'cpu_efficiency/clock-frequency

>> dt property' based solution anymore.

> 

> This is way too early to remove support for something that has been

> in place since 2012.  As you've just shown, people are using it with

> DT files today.  We don't know how long people will persist using

> older files, and we don't know whether there are DT files out in the

> wild that we don't know about.


Understood. But do we really care about out of tree dt files?

In case the mentioned exynos and renesas Cortex-A15/A7 platforms change
to using 'capacity-dmips-mhz' in v4.15 there shouldn't be any
Cortex-A15/A7 platforms in mainline left using the cpu_efficiency
solution anymore.

> Our general rule is that we maintain compatibility for older DT.


OK.

> NAK.


Then I still send out [2/2] separately.
Dietmar Eggemann Nov. 24, 2017, 11:57 a.m. UTC | #3
On 10/24/2017 03:31 PM, Dietmar Eggemann wrote:
> Hi Russel,

> 

> Thanks for the review!

> 

> On 24/10/17 11:52, Russell King - ARM Linux wrote:

>> On Tue, Oct 24, 2017 at 11:27:16AM +0100, Dietmar Eggemann wrote:

>>> With the dt related patches for exynos and renesas now in the

>>> appropriated for-next branches for v4.15 there are no Cortex-A15/A7 arm

>>> big.LITTLE systems left relying on the 'cpu_efficiency/clock-frequency

>>> dt property' based solution anymore.

>>

>> This is way too early to remove support for something that has been

>> in place since 2012.  As you've just shown, people are using it with

>> DT files today.  We don't know how long people will persist using

>> older files, and we don't know whether there are DT files out in the

>> wild that we don't know about.

> 

> Understood. But do we really care about out of tree dt files?

> 

> In case the mentioned exynos and renesas Cortex-A15/A7 platforms change

> to using 'capacity-dmips-mhz' in v4.15 there shouldn't be any

> Cortex-A15/A7 platforms in mainline left using the cpu_efficiency

> solution anymore.

> 

>> Our general rule is that we maintain compatibility for older DT.

> 

> OK.

> 

>> NAK.


Morten reminded me that the per-cpu capacity functionality for
heterogeneous systems (different values than the default 1024 
(SCHED_CAPACITY_SCALE)) has been broken since v4.4 (Jan 2016) due to 
8cd5601c5060 "sched/fair: Convert arch_scale_cpu_capacity() from weak 
function to #define".

So even people had cpu clock-frequency specified in their dt for their
Cortex-A15/A7 arm platforms, they didn't noticed (or cared) that the
task scheduler sees all cpus with the capacity of 1024 rather than
different values for the A15's and A7's.

As an example, a v4.3 TC2 (w/ 'clock-frequency = <1000000000>' added for 
A15 and ,clock-frequency = <800000000>, added for a A7) had the 
following cpu capacity values
The log snippet (for cpu0: A15) comes from sched_domain_debug_one() 
[kernel/sched/core.c] which appears if scheduler debugging is enabled:

   ...
   CPU0 attaching sched-domain:
    domain 0: span 0-1 level MC
     groups: 0 (cpu_capacity = 1441) 1 (cpu_capacity = 1441)
     domain 1: span 0-4 level DIE
      groups: 0-1 (cpu_capacity = 2882) 2-4 (cpu_capacity = 1818)
   ...

   So A15 has 1441 and A7 has 606 as cpu capacity.

Whereas with v4.4 it switched back to:

  ...
  CPU0 attaching sched-domain:
   domain 0: span 0-1 level MC
    groups: 0 1
    domain 1: span 0-4 level DIE
     groups: 0-1 (cpu_capacity = 2048) 2-4 (cpu_capacity = 3072)
  ...

    So A15 and A7 have 1024 (SCHED_CAPACITY_SCALE).


So maybe the requirement to maintain compatibility for older DT's is
less important in this specific case and the argument that we can get
rid of a lot of code and make arm/arm64 consistent in this area is more
important than backward compatibility?

BTW, the missing bit, the #define of arch_scale_cpu_capacity, is only
provided with 552c4653bf89 "arm: wire cpu-invariant accounting support
up to the task scheduler" which is currently queued for v4.15-rc1.

-- Dietmar

[...]