From patchwork Mon Jul 28 17:51:45 2014
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 34402
From: Vincent Guittot
To: peterz@infradead.org, mingo@kernel.org, linux-kernel@vger.kernel.org, preeti@linux.vnet.ibm.com, linux@arm.linux.org.uk, linux-arm-kernel@lists.infradead.org
Cc: riel@redhat.com, Morten.Rasmussen@arm.com, efault@gmx.de, nicolas.pitre@linaro.org, linaro-kernel@lists.linaro.org, daniel.lezcano@linaro.org, dietmar.eggemann@arm.com, Vincent Guittot
Subject: [PATCH v4 11/12] sched: replace capacity_factor by utilization
Date: Mon, 28 Jul 2014 19:51:45 +0200
Message-Id: <1406569906-9763-12-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1406569906-9763-1-git-send-email-vincent.guittot@linaro.org>
References: <1406569906-9763-1-git-send-email-vincent.guittot@linaro.org>

The scheduler tries to compute how many tasks a group of CPUs can handle by
assuming that a task's load is SCHED_LOAD_SCALE and a CPU's capacity is
SCHED_CAPACITY_SCALE.

Thanks to the rework of group_capacity_orig and group_utilization, we now
have a better idea of both the capacity and the utilization of a group of
CPUs, so we can deduce how much capacity is still available.

Signed-off-by: Vincent Guittot
---
 kernel/sched/fair.c | 115 +++++++++++++++++++++-------------------------------
 1 file changed, 47 insertions(+), 68 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8524760..292ee7c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5612,13 +5612,13 @@ struct sg_lb_stats {
 	unsigned long sum_weighted_load; /* Weighted load of group's tasks */
 	unsigned long load_per_task;
 	unsigned long group_capacity;
+	unsigned long group_capacity_orig;
 	unsigned long group_utilization; /* Total utilization of the group */
 	unsigned int sum_nr_running; /* Nr tasks running in the group */
-	unsigned int group_capacity_factor;
 	unsigned int idle_cpus;
 	unsigned int group_weight;
 	int group_imb; /* Is there an imbalance in the group ? */
-	int group_has_free_capacity;
+	int group_out_of_capacity;
 #ifdef CONFIG_NUMA_BALANCING
 	unsigned int nr_numa_running;
 	unsigned int nr_preferred_running;
@@ -5838,31 +5838,6 @@ void update_group_capacity(struct sched_domain *sd, int cpu)
 }
 
 /*
- * Try and fix up capacity for tiny siblings, this is needed when
- * things like SD_ASYM_PACKING need f_b_g to select another sibling
- * which on its own isn't powerful enough.
- *
- * See update_sd_pick_busiest() and check_asym_packing().
- */
-static inline int
-fix_small_capacity(struct sched_domain *sd, struct sched_group *group)
-{
-	/*
-	 * Only siblings can have significantly less than SCHED_CAPACITY_SCALE
-	 */
-	if (!(sd->flags & SD_SHARE_CPUCAPACITY))
-		return 0;
-
-	/*
-	 * If ~90% of the cpu_capacity is still there, we're good.
-	 */
-	if (group->sgc->capacity * 32 > group->sgc->capacity_orig * 29)
-		return 1;
-
-	return 0;
-}
-
-/*
  * Group imbalance indicates (and tries to solve) the problem where balancing
  * groups is inadequate due to tsk_cpus_allowed() constraints.
  *
@@ -5896,32 +5871,30 @@ static inline int sg_imbalanced(struct sched_group *group)
 	return group->sgc->imbalance;
 }
 
-/*
- * Compute the group capacity factor.
- *
- * Avoid the issue where N*frac(smt_capacity) >= 1 creates 'phantom' cores by
- * first dividing out the smt factor and computing the actual number of cores
- * and limit unit capacity with that.
- */
-static inline int sg_capacity_factor(struct lb_env *env, struct sched_group *group)
+static inline int group_has_free_capacity(struct sg_lb_stats *sgs,
+			struct lb_env *env)
 {
-	unsigned int capacity_factor, smt, cpus;
-	unsigned int capacity, capacity_orig;
+	if ((sgs->group_capacity_orig * 100) >
+			(sgs->group_utilization * env->sd->imbalance_pct))
+		return 1;
 
-	capacity = group->sgc->capacity;
-	capacity_orig = group->sgc->capacity_orig;
-	cpus = group->group_weight;
+	if (sgs->sum_nr_running < sgs->group_weight)
+		return 1;
 
-	/* smt := ceil(cpus / capacity), assumes: 1 < smt_capacity < 2 */
-	smt = DIV_ROUND_UP(SCHED_CAPACITY_SCALE * cpus, capacity_orig);
-	capacity_factor = cpus / smt; /* cores */
+	return 0;
+}
 
-	capacity_factor = min_t(unsigned,
-		capacity_factor, DIV_ROUND_CLOSEST(capacity, SCHED_CAPACITY_SCALE));
-	if (!capacity_factor)
-		capacity_factor = fix_small_capacity(env->sd, group);
+static inline int group_is_overloaded(struct sg_lb_stats *sgs,
+			struct lb_env *env)
+{
+	if (sgs->sum_nr_running <= sgs->group_weight)
+		return 0;
 
-	return capacity_factor;
+	if ((sgs->group_capacity_orig * 100) <
+			(sgs->group_utilization * env->sd->imbalance_pct))
+		return 1;
+
+	return 0;
 }
 
 /**
@@ -5967,6 +5940,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			sgs->idle_cpus++;
 	}
 
+	sgs->group_capacity_orig = group->sgc->capacity_orig;
 	/* Adjust by relative CPU capacity of the group */
 	sgs->group_capacity = group->sgc->capacity;
 	sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity;
@@ -5977,10 +5951,8 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	sgs->group_weight = group->group_weight;
 
 	sgs->group_imb = sg_imbalanced(group);
-	sgs->group_capacity_factor = sg_capacity_factor(env, group);
 
-	if (sgs->group_capacity_factor > sgs->sum_nr_running)
-		sgs->group_has_free_capacity = 1;
+	sgs->group_out_of_capacity = group_is_overloaded(sgs, env);
 }
 
 /**
@@ -6004,7 +5976,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	if (sgs->avg_load <= sds->busiest_stat.avg_load)
 		return false;
 
-	if (sgs->sum_nr_running > sgs->group_capacity_factor)
+	if (sgs->group_out_of_capacity)
 		return true;
 
 	if (sgs->group_imb)
@@ -6105,17 +6077,21 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 
 		/*
 		 * In case the child domain prefers tasks go to siblings
-		 * first, lower the sg capacity factor to one so that we'll try
+		 * first, lower the sg capacity to one so that we'll try
 		 * and move all the excess tasks away. We lower the capacity
 		 * of a group only if the local group has the capacity to fit
-		 * these excess tasks, i.e. nr_running < group_capacity_factor. The
+		 * these excess tasks, i.e. group_capacity > 0. The
 		 * extra check prevents the case where you always pull from the
 		 * heaviest group when it is already under-utilized (possible
 		 * with a large weight task outweighs the tasks on the system).
 		 */
 		if (prefer_sibling && sds->local &&
-		    sds->local_stat.group_has_free_capacity)
-			sgs->group_capacity_factor = min(sgs->group_capacity_factor, 1U);
+		    group_has_free_capacity(&sds->local_stat, env)) {
+			if (sgs->sum_nr_running > 1)
+				sgs->group_out_of_capacity = 1;
+			sgs->group_capacity = min(sgs->group_capacity,
+						SCHED_CAPACITY_SCALE);
+		}
 
 		if (update_sd_pick_busiest(env, sds, sg, sgs)) {
 			sds->busiest = sg;
@@ -6294,11 +6270,12 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 		 * Except of course for the group_imb case, since then we might
 		 * have to drop below capacity to reach cpu-load equilibrium.
 		 */
-		load_above_capacity =
-			(busiest->sum_nr_running - busiest->group_capacity_factor);
-
-		load_above_capacity *= (SCHED_LOAD_SCALE * SCHED_CAPACITY_SCALE);
-		load_above_capacity /= busiest->group_capacity;
+		load_above_capacity = busiest->sum_nr_running *
+					SCHED_LOAD_SCALE;
+		if (load_above_capacity > busiest->group_capacity)
+			load_above_capacity -= busiest->group_capacity;
+		else
+			load_above_capacity = ~0UL;
 	}
 
 	/*
@@ -6361,6 +6338,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 	local = &sds.local_stat;
 	busiest = &sds.busiest_stat;
 
+	/* ASYM feature bypasses nice load balance check */
 	if ((env->idle == CPU_IDLE || env->idle == CPU_NEWLY_IDLE) &&
 	    check_asym_packing(env, &sds))
 		return sds.busiest;
@@ -6381,8 +6359,9 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 		goto force_balance;
 
 	/* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
-	if (env->idle == CPU_NEWLY_IDLE && local->group_has_free_capacity &&
-	    !busiest->group_has_free_capacity)
+	if (env->idle == CPU_NEWLY_IDLE &&
+			group_has_free_capacity(local, env) &&
+			busiest->group_out_of_capacity)
 		goto force_balance;
 
 	/*
@@ -6440,7 +6419,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 	int i;
 
 	for_each_cpu_and(i, sched_group_cpus(group), env->cpus) {
-		unsigned long capacity, capacity_factor, wl;
+		unsigned long capacity, wl;
 		enum fbq_type rt;
 
 		rq = cpu_rq(i);
@@ -6469,9 +6448,6 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 			continue;
 
 		capacity = capacity_of(i);
-		capacity_factor = DIV_ROUND_CLOSEST(capacity, SCHED_CAPACITY_SCALE);
-		if (!capacity_factor)
-			capacity_factor = fix_small_capacity(env->sd, group);
 
 		wl = weighted_cpuload(i);
 
@@ -6479,7 +6455,10 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 		 * When comparing with imbalance, use weighted_cpuload()
 		 * which is not scaled with the cpu capacity.
 		 */
-		if (capacity_factor && rq->nr_running == 1 && wl > env->imbalance)
+
+		if (rq->nr_running == 1 && wl > env->imbalance &&
+		    ((capacity * env->sd->imbalance_pct) >=
+				(rq->cpu_capacity_orig * 100)))
 			continue;
 
 		/*
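
Note: the two predicates this patch introduces, group_has_free_capacity() and group_is_overloaded(), can be tried out in user space with the rough sketch below. It is only an illustration, not the kernel code: struct sg_stats_sketch and the sketch_* helpers are hypothetical stand-ins for struct sg_lb_stats and the patch's functions, and imbalance_pct is passed directly instead of being read from env->sd.

```c
#include <stdbool.h>

/* Hypothetical, reduced stand-in for struct sg_lb_stats (not kernel code). */
struct sg_stats_sketch {
	unsigned long group_capacity_orig; /* original capacity of the group */
	unsigned long group_utilization;   /* total utilization of the group */
	unsigned int sum_nr_running;       /* tasks running in the group */
	unsigned int group_weight;         /* CPUs in the group */
};

/* Free capacity: utilization is under the margin, or a CPU has no task. */
static bool sketch_has_free_capacity(const struct sg_stats_sketch *sgs,
				     unsigned int imbalance_pct)
{
	if (sgs->group_capacity_orig * 100 >
	    sgs->group_utilization * imbalance_pct)
		return true;

	return sgs->sum_nr_running < sgs->group_weight;
}

/* Overloaded: more tasks than CPUs and utilization beyond the margin. */
static bool sketch_is_overloaded(const struct sg_stats_sketch *sgs,
				 unsigned int imbalance_pct)
{
	if (sgs->sum_nr_running <= sgs->group_weight)
		return false;

	return sgs->group_capacity_orig * 100 <
	       sgs->group_utilization * imbalance_pct;
}
```

With an assumed imbalance_pct of 125 and a two-CPU group whose group_capacity_orig is 2048, the utilization threshold sits at roughly 80% of capacity (about 1638). A group at utilization 1500 has free capacity; at 1700 with three tasks it is overloaded; at 1700 with two tasks it is neither, which is the "fully used but not overloaded" state that the old capacity_factor count could not express.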