From patchwork Fri May 23 15:53:05 2014
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 30826
From: Vincent Guittot <vincent.guittot@linaro.org>
To: peterz@infradead.org, mingo@kernel.org, linux-kernel@vger.kernel.org,
	linux@arm.linux.org.uk, linux-arm-kernel@lists.infradead.org
Cc: preeti@linux.vnet.ibm.com, Morten.Rasmussen@arm.com, efault@gmx.de,
	nicolas.pitre@linaro.org, linaro-kernel@lists.linaro.org,
	daniel.lezcano@linaro.org, Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH v2 11/11] sched: replace capacity by activity
Date: Fri, 23 May 2014 17:53:05 +0200
Message-Id: <1400860385-14555-12-git-send-email-vincent.guittot@linaro.org>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1400860385-14555-1-git-send-email-vincent.guittot@linaro.org>
References: <1400860385-14555-1-git-send-email-vincent.guittot@linaro.org>

The scheduler tries to compute how many tasks a group of CPUs can handle by
assuming that a task's load is SCHED_LOAD_SCALE and a CPU capacity is
SCHED_POWER_SCALE. We can now have a better idea of the utilization of a
group of CPUs thanks to group_activity and deduce how much capacity is still
available.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 94 ++++++++++++++---------------------------------------
 1 file changed, 24 insertions(+), 70 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2501e49..05b9502 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5538,11 +5538,10 @@ struct sg_lb_stats {
 	unsigned long group_power;
 	unsigned long group_activity; /* Total activity of the group */
 	unsigned int sum_nr_running; /* Nr tasks running in the group */
-	unsigned int group_capacity;
+	long group_capacity;
 	unsigned int idle_cpus;
 	unsigned int group_weight;
 	int group_imb; /* Is there an imbalance in the group ? */
-	int group_has_capacity; /* Is there extra capacity in the group? */
 #ifdef CONFIG_NUMA_BALANCING
 	unsigned int nr_numa_running;
 	unsigned int nr_preferred_running;
@@ -5785,31 +5784,6 @@ void update_group_power(struct sched_domain *sd, int cpu)
 }
 
 /*
- * Try and fix up capacity for tiny siblings, this is needed when
- * things like SD_ASYM_PACKING need f_b_g to select another sibling
- * which on its own isn't powerful enough.
- *
- * See update_sd_pick_busiest() and check_asym_packing().
- */
-static inline int
-fix_small_capacity(struct sched_domain *sd, struct sched_group *group)
-{
-	/*
-	 * Only siblings can have significantly less than SCHED_POWER_SCALE
-	 */
-	if (!(sd->flags & SD_SHARE_CPUPOWER))
-		return 0;
-
-	/*
-	 * If ~90% of the cpu_power is still there, we're good.
-	 */
-	if (group->sgp->power * 32 > group->sgp->power_orig * 29)
-		return 1;
-
-	return 0;
-}
-
-/*
  * Group imbalance indicates (and tries to solve) the problem where balancing
  * groups is inadequate due to tsk_cpus_allowed() constraints.
  *
@@ -5843,33 +5817,6 @@ static inline int sg_imbalanced(struct sched_group *group)
 	return group->sgp->imbalance;
 }
 
-/*
- * Compute the group capacity.
- *
- * Avoid the issue where N*frac(smt_power) >= 1 creates 'phantom' cores by
- * first dividing out the smt factor and computing the actual number of cores
- * and limit power unit capacity with that.
- */
-static inline int sg_capacity(struct lb_env *env, struct sched_group *group)
-{
-	unsigned int capacity, smt, cpus;
-	unsigned int power, power_orig;
-
-	power = group->sgp->power;
-	power_orig = group->sgp->power_orig;
-	cpus = group->group_weight;
-
-	/* smt := ceil(cpus / power), assumes: 1 < smt_power < 2 */
-	smt = DIV_ROUND_UP(SCHED_POWER_SCALE * cpus, power_orig);
-	capacity = cpus / smt; /* cores */
-
-	capacity = min_t(unsigned, capacity, DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE));
-	if (!capacity)
-		capacity = fix_small_capacity(env->sd, group);
-
-	return capacity;
-}
-
 /**
  * update_sg_lb_stats - Update sched_group's statistics for load balancing.
  * @env: The load balancing environment.
@@ -5918,10 +5865,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	sgs->group_weight = group->group_weight;
 
 	sgs->group_imb = sg_imbalanced(group);
-	sgs->group_capacity = sg_capacity(env, group);
+	sgs->group_capacity = group->sgp->power_orig - sgs->group_activity;
+
 
-	if (sgs->group_capacity > sgs->sum_nr_running)
-		sgs->group_has_capacity = 1;
 }
 
 /**
@@ -5945,7 +5891,15 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	if (sgs->avg_load <= sds->busiest_stat.avg_load)
 		return false;
 
-	if (sgs->sum_nr_running > sgs->group_capacity)
+	/* The group has an obvious long run overload */
+	if (sgs->group_capacity < 0)
+		return true;
+
+	/*
+	 * The group has a short run overload because more tasks than available
+	 * CPUs are running
+	 */
+	if (sgs->sum_nr_running > sgs->group_weight)
 		return true;
 
 	/*
@@ -6052,8 +6006,8 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 		 * with a large weight task outweighs the tasks on the system).
 		 */
 		if (prefer_sibling && sds->local &&
-		    sds->local_stat.group_has_capacity)
-			sgs->group_capacity = min(sgs->group_capacity, 1U);
+		    sds->local_stat.group_capacity > 0)
+			sgs->group_capacity = min(sgs->group_capacity, 1L);
 
 		if (update_sd_pick_busiest(env, sds, sg, sgs)) {
 			sds->busiest = sg;
@@ -6228,7 +6182,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
		 * have to drop below capacity to reach cpu-load equilibrium.
		 */
		load_above_capacity =
-			(busiest->sum_nr_running - busiest->group_capacity);
+			(busiest->sum_nr_running - busiest->group_weight);
 
		load_above_capacity *= (SCHED_LOAD_SCALE * SCHED_POWER_SCALE);
		load_above_capacity /= busiest->group_power;
@@ -6294,6 +6248,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 	local = &sds.local_stat;
 	busiest = &sds.busiest_stat;
 
+	/* ASYM feature bypasses nice load balance check */
 	if ((env->idle == CPU_IDLE || env->idle == CPU_NEWLY_IDLE) &&
 	    check_asym_packing(env, &sds))
 		return sds.busiest;
@@ -6313,8 +6268,8 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 		goto force_balance;
 
 	/* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
-	if (env->idle == CPU_NEWLY_IDLE && local->group_has_capacity &&
-	    !busiest->group_has_capacity)
+	if (env->idle == CPU_NEWLY_IDLE && (local->group_capacity > 0)
+		&& (busiest->group_capacity < 0))
 		goto force_balance;
 
 	/*
@@ -6372,7 +6327,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 	int i;
 
 	for_each_cpu_and(i, sched_group_cpus(group), env->cpus) {
-		unsigned long power, capacity, wl;
+		unsigned long power, wl;
 		enum fbq_type rt;
 
 		rq = cpu_rq(i);
@@ -6400,18 +6355,17 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 		if (rt > env->fbq_type)
 			continue;
 
-		power = power_of(i);
-		capacity = DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE);
-		if (!capacity)
-			capacity = fix_small_capacity(env->sd, group);
-
 		wl = weighted_cpuload(i);
 
 		/*
 		 * When comparing with imbalance, use weighted_cpuload()
 		 * which is not scaled with the cpu power.
 		 */
-		if (capacity && rq->nr_running == 1 && wl > env->imbalance)
+
+		power = power_of(i);
+
+		if (rq->nr_running == 1 && wl > env->imbalance &&
+			((power * env->sd->imbalance_pct) >= (rq->cpu_power_orig * 100)))
 			continue;
 
 		/*
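
As a quick illustration of the signed group_capacity introduced above, here is a
minimal user-space sketch. It is not kernel code: the main() harness, the helper
name and the numeric activity values are illustrative assumptions; only the
arithmetic (power_orig minus group activity, where a negative result means a
long-run overload) mirrors the patch.

#include <stdio.h>

/* Illustrative stand-in for the kernel's SCHED_POWER_SCALE (1024 per CPU). */
#define SCHED_POWER_SCALE 1024L

/* Same arithmetic as the patch: remaining capacity, which may go negative. */
static long group_capacity(long power_orig, long group_activity)
{
	return power_orig - group_activity;
}

int main(void)
{
	/* A 2-CPU group: power_orig = 2 * SCHED_POWER_SCALE. */
	long power_orig = 2 * SCHED_POWER_SCALE;
	long busy = group_capacity(power_orig, 2300);	/* overloaded group */
	long idle = group_capacity(power_orig, 600);	/* lightly loaded group */

	printf("busy group capacity: %ld (negative => long run overload)\n", busy);
	printf("idle group capacity: %ld (spare capacity)\n", idle);
	return 0;
}

Making group_capacity signed is what lets update_sd_pick_busiest() treat a
negative value as an obvious long-run overload, instead of relying on the
separate group_has_capacity flag that this patch removes.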