From patchwork Tue Feb 6 19:23:05 2018
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 127079
From: Vincent Guittot
To: peterz@infradead.org, mingo@kernel.org, linux-kernel@vger.kernel.org,
 valentin.schneider@arm.com
Cc: morten.rasmussen@foss.arm.com, brendan.jackman@arm.com,
 dietmar.eggemann@arm.com, Vincent Guittot
Subject: [PATCH v2 1/3] sched: Stop nohz stats when decayed
Date: Tue, 6 Feb 2018 20:23:05 +0100
Message-Id: <1517944987-343-2-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1517944987-343-1-git-send-email-vincent.guittot@linaro.org>
References: <1517944987-343-1-git-send-email-vincent.guittot@linaro.org>

Stop the periodic update of blocked load once the blocked load of all
idle CPUs has fully decayed. We introduce a new nohz.has_blocked flag
that reflects whether some idle CPUs have blocked load that has to be
periodically updated. nohz.has_blocked is set every time an idle CPU
can have blocked load, and it is cleared when no more blocked load has
been detected during an update. We don't need atomic operations, only
to make sure of the right ordering when updating nohz.idle_cpus_mask
and nohz.has_blocked.

Suggested-by: Peter Zijlstra (Intel)
Signed-off-by: Vincent Guittot
---
 kernel/sched/fair.c  | 94 +++++++++++++++++++++++++++++++++++++++++-----------
 kernel/sched/sched.h |  1 +
 2 files changed, 75 insertions(+), 20 deletions(-)

--
2.7.4

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7af1fa9..b9660b5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5383,8 +5383,9 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 static struct {
 	cpumask_var_t idle_cpus_mask;
 	atomic_t nr_cpus;
+	int has_blocked;		/* Idle CPUS has blocked load */
 	unsigned long next_balance;     /* in jiffy units */
-	unsigned long next_stats;
+	unsigned long next_blocked;	/* Next update of blocked load in jiffies */
 } nohz ____cacheline_aligned;
 
 #endif /* CONFIG_NO_HZ_COMMON */
@@ -6951,6 +6952,7 @@ enum fbq_type { regular, remote, all };
 #define LBF_DST_PINNED  0x04
 #define LBF_SOME_PINNED	0x08
 #define LBF_NOHZ_STATS	0x10
+#define LBF_NOHZ_AGAIN	0x20
 
 struct lb_env {
 	struct sched_domain	*sd;
@@ -7335,8 +7337,6 @@ static void attach_tasks(struct lb_env *env)
 	rq_unlock(env->dst_rq, &rf);
 }
 
-#ifdef CONFIG_FAIR_GROUP_SCHED
-
 static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 {
 	if (cfs_rq->load.weight)
@@ -7354,11 +7354,14 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 	return true;
 }
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
+
 static void update_blocked_averages(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 	struct cfs_rq *cfs_rq, *pos;
 	struct rq_flags rf;
+	bool done = true;
 
 	rq_lock_irqsave(rq, &rf);
 	update_rq_clock(rq);
@@ -7388,10 +7391,14 @@ static void update_blocked_averages(int cpu)
 		 */
 		if (cfs_rq_is_decayed(cfs_rq))
 			list_del_leaf_cfs_rq(cfs_rq);
+		else
+			done = false;
 	}
 
 #ifdef CONFIG_NO_HZ_COMMON
 	rq->last_blocked_load_update_tick = jiffies;
+	if (done)
+		rq->has_blocked_load = 0;
 #endif
 	rq_unlock_irqrestore(rq, &rf);
 }
@@ -7454,6 +7461,8 @@ static inline void update_blocked_averages(int cpu)
 	update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);
 #ifdef CONFIG_NO_HZ_COMMON
 	rq->last_blocked_load_update_tick = jiffies;
+	if (cfs_rq_is_decayed(cfs_rq))
+		rq->has_blocked_load = 0;
 #endif
 	rq_unlock_irqrestore(rq, &rf);
 }
@@ -7789,18 +7798,25 @@ group_type group_classify(struct sched_group *group,
 	return group_other;
 }
 
-static void update_nohz_stats(struct rq *rq)
+static bool update_nohz_stats(struct rq *rq)
 {
 #ifdef CONFIG_NO_HZ_COMMON
 	unsigned int cpu = rq->cpu;
 
+	if (!rq->has_blocked_load)
+		return false;
+
 	if (!cpumask_test_cpu(cpu, nohz.idle_cpus_mask))
-		return;
+		return false;
 
 	if (!time_after(jiffies, rq->last_blocked_load_update_tick))
-		return;
+		return true;
 
 	update_blocked_averages(cpu);
+
+	return rq->has_blocked_load;
+#else
+	return false;
 #endif
 }
 
@@ -7826,8 +7842,8 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
 		struct rq *rq = cpu_rq(i);
 
-		if (env->flags & LBF_NOHZ_STATS)
-			update_nohz_stats(rq);
+		if ((env->flags & LBF_NOHZ_STATS) && update_nohz_stats(rq))
+			env->flags |= LBF_NOHZ_AGAIN;
 
 		/* Bias balancing toward cpus of our domain */
 		if (local_group)
@@ -7979,18 +7995,15 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 	struct sg_lb_stats *local = &sds->local_stat;
 	struct sg_lb_stats tmp_sgs;
 	int load_idx, prefer_sibling = 0;
+	int has_blocked = READ_ONCE(nohz.has_blocked);
 	bool overload = false;
 
 	if (child && child->flags & SD_PREFER_SIBLING)
 		prefer_sibling = 1;
 
 #ifdef CONFIG_NO_HZ_COMMON
-	if (env->idle == CPU_NEWLY_IDLE) {
+	if (env->idle == CPU_NEWLY_IDLE && has_blocked)
 		env->flags |= LBF_NOHZ_STATS;
-
-		if (cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd)))
-			nohz.next_stats = jiffies + msecs_to_jiffies(LOAD_AVG_PERIOD);
-	}
 #endif
 
 	load_idx = get_sd_load_idx(env->sd, env->idle);
@@ -8046,6 +8059,15 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 		sg = sg->next;
 	} while (sg != env->sd->groups);
 
+#ifdef CONFIG_NO_HZ_COMMON
+	if ((env->flags & LBF_NOHZ_AGAIN) &&
+	    cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd))) {
+
+		WRITE_ONCE(nohz.next_blocked,
+				jiffies + msecs_to_jiffies(LOAD_AVG_PERIOD));
+	}
+#endif
+
 	if (env->sd->flags & SD_NUMA)
 		env->fbq_type = fbq_classify_group(&sds->busiest_stat);
 
@@ -9069,6 +9091,8 @@ static void nohz_balancer_kick(struct rq *rq)
 	struct sched_domain *sd;
 	int nr_busy, i, cpu = rq->cpu;
 	unsigned int flags = 0;
+	unsigned long has_blocked = READ_ONCE(nohz.has_blocked);
+	unsigned long next_blocked = READ_ONCE(nohz.next_blocked);
 
 	if (unlikely(rq->idle_balance))
 		return;
@@ -9086,7 +9110,7 @@ static void nohz_balancer_kick(struct rq *rq)
 	if (likely(!atomic_read(&nohz.nr_cpus)))
 		return;
 
-	if (time_after(now, nohz.next_stats))
+	if (time_after(now, next_blocked) && has_blocked)
 		flags = NOHZ_STATS_KICK;
 
 	if (time_before(now, nohz.next_balance))
@@ -9207,13 +9231,15 @@ void nohz_balance_enter_idle(int cpu)
 	if (!housekeeping_cpu(cpu, HK_FLAG_SCHED))
 		return;
 
+	rq->has_blocked_load = 1;
+
 	if (rq->nohz_tick_stopped)
-		return;
+		goto out;
 
 	/*
 	 * If we're a completely isolated CPU, we don't play.
 	 */
-	if (on_null_domain(cpu_rq(cpu)))
+	if (on_null_domain(rq))
 		return;
 
 	rq->nohz_tick_stopped = 1;
@@ -9222,6 +9248,13 @@ void nohz_balance_enter_idle(int cpu)
 	atomic_inc(&nohz.nr_cpus);
 
 	set_cpu_sd_state_idle(cpu);
+
+out:
+	/*
+	 * Each time a cpu enter idle, we assume that it has blocked load and
+	 * enable the periodic update of the load of idle cpus
+	 */
+	WRITE_ONCE(nohz.has_blocked, 1);
 }
 #else
 static inline void nohz_balancer_kick(struct rq *rq) { }
@@ -9374,6 +9407,16 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 
 	SCHED_WARN_ON((flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK);
 
+	/*
+	 * We assume there will be no idle load after this update and clear
+	 * the has_blocked flag. If a cpu enters idle in the mean time, it will
+	 * set the has_blocked flag and trig another update of idle load.
+	 * Because a cpu that becomes idle, is added to idle_cpus_mask before
+	 * setting the flag, we are sure to not clear the state and not
+	 * check the load of an idle cpu.
+	 */
+	WRITE_ONCE(nohz.has_blocked, 0);
+
 	for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
 		if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
 			continue;
@@ -9383,11 +9426,16 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 		 * work being done for other cpus. Next load
 		 * balancing owner will pick it up.
 		 */
-		if (need_resched())
-			break;
+		if (need_resched()) {
+			has_blocked_load = true;
+			goto abort;
+		}
 
 		rq = cpu_rq(balance_cpu);
 
+		update_blocked_averages(rq->cpu);
+		has_blocked_load |= rq->has_blocked_load;
+
 		/*
 		 * If time for next balance is due,
 		 * do the balance.
@@ -9400,7 +9448,6 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 			cpu_load_update_idle(rq);
 			rq_unlock_irq(rq, &rf);
 
-			update_blocked_averages(rq->cpu);
 			if (flags & NOHZ_BALANCE_KICK)
 				rebalance_domains(rq, CPU_IDLE);
 		}
@@ -9415,7 +9462,13 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 	if (flags & NOHZ_BALANCE_KICK)
 		rebalance_domains(this_rq, CPU_IDLE);
 
-	nohz.next_stats = next_stats;
+	WRITE_ONCE(nohz.next_blocked,
+		now + msecs_to_jiffies(LOAD_AVG_PERIOD));
+
+abort:
+	/* There is still blocked load, enable periodic update */
+	if (has_blocked_load)
+		WRITE_ONCE(nohz.has_blocked, 1);
 
 	/*
 	 * next_balance will be updated only when there is a need.
@@ -10046,6 +10099,7 @@ __init void init_sched_fair_class(void)
 
 #ifdef CONFIG_NO_HZ_COMMON
 	nohz.next_balance = jiffies;
+	nohz.next_blocked = jiffies;
 	zalloc_cpumask_var(&nohz.idle_cpus_mask, GFP_NOWAIT);
 #endif
 #endif /* SMP */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e200045..ad9b929 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -723,6 +723,7 @@ struct rq {
 #ifdef CONFIG_SMP
 	unsigned long last_load_update_tick;
 	unsigned long last_blocked_load_update_tick;
+	unsigned int has_blocked_load;
 #endif /* CONFIG_SMP */
 	unsigned int nohz_tick_stopped;
 	atomic_t nohz_flags;
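
For readers who want to follow the flag protocol outside the kernel tree, here is a minimal user-space sketch of it. It is not kernel code and is only meant as an illustration: struct rq_sim, enter_idle(), update_blocked(), idle_update_pass(), the NR_CPUS value and the halving decay rule are all assumptions made up for this note. Only the roles of nohz.has_blocked and rq->has_blocked_load are taken from the patch. The sketch is single-threaded, so plain loads and stores stand in for READ_ONCE()/WRITE_ONCE() and it says nothing about memory ordering between CPUs.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* arbitrary size for this sketch */

/* Hypothetical per-CPU state; only has_blocked_load mirrors a patch field. */
struct rq_sim {
	bool in_idle_mask;		/* stands in for a bit of nohz.idle_cpus_mask */
	unsigned int blocked_load;	/* made-up scalar replacing the load signals */
	unsigned int has_blocked_load;	/* same role as rq->has_blocked_load */
};

static struct rq_sim rqs[NR_CPUS];
static int nohz_has_blocked;		/* same role as nohz.has_blocked */

/* Like nohz_balance_enter_idle(): join the idle mask first, set the flag last. */
static void enter_idle(int cpu, unsigned int blocked_load)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	rqs[cpu].has_blocked_load = 1;
	rqs[cpu].blocked_load = blocked_load;
	rqs[cpu].in_idle_mask = true;
	nohz_has_blocked = 1;
}

/* Like update_blocked_averages(): decay, then drop the per-rq flag at zero. */
static void update_blocked(struct rq_sim *rq)
{
	rq->blocked_load /= 2;	/* halving is an arbitrary decay rule */
	if (!rq->blocked_load)
		rq->has_blocked_load = 0;
}

/*
 * Like the kick check followed by the loop in nohz_idle_balance().
 * Returns the number of idle CPUs visited, or 0 when no pass was needed.
 */
static int idle_update_pass(void)
{
	bool has_blocked_load = false;
	int cpu, visited = 0;

	if (!nohz_has_blocked)
		return 0;

	/* Clear first and assume that everything will be decayed. */
	nohz_has_blocked = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!rqs[cpu].in_idle_mask)
			continue;

		update_blocked(&rqs[cpu]);
		has_blocked_load |= rqs[cpu].has_blocked_load;
		visited++;
	}

	/* There is still blocked load: keep the periodic update enabled. */
	if (has_blocked_load)
		nohz_has_blocked = 1;

	return visited;
}
```

With this toy decay rule, a CPU entering idle with a blocked load of 4 needs three passes before nohz_has_blocked drops to 0. After that, idle_update_pass() returns 0 without touching any CPU until another CPU calls enter_idle(), which is the saving this patch is after.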