From patchwork Fri Oct 18 13:26:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vincent Guittot X-Patchwork-Id: 176831 Delivered-To: patch@linaro.org Received: from localhost.localdomain (91-160-61-128.subs.proxad.net.
[91.160.61.128]) by smtp.gmail.com with ESMTPSA id p15sm5870123wrs.94.2019.10.18.06.26.43 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Fri, 18 Oct 2019 06:26:44 -0700 (PDT) From: Vincent Guittot To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, parth@linux.ibm.com, riel@surriel.com, Vincent Guittot Subject: [PATCH v4 01/11] sched/fair: clean up asym packing Date: Fri, 18 Oct 2019 15:26:28 +0200 Message-Id: <1571405198-27570-2-git-send-email-vincent.guittot@linaro.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> References: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Clean up asym packing to follow the default load balance behavior: - classify the group by creating a group_asym_packing field. - calculate the imbalance in calculate_imbalance() instead of bypassing it. We don't need to test twice same conditions anymore to detect asym packing and we consolidate the calculation of imbalance in calculate_imbalance(). There is no functional changes. Signed-off-by: Vincent Guittot Acked-by: Rik van Riel --- kernel/sched/fair.c | 63 ++++++++++++++--------------------------------------- 1 file changed, 16 insertions(+), 47 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 1f0a5e1..617145c 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7675,6 +7675,7 @@ struct sg_lb_stats { unsigned int group_weight; enum group_type group_type; int group_no_capacity; + unsigned int group_asym_packing; /* Tasks should be moved to preferred CPU */ unsigned long group_misfit_task_load; /* A CPU has a task too big for its capacity */ #ifdef CONFIG_NUMA_BALANCING unsigned int nr_numa_running; @@ -8129,9 +8130,17 @@ static bool update_sd_pick_busiest(struct lb_env *env, * ASYM_PACKING needs to move all the work to the highest * prority CPUs in the group, therefore mark all groups * of lower priority than ourself as busy. + * + * This is primarily intended to used at the sibling level. Some + * cores like POWER7 prefer to use lower numbered SMT threads. In the + * case of POWER7, it can move to lower SMT modes only when higher + * threads are idle. When in lower SMT modes, the threads will + * perform better since they share less core resources. Hence when we + * have idle threads, we want them to be the higher ones. */ if (sgs->sum_nr_running && sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) { + sgs->group_asym_packing = 1; if (!sds->busiest) return true; @@ -8273,51 +8282,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd } /** - * check_asym_packing - Check to see if the group is packed into the - * sched domain. - * - * This is primarily intended to used at the sibling level. Some - * cores like POWER7 prefer to use lower numbered SMT threads. In the - * case of POWER7, it can move to lower SMT modes only when higher - * threads are idle. When in lower SMT modes, the threads will - * perform better since they share less core resources. Hence when we - * have idle threads, we want them to be the higher ones. - * - * This packing function is run on idle threads. 
It checks to see if - * the busiest CPU in this domain (core in the P7 case) has a higher - * CPU number than the packing function is being run on. Here we are - * assuming lower CPU number will be equivalent to lower a SMT thread - * number. - * - * Return: 1 when packing is required and a task should be moved to - * this CPU. The amount of the imbalance is returned in env->imbalance. - * - * @env: The load balancing environment. - * @sds: Statistics of the sched_domain which is to be packed - */ -static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds) -{ - int busiest_cpu; - - if (!(env->sd->flags & SD_ASYM_PACKING)) - return 0; - - if (env->idle == CPU_NOT_IDLE) - return 0; - - if (!sds->busiest) - return 0; - - busiest_cpu = sds->busiest->asym_prefer_cpu; - if (sched_asym_prefer(busiest_cpu, env->dst_cpu)) - return 0; - - env->imbalance = sds->busiest_stat.group_load; - - return 1; -} - -/** * fix_small_imbalance - Calculate the minor imbalance that exists * amongst the groups of a sched_domain, during * load balancing. @@ -8401,6 +8365,11 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s local = &sds->local_stat; busiest = &sds->busiest_stat; + if (busiest->group_asym_packing) { + env->imbalance = busiest->group_load; + return; + } + if (busiest->group_type == group_imbalanced) { /* * In the group_imb case we cannot rely on group-wide averages @@ -8505,8 +8474,8 @@ static struct sched_group *find_busiest_group(struct lb_env *env) busiest = &sds.busiest_stat; /* ASYM feature bypasses nice load balance check */ - if (check_asym_packing(env, &sds)) - return sds.busiest; + if (busiest->group_asym_packing) + goto force_balance; /* There is no busy sibling group to pull tasks from */ if (!sds.busiest || busiest->sum_nr_running == 0) From patchwork Fri Oct 18 13:26:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vincent Guittot X-Patchwork-Id: 176832 Delivered-To: patch@linaro.org Received: by 2002:a92:7e96:0:0:0:0:0 with SMTP id q22csp861414ill; Fri, 18 Oct 2019 06:26:58 -0700 (PDT) X-Google-Smtp-Source: APXvYqw0RZ88UHQL2ns2uR92UVzr0YSmAvjMiHSBr8JMn0+52yBTYgTurpVwh6JkppS0kEAUs320 X-Received: by 2002:a17:906:1342:: with SMTP id x2mr8607549ejb.304.1571405218794; Fri, 18 Oct 2019 06:26:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1571405218; cv=none; d=google.com; s=arc-20160816; b=c70wiX8MOYevGKCBvs+5hqEyNsH1bvoV1XyCt6PHTJyTNAdMN8YpuBryjmVY2tdNVF YPJ4xb2lvm3g9lAefmTez1D5AHMlK/sGzTe0pkrEW75Eclz3Oz9bnAAMX3km/2fzZaUw VP3fHggYlMKCMxIzz4lFNQI5r98xQsRWDRtBqZKqxzm0BPOaszcEeiiz2zAanZ9T2BHE AlNl6ygNPrAlZe30rHplE0he7iCv3p38v7aivJOM/I2zOOfHMONkrmqXO++lfEUrIVNs 3b6hFjFJTcqr9g1XsnWlqwyS2JP1QLYXrw5RDv2qOEfzsfy26Whkg8/zAkkIxzrl/OEW WH3Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature; bh=NQ/eLJSNkXW8LLvqvhxjiDQg9h5mHrnthUFHJ+Kv0ak=; b=cnqEDW18NXgGBdaH064mmF39wUK9a6YZPLLpQAXG5zNOLHyrCNppGw3g1K9XaK61My 9Y2at4zjJmXSFjoY7Kt0KTa3d4RJnQxVJue7h6/ZEVcRDzbV3LQhiIauP6ZIKDlJt/FH J/KrWgt7IzIs5EjnVNzdlms2r02INZJl46mM2BD7yDHM/uOUKzBjZ0+N6nE4NPFG3rP0 INm0Sa08yY+J0Q6b87kOXO7bJ3DH0CwriBUE02z144B/mlAa53iClvjpjWks+pZg1M5x Vb1pQu6K3WkfqUT8RYufFlCT+oe4jJCTQCTcO6swnC5TIetIR/yHBpwMldyHY5BqtpTl tHOg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=xLuEtlbm; spf=pass (google.com: best guess record for 
domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from localhost.localdomain (91-160-61-128.subs.proxad.net.
[91.160.61.128]) by smtp.gmail.com with ESMTPSA id p15sm5870123wrs.94.2019.10.18.06.26.45 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Fri, 18 Oct 2019 06:26:45 -0700 (PDT) From: Vincent Guittot To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, parth@linux.ibm.com, riel@surriel.com, Vincent Guittot Subject: [PATCH v4 02/11] sched/fair: rename sum_nr_running to sum_h_nr_running Date: Fri, 18 Oct 2019 15:26:29 +0200 Message-Id: <1571405198-27570-3-git-send-email-vincent.guittot@linaro.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> References: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Rename sum_nr_running to sum_h_nr_running because it effectively tracks cfs->h_nr_running so we can use sum_nr_running to track rq->nr_running when needed. There is no functional changes. Signed-off-by: Vincent Guittot Acked-by: Rik van Riel Reviewed-by: Valentin Schneider --- kernel/sched/fair.c | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) -- 2.7.4 Acked-by: Mel Gorman diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 617145c..9a2aceb 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7670,7 +7670,7 @@ struct sg_lb_stats { unsigned long load_per_task; unsigned long group_capacity; unsigned long group_util; /* Total utilization of the group */ - unsigned int sum_nr_running; /* Nr tasks running in the group */ + unsigned int sum_h_nr_running; /* Nr of CFS tasks running in the group */ unsigned int idle_cpus; unsigned int group_weight; enum group_type group_type; @@ -7715,7 +7715,7 @@ static inline void init_sd_lb_stats(struct sd_lb_stats *sds) .total_capacity = 0UL, .busiest_stat = { .avg_load = 0UL, - .sum_nr_running = 0, + .sum_h_nr_running = 0, .group_type = group_other, }, }; @@ -7906,7 +7906,7 @@ static inline int sg_imbalanced(struct sched_group *group) static inline bool group_has_capacity(struct lb_env *env, struct sg_lb_stats *sgs) { - if (sgs->sum_nr_running < sgs->group_weight) + if (sgs->sum_h_nr_running < sgs->group_weight) return true; if ((sgs->group_capacity * 100) > @@ -7927,7 +7927,7 @@ group_has_capacity(struct lb_env *env, struct sg_lb_stats *sgs) static inline bool group_is_overloaded(struct lb_env *env, struct sg_lb_stats *sgs) { - if (sgs->sum_nr_running <= sgs->group_weight) + if (sgs->sum_h_nr_running <= sgs->group_weight) return false; if ((sgs->group_capacity * 100) < @@ -8019,7 +8019,7 @@ static inline void update_sg_lb_stats(struct lb_env *env, sgs->group_load += cpu_runnable_load(rq); sgs->group_util += cpu_util(i); - sgs->sum_nr_running += rq->cfs.h_nr_running; + sgs->sum_h_nr_running += rq->cfs.h_nr_running; nr_running = rq->nr_running; if (nr_running > 1) @@ -8049,8 +8049,8 @@ static inline void update_sg_lb_stats(struct lb_env *env, sgs->group_capacity = group->sgc->capacity; sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity; - if (sgs->sum_nr_running) - sgs->load_per_task = sgs->group_load / sgs->sum_nr_running; + if (sgs->sum_h_nr_running) + sgs->load_per_task = sgs->group_load / sgs->sum_h_nr_running; sgs->group_weight = group->group_weight; @@ -8107,7 +8107,7 @@ static bool 
update_sd_pick_busiest(struct lb_env *env, * capable CPUs may harm throughput. Maximize throughput, * power/energy consequences are not considered. */ - if (sgs->sum_nr_running <= sgs->group_weight && + if (sgs->sum_h_nr_running <= sgs->group_weight && group_smaller_min_cpu_capacity(sds->local, sg)) return false; @@ -8138,7 +8138,7 @@ static bool update_sd_pick_busiest(struct lb_env *env, * perform better since they share less core resources. Hence when we * have idle threads, we want them to be the higher ones. */ - if (sgs->sum_nr_running && + if (sgs->sum_h_nr_running && sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) { sgs->group_asym_packing = 1; if (!sds->busiest) @@ -8156,9 +8156,9 @@ static bool update_sd_pick_busiest(struct lb_env *env, #ifdef CONFIG_NUMA_BALANCING static inline enum fbq_type fbq_classify_group(struct sg_lb_stats *sgs) { - if (sgs->sum_nr_running > sgs->nr_numa_running) + if (sgs->sum_h_nr_running > sgs->nr_numa_running) return regular; - if (sgs->sum_nr_running > sgs->nr_preferred_running) + if (sgs->sum_h_nr_running > sgs->nr_preferred_running) return remote; return all; } @@ -8233,7 +8233,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd */ if (prefer_sibling && sds->local && group_has_capacity(env, local) && - (sgs->sum_nr_running > local->sum_nr_running + 1)) { + (sgs->sum_h_nr_running > local->sum_h_nr_running + 1)) { sgs->group_no_capacity = 1; sgs->group_type = group_classify(sg, sgs); } @@ -8245,7 +8245,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd next_group: /* Now, start updating sd_lb_stats */ - sds->total_running += sgs->sum_nr_running; + sds->total_running += sgs->sum_h_nr_running; sds->total_load += sgs->group_load; sds->total_capacity += sgs->group_capacity; @@ -8299,7 +8299,7 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds) local = &sds->local_stat; busiest = &sds->busiest_stat; - if (!local->sum_nr_running) + if (!local->sum_h_nr_running) local->load_per_task = cpu_avg_load_per_task(env->dst_cpu); else if (busiest->load_per_task > local->load_per_task) imbn = 1; @@ -8397,7 +8397,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s */ if (busiest->group_type == group_overloaded && local->group_type == group_overloaded) { - load_above_capacity = busiest->sum_nr_running * SCHED_CAPACITY_SCALE; + load_above_capacity = busiest->sum_h_nr_running * SCHED_CAPACITY_SCALE; if (load_above_capacity > busiest->group_capacity) { load_above_capacity -= busiest->group_capacity; load_above_capacity *= scale_load_down(NICE_0_LOAD); @@ -8478,7 +8478,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env) goto force_balance; /* There is no busy sibling group to pull tasks from */ - if (!sds.busiest || busiest->sum_nr_running == 0) + if (!sds.busiest || busiest->sum_h_nr_running == 0) goto out_balanced; /* XXX broken for overlapping NUMA groups */ From patchwork Fri Oct 18 13:26:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vincent Guittot X-Patchwork-Id: 176834 Delivered-To: patch@linaro.org Received: by 2002:a92:7e96:0:0:0:0:0 with SMTP id q22csp861441ill; Fri, 18 Oct 2019 06:26:59 -0700 (PDT) X-Google-Smtp-Source: APXvYqxDUDWNyrRMLakaPH1rWE6SS6Td2H8DrzvJ4ZEEEeEna8SAB7/WsT8YGUWuacG81OwTxcwM X-Received: by 2002:a17:906:cc90:: with SMTP id oq16mr8482650ejb.322.1571405219746; Fri, 18 Oct 2019 06:26:59 -0700 (PDT) ARC-Seal: i=1; 
D0nZ86pd7B4nRXQd9ElobbJC8qn7qLDQD885VuL+KNIq41o1wPZN3uqikJLu7q7YNk8F bnreuN1An0jWlLtGNlzpVzE5MoFC/ed3rARbw8HbvKZ8FDkk9LcoahdrbKjlEIT/NUC8 aWlg== X-Gm-Message-State: APjAAAUrP9wx+/l4e5+UektKAadOONhDSfax5YZBaSht5hzYfsi2hGML d87iU1L0ZwsYZJ7oEs363cbUuR3x9fE= X-Received: by 2002:adf:de85:: with SMTP id w5mr7613678wrl.278.1571405209399; Fri, 18 Oct 2019 06:26:49 -0700 (PDT) Received: from localhost.localdomain (91-160-61-128.subs.proxad.net. [91.160.61.128]) by smtp.gmail.com with ESMTPSA id p15sm5870123wrs.94.2019.10.18.06.26.47 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Fri, 18 Oct 2019 06:26:48 -0700 (PDT) From: Vincent Guittot To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, parth@linux.ibm.com, riel@surriel.com, Vincent Guittot Subject: [PATCH v4 03/11] sched/fair: remove meaningless imbalance calculation Date: Fri, 18 Oct 2019 15:26:30 +0200 Message-Id: <1571405198-27570-4-git-send-email-vincent.guittot@linaro.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> References: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org clean up load_balance and remove meaningless calculation and fields before adding new algorithm. Signed-off-by: Vincent Guittot Acked-by: Rik van Riel --- kernel/sched/fair.c | 105 +--------------------------------------------------- 1 file changed, 1 insertion(+), 104 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 9a2aceb..e004841 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5390,18 +5390,6 @@ static unsigned long capacity_of(int cpu) return cpu_rq(cpu)->cpu_capacity; } -static unsigned long cpu_avg_load_per_task(int cpu) -{ - struct rq *rq = cpu_rq(cpu); - unsigned long nr_running = READ_ONCE(rq->cfs.h_nr_running); - unsigned long load_avg = cpu_runnable_load(rq); - - if (nr_running) - return load_avg / nr_running; - - return 0; -} - static void record_wakee(struct task_struct *p) { /* @@ -7667,7 +7655,6 @@ static unsigned long task_h_load(struct task_struct *p) struct sg_lb_stats { unsigned long avg_load; /*Avg load across the CPUs of the group */ unsigned long group_load; /* Total load over the CPUs of the group */ - unsigned long load_per_task; unsigned long group_capacity; unsigned long group_util; /* Total utilization of the group */ unsigned int sum_h_nr_running; /* Nr of CFS tasks running in the group */ @@ -8049,9 +8036,6 @@ static inline void update_sg_lb_stats(struct lb_env *env, sgs->group_capacity = group->sgc->capacity; sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity; - if (sgs->sum_h_nr_running) - sgs->load_per_task = sgs->group_load / sgs->sum_h_nr_running; - sgs->group_weight = group->group_weight; sgs->group_no_capacity = group_is_overloaded(env, sgs); @@ -8282,76 +8266,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd } /** - * fix_small_imbalance - Calculate the minor imbalance that exists - * amongst the groups of a sched_domain, during - * load balancing. - * @env: The load balancing environment. - * @sds: Statistics of the sched_domain whose imbalance is to be calculated. 
- */ -static inline -void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds) -{ - unsigned long tmp, capa_now = 0, capa_move = 0; - unsigned int imbn = 2; - unsigned long scaled_busy_load_per_task; - struct sg_lb_stats *local, *busiest; - - local = &sds->local_stat; - busiest = &sds->busiest_stat; - - if (!local->sum_h_nr_running) - local->load_per_task = cpu_avg_load_per_task(env->dst_cpu); - else if (busiest->load_per_task > local->load_per_task) - imbn = 1; - - scaled_busy_load_per_task = - (busiest->load_per_task * SCHED_CAPACITY_SCALE) / - busiest->group_capacity; - - if (busiest->avg_load + scaled_busy_load_per_task >= - local->avg_load + (scaled_busy_load_per_task * imbn)) { - env->imbalance = busiest->load_per_task; - return; - } - - /* - * OK, we don't have enough imbalance to justify moving tasks, - * however we may be able to increase total CPU capacity used by - * moving them. - */ - - capa_now += busiest->group_capacity * - min(busiest->load_per_task, busiest->avg_load); - capa_now += local->group_capacity * - min(local->load_per_task, local->avg_load); - capa_now /= SCHED_CAPACITY_SCALE; - - /* Amount of load we'd subtract */ - if (busiest->avg_load > scaled_busy_load_per_task) { - capa_move += busiest->group_capacity * - min(busiest->load_per_task, - busiest->avg_load - scaled_busy_load_per_task); - } - - /* Amount of load we'd add */ - if (busiest->avg_load * busiest->group_capacity < - busiest->load_per_task * SCHED_CAPACITY_SCALE) { - tmp = (busiest->avg_load * busiest->group_capacity) / - local->group_capacity; - } else { - tmp = (busiest->load_per_task * SCHED_CAPACITY_SCALE) / - local->group_capacity; - } - capa_move += local->group_capacity * - min(local->load_per_task, local->avg_load + tmp); - capa_move /= SCHED_CAPACITY_SCALE; - - /* Move if we gain throughput */ - if (capa_move > capa_now) - env->imbalance = busiest->load_per_task; -} - -/** * calculate_imbalance - Calculate the amount of imbalance present within the * groups of a given sched_domain during load balance. * @env: load balance environment @@ -8370,15 +8284,6 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s return; } - if (busiest->group_type == group_imbalanced) { - /* - * In the group_imb case we cannot rely on group-wide averages - * to ensure CPU-load equilibrium, look at wider averages. 
XXX - */ - busiest->load_per_task = - min(busiest->load_per_task, sds->avg_load); - } - /* * Avg load of busiest sg can be less and avg load of local sg can * be greater than avg load across all sgs of sd because avg load @@ -8389,7 +8294,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s (busiest->avg_load <= sds->avg_load || local->avg_load >= sds->avg_load)) { env->imbalance = 0; - return fix_small_imbalance(env, sds); + return; } /* @@ -8427,14 +8332,6 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s busiest->group_misfit_task_load); } - /* - * if *imbalance is less than the average load per runnable task - * there is no guarantee that any tasks will be moved so we'll have - * a think about bumping its value to force at least one task to be - * moved - */ - if (env->imbalance < busiest->load_per_task) - return fix_small_imbalance(env, sds); } /******* find_busiest_group() helpers end here *********************/ From patchwork Fri Oct 18 13:26:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vincent Guittot X-Patchwork-Id: 176841 Delivered-To: patch@linaro.org Received: by 2002:a92:7e96:0:0:0:0:0 with SMTP id q22csp862022ill; Fri, 18 Oct 2019 06:27:28 -0700 (PDT) X-Google-Smtp-Source: APXvYqwMwVKaQm5OkCoNCBcG9ZMK7+dnIx4tZVI0oinXsMD/g+N+NXLcUQzDWMy+wmLxqXXNQKKa X-Received: by 2002:a17:906:3016:: with SMTP id 22mr8357181ejz.227.1571405247828; Fri, 18 Oct 2019 06:27:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1571405247; cv=none; d=google.com; s=arc-20160816; b=lRTYx/im8rtcqh7LT5ixVrMBK6scACEJIKFnnqmUUSpBlm+zN3Og1Fwk4J/hmE8L0k okfkuLtti85TYpzX6whSPUF/gcARsmt96376ZeDgjIoyElHvpUmpWt7xhxCAiJBhhX5D GdScyH4Q2TK+Fkv/DD5J4dgzrZstUjMsGVhbzLZmpCv2DsBf26nudgbUwmQOQrnFuRk4 BQEnxKgP0+z3drttk6DzXo+8Z8yvtDVs1UdawcEGDAImpI+t67PqMFg2agzNvM7YD13k dm/jHnhOaio4NR5zD0b3KsuNoBABzGPXpdvj7kRYNC7T9l0QbeImpz80QAIhDIqD8psG iJlw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature; bh=DD+OOZJusUQze5RChHgFxDyxwgAbUDvFr0pebvimYqI=; b=ZmwPvMjUHKRiLoATSpG4uEdxCAP1B9jllXjJqVxw2VRWvMtLpxZc82IYFw6n/UTNm9 vkkyIaFABuFi76o7XQCP1pDFYN5mtzF/NWqwUkrqKhTD9DxIC3DPtDq7Lt+Cbggvxtzy 6jGsFmy1OhqBBS/pB20kIZ5FQ4wdxao/ESPw6dVg67+MBM6kJDiCMhMcuPZI9KXcue05 cX4Nb+qd7rU4XC9eOP8i1eOO0lQX3wg29A02erTi45pLguMiRc2VJgjfn6JqmWEbD+dj yBsLmvu6Sl0PVAlkyRiizSozhDW8m9+YgwwZMzcIBRNtTpgeP3CRavn6hv3/un8Xy2FO d05g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b="IE5Dm6k/"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. 
[209.132.180.67]) by mx.google.com with ESMTP id q10si3850345eda.293.2019.10.18.06.27.27; Fri, 18 Oct 2019 06:27:27 -0700 (PDT) Received: from localhost.localdomain (91-160-61-128.subs.proxad.net.
[91.160.61.128]) by smtp.gmail.com with ESMTPSA id p15sm5870123wrs.94.2019.10.18.06.26.49 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Fri, 18 Oct 2019 06:26:49 -0700 (PDT) From: Vincent Guittot To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, parth@linux.ibm.com, riel@surriel.com, Vincent Guittot Subject: [PATCH v4 04/11] sched/fair: rework load_balance Date: Fri, 18 Oct 2019 15:26:31 +0200 Message-Id: <1571405198-27570-5-git-send-email-vincent.guittot@linaro.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> References: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The load_balance algorithm contains some heuristics which have become meaningless since the rework of the scheduler's metrics like the introduction of PELT. Furthermore, load is an ill-suited metric for solving certain task placement imbalance scenarios. For instance, in the presence of idle CPUs, we should simply try to get at least one task per CPU, whereas the current load-based algorithm can actually leave idle CPUs alone simply because the load is somewhat balanced. The current algorithm ends up creating virtual and meaningless value like the avg_load_per_task or tweaks the state of a group to make it overloaded whereas it's not, in order to try to migrate tasks. load_balance should better qualify the imbalance of the group and clearly define what has to be moved to fix this imbalance. The type of sched_group has been extended to better reflect the type of imbalance. We now have : group_has_spare group_fully_busy group_misfit_task group_asym_packing group_imbalanced group_overloaded Based on the type of sched_group, load_balance now sets what it wants to move in order to fix the imbalance. It can be some load as before but also some utilization, a number of task or a type of task: migrate_task migrate_util migrate_load migrate_misfit This new load_balance algorithm fixes several pending wrong tasks placement: - the 1 task per CPU case with asymmetric system - the case of cfs task preempted by other class - the case of tasks not evenly spread on groups with spare capacity Also the load balance decisions have been consolidated in the 3 functions below after removing the few bypasses and hacks of the current code: - update_sd_pick_busiest() select the busiest sched_group. - find_busiest_group() checks if there is an imbalance between local and busiest group. - calculate_imbalance() decides what have to be moved. Finally, the now unused field total_running of struct sd_lb_stats has been removed. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 611 ++++++++++++++++++++++++++++++++++------------------ 1 file changed, 402 insertions(+), 209 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index e004841..5ae5281 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7068,11 +7068,26 @@ static unsigned long __read_mostly max_load_balance_interval = HZ/10; enum fbq_type { regular, remote, all }; +/* + * group_type describes the group of CPUs at the moment of the load balance. 
+ * The enum is ordered by pulling priority, with the group with lowest priority + * first so the groupe_type can be simply compared when selecting the busiest + * group. see update_sd_pick_busiest(). + */ enum group_type { - group_other = 0, + group_has_spare = 0, + group_fully_busy, group_misfit_task, + group_asym_packing, group_imbalanced, - group_overloaded, + group_overloaded +}; + +enum migration_type { + migrate_load = 0, + migrate_util, + migrate_task, + migrate_misfit }; #define LBF_ALL_PINNED 0x01 @@ -7105,7 +7120,7 @@ struct lb_env { unsigned int loop_max; enum fbq_type fbq_type; - enum group_type src_grp_type; + enum migration_type migration_type; struct list_head tasks; }; @@ -7328,7 +7343,7 @@ static struct task_struct *detach_one_task(struct lb_env *env) static const unsigned int sched_nr_migrate_break = 32; /* - * detach_tasks() -- tries to detach up to imbalance runnable load from + * detach_tasks() -- tries to detach up to imbalance load/util/tasks from * busiest_rq, as part of a balancing operation within domain "sd". * * Returns number of detached tasks if successful and 0 otherwise. @@ -7336,8 +7351,8 @@ static const unsigned int sched_nr_migrate_break = 32; static int detach_tasks(struct lb_env *env) { struct list_head *tasks = &env->src_rq->cfs_tasks; + unsigned long util, load; struct task_struct *p; - unsigned long load; int detached = 0; lockdep_assert_held(&env->src_rq->lock); @@ -7370,19 +7385,51 @@ static int detach_tasks(struct lb_env *env) if (!can_migrate_task(p, env)) goto next; - load = task_h_load(p); + switch (env->migration_type) { + case migrate_load: + load = task_h_load(p); - if (sched_feat(LB_MIN) && load < 16 && !env->sd->nr_balance_failed) - goto next; + if (sched_feat(LB_MIN) && + load < 16 && !env->sd->nr_balance_failed) + goto next; - if ((load / 2) > env->imbalance) - goto next; + if ((load / 2) > env->imbalance) + goto next; + + env->imbalance -= load; + break; + + case migrate_util: + util = task_util_est(p); + + if (util > env->imbalance) + goto next; + + env->imbalance -= util; + break; + + case migrate_task: + env->imbalance--; + break; + + case migrate_misfit: + load = task_h_load(p); + + /* + * load of misfit task might decrease a bit since it has + * been recorded. Be conservative in the condition. + */ + if (load / 2 < env->imbalance) + goto next; + + env->imbalance = 0; + break; + } detach_task(p, env); list_add(&p->se.group_node, &env->tasks); detached++; - env->imbalance -= load; #ifdef CONFIG_PREEMPTION /* @@ -7396,7 +7443,7 @@ static int detach_tasks(struct lb_env *env) /* * We only want to steal up to the prescribed amount of - * runnable load. + * load/util/tasks. 
*/ if (env->imbalance <= 0) break; @@ -7661,7 +7708,6 @@ struct sg_lb_stats { unsigned int idle_cpus; unsigned int group_weight; enum group_type group_type; - int group_no_capacity; unsigned int group_asym_packing; /* Tasks should be moved to preferred CPU */ unsigned long group_misfit_task_load; /* A CPU has a task too big for its capacity */ #ifdef CONFIG_NUMA_BALANCING @@ -7677,10 +7723,10 @@ struct sg_lb_stats { struct sd_lb_stats { struct sched_group *busiest; /* Busiest group in this sd */ struct sched_group *local; /* Local group in this sd */ - unsigned long total_running; unsigned long total_load; /* Total load of all groups in sd */ unsigned long total_capacity; /* Total capacity of all groups in sd */ unsigned long avg_load; /* Average load across all groups in sd */ + unsigned int prefer_sibling; /* tasks should go to sibling first */ struct sg_lb_stats busiest_stat;/* Statistics of the busiest group */ struct sg_lb_stats local_stat; /* Statistics of the local group */ @@ -7691,19 +7737,18 @@ static inline void init_sd_lb_stats(struct sd_lb_stats *sds) /* * Skimp on the clearing to avoid duplicate work. We can avoid clearing * local_stat because update_sg_lb_stats() does a full clear/assignment. - * We must however clear busiest_stat::avg_load because - * update_sd_pick_busiest() reads this before assignment. + * We must however set busiest_stat::group_type and + * busiest_stat::idle_cpus to the worst busiest group because + * update_sd_pick_busiest() reads these before assignment. */ *sds = (struct sd_lb_stats){ .busiest = NULL, .local = NULL, - .total_running = 0UL, .total_load = 0UL, .total_capacity = 0UL, .busiest_stat = { - .avg_load = 0UL, - .sum_h_nr_running = 0, - .group_type = group_other, + .idle_cpus = UINT_MAX, + .group_type = group_has_spare, }, }; } @@ -7945,19 +7990,26 @@ group_smaller_max_cpu_capacity(struct sched_group *sg, struct sched_group *ref) } static inline enum -group_type group_classify(struct sched_group *group, +group_type group_classify(struct lb_env *env, + struct sched_group *group, struct sg_lb_stats *sgs) { - if (sgs->group_no_capacity) + if (group_is_overloaded(env, sgs)) return group_overloaded; if (sg_imbalanced(group)) return group_imbalanced; + if (sgs->group_asym_packing) + return group_asym_packing; + if (sgs->group_misfit_task_load) return group_misfit_task; - return group_other; + if (!group_has_capacity(env, sgs)) + return group_fully_busy; + + return group_has_spare; } static bool update_nohz_stats(struct rq *rq, bool force) @@ -7994,10 +8046,12 @@ static inline void update_sg_lb_stats(struct lb_env *env, struct sg_lb_stats *sgs, int *sg_status) { - int i, nr_running; + int i, nr_running, local_group; memset(sgs, 0, sizeof(*sgs)); + local_group = cpumask_test_cpu(env->dst_cpu, sched_group_span(group)); + for_each_cpu_and(i, sched_group_span(group), env->cpus) { struct rq *rq = cpu_rq(i); @@ -8022,9 +8076,16 @@ static inline void update_sg_lb_stats(struct lb_env *env, /* * No need to call idle_cpu() if nr_running is not 0 */ - if (!nr_running && idle_cpu(i)) + if (!nr_running && idle_cpu(i)) { sgs->idle_cpus++; + /* Idle cpu can't have misfit task */ + continue; + } + + if (local_group) + continue; + /* Check for a misfit task on the cpu */ if (env->sd->flags & SD_ASYM_CPUCAPACITY && sgs->group_misfit_task_load < rq->misfit_task_load) { sgs->group_misfit_task_load = rq->misfit_task_load; @@ -8032,14 +8093,24 @@ static inline void update_sg_lb_stats(struct lb_env *env, } } - /* Adjust by relative CPU capacity of the group */ + /* Check 
if dst cpu is idle and preferred to this group */ + if (env->sd->flags & SD_ASYM_PACKING && + env->idle != CPU_NOT_IDLE && + sgs->sum_h_nr_running && + sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) { + sgs->group_asym_packing = 1; + } + sgs->group_capacity = group->sgc->capacity; - sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity; sgs->group_weight = group->group_weight; - sgs->group_no_capacity = group_is_overloaded(env, sgs); - sgs->group_type = group_classify(group, sgs); + sgs->group_type = group_classify(env, group, sgs); + + /* Computing avg_load makes sense only when group is overloaded */ + if (sgs->group_type == group_overloaded) + sgs->avg_load = (sgs->group_load * SCHED_CAPACITY_SCALE) / + sgs->group_capacity; } /** @@ -8062,6 +8133,10 @@ static bool update_sd_pick_busiest(struct lb_env *env, { struct sg_lb_stats *busiest = &sds->busiest_stat; + /* Make sure that there is at least one task to pull */ + if (!sgs->sum_h_nr_running) + return false; + /* * Don't try to pull misfit tasks we can't help. * We can use max_capacity here as reduction in capacity on some @@ -8070,7 +8145,7 @@ static bool update_sd_pick_busiest(struct lb_env *env, */ if (sgs->group_type == group_misfit_task && (!group_smaller_max_cpu_capacity(sg, sds->local) || - !group_has_capacity(env, &sds->local_stat))) + sds->local_stat.group_type != group_has_spare)) return false; if (sgs->group_type > busiest->group_type) @@ -8079,62 +8154,80 @@ static bool update_sd_pick_busiest(struct lb_env *env, if (sgs->group_type < busiest->group_type) return false; - if (sgs->avg_load <= busiest->avg_load) - return false; - - if (!(env->sd->flags & SD_ASYM_CPUCAPACITY)) - goto asym_packing; - /* - * Candidate sg has no more than one task per CPU and - * has higher per-CPU capacity. Migrating tasks to less - * capable CPUs may harm throughput. Maximize throughput, - * power/energy consequences are not considered. + * The candidate and the current busiest group are the same type of + * group. Let check which one is the busiest according to the type. */ - if (sgs->sum_h_nr_running <= sgs->group_weight && - group_smaller_min_cpu_capacity(sds->local, sg)) - return false; - /* - * If we have more than one misfit sg go with the biggest misfit. - */ - if (sgs->group_type == group_misfit_task && - sgs->group_misfit_task_load < busiest->group_misfit_task_load) + switch (sgs->group_type) { + case group_overloaded: + /* Select the overloaded group with highest avg_load. */ + if (sgs->avg_load <= busiest->avg_load) + return false; + break; + + case group_imbalanced: + /* + * Select the 1st imbalanced group as we don't have any way to + * choose one more than another. + */ return false; -asym_packing: - /* This is the busiest node in its class. */ - if (!(env->sd->flags & SD_ASYM_PACKING)) - return true; + case group_asym_packing: + /* Prefer to move from lowest priority CPU's work */ + if (sched_asym_prefer(sg->asym_prefer_cpu, sds->busiest->asym_prefer_cpu)) + return false; + break; - /* No ASYM_PACKING if target CPU is already busy */ - if (env->idle == CPU_NOT_IDLE) - return true; - /* - * ASYM_PACKING needs to move all the work to the highest - * prority CPUs in the group, therefore mark all groups - * of lower priority than ourself as busy. - * - * This is primarily intended to used at the sibling level. Some - * cores like POWER7 prefer to use lower numbered SMT threads. In the - * case of POWER7, it can move to lower SMT modes only when higher - * threads are idle. 
When in lower SMT modes, the threads will - * perform better since they share less core resources. Hence when we - * have idle threads, we want them to be the higher ones. - */ - if (sgs->sum_h_nr_running && - sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) { - sgs->group_asym_packing = 1; - if (!sds->busiest) - return true; + case group_misfit_task: + /* + * If we have more than one misfit sg go with the biggest + * misfit. + */ + if (sgs->group_misfit_task_load < busiest->group_misfit_task_load) + return false; + break; - /* Prefer to move from lowest priority CPU's work */ - if (sched_asym_prefer(sds->busiest->asym_prefer_cpu, - sg->asym_prefer_cpu)) - return true; + case group_fully_busy: + /* + * Select the fully busy group with highest avg_load. In + * theory, there is no need to pull task from such kind of + * group because tasks have all compute capacity that they need + * but we can still improve the overall throughput by reducing + * contention when accessing shared HW resources. + * + * XXX for now avg_load is not computed and always 0 so we + * select the 1st one. + */ + if (sgs->avg_load <= busiest->avg_load) + return false; + break; + + case group_has_spare: + /* + * Select not overloaded group with lowest number of + * idle cpus. We could also compare the spare capacity + * which is more stable but it can end up that the + * group has less spare capacity but finally more idle + * cpus which means less opportunity to pull tasks. + */ + if (sgs->idle_cpus >= busiest->idle_cpus) + return false; + break; } - return false; + /* + * Candidate sg has no more than one task per CPU and has higher + * per-CPU capacity. Migrating tasks to less capable CPUs may harm + * throughput. Maximize throughput, power/energy consequences are not + * considered. + */ + if ((env->sd->flags & SD_ASYM_CPUCAPACITY) && + (sgs->group_type <= group_fully_busy) && + (group_smaller_min_cpu_capacity(sds->local, sg))) + return false; + + return true; } #ifdef CONFIG_NUMA_BALANCING @@ -8172,13 +8265,13 @@ static inline enum fbq_type fbq_classify_rq(struct rq *rq) * @env: The load balancing environment. * @sds: variable to hold the statistics for this sched_domain. */ + static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sds) { struct sched_domain *child = env->sd->child; struct sched_group *sg = env->sd->groups; struct sg_lb_stats *local = &sds->local_stat; struct sg_lb_stats tmp_sgs; - bool prefer_sibling = child && child->flags & SD_PREFER_SIBLING; int sg_status = 0; #ifdef CONFIG_NO_HZ_COMMON @@ -8205,22 +8298,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd if (local_group) goto next_group; - /* - * In case the child domain prefers tasks go to siblings - * first, lower the sg capacity so that we'll try - * and move all the excess tasks away. We lower the capacity - * of a group only if the local group has the capacity to fit - * these excess tasks. The extra check prevents the case where - * you always pull from the heaviest group when it is already - * under-utilized (possible with a large weight task outweighs - * the tasks on the system). 
- */ - if (prefer_sibling && sds->local && - group_has_capacity(env, local) && - (sgs->sum_h_nr_running > local->sum_h_nr_running + 1)) { - sgs->group_no_capacity = 1; - sgs->group_type = group_classify(sg, sgs); - } if (update_sd_pick_busiest(env, sds, sg, sgs)) { sds->busiest = sg; @@ -8229,13 +8306,15 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd next_group: /* Now, start updating sd_lb_stats */ - sds->total_running += sgs->sum_h_nr_running; sds->total_load += sgs->group_load; sds->total_capacity += sgs->group_capacity; sg = sg->next; } while (sg != env->sd->groups); + /* Tag domain that child domain prefers tasks go to siblings first */ + sds->prefer_sibling = child && child->flags & SD_PREFER_SIBLING; + #ifdef CONFIG_NO_HZ_COMMON if ((env->flags & LBF_NOHZ_AGAIN) && cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd))) { @@ -8273,69 +8352,149 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd */ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *sds) { - unsigned long max_pull, load_above_capacity = ~0UL; struct sg_lb_stats *local, *busiest; local = &sds->local_stat; busiest = &sds->busiest_stat; - if (busiest->group_asym_packing) { - env->imbalance = busiest->group_load; + if (busiest->group_type == group_misfit_task) { + /* Set imbalance to allow misfit task to be balanced. */ + env->migration_type = migrate_misfit; + env->imbalance = busiest->group_misfit_task_load; + return; + } + + if (busiest->group_type == group_asym_packing) { + /* + * In case of asym capacity, we will try to migrate all load to + * the preferred CPU. + */ + env->migration_type = migrate_task; + env->imbalance = busiest->sum_h_nr_running; + return; + } + + if (busiest->group_type == group_imbalanced) { + /* + * In the group_imb case we cannot rely on group-wide averages + * to ensure CPU-load equilibrium, try to move any task to fix + * the imbalance. The next load balance will take care of + * balancing back the system. + */ + env->migration_type = migrate_task; + env->imbalance = 1; return; } /* - * Avg load of busiest sg can be less and avg load of local sg can - * be greater than avg load across all sgs of sd because avg load - * factors in sg capacity and sgs with smaller group_type are - * skipped when updating the busiest sg: + * Try to use spare capacity of local group without overloading it or + * emptying busiest */ - if (busiest->group_type != group_misfit_task && - (busiest->avg_load <= sds->avg_load || - local->avg_load >= sds->avg_load)) { - env->imbalance = 0; + if (local->group_type == group_has_spare) { + if (busiest->group_type > group_fully_busy) { + /* + * If busiest is overloaded, try to fill spare + * capacity. This might end up creating spare capacity + * in busiest or busiest still being overloaded but + * there is no simple way to directly compute the + * amount of load to migrate in order to balance the + * system. + */ + env->migration_type = migrate_util; + env->imbalance = max(local->group_capacity, local->group_util) - + local->group_util; + + /* + * In some case, the group's utilization is max or even + * higher than capacity because of migrations but the + * local CPU is (newly) idle. There is at least one + * waiting task in this overloaded busiest group. Let + * try to pull it. 
+ */ + if (env->idle != CPU_NOT_IDLE && env->imbalance == 0) { + env->migration_type = migrate_task; + env->imbalance = 1; + } + + return; + } + + if (busiest->group_weight == 1 || sds->prefer_sibling) { + unsigned int nr_diff = busiest->sum_h_nr_running; + /* + * When prefer sibling, evenly spread running tasks on + * groups. + */ + env->migration_type = migrate_task; + lsub_positive(&nr_diff, local->sum_h_nr_running); + env->imbalance = nr_diff >> 1; + return; + } + + /* + * If there is no overload, we just want to even the number of + * idle cpus. + */ + env->migration_type = migrate_task; + env->imbalance = max_t(long, 0, (local->idle_cpus - + busiest->idle_cpus) >> 1); return; } /* - * If there aren't any idle CPUs, avoid creating some. + * Local is fully busy but has to take more load to relieve the + * busiest group */ - if (busiest->group_type == group_overloaded && - local->group_type == group_overloaded) { - load_above_capacity = busiest->sum_h_nr_running * SCHED_CAPACITY_SCALE; - if (load_above_capacity > busiest->group_capacity) { - load_above_capacity -= busiest->group_capacity; - load_above_capacity *= scale_load_down(NICE_0_LOAD); - load_above_capacity /= busiest->group_capacity; - } else - load_above_capacity = ~0UL; + if (local->group_type < group_overloaded) { + /* + * Local will become overloaded so the avg_load metrics are + * finally needed. + */ + + local->avg_load = (local->group_load * SCHED_CAPACITY_SCALE) / + local->group_capacity; + + sds->avg_load = (sds->total_load * SCHED_CAPACITY_SCALE) / + sds->total_capacity; } /* - * We're trying to get all the CPUs to the average_load, so we don't - * want to push ourselves above the average load, nor do we wish to - * reduce the max loaded CPU below the average load. At the same time, - * we also don't want to reduce the group load below the group - * capacity. Thus we look for the minimum possible imbalance. + * Both group are or will become overloaded and we're trying to get all + * the CPUs to the average_load, so we don't want to push ourselves + * above the average load, nor do we wish to reduce the max loaded CPU + * below the average load. At the same time, we also don't want to + * reduce the group load below the group capacity. Thus we look for + * the minimum possible imbalance. */ - max_pull = min(busiest->avg_load - sds->avg_load, load_above_capacity); - - /* How much load to actually move to equalise the imbalance */ + env->migration_type = migrate_load; env->imbalance = min( - max_pull * busiest->group_capacity, + (busiest->avg_load - sds->avg_load) * busiest->group_capacity, (sds->avg_load - local->avg_load) * local->group_capacity ) / SCHED_CAPACITY_SCALE; - - /* Boost imbalance to allow misfit task to be balanced. */ - if (busiest->group_type == group_misfit_task) { - env->imbalance = max_t(long, env->imbalance, - busiest->group_misfit_task_load); - } - } /******* find_busiest_group() helpers end here *********************/ +/* + * Decision matrix according to the local and busiest group type + * + * busiest \ local has_spare fully_busy misfit asym imbalanced overloaded + * has_spare nr_idle balanced N/A N/A balanced balanced + * fully_busy nr_idle nr_idle N/A N/A balanced balanced + * misfit_task force N/A N/A N/A force force + * asym_packing force force N/A N/A force force + * imbalanced force force N/A N/A force force + * overloaded force force N/A N/A force avg_load + * + * N/A : Not Applicable because already filtered while updating + * statistics. 
+ * balanced : The system is balanced for these 2 groups. + * force : Calculate the imbalance as load migration is probably needed. + * avg_load : Only if imbalance is significant enough. + * nr_idle : dst_cpu is not busy and the number of idle cpus is quite + * different in groups. + */ + /** * find_busiest_group - Returns the busiest group within the sched_domain * if there is an imbalance. @@ -8370,17 +8529,17 @@ static struct sched_group *find_busiest_group(struct lb_env *env) local = &sds.local_stat; busiest = &sds.busiest_stat; - /* ASYM feature bypasses nice load balance check */ - if (busiest->group_asym_packing) - goto force_balance; - /* There is no busy sibling group to pull tasks from */ - if (!sds.busiest || busiest->sum_h_nr_running == 0) + if (!sds.busiest) goto out_balanced; - /* XXX broken for overlapping NUMA groups */ - sds.avg_load = (SCHED_CAPACITY_SCALE * sds.total_load) - / sds.total_capacity; + /* Misfit tasks should be dealt with regardless of the avg load */ + if (busiest->group_type == group_misfit_task) + goto force_balance; + + /* ASYM feature bypasses nice load balance check */ + if (busiest->group_type == group_asym_packing) + goto force_balance; /* * If the busiest group is imbalanced the below checks don't @@ -8391,55 +8550,64 @@ static struct sched_group *find_busiest_group(struct lb_env *env) goto force_balance; /* - * When dst_cpu is idle, prevent SMP nice and/or asymmetric group - * capacities from resulting in underutilization due to avg_load. - */ - if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) && - busiest->group_no_capacity) - goto force_balance; - - /* Misfit tasks should be dealt with regardless of the avg load */ - if (busiest->group_type == group_misfit_task) - goto force_balance; - - /* * If the local group is busier than the selected busiest group * don't try and pull any tasks. */ - if (local->avg_load >= busiest->avg_load) + if (local->group_type > busiest->group_type) goto out_balanced; /* - * Don't pull any tasks if this group is already above the domain - * average load. + * When groups are overloaded, use the avg_load to ensure fairness + * between tasks. */ - if (local->avg_load >= sds.avg_load) - goto out_balanced; + if (local->group_type == group_overloaded) { + /* + * If the local group is more loaded than the selected + * busiest group don't try and pull any tasks. + */ + if (local->avg_load >= busiest->avg_load) + goto out_balanced; + + /* XXX broken for overlapping NUMA groups */ + sds.avg_load = (sds.total_load * SCHED_CAPACITY_SCALE) / + sds.total_capacity; - if (env->idle == CPU_IDLE) { /* - * This CPU is idle. If the busiest group is not overloaded - * and there is no imbalance between this and busiest group - * wrt idle CPUs, it is balanced. The imbalance becomes - * significant if the diff is greater than 1 otherwise we - * might end up to just move the imbalance on another group + * Don't pull any tasks if this group is already above the + * domain average load. */ - if ((busiest->group_type != group_overloaded) && - (local->idle_cpus <= (busiest->idle_cpus + 1))) + if (local->avg_load >= sds.avg_load) goto out_balanced; - } else { + /* - * In the CPU_NEWLY_IDLE, CPU_NOT_IDLE cases, use - * imbalance_pct to be conservative. + * If the busiest group is more loaded, use imbalance_pct to be + * conservative. 
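As a rough standalone sketch of the imbalance_pct guard applied just below (the value 117 is assumed here as the usual default for non-SMT domains and is not stated in this patch), the check reads as: the busiest group must carry more than imbalance_pct percent of the local group's average load before balancing is attempted:

#include <stdbool.h>
#include <stdio.h>

/*
 * Illustrative version of:
 *     100 * busiest->avg_load <= imbalance_pct * local->avg_load
 * With an assumed imbalance_pct of 117, busiest must be more than
 * 17% heavier than local, on average load, to be worth balancing.
 */
static bool worth_balancing(unsigned long busiest_avg_load,
			    unsigned long local_avg_load,
			    unsigned int imbalance_pct)
{
	return 100 * busiest_avg_load > imbalance_pct * local_avg_load;
}

int main(void)
{
	/* 15% heavier than local: stays "balanced" with a 117 threshold */
	printf("%d\n", worth_balancing(1150, 1000, 117));
	/* 20% heavier than local: clears the threshold */
	printf("%d\n", worth_balancing(1200, 1000, 117));

	return 0;
}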
*/ if (100 * busiest->avg_load <= env->sd->imbalance_pct * local->avg_load) goto out_balanced; } + /* Try to move all excess tasks to child's sibling domain */ + if (sds.prefer_sibling && local->group_type == group_has_spare && + busiest->sum_h_nr_running > local->sum_h_nr_running + 1) + goto force_balance; + + if (busiest->group_type != group_overloaded && + (env->idle == CPU_NOT_IDLE || + local->idle_cpus <= (busiest->idle_cpus + 1))) + /* + * If the busiest group is not overloaded + * and there is no imbalance between this and busiest group + * wrt idle CPUs, it is balanced. The imbalance + * becomes significant if the diff is greater than 1 otherwise + * we might end up to just move the imbalance on another + * group. + */ + goto out_balanced; + force_balance: /* Looks like there is an imbalance. Compute it */ - env->src_grp_type = busiest->group_type; calculate_imbalance(env, &sds); return env->imbalance ? sds.busiest : NULL; @@ -8455,11 +8623,13 @@ static struct rq *find_busiest_queue(struct lb_env *env, struct sched_group *group) { struct rq *busiest = NULL, *rq; - unsigned long busiest_load = 0, busiest_capacity = 1; + unsigned long busiest_util = 0, busiest_load = 0, busiest_capacity = 1; + unsigned int busiest_nr = 0; int i; for_each_cpu_and(i, sched_group_span(group), env->cpus) { - unsigned long capacity, load; + unsigned long capacity, load, util; + unsigned int nr_running; enum fbq_type rt; rq = cpu_rq(i); @@ -8487,20 +8657,8 @@ static struct rq *find_busiest_queue(struct lb_env *env, if (rt > env->fbq_type) continue; - /* - * For ASYM_CPUCAPACITY domains with misfit tasks we simply - * seek the "biggest" misfit task. - */ - if (env->src_grp_type == group_misfit_task) { - if (rq->misfit_task_load > busiest_load) { - busiest_load = rq->misfit_task_load; - busiest = rq; - } - - continue; - } - capacity = capacity_of(i); + nr_running = rq->cfs.h_nr_running; /* * For ASYM_CPUCAPACITY domains, don't pick a CPU that could @@ -8510,35 +8668,70 @@ static struct rq *find_busiest_queue(struct lb_env *env, */ if (env->sd->flags & SD_ASYM_CPUCAPACITY && capacity_of(env->dst_cpu) < capacity && - rq->nr_running == 1) + nr_running == 1) continue; - load = cpu_runnable_load(rq); + switch (env->migration_type) { + case migrate_load: + /* + * When comparing with load imbalance, use + * cpu_runnable_load() which is not scaled with the CPU + * capacity. + */ + load = cpu_runnable_load(rq); - /* - * When comparing with imbalance, use cpu_runnable_load() - * which is not scaled with the CPU capacity. - */ + if (nr_running == 1 && load > env->imbalance && + !check_cpu_capacity(rq, env->sd)) + break; - if (rq->nr_running == 1 && load > env->imbalance && - !check_cpu_capacity(rq, env->sd)) - continue; + /* + * For the load comparisons with the other CPU's, + * consider the cpu_runnable_load() scaled with the CPU + * capacity, so that the load can be moved away from + * the CPU that is potentially running at a lower + * capacity. + * + * Thus we're looking for max(load_i / capacity_i), + * crosswise multiplication to rid ourselves of the + * division works out to: + * load_i * capacity_j > load_j * capacity_i; + * where j is our previous maximum. 
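The cross-multiplication trick spelled out in the comment above can be exercised with a small standalone program; the runqueue samples are invented and only the comparison mirrors the code:

#include <stdio.h>

/*
 * Find max(load_i / capacity_i) without a division: keep the current
 * maximum as a (load, capacity) pair and test
 *     load_i * busiest_capacity > busiest_load * capacity_i
 * exactly as described above.
 */
struct cpu_sample { unsigned long load, capacity; };

int main(void)
{
	struct cpu_sample rq[] = {
		{ .load = 800,  .capacity = 1024 },
		{ .load = 300,  .capacity = 256  },	/* highest load/capacity ratio */
		{ .load = 1000, .capacity = 1024 },
	};
	unsigned long busiest_load = 0, busiest_capacity = 1;
	int i, busiest = -1;

	for (i = 0; i < 3; i++) {
		if (rq[i].load * busiest_capacity > busiest_load * rq[i].capacity) {
			busiest_load = rq[i].load;
			busiest_capacity = rq[i].capacity;
			busiest = i;
		}
	}

	printf("busiest CPU index: %d\n", busiest);	/* prints 1 */

	return 0;
}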
+ */ + if (load * busiest_capacity > busiest_load * capacity) { + busiest_load = load; + busiest_capacity = capacity; + busiest = rq; + } + break; + + case migrate_util: + util = cpu_util(cpu_of(rq)); + + if (busiest_util < util) { + busiest_util = util; + busiest = rq; + } + break; + + case migrate_task: + if (busiest_nr < nr_running) { + busiest_nr = nr_running; + busiest = rq; + } + break; + + case migrate_misfit: + /* + * For ASYM_CPUCAPACITY domains with misfit tasks we + * simply seek the "biggest" misfit task. + */ + if (rq->misfit_task_load > busiest_load) { + busiest_load = rq->misfit_task_load; + busiest = rq; + } + + break; - /* - * For the load comparisons with the other CPU's, consider - * the cpu_runnable_load() scaled with the CPU capacity, so - * that the load can be moved away from the CPU that is - * potentially running at a lower capacity. - * - * Thus we're looking for max(load_i / capacity_i), crosswise - * multiplication to rid ourselves of the division works out - * to: load_i * capacity_j > load_j * capacity_i; where j is - * our previous maximum. - */ - if (load * busiest_capacity > busiest_load * capacity) { - busiest_load = load; - busiest_capacity = capacity; - busiest = rq; } } @@ -8584,7 +8777,7 @@ voluntary_active_balance(struct lb_env *env) return 1; } - if (env->src_grp_type == group_misfit_task) + if (env->migration_type == migrate_misfit) return 1; return 0; From patchwork Fri Oct 18 13:26:32 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vincent Guittot X-Patchwork-Id: 176833 Delivered-To: patch@linaro.org Received: by 2002:a92:7e96:0:0:0:0:0 with SMTP id q22csp861431ill; Fri, 18 Oct 2019 06:26:59 -0700 (PDT) X-Google-Smtp-Source: APXvYqzhpqJCcxPl7M3jqkMU+FegnTUzuXTnVYGUU4dpE1hXcbecMn+MeR8XT670ES5Y/Gnf7oCM X-Received: by 2002:a17:906:3e50:: with SMTP id t16mr8717856eji.177.1571405219240; Fri, 18 Oct 2019 06:26:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1571405219; cv=none; d=google.com; s=arc-20160816; b=wN0vNVG8Z/WZoUfl/my9IO4Anyq65i8LLcwjGI+QTZ8+9pqr/9Qo92G4J6ovmDYchb 9dPVm6pWDRVTo3AUEHx/cULVAUVkgNoUYBj7hVEmGB4X+tiiB0Sa4/ESPEgkhVvPhxFI NK8WBJQnL5iYCdhaQBMrqv8z7iV7fkIGj4mH0D08EYNDgSNtaoLwlQ0+QtKZuVuHjkXB rP/ndIWr8ah3l0lvyhQVclSOo0MfSdBju01D+G15kAs8uLVGdzi3B+IrvY7y8kOeKym8 50w9oCA3/Gt6uDNo2jyt+Lxx65C9RGK5LMywh3/rA9R3F+huLx6FZSpBxnkig7Bw6/ez /oOw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature; bh=ACgHRAYDPePD5iyvNwVwA2/U+1dhCaKONZj4VM6vt/4=; b=VGLCtMA7LVSYNcLtdSysb+d3qj/Plvh99VA9UNPQHsYJF132t9rAPAJHPbiSCghKTl tg2PkMAoG8ZEBB+Nf2kCP4BY8PRyPYTzfTKAPcojOevNA4qxkZvwMevUW5w47ny90Us2 Y7CfGAAa8PSRVr1POWxJPUzOKllS1Z86EcSRbH7r27DG6+uCxXA/EAT4nQw/QJhmK7Lp tPstKv23+hGhhIUBlYcfYAowG8SIplX6Sbh5ZJinIo3iHrzqraP2sCk5iLK+cQ2NIgaL /nlLI1DBH5IOWxHMON6YnlIdEF9dop5PDVHqkoF2zfwFngU8n+bQPktbGSv7pLKH7PBp N6jQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=DYFQYDZv; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. 
[209.132.180.67]) by mx.google.com with ESMTP id q4si3371475eji.152.2019.10.18.06.26.59; Fri, 18 Oct 2019 06:26:59 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=DYFQYDZv; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2410312AbfJRN04 (ORCPT + 26 others); Fri, 18 Oct 2019 09:26:56 -0400 Received: from mail-wm1-f68.google.com ([209.85.128.68]:54991 "EHLO mail-wm1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2410303AbfJRN0y (ORCPT ); Fri, 18 Oct 2019 09:26:54 -0400 Received: by mail-wm1-f68.google.com with SMTP id p7so6192727wmp.4 for ; Fri, 18 Oct 2019 06:26:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=ACgHRAYDPePD5iyvNwVwA2/U+1dhCaKONZj4VM6vt/4=; b=DYFQYDZvBUzF7NyIwx1hcaaajr/pskthAizC7tbjYaD3UxnadPZvspq6RYLl5JDuQV lEKI6PkORzSPdHHl36BLJS1oBce2zZRRSuXXLSBZjx6XW4AJUgpDve/Kv6JFayHEN0jw XQNRSq+JsGYsfTLa5Fj2czaP7Vyqdmt4+bXOhYWdLe4Cs7TXEWqfq6kliKchFPHDPb7v R13iliv+7RV44FvjcF3/GVZ+aFGIiQyN/t+BuAED+EJZIFPUTR8sTrxqlwJmb5JhDyGd 2hITiLWee/Ztfvq41wfS0NSoBlallPH6QnLMIozQ+HyxHqmYFE/5IYKI91fn2sYsdNIE MoRw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=ACgHRAYDPePD5iyvNwVwA2/U+1dhCaKONZj4VM6vt/4=; b=PkRAW0jdRNZn1vVubR0wUmtUEQEtTtxyKZ6nHLABRBh4fKKmFQKxkE7Ugy0dAhIqR8 9S2gxtiaLduomyuBylkxZmaKryfMjiBI3rxzNfj09nMKseZI5tJSXyr7SBzBG2f3mhol d+7wjMcGMfJdZwA2ZcaVSSU5/EkCv6/lgs8nEqVtW76Br+r6/RCfExc7T9UkmHT77M65 FZMB8LqqZi6idsIIYI8vu/eHGeRMtfmOSQw+2CWSWWIbHt4uC8UQnFSDqGoZaYt6OJ+t oeA+cNKIutS8ubswjvbbtdd1fnLtCHIwCf2HLyg7NEpTu8vR711ZbXzn5EeoiUvBfQ6m p9FQ== X-Gm-Message-State: APjAAAWiAfvdLFSn3w6fBe9AE/KgrJBO+GeiyTHzfBLFTrKvjw4vbmAL BrO+jAI7nKBvLeTMYAn2D/l1K8DMbX8= X-Received: by 2002:a1c:1bc5:: with SMTP id b188mr8035064wmb.88.1571405212870; Fri, 18 Oct 2019 06:26:52 -0700 (PDT) Received: from localhost.localdomain (91-160-61-128.subs.proxad.net. 
[91.160.61.128]) by smtp.gmail.com with ESMTPSA id p15sm5870123wrs.94.2019.10.18.06.26.51 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Fri, 18 Oct 2019 06:26:51 -0700 (PDT) From: Vincent Guittot To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, parth@linux.ibm.com, riel@surriel.com, Vincent Guittot Subject: [PATCH v4 05/11] sched/fair: use rq->nr_running when balancing load Date: Fri, 18 Oct 2019 15:26:32 +0200 Message-Id: <1571405198-27570-6-git-send-email-vincent.guittot@linaro.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> References: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org cfs load_balance only takes care of CFS tasks whereas CPUs can be used by other scheduling class. Typically, a CFS task preempted by a RT or deadline task will not get a chance to be pulled on another CPU because the load_balance doesn't take into account tasks from other classes. Add sum of nr_running in the statistics and use it to detect such situation. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 5ae5281..e09fe12b 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7704,6 +7704,7 @@ struct sg_lb_stats { unsigned long group_load; /* Total load over the CPUs of the group */ unsigned long group_capacity; unsigned long group_util; /* Total utilization of the group */ + unsigned int sum_nr_running; /* Nr of tasks running in the group */ unsigned int sum_h_nr_running; /* Nr of CFS tasks running in the group */ unsigned int idle_cpus; unsigned int group_weight; @@ -7938,7 +7939,7 @@ static inline int sg_imbalanced(struct sched_group *group) static inline bool group_has_capacity(struct lb_env *env, struct sg_lb_stats *sgs) { - if (sgs->sum_h_nr_running < sgs->group_weight) + if (sgs->sum_nr_running < sgs->group_weight) return true; if ((sgs->group_capacity * 100) > @@ -7959,7 +7960,7 @@ group_has_capacity(struct lb_env *env, struct sg_lb_stats *sgs) static inline bool group_is_overloaded(struct lb_env *env, struct sg_lb_stats *sgs) { - if (sgs->sum_h_nr_running <= sgs->group_weight) + if (sgs->sum_nr_running <= sgs->group_weight) return false; if ((sgs->group_capacity * 100) < @@ -8063,6 +8064,8 @@ static inline void update_sg_lb_stats(struct lb_env *env, sgs->sum_h_nr_running += rq->cfs.h_nr_running; nr_running = rq->nr_running; + sgs->sum_nr_running += nr_running; + if (nr_running > 1) *sg_status |= SG_OVERLOAD; @@ -8420,13 +8423,13 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s } if (busiest->group_weight == 1 || sds->prefer_sibling) { - unsigned int nr_diff = busiest->sum_h_nr_running; + unsigned int nr_diff = busiest->sum_nr_running; /* * When prefer sibling, evenly spread running tasks on * groups. 
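As a hedged aside on why the patch switches from sum_h_nr_running to sum_nr_running, the toy model below shows the effect; the types and numbers are invented, and the real group_is_overloaded() additionally weighs capacity against utilization:

#include <stdbool.h>
#include <stdio.h>

/*
 * nr_running counts every runnable task (CFS + RT + DL) while
 * h_nr_running counts CFS tasks only. A CFS task preempted by an RT
 * hog is therefore invisible to a group_weight comparison based on
 * h_nr_running, but visible once all classes are counted.
 */
struct toy_rq { unsigned int nr_running, cfs_h_nr_running; };

static bool has_more_tasks_than_cpus(unsigned int sum_nr_running,
				     unsigned int group_weight)
{
	return sum_nr_running > group_weight;
}

int main(void)
{
	/* single-CPU group: one RT hog plus one starved CFS task */
	struct toy_rq rq = { .nr_running = 2, .cfs_h_nr_running = 1 };
	unsigned int group_weight = 1;

	printf("counting CFS only:    overloaded=%d\n",
	       has_more_tasks_than_cpus(rq.cfs_h_nr_running, group_weight));
	printf("counting all classes: overloaded=%d\n",
	       has_more_tasks_than_cpus(rq.nr_running, group_weight));

	return 0;
}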
*/ env->migration_type = migrate_task; - lsub_positive(&nr_diff, local->sum_h_nr_running); + lsub_positive(&nr_diff, local->sum_nr_running); env->imbalance = nr_diff >> 1; return; } @@ -8590,7 +8593,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env) /* Try to move all excess tasks to child's sibling domain */ if (sds.prefer_sibling && local->group_type == group_has_spare && - busiest->sum_h_nr_running > local->sum_h_nr_running + 1) + busiest->sum_nr_running > local->sum_nr_running + 1) goto force_balance; if (busiest->group_type != group_overloaded && From patchwork Fri Oct 18 13:26:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vincent Guittot X-Patchwork-Id: 176840 Delivered-To: patch@linaro.org Received: by 2002:a92:7e96:0:0:0:0:0 with SMTP id q22csp861892ill; Fri, 18 Oct 2019 06:27:21 -0700 (PDT) X-Google-Smtp-Source: APXvYqzPrb6S8H2iic3JdpOF9xujrlIyT4+ZPRWNOT7IwszACop01w2R8jzb3hiGYEIRsIWF+5MM X-Received: by 2002:a50:e40c:: with SMTP id d12mr9401054edm.256.1571405241193; Fri, 18 Oct 2019 06:27:21 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1571405241; cv=none; d=google.com; s=arc-20160816; b=mi9KNVswU0N/RKWpEcHBQQ6NmItBt5+m1Q7OjzDpk3241bdz4Z+V0KnJfWxyfN6ajw 3ktq27N0fslO9ZRNXQBrDEgx3/qFdAUCBfR/Jksr++9qOE4q+I5zHPFakf1iKvydxI01 UH/PligXeKdRkqS7tPgM0pvfjIfAQMaYA18wpWQX8FDBIvSbmXWSG1GpahhsqxINsihZ AJtGe4CHZX8GrFzmOrxnpdIEwM1HGzqjrqiPH5L0Rr5gGVkHUB5xnUc6KSm8gmhVcIde b7KbNlML/bfEV3OvNGKslasVm+oz9d/auRc3TqAc5JVE7c+a3D8uv3YAxm8bEOywvP/P csyA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature; bh=B8mzip3+tA6BeidSYtlNqITD3DSe2ulF+ZWQ0m84V5I=; b=mUsaz8Y8JZ9piFuq0l0UbL3iEcSvualEBJGKYepNGuij4b2kB1a5edHRVEUGxeb8eX Z8MIZYHQ/jlPOZHfKITS8G8iANmjczSLa62YVzQrbi9uXkafm34qhN9te3b61iO6nGdt av7lU1Xx4PmU5scL2U2l+ZEnSPZCOgpeNcL1KOcvyh2ys0ohUWUkQVJGHWfZO3oj4Vl3 9h9yebs2Q1IssuzUfxh8Rgirjx9kwQuDoMqPQmSfqXtfG9Mrs7YMFyAnApTc6golR23F fYM/wKwh00RkTDhbW6R8K509OVLBBggzz7vZaFsVpshUotSP7bPkRfpheczNPBq4LIQC j1oQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=zCufCuWZ; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. 
[209.132.180.67]) by mx.google.com with ESMTP id u27si3475775ejb.172.2019.10.18.06.27.21; Fri, 18 Oct 2019 06:27:21 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=zCufCuWZ; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2634111AbfJRN1R (ORCPT + 26 others); Fri, 18 Oct 2019 09:27:17 -0400 Received: from mail-wm1-f65.google.com ([209.85.128.65]:50538 "EHLO mail-wm1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2410304AbfJRN04 (ORCPT ); Fri, 18 Oct 2019 09:26:56 -0400 Received: by mail-wm1-f65.google.com with SMTP id 5so6213746wmg.0 for ; Fri, 18 Oct 2019 06:26:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=B8mzip3+tA6BeidSYtlNqITD3DSe2ulF+ZWQ0m84V5I=; b=zCufCuWZkU+wU1JLp+asFBiOQ0bhsNUSe83udjSDa+1I4WD37Kt+NZ4m96hUm3/5Jb Jr3073a+WBS2tL510iWr/zKwahzlpBpAhfnNB6Cde4OyNbdtc9FmseNsczNReToG4Pfj 0cdDn3rXzUF9oddIsjG48uQ9uP0ULEdpLG8pXv0rp2cKf5i0qJ1v9BsbawcCxQyNVVDs 3LOSV89LrKWe7vXv3klu5cYLdOiS+c2uDqYMWE9exaUfEYZQaLaUm5FJSKhWBlI8Pu43 fuj87npoKTK6jP1UmuA51tv8fTBWCHvfP5+8yuuFXvN3tys6002i7UXuRe8Qm3fmQing 2G1A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=B8mzip3+tA6BeidSYtlNqITD3DSe2ulF+ZWQ0m84V5I=; b=sEBOebNnFfqYUeboprbpinq5pQnPiWLMKqGJVeFgoPOOsQK2RfuR3NgzJ3vY6432fE xrQqz0YX/ekscndgq/cGDN5TTIfpFhjUM12X0o1DReUbgCEriew5TGiHEFSeQ+wx99CA RPv+Zkgw0rDxl73QP+VEv6kBszsmRDLEZ5ic+IObyyx42NbBqqZus7EComRN4F+9ms/s hiZyKGVXDL08D0pkprEJj0ZZzOeLfihiYBcOmHNqnazToS7arWlxA9DZ15NVIcKqnRlY c2iEgWpFMYX0jKlHPUNtGR3ofynYnLwWArt0v9FVPog8c7ZOXfxVq9NELwiHkJZZ3zlo VvZg== X-Gm-Message-State: APjAAAUBqLPP6WK6m7Euq8KId/IN9wOMCNqY3lG9GiyTPXVyZP8cBMDF vaLXDuk/buXKWiduQovkbqbuysY5Bjo= X-Received: by 2002:a1c:9d4c:: with SMTP id g73mr8094349wme.92.1571405214681; Fri, 18 Oct 2019 06:26:54 -0700 (PDT) Received: from localhost.localdomain (91-160-61-128.subs.proxad.net. 
[91.160.61.128]) by smtp.gmail.com with ESMTPSA id p15sm5870123wrs.94.2019.10.18.06.26.52 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Fri, 18 Oct 2019 06:26:53 -0700 (PDT) From: Vincent Guittot To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, parth@linux.ibm.com, riel@surriel.com, Vincent Guittot Subject: [PATCH v4 06/11] sched/fair: use load instead of runnable load in load_balance Date: Fri, 18 Oct 2019 15:26:33 +0200 Message-Id: <1571405198-27570-7-git-send-email-vincent.guittot@linaro.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> References: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org runnable load has been introduced to take into account the case where blocked load biases the load balance decision which was selecting underutilized group with huge blocked load whereas other groups were overloaded. The load is now only used when groups are overloaded. In this case, it's worth being conservative and taking into account the sleeping tasks that might wakeup on the cpu. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index e09fe12b..9ac2264 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5385,6 +5385,11 @@ static unsigned long cpu_runnable_load(struct rq *rq) return cfs_rq_runnable_load_avg(&rq->cfs); } +static unsigned long cpu_load(struct rq *rq) +{ + return cfs_rq_load_avg(&rq->cfs); +} + static unsigned long capacity_of(int cpu) { return cpu_rq(cpu)->cpu_capacity; @@ -8059,7 +8064,7 @@ static inline void update_sg_lb_stats(struct lb_env *env, if ((env->flags & LBF_NOHZ_STATS) && update_nohz_stats(rq, false)) env->flags |= LBF_NOHZ_AGAIN; - sgs->group_load += cpu_runnable_load(rq); + sgs->group_load += cpu_load(rq); sgs->group_util += cpu_util(i); sgs->sum_h_nr_running += rq->cfs.h_nr_running; @@ -8517,7 +8522,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env) init_sd_lb_stats(&sds); /* - * Compute the various statistics relavent for load balancing at + * Compute the various statistics relevant for load balancing at * this level. */ update_sd_lb_stats(env, &sds); @@ -8677,11 +8682,10 @@ static struct rq *find_busiest_queue(struct lb_env *env, switch (env->migration_type) { case migrate_load: /* - * When comparing with load imbalance, use - * cpu_runnable_load() which is not scaled with the CPU - * capacity. + * When comparing with load imbalance, use cpu_load() + * which is not scaled with the CPU capacity. */ - load = cpu_runnable_load(rq); + load = cpu_load(rq); if (nr_running == 1 && load > env->imbalance && !check_cpu_capacity(rq, env->sd)) @@ -8689,10 +8693,10 @@ static struct rq *find_busiest_queue(struct lb_env *env, /* * For the load comparisons with the other CPU's, - * consider the cpu_runnable_load() scaled with the CPU - * capacity, so that the load can be moved away from - * the CPU that is potentially running at a lower - * capacity. 
+ * consider the cpu_load() scaled with the CPU + * capacity, so that the load can be moved away + * from the CPU that is potentially running at a + * lower capacity. * * Thus we're looking for max(load_i / capacity_i), * crosswise multiplication to rid ourselves of the From patchwork Fri Oct 18 13:26:34 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vincent Guittot X-Patchwork-Id: 176835 Delivered-To: patch@linaro.org Received: by 2002:a92:7e96:0:0:0:0:0 with SMTP id q22csp861589ill; Fri, 18 Oct 2019 06:27:07 -0700 (PDT) X-Google-Smtp-Source: APXvYqw/hPN0ZttGPsnrThbm7y1K5LVUMMahd0jE90gZLuYcrDYhwhoeZ5ArpEISGtp5qh6X9+Mg X-Received: by 2002:a17:906:309b:: with SMTP id 27mr8630279ejv.243.1571405226901; Fri, 18 Oct 2019 06:27:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1571405226; cv=none; d=google.com; s=arc-20160816; b=FRCyAFPYGfjYdTUThWIXfwb1wL3dH7y0658UrR4yFr76aRWzWWnV1NgWNk6YoCRVuE LuioBxBE+RUYv1qP6ml3UTiw9s2rQYBVF4Zg+CnJ3KQ5RUkkModa2eWHCRUaOnM9K+j1 ITDSMct9Cq9bhgdJlGJFFpWR8VrIhNqqKNelDjdhkRXLxApXJyLqwKewaYFAA693l92m agRjlUu+2+F1dXwKgwdnooWRtl0sEmvs7hkdh77BHCbBbA4NQsg/1LBTAmiRc0y1n6Ou jHP3lykJz6z9evwJAANNC15kYWyeY4gdi/tE9ikS+g+oEB0N9NXKxXn/SKTFn3azWEQr 0WnA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature; bh=U+L99VOS6caLFt+CoaTBRiqHtekpuXrbr4pGYmwWDiE=; b=xbr0blwi8h40Oniggl5JcYm5y6GEYRnaR1OtHmtf4UKMW516XLwlLJoZJtWDlVWwV9 ntih+6uE1WcIRqS4Dp0MYxuI0qaA3HPw/KKtJoI7ZWhwb4Lec4gyg6vo5heJYtFuaTcb uCQrcql8DkWGL1bRtbuegW4YRWGl6chhlKVB+2kNqmpgaYxzFKE5vKSGxC0IPdkTxz5z Zpp2IeoczsLqN1kjUTl0YW9+q6SHX5BZA6EHrSE0BV58o2SXG4YOhKKp1h1Sug3ccp5l zLGlr9lhi65qweqTvIpPF6k8/mcfkk5UDCiLVz24+T6kmO6FLZUlcdRc1+flH0fx8XyI 2+yw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=YexuFZ4f; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. 
[209.132.180.67]) by mx.google.com with ESMTP id o32si3789756edc.306.2019.10.18.06.27.06; Fri, 18 Oct 2019 06:27:06 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=YexuFZ4f; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2410335AbfJRN1B (ORCPT + 26 others); Fri, 18 Oct 2019 09:27:01 -0400 Received: from mail-wm1-f66.google.com ([209.85.128.66]:56134 "EHLO mail-wm1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2410292AbfJRN1A (ORCPT ); Fri, 18 Oct 2019 09:27:00 -0400 Received: by mail-wm1-f66.google.com with SMTP id a6so6169546wma.5 for ; Fri, 18 Oct 2019 06:26:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=U+L99VOS6caLFt+CoaTBRiqHtekpuXrbr4pGYmwWDiE=; b=YexuFZ4fjlY6AevXkLYwKDu06ECkTRx8vSIFlXx6hDq+1v+NVL84yLK2F9YZbhhFxG FXfR5Ko8czlD+FeLKLYoY9Ve/rxWNtsqruSpX0mZBJMXp9pe1oiSc8KPR6wy2SJCuJCH wQxXCJh4t7J5njM9BWDMVuGFItJGfZd7xOiUmzwaRnnw9UqkPPJJw8lqPBXbLmzoAlZw AQvBqx+pW3YfnOkARBIxpcZCNQfSq6co36YadtT5BFN09zMaMkqojXMIQRX+Lboka8Fw mfBRYyX6DfVIUqkxTx/bmo/08rM/gz1lGdpa9xv2YPg0j1ANnF0zUAWThZIhrYkXklhH UIcw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=U+L99VOS6caLFt+CoaTBRiqHtekpuXrbr4pGYmwWDiE=; b=JDt6yMQ0j4UONOwZqL384Ghcm3lD+Pqk80jn315ymPy5ZyWBo1laG2IEhJ5f7BqL73 I9oXSNlrujML96/dJEZxBOOU3DmvnWY9vWOdpF5e3heMeSYXSZw7hcwTGYP6cDMjdNBs AP3OEyRCD5b7MpyxVoXHHFEaYZsA4quzSVHClNKe4d4B//uktrKAwJ8AWVhW08Pxkimz yIIvdo996axB0ljEtl1R8fVGQ6W0zwZmhA4YuRCYDumOTSaAn203lDcCsnPWWZUr1/M/ 7h9Oz49wFEb4DgO2qoPu0NrXz0LrcxmPRWTkCks+Mb5PkFIZ1w3hHCo5BWTnlKrPhUqR M6fQ== X-Gm-Message-State: APjAAAWwHni/5VdNRGm07BZPxZd6fCrHrTvHNfzT8bTIws4eDKD3S3QG 18vrHSyePVD1r0wmEu6MmrI30j9W+Dk= X-Received: by 2002:a05:600c:23cc:: with SMTP id p12mr3592277wmb.163.1571405216635; Fri, 18 Oct 2019 06:26:56 -0700 (PDT) Received: from localhost.localdomain (91-160-61-128.subs.proxad.net. 
[91.160.61.128]) by smtp.gmail.com with ESMTPSA id p15sm5870123wrs.94.2019.10.18.06.26.54 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Fri, 18 Oct 2019 06:26:55 -0700 (PDT) From: Vincent Guittot To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, parth@linux.ibm.com, riel@surriel.com, Vincent Guittot Subject: [PATCH v4 07/11] sched/fair: evenly spread tasks when not overloaded Date: Fri, 18 Oct 2019 15:26:34 +0200 Message-Id: <1571405198-27570-8-git-send-email-vincent.guittot@linaro.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> References: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When there is only 1 cpu per group, using the idle cpus to evenly spread tasks doesn't make sense and nr_running is a better metrics. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 40 ++++++++++++++++++++++++++++------------ 1 file changed, 28 insertions(+), 12 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 9ac2264..9b8e20d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8601,18 +8601,34 @@ static struct sched_group *find_busiest_group(struct lb_env *env) busiest->sum_nr_running > local->sum_nr_running + 1) goto force_balance; - if (busiest->group_type != group_overloaded && - (env->idle == CPU_NOT_IDLE || - local->idle_cpus <= (busiest->idle_cpus + 1))) - /* - * If the busiest group is not overloaded - * and there is no imbalance between this and busiest group - * wrt idle CPUs, it is balanced. The imbalance - * becomes significant if the diff is greater than 1 otherwise - * we might end up to just move the imbalance on another - * group. - */ - goto out_balanced; + if (busiest->group_type != group_overloaded) { + if (env->idle == CPU_NOT_IDLE) + /* + * If the busiest group is not overloaded (and as a + * result the local one too) but this cpu is already + * busy, let another idle cpu try to pull task. + */ + goto out_balanced; + + if (busiest->group_weight > 1 && + local->idle_cpus <= (busiest->idle_cpus + 1)) + /* + * If the busiest group is not overloaded + * and there is no imbalance between this and busiest + * group wrt idle CPUs, it is balanced. The imbalance + * becomes significant if the diff is greater than 1 + * otherwise we might end up to just move the imbalance + * on another group. Of course this applies only if + * there is more than 1 CPU per group. + */ + goto out_balanced; + + if (busiest->sum_h_nr_running == 1) + /* + * busiest doesn't have any tasks waiting to run + */ + goto out_balanced; + } force_balance: /* Looks like there is an imbalance. 
Compute it */ From patchwork Fri Oct 18 13:26:35 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vincent Guittot X-Patchwork-Id: 176839 Delivered-To: patch@linaro.org Received: by 2002:a92:7e96:0:0:0:0:0 with SMTP id q22csp861882ill; Fri, 18 Oct 2019 06:27:20 -0700 (PDT) X-Google-Smtp-Source: APXvYqxI5PN7KbJqh2IfuuQCtI/JdF8xQDobQtrRmgi8ukqSi9xJSAvMx6oXcN+vT6Op0tarrJ3K X-Received: by 2002:aa7:d687:: with SMTP id d7mr9639129edr.143.1571405240809; Fri, 18 Oct 2019 06:27:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1571405240; cv=none; d=google.com; s=arc-20160816; b=ZQlmTiC0H5wTWbv1WGDh54AqbLi0OmUgJ8uv6M/172y1LsDc2oT/qpuxBhcfzfPJ4d wQ7uInuGhxR6uME0eAgMrhk6eZKp2zKP8Qp3+TttyQuAVEgsSgk9N1kEwEczDSfcIGTJ CIzdQtAROCvqC5m127bDXp7oKaUgzMTRtUNagQwnXSdwNU/JP+xn1mmPvxDajjkCpxXN 5byDvG7z8tfkO8Ro9DW2XW2fd1J+O4T7fHmXSQ/ui6xmaVbgi7Y1IddY+VTi3lNV2M2J szedwxp3S1a3K3fRZDftcd5nQAQyaanH1PB5KxozGtHpxthKk7lpkAYQW/CJzzH6zgt/ YiVQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature; bh=+8jnwsi/OmvMHtm1PwUS69bLbdlbxHcNftrW5yRwhLg=; b=Pd935AxYxQ1/l2/XXL2HGuGe/4BjxTncPwAH16jCq9Gea958Su1+Dp2PT5g8l0oXGA VE/fXpzdPI9ZGBM7jrKnxcH/V6I5WOlBufytx+Ns9TBLXPPIzAmDZnUbixcwhF8XIaOt Gkgn2CQV6pQk8KXz6giCqfJvcj8Vpew21tmGzinEQ/wC8XKfS/9oUG19assxHqAdceQS QQYVj89//0wzv5ogIWOuqgCtNhcA/ohDCTcOFU/F7uf1SQjoSV8Dp2QFjPRFrCOpi6xf XwSf5Y+ZfQKhOSZRfMYxniymLIkHq/F5Rmf4tPkJSa5/RUURwRbf+MYjLqmuicLaX08b HjPQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=FjJZTlYA; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. 
[209.132.180.67]) by mx.google.com with ESMTP id u27si3475775ejb.172.2019.10.18.06.27.20; Fri, 18 Oct 2019 06:27:20 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=FjJZTlYA; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2634100AbfJRN1Q (ORCPT + 26 others); Fri, 18 Oct 2019 09:27:16 -0400 Received: from mail-wr1-f65.google.com ([209.85.221.65]:45104 "EHLO mail-wr1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2410324AbfJRN1A (ORCPT ); Fri, 18 Oct 2019 09:27:00 -0400 Received: by mail-wr1-f65.google.com with SMTP id q13so1296910wrs.12 for ; Fri, 18 Oct 2019 06:26:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=+8jnwsi/OmvMHtm1PwUS69bLbdlbxHcNftrW5yRwhLg=; b=FjJZTlYA8PpPgWv6KZYWHaxO2XAZZBU9r+ZOPZQwTUDI0H8meXrJbKqMRqJpbPJ1VI iZcv2qeW2BZ4CM3PAxjJUZSdAQnB2vrBsZZsqiAeqCOFvDDLe5aKyhxztqq0WGmaV7Ki F1/solils4dofYwDL1hPx5ZcpkayypL3pbIBN2bJhqV3sn9rlHxyBERi8i27egDPh25y d0V54tpUCmEeafhIzyX025Xpb/vvRzFMsYWgs5mDo02hfJXjUR1mGo963Qz5zFVp143U 2XZguQgVtZZUWRUemHyMQs1EyHMN4aq080amxvz42hv017cexxLGEt71iKZdTvAaec1c OXWQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=+8jnwsi/OmvMHtm1PwUS69bLbdlbxHcNftrW5yRwhLg=; b=jymF2IL14LatddNJKQ4krIoEoksE7e/5tUNG28Sswzh/15xoNQq8jyOWB7mom/P/D7 Owl2TJziWUfl3lWyzJFJIlXwvBnu+F1hIx7OgoWF90x1BUQ/ZU+MmrM6n13sOWyu0tKU brL2h5F626i9Aqaqftr1ZvwHNHnxYQAsp9Ot0s1UcTUXvAmEwkM1KfH9SDJ5y+QI3dm5 qlUp0qiFtNrvOAAOp30MO3OBeR24UvSmGAQAdf5eXLc35rjd/TXe69+HVGuzcU2tpVhn RPQsx3gNLLdX/r1gvEZbbzXnqduf0j/xWexMWWbnqJrD8/3HsKxs7hIT5w7tikTSE6+4 OA0A== X-Gm-Message-State: APjAAAWvaxSoB5GmpLKEDktyak4EN5Bkf/X36POe8iIMqezEPbDdIlka 7d8viuKlMA+p84l86MbcxGnsv+Rs/hY= X-Received: by 2002:a5d:408f:: with SMTP id o15mr7115548wrp.139.1571405218501; Fri, 18 Oct 2019 06:26:58 -0700 (PDT) Received: from localhost.localdomain (91-160-61-128.subs.proxad.net. 
[91.160.61.128]) by smtp.gmail.com with ESMTPSA id p15sm5870123wrs.94.2019.10.18.06.26.56 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Fri, 18 Oct 2019 06:26:57 -0700 (PDT) From: Vincent Guittot To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, parth@linux.ibm.com, riel@surriel.com, Vincent Guittot Subject: [PATCH v4 08/11] sched/fair: use utilization to select misfit task Date: Fri, 18 Oct 2019 15:26:35 +0200 Message-Id: <1571405198-27570-9-git-send-email-vincent.guittot@linaro.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> References: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org utilization is used to detect a misfit task but the load is then used to select the task on the CPU which can lead to select a small task with high weight instead of the task that triggered the misfit migration. Check that task can't fit the CPU's capacity when selecting the misfit task instead of using the load. Signed-off-by: Vincent Guittot Acked-by: Valentin Schneider --- kernel/sched/fair.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 9b8e20d..670856d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7418,13 +7418,8 @@ static int detach_tasks(struct lb_env *env) break; case migrate_misfit: - load = task_h_load(p); - - /* - * load of misfit task might decrease a bit since it has - * been recorded. Be conservative in the condition. - */ - if (load / 2 < env->imbalance) + /* This is not a misfit task */ + if (task_fits_capacity(p, capacity_of(env->src_cpu))) goto next; env->imbalance = 0; @@ -8368,7 +8363,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s if (busiest->group_type == group_misfit_task) { /* Set imbalance to allow misfit task to be balanced. 
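A rough standalone sketch of the fits-capacity test this patch switches to for picking the misfit task; the 1280/1024 headroom (utilization has to stay below roughly 80% of the CPU's capacity) is an assumption made for the example and is not taken from the patch text:

#include <stdbool.h>
#include <stdio.h>

/*
 * Toy version of "does this task's utilization fit this CPU?".
 * An assumed margin of 1280/1024 is used, i.e. the task fits only if
 * its utilization stays below about 80% of the CPU capacity.
 */
static bool toy_task_fits(unsigned long task_util, unsigned long cpu_capacity)
{
	return task_util * 1280 < cpu_capacity * 1024;
}

int main(void)
{
	unsigned long little_cap = 446, big_cap = 1024;	/* invented capacities */
	unsigned long task_util = 600;			/* invented misfit task */

	printf("fits LITTLE CPU: %d\n", toy_task_fits(task_util, little_cap));
	printf("fits big CPU:    %d\n", toy_task_fits(task_util, big_cap));

	return 0;
}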
*/ env->migration_type = migrate_misfit; - env->imbalance = busiest->group_misfit_task_load; + env->imbalance = 1; return; } From patchwork Fri Oct 18 13:26:36 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vincent Guittot X-Patchwork-Id: 176836 Delivered-To: patch@linaro.org Received: by 2002:a92:7e96:0:0:0:0:0 with SMTP id q22csp861606ill; Fri, 18 Oct 2019 06:27:07 -0700 (PDT) X-Google-Smtp-Source: APXvYqy+Uu4zUsZQpSuH6HftuBQsehQnNEcW4LY63TinWYgLwjenGdpEGBhPgRYEFh+sU+LfGFDr X-Received: by 2002:aa7:cfd4:: with SMTP id r20mr9595977edy.268.1571405227388; Fri, 18 Oct 2019 06:27:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1571405227; cv=none; d=google.com; s=arc-20160816; b=HwdyCjM53c0bgHwjQWiQuow1kU5nt8tpV8Za9zWlM0M4SYOOncvZpt051GQIJEyE+P syhVAep5QxD4lKKmKPE7ZpzYRB5u2JN7Xq3uC+Pksb8yLrpmTyst02UVmy2JhRhpdmBa iNBEcgBL6yw7BNjqlDJVuGmB17dRJcd6n5h2DT12ZRmvrzchQMiZZGT6a0INklwAiRx0 9zAuq3+96ORUB4BZFXXSise/W1XdEm/zfrlm7mGuUrb+C74hBZzcop9QL/CS4RQwxd/B H5JOlCIpxG5bpGvtRGua3zcUaOBjL6PCmPnAL1aM8vUaV57GuVm/w+DB72w+A+KBeeru KQpg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature; bh=P/1EXP59iD2tSSDxQFHqr6RgR861BrswyMXiJ6kbQ8A=; b=BKJR8I8sbtZHiw7MGdN4AQye9pMTsEF8e9rAdJoY6XmjFn8fsaj5aFA3S8E9B15TVU lLXQk2QyJBgO/aFAT4LJuIqalA4BEaz7bJqWCdgcxMu4dkHWHAQQvrFW9oSAbgs2GAdS Z9X97XJkBQj+ccHb7eqzAlA4SMslsol0DiWiSupySJHHXtOIa3hSLQxi+TMc5y8Tht9U 6fyY5zffuyWDAN3foYH+DaKr2ADxaq/1MxAdwR7IEw/Wm2lspt+yBXb1tp/fIMblItmK ZkKdEHTc8PF9d15kbTxembP1JABhAceR4WkzlHlm3YVjug3NVaPWNegOAiMt5UwFHxhy zNoQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=awr1kDiw; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. 
[209.132.180.67]) by mx.google.com with ESMTP id o32si3789756edc.306.2019.10.18.06.27.07; Fri, 18 Oct 2019 06:27:07 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=awr1kDiw; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2439202AbfJRN1E (ORCPT + 26 others); Fri, 18 Oct 2019 09:27:04 -0400 Received: from mail-wr1-f66.google.com ([209.85.221.66]:38852 "EHLO mail-wr1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2410333AbfJRN1C (ORCPT ); Fri, 18 Oct 2019 09:27:02 -0400 Received: by mail-wr1-f66.google.com with SMTP id o15so5843108wru.5 for ; Fri, 18 Oct 2019 06:27:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=P/1EXP59iD2tSSDxQFHqr6RgR861BrswyMXiJ6kbQ8A=; b=awr1kDiwCy9ZAcOnBjuEJIflo+CIkuihpEYfhcMLLrMyJoP11CB0U3qlXJ/49XTNEd Z8uBraX991qplCmKIWdripBXmIPizYl9pPYMhMqMK3Yeemr6Fnkg2RcgfXct37PHHLrF 9pPTEYsCdKguRHaEjs3eXNTbdWnnzxO0B6kuLAjhywYonAYaFf4rlvfm9ZthekjCabiG G7zNYo5niBXc1G0zYbT1xyZfqIjBLozVhUzdnzICJsFF2A5jV5BwHc317kAyv5qjQ/pJ u5BOQtoFnqCloxfpx/tF4Lnawlp7alDFKCIsNugb4q5VgYlx9SNnJe7me4WxPJgJgad3 zKKA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=P/1EXP59iD2tSSDxQFHqr6RgR861BrswyMXiJ6kbQ8A=; b=a2J/buDtiFbQ2TTCJdeJvqSEbZBk7T+XnciGAWjUixsQnh4U3/i+vtjrEc8kQA5v3Z PG+RYFs62A1PvoSPz1kjt9nb5P0V92gZoZiKaOXAS3DzFf5WSgRZe+jETiMe/7fDVo9u amdG9RcKuubjU/zAjKJ07tGM9FhnwejMEu7lR4o6R2F81zDxJ3ELt5XXGM/ryFdJN4sQ lXTOniyGgMUwesVJPMJroIT0NYKi4Led0HW/1w95ashFWDnI8P8rHjscDNUI6behAB5u kbC+z+AZt+FeMlHjtKEle6sNvDE3UQmkLZ5EQguK3/Cv+hEXVSVtufUaXoVvnHZQLetn OHfg== X-Gm-Message-State: APjAAAX840PCpsIEERZhzJmCEi8pD1xDYvAQNoOEcBO9GvuWihMNavLe pe0rjhIDp3ka2OcbmKqseM9HJukb4sI= X-Received: by 2002:adf:8123:: with SMTP id 32mr8255555wrm.300.1571405220321; Fri, 18 Oct 2019 06:27:00 -0700 (PDT) Received: from localhost.localdomain (91-160-61-128.subs.proxad.net. 
[91.160.61.128]) by smtp.gmail.com with ESMTPSA id p15sm5870123wrs.94.2019.10.18.06.26.58 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Fri, 18 Oct 2019 06:26:59 -0700 (PDT) From: Vincent Guittot To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, parth@linux.ibm.com, riel@surriel.com, Vincent Guittot Subject: [PATCH v4 09/11] sched/fair: use load instead of runnable load in wakeup path Date: Fri, 18 Oct 2019 15:26:36 +0200 Message-Id: <1571405198-27570-10-git-send-email-vincent.guittot@linaro.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> References: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org runnable load has been introduced to take into account the case where blocked load biases the wake up path which may end to select an overloaded CPU with a large number of runnable tasks instead of an underutilized CPU with a huge blocked load. Tha wake up path now starts to looks for idle CPUs before comparing runnable load and it's worth aligning the wake up path with the load_balance. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 670856d..6203e71 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1475,7 +1475,12 @@ bool should_numa_migrate_memory(struct task_struct *p, struct page * page, group_faults_cpu(ng, src_nid) * group_faults(p, dst_nid) * 4; } -static unsigned long cpu_runnable_load(struct rq *rq); +static inline unsigned long cfs_rq_runnable_load_avg(struct cfs_rq *cfs_rq); + +static unsigned long cpu_runnable_load(struct rq *rq) +{ + return cfs_rq_runnable_load_avg(&rq->cfs); +} /* Cached statistics for all CPUs within a node */ struct numa_stats { @@ -5380,11 +5385,6 @@ static int sched_idle_cpu(int cpu) rq->nr_running); } -static unsigned long cpu_runnable_load(struct rq *rq) -{ - return cfs_rq_runnable_load_avg(&rq->cfs); -} - static unsigned long cpu_load(struct rq *rq) { return cfs_rq_load_avg(&rq->cfs); @@ -5485,7 +5485,7 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p, s64 this_eff_load, prev_eff_load; unsigned long task_load; - this_eff_load = cpu_runnable_load(cpu_rq(this_cpu)); + this_eff_load = cpu_load(cpu_rq(this_cpu)); if (sync) { unsigned long current_load = task_h_load(current); @@ -5503,7 +5503,7 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p, this_eff_load *= 100; this_eff_load *= capacity_of(prev_cpu); - prev_eff_load = cpu_runnable_load(cpu_rq(prev_cpu)); + prev_eff_load = cpu_load(cpu_rq(prev_cpu)); prev_eff_load -= task_load; if (sched_feat(WA_BIAS)) prev_eff_load *= 100 + (sd->imbalance_pct - 100) / 2; @@ -5591,7 +5591,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, max_spare_cap = 0; for_each_cpu(i, sched_group_span(group)) { - load = cpu_runnable_load(cpu_rq(i)); + load = cpu_load(cpu_rq(i)); runnable_load += load; avg_load += cfs_rq_load_avg(&cpu_rq(i)->cfs); @@ -5732,7 +5732,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this continue; } - load = cpu_runnable_load(cpu_rq(i)); + load = 
cpu_load(cpu_rq(i)); if (load < min_load) { min_load = load; least_loaded_cpu = i; From patchwork Fri Oct 18 13:26:37 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vincent Guittot X-Patchwork-Id: 176837 Delivered-To: patch@linaro.org Received: by 2002:a92:7e96:0:0:0:0:0 with SMTP id q22csp861723ill; Fri, 18 Oct 2019 06:27:12 -0700 (PDT) X-Google-Smtp-Source: APXvYqyVyLojxTGXHbQQO5lYQzPjB7dmqjOWvDAKhnOl4xjgaVVW8ATdkJ0eGMkhAHesXjl8a1zj X-Received: by 2002:a05:6402:19bd:: with SMTP id o29mr9478402edz.42.1571405232292; Fri, 18 Oct 2019 06:27:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1571405232; cv=none; d=google.com; s=arc-20160816; b=SUkI20MULkI7jevUEuyvL6iI9/ppGc43AFOVR68plxXvXtj8IyItvqbckn4hC9R917 coTnigZ43WgaACEG1NCYLOqlnIsRDDQT3O7YFq91owLaBF7dZ5+u8BSGCMwcQz0y66I2 Vwgvf2BxbusyvySIHOiNd88oOKH6kfEzuxrR6u1g+3odaGabTX0I8tCwmBpwMrmFKwS7 NODvKazKSQ5Z7ih/9ft35s1JyXtQkD2KcM/YfWHKx8peLRml98wt96+GpOoXfVp7VGnb CDbDWl6fBbYuPKDbX4L77UfbS8qFtMLwrBBEpjCe5Lr3zpq4TwPUMyP1HK24fKQXoCc4 oqdw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature; bh=rF17NfzVuoPQDjrV4B8MSfUDRkdA/P46N9yA+eCB77U=; b=SvgKpA3hbhn+XZl83WzotEtpd53YbWi2qieaDhtHjeRP3GTDiPgV5+NvVzThhNKmA5 kAPn+UrmNN+fg+x3w71BFq1FJvwbpUyVkV5j86eY2NTpQGMMjw6rPnOev/C8/H2jTKso 7YZYG+GOr6hTdzDAl2brHgFv0lgBVj71GBbIb2s5thCPuauLvDGIFybpxy2r0NvK2O15 Bn8w7LCXCLSLRc13M6XvWnzpL/wHPcKSv2LHDdHhYR8SE6lWxWPObdvDtnGy+yD4Mnkq 6Tx8+arkWxfTxlnHtudU5NSoLasmwdVQfftnSP/J8dX9cGw/mOoNotcnEMAO2Lea69D4 qaqQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=G0oDhG1t; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. 
[209.132.180.67]) by mx.google.com with ESMTP id x62si4011677ede.352.2019.10.18.06.27.12; Fri, 18 Oct 2019 06:27:12 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=G0oDhG1t; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2442845AbfJRN1H (ORCPT + 26 others); Fri, 18 Oct 2019 09:27:07 -0400 Received: from mail-wm1-f65.google.com ([209.85.128.65]:55007 "EHLO mail-wm1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2438919AbfJRN1E (ORCPT ); Fri, 18 Oct 2019 09:27:04 -0400 Received: by mail-wm1-f65.google.com with SMTP id p7so6193216wmp.4 for ; Fri, 18 Oct 2019 06:27:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=rF17NfzVuoPQDjrV4B8MSfUDRkdA/P46N9yA+eCB77U=; b=G0oDhG1tp21hqgoc0Wftnssc9hxoXtNaWmw+rX4yhbE0TfCUsLCUi2lUibmsO/dO+R ZAeKslXiKdUvQNF5+/tWf8z9YDwDNh6iP0kqNAH1W7mjk3OIIDkR25vTWoJeiXryGQz6 CiTA0HSae02uTYsH1CtxaxiWGKU6dikwaGiWNLbOlebv82W4MPTw/9aOJKRbnidMVfeA Edeo7C2SnFT5Q3TH25FB1bmiIn4sAIFVK0vMDtEm3krbAKip3CQ9qFFs0X7/Il3cQIrx 65F2uNoLKYHNl6GoBI/NfYUfqLUT+Mb/g1cOXMKKY6xumD9nWHZhvz1+B80kCITNjZRm ysYg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=rF17NfzVuoPQDjrV4B8MSfUDRkdA/P46N9yA+eCB77U=; b=WhrzCnH9Uzd9PTCSlpWmeaRs+6XZCBCzoW4qm1a6nAnaMWYEnQB59BC4PPMFPXTT0M VlEM/2ZW570TEfRGRbZfzrIJ06Lk1SX6HFXHaX4QQv83xC2KtBgDh1UW1bMdnns6WNRd Phpq6zMvGb7eqWWJ691lMuYZJb8JUT/PeSIFX6Ir7daPci0PBjrHesSRzV1QsLVITYQC s3NH114Y/PtJwFtpBfn5ePzuda1pZcWp72etLjbtatC49sqURXuSSSEv4f16a0X0Z4FY oslzxadSRrLBfaQEF6ojeDhOzB7JW9UBNqREaDSY0I6KaAs07zEuzLti6c0lJPc3+Mwv u3/A== X-Gm-Message-State: APjAAAVW5ilrALn/gTZsmZREvgOWYTe9vS3UMNcNMy/tmqKXRRC8DwIo QFlFsQQUNS9a53AUvZThx9MJvYLO2KY= X-Received: by 2002:a1c:9c0c:: with SMTP id f12mr1836748wme.133.1571405222144; Fri, 18 Oct 2019 06:27:02 -0700 (PDT) Received: from localhost.localdomain (91-160-61-128.subs.proxad.net. 
[91.160.61.128]) by smtp.gmail.com with ESMTPSA id p15sm5870123wrs.94.2019.10.18.06.27.00 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Fri, 18 Oct 2019 06:27:00 -0700 (PDT) From: Vincent Guittot To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, parth@linux.ibm.com, riel@surriel.com, Vincent Guittot Subject: [PATCH v4 10/11] sched/fair: optimize find_idlest_group Date: Fri, 18 Oct 2019 15:26:37 +0200 Message-Id: <1571405198-27570-11-git-send-email-vincent.guittot@linaro.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> References: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org find_idlest_group() now reads CPU's load_avg in 2 different ways. Consolidate the function to read and use load_avg only once and simplify the algorithm to only look for the group with lowest load_avg. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 50 ++++++++++++++------------------------------------ 1 file changed, 14 insertions(+), 36 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 6203e71..ed1800d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5560,16 +5560,14 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, { struct sched_group *idlest = NULL, *group = sd->groups; struct sched_group *most_spare_sg = NULL; - unsigned long min_runnable_load = ULONG_MAX; - unsigned long this_runnable_load = ULONG_MAX; - unsigned long min_avg_load = ULONG_MAX, this_avg_load = ULONG_MAX; + unsigned long min_load = ULONG_MAX, this_load = ULONG_MAX; unsigned long most_spare = 0, this_spare = 0; int imbalance_scale = 100 + (sd->imbalance_pct-100)/2; unsigned long imbalance = scale_load_down(NICE_0_LOAD) * (sd->imbalance_pct-100) / 100; do { - unsigned long load, avg_load, runnable_load; + unsigned long load; unsigned long spare_cap, max_spare_cap; int local_group; int i; @@ -5586,15 +5584,11 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, * Tally up the load of all CPUs in the group and find * the group containing the CPU with most spare capacity. 
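For illustration, the simplified selection that this patch converges on can be modelled as below; the group data, the capacity scaling and the imbalance allowance are all invented stand-ins:

#include <limits.h>
#include <stdio.h>

/*
 * Toy model of the reworked flow: sum the per-CPU load over each
 * group, scale by the group capacity, then leave the local group only
 * when some other group is lighter by more than "imbalance".
 */
struct toy_group {
	const char *name;
	unsigned long load;	/* sum of per-CPU load in the group */
	unsigned long capacity;
	int local;
};

int main(void)
{
	struct toy_group groups[] = {
		{ .name = "local",  .load = 900, .capacity = 1024, .local = 1 },
		{ .name = "remote", .load = 700, .capacity = 1024, .local = 0 },
	};
	unsigned long imbalance = 128;	/* stands in for NICE_0_LOAD *
					 * (imbalance_pct - 100) / 100 */
	unsigned long min_load = ULONG_MAX, this_load = ULONG_MAX;
	int i, idlest = -1;

	for (i = 0; i < 2; i++) {
		/* adjust by relative group capacity, as the hunk does */
		unsigned long load = groups[i].load * 1024 / groups[i].capacity;

		if (groups[i].local)
			this_load = load;
		else if (load < min_load) {
			min_load = load;
			idlest = i;
		}
	}

	if (idlest >= 0 && min_load + imbalance < this_load)
		printf("place the waking task in %s\n", groups[idlest].name);
	else
		printf("keep the waking task in the local group\n");

	return 0;
}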
*/ - avg_load = 0; - runnable_load = 0; + load = 0; max_spare_cap = 0; for_each_cpu(i, sched_group_span(group)) { - load = cpu_load(cpu_rq(i)); - runnable_load += load; - - avg_load += cfs_rq_load_avg(&cpu_rq(i)->cfs); + load += cpu_load(cpu_rq(i)); spare_cap = capacity_spare_without(i, p); @@ -5603,31 +5597,15 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, } /* Adjust by relative CPU capacity of the group */ - avg_load = (avg_load * SCHED_CAPACITY_SCALE) / - group->sgc->capacity; - runnable_load = (runnable_load * SCHED_CAPACITY_SCALE) / + load = (load * SCHED_CAPACITY_SCALE) / group->sgc->capacity; if (local_group) { - this_runnable_load = runnable_load; - this_avg_load = avg_load; + this_load = load; this_spare = max_spare_cap; } else { - if (min_runnable_load > (runnable_load + imbalance)) { - /* - * The runnable load is significantly smaller - * so we can pick this new CPU: - */ - min_runnable_load = runnable_load; - min_avg_load = avg_load; - idlest = group; - } else if ((runnable_load < (min_runnable_load + imbalance)) && - (100*min_avg_load > imbalance_scale*avg_load)) { - /* - * The runnable loads are close so take the - * blocked load into account through avg_load: - */ - min_avg_load = avg_load; + if (load < min_load) { + min_load = load; idlest = group; } @@ -5668,18 +5646,18 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, * local domain to be very lightly loaded relative to the remote * domains but "imbalance" skews the comparison making remote CPUs * look much more favourable. When considering cross-domain, add - * imbalance to the runnable load on the remote node and consider - * staying local. + * imbalance to the load on the remote node and consider staying + * local. */ if ((sd->flags & SD_NUMA) && - min_runnable_load + imbalance >= this_runnable_load) + min_load + imbalance >= this_load) return NULL; - if (min_runnable_load > (this_runnable_load + imbalance)) + if (min_load >= this_load + imbalance) return NULL; - if ((this_runnable_load < (min_runnable_load + imbalance)) && - (100*this_avg_load < imbalance_scale*min_avg_load)) + if ((this_load < (min_load + imbalance)) && + (100*this_load < imbalance_scale*min_load)) return NULL; return idlest; From patchwork Fri Oct 18 13:26:38 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vincent Guittot X-Patchwork-Id: 176838 Delivered-To: patch@linaro.org Received: by 2002:a92:7e96:0:0:0:0:0 with SMTP id q22csp861733ill; Fri, 18 Oct 2019 06:27:12 -0700 (PDT) X-Google-Smtp-Source: APXvYqwDxX21N7bLAr2ar3OyRzBMjW16xGm+rjG3sjUlzk8k4PzCLVJ18GvgpqK8TC2L5QE6ya5o X-Received: by 2002:aa7:d758:: with SMTP id a24mr9816276eds.194.1571405232853; Fri, 18 Oct 2019 06:27:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1571405232; cv=none; d=google.com; s=arc-20160816; b=DS6kg92IoTGQnfas77fVdrft+Apm0asHmT+9ZSVEd38imAF5DFxxVbEoZFrZcZATn3 5+JVZWDo5dUdizluzSMLqMZujv+icBcf3budp0PhiSW71S3YLR8Ss2ewEpF9vNzhrROA vOoRwLK5Xd9WCnqTXmXHpL5Agcjrh82r29WYYzzip0AJw+Q+a5D52mILm2YT1nn3eEs1 Urm9d3dtnsaEzg4HF5kvRdPfaAa6wPEG/ape+8PHYx7ge+gP8Ybq3Z6m/DlrsXoOJDiq kOdrwglxGIkjB7wnWL+oO3t7V9jEFAOCu4943ISeTHWe2L3RLFVGiwO+RkwE8M580kkZ U4og== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature; bh=YYd+mUbPd3e33Spb/1ehWk7tN88cH6YOPs4EY/LiqPc=; b=yTLvc6t/0TE9lt98mrwJ3Js3EwKXQ9EYzYl0dtlA+F4lhyWN0gk6bKkNgbwon1xXUL 
From patchwork Fri Oct 18 13:26:38 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vincent Guittot X-Patchwork-Id: 176838 Delivered-To: patch@linaro.org From: Vincent Guittot To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, parth@linux.ibm.com, riel@surriel.com, Vincent Guittot Subject: [PATCH v4 11/11] sched/fair: rework find_idlest_group Date: Fri, 18 Oct 2019 15:26:38 +0200 Message-Id: <1571405198-27570-12-git-send-email-vincent.guittot@linaro.org> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> References: <1571405198-27570-1-git-send-email-vincent.guittot@linaro.org> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The slow wake-up path computes per-sched_group statistics to select the idlest group, which is quite similar to what load_balance() does when selecting the busiest group. Rework find_idlest_group() to classify the sched_groups and select the idlest one following the same steps as load_balance(). Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 384 ++++++++++++++++++++++++++++++++++------------------ 1 file changed, 256 insertions(+), 128 deletions(-) -- 2.7.4 Reviewed-by: Valentin Schneider diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index ed1800d..fbaafae 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5541,127 +5541,9 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, return target; } -static unsigned long cpu_util_without(int cpu, struct task_struct *p); - -static unsigned long capacity_spare_without(int cpu, struct task_struct *p) -{ - return max_t(long, capacity_of(cpu) - cpu_util_without(cpu, p), 0); -} - -/* - * find_idlest_group finds and returns the least busy CPU group within the - * domain. - * - * Assumes p is allowed on at least one CPU in sd. - */ static struct sched_group * find_idlest_group(struct sched_domain *sd, struct task_struct *p, - int this_cpu, int sd_flag) -{ - struct sched_group *idlest = NULL, *group = sd->groups; - struct sched_group *most_spare_sg = NULL; - unsigned long min_load = ULONG_MAX, this_load = ULONG_MAX; - unsigned long most_spare = 0, this_spare = 0; - int imbalance_scale = 100 + (sd->imbalance_pct-100)/2; - unsigned long imbalance = scale_load_down(NICE_0_LOAD) * - (sd->imbalance_pct-100) / 100; - - do { - unsigned long load; - unsigned long spare_cap, max_spare_cap; - int local_group; - int i; - - /* Skip over this group if it has no CPUs allowed */ - if (!cpumask_intersects(sched_group_span(group), - p->cpus_ptr)) - continue; - - local_group = cpumask_test_cpu(this_cpu, - sched_group_span(group)); - - /* - * Tally up the load of all CPUs in the group and find - * the group containing the CPU with most spare capacity.
- */ - load = 0; - max_spare_cap = 0; - - for_each_cpu(i, sched_group_span(group)) { - load += cpu_load(cpu_rq(i)); - - spare_cap = capacity_spare_without(i, p); - - if (spare_cap > max_spare_cap) - max_spare_cap = spare_cap; - } - - /* Adjust by relative CPU capacity of the group */ - load = (load * SCHED_CAPACITY_SCALE) / - group->sgc->capacity; - - if (local_group) { - this_load = load; - this_spare = max_spare_cap; - } else { - if (load < min_load) { - min_load = load; - idlest = group; - } - - if (most_spare < max_spare_cap) { - most_spare = max_spare_cap; - most_spare_sg = group; - } - } - } while (group = group->next, group != sd->groups); - - /* - * The cross-over point between using spare capacity or least load - * is too conservative for high utilization tasks on partially - * utilized systems if we require spare_capacity > task_util(p), - * so we allow for some task stuffing by using - * spare_capacity > task_util(p)/2. - * - * Spare capacity can't be used for fork because the utilization has - * not been set yet, we must first select a rq to compute the initial - * utilization. - */ - if (sd_flag & SD_BALANCE_FORK) - goto skip_spare; - - if (this_spare > task_util(p) / 2 && - imbalance_scale*this_spare > 100*most_spare) - return NULL; - - if (most_spare > task_util(p) / 2) - return most_spare_sg; - -skip_spare: - if (!idlest) - return NULL; - - /* - * When comparing groups across NUMA domains, it's possible for the - * local domain to be very lightly loaded relative to the remote - * domains but "imbalance" skews the comparison making remote CPUs - * look much more favourable. When considering cross-domain, add - * imbalance to the load on the remote node and consider staying - * local. - */ - if ((sd->flags & SD_NUMA) && - min_load + imbalance >= this_load) - return NULL; - - if (min_load >= this_load + imbalance) - return NULL; - - if ((this_load < (min_load + imbalance)) && - (100*this_load < imbalance_scale*min_load)) - return NULL; - - return idlest; -} + int this_cpu, int sd_flag); /* * find_idlest_group_cpu - find the idlest CPU among the CPUs in the group. @@ -5734,7 +5616,7 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p return prev_cpu; /* - * We need task's util for capacity_spare_without, sync it up to + * We need task's util for cpu_util_without, sync it up to * prev_cpu's last_update_time. */ if (!(sd_flag & SD_BALANCE_FORK)) @@ -7915,13 +7797,13 @@ static inline int sg_imbalanced(struct sched_group *group) * any benefit for the load balance. */ static inline bool -group_has_capacity(struct lb_env *env, struct sg_lb_stats *sgs) +group_has_capacity(unsigned int imbalance_pct, struct sg_lb_stats *sgs) { if (sgs->sum_nr_running < sgs->group_weight) return true; if ((sgs->group_capacity * 100) > - (sgs->group_util * env->sd->imbalance_pct)) + (sgs->group_util * imbalance_pct)) return true; return false; @@ -7936,13 +7818,13 @@ group_has_capacity(struct lb_env *env, struct sg_lb_stats *sgs) * false. 
*/ static inline bool -group_is_overloaded(struct lb_env *env, struct sg_lb_stats *sgs) +group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs) { if (sgs->sum_nr_running <= sgs->group_weight) return false; if ((sgs->group_capacity * 100) < - (sgs->group_util * env->sd->imbalance_pct) + (sgs->group_util * imbalance_pct)) return true; return false; @@ -7969,11 +7851,11 @@ group_smaller_max_cpu_capacity(struct sched_group *sg, struct sched_group *ref) } static inline enum -group_type group_classify(struct lb_env *env, +group_type group_classify(unsigned int imbalance_pct, struct sched_group *group, struct sg_lb_stats *sgs) { - if (group_is_overloaded(env, sgs)) + if (group_is_overloaded(imbalance_pct, sgs)) return group_overloaded; if (sg_imbalanced(group)) @@ -7985,7 +7867,7 @@ group_type group_classify(struct lb_env *env, if (sgs->group_misfit_task_load) return group_misfit_task; - if (!group_has_capacity(env, sgs)) + if (!group_has_capacity(imbalance_pct, sgs)) return group_fully_busy; return group_has_spare; @@ -8086,7 +7968,7 @@ static inline void update_sg_lb_stats(struct lb_env *env, sgs->group_weight = group->group_weight; - sgs->group_type = group_classify(env, group, sgs); + sgs->group_type = group_classify(env->sd->imbalance_pct, group, sgs); /* Computing avg_load makes sense only when group is overloaded */ if (sgs->group_type == group_overloaded) @@ -8241,6 +8123,252 @@ static inline enum fbq_type fbq_classify_rq(struct rq *rq) } #endif /* CONFIG_NUMA_BALANCING */ + +struct sg_lb_stats; + +/* + * update_sg_wakeup_stats - Update sched_group's statistics for wakeup. + * @sd: The sched_domain level to look for the idlest group. + * @group: sched_group whose statistics are to be updated. + * @sgs: variable to hold the statistics for this group. + */ +static inline void update_sg_wakeup_stats(struct sched_domain *sd, + struct sched_group *group, + struct sg_lb_stats *sgs, + struct task_struct *p) +{ + int i, nr_running; + + memset(sgs, 0, sizeof(*sgs)); + + for_each_cpu(i, sched_group_span(group)) { + struct rq *rq = cpu_rq(i); + + sgs->group_load += cpu_load(rq); + sgs->group_util += cpu_util_without(i, p); + sgs->sum_h_nr_running += rq->cfs.h_nr_running; + + nr_running = rq->nr_running; + sgs->sum_nr_running += nr_running; + + /* + * No need to call idle_cpu() if nr_running is not 0 + */ + if (!nr_running && idle_cpu(i)) + sgs->idle_cpus++; + + + } + + /* Check if task fits in the group */ + if (sd->flags & SD_ASYM_CPUCAPACITY && + !task_fits_capacity(p, group->sgc->max_capacity)) { + sgs->group_misfit_task_load = 1; + } + + sgs->group_capacity = group->sgc->capacity; + + sgs->group_type = group_classify(sd->imbalance_pct, group, sgs); + + /* + * Computing avg_load makes sense only when group is fully busy or + * overloaded + */ + if (sgs->group_type < group_fully_busy) + sgs->avg_load = (sgs->group_load * SCHED_CAPACITY_SCALE) / + sgs->group_capacity; +} + +static bool update_pick_idlest(struct sched_group *idlest, + struct sg_lb_stats *idlest_sgs, + struct sched_group *group, + struct sg_lb_stats *sgs) +{ + if (sgs->group_type < idlest_sgs->group_type) + return true; + + if (sgs->group_type > idlest_sgs->group_type) + return false; + + /* + * The candidate and the current idlest group are the same type of + * group. Let's check which one is the idlest according to the type. + */ + + switch (sgs->group_type) { + case group_overloaded: + case group_fully_busy: + /* Select the group with lowest avg_load.
*/ + if (idlest_sgs->avg_load <= sgs->avg_load) + return false; + break; + + case group_imbalanced: + case group_asym_packing: + /* Those types are not used in the slow wakeup path */ + return false; + + case group_misfit_task: + /* Select group with the highest max capacity */ + if (idlest->sgc->max_capacity >= group->sgc->max_capacity) + return false; + break; + + case group_has_spare: + /* Select group with most idle CPUs */ + if (idlest_sgs->idle_cpus >= sgs->idle_cpus) + return false; + break; + } + + return true; +} + +/* + * find_idlest_group finds and returns the least busy CPU group within the + * domain. + * + * Assumes p is allowed on at least one CPU in sd. + */ +static struct sched_group * +find_idlest_group(struct sched_domain *sd, struct task_struct *p, + int this_cpu, int sd_flag) +{ + struct sched_group *idlest = NULL, *local = NULL, *group = sd->groups; + struct sg_lb_stats local_sgs, tmp_sgs; + struct sg_lb_stats *sgs; + unsigned long imbalance; + struct sg_lb_stats idlest_sgs = { + .avg_load = UINT_MAX, + .group_type = group_overloaded, + }; + + imbalance = scale_load_down(NICE_0_LOAD) * + (sd->imbalance_pct-100) / 100; + + do { + int local_group; + + /* Skip over this group if it has no CPUs allowed */ + if (!cpumask_intersects(sched_group_span(group), + p->cpus_ptr)) + continue; + + local_group = cpumask_test_cpu(this_cpu, + sched_group_span(group)); + + if (local_group) { + sgs = &local_sgs; + local = group; + } else { + sgs = &tmp_sgs; + } + + update_sg_wakeup_stats(sd, group, sgs, p); + + if (!local_group && update_pick_idlest(idlest, &idlest_sgs, group, sgs)) { + idlest = group; + idlest_sgs = *sgs; + } + + } while (group = group->next, group != sd->groups); + + + /* There is no idlest group to push tasks to */ + if (!idlest) + return NULL; + + /* + * If the local group is idler than the selected idlest group + * don't try and push the task. + */ + if (local_sgs.group_type < idlest_sgs.group_type) + return NULL; + + /* + * If the local group is busier than the selected idlest group + * try and push the task. + */ + if (local_sgs.group_type > idlest_sgs.group_type) + return idlest; + + switch (local_sgs.group_type) { + case group_overloaded: + case group_fully_busy: + /* + * When comparing groups across NUMA domains, it's possible for + * the local domain to be very lightly loaded relative to the + * remote domains but "imbalance" skews the comparison making + * remote CPUs look much more favourable. When considering + * cross-domain, add imbalance to the load on the remote node + * and consider staying local. + */ + + if ((sd->flags & SD_NUMA) && + ((idlest_sgs.avg_load + imbalance) >= local_sgs.avg_load)) + return NULL; + + /* + * If the local group is less loaded than the selected + * idlest group don't try and push any tasks. 
+ */ + if (idlest_sgs.avg_load >= (local_sgs.avg_load + imbalance)) + return NULL; + + if (100 * local_sgs.avg_load <= sd->imbalance_pct * idlest_sgs.avg_load) + return NULL; + break; + + case group_imbalanced: + case group_asym_packing: + /* Those types are not used in the slow wakeup path */ + return NULL; + + case group_misfit_task: + /* Select group with the highest max capacity */ + if (local->sgc->max_capacity >= idlest->sgc->max_capacity) + return NULL; + break; + + case group_has_spare: + if (sd->flags & SD_NUMA) { +#ifdef CONFIG_NUMA_BALANCING + int idlest_cpu; + /* + * If there is spare capacity at NUMA, try to select + * the preferred node + */ + if (cpu_to_node(this_cpu) == p->numa_preferred_nid) + return NULL; + + idlest_cpu = cpumask_first(sched_group_span(idlest)); + if (cpu_to_node(idlest_cpu) == p->numa_preferred_nid) + return idlest; +#endif + /* + * Otherwise, keep the task on this node to stay close to + * its wakeup source and improve locality. If there is + * a real need for migration, periodic load balance will + * take care of it. + */ + if (local_sgs.idle_cpus) + return NULL; + } + + /* + * Select the group with the highest number of idle CPUs. We could + * also compare the utilization, which is more stable, but a group + * may have less spare capacity yet more idle CPUs, which means + * more opportunities to run a task. + */ + if (local_sgs.idle_cpus >= idlest_sgs.idle_cpus) + return NULL; + break; + } + + return idlest; +} + /** * update_sd_lb_stats - Update sched_domain's statistics for load balancing. * @env: The load balancing environment.