From patchwork Mon Dec 5 09:27:36 2016
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 86522
Date: Mon, 5 Dec 2016 10:27:36 +0100
From: Vincent Guittot
To: Matt Fleming
Cc: Brendan Gregg, Peter Zijlstra, Ingo Molnar, LKML,
	Morten.Rasmussen@arm.com, dietmar.eggemann@arm.com,
	kernellwp@gmail.com, yuyang.du@intel.com,
	umgwanakikbuti@gmail.com, Mel Gorman
Subject: Re: [PATCH 2/2 v2] sched: use load_avg for selecting idlest group
Message-ID: <20161205092735.GA9161@linaro.org>
References: <1480088073-11642-1-git-send-email-vincent.guittot@linaro.org>
 <1480088073-11642-3-git-send-email-vincent.guittot@linaro.org>
 <20161203214707.GI20785@codeblueprint.co.uk>
In-Reply-To: <20161203214707.GI20785@codeblueprint.co.uk>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-Mailing-List: linux-kernel@vger.kernel.org

On Saturday 03 Dec 2016 at 21:47:07
(+0000), Matt Fleming wrote:
> On Fri, 02 Dec, at 07:31:04PM, Brendan Gregg wrote:
> >
> > For background, is this from the "A decade of wasted cores" paper's
> > patches?
>
> No, this patch fixes an issue I originally reported here,
>
>   https://lkml.kernel.org/r/20160923115808.2330-1-matt@codeblueprint.co.uk
>
> Essentially, if you have an idle or partially-idle system and a
> workload that consists of fork()'ing a bunch of tasks, where each of
> those tasks immediately sleeps waiting for some wakeup, then those
> tasks aren't spread across all idle CPUs very well.
>
> We saw this issue when running hackbench with a small loop count, such
> that the actual benchmark setup (fork()'ing) is where the majority of
> the runtime is spent.
>
> In that scenario, there's a large potential/blocked load, but
> essentially no runnable load, and the balance-on-fork scheduler code
> only cares about runnable load without Vincent's patch applied.
>
> The closest thing I can find in the "A decade of wasted cores" paper
> is "The Overload-on-Wakeup bug", but I don't think that's the issue
> here since,
>
>  a) We're balancing on fork, not wakeup
>  b) The balance-on-fork code balances across nodes OK
>
> > What's the expected typical gain? Thanks,
>
> The results are still coming back from the SUSE performance test grid
> but they do show that this patch is mainly a win for multi-socket
> machines with more than 8 cores or thereabouts.
>
> [ Vincent, I'll follow up to your PATCH 1/2 with the results that are
>   specifically for that patch ]
>
> Assuming a fork-intensive or fork-dominated workload, and a
> multi-socket machine, such as this 2-socket NUMA box with 12 cores
> per socket and HT enabled (48 cpus), we saw a very clear win between
> +10% and +15% for processes communicating via pipes,
>
>   (1) tip-sched        = tip/sched/core branch
>   (2) fix-fig-for-fork = (1) + PATCH 1/2
>   (3) fix-sig          = (1) + (2) + PATCH 2/2
>
> hackbench-process-pipes
>                       4.9.0-rc6             4.9.0-rc6             4.9.0-rc6
>                       tip-sched      fix-fig-for-fork               fix-sig
> Amean    1      0.0717 (  0.00%)      0.0696 (  2.99%)      0.0730 ( -1.79%)
> Amean    4      0.1244 (  0.00%)      0.1200 (  3.56%)      0.1190 (  4.36%)
> Amean    7      0.1891 (  0.00%)      0.1937 ( -2.42%)      0.1831 (  3.17%)
> Amean    12     0.2964 (  0.00%)      0.3116 ( -5.11%)      0.2784 (  6.07%)
> Amean    21     0.4011 (  0.00%)      0.4090 ( -1.96%)      0.3574 ( 10.90%)
> Amean    30     0.4944 (  0.00%)      0.4654 (  5.87%)      0.4171 ( 15.63%)
> Amean    48     0.6113 (  0.00%)      0.6309 ( -3.20%)      0.5331 ( 12.78%)
> Amean    79     0.8616 (  0.00%)      0.8706 ( -1.04%)      0.7710 ( 10.51%)
> Amean    110    1.1304 (  0.00%)      1.2211 ( -8.02%)      1.0163 ( 10.10%)
> Amean    141    1.3754 (  0.00%)      1.4279 ( -3.81%)      1.2803 (  6.92%)
> Amean    172    1.6217 (  0.00%)      1.7367 ( -7.09%)      1.5363 (  5.27%)
> Amean    192    1.7809 (  0.00%)      2.0199 (-13.42%)      1.7129 (  3.82%)
>
> Things look even better when using threads and pipes, with wins
> between 11% and 29% when looking at results outside of the noise,
>
> hackbench-thread-pipes
>                       4.9.0-rc6             4.9.0-rc6             4.9.0-rc6
>                       tip-sched      fix-fig-for-fork               fix-sig
> Amean    1      0.0736 (  0.00%)      0.0794 ( -7.96%)      0.0779 ( -5.83%)
> Amean    4      0.1709 (  0.00%)      0.1690 (  1.09%)      0.1663 (  2.68%)
> Amean    7      0.2836 (  0.00%)      0.3080 ( -8.61%)      0.2640 (  6.90%)
> Amean    12     0.4393 (  0.00%)      0.4843 (-10.24%)      0.4090 (  6.89%)
> Amean    21     0.5821 (  0.00%)      0.6369 ( -9.40%)      0.5126 ( 11.95%)
> Amean    30     0.6557 (  0.00%)      0.6459 (  1.50%)      0.5711 ( 12.90%)
> Amean    48     0.7924 (  0.00%)      0.7760 (  2.07%)      0.6286 ( 20.68%)
> Amean    79     1.0534 (  0.00%)      1.0551 ( -0.16%)      0.8481 ( 19.49%)
> Amean    110    1.5286 (  0.00%)      1.4504 (  5.11%)      1.1121 ( 27.24%)
> Amean    141    1.9507 (  0.00%)      1.7790 (  8.80%)      1.3804 ( 29.23%)
> Amean    172    2.2261 (  0.00%)      2.3330 ( -4.80%)      1.6336 ( 26.62%)
> Amean    192    2.3753 (  0.00%)      2.3307 (  1.88%)      1.8246 ( 23.19%)
>
> Somewhat surprisingly, I can see improvements for UMA machines with
> fewer cores when the workload heavily saturates the machine and the
> workload isn't dominated by fork. Such heavy saturation isn't super
> realistic, but still interesting. I haven't dug into why these results
> occurred, but I am happy things didn't instead fall off a cliff.
>
> Here's a 4-cpu UMA box showing some improvement at the higher end,
>
> hackbench-process-pipes
>                       4.9.0-rc6             4.9.0-rc6             4.9.0-rc6
>                       tip-sched      fix-fig-for-fork               fix-sig
> Amean    1      3.5060 (  0.00%)      3.5747 ( -1.96%)      3.5117 ( -0.16%)
> Amean    3      7.7113 (  0.00%)      7.8160 ( -1.36%)      7.7747 ( -0.82%)
> Amean    5     11.4453 (  0.00%)     11.5710 ( -1.10%)     11.3870 (  0.51%)
> Amean    7     15.3147 (  0.00%)     15.9420 ( -4.10%)     15.8450 ( -3.46%)
> Amean    12    25.5110 (  0.00%)     24.3410 (  4.59%)     22.6717 ( 11.13%)
> Amean    16    32.3010 (  0.00%)     28.5897 ( 11.49%)     25.7473 ( 20.29%)

Hi Matt,

Thanks for the results.

During the review, Morten pointed out that the test condition
(100*this_avg_load < imbalance_scale*min_avg_load) makes more sense
than (100*min_avg_load > imbalance_scale*this_avg_load), but I see
lower performance with that change. Could you run tests with the
change below on top of the patchset?
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e8d1ae7..0129fbb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5514,7 +5514,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 	if (!idlest ||
 	    (min_runnable_load > (this_runnable_load + imbalance)) ||
 	    ((this_runnable_load < (min_runnable_load + imbalance)) &&
-	     (100*min_avg_load > imbalance_scale*this_avg_load)))
+	     (100*this_avg_load < imbalance_scale*min_avg_load)))
 		return NULL;
 	return idlest;
 }
--
2.7.4