From patchwork Tue Sep  6 13:01:27 2016
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 75507
Date: Tue, 6 Sep 2016 15:01:27 +0200
From: Vincent Guittot
To: Mike Galbraith
Cc: Peter Zijlstra, LKML, Rik van Riel
Subject: Re: [v2 patch v3.18+ regression fix] sched: Further improve
 spurious CPU_IDLE active migrations
Message-ID: <20160906130127.GA21960@vingu-laptop>
References: <1472535775.3960.3.camel@suse.de>
 <20160831100117.GV10121@twins.programming.kicks-ass.net>
 <1472638699.3942.14.camel@suse.de>
 <1472639782.3942.27.camel@gmail.com>
 <1472703062.3979.60.camel@gmail.com>
 <1473092813.4412.6.camel@gmail.com>
In-Reply-To: <1473092813.4412.6.camel@gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Mailing-List: linux-kernel@vger.kernel.org

On Monday 05 Sep 2016 at 18:26:53 (+0200), Mike Galbraith wrote:
> Coming back to this, how about this instead: only increase the group
> imbalance threshold when sd_llc_size == 2. Newer L3-equipped
> processors are then unaffected.
>

Not sure that all systems with sd_llc_size == 2 want this behavior.
Why not add a sched_feature to change the 2nd half of the test for some
systems? Something like below (usage notes follow after the patch):

---
 kernel/sched/fair.c     | 11 ++++++++---
 kernel/sched/features.h |  7 +++++++
 2 files changed, 15 insertions(+), 3 deletions(-)

--

> 43f4d666 partially cured spurious migrations, but when there are
> completely idle groups on a lightly loaded processor, and there is
> a buddy pair occupying the busiest group, we will not attempt to
> migrate due to select_idle_sibling() buddy placement, leaving the
> busiest queue with one task.  We skip balancing, but increment
> nr_balance_failed until we kick active balancing, and bounce a
> buddy pair endlessly, demolishing throughput.
>
> Increase the group imbalance threshold to two when sd_llc_size == 2 to
> allow buddies to share L2 without affecting larger L3 processors.
>
> Regression detected on an X5472 box, which has 4 MC groups of 2 cores.
>
> netperf -l 60 -H 127.0.0.1 -t UDP_STREAM -i5,1 -I 95,5
>
> pre:
> !!! WARNING
> !!! Desired confidence was not achieved within the specified iterations.
> !!! This implies that there was variability in the test environment that
> !!! must be investigated before going further.
> !!! Confidence intervals: Throughput      : 66.421%
> !!!                       Local CPU util  :  0.000%
> !!!                       Remote CPU util :  0.000%
>
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
>
> 212992   65507   60.00     1779143      0    15539.49
> 212992           60.00     1773551           15490.65
>
> post:
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
>
> 212992   65507   60.00     3719377      0    32486.01
> 212992           60.00     3717492           32469.54
>
> Signed-off-by: Mike Galbraith
> Fixes: caeb178c ("sched/fair: Make update_sd_pick_busiest() return 'true' on a busier sd")
> Cc: # v3.18+
> ---
>  kernel/sched/fair.c |   17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7249,12 +7249,19 @@ static struct sched_group *find_busiest_
>                   * This cpu is idle. If the busiest group is not overloaded
>                   * and there is no imbalance between this and busiest group
>                   * wrt idle cpus, it is balanced. The imbalance becomes
> -                 * significant if the diff is greater than 1 otherwise we
> -                 * might end up to just move the imbalance on another group
> +                 * significant if the diff is greater than 1 for most CPUs,
> +                 * or 2 for older CPUs having multiple groups of 2 cores
> +                 * sharing an L2, otherwise we may end up uselessly moving
> +                 * the imbalance to another group, or starting a tug of war
> +                 * with idle L2 groups constantly ripping communicating
> +                 * tasks apart, and no L3 to mitigate the cache miss pain.
>                   */
> -                if ((busiest->group_type != group_overloaded) &&
> -                    (local->idle_cpus <= (busiest->idle_cpus + 1)))
> -                        goto out_balanced;
> +                if (busiest->group_type != group_overloaded) {
> +                        int imbalance = __this_cpu_read(sd_llc_size) == 2 ? 2 : 1;
> +
> +                        if (local->idle_cpus <= busiest->idle_cpus + imbalance)
> +                                goto out_balanced;
> +                }
>          } else {
>                  /*
>                   * In the CPU_NEWLY_IDLE, CPU_NOT_IDLE cases, use

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4185e0a..65c9363 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7395,9 +7395,14 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
                  * significant if the diff is greater than 1 otherwise we
                  * might end up to just move the imbalance on another group
                  */
-                if ((busiest->group_type != group_overloaded) &&
-                    (local->idle_cpus <= (busiest->idle_cpus + 1)))
-                        goto out_balanced;
+                if (busiest->group_type != group_overloaded) {
+                        int imbalance = 1;
+                        if (!sched_feat(BALANCE_IDLE_CPUS))
+                                imbalance = __this_cpu_read(sd_llc_size);
+
+                        if (local->idle_cpus <= busiest->idle_cpus + imbalance)
+                                goto out_balanced;
+                }
         } else {
                 /*
                  * In the CPU_NEWLY_IDLE, CPU_NOT_IDLE cases, use
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 69631fa..16c34ec 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -69,3 +69,10 @@ SCHED_FEAT(RT_RUNTIME_SHARE, true)
 SCHED_FEAT(LB_MIN, false)
 
 SCHED_FEAT(ATTACH_AGE_LOAD, true)
+/*
+ * Try to balance the number of idle CPUs in each group to minimize
+ * contention on shared resources. Some older systems without an L3
+ * seem to prefer sharing resources to minimize migration between groups.
+ */
+SCHED_FEAT(BALANCE_IDLE_CPUS, true)
+
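
Usage note 1: a minimal sketch of how a SCHED_FEAT() knob like the
proposed BALANCE_IDLE_CPUS would be exercised at runtime, assuming a
kernel built with CONFIG_SCHED_DEBUG=y and debugfs mounted at
/sys/kernel/debug (the BALANCE_IDLE_CPUS name only exists with the
patch above applied):

  # show the current scheduler feature flags
  cat /sys/kernel/debug/sched_features

  # disable strict idle-CPU balancing (fall back to imbalance = sd_llc_size)
  echo NO_BALANCE_IDLE_CPUS > /sys/kernel/debug/sched_features

  # re-enable it (the default in this patch, imbalance = 1)
  echo BALANCE_IDLE_CPUS > /sys/kernel/debug/sched_features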
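
Usage note 2: since the behavior hinges on sd_llc_size (the number of
CPUs sharing a last-level cache), here is a sketch using the standard
sysfs cache-topology files to check what a given box looks like. On an
X5472-like part the highest cache index is an L2 shared by a core pair,
while on L3-equipped parts it typically spans the whole socket:

  # print level, type and sharers of each cache seen by CPU 0
  for d in /sys/devices/system/cpu/cpu0/cache/index*; do
          echo "L$(cat $d/level) $(cat $d/type): shared by CPUs $(cat $d/shared_cpu_list)"
  done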