From patchwork Wed Apr 6 03:14:17 2016
X-Patchwork-Submitter: Alex Shi
X-Patchwork-Id: 65149
From: Alex Shi
To: Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
	x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)),
	linux-kernel@vger.kernel.org (open list:X86 ARCHITECTURE (32-BIT AND 64-BIT))
Cc: Alex Shi, Andrew Morton, Andy Lutomirski, Rik van Riel
Subject: [REF PATCH] x86/tlb: just do tlb flush on one of siblings of SMT
Date: Wed, 6 Apr 2016 11:14:17 +0800
Message-Id: <1459912457-5630-1-git-send-email-alex.shi@linaro.org>
X-Mailer: git-send-email 2.7.2.333.g70bd996

It seems Intel cores share the TLB pool between SMT siblings, so
flushing the TLB on both threads just causes an extra, useless IPI
and an extra flush. The extra flush also pushes out TLB entries that
the other sibling has just loaded. That's a double waste.

In micro testing, memory access saves about 25% time on my Haswell
i7 desktop. The munmap test source code is here:
https://lkml.org/lkml/2012/5/17/59

Test result on kernel v4.5.0:

$ /home/alexs/bin/perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses,dTLB-stores,iTLB-load-misses,iTLB-loads -e tlb:tlb_flush munmap -n 64 -t 16
munmap use 57ms 14072ns/time, memory access uses 48356 times/thread/ms, cost 20ns/time

 Performance counter stats for '/home/alexs/backups/exec-laptop/tlb/munmap -n 64 -t 16':

        18,739,808      dTLB-load-misses     #  2.47% of all dTLB cache hits   (43.05%)
       757,380,911      dTLB-loads                                             (34.34%)
         2,125,275      dTLB-store-misses                                      (32.23%)
       318,307,759      dTLB-stores                                            (46.32%)
            32,765      iTLB-load-misses     #  2.03% of all iTLB cache hits   (56.90%)
         1,616,237      iTLB-loads                                             (44.47%)
            41,476      tlb:tlb_flush

       1.443484546 seconds time elapsed

/proc/vmstat/nr_tlb_remote_flush increased: 4616
/proc/vmstat/nr_tlb_remote_flush_received increased: 32262

Test result on kernel v4.5.0 + this patch:

$ /home/alexs/bin/perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses,dTLB-stores,iTLB-load-misses,iTLB-loads -e tlb:tlb_flush munmap -n 64 -t 16
munmap use 48ms 11933ns/time, memory access uses 59966 times/thread/ms, cost 16ns/time

 Performance counter stats for '/home/alexs/backups/exec-laptop/tlb/munmap -n 64 -t 16':

        15,984,772      dTLB-load-misses     #  1.89% of all dTLB cache hits   (41.72%)
       844,099,241      dTLB-loads                                             (33.30%)
         1,328,102      dTLB-store-misses                                      (52.13%)
       280,902,875      dTLB-stores                                            (52.03%)
            27,678      iTLB-load-misses     #  1.67% of all iTLB cache hits   (35.35%)
         1,659,550      iTLB-loads                                             (38.38%)
            25,137      tlb:tlb_flush

       1.428880301 seconds time elapsed

/proc/vmstat/nr_tlb_remote_flush increased: 4616
/proc/vmstat/nr_tlb_remote_flush_received increased: 15912

BTW, the TLB sharing this change relies on isn't architecturally
guaranteed.
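(Reviewer aid: the real munmap test is the one at the lkml link
above; the sketch below only illustrates that style of benchmark.
Its thread count, loop count and the 4KB mapping size are assumptions
made here, not the linked test's parameters.)

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

#define NTHREADS 16
#define NLOOPS   (64 * 1024)
#define MAPLEN   4096

/* Each thread repeatedly maps, touches and unmaps a page. The
 * munmap() forces TLB shootdown IPIs to the other CPUs running
 * this process, which is the path the patch trims on SMT. */
static void *worker(void *arg)
{
	for (long i = 0; i < NLOOPS; i++) {
		char *p = mmap(NULL, MAPLEN, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			exit(1);
		memset(p, 1, MAPLEN);	/* touch: populate the TLB */
		munmap(p, MAPLEN);	/* trigger flush_tlb_others() */
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("%.0f ns/loop\n",
	       ((t1.tv_sec - t0.tv_sec) * 1e9 +
		(t1.tv_nsec - t0.tv_nsec)) /
	       (double)(NTHREADS * NLOOPS));
	return 0;
}

Each munmap() above sends shootdown IPIs to every other CPU in the
process's mm_cpumask, which is why the tlb:tlb_flush counts in the
perf runs track this change so directly.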
Signed-off-by: Alex Shi
Cc: Andrew Morton
To: linux-kernel@vger.kernel.org
To: Mel Gorman
To: x86@kernel.org
To: "H. Peter Anvin"
To: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Rik van Riel
Cc: Alex Shi
---
 arch/x86/mm/tlb.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

-- 
2.7.2.333.g70bd996

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 8f4cc3d..6510316 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -134,7 +134,10 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
 				 struct mm_struct *mm, unsigned long start,
 				 unsigned long end)
 {
+	int cpu;
 	struct flush_tlb_info info;
+	cpumask_t flush_mask, *sblmask;
+
 	info.flush_mm = mm;
 	info.flush_start = start;
 	info.flush_end = end;
@@ -151,7 +154,23 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
 					       &info, 1);
 		return;
 	}
-	smp_call_function_many(cpumask, flush_tlb_func, &info, 1);
+
+	if (unlikely(smp_num_siblings <= 1)) {
+		smp_call_function_many(cpumask, flush_tlb_func, &info, 1);
+		return;
+	}
+
+	/* Only one flush needed on both siblings of SMT */
+	cpumask_copy(&flush_mask, cpumask);
+	for_each_cpu(cpu, &flush_mask) {
+		sblmask = topology_sibling_cpumask(cpu);
+		if (!cpumask_subset(sblmask, &flush_mask))
+			continue;
+
+		cpumask_clear_cpu(cpumask_next(cpu, sblmask), &flush_mask);
+	}
+
+	smp_call_function_many(&flush_mask, flush_tlb_func, &info, 1);
 }
 
 void flush_tlb_current_task(void)
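(Reviewer-side note, not part of the patch: the sibling-dedup loop
can be modelled in userspace with plain bitmasks. The topology below
is hypothetical, 8 CPUs with siblings paired as (0,1), (2,3), (4,5),
(6,7), and like the patch's single cpumask_next() call it assumes
exactly two SMT siblings per core.)

#include <stdio.h>

#define NCPUS 8

/* Hypothetical topology: sibling pairs (0,1), (2,3), (4,5), (6,7),
 * standing in for topology_sibling_cpumask(). */
static unsigned long sibling_mask(int cpu)
{
	return 3UL << (cpu & ~1);
}

/* Mirror of the for_each_cpu() loop in native_flush_tlb_others():
 * when both siblings of a pair are in flush_mask, drop one of them,
 * since they share the TLB and one flush covers both. */
static unsigned long dedup_siblings(unsigned long flush_mask)
{
	for (int cpu = 0; cpu < NCPUS; cpu++) {
		unsigned long sbl = sibling_mask(cpu);

		if (!(flush_mask & (1UL << cpu)))
			continue;	/* already dropped, or not a target */
		if ((flush_mask & sbl) != sbl)
			continue;	/* sibling absent: cpu keeps its IPI */

		/* clear the other sibling, keep this (lower) cpu */
		flush_mask &= ~(sbl & ~(1UL << cpu));
	}
	return flush_mask;
}

int main(void)
{
	/* {0,1,2,4,5} -> {0,2,4}: pairs (0,1) and (4,5) collapse;
	 * CPU 2 keeps its IPI because sibling 3 is not in the mask. */
	printf("%#lx -> %#lx\n", 0x37UL, dedup_siblings(0x37UL));
	return 0;
}

Compiled with gcc -O2, this prints "0x37 -> 0x15". On a part with
more than two hardware threads per core, a single cpumask_next()
would drop only one of the extra siblings, so the remaining ones
would still receive the IPI and the saving would only be partial.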