From patchwork Wed Apr 6 03:14:17 2016
X-Patchwork-Submitter: Alex Shi
X-Patchwork-Id: 65149
From: Alex Shi
To: Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
	x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)),
	linux-kernel@vger.kernel.org (open list:X86 ARCHITECTURE (32-BIT AND 64-BIT))
Cc: Alex Shi, Andrew Morton, Andy Lutomirski, Rik van Riel
Subject: [REF PATCH] x86/tlb: just do tlb flush on one of siblings of SMT
Date: Wed, 6 Apr 2016 11:14:17 +0800
Message-Id: <1459912457-5630-1-git-send-email-alex.shi@linaro.org>
X-Mailer: git-send-email 2.7.2.333.g70bd996

It seems Intel cores share the TLB pool between SMT siblings, so
flushing the TLB on both threads just causes an extra, useless IPI
and an extra flush. The extra flush also pushes out TLB entries that
the other sibling has just loaded. That's a double waste.

In micro testing, memory access saves about 25% time on my Haswell
i7 desktop. The munmap test source code is here:
https://lkml.org/lkml/2012/5/17/59

Test result on kernel v4.5.0:

$ /home/alexs/bin/perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses,dTLB-stores,iTLB-load-misses,iTLB-loads -e tlb:tlb_flush munmap -n 64 -t 16
munmap use 57ms 14072ns/time, memory access uses 48356 times/thread/ms, cost 20ns/time

 Performance counter stats for '/home/alexs/backups/exec-laptop/tlb/munmap -n 64 -t 16':

        18,739,808      dTLB-load-misses     #  2.47% of all dTLB cache hits   (43.05%)
       757,380,911      dTLB-loads                                             (34.34%)
         2,125,275      dTLB-store-misses                                      (32.23%)
       318,307,759      dTLB-stores                                            (46.32%)
            32,765      iTLB-load-misses     #  2.03% of all iTLB cache hits   (56.90%)
         1,616,237      iTLB-loads                                             (44.47%)
            41,476      tlb:tlb_flush

       1.443484546 seconds time elapsed

/proc/vmstat/nr_tlb_remote_flush increased: 4616
/proc/vmstat/nr_tlb_remote_flush_received increased: 32262

Test result on kernel v4.5.0 + this patch:

$ /home/alexs/bin/perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses,dTLB-stores,iTLB-load-misses,iTLB-loads -e tlb:tlb_flush munmap -n 64 -t 16
munmap use 48ms 11933ns/time, memory access uses 59966 times/thread/ms, cost 16ns/time

 Performance counter stats for '/home/alexs/backups/exec-laptop/tlb/munmap -n 64 -t 16':

        15,984,772      dTLB-load-misses     #  1.89% of all dTLB cache hits   (41.72%)
       844,099,241      dTLB-loads                                             (33.30%)
         1,328,102      dTLB-store-misses                                      (52.13%)
       280,902,875      dTLB-stores                                            (52.03%)
            27,678      iTLB-load-misses     #  1.67% of all iTLB cache hits   (35.35%)
         1,659,550      iTLB-loads                                             (38.38%)
            25,137      tlb:tlb_flush

       1.428880301 seconds time elapsed

/proc/vmstat/nr_tlb_remote_flush increased: 4616
/proc/vmstat/nr_tlb_remote_flush_received increased: 15912

BTW, the TLB sharing this change relies on isn't architecturally
guaranteed.
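(Reviewer aid: the real munmap test is the one at the lkml link
above; the sketch below only illustrates that style of benchmark.
Its thread count, loop count and the 4KB mapping size are assumptions
made here, not the linked test's parameters.)

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

#define NTHREADS 16
#define NLOOPS   (64 * 1024)
#define MAPLEN   4096

/* Each thread repeatedly maps, touches and unmaps a page. The
 * munmap() forces TLB shootdown IPIs to the other CPUs running
 * this process, which is the path the patch trims on SMT. */
static void *worker(void *arg)
{
	for (long i = 0; i < NLOOPS; i++) {
		char *p = mmap(NULL, MAPLEN, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			exit(1);
		memset(p, 1, MAPLEN);	/* touch: populate the TLB */
		munmap(p, MAPLEN);	/* trigger flush_tlb_others() */
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("%.0f ns/loop\n",
	       ((t1.tv_sec - t0.tv_sec) * 1e9 +
		(t1.tv_nsec - t0.tv_nsec)) /
	       (double)(NTHREADS * NLOOPS));
	return 0;
}

Each munmap() above sends shootdown IPIs to every other CPU in the
process's mm_cpumask, which is why the tlb:tlb_flush counts in the
perf runs track this change so directly.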
Signed-off-by: Alex Shi
Cc: Andrew Morton
To: linux-kernel@vger.kernel.org
To: Mel Gorman
To: x86@kernel.org
To: "H. Peter Anvin"
To: Thomas Gleixner
Cc: Andy Lutomirski
Cc: Rik van Riel
Cc: Alex Shi
---
 arch/x86/mm/tlb.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

-- 
2.7.2.333.g70bd996

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 8f4cc3d..6510316 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -134,7 +134,10 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
 				 struct mm_struct *mm, unsigned long start,
 				 unsigned long end)
 {
+	int cpu;
 	struct flush_tlb_info info;
+	cpumask_t flush_mask, *sblmask;
+
 	info.flush_mm = mm;
 	info.flush_start = start;
 	info.flush_end = end;
@@ -151,7 +154,23 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
 					       &info, 1);
 		return;
 	}
-	smp_call_function_many(cpumask, flush_tlb_func, &info, 1);
+
+	if (unlikely(smp_num_siblings <= 1)) {
+		smp_call_function_many(cpumask, flush_tlb_func, &info, 1);
+		return;
+	}
+
+	/* Only one flush needed on both siblings of SMT */
+	cpumask_copy(&flush_mask, cpumask);
+	for_each_cpu(cpu, &flush_mask) {
+		sblmask = topology_sibling_cpumask(cpu);
+		if (!cpumask_subset(sblmask, &flush_mask))
+			continue;
+
+		cpumask_clear_cpu(cpumask_next(cpu, sblmask), &flush_mask);
+	}
+
+	smp_call_function_many(&flush_mask, flush_tlb_func, &info, 1);
 }
 
 void flush_tlb_current_task(void)
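(Reviewer-side note, not part of the patch: the sibling-dedup loop
can be modelled in userspace with plain bitmasks. The topology below
is hypothetical, 8 CPUs with siblings paired as (0,1), (2,3), (4,5),
(6,7), and like the patch's single cpumask_next() call it assumes
exactly two SMT siblings per core.)

#include <stdio.h>

#define NCPUS 8

/* Hypothetical topology: sibling pairs (0,1), (2,3), (4,5), (6,7),
 * standing in for topology_sibling_cpumask(). */
static unsigned long sibling_mask(int cpu)
{
	return 3UL << (cpu & ~1);
}

/* Mirror of the for_each_cpu() loop in native_flush_tlb_others():
 * when both siblings of a pair are in flush_mask, drop one of them,
 * since they share the TLB and one flush covers both. */
static unsigned long dedup_siblings(unsigned long flush_mask)
{
	for (int cpu = 0; cpu < NCPUS; cpu++) {
		unsigned long sbl = sibling_mask(cpu);

		if (!(flush_mask & (1UL << cpu)))
			continue;	/* already dropped, or not a target */
		if ((flush_mask & sbl) != sbl)
			continue;	/* sibling absent: cpu keeps its IPI */

		/* clear the other sibling, keep this (lower) cpu */
		flush_mask &= ~(sbl & ~(1UL << cpu));
	}
	return flush_mask;
}

int main(void)
{
	/* {0,1,2,4,5} -> {0,2,4}: pairs (0,1) and (4,5) collapse;
	 * CPU 2 keeps its IPI because sibling 3 is not in the mask. */
	printf("%#lx -> %#lx\n", 0x37UL, dedup_siblings(0x37UL));
	return 0;
}

Compiled with gcc -O2, this prints "0x37 -> 0x15". On a part with
more than two hardware threads per core, a single cpumask_next()
would drop only one of the extra siblings, so the remaining ones
would still receive the IPI and the saving would only be partial.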