From patchwork Tue Jan 3 10:04:53 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mark Rutland
X-Patchwork-Id: 89626
Delivered-To: patch@linaro.org
Received: by 10.182.112.6 with SMTP id im6csp8249197obb;
        Tue, 3 Jan 2017 02:06:04 -0800 (PST)
X-Received: by 10.99.168.69 with SMTP id i5mr115706937pgp.10.1483437964668;
        Tue, 03 Jan 2017 02:06:04 -0800 (PST)
Return-Path:
Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67])
        by mx.google.com with ESMTP id 15si36798958pgg.226.2017.01.03.02.06.04;
        Tue, 03 Jan 2017 02:06:04 -0800 (PST)
Received-SPF: pass (google.com: best guess record for domain of
        linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as
        permitted sender) client-ip=209.132.180.67;
Authentication-Results: mx.google.com;
        spf=pass (google.com: best guess record for domain of
        linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as
        permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1757997AbdACKGA (ORCPT + 25 others);
        Tue, 3 Jan 2017 05:06:00 -0500
Received: from foss.arm.com ([217.140.101.70]:52288 "EHLO foss.arm.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1752552AbdACKFt (ORCPT );
        Tue, 3 Jan 2017 05:05:49 -0500
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])
        by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4339F707;
        Tue, 3 Jan 2017 02:05:48 -0800 (PST)
Received: from leverpostej (usa-sjc-imap-foss1.foss.arm.com [10.72.51.249])
        by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id EFF4E3F24D;
        Tue, 3 Jan 2017 02:05:46 -0800 (PST)
Date: Tue, 3 Jan 2017 10:04:53 +0000
From: Mark Rutland
To: Davidlohr Bueso
Cc: mingo@kernel.org, peterz@infradead.org, torvalds@linux-foundation.org,
        catalin.marinas@arm.com, linux-kernel@vger.kernel.org,
        linux-arm-kernel@lists.infradead.org, Davidlohr Bueso
Subject: Re: [RFC PATCH] sched: Remove set_task_state()
Message-ID: <20170103100453.GB5605@leverpostej>
References: <1483121873-21528-1-git-send-email-dave@stgolabs.net>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <1483121873-21528-1-git-send-email-dave@stgolabs.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Dec 30, 2016 at 10:17:53AM -0800, Davidlohr Bueso wrote:
> Secondly for a higher overview, an unlink microbenchmark was used,
> which pounds on a single file with open, close, unlink combos with
> increasing thread counts (up to 4x ncpus). While the workload is
> quite unrealistic, it does contend a lot on the inode mutex or now
> rwsem. With the archs I had access to, the differences are as follows:
>
> == 1. arm64 ==
>
> 0000000000002784 :
>     2784:	f9000c1f	str	xzr, [x0,#24]
>
> 0000000000002790 :
>     2790:	d5384100	mrs	x0, sp_el0
>     2794:	f9000c1f	str	xzr, [x0,#24]
>
> Avg runtime set_task_state():    2648 msecs
> Avg runtime set_current_state(): 2686 msecs

> Unsurprisingly, the big loser is arm64, due to the masking of sp_el0.
> otoh, x86-64 (known to be fast for get_current()/this_cpu_read_stable()
> caching) and ppc64 (with paca) show similar improvements in the unlink
> microbenches. x86's write latencies delta is similar to the opposite of
> arm64: 50ms vs -40ms, respectively. The small delta for ppc64 (2ms), does
> not represent the gains on the unlink runs. In the case of x86, there was
> a decent amount of variation in the latency runs, but always within a 20
> to 50ms increase; ppc was more constant.
>
> So, do we want to get rid of the interface (and improve performance on
> other archs) at the expense of arm64? Can arm64 do better?

We can definitely do better; the asm constraints in read_sysreg() are
overly pessimistic for get_current().

Does the below help?

Thanks,
Mark.

---->8----
diff --git a/arch/arm64/include/asm/current.h b/arch/arm64/include/asm/current.h
index f2bcbe2..c9ba5ac 100644
--- a/arch/arm64/include/asm/current.h
+++ b/arch/arm64/include/asm/current.h
@@ -11,7 +11,11 @@
 static __always_inline struct task_struct *get_current(void)
 {
-	return (struct task_struct *)read_sysreg(sp_el0);
+	struct task_struct *tsk;
+
+	asm ("mrs %0, sp_el0" : "=r" (tsk));
+
+	return tsk;
 }
 
 #define current get_current()
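
For illustration only, here is a rough userspace sketch of the difference the patch relies on. It assumes the "pessimism" in read_sysreg() is its use of asm volatile, which makes the compiler emit the register read at every call site, whereas a plain asm with only an output operand may be merged or hoisted when the result is needed more than once. The names read_reg_volatile(), read_reg_cacheable(), reads_agree() and fake_reg are hypothetical. sp_el0 is not readable from userspace, so the arm64 branch reads tpidr_el0 instead, and other targets pass a stand-in variable through an empty asm.

```c
/*
 * Hypothetical userspace sketch, not kernel code. __asm__ is used rather
 * than asm so the file also builds in strict ISO C modes (GCC or Clang).
 */
#if !defined(__aarch64__)
static unsigned long fake_reg = 0x1234;	/* stand-in for a system register */
#endif

/* Like read_sysreg(): volatile requests a fresh read at every call site. */
static inline unsigned long read_reg_volatile(void)
{
	unsigned long val;

#if defined(__aarch64__)
	__asm__ volatile("mrs %0, tpidr_el0" : "=r" (val));
#else
	__asm__ volatile("" : "=r" (val) : "0" (fake_reg));
#endif
	return val;
}

/*
 * Like the patched get_current(): without volatile, the compiler is free
 * to treat identical asm statements as producing the same value and may
 * combine them when optimising.
 */
static inline unsigned long read_reg_cacheable(void)
{
	unsigned long val;

#if defined(__aarch64__)
	__asm__("mrs %0, tpidr_el0" : "=r" (val));
#else
	__asm__("" : "=r" (val) : "0" (fake_reg));
#endif
	return val;
}

/* Each variant is read twice; only the volatile pair must stay as two reads. */
int reads_agree(void)
{
	return read_reg_cacheable() == read_reg_cacheable() &&
	       read_reg_volatile() == read_reg_volatile();
}
```

Building this with optimisation enabled and comparing the disassembly of reads_agree() should show whether the compiler merged the two non-volatile reads; whether it does depends on the compiler version and flags.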