From patchwork Tue Apr 2 01:47:04 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Okanovic, Haris"
X-Patchwork-Id: 785038
From: Haris Okanovic
CC: Haris Okanovic
Subject: [PATCH 1/3] arm64: Add TIF_POLLING_NRFLAG
Date: Mon, 1 Apr 2024 20:47:04 -0500
Message-ID: <20240402014706.3969151-1-harisokn@amazon.com>
X-Mailer: git-send-email 2.34.1
X-Mailing-List: linux-pm@vger.kernel.org

TIF_POLLING_NRFLAG was removed from arm64 as there were no polling idle states.
Add back TIF_POLLING_NRFLAG in preparation for an arm64 cpuidle driver
which supports polling.

Signed-off-by: Haris Okanovic
---
 arch/arm64/include/asm/thread_info.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index e72a3bf9e563..ab22c7d1967e 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -81,6 +81,7 @@ void arch_setup_new_exec(void);
 #define TIF_SME			27	/* SME in use */
 #define TIF_SME_VL_INHERIT	28	/* Inherit SME vl_onexec across exec */
 #define TIF_KERNEL_FPSTATE	29	/* Task is in a kernel mode FPSIMD section */
+#define TIF_POLLING_NRFLAG	30
 
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
@@ -97,6 +98,7 @@ void arch_setup_new_exec(void);
 #define _TIF_SVE		(1 << TIF_SVE)
 #define _TIF_MTE_ASYNC_FAULT	(1 << TIF_MTE_ASYNC_FAULT)
 #define _TIF_NOTIFY_SIGNAL	(1 << TIF_NOTIFY_SIGNAL)
+#define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 
 #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
 				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \

From patchwork Tue Apr 2 01:47:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Okanovic, Haris"
X-Patchwork-Id: 786842
From: Haris Okanovic
CC: Haris Okanovic
Subject: [PATCH 2/3] arm64: add __READ_ONCE_EX()
Date: Mon, 1 Apr 2024 20:47:05 -0500
Message-ID: <20240402014706.3969151-2-harisokn@amazon.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240402014706.3969151-1-harisokn@amazon.com>
References: <20240402014706.3969151-1-harisokn@amazon.com>
X-Mailing-List: linux-pm@vger.kernel.org

Perform an exclusive load, which atomically loads a word and arms the
exclusive monitor to enable wfe() polling of an address.

Add this macro in preparation for an arm64 cpuidle driver which supports
a wfe() based polling state.
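
For readers unfamiliar with the pattern, here is a minimal userspace sketch of a single 32-bit exclusive load. It is not part of this series: the helper name load_acquire_exclusive32 is hypothetical, and the non-arm64 branch is only an assumed stand-in (a plain acquire load via the GCC/Clang __atomic_load_n builtin) so the example builds elsewhere.

```c
#include <stdint.h>

/*
 * Illustrative helper, not part of this series.
 *
 * On arm64, ldaxr loads the word with acquire semantics and marks the
 * address in the exclusive monitor, so a later wfe can be woken when
 * another CPU writes to that location. On other architectures this
 * sketch falls back to an ordinary acquire load and arms nothing.
 */
static inline uint32_t load_acquire_exclusive32(const uint32_t *addr)
{
#if defined(__aarch64__)
	uint32_t val;

	__asm__ volatile("ldaxr %w0, %1"
			 : "=r" (val)
			 : "Q" (*addr)
			 : "memory");
	return val;
#else
	/* Assumed fallback for non-arm64 builds: GCC/Clang builtin. */
	return __atomic_load_n(addr, __ATOMIC_ACQUIRE);
#endif
}
```

The macro in this patch covers 1, 2, 4 and 8 byte objects by switching on sizeof(x); the sketch shows only the 4-byte case.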

https://developer.arm.com/documentation/dht0008/a/arm-synchronization-primitives/exclusive-accesses/exclusive-monitors

Signed-off-by: Haris Okanovic
---
 arch/arm64/include/asm/readex.h | 46 +++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)
 create mode 100644 arch/arm64/include/asm/readex.h

diff --git a/arch/arm64/include/asm/readex.h b/arch/arm64/include/asm/readex.h
new file mode 100644
index 000000000000..51963c3107e1
--- /dev/null
+++ b/arch/arm64/include/asm/readex.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Based on arch/arm64/include/asm/rwonce.h
+ *
+ * Copyright (C) 2020 Google LLC.
+ * Copyright (C) 2024 Amazon.com, Inc. or its affiliates.
+ */
+
+#ifndef __ASM_READEX_H
+#define __ASM_READEX_H
+
+#define __LOAD_EX(sfx, regs...) "ldaxr" #sfx "\t" #regs
+
+#define __READ_ONCE_EX(x)						\
+({									\
+	typeof(&(x)) __x = &(x);					\
+	int atomic = 1;							\
+	union { __unqual_scalar_typeof(*__x) __val; char __c[1]; } __u;	\
+	switch (sizeof(x)) {						\
+	case 1:								\
+		asm volatile(__LOAD_EX(b, %w0, %1)			\
+			: "=r" (*(__u8 *)__u.__c)			\
+			: "Q" (*__x) : "memory");			\
+		break;							\
+	case 2:								\
+		asm volatile(__LOAD_EX(h, %w0, %1)			\
+			: "=r" (*(__u16 *)__u.__c)			\
+			: "Q" (*__x) : "memory");			\
+		break;							\
+	case 4:								\
+		asm volatile(__LOAD_EX(, %w0, %1)			\
+			: "=r" (*(__u32 *)__u.__c)			\
+			: "Q" (*__x) : "memory");			\
+		break;							\
+	case 8:								\
+		asm volatile(__LOAD_EX(, %0, %1)			\
+			: "=r" (*(__u64 *)__u.__c)			\
+			: "Q" (*__x) : "memory");			\
+		break;							\
+	default:							\
+		atomic = 0;						\
+	}								\
+	atomic ? (typeof(*__x))__u.__val : (*(volatile typeof(__x))__x);\
+})
+
+#endif

From patchwork Tue Apr 2 01:47:06 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Okanovic, Haris"
X-Patchwork-Id: 785037
From: Haris Okanovic
CC: Haris Okanovic, Ali Saidi, Geoff Blake, Brian Silver
Subject: [PATCH 3/3] arm64: cpuidle: Add arm_poll_idle
Date: Mon, 1 Apr 2024 20:47:06 -0500
Message-ID: <20240402014706.3969151-3-harisokn@amazon.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240402014706.3969151-1-harisokn@amazon.com>
References: <20240402014706.3969151-1-harisokn@amazon.com>
X-Mailing-List: linux-pm@vger.kernel.org

An arm64 cpuidle driver with two states: (1) poll for new runnable tasks
for up to 100 us (by default), then (2) fall back to a wfi idle that is
woken by interrupt (the current arm64 behavior). It allows CPUs to return
from idle more quickly by avoiding the longer interrupt wakeup path, which
may require an EL1/EL2 transition in certain VM scenarios.

Poll duration is optionally configured at load time via the poll_limit
module parameter.

The default 100 us duration was chosen experimentally by measuring QPS
(queries per sec) of the MLPerf bert inference benchmark, which seems
particularly susceptible to this change; see procedure below. 100 us is
the inflection point where QPS stopped growing in a range of tested
values. All results are from AWS m7g.16xlarge instances (Graviton3 SoC)
with dedicated tenancy (dedicated hardware).

| before | 10us | 25us | 50us | 100us | 125us | 150us | 200us | 300us |
| 5.87   | 5.91 | 5.96 | 6.01 | 6.06  | 6.07  | 6.06  | 6.06  | 6.06  |

Perf's scheduler benchmarks also improve with a range of poll_limit
values >= 10 us. Higher limits produce near identical results within a 3%
noise margin. The following tables are `perf bench sched` results, run
times in seconds.

`perf bench sched messaging -l 80000`
| AWS instance  | SoC       | Before | After  | % Change |
| c6g.16xl (VM) | Graviton2 | 18.974 | 18.400 | none     |
| c7g.16xl (VM) | Graviton3 | 13.852 | 13.859 | none     |
| c6g.metal     | Graviton2 | 17.621 | 16.744 | none     |
| c7g.metal     | Graviton3 | 13.430 | 13.404 | none     |

`perf bench sched pipe -l 2500000`
| AWS instance  | SoC       | Before | After  | % Change |
| c6g.16xl (VM) | Graviton2 | 30.158 | 15.181 | -50%     |
| c7g.16xl (VM) | Graviton3 | 18.289 | 12.067 | -34%     |
| c6g.metal     | Graviton2 | 17.609 | 15.170 | -14%     |
| c7g.metal     | Graviton3 | 14.103 | 12.304 | -13%     |

`perf bench sched seccomp-notify -l 2500000`
| AWS instance  | SoC       | Before | After  | % Change |
| c6g.16xl (VM) | Graviton2 | 28.784 | 13.754 | -52%     |
| c7g.16xl (VM) | Graviton3 | 16.964 | 11.430 | -33%     |
| c6g.metal     | Graviton2 | 15.717 | 13.536 | -14%     |
| c7g.metal     | Graviton3 | 13.301 | 11.491 | -14%     |

Steps to run MLPerf bert inference on Ubuntu 22.04:
    sudo apt install build-essential python3 python3-pip
    pip install "pybind11[global]" tensorflow transformers
    export TF_ENABLE_ONEDNN_OPTS=1
    export DNNL_DEFAULT_FPMATH_MODE=BF16
    git clone https://github.com/mlcommons/inference.git --recursive
    cd inference
    git checkout v2.0
    cd loadgen
    CFLAGS="-std=c++14" python3 setup.py bdist_wheel
    pip install dist/*.whl
    cd ../language/bert
    make setup
    python3 run.py --backend=tf --scenario=SingleStream

Suggested-by: Ali Saidi
Reviewed-by: Ali Saidi
Reviewed-by: Geoff Blake
Cc: Brian Silver
Signed-off-by: Haris Okanovic
---
 drivers/cpuidle/Kconfig.arm           |  13 ++
 drivers/cpuidle/Makefile              |   1 +
 drivers/cpuidle/cpuidle-arm-polling.c | 171 ++++++++++++++++++++++++++
 3 files changed, 185 insertions(+)
 create mode 100644 drivers/cpuidle/cpuidle-arm-polling.c

diff --git a/drivers/cpuidle/Kconfig.arm b/drivers/cpuidle/Kconfig.arm
index a1ee475d180d..484666dda38d 100644
--- a/drivers/cpuidle/Kconfig.arm
+++ b/drivers/cpuidle/Kconfig.arm
@@ -14,6 +14,19 @@ config ARM_CPUIDLE
 	  initialized by calling the CPU operations init idle hook
 	  provided by architecture code.
 
+config ARM_POLL_CPUIDLE
+	bool "ARM64 CPU idle Driver with polling"
+	depends on ARM64
+	depends on ARM_ARCH_TIMER_EVTSTREAM
+	select CPU_IDLE_MULTIPLE_DRIVERS
+	help
+	  Select this to enable a polling cpuidle driver for ARM64:
+	  The first state polls TIF_NEED_RESCHED for best latency on short
+	  sleep intervals. The second state falls back to arch_cpu_idle() to
+	  wait for interrupt. This can be helpful in workloads that
+	  frequently block/wake at short intervals or VMs where wakeup IPIs
+	  are more expensive.
+
 config ARM_PSCI_CPUIDLE
 	bool "PSCI CPU idle Driver"
 	depends on ARM_PSCI_FW
diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
index d103342b7cfc..23c21422792d 100644
--- a/drivers/cpuidle/Makefile
+++ b/drivers/cpuidle/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_ARM_U8500_CPUIDLE) += cpuidle-ux500.o
 obj-$(CONFIG_ARM_AT91_CPUIDLE)		+= cpuidle-at91.o
 obj-$(CONFIG_ARM_EXYNOS_CPUIDLE)	+= cpuidle-exynos.o
 obj-$(CONFIG_ARM_CPUIDLE)		+= cpuidle-arm.o
+obj-$(CONFIG_ARM_POLL_CPUIDLE)		+= cpuidle-arm-polling.o
 obj-$(CONFIG_ARM_PSCI_CPUIDLE)		+= cpuidle-psci.o
 obj-$(CONFIG_ARM_PSCI_CPUIDLE_DOMAIN)	+= cpuidle-psci-domain.o
 obj-$(CONFIG_ARM_TEGRA_CPUIDLE)		+= cpuidle-tegra.o
diff --git a/drivers/cpuidle/cpuidle-arm-polling.c b/drivers/cpuidle/cpuidle-arm-polling.c
new file mode 100644
index 000000000000..bca128568114
--- /dev/null
+++ b/drivers/cpuidle/cpuidle-arm-polling.c
@@ -0,0 +1,171 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ARM64 CPU idle driver using wfe polling
+ *
+ * Copyright 2024 Amazon.com, Inc. or its affiliates. All rights reserved.
+ *
+ * Authors:
+ *   Haris Okanovic
+ *   Brian Silver
+ *
+ * Based on cpuidle-arm.c
+ * Copyright (C) 2014 ARM Ltd.
+ * Author: Lorenzo Pieralisi
+ */
+
+#include
+#include
+#include
+#include
+
+#include
+#include
+
+#include "dt_idle_states.h"
+
+/* Max duration of the wfe() poll loop in us, before transitioning to
+ * arch_cpu_idle()/wfi() sleep.
+ */
+#define DEFAULT_POLL_LIMIT_US 100
+static unsigned int poll_limit __read_mostly = DEFAULT_POLL_LIMIT_US;
+
+/*
+ * arm_idle_wfe_poll - Polls state in wfe loop until reschedule is
+ * needed or timeout
+ */
+static int __cpuidle arm_idle_wfe_poll(struct cpuidle_device *dev,
+				struct cpuidle_driver *drv, int idx)
+{
+	u64 time_start, time_limit;
+
+	time_start = local_clock();
+	dev->poll_time_limit = false;
+
+	local_irq_enable();
+
+	if (current_set_polling_and_test())
+		goto end;
+
+	time_limit = cpuidle_poll_time(drv, dev);
+
+	do {
+		// exclusive read arms the monitor for wfe
+		if (__READ_ONCE_EX(current_thread_info()->flags) & _TIF_NEED_RESCHED)
+			goto end;
+
+		// may exit prematurely, see ARM_ARCH_TIMER_EVTSTREAM
+		wfe();
+	} while (local_clock() - time_start < time_limit);
+
+	dev->poll_time_limit = true;
+
+end:
+	current_clr_polling();
+	return idx;
+}
+
+/*
+ * arm_idle_wfi - Places cpu in lower power state until interrupt,
+ * a fallback to polling
+ */
+static int __cpuidle arm_idle_wfi(struct cpuidle_device *dev,
+				struct cpuidle_driver *drv, int idx)
+{
+	if (current_clr_polling_and_test()) {
+		local_irq_enable();
+		return idx;
+	}
+	arch_cpu_idle();
+	return idx;
+}
+
+static struct cpuidle_driver arm_poll_idle_driver __initdata = {
+	.name = "arm_poll_idle",
+	.owner = THIS_MODULE,
+	.states = {
+		{
+			.enter = arm_idle_wfe_poll,
+			.exit_latency = 0,
+			.target_residency = 0,
+			.exit_latency_ns = 0,
+			.power_usage = UINT_MAX,
+			.flags = CPUIDLE_FLAG_POLLING,
+			.name = "WFE",
+			.desc = "ARM WFE",
+		},
+		{
+			.enter = arm_idle_wfi,
+			.exit_latency = DEFAULT_POLL_LIMIT_US,
+			.target_residency = DEFAULT_POLL_LIMIT_US,
+			.power_usage = UINT_MAX,
+			.name = "WFI",
+			.desc = "ARM WFI",
+		},
+	},
+	.state_count = 2,
+};
+
+/*
+ * arm_poll_init_cpu - Initializes arm cpuidle polling driver for one cpu
+ */
+static int __init arm_poll_init_cpu(int cpu)
+{
+	int ret;
+	struct cpuidle_driver *drv;
+
+	drv = kmemdup(&arm_poll_idle_driver, sizeof(*drv), GFP_KERNEL);
+	if (!drv)
+		return -ENOMEM;
+
+	drv->cpumask = (struct cpumask *)cpumask_of(cpu);
+	drv->states[1].exit_latency = poll_limit;
+	drv->states[1].target_residency = poll_limit;
+
+	ret = cpuidle_register(drv, NULL);
+	if (ret) {
+		pr_err("failed to register driver: %d, cpu %d\n", ret, cpu);
+		goto out_kfree_drv;
+	}
+
+	pr_info("registered driver cpu %d\n", cpu);
+
+	cpuidle_cooling_register(drv);
+
+	return 0;
+
+out_kfree_drv:
+	kfree(drv);
+	return ret;
+}
+
+/*
+ * arm_poll_init - Initializes arm cpuidle polling driver
+ */
+static int __init arm_poll_init(void)
+{
+	int cpu, ret;
+	struct cpuidle_driver *drv;
+	struct cpuidle_device *dev;
+
+	for_each_possible_cpu(cpu) {
+		ret = arm_poll_init_cpu(cpu);
+		if (ret)
+			goto out_fail;
+	}
+
+	return 0;
+
+out_fail:
+	pr_info("de-register all\n");
+	while (--cpu >= 0) {
+		dev = per_cpu(cpuidle_devices, cpu);
+		drv = cpuidle_get_cpu_driver(dev);
+		cpuidle_unregister(drv);
+		kfree(drv);
+	}
+
+	return ret;
+}
+
+module_param(poll_limit, uint, 0444);
+device_initcall(arm_poll_init);
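
The bounded poll in arm_idle_wfe_poll() can be approximated in portable C11 as a rough userspace sketch. The names now_ns and poll_until_set are hypothetical and not part of this series. The sketch assumes timespec_get() as a stand-in for local_clock() and a plain spin where the driver executes wfe(), so it shows only the control flow (check flag, wait, stop at the time limit), not the power or wakeup behavior.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Wall-clock time in nanoseconds; assumed stand-in for local_clock(). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Poll *flag until it becomes nonzero or limit_ns has elapsed.
 * Returns true if the flag was observed set (the "need resched" case),
 * false if the time limit expired (the driver then sets poll_time_limit
 * and the governor can pick the wfi state).
 */
static bool poll_until_set(atomic_uint *flag, uint64_t limit_ns)
{
	uint64_t start = now_ns();

	do {
		if (atomic_load_explicit(flag, memory_order_acquire))
			return true;
		/* The driver waits in wfe() here; this sketch just spins. */
	} while (now_ns() - start < limit_ns);

	return false;
}
```

As in the driver, the flag is checked at least once before the time limit is consulted, so a wakeup condition that is already pending is never missed because of a short limit.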