From patchwork Fri Dec 11 17:22:20 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicolas Pitre
X-Patchwork-Id: 58300
Delivered-To: patch@linaro.org
Date: Fri, 11 Dec 2015 12:22:20 -0500 (EST)
From: Nicolas Pitre
To: Arnd Bergmann
cc: linux-arm-kernel@lists.infradead.org, Stephen Boyd,
    linux-arm-msm@vger.kernel.org, Russell King, Måns Rullgård,
    Thomas Petazzoni
Subject: Re: [PATCH] ARM: Runtime patch udiv/sdiv instructions into __aeabi_{u}idiv()
References: <6991913.FSOzozNbSC@wuerfel>
User-Agent: Alpine 2.20 (LFD 67 2015-01-07)
Sender: linux-arm-msm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-arm-msm@vger.kernel.org

On Fri, 11 Dec 2015, Nicolas Pitre wrote:

> On Fri, 11 Dec 2015, Arnd Bergmann wrote:
>
> > Strictly speaking, we can ignore the thumb instruction for pj4, as all
> > variants also support the normal T2 opcode for udiv/sdiv.
>
> OK it is gone.
>
> Now... The code bails out if ARM mode and HWCAP_IDIVA is not set so
> effectively the ARM mode on PJ4 is currently not used either. So I'm
> leaning towards having another patch to sort PJ4 out.

Here's v2 of this patch with PJ4 removed so it can be sorted out
separately.

----- >8
Subject: [PATCH] ARM: Runtime patch udiv/sdiv instructions into __aeabi_{u}idiv()

The ARM compiler inserts calls to __aeabi_uidiv() and
__aeabi_idiv() when it needs to perform division on signed and
unsigned integers. If a processor has support for the udiv and
sdiv instructions, the kernel may overwrite the beginning of those
functions with those instructions and a "bx lr" to get better
performance.

To ensure those functions are aligned to a 32-bit word for easier
patching (which might not always be the case in Thumb mode) and the
two patched instructions for each case are contained in the same cache
line, an 8-byte alignment is enforced when ARM_PATCH_IDIV is configured.

This was heavily inspired by a previous patch by Stephen Boyd.

Signed-off-by: Nicolas Pitre
---
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 34e1569a11..efea5fa975 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1604,6 +1604,21 @@ config THUMB2_AVOID_R_ARM_THM_JUMP11
 config ARM_ASM_UNIFIED
 	bool
 
+config ARM_PATCH_IDIV
+	bool "Runtime patch udiv/sdiv instructions into __aeabi_{u}idiv()"
+	depends on CPU_32v7 && !XIP_KERNEL
+	default y
+	help
+	  Some v7 CPUs have support for the udiv and sdiv instructions
+	  that can be used to implement the __aeabi_uidiv and __aeabi_idiv
+	  functions provided by the ARM runtime ABI.
+
+	  Enabling this option allows the kernel to modify itself to replace
+	  the first two instructions of these library functions with the
+	  udiv or sdiv plus "bx lr" instructions. Typically this will be
+	  faster and less power intensive than running the original library
+	  support code to do integer division.
+
 config AEABI
 	bool "Use the ARM EABI to compile the kernel"
 	help
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 20edd349d3..332a0f6baf 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -375,6 +375,72 @@ void __init early_print(const char *str, ...)
 	printk("%s", buf);
 }
 
+#ifdef CONFIG_ARM_PATCH_IDIV
+
+static inline u32 __attribute_const__ sdiv_instruction(void)
+{
+	if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) {
+		/* "sdiv r0, r0, r1" */
+		u32 insn = __opcode_thumb32_compose(0xfb90, 0xf0f1);
+		return __opcode_to_mem_thumb32(insn);
+	}
+
+	/* "sdiv r0, r0, r1" */
+	return __opcode_to_mem_arm(0xe710f110);
+}
+
+static inline u32 __attribute_const__ udiv_instruction(void)
+{
+	if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) {
+		/* "udiv r0, r0, r1" */
+		u32 insn = __opcode_thumb32_compose(0xfbb0, 0xf0f1);
+		return __opcode_to_mem_thumb32(insn);
+	}
+
+	/* "udiv r0, r0, r1" */
+	return __opcode_to_mem_arm(0xe730f110);
+}
+
+static inline u32 __attribute_const__ bx_lr_instruction(void)
+{
+	if (IS_ENABLED(CONFIG_THUMB2_KERNEL)) {
+		/* "bx lr; nop" */
+		u32 insn = __opcode_thumb32_compose(0x4770, 0x46c0);
+		return __opcode_to_mem_thumb32(insn);
+	}
+
+	/* "bx lr" */
+	return __opcode_to_mem_arm(0xe12fff1e);
+}
+
+static void __init patch_aeabi_uidiv(void)
+{
+	extern void __aeabi_uidiv(void);
+	extern void __aeabi_idiv(void);
+	uintptr_t fn_addr;
+	unsigned int mask;
+
+	mask = IS_ENABLED(CONFIG_THUMB2_KERNEL) ? HWCAP_IDIVT : HWCAP_IDIVA;
+	if (!(elf_hwcap & mask))
+		return;
+
+	pr_info("CPU: div instructions available: patching division code\n");
+
+	fn_addr = ((uintptr_t)&__aeabi_uidiv) & ~1;
+	((u32 *)fn_addr)[0] = udiv_instruction();
+	((u32 *)fn_addr)[1] = bx_lr_instruction();
+	flush_icache_range(fn_addr, fn_addr + 8);
+
+	fn_addr = ((uintptr_t)&__aeabi_idiv) & ~1;
+	((u32 *)fn_addr)[0] = sdiv_instruction();
+	((u32 *)fn_addr)[1] = bx_lr_instruction();
+	flush_icache_range(fn_addr, fn_addr + 8);
+}
+
+#else
+static inline void patch_aeabi_uidiv(void) { }
+#endif
+
 static void __init cpuid_init_hwcaps(void)
 {
 	int block;
@@ -642,6 +708,7 @@ static void __init setup_processor(void)
 	elf_hwcap = list->elf_hwcap;
 
 	cpuid_init_hwcaps();
+	patch_aeabi_uidiv();
 
 #ifndef CONFIG_ARM_THUMB
 	elf_hwcap &= ~(HWCAP_THUMB | HWCAP_IDIVT);
diff --git a/arch/arm/lib/lib1funcs.S b/arch/arm/lib/lib1funcs.S
index af2267f6a5..9397b2e532 100644
--- a/arch/arm/lib/lib1funcs.S
+++ b/arch/arm/lib/lib1funcs.S
@@ -205,6 +205,10 @@ Boston, MA 02111-1307, USA.  */
 .endm
 
 
+#ifdef CONFIG_ARM_PATCH_IDIV
+	.align 3
+#endif
+
 ENTRY(__udivsi3)
 ENTRY(__aeabi_uidiv)
 UNWIND(.fnstart)
@@ -253,6 +257,10 @@ UNWIND(.fnstart)
 UNWIND(.fnend)
 ENDPROC(__umodsi3)
 
+#ifdef CONFIG_ARM_PATCH_IDIV
+	.align 3
+#endif
+
 ENTRY(__divsi3)
 ENTRY(__aeabi_idiv)
 UNWIND(.fnstart)
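
For anyone wanting to double-check the encodings and the alignment
argument outside the kernel, here is a minimal user-space sketch. It is
not part of the patch. The helper names (thumb32_compose,
thumb32_to_mem_le, div_word, bx_lr_word, fits_in_cache_line) are made up
for illustration, and the sketch assumes a little-endian kernel, where
__opcode_thumb32_compose() is taken to join the two halfwords and
__opcode_to_mem_thumb32() is taken to swap them so that a single 32-bit
store places the first halfword at the lower address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: first Thumb-2 halfword goes in the upper 16 bits. */
static inline uint32_t thumb32_compose(uint16_t first, uint16_t second)
{
	return ((uint32_t)first << 16) | second;
}

/*
 * Hypothetical stand-in, little-endian only (an assumption of this
 * sketch): swap the halfwords so that one 32-bit store writes the first
 * halfword at the lower address.
 */
static inline uint32_t thumb32_to_mem_le(uint32_t insn)
{
	return (insn << 16) | (insn >> 16);
}

/* Word stored at offset 0: "udiv r0, r0, r1" or "sdiv r0, r0, r1". */
static inline uint32_t div_word(int thumb2, int is_signed)
{
	if (thumb2)
		return thumb32_to_mem_le(
			thumb32_compose(is_signed ? 0xfb90 : 0xfbb0, 0xf0f1));
	return is_signed ? 0xe710f110u : 0xe730f110u;
}

/* Word stored at offset 4: "bx lr" (ARM) or "bx lr; nop" (Thumb-2). */
static inline uint32_t bx_lr_word(int thumb2)
{
	if (thumb2)
		return thumb32_to_mem_le(thumb32_compose(0x4770, 0x46c0));
	return 0xe12fff1eu;
}

/*
 * Nonzero if the 8 patched bytes starting at addr lie within a single
 * cache line of line_size bytes (line_size must be at least 8).
 */
static inline int fits_in_cache_line(uintptr_t addr, uintptr_t line_size)
{
	assert(line_size >= 8);
	return (addr / line_size) == ((addr + 7) / line_size);
}
```

With these definitions, div_word(1, 0) yields 0xf0f1fbb0, which a
little-endian store lays out as the halfwords 0xfbb0, 0xf0f1 in memory
order. fits_in_cache_line(0x101c, 32) is false while
fits_in_cache_line(0x1018, 32) is true, which is the reason for
".align 3" rather than mere word alignment.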