From patchwork Tue Dec 13 12:35:29 2016
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 87874
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, dave.martin@arm.com
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Subject: [PATCH v6] arm64: fpsimd: improve stacking logic in non-interruptible context
Date: Tue, 13 Dec 2016 12:35:29 +0000
Message-Id: <1481632529-24218-1-git-send-email-ard.biesheuvel@linaro.org>

Currently, we allow kernel mode NEON in softirq or hardirq context by
stacking and unstacking a slice of the NEON register file for each call
to kernel_neon_begin() and kernel_neon_end(), respectively.

Given that
a) a CPU typically spends most of its time in userland, during which time
   no kernel mode NEON in process context is in progress,
b) a CPU spends most of its time in the kernel doing things other than
   kernel mode NEON when it gets interrupted to perform kernel mode NEON
   in softirq context,
the stacking and subsequent unstacking are only necessary if we are
interrupting a thread while it is performing kernel mode NEON in process
context. This means that in all other cases, we can simply preserve the
userland FPSIMD state once, and only restore it upon return to userland,
even if we are being invoked from softirq or hardirq context.

So instead of checking whether we are running in interrupt context, keep
track of the level of nested kernel mode NEON calls in progress, and only
perform the eager stack/unstack if the level exceeds 1.
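For illustration only (not part of this patch), a typical in-kernel user
simply brackets its SIMD code with kernel_neon_begin_partial() and
kernel_neon_end(); with this change, such brackets may nest (e.g. a softirq
user interrupting a process-context user) and only the nested levels spill
registers. my_neon_xor() and my_neon_xor_asm() below are hypothetical names:

/* Illustrative caller -- not part of this patch. */
#include <asm/neon.h>	/* kernel_neon_begin_partial(), kernel_neon_end() */

/* Hypothetical NEON-accelerated routine implemented in assembly. */
void my_neon_xor_asm(void *dst, const void *src, int len);

static void my_neon_xor(void *dst, const void *src, int len)
{
	/*
	 * Claim up to 8 NEON registers. At nesting level 1 this only marks
	 * the userland FPSIMD state as foreign (saving it once if needed);
	 * only nested calls (level > 1) spill into the per-CPU backup slots.
	 */
	kernel_neon_begin_partial(8);
	my_neon_xor_asm(dst, src, len);
	kernel_neon_end();
}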
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
v6: - use a spinlock instead of disabling interrupts
v5: - perform the test-and-set and the fpsimd_save_state with interrupts
      disabled, to prevent nested kernel_neon_begin()/_end() pairs from
      clobbering the state while it is being preserved
v4: - use this_cpu_inc/dec, which give sufficient guarantees regarding
      concurrency, but do not imply SMP barriers, which are not needed here
v3: - avoid corruption by concurrent invocations of kernel_neon_begin()/_end()
v2: - BUG() on unexpected values of the nesting level
    - relax the BUG() on num_regs>32 to a WARN, given that nothing actually
      breaks in that case

 arch/arm64/include/asm/fpsimd.h |  3 +
 arch/arm64/kernel/fpsimd.c      | 77 ++++++++++++++------
 2 files changed, 58 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 50f559f574fe..ee09fe1c37b6 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -17,6 +17,7 @@
 #define __ASM_FP_H
 
 #include <asm/ptrace.h>
+#include <linux/spinlock.h>
 
 #ifndef __ASSEMBLY__
 
@@ -37,6 +38,8 @@ struct fpsimd_state {
 			u32 fpcr;
 		};
 	};
+	/* lock protecting the preserved state while it is being saved */
+	spinlock_t lock;
 	/* the id of the last cpu to have restored this state */
 	unsigned int cpu;
 };
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 394c61db5566..886eea2d4084 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -160,6 +160,7 @@ void fpsimd_flush_thread(void)
 	memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
 	fpsimd_flush_task_state(current);
 	set_thread_flag(TIF_FOREIGN_FPSTATE);
+	spin_lock_init(&current->thread.fpsimd_state.lock);
 }
 
 /*
@@ -220,45 +221,77 @@ void fpsimd_flush_task_state(struct task_struct *t)
 
 #ifdef CONFIG_KERNEL_MODE_NEON
 
-static DEFINE_PER_CPU(struct fpsimd_partial_state, hardirq_fpsimdstate);
-static DEFINE_PER_CPU(struct fpsimd_partial_state, softirq_fpsimdstate);
+/*
+ * Although unlikely, it is possible for three kernel mode NEON contexts to
+ * be live at the same time: process context, softirq context and hardirq
+ * context. So while the userland context is stashed in the thread's fpsimd
+ * state structure, we need two additional levels of storage.
+ */
+static DEFINE_PER_CPU(struct fpsimd_partial_state, nested_fpsimdstate[2]);
+static DEFINE_PER_CPU(int, kernel_neon_nesting_level);
 
 /*
  * Kernel-side NEON support functions
  */
 void kernel_neon_begin_partial(u32 num_regs)
 {
-	if (in_interrupt()) {
-		struct fpsimd_partial_state *s = this_cpu_ptr(
-			in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
+	struct fpsimd_partial_state *s;
+	int level;
 
-		BUG_ON(num_regs > 32);
-		fpsimd_save_partial_state(s, roundup(num_regs, 2));
-	} else {
+	preempt_disable();
+
+	/*
+	 * Save the userland FPSIMD state if we have one and if we
+	 * haven't done so already. Clear fpsimd_last_state to indicate
+	 * that there is no longer userland FPSIMD state in the
+	 * registers.
+	 */
+	if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		/*
-		 * Save the userland FPSIMD state if we have one and if we
-		 * haven't done so already. Clear fpsimd_last_state to indicate
-		 * that there is no longer userland FPSIMD state in the
+		 * Whoever test-and-sets the TIF_FOREIGN_FPSTATE flag should
+		 * preserve the userland FP/SIMD state without interruption by
+		 * nested kernel_neon_begin()/_end() calls.
+		 * The reason is that the FP/SIMD register file is shared with
+		 * SVE on hardware that supports it, and nested kernel mode NEON
+		 * calls do not restore the upper part of the shared SVE/SIMD
 		 * registers.
 		 */
-		preempt_disable();
-		if (current->mm &&
-		    !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
-			fpsimd_save_state(&current->thread.fpsimd_state);
-		this_cpu_write(fpsimd_last_state, NULL);
+		if (spin_trylock(&current->thread.fpsimd_state.lock)) {
+			if (!test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
+				fpsimd_save_state(&current->thread.fpsimd_state);
+			spin_unlock(&current->thread.fpsimd_state.lock);
+		}
+	}
+	this_cpu_write(fpsimd_last_state, NULL);
+
+	level = this_cpu_inc_return(kernel_neon_nesting_level);
+	BUG_ON(level > 3);
+
+	if (level > 1) {
+		s = this_cpu_ptr(nested_fpsimdstate);
+
+		WARN_ON_ONCE(num_regs > 32);
+		num_regs = min(roundup(num_regs, 2), 32U);
+
+		fpsimd_save_partial_state(&s[level - 2], num_regs);
 	}
 }
 EXPORT_SYMBOL(kernel_neon_begin_partial);
 
 void kernel_neon_end(void)
 {
-	if (in_interrupt()) {
-		struct fpsimd_partial_state *s = this_cpu_ptr(
-			in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
-		fpsimd_load_partial_state(s);
-	} else {
-		preempt_enable();
+	struct fpsimd_partial_state *s;
+	int level;
+
+	level = this_cpu_read(kernel_neon_nesting_level);
+	BUG_ON(level < 1);
+
+	if (level > 1) {
+		s = this_cpu_ptr(nested_fpsimdstate);
+		fpsimd_load_partial_state(&s[level - 2]);
 	}
+	this_cpu_dec(kernel_neon_nesting_level);
+	preempt_enable();
 }
 EXPORT_SYMBOL(kernel_neon_end);
 
-- 
2.7.4
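As a self-contained illustration of the bookkeeping above (again, not part
of the patch), the nesting/slot arithmetic can be modelled in plain C:
level 1 relies on the lazily preserved userland state, while levels 2 and 3
map onto the two backup slots at index level - 2:

/* Standalone model of the nesting bookkeeping -- illustration only, not
 * part of the patch. It mimics a single CPU: level 1 relies on the lazily
 * preserved userland state, levels 2 and 3 spill into the backup slots.
 */
#include <assert.h>
#include <stdio.h>

#define NR_SLOTS 2			/* models nested_fpsimdstate[2] */

static int nesting_level;		/* models kernel_neon_nesting_level */
static int slot_busy[NR_SLOTS];

static void model_neon_begin(void)
{
	int level = ++nesting_level;	/* models this_cpu_inc_return() */

	assert(level <= NR_SLOTS + 1);	/* models BUG_ON(level > 3) */
	if (level > 1) {
		assert(!slot_busy[level - 2]);
		slot_busy[level - 2] = 1;	/* fpsimd_save_partial_state() */
	}
}

static void model_neon_end(void)
{
	int level = nesting_level;	/* models this_cpu_read() */

	assert(level >= 1);		/* models BUG_ON(level < 1) */
	if (level > 1) {
		assert(slot_busy[level - 2]);
		slot_busy[level - 2] = 0;	/* fpsimd_load_partial_state() */
	}
	nesting_level--;		/* models this_cpu_dec() */
}

int main(void)
{
	model_neon_begin();	/* process context: level 1, no spill        */
	model_neon_begin();	/* softirq interrupts it: level 2, slot 0    */
	model_neon_begin();	/* hardirq interrupts that: level 3, slot 1  */
	model_neon_end();	/* hardirq done: restore slot 1              */
	model_neon_end();	/* softirq done: restore slot 0              */
	model_neon_end();	/* process context done: level back to 0     */

	printf("final nesting level: %d\n", nesting_level);
	return 0;
}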