From patchwork Wed Dec 14 11:20:16 2016
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 88005
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, dave.martin@arm.com
Subject: [PATCH v7] arm64: fpsimd: improve stacking logic in non-interruptible context
Date: Wed, 14 Dec 2016 11:20:16 +0000
Message-Id: <1481714416-19535-1-git-send-email-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.7.4
Cc: Ard Biesheuvel

Currently, we allow kernel mode NEON in softirq or hardirq context by
stacking and unstacking a slice of the NEON register file for each call to
kernel_neon_begin() and kernel_neon_end(), respectively.

Given that

a) a CPU typically spends most of its time in userland, during which time no
   kernel mode NEON in process context is in progress, and
b) when a CPU running in the kernel is interrupted to perform kernel mode
   NEON in softirq context, it is usually busy with something other than
   kernel mode NEON,

the stacking and subsequent unstacking is only necessary if we are
interrupting a thread while it is performing kernel mode NEON in process
context. In all other cases, we can simply preserve the userland FP/SIMD
state once, and only restore it upon return to userland, even if we are
being invoked from softirq or hardirq context.

However, with support being added to the arm64 kernel for the Scalable
Vector Extension (SVE), which shares the bottom 128 bits of each FP/SIMD
register but may scale up to 2048 bits per register, the nested stacking and
unstacking that occurs in interrupt context is no longer sufficient: the
register contents would be truncated to 128 bits upon restore, unless we add
support for stacking/unstacking the entire SVE state, which does not sound
appealing.

This means that the FP/SIMD save operation that encounters the userland
state first *has* to be able to run to completion, since any interruption
could truncate the contents of the registers, resulting in corrupted state
being restored once the interrupted context is allowed to resume preserving
the state.

Since executing all code involving the FP/SIMD state with interrupts
disabled is undesirable, let's ban kernel mode NEON in hardirq context
altogether. This is a small price to pay, given that the primary use case of
kernel mode NEON, crypto, can deal with this quite easily, and simply falls
back to generic scalar algorithms whose lower performance should not matter
in hardirq context anyway.

With hardirq context removed from the equation, we can modify the FP/SIMD
state manipulation code to execute with softirqs disabled. This allows the
critical sections to complete without the risk of the register contents
getting corrupted halfway through.
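To illustrate the fallback pattern mentioned above, here is a minimal sketch
of how a prospective caller might use the new may_use_simd() helper; the
xor_blocks_neon()/xor_blocks_scalar() helpers are hypothetical and not part
of this patch:

#include <linux/types.h>
#include <asm/neon.h>
#include <asm/simd.h>

/* hypothetical helpers, for illustration only */
void xor_blocks_neon(u8 *dst, const u8 *src, size_t len);
void xor_blocks_scalar(u8 *dst, const u8 *src, size_t len);

static void do_xor_blocks(u8 *dst, const u8 *src, size_t len)
{
	if (may_use_simd()) {
		/* not in hardirq context: the NEON unit may be used */
		kernel_neon_begin_partial(8);	/* clobbers v0-v7 only */
		xor_blocks_neon(dst, src, len);
		kernel_neon_end();
	} else {
		/* hardirq context: fall back to the scalar implementation */
		xor_blocks_scalar(dst, src, len);
	}
}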
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
v7: - ban kernel mode NEON in hardirq context, and execute all FP/SIMD state
      manipulations with softirqs disabled
v6: - use a spinlock instead of disabling interrupts
v5: - perform the test-and-set and the fpsimd_save_state with interrupts
      disabled, to prevent nested kernel_neon_begin()/_end() pairs from
      clobbering the state while it is being preserved
v4: - use this_cpu_inc/dec, which give sufficient guarantees regarding
      concurrency, but do not imply SMP barriers, which are not needed here
v3: - avoid corruption by concurrent invocations of kernel_neon_begin()/_end()
v2: - BUG() on unexpected values of the nesting level
    - relax the BUG() on num_regs > 32 to a WARN, given that nothing actually
      breaks in that case

 arch/arm64/include/asm/Kbuild |  1 -
 arch/arm64/include/asm/simd.h | 16 ++++
 arch/arm64/kernel/fpsimd.c    | 77 ++++++++++++++------
 3 files changed, 72 insertions(+), 22 deletions(-)

-- 
2.7.4

diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 44e1d7f10add..39ca0409e157 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -33,7 +33,6 @@ generic-y += segment.h
 generic-y += sembuf.h
 generic-y += serial.h
 generic-y += shmbuf.h
-generic-y += simd.h
 generic-y += sizes.h
 generic-y += socket.h
 generic-y += sockios.h
diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
new file mode 100644
index 000000000000..142eca29125a
--- /dev/null
+++ b/arch/arm64/include/asm/simd.h
@@ -0,0 +1,16 @@
+
+#include <linux/hardirq.h>
+
+/*
+ * may_use_simd - whether it is allowable at this time to issue SIMD
+ * instructions or access the SIMD register file
+ *
+ * On arm64, we allow kernel mode NEON in softirq context but not in hardirq
+ * context, due to the fact that the NEON register file may be shared with SVE,
+ * whose state may be too large to preserve/restore efficiently at each
+ * invocation of kernel_neon_begin()/_end() in hardirq context.
+ */
+static __must_check inline bool may_use_simd(void)
+{
+	return !in_irq();
+}
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 394c61db5566..97344c94acae 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -127,6 +127,8 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
 
 void fpsimd_thread_switch(struct task_struct *next)
 {
+	BUG_ON(!irqs_disabled());
+
 	/*
 	 * Save the current FPSIMD state to memory, but only if whatever is in
 	 * the registers is in fact the most recent userland FPSIMD state of
@@ -169,8 +171,10 @@ void fpsimd_flush_thread(void)
 void fpsimd_preserve_current_state(void)
 {
 	preempt_disable();
+	local_bh_disable();
 	if (!test_thread_flag(TIF_FOREIGN_FPSTATE))
 		fpsimd_save_state(&current->thread.fpsimd_state);
+	local_bh_enable();
 	preempt_enable();
 }
 
@@ -182,6 +186,7 @@
 void fpsimd_restore_current_state(void)
 {
 	preempt_disable();
+	local_bh_disable();
 	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		struct fpsimd_state *st = &current->thread.fpsimd_state;
 
@@ -189,6 +194,7 @@ void fpsimd_restore_current_state(void)
 		this_cpu_write(fpsimd_last_state, st);
 		st->cpu = smp_processor_id();
 	}
+	local_bh_enable();
 	preempt_enable();
 }
 
@@ -200,6 +206,7 @@
 void fpsimd_update_current_state(struct fpsimd_state *state)
 {
 	preempt_disable();
+	local_bh_disable();
 	fpsimd_load_state(state);
 	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		struct fpsimd_state *st = &current->thread.fpsimd_state;
 
@@ -207,6 +214,7 @@ void fpsimd_update_current_state(struct fpsimd_state *state)
 		this_cpu_write(fpsimd_last_state, st);
 		st->cpu = smp_processor_id();
 	}
+	local_bh_enable();
 	preempt_enable();
 }
 
@@ -220,45 +228,68 @@ void fpsimd_flush_task_state(struct task_struct *t)
 
 #ifdef CONFIG_KERNEL_MODE_NEON
 
-static DEFINE_PER_CPU(struct fpsimd_partial_state, hardirq_fpsimdstate);
-static DEFINE_PER_CPU(struct fpsimd_partial_state, softirq_fpsimdstate);
+static DEFINE_PER_CPU(struct fpsimd_partial_state, nested_fpsimdstate);
+static DEFINE_PER_CPU(int, kernel_neon_nesting_level);
 
 /*
  * Kernel-side NEON support functions
  */
 void kernel_neon_begin_partial(u32 num_regs)
 {
-	if (in_interrupt()) {
-		struct fpsimd_partial_state *s = this_cpu_ptr(
-			in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
+	int level;
 
-		BUG_ON(num_regs > 32);
-		fpsimd_save_partial_state(s, roundup(num_regs, 2));
-	} else {
+	/*
+	 * We don't allow kernel mode NEON in hard IRQ context because we'd
+	 * have to assume that any sequence involving preserve/restore of the
+	 * FP/SIMD register file could be interrupted by nested use of the NEON.
+	 *
+	 * On SVE capable hardware, that would necessitate executing all
+	 * manipulation of the preserved FP/SIMD state with interrupts disabled,
+	 * unless we preserve/restore the *entire* SVE state in interrupt
+	 * context as well.
+	 */
+	BUG_ON(in_irq());
+
+	preempt_disable();
+	if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		/*
 		 * Save the userland FPSIMD state if we have one and if we
 		 * haven't done so already. Clear fpsimd_last_state to indicate
 		 * that there is no longer userland FPSIMD state in the
 		 * registers.
		 */
-		preempt_disable();
-		if (current->mm &&
-		    !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
+		local_bh_disable();
+		if (!test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
 			fpsimd_save_state(&current->thread.fpsimd_state);
-		this_cpu_write(fpsimd_last_state, NULL);
-	}
+		local_bh_enable();
+	}
+	this_cpu_write(fpsimd_last_state, NULL);
+
+	level = this_cpu_inc_return(kernel_neon_nesting_level);
+	BUG_ON(level > 2);
+
+	if (level > 1) {
+		WARN_ON_ONCE(num_regs > 32);
+		num_regs = min(roundup(num_regs, 2), 32U);
+
+		fpsimd_save_partial_state(this_cpu_ptr(&nested_fpsimdstate),
					  num_regs);
+	}
 }
 EXPORT_SYMBOL(kernel_neon_begin_partial);
 
 void kernel_neon_end(void)
 {
-	if (in_interrupt()) {
-		struct fpsimd_partial_state *s = this_cpu_ptr(
-			in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
-		fpsimd_load_partial_state(s);
-	} else {
-		preempt_enable();
-	}
+	int level;
+
+	level = this_cpu_read(kernel_neon_nesting_level);
+	BUG_ON(level < 1);
+
+	if (level > 1)
+		fpsimd_load_partial_state(this_cpu_ptr(&nested_fpsimdstate));
+
+	this_cpu_dec(kernel_neon_nesting_level);
+	preempt_enable();
 }
 EXPORT_SYMBOL(kernel_neon_end);
 
@@ -270,8 +301,12 @@ static int fpsimd_cpu_pm_notifier(struct notifier_block *self,
 {
 	switch (cmd) {
 	case CPU_PM_ENTER:
-		if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
-			fpsimd_save_state(&current->thread.fpsimd_state);
+		if (current->mm) {
+			local_bh_disable();
+			if (!test_thread_flag(TIF_FOREIGN_FPSTATE))
+				fpsimd_save_state(&current->thread.fpsimd_state);
+			local_bh_enable();
+		}
 		this_cpu_write(fpsimd_last_state, NULL);
 		break;
 	case CPU_PM_EXIT:
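
For completeness, a sketch of a softirq-context user of the new code paths;
the tasklet and its payload are hypothetical (not part of this patch) and
only illustrate why the nesting level checked above can never exceed two:
hardirq users are banned by BUG_ON(in_irq()), and the process-context
critical sections now run with softirqs disabled, so at most one softirq
user can nest on top of one process-context user.

#include <linux/bug.h>
#include <linux/interrupt.h>
#include <asm/neon.h>
#include <asm/simd.h>

/*
 * Hypothetical tasklet: it runs in softirq context, so it can interrupt at
 * most one process-context NEON user (nesting level 2), and it cannot itself
 * be interrupted by another NEON user now that hardirq use is forbidden.
 */
static void crc_tasklet_fn(unsigned long data)
{
	if (WARN_ON(!may_use_simd()))
		return;

	kernel_neon_begin_partial(4);	/* nested: partial save of the
					   interrupted context only */
	/* ... NEON computation using v0-v3 ... */
	kernel_neon_end();
}

static DECLARE_TASKLET(crc_tasklet, crc_tasklet_fn, 0);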