From patchwork Mon Dec  4 12:26:43 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 120527
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
 Ard Biesheuvel, Dave Martin, Russell King - ARM Linux,
 Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org,
 Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt,
 Thomas Gleixner
Subject: [PATCH v2 17/19] crypto: arm64/crc32-ce - yield NEON every 16 blocks
 of input
Date: Mon, 4 Dec 2017 12:26:43 +0000
Message-Id: <20171204122645.31535-18-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org>
References: <20171204122645.31535-1-ard.biesheuvel@linaro.org>
Sender: linux-crypto-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-crypto@vger.kernel.org

Avoid excessive scheduling delays under a preemptible kernel by
yielding the NEON every 16 blocks of input.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/crc32-ce-core.S | 55 +++++++++++++++-----
 1 file changed, 43 insertions(+), 12 deletions(-)

-- 
2.11.0

diff --git a/arch/arm64/crypto/crc32-ce-core.S b/arch/arm64/crypto/crc32-ce-core.S
index 18f5a8442276..bca3d22fae7b 100644
--- a/arch/arm64/crypto/crc32-ce-core.S
+++ b/arch/arm64/crypto/crc32-ce-core.S
@@ -100,9 +100,9 @@
 	dCONSTANT	.req	d0
 	qCONSTANT	.req	q0
 
-	BUF		.req	x0
-	LEN		.req	x1
-	CRC		.req	x2
+	BUF		.req	x19
+	LEN		.req	x20
+	CRC		.req	x21
 
 	vzr		.req	v9
 
@@ -116,13 +116,27 @@
 	 *                          size_t len, uint crc32)
 	 */
 ENTRY(crc32_pmull_le)
-	adr		x3, .Lcrc32_constants
+	stp		x29, x30, [sp, #-112]!
+	mov		x29, sp
+	stp		x19, x20, [sp, #16]
+	stp		x21, x22, [sp, #32]
+
+	adr		x22, .Lcrc32_constants
 	b		0f
 
 ENTRY(crc32c_pmull_le)
-	adr		x3, .Lcrc32c_constants
+	stp		x29, x30, [sp, #-112]!
+	mov		x29, sp
+	stp		x19, x20, [sp, #16]
+	stp		x21, x22, [sp, #32]
+
+	adr		x22, .Lcrc32c_constants
 
-0:	bic		LEN, LEN, #15
+0:	mov		BUF, x0
+	mov		LEN, x1
+	mov		CRC, x2
+
+	bic		LEN, LEN, #15
 	ld1		{v1.16b-v4.16b}, [BUF], #0x40
 	movi		vzr.16b, #0
 	fmov		dCONSTANT, CRC
@@ -131,7 +145,7 @@ ENTRY(crc32c_pmull_le)
 	cmp		LEN, #0x40
 	b.lt		less_64
 
-	ldr		qCONSTANT, [x3]
+	ldr		qCONSTANT, [x22]
 
 loop_64:		/* 64 bytes Full cache line folding */
 	sub		LEN, LEN, #0x40
@@ -161,10 +175,24 @@ loop_64:		/* 64 bytes Full cache line folding */
 	eor		v4.16b, v4.16b, v8.16b
 
 	cmp		LEN, #0x40
-	b.ge		loop_64
+	b.lt		less_64
+
+	yield_neon_pre	LEN, 4, 64, loop_64		// yield every 16 blocks
+	stp		q1, q2, [sp, #48]
+	stp		q3, q4, [sp, #80]
+	yield_neon_post	2f
+	b		loop_64
+
+	.subsection	1
+2:	ldp		q1, q2, [sp, #48]
+	ldp		q3, q4, [sp, #80]
+	ldr		qCONSTANT, [x22]
+	movi		vzr.16b, #0
+	b		loop_64
+	.previous
 
 less_64:				/* Folding cache line into 128bit */
-	ldr		qCONSTANT, [x3, #16]
+	ldr		qCONSTANT, [x22, #16]
 
 	pmull2		v5.1q, v1.2d, vCONSTANT.2d
 	pmull		v1.1q, v1.1d, vCONSTANT.1d
@@ -203,8 +231,8 @@ fold_64:
 	eor		v1.16b, v1.16b, v2.16b
 
 	/* final 32-bit fold */
-	ldr		dCONSTANT, [x3, #32]
-	ldr		d3, [x3, #40]
+	ldr		dCONSTANT, [x22, #32]
+	ldr		d3, [x22, #40]
 
 	ext		v2.16b, v1.16b, vzr.16b, #4
 	and		v1.16b, v1.16b, v3.16b
@@ -212,7 +240,7 @@ fold_64:
 	eor		v1.16b, v1.16b, v2.16b
 
 	/* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */
-	ldr		qCONSTANT, [x3, #48]
+	ldr		qCONSTANT, [x22, #48]
 
 	and		v2.16b, v1.16b, v3.16b
 	ext		v2.16b, vzr.16b, v2.16b, #8
@@ -222,6 +250,9 @@ fold_64:
 	eor		v1.16b, v1.16b, v2.16b
 	mov		w0, v1.s[1]
 
+	ldp		x19, x20, [sp, #16]
+	ldp		x21, x22, [sp, #32]
+	ldp		x29, x30, [sp], #112
 	ret
 ENDPROC(crc32_pmull_le)
 ENDPROC(crc32c_pmull_le)
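
For readers less familiar with the assembly, the control flow added to
loop_64 can be sketched in C. This is only an illustration under stated
assumptions, not code from this series: fold_block(), need_resched_stub(),
yield_stub() and yield_count are hypothetical stand-ins, the fold is a toy
mix rather than the PMULL-based CRC fold, and the check counts processed
blocks, whereas the patch has yield_neon_pre test LEN (the exact condition
is defined by that macro elsewhere in the series and is not reproduced
here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE  64u	/* bytes folded per iteration, as in loop_64 */
#define YIELD_EVERY 16u	/* blocks between reschedule checks */

/* Hypothetical stand-ins for the kernel's reschedule check and NEON yield. */
static int yield_count;
static int need_resched_stub(void) { return 1; }
static void yield_stub(void) { yield_count++; }

/* Toy fold over one 64-byte block; NOT the real CRC folding step. */
static uint32_t fold_block(uint32_t state, const uint8_t *p)
{
	for (size_t i = 0; i < BLOCK_SIZE; i++)
		state = state * 31u + p[i];
	return state;
}

static uint32_t process_with_yield(const uint8_t *buf, size_t len,
				   uint32_t state)
{
	size_t blocks = len / BLOCK_SIZE;

	assert(buf != NULL || len == 0);

	for (size_t n = 1; n <= blocks; n++) {
		state = fold_block(state, buf);
		buf += BLOCK_SIZE;

		/* No point yielding when no full block remains (cf. b.lt less_64). */
		if (n < blocks && n % YIELD_EVERY == 0 && need_resched_stub()) {
			uint32_t saved = state;	/* cf. stp q1, q2 / stp q3, q4 */
			yield_stub();		/* live registers may be clobbered here */
			state = saved;		/* cf. ldp, plus reloading the constants */
		}
	}
	return state;
}
```

With need_resched_stub() always returning 1, a 40-block buffer yields after
blocks 16 and 32 and still produces the same result as an uninterrupted
run, which is the property the save and restore of q1-q4 around the yield
is there to preserve.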