From patchwork Thu Jan 11 12:33:04 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 761963
Date: Thu, 11 Jan 2024 13:33:04 +0100
In-Reply-To: <20240111123302.589910-10-ardb+git@google.com>
References: <20240111123302.589910-10-ardb+git@google.com>
X-Mailing-List: linux-crypto@vger.kernel.org
Message-ID: <20240111123302.589910-11-ardb+git@google.com>
Subject: [PATCH 1/8] crypto: arm64/aes-ccm - Revert "Rewrite skcipher walker loop"
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: ebiggers@kernel.org, herbert@gondor.apana.org.au, Ard Biesheuvel

From: Ard Biesheuvel

This reverts commit 57ead1bf1c54, which
updated the CCM code to only rely on walk.nbytes to check for failures
returned from the skcipher walk API, mostly for the common good rather
than to fix a particular problem in the code.

This change introduces a problem of its own: the skcipher walk is
started with the 'atomic' argument set to false, which means that the
skcipher walk API is permitted to sleep. Subsequently, the loop invokes
skcipher_walk_done() with preemption disabled on its final iteration.
This appears to work by accident, but it is arguably a bad example, and
providing a better example was the point of the original patch.

Given that future changes to the CCM code will rely on the original
behavior of entering the loop even for zero-sized inputs, let's just
revert this change entirely, and proceed from there.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/aes-ce-ccm-glue.c | 57 +++++++++++---------
 1 file changed, 31 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
index 25cd3808ecbe..c4f14415f5f0 100644
--- a/arch/arm64/crypto/aes-ce-ccm-glue.c
+++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
@@ -161,39 +161,43 @@ static int ccm_encrypt(struct aead_request *req)
 	memcpy(buf, req->iv, AES_BLOCK_SIZE);
 
 	err = skcipher_walk_aead_encrypt(&walk, req, false);
+	if (unlikely(err))
+		return err;
 
 	kernel_neon_begin();
 
 	if (req->assoclen)
 		ccm_calculate_auth_mac(req, mac);
 
-	while (walk.nbytes) {
+	do {
 		u32 tail = walk.nbytes % AES_BLOCK_SIZE;
-		bool final = walk.nbytes == walk.total;
 
-		if (final)
+		if (walk.nbytes == walk.total)
 			tail = 0;
 
 		ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
 				   walk.nbytes - tail, ctx->key_enc,
 				   num_rounds(ctx), mac, walk.iv);
 
-		if (!final)
-			kernel_neon_end();
-		err = skcipher_walk_done(&walk, tail);
-		if (!final)
-			kernel_neon_begin();
-	}
+		if (walk.nbytes == walk.total)
+			ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
 
-	ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
+		kernel_neon_end();
 
-	kernel_neon_end();
+		if (walk.nbytes) {
+			err = skcipher_walk_done(&walk, tail);
+			if (unlikely(err))
+				return err;
+			if (unlikely(walk.nbytes))
+				kernel_neon_begin();
+		}
+	} while (walk.nbytes);
 
 	/* copy authtag to end of dst */
 	scatterwalk_map_and_copy(mac, req->dst, req->assoclen + req->cryptlen,
 				 crypto_aead_authsize(aead), 1);
 
-	return err;
+	return 0;
 }
 
 static int ccm_decrypt(struct aead_request *req)
@@ -215,36 +219,37 @@ static int ccm_decrypt(struct aead_request *req)
 	memcpy(buf, req->iv, AES_BLOCK_SIZE);
 
 	err = skcipher_walk_aead_decrypt(&walk, req, false);
+	if (unlikely(err))
+		return err;
 
 	kernel_neon_begin();
 
 	if (req->assoclen)
 		ccm_calculate_auth_mac(req, mac);
 
-	while (walk.nbytes) {
+	do {
 		u32 tail = walk.nbytes % AES_BLOCK_SIZE;
-		bool final = walk.nbytes == walk.total;
 
-		if (final)
+		if (walk.nbytes == walk.total)
 			tail = 0;
 
 		ce_aes_ccm_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
 				   walk.nbytes - tail, ctx->key_enc,
 				   num_rounds(ctx), mac, walk.iv);
 
-		if (!final)
-			kernel_neon_end();
-		err = skcipher_walk_done(&walk, tail);
-		if (!final)
-			kernel_neon_begin();
-	}
+		if (walk.nbytes == walk.total)
+			ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
 
-	ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
+		kernel_neon_end();
 
-	kernel_neon_end();
-
-	if (unlikely(err))
-		return err;
+		if (walk.nbytes) {
+			err = skcipher_walk_done(&walk, tail);
+			if (unlikely(err))
+				return err;
+			if (unlikely(walk.nbytes))
+				kernel_neon_begin();
+		}
+	} while (walk.nbytes);
 
 	/* compare calculated auth tag with the stored one */
 	scatterwalk_map_and_copy(buf, req->src,

From patchwork Thu Jan 11 12:33:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 762246
Date: Thu, 11 Jan 2024 13:33:05 +0100
In-Reply-To: <20240111123302.589910-10-ardb+git@google.com>
References: <20240111123302.589910-10-ardb+git@google.com>
X-Mailing-List: linux-crypto@vger.kernel.org
Message-ID: <20240111123302.589910-12-ardb+git@google.com>
Subject: [PATCH 2/8] crypto: arm64/aes-ccm - Keep NEON enabled during skcipher walk
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: ebiggers@kernel.org, herbert@gondor.apana.org.au, Ard Biesheuvel

From: Ard Biesheuvel

Now that kernel mode NEON no longer disables preemption, we no longer
have to take care to disable and re-enable use of the NEON when calling
into the skcipher walk API. So just keep it enabled until done.
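As a rough illustration of the loop that patches 1 and 2 arrive at, here is a user-space sketch in which every kernel piece is replaced by a stand-in. All names are hypothetical: `struct mock_walk` and `mock_walk_done()` only mimic the chunking behaviour of `struct skcipher_walk` and `skcipher_walk_done()` (unprocessed tail bytes are handed out again at the start of the next chunk), and plain counters stand in for `kernel_neon_begin()`/`kernel_neon_end()`, `ce_aes_ccm_encrypt()` and `ce_aes_ccm_final()`. It checks two properties the commit messages rely on: the `do { } while` form runs the final MAC step exactly once, even for a zero-length input, and the NEON section is opened and closed once around the whole walk, with the walk error returned only after it is closed.

```c
#include <assert.h>
#include <stddef.h>

#define AES_BLOCK_SIZE 16

/* Hypothetical stand-in for struct skcipher_walk: 'total' counts the bytes
 * not yet consumed (including the current chunk), 'nbytes' is the size of
 * the current chunk, and chunks are at most 'step' bytes long. */
struct mock_walk {
	size_t total;
	size_t nbytes;
	size_t step;
};

/* Counters standing in for the NEON section and the asm helpers. */
struct ccm_trace {
	size_t processed;	/* bytes passed to the mock en/decrypt step */
	int final_calls;	/* times the mock final MAC step ran */
	int neon_depth;		/* > 0 while the mock NEON section is open */
	int neon_begins;	/* number of mock kernel_neon_begin() calls */
};

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

static void mock_walk_start(struct mock_walk *w, size_t len, size_t step)
{
	assert(step >= AES_BLOCK_SIZE);	/* guarantees forward progress */
	w->total = len;
	w->step = step;
	w->nbytes = min_size(len, step);
}

/* Consume nbytes - tail bytes; the unprocessed 'tail' bytes are presented
 * again at the start of the next chunk. */
static int mock_walk_done(struct mock_walk *w, size_t tail)
{
	w->total -= w->nbytes - tail;
	w->nbytes = min_size(w->total, w->step);
	return 0;
}

/* Loop shape after patches 1 and 2: do/while so that a zero-length input
 * still reaches the final MAC step, one NEON section around the whole
 * walk, and the walk error returned only after the section is closed. */
static int ccm_walk_model(size_t len, size_t step, struct ccm_trace *t)
{
	struct mock_walk walk;
	int err = 0;

	mock_walk_start(&walk, len, step);

	t->neon_depth++;	/* kernel_neon_begin() */
	t->neon_begins++;

	do {
		size_t tail = walk.nbytes % AES_BLOCK_SIZE;

		if (walk.nbytes == walk.total)	/* last chunk: nothing carried over */
			tail = 0;

		t->processed += walk.nbytes - tail;

		if (walk.nbytes == walk.total)
			t->final_calls++;

		if (walk.nbytes)
			err = mock_walk_done(&walk, tail);
	} while (walk.nbytes);

	t->neon_depth--;	/* kernel_neon_end() */

	return err;
}
```

With `step = 20`, a 50-byte input is handed out as chunks of 20, 20 and 18 bytes; each of the first two processes 16 bytes and carries a 4-byte tail over to the next chunk.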
Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/aes-ce-ccm-glue.c | 22 +++++++++-----------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
index c4f14415f5f0..b177ebea7d09 100644
--- a/arch/arm64/crypto/aes-ce-ccm-glue.c
+++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
@@ -182,17 +182,16 @@ static int ccm_encrypt(struct aead_request *req)
 		if (walk.nbytes == walk.total)
 			ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
 
-		kernel_neon_end();
-
 		if (walk.nbytes) {
 			err = skcipher_walk_done(&walk, tail);
-			if (unlikely(err))
-				return err;
-			if (unlikely(walk.nbytes))
-				kernel_neon_begin();
 		}
 	} while (walk.nbytes);
 
+	kernel_neon_end();
+
+	if (unlikely(err))
+		return err;
+
 	/* copy authtag to end of dst */
 	scatterwalk_map_and_copy(mac, req->dst, req->assoclen + req->cryptlen,
 				 crypto_aead_authsize(aead), 1);
@@ -240,17 +239,16 @@ static int ccm_decrypt(struct aead_request *req)
 		if (walk.nbytes == walk.total)
 			ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
 
-		kernel_neon_end();
-
 		if (walk.nbytes) {
 			err = skcipher_walk_done(&walk, tail);
-			if (unlikely(err))
-				return err;
-			if (unlikely(walk.nbytes))
-				kernel_neon_begin();
 		}
 	} while (walk.nbytes);
 
+	kernel_neon_end();
+
+	if (unlikely(err))
+		return err;
+
 	/* compare calculated auth tag with the stored one */
 	scatterwalk_map_and_copy(buf, req->src,
 				 req->assoclen + req->cryptlen - authsize,

From patchwork Thu Jan 11 12:33:06 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 761962
Date: Thu, 11 Jan 2024 13:33:06 +0100
In-Reply-To: <20240111123302.589910-10-ardb+git@google.com>
References: <20240111123302.589910-10-ardb+git@google.com>
X-Mailing-List: linux-crypto@vger.kernel.org
Message-ID: <20240111123302.589910-13-ardb+git@google.com>
Subject: [PATCH 3/8] crypto: arm64/aes-ccm - Pass short inputs via stack buffer
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: ebiggers@kernel.org, herbert@gondor.apana.org.au, Ard Biesheuvel

From: Ard Biesheuvel

In preparation for optimizing the CCM core asm code using permutation
vectors and overlapping loads and stores, ensure that inputs shorter
than the size of an AES block are passed via a buffer on the stack, in a
way that positions the data at the end of a 16-byte buffer. This removes
the need for the asm code to reason about a rare corner case where the
tail of the data cannot be read/written using a single NEON load/store
instruction.

While at it, tweak the copyright header and authorship to bring it up to
date.
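Why the data has to sit at the end of a 16-byte buffer becomes clear from the tail sequence that patch 4/8 of this series adds: it rewinds the input and output pointers by 16 - len bytes and then does one full-block load and one full-block store, so those 16 - len bytes in front of the partial tail must be addressable. For a short input, the stack buffer provides exactly that slack. The user-space C model here is a sketch of that sequence under stated assumptions, not the kernel code: all names (`permute_tbl`, `tbl`, `tbx`, `ccm_tail_model`, `ccm_tail_bytewise`, `tail_models_agree`) are hypothetical, `tbl()`/`tbx()` follow the architectural TBL/TBX behaviour for out-of-range indices (zero versus destination unchanged), and the register roles (v0 MAC, v1 keystream, v6 previous output block) are taken from the patch 4/8 asm. It compares the permute-based result against the removed bytewise loop.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLK 16

/* Mirrors .Lpermute: 15 x 0xff, the identity 0..15, then 15 x 0xff. */
static uint8_t permute_tbl[15 + BLK + 15];

static void init_permute(void)
{
	memset(permute_tbl, 0xff, sizeof(permute_tbl));
	for (int i = 0; i < BLK; i++)
		permute_tbl[15 + i] = (uint8_t)i;
}

/* TBL with one table register: indices >= 16 produce 0 (d may alias t). */
static void tbl(uint8_t d[BLK], const uint8_t t[BLK], const uint8_t idx[BLK])
{
	uint8_t r[BLK];

	for (int i = 0; i < BLK; i++)
		r[i] = idx[i] < BLK ? t[idx[i]] : 0;
	memcpy(d, r, BLK);
}

/* TBX with one table register: indices >= 16 leave d[i] unchanged. */
static void tbx(uint8_t d[BLK], const uint8_t t[BLK], const uint8_t idx[BLK])
{
	for (int i = 0; i < BLK; i++)
		if (idx[i] < BLK)
			d[i] = t[idx[i]];
}

/* Tail of len (1..15) bytes at src/dst; the 16 - len bytes before them
 * must be accessible (previous block, or stack-buffer slack).
 * ks = final-round keystream (v1), prev = previous output block (v6),
 * mac = MAC after its final round (v0), updated in place. */
static void ccm_tail_model(uint8_t *dst, const uint8_t *src, unsigned len,
			   int enc, const uint8_t ks[BLK],
			   const uint8_t prev[BLK], uint8_t mac[BLK])
{
	const uint8_t *lp = permute_tbl + 15;	/* .Lpermute */
	int w2 = (int)len - BLK;		/* negative, as in the asm */
	uint8_t v1[BLK], v2[BLK], v7[BLK], v8[BLK], v9[BLK];

	assert(len >= 1 && len < BLK);
	src += w2;				/* rewind input pointer */
	dst += w2;				/* rewind output pointer */
	memcpy(v7, lp + w2, BLK);		/* ld1 {v7.16b-v8.16b}, [x9] */
	memcpy(v8, lp + w2 + BLK, BLK);
	memcpy(v9, lp - w2, BLK);		/* ld1 {v9.16b}, [x8] */

	memcpy(v2, src, BLK);			/* overlapping full-block load */
	tbl(v1, ks, v7);			/* keystream to end of register */
	if (enc)
		tbl(v7, v2, v9);		/* plaintext to start of v7 */
	for (int i = 0; i < BLK; i++)
		v2[i] ^= v1[i];
	if (!enc)
		tbl(v7, v2, v9);		/* decrypted plaintext to start of v7 */
	for (int i = 0; i < BLK; i++)
		mac[i] ^= v7[i];		/* fold zero-padded plaintext into MAC */
	tbx(v2, prev, v8);			/* restore previous output bytes */
	memcpy(dst, v2, BLK);			/* overlapping full-block store */
}

/* Reference: the removed bytewise loop. */
static void ccm_tail_bytewise(uint8_t *dst, const uint8_t *src, unsigned len,
			      int enc, const uint8_t ks[BLK], uint8_t mac[BLK])
{
	for (unsigned i = 0; i < len; i++) {
		uint8_t out = src[i] ^ ks[i];

		mac[i] ^= enc ? src[i] : out;	/* MAC covers the plaintext */
		dst[i] = out;
	}
}

/* Runs both versions on the same data; returns 1 if outputs and MACs match. */
static int tail_models_agree(unsigned len, int enc)
{
	uint8_t in[2 * BLK], out_a[2 * BLK], out_b[2 * BLK];
	uint8_t ks[BLK], prev[BLK], mac_a[BLK], mac_b[BLK];

	init_permute();
	for (int i = 0; i < 2 * BLK; i++)
		in[i] = (uint8_t)(7 * i + 1);
	for (int i = 0; i < BLK; i++) {
		ks[i] = (uint8_t)(0xa5 ^ (13 * i));
		prev[i] = (uint8_t)(0x3c + i);
		mac_a[i] = mac_b[i] = (uint8_t)(0x5a ^ i);
	}
	memcpy(out_a, prev, BLK);	/* previous output block precedes the tail */
	memcpy(out_b, prev, BLK);
	ccm_tail_model(out_a + BLK, in + BLK, len, enc, ks, prev, mac_a);
	ccm_tail_bytewise(out_b + BLK, in + BLK, len, enc, ks, mac_b);
	return memcmp(out_a, out_b, BLK + len) == 0 &&
	       memcmp(mac_a, mac_b, BLK) == 0;
}
```

In `tail_models_agree()` the partial block sits directly after a 16-byte previous output block, which is the layout the asm sees after a full-block iteration; the `tbx` step is what keeps the 16 - len overlapping output bytes intact.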
Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/aes-ce-ccm-glue.c | 57 ++++++++++++++------
 1 file changed, 40 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
index b177ebea7d09..2f4e6a318fcd 100644
--- a/arch/arm64/crypto/aes-ce-ccm-glue.c
+++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
@@ -1,8 +1,11 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /*
- * aes-ccm-glue.c - AES-CCM transform for ARMv8 with Crypto Extensions
+ * aes-ce-ccm-glue.c - AES-CCM transform for ARMv8 with Crypto Extensions
  *
- * Copyright (C) 2013 - 2017 Linaro Ltd
+ * Copyright (C) 2013 - 2017 Linaro Ltd.
+ * Copyright (C) 2024 Google LLC
+ *
+ * Author: Ard Biesheuvel
  */
 
 #include
@@ -149,7 +152,7 @@ static int ccm_encrypt(struct aead_request *req)
 	struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead);
 	struct skcipher_walk walk;
 	u8 __aligned(8) mac[AES_BLOCK_SIZE];
-	u8 buf[AES_BLOCK_SIZE];
+	u8 orig_iv[AES_BLOCK_SIZE];
 	u32 len = req->cryptlen;
 	int err;
 
@@ -158,7 +161,7 @@ static int ccm_encrypt(struct aead_request *req)
 		return err;
 
 	/* preserve the original iv for the final round */
-	memcpy(buf, req->iv, AES_BLOCK_SIZE);
+	memcpy(orig_iv, req->iv, AES_BLOCK_SIZE);
 
 	err = skcipher_walk_aead_encrypt(&walk, req, false);
 	if (unlikely(err))
@@ -171,16 +174,26 @@ static int ccm_encrypt(struct aead_request *req)
 
 	do {
 		u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+		const u8 *src = walk.src.virt.addr;
+		u8 *dst = walk.dst.virt.addr;
+		u8 buf[AES_BLOCK_SIZE];
 
 		if (walk.nbytes == walk.total)
 			tail = 0;
 
-		ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
-				   walk.nbytes - tail, ctx->key_enc,
-				   num_rounds(ctx), mac, walk.iv);
+		if (unlikely(walk.total < AES_BLOCK_SIZE))
+			src = dst = memcpy(buf + sizeof(buf) - walk.total,
+					   src, walk.total);
+
+		ce_aes_ccm_encrypt(dst, src, walk.nbytes - tail,
+				   ctx->key_enc, num_rounds(ctx),
+				   mac, walk.iv);
+
+		if (unlikely(walk.total < AES_BLOCK_SIZE))
+			memcpy(walk.dst.virt.addr, dst, walk.total);
 
 		if (walk.nbytes == walk.total)
-			ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
+			ce_aes_ccm_final(mac, orig_iv, ctx->key_enc, num_rounds(ctx));
 
 		if (walk.nbytes) {
 			err = skcipher_walk_done(&walk, tail);
@@ -206,7 +219,7 @@ static int ccm_decrypt(struct aead_request *req)
 	unsigned int authsize = crypto_aead_authsize(aead);
 	struct skcipher_walk walk;
 	u8 __aligned(8) mac[AES_BLOCK_SIZE];
-	u8 buf[AES_BLOCK_SIZE];
+	u8 orig_iv[AES_BLOCK_SIZE];
 	u32 len = req->cryptlen - authsize;
 	int err;
 
@@ -215,7 +228,7 @@ static int ccm_decrypt(struct aead_request *req)
 		return err;
 
 	/* preserve the original iv for the final round */
-	memcpy(buf, req->iv, AES_BLOCK_SIZE);
+	memcpy(orig_iv, req->iv, AES_BLOCK_SIZE);
 
 	err = skcipher_walk_aead_decrypt(&walk, req, false);
 	if (unlikely(err))
@@ -228,16 +241,26 @@ static int ccm_decrypt(struct aead_request *req)
 
 	do {
 		u32 tail = walk.nbytes % AES_BLOCK_SIZE;
+		const u8 *src = walk.src.virt.addr;
+		u8 *dst = walk.dst.virt.addr;
+		u8 buf[AES_BLOCK_SIZE];
 
 		if (walk.nbytes == walk.total)
 			tail = 0;
 
-		ce_aes_ccm_decrypt(walk.dst.virt.addr, walk.src.virt.addr,
-				   walk.nbytes - tail, ctx->key_enc,
-				   num_rounds(ctx), mac, walk.iv);
+		if (unlikely(walk.total < AES_BLOCK_SIZE))
+			src = dst = memcpy(buf + sizeof(buf) - walk.total,
+					   src, walk.total);
+
+		ce_aes_ccm_decrypt(dst, src, walk.nbytes - tail,
+				   ctx->key_enc, num_rounds(ctx),
+				   mac, walk.iv);
+
+		if (unlikely(walk.total < AES_BLOCK_SIZE))
+			memcpy(walk.dst.virt.addr, dst, walk.total);
 
 		if (walk.nbytes == walk.total)
-			ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx));
+			ce_aes_ccm_final(mac, orig_iv, ctx->key_enc, num_rounds(ctx));
 
 		if (walk.nbytes) {
 			err = skcipher_walk_done(&walk, tail);
@@ -250,11 +273,11 @@ static int ccm_decrypt(struct aead_request *req)
 		return err;
 
 	/* compare calculated auth tag with the stored one */
-	scatterwalk_map_and_copy(buf, req->src,
+	scatterwalk_map_and_copy(orig_iv, req->src,
 				 req->assoclen + req->cryptlen - authsize,
 				 authsize, 0);
 
-	if (crypto_memneq(mac, buf, authsize))
+	if (crypto_memneq(mac, orig_iv, authsize))
 		return -EBADMSG;
 	return 0;
 }
@@ -293,6 +316,6 @@ module_init(aes_mod_init);
 module_exit(aes_mod_exit);
 
 MODULE_DESCRIPTION("Synchronous AES in CCM mode using ARMv8 Crypto Extensions");
-MODULE_AUTHOR("Ard Biesheuvel ");
+MODULE_AUTHOR("Ard Biesheuvel ");
 MODULE_LICENSE("GPL v2");
 MODULE_ALIAS_CRYPTO("ccm(aes)");

From patchwork Thu Jan 11 12:33:07 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 762245
Date: Thu, 11 Jan 2024 13:33:07 +0100
In-Reply-To: <20240111123302.589910-10-ardb+git@google.com>
References: <20240111123302.589910-10-ardb+git@google.com>
X-Mailing-List: linux-crypto@vger.kernel.org
Message-ID: <20240111123302.589910-14-ardb+git@google.com>
Subject: [PATCH 4/8] crypto: arm64/aes-ccm - Replace bytewise tail handling with NEON permute
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: ebiggers@kernel.org, herbert@gondor.apana.org.au, Ard Biesheuvel

From: Ard Biesheuvel

Implement the CCM tail handling using a single sequence that uses
permute vectors and overlapping loads and stores, rather than processing
the tail byte by byte in a loop using scalar operations. This is more
efficient, even though the measured speedup is only around 1-2% on the
CPUs I have tried.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/aes-ce-ccm-core.S | 59 +++++++++++++-------
 arch/arm64/crypto/aes-ce-ccm-glue.c | 20 +++----
 2 files changed, 48 insertions(+), 31 deletions(-)

diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
index b03f7f71f893..b21a9b759ab2 100644
--- a/arch/arm64/crypto/aes-ce-ccm-core.S
+++ b/arch/arm64/crypto/aes-ce-ccm-core.S
@@ -1,8 +1,11 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 /*
- * aesce-ccm-core.S - AES-CCM transform for ARMv8 with Crypto Extensions
+ * aes-ce-ccm-core.S - AES-CCM transform for ARMv8 with Crypto Extensions
  *
- * Copyright (C) 2013 - 2017 Linaro Ltd
+ * Copyright (C) 2013 - 2017 Linaro Ltd.
+ * Copyright (C) 2024 Google LLC
+ *
+ * Author: Ard Biesheuvel
  */
 
 #include
@@ -168,13 +171,13 @@ CPU_LE(	rev	x8, x8			)	/* keep swabbed ctr in reg */
 	ld1	{v2.16b}, [x1], #16		/* load next input block */
 	.if	\enc == 1
 	eor	v2.16b, v2.16b, v5.16b		/* final round enc+mac */
-	eor	v1.16b, v1.16b, v2.16b		/* xor with crypted ctr */
+	eor	v6.16b, v1.16b, v2.16b		/* xor with crypted ctr */
 	.else
 	eor	v2.16b, v2.16b, v1.16b		/* xor with crypted ctr */
-	eor	v1.16b, v2.16b, v5.16b		/* final round enc */
+	eor	v6.16b, v2.16b, v5.16b		/* final round enc */
 	.endif
 	eor	v0.16b, v0.16b, v2.16b		/* xor mac with pt ^ rk[last] */
-	st1	{v1.16b}, [x0], #16		/* write output block */
+	st1	{v6.16b}, [x0], #16		/* write output block */
 	bne	0b
 CPU_LE(	rev	x8, x8			)
 	st1	{v0.16b}, [x5]			/* store mac */
@@ -183,25 +186,31 @@ CPU_LE(	rev	x8, x8			)
 
 6:	eor	v0.16b, v0.16b, v5.16b		/* final round mac */
 	eor	v1.16b, v1.16b, v5.16b		/* final round enc */
-	st1	{v0.16b}, [x5]			/* store mac */
-	add	w2, w2, #16			/* process partial tail block */
-7:	ldrb	w9, [x1], #1			/* get 1 byte of input */
-	umov	w6, v1.b[0]			/* get top crypted ctr byte */
-	umov	w7, v0.b[0]			/* get top mac byte */
+
+	add	x1, x1, w2, sxtw		/* rewind the input pointer (w2 < 0) */
+	add	x0, x0, w2, sxtw		/* rewind the output pointer */
+
+	adr_l	x8, .Lpermute			/* load permute vectors */
+	add	x9, x8, w2, sxtw
+	sub	x8, x8, w2, sxtw
+	ld1	{v7.16b-v8.16b}, [x9]
+	ld1	{v9.16b}, [x8]
+
+	ld1	{v2.16b}, [x1]			/* load a full block of input */
+	tbl	v1.16b, {v1.16b}, v7.16b	/* move keystream to end of register */
 	.if	\enc == 1
-	eor	w7, w7, w9
-	eor	w9, w9, w6
+	tbl	v7.16b, {v2.16b}, v9.16b	/* copy plaintext to start of v7 */
+	eor	v2.16b, v2.16b, v1.16b		/* encrypt partial input block */
 	.else
-	eor	w9, w9, w6
-	eor	w7, w7, w9
+	eor	v2.16b, v2.16b, v1.16b		/* decrypt partial input block */
+	tbl	v7.16b, {v2.16b}, v9.16b	/* copy plaintext to start of v7 */
 	.endif
-	strb	w9, [x0], #1			/* store out byte */
-	strb	w7, [x5], #1			/* store mac byte */
-	subs	w2, w2, #1
-	beq	5b
-	ext	v0.16b, v0.16b, v0.16b, #1	/* shift out mac byte */
-	ext	v1.16b, v1.16b, v1.16b, #1	/* shift out ctr byte */
-	b	7b
+	eor	v0.16b, v0.16b, v7.16b		/* fold plaintext into mac */
+	tbx	v2.16b, {v6.16b}, v8.16b	/* insert output from previous iteration */
+
+	st1	{v0.16b}, [x5]			/* store mac */
+	st1	{v2.16b}, [x0]			/* store output block */
+	ret
 	.endm
 
 	/*
@@ -219,3 +228,11 @@ SYM_FUNC_END(ce_aes_ccm_encrypt)
 SYM_FUNC_START(ce_aes_ccm_decrypt)
 	aes_ccm_do_crypt	0
 SYM_FUNC_END(ce_aes_ccm_decrypt)
+
+	.section ".rodata", "a"
+	.align	6
+	.fill	15, 1, 0xff
+.Lpermute:
+	.byte	0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7
+	.byte	0x8, 0x9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf
+	.fill	15, 1, 0xff
diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c
index 2f4e6a318fcd..4710e59075f5 100644
--- a/arch/arm64/crypto/aes-ce-ccm-glue.c
+++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
@@ -181,16 +181,16 @@ static int ccm_encrypt(struct aead_request *req)
 		if (walk.nbytes == walk.total)
 			tail = 0;
 
-		if (unlikely(walk.total < AES_BLOCK_SIZE))
-			src = dst = memcpy(buf + sizeof(buf) - walk.total,
-					   src, walk.total);
+		if (unlikely(walk.nbytes < AES_BLOCK_SIZE))
+			src = dst = memcpy(&buf[sizeof(buf) - walk.nbytes],
+					   src, walk.nbytes);
 
 		ce_aes_ccm_encrypt(dst, src, walk.nbytes - tail,
 				   ctx->key_enc, num_rounds(ctx),
 				   mac, walk.iv);
 
-		if (unlikely(walk.total < AES_BLOCK_SIZE))
-			memcpy(walk.dst.virt.addr, dst, walk.total);
+		if (unlikely(walk.nbytes < AES_BLOCK_SIZE))
+			memcpy(walk.dst.virt.addr, dst, walk.nbytes);
 
 		if (walk.nbytes == walk.total)
 			ce_aes_ccm_final(mac, orig_iv, ctx->key_enc, num_rounds(ctx));
@@ -248,16 +248,16 @@ static int ccm_decrypt(struct aead_request *req)
 		if (walk.nbytes == walk.total)
 			tail = 0;
 
-		if (unlikely(walk.total < AES_BLOCK_SIZE))
-			src = dst = memcpy(buf + sizeof(buf) - walk.total,
-					   src, walk.total);
+		if (unlikely(walk.nbytes < AES_BLOCK_SIZE))
+			src = dst = memcpy(&buf[sizeof(buf) - walk.nbytes],
+					   src, walk.nbytes);
 
 		ce_aes_ccm_decrypt(dst, src, walk.nbytes - tail,
 				   ctx->key_enc, num_rounds(ctx),
 				   mac, walk.iv);
 
-		if (unlikely(walk.total < AES_BLOCK_SIZE))
-			memcpy(walk.dst.virt.addr, dst, walk.total);
+		if (unlikely(walk.nbytes < AES_BLOCK_SIZE))
+			memcpy(walk.dst.virt.addr, dst, walk.nbytes);
 
 		if (walk.nbytes == walk.total)
 			ce_aes_ccm_final(mac, orig_iv, ctx->key_enc, num_rounds(ctx));

From patchwork Thu Jan 11 12:33:08 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 761961
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1704976414; x=1705581214; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=V7NqHvpVwYIckzJz3vTZ8qk1DnUwBae1Z/b9b550F34=; b=R7xwmKon34mKd/NxLuQklgKVGFoMZojV2CuScgyiHhq32HZSudI/nUAM0KfzEcH7pQ 8b3gkdAT9rsEkaVRID+VORtSzm0HOkHZ+9pNiZST+YsT5/GsJhXsifMgeVmiiKN5X+ZK ehu/3W2G5iIzRvvQ49nN9DaxyMHv5dTT8ouIyNVtxjR1vsHZfm0+HzmjiUYlD99BSbo7 rUhraJelfQCunob2Wpp5lNrPPUNOwCSruDbMTABrNS7N4E+vcDdYj8/0jP+Wp6kok989 b9cppHmdRBoz++xUFHsTGReNzalK9Gw0ryuVx+HcqVYPgiNGQTRA2lumHf9GbG4Z/qiQ vNfA== X-Gm-Message-State: AOJu0YxwyDPXxrjGIFHrT6Sdrj5105aCkmgmbcrxVACgVEgXqKyRIWoe XdVuH6brD/kVtE49kJIHiBbyePtcP8VQ71E2fGOiaoYak5E7FQ8sr2SuQ8+SefhvuqD5HoxMaYA No5Vs5zMkgsvGc3t+3db8uNMAZRhRN6oRZ823wWFh+FNG+k1Xqpp52r/7j3jGZn8c2nn3W8M= X-Google-Smtp-Source: AGHT+IG1OE+vDmEsbe2sQorhqr94K4Y2eP/iuYTMTFAHNPq0MDGTHEk7shg3n4Q2qAwc65fyp0Se8LIf X-Received: from palermo.c.googlers.com ([fda3:e722:ac3:cc00:28:9cb1:c0a8:118a]) (user=ardb job=sendgmr) by 2002:a05:6000:1c1c:b0:337:77fc:e84f with SMTP id ba28-20020a0560001c1c00b0033777fce84fmr11442wrb.9.1704976414596; Thu, 11 Jan 2024 04:33:34 -0800 (PST) Date: Thu, 11 Jan 2024 13:33:08 +0100 In-Reply-To: <20240111123302.589910-10-ardb+git@google.com> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240111123302.589910-10-ardb+git@google.com> X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 X-Developer-Signature: v=1; a=openpgp-sha256; l=6993; i=ardb@kernel.org; h=from:subject; bh=yle/H4ZDasJ7xiFxa7Y8LY3dYulmTCAxQvljqoR89sw=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JIXX+A+Yg09Ware4npj29VlzwjHOloJ6s4mSprKfhm1ye+ wRsXx7aUcrCIMbBICumyCIw+++7nacnStU6z5KFmcPKBDKEgYtTACbyX4uR4Tiv54f8aImk8HaO Ez1z0w5rH7FxZHkTEWsSaKQwceleX0aGBpPrNim+Nsl9jTEfJDSLapsq0zIO/+dtYjt2is2hlJ8 BAA== X-Mailer: 
git-send-email 2.43.0.275.g3460e3d667-goog
Message-ID: <20240111123302.589910-15-ardb+git@google.com>
Subject: [PATCH 5/8] crypto: arm64/aes-ccm - Reuse existing MAC update for AAD input
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: ebiggers@kernel.org, herbert@gondor.apana.org.au, Ard Biesheuvel

From: Ard Biesheuvel

CCM combines the counter (CTR) encryption mode with a MAC based on the
same block cipher. This MAC construction is a bit clunky: each block
cipher invocation depends on the output of the previous one, so the
calls cannot be parallelized, resulting in poor CPU pipeline efficiency.

The arm64 CCM code mitigates this by interleaving the encryption and MAC
at the AES round level, resulting in a substantial speedup. But this
approach does not apply to the additional authenticated data (AAD),
which is not encrypted.

This means the special asm routine dealing with the AAD is no better
than the MAC update routine used by the arm64 AES block encryption
driver, so let's reuse that, and drop the special AES-CCM version.

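Reusing the generic MAC update for the AAD hinges on one piece of state: how many bytes of the current block have been XORed into the MAC but not yet encrypted (the `macp` value that `ce_aes_ccm_auth_data()` returns and that `ccm_calculate_auth_mac()` carries between scatterlist chunks). The following is a minimal portable-C model of that bookkeeping, assuming a toy 16-byte permutation (`toy_block_encrypt`, hypothetical and NOT AES) in place of the block cipher; it is a sketch, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16

/*
 * Toy 16-byte permutation standing in for AES. It is NOT a cipher; it only
 * lets the chaining logic be exercised without a crypto library.
 * (Hypothetical helper, not part of the kernel.)
 */
static void toy_block_encrypt(uint8_t b[BLK])
{
	uint8_t t[BLK];

	for (int i = 0; i < BLK; i++)
		t[i] = (uint8_t)(((b[(i + 1) % BLK] ^ (uint8_t)(i * 37)) * 5) + 3);
	memcpy(b, t, BLK);
}

/*
 * Absorb 'len' bytes into a CBC-MAC state. 'macp' counts the bytes already
 * XORed into 'mac' but not yet encrypted; it is carried across calls, like
 * the value returned by ce_aes_ccm_auth_data(). A completed block
 * (macp == BLK) is only encrypted once more input arrives.
 */
static unsigned int cbcmac_absorb(uint8_t mac[BLK], unsigned int macp,
				  const uint8_t *in, size_t len)
{
	assert(macp <= BLK);
	while (len > 0) {
		if (macp == BLK) {		/* full block pending: encrypt it */
			toy_block_encrypt(mac);
			macp = 0;
		}
		size_t n = BLK - macp;
		if (n > len)
			n = len;
		for (size_t i = 0; i < n; i++)
			mac[macp + i] ^= in[i];	/* fold input into the MAC */
		macp += (unsigned int)n;
		in += n;
		len -= n;
	}
	return macp;
}

/* Encrypt the last pending (implicitly zero-padded) block, if any. */
static void cbcmac_finish(uint8_t mac[BLK], unsigned int macp)
{
	if (macp > 0)
		toy_block_encrypt(mac);
}
```

Because pending blocks are encrypted lazily, splitting the input at arbitrary points yields the same MAC as a single call. In the patch, `ccm_calculate_auth_mac()` starts from `macp = AES_BLOCK_SIZE`, which in this model corresponds to a full block already sitting in `mac` and still waiting to be encrypted.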
Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/Kconfig | 1 + arch/arm64/crypto/aes-ce-ccm-core.S | 71 -------------------- arch/arm64/crypto/aes-ce-ccm-glue.c | 49 +++++++++++--- arch/arm64/crypto/aes-glue.c | 1 + 4 files changed, 43 insertions(+), 79 deletions(-) diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig index eb7b423ba463..e7d9bd8e4709 100644 --- a/arch/arm64/crypto/Kconfig +++ b/arch/arm64/crypto/Kconfig @@ -268,6 +268,7 @@ config CRYPTO_AES_ARM64_CE_CCM depends on ARM64 && KERNEL_MODE_NEON select CRYPTO_ALGAPI select CRYPTO_AES_ARM64_CE + select CRYPTO_AES_ARM64_CE_BLK select CRYPTO_AEAD select CRYPTO_LIB_AES help diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S index b21a9b759ab2..0132872bd780 100644 --- a/arch/arm64/crypto/aes-ce-ccm-core.S +++ b/arch/arm64/crypto/aes-ce-ccm-core.S @@ -14,77 +14,6 @@ .text .arch armv8-a+crypto - /* - * u32 ce_aes_ccm_auth_data(u8 mac[], u8 const in[], u32 abytes, - * u32 macp, u8 const rk[], u32 rounds); - */ -SYM_FUNC_START(ce_aes_ccm_auth_data) - ld1 {v0.16b}, [x0] /* load mac */ - cbz w3, 1f - sub w3, w3, #16 - eor v1.16b, v1.16b, v1.16b -0: ldrb w7, [x1], #1 /* get 1 byte of input */ - subs w2, w2, #1 - add w3, w3, #1 - ins v1.b[0], w7 - ext v1.16b, v1.16b, v1.16b, #1 /* rotate in the input bytes */ - beq 8f /* out of input? */ - cbnz w3, 0b - eor v0.16b, v0.16b, v1.16b -1: ld1 {v3.4s}, [x4] /* load first round key */ - prfm pldl1strm, [x1] - cmp w5, #12 /* which key size? 
*/ - add x6, x4, #16 - sub w7, w5, #2 /* modified # of rounds */ - bmi 2f - bne 5f - mov v5.16b, v3.16b - b 4f -2: mov v4.16b, v3.16b - ld1 {v5.4s}, [x6], #16 /* load 2nd round key */ -3: aese v0.16b, v4.16b - aesmc v0.16b, v0.16b -4: ld1 {v3.4s}, [x6], #16 /* load next round key */ - aese v0.16b, v5.16b - aesmc v0.16b, v0.16b -5: ld1 {v4.4s}, [x6], #16 /* load next round key */ - subs w7, w7, #3 - aese v0.16b, v3.16b - aesmc v0.16b, v0.16b - ld1 {v5.4s}, [x6], #16 /* load next round key */ - bpl 3b - aese v0.16b, v4.16b - subs w2, w2, #16 /* last data? */ - eor v0.16b, v0.16b, v5.16b /* final round */ - bmi 6f - ld1 {v1.16b}, [x1], #16 /* load next input block */ - eor v0.16b, v0.16b, v1.16b /* xor with mac */ - bne 1b -6: st1 {v0.16b}, [x0] /* store mac */ - beq 10f - adds w2, w2, #16 - beq 10f - mov w3, w2 -7: ldrb w7, [x1], #1 - umov w6, v0.b[0] - eor w6, w6, w7 - strb w6, [x0], #1 - subs w2, w2, #1 - beq 10f - ext v0.16b, v0.16b, v0.16b, #1 /* rotate out the mac bytes */ - b 7b -8: cbz w3, 91f - mov w7, w3 - add w3, w3, #16 -9: ext v1.16b, v1.16b, v1.16b, #1 - adds w7, w7, #1 - bne 9b -91: eor v0.16b, v0.16b, v1.16b - st1 {v0.16b}, [x0] -10: mov w0, w3 - ret -SYM_FUNC_END(ce_aes_ccm_auth_data) - /* * void ce_aes_ccm_final(u8 mac[], u8 const ctr[], u8 const rk[], * u32 rounds); diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c index 4710e59075f5..ed3d79e05112 100644 --- a/arch/arm64/crypto/aes-ce-ccm-glue.c +++ b/arch/arm64/crypto/aes-ce-ccm-glue.c @@ -18,6 +18,8 @@ #include "aes-ce-setkey.h" +MODULE_IMPORT_NS(CRYPTO_INTERNAL); + static int num_rounds(struct crypto_aes_ctx *ctx) { /* @@ -30,8 +32,9 @@ static int num_rounds(struct crypto_aes_ctx *ctx) return 6 + ctx->key_length / 4; } -asmlinkage u32 ce_aes_ccm_auth_data(u8 mac[], u8 const in[], u32 abytes, - u32 macp, u32 const rk[], u32 rounds); +asmlinkage u32 ce_aes_mac_update(u8 const in[], u32 const rk[], int rounds, + int blocks, u8 dg[], int enc_before, + int 
enc_after); asmlinkage void ce_aes_ccm_encrypt(u8 out[], u8 const in[], u32 cbytes, u32 const rk[], u32 rounds, u8 mac[], @@ -97,6 +100,41 @@ static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen) return 0; } +static u32 ce_aes_ccm_auth_data(u8 mac[], u8 const in[], u32 abytes, + u32 macp, u32 const rk[], u32 rounds) +{ + int enc_after = (macp + abytes) % AES_BLOCK_SIZE; + + do { + u32 blocks = abytes / AES_BLOCK_SIZE; + + if (macp == AES_BLOCK_SIZE || (!macp && blocks > 0)) { + u32 rem = ce_aes_mac_update(in, rk, rounds, blocks, mac, + macp, enc_after); + u32 adv = (blocks - rem) * AES_BLOCK_SIZE; + + macp = enc_after ? 0 : AES_BLOCK_SIZE; + in += adv; + abytes -= adv; + + if (unlikely(rem)) { + kernel_neon_end(); + kernel_neon_begin(); + macp = 0; + } + } else { + u32 l = min(AES_BLOCK_SIZE - macp, abytes); + + crypto_xor(&mac[macp], in, l); + in += l; + macp += l; + abytes -= l; + } + } while (abytes > 0); + + return macp; +} + static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[]) { struct crypto_aead *aead = crypto_aead_reqtfm(req); @@ -104,7 +142,7 @@ static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[]) struct __packed { __be16 l; __be32 h; u16 len; } ltag; struct scatter_walk walk; u32 len = req->assoclen; - u32 macp = 0; + u32 macp = AES_BLOCK_SIZE; /* prepend the AAD with a length tag */ if (len < 0xff00) { @@ -128,16 +166,11 @@ static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[]) scatterwalk_start(&walk, sg_next(walk.sg)); n = scatterwalk_clamp(&walk, len); } - n = min_t(u32, n, SZ_4K); /* yield NEON at least every 4k */ p = scatterwalk_map(&walk); macp = ce_aes_ccm_auth_data(mac, p, n, macp, ctx->key_enc, num_rounds(ctx)); - if (len / SZ_4K > (len - n) / SZ_4K) { - kernel_neon_end(); - kernel_neon_begin(); - } len -= n; scatterwalk_unmap(p); diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c index 162787c7aa86..a147e847a5a1 100644 --- 
a/arch/arm64/crypto/aes-glue.c +++ b/arch/arm64/crypto/aes-glue.c @@ -1048,6 +1048,7 @@ static int __init aes_init(void) #ifdef USE_V8_CRYPTO_EXTENSIONS module_cpu_feature_match(AES, aes_init); +EXPORT_SYMBOL_NS(ce_aes_mac_update, CRYPTO_INTERNAL); #else module_init(aes_init); EXPORT_SYMBOL(neon_aes_ecb_encrypt); From patchwork Thu Jan 11 12:33:09 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 762244 Received: from mail-wm1-f74.google.com (mail-wm1-f74.google.com [209.85.128.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A5DB015AC2 for ; Thu, 11 Jan 2024 12:33:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--ardb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="lQD92y6f" Received: by mail-wm1-f74.google.com with SMTP id 5b1f17b1804b1-40e530b7596so18866135e9.1 for ; Thu, 11 Jan 2024 04:33:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1704976417; x=1705581217; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=tshT8cVe3rLNAhTq0AmYLGXGv7+SRdWkw2tW13vqpSo=; b=lQD92y6f5cMhRruv4Bob2pS5nu72o0mjXzJ2E7RVD7sEO5yytbjYPJs52ZIbZc0lXg IMwnxS/yEaPWrat0A8hBXIo/mYo5gCGg/FU+bzONiJopeJlsIDbcY3a/mDHVCzFaPnfq plhyYXP07kBj7nvxCDLbDs+Dx6l3Yl2dDGQOpMPl18cxwqhIC+KYettUutFmEVhVzc7m 7COjzB4JjSpXRC7Gg/kPAIpNDGHly2CQWcG+qvdfwj5ulbXrrawU7iKdrkYs0dYRc/Ak CFTthfWnbA+fR+2v7QeWvczY5oiEorAYGlu0T7+JqRPfoWxAlme6gmAlAOtQU6qkTG60 4UwQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=1e100.net; s=20230601; t=1704976417; x=1705581217; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=tshT8cVe3rLNAhTq0AmYLGXGv7+SRdWkw2tW13vqpSo=; b=ChPBGxad8c4VQ8tzmVSxPeQ5tL+JOjZOvx80qGVKnowSxP/TTCPvfJluCxKTSRB+7O F3Jdv9LF7dVyUzGKl5cU3oyYhqwSDL5SIBSGKJBM4fvwtqCR0M+DYB9Qy+mngKD/VFPg m7BKPjnKUisgabGHEYfUcH69AoFXHD4FOWNgFy+ZhABbEJ0hOmhAL+9DUU9WoYuFNbpw N6C4qX2rYuyYbGRs2pLN36MLx+AN0KxQJDkZWDMtplMePcS4eD/pwsq5FZrZErqAQ4bI HdkXhbuhVhzq5qyREuCuvYBcyCx1Ku42AzXdLnqppksckcGA5E62TbFv7bwa+NTE4jil gxdg== X-Gm-Message-State: AOJu0YyUA/d9dctVb7/YEcYSWPqP3DBjk5zEKggM9Aq6JSfrMWrEuFlw ikTnufgK02sjDt5sLCwERE2kcSDnv6NP+ZIfT7YPhq4xl+3UYceNIddK8Uf6SNFCWam+cvOWQUF oskRZUEWE7JfGfJk9vQMezVTZ8/ugnxPhD/VTRq3yisvREsLYKPsH95iR/5Cu9uW9wefjrIg= X-Google-Smtp-Source: AGHT+IHggGQ8Q7oqdoxLFnr4dEDiGu6WfeqncJaKlv3cSRqySJ7Lj8nZ8t63ceb0nTMxXPddJivP1mpG X-Received: from palermo.c.googlers.com ([fda3:e722:ac3:cc00:28:9cb1:c0a8:118a]) (user=ardb job=sendgmr) by 2002:a05:600c:4f12:b0:40e:4cae:a416 with SMTP id l18-20020a05600c4f1200b0040e4caea416mr8012wmq.1.1704976416765; Thu, 11 Jan 2024 04:33:36 -0800 (PST) Date: Thu, 11 Jan 2024 13:33:09 +0100 In-Reply-To: <20240111123302.589910-10-ardb+git@google.com> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240111123302.589910-10-ardb+git@google.com> X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 X-Developer-Signature: v=1; a=openpgp-sha256; l=4505; i=ardb@kernel.org; h=from:subject; bh=Gmp1Fk2CXqJjN2EJN3MlWDoy+XN+d7Zbe69qrq+tdjU=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JIXX+A5aem5OjL+asvf/rJcePP7k5TGuCwhrLfsxeUTup9 IGIRZNYRykLgxgHg6yYIovA7L/vdp6eKFXrPEsWZg4rE8gQBi5OAZjIprWMDPNmdej/m+3Yy2Yz 94GO1tr+wi1/T7H8bu6fktqyetnUTxcYGRpmNqccy1lw2HvRlM/X11acKvyf1d9oNPmLIPuK8wb 1TYwA X-Mailer: git-send-email 2.43.0.275.g3460e3d667-goog Message-ID: 
<20240111123302.589910-16-ardb+git@google.com>
Subject: [PATCH 6/8] crypto: arm64/aes-ccm - Cache round keys and unroll AES loops
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: ebiggers@kernel.org, herbert@gondor.apana.org.au, Ard Biesheuvel

From: Ard Biesheuvel

The CCM code as originally written attempted to use as few NEON
registers as possible, to avoid having to eagerly preserve/restore the
entire NEON register file at every call to kernel_neon_begin/end. At
that time, this API took a number of NEON registers as a parameter, and
only preserved that many registers.

Today, the NEON register file is restored lazily, and the old API is
long gone. This means we can use as many NEON registers as we can make
meaningful use of; in the AES case, that means keeping all round keys in
registers rather than reloading each of them for every AES block
processed.

On Cortex-A53, this results in a speedup of more than 50% (from 4 cycles
per byte to 2.6 cycles per byte).

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/aes-ce-ccm-core.S | 95 ++++++++------------
 1 file changed, 38 insertions(+), 57 deletions(-)

diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
index 0132872bd780..0ec59fc4ef3e 100644
--- a/arch/arm64/crypto/aes-ce-ccm-core.S
+++ b/arch/arm64/crypto/aes-ce-ccm-core.S
@@ -14,40 +14,46 @@
 	.text
 	.arch	armv8-a+crypto
 
+	.macro	load_round_keys, rk, nr, tmp
+	sub	w\tmp, \nr, #10
+	add	\tmp, \rk, w\tmp, sxtw #4
+	ld1	{v10.4s-v13.4s}, [\rk]
+	ld1	{v14.4s-v17.4s}, [\tmp], #64
+	ld1	{v18.4s-v21.4s}, [\tmp], #64
+	ld1	{v3.4s-v5.4s}, [\tmp]
+	.endm
+
+	.macro	dround, va, vb, vk
+	aese	\va\().16b, \vk\().16b
+	aesmc	\va\().16b, \va\().16b
+	aese	\vb\().16b, \vk\().16b
+	aesmc	\vb\().16b, \vb\().16b
+	.endm
+
+	.macro	aes_encrypt, va, vb, nr
+	tbz	\nr, #2, .L\@
+	dround	\va, \vb, v10
+	dround	\va, \vb, v11
+	tbz	\nr, #1, .L\@
+	dround	\va, \vb, v12
+	dround	\va, \vb, v13
+.L\@:	.irp	v, v14, v15, v16, v17, v18, v19, v20,
v21, v3 + dround \va, \vb, \v + .endr + aese \va\().16b, v4.16b + aese \vb\().16b, v4.16b + .endm + /* * void ce_aes_ccm_final(u8 mac[], u8 const ctr[], u8 const rk[], * u32 rounds); */ SYM_FUNC_START(ce_aes_ccm_final) - ld1 {v3.4s}, [x2], #16 /* load first round key */ ld1 {v0.16b}, [x0] /* load mac */ - cmp w3, #12 /* which key size? */ - sub w3, w3, #2 /* modified # of rounds */ ld1 {v1.16b}, [x1] /* load 1st ctriv */ - bmi 0f - bne 3f - mov v5.16b, v3.16b - b 2f -0: mov v4.16b, v3.16b -1: ld1 {v5.4s}, [x2], #16 /* load next round key */ - aese v0.16b, v4.16b - aesmc v0.16b, v0.16b - aese v1.16b, v4.16b - aesmc v1.16b, v1.16b -2: ld1 {v3.4s}, [x2], #16 /* load next round key */ - aese v0.16b, v5.16b - aesmc v0.16b, v0.16b - aese v1.16b, v5.16b - aesmc v1.16b, v1.16b -3: ld1 {v4.4s}, [x2], #16 /* load next round key */ - subs w3, w3, #3 - aese v0.16b, v3.16b - aesmc v0.16b, v0.16b - aese v1.16b, v3.16b - aesmc v1.16b, v1.16b - bpl 1b - aese v0.16b, v4.16b - aese v1.16b, v4.16b + + aes_encrypt v0, v1, w3 + /* final round key cancels out */ eor v0.16b, v0.16b, v1.16b /* en-/decrypt the mac */ st1 {v0.16b}, [x0] /* store result */ @@ -55,6 +61,8 @@ SYM_FUNC_START(ce_aes_ccm_final) SYM_FUNC_END(ce_aes_ccm_final) .macro aes_ccm_do_crypt,enc + load_round_keys x3, w4, x10 + cbz x2, 5f ldr x8, [x6, #8] /* load lower ctr */ ld1 {v0.16b}, [x5] /* load mac */ @@ -64,37 +72,10 @@ CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */ prfm pldl1strm, [x1] add x8, x8, #1 rev x9, x8 - cmp w4, #12 /* which key size? 
*/ - sub w7, w4, #2 /* get modified # of rounds */ ins v1.d[1], x9 /* no carry in lower ctr */ - ld1 {v3.4s}, [x3] /* load first round key */ - add x10, x3, #16 - bmi 1f - bne 4f - mov v5.16b, v3.16b - b 3f -1: mov v4.16b, v3.16b - ld1 {v5.4s}, [x10], #16 /* load 2nd round key */ -2: /* inner loop: 3 rounds, 2x interleaved */ - aese v0.16b, v4.16b - aesmc v0.16b, v0.16b - aese v1.16b, v4.16b - aesmc v1.16b, v1.16b -3: ld1 {v3.4s}, [x10], #16 /* load next round key */ - aese v0.16b, v5.16b - aesmc v0.16b, v0.16b - aese v1.16b, v5.16b - aesmc v1.16b, v1.16b -4: ld1 {v4.4s}, [x10], #16 /* load next round key */ - subs w7, w7, #3 - aese v0.16b, v3.16b - aesmc v0.16b, v0.16b - aese v1.16b, v3.16b - aesmc v1.16b, v1.16b - ld1 {v5.4s}, [x10], #16 /* load next round key */ - bpl 2b - aese v0.16b, v4.16b - aese v1.16b, v4.16b + + aes_encrypt v0, v1, w4 + subs w2, w2, #16 bmi 6f /* partial block? */ ld1 {v2.16b}, [x1], #16 /* load next input block */ From patchwork Thu Jan 11 12:33:10 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 761960 Received: from mail-wr1-f73.google.com (mail-wr1-f73.google.com [209.85.221.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D134215AC2 for ; Thu, 11 Jan 2024 12:33:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--ardb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="WC8Br4CZ" Received: by mail-wr1-f73.google.com with SMTP id ffacd0b85a97d-33770774fe4so2121783f8f.3 for ; Thu, 11 Jan 2024 04:33:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; 
t=1704976419; x=1705581219; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=gsbXqpi9ULnrk+17fUSwxAvQV2SegrOTXEZgPPiuxxY=; b=WC8Br4CZibpnKlLkRXjel96w0Ao+AkCqaP5H/ynrBGslJiB9eIkf/uKh50ceoVJ5MP LVPJB5b6HH1iATFWWVwN8A6gVp+rvvy5R6XzXM9iRqZab/p/o3EFFnxifrHWboNEHBLu vFdHIDfM4GZkXPW2gTsNn5l/+0OQj+zDkf0hK0rej24xWnostjlGz4cWu6fma8VMyKU+ kQYVb8swNEg2ZNDI/hx6yLStdK+oJ68ebuzWFblMQBB57CRjqVypFX6NpLoTMCEeQgKG HZndB8AxP5a/bFDB6A4GH+M+XiiVvVL7uOPIVzYMB5V/eNLlC7MyKsY41f+vydyKwaiY mX9Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1704976419; x=1705581219; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=gsbXqpi9ULnrk+17fUSwxAvQV2SegrOTXEZgPPiuxxY=; b=lhrs6S9YJV7uQZL0gl3En3rXTpo/KwxjqEK6p4DmmwduHrfk/GVHsLpqmHmhMLEDhL abOOim7bFA6ygO2wzGfYIOFZCKCc67g+ObpfmyaSfnVQvB9D0Va0MZDvg1ZSyzCizKRZ 7C//Wv3J4PFaqflJDRBN8/HjPjktD98luVuAxijAktzS8zpPTjrhNAYPlbQDPnFNybXm xATNZ3OK+Jr4IwpjEeyuUrSBGm7Uqj88PfzznHgfNSpeCR03zTpwubuxQn3BLH+Wyth9 K1H8VE0yGywjMQHcOUyxdGQYCrmRnW1SBVaONBjEogywmb2Jx5NzBEL+GKEDf0yzzpnR 6GaA== X-Gm-Message-State: AOJu0Yznx99oqx4FusqauOJmPY1M5fOS4FQXEi3/Ue43lGMrjiOg9PVQ caycdTs2y2wrl9nc4q2jlCE8bqjViAKRVzTvmBHc3mmbUkGCeKf2ofDJweCpOb0rWc15R1wuCZ8 ji49YOypyFDFGIzNmySvDXBminInWU/kV6PlI5For11ff5INimXgj6EIVN1wzxi+QYOyYQoU= X-Google-Smtp-Source: AGHT+IGZp3rZxcW6bq2OIhBvBAHr4dmfmsZ4GhwsvHccNRC8bNAK7xqSVfWZf2gjvw+ImFtAG6Jzadue X-Received: from palermo.c.googlers.com ([fda3:e722:ac3:cc00:28:9cb1:c0a8:118a]) (user=ardb job=sendgmr) by 2002:a5d:5a1a:0:b0:337:8f8a:7ea4 with SMTP id bq26-20020a5d5a1a000000b003378f8a7ea4mr87wrb.3.1704976419187; Thu, 11 Jan 2024 04:33:39 -0800 (PST) Date: Thu, 11 Jan 2024 13:33:10 +0100 In-Reply-To: <20240111123302.589910-10-ardb+git@google.com> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: 
List-Unsubscribe:
Mime-Version: 1.0
References: <20240111123302.589910-10-ardb+git@google.com>
X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909
X-Developer-Signature: v=1; a=openpgp-sha256; l=3906; i=ardb@kernel.org; h=from:subject; bh=MZfjFiuAXpOqcK82PSkc20GbzzFi3AmIpe6u5XzRKDY=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JIXX+A9bMbJuV8s//7i9l28y0RrLuCvPPhP1z64/8U23uu 8xbeXZ7RykLgxgHg6yYIovA7L/vdp6eKFXrPEsWZg4rE8gQBi5OAZiIdA/D/5ptM8s0W1bGFlyo mzpffeUEkXNuQn++T9v/QK1UR0fY+gTDf8eCjpwFzx6tLunS1GKP2CpaN3uy+4lPAe8NIxxerFd q4QQA
X-Mailer: git-send-email 2.43.0.275.g3460e3d667-goog
Message-ID: <20240111123302.589910-17-ardb+git@google.com>
Subject: [PATCH 7/8] crypto: arm64/aes-ccm - Merge encrypt and decrypt asm routines
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: ebiggers@kernel.org, herbert@gondor.apana.org.au, Ard Biesheuvel

From: Ard Biesheuvel

The encryption and decryption code paths are mostly identical, except
for a small difference in where the plaintext input into the MAC is
taken from: the input block when encrypting, and the output block when
decrypting. We can factor this in quite easily using a vector bit
select, and a few additional XORs, without the need for branches. This
way, we can use the same asm helper on the encrypt and decrypt code
paths.

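The branch-free selection described above can be modelled byte by byte in C. In this sketch, `bit_select` mimics the semantics of the NEON `BSL` instruction, and `mask` plays the role of `v22` (all ones for encryption, all zeros for decryption, as set by `movi` in the patch). The function names are hypothetical, and the AES encryption of the MAC state is deliberately left out so that only the plaintext-selection step is shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Scalar model of the NEON BSL instruction: take the bits of 'a' where
 * 'mask' is 1 and the bits of 'b' where 'mask' is 0.
 */
static uint8_t bit_select(uint8_t mask, uint8_t a, uint8_t b)
{
	return (uint8_t)((mask & a) | (~mask & b));
}

/*
 * Process one 16-byte block in either direction. 'mask' is 0xff for
 * encryption and 0x00 for decryption. The CTR step is the same XOR both
 * ways; only the MAC input differs, and it is chosen without a branch.
 */
static void ccm_block(uint8_t mask, const uint8_t in[16],
		      const uint8_t keystream[16], uint8_t out[16],
		      uint8_t mac[16])
{
	for (int i = 0; i < 16; i++) {
		out[i] = in[i] ^ keystream[i];
		/* encrypt: plaintext is the input; decrypt: it is the output */
		mac[i] ^= bit_select(mask, in[i], out[i]);
	}
}
```

Because the CTR transform is its own inverse, encrypting with mask 0xff and then decrypting the result with mask 0x00 folds the same plaintext into the MAC in both directions.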
Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-ce-ccm-core.S | 41 +++++++++----------- 1 file changed, 18 insertions(+), 23 deletions(-) diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S index 0ec59fc4ef3e..75be3157bae1 100644 --- a/arch/arm64/crypto/aes-ce-ccm-core.S +++ b/arch/arm64/crypto/aes-ce-ccm-core.S @@ -60,7 +60,7 @@ SYM_FUNC_START(ce_aes_ccm_final) ret SYM_FUNC_END(ce_aes_ccm_final) - .macro aes_ccm_do_crypt,enc +SYM_FUNC_START_LOCAL(aes_ccm_do_crypt) load_round_keys x3, w4, x10 cbz x2, 5f @@ -76,28 +76,24 @@ CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */ aes_encrypt v0, v1, w4 + eor v0.16b, v0.16b, v5.16b /* final round mac */ + eor v1.16b, v1.16b, v5.16b /* final round enc */ subs w2, w2, #16 bmi 6f /* partial block? */ ld1 {v2.16b}, [x1], #16 /* load next input block */ - .if \enc == 1 - eor v2.16b, v2.16b, v5.16b /* final round enc+mac */ - eor v6.16b, v1.16b, v2.16b /* xor with crypted ctr */ - .else - eor v2.16b, v2.16b, v1.16b /* xor with crypted ctr */ - eor v6.16b, v2.16b, v5.16b /* final round enc */ - .endif - eor v0.16b, v0.16b, v2.16b /* xor mac with pt ^ rk[last] */ + eor v6.16b, v2.16b, v1.16b /* en/decrypt input block */ + mov v23.16b, v22.16b + bsl v23.16b, v2.16b, v6.16b /* select plaintext */ st1 {v6.16b}, [x0], #16 /* write output block */ + eor v0.16b, v0.16b, v23.16b /* fold plaintext into mac */ + bne 0b CPU_LE( rev x8, x8 ) st1 {v0.16b}, [x5] /* store mac */ str x8, [x6, #8] /* store lsb end of ctr (BE) */ 5: ret -6: eor v0.16b, v0.16b, v5.16b /* final round mac */ - eor v1.16b, v1.16b, v5.16b /* final round enc */ - - add x1, x1, w2, sxtw /* rewind the input pointer (w2 < 0) */ +6: add x1, x1, w2, sxtw /* rewind the input pointer (w2 < 0) */ add x0, x0, w2, sxtw /* rewind the output pointer */ adr_l x8, .Lpermute /* load permute vectors */ @@ -108,20 +104,17 @@ CPU_LE( rev x8, x8 ) ld1 {v2.16b}, [x1] /* load a full block of input */ tbl v1.16b, {v1.16b}, v7.16b /* move keystream 
to end of register */ - .if \enc == 1 - tbl v7.16b, {v2.16b}, v9.16b /* copy plaintext to start of v7 */ + tbl v7.16b, {v2.16b}, v9.16b /* copy input block to start of v7 */ eor v2.16b, v2.16b, v1.16b /* encrypt partial input block */ - .else - eor v2.16b, v2.16b, v1.16b /* decrypt partial input block */ - tbl v7.16b, {v2.16b}, v9.16b /* copy plaintext to start of v7 */ - .endif - eor v0.16b, v0.16b, v7.16b /* fold plaintext into mac */ + tbl v9.16b, {v2.16b}, v9.16b /* copy output block to start of v9 */ + bsl v22.16b, v7.16b, v9.16b /* select plaintext */ + eor v0.16b, v0.16b, v22.16b /* fold plaintext into mac */ tbx v2.16b, {v6.16b}, v8.16b /* insert output from previous iteration */ st1 {v0.16b}, [x5] /* store mac */ st1 {v2.16b}, [x0] /* store output block */ ret - .endm +SYM_FUNC_END(aes_ccm_do_crypt) /* * void ce_aes_ccm_encrypt(u8 out[], u8 const in[], u32 cbytes, @@ -132,11 +125,13 @@ CPU_LE( rev x8, x8 ) * u8 ctr[]); */ SYM_FUNC_START(ce_aes_ccm_encrypt) - aes_ccm_do_crypt 1 + movi v22.16b, #255 + b aes_ccm_do_crypt SYM_FUNC_END(ce_aes_ccm_encrypt) SYM_FUNC_START(ce_aes_ccm_decrypt) - aes_ccm_do_crypt 0 + movi v22.16b, #0 + b aes_ccm_do_crypt SYM_FUNC_END(ce_aes_ccm_decrypt) .section ".rodata", "a" From patchwork Thu Jan 11 12:33:11 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 762243 Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6268115AD1 for ; Thu, 11 Jan 2024 12:33:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--ardb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit 
key) header.d=google.com header.i=@google.com header.b="rIpAcpR8" Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-5f9e5455db1so38300657b3.1 for ; Thu, 11 Jan 2024 04:33:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1704976421; x=1705581221; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=eklahTTMpbyoQkh7ltPW8pyjirh03W9lGutuGnzNM/Y=; b=rIpAcpR8n6942ZzKqoHH0yHjCUFWmGpjjijuKZn+BFroxIDPfFng17emOgoIXQvqnQ 0EzzABSmd/XGOlsKUHo8BktSkSO2TtfOVzc70KrVBiSJnKb51xcM2z16szIPSHBTxSJX wSM698Ui+L1+CbuHa1oDlofuNR8yAB8PtGYUuaCX6WD/vwEjQQUesBw46s3G6+NOutDa b0qQHNjFA7vYYoo3yau56TMtH2S8Ol2Vo0y7F4EwI+bd0WB7g2XD8KjwFPYfLsBhkFZu THtYAe7R3LcT7cUS/RrJXS1TEtNiG1A8ooUI4Iuub2wI808NKaq/ayF6wLUFG4ITpJbp X+jw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1704976421; x=1705581221; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=eklahTTMpbyoQkh7ltPW8pyjirh03W9lGutuGnzNM/Y=; b=oouEM1alj9UI5hIIRjYzP2NDPDlCtzlduS+Z9/youz7t6QrRzOf4SHHie2eQHNYsmD 2zo6BMqonYUij/FaYbAMe1AjpTWTzOQrMdNzZQ6HbcrJECMcaXroSvka7DQFlsoPmDXq iDdJaiDrOm5ToJ/aauNwzaOa00ED5X4LHWHoIIxRMBscQuvos4U4Ro+sgTU75I9bkdAC cPRQNJj41ce5//1YU7Jyqcd8KYbiEYngoBDgYtW/srVF5CWpal4tOVaL1+hPExpytIlk OYlgjs/aj5H2wgcppQZrFSSam8cSwPkMCI4TxiBh9Jt+J0+e77ZCG8gzj2P5nNSz0C8r Cmjw== X-Gm-Message-State: AOJu0YzYcEhx1PGIxp2xR5QSBhN3TObURbi8c1dS14Oh5gD8G829T/qZ Jo2bGeu/YbiZM1moIvjsLGfiUC3uL6weFuC5uesZcQvsNAKYUN/+TN6t6GUNcsEYnWrVClGi9AJ j4wbHLhZ46oh8d+D+MrqVr48YyBjhS/dUpA0DSfDTEH0mrYlxILap2oJaLZelHWGhZFK88dE= X-Google-Smtp-Source: AGHT+IGO1eSQPerE9rze0TtrTlYckOhcTkXthLkkw+Wb2njY8VSyqGcPni83vwBLeIRXw1aCHS3Lt5um X-Received: from palermo.c.googlers.com ([fda3:e722:ac3:cc00:28:9cb1:c0a8:118a]) (user=ardb job=sendgmr) by 2002:a0d:f902:0:b0:5e3:5f02:360a with SMTP id 
j2-20020a0df902000000b005e35f02360amr174268ywf.9.1704976421349; Thu, 11 Jan 2024 04:33:41 -0800 (PST)
Date: Thu, 11 Jan 2024 13:33:11 +0100
In-Reply-To: <20240111123302.589910-10-ardb+git@google.com>
Precedence: bulk
X-Mailing-List: linux-crypto@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
Mime-Version: 1.0
References: <20240111123302.589910-10-ardb+git@google.com>
X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909
X-Developer-Signature: v=1; a=openpgp-sha256; l=5402; i=ardb@kernel.org; h=from:subject; bh=NB9jjOWrNbjzPCu01cSMqRp50yxJ8npXTVcNG1ZF/t8=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JIXX+A7Y3HvdOBuaG5wnK+T46vOr94vZHPXnNT1octyr+L t5SVnq0o5SFQYyDQVZMkUVg9t93O09PlKp1niULM4eVCWQIAxenAExk+naG/06rghbbNdim3//B e2DFDVurGxxeK2L37nyYP/Xh/hvpbQmMDJuSa6V9X6++yK/5oUR/upzLj3sfriksFv2QHblTWIl JmQMA
X-Mailer: git-send-email 2.43.0.275.g3460e3d667-goog
Message-ID: <20240111123302.589910-18-ardb+git@google.com>
Subject: [PATCH 8/8] crypto: arm64/aes-ccm - Merge finalization into en/decrypt asm helper
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: ebiggers@kernel.org, herbert@gondor.apana.org.au, Ard Biesheuvel

From: Ard Biesheuvel

The C glue code already infers whether or not the current iteration is
the final one, by comparing walk.nbytes with walk.total. This means we
can easily inform the asm helper of this as well, by conditionally
passing a pointer to the original IV (NULL on all but the final
iteration), which is used in the finalization of the MAC. This removes
the need for a separate call into the asm code to perform the
finalization.

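The glue-side control flow described above can be sketched with plain C stand-ins for the asm helper and the skcipher walk (`fake_ccm_helper`, `glue_loop` and `struct helper_log` are hypothetical). The model assumes `walk.total` holds the bytes still to be processed: every chunk gets a NULL `final_iv` except the one that covers all remaining bytes, so the MAC is finalized exactly once, inside the last helper call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Records what the hypothetical stand-in for the asm helper was asked to do. */
struct helper_log {
	int calls;
	int final_call;		/* index of the call that finalized, or -1 */
};

/* Hypothetical stand-in for ce_aes_ccm_encrypt()/ce_aes_ccm_decrypt(). */
static void fake_ccm_helper(struct helper_log *log, size_t nbytes,
			    const uint8_t *final_iv)
{
	(void)nbytes;
	if (final_iv != NULL) {			/* asm would finalize the MAC here */
		assert(log->final_call < 0);	/* must happen only once */
		log->final_call = log->calls;
	}
	log->calls++;
}

/*
 * Model of the glue loop: chunks[] plays the role of successive walk.nbytes
 * values, and 'remaining' plays the role of walk.total.
 */
static void glue_loop(struct helper_log *log, const size_t *chunks,
		      size_t nchunks, size_t total, const uint8_t *orig_iv)
{
	size_t remaining = total;

	for (size_t i = 0; i < nchunks; i++) {
		const uint8_t *final_iv = NULL;

		if (chunks[i] == remaining)	/* last chunk of the request */
			final_iv = orig_iv;
		fake_ccm_helper(log, chunks[i], final_iv);
		remaining -= chunks[i];
	}
}
```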
Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-ce-ccm-core.S | 32 ++++++++------------ arch/arm64/crypto/aes-ce-ccm-glue.c | 27 ++++++++--------- 2 files changed, 24 insertions(+), 35 deletions(-) diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S index 75be3157bae1..c0d89f8ae4c4 100644 --- a/arch/arm64/crypto/aes-ce-ccm-core.S +++ b/arch/arm64/crypto/aes-ce-ccm-core.S @@ -44,28 +44,12 @@ aese \vb\().16b, v4.16b .endm - /* - * void ce_aes_ccm_final(u8 mac[], u8 const ctr[], u8 const rk[], - * u32 rounds); - */ -SYM_FUNC_START(ce_aes_ccm_final) - ld1 {v0.16b}, [x0] /* load mac */ - ld1 {v1.16b}, [x1] /* load 1st ctriv */ - - aes_encrypt v0, v1, w3 - - /* final round key cancels out */ - eor v0.16b, v0.16b, v1.16b /* en-/decrypt the mac */ - st1 {v0.16b}, [x0] /* store result */ - ret -SYM_FUNC_END(ce_aes_ccm_final) - SYM_FUNC_START_LOCAL(aes_ccm_do_crypt) load_round_keys x3, w4, x10 + ld1 {v0.16b}, [x5] /* load mac */ cbz x2, 5f ldr x8, [x6, #8] /* load lower ctr */ - ld1 {v0.16b}, [x5] /* load mac */ CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */ 0: /* outer loop */ ld1 {v1.8b}, [x6] /* load upper ctr */ @@ -89,9 +73,9 @@ CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */ bne 0b CPU_LE( rev x8, x8 ) - st1 {v0.16b}, [x5] /* store mac */ str x8, [x6, #8] /* store lsb end of ctr (BE) */ -5: ret +5: cbz x7, 8f + b 7f 6: add x1, x1, w2, sxtw /* rewind the input pointer (w2 < 0) */ add x0, x0, w2, sxtw /* rewind the output pointer */ @@ -111,8 +95,16 @@ CPU_LE( rev x8, x8 ) eor v0.16b, v0.16b, v22.16b /* fold plaintext into mac */ tbx v2.16b, {v6.16b}, v8.16b /* insert output from previous iteration */ - st1 {v0.16b}, [x5] /* store mac */ st1 {v2.16b}, [x0] /* store output block */ + cbz x7, 8f /* time to finalize MAC? 
*/ +7: ld1 {v1.16b}, [x7] /* load 1st ctriv */ + + aes_encrypt v0, v1, w4 + + /* final round key cancels out */ + eor v0.16b, v0.16b, v1.16b /* en-/decrypt the mac */ + +8: st1 {v0.16b}, [x5] /* store mac */ ret SYM_FUNC_END(aes_ccm_do_crypt) diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c index ed3d79e05112..ce9b28e3c7d6 100644 --- a/arch/arm64/crypto/aes-ce-ccm-glue.c +++ b/arch/arm64/crypto/aes-ce-ccm-glue.c @@ -38,14 +38,11 @@ asmlinkage u32 ce_aes_mac_update(u8 const in[], u32 const rk[], int rounds, asmlinkage void ce_aes_ccm_encrypt(u8 out[], u8 const in[], u32 cbytes, u32 const rk[], u32 rounds, u8 mac[], - u8 ctr[]); + u8 ctr[], u8 const final_iv[]); asmlinkage void ce_aes_ccm_decrypt(u8 out[], u8 const in[], u32 cbytes, u32 const rk[], u32 rounds, u8 mac[], - u8 ctr[]); - -asmlinkage void ce_aes_ccm_final(u8 mac[], u8 const ctr[], u32 const rk[], - u32 rounds); + u8 ctr[], u8 const final_iv[]); static int ccm_setkey(struct crypto_aead *tfm, const u8 *in_key, unsigned int key_len) @@ -210,9 +207,12 @@ static int ccm_encrypt(struct aead_request *req) const u8 *src = walk.src.virt.addr; u8 *dst = walk.dst.virt.addr; u8 buf[AES_BLOCK_SIZE]; + u8 *final_iv = NULL; - if (walk.nbytes == walk.total) + if (walk.nbytes == walk.total) { tail = 0; + final_iv = orig_iv; + } if (unlikely(walk.nbytes < AES_BLOCK_SIZE)) src = dst = memcpy(&buf[sizeof(buf) - walk.nbytes], @@ -220,14 +220,11 @@ static int ccm_encrypt(struct aead_request *req) ce_aes_ccm_encrypt(dst, src, walk.nbytes - tail, ctx->key_enc, num_rounds(ctx), - mac, walk.iv); + mac, walk.iv, final_iv); if (unlikely(walk.nbytes < AES_BLOCK_SIZE)) memcpy(walk.dst.virt.addr, dst, walk.nbytes); - if (walk.nbytes == walk.total) - ce_aes_ccm_final(mac, orig_iv, ctx->key_enc, num_rounds(ctx)); - if (walk.nbytes) { err = skcipher_walk_done(&walk, tail); } @@ -277,9 +274,12 @@ static int ccm_decrypt(struct aead_request *req) const u8 *src = walk.src.virt.addr; u8 *dst = 
walk.dst.virt.addr; u8 buf[AES_BLOCK_SIZE]; + u8 *final_iv = NULL; - if (walk.nbytes == walk.total) + if (walk.nbytes == walk.total) { tail = 0; + final_iv = orig_iv; + } if (unlikely(walk.nbytes < AES_BLOCK_SIZE)) src = dst = memcpy(&buf[sizeof(buf) - walk.nbytes], @@ -287,14 +287,11 @@ static int ccm_decrypt(struct aead_request *req) ce_aes_ccm_decrypt(dst, src, walk.nbytes - tail, ctx->key_enc, num_rounds(ctx), - mac, walk.iv); + mac, walk.iv, final_iv); if (unlikely(walk.nbytes < AES_BLOCK_SIZE)) memcpy(walk.dst.virt.addr, dst, walk.nbytes); - if (walk.nbytes == walk.total) - ce_aes_ccm_final(mac, orig_iv, ctx->key_enc, num_rounds(ctx)); - if (walk.nbytes) { err = skcipher_walk_done(&walk, tail); }
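The `aes_encrypt` macro used in the finalization path above (added in patch 6) keeps up to 15 round keys in v10-v21 and v3-v5, and decides how many of them to apply by testing bits 2 and 1 of the round count. The following portable-C sketch (hypothetical helpers `slot_key` and `aes_encrypt_keys`) models only that key-selection logic, not the AES rounds themselves, and checks that for 10, 12 and 14 rounds the keys are consumed in order starting from `rk[0]`.

```c
#include <assert.h>

/*
 * Round-key index held by each register filled by load_round_keys,
 * numbered in the order aes_encrypt uses them:
 *   slots 0..3  -> v10..v13, loaded from rk[0..3]
 *   slots 4..14 -> v14..v21, v3, v4, v5, loaded from rk[nr - 10] onwards
 */
static int slot_key(int slot, int nr)
{
	assert(nr == 10 || nr == 12 || nr == 14);
	assert(slot >= 0 && slot <= 14);
	return slot < 4 ? slot : (nr - 10) + (slot - 4);
}

/*
 * Store the round-key indices consumed by AESE in aes_encrypt, in order,
 * and return how many there are. The key in slot 14 (v5) is not applied
 * by the macro itself.
 */
static int aes_encrypt_keys(int nr, int out[14])
{
	int n = 0;

	if (nr & 4) {			/* tbz \nr, #2 falls through: 12 or 14 rounds */
		out[n++] = slot_key(0, nr);
		out[n++] = slot_key(1, nr);
		if (nr & 2) {		/* tbz \nr, #1 falls through: 14 rounds */
			out[n++] = slot_key(2, nr);
			out[n++] = slot_key(3, nr);
		}
	}
	for (int s = 4; s <= 12; s++)	/* .irp over v14..v21, v3 (dround) */
		out[n++] = slot_key(s, nr);
	out[n++] = slot_key(13, nr);	/* final AESE with v4, no AESMC */
	return n;
}
```

The key in the last slot (v5, i.e. `rk[nr]`) is never applied by the macro. In the finalization path that is harmless: the MAC and the counter block both skip the same final AddRoundKey, so XORing the two results makes it cancel, which is what the "final round key cancels out" comment refers to.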