From patchwork Wed Jan 17 21:36:45 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 763300
From: Richard Henderson
To: qemu-devel@nongnu.org
Cc: qemu-s390x@nongnu.org, thuth@redhat.com, david@redhat.com,
    philmd@linaro.org, mjt@tls.msk.ru, qemu-stable@nongnu.org
Subject: [PATCH 1/2] tcg/s390x: Fix encoding of VRIc, VRSa, VRSc insns
Date: Thu, 18 Jan 2024 08:36:45 +1100
Message-Id: <20240117213646.159697-2-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240117213646.159697-1-richard.henderson@linaro.org>
References: <20240117213646.159697-1-richard.henderson@linaro.org>

While the format names the second vector register 'v3', it is still
in the second position (bits 12-15), and the argument to RXB must match.

Example error:
-   e7 00 00 10 2a 33       verllf  %v16,%v0,16
+   e7 00 00 10 2c 33       verllf  %v16,%v16,16

Cc: qemu-stable@nongnu.org
Reported-by: Michael Tokarev
Fixes: 22cb37b4172 ("tcg/s390x: Implement vector shift operations")
Fixes: 79cada8693d ("tcg/s390x: Implement tcg_out_dup*_vec")
Signed-off-by: Richard Henderson
Reviewed-by: Thomas Huth
---
 tcg/s390x/tcg-target.c.inc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index fbee43d3b0..7f6b84aa2c 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -683,7 +683,7 @@ static void tcg_out_insn_VRIc(TCGContext *s, S390Opcode op,
     tcg_debug_assert(is_vector_reg(v3));
     tcg_out16(s, (op & 0xff00) | ((v1 & 0xf) << 4) | (v3 & 0xf));
     tcg_out16(s, i2);
-    tcg_out16(s, (op & 0x00ff) | RXB(v1, 0, v3, 0) | (m4 << 12));
+    tcg_out16(s, (op & 0x00ff) | RXB(v1, v3, 0, 0) | (m4 << 12));
 }
 
 static void tcg_out_insn_VRRa(TCGContext *s, S390Opcode op,
@@ -738,7 +738,7 @@ static void tcg_out_insn_VRSa(TCGContext *s, S390Opcode op, TCGReg v1,
     tcg_debug_assert(is_vector_reg(v3));
     tcg_out16(s, (op & 0xff00) | ((v1 & 0xf) << 4) | (v3 & 0xf));
     tcg_out16(s, b2 << 12 | d2);
-    tcg_out16(s, (op & 0x00ff) | RXB(v1, 0, v3, 0) | (m4 << 12));
+    tcg_out16(s, (op & 0x00ff) | RXB(v1, v3, 0, 0) | (m4 << 12));
 }
 
 static void tcg_out_insn_VRSb(TCGContext *s, S390Opcode op, TCGReg v1,
@@ -762,7 +762,7 @@ static void tcg_out_insn_VRSc(TCGContext *s, S390Opcode op, TCGReg r1,
     tcg_debug_assert(is_vector_reg(v3));
     tcg_out16(s, (op & 0xff00) | (r1 << 4) | (v3 & 0xf));
     tcg_out16(s, b2 << 12 | d2);
-    tcg_out16(s, (op & 0x00ff) | RXB(0, 0, v3, 0) | (m4 << 12));
+    tcg_out16(s, (op & 0x00ff) | RXB(0, v3, 0, 0) | (m4 << 12));
 }
 
 static void tcg_out_insn_VRX(TCGContext *s, S390Opcode op, TCGReg v1,

From patchwork Wed Jan 17 21:36:46 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 763301
From: Richard Henderson
To: qemu-devel@nongnu.org
Cc: qemu-s390x@nongnu.org, thuth@redhat.com, david@redhat.com,
    philmd@linaro.org, mjt@tls.msk.ru
Subject: [PATCH 2/2] tests/tcg/s390x: Import linux tools/testing/crypto/chacha20-s390
Date: Thu, 18 Jan 2024 08:36:46 +1100
Message-Id: <20240117213646.159697-3-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240117213646.159697-1-richard.henderson@linaro.org>
References: <20240117213646.159697-1-richard.henderson@linaro.org>

Modify and simplify the driver, as we're really only interested in
the correctness of the translation of chacha-vx.S.

Signed-off-by: Richard Henderson
Tested-by: Thomas Huth
---
 tests/tcg/s390x/chacha.c        | 341 ++++++++++++
 tests/tcg/s390x/Makefile.target |   4 +
 tests/tcg/s390x/chacha-vx.S     | 914 ++++++++++++++++++++++++++++++++
 3 files changed, 1259 insertions(+)
 create mode 100644 tests/tcg/s390x/chacha.c
 create mode 100644 tests/tcg/s390x/chacha-vx.S

diff --git a/tests/tcg/s390x/chacha.c b/tests/tcg/s390x/chacha.c
new file mode 100644
index 0000000000..ca9e4c1959
--- /dev/null
+++ b/tests/tcg/s390x/chacha.c
@@ -0,0 +1,341 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Derived from linux kernel sources:
+ * ./include/crypto/chacha.h
+ * ./crypto/chacha_generic.c
+ * ./arch/s390/crypto/chacha-glue.c
+ * ./tools/testing/crypto/chacha20-s390/test-cipher.c
+ * ./tools/testing/crypto/chacha20-s390/run-tests.sh
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <sys/random.h>
+
+typedef uint8_t u8;
+typedef uint32_t u32;
+typedef uint64_t u64;
+
+static unsigned data_size;
+static bool debug;
+
+#define CHACHA_IV_SIZE 16
+#define CHACHA_KEY_SIZE 32
+#define CHACHA_BLOCK_SIZE 64
+#define CHACHAPOLY_IV_SIZE 12
+#define CHACHA_STATE_WORDS (CHACHA_BLOCK_SIZE / sizeof(u32))
+
+static u32 rol32(u32 val, u32 sh)
+{
+    return (val << (sh & 31)) | (val >> (-sh & 31));
+}
+
+static u32 get_unaligned_le32(const void *ptr)
+{
+    u32 val;
+    memcpy(&val, ptr, 4);
+    return __builtin_bswap32(val);
+}
+
+static void put_unaligned_le32(u32 val, void *ptr)
+{
+    val = __builtin_bswap32(val);
+    memcpy(ptr, &val, 4);
+}
+
+static void chacha_permute(u32 *x, int nrounds)
+{
+    for (int i = 0; i < nrounds; i += 2) {
+        x[0] += x[4]; x[12] = rol32(x[12] ^ x[0], 16);
+        x[1] += x[5]; x[13] = rol32(x[13] ^ x[1], 16);
+        x[2] += x[6]; x[14] = rol32(x[14] ^ x[2], 16);
+        x[3] += x[7]; x[15] = rol32(x[15] ^ x[3], 16);
+
+        x[8] += x[12]; x[4] = rol32(x[4] ^ x[8], 12);
+        x[9] += x[13]; x[5] = rol32(x[5] ^ x[9], 12);
+        x[10] += x[14]; x[6] = rol32(x[6] ^ x[10], 12);
+        x[11] += x[15]; x[7] = rol32(x[7] ^ x[11], 12);
+
+        x[0] += x[4]; x[12] = rol32(x[12] ^ x[0], 8);
+        x[1] += x[5]; x[13] = rol32(x[13] ^ x[1], 8);
+        x[2] += x[6]; x[14] = rol32(x[14] ^ x[2], 8);
+        x[3] += x[7]; x[15] = rol32(x[15] ^ x[3], 8);
+
+        x[8] += x[12]; x[4] = rol32(x[4] ^ x[8], 7);
+        x[9] += x[13]; x[5] = rol32(x[5] ^ x[9], 7);
+        x[10] += x[14]; x[6] = rol32(x[6] ^ x[10], 7);
+        x[11] += x[15]; x[7] = rol32(x[7] ^ x[11], 7);
+
+        x[0] += x[5]; x[15] = rol32(x[15] ^ x[0], 16);
+        x[1] += x[6]; x[12] = rol32(x[12] ^ x[1], 16);
+        x[2] += x[7]; x[13] = rol32(x[13] ^ x[2], 16);
+        x[3] += x[4]; x[14] = rol32(x[14] ^ x[3], 16);
+
+        x[10] += x[15]; x[5] = rol32(x[5] ^ x[10], 12);
+        x[11] += x[12]; x[6] = rol32(x[6] ^ x[11], 12);
+        x[8] += x[13]; x[7] = rol32(x[7] ^ x[8], 12);
+        x[9] += x[14]; x[4] = rol32(x[4] ^ x[9], 12);
+
+        x[0] += x[5]; x[15] = rol32(x[15] ^ x[0], 8);
+        x[1] += x[6]; x[12] = rol32(x[12] ^ x[1], 8);
+        x[2] += x[7]; x[13] = rol32(x[13] ^ x[2], 8);
+        x[3] += x[4]; x[14] = rol32(x[14] ^ x[3], 8);
+
+        x[10] += x[15]; x[5] = rol32(x[5] ^ x[10], 7);
+        x[11] += x[12]; x[6] = rol32(x[6] ^ x[11], 7);
+        x[8] += x[13]; x[7] = rol32(x[7] ^ x[8], 7);
+        x[9] += x[14]; x[4] = rol32(x[4] ^ x[9], 7);
+    }
+}
+
+static void chacha_block_generic(u32 *state, u8 *stream, int nrounds)
+{
+    u32 x[16];
+
+    memcpy(x, state, 64);
+    chacha_permute(x, nrounds);
+
+    for (int i = 0; i < 16; i++) {
+        put_unaligned_le32(x[i] + state[i], &stream[i * sizeof(u32)]);
+    }
+    state[12]++;
+}
+
+static void crypto_xor_cpy(u8 *dst, const u8 *src1,
+                           const u8 *src2, unsigned len)
+{
+    while (len--) {
+        *dst++ = *src1++ ^ *src2++;
+    }
+}
+
+static void chacha_crypt_generic(u32 *state, u8 *dst, const u8 *src,
+                                 unsigned int bytes, int nrounds)
+{
+    u8 stream[CHACHA_BLOCK_SIZE];
+
+    while (bytes >= CHACHA_BLOCK_SIZE) {
+        chacha_block_generic(state, stream, nrounds);
+        crypto_xor_cpy(dst, src, stream, CHACHA_BLOCK_SIZE);
+        bytes -= CHACHA_BLOCK_SIZE;
+        dst += CHACHA_BLOCK_SIZE;
+        src += CHACHA_BLOCK_SIZE;
+    }
+    if (bytes) {
+        chacha_block_generic(state, stream, nrounds);
+        crypto_xor_cpy(dst, src, stream, bytes);
+    }
+}
+
+enum chacha_constants { /* expand 32-byte k */
+    CHACHA_CONSTANT_EXPA = 0x61707865U,
+    CHACHA_CONSTANT_ND_3 = 0x3320646eU,
+    CHACHA_CONSTANT_2_BY = 0x79622d32U,
+    CHACHA_CONSTANT_TE_K = 0x6b206574U
+};
+
+static void chacha_init_generic(u32 *state, const u32 *key, const u8 *iv)
+{
+    state[0] = CHACHA_CONSTANT_EXPA;
+    state[1] = CHACHA_CONSTANT_ND_3;
+    state[2] = CHACHA_CONSTANT_2_BY;
+    state[3] = CHACHA_CONSTANT_TE_K;
+    state[4] = key[0];
+    state[5] = key[1];
+    state[6] = key[2];
+    state[7] = key[3];
+    state[8] = key[4];
+    state[9] = key[5];
+    state[10] = key[6];
+    state[11] = key[7];
+    state[12] = get_unaligned_le32(iv + 0);
+    state[13] = get_unaligned_le32(iv + 4);
+    state[14] = get_unaligned_le32(iv + 8);
+    state[15] = get_unaligned_le32(iv + 12);
+}
+
+void chacha20_vx(u8 *out, const u8 *inp, size_t len, const u32 *key,
+                 const u32 *counter);
+
+static void chacha20_crypt_s390(u32 *state, u8 *dst, const u8 *src,
+                                unsigned int nbytes, const u32 *key,
+                                u32 *counter)
+{
+    chacha20_vx(dst, src, nbytes, key, counter);
+    *counter += (nbytes + CHACHA_BLOCK_SIZE - 1) / CHACHA_BLOCK_SIZE;
+}
+
+static void chacha_crypt_arch(u32 *state, u8 *dst, const u8 *src,
+                              unsigned int bytes, int nrounds)
+{
+    /*
+     * s390 chacha20 implementation has 20 rounds hard-coded,
+     * it cannot handle a block of data or less, but otherwise
+     * it can handle data of arbitrary size
+     */
+    if (bytes <= CHACHA_BLOCK_SIZE || nrounds != 20) {
+        chacha_crypt_generic(state, dst, src, bytes, nrounds);
+    } else {
+        chacha20_crypt_s390(state, dst, src, bytes, &state[4], &state[12]);
+    }
+}
+
+static void print_hex_dump(const char *prefix_str, const void *buf, int len)
+{
+    for (int i = 0; i < len; i += 16) {
+        printf("%s%.8x: ", prefix_str, i);
+        for (int j = 0; j < 16; ++j) {
+            printf("%02x%c", *(u8 *)(buf + i + j), j == 15 ? '\n' : ' ');
+        }
+    }
+}
+
+/* Perform cipher operations with the chacha lib */
+static int test_lib_chacha(u8 *revert, u8 *cipher, u8 *plain, bool generic)
+{
+    u32 chacha_state[CHACHA_STATE_WORDS];
+    u8 iv[16], key[32];
+
+    memset(key, 'X', sizeof(key));
+    memset(iv, 'I', sizeof(iv));
+
+    if (debug) {
+        print_hex_dump("key: ", key, 32);
+        print_hex_dump("iv: ", iv, 16);
+    }
+
+    /* Encrypt */
+    chacha_init_generic(chacha_state, (u32*)key, iv);
+
+    if (generic) {
+        chacha_crypt_generic(chacha_state, cipher, plain, data_size, 20);
+    } else {
+        chacha_crypt_arch(chacha_state, cipher, plain, data_size, 20);
+    }
+
+    if (debug) {
+        print_hex_dump("encr:", cipher,
+                       (data_size > 64 ? 64 : data_size));
+    }
+
+    /* Decrypt */
+    chacha_init_generic(chacha_state, (u32 *)key, iv);
+
+    if (generic) {
+        chacha_crypt_generic(chacha_state, revert, cipher, data_size, 20);
+    } else {
+        chacha_crypt_arch(chacha_state, revert, cipher, data_size, 20);
+    }
+
+    if (debug) {
+        print_hex_dump("decr:", revert,
+                       (data_size > 64 ? 64 : data_size));
+    }
+    return 0;
+}
+
+static int chacha_s390_test_init(void)
+{
+    u8 *plain = NULL, *revert = NULL;
+    u8 *cipher_generic = NULL, *cipher_s390 = NULL;
+    int ret = -1;
+
+    printf("s390 ChaCha20 test module: size=%d debug=%d\n",
+           data_size, debug);
+
+    /* Allocate and fill buffers */
+    plain = malloc(data_size);
+    if (!plain) {
+        printf("could not allocate plain buffer\n");
+        ret = -2;
+        goto out;
+    }
+
+    memset(plain, 'a', data_size);
+    for (unsigned i = 0, n = data_size > 256 ? 256 : data_size; i < n; ) {
+        ssize_t t = getrandom(plain + i, n - i, 0);
+        if (t < 0) {
+            break;
+        }
+        i += t;
+    }
+
+    cipher_generic = calloc(1, data_size);
+    if (!cipher_generic) {
+        printf("could not allocate cipher_generic buffer\n");
+        ret = -2;
+        goto out;
+    }
+
+    cipher_s390 = calloc(1, data_size);
+    if (!cipher_s390) {
+        printf("could not allocate cipher_s390 buffer\n");
+        ret = -2;
+        goto out;
+    }
+
+    revert = calloc(1, data_size);
+    if (!revert) {
+        printf("could not allocate revert buffer\n");
+        ret = -2;
+        goto out;
+    }
+
+    if (debug) {
+        print_hex_dump("src: ", plain,
+                       (data_size > 64 ? 64 : data_size));
+    }
+
+    /* Use chacha20 lib */
+    test_lib_chacha(revert, cipher_generic, plain, true);
+    if (memcmp(plain, revert, data_size)) {
+        printf("generic en/decryption check FAILED\n");
+        ret = -2;
+        goto out;
+    }
+    printf("generic en/decryption check OK\n");
+
+    test_lib_chacha(revert, cipher_s390, plain, false);
+    if (memcmp(plain, revert, data_size)) {
+        printf("lib en/decryption check FAILED\n");
+        ret = -2;
+        goto out;
+    }
+    printf("lib en/decryption check OK\n");
+
+    if (memcmp(cipher_generic, cipher_s390, data_size)) {
+        printf("lib vs generic check FAILED\n");
+        ret = -2;
+        goto out;
+    }
+    printf("lib vs generic check OK\n");
+
+    printf("--- chacha20 s390 test end ---\n");
+
+out:
+    free(plain);
+    free(cipher_generic);
+    free(cipher_s390);
+    free(revert);
+    return ret;
+}
+
+int main(int ac, char **av)
+{
+    static const unsigned sizes[] = {
+        63, 64, 65, 127, 128, 129, 511, 512, 513, 4096, 65611,
+        /* too slow for tcg: 6291456, 62914560 */
+    };
+
+    debug = ac >= 2;
+    for (int i = 0; i < sizeof(sizes) / sizeof(sizes[0]); ++i) {
+        data_size = sizes[i];
+        if (chacha_s390_test_init() != -1) {
+            return 1;
+        }
+    }
+    return 0;
+}
diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 30994dcf9c..28f19a3176 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -66,9 +66,13 @@ Z13_TESTS+=vcksm
 Z13_TESTS+=vstl
 Z13_TESTS+=vrep
 Z13_TESTS+=precise-smc-user
+Z13_TESTS+=chacha
 $(Z13_TESTS): CFLAGS+=-march=z13 -O2
 TESTS+=$(Z13_TESTS)
 
+chacha: chacha.c chacha-vx.S
+	$(CC) $(CFLAGS) $(EXTRA_CFLAGS) $^ -o $@
+
 ifneq ($(CROSS_CC_HAS_Z14),)
 Z14_TESTS=vfminmax
 vfminmax: LDFLAGS+=-lm
diff --git a/tests/tcg/s390x/chacha-vx.S b/tests/tcg/s390x/chacha-vx.S
new file mode 100644
index 0000000000..eee6275368
--- /dev/null
+++ b/tests/tcg/s390x/chacha-vx.S
@@ -0,0 +1,914 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Original implementation written by Andy Polyakov, @dot-asm.
+ * This is an adaptation of the original code for kernel use.
+ *
+ * Copyright (C) 2006-2019 CRYPTOGAMS by . All Rights Reserved.
+ *
+ * For qemu testing, drop and assume assembler support.
+ */
+
+#define SP	%r15
+#define FRAME	(16 * 8 + 4 * 8)
+
+	.data
+	.balign	32
+
+sigma:
+	.long	0x61707865,0x3320646e,0x79622d32,0x6b206574	# endian-neutral
+	.long	1,0,0,0
+	.long	2,0,0,0
+	.long	3,0,0,0
+	.long	0x03020100,0x07060504,0x0b0a0908,0x0f0e0d0c	# byte swap
+
+	.long	0,1,2,3
+	.long	0x61707865,0x61707865,0x61707865,0x61707865	# smashed sigma
+	.long	0x3320646e,0x3320646e,0x3320646e,0x3320646e
+	.long	0x79622d32,0x79622d32,0x79622d32,0x79622d32
+	.long	0x6b206574,0x6b206574,0x6b206574,0x6b206574
+
+	.type	sigma, @object
+	.size	sigma, . - sigma
+
+	.previous
+
+	.text
+
+#############################################################################
+# void chacha20_vx_4x(u8 *out, const u8 *inp, size_t len,
+#		      const u32 *key, const u32 *counter)
+
+#define OUT		%r2
+#define INP		%r3
+#define LEN		%r4
+#define KEY		%r5
+#define COUNTER		%r6
+
+#define BEPERM		%v31
+#define CTR		%v26
+
+#define K0		%v16
+#define K1		%v17
+#define K2		%v18
+#define K3		%v19
+
+#define XA0		%v0
+#define XA1		%v1
+#define XA2		%v2
+#define XA3		%v3
+
+#define XB0		%v4
+#define XB1		%v5
+#define XB2		%v6
+#define XB3		%v7
+
+#define XC0		%v8
+#define XC1		%v9
+#define XC2		%v10
+#define XC3		%v11
+
+#define XD0		%v12
+#define XD1		%v13
+#define XD2		%v14
+#define XD3		%v15
+
+#define XT0		%v27
+#define XT1		%v28
+#define XT2		%v29
+#define XT3		%v30
+
+	.balign	32
+chacha20_vx_4x:
+	stmg	%r6,%r7,6*8(SP)
+
+	larl	%r7,sigma
+	lhi	%r0,10
+	lhi	%r1,0
+
+	vl	K0,0(%r7)		# load sigma
+	vl	K1,0(KEY)		# load key
+	vl	K2,16(KEY)
+	vl	K3,0(COUNTER)		# load counter
+
+	vl	BEPERM,0x40(%r7)
+	vl	CTR,0x50(%r7)
+
+	vlm	XA0,XA3,0x60(%r7),4	# load [smashed] sigma
+
+	vrepf	XB0,K1,0		# smash the key
+	vrepf	XB1,K1,1
+	vrepf	XB2,K1,2
+	vrepf	XB3,K1,3
+
+	vrepf	XD0,K3,0
+	vrepf	XD1,K3,1
+	vrepf	XD2,K3,2
+	vrepf	XD3,K3,3
+	vaf	XD0,XD0,CTR
+	vrepf	XC0,K2,0
+	vrepf	XC1,K2,1
+	vrepf	XC2,K2,2
+	vrepf	XC3,K2,3
+
+.Loop_4x:
+	vaf	XA0,XA0,XB0
+	vx	XD0,XD0,XA0
+	verllf	XD0,XD0,16
+
+	vaf	XA1,XA1,XB1
+	vx	XD1,XD1,XA1
+	verllf	XD1,XD1,16
+
+	vaf	XA2,XA2,XB2
+	vx	XD2,XD2,XA2
+	verllf	XD2,XD2,16
+
+	vaf	XA3,XA3,XB3
+	vx	XD3,XD3,XA3
+	verllf	XD3,XD3,16
+
+	vaf	XC0,XC0,XD0
+	vx	XB0,XB0,XC0
+	verllf	XB0,XB0,12
+
+	vaf	XC1,XC1,XD1
+	vx	XB1,XB1,XC1
+	verllf	XB1,XB1,12
+
+	vaf	XC2,XC2,XD2
+	vx	XB2,XB2,XC2
+	verllf	XB2,XB2,12
+
+	vaf	XC3,XC3,XD3
+	vx	XB3,XB3,XC3
+	verllf	XB3,XB3,12
+
+	vaf	XA0,XA0,XB0
+	vx	XD0,XD0,XA0
+	verllf	XD0,XD0,8
+
+	vaf	XA1,XA1,XB1
+	vx	XD1,XD1,XA1
+	verllf	XD1,XD1,8
+
+	vaf	XA2,XA2,XB2
+	vx	XD2,XD2,XA2
+	verllf	XD2,XD2,8
+
+	vaf	XA3,XA3,XB3
+	vx	XD3,XD3,XA3
+	verllf	XD3,XD3,8
+
+	vaf	XC0,XC0,XD0
+	vx	XB0,XB0,XC0
+	verllf	XB0,XB0,7
+
+	vaf	XC1,XC1,XD1
+	vx	XB1,XB1,XC1
+	verllf	XB1,XB1,7
+
+	vaf	XC2,XC2,XD2
+	vx	XB2,XB2,XC2
+	verllf	XB2,XB2,7
+
+	vaf	XC3,XC3,XD3
+	vx	XB3,XB3,XC3
+	verllf	XB3,XB3,7
+
+	vaf	XA0,XA0,XB1
+	vx	XD3,XD3,XA0
+	verllf	XD3,XD3,16
+
+	vaf	XA1,XA1,XB2
+	vx	XD0,XD0,XA1
+	verllf	XD0,XD0,16
+
+	vaf	XA2,XA2,XB3
+	vx	XD1,XD1,XA2
+	verllf	XD1,XD1,16
+
+	vaf	XA3,XA3,XB0
+	vx	XD2,XD2,XA3
+	verllf	XD2,XD2,16
+
+	vaf	XC2,XC2,XD3
+	vx	XB1,XB1,XC2
+	verllf	XB1,XB1,12
+
+	vaf	XC3,XC3,XD0
+	vx	XB2,XB2,XC3
+	verllf	XB2,XB2,12
+
+	vaf	XC0,XC0,XD1
+	vx	XB3,XB3,XC0
+	verllf	XB3,XB3,12
+
+	vaf	XC1,XC1,XD2
+	vx	XB0,XB0,XC1
+	verllf	XB0,XB0,12
+
+	vaf	XA0,XA0,XB1
+	vx	XD3,XD3,XA0
+	verllf	XD3,XD3,8
+
+	vaf	XA1,XA1,XB2
+	vx	XD0,XD0,XA1
+	verllf	XD0,XD0,8
+
+	vaf	XA2,XA2,XB3
+	vx	XD1,XD1,XA2
+	verllf	XD1,XD1,8
+
+	vaf	XA3,XA3,XB0
+	vx	XD2,XD2,XA3
+	verllf	XD2,XD2,8
+
+	vaf	XC2,XC2,XD3
+	vx	XB1,XB1,XC2
+	verllf	XB1,XB1,7
+
+	vaf	XC3,XC3,XD0
+	vx	XB2,XB2,XC3
+	verllf	XB2,XB2,7
+
+	vaf	XC0,XC0,XD1
+	vx	XB3,XB3,XC0
+	verllf	XB3,XB3,7
+
+	vaf	XC1,XC1,XD2
+	vx	XB0,XB0,XC1
+	verllf	XB0,XB0,7
+	brct	%r0,.Loop_4x
+
+	vaf	XD0,XD0,CTR
+
+	vmrhf	XT0,XA0,XA1		# transpose data
+	vmrhf	XT1,XA2,XA3
+	vmrlf	XT2,XA0,XA1
+	vmrlf	XT3,XA2,XA3
+	vpdi	XA0,XT0,XT1,0b0000
+	vpdi	XA1,XT0,XT1,0b0101
+	vpdi	XA2,XT2,XT3,0b0000
+	vpdi	XA3,XT2,XT3,0b0101
+
+	vmrhf	XT0,XB0,XB1
+	vmrhf	XT1,XB2,XB3
+	vmrlf	XT2,XB0,XB1
+	vmrlf	XT3,XB2,XB3
+	vpdi	XB0,XT0,XT1,0b0000
+	vpdi	XB1,XT0,XT1,0b0101
+	vpdi	XB2,XT2,XT3,0b0000
+	vpdi	XB3,XT2,XT3,0b0101
+
+	vmrhf	XT0,XC0,XC1
+	vmrhf	XT1,XC2,XC3
+	vmrlf	XT2,XC0,XC1
+	vmrlf	XT3,XC2,XC3
+	vpdi	XC0,XT0,XT1,0b0000
+	vpdi	XC1,XT0,XT1,0b0101
+	vpdi	XC2,XT2,XT3,0b0000
+	vpdi	XC3,XT2,XT3,0b0101
+
+	vmrhf	XT0,XD0,XD1
+	vmrhf	XT1,XD2,XD3
+	vmrlf	XT2,XD0,XD1
+	vmrlf	XT3,XD2,XD3
+	vpdi	XD0,XT0,XT1,0b0000
+	vpdi	XD1,XT0,XT1,0b0101
+	vpdi	XD2,XT2,XT3,0b0000
+	vpdi	XD3,XT2,XT3,0b0101
+
+	vaf	XA0,XA0,K0
+	vaf	XB0,XB0,K1
+	vaf	XC0,XC0,K2
+	vaf	XD0,XD0,K3
+
+	vperm	XA0,XA0,XA0,BEPERM
+	vperm	XB0,XB0,XB0,BEPERM
+	vperm	XC0,XC0,XC0,BEPERM
+	vperm	XD0,XD0,XD0,BEPERM
+
+	vlm	XT0,XT3,0(INP),0
+
+	vx	XT0,XT0,XA0
+	vx	XT1,XT1,XB0
+	vx	XT2,XT2,XC0
+	vx	XT3,XT3,XD0
+
+	vstm	XT0,XT3,0(OUT),0
+
+	la	INP,0x40(INP)
+	la	OUT,0x40(OUT)
+	aghi	LEN,-0x40
+
+	vaf	XA0,XA1,K0
+	vaf	XB0,XB1,K1
+	vaf	XC0,XC1,K2
+	vaf	XD0,XD1,K3
+
+	vperm	XA0,XA0,XA0,BEPERM
+	vperm	XB0,XB0,XB0,BEPERM
+	vperm	XC0,XC0,XC0,BEPERM
+	vperm	XD0,XD0,XD0,BEPERM
+
+	clgfi	LEN,0x40
+	jl	.Ltail_4x
+
+	vlm	XT0,XT3,0(INP),0
+
+	vx	XT0,XT0,XA0
+	vx	XT1,XT1,XB0
+	vx	XT2,XT2,XC0
+	vx	XT3,XT3,XD0
+
+	vstm	XT0,XT3,0(OUT),0
+
+	la	INP,0x40(INP)
+	la	OUT,0x40(OUT)
+	aghi	LEN,-0x40
+	je	.Ldone_4x
+
+	vaf	XA0,XA2,K0
+	vaf	XB0,XB2,K1
+	vaf	XC0,XC2,K2
+	vaf	XD0,XD2,K3
+
+	vperm	XA0,XA0,XA0,BEPERM
+	vperm	XB0,XB0,XB0,BEPERM
+	vperm	XC0,XC0,XC0,BEPERM
+	vperm	XD0,XD0,XD0,BEPERM
+
+	clgfi	LEN,0x40
+	jl	.Ltail_4x
+
+	vlm	XT0,XT3,0(INP),0
+
+	vx	XT0,XT0,XA0
+	vx	XT1,XT1,XB0
+	vx	XT2,XT2,XC0
+	vx	XT3,XT3,XD0
+
+	vstm	XT0,XT3,0(OUT),0
+
+	la	INP,0x40(INP)
+	la	OUT,0x40(OUT)
+	aghi	LEN,-0x40
+	je	.Ldone_4x
+
+	vaf	XA0,XA3,K0
+	vaf	XB0,XB3,K1
+	vaf	XC0,XC3,K2
+	vaf	XD0,XD3,K3
+
+	vperm	XA0,XA0,XA0,BEPERM
+	vperm	XB0,XB0,XB0,BEPERM
+	vperm	XC0,XC0,XC0,BEPERM
+	vperm	XD0,XD0,XD0,BEPERM
+
+	clgfi	LEN,0x40
+	jl	.Ltail_4x
+
+	vlm	XT0,XT3,0(INP),0
+
+	vx	XT0,XT0,XA0
+	vx	XT1,XT1,XB0
+	vx	XT2,XT2,XC0
+	vx	XT3,XT3,XD0
+
+	vstm	XT0,XT3,0(OUT),0
+
+.Ldone_4x:
+	lmg	%r6,%r7,6*8(SP)
+	br	%r14
+
+.Ltail_4x:
+	vlr	XT0,XC0
+	vlr	XT1,XD0
+
+	vst	XA0,8*8+0x00(SP)
+	vst	XB0,8*8+0x10(SP)
+	vst	XT0,8*8+0x20(SP)
+	vst	XT1,8*8+0x30(SP)
+
+	lghi	%r1,0
+
+.Loop_tail_4x:
+	llgc	%r5,0(%r1,INP)
+	llgc	%r6,8*8(%r1,SP)
+	xr	%r6,%r5
+	stc	%r6,0(%r1,OUT)
+	la	%r1,1(%r1)
+	brct	LEN,.Loop_tail_4x
+
+	lmg	%r6,%r7,6*8(SP)
+	br	%r14
+
+	.type	chacha20_vx_4x, @function
+	.size	chacha20_vx_4x, . - chacha20_vx_4x
+
+#undef OUT
+#undef INP
+#undef LEN
+#undef KEY
+#undef COUNTER
+
+#undef BEPERM
+
+#undef K0
+#undef K1
+#undef K2
+#undef K3
+
+
+#############################################################################
+# void chacha20_vx(u8 *out, const u8 *inp, size_t len,
+#		   const u32 *key, const u32 *counter)
+
+#define OUT		%r2
+#define INP		%r3
+#define LEN		%r4
+#define KEY		%r5
+#define COUNTER		%r6
+
+#define BEPERM		%v31
+
+#define K0		%v27
+#define K1		%v24
+#define K2		%v25
+#define K3		%v26
+
+#define A0		%v0
+#define B0		%v1
+#define C0		%v2
+#define D0		%v3
+
+#define A1		%v4
+#define B1		%v5
+#define C1		%v6
+#define D1		%v7
+
+#define A2		%v8
+#define B2		%v9
+#define C2		%v10
+#define D2		%v11
+
+#define A3		%v12
+#define B3		%v13
+#define C3		%v14
+#define D3		%v15
+
+#define A4		%v16
+#define B4		%v17
+#define C4		%v18
+#define D4		%v19
+
+#define A5		%v20
+#define B5		%v21
+#define C5		%v22
+#define D5		%v23
+
+#define T0		%v27
+#define T1		%v28
+#define T2		%v29
+#define T3		%v30
+
+	.balign	32
+chacha20_vx:
+	clgfi	LEN,256
+	jle	chacha20_vx_4x
+	stmg	%r6,%r7,6*8(SP)
+
+	lghi	%r1,-FRAME
+	lgr	%r0,SP
+	la	SP,0(%r1,SP)
+	stg	%r0,0(SP)		# back-chain
+
+	larl	%r7,sigma
+	lhi	%r0,10
+
+	vlm	K1,K2,0(KEY),0		# load key
+	vl	K3,0(COUNTER)		# load counter
+
+	vlm	K0,BEPERM,0(%r7),4	# load sigma, increments, ...
+
+ +.Loop_outer_vx: + vlr A0,K0 + vlr B0,K1 + vlr A1,K0 + vlr B1,K1 + vlr A2,K0 + vlr B2,K1 + vlr A3,K0 + vlr B3,K1 + vlr A4,K0 + vlr B4,K1 + vlr A5,K0 + vlr B5,K1 + + vlr D0,K3 + vaf D1,K3,T1 # K[3]+1 + vaf D2,K3,T2 # K[3]+2 + vaf D3,K3,T3 # K[3]+3 + vaf D4,D2,T2 # K[3]+4 + vaf D5,D2,T3 # K[3]+5 + + vlr C0,K2 + vlr C1,K2 + vlr C2,K2 + vlr C3,K2 + vlr C4,K2 + vlr C5,K2 + + vlr T1,D1 + vlr T2,D2 + vlr T3,D3 + +.Loop_vx: + vaf A0,A0,B0 + vaf A1,A1,B1 + vaf A2,A2,B2 + vaf A3,A3,B3 + vaf A4,A4,B4 + vaf A5,A5,B5 + vx D0,D0,A0 + vx D1,D1,A1 + vx D2,D2,A2 + vx D3,D3,A3 + vx D4,D4,A4 + vx D5,D5,A5 + verllf D0,D0,16 + verllf D1,D1,16 + verllf D2,D2,16 + verllf D3,D3,16 + verllf D4,D4,16 + verllf D5,D5,16 + + vaf C0,C0,D0 + vaf C1,C1,D1 + vaf C2,C2,D2 + vaf C3,C3,D3 + vaf C4,C4,D4 + vaf C5,C5,D5 + vx B0,B0,C0 + vx B1,B1,C1 + vx B2,B2,C2 + vx B3,B3,C3 + vx B4,B4,C4 + vx B5,B5,C5 + verllf B0,B0,12 + verllf B1,B1,12 + verllf B2,B2,12 + verllf B3,B3,12 + verllf B4,B4,12 + verllf B5,B5,12 + + vaf A0,A0,B0 + vaf A1,A1,B1 + vaf A2,A2,B2 + vaf A3,A3,B3 + vaf A4,A4,B4 + vaf A5,A5,B5 + vx D0,D0,A0 + vx D1,D1,A1 + vx D2,D2,A2 + vx D3,D3,A3 + vx D4,D4,A4 + vx D5,D5,A5 + verllf D0,D0,8 + verllf D1,D1,8 + verllf D2,D2,8 + verllf D3,D3,8 + verllf D4,D4,8 + verllf D5,D5,8 + + vaf C0,C0,D0 + vaf C1,C1,D1 + vaf C2,C2,D2 + vaf C3,C3,D3 + vaf C4,C4,D4 + vaf C5,C5,D5 + vx B0,B0,C0 + vx B1,B1,C1 + vx B2,B2,C2 + vx B3,B3,C3 + vx B4,B4,C4 + vx B5,B5,C5 + verllf B0,B0,7 + verllf B1,B1,7 + verllf B2,B2,7 + verllf B3,B3,7 + verllf B4,B4,7 + verllf B5,B5,7 + + vsldb C0,C0,C0,8 + vsldb C1,C1,C1,8 + vsldb C2,C2,C2,8 + vsldb C3,C3,C3,8 + vsldb C4,C4,C4,8 + vsldb C5,C5,C5,8 + vsldb B0,B0,B0,4 + vsldb B1,B1,B1,4 + vsldb B2,B2,B2,4 + vsldb B3,B3,B3,4 + vsldb B4,B4,B4,4 + vsldb B5,B5,B5,4 + vsldb D0,D0,D0,12 + vsldb D1,D1,D1,12 + vsldb D2,D2,D2,12 + vsldb D3,D3,D3,12 + vsldb D4,D4,D4,12 + vsldb D5,D5,D5,12 + + vaf A0,A0,B0 + vaf A1,A1,B1 + vaf A2,A2,B2 + vaf A3,A3,B3 + vaf A4,A4,B4 + vaf A5,A5,B5 + vx D0,D0,A0 
+ vx D1,D1,A1 + vx D2,D2,A2 + vx D3,D3,A3 + vx D4,D4,A4 + vx D5,D5,A5 + verllf D0,D0,16 + verllf D1,D1,16 + verllf D2,D2,16 + verllf D3,D3,16 + verllf D4,D4,16 + verllf D5,D5,16 + + vaf C0,C0,D0 + vaf C1,C1,D1 + vaf C2,C2,D2 + vaf C3,C3,D3 + vaf C4,C4,D4 + vaf C5,C5,D5 + vx B0,B0,C0 + vx B1,B1,C1 + vx B2,B2,C2 + vx B3,B3,C3 + vx B4,B4,C4 + vx B5,B5,C5 + verllf B0,B0,12 + verllf B1,B1,12 + verllf B2,B2,12 + verllf B3,B3,12 + verllf B4,B4,12 + verllf B5,B5,12 + + vaf A0,A0,B0 + vaf A1,A1,B1 + vaf A2,A2,B2 + vaf A3,A3,B3 + vaf A4,A4,B4 + vaf A5,A5,B5 + vx D0,D0,A0 + vx D1,D1,A1 + vx D2,D2,A2 + vx D3,D3,A3 + vx D4,D4,A4 + vx D5,D5,A5 + verllf D0,D0,8 + verllf D1,D1,8 + verllf D2,D2,8 + verllf D3,D3,8 + verllf D4,D4,8 + verllf D5,D5,8 + + vaf C0,C0,D0 + vaf C1,C1,D1 + vaf C2,C2,D2 + vaf C3,C3,D3 + vaf C4,C4,D4 + vaf C5,C5,D5 + vx B0,B0,C0 + vx B1,B1,C1 + vx B2,B2,C2 + vx B3,B3,C3 + vx B4,B4,C4 + vx B5,B5,C5 + verllf B0,B0,7 + verllf B1,B1,7 + verllf B2,B2,7 + verllf B3,B3,7 + verllf B4,B4,7 + verllf B5,B5,7 + + vsldb C0,C0,C0,8 + vsldb C1,C1,C1,8 + vsldb C2,C2,C2,8 + vsldb C3,C3,C3,8 + vsldb C4,C4,C4,8 + vsldb C5,C5,C5,8 + vsldb B0,B0,B0,12 + vsldb B1,B1,B1,12 + vsldb B2,B2,B2,12 + vsldb B3,B3,B3,12 + vsldb B4,B4,B4,12 + vsldb B5,B5,B5,12 + vsldb D0,D0,D0,4 + vsldb D1,D1,D1,4 + vsldb D2,D2,D2,4 + vsldb D3,D3,D3,4 + vsldb D4,D4,D4,4 + vsldb D5,D5,D5,4 + brct %r0,.Loop_vx + + vaf A0,A0,K0 + vaf B0,B0,K1 + vaf C0,C0,K2 + vaf D0,D0,K3 + vaf A1,A1,K0 + vaf D1,D1,T1 # +K[3]+1 + + vperm A0,A0,A0,BEPERM + vperm B0,B0,B0,BEPERM + vperm C0,C0,C0,BEPERM + vperm D0,D0,D0,BEPERM + + clgfi LEN,0x40 + jl .Ltail_vx + + vaf D2,D2,T2 # +K[3]+2 + vaf D3,D3,T3 # +K[3]+3 + vlm T0,T3,0(INP),0 + + vx A0,A0,T0 + vx B0,B0,T1 + vx C0,C0,T2 + vx D0,D0,T3 + + vlm K0,T3,0(%r7),4 # re-load sigma and increments + + vstm A0,D0,0(OUT),0 + + la INP,0x40(INP) + la OUT,0x40(OUT) + aghi LEN,-0x40 + je .Ldone_vx + + vaf B1,B1,K1 + vaf C1,C1,K2 + + vperm A0,A1,A1,BEPERM + vperm B0,B1,B1,BEPERM + vperm 
C0,C1,C1,BEPERM + vperm D0,D1,D1,BEPERM + + clgfi LEN,0x40 + jl .Ltail_vx + + vlm A1,D1,0(INP),0 + + vx A0,A0,A1 + vx B0,B0,B1 + vx C0,C0,C1 + vx D0,D0,D1 + + vstm A0,D0,0(OUT),0 + + la INP,0x40(INP) + la OUT,0x40(OUT) + aghi LEN,-0x40 + je .Ldone_vx + + vaf A2,A2,K0 + vaf B2,B2,K1 + vaf C2,C2,K2 + + vperm A0,A2,A2,BEPERM + vperm B0,B2,B2,BEPERM + vperm C0,C2,C2,BEPERM + vperm D0,D2,D2,BEPERM + + clgfi LEN,0x40 + jl .Ltail_vx + + vlm A1,D1,0(INP),0 + + vx A0,A0,A1 + vx B0,B0,B1 + vx C0,C0,C1 + vx D0,D0,D1 + + vstm A0,D0,0(OUT),0 + + la INP,0x40(INP) + la OUT,0x40(OUT) + aghi LEN,-0x40 + je .Ldone_vx + + vaf A3,A3,K0 + vaf B3,B3,K1 + vaf C3,C3,K2 + vaf D2,K3,T3 # K[3]+3 + + vperm A0,A3,A3,BEPERM + vperm B0,B3,B3,BEPERM + vperm C0,C3,C3,BEPERM + vperm D0,D3,D3,BEPERM + + clgfi LEN,0x40 + jl .Ltail_vx + + vaf D3,D2,T1 # K[3]+4 + VLM A1,D1,0(INP),0 + + vx A0,A0,A1 + vx B0,B0,B1 + vx C0,C0,C1 + vx D0,D0,D1 + + vstm A0,D0,0(OUT),0 + + la INP,0x40(INP) + la OUT,0x40(OUT) + aghi LEN,-0x40 + je .Ldone_vx + + vaf A4,A4,K0 + vaf B4,B4,K1 + vaf C4,C4,K2 + vaf D4,D4,D3 # +K[3]+4 + vaf D3,D3,T1 # K[3]+5 + vaf K3,D2,T3 # K[3]+=6 + + vperm A0,A4,A4,BEPERM + vperm B0,B4,B4,BEPERM + vperm C0,C4,C4,BEPERM + vperm D0,D4,D4,BEPERM + + clgfi LEN,0x40 + jl .Ltail_vx + + vlm A1,D1,0(INP),0 + + vx A0,A0,A1 + vx B0,B0,B1 + vx C0,C0,C1 + vx D0,D0,D1 + + vstm A0,D0,0(OUT),0 + + la INP,0x40(INP) + la OUT,0x40(OUT) + aghi LEN,-0x40 + je .Ldone_vx + + vaf A5,A5,K0 + vaf B5,B5,K1 + vaf C5,C5,K2 + vaf D5,D5,D3 # +K[3]+5 + + vperm A0,A5,A5,BEPERM + vperm B0,B5,B5,BEPERM + vperm C0,C5,C5,BEPERM + vperm D0,D5,D5,BEPERM + + clgfi LEN,0x40 + jl .Ltail_vx + + vlm A1,D1,0(INP),0 + + vx A0,A0,A1 + vx B0,B0,B1 + vx C0,C0,C1 + vx D0,D0,D1 + + vstm A0,D0,0(OUT),0 + + la INP,0x40(INP) + la OUT,0x40(OUT) + lhi %r0,10 + aghi LEN,-0x40 + jne .Loop_outer_vx + +.Ldone_vx: + lmg %r6,%r7,FRAME+6*8(SP) + la SP,FRAME(SP) + br %r14 + +.Ltail_vx: + vstm A0,D0,8*8(SP),3 + lghi %r1,0 + +.Loop_tail_vx: + llgc 
%r5,0(%r1,INP) + llgc %r6,8*8(%r1,SP) + xr %r6,%r5 + stc %r6,0(%r1,OUT) + la %r1,1(%r1) + brct LEN,.Loop_tail_vx + + lmg %r6,%r7,FRAME+6*8(SP) + la SP,FRAME(SP) + br %r14 + + .type chacha20_vx, @function + .size chacha20_vx, . - chacha20_vx + .globl chacha20_vx + +.previous +.section .note.GNU-stack,"",%progbits
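For readers checking the vector code by hand, it may help to compare `.Loop_vx` against a scalar model. The sketch below assumes the standard ChaCha20 round structure from RFC 8439. The rotation amounts 16, 12, 8 and 7 in the `verllf` instructions, the `vsldb` lane shifts that diagonalize rows B, C and D before the second half of each iteration, and the loop count of 10 in `%r0` (10 double rounds, so 20 rounds in total) all match that structure. The names `rotl32`, `quarter_round` and `double_round` are hypothetical and do not appear in the patch. In the assembly, each vector lane carries one column of the 4x4 state, so one `vaf`/`vx`/`verllf` group performs four quarter rounds at once for each of the six blocks held in A0..D5.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits; verllf does this per 32-bit lane. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);  /* avoid the undefined shift by 32 */
    return (x << n) | (x >> (32 - n));
}

/* One ChaCha20 quarter round (RFC 8439, section 2.1).  Each line is one
 * vaf (add), vx (xor), verllf (rotate) triple in .Loop_vx. */
static void quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/* One double round on a 4x4 state held row-major in s[0..15]:
 * a column round followed by a diagonal round.  The assembly gets the
 * diagonal round by rotating rows B, C and D left by 4, 8 and 12 bytes
 * with vsldb, running the same lane-wise code, then rotating back. */
static void double_round(uint32_t s[16])
{
    /* column round: lanes 0..3 of rows A, B, C, D */
    quarter_round(&s[0], &s[4], &s[8],  &s[12]);
    quarter_round(&s[1], &s[5], &s[9],  &s[13]);
    quarter_round(&s[2], &s[6], &s[10], &s[14]);
    quarter_round(&s[3], &s[7], &s[11], &s[15]);

    /* diagonal round */
    quarter_round(&s[0], &s[5], &s[10], &s[15]);
    quarter_round(&s[1], &s[6], &s[11], &s[12]);
    quarter_round(&s[2], &s[7], &s[8],  &s[13]);
    quarter_round(&s[3], &s[4], &s[9],  &s[14]);
}
```

Applying `quarter_round` to the RFC 8439 section 2.1.1 test inputs (a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567) yields a = 0xea2a92f4, b = 0xcb1cf8ce, c = 0x4581472e, d = 0x5881c4bb. This makes the model a quick sanity check when single-stepping the vector loop in QEMU.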