From patchwork Fri Feb 15 19:22:59 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson <richard.henderson@linaro.org>
X-Patchwork-Id: 158542
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Date: Fri, 15 Feb 2019 11:22:59 -0800
Message-Id: <20190215192302.27855-6-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.17.2
In-Reply-To: <20190215192302.27855-1-richard.henderson@linaro.org>
References: <20190215192302.27855-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v4 5/8] target/arm: Add helpers for FMLAL
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Cc: peter.maydell@linaro.org
Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org
Sender: "Qemu-devel" <qemu-devel-bounces+patch=linaro.org@nongnu.org>

Note that float16_to_float32 rightly squashes SNaN to QNaN.
But pickNaNMulAdd, for ARM, selects SNaNs first, so we have to
preserve SNaN long enough for the correct NaN to be selected.
Hence float16_to_float32_by_bits, which widens the bit pattern
without quieting NaNs or raising exceptions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h     |   5 ++
 target/arm/vec_helper.c | 114 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 119 insertions(+)
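
For anyone who wants to check the widening described above outside of
QEMU, here is a minimal standalone sketch. It is not the patch code:
the names f16_bits_to_f32_bits and f32_bits_is_snan are hypothetical,
and plain shifts and a loop stand in for extract32 and clz32. It only
assumes the IEEE binary16 and binary32 layouts.

```c
#include <stdint.h>

/*
 * Hypothetical standalone counterpart of float16_to_float32_by_bits:
 * widen a binary16 bit pattern to a binary32 bit pattern without
 * quieting NaNs and without touching any exception flags.
 */
static uint32_t f16_bits_to_f32_bits(uint16_t f16)
{
    const int f16_bias = 15;
    const int f32_bias = 127;
    uint32_t sign = (f16 >> 15) & 1;
    uint32_t exp = (f16 >> 10) & 0x1f;
    uint32_t frac = f16 & 0x3ff;

    if (exp == 0x1f) {
        /* Inf or NaN: keep the fraction bits exactly as they are. */
        exp = 0xff;
    } else if (exp == 0) {
        if (frac != 0) {
            /* Denormal: normalize until the msb reaches bit 10. */
            int shift = 0;
            while (!(frac & 0x400)) {
                frac <<= 1;
                shift++;
            }
            frac &= 0x3ff;              /* drop the implicit bit */
            exp = f32_bias - f16_bias - shift + 1;
        }
    } else {
        /* Normal number: only the bias changes. */
        exp += f32_bias - f16_bias;
    }
    return (sign << 31) | (exp << 23) | (frac << 13);
}

/* A binary32 SNaN has all-ones exponent, nonzero fraction, bit 22 clear. */
static int f32_bits_is_snan(uint32_t bits)
{
    return (bits & 0x7f800000u) == 0x7f800000u
        && (bits & 0x007fffffu) != 0
        && (bits & 0x00400000u) == 0;
}
```

With this sketch, the binary16 SNaN 0x7d00 widens to 0x7fa00000, which
still has the quiet bit clear, so a later NaN-selection step would
still see it as signaling.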

-- 
2.17.2

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 747cb64d29..03a613a00b 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -677,6 +677,11 @@ DEF_HELPER_FLAGS_5(gvec_sqsub_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_sqsub_d, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(gvec_fmlal_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fmlal_idx_h, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index dfc635cf9a..224e5315b1 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -898,3 +898,117 @@ void HELPER(gvec_sqsub_d)(void *vd, void *vq, void *vn,
     }
     clear_tail(d, oprsz, simd_maxsz(desc));
 }
+
+/*
+ * Convert float16 to float32, raising no exceptions and
+ * preserving exceptional values, including SNaN.
+ * This is effectively an unpack+repack operation.
+ */
+static float32 float16_to_float32_by_bits(uint32_t f16)
+{
+    const int f16_bias = 15;
+    const int f32_bias = 127;
+    uint32_t sign = extract32(f16, 15, 1);
+    uint32_t exp = extract32(f16, 10, 5);
+    uint32_t frac = extract32(f16, 0, 10);
+
+    if (exp == 0x1f) {
+        /* Inf or NaN */
+        exp = 0xff;
+    } else if (exp == 0) {
+        /* Zero or denormal.  */
+        if (frac != 0) {
+            /*
+             * Denormal; these are all normal float32.
+             * Shift the fraction so that the msb is at bit 10
+             * (the 11th bit), then mask it off as the implicit
+             * bit of the normalized float32.  Note that all
+             * values still go through the final shift below, to
+             * put the float32 fraction at the right place.
+             */
+            int shift = clz32(frac) - 21;
+            frac = (frac << shift) & 0x3ff;
+            exp = f32_bias - f16_bias - shift + 1;
+        }
+    } else {
+        /* Normal number; adjust the bias.  */
+        exp += f32_bias - f16_bias;
+    }
+    sign <<= 31;
+    exp <<= 23;
+    frac <<= 23 - 10;
+
+    return sign | exp | frac;
+}
+
+static uint64_t load4_f16(uint64_t *ptr, int is_q, int is_2)
+{
+    /*
+     * Branchless load of u32[0], u64[0], u32[1], or u64[1].
+     * Load the 2nd qword iff is_q & is_2.
+     * Shift to the 2nd dword iff !is_q & is_2.
+     * For !is_q & !is_2, the upper bits of the result are garbage.
+     */
+    return ptr[is_q & is_2] >> ((is_2 & ~is_q) << 5);
+}
+
+/*
+ * Note that FMLAL requires oprsz == 8 or oprsz == 16,
+ * as there are not yet SVE versions that might use blocking.
+ */
+
+void HELPER(gvec_fmlal_h)(void *vd, void *vn, void *vm,
+                          void *fpst, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc);
+    int is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
+    int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    int is_q = oprsz == 16;
+    float32 *d = vd;
+    uint64_t n_4, m_4;
+
+    /* Pre-load all of the f16 data, avoiding overlap issues.  */
+    n_4 = load4_f16(vn, is_q, is_2);
+    m_4 = load4_f16(vm, is_q, is_2);
+
+    /* Negate all inputs for FMLSL at once.  */
+    if (is_s) {
+        n_4 ^= 0x8000800080008000ull;
+    }
+
+    for (i = 0; i < oprsz / 4; i++) {
+        float32 n_1 = float16_to_float32_by_bits(n_4 >> (i * 16));
+        float32 m_1 = float16_to_float32_by_bits(m_4 >> (i * 16));
+        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], 0, fpst);
+    }
+    clear_tail(d, oprsz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_fmlal_idx_h)(void *vd, void *vn, void *vm,
+                              void *fpst, uint32_t desc)
+{
+    intptr_t i, oprsz = simd_oprsz(desc);
+    int is_s = extract32(desc, SIMD_DATA_SHIFT, 1);
+    int is_2 = extract32(desc, SIMD_DATA_SHIFT + 1, 1);
+    int index = extract32(desc, SIMD_DATA_SHIFT + 2, 3);
+    int is_q = oprsz == 16;
+    float32 *d = vd;
+    uint64_t n_4;
+    float32 m_1;
+
+    /* Pre-load all of the f16 data, avoiding overlap issues.  */
+    n_4 = load4_f16(vn, is_q, is_2);
+
+    /* Negate all inputs for FMLSL at once.  */
+    if (is_s) {
+        n_4 ^= 0x8000800080008000ull;
+    }
+
+    m_1 = float16_to_float32_by_bits(((float16 *)vm)[H2(index)]);
+
+    for (i = 0; i < oprsz / 4; i++) {
+        float32 n_1 = float16_to_float32_by_bits(n_4 >> (i * 16));
+        d[H4(i)] = float32_muladd(n_1, m_1, d[H4(i)], 0, fpst);
+    }
+    clear_tail(d, oprsz, simd_maxsz(desc));
+}
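
The lane selection in load4_f16 and the FMLSL sign flip above can be
tried outside of QEMU with the sketch below. The load4_f16 body is the
same expression as in the patch; lane16, negate4_f16 and demo_reg are
hypothetical helpers and sample data added only for illustration. The
register is modelled as two host-order uint64_t values, each holding
four 16-bit lanes.

```c
#include <stdint.h>

/* Sample 128-bit register: 16-bit lanes 0..7 hold the values 0..7. */
static const uint64_t demo_reg[2] = {
    0x0003000200010000ull,
    0x0007000600050004ull,
};

/*
 * Same expression as in the patch:
 *   is_q & is_2   selects the second qword,
 *   !is_q & is_2  shifts the second dword of the first qword down,
 *   otherwise     the first qword is returned unshifted.
 */
static uint64_t load4_f16(const uint64_t *ptr, int is_q, int is_2)
{
    return ptr[is_q & is_2] >> ((is_2 & ~is_q) << 5);
}

/* Hypothetical helper: extract 16-bit lane i from a packed value. */
static uint16_t lane16(uint64_t v, int i)
{
    return (uint16_t)(v >> (i * 16));
}

/* Hypothetical helper: flip the sign bit of all four f16 lanes. */
static uint64_t negate4_f16(uint64_t v)
{
    return v ^ 0x8000800080008000ull;
}
```

In the non-Q forms only lanes 0 and 1 of the result are meaningful,
which matches the loop bound of oprsz / 4 in the helpers.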