From: Alex Bennée
To: richard.henderson@linaro.org
Date: Fri, 13 Oct 2017 17:24:19 +0100
Message-Id: <20171013162438.32458-12-alex.bennee@linaro.org>
In-Reply-To: <20171013162438.32458-1-alex.bennee@linaro.org>
Subject: [Qemu-devel] [RFC PATCH 11/30] target/arm: implement half-precision F(MIN|MAX)(V|NMV)
Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org, Alex Bennée, qemu-devel@nongnu.org

This implements the half-precision variants of the across-vector
reduction operations. This involves a refactor of the reduction code
so that it more closely follows the order of the Reduce() pseudocode
in the ARM ARM (and handles 8-element reductions).

Signed-off-by: Alex Bennée

--
v1
  - dropped the advsimd_2a stuff
---
 target/arm/helper-a64.c    |  18 ++++++
 target/arm/helper-a64.h    |   4 ++
 target/arm/translate-a64.c | 147 ++++++++++++++++++++++++++++-----------------
 3 files changed, 115 insertions(+), 54 deletions(-)

-- 
2.14.1

diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index d9df82cff5..a0c20faabc 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -537,3 +537,21 @@ uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
 
     return !success;
 }
+
+/*
+ * AdvSIMD half-precision
+ */
+
+#define ADVSIMD_HELPER(name, suffix) HELPER(glue(glue(advsimd_, name), suffix))
+
+#define ADVSIMD_HALFOP(name) \
+float16 ADVSIMD_HELPER(name, h)(float16 a, float16 b, void *fpstp) \
+{ \
+    float_status *fpst = fpstp; \
+    return float16_ ## name(a, b, fpst); \
+}
+
+ADVSIMD_HALFOP(min)
+ADVSIMD_HALFOP(max)
+ADVSIMD_HALFOP(minnum)
+ADVSIMD_HALFOP(maxnum)
diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index 6f9eaba533..b774431f1f 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -44,3 +44,7 @@ DEF_HELPER_FLAGS_3(crc32_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_3(crc32c_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_4(paired_cmpxchg64_le, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(paired_cmpxchg64_be, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
+DEF_HELPER_3(advsimd_maxh, f16, f16, f16, ptr)
+DEF_HELPER_3(advsimd_minh, f16, f16, f16, ptr)
+DEF_HELPER_3(advsimd_maxnumh, f16, f16, f16, ptr)
+DEF_HELPER_3(advsimd_minnumh, f16, f16, f16, ptr)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index a39b9d3633..1282d14c58 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -5602,26 +5602,80 @@ static void disas_simd_zip_trn(DisasContext *s, uint32_t insn)
     tcg_temp_free_i64(tcg_resh);
 }
 
-static void do_minmaxop(DisasContext *s, TCGv_i32 tcg_elt1, TCGv_i32 tcg_elt2,
-                        int opc, bool is_min, TCGv_ptr fpst)
-{
-    /* Helper function for disas_simd_across_lanes: do a single precision
-     * min/max operation on the specified two inputs,
-     * and return the result in tcg_elt1.
-     */
-    if (opc == 0xc) {
-        if (is_min) {
-            gen_helper_vfp_minnums(tcg_elt1, tcg_elt1, tcg_elt2, fpst);
-        } else {
-            gen_helper_vfp_maxnums(tcg_elt1, tcg_elt1, tcg_elt2, fpst);
-        }
+/*
+ * do_reduction_op helper
+ *
+ * This mirrors the Reduce() pseudocode in the ARM ARM. It is
+ * important for correct NaN propagation that we do these
+ * operations in exactly the order specified by the pseudocode.
+ *
+ * This is a recursive function; TCG temps should be freed by the
+ * calling function once it is done with the values.
+ */
+static TCGv_i32 do_reduction_op(DisasContext *s, int fpopcode, int rn,
+                                int esize, int size, int vmap, TCGv_ptr fpst)
+{
+    if (esize == size) {
+        int element;
+        TCGMemOp msize = esize == 16 ? MO_16 : MO_32;
+        TCGv_i32 tcg_elem;
+
+        /* We should have exactly one element left here */
+        assert(ctpop8(vmap) == 1);
+        element = ctz32(vmap);
+        assert(element < 8);
+
+        tcg_elem = tcg_temp_new_i32();
+        read_vec_element_i32(s, tcg_elem, rn, element, msize);
+        return tcg_elem;
     } else {
-        assert(opc == 0xf);
-        if (is_min) {
-            gen_helper_vfp_mins(tcg_elt1, tcg_elt1, tcg_elt2, fpst);
-        } else {
-            gen_helper_vfp_maxs(tcg_elt1, tcg_elt1, tcg_elt2, fpst);
+        int bits = size / 2;
+        int shift = ctpop8(vmap) / 2;
+        int vmap_lo = (vmap >> shift) & vmap;
+        int vmap_hi = (vmap & ~vmap_lo);
+        TCGv_i32 tcg_hi, tcg_lo, tcg_res;
+
+        tcg_hi = do_reduction_op(s, fpopcode, rn, esize, bits, vmap_hi, fpst);
+        tcg_lo = do_reduction_op(s, fpopcode, rn, esize, bits, vmap_lo, fpst);
+        tcg_res = tcg_temp_new_i32();
+
+        /* base fpopcode = 0x0c NMV, 0x0f V
+         *                 0x10 MIN, 0x00 MAX
+         *                 0x20 F32, 0x00 FP16
+         */
+        switch (fpopcode) {
+        case 0x0c: /* fmaxnmv half-precision */
+            gen_helper_advsimd_maxnumh(tcg_res, tcg_lo, tcg_hi, fpst);
+            break;
+        case 0x0f: /* fmaxv half-precision */
+            gen_helper_advsimd_maxh(tcg_res, tcg_lo, tcg_hi, fpst);
+            break;
+        case 0x1c: /* fminnmv half-precision */
+            gen_helper_advsimd_minnumh(tcg_res, tcg_lo, tcg_hi, fpst);
+            break;
+        case 0x1f: /* fminv half-precision */
+            gen_helper_advsimd_minh(tcg_res, tcg_lo, tcg_hi, fpst);
+            break;
+        case 0x2c: /* fmaxnmv */
+            gen_helper_vfp_maxnums(tcg_res, tcg_lo, tcg_hi, fpst);
+            break;
+        case 0x2f: /* fmaxv */
+            gen_helper_vfp_maxs(tcg_res, tcg_lo, tcg_hi, fpst);
+            break;
+        case 0x3c: /* fminnmv */
+            gen_helper_vfp_minnums(tcg_res, tcg_lo, tcg_hi, fpst);
+            break;
+        case 0x3f: /* fminv */
+            gen_helper_vfp_mins(tcg_res, tcg_lo, tcg_hi, fpst);
+            break;
+        default:
+            g_assert_not_reached();
+            break;
         }
+
+        tcg_temp_free_i32(tcg_hi);
+        tcg_temp_free_i32(tcg_lo);
+        return tcg_res;
     }
 }
 
@@ -5663,16 +5717,21 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
         break;
     case 0xc: /* FMAXNMV, FMINNMV */
     case 0xf: /* FMAXV, FMINV */
-        if (!is_u || !is_q || extract32(size, 0, 1)) {
-            unallocated_encoding(s);
-            return;
-        }
-        /* Bit 1 of size field encodes min vs max, and actual size is always
-         * 32 bits: adjust the size variable so following code can rely on it
+        /* Bit 1 of size field encodes min vs max and the actual size
+         * depends on the encoding of the U bit. If U is not set (and
+         * FP16 is enabled) then we do half-precision float instead of
+         * single precision.
          */
         is_min = extract32(size, 1, 1);
         is_fp = true;
-        size = 2;
+        if (!is_u && arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            size = 1;
+        } else if (!is_u || !is_q || extract32(size, 0, 1)) {
+            unallocated_encoding(s);
+            return;
+        } else {
+            size = 2;
+        }
         break;
     default:
         unallocated_encoding(s);
@@ -5729,38 +5788,18 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
 
         }
     } else {
-        /* Floating point ops which work on 32 bit (single) intermediates.
+        /* Floating point vector reduction ops which work across 32
+         * bit (single) or 16 bit (half-precision) intermediates.
          * Note that correct NaN propagation requires that we do these
          * operations in exactly the order specified by the pseudocode.
          */
-        TCGv_i32 tcg_elt1 = tcg_temp_new_i32();
-        TCGv_i32 tcg_elt2 = tcg_temp_new_i32();
-        TCGv_i32 tcg_elt3 = tcg_temp_new_i32();
         TCGv_ptr fpst = get_fpstatus_ptr();
-
-        assert(esize == 32);
-        assert(elements == 4);
-
-        read_vec_element(s, tcg_elt, rn, 0, MO_32);
-        tcg_gen_extrl_i64_i32(tcg_elt1, tcg_elt);
-        read_vec_element(s, tcg_elt, rn, 1, MO_32);
-        tcg_gen_extrl_i64_i32(tcg_elt2, tcg_elt);
-
-        do_minmaxop(s, tcg_elt1, tcg_elt2, opcode, is_min, fpst);
-
-        read_vec_element(s, tcg_elt, rn, 2, MO_32);
-        tcg_gen_extrl_i64_i32(tcg_elt2, tcg_elt);
-        read_vec_element(s, tcg_elt, rn, 3, MO_32);
-        tcg_gen_extrl_i64_i32(tcg_elt3, tcg_elt);
-
-        do_minmaxop(s, tcg_elt2, tcg_elt3, opcode, is_min, fpst);
-
-        do_minmaxop(s, tcg_elt1, tcg_elt2, opcode, is_min, fpst);
-
-        tcg_gen_extu_i32_i64(tcg_res, tcg_elt1);
-        tcg_temp_free_i32(tcg_elt1);
-        tcg_temp_free_i32(tcg_elt2);
-        tcg_temp_free_i32(tcg_elt3);
+        int fpopcode = opcode | is_min << 4 | is_u << 5;
+        int vmap = (1 << elements) - 1;
+        TCGv_i32 tcg_res32 = do_reduction_op(s, fpopcode, rn, esize,
+                                             (is_q ? 128 : 64), vmap, fpst);
+        tcg_gen_extu_i32_i64(tcg_res, tcg_res32);
+        tcg_temp_free_i32(tcg_res32);
         tcg_temp_free_ptr(fpst);
     }
 
@@ -5882,7 +5921,7 @@ static void handle_simd_dupg(DisasContext *s, int is_q, int rd, int rn,
 {
     int size = ctz32(imm5);
     int esize = 8 << size;
-    int elements = (is_q ? 128 : 64)/esize;
+    int elements = (is_q ? 128 : 64) / esize;
     int i = 0;
 
     if (size > 3 || ((size == 3) && !is_q)) {
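For reviewers who want to check the element pairing without building QEMU, here is a minimal standalone model of how do_reduction_op() splits the element bitmap and of how the fpopcode value is composed. It is only a sketch under stated assumptions: popcount8() and lowest_bit() are hypothetical stand-ins for QEMU's ctpop8() and ctz32(), reduction_order() records the evaluation tree as a string instead of emitting TCG ops, and fpopcode_name() is a hypothetical decoder that mirrors the case labels of the switch in the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's ctpop8(): count set bits in the low byte. */
static int popcount8(unsigned v)
{
    int n = 0;
    for (v &= 0xff; v; v >>= 1) {
        n += v & 1;
    }
    return n;
}

/* Hypothetical stand-in for ctz32(); callers never pass zero. */
static int lowest_bit(unsigned v)
{
    int i = 0;
    while (!(v & 1u)) {
        v >>= 1;
        i++;
    }
    return i;
}

/*
 * Model of do_reduction_op(): writes the evaluation tree for the elements
 * selected by vmap into 'out', e.g. "op(op(e0,e1),op(e2,e3))".  As in the
 * patch, the hi half is visited first and the op is applied as op(lo, hi).
 * The split assumes vmap is a contiguous run of set bits.
 */
static void reduction_order(unsigned vmap, int esize, int size,
                            char *out, size_t outlen)
{
    if (esize == size) {
        assert(popcount8(vmap) == 1);   /* exactly one element left */
        snprintf(out, outlen, "e%d", lowest_bit(vmap));
        return;
    }

    int shift = popcount8(vmap) / 2;
    unsigned vmap_lo = (vmap >> shift) & vmap;  /* lower half of the run */
    unsigned vmap_hi = vmap & ~vmap_lo;         /* upper half of the run */
    char hi[64], lo[64];

    reduction_order(vmap_hi, esize, size / 2, hi, sizeof(hi));
    reduction_order(vmap_lo, esize, size / 2, lo, sizeof(lo));
    snprintf(out, outlen, "op(%s,%s)", lo, hi);
}

/*
 * Hypothetical decoder for: fpopcode = opcode | is_min << 4 | is_u << 5,
 * where opcode is 0xc (NMV) or 0xf (V) and U set selects single precision.
 */
static const char *fpopcode_name(int opcode, int is_min, int is_u)
{
    switch (opcode | is_min << 4 | is_u << 5) {
    case 0x0c: return "fmaxnmv (half)";
    case 0x0f: return "fmaxv (half)";
    case 0x1c: return "fminnmv (half)";
    case 0x1f: return "fminv (half)";
    case 0x2c: return "fmaxnmv (single)";
    case 0x2f: return "fmaxv (single)";
    case 0x3c: return "fminnmv (single)";
    case 0x3f: return "fminv (single)";
    default:   return NULL;
    }
}
```

For example, an 8H reduction (vmap 0xff, esize 16, size 128) yields op(op(op(e0,e1),op(e2,e3)),op(op(e4,e5),op(e6,e7))), the pairwise halving tree that Reduce() describes rather than a left-to-right fold, while a 4S reduction (vmap 0xf, esize 32, size 128) yields op(op(e0,e1),op(e2,e3)), the same order the removed do_minmaxop() sequence produced.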