From patchwork Fri Sep 6 05:16:10 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Tokarev X-Patchwork-Id: 825902 Delivered-To: patch@linaro.org Received: by 2002:adf:a345:0:b0:367:895a:4699 with SMTP id d5csp638194wrb; Thu, 5 Sep 2024 22:20:56 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCXvVffXv8RyEder5DZ0Y1RyKMprtE6f0TVrUFt+pfUa3kOjfD2ZQ7DMBdYrD8gSglXH6/mN4Q==@linaro.org X-Google-Smtp-Source: AGHT+IGs+grXz8sMyaP0kohVPITVdyAYD+AC//DKlsXkEKUAYS+mCLL0CCIJ3mkM4n3qmJ/64xXy X-Received: by 2002:a05:6214:2b90:b0:6bf:8890:ae17 with SMTP id 6a1803df08f44-6c52851af66mr15461036d6.53.1725600056086; Thu, 05 Sep 2024 22:20:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1725600056; cv=none; d=google.com; s=arc-20240605; b=HY4u/sPeOVjEOCeWpzeGnW19t2M8Ofo2sWv+s6XlJOOpKzsUm112ZkuzN3xxkokiI8 rZLFLLPSDq1TKI7XEqW+iNeVkIE3zu0k8ImkXZGh88qvBQBfYL/LraiIAdFfQy1yXW8L oohxmvfKyl9q6T921rUmuPGcw19mfNKvwJLOY7ythCnM5l0DnMudGpcfO0keA+PCAXC6 dizVKKbabFbmYvwWOho7uLjR1X81beibbRLuVB3rP1qYHJ3ETLQCoVCfGoQaD9iuEeP1 zAUsofZZdM/2y7tmJDn/j7VrYdr+GHJcq5qXnlr/h3Ks1ZcdFVl9FsoOPCpKOm43vrhy xUQg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from; bh=LKEvWcM5Bzn5qv1cu3izn+xzgw4xUpU8I/lX0t+5qUs=; fh=xJ1URYKcMN3TM0/XAv5v+aCN+5tIbzAdcfBx5UNgoLw=; b=dAA6LOm0Eknux2uE2rxs2LuQ2rb8vspHzZ9WxXnR7zWkn9s/qFbBqUPiZF85pJJgMK 0wefOaX45pCT+ymHqM5VtWF22Tleue+SCGvnDVs3M6iWrdP+TOroivu60v7bg3BZXIx3 WHM8Hs+4Y8XuJAZPVwFxwSUCAGigiSiaflnDw9bkJ1DAJjztUzMmpIHlbsLWyRBATbLK IftbjqfLdbvBrt3b0xK6GmBm0e80wleM0Dy+AptbtPI5IXs0eyXflVCsP1hTE8zrwgVE gJd+LT+tWal1C4ZSOenr12DrDg98ymMLqCw1tlHUU5wYd8OlUH4w29qNyC/nVDnoXpYu qbOQ==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org" Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id 6a1803df08f44-6c520525962si36458816d6.596.2024.09.05.22.20.55 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Thu, 05 Sep 2024 22:20:56 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org" Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1smRNH-0004UU-V0; Fri, 06 Sep 2024 01:19:15 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1smRMi-00021F-6j; Fri, 06 Sep 2024 01:18:40 -0400 Received: from isrv.corpit.ru ([86.62.121.231]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1smRMf-00088x-Tm; Fri, 06 Sep 2024 01:18:35 -0400 Received: from tsrv.corpit.ru (tsrv.tls.msk.ru [192.168.177.2]) by isrv.corpit.ru (Postfix) with ESMTP id 8F1198C12B; Fri, 6 Sep 2024 08:15:17 +0300 (MSK) Received: from tls.msk.ru (mjt.wg.tls.msk.ru [192.168.177.130]) by tsrv.corpit.ru (Postfix) with SMTP id 3E20C133370; Fri, 6 Sep 2024 08:16:35 +0300 (MSK) Received: (nullmailer pid 10451 invoked by uid 1000); Fri, 06 Sep 2024 05:16:33 -0000 From: Michael Tokarev To: qemu-devel@nongnu.org Cc: qemu-stable@nongnu.org, Peter Maydell , Richard Henderson , Michael Tokarev Subject: [Stable-7.2.14 22/40] target/arm: Handle denormals correctly for FMOPA (widening) Date: Fri, 6 Sep 2024 08:16:10 +0300 Message-Id: <20240906051633.10288-22-mjt@tls.msk.ru> X-Mailer: git-send-email 2.39.2 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=86.62.121.231; envelope-from=mjt@tls.msk.ru; helo=isrv.corpit.ru X-Spam_score_int: -68 X-Spam_score: -6.9 X-Spam_bar: ------ X-Spam_report: (-6.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_HI=-5, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Peter Maydell The FMOPA (widening) SME instruction takes pairs of half-precision floating point values, widens them to single-precision, does a two-way dot product and accumulates the results into a single-precision destination. We don't quite correctly handle the FPCR bits FZ and FZ16 which control flushing of denormal inputs and outputs. This is because at the moment we pass a single float_status value to the helper function, which then uses that configuration for all the fp operations it does. However, because the inputs to this operation are float16 and the outputs are float32 we need to use the fp_status_f16 for the float16 input widening but the normal fp_status for everything else. Otherwise we will apply the flushing control FPCR.FZ16 to the 32-bit output rather than the FPCR.FZ control, and incorrectly flush a denormal output to zero when we should not (or vice-versa). (In commit 207d30b5fdb5b we tried to fix the FZ handling but didn't get it right, switching from "use FPCR.FZ for everything" to "use FPCR.FZ16 for everything".) (Mjt: it is commit d5373d7bdbee in stable-7.2) Pass the CPU env to the sme_fmopa_h helper instead of an fp_status pointer, and have the helper pass an extra fp_status into the f16_dotadd() function so that we can use the right status for the right parts of this operation. Cc: qemu-stable@nongnu.org Fixes: 207d30b5fdb5 ("target/arm: Use FPST_F16 for SME FMOPA (widening)") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2373 Signed-off-by: Peter Maydell Reviewed-by: Richard Henderson (cherry picked from commit 55f9f4ee018c5ccea81d8c8c586756d7711ae46f) Signed-off-by: Michael Tokarev (Mjt: s/tcg_env/cpu_env/ due to missingv 8.1.0-1189-gad75a51e84af "tcg: Rename cpu_env to tcg_env") diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h index d2d544a696..d33fbcd8fd 100644 --- a/target/arm/helper-sme.h +++ b/target/arm/helper-sme.h @@ -122,7 +122,7 @@ DEF_HELPER_FLAGS_5(sme_addha_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sme_addva_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_7(sme_fmopa_h, TCG_CALL_NO_RWG, - void, ptr, ptr, ptr, ptr, ptr, ptr, i32) + void, ptr, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG, diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c index f12f3288fd..98a4840970 100644 --- a/target/arm/sme_helper.c +++ b/target/arm/sme_helper.c @@ -1009,12 +1009,23 @@ static inline uint32_t f16mop_adj_pair(uint32_t pair, uint32_t pg, uint32_t neg) } static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2, - float_status *s_std, float_status *s_odd) + float_status *s_f16, float_status *s_std, + float_status *s_odd) { - float64 e1r = float16_to_float64(e1 & 0xffff, true, s_std); - float64 e1c = float16_to_float64(e1 >> 16, true, s_std); - float64 e2r = float16_to_float64(e2 & 0xffff, true, s_std); - float64 e2c = float16_to_float64(e2 >> 16, true, s_std); + /* + * We need three different float_status for different parts of this + * operation: + * - the input conversion of the float16 values must use the + * f16-specific float_status, so that the FPCR.FZ16 control is applied + * - operations on float32 including the final accumulation must use + * the normal float_status, so that FPCR.FZ is applied + * - we have pre-set-up copy of s_std which is set to round-to-odd, + * for the multiply (see below) + */ + float64 e1r = float16_to_float64(e1 & 0xffff, true, s_f16); + float64 e1c = float16_to_float64(e1 >> 16, true, s_f16); + float64 e2r = float16_to_float64(e2 & 0xffff, true, s_f16); + float64 e2c = float16_to_float64(e2 >> 16, true, s_f16); float64 t64; float32 t32; @@ -1036,20 +1047,23 @@ static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2, } void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, - void *vpm, void *vst, uint32_t desc) + void *vpm, CPUARMState *env, uint32_t desc) { intptr_t row, col, oprsz = simd_maxsz(desc); uint32_t neg = simd_data(desc) * 0x80008000u; uint16_t *pn = vpn, *pm = vpm; - float_status fpst_odd, fpst_std; + float_status fpst_odd, fpst_std, fpst_f16; /* - * Make a copy of float_status because this operation does not - * update the cumulative fp exception status. It also produces - * default nans. Make a second copy with round-to-odd -- see above. + * Make copies of fp_status and fp_status_f16, because this operation + * does not update the cumulative fp exception status. It also + * produces default NaNs. We also need a second copy of fp_status with + * round-to-odd -- see above. */ - fpst_std = *(float_status *)vst; + fpst_f16 = env->vfp.fp_status_f16; + fpst_std = env->vfp.fp_status; set_default_nan_mode(true, &fpst_std); + set_default_nan_mode(true, &fpst_f16); fpst_odd = fpst_std; set_float_rounding_mode(float_round_to_odd, &fpst_odd); @@ -1069,7 +1083,8 @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, uint32_t m = *(uint32_t *)(vzm + H1_4(col)); m = f16mop_adj_pair(m, pcol, 0); - *a = f16_dotadd(*a, n, m, &fpst_std, &fpst_odd); + *a = f16_dotadd(*a, n, m, + &fpst_f16, &fpst_std, &fpst_odd); } col += 4; pcol >>= 4; diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c index 0fcd4ad950..c864bd016c 100644 --- a/target/arm/translate-sme.c +++ b/target/arm/translate-sme.c @@ -376,8 +376,29 @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz, return true; } -TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_fpst, a, - MO_32, FPST_FPCR_F16, gen_helper_sme_fmopa_h) +static bool do_outprod_env(DisasContext *s, arg_op *a, MemOp esz, + gen_helper_gvec_5_ptr *fn) +{ + int svl = streaming_vec_reg_size(s); + uint32_t desc = simd_desc(svl, svl, a->sub); + TCGv_ptr za, zn, zm, pn, pm; + + if (!sme_smza_enabled_check(s)) { + return true; + } + + za = get_tile(s, esz, a->zad); + zn = vec_full_reg_ptr(s, a->zn); + zm = vec_full_reg_ptr(s, a->zm); + pn = pred_full_reg_ptr(s, a->pn); + pm = pred_full_reg_ptr(s, a->pm); + + fn(za, zn, zm, pn, pm, cpu_env, tcg_constant_i32(desc)); + return true; +} + +TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_env, a, + MO_32, gen_helper_sme_fmopa_h) TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, FPST_FPCR, gen_helper_sme_fmopa_s) TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a,