From patchwork Sat Feb 1 16:39:03 2025
X-Patchwork-Submitter: Peter Maydell
X-Patchwork-Id: 861229
From: Peter Maydell
To: qemu-arm@nongnu.org, qemu-devel@nongnu.org
Subject: [PATCH v2 00/69] target/arm: FEAT_AFP and FEAT_RPRES
Date: Sat, 1 Feb 2025 16:39:03 +0000
Message-Id: <20250201164012.1660228-1-peter.maydell@linaro.org>

This patchset implements emulation of the Arm FEAT_AFP and FEAT_RPRES
extensions, which are floating-point related. (Summary of what these
are exactly is at the bottom of the cover letter.)

If you'd rather have these patches as a git branch:
 https://git.linaro.org/people/pmaydell/qemu-arm.git feat-afp
with human readable web view at:
 https://git.linaro.org/people/peter.maydell/qemu-arm.git/log/?h=feat-afp

Changes between v1 and v2:
 * first part of the series has been upstreamed
 * I've left the first two x86 patches in here, just to avoid having
   to use a Based-on: tag. They've both been taken by Paolo already,
   they just haven't landed upstream yet.
 * the tail-end patches fixing x86 denormal support are not posted
   here (indeed I didn't mean to send them in v1!); I'll send those
   separately once the underlying softfloat patches are upstream
 * the renaming of the FPST_ constants (already upstream) is carried
   through into these patches
 * name changes in the "allow flushing of output denormals to be
   after rounding" patch: now set_float_ftz_detection(),
   get_float_ftz_detection(), float_ftz_after_rounding and
   float_ftz_before_rounding
 * moved select_fpst to translate-a64.h and renamed to select_ah_fpst
 * use vec_full_reg_offset() in the write_fp_*reg_merging fns
 * drop no-longer-needed float*_input_flush2() calls in the
   float*_hs_compare() fns in "implement float_flag_input_denormal_used"
 * adopted RTH's patchset, by a mix of merging in fixes to my patches
   and adding his (partly on the end, and partly sorted into the
   series at appropriate places). I updated commit messages in a few
   places (notably standardising them onto "Handle X for " rather
   than "for ")

Patches that still need review:
 04 fpu: Implement float_flag_input_denormal_used
 05 fpu: allow flushing of output denormals to be after rounding
 06 target/arm: Define FPCR AH, FIZ, NEP bits

RTH: I kept your r-by tags on the patches where I squashed in your
fixes from your followup series (mostly this is the changes to use
the muladd flags).
If you want to re-review to check that I did the squashing right,
those are patches:
 37 target/arm: Handle FPCR.AH in negation steps in SVE FCADD
 38 target/arm: Handle FPCR.AH in negation steps in FCADD
 41 target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
 42 target/arm: Handle FPCR.AH in negation in FMLS (vector)
 43 target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
 44 target/arm: Handle FPCR.AH in SVE FTSSEL
 45 target/arm: Handle FPCR.AH in SVE FTMAD

Summary of what FEAT_AFP/FEAT_RPRES are, from v1 cover letter:

FEAT_AFP defines three new control bits in the FPCR, whose operations
are basically independent of each other:
 * FPCR.AH: "alternate floating point mode"; this changes floating
   point behaviour in a variety of ways, including:
    - the sign of a default NaN is 1, not 0
    - if FPCR.FZ is also 1, denormals detected after rounding
      with an unbounded exponent has been applied are flushed to zero
    - FPCR.FZ does not cause denormalized inputs to be flushed to zero
    - miscellaneous other corner-case behaviour changes
 * FPCR.FIZ: flush denormalized numbers to zero on input for
   most instructions
 * FPCR.NEP: makes scalar SIMD operations merge the result with
   higher vector elements in one of the source registers, instead
   of zeroing the higher elements of the destination

FEAT_RPRES makes single-precision FRECPE and FRSQRTE use a 12-bit
mantissa precision instead of 8-bit when FPCR.AH is set.
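
For readers who find code easier than prose, here is a minimal standalone
sketch of two of the behaviours summarised above: the FPCR.AH default NaN
sign and the FPCR.NEP merging of scalar results. This is illustration only
and is not the code in the series; vreg_t, default_nan_f32() and
write_scalar_result() are hypothetical names made up for this sketch, and a
128-bit register is assumed to hold four float32 lanes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit vector register: four float32 lanes. */
typedef struct {
    uint32_t lane[4];
} vreg_t;

/*
 * Single-precision default NaN: quiet NaN with zero payload.
 * With FPCR.AH = 1 the sign bit is 1, otherwise it is 0.
 */
static uint32_t default_nan_f32(int fpcr_ah)
{
    return fpcr_ah ? 0xffc00000u : 0x7fc00000u;
}

/*
 * Write a scalar result into lane 0 of a destination register.
 * With FPCR.NEP = 0 the higher lanes are zeroed; with FPCR.NEP = 1
 * they are taken from merge_src (one of the source registers).
 */
static vreg_t write_scalar_result(uint32_t result, const vreg_t *merge_src,
                                  int fpcr_nep)
{
    vreg_t d;

    memset(&d, 0, sizeof(d));
    if (fpcr_nep) {
        assert(merge_src != NULL);
        d = *merge_src;
    }
    d.lane[0] = result;
    return d;
}
```

Calling write_scalar_result() with the same result and source register but
different fpcr_nep values shows the difference: lane 0 always holds the
result, and only the higher lanes change between zero and the source
register's contents.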

thanks
-- PMM

Peter Maydell (50):
  target/i386: Do not raise Invalid for 0 * Inf + QNaN
  tests/tcg/x86_64/fma: Test some x86 fused-multiply-add cases
  fpu: Add float_class_denormal
  fpu: Implement float_flag_input_denormal_used
  fpu: allow flushing of output denormals to be after rounding
  target/arm: Define FPCR AH, FIZ, NEP bits
  target/arm: Implement FPCR.FIZ handling
  target/arm: Adjust FP behaviour for FPCR.AH = 1
  target/arm: Adjust exception flag handling for AH = 1
  target/arm: Add FPCR.AH to tbflags
  target/arm: Set up float_status to use for FPCR.AH=1 behaviour
  target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE,
    FRSQRTS
  target/arm: Use FPST_FPCR_AH for BFCVT* insns
  target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns
  target/arm: Add FPCR.NEP to TBFLAGS
  target/arm: Define and use new write_fp_*reg_merging() functions
  target/arm: Handle FPCR.NEP for 3-input scalar operations
  target/arm: Handle FPCR.NEP for BFCVT scalar
  target/arm: Handle FPCR.NEP for 1-input scalar operations
  target/arm: Handle FPCR.NEP in do_cvtf_scalar()
  target/arm: Handle FPCR.NEP for scalar FABS and FNEG
  target/arm: Handle FPCR.NEP for FCVTXN (scalar)
  target/arm: Handle FPCR.NEP for NEP for FMUL, FMULX scalar by element
  target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
  target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX
  target/arm: Implement FPCR.AH semantics for FMAXV and FMINV
  target/arm: Implement FPCR.AH semantics for FMINP and FMAXP
  target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV
  target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate
  target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector
  target/arm: Implement FPCR.AH handling of negation of NaN
  target/arm: Implement FPCR.AH handling for scalar FABS and FABD
  target/arm: Handle FPCR.AH in vector FABD
  target/arm: Handle FPCR.AH in SVE FNEG
  target/arm: Handle FPCR.AH in SVE FABS
  target/arm: Handle FPCR.AH in SVE FABD
  target/arm: Handle FPCR.AH in negation steps in SVE FCADD
  target/arm: Handle FPCR.AH in negation steps in FCADD
  target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns
  target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns
  target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
  target/arm: Handle FPCR.AH in negation in FMLS (vector)
  target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
  target/arm: Handle FPCR.AH in SVE FTSSEL
  target/arm: Handle FPCR.AH in SVE FTMAD
  target/arm: Enable FEAT_AFP for '-cpu max'
  target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper
  target/arm: Implement increased precision FRECPE
  target/arm: Implement increased precision FRSQRTE
  target/arm: Enable FEAT_RPRES for -cpu max

Richard Henderson (19):
  target/arm: Handle FPCR.AH in vector FCMLA
  target/arm: Handle FPCR.AH in FCMLA by index
  target/arm: Handle FPCR.AH in SVE FCMLA
  target/arm: Handle FPCR.AH in FMLSL (by element and vector)
  target/arm: Handle FPCR.AH in SVE FMLSL (indexed)
  target/arm: Handle FPCR.AH in SVE FMLSLB, FMLSLT (vectors)
  target/arm: Introduce CPUARMState.vfp.fp_status[]
  target/arm: Remove standard_fp_status_f16
  target/arm: Remove standard_fp_status
  target/arm: Remove ah_fp_status_f16
  target/arm: Remove ah_fp_status
  target/arm: Remove fp_status_f16_a64
  target/arm: Remove fp_status_f16_a32
  target/arm: Remove fp_status_a64
  target/arm: Remove fp_status_a32
  target/arm: Simplify fp_status indexing in mve_helper.c
  target/arm: Simplify DO_VFP_cmp in vfp_helper.c
  target/arm: Read fz16 from env->vfp.fpcr
  target/arm: Sink fp_status and fpcr access into do_fmlal*

 docs/system/arm/emulation.rst    |   2 +
 include/fpu/softfloat-helpers.h  |  11 +
 include/fpu/softfloat-types.h    |  41 +-
 target/arm/cpu-features.h        |  10 +
 target/arm/cpu.h                 |  97 ++--
 target/arm/helper.h              |  26 +
 target/arm/internals.h           |   6 +
 target/arm/tcg/helper-a64.h      |  13 +
 target/arm/tcg/helper-sve.h      | 120 +++++
 target/arm/tcg/translate-a64.h   |  13 +
 target/arm/tcg/translate.h       |  54 +--
 target/arm/tcg/vec_internal.h    |  35 ++
 target/mips/fpu_helper.h         |   6 +
 fpu/softfloat.c                  |  66 ++-
 target/alpha/cpu.c               |   7 +
 target/arm/cpu.c                 |  46 +-
 target/arm/helper.c              |   2 +-
 target/arm/tcg/cpu64.c           |   2 +
 target/arm/tcg/helper-a64.c      | 151 +++---
 target/arm/tcg/hflags.c          |  13 +
 target/arm/tcg/mve_helper.c      |  44 +-
 target/arm/tcg/sme_helper.c      |   4 +-
 target/arm/tcg/sve_helper.c      | 367 +++++++++++---
 target/arm/tcg/translate-a64.c   | 782 +++++++++++++++++++++++++------
 target/arm/tcg/translate-sve.c   | 193 ++++++--
 target/arm/tcg/vec_helper.c      | 387 ++++++++++-----
 target/arm/vfp_helper.c          | 372 ++++++++++++---
 target/hppa/fpu_helper.c         |  11 +
 target/i386/tcg/fpu_helper.c     |  13 +-
 target/mips/msa.c                |   9 +
 target/ppc/cpu_init.c            |   3 +
 target/rx/cpu.c                  |   8 +
 target/sh4/cpu.c                 |   8 +
 target/tricore/helper.c          |   1 +
 tests/fp/fp-bench.c              |   1 +
 tests/tcg/x86_64/fma.c           | 109 +++++
 fpu/softfloat-parts.c.inc        | 132 +++++-
 tests/tcg/x86_64/Makefile.target |   1 +
 38 files changed, 2452 insertions(+), 714 deletions(-)
 create mode 100644 tests/tcg/x86_64/fma.c