[00/76] target/arm: Implement FEAT_AFP and FEAT_RPRES

Message ID 20250124162836.2332150-1-peter.maydell@linaro.org

Message

Peter Maydell Jan. 24, 2025, 4:27 p.m. UTC
This patchset implements emulation of the Arm FEAT_AFP and FEAT_RPRES
extensions, which are floating-point related. It's based on the
small i386 bugfix series I sent out a while back:

Based-on: 20250116112536.4117889-1-peter.maydell@linaro.org
("target/i386: Fix 0 * Inf + QNaN regression")

(It would also have been based on an initial refactoring series
I sent out on Monday, but AFAICT the list just ate those emails
and they never arrived anywhere :-(  So you get a bigger series
here than I'd hoped.)

If you'd rather have these patches as a git branch:
 https://git.linaro.org/people/pmaydell/qemu-arm.git  feat-afp
with human readable web view at:
 https://git.linaro.org/people/peter.maydell/qemu-arm.git/log/?h=feat-afp


FEAT_AFP defines three new control bits in the FPCR, whose
operations are basically independent of each other:
 * FPCR.AH: "alternate floating point mode"; this changes floating
   point behaviour in a variety of ways, including:
    - the sign of a default NaN is 1, not 0
    - if FPCR.FZ is also 1, a result is flushed to zero if it is still
      denormal after rounding with an unbounded exponent has been applied
    - FPCR.FZ does not cause denormalized inputs to be flushed to zero
    - miscellaneous other corner-case behaviour changes
 * FPCR.FIZ: flush denormalized numbers to zero on input for
   most instructions
 * FPCR.NEP: makes scalar SIMD operations merge the result with
   higher vector elements in one of the source registers, instead
   of zeroing the higher elements of the destination
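
To make two of the behaviours above concrete, here is a minimal sketch
in C. It is only an illustration, not code from this series: Vec128,
default_nan_f32() and write_scalar_result() are hypothetical names, and
the register is modelled as four float32 lanes held as raw bit patterns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit vector register: four float32 lanes. */
typedef struct {
    uint32_t lane[4];
} Vec128;

/* Single-precision default NaN: sign 0 normally, sign 1 when FPCR.AH is 1. */
static uint32_t default_nan_f32(bool fpcr_ah)
{
    return fpcr_ah ? 0xffc00000u : 0x7fc00000u;
}

/*
 * Write a scalar result into lane 0 of the destination.
 * NEP == 0: the higher lanes of the destination are zeroed.
 * NEP == 1: the higher lanes are copied from merge_src, which stands in
 *           for whichever source register the instruction merges from.
 */
static Vec128 write_scalar_result(uint32_t result, const Vec128 *merge_src,
                                  bool fpcr_nep)
{
    Vec128 d;

    assert(merge_src != NULL);
    if (fpcr_nep) {
        d = *merge_src;
    } else {
        memset(&d, 0, sizeof(d));
    }
    d.lane[0] = result;
    return d;
}
```

Calling write_scalar_result() twice with the same inputs and only the
NEP flag changed shows the difference: lane 0 is identical, and only
the higher lanes differ.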

FEAT_RPRES makes single-precision FRECPE and FRSQRTE use a 12-bit
mantissa precision instead of 8-bit when FPCR.AH is set.
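
As a rough feel for what the extra four bits buy, the sketch below
computes a reciprocal in fixed point with a chosen number of fraction
bits. This is purely an illustration of precision: it is not the
architected estimate algorithm and not the helper added in this series,
and recip_estimate_fixed() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * a_q16 is an input in [0.5, 1.0) as unsigned Q0.16 fixed point
 * (0x8000..0xffff). Returns floor(2^frac_bits / a), i.e. 1/a in
 * (1.0, 2.0] truncated to frac_bits fraction bits.
 */
static uint32_t recip_estimate_fixed(uint32_t a_q16, unsigned frac_bits)
{
    assert(a_q16 >= 0x8000u && a_q16 <= 0xffffu);
    assert(frac_bits <= 15);
    return (uint32_t)(((uint64_t)1 << (16 + frac_bits)) / a_q16);
}
```

For a = 0.75 (0xc000) the 8-bit result is 341/256 = 1.33203... and the
12-bit result is 5461/4096 = 1.33325..., against an exact value of
1.33333...; the 12-bit value truncated by four bits gives back the
8-bit one.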

Because FPCR.AH implies quite a lot of changes to corner cases
of floating point handling, the resulting patchseries is regrettably
quite big.

Structure of the patchseries:
 * patch 1 fixes a silly bug in arm_reset_sve_state() which only
   has a major bad effect once FEAT_AFP is implemented
 * patches 2-16 are a refactoring which splits the existing
   fp_status and fp_status_f16 so that each has separate a32 and
   a64 versions. We need this because the FEAT_AFP bits only have
   an effect for A64 insns, not A32 insns
 * patches 17-22 add some more functionality to softfloat that we
   need for FEAT_AFP:
    - an exception flag float_flag_input_denormal_used is set when
      an input to an fp op is denormal, is not squashed to zero,
      and is actually consumed (i.e. not an invalid operation or
      an operation where the other input was a NaN)
    - a control setting float_detect_ftz which lets the target
      control whether flush-to-zero of outputs should be done
      before or after rounding
   (Both these are needed for correct x86 FP emulation, incidentally.)
 * patches 23-28 define the FPCR bits and implement the parts of the
   functionality which can be handled by setting softfloat control
   knobs and adjusting how we handle softfloat exception flags.
   (This includes all of the FPCR.FIZ behaviour.)
 * patches 29-33 implement FPCR.AH handling of a small group of
   insns (FRECPE, FRECPS, FRECPX, FRSQRTE, FRSQRTS, BFCVT*, BFMLAL*,
   BFMLSL*) which must:
    - never update FPSR exception flags
    - always round-to-nearest-even
    - always flush single and double denormal inputs and outputs to zero
   We implement this via some new float_status fields that we use for
   this group of insns.
 * patches 34-42 implement the FPCR.NEP "merge high vector elements of
   a source register with the result of a scalar operation" behaviour
 * patches 43-49 implement FPCR.AH semantics for FMIN and FMAX:
    - comparing two zeroes (even of different sign) or comparing a NaN
      with anything always returns the second argument (possibly
      squashed to zero)
    - denormal outputs are not squashed to zero regardless of FZ or FZ16
 * patches 50-65 implement FPCR.AH semantics for abs and neg of floating
   point values: they must not change the sign bit of a NaN. This applies
   not just to the ABS and NEG insns but to any other insn whose
   pseudocode has it doing an FPAbs() or FPNeg() operation (e.g.
   FMLS, FRECPS, FTSSEL).
 * at this point patch 66 can enable FEAT_AFP for -cpu max
 * patches 67-70 implement FEAT_RPRES
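
The FPCR.AH rules for abs/neg and for FMIN described in the list above
can be sketched in a few lines of C. This is a simplified model under
stated assumptions, not the helpers from the patches: the function
names are hypothetical, abs/neg work on raw float32 bit patterns, and
fmin_ah() ignores exception flags and any squashing of inputs to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the float32 bit pattern encodes a NaN (quiet or signalling). */
static bool f32_is_nan(uint32_t bits)
{
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

/* Negate; with FPCR.AH == 1 the sign bit of a NaN is left unchanged. */
static uint32_t f32_neg_ah(uint32_t bits, bool fpcr_ah)
{
    if (fpcr_ah && f32_is_nan(bits)) {
        return bits;
    }
    return bits ^ 0x80000000u;
}

/* Absolute value; with FPCR.AH == 1 the sign bit of a NaN is left unchanged. */
static uint32_t f32_abs_ah(uint32_t bits, bool fpcr_ah)
{
    if (fpcr_ah && f32_is_nan(bits)) {
        return bits;
    }
    return bits & 0x7fffffffu;
}

/*
 * FMIN as it behaves with FPCR.AH == 1: if either input is a NaN, or if
 * both inputs are zeroes (of either sign), return the second argument.
 */
static float fmin_ah(float a, float b)
{
    if (a != a || b != b) {            /* x != x is true only for a NaN */
        return b;
    }
    if (a == 0.0f && b == 0.0f) {      /* +0 and -0 compare equal in C */
        return b;
    }
    return a < b ? a : b;
}
```

Note that fmin_ah(0.0f, -0.0f) and fmin_ah(-0.0f, 0.0f) return zeroes
of different sign, because the result is always the second argument
rather than "the more negative zero".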

I also have some patches which make target/i386 use the "detect
flush to zero after rounding" and "report when input denormal is
consumed" softfloat features added here; I don't include them in
this patchset (though you can find them in that git branch I
mentioned earlier) because I haven't done as much testing on the
i386 side and in any case this patchset is already pretty long.
I expect I'll send them out when this series has been merged.
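
For anyone wondering where "flush to zero before rounding" and "flush
to zero after rounding" actually give different answers, here is a
small standalone sketch. It is not the softfloat implementation:
FtzDetect and round_f32_with_ftz() are hypothetical names, the exact
result is supplied as a double, and only magnitudes below 1.0 are
checked for tininess.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

typedef enum {
    FTZ_BEFORE_ROUNDING,   /* tiny if the exact value is below FLT_MIN */
    FTZ_AFTER_ROUNDING     /* tiny if still below FLT_MIN once rounded */
} FtzDetect;

/* Round an exact result to float32, flushing tiny results to zero. */
static float round_f32_with_ftz(double exact, FtzDetect mode)
{
    double mag = exact < 0 ? -exact : exact;
    bool tiny;

    if (mag == 0.0 || mag >= 1.0) {
        return (float)exact;           /* cannot be a denormal result */
    }
    if (mode == FTZ_BEFORE_ROUNDING) {
        tiny = mag < (double)FLT_MIN;
    } else {
        /*
         * Scaling by 2^64 is exact and moves the value into the normal
         * range, so the conversion rounds to 24 bits as if the exponent
         * were unbounded.
         */
        float scaled = (float)(mag * 0x1p64);
        tiny = scaled < FLT_MIN * 0x1p64f;
    }
    if (tiny) {
        return exact < 0 ? -0.0f : 0.0f;
    }
    return (float)exact;
}
```

The interesting input is a value just under the smallest normal, for
example 0x1p-126 - 0x1p-156: detected before rounding it is tiny and
is flushed to zero, but rounding first brings it up to FLT_MIN, so
detection after rounding leaves it alone.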


thanks
-- PMM


Peter Maydell (76):
  target/i386: Do not raise Invalid for 0 * Inf + QNaN
  tests/tcg/x86_64/fma: Test some x86 fused-multiply-add cases
  target/arm: arm_reset_sve_state() should set FPSR, not FPCR
  target/arm: Use FPSR_ constants in vfp_exceptbits_from_host()
  target/arm: Use uint32_t in vfp_exceptbits_from_host()
  target/arm: Define new fp_status_a32 and fp_status_a64
  target/arm: Use vfp.fp_status_a64 in A64-only helper functions
  target/arm: Use fp_status_a32 in vjvct helper
  target/arm: Use fp_status_a32 in vfp_cmp helpers
  target/arm: Use FPST_FPCR_A32 in A32 decoder
  target/arm: Use FPST_FPCR_A64 in A64 decoder
  target/arm: Remove now-unused vfp.fp_status and FPST_FPCR
  target/arm: Define new fp_status_f16_a32 and fp_status_f16_a64
  target/arm: Use fp_status_f16_a32 in AArch32-only helpers
  target/arm: Use fp_status_f16_a64 in AArch64-only helpers
  target/arm: Use FPST_FPCR_F16_A32 in A32 decoder
  target/arm: Use FPST_FPCR_F16_A64 in A64 decoder
  target/arm: Remove now-unused vfp.fp_status_f16 and FPST_FPCR_F16
  fpu: Rename float_flag_input_denormal to
    float_flag_input_denormal_flushed
  fpu: Rename float_flag_output_denormal to
    float_flag_output_denormal_flushed
  fpu: Fix a comment in softfloat-types.h
  fpu: Add float_class_denormal
  fpu: Implement float_flag_input_denormal_used
  fpu: allow flushing of output denormals to be after rounding
  target/arm: Remove redundant advsimd float16 helpers
  target/arm: Use FPST_FPCR_F16_A64 for halfprec-to-other conversions
  target/arm: Define FPCR AH, FIZ, NEP bits
  target/arm: Implement FPCR.FIZ handling
  target/arm: Adjust FP behaviour for FPCR.AH = 1
  target/arm: Adjust exception flag handling for AH = 1
  target/arm: Add FPCR.AH to tbflags
  target/arm: Set up float_status to use for FPCR.AH=1 behaviour
  target/arm: Use FPST_FPCR_AH for FRECPE, FRECPS, FRECPX, FRSQRTE,
    FRSQRTS
  target/arm: Use FPST_FPCR_AH for BFCVT* insns
  target/arm: Use FPST_FPCR_AH for BFMLAL*, BFMLSL* insns
  target/arm: Add FPCR.NEP to TBFLAGS
  target/arm: Define and use new write_fp_*reg_merging() functions
  target/arm: Handle FPCR.NEP for 3-input scalar operations
  target/arm: Handle FPCR.NEP for BFCVT scalar
  target/arm: Handle FPCR.NEP for 1-input scalar operations
  target/arm: Handle FPCR.NEP in do_cvtf_scalar()
  target/arm: Handle FPCR.NEP for scalar FABS and FNEG
  target/arm: Handle FPCR.NEP for FCVTXN (scalar)
  target/arm: Handle FPCR.NEP for NEP for FMUL, FMULX scalar by element
  target/arm: Implement FPCR.AH semantics for scalar FMIN/FMAX
  target/arm: Implement FPCR.AH semantics for vector FMIN/FMAX
  target/arm: Implement FPCR.AH semantics for FMAXV and FMINV
  target/arm: Implement FPCR.AH semantics for FMINP and FMAXP
  target/arm: Implement FPCR.AH semantics for SVE FMAXV and FMINV
  target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX immediate
  target/arm: Implement FPCR.AH semantics for SVE FMIN/FMAX vector
  target/arm: Implement FPCR.AH handling of negation of NaN
  target/arm: Implement FPCR.AH handling for scalar FABS and FABD
  target/arm: Handle FPCR.AH in vector FABD
  target/arm: Handle FPCR.AH in SVE FNEG
  target/arm: Handle FPCR.AH in SVE FABS
  target/arm: Handle FPCR.AH in SVE FABD
  target/arm: Handle FPCR.AH in negation steps in FCADD
  target/arm: Handle FPCR.AH in negation steps in SVE FCADD
  target/arm: Handle FPCR.AH in FMLSL
  target/arm: Handle FPCR.AH in FRECPS and FRSQRTS scalar insns
  target/arm: Handle FPCR.AH in FRECPS and FRSQRTS vector insns
  target/arm: Handle FPCR.AH in negation step in FMLS (indexed)
  target/arm: Handle FPCR.AH in negation in FMLS (vector)
  target/arm: Handle FPCR.AH in negation step in SVE FMLS (vector)
  target/arm: Handle FPCR.AH in SVE FTSSEL
  target/arm: Handle FPCR.AH in SVE FTMAD
  target/arm: Enable FEAT_AFP for '-cpu max'
  target/arm: Plumb FEAT_RPRES frecpe and frsqrte through to new helper
  target/arm: Implement increased precision FRECPE
  target/arm: Implement increased precision FRSQRTE
  target/arm: Enable FEAT_RPRES for -cpu max
  target/i386: Detect flush-to-zero after rounding
  target/i386: Use correct type for get_float_exception_flags() values
  target/i386: Wire up MXCSR.DE and FPUS.DE correctly
  tests/tcg/x86_64/fma: add test for exact-denormal output

 docs/system/arm/emulation.rst    |   2 +
 include/fpu/softfloat-helpers.h  |  11 +
 include/fpu/softfloat-types.h    |  51 +-
 target/arm/cpu-features.h        |  10 +
 target/arm/cpu.h                 |  32 +-
 target/arm/helper.h              |  12 +
 target/arm/internals.h           |   6 +
 target/arm/tcg/helper-a64.h      |  21 +-
 target/arm/tcg/helper-sve.h      | 120 +++++
 target/arm/tcg/translate.h       |  63 ++-
 target/i386/ops_sse.h            |  16 +-
 target/mips/fpu_helper.h         |   6 +
 fpu/softfloat.c                  |  71 ++-
 target/alpha/cpu.c               |   7 +
 target/arm/cpu.c                 |  32 +-
 target/arm/helper.c              |   4 +-
 target/arm/tcg/cpu64.c           |   2 +
 target/arm/tcg/helper-a64.c      | 173 ++++---
 target/arm/tcg/hflags.c          |  13 +
 target/arm/tcg/sme_helper.c      |   6 +-
 target/arm/tcg/sve_helper.c      | 301 ++++++++---
 target/arm/tcg/translate-a64.c   | 850 ++++++++++++++++++++++++-------
 target/arm/tcg/translate-sme.c   |   4 +-
 target/arm/tcg/translate-sve.c   | 280 ++++++----
 target/arm/tcg/translate-vfp.c   |  78 +--
 target/arm/tcg/vec_helper.c      | 174 ++++++-
 target/arm/vfp_helper.c          | 369 +++++++++++---
 target/hppa/fpu_helper.c         |  11 +
 target/i386/tcg/fpu_helper.c     | 110 ++--
 target/m68k/fpu_helper.c         |   2 +-
 target/mips/msa.c                |   9 +
 target/mips/tcg/msa_helper.c     |   4 +-
 target/ppc/cpu_init.c            |   3 +
 target/rx/cpu.c                  |   8 +
 target/rx/op_helper.c            |   4 +-
 target/sh4/cpu.c                 |   8 +
 target/tricore/fpu_helper.c      |   6 +-
 target/tricore/helper.c          |   1 +
 tests/fp/fp-bench.c              |   1 +
 tests/tcg/x86_64/fma.c           | 116 +++++
 fpu/softfloat-parts.c.inc        | 136 ++++-
 tests/tcg/x86_64/Makefile.target |   1 +
 42 files changed, 2443 insertions(+), 691 deletions(-)
 create mode 100644 tests/tcg/x86_64/fma.c

Comments

Peter Maydell Jan. 24, 2025, 4:35 p.m. UTC | #1
On Fri, 24 Jan 2025 at 16:28, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> This patchset implements emulation of the Arm FEAT_AFP and FEAT_RPRES
> extensions, which are floating-point related. It's based on the
> small i386 bugfix series I sent out a while back:
>
> Based-on: 20250116112536.4117889-1-peter.maydell@linaro.org
> ("target/i386: Fix 0 * Inf + QNaN regression")


> I also have some patches which make target/i386 use the "detect
> flush to zero after rounding" and "report when input denormal is
> consumed" softfloat features added here; I don't include them in
> this patchset (though you can find them in that git branch I
> mentioned earlier) because I haven't done as much testing on the
> i386 side and in any case this patchset is already pretty long.
> I expect I'll send them out when this series has been merged.

...having said which, I was so eager to get the series out
once I'd finished the last test run that I forgot that I
didn't intend to send out the first two or the last four
patches in this series; whoops. Feel free to ignore them.

(The patch numbering in the explanation of the series structure
in the cover letter is all going to be off by 2 as a result,
as well. This doesn't seem worth resending a monster patchset
just to fix, though.)

thanks
-- PMM