[RFC,00/15] softfloat: alternate conversion of float128_addsub

Message ID	20201021045149.1582203-1-richard.henderson@linaro.org
Headers	show Return-Path: <SRS0=0rPh=D4=nongnu.org=qemu-devel-bounces+qemu-devel=archiver.kernel.org@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0AC0C22200 From: Richard Henderson <richard.henderson@linaro.org> To: qemu-devel@nongnu.org Subject: [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub Date: Tue, 20 Oct 2020 21:51:34 -0700 Message-Id: <20201021045149.1582203-1-richard.henderson@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Received-SPF: pass client-ip=2607:f8b0:4864:20::534; envelope-from=richard.henderson@linaro.org; helo=mail-pg1-x534.google.com Precedence: list Cc: alex.bennee@linaro.org Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org>
Series	softfloat: alternate conversion of float128_addsub \| expand [RFC,00/15] softfloat: alternate conversion of float128_addsub [RFC,01/15] qemu/int128: Add int128_or [RFC,02/15] qemu/int128: Add int128_clz, int128_ctz [RFC,03/15] qemu/int128: Rename int128_rshift, int128_lshift [RFC,04/15] qemu/int128: Add int128_shr [RFC,05/15] qemu/int128: Add int128_geu [RFC,06/15] softfloat: Use mulu64 for mul64To128 [RFC,07/15] softfloat: Use int128.h for some operations [RFC,08/15] softfloat: Tidy a * b + inf return [RFC,09/15] softfloat: Add float_cmask and constants [RFC,10/15] softfloat: Inline float_raise [RFC,11/15] Test split to softfloat-parts.c.inc [RFC,12/15] softfloat: Streamline FloatFmt [RFC,13/15] Test float128_addsub [RFC,14/15] softfloat: Use float_cmask for addsub_floats [RFC,15/15] softfloat: Improve subtraction of equal exponent

Message ID

20201021045149.1582203-1-richard.henderson@linaro.org

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0AC0C22200
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Subject: [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub
Date: Tue, 20 Oct 2020 21:51:34 -0700
Message-Id: <20201021045149.1582203-1-richard.henderson@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Received-SPF: pass client-ip=2607:f8b0:4864:20::534;
	envelope-from=richard.henderson@linaro.org;
	helo=mail-pg1-x534.google.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
	DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
	RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001,
	SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Cc: alex.bennee@linaro.org
Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Sender: "Qemu-devel"
	<qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org>

Series

softfloat: alternate conversion of float128_addsub | expand

Message

Richard Henderson Oct. 21, 2020, 4:51 a.m. UTC

Hi Alex,

Here's my first adjustment to your conversion for 128-bit floats.

The Idea is to use a set of macros and an include file so that we
can re-use the same large chunk of code that performs the basic
operations on various fraction lengths.  It's ugly, but without
proper language support it seems to be less ugly than most.

While I've just gone and added lots of stuff to int128... I have
had another idea, half-baked because I'm tired and it's late:

typedef struct {
    FloatClass cls;
    int exp;
    bool sign;
    uint64_t frac[];
} FloatPartsBase;

typedef struct {
    FloatPartsBase base;
    uint64_t frac;
} FloatParts64;

typedef struct {
    FloatPartsBase base;
    uint64_t frac_hi, frac_lo;
} FloatParts128;

typedef struct {
    FloatPartsBase base;
    uint64_t frac[4]; /* big endian word ordering */
} FloatParts256;

This layout, with the big-endian ordering, means that storage
can be shared between them, just by ignoring the least significant
words of the fraction as needed.  Which may make muladd more
understandable.

E.g.

static void muladd_floats64(FloatParts128 *r, FloatParts64 *a,
                            FloatParts64 *b, FloatParts128 *c, ...)
{
    // handle nans
    // produce 128-bit product into r
    // handle p vs c special cases.
    // zero-extend c to 128-bits
    c->frac[1] = 0;
    // perform 128-bit fractional addition
    addsub_floats128(r, c, ...);
    // fold 128-bit fraction to 64-bit sticky bit.
    r->frac[0] |= r->frac[1] != 0;
}

float64 float64_muladd(float64 a, float64 b, float64 c, ...)
{
    FloatParts64 pa, pb;
    FloatParts128 pc, pr;

    float64_unpack_canonical(&pa.base, a, status);
    float64_unpack_canonical(&pb.base, b, status);
    float64_unpack_canonical(&pc.base, c, status);
    muladd_floats64(&pr, &pa, &pb, &pc, flags, status);

    return float64_round_pack_canonical(&pr.base, status);
}

Similarly, muladd_floats128 would use addsub_floats256.

However, the big-endian word ordering means that Int128
cannot be used directly; so a set of wrappers are needed.
If added the Int128 routine just for use here, then it's
probably easier to bypass Int128 and just code it here.

Thoughts?


r~


Richard Henderson (15):
  qemu/int128: Add int128_or
  qemu/int128: Add int128_clz, int128_ctz
  qemu/int128: Rename int128_rshift, int128_lshift
  qemu/int128: Add int128_shr
  qemu/int128: Add int128_geu
  softfloat: Use mulu64 for mul64To128
  softfloat: Use int128.h for some operations
  softfloat: Tidy a * b + inf return
  softfloat: Add float_cmask and constants
  softfloat: Inline float_raise
  Test split to softfloat-parts.c.inc
  softfloat: Streamline FloatFmt
  Test float128_addsub
  softfloat: Use float_cmask for addsub_floats
  softfloat: Improve subtraction of equal exponent

 include/fpu/softfloat-macros.h |  89 ++--
 include/fpu/softfloat.h        |   5 +-
 include/qemu/int128.h          |  61 ++-
 fpu/softfloat.c                | 802 ++++++++++-----------------------
 softmmu/physmem.c              |   4 +-
 target/ppc/int_helper.c        |   4 +-
 tests/test-int128.c            |  44 +-
 fpu/softfloat-parts.c.inc      | 339 ++++++++++++++
 fpu/softfloat-specialize.c.inc |  45 +-
 9 files changed, 716 insertions(+), 677 deletions(-)
 create mode 100644 fpu/softfloat-parts.c.inc

Comments

no-reply@patchew.org Oct. 21, 2020, 5:12 a.m. UTC | #1

Patchew URL: https://patchew.org/QEMU/20201021045149.1582203-1-richard.henderson@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20201021045149.1582203-1-richard.henderson@linaro.org
Subject: [RFC PATCH 00/15] softfloat: alternate conversion of float128_addsub

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/20201021045149.1582203-1-richard.henderson@linaro.org -> patchew/20201021045149.1582203-1-richard.henderson@linaro.org
Switched to a new branch 'test'
ad120c1 softfloat: Improve subtraction of equal exponent
12eb5a4 softfloat: Use float_cmask for addsub_floats
a04ff7d Test float128_addsub
fc9537e softfloat: Streamline FloatFmt
979beb6 Test split to softfloat-parts.c.inc
4317991 softfloat: Inline float_raise
4689bd2 softfloat: Add float_cmask and constants
b1141ee softfloat: Tidy a * b + inf return
197273c softfloat: Use int128.h for some operations
aa4afa2 softfloat: Use mulu64 for mul64To128
017d276 qemu/int128: Add int128_geu
71dd5f1 qemu/int128: Add int128_shr
9144df9 qemu/int128: Rename int128_rshift, int128_lshift
b6c9afb qemu/int128: Add int128_clz, int128_ctz
0ceff9a qemu/int128: Add int128_or

=== OUTPUT BEGIN ===
1/15 Checking commit 0ceff9a14aa6 (qemu/int128: Add int128_or)
2/15 Checking commit b6c9afb58357 (qemu/int128: Add int128_clz, int128_ctz)
3/15 Checking commit 9144df990b17 (qemu/int128: Rename int128_rshift, int128_lshift)
4/15 Checking commit 71dd5f157a39 (qemu/int128: Add int128_shr)
5/15 Checking commit 017d27627112 (qemu/int128: Add int128_geu)
6/15 Checking commit aa4afa22ee78 (softfloat: Use mulu64 for mul64To128)
7/15 Checking commit 197273c0aeda (softfloat: Use int128.h for some operations)
8/15 Checking commit b1141eecc368 (softfloat: Tidy a * b + inf return)
9/15 Checking commit 4689bd26fd66 (softfloat: Add float_cmask and constants)
10/15 Checking commit 4317991dcbd8 (softfloat: Inline float_raise)
11/15 Checking commit 979beb676e89 (Test split to softfloat-parts.c.inc)
WARNING: Block comments use a leading /* on a separate line
#557: FILE: fpu/softfloat.c:685:
+        /* Note that this check is after pickNaNMulAdd so that function

ERROR: Missing Signed-off-by: line(s)

total: 1 errors, 1 warnings, 786 lines checked

Patch 11/15 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

12/15 Checking commit fc9537e66b73 (softfloat: Streamline FloatFmt)
13/15 Checking commit a04ff7dcb003 (Test float128_addsub)
ERROR: Missing Signed-off-by: line(s)

total: 1 errors, 0 warnings, 405 lines checked

Patch 13/15 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

14/15 Checking commit 12eb5a486720 (softfloat: Use float_cmask for addsub_floats)
15/15 Checking commit ad120c1d1ae8 (softfloat: Improve subtraction of equal exponent)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20201021045149.1582203-1-richard.henderson@linaro.org/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

Alex Bennée Oct. 21, 2020, 5:46 p.m. UTC | #2

Richard Henderson <richard.henderson@linaro.org> writes:

> Hi Alex,
>
> Here's my first adjustment to your conversion for 128-bit floats.
>
> The Idea is to use a set of macros and an include file so that we
> can re-use the same large chunk of code that performs the basic
> operations on various fraction lengths.  It's ugly, but without
> proper language support it seems to be less ugly than most.
>
> While I've just gone and added lots of stuff to int128... I have
> had another idea, half-baked because I'm tired and it's late:
>
> typedef struct {
>     FloatClass cls;
>     int exp;
>     bool sign;
>     uint64_t frac[];
> } FloatPartsBase;
>
> typedef struct {
>     FloatPartsBase base;
>     uint64_t frac;
> } FloatParts64;
>
> typedef struct {
>     FloatPartsBase base;
>     uint64_t frac_hi, frac_lo;
> } FloatParts128;
>
> typedef struct {
>     FloatPartsBase base;
>     uint64_t frac[4]; /* big endian word ordering */
> } FloatParts256;
>
> This layout, with the big-endian ordering, means that storage
> can be shared between them, just by ignoring the least significant
> words of the fraction as needed.  Which may make muladd more
> understandable.

Would the big-endian formatting hamper the compiler on x86 where it can
do extra wide operations?

I am still seeing a multi MFlop drop in performance when converting the
float128_addsub to the new code. If this allows the compiler to do
better on the code I can live with it.

>
> E.g.
>
> static void muladd_floats64(FloatParts128 *r, FloatParts64 *a,
>                             FloatParts64 *b, FloatParts128 *c, ...)
> {
>     // handle nans
>     // produce 128-bit product into r
>     // handle p vs c special cases.
>     // zero-extend c to 128-bits
>     c->frac[1] = 0;
>     // perform 128-bit fractional addition
>     addsub_floats128(r, c, ...);
>     // fold 128-bit fraction to 64-bit sticky bit.
>     r->frac[0] |= r->frac[1] != 0;
> }
>
> float64 float64_muladd(float64 a, float64 b, float64 c, ...)
> {
>     FloatParts64 pa, pb;
>     FloatParts128 pc, pr;
>
>     float64_unpack_canonical(&pa.base, a, status);
>     float64_unpack_canonical(&pb.base, b, status);
>     float64_unpack_canonical(&pc.base, c, status);
>     muladd_floats64(&pr, &pa, &pb, &pc, flags, status);
>
>     return float64_round_pack_canonical(&pr.base, status);
> }
>
> Similarly, muladd_floats128 would use addsub_floats256.
>
> However, the big-endian word ordering means that Int128
> cannot be used directly; so a set of wrappers are needed.
> If added the Int128 routine just for use here, then it's
> probably easier to bypass Int128 and just code it here.

Are you talking about all our operations? Will we still need to#ifdef
CONFIG_INT128 in the softfloat code?

>
> Thoughts?
>
>
> r~
>
>
> Richard Henderson (15):
>   qemu/int128: Add int128_or
>   qemu/int128: Add int128_clz, int128_ctz
>   qemu/int128: Rename int128_rshift, int128_lshift
>   qemu/int128: Add int128_shr
>   qemu/int128: Add int128_geu
>   softfloat: Use mulu64 for mul64To128
>   softfloat: Use int128.h for some operations
>   softfloat: Tidy a * b + inf return
>   softfloat: Add float_cmask and constants
>   softfloat: Inline float_raise
>   Test split to softfloat-parts.c.inc
>   softfloat: Streamline FloatFmt
>   Test float128_addsub
>   softfloat: Use float_cmask for addsub_floats
>   softfloat: Improve subtraction of equal exponent
>
>  include/fpu/softfloat-macros.h |  89 ++--
>  include/fpu/softfloat.h        |   5 +-
>  include/qemu/int128.h          |  61 ++-
>  fpu/softfloat.c                | 802 ++++++++++-----------------------
>  softmmu/physmem.c              |   4 +-
>  target/ppc/int_helper.c        |   4 +-
>  tests/test-int128.c            |  44 +-
>  fpu/softfloat-parts.c.inc      | 339 ++++++++++++++
>  fpu/softfloat-specialize.c.inc |  45 +-
>  9 files changed, 716 insertions(+), 677 deletions(-)
>  create mode 100644 fpu/softfloat-parts.c.inc

Richard Henderson Oct. 21, 2020, 5:53 p.m. UTC | #3

On 10/21/20 10:46 AM, Alex Bennée wrote:
>> This layout, with the big-endian ordering, means that storage

>> can be shared between them, just by ignoring the least significant

>> words of the fraction as needed.  Which may make muladd more

>> understandable.

> 

> Would the big-endian formatting hamper the compiler on x86 where it can

> do extra wide operations?


Well, you couldn't just use Int128 in the structure.  But you could write the
helpers via int128_make128/getlo/gethi, which would still get the compiler
expansion.


>> However, the big-endian word ordering means that Int128

>> cannot be used directly; so a set of wrappers are needed.

>> If added the Int128 routine just for use here, then it's

>> probably easier to bypass Int128 and just code it here.

> 

> Are you talking about all our operations? Will we still need to#ifdef

> CONFIG_INT128 in the softfloat code?


If we decline to put the operation into qemu/int128.h, because they're not
generally useful, then yes, we may put those ifdefs into our softfloat code.


r~