Message ID | 20210508014802.892561-4-richard.henderson@linaro.org
---|---
State | Superseded
Series | Convert floatx80 and float128 to FloatParts
Richard Henderson <richard.henderson@linaro.org> writes:

> These builtins came in clang 3.8, but are not present in gcc through
> version 11. Even in clang the optimization is not ideal except for
> x86_64, but no worse than the hand-coding that we currently do.

Given this statement....

<snip>

> +/**
> + * uadd64_carry - addition with carry-in and carry-out
> + * @x, @y: addends
> + * @pcarry: in-out carry value
> + *
> + * Computes @x + @y + *@pcarry, placing the carry-out back
> + * into *@pcarry and returning the 64-bit sum.
> + */
> +static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
> +{
> +#if __has_builtin(__builtin_addcll)
> +    unsigned long long c = *pcarry;
> +    x = __builtin_addcll(x, y, c, &c);

What happens when unsigned long long isn't the same as uint64_t? Doesn't
C99 only specify a minimum?

> +    *pcarry = c & 1;

Why do we need to clamp it here? Shouldn't the compiler automatically do
that due to the bool?

> +    return x;
> +#else
> +    bool c = *pcarry;
> +    /* This is clang's internal expansion of __builtin_addc. */
> +    c = uadd64_overflow(x, c, &x);
> +    c |= uadd64_overflow(x, y, &x);
> +    *pcarry = c;
> +    return x;
> +#endif

Either way, if you aren't super happy with the compiler's builtin and you
get equivalent code with the unambiguous hand-coded version, then what is
the point of having a builtin leg?

> +}
> +
> +/**
> + * usub64_borrow - subtraction with borrow-in and borrow-out
> + * @x, @y: addends
> + * @pborrow: in-out borrow value
> + *
> + * Computes @x - @y - *@pborrow, placing the borrow-out back
> + * into *@pborrow and returning the 64-bit sum.
> + */
> +static inline uint64_t usub64_borrow(uint64_t x, uint64_t y, bool *pborrow)
> +{
> +#if __has_builtin(__builtin_subcll)
> +    unsigned long long b = *pborrow;
> +    x = __builtin_subcll(x, y, b, &b);
> +    *pborrow = b & 1;
> +    return x;
> +#else
> +    bool b = *pborrow;
> +    b = usub64_overflow(x, b, &x);
> +    b |= usub64_overflow(x, y, &x);
> +    *pborrow = b;
> +    return x;
> +#endif
> +}
> +
>  /* Host type specific sizes of these routines. */
>
>  #if ULONG_MAX == UINT32_MAX

--
Alex Bennée
On 5/10/21 7:57 AM, Alex Bennée wrote:
> Richard Henderson <richard.henderson@linaro.org> writes:
>
>> These builtins came in clang 3.8, but are not present in gcc through
>> version 11. Even in clang the optimization is not ideal except for
>> x86_64, but no worse than the hand-coding that we currently do.
>
> Given this statement....

I think you mis-read the "except for x86_64" part?

Anyway, these are simply bugs to be filed against clang, so that
hopefully clang-12 will do a good job with the builtin.  And as I said,
while the generated code is not ideal, it's no worse.

>> +static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
>> +{
>> +#if __has_builtin(__builtin_addcll)
>> +    unsigned long long c = *pcarry;
>> +    x = __builtin_addcll(x, y, c, &c);
>
> what happens when unsigned long long isn't the same as uint64_t?
> Doesn't C99 only specify a minimum?

If you only look at C99, sure.  But looking at the set of supported
hosts, unsigned long long is always a 64-bit type.

>> +    *pcarry = c & 1;
>
> Why do we need to clamp it here? Shouldn't the compiler automatically
> do that due to the bool?

This produces a single AND insn, instead of CMP + SETcc.

r~
Richard Henderson <richard.henderson@linaro.org> writes:

> On 5/10/21 7:57 AM, Alex Bennée wrote:
>> Richard Henderson <richard.henderson@linaro.org> writes:
>>
>>> These builtins came in clang 3.8, but are not present in gcc through
>>> version 11. Even in clang the optimization is not ideal except for
>>> x86_64, but no worse than the hand-coding that we currently do.
>>
>> Given this statement....
>
> I think you mis-read the "except for x86_64" part?
>
> Anyway, these are simply bugs to be filed against clang, so that
> hopefully clang-12 will do a good job with the builtin. And as I
> said, while the generated code is not ideal, it's no worse.
>
>>> +static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
>>> +{
>>> +#if __has_builtin(__builtin_addcll)
>>> +    unsigned long long c = *pcarry;
>>> +    x = __builtin_addcll(x, y, c, &c);
>>
>> what happens when unsigned long long isn't the same as uint64_t?
>> Doesn't C99 only specify a minimum?
>
> If you only look at C99, sure.  But looking at the set of supported
> hosts, unsigned long long is always a 64-bit type.

I guess I'm worrying about a theoretical future - but we don't worry
about it for other ll builtins so no biggy.

>>> +    *pcarry = c & 1;
>>
>> Why do we need to clamp it here? Shouldn't the compiler
>> automatically do that due to the bool?
>
> This produces a single AND insn, instead of CMP + SETcc.

Might be worth mentioning that in the commit message. Anyway:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

--
Alex Bennée
diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index fd76f0cbd3..2ea8b3000b 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -26,6 +26,7 @@
 #ifndef HOST_UTILS_H
 #define HOST_UTILS_H
 
+#include "qemu/compiler.h"
 #include "qemu/bswap.h"
 
 #ifdef CONFIG_INT128
@@ -581,6 +582,55 @@ static inline bool umul64_overflow(uint64_t x, uint64_t y, uint64_t *ret)
 #endif
 }
 
+/**
+ * uadd64_carry - addition with carry-in and carry-out
+ * @x, @y: addends
+ * @pcarry: in-out carry value
+ *
+ * Computes @x + @y + *@pcarry, placing the carry-out back
+ * into *@pcarry and returning the 64-bit sum.
+ */
+static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
+{
+#if __has_builtin(__builtin_addcll)
+    unsigned long long c = *pcarry;
+    x = __builtin_addcll(x, y, c, &c);
+    *pcarry = c & 1;
+    return x;
+#else
+    bool c = *pcarry;
+    /* This is clang's internal expansion of __builtin_addc. */
+    c = uadd64_overflow(x, c, &x);
+    c |= uadd64_overflow(x, y, &x);
+    *pcarry = c;
+    return x;
+#endif
+}
+
+/**
+ * usub64_borrow - subtraction with borrow-in and borrow-out
+ * @x, @y: minuend, subtrahend
+ * @pborrow: in-out borrow value
+ *
+ * Computes @x - @y - *@pborrow, placing the borrow-out back
+ * into *@pborrow and returning the 64-bit difference.
+ */
+static inline uint64_t usub64_borrow(uint64_t x, uint64_t y, bool *pborrow)
+{
+#if __has_builtin(__builtin_subcll)
+    unsigned long long b = *pborrow;
+    x = __builtin_subcll(x, y, b, &b);
+    *pborrow = b & 1;
+    return x;
+#else
+    bool b = *pborrow;
+    b = usub64_overflow(x, b, &x);
+    b |= usub64_overflow(x, y, &x);
+    *pborrow = b;
+    return x;
+#endif
+}
+
 /* Host type specific sizes of these routines. */
 
 #if ULONG_MAX == UINT32_MAX
These builtins came in clang 3.8, but are not present in gcc through
version 11.  Even in clang the optimization is not ideal except for
x86_64, but no worse than the hand-coding that we currently do.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/host-utils.h | 50 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

-- 
2.25.1