utils: Use fma in qemu_strtosz

Message ID	20210314234821.1954428-1-richard.henderson@linaro.org
State	New
Headers	show Delivered-To: patch@linaro.org Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; From: Richard Henderson <richard.henderson@linaro.org> To: qemu-devel@nongnu.org Subject: [PATCH] utils: Use fma in qemu_strtosz Date: Sun, 14 Mar 2021 17:48:21 -0600 Message-Id: <20210314234821.1954428-1-richard.henderson@linaro.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Received-SPF: pass client-ip=2607:f8b0:4864:20::f2e; envelope-from=richard.henderson@linaro.org; helo=mail-qv1-xf2e.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action Precedence: list Cc: peter.maydell@linaro.org, berrange@redhat.com Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patch=linaro.org@nongnu.org>
Series	utils: Use fma in qemu_strtosz \| expand utils: Use fma in qemu_strtosz

Richard Henderson March 14, 2021, 11:48 p.m. UTC

Use fma to simulatneously scale and round up fraction.

The libm function will always return a properly rounded double precision
value, which will eliminate any extra precision the x87 co-processor may
give us, which will keep the output predictable vs other hosts.

Adding DBL_EPSILON while scaling should help with fractions like
12.345, where the closest representable number is actually 12.3449*.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

---
 util/cutils.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.25.1

Thomas Huth March 15, 2021, 5:32 a.m. UTC | #1

On 15/03/2021 00.48, Richard Henderson wrote:
> Use fma to simulatneously scale and round up fraction.

> 

> The libm function will always return a properly rounded double precision

> value, which will eliminate any extra precision the x87 co-processor may

> give us, which will keep the output predictable vs other hosts.

> 

> Adding DBL_EPSILON while scaling should help with fractions like

> 12.345, where the closest representable number is actually 12.3449*.

> 

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

> ---

>   util/cutils.c | 2 +-

>   1 file changed, 1 insertion(+), 1 deletion(-)

> 

> diff --git a/util/cutils.c b/util/cutils.c

> index d89a40a8c3..f7f8e48a68 100644

> --- a/util/cutils.c

> +++ b/util/cutils.c

> @@ -342,7 +342,7 @@ static int do_strtosz(const char *nptr, const char **end,

>           retval = -ERANGE;

>           goto out;

>       }

> -    *result = val * mul + (uint64_t) (fraction * mul);

> +    *result = val * mul + (uint64_t)fma(fraction, mul, DBL_EPSILON);

>       retval = 0;

>   

>   out:


Will this fix the failure that we're currently seeing with 32-bit builds?

( https://gitlab.com/qemu-project/qemu/-/jobs/1096980112#L3258 for example )

Reviewed-by: Thomas Huth <thuth@redhat.com>

Philippe Mathieu-Daudé March 15, 2021, 9:10 a.m. UTC | #2

On 3/15/21 12:48 AM, Richard Henderson wrote:
> Use fma to simulatneously scale and round up fraction.


"simultaneously"

> The libm function will always return a properly rounded double precision

> value, which will eliminate any extra precision the x87 co-processor may

> give us, which will keep the output predictable vs other hosts.

> 

> Adding DBL_EPSILON while scaling should help with fractions like

> 12.345, where the closest representable number is actually 12.3449*.

> 

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

> ---

>  util/cutils.c | 2 +-

>  1 file changed, 1 insertion(+), 1 deletion(-)

> 

> diff --git a/util/cutils.c b/util/cutils.c

> index d89a40a8c3..f7f8e48a68 100644

> --- a/util/cutils.c

> +++ b/util/cutils.c

> @@ -342,7 +342,7 @@ static int do_strtosz(const char *nptr, const char **end,

>      if (val > (UINT64_MAX - ((uint64_t) (fraction * mul))) / mul) {


Shouldn't we use fma() here too? ^^^^^^^^^^^^^^^^^^^^^^^^^^

>          retval = -ERANGE;

>          goto out;

>      }

> -    *result = val * mul + (uint64_t) (fraction * mul);

> +    *result = val * mul + (uint64_t)fma(fraction, mul, DBL_EPSILON);

>      retval = 0;

>  

>  out:

>

Eric Blake March 15, 2021, 11:38 a.m. UTC | #3

On 3/14/21 6:48 PM, Richard Henderson wrote:
> Use fma to simulatneously scale and round up fraction.

> 

> The libm function will always return a properly rounded double precision

> value, which will eliminate any extra precision the x87 co-processor may

> give us, which will keep the output predictable vs other hosts.

> 

> Adding DBL_EPSILON while scaling should help with fractions like

> 12.345, where the closest representable number is actually 12.3449*.

> 

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

> ---

>  util/cutils.c | 2 +-

>  1 file changed, 1 insertion(+), 1 deletion(-)

> 

> diff --git a/util/cutils.c b/util/cutils.c

> index d89a40a8c3..f7f8e48a68 100644

> --- a/util/cutils.c

> +++ b/util/cutils.c

> @@ -342,7 +342,7 @@ static int do_strtosz(const char *nptr, const char **end,

>          retval = -ERANGE;

>          goto out;

>      }

> -    *result = val * mul + (uint64_t) (fraction * mul);

> +    *result = val * mul + (uint64_t)fma(fraction, mul, DBL_EPSILON);


Don't you need to include <float.h> to get DBL_EPSILON?

More importantly, this patch seems wrong.  fma(a, b, c) performs (a * b)
+ c without intermediate rounding errors, but given our values for a and
b, where mul > 1 in any situation we care about, adding 2^-53 is so much
smaller than a*b that it not going to round the result up to the next
integer.  Don't you want to do fma(fraction, mul, 0.5) instead?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Eric Blake March 15, 2021, 11:40 a.m. UTC | #4

On 3/15/21 4:10 AM, Philippe Mathieu-Daudé wrote:
> On 3/15/21 12:48 AM, Richard Henderson wrote:

>> Use fma to simulatneously scale and round up fraction.

> 

> "simultaneously"

> 

>> The libm function will always return a properly rounded double precision

>> value, which will eliminate any extra precision the x87 co-processor may

>> give us, which will keep the output predictable vs other hosts.

>>

>> Adding DBL_EPSILON while scaling should help with fractions like

>> 12.345, where the closest representable number is actually 12.3449*.

>>

>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

>> ---

>>  util/cutils.c | 2 +-

>>  1 file changed, 1 insertion(+), 1 deletion(-)

>>

>> diff --git a/util/cutils.c b/util/cutils.c

>> index d89a40a8c3..f7f8e48a68 100644

>> --- a/util/cutils.c

>> +++ b/util/cutils.c

>> @@ -342,7 +342,7 @@ static int do_strtosz(const char *nptr, const char **end,

>>      if (val > (UINT64_MAX - ((uint64_t) (fraction * mul))) / mul) {

> 

> Shouldn't we use fma() here too? ^^^^^^^^^^^^^^^^^^^^^^^^^^


Unlikely to make a difference in practice, but yes, consistency argues
that we should perform the same computations.  In fact, it may be worth
factoring out the computation to be done once, instead of relying on the
compiler to determine if fma() can undergo common subexpression elimination.

> 

>>          retval = -ERANGE;

>>          goto out;

>>      }

>> -    *result = val * mul + (uint64_t) (fraction * mul);

>> +    *result = val * mul + (uint64_t)fma(fraction, mul, DBL_EPSILON);

>>      retval = 0;

>>  

>>  out:

>>

> 

> 


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Richard Henderson March 15, 2021, 1:19 p.m. UTC | #5

On 3/15/21 3:10 AM, Philippe Mathieu-Daudé wrote:
> On 3/15/21 12:48 AM, Richard Henderson wrote:

>> Use fma to simulatneously scale and round up fraction.

> 

> "simultaneously"

> 

>> The libm function will always return a properly rounded double precision

>> value, which will eliminate any extra precision the x87 co-processor may

>> give us, which will keep the output predictable vs other hosts.

>>

>> Adding DBL_EPSILON while scaling should help with fractions like

>> 12.345, where the closest representable number is actually 12.3449*.

>>

>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

>> ---

>>   util/cutils.c | 2 +-

>>   1 file changed, 1 insertion(+), 1 deletion(-)

>>

>> diff --git a/util/cutils.c b/util/cutils.c

>> index d89a40a8c3..f7f8e48a68 100644

>> --- a/util/cutils.c

>> +++ b/util/cutils.c

>> @@ -342,7 +342,7 @@ static int do_strtosz(const char *nptr, const char **end,

>>       if (val > (UINT64_MAX - ((uint64_t) (fraction * mul))) / mul) {

> 

> Shouldn't we use fma() here too? ^^^^^^^^^^^^^^^^^^^^^^^^^^


Yep, I should have looked at the larger context.


r~

> 

>>           retval = -ERANGE;

>>           goto out;

>>       }

>> -    *result = val * mul + (uint64_t) (fraction * mul);

>> +    *result = val * mul + (uint64_t)fma(fraction, mul, DBL_EPSILON);

>>       retval = 0;

>>   

>>   out:

>>

>

Richard Henderson March 15, 2021, 1:20 p.m. UTC | #6

On 3/14/21 11:32 PM, Thomas Huth wrote:
> On 15/03/2021 00.48, Richard Henderson wrote:

>> Use fma to simulatneously scale and round up fraction.

>>

>> The libm function will always return a properly rounded double precision

>> value, which will eliminate any extra precision the x87 co-processor may

>> give us, which will keep the output predictable vs other hosts.

>>

>> Adding DBL_EPSILON while scaling should help with fractions like

>> 12.345, where the closest representable number is actually 12.3449*.

>>

>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

>> ---

>>   util/cutils.c | 2 +-

>>   1 file changed, 1 insertion(+), 1 deletion(-)

>>

>> diff --git a/util/cutils.c b/util/cutils.c

>> index d89a40a8c3..f7f8e48a68 100644

>> --- a/util/cutils.c

>> +++ b/util/cutils.c

>> @@ -342,7 +342,7 @@ static int do_strtosz(const char *nptr, const char **end,

>>           retval = -ERANGE;

>>           goto out;

>>       }

>> -    *result = val * mul + (uint64_t) (fraction * mul);

>> +    *result = val * mul + (uint64_t)fma(fraction, mul, DBL_EPSILON);

>>       retval = 0;

>>   out:

> 

> Will this fix the failure that we're currently seeing with 32-bit builds?


Yes.

https://gitlab.com/rth7680/qemu/-/pipelines/270311986

Richard Henderson March 15, 2021, 1:27 p.m. UTC | #7

On 3/15/21 5:38 AM, Eric Blake wrote:
> On 3/14/21 6:48 PM, Richard Henderson wrote:

>> Use fma to simulatneously scale and round up fraction.

>>

>> The libm function will always return a properly rounded double precision

>> value, which will eliminate any extra precision the x87 co-processor may

>> give us, which will keep the output predictable vs other hosts.

>>

>> Adding DBL_EPSILON while scaling should help with fractions like

>> 12.345, where the closest representable number is actually 12.3449*.

>>

>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

>> ---

>>   util/cutils.c | 2 +-

>>   1 file changed, 1 insertion(+), 1 deletion(-)

>>

>> diff --git a/util/cutils.c b/util/cutils.c

>> index d89a40a8c3..f7f8e48a68 100644

>> --- a/util/cutils.c

>> +++ b/util/cutils.c

>> @@ -342,7 +342,7 @@ static int do_strtosz(const char *nptr, const char **end,

>>           retval = -ERANGE;

>>           goto out;

>>       }

>> -    *result = val * mul + (uint64_t) (fraction * mul);

>> +    *result = val * mul + (uint64_t)fma(fraction, mul, DBL_EPSILON);

> 

> Don't you need to include <float.h> to get DBL_EPSILON?


Apparently we get it via some other route.

> More importantly, this patch seems wrong.  fma(a, b, c) performs (a * b)

> + c without intermediate rounding errors, but given our values for a and

> b, where mul > 1 in any situation we care about, adding 2^-53 is so much

> smaller than a*b that it not going to round the result up to the next

> integer.  Don't you want to do fma(fraction, mul, 0.5) instead?


I tried that first, but that requires changes to the testsuite, where we have a 
*.7 result, and expect it to be truncated.

I assumed adding 0.00*1 to 0.99*9 would still have an effect.

I think I should have tried fma(fraction, mul, 0) as well, just to see if it's 
the elimination of extended-precision results that were the real problem.


r~

utils: Use fma in qemu_strtosz

Commit Message

Comments

Patch