diff mbox

[for-2.5] hw/timer/hpet.c: Avoid signed integer overflow which results in bugs on OSX

Message ID 1447080991-24995-1-git-send-email-peter.maydell@linaro.org
State Accepted
Headers show

Commit Message

Peter Maydell Nov. 9, 2015, 2:56 p.m. UTC
Signed integer overflow in C is undefined behaviour, and the compiler
is at liberty to assume it can never happen and optimize accordingly.
In particular, the subtractions in hpet_time_after() and hpet_time_after64()
were causing OSX clang to optimize the code such that it was prone to
hangs and complaints about the main loop stalling (presumably because
we were spending all our time trying to service very high frequency
HPET timer callbacks). The clang sanitizer confirms the UB:

hw/timer/hpet.c:119:26: runtime error: signed integer overflow: -2146967296 - 2147003978 cannot be represented in type 'int'

Fix this by doing the subtraction as an unsigned operation and then
converting to signed for the comparison.

Reported-by: Aaron Elkins <threcius@yahoo.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

---
 hw/timer/hpet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.6.2

Comments

Peter Maydell Nov. 9, 2015, 3:26 p.m. UTC | #1
On 9 November 2015 at 15:25, Paolo Bonzini <pbonzini@redhat.com> wrote:
>

>

> On 09/11/2015 15:56, Peter Maydell wrote:

>> Signed integer overflow in C is undefined behaviour, and the compiler

>> is at liberty to assume it can never happen and optimize accordingly.

>> In particular, the subtractions in hpet_time_after() and hpet_time_after64()

>> were causing OSX clang to optimize the code such that it was prone to

>> hangs and complaints about the main loop stalling (presumably because

>> we were spending all our time trying to service very high frequency

>> HPET timer callbacks). The clang sanitizer confirms the UB:

>>

>> hw/timer/hpet.c:119:26: runtime error: signed integer overflow: -2146967296 - 2147003978 cannot be represented in type 'int'

>>

>> Fix this by doing the subtraction as an unsigned operation and then

>> converting to signed for the comparison.

>>

>> Reported-by: Aaron Elkins <threcius@yahoo.com>

>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

>> ---

>>  hw/timer/hpet.c | 4 ++--

>>  1 file changed, 2 insertions(+), 2 deletions(-)

>>

>> diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c

>> index 3037bef..7f0391c 100644

>> --- a/hw/timer/hpet.c

>> +++ b/hw/timer/hpet.c

>> @@ -116,12 +116,12 @@ static uint32_t timer_enabled(HPETTimer *t)

>>

>>  static uint32_t hpet_time_after(uint64_t a, uint64_t b)

>>  {

>> -    return ((int32_t)(b) - (int32_t)(a) < 0);

>> +    return ((int32_t)(b - a) < 0);

>

> Does checkpatch complain about the outer parentheses?


Nope :-)

>>  }

>>

>>  static uint32_t hpet_time_after64(uint64_t a, uint64_t b)

>>  {

>> -    return ((int64_t)(b) - (int64_t)(a) < 0);

>> +    return ((int64_t)(b - a) < 0);

>>  }

>

> Cc: qemu-stable@nongnu.org


Whoops, yes, cc'ing stable would be a good idea.

> Looks good, thanks!  Can you apply it yourself?


Sure.

thanks
-- PMM
Peter Maydell Nov. 9, 2015, 4:27 p.m. UTC | #2
On 9 November 2015 at 15:26, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 9 November 2015 at 15:25, Paolo Bonzini <pbonzini@redhat.com> wrote:

>>

>>

>> On 09/11/2015 15:56, Peter Maydell wrote:

>>> Signed integer overflow in C is undefined behaviour, and the compiler

>>> is at liberty to assume it can never happen and optimize accordingly.

>>> In particular, the subtractions in hpet_time_after() and hpet_time_after64()

>>> were causing OSX clang to optimize the code such that it was prone to

>>> hangs and complaints about the main loop stalling (presumably because

>>> we were spending all our time trying to service very high frequency

>>> HPET timer callbacks). The clang sanitizer confirms the UB:

>>>

>>> hw/timer/hpet.c:119:26: runtime error: signed integer overflow: -2146967296 - 2147003978 cannot be represented in type 'int'

>>>

>>> Fix this by doing the subtraction as an unsigned operation and then

>>> converting to signed for the comparison.

>>>

>>> Reported-by: Aaron Elkins <threcius@yahoo.com>

>>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

>>> ---

>>>  hw/timer/hpet.c | 4 ++--

>>>  1 file changed, 2 insertions(+), 2 deletions(-)

>>>

>>> diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c

>>> index 3037bef..7f0391c 100644

>>> --- a/hw/timer/hpet.c

>>> +++ b/hw/timer/hpet.c

>>> @@ -116,12 +116,12 @@ static uint32_t timer_enabled(HPETTimer *t)

>>>

>>>  static uint32_t hpet_time_after(uint64_t a, uint64_t b)

>>>  {

>>> -    return ((int32_t)(b) - (int32_t)(a) < 0);

>>> +    return ((int32_t)(b - a) < 0);

>>

>> Does checkpatch complain about the outer parentheses?

>

> Nope :-)

>

>>>  }

>>>

>>>  static uint32_t hpet_time_after64(uint64_t a, uint64_t b)

>>>  {

>>> -    return ((int64_t)(b) - (int64_t)(a) < 0);

>>> +    return ((int64_t)(b - a) < 0);

>>>  }

>>

>> Cc: qemu-stable@nongnu.org

>

> Whoops, yes, cc'ing stable would be a good idea.

>

>> Looks good, thanks!  Can you apply it yourself?

>

> Sure.


Applied to master, thanks.

-- PMM
Laszlo Ersek Nov. 9, 2015, 10:25 p.m. UTC | #3
On 11/09/15 15:56, Peter Maydell wrote:
> Signed integer overflow in C is undefined behaviour, and the compiler

> is at liberty to assume it can never happen and optimize accordingly.

> In particular, the subtractions in hpet_time_after() and hpet_time_after64()

> were causing OSX clang to optimize the code such that it was prone to

> hangs and complaints about the main loop stalling (presumably because

> we were spending all our time trying to service very high frequency

> HPET timer callbacks). The clang sanitizer confirms the UB:

> 

> hw/timer/hpet.c:119:26: runtime error: signed integer overflow: -2146967296 - 2147003978 cannot be represented in type 'int'

> 

> Fix this by doing the subtraction as an unsigned operation and then

> converting to signed for the comparison.

> 

> Reported-by: Aaron Elkins <threcius@yahoo.com>

> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

> ---

>  hw/timer/hpet.c | 4 ++--

>  1 file changed, 2 insertions(+), 2 deletions(-)

> 

> diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c

> index 3037bef..7f0391c 100644

> --- a/hw/timer/hpet.c

> +++ b/hw/timer/hpet.c

> @@ -116,12 +116,12 @@ static uint32_t timer_enabled(HPETTimer *t)

>  

>  static uint32_t hpet_time_after(uint64_t a, uint64_t b)

>  {

> -    return ((int32_t)(b) - (int32_t)(a) < 0);

> +    return ((int32_t)(b - a) < 0);

>  }

>  

>  static uint32_t hpet_time_after64(uint64_t a, uint64_t b)

>  {

> -    return ((int64_t)(b) - (int64_t)(a) < 0);

> +    return ((int64_t)(b - a) < 0);

>  }

>  

>  static uint64_t ticks_to_ns(uint64_t value)

> 


I'm late to the discussion, but I cannot imagine what would speak against:

    return (b < a);

The post-patch code still converts a uint64_t difference to int32_t.
According to the C standard(s), such a conversion (i.e., when the
integer value being converted doesn't fit in the target signed integer)
results in an implementation-defined value, or an implementation-defined
signal is raised.

On our platforms, the impl-def value is determined by "truncate to 32
bits, then reinterpret the bit pattern as two's complement signed
int32_t". Meaning, if:

    (b > a) && ((b - a) & (1u << 31))

(that is, "b" is so much larger than "a" that bit#31 is set in the (b-a)
difference), then hpet_time_after() will now incorrectly return 1.
(Because bit#31 will be interpreted as the sign bit, turned on.)

Again, what speaks against

    return (b < a);

?

(The pre-patch code dates back to commit 16b29ae1 (year 2008), which
offers precious little justification for the formula.)

Thanks
Laszlo
Laszlo Ersek Nov. 10, 2015, 8:57 a.m. UTC | #4
On 11/09/15 23:25, Laszlo Ersek wrote:
> On 11/09/15 15:56, Peter Maydell wrote:

>> Signed integer overflow in C is undefined behaviour, and the compiler

>> is at liberty to assume it can never happen and optimize accordingly.

>> In particular, the subtractions in hpet_time_after() and hpet_time_after64()

>> were causing OSX clang to optimize the code such that it was prone to

>> hangs and complaints about the main loop stalling (presumably because

>> we were spending all our time trying to service very high frequency

>> HPET timer callbacks). The clang sanitizer confirms the UB:

>>

>> hw/timer/hpet.c:119:26: runtime error: signed integer overflow: -2146967296 - 2147003978 cannot be represented in type 'int'

>>

>> Fix this by doing the subtraction as an unsigned operation and then

>> converting to signed for the comparison.

>>

>> Reported-by: Aaron Elkins <threcius@yahoo.com>

>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

>> ---

>>  hw/timer/hpet.c | 4 ++--

>>  1 file changed, 2 insertions(+), 2 deletions(-)

>>

>> diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c

>> index 3037bef..7f0391c 100644

>> --- a/hw/timer/hpet.c

>> +++ b/hw/timer/hpet.c

>> @@ -116,12 +116,12 @@ static uint32_t timer_enabled(HPETTimer *t)

>>  

>>  static uint32_t hpet_time_after(uint64_t a, uint64_t b)

>>  {

>> -    return ((int32_t)(b) - (int32_t)(a) < 0);

>> +    return ((int32_t)(b - a) < 0);

>>  }

>>  

>>  static uint32_t hpet_time_after64(uint64_t a, uint64_t b)

>>  {

>> -    return ((int64_t)(b) - (int64_t)(a) < 0);

>> +    return ((int64_t)(b - a) < 0);

>>  }

>>  

>>  static uint64_t ticks_to_ns(uint64_t value)

>>

> 

> I'm late to the discussion, but I cannot imagine what would speak against:

> 

>     return (b < a);

> 

> The post-patch code still converts a uint64_t difference to int32_t.

> According to the C standard(s), such a conversion (i.e., when the

> integer value being converted doesn't fit in the target signed integer)

> results in an implementation-defined value, or an implementation-defined

> signal is raised.

> 

> On our platforms, the impl-def value is determined by "truncate to 32

> bits, then reinterpret the bit pattern as two's complement signed

> int32_t". Meaning, if:

> 

>     (b > a) && ((b - a) & (1u << 31))

> 

> (that is, "b" is so much larger than "a" that bit#31 is set in the (b-a)

> difference), then hpet_time_after() will now incorrectly return 1.

> (Because bit#31 will be interpreted as the sign bit, turned on.)

> 

> Again, what speaks against

> 

>     return (b < a);

> 

> ?

> 

> (The pre-patch code dates back to commit 16b29ae1 (year 2008), which

> offers precious little justification for the formula.)


An hour or so after sending this email, I think I got an idea about the
code's intent. (Knowing practically nothing about HPET.) I guess the
HPET provides counters that can wrap around, so if you don't look
frequently enough, you won't know if the value is actually smaller or
greater (because you can't use raw magnitude to tell that).

So I *guess* this code implemented the following idea: assume you have a
"last value", and a reading (?) from "just a bit later". You take the
neighborhood (with radius 2^31, or 2^63) of the "last value", and if the
new reading falls into the upper half of that neighborhood, you say "the
value has grown".

This idea is actually very well suited for uintN_t modular arithmetic,
because the (x - y) difference expresses the number of times you have to
increment y to make it fall into the same remainder class as x, modulo 2^N.

Hence, ((x - y) < 2^(N-1)) expresses "x is later than or equal to y"
(with both x and y being uintN_t variables). Equivalently, we have ((x -
y) >= 2^(N-1)) meaning "x is strictly earlier than y", which can also be
said as "y is strictly after x".

And I think that's exactly what these functions implement:

- Their names say "time after".

- The condition

  (x - y) >= 2^(N-1)

  tests exactly whether the most significant bit is set in the
  difference.

  When the bit pattern of the difference is reinterpreted as intN_t,
  that in turn means

  (intN_t)(x - y) < 0

So the functions seem to check if "a is strictly after b".

... The call sites seem to confirm this:

        if (t->config & HPET_TN_32BIT) {
            while (hpet_time_after(cur_tick, t->cmp)) {
                t->cmp = (uint32_t)(t->cmp + t->period);
            }
        } else {
            while (hpet_time_after64(cur_tick, t->cmp)) {
                t->cmp += period;
            }
        }

The loops increment "t->cmp" as long as "cur_tick is strictly after
t->cmp"; in other words, the loops make "t->cmp" catch up with "cur_tick".

... I think the functions are right after all, it's just that the
following would have matched my personal taste more:

  b - a >= 1u << 31

and

  b - a >= 1ull << 63

(Because they don't have any impl-def parts in them, plus to me they
make the intent, with the modular arithmetic and the "neighborhoods",
clearer.)

I guess for others it's the opposite... :)

Cheers
Laszlo
Laszlo Ersek Nov. 10, 2015, 9:51 a.m. UTC | #5
On 11/10/15 10:26, Paolo Bonzini wrote:
> 

> 

> On 10/11/2015 09:57, Laszlo Ersek wrote:

>> On 11/09/15 23:25, Laszlo Ersek wrote:

>>> On 11/09/15 15:56, Peter Maydell wrote:

>>>> Signed integer overflow in C is undefined behaviour, and the compiler

>>>> is at liberty to assume it can never happen and optimize accordingly.

>>>> In particular, the subtractions in hpet_time_after() and hpet_time_after64()

>>>> were causing OSX clang to optimize the code such that it was prone to

>>>> hangs and complaints about the main loop stalling (presumably because

>>>> we were spending all our time trying to service very high frequency

>>>> HPET timer callbacks). The clang sanitizer confirms the UB:

>>>>

>>>> hw/timer/hpet.c:119:26: runtime error: signed integer overflow: -2146967296 - 2147003978 cannot be represented in type 'int'

>>>>

>>>> Fix this by doing the subtraction as an unsigned operation and then

>>>> converting to signed for the comparison.

>>>>

>>>> Reported-by: Aaron Elkins <threcius@yahoo.com>

>>>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

>>>> ---

>>>>  hw/timer/hpet.c | 4 ++--

>>>>  1 file changed, 2 insertions(+), 2 deletions(-)

>>>>

>>>> diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c

>>>> index 3037bef..7f0391c 100644

>>>> --- a/hw/timer/hpet.c

>>>> +++ b/hw/timer/hpet.c

>>>> @@ -116,12 +116,12 @@ static uint32_t timer_enabled(HPETTimer *t)

>>>>  

>>>>  static uint32_t hpet_time_after(uint64_t a, uint64_t b)

>>>>  {

>>>> -    return ((int32_t)(b) - (int32_t)(a) < 0);

>>>> +    return ((int32_t)(b - a) < 0);

>>>>  }

>>>>  

>>>>  static uint32_t hpet_time_after64(uint64_t a, uint64_t b)

>>>>  {

>>>> -    return ((int64_t)(b) - (int64_t)(a) < 0);

>>>> +    return ((int64_t)(b - a) < 0);

>>>>  }

>>>>  

>>>>  static uint64_t ticks_to_ns(uint64_t value)

>>>>

>>>

>>> I'm late to the discussion, but I cannot imagine what would speak against:

>>>

>>>     return (b < a);

> 

> With uint32_t, b < a is wrong if b has just overflowed and a is just

> below 2^32.

> 

> With int32_t, b < a is wrong if b is just above 2^31 and a is just below

> 2^31.

> 

> Basically you want to consider a sliding window around (a+b)/2 (where

> a+b is computed with "infinite" precision), and see whether it's a or b

> that comes before the average.


Thanks!

(I guess / hope this is about the same that I managed to realize on my
own in my other email :))

> For int64_t/uint64_t it is indeed moot, because it takes centuries

> before you get close to 2^63 ticks (QEMU's emulated HPET has a 100 MHz

> frequency; one year is 86400*365.25*10^8 ticks, or about 2^51.5).


Finally! I resisted the urge to write "yet another hardware clock /
counter that overflows within a humanly observable interval, *groan*".
But, now that you say that the 64-bit HPET fixes (or may fix) that, I
don't have to hold back. :)

Thanks
Laszlo

> 

> Paolo

> 

>>> The post-patch code still converts a uint64_t difference to int32_t.

>>> According to the C standard(s), such a conversion (i.e., when the

>>> integer value being converted doesn't fit in the target signed integer)

>>> results in an implementation-defined value, or an implementation-defined

>>> signal is raised.

>>>

>>> On our platforms, the impl-def value is determined by "truncate to 32

>>> bits, then reinterpret the bit pattern as two's complement signed

>>> int32_t". Meaning, if:

>>>

>>>     (b > a) && ((b - a) & (1u << 31))

>>>

>>> (that is, "b" is so much larger than "a" that bit#31 is set in the (b-a)

>>> difference), then hpet_time_after() will now incorrectly return 1.

>>> (Because bit#31 will be interpreted as the sign bit, turned on.)

>>>

>>> Again, what speaks against

>>>

>>>     return (b < a);

>>>

>>> ?

>>>

>>> (The pre-patch code dates back to commit 16b29ae1 (year 2008), which

>>> offers precious little justification for the formula.)

>>

>> An hour or so after sending this email, I think I got an idea about the

>> code's intent. (Knowing practically nothing about HPET.) I guess the

>> HPET provides counters that can wrap around, so if you don't look

>> frequently enough, you won't know if the value is actually smaller or

>> greater (because you can't use raw magnitude to tell that).

>>

>> So I *guess* this code implemented the following idea: assume you have a

>> "last value", and a reading (?) from "just a bit later". You take the

>> neighborhood (with radius 2^31, or 2^63) of the "last value", and if the

>> new reading falls into the upper half of that neighborhood, you say "the

>> value has grown".

>>

>> This idea is actually very well suited for uintN_t modular arithmetic,

>> because the (x - y) difference expresses the number of times you have to

>> increment y to make it fall into the same remainder class as x, modulo 2^N.

>>

>> Hence, ((x - y) < 2^(N-1)) expresses "x is later than or equal to y"

>> (with both x and y being uintN_t variables). Equivalently, we have ((x -

>> y) >= 2^(N-1)) meaning "x is strictly earlier than y", which can also be

>> said as "y is strictly after x".

>>

>> And I think that's exactly what these functions implement:

>>

>> - Their names say "time after".

>>

>> - The condition

>>

>>   (x - y) >= 2^(N-1)

>>

>>   tests exactly whether the most significant bit is set in the

>>   difference.

>>

>>   When the bit pattern of the difference is reinterpreted as intN_t,

>>   that in turn means

>>

>>   (intN_t)(x - y) < 0

>>

>> So the functions seem to check if "a is strictly after b".

>>

>> ... The call sites seem to confirm this:

>>

>>         if (t->config & HPET_TN_32BIT) {

>>             while (hpet_time_after(cur_tick, t->cmp)) {

>>                 t->cmp = (uint32_t)(t->cmp + t->period);

>>             }

>>         } else {

>>             while (hpet_time_after64(cur_tick, t->cmp)) {

>>                 t->cmp += period;

>>             }

>>         }

>>

>> The loops increment "t->cmp" as long as "cur_tick is strictly after

>> t->cmp"; in other words, the loops make "t->cmp" catch up with "cur_tick".

>>

>> ... I think the functions are right after all, it's just that the

>> following would have matched my personal taste more:

>>

>>   b - a >= 1u << 31

>>

>> and

>>

>>   b - a >= 1ull << 63

>>

>> (Because they don't have any impl-def parts in them, plus to me they

>> make the intent, with the modular arithmetic and the "neighborhoods",

>> clearer.)

>>

>> I guess for others it's the opposite... :)

>>

>> Cheers

>> Laszlo

>>
Peter Maydell Nov. 10, 2015, 10:04 a.m. UTC | #6
On 9 November 2015 at 20:17, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, Nov 09, 2015 at 02:56:31PM +0000, Peter Maydell wrote:

>> Signed integer overflow in C is undefined behaviour, and the compiler

>> is at liberty to assume it can never happen and optimize accordingly.

>> In particular, the subtractions in hpet_time_after() and hpet_time_after64()

>> were causing OSX clang to optimize the code such that it was prone to

>> hangs and complaints about the main loop stalling (presumably because

>> we were spending all our time trying to service very high frequency

>> HPET timer callbacks). The clang sanitizer confirms the UB:

>>

>> hw/timer/hpet.c:119:26: runtime error: signed integer overflow: -2146967296 - 2147003978 cannot be represented in type 'int'

>>

>> Fix this by doing the subtraction as an unsigned operation and then

>> converting to signed for the comparison.

>>

>> Reported-by: Aaron Elkins <threcius@yahoo.com>

>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

>

> Agree, this makes no sense the way it's written.

>

> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>

>

> I'll pick this up in the next pull if Paolo doesn't

> beat me to it.


I went ahead and committed it to master yesterday; sorry
if that was a bit hasty of me.

thanks
-- PMM
diff mbox

Patch

diff --git a/hw/timer/hpet.c b/hw/timer/hpet.c
index 3037bef..7f0391c 100644
--- a/hw/timer/hpet.c
+++ b/hw/timer/hpet.c
@@ -116,12 +116,12 @@  static uint32_t timer_enabled(HPETTimer *t)
 
 static uint32_t hpet_time_after(uint64_t a, uint64_t b)
 {
-    return ((int32_t)(b) - (int32_t)(a) < 0);
+    return ((int32_t)(b - a) < 0);
 }
 
 static uint32_t hpet_time_after64(uint64_t a, uint64_t b)
 {
-    return ((int64_t)(b) - (int64_t)(a) < 0);
+    return ((int64_t)(b - a) < 0);
 }
 
 static uint64_t ticks_to_ns(uint64_t value)