diff mbox series

Revert "lib: Improve _parse_integer_fixup_radix base 16 detection"

Message ID 20200607053645.1408462-1-seanga2@gmail.com
State Accepted
Commit 1fae74125b644bb7f21e4afef3b9084c58e7863a
Headers show
Series Revert "lib: Improve _parse_integer_fixup_radix base 16 detection" | expand

Commit Message

Sean Anderson June 7, 2020, 5:36 a.m. UTC
This reverts commit 0486497e2b5f4d36fa968a1a60fea358cbf70b65.

The strtoul has well-defined semantics. It is defined by the C standard and
POSIX. To quote the relevant section of the man pages,

> If base is zero or 16, the string may then include a "0x" prefix, and the
> number will be read in base 16; otherwise, a zero base is taken as 10
> (decimal) unless the next character is '0', in which case it is taken as
> 8 (octal).

Keeping these semantics is important for several reasons. First, it is very
surprising for standard library functions to behave differently than usual.
Every other implementation of strtoul has different semantics than the
implementation in U-Boot at the moment. Second, it can result in very
surprising results from small changes. For example, changing the string
"1f" to "20" causes the parsed value to *decrease*. Forcing use of the "0x"
prefix to specify hexidecimal numbers is a feature, not a bug. Lastly, this
is slightly less performant, since the entire number is parsed twice.

This fixes the str_simple_strtoul test failing with

test/str_ut.c:29, run_strtoul(): expect_val == val: Expected 0x44b (1099), got 0x1099ab (1087915)
test/str_ut.c:46, str_simple_strtoul(): 0 == run_strtoul(uts, str2, 0, 1099, 4): Expected 0x0 (0), got 0x1 (1)

Signed-off-by: Sean Anderson <seanga2 at gmail.com>
CC: Michal Simek <michal.simek at xilinx.com>
CC: Shiril Tichkule <shirilt at xilinx.com>
---

 lib/strto.c | 18 +-----------------
 1 file changed, 1 insertion(+), 17 deletions(-)

Comments

Heinrich Schuchardt June 7, 2020, 6:40 a.m. UTC | #1
On 6/7/20 7:36 AM, Sean Anderson wrote:
> This reverts commit 0486497e2b5f4d36fa968a1a60fea358cbf70b65.
>
> The strtoul has well-defined semantics. It is defined by the C standard and
> POSIX. To quote the relevant section of the man pages,
>
>> If base is zero or 16, the string may then include a "0x" prefix, and the
>> number will be read in base 16; otherwise, a zero base is taken as 10
>> (decimal) unless the next character is '0', in which case it is taken as
>> 8 (octal).
>
> Keeping these semantics is important for several reasons. First, it is very
> surprising for standard library functions to behave differently than usual.
> Every other implementation of strtoul has different semantics than the
> implementation in U-Boot at the moment. Second, it can result in very
> surprising results from small changes. For example, changing the string
> "1f" to "20" causes the parsed value to *decrease*. Forcing use of the "0x"
> prefix to specify hexidecimal numbers is a feature, not a bug. Lastly, this
> is slightly less performant, since the entire number is parsed twice.
>
> This fixes the str_simple_strtoul test failing with
>
> test/str_ut.c:29, run_strtoul(): expect_val == val: Expected 0x44b (1099), got 0x1099ab (1087915)
> test/str_ut.c:46, str_simple_strtoul(): 0 == run_strtoul(uts, str2, 0, 1099, 4): Expected 0x0 (0), got 0x1 (1)
>

I suggest to add a Fixes line here:

Fixes: 0486497e2b5f ("lib: Improve _parse_integer_fixup_radix base 16
detection")

Cf.
https://www.kernel.org/doc/html/v4.17/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes

> Signed-off-by: Sean Anderson <seanga2 at gmail.com>
> CC: Michal Simek <michal.simek at xilinx.com>
> CC: Shiril Tichkule <shirilt at xilinx.com>
> ---
>
>  lib/strto.c | 18 +-----------------
>  1 file changed, 1 insertion(+), 17 deletions(-)
>
> diff --git a/lib/strto.c b/lib/strto.c
> index 3d77115d4d..c00bb5895d 100644
> --- a/lib/strto.c
> +++ b/lib/strto.c
> @@ -22,25 +22,9 @@ static const char *_parse_integer_fixup_radix(const char *s, unsigned int *base)
>  				*base = 16;
>  			else
>  				*base = 8;
> -		} else {
> -			int i = 0;
> -			char var;
> -
> +		} else

scripts/checkpatch.pl reports a problem:

CHECK: Unbalanced braces around else statement
#50: FILE: lib/strto.c:25:
+               } else

Best regards

Heinrich

>  			*base = 10;
> -
> -			do {
> -				var = tolower(s[i++]);
> -				if (var >= 'a' && var <= 'f') {
> -					*base = 16;
> -					break;
> -				}
> -
> -				if (!(var >= '0' && var <= '9'))
> -					break;
> -			} while (var);
> -		}
>  	}
> -
>  	if (*base == 16 && s[0] == '0' && tolower(s[1]) == 'x')
>  		s += 2;
>  	return s;
>
Sean Anderson June 7, 2020, 6:57 a.m. UTC | #2
On 6/7/20 2:40 AM, Heinrich Schuchardt wrote:
> On 6/7/20 7:36 AM, Sean Anderson wrote:
>> This reverts commit 0486497e2b5f4d36fa968a1a60fea358cbf70b65.
>>
>> The strtoul has well-defined semantics. It is defined by the C standard and
>> POSIX. To quote the relevant section of the man pages,
>>
>>> If base is zero or 16, the string may then include a "0x" prefix, and the
>>> number will be read in base 16; otherwise, a zero base is taken as 10
>>> (decimal) unless the next character is '0', in which case it is taken as
>>> 8 (octal).
>>
>> Keeping these semantics is important for several reasons. First, it is very
>> surprising for standard library functions to behave differently than usual.
>> Every other implementation of strtoul has different semantics than the
>> implementation in U-Boot at the moment. Second, it can result in very
>> surprising results from small changes. For example, changing the string
>> "1f" to "20" causes the parsed value to *decrease*. Forcing use of the "0x"
>> prefix to specify hexidecimal numbers is a feature, not a bug. Lastly, this
>> is slightly less performant, since the entire number is parsed twice.
>>
>> This fixes the str_simple_strtoul test failing with
>>
>> test/str_ut.c:29, run_strtoul(): expect_val == val: Expected 0x44b (1099), got 0x1099ab (1087915)
>> test/str_ut.c:46, str_simple_strtoul(): 0 == run_strtoul(uts, str2, 0, 1099, 4): Expected 0x0 (0), got 0x1 (1)
>>
> 
> I suggest to add a Fixes line here:
> 
> Fixes: 0486497e2b5f ("lib: Improve _parse_integer_fixup_radix base 16
> detection")

Thanks, I will add that.

> 
> Cf.
> https://www.kernel.org/doc/html/v4.17/process/submitting-patches.html#using-reported-by-tested-by-reviewed-by-suggested-by-and-fixes
> 
>> Signed-off-by: Sean Anderson <seanga2 at gmail.com>
>> CC: Michal Simek <michal.simek at xilinx.com>
>> CC: Shiril Tichkule <shirilt at xilinx.com>
>> ---
>>
>>  lib/strto.c | 18 +-----------------
>>  1 file changed, 1 insertion(+), 17 deletions(-)
>>
>> diff --git a/lib/strto.c b/lib/strto.c
>> index 3d77115d4d..c00bb5895d 100644
>> --- a/lib/strto.c
>> +++ b/lib/strto.c
>> @@ -22,25 +22,9 @@ static const char *_parse_integer_fixup_radix(const char *s, unsigned int *base)
>>  				*base = 16;
>>  			else
>>  				*base = 8;
>> -		} else {
>> -			int i = 0;
>> -			char var;
>> -
>> +		} else
> 
> scripts/checkpatch.pl reports a problem:
> 
> CHECK: Unbalanced braces around else statement
> #50: FILE: lib/strto.c:25:
> +               } else

I chose this form since that is what is used in Linux, and that is
how it formatted before the commit this reverts. I do think that it is
better form to have braces around this else block, though.

--Sean
Michal Simek June 8, 2020, 6:24 a.m. UTC | #3
On 07. 06. 20 7:36, Sean Anderson wrote:
> This reverts commit 0486497e2b5f4d36fa968a1a60fea358cbf70b65.
> 
> The strtoul has well-defined semantics. It is defined by the C standard and
> POSIX. To quote the relevant section of the man pages,
> 
>> If base is zero or 16, the string may then include a "0x" prefix, and the
>> number will be read in base 16; otherwise, a zero base is taken as 10
>> (decimal) unless the next character is '0', in which case it is taken as
>> 8 (octal).
> 
> Keeping these semantics is important for several reasons. First, it is very
> surprising for standard library functions to behave differently than usual.
> Every other implementation of strtoul has different semantics than the
> implementation in U-Boot at the moment. Second, it can result in very
> surprising results from small changes. For example, changing the string
> "1f" to "20" causes the parsed value to *decrease*. Forcing use of the "0x"
> prefix to specify hexidecimal numbers is a feature, not a bug. Lastly, this
> is slightly less performant, since the entire number is parsed twice.
> 
> This fixes the str_simple_strtoul test failing with
> 
> test/str_ut.c:29, run_strtoul(): expect_val == val: Expected 0x44b (1099), got 0x1099ab (1087915)
> test/str_ut.c:46, str_simple_strtoul(): 0 == run_strtoul(uts, str2, 0, 1099, 4): Expected 0x0 (0), got 0x1 (1)
> 
> Signed-off-by: Sean Anderson <seanga2 at gmail.com>
> CC: Michal Simek <michal.simek at xilinx.com>
> CC: Shiril Tichkule <shirilt at xilinx.com>
> ---
> 
>  lib/strto.c | 18 +-----------------
>  1 file changed, 1 insertion(+), 17 deletions(-)
> 
> diff --git a/lib/strto.c b/lib/strto.c
> index 3d77115d4d..c00bb5895d 100644
> --- a/lib/strto.c
> +++ b/lib/strto.c
> @@ -22,25 +22,9 @@ static const char *_parse_integer_fixup_radix(const char *s, unsigned int *base)
>  				*base = 16;
>  			else
>  				*base = 8;
> -		} else {
> -			int i = 0;
> -			char var;
> -
> +		} else
>  			*base = 10;
> -
> -			do {
> -				var = tolower(s[i++]);
> -				if (var >= 'a' && var <= 'f') {
> -					*base = 16;
> -					break;
> -				}
> -
> -				if (!(var >= '0' && var <= '9'))
> -					break;
> -			} while (var);
> -		}
>  	}
> -
>  	if (*base == 16 && s[0] == '0' && tolower(s[1]) == 'x')
>  		s += 2;
>  	return s;
> 


It is in u-boot mainline from February. Then we had to fix it in April.
In the middle of this I have seen IIC one patchset which improves hex
handling which is likely better way then this.
I am fine with reverting this patch but I would also like to see more
comments in the code to say that we shouldn't really change this.

Thanks,
Michal
Sean Anderson June 8, 2020, 5:05 p.m. UTC | #4
On 6/8/20 2:24 AM, Michal Simek wrote:
> It is in u-boot mainline from February. Then we had to fix it in April.

Do you have a link/commit hash for said fix?

> In the middle of this I have seen IIC one patchset which improves hex
> handling which is likely better way then this.

Same for this.

> I am fine with reverting this patch but I would also like to see more
> comments in the code to say that we shouldn't really change this.

This seems like a larger issue. There is a lot of code in U-Boot
implementing the C standard library, especially the string library.
Should we have a comment in each file which says something like "The
functions here implement the C standard, think carefully before making
semantic changes"?

--Sean
Simon Glass June 8, 2020, 5:12 p.m. UTC | #5
Hi,

On Mon, 8 Jun 2020 at 00:24, Michal Simek <michal.simek at xilinx.com> wrote:
>
> On 07. 06. 20 7:36, Sean Anderson wrote:
> > This reverts commit 0486497e2b5f4d36fa968a1a60fea358cbf70b65.
> >
> > The strtoul has well-defined semantics. It is defined by the C standard and
> > POSIX. To quote the relevant section of the man pages,
> >
> >> If base is zero or 16, the string may then include a "0x" prefix, and the
> >> number will be read in base 16; otherwise, a zero base is taken as 10
> >> (decimal) unless the next character is '0', in which case it is taken as
> >> 8 (octal).
> >
> > Keeping these semantics is important for several reasons. First, it is very
> > surprising for standard library functions to behave differently than usual.
> > Every other implementation of strtoul has different semantics than the
> > implementation in U-Boot at the moment. Second, it can result in very
> > surprising results from small changes. For example, changing the string
> > "1f" to "20" causes the parsed value to *decrease*. Forcing use of the "0x"
> > prefix to specify hexidecimal numbers is a feature, not a bug. Lastly, this
> > is slightly less performant, since the entire number is parsed twice.
> >
> > This fixes the str_simple_strtoul test failing with
> >
> > test/str_ut.c:29, run_strtoul(): expect_val == val: Expected 0x44b (1099), got 0x1099ab (1087915)
> > test/str_ut.c:46, str_simple_strtoul(): 0 == run_strtoul(uts, str2, 0, 1099, 4): Expected 0x0 (0), got 0x1 (1)
> >
> > Signed-off-by: Sean Anderson <seanga2 at gmail.com>
> > CC: Michal Simek <michal.simek at xilinx.com>
> > CC: Shiril Tichkule <shirilt at xilinx.com>
> > ---
> >
> >  lib/strto.c | 18 +-----------------
> >  1 file changed, 1 insertion(+), 17 deletions(-)
> >
> > diff --git a/lib/strto.c b/lib/strto.c
> > index 3d77115d4d..c00bb5895d 100644
> > --- a/lib/strto.c
> > +++ b/lib/strto.c
> > @@ -22,25 +22,9 @@ static const char *_parse_integer_fixup_radix(const char *s, unsigned int *base)
> >                               *base = 16;
> >                       else
> >                               *base = 8;
> > -             } else {
> > -                     int i = 0;
> > -                     char var;
> > -
> > +             } else
> >                       *base = 10;
> > -
> > -                     do {
> > -                             var = tolower(s[i++]);
> > -                             if (var >= 'a' && var <= 'f') {
> > -                                     *base = 16;
> > -                                     break;
> > -                             }
> > -
> > -                             if (!(var >= '0' && var <= '9'))
> > -                                     break;
> > -                     } while (var);
> > -             }
> >       }
> > -
> >       if (*base == 16 && s[0] == '0' && tolower(s[1]) == 'x')
> >               s += 2;
> >       return s;
> >
>
>
> It is in u-boot mainline from February. Then we had to fix it in April.
> In the middle of this I have seen IIC one patchset which improves hex
> handling which is likely better way then this.
> I am fine with reverting this patch but I would also like to see more
> comments in the code to say that we shouldn't really change this.

Once this revert is in I think we should refrain from changing it
until there are tests to cover its current behaviour, e.g. in str_ut.c

Reviewed-by: Simon Glass <sjg at chromium.org>

Regards,
Simon
Tom Rini June 8, 2020, 8:30 p.m. UTC | #6
On Mon, Jun 08, 2020 at 01:05:21PM -0400, Sean Anderson wrote:
> On 6/8/20 2:24 AM, Michal Simek wrote:
> > It is in u-boot mainline from February. Then we had to fix it in April.
> 
> Do you have a link/commit hash for said fix?
> 
> > In the middle of this I have seen IIC one patchset which improves hex
> > handling which is likely better way then this.
> 
> Same for this.
> 
> > I am fine with reverting this patch but I would also like to see more
> > comments in the code to say that we shouldn't really change this.
> 
> This seems like a larger issue. There is a lot of code in U-Boot
> implementing the C standard library, especially the string library.
> Should we have a comment in each file which says something like "The
> functions here implement the C standard, think carefully before making
> semantic changes"?

Honestly?  Yes, it would be good to note which functions do their best
to implement / conform to C standard and which _may_ deviate from it.

For example, we have an alternative printf that doesn't support all
format characters.  Making that clearer (and then what ones our normal
printf doesn't handle) would be good.
Tom Rini June 15, 2020, 7:55 p.m. UTC | #7
On Sun, Jun 07, 2020 at 01:36:45AM -0400, Sean Anderson wrote:

> This reverts commit 0486497e2b5f4d36fa968a1a60fea358cbf70b65.
> 
> The strtoul has well-defined semantics. It is defined by the C standard and
> POSIX. To quote the relevant section of the man pages,
> 
> > If base is zero or 16, the string may then include a "0x" prefix, and the
> > number will be read in base 16; otherwise, a zero base is taken as 10
> > (decimal) unless the next character is '0', in which case it is taken as
> > 8 (octal).
> 
> Keeping these semantics is important for several reasons. First, it is very
> surprising for standard library functions to behave differently than usual.
> Every other implementation of strtoul has different semantics than the
> implementation in U-Boot at the moment. Second, it can result in very
> surprising results from small changes. For example, changing the string
> "1f" to "20" causes the parsed value to *decrease*. Forcing use of the "0x"
> prefix to specify hexidecimal numbers is a feature, not a bug. Lastly, this
> is slightly less performant, since the entire number is parsed twice.
> 
> This fixes the str_simple_strtoul test failing with
> 
> test/str_ut.c:29, run_strtoul(): expect_val == val: Expected 0x44b (1099), got 0x1099ab (1087915)
> test/str_ut.c:46, str_simple_strtoul(): 0 == run_strtoul(uts, str2, 0, 1099, 4): Expected 0x0 (0), got 0x1 (1)
> 
> Signed-off-by: Sean Anderson <seanga2 at gmail.com>
> CC: Michal Simek <michal.simek at xilinx.com>
> CC: Shiril Tichkule <shirilt at xilinx.com>
> Reviewed-by: Simon Glass <sjg at chromium.org>

Applied to u-boot/master, thanks!
diff mbox series

Patch

diff --git a/lib/strto.c b/lib/strto.c
index 3d77115d4d..c00bb5895d 100644
--- a/lib/strto.c
+++ b/lib/strto.c
@@ -22,25 +22,9 @@  static const char *_parse_integer_fixup_radix(const char *s, unsigned int *base)
 				*base = 16;
 			else
 				*base = 8;
-		} else {
-			int i = 0;
-			char var;
-
+		} else
 			*base = 10;
-
-			do {
-				var = tolower(s[i++]);
-				if (var >= 'a' && var <= 'f') {
-					*base = 16;
-					break;
-				}
-
-				if (!(var >= '0' && var <= '9'))
-					break;
-			} while (var);
-		}
 	}
-
 	if (*base == 16 && s[0] == '0' && tolower(s[1]) == 'x')
 		s += 2;
 	return s;