Message ID | 20210203165009.6299-2-rf@opensource.cirrus.com |
---|---|
State | Superseded |
Headers | show |
Series | None | expand |
On Wed 2021-02-03 21:45:55, Andy Shevchenko wrote: > On Wed, Feb 03, 2021 at 04:50:07PM +0000, Richard Fitzgerald wrote: > > The existing code attempted to handle numbers by doing a strto[u]l(), > > ignoring the field width, and then repeatedly dividing to extract the > > field out of the full converted value. If the string contains a run of > > valid digits longer than will fit in a long or long long, this would > > overflow and no amount of dividing can recover the correct value. > ... > > > + for (; max_chars > 0; max_chars--) { > > Less fragile is to write > > while (max_chars--) Except that the original was more obvious at least for me. I always prefer more readable code when the compiler might do the optimization easily. But this is my personal taste. I am fine with both variants. > > This allows max_char to be an unsigned type. > > Moreover... > > > + return _parse_integer_limit(s, base, p, INT_MAX); > > You have inconsistency with INT_MAX vs, size_t above. Ah, this was on my request. INT_MAX is already used on many other locations in vsnprintf() for this purpose. An alternative is to fix all the other locations. We would need to check if it is really safe. Well, I do not want to force Richard to fix this historical mess. He already put a lot lot of effort into fixing this long term issue. ... > > - unsigned long long result; > > + const char *cp; > > + unsigned long long result = 0ULL; > > unsigned int rv; > > > > - cp = _parse_integer_fixup_radix(cp, &base); > > - rv = _parse_integer(cp, base, &result); > > > + if (max_chars == 0) { > > + cp = startp; > > + goto out; > > + } > > It's redundant if I'm not mistaken. Also this is more obvious and less error prone from my POV. But I agree that it is redundant. I am not sure if this function is used in some fast paths. Again I am fine with both variants. > > + cp = _parse_integer_fixup_radix(startp, &base); > > + if ((cp - startp) >= max_chars) { > > + cp = startp + max_chars; > > + goto out; > > + } > > This will be exactly the same, no? Best Regards, Petr
On 04/02/2021 16:35, Petr Mladek wrote: > On Wed 2021-02-03 21:45:55, Andy Shevchenko wrote: >> On Wed, Feb 03, 2021 at 04:50:07PM +0000, Richard Fitzgerald wrote: >>> The existing code attempted to handle numbers by doing a strto[u]l(), >>> ignoring the field width, and then repeatedly dividing to extract the >>> field out of the full converted value. If the string contains a run of >>> valid digits longer than will fit in a long or long long, this would >>> overflow and no amount of dividing can recover the correct value. > >> ... >> >>> + for (; max_chars > 0; max_chars--) { >> >> Less fragile is to write >> >> while (max_chars--) > > Except that the original was more obvious at least for me. > I always prefer more readable code when the compiler might do > the optimization easily. But this is my personal taste. > I am fine with both variants. > >> >> This allows max_char to be an unsigned type. >> >> Moreover... >> >>> + return _parse_integer_limit(s, base, p, INT_MAX); >> >> You have inconsistency with INT_MAX vs, size_t above. > > Ah, this was on my request. INT_MAX is already used on many other > locations in vsnprintf() for this purpose. > I originally had UINT_MAX and changed on Petr's request to be consistent with other code. (Sorry Andy - my mistake not including you on the earlier review versions). But 0 < INT_MAX < UINT_MAX, so ok to pass to an unsigned. And as Petr said on his original review, INT_MAX is "big enough". I don't mind either way. > An alternative is to fix all the other locations. We would need to > check if it is really safe. Well, I do not want to force Richard > to fix this historical mess. He already put a lot lot of effort > into fixing this long term issue. > > ... > >>> - unsigned long long result; >>> + const char *cp; >>> + unsigned long long result = 0ULL; >>> unsigned int rv; >>> >>> - cp = _parse_integer_fixup_radix(cp, &base); >>> - rv = _parse_integer(cp, base, &result); >> >>> + if (max_chars == 0) { >>> + cp = startp; >>> + goto out; >>> + } >> >> It's redundant if I'm not mistaken. > > Also this is more obvious and less error prone from my POV. > But I agree that it is redundant. I am not sure if this > function is used in some fast paths. > > Again I am fine with both variants. > >>> + cp = _parse_integer_fixup_radix(startp, &base); >>> + if ((cp - startp) >= max_chars) { >>> + cp = startp + max_chars; >>> + goto out; >>> + } >> >> This will be exactly the same, no? > > Best Regards, > Petr >
On Fri, Feb 5, 2021 at 1:35 PM Richard Fitzgerald <rf@opensource.cirrus.com> wrote: > On 04/02/2021 16:35, Petr Mladek wrote: > > On Wed 2021-02-03 21:45:55, Andy Shevchenko wrote: > >> On Wed, Feb 03, 2021 at 04:50:07PM +0000, Richard Fitzgerald wrote: ... > >>> + for (; max_chars > 0; max_chars--) { > >> > >> Less fragile is to write > >> > >> while (max_chars--) > > > > Except that the original was more obvious at least for me. > > I always prefer more readable code when the compiler might do > > the optimization easily. But this is my personal taste. > > I am fine with both variants. I *slightly* prefer while-loop *in this case* due to less characters to parse to understand the logic. > >> This allows max_char to be an unsigned type. > >> > >> Moreover... > >> > >>> + return _parse_integer_limit(s, base, p, INT_MAX); > >> > >> You have inconsistency with INT_MAX vs, size_t above. > > > > Ah, this was on my request. INT_MAX is already used on many other > > locations in vsnprintf() for this purpose. > > I originally had UINT_MAX and changed on Petr's request to be > consistent with other code. (Sorry Andy - my mistake not including > you on the earlier review versions). > > But 0 < INT_MAX < UINT_MAX, so ok to pass to an unsigned. And as Petr > said on his original review, INT_MAX is "big enough". Some code has INT_MAX, some has UINT_MAX, while the parameter is size_t. I think all of these inconsistencies should have a comment either in the code, or in the commit message, or in the cover letter (depending on the importance). Or being fixed to be more consistent with existing code. Whichever you consider better. > I don't mind either way. > > > An alternative is to fix all the other locations. We would need to > > check if it is really safe. Well, I do not want to force Richard > > to fix this historical mess. He already put a lot lot of effort > > into fixing this long term issue. ... > >>> - unsigned long long result; > >>> + const char *cp; > >>> + unsigned long long result = 0ULL; > >>> unsigned int rv; > >>> > >>> - cp = _parse_integer_fixup_radix(cp, &base); > >>> - rv = _parse_integer(cp, base, &result); > >> > >>> + if (max_chars == 0) { > >>> + cp = startp; > >>> + goto out; > >>> + } > >> > >> It's redundant if I'm not mistaken. > > > > Also this is more obvious and less error prone from my POV. > > But I agree that it is redundant. I am not sure if this > > function is used in some fast paths. > > > > Again I am fine with both variants. I think we don't need a (redundant) code, but rather a comment. > >>> + cp = _parse_integer_fixup_radix(startp, &base); > >>> + if ((cp - startp) >= max_chars) { > >>> + cp = startp + max_chars; > >>> + goto out; > >>> + } > >> > >> This will be exactly the same, no? -- With Best Regards, Andy Shevchenko
From: Andy Shevchenko > Sent: 05 February 2021 12:51 > > On Fri, Feb 5, 2021 at 1:35 PM Richard Fitzgerald > <rf@opensource.cirrus.com> wrote: > > On 04/02/2021 16:35, Petr Mladek wrote: > > > On Wed 2021-02-03 21:45:55, Andy Shevchenko wrote: > > >> On Wed, Feb 03, 2021 at 04:50:07PM +0000, Richard Fitzgerald wrote: > > ... > > > >>> + for (; max_chars > 0; max_chars--) { > > >> > > >> Less fragile is to write > > >> > > >> while (max_chars--) > > > > > > Except that the original was more obvious at least for me. > > > I always prefer more readable code when the compiler might do > > > the optimization easily. But this is my personal taste. > > > I am fine with both variants. > > I *slightly* prefer while-loop *in this case* due to less characters > to parse to understand the logic. The two loops are also have different values for 'max_chars' inside the loop body. If 'max_chars' is known to be non-zero the do ... while (--max_chars); loop will probable generate better code. But there is no accounting for just how odd some decisions gcc makes are. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Fri, Feb 5, 2021 at 5:23 PM David Laight <David.Laight@aculab.com> wrote: > From: Andy Shevchenko > > Sent: 05 February 2021 12:51 > > On Fri, Feb 5, 2021 at 1:35 PM Richard Fitzgerald > > <rf@opensource.cirrus.com> wrote: > > > On 04/02/2021 16:35, Petr Mladek wrote: > > > > On Wed 2021-02-03 21:45:55, Andy Shevchenko wrote: > > > >> On Wed, Feb 03, 2021 at 04:50:07PM +0000, Richard Fitzgerald wrote: > > > > ... > > > > > >>> + for (; max_chars > 0; max_chars--) { > > > >> > > > >> Less fragile is to write > > > >> > > > >> while (max_chars--) > > > > > > > > Except that the original was more obvious at least for me. > > > > I always prefer more readable code when the compiler might do > > > > the optimization easily. But this is my personal taste. > > > > I am fine with both variants. > > > > I *slightly* prefer while-loop *in this case* due to less characters > > to parse to understand the logic. > > The two loops are also have different values for 'max_chars' > inside the loop body. off-by-one to be precise. > If 'max_chars' is known to be non-zero the do ... while (--max_chars); > loop will probable generate better code. What?! while (--x) and while(x--) are equivalent. > But there is no accounting for just how odd some decisions gcc > makes are. Why should we care about the compiler bugs here? -- With Best Regards, Andy Shevchenko
On 04/02/2021 16:35, Petr Mladek wrote: > On Wed 2021-02-03 21:45:55, Andy Shevchenko wrote: >> On Wed, Feb 03, 2021 at 04:50:07PM +0000, Richard Fitzgerald wrote: >>> The existing code attempted to handle numbers by doing a strto[u]l(), >>> ignoring the field width, and then repeatedly dividing to extract the >>> field out of the full converted value. If the string contains a run of >>> valid digits longer than will fit in a long or long long, this would >>> overflow and no amount of dividing can recover the correct value. > >> ... >> >>> + for (; max_chars > 0; max_chars--) { >> >> Less fragile is to write >> >> while (max_chars--) > > Except that the original was more obvious at least for me. > I always prefer more readable code when the compiler might do > the optimization easily. But this is my personal taste. > I am fine with both variants. > >> >> This allows max_char to be an unsigned type. >> >> Moreover... >> >>> + return _parse_integer_limit(s, base, p, INT_MAX); >> >> You have inconsistency with INT_MAX vs, size_t above. > > Ah, this was on my request. INT_MAX is already used on many other > locations in vsnprintf() for this purpose. > Strictly speaking this should be SIZE_MAX because the argument is a size_t.
On Fri 2021-02-05 14:50:56, Andy Shevchenko wrote: > On Fri, Feb 5, 2021 at 1:35 PM Richard Fitzgerald > <rf@opensource.cirrus.com> wrote: > > On 04/02/2021 16:35, Petr Mladek wrote: > > > On Wed 2021-02-03 21:45:55, Andy Shevchenko wrote: > > >> On Wed, Feb 03, 2021 at 04:50:07PM +0000, Richard Fitzgerald wrote: > > >> This allows max_char to be an unsigned type. > > >> > > >> Moreover... > > >> > > >>> + return _parse_integer_limit(s, base, p, INT_MAX); > > >> > > >> You have inconsistency with INT_MAX vs, size_t above. > > > > > > Ah, this was on my request. INT_MAX is already used on many other > > > locations in vsnprintf() for this purpose. > > > > I originally had UINT_MAX and changed on Petr's request to be > > consistent with other code. (Sorry Andy - my mistake not including > > you on the earlier review versions). > > > > But 0 < INT_MAX < UINT_MAX, so ok to pass to an unsigned. And as Petr > > said on his original review, INT_MAX is "big enough". > > Some code has INT_MAX, some has UINT_MAX, while the parameter is size_t. Yeah, if I remember correctly I wanted to have INT_MAX everywhere but I did not want to nitpick about it in the later versions. It looked like an arbitrary number anyway. > I think all of these inconsistencies should have a comment either in > the code, or in the commit message, or in the cover letter (depending > on the importance). > Or being fixed to be more consistent with existing code. Whichever you > consider better. OK, you made me to do some archaeology. The INT_MAX limit has been added into vsnprintf() in 2.6.2 by the commit: Author: Linus Torvalds <torvalds@home.osdl.org> Date: Mon Feb 2 21:17:29 2004 -0800 Warn loudly if somebody passes a negative value as the size to "vsnprintf()". That's a pretty clear case of overflow. It might catch problems. And the limit seems to have worked all the time. IMHO, it would make sense to have INT_MAX limit also in _parse_integer_limit() and WARN() when a larger value is passed. By other words, it would mean to add this check and use INT_MAX everywhere in this patch. Best Regards, Petr
diff --git a/lib/kstrtox.c b/lib/kstrtox.c index a118b0b1e9b2..e6d14945df4f 100644 --- a/lib/kstrtox.c +++ b/lib/kstrtox.c @@ -39,20 +39,22 @@ const char *_parse_integer_fixup_radix(const char *s, unsigned int *base) /* * Convert non-negative integer string representation in explicitly given radix - * to an integer. + * to an integer. A maximum of max_chars characters will be converted. + * * Return number of characters consumed maybe or-ed with overflow bit. * If overflow occurs, result integer (incorrect) is still returned. * * Don't you dare use this function. */ -unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long *p) +unsigned int _parse_integer_limit(const char *s, unsigned int base, + unsigned long long *p, size_t max_chars) { unsigned long long res; unsigned int rv; res = 0; rv = 0; - while (1) { + for (; max_chars > 0; max_chars--) { unsigned int c = *s; unsigned int lc = c | 0x20; /* don't tolower() this line */ unsigned int val; @@ -82,6 +84,11 @@ unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long return rv; } +unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long *p) +{ + return _parse_integer_limit(s, base, p, INT_MAX); +} + static int _kstrtoull(const char *s, unsigned int base, unsigned long long *res) { unsigned long long _res; diff --git a/lib/kstrtox.h b/lib/kstrtox.h index 3b4637bcd254..4c6536f85cac 100644 --- a/lib/kstrtox.h +++ b/lib/kstrtox.h @@ -4,6 +4,8 @@ #define KSTRTOX_OVERFLOW (1U << 31) const char *_parse_integer_fixup_radix(const char *s, unsigned int *base); +unsigned int _parse_integer_limit(const char *s, unsigned int base, + unsigned long long *res, size_t max_chars); unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long *res); #endif diff --git a/lib/vsprintf.c b/lib/vsprintf.c index 28bb26cd1f67..6afc47575331 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -53,29 +53,47 @@ #include <linux/string_helpers.h> #include "kstrtox.h" -/** - * simple_strtoull - convert a string to an unsigned long long - * @cp: The start of the string - * @endp: A pointer to the end of the parsed string will be placed here - * @base: The number base to use - * - * This function has caveats. Please use kstrtoull instead. - */ -unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base) +static unsigned long long simple_strntoull(const char *startp, size_t max_chars, + char **endp, unsigned int base) { - unsigned long long result; + const char *cp; + unsigned long long result = 0ULL; unsigned int rv; - cp = _parse_integer_fixup_radix(cp, &base); - rv = _parse_integer(cp, base, &result); + if (max_chars == 0) { + cp = startp; + goto out; + } + + cp = _parse_integer_fixup_radix(startp, &base); + if ((cp - startp) >= max_chars) { + cp = startp + max_chars; + goto out; + } + + max_chars -= (cp - startp); + rv = _parse_integer_limit(cp, base, &result, max_chars); /* FIXME */ cp += (rv & ~KSTRTOX_OVERFLOW); - +out: if (endp) *endp = (char *)cp; return result; } + +/** + * simple_strtoull - convert a string to an unsigned long long + * @cp: The start of the string + * @endp: A pointer to the end of the parsed string will be placed here + * @base: The number base to use + * + * This function has caveats. Please use kstrtoull instead. + */ +unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base) +{ + return simple_strntoull(cp, UINT_MAX, endp, base); +} EXPORT_SYMBOL(simple_strtoull); /** @@ -88,7 +106,7 @@ EXPORT_SYMBOL(simple_strtoull); */ unsigned long simple_strtoul(const char *cp, char **endp, unsigned int base) { - return simple_strtoull(cp, endp, base); + return simple_strntoull(cp, UINT_MAX, endp, base); } EXPORT_SYMBOL(simple_strtoul); @@ -109,6 +127,19 @@ long simple_strtol(const char *cp, char **endp, unsigned int base) } EXPORT_SYMBOL(simple_strtol); +static long long simple_strntoll(const char *cp, size_t max_chars, char **endp, + unsigned int base) +{ + /* + * simple_strntoull safely handles receiving max_chars==0 in the + * case we start with max_chars==1 and find a '-' prefix. + */ + if (*cp == '-' && max_chars > 0) + return -simple_strntoull(cp + 1, max_chars - 1, endp, base); + + return simple_strntoull(cp, max_chars, endp, base); +} + /** * simple_strtoll - convert a string to a signed long long * @cp: The start of the string @@ -119,10 +150,7 @@ EXPORT_SYMBOL(simple_strtol); */ long long simple_strtoll(const char *cp, char **endp, unsigned int base) { - if (*cp == '-') - return -simple_strtoull(cp + 1, endp, base); - - return simple_strtoull(cp, endp, base); + return simple_strntoll(cp, UINT_MAX, endp, base); } EXPORT_SYMBOL(simple_strtoll); @@ -3449,25 +3477,13 @@ int vsscanf(const char *buf, const char *fmt, va_list args) break; if (is_sign) - val.s = qualifier != 'L' ? - simple_strtol(str, &next, base) : - simple_strtoll(str, &next, base); + val.s = simple_strntoll(str, + field_width > 0 ? field_width : UINT_MAX, + &next, base); else - val.u = qualifier != 'L' ? - simple_strtoul(str, &next, base) : - simple_strtoull(str, &next, base); - - if (field_width > 0 && next - str > field_width) { - if (base == 0) - _parse_integer_fixup_radix(str, &base); - while (next - str > field_width) { - if (is_sign) - val.s = div_s64(val.s, base); - else - val.u = div_u64(val.u, base); - --next; - } - } + val.u = simple_strntoull(str, + field_width > 0 ? field_width : UINT_MAX, + &next, base); switch (qualifier) { case 'H': /* that's 'hh' in format */