Message ID | 1526931539-11863-1-git-send-email-adhemerval.zanella@linaro.org |
---|---|
State | New |
Headers | show |
Series | manual: Document optind zero set behaviour (BZ#23157) | expand |
On 05/21/2018 03:38 PM, Adhemerval Zanella wrote: > POSIX [1] does not explicit state the expected way to rescans the same > vector more than once. FreeBSD [2], for instance, exports a non-standard > variable 'optreset' which must be set to '1' prior the second and each > additional call to 'getopt'. GLIBC in turn requires the program to > reset 'optind' to 0 instead (and POSIX states the behavior is unspecified). I see 5 getopt test cases that use optind = 1 to reparse the options. Is it optind = 1 or optind = 0? Can you verify the exact behaviour by adding a specific test case that tests for *just* this particular behaviour? > Unfortunately this is not documented on the manual, only on man-pages [3] > (on NOTES). This patch adds an explanation of this behavior on manual. > > * manual/getopt.texi: Document optind zero set behaviour. > > [1] http://pubs.opengroup.org/onlinepubs/9699919799/ > [2] https://www.freebsd.org/cgi/man.cgi?getopt(3) > [3] http://man7.org/linux/man-pages/man3/getopt.3.html > --- > ChangeLog | 4 ++++ > manual/getopt.texi | 6 ++++++ > 2 files changed, 10 insertions(+) > > diff --git a/manual/getopt.texi b/manual/getopt.texi > index 5485fc4..a4f6366 100644 > --- a/manual/getopt.texi > +++ b/manual/getopt.texi > @@ -45,6 +45,12 @@ of the @var{argv} array to be processed. Once @code{getopt} has found > all of the option arguments, you can use this variable to determine > where the remaining non-option arguments begin. The initial value of > this variable is @code{1}. > + > +Resetting the variable value to @code{0} forces the invocation of an > +internal initialization routine and it is used mainly when a program > +wants to rescan the same vector more than once. It also should be used > +to scan multiple argument vectors or if @code{POSIXLY_CORRECT} is changed > +between scans. > @end deftypevar Suggest: Resetting the variable's value to @code{0} forces the invocation of an internal initialization routine and, on subsequent calls to getopt, causes the program to rescan the same vector more than once. This behaviour may also be used to scan multiple argument vectors or if @code{POSIXLY_CORRECT} is changed between scans. > @deftypevar {char *} optarg > -- Cheers, Carlos.
On 23/05/2018 14:52, Carlos O'Donell wrote: > On 05/21/2018 03:38 PM, Adhemerval Zanella wrote: >> POSIX [1] does not explicit state the expected way to rescans the same >> vector more than once. FreeBSD [2], for instance, exports a non-standard >> variable 'optreset' which must be set to '1' prior the second and each >> additional call to 'getopt'. GLIBC in turn requires the program to >> reset 'optind' to 0 instead (and POSIX states the behavior is unspecified). > > I see 5 getopt test cases that use optind = 1 to reparse the options. > > Is it optind = 1 or optind = 0? > > Can you verify the exact behaviour by adding a specific test case that > tests for *just* this particular behaviour? The issue is getopt uses information tied to argv internal contents to determine whether to advance to next argv element. On the testcase from BZ#23157 it does: --- void test_getopt(int argc, char **argv, const char *optstring, int expected) { int oc = getopt(argc, argv, optstring); if (oc == expected) { printf("PASS getopt = %c (%d)\n", oc, oc); } else { printf("FAIL getopt = %c (%d), expected %c (%d)\n", oc, oc, expected, expected); } } int main(int ac, char **av) { int argc; char *argv[16]; char test[16]; argv[0] = "ignored-1"; strcpy(test, "-a"); argv[1] = test; argv[2] = "non-option-1"; argv[3] = NULL; argc = 3; /* As expected */ test_getopt(argc, argv, "ab", 'a'); /* As expected */ test_getopt(argc, argv, "ab", -1); argv[0] = "ignored-2"; argv[1] = "non-option-2"; argv[2] = NULL; argc = 2; optind = 1; strcpy(test, "-ab"); /* Fails, as __nextchar is still pointing into 'test' */ test_getopt(argc, argv, "ab", -1); return 0; } --- After second getopt called by test_getopt, __nextchar will point to &test[2] which '\0'. Without changing 'test', the behaviour would be to indicate the next argv should be used as: posix/getopt.c 492 if (d->__nextchar == NULL || *d->__nextchar == '\0') 493 { 494 /* Advance to the next ARGV-element. */ 495 496 /* Give FIRST_NONOPT & LAST_NONOPT rational values if OPTIND has been 497 moved back by the user (who may also have changed the arguments). */ 498 if (d->__last_nonopt > d->optind) However, since program explicit changed 'test' value, the __nextchar on third getopt invocation will yield 'b' instead and thus will basically invalidate the algorithm. Without changing test contents, setting optind to 1 works unless the man-pages noted cases. Now I am not sure if program is abusing of getopt semantic, or glibc is tying with information it should (the input argument), or if it just undefined behaviour. > >> Unfortunately this is not documented on the manual, only on man-pages [3] >> (on NOTES). This patch adds an explanation of this behavior on manual. >> >> * manual/getopt.texi: Document optind zero set behaviour. >> >> [1] http://pubs.opengroup.org/onlinepubs/9699919799/ >> [2] https://www.freebsd.org/cgi/man.cgi?getopt(3) >> [3] http://man7.org/linux/man-pages/man3/getopt.3.html >> --- >> ChangeLog | 4 ++++ >> manual/getopt.texi | 6 ++++++ >> 2 files changed, 10 insertions(+) >> >> diff --git a/manual/getopt.texi b/manual/getopt.texi >> index 5485fc4..a4f6366 100644 >> --- a/manual/getopt.texi >> +++ b/manual/getopt.texi >> @@ -45,6 +45,12 @@ of the @var{argv} array to be processed. Once @code{getopt} has found >> all of the option arguments, you can use this variable to determine >> where the remaining non-option arguments begin. The initial value of >> this variable is @code{1}. >> + >> +Resetting the variable value to @code{0} forces the invocation of an >> +internal initialization routine and it is used mainly when a program >> +wants to rescan the same vector more than once. It also should be used >> +to scan multiple argument vectors or if @code{POSIXLY_CORRECT} is changed >> +between scans. >> @end deftypevar > > Suggest: > Resetting the variable's value to @code{0} forces the invocation of an > internal initialization routine and, on subsequent calls to getopt, causes > the program to rescan the same vector more than once. This behaviour may > also be used to scan multiple argument vectors or if @code{POSIXLY_CORRECT} > is changed between scans. I would add it is required as well if argv contents is changed over the calls. > >> @deftypevar {char *} optarg >> >
On 05/29/2018 02:44 PM, Adhemerval Zanella wrote: > Now I am not sure if program is abusing of getopt semantic, or glibc is tying > with information it should (the input argument), or if it just undefined > behaviour. We have no test cases that use optind setting to zero. That isn't proof conclusive that the feature is not supported. We have only a manual page with a note about resetting optind to 0 to allow the reset to happen. That's not canonical. We have a BSD implementation that does something similar but with a distinct variable. My suggestion: * Review other implementations of GNU getopt and see what they support for optind setting to 0, e.g. https://github.com/agriffis/pure-getopt If they don't support setting optind to 0, then that's better proof that we should not support this. * Consider copying BSD's non-standard optreset. I like the idea of a distinct variable for use in the reset. * Consider sending a patch to the linux man pages project to remove the note regarding glibc's undocumented method for rescanning. Cheers, Carlos.
On 29/05/2018 17:02, Carlos O'Donell wrote: > On 05/29/2018 02:44 PM, Adhemerval Zanella wrote: >> Now I am not sure if program is abusing of getopt semantic, or glibc is tying >> with information it should (the input argument), or if it just undefined >> behaviour. > > We have no test cases that use optind setting to zero. That isn't proof conclusive > that the feature is not supported. We do: * posix/bug-getopt2.c explicit sets optind first to 0 and then to 1. * posix/tst-getopt-cancel.c also sets optind to 0. Both are due to explicit reset internal state to handle the optstring starting with '+'. > > We have only a manual page with a note about resetting optind to 0 to allow the > reset to happen. That's not canonical. > > We have a BSD implementation that does something similar but with a distinct > variable. > > My suggestion: > > * Review other implementations of GNU getopt and see what they support for > optind setting to 0, e.g. https://github.com/agriffis/pure-getopt > If they don't support setting optind to 0, then that's better proof that > we should not support this. I don't think this example applies here (optind is really tied to C semantic, this bash project have a different to access the results). GLIBC getopt is pretty much in sync with gnulib and I am not aware of any other gnu getopt replacement. > > * Consider copying BSD's non-standard optreset. I like the idea of a distinct > variable for use in the reset. I am not sure because we will have different reset mechanism if we want to provide the current semantic and we will need to define optreset interaction with GNU extensions (which is essentially which adds optind complexity here). We can also rewrite how optind interacts with GNU extensions and define optreset would be canonical way, but it will also adds more code complexity and probably requires a compat symbol. > > * Consider sending a patch to the linux man pages project to remove the > note regarding glibc's undocumented method for rescanning. I still prefer if we document more properly what are GLIBC behaviour regarding non specified POSIX semantic.
It's not just test cases. Some GNU programs set optind=0 and expect this to cause getopt_long to rescan. The Coreutils programs env, nice, and stty all do this. This suggests that getopt_long should keep this behavior and its documentation, and that for consistency's sake getopt should also keep doing it.
On 05/29/2018 06:45 PM, Paul Eggert wrote: > It's not just test cases. Some GNU programs set optind=0 and expect > this to cause getopt_long to rescan. The Coreutils programs env, > nice, and stty all do this. This suggests that getopt_long should > keep this behavior and its documentation, and that for consistency's > sake getopt should also keep doing it. Excellent, this is all proof conclusive that we need: * Document optind=0 setting. * Add a testcase to cover this specifically and call it out. Adhemerval can you post a v2 with a test case and I'll review that? Cheers, Carlos.
diff --git a/manual/getopt.texi b/manual/getopt.texi index 5485fc4..a4f6366 100644 --- a/manual/getopt.texi +++ b/manual/getopt.texi @@ -45,6 +45,12 @@ of the @var{argv} array to be processed. Once @code{getopt} has found all of the option arguments, you can use this variable to determine where the remaining non-option arguments begin. The initial value of this variable is @code{1}. + +Resetting the variable value to @code{0} forces the invocation of an +internal initialization routine and it is used mainly when a program +wants to rescan the same vector more than once. It also should be used +to scan multiple argument vectors or if @code{POSIXLY_CORRECT} is changed +between scans. @end deftypevar @deftypevar {char *} optarg