Message ID | 20190521184859.29249-1-adhemerval.zanella@linaro.org |
---|---|
State | New |
Headers | show |
Series | RFC: Add posix_spawn_file_actions_closefrom | expand |
* Adhemerval Zanella: > diff --git a/posix/spawn.h b/posix/spawn.h > index 471dbea022..095ee67a26 100644 > --- a/posix/spawn.h > +++ b/posix/spawn.h > @@ -213,6 +213,12 @@ extern int posix_spawn_file_actions_addchdir_np (posix_spawn_file_actions_t * > extern int posix_spawn_file_actions_addfchdir_np (posix_spawn_file_actions_t *, > int __fd) > __THROW __nonnull ((1)); > + > +/* Add an action to close all file descriptor greater than FROM durint > + spawn. This affects the subsequent file actions. */ > +extern int posix_spawn_file_actions_addclosefrom_np (posix_spawn_file_actions_t *, > + int __from); > + Missing __THROW __nonnull ((11)), I think. > diff --git a/posix/spawn_faction_addclosefrom.c b/posix/spawn_faction_addclosefrom.c > new file mode 100644 > index 0000000000..24e6c62dbc > --- /dev/null > +++ b/posix/spawn_faction_addclosefrom.c Can we implement this on Hurd fairly quickly? Otherwise, this should probable be a stub that returns ENOSYS. > diff --git a/posix/tst-spawn5.c b/posix/tst-spawn5.c > new file mode 100644 > index 0000000000..7ed03810e4 > +static int > +do_test (int argc, char *argv[]) > +{ > + /* We must have one or four parameters left if called initially: > + + path for ld.so optional > + + "--library-path" optional > + + the library path optional > + + the application name > + > + Plus one parameter to indicate which test to execute through > + re-execution. > + > + So for default usage without --enable-hardcoded-path-in-tests, it > + will be called initially with 5 arguments and later with 2. For > + --enable-hardcoded-path-in-tests it will be called with 2 arguments > + regardless. */ You could run this as a container test, so that this would become unnecessary. (Or link statically.) The test doesn't exercise the gaps case. > diff --git a/sysdeps/posix/opendir.c b/sysdeps/posix/opendir.c > index 8e3ba480b7..b576b3233d 100644 > --- a/sysdeps/posix/opendir.c > +++ b/sysdeps/posix/opendir.c > +DIR * > +__alloc_dir (int fd, bool close_fd, const struct stat64 *statp, > + void *buffer, size_t size) > { > /* We have to set the close-on-exit flag if the user provided the > file descriptor. */ > @@ -114,31 +147,44 @@ __alloc_dir (int fd, bool close_fd, int flags, const struct stat64 *statp) > allocation = MIN (MAX ((size_t) statp->st_blksize, default_allocation), > MAX_DIR_BUFFER_SIZE); > #endif > - > - DIR *dirp = (DIR *) malloc (sizeof (DIR) + allocation); > - if (dirp == NULL) > + DIR *dirp; > + if (!buffer) Style issue, I think: buffer != NULL. Otherwise, this is a fairly nice hack to get an async-signal-safe readdir. But I'm not sure that it's what we need here. See below. > diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c > index c1abf3f960..fe0ff95825 100644 > --- a/sysdeps/unix/sysv/linux/spawni.c > +++ b/sysdeps/unix/sysv/linux/spawni.c > +/* Close all file descriptor up to FROM by interacting /proc/self/fd. > + Any failure should */ > +static bool > +spawn_closefrom (int from) > +{ > + /* Increasing the buffer size incurs in less getdents syscalls from > + readdir, however it would require more stack size to be allocated > + on __spawnix. */ > + char buffer[sizeof (struct __dirstream) + 2 * sizeof (struct dirent)]; We could allocate this on the heap, in the parent. Maybe we could opendir in the parent, and play with the underlying descriptor in the child? Then you wouldn't need to add __opendir_inplace at all. Given that we know what our implementation looks like, this should be fairly safe. > + DIR *dp; > + if ((dp = __opendir_inplace ("/proc/self/fd", buffer, sizeof buffer)) > + == NULL) > + return false; This could check for ENFILE/EMFILE/ENOMEM and try closing descriptors directly in case of that error, to make room for the new descriptor. But perhaps that's not worth the complexity. > + bool ret = true; > + struct dirent *dirp; > + while ((dirp = __readdir (dp)) != NULL) Should this be __readdir64? > + { > + if (dirp->d_name[0] == '.') > + continue; > + > + char *endptr; > + long int fd = strtol (dirp->d_name, &endptr, 10); > + if (*endptr != '\0' || fd < 0 || fd > INT_MAX) > + { > + ret = false; > + break; > + } > + > + if (fd == dirfd (dp) || fd < from) > + continue; > + > + __close (fd); > + } > + __closedir (dp); > + > + return ret; > +} I'm not sure if this is entirely correct. If we close some descriptors, and then readdir calls getdents64, what will the kernel return? Will there be a gap in the descriptor list? (Curiously, it's the same issue we have the the fork handler list. 8-) If you share my concerns, maybe we should call getdents64 directly and parse the buffer? And restart after the buffer has been exhausted? (I have a patch with such parser functions and planned to submit it after the getdents64 syscall wrapper went in.) > + > /* Function used in the clone call to setup the signals mask, posix_spawn > attributes, and file actions. It run on its own stack (provided by the > posix_spawn call). */ > @@ -280,6 +321,11 @@ __spawni_child (void *arguments) > if (__fchdir (action->action.fchdir_action.fd) != 0) > goto fail; > break; > + > + case spawn_do_closefrom: > + if (!spawn_closefrom (action->action.closefrom_action.from)) > + goto fail; > + break; > } > } The Hurd and generic implementations will need to be updated to handle spawn_do_closefrom as well, otherwise they will fail to build. Thanks, Florian
On 24/05/2019 08:34, Florian Weimer wrote: > * Adhemerval Zanella: > >> diff --git a/posix/spawn.h b/posix/spawn.h >> index 471dbea022..095ee67a26 100644 >> --- a/posix/spawn.h >> +++ b/posix/spawn.h >> @@ -213,6 +213,12 @@ extern int posix_spawn_file_actions_addchdir_np (posix_spawn_file_actions_t * >> extern int posix_spawn_file_actions_addfchdir_np (posix_spawn_file_actions_t *, >> int __fd) >> __THROW __nonnull ((1)); >> + >> +/* Add an action to close all file descriptor greater than FROM durint >> + spawn. This affects the subsequent file actions. */ >> +extern int posix_spawn_file_actions_addclosefrom_np (posix_spawn_file_actions_t *, >> + int __from); >> + > > Missing __THROW __nonnull ((11)), I think. Ack. > >> diff --git a/posix/spawn_faction_addclosefrom.c b/posix/spawn_faction_addclosefrom.c >> new file mode 100644 >> index 0000000000..24e6c62dbc >> --- /dev/null >> +++ b/posix/spawn_faction_addclosefrom.c > > Can we implement this on Hurd fairly quickly? Otherwise, this should > probable be a stub that returns ENOSYS. Not sure, I will need to check how to accomplish it on Hurd. I think it would be better to add a ENOSYS wrapper for now. > >> diff --git a/posix/tst-spawn5.c b/posix/tst-spawn5.c >> new file mode 100644 >> index 0000000000..7ed03810e4 > >> +static int >> +do_test (int argc, char *argv[]) >> +{ >> + /* We must have one or four parameters left if called initially: >> + + path for ld.so optional >> + + "--library-path" optional >> + + the library path optional >> + + the application name >> + >> + Plus one parameter to indicate which test to execute through >> + re-execution. >> + >> + So for default usage without --enable-hardcoded-path-in-tests, it >> + will be called initially with 5 arguments and later with 2. For >> + --enable-hardcoded-path-in-tests it will be called with 2 arguments >> + regardless. */ > > You could run this as a container test, so that this would become > unnecessary. (Or link statically.) Right, I think maybe container could be an option. Another possibility is to add some API to make this process re-spawn less bloated. > > The test doesn't exercise the gaps case. Do you mean gaps in file descriptor initial set before posix_spawn? > >> diff --git a/sysdeps/posix/opendir.c b/sysdeps/posix/opendir.c >> index 8e3ba480b7..b576b3233d 100644 >> --- a/sysdeps/posix/opendir.c >> +++ b/sysdeps/posix/opendir.c > >> +DIR * >> +__alloc_dir (int fd, bool close_fd, const struct stat64 *statp, >> + void *buffer, size_t size) >> { >> /* We have to set the close-on-exit flag if the user provided the >> file descriptor. */ >> @@ -114,31 +147,44 @@ __alloc_dir (int fd, bool close_fd, int flags, const struct stat64 *statp) >> allocation = MIN (MAX ((size_t) statp->st_blksize, default_allocation), >> MAX_DIR_BUFFER_SIZE); >> #endif >> - >> - DIR *dirp = (DIR *) malloc (sizeof (DIR) + allocation); >> - if (dirp == NULL) >> + DIR *dirp; >> + if (!buffer) > > Style issue, I think: buffer != NULL. > > Otherwise, this is a fairly nice hack to get an async-signal-safe > readdir. But I'm not sure that it's what we need here. See below. > >> diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c >> index c1abf3f960..fe0ff95825 100644 >> --- a/sysdeps/unix/sysv/linux/spawni.c >> +++ b/sysdeps/unix/sysv/linux/spawni.c > >> +/* Close all file descriptor up to FROM by interacting /proc/self/fd. >> + Any failure should */ >> +static bool >> +spawn_closefrom (int from) >> +{ >> + /* Increasing the buffer size incurs in less getdents syscalls from >> + readdir, however it would require more stack size to be allocated >> + on __spawnix. */ >> + char buffer[sizeof (struct __dirstream) + 2 * sizeof (struct dirent)]; > > We could allocate this on the heap, in the parent. Maybe we could > opendir in the parent, and play with the underlying descriptor in the > child? Then you wouldn't need to add __opendir_inplace at all. Given > that we know what our implementation looks like, this should be fairly > safe. I don't have a strong opinion here, it would add some complexity on parent helper which would need to transverse all file actions, call opendir, and deallocate after helper process returns. My idea is to keep the required logic more in place, so its more obvious where things are initiated. > >> + DIR *dp; >> + if ((dp = __opendir_inplace ("/proc/self/fd", buffer, sizeof buffer)) >> + == NULL) >> + return false; > > This could check for ENFILE/EMFILE/ENOMEM and try closing descriptors > directly in case of that error, to make room for the new descriptor. > But perhaps that's not worth the complexity. Hum, this could be an enhancement indeed. However the main issue is to find which is the lower opened file descriptor greater than FROM without polling /proc/self/fd or by using close with random file descriptors. > >> + bool ret = true; >> + struct dirent *dirp; >> + while ((dirp = __readdir (dp)) != NULL) > > Should this be __readdir64? It should indeed, we should definitely move away from non-LFS calls. > >> + { >> + if (dirp->d_name[0] == '.') >> + continue; >> + >> + char *endptr; >> + long int fd = strtol (dirp->d_name, &endptr, 10); >> + if (*endptr != '\0' || fd < 0 || fd > INT_MAX) >> + { >> + ret = false; >> + break; >> + } >> + >> + if (fd == dirfd (dp) || fd < from) >> + continue; >> + >> + __close (fd); >> + } >> + __closedir (dp); >> + >> + return ret; >> +} > > I'm not sure if this is entirely correct. If we close some descriptors, > and then readdir calls getdents64, what will the kernel return? Will > there be a gap in the descriptor list? (Curiously, it's the same issue > we have the the fork handler list. 8-) It does not seems to be case with my experiments. I hack opendir to allocate the minimum workable buffer (__dirstream plus a struct dirent, about 40 bytes on x86_64) to force each readdir to call getdents. A simple testcase shows: -- #include <unistd.h> #include <sys/stat.h> #include <fcntl.h> #include <assert.h> #include <dirent.h> #include <limits.h> int main (int argc, char *argv[]) { int fd1 = open ("/dev/null", O_WRONLY); assert (fd1 != -1); int fd2 = open ("/dev/null", O_WRONLY); assert (fd2 != -1); int fd3 = open ("/dev/null", O_WRONLY); assert (fd3 != -1); int fd4 = open ("/dev/null", O_WRONLY); assert (fd4 != -1); printf ("fd1=%d fd2=%d fd3=%d fd4=%d\n", fd1, fd2, fd3, fd4); system ("ls /proc/self/fd"); int from = fd1 - 1; DIR *dp = opendir ("/proc/self/fd"); assert (dp != NULL); struct dirent *dirp; while ((dirp = readdir (dp)) != NULL) { if (dirp->d_name[0] == '.') continue; char *endptr; long int fd = strtol (dirp->d_name, &endptr, 10); if (*endptr != '\0' || fd < 0 || fd > INT_MAX) break; if (fd == dirfd (dp) || fd < from) continue; close (fd); system ("ls /proc/self/fd"); } closedir (dp); return 0; } $ ./testrun.sh /tmp/test fd1=3 fd2=4 fd3=5 fd4=6 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 0 1 2 4 5 6 0 1 2 5 6 0 1 2 6 0 1 2 $ ./testrun.sh --tool=strace /tmp/test 2>&1 | grep getdents64 getdents64(7, /* 1 entries */, 40) = 24 getdents64(7, /* 1 entries */, 40) = 24 getdents64(7, /* 1 entries */, 40) = 24 getdents64(7, /* 1 entries */, 40) = 24 getdents64(7, /* 1 entries */, 40) = 24 getdents64(7, /* 1 entries */, 40) = 24 getdents64(7, /* 1 entries */, 40) = 24 getdents64(7, /* 1 entries */, 40) = 24 getdents64(7, /* 1 entries */, 40) = 24 getdents64(7, /* 1 entries */, 40) = 24 getdents64(7, /* 0 entries */, 40) = 0 -- I tried also to filter out an additional file descriptor by setting the condition: if (fd == dirfd (dp) || fd < from || fd == fd2) continue; And the result seems what is expected: $ ./testrun.sh /tmp/test fd1=3 fd2=4 fd3=5 fd4=6 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 0 1 2 4 5 6 0 1 2 4 6 0 1 2 4 I haven't actually read the kernel source to check what exactly getdents does for /proc/self/fd updates neither if it handles it differently. > > If you share my concerns, maybe we should call getdents64 directly and > parse the buffer? And restart after the buffer has been exhausted? > (I have a patch with such parser functions and planned to submit it > after the getdents64 syscall wrapper went in.) I though about calling getdents64 directly, but I think we would just ended up replicating readdir code on closefrom. The __opendir_inplace has the extra advantage to work with readdir. > >> + >> /* Function used in the clone call to setup the signals mask, posix_spawn >> attributes, and file actions. It run on its own stack (provided by the >> posix_spawn call). */ >> @@ -280,6 +321,11 @@ __spawni_child (void *arguments) >> if (__fchdir (action->action.fchdir_action.fd) != 0) >> goto fail; >> break; >> + >> + case spawn_do_closefrom: >> + if (!spawn_closefrom (action->action.closefrom_action.from)) >> + goto fail; >> + break; >> } >> } > > The Hurd and generic implementations will need to be updated to handle > spawn_do_closefrom as well, otherwise they will fail to build. Ack.
* Adhemerval Zanella: >> The test doesn't exercise the gaps case. > > Do you mean gaps in file descriptor initial set before posix_spawn? Yes, where the directory descriptor is in the middle of the closefrom range. >>> +/* Close all file descriptor up to FROM by interacting /proc/self/fd. >>> + Any failure should */ >>> +static bool >>> +spawn_closefrom (int from) >>> +{ >>> + /* Increasing the buffer size incurs in less getdents syscalls from >>> + readdir, however it would require more stack size to be allocated >>> + on __spawnix. */ >>> + char buffer[sizeof (struct __dirstream) + 2 * sizeof (struct dirent)]; >> >> We could allocate this on the heap, in the parent. Maybe we could >> opendir in the parent, and play with the underlying descriptor in the >> child? Then you wouldn't need to add __opendir_inplace at all. Given >> that we know what our implementation looks like, this should be fairly >> safe. > > I don't have a strong opinion here, it would add some complexity on parent > helper which would need to transverse all file actions, call opendir, and > deallocate after helper process returns. My idea is to keep the required > logic more in place, so its more obvious where things are initiated. You could turn one of the padding elements in posix_spawn_file_actions_t into a flag and have posix_spawn_file_actions_addclosefrom_np set the flag. Then the second iteration isn't necessary. >>> + DIR *dp; >>> + if ((dp = __opendir_inplace ("/proc/self/fd", buffer, sizeof buffer)) >>> + == NULL) >>> + return false; >> >> This could check for ENFILE/EMFILE/ENOMEM and try closing descriptors >> directly in case of that error, to make room for the new descriptor. >> But perhaps that's not worth the complexity. > > Hum, this could be an enhancement indeed. However the main issue is > to find which is the lower opened file descriptor greater than FROM > without polling /proc/self/fd or by using close with random file > descriptors. You can do close (from), close (from + 1), etc., up to a certain limit, and retry if one of the close calls doesn't return EBADF. The magic limit is needed in case the closefrom does not overlap with any file descriptors. >>> + { >>> + if (dirp->d_name[0] == '.') >>> + continue; >>> + >>> + char *endptr; >>> + long int fd = strtol (dirp->d_name, &endptr, 10); >>> + if (*endptr != '\0' || fd < 0 || fd > INT_MAX) >>> + { >>> + ret = false; >>> + break; >>> + } >>> + >>> + if (fd == dirfd (dp) || fd < from) >>> + continue; >>> + >>> + __close (fd); >>> + } >>> + __closedir (dp); >>> + >>> + return ret; >>> +} >> >> I'm not sure if this is entirely correct. If we close some descriptors, >> and then readdir calls getdents64, what will the kernel return? Will >> there be a gap in the descriptor list? (Curiously, it's the same issue >> we have the the fork handler list. 8-) > > It does not seems to be case with my experiments. I hack opendir to > allocate the minimum workable buffer (__dirstream plus a > struct dirent, about 40 bytes on x86_64) to force each readdir to > call getdents. A simple testcase shows: It's still looks very implementation-defined to me. proc_readfd_common does this: for (fd = ctx->pos - 2; fd < files_fdtable(files)->max_fds; fd++, ctx->pos++) { And I think ctx->pos somehow corresponds to d_off. But I don't see a 1:1 correspondence between descriptors and offsets. I wonder whether the single-entry case is indeed the worst-possible test case for this. Thanks, Florian
On 24/05/2019 11:37, Florian Weimer wrote: > * Adhemerval Zanella: > >>> The test doesn't exercise the gaps case. >> >> Do you mean gaps in file descriptor initial set before posix_spawn? > > Yes, where the directory descriptor is in the middle of the closefrom > range. Ok, I would add a test to check for it. > >>>> +/* Close all file descriptor up to FROM by interacting /proc/self/fd. >>>> + Any failure should */ >>>> +static bool >>>> +spawn_closefrom (int from) >>>> +{ >>>> + /* Increasing the buffer size incurs in less getdents syscalls from >>>> + readdir, however it would require more stack size to be allocated >>>> + on __spawnix. */ >>>> + char buffer[sizeof (struct __dirstream) + 2 * sizeof (struct dirent)]; >>> >>> We could allocate this on the heap, in the parent. Maybe we could >>> opendir in the parent, and play with the underlying descriptor in the >>> child? Then you wouldn't need to add __opendir_inplace at all. Given >>> that we know what our implementation looks like, this should be fairly >>> safe. >> >> I don't have a strong opinion here, it would add some complexity on parent >> helper which would need to transverse all file actions, call opendir, and >> deallocate after helper process returns. My idea is to keep the required >> logic more in place, so its more obvious where things are initiated. > > You could turn one of the padding elements in posix_spawn_file_actions_t > into a flag and have posix_spawn_file_actions_addclosefrom_np set the > flag. Then the second iteration isn't necessary. > >>>> + DIR *dp; >>>> + if ((dp = __opendir_inplace ("/proc/self/fd", buffer, sizeof buffer)) >>>> + == NULL) >>>> + return false; >>> >>> This could check for ENFILE/EMFILE/ENOMEM and try closing descriptors >>> directly in case of that error, to make room for the new descriptor. >>> But perhaps that's not worth the complexity. >> >> Hum, this could be an enhancement indeed. However the main issue is >> to find which is the lower opened file descriptor greater than FROM >> without polling /proc/self/fd or by using close with random file >> descriptors. > > You can do close (from), close (from + 1), etc., up to a certain limit, > and retry if one of the close calls doesn't return EBADF. The magic > limit is needed in case the closefrom does not overlap with any file > descriptors. Yeah, this is exactly the random close calls I would like to avoid. But I also don't see a better option. > >>>> + { >>>> + if (dirp->d_name[0] == '.') >>>> + continue; >>>> + >>>> + char *endptr; >>>> + long int fd = strtol (dirp->d_name, &endptr, 10); >>>> + if (*endptr != '\0' || fd < 0 || fd > INT_MAX) >>>> + { >>>> + ret = false; >>>> + break; >>>> + } >>>> + >>>> + if (fd == dirfd (dp) || fd < from) >>>> + continue; >>>> + >>>> + __close (fd); >>>> + } >>>> + __closedir (dp); >>>> + >>>> + return ret; >>>> +} >>> >>> I'm not sure if this is entirely correct. If we close some descriptors, >>> and then readdir calls getdents64, what will the kernel return? Will >>> there be a gap in the descriptor list? (Curiously, it's the same issue >>> we have the the fork handler list. 8-) >> >> It does not seems to be case with my experiments. I hack opendir to >> allocate the minimum workable buffer (__dirstream plus a >> struct dirent, about 40 bytes on x86_64) to force each readdir to >> call getdents. A simple testcase shows: > > It's still looks very implementation-defined to me. proc_readfd_common > does this: > > for (fd = ctx->pos - 2; > fd < files_fdtable(files)->max_fds; > fd++, ctx->pos++) { > > And I think ctx->pos somehow corresponds to d_off. But I don't see a > 1:1 correspondence between descriptors and offsets. I wonder whether > the single-entry case is indeed the worst-possible test case for this. > I will check with different permutations by changing the opendir buffer and a file descriptor set with different set of gaps.
On 24/05/2019 11:55, Adhemerval Zanella wrote: > > > On 24/05/2019 11:37, Florian Weimer wrote: >> * Adhemerval Zanella: >> >>>> The test doesn't exercise the gaps case. >>> >>> Do you mean gaps in file descriptor initial set before posix_spawn? >> >> Yes, where the directory descriptor is in the middle of the closefrom >> range. > > Ok, I would add a test to check for it. > >> >>>>> +/* Close all file descriptor up to FROM by interacting /proc/self/fd. >>>>> + Any failure should */ >>>>> +static bool >>>>> +spawn_closefrom (int from) >>>>> +{ >>>>> + /* Increasing the buffer size incurs in less getdents syscalls from >>>>> + readdir, however it would require more stack size to be allocated >>>>> + on __spawnix. */ >>>>> + char buffer[sizeof (struct __dirstream) + 2 * sizeof (struct dirent)]; >>>> >>>> We could allocate this on the heap, in the parent. Maybe we could >>>> opendir in the parent, and play with the underlying descriptor in the >>>> child? Then you wouldn't need to add __opendir_inplace at all. Given >>>> that we know what our implementation looks like, this should be fairly >>>> safe. >>> >>> I don't have a strong opinion here, it would add some complexity on parent >>> helper which would need to transverse all file actions, call opendir, and >>> deallocate after helper process returns. My idea is to keep the required >>> logic more in place, so its more obvious where things are initiated. >> >> You could turn one of the padding elements in posix_spawn_file_actions_t >> into a flag and have posix_spawn_file_actions_addclosefrom_np set the >> flag. Then the second iteration isn't necessary. Another possible advantage of using a inplace buffer for opendir is we can tune memory usage based on expected filename entries. For /proc/self/fds the names are expected for be at most sizeof (int) * 3 + 1, so with: enum { dirent_base_size = offsetof (struct dirent, d_name), d_name_max_length = sizeof (int) * 3 + 1, dirent_max_size = ALIGN_UP (dirent_base_size + d_name_max_length, sizeof (long)) }; char buffer[sizeof (struct __dirstream) + 10 * dirent_max_size] We can obtain 10 entries for each getdents calls at cost of about just 432 stack size for spawn_closefrom instead of allocate 4*BUFSIZ/BUFSIZ (32728 in most cases). >> >>>>> + DIR *dp; >>>>> + if ((dp = __opendir_inplace ("/proc/self/fd", buffer, sizeof buffer)) >>>>> + == NULL) >>>>> + return false; >>>> >>>> This could check for ENFILE/EMFILE/ENOMEM and try closing descriptors >>>> directly in case of that error, to make room for the new descriptor. >>>> But perhaps that's not worth the complexity. >>> >>> Hum, this could be an enhancement indeed. However the main issue is >>> to find which is the lower opened file descriptor greater than FROM >>> without polling /proc/self/fd or by using close with random file >>> descriptors. >> >> You can do close (from), close (from + 1), etc., up to a certain limit, >> and retry if one of the close calls doesn't return EBADF. The magic >> limit is needed in case the closefrom does not overlap with any file >> descriptors. > > Yeah, this is exactly the random close calls I would like to avoid. But > I also don't see a better option. > >> >>>>> + { >>>>> + if (dirp->d_name[0] == '.') >>>>> + continue; >>>>> + >>>>> + char *endptr; >>>>> + long int fd = strtol (dirp->d_name, &endptr, 10); >>>>> + if (*endptr != '\0' || fd < 0 || fd > INT_MAX) >>>>> + { >>>>> + ret = false; >>>>> + break; >>>>> + } >>>>> + >>>>> + if (fd == dirfd (dp) || fd < from) >>>>> + continue; >>>>> + >>>>> + __close (fd); >>>>> + } >>>>> + __closedir (dp); >>>>> + >>>>> + return ret; >>>>> +} >>>> >>>> I'm not sure if this is entirely correct. If we close some descriptors, >>>> and then readdir calls getdents64, what will the kernel return? Will >>>> there be a gap in the descriptor list? (Curiously, it's the same issue >>>> we have the the fork handler list. 8-) >>> >>> It does not seems to be case with my experiments. I hack opendir to >>> allocate the minimum workable buffer (__dirstream plus a >>> struct dirent, about 40 bytes on x86_64) to force each readdir to >>> call getdents. A simple testcase shows: >> >> It's still looks very implementation-defined to me. proc_readfd_common >> does this: >> >> for (fd = ctx->pos - 2; >> fd < files_fdtable(files)->max_fds; >> fd++, ctx->pos++) { >> >> And I think ctx->pos somehow corresponds to d_off. But I don't see a >> 1:1 correspondence between descriptors and offsets. I wonder whether >> the single-entry case is indeed the worst-possible test case for this. >> > > I will check with different permutations by changing the opendir buffer > and a file descriptor set with different set of gaps. I checked a permutation of 3 tests: - Issue 10 open calls and call closefrom with minimum file descriptor value; - Issue 10 open calls, close first one, and call closefrom; - Issue 10 open calls, close first and last one, and call closefrom. With a buffer for getdents starting with just sizeof (struct __dirstream) plus ALIGN_UP (offsetof (struct dirent, d_name) + sizeof (int) * 3 + 1) up to sizeof (struct __dirstream) plus 10 * ALIGN_UP value. In the end the idea is to vary the maximum entry amount getdents will return from 1 to 10 (the number of files to test). I saw no issue, the opendir interaction close all the file without any leakage (I can send you the testcase to check if I get something wrong). And I think it would be expected behaviour imho: if we get all the file descriptor information with a getdents call, holes does not really matter. If holes does exist we have two scenarios: a hole in the obtained buffer or a hole in a subsequent call. Former should be handled as first case, while latter kernel removes it from next getdents call (as I am seeing in my experiments).
diff --git a/include/dirent.h b/include/dirent.h index 400835eefe..7642fdaba0 100644 --- a/include/dirent.h +++ b/include/dirent.h @@ -16,6 +16,8 @@ struct scandir_cancel_struct /* Now define the internal interfaces. */ extern DIR *__opendir (const char *__name) attribute_hidden; +extern DIR *__opendir_inplace (const char *__name, void *buffer, size_t size) + attribute_hidden; extern DIR *__opendirat (int dfd, const char *__name) attribute_hidden; extern DIR *__fdopendir (int __fd) attribute_hidden; extern int __closedir (DIR *__dirp) attribute_hidden; @@ -44,8 +46,8 @@ extern int __alphasort64 (const struct dirent64 **a, const struct dirent64 **b) extern int __versionsort64 (const struct dirent64 **a, const struct dirent64 **b) __attribute_pure__; -extern DIR *__alloc_dir (int fd, bool close_fd, int flags, - const struct stat64 *statp) attribute_hidden; +extern DIR *__alloc_dir (int fd, bool close_fd, const struct stat64 *statp, + void *buffer, size_t size) attribute_hidden; extern __typeof (rewinddir) __rewinddir; extern __typeof (seekdir) __seekdir; extern __typeof (dirfd) __dirfd; diff --git a/posix/Makefile b/posix/Makefile index 8ac6743ad7..1ac41ad85a 100644 --- a/posix/Makefile +++ b/posix/Makefile @@ -57,6 +57,7 @@ routines := \ spawn_faction_init spawn_faction_destroy spawn_faction_addclose \ spawn_faction_addopen spawn_faction_adddup2 spawn_valid_fd \ spawn_faction_addchdir spawn_faction_addfchdir \ + spawn_faction_addclosefrom \ spawnattr_init spawnattr_destroy \ spawnattr_getdefault spawnattr_setdefault \ spawnattr_getflags spawnattr_setflags \ @@ -100,7 +101,8 @@ tests := test-errno tstgetopt testfnm runtests runptests \ tst-posix_fadvise tst-posix_fadvise64 \ tst-sysconf-empty-chroot tst-glob_symlinks tst-fexecve \ tst-glob-tilde test-ssize-max tst-spawn4 bug-regex37 \ - bug-regex38 tst-regcomp-truncated tst-spawn-chdir + bug-regex38 tst-regcomp-truncated tst-spawn-chdir \ + tst-spawn5 tests-internal := bug-regex5 bug-regex20 bug-regex33 \ tst-rfc3484 tst-rfc3484-2 tst-rfc3484-3 \ tst-glob_lstat_compat tst-spawn4-compat @@ -254,6 +256,7 @@ tst-exec-static-ARGS = $(tst-exec-ARGS) tst-execvpe5-ARGS = -- $(host-test-program-cmd) tst-spawn-ARGS = -- $(host-test-program-cmd) tst-spawn-static-ARGS = $(tst-spawn-ARGS) +tst-spawn5-ARGS = -- $(host-test-program-cmd) tst-dir-ARGS = `pwd` `cd $(common-objdir)/$(subdir); pwd` `cd $(common-objdir); pwd` $(objpfx)tst-dir tst-chmod-ARGS = $(objdir) tst-vfork3-ARGS = --test-dir=$(objpfx) diff --git a/posix/Versions b/posix/Versions index 7d06a6d0c0..34fe242072 100644 --- a/posix/Versions +++ b/posix/Versions @@ -144,6 +144,7 @@ libc { GLIBC_2.29 { posix_spawn_file_actions_addchdir_np; posix_spawn_file_actions_addfchdir_np; + posix_spawn_file_actions_addclosefrom_np; } GLIBC_2.30 { } diff --git a/posix/spawn.h b/posix/spawn.h index 471dbea022..095ee67a26 100644 --- a/posix/spawn.h +++ b/posix/spawn.h @@ -213,6 +213,12 @@ extern int posix_spawn_file_actions_addchdir_np (posix_spawn_file_actions_t * extern int posix_spawn_file_actions_addfchdir_np (posix_spawn_file_actions_t *, int __fd) __THROW __nonnull ((1)); + +/* Add an action to close all file descriptor greater than FROM durint + spawn. This affects the subsequent file actions. */ +extern int posix_spawn_file_actions_addclosefrom_np (posix_spawn_file_actions_t *, + int __from); + #endif __END_DECLS diff --git a/posix/spawn_faction_addclosefrom.c b/posix/spawn_faction_addclosefrom.c new file mode 100644 index 0000000000..24e6c62dbc --- /dev/null +++ b/posix/spawn_faction_addclosefrom.c @@ -0,0 +1,51 @@ +/* Add a closefrom to a file action list for posix_spawn. + Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#include <errno.h> +#include <spawn.h> +#include <unistd.h> + +#include "spawn_int.h" + +int +__posix_spawn_file_actions_addclosefrom (posix_spawn_file_actions_t + *file_actions, int from) +{ + struct __spawn_action *rec; + + if (!__spawn_valid_fd (from)) + return EBADF; + + /* Allocate more memory if needed. */ + if (file_actions->__used == file_actions->__allocated + && __posix_spawn_file_actions_realloc (file_actions) != 0) + /* This can only mean we ran out of memory. */ + return ENOMEM; + + /* Add the new value. */ + rec = &file_actions->__actions[file_actions->__used]; + rec->tag = spawn_do_closefrom; + rec->action.closefrom_action.from = from; + + /* Account for the new entry. */ + ++file_actions->__used; + + return 0; +} +weak_alias (__posix_spawn_file_actions_addclosefrom, + posix_spawn_file_actions_addclosefrom_np) diff --git a/posix/spawn_faction_destroy.c b/posix/spawn_faction_destroy.c index 51fab13585..b45d1cd889 100644 --- a/posix/spawn_faction_destroy.c +++ b/posix/spawn_faction_destroy.c @@ -39,6 +39,7 @@ __posix_spawn_file_actions_destroy (posix_spawn_file_actions_t *file_actions) case spawn_do_close: case spawn_do_dup2: case spawn_do_fchdir: + case spawn_do_closefrom: /* No cleanup required. */ break; } diff --git a/posix/spawn_int.h b/posix/spawn_int.h index 93b7597f90..2abefe0f57 100644 --- a/posix/spawn_int.h +++ b/posix/spawn_int.h @@ -32,6 +32,7 @@ struct __spawn_action spawn_do_open, spawn_do_chdir, spawn_do_fchdir, + spawn_do_closefrom } tag; union @@ -60,6 +61,10 @@ struct __spawn_action { int fd; } fchdir_action; + struct + { + int from; + } closefrom_action; } action; }; diff --git a/posix/tst-spawn5.c b/posix/tst-spawn5.c new file mode 100644 index 0000000000..7ed03810e4 --- /dev/null +++ b/posix/tst-spawn5.c @@ -0,0 +1,176 @@ +/* Tests for posix_spawn signal handling. + Copyright (C) 2019 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <http://www.gnu.org/licenses/>. */ + +#include <stdio.h> +#include <stdlib.h> +#include <getopt.h> +#include <spawn.h> +#include <fcntl.h> +#include <sys/wait.h> +#include <dirent.h> +#include <stdbool.h> +#include <errno.h> +#include <limits.h> + +#include <support/check.h> +#include <support/xunistd.h> +#include <support/support.h> +#include <array_length.h> + +/* Nonzero if the program gets called via `exec'. */ +static int restart; +#define CMDLINE_OPTIONS \ + { "restart", no_argument, &restart, 1 }, + +/* Number of lingering opened file descriptors to opened and checked on + the spawned process. */ +static const int num_fd_to_open = 10; + +/* Called on process re-execution. */ +static int +handle_restart (int from) +{ + DIR *fds = opendir ("/proc/self/fd"); + if (fds == NULL) + FAIL_EXIT1 ("opendir (\"/proc/self/fd\"): %m"); + + while (true) + { + errno = 0; + struct dirent64 *e = readdir64 (fds); + if (e == NULL) + { + if (errno != 0) + FAIL_EXIT1 ("readdir: %m"); + break; + } + + if (e->d_name[0] == '.') + continue; + + char *endptr; + long int fd = strtol (e->d_name, &endptr, 10); + if (*endptr != '\0' || fd < 0 || fd > INT_MAX) + FAIL_EXIT1 ("readdir: invalid file descriptor name: /proc/self/fd/%s", + e->d_name); + + /* Skip the descriptor which is used to enumerate the + descriptors. */ + if (fd == dirfd (fds)) + continue; + + struct stat64 st; + if (fstat64 (fd, &st) != 0) + FAIL_EXIT1 ("readdir: fstat64 (%ld) failed: %m", fd); + + if (fd >= from) + FAIL_EXIT1 ("error: fd (%ld) greater than from (%d)", fd, from); + } + + closedir (fds); + + return 0; +} + +/* Common argument used for process re-execution. */ +static char *initial_spargv[5]; +static size_t initial_spargv_size; + +/* Re-execute the test process with both '--direct', '--restart', and the + TEST (as integer value) as arguments. */ +static void +reexecute (int fd, const posix_spawn_file_actions_t *fa) +{ + char *spargv[8]; + int i; + + for (i = 0; i < initial_spargv_size; i++) + spargv[i] = initial_spargv[i]; + /* Three digits per byte plus null terminator. */ + char teststr[3 * sizeof (fd) + 1]; + snprintf (teststr, array_length (teststr), "%d", fd); + spargv[i++] = teststr; + spargv[i] = NULL; + TEST_VERIFY (i < 8); + + pid_t pid; + int status; + + TEST_COMPARE (posix_spawn (&pid, spargv[0], fa, NULL, spargv, environ), + 0); + TEST_COMPARE (xwaitpid (pid, &status, 0), pid); + TEST_VERIFY (WIFEXITED (status)); + TEST_VERIFY (!WIFSIGNALED (status)); + TEST_COMPARE (WEXITSTATUS (status), 0); +} + +static void +do_test_closefrom (void) +{ + int from = xopen ("/dev/null", O_WRONLY, 0); + for (int i = 1; i < num_fd_to_open; i++) + xopen ("/dev/null", O_WRONLY, 0); + + posix_spawn_file_actions_t fa; + /* posix_spawn_file_actions_init does not fail. */ + posix_spawn_file_actions_init (&fa); + + TEST_COMPARE (posix_spawn_file_actions_addclosefrom_np (&fa, from), 0); + + reexecute (from, &fa); +} + +static int +do_test (int argc, char *argv[]) +{ + /* We must have one or four parameters left if called initially: + + path for ld.so optional + + "--library-path" optional + + the library path optional + + the application name + + Plus one parameter to indicate which test to execute through + re-execution. + + So for default usage without --enable-hardcoded-path-in-tests, it + will be called initially with 5 arguments and later with 2. For + --enable-hardcoded-path-in-tests it will be called with 2 arguments + regardless. */ + + if (argc != (restart ? 2 : 5) && argc != 2) + FAIL_EXIT1 ("wrong number of arguments (%d)", argc); + + if (restart) + return handle_restart (atoi (argv[1])); + + { + int i; + for (i = 0; i < (argc == 5 ? 4 : 1); i++) + initial_spargv[i] = argv[i + 1]; + initial_spargv[i++] = (char *) "--direct"; + initial_spargv[i++] = (char *) "--restart"; + initial_spargv_size = i; + } + + do_test_closefrom (); + + return 0; +} + +#define TEST_FUNCTION_ARGV do_test +#include <support/test-driver.c> diff --git a/sysdeps/posix/closedir.c b/sysdeps/posix/closedir.c index fb7da86218..0e55659485 100644 --- a/sysdeps/posix/closedir.c +++ b/sysdeps/posix/closedir.c @@ -47,7 +47,8 @@ __closedir (DIR *dirp) __libc_lock_fini (dirp->lock); #endif - free ((void *) dirp); + if (dirp->allocated) + free ((void *) dirp); return __close_nocancel (fd); } diff --git a/sysdeps/posix/dirstream.h b/sysdeps/posix/dirstream.h index 109fcc1609..861932abd6 100644 --- a/sysdeps/posix/dirstream.h +++ b/sysdeps/posix/dirstream.h @@ -34,6 +34,7 @@ struct __dirstream __libc_lock_define (, lock) /* Mutex lock for this structure. */ size_t allocation; /* Space allocated for the block. */ + bool allocated; /* Space was allocated by opendir. */ size_t size; /* Total valid data in the block. */ size_t offset; /* Current offset into the block. */ diff --git a/sysdeps/posix/fdopendir.c b/sysdeps/posix/fdopendir.c index 4b4e218115..13fe3f14b1 100644 --- a/sysdeps/posix/fdopendir.c +++ b/sysdeps/posix/fdopendir.c @@ -47,6 +47,6 @@ __fdopendir (int fd) return NULL; } - return __alloc_dir (fd, false, flags, &statbuf); + return __alloc_dir (fd, false, &statbuf, NULL, 0); } weak_alias (__fdopendir, fdopendir) diff --git a/sysdeps/posix/opendir.c b/sysdeps/posix/opendir.c index 8e3ba480b7..b576b3233d 100644 --- a/sysdeps/posix/opendir.c +++ b/sysdeps/posix/opendir.c @@ -21,6 +21,8 @@ #include <stdio.h> /* For BUFSIZ. */ #include <sys/param.h> /* For MIN and MAX. */ +#include <dirstream.h> + #include <not-cancel.h> /* The st_blksize value of the directory is used as a hint for the @@ -46,27 +48,47 @@ invalid_name (const char *name) return false; } -static DIR * -opendir_tail (int fd) +static bool +opendir_tail_common (int fd, struct stat64 *statbuf) { if (__glibc_unlikely (fd < 0)) - return NULL; + return false; /* Now make sure this really is a directory and nothing changed since the `stat' call. The S_ISDIR check is superfluous if O_DIRECTORY works, but it's cheap and we need the stat call for st_blksize anyway. */ - struct stat64 statbuf; - if (__glibc_unlikely (__fxstat64 (_STAT_VER, fd, &statbuf) < 0)) - goto lose; - if (__glibc_unlikely (! S_ISDIR (statbuf.st_mode))) + if (__glibc_unlikely (__fxstat64 (_STAT_VER, fd, statbuf) < 0)) + { + __close_nocancel_nostatus (fd); + return false; + } + if (__glibc_unlikely (! S_ISDIR (statbuf->st_mode))) { __set_errno (ENOTDIR); - lose: __close_nocancel_nostatus (fd); - return NULL; + return false; } + return true; +} + +static DIR * +opendir_tail (int fd) +{ + struct stat64 statbuf; + if (!opendir_tail_common (fd, &statbuf)) + return NULL; + + return __alloc_dir (fd, true, &statbuf, NULL, 0); +} - return __alloc_dir (fd, true, 0, &statbuf); +static DIR * +opendir_tail_inplace (int fd, void *buffer, size_t size) +{ + struct stat64 statbuf; + if (!opendir_tail_common (fd, &statbuf)) + return NULL; + + return __alloc_dir (fd, true, &statbuf, buffer, size); } @@ -94,7 +116,18 @@ __opendir (const char *name) weak_alias (__opendir, opendir) DIR * -__alloc_dir (int fd, bool close_fd, int flags, const struct stat64 *statp) +__opendir_inplace (const char *name, void *buffer, size_t size) +{ + if (__glibc_unlikely (invalid_name (name))) + return NULL; + + return opendir_tail_inplace (__open_nocancel (name, opendir_oflags), + buffer, size); +} + +DIR * +__alloc_dir (int fd, bool close_fd, const struct stat64 *statp, + void *buffer, size_t size) { /* We have to set the close-on-exit flag if the user provided the file descriptor. */ @@ -114,31 +147,44 @@ __alloc_dir (int fd, bool close_fd, int flags, const struct stat64 *statp) allocation = MIN (MAX ((size_t) statp->st_blksize, default_allocation), MAX_DIR_BUFFER_SIZE); #endif - - DIR *dirp = (DIR *) malloc (sizeof (DIR) + allocation); - if (dirp == NULL) + DIR *dirp; + if (!buffer) { - allocation = small_allocation; - dirp = (DIR *) malloc (sizeof (DIR) + allocation); - + dirp = malloc (sizeof (DIR) + allocation); if (dirp == NULL) - lose: { - if (close_fd) + allocation = small_allocation; + dirp = (DIR *) malloc (sizeof (DIR) + allocation); + + if (dirp == NULL) { - int save_errno = errno; - __close_nocancel_nostatus (fd); - __set_errno (save_errno); + if (close_fd) + lose: + { + int save_errno = errno; + __close_nocancel_nostatus (fd); + __set_errno (save_errno); + } + return NULL; } - return NULL; } + dirp->allocated = true; + dirp->allocation = allocation; + } + else + { + dirp = buffer; + if (allocation < sizeof (struct __dirstream) + + sizeof (struct dirent)) + goto lose; + dirp->allocated = false; + dirp->allocation = size; } dirp->fd = fd; #if IS_IN (libc) __libc_lock_init (dirp->lock); #endif - dirp->allocation = allocation; dirp->size = 0; dirp->offset = 0; dirp->filepos = 0; diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c index c1abf3f960..fe0ff95825 100644 --- a/sysdeps/unix/sysv/linux/spawni.c +++ b/sysdeps/unix/sysv/linux/spawni.c @@ -31,6 +31,7 @@ #include <dl-sysdep.h> #include <libc-pointer-arith.h> #include <ldsodefs.h> +#include <dirent.h> #include "spawn_int.h" /* The Linux implementation of posix_spawn{p} uses the clone syscall directly @@ -114,6 +115,46 @@ maybe_script_execute (struct posix_spawn_args *args) } } +/* Close all file descriptor up to FROM by interacting /proc/self/fd. + Any failure should */ +static bool +spawn_closefrom (int from) +{ + /* Increasing the buffer size incurs in less getdents syscalls from + readdir, however it would require more stack size to be allocated + on __spawnix. */ + char buffer[sizeof (struct __dirstream) + 2 * sizeof (struct dirent)]; + + DIR *dp; + if ((dp = __opendir_inplace ("/proc/self/fd", buffer, sizeof buffer)) + == NULL) + return false; + + bool ret = true; + struct dirent *dirp; + while ((dirp = __readdir (dp)) != NULL) + { + if (dirp->d_name[0] == '.') + continue; + + char *endptr; + long int fd = strtol (dirp->d_name, &endptr, 10); + if (*endptr != '\0' || fd < 0 || fd > INT_MAX) + { + ret = false; + break; + } + + if (fd == dirfd (dp) || fd < from) + continue; + + __close (fd); + } + __closedir (dp); + + return ret; +} + /* Function used in the clone call to setup the signals mask, posix_spawn attributes, and file actions. It run on its own stack (provided by the posix_spawn call). */ @@ -280,6 +321,11 @@ __spawni_child (void *arguments) if (__fchdir (action->action.fchdir_action.fd) != 0) goto fail; break; + + case spawn_do_closefrom: + if (!spawn_closefrom (action->action.closefrom_action.from)) + goto fail; + break; } } } @@ -339,8 +385,12 @@ __spawnix (pid_t * pid, const char *file, int prot = (PROT_READ | PROT_WRITE | ((GL (dl_stack_flags) & PF_X) ? PROT_EXEC : 0)); + /* The __spawni_child uses about 768 on x86_64 (including the stack for + opendir_inplace), so add some extra space. */ + const size_t stack_slack = 1024; + /* Add a slack area for child's stack. */ - size_t argv_size = (argc * sizeof (void *)) + 512; + size_t argv_size = (argc * sizeof (void *)) + stack_slack; /* We need at least a few pages in case the compiler's stack checking is enabled. In some configs, it is known to use at least 24KiB. We use 32KiB to be "safe" from anything the compiler might do. Besides, the