mbox series

[v2,0/3] introduce PIDFD_SELF* sentinels

Message ID cover.1728643714.git.lorenzo.stoakes@oracle.com
Headers show
Series introduce PIDFD_SELF* sentinels | expand

Message

Lorenzo Stoakes Oct. 11, 2024, 11:05 a.m. UTC
If you wish to utilise a pidfd interface to refer to the current process or
thread it is rather cumbersome, requiring something like:

	int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);

	...

	close(pidfd);

Or the equivalent call opening /proc/self. It is more convenient to use a
sentinel value to indicate to an interface that accepts a pidfd that we
simply wish to refer to the current process thread.

This series introduces sentinels for this purposes which can be passed as
the pidfd in this instance rather than having to establish a dummy fd for
this purpose.

It is useful to refer to both the current thread from the userland's
perspective for which we use PIDFD_SELF, and the current process from the
userland's perspective, for which we use PIDFD_SELF_PROCESS.

There is unfortunately some confusion between the kernel and userland as to
what constitutes a process - a thread from the userland perspective is a
process in userland, and a userland process is a thread group (more
specifically the thread group leader from the kernel perspective). We
therefore alias things thusly:

* PIDFD_SELF_THREAD aliased by PIDFD_SELF - use PIDTYPE_PID.
* PIDFD_SELF_THREAD_GROUP alised by PIDFD_SELF_PROCESS - use PIDTYPE_TGID.

In all of the kernel code we refer to PIDFD_SELF_THREAD and
PIDFD_SELF_THREAD_GROUP. However we expect users to use PIDFD_SELF and
PIDFD_SELF_PROCESS.

This matters for cases where, for instance, a user unshare()'s FDs or does
thread-specific signal handling and where the user would be hugely confused
if the FDs referenced or signal processed referred to the thread group
leader rather than the individual thread.

We ensure that pidfd_send_signal() and pidfd_getfd() work correctly, and
assert as much in selftests. All other interfaces except setns() will work
implicitly with this new interface, however it doesn't make sense to test
waitid(P_PIDFD, ...) as waiting on ourselves is a blocking operation.

In the case of setns() we explicitly disallow use of PIDFD_SELF* as it
doesn't make sense to obtain the namespaces of our own process, and it
would require work to implement this functionality there that would be of
no use.

We also do not provide the ability to utilise PIDFD_SELF* in ordinary fd
operations such as open() or poll(), as this would require extensive work
and be of no real use.

v2:
* Fix tests as reported by Shuah.
* Correct RFC version lore link.

Non-RFC v1:
* Removed RFC tag - there seems to be general consensus that this change is
  a good idea, but perhaps some debate to be had on implementation. It
  seems sensible then to move forward with the RFC flag removed.
* Introduced PIDFD_SELF_THREAD, PIDFD_SELF_THREAD_GROUP and their aliases
  PIDFD_SELF and PIDFD_SELF_PROCESS respectively.
* Updated testing accordingly.
https://lore.kernel.org/linux-mm/cover.1728578231.git.lorenzo.stoakes@oracle.com/

RFC version:
https://lore.kernel.org/linux-mm/cover.1727644404.git.lorenzo.stoakes@oracle.com/

Lorenzo Stoakes (3):
  pidfd: extend pidfd_get_pid() and de-duplicate pid lookup
  pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process
  selftests: pidfd: add tests for PIDFD_SELF_*

 include/linux/pid.h                           |  43 +++++-
 include/uapi/linux/pidfd.h                    |  15 ++
 kernel/exit.c                                 |   3 +-
 kernel/nsproxy.c                              |   1 +
 kernel/pid.c                                  |  73 ++++++---
 kernel/signal.c                               |  22 +--
 tools/testing/selftests/pidfd/pidfd.h         |   8 +
 .../selftests/pidfd/pidfd_getfd_test.c        | 141 ++++++++++++++++++
 .../selftests/pidfd/pidfd_setns_test.c        |  11 ++
 tools/testing/selftests/pidfd/pidfd_test.c    |  76 ++++++++--
 10 files changed, 341 insertions(+), 52 deletions(-)

--
2.46.2

Comments

Suren Baghdasaryan Oct. 15, 2024, 7:40 p.m. UTC | #1
On Fri, Oct 11, 2024 at 4:06 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> The means by which a pid is determined from a pidfd is duplicated, with
> some callers holding a reference to the (pid)fd, and others explicitly
> pinning the pid.
>
> Introduce __pidfd_get_pid() which abstracts both approaches and provide
> optional output parameters for file->f_flags and the fd (the latter of
> which, if provided, prevents the function from decrementing the fd's
> refernce count).
>
> Additionally, allow the ability to open a pidfd by opening a /proc/<pid>
> directory, utilised by the pidfd_send_signal() system call, providing a
> pidfd_get_pid_proc() helper function to do so.
>
> Doing this allows us to eliminate open-coded pidfd pid lookup and to
> consistently handle this in one place.
>
> This lays the groundwork for a subsequent patch which adds a new sentinel
> pidfd to explicitly reference the current process (i.e. thread group
> leader) without the need for a pidfd.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  include/linux/pid.h | 42 +++++++++++++++++++++++++++++++-
>  kernel/pid.c        | 58 ++++++++++++++++++++++++++++++---------------
>  kernel/signal.c     | 22 ++++-------------
>  3 files changed, 84 insertions(+), 38 deletions(-)
>
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index a3aad9b4074c..68b02eab7509 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -2,6 +2,7 @@
>  #ifndef _LINUX_PID_H
>  #define _LINUX_PID_H
>
> +#include <linux/file.h>
>  #include <linux/pid_types.h>
>  #include <linux/rculist.h>
>  #include <linux/rcupdate.h>
> @@ -72,8 +73,47 @@ extern struct pid init_struct_pid;
>
>  struct file;
>
> +
> +/**
> + * __pidfd_get_pid() - Retrieve a pid associated with the specified pidfd.
> + *
> + * @pidfd:      The pidfd whose pid we want, or the fd of a /proc/<pid> file if
> + *              @alloc_proc is also set.
> + * @pin_pid:    If set, then the reference counter of the returned pid is
> + *              incremented. If not set, then @fd should be provided to pin the
> + *              pidfd.
> + * @allow_proc: If set, then an fd of a /proc/<pid> file can be passed instead
> + *              of a pidfd, and this will be used to determine the pid.
> + * @flags:      Output variable, if non-NULL, then the file->f_flags of the
> + *              pidfd will be set here.
> + * @fd:         Output variable, if non-NULL, then the pidfd reference will
> + *              remain elevated and the caller will need to decrement it
> + *              themselves.
> + *
> + * Returns: If successful, the pid associated with the pidfd, otherwise an
> + *          error.
> + */
> +struct pid *__pidfd_get_pid(unsigned int pidfd, bool pin_pid,
> +                           bool allow_proc, unsigned int *flags,
> +                           struct fd *fd);
> +
> +static inline struct pid *pidfd_get_pid(unsigned int pidfd, unsigned int *flags)
> +{
> +       return __pidfd_get_pid(pidfd, /* pin_pid = */ true,
> +                              /* allow_proc = */ false,
> +                              flags, /* fd = */ NULL);
> +}
> +
> +static inline struct pid *pidfd_to_pid_proc(unsigned int pidfd,
> +                                           unsigned int *flags,
> +                                           struct fd *fd)
> +{
> +       return __pidfd_get_pid(pidfd, /* pin_pid = */ false,
> +                              /* allow_proc = */ true,
> +                              flags, fd);
> +}
> +
>  struct pid *pidfd_pid(const struct file *file);
> -struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags);
>  struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags);
>  int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret);
>  void do_notify_pidfd(struct task_struct *task);
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 2715afb77eab..25cc1c36a1b1 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -36,6 +36,7 @@
>  #include <linux/pid_namespace.h>
>  #include <linux/init_task.h>
>  #include <linux/syscalls.h>
> +#include <linux/proc_fs.h>
>  #include <linux/proc_ns.h>
>  #include <linux/refcount.h>
>  #include <linux/anon_inodes.h>
> @@ -534,22 +535,46 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
>  }
>  EXPORT_SYMBOL_GPL(find_ge_pid);
>
> -struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
> +struct pid *__pidfd_get_pid(unsigned int pidfd, bool pin_pid,
> +                           bool allow_proc, unsigned int *flags,
> +                           struct fd *fd)
>  {
> -       struct fd f;
> +       struct file *file;
>         struct pid *pid;
> +       struct fd f = fdget(pidfd);
>
> -       f = fdget(fd);
> -       if (!fd_file(f))
> +       file = fd_file(f);
> +       if (!file)
>                 return ERR_PTR(-EBADF);
>
> -       pid = pidfd_pid(fd_file(f));
> -       if (!IS_ERR(pid)) {
> -               get_pid(pid);
> -               *flags = fd_file(f)->f_flags;
> +       pid = pidfd_pid(file);
> +       /* If we allow opening a pidfd via /proc/<pid>, do so. */
> +       if (IS_ERR(pid) && allow_proc)
> +               pid = tgid_pidfd_to_pid(file);
> +
> +       if (IS_ERR(pid)) {
> +               fdput(f);
> +               return pid;
>         }
>
> -       fdput(f);
> +       if (pin_pid)
> +               get_pid(pid);
> +       else
> +               WARN_ON_ONCE(!fd); /* Nothing to keep pid/pidfd around? */
> +
> +       if (flags)
> +               *flags = file->f_flags;
> +
> +       /*
> +        * If the user provides an fd output then it will handle decrementing
> +        * its reference counter.
> +        */
> +       if (fd)
> +               *fd = f;
> +       else
> +               /* Otherwise we release it. */
> +               fdput(f);
> +
>         return pid;
>  }

There is an EXPORT_SYMBOL_GPL(pidfd_get_pid) right after this line. It
should also be changed to EXPORT_SYMBOL_GPL(__pidfd_get_pid),
otherwise __pidfd_get_pid() will not be exported. A module calling
pidfd_get_pid() now inlined in the header file will try to call
__pidfd_get_pid() and will have trouble resolving this symbol.

>
> @@ -747,23 +772,18 @@ SYSCALL_DEFINE3(pidfd_getfd, int, pidfd, int, fd,
>                 unsigned int, flags)
>  {
>         struct pid *pid;
> -       struct fd f;
>         int ret;
>
>         /* flags is currently unused - make sure it's unset */
>         if (flags)
>                 return -EINVAL;
>
> -       f = fdget(pidfd);
> -       if (!fd_file(f))
> -               return -EBADF;
> -
> -       pid = pidfd_pid(fd_file(f));
> +       pid = pidfd_get_pid(pidfd, NULL);
>         if (IS_ERR(pid))
> -               ret = PTR_ERR(pid);
> -       else
> -               ret = pidfd_getfd(pid, fd);
> +               return PTR_ERR(pid);
>
> -       fdput(f);
> +       ret = pidfd_getfd(pid, fd);
> +
> +       put_pid(pid);
>         return ret;
>  }
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 4344860ffcac..868bfa674c62 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -3875,17 +3875,6 @@ static int copy_siginfo_from_user_any(kernel_siginfo_t *kinfo,
>         return copy_siginfo_from_user(kinfo, info);
>  }
>
> -static struct pid *pidfd_to_pid(const struct file *file)
> -{
> -       struct pid *pid;
> -
> -       pid = pidfd_pid(file);
> -       if (!IS_ERR(pid))
> -               return pid;
> -
> -       return tgid_pidfd_to_pid(file);
> -}
> -
>  #define PIDFD_SEND_SIGNAL_FLAGS                            \
>         (PIDFD_SIGNAL_THREAD | PIDFD_SIGNAL_THREAD_GROUP | \
>          PIDFD_SIGNAL_PROCESS_GROUP)
> @@ -3908,10 +3897,11 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
>                 siginfo_t __user *, info, unsigned int, flags)
>  {
>         int ret;
> -       struct fd f;
>         struct pid *pid;
>         kernel_siginfo_t kinfo;
>         enum pid_type type;
> +       unsigned int f_flags;
> +       struct fd f;
>
>         /* Enforce flags be set to 0 until we add an extension. */
>         if (flags & ~PIDFD_SEND_SIGNAL_FLAGS)
> @@ -3921,12 +3911,8 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
>         if (hweight32(flags & PIDFD_SEND_SIGNAL_FLAGS) > 1)
>                 return -EINVAL;
>
> -       f = fdget(pidfd);
> -       if (!fd_file(f))
> -               return -EBADF;
> -
>         /* Is this a pidfd? */
> -       pid = pidfd_to_pid(fd_file(f));
> +       pid = pidfd_to_pid_proc(pidfd, &f_flags, &f);
>         if (IS_ERR(pid)) {
>                 ret = PTR_ERR(pid);
>                 goto err;
> @@ -3939,7 +3925,7 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
>         switch (flags) {
>         case 0:
>                 /* Infer scope from the type of pidfd. */
> -               if (fd_file(f)->f_flags & PIDFD_THREAD)
> +               if (f_flags & PIDFD_THREAD)
>                         type = PIDTYPE_PID;
>                 else
>                         type = PIDTYPE_TGID;
> --
> 2.46.2
>
Lorenzo Stoakes Oct. 16, 2024, 6:05 a.m. UTC | #2
On Tue, Oct 15, 2024 at 12:40:41PM -0700, Suren Baghdasaryan wrote:
[snip]
> > -struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
> > +struct pid *__pidfd_get_pid(unsigned int pidfd, bool pin_pid,
> > +                           bool allow_proc, unsigned int *flags,
> > +                           struct fd *fd)
> >  {
> > -       struct fd f;
> > +       struct file *file;
> >         struct pid *pid;
> > +       struct fd f = fdget(pidfd);
> >
> > -       f = fdget(fd);
> > -       if (!fd_file(f))
> > +       file = fd_file(f);
> > +       if (!file)
> >                 return ERR_PTR(-EBADF);
> >
> > -       pid = pidfd_pid(fd_file(f));
> > -       if (!IS_ERR(pid)) {
> > -               get_pid(pid);
> > -               *flags = fd_file(f)->f_flags;
> > +       pid = pidfd_pid(file);
> > +       /* If we allow opening a pidfd via /proc/<pid>, do so. */
> > +       if (IS_ERR(pid) && allow_proc)
> > +               pid = tgid_pidfd_to_pid(file);
> > +
> > +       if (IS_ERR(pid)) {
> > +               fdput(f);
> > +               return pid;
> >         }
> >
> > -       fdput(f);
> > +       if (pin_pid)
> > +               get_pid(pid);
> > +       else
> > +               WARN_ON_ONCE(!fd); /* Nothing to keep pid/pidfd around? */
> > +
> > +       if (flags)
> > +               *flags = file->f_flags;
> > +
> > +       /*
> > +        * If the user provides an fd output then it will handle decrementing
> > +        * its reference counter.
> > +        */
> > +       if (fd)
> > +               *fd = f;
> > +       else
> > +               /* Otherwise we release it. */
> > +               fdput(f);
> > +
> >         return pid;
> >  }
>
> There is an EXPORT_SYMBOL_GPL(pidfd_get_pid) right after this line. It
> should also be changed to EXPORT_SYMBOL_GPL(__pidfd_get_pid),
> otherwise __pidfd_get_pid() will not be exported. A module calling
> pidfd_get_pid() now inlined in the header file will try to call
> __pidfd_get_pid() and will have trouble resolving this symbol.

Hmm hang on not there isn't? I don't see that anywhere?

[snip]
Suren Baghdasaryan Oct. 16, 2024, 8:16 a.m. UTC | #3
On Tue, Oct 15, 2024 at 11:05 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Tue, Oct 15, 2024 at 12:40:41PM -0700, Suren Baghdasaryan wrote:
> [snip]
> > > -struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
> > > +struct pid *__pidfd_get_pid(unsigned int pidfd, bool pin_pid,
> > > +                           bool allow_proc, unsigned int *flags,
> > > +                           struct fd *fd)
> > >  {
> > > -       struct fd f;
> > > +       struct file *file;
> > >         struct pid *pid;
> > > +       struct fd f = fdget(pidfd);
> > >
> > > -       f = fdget(fd);
> > > -       if (!fd_file(f))
> > > +       file = fd_file(f);
> > > +       if (!file)
> > >                 return ERR_PTR(-EBADF);
> > >
> > > -       pid = pidfd_pid(fd_file(f));
> > > -       if (!IS_ERR(pid)) {
> > > -               get_pid(pid);
> > > -               *flags = fd_file(f)->f_flags;
> > > +       pid = pidfd_pid(file);
> > > +       /* If we allow opening a pidfd via /proc/<pid>, do so. */
> > > +       if (IS_ERR(pid) && allow_proc)
> > > +               pid = tgid_pidfd_to_pid(file);
> > > +
> > > +       if (IS_ERR(pid)) {
> > > +               fdput(f);
> > > +               return pid;
> > >         }
> > >
> > > -       fdput(f);
> > > +       if (pin_pid)
> > > +               get_pid(pid);
> > > +       else
> > > +               WARN_ON_ONCE(!fd); /* Nothing to keep pid/pidfd around? */
> > > +
> > > +       if (flags)
> > > +               *flags = file->f_flags;
> > > +
> > > +       /*
> > > +        * If the user provides an fd output then it will handle decrementing
> > > +        * its reference counter.
> > > +        */
> > > +       if (fd)
> > > +               *fd = f;
> > > +       else
> > > +               /* Otherwise we release it. */
> > > +               fdput(f);
> > > +
> > >         return pid;
> > >  }
> >
> > There is an EXPORT_SYMBOL_GPL(pidfd_get_pid) right after this line. It
> > should also be changed to EXPORT_SYMBOL_GPL(__pidfd_get_pid),
> > otherwise __pidfd_get_pid() will not be exported. A module calling
> > pidfd_get_pid() now inlined in the header file will try to call
> > __pidfd_get_pid() and will have trouble resolving this symbol.
>
> Hmm hang on not there isn't? I don't see that anywhere?

Doh! Sorry, I didn't realize the export was an out-of-tree Android
change. Never mind...

>
> [snip]
Lorenzo Stoakes Oct. 16, 2024, 8:22 a.m. UTC | #4
On Wed, Oct 16, 2024 at 01:16:15AM -0700, Suren Baghdasaryan wrote:
> On Tue, Oct 15, 2024 at 11:05 PM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > On Tue, Oct 15, 2024 at 12:40:41PM -0700, Suren Baghdasaryan wrote:
> > [snip]
> > > > -struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
> > > > +struct pid *__pidfd_get_pid(unsigned int pidfd, bool pin_pid,
> > > > +                           bool allow_proc, unsigned int *flags,
> > > > +                           struct fd *fd)
> > > >  {
> > > > -       struct fd f;
> > > > +       struct file *file;
> > > >         struct pid *pid;
> > > > +       struct fd f = fdget(pidfd);
> > > >
> > > > -       f = fdget(fd);
> > > > -       if (!fd_file(f))
> > > > +       file = fd_file(f);
> > > > +       if (!file)
> > > >                 return ERR_PTR(-EBADF);
> > > >
> > > > -       pid = pidfd_pid(fd_file(f));
> > > > -       if (!IS_ERR(pid)) {
> > > > -               get_pid(pid);
> > > > -               *flags = fd_file(f)->f_flags;
> > > > +       pid = pidfd_pid(file);
> > > > +       /* If we allow opening a pidfd via /proc/<pid>, do so. */
> > > > +       if (IS_ERR(pid) && allow_proc)
> > > > +               pid = tgid_pidfd_to_pid(file);
> > > > +
> > > > +       if (IS_ERR(pid)) {
> > > > +               fdput(f);
> > > > +               return pid;
> > > >         }
> > > >
> > > > -       fdput(f);
> > > > +       if (pin_pid)
> > > > +               get_pid(pid);
> > > > +       else
> > > > +               WARN_ON_ONCE(!fd); /* Nothing to keep pid/pidfd around? */
> > > > +
> > > > +       if (flags)
> > > > +               *flags = file->f_flags;
> > > > +
> > > > +       /*
> > > > +        * If the user provides an fd output then it will handle decrementing
> > > > +        * its reference counter.
> > > > +        */
> > > > +       if (fd)
> > > > +               *fd = f;
> > > > +       else
> > > > +               /* Otherwise we release it. */
> > > > +               fdput(f);
> > > > +
> > > >         return pid;
> > > >  }
> > >
> > > There is an EXPORT_SYMBOL_GPL(pidfd_get_pid) right after this line. It
> > > should also be changed to EXPORT_SYMBOL_GPL(__pidfd_get_pid),
> > > otherwise __pidfd_get_pid() will not be exported. A module calling
> > > pidfd_get_pid() now inlined in the header file will try to call
> > > __pidfd_get_pid() and will have trouble resolving this symbol.
> >
> > Hmm hang on not there isn't? I don't see that anywhere?
>
> Doh! Sorry, I didn't realize the export was an out-of-tree Android
> change. Never mind...

No probs :P just glad I didn't miss something in this series!

Hey maybe a motivation to upstream some of this? ;)

>
> >
> > [snip]
Suren Baghdasaryan Oct. 16, 2024, 9:02 a.m. UTC | #5
On Wed, Oct 16, 2024 at 1:22 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Wed, Oct 16, 2024 at 01:16:15AM -0700, Suren Baghdasaryan wrote:
> > On Tue, Oct 15, 2024 at 11:05 PM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > >
> > > On Tue, Oct 15, 2024 at 12:40:41PM -0700, Suren Baghdasaryan wrote:
> > > [snip]
> > > > > -struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
> > > > > +struct pid *__pidfd_get_pid(unsigned int pidfd, bool pin_pid,
> > > > > +                           bool allow_proc, unsigned int *flags,
> > > > > +                           struct fd *fd)
> > > > >  {
> > > > > -       struct fd f;
> > > > > +       struct file *file;
> > > > >         struct pid *pid;
> > > > > +       struct fd f = fdget(pidfd);
> > > > >
> > > > > -       f = fdget(fd);
> > > > > -       if (!fd_file(f))
> > > > > +       file = fd_file(f);
> > > > > +       if (!file)
> > > > >                 return ERR_PTR(-EBADF);
> > > > >
> > > > > -       pid = pidfd_pid(fd_file(f));
> > > > > -       if (!IS_ERR(pid)) {
> > > > > -               get_pid(pid);
> > > > > -               *flags = fd_file(f)->f_flags;
> > > > > +       pid = pidfd_pid(file);
> > > > > +       /* If we allow opening a pidfd via /proc/<pid>, do so. */
> > > > > +       if (IS_ERR(pid) && allow_proc)
> > > > > +               pid = tgid_pidfd_to_pid(file);
> > > > > +
> > > > > +       if (IS_ERR(pid)) {
> > > > > +               fdput(f);
> > > > > +               return pid;
> > > > >         }
> > > > >
> > > > > -       fdput(f);
> > > > > +       if (pin_pid)
> > > > > +               get_pid(pid);
> > > > > +       else
> > > > > +               WARN_ON_ONCE(!fd); /* Nothing to keep pid/pidfd around? */
> > > > > +
> > > > > +       if (flags)
> > > > > +               *flags = file->f_flags;
> > > > > +
> > > > > +       /*
> > > > > +        * If the user provides an fd output then it will handle decrementing
> > > > > +        * its reference counter.
> > > > > +        */
> > > > > +       if (fd)
> > > > > +               *fd = f;
> > > > > +       else
> > > > > +               /* Otherwise we release it. */
> > > > > +               fdput(f);
> > > > > +
> > > > >         return pid;
> > > > >  }
> > > >
> > > > There is an EXPORT_SYMBOL_GPL(pidfd_get_pid) right after this line. It
> > > > should also be changed to EXPORT_SYMBOL_GPL(__pidfd_get_pid),
> > > > otherwise __pidfd_get_pid() will not be exported. A module calling
> > > > pidfd_get_pid() now inlined in the header file will try to call
> > > > __pidfd_get_pid() and will have trouble resolving this symbol.
> > >
> > > Hmm hang on not there isn't? I don't see that anywhere?
> >
> > Doh! Sorry, I didn't realize the export was an out-of-tree Android
> > change. Never mind...
>
> No probs :P just glad I didn't miss something in this series!
>
> Hey maybe a motivation to upstream some of this? ;)

I wish... Without an upstream user the exports are not accepted
upstream and unfortunately Android vendors often resist upstreaming
their modules.

>
> >
> > >
> > > [snip]
Christian Brauner Oct. 16, 2024, 1 p.m. UTC | #6
On Fri, Oct 11, 2024 at 12:05:55PM +0100, Lorenzo Stoakes wrote:
> The means by which a pid is determined from a pidfd is duplicated, with
> some callers holding a reference to the (pid)fd, and others explicitly
> pinning the pid.
> 
> Introduce __pidfd_get_pid() which abstracts both approaches and provide
> optional output parameters for file->f_flags and the fd (the latter of
> which, if provided, prevents the function from decrementing the fd's
> refernce count).
> 
> Additionally, allow the ability to open a pidfd by opening a /proc/<pid>
> directory, utilised by the pidfd_send_signal() system call, providing a
> pidfd_get_pid_proc() helper function to do so.
> 
> Doing this allows us to eliminate open-coded pidfd pid lookup and to
> consistently handle this in one place.
> 
> This lays the groundwork for a subsequent patch which adds a new sentinel
> pidfd to explicitly reference the current process (i.e. thread group
> leader) without the need for a pidfd.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  include/linux/pid.h | 42 +++++++++++++++++++++++++++++++-
>  kernel/pid.c        | 58 ++++++++++++++++++++++++++++++---------------
>  kernel/signal.c     | 22 ++++-------------
>  3 files changed, 84 insertions(+), 38 deletions(-)
> 
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index a3aad9b4074c..68b02eab7509 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -2,6 +2,7 @@
>  #ifndef _LINUX_PID_H
>  #define _LINUX_PID_H
>  
> +#include <linux/file.h>
>  #include <linux/pid_types.h>
>  #include <linux/rculist.h>
>  #include <linux/rcupdate.h>
> @@ -72,8 +73,47 @@ extern struct pid init_struct_pid;
>  
>  struct file;
>  
> +
> +/**
> + * __pidfd_get_pid() - Retrieve a pid associated with the specified pidfd.
> + *
> + * @pidfd:      The pidfd whose pid we want, or the fd of a /proc/<pid> file if
> + *              @alloc_proc is also set.
> + * @pin_pid:    If set, then the reference counter of the returned pid is
> + *              incremented. If not set, then @fd should be provided to pin the
> + *              pidfd.
> + * @allow_proc: If set, then an fd of a /proc/<pid> file can be passed instead
> + *              of a pidfd, and this will be used to determine the pid.
> + * @flags:      Output variable, if non-NULL, then the file->f_flags of the
> + *              pidfd will be set here.
> + * @fd:         Output variable, if non-NULL, then the pidfd reference will
> + *              remain elevated and the caller will need to decrement it
> + *              themselves.
> + *
> + * Returns: If successful, the pid associated with the pidfd, otherwise an
> + *          error.
> + */
> +struct pid *__pidfd_get_pid(unsigned int pidfd, bool pin_pid,
> +			    bool allow_proc, unsigned int *flags,
> +			    struct fd *fd);
> +
> +static inline struct pid *pidfd_get_pid(unsigned int pidfd, unsigned int *flags)
> +{
> +	return __pidfd_get_pid(pidfd, /* pin_pid = */ true,
> +			       /* allow_proc = */ false,
> +			       flags, /* fd = */ NULL);
> +}
> +
> +static inline struct pid *pidfd_to_pid_proc(unsigned int pidfd,
> +					    unsigned int *flags,
> +					    struct fd *fd)
> +{
> +	return __pidfd_get_pid(pidfd, /* pin_pid = */ false,
> +			       /* allow_proc = */ true,
> +			       flags, fd);
> +}
> +
>  struct pid *pidfd_pid(const struct file *file);
> -struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags);
>  struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags);
>  int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret);
>  void do_notify_pidfd(struct task_struct *task);
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 2715afb77eab..25cc1c36a1b1 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -36,6 +36,7 @@
>  #include <linux/pid_namespace.h>
>  #include <linux/init_task.h>
>  #include <linux/syscalls.h>
> +#include <linux/proc_fs.h>
>  #include <linux/proc_ns.h>
>  #include <linux/refcount.h>
>  #include <linux/anon_inodes.h>
> @@ -534,22 +535,46 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
>  }
>  EXPORT_SYMBOL_GPL(find_ge_pid);
>  
> -struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
> +struct pid *__pidfd_get_pid(unsigned int pidfd, bool pin_pid,
> +			    bool allow_proc, unsigned int *flags,
> +			    struct fd *fd)

Hm, we should never return a struct fd. A struct fd is an inherently
scoped-bound concept - or at least aims to be. Simply put, we always
want to have the fdget() and the fdput() in the same scope as the file
pointer you can access via fd_file() is only valid as long as we're in
the syscall.

Ideally we mostly use CLASS(fd/fd_raw) and nearly never fdget(). The
point is that this is the wrong api to expose.

It would probably be wiser if you added a pidfd based fdget() inspired
primitive.
Lorenzo Stoakes Oct. 16, 2024, 8 p.m. UTC | #7
On Wed, Oct 16, 2024 at 03:00:55PM +0200, Christian Brauner wrote:
> On Fri, Oct 11, 2024 at 12:05:55PM +0100, Lorenzo Stoakes wrote:
> > The means by which a pid is determined from a pidfd is duplicated, with
> > some callers holding a reference to the (pid)fd, and others explicitly
> > pinning the pid.
> >
> > Introduce __pidfd_get_pid() which abstracts both approaches and provide
> > optional output parameters for file->f_flags and the fd (the latter of
> > which, if provided, prevents the function from decrementing the fd's
> > refernce count).
> >
> > Additionally, allow the ability to open a pidfd by opening a /proc/<pid>
> > directory, utilised by the pidfd_send_signal() system call, providing a
> > pidfd_get_pid_proc() helper function to do so.
> >
> > Doing this allows us to eliminate open-coded pidfd pid lookup and to
> > consistently handle this in one place.
> >
> > This lays the groundwork for a subsequent patch which adds a new sentinel
> > pidfd to explicitly reference the current process (i.e. thread group
> > leader) without the need for a pidfd.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> >  include/linux/pid.h | 42 +++++++++++++++++++++++++++++++-
> >  kernel/pid.c        | 58 ++++++++++++++++++++++++++++++---------------
> >  kernel/signal.c     | 22 ++++-------------
> >  3 files changed, 84 insertions(+), 38 deletions(-)
> >
> > diff --git a/include/linux/pid.h b/include/linux/pid.h
> > index a3aad9b4074c..68b02eab7509 100644
> > --- a/include/linux/pid.h
> > +++ b/include/linux/pid.h
> > @@ -2,6 +2,7 @@
> >  #ifndef _LINUX_PID_H
> >  #define _LINUX_PID_H
> >
> > +#include <linux/file.h>
> >  #include <linux/pid_types.h>
> >  #include <linux/rculist.h>
> >  #include <linux/rcupdate.h>
> > @@ -72,8 +73,47 @@ extern struct pid init_struct_pid;
> >
> >  struct file;
> >
> > +
> > +/**
> > + * __pidfd_get_pid() - Retrieve a pid associated with the specified pidfd.
> > + *
> > + * @pidfd:      The pidfd whose pid we want, or the fd of a /proc/<pid> file if
> > + *              @alloc_proc is also set.
> > + * @pin_pid:    If set, then the reference counter of the returned pid is
> > + *              incremented. If not set, then @fd should be provided to pin the
> > + *              pidfd.
> > + * @allow_proc: If set, then an fd of a /proc/<pid> file can be passed instead
> > + *              of a pidfd, and this will be used to determine the pid.
> > + * @flags:      Output variable, if non-NULL, then the file->f_flags of the
> > + *              pidfd will be set here.
> > + * @fd:         Output variable, if non-NULL, then the pidfd reference will
> > + *              remain elevated and the caller will need to decrement it
> > + *              themselves.
> > + *
> > + * Returns: If successful, the pid associated with the pidfd, otherwise an
> > + *          error.
> > + */
> > +struct pid *__pidfd_get_pid(unsigned int pidfd, bool pin_pid,
> > +			    bool allow_proc, unsigned int *flags,
> > +			    struct fd *fd);
> > +
> > +static inline struct pid *pidfd_get_pid(unsigned int pidfd, unsigned int *flags)
> > +{
> > +	return __pidfd_get_pid(pidfd, /* pin_pid = */ true,
> > +			       /* allow_proc = */ false,
> > +			       flags, /* fd = */ NULL);
> > +}
> > +
> > +static inline struct pid *pidfd_to_pid_proc(unsigned int pidfd,
> > +					    unsigned int *flags,
> > +					    struct fd *fd)
> > +{
> > +	return __pidfd_get_pid(pidfd, /* pin_pid = */ false,
> > +			       /* allow_proc = */ true,
> > +			       flags, fd);
> > +}
> > +
> >  struct pid *pidfd_pid(const struct file *file);
> > -struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags);
> >  struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags);
> >  int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret);
> >  void do_notify_pidfd(struct task_struct *task);
> > diff --git a/kernel/pid.c b/kernel/pid.c
> > index 2715afb77eab..25cc1c36a1b1 100644
> > --- a/kernel/pid.c
> > +++ b/kernel/pid.c
> > @@ -36,6 +36,7 @@
> >  #include <linux/pid_namespace.h>
> >  #include <linux/init_task.h>
> >  #include <linux/syscalls.h>
> > +#include <linux/proc_fs.h>
> >  #include <linux/proc_ns.h>
> >  #include <linux/refcount.h>
> >  #include <linux/anon_inodes.h>
> > @@ -534,22 +535,46 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
> >  }
> >  EXPORT_SYMBOL_GPL(find_ge_pid);
> >
> > -struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
> > +struct pid *__pidfd_get_pid(unsigned int pidfd, bool pin_pid,
> > +			    bool allow_proc, unsigned int *flags,
> > +			    struct fd *fd)
>
> Hm, we should never return a struct fd. A struct fd is an inherently
> scoped-bound concept - or at least aims to be. Simply put, we always
> want to have the fdget() and the fdput() in the same scope as the file
> pointer you can access via fd_file() is only valid as long as we're in
> the syscall.
>
> Ideally we mostly use CLASS(fd/fd_raw) and nearly never fdget(). The
> point is that this is the wrong api to expose.
>
> It would probably be wiser if you added a pidfd based fdget() inspired
> primitive.

I think we can actually probably just avoid passing it back and pin the pid
instead of the fd, which keeps the scope as before and simplifies things
generally.

Let me experiment with that!