[0/2] procfs: Add file path and size to /proc/<pid>/fdinfo

Message ID 20220531212521.1231133-1-kaleshsingh@google.com

Message

Kalesh Singh May 31, 2022, 9:25 p.m. UTC
Processes can pin shared memory by keeping a handle to it through a
file descriptor; for instance dmabufs, memfd, and ashmem (in Android).

In the case of a memory leak, to identify the process pinning the
memory, userspace needs to:
  - Iterate the /proc/<pid>/fd/* for each process
  - Do a readlink on each entry to identify the type of memory from
    the file path.
  - stat() each entry to get the size of the memory.
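For illustration, the three steps above could be sketched in userspace C
roughly as follows. This is only a minimal sketch: the function name
sum_fd_sizes() and the prefix-matching policy are assumptions made for the
example and are not part of this series.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: sum the sizes of all entries in fd_dir
 * (e.g. "/proc/1234/fd") whose link target starts with prefix.
 * Returns -1 if fd_dir cannot be opened, which is what a caller
 * that is neither the owner nor root would typically see.
 */
long long sum_fd_sizes(const char *fd_dir, const char *prefix)
{
	DIR *dir = opendir(fd_dir);	/* Step 1: iterate /proc/<pid>/fd */
	struct dirent *ent;
	long long total = 0;

	if (!dir)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		char entry[4096], target[4096];
		struct stat st;
		ssize_t len;

		if (ent->d_name[0] == '.')
			continue;	/* skip "." and ".." */

		snprintf(entry, sizeof(entry), "%s/%s", fd_dir, ent->d_name);

		/* Step 2: readlink() gives the path used to identify the type. */
		len = readlink(entry, target, sizeof(target) - 1);
		if (len < 0)
			continue;
		target[len] = '\0';
		if (strncmp(target, prefix, strlen(prefix)) != 0)
			continue;

		/* Step 3: stat() follows the link and reports the size. */
		if (stat(entry, &st) == 0)
			total += (long long)st.st_size;
	}

	closedir(dir);
	return total;
}
```

A caller would invoke this once per process, e.g.
sum_fd_sizes("/proc/1234/fd", "/memfd"), and treat -1 as "not permitted
or process gone".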

The file permissions on /proc/<pid>/fd/* only allow the owner or root
to perform the operations above, so this approach is not suitable for
capturing the system-wide state in a production environment.

This issue was addressed for dmabufs by making /proc/*/fdinfo/*
accessible to a process with PTRACE_MODE_READ_FSCREDS credentials[1].
To allow the same kind of tracking for other types of shared memory,
add the following fields to /proc/<pid>/fdinfo/<fd>:

path - This allows identifying the type of memory based on common
       prefixes: e.g. "/memfd...", "/dmabuf...", "/dev/ashmem..."

       This was not an issue when dmabuf tracking was introduced
       because the exp_name field of dmabuf fdinfo could be used
       to distinguish dmabuf fds from other types.

size - To track the amount of memory that is being pinned.

       dmabufs expose size as an additional field in fdinfo. Remove
       this and make it a common field for all fds.
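As a rough sketch of how a userspace tool might consume these fields, the
C fragment below parses the text of one fdinfo file. The 'size' line
format ("size:\t<decimal>") matches the seq_printf() added in patch 1/2;
the helper names fdinfo_get(), fdinfo_size() and fdinfo_mem_type() are
hypothetical, and a "path:\t<path>" line is assumed here by analogy with
the other fields.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Find "key:<whitespace><value>" at the start of a line in an fdinfo
 * dump and copy <value> into out (truncating if needed).
 * Returns 0 on success, -1 if the key is absent.
 */
int fdinfo_get(const char *text, const char *key, char *out, size_t outlen)
{
	size_t klen = strlen(key);
	const char *line = text;

	if (outlen == 0)
		return -1;

	while (line && *line) {
		const char *eol = strchr(line, '\n');
		size_t llen = eol ? (size_t)(eol - line) : strlen(line);

		if (llen > klen && strncmp(line, key, klen) == 0 &&
		    line[klen] == ':') {
			const char *val = line + klen + 1;
			size_t vlen;

			/* Skip the tab (or spaces) after the colon. */
			while (val < line + llen && (*val == '\t' || *val == ' '))
				val++;
			vlen = (size_t)(line + llen - val);
			if (vlen >= outlen)
				vlen = outlen - 1;
			memcpy(out, val, vlen);
			out[vlen] = '\0';
			return 0;
		}
		line = eol ? eol + 1 : NULL;
	}
	return -1;
}

/* Return the 'size' field in bytes, or -1 if the field is missing. */
long long fdinfo_size(const char *text)
{
	char buf[32];

	if (fdinfo_get(text, "size", buf, sizeof(buf)) != 0)
		return -1;
	return strtoll(buf, NULL, 10);
}

/* Classify a path by the common prefixes listed above. */
const char *fdinfo_mem_type(const char *path)
{
	if (strncmp(path, "/memfd", 6) == 0)
		return "memfd";
	if (strncmp(path, "/dmabuf", 7) == 0)
		return "dmabuf";
	if (strncmp(path, "/dev/ashmem", 11) == 0)
		return "ashmem";
	return "other";
}
```

Matching the key only at the start of a line keeps per-type lines such as
"inotify wd:3 ino:9e7e ..." from being mistaken for the common 'ino' field.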

Access to /proc/<pid>/fdinfo is governed by PTRACE_MODE_READ_FSCREDS
-- the same as for /proc/<pid>/maps which also exposes the path and
size for mapped memory regions.

This allows for a system process with PTRACE_MODE_READ_FSCREDS to
account the pinned per-process memory via fdinfo.

-----

There was some concern about exposing the file path in the RFC[2]; to address
that, the change was split into separate patches. Also, retrieving the file
path from fdinfo is guarded by the same capability (PTRACE_MODE_READ) as
/proc/<pid>/maps, which also exposes the file path, so this may not be an issue.

[1] https://lore.kernel.org/r/20210308170651.919148-1-kaleshsingh@google.com/
[2] https://lore.kernel.org/r/20220519214021.3572840-1-kaleshsingh@google.com/


Kalesh Singh (2):
  procfs: Add 'size' to /proc/<pid>/fdinfo/
  procfs: Add 'path' to /proc/<pid>/fdinfo/

 Documentation/filesystems/proc.rst | 22 ++++++++++++++++++++--
 drivers/dma-buf/dma-buf.c          |  1 -
 fs/proc/fd.c                       | 13 +++++++++----
 3 files changed, 29 insertions(+), 7 deletions(-)


base-commit: 8ab2afa23bd197df47819a87f0265c0ac95c5b6a

Comments

Kalesh Singh June 1, 2022, 2:58 p.m. UTC | #1
On Wed, Jun 1, 2022 at 6:55 AM Christian König <christian.koenig@amd.com> wrote:
>
> Am 31.05.22 um 23:25 schrieb Kalesh Singh:
> > To be able to account the amount of memory a process is keeping pinned
> > by open file descriptors add a 'size' field to fdinfo output.
> >
> > dmabufs fds already expose a 'size' field for this reason, remove this
> > and make it a common field for all fds. This allows tracking of
> > other types of memory (e.g. memfd and ashmem in Android).
> >
> > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
>
> At least for the DMA-buf part feel free to add a Reviewed-by: Christian
> König <christian.koenig@amd.com> for this.

Thanks for the review, Christian.

--Kalesh

>
> Regards,
> Christian.
>
> > ---
> >
> > Changes from rfc:
> >    - Split adding 'size' and 'path' into separate patches, per Christian
> >    - Split fdinfo seq_printf into separate lines, per Christian
> >    - Fix indentation (use tabs) in documentation, per Randy
> >
> >   Documentation/filesystems/proc.rst | 12 ++++++++++--
> >   drivers/dma-buf/dma-buf.c          |  1 -
> >   fs/proc/fd.c                       |  9 +++++----
> >   3 files changed, 15 insertions(+), 7 deletions(-)
> >
> > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> > index 1bc91fb8c321..779c05528e87 100644
> > --- a/Documentation/filesystems/proc.rst
> > +++ b/Documentation/filesystems/proc.rst
> > @@ -1886,13 +1886,14 @@ if precise results are needed.
> >   3.8 /proc/<pid>/fdinfo/<fd> - Information about opened file
> >   ---------------------------------------------------------------
> >   This file provides information associated with an opened file. The regular
> > -files have at least four fields -- 'pos', 'flags', 'mnt_id' and 'ino'.
> > +files have at least five fields -- 'pos', 'flags', 'mnt_id', 'ino', and 'size'.
> > +
> >   The 'pos' represents the current offset of the opened file in decimal
> >   form [see lseek(2) for details], 'flags' denotes the octal O_xxx mask the
> >   file has been created with [see open(2) for details] and 'mnt_id' represents
> >   mount ID of the file system containing the opened file [see 3.5
> >   /proc/<pid>/mountinfo for details]. 'ino' represents the inode number of
> > -the file.
> > +the file, and 'size' represents the size of the file in bytes.
> >
> >   A typical output is::
> >
> > @@ -1900,6 +1901,7 @@ A typical output is::
> >       flags:  0100002
> >       mnt_id: 19
> >       ino:    63107
> > +     size:   0
> >
> >   All locks associated with a file descriptor are shown in its fdinfo too::
> >
> > @@ -1917,6 +1919,7 @@ Eventfd files
> >       flags:  04002
> >       mnt_id: 9
> >       ino:    63107
> > +     size:   0
> >       eventfd-count:  5a
> >
> >   where 'eventfd-count' is hex value of a counter.
> > @@ -1930,6 +1933,7 @@ Signalfd files
> >       flags:  04002
> >       mnt_id: 9
> >       ino:    63107
> > +     size:   0
> >       sigmask:        0000000000000200
> >
> >   where 'sigmask' is hex value of the signal mask associated
> > @@ -1944,6 +1948,7 @@ Epoll files
> >       flags:  02
> >       mnt_id: 9
> >       ino:    63107
> > +     size:   0
> >       tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61af sdev:7
> >
> >   where 'tfd' is a target file descriptor number in decimal form,
> > @@ -1962,6 +1967,7 @@ For inotify files the format is the following::
> >       flags:  02000000
> >       mnt_id: 9
> >       ino:    63107
> > +     size:   0
> >       inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
> >
> >   where 'wd' is a watch descriptor in decimal form, i.e. a target file
> > @@ -1985,6 +1991,7 @@ For fanotify files the format is::
> >       flags:  02
> >       mnt_id: 9
> >       ino:    63107
> > +     size:   0
> >       fanotify flags:10 event-flags:0
> >       fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
> >       fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
> > @@ -2010,6 +2017,7 @@ Timerfd files
> >       flags:  02
> >       mnt_id: 9
> >       ino:    63107
> > +     size:   0
> >       clockid: 0
> >       ticks: 0
> >       settime flags: 01
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index 32f55640890c..5f2ae38c960f 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
> > @@ -378,7 +378,6 @@ static void dma_buf_show_fdinfo(struct seq_file *m, struct file *file)
> >   {
> >       struct dma_buf *dmabuf = file->private_data;
> >
> > -     seq_printf(m, "size:\t%zu\n", dmabuf->size);
> >       /* Don't count the temporary reference taken inside procfs seq_show */
> >       seq_printf(m, "count:\t%ld\n", file_count(dmabuf->file) - 1);
> >       seq_printf(m, "exp_name:\t%s\n", dmabuf->exp_name);
> > diff --git a/fs/proc/fd.c b/fs/proc/fd.c
> > index 913bef0d2a36..464bc3f55759 100644
> > --- a/fs/proc/fd.c
> > +++ b/fs/proc/fd.c
> > @@ -54,10 +54,11 @@ static int seq_show(struct seq_file *m, void *v)
> >       if (ret)
> >               return ret;
> >
> > -     seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\nino:\t%lu\n",
> > -                (long long)file->f_pos, f_flags,
> > -                real_mount(file->f_path.mnt)->mnt_id,
> > -                file_inode(file)->i_ino);
> > +     seq_printf(m, "pos:\t%lli\n", (long long)file->f_pos);
> > +     seq_printf(m, "flags:\t0%o\n", f_flags);
> > +     seq_printf(m, "mnt_id:\t%i\n", real_mount(file->f_path.mnt)->mnt_id);
> > +     seq_printf(m, "ino:\t%lu\n", file_inode(file)->i_ino);
> > +     seq_printf(m, "size:\t%lli\n", (long long)file_inode(file)->i_size);
> >
> >       /* show_fd_locks() never deferences files so a stale value is safe */
> >       show_fd_locks(m, file, files);
>