[RFC,0/3] drm: Add comm/cmdline fdinfo fields

Message ID 20230417201215.448099-1-robdclark@gmail.com

Message

Rob Clark April 17, 2023, 8:12 p.m. UTC
From: Rob Clark <robdclark@chromium.org>

When many of the things using the GPU are processes in a VM guest, the
actual client process is just a proxy.  The msm driver has a way to let
the proxy tell the kernel the actual VM client process's executable name
and command-line, which has until now been used only for GPU crash
devcore dumps.  Let's also expose this via fdinfo so that tools can
show who the actual user of the GPU is.

Rob Clark (3):
  drm/doc: Relax fdinfo string constraints
  drm/msm: Rework get_comm_cmdline() helper
  drm/msm: Add comm/cmdline fields

 Documentation/gpu/drm-usage-stats.rst   | 37 +++++++++++++++----------
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |  4 +--
 drivers/gpu/drm/msm/msm_drv.c           |  2 ++
 drivers/gpu/drm/msm/msm_gpu.c           | 27 +++++++++++++-----
 drivers/gpu/drm/msm/msm_gpu.h           | 12 ++++++--
 drivers/gpu/drm/msm/msm_submitqueue.c   |  1 +
 6 files changed, 58 insertions(+), 25 deletions(-)

Comments

Rob Clark April 17, 2023, 8:45 p.m. UTC | #1
On Mon, Apr 17, 2023 at 1:12 PM Rob Clark <robdclark@gmail.com> wrote:
>
> From: Rob Clark <robdclark@chromium.org>
>
> When many of the things using the GPU are processes in a VM guest, the
> actual client process is just a proxy.  The msm driver has a way to let
> the proxy tell the kernel the actual VM client process's executable name
> and command-line, which has until now been used simply for GPU crash
> devcore dumps.  Lets also expose this via fdinfo so that tools can
> expose who the actual user of the GPU is.

I should have also mentioned, in the VM/proxy scenario we have a
single process with separate drm_file's for each guest VM process.  So
it isn't an option to just change the proxy process's name to match
the client.

> Rob Clark (3):
>   drm/doc: Relax fdinfo string constraints
>   drm/msm: Rework get_comm_cmdline() helper
>   drm/msm: Add comm/cmdline fields
>
>  Documentation/gpu/drm-usage-stats.rst   | 37 +++++++++++++++----------
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c |  4 +--
>  drivers/gpu/drm/msm/msm_drv.c           |  2 ++
>  drivers/gpu/drm/msm/msm_gpu.c           | 27 +++++++++++++-----
>  drivers/gpu/drm/msm/msm_gpu.h           | 12 ++++++--
>  drivers/gpu/drm/msm/msm_submitqueue.c   |  1 +
>  6 files changed, 58 insertions(+), 25 deletions(-)
>
> --
> 2.39.2
>
Tvrtko Ursulin April 18, 2023, 8:53 a.m. UTC | #2
On 17/04/2023 21:12, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> Normally this would be the same information that can be obtained in
> other ways.  But in some cases the process opening the drm fd is merely
> a sort of proxy for the actual process using the GPU.  This is the case
> for guest VM processes using the GPU via virglrenderer, in which case
> the msm native-context renderer in virglrenderer overrides the comm/
> cmdline to be the guest process's values.
> 
> Exposing this via fdinfo allows tools like gputop to show something more
> meaningful than just a bunch of "pcivirtio-gpu" users.

You also later expanded with:

"""
I should have also mentioned, in the VM/proxy scenario we have a
single process with separate drm_file's for each guest VM process.  So
it isn't an option to just change the proxy process's name to match
the client.
"""

So how does that work: does this single process temporarily change its
name for each drm fd it opens and creates a context for, or is it actually
part of the native context protocol?

> 
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
>   Documentation/gpu/drm-usage-stats.rst |  8 ++++++++
>   drivers/gpu/drm/msm/msm_gpu.c         | 14 ++++++++++++++
>   2 files changed, 22 insertions(+)
> 
> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> index 8e00d53231e0..bc90bed455e3 100644
> --- a/Documentation/gpu/drm-usage-stats.rst
> +++ b/Documentation/gpu/drm-usage-stats.rst
> @@ -148,6 +148,14 @@ percentage utilization of the engine, whereas drm-engine-<keystr> only reflects
>   time active without considering what frequency the engine is operating as a
>   percentage of it's maximum frequency.
>   
> +- drm-comm: <valstr>
> +
> +Returns the clients executable path.

Full path and not just current->comm? In this case probably give it a 
more descriptive name here.

drm-client-executable
drm-client-command-line

So we stay in the drm-client- namespace?

Or if the former is an absolute path, could one key be enough for both?

drm-client-command-line: /path/to/executable --arguments

> +
> +- drm-cmdline: <valstr>
> +
> +Returns the clients cmdline.

I think the drm-usage-stats.rst text should say more about these
two: precisely define their content and outline the use case
under which driver authors may want to add them, and fdinfo consumers
can therefore expect to see them. Just so everything is completely clear and
people do not start adding them for drivers which do not support native
context (or the like).

But overall it sounds reasonable to me - it would be really cool
to not just see pcivirtio-gpu as you say. Even better if the standard virtiogpu
use case (not native context) could show real users.

Regards,

Tvrtko

> +
>   Implementation Details
>   ======================
>   
> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> index f0f4f845c32d..1150dcbf28aa 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.c
> +++ b/drivers/gpu/drm/msm/msm_gpu.c
> @@ -148,12 +148,26 @@ int msm_gpu_pm_suspend(struct msm_gpu *gpu)
>   	return 0;
>   }
>   
> +static void get_comm_cmdline(struct msm_file_private *ctx, char **comm, char **cmd);
> +
>   void msm_gpu_show_fdinfo(struct msm_gpu *gpu, struct msm_file_private *ctx,
>   			 struct drm_printer *p)
>   {
> +	char *comm, *cmdline;
> +
> +	get_comm_cmdline(ctx, &comm, &cmdline);
> +
>   	drm_printf(p, "drm-engine-gpu:\t%llu ns\n", ctx->elapsed_ns);
>   	drm_printf(p, "drm-cycles-gpu:\t%llu\n", ctx->cycles);
>   	drm_printf(p, "drm-maxfreq-gpu:\t%u Hz\n", gpu->fast_rate);
> +
> +	if (comm)
> +		drm_printf(p, "drm-comm:\t%s\n", comm);
> +	if (cmdline)
> +		drm_printf(p, "drm-cmdline:\t%s\n", cmdline);
> +
> +	kfree(comm);
> +	kfree(cmdline);
>   }
>   
>   int msm_gpu_hw_init(struct msm_gpu *gpu)
Konrad Dybcio April 18, 2023, 9:33 a.m. UTC | #3
Looks like the 'PATCH' part of your subject was cut off!

Konrad

On 17.04.2023 22:12, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> When many of the things using the GPU are processes in a VM guest, the
> actual client process is just a proxy.  The msm driver has a way to let
> the proxy tell the kernel the actual VM client process's executable name
> and command-line, which has until now been used simply for GPU crash
> devcore dumps.  Lets also expose this via fdinfo so that tools can
> expose who the actual user of the GPU is.
> 
> Rob Clark (3):
>   drm/doc: Relax fdinfo string constraints
>   drm/msm: Rework get_comm_cmdline() helper
>   drm/msm: Add comm/cmdline fields
> 
>  Documentation/gpu/drm-usage-stats.rst   | 37 +++++++++++++++----------
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c |  4 +--
>  drivers/gpu/drm/msm/msm_drv.c           |  2 ++
>  drivers/gpu/drm/msm/msm_gpu.c           | 27 +++++++++++++-----
>  drivers/gpu/drm/msm/msm_gpu.h           | 12 ++++++--
>  drivers/gpu/drm/msm/msm_submitqueue.c   |  1 +
>  6 files changed, 58 insertions(+), 25 deletions(-)
>
Rob Clark April 18, 2023, 2:56 p.m. UTC | #4
On Tue, Apr 18, 2023 at 1:53 AM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 17/04/2023 21:12, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Normally this would be the same information that can be obtained in
> > other ways.  But in some cases the process opening the drm fd is merely
> > a sort of proxy for the actual process using the GPU.  This is the case
> > for guest VM processes using the GPU via virglrenderer, in which case
> > the msm native-context renderer in virglrenderer overrides the comm/
> > cmdline to be the guest process's values.
> >
> > Exposing this via fdinfo allows tools like gputop to show something more
> > meaningful than just a bunch of "pcivirtio-gpu" users.
>
> You also later expanded with:
>
> """
> I should have also mentioned, in the VM/proxy scenario we have a
> single process with separate drm_file's for each guest VM process.  So
> it isn't an option to just change the proxy process's name to match
> the client.
> """
>
> So how does that work - this single process temporarily changes it's
> name for each drm fd it opens and creates a context or it is actually in
> the native context protocol?

It is part of the protocol: the Mesa driver in the VM sends[1] this
info to the native-context "shim" in host userspace, which uses the
SET_PARAM ioctl to pass it to the kernel.  In host userspace there
is just a single process (you see the host PID below), but it does a
separate open() of the drm dev for each guest process (so that they
each have their own GPU address space for isolation):

DRM minor 128
    PID    MEM ACTIV              NAME                    gpu
    5297  200M   82M com.mojang.minecr |██████████████▏                        |
    1859  199M    0B            chrome |█▉                                     |
    5297   64M    9M    surfaceflinger |                                       |
    5297   12M    0B org.chromium.arc. |                                       |
    5297   12M    0B com.android.syste |                                       |
    5297   12M    0B org.chromium.arc. |                                       |
    5297   26M    0B com.google.androi |                                       |
    5297   65M    0B     system_server |                                       |


[1] https://gitlab.freedesktop.org/virgl/virglrenderer/-/blob/master/src/drm/msm/msm_proto.h#L326
[2] https://gitlab.freedesktop.org/virgl/virglrenderer/-/blob/master/src/drm/msm/msm_renderer.c#L1050
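
As a rough illustration of the consumer side (a sketch only, not code
taken from gputop or IGT): each fdinfo line printed by this patch has the
form "key:\tvalue\n", so a tool could split one line as below.  The helper
name fdinfo_split() and the caller-supplied buffer sizes are made up for
this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split one fdinfo line of the form "key:\tvalue\n" into key and value.
 * Hypothetical helper for illustration; returns 0 on success, or -1 if
 * the line has no ':' separator, the key is empty, or a buffer is too
 * small to hold the result.
 */
static int fdinfo_split(const char *line, char *key, size_t klen,
                        char *val, size_t vlen)
{
    const char *p = strchr(line, ':');
    size_t n;

    if (!p)
        return -1;

    /* Everything before the first ':' is the key. */
    n = (size_t)(p - line);
    if (n == 0 || n >= klen)
        return -1;
    memcpy(key, line, n);
    key[n] = '\0';

    /* Skip the ':' and the tab/space padding that follows it. */
    p++;
    while (*p == ' ' || *p == '\t')
        p++;

    /* The value runs to the end of the line, excluding the newline. */
    n = strcspn(p, "\n");
    if (n >= vlen)
        return -1;
    memcpy(val, p, n);
    val[n] = '\0';
    return 0;
}
```

For a line such as "drm-comm:\tsurfaceflinger\n" this yields the key
"drm-comm" and the value "surfaceflinger".  Since the shim opens one drm
fd per guest process, a tool would run this over the fdinfo of each fd
rather than once per PID.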

> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> >   Documentation/gpu/drm-usage-stats.rst |  8 ++++++++
> >   drivers/gpu/drm/msm/msm_gpu.c         | 14 ++++++++++++++
> >   2 files changed, 22 insertions(+)
> >
> > diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> > index 8e00d53231e0..bc90bed455e3 100644
> > --- a/Documentation/gpu/drm-usage-stats.rst
> > +++ b/Documentation/gpu/drm-usage-stats.rst
> > @@ -148,6 +148,14 @@ percentage utilization of the engine, whereas drm-engine-<keystr> only reflects
> >   time active without considering what frequency the engine is operating as a
> >   percentage of it's maximum frequency.
> >
> > +- drm-comm: <valstr>
> > +
> > +Returns the clients executable path.
>
> Full path and not just current->comm? In this case probably give it a
> more descriptive name here.
>
> drm-client-executable
> drm-client-command-line
>
> So we stay in the drm-client- namespace?
>
> Or if the former is absolute path could one key be enough for both?
>
> drm-client-command-line: /path/to/executable --arguments

comm and cmdline can be different.  Android seems to change the comm to
the apk name, for example (and with the zygote stuff, cmdline isn't
really a thing).

I guess it could be drm-client-comm and drm-client-cmdline?  Although
comm/cmdline aren't the best names, they just follow what the
kernel calls them elsewhere.

> > +
> > +- drm-cmdline: <valstr>
> > +
> > +Returns the clients cmdline.
>
> I think drm-usage-stats.rst text should provide some more text with
> these two. To precisely define their content and outline the use case
> under which driver authors may want to add them, and fdinfo consumer
> therefore expect to see them. Just so everything is completely clear and
> people do not start adding them for drivers which do not support native
> context (or like).

I really was just piggy-backing on the existing comm/cmdline, but I'll
try to write up something better.

I think it maybe should not be limited to native context.  For
example, if the browser did somehow manage to create different displays
associated with different drm_file instances (I guess it would have to
use gbm to do this?), it would be nice to see browser tab names.

> But on the overall it sounds reasonable to me - it would be really cool
> to not just see pcivirtio-gpu as you say. Even if the standard virtiogpu
> use case (not native context) could show real users.

For vrend/virgl, we'd first need to solve the issue that there is just
a single drm_file for all guest processes.  But really, just don't use
virgl.  (I mean, like seriously, would you put a gl driver in the
kernel?  Vrend has access to all guest memory, so this is essentially
what you have with virgl.  This is just not a sane thing to do.) The
only "valid" reason for not doing native-context is if you don't have
the src code for your UMD to be able to modify it to talk
native-context to virtgpu in the guest. ;-)

BR,
-R

> Regards,
>
> Tvrtko
>
> > +
> >   Implementation Details
> >   ======================
> >
> > diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> > index f0f4f845c32d..1150dcbf28aa 100644
> > --- a/drivers/gpu/drm/msm/msm_gpu.c
> > +++ b/drivers/gpu/drm/msm/msm_gpu.c
> > @@ -148,12 +148,26 @@ int msm_gpu_pm_suspend(struct msm_gpu *gpu)
> >       return 0;
> >   }
> >
> > +static void get_comm_cmdline(struct msm_file_private *ctx, char **comm, char **cmd);
> > +
> >   void msm_gpu_show_fdinfo(struct msm_gpu *gpu, struct msm_file_private *ctx,
> >                        struct drm_printer *p)
> >   {
> > +     char *comm, *cmdline;
> > +
> > +     get_comm_cmdline(ctx, &comm, &cmdline);
> > +
> >       drm_printf(p, "drm-engine-gpu:\t%llu ns\n", ctx->elapsed_ns);
> >       drm_printf(p, "drm-cycles-gpu:\t%llu\n", ctx->cycles);
> >       drm_printf(p, "drm-maxfreq-gpu:\t%u Hz\n", gpu->fast_rate);
> > +
> > +     if (comm)
> > +             drm_printf(p, "drm-comm:\t%s\n", comm);
> > +     if (cmdline)
> > +             drm_printf(p, "drm-cmdline:\t%s\n", cmdline);
> > +
> > +     kfree(comm);
> > +     kfree(cmdline);
> >   }
> >
> >   int msm_gpu_hw_init(struct msm_gpu *gpu)
Rob Clark April 19, 2023, 3 p.m. UTC | #5
On Wed, Apr 19, 2023 at 6:36 AM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 18/04/2023 15:56, Rob Clark wrote:
> > On Tue, Apr 18, 2023 at 1:53 AM Tvrtko Ursulin
> > <tvrtko.ursulin@linux.intel.com> wrote:
> >>
> >>
> >> On 17/04/2023 21:12, Rob Clark wrote:
> >>> From: Rob Clark <robdclark@chromium.org>
> >>>
> >>> Normally this would be the same information that can be obtained in
> >>> other ways.  But in some cases the process opening the drm fd is merely
> >>> a sort of proxy for the actual process using the GPU.  This is the case
> >>> for guest VM processes using the GPU via virglrenderer, in which case
> >>> the msm native-context renderer in virglrenderer overrides the comm/
> >>> cmdline to be the guest process's values.
> >>>
> >>> Exposing this via fdinfo allows tools like gputop to show something more
> >>> meaningful than just a bunch of "pcivirtio-gpu" users.
> >>
> >> You also later expanded with:
> >>
> >> """
> >> I should have also mentioned, in the VM/proxy scenario we have a
> >> single process with separate drm_file's for each guest VM process.  So
> >> it isn't an option to just change the proxy process's name to match
> >> the client.
> >> """
> >>
> >> So how does that work - this single process temporarily changes it's
> >> name for each drm fd it opens and creates a context or it is actually in
> >> the native context protocol?
> >
> > It is part of the protocol, the mesa driver in the VM sends[1] this
> > info to the native-context "shim" in host userspace which uses the
> > SET_PARAM ioctl to pass this to the kernel.  In the host userspace
> > there is just a single process (you see the host PID below) but it
> > does a separate open() of the drm dev for each guest process (so that
> > they each have their own GPU address space for isolation):
> >
> > DRM minor 128
> >      PID    MEM ACTIV              NAME                    gpu
> >      5297  200M   82M com.mojang.minecr |██████████████▏                        |
> >      1859  199M    0B            chrome |█▉                                     |
> >      5297   64M    9M    surfaceflinger |                                       |
> >      5297   12M    0B org.chromium.arc. |                                       |
> >      5297   12M    0B com.android.syste |                                       |
> >      5297   12M    0B org.chromium.arc. |                                       |
> >      5297   26M    0B com.google.androi |                                       |
> >      5297   65M    0B     system_server |                                       |
> >
> >
> > [1] https://gitlab.freedesktop.org/virgl/virglrenderer/-/blob/master/src/drm/msm/msm_proto.h#L326
> > [2] https://gitlab.freedesktop.org/virgl/virglrenderer/-/blob/master/src/drm/msm/msm_renderer.c#L1050
> >
> >>>
> >>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> >>> ---
> >>>    Documentation/gpu/drm-usage-stats.rst |  8 ++++++++
> >>>    drivers/gpu/drm/msm/msm_gpu.c         | 14 ++++++++++++++
> >>>    2 files changed, 22 insertions(+)
> >>>
> >>> diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst
> >>> index 8e00d53231e0..bc90bed455e3 100644
> >>> --- a/Documentation/gpu/drm-usage-stats.rst
> >>> +++ b/Documentation/gpu/drm-usage-stats.rst
> >>> @@ -148,6 +148,14 @@ percentage utilization of the engine, whereas drm-engine-<keystr> only reflects
> >>>    time active without considering what frequency the engine is operating as a
> >>>    percentage of it's maximum frequency.
> >>>
> >>> +- drm-comm: <valstr>
> >>> +
> >>> +Returns the clients executable path.
> >>
> >> Full path and not just current->comm? In this case probably give it a
> >> more descriptive name here.
> >>
> >> drm-client-executable
> >> drm-client-command-line
> >>
> >> So we stay in the drm-client- namespace?
> >>
> >> Or if the former is absolute path could one key be enough for both?
> >>
> >> drm-client-command-line: /path/to/executable --arguments
> >
> > comm and cmdline can be different. Android seems to change the comm to
> > the apk name, for example (and w/ the zygote stuff cmdline isn't
> > really a thing)
> >
> > I guess it could be drm-client-comm and drm-client-cmdline?  Although
> > comm/cmdline aren't the best names, they are just following what the
> > kernel calls them elsewhere.
>
> I wasn't sure what you plan to do, given the mention of a path under the
> drm-comm description. If it is a path then comm would be misleading,
> since comm as defined in procfs is not a path, at least as far as I
> know. Which is why I was suggesting executable. But if you remove the
> mention of a path from the rst and rather refer to the process's comm
> value, I think that is then okay.

Oh, whoops, the mention of "path" for comm was a mistake.  task->comm
is described as the executable name without path, and that is what the
fdinfo field was intended to follow.

> >>> +
> >>> +- drm-cmdline: <valstr>
> >>> +
> >>> +Returns the clients cmdline.
> >>
> >> I think drm-usage-stats.rst text should provide some more text with
> >> these two. To precisely define their content and outline the use case
> >> under which driver authors may want to add them, and fdinfo consumer
> >> therefore expect to see them. Just so everything is completely clear and
> >> people do not start adding them for drivers which do not support native
> >> context (or like).
> >
> > I really was just piggy-backing on existing comm/cmdline.. but I'll
> > try to write up something better.
> >
> > I think it maybe should not be limited just to native context.. for
> > ex. if the browser did somehow manage to create different displays
> > associated with different drm_file instances (I guess it would have to
> > use gbm to do this?) it would be nice to see browser tab names.
>
> Would be cool yes.
>
> My thinking behind why we maybe do not want to blanket add them is
> that in the common case it is the same information which can be obtained
> from procfs. Like in igt_drm_clients.c I get the pid and comm from
> /proc/$pid/stat. So I was thinking it is only interesting to add to
> fdinfo for drivers where it could differ due to an explicit override like
> you have with native context.

Yeah, I suppose I could define them as drm-client-comm-override and
drm-client-cmdline-override.
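
A minimal sketch of how a consumer might treat such override keys,
assuming they do end up being named drm-client-comm-override (the naming
is not settled in this thread): prefer the override when the driver
reports one, otherwise fall back to the comm read from procfs.  The
struct and function names here are hypothetical.

```c
#include <stddef.h>

/* Names gathered for one drm_file; either pointer may be NULL. */
struct client_names {
    const char *comm_override; /* from a drm-client-comm-override fdinfo key */
    const char *proc_comm;     /* comm parsed from /proc/<pid>/stat */
};

/*
 * Pick the name a tool would display for a client.  The override wins
 * because it describes the real GPU user behind a proxy process; the
 * procfs comm is the normal case where no override was set.
 */
static const char *client_display_name(const struct client_names *n)
{
    if (n->comm_override && n->comm_override[0] != '\0')
        return n->comm_override;
    if (n->proc_comm && n->proc_comm[0] != '\0')
        return n->proc_comm;
    return "?";
}
```

With this ordering, drivers that never emit the override key keep showing
the procfs name, so the common case is unchanged.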

> It can be added once there is a GL/whatever extension which would allow
> it? (I am not familiar with how browsers manage rendering contexts so
> maybe I am missing something.)
>
> >> But on the overall it sounds reasonable to me - it would be really cool
> >> to not just see pcivirtio-gpu as you say. Even if the standard virtiogpu
> >> use case (not native context) could show real users.
> >
> > For vrend/virgl, we'd first need to solve the issue that there is just
> > a single drm_file for all guest processes.  But really, just don't use
> > virgl.  (I mean, like seriously, would you put a gl driver in the
> > kernel?  Vrend has access to all guest memory, so this is essentially
> > what you have with virgl.  This is just not a sane thing to do.) The
> > only "valid" reason for not doing native-context is if you don't have
> > the src code for your UMD to be able to modify it to talk
> > native-context to virtgpu in the guest. ;-)
>
> I am just observing the current state of things on an Intel based
> Chromebook. :) Presumably the custom name for a context would be
> passable via the virtio-gpu protocol or something?

It is part of the context-type specific protocol, i.e. some parts of
the protocol are "core" and dealt with in the virtgpu guest kernel driver.
But on top of that there are various context types with their own
protocol (i.e. virgl, venus, cross-domain, msm native ctx, and some WIP
native ctx types floating around).

BR,
-R

> Regards,
>
> Tvrtko
>
> >
> > BR,
> > -R
> >
> >> Regards,
> >>
> >> Tvrtko
> >>
> >>> +
> >>>    Implementation Details
> >>>    ======================
> >>>
> >>> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> >>> index f0f4f845c32d..1150dcbf28aa 100644
> >>> --- a/drivers/gpu/drm/msm/msm_gpu.c
> >>> +++ b/drivers/gpu/drm/msm/msm_gpu.c
> >>> @@ -148,12 +148,26 @@ int msm_gpu_pm_suspend(struct msm_gpu *gpu)
> >>>        return 0;
> >>>    }
> >>>
> >>> +static void get_comm_cmdline(struct msm_file_private *ctx, char **comm, char **cmd);
> >>> +
> >>>    void msm_gpu_show_fdinfo(struct msm_gpu *gpu, struct msm_file_private *ctx,
> >>>                         struct drm_printer *p)
> >>>    {
> >>> +     char *comm, *cmdline;
> >>> +
> >>> +     get_comm_cmdline(ctx, &comm, &cmdline);
> >>> +
> >>>        drm_printf(p, "drm-engine-gpu:\t%llu ns\n", ctx->elapsed_ns);
> >>>        drm_printf(p, "drm-cycles-gpu:\t%llu\n", ctx->cycles);
> >>>        drm_printf(p, "drm-maxfreq-gpu:\t%u Hz\n", gpu->fast_rate);
> >>> +
> >>> +     if (comm)
> >>> +             drm_printf(p, "drm-comm:\t%s\n", comm);
> >>> +     if (cmdline)
> >>> +             drm_printf(p, "drm-cmdline:\t%s\n", cmdline);
> >>> +
> >>> +     kfree(comm);
> >>> +     kfree(cmdline);
> >>>    }
> >>>
> >>>    int msm_gpu_hw_init(struct msm_gpu *gpu)